[Resolved issue] Datapoints "not submitting" from Intend & associated slowdown in support responses

It’s been a few days of this issue and many of you are already aware of it, but I thought I’d write a bit about it here to give us something to link to, and so folks know what we’ve tried to do about the issue.

Short version:

Over the last week, support response times have been affected by high volumes of error emails from the Intend service. These emails don’t include useful error messages, and we can’t figure out what is triggering them. The vast majority of the datapoints have actually been submitted without a problem. We have tried to contact Intend, without success so far.

As a result, the support team will no longer check all Intend error emails, because most of them are spurious and we’re receiving so many of them.

Long version:

Since the weekend, we’ve been receiving hundreds and hundreds of emails from Intend when people try to submit datapoints to sync up to Beeminder. Each one says that the datapoints didn’t go through, and says they are supplying the error message so we can look into it. The error message in every single email has been totally blank.

We’ve checked into it on our side, and have found that almost all of these datapoints are going through. Bee dug into the logs too, and my understanding is that the only errors we can find indicate that Intend is submitting duplicate datapoint IDs. We’ve had this issue in the past and thought it had been resolved with them. It’s correct for us to send an error in that case.
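To illustrate why rejecting a duplicate ID is the right behavior, and how it can look like a failure even though the datapoint is already stored, here’s a rough sketch. Everything in it is hypothetical: `DatapointStore`, `DuplicateDatapointError` and `sync` are made-up names for illustration, not Beeminder’s or Intend’s actual code, and the client-side handling is just one way a client could interpret that error, not a description of how Intend works.

```python
class DuplicateDatapointError(Exception):
    """Raised when a datapoint ID has already been accepted (hypothetical)."""


class DatapointStore:
    """Toy stand-in for a server that accepts each datapoint ID only once."""

    def __init__(self):
        self._by_id = {}

    def submit(self, datapoint_id, value):
        # Refusing a repeated ID stops a retried request from
        # creating the same datapoint twice.
        if datapoint_id in self._by_id:
            raise DuplicateDatapointError(datapoint_id)
        self._by_id[datapoint_id] = value

    def count(self):
        return len(self._by_id)


def sync(store, datapoint_id, value):
    """Hypothetical client: a duplicate means the datapoint is already there."""
    try:
        store.submit(datapoint_id, value)
        return "created"
    except DuplicateDatapointError:
        # The first submission went through, so this is not a real failure.
        return "already_synced"
```

In this sketch, submitting the same ID twice stores one datapoint and reports the second attempt as already synced, rather than as an error that needs a human to look at it.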

This might all be our bad, some weirdness on our side after one of our servers had to be swapped to new hardware by our host at the weekend, or some coincidentally timed bug… but we haven’t been able to find any evidence that points to anything specific, apart from the duplicate ID issue where it’s correct for us not to accept the datapoint. At this point, we need more information from Intend in order to pursue some kind of fix, and to make sure it isn’t an issue on their side.

We’ve tried to contact Intend and its owner via multiple channels, but so far haven’t had an answer back.

In the meantime, the support team (Oliver, Simone and myself) have been checking each error email individually, because some reports suggest there are instances where the datapoint isn’t going through. This means that our responses to other emails are sometimes slower, and other work that I do, like refreshing the help docs and handling user feedback, is on hold. We’ve been putting in extra time on the inbox, with Simone jumping in for a non-scheduled slot a couple of times to work on it, so hopefully the impact on users has been kept to a minimum. Enormous thanks to Oliver and Simone for the extra work they’ve been putting in here!

Given that the majority of the datapoints do go through, the support team will, starting today, explain the situation to users as they get error emails, and then stop checking the errors associated with that user. We’ll work like that for a few days, trying to make sure people are notified about what’s happening, and then I’ll set up a filter in our inbox to automatically close all of these emails, so we won’t check any of them from that point.

That does mean that a very small percentage of cases where the datapoint didn’t go through won’t receive attention from us, and I’m really sorry about that. If that happens to you, you can add your datapoint manually via the email bot. The support team are also always reachable: if I get things set up right (and I will test it!), a reply to that error email from Intend should still get through to our inbox, and of course you can just email us directly or reply to legit checks.

Hopefully we’ll soon be able to get in touch with Intend and get this resolved properly, as well.

In the meantime, if you have any problem at all, the support team are still here and in most cases managing to keep our average response time below six hours. Please don’t hesitate to reach out as you normally would!

Hello folks!

No real updates here, except that we’ve sent an email explaining that we’re no longer checking the errors to a few people. The list isn’t that long compared to the number of people we know use Intend, and of course people may not have been using it so much over the weekend, so we’ll keep an eye on it for a day or two more as people get back to weekday usage.

For a bit of context about the impact, we received 1,200+ emails from Intend over the weekend. Now that we’re closing most of them, it’s obviously relieved the burden a bit, but it’s a very good thing I put the plan to do that in place on Friday instead of waiting for today as I’d originally intended. :sweat_smile:

We might have one more avenue to try in contacting Intend, so that effort continues for now. If we can sort things out, hopefully we can go back to checking error messages in future, but otherwise I’ll be setting up a workflow to automatically close them starting probably Wednesday.

Continued thanks to the support team who’ve been working hard, especially Simone, who has been jumping in to do extra time in the inbox to help out me, Clive and Oliver during our scheduled slots (as well as continuing to do her scheduled time) and Oliver who jumped in to help out Simone when she needed to start a bit later on Saturday!

(Clive’s been super busy with a move, so wish him luck with finding everything he needs in all those boxes!)

We may have a fix now! I don’t want to risk misrepresenting the technical details, so I won’t go deep into it, but Malcolm gave us access to work out what was going on and @dreev had a poke around. The problem has definitely been identified, and Danny has submitted a fix.

We don’t know yet if it’s been deployed/when it might be deployed, so now I’m keeping an eye on things to see whether there’s a change and whether I can turn off the workflow that’s been hiding emails from Intend’s bot.

We’re pretty sure the deploy went live, as we haven’t received any emails from Intend all day.

I’ll keep monitoring it for a few days, but if things keep up like this, we’ll be able to go back to checking any error emails we receive from Intend. :tada:

Did the weirdness just start again? Or is it just me? I just got the same error when submitting Intend review for the day, for all of the goals that should have been triggered - but in fact everything that was expected to be posted has been posted.

It looks like it may have, though I think you are the only one so far! Monitoring.

If it continues, I’ll alert Danny to check what’s happening, and turn the workflow back on so the support team stop checking these emails again.

Unfortunately, yes, Intend have started sending us blank error emails again, probably for all users.

I’ve turned on a workflow preventing any of these emails reaching the support inbox, so support volumes should be fine.

We’ll try to look again at what’s happening and why, because we’re aware that users are also getting spammed by it. I’m reasonably confident the problem is not on our end, since the symptoms are the same as before… but it’s a bit of a mystery, since the fix @dreev submitted to Intend’s repository should have prevented this happening.

Sure enough, the fix got accidentally reverted. I’ve just now pushed a more robust fix!

No error emails since then, so I’ll tentatively turn the workflow back off. Here’s hoping that’s solved for good this time! Thanks @scarabaea for the early heads up – you spotted it before anyone else.
