Is it being coy to say that at least one of our integrations uses an undocumented API where we’ve had to do things like sending cookies?
I’m not speaking for Beeminder as a whole, just for me, but I think two key things are reducing the amount of autodata maintenance as much as we can, and not irritating the other party.
Autodata maintenance is actually a really big chunk of how we use our engineering time, and often comes with very little (or no) warning. Beyond causing problems with our users’ goals, this sort of “gotta do it right now with no warning” emergency also helps make work estimates a nightmare, especially for a very small team. The official APIs tend to be a little bit better than the undocumented ones with regard to warning before changes that break our integrations.
For Readwise, I actually asked them about an API when I was one of the early beta testers, and they had me outline different use cases, and then added an endpoint. I can’t say for certain that they added it just for us, but I think it’s likely. We’ve had some integrations that have brought in (and continue to bring in!) many many new users. The Readwise folks seem super cool, have a bunch of cool users, and I think it’s possible they’d feature the Beeminder integration in a prominent way, which would really help increase our audience. I don’t want to irritate folks on the other side–I’m “a person on the other side” to a lot of folks–but beyond being a friendly dude, I want to make sure I don’t reduce the chances of spreading Beeminder to a bunch of new folks who likely wouldn’t hear about us otherwise.
Now, why would using an undocumented API potentially irritate folks on the other side? A lot of companies don’t expose the data Beeminder users want except in a “live” way. I don’t know if there are better terms for this, but one of the ways you can categorize our third party autodata is if it’s critical that we fetch data at the deadline. Strava, for instance, gives us the start and end datetimes of each session, and it isn’t intrinsic to the API that we sync right at the deadline. Project Euler, on the other hand, only gives us “the current number of points”. If we sync 30 minutes after the deadline, there’s no real way for us to know what the value was at the deadline. For those integrations, we have to sync as close to the deadline as we possibly can. Many folks have deadlines at local midnight, and there are a few timezones that have many many many goals for the same integration. 10k API calls to the same provider in as small of a time period as we can? As my 8YO says, “we look sus!” (Beyond looking suspicious, the system load for 10k near-simultaneous API calls can be intense, depending on what it’s doing and how they’re set up, and my heart aches for the poor engineer on the other side. <3)
(For the Strava type integrations, we do still sync as close to deadline as we can, but if we had to spread them out over an hour or something, it could be done, you know?)
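To make the two categories concrete, here is a minimal sketch of how sync times could be chosen. It is not how Beeminder actually schedules anything: `sync_time`, the `timestamped` flag, the one-hour `window`, and the choice to spread syncs after the deadline are all assumptions made up for illustration.

```python
import hashlib
from datetime import datetime, timedelta


def sync_time(goal_id: str, deadline: datetime, timestamped: bool,
              window: timedelta = timedelta(hours=1)) -> datetime:
    """Pick when to sync a goal (hypothetical helper, not Beeminder's scheduler).

    timestamped=False: the provider only reports a current total (the
    Project Euler case), so the value at the deadline can't be recovered
    later and we have to sync right at the deadline.
    timestamped=True: each datapoint carries its own time (the Strava
    case), so the sync can be spread across `window`.
    """
    if not timestamped:
        return deadline
    # Deterministic jitter: hash the goal id into a fraction in [0, 1)
    # so the same goal always lands in the same slot of the window.
    digest = hashlib.sha256(goal_id.encode()).digest()
    fraction = int.from_bytes(digest[:8], "big") / 2**64
    return deadline + window * fraction
```

With something like this, 10k timestamped goals sharing a midnight deadline would be scattered across the hour instead of hitting the provider all at once, while the snapshot-only goals still all land on the deadline.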
The not-wanting-to-irritate-folks then feeds back into the “trying to reduce autodata maintenance” too, because it’s easy for someone on the other end to purposely make it tougher for us or just to tell us we’re not allowed in some other way.
(OTOH, when I was just a Beeminder user, I wrote some integrations that the third party company would have been … less than pleased… had they known about them. I remember one where I tore apart an Android APK and put in mitmproxy (and probably Frida, I forget the exact details) and got everything so I could pretend I was the Android app. *gazes wistfully* I don’t use it anymore for other reasons, but it’s been nearly a decade and they’re still saying they’re hoping to come out with a user API in the next quarter.)
I do have “improve user-built integrations” on the list (not at the exact top, but I still like getting feedback and ideas regardless). The thing that I wanted the most when I wrote a bunch of custom integrations was a day-bucketed idempotent … upsert, almost? I wanted a way to say “here is a real-world time, a value, maybe a comment” and have Beeminder figure out which “day” that corresponds to, per any custom deadline, and then look at the value, and the aggregated daily value for the goal, and if the value would change the aggday’ed value, to either add the new datapoint to the day, or to update the most recent datapoint for the day to the new datapoint. I wanted this so badly, and yet I don’t really get many “oh yeah, we want that!” when I bring it up now. I see a lot of user integrations where the authors don’t want to add state, so they either add a datapoint regardless, leading to days on end of 48 datapoints with the same value, or they have to ask Beeminder what the last values are and then figure it out themselves. (Sometimes they ask for years of their goal’s history, every single time, which isn’t necessarily great for us.) I get it, I do, so I’m not really grumpy about it, but I would like to make it trivial to make a much better integration.
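Roughly, the day-bucketed upsert described above could look like the sketch below. This is an in-memory illustration, not an existing Beeminder endpoint: `Datapoint`, `bucket_day`, `upsert`, the `deadline_offset` convention (time after local midnight, negative for before), and the `replace_latest` switch are all names and assumptions invented here.

```python
from dataclasses import dataclass
from datetime import date, datetime, timedelta, timezone
from typing import Callable, Dict, List


@dataclass
class Datapoint:
    timestamp: datetime  # real-world time, already in the goal's local zone
    value: float
    comment: str = ""


def bucket_day(ts: datetime, deadline_offset: timedelta) -> date:
    """Map a real-world time to the goal 'day' it counts toward.

    Assumption: deadline_offset is how far the deadline sits after local
    midnight (negative means before midnight). With a 3am deadline,
    2am on Jan 2 still counts as Jan 1.
    """
    return (ts - deadline_offset).date()


def upsert(store: Dict[date, List[Datapoint]], point: Datapoint,
           deadline_offset: timedelta,
           aggregate: Callable[[List[float]], float] = lambda vs: vs[-1],
           replace_latest: bool = True) -> str:
    """Add or update a datapoint only if the day's aggregated value changes."""
    day = bucket_day(point.timestamp, deadline_offset)
    existing = store.setdefault(day, [])
    if not existing:
        existing.append(point)
        return "added"
    # Either overwrite the most recently stored point for the day, or append.
    candidate = (existing[:-1] if replace_latest else existing) + [point]
    before = aggregate([p.value for p in existing])
    after = aggregate([p.value for p in candidate])
    if before == after:
        return "unchanged"  # re-sending the same value is a no-op
    store[day] = candidate
    return "updated" if replace_latest else "added"


store: Dict[date, List[Datapoint]] = {}
offset = timedelta(hours=3)  # hypothetical 3am deadline
reading = Datapoint(datetime(2024, 1, 2, 2, 0, tzinfo=timezone.utc), 5.0)
results = [upsert(store, reading, offset) for _ in range(48)]
```

In that usage, a stateless script posting the same reading 48 times leaves one datapoint on the day instead of 48, and it never needs to ask for the goal’s history. The default `aggregate` here takes the last value of the day; a sum or max could be passed in instead.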
Among other things, I’d also like to improve the DX for having Beeminder ask user integrations for data when Beeminder wants it, have good sample user integrations at Val Town and Replit and whatever else, and someday make it easier to promote user integrations to official ones… If there’s anything in particular that’d help, I’m all ears!
Oh, clouedoc, I suspect you’re massively more on top of scraping than I am. I’d love to pick your brain sometime about some CloudFront and Cloudflare issues we’ve been seeing in autodata…