Examples of curlminder usage

clivemeister · December 8, 2024, 4:20pm

We’ve just launched the curlminder integration, so I thought it might be useful to gather a few examples for people. Please contribute your own - or maybe ask questions about one, if you’d like to try it out but don’t speak regex well enough!

Here’s an example of one that doesn’t work right now, but - I hope - we can get to work with our joint efforts! I want to extract my XP points score from Chessable. Here’s the URL: Chessable. And here’s how the relevant segment looks in my browser:

Now I can read the html source in my browser, and could probably cook up the necessary regex… but right now, when I start setting up the goal, the initial html that Beeminder shows me ends with “You have been blocked If you are using a VPN, disconnect and try again”. So once this has been fixed (if it can be fixed!), I’ll have another go.

But meantime, do post your own url+regex recipes here, for others to try!

baronvonchickenpants · December 8, 2024, 5:45pm

Thanks for starting this thread off @clivemeister

shanaqui · December 9, 2024, 1:53pm

Mine was written for me by @dreev so I could experiment! It pulls my number of minions collected in the game Final Fantasy XIV from this page, using this expression:

Total:\s*\<span\>\s*([\d\,]+)

Other stuff I might be tempted to do in the future, all FFXIV-related:

Number of mounts: from here [can probably use the same regex on this URL, I think?]
Number of achievements: from this page
Achievement points: ditto
Levequest completion: via FFXIVcollect
Relic completion: ditto

Buuut that’d require help or me learning what all this stuff is about in a lot more detail, so it’s not a current project.

felixm · December 9, 2024, 10:59pm

It could be helpful to specify which regex flavor/style Curlex supports in the documentation (Curlex - Beeminder Help). It’s likely Perl-style, but making this explicit might help users, especially with GPT requests.

Speaking about GPT, the process to create your own regexes might be relatively straightforward, @shanaqui.

Go to the site you care about. I used your first example (Eirian Evanna | FINAL FANTASY XIV, The Lodestone), but clicked on “Mounts,” so this is the URL I used: Eirian Evanna | FINAL FANTASY XIV, The Lodestone.
Right click and Inspect the number you care about. It should then look something like this:
[Screenshot: the number highlighted in the browser’s inspector]
Now, right click on an outer tag that includes your target number and select copy outer HTML. You have to apply your best judgement to know which tag to use. <span> is probably not enough context, but class="minion__sort__total" kind of looks like we have enough context to unambiguously identify the position.
It might then look like this when you paste it:
<p class="minion__sort__total">Total: <span>201</span></p>
Now, you can ask GPT to create the regex for you:

Please create a Perl-style regex that matches the following HTML code. Please include a match group that matches the number that is part of the HTML. The number might change and there should be only one match group for that number.

<p class="minion__sort__total">Total: <span>201</span></p>

Claude 3.5 Sonnet then gives me this:

Total: <span>(\d+)</span>

And GPT 4o this:

/<p class="minion__sort__total">Total: <span>(\d+)<\/span><\/p>/

Here, you will have to remove the leading and trailing slashes (they are regex delimiters, not part of the pattern) before pasting it into Beeminder.
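If you want to try a pattern locally before pasting it into Beeminder, here is a minimal sketch in Python. It assumes Python's re module behaves like Curlex for patterns this simple, which I have not verified. The helper names (strip_delimiters, extract_number) and the three pattern constants are made up for this example; the HTML is the snippet from the prompt above.

```python
import re
from typing import Optional

# The outer HTML copied from the Lodestone page (from the prompt above).
HTML = '<p class="minion__sort__total">Total: <span>201</span></p>'


def strip_delimiters(pattern: str) -> str:
    """Drop a leading and trailing '/' when both are present."""
    if len(pattern) >= 2 and pattern.startswith("/") and pattern.endswith("/"):
        return pattern[1:-1]
    return pattern


def extract_number(pattern: str, html: str) -> Optional[int]:
    """Return the first capture group as an int, or None if nothing matches."""
    match = re.search(strip_delimiters(pattern), html)
    if match is None:
        return None
    # Thousands separators such as "1,234" are removed before converting.
    return int(match.group(1).replace(",", ""))


# The pattern shanaqui posted earlier in the thread.
POSTED = r'Total:\s*\<span\>\s*([\d\,]+)'
# A slash-delimited pattern anchored on the surrounding tags.
DELIMITED = r'/<p class="minion__sort__total">Total: <span>(\d+)<\/span><\/p>/'
# A pattern that ignores the <span> tag between the label and the number.
NO_TAG = r'Total: (\d+)'

for name, pattern in [("posted", POSTED), ("delimited", DELIMITED), ("no tag", NO_TAG)]:
    print(name, "->", extract_number(pattern, HTML))
```

The last pattern finds nothing, because the raw HTML has a tag between "Total: " and the digits. That is the usual reason a regex that looks right in the rendered page fails against the source.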

As a sanity check, I will try your second example: Eirian Evanna | FINAL FANTASY XIV, The Lodestone.

Right click, inspect:

Select marked HTML tag for enough context, and copy outer:

						<div class="select-pulldown en-us">
								<form action="?">
									<select name="order" onchange="this.form.submit()" class="select-pulldown__open">
										<option value="1">Sort by most recent</option>
										<option value="2">Sort by oldest</option>
									</select>
								</form>
						</div>
						<div class="parts__total">2117 Total</div>
					</div>

Paste into Claude with prompt; I guess it didn’t agree that we need that much context:

<div class="parts__total">(\d+) Total</div>
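One way to check the "enough context" question offline is to confirm the pattern matches exactly once in the copied fragment. A rough sketch, again assuming Python's re is close enough to Curlex for this; unique_capture is a hypothetical helper and the fragment is the HTML copied above with the indentation tidied.

```python
import re

# The copied fragment from above, re-indented for readability.
FRAGMENT = """
<div class="select-pulldown en-us">
  <form action="?">
    <select name="order" onchange="this.form.submit()" class="select-pulldown__open">
      <option value="1">Sort by most recent</option>
      <option value="2">Sort by oldest</option>
    </select>
  </form>
</div>
<div class="parts__total">2117 Total</div>
"""


def unique_capture(pattern: str, html: str) -> str:
    """Return the captured text if the pattern matches exactly once."""
    found = re.findall(pattern, html)
    if len(found) != 1:
        raise ValueError(f"expected exactly one match, got {len(found)}")
    return found[0]


print(unique_capture(r'<div class="parts__total">(\d+) Total</div>', FRAGMENT))

# A bare (\d+) is ambiguous here: it also hits the option values 1 and 2.
try:
    unique_capture(r'(\d+)', FRAGMENT)
except ValueError as err:
    print("too little context:", err)
```

Running the same check against the full page source rather than the fragment is the stronger test, since other parts of the page may contain similar markup.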

Test in Beeminder:

Edit:

For achievement points: (\d+)

Also: obligatory disclaimer. I don’t endorse “parsing” HTML with regexes and this shouldn’t be used for anything serious. (See first answer here: html - RegEx match open tags except XHTML self-contained tags - Stack Overflow)

dreev · December 12, 2024, 12:32am

I have failed to replicate that specifically. When I do view-source on the page and grep for your XP it seems to simply be absent. So I think this is fundamentally the same problem described in the blog post for Fatebook, where the number is populated by Javascript. (Getting blocked from fetching the html would then be an orthogonal problem…)

PS: Huge thanks to @felixm for the tutorial on getting LLMs to help with the regex part. Now I’m getting tempted to make a muggle-friendly version of Curlminder where you can just describe the number on the page you want to beemind in English and Beeminder makes the calls to the LLM to construct the regex. But I think we need to collect more use cases before we can justify that. (Keep em coming, y’all!)

clivemeister · December 12, 2024, 9:35am

I’ve tried using curl from the command line, and I get back a page with stuff saying things like “This website is using a security service to protect itself from online attack… you can email the site owner to let them know you were blocked”. So I did!

I suspect if I fiddled about enough with curl settings, to make myself look more like a browser, I could get back the page. Then we could see if it’s a Javascript-populated thing or not. I may have a go at some point… or just ask Gemini for suggestions, probably!
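For reference, "looking more like a browser" mostly means sending the headers a browser would send. A minimal sketch with Python's standard library instead of curl; the header values are illustrative assumptions (copy real ones from your own browser's network tab), build_request and fetch_html are hypothetical names, and there is no guarantee this gets past any particular site's bot check.

```python
import urllib.request

# Illustrative values only; a real browser sends more headers than this.
BROWSER_HEADERS = {
    "User-Agent": "Mozilla/5.0 (X11; Linux x86_64; rv:133.0) Gecko/20100101 Firefox/133.0",
    "Accept": "text/html,application/xhtml+xml",
    "Accept-Language": "en-US,en;q=0.5",
}


def build_request(url: str) -> urllib.request.Request:
    """Build a GET request carrying browser-like headers."""
    return urllib.request.Request(url, headers=BROWSER_HEADERS)


def fetch_html(url: str, timeout: float = 10.0) -> str:
    """Fetch the page and decode it using the charset the server declares."""
    with urllib.request.urlopen(build_request(url), timeout=timeout) as resp:
        charset = resp.headers.get_content_charset() or "utf-8"
        return resp.read().decode(charset, errors="replace")
```

Calling fetch_html on your page URL and looking at the end of the result should show whether the "blocked" message is still there.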

baronvonchickenpants · December 12, 2024, 9:46am

I think this is going to be a bit of both problems.

The message you get when trying to set up a curl request looks to me like that request is being blocked by Cloudflare at Chessable’s end. However, even when accessing it via a full browser, the XP figure is not available when viewing the page source, so it is most likely getting populated by a script.

If that’s the case then I don’t think solving the ‘blocking’ issue will make the XP figure available to curl.
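A quick way to tell the two problems apart for any site: save the page source (view-source, or whatever curl returns) and check whether the number you see in the browser appears in it at all. A minimal sketch; appears_in_static_html is a hypothetical helper and the sample strings in the comments are invented, not taken from Chessable.

```python
import re


def appears_in_static_html(html: str, value: int) -> bool:
    """True if the number occurs in the raw HTML, with or without commas."""
    digits = str(value)
    # Allow an optional comma between digits, and refuse to match inside
    # a longer run of digits (so 2117 does not match 21170).
    body = ",?".join(re.escape(d) for d in digits)
    pattern = r"(?<!\d)" + body + r"(?!\d)"
    return re.search(pattern, html) is not None


# Invented examples: a server-rendered number versus an empty placeholder
# that a script fills in after the page loads.
print(appears_in_static_html('<div class="xp">2,117 XP</div>', 2117))
print(appears_in_static_html('<div id="xp"></div><script>render()</script>', 2117))
```

If the check comes back False on the source your browser shows, no amount of header tweaking will help, because the number is not in the document that curl receives.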

aad · December 21, 2024, 8:43am

I set up a goal to track my Criticker.com ratings as a proxy for how many movies I watch. With my cinema subscription restarted, this helps me make sure I’m not paying too much for the subscription.

Criticker profiles show the review count in plain text:

Use your public profile URL: https://www.criticker.com/profile/<username>
Extract the count with: >(\d+)\sFilm\sRatings<

dreev · December 22, 2024, 10:31pm

Nice! I hadn’t heard of Criticker but know someone who uses Letterboxd, which seems similar and also works with Curlminder!

The numbers of watched and reviewed films are in tooltips, but Curlminder can handle that fine.

For example, you can find a chunk of html for number of watched films that looks like this:

title="102 films">Watched</a>

Which you can turn into a regex like so:

title\=\"([\d\,]+)[^\d]+?(?i)films.+watched
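To test that pattern outside Beeminder, here is a small Python sketch. One caveat: recent Python versions reject an inline (?i) in the middle of a pattern, so the sketch uses the scoped form (?i:...) instead, which applies case-insensitivity to the same part of the regex. Whether Curlex accepts the scoped form is not something I have checked; watched_count is a hypothetical helper.

```python
import re
from typing import Optional

# The chunk of Letterboxd html quoted above.
CHUNK = 'title="102 films">Watched</a>'

# Same regex as above, with (?i) rewritten as a scoped group for Python.
PATTERN = r'title\=\"([\d\,]+)[^\d]+?(?i:films.+watched)'


def watched_count(html: str) -> Optional[int]:
    """Return the watched-films count, or None if the pattern does not match."""
    match = re.search(PATTERN, html)
    if match is None:
        return None
    # The capture group allows commas, so strip them before converting.
    return int(match.group(1).replace(",", ""))


print(watched_count(CHUNK))
```

The [\d\,]+ group is what lets the same regex keep working once the count passes 999 and the site starts inserting a comma.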

baronvonchickenpants · January 31, 2025, 2:33pm

Did we ever figure out a way to scrape the Chessable XP site?

clivemeister · February 9, 2025, 1:37pm

I did a bit of experimenting, and even when using curl with a bunch of plausible headers, it looks like it won’t work unless you’re doing the GET from a real browser. There are various ways to do this on the server side (e.g. Capybara), but they’re all relatively expensive (in cpu cycles/memory), and not super-reliable, so I don’t know if they’re terribly practical.

philip · February 11, 2025, 8:52pm

Interesting! Thanks to this thread I’ve tried both Criticker and now, today, Letterboxd. The latter has an import feature which sounded tempting but the data quality seems poor. So far 32% of my watched titles don’t exist there, even when they’re present in the underlying tmdb database – and those tmdb IDs are wrongly mapped to other titles.

Update: I guess Letterboxd is obsessed with whatever they think of as films, because most of the missing titles (58/192) are series of one kind or another. But not all series in my list are missing, and not all movies are present. The worst thing from my pov is that if you feed it an ID from their official data source, tmdb, it returns a confident (but wrong!) result rather than a not-found.

Update the second: turns out that tmdb distinguishes between tv and film, using different ID sequences, so Letterboxd was indeed finding the film numbered X when I meant the series. Confusingly, some TV series are present on Letterboxd. The response from support was that they “don’t currently support cancelled TV series on the platform”. So, despite the slicker presentation of Letterboxd, I think I’ll be sticking with Criticker.

aad · March 24, 2025, 3:12pm

Criticker changed its design. This seems to work now:

This URL is easier now: Film Database - Recommendations & Reviews | Criticker/
With this expression: of\s+(\d+)\s+Titles

philip · March 28, 2025, 11:33am

I’ve just hooked up criticker to curlminder, thank you!

A few things to add:

The public URL includes your criticker username: https://criticker.com/ratings/<username>

Go into the goal’s settings tab and untick the cumulative box near the bottom of the page; this is an odometer-style goal, where the current total is posted as the datapoint.

If you want to import some historical data, you can email bot@beeminder.com with a subject of <username>/<goalname> and datapoints in the body formatted as yyyy mm dd value.

I asked my friendly neighbourhood LLM to summarise the recent ratings copied from my public profile page in a short conversation, something like:

how many entries are on each date?
could you format those as yyyy mm dd value ?
cumulative values please
no, cumulative the other way!
recalculate the cumulative values so that the most recent total is X

So much faster than either counting them or scripting it.
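For anyone who does want to script that last step, here is a minimal sketch. It takes one date per rating and the current total, and works backwards so the most recent line lands on that total. The function name and the sample dates are made up; the output format is the yyyy mm dd value layout mentioned above.

```python
from collections import Counter
from datetime import date
from typing import Iterable, List


def cumulative_datapoints(rating_dates: Iterable[date], final_total: int) -> List[str]:
    """Format one 'yyyy mm dd value' line per day, ending at final_total."""
    counts = Counter(rating_dates)
    # Ratings made before the first listed day: whatever is left over once
    # the listed ratings are subtracted from the current total.
    running = final_total - sum(counts.values())
    lines = []
    for day in sorted(counts):
        running += counts[day]
        lines.append(f"{day.year} {day.month:02d} {day.day:02d} {running}")
    return lines


# Invented sample: two ratings on one day, one on a later day, 100 in total.
sample = [date(2025, 3, 20), date(2025, 3, 20), date(2025, 3, 22)]
for line in cumulative_datapoints(sample, final_total=100):
    print(line)
```

The resulting lines can go straight into the body of the email to the bot.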

divide · June 10, 2025, 1:06pm

Total DuoLingo XP:
URL: https://www.duolingo.com/2017-06-30/users?username=
regex: /"totalXp":(\d+)/
