Examples of curlminder usage

We’ve just launched the curlminder integration, so I thought it might be useful to gather a few examples for people. Please contribute your own - or maybe ask questions about one, if you’d like to try it out but don’t speak regex well enough!

Here’s an example of one that doesn’t work right now, but - I hope - we can get it working with our joint efforts! I want to extract my XP points score from Chessable. Here’s the URL: Chessable. And here’s how the relevant segment looks in my browser:
image

Now I can read the HTML source in my browser, and could probably cook up the necessary regex… but right now, when I start setting up the goal, the initial HTML that Beeminder shows me ends with “You have been blocked If you are using a VPN, disconnect and try again”. So once this has been fixed (if it can be fixed!), I’ll have another go.

But in the meantime, do post your own url+regex recipes here, for others to try!

Thanks for starting this thread off @clivemeister

Mine was written for me by @dreev so I could experiment! It pulls my number of minions collected in the game Final Fantasy XIV from this page, using this expression:

Total:\s*\<span\>\s*([\d\,]+)
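If you want to try an expression like this outside Beeminder first, here is a minimal Python sketch. It assumes Python's `re` module treats this pattern the same way Curlminder's regex engine does, which may not hold for every pattern. The sample HTML is the Lodestone snippet quoted further down in this thread, and `extract_count` is just an illustrative helper name.

```python
import re

# The expression from the post above, unchanged.
PATTERN = r"Total:\s*\<span\>\s*([\d\,]+)"

def extract_count(html):
    """Return the first captured number as an int, or None if nothing matches."""
    match = re.search(PATTERN, html)
    if match is None:
        return None
    # Drop thousands separators such as "1,234" before converting.
    return int(match.group(1).replace(",", ""))

sample = '<p class="minion__sort__total">Total: <span>201</span></p>'
print(extract_count(sample))  # 201
```

The `[\d\,]+` group is what lets the same expression keep working once the total passes 999 and the page starts printing a comma.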

Other stuff I might be tempted to do in the future, all FFXIV-related:

  • Number of mounts: from here [can probably use the same regex on this URL, I think?]
  • Number of achievements: from this page
  • Achievement points: ditto
  • Levequest completion: via FFXIVcollect
  • Relic completion: ditto

Buuut that’d require help or me learning what all this stuff is about in a lot more detail, so it’s not a current project. :stuck_out_tongue:

It could be helpful to specify which regex flavor/style Curlex supports in the documentation (Curlex - Beeminder Help). It’s likely Perl-style, but making this explicit might help users, especially with GPT requests.

Speaking of GPT, the process to create your own regexes might be relatively straightforward, @shanaqui.

  1. Go to the site you care about. I used your first example (Eirian Evanna | FINAL FANTASY XIV, The Lodestone), but clicked on “Mounts,” so this is the URL I used: Eirian Evanna | FINAL FANTASY XIV, The Lodestone.
  2. Right click and Inspect the number you care about. It should then look something like this:
  3. Now, right click on an outer tag that includes your target number and select copy outer HTML. You have to apply your best judgement to know which tag to use. <span> alone is probably not enough context, but class="minion__sort__total" looks like enough context to unambiguously identify the position.
  4. It might then look like this when you paste it:
     <p class="minion__sort__total">Total: <span>201</span></p>
  5. Now, you can ask GPT to create the regex for you:

Please create a Perl-style regex that matches the following HTML code. Please include a match group that matches the number that is part of the HTML. The number might change and there should be only one match group for that number.

<p class="minion__sort__total">Total: <span>201</span></p>

Claude 3.5 Sonnet then gives me this:

<p class="minion__sort__total">Total: <span>(\d+)</span></p>

And GPT 4o this:

/<p class="minion__sort__total">Total: <span>(\d+)<\/span><\/p>/

Here, you will have to remove the leading and trailing slashes (the regex delimiters) before pasting it into Beeminder.
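To double-check that the two answers are equivalent, something like the following Python sketch can help. `strip_delimiters` is a made-up helper, and I am assuming Python's `re` is close enough to Curlminder's engine for these simple patterns. The escaped `\/` inside the GPT version is harmless once the outer slashes are gone.

```python
import re

def strip_delimiters(pattern):
    """Remove one leading and one trailing slash, if both are present."""
    if len(pattern) >= 2 and pattern.startswith("/") and pattern.endswith("/"):
        return pattern[1:-1]
    return pattern

# The two suggestions from above, copied as-is.
claude_regex = r'<p class="minion__sort__total">Total: <span>(\d+)</span></p>'
gpt_regex = r'/<p class="minion__sort__total">Total: <span>(\d+)<\/span><\/p>/'

html = '<p class="minion__sort__total">Total: <span>201</span></p>'

for raw in (claude_regex, gpt_regex):
    match = re.search(strip_delimiters(raw), html)
    print(match.group(1))  # prints 201 for both patterns
```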

As a sanity check, I will try your second example: Eirian Evanna | FINAL FANTASY XIV, The Lodestone.

Right click, inspect:

image

Select marked HTML tag for enough context, and copy outer:

						<div class="select-pulldown en-us">
								<form action="?">
									<select name="order" onchange="this.form.submit()" class="select-pulldown__open">
										<option value="1">Sort by most recent</option>
										<option value="2">Sort by oldest</option>
									</select>
								</form>
						</div>
						<div class="parts__total">2117 Total</div>
					</div>

Paste into Claude with prompt; I guess it didn’t agree that we need that much context:

<div class="parts__total">(\d+) Total</div>

Test in Beeminder:

Edit:

For achievement points: <p class="achievement__point">(\d+)</p>

Also: obligatory disclaimer. I don’t endorse “parsing” HTML with regexes and this shouldn’t be used for anything serious. (See the first answer here: html - RegEx match open tags except XHTML self-contained tags - Stack Overflow)

I have failed to replicate that specifically. When I do view-source on the page and grep for your XP it seems to simply be absent. So I think this is fundamentally the same problem described in the blog post for Fatebook, where the number is populated by Javascript. (Getting blocked from fetching the html would then be an orthogonal problem…)
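The view-source-and-grep check can be sketched in a few lines of Python. This is a crude substring test, not how Curlminder itself works; `appears_in_source` is a hypothetical helper, and `script_page` is an invented stand-in for a page whose number is filled in by a script.

```python
def appears_in_source(html, value):
    """True if the number shows up in the raw HTML, with or without commas."""
    plain = str(value)          # e.g. "2117"
    with_commas = f"{value:,}"  # e.g. "2,117"
    return plain in html or with_commas in html

# Number is present in the served HTML, so a regex can reach it.
static_page = '<div class="parts__total">2117 Total</div>'
# Invented example: the number only exists after a script runs in a browser.
script_page = '<div id="xp"></div><script src="app.js"></script>'

print(appears_in_source(static_page, 2117))  # True
print(appears_in_source(script_page, 2117))  # False
```

If the check comes back false on the raw HTML, no regex will help, whatever happens with the blocking.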

PS: Huge thanks to @felixm for the tutorial on getting LLMs to help with the regex part. Now I’m getting tempted to make a muggle-friendly version of Curlminder where you can just describe the number on the page you want to beemind in English and Beeminder makes the calls to the LLM to construct the regex. But I think we need to collect more use cases before we can justify that. (Keep em coming, y’all!)

I’ve tried using curl from the command line, and I get back a page with stuff saying things like “This website is using a security service to protect itself from online attack… you can email the site owner to let them know you were blocked”. So I did!

I suspect if I fiddled about enough with curl settings, to make myself look more like a browser, I could get back the page. Then we could see if it’s a Javascript-populated thing or not. I may have a go at some point… or just ask Gemini for suggestions, probably!
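For what it's worth, here is roughly what "looking more like a browser" could mean, as a Python sketch using the standard `urllib.request` module rather than curl. The header values and the `https://example.com/profile` URL are placeholders I made up, and there is no guarantee that any set of headers gets past the security service.

```python
import urllib.request

# Assumed, illustrative header values; real browsers send more than this.
BROWSER_HEADERS = {
    "User-Agent": (
        "Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 "
        "(KHTML, like Gecko) Chrome/120.0 Safari/537.36"
    ),
    "Accept": "text/html,application/xhtml+xml",
    "Accept-Language": "en-US,en;q=0.9",
}

def build_request(url):
    """Build a GET request carrying browser-like headers (nothing is sent yet)."""
    return urllib.request.Request(url, headers=BROWSER_HEADERS)

def fetch_html(url, timeout=10):
    """Send the request and return the body as text."""
    with urllib.request.urlopen(build_request(url), timeout=timeout) as response:
        return response.read().decode("utf-8", errors="replace")

# Placeholder URL; only the request object is built here, no network call.
request = build_request("https://example.com/profile")
print(request.get_header("User-agent"))
```

Calling `fetch_html` on the real page and then searching the result for the XP number would show whether the figure is in the served HTML at all.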

I think this is going to be a bit of both problems.

The message you get when trying to set up a curl request looks to me like that request is being blocked by Cloudflare at Chessable’s end. However, even when accessing it via a full browser, the XP figure is not in the page source, so it is most likely populated by a script.

If that’s the case, then I don’t think solving the ‘blocking’ issue will make the XP figure available to curl.

I set up a goal to track my Criticker.com ratings as a proxy for how many movies I watch. With my cinema subscription restarted, this helps me make sure I’m not paying too much for the subscription.

Criticker profiles show the review count in plain text:

  1. Use your public profile URL: https://www.criticker.com/profile/<username>
  2. Extract the count with: >(\d+)\sFilm\sRatings<
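A quick way to check the expression offline, sketched in Python. The sample markup is my guess at what surrounds the count (only the `>` before and `<` after matter to the regex), and `film_ratings` is an illustrative name.

```python
import re

# The expression from step 2, unchanged.
PATTERN = r">(\d+)\sFilm\sRatings<"

def film_ratings(html):
    """Return the rating count as an int, or None if the text is not found."""
    match = re.search(PATTERN, html)
    return int(match.group(1)) if match else None

# Hypothetical fragment; the real profile markup may differ.
sample = '<a href="#">342 Film Ratings</a>'
print(film_ratings(sample))  # 342
```

Anchoring on `>` and `<` means the number has to sit right at the start of the element's text, which keeps stray digits elsewhere on the page from matching.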

Nice! I hadn’t heard of Criticker but know someone who uses Letterboxd, which seems similar and also works with Curlminder!

The numbers of watched and reviewed films are in tooltips, but Curlminder handles that fine.

For example, you can find a chunk of html for number of watched films that looks like this:

title="102&nbsp;films">Watched</a>

Which you can turn into a regex like so:

title\=\"([\d\,]+)[^\d]+?(?i)films.+watched
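If you test this one in Python, note one difference: newer Python versions reject a bare `(?i)` in the middle of a pattern, so the sketch below uses the scoped form `(?i:...)` instead. That is my adaptation for Python only, not a change needed in Beeminder. `watched_count` is an illustrative name.

```python
import re

# Same expression, with the mid-pattern (?i) rewritten as a scoped
# case-insensitive group so that Python's re module accepts it.
PATTERN = r'title\=\"([\d\,]+)[^\d]+?(?i:films.+watched)'

def watched_count(html):
    """Return the watched-films count as an int, or None if not found."""
    match = re.search(PATTERN, html)
    if match is None:
        return None
    # Remove thousands separators before converting.
    return int(match.group(1).replace(",", ""))

sample = 'title="102&nbsp;films">Watched</a>'
print(watched_count(sample))  # 102
```

The lazy `[^\d]+?` skips over the `&nbsp;` entity between the number and the word "films".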