Eugenio's beeminder journal

eugeniobruno · June 14, 2019, 9:08pm

Let’s start easy: if I’m not spending at least half an hour a day learning useful knowledge, in my particular situation, I’m a freaking idiot. I’ll clean things up in the goal tomorrow morning and put a public chart in here.

We’ll have to bring that half hour way up but if I don’t do that in my situation that’s simply idiotic of me.

Edit 1: I may need to fix the road in the editor, maybe? But it looks reasonable, so for now, https://www.beeminder.com/eugeniobruno/check_and_do_todoist_todos

Edit 2: many of the last pre-restart datapoints are weaseled in, chiefly because of the fine print readable by worker bees. I won’t input bs data this time around, but I might ask for the accommodation described in there if need arises. Check my payment history - sorry for the extra work I cause from time to time, but hopefully I am not a net loss in your bottom line!

eugeniobruno · June 14, 2019, 9:24pm

Please do call me out if I don’t at least follow through with this small commitment.

eugeniobruno · June 16, 2019, 7:43pm

I decided to include Japanese study in this study time.

If this chart isn’t a great example of consistency and discipline, I don’t know what is.

adamwolf · June 16, 2019, 10:26pm

Nice work!

eugeniobruno · June 17, 2019, 7:41pm

Thanks @adamwolf! As that chart shows 50 days of no study, it was meant sarcastically but appreciated nonetheless. Maybe it’s more apparent with a chart showing the delta:

I have to catch up with reading and then regularly add new words to study. Here is a delta chart of that, which is even worse:

Ideally I’d like the first chart to go around ~0.3%/day, to reach a “satisfactory” vocabulary knowledge in about 12 months.

adamwolf · June 17, 2019, 8:47pm

I meant nice work in the last 12 days or so

eugeniobruno · June 17, 2019, 9:38pm

Thanks
Another couple of hours left in the UTC day to put some work in and start testing whether adding words to study from reading material before doing vocabulary review really increases score-increasing work.

I think this score is really hard to “game” long term, so trying to maximize it should lead to maximizing practical vocabulary knowledge.

As I can change the algorithm and rerun it on old data to regenerate all the charts, I might ask for help from some forum bees on a more direct representation of a level of vocabulary knowledge…

zedmango · June 17, 2019, 10:47pm

Can you count the number of words you didn’t know and had to look up as a representation of your vocab level? Like if you read 1000 words and needed to look up 20 you know 98% of the words?

eugeniobruno · June 17, 2019, 10:52pm

I could, but it would be less practical. I’d have to introduce the notion of looking up a word to double-check versus looking it up because I don’t remember it; or I’d have to do “knowledge checks” or something. I get these charts “for free”, just by studying.

zedmango · June 17, 2019, 11:34pm

What are these charts showing? What program are you using?

eugeniobruno · June 18, 2019, 11:43am

The explanation following is likely to be bad. Please do ask for clarification.

The x axis is always “days ago”. To understand the various y axes:

“japanese vocabulary knowledge score %” explanation:
0% means I know no words. 50% means I know half of the words in a model corpus (mainly composed of light novels) perfectly, or I know all of the words in the model corpus but not very well. 100% means I know all words in that corpus perfectly.

The “daily change in japanese vocabulary knowledge score %” is just the daily delta in the chart above.

“number of new words under study” shows how many words I had never seen before are added each day and start being studied.

The program is a homegrown language learning platform that uses a homegrown spaced repetition system (think Anki, with some differences) and an aided reader with vocabulary lookups directly connected to the SRS.

eugeniobruno · June 18, 2019, 11:59am

TODO:

But that reminds me, I need to make a better corpus. I’m thinking it should be relatively easy and effective to have a folder with the first volume of a number of light novels, and then use a word for the vocabulary proficiency calculation only if it appears in two, maybe three, volumes from different light novels.

This should avoid names and fantasy terms specific to a novel appearing in the calculation.

I should probably also exclude any katakana-only word from the calculation.

zedmango · June 18, 2019, 7:26pm

Why? I thought most words were katakana-only (but I know almost nothing about Japanese).

Your homegrown platform sounds very cool. What are the differences between your spaced repetition system and Anki?

So you calculate the percentage yourself? Is it weighted by frequency or does it consider all words equally?

eugeniobruno · June 18, 2019, 8:10pm

It’s definitely a small portion of words, but since they’re almost always just the pronunciation of an English word, it feels like “cheating” to say I’ve made progress when I can already understand what most katakana words mean without needing to study them. Example: puroguramu → (computer) program

As a guideline: Kanji → what most words are made of. Hiragana → grammatical pieces and some common words of Japanese origin. Katakana → imported words, onomatopoeia and similar.

The easiest way to “patch” around this problem might be to only count words that contain kanji for the purposes of the progress chart.
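A minimal sketch of that kanji-only filter, assuming words are already available as plain strings. The function names and the exact Unicode ranges chosen here are illustrative assumptions, not the platform's actual code.

```python
import re

# Assumption: "kanji" means the basic CJK Unified Ideographs block,
# Extension A, and the iteration mark 々 (U+3005).
KANJI = re.compile(r"[\u4e00-\u9fff\u3400-\u4dbf\u3005]")
# The Katakana block (U+30A0 to U+30FF) includes the long-vowel mark ー.
KATAKANA = re.compile(r"[\u30a0-\u30ff]+")


def contains_kanji(word: str) -> bool:
    """True if at least one character of the word is a kanji."""
    return KANJI.search(word) is not None


def is_katakana_only(word: str) -> bool:
    """True if every character of the word is in the Katakana block."""
    return KATAKANA.fullmatch(word) is not None


words = ["プログラム", "勉強", "食べる", "する", "ドキドキ"]
counted = [w for w in words if contains_kanji(w)]
print(counted)
```

Counting only words that contain kanji also drops hiragana-only words such as する, which is stricter than just excluding katakana-only words.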

It’s a slightly different flavour, but it’s the same idea. The intervals are 5m, 25m, ~2.5h, ~10h, ~day, and then they double for every right word. When you get a word wrong, it sets the interval to 30%.

There is no “easiness” modifier that makes intervals become spaced tighter as you get the word wrong many times.
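The schedule described above can be sketched roughly as follows. This assumes that “30%” means 30% of the current interval and that an interval never drops below the first 5-minute step; both are guesses, and `next_interval` is a hypothetical name.

```python
# Fixed early steps in minutes: 5m, 25m, ~2.5h, ~10h, ~1 day.
LADDER_MIN = [5, 25, 150, 600, 1440]
LAPSE_FACTOR = 0.3  # assumption: a wrong answer keeps 30% of the current interval


def next_interval(current_min: float, correct: bool) -> float:
    """Return the next review interval in minutes (0 means a brand-new word)."""
    if not correct:
        # Assumption: never shrink below the first ladder step.
        return max(LADDER_MIN[0], current_min * LAPSE_FACTOR)
    # Climb the fixed ladder first...
    for step in LADDER_MIN:
        if step > current_min:
            return step
    # ...then double for every further correct answer.
    return current_min * 2


interval = 0
history = []
for _ in range(6):
    interval = next_interval(interval, correct=True)
    history.append(interval)
print(history)
```

There is deliberately no per-word ease factor in this sketch: a word that is missed repeatedly recovers at the same rate as any other word.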

Here’s the biggest difference IMO: the correct answer is shown after x milliseconds (currently 4000, but adjustable and could reasonably go lower). It gets you into a flow in a way that having to decide whether you’ve thought about the answer for enough time just doesn’t. Hundreds of reviews due just seem to disappear without the process being annoying or mentally fatiguing.

Right now it’s weighted by the log of the frequency. The most common words are weighted 3-4 times as much as the least common words.

As of a few minutes ago, the frequency of words is calculated off a little corpus of the first volume of 5 different light novels, and a word is considered and added to the “universe” of words only if it appears in at least two different volumes. This is probably not quite right, but I feel it’s a step in the right direction. It deals with names, fantasy words specific to a single novel and such problems. It brings the total # of words used to calculate the score to about 6000 from about 12000. I feel it would work better if I had a bigger corpus, but we’ll go with this for now…
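A rough sketch of the “at least two volumes” rule, assuming each volume has already been split into word tokens by some tokenizer. The `build_universe` name and the toy data are made up for illustration.

```python
from collections import Counter


def build_universe(volumes: dict[str, list[str]], min_volumes: int = 2) -> Counter:
    """Return corpus-wide counts for words found in at least `min_volumes` volumes."""
    total = Counter()     # occurrences across the whole corpus
    doc_freq = Counter()  # number of distinct volumes containing the word
    for tokens in volumes.values():
        total.update(tokens)
        doc_freq.update(set(tokens))
    return Counter({w: c for w, c in total.items() if doc_freq[w] >= min_volumes})


# Toy corpus: a character name and a novel-specific term each appear in one volume only.
volumes = {
    "novel_a": ["魔法", "学校", "学校", "アリス"],
    "novel_b": ["学校", "魔法", "剣"],
    "novel_c": ["剣", "竜王"],
}
universe = build_universe(volumes)
print(universe)
```

Raising `min_volumes` to 3 makes the filter stricter, at the cost of needing a bigger corpus before common words qualify.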

It looks suspiciously good, I suspect due to:

- I also weight by “knowledge” (log of interval), and that weighting is a bit wonky right now
- the impact of the most common words and word-looking-tokens.
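One way such a score could be put together, as a sketch only: each word is weighted by the log of its corpus frequency, and its “knowledge” is the log of its current interval, capped at 1. The `log(1 + x)` form, the 365-day “mastered” cap and all names are assumptions.

```python
import math

MASTERED_DAYS = 365.0  # assumption: an interval this long counts as "known perfectly"


def freq_weight(count: int) -> float:
    """Log-frequency weight, so common words count more, but not linearly more."""
    return math.log(1 + count)


def knowledge(interval_days: float) -> float:
    """Map the current review interval to [0, 1] using its log."""
    return min(1.0, math.log(1 + interval_days) / math.log(1 + MASTERED_DAYS))


def vocabulary_score(counts: dict[str, int], intervals: dict[str, float]) -> float:
    """Score in percent; words missing from `intervals` count as unknown."""
    total = sum(freq_weight(c) for c in counts.values())
    if total == 0:
        return 0.0
    known = sum(freq_weight(c) * knowledge(intervals.get(w, 0.0))
                for w, c in counts.items())
    return 100.0 * known / total


counts = {"学校": 30, "魔法": 5, "剣": 2}
print(vocabulary_score(counts, {"学校": 30.0, "魔法": 1.0}))
```

With this shape, 0% means no word is under study and 100% means every word has reached the mastered interval. Words in the first phase of learning still contribute a little, which may explain a chart that looks better than expected.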

eugeniobruno · June 18, 2019, 8:33pm

I’ve patched the chart for the time being to weigh words in the first phase of learning a bit less heavily.

I think the best thing that these graphs show is that as long as the criteria I choose to evaluate my knowledge are sensible and I study regularly, I get a monotonically increasing chart that will eventually get to 100%.

I can probably figure out a good “shape” out of daily, consistent study by coding up a Monte Carlo student and seeing how the graphs look under various scoring conditions. The only things that are going to change by changing scoring functions are probably the early and late “shapes” of the chart.
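A minimal sketch of what such a Monte Carlo student might look like. It simplifies heavily: time is counted in whole days, every word starts at a 1-day interval, words are not frequency-weighted, and the student answers correctly with a fixed probability. All names and parameters are assumptions.

```python
import math
import random

MASTERED_DAYS = 365.0  # assumption: interval treated as "known perfectly"
LAPSE_FACTOR = 0.3     # assumption: a wrong answer keeps 30% of the interval


def knowledge(interval_days: float) -> float:
    """Map an interval to [0, 1] using the log of the interval."""
    return min(1.0, math.log(1 + interval_days) / math.log(1 + MASTERED_DAYS))


def simulate(days=120, universe_size=500, new_per_day=10, p_correct=0.9, seed=0):
    """Return the daily score (%) of a simulated student who studies every day."""
    rng = random.Random(seed)
    interval = {}  # word id -> current interval in days
    due = {}       # word id -> day of the next review
    scores = []
    for day in range(days):
        # Review every word that is due today.
        for w in [w for w, d in due.items() if d <= day]:
            factor = 2.0 if rng.random() < p_correct else LAPSE_FACTOR
            interval[w] = max(1.0, interval[w] * factor)
            due[w] = day + math.ceil(interval[w])
        # Start studying a batch of new words until the universe is exhausted.
        first_new = len(interval)
        for w in range(first_new, min(first_new + new_per_day, universe_size)):
            interval[w] = 1.0
            due[w] = day + 1
        known = sum(knowledge(i) for i in interval.values())
        scores.append(100.0 * known / universe_size)
    return scores


scores = simulate()
print(round(scores[0], 2), round(scores[-1], 2))
```

Swapping in a different `knowledge` function and re-running `simulate` gives a quick way to compare chart shapes. In this sketch a day with several lapses can make the score dip slightly, so strict monotonicity only holds when `p_correct` is 1.0.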
