Eugenio's beeminder journal

Let’s start easy: if I’m not spending at least half an hour a day learning useful knowledge, in my particular situation, I’m a freaking idiot. I’ll clean things up in the goal tomorrow morning and put a public chart in here.

We’ll have to bring that half hour way up eventually, but if I don’t do at least that much, in my situation, it’s simply idiotic of me.

Edit 1: I may need to fix the road in the editor, but it looks reasonable for now: https://www.beeminder.com/eugeniobruno/check_and_do_todoist_todos

Edit 2: many of the last pre-restart datapoints are weaseled in, chiefly because of the fine print readable by worker bees. I won’t input bs data this time around, but I might ask for the accommodation described in there if the need arises. Check my payment history - sorry for the extra work I cause from time to time, but hopefully I am not a net loss on your bottom line!

Please do call me out if I don’t at least follow through with this small commitment.

I decided to include Japanese study in this study time.

[chart: Japanese vocabulary knowledge score %]

If this chart isn’t a great example of consistency and discipline, I don’t know what is. :stuck_out_tongue:


Nice work!


Thanks @adamwolf! Since that chart shows 50 days of no study, my comment was meant sarcastically :stuck_out_tongue: but it’s appreciated nonetheless. Maybe it’s more apparent with a chart showing the delta:

[chart: daily change in Japanese vocabulary knowledge score %]

I have to catch up on reading and then regularly add new words to study. Again, here’s a delta chart of that, which is even worse:

[chart: number of new words under study per day]

Ideally I’d like the first chart to climb at around 0.3%/day, to reach “satisfactory” vocabulary knowledge in about 12 months.

I meant nice work in the last 12 days or so :slight_smile:


Thanks :slight_smile:
Another couple of hours left in the UTC day to put some work in and start testing whether adding words to study from reading material before doing vocabulary review really does increase score-increasing work.

I think this score is really hard to “game” long term, so trying to maximize it should lead to maximizing practical vocabulary knowledge.

Since I can change the algorithm and rerun it on old data to regenerate all the charts, I might ask some forum bees for help with a more direct representation of vocabulary knowledge level…

Can you count the number of words you didn’t know and had to look up as a representation of your vocab level? Like, if you read 1000 words and needed to look up 20, you’d know 98% of the words?

I could, but it would be less practical. I’d have to distinguish looking up a word to double-check from looking it up because I don’t remember it; or I’d have to do “knowledge checks” or something. I get these charts “for free”, just by studying.

What are these charts showing? What program are you using?

The following explanation is likely to be bad. Please do ask for clarification.

The x axis is always “days ago”. To understand the various y axes:

“Japanese vocabulary knowledge score %” explanation:
0% means I know no words. 50% means I know half of the words in a model corpus (mainly made up of light novels) perfectly, or all of the words in the corpus but not very well. 100% means I know all words in that corpus perfectly.

The “daily change in Japanese vocabulary knowledge score %” is just the daily delta of the chart above.

“Number of new words under study” shows how many previously unseen words I add and start studying each day.

A homegrown language learning platform that uses a homegrown spaced repetition system (think Anki, with some differences) and an aided reader with vocabulary lookups directly connected to the SRS.

TODO:

But that reminds me, I need to make a better corpus. I’m thinking it should be relatively easy and effective to have a folder with the first volume of a number of light novels, and then use a word for the vocabulary proficiency calculation only if it appears in two, maybe three, volumes from different light novels.

This should avoid names and fantasy terms specific to a novel appearing in the calculation.

I should probably also exclude any katakana-only word from the calculation.
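
For concreteness, a minimal sketch of what that corpus filter could look like, assuming one plain-text file per volume in a single folder; the tokenizer here is a placeholder (real Japanese text needs a morphological analyzer such as MeCab), and all the names are made up for illustration:

```python
import re
from collections import defaultdict
from pathlib import Path

# Tokens made up entirely of katakana (the long-vowel mark ー is in this block).
KATAKANA_ONLY = re.compile(r"^[\u30A0-\u30FF]+$")

def tokenize(text):
    # Placeholder: Japanese has no spaces, so a real implementation would use
    # a morphological analyzer (e.g. MeCab). Whitespace splitting only works
    # for pre-segmented text.
    return text.split()

def build_word_universe(corpus_dir, min_volumes=2):
    """Keep a word only if it appears in at least `min_volumes` different
    volumes, and drop katakana-only tokens."""
    volumes_containing = defaultdict(set)
    for volume_path in Path(corpus_dir).glob("*.txt"):
        text = volume_path.read_text(encoding="utf-8")
        for word in tokenize(text):
            volumes_containing[word].add(volume_path.name)
    return {
        word
        for word, volumes in volumes_containing.items()
        if len(volumes) >= min_volumes and not KATAKANA_ONLY.match(word)
    }
```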

Why? I thought most words were katakana-only (but I know almost nothing about Japanese).

Your homegrown platform sounds very cool. What are the differences between your spaced repetition system and anki?

So you calculate the percentage yourself? Is it weighted by frequency or does it consider all words equally?

It’s definitely a small portion of words, but since they’re almost always just the pronunciation of an English word, it feels like “cheating” to say I’ve made progress when I can already understand what most katakana words mean without needing to study them. Examples: puroguramu → (computer) program

As a guideline: Kanji → what most words are made of. Hiragana → grammatical pieces and some common words of Japanese origin. Katakana → imported words, onomatopoeia and similar.

The easiest way to “patch” around this problem might be to only count words that contain kanji for the purposes of the progress chart.

It’s a slightly different flavour, but it’s the same idea. The intervals are 5m, 25m, ~2.5h, ~10h, ~1 day, and then they double after every correct answer. When you get a word wrong, the interval is set to 30% of its current value.

There is no “easiness” modifier that makes intervals get spaced more and more tightly as you get a word wrong many times.
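
A minimal sketch of that interval rule, assuming “set to 30%” means 30% of the current interval (the constants are the ones mentioned above, but the function name is just illustrative):

```python
from datetime import timedelta

# Early review ladder before intervals start doubling.
EARLY_INTERVALS = [
    timedelta(minutes=5),
    timedelta(minutes=25),
    timedelta(hours=2.5),
    timedelta(hours=10),
    timedelta(days=1),
]

def next_interval(current, correct):
    """Correct answers climb the early ladder, then double the interval;
    a wrong answer shrinks it to 30% of its current value.
    Deliberately no Anki-style ease factor."""
    if not correct:
        return current * 0.3
    for step in EARLY_INTERVALS:
        if current < step:
            return step
    return current * 2
```

So a word sitting at 4 days goes to 8 days after a correct answer, or back to roughly 1.2 days after a miss.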

Here’s the biggest difference IMO: the correct answer is shown after x milliseconds (currently 4000, but adjustable and could reasonably go lower). It gets you into a flow in a way that having to decide whether you’ve thought about the answer for enough time just doesn’t. Hundreds of reviews due just seem to disappear without the process being annoying or mentally fatiguing.

Right now it’s weighted by the log of the frequency. The most common words are weighted 3-4 times as much as the least common words.

As of a few minutes ago, word frequencies are calculated from a little corpus of the first volumes of 5 different light novels, and a word is considered and added to the “universe” of words only if it appears in at least two different volumes. This is probably not quite right, but I feel it’s a step in the right direction. It deals with names, fantasy words specific to a single novel, and similar problems. It brings the total number of words used to calculate the score down to about 6000 from about 12000. I feel it would work better with a bigger corpus, but we’ll go with this for now…
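
Putting the pieces together, here's a sketch of how such a score could be computed. The exact form of the log weighting (1 + log of the corpus count) is an assumption that happens to give the most common words roughly 3-4x the weight of the rarest ones, not necessarily the real formula:

```python
import math

def vocabulary_score(word_counts, knowledge):
    """Weighted vocabulary knowledge score, in percent.

    word_counts: {word: corpus frequency} for the ~6000-word universe.
    knowledge:   {word: value in [0, 1]}, 0 = never studied, 1 = known perfectly.
    """
    total = achieved = 0.0
    for word, count in word_counts.items():
        weight = 1 + math.log(count)  # assumed shape of the log weighting
        total += weight
        achieved += weight * knowledge.get(word, 0.0)
    return 100 * achieved / total
```

With per-word knowledge values in [0, 1], this also matches the 0% / 50% / 100% reading described earlier.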

It looks suspiciously good, which I suspect is due to:

  1. the fact that I also weight by “knowledge” (log of the interval), and that weighting is a bit wonky right now
  2. the impact of the most common words and word-looking tokens.

[chart: vocabulary knowledge score with the new corpus]

I’ve patched the chart for the time being to weight words in the first phase of learning a bit less heavily.

[chart: vocabulary knowledge score after the weighting patch]
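
In case it helps make the “weight by knowledge” idea and the patch concrete, here is one way the per-word knowledge value could be derived from the SRS interval; the 90-day maturity threshold and the 0.5 early-phase discount are invented numbers, not the real ones:

```python
import math

MATURE_DAYS = 90  # interval at which a word counts as fully known (assumed)

def knowledge_from_interval(interval_days):
    """Map an SRS interval (in days) to a [0, 1] knowledge value."""
    if interval_days <= 0:
        return 0.0
    k = min(math.log1p(interval_days) / math.log1p(MATURE_DAYS), 1.0)
    # Patch: words still in the first phase of learning (interval < 1 day)
    # count for less, so a burst of brand-new cards doesn't inflate the score.
    if interval_days < 1:
        k *= 0.5
    return k
```

Derived this way, a pile of brand-new cards barely moves the score until their intervals start growing.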

I think the best thing that these graphs show is that as long as the criteria I choose to evaluate my knowledge are sensible and I study regularly, I get a monotonically increasing chart that will eventually get to 100%.

I can probably figure out what a good “shape” looks like for daily, consistent study by coding up a Monte Carlo student and seeing how the graphs look under various scoring conditions. The only things that are going to change by changing the scoring function are probably the early and late “shapes” of the chart.
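
A toy version of that Monte Carlo student, reusing the interval rules sketched earlier; the daily word count, recall probability, and 90-day maturity threshold are all made-up parameters, and the wrong-answer interval is floored at 5 minutes just to keep the simulation finite:

```python
import math
import random

NEW_PER_DAY = 20    # new words added per day (assumed)
P_RECALL = 0.85     # probability of answering a due review correctly (assumed)
DAYS = 365
EARLY = [5 / 1440, 25 / 1440, 2.5 / 24, 10 / 24, 1.0]  # review ladder, in days

def next_interval(current, correct):
    if not correct:
        return max(current * 0.3, EARLY[0])  # floor keeps the toy model finite
    for step in EARLY:
        if current < step:
            return step
    return current * 2

def simulate(seed=0):
    random.seed(seed)
    cards = []                      # [interval_days, due_day] per word
    scores = []
    universe = NEW_PER_DAY * DAYS   # fixed universe of words to learn
    for day in range(DAYS):
        for _ in range(NEW_PER_DAY):
            cards.append([0.0, float(day)])
        for card in cards:
            while card[1] <= day:   # review everything due today
                correct = random.random() < P_RECALL
                card[0] = next_interval(card[0], correct)
                card[1] += card[0]
        known = sum(min(math.log1p(c[0]) / math.log1p(90), 1.0) for c in cards)
        scores.append(100 * known / universe)
    return scores

if __name__ == "__main__":
    s = simulate()
    print(f"day 30: {s[29]:.1f}%  day 180: {s[179]:.1f}%  day 365: {s[364]:.1f}%")
```

Running `simulate()` with a few different scoring functions and plotting the results should show whether it really is only the early and late shapes of the curve that change.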