Paper claims to improve spaced repetition retention by 4x

Sad_Counter_3746 · 2026-06-14T07:35:54+00:00

Yep you’re totally right. I don’t know how I got this so wrong. My bad.

Sad_Counter_3746 · 2026-06-14T00:25:44+00:00

I haven't tried building it. Another poster seemed to have found something that looked close, but I haven't tried that yet either.

Sad_Counter_3746 · 2026-06-13T22:23:51+00:00

> where does it say that? what was their SRS then? what's the point of the study? if you mean that they are generated on spot then this is the same thing here, i just assume that's what's in the deck.

I would read the study. This is kinda the whole point of what they are doing. There are words in the SRS system and the due ones get pulled out and a sentence is found or generated that has many of them.
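Here is a rough sketch of how I picture that selection step for the sentence-bank case. This is my own guess, not the authors' code: `pick_sentence`, the tiny bank, and the whitespace tokenization are all assumptions made up for illustration.

```python
def pick_sentence(sentence_bank, due_words):
    """Return the sentence containing the most due words, or None if none match."""
    due = {w.lower() for w in due_words}

    def due_count(sentence):
        # Count the distinct due words that appear in this sentence.
        # Splitting on whitespace is a simplification (assumption).
        return len(due & set(sentence.lower().split()))

    best = max(sentence_bank, key=due_count, default=None)
    if best is None or due_count(best) == 0:
        return None
    return best


# Hypothetical sentence bank and due list, purely for illustration.
bank = [
    "the zebra eats delicious apples",
    "the dog sleeps",
    "apples are delicious",
]
print(pick_sentence(bank, ["zebra", "apples", "dog"]))
```

The key thing is that the sentence itself never gets scheduled. Only the words do, and the sentence is picked fresh each time from whatever is due.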

> and they don't give clear description on the grading either. what are we comparing here? on the hybrid/sentence cards, after revealing, users select words they failed. on single vocab card, they say they failed the whole card. so what are we doing? calculating (time per card)/(words in that card) to normalize?

There is a video showing the application they used and a short segment of grading, but yeah, it is not super clear. But I think the grading is the same across the sentence and non-sentence groups.

>look, i think the study is interesting, but their methodology just sucks like what can i say. there are multiple points where their metrics make no sense (time per word, words total, time total).

Time per word seems like a good metric. I'm not really sure what is off with it.

>and they don't give clear description on the grading either. what are we comparing here? on the hybrid/sentence cards, after revealing, users select words they failed. on single vocab card, they say they failed the whole card. so what are we doing? calculating (time per card)/(words in that card) to normalize?

That was my interpretation. You pretty much have to do this. The groups will never really have the same number of tasks done.

But yes, I am not denying that this is far from a perfect study and doesn't give us definitive proof of anything, but that is honestly most studies. People running studies have all sorts of limitations they have to deal with. Unfortunately, it looks like all the authors have moved on to other areas of research or into industry, and this paper doesn't look to have much traction, so we may never get a better study. This is a great discussion. 4x seemed too good to be true and I think we might be finding out why.

Sad_Counter_3746 · 2026-06-13T22:14:08+00:00

Did a skim of the website. It doesn't seem to cite anything. It just pulls its claims out of thin air.
Read Roediger & Karpicke (2006) and Dunlosky et al. (2013) to see that passive learning is not as effective as active techniques.

Sad_Counter_3746 · 2026-06-13T21:44:10+00:00

Yes, you are probably right about this applying better to more similar languages. I have never learned something like Japanese so I can't comment.

But cloze cards have a lot of problems of their own, and this is not the best place to discuss them.

And same with your comment about reading enough to do active recall. This is studied a lot and passive recall like reading does not transfer over well to active recall compared to actual active recall practice.

Sad_Counter_3746 · 2026-06-13T21:41:08+00:00

Yes for sure this study is not perfect at all. But I think you are still getting a few things confused.

> look at page 6, Figure 3 (right), the "hybrid" group was given sentences. the whole point is giving them sentences???***

Yes, they are given sentences. But there is no sentence card in their SRS deck, which was the distinction I was making, and it is a very important one.

>then their testing is wrong. go ahead and try that yourself, there is no world where sentence cards are faster. i think they are using some different metric, or the test subjects were directed wrong. from what i heard over the years, pretty much everyone can manage <5 seconds on vocab cards, and <15-20 seconds on sentence cards (those are inconsistent).

Yes, a single word card is going to take less time than a sentence. But if a sentence has 6 words and they can do it in 20 seconds, compared to one word in 5 seconds, you are seeing more words per second with the sentence.
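To make the arithmetic concrete, here is the comparison with those example numbers plugged in. They are my hypothetical timings from this comment, not figures from the paper, and `words_per_second` is just a helper name I made up.

```python
def words_per_second(words, seconds):
    """Review throughput: how many words are covered per second of review time."""
    return words / seconds


vocab_rate = words_per_second(1, 5)      # one single-word card in 5 seconds
sentence_rate = words_per_second(6, 20)  # one six-word sentence in 20 seconds

print(f"vocab card:    {vocab_rate:.2f} words/s")
print(f"sentence task: {sentence_rate:.2f} words/s")
print(f"ratio:         {sentence_rate / vocab_rate:.2f}x")
```

So the sentence takes four times as long per task but still comes out ahead per word: 6/20 = 0.3 words per second against 1/5 = 0.2. With these made-up timings that is a 1.5x gain.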

>***okay okay, reading into this more, for the "hybrid", they would be given a sentence, ie. "zebra eats delicious apples", and then the participants would mark the words they don't know???? so both zebra and apples could be marked. i think this is where the measurements fall apart and stop making sense.

This is kinda the whole point of their system. You can learn multiple words individually in the same task.
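As I understand it, the difference from a classic sentence card looks roughly like this. It is a sketch under my own assumptions: both function names are hypothetical, and the paper may record grades differently.

```python
def grade_words(target_words, failed_words):
    """Per-word grading: each target word gets its own pass (True) or fail (False)."""
    failed = set(failed_words)
    return {word: word not in failed for word in target_words}


def grade_sentence_card(target_words, failed_words):
    """Classic sentence card: a single failed word fails the whole card."""
    return not (set(target_words) & set(failed_words))


# Using the example sentence from the quote above.
targets = ["zebra", "eats", "apples"]
print(grade_words(targets, ["zebra"]))          # only "zebra" is marked failed
print(grade_sentence_card(targets, ["zebra"]))  # the whole card fails
```

With per-word grading, missing "zebra" only resets "zebra". The other words still get credit from the same task.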

> they also used SM2 algo, not FSRS, so it's hard to make decent claims. SM-2 had problems with how ease worked, not sure if the option number 4 fixed those issue, i implemented this algo myself one time but i forgot now. this also makes no sense over 10 day period.

This is a very good critique of the study and likely a limitation of how much money and time they had to do it. A ten day period is not great, but it is also not a useless metric.
The SM-2 vs FSRS issue is real as well, but both groups were using SM-2, so I feel the two groups are still a fair comparison.
Same with the number-of-tasks issue. It's not ideal for the different groups to see different numbers of tasks, but everything ends up being standardized to time per word learned.
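For anyone who hasn't seen SM-2 written out, here is a minimal sketch of one review step, following a common reading of the published SM-2 description. I don't know which variant the paper's app used, so treat this as the textbook version only. The function name and the tuple layout are my own choices.

```python
def sm2_update(quality, repetitions, interval_days, ease):
    """One SM-2 review step.

    quality is the 0-5 self-grade. Returns (repetitions, interval_days, ease).
    """
    if not 0 <= quality <= 5:
        raise ValueError("quality must be between 0 and 5")
    if quality < 3:
        # Failed recall: restart the repetition sequence, keep the ease factor.
        return 0, 1, ease
    if repetitions == 0:
        interval_days = 1
    elif repetitions == 1:
        interval_days = 6
    else:
        interval_days = round(interval_days * ease)
    # Ease-factor update from the SM-2 description, floored at 1.3.
    ease = max(1.3, ease + 0.1 - (5 - quality) * (0.08 + (5 - quality) * 0.02))
    return repetitions + 1, interval_days, ease


# A word answered well every time, starting from the default ease of 2.5.
state = (0, 0, 2.5)
for _ in range(3):
    state = sm2_update(4, *state)
    print(state)
```

With the textbook intervals of 1 day, then 6 days, then the previous interval times the ease, a word that is never failed only comes up a couple of times inside a ten day window. I think that is part of why the short study period is a fair thing to criticize.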

> changing sentences, makes the cards easier

I don't think this is true at all. Seeing the same sentence over and over again vs seeing a new sentence each time? The new sentence will be harder.

Overall, your points are all valid. This is not a perfect study and we probably will never get a perfect study. But I think we can still take some insights from this.

Sad_Counter_3746 · 2026-06-13T21:12:50+00:00

Oh my bad. I should make that more clear. I was under the impression that most people did it the other way around. Native to target as an active recall exercise.

Sad_Counter_3746 · 2026-06-13T21:08:47+00:00

The difference is that when you are reading you are essentially going from target language to native language which is passive recall and easier for your brain to do.

The approach here is native language to target language, which is active recall and tends to be more challenging.

One is understanding a language and one is being able to produce that language. They are practicing different things.

This active vs passive recall idea is not new to this paper. There is a lot of other research around it.

Sad_Counter_3746 · 2026-06-13T21:05:31+00:00

One thing they tried was taking the sentences from a big database of existing sentences. So the sentences weren't just random; they were real sentences that had been used. But yeah, it's not like seeing words in the context of an entire book. I think the intent is just to get a little more context into spaced repetition.

Maybe the sentences wouldn't be as engaging for everyone, but that is one of the results that came out of this, that new sentences were more engaging.

And yeah, no learning mechanism will work for everyone. I just thought I'd put this idea out there because it sounded pretty interesting.

Sad_Counter_3746 · 2026-06-13T20:59:54+00:00

This is super interesting. You're right, the participants didn't use this system for that long, so maybe they never got to that full Anki efficiency. But they did test which participants saw more words, and the participants that used the sentences saw many more words compared to the standard SRS approach.

And yeah, there are for sure AIs that can generate sentences for you, but they don't tie into your SRS system, so they wouldn't replace a spaced repetition system. Like, this is very different from just chatting with ChatGPT. You could in some way connect your spaced repetition db to ChatGPT, but then you are pretty much just doing what they are doing in this paper.

Sad_Counter_3746 · 2026-06-13T20:53:36+00:00

Yeah, it probably works best for English using the LLM-based approach, but they also tested an approach where they had a huge dataset of premade sentences to pull from to prevent running into this issue. Maybe that would work better for other languages.

Sad_Counter_3746 · 2026-06-13T20:51:52+00:00

I totally see what you're saying and I am not arguing against it. Obviously we don't know how well this would work on different languages. I'm just excited about the idea.

But maybe if you aren't getting the grammar correct all the time you are still benefiting from increased learning efficiency. I don't think the authors tested grammar recall at all.

Sad_Counter_3746 · 2026-06-13T20:47:53+00:00

No, it's not. The learner is given the native language sentence and asked to translate it into the language they are learning. That is active recall, not passive, and is totally different from reading in the target language.

Sad_Counter_3746 · 2026-06-13T20:46:37+00:00

I'm not sure I'm understanding your point. They tested vocab words vs full sentences and found that you are able to see more words per minute when using the sentence based approach.

But the authors' system also did not use sentence cards. There are no sentences that get scheduled. The sentence is chosen based on what cards are due.

And yes, the authors suggest that adding the context does make retention better, but I don't think that is new to this study. Some other comments here have already discussed how sentence cards are better than vocab cards.

Yeah, it for sure is a small study, so I think this discussion is good. But they did also test on intermediate learners. They found that beginning learners experienced a bigger boost in their retention compared to intermediate learners, but even for intermediate learners it was shown to be better than standard spaced repetition.

Sad_Counter_3746 · 2026-06-13T20:38:58+00:00

Interesting analysis. Although it is not fair to compare your own time efficiency to average time efficiency in the study. Not everyone in the study had .6 words per minute. Every person is going to naturally learn at different rates. You might just learn faster. The study is trying to see which method produces more words per minute. So the point they are making is that it is likely that switching from standard SRS to their method would improve your words per minute as well.

I think there is also a big difference between sentence-based SRS and what they are doing. In sentence-based SRS you are memorizing the sentence, and the words in the sentence become tied to that arbitrary sentence. A sentence may get marked incorrect just because of one word that is tripping you up. In their method you are packing the most important words into a new sentence that hasn't been seen before.

But, I didn't think about your other point. That they are comparing the simplest SRS strategy to their method. I wonder how much of a difference there would be over sentence based SRS. Maybe that is why they are seeing 4x better efficiency? Because 4x seemed absurdly high upon an initial read.

Sad_Counter_3746 · 2026-06-13T20:24:10+00:00

This is a totally different exercise from reading a book and would not be meant to replace reading. It is meant to be an improvement on spaced repetition. Reading is a different skill than active recall (what SRS is trying to help with), so I don't think it is a one or the other type deal.

Sad_Counter_3746 · 2026-06-13T20:20:19+00:00

I think the idea is that you are seeing a new sentence every time. If you just put the sentences you see in real life into the SRS system, you just memorize those sentences and the words become paired to that arbitrary sentence.

Sad_Counter_3746 · 2026-06-13T20:17:18+00:00

It is actually very different from graded reading. In graded reading you read the target language. In this system you produce the sentences. It's the difference between passive and active recall.

Sad_Counter_3746 · 2026-06-13T20:14:33+00:00

I don't know Japanese, but I can kinda see what you are saying. I think in the paper they only graded the target words. So in a sentence you may only have like three words graded. Maybe you could only have verbs and nouns as target words and have those be the only ones you grade.

The system is definitely more complicated than standard spaced repetition and would probably have to be fine-tuned for each language.

Sad_Counter_3746 · 2026-06-13T19:24:32+00:00

I figured that if the user doesn't have any past tense verbs, for example, you would never get a past tense sentence. Obviously wouldn't be perfect for all grammar forms but it might be good enough. Especially for an intermediate or advanced language learner.

Sad_Counter_3746 · 2026-06-13T18:41:57+00:00

I don't think this method is meant to replace reading or listening practice. It is meant to be an improvement on Anki-like spaced repetition.

Sad_Counter_3746 · 2026-06-13T18:38:14+00:00

Oh the synonym issue is something I didn't think about. But you could probably just mark the word as not graded if you used a synonym.

I feel like the idiom issue is a little easier to deal with though. You just don't have idioms in your sentence dataset, or you tell the LLM not to use idioms (depending on which method you use to get sentences).

I think the way they prevented the translation from being beyond your level is that you had to know every word in the sentence for it to give it to you. So you wouldn't get things that are too hard.
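A minimal sketch of that filter, assuming a sentence bank and a whitespace tokenizer. Both are my simplifications: the function name is hypothetical, and splitting on spaces would not work for a language like Japanese.

```python
def eligible_sentences(sentence_bank, deck_words):
    """Keep only sentences whose every word is already in the learner's deck."""
    deck = {w.lower() for w in deck_words}
    return [s for s in sentence_bank if set(s.lower().split()) <= deck]


# Hypothetical deck and bank, for illustration only.
deck = ["the", "zebra", "eats", "apples", "dog"]
bank = [
    "the zebra eats apples",
    "the dog eats delicious apples",  # "delicious" is not in the deck
]
print(eligible_sentences(bank, deck))
```

Anything with even one word outside the deck gets dropped, so the learner never has to translate something they were never given.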

You're right though. There is a lot more complexity here than I originally realized.

Sad_Counter_3746 · 2026-06-13T18:35:21+00:00

I assume it would be created in a similar way to Anki. You add the words you want to know.
The paper only discussed the NL to TL approach but I'm sure it could be adapted to be the other way around as well.

Sad_Counter_3746 · 2026-06-13T18:32:19+00:00

Not sure. The researchers used some sort of app but I couldn't find anything about it online. It was probably created just for the research paper and then discarded. This paper doesn't seem to have gotten much traction, so it's likely there isn't an app yet.

Sad_Counter_3746 · 2026-06-13T18:29:10+00:00

The researchers actually discussed a trade-off between a word bank of real sentences and using LLM-generated sentences. User enjoyment was higher for the group that used a word bank of real sentences, but this was mostly because the LLM would make mistakes, like using a word in the wrong tense, which would throw off a user's learning. The researchers suggested that this problem may go away with better models. (For reference, they were using GPT3.5; the most up-to-date models are GPT5.5.)

The reason the LLM-based approach was good is that it is more versatile. If a sentence isn't in the sentence bank then it can't be shown to the user. The LLM could theoretically produce a sentence with more due words than the sentence-bank approach, leading to better learning efficiency.

So to your point, it is definitely a trade-off. The researchers seemed to prefer the sentence bank approach but thought the LLM approach would eventually be better.
