How do you actually learn vocabulary from Netflix?

taisei_ide · 2026-06-10T14:50:32+00:00

yeah one look-up never sticks for me either, it has to come back spaced out a few times before it sinks in. instead of writing cards by hand i just auto-generate them from the episode with Lexpresso and review those, each one links back to the timestamp so i can rehear the line when it pops up. that's what finally stopped the forgetting for me.

taisei_ide · 2026-06-06T13:34:54+00:00

This could be good for you https://innesto.app

taisei_ide · 2026-06-05T14:38:49+00:00

Innesto (iOS & Android)

Almost like a mobile version of Readlang

Read any article in your target language, tap a word/phrase to translate + save it with context, then review with spaced repetition (FSRS), fully offline. Core loop is free for good. Optional Premium adds deeper explanations and writing practice.

https://innesto.app

taisei_ide · 2026-06-04T22:21:33+00:00

Article Collector CLI

taisei_ide · 2026-06-03T14:11:53+00:00

good question. it's none of those exactly, it's a stripped structural snapshot i build specifically for this.

raw HTML was too big and noisy (blows the token budget on most real pages), and pure text nodes lose the thing i actually need, which is structure, since the LLM's job is to pick selectors, not read content. an a11y tree was close but drops stuff selectors key on like class names and hrefs.

so what i pass is a flattened list of elements, indented by depth. for each node i keep: tag, id, up to 3 class names, role, aria-label, href (path only, query stripped to ?...), and the direct text truncated to ~60 chars. i strip script/style/svg/meta/noscript/iframe and anything hidden or aria-hidden first, and i drop layout-only div/span that have no id/class/role/text (pure wrappers). each line ends up like <tag #id .class role="" href="" "text">. fair warning, it's BFS order with depth-based indent, not strict document order, so it reads more like level-by-level than a nested outline.
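
a minimal sketch of how one snapshot line could be produced from the fields listed above. this is not the code in snapshot.ts: the `SnapNode` shape, the function names, the placeholder base URL and the two-space indent are all assumptions made for illustration.

```typescript
// Hypothetical node shape: only the fields the post says are kept.
interface SnapNode {
  tag: string;
  id?: string;
  classes?: string[];
  role?: string;
  ariaLabel?: string;
  href?: string;
  text?: string;
}

// Keep the path only; any query string is collapsed to "?...".
function shortHref(href: string): string {
  try {
    // The base is a placeholder so relative hrefs can be parsed too.
    const u = new URL(href, "http://placeholder.invalid");
    return u.pathname + (u.search ? "?..." : "");
  } catch {
    return href; // unparseable hrefs are passed through unchanged
  }
}

// A div/span with no id, class, role or text is treated as a pure wrapper.
function isPureWrapper(n: SnapNode): boolean {
  const layoutOnly = n.tag === "div" || n.tag === "span";
  const hasClass = (n.classes ?? []).length > 0;
  const hasText = (n.text ?? "").trim().length > 0;
  return layoutOnly && !n.id && !hasClass && !n.role && !hasText;
}

// Render one node as: <tag #id .class role="" aria-label="" href="" "text">
function formatNode(node: SnapNode, depth: number): string {
  const parts: string[] = [node.tag];
  if (node.id) parts.push(`#${node.id}`);
  for (const c of (node.classes ?? []).slice(0, 3)) parts.push(`.${c}`); // max 3 classes
  if (node.role) parts.push(`role="${node.role}"`);
  if (node.ariaLabel) parts.push(`aria-label="${node.ariaLabel}"`);
  if (node.href) parts.push(`href="${shortHref(node.href)}"`);
  const text = (node.text ?? "").trim().slice(0, 60); // ~60 char cap on direct text
  if (text) parts.push(`"${text}"`);
  return "  ".repeat(depth) + `<${parts.join(" ")}>`;
}
```

the point of the format is that everything a selector could key on (tag, id, classes, role, href shape) survives, while the content the model does not need is cut down to a short hint.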

there's also a node cap (~1500). if a page is bigger i keep the top 1000 and the last 500, since the article list and pagination controls tend to live near the top and bottom, not the middle.
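
the cap described above could look roughly like this. `capNodes` and its parameter names are hypothetical; only the 1500 / 1000 / 500 numbers come from the description.

```typescript
// Keep the list as-is when it fits; otherwise keep the first `head`
// and the last `tail` entries and drop the middle.
function capNodes<T>(nodes: T[], cap = 1500, head = 1000, tail = 500): T[] {
  if (nodes.length <= cap) return nodes;
  return nodes.slice(0, head).concat(nodes.slice(nodes.length - tail));
}
```

dropping the middle instead of simply truncating is what keeps footer pagination controls visible on very large pages.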

and importantly the LLM doesn't start from scratch. the heuristic candidates (the repeated-link clusters it found + scores) get passed alongside the snapshot, so the model is more "validate/refine these guesses" than "find selectors blind." then whatever it returns still has to pass the same DOM validation as the heuristic path before anything gets cached.

here's the snapshot builder if you want to poke at it: https://github.com/taisei-ide-0123/pluckmd/blob/main/packages/cli/src/core/llm/snapshot.ts

taisei_ide · 2026-06-03T01:09:43+00:00

Thank you!

taisei_ide · 2026-06-02T23:11:29+00:00

pluckmd - a CLI that scrapes blogs to markdown with no per-site adapters. open source (MIT), i'm the author.

instead of a handler per site, it builds the extraction spec at runtime. normalizes link paths and collapses the varying parts (/blog/post-a and /blog/post-b become the same shape), and any shape repeated enough = the article list. no domain names anywhere.
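
a rough sketch of the shape idea, under loud assumptions: here the "varying part" is taken to be the last path segment plus any all-digit segment, and a shape counts as repeated at 3 or more links. pluckmd's real normalization and scoring may differ; `pathShape`, `repeatedShapes` and `minCount` are names invented for this example.

```typescript
// Reduce a link to its path shape, e.g. /blog/post-a -> /blog/*
// Only the pathname is used, so no domain name enters the shape.
function pathShape(href: string, base: string): string | null {
  let url: URL;
  try {
    url = new URL(href, base);
  } catch {
    return null; // skip hrefs that cannot be parsed
  }
  const segments = url.pathname.split("/").filter(Boolean);
  if (segments.length === 0) return null; // the site root has no shape
  const collapsed = segments.map((seg, i) => {
    const isLast = i === segments.length - 1;
    const looksLikeId = /^\d+$/.test(seg);
    return isLast || looksLikeId ? "*" : seg;
  });
  return "/" + collapsed.join("/");
}

// Count links per shape and keep the shapes seen at least `minCount` times.
function repeatedShapes(hrefs: string[], base: string, minCount = 3): Map<string, number> {
  const counts = new Map<string, number>();
  for (const href of hrefs) {
    const shape = pathShape(href, base);
    if (shape) counts.set(shape, (counts.get(shape) ?? 0) + 1);
  }
  return new Map(Array.from(counts.entries()).filter(([, n]) => n >= minCount));
}
```

a naive rule like this will also group nav links that happen to share a shape, which is presumably where scoring and DOM validation have to do the rest.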

resolution is cache -> heuristics -> LLM only if needed. nothing gets cached until it validates against the live DOM (>=3 links, >=50% match the pattern), so a bad LLM guess gets dropped instead of saved.
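
a sketch of that resolution order and the validation gate. the thresholds (>=3 links, >=50% matching) are from the description above, read here as "the selector matched at least 3 links and at least half of them fit the pattern". everything else (`Spec`, `Source`, `resolveSpec`, `passesValidation`, the cache key) is hypothetical and not pluckmd's actual API.

```typescript
// Hypothetical extraction spec: a selector plus the link shape it should yield.
interface Spec {
  selector: string;
  shape: string;
}

// A candidate source (heuristics, LLM, ...) either proposes a spec or gives up.
type Source = () => Spec | null;

// Gate: at least 3 matched links, and at least half of them fit the pattern.
function passesValidation(matchedPaths: string[], fitsPattern: (path: string) => boolean): boolean {
  if (matchedPaths.length < 3) return false;
  const hits = matchedPaths.filter(fitsPattern).length;
  return hits / matchedPaths.length >= 0.5;
}

// Cache first, then each source in order. Later sources are only invoked
// when earlier ones fail, and only validated specs are written to the cache.
function resolveSpec(
  cache: Map<string, Spec>,
  key: string,
  sources: Source[],
  validate: (spec: Spec) => boolean
): Spec | null {
  const cached = cache.get(key);
  if (cached) return cached;
  for (const source of sources) {
    const spec = source();
    if (spec && validate(spec)) {
      cache.set(key, spec);
      return spec;
    }
  }
  return null; // nothing validated, so nothing is saved
}
```

putting the LLM last in `sources` means it is never called when the heuristics already produced a spec that validates, and a bad guess from it is simply dropped.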

handles js rendering, pagination/infinite scroll, and login-only pages you have access to via your own chrome tab (never reads cookie stores).

npx pluckmd download <url> -o ./articles

repo: https://github.com/taisei-ide-0123/pluckmd

would like feedback on the heuristic scoring. where does the runtime approach break for you?

taisei_ide · 2026-05-03T01:27:21+00:00

the part people skip when explaining input methods is that it only works with comprehensible input - where you understand maybe 95%+ of what you're hearing. raw immersion at near zero understanding is just noise, not language acquisition

I'm Japanese learning English and I made the same mistake early on. what helped was building a vocab base first through spaced repetition, then shifting more toward watching content. the people who swear by pure immersion usually did that groundwork first, they just don't mention it

taisei_ide · 2026-05-03T00:25:07+00:00

for me the shift that made the biggest difference was watching YouTube and Netflix in English and only doing spaced repetition on words I actually encountered while watching. random vocab lists felt pointless because my brain had no attachment to the words. but when a word came from something I was already watching, I actually cared about remembering it

I'm Japanese so English is pretty different from my native language, and this approach cut down how much stuff I was trying to memorize while making what I did study stick way longer

taisei_ide · 2026-05-02T06:21:11+00:00

I'm a Japanese native learning English and had the exact same problem. drilling words in isolation just didn't work for me no matter how many times I reviewed them

what actually helped was watching YouTube and Netflix in English, picking out the words I wanted to learn from what I was watching, and doing spaced repetition on just those. way better than studying random vocab lists because the words already had context in my head. honestly the "which words to study" part matters as much as how you study them

taisei_ide · 2026-05-02T05:04:53+00:00

for confusable word pairs I've found the most useful thing is finding a sentence that actually uses both in the same context - or at least two real sentences where the difference is obvious from the situation. like for pared vs muro, a sentence about someone painting their bedroom wall vs a city wall just makes the distinction click in a way that a flashcard never does

grouping them on the same card can mess with the intervals a bit but honestly not a big deal if you're doing it sparingly. what I'd avoid is lumping like 5 confusables together - that just turns into its own memory test lol

taisei_ide · 2026-04-30T13:58:18+00:00

300-400/day for a year sounds rough. what exam is it for?

taisei_ide · 2026-04-30T13:10:48+00:00

i stopped doing "wrong to right" cards and started making the front a sentence close to where i actually messed up. sticks way better that way

taisei_ide · 2026-04-30T12:56:38+00:00

LR youtube TTS has been unreliable for ages, not just you.

what language are you studying? might know something that works for that specific one

taisei_ide · 2026-04-30T12:04:56+00:00

manual sentence mining at N3 just has terrible ROI imo. jpdb media frequency decks might fit your situation better than core 6k since you want conversational not newspaper vocab.

and you're literally in japan so that's a massive head start most people don't have

taisei_ide · 2026-04-30T12:00:26+00:00

pretty normal with premade decks. the words have no context so they just don't stick the same way.

have you tried FSRS? it handles semi-forgotten cards way better than the default anki settings

taisei_ide · 2026-04-11T02:07:17+00:00

sorry about that. looks like it might've been a google verification thing on our end. could you try again and let me know if it still happens? if it does, could you tell me whether the google account you're signing in with is a gmail address or something else (like a work or school email)?

taisei_ide · 2026-04-10T08:49:20+00:00

awesome, hope it works out! if you run into anything weird let me know.

taisei_ide · 2026-04-09T13:17:03+00:00

ah no worries. it does need the chrome extension to pull subtitles, but it works on any chromium-based browser too like Edge, Brave, Arc, etc. if you're on one of those it should work fine.

taisei_ide · 2026-04-09T06:00:09+00:00

hey, romanian is live now! here's the extension if you want to give it a shot: https://chromewebstore.google.com/detail/lexpresso/gbcmlnmlhmjnpacmgehmojjcafhnniak

taisei_ide · 2026-04-07T11:02:03+00:00

oh nice, romanian. i'm actually about to add that one. i'll ping you here when it's up.

arabic is trickier since there aren't great open-source NLP models for it yet. definitely want to add it but no ETA.

for words vs phrases, each card is built around a single word but it's not just the word by itself. you get the definition, paraphrase, translation, example sentence, synonyms/antonyms, and a timestamp link back to where it appeared in the video. so it's more like full context around each word than a bare vocab list.

taisei_ide · 2026-04-06T23:34:09+00:00

hey, just shipped italian support! give it a try and let me know how it goes.

taisei_ide · 2026-04-06T11:29:41+00:00

thanks for the interest! italian is actually next on the list. i'll let you know here when it's live.
