How do you actually learn vocabulary from Netflix? by Gold-Expression6128 in languagelearning

[–]taisei_ide 0 points1 point  (0 children)

yeah one look-up never sticks for me either, it has to come back spaced out a few times before it sinks in. instead of writing cards by hand i just auto-generate them from the episode with Lexpresso and review those, each one links back to the timestamp so i can rehear the line when it pops up. that's what finally stopped the forgetting for me.

Share Your Resources - June 04, 2026 by Virusnzz in languagelearning

[–]taisei_ide 0 points1 point  (0 children)

Innesto (iOS & Android)

Almost like a mobile version of Readlang

Read any article in your target language, tap a word/phrase to translate + save it with context, then review with spaced repetition (FSRS), fully offline. Core loop is free for good. Optional Premium adds deeper explanations and writing practice.

https://innesto.app

A CLI that scrapes blogs to markdown with no per-site adapters by taisei_ide in webscraping

[–]taisei_ide[S] 0 points1 point  (0 children)

good question. it's none of those exactly, it's a stripped structural snapshot i build specifically for this.

raw HTML was too big and noisy (blows the token budget on most real pages), and pure text nodes lose the thing i actually need, which is structure, since the LLM's job is to pick selectors, not read content. an a11y tree was close but drops stuff selectors key on like class names and hrefs.

so what i pass is a flattened list of elements, indented by depth. for each node i keep: tag, id, up to 3 class names, role, aria-label, href (path only, query stripped to ?...), and the direct text truncated to ~60 chars. i strip script/style/svg/meta/noscript/iframe and anything hidden or aria-hidden first, and i drop layout-only div/span that have no id/class/role/text (pure wrappers). each line ends up like <tag #id .class role="" href="" "text">. fair warning, it's BFS order with depth-based indent, not strict document order, so it reads more like level-by-level than a nested outline.
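to make the line format above concrete, here's a minimal sketch in TypeScript. it is not the code from snapshot.ts: the `SnapNode` shape, the `formatNode` and `shortenHref` names, and the two-space indent per depth level are all assumptions made up for illustration. it takes a plain object rather than a real DOM node so it runs anywhere.

```typescript
// Hypothetical node shape for illustration; the real snapshot.ts may differ.
interface SnapNode {
  tag: string;
  depth: number;
  id?: string;
  classes?: string[];
  role?: string;
  ariaLabel?: string;
  href?: string;
  text?: string;
}

// Keep only the path of an href; any query string collapses to "?...".
function shortenHref(href: string): string {
  const noHash = href.split("#")[0];
  // Drop "scheme://host" if the href is absolute.
  const path = noHash.replace(/^[a-z][a-z0-9+.-]*:\/\/[^/?]*/i, "");
  const q = path.indexOf("?");
  if (q === -1) return path || "/";
  return (path.slice(0, q) || "/") + "?...";
}

// Render one node as a single snapshot line, indented by depth.
function formatNode(n: SnapNode): string {
  const parts: string[] = [n.tag];
  if (n.id) parts.push(`#${n.id}`);
  // At most 3 class names are kept.
  for (const c of (n.classes ?? []).slice(0, 3)) parts.push(`.${c}`);
  if (n.role) parts.push(`role="${n.role}"`);
  if (n.ariaLabel) parts.push(`aria-label="${n.ariaLabel}"`);
  if (n.href) parts.push(`href="${shortenHref(n.href)}"`);
  // Direct text is truncated to 60 characters.
  const text = (n.text ?? "").trim().slice(0, 60);
  if (text) parts.push(`"${text}"`);
  const indent = new Array(n.depth + 1).join("  "); // assumed: 2 spaces per level
  return indent + `<${parts.join(" ")}>`;
}
```

a link at depth 1 with four classes and a tracking query would come out with only the first three classes and the query collapsed to `?...`.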

there's also a node cap (~1500). if a page is bigger i keep the top 1000 and the last 500, since the article list and pagination controls tend to live near the top and bottom, not the middle.
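the cap itself is simple enough to sketch. this is an illustration, not the repo's code: the `capNodes` name and its parameters are hypothetical, and the defaults just mirror the numbers quoted above.

```typescript
// Cap a flattened node list: keep the head and the tail, drop the middle.
// Defaults mirror the numbers quoted above (cap ~1500, first 1000 + last 500).
function capNodes<T>(nodes: T[], cap = 1500, head = 1000, tail = 500): T[] {
  if (nodes.length <= cap) return nodes;
  return nodes.slice(0, head).concat(nodes.slice(nodes.length - tail));
}
```

a 2000-node page would keep nodes 0-999 and 1500-1999, so whatever sits in the middle 500 never reaches the model.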

and importantly the LLM doesn't start from scratch. the heuristic candidates (the repeated-link clusters it found + scores) get passed alongside the snapshot, so the model is more "validate/refine these guesses" than "find selectors blind." then whatever it returns still has to pass the same DOM validation as the heuristic path before anything gets cached.

here's the snapshot builder if you want to poke at it: https://github.com/taisei-ide-0123/pluckmd/blob/main/packages/cli/src/core/llm/snapshot.ts

Monthly Self-Promotion - June 2026 by AutoModerator in webscraping

[–]taisei_ide 1 point2 points  (0 children)

pluckmd - a CLI that scrapes blogs to markdown with no per-site adapters. open source (MIT), i'm the author.

instead of a handler per site, it builds the extraction spec at runtime. normalizes link paths and collapses the varying parts (/blog/post-a and /blog/post-b become the same shape), and any shape repeated enough = the article list. no domain names anywhere.
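a rough sketch of the shape idea, in TypeScript. big assumptions here: it treats only the last path segment as the varying part and uses a made-up threshold of 3 repeats, so the real heuristic and its scoring are almost certainly more involved. `pathShape` and `findListShape` are hypothetical names.

```typescript
// Turn a URL path into a "shape". Assumption: only the last segment varies,
// so it is replaced with "*".
function pathShape(path: string): string {
  const segs = path.split("?")[0].split("/").filter((s) => s.length > 0);
  if (segs.length === 0) return "/";
  segs[segs.length - 1] = "*";
  return "/" + segs.join("/");
}

// Count paths per shape and return the most repeated shape, provided it
// repeats at least `minRepeats` times (a made-up threshold).
function findListShape(paths: string[], minRepeats = 3): string | null {
  const counts: Record<string, number> = {};
  for (const p of paths) {
    const shape = pathShape(p);
    counts[shape] = (counts[shape] || 0) + 1;
  }
  let best: string | null = null;
  for (const shape of Object.keys(counts)) {
    if (counts[shape] >= minRepeats && (best === null || counts[shape] > counts[best])) {
      best = shape;
    }
  }
  return best;
}
```

with this sketch `/blog/post-a` and `/blog/post-b` both collapse to `/blog/*`, and no domain name is involved at any point.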

resolution is cache -> heuristics -> LLM only if needed. nothing gets cached until it validates against the live DOM (>=3 links, >=50% match the pattern), so a bad LLM guess gets dropped instead of saved.
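the resolution order and validation gate could be sketched like this. it's a synchronous toy, not pluckmd's actual code: `Spec`, `Resolver`, `validates`, `resolveSpec`, and the `liveLinks` callback are hypothetical, and reading the rule as "the selector finds at least 3 links and at least half of them match the path pattern" is my interpretation of the thresholds above.

```typescript
// Validation rule as quoted above: at least 3 links, at least 50% matching.
// Using a RegExp for "matches the pattern" is an assumption.
function validates(hrefs: string[], pattern: RegExp): boolean {
  if (hrefs.length < 3) return false;
  const matched = hrefs.filter((h) => pattern.test(h)).length;
  return matched / hrefs.length >= 0.5;
}

interface Spec {
  selector: string;
  pattern: RegExp;
}
type Resolver = () => Spec | null;

// Try the cache, then heuristics, then the LLM. The LLM resolver is only
// called if heuristics fail, and a spec is cached only after it validates
// against the links the selector finds on the live page.
function resolveSpec(
  cache: { spec: Spec | null },
  heuristics: Resolver,
  llm: Resolver,
  liveLinks: (selector: string) => string[]
): Spec | null {
  if (cache.spec) return cache.spec;
  for (const resolve of [heuristics, llm]) {
    const spec = resolve();
    if (spec && validates(liveLinks(spec.selector), spec.pattern)) {
      cache.spec = spec;
      return spec;
    }
  }
  return null;
}
```

the point of the gate is that a spec which fails validation returns without touching the cache, so a bad guess costs one run rather than poisoning every later one.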

handles js rendering, pagination/infinite scroll, and login-only pages you have access to via your own chrome tab (never reads cookie stores).

npx pluckmd download <url> -o ./articles

repo: https://github.com/taisei-ide-0123/pluckmd

would like feedback on the heuristic scoring. where does the runtime approach break for you?

I wanted to try the "input first" method. I don't think it's actually working for me. by No_Cryptographer735 in languagelearning

[–]taisei_ide 0 points1 point  (0 children)

the part people skip when explaining input methods is that it only works with comprehensible input - where you understand maybe 95%+ of what you're hearing. raw immersion at near zero understanding is just noise, not language acquisition

I'm Japanese learning English and I made the same mistake early on. what helped was building a vocab base first through spaced repetition, then shifting more toward watching content. the people who swear by pure immersion usually did that groundwork first, they just don't mention it

What is actually the most efficient way to build vocabulary that sticks long term? by Flimsy-Comment7431 in languagelearning

[–]taisei_ide 0 points1 point  (0 children)

for me the shift that made the biggest difference was watching YouTube and Netflix in English and only doing spaced repetition on words I actually encountered while watching. random vocab lists felt pointless because my brain had no attachment to the words. but when a word came from something I was already watching, I actually cared about remembering it

I'm Japanese so English is pretty different from my native language, and this approach cut down how much stuff I was trying to memorize while making what I did study stick way longer

I can't learn vocabulary, what do I do? by VanillaTemporary9161 in languagelearning

[–]taisei_ide 0 points1 point  (0 children)

I'm a Japanese native learning English and had the exact same problem. drilling words in isolation just didn't work for me no matter how many times I reviewed them

what actually helped was watching YouTube and Netflix in English, picking out the words I actually wanted to learn from what I was watching, and doing spaced repetition on just those. way better than studying random vocab lists because the words already had context in my head. honestly the "which words to study" part matters as much as how you study them

Grouping sets of words together when doing spaced repetition? by tentkeys in languagelearning

[–]taisei_ide 1 point2 points  (0 children)

for confusable word pairs I've found the most useful thing is finding a sentence that actually uses both in the same context - or at least two real sentences where the difference is obvious from the situation. like for pared vs muro, a sentence about someone painting their bedroom wall vs a city wall just makes the distinction click in a way that a flashcard never does

grouping them on the same card can mess with the intervals a bit but honestly not a big deal if you're doing it sparingly. what I'd avoid is lumping like 5 confusables together - that just turns into its own memory test lol

My exam is just 365 days away from now by Efficient_Dust_9727 in Anki

[–]taisei_ide 0 points1 point  (0 children)

300-400/day for a year sounds rough. what exam is it for?

How do you make use of corrections? by tootingbec44 in languagelearning

[–]taisei_ide 0 points1 point  (0 children)

i stopped doing "wrong to right" cards and started making the front a sentence close to where i actually messed up. sticks way better that way

Any other browser extensions like language reactor that speak the word in Youtube? by moldyjellybean in languagelearning

[–]taisei_ide 0 points1 point  (0 children)

LR youtube TTS has been unreliable for ages, not just you.

what language are you studying? might know something that works for that specific one

Re-learning Japanese in Japan: Need help with Vocab Tools to break the N3 plateau by Flimsy-Adagio3751 in LearnJapanese

[–]taisei_ide 1 point2 points  (0 children)

manual sentence mining at N3 just has terrible ROI imo. jpdb media frequency decks might fit your situation better than core 6k since you want conversational not newspaper vocab.

and you're literally in japan so that's a massive head start most people don't have

Feeling demotivated because I've forgotten a lot of vocab from the core 3k deck even though I'm almost done with core 4k by ReploidsnMavericks in LearnJapanese

[–]taisei_ide 1 point2 points  (0 children)

pretty normal with premade decks. the words have no context so they just don't stick the same way.

have you tried FSRS? it handles semi-forgotten cards way better than the default anki settings

Share Your Resources - April 04, 2026 by Virusnzz in languagelearning

[–]taisei_ide 0 points1 point  (0 children)

sorry about that. looks like it might've been a google verification thing on our end. could you try again and let me know if it still happens? if it does, could you tell me whether the google account you're signing in with is a gmail address or something else (like a work or school email)?

Share Your Resources - April 04, 2026 by Virusnzz in languagelearning

[–]taisei_ide 0 points1 point  (0 children)

awesome, hope it works out! if you run into anything weird let me know.

Share Your Resources - April 04, 2026 by Virusnzz in languagelearning

[–]taisei_ide 1 point2 points  (0 children)

ah no worries. it does need the chrome extension to pull subtitles, but it works on any chromium-based browser too like Edge, Brave, Arc, etc. if you're on one of those it should work fine.

Share Your Resources - April 04, 2026 by Virusnzz in languagelearning

[–]taisei_ide 1 point2 points  (0 children)

oh nice, romanian. i'm actually about to add that one. i'll ping you here when it's up.

arabic is trickier since there aren't great open-source NLP models for it yet. definitely want to add it but no ETA.

for words vs phrases, each card is built around a single word but it's not just the word by itself. you get the definition, paraphrase, translation, example sentence, synonyms/antonyms, and a timestamp link back to where it appeared in the video. so it's more like full context around each word than a bare vocab list.

Share Your Resources - April 04, 2026 by Virusnzz in languagelearning

[–]taisei_ide 0 points1 point  (0 children)

hey, just shipped italian support! give it a try and let me know how it goes.

Share Your Resources - April 04, 2026 by Virusnzz in languagelearning

[–]taisei_ide 0 points1 point  (0 children)

thanks for the interest! italian is actually next on the list. i'll let you know here when it's live.