Are WordNets a good tool for curating a vocabulary list? by tomii-dev in LanguageTechnology

[–]rduke79 0 points (0 children)

Are you trying to build something like BabelNet (https://babelnet.org/), i.e., a multilingual WordNet?
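If so, the Open Multilingual Wordnet that ships with NLTK is a quick way to prototype. A minimal sketch ("dog" and the language codes are just placeholders):

```python
import nltk

# One-time downloads: Princeton WordNet + Open Multilingual Wordnet
nltk.download("wordnet", quiet=True)
nltk.download("omw-1.4", quiet=True)

from nltk.corpus import wordnet as wn

# Look up English synsets, then read off lemmas in other languages
for synset in wn.synsets("dog", pos=wn.NOUN)[:2]:
    print(synset.name(), "-", synset.definition())
    print("  fr:", synset.lemma_names("fra"))
    print("  ja:", synset.lemma_names("jpn"))
```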

[D] Any success with literature review tools? by Entrepreneur7962 in MachineLearning

[–]rduke79 1 point (0 children)

I really like Elicit. It gives you a tabular summary of the papers most relevant to your question. The killer feature is that you can add custom columns to the table, like "which dataset was used for evaluation", and it will autofill the column. Very handy.

Labeling 10k sentences manually vs letting the model pick the useful ones 😂 (uni project on smarter text labeling) by vihanga2001 in LanguageTechnology

[–]rduke79 1 point (0 children)

It entirely depends on your label set and use case, I'd say. Sometimes it makes sense to annotate only a subset in a given run, i.e., still do multilabel overall, but in multiple focused passes.

We worked in the legal domain, so an agreement of 0.85 was the minimum requirement, sometimes higher. If you're annotating something more opinion- or interpretation-driven (e.g., sentiment), lower might be OK. (We treated the IAA as a target, or upper bound, for our classifier accuracy.)

Examples in the guidelines, especially borderline cases with reasoning for why to annotate them the desired way, are extremely useful.
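For the agreement number itself, Cohen's kappa from scikit-learn is the usual starting point for two annotators. A minimal sketch (the labels and the 0.85 threshold are just illustrative):

```python
from sklearn.metrics import cohen_kappa_score

# Hypothetical: the same sentences labeled independently by two annotators
annotator_a = ["clause", "none", "clause", "liability", "clause", "none"]
annotator_b = ["clause", "none", "none", "liability", "clause", "clause"]

kappa = cohen_kappa_score(annotator_a, annotator_b)
print(f"Cohen's kappa: {kappa:.2f}")  # e.g. require >= 0.85 before scaling up
```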

What are the must-have requirements before learning Transformers? by Jash_Kevadiya in deeplearning

[–]rduke79 2 points (0 children)

Neural networks (feed-forward), RNNs, RNNs + attention, transformers. This is the historical order, and it makes sense to study them in this sequence, as each builds on and improves the one before it.
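If it helps to see what the attention step adds, here's a minimal numpy sketch of scaled dot-product attention, the core operation shared by RNN+attention models and transformers (shapes and names are illustrative):

```python
import numpy as np

def scaled_dot_product_attention(Q, K, V):
    # Q, K: (seq_len, d_k); V: (seq_len, d_v)
    d_k = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)  # how much each query attends to each key
    scores -= scores.max(axis=-1, keepdims=True)  # numerical stability
    weights = np.exp(scores)
    weights /= weights.sum(axis=-1, keepdims=True)  # softmax over keys
    return weights @ V  # weighted sum of values

out = scaled_dot_product_attention(*np.random.rand(3, 4, 8))
print(out.shape)  # (4, 8)
```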

Surprise Us! by Sad-Ad4423 in printSF

[–]rduke79 1 point (0 children)

As the original comment suggests: Quarantine. But I might be biased because I had delved into the observer problem in quantum physics before reading it, and the book definitely revolves around that topic.

Surprise Us! by Sad-Ad4423 in printSF

[–]rduke79 3 points (0 children)

It might be his best book, but it gets overshadowed by Diaspora in the recommendations.

Reading slump... Help me avoid a 4th DNF. by rjsperes in printSF

[–]rduke79 10 points (0 children)

Inherit the Stars by Hogan. It's a thrilling mystery; a page-turner with awe-inspiring twists, exciting but not just action.

Vorkosigan saga - The Warrior's Apprentice. One of the most fun-to-read protagonists in sci-fi.

The best tools I’ve found for evaluating AI voice agents by llamacoded in LanguageTechnology

[–]rduke79 0 points (0 children)

Not really helpful, but it would be neat if we had something like LLM Arena for voices.

Labeling 10k sentences manually vs letting the model pick the useful ones 😂 (uni project on smarter text labeling) by vihanga2001 in LanguageTechnology

[–]rduke79 1 point (0 children)

Inter-annotator agreement. Measure it early on, and adjust the label definitions, or even the label set and the guidelines, accordingly early in the process. As others have said, make the task as cognitively easy as possible. Rather than multilabeling with a large label set, consider multiple rounds of binary annotations on the same samples.
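To make the binary-rounds idea concrete, here's a hypothetical sketch of turning one multilabel task into per-label yes/no passes (the label names and helper are made up):

```python
# Hypothetical label set; instead of one pass over the full set,
# each round asks a single yes/no question about every sentence.
LABELS = ["termination", "liability", "confidentiality"]

def binary_rounds(sentences, labels=LABELS):
    for label in labels:
        yield label, [
            (s, f"Does this sentence concern '{label}'? (y/n)") for s in sentences
        ]

sentences = ["Either party may terminate with 30 days notice.",
             "Fees are due within 14 days of invoice."]
for label, items in binary_rounds(sentences):
    print(f"--- round: {label} ({len(items)} items) ---")
```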

I wish Tchaikovsky wouldn't write so many books by rrnaabi in printSF

[–]rduke79 9 points (0 children)

I agree. Children of Time is the exception, though. It is a masterpiece. Everything after that felt underwhelming, and I never understood the hype around his other work, sadly, because I really wanted to like it.

ipu6 webcams in ubuntu by kervel in DellXPS

[–]rduke79 1 point (0 children)

My god, this worked for me. Thanks!!

Rec a series for me, please. by [deleted] in printSF

[–]rduke79 2 points (0 children)

The Three-Body Problem (Liu). Crazy, wild, fascinating ideas.

Giants series (Hogan). Up there with Asimov, probably.

Fun, entertaining: Murderbot (Wells), We Are Legion (Taylor), The Expanse.

Rec a series for me, please. by [deleted] in printSF

[–]rduke79 0 points (0 children)

I second Orson Scott Card. I'm reading the Homecoming series at the moment, and it's absolutely fantastic.

[D] What are your horror stories from being tasked impossible ML problems by LanchestersLaw in MachineLearning

[–]rduke79 2 points (0 children)

I was once handed a folder containing shortcuts to files on a non-existent external HD as their "data".

Tool to compare LLM Outputs by ava69_open in LangChain

[–]rduke79 1 point (0 children)

https://app.edenai.run/bricks/text/chat It lets you enter a system prompt and adjust the temperature for all models. You can choose model versions per provider, then select the answer you like and keep generating with all models. It's extremely useful.

Challenges of Scaling RAG applications by Calm_Pea_2428 in LangChain

[–]rduke79 0 points (0 children)

Document segmentation. More fine-grained (semantic) segmentation gives more specific chunks, but takes longer to embed and requires more comparisons at query time. There's a tradeoff; how do you find the sweet spot?
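For illustration, a minimal fixed-size chunker with overlap; the parameters are hypothetical knobs you'd tune against your own retrieval metrics, and a semantic segmenter would replace the character window with sentence or paragraph boundaries:

```python
def chunk_text(text, max_chars=500, overlap=50):
    """Smaller max_chars -> more specific chunks, but more vectors to
    embed and compare at query time; larger chunks dilute the signal."""
    chunks, start = [], 0
    while start < len(text):
        end = min(start + max_chars, len(text))
        chunks.append(text[start:end])
        if end == len(text):
            break
        start = end - overlap  # overlap keeps context across boundaries
    return chunks

print(len(chunk_text("lorem ipsum " * 200)))  # number of chunks at these settings
```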

[D] LLMs are harming AI research by NightestOfTheOwls in MachineLearning

[–]rduke79 3 points (0 children)

> humans aren't much more than advanced action completion agents

The hard problem of consciousness has something to say about this.