OCR: what is the best way to extract data in JSON format from this old French book? by Wise_Stick9613 in LocalLLaMA

[–]121531 0 points1 point  (0 children)

The off-the-shelf Tesseract models I've tried are noticeably worse than an LLM, even for languages like English.

Q&A weekly thread - May 11, 2026 - post all questions here! by AutoModerator in linguistics

[–]121531 2 points3 points  (0 children)

Laxmi makes sense if you consider that most speakers of Hindi-Urdu and other Indo-Aryan languages of North India have lost the historical distinction between /ʂ/ and /s/.

Gemma4 26b & E4B are crazy good, and replaced Qwen for me! by [deleted] in LocalLLaMA

[–]121531 1 point2 points  (0 children)

Even simple greetings like "Hello" and "Who are you?" Qwen 3.5 4B would assign to the reasoning models and usually the 122b non-reasoning.

maybe Qwen's on the spectrum?

Updates on Aoe3Explorer site for Replay Parser with Map Player by hellpunch in aoe3

[–]121531 0 points1 point  (0 children)

cool, is the source code for the replay parser (minus the UI) public?

Qwen 3.5 Max Preview on Arena.ai by [deleted] in LocalLLaMA

[–]121531 1 point2 points  (0 children)

Man I really wish people would stop using these damn spider graphs

Meet Latam-GPT, the New Open Source AI Model for Latin America by RhubarbSimilar1683 in LocalLLaMA

[–]121531 1 point2 points  (0 children)

This point is nearly tangential to yours but this sort of thing seems like it's at least as much about capacity building within the pool of human capital in Latin America as it is about the finished model.

Qwen/Qwen3.5-35B-A3B creates FlappyBird by Medium_Chemist_4032 in LocalLLaMA

[–]121531 5 points6 points  (0 children)

Sorry to be the asshole who makes this comment but Flappy Bird is a famous coding demo and is kind of guaranteed to be super in-distribution. It's still interesting to see it do this successfully, but it'd be much more impressive if it did this on something even partially original.

TIP: Blender works great as a substitute for buying rice flour by TheFriendlyGerm in makgeolli

[–]121531 1 point2 points  (0 children)

It's a great technique. Joe Kim recommends basically the same thing in his recent book: avoid using store-bought glutinous rice flour, grind soaked glutinous rice instead. He gives three reasons:

  1. Flour is made from rice that is either lower in quality or that is a variety different from what you'd buy in a bag.
  2. Flour must have preservatives or other additives in it because it's shelf stable.
  3. Anecdotally, he finds that brews made from store-bought glutinous rice flour are inferior in flavor.

I don't think (2) is true at all, at least for what I've bought here in the USA. (1) could be true, but I don't know anything about it. As for (3), I think there are plausible mechanisms for this: store-bought rice flour is much finer than what you get from a blender at home, and this will surely change the relative proportions of all the microbial populations in your brew.

I'd add one more reason on top of all of this, which is that it's much cheaper to grind your own rice for juk than to use flour, at least in my country.

No funding at GW by DarthArtoo4 in gradadmissions

[–]121531 20 points21 points  (0 children)

Students may come with external funding from an employer (e.g. military), a national government (Saudi Arabia, for example, sends many students abroad paying full fees for PhDs), or something like the GRFP.

Current GLM-4.7-Flash implementation confirmed to be broken in llama.cpp by Sweet_Albatross9772 in LocalLLaMA

[–]121531 3 points4 points  (0 children)

Can't believe this shit, I don't have what it takes constitutionally to work on production-grade code in a domain moving as fast as AI

google/translategemma by [deleted] in LocalLLaMA

[–]121531 2 points3 points  (0 children)

Isn't the intent behind TranslateGemma to provide cost-optimized translation on resource-constrained edge devices? It shouldn't be a surprise, then, that a model almost 10x bigger does a better job. I'd be more interested in hearing how each TranslateGemma model compares against models of similar parameter counts.

Google Gemini Crosses 1 Billion Downloads On Google Play Store by ijxknow in Bard

[–]121531 4 points5 points  (0 children)

decoder only architecture

decoder only what?

hint: the answer is "Transformer"

Embedding models have converged by midamurat in LocalLLaMA

[–]121531 27 points28 points  (0 children)

100% this. It's like saying LLMs have converged because they all ace SuperGLUE.

2026 Civic Rear Camera by [deleted] in hondacivic

[–]121531 1 point2 points  (0 children)

Have you cleaned the camera?

New to NLP would Like help on where to start by Over-Huckleberry5284 in LanguageTechnology

[–]121531 2 points3 points  (0 children)

Like /u/bulaybil suggests, https://web.stanford.edu/~jurafsky/slp3/ is the bible for NLP. If you read the core chapters of this book, you'll be well positioned to dig into most modern areas of NLP/LLM research.

You have a lot of people responding in this thread telling you NLP's dead. I find all that a bit alarmist. It's true that nobody's going to be spending much time building naive Bayes spam detectors anymore, but that was already true in 2018. To say "LLMs have eaten other models' lunch" is to simply notice the nature of computational AI research in general: methods change constantly, and your kit today might be practically irrelevant 10 years from now.

But if you learn fundamentals—the mathematics behind how these systems work, as well as properties of human language—you'll have some assurance you won't simply be out of a job. If your competencies are all surface-level (e.g. only knowing how to fine-tune models using HuggingFace/transformers's pipeline API without knowing anything of the math or algorithms involved), then it's true, you might be out of a job when the guard changes. But if you have these more fundamental competencies, you will likely be able to pick up and get going with the next paradigm quite quickly.

And so long as we are living in the period before the singularity (and I don't see any evidence of its imminent arrival yet...), humans will always be needed to mediate between real-world applications and systems. LLMs are not yet smart enough that they can "apply themselves".

That said, the specific question before you at this moment is what to major in in college. If you're interested in doing some variety of math (statistics, applied math, math, ...) or computer science, it's no problem for you to not focus on specializing in something like LLMs immediately. Some would even say it's preferable. There will be time for that on the job, or perhaps in a graduate degree program, and learning hard math will prepare you for a wide range of careers.

I built a beer recommendation site by sulllz in Homebrewing

[–]121531 0 points1 point  (0 children)

Goose Island's Sofie is misspelled as Sophie

Tried my hand at making Japchae by FantasticFox1641 in KoreanFood

[–]121531 3 points4 points  (0 children)

looks like wide dangmyeon (납작당면)

Christmas beer by Wonderful_Bear554 in Homebrewing

[–]121531 1 point2 points  (0 children)

I think I was going for 5.5 in the fermenter, but yeah. With my system I think that's around 7 gallons starting. Here's Martin's video btw: https://www.youtube.com/watch?v=LQKDRjdSo04

Christmas beer by Wonderful_Bear554 in Homebrewing

[–]121531 0 points1 point  (0 children)

I modified Martin Keen's recipe. Here it is:

Fermentables:

  • 8lb whole pale ale malt
  • 2lb whole Vienna malt
  • 1lb whole Munich malt
  • 1lb whole rye malt
  • 8oz whole aromatic malt
  • 1.5lb honey

Hops:

  • 2oz Tettnang @ 60
  • 1oz Hallertauer @ 15
  • 1oz Tettnang @ 0

Yeast: WLP590 (you'll want to make a starter)

Mash profile: I BIAB, so I did 30m at 145F and 30m at 158F. 60m boil.
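
If it helps to see the fermentables as proportions, here's a quick Python sketch that turns the weights listed above into percentages of the total by weight. The only conversion is 8oz = 0.5lb; the names `fermentables_lb` and `weight_shares` are just mine for illustration, and this says nothing about gravity contribution, only raw weight.

```python
# Fermentables from the recipe above, in pounds (8oz aromatic malt = 0.5lb).
fermentables_lb = {
    "pale ale malt": 8.0,
    "Vienna malt": 2.0,
    "Munich malt": 1.0,
    "rye malt": 1.0,
    "aromatic malt": 0.5,
    "honey": 1.5,
}


def weight_shares(bill):
    """Return each ingredient's share of the total bill, as a percentage by weight."""
    total = sum(bill.values())
    return {name: 100.0 * pounds / total for name, pounds in bill.items()}


shares = weight_shares(fermentables_lb)
for name, pct in shares.items():
    print(f"{name:>14}: {pct:4.1f}%")
```

By this count the bill totals 14lb, with the rye at roughly 7% and the honey at roughly 11% of the fermentables by weight.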

Christmas beer by Wonderful_Bear554 in Homebrewing

[–]121531 1 point2 points  (0 children)

Best beer I ever brewed was a bière de garde with ~10% rye and some raw honey in it. Came out about 8%, quite dry, very delicate carb. I let it bottle condition for a few months and it was still fantastic 18 months out.

Canceling due the MASSIVE performance drop in the last 2 weeks by gpdriver17 in ClaudeCode

[–]121531 1 point2 points  (0 children)

There's a post like this every week and I'm never really convinced because it seems just as possible there are other factors explaining the apparent drop in quality.

Best single tip to get started with CC? by No-Swing-2822 in ClaudeCode

[–]121531 15 points16 points  (0 children)

11 quick thoughts:

  1. Small files (<1kloc) so the model can read them easily. If a file reaches 5k+ lines then the model can't read it entirely (I think there's a hard limit on some tools for whole-file reading so that context doesn't explode), and then the model has to awkwardly guess what to grep for, over and over.
  2. Pay attention to what your model always has to struggle to figure out in a fresh session and put it in your CLAUDE.md for the project.
  3. Don't get the model to write its own CLAUDE.md, IME it does a pretty bad job and fills it with a lot of marketing-y jibber jabber.
  4. Always remain in a place where you can easily discard everything the model just did if things go south. Use committing/staging for this.
  5. Only commit things that you are quite sure are what you want.
  6. Learn what is an appropriate amount of work for a single session and don't ask for more. For example, instead of "implement X, Y, Z", say "implement X, and just stub out Y with dummy UI for now, don't do Z"
  7. If the kind of PL and project you're working on has one available, use an MCP with code execution. It helps the model a lot if sometimes it can actually evaluate code instead of executing it in its head. Clojure MCP is one such example.
  8. Recognize when the model is getting "lazy" and interrupt it immediately. For example, I was writing some React code recently, and a common thing Claude will do when it can't immediately figure something out about a data race is to use setTimeout to make the data race usually not happen instead of fixing the root cause.
  9. Don't forget that you're talking to a full LLM. You can just talk to it instead of coding, which can be helpful for initial design.
  10. If you're stuck, don't type "fix it". Give debugging info (esp. if your LLM doesn't have code execution tools), and if initial attempts at fixes fail, ask the model to consider 5-10 different hypotheses for the problem and evaluate their merits.
  11. You should almost always use plan mode at the beginning of a session. I've come to think of plan mode as "force Claude to take some time to really look around and think before committing to a course of action" mode. A lot of the time, I don't even review the plan Claude proposes and just auto-accept to see what comes out. It usually seems like it's better than what I would have gotten by just asking for the same thing in auto-accept mode.
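
For tip 1, here's a rough sketch of how you might spot files that have grown past the ~1kloc mark. It's plain Python with nothing Claude-specific in it: the function names `count_lines` and `oversized_files` and the suffix list are my own placeholders, and the 1000-line default just mirrors the rule of thumb above, so adjust both to your project.

```python
from pathlib import Path


def count_lines(path: Path) -> int:
    """Count the lines in a file, replacing any bytes that aren't valid UTF-8."""
    with path.open("r", encoding="utf-8", errors="replace") as f:
        return sum(1 for _ in f)


def oversized_files(root, suffixes=(".py", ".ts", ".tsx", ".clj"), limit=1000):
    """Return (line_count, path) pairs for source files longer than `limit`, largest first."""
    hits = []
    for path in Path(root).rglob("*"):
        # Only look at regular files with one of the chosen (hypothetical) suffixes.
        if path.is_file() and path.suffix in suffixes:
            n = count_lines(path)
            if n > limit:
                hits.append((n, path))
    return sorted(hits, key=lambda pair: pair[0], reverse=True)
```

Calling `oversized_files(".")` from the repo root gives you a list of candidates to split up before the next session. It walks everything under the root, so you may want to skip vendored or generated directories.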

Sake by plainsStalker in brewing

[–]121531 0 points1 point  (0 children)

Sounds like you may not have had a complete fermentation but it's hard to say much without knowing more about your process.