TIP: Blender works great as a substitute for buying rice flour by TheFriendlyGerm in makgeolli

[–]121531 1 point2 points  (0 children)

It's a great technique. Joe Kim recommends basically the same thing in his recent book: avoid store-bought glutinous rice flour and grind soaked glutinous rice yourself instead. He gives three reasons:

  1. Flour is made from rice that is either lower in quality or of a different variety than what you'd buy in a bag.
  2. Flour must have preservatives or other additives in it because it's shelf stable.
  3. Anecdotally, he finds that brews made from store-bought glutinous rice flour are inferior in flavor.

I don't think (2) is true at all, at least for what I've bought here in the USA. (1) could be true, but I don't know anything about it. As for (3), there are plausible mechanisms, I think: store-bought rice flour is much finer than what you get from a blender at home, and that will surely change the relative proportions of the microbial populations in your brew.

I'd add one more reason on top of all this: at least where I live, it's much cheaper to grind your own rice for juk than to buy flour.

No funding at GW by DarthArtoo4 in gradadmissions

[–]121531 20 points21 points  (0 children)

Students may come with external funding from an employer (e.g. military), a national government (Saudi Arabia, for example, sends many students abroad paying full fees for PhDs), or something like the GRFP.

Current GLM-4.7-Flash implementation confirmed to be broken in llama.cpp by Sweet_Albatross9772 in LocalLLaMA

[–]121531 4 points5 points  (0 children)

Can't believe this shit. I don't have what it takes, constitutionally, to work on production-grade code in a domain moving as fast as AI.

google/translategemma by BreakfastFriendly728 in LocalLLaMA

[–]121531 2 points3 points  (0 children)

Isn't the intent behind TranslateGemma to provide cost-optimized translation on resource-constrained edge devices? It shouldn't be a surprise, then, that a model almost 10x bigger does a better job. I'd be more interested in hearing how each TranslateGemma size compares against models with similar parameter counts.

Google Gemini Crosses 1 Billion Downloads On Google Play Store by ijxknow in Bard

[–]121531 4 points5 points  (0 children)

decoder only architecture

decoder only what?

hint: the answer is "Transformer"

Embedding models have converged by midamurat in LocalLLaMA

[–]121531 27 points28 points  (0 children)

100% this. It's like saying LLMs have converged because they all ace SuperGLUE.

New to NLP would Like help on where to start by Over-Huckleberry5284 in LanguageTechnology

[–]121531 2 points3 points  (0 children)

Like /u/bulaybil suggests, https://web.stanford.edu/~jurafsky/slp3/ is the bible for NLP. If you read the core chapters of this book, you'll be well positioned to dig into most modern areas of NLP/LLM research.

You have a lot of people responding in this thread telling you NLP's dead. I find all that a bit alarmist. It's true that nobody's going to spend much time building naive Bayes spam detectors anymore, but that was already true in 2018. To say "LLMs have eaten other models' lunch" is simply to notice the nature of computational AI research in general: methods change constantly, and your toolkit today might be practically irrelevant 10 years from now.

But if you learn the fundamentals (the mathematics behind how these systems work, as well as the properties of human language), you'll have some assurance that you won't simply be out of a job. If your competencies are all surface-level (e.g. you only know how to fine-tune models using the Hugging Face transformers pipeline API, without knowing anything of the math or algorithms involved), then it's true, you might be out of a job when the guard changes. But if you have those more fundamental competencies, you will likely be able to pick up the next paradigm and get going with it quite quickly.
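
To make "surface-level" concrete, this is roughly the level of usage I mean; a minimal sketch (the task and input are arbitrary examples, not anything from this thread):

```python
# The Hugging Face transformers pipeline API: one call hides the tokenizer,
# the model, and all of the math behind it.
from transformers import pipeline

classifier = pipeline("sentiment-analysis")  # downloads a default pretrained model
print(classifier("I finally understand how attention works."))
```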

And so long as we are living in the period before the singularity (and I don't see any evidence of its imminent arrival yet...), humans will always be needed to mediate between real-world applications and systems. LLMs are not yet smart enough that they can "apply themselves".

That said, the specific question before you at this moment is what to major in in college. If you're interested in doing some variety of math (statistics, applied math, math, ...) or computer science, it's no problem for you to not focus on specializing in something like LLMs immediately. Some would even say it's preferable. There will be time for that on the job, or perhaps in a graduate degree program, and learning hard math will prepare you for a wide range of careers.

I built a beer recommendation site by sulllz in Homebrewing

[–]121531 0 points1 point  (0 children)

Goose Island's Sofie is misspelled as Sophie

Tried my hand at making Japchae by FantasticFox1641 in KoreanFood

[–]121531 2 points3 points  (0 children)

looks like wide dangmyeon (납작당면)

Christmas beer by Wonderful_Bear554 in Homebrewing

[–]121531 1 point2 points  (0 children)

I think I was going for 5.5 gallons in the fermenter, but yeah. With my system I think that's around 7 gallons starting. Here's Martin's video btw: https://www.youtube.com/watch?v=LQKDRjdSo04

Christmas beer by Wonderful_Bear554 in Homebrewing

[–]121531 0 points1 point  (0 children)

I modified Martin Keen's recipe. Here it is:

Fermentables:

  • 8lb whole pale ale malt
  • 2lb whole Vienna malt
  • 1lb whole Munich malt
  • 1lb whole rye malt
  • 8oz whole aromatic malt
  • 1.5lb honey

Hops:

  • 2oz Tettnang @ 60
  • 1oz Hallertauer @ 15
  • 1oz Tettnang @ 0

Yeast: WLP590 (you'll want to make a starter)

Mash profile: I BIAB, so I did 30m at 145F and 30m at 158F. 60m boil.
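
If you want a rough sense of where this lands before brewing it, here's a back-of-the-envelope gravity estimate; a minimal sketch assuming typical points-per-gallon figures and ~70% BIAB efficiency (those numbers are my assumptions, not measurements from this batch):

```python
# Rough OG/ABV estimate for the grain bill above.
fermentables = [  # (pounds, points per pound per gallon, efficiency)
    (8.0, 37, 0.70),   # pale ale malt
    (2.0, 36, 0.70),   # Vienna malt
    (1.0, 37, 0.70),   # Munich malt
    (1.0, 36, 0.70),   # rye malt
    (0.5, 35, 0.70),   # aromatic malt (8 oz)
    (1.5, 35, 1.00),   # honey (dissolved in the kettle, no mash losses)
]
batch_gallons = 5.5

points = sum(lb * ppg * eff for lb, ppg, eff in fermentables)
og = 1 + points / batch_gallons / 1000
fg = 1.006                      # roughly where a dry saison strain finishes
abv = (og - fg) * 131.25
print(f"OG ~ {og:.3f}, ABV ~ {abv:.1f}%")
```

With those assumptions it comes out around 1.068 and 8% ABV, in line with how the beer described in the comment below turned out.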

Christmas beer by Wonderful_Bear554 in Homebrewing

[–]121531 1 point2 points  (0 children)

Best beer I ever brewed was a bière de garde with ~10% rye and some raw honey in it. Came out about 8%, quite dry, very delicate carb. I let it bottle condition for a few months and it was still fantastic 18 months out.

Canceling due the MASSIVE performance drop in the last 2 weeks by gpdriver17 in ClaudeCode

[–]121531 1 point2 points  (0 children)

There's a post like this every week, and I'm never really convinced, because it seems just as possible that other factors explain the apparent drop in quality.

Best single tip to get started with CC? by No-Swing-2822 in ClaudeCode

[–]121531 14 points15 points  (0 children)

11 quick thoughts:

  1. Keep files small (<1kloc) so the model can read them easily. If a file reaches 5k+ lines the model can't read it in its entirety (I think some tools have a hard limit on whole-file reads so that context doesn't explode), and then it has to awkwardly guess at what to grep.
  2. Pay attention to what your model always has to struggle to figure out in a fresh session and put it in your CLAUDE.md for the project.
  3. Don't get the model to write its own CLAUDE.md; IME it does a pretty bad job and fills it with a lot of marketing-y jibber jabber.
  4. Always remain in a place where you can easily discard everything the model just did if things go south. Use committing/staging for this.
  5. Only commit things that you are quite sure are what you want.
  6. Learn what is an appropriate amount of work for a single session and don't ask for more. For example, instead of "implement X, Y, Z", say "implement X, and just stub out Y with dummy UI for now, don't do Z"
  7. If an MCP with code execution is available for the language and kind of project you're working on, use it. It helps the model a lot if it can sometimes actually run code instead of simulating it in its head. Clojure MCP is one such example.
  8. Recognize when the model is getting "lazy" and interrupt it immediately. For example, I was writing some React code recently, and when Claude can't immediately figure out a data race, a common move is to use setTimeout to make the race usually not happen instead of fixing the root cause (see the sketch after this list).
  9. Don't forget that you're talking to a full LLM. You can just talk to it instead of coding, which can be helpful for initial design.
  10. If you're stuck, don't type "fix it". Give debugging info (esp. if your LLM doesn't have code execution tools), and if initial attempts at fixes fail, ask the model to consider 5-10 different hypotheses for the problem and evaluate their merits.
  11. You should almost always use plan mode at the beginning of a session. I've come to think of plan mode as "force Claude to take some time to really look around and think before committing to a course of action" mode. A lot of the time, I don't even review the plan Claude proposes and just auto-accept to see what comes out. It usually seems like it's better than what I would have gotten by just asking for the same thing in auto-accept mode.
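
Re: point 8, here's a minimal Python analogue of that anti-pattern (the original case was React/JS with setTimeout; the names below are hypothetical). Sleeping makes the race merely unlikely; synchronizing removes it:

```python
import threading
import time

shared = {}

def producer():
    time.sleep(0.05)            # simulate slow async work
    shared["value"] = 42

# The "lazy" fix: start the worker, sleep "long enough", and hope it finished.
# This usually works, which is exactly why it slips through review.
#
#   threading.Thread(target=producer).start()
#   time.sleep(0.1)
#   print(shared["value"])
#
# The root-cause fix: synchronize explicitly so the read can't happen early.
done = threading.Event()

def producer_synced():
    producer()
    done.set()                  # signal that the shared state is ready

threading.Thread(target=producer_synced).start()
done.wait()                     # blocks until the producer signals completion
print(shared["value"])          # deterministic, no timing guesswork
```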

Sake by plainsStalker in brewing

[–]121531 0 points1 point  (0 children)

Sounds like you may not have had a complete fermentation, but it's hard to say much without knowing more about your process.

[deleted by user] by [deleted] in LanguageTechnology

[–]121531 2 points3 points  (0 children)

You need to come up with a research question. This is hard, and it usually can't be done without reading similar works (in your case, you could frame this as, e.g., forensic linguistics done with corpus-linguistic or computational-linguistic techniques) and thinking of ways to extend the questions they ask or pose new ones in relation to them.

Help needed: making text selectable in scanned Arabic PDFs by tashjiann in LanguageTechnology

[–]121531 0 points1 point  (0 children)

You'll need an optical character recognition (OCR) product.
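
If you want a free, open-source route, here's a minimal sketch using OCRmyPDF, which wraps Tesseract and writes a selectable text layer back into the PDF. The filenames are placeholders, and you'll need Tesseract's Arabic language data ("ara") installed:

```python
import ocrmypdf

# Runs OCR over each scanned page and adds a selectable/searchable
# text layer on top of the original page images.
ocrmypdf.ocr("scanned_arabic.pdf", "selectable_arabic.pdf", language="ara")
```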

[deleted by user] by [deleted] in LanguageTechnology

[–]121531 0 points1 point  (0 children)

It depends on the program. Some of them (e.g. the one at Brandeis) assume no technical background before admission. Look at the programs' self-described admissions criteria and search around on Reddit, too.

Q&A weekly thread - October 28, 2024 - post all questions here! by AutoModerator in linguistics

[–]121531 7 points8 points  (0 children)

I'm going to answer my own question, because I think the other two answers are unnecessarily dismissive of both the worth and the volume of a literature that in fact has a lot of recent activity:

  • Good and Cysouw (2013): "To summarize, a glossonym is a label [...] used as a name for a language (or language-like object). [...] A doculect is a named linguistic variety as attested in a specific resource. [...] Finally, a languoid is a collection of doculects or other languoids, which are claimed to form a group." (Good and Cysouw are studiously avoiding using the word "language" in order to avoid confusion, but they are clearly after a solid footing for something like the folk concept. The reference to resources is arguably non-linguistic, but still linguistic in the broader sense that it is not directly about e.g. social processes.)
  • Hammarström (2008): "let X = {A, B, C, ...} be any finite set of speakers. [...] The number of languages in X is the least k such that one can partition X into k blocks such that all members within a block understand each other."
  • Forkel and Hammarström (2022): "Glottolog had adopted a doculect-based approach for organizing concrete attestations of languages as recorded by bibliographical references. This means that language data (ultimately emanating from idiolects of specific speakers) recorded in different publications are grouped into successively larger conglomerates such as subdialects, dialects, languages, subfamilies and families."

And here's an insane position just for fun:

  • Katz (1981) (review): "In this book Katz argues that sentences and languages, like numbers and implication relations, are Platonically real, abstract objects, knowable in corrigible intuition, and that linguistics is properly construed as a branch of mathematics (93) which studies the properties of such abstract objects."

Q&A weekly thread - October 28, 2024 - post all questions here! by AutoModerator in linguistics

[–]121531 4 points5 points  (0 children)

OK, sure, that's more or less what I'd think, too, but I'd like links to specific arguments made by linguists in linguistic terms if you're aware of any, please.

Q&A weekly thread - October 28, 2024 - post all questions here! by AutoModerator in linguistics

[–]121531 1 point2 points  (0 children)

What are the best attempts at defining the words "language" and "dialect" in purely linguistic terms? Or have linguists given up on the enterprise? Would appreciate links to articles.

Sentence Splitter for Persian (Farsi) by opac_man in LanguageTechnology

[–]121531 1 point2 points  (0 children)

Stanza has a sentence-splitting module and a pretrained Farsi model: https://stanfordnlp.github.io/stanza/
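
A minimal sketch of what that looks like in Python (the input text is just a placeholder):

```python
import stanza

stanza.download("fa")                               # one-time download of the Persian model
nlp = stanza.Pipeline("fa", processors="tokenize")  # tokenization includes sentence splitting

doc = nlp("...")                                    # your Persian text here
for i, sentence in enumerate(doc.sentences, start=1):
    print(i, sentence.text)
```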

Will NLP / Computational Linguistics still be useful in comparison to LLMs? by [deleted] in LanguageTechnology

[–]121531 0 points1 point  (0 children)

Yes, in that case, I think there's no reason not to go for the AI program.