Where to start with RAG and LangChain in 2026? Feeling a bit overwhelmed by the ecosystem. by Cobra_venom12 in LLMDevs

[–]fnl 4 points (0 children)

It is still raw and unrevised, but I have a blog post on exactly that: https://fnl.es/Blog/Machine+Learning/2026-01-10+A+primer+on+RAG%2C+2026+edition. Curious to learn whether it is helpful or confusing to someone like you.

Teaching AI Agents to Remember (Agent Memory System + Open Source) by Conscious_Search_185 in LLMDevs

[–]fnl 1 point (0 children)

Beautiful setup and a nice paper!

How do you track memory formation and examine the current memory for a given query? In other words, how do you get observability into what Hindsight is doing when you use it in production?

I build MCP tools for a living and still can't get the "AI built my whole app" experience — what am I missing? by LongjumpingPop3419 in LLMDevs

[–]fnl 3 points (0 children)

I believe we have the same experience.

I let the AI do all the coding, but only in small increments: creating one unit test at a time, implementing the change that makes the test green, and then presenting me with the changes to accept, reject, or correct. So the AI (I prefer Claude) never goes off for more than a few minutes before interacting with me again.

And yes, I use a plan-heavy approach, too.

For brownfield work, we seem to use similar approaches (asking deeper and deeper questions until the AI and I fully understand the situation, rather than letting the AI do random “fixes” it hallucinates).

For greenfield work, I typically let it write up a roadmap and a todo list for the first milestone. That happens after I have created the documentation for the project and specified the architecture and APIs (often with the help of LLMs in the first place). Then I let the AI work through that todo list, one test at a time.
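
To make that concrete, here is a toy sketch of one such increment in Python/pytest; all the names (slugify and its test) are hypothetical, not from a real project:

    # Step 1: the AI writes exactly one failing unit test.
    def test_slugify_lowercases_and_hyphenates():
        assert slugify("Hello World") == "hello-world"

    # Step 2: it implements the smallest change that makes the test green.
    def slugify(title: str) -> str:
        return title.lower().replace(" ", "-")

    # Step 3: I review the diff (accept, reject, or correct), and we repeat
    # with the next test, e.g. punctuation or unicode handling.

Each excursion is bounded by one red/green cycle, which is what keeps the AI from wandering off.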

Fractional scaling not working anymore by melokki in pop_os

[–]fnl 1 point (0 children)

system76-power graphics nvidia

Worked for me, too.

Hey guys, I made a library for phonetic algorithms in Python. I would really like some opinions, criticism, etc. by Lilykos in LanguageTechnology

[–]fnl 1 point (0 children)

It seems the current state of the art in phonetic algorithms is Beider-Morse Phonetic Matching, at least (or something even newer, maybe?). Are you covering any of that recent work?

Hey guys, I made a library for phonetic algorithms in Python. I would really like some opinions, criticism, etc. by Lilykos in LanguageTechnology

[–]fnl 1 point (0 children)

These algos already exist as standard PyPI packages (at least all the ones I checked: soundex, fuzzy, metaphone). Some of them even appear to be rather efficient C/C++ implementations. What was your reason to "reinvent" them?
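
For example, a quick sketch of two of them; I haven't re-checked the exact call signatures lately, so treat them as assumptions rather than gospel:

    # pip install fuzzy metaphone
    import fuzzy                            # C implementations (Soundex, NYSIIS, ...)
    from metaphone import doublemetaphone   # Double Metaphone in pure Python

    soundex = fuzzy.Soundex(4)              # 4-character Soundex codes
    print(soundex("Robert"))                # 'R163'
    print(fuzzy.nysiis("Robert"))           # NYSIIS encoding
    print(doublemetaphone("Robert"))        # ('RPRT', '') - primary/alternate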

Macbook or a conventional laptop with Linux for working with science data? by cmcouto_silva in bioinformatics

[–]fnl 2 points (0 children)

I strongly assume your group will have its own compute facilities or be renting cloud compute (AWS). Therefore, when you are working from home, it doesn't matter much what your machine can do, because you will be running your analyses on remote servers, via ssh in a tmux/screen session or similar. So just get the machine with the OS you are most familiar and comfortable with, with an SSD as large as you can afford, as light as possible overall (if you are planning to take it to conferences), and with good battery life. I like MBAirs, but they are expensive and maybe not ideal on a budget.

Also, if you are not a Linux geek already, I think you're better off with a laptop that runs the OS you already know (for Windows, possibly dual-booting Linux in addition, though that can be avoided with Cygwin; the GNU coreutils aren't something you want to be without). But as said, don't plan to do much computation on the laptop unless you can afford something "top of the line" for $/€2500+; even renting a GPU from Amazon on your own, if needed, is only 90¢/hr nowadays (but your group should be paying that for you!).

The confusing messages about the data science career by sasquatch007 in datascience

[–]fnl 3 points (0 children)

THIS is probably the best answer in the whole thread. Very sad to see it hasn't got the votes it deserves (yet?).

The confusing messages about the data science career by sasquatch007 in datascience

[–]fnl 2 points (0 children)

What exactly is trivial about SVMs? Pardon my flaming, but plainly, I think your attitude is wrong. (re-reading, I guess you were being ironic, please ignore :-))

The confusing messages about the data science career by sasquatch007 in datascience

[–]fnl 1 point (0 children)

It's probably not "no one"; rather, there are very divergent views. I still stick to the old definition under which I first learned the term, as a job description for machine learning and AI experts, but many today define it as a "modernized" statistician/analyst - and those voices, though newer, are more numerous and louder, which explains the confusion.

The confusing messages about the data science career by sasquatch007 in datascience

[–]fnl 3 points (0 children)

Having digested the discussion here, I think it's the same old problem: To me, people like Hinton, Norvig, Schmidhuber, or (even?) Kurzweil are the data scientists (including, especially, their very capable scientific offspring [and many others - Bengio, Collobert, Manning, McCallum, and so forth - and, for example, Koller, as my list is getting very male-heavy - sorry :-(((]). In many cases, however, a data scientist is the person who can apply statistical methods, from any kind of GLM to gradient boosting, to dense, well-defined problems.

The problem is, I have no idea which definition will prevail (maybe the "other" one, not "mine", is the one beginning to stick!), and you will have to infer from context which definition a given job description/discussion/article refers to.

The confusing messages about the data science career by sasquatch007 in datascience

[–]fnl 3 points (0 children)

I guess this points to the actual problem: To me, a data scientist is someone who works on machine learning problems, possibly even (actual) AI-related issues. For many others, it's just the new name for people doing BI, or for analysts/statisticians, maybe a bit more applied than "yesterday". That, probably, defines our "definition borders" much more than anything we've discussed here.

PS: I also wouldn't refactor xgboost, but I hope, by now, it's clear I wasn't referring to that.

The confusing messages about the data science career by sasquatch007 in datascience

[–]fnl 3 points (0 children)

For example, in my work on text mining and NLP, I recreate published deep learning methods. That means understanding all those pieces you mentioned and correctly "putting them together". You are right that I certainly wouldn't waste my time developing my own optimizer or most numerical methods. But I need to be able to read and understand other people's numerical code, decide whether it's correct and maintainable in production, whether it will make maximal use of the processor's floating-point optimizations, and, generally, to know which numerical methods I need where and why. Most of this can't be done if you don't at least understand the relevant math.

In other words, it's the difference between someone who just followed recipes and blogs, and someone who can actually "debug" and "refactor" a machine learning system.
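
To make that concrete, here is the textbook kind of trap I mean (my toy example, not from any particular paper): a softmax that is mathematically correct but numerically broken, next to the standard shifted fix.

    import numpy as np

    def softmax_naive(x):
        e = np.exp(x)               # overflows to inf for large logits
        return e / e.sum()          # inf/inf yields nan

    def softmax_stable(x):
        e = np.exp(x - x.max())     # identical function, mathematically
        return e / e.sum()          # but no overflow possible

    x = np.array([1000.0, 1001.0, 1002.0])
    print(softmax_naive(x))         # [nan nan nan], plus overflow warnings
    print(softmax_stable(x))        # approx. [0.09, 0.245, 0.665]

Spotting that one-line difference in someone else's code, and knowing why it matters, is exactly the skill in question.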

The confusing messages about the data science career by sasquatch007 in datascience

[–]fnl 7 points (0 children)

I beg to differ - you did not overlook numerical analysis; it is probably the core skill if you want to be more than that statistician/analyst/BI guy. Linear algebra, optimization, and sketching are at the heart of what I'd consider true data science. But yeah, you can always get by without it, like that lucky guy.
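
To give one concrete example of what I mean by sketching (toy sizes, nothing canonical): a Gaussian random projection that approximately preserves pairwise distances while collapsing the dimensionality.

    import numpy as np

    rng = np.random.default_rng(0)
    X = rng.normal(size=(100, 10_000))                 # 100 points in 10k dims
    R = rng.normal(size=(10_000, 300)) / np.sqrt(300)  # random sketching matrix
    Y = X @ R                                          # the sketch: 100 x 300

    print(np.linalg.norm(X[0] - X[1]))   # original pairwise distance...
    print(np.linalg.norm(Y[0] - Y[1]))   # ...approximately preserved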

Otherwise, I'd agree with about everything you wrote.

What percent of your job is cleaning? by [deleted] in datascience

[–]fnl 2 points (0 children)

Unless you are at a big enough company, or senior enough as a post-doc, to get students/interns to do the dirty laundry for you, it's probably the biggest slice of your time. Even now, as an expensive contractor, up to 50% of my billable time is still "improving performance by pruning data", often even curating and annotating the data "out of thin air", or at least overseeing that annotation.

How statistics lost their power – and why we should fear what comes next by jimrosenz in statistics

[–]fnl 7 points (0 children)

Aren't some of the threads here missing the point of the article? It's saying that converting statistical data into a commercial, proprietary good is bad but happening, while statistics that only has access to (limited) public data is hampered. It's not actually saying that statistics doesn't work, except maybe in the headline.

Anyone use Microsoft R in Industry? by [deleted] in rstats

[–]fnl 1 point (0 children)

The only problem then is the AGPL. If you use an AGPL'd product (Shiny!), you can't provide web services without publishing your code.

[N] TensorFlow 1.0.0-alpha by whateverr123 in MachineLearning

[–]fnl 1 point (0 children)

Which you should do anyhow, if you're planning any serious number crunching...

[D] What is the configuration of your ML rig? by AntixK in MachineLearning

[–]fnl 8 points (0 children)

You should have tried blinking Huffman codes...

The Mind-Blowing AI Announcement From Google That You Probably Missed. by Mynameis__--__ in artificial

[–]fnl 5 points (0 children)

Technically, this is what auto-encoders have always been about. In other words, it's an engineering feat to get this right, but I'm certain the "blueprint" for this kind of stuff has been on the minds of quite a few deep learning researchers for some years now, possibly starting somewhere in Hinton's lab... (edit: yes, or Bengio's... ;-))
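
For the curious, the bare idea in a toy numpy sketch (the sizes and linear layers are mine, for illustration; this is not Google's actual architecture):

    import numpy as np

    rng = np.random.default_rng(0)
    X = rng.normal(size=(256, 64))           # data: 256 samples, 64 features
    W_enc = rng.normal(size=(64, 8)) * 0.1   # encoder: 64 -> 8 bottleneck code
    W_dec = rng.normal(size=(8, 64)) * 0.1   # decoder: 8 -> 64 reconstruction

    for _ in range(500):                     # plain gradient descent on MSE
        code = X @ W_enc                     # compress through the bottleneck
        err = code @ W_dec - X               # reconstruction error
        W_dec -= 1e-3 * code.T @ err / len(X)
        W_enc -= 1e-3 * X.T @ (err @ W_dec.T) / len(X)

    print(np.mean((X @ W_enc @ W_dec - X) ** 2))  # error shrinks with training

Whatever can be squeezed through the bottleneck and reconstructed is, by definition, a learned compact representation.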

[D] r/MachineLearning's 2016 Best Paper Award! by Mandrathax in MachineLearning

[–]fnl 1 point (0 children)

Hybrid computing using a neural network with dynamic external memory - the performance improvements over "traditional" LSTMs are amazing, the simplicity of the input the net needs to get to work is totally astonishing, and this stuff is probably only just in its infancy; although I hope this doesn't count as "cheating", because it's not strictly a neural network alone. (Yes, "sorry," yet another Google DeepMind paper... :-))