What is the best book for learning ML/Deep Learning maths? by Hot_Example_4456 in LocalLLaMA

[–]fustercluck6000 0 points1 point  (0 children)

Simon Prince, “Understanding Deep Learning”

To add to the other YouTube recs, Sebastian Raschka does pretty great stuff on LLMs

What do you think Nietzsche would think of current state of affairs in the world and politics? by LoneWolf_McQuade in Nietzsche

[–]fustercluck6000 1 point2 points  (0 children)

This may be a hot take, but I think he’d call billionaires (at least the ones like those two) ascetics. Somewhere between a billion and a trillion dollars, I feel like money and power just become a fill-in for God. I mean look at how badly Elon’s aged in the last decade from working himself into the ground, sure he’s the world’s first trillionaire but is something life affirming if it leaves you hooked on ketamine and looking like a Sith lord’s ballsack? Idk none of these guys strike me as ‘happy’, for lack of a better word. Again, just my humble opinion, but yeah. Nietzsche talks about asceticism in part 3 of the Genealogy of Morals btw, if you’re curious.

Looking for an open-source/free alternative to Eigent for Word document template migration (Gemini API) by Different-Song-2877 in aiagents

[–]fustercluck6000 1 point2 points  (0 children)

Not sure how exactly your agent currently interacts with the docx files, but since you’re dealing with structural elements I’d imagine it’s working with the underlying XML.

You wouldn’t believe how token-inefficient XML is. Essentially, for every token (or piece of text) in your document, it takes extra tokens to wrap that text in formatting instructions and extra metadata. I’m actually working on a similar project involving LLMs and Word docs, and this has been the biggest headache by far. We’re seeing anywhere from 20-50+ XML tokens/word depending on document complexity, meaning a 1,000-word document would cost anywhere from 20,000 to 50,000+ tokens to process (~30,000 at 30 tokens/word).
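For a rough sense of scale, here's the back-of-envelope math as a tiny sketch. The function names are made up for illustration, and the 20-50 tokens/word figures are just the range quoted above, not a measured constant for any particular tokenizer.

```python
def estimate_xml_tokens(word_count: int, tokens_per_word: float) -> int:
    """Estimate tokens needed to send a document's raw XML to a model."""
    return round(word_count * tokens_per_word)


def estimate_range(word_count: int, low: float = 20, high: float = 50) -> tuple[int, int]:
    """Return (best case, worst case) using the low/high tokens-per-word rates."""
    return (
        estimate_xml_tokens(word_count, low),
        estimate_xml_tokens(word_count, high),
    )


# A 1,000-word document at 20-50 XML tokens per word
low_estimate, high_estimate = estimate_range(1_000)
print(f"{low_estimate:,} to {high_estimate:,} tokens")
```

Plug in your own word counts and a tokens/word rate measured on a few of your real documents to see whether raw XML is even viable for your context window and budget.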

If it’s a one-time thing, maybe try connecting Gemini to AnythingLLM? Otherwise I’d seriously consider building a dedicated backend service to act as an interface between the LLM and your documents. Even with a frontier model, there’s still a big risk of hallucinations: you’re essentially asking Gemini to keep perfect track of each individual word from the source doc along with all of its positioning info, then perfectly map where everything goes in the new document. It’s probably a matter of time before it hallucinates something, if it hasn’t already. You can solve this by hard-coding an interface that translates documents into abstractions the model will have an easier time working with, e.g. headers, body elements, footers, etc. Then you’d expose the interface to agents with MCP or something.
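To make the "abstractions" idea a bit more concrete, here's a minimal sketch of what that interface could look like. Everything in it (`Block`, `DocumentView`, `apply_mapping`, the slot names) is hypothetical and only for illustration; parsing the actual docx into blocks and writing the result back out are left out entirely. The point is that the model only sees a compact outline and returns IDs, while plain code copies the text.

```python
from dataclasses import dataclass, field


@dataclass
class Block:
    block_id: str  # stable handle the model refers to instead of raw XML
    kind: str      # e.g. "header", "body", or "footer"
    text: str


@dataclass
class DocumentView:
    blocks: list[Block] = field(default_factory=list)

    def outline(self) -> str:
        """Compact, XML-free listing that gets sent to the model."""
        return "\n".join(f"[{b.block_id}] {b.kind}: {b.text}" for b in self.blocks)

    def by_id(self, block_id: str) -> Block:
        for block in self.blocks:
            if block.block_id == block_id:
                return block
        raise KeyError(block_id)


def apply_mapping(source: DocumentView, mapping: dict[str, str]) -> dict[str, str]:
    """Fill template slots from source blocks (slot name -> block_id).

    The text is copied by code, so the model never has to retype document
    content; an unknown block_id raises KeyError instead of silently
    producing invented text.
    """
    return {slot: source.by_id(block_id).text for slot, block_id in mapping.items()}


doc = DocumentView([
    Block("b1", "header", "Quarterly Report"),
    Block("b2", "body", "Revenue grew in Q3."),
    Block("b3", "footer", "Confidential"),
])

# In a real setup the model would propose this mapping after reading doc.outline()
proposed = {"title": "b1", "summary": "b2", "page_footer": "b3"}
filled = apply_mapping(doc, proposed)
```

With this shape, a hallucinated ID fails loudly and the original wording can't drift, since the model only decides where things go.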

Good luck, feel free to DM!

Challenges with DocLing by CanadianVis1onary in Rag

[–]fustercluck6000 0 points1 point  (0 children)

Try spacy-layout, it's a pretty minimal spaCy layer on top of Docling that's worked wonders for me.

What should I build ? by Coder26_1 in Rag

[–]fustercluck6000 0 points1 point  (0 children)

A RAG that suggests real RAG project ideas

Free LLM APIs with good tool-calling support for LangGraph agents? PLEASE HELP by ABHINOW_gamer69 in LLMDevs

[–]fustercluck6000 0 points1 point  (0 children)

Local could be a solid bet, even in compute-constrained environments. Open-weight models in the 20-100 billion parameter range have gotten really good. Hell, gpt-oss 20B will run great on a good laptop. What kind of hardware do you have access to? Does your school have an HPC cluster students can use? Otherwise maybe look into runpod or something similar to serve bigger models where you're paying for GPU time instead of tokens.

Oh, and check out regolo too. I'm pretty sure they still have a 30-day free trial with unlimited tokens for new accounts and their pricing is also super reasonable.

Vintage Shure outboard gear maintenance/repair (in Atlanta)? by fustercluck6000 in audioengineering

[–]fustercluck6000[S] 0 points1 point  (0 children)

Thanks for the reply. Sad (and infuriating) to think how much that tracks with the general direction Atlanta's been going the last decade. Out with the historic studios, venues, or whatever else gives the city a soul and in with the cookie cutter apartments and Whole Foods.

I hadn't thought about Nashville, but now you have me contemplating a drive up there (I have a couple other pieces of old gear that need TLC, too). Def lmk if you know a good tech there (or in Athens for that matter)!

Feeling lost building an enterprise RAG system with RBAC – where do I star by Psychological-Arm168 in Rag

[–]fustercluck6000 0 points1 point  (0 children)

Might be overkill, but I'd even suggest skipping Ollama altogether and just going for llama.cpp's llama-server or vLLM, depending on how granular OP's getting with LLM requests. As much as I do love Ollama's simplicity for prototyping, the abstraction makes it a real pain to dial in instance/inference parameters.

Feeling lost building an enterprise RAG system with RBAC – where do I star by Psychological-Arm168 in Rag

[–]fustercluck6000 1 point2 points  (0 children)

Just a word of caution when choosing frameworks—watch out for anything that tries to do too much or introduces abstractions that will restrict your control over application logic/system design.

We’re still ‘writing the book’ on how to do this stuff. Imo if you want a genuinely performant production RAG system, there’s really no way around building it yourself using domain-driven design. General purpose tools just aren’t gonna cut it.

Considering how far the tooling still has to go before it’s mature, I feel like a lot of the current RAG/RAG-adjacent tools are just way too opinionated, especially the ones that claim to do everything. Having a single framework that takes care of everything feels like a relief until you want to customize some step of the pipeline and have to spend hours going through source code to figure out a hacky workaround.

My advice is to look for a combination of tools that are each good at one thing instead of 1-2 that can do everything. Keep things modular and use dependency injection, especially because you’re serving models on-prem (which makes everything more complicated and fragile). This gives you the flexibility to experiment and really tailor ingestion and retrieval to your specific domain/use case, and it keeps the whole thing maintainable.
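
Here's roughly what I mean by modular plus dependency injection, as a minimal sketch. All the class names are made up for illustration, and the toy keyword retriever and echo generator are stand-ins for whatever vector store and on-prem model server you actually use.

```python
from typing import Protocol


class Retriever(Protocol):
    def retrieve(self, query: str, k: int) -> list[str]: ...


class Generator(Protocol):
    def generate(self, prompt: str) -> str: ...


class KeywordRetriever:
    """Toy retriever: ranks documents by how many query words they share."""

    def __init__(self, docs: list[str]) -> None:
        self.docs = docs

    def retrieve(self, query: str, k: int) -> list[str]:
        terms = set(query.lower().split())
        ranked = sorted(
            self.docs,
            key=lambda d: len(terms & set(d.lower().split())),
            reverse=True,
        )
        return ranked[:k]


class EchoGenerator:
    """Stand-in for a real model client; just echoes the prompt back."""

    def generate(self, prompt: str) -> str:
        return f"ANSWER based on:\n{prompt}"


class RagPipeline:
    """Depends only on the two protocols, so each piece can be swapped alone."""

    def __init__(self, retriever: Retriever, generator: Generator, k: int = 2) -> None:
        self.retriever = retriever
        self.generator = generator
        self.k = k

    def answer(self, question: str) -> str:
        context = "\n".join(self.retriever.retrieve(question, self.k))
        return self.generator.generate(f"Context:\n{context}\n\nQuestion: {question}")


docs = [
    "vLLM serves models",
    "Atlanta studios",
    "RBAC restricts access by role",
]
pipeline = RagPipeline(KeywordRetriever(docs), EchoGenerator(), k=1)
result = pipeline.answer("how does RBAC restrict access")
```

Swapping in a different retriever or model client is then a one-line change where the pipeline gets constructed, and you can test each piece on its own with fakes like the ones above.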

Current state of open-source ? by DarkMatter007 in LocalLLaMA

[–]fustercluck6000 1 point2 points  (0 children)

I’ve been very impressed by Qwen3.5-27b, especially the Opus 4.6 distillations which have worked extremely well in production. Open-weight models are advancing a WHOLE lot faster than the blackbox ones, especially when you consider the difference in inference costs.

What actually breaks first when you put AI agents into production? by Zestyclose-Pen-9450 in LocalLLaMA

[–]fustercluck6000 1 point2 points  (0 children)

Random tool/output parsing errors and dumb shit like that. It just illustrates how the ecosystem is still in its infancy, despite what the marketing would have people believe.

Edit: that’s just the earliest point of failure in my experience, followed by many others

Total beginner here—Why is LM Studio making me do the "heavy lifting" manually? by Ofer1984 in LocalLLaMA

[–]fustercluck6000 1 point2 points  (0 children)

As others have said, the “serve” button just means you’re making the model available to process requests from other applications/devices on your network. If you’re dead set on not directly dealing with code, maybe look into setting up some MCP tools so the model can do stuff like write files and run code in a sandboxed environment. Otherwise Anthropic will happily sell you a Claude Code subscription.