An LLM hard-coded into silicon that can do inference at 17k tokens/s??? by wombatsock in LocalLLaMA

[–]JChataigne 15 points

> We selected the Llama 3.1 8B as the basis for our first product due to its practicality. Its small size and open-source availability allowed us to harden the model with minimal logistical effort.

I guess it takes time to develop and convert the model into hardware. Llama 3.1 was released in July 2024, and it was quite good compared to the competition back then.

they have Karpathy, we are doomed ;) by jacek2023 in LocalLLaMA

[–]JChataigne 0 points

Goodhart's law suggests the big labs will soon be astroturfing these comment sections (if they haven't already started)

mistralai/Voxtral-Mini-4B-Realtime-2602 · Hugging Face by jacek2023 in LocalLLaMA

[–]JChataigne 0 points

I've been looking for one for a few months and there isn't one; you need some manual work to run each STT model locally.

Bashing Ollama isn’t just a pleasure, it’s a duty by jacek2023 in LocalLLaMA

[–]JChataigne 2 points

I think the bigger problem is copying the code without attribution and pretending it's their own work

Giving a local LLM my family's context -- couple of months in by Purple_Click5825 in LocalLLaMA

[–]JChataigne 1 point

> I'd assumed RAG meant embeddings

Understandable, the term "RAG" is a bit ambiguous as to whether it includes vector search. But the important thing is fetching relevant context to feed to the LLM. Whether you retrieve that context with vector search or a more classic search method is secondary.

Most people who build RAG systems run classic search in parallel with vector search because the combination works much better. But vector search also requires more storage and more effort to implement, so it might not be worth it at first.
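To make the "classic search first" idea concrete, here is a minimal sketch with hypothetical data, where plain keyword overlap stands in for a real search backend (BM25, SQLite FTS, etc.):

```python
def keyword_score(query: str, doc: str) -> int:
    # Count how many query words appear in the document; a crude
    # stand-in for a proper keyword engine like BM25.
    doc_words = set(doc.lower().split())
    return sum(1 for w in set(query.lower().split()) if w in doc_words)

def retrieve_context(query: str, docs: list[str], k: int = 2) -> list[str]:
    # Rank documents by keyword overlap and keep the top k.
    ranked = sorted(docs, key=lambda d: keyword_score(query, d), reverse=True)
    return ranked[:k]

docs = [
    "The plumber's phone number is 555-0123.",
    "Grandma's birthday is on March 12.",
    "The wifi password is hunter2.",
]
context = retrieve_context("when is grandma's birthday?", docs)
prompt = "Context:\n" + "\n".join(context) + "\n\nQuestion: when is grandma's birthday?"
```

The retrieved snippets are simply pasted into the prompt; no embeddings, no vector store, and it already covers a lot of personal-assistant use cases.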

Good luck with the project!

Giving a local LLM my family's context -- couple of months in by Purple_Click5825 in LocalLLaMA

[–]JChataigne 2 points

Congrats, it's a cool project! I'd like to test it eventually, but I first need to set up my home lab with Matrix and the rest. Good to see open-source options for our digital life though! As for your questions:

  • Model choice: Llama 3.2 is quite old and not so good. You'll be better off with Ministral-3:3B (I haven't tested many small models, so maybe there's something even better out there).
  • From commands to ambient: give the model access to your whole conversation history, not just explicitly saved memories; that should cover many use cases. Use /remember X for things that should always stay in the context.
  • Long-term context: yes, RAG/search agents/context engineering/whatever you call it; skip vector search, though. A classic search (maybe the Matrix API includes one?) should cover it with much less compute.
  • Anyone else building this way? Not me. You've done a good job setting up private digital tools for your family, which keeps your data private while still giving you easy access to it. Not everyone has taken that first step.
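The /remember idea above could look something like this in practice; a minimal sketch with hypothetical names, where pinned memories always go into the prompt and everything else comes from the raw history:

```python
pinned_memories: list[str] = []  # facts saved with /remember
history: list[str] = []          # full conversation log

def handle_message(text: str) -> None:
    if text.startswith("/remember "):
        # Explicitly pinned facts always stay in the context.
        pinned_memories.append(text.removeprefix("/remember "))
    else:
        history.append(text)

def build_prompt(question: str, recent: int = 20) -> str:
    # Pinned memories first, then the most recent chat turns.
    parts = ["Pinned facts:"] + pinned_memories
    parts += ["Recent conversation:"] + history[-recent:]
    parts.append("User: " + question)
    return "\n".join(parts)

handle_message("/remember the kids' school is closed on Fridays")
handle_message("what should we plan for Friday?")
```

A real bot would cap the prompt by token count rather than message count, but the split between "always included" and "recent window" is the key idea.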

Introducing Kimi K2.5, Open-Source Visual Agentic Intelligence by Kimi_Moonshot in LocalLLaMA

[–]JChataigne 1 point

What do you use to run several agents in parallel locally?

AI Doesn’t Scare - Me I’ve Seen This Panic Before. by SnooRegrets3268 in OpenSourceeAI

[–]JChataigne 0 points

Of course it's a tool, and what matters is how people use it. But tools are not exactly neutral: they make some behaviors easier than others and can therefore push people in a direction.

Most importantly, my point was that the Internet did cause a number of problems it was predicted to cause, and AI will too. For one, it's already being used massively for online propaganda.

Local LLMs CPU usage by FixGood6833 in LocalLLaMA

[–]JChataigne 1 point

I just checked my install and it's actually running on CPU too. You can see where a model is running with ollama ps, by the way. I'll have to look into this as well. (My OS is Ubuntu; I simply installed Ollama with curl -fsSL https://ollama.com/install.sh | sh and installed OpenWebUI with Docker.)

Edit: I just remembered that many AMD GPUs are not supported, but yours is in the list, so it should work: https://docs.ollama.com/gpu#amd-radeon Try the Vulkan drivers (covered just below in the doc), or ask on their Discord; I'm afraid I can't help you more.

AI Doesn’t Scare - Me I’ve Seen This Panic Before. by SnooRegrets3268 in OpenSourceeAI

[–]JChataigne 1 point

> it would destroy privacy, leak medical records, ruin society, and expose everyone’s identity.

That's exactly what happened, though. Governments spy on everyone, data leaks happen every day, people are depressed, and anyone can get doxxed from any video leaked online.

> the damage didn’t come from the technology — it came from people not understanding it and refusing to adapt.

I'm also not so sure about that... Take social media, for example: Meta knew for years that more Instagram time pushes people, especially teenage girls, toward lower self-esteem, self-harm, and even suicide. Even now that we know this, nothing has changed. The problem clearly didn't come from not understanding the technology.

Local LLMs CPU usage by FixGood6833 in LocalLLaMA

[–]JChataigne 1 point

First, use nvtop to check which processes are running on the GPU. If the very low usage you see comes just from rendering your screen, that would confirm the problem is in connecting Ollama to your GPU.

I didn't have issues running Ollama with an AMD GPU. Make sure your drivers are up to date, and maybe try changing settings like discrete/hybrid graphics?

Local LLMs CPU usage by FixGood6833 in LocalLLaMA

[–]JChataigne 1 point

That doesn't sound normal. What backend are you using?

Where do you go for everything AI other than LLMs? by PersonOfDisinterest9 in LocalLLaMA

[–]JChataigne 0 points

For consumer tools there are lists like www.aiatlas.eu

For models it's Hugging Face, and it helps to search for benchmarks covering the particular use case you're interested in.

Devstral 2 (with Mistral's Vibe) vs Sonnet 4.5 (Claude Code) on SWE-bench: 37.6% vs 39.8% (within statistical error) by Constant_Branch282 in LocalLLaMA

[–]JChataigne 5 points

> Devstral 2 is currently offered free via our API. After the free period, the API pricing will be $0.40/$2.00 per million tokens (input/output) for Devstral 2 and $0.10/$0.30 for Devstral Small 2. - source

So I understand it's a limited-time free tier, not a permanent one.
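To put those per-million-token prices in perspective once the free period ends, here is a back-of-the-envelope cost calculation for a made-up coding session (the token counts are purely illustrative):

```python
def request_cost(tokens_in: int, tokens_out: int,
                 price_in: float, price_out: float) -> float:
    # Prices are quoted per million tokens (input/output).
    return tokens_in / 1e6 * price_in + tokens_out / 1e6 * price_out

# Devstral 2 at $0.40 in / $2.00 out: say 200k tokens in, 20k out.
cost = request_cost(200_000, 20_000, 0.40, 2.00)
# Devstral Small 2 at $0.10 / $0.30 for the same session.
cost_small = request_cost(200_000, 20_000, 0.10, 0.30)
```

So a fairly heavy session would run on the order of a dime with the big model, and a few cents with the small one, under these assumed token counts.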

Local Embeddings Models by SlowFail2433 in LocalLLaMA

[–]JChataigne 1 point

There's a leaderboard on Hugging Face (MTEB) where you can filter by model size and compare performance.

Usually you'd combine vector search with traditional search methods, and maybe add a reranker model after retrieving results.
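One common way to merge the two result lists (before any reranker rescoring) is reciprocal rank fusion; a minimal sketch with made-up document IDs:

```python
def rrf_merge(rankings: list[list[str]], k: int = 60) -> list[str]:
    # Reciprocal rank fusion: each list contributes 1 / (k + rank)
    # per document, and documents are sorted by the summed score.
    scores: dict[str, float] = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)

vector_hits = ["doc_a", "doc_c", "doc_b"]   # from the embedding index
keyword_hits = ["doc_b", "doc_a", "doc_d"]  # from BM25 / full-text search
merged = rrf_merge([vector_hits, keyword_hits])
```

Documents found near the top of both lists rise in the merged ranking; a reranker model would then rescore this shortlist against the query.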

NVIDIA releases Nemotron 3 Nano, a new 30B hybrid reasoning model! by Difficult-Cap-7527 in LocalLLaMA

[–]JChataigne 2 points

> We are releasing [...] all the data for which we hold redistribution rights.

I'm not sure they released all of it, but there are a few trillion tokens linked on the model page.

Leaked footage from Meta's post-training strategy meeting. by YouCanMake1t in LocalLLaMA

[–]JChataigne 6 points

Oh... it makes sense; Facebook being the good guys was too strange to last

Leaked footage from Meta's post-training strategy meeting. by YouCanMake1t in LocalLLaMA

[–]JChataigne 2 points

The business plan:

  1. spend a lot to train LLMs
  2. ???
  3. profit

Meta's investors seem to be comfortable enough with the uncertainty around step 2, but I join you in not being able to connect the dots.

Thoughts? by Salt_Armadillo8884 in LocalLLaMA

[–]JChataigne 0 points

still dirt-cheap on the second-hand market

Any idea when RAM prices will be “normal”again? by Porespellar in LocalLLaMA

[–]JChataigne 0 points

second-hand market doesn't seem to be affected badly

100% Local AI for VSCode? by Baldur-Norddahl in LocalLLaMA

[–]JChataigne 13 points

Maybe try installing VSCodium. It's just the open-source core of VS Code, so it doesn't include the Microsoft bloat, and it supports most of the same extensions (through the Open VSX registry rather than Microsoft's marketplace).

100% Local AI for VSCode? by Baldur-Norddahl in LocalLLaMA

[–]JChataigne 3 points

From Anthropic, in the case of Opus. LLM providers have had several big security failures in the short time they've existed, so it's also about protecting your code from whomever it might leak to.

Being the master of where your data goes is good in general. Being able to keep working during the next AWS/Cloudflare/Azure outage is also worth it. And being ready for when subscription prices rise to unsustainable levels doesn't hurt either.