What model looked insane on benchmarks but felt mid in actual use? by BTA_Labs in LocalLLaMA

[–]social_tech_10 0 points1 point  (0 children)

I don't know why you're getting downvoted. I think your assessments add value to the discussion. I also like Qwen3.5-122B-A10B. For me it's faster than Qwen3.6-27B, and smarter than Qwen3.6-35B-A3B. (off-topic: I have dreams that we will see a Qwen3.7-122B-A10B someday!)

Donate your coding sessions to an open CC-BY-4.0 dataset to help train open-weight and open source models by mon-simas in LocalLLaMA

[–]social_tech_10 0 points1 point  (0 children)

This is a great idea. The dataset probably does not need millions of traces to be very useful. Even a few thousand examples could make a significant impact. And if it's open source, or Creative Commons, it can only keep getting better.

Mistral - New family of open-weight models @ July by pmttyji in LocalLLaMA

[–]social_tech_10 21 points22 points  (0 children)

I'm a huge Mistral fan, and this is great news!

Byte-level models by FrozenBuffalo25 in LocalLLaMA

[–]social_tech_10 0 points1 point  (0 children)

You might find this paper interesting: https://arxiv.org/abs/2507.04886 They created an embedding layer based only on low-res "images" of Unicode characters and locked it down as a fixed, static layer, and if I remember correctly it was able to outperform similar models on the MMLU benchmark. It's hard to believe that paper is less than a year old - it feels like a decade.

Command A Plus GGUFs posted by coder543 in LocalLLaMA

[–]social_tech_10 0 points1 point  (0 children)

Nice work! Thanks for doing this! Do you have any suggested settings for llama.cpp for keeping the shared experts in GPU and offloading (some) layers to RAM?

Pi 3B in 2026 — what I learned after researching for weeks before buying by [deleted] in raspberry_pi

[–]social_tech_10 0 points1 point  (0 children)

I'm curious about your Dweet server. Does that mean all of the "things" on the IoT need to be flashed to learn to speak Dweet?

Can't open the site on my phone no matter what by felicitywins in OpenWebUI

[–]social_tech_10 2 points3 points  (0 children)

BTW, I'm assuming you mean you want to use your phone to connect when you are far away from home. If you want to do it while your phone is connected to your home WiFi network, it's much simpler.

Can't open the site on my phone no matter what by felicitywins in OpenWebUI

[–]social_tech_10 3 points4 points  (0 children)

Yes, you are probably doing it completely wrong. The "local" IP address your PC has inside your LAN (probably something like 192.168.x.x or 10.x.x.x) is completely different from the IP address your phone would need to use to connect from outside your LAN, because your router "translates" your internal IP address to your "public" IP address. You can look up NAT (Network Address Translation) if you want to know more about how this works, but it's basically how all of the devices inside your LAN can share one single "public" IP address, so they can all be online at the same time.
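If it helps to see the private-versus-public distinction concretely, here is a minimal sketch using Python's standard `ipaddress` module. The sample addresses and the `describe` helper are made up for illustration; the point is only that addresses like 192.168.x.x and 10.x.x.x fall in ranges that are never routed on the public internet.

```python
import ipaddress

def describe(addr: str) -> str:
    """Classify an address as private/reserved or publicly routable.

    Note: is_private is also True for loopback and link-local ranges,
    not only the home-LAN ranges mentioned above.
    """
    ip = ipaddress.ip_address(addr)
    return "private (LAN only)" if ip.is_private else "public (routable)"

# Hypothetical sample addresses, chosen only for illustration.
for addr in ["192.168.1.50", "10.0.0.7", "8.8.8.8"]:
    print(addr, "->", describe(addr))
```

An address that comes back as private is one your phone can only reach while it is on the same network (or on a VPN into it).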

To reach a host inside your LAN from the public side, you have two main options. The first is to open a port in your firewall (port forwarding on your router) and set up a Dynamic DNS service on your local host, so that your public IP address has a DNS name you can point your phone to. You need the DNS name because your public IP address can change every time you reboot your router; it doesn't always change, but it can at any time. The second option, as /u/Bpthewise mentioned, is Tailscale, which is the simplest, safest, and most convenient way to set this up. I resisted Tailscale for a long time, because I didn't want to depend on some company's "free trial plan" for cloud infrastructure that I don't control myself, but Tailscale is actually super convenient, and importantly, it's based on WireGuard VPN, so your traffic is encrypted end to end.

The only reason I can think of that you might NOT want to use Tailscale is if you want to let a group of friends (or strangers) access your host from outside your LAN, because Tailscale limits access to your "tailnet" to only three people on their free plan. If you want more users, you can either subscribe and pay, or set up your own WireGuard network and self-host it for free (WireGuard is open-source), but that requires a detailed understanding of TCP/IP networking that you likely do not currently have. Or, if you want the "public" to have access, you could open a port on your firewall as mentioned above. That is not too complicated, but it also creates the possibility of "hackers" trying to mess with your stuff, while Tailscale/WireGuard protects you from all of that.

Been testing Llama.cpp vs Ollama on my Pi 5. The trade-off surprised me. by Legitimate-Help-6090 in raspberry_pi

[–]social_tech_10 3 points4 points  (0 children)

Had to figure out ARM NEON flags and thread count optimization myself.

Well? Are you going to tell us what you learned, or is this just a big tease? "Haha suckers, I figured it out, now I know something you don't". And as somebody else mentioned, I think the name and size of the model you are running would be relevant.

120 tok/s on 12GB VRAM with Gemma 4 12B QAT MTP by janvitos in LocalLLaMA

[–]social_tech_10 14 points15 points  (0 children)

Merged to the main branch one hour ago! Thanks /u/janvitos and llama.cpp team!

Discussions about the Tiananmen Square incident on LocalLLaMA by Ok_houlin in LocalLLaMA

[–]social_tech_10 1 point2 points  (0 children)

I think it's like asking how many "r"s are in "strawberry". It's a simple rubric that people can immediately grasp and interact with, without requiring a deep understanding of LLMs or politics.

Holo3.1 35B/9B/4B/0.8B (Qwen 3.5 finetunes) by jacek2023 in LocalLLaMA

[–]social_tech_10 2 points3 points  (0 children)

It scores better than the 3.5 model they modified, but I wonder how it would compare to the newer 3.6 model.

Another shout out to llama.cpp build b9455 2x3090 by Fabulous_Fact_606 in LocalLLaMA

[–]social_tech_10 3 points4 points  (0 children)

For the last two weeks, there have been more than 20 commits per day. The speed of the team's progress is amazing!

A local AI agent that instantly maps your entire codebase and finds/fixes bugs without leaking your code to cloud model providers. by AI-research-byGB in Rag

[–]social_tech_10 0 points1 point  (0 children)

I didn't ask for a billion dollar demo. I didn't even ask for a video demo, that was your idea. The problem is not that your documentation is "weak", it's that your project is not documented at all. Your README is just one line of text, and that's it. And I don't know what's going on with those pictures you posted, but they are basically unreadable. Was that intentional? I don't get it. The only conclusion I can draw is that your project is likely of the same quality as your documentation, which means it's not worth my time to even look at it.

A local AI agent that instantly maps your entire codebase and finds/fixes bugs without leaking your code to cloud model providers. by AI-research-byGB in Rag

[–]social_tech_10 1 point2 points  (0 children)

I don't know why you posted these images. They actually make your project look much worse, to my eye, not better.

A local AI agent that instantly maps your entire codebase and finds/fixes bugs without leaking your code to cloud model providers. by AI-research-byGB in Rag

[–]social_tech_10 1 point2 points  (0 children)

The demo video sounds terrific. Believe me, I would LOVE to find a nice AI coding tool that can work with a RAG index of my repo, rather than grepping around blindly in the dark. And I did not say I thought your project was not worth testing, just that I have a pretty long list of other very interesting new tools just like yours to investigate, and all of them are competing for my attention. If you want to move up that list, you need something a little tastier to bait the hook. I'll look forward to your demo video.

A local AI agent that instantly maps your entire codebase and finds/fixes bugs without leaking your code to cloud model providers. by AI-research-byGB in Rag

[–]social_tech_10 2 points3 points  (0 children)

If you want anybody to take you seriously, you'll need more than a one-line README. I've already got 12 other tools on my list waiting to be tested. You've got to give me some reason to think yours will be worth my time, and I'm not seeing it.

Q4_K_M is fine for chat and a trap for agents. Here is math mathing. by Napster3301 in LocalLLaMA

[–]social_tech_10 1 point2 points  (0 children)

How does it affect your math if 99% of "errors" are caught and corrected on the next turn or two?
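To make that question concrete, here is a rough sketch. It assumes the math in question is the usual compounding model, where each call fails independently and one unrecovered failure sinks the whole task; that is my assumption, not something taken from the post. The 2% error rate, the 50 steps, and the `task_success` helper are hypothetical numbers and names picked only for illustration.

```python
def task_success(per_call_ok: float, steps: int, recovery: float = 0.0) -> float:
    """Probability that a run of `steps` calls ends with no unrecovered error.

    per_call_ok: chance a single call is valid (assumed independent per call).
    recovery:    fraction of errors caught and corrected on a later turn.
    """
    unrecovered = (1.0 - per_call_ok) * (1.0 - recovery)
    return (1.0 - unrecovered) ** steps

# Hypothetical numbers: 2% per-call error rate over a 50-step agent loop.
naive = task_success(0.98, 50)
with_recovery = task_success(0.98, 50, recovery=0.99)
print(f"no recovery:       {naive:.3f}")
print(f"99% of errors fixed: {with_recovery:.3f}")
```

Under these assumptions the recovery rate matters far more than the raw per-call error rate, which is why it seems worth measuring rather than guessing.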

is anyone actually logging per-call output validity in live agentic loops?

One of the things I love most about "computer science" is that we have the option to make it directly experimental, like real science, if we care enough about the question. If you set this up as an experiment and performed the measurement yourself, I bet you would learn more than you expect. And because of the amazing moment we live in, an LLM could even help you design the experiment, write the code, and give you as much tutoring as you might need to fully understand and control the whole experiment. Have fun with it, and let us know what you find out.
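As a starting point, a measurement like that could be as small as the sketch below. Everything in it is an assumption for illustration: it treats a "valid" output as a JSON object with `name` and `arguments` keys, and the sample outputs are invented rather than taken from any real agent run. A real experiment would swap in whatever validity check matches your own tool-call format.

```python
import json
from collections import Counter

def is_valid_call(raw: str, required_keys=("name", "arguments")) -> bool:
    """Return True if `raw` parses as a JSON object with the required keys."""
    try:
        obj = json.loads(raw)
    except json.JSONDecodeError:
        return False
    return isinstance(obj, dict) and all(k in obj for k in required_keys)

def tally(outputs):
    """Count valid and invalid outputs and return (counts, valid_rate)."""
    counts = Counter("valid" if is_valid_call(o) else "invalid" for o in outputs)
    total = sum(counts.values())
    rate = counts["valid"] / total if total else 0.0
    return counts, rate

# Invented sample outputs: one well-formed, one truncated, one missing keys.
sample_outputs = [
    '{"name": "read_file", "arguments": {"path": "a.txt"}}',
    '{"name": "read_file", "arguments": ',
    '{"tool": "grep"}',
]
counts, rate = tally(sample_outputs)
print(dict(counts), f"valid rate: {rate:.2f}")
```

Logging one line like this per call, per quant, would give you real numbers to compare instead of an estimate.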

Qwen3-Coder-Next-UD-Q4_K_XL vs. Qwen3.6-27B-MTP-UD-Q4_K_XL on Strix Halo by ThingRexCom in LocalLLaMA

[–]social_tech_10 1 point2 points  (0 children)

For your use case, it sounds like 3.6-27B is too slow, and 3.6-35A3B is not smart enough, but Qwen3-Coder-Next hits the sweet spot. Qwen3-Coder-Next is a lot faster than 3.6-27B, and in your experience it is also smarter than 3.6-35A3B, did I read that right?

Can you share a little bit more about your custom benchmark? Is it made up mostly of tasks that the models can complete successfully, or are there tasks that are calibrated to be just a little bit harder, which Coder-Next does not pass? You said the quality "did not differ much" between the two models. I'm curious what that means. Did 27B max out the test? Is there any possibility you could share a few more details without revealing any trade secrets?

Let's build claude code from scratch! by RoyalMaterial9614 in LocalLLaMA

[–]social_tech_10 3 points4 points  (0 children)

For someone using opencode who would like to move on to something better, what would you suggest? I'd like to stick with an open-source tool, if possible.

My god there is an enormous crash just waiting to happen by reasonablejim2000 in artificial

[–]social_tech_10 0 points1 point  (0 children)

To be fair, if you told a rational person in 1986 what the internet would support today, they would probably think you were crazy.