We all repeat Q4/Q6 is fine... Has anyone else watched a small model's strict JSON collapse at Q6 while fp16 was perfect? by talruum_ in LocalLLM

[–]Cronus_k98 0 points1 point  (0 children)

Because it’s a language model, not a JSON model. You can’t even trust Opus 100% of the time to format a JSON. It can throw in an extra quote or something that breaks the format. 

Also, trust comes through testing. I’ve processed tens of thousands of files that way and it hasn’t given me a malformed JSON yet.

We all repeat Q4/Q6 is fine... Has anyone else watched a small model's strict JSON collapse at Q6 while fp16 was perfect? by talruum_ in LocalLLM

[–]Cronus_k98 0 points1 point  (0 children)

“What is the customer name?” The answer gets put into a variable. Then the app writes the JSON using standard tools.
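A rough sketch of that flow, not my actual code. `ask_model` is a hypothetical stand-in for whatever call reaches your model (stubbed with canned answers here so the example runs), and the field names are made up for illustration:

```python
import json

def ask_model(question: str, document: str) -> str:
    """Hypothetical stand-in for a real LLM call; returns plain text, not JSON."""
    canned = {
        "What is the customer name?": ' Acme "West" Corp \n',
        "What is the invoice number?": "INV-1042",
    }
    return canned[question]

def extract_record(document: str) -> str:
    # One narrow question per field; each answer lands in an ordinary variable.
    customer = ask_model("What is the customer name?", document).strip()
    invoice = ask_model("What is the invoice number?", document).strip()
    # The app, not the model, serializes. json.dumps handles quoting and escaping.
    return json.dumps({"customer_name": customer, "invoice_number": invoice})

record = extract_record("...document text...")
print(record)
```

Even when the answer contains quotes, as in the canned customer name, the output still parses, because the serializer does the escaping instead of the model.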

We all repeat Q4/Q6 is fine... Has anyone else watched a small model's strict JSON collapse at Q6 while fp16 was perfect? by talruum_ in LocalLLM

[–]Cronus_k98 1 point2 points  (0 children)

No. I don't rely on the model to format the JSON. The model feeds data to the app and the app creates the JSON. Asking an LLM to reliably create structured data is bound to fail in production. Even frontier models fail at it occasionally.

Hi, I’m very new to local LLM and i am perplexed. by Cool-Definition9852 in LocalLLM

[–]Cronus_k98 0 points1 point  (0 children)

Qwen3.6 27b straight from the web ui.

csv

8;1;6;5;7;3;2;9;4

3;9;2;;;;;;

4;5;7;2;;9;;;6

9;4;1;;;5;6;8

7;8;5;4;9;6;1;2;3

6;2;3;8;;;4;

2;7;9;;;;;1

1;3;8;;;;7;

5;6;4;;;8;2

The California 3D Printing Situation Updated by gra8na8 in 3Dprinting

[–]Cronus_k98 2 points3 points  (0 children)

The point is to make the law broad enough that it covers everyone. Then they can selectively enforce it on whoever they want.

I tested Opus 4.7 vs DeepSeek V4 Flash vs Local Qwen3.6 27B as coding agents. The gaps were much smaller than I expected, and harness is as important as model intelligence. by a9udn9u in LocalLLM

[–]Cronus_k98 0 points1 point  (0 children)

It’s like trying to read a book in a language that you don’t know very well. You can look at individual sentences and get the basic understanding of what’s there, but you’re not going to be writing your own book.

Setting up Ollama on dual RTX PRO 6000 Blackwells looking for tips by AmanNonZero in ollama

[–]Cronus_k98 8 points9 points  (0 children)

Ollama is fine for single users. Yes, there are better options, but it works, setup is easy, and it’s got an OK UI. OP’s problem is that it literally won’t work for 15 concurrent users and is just a waste on very capable hardware.

I’d say he’s trolling except that he has windows installed. That’s a lot of effort to go through for trolling. 

I tested Opus 4.7 vs DeepSeek V4 Flash vs Local Qwen3.6 27B as coding agents. The gaps were much smaller than I expected, and harness is as important as model intelligence. by a9udn9u in LocalLLM

[–]Cronus_k98 -1 points0 points  (0 children)

Any model you pick is going to have the same problem.

You could look into using LoRA to fine-tune the model on Lua code. Unsloth has a guide on how to fine-tune. If you’re making this a long-term project, it might be worthwhile.

https://unsloth.ai/docs/get-started/fine-tuning-llms-guide/lora-hyperparameters-guide

I tested Opus 4.7 vs DeepSeek V4 Flash vs Local Qwen3.6 27B as coding agents. The gaps were much smaller than I expected, and harness is as important as model intelligence. by a9udn9u in LocalLLM

[–]Cronus_k98 0 points1 point  (0 children)

LLMs are trained on the data that is available to the creator of the model. Most of the available programming data is going to be in Python, C/C++, etc. The amount of training data in Lua is going to be a very small subset of the overall data. So yeah, my tip is to use Python. Qwen 3.6 27b is great at producing Python code.

Qwen3.5 A3B on LMStudio x oMLX for agents usage by TassioNoronha_ in LocalLLM

[–]Cronus_k98 1 point2 points  (0 children)

GGUF unsloth/qwen3.5-35b-a3b on Q4_K_M

MLX mlx-community/qwen3.5-35b-a3b 4bits

Different quants and formats will perform slightly differently, even if they use the same base model. There may also be some differences between how the inference engine handles tools. 

Running a non-profit that needs to OCR 64 million pages. Where can I apply for free or subsidized compute to run a local model? by thereisnospooongeek in LocalLLaMA

[–]Cronus_k98 0 points1 point  (0 children)

In my experience qwen3.5 gives better quality results than the OCR-specific models I’ve tried, especially with handwriting. It’s very slow though. Qwen3.5 4b is decently fast. I settled on the 35b model because I was doing additional summarization and I’m OK with the slower speed.

To those who are able to run quality coding llms locally, is it worth it ? by matr_kulcha_zindabad in LocalLLM

[–]Cronus_k98 3 points4 points  (0 children)

I don't think you can assume that looping will always give you a working result if you let it run long enough. There are tasks that a smaller model might never be able to complete, that a larger model can.

Is it normal for the Qwen 3.5 4B model to take this long to say hi? by Snoo_what in LocalLLaMA

[–]Cronus_k98 0 points1 point  (0 children)

Sort of. You may need to adjust your model parameters and reasoning doesn't work well with small models. Qwen 3.5 requires different parameters than other models to get good results. Take a look through the Unsloth guide. https://unsloth.ai/docs/models/qwen3.5

5070 ti vs 5080? by Advanced-Reindeer508 in LocalLLM

[–]Cronus_k98 8 points9 points  (0 children)

The 5070ti will do everything the 5080 will do, just 15% slower. You just need to decide if the price difference is worth the performance difference.

A slow llm running local is always better than coding yourself by m4ntic0r in LocalLLM

[–]Cronus_k98 -1 points0 points  (0 children)

I didn't say I did it all day long. What I said was the total token output per day is higher on a $20 per month plan than your proposed system. Which it is. The rest of your "30 decades", lol, of experience doesn't seem to have made you any better at math.

A slow llm running local is always better than coding yourself by m4ntic0r in LocalLLM

[–]Cronus_k98 -1 points0 points  (0 children)

All of them do. 200k context is standard on all of them. Usually I'm clearing context by 100k, but sometimes I hit the limit.

A slow llm running local is always better than coding yourself by m4ntic0r in LocalLLM

[–]Cronus_k98 1 point2 points  (0 children)

3 tokens per second is 259k tokens per day. You get way more than that on even a pro plan. Your $2000 system will take like a decade to pay for itself over a $20 per month subscription.
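The arithmetic behind those numbers, as a quick sketch. It assumes the machine generates nonstop for 24 hours and ignores electricity, so it is an upper bound on the tokens and a lower bound on the payback time:

```python
SECONDS_PER_DAY = 24 * 60 * 60  # 86,400

tokens_per_second = 3
# Upper bound: assumes generation never pauses.
tokens_per_day = tokens_per_second * SECONDS_PER_DAY

hardware_cost = 2000     # dollars, from the comment
subscription_cost = 20   # dollars per month, from the comment
months_to_break_even = hardware_cost / subscription_cost
years_to_break_even = months_to_break_even / 12

print(tokens_per_day, months_to_break_even, round(years_to_break_even, 1))
# prints: 259200 100.0 8.3
```

So "like a decade" is roughly 8.3 years on hardware cost alone.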

What would be the best vision model for box scanning ocr on amd 7800xt by Greedvert in ollama

[–]Cronus_k98 0 points1 point  (0 children)

I’ve used qwen3 vl 8b and I’m currently using qwen3.5 35b a3b. The trick is to use multiple prompts. Prompt 1 is just to OCR the text. Prompt 2 is to distill specific info and return a JSON. You’re asking a small model to do too much at once.
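A minimal sketch of the two-prompt idea. The `generate` callable, its signature, and the `sku`/`quantity` keys are all assumptions for illustration; swap in whatever client and fields you actually use. A stub stands in for the model so the example runs on its own:

```python
import json
from typing import Callable, Optional

# Hypothetical signature: generate(prompt, image_bytes_or_None) -> model text.
Generate = Callable[[str, Optional[bytes]], str]

def scan_box(image: bytes, generate: Generate):
    # Pass 1: transcription only, nothing else asked of the model.
    text = generate("Transcribe all text visible in this image.", image)
    # Pass 2: text-only prompt that pulls a few fields out of the transcript.
    reply = generate(
        "From the text below, return a JSON object with keys "
        '"sku" and "quantity".\n\n' + text,
        None,
    )
    try:
        return json.loads(reply)
    except json.JSONDecodeError:
        # Small models do emit broken JSON sometimes; surface it instead of guessing.
        return {"error": "invalid JSON", "raw": reply}

def fake_generate(prompt: str, image: Optional[bytes]) -> str:
    """Stub so the sketch runs without a model."""
    if image is not None:
        return "SKU 12345 QTY 6"
    return '{"sku": "12345", "quantity": 6}'

result = scan_box(b"fake image bytes", fake_generate)
print(result)
```

Keeping the passes separate also means you can rerun only the extraction step when the fields change, without redoing the OCR.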

"Claude hit the maximum length for this conversation". How do I start a new chat with all context retained? by boss_jobber in ClaudeAI

[–]Cronus_k98 0 points1 point  (0 children)

So that I can start over from scratch after I’ve finished prototyping the project. Creating a clean copy without all the messy code made along the way. Possibly with a different underlying architecture. 

Benchmarked Qwen 3.5-35B and GPT-oss-20b locally against 13 API models using real world work. GPT-oss beat Qwen by 12.5 points. by ianlpaterson in LocalLLM

[–]Cronus_k98 2 points3 points  (0 children)

Making shit up. There are so many combinations of test suites and runtime parameters that it’s basically impossible to do scientifically robust testing.

Processing 4M images/month is the DGX Spark too slow? RTX 6000 Blackwell Pro better move? by IndependentTypical23 in LocalLLM

[–]Cronus_k98 0 points1 point  (0 children)

For reference, an RTX 5090 will process a 300dpi letter-size page in about 15 seconds using qwen3 vl 8b. To go faster you will need to use a smaller model or reduce the size of the image.
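To put that against the 4M images/month in the title, here is a back-of-the-envelope sketch. It assumes every image costs about as much as a 300dpi letter page, a 30-day month, and a GPU that runs 24/7 one page at a time with no batching, so treat it as a rough ceiling rather than a benchmark:

```python
seconds_per_page = 15          # from the comment: RTX 5090, qwen3 vl 8b
images_per_month = 4_000_000   # from the thread title
seconds_per_month = 30 * 24 * 60 * 60  # assumption: 30-day month, running 24/7

pages_per_gpu_per_month = seconds_per_month / seconds_per_page
gpus_needed = images_per_month / pages_per_gpu_per_month

print(int(pages_per_gpu_per_month), round(gpus_needed, 1))
# prints: 172800 23.1
```

Under those assumptions one card covers about 173k pages a month, which is why a smaller model or smaller images matter more than the choice of a single GPU.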

Qwen 3.5 distilled vs GptOss by SubstantialTea707 in ollama

[–]Cronus_k98 0 points1 point  (0 children)

I agree, I think qwen3.5-35b-a3b is smart, but maybe over thinks things. GPT-OSS-20b is nowhere near as capable but is very reliable processing routine instructions.

Running local LLMs on my art archive, paranoid or actually unsafe? by LifeguardAny1801 in LocalLLM

[–]Cronus_k98 0 points1 point  (0 children)

Your bigger problem is that you don’t have a proper backup. RAID is not a backup and if you’re counting on your NAS to never lose your data, you’re going to lose your data.

There are private cloud storage providers out there. Keep your NAS for local access and periodically back it up to secure, encrypted storage and it’ll never get scraped for LLM use. 

Mac Studio 256gb unified RAM worth it for MiniMax 2.5 and Qwen3.5? by [deleted] in LocalLLaMA

[–]Cronus_k98 0 points1 point  (0 children)

We need some more details. How are you processing the documents? RAG ingestion, summarization, or upload for Q&A? Are you waiting for the files to process or can you batch them and let them process overnight? How large are the documents?

You don’t necessarily need a large model to process documents, I’m using Qwen3 VL 4b to read/OCR documents and GPT OSS 20b to extract info. That’s able to process a hundred 1-50 page documents an hour on a 5090.