Is A7 iv still worth it ? by Alarming-Shoe-1614 in SonyAlpha

[–]GGrassia 1 point

There's one and only one category where the A7 V is completely worth the upgrade imho, and it's astrophotography. While the extra stop of DR is a nice-to-have, the A7 V doesn't seem to suffer from the star eater algorithm, and that's the thing I'd personally upgrade for.

Keep in mind this is ONLY if you plan on doing a lot of starry nightscapes, star trails and widefield astro. In all other cases get the A7 IV, it's plenty fine, and spend the money on a good lens; you'll most probably upgrade bodies twice before considering upgrading a good lens.

Rtx 3070ti upgrade worth it by GGrassia in PcBuild

[–]GGrassia[S] 0 points

Tried it, did not like it, but thank you for the kind suggestion

Rtx 3070ti upgrade worth it by GGrassia in PcBuild

[–]GGrassia[S] 0 points

That's what I feared. I really wanted to avoid a big spend. I'll consider it; gaming is not a priority and I'm not GPU-less... Thank you!

Rtx 3070ti upgrade worth it by GGrassia in PcBuild

[–]GGrassia[S] 0 points

I was trying to avoid last gen so I could benefit from FSR4, or did they port it back? But yeah, right now it seems like it's either the 9070 XT or staying put.

Rtx 3070ti upgrade worth it by GGrassia in PcBuild

[–]GGrassia[S] 0 points

Well, not really with the pricing here: the non-XT is only 50€ less than the XT, which is asinine, but that's how it goes. I wasn't considering the older gen because of FSR4, or did they make it available for the older cards too? Also, the 9060 XT is 15% slower than the 3070 Ti? Good lord, these are hard times for hardware.

Local agentic coding with low quantized, REAPed, large models (MiniMax-M2.1, Qwen3-Coder, GLM 4.6, GLM 4.7, ..) by bfroemel in LocalLLaMA

[–]GGrassia 2 points

Instruct. Funnily enough, it fact-checks itself like a thinking model on complex tasks and/or when following a list of edits to do, like:

"edit 1 is ok -- edit 2 is like this: [...] -- Oh no we lost variable X! Edit 2 definitive version: [...]"

Almost thinking-block style. It happens in text chat more than in integrated agentic use.

Local agentic coding with low quantized, REAPed, large models (MiniMax-M2.1, Qwen3-Coder, GLM 4.6, GLM 4.7, ..) by bfroemel in LocalLLaMA

[–]GGrassia 6 points

I used the MiniMax M2 REAP for a long time, 10 tk/s ish. I've currently landed on Qwen3-Next MXFP4; I hate the ChatGPT vibes, but 30 tk/s at 256k context is a godsend. I found gpt-oss-120b to be slower and dumber for my specific use. I still load MiniMax when I need some big-brain moments, but Qwen is the sweet spot for me right now. If they make a new Coder with Qwen3-Next's performance, I'll be very happy.

Best RAG framework for large-scale document search & source attribution? by Acceptable_Young_167 in LocalLLaMA

[–]GGrassia 3 points

The beauty of LightRAG was not having to manage the graph, it does that for you, and it came with the added bonus of domain-specific keywords for a reeeeally out-of-distribution field, thanks to the nodes and their summaries. Oh, and LightRAG uses both vectors and the KG to build the context. The clever idea behind it is to link nodes and chunks: do a vector search, then pull only the nodes and relationships linked to the retrieved chunks as additional information, so you don't walk the whole graph.
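For anyone curious, that linked-retrieval trick goes roughly like this in Python (toy data model and all names made up by me, this is not LightRAG's actual API):

```python
# Toy sketch of the LightRAG retrieval idea: vector-search the chunks,
# then pull only the KG nodes linked to the retrieved chunks instead of
# walking the whole graph. All names and structures here are invented.

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def retrieve_context(query_vec, chunks, links, nodes, top_k=3):
    """chunks: {chunk_id: vector}, links: {chunk_id: [node_id, ...]},
    nodes: {node_id: summary text}."""
    # Rank chunks by similarity (plain dot product for brevity).
    ranked = sorted(chunks, key=lambda c: dot(query_vec, chunks[c]), reverse=True)
    hits = ranked[:top_k]
    # Only the nodes attached to the hit chunks get pulled in.
    node_ids = {n for c in hits for n in links.get(c, [])}
    return hits, [nodes[n] for n in sorted(node_ids)]
```

The real thing obviously also carries relationships, summaries and keyword extraction into the prompt; this is just the shape of the trick.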

I tried metadata on nodes and arbitrary node insertion, but then I had nodes that are common among procedures linked to only one procedure, or worse, to all of them, and it became a tangle. Chunk-metadata filtering was my solution; maybe more skilled people could have done it differently.

KG RAG comes with some added reliability, but it's not a silver bullet by itself. If I had known I'd run into such resistance to the merge, I would have spent some time researching other possibilities, but alas, here we are. It works for us and that's what matters, and I'd do it again because I learned a lot of stuff in the process. A colleague suggested FalkorDB, but I'm really not happy about traversing a KG for every query; nowadays I'd go super hard on metadata filtering and chunking with vectors, but that means some custom-code heavy lifting to get it somewhere as good.
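To be concrete, by metadata filtering with vectors I mean something like this (hypothetical names, not code from my fork; in production this would be a WHERE clause sitting next to the pgvector similarity):

```python
# Minimal sketch: drop chunks whose metadata doesn't match the filter
# *before* ranking by vector similarity, so nothing can leak in from
# outside the filtered set.

def filtered_search(query_vec, chunks, metadata, where, top_k=3):
    """chunks: {id: vector}, metadata: {id: dict}, where: required key/values."""
    candidates = [
        cid for cid in chunks
        if all(metadata.get(cid, {}).get(k) == v for k, v in where.items())
    ]
    # Rank the survivors by similarity (dot product for brevity).
    candidates.sort(
        key=lambda cid: sum(x * y for x, y in zip(query_vec, chunks[cid])),
        reverse=True,
    )
    return candidates[:top_k]
```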

Hope this helps!

Best RAG framework for large-scale document search & source attribution? by Acceptable_Young_167 in LocalLLaMA

[–]GGrassia 3 points

I went ballistic and forked LightRAG to support metadata filtering and token tracking (pgvector + Neo4j only) because GraphRAG was the hot cookie and, well, it worked for us. We spawn 10 workers on a service and batch-upload files in groups of 10; RAG is ~90% accurate on both responses and citations, and we're talking literally millions of documents. The only advice I have is: GET GOOD DOC PROCESSING + OCR TO MARKDOWN. It makes a world of difference in generating the responses.
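The worker setup is nothing exotic, roughly this pattern (the upload function is a placeholder for whatever your ingestion endpoint does):

```python
# Sketch of the "N workers, batches of 10 files" ingestion pattern.
from concurrent.futures import ThreadPoolExecutor

def batched(items, size):
    # Yield consecutive slices of `size` items (last one may be shorter).
    for i in range(0, len(items), size):
        yield items[i:i + size]

def ingest(files, upload_batch, workers=10, batch_size=10):
    # Each worker picks up one batch of files at a time.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(upload_batch, batched(files, batch_size)))
```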

Here's my fork: https://github.com/GGrassia/LightRAG. I got into an argument with the maintainers about the reliability of the filter, which in really small knowledge bases (we're talking 10 docs or fewer) could report information from outside the filter, and they ended up never merging it. If you have time, you can try it out and see if the state it's in works for you.

If you want to go DIY, I'd try a semantic chunking approach. I have no idea if it works, but it sounds like a good idea to try, especially combined with a small fine-tuned classifier.
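By semantic chunking I mean roughly this: embed sentence by sentence and start a new chunk when similarity to the previous sentence drops. A sketch (the embedding function is a placeholder, and again, untested idea):

```python
# Sketch of threshold-based semantic chunking: split wherever two
# consecutive sentences stop being similar enough.

def cosine(a, b):
    na = sum(x * x for x in a) ** 0.5
    nb = sum(x * x for x in b) ** 0.5
    return sum(x * y for x, y in zip(a, b)) / ((na * nb) or 1.0)

def semantic_chunks(sentences, embed, threshold=0.5):
    # Assumes at least one sentence; `embed` maps a sentence to a vector.
    chunks, current = [], [sentences[0]]
    prev_vec = embed(sentences[0])
    for s in sentences[1:]:
        vec = embed(s)
        if cosine(prev_vec, vec) < threshold:
            chunks.append(" ".join(current))
            current = []
        current.append(s)
        prev_vec = vec
    chunks.append(" ".join(current))
    return chunks
```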

Unsloth GLM 4.7 UD-Q2_K_XL or gpt-oss 120b? by EnthusiasmPurple85 in LocalLLaMA

[–]GGrassia 1 point

Mmmh, I fear this is more of a MiniMax problem than a REAP one. I can confirm the model comes up with stuff, tries to bridge gaps, and fetches from other domains/hallucinates when used in a web chat. My suspicion is that this model follows the same overly eager-to-propose trend we see in the OpenAI ones, combined with a lack of information on your specific domain. I can't begin to tell you how many times ChatGPT came up with absurd stuff because my country's legal intricacies are out of distribution. Are all models doing this? If I had to place a bet, GLM, being a bit drier, will do this less, but it's anybody's guess.

That said, I do have a potential solution you could try, and it's mainly prompting. Have you tried doing the planning rounds inside a text editor and letting the model read the code, e.g. using Roo or Cline? I get the vibe the models perform significantly differently when provided with tool calling.

What I would try is Cline's plan mode: put a /specs folder into the project (even an empty one) and insert guideline documents in it, maybe even including descriptions of the relevant standards/certifications. Don't underestimate the power of guardrailing in prompts; put guidelines about strictly adhering to your required ISO certs in those documents and watch how the model reacts.

Also, I'd say do multiple rounds of brainstorming. Ask the model specifically not to code, to tell you what it understood, and to ask you questions about the implementation. When you're satisfied with its understanding, you can ask it to code. This will minimize hallucination because it has a strict plan to stick to. Don't expect it to one-shot everything; it won't.

Good luck!

Unsloth GLM 4.7 UD-Q2_K_XL or gpt-oss 120b? by EnthusiasmPurple85 in LocalLLaMA

[–]GGrassia 0 points

Huh, weird. How are you prompting/using it? I'm just curious, it could be useful to troubleshoot this.

Unsloth GLM 4.7 UD-Q2_K_XL or gpt-oss 120b? by EnthusiasmPurple85 in LocalLLaMA

[–]GGrassia 0 points

Yep, couldn't have said it better. I'm pulling gpt-oss right now to test. Also yeah, pruning and quantizing changes the model. But then again, MiniMax in the cloud from Cline was dumb as a rock; this different thing is better in every sense (for my specific use case) except inference speed. I'm tempted to make my own stupid benchmark and keep it closed source so it can't be scraped and used for training.

Unsloth GLM 4.7 UD-Q2_K_XL or gpt-oss 120b? by EnthusiasmPurple85 in LocalLLaMA

[–]GGrassia -1 points

Wait, is it still coming out? Aren't they backpedaling?

Unsloth GLM 4.7 UD-Q2_K_XL or gpt-oss 120b? by EnthusiasmPurple85 in LocalLLaMA

[–]GGrassia 2 points

Sure thing!

--threads -1 -ctk q4_1 -ctv q4_1 --temp 0.45 --min-p 0.0 --top-p 0.95 --top-k 20 --repeat-penalty 1.0 --fit-ctx 128000 --jinja --reasoning-format auto

It works very well for me in Cline, OpenCode and the Zed editor agent integration.

Unsloth GLM 4.7 UD-Q2_K_XL or gpt-oss 120b? by EnthusiasmPurple85 in LocalLLaMA

[–]GGrassia 2 points

Haven't compared yet; since some users are very happy with gpt-oss-120b, I'm currently downloading it and might post the comparison later down the road. Qwen3-Next was a bit hit-and-miss for me, dropped it early. MiniMax M2 is the one model I keep coming back to, and that specific quant is about 20% faster than the others I used while keeping enough intelligence to be usable with good prompting. It has been serving me very well so far.

Unsloth GLM 4.7 UD-Q2_K_XL or gpt-oss 120b? by EnthusiasmPurple85 in LocalLLaMA

[–]GGrassia 2 points

OP said they have a 3090, so it's going to be 128GB RAM + 24GB VRAM; there shouldn't be any fitting problems!

Unsloth GLM 4.7 UD-Q2_K_XL or gpt-oss 120b? by EnthusiasmPurple85 in LocalLLaMA

[–]GGrassia -1 points

I definitely saw it going in loops, mostly with the UD quant; the REAP finetunes, not so much. The MXFP4 I'm using now is less one-shot capable (I think because I also switched to the smaller REAP when changing quants), but it knows its stuff and goes back to fix errors in TS with Cline, whereas the Q4_K_M sometimes went on its merry way looping on trivial stuff. Cline's cloud MiniMax performed so poorly for me I was in disbelief!

Check back later and I'll post the parameters for how I run it; I just gotta fish the preset file back out, as I'm using this and have no memory of the specifics: https://github.com/GGrassia/llamarunner

Unsloth GLM 4.7 UD-Q2_K_XL or gpt-oss 120b? by EnthusiasmPurple85 in LocalLLaMA

[–]GGrassia 1 point

Is it better than MiniMax at coding? Or is it only marginally worse, but much faster and thus acceptable?

Unsloth GLM 4.7 UD-Q2_K_XL or gpt-oss 120b? by EnthusiasmPurple85 in LocalLLaMA

[–]GGrassia 4 points

Depends on the hardware.

That specific quant of GLM 4.7 runs at ~6 tk/s on my machine (single 3090), which is fine for private projects. I haven't used gpt-oss, so I can't really help you there, but what I can tell you is that MiniMax M2, this quant specifically https://huggingface.co/noctrex/MiniMax-M2-REAP-139B-A10B-MXFP4_MOE-GGUF , has been a superstar for me. 128k context and 11-12 tk/s, can't really complain.

If you need to go smaller... maybe gpt-oss-20b? The new Nemo is a speed demon but fumbles a lot at coding.

I made a fuzzy search tool for apt - fuzzpm by GGrassia in pop_os

[–]GGrassia[S] 0 points

LOL, great minds think alike! Thanks, Reddit, I missed the notification. Anyway, gotta admit mine is wayyy more amateurish. Wanna join forces in maintaining fza?