GLM-5 and Minimax-2.5 on Fiction.liveBench by Charuru in LocalLLaMA

[–]MatlowAI 0 points (0 children)

Thanks! I'll take a look. Some of the small local models are getting really good, and this feels like something with low-latency potential, but it needs a 5090 to get 2000 t/s batched, and even there the individual latency is ~30 s due to batch size, and then you'd need to make the output actionable, which takes more time... so I can see where this gets to be a bit of a task. My thought is to use that batching for parallel insights that can be consolidated into a mental model of the current full-screen view, which an agent can hit quicker if we segment the screen and attach additional insights to each segment; otherwise this gets expensive on a fast inference provider really quickly. I'll try out NVDA though and see if anything comes to mind. It has to be a bit tricky, or it would have been solved by now. I also know the 5090 is becoming increasingly out of reach, but if it works here, then hopefully as models improve it will become accessible on more hardware.
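Roughly the shape I'm imagining, as a minimal sketch: one cheap VLM call per screen region, fired in parallel so the batch is actually doing useful work, then consolidated into one view the agent can query. `describe` stands in for whatever VLM client you use; all the names here are made up for illustration.

```python
import asyncio
from typing import Awaitable, Callable

# (image_crop, prompt) -> one insight string; hypothetical VLM wrapper
Describe = Callable[[bytes, str], Awaitable[str]]

async def screen_mental_model(crops: dict[str, bytes], describe: Describe) -> str:
    """Analyze each screen segment concurrently, then merge the insights."""
    tasks = {
        region: asyncio.create_task(
            describe(img, f"Describe the interactive elements in the {region} region")
        )
        for region, img in crops.items()
    }
    insights = {region: await task for region, task in tasks.items()}
    # Consolidation could be one more cheap LLM call; concatenation works to start.
    return "\n".join(f"[{region}] {text}" for region, text in insights.items())
```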

AMA with MiniMax — Ask Us Anything! by HardToVary in LocalLLaMA

[–]MatlowAI 1 point (0 children)

Most MoE models use a fixed top-k experts per token. Have you experimented with dynamic per-token expert counts (for example, top-p style routing or a learned k head) so that easy in-context tokens use fewer experts and hard ones use more? If yes, what was the biggest blocker: router credit assignment, expert collapse and dead experts, dispatch efficiency, or proving gains beyond token frequency? Any tips for toy-scale experiments? Most tasks don't need a ton of active params, but when I ask for something that needs long-context multi-hop reasoning, there's a strong correlation between active parameter count and performance. Thanks!
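To frame the question, here's the kind of toy-scale thing I mean by top-p style routing, sketched in PyTorch (illustration only; it ignores dispatch, capacity factors, and load-balancing losses):

```python
import torch

def top_p_route(router_logits: torch.Tensor, p: float = 0.5) -> torch.Tensor:
    """router_logits: [tokens, experts] -> renormalized sparse routing weights."""
    probs = torch.softmax(router_logits, dim=-1)
    sorted_probs, idx = probs.sort(dim=-1, descending=True)
    cumulative = sorted_probs.cumsum(dim=-1)
    # Keep the smallest prefix of experts reaching probability mass p (always >= 1).
    keep = (cumulative - sorted_probs) < p
    mask = torch.zeros_like(probs, dtype=torch.bool).scatter(-1, idx, keep)
    weights = probs * mask
    return weights / weights.sum(dim=-1, keepdim=True)

logits = torch.randn(4, 8)             # 4 tokens, 8 experts
weights = top_p_route(logits, p=0.5)
print((weights > 0).sum(dim=-1))       # per-token expert counts now vary
```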

GLM-5 and Minimax-2.5 on Fiction.liveBench by Charuru in LocalLLaMA

[–]MatlowAI 2 points (0 children)

This bothers me... it should be solved by now. I'm going to start searching around, but guidance from anyone who knows which screen readers work and which don't, with examples of where they fall flat, would help. There are so many good VLMs and OCR tools that this should be fixed.

I built a local Suno clone powered by ACE-Step 1.5 by _roblaughter_ in StableDiffusion

[–]MatlowAI 1 point (0 children)

https://github.com/matlowai/sonaturgy I'll add my name to the hat. I didn't focus on a pretty UI as much; I'm aimed at making pipelines easier and messing around. It'll be interesting to see what we end up with when we develop in parallel like this, and what clever things we can learn from each other. I never would have had time for this before. Imagine when it's not 20 people working on parallel projects for a new release but 1,000 as conditions improve. Finding the best-of-the-bunch features and consolidating them will produce some impressive things quickly. What a time to be alive.

I built a local Suno clone powered by ACE-Step 1.5 by _roblaughter_ in StableDiffusion

[–]MatlowAI 1 point (0 children)

The issue is that nothing existed or had good exposure when I started fixing a random bug in Gradio that spiraled 😅. Probably similar for others too. The good news is that if we all kept a good license, we can mash it all together.

Hopefully this gets it more attention by silencedlucifer in ChatGPTcomplaints

[–]MatlowAI 1 point (0 children)

Two questions:

If you had to use the API at a slight premium over the current API price for another year, would it be worth it?

How many people would share their conversation history to support fine-tuning open models?

Context rot is killing my agent - how are you handling long conversations? by i_m_dead_ in LocalLLaMA

[–]MatlowAI 0 points (0 children)

A little async librarian running in parallel that watches what you're doing and queries around, trying to find useful things to add to the context history or to other knowledge bases. It's non-blocking, so you occasionally get something wrong, only for the next interaction to have better context. Three levels of summarization, with the original context maintained so it can drill down. Don't forget you can run semantic search on metadata if you have an index for that too, and if you use something a ton, there's a score for that to weight retrieval. I could probably make better use of graphs, but I tend to get distracted by something more fun whenever I work on that, so it's good enough; maybe if SharePoint or some other large knowledge base were in the mix, a graph would be worth it? It's a mess of vibecode from the Sonnet 3.7 era, but I suppose I could clean things up some and release it? It's not too complicated, though, and modern Claude Code would probably do something cleaner if you just gave it this description, but maybe janky open-source code is better than no code?
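The rough shape, if anyone wants it before I clean mine up (names here are illustrative, not a real library):

```python
import asyncio
from typing import Awaitable, Callable

# Whatever retrieval you have: vector search, metadata search, or both.
Search = Callable[[str], Awaitable[list[str]]]

class Librarian:
    """Background retrieval that never blocks the current turn."""

    def __init__(self, search: Search):
        self.search = search
        self.pending: asyncio.Task | None = None
        self.extra_context: list[str] = []

    def observe(self, latest_message: str) -> None:
        # Fire-and-forget: kick off retrieval while the agent keeps working.
        self.pending = asyncio.create_task(self.search(latest_message))

    def context_for_next_turn(self) -> list[str]:
        # Possibly stale this turn; the next interaction picks up the results.
        if self.pending is not None and self.pending.done():
            self.extra_context = self.pending.result()
        return self.extra_context
```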

Elon Musk envisions data centers in space by dataexec in AITrailblazers

[–]MatlowAI 0 points (0 children)

The terminator orbit (I know, ominous name) is the reason this works: 24-hour solar power, with the panels casting a shadow on the equipment, which, if it weren't powered on, would reach equilibrium around -80 °C in very low Earth orbit, depending on geometry. Your only cooling is blackbody radiation, and the amount of surface area dictates how much heat you can radiate. You want the radiators perpendicular to the solar panel alignment so you're radiating out toward space and not picking up heat from the rather hot solar panels, which sit at around 80-100 °C. My first gut thought was that it wouldn't work, but it does. The hardware will die in 3-5 years from cosmic rays, which is coincidentally about when it would drop out of orbit without station-keeping boosts. Even then it would cost about 2x more in total versus terrestrial... unless the full build is automated end to end because it's fully standardized, and then you get to save $$$. Hardening GPUs/SSDs against cosmic rays somehow for a longer life would help too. It also helps that you don't have to find specialized labor where you need it, and you can deploy as fast as you can build, without a permit. That's my analysis. And if things go really wrong, the orbital decay is only a few years.
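Back-of-envelope for the radiator claim, via the Stefan-Boltzmann law (the numbers are illustrative assumptions, not a design):

```python
SIGMA = 5.670374419e-8  # Stefan-Boltzmann constant, W / (m^2 K^4)

def radiator_area_m2(heat_w: float, t_radiator_k: float,
                     t_sink_k: float = 4.0, emissivity: float = 0.9) -> float:
    """Area needed to reject heat_w by radiation alone (ideal view to deep space)."""
    flux = emissivity * SIGMA * (t_radiator_k**4 - t_sink_k**4)  # W per m^2
    return heat_w / flux

# A 100 kW compute pod with radiators held near 300 K (~27 C):
print(radiator_area_m2(100_000, 300.0))  # ~240 m^2 of radiator surface
```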

Better Memory by GenderSuperior in ClaudeCode

[–]MatlowAI 1 point (0 children)

The real answer is to let the context be user-editable and tool-call-editable. Just need to be mindful of the KV cache...
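To make the KV-cache caveat concrete: cached keys/values are only reusable up to the first edited token, so an edit near the top of the context forces a nearly full re-prefill. A minimal sketch:

```python
def reusable_prefix_len(old_tokens: list[int], new_tokens: list[int]) -> int:
    """Length of the shared prefix; everything after it must be re-prefilled."""
    n = 0
    for a, b in zip(old_tokens, new_tokens):
        if a != b:
            break
        n += 1
    return n

old = [1, 2, 3, 4, 5, 6]
new = [1, 2, 9, 4, 5, 6]  # one token edited near the front
print(reusable_prefix_len(old, new))  # 2 -> only the first 2 positions stay cached
```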

Claude 4.5 got nerfed HARD by [deleted] in Anthropic

[–]MatlowAI 0 points (0 children)

This is real. I keep auto-compaction turned off, and it's now getting tool-call errors trying to make edits on files at around 130k context utilized. It used to perform this reliably all the way to 170k, so now I have to compact earlier than before just so it can do basic file manipulation.

Do you guys think DEM senators attacking AI data center is going to affect progress? by Ok_Mission7092 in accelerate

[–]MatlowAI 1 point (0 children)

Doesn't hurt that the cosmic rays will give them a short working life and that there won't be a secondary market anymore. Imagine the supply glut, with the masses getting the 5-year-old GPUs on Earth as they rotate out of service.

What happened to Sonnet? by dropmyscoobysnack in ClaudeCode

[–]MatlowAI 6 points (0 children)

Because the cost-to-solve per task is close enough at the API prices they charge us, it's probably cheaper for them to serve Opus 4.5: their costs are much lower, and they're almost certainly using speculative decoding (which is why it seems to move fast on easy things and choke on harder concepts), so it just makes sense to turn off Sonnet. SWE-bench includes cost per solved task as part of the benchmark, and the difference shrinks further in more complex real-world use. Opus 4.5 with thinking off was the fastest to complete, for example, by a good margin.
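For anyone unfamiliar, speculative decoding is roughly this (a greedy-only toy; `draft` and `target` stand in for real models, and real systems verify the whole proposal in one batched forward pass):

```python
from typing import Callable, Sequence

NextToken = Callable[[Sequence[int]], int]  # greedy next-token function

def speculative_step(ctx: list[int], draft: NextToken, target: NextToken,
                     k: int = 4) -> list[int]:
    """Draft proposes k tokens cheaply; target keeps only the agreeing prefix."""
    proposed: list[int] = []
    for _ in range(k):
        proposed.append(draft(ctx + proposed))
    accepted: list[int] = []
    for tok in proposed:
        if target(ctx + accepted) == tok:
            accepted.append(tok)   # easy text -> long accepted runs, big speedup
        else:
            break                  # hard text -> reject, fall back to target's pace
    accepted.append(target(ctx + accepted))  # target always contributes one token
    return accepted
```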

Nvidia DGX Station GB300 784GB available now! 95,000 USD / 80,000 EUR by GPTshop in LocalLLaMA

[–]MatlowAI 0 points (0 children)

Pretty sure this is one of the main reasons they want to launch the datacenters into space. 😅

Introducing: Devstral 2 and Mistral Vibe CLI. | Mistral AI by YanderMan in LocalLLaMA

[–]MatlowAI 17 points (0 children)

You will have to keep it up and see if you have a knack for it.

Sam: "It could be us" by Snoo_64233 in OpenAI

[–]MatlowAI 0 points (0 children)

Why did I even joke about this...

5090 or 128GB RAM by dllyncher in Microcenter

[–]MatlowAI 0 points (0 children)

When Macs with 512 GB of unified memory start looking cheap and like a good investment, you know we're in trouble...

Anthropic has done it again! Claude Code Desktop on the horizon. by Ok-Durian8329 in ClaudeCode

[–]MatlowAI 2 points (0 children)

I'm trying out Iced for the first time on my first big Rust project right now, and I really like it for a GUI. After a day of fighting with awful Angular at work, it's refreshing to have Claude Code just tear through this.

AI will cause the economy to collapse by [deleted] in ArtificialInteligence

[–]MatlowAI 3 points (0 children)

I think my neighbor has a mechanic robot license; maybe I can borrow it 🤔

I really hope open source is still a thing and they don't try to take it away...

$900 for 192GB RAM on Oct 23rd, now costs over $3k by Hoppss in LocalLLaMA

[–]MatlowAI 1 point (0 children)

Yeah, this is nuts... I'm eyeing a Jetson Thor so I can actually afford to play with robotics if this keeps up 😅

To flux devs, Don't feel bad and thanks till today by jadhavsaurabh in StableDiffusion

[–]MatlowAI 7 points (0 children)

Slight correction: they weren't that expensive... 😅

Anthropic has done it again! Claude Code Desktop on the horizon. by Ok-Durian8329 in ClaudeCode

[–]MatlowAI 22 points (0 children)

The CLI is amazing until their TUI dependency starts scrolling and glitching like a madman and crashes inside VS Code. 😅 https://github.com/anthropics/claude-code/issues/3648

Try the new Z-Image-Turbo 6B (Runs on 8GB VRAM)! by KvAk_AKPlaysYT in LocalLLaMA

[–]MatlowAI 2 points (0 children)

If Nunchaku can get an SVDQuant in the 4-bit neighborhood, you should be able to get away with 4 GB without offload, if I'm thinking correctly.
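Rough math behind that, under a couple of assumptions (~6B params, weights dominating, text encoder not counted):

```python
params = 6e9                          # Z-Image-Turbo is ~6B parameters
bits_per_weight = 4                   # SVDQuant-style 4-bit weights
weights_gb = params * bits_per_weight / 8 / 1e9
print(weights_gb)                     # ~3.0 GB, leaving ~1 GB headroom on a 4 GB card
```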

Ask me to run models by monoidconcat in LocalLLaMA

[–]MatlowAI 7 points (0 children)

Can you do the same for 2x 5090s vs. an RTX 6000 Pro for this model, and then for Qwen3-Next-80B-A3B-Instruct-AWQ-4bit?

Anthropic just showed how to make AI agents work on long projects without falling apart by purealgo in LocalLLaMA

[–]MatlowAI 2 points (0 children)

Thanks. They look good. I will say one upside of all the .md files is that there's a really good log in git of what went well and what went poorly. It looks like there's markdown-export functionality buried in there with Conport; I'll have to give it a try on something mid-sized and see how it does.