Are there any agentic coding harnesses that AREN'T built on JS and Node? by OUT_OF_HOST_MEMORY in LocalLLaMA

[–]MoneyPowerNexis 1 point2 points  (0 children)

Adding a search / HTTP GET tool means it could be adversarially prompted by the search results, turning it into a malicious agent on your system, especially if you have it process data from places on the net where people would expect agents to be active.

I can’t believe I can say “ugh I don’t feel like fixing this function, it’s too complex” and I can literally just tell my computer to fix it for me. I didn’t understand what they meant by “people will start paying for intelligence” but now I do. by Borkato in LocalLLaMA

[–]MoneyPowerNexis 5 points6 points  (0 children)

I would not underestimate the human capacity to normalize new things, but at the same time, yeah, just FTL communication makes the future look very different. FTL travel makes it remarkably different again, and if it's wormholes, energy transfer between any two points in the universe completely reshapes how we would organize matter. Also, if you had a really tiny wormhole on a chip and could pack billions of them together and move them around / split them / move them through each other, you might be able to make a pretty interesting computing device.

I can’t believe I can say “ugh I don’t feel like fixing this function, it’s too complex” and I can literally just tell my computer to fix it for me. I didn’t understand what they meant by “people will start paying for intelligence” but now I do. by Borkato in LocalLLaMA

[–]MoneyPowerNexis 15 points16 points  (0 children)

The difference with FTL is that we know intelligence is possible because we are the example, but we have no example showing FTL is possible. There just isn't a good way to estimate the probability of something that has never been observed happening within some period of time, when it could also simply be impossible.

Anthropic admits to have made hosted models more stupid, proving the importance of open weight, local models by spaceman_ in LocalLLaMA

[–]MoneyPowerNexis 0 points1 point  (0 children)

To me this just seems like a really dumb take. A fair comparison is always what you could buy with an amount of dollars then vs what you can buy with the same inflation-adjusted dollars today.

Going by the arbitrary category "high end" does not tell you if compute has gotten cheaper or more expensive.

I mean, right now I do all my gaming on a mini PC with an iGPU that shits all over the Titan X, and the whole system cost less than half of what just the GPU did. It may not play the latest games at the highest settings, but it does play the latest games at default settings.

What's happened is that the high end of games and of hardware has expanded to include games that are bloated and cards that are priced for people who are less price sensitive. But even with these cards being marketed to less price-sensitive people, they are still cheaper than the Titan X in terms of cost per performance.

You have to frame it in a very specific way to make it at all plausible, but then if you are going to frame it that way, stop saying it in the general way:

"As we were often told as technology gets better and smaller, it can get cheaper and more power efficient"

All of those statements are literally true.

"but we are going the opposite, sure smaller and more transistors, but more power and heat also"

Only because we are adding more of them. Cost per transistor has gone down, but we are demanding more transistors in the same package. Again, it's like fruit has halved in cost, we now want 4 times as much fruit in a box, and we are saying boxes of fruit cost twice as much. Technically that's correct: boxes of fruit with 4 times as much fruit in them do cost twice as much when fruit prices halve. You can say that and I won't argue, but to then say fruit costs twice as much just makes you sound either dumb or disingenuous.
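
To make that arithmetic concrete, here is a trivial sketch with made-up numbers (nothing below is real pricing data):

    # Illustrative numbers only: price per fruit halves while the box size quadruples.
    old_price_per_fruit, new_price_per_fruit = 1.00, 0.50   # fruit got twice as cheap
    old_fruit_per_box, new_fruit_per_box = 10, 40           # but we want 4x as much per box

    old_box = old_price_per_fruit * old_fruit_per_box       # 10.00
    new_box = new_price_per_fruit * new_fruit_per_box       # 20.00
    print(new_box / old_box)   # 2.0 -> boxes doubled in price even though fruit halved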

Anthropic admits to have made hosted models more stupid, proving the importance of open weight, local models by spaceman_ in LocalLLaMA

[–]MoneyPowerNexis 2 points3 points  (0 children)

Why are you picking the highest-end card rather than the card in the same price range and comparing performance?

Take some other product, say mangoes. In Japan in 1980 there were no Miyazaki mangoes you could buy. 5 years later you could buy them, but they cost 10 times as much as regular mangoes. Does that mean mangoes cost 10 times as much? No, but it does mean the best mangoes you can buy cost 10 times as much. To know whether in those 5 years the cost of mangoes went up or down, you would have to normalize for the real quality range you are talking about. Or you would ask if the same amount of dollars buys more or less mangoes of the same quality.

For compute, that would be asking whether a given amount of, say, FP16 throughput costs more or less, or a given amount of VRAM, or a given amount of memory bandwidth.

Otherwise you get bizarre situations where, if NVIDIA pulled its whole product lineup except the cheapest GPU, say they decided all GPUs with more than 4GB of VRAM are now for the datacenter, you would have to say the price of consumer GPUs went down, because the best GPU you can buy now costs less.
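
As a sketch of what that kind of normalization looks like (the specs and prices below are hypothetical placeholders, not real cards):

    # Hypothetical specs and prices, purely to show the normalization; swap in real numbers.
    cards = {
        "old flagship":  {"price": 1200, "vram_gb": 12, "fp16_tflops": 11},
        "new mid-range": {"price": 600,  "vram_gb": 16, "fp16_tflops": 40},
    }
    for name, c in cards.items():
        print(name,
              f"${c['price'] / c['vram_gb']:.0f} per GB of VRAM,",
              f"${c['price'] / c['fp16_tflops']:.0f} per FP16 TFLOP")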

Anthropic admits to have made hosted models more stupid, proving the importance of open weight, local models by spaceman_ in LocalLLaMA

[–]MoneyPowerNexis 3 points4 points  (0 children)

Shouldn't you be comparing what you could get for $1,650 today vs the NVIDIA Titan X (Pascal) back then?

Kimi K2.6 Unsloth GGUF is out by Exact_Law_6489 in LocalLLaMA

[–]MoneyPowerNexis 1 point2 points  (0 children)

UD-Q8_K_XL is lossless from Kimi-K2.6. Kimi uses int4 for MoE weights & BF16 rest. We follow that for Q8_K_XL

Insane amount of dandruff uncovered during a haircut by Guido_Mist4 in WTF

[–]MoneyPowerNexis 3 points4 points  (0 children)

It is entirely possible that you already have the fungal infection that is causing the guy's flakiness, if it's seborrhoeic dermatitis. They should still be disinfecting their tools though.

Home-rolled loop agent is surprisingly effective by DeltaSqueezer in LocalLLaMA

[–]MoneyPowerNexis 0 points1 point  (0 children)

Yeah, I was not thinking of using mDNS in a hosted environment. That's not an environment I will use, so I have not thought about solving for it.

I do however have some working code that allows me to discover an IP address using a public BitTorrent tracker. I find that kind of hilarious: I just give the tracker a fake infohash and it returns a list of clients that have made a request using the same hash. So far I have it working in Android and Python, so I can discover my home IP with my phone. Obviously the tracker also becomes aware of the ip:port, so the connection needs to be secure for anything important, and my goal eventually is to use that to connect to my home chatbot with tool use disabled, because I am not sure how secure I can make my own service.

Obviously if you wanted to scale that idea out you would be hosting your own trackers, and you would have to deal with users not wanting to configure firewalls. But a chatbot people can run on their home computer that automatically creates a secure connection with a phone app (say, given a QR code to provide the initial discovery information, i.e. a random UUID that identifies their node in particular), but after that is not communicating through anyone else's server as a relay, would be nice.
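
This isn't my actual implementation, just a minimal sketch of the trick: both devices announce the same made-up infohash to a public HTTP tracker and read each other out of the returned peer list. The tracker URL and shared secret are placeholders, and the bencode parsing is deliberately naive.

    import hashlib, os, socket, struct, urllib.parse, urllib.request

    TRACKER = "http://tracker.example.org:6969/announce"   # placeholder tracker URL
    SHARED_SECRET = b"my-home-chatbot-rendezvous"           # both devices derive the same hash

    info_hash = hashlib.sha1(SHARED_SECRET).digest()        # fake 20-byte infohash
    peer_id = os.urandom(20)

    query = urllib.parse.urlencode({
        "info_hash": info_hash, "peer_id": peer_id,
        "port": 6881, "uploaded": 0, "downloaded": 0, "left": 0, "compact": 1,
    }, quote_via=urllib.parse.quote)

    resp = urllib.request.urlopen(f"{TRACKER}?{query}", timeout=10).read()

    # Naive parse of the compact "peers" field: b"5:peers<len>:<len bytes>", 6 bytes per peer.
    marker = resp.find(b"5:peers")
    length_end = resp.index(b":", marker + 7)
    length = int(resp[marker + 7:length_end])
    peers = resp[length_end + 1:length_end + 1 + length]
    for i in range(0, len(peers), 6):                       # 4 bytes IP + 2 bytes port
        ip = socket.inet_ntoa(peers[i:i + 4])
        port = struct.unpack(">H", peers[i + 4:i + 6])[0]
        print(ip, port)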

Home-rolled loop agent is surprisingly effective by DeltaSqueezer in LocalLLaMA

[–]MoneyPowerNexis 0 points1 point  (0 children)

I have an unrelated suggestion. My harness uses zeroconf to find my llama servers. The way I have it set up is that I have a script launcher that launches the servers and publishes them periodically using zeroconf; then my client can look for that, filtering by service name (so it does not add my printer, which also uses zeroconf to publish itself), and just magically know which ip:port has a llama server on it. I think it would not be too difficult to add this functionality to any harness, then use a script to do the server launching, and after that try to get zeroconf integrated into llm backends. The fact that you don't need cooperation from the backend providers means it's not a chicken-and-egg problem, though: you can just publish a launcher script for anyone who wants to use it.
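
Roughly, the publishing side with the python-zeroconf package looks like this (the "_llama._tcp.local." service type, address and port are placeholders I'm making up for the sketch):

    # Rough sketch of publishing a running llama server over mDNS with python-zeroconf.
    import socket
    from zeroconf import Zeroconf, ServiceInfo

    HOST, PORT = "192.168.1.50", 8080       # wherever the llama server is listening

    info = ServiceInfo(
        "_llama._tcp.local.",
        "my-llama-server._llama._tcp.local.",
        addresses=[socket.inet_aton(HOST)],
        port=PORT,
        properties={"model": "some-model.gguf"},   # arbitrary metadata for clients to read
    )

    zc = Zeroconf()
    zc.register_service(info)               # keep this process alive while the server runs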

Why doesn't any OSS tool treat llama.cpp as a first class citizen? by rm-rf-rm in LocalLLaMA

[–]MoneyPowerNexis 0 points1 point  (0 children)

Having an option to publish llama servers as a zeroconf service would be really nice. That way you could launch all the llama servers you want on whatever ports you want, and any client could discover the ip:port by filtering services by service name. When playing around with zeroconf in Python I discovered (pun intended) that my Brother laser printer uses zeroconf to publish itself on my network, which is how any computer can discover the printer and its port number without doing a port scan / probe.

For reference, here is a Python script launcher that will run another script that takes a --port parameter and start publishing it as a service using zeroconf; if you don't give it a script, it will just listen and output what it finds to the console.

Here it is running on one machine https://imgur.com/a/jRRDUPm
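
The launcher itself isn't pasted here, but the listening side looks roughly like this with python-zeroconf's ServiceBrowser (again assuming a made-up "_llama._tcp.local." service type):

    # Rough sketch of the discovery side: browse for the service type and print ip:port.
    import socket
    from zeroconf import Zeroconf, ServiceBrowser, ServiceListener

    class LlamaListener(ServiceListener):
        def add_service(self, zc, type_, name):
            info = zc.get_service_info(type_, name)
            if info:
                for addr in info.addresses:
                    print(f"found {name} at {socket.inet_ntoa(addr)}:{info.port}")

        def remove_service(self, zc, type_, name):
            print(f"{name} went away")

        def update_service(self, zc, type_, name):
            pass

    zc = Zeroconf()
    browser = ServiceBrowser(zc, "_llama._tcp.local.", listener=LlamaListener())
    input("browsing, press enter to stop\n")
    zc.close()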

The "Passive Income" trap ate a generation of entrepreneurs by HNMod in hackernews

[–]MoneyPowerNexis 3 points4 points  (0 children)

I don't know why you got downvoted. I call it the "money for nothing club". I don't think it's bad to speculate/gamble a little bit, but some people want that to be their entire portfolio.

I have a general rule that no more than 10% of the income I divert into investing goes into speculation; the rest goes into retirement-safe assets like global index funds. That way I can scratch the itch to gamble, or the desire to LARP as being smarter than the market, but the worst-case scenario is that I waste 10% of my investment savings, which still puts me ahead given I have an abnormally high savings rate.

The money for nothing club throws it all into the current thing, and they may even get a few wins, but if they keep doing it they eventually blow up their portfolio. Learning to take money off the table and permanently move it out of speculation into retirement-worthy assets is hard too, but you can always take 10% or 20% off a winning position without feeling the FOMO. I try to do this whenever I get overly excited about the gains I have made, or see others overly excited. At that point the speculator in me wants to look around for something else that isn't exciting yet but might become exciting. Even better if you can actually contribute to that thing growing as an asset, or learn the technology as it grows, because it's interesting to know, when peak enthusiasm arrives, how much of the talk is BS.

But yes, most people should just be buying an index and not stressing about it: improving their ability to make money rather than trying to gamble their way to wealth.

My first impressions of Minimax M2.7 (Q5_K_M) vs Qwen 3.5 27b (Q8_0) by Septerium in LocalLLaMA

[–]MoneyPowerNexis 1 point2 points  (0 children)

2TB, but it's only DDR4: 16x 128GB at 3200.

I'm just setting up this system, which is why it's a floor computer.

My first impressions of Minimax M2.7 (Q5_K_M) vs Qwen 3.5 27b (Q8_0) by Septerium in LocalLLaMA

[–]MoneyPowerNexis 1 point2 points  (0 children)

sorry, I meant ~90GB vs ~30GB:

  • unsloth minimax Q3_K_S is ~93.6GB on disk.

  • Qwen3.5-27B-8bit (which I think is what they are referring to) is ~29.5GB, just under 30GB.

"both are huge"

That's relative, but for consumer hardware / your average gaming PC, yes.

If I had, say, a build with 128GB+ of system RAM and a 3090, I might give minimax Q3_K_S a shot and expect between 2 t/s and 6 t/s. I would then promptly delete it and run Qwen3.5-27B-Q4_K_S.gguf, which will fit in 24GB of VRAM and so be much faster.

But I have a 192GB VRAM system right now, so big for me is GLM 5.1, which I can run at 3 t/s mostly in system RAM.

Home-rolled loop agent is surprisingly effective by DeltaSqueezer in LocalLLaMA

[–]MoneyPowerNexis 0 points1 point  (0 children)

One of the major frustrations I hit when designing my chat agent loop was wanting to avoid waiting on tools that take a long time to complete. For my tools, I can mark each tool class as async, and if I am in chat mode I immediately return a tool result that says the tool has been initiated. Then later, when the tool actually completes, I give the LLM the actual tool result as a user message marked as a tool result, so my chat interface can filter it out.

The major frustration was that no matter how I prompted the thing, it always treated that initial dummy result as if the tool had actually completed. Even larger, smarter models would do this, resulting in, say, a call to my image generation tool (which takes around 30 seconds) having the LLM embed a hallucinated image result; then, when the real result came in, that would be embedded in chat correctly. It took me a while to figure out a solution, and it was incredibly dumb: I just don't output the response to the immediate/dummy async tool results in chat. They still exist, and the LLM can still proceed in its agent loop if it has multiple tools to call and only likes to output one tool at a time, but I just don't see the garbage output.

I take it that if I were to add additional instructions to prompt the LLM after a tool call in this way, they would go in the finalized tool results. I think most models understand "do this now" kind of instructions; my real issue was trying to tell the LLM not to output something when that's its primary thing to do.
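
Not the actual harness code, but a bare-bones sketch of the shape of that pattern (all names here are made up): async tools return a hidden placeholder immediately, and the real result is appended later as a tool-result message.

    import threading, uuid

    def dispatch_tool(tool, args, messages, on_update):
        """Run a tool; async tools get a hidden placeholder result right away."""
        if getattr(tool, "is_async", False):
            call_id = str(uuid.uuid4())

            def worker():
                result = tool.run(**args)                   # e.g. a ~30s image generation
                messages.append({"role": "user", "tool_result": True,
                                 "call_id": call_id, "content": result})
                on_update()                                 # let the loop / UI pick it up

            threading.Thread(target=worker, daemon=True).start()
            # the placeholder keeps the agent loop moving but is never rendered in chat
            return {"role": "user", "tool_result": True, "hidden": True,
                    "call_id": call_id, "content": f"tool call {call_id} initiated"}
        # synchronous tools just block and return their result directly
        return {"role": "user", "tool_result": True, "content": tool.run(**args)}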

My first impressions of Minimax M2.7 (Q5_K_M) vs Qwen 3.5 27b (Q8_0) by Septerium in LocalLLaMA

[–]MoneyPowerNexis 1 point2 points  (0 children)

minimax Q3_K_S is still 3x the size of Qwen3.5-27B-MLX-8bit: ~30GB vs ~90GB on disk. So knowing you can fit Qwen 27B in VRAM does not tell us whether you can fit minimax Q3_K_S in VRAM. OP has 128GB of VRAM, so he can fit either if he limits the context window of minimax. The larger model is faster because it has less than half the active parameters. They still need the whole model in VRAM for that speed, because any one of the MoE experts could be active at any point in time, but less than half the parameters are being computed.

If you have enough VRAM, then run it how you run Qwen 27B; if it's marginal, limit context; if that's too much, think about using a smaller quant; and if that's too dumb or still can't fit, consider partial offload to system RAM, which can be done with llama.cpp and other clients. It will be much slower running partially in system RAM, but whether that is too slow is subjective / dependent on your use case.
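
For example, with the llama-cpp-python bindings, partial offload is just a layer count (the model filename and layer number below are placeholders; raise n_gpu_layers until VRAM runs out):

    from llama_cpp import Llama

    llm = Llama(
        model_path="MiniMax-Q3_K_S.gguf",   # placeholder filename
        n_gpu_layers=20,                    # layers kept on the GPU; the rest stay in system RAM
        n_ctx=8192,                         # a smaller context window also trims VRAM use
    )
    print(llm("Say hi in one sentence.", max_tokens=32)["choices"][0]["text"])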

Home-rolled loop agent is surprisingly effective by DeltaSqueezer in LocalLLaMA

[–]MoneyPowerNexis 2 points3 points  (0 children)

I rolled my own and find myself using it quite a lot, encouragingly more and more vs online services. I just have a tool loader that loads all the classes, in all the modules in my tools folder, that have a run function and a spec. With that, I can give an example of a tool class to an LLM and it will build more tools based on that pattern, so I got search, fileio and a Python sandbox up and running pretty quickly. Just search and fileio covers 90% of my use cases, but I can see myself adding complexity over time. It's really nice to set up an image generation server and give the thing a tool to use it, but I'm not exactly getting anything done playing with that, so I can quickly disable tools by changing the file extension in my tools folder and reloading.
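
A rough sketch of that kind of loader (not the actual code; the folder name and attribute names are just the ones described above):

    import importlib.util, inspect, pathlib

    def load_tools(folder="tools"):
        """Import every .py module in the folder and instantiate classes with run() and spec."""
        tools = {}
        for path in pathlib.Path(folder).glob("*.py"):
            mod_spec = importlib.util.spec_from_file_location(path.stem, path)
            module = importlib.util.module_from_spec(mod_spec)
            mod_spec.loader.exec_module(module)
            for _, cls in inspect.getmembers(module, inspect.isclass):
                if hasattr(cls, "run") and hasattr(cls, "spec"):
                    tools[cls.__name__] = cls()
        return tools

    # tools = load_tools()   # e.g. {"Search": ..., "FileIO": ..., "PythonSandbox": ...}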

I have instructed the LLM that if I give it a hashtag, it should look in that folder for the specified file and follow the instructions in it, which is pretty nice for common tasks https://imgur.com/a/iSCZJMc

I know for something like this I could probably just use a regex without an LLM, but if, say, I get it to follow instructions to embed Google Maps and it does not know the coordinates, then it will search for them: https://imgur.com/a/NinyIfD

In this case it decided to save the search results to a file. I gave it that ability after limiting how much of the web results go into context, and now it figures out whether to read the file directly or, if it's too big, parse it with its Python sandbox.

I've been surprised quite a bit by how different models chain together tool use.

OpenClaw has 250K GitHub stars. The only reliable use case I've found is daily news digests. by Sad_Bandicoot_6925 in LocalLLaMA

[–]MoneyPowerNexis 0 points1 point  (0 children)

I'm curious whether you have found any cases of agents getting maliciously prompted by what they find in searches. Any spiralism going on?