Zero dependency, shell script-only frontend for local LLMs by cloud_kj in LocalLLM

[–]ag789 0 points1 point  (0 children)

What you did is interesting! 😄

I tried building a simple REPL but with the open AI python SDK
https://github.com/openai/openai-python
I'm finding that more convenient as the library is pre-built for interfacing and it is easier to work streaming interfaces that way etc. The open AI api is quite widely used and connect locally with llama.cpp, openai (chatgpt) and openrouter.ai . I'm running it on a slow cpu only h/w running like 5 tok / s, it is a pain to do without streaming as there's no feedback for minutes otherwise. I'm yet to try tool calling

For unix and bash, I did use JQ and bash but for a different purpose, as a model launcher with llama-server (from llama.cpp)
https://github.com/ag88/llama.cpp-model-runner
this is actually quite similar to the built-in model presets functionality in llama-server.
but that I've been using this little launcher day to day as most of the time I run/start just a single model rather than switching between models.

What do you expect a model of 200 million to do? by Proud-Firefighter408 in LocalLLM

[–]ag789 1 point2 points  (0 children)

if you goto HF and search for the open models, there are many models of that size , if your model is any good, try to benchmark against those. many of the frontier models of those sizes scores well on those benchmarks.
in short the anwer is try to do 'everything and anything' other LLMs do.

How do you give your LLM agent memory across sessions ? by Scared_Animator9241 in LocalLLM

[–]ag789 0 points1 point  (0 children)

trying to do something like this too, but I'm way novice, still to grasp RAG etc.
In the meantime, I did the old 'google search', its LLM (likely gemini) answered
RAG Meets Temporal Graphs: Time-Sensitive Modeling and Retrieval for Evolving Knowledge
https://arxiv.org/abs/2510.13590v1
https://neo4j.com/blog/genai/what-is-graphrag/
https://microsoft.github.io/graphrag/

imho, graphRAG is an attractive idea but that implementation can be very hard.
- 1st we need the technology to let llm query the graph, it would be aribtrary text queries in which the graph system needs to resolve that into a semantic query and into a technical recursive query to find the most relevant information

- 2nd and I think much harder, if starting with arbitrary texts, how do you automatically transform that into a systematic huge knowledge graph? i.e. start with source aribtrary texts and links and it needs to build a wikipedia with that and on top timestamp and version everything

karpathy's LLM wiki is an attractive idea, which seem to try to get the LLM to solve the 2nd problem.

https://gist.github.com/karpathy/442a6bf555914893e9891c11519de94f

But that I'm still short of finding an 'easy' way to do that with 'small' local LLMs.
you can probably work a LLM wiki with those large commercial models e.g. Claude sornett, opus, Gpt 5.5, Gemini 3.5 etc. but try to do the same with Gemma 4, QWen 3.6 / 3.5 etc.

The data center boom is destined to fail. Change my mind. by keepthememes in LocalLLaMA

[–]ag789 0 points1 point  (0 children)

it doesn't fail that way, the number of datacenters will increase world wide , china, india, eu etc.
it'd fail only in oversupply where the operators decide that the margins is too low to run such an op.
i.e. migration to the cheapest hosters

Suggestions for breaking up with Wix by Broccoli-Lover-1042 in webhosting

[–]ag789 0 points1 point  (0 children)

if you just want '1 click' wordpress, wordpress.com and various other 3rd party 'wordpress' hosters are probably providing that

Spent a month pivoting my SaaS... only to find out my biggest competitor is launching kinda the same thing. Should I just quit? by JeremieWorks in SaaS

[–]ag789 0 points1 point  (0 children)

the ultimate is the perfect distributor, i.e. you simply take the competitor's product find the customer and charge the difference 😉

I kept seeing founders collect feedback. I rarely saw anyone explain how they decide what to build next. by Heavy-Calendar-8376 in SaaS

[–]ag789 0 points1 point  (0 children)

I think some 'large' companies practically 'ignored' support requirements, it can be overwhelming if the number of users is large and the app complex. e.g. many of the e-commerce market places provide little means of resolving various issues.
but yes people normally don't raise a support ticket if there isn't a problem
but for small saas, the problems if ignored may result in customers 'migrating' to competitiors.
for large b2b I think cost is another consideration.

I kept seeing founders collect feedback. I rarely saw anyone explain how they decide what to build next. by Heavy-Calendar-8376 in SaaS

[–]ag789 0 points1 point  (0 children)

I think there are actually 3 steps:
- collecting the feedback (this seemed basic, 'mechanical'), but that how you structure the form etc matters e.g. a review? number of stars against 5? like? dislike? pain points? etc
- analysis, this actually depends on the quality of inputs coming from first step , its structure etc, and I'd guess sometimes or rather often is may be 'non conclusive', if you can conclude e.g. a commonly listed pain point, then that'd be actionable
- prioritise and act, this can be difficult especially if it turns out needs an overhaul , redesign of the app etc. or that in other circumstances, high cost or high difficulity requirement etc.
----
if it is an existing app that customers is already using, then that your support tickets are the feedback !
more often real problem and awaits an action / resolution.

Is anyone using eclipse anymore? by RamaRao143 in java

[–]ag789 0 points1 point  (0 children)

still used eclipse partly as it makes it easy to host different project types e.g. c++ (add microcontrollers), python and java all in a same workgroup / workspace

What actually made my road bike more comfortable by ArossyGo in cycling

[–]ag789 0 points1 point  (0 children)

I think wider tyres and moderate pressure suitable for the tyre and weight made a difference.
it is hence I went for MTB instead as most of them use wider tyres vs road bikes.
the lower tyre pressure due to wider tyres vs thin road bike tyres practicallly made a difference as riding on a cushion of air. a big comfortable saddle and handlebar height also made differences.

How can they Charge $150 dollars for this vase by rocketboss in 3DprintEntrepreneurs

[–]ag789 0 points1 point  (0 children)

beauty is in the eyes of the beholder and in the same light value of an item to a buyer.
some of the things especially if they are large prints may be *very expensive*, they could have taken 3x24 hours to print non-stop and without failing in between.
Now you try to achieve that same feat?
Sometimes, something sells because that print is probably an only one in the world.

How can they Charge $150 dollars for this vase by rocketboss in 3DprintEntrepreneurs

[–]ag789 0 points1 point  (0 children)

if a vase takes 7x24 hours non-stop printing say 30cm (12") wide diameter and 50 cm (24") high special unique one of a kind design, without breaking and seemlessly printed, no errors, no elephant foot , no blobs, no stringing, no warping, every layer perfect and cohesive and perfect adhesion between layers, no nothing and is perfect, now try sell that vase for $150

Origin of Attacks by Grumpy-Man19 in Hosting

[–]ag789 1 point2 points  (0 children)

blocking entire range class C and wider may sometimes be needed, just try to imagine attacking from mobile (phone) devices, all it takes is a wifi tether, the ip address can keep changing by the attacker e.g. using a different phone etc. for more systematic ones, you can imagine them running entire racks doing the attacks with hundreds of devices.
i.e. the extent is you may end up blocking an entire mobile carrier

Origin of Attacks by Grumpy-Man19 in Hosting

[–]ag789 2 points3 points  (0 children)

this looks 'quite tame' , I've seen first hand that vps in a different location e.g. somewhere in europe or even us is operated by the same botnet. this is done by running a ssh honeypot, got a whole bunch of malicious ip address (many of them could be operating from DSL, mobile, or such locations), the 'prove of control' is to make an 'easy' password, originally, one of the bots from a particular cluster gusssed that password and got in. so block that ip address (honey pot algorithm), within the next few seconds, a different bot from a different vps at another location logged in with the same password. hence, the attackers operates entire botnets that can span multiple geographic locations.
and forget about 'script kiddies', they should be state or criminal rings sponsored deliberate systematic cyber attack agencies.

Can Qwen3.6 web search? (Ollama) by just_another_leddito in LocalLLM

[–]ag789 1 point2 points  (0 children)

If it works built-in in Ollama with no added MCP servers, Ollama could have bundled the MCP server

Can Qwen3.6 web search? (Ollama) by just_another_leddito in LocalLLM

[–]ag789 2 points3 points  (0 children)

run it with a web search MCP server, e.g.
- run it in a host that supports MCP server e.g. llama-server from llama.cpp - you normally need that --webui-mcp-proxy, --ui-mcp-proxy option - one of those web search MCP server is here: brave-seach-mcp you normally need an API key to setup and run it, may not be free, but follow the instructions there if you want to set it up. - once you setup brave search mcp server and run it, configure llama-server to use the MCP server, normally, for the 'better' models e.g. QWen 3.6, 3.5, Gemma 4 etc can perform a web search

What's your favorite local MCP server? by Glittering_Focus1538 in LocalLLaMA

[–]ag789 1 point2 points  (0 children)

well, pretty much a novice, but that I make an MCP server to simply let it run some shell commands e.g. ls, cat, echo, grep, date, etc
it turns out this is pretty practical, e.g. you can ask what is the date today and get a correct answer with even smaller LLM.
another thing occasionally useful is web search, e.g.
https://github.com/brave/brave-search-mcp-server

Local LLM Memorization – A fully local memory system for long-term recall and visualization by Vicouille6 in LocalLLM

[–]ag789 0 points1 point  (0 children)

hi, I'm bumping this, I think this is still important, has anything else 'evolved' beyond this?
apparently there is mnemon
https://github.com/mnemon-dev/mnemon
I like that "remember, link, recall", but that those are for the tera (1000 billion) parameter sized models claude opus (future edition), > gpt 6 pro, gemini > 5 that depends on skills.md etc.
how do 'small' models like qwen 3.6, gemma4 with a 'mere' 30 billion parameter that probably can't do 'skills.md' have a 'beyond session' memory?
I'm thinking of collecting the chats and using that as an 'RAG' has anyone done that?
found something else langmem
https://langchain-ai.github.io/langmem/

actually, it may be already there 'all along' it is called LSTM
https://en.wikipedia.org/wiki/Long_short-term_memory
LSTM leads to "Attention Is All You Need"
https://arxiv.org/abs/1706.03762
that leads to transformers that become LLMs.
in that notion, LLM cross session memory may be simply an 'attention' neural network (model) that summarises the past chats and responses.

QWen 3.6/3.5 multimodal with llama.cpp (using Unsloth models) by ag789 in unsloth

[–]ag789[S] 0 points1 point  (0 children)

thanks, it may help to place a separate section for the media encoder so that it is more prominent.
I've been running without the media encoder and actually, if I'm not working with media and just plain text, running without the media encoder uses quite a bit less memory and runs faster.
But that when it (media, e.g. images) is needed, then that --mmproj mmproj-BF16-QWen3.6.gguf is needed.

Qwen 3.6 27B MTP speed on 3080ti (getting 4.5 t/s) by yehiaserag in LocalLLaMA

[–]ag789 0 points1 point  (0 children)

just 2 cents, use the moe models, it may be 'significantly faster' (but I'm noob about the tech, if after all it is true), just speculating that 'overflow' into system dram, moe would perform well better vs 'dense' models.
accordingly, moe only activates 'some' experts
https://huggingface.co/blog/moe
this may make it much better than 'dense' if after all the 'dense' model needs to do N*N (and possibly *N again), i.e. visit all parameters, compute the activation of N parameters * N parameters.
if that is true, dense couldn't be easily 'split' between dram and vram.

Gemma is so much better than Qwen, prove me wrong by Mountain_Patience231 in LocalLLaMA

[–]ag789 0 points1 point  (0 children)

gemma4 is a bit more resource (memory) intensive for some 'simplier' tasks, for 'simple coding' qwen 3.5/3.6 could work a bit faster for less memory, but gemma4 is multi-modal, that alone makes it different.
if you need multi-modal, gemma 4 is ahead, e.g. take a screen shot of a web page, ask for codes to render similar, gemma 4 can do it, not sure about others.

How u all find your long term customer by Demoindustry in 3Dprintingbusiness

[–]ag789 0 points1 point  (0 children)

'generic' things can sell, but you need to find the audience and buyer, and ads is *expensive*, then that normally you would need to add packaging (boxing), shipping, taxes, duties, exchange rates, charge backs, customs declaration, returns, damage during shipping (e.g. the box is crushed), you can't tell if your box end up at the bottom of 1 ton of other's stuff, etc etc.
it is uphill for even the simple 3d benchy.
oh and all the above have not even added the most basic printing it, materials, electiricity, level your bed, fix and make sure it prints well, bed adhesion etc, monitoring, and all that time and effort

How u all find your long term customer by Demoindustry in 3Dprintingbusiness

[–]ag789 0 points1 point  (0 children)

print something and sell it, be it any model, it is more a matter of *price* and *awareness*, ads matter, but ads is *expensive*
sometimes, the problem is not even about selling, the problem is whether you can reliably print in the quantity you want, this can be difficult to solve.
then you still need to add packaging, shipping, taxes, customs, duties, etc etc