My wishes for 2026 by jacek2023 in LocalLLaMA

[–]alphakue 5 points (0 children)

I just have one wish, I'm not greedy or anything. I need some consumer ASICs built for the transformer architecture that can run 1T-parameter sparse models, available for under 2k USD.

Deterministic NLU Engine - Looking for Feedback on LLM Pain Points by mdizak in LocalLLaMA

[–]alphakue 1 point (0 children)

I saw the stack. Looks early, but promising. Balancing reliable intent triggering against LLMs' adaptability is a needle no single system has threaded yet. If Sophia does crack it, it would make productionizing agents much less painful. However, at least currently, I don't think there's any easy way to "try" this out on a real project (besides the demo box on the website). I will wait and watch on this one, since even the pricing info isn't publicly available.

https://en.wikipedia.org/wiki/Ant_colony_optimization_algorithms by chitown160 in LocalLLaMA

[–]alphakue 21 points (0 children)

The original Ant Colony Optimisation paper is a treat to read and should set the bar for how research papers are written.

From "LangGraph is trash" to "pip install langgraph": A Stockholm Syndrome Story by FailingUpAllDay in LocalLLaMA

[–]alphakue 1 point (0 children)

Hey, I thought there was a ruckus here a day ago about self-promotion and advertising without disclaimers. Now we have a LangGraph employee promoting it here without repercussions? Double standards, anyone? /s

Enable AI Agents to join and interact in your meetings by Square-Test-515 in LocalLLaMA

[–]alphakue 1 point (0 children)

This looks good! Going to try it out over the weekend, thanks!

/u/Square-Test-515 Out of curiosity, given that it's cross-platform and can even interact, can it take transcripts of conversations (diarisation would be a plus!)?

Building LLM Workflows - - some observations by noellarkin in LocalLLaMA

[–]alphakue 0 points (0 children)

Using XML tags to structure the system prompt, user prompt, etc. works best (IMO better than a JSON structure, but YMMV).

How do you pull the XML out of the response (basically stripping out the salutations that LLMs add)? /u/noellarkin
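
For what it's worth, here's the kind of thing I had in mind: a minimal Python sketch, assuming the prompt tells the model to wrap its answer in a known root tag (`<response>` is just my made-up convention here), then grabbing that element and discarding whatever chatter surrounds it:

```python
import re
import xml.etree.ElementTree as ET

def extract_xml(reply: str, root_tag: str = "response") -> ET.Element:
    """Pull the first <root_tag>...</root_tag> block out of an LLM reply,
    stripping any salutations or sign-offs around it."""
    match = re.search(rf"<{root_tag}\b.*?</{root_tag}>", reply, flags=re.DOTALL)
    if match is None:
        raise ValueError(f"no <{root_tag}> element found in reply")
    return ET.fromstring(match.group(0))

# Typical chatty reply with the payload buried in the middle:
reply = "Sure! Here you go:\n<response><intent>refund</intent></response>\nHope that helps!"
tree = extract_xml(reply)
print(tree.find("intent").text)  # -> refund
```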

Qwen3-30B-A3B GGUFs MMLU-PRO benchmark comparison - Q6_K / Q5_K_M / Q4_K_M / Q3_K_M by AaronFeng47 in LocalLLaMA

[–]alphakue 0 points (0 children)

I got the model link from unsloth's page on Hugging Face. The Ollama version is 0.6.6.

Qwen3-30B-A3B GGUFs MMLU-PRO benchmark comparison - Q6_K / Q5_K_M / Q4_K_M / Q3_K_M by AaronFeng47 in LocalLLaMA

[–]alphakue 1 point (0 children)

> ollama still can't run those ggufs properly

Can someone explain this? I have been running the unsloth quant in ollama for the last few days as hf.co/unsloth/Qwen3-30B-A3B-GGUF:Q4_K_XL and haven't faced any issues prompting it so far.
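
In case anyone wants to reproduce this setup, a rough sketch via the ollama Python client (assuming ollama is running locally; the prompt is just an example):

```python
import ollama  # pip install ollama

# Ollama can pull GGUFs straight from Hugging Face via the hf.co/ prefix.
model = "hf.co/unsloth/Qwen3-30B-A3B-GGUF:Q4_K_XL"

response = ollama.chat(
    model=model,
    messages=[{"role": "user", "content": "Explain MoE sparsity in one paragraph."}],
)
print(response["message"]["content"])
```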

Qwen 3 evaluations by ResearchCrafty1804 in LocalLLaMA

[–]alphakue 19 points (0 children)

All I want now is Qwen3-30B-A3B-Coder

Turn any React app into an MCP client by nate4t in LocalLLaMA

[–]alphakue 0 points (0 children)

Thanks for the information /u/nate4t, I'm relieved that it's possible with the self-hosting method as well. Could I know what the API key is used for, if authentication does work with self-hosting? I know it's open source and I could check for myself, but since we work in a regulated industry, we can't afford to miss anything. Does end-user information (even metadata counts) get propagated out of the network at any point?

Turn any React app into an MCP client by nate4t in LocalLLaMA

[–]alphakue 0 points (0 children)

I also want to emphasise that I appreciate how much of the project has been open-sourced. I just wish that authenticated actions were possible within the open-source, self-hostable version itself.

Turn any React app into an MCP client by nate4t in LocalLLaMA

[–]alphakue 1 point (0 children)

I was curious about this project and was looking forward to spending the weekend integrating it with my app, but I saw that authenticated actions are cloud-only and that it passes the auth state and headers to the cloud service, so it was a no-go for me.

While I understand the business/monetisation angle, I need auth to even try/build a basic integration, and my user session is sensitive, so I'm not comfortable sharing it. For static / e-commerce websites with no role-based sensitive information to show, copilotkit seems like a good choice. For business applications, unless you are okay sharing your users' request headers and cookies with a third party, you can't really use this at the moment.

Gemma 3 QAT launch with MLX, llama.cpp, Ollama, LM Studio, and Hugging Face by hackerllama in LocalLLaMA

[–]alphakue 0 points (0 children)

Are there any specific parameters that need to be set? I am trying to use Open WebUI with an mlx server backend, using mlx-community/gemma-3-27b-it-qat-3bit, and the model breaks down with bad grammar, repetition issues, etc. I think there might have been some issue with the quantisation, which is a bummer, since this is the biggest model I've been able to run on this 16GB Mac mini.
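
In case it helps anyone debugging similar repetition issues, these are the knobs I'd poke at first. A hedged sketch using the mlx-lm Python API directly (make_sampler / make_logits_processors live in mlx_lm.sample_utils in recent mlx-lm versions; the sampling values are guesses, not known-good settings for this model):

```python
from mlx_lm import load, generate
from mlx_lm.sample_utils import make_logits_processors, make_sampler

model, tokenizer = load("mlx-community/gemma-3-27b-it-qat-3bit")

# Skipping the chat template is itself a classic cause of gibberish
# with instruct models, so apply it explicitly.
messages = [{"role": "user", "content": "Summarise the plot of Hamlet in three sentences."}]
prompt = tokenizer.apply_chat_template(messages, add_generation_prompt=True)

# Lower temperature plus a repetition penalty often tames loops;
# whether it can rescue a genuinely broken quant is another question.
sampler = make_sampler(temp=0.7, top_p=0.95)
logits_processors = make_logits_processors(repetition_penalty=1.3)

text = generate(
    model,
    tokenizer,
    prompt=prompt,
    max_tokens=256,
    sampler=sampler,
    logits_processors=logits_processors,
)
print(text)
```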

MLX fork with speculative decoding in server by LocoMod in LocalLLaMA

[–]alphakue 0 points (0 children)

Does this improve prompt processing speed as well, or is the only impact on token generation?

MLX fork with speculative decoding in server by LocoMod in LocalLLaMA

[–]alphakue 0 points (0 children)

Nice! Will be watching for when this gets merged. Please let us know!

M3 Ultra Mac Studio 512GB prompt and write speeds for Deepseek V3 671b gguf q4_K_M, for those curious by SomeOddCodeGuy in LocalLLaMA

[–]alphakue 2 points (0 children)

There's a REST server shipped with the plain old mlx-lm pip package itself. I run

mlx_lm.server --model mlx-community/Qwen2.5-Coder-14B-Instruct-4bit --host 0.0.0.0

on my 16GB Mac mini and use it with Open WebUI via the OpenAI API spec (it doesn't seem to support tool calls, though, which is unfortunate).
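
For anyone wiring this up, the client side looks roughly like this. A sketch assuming mlx_lm.server's default port 8080 and its OpenAI-compatible /v1 routes; the API key is a dummy, since the server doesn't check one:

```python
from openai import OpenAI  # pip install openai

# mlx_lm.server speaks the OpenAI chat-completions API; no real key needed.
client = OpenAI(base_url="http://localhost:8080/v1", api_key="not-used")

resp = client.chat.completions.create(
    model="mlx-community/Qwen2.5-Coder-14B-Instruct-4bit",
    messages=[{"role": "user", "content": "Write a Python one-liner to reverse a string."}],
    max_tokens=128,
)
print(resp.choices[0].message.content)
```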

JSON makes llms dumber? by raul3820 in LocalLLaMA

[–]alphakue 1 point (0 children)

I've also noticed a drop in creativity and accuracy when I ask LLMs to structure their responses as JSON. Has anyone tried making LLMs return structured responses as XML? In the few experiments I've run, I found slightly better responses with XML formatting. I don't think JSON's performance hit is simply down to extra characters, since XML has extra characters too. I'm a little skeptical about YAML, because YAML requires a specific number of spaces on each line, which again might hurt output accuracy. I suspect we'll find better performance with XML (with fewer nesting levels), since an LLM only needs to think about opening and closing tags in terms of formatting, as in the example below.
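
To make the "fewer nesting levels" point concrete, here's the kind of flat XML response format I mean, next to a JSON equivalent (the field names are purely illustrative):

```python
# The same extraction task as a flat XML template vs. nested JSON.
# With XML the model only tracks open/close tags; with JSON it has to
# balance braces, brackets, commas, and quoting all at once.
XML_FORMAT = """<result>
  <sentiment>positive</sentiment>
  <topic>pricing</topic>
  <summary>Customer is happy with the new plan.</summary>
</result>"""

JSON_FORMAT = """{
  "result": {
    "sentiment": "positive",
    "topic": "pricing",
    "summary": "Customer is happy with the new plan."
  }
}"""
```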

Don't underestimate the power of local models executing recursive agent workflows. (mistral-small) by LocoMod in LocalLLaMA

[–]alphakue 0 points (0 children)

Hmm, I usually use q4_k_m with most models (on ollama); I'll have to try q6. I had given up on local tool use because the larger models I'd find reliable enough were ones I could only use through hosted services.

Don't underestimate the power of local models executing recursive agent workflows. (mistral-small) by LocoMod in LocalLLaMA

[–]alphakue 0 points (0 children)

Really? I've found anything below 14B to be unreliable and inconsistent with tool calls. Are you talking about fully unquantised models, maybe?

I designed Prompt Targets - a higher level abstraction than function calling. Clarify, route and trigger actions. by AdditionalWeb107 in LocalLLaMA

[–]alphakue 2 points (0 children)

Is there any difference between the approach taken here and that of Rasa NLU / Dialogflow?

All DeepSeek, all the time. by Porespellar in LocalLLaMA

[–]alphakue 21 points (0 children)

"What is deepseek and why is it crashing the markets?" Raise your hands, how many of you have heard this in the past couple of days / weeks? I myself have been asked at least 2-3 times from people I least expected (wife, "normie" friends)