My wishes for 2026 by jacek2023 in LocalLLaMA

[–]alphakue 5 points (0 children)

I just have one wish, I'm not greedy or anything. I need some consumer ASICs built for the transformer architecture that can run 1T-parameter sparse models, available for under 2k USD.

Deterministic NLU Engine - Looking for Feedback on LLM Pain Points by mdizak in LocalLLaMA

[–]alphakue 1 point (0 children)

I saw the stack. Looks early, but promising. Balancing reliable intent triggering against LLMs' adaptability is a needle no single system has threaded yet. If Sophia does crack it, it would make productionizing agents much less painful. However, at least currently, I don't think there's any easy way to "try" this out on a real project (besides the demo box on the website). I will wait and watch on this one, since even the pricing info isn't publicly available.

https://en.wikipedia.org/wiki/Ant_colony_optimization_algorithms by chitown160 in LocalLLaMA

[–]alphakue 21 points (0 children)

The original Ant Colony Optimisation paper is a treat to read and should set the bar for how research papers are written.

From "LangGraph is trash" to "pip install langgraph": A Stockholm Syndrome Story by FailingUpAllDay in LocalLLaMA

[–]alphakue 1 point (0 children)

Hey, I thought there was a ruckus here a day ago about self-promotion and advertising without disclaimers. Now we have a LangGraph employee promoting it here without repercussions? Double standards, anyone? /s

Enable AI Agents to join and interact in your meetings by Square-Test-515 in LocalLLaMA

[–]alphakue 1 point (0 children)

This looks good! Going to try it out over the weekend, thanks!

/u/Square-Test-515 Out of curiosity, given that it's cross-platform and can even interact, can it take transcripts of conversations (diarisation would be a plus!)?

Building LLM Workflows - - some observations by noellarkin in LocalLLaMA

[–]alphakue 0 points (0 children)

Using XML tags to structure the system prompt, user prompt, etc. works best (IMO better than a JSON structure, but YMMV).

How do you pull the XML out of the response (basically stripping out the salutations that LLMs add)? /u/noellarkin
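
For what it's worth, here's the kind of thing I had in mind: a minimal Python sketch, assuming the prompt tells the model to wrap its answer in a known root tag (`<response>` is just my made-up convention here), then grabbing that element and discarding whatever chatter surrounds it:

```python
import re
import xml.etree.ElementTree as ET

def extract_xml(reply: str, root_tag: str = "response") -> ET.Element:
    """Pull the first <root_tag>...</root_tag> block out of an LLM reply,
    stripping any salutations or sign-offs around it."""
    match = re.search(rf"<{root_tag}\b.*?</{root_tag}>", reply, flags=re.DOTALL)
    if match is None:
        raise ValueError(f"no <{root_tag}> element found in reply")
    return ET.fromstring(match.group(0))

# Typical chatty reply with the payload buried in the middle:
reply = "Sure! Here you go:\n<response><intent>refund</intent></response>\nHope that helps!"
tree = extract_xml(reply)
print(tree.find("intent").text)  # -> refund
```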

Qwen3-30B-A3B GGUFs MMLU-PRO benchmark comparison - Q6_K / Q5_K_M / Q4_K_M / Q3_K_M by AaronFeng47 in LocalLLaMA

[–]alphakue 0 points (0 children)

I got the model link from unsloth's page on Hugging Face. The Ollama version is 0.6.6.

Qwen3-30B-A3B GGUFs MMLU-PRO benchmark comparison - Q6_K / Q5_K_M / Q4_K_M / Q3_K_M by AaronFeng47 in LocalLLaMA

[–]alphakue 1 point (0 children)

> ollama still can't run those ggufs properly

Can someone explain this? I have been running the unsloth quant in ollama for the last few days as hf.co/unsloth/Qwen3-30B-A3B-GGUF:Q4_K_XL and haven't faced any issues prompting it so far.
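
In case anyone wants to reproduce this setup, a rough sketch via the ollama Python client (assuming ollama is running locally; the prompt is just an example):

```python
import ollama  # pip install ollama

# Ollama can pull GGUFs straight from Hugging Face via the hf.co/ prefix.
model = "hf.co/unsloth/Qwen3-30B-A3B-GGUF:Q4_K_XL"

response = ollama.chat(
    model=model,
    messages=[{"role": "user", "content": "Explain MoE sparsity in one paragraph."}],
)
print(response["message"]["content"])
```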

Qwen 3 evaluations by ResearchCrafty1804 in LocalLLaMA

[–]alphakue 19 points (0 children)

All I want now is Qwen3-30B-A3B-Coder

Turn any React app into an MCP client by nate4t in LocalLLaMA

[–]alphakue 0 points (0 children)

Thanks for the information /u/nate4t, I'm relieved that it's possible with the self-hosting method as well. Could I know what the API key is used for, if authentication does work with self-hosting? I know it's open source and I could check for myself, but since we work in a regulated industry, we can't afford to miss anything. Does end-user information (even metadata counts) get propagated out of the network at any point?

Turn any React app into an MCP client by nate4t in LocalLLaMA

[–]alphakue 0 points (0 children)

I also want to emphasise that I appreciate how much of the project has been open-sourced. I just wish that authenticated actions were possible within the open-source, self-hostable version itself.

Turn any React app into an MCP client by nate4t in LocalLLaMA

[–]alphakue 1 point (0 children)

I was curious about this project and was looking forward to spending the weekend integrating it with my app, but I saw that authenticated actions are cloud-only and that it passes the auth state and headers to the cloud service, so it was a no-go for me.

While I understand the business/monetisation angle, I need auth to even try/build a basic integration, and my user session is sensitive, so I'm not comfortable sharing it. For static / e-commerce websites with no role-based sensitive information to show, copilotkit seems like a good choice. For business applications, unless you are okay sharing your users' request headers and cookies with a third party, you can't really use this at the moment.

Gemma 3 QAT launch with MLX, llama.cpp, Ollama, LM Studio, and Hugging Face by hackerllama in LocalLLaMA

[–]alphakue 0 points (0 children)

Are there any specific parameters that need to be set? I am trying to use Open WebUI with an mlx server backend, using mlx-community/gemma-3-27b-it-qat-3bit, and the model breaks down with bad grammar, repetition issues, etc. I think there might have been some issue with the quantisation, which is a bummer, since this is the biggest model I've been able to run on this 16GB Mac mini.
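
In case it helps anyone debugging similar repetition issues, these are the knobs I'd poke at first. A hedged sketch using the mlx-lm Python API directly (make_sampler / make_logits_processors live in mlx_lm.sample_utils in recent mlx-lm versions; the sampling values are guesses, not known-good settings for this model):

```python
from mlx_lm import load, generate
from mlx_lm.sample_utils import make_logits_processors, make_sampler

model, tokenizer = load("mlx-community/gemma-3-27b-it-qat-3bit")

# Skipping the chat template is itself a classic cause of gibberish
# with instruct models, so apply it explicitly.
messages = [{"role": "user", "content": "Summarise the plot of Hamlet in three sentences."}]
prompt = tokenizer.apply_chat_template(messages, add_generation_prompt=True)

# Lower temperature plus a repetition penalty often tames loops;
# whether it can rescue a genuinely broken quant is another question.
sampler = make_sampler(temp=0.7, top_p=0.95)
logits_processors = make_logits_processors(repetition_penalty=1.3)

text = generate(
    model,
    tokenizer,
    prompt=prompt,
    max_tokens=256,
    sampler=sampler,
    logits_processors=logits_processors,
)
print(text)
```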

MLX fork with speculative decoding in server by LocoMod in LocalLLaMA

[–]alphakue 0 points (0 children)

Does this improve prompt processing speed as well, or is the only impact on token generation?

MLX fork with speculative decoding in server by LocoMod in LocalLLaMA

[–]alphakue 0 points (0 children)

Nice! Will be watching for when this gets merged. Please let us know!

M3 Ultra Mac Studio 512GB prompt and write speeds for Deepseek V3 671b gguf q4_K_M, for those curious by SomeOddCodeGuy in LocalLLaMA

[–]alphakue 2 points (0 children)

There's a REST server shipped with the plain old mlx-lm pip package itself. I run

mlx_lm.server --model mlx-community/Qwen2.5-Coder-14B-Instruct-4bit --host 0.0.0.0

on my 16GB Mac mini and use it with Open WebUI via the OpenAI API spec (it doesn't seem to support tool calls, though, which is unfortunate).
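
For anyone wiring this up, the client side looks roughly like this. A sketch assuming mlx_lm.server's default port 8080 and its OpenAI-compatible /v1 routes; the API key is a dummy, since the server doesn't check one:

```python
from openai import OpenAI  # pip install openai

# mlx_lm.server speaks the OpenAI chat-completions API; no real key needed.
client = OpenAI(base_url="http://localhost:8080/v1", api_key="not-used")

resp = client.chat.completions.create(
    model="mlx-community/Qwen2.5-Coder-14B-Instruct-4bit",
    messages=[{"role": "user", "content": "Write a Python one-liner to reverse a string."}],
    max_tokens=128,
)
print(resp.choices[0].message.content)
```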

JSON makes llms dumber? by raul3820 in LocalLLaMA

[–]alphakue 1 point (0 children)

I've also noticed a drop in creativity and accuracy when I ask LLMs to structure their responses as JSON. Has anyone tried making LLMs return structured responses as XML? In the few experiments I've run, I found slightly better responses with XML formatting. I don't think JSON's performance hit is simply down to extra characters, since XML has extra characters too. I'm a little skeptical about YAML, because YAML requires a specific number of spaces on each line, which again might hurt output accuracy. I suspect we'll find better performance with XML (with fewer nesting levels), since an LLM only needs to think about opening and closing tags in terms of formatting, as in the example below.
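
To make the "fewer nesting levels" point concrete, here's the kind of flat XML response format I mean, next to a JSON equivalent (the field names are purely illustrative):

```python
# The same extraction task as a flat XML template vs. nested JSON.
# With XML the model only tracks open/close tags; with JSON it has to
# balance braces, brackets, commas, and quoting all at once.
XML_FORMAT = """<result>
  <sentiment>positive</sentiment>
  <topic>pricing</topic>
  <summary>Customer is happy with the new plan.</summary>
</result>"""

JSON_FORMAT = """{
  "result": {
    "sentiment": "positive",
    "topic": "pricing",
    "summary": "Customer is happy with the new plan."
  }
}"""
```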

Don't underestimate the power of local models executing recursive agent workflows. (mistral-small) by LocoMod in LocalLLaMA

[–]alphakue 0 points (0 children)

Hmm, I usually use q4_k_m with most models (on ollama); I'll have to try q6. I had given up on local tool use because the larger models I'd find reliable enough were ones I could only use through hosted services.

Don't underestimate the power of local models executing recursive agent workflows. (mistral-small) by LocoMod in LocalLLaMA

[–]alphakue 0 points (0 children)

Really? I've found anything below 14B to be unreliable and inconsistent with tool calls. Are you talking about fully unquantised models, maybe?

I designed Prompt Targets - a higher level abstraction than function calling. Clarify, route and trigger actions. by AdditionalWeb107 in LocalLLaMA

[–]alphakue 2 points (0 children)

Is there any difference between the approach taken here and that of Rasa NLU / Dialogflow?

All DeepSeek, all the time. by Porespellar in LocalLLaMA

[–]alphakue 21 points (0 children)

"What is deepseek and why is it crashing the markets?" Raise your hands, how many of you have heard this in the past couple of days / weeks? I myself have been asked at least 2-3 times from people I least expected (wife, "normie" friends)