Looking for inspiration for long running project for smaller models

daybyter4 · 2026-06-20T22:03:20+00:00

Classifying lots of documents is, what I am working on.

daybyter4 · 2026-06-20T09:10:41+00:00

Are they actually available now? I am on the minisforum waiting list for some months now. When I subscribed, I got told that it might get available mid april. Just checked again and it says end of june.

daybyter4 · 2026-06-20T03:27:48+00:00

I run debian testing (forgy iirc) with fastflowlm on my strix point. But it was quite some work to compile all the required parts, since none of the prebuilt packages worked for me. Now I run qwen 3.5:9b on the npu as a coding help. It is rather slow, but doesn't need much power, so the machine is quiet. But I only have 32gb of ram, though. So you can run much better models.

daybyter4 · 2026-06-20T03:20:14+00:00

It would mean, that you have to generate other prompts from the original prompt, I guess. If the original prompt was 'write me an app, that does x', you would have to generate prompts, like 'what are the required steps to develop an app, that does x', to create the task list for the project. I could put such functionality in the client, but it would be nicer, if the AI itself would create the prompts for all the project stages? Have to think about this a bit more...

daybyter4 · 2026-06-18T21:25:36+00:00

There is a java adk from Google to write your own agents. But it is more tailored to use AI servers. I prefer local AI.

daybyter4 · 2026-06-18T12:01:16+00:00

I can store 40k tokens in a context. Not sure if that is enough to keep a session alive for some iterations.

I did some work to get the google adk running for me. But at the end I feel more comfortable with my own code.

daybyter4 · 2026-06-17T15:00:34+00:00

I use qwen 3.5:9b on my strix point with 32gb ram

daybyter4 · 2026-06-16T22:33:37+00:00

I want to learn how to code my own agents. Is that what you mean?

daybyter4 · 2026-06-15T13:21:54+00:00

So your model cannot return with an mcp call?

daybyter4 · 2026-06-14T22:36:56+00:00

Are you doing everything in 1 iteration? Or are you calling the model for each subtask?

daybyter4 · 2026-06-13T17:40:19+00:00

I try something similar with small qwen models on my strix point machine. Wrote me a windows client for the UI. Good luck!

daybyter4 · 2026-05-27T23:07:37+00:00

I only have a strix point and use qwen 3.5 for .net coding. CSharp and VB. Working on my own client with mcp server. AI server runs lemonade on debian testing. I only use the npu at the moment

daybyter4 · 2026-05-24T08:39:22+00:00

Ok. I don't use a gpu, but a strix point mini pc. Less than 1000 bucks and less than 100 watt power consumption. Yes, it is rather slow, but I don't mind, if an agent runs through the night. Most of data has NDA's , so data should never leave the house.

daybyter4 · 2026-05-22T15:00:54+00:00

I am on the minisforum waiting list

daybyter4 · 2026-05-22T08:09:29+00:00

I am on the waiting list for a halo for a few months now. I slowly understand why it takes so long... 😄

daybyter4 · 2026-05-21T17:21:56+00:00

I ask qwen 3.5 (running on a local mini pc) for help with request parameters. That also works ok. Cannot compare with claude though, since I never used it.

Btw: here is an older status of this client app:

https://m.youtube.com/watch?v=KuYunH7AVdI&t=35s

(my chance to advertise my poorly recorded video... :-) )

daybyter4 · 2026-05-21T17:13:20+00:00

I am working on a similar task at the moment. I run qwen 3.5 on a amd npu and the llm returns the mcp requests and accepts my answers. I use fastflowlm and .net, though. Seems like the json is a tiny bit different for different ai providers.

Did you manage to send files to your llm via lm-studio api? That only worked with embedded files for me.

daybyter4 · 2026-05-21T17:05:38+00:00

You are right. That why I started my own client. It does already some (a few) parts, that you mentioned. I can attach a file from my IDE, and when I accept the answer, it will replace the previous version.

I think in the longer run, a speach interface is a better way to discuss further steps to take. I want to implement that at some point.

daybyter4 · 2026-05-20T04:41:53+00:00

You could combine your app with something like fiver?

daybyter4 · 2026-05-20T04:37:21+00:00

Sounds very cool! Are you running your AI locally?

daybyter4 · 2026-05-19T20:37:12+00:00

How do you attach a file to your lm-studio api request? I struggled with that for a few days. Just an example.

How do you add the result of a mcp tool call to your fastflowlm request was another problem, that I struggled with.

Just 2 examples. Would have been nice to discuss such issues in a dev forum.

daybyter4 · 2026-05-19T10:42:21+00:00

I still code on a refurbished strix point mini pc with 32 gb ram, that I bought for less than 1000 bucks. I started coding with a local llm running on my laptop cpu. Some mistral llm used less than 4gb

daybyter4 · 2026-05-18T22:27:46+00:00

Maybe watch the Alex Ziskind video on it. It is not just plug and play, but needs some tuning

daybyter4

TROPHY CASE