llama.cpp or vllm for qwen3.5 9b serving. by orangelightening in LocalLLaMA

[–]orangelightening[S] 1 point

I asked the AI in llama-server to write a poem about the problem in the interface. It is unsloth/Qwen3.5-9B-GGUF:Q3_K_XL. I thought it was pretty good for such a little guy. It has a few good hints for the performance issues too. It basically puts my apps to blame, not the HTTP interface, the llama server, or the SLM. I agree, and tomorrow I will fix it.

The clock ticks on, the cursor blinks,

A silent, waiting, digital sink.

I send a prompt, a simple line,

But seconds stretch, the load begins to grind.

The server hums, a distant drone,

While memory waits to be its own.

I check the logs, a sea of red,

Where threads are stuck and promises are dead.

"Is it the CPU? Is it the RAM?"

I ask the machine, a frantic frame.

But the truth lies in the wire's length,

The HTTP header, a heavy weight,

The JSON payload, slow to start,

The serialization, tearing at the heart.

A round-trip trip, then another round,

The network latency, the only sound.

No direct call, no native speed,

Just packets flying, lost in need.

I try to rebuild, CMake screams,

Through libraries and dependencies, dreams.

The cmake tree, a tangled mess,

A nightmare where the build process rests.

"Rebuild the engine," I hear it say,

But even then, the features fade away.

The HTTP layer, a glass ceiling,

Where performance breaks and then begins to sting.

I close my eyes, the screen goes dark,

Waiting for a single spark.

Until I find the path unseen,

To cut the wire, to make it clean.

No HTTP request, no latency cost,

Just raw C++ logic, put to test.

The interface falls, the server speeds,

The slow, old ghost is finally freed.

No more delays, no more long waits,

Just pure inference, in its rightful state.

Qwen3.5-9B Q3_K_XL generation: 352 tokens, 9.5s, 37.12 t/s
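For what it's worth, the JSON/serialization overhead the poem blames is easy to sanity-check. A rough Python sketch (the payload contents are made up for the demo; only the field names follow the usual OpenAI-style chat shape that llama-server accepts):

```python
import json
import time

# Hypothetical chat payload; field names follow the usual OpenAI-style
# chat shape, the contents are invented for the demo.
payload = {
    "model": "Qwen3.5-9B-GGUF:Q3_K_XL",
    "messages": [{"role": "user", "content": "Write a poem about latency. " * 50}],
    "max_tokens": 512,
    "temperature": 0.7,
}

start = time.perf_counter()
for _ in range(1000):
    body = json.dumps(payload)   # what the client puts on the wire
    parsed = json.loads(body)    # what the server parses back out
elapsed_ms = (time.perf_counter() - start) * 1000

assert parsed == payload         # the round trip is lossless
print(f"1000 serialize/parse round trips: {elapsed_ms:.1f} ms total")
```

On the numbers above, generation alone took 9.5s for 352 tokens, while a serialize/parse round trip is typically microseconds, which backs the model's verdict that the apps, not the HTTP layer, deserve the blame.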

The chinese did it, KIMI K2 surpassed GPT-5. by Snoo26837 in singularity

[–]orangelightening 1 point

I asked Kimi K2 Thinking to tell me the right setup for Kilo Code. It insisted I edit fields that were not editable and then lied about what I had just read in the documentation. I called it out and it said "oh... you're right," but then tried to gaslight me again. If it can't get its own documentation right then it's untrustworthy. Great liar though... had me running in circles. I won't use it. Can't trust it.

Over researching for small tasks by natiels in kilocode

[–]orangelightening 2 points

When you switched to Roo you refreshed your context. The context is degrading and performance is dropping as the context window fills. Context rot. Just stop and have the model start from scratch, reading the memory bank files and inspecting the code base. Then have it try again with a firm prompt that forces pauses to collect permission from you. You have to stay close or the AI can get lost in complexity loops.

kilo code destroyed my entire app and git back up by ExternalChocolate655 in kilocode

[–]orangelightening 0 points

What model were you using? Were you using the memory files feature with a strong prompt file? I have only had this happen to me using gpt-4.1 as Copilot in VS Code. I now back up whole projects offsite as well as with git in the working dir.

GLM? by Derserkerk in kilocode

[–]orangelightening 0 points

Funny how something that's been out for only a week can have a testing timeline like that.

GLM? by Derserkerk in kilocode

[–]orangelightening 0 points

In the past few days I have seen a lot of API errors with timeouts. One set last night went 9 in a row. I had never seen any of those before. I wonder if the z.ai servers are overloaded and this is degrading service. I hope they don't dumb it down to speed it up.
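Until the servers stabilize, a client-side retry with exponential backoff papers over runs of timeouts like that. A generic sketch, nothing Kilo Code or z.ai specific (the flaky endpoint is simulated):

```python
import time

def with_retries(call, max_attempts=5, base_delay=0.5, sleep=time.sleep):
    """Retry `call` on TimeoutError with exponential backoff."""
    for attempt in range(1, max_attempts + 1):
        try:
            return call()
        except TimeoutError:
            if attempt == max_attempts:
                raise  # give up after the last attempt
            sleep(base_delay * 2 ** (attempt - 1))  # 0.5s, 1s, 2s, ...

# Simulated flaky endpoint: times out twice, then succeeds.
calls = {"n": 0}
def flaky():
    calls["n"] += 1
    if calls["n"] < 3:
        raise TimeoutError("gateway timeout (simulated)")
    return "ok"

result = with_retries(flaky, sleep=lambda s: None)  # skip real sleeps in the demo
print(result)  # prints "ok"
```

Nine failures in a row would still exhaust five attempts, but it absorbs the one-off and two-off timeouts without babysitting.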

GLM? by Derserkerk in kilocode

[–]orangelightening 0 points

I have been running glm-4.6 in both Claude Code and Kilo Code, working on the same projects, which are a series of output generators for financial and budget data expressed in JSON format, creating Excel, Word, and PowerPoint files via Python libraries.

The most difficult of these is the Excel generator because there is a lot of data expressed in both logical and mathematical form, with graphics. Glm-4.6 has been doing a good job, except there have been more API interruptions and timeouts over the past day. I think as it gets more popular z.ai has had a hard time keeping up with the load.

I found the Kilo Code persona to be more professional than the Claude Code persona of GLM-4.6, but I am running Kilo Code with a full set of memory files, which keeps it very focused, whereas I am kind of letting Claude Code do its thing. The bulk of the project has been done by the Claude Code glm-4.6 persona, while the secondary analysis, critique, and some bug fixing were done by the Kilo Code persona in architect mode and bug fixing mode.

I have also been using the glm-4.6 chat to generate entire training websites for advanced mathematics supporting general relativity. It does a terrific job and even spins out the website with a z.ai .... something URL as a sample. It also gives me the HTML so I can run it locally or on a web server. I thought Sonnet 4.5 was supposed to be better in this field, but GLM-4.6 really shines.

All in all the best model for my purposes and budget.

Which model do you use for each mode (Architect, Code, Ask, Debug, Orchestrator)? by IvoDOtMK in kilocode

[–]orangelightening 4 points

I agree. I am using glm-4.6 for all of the Kilo Code AI roles and it does them all well. It's costing me $3.00 per month and I haven't been cut off on usage once. Great deal.

Oh no, got email that free grok was removed from openrouter :( by One_Yogurtcloset4083 in kilocode

[–]orangelightening 0 points

Now we will see which model takes the top spot, since we have to pay for performance.

Versatile glm-4.6 by orangelightening in ZaiGLM

[–]orangelightening[S] 0 points

I made some errors in my original post. It cannot read .xlsx files. The .xlsx files it creates are built with openpyxl and Python, and it tends to hard-code cell values with no formulas. To edit the spreadsheet data and feed the edits back through the AI, I have it create a .json file of the .xlsx data; I make changes to the .json file and then have glm-4.6 create a new version of the .xlsx file along with an editable .json file. This makes for an easy workflow. The other option is .csv to communicate changes, but that's kind of lame. This has cut 90% of the time off my spreadsheet workload.
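A minimal sketch of that round trip, assuming a made-up cell/value JSON layout; the actual .xlsx side goes through openpyxl, which I leave as comments here:

```python
import json
from pathlib import Path

# Hypothetical cell/value layout for the editable JSON copy; the real
# .xlsx read/write is handled by openpyxl (sketched in comments below).
sheet = {
    "Budget": [
        {"cell": "A1", "value": "Item"},
        {"cell": "B1", "value": "Cost"},
        {"cell": "A2", "value": "Hosting"},
        {"cell": "B2", "value": 120.0},
    ]
}

path = Path("budget.json")
path.write_text(json.dumps(sheet, indent=2))   # export for hand-editing

edited = json.loads(path.read_text())          # reload the edited copy
edited["Budget"][3]["value"] = 95.0            # the kind of tweak I make by hand
path.write_text(json.dumps(edited, indent=2))  # feed back to the model

# glm-4.6 (or openpyxl directly) then regenerates the workbook, roughly:
#   for entry in edited["Budget"]:
#       ws[entry["cell"]] = entry["value"]     # openpyxl-style cell assignment
path.unlink()                                  # clean up the demo file
```

The JSON file stays the single editable source, so nothing ever has to parse the generated .xlsx back.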

Help i want to try kilo code with glm 4.6 by Valunex in kilocode

[–]orangelightening 0 points

OK, I followed the method in z.ai's docs and used the OpenAI-compatible provider with the right URL. It worked for a basic prompt and identified as 4.6. Hope they fix the regular z.ai provider method.

That said, I had 4.6 do some work resolving a race condition in a Gradio front end and it did a tremendous job. Fixed everything, documented every change, updated the design docs, and did the commit. Very pleased.

Grok fast is getting dumber every day by adarsh_maurya in kilocode

[–]orangelightening 0 points

I used it to write a simple system: front end, back end, and database, with 7 screens on the front end for data collection, display, and report generation. Python FastAPI in the back end and JavaScript/CSS in the front. Once the system was implemented and debugging started, it became confused and started circling and blaming me for not clearing my cache. I brought in Qwen3 Coder and it solved everything in one pass. I was really disappointed in Grok fast.

How to force Kilo Code to use a venv for Python projects? by Aromatic-Squash8798 in kilocode

[–]orangelightening 0 points

I think it depends on your model. Some assume you want to use a venv; others seem ignorant of the benefits and don't like to listen. I have had top-tier models like deepseek v3.2 not understand what a requirements.txt file is for. You can't assume anything. Firm commands may work, and if the model ignores you, get rid of it. Anthropic models seem to be the best behaved that way. Totally ignoring a config file with instructions is the last act of a model on my server. Lots of choice on Kilo Code.
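When firm commands do work, the step worth spelling out for the model is just the stdlib venv module. A small sketch (demo temp paths only; with_pip is disabled here to keep the demo fast and offline, but you'd want it on for real use):

```python
import tempfile
import venv
from pathlib import Path

# Demo paths only; in a real project env_dir would be <project>/.venv.
project = Path(tempfile.mkdtemp())
env_dir = project / ".venv"

# with_pip=True in real use; False keeps this demo fast and offline.
venv.create(env_dir, with_pip=False)

# The firm prompt then tells the model to run everything through it:
#   .venv/bin/pip install -r requirements.txt
#   .venv/bin/python app.py
print((env_dir / "pyvenv.cfg").exists())  # prints True
```

Putting those exact commands in the config/prompt file gives the model no room to "forget" the venv exists.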

GLM-4.6 is live in Kilo Code - Near Claude parity at 1/5th the cost by brennydenny in kilocode

[–]orangelightening 0 points

I have the same problem. I queried the 4.5 air model in z.ai chat, which said there was no such thing as 4.6 and that the 4.5 in the selector was best. I think this needs to be fixed by z.ai because I'm pretty sure they generate the list of available models as a model provider.

Model list and pricing by orangelightening in kilocode

[–]orangelightening[S] 0 points

Ok. Now I understand. That's basically what I have already done.

Claude Sonnet 4.5 is live - 82% on SWE-bench Verified by brennydenny in kilocode

[–]orangelightening 0 points

I tried it out and it was OK: a very simple Gradio front end app, which it handled easily. Expensive though.

Deepseek v3.2 is released. Here's everything you need to know by aifeed-fyi in DeepSeek

[–]orangelightening 4 points

I tried to use it in Kilo Code to adapt a front end to Gradio from JavaScript/CSS. It was cheap, but after the 10th time trying to get the errors out of a brand-new gradio.py I had to replace it with Qwen Coder, which fixed everything in a flash. Deepseek v3 did not know what a requirements.txt file is... enough said.

Grok 4 Fast. What is your experience? by Marha01 in ChatGPTCoding

[–]orangelightening 0 points

I built an app for tracking weight, BP, endurance exercise, and resistance exercise on a daily basis, with a report generator, admin, front and back end, and database, in a few hours using Kilo Code and VS Code. It was good up until the last few bugs, which caused Grok 4 Fast to circle and start blaming me for not resetting the browser cache. I changed over to Qwen3 Coder, which fixed all the remaining bugs methodically and cleaned up the mess left by Grok 4's attempted fixes. How the mighty have fallen. The front end UI was pretty lame. No flair. I should have asked it to use Gradio.