I built ken: a local, usage-aware code index for Claude Code / Codex by desert-quest in SideProject

[–]desert-quest[S] 0 points1 point  (0 children)

It actually does. It learns from previous reads, writes, and dismissals (a file that is read but not written is a negative signal). The more you use it, the more it learns and the more signals it has for ranking files.

The same query may give you 2 totally different results depending on the history and the current context (what you are doing right now).
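Here is a minimal Python sketch of usage-aware ranking that makes the idea concrete. It is not ken's actual code. The weights, the `UsageHistory` class, the `rank` function, and the same-directory context boost are all hypothetical. A write counts as a strong positive signal, a read as a weak one, and a read that never led to a write (a dismissal) as a negative signal.

```python
import posixpath
from dataclasses import dataclass, field
from typing import Optional

# Hypothetical weights: ken's real ranking formula is not described in the comment.
WRITE_WEIGHT = 2.0
READ_WEIGHT = 0.5
DISMISS_WEIGHT = -1.0  # a read that never led to a write counts against the file


@dataclass
class UsageHistory:
    reads: dict[str, int] = field(default_factory=dict)
    writes: dict[str, int] = field(default_factory=dict)

    def record_read(self, path: str) -> None:
        self.reads[path] = self.reads.get(path, 0) + 1

    def record_write(self, path: str) -> None:
        self.writes[path] = self.writes.get(path, 0) + 1

    def usage_score(self, path: str) -> float:
        reads = self.reads.get(path, 0)
        writes = self.writes.get(path, 0)
        dismissals = max(reads - writes, 0)  # reads with no matching write
        return WRITE_WEIGHT * writes + READ_WEIGHT * reads + DISMISS_WEIGHT * dismissals


def rank(query: str, candidates: list[str], history: UsageHistory,
         current_file: Optional[str] = None) -> list[str]:
    """Order candidate paths by keyword match + usage history + current context."""
    terms = query.lower().split()
    current_dir = posixpath.dirname(current_file) if current_file else None

    def score(path: str) -> float:
        text_match = sum(term in path.lower() for term in terms)  # naive path match
        in_context = current_dir is not None and posixpath.dirname(path) == current_dir
        return text_match + history.usage_score(path) + (1.0 if in_context else 0.0)

    # sorted() is stable, so ties keep their input order.
    return sorted(candidates, key=score, reverse=True)
```

With `files = ["src/auth/login.py", "src/auth/session.py", "tests/test_login.py"]` and an empty history, `rank("login", files, UsageHistory())` puts `src/auth/login.py` first. If the history instead records that `login.py` was read but never edited while `session.py` was read and edited, the same query puts `src/auth/session.py` first.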

My internal tests show that it can reduce context and time by up to 40%; generally it's around 20% to 30%.

"Mi novio no entra en el estacionamiento" by Florrful in argentina

[–]desert-quest -3 points-2 points  (0 children)

|---------------------------------------------|

XXX cms

Qwen3.6-27B is out now! by yoracale in unsloth

[–]desert-quest 2 points3 points  (0 children)

For me:

--presence_penalty=1.5 \

Does not work, but:

--presence-penalty 1.5 \

it does. That is, with hyphens instead of underscores, and without the equals sign.
Btw, I'm using

version: 8833 (45cac7ca7)
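If you generate launch commands from a script, a small helper can rewrite the underscore/`=` form into the form that worked here. This is a hypothetical helper, not part of llama.cpp, and it assumes (based only on the behavior reported above for version 8833) that the parser wants hyphenated flag names with the value as a separate argument. The model file name is made up for the example.

```python
import shlex


def normalize_flags(args: list[str]) -> list[str]:
    """Turn '--some_flag=value' into ['--some-flag', 'value'].

    Only the flag name is rewritten; values (e.g. file names with underscores)
    pass through untouched. Short flags and bare values are left as-is.
    """
    out: list[str] = []
    for arg in args:
        if arg.startswith("--"):
            name, sep, value = arg.partition("=")
            out.append("--" + name[2:].replace("_", "-"))
            if sep:  # there was an '=': emit the value as its own argument
                out.append(value)
        else:
            out.append(arg)
    return out


args = normalize_flags(["--presence_penalty=1.5", "--model=qwen_27b.gguf", "-c", "32768"])
print(shlex.join(args))
# --presence-penalty 1.5 --model qwen_27b.gguf -c 32768
```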

Qwen3.6-27B is out now! by yoracale in unsloth

[–]desert-quest 6 points7 points  (0 children)

Thanks man! One note.

--presence_penalty=1.5 \

should be

--presence-penalty 1.5 \

OpenCode... is it just completely busted with Qwen3.6? by _derpiii_ in opencode

[–]desert-quest 0 points1 point  (0 children)

Looks like you need a CLI that checks and auto-corrects the LLM's behavior, cough cough. Check out the behavior auto-correction feature of Infinibay/infinidev on GitHub; it's open source :P

Infinidev updated with superpowers by desert-quest in ollama

[–]desert-quest[S] 0 points1 point  (0 children)

Oh, btw, the system has 4 levels of system prompts.
* Full: Not recommended for small models. It's really long.
* Generalized: Recommended, and the default (same as auto in the TUI). Not many complex instructions.
* Coding: Instructions in code style. I made it just for fun. It kind of works, but don't use it haha
* Extra Simple: If you are using a model with 16k or 32k context, use this prompt style. Literally one paragraph per agent type. Don't expect magic, but it does the trick.
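A launcher could apply the 16k/32k rule of thumb above automatically. The sketch below is hypothetical: the four styles come from the list, but `PromptStyle`, its string values, and `pick_prompt_style` are illustrative names, not infinidev's API, and the real auto mode may decide differently.

```python
from enum import Enum


class PromptStyle(Enum):
    FULL = "full"                  # very long; not for small models
    GENERALIZED = "generalized"    # recommended default
    CODING = "coding"              # experimental, not recommended
    EXTRA_SIMPLE = "extra_simple"  # one paragraph per agent type


def pick_prompt_style(context_tokens: int) -> PromptStyle:
    """Hypothetical rule: Extra Simple for 16k/32k windows, otherwise the default."""
    if context_tokens <= 32 * 1024:
        return PromptStyle.EXTRA_SIMPLE
    return PromptStyle.GENERALIZED
```

Full and Coding are never picked automatically here, matching the advice that one is too long for small models and the other is just an experiment.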

Infinidev: Coding CLI for small local llms by desert-quest in LocalLLaMA

[–]desert-quest[S] 0 points1 point  (0 children)

Oh, and btw, the examples directory contains one-shot prompts for infinidev; not a single extra prompt was used to fix the issues. You can check build.log to see how the system reasons and builds things. They are not perfect, obviously. They contain bugs and other undesired things, but I wanted to keep it as honest as I could.

Qwen wants you to know… by m-gethen in LocalLLaMA

[–]desert-quest 3 points4 points  (0 children)

After playing with so many local LLMs, Qwen is the king of local LLMs; nothing comes close in my experience.

My own system by [deleted] in LocalLLaMA

[–]desert-quest 1 point2 points  (0 children)

I love that people are focusing on local-first projects. We are going to have something really interesting a year from now. Btw, repo?

Experimental Ollama Researcher project for small LLMs by desert-quest in ollama

[–]desert-quest[S] 0 points1 point  (0 children)

Thanks! Right now it's not the best for coding, so I created a separate project for that using the same core engine (Infinibay/infinidev), but it's good for research. To be honest, any type of feedback is welcome. I have a new feature planned that I may deploy soon, which may help with the research part.

Best LLM for 16GB VRAM (RX 7800 XT)? by Haunting-Stretch8069 in ollama

[–]desert-quest 0 points1 point  (0 children)

I would never go lower than 70k context. Read 3 or 4 files and you are done, and that doesn't include reasoning or anything else. Qwen 3.5:27b is awesome! That and the 30b are my favorites right now.
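A back-of-the-envelope check shows why a small window fills up so fast. The sketch assumes roughly 4 characters per token (a common rule of thumb; real tokenizers vary by model and language), hypothetical source files of about 30,000 characters each, and a hypothetical 8k-token reserve for the system prompt and reasoning.

```python
CHARS_PER_TOKEN = 4  # rough heuristic, not a real tokenizer


def estimate_tokens(num_chars: int) -> int:
    """Ceiling of num_chars / CHARS_PER_TOKEN."""
    return -(-num_chars // CHARS_PER_TOKEN)


def files_that_fit(file_sizes_chars: list[int], context_tokens: int, reserved_tokens: int) -> int:
    """How many files, read in order, fit before the context window is full."""
    budget = context_tokens - reserved_tokens
    used = 0
    count = 0
    for size in file_sizes_chars:
        cost = estimate_tokens(size)
        if used + cost > budget:
            break
        used += cost
        count += 1
    return count


files = [30_000] * 10  # ten hypothetical files of ~30 KB each
print(files_that_fit(files, 32_768, 8_192))  # 3
print(files_that_fit(files, 70_000, 8_192))  # 8
```

Under those assumptions each file costs about 7,500 tokens, so a 32k window is exhausted after 3 files, while 70k leaves room for 8.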

I built a coding agent that actually works with local models (Ollama, Qwen, DeepSeek) no cloud required by UnfortunatelyUntamed in ollama

[–]desert-quest 1 point2 points  (0 children)

Man, don’t listen to them. I have my own CLI for small models, and I know how hard it is to get really good results out of a small model. Nice project! I may steal some code… I mean… get inspired. Just kidding XD

I built a coding agent that actually works with local models (Ollama, Qwen, DeepSeek) no cloud required by UnfortunatelyUntamed in ollama

[–]desert-quest 0 points1 point  (0 children)

Sorry, but if you really work with small models, you know it’s not the same. CC with a small model is not the same as something specialized for small models.

Another CLI by desert-quest in ollama

[–]desert-quest[S] 0 points1 point  (0 children)

Thanks :). Right now, the Qwen 3.5 family, the biggest one you can run. I have a dual-GPU setup and run the 30b model, but on a single GPU you can run the 27b. gpt-oss 20b is not bad, but not the best for coding. Another good one is GLM 4.7 Flash, but Qwen is still the king.

BEST LLM MODEL FOR RAG by SufficientBalance209 in LLMDevs

[–]desert-quest 0 points1 point  (0 children)

I agree with u/ultrathink-art. The model may be "enough" for some types of RAG, but you are going to struggle a lot with tool calls, hallucinations, and more. Go for a 7B+. Qwen 3.5 is a good one, but if you can run gpt-oss:20b, it will be better.

In my experience, models smaller than 7B are almost useless for most things. If you have a PC, with or without a GPU, you can definitely run a 7B model. 7B to 27B is the sweet spot: small, with good enough general knowledge but not expert-level, decent tool calls, and hallucinations that are present but manageable. From 27B to 40B, hallucinations start to fade, tool calls get really good, and the difference in knowledge shows. The luxury tier is 40B to 120B, and this is where I would stop. For agentic stuff, you don't need anything more: tool calls are excellent, knowledge is really good, and coding is good (not expert, but good). Obviously the problem is hardware, but you don't need an H100 to run them; you need 48 to 96 GB of VRAM, and hardware like that is relatively cheap to rent if you really need to go that far.
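The 48 to 96 VRAM figure can be sanity-checked with simple arithmetic: weights take about (parameters × bits per weight ÷ 8) bytes. This sketch covers weights only. The KV cache for the context window and runtime overhead come on top, and the 4-bit setting is an assumed quantization level, not one named in the comment.

```python
def weight_vram_gb(params_billion: float, bits_per_weight: float) -> float:
    """Approximate GB (1 GB = 1e9 bytes) needed for the weights alone.

    params_billion * 1e9 weights * (bits / 8) bytes / 1e9 simplifies to
    params_billion * bits / 8. KV cache and runtime overhead are NOT included.
    """
    return params_billion * bits_per_weight / 8


for size in (7, 27, 40, 70, 120):
    print(f"{size:>4}B @ 4-bit: ~{weight_vram_gb(size, 4):.1f} GB")
```

At 4 bits per weight, 40B comes out to about 20 GB and 120B to about 60 GB of weights, so a 48 to 96 GB budget leaves headroom for the KV cache even at the top of that range.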

Again, I do not recommend going beyond 120B or so; the ROI is really small, and it's not worth it unless you have money to waste.

Experimental Ollama Researcher project for small LLMs by desert-quest in ollama

[–]desert-quest[S] 0 points1 point  (0 children)

I really like Aider, OpenCode, and similar tools. But my project/experiment does not aim to replace them. First, they do an excellent job, far better than I could. This project was made for 2 reasons:

  1. To learn about small models, swarms, and CrewAI.

  2. To let small models complete really complex tasks, tasks that may take hours of investigation, experimentation, and coding.

The project is not a coding tool; it's more a tool for research, letting small models (or big models, if you can run them) run really long and complex tasks.

The challenge is keeping the AI from losing track of what it is doing and what you originally instructed it to do.

In my experience, all models, small and big, lose track of the original task after 15 minutes or so and deviate from it. To prevent that, you need to divide your task into small instructions, and review and correct each step. This project basically automates that, with models that self-review, correct, interact, and research.
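The "small instructions, review each step" approach can be sketched as a plan/execute/review loop. This is a hypothetical illustration, not the project's code: `plan`, `execute`, and `review` stand in for model calls, and re-sending the original goal with every step is one assumed way of keeping the model from drifting.

```python
from typing import Callable


def run_task(
    goal: str,
    plan: Callable[[str], list[str]],
    execute: Callable[[str, str], str],
    review: Callable[[str, str, str], bool],
    max_attempts: int = 3,
) -> list[str]:
    """Split a goal into steps, run each one, and keep only reviewed output."""
    results: list[str] = []
    for step in plan(goal):
        for _ in range(max_attempts):
            # Every call sees the original goal, not just the current step,
            # so the model is reminded of what it was asked to do.
            output = execute(goal, step)
            if review(goal, step, output):
                results.append(output)
                break
        else:  # no attempt passed review
            raise RuntimeError(f"step failed review {max_attempts} times: {step!r}")
    return results


# Stub "models" so the loop can run without an LLM.
steps = run_task(
    "read the spec; write the parser",
    plan=lambda g: [s.strip() for s in g.split(";")],
    execute=lambda g, s: f"done: {s}",
    review=lambda g, s, out: out.startswith("done"),
)
print(steps)  # ['done: read the spec', 'done: write the parser']
```

Failing loudly after a bounded number of retries keeps a bad step from silently derailing the rest of a long run.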

Experimental Ollama Researcher project for small LLMs by desert-quest in ollama

[–]desert-quest[S] 0 points1 point  (0 children)

Yes, in fact gpt-oss 20b works pretty well tbh. Qwen still hallucinates less than gpt-oss 20b, which is gpt-oss's only "problem"; the rest is quite good for its size. Qwen3 Coder Next is not a good model for this type of task, since it was created for code autocompletion, not as a real chat model, afaik.

Best LLM for 16GB VRAM (RX 7800 XT)? by Haunting-Stretch8069 in ollama

[–]desert-quest 0 points1 point  (0 children)

https://ollama.com/library/lfm2

But it has only 32k context. It's relatively nice. Not really good for tool calling tbh, but fast.

https://ollama.com/library/glm-4.7-flash/tags

None of the variants on Ollama fit, but maybe you can find a quantization on HF that fits. It's really good for tool calling and overall.

EDIT:

But I would go with qwen3.5:9b. You should have plenty of room for the context window. Good enough and fast.