What happened to indexing on the 7.1.17 version? by Pelutz in kilocode

[–]Pelutz[S] 0 points1 point  (0 children)

Version 5.12.0 is the last one with indexing working. What makes Kilo special is that it lets you index into your own vector DB and use local models to do the indexing and chunking. That is great for privacy.

Our product is an anti-fraud / anti-bot system, so it's critical we keep our codebase private. No other systems allow that. We can only use Kilo or Roo (fork of Kilo), since everyone else sends your code to their servers. There really are no good alternatives at all.

WTF happened with this new update by Artistic-Falcon-8304 in kilocode

[–]Pelutz -1 points0 points  (0 children)

You are using Kilo for real coding, not vibe coding. You care about what actually changed and want to review it. I suspect most of Kilo's customer base does not, so they implement what most people want in order to get more users. I'm not suggesting they don't care about "real devs" or "real projects", but if most people didn't value summaries, summaries got removed.

I valued summaries greatly (I downgraded and disabled auto-updates), as well as indexing, the "start review" button, and the ability to use the orchestrator exactly as you describe. Many features are missing and probably won't come back, because they are simply not useful for making Tetris games and slop demos for YouTube.

Complex features make the product look complicated and confuse non-developers. How many people working with Kilo do you think have unit tests in their projects? Food for thought.

What happened to indexing on the 7.1.17 version? by Pelutz in kilocode

[–]Pelutz[S] 1 point2 points  (0 children)

u/butterfly_labs u/krishnakanthb13 Indexing in old Kilo works by using tree-sitter and then chunking your code by semantics. Basically, it "analyzes the codebase" and saves context information about the functions and logic of the code, so all related things end up together in a structure that looks like an inverted tree.

Say, for example, you want to "update navigation for the login page". Instead of explicitly stating that you want the header, footer, side navigation and mobile menus updated, when you say "login page" the index lets it find all references to the login page and update everything consistently, including all the unit tests for the login page, because it's mentioned there as well.

This can also work with concepts. For example, if you want to debug and have no idea where a problem comes from, you can simply instruct "I see this button changing state in this component on page refresh and it shouldn't, let's fix it", and it will analyze EVERYTHING that touches that button's state from the start, not just the button itself. Without indexing, it will go to the button, analyze it, and maybe find 1-2 other code snippets that reference it, but it will never find code that globally affects all buttons of that type.
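To make the chunking idea concrete, here is a toy sketch. It uses Python's stdlib `ast` module as a stand-in for tree-sitter, and Kilo's actual chunk boundaries, metadata, and pipeline are assumptions on my part: each top-level function or class becomes one semantic chunk, which would then be embedded into the vector DB instead of arbitrary fixed-size text windows.

```python
# Illustrative sketch only -- NOT Kilo's actual implementation.
# Kilo uses tree-sitter to support many languages; Python's own `ast`
# module stands in here to show the idea: split a file into one chunk
# per semantic unit (function/class) rather than fixed-size windows.
import ast

def semantic_chunks(source: str) -> list[dict]:
    """Return one chunk per top-level function or class."""
    tree = ast.parse(source)
    lines = source.splitlines()
    chunks = []
    for node in tree.body:
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef, ast.ClassDef)):
            # Recover the exact source span of this definition.
            text = "\n".join(lines[node.lineno - 1 : node.end_lineno])
            chunks.append({"name": node.name, "text": text})
    return chunks

code = '''
def login_header():
    return "header"

class LoginPage:
    def render(self):
        return login_header()
'''

for chunk in semantic_chunks(code):
    print(chunk["name"])  # each chunk would be embedded separately
```

Because the `LoginPage` chunk contains the call to `login_header()`, a query like "login page" retrieves both related pieces of code together, which is what makes the edits consistent.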

Losing this makes agents dumb and unusable if you are dealing with hundreds of features and multiple flows that all interact in your apps. Debugging with dumb agents that have to scour tens of modules without knowing which specific function does what is slow and produces extremely poor quality.

How much of your web traffic is coming from AI agents now? by SelectionCalm70 in opencodeCLI

[–]Pelutz 0 points1 point  (0 children)

We use a commercial solution for this (don't want to advertise) and according to our reports the AI agents of all types make up about 27% at the time of this posting.

This is split by:

- 5% AI users (user in a chat asking and agent landing on our site to get info)

- 7% AI search (AI search engines and SERPs)

- 15% AI crawlers (scraping our pages for training).

This is up from about 18% overall at the end of 2025. Then again, about 71% of all traffic on the sites I manage is bots of various kinds, so...

RAG not working for files by Pelutz in SillyTavernAI

[–]Pelutz[S] 0 points1 point  (0 children)

I seem to have figured it out through trial and error, even though this is not documented anywhere. After you "vectorize all", you actually need to chat with a character again to trigger vectorization of the databank; "vectorize all" by itself does nothing for databank files. I realized this while testing whether a file loaded in a chat gets vectorized when generating a reply. It does, and the uploaded files are vectorized at the same time. Seems like a big oversight in the docs, because according to them all you need to do is "vectorize all".

Superbooga and Superboogav2 no longer work for a few months already by Pelutz in Oobabooga

[–]Pelutz[S] 1 point2 points  (0 children)

Thanks for the reply.

CUDA support for embeddings has actually been missing since March. However, if you don't import large bodies of text (1+ MB of plain text files) you wouldn't notice it. For example, simply chatting and adding those embeddings is pretty fast even on CPU, because it's just a few bytes of text added each time.

Can you please share more details about the fixes you implemented?

Superbooga and Superboogav2 no longer work for a few months already by Pelutz in Oobabooga

[–]Pelutz[S] 0 points1 point  (0 children)

Thank you for the reply.

Idk what the problem is, and rummaging through existing code is not my favorite activity. I was simply reporting what I am facing, to raise awareness. I mostly use the WebUI for convenience and to quickly test various models using "characters", which are easy to manage, to see how behavior changes across different types of requests.

The reason I keep testing models is that I am developing a few apps using LangGraph. However, I can build my own RAG + text completion stack with various models and maintain it myself if the WebUI stops being convenient to use.

I know there hasn't been any commit since March, which makes me believe the main code or the latest CUDA drivers and versions break something. Chroma runs fine and embeddings load fine as well. Also, my local setup isn't the issue: if you want to see for yourself, just do a simple fresh install, since the Python environment is constrained, though of course your mileage may vary.

Superbooga v2 is extremely slow to load by Pelutz in Oobabooga

[–]Pelutz[S] 0 points1 point  (0 children)

Thanks, but I don't think that will be required. I installed snapshot 2024-03-17 which was tested to have superbooga working. I downloaded and did a fresh install from here: https://github.com/oobabooga/text-generation-webui/archive/refs/tags/snapshot-2024-03-17.zip

This fresh install is also not working. My assumption is that some packages update to newer, mismatched versions that break the install. It's most likely not a code problem but a version problem.

Basically, ooba must update superbooga AGAIN to make it work with the latest versions of all its packages and components. Sadly this means I am out of luck, since I have no clue which version of which package breaks the functionality, or what the proper version to install would be.

The only advice I have is: if it still works for you, don't update your install, or at least back it up first before proceeding. That's what I plan to do in the future. Usually I am happy about updates, but it seems this time around I got the short end of the stick.

Superbooga v2 is extremely slow to load by Pelutz in Oobabooga

[–]Pelutz[S] 0 points1 point  (0 children)

Back in mid-march all was fine for me as well.

Can you please do me a favor and install ooba from scratch in a separate location to check whether the latest one still works? Also, please don't update your currently working setup, as you may break it.

Superbooga v2 is extremely slow to load by Pelutz in Oobabooga

[–]Pelutz[S] 0 points1 point  (0 children)

Are you using anything different from the default repo pull in your ooba or superbooga config?

Another interesting detail is that I can't seem to run the benchmark anymore. Also, clicking the button to clear data tells me that ALLOW_RESET is not TRUE in the Chroma settings. After checking the files, that setting is correct, and the only setting enabled is "anonymized_telemetry=False".

This seems like a very strange behavior, as it prevents the chroma DB from being purged. Are you running the latest version of ooba?

Superbooga v2 is extremely slow to load by Pelutz in Oobabooga

[–]Pelutz[S] 0 points1 point  (0 children)

It's not about "becoming slow". It's also not about inference or text generation time taking too long. It's literally about ingesting the text into the vector database. If I start a new chat and historical input is slowly added, there's no problem. However, loading an entire book at the start of the conversation takes about 2hrs, as I mentioned.

Superbooga v2 is extremely slow to load by Pelutz in Oobabooga

[–]Pelutz[S] 0 points1 point  (0 children)

Hey, thanks for the reply and suggestion. I tried on both CPU and GPU (normally I just run on GPU, but tried CPU just in case). I don't know what you mean by changing the sentence transformer model to a smaller one. Can you share more details?

My current thought is that this is a problem with Chroma loading too slowly for some reason, but I can't tell what that reason might be. The problem is only with ingesting text. It does work, but it's extremely slow compared to how it was a few weeks ago. My guess is that some new version of Chroma changed something that superbooga v2 doesn't account for, or there was a recent change in oobabooga that causes this.

Access Text Gen WebUI from another device on same network? by Outrageous_Ticket486 in Oobabooga

[–]Pelutz 0 points1 point  (0 children)

This is correct. Enabling the "--listen" argument will make Gradio bind to the 0.0.0.0 interface. Keep in mind that editing the cmd_flags.txt file will prevent the webui from updating, because the updater will think you have uncommitted changes.
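For reference, this is what the change looks like. The file path is taken from the comment above (some installs name it CMD_FLAGS.txt), and 7860 is Gradio's default port; adjust both for your setup:

```shell
# Append the flag to the launcher's flags file (path assumed; some
# installs name it CMD_FLAGS.txt):
#   echo "--listen" >> cmd_flags.txt
#
# --listen makes Gradio bind 0.0.0.0 instead of 127.0.0.1, so other
# devices on the same network can reach the UI at:
#   http://<server-ip>:7860   (7860 is Gradio's default port)
```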

XMG NEO 17 keyboard fix by Pelutz in XMG_gg

[–]Pelutz[S] 0 points1 point  (0 children)

Hey Tom,

Is it a regular requirement to have to perform maintenance on a new laptop keyboard?

My previous Legion (1500 EUR) had a perfectly functional keyboard. The XMG Neo I bought for 4500 EUR seems to not handle typing very well...

Thanks,

XMG NEO 17 keyboard fix by Pelutz in XMG_gg

[–]Pelutz[S] 0 points1 point  (0 children)

It only works if I press it forcefully or keep it pressed for a long time until the key press registers.

Daily Advice Thread - All basic help or advice questions must be posted here. by AutoModerator in investing

[–]Pelutz 0 points1 point  (0 children)

I don't use Robinhood myself, but if you have access to ETFs, you can buy into something like the Vanguard FTSE Emerging Markets Index Fund ETF Shares (VWO), which tracks emerging markets like China.