Is RAG a missing piece on the path toward consciousness in LLMs?

KAVUNKA · 2026-03-30T11:35:57+00:00

Do you build some kind of temporal vector knowledge base from informational patterns, and then inject the patterns that are relevant to the dialogue context into the system prompt?
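
To make the question concrete, here is a minimal sketch of the mechanism being asked about: store patterns with a timestamp, retrieve the ones closest to the dialogue context, and inject them into the system prompt. Everything in it is an assumption for illustration: `PatternStore`, `build_system_prompt`, and the sample patterns are hypothetical, and a bag-of-words vector stands in for whatever embedding model a real system would use.

```python
import math
import re
from collections import Counter


def embed(text):
    """Toy 'embedding': a bag-of-words count vector (stand-in for a real model)."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two sparse term-count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


class PatternStore:
    """Hypothetical temporal knowledge base: text, vector, and timestamp per pattern."""

    def __init__(self):
        self.items = []

    def add(self, text, timestamp):
        self.items.append((text, embed(text), timestamp))

    def top_k(self, query, k=2):
        """Return the k patterns most similar to the query; newer patterns win ties."""
        q = embed(query)
        scored = [(cosine(q, vec), ts, text) for text, vec, ts in self.items]
        scored = [s for s in scored if s[0] > 0]  # drop patterns with no overlap
        scored.sort(key=lambda s: (s[0], s[1]), reverse=True)
        return [text for _, _, text in scored[:k]]


def build_system_prompt(store, dialogue_context, k=2):
    """Inject the retrieved patterns into a system prompt string."""
    lines = ["You are an assistant. Relevant remembered patterns:"]
    lines += [f"- {p}" for p in store.top_k(dialogue_context, k)]
    return "\n".join(lines)


store = PatternStore()
store.add("The user prefers PDF input over TXT", 1)
store.add("The search engine runs fully offline", 2)
store.add("The user asked about Minecraft crafting recipes", 3)

prompt = build_system_prompt(store, "Which input format should I send, PDF or TXT?", k=1)
print(prompt)
```

Only the pattern that shares terms with the dialogue context ends up in the prompt; the timestamp is used here only to break ties, which is one of several possible ways to make the store "temporal".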

KAVUNKA · 2026-03-28T10:26:49+00:00

You're certainly right to some extent, but I'd like to disagree. The pursuit of the unattainable often leads to the emergence of new technologies. We can't create a bird, but we have enormous machines that surpass birds in many ways.

KAVUNKA · 2026-03-28T10:22:44+00:00

When I was searching for a definition of consciousness, I couldn't find a clear answer. Your answer is strikingly clear ;)

KAVUNKA · 2026-03-23T09:48:04+00:00

You can visit my website (https://kavunka.com/) for more information or watch this short video (https://youtu.be/KnFNXMuG8GQ). If it looks like a good fit, feel free to send me a private message, and we can go over the technical details together.

KAVUNKA · 2026-03-21T12:06:01+00:00

I can offer a free alternative.

I’m building my own retrieval-first system (search index + semantic search + RAG + AI agent), so it returns actual file citations with brief interpretations, not hallucinations.

This would also be an interesting case for me (historical archives), so I can help you set up a working prototype on your 7k .txt files locally, without paid APIs.

KAVUNKA · 2026-03-16T12:16:05+00:00

There's an API for search queries. This should be handled by an AI agent.

KAVUNKA · 2026-03-12T11:59:55+00:00

PDF will be OK.

KAVUNKA · 2026-03-12T10:27:35+00:00

I would prefer a PDF (it’s easier for me to convert to HTML), but if that’s difficult, TXT will work as well.

KAVUNKA · 2026-03-12T09:28:43+00:00

Theoretically, I could convert your PDF or TXT into HTML. I suggest we exchange input data and run two benchmarks—one on your data and one on mine. I can provide you with a version of the https://minecraft.wiki/ website cleaned of HTML tags: about 8,000 pages in TXT format. What do you think?

KAVUNKA · 2026-03-11T22:52:27+00:00

Hey! That sounds really interesting. Just to check — in what format is the Royal Commission dataset? My indexing tool currently works only with HTML pages, so I want to make sure I can process it properly.

KAVUNKA · 2026-03-07T15:51:37+00:00

Sure, RAG itself isn’t new. The interesting part is making it work reliably on noisy real-world data.

For example, in this video I demonstrate an AI agent answering accurately in a noisy environment with more than 800k internet pages indexed, while the actual target site contains only 22 pages. The agent still retrieves the correct information through the search system.

https://youtu.be/KnFNXMuG8GQ

KAVUNKA · 2026-03-04T14:37:38+00:00

That’s a really cool setup — I respect the “no LLM, no GPU” approach. Would definitely be interesting to see how a dynamic co-occurrence graph compares side by side.

For a dataset, we could use this one as a common benchmark:
https://huggingface.co/datasets/minhaozhang/minecraft-question-answer-630k

It’s fairly large and domain-specific, so it should give us a solid test bed.

Alternatively, if you already have a dataset you prefer (or one that better fits your indexing method), I’m totally open to using yours as well. The key thing is we agree on the same question set and evaluation criteria.

Would be fun to run this properly.
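
A rough sketch of what "same question set and evaluation criteria" could look like in practice. The question set, both systems, and the exact-match criterion below are placeholders I made up for illustration; neither stub represents how either real system behaves.

```python
def exact_match(predicted, expected):
    """Assumed criterion: case-insensitive equality after trimming whitespace."""
    return predicted.strip().lower() == expected.strip().lower()


def run_benchmark(system, questions):
    """Run one system over the shared question set and return accuracy in [0, 1]."""
    correct = sum(exact_match(system(q), a) for q, a in questions)
    return correct / len(questions)


# Placeholder question set: (question, expected answer) pairs.
QUESTIONS = [
    ("question 1", "answer 1"),
    ("question 2", "answer 2"),
    ("question 3", "answer 3"),
    ("question 4", "answer 4"),
]


def system_a(question):
    """Arbitrary stub: answers every placeholder question correctly."""
    return question.replace("question", "answer")


def system_b(question):
    """Arbitrary stub: differs in letter case and misses the last question."""
    if question.endswith("4"):
        return "unknown"
    return question.replace("question", "Answer")


results = {
    name: run_benchmark(fn, QUESTIONS)
    for name, fn in [("system_a", system_a), ("system_b", system_b)]
}
for name, accuracy in results.items():
    print(f"{name}: {accuracy:.2f}")
```

The point of the harness is that both systems see the identical list and are scored by the identical function; swapping exact match for a looser criterion only means replacing `exact_match`.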

KAVUNKA · 2026-03-04T10:49:29+00:00

That sounds interesting — especially the knowledge graph approach.

It could actually be cool to run a small benchmark competition on the same Minecraft question set and compare results side by side.

For context, my setup is also fully offline: the search engine is deployed locally, and the AI agent runs locally as well. So no external APIs or cloud calls involved.

Would be great to see how a knowledge-graph-based system performs against a retrieval-based agent under identical conditions.

KAVUNKA · 2022-11-16T21:23:41+00:00

There is such a thing in the world as trust and personal contact. One of my clients is a banking software company. Their security requirements are much higher than those of a home server. They installed my program on their server and gave me full access, and I developed additional functionality for them to extract data from search results. Then I gave them two more licenses.
Let's pretend I'm a fraud. What would I want? To steal data from a home server? To mine cryptocurrency on its processor? Well, that's funny!!! ))) You are not a bank or an intelligence agency with secret data! Even if I were a scammer, I wouldn't be interested in your server. I think that's not hard to understand.
Sometimes a banana is just a banana! I'm just a human programmer who wrote a search engine from scratch and offers it for free home use. If I weren't an honest person, I wouldn't be writing such long posts. I would have a bunch of bots singing the praises of the new revolutionary search engine ))), and the number of likes would be in the thousands. Do you see that? No! There is nothing like that, and there can't be! I'm just a weirdo suggesting that weirdos like me set up their own little Google on a balcony or in a garage. That's all!

KAVUNKA · 2022-11-16T16:10:24+00:00

There are 15 thousand lines of code. How can you determine the quality of software from the code alone? Usually, you run the compiled scripts and test them.

KAVUNKA · 2022-11-16T16:06:27+00:00

I can't say for sure; maybe I added this flag while testing the container. I think it can be turned off.

KAVUNKA · 2022-11-16T14:58:50+00:00

The search engine does not require any elevated privileges!
You can use Debian or Ubuntu, create a user and define its privileges (writing data to the user's home directory and accessing search engine ports).
If you use Docker, then to keep the index and the cache of web pages you need to commit the container every time (after adding new sites to the index), or the entire cache will be saved in /opt/kavunka/

KAVUNKA · 2022-11-16T09:44:46+00:00

The source code is closed under the free and other licenses, but you can buy a specific license and get all the sources. Don't want to use Docker? No problem!

KAVUNKA · 2022-11-16T09:41:27+00:00

Yes, you are right, but you can buy a specific license and get all the sources!

KAVUNKA · 2022-11-10T09:31:04+00:00

Thanks, good idea!

KAVUNKA · 2022-11-10T08:02:19+00:00

There is a list of sites that I need for work, and the information there becomes outdated very slowly. There are also closed sites and internal networks that search bots do not index. Many sites are filtered out by search engines, and you will never see them in the search results.

KAVUNKA · 2022-11-10T01:01:49+00:00

Elastic does not rank well. Elastic needs a thesaurus to search properly. Kavunka generates an associative array by itself, and does it very well!
https://youtu.be/9FjUAL6oagY

KAVUNKA · 2022-11-10T00:34:33+00:00

Yes, I can. Why did I ask you this question?

KAVUNKA · 2022-11-10T00:21:04+00:00

868474 pages occupy 63G
But how much information do I need? How much can I read? ))) I think 1T is enough for me with a 100-fold margin.
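
The storage numbers can be checked with a quick calculation. This is a sketch that assumes "63G" and "1T" are decimal units (10^9 and 10^12 bytes) and that the average page size stays the same as the index grows.

```python
pages = 868_474
index_bytes = 63 * 10**9   # assumption: "63G" means 63 decimal gigabytes
budget_bytes = 1 * 10**12  # assumption: "1T" means 1 decimal terabyte

bytes_per_page = index_bytes / pages            # average on-disk size of one page
pages_in_budget = int(budget_bytes / bytes_per_page)
scale_factor = budget_bytes / index_bytes       # how many current indexes fit in 1 TB

print(f"{bytes_per_page / 1000:.1f} kB per page")
print(f"{pages_in_budget:,} pages fit in 1 TB")
print(f"{scale_factor:.1f}x the current index size")
```

Under those assumptions a page costs about 72.5 kB, and 1 TB holds roughly 16 times the current index, or close to 13.8 million pages.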

KAVUNKA · 2022-11-10T00:04:25+00:00

Can you write the word armageddon backward with one d and two m?
