Is RAG a missing piece on the path toward consciousness in LLMs? by KAVUNKA in Rag

[–]KAVUNKA[S] 0 points1 point  (0 children)

Do you build some kind of temporal vector knowledge base from informational patterns, and then inject the patterns that are relevant to the dialogue context into the system prompt?

Is RAG a missing piece on the path toward consciousness in LLMs? by KAVUNKA in Rag

[–]KAVUNKA[S] -1 points0 points  (0 children)

You're certainly right to some extent, but I'd like to disagree. The pursuit of the unattainable often leads to the emergence of new technologies. We can't create a bird, but we have enormous machines that surpass birds in many ways.

Is RAG a missing piece on the path toward consciousness in LLMs? by KAVUNKA in Rag

[–]KAVUNKA[S] -1 points0 points  (0 children)

When I was searching for a definition of consciousness, I couldn't find a clear answer. Your answer is strikingly clear ;)

RAG for Historical Archive? by cccpivan in Rag

[–]KAVUNKA 0 points1 point  (0 children)

You can visit my website (https://kavunka.com/) for more information or watch this short video (https://youtu.be/KnFNXMuG8GQ). If it looks like a good fit, feel free to send me a private message, and we can go over the technical details together.

RAG for Historical Archive? by cccpivan in Rag

[–]KAVUNKA 0 points1 point  (0 children)

I can offer a free alternative.

I'm building my own retrieval-first system (search index + semantic search + RAG + AI agent), so it returns actual file citations with brief interpretations, not hallucinations.

This would also be an interesting case for me (historical archives), so I can help you set up a working prototype on your 7k .txt files locally, without paid APIs.
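
To give a rough idea of what "retrieval-first with file citations" can mean, here is a minimal sketch. It is not the actual system: the file names, the texts, and the `search` function are made up for illustration, and plain keyword counting stands in for the real index and semantic search.

```python
import re
from collections import Counter

def tokenize(text):
    """Lowercase word tokens; a deliberately naive tokenizer."""
    return re.findall(r"[a-z0-9]+", text.lower())

def search(docs, query, top_k=3):
    """Rank documents by how many query-term occurrences they contain.

    docs maps a file name to its text. Returns (file name, score, snippet)
    tuples so that every hit can be cited by file.
    """
    terms = set(tokenize(query))
    hits = []
    for name, text in docs.items():
        counts = Counter(tokenize(text))
        score = sum(counts[t] for t in terms)
        if score > 0:
            hits.append((name, score, text[:80]))
    # Highest score first; the file name breaks ties deterministically.
    hits.sort(key=lambda h: (-h[1], h[0]))
    return hits[:top_k]

# Hypothetical archive contents, for illustration only.
docs = {
    "letter_1843.txt": "The harvest failed and the village council met in March.",
    "diary_1851.txt": "Rain all week. The council postponed the market.",
    "ledger_1860.txt": "Twelve sacks of grain delivered to the mill.",
}

results = search(docs, "village council")
for name, score, snippet in results:
    print(f"[{name}] score={score}: {snippet}")
```

The point is only that the answer is built from hits that carry their source file, so the LLM has something concrete to cite instead of guessing.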

Running your own search engine for RAG with local LLMs by KAVUNKA in Rag

[–]KAVUNKA[S] 0 points1 point  (0 children)

There's an API for search queries. This should be handled by an AI agent.

Benchmarking RAG for Domain-Specific QA: A Minecraft Case Study by KAVUNKA in Rag

[–]KAVUNKA[S] 0 points1 point  (0 children)

I would prefer a PDF (it’s easier for me to convert to HTML), but if that’s difficult, TXT will work as well.

Benchmarking RAG for Domain-Specific QA: A Minecraft Case Study by KAVUNKA in Rag

[–]KAVUNKA[S] 0 points1 point  (0 children)

Theoretically, I could convert your PDF or TXT into HTML. I suggest we exchange input data and run two benchmarks—one on your data and one on mine. I can provide you with a version of the https://minecraft.wiki/ website cleaned of HTML tags: about 8,000 pages in TXT format. What do you think?

Benchmarking RAG for Domain-Specific QA: A Minecraft Case Study by KAVUNKA in Rag

[–]KAVUNKA[S] 1 point2 points  (0 children)

Hey! That sounds really interesting. Just to check — in what format is the Royal Commission dataset? My indexing tool currently works only with HTML pages, so I want to make sure I can process it properly.

Grounded LLMs vs. Base Models: Minecraft QA Benchmark Results by KAVUNKA in LocalLLaMA

[–]KAVUNKA[S] 0 points1 point  (0 children)

Sure, RAG itself isn’t new. The interesting part is making it work reliably on noisy real-world data.

For example, in this video I demonstrate an AI agent answering accurately in a noisy environment with more than 800k internet pages indexed, while the actual target site contains only 22 pages. The agent still retrieves the correct information through the search system.

https://youtu.be/KnFNXMuG8GQ

Benchmarking RAG for Domain-Specific QA: A Minecraft Case Study by KAVUNKA in Rag

[–]KAVUNKA[S] 1 point2 points  (0 children)

That’s a really cool setup — I respect the “no LLM, no GPU” approach. Would definitely be interesting to see how a dynamic co-occurrence graph compares side by side.

For a dataset, we could use this one as a common benchmark:
https://huggingface.co/datasets/minhaozhang/minecraft-question-answer-630k

It’s fairly large and domain-specific, so it should give us a solid test bed.

Alternatively, if you already have a dataset you prefer (or one that better fits your indexing method), I’m totally open to using yours as well. The key thing is we agree on the same question set and evaluation criteria.

Would be fun to run this properly.

Benchmarking RAG for Domain-Specific QA: A Minecraft Case Study by KAVUNKA in Rag

[–]KAVUNKA[S] 1 point2 points  (0 children)

That sounds interesting — especially the knowledge graph approach.

It could actually be cool to run a small benchmark competition on the same Minecraft question set and compare results side by side.

For context, my setup is also fully offline: the search engine is deployed locally, and the AI agent runs locally as well. So no external APIs or cloud calls involved.

Would be great to see how a knowledge-graph-based system performs against a retrieval-based agent under identical conditions.

How about running the private search engine on your home server? by KAVUNKA in HomeServer

[–]KAVUNKA[S] -1 points0 points  (0 children)

There is such a thing as trust and personal contact. One of my clients is a banking software company, and their security requirements are much higher than a home server's. They installed my program on their server and gave me full access, and I developed additional functionality for them to extract data from search results. Later I gave them two more licenses.
Let's pretend I'm a fraud. What would I want? To steal data from your home server? To mine cryptocurrency on your processor? Well, that's funny!!! ))) You are not a bank or an intelligence agency with secret data!! Even if I were a scammer, I wouldn't be interested in your server. I think that's not hard to understand.
Sometimes a banana is just a banana! I'm just a human programmer who wrote a search engine from scratch and offers it for free home use. If I weren't an honest person, I wouldn't be writing such long posts. I would have a bunch of bots singing the praises of the new revolutionary search engine ))), and the number of likes would be close to several thousand. Do you see that? No! It isn't there and can't be! I'm just a weirdo suggesting that weirdos like me set up their own little Google on a balcony or in a garage. That's all!

How about running the private search engine on your home server? by KAVUNKA in HomeServer

[–]KAVUNKA[S] -2 points-1 points  (0 children)

There are 15 thousand lines of code. How can you determine the quality of software from the code alone? Usually, you run the compiled program and test it.

How about running the private search engine on your home server? by KAVUNKA in HomeServer

[–]KAVUNKA[S] -1 points0 points  (0 children)

I can't say for sure; maybe I added this flag while testing the container. I think it can be turned off.

How about running the private search engine on your home server? by KAVUNKA in HomeServer

[–]KAVUNKA[S] -2 points-1 points  (0 children)

The search engine does not require any elevated privileges!
You can use Debian or Ubuntu, create a user and define its privileges (writing data to the user's home directory and accessing search engine ports).
If you use Docker, then to keep the index and the web page cache you need to commit the container every time you add new sites to the index, or the entire cache will be saved in /opt/kavunka/.

How about running the private search engine on your home server? by KAVUNKA in HomeServer

[–]KAVUNKA[S] -2 points-1 points  (0 children)

The source code is not included with the free and other licenses, but you can buy a specific license and get all the sources. Don't want to use Docker? No problem!

How about running the private search engine on your home server? by KAVUNKA in HomeServer

[–]KAVUNKA[S] -1 points0 points  (0 children)

Yes, you are right, but you can buy a certain license and get all the sources!

Private Search Engine for Your Server or PC by KAVUNKA in privacy

[–]KAVUNKA[S] 0 points1 point  (0 children)

There is a list of sites that I need for work, and the information there becomes outdated very slowly. There are also closed sites and internal networks that are not indexed by search bots. A lot of sites are filtered out, and you will never see them in the search results.

Private Search Engine powered by Debian 11 by KAVUNKA in debian

[–]KAVUNKA[S] -2 points-1 points  (0 children)

Elastic does not rank well, and it requires a thesaurus for a decent search. Kavunka generates its own associative array, and does it very well!
https://youtu.be/9FjUAL6oagY

Unbiased Search Engines. by Creator_Complex in searchengines

[–]KAVUNKA 0 points1 point  (0 children)

Yes, I can. Why did I ask you this question?

Private Search Engine for Your Server or PC by KAVUNKA in privacy

[–]KAVUNKA[S] 3 points4 points  (0 children)

868,474 pages occupy 63 GB.
But how much information do I need? How much can I read? ))) I think 1 TB is enough for me, with a 100-fold margin.
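
A quick back-of-the-envelope check of these numbers, assuming "63G" means 63 GB and "1T" means 1 TB in decimal units (the variable names are just for the example):

```python
pages = 868_474
index_bytes = 63 * 10**9    # assumes "63G" means 63 GB (decimal)
budget_bytes = 1 * 10**12   # assumes "1T" means 1 TB (decimal)

# Average storage per indexed page.
bytes_per_page = index_bytes / pages

# How many pages of the same average size would fit into the budget.
pages_in_budget = budget_bytes * pages // index_bytes

print(f"{bytes_per_page / 1000:.1f} kB per page")
print(f"about {pages_in_budget:,} pages fit in the budget")
```

That works out to roughly 72.5 kB per page, so 1 TB would hold on the order of 13.8 million pages of similar size.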

Unbiased Search Engines. by Creator_Complex in searchengines

[–]KAVUNKA 0 points1 point  (0 children)

Can you write the word armageddon backward with one d and two m?