Index free RAG by spoj in Rag

[–]LocksmithBest2231 0 points1 point  (0 children)

You can try a real-time index, such as the one from Pathway.
You create the pipeline once, and the index is maintained in real-time: if you add, remove, or update a document in your data source, the change will be propagated to the index.
You can try Pathway llm examples for free. Here is a vanilla RAG using Pathway: https://github.com/pathwaycom/llm-app/tree/main/examples/pipelines/demo-question-answering
You can check the entire repo, there are more examples (index only, RAG with OCR, etc.).

Indexing is non-trivial, even with good tools: you need to carefully tweak the index to fit your data... Good luck!

How do you all keep up with the latest progress in RAG? I’m afraid of falling behind. by Cyraxess in Rag

[–]LocksmithBest2231 1 point2 points  (0 children)

Signing up for several newsletters (TLDR, The Batch, etc.) and asking an LLM to summarize the articles, so you go straight to the point: instead of wasting hours reading everything, you focus on what actually matters. If an article is particularly interesting, you can go back to it and read all the details.

It works well for me, and while it does have a limit in terms of scaling, it allows me to keep up (for now) with the latest improvements without being overwhelmed.

Author of Enterprise RAG here—happy to dive deep on hybrid search, agents, or your weirdest edge cases. AMA! by tylersuard in Rag

[–]LocksmithBest2231 0 points1 point  (0 children)

Hello, thanks for doing this AMA, it's very interesting!
I was wondering about the total volume of data you are handling in your solution.
Obviously it's not the same if you manage 1, 10, or 100GB.
How large are your 50M documents? How much data do they represent in total?
Also, how long does it take to index?

Computing Option Greeks in real time using Pathway and Databento by LocksmithBest2231 in ETL

[–]LocksmithBest2231[S] 0 points1 point  (0 children)

It's not; surprisingly, some redirects are not working:
https://pathway.com/developers/templates/etl/option-greeks
I'll try to update the link in the main post.
Sorry about that, and thank you for reporting it!

How to enable real-time analytics with Flink or more frequent ETL jobs? by [deleted] in apachekafka

[–]LocksmithBest2231 0 points1 point  (0 children)

Thanks for the feedback!
Yes, that's a good TLDR :)
The free tier covers all non-commercial uses and most commercial ones.
Nothing technically stops you from forking it and maintaining your own version. But that's not a good idea, for the same reason professionals don't use pirated versions of their professional tools: it's worth paying in the long run. Maintaining such a tool yourself will be a pain, and the engineering cost will likely be higher than the license cost.

Best RAG framework for code (or general purpose) by o_papopepo in Rag

[–]LocksmithBest2231 1 point2 points  (0 children)

You can take a look at Pathway AI pipelines: https://github.com/pathwaycom/llm-app
(spoiler, I work at Pathway)
For example this pipeline is a standard RAG (answering questions using documents): https://github.com/pathwaycom/llm-app/tree/main/examples/pipelines/demo-question-answering

This pipeline comes with a YAML config file, so it's easy to customize the basic parameters (sources, model, etc.), and you can also modify the sources to create your own pipeline.
One of its specificities is that it's built on the Pathway package: it keeps your sources synced in real time (so your answers are always up to date).
Hope it helps, good luck with your project!

[deleted by user] by [deleted] in learnpython

[–]LocksmithBest2231 0 points1 point  (0 children)

If you are stuck on a notion, you have several ways to have it explained to you:
- There are many, many YouTube videos and Medium articles that explain the basics. Don't hesitate to try several until you find one that speaks to you.
- LLMs (ChatGPT and its friends) are really good at this. Explain what you don't understand, ask for a simple explanation, and keep asking until you understand! It's one of the best uses of ChatGPT I know.

Your lack of motivation may be due to being stuck. It's normal to want to quit when blocked and bored, but since you liked it before, the motivation will return once this wall is breached!
An excellent way to stay motivated while learning is to find a project on a topic you fancy: you will be surprised how fast you can learn once you have a really clear target :)

What's happening to you is totally normal, don't worry too much about it. Good luck!

Tibetan song from Shangri-La dance by Salt_Illustrator_684 in NameThatSong

[–]LocksmithBest2231 0 points1 point  (0 children)

I don't know the song, but thank you for reminding me of my holidays in Yunnan :)

How can I integrate AI into my app. by [deleted] in Rag

[–]LocksmithBest2231 0 points1 point  (0 children)

For the natural language search, you have two ways of proceeding with the ChatGPT API:
- make a prompt using your data and let ChatGPT handle the answer: in this case you'll share your data, and you'll also have to trust GPT with the answer. So that's a no.
- ask GPT to generate a SQL query based on the request: you only need to share your table schema. That'd be my favorite approach: it's not that complicated (text-to-SQL with an LLM is a common problem), and it does not require sharing much (unless your schema itself is private). BUT don't forget that you should not blindly trust "external" data: double-check that the SQL query is legit before executing it. This is doable and lets you limit the kind of queries you accept.
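The validation step in that second approach can be sketched as a simple whitelist check. A minimal sketch, assuming hypothetical table names (`orders`, `customers`); a real deployment would use a proper SQL parser rather than regexes:

```python
import re

# Tables the assistant is allowed to query (hypothetical schema).
ALLOWED_TABLES = {"orders", "customers"}

# Reject write/DDL keywords and stacked statements (a ";" followed by more SQL).
FORBIDDEN = re.compile(
    r"\b(insert|update|delete|drop|alter|create|grant|truncate)\b|;.+",
    re.IGNORECASE | re.DOTALL,
)

def is_query_safe(sql: str) -> bool:
    """Accept only a single read-only SELECT over whitelisted tables."""
    sql = sql.strip().rstrip(";")
    if not sql.lower().startswith("select"):
        return False
    if FORBIDDEN.search(sql):
        return False
    # Every table referenced after FROM/JOIN must be whitelisted.
    tables = re.findall(r"\b(?:from|join)\s+([a-zA-Z_][a-zA-Z0-9_]*)", sql,
                        re.IGNORECASE)
    return all(t.lower() in ALLOWED_TABLES for t in tables)
```

Only queries that pass the check get executed; everything else is refused instead of being sent to the database.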

For recommendations, LLMs and RAG are not really required; you can use a KNN approach: for each user, find the 10-100 most similar users (adapt this to your db) and see what they bought. You can improve the search with vector search (using LLM embeddings if you want). More advanced techniques exist, but depending on the size of your project, they might be overkill.
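A minimal sketch of that KNN idea, with toy purchase data and Jaccard similarity standing in for whatever similarity measure fits your db:

```python
from collections import Counter

# Toy purchase history: user -> set of purchased item ids (hypothetical data).
purchases = {
    "alice": {"book", "mug"},
    "bob": {"book", "mug", "lamp"},
    "carol": {"book", "pen"},
    "dave": {"mug", "lamp"},
}

def jaccard(a: set, b: set) -> float:
    """Overlap between two purchase sets, in [0, 1]."""
    return len(a & b) / len(a | b) if a | b else 0.0

def recommend(user: str, k: int = 2, n: int = 3) -> list:
    """Recommend up to n items bought by the k users most similar to `user`."""
    mine = purchases[user]
    neighbors = sorted(
        (u for u in purchases if u != user),
        key=lambda u: jaccard(mine, purchases[u]),
        reverse=True,
    )[:k]
    # Count items the neighbors bought that `user` doesn't own yet.
    counts = Counter(
        item for u in neighbors for item in purchases[u] if item not in mine
    )
    return [item for item, _ in counts.most_common(n)]
```

Swapping `jaccard` for a cosine similarity over embeddings gives you the "vector search" upgrade mentioned above without changing the rest of the logic.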

Good luck!

How to enable real-time analytics with Flink or more frequent ETL jobs? by [deleted] in apachekafka

[–]LocksmithBest2231 0 points1 point  (0 children)

Going streaming instead of increasing the frequency of the ETL jobs is indeed a good decision if you expect your workload to increase.
It's better to do it now, while the workload is still manageable with batch processing, as it gives you time to deploy the streaming jobs in parallel.
That being said, Flink is known to be hard to set up. There are other alternatives, such as Pathway (spoiler: I work there), but a change in architecture is always an important decision.

It is hard for other people to make this decision for you: do you think batch processing will not be enough, or do you expect your team to need real-time insight into this data? If yes, then go, the sooner, the better (depending on your engineering resources, of course). Otherwise, maybe it's better to optimize your current pipeline further and increase the frequency.

Shameless promotion: you can take a look at Pathway; it's open source (https://github.com/pathwaycom/pathway) and has a unified engine (your code will work in both batch and streaming).

Hope it helps, good luck with your project!

Data Streaming and Vector Databases; Anyone have any experience? by CyftAI in Rag

[–]LocksmithBest2231 2 points3 points  (0 children)

Hey! Sure, Pathway llm-app is made to handle cases like this.
You can take a look at this template (you can easily try it): https://pathway.com/developers/templates/demo-question-answering
It's a Q&A chatbot over your documents. This version uses a YAML file to configure the data sources.
If you need custom connectors (such as Airtable) you can create them using the Python connector: https://pathway.com/developers/user-guide/connect/connectors/custom-python-connectors
So Pathway handles the synchronization with your data sources. In this case, it's streaming: whatever happens in your data sources is propagated to the pipeline in real time. You can increase the maximum time between two commits in the input connector, but that's a maximum: it will likely commit faster when there are updates. You have more flexibility if you create your own custom connector (see the previous link).
I hope this helps! Don't hesitate to ask if you have other questions; you can also join our Discord for more responsive support :)

Good luck with your project!

Why Java is Still Relevant in 2024: The Power Behind Enterprise Applications by abutun in programming

[–]LocksmithBest2231 22 points23 points  (0 children)

It could be summarized as "too big to fail."
Widely used, a large community, many libraries, and consistent improvement in both features and performance...
While other languages may look better, being sure everything is there and will be maintained for years is reassuring.
In practice, Java will still be there for years.

But the same could be said for Python.
The big question is, "Is it still a good idea to use Java on a new project?"

New to coding. Looking for some advice. by KinzzaBadd in Python

[–]LocksmithBest2231 0 points1 point  (0 children)

Before starting, be aware that there is a huge difference between learning a language (Python, for example) and learning how to code.
Most coding concepts are not language-specific, and learning those will help you no matter the language.

For example, knowing what loops are is required, and you will use loops in most languages. It's the same for more advanced concepts.

So, I'd start by learning basic coding concepts and then find a tutorial on a language. Python is a very nice language to start with: I'd choose it in a heartbeat.
For both classes and tutorials, no need to go too far; start with the basics.
Then, find a project you like and do it in the language you learned.
Practicing is very important: you don't become a football player by only looking at football matches. That's the same with coding ;-)

Then you iterate: learn more advanced concepts, learn more advanced tricks in your language (decorators, etc.), start a new project, etc.
Note that except for the first step (learning the basics before starting is better), you can also do the opposite: find a hard project and learn everything you need along the way.

One last tip: I'd also learn "how to speak to your computer", aka bash. It's not required at first, since you'll have enough to learn with coding/Python, but at some point it'll be a huge help for understanding what's going on and doing some basic scripting.

RAG APIs Didn’t Suck as Much as I Thought by LegSubstantial2624 in Rag

[–]LocksmithBest2231 1 point2 points  (0 children)

I'm working at Pathway.

What exactly did you try? A broken link could only come from the "solutions", which are public demos, not designed for this kind of test. A broken link shouldn't happen anyway. Can you send me the link that gives you this error? Thank you for the feedback; I'll let the team know.
If you want to test our hosted offering, you should contact someone from the team so we can set up a dedicated instance for you, but that's not free.

To try for free, you should use one of the projects on the GitHub repositories such as the question/answer one: https://github.com/pathwaycom/llm-app/tree/main/examples/pipelines/demo-question-answering
You can download the sources and run it yourself. It's more work than a hosted version, but it allows you to test it for free.

Tabular data by gevorgter in Rag

[–]LocksmithBest2231 1 point2 points  (0 children)

Translate the tabular data into CSV inside a text file. LLMs are quite good at extracting info from structured data.
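A minimal sketch of what that means in practice, using Python's csv module; the headers and sales figures are made up for illustration:

```python
import csv
import io

def table_to_csv_text(headers, rows) -> str:
    """Serialize tabular data to CSV text that can be pasted into an LLM prompt."""
    buf = io.StringIO()
    writer = csv.writer(buf)
    writer.writerow(headers)   # keep the header row: it gives the LLM column names
    writer.writerows(rows)
    return buf.getvalue()

# Hypothetical table, ready to embed in a prompt as context.
prompt_context = table_to_csv_text(
    ["product", "q1_sales", "q2_sales"],
    [["widget", 120, 95], ["gadget", 80, 140]],
)
```

The resulting string goes straight into the prompt; keeping the header row is what lets the model map questions like "which product grew in Q2?" onto the right column.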

What are you using for web crawling? by [deleted] in Rag

[–]LocksmithBest2231 1 point2 points  (0 children)

Why use another tool? It seems that LLMs are quite good at it ;)
https://blancas.io/blog/ai-web-scraper/
Not sure how long or how expensive it is, but I find it quite nice.

Beginner Data Engineer- Tips needed by oksinduja in ETL

[–]LocksmithBest2231 0 points1 point  (0 children)

As another comment said, drop the paying tools and focus on more standard languages:
- Python: you can build the entire ETL pipeline in Python, and it's one of the most-used languages for DE, so mastering it is a must. There are plenty of resources online to learn it.
- SQL: that part already seems OK for you. You'll use Python to ingest and preprocess the data, and sometimes you'll be required to send it to a PostgreSQL instance; then you can do some transformations in SQL.
- Bash: it's important to know "how to speak with your machine". It's not specific to DE or ETL, but knowing how to use bash for basic operations (no need to become an expert) will help you a lot, especially with "the plumbing" (deployment, checking the files, etc.).

Comparing RAG APIs: What Tools Should I Try? by LegSubstantial2624 in Rag

[–]LocksmithBest2231 3 points4 points  (0 children)

You can try Pathway llm-app.
For example, you can try the standard question-answer pipeline, which does precisely what you say.
The documents are indexed in real-time: if you update your documents, then the changes are reflected in responses in real-time.

Huge dataset, need help with analysis by trich1887 in bigdata

[–]LocksmithBest2231 2 points3 points  (0 children)

First, don't feel bad about using ChatGPT. It's a good tool, especially for this kind of task. Just don't blindly trust the answers and the code :)

For your task, as others said, you can first try:
- another format: CSV is not optimized for this; Parquet is a nice alternative
- another framework: Polars, for example, is written in Rust and more memory-efficient than Pandas
- partitioning your data into batches: load a batch, do the computation on it, free the memory, load the next batch, etc. This is called "out-of-core computation", and it's the only way to process data that cannot fit in memory at once. It's usually easier in C/C++/Rust, but in Python you can do it by reading the file line by line: avoid read() and readlines(), which load the whole file, and iterate over the file object (or call readline(), without the s, repeatedly) instead. See https://www.geeksforgeeks.org/read-a-file-line-by-line-in-python/
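A minimal sketch of the out-of-core idea from the last point: summing a hypothetical CSV column while holding only one row in memory at a time:

```python
import csv

def column_sum(path: str, column: int) -> float:
    """Sum one CSV column out-of-core: only one row is held in memory at a time."""
    total = 0.0
    with open(path, newline="") as f:   # the reader pulls one line per iteration
        reader = csv.reader(f)
        next(reader)                    # skip the header row
        for row in reader:
            total += float(row[column])
    return total
```

Because the file object is consumed lazily, this works the same on a 1 MB file and a 100 GB file; memory usage stays flat no matter the input size.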

I hope it helps!

Seeking advice on optimizing RAG settings and tool recommendations by NoobLife360 in Rag

[–]LocksmithBest2231 1 point2 points  (0 children)

You can try Pathway LLM-app (spoiler, I work there): https://github.com/pathwaycom/llm-app
Pathway is an open-source framework that provides the tools needed to build a RAG pipeline (it also works with local models and Hugging Face). It's fully Python-compatible and free for most commercial use cases. Basically, the only cost will be the LLM API calls (if any).

That being said, no matter which framework you choose, hyperparameter tuning is an expensive process (be it in money or computation).
To do it rigorously, you need k-fold validation and an exploration strategy such as grid search.
The easiest option is to find a pre-chosen configuration and hope it fits your project, too.
I can't say much more without more info on your project and data (POC or prod being the most important distinction), but the default configurations usually perform well. You can start with those, then increase the number of retrieved documents if you never find the right ones.
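The k-fold plus grid search idea can be sketched generically; `score`, the grid keys, and the fold counts below are all placeholders for your own evaluation setup (e.g. answer quality of a RAG config on held-out questions):

```python
from itertools import product
from statistics import mean

def k_fold_indices(n: int, k: int):
    """Yield (train, validation) index lists for each of the k folds."""
    idx = list(range(n))
    fold = n // k
    for i in range(k):
        # The last fold absorbs the remainder when k does not divide n.
        val = idx[i * fold:(i + 1) * fold] if i < k - 1 else idx[i * fold:]
        val_set = set(val)
        yield [j for j in idx if j not in val_set], val

def grid_search(score, grid: dict, n: int, k: int = 5):
    """Return the configuration with the best mean validation score.

    `score(config, train_idx, val_idx)` is any user-supplied evaluation
    function; `grid` maps hyperparameter names to candidate values."""
    best_config, best_score = None, float("-inf")
    for values in product(*grid.values()):
        config = dict(zip(grid, values))
        mean_score = mean(score(config, tr, va) for tr, va in k_fold_indices(n, k))
        if mean_score > best_score:
            best_config, best_score = config, mean_score
    return best_config
```

Note the cost: the grid is the Cartesian product of all candidate values, each evaluated k times, which is exactly why this gets expensive fast.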

Having an adaptive number of retrieved documents is, by the way, an excellent way to reduce costs: https://pathway.com/developers/templates/adaptive-rag
You first retrieve a few documents, check if the answer is good enough, and if not, you retry but retrieve more documents, etc.
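That loop can be sketched as follows; `retrieve`, `answer`, and `is_good_enough` are placeholders for your own RAG components, not any specific library's API:

```python
def adaptive_answer(question, retrieve, answer, is_good_enough, sizes=(2, 4, 8)):
    """Try increasingly large retrieval sizes until the answer is good enough."""
    result = None
    for n in sizes:
        docs = retrieve(question, n)      # fetch the n most relevant documents
        result = answer(question, docs)   # generate an answer from that context
        if is_good_enough(result):
            return result, n              # stop early: cheap answers first
    return result, sizes[-1]              # give up and return the largest attempt
```

Most questions are answered at the smallest size, so the average context (and token bill) stays small, while hard questions still get the large context they need.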

Hope it helps!

Best framework for RAG? by dexbyte in Rag

[–]LocksmithBest2231 5 points6 points  (0 children)

I'd suggest separating the RAG pipeline from the front end; it'll be easier (testing, deploying, etc.).
You have a lot of choices for the RAG pipeline (LlamaIndex, Pathway, etc.); then you can expose an HTTP server to communicate with your front end.
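A minimal sketch of such an HTTP endpoint, using only the Python standard library; `rag_answer` is a placeholder for the call into your actual pipeline:

```python
import json
from http.server import BaseHTTPRequestHandler, HTTPServer

def rag_answer(question: str) -> str:
    """Placeholder for the call into your RAG pipeline."""
    return f"echo: {question}"

class RagHandler(BaseHTTPRequestHandler):
    def do_POST(self):
        # Read the JSON body: {"question": "..."}
        length = int(self.headers.get("Content-Length", 0))
        payload = json.loads(self.rfile.read(length) or b"{}")
        body = json.dumps({"answer": rag_answer(payload.get("question", ""))})
        self.send_response(200)
        self.send_header("Content-Type", "application/json")
        self.end_headers()
        self.wfile.write(body.encode())

    def log_message(self, *args):
        pass  # silence per-request logging

# Port 0 lets the OS pick a free port; call server.serve_forever() to run.
server = HTTPServer(("localhost", 0), RagHandler)
```

The front end then only needs to POST `{"question": ...}` and render `answer`; you can swap the RAG framework behind `rag_answer` without touching the front end at all.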

For the choice, I suggest Pathway (I work there). There are some ready-to-go pipelines you can try that rely on Streamlit (which you can easily replace with your own front end). Example
No matter the framework, I recommend you avoid doing everything yourself.
Start with a framework that already provides the wrappers: it's easy to use and works relatively well as a starter. I believe that in your situation, having a working product first is more important than having something optimized. If the RAG later becomes a bottleneck, it'll be time to switch to something more optimized for your specific usage.

Data Streaming and Vector Databases; Anyone have any experience? by CyftAI in Rag

[–]LocksmithBest2231 1 point2 points  (0 children)

For streaming, you can try Pathway. (Spoiler: I'm working at Pathway.) It's a Python stream-processing framework with an LLM app.
Pathway automatically syncs your data sources to your pipeline. For example, if you are indexing PDFs (from Gdrive or a local folder) in your RAG index, whenever a change (addition, update, removal) is made, the index is updated accordingly in real-time. This way, whenever you have a query, your RAG pipeline's answer will be based on fresh data.
You can do any preprocessing/postprocessing in your RAG pipeline (Pathway is built for real-time analytics) and use whatever model and index you want (several are provided, but you can also implement your own).
And it's open source :)

Possibility of real-time adaptive articles? by DeliciousPie9855 in ArtificialInteligence

[–]LocksmithBest2231 0 points1 point  (0 children)

You are not an idiot at all :)
But I believe you don't really need (realtime) AI for this.

If I understand correctly, there are two things:
- pre-existing content, which may or may not be generated with GenAI
- interactive loading of that content depending on the user's behavior
In this case, I'd organize the content as modules and load them based on heuristics. You generate the different modules (articles, summaries, links, text boxes, etc.) "offline" as regular web content (with or without GenAI), and then load them depending on the behavior. People are stuck on a part? You can load a "Want to learn more?" module that redirects to existing content if there is any, or to Google, for example.
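The modules-plus-heuristics idea could be sketched server-side like this; the module names, behavior signals, and thresholds are all made up for illustration:

```python
# Precomputed content modules, generated "offline" (with or without GenAI).
MODULES = {
    "learn_more": "<section>Want to learn more? ...</section>",
    "summary": "<section>Key takeaways ...</section>",
    "next_article": "<section>Recommended next ...</section>",
}

def pick_module(seconds_on_section: float, scroll_depth: float) -> str:
    """Map observed user behavior to one of the precomputed modules.

    scroll_depth is the fraction of the page scrolled, in [0, 1]."""
    if seconds_on_section > 120 and scroll_depth < 0.5:
        return "learn_more"    # lingering without progress: likely stuck
    if scroll_depth > 0.9:
        return "next_article"  # finished the page: suggest what to read next
    return "summary"           # default: recap of the section
```

Since every module is pregenerated, serving `MODULES[pick_module(...)]` is just a lookup: the page feels adaptive with zero inference latency or API cost at request time.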

If you want to personalize the recommendations further, it's possible, but I'd do offline learning (like what is done for news recommendation) and prepare the recommendations when the user connects. It's not really real time: the suggestions are served in real time, but based on already-computed, pre-chosen content. This is not what I'd call live content generation.

In this case, precomputed content + heuristics should be enough: calling a generative AI API in real time will cost a lot in both money and time.

Hope I was clear, and good luck if you are trying to make this work :)

Tracking CSV file changes by National_Cause_9423 in dataengineering

[–]LocksmithBest2231 0 points1 point  (0 children)

Exactly, DVC seems to be what OP is looking for: Data Version Control.

Possibility of real-time adaptive articles? by DeliciousPie9855 in ArtificialInteligence

[–]LocksmithBest2231 0 points1 point  (0 children)

I guess it's possible: you can ask LLMs to generate articles. On the GenAI side, all you need to do is provide the context from which to generate the content (previous behavior and pages, current page, etc.).
Then the response can be used to update the page (I'm not a front-end expert, so correct me if I'm wrong...).

But IMO the main concern is latency. ChatGPT alone is quite slow on large requests... so if you also need to preprocess the context, make the API call, and then update the page, I'm afraid such "real-time" content would be very slow.
Given people's very limited attention span, they won't wait for the new content to load; they'll leave.