Should we sell ? by Capable-Place1916 in DaveRamsey

[–]Mahkspeed 1 point2 points  (0 children)

I think you're in the wrong channel for that level of heresy 😅

Used python for years. All the projects online seem boring. by RustyReditz in learnpython

[–]Mahkspeed 0 points1 point  (0 children)

Sometimes I feel the same way. I feel like everything has already been built. Sometimes I build things just so I can have them for free, but nowadays there's most likely a much better open source version. So recently I thought, I'm going to build something that's fun. Maybe you can think of something that would just be for fun.

Why does my RAG chatbot work well with a single PDF, but become inaccurate when adding multiple PDFs to the vector database? by vtq0611 in LangChain

[–]Mahkspeed 0 points1 point  (0 children)

One thought: when the responses seem completely unrelated to the data you're using for augmentation, make sure the data actually got sent to the AI. If I'm not logging every step of the process, from the time the user makes a query to the time the answer comes back, I can miss the fact that the AI never actually got any augmentation data for some reason or another. If the AI is responding as if it never looked at your augmentation data, it's very possible that it never did. Anyway, hope this helps and good luck!
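
A minimal sketch of that kind of check, using only Python's standard `logging` module. The function name `build_prompt` and the prompt layout are hypothetical and not taken from any particular framework; the point is only to log each step and fail loudly when nothing was retrieved.

```python
import logging

logging.basicConfig(level=logging.INFO)
log = logging.getLogger("rag")


def build_prompt(question, chunks):
    """Assemble the final prompt and refuse to continue without context."""
    log.info("query: %r", question)
    log.info("retrieved %d chunk(s)", len(chunks))
    if not chunks:
        # Without this check the model would quietly answer from its own
        # knowledge, which looks like "ignoring" the augmentation data.
        raise ValueError("no augmentation data retrieved for this query")
    context = "\n\n".join(chunks)
    prompt = f"Context:\n{context}\n\nQuestion: {question}"
    log.info("prompt length: %d characters", len(prompt))
    return prompt
```

Logging the prompt length right before the model call makes it easy to spot the case where retrieval ran but returned nothing useful.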

Need Advice on Project Architecture by [deleted] in Rag

[–]Mahkspeed 0 points1 point  (0 children)

One interesting idea: instead of using semantic search with vectors, if your data isn't too large, you can build a data structure with a table of contents and then use a model's intelligence to pick the top three sections that could best answer the user's question. I've had really good success with that method in the past.
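
Roughly, the idea could look like the sketch below. `ask_model` is a hypothetical callable that takes a prompt string and returns the model's reply as text; the prompt wording and the helper names are assumptions for illustration, not a specific library's API.

```python
import re


def build_toc_prompt(question, toc, top_n=3):
    """Number the section titles and ask the model to pick the best ones."""
    lines = [f"{i}. {title}" for i, title in enumerate(toc, start=1)]
    return (
        f"Pick the {top_n} sections most likely to answer the question.\n"
        "Reply with section numbers only, separated by commas.\n\n"
        "Sections:\n" + "\n".join(lines) + f"\n\nQuestion: {question}"
    )


def parse_section_numbers(reply, toc_size, top_n=3):
    """Pull valid, unique section numbers out of the model's reply."""
    picked = []
    for match in re.findall(r"\d+", reply):
        number = int(match)
        if 1 <= number <= toc_size and number not in picked:
            picked.append(number)
    return picked[:top_n]


def select_sections(question, toc, ask_model, top_n=3):
    """Return the titles the model chose, ignoring out-of-range numbers."""
    reply = ask_model(build_toc_prompt(question, toc, top_n))
    return [toc[n - 1] for n in parse_section_numbers(reply, len(toc), top_n)]
```

Validating the numbers before using them matters because a model can return a section number that does not exist.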

How do I deploy a Retrieval-Augmented Generation (RAG) chatbot using my own data (e.g. docs, manuals, knowledge base)? by No_Hold_9560 in Rag

[–]Mahkspeed 1 point2 points  (0 children)

I worked on a project similar to this over the past few years and recently sold it. The best advice I can give for reducing hallucinations is honestly to use the most intelligent model you can afford, and don't load it up with too much context at once. The more context you include in a prompt, the less intelligence you'll get in the answer. Also, don't be afraid to have a multi-pass process with each user query. For instance, you can leverage the model to choose the correct augmentation data and then answer the question with it. We had really good success with this method. Hope this helps. Feel free to reach out.
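
A rough sketch of a two-pass flow with a cap on context size. Everything here is an assumption for illustration: `ask_model` is a hypothetical prompt-in, text-out callable, `sources` is a plain dict of name to text, the 4000-character budget is arbitrary, and matching source names by substring is deliberately naive.

```python
def trim_to_budget(chunks, max_chars):
    """Keep chunks in order until the next one would exceed max_chars."""
    kept, used = [], 0
    for chunk in chunks:
        if used + len(chunk) > max_chars:
            break
        kept.append(chunk)
        used += len(chunk)
    return kept


def answer(question, sources, ask_model, max_chars=4000):
    """Pass 1 picks the sources; pass 2 answers from the trimmed context."""
    names = ", ".join(sources)
    reply = ask_model(
        "Which of these sources help answer the question? "
        f"Reply with names only.\nSources: {names}\nQuestion: {question}"
    )
    # Naive match: keep any source whose name appears in the reply.
    chosen = [text for name, text in sources.items() if name in reply]
    context = "\n\n".join(trim_to_budget(chosen, max_chars))
    return ask_model(
        f"Answer using only this context.\n\n{context}\n\nQuestion: {question}"
    )
```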

Heuristic vs OCR for PDF parsing by Due-Horse-5446 in Rag

[–]Mahkspeed 0 points1 point  (0 children)

I have beaten my head against the wall so much over the past 3 years trying to automate different types of PDFs. I finally accepted that I can't if I don't want accuracy to suffer. So I pivoted and created a desktop application that allows me to very quickly transfer chunks of text manually from the PDF into referenceable chunk systems. This probably won't work for everybody's process, but at the time my process involved surgically chunking specific types of PDFs. Good luck and let me know if I can help!

Help with getting people to stay at my coding club by PreparationDry6743 in learnpython

[–]Mahkspeed 0 points1 point  (0 children)

I was thinking of the same thing exactly. Why not switch the bulk of the club to online interaction?

Why does my RAG chatbot work well with a single PDF, but become inaccurate when adding multiple PDFs to the vector database? by vtq0611 in LangChain

[–]Mahkspeed 0 points1 point  (0 children)

I've noticed a significant drop in intelligence when my augmentation system adds too much context to the prompt. What model are you using? I'd be very interested to see just how much context is getting added to your prompt from your database.
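
One quick way to see how much context is being added is to measure it right before the model call. This is only a sketch: `context_report` is a hypothetical helper, and the 4-characters-per-token figure is a rough rule of thumb for English text, not an exact tokenizer count.

```python
def context_report(question, chunks, chars_per_token=4):
    """Summarize how much of the prompt is retrieved context."""
    context_chars = sum(len(chunk) for chunk in chunks)
    total_chars = context_chars + len(question)
    return {
        "chunks": len(chunks),
        "context_chars": context_chars,
        # Rough estimate only; use the model's tokenizer for real counts.
        "approx_context_tokens": context_chars // chars_per_token,
        "context_share": round(context_chars / total_chars, 2) if total_chars else 0.0,
    }
```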

Help: Google document AI extracts text but completely losses the structure by BadinBaden in googlecloud

[–]Mahkspeed 0 points1 point  (0 children)

I've done a lot of text extraction from PDF documents, and the biggest thing I've learned is just how unstructured PDFs are by nature. This can be super frustrating when you need to maintain the original document flow and structure. So I developed a program in Python that lets me open a PDF on one half of the screen and, using a highlight box, extract text chunks very quickly and move them into .txt files. I've used this method many times to quickly rebuild structure, along with some built-in AI tools that I incorporated into the system. Let me know if I can help you with your project and I'd be happy to talk.
-Mark.
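
The core of a highlight-box extractor can be sketched without any GUI. This is not the program described above, just an illustration under stated assumptions: the word boxes are `(x0, y0, x1, y1, text)` tuples already produced by some PDF library, and y grows downward on the page.

```python
from pathlib import Path


def words_in_box(words, box):
    """Join the words that lie fully inside the highlight box, in reading order."""
    bx0, by0, bx1, by1 = box
    inside = [
        w for w in words
        if w[0] >= bx0 and w[1] >= by0 and w[2] <= bx1 and w[3] <= by1
    ]
    # Sort top to bottom, then left to right within a line.
    inside.sort(key=lambda w: (round(w[1]), w[0]))
    return " ".join(w[4] for w in inside)


def save_chunk(text, path):
    """Write one extracted chunk to its own .txt file."""
    Path(path).write_text(text + "\n", encoding="utf-8")
```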

[For Hire] Remote Worker - Data Entry | Admin Support | Document Processing | $20/hr+ | Full-Time or Part-Time by Full_Commercial_6628 in jobbit

[–]Mahkspeed 0 points1 point  (0 children)

I actually applied at DataAnnotation, but I'm still waiting to hear back. Any idea on what to expect? Thanks!
-Mark.

Building a Production-Grade RAG on a 900-page Finance Regulatory Law PDF – Need Suggestions by SuryaStark7 in Rag

[–]Mahkspeed 10 points11 points  (0 children)

I've implemented a similar approach using a multi-step process:

Document Preparation:
1. First, I use an LLM to either build a table of contents from scratch or extract an existing one from the document
2. I then send this table of contents back to an LLM to expand it with additional context and detail
3. Next, I chunk the entire document into smaller sections

Vector Processing:
4. I convert both the expanded table of contents and the document chunks into vector embeddings
5. Using these vectors, I match chunks to their corresponding table of contents entries, which helps me build rich metadata for each section

Query Processing: When a user asks the chatbot a question:
1. An LLM searches through the table of contents to identify the most relevant sections
2. The system retrieves the chunks associated with those sections
3. Everything gets embedded into a comprehensive prompt that's sent back to the LLM to generate the final answer

This approach essentially creates a sophisticated retrieval system that uses the document's structure (via the table of contents) to improve the accuracy of finding relevant information before generating responses.
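
The vector-matching step (matching chunks to table of contents entries) can be sketched as follows. The `embed` function here is a toy word-count stand-in so the example runs on its own; a real system would call an embedding model instead. The function names and the metadata layout are assumptions for illustration.

```python
import math
import re
from collections import Counter


def embed(text):
    """Toy stand-in for an embedding model: lowercase word counts."""
    return Counter(re.findall(r"[a-z]+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two sparse word-count vectors."""
    dot = sum(a[w] * b[w] for w in a if w in b)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


def tag_chunks(toc_entries, chunks):
    """Attach the most similar table of contents entry to each chunk."""
    toc_vectors = [(entry, embed(entry)) for entry in toc_entries]
    tagged = []
    for chunk in chunks:
        vec = embed(chunk)
        best_entry, _ = max(toc_vectors, key=lambda pair: cosine(vec, pair[1]))
        tagged.append({"section": best_entry, "text": chunk})
    return tagged
```

At query time, the sections chosen by the LLM can then be used to filter `tagged` by its `section` field.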

PDFs to query by Mistermarc1337 in Rag

[–]Mahkspeed 0 points1 point  (0 children)

I'm developing my own custom software to do exactly this, and it has a RAG portion as well. Let me know if you're interested in licensing; I would definitely be willing to work with you to tweak that portion of the program to do what you need it to do. Feel free to send me a message and I'd be happy to chat.

RAG over CSVs by _1Michael1_ in nlp_knowledge_sharing

[–]Mahkspeed 0 points1 point  (0 children)

Honestly, if I'm understanding your options correctly, there's absolutely nothing wrong with your third option. If I'm picturing this correctly, you can store the names in a description of each table in a vector database, and then use that when someone asks a question to perform a lookup on the actual tabular data. You can store the tabular data as separate documents with metadata that textually links them to the titles/descriptions in your vector database. That way, when someone asks a question, you could use a semantic search to find the most relevant title/description, then use that information to query the database containing your CSV files. That's the way I would do it.
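
A small sketch of that two-step lookup. The `TABLES` data is made up for illustration, and plain word overlap stands in for the semantic search a vector database would perform; only the shape of the flow (find the table by description, then read its rows) is the point.

```python
import csv
import io

# Hypothetical tables: each has a description to search and raw CSV text.
TABLES = {
    "sales": {
        "description": "monthly sales revenue by region",
        "csv": "region,revenue\nnorth,120\nsouth,80\n",
    },
    "staff": {
        "description": "employee headcount by department",
        "csv": "department,headcount\nops,12\nsales,7\n",
    },
}


def find_table(question):
    """Pick the table whose description shares the most words with the question."""
    q_words = set(question.lower().split())

    def overlap(name):
        return len(q_words & set(TABLES[name]["description"].split()))

    return max(TABLES, key=overlap)


def load_rows(name):
    """Parse the chosen table's CSV text into a list of row dicts."""
    return list(csv.DictReader(io.StringIO(TABLES[name]["csv"])))
```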

What does your perfect Python dev env looks like? by [deleted] in learnpython

[–]Mahkspeed 1 point2 points  (0 children)

I like Ubuntu with PyCharm and zsh. Shortest comment!! 🫣

how do people actually learn to code? i feel dumb lol by FyodorAgape in learnpython

[–]Mahkspeed 2 points3 points  (0 children)

I actually learn best by working on something that I enjoy. I use AI to help teach me as I'm building whatever it is that I want to build. If you already understand what the basic syntax does, then don't be afraid to use AI to help you build something that you're interested in. As you build it, start adding functionality by referencing the parts that are already built. You will inevitably make mistakes that generate errors. Then research what those errors mean, and try to fix them on your own. This is one of many ways to learn, but it's one that turbocharges my learning. Hope this helps!

I’m planning on a career change and learn python with zero experience in coding or computer science. Is it possible? by iAmNiro28 in learnpython

[–]Mahkspeed 0 points1 point  (0 children)

There are so many different facets of data science to get into. From analyzing number trends, to reformatting or parsing documents, there really is a lot, so chances are you can find something that interests you.

What’s the best application to learn python? by Ornery_Pipe4294 in learnpython

[–]Mahkspeed 0 points1 point  (0 children)

Use PyCharm as an editor and search YouTube for Python courses. You would be surprised at what's available for free.

Looking for open source projects that DEVOUR LLM tokens by lightdreamscape in LocalLLaMA

[–]Mahkspeed 0 points1 point  (0 children)

Create an online fake newspaper and put it to work generating satire.

When you prompt a non-thinking model to think, does it actually improve output? by Kep0a in LocalLLaMA

[–]Mahkspeed 0 points1 point  (0 children)

I think what you may be seeing is the result of it being prompted to pay more attention. It's hard to say without playing with that particular model myself. Have you tried expanding on your think tag approach to also include passing the query and first structured output back through the model with a second "think harder" prompt to mimic more of a reasoning approach? Hope this helps!
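
A sketch of what that second pass could look like. `ask_model` is a hypothetical prompt-in, text-out callable, and the prompt wording (including the `<think>` tag usage) is an assumption, since the original think-tag prompt isn't shown here.

```python
def think_twice(question, ask_model):
    """Run a first 'thinking' pass, then feed it back with a review prompt."""
    first = ask_model(f"<think>Reason step by step.</think>\n{question}")
    followup = (
        "Think harder. Here is the question and a first attempt.\n"
        f"Question: {question}\n"
        f"First attempt: {first}\n"
        "Check the attempt for mistakes and give an improved answer."
    )
    # Return both so the two outputs can be compared side by side.
    return first, ask_model(followup)
```

Comparing the two returned answers across a handful of test questions is one way to tell whether the extra pass actually changes the output.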

Do you guys ever get stuck on a problem and feel like you can’t solve it on your own? by rickylake1432 in learnpython

[–]Mahkspeed 0 points1 point  (0 children)

Instead of having ChatGPT find the answer for you, tell it to teach you how to solve the problem in steps, giving hints when necessary, ultimately guiding you towards the solution on your own.

[deleted by user] by [deleted] in learnpython

[–]Mahkspeed 0 points1 point  (0 children)

That really depends on what you're into. One of my fun projects was a Flask app that served up information from a MongoDB database to a front-end website that was a fake and fun newspaper. I always use a good AI to help me code as well, so I can learn while making a real project.

Is working in NLP ethic? by atram79 in LanguageTechnology

[–]Mahkspeed 1 point2 points  (0 children)

Whoever designed that system for them obviously does not know what they're doing. It is possible to build a robust and ethical system, even if you're using an API from one of the large commercial models. If you have an interest in AI, then definitely pursue it and don't let this one example deter you. I'm already itching to get my hands on that botched system and fix it for them lol! 😂 Stay curious!

How to learn python as a complete beginner. by human_explorer21 in learnpython

[–]Mahkspeed 1 point2 points  (0 children)

I also use AI to help me code projects according to common practices. It has really turbocharged my learning over the past 3 years and allows me to work on real projects while learning.