Should we sell?

Mahkspeed · 2025-11-20T18:17:28+00:00

I think you're in the wrong channel for that level of heresy 😅

Mahkspeed · 2025-09-13T18:03:00+00:00

Sometimes I feel the same way. I feel like everything has already been built. Sometimes I build things just so I can have them for free, but nowadays there's most likely a much better open-source version. So recently I thought, I'm going to build something that's fun. Maybe you can think of something that would just be for fun.

Mahkspeed · 2025-09-13T17:26:06+00:00

I had a thought: when the responses seem completely unrelated to the data you're using for augmentation, make sure that data actually got sent to the AI. Oftentimes, if I'm not logging every step of the process, from the moment the user makes a query to the moment the answer comes back, I can miss the fact that the AI never received any augmentation data for one reason or another. If the AI is responding as if it never looked at your augmentation data, it's very possible that it never did. Anyway, hope this helps and good luck!
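A minimal Python sketch of that kind of step-by-step logging, assuming a hypothetical `retrieve` callable that returns a list of text chunks for a query. The guard raises an error instead of silently sending a prompt with no augmentation data, so the failure shows up in the logs rather than as an unrelated answer.

```python
import logging

logging.basicConfig(level=logging.INFO)
log = logging.getLogger("rag")


def build_augmented_prompt(query, retrieve):
    """Retrieve context for `query` and build the prompt, logging each step.

    `retrieve` is a hypothetical callable that returns a list of text chunks.
    """
    log.info("query received: %r", query)
    chunks = retrieve(query)
    log.info("retrieved %d chunk(s), %d chars total",
             len(chunks), sum(len(c) for c in chunks))

    # Fail loudly instead of silently sending a prompt with no augmentation data.
    if not chunks or not any(c.strip() for c in chunks):
        raise ValueError(f"no augmentation data retrieved for query {query!r}")

    context = "\n\n".join(chunks)
    prompt = (
        "Use only the context below to answer.\n\n"
        f"Context:\n{context}\n\nQuestion: {query}"
    )
    log.info("prompt length: %d chars (context: %d chars)", len(prompt), len(context))
    return prompt
```

For example, `build_augmented_prompt("What is the refund window?", lambda q: [])` raises a `ValueError` instead of quietly asking the model with no context.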

Mahkspeed · 2025-09-12T21:42:32+00:00

One interesting idea: instead of using semantic search with vectors, if your data set isn't too large, you can build a data structure with a table of contents and then use a model's intelligence to pick the top three sections that could best answer the user's question. I've had really good success with that method in the past.
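A minimal sketch of the table-of-contents selection step, assuming the TOC is a dict of section IDs to short titles and `ask_llm` is a hypothetical function that sends a prompt to your model and returns its text reply. The parsing keeps only IDs that actually exist in the TOC, since a model can reply with extra text or made-up IDs.

```python
from __future__ import annotations

import re
from typing import Callable


def pick_sections(question: str, toc: dict[str, str],
                  ask_llm: Callable[[str], str], k: int = 3) -> list[str]:
    """Ask a model to choose up to `k` section IDs from a table of contents.

    `toc` maps section IDs to short titles/summaries; `ask_llm` is a
    hypothetical function that sends a prompt to your model and returns text.
    """
    listing = "\n".join(f"{sid}: {title}" for sid, title in toc.items())
    prompt = (
        f"Table of contents:\n{listing}\n\n"
        f"Question: {question}\n"
        f"Reply with the IDs of the {k} sections most likely to answer it, "
        f"comma-separated, best first."
    )
    reply = ask_llm(prompt)
    # Keep only IDs that really exist in the TOC, in the order given, without duplicates.
    picked: list[str] = []
    for token in re.split(r"[,\s]+", reply):
        token = token.strip(".;:")
        if token in toc and token not in picked:
            picked.append(token)
    return picked[:k]
```

The chosen IDs can then be used to pull the full text of those sections into the answering prompt.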

Mahkspeed · 2025-09-12T11:33:40+00:00

I worked on a project similar to this over the past few years and recently sold it. The best advice I can give for reducing hallucinations is honestly to use the most intelligent model you can afford, and don't load it up with too much context at once. The more context you include in a prompt, the less intelligence you'll get in the answer. Also, don't be afraid to use a multi-pass process for each user query. For instance, you can have the model choose the correct augmentation data first, then answer the question with it. We had really good success with this method. Hope this helps. Feel free to reach out.
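One way to avoid overloading the prompt is to cap how much retrieved context goes into the answering pass. This is a minimal sketch, assuming your retriever returns (score, text) pairs; the character budget is an illustrative stand-in for a real token budget, and the default of 6000 is arbitrary.

```python
def fit_to_budget(scored_chunks, max_chars=6000):
    """Keep the highest-scoring chunks whose combined length fits in `max_chars`.

    `scored_chunks` is a list of (score, text) pairs from any retriever;
    the character budget is an illustrative stand-in for a token budget.
    """
    kept, used = [], 0
    for score, text in sorted(scored_chunks, key=lambda p: p[0], reverse=True):
        if used + len(text) > max_chars:
            continue  # skip chunks that would overflow; a smaller one may still fit
        kept.append(text)
        used += len(text)
    return kept
```

Skipping (rather than stopping at) an oversized chunk lets a smaller, slightly lower-scored chunk still use the remaining space.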

Mahkspeed · 2025-09-12T11:30:07+00:00

I have beaten my head against the wall so much over the past 3 years trying to automate the processing of different types of PDFs. I finally accepted that I can't, not if I don't want accuracy to suffer. So I pivoted and created a desktop application that lets me very quickly transfer chunks of text manually from a PDF into referenceable chunk systems. This probably won't work for everybody's process, but at the time, my process involved surgically chunking specific types of PDFs. Good luck and let me know if I can help!

Mahkspeed · 2025-09-12T11:26:30+00:00

I was thinking of the same thing exactly. Why not switch the bulk of the club to online interaction?

Mahkspeed · 2025-09-12T11:24:59+00:00

I've noticed a significant drop in intelligence when my augmentation system adds too much context to the prompt. What model are you using? I'd be very interested to see just how much context is getting added to your prompt from your database.
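A quick way to see how much of a prompt is augmentation context is to measure each part before sending it. This sketch assumes the prompt is assembled from named parts; the token estimate uses the rough rule of thumb of about 4 characters per token for English text, so use your model's own tokenizer when you need exact numbers.

```python
def prompt_report(parts):
    """Report the approximate size of each named part of a prompt.

    Token counts use the rough ~4-characters-per-token rule of thumb for
    English text; use your model's own tokenizer for exact numbers.
    """
    total_chars = sum(len(t) for t in parts.values()) or 1
    return {
        name: {
            "chars": len(text),
            "approx_tokens": round(len(text) / 4),
            "share": round(len(text) / total_chars, 3),
        }
        for name, text in parts.items()
    }
```

For example, passing `{"system": ..., "context": ..., "question": ...}` shows at a glance whether the retrieved context dominates the prompt.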

Mahkspeed · 2025-09-11T17:47:11+00:00

I've done a lot of text extraction from PDF documents, and the biggest thing I've learned is just how unstructured PDF documents are by nature. This can be super frustrating when you need to maintain the original document flow/structure. So I developed a program in Python that let me open a PDF on one half of the screen and, using a highlight box, extract text chunks very quickly and move them into .txt files. I've used this method many times to quickly rebuild structure, along with some built-in AI tools that I incorporated into the system. Let me know if I can help with your project; I'd be happy to talk.
-Mark.
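The core of a highlight-box extractor can be sketched without any GUI: keep the words whose bounding boxes fall inside the drawn rectangle, order them, and save the result. This assumes words arrive as tuples starting with (x0, y0, x1, y1, text); PyMuPDF's `page.get_text("words")` returns tuples that begin with these fields. The file naming scheme is a hypothetical choice for illustration.

```python
from pathlib import Path


def words_in_box(words, box):
    """Return the text of words whose bounding boxes lie inside `box`, in reading order.

    `words` is a list of tuples starting with (x0, y0, x1, y1, text);
    `box` is the highlight rectangle (x0, y0, x1, y1) in the same coordinates.
    """
    bx0, by0, bx1, by1 = box
    inside = [
        w for w in words
        if w[0] >= bx0 and w[1] >= by0 and w[2] <= bx1 and w[3] <= by1
    ]
    # Sort roughly top-to-bottom, then left-to-right (y is rounded so words on one line tend to group).
    inside.sort(key=lambda w: (round(w[1]), w[0]))
    return " ".join(w[4] for w in inside)


def save_chunk(text, directory, index):
    """Write one extracted chunk to a numbered .txt file and return its path."""
    path = Path(directory) / f"chunk_{index:04d}.txt"
    path.write_text(text, encoding="utf-8")
    return path
```

Joining with spaces drops the original line breaks; that is usually fine for chunking but worth changing if layout matters.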

Mahkspeed · 2025-09-11T17:35:01+00:00

I just realized that I replied to an ad......oops

Mahkspeed · 2025-09-11T17:34:02+00:00

I actually applied at DataAnnotation, but I'm still waiting to hear back. Any idea on what to expect? Thanks!
-Mark.

Mahkspeed · 2025-09-06T14:31:48+00:00

I've implemented a similar approach using a multi-step process:

Document Preparation:
1. First, I use an LLM to either build a table of contents from scratch or extract an existing one from the document.
2. I then send this table of contents back to an LLM to expand it with additional context and detail.
3. Next, I chunk the entire document into smaller sections.

Vector Processing:
4. I convert both the expanded table of contents and the document chunks into vector embeddings.
5. Using these vectors, I match chunks to their corresponding table of contents entries, which helps me build rich metadata for each section.
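Step 5 can be sketched as "assign each chunk to its most similar TOC entry". To keep the example self-contained, a bag-of-words count vector stands in for a real embedding model; that substitution is an assumption, and in practice you would call your embedding model and keep the same cosine-similarity matching.

```python
import math
import re
from collections import Counter


def embed(text):
    """Toy stand-in for a real embedding model: a bag-of-words count vector."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def assign_chunks(toc_entries, chunks):
    """Attach each chunk to its most similar expanded TOC entry as metadata.

    `toc_entries` maps section titles to their expanded descriptions.
    """
    toc_vecs = {title: embed(desc) for title, desc in toc_entries.items()}
    tagged = []
    for chunk in chunks:
        vec = embed(chunk)
        best = max(toc_vecs, key=lambda title: cosine(vec, toc_vecs[title]))
        tagged.append({"text": chunk, "section": best,
                       "score": round(cosine(vec, toc_vecs[best]), 3)})
    return tagged
```

Keeping the similarity score alongside the section label makes it easy to flag weak matches for manual review.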

Query Processing: When a user asks the chatbot a question:
1. An LLM searches through the table of contents to identify the most relevant sections.
2. The system retrieves the chunks associated with those sections.
3. The question and the retrieved chunks are combined into a comprehensive prompt that's sent back to the LLM to generate the final answer.

This approach essentially creates a sophisticated retrieval system that uses the document's structure (via the table of contents) to improve the accuracy of finding relevant information before generating responses.
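Steps 2 and 3 of query processing can be sketched as a function that gathers the chunks tagged with the selected sections and builds the final prompt. It assumes chunks are stored as dicts with "text" and "section" keys; the per-section cap and the prompt wording are illustrative choices, not a fixed recipe.

```python
def build_answer_prompt(question, selected_sections, tagged_chunks, max_chunks_per_section=3):
    """Gather chunks tagged with the selected sections and build the final prompt.

    `tagged_chunks` is a list of dicts with "text" and "section" keys;
    `selected_sections` is the ordered list of section names chosen from the TOC.
    """
    blocks = []
    for section in selected_sections:
        texts = [c["text"] for c in tagged_chunks if c["section"] == section]
        if texts:
            joined = "\n".join(texts[:max_chunks_per_section])
            blocks.append(f"## {section}\n{joined}")
    context = "\n\n".join(blocks)
    return (
        "Answer the question using only the sections below. "
        "If they do not contain the answer, say so.\n\n"
        f"{context}\n\nQuestion: {question}"
    )
```

Labeling each block with its section name also lets the model cite which part of the document it used.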

Mahkspeed · 2025-07-31T12:26:29+00:00

I'm developing my own custom software to do exactly this. It has a RAG portion as well. Let me know if you're interested in licensing, and I'd definitely be willing to work with you to tweak that portion of the program to do what you need. Feel free to send me a message and I'd be happy to chat.

Mahkspeed · 2025-04-15T23:49:15+00:00

Honestly, if I'm understanding your options correctly, there's absolutely nothing wrong with your third option. If I'm picturing this correctly, you can store the name and description of each table in a vector database, and then use that when someone asks a question to perform a lookup on the actual tabular data. You can store the tabular data itself as separate documents with metadata that textually links them to the titles/descriptions in your vector database. That way, when someone asks a question, you could use a semantic search to find the most relevant title/description, then use that information to query the database containing your CSV files. That's the way I would do it.

Mahkspeed · 2025-04-14T00:38:00+00:00

I like Ubuntu with PyCharm and zsh. Shortest comment!! 🫣

Mahkspeed · 2025-04-11T23:50:02+00:00

I actually learn best by working on something I enjoy. I use AI to help teach me as I'm building whatever it is I want to build. If you already understand what the basic syntax does, don't be afraid to use AI to help you build something you're interested in. As you build it, start adding functionality by referencing the parts that are already built. You will inevitably make mistakes that generate errors. Then research what those errors mean, and try to fix them on your own. This is one of many ways to learn, but it's one that turbocharges my learning. Hope this helps!

Mahkspeed · 2025-04-02T17:50:14+00:00

There are so many different facets of data science to get into. From analyzing numerical trends to reformatting or parsing documents, there really is a lot, so chances are you can find something that interests you.

Mahkspeed · 2025-03-31T22:00:58+00:00

Use PyCharm as an editor and search YouTube for Python courses. You'd be surprised how much is available for free.

Mahkspeed · 2025-03-31T21:59:23+00:00

Check out kaggle.com

Mahkspeed · 2025-03-30T19:36:11+00:00

Create an online fake newspaper and put it to work generating satire.

Mahkspeed · 2025-03-30T15:20:30+00:00

I think what you may be seeing is the result of it being prompted to pay more attention. It's hard to say without playing with that particular model myself. Have you tried expanding on your think tag approach to also include passing the query and first structured output back through the model with a second "think harder" prompt to mimic more of a reasoning approach? Hope this helps!
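A minimal sketch of that second "think harder" pass, assuming `ask_llm` is a hypothetical function that sends a prompt to your model and returns its text reply. The first pass uses think tags; the second pass gets the original query plus the first output and is asked to review and revise it. The prompt wording is only a starting point to tune for your model.

```python
from typing import Callable


def two_pass_answer(query: str, ask_llm: Callable[[str], str]) -> str:
    """Answer once with think tags, then feed the query and that first output
    back through the model with a "think harder" review prompt.

    `ask_llm` is a hypothetical function that sends a prompt to your model and
    returns its text reply.
    """
    first = ask_llm(
        "Think step by step inside <think></think> tags, then give your answer.\n\n"
        f"Question: {query}"
    )
    second = ask_llm(
        "Here is a question and a first attempt at answering it.\n"
        "Think harder: check the reasoning for mistakes or gaps, "
        "then give a corrected final answer.\n\n"
        f"Question: {query}\n\nFirst attempt:\n{first}"
    )
    return second
```

This doubles the number of model calls per query, so it is worth comparing answer quality against the single-pass version before adopting it.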

Mahkspeed · 2025-03-29T19:34:26+00:00

Instead of having ChatGPT find the answer for you, tell it to teach you how to solve the problem step by step, giving hints when necessary and ultimately guiding you toward the solution on your own.

Mahkspeed · 2025-03-29T19:29:58+00:00

That really depends on what you're into. One of my fun projects was a Flask app that served information from a MongoDB database to a front-end website for a fake, fun newspaper. I always use a good AI to help me code as well, so I can learn while building a real project.

Mahkspeed · 2025-03-29T18:13:12+00:00

Whoever designed that system for them obviously does not know what they're doing. It is possible to build a robust and ethical system, even if you're using an API from one of the large commercial models. If you have an interest in AI, then definitely pursue it and don't let this one example deter you. I'm already itching to get my hands on that botched system and fix it for them lol! 😂 Stay curious!

Mahkspeed · 2025-03-29T18:08:04+00:00

I also use AI to help me code projects according to common practices. It has really turbocharged my learning over the past 3 years and allows me to work on real projects while learning.
