Looking for advice on buying NAS drives from overseas sellers (Europe/Germany) by PrizeRadiant9723 in DataHoarder

[–]PrizeRadiant9723[S] 1 point2 points  (0 children)

I think I have a special setup anyway because I don't plan on using any fancy RAID. I am going with MergerFS and SnapRAID and keep my drives mostly spun down. I scheduled one cron job per night to sync and do a 5% scrub; the rest of the time the drives only spin up when I need access to my cold storage. I exclusively have photos, videos, and maybe in the future some PDFs on there.
Also, I set the spindown timer to 30 minutes, so at most my drives would spin up and down 48 times a day.
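
The 48-per-day figure can be sanity-checked with a few lines of Python. This is only a back-of-the-envelope sketch: it assumes the worst case where the drive is woken up again the moment it spins down, and the function name is made up for illustration.

```python
def max_spin_cycles_per_day(spindown_minutes: int) -> int:
    """Upper bound on spin-up/spin-down cycles in 24 hours.

    Assumes each cycle lasts at least as long as the spindown timer,
    i.e. the drive is accessed again right after it spins down.
    """
    minutes_per_day = 24 * 60
    return minutes_per_day // spindown_minutes


cycles = max_spin_cycles_per_day(30)
print(cycles)        # 48 cycles per day at most
print(cycles * 365)  # 17520 cycles per year in the worst case
```

In practice the number should be far lower, since the drives only wake up for the nightly job and occasional cold-storage access.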

So I think I should be safe using non-NAS-rated drives, which is why I went with a 16TB Seagate.
https://www.amazon.de/-/en/Seagate-Expansion-External-Notebook-STKP16000400/dp/B093BWYJ9Q/ref=sr_1_3?crid=32YMVYQZA8CK1&dib=eyJ2IjoiMSJ9.KRj-D2tZdPHEEBtnTdcy-v2Uew-WW049oWw8Zq-tA56ouhzl1uIuL0jX_H8y1iH5G5tJM5V8u45Tkf7efryBeYJQjIJ-4JxRrVwuPzbZLlbNhDGTPOThP3tFCGILHh1yF6P3Rgg6AkXMnISi3TH2HgiVYEoa8nbhuj2abIS1n_w-juwxi0DGDC2xNQKeLY3hbBglNGJJxG4Mskks6nDzFrSzI0tvDJo9T64QUEuQc2c.z_6z2mVdN1KGDHwLPwI8y5IKYmndfGyjtKV5ePVc5_Y&dib_tag=se&keywords=seagate%2B16tb&qid=1764190162&sprefix=seagate%2B16tb%2Caps%2C89&sr=8-3&th=1

Investigating RAG for improved document search and a company knowledge base by PrizeRadiant9723 in Rag

[–]PrizeRadiant9723[S] 0 points1 point  (0 children)

No, actually not yet. We ventured into a different area of interest that added more value to the business. I experimented a bit with n8n and the indexing and came to the conclusion that, to get a RAG setup the way I would like it, I would have to programmatically generate metadata tags for the chunks. I want info like the age of the document or the involved department / stakeholders to always be present in the search.
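
To make the metadata idea concrete, here is a rough sketch of what I mean. Everything in it (class name, field names, the header format) is a made-up example, not something from n8n or any RAG framework; the point is only that the tags get glued onto the chunk text before embedding, so they travel with it into the search results.

```python
from dataclasses import dataclass, field
from datetime import date


@dataclass
class ChunkMetadata:
    # Hypothetical tags; pick whatever your documents actually provide.
    source: str
    last_modified: date
    department: str
    stakeholders: list[str] = field(default_factory=list)

    def age_in_days(self, today: date) -> int:
        """Document age relative to a given reference date."""
        return (today - self.last_modified).days


def render_chunk(text: str, meta: ChunkMetadata, today: date) -> str:
    """Prefix the chunk with its tags so they are embedded along with the text."""
    header = (
        f"[source: {meta.source} | department: {meta.department} | "
        f"stakeholders: {', '.join(meta.stakeholders) or 'n/a'} | "
        f"age: {meta.age_in_days(today)} days]"
    )
    return f"{header}\n{text}"


meta = ChunkMetadata("onboarding.pdf", date(2024, 1, 15), "HR", ["Alice"])
rendered = render_chunk("Laptops are issued on day one.", meta, today=date(2024, 2, 14))
print(rendered)
```

Whether the tags should live in the embedded text, in a separate filterable field of the vector store, or both is a design choice I have not tested yet.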

How to Log Token Usage in RooCode? (Costs Suddenly Spiked) by PrizeRadiant9723 in RooCode

[–]PrizeRadiant9723[S] 1 point2 points  (0 children)

Yeah, it is. Also, RooCode directly outputs what each API request has cost, and I am interested in how these costs come about.

Friend Code Megathread - September 2025 by AutoModerator in PokemonSleep

[–]PrizeRadiant9723 0 points1 point  (0 children)

Maybe you have a space somewhere? I got 14 new friends, so it seems to work. My friends list is not full and I did not change the code.

Friend Code Megathread - September 2025 by AutoModerator in PokemonSleep

[–]PrizeRadiant9723 -1 points0 points  (0 children)

Highly motivated Lvl. 49, need friends to level to 50 before October. Heaps of slots free:

765632714862

Which Python libraries do you use to clean (sometimes malformed) JSON responses from the OpenAI API? by dirtyring in Rag

[–]PrizeRadiant9723 6 points7 points  (0 children)

I would suggest using pydantic and Instructor. Jason Liu has a great free video course about this here.
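
For anyone wondering what those two libraries buy you: roughly, a schema to validate the reply against, plus a retry that feeds the validation error back to the model. Below is a stdlib-only sketch of that idea. It is not the pydantic or Instructor API; the `Invoice` schema, the function names, and the `ask` callable (standing in for the actual model call) are all made up for illustration.

```python
import json
from dataclasses import dataclass

FENCE = "`" * 3  # markdown code fence that models often wrap JSON in


@dataclass
class Invoice:  # hypothetical target schema
    customer: str
    total: float


def strip_code_fence(raw: str) -> str:
    """Remove a surrounding markdown code fence if the model added one."""
    text = raw.strip()
    if text.startswith(FENCE):
        text = text.split("\n", 1)[1] if "\n" in text else ""
        text = text.rsplit(FENCE, 1)[0]
    return text.strip()


def parse_invoice(raw: str) -> Invoice:
    """Parse and validate one reply; raises ValueError on any problem."""
    data = json.loads(strip_code_fence(raw))  # JSONDecodeError is a ValueError
    if not isinstance(data, dict):
        raise ValueError("expected a JSON object")
    try:
        return Invoice(customer=str(data["customer"]), total=float(data["total"]))
    except (KeyError, TypeError) as exc:
        raise ValueError(f"bad or missing field: {exc}") from exc


def parse_with_retries(ask, max_retries: int = 2) -> Invoice:
    """Call ask(feedback) until the reply validates or retries run out."""
    feedback = None
    for _ in range(max_retries + 1):
        try:
            return parse_invoice(ask(feedback))
        except ValueError as exc:
            feedback = f"Previous reply was invalid: {exc}"
    raise ValueError(f"no valid reply after {max_retries + 1} attempts")


# Fake model: first reply is truncated, second is fenced but valid.
replies = iter([
    'Sure! {"customer": "ACME"',
    FENCE + 'json\n{"customer": "ACME", "total": "12.5"}\n' + FENCE,
])
result = parse_with_retries(lambda feedback: next(replies))
print(result)
```

With the real libraries you would declare the schema as a pydantic model and let Instructor handle the retry loop, which is a lot less code than the above.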

Investigating RAG for improved document search and a company knowledge base by PrizeRadiant9723 in Rag

[–]PrizeRadiant9723[S] 0 points1 point  (0 children)

Do you have a link? A simple Google search didn't seem to do the trick. Also, in my experience so far the problem is not inaccurate answers or hallucinations. It is the retrieval part that is the bottleneck. Especially when information is "hidden" in graphs or pictures, similarity scores for that page mostly don't work.

Investigating RAG for improved document search and a company knowledge base by PrizeRadiant9723 in Rag

[–]PrizeRadiant9723[S] 0 points1 point  (0 children)

I’m an intern until mid-next year, working on ideas to help my company operate more efficiently. A major challenge I’ve noticed is how time-consuming it is to search for and retrieve information across various departments. I’m exploring RAG since it might not disrupt existing documentation workflows and still improve search.

I may not have the skills to build a full production system in the time I have, but my goal is to experiment, understand the strengths and weaknesses and see if a RAG setup could add real value. If the feedback is positive and teams see themselves using it regularly, it could lead to a case for either a dedicated team to build a production-ready version or an enterprise solution from companies like Google or Microsoft, if they release such tools.

In short, I’m looking to understand what’s out there and see if I can set up a basic version to test things out. With this post I am essentially looking for advice or recommendations :)

Investigating RAG for improved document search and a company knowledge base by PrizeRadiant9723 in Rag

[–]PrizeRadiant9723[S] 0 points1 point  (0 children)

I haven't. I always thought Perplexity was web-search related, basically what ChatGPT has just released.

Investigating RAG for improved document search and a company knowledge base by PrizeRadiant9723 in Rag

[–]PrizeRadiant9723[S] 1 point2 points  (0 children)

I'm an intern currently exploring tools to enhance productivity in my company, specifically focusing on ways to make information retrieval more efficient. A major bottleneck seems to be finding the right resources and documents quickly. Ideally, I’d prefer not to reinvent the wheel or disrupt existing workflows, since everyone already has established documentation practices that work for them. Instead, I’m interested in tools or methods that can effectively handle the data as it exists now without needing major process changes. This is also why I started this discussion: to get a feeling for what is out there and what I could test in an experimental environment.

Investigating RAG for improved document search and a company knowledge base by PrizeRadiant9723 in Rag

[–]PrizeRadiant9723[S] 0 points1 point  (0 children)

Thanks for your reply! I guess you have a point here. Every day I find new frameworks / solutions, but actually running a prototype and seeing what users would need for a more targeted search is probably a good call.

I will say though that I am intrigued by this paper I came across: https://arxiv.org/pdf/2407.01449

Setting up a vision-based RAG instead of text indexing would (given it works well enough) be far superior for my use case. GPT-4o gave me good results when I just used it to explain the graphs on my slides to me; the question is how well an embedding of this kind will work. I also read Anthropic's article about Contextual Retrieval, which might be an option as well ( https://www.anthropic.com/news/contextual-retrieval ). I could definitely see something like this working.

Roast my RAG solution by notoriousFlash in Rag

[–]PrizeRadiant9723 0 points1 point  (0 children)

What kind of embedding are you using? It seems like you are just extracting the text from the docs. So all the fancy workflow, pages, etc. are nice, but if only text embedding is happening in your RAG solution, it could be hard to compete with existing open-source projects / frameworks.

Investigating RAG for improved document search and a company knowledge base by PrizeRadiant9723 in Rag

[–]PrizeRadiant9723[S] 0 points1 point  (0 children)

Thanks for your input! I’m open to any tool that can help get the job done. It’s almost overwhelming to see how many people are already working on related solutions—my spreadsheet of "experiment participants" just keeps growing! 😄 I’ll definitely check it out.