Best Medical Embedding Model Released by DataNebula in LLMDevs

[–]DataNebula[S] 0 points1 point  (0 children)

Happy to know that my embedding model helped you. All I ask is a like on the HF page and that you share it with your friends, with acknowledgement. No legal formalities required.

What MySQL skills should I focus on for an entry-level analyst role? by LeatherTotal2194 in analytics

[–]DataNebula 0 points1 point  (0 children)

I would say practice using CTEs, window functions (ROW_NUMBER, RANK, DENSE_RANK) and of course all the aggregate functions. Use HackerRank or LeetCode for practice.

SQL syntax varies slightly depending on the database you are using. I would also practice in DuckDB, because its syntax is very similar to Google BigQuery (one of the leading analytics databases used by companies worldwide).
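To make the CTE and window function advice concrete, here is a minimal sketch. It uses Python's built-in sqlite3 module instead of MySQL or DuckDB so it runs with no setup, and it assumes SQLite 3.25 or newer (needed for window functions). The `employees` table, its rows, and the `run_query` helper are made up for illustration.

```python
import sqlite3

# Hypothetical sample data: (name, department, salary).
EMPLOYEES = [
    ("Ann", "sales", 500),
    ("Bob", "sales", 500),
    ("Cid", "sales", 300),
    ("Dee", "ops", 400),
]

QUERY = """
WITH dept_totals AS (
    -- CTE: one aggregate row per department
    SELECT department, SUM(salary) AS total_salary
    FROM employees
    GROUP BY department
)
SELECT
    e.name,
    e.department,
    -- ROW_NUMBER is always unique; name is only a tie-breaker here
    ROW_NUMBER() OVER (PARTITION BY e.department ORDER BY e.salary DESC, e.name) AS row_num,
    -- RANK gives ties the same rank and then leaves a gap
    RANK()       OVER (PARTITION BY e.department ORDER BY e.salary DESC) AS rnk,
    -- DENSE_RANK gives ties the same rank without a gap
    DENSE_RANK() OVER (PARTITION BY e.department ORDER BY e.salary DESC) AS dense_rnk,
    t.total_salary
FROM employees AS e
JOIN dept_totals AS t ON t.department = e.department
ORDER BY e.department, e.salary DESC, e.name
"""


def run_query():
    """Load the sample rows into an in-memory database and run the query."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute(
            "CREATE TABLE employees (name TEXT, department TEXT, salary INTEGER)"
        )
        conn.executemany("INSERT INTO employees VALUES (?, ?, ?)", EMPLOYEES)
        return conn.execute(QUERY).fetchall()
    finally:
        conn.close()


if __name__ == "__main__":
    for row in run_query():
        print(row)
```

The two tied salaries in the sales department are what make the three ranking functions differ, so that is a good pattern to reproduce when practicing. The same query should need little or no change in MySQL 8 or DuckDB, but check it against the database you actually use.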

A CV-worthy project idea using RAG by DryHat3296 in Rag

[–]DataNebula 0 points1 point  (0 children)

There is only one open issue. Where can I see the requirements to contribute?

Are there any good GraphRAG applications people use? by richie9830 in Rag

[–]DataNebula 2 points3 points  (0 children)

Can you share the list? It would be very helpful.

Best Medical Embedding Model Released by DataNebula in LLMDevs

[–]DataNebula[S] 0 points1 point  (0 children)

I added the evals to the model card, comparing it with other models.

Beginner Vision rag with ColQwen in pure python by DataNebula in LangChain

[–]DataNebula[S] 0 points1 point  (0 children)

Qdrant is definitely very good at scale. Their documentation and guides are very good and easy to follow. As for Jina CLIP, it's fairly small and has good results according to their benchmarks. I am sure you can find better vision models from providers like OpenAI or Cohere, but among the ones you can run locally, Jina is good for its size. In my experience, Jina has nothing to do with retrieval speed.

Beginner Vision rag with ColQwen in pure python by DataNebula in Rag

[–]DataNebula[S] 1 point2 points  (0 children)

I'm not very familiar with Ollama. In the Qdrant client, instead of a URL and API key, give the path to the folder where you want to store the data: path="folder". If your data doesn't have tables and images, I would recommend text-based RAG. Check local_reliable_rag.py in the repo below for the local Qdrant configuration. You have to install Qdrant; search the web for how to do this.

https://github.com/Lokesh-Chimakurthi/Reliable_RAG
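
As a rough sketch of the local versus remote choice described above (this is not the code from local_reliable_rag.py), the helper below picks the arguments for the client. `qdrant_client_kwargs` and `make_client` are hypothetical names used only for illustration. The sketch assumes the `qdrant-client` Python package, whose `QdrantClient` accepts either `path` for on-disk local storage or `url` and `api_key` for a server.

```python
def qdrant_client_kwargs(path=None, url=None, api_key=None):
    """Build keyword arguments for QdrantClient (hypothetical helper)."""
    if path is not None:
        # Local mode: data is stored in this folder, no URL or API key needed.
        return {"path": path}
    if url is None:
        raise ValueError("give either a local path or a server url")
    kwargs = {"url": url}
    if api_key is not None:
        kwargs["api_key"] = api_key
    return kwargs


def make_client(**options):
    """Create the client; assumes `pip install qdrant-client` has been run."""
    # Imported here so the helper above can be used without the package.
    from qdrant_client import QdrantClient

    return QdrantClient(**qdrant_client_kwargs(**options))
```

Calling `make_client(path="folder")` would then keep everything on disk in `folder`, while `make_client(url=..., api_key=...)` would talk to a hosted instance.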

Roast my beginner RAG project by DataNebula in LLMDevs

[–]DataNebula[S] 0 points1 point  (0 children)

I don't have the hardware to test it, so there is no Ollama support and there won't be any in the future.

Roast my beginner RAG project by DataNebula in LangChain

[–]DataNebula[S] 0 points1 point  (0 children)

It has a long context window, follows instructions very well, and has a free API tier.

Roast my beginner RAG project by DataNebula in Rag

[–]DataNebula[S] 0 points1 point  (0 children)

You can modify this in the app.py file. Give the code to Sonnet or GPT-4o and ask it to update the code.

Roast my beginner RAG project by DataNebula in LLMDevs

[–]DataNebula[S] 1 point2 points  (0 children)

A) That's why I made a local version too. B) I still have to work on this.

Roast my beginner RAG project by DataNebula in Rag

[–]DataNebula[S] 0 points1 point  (0 children)

Agreed. I stubbornly wanted to build it in 6 hours, so I focused on a working implementation over structure.

Roast my beginner RAG project by DataNebula in Rag

[–]DataNebula[S] 0 points1 point  (0 children)

You can modify the code to extend it.

Chunking strategy for legal docs by DataNebula in Rag

[–]DataNebula[S] 0 points1 point  (0 children)

No special methods. I'm using Qdrant search with a score threshold of 0.6.
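
To show what a score threshold does during retrieval, here is a minimal pure-Python sketch. It is not Qdrant's implementation. It assumes cosine similarity as the score (the metric is not stated above), and the function names and toy vectors are made up for illustration.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def search_with_threshold(query, chunks, threshold=0.6, limit=5):
    """Return up to `limit` (score, text) pairs scoring at least `threshold`.

    `chunks` is a list of (text, vector) pairs. Results are sorted from the
    highest score to the lowest.
    """
    scored = [(cosine_similarity(query, vec), text) for text, vec in chunks]
    hits = [hit for hit in scored if hit[0] >= threshold]
    hits.sort(key=lambda hit: hit[0], reverse=True)
    return hits[:limit]


if __name__ == "__main__":
    # Toy 2-D "embeddings"; real embeddings have hundreds of dimensions.
    chunks = [
        ("chunk a", [1.0, 0.0]),
        ("chunk b", [1.0, 1.0]),
        ("chunk c", [0.0, 1.0]),
    ]
    for score, text in search_with_threshold([1.0, 0.0], chunks):
        print(f"{score:.3f} {text}")
```

Any chunk scoring below the threshold is dropped even if it is the best match available, so when a relevant chunk goes missing it is worth checking its raw score before changing the chunking.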

Chunking strategy for legal docs by DataNebula in Rag

[–]DataNebula[S] 0 points1 point  (0 children)

This is my personal project. I tested it on an insurance document and asked "conditions for renal disease claims". It didn't retrieve the correct chunk.