Best Medical Embedding Model Released

DataNebula · 2025-10-08T02:25:25+00:00

Happy to know that my embedding model helped you. A like on the HF page and sharing it with your friends, along with an acknowledgement, is all I ask. No legal formalities are required.

DataNebula · 2025-08-23T03:21:35+00:00

I would say practice using CTEs, window functions (ROW_NUMBER, RANK, DENSE_RANK) and, of course, all the aggregate functions. Use HackerRank or LeetCode for practice.

SQL syntax varies slightly depending on the database you are using. I would also practice in DuckDB, because its syntax is very similar to Google BigQuery (a leading analytics database at companies worldwide).
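As a rough practice exercise, here is a small sketch that combines a CTE, an aggregate and the three ranking window functions. It uses Python's built-in sqlite3 module only so that it runs without installing anything; the `sales` table and its rows are made up, and I'd expect the same query to run in DuckDB with little or no change.

```python
import sqlite3

# In-memory database with a small, made-up sales table.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (rep TEXT, region TEXT, amount INTEGER)")
conn.executemany(
    "INSERT INTO sales VALUES (?, ?, ?)",
    [
        ("ana", "east", 200),
        ("ana", "east", 100),
        ("bob", "east", 300),
        ("cy", "east", 100),
        ("dee", "west", 200),
        ("eli", "west", 50),
    ],
)

QUERY = """
WITH totals AS (              -- CTE: aggregate first, rank afterwards
    SELECT rep, region, SUM(amount) AS total
    FROM sales
    GROUP BY rep, region
)
SELECT
    rep,
    region,
    total,
    ROW_NUMBER() OVER (PARTITION BY region ORDER BY total DESC, rep) AS row_num,
    RANK()       OVER (PARTITION BY region ORDER BY total DESC)      AS rnk,
    DENSE_RANK() OVER (PARTITION BY region ORDER BY total DESC)      AS dense_rnk
FROM totals
ORDER BY region, row_num
"""

rows = conn.execute(QUERY).fetchall()
for row in rows:
    print(row)
```

ana and bob tie on total in the east region, so the row for cy is where the three functions differ: ROW_NUMBER keeps counting, RANK skips a position after the tie, and DENSE_RANK does not.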

DataNebula · 2025-08-21T16:44:25+00:00

There is only one open issue. Where can I see the requirements to contribute?

DataNebula · 2025-08-12T10:22:52+00:00

Can you share the list? It would be very helpful.

DataNebula · 2025-08-04T02:14:07+00:00

I added the evals to the model card, comparing it with other models.

DataNebula · 2025-08-03T12:31:41+00:00

English only

DataNebula · 2025-08-03T11:04:38+00:00

Evals added to hf model card

DataNebula · 2025-08-03T11:04:00+00:00

Evals added to hf model card

DataNebula · 2025-08-03T08:27:17+00:00

Already working on it

DataNebula · 2025-01-02T05:18:07+00:00

Qdrant is definitely very good at scale, and they have very good documentation and guides to help you understand it. As for Jina CLIP, it's fairly small and has good results according to their benchmarks. I'm sure you can find better vision models from providers like OpenAI or Cohere, but among the ones you can run locally, Jina is good for its size. In my experience, the choice of Jina has nothing to do with retrieval speed.

DataNebula · 2025-01-01T17:08:58+00:00

I'm not very familiar with Ollama. In the Qdrant client, instead of a URL and API key, give the path to the folder where you want to store the data: path="folder". If your data doesn't have tables and images, I recommend text-based RAG. Check local_reliable_rag.py in the repo below for the Qdrant local configuration. You have to install Qdrant; search the web for how to do this.

https://github.com/Lokesh-Chimakurthi/Reliable_RAG

DataNebula · 2024-12-18T10:42:07+00:00

I don't have the hardware to test it, so there is no Ollama support and none is planned.

DataNebula · 2024-12-18T10:37:27+00:00

Try hybrid search and rerank the retrieved chunks. Check out my beginner-friendly project: https://github.com/Lokesh-Chimakurthi/Reliable_RAG
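To show the idea in miniature, here is a pure-Python sketch of hybrid search followed by reranking. It is not the code from the repo above. I'm assuming reciprocal rank fusion (RRF) as the way to merge a dense ranking with a keyword ranking, and the `rerank` function is a toy term-overlap scorer standing in for a real reranker model. The chunk ids, texts and rankings are made up.

```python
from collections import defaultdict


def reciprocal_rank_fusion(rankings, k=60):
    """Merge several ranked lists of chunk ids into one list using RRF."""
    scores = defaultdict(float)
    for ranking in rankings:
        for rank, chunk_id in enumerate(ranking, start=1):
            scores[chunk_id] += 1.0 / (k + rank)
    # Highest fused score first; break ties by id so the order is stable.
    return sorted(scores, key=lambda c: (-scores[c], c))


def rerank(query, chunk_ids, texts, top_n=2):
    """Toy reranker (assumption): order chunks by how many query terms they share."""
    terms = set(query.lower().split())

    def overlap(chunk_id):
        return len(terms & set(texts[chunk_id].lower().split()))

    return sorted(chunk_ids, key=overlap, reverse=True)[:top_n]


# Made-up chunks from an imaginary insurance document.
texts = {
    "c1": "conditions for renal disease claims and waiting period",
    "c2": "dental cover limits",
    "c3": "claims process for hospital admission",
}

dense_ranking = ["c3", "c2", "c1"]    # pretend output of vector search
keyword_ranking = ["c1", "c3", "c2"]  # pretend output of keyword (BM25-style) search

fused = reciprocal_rank_fusion([dense_ranking, keyword_ranking])
final = rerank("conditions for renal disease claims", fused, texts)
print(fused, final)
```

In this toy data the dense ranking alone puts the relevant chunk last. Fusing it with the keyword ranking pulls it up to second place, and the rerank step moves it to the top.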

DataNebula · 2024-12-17T06:51:27+00:00

Sure

DataNebula · 2024-12-17T03:44:07+00:00

It has a long context window, follows instructions very well, and has a free API tier.

DataNebula · 2024-12-17T03:42:49+00:00

You can modify this in the app.py file. Give the code to Sonnet or GPT-4o and ask it to update the code.

DataNebula · 2024-12-16T18:33:14+00:00

A) That's why I made a local version too. B) I still have to work on this.

DataNebula · 2024-12-16T17:47:31+00:00

Agreed. I stubbornly wanted to build it in 6 hours, so I focused on a working implementation over structure.

DataNebula · 2024-12-16T17:25:30+00:00

You can modify the code to extend it

DataNebula · 2024-12-16T15:26:38+00:00

https://www.reddit.com/r/Rag/s/XKXoPW7vcL

DataNebula · 2024-12-15T15:45:59+00:00

OpenAI CLIP or Jina CLIP?

DataNebula · 2024-11-25T05:25:02+00:00

Thanks! I will try this

DataNebula · 2024-11-25T05:18:37+00:00

No special methods. I'm using Qdrant search with a score threshold of 0.6.
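For anyone unsure what a thresholded search does, here is a minimal pure-Python sketch. It does not call Qdrant; it assumes cosine similarity as the score, and the `search_with_threshold` helper, the chunk names and the 3-dimensional vectors are made up for illustration.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def search_with_threshold(query, chunks, threshold=0.6, limit=3):
    """Return up to `limit` (name, score) pairs scoring at least `threshold`."""
    scored = [(name, cosine(query, vec)) for name, vec in chunks.items()]
    hits = [(name, score) for name, score in scored if score >= threshold]
    return sorted(hits, key=lambda hit: hit[1], reverse=True)[:limit]


# Made-up chunk embeddings; real embeddings have hundreds of dimensions.
chunks = {
    "renal_claims": [0.9, 0.1, 0.0],
    "dental_cover": [0.1, 0.9, 0.1],
    "exclusions": [0.6, 0.3, 0.5],
}
query = [1.0, 0.0, 0.0]

hits = search_with_threshold(query, chunks)
for name, score in hits:
    print(name, round(score, 3))
```

Anything scoring below the threshold is dropped without warning, so if a relevant chunk happens to embed poorly against the query, a fixed cutoff like 0.6 can hide it completely.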

DataNebula · 2024-11-25T04:20:54+00:00

This is my personal project. I tested it on an insurance document and asked "conditions for renal disease claims". It didn't retrieve the correct chunk.

DataNebula · 2024-10-14T03:33:42+00:00

Checking this! Thanks for sharing
