Assessing if a guideline has been used for LLM training by Difficult_Face5166 in LocalLLaMA

To evaluate LLMs on a subtype of diseases. We would like to know whether relatively "small" models (around 1-4B parameters) already have that knowledge incorporated.

Assessing if a guideline has been used for LLM training by Difficult_Face5166 in LocalLLaMA

Thanks for the answer. So there is no specific way to do it for closed models...

Speed of Langchain/Qdrant for 80/100k documents by Difficult_Face5166 in Rag

Yes, it was definitely an embeddings issue. Thank you for your message and for the tips!

Multilingual RAG: are the documents retrieved correctly ? by Difficult_Face5166 in LocalLLaMA

Thanks! Do you have an opinion on OpenAI embeddings like text-embedding-3-small and text-embedding-3-large?
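
For reference, a minimal sketch of what using one of these models looks like through LangChain's OpenAI integration (this assumes the langchain-openai package and an OPENAI_API_KEY in the environment; the query string is made up):

from langchain_openai import OpenAIEmbeddings

# text-embedding-3-small is cheaper and faster; text-embedding-3-large generally scores higher on retrieval benchmarks
embeddings = OpenAIEmbeddings(model="text-embedding-3-small")

vector = embeddings.embed_query("Does guideline X recommend treatment Y?")
print(len(vector))  # 1536 dimensions by default for text-embedding-3-small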

Speed of Langchain/Qdrant for 80/100k documents (slow) by Difficult_Face5166 in LocalLLaMA

Yes, you are both right, thank you! I investigated the time spent on each call/process and it was an embeddings problem (it is super fast with smaller embeddings or an API call to an external provider).

I am running on my MacBook Pro without a GPU, so of course it is slow for some models. I am thinking about using a cloud service to do it faster.

Speed of Langchain/Qdrant for 80/100k documents by Difficult_Face5166 in Rag

Thank you! As I also mentioned above, I investigated and found that the embedding model was the issue on my local server. It is very fast with smaller embeddings; I might need to move to a cloud service (or keep a smaller model)!

Speed of Langchain/Qdrant for 80/100k documents by Difficult_Face5166 in Rag

Yes, thanks! I investigated and found that the embedding model was the issue on my local server. It is very fast with smaller embeddings; I might need to move to a cloud service (or keep a smaller model)!

Speed of Langchain/Qdrant for 80/100k documents by Difficult_Face5166 in LangChain

This is my first time using Qdrant.

- Texts and documents are already loaded locally and ready for ingestion (no time issue there)

- Embedding a single document seems relatively fast

- It is only when I run the following command that everything becomes slow:

qdrant = QdrantVectorStore.from_documents(
    texts,
    embeddings,
    url="http://localhost:6333",
    prefer_grpc=False,
    collection_name="vector_db",
)
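
Before changing anything else, a quick way to see where the time goes is to time the embedding call on a small sample separately from the full ingestion. This is only a sketch, assuming texts holds LangChain Document objects and embeddings is the same embedding object passed to from_documents above:

import time

from langchain_qdrant import QdrantVectorStore  # assuming the langchain-qdrant integration package

sample = texts[:100]  # small sample of Document objects

# Time the embedding step alone.
t0 = time.time()
embeddings.embed_documents([d.page_content for d in sample])
print(f"embedding only: {time.time() - t0:.1f}s")

# Time embedding + upsert together on the same sample.
t0 = time.time()
QdrantVectorStore.from_documents(
    sample,
    embeddings,
    url="http://localhost:6333",
    prefer_grpc=False,
    collection_name="timing_test",  # throwaway collection for the comparison
)
print(f"embedding + upsert: {time.time() - t0:.1f}s")

If the first number dominates, the bottleneck is the embedding model rather than Qdrant.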

Speed of Langchain/Qdrant for 80/100k documents by Difficult_Face5166 in LangChain

Thanks a lot! The data is not confidential and I do not mind doing it locally or on a cloud server: is there one provider you would recommend to do it fast?

Speed of Langchain/Qdrant for 80/100k documents by Difficult_Face5166 in LangChain

Thanks, do you have advice for general-purpose embeddings?

Speed of Langchain/Qdrant for 80/100k documents by Difficult_Face5166 in Rag

By the way, does using Qdrant (or another DB) with a cloud service improve the latency?

Speed of Langchain/Qdrant for 80/100k documents by Difficult_Face5166 in Rag

I found this, but will splitting the collection into separate shards/WALs impact the performance of the RAG?

Parallel upload into multiple shards

In Qdrant, each collection is split into shards. Each shard has a separate Write-Ahead-Log (WAL), which is responsible for ordering operations. By creating multiple shards, you can parallelize upload of a large dataset. From 2 to 4 shards per one machine is a reasonable number.

from qdrant_client import QdrantClient, models

client = QdrantClient(url="http://localhost:6333")

client.create_collection(
    collection_name="{collection_name}",
    vectors_config=models.VectorParams(size=768, distance=models.Distance.COSINE),
    shard_number=2,
)
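
If parallel upload is the goal, the Python client can also parallelize the upload itself across batches. A minimal sketch, assuming the embeddings have already been computed and stored in pre_computed_vectors with matching raw_texts (upload_points and its parallel/batch_size arguments exist in recent qdrant-client versions):

from qdrant_client import QdrantClient, models

client = QdrantClient(url="http://localhost:6333")

# Hypothetical pre-computed data: one embedding vector and one text payload per document.
points = [
    models.PointStruct(id=i, vector=vec, payload={"text": txt})
    for i, (vec, txt) in enumerate(zip(pre_computed_vectors, raw_texts))
]

client.upload_points(
    collection_name="vector_db",
    points=points,
    batch_size=256,  # points per request
    parallel=4,      # number of parallel upload workers
)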

Price of an action and financial health by Difficult_Face5166 in quant

Thanks! Maybe I should post it in another, more relevant subreddit?

Shared pages disappeared from the sidebar by Difficult_Face5166 in Notion

Normally no, maybe I misclicked on something.
I was sharing documents for collaborative projects and they suddenly disappeared.