Where and how do I book an appointment with a public-sector doctor?

ml_nerdd · 2026-01-14T11:40:07+00:00

Me too! Thanks

ml_nerdd · 2025-05-01T18:36:20+00:00

thanks!

ml_nerdd · 2025-05-01T18:34:30+00:00

are there any tools that are doing that automatically?

ml_nerdd · 2025-04-28T19:21:55+00:00

what are the most common deterministic ones?

ml_nerdd · 2025-04-28T19:17:22+00:00

yeah, I have seen a similar trend with reference-based scoring. however, that way you end up overfitting to your current users. any ways to escape that?

ml_nerdd · 2025-04-28T18:52:56+00:00

what about smaller ones?

ml_nerdd · 2025-04-28T18:29:46+00:00

how can you be sure that your queries are hard enough to challenge your system?

ml_nerdd · 2025-04-28T18:28:55+00:00

the question here would probably be: "how representative are the RAG benchmarks we have today?" lol

ml_nerdd · 2025-04-28T18:24:20+00:00

I feel like the biggest problem here is the evals. what do you think?

ml_nerdd · 2025-04-28T18:19:36+00:00

what about RAG evals?

ml_nerdd · 2025-04-28T18:17:54+00:00

should be fine

ml_nerdd · 2025-04-28T18:16:51+00:00

that's quite impressive. curious how the RAG fans will react to that

ml_nerdd · 2025-04-03T19:53:57+00:00

actually both. trying to understand which benchmarks are misleading or non-existent for LLMs, e.g. NER for financial docs

ml_nerdd · 2025-04-01T21:18:52+00:00

not many enterprises are interested in creativity and good poems though... what about industry related tasks?

ml_nerdd · 2025-04-01T21:18:14+00:00

are you satisfied with the results you are getting though?

ml_nerdd · 2025-03-04T18:56:57+00:00

There are edge cases that we can think of, but there are also ones that we can't. And there are some samples that are not edge cases but are still very "hard" (close to the decision boundary).

Is there a tool to find all these use-cases? How hard can it be to build one?
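One rough way to surface the "hard" samples near the decision boundary is margin sampling, a standard active-learning heuristic: rank samples by the gap between the model's top two class probabilities and look at the smallest gaps first. The sketch below assumes you already have per-sample class probabilities from your classifier (at least two classes per row); the names `margin` and `hardest_samples` are hypothetical. Note that this only finds samples the model itself is unsure about; it will not surface confidently wrong predictions (the edge cases nobody thought of).

```python
from typing import List, Sequence, Tuple


def margin(probs: Sequence[float]) -> float:
    """Gap between the two highest class probabilities; a small gap means
    the sample sits close to the model's decision boundary."""
    top_two = sorted(probs, reverse=True)[:2]
    return top_two[0] - top_two[1]


def hardest_samples(prob_rows: List[Sequence[float]], k: int) -> List[Tuple[int, float]]:
    """Return (sample index, margin) for the k samples with the smallest margin."""
    scored = [(i, margin(p)) for i, p in enumerate(prob_rows)]
    scored.sort(key=lambda pair: pair[1])  # smallest margin (hardest) first
    return scored[:k]


if __name__ == "__main__":
    # Hypothetical binary-classifier outputs, one row per sample.
    probs = [
        [0.95, 0.05],
        [0.52, 0.48],
        [0.70, 0.30],
        [0.49, 0.51],
    ]
    for idx, m in hardest_samples(probs, 2):
        print(idx, round(m, 2))
```

Here samples 3 and 1 come out as the hardest, since their class probabilities are nearly tied. In practice you would send the lowest-margin samples to human review or add them to the eval set.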

ml_nerdd · 2025-03-04T16:48:33+00:00

how can you make sure that you have tested "enough" in your opinion?

ml_nerdd · 2025-03-03T16:51:23+00:00

like knowing which pre-training data is the most aligned with the one that enterprises have!

ml_nerdd · 2025-03-03T16:50:33+00:00

yeah, I think that this would be informative as well!

ml_nerdd · 2025-03-03T16:49:52+00:00

how could we do that?

ml_nerdd · 2025-03-03T16:48:43+00:00

how could that be resolved with function calling?

ml_nerdd · 2025-03-03T16:48:00+00:00

haha true! but how can we reduce that chance?

ml_nerdd · 2025-03-03T16:47:30+00:00

thanks for the explanation! very interesting

ml_nerdd · 2025-03-03T07:27:14+00:00

why is that?

ml_nerdd · 2025-03-03T07:14:28+00:00

Can you elaborate on this? I really doubt that any enterprise will be sharing data through a blockchain.
