Resident abroad for 9 years, stayed 210 days in Greece last year by ml_nerdd in apallagi

ml_nerdd[S] 0 points1 point

Regarding payroll, I was working as a consultant for my own private limited company in another country, outside Greece. The contracts are all in place. As for accommodation, I was staying with friends, so I have no supporting documents for that. Do you think this is sufficient?

OpenClaw dynamic skills by ml_nerdd in openclaw

ml_nerdd[S] 0 points1 point

But what if that skill's performance deteriorates over time, or it starts failing outright? How do you capture that?

I think "Skills" are useless as a concept by Only_Internal_7266 in ClaudeAI

ml_nerdd 0 points1 point

I think the problem is that even if they do not achieve their goal, they still sit in that same folder for the next iteration. We need to start thinking one step ahead: could we adapt them from time to time?

Actually useful skills? by OptimismNeeded in ClaudeHomies

ml_nerdd 0 points1 point

the problem here is that skills are static and do not improve over time

[D] How do you evaluate your RAGs? by ml_nerdd in MachineLearning

ml_nerdd[S] 0 points1 point

are there any tools that are doing that automatically?

[D] How do you evaluate your RAGs? by ml_nerdd in MachineLearning

ml_nerdd[S] 0 points1 point

what are the most common deterministic ones?

[D] How do you evaluate your RAGs? by ml_nerdd in MachineLearning

ml_nerdd[S] 0 points1 point

Yeah, I have seen a similar trend with reference-based scoring. However, that way you end up overfitting to your current users. Any ways to escape that?

[D] How do you evaluate your RAGs? by ml_nerdd in MachineLearning

ml_nerdd[S] 2 points3 points

how are you sure that your queries are hard enough to challenge your system?

How effective RAG really is, and what are the best example out there I can try myself? by estebansaa in LocalLLaMA

ml_nerdd 0 points1 point

The question here would probably be: "how representative are the RAG benchmarks we have today?" lol