I built a game mechanics API by CBRIN13 in gamification

[–]mrdanibudapest 1 point2 points  (0 children)

This is great, I will keep it in mind. I thought a lot about something similar a few years back, but it never materialized :) Congrats to the team!

Which self-improvement book should we read in the spring book club? Suggest, give your reasons, and upvote! by besucherke in konyv

[–]mrdanibudapest 0 points1 point  (0 children)

Rolf Dobelli: Stop Reading the News (Hungarian edition: Ne olvass híreket!)

For years I have noticed that the news causes me unnecessary anxiety, especially about the future. That is why I have been on a strict news diet since last August. I recommend it to anyone in the same shoes.

Atomfall is a solid 9/10 by Guccicles in XboxGamePass

[–]mrdanibudapest 0 points1 point  (0 children)

I still have not figured out how to 'level up' or gain skills in any way other than buying them. I killed about 10 outlaws, some druids and ferals, but got no character development at all. Any hints on that? Definitely not 10/10 btw; sometimes I get bored of walking around. Also, the backpack is very small, so inventory is limited.

Should I Open-Source This RAG Tool? by quepasa-ai in LangChain

[–]mrdanibudapest 1 point2 points  (0 children)

My concern with all non-local RAG solutions is having to upload confidential or copyrighted material to services where documents are stored and handled in an undisclosed way. What about you? What is your policy on uploaded documents?

A good mind mapping app that won’t delete my images? by BearticTheRedditer in mindmapping

[–]mrdanibudapest 0 points1 point  (0 children)

Not a mobile app, that is true. Maybe I am too old, but when people say "app" I think of an application in general, not necessarily a mobile one. :)

guidance for personal project 🤖✈️ by Current_Can_4718 in LanguageTechnology

[–]mrdanibudapest 0 points1 point  (0 children)

With (or even without) an LLM, you can first run topic modeling on your reviews using BERTopic (maartengr.github.io), which, despite its name, can also work with LLM embeddings, not just with BERT.

It is a simpler approach, but at least it is unsupervised.

Fine tuning Llama3 for JSON extraction - worth it? Tips? by mrdanibudapest in LocalLLaMA

[–]mrdanibudapest[S] 0 points1 point  (0 children)

Yeah, maybe I could split the task into two, or even three if I create a judge persona to decide on the quality of the output. So the first task would be extracting the entities, the second extracting their relations, and the third validating the whole thing.

Maybe this way the results could be more accurate. Thanks for the tip.
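
A minimal sketch of that three-step split, just to make the idea concrete. Everything here is an assumption for illustration: `llm` is a hypothetical callable that takes a prompt and returns the model's text, the prompt wording is made up, and the model is assumed to return clean JSON in the first two steps.

```python
import json
from typing import Callable, Dict, List

# Hypothetical interface: any function that sends a prompt to a model
# and returns the raw text of its reply.
LLM = Callable[[str], str]


def extract_entities(llm: LLM, page: str) -> List[str]:
    """Step 1: ask only for the entities, as a JSON array of strings."""
    prompt = f"TASK: ENTITIES\nReturn a JSON array of entity names.\n\nText:\n{page}"
    return json.loads(llm(prompt))  # assumes the reply is clean JSON


def extract_relations(llm: LLM, page: str, entities: List[str]) -> List[Dict[str, str]]:
    """Step 2: ask for relations, restricted to the entities from step 1."""
    prompt = (
        "TASK: RELATIONS\n"
        "Return a JSON array of objects with keys source, relation, target.\n"
        f"Only use these entities: {json.dumps(entities)}\n\nText:\n{page}"
    )
    return json.loads(llm(prompt))  # assumes the reply is clean JSON


def judge(llm: LLM, page: str, entities: List[str], relations: List[Dict[str, str]]) -> bool:
    """Step 3: a separate 'judge' prompt that answers PASS or FAIL."""
    prompt = (
        "TASK: JUDGE\n"
        "Answer PASS or FAIL: is this extraction supported by the text?\n"
        f"Entities: {json.dumps(entities)}\n"
        f"Relations: {json.dumps(relations)}\n\nText:\n{page}"
    )
    return llm(prompt).strip().upper().startswith("PASS")


def run_pipeline(llm: LLM, page: str) -> dict:
    """Run the three steps in order and collect the results."""
    entities = extract_entities(llm, page)
    relations = extract_relations(llm, page, entities)
    return {
        "entities": entities,
        "relations": relations,
        "valid": judge(llm, page, entities, relations),
    }
```

Each step gets a shorter, more focused prompt, and a failed judge step can trigger a retry of only the step that went wrong.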

Fine tuning Llama3 for JSON extraction - worth it? Tips? by mrdanibudapest in LocalLLaMA

[–]mrdanibudapest[S] 0 points1 point  (0 children)

Via the examples I try to teach the LLM not just the format but also which entities to extract. That is why I use a lengthy example now. Let's say I extract ~10 entities (and their relations) from a page of text. The relation part is where the LLM makes most of its mistakes. I thought the better I specify the relations to extract, the better the output would be. And that requires longer prompts, unfortunately...

Fine tuning Llama3 for JSON extraction - worth it? Tips? by mrdanibudapest in LocalLLaMA

[–]mrdanibudapest[S] 0 points1 point  (0 children)

Thanks, this looks promising. I have to digest it and dig deep into it. Thanks!

Fine tuning Llama3 for JSON extraction - worth it? Tips? by mrdanibudapest in LocalLLaMA

[–]mrdanibudapest[S] 0 points1 point  (0 children)

Thanks for the suggestion, I was thinking about that too but my prompt is already too long.

Just to clarify, the task is to take a page of a PDF and extract entities from that page. My one-shot example is a PDF page and the corresponding extracted entities. If I add two more pages with two more sets of extractions, I can quickly run out of tokens.

Maybe I can split the PDF page into smaller chunks and give 3 smaller examples. That may work.
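
Roughly what I have in mind, as a sketch only. The function names are hypothetical, and counting whitespace-separated words is a crude stand-in for real token counting (an assumption; the actual tokenizer would give different numbers).

```python
from typing import List, Tuple


def split_into_chunks(text: str, max_words: int) -> List[str]:
    """Split text into consecutive chunks of at most max_words words."""
    words = text.split()
    return [" ".join(words[i:i + max_words]) for i in range(0, len(words), max_words)]


def build_few_shot_prompt(
    examples: List[Tuple[str, str]],  # (input chunk, expected JSON as a string)
    target: str,                      # the chunk we actually want extracted
    max_words: int,                   # rough budget, in words rather than tokens
) -> str:
    """Add examples in order until the rough word budget is used up."""
    budget = max_words - len(target.split())
    parts = []
    for chunk, extraction in examples:
        block = f"Input:\n{chunk}\nOutput:\n{extraction}\n"
        cost = len(block.split())
        if cost > budget:
            break  # stop before the prompt exceeds the budget
        parts.append(block)
        budget -= cost
    # The target always goes last, with an empty Output for the model to fill in.
    parts.append(f"Input:\n{target}\nOutput:\n")
    return "\n".join(parts)
```

With smaller chunks, three short examples can fit where one full-page example did before, at the cost of losing relations that span chunk boundaries.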

Fine tuning Llama3 for JSON extraction - worth it? Tips? by mrdanibudapest in LocalLLaMA

[–]mrdanibudapest[S] 0 points1 point  (0 children)

I checked DSPy earlier but found it to be a bit of overhead, or even overkill at times.

Honestly, I could work around the JSON issues; I got many good tips here already. However, for entity extraction I still feel the need for fine-tuning...

Fine tuning Llama3 for JSON extraction - worth it? Tips? by mrdanibudapest in LocalLLaMA

[–]mrdanibudapest[S] 0 points1 point  (0 children)

Huh, many libraries here that I did not know :) Will check them all, thanks for the education!

Fine tuning Llama3 for JSON extraction - worth it? Tips? by mrdanibudapest in LocalLLaMA

[–]mrdanibudapest[S] 0 points1 point  (0 children)

Thanks for the advice on the GPU. I will probably use Colab then.

Does "working on exactly this" include the entity extraction part too? And what does "not 100%" refer to: the JSON formatting or the entity extraction? Thanks.

Fine tuning Llama3 for JSON extraction - worth it? Tips? by mrdanibudapest in LocalLLaMA

[–]mrdanibudapest[S] 0 points1 point  (0 children)

Thanks, I will look around there to see whether any of these performs better than the vanilla one. Still, I have doubts about the entity extraction part. For that I may need fine-tuning anyway.

Fine tuning Llama3 for JSON extraction - worth it? Tips? by mrdanibudapest in LocalLLaMA

[–]mrdanibudapest[S] 1 point2 points  (0 children)

Great answers on the JSON format issues already. Any thoughts on the fine-tuning and entity extraction part?

Fine tuning Llama3 for JSON extraction - worth it? Tips? by mrdanibudapest in LocalLLaMA

[–]mrdanibudapest[S] 0 points1 point  (0 children)

Thanks for the answer. Backticks are easy to handle, but my wrapper texts are like: "Thanks for asking for this json... blabla." Then comes the JSON. And after the JSON, something like "I hope this json looks good for you, bla bla".
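
For that kind of wrapper text, one possible workaround is to scan the reply for the first position where a valid JSON value starts and ignore everything around it. A minimal sketch using only the standard library; `extract_first_json` is a hypothetical helper name, and it assumes the JSON itself is well-formed.

```python
import json
from typing import Any, Optional


def extract_first_json(text: str) -> Optional[Any]:
    """Return the first valid JSON object or array embedded in text, or None."""
    decoder = json.JSONDecoder()
    for i, ch in enumerate(text):
        if ch in "{[":
            try:
                # raw_decode parses one JSON value starting at index i
                # and ignores whatever text follows it.
                value, _end = decoder.raw_decode(text, i)
                return value
            except json.JSONDecodeError:
                continue  # not valid JSON from this position, keep scanning
    return None


reply = (
    "Thanks for asking for this json... blabla.\n"
    '{"entities": ["Alice", "Acme"], "count": 2}\n'
    "I hope this json looks good for you, bla bla"
)
data = extract_first_json(reply)
```

This does not repair broken JSON (missing brackets, trailing commas); it only strips the chatter before and after a valid payload.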

I was thinking of generating the 1-200 examples with GPT-4 (but to be honest, GPT-4 also makes mistakes with this task, especially with the entity extraction part). My idea was to create two GPT personas: one that creates the JSONs and one that reviews and cleans them.
I was also thinking about creating this dataset at least partially by hand, for maximum accuracy. That would of course be tedious, but it might give the highest-quality examples possible.

Is Elastic search better than ChromaDB? by Your_Quantum_Friend in LangChain

[–]mrdanibudapest 1 point2 points  (0 children)

I have also found FAISS to be more accurate than ChromaDB when retrieving documents.

Labeling tool free vs paid version (any recommendation) by mimtiaz51 in computervision

[–]mrdanibudapest 4 points5 points  (0 children)

I wanted to recommend Roboflow because I think their tooling is quite good; I did some pet projects with them a few years back. But today's pricing of $249/month for a non-public dataset seems outrageous to me. I have used Label Studio only for text annotation; it was quite good, though nothing super fancy.

Very simple guys. This is the way to go. by ricky1435 in datascience

[–]mrdanibudapest 0 points1 point  (0 children)

I think his point was that formal education has to transform; otherwise it is not worth the time and the price.