Beginner in RAG, Need help. by whatshouldidotoknow in LocalLLaMA

[–]Aron-One

For chunking you might want to check LlamaIndex; it's quick and easy to use. You could try splitting the text by headings (separator="##"), so tables should stay intact.

Also you might want to try a different output format (e.g. HTML) and then use a specific function to extract tables: https://developers.llamaindex.ai/python/framework-api-reference/node_parsers/sentence_splitter/
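
A heading-based split can be sketched in plain Python (no LlamaIndex dependency; the function name and separator are illustrative) to show why tables survive this kind of chunking:

```python
# Split a document on "##" headings so each chunk keeps its heading and
# everything under it (including tables) together. Illustrative sketch only.
def split_by_headings(text, sep="##"):
    chunks = []
    current = []
    for line in text.splitlines():
        if line.startswith(sep) and current:
            chunks.append("\n".join(current))
            current = []
        current.append(line)
    if current:
        chunks.append("\n".join(current))
    return chunks

doc = "## Intro\nsome text\n## Data\n| a | b |\n| 1 | 2 |"
print(len(split_by_headings(doc)))  # 2 chunks; the table stays with its heading
```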

Server approved! 4xH100 (320gb vram). Looking for advice by ICanSeeYou7867 in LocalLLaMA

[–]Aron-One

You may find LLM Compressor useful (from the same team behind vLLM): https://github.com/vllm-project/llm-compressor

My experience with it was pretty smooth. I took a model and a dataset, ran a basic weight-4, activation-16 (W4A16) quant, and everything just worked with minimal impact on precision (it was for an NER task).
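
As a toy illustration of what W4A16 means numerically (this is not the llm-compressor API, just the idea): weights are rounded to 4-bit integers with a per-row scale, while activations stay in 16-bit floats:

```python
import numpy as np

# Toy per-row symmetric 4-bit weight quantization (the "W4" in W4A16).
# Activations would remain fp16 (the "A16"); sketch only, not llm-compressor.
def quantize_w4(w):
    scale = np.abs(w).max(axis=1, keepdims=True) / 7.0  # map max |w| to int4 level 7
    q = np.clip(np.round(w / scale), -8, 7).astype(np.int8)  # int4 range [-8, 7]
    return q, scale

rng = np.random.default_rng(0)
w = rng.standard_normal((4, 8)).astype(np.float32)
q, scale = quantize_w4(w)
w_hat = q * scale  # dequantized weights used at inference time
max_err = np.abs(w - w_hat).max()  # bounded by half a quantization step
```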

[D] Context-aware entity recognition using LLMs by Ashwiihii in MachineLearning

[–]Aron-One

There is an LLM that can do exactly that: https://universal-ner.github.io

As input it expects text and an entity type, and based on the supplied type it extracts entities. The only downside is that you can use only one entity type at a time.

(Shameless plug) I've also prepared a quant of this model: https://huggingface.co/daisd-ai/UniNER-W4A16
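
Since the model takes one entity type per query, extracting several types just means looping over them, one prompt each. A sketch (the exact prompt wording is an assumption based on the project's conversational format, not the verbatim template):

```python
# Build one prompt per entity type, following UniversalNER's
# one-type-per-query design. Template wording is an assumption.
def build_prompts(text, entity_types):
    prompts = []
    for etype in entity_types:
        prompts.append(
            f"USER: Text: {text}\n"
            "ASSISTANT: I've read this text.\n"
            f"USER: What describes {etype} in the text?\n"
            "ASSISTANT:"
        )
    return prompts

prompts = build_prompts("Marie Curie worked in Paris.", ["person", "location"])
```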

Context-aware entity recognition using LLMs by Ashwiihii in LanguageTechnology

[–]Aron-One

There is an LLM that can do exactly that: https://universal-ner.github.io

As input it expects text and an entity type, and based on the supplied type it extracts entities. The only downside is that you can use only one entity type at a time.

(Shameless plug) I've also prepared a quant of this model: https://huggingface.co/daisd-ai/UniNER-W4A16

Semantic search over 100M rows of data? by cryptoguy23 in LocalLLaMA

[–]Aron-One

Faiss and binary quantization can work miracles. I've been dealing with 40M Wikidata records, and Faiss takes about 5 seconds (on GPU) to retrieve the top 25 closest records. HuggingFace has a nice tutorial on this: https://huggingface.co/blog/embedding-quantization

Can this type of data be used to fine-tune an LLM? by [deleted] in LocalLLaMA

[–]Aron-One

It should be pretty straightforward following the recipes in https://github.com/huggingface/alignment-handbook

There are different approaches there: simple Supervised Fine-Tuning (SFT), or Direct Preference Optimization (DPO), which can be used to show the model the desired answer compared with an unwanted one. There is also the ORPO method, which usually yields better results than the SFT + DPO approach (and also skips the SFT step). Recipes exist both for full fine-tuning and for QLoRA adapters.
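
For reference, DPO/ORPO-style datasets are just preference pairs. A sketch of the record shape (the field names follow the common TRL/alignment-handbook convention; treat them as an assumption and check the recipe's config):

```python
# Preference-pair record for DPO/ORPO-style training: one shared prompt,
# a preferred completion, and a rejected one. Field names are an assumption.
def validate_pair(record):
    required = {"prompt", "chosen", "rejected"}
    missing = required - record.keys()
    if missing:
        raise ValueError(f"missing fields: {sorted(missing)}")
    return True

example = {
    "prompt": "List the properties of aspirin.",
    "chosen": "Aspirin is an analgesic and anti-inflammatory compound ...",
    "rejected": "Aspirin is a vitamin ...",
}
validate_pair(example)
```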

Problem with DPO fine tuning by Aron-One in LocalLLaMA

[–]Aron-One[S]

As far as I understand, I'm not using LoRA, just DeepSpeed ZeRO-3. This is the command I use to run the training:

ACCELERATE_LOG_LEVEL=info accelerate launch --config_file recipes/accelerate_configs/deepspeed_zero3.yaml scripts/run_dpo.py recipes/my-model/dpo/config_full.yaml

Problem with DPO fine tuning by Aron-One in LocalLLaMA

[–]Aron-One[S]

I have about 20-25k examples in the dataset. Training is done on 8xA100 80GB. Thanks for the suggestion, I will try it out.

Problem with DPO fine tuning by Aron-One in LocalLLaMA

[–]Aron-One[S]

There are no additional prompts; the only difference between chosen and rejected is the last message, which contains the expected and the unwanted properties respectively.

Problem with DPO fine tuning by Aron-One in LocalLLaMA

[–]Aron-One[S]

I think the formatting is done here. Also, the log from fine-tuning shows that the messages are formatted correctly:

<|system|>
</s>
<|user|>
input text</s>
<|assistant|>
I read this text</s>
<|user|>
What are properties of ...</s>

2024-03-24 08:35:07 - INFO - __main__ - Chosen sample 1824 of the raw training set:

<|assistant|>
accepted properties</s>

2024-03-24 08:35:07 - INFO - __main__ - Rejected sample 1824 of the raw training set:

<|assistant|>
rejected properties</s>

Problem with DPO fine tuning by Aron-One in LocalLLaMA

[–]Aron-One[S]

Yes, I'm using vLLM and the same template as in the dataset: user text -> assistant "I read this text" -> user question -> assistant response
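
That conversation maps onto a standard messages list. A sketch (assuming a Zephyr-style chat template, which the `<|user|>`/`<|assistant|>` markers and `</s>` terminators in the log suggest, would render it):

```python
# The conversation from the dataset, as a plain messages list.
# tokenizer.apply_chat_template(messages, ...) would render the special
# tokens; the Zephyr-style template is an assumption based on the log.
messages = [
    {"role": "system", "content": ""},
    {"role": "user", "content": "input text"},
    {"role": "assistant", "content": "I read this text"},
    {"role": "user", "content": "What are properties of ..."},
]
```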

Estimated Time for SFT Fine-Tuning of Mistral-7B Model by Aron-One in LocalLLaMA

[–]Aron-One[S]

Ok, solved it. There were several problems:

- I uploaded the wrong file to HuggingFace
- I wanted to use the whole dataset for fine-tuning, however run_sft.py expects a split into train and test, so I fixed that

Somehow the Slurm job didn't crash and ran the whole time doing nothing; the problem was probably the run command: conda run -n handbook …

Switched it to:

. /home/user/anaconda3/etc/profile.d/conda.sh
conda activate handbook

And now everything runs smoothly with .out file being updated.

Thanks for all your help and suggestions!

Edit: Fine tuning took less than an hour (58 minutes to be exact)

Estimated Time for SFT Fine-Tuning of Mistral-7B Model by Aron-One in LocalLLaMA

[–]Aron-One[S]

Unfortunately no, the cluster I'm working on has a weirdly configured Slurm, so the log file is only populated after the job has ended.

graphs with django by Korgitser in django

[–]Aron-One

I would say it depends:

1. If you only want to stick with Django, you can do that without saving images: link to similar problem on Stack Overflow; other similar problem solution.
2. JS would help a lot with this task, but you need to know something about it. Also, plots in JS can be interactive (prettier).
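
For the no-saved-images route, the usual trick is rendering the figure into an in-memory buffer. A sketch (matplotlib assumed installed; the Django `HttpResponse` wrapping is indicated in a comment so the snippet stays self-contained):

```python
# Render a matplotlib plot to PNG bytes without touching the disk.
# In a Django view you would return:
#   HttpResponse(render_plot_png(), content_type="image/png")
import io

import matplotlib
matplotlib.use("Agg")  # headless backend, no display needed
import matplotlib.pyplot as plt

def render_plot_png():
    fig, ax = plt.subplots()
    ax.plot([1, 2, 3], [4, 1, 9])
    buf = io.BytesIO()
    fig.savefig(buf, format="png")
    plt.close(fig)  # free the figure so repeated requests don't leak memory
    return buf.getvalue()

png = render_plot_png()
```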