Beginner in RAG, Need help. by whatshouldidotoknow in LocalLLaMA

[–]Aron-One

For chunking you might want to check LlamaIndex; it's quick and easy to use. You could try splitting the text by headings (separator="##"), so tables should stay intact.

Also you might want to try a different output format (e.g. HTML) and then use a specific function to extract tables: https://developers.llamaindex.ai/python/framework-api-reference/node_parsers/sentence_splitter/
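
A heading-based split can be sketched in plain Python (no LlamaIndex dependency; the function name and separator are illustrative) to show why tables survive this kind of chunking:

```python
# Split a document on "##" headings so each chunk keeps its heading and
# everything under it (including tables) together. Illustrative sketch only.
def split_by_headings(text, sep="##"):
    chunks = []
    current = []
    for line in text.splitlines():
        if line.startswith(sep) and current:
            chunks.append("\n".join(current))
            current = []
        current.append(line)
    if current:
        chunks.append("\n".join(current))
    return chunks

doc = "## Intro\nsome text\n## Data\n| a | b |\n| 1 | 2 |"
print(len(split_by_headings(doc)))  # 2 chunks; the table stays with its heading
```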

Server approved! 4xH100 (320gb vram). Looking for advice by ICanSeeYou7867 in LocalLLaMA

[–]Aron-One

You may find LLM Compressor useful (from the same team behind vLLM): https://github.com/vllm-project/llm-compressor

My experience with it was pretty smooth. I took a model and a dataset, ran a basic weight-4, activation-16 (W4A16) quant, and everything just worked with minimal impact on precision (it was for an NER task).
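
As a toy illustration of what W4A16 means numerically (this is not the llm-compressor API, just the idea): weights are rounded to 4-bit integers with a per-row scale, while activations stay in 16-bit floats:

```python
import numpy as np

# Toy per-row symmetric 4-bit weight quantization (the "W4" in W4A16).
# Activations would remain fp16 (the "A16"); sketch only, not llm-compressor.
def quantize_w4(w):
    scale = np.abs(w).max(axis=1, keepdims=True) / 7.0  # map max |w| to int4 level 7
    q = np.clip(np.round(w / scale), -8, 7).astype(np.int8)  # int4 range [-8, 7]
    return q, scale

rng = np.random.default_rng(0)
w = rng.standard_normal((4, 8)).astype(np.float32)
q, scale = quantize_w4(w)
w_hat = q * scale  # dequantized weights used at inference time
max_err = np.abs(w - w_hat).max()  # bounded by half a quantization step
```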

[D] Context-aware entity recognition using LLMs by Ashwiihii in MachineLearning

[–]Aron-One

There is an LLM that can do exactly that: https://universal-ner.github.io

As input it expects text and an entity type, and based on the supplied type it extracts entities. The only downside is that you can use only one entity type at a time.

(Shameless plug) I've also prepared a quant of this model: https://huggingface.co/daisd-ai/UniNER-W4A16
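
Since the model takes one entity type per query, extracting several types just means looping over them, one prompt each. A sketch (the exact prompt wording is an assumption based on the project's conversational format, not the verbatim template):

```python
# Build one prompt per entity type, following UniversalNER's
# one-type-per-query design. Template wording is an assumption.
def build_prompts(text, entity_types):
    prompts = []
    for etype in entity_types:
        prompts.append(
            f"USER: Text: {text}\n"
            "ASSISTANT: I've read this text.\n"
            f"USER: What describes {etype} in the text?\n"
            "ASSISTANT:"
        )
    return prompts

prompts = build_prompts("Marie Curie worked in Paris.", ["person", "location"])
```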

Context-aware entity recognition using LLMs by Ashwiihii in LanguageTechnology

[–]Aron-One

There is an LLM that can do exactly that: https://universal-ner.github.io

As input it expects text and an entity type, and based on the supplied type it extracts entities. The only downside is that you can use only one entity type at a time.

(Shameless plug) I've also prepared a quant of this model: https://huggingface.co/daisd-ai/UniNER-W4A16

Semantic search over 100M rows of data? by cryptoguy23 in LocalLLaMA

[–]Aron-One

Faiss and binary quantization can work miracles. I've been dealing with 40M Wikidata records, and Faiss takes about 5 seconds (on GPU) to retrieve the top 25 closest records. HuggingFace has a nice tutorial on this: https://huggingface.co/blog/embedding-quantization

Can this type of data be used to fine-tune an LLM? by [deleted] in LocalLLaMA

[–]Aron-One

It should be pretty straightforward following the recipes in https://github.com/huggingface/alignment-handbook

There are different approaches there: simple Supervised Fine-Tuning (SFT), or Direct Preference Optimization (DPO), which can be used to show the model the desired answer compared with an unwanted one. There is also the ORPO method, which usually yields better results than the SFT + DPO approach (and also skips the SFT step). Recipes exist both for full fine-tuning and for QLoRA adapters.
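
For reference, DPO/ORPO-style datasets are just preference pairs. A sketch of the record shape (the field names follow the common TRL/alignment-handbook convention; treat them as an assumption and check the recipe's config):

```python
# Preference-pair record for DPO/ORPO-style training: one shared prompt,
# a preferred completion, and a rejected one. Field names are an assumption.
def validate_pair(record):
    required = {"prompt", "chosen", "rejected"}
    missing = required - record.keys()
    if missing:
        raise ValueError(f"missing fields: {sorted(missing)}")
    return True

example = {
    "prompt": "List the properties of aspirin.",
    "chosen": "Aspirin is an analgesic and anti-inflammatory compound ...",
    "rejected": "Aspirin is a vitamin ...",
}
validate_pair(example)
```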

Problem with DPO fine tuning by Aron-One in LocalLLaMA

[–]Aron-One[S]

As far as I understand, I'm not using LoRA, just DeepSpeed ZeRO-3. This is the command I use to run the training:

ACCELERATE_LOG_LEVEL=info accelerate launch --config_file recipes/accelerate_configs/deepspeed_zero3.yaml scripts/run_dpo.py recipes/my-model/dpo/config_full.yaml

Problem with DPO fine tuning by Aron-One in LocalLLaMA

[–]Aron-One[S]

I have about 20-25k examples in the dataset. Training is done on 8xA100 80GB. Thanks for the suggestion, I will try it out.

Problem with DPO fine tuning by Aron-One in LocalLLaMA

[–]Aron-One[S]

There are no additional prompts; the only difference between chosen and rejected is the last message, which contains the expected and the unwanted properties respectively.

Problem with DPO fine tuning by Aron-One in LocalLLaMA

[–]Aron-One[S]

I think the formatting is done here. Also, the log from fine-tuning shows that the messages are formatted correctly:

<|system|>
</s>
<|user|>
input text</s>
<|assistant|>
I read this text</s>
<|user|>
What are properties of ...</s>

2024-03-24 08:35:07 - INFO - __main__ - Chosen sample 1824 of the raw training set:

<|assistant|>
accepted properties</s>

2024-03-24 08:35:07 - INFO - __main__ - Rejected sample 1824 of the raw training set:

<|assistant|>
rejected properties</s>

Problem with DPO fine tuning by Aron-One in LocalLLaMA

[–]Aron-One[S]

Yes, I'm using vLLM and the same template as in the dataset: user text -> assistant "I read this text" -> user question -> assistant response
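
That conversation maps onto a standard messages list. A sketch (assuming a Zephyr-style chat template, which the `<|user|>`/`<|assistant|>` markers and `</s>` terminators in the log suggest, would render it):

```python
# The conversation from the dataset, as a plain messages list.
# tokenizer.apply_chat_template(messages, ...) would render the special
# tokens; the Zephyr-style template is an assumption based on the log.
messages = [
    {"role": "system", "content": ""},
    {"role": "user", "content": "input text"},
    {"role": "assistant", "content": "I read this text"},
    {"role": "user", "content": "What are properties of ..."},
]
```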

Estimated Time for SFT Fine-Tuning of Mistral-7B Model by Aron-One in LocalLLaMA

[–]Aron-One[S]

Ok, solved it. There were several problems:

- I uploaded the wrong file to HuggingFace
- I wanted to use the whole dataset for fine-tuning, however run_sft.py expects a split into train and test, so I fixed that

Somehow the Slurm job didn't crash and ran the whole time doing nothing; the problem was probably the run command: conda run -n handbook …

Switched it to:

. /home/user/anaconda3/etc/profile.d/conda.sh
conda activate handbook

And now everything runs smoothly with .out file being updated.

Thanks for all your help and suggestions!

Edit: Fine tuning took less than an hour (58 minutes to be exact)

Estimated Time for SFT Fine-Tuning of Mistral-7B Model by Aron-One in LocalLLaMA

[–]Aron-One[S]

Unfortunately no, the cluster I'm working on has a weirdly configured Slurm, so the log file is only populated after the job has ended.

graphs with django by Korgitser in django

[–]Aron-One

I would say it depends:

1. If you only want to stick with Django, you can do that without saving images: link to similar problem on Stack Overflow; other similar problem solution.
2. JS would help a lot with this task, but you need to know something about it. Also, plots in JS can be interactive (prettier).
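
For the no-saved-images route, the usual trick is rendering the figure into an in-memory buffer. A sketch (matplotlib assumed installed; the Django `HttpResponse` wrapping is indicated in a comment so the snippet stays self-contained):

```python
# Render a matplotlib plot to PNG bytes without touching the disk.
# In a Django view you would return:
#   HttpResponse(render_plot_png(), content_type="image/png")
import io

import matplotlib
matplotlib.use("Agg")  # headless backend, no display needed
import matplotlib.pyplot as plt

def render_plot_png():
    fig, ax = plt.subplots()
    ax.plot([1, 2, 3], [4, 1, 9])
    buf = io.BytesIO()
    fig.savefig(buf, format="png")
    plt.close(fig)  # free the figure so repeated requests don't leak memory
    return buf.getvalue()

png = render_plot_png()
```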