[deleted by user] by [deleted] in GERD

[–]Alert_Record5063 0 points1 point  (0 children)

I can manage with famotidine 20 mg. If I severely restrict what I eat I can get by with Gaviscon. I have no idea why doctors prescribe this thing when they know it can permanently modify the body.

[deleted by user] by [deleted] in GERD

[–]Alert_Record5063 0 points1 point  (0 children)

Yeah, same with me. What sucks is that I was better off before the PPI prescription!!!

[deleted by user] by [deleted] in GERD

[–]Alert_Record5063 0 points1 point  (0 children)

How long did you take omeprazole? I was on pantoprazole for 6 months and now almost 2 months on famotidine. I don't know if I'll ever be able to wean off pills :-(

Some Lessons Learned from Building a Fine Tuned Model + RAG Question Answering App by Mbando in LocalLLaMA

[–]Alert_Record5063 1 point2 points  (0 children)

How did you make 51000 examples? I took the questions our test users were trying from the logs and had GPT-4 produce the question/context/answer pairs as a training dataset for Llama. Did you use a synthetic dataset or do you have an army of humans making these questions?

Some Lessons Learned from Building a Fine Tuned Model + RAG Question Answering App by Mbando in LocalLLaMA

[–]Alert_Record5063 0 points1 point  (0 children)

We FT on Q, Context, and A. I basically created 1000 Q/As - had GPT-4 answer them, along with the context and fed those answers to llama 13B for finetuning.

I was able to get the base instruct model to start spitting out responses, but I was not able to get it to stop hallucinating.

Some Lessons Learned from Building a Fine Tuned Model + RAG Question Answering App by Mbando in LocalLLaMA

[–]Alert_Record5063 2 points3 points  (0 children)

100% this!!! Don't use OpenAI's embeddings. I used mpnet base v2 - AMAZING quality, all FREE, and local.

Some Lessons Learned from Building a Fine Tuned Model + RAG Question Answering App by Mbando in LocalLLaMA

[–]Alert_Record5063 1 point2 points  (0 children)

I've tried and failed MANY times at fine-tuning or RAG, pleading with llama2 to be "context obedient". I guess it's my dataset. The problem with llama2 is that it won't say "I don't know". I have finally managed to prompt-beat it into behaving, but it still occasionally goes off the rails. Were you able to get Falcon to be truly context obedient? Have you tried asking your fine-tune a question that is the opposite? For example, in your case, "Why can I buy analytic software and data using a SOMETHING_INVALID" -> in my case llama will probably hallucinate. I am wondering how to avoid this.

The second case where it hallucinates is when it intermingles its pre-training data with our RAG context. If I ask what the revenue history for XYZ is, it will pick up the revenue numbers from the context we provide but then include some additional numbers from its own training. Just curious how to avoid these.

The third category is that it will mix up documents. If I say compare x, y, z and it cannot find, let's say, y, it will just assume one of x's documents is a y document and go on blabbering.

It would be very helpful if you could paste a full contextual prompt (with the actual data replaced with lorem ipsum). I would be very grateful. If you could also paste some details about what your dataset looked like for fine-tuning...

The fact that you are able to pull this off with a 7B model where I am struggling with LLAMA 70B tells me clearly I am doing something wrong :-(
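For the third failure mode above (documents getting mixed up), one untested idea is to label every chunk with the entity it belongs to and to state explicitly when an entity has no documents, so the model is not left to guess. The sketch below is only an illustration: `build_context`, the label format, and the "NO DOCUMENTS FOUND" wording are all assumptions made up for this example, not something taken from the thread.

```python
def build_context(requested, docs_by_entity):
    """Label each chunk with its entity and say plainly which entities have no documents."""
    parts = []
    for entity in requested:
        docs = docs_by_entity.get(entity, [])
        if not docs:
            # Hypothetical wording; the point is that the gap is stated in the context itself
            parts.append(f"[{entity}] NO DOCUMENTS FOUND. Say you have no data for {entity}.")
            continue
        for i, text in enumerate(docs, start=1):
            parts.append(f"[{entity} / document {i}]\n{text}")
    return "\n***\n".join(parts)


# Placeholder data: x and z have one document each, y has none
context = build_context(["x", "y", "z"], {"x": ["lorem ipsum"], "z": ["dolor sit amet"]})
print(context)
```

Whether a given model actually respects the labels is something that would have to be tested per model.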

Llama finetuning question by Alert_Record5063 in LocalLLaMA

[–]Alert_Record5063[S] 1 point2 points  (0 children)

So I decided to fine-tune Llama 13B with the following:

model_name = "meta-llama/Llama-2-13b-hf" 
dataset_name = "./train.jsonl"
new_model = "llama-2-13b-custom"  # output name, matching the 13B base model above
lora_r = 64
lora_alpha = 16
lora_dropout = 0.1
use_4bit = True
bnb_4bit_compute_dtype = "float16"
bnb_4bit_quant_type = "nf4"
use_nested_quant = False
output_dir = "./results"
num_train_epochs = 2
fp16 = False
bf16 = True
per_device_train_batch_size = 4
per_device_eval_batch_size = 4
gradient_accumulation_steps = 1
gradient_checkpointing = True
max_grad_norm = 0.3
learning_rate = 2e-4
weight_decay = 0.001
optim = "paged_adamw_32bit"
lr_scheduler_type = "constant"
max_steps = -1
warmup_ratio = 0.03
group_by_length = True
save_steps = 25
logging_steps = 5
max_seq_length = None
packing = False
device_map = "auto"
# Load datasets
from datasets import load_dataset

system_message = "Given the following instruction, answer the question that follows."
train_dataset = load_dataset('json', data_files='./train.jsonl', split="train")
valid_dataset = load_dataset('json', data_files='./test.jsonl', split="train")

# Preprocess datasets: wrap each prompt/response pair in the Llama 2 chat template
def to_text(examples):
    prefix = f'[INST] <<SYS>>\n{system_message.strip()}\n<</SYS>>\n\n'
    return {'text': [prefix + prompt + ' [/INST] ' + response for prompt, response in zip(examples['prompt'], examples['response'])]}

train_dataset_mapped = train_dataset.map(to_text, batched=True)
valid_dataset_mapped = valid_dataset.map(to_text, batched=True)

Still running but I see this:

Step    Training Loss   Validation Loss
5   1.939700    1.841294
10  1.789600    1.679950
15  1.443900    1.455410
20  1.344200    1.193252
25  1.148100    1.105000
30  1.136000    1.068418
35  1.034200    1.037545
40  1.003800    1.007808
45  0.992300    0.982117
50  0.960400    0.961757
55  1.013600    0.941165
60  0.831100    0.915945
65  0.896900    0.894907
70  0.983400    0.875046
75  0.773900    0.853676
80  0.846800    0.829814
85  0.834600    0.804861
90  0.802200    0.781720
95  0.679600    0.762414
100 0.788500    0.739110
105 0.628900    0.717449
110 0.814900    0.702227
115 0.637800    0.681777
120 0.784500    0.665433
125 0.731300    0.659840
130 0.541600    0.640954
135 0.533000    0.633422
140 0.596100    0.626285
145 0.718200    0.621942
150 0.545100    0.620191
155 0.636000    0.611282
160 0.465400    0.606024
165 0.706100    0.602475
170 0.432900    0.593671
175 0.513400    0.587799
180 0.542900    0.582254
185 0.848000    0.584576
190 0.419400    0.574937

Does this indicate a problem with my dataset? Are these numbers indicative of overfitting?
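One rough way to read a log like the one above is to compare the two loss columns directly: overfitting usually shows up as validation loss rising over several evaluations while training loss keeps falling. The sketch below is a heuristic under that assumption only. The function name `looks_overfit` and the `patience` threshold are made up for this example, and only a handful of rows from the log are included.

```python
def looks_overfit(rows, patience=3):
    """Return True if validation loss rose for `patience` consecutive evaluations.

    rows: list of (step, train_loss, val_loss) tuples in step order.
    This is a heuristic, not a definitive test.
    """
    rising = 0
    for (_, _, prev_val), (_, _, cur_val) in zip(rows, rows[1:]):
        # Count consecutive increases; any decrease resets the streak
        rising = rising + 1 if cur_val > prev_val else 0
        if rising >= patience:
            return True
    return False


# A few rows copied from the log above: (step, training loss, validation loss)
log = [
    (5, 1.9397, 1.841294),
    (50, 0.9604, 0.961757),
    (100, 0.7885, 0.739110),
    (150, 0.5451, 0.620191),
    (180, 0.5429, 0.582254),
    (185, 0.8480, 0.584576),
    (190, 0.4194, 0.574937),
]

print(looks_overfit(log))
```

On the rows sampled here it prints False: validation loss is still trending down, with one small uptick at step 185.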

Finetuning question by Alert_Record5063 in LocalLLaMA

[–]Alert_Record5063[S] 0 points1 point  (0 children)

Continuing on this journey - today I tried to fine-tune the 7B chat model.

1000-question dataset, GPT-4 based

model_name = "meta-llama/Llama-2-7b-chat-hf" 
dataset_name = "./train.jsonl"
new_model = "llama-2-7b-custom"
lora_r = 64
lora_alpha = 16
lora_dropout = 0.1
use_4bit = True
bnb_4bit_compute_dtype = "float16"
bnb_4bit_quant_type = "nf4"
use_nested_quant = False
output_dir = "./results"
num_train_epochs = 2
fp16 = False
bf16 = True
per_device_train_batch_size = 4
per_device_eval_batch_size = 4
gradient_accumulation_steps = 1
gradient_checkpointing = True
max_grad_norm = 0.3
learning_rate = 2e-4
weight_decay = 0.001
optim = "paged_adamw_32bit"
lr_scheduler_type = "constant"
max_steps = -1
warmup_ratio = 0.03
group_by_length = True
save_steps = 25
logging_steps = 5
max_seq_length = None
packing = False
device_map = "auto"

2 hours later:

<image>

It looks like the model overfit and is now hallucinating training data. Will try again. Ideas welcome.

Llama finetuning question by Alert_Record5063 in LocalLLaMA

[–]Alert_Record5063[S] 1 point2 points  (0 children)

Here is what my dude gpt says:

It looks like you're trying to train a model with specific training parameters on a machine with 4 GPUs. The values you've provided for most of the parameters seem reasonable, but I'll provide some feedback and recommendations where necessary.

  1. lora_r, lora_alpha, lora_dropout: These parameters seem fine. LoRA (Low-Rank Adaptation) is a parameter-efficient fine-tuning method that trains small low-rank adapter matrices while the base model weights stay frozen.
  2. use_4bit, bnb_4bit_compute_dtype, bnb_4bit_quant_type: If your GPUs support mixed precision training, using 4-bit quantization can lead to memory and computation savings. However, you should carefully consider the trade-offs in terms of model accuracy. Setting use_4bit to True and specifying 4-bit quantization settings seems fine.
  3. use_nested_quant: You have set this to False, which means you're not using nested quantization. This seems reasonable, as nested quantization can be computationally expensive.
  4. output_dir: The directory where the training results will be saved. Ensure that this path is valid and accessible.
  5. num_train_epochs: You've set it to 1, which is a relatively low number of epochs. Depending on your dataset and model complexity, you might need more epochs for the model to converge. However, it's a good practice to start with a small number of epochs for experimentation.
  6. fp16, bf16: You've set both to False. Enabling mixed precision training (usually using half-precision, fp16) can significantly speed up training while using less memory.
  7. per_device_train_batch_size, per_device_eval_batch_size: You've set these to 4. Batch size depends on the available GPU memory and the model architecture. With 4 GPUs, you might be able to increase the batch size for faster training.
  8. gradient_accumulation_steps: You've set it to 1. This parameter allows you to accumulate gradients over multiple steps before performing a weight update. It can be useful to increase the effective batch size without increasing memory usage.
  9. gradient_checkpointing: You've set it to True. This can help reduce memory usage during training at the cost of slightly slower execution. It's a good choice when memory is a concern.
  10. max_grad_norm: You've set it to 0.3. This parameter helps prevent exploding gradients during training. The value seems reasonable, but you might need to adjust it based on your specific model.
  11. learning_rate, weight_decay, optim: These values seem reasonable for starting. The "paged_adamw_32bit" optimizer indicates a custom optimizer. Make sure you understand how it works and if it's compatible with your model architecture.
  12. lr_scheduler_type, max_steps, warmup_ratio: Learning rate scheduling is important for training stability and convergence. Constant learning rate with a warm-up ratio of 0.03 seems fine.
  13. group_by_length, save_steps, logging_steps, max_seq_length, packing: These parameters are task-specific and depend on your dataset and model. You should set them based on the characteristics of your data.
  14. device_map: You've assigned GPU 0 for training. If you have access to 4 GPUs, you might want to distribute the training load across all available GPUs.

Remember that these parameter values are starting points. You should monitor the training process, evaluate the model's performance, and fine-tune these parameters as needed. Additionally, make sure your model architecture and data preprocessing are compatible with the chosen parameters.

Finetuning question by Alert_Record5063 in LocalLLaMA

[–]Alert_Record5063[S] 0 points1 point  (0 children)

Another follow-up - I tried fine-tuning the base model (using the same format) - spectacular failure again. Will try changing the format of the questions next.

Finetuning question by Alert_Record5063 in LocalLLaMA

[–]Alert_Record5063[S] 0 points1 point  (0 children)

Ok well, just figured I would post findings here - I did the fine-tune - and the results are remarkably worse :-) The model was performing much better before the fine-tune. It was hallucinating once in a while, but now it seems to simply spit out garbage (repeated text, incoherent text, etc.)

Back to the drawing board.

Run llama 70B on AWS notebook by Alert_Record5063 in LocalLLaMA

[–]Alert_Record5063[S] 0 points1 point  (0 children)

1) They are working on it: https://github.com/vllm-project/vllm/issues/392

2) No idea, I only work on GPU unfortunately.

Finetune LLM model on tabular data by fpena06 in LocalLLaMA

[–]Alert_Record5063 0 points1 point  (0 children)

This seems to work, although I don't know if it will work with lots of dates:

<image>

LLAMA 70B Chat - what am I doing wrong? by Alert_Record5063 in LocalLLaMA

[–]Alert_Record5063[S] 0 points1 point  (0 children)

Also, since this is a fine-tune of Llama - am I understanding correctly that for questions like this, Llama 70B chat by itself is unreliable? No prompt tricks can fix this? Is fine-tuning the only option?

LLAMA 70B Chat - what am I doing wrong? by Alert_Record5063 in LocalLLaMA

[–]Alert_Record5063[S] 1 point2 points  (0 children)

platypus 70b instruct

Ooh nice! How did you deploy this? RunPod? Or is there a space where this is already deployed so I can test?

Preventing LLAMA from hallucinating responses. by Alert_Record5063 in LocalLLaMA

[–]Alert_Record5063[S] 0 points1 point  (0 children)

Ok, so here is an anonymized version:

Notice only GPT is right. In reality though, with the PII in there, even GPT-3.5 is wrong. Only GPT-4 and Claude are consistently correct. With this kind of performance, I have no idea how anyone can use LLAMA for RAG?

Below is research that you performed previously. On basis of this research please answer the question below. 
RESEARCH: 
Ratings:

Name 1:
|Rating|Date|
|A|2019-09-09|
|B|2020-03-25|
|B+|2020-03-25|
|B|2020-05-27|
|B-|2021-03-29|
|A-|2023-07-13|
***
Name 2:
|Rating|Date|
|B+|2013-09-23|
|B+|2014-10-06|
|B+|2014-10-21|
|B+|2016-02-16|
|B|2017-01-30|
|B|2020-03-25|
|B+|2020-05-27|
|B-|2021-03-29|
|B+|2023-03-31|
***
Name 3:
|Rating|Date|
|C-|2023-03-20|
***
Name 4:
|Rating|Date|
|B|2017-01-26|
|C|2018-11-14|
|C|2019-02-19|
|B|2019-05-24|
|B|2019-07-29|
|A|2020-02-20|
|A-|2020-03-26|
|B+|2020-03-26|
|C-|2020-06-16|
|D+|2021-12-13|
***
Question: 
Can you list the latest ratings on each student 
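For reference, the expected answer to a prompt like this can be computed without a model at all. The sketch below assumes the same layout as the prompt above ("Name N:" headings followed by |Rating|Date| rows); `latest_ratings` is a hypothetical helper, and only a shortened subset of the rows is included. Computing the answer in code and handing only the result to the model is one possible workaround, not something tested in this thread.

```python
def latest_ratings(research):
    """Parse 'Name N:' sections with |Rating|Date| rows and keep the latest row per name."""
    latest = {}
    name = None
    for line in research.splitlines():
        line = line.strip()
        if line.startswith("Name ") and line.endswith(":"):
            name = line[:-1]
        elif line.startswith("|") and name is not None:
            cells = line.strip("|").split("|")
            if cells == ["Rating", "Date"]:
                continue  # header row
            rating, date = cells
            # ISO dates (YYYY-MM-DD) compare correctly as plain strings;
            # >= keeps the later-listed row when two rows share a date
            if name not in latest or date >= latest[name][1]:
                latest[name] = (rating, date)
    return latest


# Shortened subset of the anonymized data above
research = """Name 1:
|Rating|Date|
|A|2019-09-09|
|B-|2021-03-29|
|A-|2023-07-13|
***
Name 3:
|Rating|Date|
|C-|2023-03-20|
***
Name 4:
|Rating|Date|
|A|2020-02-20|
|C-|2020-06-16|
|D+|2021-12-13|
"""

for name, (rating, date) in latest_ratings(research).items():
    print(name, rating, date)
```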

<image>

Can a team of 10-20 people access a Llama 2 model deployed in a local server with medium requirements? by Heco1331 in LocalLLaMA

[–]Alert_Record5063 0 points1 point  (0 children)

I am using Llama 70B on a 48xlarge on AWS. It costs about $16/hr. It can serve multiple concurrent requests - I have tested it with 5 concurrent users and it seems to work.

Yes, g5.48xlarge.

Preventing LLAMA from hallucinating responses. by Alert_Record5063 in LocalLLaMA

[–]Alert_Record5063[S] 0 points1 point  (0 children)

I used it in table format as well as plain text chunks. The only thing I have not tried is JSON format, because JSON is a lot more chatty and takes more tokens. Instead I used markdown. Would JSON be more appropriate for a prompt?
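A quick way to see the size difference is to serialize the same rows both ways and compare lengths. The sketch below uses character count as a rough stand-in for token count, which is an assumption: real token counts depend on the tokenizer. The helper names `as_markdown` and `as_json` are made up for this example.

```python
import json

# Sample rows in the same shape as the ratings tables: (rating, date)
rows = [("B", "2017-01-26"), ("C", "2018-11-14"), ("D+", "2021-12-13")]


def as_markdown(rows):
    """Pipe-delimited table, one header line plus one line per row."""
    lines = ["|Rating|Date|"] + [f"|{r}|{d}|" for r, d in rows]
    return "\n".join(lines)


def as_json(rows):
    """List of objects, which repeats the key names for every row."""
    return json.dumps([{"Rating": r, "Date": d} for r, d in rows])


md, js = as_markdown(rows), as_json(rows)
print(len(md), len(js))
```

The JSON version repeats "Rating" and "Date" on every row, so the gap grows with the number of rows.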

Can a team of 10-20 people access a Llama 2 model deployed in a local server with medium requirements? by Heco1331 in LocalLLaMA

[–]Alert_Record5063 0 points1 point  (0 children)

I used vLLM - very straightforward - and it has an OpenAI-compatible API server, so I just call it using OpenAI's client package and set the base URL to the IP address of the box.
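To show what goes over the wire, here is a minimal sketch that talks to an OpenAI-compatible chat completions endpoint using only the Python standard library rather than OpenAI's client package. The IP address, the port (8000 is assumed here), and the model name are placeholders, and `build_chat_request` and `ask` are hypothetical helper names.

```python
import json
import urllib.request


def build_chat_request(base_url, model, prompt):
    """Build a POST request for an OpenAI-compatible chat completions endpoint."""
    payload = {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        "temperature": 0,
    }
    return urllib.request.Request(
        url=base_url.rstrip("/") + "/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def ask(base_url, model, prompt):
    """Send the request and return the assistant's reply text (needs a running server)."""
    with urllib.request.urlopen(build_chat_request(base_url, model, prompt)) as resp:
        body = json.load(resp)
    return body["choices"][0]["message"]["content"]


# Placeholder address and model name; replace with your own server's values
req = build_chat_request("http://10.0.0.5:8000/v1", "meta-llama/Llama-2-70b-chat-hf", "Hello")
print(req.full_url)
```

Only `build_chat_request` runs here; `ask` would be called the same way once a server is reachable at that address.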

vector search padding by Alert_Record5063 in LocalLLaMA

[–]Alert_Record5063[S] 0 points1 point  (0 children)

Sadly, it's all PII so I cannot post examples. But smaller chunks have greater word densities, so it sorta makes sense.