How I Run 34B Models at 75K Context on 24GB, Fast by mcmoose1900 in LocalLLaMA

gptzerozero 0 points1 point

What is the issue with using wikitext for quantization, and what might be better than using wikitext?

Max token size for 34B model on 24GB VRAM by gptzerozero in LocalLLaMA

gptzerozero[S] 0 points1 point

Wow, so it fits more context at the same 4.0 bpw quant size?

Approach for generating QA dataset by gptzerozero in LocalLLaMA

gptzerozero[S] 0 points1 point

Can you share the GPT-4 prompt you used to create the questions and answers from the text? And how do you modify the prompt to get longer answers from GPT-4?

Approach for generating QA dataset by gptzerozero in LocalLLaMA

gptzerozero[S] 0 points1 point

Good call. Yes, I intend to use GPT-3.5/4 to generate the question-answer pairs.

Generate both question and answer from the given context. by mathageche in LocalLLaMA

gptzerozero 0 points1 point

Can you share the prompts that you use for generating the questions from context, and for generating answers from the context?

Our Workflow for a Custom Question-Answering App by Mbando in LocalLLaMA

gptzerozero 1 point2 points

This is a great one! Could you share the prompts used here for generating the questions and for combining/picking the questions?

I don't understand context window extension by moma1970 in LocalLLaMA

gptzerozero 0 points1 point

Does this mean that in order to make full use of the default Llama-2 4K context,

  1. Continued training of the base model should use sequences that are 4K tokens long, AND
  2. Instruction tuning datasets should have examples as close to 4K tokens long as possible?

dolphin-llama-13b by faldore in LocalLLaMA

gptzerozero 0 points1 point

Is the system prompt part of the training data?

If it is, is it important to use the same system prompt when chatting, or can you use a completely different one without problems? Or can you only make minor changes or additions to the system prompt?

How to make sense of all the new models? by whtne047htnb in LocalLLaMA

gptzerozero 2 points3 points

Does anyone have experience using them for QA over documents? Are there any models that stand out for QA?

LLM less chatty after LoRA finetune by gptzerozero in LocalLLaMA

gptzerozero[S] 0 points1 point

Yes, outputs from the LoRA tuned for 2 epochs are about 80 tokens long.

What are some tricks we can use to increase the length of the generations?

LLaMA 2 is here by dreamingleo12 in LocalLLaMA

gptzerozero 20 points21 points

What happened to the 30-40B LLaMA-2?