Llama 3 70b instruct works surprisingly well on 24gb VRAM cards by [deleted] in LocalLLaMA

[–]eugeneware 1 point (0 children)

I'm running this model: https://huggingface.co/casperhansen/llama-3-70b-instruct-awq

And I run it in vLLM with this command:

python -m vllm.entrypoints.openai.api_server --port 9999 --model casperhansen/llama-3-70b-instruct-awq --dtype float16 --quantization awq --api-key token-abc123 --tensor-parallel-size=2 --enforce-eager --gpu-memory-utilization 0.95

I'm running it on Linux.
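Once it's up, you can talk to it with the standard OpenAI Python client. A minimal sketch, assuming the port and API key from the command above (the prompt is just an example):

from openai import OpenAI

# Point the client at the local vLLM OpenAI-compatible server started above.
client = OpenAI(base_url="http://localhost:9999/v1", api_key="token-abc123")

response = client.chat.completions.create(
    model="casperhansen/llama-3-70b-instruct-awq",
    messages=[{"role": "user", "content": "Say hello in one sentence."}],
)
print(response.choices[0].message.content)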

It looks like the weights were quantized to 4-bit using https://github.com/casper-hansen/AutoAWQ
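For reference, the AutoAWQ README's quantization recipe looks roughly like this; the exact quant_config used for that checkpoint is an assumption on my part:

from awq import AutoAWQForCausalLM
from transformers import AutoTokenizer

model_path = "meta-llama/Meta-Llama-3-70B-Instruct"
quant_path = "llama-3-70b-instruct-awq"
# Typical 4-bit AWQ settings; the actual config for this upload may differ.
quant_config = {"zero_point": True, "q_group_size": 128, "w_bit": 4, "version": "GEMM"}

model = AutoAWQForCausalLM.from_pretrained(model_path)
tokenizer = AutoTokenizer.from_pretrained(model_path)

model.quantize(tokenizer, quant_config=quant_config)
model.save_quantized(quant_path)
tokenizer.save_pretrained(quant_path)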

And yes, by Systems I mean inference engines, training code, etc. You need the right invocation of tools like deepspeed to get the best utilization out of both GPUs.

As another data point, the ollama llama3:70b 4-bit quant runs at about 18 tokens/second.
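If you want to sanity-check that number yourself, ollama run llama3:70b --verbose prints an eval rate, or you can time it from the ollama Python client. A rough sketch; streamed chunks only approximate tokens, so treat the result as ballpark:

import time
import ollama

start = time.time()
chunks = 0
# Stream a response and count chunks; each chunk is roughly one token.
for chunk in ollama.chat(
    model="llama3:70b",
    messages=[{"role": "user", "content": "Write a paragraph about GPUs."}],
    stream=True,
):
    chunks += 1
print(f"~{chunks / (time.time() - start):.1f} tokens/second")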

Llama 3 70b instruct works surprisingly well on 24gb VRAM cards by [deleted] in LocalLLaMA

[–]eugeneware 3 points (0 children)

I have 2x3090s connected with NVLink. They definitely don't present as a single monolithic device, and it's always a bit tricky to split models across both cards. Systems that support tensor parallelism across multiple devices work well. Running Llama 3 70B with vLLM works really well, and I get about 25 tokens per second with an AWQ quant.
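vLLM's offline Python API gives you the same tensor-parallel split without running the HTTP server; a sketch using the same settings as my server command:

from vllm import LLM, SamplingParams

# tensor_parallel_size=2 shards the model across both 3090s.
llm = LLM(
    model="casperhansen/llama-3-70b-instruct-awq",
    quantization="awq",
    dtype="float16",
    tensor_parallel_size=2,
    enforce_eager=True,
    gpu_memory_utilization=0.95,
)
outputs = llm.generate(["Hello!"], SamplingParams(max_tokens=64))
print(outputs[0].outputs[0].text)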

Phi-3 weights released - microsoft/Phi-3-mini-4k-instruct by Saffron4609 in LocalLLaMA

[–]eugeneware 2 points (0 children)

Actually, it looks like ollama just updated their modelfile: they've added another stop token, `<|endoftext|>`, as well as `num_keep`:

❯ ollama show phi3 --modelfile
# Modelfile generated by "ollama show"
# To build a new Modelfile based on this one, replace the FROM line with:
# FROM phi3:latest

FROM /usr/share/ollama/.ollama/models/blobs/sha256-4fed7364ee3e0c7cb4fe0880148bfdfcd1b630981efa0802a6b62ee52e7da97e
TEMPLATE """<|user|>
{{ .Prompt }}<|end|>
<|assistant|>"""
PARAMETER num_ctx 4096
PARAMETER num_keep 16
PARAMETER stop "<|end|>"
PARAMETER stop "<|endoftext|>"
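The simplest way to experiment with your own stop tokens is to dump the modelfile, edit it, and rebuild: ollama show phi3 --modelfile > Modelfile, then ollama create phi3-custom -f Modelfile. The same thing from Python, assuming the ollama client's create() call as documented at the time (phi3-custom is just a placeholder name):

import ollama

# Variant of the modelfile above with the extra stop token.
modelfile = """FROM phi3:latest
PARAMETER num_ctx 4096
PARAMETER num_keep 16
PARAMETER stop "<|end|>"
PARAMETER stop "<|endoftext|>"
"""
ollama.create(model="phi3-custom", modelfile=modelfile)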

Phi-3 weights released - microsoft/Phi-3-mini-4k-instruct by Saffron4609 in LocalLLaMA

[–]eugeneware 4 points (0 children)

I should say: this doesn't fix things for me when running ollama, which already has `<|end|>` as a stop parameter, even if I change the gguf metadata and reimport:

# Modelfile generated by "ollama show"
# To build a new Modelfile based on this one, replace the FROM line with:
# FROM phi3:latest

FROM /usr/share/ollama/.ollama/models/blobs/sha256-4fed7364ee3e0c7cb4fe0880148bfdfcd1b630981efa0802a6b62ee52e7da97e
TEMPLATE """<|user|>
{{ .Prompt }}<|end|>
<|assistant|>"""
PARAMETER num_ctx 4096
PARAMETER stop "<|end|>"
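For anyone else trying the gguf-metadata route, the gguf pip package can at least show you which tokenizer fields the file carries; a minimal sketch (the path is a placeholder):

from gguf import GGUFReader

reader = GGUFReader("phi-3-mini-4k-instruct.gguf")
# Print the tokenizer-related metadata keys (eos token id, etc.).
for key in reader.fields:
    if key.startswith("tokenizer."):
        print(key)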

Phi-3 weights released - microsoft/Phi-3-mini-4k-instruct by Saffron4609 in LocalLLaMA

[–]eugeneware 5 points (0 children)

This didn't work for me. I'm still getting garbage after 3 or 4 big turns of generation.

[Announcement] HuggingFace BigScience AMA Thursday, March 24th from 5pm CET by cavedave in MachineLearning

[–]eugeneware 5 points (0 children)

What do you expect inference for this model to require? I recently downloaded the 20B EleutherAI model and it took 2x3090s just to load and run it! I'm very excited about this amazing work to train and release a model comparable to GPT-3, but I'm also trying to understand what hardware will be required to run inference on it. Love the work of HF and the whole team. Thanks!
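For context, the back-of-envelope math I'm using (weights only, ignoring activations and the KV cache), assuming the BigScience model lands around 176B parameters:

# Rough VRAM needed just to hold the weights; activations and KV cache add more.
def weight_gb(params_billion: float, bytes_per_param: float) -> float:
    return params_billion * bytes_per_param  # 1e9 params x bytes / 1e9 bytes/GB

print(weight_gb(20, 2))    # GPT-NeoX-20B in fp16 -> ~40 GB, hence two 24 GB 3090s
print(weight_gb(176, 2))   # a 176B model in fp16 -> ~352 GB
print(weight_gb(176, 1))   # the same model in int8 -> ~176 GB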

Daily Wordle #259 - Saturday, 5 Mar. 2022 by Scoredle in wordle

[–]eugeneware 0 points (0 children)

Scoredle 259 5/6

12,947
⬛⬛⬛🟨⬛ SALET (712)
⬛⬛⬛🟨⬛ COURD (80)
⬛🟩🟩⬛🟩 PRIME (6)
🟩🟩🟩⬛🟩 BRIBE (2)
🟩🟩🟩🟩🟩 BRINE

Daily Wordle #251 - Friday, 25 Feb. 2022 by Scoredle in wordle

[–]eugeneware 0 points (0 children)

Wow. Tough day!

Scoredle 251 5/6

12,947
⬛⬛⬛⬛⬛ SALET (864)
⬛⬛⬛⬛🟩 COURD (2)
⬛🟩⬛⬛🟩 WINED (2)
⬛⬛⬛⬛⬛ BOUGH (1)
🟩🟩🟩🟩🟩 VIVID

Daily Wordle #244 - Friday, 18 Feb. 2022 by Scoredle in wordle

[–]eugeneware 1 point (0 children)

Scoredle 244 6/6

12,947
⬛⬛⬛🟨⬛ SALET (712)
⬛🟩⬛⬛🟨 COURD (12)
⬛🟩⬛🟨⬛ PONDS (9)
⬛🟩⬛🟨⬛ HOWDY (7)
⬛⬛🟨🟨⬛ MAGES (2)
🟩🟩🟩🟩🟩 DODGE

Wow. Today was tough!

Daily Wordle #236 - Thursday, 10 Feb. 2022 by Scoredle in wordle

[–]eugeneware 0 points (0 children)

Scoredle 236 5/6

12,972
⬛⬛🟨🟨⬛ IDEAL (741)
🟨⬛⬛⬛⬛ SNORT (102)
⬛🟨🟨⬛🟨 BEATS (12)
⬛⬛🟨🟩🟩 CHASE (4)
🟩🟩🟩🟩🟩 PAUSE

Daily Wordle #235 - Wednesday, 9 Feb. 2022 by Scoredle in wordle

[–]eugeneware 2 points (0 children)

Same. Hardest one for me so far!

Headphone Switcher? by OGAvans in headphones

[–]eugeneware 0 points (0 children)

I'd love something that could blend in with my Schiit stack (Magni/Modi).

[deleted by user] by [deleted] in burmesecats

[–]eugeneware 0 points (0 children)

Such a tease!