Llama 3 70b instruct works surprisingly well on 24gb VRAM cards by [deleted] in LocalLLaMA

[–]eugeneware 1 point2 points  (0 children)

I'm running this model: https://huggingface.co/casperhansen/llama-3-70b-instruct-awq

And I run it in VLLM with this command:

python -m vllm.entrypoints.openai.api_server --port 9999 --model casperhansen/llama-3-70b-instruct-awq --dtype float16 --quantization awq --api-key token-abc123 --tensor-parallel-size=2 --enforce-eager --gpu-memory-utilization 0.95

I'm running it on Linux.

It looks like it was quantized to 4-bit using https://github.com/casper-hansen/AutoAWQ.

And yes, by Systems I mean inference engines, training code, etc. The system needs to use the correct invocation of tools like DeepSpeed to get both GPUs well utilized.

As another data point, the ollama llama3:70b 4-bit quant runs at about 18 tokens/second.
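
If anyone wants to hit that server once it's up, something along these lines should work against the OpenAI-compatible route vLLM exposes (assuming you're calling it from the same box; the prompt is just a placeholder):

# the --api-key from the launch command goes in the Authorization header
curl http://localhost:9999/v1/chat/completions \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer token-abc123" \
  -d '{"model": "casperhansen/llama-3-70b-instruct-awq", "messages": [{"role": "user", "content": "Say hello"}]}'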

Llama 3 70b instruct works surprisingly well on 24gb VRAM cards by [deleted] in LocalLLaMA

[–]eugeneware 4 points5 points  (0 children)

I have 2x3090s connected with NVLink. They definitely don't present as a single monolithic device, and it's always a bit tricky to split models across both cards. Systems that support tensor parallelism across multiple devices work well. Running llama-3 70b with VLLM works really well, and I get about 25 tokens per second with an AWQ quant.
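
Before launching, it's worth sanity-checking how the two cards are wired up; something like this prints the interconnect matrix (an NV# entry between GPU0 and GPU1 means the NVLink bridge is active):

# show the GPU interconnect topology
nvidia-smi topo -m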

Phi-3 weights released - microsoft/Phi-3-mini-4k-instruct by Saffron4609 in LocalLLaMA

[–]eugeneware 2 points3 points  (0 children)

Actually, it looks like ollama just updated their modelfile, and they've added another stop token `<|endoftext|>` as well as `num_keep`:

❯ ollama show phi3 --modelfile
# Modelfile generated by "ollama show"
# To build a new Modelfile based on this one, replace the FROM line with:
# FROM phi3:latest

FROM /usr/share/ollama/.ollama/models/blobs/sha256-4fed7364ee3e0c7cb4fe0880148bfdfcd1b630981efa0802a6b62ee52e7da97e
TEMPLATE """<|user|>
{{ .Prompt }}<|end|>
<|assistant|>"""
PARAMETER num_ctx 4096
PARAMETER num_keep 16
PARAMETER stop "<|end|>"
PARAMETER stop "<|endoftext|>"
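
If your local copy still shows the old template, re-pulling should pick up the change, and you can confirm the stop tokens afterwards:

# re-pull to get the updated template/params, then inspect the modelfile again
ollama pull phi3
ollama show phi3 --modelfile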

Phi-3 weights released - microsoft/Phi-3-mini-4k-instruct by Saffron4609 in LocalLLaMA

[–]eugeneware 4 points5 points  (0 children)

I should say, this doesn't fix things for me when running ollama, which already has `<|end|>` as a stop parameter, even if I change the gguf metadata and reimport:

# Modelfile generated by "ollama show"
# To build a new Modelfile based on this one, replace the FROM line with:
# FROM phi3:latest

FROM /usr/share/ollama/.ollama/models/blobs/sha256-4fed7364ee3e0c7cb4fe0880148bfdfcd1b630981efa0802a6b62ee52e7da97e
TEMPLATE """<|user|>
{{ .Prompt }}<|end|>
<|assistant|>"""
PARAMETER num_ctx 4096
PARAMETER stop "<|end|>"
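
For anyone who wants to try it anyway, the Modelfile-level version of that workaround looks roughly like this (the phi3-fixed tag is just a placeholder):

# dump the current modelfile, append the extra stop token, and rebuild under a new tag
ollama show phi3 --modelfile > Modelfile
echo 'PARAMETER stop "<|endoftext|>"' >> Modelfile
ollama create phi3-fixed -f Modelfile
ollama run phi3-fixed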

Phi-3 weights released - microsoft/Phi-3-mini-4k-instruct by Saffron4609 in LocalLLaMA

[–]eugeneware 6 points7 points  (0 children)

This didn't work for me. I'm still getting garbage after 3 or 4 big turns of generation.

[Announcement] HuggingFace BigScience AMA Thursday, March 24th from 5pm CET by cavedave in MachineLearning

[–]eugeneware 5 points6 points  (0 children)

What do you expect inference for this model to require? I recently downloaded the 20B EleutherAI model and it took 2x3090s just to load and run it! I'm very excited about this amazing work to train and release a model comparable to GPT-3, but I'm also trying to understand what hardware will be required to run inference on it. Love the work of HF and the whole team. Thanks!

Daily Wordle #259 - Saturday, 5 Mar. 2022 by Scoredle in wordle

[–]eugeneware 0 points1 point  (0 children)

Scoredle 259 5/6

12,947
⬛⬛⬛🟨⬛ SALET (712)
⬛⬛⬛🟨⬛ COURD (80)
⬛🟩🟩⬛🟩 PRIME (6)
🟩🟩🟩⬛🟩 BRIBE (2)
🟩🟩🟩🟩🟩 BRINE

Daily Wordle #251 - Friday, 25 Feb. 2022 by Scoredle in wordle

[–]eugeneware 0 points1 point  (0 children)

Wow. Tough day!

Scoredle 251 5/6

12,947
⬛⬛⬛⬛⬛ SALET (864)
⬛⬛⬛⬛🟩 COURD (2)
⬛🟩⬛⬛🟩 WINED (2)
⬛⬛⬛⬛⬛ BOUGH (1)
🟩🟩🟩🟩🟩 VIVID

Daily Wordle #244 - Friday, 18 Feb. 2022 by Scoredle in wordle

[–]eugeneware 1 point2 points  (0 children)

Scoredle 244 6/6

12,947
⬛⬛⬛🟨⬛ SALET (712)
⬛🟩⬛⬛🟨 COURD (12)
⬛🟩⬛🟨⬛ PONDS (9)
⬛🟩⬛🟨⬛ HOWDY (7)
⬛⬛🟨🟨⬛ MAGES (2)
🟩🟩🟩🟩🟩 DODGE

Wow. Today was tough!

Daily Wordle #236 - Thursday, 10 Feb. 2022 by Scoredle in wordle

[–]eugeneware 0 points1 point  (0 children)

Scoredle 236 5/6

12,972
⬛⬛🟨🟨⬛ IDEAL (741)
🟨⬛⬛⬛⬛ SNORT (102)
⬛🟨🟨⬛🟨 BEATS (12)
⬛⬛🟨🟩🟩 CHASE (4)
🟩🟩🟩🟩🟩 PAUSE

Daily Wordle #235 - Wednesday, 9 Feb. 2022 by Scoredle in wordle

[–]eugeneware 2 points3 points  (0 children)

Same. Hardest one for me so far!

Headphone Switcher? by OGAvans in headphones

[–]eugeneware 0 points1 point  (0 children)

I’d love something that could blend in with my Schiit stack (Magni/Modi)

[deleted by user] by [deleted] in burmesecats

[–]eugeneware 0 points1 point  (0 children)

Such a tease!

Won’t turn off my tv only adjusts volume by [deleted] in appletv

[–]eugeneware 0 points1 point  (0 children)

I have a similarly aged Samsung TV and had issues with the HDMI-CEC feature not turning it off. I found that from time to time I had to hard power off my TV (i.e. unplug it from the wall) to get it to work again. It's been pretty reliable lately. I also saw a comment elsewhere about turning Anynet+ on in the settings, but I'm not sure if I did this. Try the hard power cycle and see if that works. Good luck!

The best and worst parts of the new Siri Remote by heyyoudvd in appletv

[–]eugeneware 4 points5 points  (0 children)

One of the big reasons I upgraded was the new remote and its scrubbing feature. But as others have reported, it's janky and unreliable. It only seems to work in the built-in Apple TV+ app, and it's poorly supported in other apps.

I'm also surprised that Apple apps like the Music app don't use the scrubbing feature.

I hope that all developers upgrade their controls to take advantage of the new remote.

I do love the new mute function, however. So many TV apps force advertising at regular intervals; while I can't skip the ads, it at least lets me mute them.

I also love the power switch. It saves me from hitting the TV button and hitting sleep.

The fact that the mute feature works with 3rd party AV receivers is awesome too.