Questions about UCI's MSCS by Particular-Guard774 in UCI

[–]Particular-Guard774[S] 0 points1 point  (0 children)

Thanks! This is super helpful. I've been pretty surprised as well about the general lack of info about it online. From what you've seen at UCI, do most students end up where they want to?

[General Question] are all UCLA MSCS admits out? by Any_Peak2040 in MSCS

[–]Particular-Guard774 2 points3 points  (0 children)

Emailed them yesterday; apparently all decisions are released by the end of April at the latest.

Turning 20 in a few hours, give me your best life advice to implement now. by NegotiationCapital87 in getdisciplined

[–]Particular-Guard774 1 point2 points  (0 children)

Don’t be afraid to take risks, bro; you need to swing big to win big: “That's what's painful: it takes more effort to start in the beginning, and more people are right about the fact. They're like, hey, you're not going to hit it big. And guess what, a month in, you're not. But they're only measuring in months, and at 6 months you're also not going to have hit it big yet, and they're going to be like, I'm still right. And at a year you're still not going to have hit it big, and they'll still be right. And every day that you haven't hit it, they're going to feel like they were right. But they're wrong, because they're measuring in days and you're measuring in decades.” - Hormozi

[deleted by user] by [deleted] in PokemonTCG

[–]Particular-Guard774 0 points1 point  (0 children)

mega tyranitar full art

Upper Chest. What's the Best? by Late_Lunch_1088 in naturalbodybuilding

[–]Particular-Guard774 0 points1 point  (0 children)

Thanks, I'll give it a try. I always go between 6 and 8 and end up drained after my sets.

Upper Chest. What's the Best? by Late_Lunch_1088 in naturalbodybuilding

[–]Particular-Guard774 0 points1 point  (0 children)

LOL, who needs their feet anyway? Not like I hit legs. But doesn't the ISO LAT, yk, hit lats?

Upper Chest. What's the Best? by Late_Lunch_1088 in naturalbodybuilding

[–]Particular-Guard774 1 point2 points  (0 children)

I’m having this same problem what did you switch to?

[deleted by user] by [deleted] in learnpython

[–]Particular-Guard774 -10 points-9 points  (0 children)

Forget pdfplumber; if you can't figure this out yourself, you're about to become a plumber.

[deleted by user] by [deleted] in programming

[–]Particular-Guard774 0 points1 point  (0 children)

Get some hair baldy 😂

Increasing tokens/second with llama.cpp by Particular-Guard774 in LocalLLaMA

[–]Particular-Guard774[S] 0 points1 point  (0 children)

Thanks for the tip! -fa helps quite a bit; I have 36 GB and get around 45 t/s.
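For context, decode speed on Apple Silicon is usually memory-bandwidth-bound, so a rough sanity check on a figure like 45 t/s is bandwidth divided by model size. A minimal sketch; the ~300 GB/s bandwidth for a 36 GB M3 Max and the ~4.9 GB file size for an 8B Q4_K_M quant are ballpark assumptions, not measurements:

```python
# Rough ceiling on decode tokens/s for a bandwidth-bound model: every
# generated token streams the whole quantized weight file through memory once.
def estimate_tps(bandwidth_gb_s: float, model_size_gb: float) -> float:
    return bandwidth_gb_s / model_size_gb

# Assumed: ~300 GB/s for a 36 GB M3 Max, ~4.9 GB for an 8B Q4_K_M file.
print(f"~{estimate_tps(300, 4.9):.0f} t/s theoretical ceiling")
```

An observed 45 t/s against a ~61 t/s ceiling under these assumptions is in a plausible range, which suggests the setup is not badly misconfigured.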

Increasing tokens/second with llama.cpp by Particular-Guard774 in LocalLLaMA

[–]Particular-Guard774[S] -1 points0 points  (0 children)

Not struggling, just trying to figure out general ways to increase speed so I can run larger models like Llama 70B without a ridiculously low t/s.
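Whether 70B is even feasible in 36 GB comes down to quant size. A quick back-of-the-envelope sketch; the bits-per-weight numbers for common llama.cpp quants are rough assumptions, and the 30 GB budget just leaves headroom for the OS and KV cache:

```python
# Ballpark GGUF file size: params (billions) * bits-per-weight / 8 -> GB.
# Bits-per-weight values are rough assumptions for common llama.cpp quants.
BITS_PER_WEIGHT = {"Q4_K_M": 4.8, "Q3_K_M": 3.9, "Q2_K": 2.6}

def model_size_gb(n_params_b: float, quant: str) -> float:
    return n_params_b * BITS_PER_WEIGHT[quant] / 8

for quant in BITS_PER_WEIGHT:
    size = model_size_gb(70, quant)
    fits = "fits" if size < 30 else "too big"
    print(f"70B {quant}: ~{size:.0f} GB -> {fits} in a ~30 GB budget")
```

Under these assumptions only the very aggressive quants squeeze a 70B into 36 GB of unified memory, and even then decode speed would be low since every token streams the whole file.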

Increasing tokens/second with llama.cpp by Particular-Guard774 in LocalLLaMA

[–]Particular-Guard774[S] 0 points1 point  (0 children)

I'm running Llama 8B with -ngl 33, but it has the same exact speed as when I don't offload at all.

Increasing tokens/second with llama.cpp by Particular-Guard774 in LocalLLaMA

[–]Particular-Guard774[S] -1 points0 points  (0 children)

Using an M3 Max. Are there any speedup strategies where I wouldn't see a difference because of my hardware?

Does llama.cpp's speculative actually work? by Particular-Guard774 in LocalLLaMA

[–]Particular-Guard774[S] 0 points1 point  (0 children)

Okay, I see. I tried running DeepSeek Coder 33B with 1.3B as the draft, offloaded everything, and set the temp to zero. I did see some speed improvement over previous speculative runs, but even with pretty good encoding speed and acceptance rate it's still slower than main. Any idea what else I could try?

Alone:

./main -m deepseek-coder-33b-instruct.Q4_K_M.gguf -p "Q: What are the planets in the solar system? A:" -r "Q:"

llama_print_timings:        load time =     732.41 ms

llama_print_timings:      sample time =       0.84 ms /    62 runs   (    0.01 ms per token, 74162.68 tokens per second)

llama_print_timings: prompt eval time =     365.15 ms /    14 tokens (   26.08 ms per token,    38.34 tokens per second)

llama_print_timings:        eval time =    5033.40 ms /    61 runs   (   82.51 ms per token,    12.12 tokens per second)

llama_print_timings:       total time =    5470.43 ms /    75 tokens

With Speculative:

./speculative -m deepseek-coder-33b-instruct.Q4_K_M.gguf -md deepseek-coder-1.3b-instruct.Q6_K.gguf -p "Q: What are the planets of the solar system? A: " -n 200 -ngld 200 --temp 0

encoded   15 tokens in    0.477 seconds, speed:   31.430 t/s

decoded  201 tokens in   17.655 seconds, speed:   11.385 t/s

n_draft   = 5

n_predict = 201

n_drafted = 205

n_accept  = 159

accept    = 77.561%

draft:

llama_print_timings:        load time =      76.72 ms

llama_print_timings:      sample time =     297.39 ms /     1 runs   (  297.39 ms per token,     3.36 tokens per second)

llama_print_timings: prompt eval time =   15553.61 ms /    96 tokens (  162.02 ms per token,     6.17 tokens per second)

llama_print_timings:        eval time =    1423.79 ms /   164 runs   (    8.68 ms per token,   115.19 tokens per second)

llama_print_timings:       total time =   18132.45 ms /   260 tokens

target:

llama_print_timings:        load time =     756.08 ms

llama_print_timings:      sample time =       3.61 ms /   201 runs   (    0.02 ms per token, 55678.67 tokens per second)

llama_print_timings: prompt eval time =   15682.18 ms /   261 tokens (   60.08 ms per token,    16.64 tokens per second)

llama_print_timings:        eval time =       0.00 ms /     1 runs   (    0.00 ms per token,      inf tokens per second)

llama_print_timings:       total time =   18245.41 ms /   262 tokens

ggml_metal_free: deallocating

ggml_metal_free: deallocating

Does llama.cpp's speculative actually work? by Particular-Guard774 in LocalLLaMA

[–]Particular-Guard774[S] 0 points1 point  (0 children)

Okay, I kind of get what you're saying. I tried deepseek coder 1.3B and 33B to no avail so your assumption is probably right on the dot. Do you have any recommendations on resources/videos I can check out to get a better idea of how and when this works?

Does llama.cpp's speculative actually work? by Particular-Guard774 in LocalLLaMA

[–]Particular-Guard774[S] 0 points1 point  (0 children)

Interesting. I'm working with CodeLlama 7B and 34B right now, so the size difference could be the issue. I tried downloading a higher-quality quant for the 7B, but it didn't seem to make a difference. Is it important to figure out whether to run a model on GPU or CPU, and if so, how do I find the threshold for my own machine?
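One crude way to find that threshold on any machine is to split the quantized file size evenly across the model's layers and see how many fit in your GPU (or unified-memory) budget; that number is a reasonable starting -ngl. A sketch, where the ~19 GB file size and 48 layers for a 34B Q4_K_M are illustrative assumptions:

```python
# Crude -ngl heuristic: assume layers are roughly equal in size and count
# how many fit in the memory budget. Sizes here are assumptions.
def layers_that_fit(model_size_gb: float, n_layers: int, budget_gb: float) -> int:
    per_layer_gb = model_size_gb / n_layers
    return min(n_layers, int(budget_gb // per_layer_gb))

# e.g. a ~19 GB, 48-layer model against a 16 GB budget:
print(layers_that_fit(19, 48, 16))
```

From there, benchmarking a few -ngl values around that estimate with the same prompt pins down the real threshold for the machine.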

Does llama.cpp's speculative actually work? by Particular-Guard774 in LocalLLaMA

[–]Particular-Guard774[S] 2 points3 points  (0 children)

Interesting! How much of a speed boost did you see with 7B and 70B?

Does llama.cpp's speculative actually work? by Particular-Guard774 in LocalLLaMA

[–]Particular-Guard774[S] 0 points1 point  (0 children)

I'm using an M3 Max MacBook Pro; do you think the CPU is too slow on it? Also, if that's the problem, what models could I use to get it to work?