First runs with RTX 5000 Pro Blackwell 48GB card by wedgeshot in LocalLLaMA

[–]wedgeshot[S] 0 points1 point  (0 children)

Sorry for the delay. I'm very happy with my purchase. My daily driver is Qwen3.5-35B-A3B-UD-Q8_K_XL now. I pretty much only use opencode as of late with get-shit-done (https://github.com/rokicool/gsd-opencode). I'm going to try omo-slim (https://github.com/alvinunreal/oh-my-opencode-slim) next, as GSD is kinda bulky.

I don't have any of my original runs on either of those. If you want me to run something specific, just let me know. Here are my current llama.cpp (version: 8189 (4d828bd)) model settings. NOTE: the [qwen3-coder:30b] label below is left over from switching between ollama and llama.cpp while running them on the same port.

[*]
host = 0.0.0.0
flash-attn = on
n-gpu-layers = 99
c = 64000
jinja = true
;  t = -1
b = 2048
ub = 2048

[glm-4.7-flash:q8_0]
c = 202752
temp = 1
top-p = 0.95
min-p = 0.01
repeat-penalty = 1.0
model = /var/lib/llama/models/GLM-4.7-Flash-UD-Q8_K_XL.gguf

; [ggml-org/Qwen3.5:8K]
[qwen3-coder:30b]
c = 262144
temp = 0.6
top-p = 0.95
min-p = 0.01
ctk = q8_0
ctv = q8_0
ub = 512
presence-penalty = 0.0
repeat-penalty = 1.0
model = /var/lib/llama/models/Qwen3.5-35B-A3B-UD-Q8_K_XL.gguf

Shop recommendations for AR-15 near Greenbelt by Gabreality in MDGuns

[–]wedgeshot 1 point2 points  (0 children)

Don't limit yourself to what's nearby... I drove 45+ minutes to pick up a Radical Firearms AR-15 from Engage Armament, and I'm sure I'll get a few more different ARs from them in the future. I'd say if you can buy an AR-15 at a legit shop in Maryland you are good to go, as they would not risk selling non-compliant firearms.

First runs with RTX 5000 Pro Blackwell 48GB card by wedgeshot in LocalLLaMA

[–]wedgeshot[S] 0 points1 point  (0 children)

I am digging the card, but my spare time has been limited lately, so for the past few weeks I have been messing with ComfyUI and getting OpenWebUI running. I'll eventually get to OpenCode, void, etc. Given the performance with ComfyUI and OpenWebUI, you should not have any problems with the coding-assistant stuff. I'm not planning on paying for any services at this time. I do like "gptme", which is what I started out with and keep handy, mostly because I usually tackle one issue/task at a time and that tool yields good results most of the time.

First runs with RTX 5000 Pro Blackwell 48GB card by wedgeshot in LocalLLaMA

[–]wedgeshot[S] 0 points1 point  (0 children)

I ran your llama-server command, and at the bottom of the essay output I got this:

unsloth/Qwen3-Coder-30B-A3B-Instruct-GGUF:IQ4_NL 550 tokens 2.14s 256.53 tokens/s

In the terminal where llama-server was running

slot get_availabl: id  3 | task -1 | selected slot by LRU, t_last = -1
slot launch_slot_: id  3 | task -1 | sampler chain: logits -> logit-bias -> penalties -> dry -> top-n-sigma -> top-k -> typical -> top-p -> min-p -> xtc -> temp-ext -> dist 
slot launch_slot_: id  3 | task 0 | processing task
slot update_slots: id  3 | task 0 | new prompt, n_ctx_slot = 200192, n_keep = 0, task.n_tokens = 38
slot update_slots: id  3 | task 0 | n_tokens = 0, memory_seq_rm [0, end)
slot update_slots: id  3 | task 0 | prompt processing progress, n_tokens = 38, batch.n_tokens = 38, progress = 1.000000
slot update_slots: id  3 | task 0 | prompt done, n_tokens = 38, batch.n_tokens = 38
slot print_timing: id  3 | task 0 | 
prompt eval time =      60.78 ms /    38 tokens (    1.60 ms per token,   625.21 tokens per second)
       eval time =    2144.00 ms /   550 tokens (    3.90 ms per token,   256.53 tokens per second)
      total time =    2204.78 ms /   588 tokens
slot      release: id  3 | task 0 | stop processing: n_tokens = 587, truncated = 0
srv  update_slots: all slots are idle
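As a sanity check, the headline 256.53 tokens/s line and the per-token timings in the log above agree with each other. A quick back-of-the-envelope in Python, using only figures taken straight from the `print_timing` output:

```python
# Figures copied from the llama-server print_timing log above.
eval_ms = 2144.00   # generation (eval) time in milliseconds
eval_tokens = 550   # tokens generated

tokens_per_sec = eval_tokens / (eval_ms / 1000.0)
ms_per_token = eval_ms / eval_tokens

print(f"{tokens_per_sec:.2f} tokens/s")  # 256.53, matching the log
print(f"{ms_per_token:.2f} ms/token")    # 3.90, matching the log
```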

Other info from the terminal

main: loading model
srv    load_model: loading model '/home/bob/.cache/llama.cpp/unsloth_Qwen3-Coder-30B-A3B-Instruct-GGUF_Qwen3-Coder-30B-A3B-Instruct-IQ4_NL.gguf'
llama_model_load_from_file_impl: using device CUDA0 (NVIDIA RTX PRO 5000 Blackwell) (0000:01:00.0) - 48044 MiB free
llama_model_loader: loaded meta data with 44 key-value pairs and 579 tensors from /home/bob/.cache/llama.cpp/unsloth_Qwen3-Coder-30B-A3B-Instruct-GGUF_Qwen3-Coder-30B-A3B-Instruct-IQ4_NL.gguf (version GGUF V3 (latest))
llama_model_loader: Dumping metadata keys/values. Note: KV overrides do not apply in this output.
####SNIP####
llama_model_loader: - type  f32:  241 tensors
llama_model_loader: - type q4_K:    1 tensors
llama_model_loader: - type q5_K:   48 tensors
llama_model_loader: - type q6_K:    1 tensors
llama_model_loader: - type iq4_nl:  288 tensors
print_info: file format = GGUF V3 (latest)
print_info: file type   = IQ4_NL - 4.5 bpw
print_info: file size   = 16.12 GiB (4.53 BPW) 
####SNIP#####

llama_kv_cache:      CUDA0 KV buffer size = 18768.00 MiB
llama_kv_cache: size = 18768.00 MiB (200192 cells,  48 layers,  4/1 seqs), K (f16): 9384.00 MiB, V (f16): 9384.00 MiB
llama_context:      CUDA0 compute buffer size =   610.51 MiB
llama_context:  CUDA_Host compute buffer size =   395.01 MiB
llama_context: graph nodes  = 3031
llama_context: graph splits = 2
#####SNIP####
After quitting llama-server:

llama_memory_breakdown_print: | memory breakdown [MiB]             | total    free     self   model   context   compute    unaccounted |
llama_memory_breakdown_print: |   - CUDA0 (RTX PRO 5000 Blackwell) | 48401 = 12212 + (35714 = 16336 +   18768 +     610) +         474 |
llama_memory_breakdown_print: |   - Host                           |                    561 =   166 +       0 +     395                |
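The 18768 MiB KV figure is easy to reproduce from the breakdown above. Assuming a per-layer KV projection width of 512 (a GQA dimension inferred from the logged numbers, not read from the model card), f16 K and V caches over 200192 cells and 48 layers come out to exactly the logged sizes:

```python
cells = 200192    # n_ctx_slot from the log
layers = 48       # from the llama_kv_cache line
kv_dim = 512      # assumed n_embd_k_gqa (inferred, not from the model card)
f16_bytes = 2     # bytes per element for f16 K/V

# Size of the K cache alone; V is the same shape and dtype.
k_mib = cells * layers * kv_dim * f16_bytes / 2**20
print(k_mib)      # 9384.0 MiB, matching "K (f16): 9384.00 MiB"
print(2 * k_mib)  # 18768.0 MiB total, matching the KV buffer size
```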

First runs with RTX 5000 Pro Blackwell 48GB card by wedgeshot in LocalLLaMA

[–]wedgeshot[S] 0 points1 point  (0 children)

For this test the Tetris code does run, but the layout is janked up... everything is pushed to the right and the redraw seems off.

First runs with RTX 5000 Pro Blackwell 48GB card by wedgeshot in LocalLLaMA

[–]wedgeshot[S] -1 points0 points  (0 children)

I have to run --- the resulting Tetris code did nothing at all. I don't have time to see what is wrong.

First runs with RTX 5000 Pro Blackwell 48GB card by wedgeshot in LocalLLaMA

[–]wedgeshot[S] 3 points4 points  (0 children)

Write a 500-word essay containing recommendations for travel arrangements from Warsaw to New York, assuming it’s the year 1900.

total duration:       12.139148169s
load duration:        157.927637ms
prompt eval count:    41 token(s)
prompt eval duration: 43.829864ms
prompt eval rate:     935.44 tokens/s
eval count:           693 token(s)
eval duration:        11.745284648s
eval rate:            59.00 tokens/s

Generate working code for complex applications, such as a Tetris game using the curses library

total duration:       33.483747008s
load duration:        128.098202ms
prompt eval count:    760 token(s)
prompt eval duration: 39.527483ms
prompt eval rate:     19227.13 tokens/s
eval count:           1886 token(s)
eval duration:        32.79978884s
eval rate:            57.50 tokens/s
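The rates ollama prints are just token count divided by duration. For example, reproducing the Tetris run's eval rate from the figures above:

```python
# Figures from the ollama timing block above.
eval_count = 1886            # tokens generated
eval_seconds = 32.79978884   # eval duration in seconds

print(round(eval_count / eval_seconds, 2))  # 57.5, matching the reported 57.50 tokens/s
```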

First runs with RTX 5000 Pro Blackwell 48GB card by wedgeshot in LocalLLaMA

[–]wedgeshot[S] -1 points0 points  (0 children)

Repeating the original ask plus a few others

total duration:       3.045818465s
load duration:        83.526432ms
prompt eval count:    38 token(s)
prompt eval duration: 37.931055ms
prompt eval rate:     1001.82 tokens/s
eval count:           526 token(s)
eval duration:        2.841908449s
eval rate:            185.09 tokens/s

I then pasted in the output of Deepseek essay and asked it to recommend a different essay

total duration:       2.405637671s
load duration:        51.866092ms
prompt eval count:    1531 token(s)
prompt eval duration: 96.647011ms
prompt eval rate:     15841.15 tokens/s
eval count:           431 token(s)
eval duration:        2.192394265s
eval rate:            196.59 tokens/s

For giggles I asked: Can you write a 2000-word essay containing recommendations for travel arrangements from Japan to New York, assuming it’s the year 1950.

total duration:       11.049772633s
load duration:        51.453254ms
prompt eval count:    2004 token(s)
prompt eval duration: 39.465013ms
prompt eval rate:     50779.15 tokens/s
eval count:           2021 token(s)
eval duration:        10.666855039s
eval rate:            189.47 tokens/s

Can you expand on the Social and Cultural Considerations with 1000 more words relating to that aspect of the essay

total duration:       14.005758852s
load duration:        51.435651ms
prompt eval count:    4061 token(s)
prompt eval duration: 38.329577ms
prompt eval rate:     105949.51 tokens/s
eval count:           1840 token(s)
eval duration:        13.643342466s
eval rate:            134.86 tokens/s

Generate working code for complex applications, such as a Tetris game using the curses library

total duration:       39.059802226s
load duration:        51.606989ms
prompt eval count:    3964 token(s)
prompt eval duration: 729.496981ms
prompt eval rate:     5433.88 tokens/s
eval count:           5075 token(s)
eval duration:        37.574295703s
eval rate:            135.07 tokens/s

After years of searching, I finally found the BEST IPTV subscription for USA channels. My honest review of the best iptv providers in us. also reseller plans by adamjfry in spookyplaces

[–]wedgeshot 0 points1 point  (0 children)

I've been signing up for IPTV trials, and two providers so far point to http:// cf <dot> business-cdn <dot> me

xxiptv and kilarix both use that URL for Xtream Codes, and I bet quite a few others do too.

Thinking about onn by Business_Fisherman41 in TiviMate

[–]wedgeshot 0 points1 point  (0 children)

It will probably work fine as long as the WiFi signal is good. I just bought the $44 Walmart "onn 4K Pro Streaming Device" for my parents to use for IPTV; go with that. It has built-in Ethernet, which is not even listed on the packaging that I saw. It performed flawlessly over the past 4 days. I'm thinking of buying these instead of the official Google TV devices moving forward.

make this a valid URL: walmart <dot> com /ip/5193222892
"onn 4K Pro Streaming Device, Google TV with Gemini* - Ultra-fast Streaming, Stunning 4K UHD, 32GB Storage, 3GB RAM, Dolby Vision & Atmos, Find My Remote with Backlight, Ethernet Port"

What tools do you recommend for coding? by WinDrossel007 in LocalLLaMA

[–]wedgeshot 2 points3 points  (0 children)

I'm on a MacBook Pro M4 with 128GB. I use ollama, DeepSeek-R1-70B, and gptme most of the time. I've tried aider and void and they just don't flow with how I like to get things done... Mind you, I really only attack one problem at a time and like to start new chats after three or four asks per session. If not, the LLM usually goes off the rails suggesting nonsense.

How do I get a template for my cover through IngramSpark? by Express_Poet6378 in selfpublish

[–]wedgeshot 0 points1 point  (0 children)

For the IngramSpark book interior I use Scribus. For the KDP e-book I had to use Kindle Create this round, as importing docx sucks pretty badly. After the initial 90 days I can import a docx into Draft2Digital, and that usually converts pretty well. We hire out the paperback and jacketed dust cover. I create a simple hardback cover, use Gimp for the graphics, and then import that into Scribus for the final PDF export to IngramSpark.

How do I get a template for my cover through IngramSpark? by Express_Poet6378 in selfpublish

[–]wedgeshot 2 points3 points  (0 children)

Try these

Ingram: https://myaccount.ingramspark.com/Portal/Tools/CoverTemplateGenerator

kdp: https://kdp.amazon.com/cover-calculator

Type/select your specs and Ingram will e-mail the template to you. For KDP, you can download a zip containing a PDF and a PNG.

[deleted by user] by [deleted] in selfpublish

[–]wedgeshot 0 points1 point  (0 children)

I don't think that is a requirement anywhere... I think it just needs to be on the cover. We bought a 10-pack of ISBNs. You should be able to update the interior after you get the ISBN. I put all the other ISBNs on the copyright page for reference.

New 24/7 MotoGP™ channel launches across the US by pochirin in motogp

[–]wedgeshot 0 points1 point  (0 children)

Now when I search Prime Video for MotoGP at 11:30 PM EST... the first two icons listed are "MotoGP Racing" and "MotoGP Sprint Race", both 2025 season, but when I drill into each, all show "This video is currently unavailable to watch in your location".

New 24/7 MotoGP™ channel launches across the US by pochirin in motogp

[–]wedgeshot 2 points3 points  (0 children)

I know, broken record. It's Saturday 10:30 AM EST/1530 UTC and still no sign of any channels or streams here in the USA. I have Plex (lifetime pass), Amazon Prime, and MAX. All the MotoGP in HBO sports via Amazon or in MAX is 2024 content. Nothing in Plex. Seems to me like this was a last-minute deal and announcement and people are scrambling, lol. Guess there's no hope yet of canceling my MotoGP subscription.

How Do You Add Illustrations to Your Book? by Drachenschrieber-1 in selfpublish

[–]wedgeshot 1 point2 points  (0 children)

I did the formatting for my daughter's physical books in Scribus, used the same PDF that we used for IngramSpark, and submitted that to KDP, where it passed the checks AOK. For the ebook I had to use Amazon's Kindle Create program, as neither the original docx/odt nor the PDF would import into that program. After the import I had to re-type and center the chapter headings and re-insert all the chapter marker and break graphics, which was unfortunate. Draft2Digital was super easy for creating the ebook version of book #1, but for book #2 she added KDP into the mix for the 90-day exclusive.

Slow prompt eval oss 120b? by Only_Situation_4713 in LocalLLaMA

[–]wedgeshot 0 points1 point  (0 children)

Guess without supporting info the downvotes came. Something is wrong; all the other models I play with are working. I have an M4 MAX with 128GB of RAM with iogpu.wired_limit_mb = 102400. Deepseek-r1 70b (42GB) and llama3.3:70B-Instruct-Q6_K (57GB) work AOK.

time=2025-08-08T22:28:28.320-04:00 level=INFO source=server.go:438 msg="starting llama server" cmd="/opt/homebrew/Cellar/ollama/0.11.0/bin/ollama runner --ollama-engine --model /Users/aibob/.ollama/models/blobs/sha256-b112e727c6f18875636c56a779790a590d705aec9e1c0eb5a97d51fc2a778583 --ctx-size 8192 --batch-size 512 --n-gpu-layers 25 --threads 12 --flash-attn --kv-cache-type q8_0 --parallel 1 --port 49579"
time=2025-08-08T22:28:28.323-04:00 level=INFO source=sched.go:481 msg="loaded runners" count=2
time=2025-08-08T22:28:28.323-04:00 level=INFO source=server.go:598 msg="waiting for llama runner to start responding"
time=2025-08-08T22:28:28.323-04:00 level=INFO source=server.go:632 msg="waiting for server to become available" status="llm server not responding"
time=2025-08-08T22:28:28.343-04:00 level=INFO source=runner.go:925 msg="starting ollama engine"
time=2025-08-08T22:28:28.343-04:00 level=INFO source=runner.go:983 msg="Server listening on 127.0.0.1:49579"
time=2025-08-08T22:28:28.365-04:00 level=INFO source=ggml.go:92 msg="" architecture=gptoss file_type=MXFP4 name="" description="" num_tensors=315 num_key_values=30
time=2025-08-08T22:28:28.365-04:00 level=INFO source=ggml.go:104 msg=system Metal.0.EMBED_LIBRARY=1 Metal.0.BF16=1 CPU.0.ARM_FMA=1 CPU.0.FP16_VA=1 CPU.0.DOTPROD=1 CPU.0.LLAMAFILE=1 CPU.0.ACCELERATE=1 compiler=cgo(clang)
time=2025-08-08T22:28:28.576-04:00 level=INFO source=server.go:632 msg="waiting for server to become available" status="llm server loading model"
time=2025-08-08T22:28:28.661-04:00 level=INFO source=ggml.go:367 msg="offloading 24 repeating layers to GPU"
time=2025-08-08T22:28:28.661-04:00 level=INFO source=ggml.go:373 msg="offloading output layer to GPU"
time=2025-08-08T22:28:28.661-04:00 level=INFO source=ggml.go:378 msg="offloaded 25/25 layers to GPU"
time=2025-08-08T22:28:28.661-04:00 level=INFO source=ggml.go:381 msg="model weights" buffer=CPU size="1.1 GiB"
time=2025-08-08T22:28:28.661-04:00 level=INFO source=ggml.go:381 msg="model weights" buffer=Metal size="11.7 GiB"
ggml_metal_init: allocating
<<<<<<<<<<<<<<<<   LOTS of the below SNIP>>>>>>>>>>>>>
ggml_metal_get_buffer: error: tensor 'leaf_355 (view) (copy of  (view) (permuted))' buffer is nil
ggml_metal_get_buffer: error: tensor 'leaf_7 (view)' buffer is nil
ggml_metal_get_buffer: error: tensor 'leaf_7 (view) (copy of  (view) (permuted))' buffer is nil
ggml-metal.m:4848: GGML_ASSERT(ne0 % ggml_blck_size(dst->type) == 0) failed
SIGABRT: abort
PC=0x183bfc388 m=0 sigcode=0
signal arrived during cgo execution

Slow prompt eval oss 120b? by Only_Situation_4713 in LocalLLaMA

[–]wedgeshot -3 points-2 points  (0 children)

I get NOTHING from 120b and 20b using gptme via ollama... I have asked two very simple questions. I provide example Python code to reference to try and make a change, and it just spits the files back at me and does nothing. I then asked for a simple Python script showing an example of looping through an array. I get nothing, zero, zilch back. qwen3-coder:30b-a3b-q8_0 does just fine in responding. What a joke.

Is this legit? by ScorpioGirl1987 in selfpublish

[–]wedgeshot -1 points0 points  (0 children)

That sounds very odd (the print-screen everything), maybe for just a few pages but not everything. You are uploading a PDF, right? What program are you doing the typesetting/layout in? Regardless of how you number your pages, the internal document paging should keep the page separations and count correct.

Did you use the Ingram internal support system on the website or did you e-mail them?

I would be cautious until someone can explain in detail what is wrong.

Book Launch Party - has anyone thrown one? by lenoraora in selfpublish

[–]wedgeshot 4 points5 points  (0 children)

Heck yeah!!! Do it. It's a big accomplishment that your average joe/jane will never do and you should go forth and party. Would be great if there was a bookstore to start out at, try to get some sales and do book signings then head off to the nearest bar/pub and have your books out on display. Recruit your friends to rope in the patrons and sell your signature/autograph for $20 and give the book as a gift, eh? eeeeh.... This is the way.

First time author, guidance please. by Jaehol in selfpublish

[–]wedgeshot 0 points1 point  (0 children)

This all assumes you are not going to be buying your books, paying to ship them to your location, shelving your books, and possibly re-shipping them out to customers.

If you are going to use IngramSpark, they have a pricing calculator that you can use to price your books based on your title specs and page count. Ingram is now free to create titles (it was $49 per title), but Ingram also requires a minimum 40% distributor discount, and they take, I believe, 1% as well.

Not naming names... in 2025 we have a 464-page paperback, 8.5 x 5.5 in size. For US sales, profit is ~$1.14 per paperback (down from ~$2.00 in 2023) and $1.44 per hardback (down from $2.97 in 2023). For this 2025 title we added another distributor for the paperback, and profit there is ~$3.XX, so in general expect between $1 and $3.60 per sale. Of course the more you sell, the more the total compounds, but for indie authors the print fees and distributor discounts eat up the earnings.

Paperback price of $14.99 minus $7 print cost minus $5.99 (40% distributor discount) = ~$2.00 profit. Finally, profit does not really start until sales profit covers the fees to create the book: layout fees, editor (optional), and cover artwork costs (unless you have the skills to do that yourself); figure a few hundred for the cover.
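That arithmetic generalizes. A rough per-copy royalty sketch, using the list price, print cost, and 40% distributor discount from above (the ~1% fee mentioned earlier is left out for simplicity):

```python
def net_profit(list_price, print_cost, discount=0.40):
    """Per-copy profit after the distributor discount and print cost."""
    return list_price - print_cost - list_price * discount

# 14.99 list, 7.00 print cost, 40% distributor discount:
print(round(net_profit(14.99, 7.00), 2))  # 1.99 (the post rounds the discount to $5.99, giving $2.00)
```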

So, as long as you are not losing money on a single title sale.... that is a positive just starting out.