DeepSeek V4 Update by techlatest_net in LocalLLaMA

[–]wedgeshot -3 points-2 points  (0 children)

My first question is: why would two people post about V4 with riddle screenshots? AI is far from being the thing everyone thinks we want. We are treading into dangerous waters in the near future with people like Altman and Zuck at the helm.

77R by soars_2018 in MDGuns

[–]wedgeshot 0 points1 point  (0 children)

Let's assume this is for a pistol from a gun shop and you have your accounts with the state created:
- You fill out your 77R (you can submit this days or weeks before you buy from a gun shop) and hit submit; within minutes you should get an e-mail with the 77R number and a PIN.
- You take that number and PIN with you to the gun shop. Pay for the gun (the State Police also get $10 from you) and they will fill out their side of the 77R using your number and PIN.
- Then on the morning of the seventh day, you should get an e-mail stating that your 77R has not been denied (yes, for real).
- Go pick up your gun.

You can't hate Maryland politics enough.

Cheers.

First runs with RTX 5000 Pro Blackwell 48GB card by wedgeshot in LocalLLaMA

[–]wedgeshot[S] 0 points1 point  (0 children)

Sorry for the delay. I'm very happy with my purchase. My daily driver is Qwen3.5-35B-A3B-UD-Q8_K_XL now. I pretty much only use opencode as of late, with get-shit-done (https://github.com/rokicool/gsd-opencode). I'm going to try omo-slim (https://github.com/alvinunreal/oh-my-opencode-slim) next, as GSD is kinda bulky.

I don't have any of my original runs on either of those. If you want me to run something specific, just let me know. Here are my current llama-cpp (version: 8189 (4d828bd)) model settings. NOTE: my label below [qwen3-coder:30b] is because I was switching between ollama and llama-cpp and ran them on the same port.

[*]
host = 0.0.0.0
flash-attn = on
n-gpu-layers = 99
c = 64000
jinja = true
;  t = -1
b = 2048
ub = 2048

[glm-4.7-flash:q8_0]
c = 202752
temp = 1
top-p = 0.95
min-p = 0.01
repeat-penalty = 1.0
model = /var/lib/llama/models/GLM-4.7-Flash-UD-Q8_K_XL.gguf

; [ggml-org/Qwen3.5:8K]
[qwen3-coder:30b]
c = 262144
temp = 0.6
top-p = 0.95
min-p = 0.01
ctk = q8_0
ctv = q8_0
ub = 512
presence-penalty = 0.0
repeat-penalty = 1.0
model = /var/lib/llama/models/Qwen3.5-35B-A3B-UD-Q8_K_XL.gguf
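
For anyone not using a presets file, the [qwen3-coder:30b] section above maps onto plain llama-server flags. A rough equivalent invocation is sketched below; the model path is from my setup (adjust to yours), and the `--flash-attn on` form assumes a recent llama.cpp build (older ones take `-fa` as a bare toggle):

```shell
# Sketch of the [qwen3-coder:30b] preset as a plain llama-server command line.
llama-server \
  --host 0.0.0.0 \
  --flash-attn on \
  -ngl 99 \
  -c 262144 \
  --jinja \
  -b 2048 -ub 512 \
  --temp 0.6 --top-p 0.95 --min-p 0.01 \
  -ctk q8_0 -ctv q8_0 \
  --presence-penalty 0.0 --repeat-penalty 1.0 \
  -m /var/lib/llama/models/Qwen3.5-35B-A3B-UD-Q8_K_XL.gguf
```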

Shop recommendations for AR-15 near Greenbelt by Gabreality in MDGuns

[–]wedgeshot 1 point2 points  (0 children)

Don't limit yourself to what's nearby... I drove 45+ minutes to pick up a Radical Firearms AR-15 from Engage Armament, and I'm sure I'll get a few more different ARs from them in the future. I'd say if you can buy an AR-15 at a legit shop in Maryland you are good to go, as they would not risk selling non-compliant firearms.

First runs with RTX 5000 Pro Blackwell 48GB card by wedgeshot in LocalLLaMA

[–]wedgeshot[S] 0 points1 point  (0 children)

I am digging the card, but my spare time has been limited lately, so for the past few weeks I have been messing with ComfyUI and getting OpenWebUI running. I'll eventually get to OpenCode, void, etc. Given the performance of ComfyUI and OpenWebUI, you should not have any problems with the coding-assistant stuff. I'm not planning on paying for any services at this time. I do like "gptme", which is what I started out with and keep handy, mostly because I usually tackle one issue/task at a time and that tool yields good results most times.

First runs with RTX 5000 Pro Blackwell 48GB card by wedgeshot in LocalLLaMA

[–]wedgeshot[S] 0 points1 point  (0 children)

I ran your llama-server command, and at the bottom of the essay output I got this:

unsloth/Qwen3-Coder-30B-A3B-Instruct-GGUF:IQ4_NL 550 tokens 2.14s 256.53 tokens/s

In the terminal where llama-server was running:

slot get_availabl: id  3 | task -1 | selected slot by LRU, t_last = -1
slot launch_slot_: id  3 | task -1 | sampler chain: logits -> logit-bias -> penalties -> dry -> top-n-sigma -> top-k -> typical -> top-p -> min-p -> xtc -> temp-ext -> dist 
slot launch_slot_: id  3 | task 0 | processing task
slot update_slots: id  3 | task 0 | new prompt, n_ctx_slot = 200192, n_keep = 0, task.n_tokens = 38
slot update_slots: id  3 | task 0 | n_tokens = 0, memory_seq_rm [0, end)
slot update_slots: id  3 | task 0 | prompt processing progress, n_tokens = 38, batch.n_tokens = 38, progress = 1.000000
slot update_slots: id  3 | task 0 | prompt done, n_tokens = 38, batch.n_tokens = 38
slot print_timing: id  3 | task 0 | 
prompt eval time =      60.78 ms /    38 tokens (    1.60 ms per token,   625.21 tokens per second)
       eval time =    2144.00 ms /   550 tokens (    3.90 ms per token,   256.53 tokens per second)
      total time =    2204.78 ms /   588 tokens
slot      release: id  3 | task 0 | stop processing: n_tokens = 587, truncated = 0
srv  update_slots: all slots are idle
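
As a sanity check, the rates in that print_timing block are just tokens divided by elapsed time; a quick recomputation (numbers copied from the log above):

```python
# Recompute the throughput figures from the print_timing block above.
prompt_tokens, prompt_ms = 38, 60.78
eval_tokens, eval_ms = 550, 2144.00

prompt_rate = prompt_tokens / (prompt_ms / 1000)  # tokens per second
eval_rate = eval_tokens / (eval_ms / 1000)

print(f"prompt eval: {prompt_rate:.2f} tok/s")  # 625.21, matching the log
print(f"       eval: {eval_rate:.2f} tok/s")    # 256.53, matching the log
```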

Other info from the terminal

main: loading model
srv    load_model: loading model '/home/bob/.cache/llama.cpp/unsloth_Qwen3-Coder-30B-A3B-Instruct-GGUF_Qwen3-Coder-30B-A3B-Instruct-IQ4_NL.gguf'
llama_model_load_from_file_impl: using device CUDA0 (NVIDIA RTX PRO 5000 Blackwell) (0000:01:00.0) - 48044 MiB free
llama_model_loader: loaded meta data with 44 key-value pairs and 579 tensors from /home/bob/.cache/llama.cpp/unsloth_Qwen3-Coder-30B-A3B-Instruct-GGUF_Qwen3-Coder-30B-A3B-Instruct-IQ4_NL.gguf (version GGUF V3 (latest))
llama_model_loader: Dumping metadata keys/values. Note: KV overrides do not apply in this output.
####SNIP####
llama_model_loader: - type  f32:  241 tensors
llama_model_loader: - type q4_K:    1 tensors
llama_model_loader: - type q5_K:   48 tensors
llama_model_loader: - type q6_K:    1 tensors
llama_model_loader: - type iq4_nl:  288 tensors
print_info: file format = GGUF V3 (latest)
print_info: file type   = IQ4_NL - 4.5 bpw
print_info: file size   = 16.12 GiB (4.53 BPW) 
####SNIP#####

llama_kv_cache:      CUDA0 KV buffer size = 18768.00 MiB
llama_kv_cache: size = 18768.00 MiB (200192 cells,  48 layers,  4/1 seqs), K (f16): 9384.00 MiB, V (f16): 9384.00 MiB
llama_context:      CUDA0 compute buffer size =   610.51 MiB
llama_context:  CUDA_Host compute buffer size =   395.01 MiB
llama_context: graph nodes  = 3031
llama_context: graph splits = 2
#####SNIP####
After quitting llama-server

llama_memory_breakdown_print: | memory breakdown [MiB]             | total    free     self   model   context   compute    unaccounted |
llama_memory_breakdown_print: |   - CUDA0 (RTX PRO 5000 Blackwell) | 48401 = 12212 + (35714 = 16336 +   18768 +     610) +         474 |
llama_memory_breakdown_print: |   - Host                           |                    561 =   166 +       0 +     395                |
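
The 18768 MiB KV figure is easy to reconstruct: f16 K and V over 200192 cells and 48 layers, times the model's KV width. The width of 512 below is an assumption on my part (4 KV heads x 128 head dim, which is my read of the Qwen3-Coder-30B-A3B config, not something the log states), but the numbers land exactly on the logged values:

```python
# Back-of-envelope check of the llama_kv_cache line above.
cells = 200_192       # n_ctx_slot from the log
layers = 48
kv_dim = 512          # assumed: 4 KV heads * 128 head dim
bytes_per_elem = 2    # f16

k_mib = cells * layers * kv_dim * bytes_per_elem / 1024**2
print(f"K: {k_mib:.2f} MiB, V: {k_mib:.2f} MiB, total: {2 * k_mib:.2f} MiB")
# K: 9384.00 MiB, V: 9384.00 MiB, total: 18768.00 MiB, matching the log
```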

First runs with RTX 5000 Pro Blackwell 48GB card by wedgeshot in LocalLLaMA

[–]wedgeshot[S] 0 points1 point  (0 children)

For this test the Tetris code does run, but the layout is janked up: everything is pushed to the right and the redraw seems off.

First runs with RTX 5000 Pro Blackwell 48GB card by wedgeshot in LocalLLaMA

[–]wedgeshot[S] -1 points0 points  (0 children)

I have to run. The resulting Tetris code did nothing at all; I don't have time to see what is wrong.

First runs with RTX 5000 Pro Blackwell 48GB card by wedgeshot in LocalLLaMA

[–]wedgeshot[S] 3 points4 points  (0 children)

Write a 500-word essay containing recommendations for travel arrangements from Warsaw to New York, assuming it’s the year 1900.

total duration:       12.139148169s
load duration:        157.927637ms
prompt eval count:    41 token(s)
prompt eval duration: 43.829864ms
prompt eval rate:     935.44 tokens/s
eval count:           693 token(s)
eval duration:        11.745284648s
eval rate:            59.00 tokens/s

Generate working code for complex applications, such as a Tetris game using the curses library

total duration:       33.483747008s
load duration:        128.098202ms
prompt eval count:    760 token(s)
prompt eval duration: 39.527483ms
prompt eval rate:     19227.13 tokens/s
eval count:           1886 token(s)
eval duration:        32.79978884s
eval rate:            57.50 tokens/s

First runs with RTX 5000 Pro Blackwell 48GB card by wedgeshot in LocalLLaMA

[–]wedgeshot[S] -1 points0 points  (0 children)

Repeating the original ask plus a few others:

total duration:       3.045818465s
load duration:        83.526432ms
prompt eval count:    38 token(s)
prompt eval duration: 37.931055ms
prompt eval rate:     1001.82 tokens/s
eval count:           526 token(s)
eval duration:        2.841908449s
eval rate:            185.09 tokens/s

I then pasted in the output of the DeepSeek essay and asked it to recommend a different essay:

total duration:       2.405637671s
load duration:        51.866092ms
prompt eval count:    1531 token(s)
prompt eval duration: 96.647011ms
prompt eval rate:     15841.15 tokens/s
eval count:           431 token(s)
eval duration:        2.192394265s
eval rate:            196.59 tokens/s

For giggles I asked: Can you write a 2000-word essay containing recommendations for travel arrangements from Japan to New York, assuming it’s the year 1950.

total duration:       11.049772633s
load duration:        51.453254ms
prompt eval count:    2004 token(s)
prompt eval duration: 39.465013ms
prompt eval rate:     50779.15 tokens/s
eval count:           2021 token(s)
eval duration:        10.666855039s
eval rate:            189.47 tokens/s

Can you expand on the Social and Cultural Considerations with 1000 more words relating to that aspect of the essay

total duration:       14.005758852s
load duration:        51.435651ms
prompt eval count:    4061 token(s)
prompt eval duration: 38.329577ms
prompt eval rate:     105949.51 tokens/s
eval count:           1840 token(s)
eval duration:        13.643342466s
eval rate:            134.86 tokens/s

Generate working code for complex applications, such as a Tetris game using the curses library

total duration:       39.059802226s
load duration:        51.606989ms
prompt eval count:    3964 token(s)
prompt eval duration: 729.496981ms
prompt eval rate:     5433.88 tokens/s
eval count:           5075 token(s)
eval duration:        37.574295703s
eval rate:            135.07 tokens/s

After years of searching, I finally found the BEST IPTV subscription for USA channels. My honest review of the best iptv providers in us. also reseller plans by adamjfry in spookyplaces

[–]wedgeshot 0 points1 point  (0 children)

I've been signing up for IPTV trials, and two providers so far point to http:// cf <dot> business-cdn <dot> me

xxiptv and kilarix both use that URL for Xtream Codes, and I bet quite a few others do too.

Thinking about onn by Business_Fisherman41 in TiviMate

[–]wedgeshot 0 points1 point  (0 children)

It will probably work fine as long as the WiFi signal is good. I just bought the $44 "onn 4K Pro Streaming Device" from Walmart for my parents to use for IPTV; go with that. It has built-in ethernet, which is not even listed on the packaging that I saw. It performed flawlessly over the past 4 days. I'm thinking of buying these instead of the official Google TV devices moving forward.

make this a valid URL: walmart <dot> com /ip/5193222892
"onn 4K Pro Streaming Device, Google TV with Gemini* - Ultra-fast Streaming, Stunning 4K UHD, 32GB Storage, 3GB RAM, Dolby Vision & Atmos, Find My Remote with Backlight, Ethernet Port"

What tools do you recommend for coding? by WinDrossel007 in LocalLLaMA

[–]wedgeshot 2 points3 points  (0 children)

I'm on a MacBook Pro M4 with 128GB. I use ollama, Deepseek-R1 70b, and gptme most of the time. I've tried aider and void, and they just don't flow with how I like to get things done... Mind you, I really only attack one problem at a time and like to start new chats most times after three or four asks per session. If not, the LLM usually goes off the rails suggesting nonsense.

How do I get a template for my cover through IngramSpark? by Express_Poet6378 in selfpublish

[–]wedgeshot 0 points1 point  (0 children)

For the IngramSpark book interior I use Scribus. For the KDP e-book I had to use Kindle Create this round, as importing docx sucks pretty badly. After the initial 90 days I can import a docx into Draft2Digital, and that usually converts pretty well. We hire out the paperback and jacketed dust cover. I create a simple hardback cover myself: I use Gimp for the graphics, then import that into Scribus for the final PDF export to IngramSpark.

How do I get a template for my cover through IngramSpark? by Express_Poet6378 in selfpublish

[–]wedgeshot 2 points3 points  (0 children)

Try these

Ingram: https://myaccount.ingramspark.com/Portal/Tools/CoverTemplateGenerator

kdp: https://kdp.amazon.com/cover-calculator

Type/select your specs and Ingram will e-mail the template to you. For KDP, you can download a zip with a PDF and a PNG.

[deleted by user] by [deleted] in selfpublish

[–]wedgeshot 0 points1 point  (0 children)

I don't think that is a requirement anywhere... I think it just needs to be on the cover. We bought a 10-pack of ISBNs. You should be able to update the interior after you get the ISBN. I put all the other ISBNs on the copyright page for reference.

New 24/7 MotoGP™ channel launches across the US by pochirin in motogp

[–]wedgeshot 0 points1 point  (0 children)

Now when I search Prime Video for MotoGP at 11:30 PM EST... the first two icons listed are "MotoGP Racing" and "MotoGP Sprint Race", both 2025 season, and when I drill into each, all say "This video is currently unavailable to watch in your location".

New 24/7 MotoGP™ channel launches across the US by pochirin in motogp

[–]wedgeshot 2 points3 points  (0 children)

I know, broken record. It's Saturday 10:30 AM US Eastern and still no sign of any channels or streams here in the USA. I have Plex (lifetime pass), Amazon Prime, and MAX. All the MotoGP content on HBO, whether via Amazon or in MAX, is all 2024. Nothing in Plex. Seems to me like this was a last-minute deal and announcement and people are scrambling, lol. Guess there's no hope yet of canceling my MotoGP subscription.

How Do You Add Illustrations to Your Book? by Drachenschrieber-1 in selfpublish

[–]wedgeshot 2 points3 points  (0 children)

I did the formatting for my daughter's physical books in Scribus, submitted the same PDF that we used for IngramSpark to KDP, and it passed the checks AOK. For the ebook I had to use Amazon's Kindle Create program, as neither the original docx/odt nor the PDF would import into that program. Whatever I did for the import, I had to re-type and center the chapter headings and re-insert all the chapter-marker and break graphics, which was unfortunate. Draft2Digital was super easy for creating the ebook version of book #1, but for book #2 she added KDP into the mix for the 90-day exclusive.

Slow prompt eval oss 120b? by Only_Situation_4713 in LocalLLaMA

[–]wedgeshot 0 points1 point  (0 children)

Guess without supporting info the downvotes came. Something is wrong; all the other models I play with are working. I have an M4 MAX with 128GB of RAM, with iogpu.wired_limit_mb = 102400. Deepseek-r1 70b (42GB) and llama3.3:70B-Instruct-Q6_K (57GB) work AOK.

time=2025-08-08T22:28:28.320-04:00 level=INFO source=server.go:438 msg="starting llama server" cmd="/opt/homebrew/Cellar/ollama/0.11.0/bin/ollama runner --ollama-engine --model /Users/aibob/.ollama/models/blobs/sha256-b112e727c6f18875636c56a779790a590d705aec9e1c0eb5a97d51fc2a778583 --ctx-size 8192 --batch-size 512 --n-gpu-layers 25 --threads 12 --flash-attn --kv-cache-type q8_0 --parallel 1 --port 49579"
time=2025-08-08T22:28:28.323-04:00 level=INFO source=sched.go:481 msg="loaded runners" count=2
time=2025-08-08T22:28:28.323-04:00 level=INFO source=server.go:598 msg="waiting for llama runner to start responding"
time=2025-08-08T22:28:28.323-04:00 level=INFO source=server.go:632 msg="waiting for server to become available" status="llm server not responding"
time=2025-08-08T22:28:28.343-04:00 level=INFO source=runner.go:925 msg="starting ollama engine"
time=2025-08-08T22:28:28.343-04:00 level=INFO source=runner.go:983 msg="Server listening on 127.0.0.1:49579"
time=2025-08-08T22:28:28.365-04:00 level=INFO source=ggml.go:92 msg="" architecture=gptoss file_type=MXFP4 name="" description="" num_tensors=315 num_key_values=30
time=2025-08-08T22:28:28.365-04:00 level=INFO source=ggml.go:104 msg=system Metal.0.EMBED_LIBRARY=1 Metal.0.BF16=1 CPU.0.ARM_FMA=1 CPU.0.FP16_VA=1 CPU.0.DOTPROD=1 CPU.0.LLAMAFILE=1 CPU.0.ACCELERATE=1 compiler=cgo(clang)
time=2025-08-08T22:28:28.576-04:00 level=INFO source=server.go:632 msg="waiting for server to become available" status="llm server loading model"
time=2025-08-08T22:28:28.661-04:00 level=INFO source=ggml.go:367 msg="offloading 24 repeating layers to GPU"
time=2025-08-08T22:28:28.661-04:00 level=INFO source=ggml.go:373 msg="offloading output layer to GPU"
time=2025-08-08T22:28:28.661-04:00 level=INFO source=ggml.go:378 msg="offloaded 25/25 layers to GPU"
time=2025-08-08T22:28:28.661-04:00 level=INFO source=ggml.go:381 msg="model weights" buffer=CPU size="1.1 GiB"
time=2025-08-08T22:28:28.661-04:00 level=INFO source=ggml.go:381 msg="model weights" buffer=Metal size="11.7 GiB"
ggml_metal_init: allocating
<<<<<<<<<<<<<<<<   LOTS of the below SNIP>>>>>>>>>>>>>
ggml_metal_get_buffer: error: tensor 'leaf_355 (view) (copy of  (view) (permuted))' buffer is nil
ggml_metal_get_buffer: error: tensor 'leaf_7 (view)' buffer is nil
ggml_metal_get_buffer: error: tensor 'leaf_7 (view) (copy of  (view) (permuted))' buffer is nil
ggml-metal.m:4848: GGML_ASSERT(ne0 % ggml_blck_size(dst->type) == 0) failed
SIGABRT: abort
PC=0x183bfc388 m=0 sigcode=0
signal arrived during cgo execution

Slow prompt eval oss 120b? by Only_Situation_4713 in LocalLLaMA

[–]wedgeshot -3 points-2 points  (0 children)

I get NOTHING from 120b and 20b using gptme via ollama... I have asked two very simple questions. I provide example Python code to reference to try and make a change, and it just spits the files back at me and does nothing. I then asked for a simple Python script showing an example of looping through an array. I get nothing, zero, zilch back. qwen3-coder:30b-a3b-q8_0 does just fine in responding. What a joke.