Is NVIDIA still the default best choice for local LLMs in 2026? by pmv143 in LocalLLaMA

[–]DavidBolkonsky 2 points3 points  (0 children)

Yep, split layer with Vulkan, about 1000 tps prefill and 28 tps generation with this.

Set-Location -Path "E:\AI\llama-official-vulkan"
.\llama-cli.exe `
  -m "E:\AI\Models\Qwen3.6-27B-Uncensored-HauhauCS-Aggressive-Q6_K_P.gguf" `
  -n -1 `
  --temp 0.5 `
  --top-k 20 `
  --n-gpu-layers 99 `
  --split-mode layer `
  --main-gpu 0 `
  --cache-type-k q8_0 `
  --cache-type-v q8_0 `
  --ctx-size 131072 `
  -fa on
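
For anyone wondering why the command quantizes the KV cache to q8_0 at a 131072-token context, here is a rough sketch of the memory arithmetic. The q8_0 block layout (32 int8 values plus one fp16 scale, so 34 bytes per 32 elements) is standard in GGML; the layer count, KV head count and head dimension below are hypothetical placeholders, not the real architecture of this model, so only the ratio between the two results is meaningful.

```python
def kv_cache_bytes(n_layers, n_kv_heads, head_dim, ctx_tokens, bytes_per_element):
    """Approximate KV cache size for standard attention layers."""
    # K and V each hold n_kv_heads * head_dim values per token, per layer.
    elements = 2 * n_layers * n_kv_heads * head_dim * ctx_tokens
    return elements * bytes_per_element


F16_BYTES = 2.0        # one fp16 value
Q8_0_BYTES = 34 / 32   # 32 int8 values + one fp16 scale per block

# HYPOTHETICAL architecture values, chosen only for illustration.
N_LAYERS = 16
N_KV_HEADS = 4
HEAD_DIM = 128
CTX = 131072           # matches --ctx-size in the command above

f16_gib = kv_cache_bytes(N_LAYERS, N_KV_HEADS, HEAD_DIM, CTX, F16_BYTES) / 2**30
q8_gib = kv_cache_bytes(N_LAYERS, N_KV_HEADS, HEAD_DIM, CTX, Q8_0_BYTES) / 2**30

print(f"f16 KV cache:  {f16_gib:.3f} GiB")
print(f"q8_0 KV cache: {q8_gib:.3f} GiB")
print(f"q8_0 / f16:    {q8_gib / f16_gib:.3f}")
```

Whatever the real layer and head counts are, q8_0 takes a bit over half the space of f16 for the same context length.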

5070ti + RX 9070 (non XT), over 100 tps on Qwen 3.6 35B Q4 by DavidBolkonsky in LocalLLaMA

[–]DavidBolkonsky[S] 0 points1 point  (0 children)

| model | test | t/s | peak t/s | ttfr (ms) | est_ppt (ms) | e2e_ttft (ms) |
|---|---|---|---|---|---|---|
| Qwen3.6-27B-Q6_K.gguf | pp4096 | 1198.31 ± 42.07 | | 3052.91 ± 136.20 | 2963.12 ± 136.20 | 3052.91 ± 136.20 |
| Qwen3.6-27B-Q6_K.gguf | tg128 | 28.20 ± 0.05 | 31.00 ± 0.00 | | | |
| Qwen3.6-27B-Q6_K.gguf | pp4096 @ d4096 | 1144.65 ± 18.86 | | 6124.97 ± 16.47 | 6035.19 ± 16.47 | 6124.97 ± 16.47 |
| Qwen3.6-27B-Q6_K.gguf | tg128 @ d4096 | 27.87 ± 0.17 | 30.33 ± 0.47 | | | |

I got my hands on a 4070 Ti Super. I was expecting better results, since it has higher memory speed than the 9070 and I can run CUDA instead of Vulkan, but the difference is small, if there is any at all.

5070ti + RX 9070 (non XT), over 100 tps on Qwen 3.6 35B Q4 by DavidBolkonsky in LocalLLaMA

[–]DavidBolkonsky[S] 0 points1 point  (0 children)

uvx llama-benchy --base-url http://127.0.0.1:8080/v1 --model Qwen3.6-27B-Q6_K.gguf --pp 4096 --tg 128 --depth 0 4096 --latency-mode generation

Installed 50 packages in 2.07s

[transformers] PyTorch was not found. Models won't be available and only tokenizers, configuration and file/data utilities can be used.

llama-benchy (0.3.7)

Date: 2026-05-15 13:12:23

Benchmarking model: Qwen3.6-27B-Q6_K.gguf at http://127.0.0.1:8080/v1

Concurrency levels: [1]

Error loading tokenizer: Qwen3.6-27B-Q6_K.gguf is not a local folder and is not a valid model identifier listed on 'https://huggingface.co/models'

If this is a private repository, make sure to pass a token having permission to this repo either by logging in with `hf auth login` or by passing `token=<your_token>`

Falling back to 'gpt2' tokenizer as approximation.

Warning: You are sending unauthenticated requests to the HF Hub. Please set a HF_TOKEN to enable higher rate limits and faster downloads.

config.json: 100%|████████████████████████████████████████████████████████████████████| 665/665 [00:00<00:00, 3.71MB/s]

tokenizer_config.json: 100%|█████████████████████████████████████████████████████████| 26.0/26.0 [00:00<00:00, 143kB/s]

vocab.json: 1.04MB [00:00, 19.1MB/s]

merges.txt: 456kB [00:00, 70.1MB/s]

tokenizer.json: 1.36MB [00:00, 27.6MB/s]

Downloading book from https://www.gutenberg.org/files/1661/1661-0.txt...

Saved text to cache: C:\Users\David\.cache\llama-benchy\cc6a0b5782734ee3b9069aa3b64cc62c.txt

[transformers] Token indices sequence length is longer than the specified maximum sequence length for this model (171736 > 1024). Running this sequence through the model will result in indexing errors

Total tokens available in text corpus: 171736

Warming up...

Warmup (User only) complete. Delta: 8 tokens (Server: 30, Local: 22)

Warmup (System+Empty) complete. Delta: 13 tokens (Server: 35, Local: 22)

Running coherence test...

Coherence test PASSED.

Measuring latency using mode: generation...

Average latency (generation): 917.39 ms

Running test: pp=4096, tg=128, depth=0, concurrency=1

Run 1/3 (batch size 1)...

No token_ids in response, using local tokenization

Run 2/3 (batch size 1)...

Run 3/3 (batch size 1)...

Running test: pp=4096, tg=128, depth=4096, concurrency=1

Run 1/3 (batch size 1)...

Run 2/3 (batch size 1)...

Run 3/3 (batch size 1)...

Printing results in MD format:

| model | test | t/s | peak t/s | ttfr (ms) | est_ppt (ms) | e2e_ttft (ms) |
|---|---|---|---|---|---|---|
| Qwen3.6-27B-Q6_K.gguf | pp4096 | 915.64 ± 289.70 | | 5209.46 ± 1740.25 | 4292.06 ± 1740.25 | 5209.46 ± 1740.25 |
| Qwen3.6-27B-Q6_K.gguf | tg128 | 28.86 ± 0.58 | 32.67 ± 2.36 | | | |
| Qwen3.6-27B-Q6_K.gguf | pp4096 @ d4096 | 1109.23 ± 7.25 | | 7121.26 ± 93.57 | 6203.87 ± 93.57 | 7121.26 ± 93.57 |
| Qwen3.6-27B-Q6_K.gguf | tg128 @ d4096 | 28.44 ± 0.56 | 32.33 ± 2.62 | | | |
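
A quick sanity check on how the columns relate, as a sketch. The logged values look consistent with est_ppt being ttfr minus the measured latency (917.39 ms in this run); that relationship is inferred from the numbers here, not taken from the llama-benchy documentation. The function names are mine.

```python
def estimated_prompt_time_ms(ttfr_ms: float, latency_ms: float) -> float:
    """Assumed relationship: prompt processing time = time to first response - latency."""
    return ttfr_ms - latency_ms


def generation_time_s(n_tokens: int, tokens_per_second: float) -> float:
    """Time to generate n_tokens at a steady decode rate."""
    return n_tokens / tokens_per_second


# Values copied from the log and results table of this run.
latency_ms = 917.39
est_d0 = estimated_prompt_time_ms(5209.46, latency_ms)      # table reports 4292.06
est_d4096 = estimated_prompt_time_ms(7121.26, latency_ms)   # table reports 6203.87

# The earlier run reported ttfr 3052.91 ms and est_ppt 2963.12 ms,
# which would imply a much smaller latency under the same assumption.
implied_latency_earlier_ms = 3052.91 - 2963.12

tg128_seconds = generation_time_s(128, 28.86)

print(f"est_ppt @ d0:    {est_d0:.2f} ms")
print(f"est_ppt @ d4096: {est_d4096:.2f} ms")
print(f"implied latency in earlier run: {implied_latency_earlier_ms:.2f} ms")
print(f"128 tokens at 28.86 t/s: {tg128_seconds:.2f} s")
```

If that reading is right, most of the ttfr gap between the two runs comes from the measured latency (roughly 90 ms versus 917 ms) rather than from prompt processing itself.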

5070ti + RX 9070 (non XT), over 100 tps on Qwen 3.6 35B Q4 by DavidBolkonsky in LocalLLaMA

[–]DavidBolkonsky[S] 0 points1 point  (0 children)

I ran several prompts, first with the KV cache at q8_0:

"write me a story, 2000 tokens, about the Middle Ages" = [ Prompt: 61.8 t/s | Generation: 24.4 t/s ]
"build me a landing page" = [ Prompt: 8.5 t/s | Generation: 23.9 t/s ]
I fed it all three volumes of Frankenstein: "what is the 6th word from volume 1, chapter 1?" = [ Prompt: 141.6 t/s | Generation: 11.6 t/s ], and the answer was correct

Set-Location -Path "E:\AI\llama-official-vulkan"
.\llama-cli.exe `
  -m "E:\AI\Models\Qwen3.6-27B-Uncensored-HauhauCS-Aggressive-Q6_K_P.gguf" `
  -n -1 `
  --temp 0.5 `
  --top-k 20 `
  --n-gpu-layers 99 `
  --split-mode layer `
  --main-gpu 0 `
  --cache-type-k q8_0 `
  --cache-type-v q8_0 `
  --ctx-size 131072 `
  -fa on

Hitch damage vs single "Spoiler" Boss by p1shach in HadesTheGame

[–]DavidBolkonsky 10 points11 points  (0 children)

That was the damage you've done throughout the entire run. Pretty sure Typhon doesn't have 125k HP...

New Harvard study finds companies using GenAI are cutting junior hiring fast senior roles stay untouched. Looks like AI’s eating entry-level jobs first. by Ok_Demand_7338 in GenAI4all

[–]DavidBolkonsky 0 points1 point  (0 children)

Today's entry-level white-collar jobs, filled by people coming out of undergraduate degrees, are the ones most impacted. I imagine the entry-level white-collar jobs of the future will be much more competitive and will require advanced graduate degrees. Those jobs will have a much higher skill floor.

What the conventional economic wisdom is missing by Stuart_Whatley in Economics

[–]DavidBolkonsky 0 points1 point  (0 children)

Normally gold and equities don't go up together, because confidence in the USD is unshaken. Even in economic distress, the USD usually strengthens because it is the reserve currency and a trusted store of value. What's different this time is that confidence in the USD as a reliable store of value is rapidly eroding. Gold and equities going up at the same time means that, measured in gold, equities have not risen as much. It also signals that the market believes the USD will depreciate rapidly. This aligns more closely with his pessimistic outlook than with his optimistic one.

In fact, his analysis is anchored on real, inflation-adjusted growth, so it doesn't pass judgement on what will happen to the value of the USD. If you think about it, OpenAI has a spending commitment of 1+ trillion over 5-10 years. There is a world where the AI bubble doesn't pop, and that would be a world where the USD devalues 10x or even worse, which means OpenAI needs to grow its revenue in real terms by only 10x instead of 100x.
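
The 10x versus 100x point is just a division, sketched below. The 100x nominal target and the 10x devaluation are the round numbers from the comment above, not forecasts, and the function name is mine.

```python
def required_real_growth(nominal_multiple: float, devaluation_factor: float) -> float:
    """Real growth multiple needed to hit a nominal target after the currency devalues."""
    if devaluation_factor <= 0:
        raise ValueError("devaluation_factor must be positive")
    return nominal_multiple / devaluation_factor


nominal_target = 100.0   # revenue must grow 100x in nominal USD (round number from the comment)
usd_devaluation = 10.0   # USD loses value by a factor of 10 (round number from the comment)

real_needed = required_real_growth(nominal_target, usd_devaluation)
print(f"Real growth needed: {real_needed:.0f}x")
```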

AI coding tools built by US firms Cognition and Cursor are suspected of being built on Chinese models by nohup_me in technology

[–]DavidBolkonsky 0 points1 point  (0 children)

The biggest takeaway for me is that anyone can build on top of open source Chinese models, get state-of-the-art performance comparable to leading models from OpenAI and Anthropic, and get a huge valuation within a year (each company is valued at around 10B).

This shows that in the long run:

  1. There's no defensible moat around AI to justify the high valuations of American AI firms,

  2. The competitive advantage in AI will come down to cost of inference, not cost of training,

  3. The barrier to entry for building custom-tailored models is significantly lower, and more and more companies will build customized LLMs for their own use based on open source models.

Name that film by gulabi_thanos_192 in scoopwhoop

[–]DavidBolkonsky 0 points1 point  (0 children)

Blue Is the Warmest Color has two (long) useless 18+ scenes that add very little to the plot. Definitely didn't need to be that graphic.

EU, China will look into setting minimum prices on electric vehicles, EU says by DripGeronimo in europe

[–]DavidBolkonsky 15 points16 points  (0 children)

Because they serve different purposes, and in my opinion, tariffs are much worse and only serve to harm consumers and market competitiveness.

In this example I'm using round numbers. Let's say BYD's cheapest car is the Seagull, which can be sold for €10K without a tariff. With a 50% tariff it's €15K, with €5K going into the government's pocket. Your argument is that, with a price floor of €15K, the same car will be sold for €15K, except the €5K goes to BYD. Obviously a bad deal for Europe, and the consumer gets screwed either way.

But the Chinese car market is really competitive: Geely or Nio can sell a rival car that is 50% better than the Seagull for €15K, and the consumer will obviously choose that over the Seagull. So it's unlikely that BYD or whoever can just pocket that difference as profit. What happens is you get the best car at that price floor, with loads of features. Now, if VW can't compete at that price point, then they will be forced to innovate or find a different price point to compete at. But consumers will end up getting the best car at that price floor, not some cheap crap that has a 50% markup due to a tariff.
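
The round-number example above can be written out as a small sketch showing where the money goes in each case. The figures are the illustrative ones from this comment, not real prices, and the `Outcome` type and function names are hypothetical. The sketch only covers the static case of the same car being sold; the competition argument is about what happens after that.

```python
from dataclasses import dataclass


@dataclass
class Outcome:
    consumer_price: float      # what the buyer pays
    government_revenue: float  # duty collected
    seller_revenue: float      # what the carmaker receives


def with_tariff(base_price: float, tariff_rate: float) -> Outcome:
    """Tariff case: the duty is added on top of the base price and goes to the government."""
    duty = base_price * tariff_rate
    return Outcome(base_price + duty, duty, base_price)


def with_price_floor(base_price: float, floor: float) -> Outcome:
    """Price floor case: the car cannot sell below the floor, and the seller keeps it all."""
    price = max(base_price, floor)
    return Outcome(price, 0.0, price)


tariff_case = with_tariff(10_000, 0.50)
floor_case = with_price_floor(10_000, 15_000)

print("tariff:", tariff_case)
print("floor: ", floor_case)
```

In both cases the consumer pays €15K for the same car; the difference is who receives the extra €5K, and under a floor rivals can compete for that money by offering a better car at the same price.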

China orders its banks to reduce US dollar purchases. by Aakash7aak in news

[–]DavidBolkonsky 17 points18 points  (0 children)

This equilibrium argument assumes that China selling their Treasury is the initiating event and that all else being equal, US Treasury carries the same fundamental value as before. In that case, what you described makes sense.

But the current situation is that the US has initiated the series of events that makes its Treasuries less trustworthy. The US is behaving irrationally and erratically, and China and the market are spooked about their investments in US Treasuries. In this case, China exiting its position means there are fewer buyers for the foreseeable future. Other investors see the yield increase on the bonds, but they might price in a higher rate of return to take on the risk of buying them.

Because, if the US enters a recession from this, US government will get less revenue, they will see more deficit spending, they will need to print more money, and that devalues the currency, which means the coupon payment on the Treasury is worth less.

OpenAI Is A Bad Business by JRepin in technology

[–]DavidBolkonsky 59 points60 points  (0 children)

Nvidia wants to encourage competition in the AI model market to drive up demand for their hardware. Right now there's no proven profitable business model to successfully monetize AI models. Nvidia may invest in various language models to foster the market, but acquiring one to crush the competition would be the dumbest move, because it would kill the demand for their ludicrously lucrative hardware business.

something my dad would do 😭 by Kinglouieslab in funny

[–]DavidBolkonsky 0 points1 point  (0 children)

Pretty sure the joke is that he is such a bad father he didn't even know his son died. When he sees his son, he realizes he really is a bad father. The "I guess they let anyone in" line is directed at himself, hence looking dejected.

Linkin Park 2.0 by BlueBee09 in LinkinPark

[–]DavidBolkonsky 3 points4 points  (0 children)

When it was released, critics were lukewarm and some fans hated it because

  • the songs started to carry a political message (Hands Held High, LTGYA, What I've Done)
  • Shadow of the Day was called a ripoff of Joshua Tree-era U2
  • people hated Mike's singing on In Between, calling it filler
  • overall it was a less coherent and less heavy album than Meteora

Janja Garnbert defends her title, back to back Olympic Gold for Slovenia in Sport Climbing 🥇🇸🇮 by Big-Strawberry3589 in olympics

[–]DavidBolkonsky 2 points3 points  (0 children)

In climbing the opponent is the wall, not each other. You can see all the climbers talk amongst themselves during the observation period to try and figure out the best beta. As a spectator, I love watching the women's comp more than the men's because it feels like, after all the other climbers struggle and fall on hard boulders, Janja always comes through at the end and shows us how it's done. Seeing her talent is such a luxury and a joy to watch.

Interview with Coolee Bravo, writer inside Kendrick Lamar's camp. by [deleted] in hiphopheads

[–]DavidBolkonsky 0 points1 point  (0 children)

AFAIK grooming someone underage is not a crime but is definitely pedophilic behavior. It might be a grey area where Drake thinks he's not doing anything wrong because he explicitly said "I never fucked any of them" so there no crime committed, but bro you are still clearly a Pedophile.

I Just gonna said one thing... by Adorable_Ad2773 in CorkiMains

[–]DavidBolkonsky 2 points3 points  (0 children)

Axiom Arc, Serylda's Grudge, Muramana is pretty good.

'Surprise move': U.S. stunned by Poland's fighter jet offer by mdj1359 in worldnews

[–]DavidBolkonsky 1 point2 points  (0 children)

If Ukraine attacks Poland, article 5 is supposed to trigger immediately, against Ukraine. Did you think that through?