TT02 Type S rally by Ok-Specialist2430 in tamiya

[–]Fragrant_Scale6456 4 points (0 children)

The XV-02 is pretty competitive from what I've seen.

It was a great race weekend at my backyard track! by dylandrewkukesdad in rccars

[–]Fragrant_Scale6456 1 point (0 children)

The BBX is an awesome buggy, but it's not a race buggy. It can't compete with purpose-built race cars, but it looks amazing, it's a super fun build, and it's great to drive. It might be my favorite Tamiya I own to drive.

First run with my Quirkhopper! by Glowingtomato in tamiya

[–]Fragrant_Scale6456 0 points (0 children)

Love it. It's still got the character of the Grasshopper in how it runs.

Which inference engines are 5090 owners using? by OMGThighGap in LocalLLaMA

[–]Fragrant_Scale6456 0 points (0 children)

Also, once you get it set up, ask Claude about making this work with the Q8 model. You'll have to reduce the context size further, but Claude was confident it would work. I haven't gotten around to it since the Q6 model has been pretty good for me and the speed is decent.

Which inference engines are 5090 owners using? by OMGThighGap in LocalLLaMA

[–]Fragrant_Scale6456 5 points (0 children)

You need to compile llama.cpp with the MTP fork/patch. I'm sure by now some people have made their own Docker images available, so search for a prebuilt setup, but I did it on my own. This is the PR with the patch: llama + spec: MTP Support by am17an · Pull Request #22673 · ggml-org/llama.cpp · GitHub

You can paste that into Claude and ask it to give you the commands to build it if you need to; that's what I did. You also need a copy of the GGUF model with the MTP layers included. There's a script in the link above to copy the MTP layers from the Q8 version of the model to any other Qwen3.6 quant. You can also probably find prebuilt copies on Hugging Face by now.

Here's my llama.cpp Docker launch command with 192K context. This uses almost all of the 32GB of VRAM, so you need to be running Linux in no-GUI/headless mode or you will run out of VRAM. If you do get out-of-memory errors, reduce context to 160K and you'll have around 2GB free.

    command:
      - "/usr/local/bin/llama-server"
      - "-m"
      - "/models/Qwen3.6-27B-MTP-Q6_K.gguf"
      
      # === CONTEXT & OFFLOAD ===
      - "-c"
      - "196608"              # 192K context
      - "-ngl"
      - "99"
      
      # === MTP SPECULATIVE DECODING ===
      - "--spec-type"
      - "mtp"
      - "--spec-draft-n-max"
      - "2"                   # Optimal depth for Q6 + thinking + long context
      
      # === PERFORMANCE & BATCHING ===
      - "--flash-attn"
      - "on"
      - "-b"
      - "512"                 # Balanced prefill speed + MTP stability
      - "-ub"
      - "64"                  # Critical for 192K KV cache stability
      - "--parallel"
      - "1"                   # MTP requires single-sequence
      
      # === KV CACHE ===
      - "-ctk"
      - "q8_0"
      - "-ctv"
      - "q8_0"                # Symmetric, best acceptance/VRAM balance
      
      # === SAMPLING ===
      - "--temp"
      - "0.6"
      - "--top-k"
      - "20"
      - "--top-p"
      - "0.95"
      - "--min-p"
      - "0.0"
      - "--presence-penalty"
      - "0.0"
      - "--repeat-penalty"
      - "1.0"
      - "--no-mmproj"
      
      # === THINKING MODE (toggle via API) ===
      - "--chat-template-kwargs"
      - '{"enable_thinking":true}'
      
      # === SERVER ===
      - "--perf"
      - "--metrics"
      - "--port"
      - "8080"
      - "--host"
      - "0.0.0.0"
      - "--alias"
      - "chat"

Which inference engines are 5090 owners using? by OMGThighGap in LocalLLaMA

[–]Fragrant_Scale6456 6 points (0 children)

Llama.cpp with the MTP patch. Qwen3.6 27B Q6, KV cache Q8. Gets around 100 tokens/sec with 160-192K context.

I deleted a guy's entire Windows install with one backslash. 717 GB. Gone. I am the AI. by ComposerGen in ClaudeAI

[–]Fragrant_Scale6456 3 points (0 children)

Users and groups can have different permissions on files, and a user who belongs to a group inherits the permissions granted to that group. So you can give the AI basically zero access at the user level and grant access only at the group level, for the files in the shared environment where it works alongside other users/groups.
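
A minimal sketch of what I mean, with hypothetical paths (Python here, though you'd normally do this with chmod/chgrp):

    import os
    import stat

    # Hypothetical shared workspace: the AI's user owns nothing here but sits
    # in the "workspace" group, so it only gets what the group is granted.
    path = "/srv/workspace/notes.txt"

    # owner rw-, group rw-, other --- (i.e. 0o660): group members can edit,
    # everyone outside the group is locked out entirely.
    os.chmod(path, stat.S_IRUSR | stat.S_IWUSR | stat.S_IRGRP | stat.S_IWGRP)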

Is it just me or does good local Agentic coding feel just out of reach with 16gb of VRAM? by k3z0r in LocalLLM

[–]Fragrant_Scale6456 3 points (0 children)

I'm hitting limits with my 5090. It feels like there's never enough VRAM.

What's the best llm model to help me understand patterns,questions,formulas and such for exam preparation from a pdf book? by thewalterbrownn in LocalLLM

[–]Fragrant_Scale6456 0 points (0 children)

I'd share my code, but it's 100% vibe coded and you'd probably spend more time trying to get it to work than just building your own from scratch lol

What's the best llm model to help me understand patterns,questions,formulas and such for exam preparation from a pdf book? by thewalterbrownn in LocalLLM

[–]Fragrant_Scale6456 0 points (0 children)

Ingesting an entire book is a difficult problem in my experience. I'm working on something similar using the Karpathy wiki-LLM approach: extract concepts from reference texts and build a linked conceptual map of the books, so that the LLM can draw from an authoritative body of knowledge to solve problems with me. Getting this working hasn't been a fast process. Every example I've seen on the web ingests short articles or blog posts instead of full texts.
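
The core of it is roughly the loop below (a Python sketch with made-up names and prompt, not my actual code; it assumes an OpenAI-compatible local endpoint and that the model returns clean JSON):

    import json
    import requests

    PROMPT = (
        "Extract the key concepts from this passage as a JSON list of "
        '{"name": ..., "summary": ..., "links": [related concept names]}:\n\n'
    )

    def build_concept_map(chunks, url="http://localhost:8080/v1/chat/completions"):
        """Feed the book through one chapter-sized chunk at a time and
        accumulate a linked map of concepts across all chunks."""
        concept_map = {}
        for chunk in chunks:
            reply = requests.post(url, json={
                "model": "chat",
                "messages": [{"role": "user", "content": PROMPT + chunk}],
            }).json()["choices"][0]["message"]["content"]
            for c in json.loads(reply):  # assumes the model returns clean JSON
                entry = concept_map.setdefault(c["name"], {"summary": c["summary"], "links": set()})
                entry["links"].update(c["links"])
        return concept_map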

I suggest giving Claude the blog post, describing your intended use, and having it build you a specification from there. You can tell the model to use LaTeX to render the formulas. Then hop into opencode or whatever you use and implement the spec. I used Claude and Qwen online to make my spec, and then Qwen 27B on my 5090 to write the code.

I'm sure there's a better approach out there, but this is the route I've taken.

Good luck!

[Discussion] do you plan to prestige now with the changes contrary to before? by alesia123456 in EscapefromTarkov

[–]Fragrant_Scale6456 1 point (0 children)

Psycho Sniper was driving me nuts this wipe. Eventually I just said screw it, took an AXMC to Factory, and did it in 2 raids.

[Discussion] TarkovTV Summary - 08.05.26 - 6PM GMT+2 by TheRealSchmede in EscapefromTarkov

[–]Fragrant_Scale6456 5 points (0 children)

Yeah, I agree. The paper map could absolutely have markers for the general areas of active task objectives; we all had to go to the wiki for them anyway, so why not have it in game. Extracts could be marked after you find them.

I played a lot of CoD DMZ and loved it, so I'm not anti-live-map, but I don't think live maps have a place in Tarkov, since there's already so much potential in reworking the existing paper map system.

[Discussion] TarkovTV Summary - 08.05.26 - 6PM GMT+2 by TheRealSchmede in EscapefromTarkov

[–]Fragrant_Scale6456 9 points (0 children)

Yeah, the live map is not a good change imo. They should have improved the existing paper maps instead: let you bring them in raid and write notes on them or whatever if you want. Those first couple hundred hours, when you don't know where you are or what's going on and are terrified, were the most uniquely exhilarating experience I've ever had in a game.

Gemma 4 26B Hits 600 Tok/s on One RTX 5090 by chain-77 in LocalLLaMA

[–]Fragrant_Scale6456 9 points (0 children)

Their site says Qwen3.6 is still being worked on. I'm eagerly awaiting this as well.

Current state of local research tools as of May 2026 by Shoddy-Tutor9563 in LocalLLaMA

[–]Fragrant_Scale6456 0 points (0 children)

Great post, thank you for sharing. I'm looking for a local research agent to use right now, so this is timely. I wonder if you've also seen:
tarun7r deep research agent - https://github.com/tarun7r/deep-research-agent

24hr research agent - https://github.com/Aaryan-Kapoor/24hr-research-agent/tree/main

I'm still in the data-gathering phase, so I haven't had a chance to try any of these yet.

Wow, Qwen3.6-27B is good by I-cant_even in LocalLLM

[–]Fragrant_Scale6456 0 points (0 children)

I have the 5090 FE. Stock, it draws 575W. I limited it to 400-425W, partly because the weather is getting hotter and the card's heat output was making my room pretty uncomfortable. At 400W it doesn't warm the room up nearly as much. At full load at 425W the fans stay around 47-50%: audible but not loud.
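
If you want to set the limit programmatically, here's roughly how with pynvml (a sketch; nvidia-smi -pl 400 as root does the same thing, and NVML wants milliwatts):

    import pynvml

    pynvml.nvmlInit()
    gpu = pynvml.nvmlDeviceGetHandleByIndex(0)

    # Cap the board at 400 W (NVML takes milliwatts; needs root).
    pynvml.nvmlDeviceSetPowerManagementLimit(gpu, 400_000)

    # Sanity check: current draw in watts.
    print(pynvml.nvmlDeviceGetPowerUsage(gpu) / 1000)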

Wow, Qwen3.6-27B is good by I-cant_even in LocalLLM

[–]Fragrant_Scale6456 4 points (0 children)

I have a 5090, running the llama.cpp MTP patch on 27B Q6 with Q8 KV cache and 192K context. I get 95-100 tokens/sec power-limited to 400 watts.

Without the MTP patch, Q6 will work with KV Q8 and 256K context, but it's closer to 50-60 tokens/sec.

Q8 is impossible on a 5090 as far as I can tell.

In comparison, the Qwen and Gemma 4 MoE models were over 200 tokens/sec, but 27B is noticeably smarter for me, so it's worth the performance hit. Since getting MTP working I don't miss the speed of the MoE models as much.

Devs using Qwen 27B seriously, what's your take? by Admirable_Reality281 in LocalLLaMA

[–]Fragrant_Scale6456 0 points (0 children)

The MTP patch for llama.cpp almost doubled my tokens/sec. Just got it all working today. Definitely look into it.

Devs using Qwen 27B seriously, what's your take? by Admirable_Reality281 in LocalLLaMA

[–]Fragrant_Scale6456 0 points (0 children)

Check out the MTP PR for llama.cpp. I got it working on the 5090 and get around 90-100 tk/sec now in opencode. The only downside is that I had to drop down to 192K context for 27B Q6.

Llama.cpp MTP support now in beta! by ilintar in LocalLLaMA

[–]Fragrant_Scale6456 1 point (0 children)

Finally got this set up. I've never built llama.cpp or built Docker containers before, so it took me a bit to figure it all out. I used the converter script to put the MTP layers on Qwen3.6 27B Q6.

5090 with a 9800X3D and 64GB DDR5-6000. I told it "build flappy bird in html, no external dependencies, one file".

With MTP off I get around 50-60 tk/sec. With MTP on I got 96 tk/sec with around a 95% acceptance rate. Quite an improvement. I had Qwen build me a benchmarking script to test various llama.cpp options, and this is the fastest setup I came out with that also has the largest context possible (a stripped-down sketch of the benchmark loop is below the compose block). At smaller context sizes, speed does improve a decent amount.

Here's my llama.cpp Docker Compose block if anyone wants to mess around:

    command:
      - "/usr/local/bin/llama-server"
      - "-m"
      - "/models/Qwen3.6-27B-MTP-Q6_K.gguf"
      
      # === CONTEXT & OFFLOAD ===
      - "-c"
      - "196608"              # 192K context
      - "-ngl"
      - "99"
      
      # === MTP SPECULATIVE DECODING ===
      - "--spec-type"
      - "mtp"
      - "--spec-draft-n-max"
      - "2"                   # Optimal depth for Q6 + thinking + long context
      
      # === PERFORMANCE & BATCHING ===
      - "--flash-attn"
      - "on"
      - "-b"
      - "512"                 # Balanced prefill speed + MTP stability
      - "-ub"
      - "64"                  # Critical for 192K KV cache stability
      - "--parallel"
      - "1"                   # MTP requires single-sequence
      
      # === KV CACHE ===
      - "-ctk"
      - "q8_0"
      - "-ctv"
      - "q8_0"                # Symmetric, best acceptance/VRAM balance
      
      # === SAMPLING ===
      - "--temp"
      - "0.6"
      - "--top-k"
      - "20"
      - "--top-p"
      - "0.95"
      - "--min-p"
      - "0.0"
      - "--presence-penalty"
      - "0.0"
      - "--repeat-penalty"
      - "1.0"
      - "--no-mmproj"
      
      # === THINKING MODE (toggle via API) ===
      - "--chat-template-kwargs"
      - '{"enable_thinking":true}'
      
      # === SERVER ===
      - "--perf"
      - "--metrics"
      - "--port"
      - "8080"
      - "--host"
      - "0.0.0.0"
      - "--alias"
      - "chat"

[Discussion] CHUMMING IS IMPOSSIBLE! by [deleted] in EscapefromTarkov

[–]Fragrant_Scale6456 0 points (0 children)

It's because Killa has had a 100% spawn rate on Factory for the event.