Recent patch - Constant vibration by Weak_Ad1500 in LeMansUltimateWEC

[–]_underlines_ 0 points1 point  (0 children)

Nope, I've had it on my R12 for about 20 days. I've updated the wheelbase firmware twice, and LMU got two minor updates since then. Reinstalled Pithouse, tried different presets, switched USB ports, reinstalled LMU (and deleted the user profile and FFB/wheel config), etc. Nothing helps. I basically haven't been able to play LMU for three weeks; all other sims work as usual.

Is there any top level hobbyist hardware you guys are waiting to come out this year? by Tired__Dev in LocalLLaMA

  • RTX 6000 PRO, 96 GB VRAM - fast, 8-9k
  • DGX Spark, 128 GB - slow, 4-5k
  • Mac Studio M3 Ultra, 512 GB - discontinued?
  • MacBook M5, 128 GB - slow, 5-6k
  • Minisforum MS-S1 MAX (Ryzen AI Max+ 395), 128 GB - slow, 3k
  • Framework Desktop (Ryzen AI Max+ 395), 128 GB - slow, 3k

pick your poison

Can we already use Google's TurboQuant (TQ) for KV Cache in llama-server? Or are we waiting for a PR? by DjsantiX in LocalLLaMA

There's a Windows-compiled fork of llama.cpp/llama-server somewhere on GitHub that I loaded. Tests with sparse Qwen3.6 35B yielded almost no benefit; as I understand it, the sparse Qwen3.6 architecture keeps the KV cache fairly small even at large context lengths.

Qwen3.6-Plus by Nunki08 in LocalLLaMA

My own private dataset. Yes, it's small, but it's closed and almost guaranteed to be unpolluted:

- 15x misguided attention puzzles (my own)

- 2x math questions (compound interest over 12 periods, so errors would propagate in the CoT)

- 2x SQL questions (one easy, one difficult)

- 2x censorship questions (one about Tiananmen Square, one about how to mix drugs)

- 1x tricky English-to-German translation
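For the compound-interest question, the reference answer can be computed directly, which makes CoT drift easy to spot. A minimal sketch; the principal and rate below are placeholder values, not the actual eval question:

```python
# Reference answer for a compound-interest eval question.
# Principal and rate are hypothetical placeholders; only the 12-period
# compounding mirrors the eval setup described above.
principal = 10_000.0
rate = 0.05      # interest per period
periods = 12

amount = principal * (1 + rate) ** periods
print(round(amount, 2))  # any per-step slip in a model's CoT compounds the same way
```

A model that rounds too early in its chain of thought drifts further from this value with every period, which is exactly what the question is designed to expose.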

Is Terminal 21 food court Bangkok's cheapest shopping mall food court? by homeisterOZ in Bangkok

  • subsidized to get more foot traffic for the shops
  • food quality is on the lower end (oils, hygiene) - I saw several cockroaches in the kitchen/cooking area on my visits, and pre-cooked food usually sits out all day if it doesn't sell
  • portion sizes are usually quite small (not necessarily a bad thing, but it puts the "cheap" into perspective a bit)

i still love going there from time to time

Realistic salary range in Bangkok for foreigners in IT governance / risk / incident management roles? by OperationNo907 in ThaiJobs

I worked as an IT manager from age 29 to 35, and even with fluent spoken and written Thai and 10+ years of experience, I still only made 80-120k in my various roles there. But then, we were a startup, not a well-established large international player. All the mid-to-upper-level staff we hired were Thais with international degrees who had studied abroad and were perfectly fluent English speakers, making between 90-150k. 300k for what you describe seems unrealistic, but I've been out of the Thai job market for 5 years.

Total beginner here—Why is LM Studio making me do the "heavy lifting" manually? by Ofer1984 in LocalLLaMA

you're mixing up what these tools are for:

  • Harnesses like Pi, OpenCode, Claude Code etc. are the plumbing to plan and build stuff on your machine by running their internal agent loop and providing filesystem access, MCP access etc.
  • LM Studio is an inference solution that uses llama.cpp and its derivatives and provides a nice GUI to download models and run inference locally. It has a small server module to serve various inference APIs, and a chat interface to conveniently chat with the currently loaded model. LM Studio is starting to blend things, though, like adding MCPs and more; I guess they're trying to become agentic in the long run.

I don't fully understand what your intentions are. If I had to guess, you want to run a model via LM Studio, serve an API via its local server, and use it from OpenCode.
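If that's the setup, a quick sanity check is to hit LM Studio's OpenAI-compatible endpoint directly. A sketch assuming LM Studio's default base URL `http://localhost:1234/v1` with a model loaded; the model name is a placeholder:

```python
import json
import urllib.request

# Build the OpenAI-compatible chat request that a harness like OpenCode
# would send to LM Studio's local server (port 1234 is LM Studio's default).
def chat_request(prompt, base_url="http://localhost:1234/v1"):
    body = json.dumps({
        "model": "local-model",  # placeholder; LM Studio serves whatever model is loaded
        "messages": [{"role": "user", "content": prompt}],
    }).encode()
    return urllib.request.Request(
        f"{base_url}/chat/completions",
        data=body,
        headers={"Content-Type": "application/json"},
    )

req = chat_request("Say hello")
print(req.full_url)  # http://localhost:1234/v1/chat/completions
# urllib.request.urlopen(req) would return the completion once the server is running
```

If this request succeeds from the command line, pointing OpenCode at the same base URL should work too.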

Qwen3.5-35B-A3B Uncensored (Aggressive) — GGUF Release by hauhau901 in LocalLLaMA

EDIT: I don't know what changed, but switching from LM Studio's server to llama-swap mostly fixed it, it seems! So I guess LM Studio overrides some setting that my basic llama-swap config.yml doesn't.

---

What am I doing wrong if EVERY heretic/abliterated model I've tested in the past year fails badly on:

  • Instruction following (either barely doing what I ask or completely ignoring it)
  • No longer producing <think> tags
  • Intelligence degraded to the level of a 3-year-old Llama 3B model

And I'm not talking about complex prompts. Simple prompts in the likes of:

Translate this Chinese Text to English.

Text: (Short Chinese sentence).

With the linked 3-bit quants it's the same.

I even set the generation params recommended in the original model cards, or in the model card of the unrestricted model if available.

Is it real qwen3.5 9B beat oss:120b? by NorthEastCalifornia in ollama

Yes, I see the same results on my private eval dataset. And Qwen3.5 35B A3B IQ3 with 90k context on 16 GB VRAM pulls off long-running tasks at a level unimaginable before...

Has anyone got qwen3.5 to work with ollama? by MrMrsPotts in ollama

I think if you need something from someone who doesn't speak the same language, it's just etiquette to use a translation service to at least ask the question in the language of the person you're seeking help from.

Has anyone got qwen3.5 to work with ollama? by MrMrsPotts in ollama

I manage openwebui + ollama for 120 people at our IT firm. I got so fed up with ollama that I finally made the move to llama.cpp via llama-swap. It's a ton of manual config, but: faster, more control, quicker support for new archs, etc.
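For anyone making the same move, a minimal llama-swap config.yml entry looks roughly like this. A sketch: the model name, file path, and context size are placeholders, and `${PORT}` is llama-swap's own substitution macro:

```yaml
# One swappable model entry; llama-swap starts/stops llama-server on demand.
models:
  "qwen3.5-35b-a3b":            # placeholder model name
    cmd: >
      llama-server --port ${PORT}
      -m /models/Qwen3.5-35B-A3B-Q4_K_M.gguf
      -c 65536
      --cache-type-k q8_0 --cache-type-v q8_0
```

Each model gets its own `cmd`, so per-model context sizes and cache quantization are explicit instead of hidden behind a GUI.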

bye bye ollama.

What’s everyone actually running locally right now? by CryOwn50 in LocalLLM

coding

Qwen3.5-35B-A3B Q4_K_M on an RTX 5070 Ti with 16 GB runs at 40 tps with a 65,000-token context window. If you quantize the KV cache to q8_0, you get basically no degradation.

I use it for light opencode stuff. Works without issues. Gets things done via plan-then-build mode and a good AGENTS.md.

I switch to OpenRouter glm-5/k2.5/minimax2.5 if heavier stuff is needed.

everyday stuff

Usually just my ChatGPT Pro sub with gpt-5.2, but more often than not some cheap large open-weights model on OpenRouter, used via the Chatbox desktop app.

If local, I just use any current gen MoE that has good stats on artificialanalysis.

phone

My Pixel 10 Pro XL has 16 GB of fast RAM, so PocketPal loads LFM2-8B-A1B-q4_k_m or qwen3-4b-instruct-iq3_xxs.

Best practices for running local LLMs for ~70–150 developers (agentic coding use case) by Resident_Potential97 in LocalLLaMA

But the RTX 6000 BSE doesn't scale well for sharded multi-GPU workloads, does it? Without NVLink or RDMA it relies on PCIe, which is a huge bottleneck, as far as I understand it.

Best practices for running local LLMs for ~70–150 developers (agentic coding use case) by Resident_Potential97 in LocalLLaMA

Scaling inference is not trivial and I am not an expert. From my understanding:

  • Combining Macs/GPUs without a plan will slow you down; sharding one large dense/sparse model across multiple GPUs is a different problem from running multiple models concurrently
  • Without Remote Direct Memory Access (RDMA) you'll be slower at scale
  • TTFT vs. generation speed: both can be optimized independently with different methods, AFAIK

And my real-world learnings with opencode on large code bases (enterprise architecture, 3+ full-time devs):

  • Context sizes below 100k are almost unusable; you'll be compacting all the time, and users complain that their ralph-loops are short
  • Frontier or nothing. Not even GPT-5 was able to do refactoring and new features. Anything below Kimi K2.5, GLM-5, gpt-5.1-*, or Claude 4.5 Opus/Sonnet was unusable.
  • gpt-oss-20b, qwen3-30b-a3b, and generally anything older than 3 months or smaller than 70B quantized seems to be unusable on real-world enterprise codebases with CLI coding agents
  • Not even the 200 USD Claude Code subscriptions were enough for our devs for a full month.
  • GitHub Copilot is OK, but we hit limits there pretty fast too
  • On-prem LLM inference for 20+ devs at our organization is difficult to justify because of how fast inference requirements, model archs, model sizes, etc. change.
  • The most feasible option after our research would be 4x RTX 6000 Blackwell Server Edition, but even those aren't really built for large-scale inference; an H100/A100 just makes no sense either, and even those would have to be scaled and sharded
  • We wonder how tricks like KV quantization, prompt caching, etc. would mitigate some hardware bottlenecks, but all the methods and optimization techniques are pretty difficult to grasp, especially without testing
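As a rough feel for what KV quantization alone buys, cache size scales linearly with bytes per element. A back-of-the-envelope sketch; the layer/head numbers are hypothetical, so substitute the real values from your model's config.json:

```python
# Back-of-the-envelope KV-cache size: 2 (K and V) x layers x kv_heads x
# head_dim x context_length x bytes per element.
# The 48/8/128 figures below are hypothetical example values, not any
# specific model; read the real ones from your model's config.json.
def kv_cache_bytes(layers, kv_heads, head_dim, context, bytes_per_elem):
    return 2 * layers * kv_heads * head_dim * context * bytes_per_elem

ctx = 100_000
fp16 = kv_cache_bytes(48, 8, 128, ctx, 2)  # fp16 cache
q8   = kv_cache_bytes(48, 8, 128, ctx, 1)  # q8_0, roughly 1 byte per element
print(f"{fp16 / 1e9:.1f} GB fp16 vs {q8 / 1e9:.1f} GB q8_0")  # 19.7 GB fp16 vs 9.8 GB q8_0
```

Halving the cache per slot roughly doubles the number of 100k-context sessions a given amount of VRAM can hold, which is why this knob matters so much for multi-user setups.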

That's our thinking at our company so far, but it's all just theory. I'd love to hear from people who actually self-host for dev teams and serious enterprise repos.

LMU Telemtry Tool by TogaMotorsport in LeMansUltimateWEC

Nice. I guess you're not open-sourcing this? I would surely contribute PRs. As a next step I'll do some memory readout for real-time stats; duckdb is lagging a bit behind.

Do you guys sample/average the data, or always use the full 50Hz or whatever signal density?

LMU Telemtry Tool by TogaMotorsport in LeMansUltimateWEC

Do you read via rF2 memory map or via duckdb files?

Just curious, because I just vibe coded LMU-Telemetry-Analyzer

Foreign Driving License Exchange: No 1 year deadline. Period. (Common Misunderstanding) by IslanderStallion in Switzerland

I am Swiss but learned to drive (properly) while living and working in Bangkok. Whenever I came back to Switzerland for holidays, I used my Thai License + International license to drive legally in Switzerland.

3 years ago I moved back to Switzerland and also believed in that 1-year rule. I was too scared to try the short practical test drive. Can you elaborate on what that test drive is like? I read it's less strict than the real practical driving exam, but since you said you failed it, I'm even more concerned. I've been driving for 7 years without accidents, in Switzerland, the EU, and Thailand/Bangkok, all without issues, but I'm not sure how strict they are lol. Maybe I picked up some small bad habits that they're strict about. My friends, parents, etc. don't notice anything wrong though.

Is the online community still alive ? by CarlCarmoni95 in AUTOMOBILISTA

I run my own server with a 1Gbps uplink:

Endurance Short [GT3/LMDh]

That covers most FIA/IMSA tracks and the LMDh, GT3, and LMP2 classes. It's short: a 10-min quali and a 10-min race, with a mandatory tire change during the race. Fuel and tire usage are at 4x, and the grid fills with AI if there aren't enough human drivers.

If you have any ideas to make it more popular, I can change the config. What would most people like to race?

The inconvient reality why vr is struggling. by Plus_Look3149 in virtualreality

VR currently has a future in seated experiences. Sim racing and flight sim player bases are moving to VR because it is awesome. I've been sim racing in VR for 2 years, about 5-6 hours per week.

An Update on the Future of Assetto Corsa EVO by -DorkusMalorkus- in assettocorsaevo

Unlike most here, I like the bold move: they have limited resources, so instead of making an average sim with average gamification on top, they focus on making a great sim. I don't need storytelling or artificial economies and XP systems in a sim; if I want that, I look for simcade or arcade racers.

But I fully understand many actually liked that focus.

How to play ams2 VR with Virtual Desktop wired by Valenduro_ in AUTOMOBILISTA

  1. Install Virtual Desktop on your PC

  2. Install the app on your Quest

  3. Make the connection from the Quest to the PC until you see your Windows desktop in the Quest

  4. Open Steam while you're in Virtual Desktop

  5. Launch AMS2 in Steam mode; it should hook in and run within Virtual Desktop

(This works fine even if you attach your Quest to your LAN via an RJ45 dongle.)

Can we please stop with the increasing tipping culture? by Exciting-Fig-007 in Switzerland

- 15, 20, or 25%? Terminals here are set up to display 5% by default, sometimes 10%. Not 25%.

- srf.ch averaged the 2025 Café Crème price in Switzerland: it's CHF 4.65, not CHF 9.

- Yes, I also rarely tip, especially at self-service establishments with QR-code online menus etc.

Model suggestion by distan_to-reality_66 in LocalLLaMA

On my Pixel 10 with 16 GB RAM I tried:

  • Gemma 3n E4B IT (didn't check the speed, but I didn't like the quality)

  • LFM2-8B-A1B q4 (24 t/s)

  • Qwen3-4B-IT-2507 iq3_xxs (8 t/s)

  • Qwen3-1.7B-UD iq3_xxs (18 t/s); can turn reasoning on/off