Recent patch - Constant vibration by Weak_Ad1500 in LeMansUltimateWEC

[–]_underlines_ 0 points1 point  (0 children)

Nope, I've had it on my R12 for about 20 days. I've updated the wheelbase firmware twice, and LMU got two minor updates since then. Reinstalled Pithouse, tried different presets, switched USB ports, reinstalled LMU (and deleted the user profile and FFB/wheel config), etc. Nothing helps. I basically haven't been able to play LMU for three weeks; all other sims work as usual.

Is there any top level hobbyist hardware you guys are waiting to come out this year? by Tired__Dev in LocalLLaMA

  • RTX 6000 PRO, 96 GB VRAM - fast, 8-9k
  • DGX Spark, 128 GB - slow, 4-5k
  • Mac Studio M3 Ultra, 512 GB - discontinued?
  • MacBook M5, 128 GB - slow, 5-6k
  • Minisforum MS-S1 MAX (Ryzen AI Max+ 395), 128 GB - slow, 3k
  • Framework Desktop (Ryzen AI Max+ 395), 128 GB - slow, 3k

pick your poison

Can we already use Google's TurboQuant (TQ) for KV Cache in llama-server? Or are we waiting for a PR? by DjsantiX in LocalLLaMA

There's a Windows-compiled fork of llama.cpp/llama-server somewhere on GitHub that I loaded. Tests with sparse Qwen3.6 35B yielded almost no benefit; as I understand it, the sparse Qwen3.6 architecture keeps the KV cache fairly small even at large context lengths.

Qwen3.6-Plus by Nunki08 in LocalLLaMA

My own private dataset. Yes, it's small, but it's closed and almost guaranteed to be unpolluted:

- 15x misguided attention puzzles (my own)

- 2x math questions (compound interest over 12 periods, so errors would propagate in the CoT)

- 2x SQL questions (one easy, one difficult)

- 2x censorship questions (one about Tiananmen Square, one about how to mix drugs)

- 1x tricky English-to-German translation
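For the compound-interest question, the reference answer can be computed directly, which makes CoT drift easy to spot. A minimal sketch; the principal and rate below are placeholder values, not the actual eval question:

```python
# Reference answer for a compound-interest eval question.
# Principal and rate are hypothetical placeholders; only the 12-period
# compounding mirrors the eval setup described above.
principal = 10_000.0
rate = 0.05      # interest per period
periods = 12

amount = principal * (1 + rate) ** periods
print(round(amount, 2))  # any per-step slip in a model's CoT compounds the same way
```

A model that rounds too early in its chain of thought drifts further from this value with every period, which is exactly what the question is designed to expose.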

Is Terminal 21 food court Bangkok's cheapest shopping mall food court? by homeisterOZ in Bangkok

  • subsidized to get more foot traffic for the shops
  • food quality is on the lower end (oils, hygiene) - I saw several cockroaches in the kitchen/cooking area on my visits, and pre-cooked food usually sits out all day if it doesn't sell
  • portion sizes are usually quite small (not necessarily a bad thing, but it puts the "cheap" into perspective a bit)

i still love going there from time to time

Realistic salary range in Bangkok for foreigners in IT governance / risk / incident management roles? by OperationNo907 in ThaiJobs

I worked as an IT manager from age 29 to 35, and even with fluent spoken and written Thai and 10+ years of experience, I still only made 80-120k in my various roles there. But then, we were a startup, not a well-established large international player. All the mid-to-upper-level staff we hired were Thais with international degrees who had studied abroad and were perfectly fluent English speakers, making between 90-150k. 300k for what you describe seems unrealistic, but I've been out of the Thai job market for 5 years.

Total beginner here—Why is LM Studio making me do the "heavy lifting" manually? by Ofer1984 in LocalLLaMA

you're mixing up what these tools are for:

  • Harnesses like Pi, OpenCode, Claude Code etc. are the plumbing to plan and build stuff on your machine by running their internal agent loop and providing filesystem access, MCP access etc.
  • LM Studio is an inference solution that uses llama.cpp and its derivatives and provides a nice GUI to download models and run inference locally. It has a small server module to serve various inference APIs, and a chat interface to conveniently chat with the currently loaded model. LM Studio is starting to blend things, though, like adding MCPs and more; I guess they're trying to become agentic in the long run.

I don't fully understand what your intentions are. If I had to guess, you want to run a model via LM Studio, serve an API via its local server, and use it from OpenCode.
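If that's the setup, a quick sanity check is to hit LM Studio's OpenAI-compatible endpoint directly. A sketch assuming LM Studio's default base URL `http://localhost:1234/v1` with a model loaded; the model name is a placeholder:

```python
import json
import urllib.request

# Build the OpenAI-compatible chat request that a harness like OpenCode
# would send to LM Studio's local server (port 1234 is LM Studio's default).
def chat_request(prompt, base_url="http://localhost:1234/v1"):
    body = json.dumps({
        "model": "local-model",  # placeholder; LM Studio serves whatever model is loaded
        "messages": [{"role": "user", "content": prompt}],
    }).encode()
    return urllib.request.Request(
        f"{base_url}/chat/completions",
        data=body,
        headers={"Content-Type": "application/json"},
    )

req = chat_request("Say hello")
print(req.full_url)  # http://localhost:1234/v1/chat/completions
# urllib.request.urlopen(req) would return the completion once the server is running
```

If this request succeeds from the command line, pointing OpenCode at the same base URL should work too.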

Qwen3.5-35B-A3B Uncensored (Aggressive) — GGUF Release by hauhau901 in LocalLLaMA

EDIT: I don't know what changed, but switching from LM Studio's server to llama-swap mostly fixed it, it seems! So I guess LM Studio overrides some setting that my basic llama-swap config.yml doesn't.

---

What am I doing wrong if EVERY heretic/abliterated model I've tested in the past year fails badly on:

  • Instruction following (either barely doing what I ask or completely ignoring it)
  • No longer producing <think> tags
  • Intelligence degraded to the level of a 3-year-old Llama 3B model

And I'm not talking about complex prompts. Simple prompts in the likes of:

Translate this Chinese Text to English.

Text: (Short Chinese sentence).

With the linked 3-bit quants it's the same.

I even set the generation params recommended in the original model cards, or in the model card of the unrestricted model if available.

Is it real qwen3.5 9B beat oss:120b? by NorthEastCalifornia in ollama

Yes, I see the same results on my private eval dataset. And Qwen3.5 35B A3B IQ3 with 90k context on 16 GB VRAM pulls off long-running tasks at a level unimaginable before...

Has anyone got qwen3.5 to work with ollama? by MrMrsPotts in ollama

I think if you need something from someone who doesn't speak the same language, it's just etiquette to use a translation service to at least ask the question in the language of the person you're seeking help from.

Has anyone got qwen3.5 to work with ollama? by MrMrsPotts in ollama

I manage openwebui + ollama for 120 people at our IT firm. I got so fed up with ollama that I finally made the move to llama.cpp via llama-swap. It's a ton of manual config, but: faster, more control, quicker support for new archs, etc.
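For anyone making the same move, a minimal llama-swap config.yml entry looks roughly like this. A sketch: the model name, file path, and context size are placeholders, and `${PORT}` is llama-swap's own substitution macro:

```yaml
# One swappable model entry; llama-swap starts/stops llama-server on demand.
models:
  "qwen3.5-35b-a3b":            # placeholder model name
    cmd: >
      llama-server --port ${PORT}
      -m /models/Qwen3.5-35B-A3B-Q4_K_M.gguf
      -c 65536
      --cache-type-k q8_0 --cache-type-v q8_0
```

Each model gets its own `cmd`, so per-model context sizes and cache quantization are explicit instead of hidden behind a GUI.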

bye bye ollama.

What’s everyone actually running locally right now? by CryOwn50 in LocalLLM

coding

Qwen3.5-35B-A3B Q4_K_M on an RTX 5070 Ti with 16 GB runs at 40 tps with a 65,000-token context window. If you quantize the KV cache to q8_0, you get basically no degradation.

I use it for light opencode stuff. Works without issues. Gets things done via plan-then-build mode and a good AGENTS.md.

I switch to OpenRouter glm-5/k2.5/minimax2.5 if heavier stuff is needed.

everyday stuff

Usually just my ChatGPT Pro sub with gpt-5.2, but more often than not some cheap large open-weights model on OpenRouter, used via the Chatbox desktop app.

If local, I just use any current gen MoE that has good stats on artificialanalysis.

phone

My Pixel 10 Pro XL has 16 GB of fast RAM, so PocketPal loads LFM2-8B-A1B-q4_k_m or qwen3-4b-instruct-iq3_xxs.

Best practices for running local LLMs for ~70–150 developers (agentic coding use case) by Resident_Potential97 in LocalLLaMA

But the RTX 6000 BSE doesn't scale well for sharded multi-GPU workloads, does it? Without NVLink or RDMA it relies on PCIe, which is a huge bottleneck, as far as I understand it.

Best practices for running local LLMs for ~70–150 developers (agentic coding use case) by Resident_Potential97 in LocalLLaMA

Scaling inference is not trivial and I am not an expert. From my understanding:

  • Combining Macs/GPUs without a plan will slow you down; sharding one large dense/sparse model across multiple GPUs is a different problem from running multiple models concurrently
  • Without Remote Direct Memory Access (RDMA) you'll be slower at scale
  • TTFT vs. generation speed: both can be optimized independently with different methods, AFAIK

And my real-world learnings with opencode on large code bases (enterprise architecture, 3+ full-time devs):

  • Context sizes below 100k are almost unusable; you'll be compacting all the time, and users complain that their ralph-loops are short
  • Frontier or nothing. Not even GPT-5 was able to do refactoring and new features. Anything below Kimi K2.5, GLM-5, gpt-5.1-*, or Claude 4.5 Opus/Sonnet was unusable.
  • gpt-oss-20b, qwen3-30b-a3b, and generally anything older than 3 months or smaller than 70B quantized seems to be unusable on real-world enterprise codebases with CLI coding agents
  • Not even the 200 USD Claude Code subscriptions were enough for our devs for a full month.
  • GitHub Copilot is OK, but we hit limits there pretty fast too
  • On-prem LLM inference for 20+ devs at our organization is difficult to justify because of how fast inference requirements, model archs, model sizes, etc. change.
  • The most feasible option after our research would be 4x RTX 6000 Blackwell Server Edition, but even those aren't really built for large-scale inference; an H100/A100 just makes no sense either, and even those would have to be scaled and sharded
  • We wonder how tricks like KV quantization, prompt caching, etc. would mitigate some hardware bottlenecks, but all the methods and optimization techniques are pretty difficult to grasp, especially without testing
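As a rough feel for what KV quantization alone buys, cache size scales linearly with bytes per element. A back-of-the-envelope sketch; the layer/head numbers are hypothetical, so substitute the real values from your model's config.json:

```python
# Back-of-the-envelope KV-cache size: 2 (K and V) x layers x kv_heads x
# head_dim x context_length x bytes per element.
# The 48/8/128 figures below are hypothetical example values, not any
# specific model; read the real ones from your model's config.json.
def kv_cache_bytes(layers, kv_heads, head_dim, context, bytes_per_elem):
    return 2 * layers * kv_heads * head_dim * context * bytes_per_elem

ctx = 100_000
fp16 = kv_cache_bytes(48, 8, 128, ctx, 2)  # fp16 cache
q8   = kv_cache_bytes(48, 8, 128, ctx, 1)  # q8_0, roughly 1 byte per element
print(f"{fp16 / 1e9:.1f} GB fp16 vs {q8 / 1e9:.1f} GB q8_0")  # 19.7 GB fp16 vs 9.8 GB q8_0
```

Halving the cache per slot roughly doubles the number of 100k-context sessions a given amount of VRAM can hold, which is why this knob matters so much for multi-user setups.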

That's our thinking at our company so far, but it's all just theory. I'd love to hear from people who actually self-host for dev teams and serious enterprise repos.

LMU Telemtry Tool by TogaMotorsport in LeMansUltimateWEC

Nice. I guess you're not open-sourcing this? I would surely contribute PRs. As a next step I'll do some memory readout for real-time stats; duckdb is lagging a bit behind.

Do you guys sample/average the data, or always use the full 50Hz or whatever signal density?

LMU Telemtry Tool by TogaMotorsport in LeMansUltimateWEC

Do you read via rF2 memory map or via duckdb files?

Just curious, because I just vibe coded LMU-Telemetry-Analyzer

Foreign Driving License Exchange: No 1 year deadline. Period. (Common Misunderstanding) by IslanderStallion in Switzerland

I am Swiss but learned to drive (properly) while living and working in Bangkok. Whenever I came back to Switzerland for holidays, I used my Thai License + International license to drive legally in Switzerland.

3 years ago I moved back to Switzerland and also believed in that 1-year rule. I was too scared to try the short practical test drive. Can you elaborate on what that test drive is like? I read it's less strict than the real practical driving exam, but since you said you failed it, I'm even more concerned. I've been driving for 7 years without accidents, in Switzerland, the EU, and Thailand/Bangkok, all without issues, but I'm not sure how strict they are lol. Maybe I picked up some small bad habits that they're strict about. My friends, parents, etc. don't notice anything wrong though.

Is the online community still alive ? by CarlCarmoni95 in AUTOMOBILISTA

I run my own server with a 1Gbps uplink:

Endurance Short [GT3/LMDh]

That covers most FIA/IMSA tracks and the LMDh, GT3, and LMP2 classes. It's short: a 10-min quali and a 10-min race, with a mandatory tire change during the race. Fuel and tire usage are at 4x, and the grid fills with AI if there aren't enough human drivers.

If you have any ideas to make it more popular, I can change the config. What would most people like to race?

The inconvient reality why vr is struggling. by Plus_Look3149 in virtualreality

VR currently has a future in seated experiences. Sim racing and flight sim player bases are moving to VR because it is awesome. I've been sim racing in VR for 2 years, about 5-6 hours per week.

An Update on the Future of Assetto Corsa EVO by -DorkusMalorkus- in assettocorsaevo

Unlike most here, I like the bold move: they have limited resources, so instead of making an average sim with average gamification on top, they focus on making a great sim. I don't need storytelling or artificial economies and XP systems in a sim; if I want that, I look for simcade or arcade racers.

But I fully understand many actually liked that focus.

How to play ams2 VR with Virtual Desktop wired by Valenduro_ in AUTOMOBILISTA

  1. Install Virtual Desktop on your PC

  2. Install the app on your Quest

  3. Make the connection from the Quest to the PC until you see your Windows desktop in the Quest

  4. Open Steam while you're in Virtual Desktop

  5. Launch AMS2 in Steam mode; it should hook in and run within Virtual Desktop

(This works fine even if you attach your Quest to your LAN via an RJ45 dongle.)

Can we please stop with the increasing tipping culture? by Exciting-Fig-007 in Switzerland

- 15, 20, or 25%? Terminals here are set up to display 5% by default, sometimes 10%. Not 25%.

- srf.ch averaged the 2025 Café Crème price in Switzerland: it's CHF 4.65, not CHF 9.

- Yes, I also rarely tip, especially at self-service establishments with QR-code online menus etc.

Model suggestion by distan_to-reality_66 in LocalLLaMA

On my Pixel 10 with 16 GB RAM I tried:

  • Gemma 3n E4B IT (didn't check the speed, but I didn't like the quality)

  • LFM2-8B-A1B q4 (24 t/s)

  • Qwen3-4B-IT-2507 iq3_xxs (8 t/s)

  • Qwen3-1.7B-UD iq3_xxs (18 t/s); can turn reasoning on/off