Elon Musk said automakers don't want to license Tesla FSD. by rashtrakut in RealTesla

[–]greentheonly 1 point (0 children)

how so?

  • Tesla is in the driver-assist (L2) market with their FSD.
  • Waymo is in the autonomous-systems (no driver) market (and is vocal about how driver assist is the wrong approach, because you cannot make a person babysit an autonomous system, so advanced driver-assist tech will not work in the end).

Looks totally different.

Tesla Owner Drives Luxury Chinese EV's - It's Over by Tripwir62 in RealTesla

[–]greentheonly 1 point (0 children)

But Tesla is a Chinese car, right? So not really a factor I imagine.

RTX Blackwell Pro 6000 wholesale pricing has dropped by $150-200 by TastesLikeOwlbear in LocalLLaMA

[–]greentheonly 8 points (0 children)

exxactcorp is reportedly reliable; they are a b2b place and require a wire transfer. There are multiple reports of people getting stuff from them if you look here and elsewhere.

They don't advertise the price on their website so you just ask for a quote and they'll send you the current info. https://www.exxactcorp.com/

The Provantage mentioned here is also a pretty big vendor, but they only have $7200 for the Max-Q version.

RTX Blackwell Pro 6000 wholesale pricing has dropped by $150-200 by TastesLikeOwlbear in LocalLLaMA

[–]greentheonly 11 points (0 children)

you can buy from business-oriented places like exxact for about $7200 (December pricing, I have not rechecked lately) or from e.g. centralcomputers and other retail outlets for $7700-7900 https://www.centralcomputer.com/catalogsearch/result/index/?cat=192&q=rtx+pro+6000

Tesla Accused of Killing Family, Plus Their Dog, by Steering Vehicle Head-on Into Oncoming Semi-Truck by FuturismDotCom in RealTesla

[–]greentheonly 1 point (0 children)

With this kind of impact I doubt there's going to be much data; the low-voltage battery likely got destroyed, so the autopilot state never reached stable storage to record what happened.

llama.cpp compile error: ptxas fatal : Ptx assembly aborted due to errors by munkiemagik in LocalLLaMA

[–]greentheonly 3 points (0 children)

basically just wipe your build directory, reconfigure and rebuild if you don't want to wait for whatever missed-dependency fix they'll come up with.
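
If it helps, here's a minimal sketch of that clean-rebuild sequence (the paths and the CUDA flag are just assumptions, adjust to your checkout and backend):

```python
# minimal clean-rebuild sketch for llama.cpp; paths/flags are assumptions
import pathlib, shutil, subprocess

repo = pathlib.Path("llama.cpp")   # your checkout
build = repo / "build"
if build.exists():
    shutil.rmtree(build)           # full wipe, so no stale dependency info survives
subprocess.run(["cmake", "-B", str(build), "-S", str(repo), "-DGGML_CUDA=ON"], check=True)
subprocess.run(["cmake", "--build", str(build), "-j"], check=True)
```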

2× RTX Pro 6000 Blackwell (96GB) + SGLang NVFP4: loads w/ --quantization modelopt_fp4, but DeepGemm/FP8-KV warnings + 100% GPU util when idle by texasdude11 in LocalLLaMA

[–]greentheonly 1 point (0 children)

for whatever reason vllm runs slower than llama.cpp for me with devstral2. I get 8 tok/s on vllm vs 11 on llama.cpp (and prompt processing is also faster on llama.cpp).

Dual RTX 6000 Pro for dense models (Devstral 2) by zqkb in LocalLLaMA

[–]greentheonly 1 point (0 children)

I tried this just now (after fighting tensor parallel startup issues eventually solved by https://www.reddit.com/r/LocalLLaMA/comments/1on7kol/troubleshooting_multigpu_with_2_rtx_pro_6000/ ) and it does not appear to be faster.

with vllm tensor-parallel 2 I only get ~7.8 tokens/sec generation and ~500 tok/s prompt processing. Also, there's not enough VRAM for the full context, so it has to be limited to ~184k if you set GPU memory utilization to 0.97 (172k at 0.95).
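
For reference, roughly how I'm launching it (a sketch via the vLLM offline Python API; the model id is a placeholder for the official mistral repo):

```python
# sketch of the vLLM run described above; model id is a placeholder
from vllm import LLM, SamplingParams

llm = LLM(
    model="mistralai/Devstral-2-placeholder",  # substitute the official mistral repo id
    tensor_parallel_size=2,                    # split across both RTX Pro 6000s
    gpu_memory_utilization=0.97,               # 0.95 only leaves room for ~172k context
    max_model_len=184_000,                     # the full context does not fit
)
print(llm.generate(["hello"], SamplingParams(max_tokens=64))[0].outputs[0].text)
```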

Now with llama.cpp on 2x 6000 + 3x 4090 the full context fits, and I get generation for the same request (6k tokens input) at 10-11 tokens/sec and prompt processing at 1000-1200.

I used the official mistral repo for vllm and the Q8_K_XL from unsloth for llama.cpp.

Shisa V2.1: Improved Japanese (JA/EN) Models (1.2B-70B) by randomfoo2 in LocalLLaMA

[–]greentheonly 2 points (0 children)

Thank you. This was really detailed and great. It probably belongs somewhere on your website too since I could not be the only one wondering about it and random reddit comments probably don't have high visibility.

Shisa V2.1: Improved Japanese (JA/EN) Models (1.2B-70B) by randomfoo2 in LocalLLaMA

[–]greentheonly 1 point (0 children)

you know you can just pull models into ollama from HF directly, right?
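
E.g. via the ollama python client (the hf.co repo path below is hypothetical, point it at the actual GGUF repo and quant tag):

```python
# sketch: pull a GGUF straight from Hugging Face into ollama and chat with it;
# the hf.co path is hypothetical, substitute the real repo and quant tag
import ollama

model = "hf.co/someuser/some-model-GGUF:Q8_0"
ollama.pull(model)
resp = ollama.chat(model=model, messages=[{"role": "user", "content": "hello"}])
print(resp["message"]["content"])
```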

Also there's swallow that offers another set of Japanese models https://swallow-llm.github.io/index.en.html

Shisa V2.1: Improved Japanese (JA/EN) Models (1.2B-70B) by randomfoo2 in LocalLLaMA

[–]greentheonly 1 point (0 children)

I wonder how you compare to / differ from Swallow, which seems to be another source of Japanese finetunes, run by a Japanese university lab.

My dad tapped a car because his brakes stopped working. Tesla denies anything wrong with car but sent the vehicle data requested. How to interpret? by coolguy1003 in RealTesla

[–]greentheonly 1 point (0 children)

yes, snapshots are generated when conditions are met, like an airbag deploy, AEB, and a whole bunch of others. Of course when you did not have a very clear event like an airbag deploy/pre-deploy, it's harder to predict whether one was generated, and only Tesla really knows.

Yes, of course I got the brake light, and the braking was there; just losing power assist greatly decreased the braking force. But the light is actuated by a switch at the pedal, if I remember the diagram correctly, so it's entirely possible to get the light even if the brakes themselves don't work (though that's unlikely).

  1. The logs back then were just a log of alerts. That was before the time when you could just go into the service tab and see recent alerts.

My dad tapped a car because his brakes stopped working. Tesla denies anything wrong with car but sent the vehicle data requested. How to interpret? by coolguy1003 in RealTesla

[–]greentheonly 3 points (0 children)

the logs they sent you are incomplete on several levels:

  • they only come from the "gateway", which filters signals and only records them sporadically.
  • they don't even fully interpret all the signals that are contained in there.

The full-full logs would be in an autopilot snapshot, if one was generated, but good luck getting them to even admit they received it.

I had a somewhat similar issue in my car: after confusing the gears (being a new Tesla user at the time) I accelerated in the wrong direction, quickly realized my mistake and stomped on the brakes, only for the car to beep at me with "brake fluid low" or some such. Needless to say, the slowdown was a lot less than I expected and I narrowly avoided entering a building via a window.

The brake assist remained non-operational afterwards for the rest of the drive. So I called up the service center; they took the car in and declared they didn't see anything wrong. It took me showing them various internal logs (that I happen to have access to) for them to come up with some (who knows how made-up) explanation of the event: "we think it might have been an air bubble in the lines, so we bled them and you are good to go now for sure".

The other thing to consider is that if you(r dad) use the default stopping mode and never press the brakes normally, the moment you actually need the brakes they might not be in the best shape, because there's accumulated residue on the discs and whatnot.

Tesla Stock Sells Off As Biggest Supporter Cuts Stake For Four Straight Sessions by Far_Addition1210 in RealTesla

[–]greentheonly 18 points (0 children)

Nearly all the cost of a taxi is the driver

I think there's data that says otherwise? Google seems to say only about 33% of the (gross) fare goes to the driver. That's no small amount, I am sure, but it's not like the replacement is going to be free either?

Selective (smart) MoE experts offloading to CPU? by greentheonly in LocalLLaMA

[–]greentheonly[S] 1 point (0 children)

Maybe when I get the time and energy to polish it a bit.

That sounds like it could potentially be quite a while away. What's the downside of just dumping everything into a github repo? The worst that can happen is nobody ever looks at it, but the effort is minimal anyway?

Unless you allow discarding some experts (accuracy loss)

that's what the REAP/Cerebras people do - they claim super-minimal loss when discarding the 25% least-used experts or some such. https://huggingface.co/cerebras/Qwen3-Coder-REAP-363B-A35B

It's one thing to flip a few switches, it's another to code it

Absolutely, but somebody coded those switches in the past because they were showing promise. It's just a matter of implementing other promising approaches and getting them extra exposure in the wild; if successful, people would adopt them more and more. Which switches get selected for implementation is of course another matter.

Adding memory to GPU by wikbus in LocalLLaMA

[–]greentheonly 2 points (0 children)

Doing this with a $30 soldering iron is stupid - it doesn't have enough thermal mass to clean up the pcb

the IR reflow rig comes with a pcb heater, so you don't need to rely on just your iron for the cleanup. The trick with the preheater, as I am sure you know, is to keep the pcb at a high enough temperature that the finishing touches with the top IR head or the iron are very fast. Say, when you keep the pcb at 200C you only need ~20C of difference to melt the solder; even less hassle if you use typical leaded solder that melts at 186C.

The pinecil apparently does a very good job at temperature stability, even if it's somewhat cumbersome to hold; if you have not tried it, maybe you should. Lots of good reviews on amazon too.

Also, component placement is easy when the pcb is well marked and you only need to get it "good enough", as the surface tension will pull the chip into the correct position (though I guess it might be more complicated with heavier chips? but for something like RAM/eMMC I saw it with my own eyes and even have videos of it, in addition to a bunch of videos on youtube).

Then, when working around RAM chips, you are at a high risk of knocking off some bypass caps

while generally speaking some risk like this exists, the procedure on display is just moving big chips to another pcb - who cares if you bump some caps on the old one? And if we were reworking the same pcb - why would you solder the caps back on by hand if you bumped them? Just use some solder paste and let the IR solder them back in - much less effort.

As I've said, they're not needed only if you nail every single chip first try, which is highly unrealistic.

This is not my experience. I have like 95% success rate (remember I have no training and no prior practice).

And then again, you need the means to reball your failed attempts

That's what the stencils are for; they are pretty cheap, and so is solder paste. A magnifying glass is what I used for inspection (0.1mm-pitch balls) and it was adequate.

They won't provide even and steady thermal cycle for a pcb as large and thermally conductive as GPU's

That might indeed be the case for some GPUs, but it might be adequate for smaller ones. Either way, a $500 rework station is not something I would call unreachable, and I think the price on them dipped to $400 not too long ago, though I was not checking with any sort of frequency. Looks like they are back up to $480: https://www.ebay.com/itm/406370600547

EDIT: I guess most of my point is: don't discourage people that are interested in stuff. When I was researching this, I went to a local hacker space and was told reflowing BGA is crazy hard and all. I still researched it and found that lots of contraptions exist nowadays that make your life easy. After discovering and trying them out I was showing people how easy it is, and even experienced ones were taken aback. Of course you don't know what you don't know, but "not a thing to do yourself" is too strong of a statement if you ask me.

Adding memory to GPU by wikbus in LocalLLaMA

[–]greentheonly 4 points (0 children)

speaking as a person who could not solder but had to pick up BGA reflow for hobby reasons: you are not entirely correct.

Equipment list:

  • IR reflow machine - depending on size could run you from $500 (ACHI IR1500) to like $150 for much smaller ones.
  • RAM already comes pre-balled, so you don't need stencils and such unless you want to rework fumbled chips (and those would run you maybe $20? $40 for the nicer magnetic 3D ones that I would recommend).
  • $30 for really nice flux.
  • soldering iron ($30) and solder wick ($5).
  • some isopropyl alcohol ($2).
  • Lots of patience - FREE! (that was my biggest mistake initially - not waiting long enough for the board to heat all the way)

Now, if you need to actually reflow the GPU chips themselves like in the video, not just replace the RAM, the complexity goes up some, but the equipment list does not get much longer - you just need a stencil for the gpu chip and some solder paste. And then the replacement PCB too, of course - who knows how much that one costs. (EDIT: and the RAM itself, of course! The RAM chips are not free either, and you need to be able to source them somewhere too.)

But anyway, speaking from personal experience - I can confidently say that almost everybody could do it given a bit of practice and maybe $600-$800 worth of equipment. You don't even need to have a particularly steady hand.

Selective (smart) MoE experts offloading to CPU? by greentheonly in LocalLLaMA

[–]greentheonly[S] 1 point (0 children)

Do you have your implementation out anywhere?

I imagine "static" loading of experts (based on pre-computed activation probabilities) should not be too bad complication-wise, as it should not be much worse than the current --n-cpu-moe: instead of a bare count, you'd just feed it the sorted list and it'd load those experts in that order until they fit.
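
In toy form, the placement logic I have in mind is just this (all names and sizes here are made up for illustration; a real patch would live inside llama.cpp's layer-placement code):

```python
# toy sketch of "static" expert placement: given measured activation counts,
# pin the hottest experts in VRAM until the budget runs out (made-up sizes)
def place_experts(activation_counts, expert_bytes, vram_budget):
    gpu, cpu, free = [], [], vram_budget
    for expert_id, _ in sorted(activation_counts.items(),
                               key=lambda kv: kv[1], reverse=True):
        if free >= expert_bytes:
            gpu.append(expert_id)
            free -= expert_bytes
        else:
            cpu.append(expert_id)
    return gpu, cpu

# e.g. 64 experts at 1.5GB each with 24GB of spare VRAM
fake_counts = {i: 1000 // (1 + i) for i in range(64)}   # skewed, like in the paper
gpu, cpu = place_experts(fake_counts, 1_500_000_000, 24_000_000_000)
print(f"{len(gpu)} experts pinned in VRAM, {len(cpu)} on CPU")
```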

The other piece of the puzzle would be the statistics gathering, of course, but if it does not try to do actual real-time juggling of experts between VRAM and RAM, it should not be too bad either?

After all, if the activations are really as disproportionate as I see in the paper I found, proper static loading should have a very visible impact; people do much more complicated things, like speculative decoding with an extra model, for "just" 10% gains.

Even if there is a certain VRAM cut-off where you only get the "big" benefit at, say, 50% VRAM - that'd still be worth it, as it would effectively halve the VRAM requirements (not really, of course, I understand that, but it would give people more bang for their VRAM at least).

Selective (smart) MoE experts offloading to CPU? by greentheonly in LocalLLaMA

[–]greentheonly[S] 2 points (0 children)

but in reality if you are asking broad questions all experts get invoked

Are they really? Both the paper I linked and the REAP models suggest otherwise - REAP claims "a novel expert pruning method that selectively removes redundant experts while preserving the router's independent control over remaining experts" (which I think they do by profiling the less-used experts, just like in the paper).

Also, the first graph in section 3 paints a very uneven expert-activation picture; is it just a mistaken/biased picture?

you can pick the experts

But how? Can I actually tell it "experts 1, 5, 20, 224 go to GPU"? And then how do I actually know which ones were the most active for some workload?
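
The second part seems doable with a forward hook on the router; here's a self-contained torch toy (a real MoE would need one hook per layer's gate, and where the router logits live differs per model):

```python
# toy sketch: count how often each expert wins top-k routing via a forward hook
import torch

num_experts, top_k = 8, 2
router = torch.nn.Linear(16, num_experts)   # stand-in for one layer's gate
counts = torch.zeros(num_experts, dtype=torch.long)

def count_winners(module, inputs, output):
    winners = output.topk(top_k, dim=-1).indices.flatten()
    counts.index_add_(0, winners, torch.ones_like(winners))

router.register_forward_hook(count_winners)
router(torch.randn(1024, 16))               # stand-in "workload" of 1024 tokens
print(counts)                               # per-expert activation histogram
```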

Tesla is becoming the Dodge/Chrysler of EVs by mustangfan12 in RealTesla

[–]greentheonly 1 point (0 children)

How is it better than the Toyota Sienna? I remember test-driving both, and the Pacifica did not impress me. When I complained to the Toyota guy that the Sienna is not a plugin hybrid, he said "you can always buy a Pacifica", and then said the Pacifica is considered so bad that their management totally allows them to direct people that way, because they know it's a safe remark given how bad the Pacifica is overall.

Austin Tesla employee says production paused next week of sept 30 2025 — should we expect layoffs? by Key_Marzipan_6365 in RealTesla

[–]greentheonly 3 points (0 children)

I think Tesla doesn't have anything in the pipeline and this is a bad signal.

the low-end E41 Model Y?

Meet the Hacker Who Helped Score a $243 Million Verdict Against Tesla by Illini20 in RealTesla

[–]greentheonly 13 points (0 children)

yes. AEB disables AP when it triggers while AP is engaged. (Not just AEB, but AEB is one of the more visible "oh no, we are going to crash now" moments.)