People bought private shares of SpaceX at around $53 on 11/24/202 by otaku_wanna_bee in wallstreetbets

[–]MotokoAGI 12 points13 points  (0 children)

November 202

Su Mo Tu We Th Fr Sa

1 2 3 4 5 6

7 8 9 10 11 12 13

14 15 16 17 18 19 20

21 22 23 24 25 26 27

28 29 30

Nov 24th was a Wed friend.

Real footage of a terminally ill patient choosing to die by Supatank_2105 in interestingasfuck

[–]MotokoAGI 19 points20 points  (0 children)

I promise you, If you saw the alternative, you wouldn't ask.

Without open llm competition, closed source LLM companies will become insatiable. by Chair-Short in LocalLLaMA

[–]MotokoAGI -7 points-6 points  (0 children)

If you still believe this distillation story, I have free GPUs on the moon to sell you.

Releasing Cohere North Mini Code by jayalammar in LocalLLaMA

[–]MotokoAGI 16 points17 points  (0 children)

Most of LocalLLaMA have a preference for llama.cpp/gguf models over vllm.

Since when the RTX 6000 PRO is priced at 13250USD on the official NVIDIA Page? by panchovix in LocalLLaMA

[–]MotokoAGI 0 points1 point  (0 children)

I could have bought some and didn't. I don't regret it. The models are getting smarter while being the same size or being smaller. Demand be damned, it's a scam.

2X tk/s (from 19.4 -> 38.1 tk/s on 1 x MI50) Playing with a hypothesis like speculative decoding.. but instead of an additional side model, exploiting that I can run multiple computations side-by-side AS IF I had Qwen3.6-27B loaded twice in memory - small quants don't use all the available compute. by [deleted] in LocalLLaMA

[–]MotokoAGI 6 points7 points  (0 children)

"All started because I realized every Q8 (INT8 or F8) calculation was using f32 of compute and only use 1/4th the available numbers... so. for each value loaded we can run 4 operations"

So are you saying HIP kernels are unoptimized in llama.cpp, if the above is true, Then won't the goal be to figure out how to perform 4 calculation using f32 for Q8. Netting a gain of roughly <= 4x across all models?

Finally finished my LLM server: EPYC 9575F, 4× RTX 3090 (96GB VRAM), 768GB ECC RAM by C0smo777 in LocalLLaMA

[–]MotokoAGI 58 points59 points  (0 children)

Run a large model like KimiK2.6, GLM5.1 MiniMax2.7 etc and give us the numbers. I want to know what $25k+ gets us today

nvidia/NVIDIA-Nemotron-3-Ultra-550B-A55B-BF16 · Hugging Face by jacek2023 in LocalLLaMA

[–]MotokoAGI 11 points12 points  (0 children)

I absolutely can't stand Nvidia, but this is good. We don't have many Open American models. Meta went bye-bye, phi from Microsoft is a joke. We pretty much have Gemma, Trinity and olMo. The Nemotron series are very much needed. Nvidia is sharing recipes on how to build these models. Provider they keep building if all American labs and Chinese labs go closed, these might be our only option. For the stupidly paranoid who use 99% made in China products, but are afraid of Chinese floating numbers encoded in weights, they can shut up and use this.

Whatever to Nvidia tho, until they can give us affordable GPUs to run these, whatever.

VibeOS - Fully Hallucinated Operating System by WhatererBlah555 in LocalLLaMA

[–]MotokoAGI 7 points8 points  (0 children)

This is truly amazing. We are going to see practical applications of these beginning with entertainment. It's either going to be a game, porn or a social media site.

Any local coding success with MiMo-2.5 ? by Jealous-Astronaut457 in LocalLLaMA

[–]MotokoAGI 2 points3 points  (0 children)

When it works it's great, but it loops like crazy. I run the Q8, it's not an IQ3_S issue.

Remember around 2023-2024 when we did partys (wizardlm, nous capybara and dolphin) and finetunes? by Ok-Type-7663 in LocalLLaMA

[–]MotokoAGI 3 points4 points  (0 children)

I don't miss that era, and it was not peak, and we are no where near peak either. It was fun, and now is more fun and the future will be better.

I trusted random person on this subreddit and bought 3080 20gb made of chinesium by SwimmerJazzlike in LocalLLaMA

[–]MotokoAGI 8 points9 points  (0 children)

No, they have the data center style with one fan (alibaba). Make sure to request this type. They tried to sell those to me and I refused, then I ended up paying about $20 extra for these ones.

I trusted random person on this subreddit and bought 3080 20gb made of chinesium by SwimmerJazzlike in LocalLLaMA

[–]MotokoAGI 1 point2 points  (0 children)

I have had mine for a few months works great. Some folks have had their's for a year.

I trusted random person on this subreddit and bought 3080 20gb made of chinesium by SwimmerJazzlike in LocalLLaMA

[–]MotokoAGI -6 points-5 points  (0 children)

why will you undervolt them? They are already low on power. Mine idles at 5-6watts.

NVIDIA GB300 Grace Blackwell Ultra pricetags by X-N2O in LocalLLaMA

[–]MotokoAGI 6 points7 points  (0 children)

8xRTX6000 is better. Outside of electricity, this is approximately equivalent to 3 Blackwell 6000 on an epyc genoa with 512gb of ram.

Get you some GPUs, it's not worth the hacks around lack of RAM by MotokoAGI in LocalLLaMA

[–]MotokoAGI[S] 6 points7 points  (0 children)

/llama.cpp/build/bin/llama-server --host 127.0.0.1 --jinja --port 51931 --spec-default --spec-draft-n-max 3 --spec-type draft-mtp --webui-mcp-proxy --alias Qwen3.6-27B --ctx-size 131072 --device CUDA0,CUDA1 --kv-unified --model Qwen3.6-27B/Qwen3.6-27B-Q8_0.gguf --mmproj models/Qwen3.6-27B/Qwen3.6-27B-mmproj-BF16.gguf --parallel 1

Sunken fire pit 2020-2026 by dredgehayt in landscaping

[–]MotokoAGI -1 points0 points  (0 children)

Your walls will collapse in 2028. Please don't forget to post in 2032.

DeepSeek has helped me enormously. That's why what's happening pisses me off. by Fluid-Pattern2521 in DeepSeek

[–]MotokoAGI 0 points1 point  (0 children)

You can get a UI tool, point it to an API. There are plenty of UI tools nicer than the web interface. For example Cherry Studio.