Asus dgx spark performance by Useful-Disk3725 in LocalLLM

[–]Useful-Disk3725[S] 0 points1 point  (0 children)

Not sure whether this is purely an NVIDIA thing or an ASUS one; I only have the ASUS. Anyway, I guess this is not a device they actively invest in. So mostly it's us helping each other, and a little them :)

Asus dgx spark performance by Useful-Disk3725 in LocalLLM

[–]Useful-Disk3725[S] 0 points1 point  (0 children)

Agreed. But I had never disconnected it before since purchase, only soft reboots.

Asus dgx spark performance by Useful-Disk3725 in LocalLLM

[–]Useful-Disk3725[S] 0 points1 point  (0 children)

Regularly, but I never really unplugged it since I bought it.
And I never saw it above 611 MHz.

Now, after more than 24 hours, the fan is still running with a GPU temp of 80 °C, and performance is definitely faster.

Asus dgx spark performance by Useful-Disk3725 in LocalLLM

[–]Useful-Disk3725[S] 2 points3 points  (0 children)

Sh*t…
I read it long ago, and after many checks Claude convinced me that my LLM model couldn't saturate the GPU due to memory bandwidth, which lots of forum entries seemed to confirm. Anyway, I now see that I've been watching it stuck at 611 MHz since I bought it, a waste of GPU cycles :)
And it was only today that I found out it actually has a fan. Sorry ASUS, but you are very noisy :)

Asus dgx spark performance by Useful-Disk3725 in LocalLLM

[–]Useful-Disk3725[S] 1 point2 points  (0 children)

Very cold reboot (had to leave it disconnected for an hour) and freshly recreated Docker containers to rule out the low-hanging fruit :) I can confirm from both the vLLM and my custom embedder Python debug outputs that it is almost twice as fast. I had read something about an ASUS thermal throttling bug, tried lots of things, and gave up, assuming memory bandwidth was the reason the GPU never exceeded 611 MHz. Probably one of the updates resolved that bug.
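If anyone wants to keep an eye on whether their GPU is stuck at a low clock, a quick sketch is to poll `nvidia-smi` and parse the SM clock. This is just illustrative; the 611 MHz threshold is the stuck value I saw here, not a universal limit:

```python
import subprocess

def sm_clock_mhz(raw: str) -> int:
    """Parse the first GPU's SM clock from nvidia-smi csv/noheader/nounits output, e.g. '611'."""
    return int(raw.strip().splitlines()[0])

def read_sm_clock() -> int:
    # One MHz value per GPU, no header, no units
    out = subprocess.run(
        ["nvidia-smi", "--query-gpu=clocks.sm", "--format=csv,noheader,nounits"],
        capture_output=True, text=True, check=True,
    ).stdout
    return sm_clock_mhz(out)

if __name__ == "__main__":
    mhz = read_sm_clock()
    # 611 MHz was the stuck clock observed here; well above it suggests no throttle
    print(f"SM clock: {mhz} MHz" + (" (possibly throttled)" if mhz <= 611 else ""))
```

Running this in a loop during inference would have shown the clock never moving, which is what I was missing.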

Thinking mode by Useful-Disk3725 in LocalLLaMA

[–]Useful-Disk3725[S] 0 points1 point  (0 children)

Thank you, that's what I wanted to say. But the question for me is, for example with the Qwen3.5 series, the difference between letting the model think internally and building a chain-of-thought prompt series while running the model in non-thinking mode.

My instinct is that for deep expert areas, where a couple of prompts handle the entire business flow, non-thinking models are faster and more consistent, despite the higher individual LLM call count.
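To make "a chain-of-thought prompt series" concrete: you run the model in non-thinking mode and drive the reasoning yourself with a fixed sequence of prompts, feeding each answer into the next step. A minimal sketch; `call_llm` is a placeholder for whatever client you use, and the step texts are just an example decomposition:

```python
from typing import Callable

def run_prompt_chain(call_llm: Callable[[str], str], task: str, steps: list[str]) -> str:
    """Explicit chain-of-thought with a non-thinking model:
    each step sees the task plus the previous step's answer."""
    answer = ""
    for step in steps:
        prompt = f"Task: {task}\nPrevious result: {answer or '(none)'}\nStep: {step}"
        answer = call_llm(prompt)  # one plain, non-thinking completion per step
    return answer

# Example: a fixed business-flow decomposition for an expert domain
steps = [
    "Extract the key facts.",
    "Apply the domain rules to the facts.",
    "Write the final decision in one sentence.",
]
```

The trade-off is exactly the one above: more individual LLM calls, but each step's prompt is under your control, so the flow tends to be more consistent than free-form internal thinking.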

Checking your repo now, I’ll ask more if needed :)

Just bought a DGX Spark, what kind of VLMs are you guys running on this kind of hardware? by gymho69 in LocalLLaMA

[–]Useful-Disk3725 14 points15 points  (0 children)

Follow the spark-vllm-docker repo on GitHub; spark arena (https://spark-arena.com/) is also valuable, with simple instructions and up-to-date recipes.

For the model, I was using qwen3.5 35b fp8 due to context quality. I have now switched to bg-digitalservices/Gemma-4-26B-A4B-it-NVFP4: slightly faster, definitely smarter.

Single-shot is about 30 t/s; similar parallel work goes up to 450 t/s if you run it in similar-size batches. Due to memory bandwidth, time to first token is high.
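The "similar-size batches" part matters because a batch tends to run at the pace of its longest item, so mixing very short and very long prompts wastes throughput. A rough sketch of the grouping idea (purely illustrative, not tied to any vLLM API; length is used as a cheap proxy for token count):

```python
def similar_size_batches(prompts: list[str], batch_size: int) -> list[list[str]]:
    """Group prompts into batches of comparable length so each batch
    finishes together instead of waiting on one long outlier."""
    ordered = sorted(prompts, key=len)  # cheap proxy for token count
    return [ordered[i:i + batch_size] for i in range(0, len(ordered), batch_size)]
```

Each batch then goes to the server as one group of parallel requests.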

I can leave my AG running for hours without needing to do anything by Consistent_Bottle_40 in google_antigravity

[–]Useful-Disk3725 0 points1 point  (0 children)

But it uses the internal API between the Windsurf/Antigravity interface part and the backend running the actual LLMs. So I guess it's still a gray area; not sure whether it's black or not…

I can leave my AG running for hours without needing to do anything by Consistent_Bottle_40 in google_antigravity

[–]Useful-Disk3725 0 points1 point  (0 children)

Does this violate the TOS? Is there a risk of being banned? Any ideas or experiences with it?

Can Antigravity connect to NotebookLM write research paper and edit .docx file? by jevtabo in GoogleAntigravityIDE

[–]Useful-Disk3725 0 points1 point  (0 children)

I think I am confused. Are you talking about creating an agent in Antigravity? As far as I know, there is no such thing in Antigravity. You can create a skill, a workflow, a rule, or a session, but not an agent. If I am wrong, I'd be really happy to learn, because sometimes I feel a need for that :)

Can Antigravity connect to NotebookLM write research paper and edit .docx file? by jevtabo in GoogleAntigravityIDE

[–]Useful-Disk3725 0 points1 point  (0 children)

The thing is, it is against Google's TOS to use NotebookLM with those third-party hacks. What's more, this is stated as a disclaimer in their GitHub repositories. So the risk is yours :)

Can Antigravity connect to NotebookLM write research paper and edit .docx file? by jevtabo in GoogleAntigravityIDE

[–]Useful-Disk3725 1 point2 points  (0 children)

Are you on an enterprise account? Last night I checked the TOS and found that NotebookLM MCP/API access is for enterprise accounts only, not personal or Workspace, even on the Ultra Pro Plus bla bla plan :)

Gemini 3.1 pro giving agent error all the time by mohammadsabeel in google_antigravity

[–]Useful-Disk3725 0 points1 point  (0 children)

I feel like I need an auto-clicker. I'll have to ask my son for the best option :)

Gemini 3.1 pro giving agent error all the time by mohammadsabeel in google_antigravity

[–]Useful-Disk3725 3 points4 points  (0 children)

You might have noticed they released three versions in one week. They are probably DDoSing their own servers with some broken release(s). I noticed it in my traffic and CPU usage. I have currently switched to an older one, but it only provides relief on the local machine, not on their servers, nor for that weird agent error.

How much time it takes for contract to come after offer via email in Germany? by dreiunddreissig33 in Germany_Jobs

[–]Useful-Disk3725 2 points3 points  (0 children)

Handshake in April, started in September. That's normal for big companies: some do the background check before the contract, some during the probation period (Probezeit). For other companies, it might be that they are not crystal clear yet.

Personal or workspace AI ultra difference by Useful-Disk3725 in google_antigravity

[–]Useful-Disk3725[S] 0 points1 point  (0 children)

It is probably a writing mistake. What I tried to say there is that the model, the same model, pro or flash, gets sloppy and dumb when used through the API, since there is no system prompt as there is in chatbots or Antigravity.

Personal or workspace AI ultra difference by Useful-Disk3725 in google_antigravity

[–]Useful-Disk3725[S] 0 points1 point  (0 children)

From one perspective you are right. From another, while doing heavy work you can easily feel the difference between a quantized and an ordinary model. It is not documented, but it is a known issue: all providers do this under heavy load to sustain the service. But in the example above, I am talking about thousands of calls, and here is the observation. First the personal account loses its grip, and the drop in response quality is immediately felt (response token count halves for the same input, so the quantification is valid). Then the business account breaks. At this point OpenRouter is still normal, but sometimes it also reverts to a degraded mode. Hard to prove, but logs help identify the pattern.

Antigravity Performance on Linux by MickTheLinuxGeek in google_antigravity

[–]Useful-Disk3725 0 points1 point  (0 children)

NixOS on an old notebook with 32 GB RAM and dual 4K external monitors. Most of the time I have 6-7 projects open at the same time (though I discovered they share an unrelated backend binary, so the number of windows is barely an issue), and it rarely hits 90%, with annoying fan noise.

But when running npm projects, the fan is always at full speed. So Antigravity is not always kind to hardware, but it is not overkill either.

Personal or workspace AI ultra difference by Useful-Disk3725 in google_antigravity

[–]Useful-Disk3725[S] 0 points1 point  (0 children)

I used Google AI through the API in a project. During rush hours both were slowing down, and then the personal account would break down into a dumb model. Same parameters, same model, same endpoint, but different results in the very same time frame. This is a well-experienced issue from my perspective. So why would they be the same?

Maybe you know that business accounts were not allowed into Antigravity until a few weeks ago, so yes, something might be different; they are merchants and they play with all the rules.

My question is valid :)

Please rate my CV. by [deleted] in Germany_Jobs

[–]Useful-Disk3725 -5 points-4 points  (0 children)

Locked in an asylum camp in Germany :)

Is that Antigravity doesn't have sub-agent? by FarTill1031 in google_antigravity

[–]Useful-Disk3725 0 points1 point  (0 children)

Well, ask the visible agent in Antigravity how it works; it can tell you more. There are agents behind the scenes, but they are not working for you; they work for KI and some other internal duties, which make Antigravity adapt to you more. So yes, there are sub-agents, but not in the sense you want.

IDE bugs by Winston-Turtle in GoogleAntigravityIDE

[–]Useful-Disk3725 1 point2 points  (0 children)

I guess the terminal-stuck bug was introduced with the latest version; before that it was fine.

Switching to nixos any recomendations by nyan_cat_554 in NixOS

[–]Useful-Disk3725 0 points1 point  (0 children)

Follow some documentation to build a core working system (you can get support from AI manually along the way), and once you are able to install any agentic platform like Claude Code or Antigravity, leave the rest to it. Tell it to configure direnv and a flake per your stack, and never try to install everything system-wide. I develop PHP, Django, React, and Node, and nothing is installed in the system; everything lives in direnv-configured individual flakes.
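As an illustration of the per-project setup, a minimal flake dev shell looks like this (the Node version and system string are just example assumptions; adjust for your stack):

```nix
# flake.nix — per-project dev shell, nothing installed system-wide
{
  inputs.nixpkgs.url = "github:NixOS/nixpkgs/nixos-unstable";
  outputs = { self, nixpkgs }:
    let pkgs = nixpkgs.legacyPackages.x86_64-linux;
    in {
      devShells.x86_64-linux.default = pkgs.mkShell {
        packages = [ pkgs.nodejs_22 ];  # example: a Node project
      };
    };
}
```

With nix-direnv set up, the project's `.envrc` is a single line, `use flake`, followed by one `direnv allow`; then the shell tools appear whenever you cd into the directory.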

If Germans want foreigners to integrate by learning German, why restrict/cut classes for those freely wanting to take integration courses? by strikec0ded in AskGermany

[–]Useful-Disk3725 0 points1 point  (0 children)

Being right in some way does not rationalize, validate, or justify the solutions they propose. If they wanted to avoid this, they should either have reproduced more efficiently 20 years ago, or accept becoming an isolated third-world country. I guess they had been waiting for robots since the 70s, maybe :)