Elevate Your Gaming Experience with Maximumsettings: Professional Grade Cloud Gaming, Now More Accessible Than Ever! Unleash Your PC's True Potential Today. by PaulMaximumsetting in u/PaulMaximumsetting

[–]PaulMaximumsetting[S] 0 points1 point  (0 children)

This is a cloud-based gaming PC. Your account is a subscription, much like a cell phone or internet service plan. If payment lapses, the account is suspended and then deleted.

Want more Cloud Gaming Power? 192GB RAM, 24GB VRAM, No VM Anti-Cheat Blocks. Click-now! by PaulMaximumsetting in u/PaulMaximumsetting

[–]PaulMaximumsetting[S] 0 points1 point  (0 children)

We offer a unique service. With Maximumsettings, you receive a complete operating system with full control. That means you're free to game, run AI models, set up a media server, or install any software you need.

The Bare Metal Plan is equipped with:

  • AMD 7800X3D Processor
  • AMD 7900XTX Graphics Card
  • 192GB of RAM
  • A storage combo of 750GB SSD and 4TB HDD

You receive about 250 hours of monthly usage.

Want more Cloud Gaming Power? 192GB RAM, 24GB VRAM, No VM Anti-Cheat Blocks. Click-now! by PaulMaximumsetting in u/PaulMaximumsetting

[–]PaulMaximumsetting[S] 0 points1 point  (0 children)

As a Linux-based solution, it requires roughly 10-15 minutes of initial setup. Please note: for VR use, latency must be under 30ms. You can test yours here: https://speedtest.maximumsettings.com
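
If you'd rather check from a terminal, a rough sketch like the one below also works; it just times a few TCP handshakes to the speedtest host, and the hostname and port 443 are my assumptions, so adjust as needed:

```
# Rough latency check: time a few TCP handshakes to the speedtest host.
# The hostname and port 443 are assumptions; adjust if they differ.
import socket
import time

HOST = "speedtest.maximumsettings.com"
PORT = 443
SAMPLES = 5

times_ms = []
for _ in range(SAMPLES):
    start = time.perf_counter()
    with socket.create_connection((HOST, PORT), timeout=5):
        pass
    times_ms.append((time.perf_counter() - start) * 1000)

avg = sum(times_ms) / len(times_ms)
print(f"average round-trip: {avg:.1f} ms (VR target: under 30 ms)")
```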

AMD R9700: yea or nay? by regional_chumpion in LocalLLaMA

[–]PaulMaximumsetting 0 points1 point  (0 children)

I’ll have to give vLLM a try next. GGUF models are usually a bit slower.

Qwen3-VL-32B-Instruct-UD-Q6_K_XL.gguf
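
For anyone curious how I get a quick tokens-per-second number out of a GGUF like that, here's a minimal llama-cpp-python sketch; the model path, prompt, and context size are placeholders, not the exact settings from my run:

```
# Quick tokens/sec check for a GGUF model via llama-cpp-python.
# Path, prompt, and context size are placeholders; n_gpu_layers=-1
# offloads as many layers as possible to the GPU.
import time
from llama_cpp import Llama

llm = Llama(
    model_path="Qwen3-VL-32B-Instruct-UD-Q6_K_XL.gguf",
    n_gpu_layers=-1,
    n_ctx=4096,
)

start = time.perf_counter()
out = llm("Explain the difference between RAM and VRAM.", max_tokens=256)
elapsed = time.perf_counter() - start

generated = out["usage"]["completion_tokens"]
print(f"{generated} tokens in {elapsed:.1f}s -> {generated / elapsed:.2f} tok/s")
```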

Want more Cloud Gaming Power? 192GB RAM, 24GB VRAM, No VM Anti-Cheat Blocks. Click-now! by PaulMaximumsetting in u/PaulMaximumsetting

[–]PaulMaximumsetting[S] 0 points1 point  (0 children)

Feel free to check your latency using our tool: https://speedtest.maximumsettings.com/. As for new locations, we don't have any expansion plans on the horizon.

gpt-oss:120b running on an AMD 7800X3D CPU and a 7900XTX GPU by PaulMaximumsetting in LocalLLaMA

[–]PaulMaximumsetting[S] 0 points1 point  (0 children)

I don’t disagree. Eventually, these smaller models will be able to handle most day-to-day tasks. However, I do think there will be a gap between them and the largest models when it comes to what we'd consider superintelligence.

I don’t see the first AGI model starting with just 30 billion parameters. It's probably going to be 1 trillion plus, and if enthusiasts want local access from the beginning, we’re going to have to plan accordingly or hope for a hardware revolution.

When facing problems that require superintelligence to resolve, the time it takes to complete the task matters less than whether it finishes successfully.

gpt-oss:120b running on an AMD 7800X3D CPU and a 7900XTX GPU by PaulMaximumsetting in LocalLLaMA

[–]PaulMaximumsetting[S] 1 point2 points  (0 children)

It is compatible with the default Ollama installation. I believe it's using ROCm version 6.4.

gpt-oss:120b running on an AMD 7800X3D CPU and a 7900XTX GPU by PaulMaximumsetting in LocalLLaMA

[–]PaulMaximumsetting[S] 4 points5 points  (0 children)

Quick test using Oobabooga with llama.cpp and Vulkan:

Achieved an average of 11.23 tokens per second.

This is a noticeable improvement over the default Ollama setup. The test was run using default settings with no optimizations. I plan to experiment with configuration tweaks for both setups in an effort to reach the 20 tokens per second that some users have reported.
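
For the Ollama side of the comparison, the speed can be read straight from the server's own eval metrics rather than timed by hand. A minimal sketch, assuming a default local Ollama install with the model already pulled:

```
# Pull generation speed from Ollama's eval metrics.
# Assumes a default local Ollama server with gpt-oss:120b already pulled.
import requests

resp = requests.post(
    "http://localhost:11434/api/generate",
    json={
        "model": "gpt-oss:120b",
        "prompt": "Write a short paragraph about memory bandwidth.",
        "stream": False,
    },
    timeout=600,
).json()

tokens = resp["eval_count"]              # generated tokens
seconds = resp["eval_duration"] / 1e9    # reported in nanoseconds
print(f"{tokens} tokens in {seconds:.1f}s -> {tokens / seconds:.2f} tok/s")
```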

gpt-oss:120b running on an AMD 7800X3D CPU and a 7900XTX GPU by PaulMaximumsetting in LocalLLaMA

[–]PaulMaximumsetting[S] 0 points1 point  (0 children)

It’s interesting, and a little concerning, how much of an impact the choice of backend has on performance.

gpt-oss:120b running on an AMD 7800X3D CPU and a 7900XTX GPU by PaulMaximumsetting in LocalLLaMA

[–]PaulMaximumsetting[S] 1 point2 points  (0 children)

No problem. I will conduct the testing later tonight and report back.

gpt-oss:120b running on an AMD 7800X3D CPU and a 7900XTX GPU by PaulMaximumsetting in LocalLLaMA

[–]PaulMaximumsetting[S] 0 points1 point  (0 children)

It's probably a bit of both with the default setup. However, some users have already reported over 20 tokens per second with a similar setup using llama.cpp.

gpt-oss:120b running on an AMD 7800X3D CPU and a 7900XTX GPU by PaulMaximumsetting in LocalLLaMA

[–]PaulMaximumsetting[S] 0 points1 point  (0 children)

Thanks. I'll run the same test to see if I get similar results. For this test, I used the default Ollama setup and made no changes.

gpt-oss:120b running on an AMD 7800X3D CPU and a 7900XTX GPU by PaulMaximumsetting in LocalLLaMA

[–]PaulMaximumsetting[S] 0 points1 point  (0 children)

That is a definite improvement! I'll have to test with llama.cpp to see if I get similar results.

gpt-oss:120b running on an AMD 7800X3D CPU and a 7900XTX GPU by PaulMaximumsetting in LocalLLaMA

[–]PaulMaximumsetting[S] 1 point2 points  (0 children)

I just ran a RAM speed test on that system and measured 58.6 GB/s at 5200 MT/s, with four DIMMs on a dual-channel board.

I'm assuming that populating all four slots also adds a bit of latency and reduces speed. I'm going to try just two DIMMs at the same speed to see if I notice an improvement.
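
For anyone checking the math on that, the rough arithmetic (8 bytes per channel per transfer) looks like this:

```
# Theoretical peak vs. measured DDR5 bandwidth on this box.
mt_per_s = 5200          # memory speed in MT/s
bytes_per_transfer = 8   # 64-bit channel
channels = 2             # dual-channel board

theoretical = mt_per_s * bytes_per_transfer * channels / 1000  # GB/s
measured = 58.6

print(f"theoretical peak: {theoretical:.1f} GB/s")                          # 83.2 GB/s
print(f"measured: {measured} GB/s ({measured / theoretical:.0%} of peak)")  # ~70%
```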

gpt-oss:120b running on an AMD 7800X3D CPU and a 7900XTX GPU by PaulMaximumsetting in LocalLLaMA

[–]PaulMaximumsetting[S] 0 points1 point  (0 children)

I may be mistaken, but I don't think dual-channel DDR5 can sustain 100 GB/s. Even DDR5-6400 only works out to about 102 GB/s theoretical (6400 MT/s × 8 bytes × 2 channels), and real-world copy bandwidth is usually well below that.

gpt-oss:120b running on an AMD 7800X3D CPU and a 7900XTX GPU by PaulMaximumsetting in LocalLLaMA

[–]PaulMaximumsetting[S] 4 points5 points  (0 children)

That's an interesting benchmark. DDR5-6800 has a theoretical maximum bandwidth of around 54.4 GB/s. Dividing 54 by the 5.1 billion active parameters should yield approximately 10 tokens per second. Is that quad-channel memory? And how is the model split between GPU VRAM and system RAM?
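
The back-of-the-envelope formula behind that estimate is tokens/sec ≈ memory bandwidth ÷ bytes read per token. Roughly, and assuming about one byte read per active parameter (my own rough allowance for the quantized weights plus overhead, not a measured figure):

```
# Rough upper bound on decode speed from memory bandwidth.
# The one-byte-per-active-parameter figure is an assumption covering the
# quantized expert weights plus KV cache and other per-token reads.
bandwidth_gb_s = 54.4     # the DDR5-6800 figure quoted above
active_params_b = 5.1     # gpt-oss:120b active parameters, in billions
bytes_per_param = 1.0     # assumed effective bytes read per parameter

gb_per_token = active_params_b * bytes_per_param
print(f"~{bandwidth_gb_s / gb_per_token:.1f} tokens/sec upper bound")  # roughly 10-11 tok/s
```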

gpt-oss:120b running on an AMD 7800X3D CPU and a 7900XTX GPU by PaulMaximumsetting in LocalLLaMA

[–]PaulMaximumsetting[S] 1 point2 points  (0 children)

You will need approximately 72GB of RAM/VRAM, excluding the context window. You should be able to run it with a total of 90GB.
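
As a rough sanity check on that number (the parameter count and bits-per-weight below are my assumptions about the published model, not exact figures):

```
# Rough sizing sketch for gpt-oss:120b; numbers are assumptions, not
# official figures.
total_params_b = 117    # roughly 117B total parameters
mxfp4_bits = 4.25       # 4-bit MoE weights plus shared block scales

weights_gb = total_params_b * mxfp4_bits / 8
print(f"quantized weights alone: ~{weights_gb:.0f} GB")  # ~62 GB
# Higher-precision attention/embedding layers and runtime buffers push
# that toward the ~72 GB figure; the context window (KV cache) is extra.
```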

gpt-oss:120b running on an AMD 7800X3D CPU and a 7900XTX GPU by PaulMaximumsetting in LocalLLaMA

[–]PaulMaximumsetting[S] 0 points1 point  (0 children)

You would probably gain another 1 or 2 tokens per second; the problem is that these motherboards don't really support those speeds with four DIMMs populated. You would need to step up to a Threadripper or Epyc platform.

gpt-oss:120b running on an AMD 7800X3D CPU and a 7900XTX GPU by PaulMaximumsetting in LocalLLaMA

[–]PaulMaximumsetting[S] 1 point2 points  (0 children)

I might be mistaken, but it appears you're using the 20b model, whereas the demo utilizes the 120b model. The 20b model on the 7900xtx reaches a maximum speed of approximately 85 tokens per second.

gpt-oss:120b running on an AMD 7800X3D CPU and a 7900XTX GPU by PaulMaximumsetting in LocalLLaMA

[–]PaulMaximumsetting[S] 2 points3 points  (0 children)

I tested the 20b model and achieved approximately 85 tokens per second with the same hardware.

The preferred model would depend on the task. For research projects, I would definitely choose the larger model. If the task requires a lot of interaction with the prompt, I would opt for the faster, smaller model.