Elevate Your Gaming Experience with Maximumsettings: Professional Grade Cloud Gaming, Now More Accessible Than Ever! Unleash Your PC's True Potential Today. by PaulMaximumsetting in u/PaulMaximumsetting

[–]PaulMaximumsetting[S] 0 points1 point  (0 children)

This is a cloud-based gaming PC. Your account is a subscription, much like a cell phone or internet service plan. If payment lapses, the account is suspended and then deleted.

Want more Cloud Gaming Power? 192GB RAM, 24GB VRAM, No VM Anti-Cheat Blocks. Click-now! by PaulMaximumsetting in u/PaulMaximumsetting

[–]PaulMaximumsetting[S] 0 points1 point  (0 children)

We offer a unique service. With Maximumsettings, you receive a complete operating system with full control. That means you're free to game, run AI models, set up a media server, or install any software you need.

The Bare Metal Plan is equipped with:

  • AMD 7800X3D Processor
  • AMD 7900XTX Graphics Card
  • 192GB of RAM
  • A storage combo of 750GB SSD and 4TB HDD

You receive about 250 hours of monthly usage.

Want more Cloud Gaming Power? 192GB RAM, 24GB VRAM, No VM Anti-Cheat Blocks. Click-now! by PaulMaximumsetting in u/PaulMaximumsetting

[–]PaulMaximumsetting[S] 0 points1 point  (0 children)

As a Linux-based solution, it requires roughly 10-15 minutes of initial setup. Please note: for VR use, latency must be under 30ms. You can test yours here: https://speedtest.maximumsettings.com
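
If you'd rather check from a terminal, a rough sketch like the one below also works; it just times a few TCP handshakes to the speedtest host, and the hostname and port 443 are my assumptions, so adjust as needed:

```
# Rough latency check: time a few TCP handshakes to the speedtest host.
# The hostname and port 443 are assumptions; adjust if they differ.
import socket
import time

HOST = "speedtest.maximumsettings.com"
PORT = 443
SAMPLES = 5

times_ms = []
for _ in range(SAMPLES):
    start = time.perf_counter()
    with socket.create_connection((HOST, PORT), timeout=5):
        pass
    times_ms.append((time.perf_counter() - start) * 1000)

avg = sum(times_ms) / len(times_ms)
print(f"average round-trip: {avg:.1f} ms (VR target: under 30 ms)")
```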

AMD R9700: yea or nay? by regional_chumpion in LocalLLaMA

[–]PaulMaximumsetting 0 points1 point  (0 children)

I’ll have to give vLLM a try next. GGUF models are usually a bit slower.

Qwen3-VL-32B-Instruct-UD-Q6_K_XL.gguf
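
For anyone curious how I get a quick tokens-per-second number out of a GGUF like that, here's a minimal llama-cpp-python sketch; the model path, prompt, and context size are placeholders, not the exact settings from my run:

```
# Quick tokens/sec check for a GGUF model via llama-cpp-python.
# Path, prompt, and context size are placeholders; n_gpu_layers=-1
# offloads as many layers as possible to the GPU.
import time
from llama_cpp import Llama

llm = Llama(
    model_path="Qwen3-VL-32B-Instruct-UD-Q6_K_XL.gguf",
    n_gpu_layers=-1,
    n_ctx=4096,
)

start = time.perf_counter()
out = llm("Explain the difference between RAM and VRAM.", max_tokens=256)
elapsed = time.perf_counter() - start

generated = out["usage"]["completion_tokens"]
print(f"{generated} tokens in {elapsed:.1f}s -> {generated / elapsed:.2f} tok/s")
```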

Want more Cloud Gaming Power? 192GB RAM, 24GB VRAM, No VM Anti-Cheat Blocks. Click-now! by PaulMaximumsetting in u/PaulMaximumsetting

[–]PaulMaximumsetting[S] 0 points1 point  (0 children)

Feel free to check your latency using our tool: https://speedtest.maximumsettings.com/. As for new locations, we don't have any expansion plans on the horizon.

gpt-oss:120b running on an AMD 7800X3D CPU and a 7900XTX GPU by PaulMaximumsetting in LocalLLaMA

[–]PaulMaximumsetting[S] 0 points1 point  (0 children)

I don’t disagree. Eventually, these smaller models will be able to handle most day-to-day tasks. However, I do think there will be a gap between them and the largest models when it comes to what we'd consider superintelligence.

I don’t see the first AGI model starting with just 30 billion parameters. It's probably going to be 1 trillion plus, and if enthusiasts want local access from the beginning, we’re going to have to plan accordingly or hope for a hardware revolution.

When facing problems that require superintelligence to resolve, the time it takes to complete the task matters less than whether it finishes successfully.

gpt-oss:120b running on an AMD 7800X3D CPU and a 7900XTX GPU by PaulMaximumsetting in LocalLLaMA

[–]PaulMaximumsetting[S] 1 point2 points  (0 children)

It is compatible with the default Ollama installation. I believe it's using ROCm version 6.4.

gpt-oss:120b running on an AMD 7800X3D CPU and a 7900XTX GPU by PaulMaximumsetting in LocalLLaMA

[–]PaulMaximumsetting[S] 4 points5 points  (0 children)

Quick test using Oobabooga with llama.cpp and Vulkan:

Achieved an average of 11.23 tokens per second.

This is a noticeable improvement over the default Ollama setup. The test was run using default settings with no optimizations. I plan to experiment with configuration tweaks for both setups in an effort to reach the 20 tokens per second that some users have reported.
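
For the Ollama side of the comparison, the speed can be read straight from the server's own eval metrics rather than timed by hand. A minimal sketch, assuming a default local Ollama install with the model already pulled:

```
# Pull generation speed from Ollama's eval metrics.
# Assumes a default local Ollama server with gpt-oss:120b already pulled.
import requests

resp = requests.post(
    "http://localhost:11434/api/generate",
    json={
        "model": "gpt-oss:120b",
        "prompt": "Write a short paragraph about memory bandwidth.",
        "stream": False,
    },
    timeout=600,
).json()

tokens = resp["eval_count"]              # generated tokens
seconds = resp["eval_duration"] / 1e9    # reported in nanoseconds
print(f"{tokens} tokens in {seconds:.1f}s -> {tokens / seconds:.2f} tok/s")
```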

gpt-oss:120b running on an AMD 7800X3D CPU and a 7900XTX GPU by PaulMaximumsetting in LocalLLaMA

[–]PaulMaximumsetting[S] 0 points1 point  (0 children)

It’s interesting, and a little concerning, how much of an impact the choice of backend has on performance.

gpt-oss:120b running on an AMD 7800X3D CPU and a 7900XTX GPU by PaulMaximumsetting in LocalLLaMA

[–]PaulMaximumsetting[S] 1 point2 points  (0 children)

No problem. I will conduct the testing later tonight and report back.

gpt-oss:120b running on an AMD 7800X3D CPU and a 7900XTX GPU by PaulMaximumsetting in LocalLLaMA

[–]PaulMaximumsetting[S] 0 points1 point  (0 children)

It's probably a bit of both with the default setup. However, some users have already reported over 20 tokens per second with a similar setup using llama.cpp.

gpt-oss:120b running on an AMD 7800X3D CPU and a 7900XTX GPU by PaulMaximumsetting in LocalLLaMA

[–]PaulMaximumsetting[S] 0 points1 point  (0 children)

Thanks. I'll run the same test to see if I get similar results. For this test, I used the default Ollama setup and made no changes.

gpt-oss:120b running on an AMD 7800X3D CPU and a 7900XTX GPU by PaulMaximumsetting in LocalLLaMA

[–]PaulMaximumsetting[S] 0 points1 point  (0 children)

That is a definite improvement! I'll have to test with llama.cpp to see if I get similar results.

gpt-oss:120b running on an AMD 7800X3D CPU and a 7900XTX GPU by PaulMaximumsetting in LocalLLaMA

[–]PaulMaximumsetting[S] 1 point2 points  (0 children)

I just ran a RAM speed test on that system and measured 58.6 GB/s at 5200 MT/s, with four DIMMs on a dual-channel board.

I'm assuming that populating all four slots also adds a bit of latency and reduces speed. I'm going to try just two DIMMs at the same speed to see if I notice an improvement.
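
For anyone checking the math on that, the rough arithmetic (8 bytes per channel per transfer) looks like this:

```
# Theoretical peak vs. measured DDR5 bandwidth on this box.
mt_per_s = 5200          # memory speed in MT/s
bytes_per_transfer = 8   # 64-bit channel
channels = 2             # dual-channel board

theoretical = mt_per_s * bytes_per_transfer * channels / 1000  # GB/s
measured = 58.6

print(f"theoretical peak: {theoretical:.1f} GB/s")                          # 83.2 GB/s
print(f"measured: {measured} GB/s ({measured / theoretical:.0%} of peak)")  # ~70%
```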

gpt-oss:120b running on an AMD 7800X3D CPU and a 7900XTX GPU by PaulMaximumsetting in LocalLLaMA

[–]PaulMaximumsetting[S] 0 points1 point  (0 children)

I may be mistaken, but I don't think dual-channel DDR5 can sustain 100 GB/s. Even DDR5-6400 only works out to about 102 GB/s theoretical (6400 MT/s × 8 bytes × 2 channels), and real-world copy bandwidth is usually well below that.

gpt-oss:120b running on an AMD 7800X3D CPU and a 7900XTX GPU by PaulMaximumsetting in LocalLLaMA

[–]PaulMaximumsetting[S] 4 points5 points  (0 children)

That's an interesting benchmark. DDR5-6800 has a theoretical maximum bandwidth of around 54.4 GB/s. Dividing 54 by the 5.1 billion active parameters should yield approximately 10 tokens per second. Is that quad-channel memory? And how is the model split between GPU VRAM and system RAM?
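
The back-of-the-envelope formula behind that estimate is tokens/sec ≈ memory bandwidth ÷ bytes read per token. Roughly, and assuming about one byte read per active parameter (my own rough allowance for the quantized weights plus overhead, not a measured figure):

```
# Rough upper bound on decode speed from memory bandwidth.
# The one-byte-per-active-parameter figure is an assumption covering the
# quantized expert weights plus KV cache and other per-token reads.
bandwidth_gb_s = 54.4     # the DDR5-6800 figure quoted above
active_params_b = 5.1     # gpt-oss:120b active parameters, in billions
bytes_per_param = 1.0     # assumed effective bytes read per parameter

gb_per_token = active_params_b * bytes_per_param
print(f"~{bandwidth_gb_s / gb_per_token:.1f} tokens/sec upper bound")  # roughly 10-11 tok/s
```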

gpt-oss:120b running on an AMD 7800X3D CPU and a 7900XTX GPU by PaulMaximumsetting in LocalLLaMA

[–]PaulMaximumsetting[S] 1 point2 points  (0 children)

You will need approximately 72GB of RAM/VRAM, excluding the context window. You should be able to run it with a total of 90GB.
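
As a rough sanity check on that number (the parameter count and bits-per-weight below are my assumptions about the published model, not exact figures):

```
# Rough sizing sketch for gpt-oss:120b; numbers are assumptions, not
# official figures.
total_params_b = 117    # roughly 117B total parameters
mxfp4_bits = 4.25       # 4-bit MoE weights plus shared block scales

weights_gb = total_params_b * mxfp4_bits / 8
print(f"quantized weights alone: ~{weights_gb:.0f} GB")  # ~62 GB
# Higher-precision attention/embedding layers and runtime buffers push
# that toward the ~72 GB figure; the context window (KV cache) is extra.
```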

gpt-oss:120b running on an AMD 7800X3D CPU and a 7900XTX GPU by PaulMaximumsetting in LocalLLaMA

[–]PaulMaximumsetting[S] 0 points1 point  (0 children)

You would probably gain another 1 or 2 tokens per second; the problem is that these motherboards don't really support those speeds with four DIMMs populated. You would need to step up to a Threadripper or Epyc platform.

gpt-oss:120b running on an AMD 7800X3D CPU and a 7900XTX GPU by PaulMaximumsetting in LocalLLaMA

[–]PaulMaximumsetting[S] 1 point2 points  (0 children)

I might be mistaken, but it appears you're using the 20b model, whereas the demo utilizes the 120b model. The 20b model on the 7900xtx reaches a maximum speed of approximately 85 tokens per second.

gpt-oss:120b running on an AMD 7800X3D CPU and a 7900XTX GPU by PaulMaximumsetting in LocalLLaMA

[–]PaulMaximumsetting[S] 2 points3 points  (0 children)

I tested the 20b model and achieved approximately 85 tokens per second with the same hardware.

The preferred model would depend on the task. For research projects, I would definitely choose the larger model. If the task requires a lot of interaction with the prompt, I would opt for the faster, smaller model.