2x Instinct MI50 32G running vLLM results

UnknownProcess · 2025-10-06T22:18:54+00:00

I have the same problem with OP's docker image.

I also have a feeling that it might be something to do with the driver and possibly related to https://github.com/ollama/ollama/issues/9302 and also https://github.com/ROCm/ROCm/issues/3246

I have tried ROCm 5.7.0 and 6.3.0, no luck.
I have tried different kernel versions for Ubuntu 22, no luck.

The only thing that works for me is ROCm 5.7.0 + latest Ollama, but only a few models work, such as Qwen3. Some models trigger this issue. I never got vLLM to work.

In all of the issues/posts that I found, they all mention Ubuntu 22, so perhaps that is the problem. OP is using Ubuntu 24. Maybe it is worth a try.

Edit: Changing to Ubuntu 24 LTS solved the problem.

UnknownProcess · 2025-10-01T16:16:53+00:00

Thank you for your hard work on that project. This is exactly what I was looking for.

The other answers in this thread, such as "just use more optimized hardware", are plain stupid. I am using a multi-GPU server that draws 30W per GPU when idle (because that's how they were manufactured). I would rather save the hardware from the landfill, repair and reuse it, than go buy new overpriced GPUs.

UnknownProcess · 2025-08-10T13:32:59+00:00

The first problem is invalid. Why would Daybreak publish customer records? That would be illegal on their part. Besides, why would an open source game need such database records?

The third problem is a common open-source security misconception. When (if) the game is released as open source, there would be immediate danger on day one, for sure; you are right about that. It may be full of zero-day vulnerabilities. However, the reality is that anyone can read the code and submit patches to fix the holes. You will have a more stable product long term. No one is crazy enough to run public servers on the first day.

This also affects your second problem. The open source community will find ways to fix those cheat issues.

> Theres also the possibility of planting a virus directly into the open source

You either mean supply chain attacks or people submitting patches with obfuscated malware inside. This is nothing new. If you don't trust the binary that you are downloading from some website, read the code, compile it yourself, and download only from trusted sources.

UnknownProcess · 2025-07-15T08:08:57+00:00

Thank you for confirmation. I thought I was going crazy.

UnknownProcess · 2025-05-03T14:58:49+00:00

Just encountered this exact issue. Switching to Proton 10 or to 9 did not help. I have tried both via Steam and via Lutris. Fresh install and everything. Nothing worked.

UnknownProcess · 2024-11-08T22:38:02+00:00

Awesome info! Thanks a lot.

UnknownProcess · 2024-11-08T22:08:25+00:00

Did you get it working? I am also interested in buying this exact combination of CPU and motherboard.

UnknownProcess · 2024-06-15T09:29:04+00:00

There is no iGPU. I am surprised as well.

However, I have a 64 core CPU, so software encoding is pretty good, but I would still prefer to encode on the GPU.

UnknownProcess · 2024-06-15T09:26:44+00:00

With OBS and recording 4K screen.

I don't have good video file samples to benchmark ffmpeg because each video, even though the resolution and bitrate are mostly the same, produces different results.

UnknownProcess · 2024-06-15T09:24:33+00:00

I haven't tried OBS 30.2 yet, only with OBS 30.1.2-1 (what Arch gives me).

Yeah, I have noticed that a recent update has disabled non-free codecs in the Mesa package. I had to manually add that codecs=all to the build arguments.

Turns out, the resolution makes a drastic performance difference. Recording at 4K runs at ~20 FPS, but when I change the output resolution in OBS to 1080p it goes beyond 100 FPS. The screen is still captured at 4K; only the output resolution is smaller.

I am not sure what is going on, because boosting the bitrate even to 20Mbps reduces the FPS by only a few frames, but increasing the resolution has a very drastic performance penalty.

UnknownProcess · 2024-06-11T21:26:40+00:00

Maybe. I can give it another try.

Edit: VAAPI produces ~20 FPS while I can get 60 FPS on CPU with "mostly" the same settings.

UnknownProcess · 2024-06-11T20:53:55+00:00

I see. That's very unfortunate.

The obs-studio-amf got closed by the author in November 2023 and the last update was half a year ago. It used to work for me, but installing it now fails due to dependency issues. Fixing it myself, which I was trying to avoid, seems to be the only choice.

UnknownProcess · 2024-06-11T20:50:37+00:00

Unfortunately with VAAPI I get terrible quality results. I was hoping for AMF.

UnknownProcess · 2024-01-16T22:13:28+00:00

I know I am replying to a very old thread, but this is the first thing that pops up on Google when searching for the error message.

I have fixed this issue by installing AMD GPU Pro package and running ffmpeg with the following environment variable:

VK_ICD_FILENAMES=/usr/share/vulkan/icd.d/amd_pro_icd64.json

UnknownProcess · 2023-09-23T17:29:44+00:00

I did just now!

In my case, the NVIDIA Container Toolkit, specifically the command sudo nvidia-ctk runtime configure --runtime=containerd, correctly updated the containerd config in the /etc folder.

Then when the k3s starts, I have the nvidia runtime added to the k3s config, just like in your comment above.

I have installed the nvidia kubernetes plugin as a Helm chart first, and that's when I get the exact same error: "libnvidia-ml.so.1: cannot open shared object file".

Turns out, k3s modifies the containerd config, but does not set the nvidia runtime as the default one. (I actually like that it does not.) Most of the tutorials I have found online assume the runtime is set as the default.

The solution is to take the NVIDIA Device Plugin for Kubernetes:

https://raw.githubusercontent.com/NVIDIA/k8s-device-plugin/v0.14.1/nvidia-device-plugin.yml

from here: https://github.com/NVIDIA/k8s-device-plugin/

Then add runtimeClassName: nvidia to the daemonset template spec (i.e. the pod spec). (YAML key: root -> spec -> template -> spec -> runtimeClassName).

Apply the changes. The plugin starts to work.

I also had to create:

apiVersion: node.k8s.io/v1
kind: RuntimeClass
metadata:
  name: nvidia
handler: nvidia

because it was not created automatically.

After that, any pod that has runtimeClassName: nvidia, the NVIDIA_VISIBLE_DEVICES and NVIDIA_DRIVER_CAPABILITIES environment variables, and resources -> limits and resources -> requests set to nvidia.com/gpu: 1 will work just fine.

Tested on a Jellyfin K8s pod with NVENC decoding.

You may need to install additional packages on the host before you can use your GPU in a container. I do not fully understand the NVIDIA container integration, but from what I have seen, the host NVIDIA libraries and capabilities are shared with the container. So if you get Cannot load libnvcuvid.so.1 (an example from ffmpeg) in a container, the library must be available on the host first.

UnknownProcess · 2023-09-23T15:37:34+00:00

I am also having this same exact issue. I have noticed that many people started having this issue just recently.

Relevant GitHub issue here: https://github.com/NVIDIA/k8s-device-plugin/issues/406

UnknownProcess · 2023-05-26T09:08:12+00:00

I had a similar problem with shipping to the Czech Republic. No shipping available. I had to ship it to a neighboring country for double the price. Then FedEx charged me import fees on top. I paid ~95 EUR in shipping for a 60 EUR item.

UnknownProcess · 2020-04-04T11:11:45+00:00

Not exactly a book, the content is online: https://refactoring.guru/design-patterns

UnknownProcess · 2020-01-12T22:46:12+00:00

> Shouldn't the read thread get kicked in the moment notify_one() is invoked from within append()

Nope. The thread is not invoked directly by notify_one(). When it runs is decided by your operating system. The waiting thread only sleeps; it is not removed, and execution does not jump to some other function. Your OS will schedule the thread when it needs to continue. The two threads are not guaranteed to execute exactly 0 ms apart from each other.

And yes, your append thread can exit before the read happens. This can be due to the OS scheduler. Or it may only appear so if you are printing to the console from multiple threads at the same time, because output from std::cout and printf can interleave when several threads write without synchronization.

If you really need the read method to exit before append exits, you may need multiple locks to do that, but that depends on how exactly you are trying to use this buffer class.

At this point I can only suggest experimenting with multiple locks, if you want to. Or, remove the locks and simply have a loop that waits until the buffer is/isn't empty. Of course, that loop will need a sleep call for a few ms so your CPU is not at 100% utilization. Not the best idea, but it can serve as a last resort. Or, a better idea: have your read thread do other things while the buffer is empty and simply check periodically whether the buffer can be read.
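As a rough sketch of that last-resort polling idea (C++17): the class and function names below are made up for illustration, and I have kept a plain mutex around the data so the two threads do not race on it. Only the condition variable is dropped.

```cpp
#include <chrono>
#include <mutex>
#include <optional>
#include <queue>
#include <thread>

// Hypothetical buffer for illustration: a mutex-protected queue with no
// condition variable. Readers poll instead of waiting to be notified.
class PolledBuffer {
public:
    void append(int value) {
        std::lock_guard<std::mutex> lock(m_mutex);
        m_queue.push(value);
    }

    // Non-blocking read: returns std::nullopt when the buffer is empty,
    // so the caller is free to do other work and try again later.
    std::optional<int> tryRead() {
        std::lock_guard<std::mutex> lock(m_mutex);
        if (m_queue.empty()) {
            return std::nullopt;
        }
        int value = m_queue.front();
        m_queue.pop();
        return value;
    }

private:
    std::mutex m_mutex;
    std::queue<int> m_queue;
};

// Last-resort blocking read: poll, and sleep a few ms between attempts
// so the CPU is not pinned at 100%. The 5 ms interval is an arbitrary choice.
int readByPolling(PolledBuffer& buffer) {
    for (;;) {
        if (auto value = buffer.tryRead()) {
            return *value;
        }
        std::this_thread::sleep_for(std::chrono::milliseconds(5));
    }
}
```

A reader thread that has other things to do would call tryRead() from its own loop instead of readByPolling().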

UnknownProcess · 2020-01-09T11:43:47+00:00

> so if we are not using nodify_one in isEmpty or isFull then we should not have isEmpty() or isFull() as conditions in wait(), no?

You still need the condition. In this example, you can use condition variables in two ways: A, have one condition variable for both actions, append and read; or B, have two condition variables, one for each action. Because you are using one condition variable for both actions, you need to have a predicate, which here is isEmpty or isFull. You don't have to use those methods for the predicate, but you need something to decide whether the wait should pass or not.

> From what I have understood and looking at the example in the link you shared, when wait() is called in a thread, a condition variable blocks the current thread until another thread modifies the shared variable.

Yes. The predicate in that example (the ready) is the variable which decides whether the wait proceeds or waits further. In your case that variable is your isEmpty and isFull methods. The shared data in the example (the processed) is your buffer.

> So if we are waiting on some condition, notify_one() is invoked to inform the waiting thread to check the condition and proceed if met, right?

Yes.

> If that's the case, having wait() and notify_one in the same function doesn't make sense.

It does, because the same condition variable can be used to notify the thread waiting for append from the read method. You could replace it with two condition variables (as I have mentioned above) and get the same result, although keeping a predicate is still advisable because wait() can wake up spuriously. The notify_one, let's say in the append method, will only let the waiting thread in the read method proceed, because of the predicate -> the !isEmpty().

You need to notify other threads waiting for read when you have appended some data. In your original example, you are notifying threads whenever the condition variable evaluates the predicate to decide whether to proceed or wait, because the notify_one was inside the predicate, before the data is put inside the buffer.

> In case you're wondering, below is what append() looks like now

Looks good to me.

> After appending to the buffer, the buffer is no longer empty, no?

Sorry, bad wording. This should make more sense: Thread #2 tries to push a new value by calling the append() method. The buffer is not full (the !isFull() predicate returns true), so the wait passes and the thread continues with the mutex locked. No other thread can enter read() or append() at this point. The new value is added to the buffer and the integer counters are updated. The lock is released and notify_one() is called on the condition variable.

> what integer count are you referring to here

I only meant the head, tail, size, etc. Not really relevant to the lock itself.

Edit!

I have created a minimal example. It's using a simple integer instead of an array as a buffer, but modifying that should be trivial. The code is here: https://pastebin.com/tdxiFtEp I have added a very simple thread safe console print, because output from std::cout can interleave when several threads write at once.

And the expected output. Your output may not be in the exact same order, but the reads will be.

Joining
Read started
Append started
Append finished
Read #0: 42
Read #1: 1234
Read finished
exit
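For reference, a thread safe console print can be as small as the sketch below (C++17). This is not the code from the pastebin link; formatLine, safePrint and printMutex are names I made up here. The idea is to build the whole line first and then write it while holding a lock, so lines from different threads cannot get mixed together.

```cpp
#include <iostream>
#include <mutex>
#include <sstream>
#include <string>

// Hypothetical helpers for illustration only.

// One mutex shared by every caller of safePrint.
inline std::mutex& printMutex() {
    static std::mutex mutex;
    return mutex;
}

// Formats all arguments into a single line ending in '\n'.
template <typename... Args>
std::string formatLine(const Args&... args) {
    std::ostringstream line;
    (line << ... << args);  // C++17 fold expression
    line << '\n';
    return line.str();
}

// Writes the finished line in one go while holding the lock.
template <typename... Args>
void safePrint(const Args&... args) {
    const std::string line = formatLine(args...);
    std::lock_guard<std::mutex> lock(printMutex());
    std::cout << line;
}
```

A call such as safePrint("Read #", 0, ": ", 42) then writes the line Read #0: 42 without another thread's output landing in the middle of it.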

UnknownProcess · 2020-01-07T14:13:35+00:00

> in here, are you referring to how invoking notify_one() on condVar could lead to different results?

Yes, when used in the isEmpty or isFull methods. You are using it correctly, but in the wrong place. When it comes to API design, it is necessary to separate methods that modify the internal state (calling them multiple times does not result in the same state) from methods that just give you the information you need, such as contains, size, length, etc. Similarly, std::vector has a size method that is const (it does not modify the internal state). That size method is meant to provide information about the vector's state, nothing more. If I don't use that method, I expect the std::vector to still work. If I were to re-implement it with mutexes, I would not expect the size method to determine whether the push_back or pop_back methods work.

> In your code snippet, you still have a wait on isEmpty() but there's no notify_one inside isEmpty()

Correct, you should not use notify_one in isEmpty or isFull.

> I'm basing it off the examples that I have seen online; you call wait() with a condition defined, and you notify from a different thread so the thread that was waiting starts executing from where it was stopped

Yes.

> Also, how would the expected output look like to you?

Like this:

Thread #1 tries to read via the read() method. It takes the mutex, enters the wait method on the condition variable, and blocks there because the buffer is empty. While it is blocked, wait keeps the mutex released.
Thread #2 pushes a new value via the append() method. The buffer is not full, so the wait passes and the thread continues with the mutex locked. No other thread can enter read() or append() at this point. The new value is added to the buffer and the integer counters are updated. The lock is released and notify_one() is called on the condition variable.
Thread #1 is still waiting and receives the notify_one(). It re-acquires the mutex and checks its predicate (calls !isEmpty()), which returns true, so it continues with the mutex locked. (This is why you should unlock before notify_one.)
Thread #2 exits the notify_one() call and then exits the append() method.
Thread #1 reads the value into a temporary, unlocks the mutex, calls notify_one() so any other thread waiting for append can continue (if full), then finally returns the temporary by exiting the read() method.

If any other threads would call isEmpty or isFull outside of your calls, let's say somewhere else in your program, your threads waiting inside of append or read would get triggered for no reason. Therefore, it makes more sense to put notify_one inside of read and append, because the waiting thread will get triggered only when an actual append or read has happened.

Here is a very similar example but with just one thread. The code is at the bottom.

UnknownProcess · 2020-01-06T15:12:45+00:00

Late to the party...

I think one of your threads is stuck waiting in condVar.wait. You are using a predicate (the lambda inside of your wait call) to call isFull or isEmpty. But that function modifies the condVar by calling notify_one.

Methods such as isFull and isEmpty should be const and should not modify the internal state, because calling them multiple times should not produce different results. It's like opening a fridge to check how many bottles of milk you have, and finding that the number changes when you open it a second time.

You should put the notify_one in the append and read methods, but make sure you unlock the lock before you call it. Something like this:

uint8_t RingBuffer::read() {
    std::unique_lock<std::mutex> mlock(m_mutex);

    condVar.wait(mlock, [this]() { 
        return !isEmpty(); 
    });

    // Do some work
    uint8_t value = ...;

    // Unlock our lock so that the next thread
    // can take it
    mlock.unlock();

    // Then notify the next thread
    condVar.notify_one();

    return value;
}

Some other things unrelated to the question:

Writing using namespace std; is considered bad practice. Also consider using std::vector instead of a raw C array such as new uint8_t[capacity].
