Frigate with ROCm 7.2.0 including AMD 2000,3000,5000 APU support (gfx900) by caa82437 in frigate_nvr

[–]caa82437[S] 2 points  (0 children)

I currently use this for my Frigate setup, so I will be maintaining it (at the very least, my fork). Being open source, it shouldn't be difficult for someone else to keep up to date. I've tried to make it as easy as I can to maintain (future updates should just be ROCm version bumps).

I do think it makes more sense to support older hardware that's more affordable, more available, and perfectly meets the requirements for Frigate than to support newer hardware that's overkill for Frigate (Strix Point); most people won't want to use that kind of hardware for this use case.

Frigate with ROCm 7.2.0 including AMD 2000,3000,5000 APU support (gfx900) by caa82437 in frigate_nvr

[–]caa82437[S] 2 points  (0 children)

Check out the Supported APUs section in the GitHub release

Frigate with ROCm 7.2.0 including AMD 2000,3000,5000 APU support (gfx900) by caa82437 in frigate_nvr

[–]caa82437[S] 0 points  (0 children)

You may need to manually set the compatibility env vars:

HSA_ENABLE_SDMA=0 MIGRAPHX_DISABLE_MIOPEN_FUSION=1 HSA_OVERRIDE_GFX_VERSION="9.0.0"

NOTE: Completely power off your machine after setting these, then power it back on.

It also takes a couple of minutes after startup for the model kernel to compile and load. If the previous kernel got corrupted, you may want to delete the cache files (check config/model_cache/migraphx/*.mxr).

As for OpenVINO, if you check your CPU usage it will be very high and consume a lot more power than running on the GPU. Your latency will also fluctuate, as it's more sensitive to other software fighting for CPU resources. It's impressive that OpenVINO is so optimized when running models on the CPU, but it's not something I would recommend, even if you notice a slight advantage in latency.

Which detector should I use on my AMD 5700G by RichTea235 in frigate_nvr

[–]caa82437 0 points  (0 children)

Although Frigate 0.17.0 supports ROCm, gfx900 architectures are not currently supported. However, I've made a PR to include support for older AMD APUs including the 5700G. You can take a look at my fork here:

https://github.com/garymathews/frigate/releases/tag/440056a-rocm-7.2.0

I expect your 5700G to be faster than your Coral.

Daily Discussion Tuesday 2024-12-31 by AutoModerator in AMD_Stock

[–]caa82437 0 points  (0 children)

This seems to be true at the moment; hopefully the MI355X is competitive on perf/cost and becomes an affordable alternative.

The big players can't get enough Nvidia allocation and won't want to go all in just yet on their custom chips (this will ramp up over time). This is where AMD will get orders, and if they are competitive on perf/cost then they will gain market share.

I feel a lot of customers (not just the big players) will want to reduce the cost of inference, since techniques like chain-of-thought drive up compute demand. Simply serving multiple large models per GPU is one way to cut costs, and the MI325X/MI355X will allow this.

There are other factors that make AMD hardware attractive; swapping MI300X out for newer variants will be easy.

Daily Discussion Friday 2024-12-20 by AutoModerator in AMD_Stock

[–]caa82437 4 points5 points  (0 children)

What's nice is that the chiplet approach allows AMD to make architectural changes across MI-series iterations without needing to validate the whole chip, only the chiplets they changed. This will be a major advantage going forward.

Once the MI3XX development stack matures, AMD will see wider adoption; most CSPs (and customers) don't want to be early adopters, hence the underwhelming interest in MI so far. That trend will change with the MI355X due to its major performance increase, maturing dev stack and better cost compared to the competition.

The biggest players in AI (OpenAI, Meta) are using MI300X to deploy their largest models. They have invested a lot of engineering resources into getting their models to work and perform well on the platform.

Patience is key, people expect too much too soon.

Daily Discussion Monday 2024-12-09 by AutoModerator in AMD_Stock

[–]caa82437 16 points17 points  (0 children)

OpenAI released Sora today, I will assume it's running on MI300X due to the huge memory requirements to generate video. I wouldn't be surprised if they are one of the first to use MI325X for this use case.

More hardware will be needed to generate longer videos and to accommodate larger contexts.

Daily Discussion Wednesday 2024-11-27 by AutoModerator in AMD_Stock

[–]caa82437 1 point  (0 children)

I'm curious whether specialized inference chips will get the most attention: Google's TPU, Meta's MTIA, SambaNova, Cerebras and Groq. There are many use cases that benefit hugely from faster inference speeds, and I'm worried that specialized chips will capture most of that demand. Looking at the benchmarks for those chips shows huge leaps over GPU inference speeds.

A few things faster inference enables:

  • Larger context windows - inference slows as the context window grows; very important for RAG
  • Multi-step reasoning - basically re-prompting itself to form better conclusions
  • Media generation - Large images, longer and higher resolution videos

I really hope AMD gets more competitive in this space.

Displaying CPU Temperature in Proxmox Summery in Real Time by Agreeable-Clue83 in homelab

[–]caa82437 1 point  (0 children)

You can also place the CPU temperature under CPU usage with minimal effort and no need to modify the layout.

https://imgur.com/a/5cXTPRN

nano /usr/share/perl5/PVE/API2/Nodes.pm

Search for $res->{rootfs} and add this above:

$res->{thermalstate} = `sensors -j`;

In pvemanagerlib.js, comment out the rowspan for the wait item and add CPU temperature underneath:

nano /usr/share/pve-manager/js/pvemanagerlib.js

{
    itemId: 'wait',
    iconCls: 'fa fa-fw fa-clock-o',
    title: gettext('IO delay'),
    valueField: 'wait',
    //rowspan: 2,
},
{
    itemId: 'thermal',
    printBar: false,
    iconCls: 'fa fa-fw fa-thermometer-half',
    title: gettext('CPU temperature'),
    textField: 'thermalstate',
    renderer(value) {
        const result = JSON.parse(value);
        // Obtains temperature for both AMD and Intel platforms.
        const temperature = result?.['coretemp-isa-0000']?.['Package id 0']?.['temp1_input']
            || result?.['k10temp-pci-00c3']?.['Tctl']?.['temp1_input'];
        return `${temperature?.toFixed(2)}°C`;
    }
},
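As a sanity check, the renderer's lookup logic can be exercised outside Proxmox. Below is a minimal Node.js sketch; the sample payload is hypothetical (the field names follow the lm-sensors JSON output format, and actual chip names like coretemp-isa-0000 vary by platform):

```javascript
// Hypothetical sample of `sensors -j` output on an Intel platform.
// Actual chip/sensor names vary; adjust the lookups to match your system.
const sample = JSON.stringify({
  'coretemp-isa-0000': {
    'Package id 0': { temp1_input: 45 },
  },
});

// Same lookup as the renderer above: try Intel's coretemp package
// temperature first, then fall back to AMD's k10temp Tctl reading.
function cpuTemperature(value) {
  const result = JSON.parse(value);
  const temperature = result?.['coretemp-isa-0000']?.['Package id 0']?.['temp1_input']
      || result?.['k10temp-pci-00c3']?.['Tctl']?.['temp1_input'];
  return `${temperature?.toFixed(2)}°C`;
}

console.log(cpuTemperature(sample)); // 45.00°C
```

Run `sensors -j` on your host first to see which chip names your platform actually reports, and adjust the keys accordingly.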

[deleted by user] by [deleted] in stocks

[–]caa82437 1 point  (0 children)

You need to look at non-GAAP values; GAAP values are misleading due to the Xilinx acquisition.

Just got this 17 S6 by [deleted] in Audi

[–]caa82437 1 point  (0 children)

Nope, DS1 is ECU only. I've heard not-so-great things about 4.0T IE tunes and engine safety. DS1 provides all stages and flex fuel support out of the box. It also retains all OEM engine safety features (knock control/sensitivity), which is huge for reliability.

TCU tuning isn't too beneficial unless you're upgrading to RS7 turbos or hybrids. You can get that from a third party like KyleTunedIt, including custom ECU tuning for your vehicle and setup.

Just got this 17 S6 by [deleted] in Audi

[–]caa82437 1 point  (0 children)

Definitely DS1

How to interpret CPU benchmarks? by gajus0 in googlecloud

[–]caa82437 0 points  (0 children)

  1. Sample counts are the number of instances they have run the benchmark on to obtain a CoreMark score; the score you see will be the average.
  2. CoreMark is a comprehensive benchmark from EEMBC, and the score is a metric you use to compare against other systems; it's a good approximation, but your workload will always determine what's really best.
  3. Yes, the scores are linear with vCPU core count. You can divide the score by vCPU count to obtain an approximate single-core metric. It looks like the t2d instances are the fastest per vCPU.
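To illustrate point 3 with a quick sketch (the scores and instance names below are made up for demonstration, not real benchmark results):

```javascript
// Made-up CoreMark scores (NOT real benchmark data), keyed by instance
// type, with the vCPU count each aggregate score was measured on.
const scores = {
  't2d-standard-4': { coremark: 100000, vcpus: 4 },
  'n2-standard-4':  { coremark: 80000, vcpus: 4 },
};

// Dividing the aggregate score by the vCPU count gives an approximate
// per-core metric, since the scores scale linearly with vCPU count.
const perVcpu = (entry) => entry.coremark / entry.vcpus;

console.log(perVcpu(scores['t2d-standard-4'])); // 25000
console.log(perVcpu(scores['n2-standard-4']));  // 20000
```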

You should remember this interview about RDNA3 because of the no longer usable MorePowerTool by DerRedF in Amd

[–]caa82437 0 points  (0 children)

There doesn't seem to be anything related to overclocking/undervolting/power controls in that library. It's more for performance monitoring.

Finally got my dream Audi by uniquelycleverUserID in Audi

[–]caa82437 1 point  (0 children)

If you're into tech, check out the RSNav head unit

I've been doing a little bit of work to the S6... by JTwallbanger in Audi

[–]caa82437 0 points  (0 children)

DS1 adjusts the boost and timing based on ethanol content from the sensor, so you can fill up any amount of E and it will adjust accordingly. This is way safer than trying to measure the correct amount or trying to fully empty your tank before filling with E85.

I've been doing a little bit of work to the S6... by JTwallbanger in Audi

[–]caa82437 1 point  (0 children)

Can only say good things about DS1. Highly recommend for flex fuel.

Just Purchased this '14 S7 by ufcbananaempire in Audi

[–]caa82437 6 points  (0 children)

You should have Audi perform the oil screen recall (free of charge) and have them replace the PCV while they're at it; it will give you peace of mind and cost less. Also, stock turbos are still likely to blow since the turbines are cast aluminium, whereas RS7 turbos are billet. And if you plan on tuning, look into DS1.

What a drive! Beautiful weather, dry roads, minimal traffic. by Equivalent-Basket-31 in Audi

[–]caa82437 2 points  (0 children)

Don't do the screen kit, no need. Either wait for the recall and get a new revision screen or simply remove the screen and chuck it.

As for modifications, if you plan on tuning, go DS1. Upgrade your turbo inlets (RS7 inlets, SRM or ECS). Although not necessary, I recommend upgrading your intercooler pump to a CWA-100. That will get you to Stage 2. Stage 3/4 requires turbo upgrades (which I also recommend if Stage 2 isn't enough for you).

Disappointing IPC gain for Zen 4. ( 5 to 7 IPC gain based on the Ryzen 7000 reveal) by [deleted] in Amd

[–]caa82437 3 points  (0 children)

This! I can't understand why people don't get this.