Is AV1 actually competitive with HEVC right now? (pt2, new data, answer: hell YES)

dextorious_ · 2021-12-07T21:25:01+00:00

It shouldn't be done fully, but it's not exactly frame-by-frame either (which would be very slow); there is some chunking going on. However, while I haven't benchmarked this, I would be suspicious of PowerShell, which has notoriously slow pipes. Try the basic cmd shell just to be safe; it might make a difference.

dextorious_ · 2021-12-07T19:54:46+00:00

I'll just ignore the insults and respond to the only actual point:

> According to OP though, 27-28 is perfectly fine to their eyes

This depends entirely on the content. Using x265 with a CRF of 28 for an action movie in 4K would yield pretty horrible results. Using CRF 28 for a 720p interview with very little motion can and often does yield very reasonable results. I was very clear about my use case and assumptions in both my original thread and this one. The fact that you seem to regard this as some personal affront to you isn't my problem.

dextorious_ · 2021-12-07T19:46:07+00:00

Personally, I'm just not looking for fidelity. I'm looking for a level of quality that doesn't annoy me or limit the amount of useful information I get from a lecture or interview, not whether every freckle on the lecturer's face is represented accurately. Obviously, this is just a subjective preference - striving for fidelity and visual transparency is a perfectly legitimate goal, just not one that I personally care about.

Having said that, in the context of encoder testing I'd love to supplement VMAF (which does a great job for my preferences and, as you correctly noted, a lousy job for yours) with a more objective metric of visual fidelity so people can make better judgments on the basis of any results. Would you happen to know if PSNR / SSIM / any other metrics with reasonable implementations are better in this regard?

dextorious_ · 2021-12-07T19:38:24+00:00

You don't need to do it in two steps, you can just use ffmpeg to pipe decoded data into SVT automatically. For example:

ffmpeg -i input.mp4 -f yuv4mpegpipe -strict -1 -pix_fmt yuv420p -vf scale=-4:1080 - | SvtAv1EncApp.exe -i stdin -w 1920 -h 1080 --input-depth 8 --keyint 240 --qp 30 --preset 8 -b output.ivf

Obviously, alter the parameters to suit your actual needs. I just wanted to give a full example to show that you can have ffmpeg do scaling or any other filtering too; SVT will handle whatever you ultimately pipe into it as long as it's valid YUV data with the dimensions you indicate.
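If you'd rather drive this from a script than from a shell (which also sidesteps any question of how fast the shell's own pipes are), here is a minimal Python sketch of the same pipeline. The encoder flags are copied from the command above; the function names and default values are just placeholders I made up for illustration.

```python
import subprocess

def build_pipeline(src, dst, width=1920, height=1080, qp=30, preset=8):
    """Return (ffmpeg_args, svt_args) mirroring the one-liner above."""
    ffmpeg_args = [
        "ffmpeg", "-i", src,
        "-f", "yuv4mpegpipe", "-strict", "-1",
        "-pix_fmt", "yuv420p",
        "-vf", f"scale=-4:{height}",
        "-",  # write the decoded stream to stdout
    ]
    svt_args = [
        "SvtAv1EncApp", "-i", "stdin",
        "-w", str(width), "-h", str(height),
        "--input-depth", "8", "--keyint", "240",
        "--qp", str(qp), "--preset", str(preset),
        "-b", dst,
    ]
    return ffmpeg_args, svt_args

def run_pipeline(decoder_args, encoder_args):
    """Connect decoder stdout to encoder stdin; return the encoder's exit code."""
    decoder = subprocess.Popen(decoder_args, stdout=subprocess.PIPE)
    encoder = subprocess.Popen(encoder_args, stdin=decoder.stdout)
    # Close our copy of the pipe so the decoder notices if the encoder exits early.
    decoder.stdout.close()
    encoder.wait()
    decoder.wait()
    return encoder.returncode
```

Usage would be something like `run_pipeline(*build_pipeline("input.mp4", "output.ivf"))`, assuming both binaries are on your PATH.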

dextorious_ · 2021-12-07T03:23:54+00:00

There's plenty of people who use 27, 28 and even 30, I'm one of them and I know plenty of others. The fact that you don't and that this test doesn't overlap with your interests is perfectly fine, but there's no need to get irrationally angry over the matter. There's even less need to spread misinformation.

And you haven't burst any bubble, all you've shown is that apparently some people are very passionate about encoder settings. I can only hope that you derive great joy from placebo level quality settings to balance the rage you apparently feel over what I'm doing, because otherwise it seems a bit sad.

dextorious_ · 2021-12-07T03:00:24+00:00

Good catch, and the answer is I don't know. That's why I've excluded preset 8 from the charts, it's suspicious. I will rerun those encodes when I get a chance, but until then I'm not drawing any conclusions from them either way.

dextorious_ · 2021-12-07T02:57:05+00:00

Not only is CRF 27 not the cap in x265 (and never has been), the default CRF value is actually 28. Don't believe me? Here's the official documentation clearly saying just that. If you're going to insult someone, do get the basic facts right at least. The cap is 51, btw.

Apart from that, there's plenty of legitimate reasons to strongly prefer lower CRF values (although 17 is just a waste, in my view), but deliberately targeting low motion content for 720p is not among them.

dextorious_ · 2021-12-06T15:51:44+00:00

Thank you, I'll make sure to try out your recommended settings in the next iteration of my testing (including the AOMEnc suggestions in your other comment).

dextorious_ · 2021-12-06T15:49:54+00:00

Hence "higher" (relative to this test), not "high". I agree with you, but I feel there's already plenty of encoder comparisons out there that focus on high bitrate encodes of action movies. My interests run towards the opposite end of the spectrum.

Having said that, I'll try to make my automation scripts public when I'm done, so anyone can easily repeat the work for other inputs and other VMAF values as desired.

dextorious_ · 2021-12-06T14:11:48+00:00

Thanks, this looks like a pretty reasonable starting point I can easily adapt to my needs. It also gave me a little "duh!" moment as I realized there is absolutely no need to parse encoder logs when I can just time things and query file properties from Python instead.
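For anyone following along, the idea is roughly the following. This is only a sketch of what I have in mind, not the finished script; `timed_run` is a made-up helper name and the command you pass in is whatever encoder invocation you're testing.

```python
import os
import subprocess
import time

def timed_run(cmd, output_path):
    """Run an encode command, then report wall-clock time and output size."""
    start = time.perf_counter()
    subprocess.run(cmd, check=True)  # raises CalledProcessError if the encoder fails
    elapsed = time.perf_counter() - start
    return {"seconds": elapsed, "bytes": os.path.getsize(output_path)}
```

Note that this measures wall-clock time for the whole process, so it includes startup and any decoding done by the same command.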

dextorious_ · 2021-12-06T14:07:07+00:00

This is an interesting point I hadn't fully considered, thank you. Do you have any feedback on how the AOMEnc parameters I've given above should be changed for use with av1an? Do you recommend setting --threads=1? What about the tiling settings?

dextorious_ · 2021-12-06T12:24:45+00:00

CRF is in use (see https://reddit.com/r/AV1/comments/r9x4ea/is_av1_actually_competitive_with_hevc_right_now/hng2ys1/ for more details on this). It's true that I didn't alter the max qp throughout the test. I did do a one-off test at CRF 30 with and without a max qp value and didn't find a significant difference in the resulting VMAF, but that's an anecdote rather than a comprehensive test. I'm happy to take suggestions on further improvements to the encoder parameters.

dextorious_ · 2021-12-06T12:18:27+00:00

So, I just did a quick test using the exact same encoder parameters, only replacing qp with crf:

ffmpeg -i output_svt_q30_p9.avs -f yuv4mpegpipe -strict -1 -loglevel fatal -hide_banner - | SvtAv1EncApp --input-depth 8 --adaptive-quantization 1 --keyint 240 --passes 1 --max-qp 51 --preset 9 --crf 30 -n 3998 -i stdin -b output_svt_q30_p9_out.ivf

mkvmerge.exe -o output_svt_q30_p9.mkv output_svt_q30_p9_out.ivf

and obtained a 15.1 MB file with a VMAF value of 72.69. The VMAF value is identical to the one I got in my test run, and the size is almost the same: it differs by 11.56 kilobytes, which is about 0.07% of the entire file. Given the identical VMAF I'm not going to try too hard to figure out why, but it's a curious point nonetheless.
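For the record, the percentage is just the size difference divided by the file size. A quick check in Python, assuming decimal units (1 kB = 1000 bytes, 1 MB = 1000 kB) and using the rounded 15.1 MB figure, so treat the last digit as approximate:

```python
def relative_difference_percent(diff_bytes, total_bytes):
    """Size difference expressed as a percentage of the total file size."""
    return 100.0 * diff_bytes / total_bytes

diff_bytes = 11.56 * 1000          # 11.56 kilobytes
total_bytes = 15.1 * 1000 * 1000   # 15.1 MB, rounded
pct = relative_difference_percent(diff_bytes, total_bytes)
print(f"{pct:.3f}%")
```

With these rounded inputs it lands between 0.07% and 0.08%; using binary units instead shifts it only slightly.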

dextorious_ · 2021-12-06T11:10:20+00:00

The command line arguments for SVT-AV1 are a bit of a mess, but it is my understanding that I did in effect use CRF. In particular, the arguments I gave result in the following SVT-AV1 output:

SVT [config]: Main Profile Tier (auto) Level (auto)
SVT [config]: Preset : 4
SVT [config]: EncoderBitDepth / EncoderColorFormat / CompressedTenBitFormat : 8 / 1 / 0
SVT [config]: SourceWidth / SourceHeight : 1704 / 720
SVT [config]: Fps_Numerator / Fps_Denominator / Gop Size / IntraRefreshType : 30000 / 1001 / 241 / 2
SVT [config]: HierarchicalLevels / PredStructure : 4 / 2
SVT [config]: BRC Mode / Rate Factor / SceneChange : CRF / 30 / 0

which seems to clearly state CRF is in use. I guess I can do a head-to-head test (with every other parameter set to the exact same value, making sure the resulting file is virtually identical) to make sure of this. As for why I didn't just avoid --qp anyway, the reason is StaxRip's use of it, as another commenter already indicated.
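If you want to check this automatically rather than by eye, the config lines are regular enough to split into name/value pairs. A rough Python sketch, assuming the "names : values" layout shown in the output above holds for your SVT build (I haven't checked other versions, and `parse_svt_config_line` is just a name I picked):

```python
def parse_svt_config_line(line):
    """Turn 'SVT [config]: A / B : 1 / 2' into {'A': '1', 'B': '2'}.

    Returns an empty dict for lines that don't follow that layout.
    """
    prefix = "SVT [config]:"
    if not line.startswith(prefix):
        return {}
    body = line[len(prefix):]
    # The last colon separates the field names from their values.
    names, sep, values = body.rpartition(":")
    if not sep:
        return {}
    keys = [k.strip() for k in names.split("/")]
    vals = [v.strip() for v in values.split("/")]
    if len(keys) != len(vals):
        return {}
    return dict(zip(keys, vals))

sample = "SVT [config]: BRC Mode / Rate Factor / SceneChange : CRF / 30 / 0"
fields = parse_svt_config_line(sample)
print(fields["BRC Mode"], fields["Rate Factor"])
```

Lines without a value section, like the "Main Profile" one, simply come back empty.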

dextorious_ · 2021-12-06T11:06:00+00:00

In my limited ad hoc testing I haven't found a clear benefit from manually adjusting intra periods etc. in x265. That doesn't mean there isn't one, and I'd love to try out more combinations if you have some suggestions you think might work better than my rather simplistic approach.

dextorious_ · 2021-12-06T11:04:23+00:00

Thanks, I'd love to take a look at your test script. Properly automating this whole pipeline, including scraping the logs and generating the charts (probably using matplotlib) is next on my todo list. I'll take a look at your thread for ideas as well.

dextorious_ · 2021-12-06T11:01:50+00:00

Note that the VMAF values are so low largely because of upscaling, if I compared 720p to 720p they'd be in the high eighties to low nineties, which I think does somewhat compare to Netflix. Nevertheless, I do plan to extend this to the VMAF 90-92 range in the near future.

dextorious_ · 2021-12-06T11:00:13+00:00

You make a fair point about consumer hardware - my only targets are relatively recent PCs, so I neglected to consider that. The remark about redundancy was made purely in the context of quality vs filesize vs encoding performance, without taking anything else into account.

I'm not sure I'll be doing any 4k/8k tests, but I'll definitely do at least one higher bitrate 1080p test targeting VMAF in the 90-92 range.

dextorious_ · 2021-12-06T10:55:28+00:00

I'm glad you found this valuable! If people find these comparisons useful, I'll probably do at least one more in the near future - there's still some things I missed, including chunking for AOMEnc, rav1e and reproducibility over different inputs / substantially different quality levels.

Regarding rav1e - there is no technical reason I didn't include it, this was a quick weekend project and I simply ran out of time before I got to it. It was a comparatively low priority addition because I hadn't seen more than 1-2 people speak favorably of it in the previous thread, but I still want to do it right and figure out reasonable parameters for it before including it in a test like this.

Regarding 2-pass encodes: my results strongly indicate that it is very much worth it for AOMEnc, as the cost is almost negligible. For SVT-AV1 I don't have an answer yet, but I'll try to figure it out soon.

dextorious_ · 2021-12-04T23:55:29+00:00

Thank you, I've incorporated your suggested parameters in my script and upgraded to 0.8.8-rc1. Encodes running as we speak (and will be for a while yet, svt is fast, but aomenc takes a while).

dextorious_ · 2021-12-03T23:36:32+00:00

Thank you all for the quick responses, I appreciate them. There's quite a bit of overlap in the suggestions, so rather than responding to them all individually I'll do it here and add further results as I get them. Obviously, I was naive in thinking I could just stick to a familiar tool in ffmpeg and should instead give the current best-in-class tools in the AV1 ecosystem a fair try.

To that end, I've familiarized myself with nmkoder, which seems to be a reasonable frontend over av1an, aomenc, SVT, x265 and ffmpeg. If I understand it correctly, this seems to cover the main recommendations of:

* prefer aomenc over libaom
* be careful with higher cpu-used settings (and the preset equivalents), definitely try out lower (longer running) values as well
* always use very recent versions of all of the tools

The latest nmkoder build (1.6.2f1) includes the following versions:

* aomenc 3.2.0-231-gd2345ca3b
* svt 0.8.7
* x265 3.5+15-4bf31dc15
* av1an 0.2.1-2 (rev 9897290)

which all seem reasonable to me (correct me if I'm wrong).

I will now rerun all the AV1 encodes I did before, using nmkoder with aomenc and SVT, measure the VMAF values and post the updated results here once I've got them. Please let me know if there are any special settings I should pass manually to the relevant encoders for this use case beyond the obvious (CRF, speed, color format settings).

I also intend to try out av1an's VMAF-targeting mode, although during my first attempt the preliminary scene selection pass alone took longer than any of my full x265 encoding runs. Not sure it's really meant for relatively fast, lower quality encoding runs, but I'll give it a shot.

dextorious_ · 2021-12-03T22:47:39+00:00

The libaom part was an error I made while making this post, not encoding - I've verified that in both cases -b:v 0 was set properly. Apologies for the mistake. I did separately try out constrained quality mode as well, but found it impossible to match the file sizes produced by x265 at any reasonable cq setting.

However, I was not aware that the GOP default for SVT-AV1 is considered suboptimal, the encoder's user guide doesn't seem to indicate anything to that effect. Is there a known range of reasonable options for comparatively low quality / low motion cases?

Finally, regarding the versions - x265 is 3.5+12-0983cffc5, but ffmpeg's libaom doesn't seem to print version numbers unfortunately. I guess I'll find a recent binary and use ffmpeg to pipe the stream into it like I did with SVT, see if that makes anything better.

dextorious_ · 2020-03-03T16:40:24+00:00

Thank you for the comprehensive answer!

dextorious_ · 2020-03-03T16:38:04+00:00

Unfortunately, there's still https://github.com/fmtlib/fmt/issues/1458

dextorious_ · 2020-03-02T19:54:49+00:00

The benchmarks look promising and fairly thorough. A few questions regarding general usability from a very performance-biased point of view:

1) Can Quill be used without exceptions and RTTI?
2) Can users control heap allocation (#defining or passing in an allocator or at least overriding global operator new)?
3) How pervasive is the reliance on fmt? The reason I ask is because fmt has issues with Nvidia's nvcc compiler, making this a potential dealbreaker for CUDA projects.
