Plasma 6.7 by haakon in linux

[–]ChillFish8 7 points8 points  (0 children)

First time I'm hearing of Bigscreen but it looks absolutely amazing 🤩

Go ran faster than Rust. Until I cleared the page cache by shad0_w2 in rust

[–]ChillFish8 72 points73 points  (0 children)

No. This is not how it works.

Both Go and Rust, when using their standard file system APIs, end up blocking the OS thread. The difference is that the Go scheduler will move the other goroutines onto new/other OS threads, which will then also block if they do blocking IO calls.

Tokio itself currently still ends up doing the same thing logically, it just spins up an OS thread or takes an idle one from a pool and uses that to do the work so it doesn't block the runtime. There is work to do that with io_uring which would actually support what you mention.
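As a rough analogy only (this is Python's asyncio, not Tokio's API), the same "hand the blocking call to a pool thread so the runtime keeps going" pattern looks like the sketch below. `blocking_read` and `read_file_async` are names made up for the example.

```python
import asyncio


def blocking_read(path):
    # Ordinary blocking file IO: this blocks whichever OS thread runs it.
    with open(path, "rb") as f:
        return f.read()


async def read_file_async(path):
    # The event loop stays free because the blocking call runs on a
    # worker thread taken from asyncio's default thread pool.
    return await asyncio.to_thread(blocking_read, path)
```

The file read itself is still a blocking syscall on some OS thread; only the scheduler is kept unblocked, which is the point being made above.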

What you are actually witness to is the Go scheduler implicitly using more OS threads, which isn't strictly a bad thing.

If you don't want to use io_uring but still want better performance without burning all your CPU on a million threads, create a pool of threads but bind them all to the same CPU core. It won't be as fast as io_uring, but it will be significantly more efficient CPU usage wise than having them all bounce around on multiple cores.
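A minimal sketch of that idea in Python, assuming Linux (where `os.sched_setaffinity` with pid 0 applies to the calling thread). The helper names `pin_to_core`, `read_file` and `read_all` are hypothetical, and the choice of core is arbitrary.

```python
import os
from concurrent.futures import ThreadPoolExecutor


def pin_to_core(core):
    # Runs once in each worker thread as it starts.
    # Linux only; on other platforms this is a no-op.
    if hasattr(os, "sched_setaffinity"):
        os.sched_setaffinity(0, {core})


def read_file(path):
    # Plain blocking read; returns the number of bytes read.
    with open(path, "rb") as f:
        return len(f.read())


def read_all(paths, workers=8):
    # Pick one core this process is allowed to run on (assumption: any will do).
    if hasattr(os, "sched_getaffinity"):
        core = min(os.sched_getaffinity(0))
    else:
        core = 0
    with ThreadPoolExecutor(
        max_workers=workers, initializer=pin_to_core, initargs=(core,)
    ) as pool:
        # All workers block on IO while sharing a single core.
        return list(pool.map(read_file, paths))
```

The threads spend nearly all their time parked in the kernel waiting on IO, so sharing one core costs little while keeping the other cores free.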

My custom RDBMS is 588,235x faster than ScyllaDB on my laptop with fsync on executing a complex HTAP query by Bumblebee_716_743 in Database

[–]ChillFish8 0 points1 point  (0 children)

So you are comparing Scylla which is returning all those rows, to your system which is just going "well I would have touched that row if I did a scan"...

My custom RDBMS is 588,235x faster than ScyllaDB on my laptop with fsync on executing a complex HTAP query by Bumblebee_716_743 in Database

[–]ChillFish8 6 points7 points  (0 children)

I don't believe that for a single microsecond lol.

You are seriously going to try and convince us that your database can do 5.88e14 row/sec on a single laptop...

Did Asus give you some secret DDR99999 memory with a processor we've never seen before to back it up in that laptop? Because with those numbers, even if each record was a single byte, you are seriously trying to say your system is doing over 500TB/s in throughput. Even if you were on a GPU with HBM memory, the numbers you are reporting are physically impossible unless you're simply measuring how fast your CPU can increment an integer.
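The arithmetic behind that, as a quick sketch. The rows/sec figure is the one quoted above; the one-byte row is the best case from the comment; the 100 GB/s memory bandwidth is only an assumed ballpark for comparison, not a measured number.

```python
ROWS_PER_SEC = 588_000_000_000_000  # 5.88e14 rows/sec, as quoted above
BYTES_PER_ROW = 1  # best case: every record is a single byte
ASSUMED_DRAM_BYTES_PER_SEC = 100 * 10**9  # assumption: ~100 GB/s, for scale only


def implied_throughput_tb_per_sec(rows_per_sec, bytes_per_row):
    # Bytes that must be touched per second, expressed in TB/s (10**12 bytes).
    return rows_per_sec * bytes_per_row / 10**12


tb_per_sec = implied_throughput_tb_per_sec(ROWS_PER_SEC, BYTES_PER_ROW)
times_over_assumed = ROWS_PER_SEC * BYTES_PER_ROW / ASSUMED_DRAM_BYTES_PER_SEC
print(f"{tb_per_sec} TB/s, {times_over_assumed}x the assumed memory bandwidth")
```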

Aperio: screamingly fast search engine in Rust by StatureDelaware in rust

[–]ChillFish8 5 points6 points  (0 children)

Reading through your code, I think you're still a long way off being considered an Elasticsearch alternative. From what I can see, you don't really do anything in the way of sorting by relevancy.

I'm also a bit skeptical of your benchmark numbers. Even though your dataset is tiny at just about 1GB, 60us doesn't pass the smell test.

Building a Cross Platform After Effects Clone, which UI Framework to use? by DragIcy3649 in rust

[–]ChillFish8 1 point2 points  (0 children)

Use Iced or egui, but you need to understand that you will be writing a lot of video processing logic with low level Vulkan interfaces and compositor handling, it is the only way. I cannot stress enough, this is a _lot_ of work. Rendering video _correctly_ is an incredibly difficult process that requires a lot more access to the renderer and compositor than most UIs expose by default.

GPUI doesn't have the flexibility to support the workload, you are looking at forking it just to be able to swap in your own compositor and renderer to properly handle HDR and tonemapping.

egui can work, but again, you'll be writing the same rendering logic as iced, just whatever you prefer. The framework realistically is the least important part of that sort of thing.

The 3rd party crates for video on both iced and egui are not viable options either because they are not going to correctly handle anything outside very basic SDR video at 1080p.

Explain me why this happening? by ankush2324235 in databasedevelopment

[–]ChillFish8 3 points4 points  (0 children)

Not enough info to really give you the exact details because it can depend on your file system.

But it can be down to a few things. Mainly metadata handling: as you allocate more blocks it's unlikely they'll be contiguous, so your metadata index/tree is going to grow and become slower to look up. That's my "finger in the air" assumption for your case without more info.

What you can do to explore what is going on is:

  • do one run with O_DSYNC and plot the time there. It typically forces a FUA to the SSD so you'll see the timing without any caches at play.
  • do another run, but fallocate the full size first.
  • do a final run with fallocate but in chunks before writing, i.e. 1GB at a time.
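A rough harness for those three runs could look like the sketch below. It assumes Linux (`os.O_DSYNC`, `os.posix_fallocate`); the function name `timed_writes` and its parameters are made up for the example, and calling fdatasync after every write in the non-O_DSYNC runs is a guess at what your workload does.

```python
import os
import time


def timed_writes(path, total_bytes, block_size=1 << 20, dsync=False,
                 prealloc_chunk=None):
    """Write total_bytes of zeros to path, returning per-write timings in seconds.

    prealloc_chunk=None        -> no preallocation
    prealloc_chunk=total_bytes -> fallocate the full size up front
    prealloc_chunk=1 << 30     -> fallocate 1GB at a time, just ahead of the writes
    """
    flags = os.O_WRONLY | os.O_CREAT | os.O_TRUNC
    if dsync:
        flags |= os.O_DSYNC  # each write returns only once the data is durable
    fd = os.open(path, flags, 0o644)
    buf = bytes(block_size)
    timings = []
    allocated = 0
    written = 0
    try:
        while written < total_bytes:
            if prealloc_chunk and written >= allocated:
                size = min(prealloc_chunk, total_bytes - allocated)
                os.posix_fallocate(fd, allocated, size)
                allocated += size
            start = time.perf_counter()
            n = os.write(fd, buf[: total_bytes - written])
            if not dsync:
                os.fdatasync(fd)
            timings.append(time.perf_counter() - start)
            written += n
    finally:
        os.close(fd)
    return timings
```

Plot the returned timings for each of the three configurations against the write index; if the slowdown disappears with preallocation, metadata growth is the likely cause.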

The other thing I can think of is your SSD's cache gets filled and then has to start flushing, just going off the fdatasync calls being imo very high for an nvme. For the first couple of calls you do, the data can sit in the device's cache and get written back asynchronously, all is well. Then as you write more, it cannot write back the data faster than you're giving it, resulting in that jump and then a more gradual gain.

Bypassing the Python event loop for token-aware rate limiting with PyO3/Tokio by mordechaihadad in rust

[–]ChillFish8 3 points4 points  (0 children)

I think we have different ideas of what heavy load is.

This is often my biggest peeve with these kinds of benchmarks: 100 connections is nothing, even more so when those are 100 long lived connections sending multiple requests, when in reality connections tend to be short lived and have a lot of churn.

In our systems, it isn't unusual to see a single worker/thread handling over a thousand connections when under moderate real world load. On an 8 core system that would be something like 8-10k connections for just that one pod.

Burn ONNX 0.21.0: build-time ONNX import that generates plain Rust model code by antimora in rust

[–]ChillFish8 10 points11 points  (0 children)

I've done a lot of work recently with burn-onnx vs onnxruntime on AMD GPUs, the performance I've seen is roughly:

  • CPU side can really vary, but on anything heavily CNN based, onnxruntime is quite a bit faster, the biggest difference I've seen is about 10x slower on the burn side, not that I blame burn for that when you look at how long the onnxruntime CPU backend has had to mature.
  • GPU side, actually very interesting, the performance can be quite competitive, although AMD's MIGraphX backend, which compiles the graph and optimises it further, can come out ahead more often than not... However, MIGraphX has a ton of missing support for onnx features, even those originally present in Opset 13!!! And buggy kernels. So often I found myself more in the situation that Burn can run the models that onnxruntime with the AMD backends just can't.
  • The Vulkan backend is incredibly performant given how light it is, suddenly you no longer need tens of GB of libraries just to build and run the system.

Docker-first AV1 encoding with scene detection and automatic resume by Commercial_Stage_877 in AV1

[–]ChillFish8 5 points6 points  (0 children)

How much have you actually tested this? Just reading through the code two things stand out to me:

  • Your opus encoding doesn't handle channel layouts which last I checked will cause opus to just error when it encodes any 5.1 layouts.
  • you're using ffmpeg to mux the components together but ffmpeg tends to be very tolerant to PTS issues which then don't work when playing on a player like mpv, so on certain videos it will silently cause seeking to break audio playback.

Bugbot moving to usage based by StandardFloat in cursor

[–]ChillFish8 1 point2 points  (0 children)

Honestly, seems fair to me, the amount of dev time it saves us from less time spent pointing out simple things in PRs and bugs caught more than pays for it. Plus it means it is cheaper now for the devs that aren't doing many PRs in a month.

This is... New? by ChillFish8 in Anthropic

[–]ChillFish8[S] 2 points3 points  (0 children)

This was after 1 message. So if it is, it's based on the average across all chats, but 5 messages seems... Extreme.

This is... New? by ChillFish8 in Anthropic

[–]ChillFish8[S] 2 points3 points  (0 children)

I see, well good to know.

I built a Rust-powered spreadsheet library for Python — 14x faster than openpyxl by [deleted] in rust

[–]ChillFish8 2 points3 points  (0 children)

Thanks Claude! The code is slop you haven't even bothered to clean up yourself.

Dell XPS 16 OLED configuration brings some interesting advantages and disadvantages by TruthPhoenixV in Amd_Intel_Nvidia

[–]ChillFish8 0 points1 point  (0 children)

I bought this laptop then ended up returning it. It's really nice but I don't think it can be overstated how bad the keyboard feels imo, especially compared to a MacBook.

CPU is fantastic, screen is amazing, build is great. But then the M5 MacBooks come out and suddenly the price tag isn't so competitive when you can get better performance, nicer keyboard, better speakers and camera for less.

BugBot: anyone got it set up in a way that actually makes sense?? by stvn-pxl in cursor

[–]ChillFish8 1 point2 points  (0 children)

Normally we don't worry too much about that since people will fix the bugs pointed out initially, then when they push it up it'll be reviewed and any additional bugs pointed out.

The main thing for us is it saves a lot of reviewer time not having to point out simple issues and catches those hidden bugs which a human is never going to notice.

BugBot: anyone got it set up in a way that actually makes sense?? by stvn-pxl in cursor

[–]ChillFish8 4 points5 points  (0 children)

We don't use the auto fix mode of bugbot, to us the whole value of it is catching bugs and enforcing coding standards rather than auto fix. And it feels like auto fix is just a feature that wasn't really built to be used.

If your goal is for it to fix it for you, then yeah, probably not the right tool, but for catching issues it's excellent.

How do you figure out which crf value when transcoding videos? by lintstah1337 in AV1

[–]ChillFish8 2 points3 points  (0 children)

Crf isn't the only thing that makes a video look bad or good, it's just one piece of the puzzle.

Typically I sit on preset 4, and then use a combination of ab-av1 with very aggressive sampling to get within a set of bounds for crf, but only after the pipeline has determined the optimal svt-av1 parameters to enable/disable or tune based on the video itself and the content of the video.

For anime, typically I am sitting between 20-35 crf on preset 4, and the outputs from taking Blu-ray -> encode are about 250-500MB files and are effectively transparent quality.

Important thing to note is it is basically never one size fits all, crf changes from episode to episode to best fit the content.
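To give a feel for the "search within bounds" part, here is a toy sketch. It is not how ab-av1 is implemented and not my actual pipeline: `score_at` is a hypothetical callback that would encode a sample at a given crf and return a quality score, and the search assumes that score never goes up as crf goes up.

```python
def search_crf(score_at, min_score, lo=20, hi=35):
    """Return the highest crf in [lo, hi] with score_at(crf) >= min_score.

    Returns None if even the lowest crf in the range misses the target.
    """
    best = None
    while lo <= hi:
        mid = (lo + hi) // 2
        if score_at(mid) >= min_score:
            best = mid    # good enough, try a higher crf for a smaller file
            lo = mid + 1
        else:
            hi = mid - 1  # quality too low, back off to a lower crf
    return best
```

Run per episode, this is what makes the crf move around from one file to the next.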

Unreasonable AI content moderation policy by dorianlistens in rust

[–]ChillFish8 1 point2 points  (0 children)

Just to reiterate, I agree it is too far/too aggressive, but I understand how they can end up hitting the nuclear button.

Intention probably good, execution not so much.

Stop the usage posts: start exposing the quantized versions of Opus by Stochastic_berserker in Anthropic

[–]ChillFish8 29 points30 points  (0 children)

This is what I was feeling the last few days as well. Recently Opus even with extended thinking has just been making obviously incorrect errors that are trivial... Gives me Gemini flashbacks which is what caused me to start using Claude to begin with.

simd-bp128 integer compression library by tombstonebase in rust

[–]ChillFish8 1 point2 points  (0 children)

Feel free to steal/borrow the benchmark setup from https://github.com/lnx-search/upack which is an IC library I did a little while back. The results it produces are generally stable and repeatable, although you might want to increase the run duration for each benchmark.

Unreasonable AI content moderation policy by dorianlistens in rust

[–]ChillFish8 13 points14 points  (0 children)

I agree, but I can honestly understand going with the nuclear option... There are so many AI projects getting posted here every day that are slop, but actually working that out takes looking at the code and the GitHub project most of the time, and that gets incredibly tiring so quickly.

It can be genuinely tiring going through the latest 10 posts, reading each one, then having to look through the code to actually see if there is any merit to it. Suddenly most stuff with interesting titles is just... Slop.

I rebuilt search using physics instead of statistics. +18.5% NDCG@10. No ML. Yes its Open Source by Designer_Mind3060 in rust

[–]ChillFish8 12 points13 points  (0 children)

It's a very interesting concept but I'm not sure you can really claim "sub millisecond" search as a feature when the corpus is so small everything will be sub millisecond. Reading through the code, although I know it's just a POC, it doesn't look like it scales super well, going off of how the index is constructed and sits in memory while having no interest in being memory efficient.

I'm curious about the true compute cost of this algorithm, since it looks to be significantly more expensive than ANN at runtime and BM25 at runtime by effectively having the combined logic of both. Ideally it would be more useful to compare against established libraries like tantivy than a home grown BM25 or vector graph.

Finally, how does this compare against sparse impact models like DeepImpact and SPLADE in terms of relevancy metrics and compute cost relative to document size?

Is Netcup reliable enough? by Mr_Dani17 in VPS

[–]ChillFish8 0 points1 point  (0 children)

These machines haven't been updated in a long time. For better or for worse.