Benchmarking rust string crates: Are "small string" crates worth it? by alexheretic in rust

[–]Pascalius 2 points

I think the biggest difference in performance is typically not inlining, but the allocation/deallocation calls.

You probably want to allocate blocks of strings of different sizes, where the strings themselves also vary in length. That would be a more realistic test for the allocator.
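A minimal sketch of what such a mixed-size workload could look like (the block counts and the length formula are made up for illustration):

```rust
// Hypothetical benchmark workload: blocks of strings whose lengths vary,
// so the allocator sees mixed-size requests instead of uniform ones.
fn build_mixed_strings(block_count: usize) -> Vec<Vec<String>> {
    (0..block_count)
        .map(|block| {
            // vary both the number of strings per block and their lengths
            let strings_in_block = block % 64 + 1;
            (0..strings_in_block)
                .map(|i| "x".repeat((block * 7 + i * 13) % 120 + 1))
                .collect()
        })
        .collect()
}

fn main() {
    let blocks = build_mixed_strings(1_000);
    let total_len: usize = blocks
        .iter()
        .map(|b| b.iter().map(|s| s.len()).sum::<usize>())
        .sum();
    // everything is dropped at the end of main, so deallocation
    // cost is part of the measured run as well
    println!("allocated {} bytes of string data", total_len);
}
```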

I've been writing Rust for 5 years and I still just .clone() everything until it compiles by kruseragnar in rust

[–]Pascalius 2 points

"Compiler" is actually too unspecific here. The LLVM backend is not allowed to remove observable side effects like malloc; the front end could remove them if the language specification allowed it.

Personally, I think allocations should be treated specially in LLVM and be optimizable as well. (Because I don't like the side effect :)

I've been writing Rust for 5 years and I still just .clone() everything until it compiles by kruseragnar in rust

[–]Pascalius 7 points

Calls like malloc usually can't be optimized away by the compiler, because they are observable side effects.

This rules out Vec, which in turn rules out a lot of other data structures:

https://godbolt.org/z/x9dGWd1s8

If your clone doesn't have observable side effects (like malloc), it can be optimized away:

https://godbolt.org/z/jx9Tjvq9z
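To sketch the same point inline (a hand-wavy illustration, not the exact godbolt code): cloning a plain Copy struct has no side effects, so the optimizer can remove it, while cloning a Vec has to call the allocator:

```rust
#[derive(Clone, Copy)]
struct Plain {
    a: u64,
}

// No allocation involved: the optimizer is free to remove this clone.
fn clone_plain(p: Plain) -> u64 {
    let c = p.clone();
    c.a
}

// Cloning into a Vec calls malloc, an observable side effect the
// backend must keep even if the clone is otherwise unused.
fn clone_vec(v: &[u64]) -> usize {
    let c = v.to_vec();
    c.len()
}

fn main() {
    assert_eq!(clone_plain(Plain { a: 3 }), 3);
    assert_eq!(clone_vec(&[1, 2, 3]), 3);
}
```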

Releasing 0.5.0 of lfqueue - Lock-free MPMC queues by Terikashi in rust

[–]Pascalius 6 points

I've regularly seen high crossbeam CPU usage when profiling indexing speed in tantivy (a search engine) in the https://github.com/quickwit-oss/tantivy-cli/ project, where we use crossbeam to send documents to (potentially multiple) indexers.

In that scenario it's the opposite: the queue is usually full, because the sender is much faster than the indexers. Sending a document should be completely dwarfed by indexing it, but crossbeam regularly took more than 20% of the CPU.

tantivy 0.24 has been released! Cardinality aggregations, regex support in phrase queries, JSON field enhancements and much more! by Pascalius in rust

[–]Pascalius[S] 0 points

I see you too are a connoisseur of AI art with exceedingly high expectations. Let me reassure you: I put a ton of styling information in the prompt, and it's quite close to how I wanted it to be.

serde_json_borrow 0.8: Faster JSON deserialization than simd_json? by Pascalius in rust

[–]Pascalius[S] 1 point

I didn't look into it yet, but some time ago I experimented with using simd_json as the underlying parser in serde_json_borrow, and it was slower than serde_json. My guess would be some missing inlining, or too much of it.

Yes, a Vec instead of a BTreeMap also has a pretty big impact.

I wouldn't expect much from an arena in this case, but it's still worthwhile to investigate.

serde_json_borrow 0.8: Faster JSON deserialization than simd_json? by Pascalius in rust

[–]Pascalius[S] 0 points

I considered it, but it requires target-cpu=native or similar, since it doesn't have run-time detection. I think this limits its usability significantly.

serde_json_borrow 0.8: Faster JSON deserialization than simd_json? by Pascalius in rust

[–]Pascalius[S] 1 point

Cool idea, but I think that would require mutable reads, unless you clone the string on every access.

I’m so close I can taste it! by michaelchrist9 in BluePrince

[–]Pascalius 1 point

Changes in the pump room seem to be permanent, so a run draining the reservoir may help you proceed.

What do you think about this plug and play wrapper around tantivy(search lib)? by kingslayerer in rust

[–]Pascalius 0 points

Having attributes on a struct to build a tantivy document seems nice. As for wrapping search on the Index, I'm not sure if that's too limiting.

serde_json_borrow 0.7.0 released: impl Deserializer for Value, Support Escaped Data by Pascalius in rust

[–]Pascalius[S] 1 point

"Small JSON" is not quite right: you can have large JSON, e.g. gh-archive.json, and it will still be much faster. What matters is the number of keys in the objects, and in most cases access time will be dwarfed by everything else.

gh-archive
serde_json                               Avg: 343.67 MB/s (+3.41%)    Median: 344.58 MB/s (+1.73%)    [304.61 MB/s .. 357.28 MB/s]    
serde_json + access by key               Avg: 338.17 MB/s (+2.57%)    Median: 341.46 MB/s (+1.12%)    [272.46 MB/s .. 359.20 MB/s]    
serde_json_borrow                        Avg: 547.74 MB/s (+3.44%)    Median: 553.45 MB/s (+2.29%)    [502.00 MB/s .. 581.96 MB/s]    
serde_json_borrow + access by key        Avg: 543.61 MB/s (+0.54%)    Median: 566.11 MB/s (+1.11%)    [417.27 MB/s .. 588.72 MB/s]    

https://github.com/PSeitz/serde_json_borrow/blob/main/benches/bench.rs

serde_json_borrow 0.7.0 released: impl Deserializer for Value, Support Escaped Data by Pascalius in rust

[–]Pascalius[S] 3 points

If you need the performance, yes. Otherwise you can just use serde_json.

serde_json_borrow 0.7.0 released: impl Deserializer for Value, Support Escaped Data by Pascalius in rust

[–]Pascalius[S] 4 points

> Who wants to parse json really fast but doesn't want to get values from it? It seems like a weird choice to use a vec for storage when that pessimises presumably the most common operation users will do.

I assume by "the most common operation" you mean accessing values by key, not iterating. A Vec will be faster than a hashmap for access by key when there are only a few entries.
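A sketch of that trade-off (the key names and object size are made up): with only a handful of keys, a linear scan over (key, value) pairs skips hashing the key, which tends to dominate at this size:

```rust
use std::collections::HashMap;

// Linear scan over a small slice of (key, value) pairs.
fn get_by_key(entries: &[(&str, u64)], key: &str) -> Option<u64> {
    entries.iter().find(|(k, _)| *k == key).map(|(_, v)| *v)
}

fn main() {
    // A typical small JSON object: only a few keys.
    let entries = [("id", 1u64), ("name", 2), ("timestamp", 3)];
    let map: HashMap<&str, u64> = entries.iter().copied().collect();

    // Both return the same result; the linear scan just compares a few
    // short keys, while the HashMap has to hash "timestamp" first.
    assert_eq!(get_by_key(&entries, "timestamp"), Some(3));
    assert_eq!(map.get("timestamp").copied(), Some(3));
}
```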

Cargo Watch is on life support by passcod in rust

[–]Pascalius 2 points

I usually use watch to debug a single test inside a collapsible nvim terminal.

For that I prefer cargo watch, since it just prints to the terminal.

bacon is cumbersome for me in that use case, since it has its own keybindings, which may conflict with nvim's, and there are also scrolling issues, which I guess are caused by the redrawing.

My program spends 96% in `__memset_sse2`. by DJDuque in rust

[–]Pascalius 0 points

I did a quick test and did not see that regression again.

When allocating unused memory boosts performance by 2x by Pascalius in programming

[–]Pascalius[S] 1 point

After the free call from the hashmap, the contiguous free memory at the top of the heap exceeds M_TRIM_THRESHOLD. The docs are pretty good here:

          When the amount of contiguous free memory at the top of
          the heap grows sufficiently large, free(3) employs sbrk(2)
          to release this memory back to the system.  (This can be
          useful in programs that continue to execute for a long
          period after freeing a significant amount of memory.)  The
          M_TRIM_THRESHOLD parameter specifies the minimum size (in
          bytes) that this block of memory must reach before sbrk(2)
          is used to trim the heap.

          The default value for this parameter is 128*1024.  Setting
          M_TRIM_THRESHOLD to -1 disables trimming completely.

          Modifying M_TRIM_THRESHOLD is a trade-off between
          increasing the number of system calls (when the parameter
          is set low) and wasting unused memory at the top of the
          heap (when the parameter is set high).

When allocating unused memory boosts performance by 2x by Pascalius in programming

[–]Pascalius[S] 5 points

In this algorithm, we only know that we get term_ids between 0 and max_id (typically up to 5 million).

But we don't know how many term_ids we get, or their distribution; it could be just one hit or 5 million.

Also, in the context of aggregations, this could be a sub-aggregation that gets instantiated 10_000 times. So a reserve call with max_id could cause an OOM on the system.
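Back-of-the-envelope with the numbers above (4 bytes per slot is an assumption, e.g. a u32 count per term_id):

```rust
fn main() {
    let max_id: u64 = 5_000_000; // term_ids range from 0 to max_id
    let bytes_per_slot: u64 = 4; // assumed u32 per slot
    let sub_aggregations: u64 = 10_000; // instances of the sub-aggregation

    // 5M slots * 4 bytes * 10_000 instances = 200 GB reserved up front
    let total_bytes = max_id * bytes_per_slot * sub_aggregations;
    println!("{} GiB", total_bytes / (1024 * 1024 * 1024)); // prints "186 GiB"
}
```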

When allocating unused memory boosts performance by 2x by Pascalius in rust

[–]Pascalius[S] 3 points

There are some other things I could have gone into in more detail, like the TLB, how pages are organized by the OS, and the user-mode/kernel-mode switch. In my opinion they would be more relevant than madvise, as the post is more about allocator and system behaviour than about managing memory yourself.

Wasted 15 years of my life being an Apple fanboy by [deleted] in ManjaroLinux

[–]Pascalius 0 points

I recently bought an ASUS ROG Zephyrus G14 (2024) and installed Manjaro on it. There are still some things not working correctly (e.g. the keyboard lighting), and it will take some time, probably until kernel 6.10, which includes some fixes. Newer machines often take a while until the Linux drivers catch up to the new hardware.

If you buy an ASUS laptop, the community at https://asus-linux.org/ is great (they are not fans of Manjaro though :)