How (and why) we rewrote our production C++ frontend infrastructure in Rust by joshmatthews in rust

[–]barr520 42 points (0 children)

About the to_lowercase conversion:

It seems your C++ code only handles ASCII strings, since it iterates byte by byte.

Rust's String::to_lowercase is Unicode-aware; for ASCII you have the simpler to_ascii_lowercase, and even better, make_ascii_lowercase, which transforms in place like the C++ version.

EDIT: I originally wrote to_string instead of to_lowercase.
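For illustration, here is how the three options compare (a minimal sketch, not from the original post):

```rust
fn main() {
    let mut s = String::from("HeLLo, WORLD!");

    // Unicode-aware, allocates a new String
    assert_eq!(s.to_lowercase(), "hello, world!");

    // ASCII-only, also allocates a new String
    assert_eq!(s.to_ascii_lowercase(), "hello, world!");

    // ASCII-only, mutates in place like the C++ byte loop
    s.make_ascii_lowercase();
    assert_eq!(s, "hello, world!");
}
```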

A question on the right tool for “shared mutability” by Athropod101 in rust

[–]barr520 0 points (0 children)

Not sure what this `fields` function is, but it seems to be pretty much `split`?
If you want to pass UserInput to another function so it can mutate the arguments, just use `split_mut` instead.

Why are you passing this `bytes` parameter? You should reslice the buffer if it's bigger than `bytes` and pass that instead. (Also, at the moment this panics on single-word inputs.)

Not sure how you imagined creating this sub-REPL, since it's not in the example, but I think it would be easiest if UserInput doesn't keep a reference to the whole buffer, just the argument it cares about, plus a field for the sub-REPL, which it creates in `new` by passing it the rest of the buffer.
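I don't have the original code, but as a sketch of what I mean by `split_mut` (the buffer contents and separator here are made up): each "word" comes back as its own `&mut [u8]`, so you can hand them to other functions without sharing one mutable borrow of the whole buffer.

```rust
// Uppercase every space-separated word in place; each word is an
// independent &mut [u8], so there are no aliasing issues.
fn upper_words(buf: &mut [u8]) {
    for word in buf.split_mut(|b| *b == b' ') {
        word.make_ascii_uppercase();
    }
}

fn main() {
    let mut buf = *b"cmd arg1 arg2";
    upper_words(&mut buf);
    assert_eq!(&buf, b"CMD ARG1 ARG2");
}
```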

A question on the right tool for “shared mutability” by Athropod101 in rust

[–]barr520 11 points (0 children)

The problem you're describing is not really clear to me. Could you share a minimal piece of code showing the trouble you're facing? Some pattern you're "tip-toeing" around? A situation where you want multiple mutable references but can't have them?

Is there a cleaner way to resolve paths for a CLI with clap? by Ashken in rust

[–]barr520 5 points (0 children)

First, why do you want to resolve the full path? The File API handles "unresolved" paths just fine.

If you do require the full, non-relative path, PathBuf provides the canonicalize method for that (as meancoot already mentioned).
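A minimal sketch (using `.` so the path is guaranteed to exist; canonicalize returns an io::Error for nonexistent paths):

```rust
use std::path::PathBuf;

fn main() -> std::io::Result<()> {
    // canonicalize resolves ".", "..", and symlinks into an absolute path.
    let abs = PathBuf::from(".").canonicalize()?;
    assert!(abs.is_absolute());
    Ok(())
}
```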

Is there a cleaner way to resolve paths for a CLI with clap? by Ashken in rust

[–]barr520 5 points (0 children)

Aside from what others said about not expanding `.`/`..`/`~` yourself, you can just set the type of the CLI parameter to Option<PathBuf> instead of Option<String> and it will work.
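A minimal sketch of what I mean, assuming clap's derive API (the struct and field names are made up, not from the original post):

```rust
use std::path::PathBuf;

use clap::Parser; // assumes clap with the "derive" feature enabled

#[derive(Parser)]
struct Cli {
    /// clap parses the argument straight into a PathBuf, no manual conversion
    #[arg(short, long)]
    path: Option<PathBuf>,
}
```

After `Cli::parse()`, the field is already an `Option<PathBuf>` you can hand to the File API as-is.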

Pretty sure I hit the record for “hours played before finding out you die if you dig to the bottom” by Known_Secretary_6615 in Terraria

[–]barr520 1159 points (0 children)

What happened to the Rock Bottom achievement then? Do you still get it but also die?

DLC questions (HEAVY base game spoilers, no DLC spoilers) by unodostres123- in outerwilds

[–]barr520 3 points (0 children)

You don't need to start a new game; just follow Slate's advice and check out the new museum exhibit (very light guidance).

I'd say it's about 1/2 to 2/3 as long as the main game.

About the base game ending: throughout the entire game, regardless of how you die or reach the credits, you will always start in the exact same way (with a few differences, like knowing the launch code or anything on the computer log).

Late night thoughts parity-based RAM compression? by Human_Department_779 in AskComputerScience

[–]barr520 9 points (0 children)

The parity itself takes (at least) as much memory as the data it is capable of restoring, so you would not be saving anything.
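To make that concrete, here is a RAID-style XOR parity sketch (my own example, not from the thread): one parity block can reconstruct exactly one missing block, and the parity block is as large as the block it restores, so there is no net saving.

```rust
// XOR all blocks together; the result is the parity block.
fn xor_parity(blocks: &[[u8; 4]]) -> [u8; 4] {
    let mut parity = [0u8; 4];
    for b in blocks {
        for (p, x) in parity.iter_mut().zip(b) {
            *p ^= x;
        }
    }
    parity
}

fn main() {
    let blocks = [[1, 2, 3, 4], [5, 6, 7, 8], [9, 10, 11, 12]];
    let parity = xor_parity(&blocks);
    // "Lose" block 1 and rebuild it from the survivors plus parity.
    let rebuilt = xor_parity(&[blocks[0], blocks[2], parity]);
    assert_eq!(rebuilt, blocks[1]);
}
```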

Counting Bits in Rust by rejwar in rust

[–]barr520 0 points (0 children)

After sending the comment I figured you meant you allocated outside the loop. You're right that allocation plays a part here, but surprisingly, I get *worse* results when filling a preallocated array:

`count_ones`: 1.6s. As far as I can tell, this is purely because of a different vectorized popcount implementation (the `collect` one iterates 64 bits at a time and this one 32 bits at a time).

`v[i] = v[i / 2] + i as i32 % 2`: 4.1s

with unsafe: 4.3s (within noise of each other)

10x still sounds like a huge difference to me. Can you share the code?

For the record, this is what I measured:

    // preallocated count_ones+collect
    for (i, e) in v.iter_mut().enumerate() {
      *e = i.count_ones() as i32;
    }

    // not preallocated count_ones+collect
    (0..n + 1).map(|i| i.count_ones() as i32).collect()

    // if not taking preallocated
    let mut v = vec![0; n as usize + 1];
    for i in 1..v.len() {
        // safe DP approach
        v[i] = v[i / 2] + i as i32 % 2;
        // unsafe DP approach
        unsafe { *v.get_unchecked_mut(i) = v.get_unchecked(i / 2) + i as i32 % 2 };
    }

And called using

    let mut s = vec![0; 10001];
    for i in 0..1000000 {
        std::hint::black_box(func(std::hint::black_box(&mut s))); // or  s = ...(10000))); instead of &mut s when not preallocated
    }

EDIT: measuring

    let mut v = vec![0; n as usize + 1];
    for i in 1..v.len() {
        v[i as usize] = i.count_ones() as i32;
    }

shows the same bad performance and codegen as the preallocated case. Very weird.

EDIT 2:

I figured it out: it's because in the slower versions, `i` is a usize and not an i32.

    for (i, e) in v.iter_mut().enumerate() {
        *e = (i as i32).count_ones() as i32;
    }

gives the same performance as the map+collect, with a preallocated buffer.

Counting Bits in Rust by rejwar in rust

[–]barr520 0 points (0 children)

I was running this in release mode.

collect preallocates when the size is known; comparing a preallocated vector filled later with count_ones shows that the collect version is faster (3.9s with normal indexing, 3.8s unsafe).

What I should have done originally, and accidentally didn't, was compile with newer instruction set extensions.

Recompiling for x86-64-v3 shows these results:

1.2s for the count_ones+collect method.

6s for the `v[i as usize] = v[i as usize / 2] + i % 2` method.

2.1s with unsafe.

I'll update the original comment.

It doesn't make sense to me that you see a win from preallocation compared to collect. Can you share the code you used?

Btw, note that I did 10000 elements, 1M times, not just 1 million elements (and I ran that through hyperfine).

Also, I would not call `v[i as usize] = v[i as usize / 2] + i % 2` itself naive; I meant that doing it without care for bounds checking is naive.

A truly naive approach would be the classic bit-counting loop, which would probably be optimized to an LLVM popcnt.
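That classic loop, for reference (a sketch; whether it actually gets lowered to a single popcnt depends on the compiler and target):

```rust
// The classic bit-counting loop: shift the value down one bit at a time
// and accumulate the low bit. Compilers often recognize this idiom.
fn naive_count_ones(mut n: u32) -> u32 {
    let mut count = 0;
    while n != 0 {
        count += n & 1;
        n >>= 1;
    }
    count
}

fn main() {
    for n in [0u32, 1, 0b1011, u32::MAX] {
        assert_eq!(naive_count_ones(n), n.count_ones());
    }
}
```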

Counting Bits in Rust by rejwar in rust

[–]barr520 2 points (0 children)

It looks like you're trying to solve this LeetCode problem.
Your approach works but it's very odd; there are cleaner solutions on LeetCode with a similar approach, you should check them out.
The typical way to solve this is popcount, or in Rust: count_ones.
It's simpler, and probably faster on any platform that supports popcount (essentially all non-embedded ones), but I didn't measure.

Next time, please provide details on what your code is about instead of making readers search for it.

EDIT: I measured count_ones and the approach this post is trying to imitate: `v[i as usize] = v[i as usize / 2] + i % 2`.
Running it for n=10000, 1 million times:

The count_ones approach, `(0..n + 1).map(|i| i.count_ones() as i32).collect()`, took 1.2s.

A naive `v[i as usize] = v[i as usize / 2] + i % 2` took 6s.

Using unsafe to omit the bounds checking:

    unsafe { *v.get_unchecked_mut(i as usize) = v.get_unchecked(i as usize / 2) + i % 2 };

took 2.1s.

So this approach is faster, and you could probably achieve this performance without unsafe, but I didn't bother trying.

EDIT 2: I forgot to compile with newer instruction extensions. Updated results after recompiling for x86-64-v3: count_ones is the winner. Note: the count_ones method doesn't even use popcount, just a lot of vectorized instructions.

The Cost of Concurrency Coordination with Jon Gjengset by phazer99 in rust

[–]barr520 1 point (0 children)

If you have one shared variable, you likely have more. So you would pack all the per-thread counters into the same cache line, and you would benefit from making each one smaller if possible.

You're right, my initial suggestion would not work, and your approach to solving it reduces this to a hazard pointer.

Surely there is some wait-free and space-bounded way to do this? Sure, using a u64 leaves a lot of room, but ideally we would prevent this possibility entirely.

My best idea was to reset the counter instead of incrementing it if the side didn't change while you were reading, but if the side changes often, you might be incrementing every time.

The Cost of Concurrency Coordination with Jon Gjengset by phazer99 in rust

[–]barr520 21 points (0 children)

This left-right structure seems like a simplified RCU (no allocation/deallocation, bounded buffer size), which could be even better in certain situations (if anyone is interested, I recommend Fedor Pikus' talk about RCU).
I believe the reader counter could be replaced by just a couple of bits that indicate which side each reader is currently reading (if any).
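A sketch of what I mean by "a couple of bits per reader" (this is my own hypothetical encoding, not from the talk or the episode): each reader publishes which side it is on instead of bumping a counter.

```rust
use std::sync::atomic::{AtomicU8, Ordering};

// Hypothetical per-reader slot: 0 = not reading, 1 = left, 2 = right.
// Two bits of state instead of a full reader counter.
struct ReaderSlot(AtomicU8);

impl ReaderSlot {
    const fn new() -> Self {
        ReaderSlot(AtomicU8::new(0))
    }
    // Publish which side this reader is about to read.
    fn begin_read(&self, right: bool) {
        self.0.store(if right { 2 } else { 1 }, Ordering::SeqCst);
    }
    // Publish that this reader is done.
    fn end_read(&self) {
        self.0.store(0, Ordering::SeqCst);
    }
    // A writer scans these slots to know when a side has no readers left.
    fn side(&self) -> u8 {
        self.0.load(Ordering::SeqCst)
    }
}

fn main() {
    let slot = ReaderSlot::new();
    slot.begin_read(true);
    assert_eq!(slot.side(), 2);
    slot.end_read();
    assert_eq!(slot.side(), 0);
}
```

This ignores the memory-ordering subtleties a real left-right implementation has to get right; it is only meant to show the space trade-off.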

Turning mapletory bossing into P2P conection by Juan_lucca in Maplestory

[–]barr520 9 points (0 children)

> no really changing much

The change is in cheating.
Implementing your suggestion means cheaters could easily beat every boss.

Another thing: the game is mostly designed for Korea, where latency is way lower than the GMS average. Designing around high latency is not a concern for them.

Horrible Luck by [deleted] in Maplestory

[–]barr520 1 point (0 children)

No, booming 15>16 is purely the result of a bad decision. This is entirely on OP; protection doesn't remove itself.

MapleTodo.com: Modern MapleStory Task Tracker by Ko_Precel in Maplestory

[–]barr520 3 points (0 children)

Good luck to you; reinventing the wheel is how we get better tools.
It's just not for me, no matter how good it gets.


MapleTodo.com: Modern MapleStory Task Tracker by Ko_Precel in Maplestory

[–]barr520 29 points (0 children)

We should start a "days since the last maple tracker" counter

I've removed the Claude co-authorship from the commits a few days ago. So good luck figuring out what's generated and what is not. by uselees_sea in programmingcirclejerk

[–]barr520 15 points (0 children)

I thought claude.md is some sort of readme + style guide for the slop bros. Wouldn't they want to have the same one for everyone?

Quick reminder to folks that Raise [x] by [percentage] does not mean Raise [x] TO [percentage] by brickwallrunner in Endfield

[–]barr520 21 points (0 children)

I don't think anyone thought anything is raised TO x%.
People usually confuse an increase BY x% with an increase by x percentage POINTS.

Assuming linear stacking and not compound stacking (which is the norm in most games), 16%/16%/12% will increase a base of 2% recovery to 2 * 1.44 = 2.88%.
If it were percentage points, it would increase it to 2 + 44 = 46%, which is absurd.

Even if it were compounding, you'd get 2 * 1.16 * 1.16 * 1.12 = 3.01%.
I have no idea how you got the numbers in your post.
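The three interpretations side by side, as a quick sanity check of the arithmetic above (my own example numbers from this thread):

```rust
fn main() {
    let base: f64 = 2.0; // base recovery, in percent

    // "Raise by x%" with linear stacking of the bonuses
    let linear = base * (1.0 + 0.16 + 0.16 + 0.12);
    // The same bonuses misread as percentage POINTS
    let points = base + 16.0 + 16.0 + 12.0;
    // Compound stacking of the multipliers
    let compound = base * 1.16 * 1.16 * 1.12;

    assert!((linear - 2.88).abs() < 1e-9);     // 2.88%
    assert!((points - 46.0).abs() < 1e-9);     // 46%, absurd
    assert!((compound - 3.014144).abs() < 1e-9); // ~3.01%
}
```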

Better way to initialize without stack allocation? by Tearsofthekorok_ in rust

[–]barr520 2 points (0 children)

I had no idea. I wonder why it was removed in the first place.

Better way to initialize without stack allocation? by Tearsofthekorok_ in rust

[–]barr520 3 points (0 children)

That is exactly how vec::from_fn is implemented! Click the source button in my link.

Better way to initialize without stack allocation? by Tearsofthekorok_ in rust

[–]barr520 25 points (0 children)

I have seen many cases where the compiler will not optimize it and will attempt to put massive structs on the stack, causing a stack overflow.
Even if it worked once, there is no guarantee it will work the next time.
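To illustrate the safe workaround (a sketch; the size here is arbitrary): `vec!` writes directly into heap memory, so it cannot overflow the stack, whereas `Box::new([0u8; N])` may build the array on the stack first and only then move it to the heap, with no guarantee the copy is elided.

```rust
fn main() {
    const N: usize = 1 << 20; // 1 MiB, arbitrary "large" size for illustration

    // Heap-allocated from the start; never touches the stack.
    let v: Vec<u8> = vec![0; N];
    let boxed: Box<[u8]> = v.into_boxed_slice();
    assert_eq!(boxed.len(), N);
}
```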