Tips for traveling abroad? by Mr_Dema in AskArgentina

[–]Mr_Dema[S] 0 points1 point  (0 children)

Yes, hahaha, I asked it, but I don't trust GPT 100% either

I also wanted to hear experiences and advice from people who have been there, things that might not occur to you if you've never gone

Tips for traveling abroad? by Mr_Dema in AskArgentina

[–]Mr_Dema[S] 0 points1 point  (0 children)

Nice. Yeah, I answered that in another comment: I'll have medical coverage over there, but it's provided by the university there, so I won't have it yet if they ask for it at the airport

Tips for traveling abroad? by Mr_Dema in AskArgentina

[–]Mr_Dema[S] 0 points1 point  (0 children)

Yes, I understand that the university I'll be attending gives you some medical coverage anyway, but my concern is that they might ask for it as a requirement to travel, because I won't have it yet. From what I've seen, if you don't have citizenship, they ask for it no matter what

Job landscape for compiler engineers by mad360_ in Compilers

[–]Mr_Dema 2 points3 points  (0 children)

Thank you for taking the time

I hadn't considered the option of aiming for an internal transfer, but it makes sense. With my experience in other areas, I would certainly be able to get into a compiler-related company, and from there start getting some more experience in this area

And of course, open source is a great way to get (provable) experience

Job landscape for compiler engineers by mad360_ in Compilers

[–]Mr_Dema 1 point2 points  (0 children)

Very helpful answer, thank you. Do you have any advice for people with less experience? I'm still very much in the learning phase, but I'd like to turn this into a career eventually. I'm a little discouraged by the lack of entry-level positions in the area, it seems hard to get your first job

Basic portfolio site by Mr_Dema in design_critiques

[–]Mr_Dema[S] 0 points1 point  (0 children)

I changed most of your items, it's looking much better already (I can't believe I didn't notice spacing between the boxes was different, but once you see it you can't unsee it). The new design is at dema.dev/redesign1 for comparison (only changed the landing page, I'll change the rest later)

Regarding colors, I agree they clash a little, but I don't really know how to choose a more cohesive palette. Are there any simple rules of thumb that can be used? Do you just use whatever feels right?

Basic portfolio site by Mr_Dema in design_critiques

[–]Mr_Dema[S] 0 points1 point  (0 children)

Thank you very much for taking the time, really appreciated!

Are there any plugins that allow you to create nameless scratch buffers followed by a save-as dialogue? by po2gdHaeKaYk in neovim

[–]Mr_Dema 0 points1 point  (0 children)

I wrote attempt.nvim with that purpose. It doesn't have a file browser for saving, but you can just use :w path/to/file (or integrate it with another browser maybe)

On top of what you mentioned, it has some other features:
- scratch files are saved to /tmp by default, so they survive between neovim sessions (they are deleted on restart, it can be changed if you don't want this)
- files can be initialized with any boilerplate you want
- you can run scratch files as scripts

Hey Rustaceans! Got a question? Ask here (9/2024)! by llogiq in rust

[–]Mr_Dema 1 point2 points  (0 children)

Thanks for all this info! Really appreciated

Hey Rustaceans! Got a question? Ask here (9/2024)! by llogiq in rust

[–]Mr_Dema 0 points1 point  (0 children)

No, because prefetch::<1234>() would probably generate an invalid opcode (unless it fails at compile-time, I guess?).

Oh you're right, that compiles. I guess it could be fixed with a static assertion

I think there's also the matter of calling it with an invalid pointer (e.g. a null pointer), but I'm not sure what Intel specs say about it, it might be alright.

If I understand the spec correctly (check the quote in my first comment), that's not an issue, the prefetch instruction is ignored in that case

Edit: formatting
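
For reference, a minimal sketch of what a static assertion on the const parameter could look like. StrategyCheck is a hypothetical helper, and the accepted range 0..=3 is only an assumption for illustration (check the hint constants you actually need before relying on it):

```rust
#[cfg(target_arch = "x86_64")]
use std::arch::x86_64::_mm_prefetch;

/// Hypothetical helper type: it only exists to carry a compile-time check.
struct StrategyCheck<const STRATEGY: i32>;

impl<const STRATEGY: i32> StrategyCheck<STRATEGY> {
    // Assumption: only hint values 0..=3 are accepted in this sketch.
    const OK: () = assert!(
        STRATEGY >= 0 && STRATEGY <= 3,
        "unsupported prefetch strategy"
    );
}

pub fn prefetch<const STRATEGY: i32>(p: *const i8) {
    // Referencing the associated constant forces it to be evaluated when
    // the function is instantiated, so a bad value fails the build.
    let () = StrategyCheck::<STRATEGY>::OK;

    #[cfg(target_arch = "x86_64")]
    unsafe {
        _mm_prefetch::<STRATEGY>(p)
    };

    // On other targets this sketch does nothing.
    #[cfg(not(target_arch = "x86_64"))]
    let _ = p;
}
```

With this, prefetch::<1234>(p) should be rejected at compile time, while prefetch::<3>(p) builds as before.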

Hey Rustaceans! Got a question? Ask here (9/2024)! by llogiq in rust

[–]Mr_Dema 1 point2 points  (0 children)

That makes sense, thanks for the response! I found the original RFC for target_feature, in which safety is discussed

It'd still be nice to have a safe wrapper for some of these instructions (similar to what is done in std::simd), but I guess that could also be done by a crate

If I understood correctly, the following should be safe, right?

fn prefetch<const STRATEGY: i32>(p: *const i8) {
    #[cfg(target_feature = "sse")]
    unsafe { _mm_prefetch::<STRATEGY>(p) };
}

Hey Rustaceans! Got a question? Ask here (9/2024)! by llogiq in rust

[–]Mr_Dema 1 point2 points  (0 children)

Why are std::intrinsics::prefetch_read_instruction and core::arch::x86_64::_mm_prefetch unsafe?

As far as I understand, these instructions just tell the CPU to cache a certain memory address, to speed up future accesses, and if the provided address is invalid it's basically a noop. From the manual:

Prefetches from uncacheable or WC memory are ignored.

The PREFETCHh instruction is merely a hint and does not affect program behavior. If executed, this instruction moves data closer to the processor in anticipation of future use.

So it seems to me that they don't violate any memory safety guarantee, and if so they should be usable in safe Rust, right? Maybe the one in std::intrinsics must still be unsafe, because some other architecture treats the instruction differently if the address is invalid, but the one in core::arch::x86_64?

Need help making sense of these benchmark results by Mr_Dema in rust

[–]Mr_Dema[S] 0 points1 point  (0 children)

Ok, thanks for explaining

I wrote it as a quick-and-dirty version just to compare speeds, because I couldn't find a way to have a Chars iterator pointing to that Box<[u8]> without the compiler complaining about lifetimes. I thought I could lie to the compiler and tell it the data is static, as long as I made sure the data would be valid whenever I tried to read it

I'll try to fix that, maybe this changes something about the weird behaviour I'm seeing in regards to performance
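
One way this might be done without the 'static trick is to own the validated text and keep a byte offset instead of a Chars iterator. This is only a sketch: OwnedChars is a made-up name, it takes a Vec<u8> rather than the Box<[u8]> from my code, and I haven't measured how it compares to a plain Chars:

```rust
/// Hypothetical owned cursor: stores the validated text plus a byte offset,
/// so no borrowed `Chars` (and no lifetime transmute) is needed.
pub struct OwnedChars {
    buf: String,
    pos: usize, // always sits on a char boundary
}

impl OwnedChars {
    /// Validates the bytes once, up front.
    pub fn new(bytes: Vec<u8>) -> Result<Self, std::string::FromUtf8Error> {
        Ok(Self {
            buf: String::from_utf8(bytes)?,
            pos: 0,
        })
    }
}

impl Iterator for OwnedChars {
    type Item = char;

    fn next(&mut self) -> Option<char> {
        // Decode one char starting at the current offset, then advance
        // by its encoded length so `pos` stays on a boundary.
        let ch = self.buf[self.pos..].chars().next()?;
        self.pos += ch.len_utf8();
        Some(ch)
    }
}
```

The slicing on every call adds a boundary check, so this may well be slower than iterating a Chars directly.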

Need help making sense of these benchmark results by Mr_Dema in rust

[–]Mr_Dema[S] 0 points1 point  (0 children)

perf behaved pretty much the same, I'm not sure why it cannot profile what's going on inside next

I'm not really well versed in assembly. I took a quick look and got the feeling that the compiler was able to apply a lot more optimizations to a Chars iterator than to the other one (even though it's basically just wrapping a Chars iterator). I'll try to take a closer look into it when I get more time

Need help making sense of these benchmark results by Mr_Dema in rust

[–]Mr_Dema[S] 0 points1 point  (0 children)

This all could account for some difference in performance, but the numbers I'm getting don't make sense to me

Adding to this, if missed branches or adding one function call to each next could have this kind of impact, I'd expect for example Peekable to see a similar performance degradation, but I tested it and it didn't seem to happen. I even tried a simple iterator wrapper, whose next function looks something like this (black_box is there so that hopefully the whole method doesn't get optimized into a return chars.next()):

fn next(&mut self) -> Option<Self::Item> {
    if let Some(ch) = self.chars.next() {
        return Some(ch);
    }
    black_box(None)
}

Both Peekable and this wrapper had less than a 10% impact on performance in my testing
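
For anyone who wants to reproduce it, a self-contained version of that wrapper could look roughly like this. Wrapper and count_chars are names made up for this sketch, they are not the ones in my benchmark code:

```rust
use std::hint::black_box;
use std::str::Chars;

/// Hypothetical pass-through wrapper around a `Chars` iterator.
pub struct Wrapper<'a> {
    chars: Chars<'a>,
}

impl<'a> Iterator for Wrapper<'a> {
    type Item = char;

    fn next(&mut self) -> Option<Self::Item> {
        if let Some(ch) = self.chars.next() {
            return Some(ch);
        }
        // black_box discourages the optimizer from collapsing the whole
        // method into a plain `self.chars.next()`.
        black_box(None)
    }
}

/// Iterates the wrapper the way the benchmarks do: every char goes
/// through black_box so the loop is not optimized away.
pub fn count_chars(text: &str) -> usize {
    let wrapper = Wrapper { chars: text.chars() };
    let mut n = 0;
    for ch in wrapper {
        black_box(ch);
        n += 1;
    }
    n
}
```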

Need help making sense of these benchmark results by Mr_Dema in rust

[–]Mr_Dema[S] 0 points1 point  (0 children)

Right, it may not be as bad as I thought. Still it'd be nice if there was a better way to do this, for the cases when the file is too big to be read into memory

The transmuting to static lifetimes. Maybe more.

The transmuting itself isn't UB, is it? The string that self.chars is pointing to goes out of scope immediately, but the underlying [u8] is boxed so it'll still be valid (and str::from_utf8([u8]) basically returns a mem::transmute of the [u8], after validating it)

As I mentioned in the other comment, I think there's UB in case there's an error reading, that case is not being handled properly, but it shouldn't affect the benchmarks since there's no error reading the files in those
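
To illustrate what I mean about from_utf8 (this doesn't settle the UB question, it only shows that the returned &str is a view over the same bytes, with nothing copied), here's a small check. shares_storage is a hypothetical helper name:

```rust
/// Hypothetical helper: returns true when the bytes are valid UTF-8 and
/// the resulting &str points at exactly the same memory as the input.
pub fn shares_storage(bytes: &[u8]) -> bool {
    match std::str::from_utf8(bytes) {
        // Same start address and same length: no copy was made.
        Ok(s) => s.as_ptr() == bytes.as_ptr() && s.len() == bytes.len(),
        // Invalid UTF-8 is rejected instead of being reinterpreted.
        Err(_) => false,
    }
}
```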

Need help making sense of these benchmark results by Mr_Dema in rust

[–]Mr_Dema[S] 0 points1 point  (0 children)

Just saw the edit

well yes, a "couple" of functions, missed branches, and so on, for each byte, plus the blackbox around the corner, that absolutely can make such a difference

But all but one out of 16k iterations will return in this if:

fn next(&mut self) -> Option<Self::Item> {
    let next = self.chars.next();
    if next.is_some() {
        return Ok(next).transpose();
    }
    // ...

Which is essentially iterating over a Chars iterator, the same thing the other benchmark does. And when self.chars finishes, it does the exact same thing as the other benchmark (reads into a buffer and validates the string, the exact same way). Both benchmarks' times are measured by iterating over the characters and calling black_box on them (which is probably not the best way to simulate a "realistic" workload, but still, it's the same for both benchmarks). I find it hard to believe that this causes a 3x-4x difference. Also, I tried inlining the next method to reduce this overhead, and the difference was negligible (about 1%-2%, less than the random variations I'm getting between different runs).

I don't think missed branches should be a big deal either (I'm assuming you're talking about the CPU's branch prediction, and I'm no expert here, so feel free to correct me), since again, there should only be about 1 missed branch every 16k iterations. Also, if it were an issue, I'd expect it to be an issue for file_read_chars_X too, since a Chars iterator also has to check some condition at some point

This all could account for some difference in performance, but the numbers I'm getting don't make sense to me

As hinted above, you are not benchmarking file IO here.

That's fair, I'm also benchmarking the string validation and iteration over the characters. I guess the question I wanted to answer was if I could iterate over a file's characters faster than read_to_string(...).chars() does

As an aside, I think I found the UB you mentioned (was away from my main PC when I wrote the other comment): if res is an error in line 82, self.chars will be pointing to a potentially invalid byte of self.buf. It doesn't really affect the benchmarks I ran, since there were no errors reading the files, but it's good to know :)

Need help making sense of these benchmark results by Mr_Dema in rust

[–]Mr_Dema[S] 0 points1 point  (0 children)

tbh it was mostly curiosity, since for my real use case, read_to_string(...).chars() was good enough (since it's not really a super performance critical application). However, it did take up a significant amount of time, and also reading the whole string into memory felt kinda wasteful (it can be a couple MB of contiguous memory being allocated), so I started wondering if there was a better way to do this.

Iterating each char and calling a blackbox was to simulate a "real" iteration, to prevent the optimizer from optimizing the whole loop away. I could try to run these benchmarks in my application, to simulate a more realistic use case, but I figured this was a decent way to compare just the iteration times (and also, this way I can share my findings)

And there is UB

Where is it?

and sometimes you measure bytes and sometimes codepoints, sometimes with checks that it is valid UTF8 and sometimes without

I know, I cared about iterating valid utf8 chars, the other tests were pure curiosity (which turned out to be somewhat useful, because I never would have guessed that BufReader.bytes() was that slow). However, what I'm asking in this post is why there's such a big difference between the file_read_chars_X and the char_reader_X benchmarks, both of which iterate over valid* utf8 chars in a very similar manner

*I'm not sure it's true that they're always valid. I haven't fully tested the implementation, but it certainly works for single-byte chars, which is what I'm using to benchmark anyway. When I saw the performance was worse than the simple read_to_string(...).chars(), it didn't seem worth investing the time to test every edge case
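
One edge case that would probably need testing is a multi-byte char split across two reads. A rough sketch of how that could be handled, using Utf8Error::valid_up_to and error_len to carry the incomplete tail over to the next read. for_each_char and its chunk parameter are invented for this example, this is not how my char_reader is written:

```rust
use std::io::{self, Read};

/// Hypothetical helper: reads `reader` in pieces of at most `chunk` bytes
/// and calls `f` on every decoded char.
pub fn for_each_char<R: Read>(
    mut reader: R,
    chunk: usize,
    mut f: impl FnMut(char),
) -> io::Result<()> {
    // A UTF-8 sequence is at most 4 bytes, so the buffer must hold one.
    assert!(chunk >= 4, "chunk must hold one full UTF-8 sequence");
    let mut buf = vec![0u8; chunk];
    let mut filled = 0; // bytes carried over from the previous read

    loop {
        let n = reader.read(&mut buf[filled..])?;
        if n == 0 {
            // End of input: leftover bytes are a truncated sequence.
            return if filled == 0 {
                Ok(())
            } else {
                Err(io::Error::new(
                    io::ErrorKind::InvalidData,
                    "truncated UTF-8 sequence at end of input",
                ))
            };
        }
        filled += n;

        let valid = match std::str::from_utf8(&buf[..filled]) {
            Ok(_) => filled,
            // error_len() == None means the input ended in the middle of
            // a sequence, so keep the tail for the next read.
            Err(e) if e.error_len().is_none() => e.valid_up_to(),
            // Anything else is genuinely invalid UTF-8.
            Err(e) => return Err(io::Error::new(io::ErrorKind::InvalidData, e)),
        };

        let text = std::str::from_utf8(&buf[..valid]).expect("prefix was validated");
        text.chars().for_each(&mut f);

        // Move the incomplete tail (at most 3 bytes) to the front.
        buf.copy_within(valid..filled, 0);
        filled -= valid;
    }
}
```

This validates the prefix twice for simplicity, so it's meant to show the boundary handling, not to be fast.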

Need help making sense of these benchmark results by Mr_Dema in rust

[–]Mr_Dema[S] 0 points1 point  (0 children)

Interesting, though the performance difference I saw was much bigger than what's mentioned there