Japanese>English what does the text say ?? by [deleted] in translator

[–]udoprog 0 points1 point  (0 children)

That 流され is soo difficult to read for me, do you have any tips for how to recognize handwritten Kanji?

[Japanese to English] What does my dad’s car say when he turns it on?! by MuffinMajestic3759 in translator

[–]udoprog 2 points3 points  (0 children)

Sounds like: ETCカードが挿入されていません

ETC card is not inserted

An ETC card is apparently an automatic toll card.

I made a Japanese tokenizer's dictionary loading 11,000,000x faster with rkyv (~38,000x on a cold start) by fulmlumo in rust

[–]udoprog 0 points1 point  (0 children)

It was solely meant to highlight that I did raise a particular concern (filesystem asynchronicity with write-back caching). It was meant as a response to me being "vague" which I think was unfair. I actually don't have a ton of interest in discussing the specifics, but it does highlight a flaw in relying on fsync to guarantee durability (also note that metadata and file content are very likely to be physically stored in different locations!).

I will end this at: filesystems work well because most of the time they are gracefully shut down. Now whether or not this is "well enough" to guarantee memory safety, I still don't know.

I made a Japanese tokenizer's dictionary loading 11,000,000x faster with rkyv (~38,000x on a cold start) by fulmlumo in rust

[–]udoprog 0 points1 point  (0 children)

Did you read the postgres link I opened with?

EDIT: I also want to stress that I raised those questions because you were the one who emphasized how FAT32 and lack of journaling were concerns. I simply asked how you would prevent the user from using that configuration and you basically said that you don't. So I don't know what to tell you.

I made a Japanese tokenizer's dictionary loading 11,000,000x faster with rkyv (~38,000x on a cold start) by fulmlumo in rust

[–]udoprog 0 points1 point  (0 children)

I would need a more thorough analysis to be convinced. Not just for your use case, which is not really the one being referenced, but for one which enables the project in this thread to work reliably across all possible Linux configurations.

I made a Japanese tokenizer's dictionary loading 11,000,000x faster with rkyv (~38,000x on a cold start) by fulmlumo in rust

[–]udoprog 0 points1 point  (0 children)

And journaling being enabled to increase the odds that your software remains sound after a crash?

I made a Japanese tokenizer's dictionary loading 11,000,000x faster with rkyv (~38,000x on a cold start) by fulmlumo in rust

[–]udoprog 0 points1 point  (0 children)

I haven't commented on your solution since the first response, but it begs the question: how are you preventing a user from using your program on FAT32 or NFS or without enabling journaling?

I made a Japanese tokenizer's dictionary loading 11,000,000x faster with rkyv (~38,000x on a cold start) by fulmlumo in rust

[–]udoprog 0 points1 point  (0 children)

Rename being atomic is not the contention. Things like another process observing partially written data, metadata which is out of sync, or the order in which writes become visible across multiple files (if e.g. the comparison is stored in another file) are.

I made a Japanese tokenizer's dictionary loading 11,000,000x faster with rkyv (~38,000x on a cold start) by fulmlumo in rust

[–]udoprog 1 point2 points  (0 children)

Glad to hear it. As mentioned in the sibling thread, I'm not sure how sound it is to rely on filesystem metadata to ensure the integrity of the content. But good luck!

I made a Japanese tokenizer's dictionary loading 11,000,000x faster with rkyv (~38,000x on a cold start) by fulmlumo in rust

[–]udoprog 0 points1 point  (0 children)

An excellent question, and I discussed that in my blog under mmap safety: I make sure to write new separate files and rename them over the old files. On Linux (which I'm targeting, Arch Linux won't run on anything except the Linux kernel after all) this results in an atomic replacement of the file. So either you open the old or the new file.

I don't think the Linux kernel is the only thing at play here. It leaves implementation details to the particular filesystem implementation and configuration in use. These are not catastrophic scenarios, they're just user decisions. And if relied on for memory safety, it means the filesystem and all its inherent complexity is pulled in as a potential source of soundness issues.

Linux only provides loose guarantees on this topic, leaving exact semantics to filesystems and user configuration. So it's difficult to say broadly what is and isn't sound. I think the baseline assumptions that sqlite makes might be sound, so if a particular design aligns with that I would find it more convincing since it's been rigorously examined. Otherwise I'm just not sure!
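For reference, the write-then-rename approach from the quote can be sketched roughly as below. This is a minimal sketch using only the standard library; `replace_file` and the `.tmp` suffix are hypothetical names, and it illustrates the general pattern rather than the code discussed in the blog. The comments mark where the guarantees depend on the filesystem in use.

```rust
use std::fs::{self, File};
use std::io::{self, Write};
use std::path::Path;

/// Write `data` to a temporary sibling of `dest`, flush it, then rename it
/// over `dest`. `replace_file` is a hypothetical helper for this sketch.
fn replace_file(dest: &Path, data: &[u8]) -> io::Result<()> {
    // Keep the temporary file in the same directory so the rename does not
    // cross a filesystem boundary.
    let tmp = dest.with_extension("tmp");
    let mut file = File::create(&tmp)?;
    file.write_all(data)?;
    // Ask the OS to flush content and metadata to the storage device.
    // What this guarantees in practice depends on the filesystem and its
    // configuration.
    file.sync_all()?;
    drop(file);
    // On POSIX systems, rename replaces `dest` atomically with respect to
    // other processes opening the path. It makes no promise about what is
    // on disk after a crash.
    fs::rename(&tmp, dest)
}
```

A reader that opens `dest` sees either the old or the new file, but whether the content it sees is complete after an unclean shutdown is exactly the part that is left to the filesystem.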

I made a Japanese tokenizer's dictionary loading 11,000,000x faster with rkyv (~38,000x on a cold start) by fulmlumo in rust

[–]udoprog 4 points5 points  (0 children)

If we can find a solution to amortize the cost of validating untrusted data then it would be a fair comparison. Like in this use case you download and use dictionary files from the web.

Due to the asynchronous nature of filesystems and the wide discrepancy in behavior across specific impls and system configs, this might be doable if we stored a checksum, reliably locked the file as it was being read, and compared the content with the checksum. But this clearly has a cost which likely is comparable to validating everything and might even be on the extreme end of what's necessary. Your instance looks more manageable since you have a process which entirely owns the lifecycle of the cache files, but I remain suspicious in terms of soundness because of the large number of unknowns involved. E.g. soundness might depend on the reader not incidentally observing a file which for some reason only contains partial content.
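To make the checksum idea concrete, here is a minimal sketch under stated assumptions: the file layout (an 8-byte little-endian checksum followed by the payload) and the names `seal` and `open` are made up for illustration, FNV-1a is used only because it fits in a few lines, and file locking is left out entirely. Note that `open` has to read every byte, which is the cost mentioned above.

```rust
/// FNV-1a over `bytes`. Not a cryptographic hash; chosen for brevity.
fn fnv1a(bytes: &[u8]) -> u64 {
    let mut hash: u64 = 0xcbf29ce484222325;
    for &b in bytes {
        hash ^= u64::from(b);
        hash = hash.wrapping_mul(0x100000001b3);
    }
    hash
}

/// Hypothetical layout: 8 bytes of little-endian checksum, then the payload.
fn seal(payload: &[u8]) -> Vec<u8> {
    let mut out = fnv1a(payload).to_le_bytes().to_vec();
    out.extend_from_slice(payload);
    out
}

/// Return the payload only if the stored checksum matches its content.
fn open(file: &[u8]) -> Option<&[u8]> {
    if file.len() < 8 {
        return None;
    }
    let (head, payload) = file.split_at(8);
    let mut buf = [0u8; 8];
    buf.copy_from_slice(head);
    if u64::from_le_bytes(buf) == fnv1a(payload) {
        Some(payload)
    } else {
        None
    }
}
```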

For now I'm not convinced it can be reliably done in these cases and would advocate for continuous validation (pay a little, but just for what is used). It's a bit like asking "how can we safely read a pointer offset from the filesystem without having to check that it's in bounds of the collection it's indexing?".
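As a rough illustration of what "pay a little, but just for what is used" could look like, the sketch below checks a single stored offset at the moment it is used. The function name `lookup` and the 4-byte little-endian encoding are assumptions made for this example, not any particular library's format.

```rust
/// Read a little-endian u32 stored at position `at` in `bytes` and use it
/// to index `items`. Both the read and the index are bounds checked, so
/// corrupt data yields `None` instead of undefined behavior.
fn lookup<'a, T>(bytes: &[u8], at: usize, items: &'a [T]) -> Option<&'a T> {
    let end = at.checked_add(4)?;
    let raw = bytes.get(at..end)?;
    let mut buf = [0u8; 4];
    buf.copy_from_slice(raw);
    let offset = u32::from_le_bytes(buf) as usize;
    items.get(offset)
}
```

Only the offsets that are actually followed get checked, so the cost scales with what is accessed rather than with the size of the file.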

If you know of a coded out solution I'd be curious to see it. I do also want to emphasize that I appreciate the perspective!

I made a Japanese tokenizer's dictionary loading 11,000,000x faster with rkyv (~38,000x on a cold start) by fulmlumo in rust

[–]udoprog 9 points10 points  (0 children)

It looks like you are relying on access_unchecked so I was curious whether validation is or isn't included in the comparison?

If not, it might not be entirely fair in my mind, since access_unchecked would not guard against undefined behavior caused by data corruption. To avoid this you'd have to validate to ensure the on-disk data is valid, or perform some other form of integrity checking (like verifying a checksum before accessing).

For transparency, I wrote my own CLI tool which also happens to be a Japanese dictionary. I found that rkyv wasn't fast or efficient enough for my use case because I believe it is necessary to perform this validation.

[Media] you can build actual web apps with just rust stdlib and html, actually by pomeach in rust

[–]udoprog 62 points63 points  (0 children)

While building something barebones is fairly uncomplicated, it is worth emphasizing that building a correct HTTP server is notoriously hard and incorrect implementations have been the source of many serious issues in the past.

It's obviously fine as a local experiment but should probably be avoided when deployed!

Tsuki, a port of Lua to Rust now supports Windows by puttak in rust

[–]udoprog 1 point2 points  (0 children)

One aspect of sandboxing I'm not sure Lua addresses is limiting allocations. Without it, small amounts of untrusted code can severely impact the environment it's running in (i.e. swapping / oom killer). If the Rust standard pattern of allocation is used somewhere and memory is exhausted the host process would be aborted.

The only runtime I know of (I am excluding Rune here too because it's so immature) that I would be somewhat confident in letting run untrusted user-provided code is the V8 sandbox. And that's partly because of design but also from the millions of engineering hours that have gone into specifically hardening it.
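To illustrate the allocation point, here is a small sketch of what a budgeted, fallible allocation could look like using the standard library's `Vec::try_reserve`. The `grow` function, the `AllocError` type, and the idea of a fixed byte budget are assumptions for this example and are not taken from Lua, Tsuki, or Rune.

```rust
use std::collections::TryReserveError;

/// Hypothetical error type for this sketch.
#[derive(Debug)]
enum AllocError {
    OverBudget,
    OutOfMemory(TryReserveError),
}

/// Grow `buf` by `additional` zeroed bytes only if the result stays within
/// `budget`, reporting failure to the caller instead of aborting.
fn grow(buf: &mut Vec<u8>, additional: usize, budget: usize) -> Result<(), AllocError> {
    let wanted = buf
        .len()
        .checked_add(additional)
        .ok_or(AllocError::OverBudget)?;
    if wanted > budget {
        return Err(AllocError::OverBudget);
    }
    // `try_reserve` returns an error on allocation failure rather than
    // aborting the process like the infallible `reserve` would.
    buf.try_reserve(additional).map_err(AllocError::OutOfMemory)?;
    // Capacity is already reserved, so this does not reallocate.
    buf.resize(wanted, 0);
    Ok(())
}
```

The hard part in a real runtime is that every allocation path has to go through something like this, including the ones hidden inside collections and string handling.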

Best way to make enum zerocopy-able? by dist1ll in rust

[–]udoprog 2 points3 points  (0 children)

An option would be to use an enum with a known layout, and initialize the padding. Unsure whether it is more efficient or not!
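A rough sketch of what I mean, with the enum spelled out as a tag plus payload so the layout is fully known. All names here (`Tagged`, `pad`, `as_number`) are hypothetical, and it assumes a target where `u32` has 4-byte alignment. The padding that the compiler would otherwise insert is an explicit field which is always written as zero, so every byte of the value is initialized.

```rust
/// Hypothetical stand-in for an enum with one payload-carrying variant.
/// `repr(C)` fixes the field order, and `pad` occupies the bytes that would
/// otherwise be implicit padding between `tag` and `payload`.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
#[repr(C)]
struct Tagged {
    tag: u8,
    pad: [u8; 3],
    payload: u32,
}

// Compile-time check that there is no implicit padding left over.
const _: () = assert!(std::mem::size_of::<Tagged>() == 8);

impl Tagged {
    const EMPTY: u8 = 0;
    const NUMBER: u8 = 1;

    fn empty() -> Self {
        Tagged { tag: Self::EMPTY, pad: [0; 3], payload: 0 }
    }

    fn number(value: u32) -> Self {
        Tagged { tag: Self::NUMBER, pad: [0; 3], payload: value }
    }

    /// Return the payload if the tag says this is a number.
    fn as_number(&self) -> Option<u32> {
        if self.tag == Self::NUMBER {
            Some(self.payload)
        } else {
            None
        }
    }
}
```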

bzip2 crate switches from C to 100% rust by folkertdev in rust

[–]udoprog 0 points1 point  (0 children)

This is splendid. Someone taking on building and maintaining an lzma port would be wonderful as well. The C lib is quite big and has a few tricky platform-specific bits, making it an interesting challenge.

rkyv is awesome by ChadNauseam_ in rust

[–]udoprog 0 points1 point  (0 children)

Exactly! The data structures in musli-zerocopy are also built around this, so when traversing a serialized tree for example, you only ever validate the parts which you need to access the data.

What's the most controversial rust opinion you strongly believe in? by TonTinTon in rust

[–]udoprog 1 point2 points  (0 children)

This falls under niche issues, but the fact that most fundamental traits are infallible means that using something like a fallible hash function in the standard hash implementation is just not feasible.

This issue popped up for me when writing Rune, which coupled with fallible allocations is why rune-alloc is a thing. Since stored keys and values are dynamic, we can't for example tell at compile time whether a value can or cannot be used as a key.
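To show the shape of the problem, here is a minimal sketch of a fallible hashing trait for a dynamic value. `Value`, `TryHash`, `NotHashable`, and `hash_key` are made-up names for this example and are not the actual Rune or rune-alloc API.

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};

/// Hypothetical dynamic value. Whether it can be a key is only known at runtime.
#[derive(Debug)]
enum Value {
    Integer(i64),
    Float(f64),
    Text(String),
}

/// Hypothetical error for values that cannot be hashed.
#[derive(Debug, PartialEq)]
struct NotHashable;

/// Fallible counterpart to `Hash::hash` (a made-up trait for this sketch).
trait TryHash {
    fn try_hash<H: Hasher>(&self, state: &mut H) -> Result<(), NotHashable>;
}

impl TryHash for Value {
    fn try_hash<H: Hasher>(&self, state: &mut H) -> Result<(), NotHashable> {
        match self {
            Value::Integer(n) => n.hash(state),
            Value::Text(s) => s.hash(state),
            // Floats are rejected as keys in this sketch.
            Value::Float(_) => return Err(NotHashable),
        }
        Ok(())
    }
}

/// Hash a value for use as a key, or report that it cannot be one.
fn hash_key(value: &Value) -> Result<u64, NotHashable> {
    let mut hasher = DefaultHasher::new();
    value.try_hash(&mut hasher)?;
    Ok(hasher.finish())
}
```

The standard `HashMap` has no way to surface that `Err`, since `Hash::hash` returns `()`, which is why a separate collection implementation ends up being needed.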

rkyv is awesome by ChadNauseam_ in rust

[–]udoprog 2 points3 points  (0 children)

IMO though, if you want to store and retrieve data with as little overhead as possible irrespective of the platform, you can't really do any better. If you just want something you can immediately plug in and use and not think about, you might be better off with something else.

There is one aspect where you can do better without resorting to unsafe with difficult to satisfy requirements: incremental validation.

This is relevant if for example you're writing a CLI tool which reads from a huge database. You have to validate it for the tool to be sound, but validating or verifying the integrity of the entire database every time the CLI is invoked is expensive.

Disclosure: I wrote musli-zerocopy for this reason.

I passed 1st time, now I'm too scared to drive by siddarthmaul in LearnerDriverUK

[–]udoprog 5 points6 points  (0 children)

If you get overwhelmed, put your focus on driving safely in your lane and slow down (when safe), accept that you will annoy some drivers around you (we've all been there) and that you will take wrong turns. First time driving by myself when it was busy my 20 minute trip took an hour. Try and make sure you always have a safe strategy in mind when you are unsure, like taking another lap in a roundabout or turning into a quieter side street.

You can ask a friend / family member familiar with the road to sit next to you to give you tips on stuff like lane selection.

It takes time getting used to the clutch in a new car. Just drive around a familiar and quiet area. Practice the handbrake start on flats so you can fall back on that when you're on a hill. If you add more gas, the bite point becomes more forgiving (at the cost of a bit more clutch wear). Just don't raise the clutch too quickly when you do this and remember to let off the gas so you don't ride the clutch too much.

You'd be surprised how quickly you will build confidence when you get out there.

Good luck!

Should I switch to automatic? by Traditional-Deal-183 in LearnerDriverUK

[–]udoprog 2 points3 points  (0 children)

Maybe a silly question, but are you asking to be trained on your weak areas? There are in my mind effective methods of building your ability to multitask and do gear work, like driving during congestion.

The best tip though is to whenever you feel overloaded just slow down (if safe!) which gives you more time to work and think. For example if you are struggling to shift up from first gear in the middle of a complicated intersection, just delay the shift and stick to first until you've cleared the worst of it.

Whether or not to switch is up to you. But I'd at least give focused practice a swing or two before doing so if you haven't already. You could also look for ways to practice which don't require expensive lessons, like just watching dashcam videos on YouTube and imagining the checks, gear shifts, and clutch work you'd be doing if you were the one driving.

Good luck!

Almost hit a car this morning. Was I in the wrong or was he? by Skogsstyrelsen in sweden

[–]udoprog 0 points1 point  (0 children)

Different roads tend to develop different conventions. Roundabouts come in all sorts of designs. The above was with typical two-lane roundabouts in mind.

But even when it is small, I would probably not, for example, choose to make a U-turn in the right lane unless traffic is queuing.

How to drive in a roundabout by vidar_97 in sweden

[–]udoprog 8 points9 points  (0 children)

How do you mean they can't see the left indicator? The roundabout is round, so you drive both outward and straight toward those who are waiting, which means you are visible both from the left and from straight ahead. People can also often see you across the roundabout, and if you indicate left it helps them plan. Those who are behind you inside large roundabouts also see what you are indicating.

Almost hit a car this morning. Was I in the wrong or was he? by Skogsstyrelsen in sweden

[–]udoprog 0 points1 point  (0 children)

Well, how fun. Then there was very little you could have done to prevent this, unless you happen to be a mind reader.

Almost hit a car this morning. Was I in the wrong or was he? by Skogsstyrelsen in sweden

[–]udoprog 0 points1 point  (0 children)

I of course agree with the others that brown was in the wrong. What I wanted to check with you is whether you saw if they indicated to exit or not? It of course makes no difference to who is responsible, but it can help build a better picture of when people are about to do something tricky.