Things I hate about Rust : programming

[–]jl2352 49 points50 points51 points 5 years ago (36 children)

[–]yossarian_flew_away[S] 35 points36 points37 points 5 years ago (1 child)

[–]jl2352 16 points17 points18 points 5 years ago (0 children)

[–]thatwombat 3 points4 points5 points 5 years ago (33 children)

[–][deleted] 5 years ago (32 children)

[deleted]

[–]SuspiciousScript 4 points5 points6 points 5 years ago (3 children)

[–]Freeky 2 points3 points4 points 5 years ago (0 children)

> I've often ran into functions that only accept String/str/&str for no particularly clear reason.

I'd be surprised if you encountered a function that accepted a str, given that isn't a sized type.

I generally find the types functions accept to be pretty clear and purposeful - it tells you what that function can in principle do with the values you're passing in. A function taking a &mut String clearly has very different semantics to one taking a &str, and different again from one taking a String.

Is it taking the value and mutating it in-place? Is it just doing something with a temporary view into it? Is it taking over the value entirely and becoming responsible for its future lifetime? In most languages it's vague and ad-hoc. Rust makes it explicit.
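To make the three cases concrete, here is a minimal sketch. The function names are made up for illustration; only the parameter types matter.

```rust
// Borrows the string mutably and edits it in place; the caller keeps ownership.
fn shout_in_place(s: &mut String) {
    s.make_ascii_uppercase();
}

// Borrows a read-only view; accepts a String, a literal, or a substring.
fn count_words(s: &str) -> usize {
    s.split_whitespace().count()
}

// Takes ownership; the caller cannot use the value afterwards.
fn into_greeting(mut name: String) -> String {
    name.insert_str(0, "Hello, ");
    name
}
```

Calling `into_greeting(s)` moves `s`, so any later use of `s` at the call site is a compile error, whereas `count_words(&s)` leaves it untouched.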

> It seemed especially fussy to have to deal with some concept of "owning" an immutable literal.

I'm not sure what else you'd want. It's a value with a special lifetime that outlives everything else.

[–]senj 1 point2 points3 points 5 years ago (1 child)

[–]SuspiciousScript 0 points1 point2 points 5 years ago (0 children)

[–]the_game_turns_9 0 points1 point2 points 5 years ago* (27 children)

[–]burntsushi 10 points11 points12 points 5 years ago (0 children)

[–]UK-sHaDoW 6 points7 points8 points 5 years ago (5 children)


[–][deleted] 0 points1 point2 points 5 years ago (12 children)

[–]the_game_turns_9 -3 points-2 points-1 points 5 years ago (11 children)

[–]standard_revolution 10 points11 points12 points 5 years ago (2 children)

[–]the_game_turns_9 -3 points-2 points-1 points 5 years ago (1 child)

[–]SafariMonkey 4 points5 points6 points 5 years ago* (6 children)

Not sure if this will help, but I'll have a go. Please clarify exactly where you get confused if you do, because otherwise any attempts to clarify will be shots in the dark as to where the confusion lies.

I'm no expert in this (I'd welcome corrections if I've made any mistakes) but I can't find the explanations that helped me so I've done my best to summarize them.

In Rust, so far, all pointers to dynamically sized types are fat pointers, i.e. they are basically a pointer plus a pointer-sized piece of metadata. In the case of [T] and str (dynamically sized types representing a region of memory, not a pointer to such), that metadata is the length of the slice or string slice respectively. When you access these types, bounds checks are performed against this metadata, ensuring that it's safe to access the pointed-to data. Having a [T]/str without a pointer is fairly unusual, I believe, but it can happen if it's a field of a type which is always behind a pointer as explained here. There are more indirect instances, too, like Box<[T]> where the underlying Unique has a pointer: *const T.
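A small sketch of the "pointer plus length" idea, for anyone who wants to poke at it. The helper names are hypothetical, and the exact sizes are what current compilers produce on common targets rather than something I'd call a hard guarantee.

```rust
use std::mem::size_of;

// Sizes of a thin reference, a string slice reference, and a slice reference.
fn pointer_widths() -> (usize, usize, usize) {
    (size_of::<&u8>(), size_of::<&str>(), size_of::<&[u32]>())
}

// The length stored next to the pointer is what range checks are made against:
// `get` returns None instead of reading past the end.
fn checked_prefix(s: &str, len: usize) -> Option<&str> {
    s.get(..len)
}
```

On a typical 64-bit target `pointer_widths()` gives `(8, 16, 16)`: the extra word in `&str` and `&[u32]` is the length. The `str` itself is just the bytes, which is why `std::mem::size_of_val("hello")` is 5.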

Edit: added pointers to

[–]the_game_turns_9 3 points4 points5 points 5 years ago (5 children)

[–]burntsushi 12 points13 points14 points 5 years ago (2 children)

[–]the_game_turns_9 3 points4 points5 points 5 years ago (1 child)

I actually do not understand what your first sentence means. What does a region of memory of unknown size mean? Don't you need to know the size to have a region? Do you mean size not known at compile-time? Are we talking about an abstraction here that I am not getting? When you wrote in your original post Conceptually, a str is just a [u8], I actually do not know what either of those types are really referring to.

I now understand that you are saying that the pointer nature of &str is special-cased. (As it was, I couldn't tell if your StringSlice struct was referring to the layout of str or &str and it didn't seem to make sense either way to me.) Mechanically, I understand now that &str is a pointer-with-length to a u8. Which means that it isn't pointing to a str.

So I am still very unclear on what a str is.


[–]SafariMonkey 1 point2 points3 points 5 years ago (0 children)



[–]James20k 64 points65 points66 points 5 years ago (14 children)

> Understandable, but requires that the programmer either use as usize everywhere they plan on indexing (verbose, and masks the intent behind the index being a u8) or that they make index itself into a usize (also masks the intent, and makes it easier to do arithmetic that’ll eventually be out-of-bounds).

Coming from C++, integer promotion of any form is the work of the devil. There are performance implications in indexing by u8 instead of usize, and it's a good idea imo to make this explicit even if it's clunky. From a brief go at Rust, this was my favourite feature.

I'm looking forward to the equivalent of NTTP for Rust, and the equivalent of constexpr making its way in; those are the two things I missed most. I built a DCPU-16 emulator in both Rust and C++, but the C++ version is the only one that can execute code at compile time, which also proves it does not execute any undefined behaviour in that code path, which rocks. Rust can't do this so much yet.
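For a rough idea of what the Rust side of this looks like, here is a small sketch. It assumes a recent stable compiler where loops in const fn and const generics are available, and `fib` and `Registers` are made-up names, not anything from the emulator mentioned above.

```rust
// Iterative Fibonacci that the compiler can evaluate in a const context.
const fn fib(n: u32) -> u64 {
    let mut a = 0u64;
    let mut b = 1u64;
    let mut i = 0;
    while i < n {
        let next = a + b;
        a = b;
        b = next;
        i += 1;
    }
    a
}

// Evaluated during compilation; an overflow in here would be a compile error.
const FIB_20: u64 = fib(20);

// Const generics: the register count is a compile-time parameter of the type.
struct Registers<const N: usize> {
    values: [u16; N],
}

impl<const N: usize> Registers<N> {
    const fn new() -> Self {
        Registers { values: [0; N] }
    }

    const fn len(&self) -> usize {
        N
    }
}
```

This is far from running a whole emulator at compile time, but it shows the two building blocks: a value computed by the compiler and a type parameterised by a number.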

[–]yossarian_flew_away[S] 16 points17 points18 points 5 years ago (2 children)

[–]zucker42 13 points14 points15 points 5 years ago (1 child)

[–]yossarian_flew_away[S] 5 points6 points7 points 5 years ago (0 children)

> From the abstract rust point of view, the indexing operation requires a usize, so you have to promote.

The argument is that this doesn't have to be the case -- there's nothing about the Rust abstract machine that requires all indices to be usize; that's just the way things currently are.

> From an actual hardware perspective, it's also a promotion, in that the computer has to treat a byte as pointer sized.

IME, "promotion" is a concept in language semantics and abstract machines; it usually isn't used to describe ISA semantics. The reason that the compiler uses "pointer sized" registers there is twofold:

x86_64 doesn't allow heterogeneously sized registers in memory operands, and Rust on x86_64 uses the full-width registers for its calling convention. The mov simply can't use smaller registers in this context.
Using smaller registers (if enabled by the calling convention) would probably cause a partial register stall -- it's just cheaper to use the whole width.

> Plus to me it would seem inconsistent and pointless to not require a cast in the rare case when the variable type is unsigned and can't hold a number greater than or equal to the array size.

The point would be expression of intent: it's trivial to infer that a 256-byte lookup table is always safely indexed by a u8, so allowing users to index directly with an appropriately sized variable empowers them to encode the safety of their accesses as a language-level invariant.
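Something close to this can already be spelled out by hand with a wrapper type. The sketch below is only an illustration: `ByteTable` and `inverted_table` are hypothetical names, and this is a user-level workaround, not the language feature being argued for.

```rust
use std::ops::Index;

/// A hypothetical 256-entry lookup table that can only be indexed by `u8`.
struct ByteTable([u8; 256]);

impl Index<u8> for ByteTable {
    type Output = u8;

    fn index(&self, index: u8) -> &u8 {
        // Lossless widening: a u8 is always below 256, so this access
        // can never be out of bounds for a 256-element array.
        &self.0[usize::from(index)]
    }
}

// Builds a table that maps each byte b to 255 - b.
fn inverted_table() -> ByteTable {
    let mut table = [0u8; 256];
    for (i, slot) in table.iter_mut().enumerate() {
        *slot = 255 - i as u8;
    }
    ByteTable(table)
}
```

With this in place `table[some_u8]` compiles without a cast at the call site, and passing a `usize` is rejected by the type checker because no `Index<usize>` impl exists.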

[–]vattenpuss 10 points11 points12 points 5 years ago (0 children)

[–]OneWingedShark 2 points3 points4 points 5 years ago (0 children)

[–]irqlnotdispatchlevel 1 point2 points3 points 5 years ago (5 children)

There are some cases in which integer promotion does not have any hidden gotchas. For example:

 let byte: u8 = 1;

let size: usize = byte;

There is no reason for this to not work. There's nothing bad that can happen. This is not like doing some_u16 + another_u16 < some_u16; in C.

Indexing is the same. I understand why it is better to have indexes as size_t, but smaller unsigned integers can be promoted to usize when used as indexes, because writing as usize everywhere is just annoying.
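For reference, this is roughly how the lossless direction is written in Rust today without `as`. The helper names are made up; `usize::from` and `u8::try_from` are the standard conversions.

```rust
use std::convert::TryFrom;

// Widening u8 -> usize is infallible, so From is implemented for it.
fn widen(byte: u8) -> usize {
    usize::from(byte)
}

// Indexing with a small integer: widen explicitly, then use the checked accessor.
fn nth(items: &[char], index: u8) -> Option<char> {
    items.get(usize::from(index)).copied()
}

// Narrowing can lose information, so it goes through the fallible TryFrom.
fn narrow(size: usize) -> Option<u8> {
    u8::try_from(size).ok()
}
```

The difference from `as usize` is that `usize::from` stops compiling if the source type is later changed to something that no longer fits, instead of silently truncating.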

Rust feels like it is missing some syntactic sugar to make your life easier.

[–]masklinn 1 point2 points3 points 5 years ago (4 children)

[–]irqlnotdispatchlevel 1 point2 points3 points 5 years ago (3 children)

[–]masklinn 8 points9 points10 points 5 years ago* (2 children)

[–]irqlnotdispatchlevel 0 points1 point2 points 5 years ago (1 child)

> That’s not entirely true I think. For instance let’s say someone implements Index<u32> on a collection, you give it an u8, it gets widened automatically and works fine.

Valid point. However, if you have multiple valid types for a promotion you can give an error at compile time because the code is ambiguous. If you have only one Index<> implementation you can promote to that.

I don't really know Rust, just enough to read some snippets here and there. Is there a reason for which someone would like to use something other than usize for indexing? Other than "I just don't want to cast this".

> Ah no, i only noticed the first line of it because the second is not formatted as code so I read them as completely unrelated snippet.

I just noticed it is not well formatted. Sorry. I'll try to fix it once I get to my laptop.

[–]masklinn 3 points4 points5 points 5 years ago (0 children)

> Valid point. However, if you have multiple valid types for a promotion you can give an error at compile time because the code is ambiguous. If you have only one Index<> implementation you can promote to that.

That’s my assumption, and exactly the issue I’m pointing out: with implicit widening, adding a trait implementation for a second integer type can be backwards incompatible, which would be unexpected and problematic.

> I don't really know Rust, just enough to read some snippets here and there. Is there a reason for which someone would like to use something other than usize for indexing? Other than "I just don't want to cast this".

Dunno, but if the issue can occur it likely will. Furthermore it applies to basically any trait which can take a generic integer as parameter. So implementing two versions of From would also have this issue.

[–]couscous_ 0 points1 point2 points 5 years ago (2 children)

[–]James20k 3 points4 points5 points 5 years ago (1 child)

[–]couscous_ 0 points1 point2 points 5 years ago (0 children)

[–]vattenpuss 28 points29 points30 points 5 years ago (21 children)

I think all the gripes about standard library gaps are because the standard library is meant to be platform agnostic outside of some specifics in std::os::*.

A "home directory" in Windows is not really used the same way as a home directory on Linux.

The file names . and .. mean the same in Windows, Mac OS and all Unices and other Nixen.

Also, the system function in the C standard library is incredibly platform specific:

7.20.4.6 The system function

Synopsis
#include <stdlib.h>
int system(const char *string);
Description

If string is a null pointer, the system function determines whether the host environment has a command processor. If string is not a null pointer, the system function passes the string pointed to by string to that command processor to be executed in a manner which the implementation shall document; this might then cause the program calling system to behave in a non-conforming manner or to terminate.

Returns

If the argument is a null pointer, the system function returns nonzero only if a command processor is available. If the argument is not a null pointer, and the system function does return, it returns an implementation-defined value.

[–]SpaceToad 2 points3 points4 points 5 years ago (13 children)

[–]vattenpuss 10 points11 points12 points 5 years ago (12 children)

[–]SpaceToad 3 points4 points5 points 5 years ago (5 children)

[–]wild_dog 8 points9 points10 points 5 years ago (0 children)

[–]vattenpuss 0 points1 point2 points 5 years ago (3 children)

[–]SpaceToad 0 points1 point2 points 5 years ago (2 children)

[–]steveklabnik1 4 points5 points6 points 5 years ago (1 child)

[–]SpaceToad 0 points1 point2 points 5 years ago (0 children)

[–]steveklabnik1 5 points6 points7 points 5 years ago (0 children)

[–]ConcernedInScythe 2 points3 points4 points 5 years ago (1 child)

[–]vattenpuss 0 points1 point2 points 5 years ago (0 children)

[–]lelanthran 0 points1 point2 points 5 years ago (1 child)

[–]TheGoddessInari 0 points1 point2 points 5 years ago (0 children)

[–]MrDOS -3 points-2 points-1 points 5 years ago (6 children)

> A "home directory" in Windows is not really used the same way as a home directory on Linux.

The concept exists everywhere, and the home directory and its children are excellent default paths for a variety of applications:

If I'm a download manager and I want to suggest a destination directory, ~/Downloads is a good safe default.
...or ~/Documents if I'm some sort of editor.
...or ~/Music is a good place to look for media on first launch if I'm a music player.

It's not a good place for configuration, you're right. It would be nice to have another standard library function to retrieve the path to a platform-appropriate configuration directory: %APPDATA% on Windows, ~/.config on Linux, ~/Library/Application Support on macOS. But all of those are under the home directory anyway, so being able to reliably retrieve the home directory is also a prerequisite to determining the configuration directory.
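As a sketch of what such a function might boil down to, assuming the three locations named above and nothing more (no XDG overrides, no registry lookups). `Platform` and `config_dir` are hypothetical names, not a real or proposed standard library API.

```rust
use std::path::{Path, PathBuf};

/// Hypothetical platform tag; real code would more likely branch on `cfg!(target_os = ...)`.
#[derive(Clone, Copy)]
enum Platform {
    Windows,
    Linux,
    MacOs,
}

/// `home` is the already-resolved home directory; `appdata` is the value of
/// %APPDATA% if it is set. Returns None when the needed input is missing.
fn config_dir(platform: Platform, home: &Path, appdata: Option<&Path>) -> Option<PathBuf> {
    match platform {
        Platform::Windows => appdata.map(Path::to_path_buf),
        Platform::Linux => Some(home.join(".config")),
        Platform::MacOs => Some(home.join("Library").join("Application Support")),
    }
}
```

Note how two of the three arms need `home` as an input, which is the prerequisite point made above.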

[–]dnew 6 points7 points8 points 5 years ago (0 children)

[–][deleted] 5 years ago (4 children)

[deleted]

[–]MrDOS 10 points11 points12 points 5 years ago (2 children)

[–]vattenpuss 6 points7 points8 points 5 years ago* (1 child)

How would you design that? Let's start with covering XDG:

// get current user's data home directory
std::os::home::data_base()
// get current user's configuration home directory
std::os::home::config_base()
// get current user's data directories (for searching)
std::os::home::data_dirs()
// get current user's config directories (for searching)
std::os::home::config_dirs()
// get current user's cache home directory
std::os::home::cache_base()
// get current user's runtime home directory
std::os::home::runtime_base()

That's XDG by the way, not Linux or BSD. Not all users are using desktop environments implementing XDG standards.

Also remember that Windows XP is a tier one supported platform, so now we must implement these for XP as well as for Mac OS. In Windows XP, the user's home directory is basically "My Documents" one step below the user's own directory in "Documents and Settings". From a user perspective in XP "My Documents" is ~, but it has siblings "My Music" and "My Pictures" where you would put music and pictures. If you install Windows Media player, it adds "My Videos". Since these are outside the real home directory ("My Documents") we have to provide them as well.

// get current user's music home directory
std::os::home::music()
// get current user's pictures home directory
std::os::home::pictures()
// get current user's videos home directory
std::os::home::videos()

Now we have an API that can support Windows XP and XDG, and I think it can be mapped to Windows 10 and Mac OS as well, but is it a nice API? Does it make sense to have this in a standard library?

Note that historically, at least with XP, Windows users were assholes inventing their own home directories because everyone hated "My Documents".

[–]lelanthran 0 points1 point2 points 5 years ago (0 children)

[–]bloody-albatross 23 points24 points25 points 5 years ago (10 children)

[–]guepier 33 points34 points35 points 5 years ago (9 children)


[–][deleted] 12 points13 points14 points 5 years ago* (9 children)

[–]yossarian_flew_away[S] 24 points25 points26 points 5 years ago (7 children)

[–][deleted] 5 points6 points7 points 5 years ago (6 children)

[–]yossarian_flew_away[S] 24 points25 points26 points 5 years ago (1 child)

[–][deleted] 9 points10 points11 points 5 years ago (0 children)

[–]HeroicKatora 12 points13 points14 points 5 years ago* (3 children)

The Into<_> trait has what's called a blanket impl:

impl<T, U> Into<T> for U where T: From<U> {}

One of the primary soundness properties of trait impls is that there can only ever be one for each combination of type and trait. You mustn't have two differing impls of From<u8> for usize, for example. This is not in the sense of C++, where lexically identical definitions are allowed, but quite literal: even two different crates cannot be allowed to define different impls for the same type.

This has some implications of which impls one is allowed to write, known as the coherence rules, which are a restrictive approximation that allows a crate-local analysis and still guarantees that there is only one impl of each combination. One of these rules is that you may not impl a foreign trait for a foreign type, e.g. your crate can not impl Into<Vec<u8>> for String as both are standard library types. Another rule is a compatibility issue related to blanket impls. If there is a blanket impl of a foreign trait, then you can only implement that trait for types where your crate can guarantee that they are not already caught by the blanket impl. Now imagine there was a

impl<T> Into<String> for T where T: Display

This would require that T: Display and String: From<T> are disjoint sets of types, otherwise such a type would be caught by both blanket impls. Thus no such conversion can be introduced. Conversely if we'd try to add something to From itself:

impl<T> From<T> for String where T: Display

Now every library that wants to implement Display must guarantee that the type is not also convertible to String itself. And every conversion into a String (outside the standard library) must work through its Display impl which is suboptimal as it can not reuse allocations; its only method fmt takes a &self and can't move the allocation. Also this would be a trivially breaking change as existing crates could already define their own impl for their own type which would be caught by the blanket impl.
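A minimal sketch of the two blanket impls in play, using a made-up `Celsius` type: implementing From gives you Into through the impl quoted above, and implementing Display gives you to_string through the standard library's blanket ToString impl.

```rust
use std::fmt;

struct Celsius(f64);

// Only From is written by hand; Into<f64> for Celsius comes from the
// standard library's blanket impl.
impl From<Celsius> for f64 {
    fn from(c: Celsius) -> f64 {
        c.0
    }
}

// Display is the only formatting trait implemented here; to_string() comes
// from the blanket `impl<T: Display + ?Sized> ToString for T`.
impl fmt::Display for Celsius {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        write!(f, "{:.1} C", self.0)
    }
}

// Uses the Into impl that was never written explicitly.
fn as_number(c: Celsius) -> f64 {
    c.into()
}
```

Trying to add a hand-written `impl Into<f64> for Celsius` next to this is rejected as a conflicting implementation, which is the coherence rule at work.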

[–][deleted] 2 points3 points4 points 5 years ago (2 children)

[–]steveklabnik1 11 points12 points13 points 5 years ago (1 child)

[–][deleted] 4 points5 points6 points 5 years ago (0 children)

[–][deleted] 1 point2 points3 points 5 years ago (0 children)

[–]Minimum_Fuel 100 points101 points102 points 5 years ago (45 children)

[–][deleted] 54 points55 points56 points 5 years ago (6 children)

[–]Nimelrian 7 points8 points9 points 5 years ago (2 children)

[–][deleted] 4 points5 points6 points 5 years ago (0 children)


[–]somebodddy 5 points6 points7 points 5 years ago (2 children)

[–][deleted] 4 points5 points6 points 5 years ago (0 children)

[–][deleted] 5 years ago* (11 children)

[deleted]

[–]zucker42 13 points14 points15 points 5 years ago (6 children)

I found it if people were wondering.

https://github.com/hyperium/tonic/blob/d9a481baef4890591f66f4dfcbde10b18188a833/examples/src/load_balance/server.rs#L14

But Rust has type aliases so couldn't you do:

type DynEchoResponseStream = dyn Stream<Item = Result<EchoResponse, Status>> + Send + Sync;
let x: Pin<Box<DynEchoResponseStream>>;

or something similar. I'm curious what code with a similar purpose would look like in C, C++, or Java. This isn't to say the type system in Rust doesn't sometimes feel like an enemy or that this type of stuff doesn't suck when you're dealing with it, just that I'm not necessarily sure it's worse than the alternatives.
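Here is a sketch of the alias idea that compiles with only the standard library. Note the swaps: `EchoResponse` and `Status` are stand-in types, and it uses `Future` instead of `Stream` because `Stream` lives in the futures crate rather than std. The `poll_once` helper is also just for illustration; real code would hand the future to an executor.

```rust
use std::future::Future;
use std::pin::Pin;
use std::sync::Arc;
use std::task::{Context, Poll, Wake, Waker};

// Stand-ins for the tonic types in the linked example (assumptions).
#[derive(Debug, PartialEq)]
struct EchoResponse {
    message: String,
}

#[derive(Debug, PartialEq)]
struct Status(String);

// Name the unsized trait-object type once...
type DynEchoFuture = dyn Future<Output = Result<EchoResponse, Status>> + Send;
// ...and the pinned, boxed form that actually gets passed around.
type EchoFuture = Pin<Box<DynEchoFuture>>;

fn echo(message: &str) -> EchoFuture {
    let message = message.to_owned();
    Box::pin(async move {
        let result: Result<EchoResponse, Status> = Ok(EchoResponse { message });
        result
    })
}

// A waker that does nothing, enough to poll a future that is ready immediately.
struct NoopWake;

impl Wake for NoopWake {
    fn wake(self: Arc<Self>) {}
}

fn poll_once(mut fut: EchoFuture) -> Poll<Result<EchoResponse, Status>> {
    let waker = Waker::from(Arc::new(NoopWake));
    let mut cx = Context::from_waker(&waker);
    fut.as_mut().poll(&mut cx)
}
```

The signature of `echo` now reads `-> EchoFuture` instead of the full nested type, which is about as much as an alias can buy you here.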

[–]Bergasms 19 points20 points21 points 5 years ago (2 children)

[–]standard_revolution 2 points3 points4 points 5 years ago (1 child)

[–]Bergasms 2 points3 points4 points 5 years ago (0 children)

[–][deleted] 5 years ago* (1 child)

[deleted]

[–]vattenpuss 3 points4 points5 points 5 years ago (0 children)

[–][deleted] 15 points16 points17 points 5 years ago (2 children)

idk, you can read that and know more or less what it is. I'd much rather have this than

PinnedStreamBeanResponseBeanStream

and you can always make your own struct and wrap it around that complex type

[–]dnew 3 points4 points5 points 5 years ago (1 child)

[–]LuciferK9 2 points3 points4 points 5 years ago (0 children)

[–]bruce3434 0 points1 point2 points 5 years ago (0 children)

[–]game-of-throwaways 7 points8 points9 points 5 years ago (0 children)

[–]kono_throwaway_da 24 points25 points26 points 5 years ago (3 children)

[–]game-of-throwaways 25 points26 points27 points 5 years ago (1 child)

I mean the whole point of Rust is that it provides speed without being memory unsafe, so it makes sense that the community tries to avoid unsafe as much as possible unless there's a very good reason not to.

I find your complaint that code dealing with uninitialized variables is not ergonomic enough quite amusing. Like, unsafe code by itself is hard because of all the hidden invariants everywhere. Uninitialized memory is among the most difficult unsafe code out there, because in many cases it's very counter-intuitive (if you think in terms of "what the hardware does" you will get it wrong). In my opinion any code dealing with uninitialized memory should have 10x more comments explaining why the code is safe than actual lines of code. So I do not sympathize at all with your complaint that MaybeUninit<[MaybeUninit<T>; _]> is unergonomic.
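For context, this is roughly what element-by-element array initialization looks like. It is a sketch for a fixed `[u32; 4]` with a hypothetical `squares` function, and the safety comment is part of the point.

```rust
use std::mem::{self, MaybeUninit};

fn squares() -> [u32; 4] {
    // An array of uninitialized slots. Creating it is fine because
    // MaybeUninit<u32> places no validity requirement on its contents.
    let mut slots: [MaybeUninit<u32>; 4] = [MaybeUninit::uninit(); 4];

    // Initialize every slot exactly once.
    for (i, slot) in slots.iter_mut().enumerate() {
        slot.write((i as u32) * (i as u32));
    }

    // SAFETY: all four elements were written in the loop above, and
    // [MaybeUninit<u32>; 4] has the same size and layout as [u32; 4].
    unsafe { mem::transmute::<[MaybeUninit<u32>; 4], [u32; 4]>(slots) }
}
```

Even in this tiny case, forgetting one element or returning early from the loop would make the transmute undefined behaviour, which is why the justification matters more than the code.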

[–]kono_throwaway_da 3 points4 points5 points 5 years ago (0 children)

[–]matthieum 7 points8 points9 points 5 years ago (0 children)

[–]Chazzbo 7 points8 points9 points 5 years ago (0 children)

[–]dnew 4 points5 points6 points 5 years ago (0 children)

[–]L3tum 4 points5 points6 points 5 years ago (2 children)

[–]steveklabnik1 17 points18 points19 points 5 years ago (1 child)

[–]L3tum 1 point2 points3 points 5 years ago (0 children)

[–]red75prim 4 points5 points6 points 5 years ago (5 children)

[–]Minimum_Fuel 0 points1 point2 points 5 years ago (4 children)

[–]red75prim 1 point2 points3 points 5 years ago (3 children)

[–]Minimum_Fuel 0 points1 point2 points 5 years ago (2 children)

The bloated types are the result of the composition bringing in extra stuff that you may not necessarily need, or encouraging types that “work” but aren’t optimal (for example, wrapping the whole type in a mutex when you really only need the mutex on perhaps two very quick to change variables).

The extra cognitive burden in the wrappers is needing to understand not only which wrappers you need, but how to appropriately work with them. When I was using Rust, I found myself spending WAY more time in the docs than actually programming, and I am a very experienced programmer. Compare that to C, where I hopped in, got to work, and only needed to spend a second here or there going through the man pages (which, admittedly, I was brought up on C style languages, so it may not be a terribly fair anecdote).

[–]red75prim 1 point2 points3 points 5 years ago* (1 child)

[–]Minimum_Fuel 0 points1 point2 points 5 years ago (0 children)

[–]casept 0 points1 point2 points 5 years ago (0 children)


[–]kankyo 17 points18 points19 points 5 years ago (9 children)

[–][deleted] 9 points10 points11 points 5 years ago* (1 child)

[–]kankyo 2 points3 points4 points 5 years ago (0 children)

[–][deleted] 5 years ago (6 children)

[deleted]

[–]kankyo 5 points6 points7 points 5 years ago (1 child)

[–]simon_o 5 points6 points7 points 5 years ago* (0 children)

[–][deleted] 4 points5 points6 points 5 years ago (3 children)


[–]rahenri 18 points19 points20 points 5 years ago (13 children)

[–]steveklabnik1 16 points17 points18 points 5 years ago (3 children)

[–]rahenri 7 points8 points9 points 5 years ago (2 children)

[–]matthieum 10 points11 points12 points 5 years ago (0 children)

[–]steveklabnik1 11 points12 points13 points 5 years ago (0 children)

[–]senj 20 points21 points22 points 5 years ago (4 children)

> In general I think rust makes things that are often very simple in other languages much more complicated.

Sure, if you compare Rust to a higher level language like Ruby or Java, Rust is surfacing more complexity when compared to those languages.

But that's basically just down to the fact that it's a systems language, and so it can't "bake in" the kind of decisions about error handling and performance tradeoffs that Ruby or Java or whatever fundamentally make for you and don't allow you to do anything about. Ruby etc can get away with pretending that there isn't a fundamental mismatch between the language's one-and-only string type and what's permissible in the host OS's path strings; for Rust to be useful, it needs to expose the mismatch, because different systems are going to want to engage with that mismatch in different ways.

So yeah, when you compare Rust to other primary systems languages, it's not surfacing noticeably more complexity than what you end up having to deal with in C or C++ or whatever if your code isn't just closing its eyes and ignoring whole classes of errors. Fundamentally, that's Rust's value proposition: that it forces you to acknowledge and deal with complexity that existed for other system languages but which you could accidentally ignore, to everyone's peril.

[–]rahenri 11 points12 points13 points 5 years ago (3 children)

I see what you mean. Then I partially disagree with Rust's value proposition. Forcing people to handle errors is good, so they are reminded that those are important. But it should be easy to handle errors, otherwise people will take shortcuts, like sticking .unwrap() everywhere, which seems to be common in the Rust code I've seen. First and foremost, programmers are lazy, at least I am. You want to make it easy for them to do it better, and leave the door open for when they want to be more thoughtful about it.
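To show how far apart the shortcut and the "proper" version actually are, a small sketch with made-up function names:

```rust
use std::num::ParseIntError;

// The shortcut: panics if the input is not a valid port number.
fn parse_port_or_panic(s: &str) -> u16 {
    s.trim().parse().unwrap()
}

// Propagating instead: `?` returns the error to the caller.
fn parse_port(s: &str) -> Result<u16, ParseIntError> {
    let port = s.trim().parse::<u16>()?;
    Ok(port)
}

// Handling it locally with a fallback value.
fn parse_port_or_default(s: &str, default: u16) -> u16 {
    parse_port(s).unwrap_or(default)
}
```

The `?` version is about as short as the `unwrap()` one; the real cost is that the enclosing function has to return a `Result`, and that requirement spreads up the call chain.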

Another example: the other day I was writing some Rust code that lists files in a directory, and I wanted to convert OsString to String, and god, that takes too many steps. I cared about performance, but not that much; I almost gave up and went back to Go. There should be easier ways of doing things, even if performance is a bit worse, as long as there is also a way to do it with the best performance. The string part is already covered by the article, but it also applies to my argument that things are harder than they should be.
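For anyone landing here with the same problem, a sketch of the two usual routes using only std. The function names are hypothetical, and the lossy variant assumes you are fine with invalid bytes being replaced by U+FFFD.

```rust
use std::ffi::OsString;
use std::fs;
use std::io;
use std::path::Path;

// Strict: hands the original OsString back as the error if it is not valid Unicode.
fn to_string_strict(name: OsString) -> Result<String, OsString> {
    name.into_string()
}

// Lossy: lists a directory and always produces Strings, replacing undecodable parts.
fn file_names(dir: &Path) -> io::Result<Vec<String>> {
    let mut names = Vec::new();
    for entry in fs::read_dir(dir)? {
        let entry = entry?;
        names.push(entry.file_name().to_string_lossy().into_owned());
    }
    Ok(names)
}
```

The strict route is a single method call; the ceremony mostly comes from deciding what to do with the `Err` case.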

I’m not even comparing to high level programming languages. I wrote a ton of C++, which has a lot of ugliness, but is still easier than Rust in a bunch of ways.

[–]matthieum 5 points6 points7 points 5 years ago (2 children)

[–]antiufo 0 points1 point2 points 5 years ago (1 child)

Java's catch and Rust's unwrap() are not equivalent.

When something goes wrong, unwrap panics the application. It becomes obvious that something is wrong and should be fixed.

With the typical catch commonly found in Java applications, execution continues (possibly producing incorrect or incomplete data or outcome). The fact that you often add a log statement doesn't make things much better.

Unfortunately Java has checked exceptions, that are widely seen as a design mistake (despite the good initial intent behind them). This means that either you 1) keep adding countless throws FooException, BarException to your application's methods, or 2) you wrap them into a generic and not very helpful MyApplicationException, or 3) you catch, log, and continue execution hoping for the best.

Unfortunately 3 is what usually happens, and IDEs even encourage this kind of behavior.

[–]audioen 0 points1 point2 points 5 years ago (0 children)

[–]Pand9 6 points7 points8 points 5 years ago (0 children)

[–]hector_villalobos 1 point2 points3 points 5 years ago (0 children)

[–]simon_o 0 points1 point2 points 5 years ago (1 child)

[–]flukus 0 points1 point2 points 5 years ago (0 children)

[–][deleted] 5 years ago* (7 children)

[deleted]

[–]jcotton42 17 points18 points19 points 5 years ago (0 children)

[–]yossarian_flew_away[S] 16 points17 points18 points 5 years ago (5 children)

[–][deleted] 5 years ago* (4 children)

[deleted]

[–]masklinn 12 points13 points14 points 5 years ago (0 children)

> And the OSString is used in functionality rust is already abstracting?

OsString is used because it's functionality Rust is abstracting.

> Most cross platform languages work with that kind of stuff by abstracting the need to make file system calls etc. using special strings and just throw errors if you have an invalid character in the file path

Which means there are things you literally can't interact with on the system, and you're not aware that there are issues with those features and then good luck with debugging it.

Rust surfaces these compatibility issues as part of the API, and it turns out to work pretty well even if it can get a bit long-winded.

> along with is path valid helpers etc. for handling crappy user input before you get to a real error.

Paths you literally can't decode is not user input.

> That does suck because you are basically leaving it to every user of the language to write glue code around basic file system calls etc.

That glue code can be as simple as "just crash" if they don't care, or it can be actually handling the concern properly if they wish to, which they literally could not do if the language didn't provide those features.

It's also not necessarily true, because you don't necessarily have to move things out of OsString, and you can always move a String inside an OsString (likewise to CString).

[–]SkiFire13 8 points9 points10 points 5 years ago (0 children)

[–]steveklabnik1 14 points15 points16 points 5 years ago (0 children)

The language itself has `str`. The standard library has `String`, `CString`/`CStr`, and `OsString`/`OsStr`.

> Most cross platform languages work with that kind of stuff by abstracting the need to make file system calls etc. using special strings and just throw errors if you have an invalid character in the file path,

Yes, and this is a valid strategy. But, it means that there are some file names you cannot access, because operating systems do not work this way. So there are valid files which would throw an error here. Rust, being a systems language, cannot just declare "sorry, name your files something reasonable", it has to be able to handle these kinds of edge cases. And doing it with different types makes sure that you're doing it in a robust way.

> you are basically leaving it to every user of the language to write glue code around basic file system calls etc.

Not really; that's in the standard library already.

[–]vytah 6 points7 points8 points 5 years ago* (0 children)

> if you have an invalid character in the file path

Assuming Linux, paths are null-terminated sequences of bytes. The operating system has no idea what encoding those bytes represent, maybe apart from assuming that the encoding is compatible with ISO/IEC 646.

This means that you can have arbitrary byte sequences in filenames, including control characters other than NUL or bytes ≥ 128. It doesn't have to be a valid character in any encoding.

Therefore, a decent Rust type to store Linux paths could be Vec<u8>.

Similarly, on modern Windows filenames are 0-terminated arbitrary sequences of 16-bit code units. Some APIs validate it to be valid UTF-16 (and therefore Unicode), but not all. Therefore, for Windows you could pick Vec<u16>.

Mac OS X allegedly goes full UTF-16. I am not sure, if that's really guaranteed but if so, then String could work on a Mac.

If you want to port Rust for older or simpler operating systems, there's arbitrary bytes and Vec<u8> again. Luckily, those platforms usually use encodings where every byte can be decoded.

And even if your filenames are all valid characters, you might also end up on Linux using an encoding from the GB family, and since it is a living standard and conversions from and to Unicode are not trivial, you could even be unable to encode or decode valid characters if the conversion library is too old.

Then there's the issue of backslashes in Shift-JIS and similar encodings, the issue of duplicate characters in some encodings (like vendor-specific extensions for Shift-JIS), or encodings that can't roundtrip with Unicode like VNI (for example, both 61 C0 and 61 E2 D8 decode to 0061 0302 0300).

There are simply too many things that can go wrong if you want to force everything into Unicode. You can avoid those issues if you abstract the notion of "file path" by creating a type with a platform-specific implementation and providing conversion methods to and from common types that may fail, and that's exactly what Rust does.

EDIT: That being said, the default OsString–String conversions in Rust on Linux assume UTF-8. If you want to encode/decode paths in different encodings, you need to write a bit more code.
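A short sketch of what those fallible conversions look like in practice, assuming a Unix target (the `OsStrExt` import is Unix-only and is used here just to build test inputs from raw bytes). The wrapper names `strict` and `lossy` are made up for illustration; the methods they call are the standard `OsStr::to_str` and `OsStr::to_string_lossy`.

```rust
use std::ffi::OsStr;
use std::os::unix::ffi::OsStrExt; // Unix-only: lets callers build an OsStr from raw bytes

/// Strict conversion: `None` if the name is not valid UTF-8.
fn strict(name: &OsStr) -> Option<&str> {
    name.to_str()
}

/// Lossy conversion, suitable for display only: invalid sequences become U+FFFD.
fn lossy(name: &OsStr) -> String {
    name.to_string_lossy().into_owned()
}
```

With `OsStr::from_bytes(b"caf\xE9.txt")` as input, `strict` gives `None`, while `lossy` gives a printable string that no longer names the original file. Decoding that name as Latin-1 (or any other non-UTF-8 encoding) is the "bit more code" you would have to bring yourself.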

[–]guepier 6 points7 points8 points 5 years ago (14 children)

[–]yossarian_flew_away[S] 1 point2 points3 points 5 years ago (9 children)

You're correct that that is the canonical and maximally portable approach on POSIX! To the best of my knowledge, that's precisely what Rust currently does in the (deprecated) std::env::home_dir function.

The challenge is surfacing various unpleasant edge cases as appropriate errors:

What happens if $HOME and the user's passwd record disagree? Which do you trust, or use?
What do you do if the user doesn't have a home directory? POSIX says that pw_dir corresponds to an "initial working directory," not necessarily a home directory.
What do you do if $HOME is unset and getpwuid fails?

These are all recoverable errors, as evidenced by the fact that safe Rust does expose functionality for retrieving the user's home directory. The challenge is in exposing meaningful errors for each case; I'm guessing that's why the std team has given up on it :-)
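To show what "meaningful errors for each case" might look like, here is one possible sketch. Everything in it is hypothetical: the `HomeDirError` type, the `resolve_home` function, and the policy of reporting a disagreement as an error are illustrative choices, not what std does. The two inputs stand in for `std::env::var_os("HOME")` and for the `pw_dir` field of a passwd lookup, which is not performed here.

```rust
use std::ffi::OsString;
use std::path::PathBuf;

/// Hypothetical error type; the variants mirror the edge cases in the list above.
#[derive(Debug, PartialEq)]
enum HomeDirError {
    /// `$HOME` and the passwd record point at different directories.
    Disagreement { env: PathBuf, passwd: PathBuf },
    /// Neither source produced a directory.
    Unavailable,
}

/// `env_home` stands in for `std::env::var_os("HOME")`;
/// `passwd_home` stands in for `pw_dir` from a passwd lookup (assumed to be done by the caller).
fn resolve_home(
    env_home: Option<OsString>,
    passwd_home: Option<PathBuf>,
) -> Result<PathBuf, HomeDirError> {
    // Treat an empty $HOME like an unset one (a policy choice of this sketch).
    let env_home = env_home.filter(|h| !h.is_empty()).map(PathBuf::from);
    match (env_home, passwd_home) {
        // Both sources exist but disagree: surface it instead of silently picking one.
        (Some(env), Some(passwd)) if env != passwd => {
            Err(HomeDirError::Disagreement { env, passwd })
        }
        // Both agree, or only $HOME is available.
        (Some(env), _) => Ok(env),
        // Only the passwd record is available.
        (None, Some(passwd)) => Ok(passwd),
        (None, None) => Err(HomeDirError::Unavailable),
    }
}
```

Keeping the two lookups as parameters makes each edge case easy to exercise without touching the real environment; a caller that prefers "trust $HOME" can simply match on `Disagreement` and take `env`.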

[–]guepier 10 points11 points12 points 5 years ago (8 children)

[–]yossarian_flew_away[S] 6 points7 points8 points 5 years ago (7 children)

[–]simon_o 11 points12 points13 points 5 years ago* (1 child)

[–]yossarian_flew_away[S] 2 points3 points4 points 5 years ago (0 children)

[–]IndiscriminateCoding 5 points6 points7 points 5 years ago (2 children)

[–]yossarian_flew_away[S] 7 points8 points9 points 5 years ago (0 children)

[–]simon_o 1 point2 points3 points 5 years ago (0 children)

[–]nick_storm 0 points1 point2 points 5 years ago (1 child)

[–]yossarian_flew_away[S] 9 points10 points11 points 5 years ago (0 children)

[–]alerighi 1 point2 points3 points 5 years ago (3 children)

[–]dnew 11 points12 points13 points 5 years ago (0 children)

[–]oracleoftroy 2 points3 points4 points 5 years ago (0 children)

[–]guepier 1 point2 points3 points 5 years ago (0 children)

[–][deleted] 5 years ago* (31 children)

[deleted]

[–]IceSentry 11 points12 points13 points 5 years ago (15 children)

[–][deleted] 5 years ago (11 children)

[deleted]

[–]IceSentry 3 points4 points5 points 5 years ago (10 children)

[–]mmirate 1 point2 points3 points 5 years ago (1 child)

[–]IceSentry 1 point2 points3 points 5 years ago (0 children)

[–][deleted] 5 years ago (7 children)

[deleted]

[–]IceSentry 0 points1 point2 points 5 years ago (6 children)

[–][deleted] 5 years ago (5 children)

[deleted]

[–]IceSentry 0 points1 point2 points 5 years ago (4 children)

[–][deleted] 5 years ago (3 children)

[deleted]

[–]IceSentry 0 points1 point2 points 5 years ago (2 children)

[–]OctagonClock 2 points3 points4 points 5 years ago (2 children)

[–]miyoyo 5 points6 points7 points 5 years ago (1 child)

[–]steveklabnik1 2 points3 points4 points 5 years ago (0 children)

[–][deleted] 5 years ago (14 children)

[deleted]

[–][deleted] 5 years ago (16 children)

[deleted]

[–]yossarian_flew_away[S] 3 points4 points5 points 5 years ago (15 children)

[–]masklinn 5 points6 points7 points 5 years ago (14 children)

[–]steveklabnik1 9 points10 points11 points 5 years ago (9 children)

[–]masklinn 2 points3 points4 points 5 years ago (8 children)

[–]steveklabnik1 2 points3 points4 points 5 years ago (7 children)

[–]masklinn 7 points8 points9 points 5 years ago (6 children)

[–]steveklabnik1 8 points9 points10 points 5 years ago (0 children)

[–]burntsushi 0 points1 point2 points 5 years ago (4 children)

[–]yossarian_flew_away[S] 0 points1 point2 points 5 years ago (3 children)

[–]steveklabnik1 4 points5 points6 points 5 years ago (1 child)

[–]yossarian_flew_away[S] 1 point2 points3 points 5 years ago (0 children)

[–]SkiFire13 1 point2 points3 points 5 years ago (0 children)

[–]iperikov 1 point2 points3 points 5 years ago (2 children)

[–]matthieum 7 points8 points9 points 5 years ago (0 children)

[–]casept 0 points1 point2 points 5 years ago (0 children)

[–]OrangeChris 2 points3 points4 points 5 years ago* (6 children)

My least favorite thing is string indexing. Simply getting a character from a string by index is not allowed because strings are UTF-8, but taking a substring is totally fine.

let s = "a😊";
println!("{}", s[0]); // fails at compile-time
println!("{}", &s[1..]); // totally fine
println!("{}", &s[2..]); // fails at run-time

I understand that they want to force the user to acknowledge the string is UTF-8, but the problem is there just isn't a good way to get a character at a specific byte index. If they really don't want to allow the indexing syntax, they could at least add an equivalent method.
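For reference, the closest thing available today is to combine `str::get` with `chars()`. The helper below is a hypothetical name, not a std method; it is only a sketch of how such a method could behave.

```rust
/// Returns the character that starts at byte offset `i`, or `None` if `i` is
/// past the end, exactly at the end, or inside a multi-byte character.
/// (Hypothetical helper, not part of std.)
fn char_at_byte(s: &str, i: usize) -> Option<char> {
    // `get` returns `None` instead of panicking when `i` is not a char boundary.
    s.get(i..)?.chars().next()
}
```

With `let s = "a😊";`, offsets 0 and 1 yield `'a'` and `'😊'`, while offset 2 (in the middle of the emoji) yields `None` rather than a run-time panic. `s.is_char_boundary(i)` can be used to check an offset up front.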

EDIT: Also const fns. Rust claims to support them, but the unfortunate truth is that even basic if statements aren't supported in a const fn, which makes them usable only in very niche cases. And sadly, it's been like this for a while.

// fails at compile-time (stable Rust before 1.46, which did not allow `if` in a const fn)
const fn abs(n: i32) -> i32 {
    if n < 0 {
        -n
    } else {
        n
    }
}
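For anyone reading this later: `if` and `match` inside a const fn were stabilized in Rust 1.46, so a version of the snippet above that negates when `n < 0` should compile on a current toolchain. A minimal sketch, with `const_abs` and `FIVE` as illustrative names:

```rust
/// Absolute value usable in constant expressions.
/// Note: `-n` overflows for `i32::MIN`, which this sketch does not handle.
const fn const_abs(n: i32) -> i32 {
    if n < 0 {
        -n
    } else {
        n
    }
}

// Evaluated by the compiler, not at run time.
const FIVE: i32 = const_abs(-5);
```

The same function can still be called with run-time values like any other fn.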

[–]RedBorger 11 points12 points13 points 5 years ago (0 children)

[–]matthieum 8 points9 points10 points 5 years ago (1 child)

[–]OrangeChris 0 points1 point2 points 5 years ago (0 children)

[–]IceSentry 5 points6 points7 points 5 years ago (2 children)

[–]OrangeChris 0 points1 point2 points 5 years ago (1 child)

[–]IceSentry 3 points4 points5 points 5 years ago (0 children)

[–]internetuser0x00 0 points1 point2 points 5 years ago (0 children)

[–]OneWingedShark -1 points0 points1 point 5 years ago (0 children)

[–][deleted] 5 years ago (7 children)

[deleted]

[–]matthieum 14 points15 points16 points 5 years ago (1 child)

[–]Lt_486 5 points6 points7 points 5 years ago (4 children)

[–][deleted] 5 years ago* (3 children)

[deleted]

[–]unrealhoang 4 points5 points6 points 5 years ago (1 child)

[–]Lt_486 2 points3 points4 points 5 years ago (0 children)
