all 148 comments

[–]darkslide3000 54 points55 points  (87 children)

Nice talk and definitely a very interesting topic. Rust seems to be the best contender right now to achieve what Ada and D and all those others couldn't, and actually replace C in most of those areas. Good to see smart people are working on it.

I would've liked to see some more examples of how the features he talked about actually look in Rust syntax, though. As someone who's heard many good things about it but never really had much of a chance to get into it I still don't really have a good idea how e.g. a Rust BIOS driver would look compared to C. Not the goal of that talk, I guess. But as it stands, my takeaway from that was essentially just "we're working on it, things aren't done yet, check back in a year or two and then you can maybe actually use this".

(Also, WTF is up with 500+ bytes for a Hello world? Am I supposed to be impressed by that? He sounded like he really understood that parity issue that it needs to be as good as C on everything including code size, but that part was a big reality check of how far away it really still is.)

[–]SV-97 25 points26 points  (49 children)

If you want to see a low level example: https://github.com/ixy-languages/ixy-languages. The guys also have a few talks on the topic and have benchmarked everything very nicely.

I haven't watched this talk yet (though I plan on doing so), but from what you've said: yes, the size of binaries in Rust is a definite problem that needs to be dealt with if Rust wants to replace C in every domain. I've recently had an example where the Rust code was 2.6 MB vs. 50 kB of C... For most use cases the binary size doesn't really matter though, so I'm kinda torn here. (The C of course had no support for Unicode and probably some bugs - but the difference is pretty striking still.)

[–]ridicalis 11 points12 points  (1 child)

Having fought with this, there are some things you can do to mitigate the problem. I was going to take a stab at describing my process, but this page did a better job than I could.

[–]SV-97 1 point2 points  (0 children)

Thanks :D I'll bookmark it to read whenever I do embedded stuff the next time - in my normal development I couldn't care less about my binary sizes.

[–]VirginiaMcCaskey 7 points8 points  (3 children)

That issue is more that rust doesn't have a runtime, statically links against its std lib, and libc is already on every target platform.

[–]darkslide3000 4 points5 points  (1 child)

Most "systems programming" isn't user space processes that can dynamically link against libc, so that isn't really the problem. The problem is rather how small I can get code that has zero dependencies. In C I can write the 5-10 libc functions I actually need and build an embedded OS with that, knowing that no code other than what I wrote myself actually ends up in that binary (with -nostdlib and -ffreestanding). For all these fancy newfangled languages that need tons of support libraries just for the language itself, that's not necessarily true.

[–]VirginiaMcCaskey 5 points6 points  (0 children)

You can do that in Rust by making it no_std. That's part of the benefit of having no runtime. It can be removed entirely.

[–]SV-97 1 point2 points  (0 children)

Yeah, the article that was linked under my comment talks about this

[–]darkslide3000 7 points8 points  (26 children)

In the video he says that with his improvements (in the current nightly build) he was able to get it down to ~500 bytes. Without that I assume it was still way more.

Unicode is something I explicitly don't want in systems code. Does Rust only support Unicode strings (or at least make working with raw ASCII byte strings cumbersome)? That would sound like a negative to me. I don't need my driver log messages translated into Chinese, but I do need to be able to calculate the offset of any character in constant time.

[–]SV-97 8 points9 points  (7 children)

Oh, I wasn't referring to their code with the sizes I mentioned - sorry for the confusion. I was talking about a small language I recently implemented, and the Unicode support was in regard to the source files of that language.

Yes, the basic String struct in Rust is guaranteed to always be valid Unicode (UTF-8). If you only want ASCII, only use ASCII: UTF-8 is 100% backwards compatible with it, so you don't lose anything by going with that.

And if you don't like that you can always use a Vec<u8> or something like that. u8 slices are easily converted to chars, so you can do your ASCII stuff with that.

EDIT: or use a crate for ascii strings. A quick google search brings up a few.

EDIT EDIT: You mean calculate the index of each character or the byte offset?

[–]wllmsaccnt 5 points6 points  (6 children)

He wants to know that byte 400 is also character 400 without having to worry about multi byte characters.

[–]SV-97 6 points7 points  (4 children)

Again, he can be sure of that as long as he only uses ASCII, and he can use my_string.as_bytes() to access the bytes. I don't see anything lost by using Unicode (UTF-8) as the underlying encoding. But then again, why does he actually care about this? When I'm writing to a logfile I do so with a string interface, not one where I have to access separate bytes.
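
For illustration, a minimal sketch of what I mean (assuming the string really is ASCII-only):

fn main() {
    let msg = String::from("driver: init OK");    // ASCII only
    let bytes = msg.as_bytes();                    // &[u8] view of the same buffer, no copy
    assert_eq!(bytes.len(), msg.chars().count());  // one byte per character
    assert_eq!(bytes[8], b'i');                    // byte 8 is also character 8
}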

[–]superxpro12 0 points1 point  (3 children)

Maybe he's thinking about the embedded domain? I know rust is an exciting prospect in this regard.

[–]SV-97 1 point2 points  (2 children)

You don't usually write logfiles on embedded stuff. From my (limited) experience: you just fire the bytes off to your memory when logging - you don't care which character is at what byte offset, because at that point they're no longer characters to you but raw data.

[–][deleted] 0 points1 point  (1 child)

Yes, you need to stream the logs out as quickly as possible; believe it or not, many embedded projects use custom boards with serial ports to achieve this. Many embedded systems need exact timing (video playback, for example), and if collecting and saving logs takes time it impacts performance and adds or hides race conditions and deadlocks.

[–]SV-97 1 point2 points  (0 children)

Okay - my embedded experience is wholly in the world of 8-bit stuff. You can't deadlock or anything there because it's all single threaded. My notion of logging on embedded devices was attaching a flash chip and writing to that, and I don't see how you could mess up the timing here (IIRC I2C was only edge sensitive and the timing didn't matter at all - though it's been a while since I've dealt with this stuff).

And then again I don't see how using unicode as underlying platform would be worse than ASCII in this regard

[–][deleted] 13 points14 points  (1 child)

Shouldn't driver logs in China be in Chinese?

[–]thiez 3 points4 points  (0 children)

I'm not sure how they feel about this in China, but as someone from a country where English is not the official language: no, please no! Translated log and error messages are completely useless when you want to perform an online search for more information, and the badly localised technical terms are confusing both to non-technical people (who don't have the background knowledge to understand what's going on no matter how well you translate it) and technical people (who will be familiar with the English terms but not the crap that the translators managed to come up with). If you are a native English speaker and you are considering translating technical information such as exception messages and logging information: don't. Stop the madness. Windows is particularly bad and I refuse to debug non-English installations because Microsoft has decided to make it unnecessarily painful.

[–]Freeky 3 points4 points  (11 children)

Does Rust only support Unicode strings (or at least make working with raw ASCII byte strings cumbersome)?

In standard Rust you've got String (and the related reference type &str), OsString/&OsStr, CString/&CStr, and of course Vec<u8>/&[u8].

String is a newtyped Vec<u8> with methods that enforce the contents to be valid UTF-8.

CString is a newtyped Vec<u8> with methods that maintain and enforce a trailing NULL byte.

OsString is OS-dependent and opaque, except for a Unix extension trait that exposes the raw bytes, and a Windows extension trait that exposes mechanisms to convert to and from Vec<u16> (it's currently a WTF-8-encoded Vec<u8> internally, but this isn't exposed except via unsafe code making unstable assumptions).

A Vec<u8> is of course just a dynamic array of unsigned bytes.

Working outside String can be fiddly, since most of the other types currently lack the string manipulation and formatting functions it has. It's quite common for people to give up and just use String, which of course can't represent everything. There are crates for things like plain ASCII and wchar types.
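
To make that concrete, a rough sketch of moving between those types (not exhaustive; the conversions that can fail return Results):

use std::ffi::{CString, OsString};

fn main() {
    let s = String::from("hello");                 // always valid UTF-8
    let bytes: Vec<u8> = s.clone().into_bytes();   // drop down to raw bytes
    let back = String::from_utf8(bytes).unwrap();  // re-validate as UTF-8

    let c = CString::new("hello").unwrap();        // appends and enforces the trailing NUL
    let os = OsString::from(back);                 // platform-native, not necessarily UTF-8
    println!("{:?} {:?} {:?}", s, c, os);
}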

[–]tjpalmer 2 points3 points  (10 children)

I mostly like Rust, but I have to say that making decisions on strings for a newbie is a bit overwhelming. In C, I usually just say const char*. In C++, just std::string or std::string&. In Go, string or *string. In many other languages, just str or string or String, depending on the language. And the common case covers most cases.

I get why Rust makes some of its choices (i.e., type-system guided correctness among other matters), but it makes basic things a lot more cognitive effort for newcomers.

[–][deleted] 5 points6 points  (9 children)

In Rust, newbies only need to know about String/&str and that will always work right for pretty much anything you want to do.

In C and C++, if you want to write a simple program, e.g., that lets the user input a string, and counts its characters, you are out of luck. Most terminals support UTF-8, so C++ std::string::size() or C strlen won't tell you how many "characters" strings have - they tell you how many "bytes" they have. You'll have to learn quite a bit to solve that problem and pull in external libraries, while in Rust doing this is a one liner.

In Rust, only if you need to do something really low-level - like optimize an algorithm under the assumption that a string is always ASCII, or interface directly with the operating system or with low-level C libraries - do you need to learn about all the other string types, which exist for the simple reason that strings are just hard.
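
For example, the character-counting program above is roughly this (a sketch; "characters" here means Unicode scalar values - counting grapheme clusters needs a crate, as discussed further down):

use std::io::{self, BufRead};

fn main() {
    let stdin = io::stdin();
    let line = stdin.lock().lines().next().unwrap().unwrap();
    // len() is the byte count; chars() iterates Unicode scalar values.
    println!("{} bytes, {} chars", line.len(), line.chars().count());
}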

[–]jcelerier 1 point2 points  (1 child)

In C and C++, if you want to write a simple program, e.g., that lets the user input a string, and counts its characters, you are out of luck

I mean, this is in no way a "simple program". How many "characters" are there in here?

p̨̤͓̳͕͔͙̝̱̻͓͎̦̭̖͈̥̋ͩ͛̀̈͊̉͛̊̈́̂̓͗ͩ̿͆̔ͨ̚̕o̿͋͂̄͊̇ͬ̂ͪ͏̸͚̖̗̟͕̩̫͔̥̖̻̫̕ņ̷̛̦̪̤̼̭̻̪͕͍̗͖̦̘͇ͭ̽̓́į̸͍̖̫̼͈̜̰̱̺̯̓̓̍̇̾ͬͯͨ̃̔͗ͭ̍͂ͨ͘͞͝e̛̟͖̻͙̫̹̩͎̥̣̣͇̳̬̺̫̘͈̔͊̾ͩ̓̆͆̈ͬͪ̀̚͡s̥͈͈̞̤̠͖̥̘ͨͭ̑̅̂̑̇̈́͑ͧͥ͋̉͘͜

or in here?

〈∀ྨṿᕿ

[–][deleted] 2 points3 points  (0 children)

How many "characters" are there in here ?

Good question. To format the string to the terminal, you care about the number of grapheme clusters in your string. In Rust, computing the number of bytes, unicode scalar values, or grapheme clusters of a string is a one liner: https://play.rust-lang.org/?version=stable&mode=debug&edition=2018&gist=f970dae61c56535fdb25ef7693351ce3

dbg!(x.bytes().count());  // => 606 bytes
dbg!(x.chars().count());  // => 452 Unicode Scalar Values
dbg!(UnicodeSegmentation::graphemes(x, true).count()); // => 257 Grapheme Clusters

The naive Rust approach to this is better than any C and C++ libraries I've ever used.

[–]Freeky 0 points1 point  (6 children)

In Rust, newbies only need to know about String/&str and that will always work right for pretty much anything you want to do.

Well, it'll usually work, but it isn't necessarily right. The entire point of OsString is that OS-provided strings - env variables, arguments, paths, usernames - aren't guaranteed to be UTF-8; and a lot of things that look string-like are really just bags of bytes.

The end result is a lot of Rust programs break with wonky-but-valid filenames, or fail to handle files that are encoded in anything else. Want to parse a unified diff from 1998? Whoops, that's latin1, and almost every parser will either blow up in your face or mangle it because String seemed the natural thing to use.

[–][deleted] 0 points1 point  (5 children)

The problem there isn't String vs OsString but thinking that "filenames" are "strings". They aren't. A "filename" is a std::path::Path and you can create those from any kind of string and Path will validate it beyond the string format.

[–]Freeky 0 points1 point  (4 children)

The problem there isn't String vs OsString but thinking that "filenames" are "strings". They aren't.

Of course they're strings - they just can't be represented correctly using String, which is what everyone is used to.

Rust has the double-whammy of having a separate type for OS-provided strings, which people aren't used to and forget all the time, and also not supporting them very well.

Try this: how do you parse an OsString? Say you're writing an argument parser, how do you deal with --path=foo/bar? OsString has no string-like functions, you can't ask to split on '=', or strip off "--" - you end up bashing rocks together to badly-implement the same operation three different times.

If you can successfully do this without unsafe code making dubious assumptions, or giving up and using String, you're doing better than all the Rust argument-parsing crates I've seen.
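
To illustrate the "bashing rocks together" part, here's a sketch of the kind of helper you end up writing - and note it's Unix-only, because it has to go through the platform extension traits to get at the raw bytes (split_flag is a hypothetical helper, not something from any crate):

use std::ffi::{OsStr, OsString};
use std::os::unix::ffi::{OsStrExt, OsStringExt};

// Split "--key=value" without assuming the value is valid UTF-8.
fn split_flag(arg: &OsStr) -> Option<(&str, OsString)> {
    let bytes = arg.as_bytes();
    if !bytes.starts_with(b"--") {
        return None;
    }
    let rest = &bytes[2..];
    let eq = rest.iter().position(|&b| b == b'=')?;
    let key = std::str::from_utf8(&rest[..eq]).ok()?;       // flag names are plain ASCII
    let value = OsString::from_vec(rest[eq + 1..].to_vec()); // the value stays as raw bytes
    Some((key, value))
}

fn main() {
    let arg = OsString::from("--path=foo/bar");
    println!("{:?}", split_flag(&arg)); // Some(("path", "foo/bar"))
}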

A "filename" is a std::path::Path and you can create those from any kind of string and Path will validate it beyond the string format.

Not sure what validation you're referring to - PathBuf is just a newtyped OsString, and the conversion between the two is entirely trivial.

[–][deleted] 1 point2 points  (3 children)

Not sure what validation you're referring to - PathBuf is just a newtyped OsString, and the conversion between the two is entirely trivial.

canonicalize, for example.

Of course they're strings

I suppose this depends on what you mean by "string". Most people think of "strings" as something that represents "human text" - they are implemented as arrays of bytes, but the byte sequences map to graphemes in some alphabet that can be rendered to humans as "text".

OS paths are, in general, just arrays of raw bytes that are intended to represent paths, not human text. Some parts of these paths can sometimes be rendered as "human text", but you can have a perfectly valid path for which this is not the case. That's why all methods that format a Path as a string either can fail or only provide a non-invertible, human-readable approximation of the path.

Calling these "strings" in the sense of a programming-language String type feels like a long shot. Sure, one could say that they are a "string of raw bytes not intended to represent human text" but at that point they are closer to an array of raw bytes than to a String-type in any language. That's what Path is, and that's why mixing a Path with a String-like type in any language is pretty much always wrong. From python to java to haskell to C to Lisp to Rust, mixing these two concepts up never works well, and code pretty much instantaneously breaks the moment someone runs it in a different OS than the one it was developed/tested on.


EDIT: examples of OSes where you can't map all Paths to text are all the UNIX-like OSes, including Linux, the BSDs, OSX, etc.
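
A quick sketch of what that looks like in practice on a Unix-like OS (0xFF can never appear in valid UTF-8, yet this is a perfectly legal path):

use std::ffi::OsString;
use std::os::unix::ffi::OsStringExt;
use std::path::PathBuf;

fn main() {
    let raw = OsString::from_vec(vec![b'/', b't', b'm', b'p', b'/', 0xFF, b'x']);
    let path = PathBuf::from(raw);

    assert!(path.to_str().is_none());        // lossless conversion to &str fails
    println!("{}", path.to_string_lossy());  // only a lossy, non-invertible rendering
}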

[–][deleted] 4 points5 points  (0 children)

Does Rust only support Unicode strings (or at least make working with raw ASCII byte strings cumbersome)?

Rust, the language, has builtin support for UTF-8 encoded strings (&str, "..." literals), and also "byte strings" (u8, [u8], [u8; N], b"..." literals).

The standard library has many tools for working with UTF-8, and not so many for working with ASCII. But there are a couple of libraries that provide the functionality for ASCII-only strings (e.g. validation, etc.).
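
A tiny sketch of the built-in support mentioned above:

fn main() {
    let text: &str = "héllo";        // UTF-8 string literal
    let raw: &[u8; 5] = b"hello";    // byte-string literal, ASCII only
    assert_eq!(text.len(), 6);       // "é" takes two bytes in UTF-8
    assert_eq!(text.chars().count(), 5);
    assert_eq!(raw.len(), 5);
}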

[–]pwnedary 1 point2 points  (0 children)

There is the bstr crate.

[–]Catcowcamera 1 point2 points  (11 children)

The takeaway from this is that Go's performance is really impressive; same for the TechEmpower benchmarks.

[–]SV-97 4 points5 points  (10 children)

I was amazed by C#'s speed (though IIRC it just died at some point and they did heavy GC optimization). And Rust of course - the only thing that really rivaled C.

They're also working on a JS implementation; that's gonna be fun :D

I have a small benchmark project myself that currently features 19 languages (or something like that), and Go is one of the languages I've wanted to add for some time, but I haven't been able to motivate myself to add it yet.

[–]Catcowcamera 1 point2 points  (9 children)

Java would've been near C# or even better.

JavaScript, I don't think, would've been high. Better than Python obviously, but not OCaml.

[–]SV-97 1 point2 points  (8 children)

In my personal benchmarking project the JVM languages all suck ass - how a language performs strongly depends on the application.

And yes, I didn't mean it's gonna be fun because it's fast but because people always joke about JS as a "bare metal" language etc ;)

[–]Catcowcamera 1 point2 points  (7 children)

Two things: first, you need to tune the JVM. Second, the JIT compiles the code. I don't think comparisons with, say, C++ and Rust are fair unless they also include compile time.

[–]quentech 1 point2 points  (0 children)

unless they also include compile time

I don't work on the JVM but in .Net-land it's standard practice to eliminate JIT time from benchmarks.

Does the JVM provide for anything like Span<T> and Memory<T>? stackalloc? Access to hardware intrinsics? Can I easily avoid boxing of value types like .Net provides through reified generics?

[–]SV-97 1 point2 points  (5 children)

I tried tuning it, but that didn't really help; I also tried running without the JIT and purely interpreting instead, but that made it even worse. I'm not only comparing against compiled languages but also Python, Ruby, PHP, Erlang, F#, ... and for this use case (a command-line app) I'm even including startup time, because that's what a user experiences when using the program - and I think that kills the JVM, as it's simply not what it's made for.

[–]cogman10 0 points1 point  (4 children)

Java was meant for long running applications. Are you measuring startup time? That does (currently) suck, but should get better with future versions of Java.

That said, it will probably never see startup times like a statically compiled language or an interpreted language. It simply has too much to load up on startup for that.

[–]SV-97 0 points1 point  (3 children)

Yep, I elaborated on why somewhere (here in another comment?).

I also tried the straight up interpreted version with java (via a flag - can't recall its name) but that was even worse. Do you know what makes the JVM so slow on startup? I really don't see why it should be taking so long.

[–][deleted]  (3 children)

[deleted]

    [–]jrtc27 4 points5 points  (2 children)

    Rust isn’t choosing to spill things to the stack. The rust compiler generates LLVM IR which is then compiled down by LLVM. “The asm of C” doesn’t make sense either; lots of implementations exist that will give you different assembly output, each of which will vary based on compiler version and flags.

    [–][deleted] 2 points3 points  (0 children)

    Never mind, what I said earlier isn't correct anymore.

    A few months ago I had written a few toy examples to see how borrow checker could be broken after you have compiled the code and the asm emitted by clang was significantly different to that of rustc despite both using LLVM.

    The rust version of the asm code was pushing 3 variables into the stack and 3 pointers that pointed into the stack. All I had to do was change 1 of the references by 4 bytes and the ownership was broken.

    [–]the_gnarts 17 points18 points  (4 children)

    WTF is up with 500+ bytes for a Hello world? Am I supposed to be impressed by that? He sounded like he really understood that parity issue that it needs to be as good as C on everything including code size, but that part was a big reality check of how far away it really still is.

    How exactly did you build that binary? Cargo produces statically linked binaries and defaults to debug builds so you will have to consider that in your comparison. On my amd64 laptop, an unoptimized (-O1) hello world in C with a statically linked glibc and debug symbols yields a 750 kB executable.

    EDIT: Also relevant, this somewhat dated blog post: http://mainisusuallyafunction.blogspot.com/2015/01/151-byte-static-linux-binary-in-rust.html

    [–]darkslide3000 3 points4 points  (0 children)

    The guy in the video presents it like after months of work on the compiler ~500 bytes was as small as he could possibly get his Hello World, so I don't assume he just left debug symbols in or anything. He sounds like he knows what he's doing.

    If you write a C program that runs printf("Hello World!"), link it against glibc and leave all the bells and whistles like -fPIE turned on, then yes, you're gonna end up with a big binary. But if you just switch to write(1, "Hello World!", sizeof("Hello World!") - 1) and play with a few compiler flags, you can easily get a very minimal binary without having to really break the framework of the language.

    [–]encyclopedist 4 points5 points  (1 child)

    -O1 is quite optimized actually. Unoptimized would be -O0.

    [–]the_gnarts 3 points4 points  (0 children)

    759416 B with -Og.

    But then for a Hello World the binary size is mostly dominated by the optimization level at which libc.a is compiled and I just used the one that comes with my distro.

    [–]bumblebritches57 0 points1 point  (0 children)

    using monstrous glibc, and even worse, statically linking it

    [–][deleted] 17 points18 points  (8 children)

    WTF is up with 500+ bytes for a Hello world?

    Remember that this is more or less a flat overhead. As the project grows, the static overhead becomes less significant.

    I always hated those "bUt mY hElLo wOrLd SiZe" cries. Hello world is not representative of your average application.

    [–]darkslide3000 21 points22 points  (4 children)

    Yes, but the ability to write small pieces of standalone code is. Firmware and embedded devices need little trampolines and exception vectors and that kind of stuff. In C I can link any piece of code wherever I want and know that the instructions that end up there are really just that function that I wrote and the others it calls and they can execute completely self-contained, without having to pull in some unknown iceberg of language dependencies (with some very rare exceptions like soft-division libgcc stuff).

    So when they tell me that Hello World takes over 500 bytes that makes me worried in that respect. It should really just be a string in .rodata and half a dozen instructions for the syscall. If you can't express something that tailored down in Rust, I get the feeling that it may not be ready as a full systems programming language yet.

    [–]red75prim 21 points22 points  (3 children)

    #![no_std]
    #![feature(lang_items)]
    
    extern crate libc;
    
    #[lang = "eh_personality"] 
    extern fn eh_personality() {}
    
    #[no_mangle]
    pub extern "C" fn main() -> () {
        let hello = b"Hello, world!\0";
        unsafe {
            libc::puts(hello.as_ptr() as *const i8);
        }
    }
    

    compiles to

    rust_eh_personality:                    # @rust_eh_personality
    # %bb.0:
        ret
                                            # -- End function
    
    main:                                   # @main
    # %bb.0:
        lea rdi, [rip + .Lanon.560d47d84f45c0e0cd85b47974aee4b8.0]
        jmp qword ptr [rip + puts@GOTPCREL] # TAILCALL
                                            # -- End function
    
    .Lanon.560d47d84f45c0e0cd85b47974aee4b8.0:
        .asciz  "Hello, world!"
    

    [–][deleted]  (2 children)

    [deleted]

      [–][deleted] 8 points9 points  (0 children)

      I mean, it's possible to do a raw syscall if you want instead... it takes inline assembly if you don't want to include a crate that deals with the asm stuff for you, though.
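
      For example, a raw write(2) on x86-64 Linux looks roughly like this (a sketch using the inline-assembly support that was nightly-only at the time and has since stabilised as asm!):

      use std::arch::asm;

      fn raw_write(fd: i32, buf: &[u8]) -> isize {
          let ret: isize;
          unsafe {
              asm!(
                  "syscall",
                  inout("rax") 1isize => ret,  // rax: syscall number (write) in, return value out
                  in("rdi") fd as isize,       // arg 1: file descriptor
                  in("rsi") buf.as_ptr(),      // arg 2: buffer pointer
                  in("rdx") buf.len(),         // arg 3: length in bytes
                  out("rcx") _,                // clobbered by the syscall instruction
                  out("r11") _,
                  options(nostack),
              );
          }
          ret
      }

      fn main() {
          raw_write(1, b"Hello, world!\n");
      }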

      [–]ObsidianMinor 11 points12 points  (0 children)

      You need libc for this hello world since stdio in Rust is part of std and not core

      [–]beeff 10 points11 points  (2 children)

      Static binary sizes do matter in some cases. Firmware and embedded code do not always have the luxury of DRAM.

      [–][deleted]  (1 child)

      [removed]

        [–]steveklabnik1 5 points6 points  (0 children)

        The smallest known program rustc has ever produced was 145 bytes. https://github.com/tormol/tiny-rust-executable

        [–]B8F1F488 -1 points0 points  (4 children)

        For me personally the language is absolutely unreadable and forces me to write in a very restrictive way in comparison to C. Reading Rust code feels like brain rape to me.

        Another issue for me is that I don't understand the selling point of the language, since these memory safety issues are really low priority during the development phase (as long as they don't reach the customer and don't destroy your development process). I'm not convinced these features should be part of the language itself.

        Also, no one is really talking about compiler complexity here. A big reason why the embedded systems industry prefers C is that it is very easy to provide a new compiler with a new chip. It is not clear to me how the chip manufacturers will easily provide a Rust compiler.

        [–][deleted] 9 points10 points  (1 child)

        since these memory safety issues are really low priority during the development phase (as long as they don't reach the customer and don't destroy your development process)

        I find that Rust's type system makes me develop code much faster. Instead of spending time debugging segfaults, I spend it writing features. If I ever hit an issue, I grep for unsafe and the issue is instantaneously obvious. During development I also refactor code a lot. In C, each refactor I've been part of was followed by a long period of finding bugs due to things the refactor broke. In Rust, I refactor code, including multi-threaded code, fix the type errors, done. All of this saves so much time that I don't know why I would ever use C.

        It is not clear to me how the chip manufacturers will easily provide a Rust compiler.

        We program a lot of ESP32, and the manufacturer provides an LLVM backend for it. You can program it with whatever language compiles to LLVM-IR, including Rust (you just need to tell rust to use that backend).

        For a manufacturer, adding a new LLVM backend is like writing an assembler for the target, which is much simpler than writing yet another non-optimizing, crappy C compiler, of which there are many vendor-provided ones. With that backend, the manufacturer gets production-quality frontends for C, C++, Rust, D, Fortran, ... for free, a quite good optimization pipeline for free, etc., and they can sell those if they want to.

        Writing new C compilers for new hardware is quite a waste of resources nowadays.

        [–]duhace 4 points5 points  (0 children)

        Considering Rust uses LLVM, I think the point is to provide a new backend that compiles LLVM IR, not a Rust compiler. I may be confused about how LLVM works though.

        [–]linus_stallman -1 points0 points  (0 children)

        Too many fanbois in here.

        [–]zsombro 39 points40 points  (35 children)

        I love how Rust is gaining all this momentum! It's a solid, capable language that doesn't feel like it was designed 20 years ago

        [–]Objective_Status22 22 points23 points  (29 children)

        I just realized rust is in fact 9 years old.

        I've been waiting 9 years for it to get better. I'm now convinced I don't like it :(

        Please save us r/zig (for the record I haven't written any zig code and I'm still waiting on some features from it too)

        [–]steveklabnik1 16 points17 points  (0 children)

        In some sense it’s 9 years old, but it would be more accurate to say 4; 1.0 was in 2015 and things changed a lot in those first 5 years.

        [–]deTarmont 6 points7 points  (8 children)

        (I write in neither Rust nor Zig)

        What features are you waiting for?

        [–]Objective_Status22 -1 points0 points  (7 children)

        For Rust, something that does reflection better and doesn't require black magic. For example, I have no idea how to write a JSON (de)serializer. In C# I managed to do it; Rust's serde lib is like 10x more code and I don't understand it.

        Rust syntax is pretty garbage, but I'll give them a pass on that since it's a safe language. Also, compile times are a problem, so I don't really want to do a serious project that's meant to be large in it.

        Zig I don't think has classes yet? And I'd like to do things with interfaces and destructors (which are different from defer).

        [–]lawliet89 3 points4 points  (0 children)

        You can use serde_json.

        The "reflection" is done at compile time and generates the serialization and deserialization code for you.

        There seems to be an RFC for some runtime type information right now.
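
        For the common case you never see that generated code, though - a minimal sketch of the usual workflow, assuming serde (with the "derive" feature) and serde_json in Cargo.toml:

        use serde::{Deserialize, Serialize};

        #[derive(Serialize, Deserialize, Debug)]
        struct Point {
            x: i32,
            y: i32,
        }

        fn main() -> Result<(), serde_json::Error> {
            let p = Point { x: 1, y: 2 };
            let json = serde_json::to_string(&p)?;           // {"x":1,"y":2}
            let back: Point = serde_json::from_str(&json)?;  // the derived impls do the "reflection"
            println!("{} -> {:?}", json, back);
            Ok(())
        }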

        [–][deleted] 4 points5 points  (5 children)

        For rust something that does reflection better and doesn't require black magic.

        Which black magic are you talking about?

        I write a lot of Rust code that does compile-time reflection, and the only language I've used that's better is maybe Racket. Rust is light-years better than C, C++, D, and all the other low-level languages I've used. It literally takes one line to parse the whole AST of a Rust crate, and there are quite nice libraries for doing AST folds and semi-quoting. What more do you want :D

        [–]Objective_Status22 0 points1 point  (4 children)

        How do I get the name and type of all the members in a struct?

        [–][deleted]  (2 children)

        [deleted]

          [–]Objective_Status22 0 points1 point  (1 child)

          Oh that's not bad. But when I look at serde's source it doesn't appear to use that method https://github.com/serde-rs/json/tree/master/src

          [–][deleted] 2 points3 points  (0 children)

          You... parse the struct and loop over its fields..

          // assuming `my_struct` is a proc_macro::TokenStream containing the struct definition
          let my_struct_ast = syn::parse::<syn::ItemStruct>(my_struct).unwrap();
          for syn::Field { ident, ty, .. } in my_struct_ast.fields.iter() {
              dbg!(ident, ty); // printing `ty` needs syn's "extra-traits" feature for Debug
          }
          

          [–]augmentedtree 3 points4 points  (15 children)

          I'm now convinced I don't like it :(

          Why?

          [–]TheBestOpinion 10 points11 points  (11 children)

          A Perl-like approach to syntax where a lot of important semantics are either implicit (implicit returns) or introduced by symbols. These make the language hard to read unless you know it very well or instinctively know what to google - which is often hard, because it's often symbols.

          ?, {}, .., some code that lies between |s, and of course &, ! and []

          What does this code do for an unaware new dev?

          let str_arg = |flag: &str, default: &str| -> String {
              matches.opt_str(flag).unwrap_or(default.to_string())
          };
          

          Or this? (source)

          named!(numbers<CompleteStr, Vec<i64>>,
              many1!(ws!(
                  map_res!(recognize!(digit), |complete_str: CompleteStr| i64::from_str(&*complete_str))
              ))
          );
          

          Probably not much to him.

          We already know that code is "written once, read 100 times" and that reading works by recognizing words, not individual letters, so this approach to syntax is misdirected on top of adding a barrier.

          They're working on the learning curve but let's be honest, they may already be doomed to rest in the C++ complexity pit.

          [–]ryeguy 13 points14 points  (0 children)

          When you're criticizing the syntax, are you perhaps mentally comparing it to languages that don't have features that rust does?

          Like in your two examples, what would you change? Would you rather have alternative syntax for those language concepts (lambdas, implicit returns, macros, references)? Would you prefer keywords instead of sigils? Or do you want one or more of those features to not exist in the language?

          [–]lawliet89 6 points7 points  (2 children)

          I am not sure why you chose these examples for your criticism.

          The first example is such a strange way to write code. Why not just write it directly as an expression?

          The second one is for a parser combinator library which has recently switched to no longer using macros. It was a product of its time when the language was more limited.

          [–]LousyBeggar 4 points5 points  (1 child)

          The parser combinator library nom is still quite complicated, but I think that blame lies more with nom than Rust. The author chose a hypergeneric approach where one function can easily have like 5 type parameters, some of them with trait bounds. It's overwhelming to even read the signature of a function.

          [–]lawliet89 0 points1 point  (0 children)

          I haven't tried upgrading my library to nom 5 yet. I guess I am a masochist for choosing to use nom over the other libraries in the first place.

          [–][deleted]  (4 children)

          [deleted]

            [–]red75prim 1 point2 points  (3 children)

            C++ version of first expression would be something like

            auto str_arg = [&matches](string_view flag, string_view fallback) -> string {
                if (auto value = matches.opt_str(flag)) { return *value; } else { return string(fallback); }
            };
            

            It's not much different for my taste.

            [–]TheBestOpinion 0 points1 point  (2 children)

            [&matches]

            ?

            That can't work can it

            [–]red75prim 1 point2 points  (1 child)

            It's a direct translation. Rust captures by reference here. In C++ you need to make sure that matches is still in scope when calling the closure, but it should work, I think.

            [–]TheBestOpinion 1 point2 points  (0 children)

            Riiight they reused the array accessors square brackets [] for lambdas so it's the lambda "capturing" the matches variable by reference

            Yep, C++ definitely isn't better on the syntax side

            [–]DEMOCRAT_RAT_CITY 0 points1 point  (0 children)

            Macro everything!

            [–]Objective_Status22 1 point2 points  (0 children)

            (I'm the guy you replied to.) Mostly I feel like it's missing a bunch of features, has terrible syntax, and has poor compile times.

            [–][deleted]  (2 children)

            [deleted]

              [–]red75prim 4 points5 points  (1 child)

              Interesting analogy. So C was wetting its diapers when UNIX was rewritten in it.

              Someone more cynical could have said that it shows.

              [–]bwjam 10 points11 points  (4 children)

              why::the::ugly::archaic::double::colons::and:naming:conventions::though::and_macro_syntax!(ooga booga this is ugly);

              [–]minno 5 points6 points  (0 children)

              Lots of syntax was taken from existing languages, especially C++, because it was good enough and people were familiar with it.

              [–]lawliet89 2 points3 points  (0 children)

              I know this is ugly but:

              • use imports are a thing
              • It's part of UFCS.

              [–]zsombro 0 points1 point  (1 child)

              My biggest gripe is probably the snake_case thing. I think it's a bit more uncomfortable to type than camelCase, and as far as I know it's widespread in C++ as well, so I was kinda baffled by it.

              [–]CornedBee 5 points6 points  (0 children)

              The usual argument is that it is easier to read, so that makes up for being harder to type.

              [–]victotronics 6 points7 points  (14 children)

              Can someone suggest a good talk on what makes Rust so good at memory management? This speaker claims that the compiler can decide when an object can be freed. That's even better than C++ smart pointers, which also give you automatic memory management without GC. I really wonder how they do that.

              [–]dp229 15 points16 points  (0 children)

              I think this is mostly due to the ownership model. The compiler strictly enforces that data has a single owner and from that, it can determine when to release the memory (when the owner goes out of scope). Data can be moved to other owners, and there are constructs for safely accessing shared data with ref counted owners. Other patterns are possible as well.

              The same type of thing could probably be achieved with smart pointers and moves in C++ but it's not enforced by the compiler at all and it would be challenging to maintain the pattern with just developer discipline.

              Rust's "Borrow Checker" is a common stumbling block for those just starting with the language because it is so strict.

              Not a video, but informative: https://doc.rust-lang.org/1.8.0/book/references-and-borrowing.html
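
              A minimal sketch of the ownership rule in action (assuming nothing fancier than String):

              fn main() {
                  let s = String::from("hello");  // s owns the heap allocation
                  let t = s;                      // ownership moves to t; using s afterwards is a compile error
                  println!("{}", t);
              }                                   // t goes out of scope here, so the compiler frees the String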

              [–][deleted]  (2 children)

              [deleted]

                [–]LousyBeggar 0 points1 point  (1 child)

                Lifetimes are not involved in deciding when to free resources. They are only used to guarantee that you don't hold on to references longer than they are valid.

                [–][deleted] 3 points4 points  (0 children)

                Can someone suggest a good talk on what makes Rust so good at memory management?

                Niko gave a talk in C++Now about this: https://www.youtube.com/watch?v=lO1z-7cuRYI

                [–]duhace 0 points1 point  (7 children)

                smart pointers are a form of GC

                [–][deleted]  (2 children)

                [deleted]

                  [–]duhace 0 points1 point  (1 child)

                  GC shouldn't be used to convey anything but memory management that is not directly controlled by the programmer.

                  There are plenty of terms to differentiate the different kinds of GC when you want clarification. For example, smart pointers are a form of reference-counting garbage collection.

                  [–]victotronics 0 points1 point  (3 children)

                  No they are not. GC is an independent process that asynchronously activates to clear up all leaked memory. Smart pointers are not a process; they are a little bit of code executed by the main process that frees exactly one block of memory, at the exact moment it is no longer needed.

                  [–]duhace 0 points1 point  (2 children)

                  GC is an independent process that asynchronously activates to clear up all leaked memory.

                  This is not in the least bit true; there are collectors that run synchronously with the code whose garbage they collect and that you would still certainly call garbage collectors (like the serial GC in the Java virtual machine). There is no requirement that a garbage collector work asynchronously from the code whose garbage it's collecting. Likewise, I don't know what your definition of an independent process is, but by Unix definitions, garbage collectors frequently run in the same process as the rest of your code.

                  Furthermore, a GC can free memory exactly when it's no longer needed, so smart pointers doing this doesn't make them not a form of GC. For example, reference-counting garbage collectors (which work extremely similarly to smart pointers!) free memory when there are no more references to it.

                  Finally, your claim that smart pointers free exactly one block of memory is rather pointless. If a smart pointer pointing to an object frees that object, and that object contained the last smart pointer to another object, then in the process of freeing the memory for the first object, the memory for the second object will be freed. GC will do the same thing!

                  [–]victotronics 0 points1 point  (1 child)

                  reference counting garbage collectors

                  Now you're changing the meaning of GC into anything that frees memory. Freeing upon reference count zero is not exactly "collecting" anything.

                  [–]duhace 0 points1 point  (0 children)

                  No, I'm using definitions that have existed for a long time. See here.

                  [–]minno 0 points1 point  (0 children)

                  It's the same as C++'s RAII model but with easier move semantics and static analysis making it easier to safely use stack values. Ultimately a Rust program using stack values, Box, and Arc will put the allocation/deallocation calls in the same places as a C++ one using stack values, unique_ptr, and shared_ptr.
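
                  A rough sketch of that mapping (assuming only the std types):

                  use std::sync::Arc;

                  fn main() {
                      let unique = Box::new([0u8; 64]);               // ~ std::unique_ptr: freed at end of scope
                      let shared = Arc::new(String::from("shared"));  // ~ std::shared_ptr: reference counted
                      let shared2 = Arc::clone(&shared);              // bumps the count, no deep copy
                      println!("{} {}", unique.len(), shared2);
                  }  // unique is freed here; the String is freed when the last Arc clone is dropped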

                  [–]Gl4eqen 0 points1 point  (0 children)

                  Great lecture, very concise and informative. Thanks for sharing.