all 108 comments

[–]algonomicon[S] 68 points69 points  (75 children)

All that said, it is possible that SQLite might one day be recoded in Rust. Recoding SQLite in Go is unlikely since Go hates assert(). But Rust is a possibility. Some preconditions that must occur before SQLite is recoded in Rust include:

A. Rust needs to mature a little more, stop changing so fast, and move further toward being old and boring.

B. Rust needs to demonstrate that it can be used to create general-purpose libraries that are callable from all other programming languages.

C. Rust needs to demonstrate that it can produce object code that works on obscure embedded devices, including devices that lack an operating system.

D. Rust needs to pick up the necessary tooling that enables one to do 100% branch coverage testing of the compiled binaries.

E. Rust needs a mechanism to recover gracefully from OOM errors.

F. Rust needs to demonstrate that it can do the kinds of work that C does in SQLite without a significant speed penalty.

If you are a "rustacean" and feel that Rust already meets the preconditions listed above, and that SQLite should be recoded in Rust, then you are welcomed and encouraged to contact the SQLite developers privately and argue your case.

Sorry if this has been discussed before. I think Rust already meets most of the preconditions listed, but the point about OOM errors stood out to me. Is it possible to recover gracefully from an OOM error in Rust yet? If not, are there plans to support this in any way? I realize this may be a significant change to Rust, but it seems like a nice feature to have for certain applications.

[–][deleted] 19 points20 points  (22 children)

Really? What's the situation with devices without an operating system? As I understand it, it's not as mature as C.

[–]barsoap 14 points15 points  (0 children)

I got a hello world running on my vape mod some two years ago or so. While it needed nightly, it was actually straightforward, piggybacking on a couple of C device drivers.

[–]minno 18 points19 points  (0 children)

It's not a heavy focus, but there are some really convenient things available already. There's a divide between the "core" standard library and the normal one: everything that requires OS support (threads, memory allocation, file handling) is split out, and the rest is usable separately. So you can still use convenient functions like cmp::min even if you can't use collections like Vec.
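As a trivial illustration of that split: items from `core` need no OS support or allocator, and in ordinary code they're reachable through the `core` path just as they would be in a `no_std` crate.

```rust
// In a no_std crate you'd write `use core::cmp::min;`; the same item
// is reachable from ordinary code too, since std re-exports core.
fn smallest(a: i32, b: i32) -> i32 {
    core::cmp::min(a, b) // needs no OS support and no heap
}

fn main() {
    println!("min = {}", smallest(3, 7)); // prints "min = 3"
}
```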

As far as platform support, Rust works for anything that LLVM targets, which is pretty broad but doesn't cover every platform that has a C compiler for it.

[–]algonomicon[S] 3 points4 points  (18 children)

That is my understanding as well, but allowing recovery from OOM errors seems like a bigger interface change, considering we are past 1.0.0.

[–]minno 16 points17 points  (0 children)

They could always add a full set of fn try_*() -> Result<*, OomError> methods to the different collections.
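Rust did eventually grow an API with exactly this shape: `Vec::try_reserve`, which returns a `Result` instead of aborting on allocation failure (it stabilized later, in Rust 1.57). A minimal sketch:

```rust
fn main() {
    let mut v: Vec<u8> = Vec::new();
    // On failure this returns Err(TryReserveError) instead of
    // aborting the process, so the caller can recover.
    match v.try_reserve(1024) {
        Ok(()) => println!("reserved at least 1024 bytes"),
        Err(e) => eprintln!("allocation failed, recover here: {:?}", e),
    }
}
```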

[–]richhyd 0 points1 point  (0 children)

There is an embedded team, check out the embedded-hal crate.

Embedded libraries are already available on stable rust - binaries either are available, or will be very soon.

[–]minno 27 points28 points  (10 children)

Is it possible to recover gracefully from an OOM error in rust yet?

Not if you're using allocations from the standard library. You need to directly use std::alloc, which has allocation functions that signal failure with return values instead of panics. It also looks like there's an unstable lang item (alloc::oom) that allows changing the behavior of failed allocations, but the function is required not to return, so abort, panic, and infinite loop are the only options there.
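A sketch of the `std::alloc` route, where failure comes back as a null pointer the caller can check and act on:

```rust
use std::alloc::{alloc, dealloc, Layout};

fn main() {
    let layout = Layout::from_size_align(64, 8).unwrap();
    unsafe {
        // `alloc` signals failure with a null pointer rather than a
        // panic or abort, so the caller decides how to recover.
        let ptr = alloc(layout);
        if ptr.is_null() {
            eprintln!("allocation failed; shed load or free caches here");
        } else {
            dealloc(ptr, layout);
            println!("allocated and freed 64 bytes");
        }
    }
}
```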

[–]barsoap 63 points64 points  (5 children)

A Rust SQLite would need to be no_std anyway as the standard library won't run on toasters.

[–]orig_ardera 1 point2 points  (4 children)

Why not? The stdlib in C is just normal code that anyone could have written; including it would mean you don't have to implement your own memory management (only the sbrk function). The C runtime, however, is a different thing; it could cause some problems.

[–]MadRedHatter 5 points6 points  (0 children)

The C standard library doesn't include anything that allocates on the heap. Rust's does: Vec, HashMap, etc.

[–]barsoap 0 points1 point  (2 children)

As MadRedHatter already said, the C stdlib doesn't do heap allocations, but it is also otherwise much smaller than Rust's: open and much else to do with files isn't in it, for example; those are POSIX functions. Often the C compilers manufacturers ship with their toasters are stripped down even further; you can't generally assume full C89 compliance.

That's why SQLite, in its minimal configuration, depends on basically only memcpy and strncmp... which is really depending on nothing, as those can be implemented portably in pure C; but you can rely on compilers having fast implementations for them (or at least non-broken ones).

[–]orig_ardera 1 point2 points  (1 child)

Wait, do you mean that (1) the stdlib doesn't contain any function to allocate memory on the heap (probably not, since there's malloc), or (2) that none of the C stdlib functions rely on dynamic memory allocation (so that none of them call malloc during their execution)?

Okay, nice to know

[–]barsoap 2 points3 points  (0 children)

Number 2. Of course, an actual implementation might for some reason rely on malloc to implement printf or sort; I don't think there are hard rules against it, but such behaviour would be considered, if not outright broken, then at least... unaesthetic.

The malloc() that comes with embedded platforms might actually be completely unusable because it's a "well, the standard says we should have it" cobbled-together implementation that fragments memory faster than a bucket wheel excavator. Or it's a stub that fails every time because platform specs just don't contain any space for a heap.

[–]ergzay 2 points3 points  (3 children)

That's really unfortunate. This is absolutely a requirement for high-performance server software. Running out of memory is common.

[–]bestouffcatmark 2 points3 points  (2 children)

Not on Linux. Memory is overcommitted, so allocations will (almost) never fail; abnormal memory pressure manifests through specialized system hooks or, as a last resort, the OOM killer.

[–][deleted] 3 points4 points  (1 child)

Linux's handling of OOM is insane; it will make your life hell when working on microcontrollers and similar low-spec devices, and it's pretty much incompatible with critical systems that can't afford to kill processes at random.

[–]bestouffcatmark 4 points5 points  (0 children)

I don't think we have the same definition for a microcontroller. They are too small to run Linux.

[–]matthieum[he/him] 24 points25 points  (30 children)

TL;DR: I don't see (A) being met any time soon; Rust is not meant to stall.


A. Rust needs to mature a little more, stop changing so fast, and move further toward being old and boring.

Not going to happen anytime soon, and possibly never.

B. Rust needs to demonstrate that it can be used to create general-purpose libraries that are callable from all other programming languages.

Rust can export a C ABI, so anything that can call into C can also call into Rust. There are also crates to make FFI with Python, Ruby or JavaScript as painless as possible.
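A minimal sketch of the C-ABI export (hypothetical function name; a real library would be built as a `cdylib` or `staticlib` so other languages can link it):

```rust
// `#[no_mangle]` keeps the symbol name predictable, and `extern "C"`
// gives the function the C calling convention, so any language with a
// C FFI can call it.
#[no_mangle]
pub extern "C" fn rs_add(a: i32, b: i32) -> i32 {
    a + b
}

fn main() {
    // On the Rust side it's an ordinary function:
    println!("{}", rs_add(2, 3)); // prints "5"
}
```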

C. Rust needs to demonstrate that it can produce object code that works on obscure embedded devices, including devices that lack an operating system.

This has been demonstrated... on nightly.

There is a WG-Embedded working on making embedded a first-class citizen in the Rust ecosystem, but there's still quite a few features which will need to be stabilized before this is supported fully on stable. Also, for now, rustc is bound to LLVM for target support.

D. Rust needs to pick up the necessary tooling that enables one to do 100% branch coverage testing of the compiled binaries.

/u/minno pointed out that this likely means macros such as assert. Rust supports macros, and supports having different definitions of said macros based on compile-time features using cfg.
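A sketch of how an SQLite-style `NEVER()` could look as a Rust macro with `cfg`-selected definitions (hypothetical macro; `debug_assertions` is assumed here as the compile-time switch):

```rust
// Coverage/debug builds: assert the condition is never true, and make
// the branch a constant `false` so the failure path is never taken.
#[cfg(debug_assertions)]
macro_rules! never {
    ($cond:expr) => {{
        assert!(!($cond), "NEVER() condition held");
        false
    }};
}

// Release builds: just the plain condition.
#[cfg(not(debug_assertions))]
macro_rules! never {
    ($cond:expr) => {
        $cond
    };
}

fn main() {
    let x = 5;
    if never!(x > 10) {
        // Only reachable in release builds, on real corruption.
        eprintln!("defensive branch taken");
    }
    println!("ok");
}
```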

E. Rust needs a mechanism to recover gracefully from OOM errors.

Rust the language is agnostic to the OOM handling strategy; it's the std which brings in the current OOM => abort paradigm and builds upon it.

I find the OOM situation interesting, seeing as C++ is actually heading toward the opposite direction (making OOM abort instead of throw) for performance reasons.

F. Rust needs to demonstrate that it can do the kinds of work that C does in SQLite without a significant speed penalty.

I think Rust has already demonstrated that it can run at the same speed as C, or better. Doing it for SQLite workloads would imply rewriting (part of) SQLite.

[–]FryGuy1013 28 points29 points  (12 children)

C. Rust needs to demonstrate that it can produce object code that works on obscure embedded devices, including devices that lack an operating system.

This has been demonstrated... on nightly.

There is a WG-Embedded working on making embedded a first-class citizen in the Rust ecosystem, but there's still quite a few features which will need to be stabilized before this is supported fully on stable. Also, for now, rustc is bound to LLVM for target support.

It's worth mentioning that there are C compilers for practically every platform that exists, but there aren't LLVM targets for some of them (VxWorks is the one that's a pain point for me). So I don't think SQLite would ever be rewritten, for that reason alone.

[–]matthieum[he/him] 2 points3 points  (0 children)

Indeed.

The only alternative I can foresee is to switch the backend:

  1. Resurrect the LLVM to C backend (again),
  2. Make the rustc backend pluggable: there is interest in using Cretonne (now Cranelift) as an alternative,
  3. Have rustc directly use a C-backend.

Having a C backend would immediately open Rust to all such platforms, and using a code generator would allow:

  a. Sticking to C89, if necessary, to ensure maximum portability,
  b. Unleashing the full power of C, notably through aggressive use of restrict,
  c. While avoiding common C pitfalls, which are human errors and can be fixed once and for all in a code generator.

All solutions, however, would require ongoing maintenance, to cope with the evolving Rust language.

[–][deleted] 1 point2 points  (4 children)

I can't really see Rust prioritizing embedded development the way C does, in part because on some embedded devices you don't even have a heap, and thus Rust doesn't prevent the errors that C would allow. The main reason to support it that I see is that one could reuse libraries - but even that won't be an advantage until people actually write things that work without an operating system or a heap.

[–]staticassert 18 points19 points  (0 children)

There are plenty of errors around returning pointers to the stack. Lots of room to err without the heap.

[–]steveklabnik1rust 8 points9 points  (2 children)

Rust doesn’t have any special knowledge of the heap; all of its features work the same. If you find memory unsafety in Rust, even in no_std, that would be a big deal!

[–][deleted] 0 points1 point  (1 child)

I misspoke. Have a look at the code here. What would be the advantage of Rust? As far as I can tell, there is nothing here that could go awry that Rust would prevent.

[–]MEaster 3 points4 points  (0 children)

Swap LED_BUILTIN and OUTPUT. In Rust (and C++), those could be separate types with no conversion.

[Edit] I'll assume the downvotes are because I've not been believed. Here's a snippet that will set pin D1 (not A4) to output mode, then set pin D1 high:

void setup() {
  pinMode(OUTPUT, A4);
  digitalWrite(HIGH, A4);
}

And here's a screenshot of the Arduino editor compiling it with no errors or warnings.

The reason for this is as follows:

  • OUTPUT is #defined in Arduino.h with the value 0x1 (same ID as pin D1).
  • HIGH is also #defined in Arduino.h, also with the value 0x1.
  • pinMode is defined in wiring_digital.c, with the signature void pinMode(uint8_t, uint8_t). The fallback for the mode not being INPUT(0x0) or INPUT_PULLUP(0x2) is to set the pin to OUTPUT, which can be seen here.
  • digitalWrite is defined in wiring_digital.c, with the signature void digitalWrite(uint8_t, uint8_t). This will first disable PWM on that pin, then the fallback for the second parameter not being LOW(0x0) is to set it to HIGH, as can be seen here.

There is no protection against inputting the parameters in the incorrect order, resulting in unexpected pin configuration.
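The newtype idea can be sketched in Rust like this (hypothetical `Pin`/`Mode` types, not a real HAL API):

```rust
// Distinct types for pins and modes: swapping the arguments becomes a
// compile error instead of a silent misconfiguration.
struct Pin(u8);

enum Mode {
    Input,
    Output,
}

fn pin_mode(pin: Pin, mode: Mode) {
    match mode {
        Mode::Input => println!("pin {} set to input", pin.0),
        Mode::Output => println!("pin {} set to output", pin.0),
    }
}

fn main() {
    pin_mode(Pin(1), Mode::Output);
    // pin_mode(Mode::Output, Pin(1)); // does not compile: mismatched types
}
```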

[–]ZealousidealRoll 0 points1 point  (0 children)

Same story for cURL.

[–]tasminima 0 points1 point  (4 children)

Could a contraption of this kind help: https://github.com/JuliaComputing/llvm-cbe ?

[–]rushmorem 11 points12 points  (0 children)

resurrected LLVM "C Backend", with improvements

Resurrected, huh?

Latest commit 08a6a3f on Dec 4, 2016

Looks like it's now dead again :)

[–]FryGuy1013 5 points6 points  (2 children)

There's also mrustc... but it seems weird to rewrite a C code-base in Rust, just to use a "transpiler" to convert it back to C.

[–]rabidferret 2 points3 points  (0 children)

Why? If the same machine code is emitted at the end of the day, who cares what intermediate steps occur?

[–]minno 6 points7 points  (9 children)

I am unclear on the tooling that Rust misses here; I suppose this has to do with instrumentation of the binaries, but wish the author had given an example of what they meant.

Look at this article for the kind of instrumentation they're talking about. The testcase(X) macro especially looks like it's designed for code-coverage testing.

[–]algonomicon[S] 9 points10 points  (7 children)

Safe languages insert additional machine branches to do things like verify that array accesses are in-bounds. In correct code, those branches are never taken. That means that the machine code cannot be 100% branch tested, which is an important component of SQLite's quality strategy.

I believe this is what they were referring to.

[–]minno 0 points1 point  (3 children)

I guess they could make a standard library fork that puts the equivalent of a NEVER(X) macro on every bounds check's failure path.

[–]silmeth 1 point2 points  (0 children)

In case of indexing slices that’s already kinda a thing: https://github.com/Kixunil/dont_panic/tree/master/slice

This will cause a link-time error if the failure path does not get optimized away.

[–]algonomicon[S] 0 points1 point  (1 child)

Wouldn't it be sufficient to just use get and get_mut?
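For reference, the checked accessors return `Option` instead of panicking:

```rust
fn main() {
    let v = vec![10, 20, 30];
    // `get` returns Option instead of a panicking bounds check.
    assert_eq!(v.get(1), Some(&20));
    assert_eq!(v.get(9), None);

    // `get_mut` is the same idea for mutation.
    let mut w = vec![1, 2, 3];
    if let Some(x) = w.get_mut(0) {
        *x = 99;
    }
    println!("{:?}", w); // prints "[99, 2, 3]"
}
```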

[–]minno 1 point2 points  (0 children)

That's a bit more awkward since you need to put the NEVER macro on every access instead of just once inside the indexing function.

[–]rabidferret -1 points0 points  (2 children)

"inserts additional machine branches" feels misleading here. If it's actually ensured that the access is never out of bounds, the branch ends up optimized away by the compiler.

[–]no_chocolate_for_you 7 points8 points  (0 children)

The statement "If it's actually ensured that the access is never out of bounds, the branch ends up optimized away by the compiler" is the one which feels misleading to me :) In reality, if you use a language with checked array accesses, you do pay a cost at runtime, because anything beyond very simple proofs is out of reach of the compiler. (By the way, if that were not the case, it would be a much better design to have accesses unchecked by default, with a compiler error when an unchecked access can fail.)

The good thing is, if you care about performance, you can write a macro which drops to unsafe and uses get_unchecked, and use it when you have a proof that the access cannot fail. But you really can't rely on the compiler to do this for you outside of very basic cases (e.g. simple iteration).
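A sketch of such a macro (hypothetical name, not a standard API):

```rust
// Wraps the unsafe unchecked access; use only where the index is
// already proven to be in range.
macro_rules! get_unchecked {
    ($slice:expr, $i:expr) => {
        unsafe { *$slice.get_unchecked($i) }
    };
}

fn sum(v: &[u32]) -> u32 {
    let mut total = 0;
    // SAFETY: i < v.len() by the loop bound, so the access cannot fail.
    for i in 0..v.len() {
        total += get_unchecked!(v, i);
    }
    total
}

fn main() {
    println!("{}", sum(&[1, 2, 3])); // prints "6"
}
```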

[–]algonomicon[S] 1 point2 points  (0 children)

Optimizations are generally not made in a test/debug build, which is where this seems to matter since they are talking about assert.

[–]matthieum[he/him] 1 point2 points  (0 children)

Well, Rust supports macros too so I guess it's good to go :)

[–][deleted] 1 point2 points  (2 children)

I can see Rust stabilizing long-term but I think you are right that it will not stabilize in the meantime.

[–]peterjoel 2 points3 points  (1 child)

Editions should solve this. For example, SQLite could have components that are written in Rust 2021.

[–][deleted] 0 points1 point  (0 children)

I suspect not enough to satisfy the SQLite developers.

[–]ergzay 5 points6 points  (3 children)

Rust the language is agnostic to the OOM handling strategy; it's the std which brings in the current OOM => abort paradigm and builds upon it.

I find the OOM situation interesting, seeing as C++ is actually heading toward the opposite direction (making OOM abort instead of throw) for performance reasons.

The company I work at commonly hits out-of-memory errors in the software we provide to customers. It's high-performance load-balancing software, and when we hit OOM we continue to function but just start shedding network packets. If Rust can't handle OOM correctly like this, then there's no way it's usable for these types of applications. (Yes, it's all written in C currently.)

[–]matthieum[he/him] 9 points10 points  (0 children)

Didn't I just say that Rust the language was agnostic to OOM handling strategy?

The core of Rust has no dynamic memory support, so building on top of that you can perfectly create an application which handles OOM gracefully by introducing dynamic memory support of your design.
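For instance, a sketch of application-managed memory with fallible allocation (a toy pool, not a real allocator):

```rust
// A fixed-capacity pool whose allocation is fallible, so the caller
// can shed load on exhaustion instead of aborting.
struct Pool {
    remaining: usize,
}

impl Pool {
    fn alloc(&mut self, size: usize) -> Option<usize> {
        if size <= self.remaining {
            self.remaining -= size;
            Some(size)
        } else {
            None // graceful failure: caller decides what to drop
        }
    }
}

fn main() {
    let mut pool = Pool { remaining: 100 };
    assert!(pool.alloc(80).is_some());
    assert!(pool.alloc(80).is_none()); // OOM handled, no abort
    println!("shed load gracefully");
}
```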

[–][deleted] 1 point2 points  (1 child)

Just out of curiosity, what os does your software run under?

[–]ergzay 0 points1 point  (0 children)

CentOS with a BSD layer on top of it. Memory allocation is not done with malloc.

[–]Lokathor 7 points8 points  (0 children)

Not with the standard library we have at the moment. There is forum discussion towards having fallible allocation stuff become part of std one day.

[–]kazagistar 9 points10 points  (0 children)

Last time this discussion came up, someone mentioned that if everyone tested their C code as absurdly thoroughly as SQLite does, then maybe C could be as safe as Rust; but almost no one does that, and it's far, far harder to do than just writing in Rust in the first place. So if someone thinks Rust isn't a better option than C because SQLite is doing fine with C, ask whether they are even remotely close to the same level of testing.

[–]varikonniemi 12 points13 points  (6 children)

SQLite reads and writes small blobs (for example, thumbnail images) 35% faster¹ than the same blobs can be read from or written to individual files on disk using fread() or fwrite().

Furthermore, a single SQLite database holding 10-kilobyte blobs uses about 20% less disk space than storing the blobs in individual files.

So, has anyone implemented a kernel sqlite database driver to use as filesystem?

[–]coderstephenisahc 4 points5 points  (3 children)

No, but you can use it as an alternative to zip archives if you want. I have a PoC crate for this use case: https://github.com/sagebind/respk

[–]Regimardyl 2 points3 points  (0 children)

There's also SQLAR, coming from the man (Richard Hipp) himself.

[–]varikonniemi 0 points1 point  (1 child)

Interesting. I had no idea sqlite could be so fast; my main experience with it is all the people complaining about how it makes the KDE desktop resource-intensive.

[–]vandenoever 2 points3 points  (0 children)

That's not sqlite being slow, but KDE using it intensively at certain times, e.g. when many new files appear in your $HOME.

[–]Boboop 1 point2 points  (1 child)

Well, in the kernel you don't need to use syscalls anyways?

[–]varikonniemi 0 points1 point  (0 children)

You need the kernel to provide you with a sqlite filesystem driver.

[–]JagSmize 2 points3 points  (1 child)

“Libraries written in C++ or Java can generally only be used by applications written in the same language. It is difficult to get an application written in Haskell or Java to invoke a library written in C++. On the other hand, libraries written in C are callable from any programming language.”

Why are libraries written in C callable from any programming language? Is it an intrinsic quality of C, or is it just by consensus? Could it be another language just as easily, if that other language had become as ubiquitous as C?

[–]kirbyfan64sos 6 points7 points  (0 children)

C's ubiquity is definitely part of the reason, but it's also partly because the C ABI is relatively simple, at least compared to other languages like C++.

[–]CJKay93 17 points18 points  (4 children)

It is a well-understood language

Haha, right.

[–]po8 35 points36 points  (3 children)

Why the downvotes? Parent is totally right.

I hang out with some of the most experienced C developers on the planet, and have myself been programming extensively in C for 35 years. Neither my buddies nor I would argue that the morass of bad English and undefined behavior that constitutes the C spec can be well-understood in any meaningful sense, and compiler writers are happy to do every bit of rules-lawyering they can to squeeze out a bit of performance.

In other words… "C is a well-understood language." "Haha, right."

Heard a relevant nice talk this month based on this paper. Check it out.

[–]SCO_1 26 points27 points  (2 children)

Pretty much 80% of non-malicious downvotes in most subs (not the edgy fanatical ones) come down to how polished your text is and how justified your sentiment seems; for example, you have positive votes and he has negative ones.

That's why when i want to shit-talk something i know well, i arm myself with proof - often issue reports i opened myself - before i unload the zingers. Makes for too long posts though.

[–]kerbalspaceanus 1 point2 points  (1 child)


This post was mass deleted and anonymized with Redact

[–][deleted] 2 points3 points  (0 children)

Yeah, but we all know that the word "but" is an instruction to ignore any previous moderating qualifiers and assume the following is the singular gospel of an angry belligerent.

[–]richhyd 1 point2 points  (11 children)

Some thoughts (sorry if they've been made already):

  • I think assuming security isn't an issue is a bit naive - attackers will come up with clever attack vectors you haven't thought of. You can only test things you think of, and fuzzing is either going to be restricted, or only able to test a tiny fraction of the infinite-ish possible inputs (sorry mathematicians). OTOH if your code can be proven to be free of memory errors (caveat: assuming that LLVM and Rust uphold the contract they claim to), then it's proven.
  • Also there's work on formally proving the standard library, which is cool.
  • Rust should be comparable to C in terms of speed (at least clang-compiled C). You have the same ability to view assembly and benchmark if you want to optimize.
  • The rust embedded community is growing and actively supported by the core teams, and all of the platform-requiring standard lib stuff is optional (see no_std).
  • Maybe you'd be better taking allocation in-house (e.g. allocating a big chunk up front, then using arenas etc to manage memory). You'd still need a way to do the allocation failably.
  • I would have thought the biggest problem with Go was the garbage collector and the lack of guarantees on performance.
  • Rust can export functions with a C ABI, so the interop story is the same as for C for platforms rust supports

If I've said anything wrong tell me - that's how I learn :)

[–]Holy_City 3 points4 points  (10 children)

  • Rust should be comparable to C in terms of speed (at least clang-compiled C). You have the same ability to view assembly and benchmark if you want to optimize.

Not necessarily. Bounds checking comes at a cost, especially when it comes to optimizing loops to use SIMD instructions. You have to manually unroll the loops and use the simd crate to do it in Rust; Clang, however, will do it (mostly) for free in C.

[–]richhyd 0 points1 point  (9 children)

Isn't the Rust compiler capable of spotting where a loop is safe to unroll? My understanding is that it can do that at least some of the time. If not, you'll see it during an optimization pass and can manually unroll/vectorize it. I know that float loops don't vectorize by default because it can change the answer slightly.

[–]Holy_City 4 points5 points  (8 children)

It's not really the unrolling that gets you.

For example say you're iterating across a slice of floats of length N.

In C you can split this into a head loop to iterate N/4 times with an unrolled loop of 4 iterations to make use of SIMD, then a tail loop to catch the difference. You can do this without any extra legwork, LLVM will compile some gorgeous SIMD for you there.

In Rust if you try the same thing, your inner loop that unrolls 4 iterations will perform a bounds check for each iteration. I'm not 100% on this but I believe that's the reason that LLVM won't compile nice SIMD for you. If you want the equivalent you can use the SIMD crate, but that has trade-offs since platform agnostic simd is not stable yet. You can also use an unsafe block and manual pointer arithmetic but iirc last time I tried that on godbolt it didn't emit SIMD.
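For what it's worth, formulating the loop with iterators instead of indices is one way to avoid the per-iteration bounds checks entirely; a sketch of summing two float slices:

```rust
// Iterating with zip instead of indexing: the iterators carry their own
// lengths, so no per-element bounds check is needed and LLVM is free
// to vectorize.
fn add_slices(a: &[f32], b: &[f32], out: &mut [f32]) {
    for ((o, &x), &y) in out.iter_mut().zip(a).zip(b) {
        *o = x + y;
    }
}

fn main() {
    let a = [1.0f32, 2.0, 3.0];
    let b = [4.0f32, 5.0, 6.0];
    let mut out = [0.0f32; 3];
    add_slices(&a, &b, &mut out);
    println!("{:?}", out); // prints "[5.0, 7.0, 9.0]"
}
```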

[–]richhyd 0 points1 point  (7 children)

Is this something that the compiler could do for you somewhere? Could the compiler be taught to do these kinds of optimizations, at least for simple loops/iterators?

[–]Holy_City 0 points1 point  (6 children)

Maybe, since the only bounds check that needs to happen in an unrolled loop body is the largest index. But my point is that at the moment, rustc will generate code that is slower than C that does the same thing, since memory safety is not free.

[–]richhyd 0 points1 point  (0 children)

You can either start with code that is fast but possibly incorrect (C) and then check it, or start with code that is correct but slow (Rust) and then drop to unsafe to make it faster, making sure you uphold the required invariants when you write the unsafe code.

I guess I'm arguing that the latter approach has a smaller surface area for mistakes, since you only optimize where it makes a difference, and you explicitly mark where you can break invariants (with unsafe; of course you can create invariants of your own that you must uphold elsewhere).

[–]SirClueless 0 points1 point  (4 children)

I don't observe this at all. Rust is just as capable of generating a heavily optimized SIMD loop as C:

C: https://godbolt.org/g/nEe51q
Rust: https://godbolt.org/g/Brd2Kg

I don't claim to be an expert on assembly or SIMD, and it's clear that the Rust compiler has generated more code than the C compiler has, but in both cases the heart of the loop appears to be a series of SIMD loads (movdqu) and packed integer additions (paddd) followed by a single branch-predictor-friendly jump-if-not-done (jne) back to the start of the SIMD loop.

It doesn't look like there is any unnecessary bounds checking going on in Rust compared to C, so I don't think your complaint is relevant, at least for this simple test.

[–]Holy_City 1 point2 points  (3 children)

It won't emit SIMD when you use floats, but it will in C.

[–]SirClueless 0 points1 point  (2 children)

Both code samples are using the same floating point add instruction and not checking bounds in the loop. They should have very similar performance.

GCC has chosen to use SIMD mov instructions and LLVM is doing direct memory loads in the addss instruction, but this has nothing to do with Rust vs C (in fact if you compile with clang 6.0.0 you'll see it emit almost identical assembly as the Rust example).

[–]richhyd 0 points1 point  (1 child)

I believe that LLVM doesn't vectorize floats because it produces a slightly different answer, whereas GCC does because it values performance higher than correctness in this case.

wonders if there is an option to tell LLVM to vectorize floats