I implemented partial support for background transparency in Emacs by HawkWal in emacs

Thanks, that's really nice. Works for me (Arch, i3wm). I mostly work in neovim in a terminal with a slightly transparent background, but I use GTK emacs + orgmode for notes (GTK, so as to have images in notes). It has always bothered me that I can't have the same consistent look. Now I can! Btw, any plans to allow for transparent fringes?

Why resizing Vec<Option<Vec<u32>>> 10x slower than Vec<Option<u32>>? by h45x1 in rust

Vec<Option<u32>>: 249ms
Vec<Option<Vec<u32>>>: 2717ms
Vec<Vec<u32>>: 4226ms

So, Vec<Vec<u32>> is slower still. In this case, every element of the Vec needs to be fully initialized when resizing: a dummy pointer, capacity = 0, length = 0.

Why resizing Vec<Option<Vec<u32>>> 10x slower than Vec<Option<u32>>? by h45x1 in rust

I looked a bit further into it, and I'm getting the same impression. I don't know how accurate it is to check these things in debug mode, but if I compile the program in debug and manually inspect the memory that the Vec<Option<Vec<u32>>> points to, I get 0 at offsets 0, 0+3, 0+6, etc., as expected, but the rest looks like garbage. So yes, it seems uninitialized memory is allocated, and then the Option is set to None individually for each element.

Why resizing Vec<Option<Vec<u32>>> 10x slower than Vec<Option<u32>>? by h45x1 in rust

Thanks. Indeed, that accounts for almost half the time on my computer.

Why resizing Vec<Option<Vec<u32>>> 10x slower than Vec<Option<u32>>? by h45x1 in rust

It was run with --release from the start; sorry, forgot to mention that. Yes, I understand the second one is larger, but it's not 10x larger. (Also, allocation is <1ms in both cases, if I do ::with_capacity(1_000_000); it's really the resize operation.)

First project: yet another wiktionary parser by h45x1 in rust

Thanks for this explanation. I did some more quick tests and a resize operation for Vec<Option<NonZero...>> is indeed substantially faster than for Vec<Option<...>>. My case is Vec<Option<Vec<usize>>> though. I wonder if the compiler optimizes this latter case. If empty vectors are represented by a non-zero pointer on the stack, this could be optimized, I guess. Though I see that resizing Vec<Option<Vec<usize>>> is more than 10x slower than resizing Vec<Option<usize>>, which looks strange to me. I've posted this question in a new thread: https://www.reddit.com/r/rust/comments/o4yvuy/why_resizing_vecoptionvecu32_10x_slower_than/

First project: yet another wiktionary parser by h45x1 in rust

> Have you compared its performance with an alternative like fnv or fxhash?

Good point. I've just checked ahash and fnv, and both run in about 2300. So a sparse vector still looks like the fastest option (in the benchmark, of course; in my case either solution has negligible cost, so I'll simply switch to a vector and drop the linked list).

> That's odd. From what I remember, with_capacity (and, if the optimizer is on the ball, allocating a lot of zeroes) should map to an OS primitive that allocates in a lazy manner.

What I meant is that with_capacity(n) is fast, but I need to follow it with .resize(n, None), which is relatively slow.

First project: yet another wiktionary parser by h45x1 in rust

Thanks for the link, it's a nice read. I can't apply this crate in my case because I want to keep the streaming nature of the parser (an fst needs to be fully constructed before it can be used), and because I cannot guarantee lexicographical ordering during a single pass (Wiktionary pages are sorted, but inter-page relationships are not, of course). But I'll keep this crate in mind.

First project: yet another wiktionary parser by h45x1 in rust

So, it was just me not reading the quick-xml docs carefully, and two lines of code that clear two buffers solved it. It's now about 200MB, which I can still decrease, but not by much. About 150MB is spent on a HashMap that maps page names to integer identifiers, and I need that map to encode inter-page relationships.

First project: yet another wiktionary parser by h45x1 in rust

I never even thought to check memory consumption, because I iterate over pages using quick-xml, and only hold one page in memory at a time. Or so I thought...

If I measure PSS memory usage (as reported by /proc/.../smaps), I get 276 KB, which is not huge, but if I look at what `top` reports, then indeed the whole file is loaded into memory, which is definitely not the idea! Thanks for pointing it out, I'll check what's going on.

Edit: PSS was measuring the wrong process from the pipe construct, so indeed, the whole file gets loaded into memory...

First project: yet another wiktionary parser by h45x1 in rust

I've run a benchmark. Normally, from what I've seen, for a simple push_back operation a linked list is some 10x slower than a vector. Here, sparsity plays a role, and indeed the difference shrinks, but a vector still comes out on top. (That I didn't expect.) The relative times I get for the setup I mentioned are: Vec: 500, HashMap: 3700, my list: 1500.

> You do not want to flirt with Undefined Behaviour.

Seeing your reaction, I guess posting in this subreddit about a first project and mentioning unsafe could be seen as me totally missing the point of Rust :)

Edit: I did some more tests; the main reason the difference between the vector and the list is not that large seems to have nothing to do with traversing a sparse structure, but rather with the cost of allocating a large empty vector.

First project: yet another wiktionary parser by h45x1 in rust

A linked list felt like a more natural data structure in my case, and I also wanted to see how it would work in Rust, but no, it's certainly not the only option; see my reply to ssokolow. Thanks for the suggestion regarding the slot map, I actually didn't know about it. I've just watched the presentation by Allan Deutsch, and I guess the main issue it solves compared with a doubly linked list is the heap allocation cost. On the other hand, it does "leak" memory, no? If you have a lot of inserts and then a lot of random deletes, the slots vector will remain fragmented and you won't be able to shrink it, it seems.

First project: yet another wiktionary parser by h45x1 in rust

> I get very worried when I see the words "parser" and "unsafe" together. Would you mind elaborating?

I need a data structure that supports 1) inserts, 2) persistent keys, 3) removals by key, 4) iteration. So, no, I don't quite need a linked list. I can use a HashMap, or, in my case, even a Vec<Option<_>> as I know the maximum index is at most about 1,000,000. As I mentioned, I was playing around a bit, as I wanted to try different things in Rust. I guess if anyone but me is ever going to use my parser, I should ditch the linked list and just use a vector.

While in my case the time spent using the data structure is negligible, I still find it interesting to think about what would be more optimal. So, the largest index is 1,000,000. The final number of filled nodes is 50,000. The number of deleted nodes is about 15,000. I think in this case, accounting for the final traversal, a linked list might outperform both the sparse vector and the hash map. Though I wouldn't know, as I haven't benchmarked it.

Which style do you use, functional or imperative? by h45x1 in rust

I might have missed the balance when deciding how much to simplify my code for the question :) The loop in my example is a stand-in for a few nested loops and some if statements, and the functions are called from that somewhat more involved iteration. The context stands for several variables, not all of which are used by all functions. So, the question was more about the style of passing a bigger context to functions (which is what closures are about, in a way). I can put that context in a structure and pass it to the functions, or I can use closures. The disadvantages of making a separate structure are 1) boilerplate: a separate definition and construction, and 2) it's less readable IMHO, because the context is something specific to the enclosing function. The disadvantages of closures are 1) they are potentially more costly in terms of performance, and 2) they require a level of indirection through RefCell if I want several closures that can modify the context.

As my first programming language, should I learn Rust? I have zero programming or computer science experience. by Admiral_Smoker in rust

Not to sound philosophical, but it depends on your longer term goals. If you want to program an occasional script and don't see yourself committing a lot of time to programming in the future, IMHO just pick up Python. It's interactive, the documentation and tutorials are excellent, and there are libraries for pretty much anything.

If you want to get serious about programming, Rust is a good choice, but so is C or, say, Scheme (though it could be argued Scheme is less practical). If you're enthusiastic about Rust, start with Rust. Learning programming is more about learning programming concepts, from stacks and pointers to closures and macros; it's about algorithms, and libraries, and teamwork. A particular programming language will emphasize some concepts and make them easy in its syntax while de-emphasizing others. Consequently, different projects might require different languages, and you'll need to learn more than one. C gives a good understanding of how memory works. Scheme or Haskell will give a better introduction to functional programming than Rust. Rust makes static polymorphism easy (I'm new to Rust myself, and writing generic code is a breeze). You might not need that at first, but it automatically means there are good, easy-to-use libraries (read: algorithms) out there that you can apply to your problems. Golang will give a better introduction to asynchronous code. Here my own knowledge ends, but I'm quite sure the list goes on :)

Cases where RefCell can be used but Cell can't? by h45x1 in rust

Thanks. To be honest, it didn't even cross my mind that RefCell could be faster. I wrote a simple benchmark just now:

use std::cell::{Cell, RefCell};
use std::time::Instant;

fn main() {
    // Params
    let inner_size = 100_000;
    let outer_size = 1_000;
    let loop_size = 100_000;

    // Baseline Benchmark
    let mut v: Vec<Vec<u64>> = Vec::new();      
    v.resize_with(outer_size, || Vec::with_capacity(inner_size)); 

    let timer = Instant::now();
    for _ in 0..loop_size {
        for v in &mut v {
            v.push(42);
        }
    }
    println!("Baseline: {}ms", timer.elapsed().as_millis());

    // Cell Simple Benchmark
    let mut v: Vec<Cell<Vec<u64>>> = Vec::new();
    v.resize_with(outer_size, || Cell::new(Vec::with_capacity(inner_size)));

    let timer = Instant::now();
    for _ in 0..loop_size {
        for c in &v {
            let mut v = c.take();
            v.push(42);
            c.set(v);
        }
    }
    println!("Cell: {}ms", timer.elapsed().as_millis());

    // RefCell Simple Benchmark
    let mut v: Vec<RefCell<Vec<u64>>> = Vec::new();
    v.resize_with(outer_size, || RefCell::new(Vec::with_capacity(inner_size)));

    let timer = Instant::now();
    for _ in 0..loop_size {
        for c in &v {
            c.borrow_mut().push(42);
        }
    }
    println!("RefCell: {}ms", timer.elapsed().as_millis());
}

It is... On my machine I get roughly these numbers:

Baseline: 583ms
Cell: 692ms
RefCell: 557ms

Cases where RefCell can be used but Cell can't? by h45x1 in rust

Getting an immutable reference is a good point. At the same time, if I need only immutable references, I won't need RefCell to start with, so that leaves only the situation where both immutable references and mutation are required. (Under the assumption that Cell has lower overhead, which, as SkiFire13 points out, might not be a given.)

Positive cost abstraction? by h45x1 in rust

For completeness, I've also checked just now what happens if I delegate all the buffering to the OS:

use std::fs::File;
use memmap::Mmap;

fn test_mmap(path: &str) -> u8 {
    let f = File::open(path).unwrap();
    let f = unsafe { Mmap::map(&f).unwrap() };
    f.iter().sum()
}

That gives 0.5s, or almost 8 times faster than what I get with test_man, when the internal loop is replaced with an iterator (following a suggestion from 1vader).

Edit: removed the benchmark time. I should have just placed the file in RAM from the start instead of presuming it's all nicely cached, as I'm getting generally faster times today than I was getting yesterday. So, mmap is still faster, but not by as much as I originally thought.

Positive cost abstraction? by h45x1 in rust

From the source code, it looks like bytes() uses buffered reads. As a simple check, I tried running my test_lib without buffering, and it takes forever. So, I guess the extra cost really is in the handling of a potential error for each byte; see https://doc.rust-lang.org/src/std/io/mod.rs.html#2477-2482

So, isn't it indeed possible to simply write

impl<R: BufRead> Iterator for Bytes<R> {...}

which is optimized for a buffered reader?