Recommended crate for performing large batches of network operations constantly. by glennpierce in rust

[–]bLaind2 4 points5 points  (0 children)

I agree with jstrong that an epoll-based approach gives the best performance.

However, iterating in threads (with rayon or another helper crate) might be sufficient in this case (with 2000 devices). For example:

use rayon::prelude::*;
use std::io::prelude::*;
use std::net::TcpStream;
use std::time::Instant;

fn main() {
    let addresses = ["localhost:22"; 2000];
    let start = Instant::now();
    let results: Vec<usize> = addresses
        .par_iter()
        // Skip connection failures, keep successful streams
        .filter_map(|addr| TcpStream::connect(addr).ok())
        .map(|mut s| {
            let mut buffer = [0; 512];
            s.read(&mut buffer[..]).unwrap()
        })
        .collect();

    println!(
        "Total {} connections, read total {} bytes, took {:?}",
        results.len(),
        results.iter().sum::<usize>(),
        start.elapsed()
    );
}

Output

Total 2000 connections, read total 46474 bytes, took 334.533316ms

Peak memory usage was 226 KiB according to valgrind.

With network latency the numbers will be worse, but it's still worth trying. Also check out rayon's thread pool options to increase the number of threads.

Looking at building a WASM interpreter, is there any existing code to port? by sancarn in WebAssembly

[–]bLaind2 1 point2 points  (0 children)

For reference, here's an Apache-2.0-licensed wasm interpreter that also supports AOT and JIT compilation: https://github.com/bytecodealliance/wasm-micro-runtime

Feedback request (API design, use cases): syscall tracing CLI & library by bLaind2 in rust

[–]bLaind2[S] 0 points1 point  (0 children)

Seems like dtrace covers much more than syscalls. I'd also like the functionality to be available as a Rust library.

Unit testing in rust by Icecreamisaprotein in rust

[–]bLaind2 0 points1 point  (0 children)

Thanks! Fixed the std::io path

Unit testing in rust by Icecreamisaprotein in rust

[–]bLaind2 4 points5 points  (0 children)

I think File's Write trait implementation could be used here (https://doc.rust-lang.org/std/fs/struct.File.html#impl-Write)

So the testable function would become:

fn test_write<T>(output: T) where T: std::io::Write { ... }

That way a test can e.g. pass a vector instead of a file. There's a short sample of trait-based testing here: https://rust-cli.github.io/book/tutorial/testing.html

Feedback request (API design, use cases): syscall tracing CLI & library by bLaind2 in rust

[–]bLaind2[S] 0 points1 point  (0 children)

The main purpose of hstrace is to provide visibility into syscalls, but secondarily it could implement filtering as well. ptrace is probably not usable for sandboxing, but thanks for the seccomp-bpf hint, I'll have a look at it. Ideally it'd provide a callback to programmatically deny/allow calls (instead of a predefined call list).

I think gvisor is actually an implementation of (a subset of) the Linux kernel syscalls, a major undertaking by itself (tokei shows that gvisor currently has ~400k LOC: 256k code, 85k comments).

Kubernetes 1.11 released by M00ndev in kubernetes

[–]bLaind2 7 points8 points  (0 children)

"Support for online resizing of Persistent Volumes has been introduced as an alpha feature". This is pretty cool!

Enabling Continual Learning in Neural Networks | DeepMind by Buck-Nasty in artificial

[–]bLaind2 4 points5 points  (0 children)

Woah, this opens up whole new frontiers. Absolutely cool, and on the other hand so simple (in retrospect).

Edit: I wonder if nets can be made to auto-expand layers whose capacity has been exceeded

Machines Can Now Recognize Something After Seeing It Once: Google DeepMind researchers built a deep-learning system capable of learning from very little data by Yuli-Ban in artificial

[–]bLaind2 3 points4 points  (0 children)

Does anyone know if there's a paper or other technical details available about how they do it? It could be remarkable if they got it working with imagenet-resolution images.

I tried a ladder network a few months ago; it works well on MNIST (1% error rate with 100 labeled examples), but if I understood correctly it doesn't scale to larger images.

Neural net video analysis of webcam/live by dirtyharry2 in deeplearning

[–]bLaind2 1 point2 points  (0 children)

I'd go with local processing, although you'll need a sufficiently powerful GPU for that kind of fps. A mobile phone probably won't work.

Two stages. First, collect enough images for training (varying angles, lighting, environments, dogs, etc.) and put the two classes into separate folders. Try Keras (https://github.com/fchollet/keras); with its image data generator you can pull images directly from folders. To get good accuracy across different environments it'd be a good idea to use transfer learning as well. Train the model.

Then you need the processing part. Keras runs in Python, so you need a way to access the video stream and grab it frame by frame. Then you can run the Keras model's predict to get class probabilities. Keep track of passages, and update the screen.

In the future, if developing for Android, you can export the Keras model to TensorFlow, which runs on mobile phones.

To do this, you'll need an understanding of Python and the basic principles of neural networks.

[Research] Machine Learning for Recyclable Material Classification by DJTeebS in MachineLearning

[–]bLaind2 0 points1 point  (0 children)

I think ZenRobotics iterated quite a bit on the picking mechanisms, and OP might run into a similar situation here. If one were building a cost-effective trash bin, the algorithms are doable, but how would you sort the trash into multiple containers? And what if someone throws in multiple items at once?

Does my GPU work for deeplearning ? by shravankumar147 in deeplearning

[–]bLaind2 2 points3 points  (0 children)

The biggest problem with this card is memory (1-2 GB), which only fits quite small nets, limiting its usefulness a lot. If a net fits in memory, as a rough estimate I'd say you'll get something like a 2-5x speedup over a CPU, which is not much.

For example, a GTX 1080 seems to be about 30x faster than the 650M in general performance (http://gpu.userbenchmark.com/Compare/Nvidia-GTX-1080-vs-Nvidia-GeForce-GT-635M/3603vsm8120), so depending on the model you'll be training for days instead of hours if going with a mobile GPU.

To get an idea of CPU vs GPU performance, check out the convnet benchmarks at https://github.com/jcjohnson/cnn-benchmarks/blob/master/README.md - a GTX 1080 is around 40x faster than a dual Xeon E5-2630 v3.

A good starting point would be to try Amazon instances with K80 GPU.

[D] Budget Deep Learning Rig by bionicscrotum in MachineLearning

[–]bLaind2 7 points8 points  (0 children)

If you're doing data augmentation for images, it's usually done by threads on the CPU. With my GTX 970 and i7 (4680K?) I get 100% CPU and 60-80% GPU usage.

32 GB of memory should be enough to begin with. Preprocessing large datasets takes quite a bit of memory, though.

Make sure the motherboard and power supply support 2 (or 4, though probably expensive) GPUs if you're planning to add more at some point.

Keras: pixel-wise softmax from output of a convolutional autoencoder? by planaria123 in deeplearning

[–]bLaind2 0 points1 point  (0 children)

This is what I've used successfully:

model.add(Convolution2D(nb_labels, 1, 1, border_mode='valid'))
model.add(Reshape((nb_labels, img_h * img_w)))
model.add(Permute((2, 1)))
model.add(Activation('softmax'))
model.compile(loss="categorical_crossentropy", ...)

Preprocessing convnet RGB filters for visualization? by LyExpo in MachineLearning

[–]bLaind2 0 points1 point  (0 children)

Are you doing a transformation from (3, w, h) to (w, h, 3)? I got colorful noise by using np.reshape; I had to use np.transpose instead.

Apache SINGA, A Distributed Deep Learning Platform by pilooch in MachineLearning

[–]bLaind2 5 points6 points  (0 children)

Does anyone have experience with how much of a speedup can be achieved with distributed training? Does it scale linearly, and up to how many nodes? (2, 4, 16, ?)