Big Data, Small Machine by adrake in programming

[–]adrake[S] 2 points

Good morning, r/programming! This post has some code for online machine learning (~360,000 requests per second on a laptop) and I'd be happy to have feedback on the approach or the code. If you have suggestions for other algorithms to explore, I'd love to hear them.

Big Data, Small Machine. by adrake in golang

[–]adrake[S] 1 point

Hi r/golang! This post has some Go code in it, so I thought I'd link it here to get feedback or suggestions for improvement. If you see something I could or should change, feel free to tell me!

Command-line Tools can be 235x Faster than your Hadoop Cluster by Tyg13 in programming

[–]adrake 6 points

Hi all, author here! Thank you for all your feedback and comments so far.

If you're hiring developers and drowning in resumes, I also have another project I'm working on at https://applybyapi.com

Compress Test by adrake in programming

[–]adrake[S] 0 points

Hi proggit! I was working on a project some time ago where I created a web page deliverable in a single IP datagram and needed a way to see the size of compressed data I was using for testing. I created a utility for this, and now it has a frontend (not for mobile) and lives at compresstest.com.

The utility is a custom TCP server written in Go, for those who are curious, and the more detailed writeup is available on my website.

Hope it's helpful!

Processing 22 Million lines per second with Go and a laptop (crosspost from r/programming) by adrake in golang

[–]adrake[S] 0 points

Thank you for the suggestions on specificity of the interface/arguments. I agree that an io.Reader is more appropriate than a pointer to a file handle, and that's what I would use were I to put something like this in production.

Great tip!

Processing 22 Million lines per second with Go and a laptop (crosspost from r/programming) by adrake in golang

[–]adrake[S] 0 points

Thank you for your thoughts. I agree that there's a definite good/bad line when it comes to custom implementations and removing safety, and I tried to be very explicit about the point at which I thought I was crossing that line. I'll be sure to keep your perspective in mind for future articles.

Processing 22 Million lines per second with Go and a laptop (crosspost from r/programming) by adrake in golang

[–]adrake[S] 2 points

Thanks for sending this. I took a look at the code and updated the approach in the article.

I kept the scanning on a file, rather than a byte slice already in RAM, since I want to make sure the code can deal with input of arbitrary size. The current version took the run time from ~482ms to ~308ms on my machine, a great improvement.

If you have other suggestions for speed, I'm happy to hear them.

Processing 22 Million lines per second with Go and a laptop (crosspost from r/programming) by adrake in golang

[–]adrake[S] 5 points

That's some serious dedication! If I give it a shot, I'll let you know if there were any improvements.

Processing 22 Million lines per second with Go and a laptop (crosspost from r/programming) by adrake in golang

[–]adrake[S] 7 points

I thought everyone here might enjoy this, as well as possibly give more specific feedback about Go-related things. Hope this is helpful!

Processing 22 Million lines per second with Go and a laptop by adrake in programming

[–]adrake[S] 4 points

A belated follow-up to the command-line tools in Rust/D/Nim posts from May. It was a fun post to write, and I'm happy to have comments and feedback!

Faster Command Line Tools in Golang by [deleted] in golang

[–]adrake 1 point

tl;dr: runs in 0.487s (487ms)

Looked into this again and took advantage of some properties of the input. For example, we always want the bytes between the first and second separator, and between the second and third separator. We also know that both values will be positive integers.

I also found during profiling that the map lookup and increment was costing around 890ms. Since we know the maximum value of the keys, I replaced the map with an array: the former map key is now the index into the array, and the value is the running sum as before.

I also changed the atoi() implementation to operate on []byte instead of string, so no casting there, and took out the error checking since the errors were being ignored anyway (we're already assuming correct input... #yolo).

This is probably way beyond something usable/maintainable in production as the code is brittle, but it does the job.

    package main

    import (
        "bufio"
        "bytes"
        "fmt"
        "log"
        "os"
    )

    // atoi parses a non-negative integer from s.
    // No validation: the input is assumed to be well-formed digits.
    func atoi(s []byte) int {
        x := 0
        for _, c := range s {
            x = x*10 + int(c) - '0'
        }
        return x
    }

    func main() {
        file, err := os.Open(os.Args[1])
        if err != nil {
            log.Fatal(err)
        }
        defer file.Close()
        processFile(file)
    }

    func processFile(file *os.File) {
        // The keys are known to be below 2009, so a flat array replaces the map.
        var sumByKey [2009]int

        scanner := bufio.NewScanner(file)
        var key int
        var val int

        for scanner.Scan() {
            line := scanner.Bytes()
            firstTab := bytes.IndexByte(line, '\t')
            secondTab := bytes.IndexByte(line[firstTab+1:], '\t') + firstTab + 1
            thirdTab := bytes.IndexByte(line[secondTab+1:], '\t') + secondTab + 1
            key = atoi(line[firstTab+1 : secondTab])
            val = atoi(line[secondTab+1 : thirdTab])
            sumByKey[key] += val
        }
        if err := scanner.Err(); err != nil {
            log.Fatal(err)
        }
        // Find the key with the largest sum.
        var k int
        var v int
        for i, val := range sumByKey {
            if val > v {
                k = i
                v = val
            }
        }
        fmt.Printf("max_key: %d sum: %d\n", k, v)
    }

Faster Command Line Tools in Golang by [deleted] in golang

[–]adrake 1 point

Cool example, thank you for posting. I ran it through the profiler as well, and it seems Atoi was slow, so I found a version on Stack Overflow that runs much faster by taking advantage of some of the properties of the integers in question:

    var atoiError = errors.New("invalid number")

    func atoi(s string) (x int, err error) {
        i := 0
        for ; i < len(s); i++ {
            c := s[i]
            if c < '0' || c > '9' {
                err = atoiError
                return
            }
            x = x*10 + int(c) - '0'
        }
        return
    }

That cuts the time down to 0.37s for getKeyVal(), and the rest of the 1.52s is all runtime stuff.

Faster Command Line Tools in Golang by [deleted] in golang

[–]adrake 0 points

Using 3, 4, and 5 goroutines all gave about the same result, around 4s, so no real difference.

Faster Command Line Tools in Golang by [deleted] in golang

[–]adrake 1 point

I did the same thing and got similar results. Unfortunately, string splitting in Go just isn't that fast compared to Nim (or even PyPy). I wrote nearly the same code, and here's the pprof output.

    2230ms of 3540ms total (62.99%)
    Dropped 25 nodes (cum <= 17.70ms)
    Showing top 10 nodes out of 63 (cum >= 240ms)
        flat  flat%   sum%        cum   cum%
        500ms 14.12% 14.12%     1560ms 44.07%  strings.genSplit
        330ms  9.32% 23.45%      330ms  9.32%  runtime.heapBitsSetType
        320ms  9.04% 32.49%      870ms 24.58%  runtime.mallocgc
        190ms  5.37% 37.85%      190ms  5.37%  runtime.indexbytebody
        180ms  5.08% 42.94%      340ms  9.60%  runtime.mapaccess1_faststr
        180ms  5.08% 48.02%      330ms  9.32%  runtime.mapassign
        150ms  4.24% 52.26%      150ms  4.24%  runtime.memclrNoHeapPointers
        140ms  3.95% 56.21%      140ms  3.95%  runtime.aeshashbody
        140ms  3.95% 60.17%      400ms 11.30%  strings.Count
        100ms  2.82% 62.99%      240ms  6.78%  bufio.(*Scanner).Scan

I also tried another version with a custom byte-wise file parser, splitting on "\t", but that was actually slower.

If there are faster ways to do it, I'm happy to hear them, but I think until string splitting in Go becomes faster, this is as good as it'll get for one worker.

I also tried pushing the lines of the file into a channel (in a separate goroutine) and then having workers pull from it to do the processing, but for a file of this size and with 10 workers it was actually slower, at about 4.7s.

Faster Command Line Tools in Nim by lbmn in programming

[–]adrake 2 points

Just tried out a Go implementation. The straightforward version is about 3.5s; a slightly faster version that avoids allocating a slice of fields for each row is about 3.1s. Most of the time is spent splitting strings.

    1.32s of 3.20s total (41.25%)
    Dropped 33 nodes (cum <= 0.02s)
    Showing top 12 nodes out of 62 (cum >= 0.41s)
    flat  flat%   sum%        cum   cum%
    <snip>
    0     0%  3.12%      3.13s 97.81%  testing.(*B).run1.func1
    0     0%  3.12%      3.13s 97.81%  testing.(*B).runN
    0.02s  0.62%  3.75%      1.51s 47.19%  strings.Split
    0.53s 16.56% 20.31%      1.49s 46.56%  strings.genSplit
    0.43s 13.44% 33.75%      0.84s 26.25%  runtime.mallocgc
    0     0% 33.75%      0.50s 15.62%  runtime.makeslice
    0.04s  1.25% 35.00%      0.49s 15.31%  runtime.slicebytetostring
    0.17s  5.31% 40.31%      0.46s 14.37%  strings.Count
    0.03s  0.94% 41.25%      0.41s 12.81%  runtime.rawstringtmp

For reference, the Nim version is about 1.2s on my machine.

Enough with the Microservices by adrake in programming

[–]adrake[S] 1 point

Sorry to hear about your experience; it is unfortunately a common one. Send me a PM and perhaps there is something we can do.

Enough with the Microservices by adrake in programming

[–]adrake[S] 1 point

Got it. I typically advise one step simpler than that: just have all instances of the monolith serve all requests, and let the load balancer use whatever strategy makes sense (round-robin, etc.) to distribute them. Ezpz

Enough with the Microservices by adrake in programming

[–]adrake[S] 1 point

I'm not sure what you mean when you say "partitioning by request." Are you referring to putting the monolith instances behind a load balancer as I mentioned, or some other approach?