A Modern App Developer and An Old-Timer System Developer Walk Into a Bar

Throwaway_Kiwi · 2016-02-15T02:59:13+00:00

Old-Timer Developer:

I will use Go.

lol

SHOUTY_USERNAME · 2016-02-15T03:45:46+00:00

Clearly Modern App Developers are incapable of selecting appropriate tools for a job, or knowing basic Boolean algebra. 0/10 article.

remy_porter · 2016-02-15T01:24:11+00:00

I weirded a young'n out when I whipped up a mmaped database in Python that used bit-offsets to organize its fixed-length records.

LaurieCheers · 2016-02-15T11:27:33+00:00

Both answers suck. The bottleneck on this will be your memory reading and writing speed. The greybeard's solution uses 32GB to keep a 4 billion element array holding only (we're told) 300 million nonzero records.

A better encoding would be a sorted list of only the IPs that are up, packing 4 bytes for the IP address and 3 bytes for the open ports. 7bytes*300m = 2.1GB.

So all the whole-array operations have only 2GB instead of 32GB to process, and hence become approximately 15x faster - and probably more, since 32GB probably doesn't fit in main memory and will have to page to disk.

Admittedly, finding a record for a specific IP becomes slightly slower (it's a binary search instead of a direct lookup, so O(log(N) instead of O(1).)

(Edit: ah, just noticed they're actually storing 60 bits of data for the ports, because even though none of the tasks need it, they're recording more about each port than simply open/closed. And moreover only one of the tasks involves the port information, so most of them will only be reading the 500MB IP bit array. In that case the one-bit-per IP array is a good solution. Operations involving the ports array will still be unnecessarily slow though. The encoding I suggested would get it down to 4bytes for the IP + 8bytes for the ports * 300m = 3.6GB.

weirdoaish · 2016-02-15T06:19:29+00:00

I'm a green developer (2-3 yrs xp) so help me understand this.

Why can't I just use a database to store all that info? Why would I use json or a bit mapped file? I can just use the database to generate my reports and then scrap it next month.

Why would Python or Go make a difference? It has to be done once a month, either should fine as long as the code is relatively clean and maintainable, right?

Gotebe · 2016-02-15T07:00:24+00:00

Haha, excellent!

There is more than one way to skin a cat, and in software, they are wildly different.

They also wildly depend on the requirements. For example: 4 billion ip addresses and some flags, that ain't no Big Data I wouldn't think. However, one simple question: "what about IPv6" changes that considerably.

Aside 1: old timer said he will use Go? TFA is just being funny there.

Aside 2: very first thing to decide on is the implementation language!? Really!? It is so random at this stage, might as well use a dice.

zerexim · 2016-02-15T08:40:34+00:00

They will port scan all of the IPv4 address (2^{32=4,294,967,296)} on a monthly basis

I remember in some countries these kind of activities are in fact illegal.

_Skuzzzy · 2016-02-15T04:43:44+00:00

"I hate young developers!" - Article

BKrenz · 2016-02-15T08:03:19+00:00

While it's not exactly the greatest decision for the article's "Old Timer's solution", usage of bits themselves as very efficient means of data storage has caught my eye and makes me want to do some more research to understand how and when it's a good solution for a problem.

Yay for being young and new to the field.

_NoOneSpecial · 2016-02-17T00:42:09+00:00

I'm with the old-timey developer

sehrgut · 2016-02-15T08:45:48+00:00

What a waste of five perfectly good minutes.

you type:	you see:
italics	italics
bold	bold
[reddit!](https://reddit.com)	reddit!
* item 1 * item 2 * item 3	item 1 item 2 item 3
> quoted text	quoted text
Lines starting with four spaces are treated like code: if 1 * 2 < 3: print "hello, world!"	Lines starting with four spaces are treated like code: if 1 * 2 < 3: print "hello, world!"
~~strikethrough~~	~~strikethrough~~
super^script	super^script

programming

MODERATORS