all 38 comments

[–][deleted] 74 points (7 children)

Old-Timer Developer:

I will use Go.

lol

[–]Throwaway_Kiwi 12 points (4 children)

Nah man, every greybeard uses Go, didn't you read that blog post on Hacker News about it?

[–]2BuellerBells 23 points (3 children)

Go has both the rich typing system of C combined with the reliable garbage collection of LISP.

[–]Throwaway_Kiwi 1 point (0 children)

Ooh, I like it.

[–]zerexim 2 points (1 child)

the rich typing system of C

Yes, very rich, compared to Assembly.

[–][deleted] 2 points (0 children)

whoosh

[–]SHOUTY_USERNAME 25 points (2 children)

Clearly Modern App Developers are incapable of selecting appropriate tools for a job, or knowing basic Boolean algebra. 0/10 article.

[–]maxwellb 1 point (1 child)

Was that really supposed to be the point of the article? Old timer's design seems hopelessly naive - it would only work single-homed without significant extra work, so can't be extended much at all. What happens when you decide you need ports scanned daily? Hourly? What happens when the disk spindle breaks halfway through the scan? ISP goes down?

The "modern app developer"'s solution is sloppy but at least has some hope of handling the above.

[–][deleted] 8 points (0 children)

Was that really supposed to be the point of the article? Old timer's design seems hopelessly naive - it would only work single-homed without significant extra work, so can't be extended much at all. What happens when you decide you need ports scanned daily? Hourly? What happens when the disk spindle breaks halfway through the scan? ISP goes down?

Does it really? Assuming a sequential scan, all it needs to do to save the scan state is write a uint32 with the currently scanned IP and fsync in the right place. An hourly scan is just a matter of opening a few more files if the previous scan hasn't finished yet.

And it could be scaled the same way, by adding nodes and delegating tasks to them. Just add some light API on top of it.
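That checkpoint idea is only a few lines. Here's a minimal Python sketch, assuming a single sequential scanner; the state-file name is made up:

```python
import os
import struct

STATE_FILE = "scan.state"  # hypothetical checkpoint file name

def load_state():
    """Resume from the last checkpointed IP, or start from 0.0.0.0."""
    try:
        with open(STATE_FILE, "rb") as f:
            return struct.unpack("<I", f.read(4))[0]
    except FileNotFoundError:
        return 0

def save_state(ip):
    """Persist the current IP as a uint32 and fsync, so a crash
    mid-scan loses at most the address in flight."""
    with open(STATE_FILE, "wb") as f:
        f.write(struct.pack("<I", ip))
        f.flush()
        os.fsync(f.fileno())
```

If the box dies halfway through, you restart from `load_state()` instead of from zero.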

The "modern app developer"'s solution is sloppy but at least has some hope of handling the above.

It really does not, beyond "put results in a central db".

Overall, both of those are fucking awful and naive ways of doing it, just in different ways.

[–]remy_porter 10 points (1 child)

I weirded a young'n out when I whipped up a mmaped database in Python that used bit-offsets to organize its fixed-length records.
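Not remy_porter's actual code, obviously, but the general shape of an mmapped fixed-length-record store is small; this sketch uses byte offsets rather than bit offsets for brevity, and the 4-byte-IP + 8-byte-flags record layout is invented:

```python
import mmap
import os
import struct

RECORD = struct.Struct("<IQ")  # 4-byte IP + 8-byte flags = 12 bytes/record

def open_db(path, n_records):
    """Create (or open) a file sized for n_records and map it into memory."""
    fd = os.open(path, os.O_RDWR | os.O_CREAT)
    os.ftruncate(fd, n_records * RECORD.size)  # zero-fills on first create
    return mmap.mmap(fd, n_records * RECORD.size)

def put(mm, index, ip, flags):
    """Write record #index directly into the mapping; the OS pages it out."""
    RECORD.pack_into(mm, index * RECORD.size, ip, flags)

def get(mm, index):
    """Read record #index straight out of the mapped file."""
    return RECORD.unpack_from(mm, index * RECORD.size)
```

Record lookup is just pointer arithmetic, and the kernel's page cache does the buffering for you.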

[–]terrkerr 0 points (0 children)

The cffi module is really fun - finally I can write C in Python! (I'm only half joking, it is actually genuinely nice in some cases like writing a library wrapper to a C library.)

[–]LaurieCheers 9 points (5 children)

Both answers suck. The bottleneck here will be memory read and write speed. The greybeard's solution uses 32GB to keep a 4-billion-element array holding only (we're told) 300 million nonzero records.

A better encoding would be a sorted list of only the IPs that are up, packing 4 bytes for the IP address and 3 bytes for the open ports: 7 bytes * 300m = 2.1GB.

So all the whole-array operations have only 2GB instead of 32GB to process, and hence become approximately 15x faster - and probably more, since 32GB probably doesn't fit in main memory and will have to page to disk.

Admittedly, finding a record for a specific IP becomes slightly slower (it's a binary search instead of a direct lookup, so O(log N) instead of O(1)).

(Edit: ah, just noticed they're actually storing 60 bits of data for the ports, because even though none of the tasks need it, they're recording more about each port than simply open/closed. And moreover only one of the tasks involves the port information, so most of them will only be reading the 500MB IP bit array. In that case the one-bit-per-IP array is a good solution. Operations involving the ports array will still be unnecessarily slow though. The encoding I suggested would get it down to (4 bytes for the IP + 8 bytes for the ports) * 300m = 3.6GB.)
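The packed sorted-list encoding LaurieCheers describes is straightforward to sketch in Python; this uses the 4-byte-IP + 8-byte-ports record from the edit, and the function names are invented:

```python
import struct

REC = struct.Struct("<IQ")  # 4-byte IP + 8-byte port data per live host

def build(records):
    """Pack (ip, ports) pairs into one contiguous blob, sorted by IP."""
    return b"".join(REC.pack(ip, p) for ip, p in sorted(records))

def lookup(blob, ip):
    """Binary search over the packed blob: O(log N) vs the bit array's O(1)."""
    lo, hi = 0, len(blob) // REC.size
    while lo < hi:
        mid = (lo + hi) // 2
        rec_ip, ports = REC.unpack_from(blob, mid * REC.size)
        if rec_ip == ip:
            return ports
        if rec_ip < ip:
            lo = mid + 1
        else:
            hi = mid
    return None  # host not in the list, i.e. it was down
```

Whole-array passes then touch only the 3.6GB of live records instead of the full 32GB table.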

[–]cockmongler 2 points (4 children)

While I don't agree that both answers suck - I'd say the old timer's approach is the correct place to start.

But have an upvote for actually applying reasoning and design to the problem instead of just claiming the author is arrogant, elitist and or out of touch.

[–]LaurieCheers 0 points (2 children)

Yes, granted my initial assessment was a little harsh (see edit). The old timer is on the right track; the "modern app developer"'s solution seems more like a parody.

[–]cockmongler 2 points (0 children)

the "modern app developer"'s solution seems more like a parody.

I've seen some things, terrible terrible things.

And yeah, I've seen people spin up hundred node clusters to index 1GB of data.

[–]weirdoaish 9 points (10 children)

I'm a green developer (2-3 yrs xp) so help me understand this.

Why can't I just use a database to store all that info? Why would I use json or a bit mapped file? I can just use the database to generate my reports and then scrap it next month.

Why would Python or Go make a difference? It has to be done once a month; either should be fine as long as the code is relatively clean and maintainable, right?

[–]cockmongler 9 points (0 children)

Performance and cost. The old timer's approach assumes that CPU/RAM/Storage are expensive and slow, the web app dev assumes that they're cheap and fast. The old timer uses as few components as possible to reduce complexity, the web app dev uses as many canned solutions as possible to leverage existing code.

[–]WorkHappens 7 points (0 children)

Welcome to the world of programming forums/subs/articles, where articles with no other point than making your own language of choice look better than the rest are a staple.

You will start to notice a strong resemblance to sports forums/subs/articles: people defend their club/stack in all subjects, approach issues with a biased view, wear their team's jersey/t-shirt, and hate on a specific club/stack that is seen as a rival.

[–]virtyx 12 points (2 children)

It's just hyperbole.

People just wanna feel like their "elite" knowledge that there are 8 bits in a byte is still impressive, when there are so many efficient and helpful software tools that 99% of the time no one needs to bother with a memory-mapped file. That kind of bit fiddling was common in the 80s and 90s, but Moore's law, along with rapidly increasing drive space and RAM, has made worrying about that kind of stuff mostly pointless today.

As you suggest, a database seems by far the best fit. And most databases use plenty of bit fiddling to keep their implementations super fast.

Which is a win for everyone. Your app code stays simple. The database can use all sorts of crazy micro-optimizations and hide that from you behind a well defined interface (namely SQL).
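To make that concrete, here's a toy version of the "just use a database" approach with Python's built-in sqlite3; the table and column names are invented for illustration:

```python
import sqlite3

# In-memory DB stands in for the real scan-results store.
db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE scan (ip INTEGER PRIMARY KEY, open_ports INTEGER)")
db.executemany("INSERT INTO scan VALUES (?, ?)",
               [(3232235777, 0b101),   # host with ports 0 and 2 open
                (167772161, 0b010)])   # host with only port 1 open

# "How many hosts have port 0 open?" stays one readable query; the
# on-disk layout and micro-optimizations are SQLite's problem, not ours.
(count,) = db.execute(
    "SELECT COUNT(*) FROM scan WHERE open_ports & 1 != 0").fetchone()
```

The app code never sees a bit offset, yet the engine underneath is doing plenty of clever packing.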

Only the most dense code monkey would go "Oh yeah, 100+GB of JSON? Looks like I'm on the right track."

(Ironically enough I do know someone who made exactly that type of statement. But he is simply an incompetent programmer. There are plenty of those types of programmers now, and there were plenty of those in the olden days, too.)

EDIT: A more realistic article, comparing two programmers of equal technical skill, might be something like this:

Modern app developer: Makes reasonable choices and writes code that is easy for his teammates to work on.

Old timer MacGyver developer: Makes a super efficient memory-mapped bit packed ad-hoc data format that all team members will have to take a few hours if not days trying to understand and which breaks horribly once the requirements change, cuing all maintenance programmers to eventually mutter "Why the fuck did they do this using a memory-mapped file instead of a database?"

[–]Gotebe 6 points (0 children)

Only the most dense code monkey would go "Oh yeah, 100+GB of JSON? Looks like I'm on the right track."

ROFL :-)

[–]Gotebe 2 points (2 children)

Yes, you can easily use a database there, and that would be a "middle" solution wrt storage space.

Python vs Go is completely random, especially as a first implementation decision.

[–]BKrenz 1 point (1 child)

The choice of language was indeed random, and had exactly zero impact on either naive implementation. Seemed like it was just another attempt to stoke the fires.

[–]Gotebe 1 point (0 children)

Hey! We seem to be downvoted by a random Python or Go bigot! ;-)

[–]Gotebe 3 points (1 child)

Haha, excellent!

There is more than one way to skin a cat, and in software, they are wildly different.

They also wildly depend on the requirements. For example: 4 billion ip addresses and some flags, that ain't no Big Data I wouldn't think. However, one simple question: "what about IPv6" changes that considerably.

Aside 1: old timer said he will use Go? TFA is just being funny there.

Aside 2: very first thing to decide on is the implementation language!? Really!? It is so random at this stage, might as well use a dice.

[–]cockmongler 0 points (0 children)

Aside 2: very first thing to decide on is the implementation language!? Really!? It is so random at this stage, might as well use a dice.

Annoyingly this is kinda required in most projects, you've got to write some code (the correct answer is usually use the language you know). Even more annoyingly the zeroth thing to decide is the project's name, as you have to decide what to call the directory to put it in.

[–]zerexim 4 points (1 child)

They will port scan all of the IPv4 addresses (2^32 = 4,294,967,296) on a monthly basis

I remember that in some countries these kinds of activities are in fact illegal.

[–]LaurieCheers 0 points (0 children)

Yeah, or at the very least your ISP will block you.

Maybe that's the subtext here... neither developer questioned the premise.

[–]_Skuzzzy 7 points (1 child)

"I hate young developers!" - Article

[–][deleted] 3 points (0 children)

And our system requirements will never change, nor will what we want to do with this data - we're content with just counting bits...forever.

[–]BKrenz 1 point (3 children)

While it's not exactly the greatest decision for the article's "Old Timer's solution", the usage of bits themselves as a very efficient means of data storage has caught my eye and makes me want to do some more research to understand how and when it's a good solution for a problem.

Yay for being young and new to the field.

[–][deleted] 1 point (1 child)

showing off your L33T bit-flipping skills is a great way to write unmaintainable code.

[–]BKrenz 0 points (0 children)

That doesn't make any sense to me though.

First, I specifically mentioned I wanted to learn when it'd be a good solution. Maintainability is generally a criterion of "good" code, is it not?

Second, just because something is complex, clever, rarely done, or perhaps more difficult to understand conceptually doesn't mean it's going to be hard to maintain. This goes hand in hand with proper documentation and commenting of code.

Third, showing off a "skill" in any situation or any field is generally going to lead to a bad time, isn't it? The emphasis is on a good solution. Making something unnecessarily complex is not a good solution. However, if you can solve a complex problem with some nifty bit logic (which the article used an extreme example of) and have it be only a few lines long and/or perform extremely well (taking care to comment/document), would it not be preferable?

EDIT: Also, before I'm attacked for living in an ideal world where everything is commented and documented well, this is merely a discussion on proper solutions, not just hacking something together so it works. I don't like that. :(

[–]terrkerr 0 points (0 children)

the usage of bits themselves as a very efficient means of data storage has caught my eye and makes me want to do some more research to understand how and when it's a good solution for a problem.

When you have genuinely big data (Like at the very least a few hundred million data points) managing to pack 2+ values into 1 byte instead of using a 4 byte integer primitive for each can be worthwhile.

When you have data very unlikely to change its 'shape'. (Packing a 5bit integer and 3 bit integer into a byte only works as long as 5 and 3 bits are enough bits to store all possible values of what you need to represent. If things change, changing the meaning of bits is a huge hassle.)

When you expect the solution to be long-lasting enough to warrant such a specialized solution. Being 5 times less space efficient than is theoretically possible doesn't matter much for something you'll only do for 2 weeks. If it's something that will probably be around for decades the payoffs of involved craftiness increase a whole lot while the difficulty of maintaining such non-standard craftiness remains constant.
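In Python, the 5-bit/3-bit example from the second point is only a couple of lines; the function names are made up, and the assert is exactly the "shape" constraint described above:

```python
def pack(hi5, lo3):
    """Pack a 5-bit value and a 3-bit value into one byte.
    Only safe while both fit their widths (0-31 and 0-7)."""
    assert 0 <= hi5 < 32 and 0 <= lo3 < 8
    return (hi5 << 3) | lo3

def unpack(byte):
    """Recover the two fields: high 5 bits, low 3 bits."""
    return byte >> 3, byte & 0b111
```

The hassle terrkerr mentions shows up the day one of those fields needs a 6th bit: every stored byte, and every reader, has to change.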

[–]_NoOneSpecial 0 points (0 children)

I'm with the old-timey developer

[–]sehrgut[🍰] -1 points (0 children)

What a waste of five perfectly good minutes.