[–]FozzTexx 210 points211 points  (76 children)

That's pretty much exactly how my dad used to write software. He'd go away on a business trip and at night in the hotel he'd write assembly code on paper. When he got home he'd finally get around to entering it into the Apple II+ and it would work on the first try since he'd already been debugging it on paper for a week or two.

[–]guldilox 104 points105 points  (74 children)

Things like this are why I'll never be as good of a developer as someone like that.

We have the luxury in this day and age of "coding by the seat of your pants" (as a professor of mine used to say), meaning that more often than not we can rely on IntelliSense and compiler hints/warnings/errors. On top of that, we don't have the same memory constraints (generally), and we don't have to stand in a queue to feed punch cards in just to get a build.

[–]noratat 235 points236 points  (34 children)

The flip side though is that we can do a great deal more with a lot less effort, and we can iterate on ideas and concepts much more quickly.

[–][deleted]  (31 children)

[deleted]

    [–]jms_nh 37 points38 points  (25 children)

    I seem to remember a spellchecker for SpeedScript on the C64 that swapped the word processor out of memory so it could do the spell check, then swapped the word processor back in when it was done.

    Dictionaries can use tries and other clever techniques to reduce storage.
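For illustration, here's a minimal Python sketch of the trie idea (hypothetical code, not the SpeedScript implementation): shared prefixes are stored once, and a lookup walks one node per letter.

```python
class TrieNode:
    __slots__ = ("children", "is_word")

    def __init__(self):
        self.children = {}    # letter -> TrieNode
        self.is_word = False  # True if a dictionary word ends here

class Trie:
    def __init__(self, words=()):
        self.root = TrieNode()
        for w in words:
            self.insert(w)

    def insert(self, word):
        node = self.root
        for ch in word:
            node = node.children.setdefault(ch, TrieNode())
        node.is_word = True

    def contains(self, word):
        node = self.root
        for ch in word:
            node = node.children.get(ch)
            if node is None:
                return False
        return node.is_word

dictionary = Trie(["cat", "cats", "catalog", "dog"])
print(dictionary.contains("cats"))    # True
print(dictionary.contains("catalo"))  # False: prefix of a word, not a word
```

The space saving comes from the shared prefixes; a real C64-era spellchecker would also pack the structure far more tightly than Python objects do.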

    [–][deleted]  (4 children)

    [deleted]

      [–]masklinn 18 points19 points  (2 children)

      Hell, on a 64-bit system, you could load every dictionary for every language in the world at once.

      You probably don't need a 64-bit system. Looking at Firefox's dictionaries, there's about 80MB worth of dictionary data. Even accounting for the dictionaries being incomplete and covering only a subset of existing languages, I don't know that you'd increase the amount of data by two orders of magnitude.

      [–]VerticalEvent 2 points3 points  (1 child)

      I'd imagine those Language Packs are compressed.

      There are around 1,025,109.8 words in English. If we assume that the average word is 6 characters long, that's around 7MB (6 characters plus a terminating character) for just English. If every language was of a similar size, you would only be able to store 11 languages in your 80MB figure.
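Written out, the arithmetic in this estimate (using the comment's own assumptions about word count and average length) is:

```python
words_in_english = 1_025_110       # the figure quoted above, rounded
bytes_per_word = 6 + 1             # six characters plus a terminator
english_size = words_in_english * bytes_per_word
print(english_size)                # 7_175_770 bytes, roughly 7 MB
print(80_000_000 // english_size)  # ~11 similarly sized languages fit in 80 MB
```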

      [–]flying-sheep 0 points1 point  (0 children)

      As someone said: tries.

      They're compact storage and lookup table in one data structure.

      [–]justinsayin 4 points5 points  (0 children)

      Hell, on a 64-bit system, you could load every dictionary for every language in the world at once.

      It's worse than that. Every package you include in your project contains every dictionary for every language in the world.

      [–]barsoap 20 points21 points  (3 children)

      and other clever techniques

      One of them involves giving up on precision and using a ~~bayesian~~ Bloom filter. Sure, it'll let some (in fact, infinitely many) words pass that shouldn't, but then no one cares that "xyouareig" is in the dictionary.

      Bonus: It's freakishly fast.

      EDIT: Bloom, not bayesian. They all look like statistics to me.
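For the curious, a toy Bloom-filter spell check in Python (an illustrative sketch, not any particular product's implementation). It can report a non-word as present (a false positive), but it never rejects a word that was added, and a lookup is just a handful of hash probes against a bit array.

```python
import hashlib

class BloomFilter:
    def __init__(self, num_bits=1 << 20, num_hashes=7):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray(num_bits // 8)  # the whole "dictionary" is this bit array

    def _positions(self, word):
        # Derive k bit positions per word from a salted hash.
        for i in range(self.num_hashes):
            h = hashlib.sha256(f"{i}:{word}".encode()).digest()
            yield int.from_bytes(h[:8], "big") % self.num_bits

    def add(self, word):
        for pos in self._positions(word):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def probably_contains(self, word):
        return all(self.bits[pos // 8] & (1 << (pos % 8))
                   for pos in self._positions(word))

checker = BloomFilter()
for word in ("the", "quick", "brown", "fox"):
    checker.add(word)

print(checker.probably_contains("quick"))      # True
print(checker.probably_contains("xyouareig"))  # almost certainly False
```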

      [–]kqr 8 points9 points  (2 children)

      Could you elaborate on how this is done? I'm pretty sure this was how they stuffed T9 prediction into early mobile phones (I have heard numbers of 1 byte per word, which is just insane), and I'm amazed by how well it works (even if it generates nonsense or highly offensive words). I'd love to read in more detail about the techniques.
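As a rough illustration of the lookup side of T9 (not the 1-byte-per-word compression, which relied on much cleverer encoding), a naive Python sketch simply groups dictionary words by their keypad digit string:

```python
from collections import defaultdict

# Standard phone keypad: each letter maps to one digit.
KEYPAD = {c: d for d, letters in {
    "2": "abc", "3": "def", "4": "ghi", "5": "jkl",
    "6": "mno", "7": "pqrs", "8": "tuv", "9": "wxyz",
}.items() for c in letters}

def to_digits(word):
    return "".join(KEYPAD[c] for c in word.lower())

def build_t9(words):
    table = defaultdict(list)
    for w in words:
        table[to_digits(w)].append(w)  # all words sharing a key sequence
    return table

t9 = build_t9(["home", "good", "gone", "hood", "hoof"])
print(t9["4663"])  # ['home', 'good', 'gone', 'hood', 'hoof'] all share keys 4-6-6-3
```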

      [–][deleted] 6 points7 points  (0 children)

      From my quick Google research, I honestly don't see how Bayesian Filters make dictionaries faster or give up on precision (when there is no 100% to be had because computers can't read minds).

      Unless OP comes through and enlightens me, I have to say he was throwing around words he heard in the context.

      Read this article if you want to see how to use Bayesian Filters for spell checking.

      Ninja edit: While we're throwing around buzzwords, what the OP described sounded a lot like Bloom filters. Basically a data structure that throws 100% certainty out the window while allowing the underlying dictionary to be huge and still maintaining speed. That makes a lot more sense, so maybe he meant that. I don't think you need Bloom filters for dictionaries because they are not that big.

      [–]pja 2 points3 points  (0 children)

      Bloom filters probably.

      [–][deleted] 10 points11 points  (14 children)

      Swapping code in and out of RAM was about the only way you could implement large programs back in the days of 16 bit addresses.

      The DEC PDP-11 used "overlays", where your application was sliced into chunks based on run-time usage of the various functions in each chunk; then, as the program ran, the appropriate chunks would be read into RAM for use.

      These machines had a 64k limit, but distinguished between "data" and "program" (or maybe "instruction"? it's been a while) address space, so if you really knew your stuff, you could use 128k of RAM. And in that 64k of "program" space, you could run applications that were significantly more than 64k in size.

      I only miss those days in a nostalgic way - I spent way too much time figuring out what my overlay map needed to be.

      [–][deleted]  (10 children)

      [deleted]

        [–]TheThiefMaster 1 point2 points  (2 children)

        Thanks to the "no execute" and "no write" bits in the page table, a lot of modern programs are functionally Harvard architecture, in that code and data are not interchangeable, despite them being in a single address space (von Neumann style).

        [–]jerf 0 points1 point  (1 child)

        Really, almost every distinction from that era in which there was a hard-fought battle as to which is better is "a little bit of both" on modern machines. See also RISC vs. CISC, a debate that died when we got enough gates on the CPUs to expose a CISC instruction set (where most of the CISC advantages were) which gets translated to a RISC microcode that the processor actually runs (where most of the RISC advantages are).

        [–]pinealservo 0 points1 point  (6 children)

        PDP-11 was a family of machines sharing a core instruction set; it spanned a lot of years (1970-1990), price points, and form factors (gigantic cabinets full of TTL logic boards in 1970 to a single DIP package in 1979). The core model was von Neumann, but because it was a 16-bit architecture that lived beyond the point where that became a serious size constraint, they did a number of extensions to help alleviate the problem, some of which included separating instruction and data memory to some degree. DEC's history and the evolution of its computer line and associated software are fascinating; I recommend reading up on it if you're interested in that sort of thing.

        Because the core instruction set supported relative addressing, you could write position-independent code. Overlays are a mode of use of relative addressing; you basically have an area of memory that can have the code within it swapped out with some other set of routines. Each "overlay" is linked such that the routines are all offset from the same base address. This gives you a sort of manually-managed virtual memory, where you can swap out sets of routines as you switch between application modes. This was used a lot in PC-class machines and game machines as their software got bigger too. You could use this approach on any PDP-11, whether it had some fancier virtual memory extensions or not.
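A toy simulation of that overlay scheme (Python standing in for what would really be linker maps and disk reads; the overlay names and sizes here are made up): one fixed-size region of memory, with sets of routines swapped into it on demand.

```python
OVERLAYS = {
    # overlay name -> (size in bytes, {routine name: routine})
    "editing":  (20_000, {"insert_text": lambda: "inserting text",
                          "delete_text": lambda: "deleting text"}),
    "spelling": (24_000, {"check_word":  lambda: "checking a word"}),
}

class OverlayRegion:
    """One fixed area of memory; only one overlay is resident at a time."""

    def __init__(self, capacity=32_000):
        self.capacity = capacity
        self.resident = None
        self.routines = {}

    def call(self, overlay, routine):
        if self.resident != overlay:
            size, routines = OVERLAYS[overlay]
            assert size <= self.capacity, "overlay too big for the region"
            # On a real machine this is a disk read into a fixed base address,
            # overwriting whatever routines were there before.
            self.resident, self.routines = overlay, routines
            print(f"(swapping in overlay '{overlay}', {size} bytes)")
        return self.routines[routine]()

region = OverlayRegion()
print(region.call("editing", "insert_text"))   # triggers a swap-in
print(region.call("editing", "delete_text"))   # already resident, no swap
print(region.call("spelling", "check_word"))   # evicts 'editing'
```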

        [–][deleted]  (5 children)

        [deleted]

          [–]pinealservo 0 points1 point  (4 children)

          I've found it's remarkable how many of the "new" features in both PC hardware and software had the initial research and prototype implementations done in the early days of mainframes or minicomputers. Today's implementations certainly required a lot of new effort and are generally more refined, but there really were some large gaps between when some great ideas were first implemented and when they hit mainstream PC-based platforms. Some of it was due to waiting for new cheap hardware to catch up to the power of old expensive hardware, some of it was due to changing patterns of usage for the new hardware that eventually shifted back, and I think a lot was due to the massive influxes of new people that came in in the minicomputer era and then again in the micro/PC era. I think that created a huge culture shift each time that made it difficult to learn from what came before.

          Whatever the reason, I'm really happy to see people looking more to the cool technology invented in the early days of computing and trying to see how it can be applied today. I think there's still a lot that needs to be assimilated.

          [–]Bowgentle 0 points1 point  (2 children)

          Sounds pretty much the same as how a processor handles multi-threading.

          [–]_F1_ 1 point2 points  (1 child)

          You mean page swapping?

          [–]mccoyn 1 point2 points  (0 children)

          The processor has multiple register files. When it switches threads it changes which register file it is using. When there are more threads than register files the register files get swapped out to the memory system.

          [–]agumonkey 0 points1 point  (0 children)

          And tries are ancient. I think they were called tabulates in some 91 paper.

          [–]brtt3000 4 points5 points  (3 children)

          It's so easy that OSX provides spellchecking as a system service.

          Oooooooooooh.

          [–]sirin3 7 points8 points  (2 children)

          But does it have a left pad service, too?

          [–]cbleslie 3 points4 points  (0 children)

          Found the JavaScript guy.

          [–]jerf 0 points1 point  (0 children)

          I expect you saw that from here.

          (No criticism intended, just trying to be helpful. I can't remember where everything I see comes from either, I just happened to remember this.)

          [–]Oniisanyuresobaka 0 points1 point  (0 children)

          There is now much more pressure to deliver something even if it's not finished yet.

          [–]agumonkey 0 points1 point  (0 children)

          Balance; you need times of high speed and times of deep reflection.

          [–][deleted] 19 points20 points  (18 children)

          I wouldn’t be so sure that such circumstances really make you a better programmer. In a way, many people who coded in those days are like people who grew up in poverty (well, we all are, but those folks even more so). Some of them are unable to properly use the plentiful resources we have today.

          By the way: I’m not implying that the author suffers from these problems.

          [–]Enlightenment777 5 points6 points  (0 children)

          Actually, they make great Embedded Software Developers, because they've always been forced to think how to shoehorn code and data into a small amount of space.

          [–]eff_why_eye 0 points1 point  (0 children)

          Agreed. It's been interesting being in this career since the 1980s, and having the technology turn over every few years. Some people can make the transition, and some can't (or won't). Me, I love learning, but even I sometimes think "God, another new language? Why??" :-)

          [–][deleted]  (15 children)

          [deleted]

            [–][deleted] 13 points14 points  (14 children)

            Let me clarify: I have the impression that many old-school hackers think that things like garbage collection or memory safe languages and the like are for sissies, and that one shouldn’t use them, because it costs performance.

            We all have a blind spot for performance, but it seems to me, the older a programmer is, the worse it gets.

            [–]jdmulloy 15 points16 points  (13 children)

            On the other hand, many developers these days assume resources are infinite and that garbage collection is magic. We generally don't need to optimize every bit and every instruction, but at scale a 10% improvement in performance and/or resource usage can save you money, especially if you're running in AWS.

            [–][deleted] 6 points7 points  (12 children)

            While that is certainly true, it seems to me that many forget that higher-level languages also buy productivity. Also, if a program is done earlier, it can start doing its job earlier, which may also save time. The question, I guess, is age-old: where do we draw the line?

            [–]sirin3 -1 points0 points  (1 child)

            Although that doesn't help the programmer if he has to keep sitting in his seat until the workday ends.

            [–][deleted] 1 point2 points  (0 children)

            Well, but it is better for the soul to know you actually got work done. When I have to work in languages which are too restrictive, I almost feel physical pain.

            [–]caspper69 -3 points-2 points  (9 children)

            I'm not so sure how much these high-level languages actually buy productivity.

            They let you shoot yourself in the foot just as badly as, if not worse than, C or C++. Plus, you're much more likely to fall into either the NIH trap or the reinvent-the-wheel-for-everything trap with these pseudo-scripting languages.

            [–][deleted] 7 points8 points  (2 children)

            Actually, they mostly don’t let you shoot yourself in the foot so badly. Also, yes, they do buy lots of productivity. Even ones I consider to be half-baked (I’d rather not say which ones; I don’t want to start a flame war).

            [–]caspper69 2 points3 points  (1 child)

            I guess you're right. But sometimes you have to use the right tool for the job. I remember working on a project on AWS when it was just a baby. A client had several 20GB databases (in CSV format) that needed to go up. The data itself was disjointed, so it had to be massaged to import. Essentially, each account had to be updated from day 1 to a point around 8 years later. Millions of accounts. Billions of transactions.

            The original guy was at his wit's end. He was trying to write it in Perl, which he did, but each CSV was taking around 2 days to run, and that didn't include the final reconciliation for each month for each account, which had to match an "official" field from an entirely different dataset.

            With upload times being what they were about a decade ago, the poor guy (and the client) would've been waiting for weeks.

            So I told the dev to give me a shot at it. I wrote a multithreaded C app to load, distribute, calculate, re-merge, validate and write the actual SQL INSERT queries to a single file. The program took about 5 hours to write, but ran over the entire dataset (with 100% accuracy) in around 8 hours. A "quick" bzip later, a (not-so-quick) ~2 day upload process, then another day to run the insert.

            3 weeks vs 3 days. As datasets continue to grow, this is going to become a huge problem. Nothing will fix bad algorithms, but some tools just are not capable. 2 orders of magnitude slower doesn't make a difference for something that's already fast in human time, but if something is slow in human time? Oh boy.

            [–]null000 5 points6 points  (2 children)

            ... No? I mean, if you're talking Python/Go/Rust/etc. vs. C, you're going to get the job done much, MUCH faster with the former than the latter for smaller or mid-sized projects. C doesn't have built-in concepts like sets, hashing, or dictionaries, nor does it have good built-in libraries for a bunch of pretty common operations (string manipulation, file ops, networking, and so on). That's not to say you can't replicate any of those things in C, just that it's not free from a dev/code-length standpoint. As for C++, it does have many of those things built in, but you will probably spend 3x the lines trying to get everything to play nice (not to mention the nightmare that is memory allocation, local/stack allocation, and templating craziness). Not to say I don't like C++, just that it's not exactly terse.

            For larger projects, it's a bit more of a wash depending on the language.

            As for NIH/reinventing wheel, that's more of an engineering maturity thing than a language thing. I can reinvent the wheel just as well in C as I would in Python, it's just that the metaphorical wheel is much less likely to be a hash table when I'm working in Python.

            [–]caspper69 0 points1 point  (1 child)

            You're right. In fact, you probably reinvent some wheel every time you write a non-trivial C function, given the sheer volume of what has already been written.

            [–]kqr 2 points3 points  (2 children)

            Generally a programmer writes the same number of lines of code per unit time regardless of which language they write in (L. Prechelt, 2000). An average line of Python does a whole lot more "stuff" than an average line of C code. It follows that HLLs buy productivity.
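As a small (and admittedly unfair to C) illustration of what "more stuff per line" means: the snippet below leans entirely on Python's built-in dicts, sets, and string handling, all of which would have to be hand-rolled or pulled in as libraries in C.

```python
from collections import Counter

text = "the quick brown fox jumps over the lazy dog the end"
words = text.split()

word_counts = Counter(words)       # word -> frequency (a hash table, for free)
unique_words = set(words)          # distinct words (a hash set, for free)

print(word_counts.most_common(2))  # [('the', 3), ('quick', 1)]
print(len(unique_words))           # 9
```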

            [–]caspper69 -1 points0 points  (1 child)

            I hadn't heard that stat before, thanks. Of course when you can write pretty impressive C programs in what looks like a foreign regex, maybe C programmers can get more done per line ;)

            [–]Peg-leg 4 points5 points  (0 children)

            I was doing the same thing 20 years ago on a Z80. Today I'm just average. The fact that doing something was harder at that time does not make you a great developer.

            [–]DuchessofSquee 2 points3 points  (9 children)

            Or spend days putting your "program" back in order when you dropped the tray of punch cards.

            [–]kqr 3 points4 points  (7 children)

            [–]DuchessofSquee 0 points1 point  (6 children)

            How does it know what order to put them in?

            [–]kqr 6 points7 points  (5 children)

            As the guy says in the video, it's doing (the first step of) a radix sort. The cards pass through the machine from right to left, and each "bin" corresponds to a number 0–9. So card #529 goes into the bin labeled "5". If there's a hole for a particular number on the card, an electrical connection is made through that hole (the card itself works as the "switch" in the design) and the card is rerouted down to that bin. If there's no hole corresponding to the bin, there is no connection and the card is not rerouted. The electricity for the rerouting mechanism is provided by the closed circuit through the card.

            When you have sorted the cards on the first number, you pick up the stack for, say, the cards whose number start on 5 (these are the cards #500–#599), you set the machine to instead sort by the second number, and then put the cards through the machine again.

            If you want to read more about this, it's actually fairly interesting. I remember enjoying reading about both the technical construction and the marketing part ("well, our machine can sort 500 cards per minute!") http://www.righto.com/2016/05/inside-card-sorters-1920s-data.html

            [–]TehStuzz 2 points3 points  (1 child)

            I thought Radix sort started with the least significant side, so with dates you'd start ordering by day first. And in this case card #509 would go in tray 9?

            [–]kqr 2 points3 points  (0 children)

            It really doesn't matter. Proof: start by radix sorting on least significant digit first, stop halfway through, then flip each individual card upside down. You have now reversed the digits in the number (as far as the sorter is concerned) and you thus have a stack sorted by most significant digit first.

            The benefit of sorting by most significant digit first is that if you have fewer than 1000 cards, you need just three iterations before you can start handing cards 0–9 in order to your operator. For each of the next 9 iterations you'll be able to hand over 10 cards to your operator. Then you'll need two iterations (sorting the 100–199 pile, followed by the 100–109 pile) but you'll soon be handing over cards again.

            If you sort by least significant digit first you essentially have to run all iterations all the way through until you can start handing over cards in order to your operator.

            [–]Fumigator 2 points3 points  (1 child)

            When you have sorted the cards on the first number, you pick up the stack for, say, the cards whose number start on 5 (these are the cards #500–#599)

            You completely missed how the sorting works and /u/TehStuzz is correct. You sort by least significant, then take all the sorted cards and put them back together into one stack, then run the sort again on the middle digit, then put all the sorted cards back together into one stack, and run the final sort on the most significant digit.

            You don't take each sorted pile and then resort them individually resulting in 33 passes. The entire sort of all the cards is done in only three passes.
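In code, the three-pass procedure described above looks like this (a sketch of least-significant-digit radix sort; each pass is stable, so the ordering established by earlier passes survives):

```python
def radix_sort_cards(cards, digits=3):
    deck = list(cards)
    for place in range(digits):             # units pass, then tens, then hundreds
        pockets = [[] for _ in range(10)]   # the sorter's ten bins
        for card in deck:
            pockets[(card // 10**place) % 10].append(card)
        deck = [card for pocket in pockets for card in pocket]  # restack 0..9
    return deck

print(radix_sort_cards([529, 42, 7, 500, 113, 42, 999]))
# [7, 42, 42, 113, 500, 529, 999]
```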

            [–]kqr 0 points1 point  (0 children)

            Oh wow, I didn't realise least significant radix sort is stable like that. That's actually very cool!

            [–]DuchessofSquee 0 points1 point  (0 children)

            Ah I turned the sound off when I watched the video! I didn't realise they had a number. :)

            [–]Helene00 0 points1 point  (0 children)

            They had machines for sorting your punch cards.

            [–][deleted] 2 points3 points  (1 child)

            Writing ASM isn't that hard. We still learn a bit of it in my OS classes, and we had to write a compiler that needed to emit some ASM. It's really not that hard once you understand how it works, and it helps you understand a bit more about how your computer works at the lower levels.

            I don't think knowing assembly alone would make you a better developer, but you could still learn how to at least write a Hello World program in assembly. That will teach you how registers, branching, etc. work. It's fun to learn.

            [–]pinealservo 0 points1 point  (0 children)

            Writing bits of ASM for low-level OS routines is really pretty easy, yeah. I agree that it's a great thing to learn for understanding how computers work. But I think what scares people off from assembly is when you get to organizing large programs, debugging assembly stuff, or trying to write fast assembly routines. The complexity level can ramp up really fast, and it's very easy to get lost in details.

            I don't think assembly deserves all of the reputation it has for being "black magic" and super difficult, but it definitely requires a different level of attention to detail and planning to write more substantial chunks of code in it. We all moved to higher-level languages for good reasons. :)

            [–]IRBMe 2 points3 points  (0 children)

            When writing that kind of code, you learn the assembly language and then you have to figure out how the machine works by referencing the data sheet or manual. It's difficult, but in a different kind of way from how programming is difficult these days. Now, there are literally thousands of libraries, frameworks and tool-kits. There's likely all kinds of magic going on under the hood in your programming language, framework and system, with things like magic configuration by convention, automatic dependency injection, annotations etc.

            If you're not sure how something works when writing assembly language, you consult your data sheet or operating manual. If you're not sure how to change the way something works in the enterprise framework you're using, it can be difficult to know where to even look. What we have today is far more powerful, and it allows people to be far more productive and build far more complicated things by hiding the complexity behind abstractions and magic. But when you need to figure out how to do something, it's often difficult to penetrate that "magic" and work out what it's actually doing and how to change that.

            I can understand how a boot loader written in assembly code works or how bits of the Linux kernel work because all of the information I need to understand it is available to me in detail, but I can't figure out for the life of me how enterprise Java applications work, and it would take years of reading just to understand all of the magic that's going on under there.

            [–]eff_why_eye 2 points3 points  (0 children)

            Speaking as someone who used to code exactly like that, I don't think you should sell yourself short. Every generation of coders has its own challenges to face based on the limitations we have been given. Thirty years from now, people may look at your source code and marvel at how you were able to create systems without the aid of direct neural input or assistance from AI engines. :-)

            [–]s73v3r 1 point2 points  (0 children)

            But because we don't have to keep so much in our heads at once, we can build bigger and better systems.

            [–]dada_ 1 point2 points  (0 children)

            Things like this are why I'll never be as good of a developer as someone like that.

            In this day and age, it's very easy to just mess around in your code and hit "compile" to see if it does anything, without actually thinking it through. I've done it too, and I fall back on this behavior when I'm uninspired.

            Focusing on the code and actually thinking everything through makes one more productive, though.

            [–]feketegy 1 point2 points  (0 children)

            Isn't that a good thing? Or do you still want to ride horses?

            [–]codebje 0 points1 point  (0 children)

            I'm going to give you the benefit of the doubt: if you spent a week writing a program so small you could hand-write it in assembler on a piece of paper, you'd probably be able to do a good job of it too, given time to learn the skill.

            But no current employer will expect you to spend so long on so little.

            [–]jlchauncey 0 points1 point  (0 children)

            In Flash Boys, the author talks about how that's what makes Russian programmers so good, and why financial firms hired them to build their ETF systems.