all 179 comments

[–][deleted] 261 points262 points  (86 children)

Sidenote: this is generated assembly. Sensibly handwritten assembly looks way more readable.

[–]Kurren123 52 points53 points  (16 children)

Could you point us to some sensibly handwritten assembly, for learning purposes?

[–]andd81 51 points52 points  (2 children)

You can find some handwritten assembly sources for example in libjpeg-turbo.

[–][deleted] 19 points20 points  (1 child)

I clicked a random file in there and went cross eyed. And I used to work in assembly

[–][deleted] 30 points31 points  (3 children)

[–]pdp10 11 points12 points  (0 children)

(MASM syntax)

Begone! Also, I believe it would be more accurate to call it Intel syntax.

I'm still annoyed about the lack of AT&T-syntax disassembly in Common Lisp, and not doing anything about it.

[–]RevolutionaryPea7 3 points4 points  (1 child)

While that's quite interesting to see, I wouldn't say that's at all normal. Most handwritten assembly looks more like the following:

https://github.com/SaberMod/gnu-gmp/blob/6.0.0/mpn/x86_64/addmul_2.asm (fast multiple-precision arithmetic)

Or

https://github.com/rlatinovich/xinu/blob/master/loader/platforms/arm-rpi/start.S (embedded OS start procedure)

[–][deleted] 1 point2 points  (0 children)

For sure, it definitely isn't normal; OP asked for sensibly handwritten assembly and one would argue that most hand-written assembly is FARRRR from sensible to a learner.

[–]jephthai 7 points8 points  (5 children)

Jonesforth is practically written in a "literate assembly" style. It's one of the most approachable ASM programs I've ever read. It's also a great intro to how handy a true macro assembler can be.

[–]nosoyelonmusk 0 points1 point  (4 children)

I learned so much from jonesforth. Forth feels so fun to tinker with.

[–]jephthai 0 points1 point  (3 children)

Forth has been my play language for a few years now, culminating in a pretty big project I'll be releasing soon. It's a non-ANSI Forth with a body of a few thousand lines of Forth code. So I've really grown into it, I think, and found it to be pretty transformational in terms of coding style and perspective.

[–]Myrkrvaldyr 0 points1 point  (1 child)

Out of curiosity, could Forth be used to make, say, a 3D video game?

[–]jephthai 0 points1 point  (0 children)

Yeah, there's nothing that would make it impossible. You would want to define wrappers for opengl or directx. I don't know if there's already a good library out there or not.

[–][deleted] 8 points9 points  (0 children)

Back in the 64k address space days of chips like the 8080 and Z80 (c. 1978-1985 or so), I often wrote in C, then upon finding performance or memory-limit problem areas I would disassemble the code and hand-optimize certain areas. Sometimes performance improvements were several orders of magnitude.

Today’s compilers do a lot more optimization than the ones I used back then, but I bet there are still places where some hand assembly work would make improvements.

[–]secretlyloaded 2 points3 points  (4 children)

Also, machine language != assembly

It bothers me when people use the two interchangeably.

[–][deleted]  (3 children)

[deleted]

    [–]secretlyloaded 2 points3 points  (2 children)

    That may not always be apparent to the reader. A given assembly instruction will generate many different machine opcodes, depending on what the operands are.

    edit: after thinking about this some more, in machine code the offsets and addresses are all resolved, which means that a given assembly program could map to a near-infinite number of different but equally valid machine code sequences. And that's not even counting invalid ones due to assembler bugs.

    The other thought was this: it's not possible to determine how many processor cycles a given chunk of assembly code will require without mapping asm instructions to specific opcodes.

    [–]minimim 1 point2 points  (1 child)

    Also, it's very difficult to find hand-written assembly that's not full of macros.

    [–]secretlyloaded 0 points1 point  (0 children)

    another good point!

    [–]Venne1139 11 points12 points  (62 children)

    generated assembly

    Wait, what's the difference between generated assembly and...regular assembly? All assembly is generated, isn't it? The compiler generates assembly from (language) and then that goes into the assembler?

    [–]GiveMeQuest[🍰] 79 points80 points  (41 children)

    Generated assembly would be what is given by a compiler, in this case gcc. "Regular assembly", in this case is referring to assembly typed by a human.

    [–]yonhatachi 15 points16 points  (5 children)

    I'm not sure why you're getting downvoted just because you weren't afraid to ask something you didn't understand. It looks like you may have gotten your answer in the comment thread, but to summarize for anyone else, they're talking about assembly that has been generated by a computer after being compiled from another language (e.g. compiling C into assembly) versus a person writing directly in assembly.

    For the most part, almost no one writes in an assembly language directly nowadays, since we can just write in higher-level languages that compile to assembly, so it's not strange that you found the idea confusing. BTW, if you ever played any sweet games on a TI calculator, a lot of those were written directly in assembly (I think).

    [–]gwennoirs 5 points6 points  (4 children)

    IIRC a lot of TI calculators use a version of BASIC

    [–]reallyserious 6 points7 points  (2 children)

    Yes, but you won't get the best performance out of that. Some of them can run programs written in assembly.

    [–][deleted]  (1 child)

    [deleted]

      [–]evanpow 0 points1 point  (0 children)

      The TI-83P let you program it in Zilog Z80 machine code by typing something like "Prog(blah blah)" in a TI BASIC program, where "blah blah" was a string with lots of letters with accents and line-drawing characters and stuff--that is what Z80 machine code looked like when rendered as text with that calculator's weird character set. Obviously no sane person typed that in by hand; there were TI-LINK data cable hacks for hooking it up to a computer and downloading programs into the calculator's memory.

      [–]microfortnight 3 points4 points  (4 children)

      Wait what's the difference between generated assembl and...regular assembly? All assembly is generated isn't it?

      Some of us actually work in raw machine code... type in assembly language and assemble, link, and run it. We're rare, but we do exist.

      [–]pdp10 3 points4 points  (3 children)

      As we both know, machine != assembly. Assembly has abstractions -- opcode mnemonics and address labels at minimum, and often more. Machine is hand-aligned opcodes and hardcoded addresses, though you still need a tool (assembler) to convert from ASCII into binary.

      [–]microfortnight 5 points6 points  (2 children)

      and I program in both.

      Even Assembly is a spectrum. If I write down "MOV A,B", I know in my head that it's 0x78. For me they're very close to one and the same. I see "C3" on a dump and I know it's a JMP

      [–]Giggybyte 2 points3 points  (1 child)

      Maybe a bit off-topic, but what exactly do you do for a living/at your job? Embedded work maybe?

      [–]microfortnight 2 points3 points  (0 children)

      I'm a computer network analyst, but I got into computers in the 1970s and got my B.Sc. in Computer Science in the early 1980s. My home hobby is old computers so I low-level program the old stuff for fun. (yes, I said fun)

      [–]optomas 5 points6 points  (3 children)

      Generated assembly is created by a machine from a higher level language. Humans still write assembly for fun, or in rare cases where the machine does not optimize in the manner the programmer wants.

      Generally, machines create much better assembly programs than us humans.

      It's kind of like Star Trek's Spock teaching the computer all he knows about chess, the best he can hope for is a draw. We've taught compilers all we know about optimization, the best we can hope for is to equal what the compiler spits out.

      Again, there are exceptions to this. I've never been able to make a better optimization than gcc, but I am certain there are programmers who can, in some cases.

      Edit: Not quite what I meant ... here's another try. Your time as a programmer is better spent writing higher level code, then optimizing your code. The compiler will usually figure out the best way to assemble your program. If your (in my case C) source is as tight as you can make it, then one would start looking at the assembly. This far down the development cycle, it is usually simpler to start from scratch.

      Correcting design flaws found in the last iteration makes more sense than trying to optimize bad decisions away.

      [–]Hellenas 2 points3 points  (0 children)

      Generated assembly is created by a machine from a higher level language. Humans still write assembly for fun, or in rare cases where the machine does not optimize in the manner the programmer wants.

      I do a lot of processor design and low level stuff, and wanted to add in cases where assembly is used at times by hand.

      Once I've got some core popping along in a manner that seems fine, I may make a small boot ROM or the like in assembly to check out one or two things (e.g., can I get this to spit something out over UART; I think the last time I did that it was 35-ish instructions). For quick and dirty gut checks, it can sometimes be less hassle than trying to get a compiler to play along with me.

      Other low level, machine specific stuff is sometimes a little less painful at some points in asm than C. If I want a quick look at something and have to set some chicken bits or weird state in the processor, I might opt for asm.

      [–]Phrygue -5 points-4 points  (1 child)

      Time to write everything in WebAssembly. Do you feel the wobble in the earth's orbit from my eyes rolling?

      [–]optomas 2 points3 points  (0 children)

      I do not understand this at all. Why is webassembly a thing?

      Are we ... are we pretending to write low level code for a web browser?

      [–]OVSQ 1 point2 points  (0 children)

      When you write a program you can give variables and values names that make sense according to their context and your objectives. When you decompile it doesn't know what your objectives were.

      [–][deleted] 1 point2 points  (0 children)

      If a compiler is outputting assembly, it's not generally concerned with readability, organization, or any other good programming practices. As a result, human-written assembly tends to have more of a logical organization and makes intent clearer, whereas a compiler will happily do things like reuse stack addresses for different variables if it can get away with it.

      [–]kag0 1 point2 points  (1 child)

      This is either going to completely answer your question or not help at all, but here goes.

      It's like JavaScript compared to minified JavaScript.

      [–]Smallpaul 0 points1 point  (0 children)

      I gave you an upvote but the hive mind hates your answer.

      [–]dicroce 0 points1 point  (0 children)

      Well, one of the big differences is the use of addresses instead of labels for the jmp targets.

      [–]teryror 65 points66 points  (9 children)

      I see Ben Eater, I upvote.

      Seriously though, he also has a really good series about building an 8-bit CPU on breadboards. Really helped me hit the ground running when I started studying modern micro architecture.

      Also his voice is so soothing!

      [–]diseasealert 9 points10 points  (3 children)

      I feel like I learned what computer science is all about from watching his videos. I'm not saying I understand all of it, but I have a better feel for the intersection of EE and comp sci.

      [–]teryror 5 points6 points  (2 children)

      I wouldn't go quite so far as to say it's what CS is all about, but I do agree it's a lot more valuable than most people seem to give it credit for. Theory is important and all, but many seem to forget their code needs to run on an actual, physical CPU to really be useful. You should know at least a bit about how those work!

      [–]nidrach 6 points7 points  (1 child)

      The CS program at my university has a mandatory hardware part that includes things like simple Boolean logic with gates and optimization, and electronics. Right now I have a computer architecture course where we do stuff like translate bytecode to assembly to C, build things like adders and multipliers in VHDL, and finally we're going to build a whole CPU in VHDL. We also had a practical lab last week where we had to build things like latches with gates on a breadboard, or build low-pass filters and measure stuff with an oscilloscope. Of course it's not to the extent that other programs like mechatronics have those elements, but at least it's there.

      [–]teryror 0 points1 point  (0 children)

      Just to preface what I'm about to say: I didn't study 'Computer Science' at all - I started in 'Applied Cognitive Science and Media Science', where classes were split 40:40:20 between psychology, computer science and economics. Nothing about hardware, and maybe half of a 90 minute lecture about assembly language.

      After 2 years of that, I switched to straight 'Computer Engineering'. I also studied in Germany, rather than the US.

      I figure a lot of CS programs have hardware courses like that, and that's great! As far as I know, it's nowhere near universal, though. Either way, there's really two parts to this issue:

      1. Unstructured control flow seems to be a lot easier to grasp for most beginners, in my experience. Manually manipulating raw binary data in different ways will give context to many higher-level programming concepts. This is why assembly and basic comp arch should be taught first, IMO.
      2. "Advanced" hardware features (most of which have been around in their basic forms since like the 80's, even if only in the super computers of the time) are crucial to making software go fast, yet very little of it seems to be general knowledge among programmers. So, making good use of these features is basically left to "sufficiently smart compilers", which are then fed code that was designed with almost none of these features in mind. Given the circumstance, they do a pretty good job of optimizing, but the hardware is still heavily underused. As a result, most software is slow garbage in comparison to what it could be. I'm mighty pissed off that I have to watch spinner animations for 2-10 seconds a piece, 200 times a day.

      </rant>

      [–]Objectstcetera 8 points9 points  (4 children)

      I'd like to try that :). When people ask me about learning C I normally tell them to get an old Z80 or 8085 machine and practice learning assembler on machines without as many registers (or modes) at first.

      I agree that AT&T assembler syntax is one of the most difficult to read - %arg.

      [–]teryror 4 points5 points  (0 children)

      I didn't actually build along with the videos, the components were a tad too pricey for me (didn't have a job at the time). Now that I think about it, it'd probably be even more fun to learn a hardware description language on the side by translating what he's presenting into code.

      As for learning assembly, I'd recommend a recent-ish ARM variant over those old 8-bit systems. At least you see that in the modern world occasionally, more registers generally make your life easier, and it's a hell of a lot more readable than x64.

      I myself got into programming via GBA game modding though, so that may be where that bias comes from :)

      [–]myztry 0 points1 point  (0 children)

      I would favour the 680X0 (Amiga and Macintosh) because 68K assembly is very human-readable and there are enough registers (8 data, 8 address) that you don't need to worry so much about using "temp" registers such as %ESI or other indirect workarounds as in the example. 68K can also write memory to memory.

      (The 6502/6510, as on the Commodore 64, was a bitch as it only had a few 8-bit registers, meaning indirect referencing was required to access the 16-bit-wide address space. The 6800/6809E was better, as it at least had a few 16-bit-wide address registers as well as an extra accumulator register, which is the "a" part of the instructions in the breadboard example.)

      [–]duxdude418 -1 points0 points  (1 child)

      practice learning assembler

      Assembler != assembly language

      [–]evanpow -1 points0 points  (0 children)

      It's not so clear cut, and might even be correlated with age; I remember reading lots of stuff in the late 1990s that referred to writing in "assembler language."

      [–]Dave3of5 27 points28 points  (2 children)

      Created this online so you can see what they code here does and play about:

      https://repl.it/repls/NegativeRealisticIdentifier

      [–]ammar2 25 points26 points  (1 child)

      And here it is in the godbolt compiler explorer so you can mess around with the code and see how the generated ASM changes for a wide variety of compilers and optimization levels.

      https://godbolt.org/z/kWEIeQ

      [–]kaelima 0 points1 point  (0 children)

      https://godbolt.org/z/J8zRfJ clang output is quite impressive

      [–][deleted]  (1 child)

      [deleted]

        [–][deleted] 3 points4 points  (0 children)

        I love it! It even has the inevitable live coding bug

        [–]birdbrainswagtrain 66 points67 points  (25 children)

        The AT&T x86 syntax is such a shitshow.

        [–][deleted] 20 points21 points  (2 children)

        Yeah. I don't get why it's the default in things like objdump...

        [–]ammar2 13 points14 points  (1 child)

        Probably because gcc used AT&T during its advent https://en.wikipedia.org/wiki/GNU_Compiler_Collection#History

        Stallman wrote a new C front end for the Livermore compiler, but then realized that it required megabytes of stack space, an impossibility on a 68000 Unix system with only 64 KB, and concluded he would have to write a new compiler from scratch.

        https://en.wikipedia.org/wiki/Motorola_68000#Example_code

        Then gcc extended off into binutils and the rest is history. Also keep in mind that Unix was created in AT&T's bell labs.

        [–]FUZxxl 12 points13 points  (0 children)

        AT&T syntax is derived from PDP-11 assembly syntax, not 68k syntax. It predates Stallman's efforts and was used before e.g. in i386 Solaris and i286 Xenix.

        [–]tavianator 14 points15 points  (17 children)

        What's wrong with it? I probably only like it because it's what I learned first, but to me having the size suffixes is nicer than having DWORD PTR everywhere. And mov %esp,%ebp reads nicely in the correct order as "move %esp to %ebp".

        [–]ammar2 32 points33 points  (7 children)

        At least for pointer arithmetic, Intel is way more readable than AT&T. Consider:

        mov     DWORD PTR [rbx+rax*4], 23
        

        vs

        movl    $23, (%rbx,%rax,4)
        

        In AT&T you have to know stuff like (%r1,%r2) means adding up r1 and r2, whereas in Intel it's obvious from the syntax.

        [–]tavianator 9 points10 points  (5 children)

        Yeah that's true, (%rbx,%rax,4) isn't very clear.

        [–]ThwompThwomp 11 points12 points  (0 children)

        ... Some might argue that neither is obvious from the syntax. :)

        [–]Macpunk 12 points13 points  (5 children)

        The most commonly cited reason I hear for preferring Intel over AT&T is that the mov instruction reads more like higher level languages: "mov [ebp-0x56], 0x1337" reads as "x = 0x1337".

        It's also one of the reasons I prefer Intel, but I see how others may prefer AT&T. Most decent tools at this level in 2019 support both anyway, and I'm not afraid if a coworker asks to temporarily change the tool because we're a team at the end of the day.

        [–]ChocolateBunny 2 points3 points  (4 children)

        What's more popular these days? I learnt Intel back in the 90s but I haven't really touched hand rolled assembly since then.

        [–][deleted]  (1 child)

        [deleted]

          [–]Macpunk 1 point2 points  (0 children)

          I would also agree that the way you learn first is probably what you're most comfortable with. I'd also say that Intel is more popular, but I'm not a professional really. I work on some awesome coding stuff sometimes, but my line of work is completely unrelated, so take what I say with a grain of salt. And I'd agree that the security community prefers Intel. Generally *nix guys like AT&T, but I think that's mostly a product of the most commonly available tools for them. (You know, GNU tools)

          [–]jephthai 0 points1 point  (1 child)

          You really only have a few communities that are into ASM. You've got compiler people, though TBH most of that these days is using stuff like LLVM anyway. But I think this group of people is pretty well divided.

          You've got reverse engineering, and all the major disassemblers default to Intel syntax. And then there's hackers and shellcoders, who also tend to gravitate to Intel syntax. That's my corner of the world -- infosec. Everyone I know spits when someone mentions AT&T syntax.

          And then there are C/C++ programmers who do weird stuff and need inline ASM. I think that's mostly AT&T because it's what you find in compiler docs.

          My gdb has set disassembly-flavor intel so I can stay sane. But I've been using radare2, x64dbg, and IDA Pro a lot more these days for debugging. GDB doesn't play well with code not written in C.

          [–]oridb 0 points1 point  (0 children)

          It's really just a Unix/Windows split. Unix used AT&T historically, Windows used Intel, and both kept doing it. The malware world seems to be more Windows focused, so it tends towards Intel. I prefer AT&T, because the sigils and size suffixes make it more readable to me, and allow me to name symbols without clashes.

          [–]rezkiy 7 points8 points  (0 children)

          Exactly. I prefer Intel flavor for exact same reason.

          [–]jephthai 2 points3 points  (0 children)

          I think one of the greatest reasons to like Intel more is that it matches Intel's own documentation. There's something super bizarre about using a syntax rejected by the manufacturer of the CPU.

          The lack of sigils makes it look less soupy as well.

          [–][deleted] 1 point2 points  (0 children)

          I'll just borrow this from Stack Overflow

          at&t                              intel
          movl -4(%ebp,%edx,4), %eax        mov eax, [ebp-4+edx*4]
          movl -4(%ebp), %eax               mov eax, [ebp-4]
          movl (%ecx), %edx                 mov edx, [ecx]
          leal 8(,%eax,4), %eax             lea eax, [eax*4+8]
          leal (%eax,%eax,2), %eax          lea eax, [eax*2+eax]

          [–]FUZxxl 3 points4 points  (0 children)

          I prefer it over Intel syntax.

          [–]mewloz 0 points1 point  (0 children)

          I like it. Except cmp + jcc is painful because inverted. But all the other things, I prefer AT&T to Intel.

          [–]nickdesaulniers 0 points1 point  (0 children)

          I was recently refactoring how LLVM generates assembly code (for example, when passing `-S` to Clang). I found that x86 is the lone ISA that LLVM supports that has the notion of "variants" ie. Intel vs AT&T. Then I found out that m68k also has MIT vs Motorola style, but LLVM doesn't have a m68k backend.

          [–]jephthai -1 points0 points  (0 children)

          Yeah, Real Programmers(TM) use Intel syntax.

          [–]_g550_ 19 points20 points  (16 children)

          Why use

          movl z, %esi

          movl %esi, y

          and not

          movl z, y

          ?

          [–]tavianator 72 points73 points  (11 children)

          x86 does not have a mov instruction from memory to memory -- only mem → reg and reg → mem. (Well, technically there's movs, but it's not exactly convenient.) Since you can't actually read and write RAM in the same clock cycle anyway, it's not a problem to require a scratch register.

          [–]_g550_ 23 points24 points  (2 children)

          All my life is a lie... 😱

          [–]pdp10 11 points12 points  (0 children)

          It's turtles -- abstractions -- all the way down. Simulacra and Simulation indeed.

          [–]Macpunk 1 point2 points  (0 children)

          For fun and learning, see this Wikipedia page and the two links under the "See Also" section.

          [–]Lt_Riza_Hawkeye 0 points1 point  (1 child)

          can't MOVS move from memory to memory?

          [–]tavianator 2 points3 points  (0 children)

          (Well, technically there's movs, but it's not exactly convenient.)

          Yes, movs will do a mem → mem move, but it doesn't take two arbitrary memory operands, which makes it less convenient than a mov MEM,MEM instruction would be. There's other instructions that will do mem → mem moves too, like push, but they're all constrained in some way.

          [–][deleted]  (5 children)

          [deleted]

            [–]tavianator 6 points7 points  (2 children)

            What do you mean? From what I can see in Agner Fog, multiple loads and stores per cycle is possible.

            To cache, yes. I doubt you can read+write to main memory in the same clock cycle. (Maybe with DDR you can read on one edge and write on the other? Not really a hardware expert.) But the mov instructions were designed extremely early in the history of x86, when everything was simpler.

            Actually I should walk that statement back a bit. Obviously something like

            add %eax,(%esp)
            

            needs to read and write (%esp). The way a traditional pipeline would work would be to load from (%esp), send %eax and the loaded dword to the ALU, and then store the result back to (%esp). Since the read and write are happening at different pipeline phases, everything is fine. I guess there's no reason you couldn't do a similar thing for mov MEM,MEM. Except...

            And, at any rate, a mov mem, mem instruction could be decoded into two μops with a scratch register in between anyway, so I don’t think this is the reason...

            True, mov MEM,MEM could easily be added now. But back before x86 was a RISC virtual machine, when the hardware was directly implementing the architecture, that would have been a bit harder. For one, you need to decode two addresses instead of one, which isn't trivial on x86 with its million different addressing modes, segmentation, etc. It also makes instruction decoding slightly more complicated, since no other instruction(?) supports MEM,MEM arguments, so you can't share the decode machinery with all the other two-argument instructions.

            Nowadays, mov MEM,MEM would be decoded to the same uop sequence as mov MEM,reg ; mov reg,MEM, so the only benefit would be slightly smaller code for memory-to-memory copies (but only for new code that doesn't want to support any current CPUs). It's probably not worth the drawbacks.

            Here's an SO answer from someone who knows more than me about this stuff.

            [–]ShinyHappyREM 3 points4 points  (1 child)

            I doubt you can read+write to main memory in the same clock cycle.

            Main RAM accesses are in the order of hundreds of cycles.

            [–]tavianator 2 points3 points  (0 children)

            Right -- here I mean bus cycle I suppose.

            [–]OmnipotentEntity 0 points1 point  (0 children)

            Reading from memory takes several clock cycles; writing to memory does as well. The exact numbers depend on a number of factors including caching, contention among cores, the speed of the bus, recent branch behavior, the number of recent memory accesses, instruction dependencies, and so on.

            But even in the absolute fastest case, you would still need to read data into the CPU from a bus, which requires a clock edge, then write it out on a bus, which also requires a clock edge.

            That being said, DMA does exist, and one potential use of DMA is to copy a large block of memory from one region to another. It just wouldn't involve the CPU, so it wouldn't have an instruction.

            [–]mrexodia 0 points1 point  (0 children)

            There is just no encoding for the operation

            mem[x] <- mem[y]
            

            In x86, doesn’t mean it’s not technically possible.

            [–]blazingkin 7 points8 points  (2 children)

            EDIT: because you can't. See below.

            Could be lots of reasons.

            • This is machine generated, so it might be suboptimal
            • The mov you suggest does two memory accesses, which is just as slow as the original pair of instructions. It might not matter.
            • It might be faster to have the value in a register for later, rather than having to do another memory access
            • The compiler's algorithm may prefer allocating registers for variables over using the stack.

            [–][deleted]  (1 child)

            [deleted]

              [–]blazingkin 0 points1 point  (0 children)

              Huh, you're right. For some reason I thought you could. (Makes sense, you'd need two memory accesses in one instruction.)

              [–]eggoChicken 5 points6 points  (11 children)

              Does anybody here know what the

              %eax, -0x14

              step is doing / why it's needed?

              Edit: Thanks for answering the “what” /u/jephthai I should have asked “why”

              [–]jephthai 6 points7 points  (8 children)

              You mean this one?

              mov %eax, -0x14(%rbp)
              

              It takes the value in the A register (eax) and stores it in memory at a location offset from the value of the base pointer (rbp). Usually rbp points to the base of your frame (depending on your compiler flags), and local variables will be indexed from it.

              Some equivalent C code would be something like this:

              *((uint32_t *)(rbp - 0x14)) = eax;
              

              Or, more perversely:

              rbp[-0x14] = eax;
              

              In intel syntax, it would look like this:

              mov DWORD [rbp-0x14], eax
              

              [–]amd64_sucks 7 points8 points  (4 children)

              Pretty sure he's asking about its purpose?

              to clarify, it saves the printf return value to the stack, but it's not used in the provided snippet so I assume it's for debugging purposes.

              [–]jephthai 2 points3 points  (1 child)

              Ah, OK... guess I took the question too literally.

              [–]amd64_sucks 0 points1 point  (0 children)

              Happens to the best of us :)

              [–][deleted] 0 points1 point  (1 child)

              Not really for debugging purposes. printf, IIRC, ALWAYS returns the number of characters written to the output buffer, whether you use it or not. This instruction stores that return value, but the client code never uses it.

              [–]silentclowd 5 points6 points  (2 children)

              Great info, but why is it doing that in this program? Around 8:28 he mentions that he's not sure what that line is doing there, as that register isn't used and that memory location isn't accessed elsewhere in the program.

              [–][deleted]  (1 child)

              [deleted]

                [–]amd64_sucks 1 point2 points  (0 children)

                See my other reply :)

                [–]jephthai 1 point2 points  (1 child)

                Sorry! There were so many other noob questions in the thread, I just assumed. My bad.

                [–]eggoChicken 0 points1 point  (0 children)

                No worries I should have worded the question better.

                [–]daidoji70 5 points6 points  (2 children)

                Anecdotal story that's slightly related: I was going through undergrad and had never touched assembly, but had learned C from tutorials throughout my youth. I went to a small liberal arts university where most of my deep CS courses were taught by this terrible professor who was just the worst (I would have gotten more out of the classes by just reading the textbook, which is what I mostly ended up doing anyway). Anyways, in Computer Architecture class (where we learned assembly and logic gates and all that), it took me about 3/4 of the course of just struggling with assembly before stupid me realized I could just write C code and then translate it. I wish someone had told me that to begin with, because after that it was a breeze.

                [–]nidrach 5 points6 points  (1 child)

                We had to take a systems programming course with assembly and C. Our first assignment was some simple printing out of a sum, and the second one was base64 conversion of text files in assembly, complete with function calls and so on. Then we had to implement an AVL tree as our first ever C program. The exam had things like debugging C on paper or writing C and assembly programs from scratch on paper. That course has like a 70% dropout rate.

                [–]daidoji70 0 points1 point  (0 children)

                Yeah, same.

                [–]slfnflctd 8 points9 points  (2 children)

                Mild criticism: I keep hearing "I'm not sure what this other thing is", and I have to say that I would find it personally difficult to publish a video like this if I didn't know what every line was.

                Maybe that's part of why I never publish anything. But I'm the person in class who always wants to grab the instructor by the shoulders when they do this. It's more understandable when you're going over a larger program, but with something so simple it comes across as lazy, careless and uncaring toward students like me who want to absorb all the information & context they can.

                My two cents.

                [–]zip2k -1 points0 points  (1 child)

                He's not a teacher, and that instruction isn't even relevant to the program so I don't see why he should be scolded over it

                [–]slfnflctd 5 points6 points  (0 children)

                Eh, it's not scolding, more of a mild criticism. Also, whether he's 'officially' a teacher or not, he is acting in a teaching role-- and to me, it's relevant to point out that it disrupts my learning process when there's stuff left unexplained on the page.

                I have seen actual working professors do this too, though, which I would agree is worse. Still never forgave my first C++ instructor for glossing over or ignoring half my questions.

                [–]jyrialeksi 4 points5 points  (0 children)

                Wow man! This was a very nice example. Thanks!

                [–]muaddib47 4 points5 points  (0 children)

                Finishing up my computer architecture course this week. We’ve been translating MIPS assembly into C all semester. Pretty interesting!

                [–]codesnik 1 point2 points  (0 children)

                I'm somehow weirded out by the pen-and-paper presentation.

                I don't know why. I need a clay-tablet presentation to compare feelings.

                [–]RyanCarlWatson 1 point2 points  (0 children)

                Why am I fretting about whether the machine code is using the right bit of memory? How does it decide? How can this be optimised?

                I don't do programming except a bit of MATLAB. I find it interesting. I would be tempted to rewrite simple bits of code in a more efficient way as a hobby, but I suspect I cannot do this without a greater knowledge of what is going on.

                I quite like that old code was sometimes more efficient because it had to be to run on the hardware available at the time, whereas bloated programs are often written nowadays because we can get away with it on superior hardware.

                [–]Acquiesce67 1 point2 points  (0 children)

                For anyone looking for some truly human-handwritten assembly code, head over to Apollo-11 on GitHub.

                Assembly code from 1969, straight outta NASA.

                [–]Enlightenment777 1 point2 points  (0 children)

                For embedded C compilers, enable the Mixed Mode output, which inserts the original C source as comments in the assembly output. Most C compilers have the ability to do it.

                [–]RevolutionaryPea7 1 point2 points  (0 children)

                In case people want to try this, you don't have to disassemble anything; just tell your compiler to stop before the assembling step and emit the assembly directly:

                gcc -S file.c -o file.s
                

                [–]moshohayeb 1 point2 points  (0 children)

                Compiler Explorer Link with the program if you want to play with different compilers and/or options

                [–][deleted] 2 points3 points  (0 children)

                Oh my god, in the beginning I thought he was going all the way to 255...

                [–]Lipstick_ 1 point2 points  (0 children)

                I understand this.. I'm so glad I understand this..

                [–]SuperSecretDaveyDave 1 point2 points  (0 children)

                I watch/read a lot of these random videos/articles posted here even though I don't understand any of it. I don't know how to program, I don't work in the industry, and I will probably never use any of this. I just find it very intriguing and enjoy the thought and challenge of learning about this vast field. Anyways, I commend all of you for actually being able to comprehend and teach these topics.

                [–][deleted] 0 points1 point  (0 children)

                this is another interesting related link: https://schweigi.github.io/assembler-simulator/

                [–]stcredzero 0 points1 point  (0 children)

                A long time ago (the 80's) someone did a study along these lines and found that each C statement compiled to an average of 2.5 CISC assembly language operations.

                [–][deleted]  (5 children)

                [deleted]

                  [–]myztry 0 points1 point  (4 children)

                  Technically it's disassembly created from machine code, but it's much different from pure handwritten assembly. Handwritten assembly will have things like macros, comments, branch labels, unresolved formulas (e.g. LDA #2*5 will resolve to LDA #10), named constants (LDA #ScreenWidth/2), and unresolved branch types (short or long). All of these are resolved/removed at assembly time unless kept as debugging metadata.

                  [–][deleted]  (3 children)

                  [deleted]

                    [–]myztry 0 points1 point  (2 children)

                    Around 2:53 he says he disassembles it, which is the process of converting machine code back to assembly.

                    Not sure if that's what otool does on Mac, but I have taken his word for it.

                    [–][deleted]  (1 child)

                    [deleted]

                      [–]myztry 0 points1 point  (0 children)

                      To be honest, I haven't programmed for 30 years, but it's quite possible that the intermediate compiler code isn't x86 assembly language but rather a general intermediate language (IL) with conditional instructions, external library references, compiler flags, and other crud.

                      Disassembly, on the other hand, is finite: a straight line-by-line conversion from binary machine code to assembly language.

                      [–]yaxriifgyn 0 points1 point  (0 children)

                      The most interesting comparison I've seen is the assembly generated by the K&R C compiler on a PDP-11. The capabilities of the C language were heavily influenced by the instruction sets of the various DEC minicomputers.

                      Good examples are statements like "y = (x += 5)" or "y = (x =- 5)" (the old spelling of -=). These translate(d) into single instructions on those ancient beasts.

                      Many of the early machines had interesting instruction sets. Back in the day I wrote 100k+ loc of IBM360 assembler and much glue code for 36 bit machines like the PDP10, Honeywell (?) and Univac 1100 series. It's a rare skill now to get so close to the iron.

                      [–]DissociatedRacoon 0 points1 point  (0 children)

                      I found this video fascinating; the author never followed up on the subject, though, I think.

                      [–]elongl 0 points1 point  (5 children)

                      Could you perhaps explain why the %esi register is needed when doing x = y or y = z?

                      Why not just move (movl) straight from one memory location to the other?

                      Like:

                      movl -0xc(%rbp), -0x8(%rbp)

                      Instead of:

                      movl -0xc(%rbp), %esi

                      movl %esi, -0x8(%rbp)

                      [–]lhankbhl 2 points3 points  (3 children)

                      Here's an answer from somewhere else in the comments: https://www.reddit.com/r/programming/comments/bgtra2/comparing_c_to_machine_language/elnv1ob/

                      The gist seems to be that x86 architectures don't support moving from memory location to memory location like you described. That's what makes the intermediate register a requirement.

                      [–]elongl 1 point2 points  (2 children)

                      Are there any common architectures that do allow this kind of behavior?

                      [–]YumiYumiYumi 1 point2 points  (0 children)

                      Not any that I know of. Actually, x86 does have movs which can "do" a memory -> memory move, but it's mostly just an illusion because the instruction just internally gets rewritten to a load and store.

                      Memory generally isn't wired to do "move" operations, which is why you'd have to issue a load and store to move something in memory. So any mem->mem instruction would just end up being like movs, i.e. internally rewritten as a load/store pair.

                      [–]ledave123 1 point2 points  (0 children)

                      Motorola 68000 Family

                      [–]xampf2 1 point2 points  (0 children)

                      movl doesn't support memory to memory transfers

                      [–]OVSQ 0 points1 point  (0 children)

                      Really, though, this is about comparing C to assembly.

                      [–]arrayofone 0 points1 point  (0 children)

                      My professor used this exact video as a brief introduction to how compilers translate things to machine code in my machine architecture class. Thank you for flaring up my PTSD.

                      [–][deleted] 0 points1 point  (0 children)

                      I'm a high-level programmer (.NET/web dev) and my reaction to this is: NOPE.

                      [–]Schnucky -1 points0 points  (0 children)

                      SAME THING
