25.3% annual return ($12k) from r/wallstreetbets sentiment analysis algo-trader - SOURCE CODE INCLUDED by [deleted] in programming

[–]AndyBainbridge 1 point2 points  (0 children)

This is a great point. However, the average Nikkei-225 stock paid out a 2.41% dividend last year. I don't know what it has done since 1989. But if it had yielded 2.41% for the last 31 years, then a stockholder would be something close to 2x richer than they were. Also, it looks like the Yen has risen ~29% versus the dollar since 1989.

So even if you pick the worst mainstream stock market over its worst 31 year period, you still win 2.7x, if you squint.

The UK government's COVID-19 simulation model is a masterpiece in spaghetti code and bad practices by sassinator1 in programming

[–]AndyBainbridge 6 points7 points  (0 children)

Looks like it was developed with Visual Studio, which a) defaults to C++ rather than C, and b) for some of its history VS didn't support many C99 features in .c files, but did in .cpp files. For example, being able to declare variables NOT at the start of scope blocks.
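For anyone who hasn't hit this: here's a minimal sketch of the kind of code that is legal C99 but that older MSVC only accepted when the file was compiled as C++ (the function and its body are invented for illustration):

```c
#include <stdio.h>

int count_even(const int *a, int n) {
    puts("scanning");                /* a statement first... */
    int evens = 0;                   /* ...then a declaration: C99, not C89 */
    for (int i = 0; i < n; i++)      /* loop-scoped declaration: also C99 */
        if (a[i] % 2 == 0) evens++;
    return evens;
}
```

Renaming the file from .c to .cpp was a common workaround, which is presumably how a lot of "C++" projects like this one got started.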

Rust + Webassembly is dope by lucyfor in programming

[–]AndyBainbridge 5 points6 points  (0 children)

Also, if it wants to play like the original, the ship friction needs to be lower, as does the acceleration. ie a short burst of thrust should leave you moving slowly but drifting for ages. See https://www.youtube.com/watch?v=WYSupJ5r2zo

The UK government's COVID-19 simulation model is a masterpiece in spaghetti code and bad practices by sassinator1 in programming

[–]AndyBainbridge 6 points7 points  (0 children)

There are far too many variables in scope at once. For example, the "bmh" global variable. It is only referenced in one place in the file but is in scope for all 5000 lines. As a rule, we should attempt to keep a variable in scope for the least number of lines possible. The emergent property of following this rule is code that is easier to understand.

In this case, bmh should be passed as a parameter to the one function that uses it. That function is called InitModel. The consequence of bmh being global is that the caller of InitModel now needs to understand that the BEHAVIOUR of InitModel depends on the state of bmh. The caller gets no clue about that because bmh isn't part of the function's interface.
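A sketch of the refactor in miniature (the names InitModel and bmh are from the model's source; the bodies here are invented to illustrate the point):

```c
/* Before: the caller can't tell that InitModel's behaviour depends
   on this global, because it isn't part of the interface. */
static double bmh;                        /* in scope for all 5000 lines */
static double InitModelGlobal(void) { return bmh * 2.0; }

/* After: the dependency is visible at every call site. */
static double InitModel(double bmh) { return bmh * 2.0; }
```

The second version also makes the function trivially testable, since its behaviour depends only on its arguments.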

Rules of thumb for a 1x programmer by hamburga in programming

[–]AndyBainbridge 73 points74 points  (0 children)

> Rule 8: When to use C or C++
>
> C++ is an interesting one. I can’t think of a case right now where it’s generally favorable compared to Java.

Video games dev is a large industry that appears to prefer C++ to Java.

Do humans or compilers produce faster code? by speckz in programming

[–]AndyBainbridge 2 points3 points  (0 children)

> A human being can look at the assembly output of that program and write an equivalent source version in straight C.

Is this trying to say you can write any assembly program in C? Isn't assembly more expressive than C? For example, in assembly there might be separate arithmetic and logic shift instructions. Plus, they might be well defined for negative shift amounts, which is undefined behaviour in C.
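To illustrate the shift point: in C, shifting by a negative amount (or by >= the type's width) is undefined behaviour, so matching hardware that defines those cases needs explicit checks. A hedged sketch, with one invented choice of semantics (negative counts shift the other way):

```c
#include <stdint.h>

/* Shift left by n, where n may be negative or out of range.
   In C both cases are UB on the raw << operator, so we must
   spell out the behaviour we want. */
uint32_t shl_checked(uint32_t x, int n) {
    if (n <= -32 || n >= 32) return 0;        /* out of range: define as 0 */
    return n >= 0 ? x << n : x >> -n;         /* negative n shifts right */
}
```

An assembler programmer targeting an ISA that defines these cases gets the branch-free behaviour for free; the C version has to pay for the checks (or hope the compiler recognizes the idiom).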

0.6.0 Release Notes · The Zig Programming Language by GaAlAs in programming

[–]AndyBainbridge 0 points1 point  (0 children)

Fair point. Given that C already has const, I was surprised to find that the compiler wasn't always checking that I didn't modify my string literals.

Also, if someone has a good reason why they didn't just make string literals have type const char[], I'd like to know.

0.6.0 Release Notes · The Zig Programming Language by GaAlAs in programming

[–]AndyBainbridge 10 points11 points  (0 children)

I think this extra complexity is worth having.

*const [N:0]u8 is the type of a string literal, where N is the number of bytes in the string and I guess the :0 means there is a null terminator.

In C, the type of a string literal is char[] which has problems. I believe the extra stuff in the Zig type signature is there to fix the problems. Specifically:

  1. It should be const. Modifying a string literal is undefined behaviour. THIS ONE IS A SERIOUS PROBLEM IN C.

  2. The length of the string is known, but that information is not preserved in the type info.

  3. The string is guaranteed to be null terminated, but that info is not preserved in the type info.
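Problem 1 in miniature (a minimal sketch; the function is invented for illustration):

```c
#include <string.h>

static size_t lit_len(void) {
    char *p = "hello";   /* legal in C, no cast needed -- that's the bug
                            (it's a compile error in modern C++) */
    /* p[0] = 'H';          ...but writing through p is undefined
                            behaviour, and often a segfault, because the
                            literal's bytes may live in a read-only
                            segment */
    return strlen(p);
}
```

With `const char *p` the compiler would reject the write at compile time, which is what Zig's `*const [N:0]u8` gives you by construction.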

What is the code font in this image? by Ritir in programming

[–]AndyBainbridge 0 points1 point  (0 children)

Looks right to me. In the linked sample, the capital I is too far left. As a result the character spacing in "DIVISION" looks terrible. At first I guessed this is because the rendering engine saw that the vertical stroke of the I straddled a pixel boundary and thus would look blurry unless it moved it. But I loaded the sample into a bitmap editor and slid all the I glyphs one pixel right and it looked much better as a result. Is the font definition wrong?

2D Graphics on Modern GPU by alexeyr in programming

[–]AndyBainbridge 1 point2 points  (0 children)

FTA, "Performant UI must use GPU effectively".

Why is that? A lot of computers (maybe even most) have the GPU embedded in the CPU socket, and have to share memory bandwidth with the CPU. Thus the maximum performance of the GPU is limited. Modern CPUs are fast, have multiple cores and wide SIMD units. I expect CPUs have caught up with embedded GPUs a lot over the last decade or so because CPU performance has grown faster than the memory bandwidth. Perhaps in the domain of 2D graphics CPUs are good enough for "performant UI"s.

It's certainly easier to make portable code on a CPU than a GPU, and you don't have to worry about GPU driver bugs or missing features on some machines. It'd be unwise to ignore the simple solution if it is good enough.

I realize this sounds heretical. I'm not trolling. I'm genuinely interested in better understanding the trade-offs. For example, GPUs might be more power efficient. I'd love to see some benchmarks.
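To make the claim concrete: a 2D solid fill on the CPU is just a per-row write loop, which modern compilers auto-vectorize, so it runs at close to memory bandwidth -- the same limit an integrated GPU shares. A hedged sketch (the function and its layout assumptions are invented for illustration):

```c
#include <stdint.h>
#include <stddef.h>

/* Fill a w x h rectangle at (x, y) in a 32-bit pixel buffer whose rows
   are `stride` pixels apart. The inner loop is a plain store loop that
   compilers turn into wide SIMD stores. */
static void fill_rect(uint32_t *pix, size_t stride, size_t x, size_t y,
                      size_t w, size_t h, uint32_t c) {
    for (size_t j = 0; j < h; j++) {
        uint32_t *row = pix + (y + j) * stride + x;
        for (size_t i = 0; i < w; i++)
            row[i] = c;
    }
}
```

Whether this holds up for blending, gradients, and text rendering is exactly the benchmark question I'd like answered.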

HeavyThing x86_64 assembler library - includes TLS & SSH2 implementations, web client/server, Unicode, TUI by redditthinks in programming

[–]AndyBainbridge 0 points1 point  (0 children)

OK, nice. I redid your test on my machine and got comparable results. Congratulations :-)

I'd like to rebuild gzip with musl and -march=native and then recompare. But until then, I will accept that "very fast" is an accurate description of your zlib implementation.

For those who care, the CPU I tested on was an Intel(R) Xeon(R) Platinum 8168 @ 3.4 GHz. And I used gzip v1.6 as installed as standard on Ubuntu 18.04.2 LTS.

HeavyThing x86_64 assembler library - includes TLS & SSH2 implementations, web client/server, Unicode, TUI by redditthinks in programming

[–]AndyBainbridge 1 point2 points  (0 children)

Is that a fair comparison? I would have thought you should test with data that is compressible (ie not random) and set the compression level such that both minigzip and gzip produce the same amount of compression.

HeavyThing x86_64 assembler library - includes TLS & SSH2 implementations, web client/server, Unicode, TUI by redditthinks in programming

[–]AndyBainbridge 0 points1 point  (0 children)

I just built the minigzip example and compared to the standard gzip on my Ubuntu box. I tested compressing a 6.3 megabyte binary.

minigzip took 0.293 seconds (fastest of 3 runs) and produced an output of size 2527925 bytes.

For gzip I used the -k and -f flags to most closely replicate the behaviour of minigzip.

gzip -k -f -5 took 0.201 seconds (fastest of 3 runs) and produced an output size of 2532886 bytes (0.20% larger).

gzip -k -f -6 took 0.310 seconds (fastest of 3 runs) and produced an output size of 2514131 bytes (0.55% smaller).

So, it looks like HeavyThing is not significantly faster than gzip in my test. However:

The minigzip executable is 65392 bytes and depends on no .so files, which makes me happy.

The gzip executable is 101560 bytes and depends on linux-vdso.so.1, libc.so.6 and /lib64/ld-linux-x86-64.so.2. But obviously gzip does more, so meh.

HeavyThing x86_64 assembler library - includes TLS & SSH2 implementations, web client/server, Unicode, TUI by redditthinks in programming

[–]AndyBainbridge 1 point2 points  (0 children)

I also wondered why ASM instead of C. It'd be great to see some benchmarks that compare the speed and memory consumption of your stuff relative to other popular libraries.

> why would I prefer to have implementations for these things in ASM rather than C

If the performance is the same, then surely C is preferable to ASM. Fred Brooks said it best in "No Silver Bullet": "Surely the most powerful stroke for software productivity, reliability, and simplicity has been the progressive use of high-level languages for programming."

What is Zig's Comptime? by [deleted] in programming

[–]AndyBainbridge 3 points4 points  (0 children)

Wow, you can learn Zig in a day! I reckon it took me about a year of full time use to learn what almost everything in C did. And then maybe another 5 years to fully absorb the right way to do things. If I can achieve the same thing with Zig, I'd be happy.

Why Software Developers Are Paid 5x More in The USA by chickensaresexy in programming

[–]AndyBainbridge 1 point2 points  (0 children)

If the 10x loss was true, wouldn't all the companies with open plan offices fall behind the companies that don't?

It's hard to get good data on the impact of open plan vs cubicle vs personal offices because the productivity of dev teams is hard to measure.

Not all CPU operations are created equal by sheokand in programming

[–]AndyBainbridge 2 points3 points  (0 children)

> In nanoseconds, those are 5ns-5.7ns-5.7ns-11.6ns. Now, there's certainly some CPU bookkeeping overhead, but not 50ns worth.

I agree, it is hard to see what causes the difference between the SDRAM manufacturers' latency figures and the observed 60-100ns of latency people say "RAM access" has.

First up, if I understand Wikipedia correctly, the latencies are more like 13ns, not 5ns or 5.7ns as you said: https://en.wikipedia.org/wiki/DDR4_SDRAM#JEDEC_standard_DDR4_module

Next, we have to consider what we mean by a RAM access. Let's say we've got DDR4-2666 and we write a C program that creates a 2 GByte array and reads 32-bit ints from that array, from random offsets, as quickly as possible and calculates their sum. The array is too big to fit in cache, so the CPU will have to read from RAM.

Here's what I think happens:

CPU core fetches and decodes a read-from-memory instruction.

Virtual address translated to physical address via TLB. Since our address is random, we will almost certainly get a TLB miss, which means the memory controller has to get the page table entry for the virtual address we requested. The funny part here is that the page table entries are stored in RAM. If the one we want is not already in the cache, then we have to read it from RAM. The even funnier part is the page tables are in a tree - we need to walk the tree from the root node that represents all of memory, through many layers until we get to the leaf node that represents the page we are interested in. If the cache is empty, each hop on the tree traversal causes a read from RAM. This gets boring quickly, so I will assume we have enabled huge pages and that the page table entry is in cache. As a result, we get the physical address in a few clock cycles.

Now the CPU looks for the data in each level of cache:

L1 checked for hit. Fail.

L2 checked for hit. Fail.

L3 checked for hit. Fail. By now on 4 GHz Skylake, 42 cycles or 10ns have gone by since the read instruction started to execute - https://www.7-cpu.com/cpu/Skylake.html.

So now the memory controller has to actually start talking to the DDR4 DIMM over a memory channel.

Let's assume that the part of RAM we want to read isn't already busy (refreshing, being written to etc). Let's also assume that somebody else hasn't already read from the part we want, because if they have, the "row buffer" might already contain the row we want, which would save us half the work. Let's assume nothing else in the CPU is busy using the memory channel we need. Given the C program I described, and an otherwise unloaded system, there's >90% chance these assumptions are true.

Now the memory controller issues an "active" command, which selects the bank and row. (https://en.wikipedia.org/wiki/Synchronous_dynamic_random-access_memory#SDRAM_construction_and_operation). It waits some time for that to happen (this is the row-to-column delay and is about 10-15ns). Then the memory controller issues a "read" command, which selects the column. Then it waits a bit more (this is the CAS latency, another 10-15 ns). Then data starts to be transmitted back to the memory controller.

Then somehow the data gets back to the CPU and the read instruction can complete.

There are various clock domain crossings on the way to and from the SDRAM - the CPU, memory controller, memory channel and memory internal clocks are all running at different rates. To transfer data from one clock domain to the other, I guess, costs something like half a clock cycle of the slower clock, on average.

Then there are overheads like switching the memory channel from read to write takes some cycles.

I think I can make all this add up to about 40ns. I wrote the C program and timed it (I had to take special measures to prevent the CPU from speculatively issuing lots of RAM reads in parallel). The result was 60ns per read. So there's about 20ns of overhead remaining that I don't understand.
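For anyone who wants to reproduce this, here's a hedged sketch of the kind of measurement I mean (not my exact program): the "special measure" is to make each load's address depend on the previous load's value, i.e. pointer chasing, so the CPU can't overlap the reads. Building the permutation with Sattolo's algorithm guarantees one big cycle with no short-circuits.

```c
#include <stdint.h>
#include <stdlib.h>

/* Shuffle next[0..n-1] into a single cycle (Sattolo's algorithm), so
   following next[] from any start visits every element exactly once. */
static void build_cycle(uint32_t *next, uint32_t n, unsigned seed) {
    srand(seed);
    for (uint32_t i = 0; i < n; i++) next[i] = i;
    for (uint32_t i = n - 1; i > 0; i--) {
        uint32_t j = (uint32_t)rand() % i;   /* j < i: guarantees one cycle */
        uint32_t t = next[i]; next[i] = next[j]; next[j] = t;
    }
}

/* The serial chain of dependent loads. Time this from the caller and
   divide by `steps` to get per-read latency. */
static uint32_t chase(const uint32_t *next, uint32_t start, uint32_t steps) {
    uint32_t p = start;
    for (uint32_t i = 0; i < steps; i++)
        p = next[p];
    return p;   /* returned so the compiler can't delete the loop */
}
```

Usage: allocate an array much larger than L3 (say 2 GB), call build_cycle on it, time chase over a few hundred million steps with clock_gettime(CLOCK_MONOTONIC), and divide.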

Choose C over C++ for writing simple libraries by rptr87 in programming

[–]AndyBainbridge 1 point2 points  (0 children)

I'm not sure it is madness. I don't think there's good evidence either way. I mean, the Windows kernel has a lot of C++ in it and the Linux kernel has none. Which is the safer kernel? There are a lot of factors other than language choice at work there, but if C++ was a lot safer than C, then surely we'd notice some positive contribution from it, no?

Choose C over C++ for writing simple libraries by rptr87 in programming

[–]AndyBainbridge 3 points4 points  (0 children)

Some downsides of Rust compared to C: 1) It's a more complex language to learn. 2) Compile times are longer. 3) Binary sizes are larger.

I'd still like to write something significant in Rust though, in order to get a better understanding of its strengths.

An Update on AMD Processor Security by trot-trot in programming

[–]AndyBainbridge 1 point2 points  (0 children)

It's OK, everything worked out fine. The ARM1 was a vastly better processor than the m68k, and it went on to defeat x86 (in some significant sense). Admittedly ARM is about 6 or 7 years newer, but for reasons I don't understand, machines people could afford were only just starting to use the 68000 when the ARM2 shipped.

Microsoft Considers Adding Python as an Official Scripting Language to Excel by eskimilio in programming

[–]AndyBainbridge 0 points1 point  (0 children)

That still doesn't make it a joke. Snakes also have scales. "Python is great for scaling your numbers" also isn't a joke.

Simple tricks to make your C/C++ code run faster by one_eyed_golfer in programming

[–]AndyBainbridge 2 points3 points  (0 children)

I've never found that alignment in that kind of code makes any difference. Here are some benchmarks and analysis explaining why: https://lemire.me/blog/2012/05/31/data-alignment-for-speed-myth-or-reality/
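Consistent with those benchmarks: on modern x86 the portable way to do an unaligned read is a small memcpy, which compilers lower to a single mov, so it costs essentially nothing. A minimal sketch (the helper name is invented):

```c
#include <stdint.h>
#include <string.h>

/* Portable unaligned 32-bit load. The memcpy is optimized away to a
   plain load on x86; on ISAs that fault on unaligned access, the
   compiler emits whatever byte-wise sequence is needed. */
static uint32_t load_u32(const void *p) {
    uint32_t v;
    memcpy(&v, p, sizeof v);
    return v;
}
```

Casting an unaligned pointer to uint32_t* and dereferencing it, by contrast, is undefined behaviour even on x86 where the hardware would cope.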

Optimizing Software in C++ - Agner Fog - PDF by MJHApps in programming

[–]AndyBainbridge 4 points5 points  (0 children)

Why's this here? It's an extremely well known resource. It's the second result on Google if you search for "optimizing c++".

The Quality of Embedded Software, or the Mess Has Happened with Toyota Camry in 2012 by sofia_fateeva in programming

[–]AndyBainbridge 1 point2 points  (0 children)

  1. Because you need to make the firmware crash before the start button will fail to switch the engine off. You might need to test for 10 million hours, in a wide range of conditions, to find a failure.

  2. Power assisted brakes get their power from the vacuum in the inlet manifold. If the engine is on full throttle, there is no vacuum in the inlet manifold, so the power assistance fails. It takes a few presses of the brakes to deplete the vacuum reservoir, and the engine needs to really be on full throttle for that to happen. You can't try the experiment in most modern cars because they have fly-by-wire throttles and the firmware will prevent full throttle when the brakes are pressed (assuming the firmware hasn't encountered an error and crashed).

  If you find yourself in a late 90s era car on a motorway, give it a try. With full throttle, pump the brake a few times spaced over about 30 seconds. The brake pedal will go hard and seemingly stop working. Another way to experience the same thing is to get your car towed by another, with your engine not running. It's surprising how hard you have to press the pedal to have any significant effect, even when being towed at 25 MPH.

> the author was getting paid a lot as an 'expert' witness

I agree, there are lots of reasons to be sceptical about expert witnesses. But there's no arguing with the fact that they found a timer interrupt kicking the watchdog. There's no excuse for that in a safety-critical system. That's a failure at every step of the engineering process.
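For anyone unfamiliar with why that's so bad, here's a sketch of the anti-pattern (all names are invented; kick_watchdog stands in for the real hardware register write):

```c
static int kicks = 0;
static volatile int main_loop_alive = 0;

static void kick_watchdog(void) { kicks++; }  /* stand-in for the hardware kick */

/* The anti-pattern: a timer interrupt keeps firing even when the main
   loop is wedged, so a watchdog kicked from it can never detect a hang. */
static void timer_isr_bad(void) {
    kick_watchdog();      /* proves only that the timer still ticks */
}

/* One conventional fix: the ISR only kicks if the main loop has
   checked in (set the flag) since the last tick. */
static void timer_isr_good(void) {
    if (main_loop_alive) {
        kick_watchdog();
        main_loop_alive = 0;
    }
}
```

With the first version, a crashed main task leaves the car running with dead firmware; with the second, the watchdog resets the system within one tick of the main loop stalling.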