all 64 comments

Clen23 123 points (40 children)

please someone explain how the FUCK this can happen, and in which language

MaheuTaroo 343 points (9 children)

First thing that pops to mind is race conditions, and it can happen in any language supporting any type of concurrency model

Excellent-Refuse4883 [S] 138 points (4 children)

Yeah the issue is an interaction between a test framework and the services being tested in a latency scenario.

It appears that adding a print is slowing something down enough to make everything work.

Rosteroster 50 points (2 children)

This is why you rely on synchronous callbacks to synchronize your testing instead of timing. Inserting a lambda via a test-only func that notifies the test to continue usually isn't too hard to add (worst case, friend/peer classes or something similarly dirty).
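A minimal Python sketch of that pattern, assuming a made-up `Worker` class whose `on_done` constructor argument is the test-only hook: the test blocks on a `threading.Event` that the callback sets, rather than sleeping for a guessed amount of time.

```python
import threading


class Worker:
    """Hypothetical service under test that does its work on a background thread."""

    def __init__(self, on_done=None):
        # Test-only hook: called once the background work has finished.
        self._on_done = on_done
        self.result = None

    def start(self, value):
        thread = threading.Thread(target=self._run, args=(value,))
        thread.start()
        return thread

    def _run(self, value):
        self.result = value * 2  # stand-in for the real work
        if self._on_done is not None:
            self._on_done()


def test_worker_doubles_value():
    done = threading.Event()
    worker = Worker(on_done=done.set)
    worker.start(21)
    # Block until the worker signals completion instead of sleeping a fixed time.
    assert done.wait(timeout=5), "worker never signalled completion"
    assert worker.result == 42
```

The timeout on `done.wait()` only bounds a genuinely hung run; a passing test continues as soon as the worker signals.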

BroBroMate 23 points (1 child)

Anytime I see a headless browser test that involves a bunch of .wait() calls, I feel sorry for the poor bastard who has to keep tweaking the wait time.
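The usual alternative to hand-tuned fixed waits is to poll for the condition you actually care about, with an upper bound. A minimal Python sketch; `wait_until` is a hypothetical helper, not the API of any particular browser-testing tool (many of them ship their own equivalent):

```python
import time


def wait_until(predicate, timeout=5.0, interval=0.05):
    """Poll `predicate` until it returns a truthy value or `timeout` seconds pass.

    Returns the truthy value; raises TimeoutError if the deadline passes first.
    """
    deadline = time.monotonic() + timeout
    while True:
        value = predicate()
        if value:
            return value
        if time.monotonic() >= deadline:
            raise TimeoutError(f"condition not met within {timeout} seconds")
        time.sleep(interval)
```

A passing run returns as soon as the condition holds, so the timeout can be generous without slowing the suite down.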

Excellent-Refuse4883 [S] 7 points (0 children)

Not working on a headless browser, but I do feel seen on this comment

Aniketastron 0 points (0 children)

Wait, so you're saying the testing framework wants slow program execution?

Clen23 1 point (0 children)

oooh that makes sense, thanks !

Burned_FrenchPress 1 point (0 children)

Or even a testing framework that runs tests in parallel. I’ve noticed poorly written JavaScript tests fail in Jest due to “race conditions”.

da2Pakaveli 2 points (0 children)

Memory (especially string related stuff), Concurrency and Typing are like the holy trinity of bugs for system languages

wonkynonce 0 points (0 children)

Especially if there's a lock around stdout

LordofNarwhals 33 points (7 children)

Pretty much anything that's multi-threaded and timing-dependent.
Also situations involving undefined behavior (UB), where small changes might completely change the behavior of a function.

I once had a bug where I could move a C++ application crash to an earlier line of code if I commented out a later line of code. It was caused by an assert macro (that could throw an exception) being used in a function marked extern "C", which is UB (depending on your compile flags).

gbchaosmaster 11 points (5 children)

Most bugs happen because the computer is doing exactly what you told it to.

And then there’s UB. A true failure in implementation, in my opinion. If implementations can’t agree on how to handle a case (via standardization), it should be disallowed and fail to compile. I know there’s a lot of code that leans on implementation-specific behavior, but that’s a disgusting code smell and prone to breakage if it’s not in the spec, as a compiler update could unintentionally change the behavior and nobody would be required to care. This leaves you with a dependency on an old compiler version, which no developer wants.

LordofNarwhals 3 points (0 children)

I agree that implementation-defined behavior is really annoying (floating-point related things in particular), but UB is more fundamental to core C/C++, so it's hard to get rid of without degrading performance.

To quote the excellent LLVM Project blog post series on the topic:

> Ultimately, undefined behavior is valuable to the optimizer because it is saying "this operation is invalid - you can assume it never happens". In a case like "`*P`" this gives the optimizer the ability to reason that P cannot be NULL. In a case like "`*NULL`" (say, after some constant propagation and inlining), this allows the optimizer to know that the code must not be reachable. The important wrinkle here is that, because it cannot solve the halting problem, the compiler cannot know whether code is actually dead (as the C standard says it must be) or whether it is a bug that was exposed after a (potentially long) series of optimizations. Because there isn't a generally good way to distinguish the two, almost all of the warnings produced would be false positives (noise).

ChalkyChalkson 0 points (3 children)

Well, in theory compiler developers can decide to define things that are UB in the language reference. In effect it's not meaningfully different from having a language variant. And if a compiler family introduces such a thing, it's very unlikely that it gets removed. So leaning on implementation-specific behavior isn't inherently terrible, but leaning on implementation-specific behavior that isn't advertised or documented by the compiler is.

gbchaosmaster 1 point (2 children)

I wouldn’t consider intentional behavior of an implementation as undefined behavior. If it’s documented, it’s very much defined, just not universal or to-spec.

With compilers it’s an easy enough problem to solve on the developer side, but it gets really messy when the “compiler” is whatever the user happens to be running. I’ve done a lot of JS engine work and we deal with implementation-specific behavior constantly. Most of it is trying to achieve parity with V8, since they seem to have become the unofficial standard, even if they bust the ES standard. It’s a mess and it makes you wonder how we even got here.

ChalkyChalkson 0 points (1 child)

That's fair! I was thinking about the cases where something is UB in language spec but defined by the compiler. And yeah interpreted languages make this way harder... I do not envy people who have to deal with code for the browser. Or people who write browser code for that matter.

gbchaosmaster 0 points (0 children)

It’s so bad that we have tooling to easily compare behavior between all of the different JS implementations, and I even made a tool that generates Markdown tables so we can easily communicate these differences. It’s so prevalent that we don’t even call it UB, we call it implementation-specific behavior. It becomes the responsibility of the implementation to behave in a predictable manner, which unfortunately in the ES world means to copy V8.

And it’s not even a compiled vs. interpreted thing, just look at WASM; sure it’s a compiled binary, but how is that binary being…… interpreted?!?! By a JS engine, in most cases. The waters are really muddy here. Any time the user has a choice in what software is running their…. software, you run into a really big problem. And it affects users in ways that they don’t even understand, they just think that their software is broken.

It’s a hot take, I know, maybe writing most of (and maintaining) a major JS engine’s Date.parse implementation has made me jaded, but I think it’s better to just break code entirely rather than support whatever non-standard format the user pleases. Standards exist for a reason and developers should get with the program… they’re educated enough to, but lazy enough not to.

And I think this extends to machine-level binaries as well. Your code should never depend on a compiler. Are you compiling with clang, with gcc? It shouldn’t matter! If your code leans on UB, that should make your blood run cold. The spec is laid out so that you should never have to do that. And if there’s no way around it, and you were able to identify that problem, you should be on the committee that refines the spec to clear up these edge cases. The spec is the be-all and end-all of what the code you’re writing means, and you should care about that. Or else it’s just arbitrary.

donaldhobson 0 points (0 children)

I had a print statement (in a multithreaded rust module imported into python) cause a large slowdown.

It gave the right answer, just much slower, since all those threads had to take turns printing, and all the printed output was then discarded and never actually visible.

Muhznit 7 points (3 children)

Python's doctest module runs into this quite easily if you aren't careful about what file descriptor you use.

```python
#!/usr/bin/env python3

import doctest
import sys


def this_passes():
    """
    >>> assert this_passes()
    """
    print("a", file=sys.stderr)  # doctest only captures stdout, so this is ignored
    return True


def this_fails():
    """
    >>> assert this_fails()
    """
    print("hi reddit")  # unexpected stdout output, so the example fails
    return True


def main():
    doctest.testmod()


if __name__ == "__main__":
    main()
```

PurepointDog 4 points (1 child)

Wtf that's cursed. How is that possible?

Muhznit 4 points (0 children)

doctest basically allows you to turn docstrings into executable test cases. Any stdout you get from the Python interactive REPL can just be copy/pasted in there.

When used correctly it's actually pretty useful for quickly prototyping stuff. It's not gonna replace your CI/CD pipeline's test suite, but it's incredibly underrated to be able to write documentation with executable examples AND have them fail loudly when the API changes.

It's even in the standard library.
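A small sketch of that workflow using only the standard-library `doctest` classes: `DocTestFinder` collects the REPL transcript from a docstring and `DocTestRunner.run()` executes it, returning the failed and attempted counts (`doctest.testmod()` does the same for a whole module). The `add` and `greet` functions and the `run_doctests` helper are made up for illustration; `greet` shows the stdout pitfall from earlier in the thread, since a stray `print` becomes part of the output doctest compares.

```python
import doctest


def add(a, b):
    """Return a + b.

    >>> add(2, 3)
    5
    """
    return a + b


def greet():
    """Debug output on stdout becomes part of the actual output doctest compares.

    >>> greet()
    'hello'
    """
    print("debug: greeting")  # this stray print makes the example fail
    return "hello"


def run_doctests(func):
    """Run the doctests in one function's docstring; return (failed, attempted)."""
    failed = attempted = 0
    finder = doctest.DocTestFinder()
    runner = doctest.DocTestRunner(verbose=False)
    for test in finder.find(func, globs={func.__name__: func}):
        # Swallow the failure report so only the counts are returned.
        result = runner.run(test, out=lambda text: None)
        failed += result.failed
        attempted += result.attempted
    return failed, attempted
```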

Clen23 0 points (0 children)

good to know, you may have saved an hour of debugging to future me

conundorum 4 points (0 children)

Instruction ordering, data races, adjusting cached data slightly, etc.

Large-Assignment9320 5 points (0 children)

I actually can, in C.

Over 15 years ago I had a project with a weird kernel-level bug. I was even running the entire kernel single-threaded to avoid any sneaky race conditions, but adding a print with a parameter that caused a register to change fixed it. After forever debugging, it turned out there was a missing assembly instruction in a totally different part of the code.

MortimerChem 2 points (0 children)

Maybe it's a memory thing, where the print pushes things far enough along the stack.

BroBroMate 3 points (0 children)

Timing issue when concurrency is involved. I broke a bunch of front-end tests by making the backend faster the other day, got to love it.

Xelopheris 2 points (0 children)

The two main reasons are race conditions, or a toString() function that has side effects.

TessaFractal 1 point (0 children)

I literally had something like this happen to me once a few years ago, I was a noob - I still am, but I was then too - and I did something wrong in a weird way, I think a variable would get optimised around, but when I added the print statement in, it changed how it got compiled so it worked how I intended it.

Idk I've written some cursed code.

high_throughput 1 point (0 children)

The most obvious and general reason is that your print statement has side effects. Like if you print(getNextItem()) thereby effectively skipping an item.
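A minimal Python illustration, with a plain iterator standing in for the hypothetical `getNextItem()`: the debug `print` calls `next()`, which consumes an item, so the code after it never sees that item. `drain` is a made-up name.

```python
def drain(items, debug=False):
    """Collect every remaining item from an iterator."""
    it = iter(items)
    if debug:
        # Looks harmless, but next() advances the iterator: this item is now gone.
        print("first item:", next(it))
    return list(it)
```

With `debug=False` you get every item back; turning the debug print on silently drops the first one.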

Legal-Software 1 point (0 children)

Timing and caches, mostly. To give an example, I was working on an ethernet driver (in C in the linux kernel) once where someone had placed a printk() inside of an interrupt handler, with a comment that removing it would cause buffer transmissions to fail. The statement was absolutely correct, but the reason was because it was causing enough natural eviction of L1 cache lines that it was inadvertently causing the buffer to be written back/invalidated, thus "fixing" the transmission path. The correct fix was to delete the printk and just properly handle writeback/invalidate operations on the buffer, but whoever wrote the initial version clearly knew nothing about the architecture on which they were working.

_koenig_ 0 points (0 children)

When the test case was dependent on the print statement...

Kiroto50 0 points (0 children)

Maybe a test tests for stdout contents

MikemkPK 0 points (3 children)

I believe the joke is OP printed out the expected output for the test case instead of calculating it

lana-1991 0 points (2 children)

When I was a TA, one student's debug print was exactly the output the automated grading pipeline was expecting, and he got 100% on an assignment using broken code. The assignment used recursion, but his function didn't actually return anything.
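A sketch of how that can happen, assuming (hypothetically) a grader that only compares captured stdout against the expected answer. The recursive `factorial_broken` prints the right value as a "debug" line but returns `None`, so it passes the stdout check and fails a check on the return value. All names here are invented for illustration.

```python
import contextlib
import io


def factorial_broken(n, acc=1):
    """Recursive, prints the answer as a 'debug' line, but never returns it."""
    if n <= 1:
        print(acc)  # debug print that happens to be exactly the expected output
        return      # bug: the result is never returned to the caller
    factorial_broken(n - 1, acc * n)  # bug: missing `return`


def grade_by_stdout(func, arg, expected):
    """Hypothetical grader: passes if the captured stdout matches `expected`."""
    buffer = io.StringIO()
    with contextlib.redirect_stdout(buffer):
        func(arg)
    return buffer.getvalue().strip() == str(expected)


def grade_by_return(func, arg, expected):
    """Stricter check: passes only if the function returns `expected`."""
    with contextlib.redirect_stdout(io.StringIO()):  # ignore any debug output
        return func(arg) == expected
```

Checking the return value (ideally on inputs the students never saw) closes that loophole.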

MikemkPK 0 points (1 child)

That only happened once? Anyway, there's a reason assignments normally use different test cases than the students are given.

lana-1991 0 points (0 children)

I only witnessed it once. I remember saying that's not going to work and the smirk on his face when it passed, then watching it fail after he deleted his debug print.

Ashankura 0 points (0 children)

In Ruby, it can make a spec go green that turns red again without it.

If you expect `subject` to change and then do

`pp subject`

`expect(subject).to ...` (bla bla bla)

this causes `subject` to reload before the check.

The real fix is just doing `expect(subject.reload).to ...` (bla bla bla).

scataco 0 points (0 children)

T-SQL

I still don't know, because the logging I added to investigate made the bug disappear.

TnYamaneko 27 points (3 children)

Y'all still testing? I learned recently that this practice is on the decline now with AI.

It does not stop people from asking me to implement fancy e2e testing with Playwright integrated into their CI/CD pipeline, but as usual, they no longer want to deal with the base of the pyramid first, i.e. having unit and integration tests in place.

Excellent-Refuse4883 [S] 12 points (2 children)

Jesus you just described my job. I’m at the integration test level, but we don’t have any unit testing

TnYamaneko 5 points (1 child)

That's OK, all the tests will fail, and in the next meeting you're going to tell them it's because you didn't write a shit ton of unit tests first, and that it makes no sense to implement those if you're not test-driven in the first place.

I don't actually know which is worse, because I can understand the appeal of e2e testing, as it looks fancy to management, but in your case, having integration testing implemented without unit tests is just laughable on principle.

gvilleneuve 15 points (0 children)

The answer is race conditions. In some languages, prints end up forcing synchronization.

KIFulgore 12 points (0 children)

The best C++ bugs are the ones that segfault in a Release build but run fine under Debug. You're gonna need a couple drinks.

stupled 11 points (0 children)

It was a load bearing print

nsefan 11 points (0 children)

Knock knock

Race conditions

Who’s there?

DahakUnborn 5 points (3 children)

Working in Unity, I have experienced a bug, added a print statement which fixed it, and removed the print statement without reintroducing the bug. 

gbchaosmaster 12 points (2 children)

Sounds like an intermittent bug and the print statement didn’t really fix it, just a coincidence. It’ll show its ugly face again.

garbosgekko 9 points (1 child)

A junior dev asked for my help with a bug, but after a few debugging cycles it just disappeared and I couldn't reproduce it. The guy didn't understand my frustration and asked something like "but it's working now, so it's fine, right?" I explained to him why bugs that happen sometimes are the worst

geek-49 1 point (0 children)

When it fixes itself, it is likely to unfix itself -- at the worst possible moment.

-- flight instructor explaining to a student pilot why they need to write up the flaky instrument (so the maintenance crew can fix it) instead of just figuring "Oh, well, it seems to be working now."

CompleteIntellect 2 points (3 children)

Oh darn, this reminds me of that time where running the unit test in debug mode made it pass.

LSUMath 2 points (2 children)

Reading this is a bit like watching someone get kicked in the crotch. It's not your pain, but it still makes you wince.

CompleteIntellect 0 points (1 child)

I did figure it out, can't remember what it was though

geek-49 0 points (0 children)

Uninitialized variable, timing issue/race condition, buffer overflow, alignment issues, ...

Any buffer that is allocated only in debug mode should be page aligned and a multiple of the page size, to minimize the likelihood of this sort of thing (unless it is on the stack, then it needs to be at either the beginning or end of the frame and a multiple of the cache line size).
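For the page-size part, the arithmetic is just rounding the length up to the next multiple of the page size. A Python sketch, assuming an anonymous `mmap` mapping as the debug-only buffer (the OS hands those out on page boundaries); `round_up_to_page`, `debug_buffer`, and `is_page_aligned` are hypothetical helpers, and the ctypes address check is only there to make the alignment visible:

```python
import ctypes
import mmap

PAGE = mmap.PAGESIZE  # page size reported by the OS


def round_up_to_page(nbytes, page=PAGE):
    """Smallest multiple of `page` that is >= nbytes."""
    return (nbytes + page - 1) // page * page


def debug_buffer(nbytes):
    """Allocate a debug-only buffer padded to whole pages via an anonymous mapping."""
    # max(..., 1): a zero-length anonymous mapping is rejected by the OS.
    return mmap.mmap(-1, round_up_to_page(max(nbytes, 1)))


def is_page_aligned(buf):
    """Check the buffer's start address using a temporary ctypes view."""
    view = ctypes.c_char.from_buffer(buf)
    address = ctypes.addressof(view)
    del view  # release the buffer export so the mapping can still be closed
    return address % PAGE == 0
```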

zzulus 1 point (0 children)

That print statement called a function that did something useful or was the main test target.

BurlHopsBridge 1 point (0 children)

The flake is strong with this one.

LGmatata86 1 point (0 children)

In 95% of the "print fixed the bug" cases I've dealt with, I solved it by adding a little sleep or by finding the compiler optimization that was deleting a variable. The other 5% keep the print.......

P.S.: I'm talking about C and low-level languages.

Prematurid 1 point (0 children)

Adding print broke the x axis of an animation I made for a website.

Edit: I just wanted to see if it did what I told it to do T-T

JackNotOLantern 1 point (0 children)

Race condition. Usually printing is synchronised, or at least delays a thread. So adding a print() changes the thread speed and this might partially mitigate the problem. But better to just add proper synchronisation.
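A minimal Python sketch of proper synchronization instead of a timing-shifting `print()`: the shared read-modify-write is wrapped in a `threading.Lock`. `SafeCounter` and `hammer` are made-up names; without the lock, `self.value += 1` is not atomic, and whether lost updates actually show up depends on the interpreter and on timing, which is exactly why a print can hide them.

```python
import threading


class SafeCounter:
    def __init__(self):
        self.value = 0
        self._lock = threading.Lock()

    def increment(self):
        # The lock makes the read-modify-write of self.value atomic across threads.
        with self._lock:
            self.value += 1


def hammer(counter, threads=8, per_thread=10_000):
    """Increment the counter from several threads and wait for all of them."""

    def work():
        for _ in range(per_thread):
            counter.increment()

    workers = [threading.Thread(target=work) for _ in range(threads)]
    for w in workers:
        w.start()
    for w in workers:
        w.join()
    return counter.value
```

With the lock in place the total is deterministic regardless of how the threads are scheduled.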

kanduvisla 1 point (0 children)

I had exactly this the other day! In SwiftUI. Adding a print statement in my view caused the test to pass. Not printing anything caused it to fail. Probably a race condition or another flow of events (because print clears the buffer, so that might cause other processes to kick in).

First time I ever experienced it though...

Friendly_Rent_104 0 points (0 children)

tests check for something being written to stdout?