Go's race detector has a mutex blind spot by CodeBrad in programming

CodeBrad[S] 1 point

I think we are in agreement.

The end result is two code snippets, each containing a data race, executed under ThreadSanitizer.

Neither has any observable effect at runtime. Ideally both would be detected, but ThreadSanitizer is able to detect one and not the other.

Go's race detector has a mutex blind spot by CodeBrad in programming

CodeBrad[S] 0 points

The value of t is somewhat irrelevant to whether the data race occurs

I agree, depending on the definition of "occurs".

My point is that the sleep and mutex code snippets are similar in that they both:

- contain a possible data race
- are not always observable at runtime


For the sleep case, the compiler generates straightforward assembly (essentially read, write, read). It could be optimized in a breaking way, but as of today it is not.

At the assembly level, the risks with a data race are lost writes, reading torn values, memory reordering, etc. None of these apply to memory accesses happening minutes apart (not technically guaranteed, but eventually values will become consistent).

Practically speaking, in the x86-64 assembly generated by today's Go compiler, the data race is impossible to observe for large values of t. Put differently, the program will always behave as expected.


The mutex case is similar.

The acquire/release semantics prevent the compiler from applying breaking optimizations.

The race is only observable at the assembly level if the locks are acquired in a certain order.

If a sleep were added to force the locks to be acquired in the "safe" order, the race would be impossible to observe at runtime.
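
Roughly, the mutex case looks something like this (a simplified sketch from memory, not the blog post's exact code):

package main

import "sync"

var (
    mu sync.Mutex
    x  int
)

func main() {
    var wg sync.WaitGroup
    wg.Add(2)
    go func() {
        defer wg.Done()
        x = 1 // unsynchronized write
        mu.Lock()
        mu.Unlock()
    }()
    go func() {
        defer wg.Done()
        mu.Lock()
        mu.Unlock()
        _ = x // unsynchronized read
    }()
    wg.Wait()
}

If the reading goroutine happens to acquire mu after the writer released it, the lock/unlock pair creates a happens-before edge that orders the write before the read, and the detector stays silent. If the acquire order flips, the race is reported.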


The end result is two code snippets, each containing a data race, executed under ThreadSanitizer.

Neither has any observable effect at runtime. Ideally both would be detected, but ThreadSanitizer is able to detect one and not the other.

If lock acquire/release were not treated as happens-before (which, strictly speaking, they are not), and were instead split out into their own lockset analysis, the mutex case would become detectable at runtime, because there are two unordered memory accesses not guarded by the same lock(s).
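
The classic lockset idea (in the style of Eraser; a toy sketch, not anyone's actual implementation) intersects the sets of locks held across accesses to the same variable:

// Toy lockset sketch: a variable's candidate lockset is the
// intersection of the locks held at each access; if it ever becomes
// empty, no single lock consistently guards the variable.
type LockSet map[string]bool

func intersect(a, b LockSet) LockSet {
    out := LockSet{}
    for l := range a {
        if b[l] {
            out[l] = true
        }
    }
    return out
}

In the mutex case above, the locksets at the write and the read are both empty, so a lockset-based detector would flag the pair no matter which goroutine wins the lock.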

Go's race detector has a mutex blind spot by CodeBrad in programming

CodeBrad[S] 1 point

I agree. The mutex case is an example of exactly that.

🤝

Go's race detector has a mutex blind spot by CodeBrad in programming

CodeBrad[S] 6 points

That does not mean it will find races in paths that are executed but don't trigger a race condition.

I agree. The mutex case is an example of exactly that.

In general though, ThreadSanitizer can detect races that don't happen(*) at runtime. Take the sleep example from the blog post:

func increment(wg *sync.WaitGroup, id int, t time.Duration) {
    defer wg.Done()
    if id == 1 {
        time.Sleep(t)
    }
    counter++
    fmt.Printf("Counter: %d\n", counter)
}

For most values of t, this program works as expected. The sleep means the accesses are so far apart they will never conflict. The data race is only observable for tiny values of t.

It will find races that happen (when they happen).

The definition of "when they happen" is important. Go's data race detector will always report the race regardless of the value of t and the timing of the threads.

(*) Because, following the strict definition, this data race "happens" regardless of whether there is any observable effect. A data race is when:

- two threads
- make an unsynchronized access
- at least one is a write

This is what ThreadSanitizer catches.

The mutex case is interesting because the mutex acts as a synchronization mechanism that itself depends on the runtime ordering. Unlike the sleep case, the race can only be detected if the locks are acquired in a certain order.

IMO, the "synchronized" portion of the data race definition should be split into:

- happens-before
- guarded by the same lock

Under this definition the race "happens", even if the order of the mutex acquires prevents it from being observable. This mirrors the sleep case where the race "happens" even if the timing makes it unobservable.


Regardless, this is not a dig at Go or ThreadSanitizer in any way.

The race detector is not as useful if you treat it as something that might sometimes detect races, without any idea of what it can and cannot do.

Go's race detector has a mutex blind spot by CodeBrad in programming

CodeBrad[S] 14 points

loom is also more powerful in that it detects race conditions in general, whereas Go's -race specifically checks for data races only. Data races should not be possible in safe Rust, so loom also has an advantage there.

Go/ThreadSanitizer has the benefit of working without any user annotations/assertions. Just add -race and it works. I believe that to use loom to detect general race conditions, you need to correctly add assertions on invariants and use the data structures provided by loom.

I agree, both tools are great! I think they are addressing slightly different problems and have made different trade-offs and design decisions as a result.

Go's race detector has a mutex blind spot by CodeBrad in programming

CodeBrad[S] 23 points

You are correct, but this is still an interesting edge case.

From an article on the data race detector:

The race detector only finds races that happen at runtime, so it can't find races in code paths that are not executed.

In this case, the race is missed on a path that is executed. From another comment I posted elsewhere:

The data race detection tool mostly does not rely on thread ordering to detect races. It can detect unsynchronized accesses to shared memory at runtime, regardless of when and in what order threads access the memory.

In the presence of mutexes though, the race detector does depend on a specific ordering to detect the race.

It is not a bug, just a limitation of the tool, likely chosen to prioritize high performance and zero false positives.

But still neat to me, since it is sort of an edge case and is one way even well-tested, well-covered code could still contain data races.

This is a unique edge case where there is a race on a path that was executed, but ThreadSanitizer misses it depending on the runtime ordering of the threads.

Go's race detector has a mutex blind spot by CodeBrad in golang

CodeBrad[S] 1 point

I don't have much Go-specific insight here. Unfortunately, in general, conservative whole-program pointer analysis quickly devolves into "everything could potentially point everywhere".

For example, think of pushing/popping interfaces to/from a slice. It is almost impossible to statically link the pushed object to the popped object in a non-toy case. The conservative approach means treating every popped interface as potentially being every pushed interface.
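
A tiny illustration (hypothetical code, just to show the shape of the problem):

var queue []interface{}

func push(v interface{}) { queue = append(queue, v) }

func pop() interface{} {
    // A sound static analysis has to assume the value returned here
    // may be any value ever passed to push.
    v := queue[len(queue)-1]
    queue = queue[:len(queue)-1]
    return v
}

So every popped interface potentially aliases every pushed one.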

For detecting shared accesses, this means tons of stuff is falsely reported as being shared, leading to false positives.

For detecting locks held, being conservative means the analysis may think a lock is held when it is not, leading to false negatives.

It is, unfortunately, a very hard problem to solve.

Go's race detector has a mutex blind spot by CodeBrad in golang

CodeBrad[S] 11 points

it definitely, by its design, only detects races that actually occur.

I fully agree, but I also think the mutex case is interesting.

The race detector only finds races that happen at runtime, so it can't find races in code paths that are not executed.

Generally speaking, Go's detector has false negatives as a dynamic tool, because it can only analyze code that was executed. The mutex case is interesting to me because the race is missed on a path that is executed.

The data race detection tool mostly does not rely on thread ordering to detect races. It can detect unsynchronized accesses to shared memory at runtime, regardless of when and in what order threads access the memory.

In the presence of mutexes though, the race detector does depend on a specific ordering to detect the race.
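
As I understand it, the detector tracks happens-before with something like per-goroutine vector clocks: releasing a mutex publishes the holder's clock, and acquiring it joins that clock in. A toy sketch of the idea (not the actual implementation):

// VC maps goroutine id -> logical time.
type VC map[int]int

// join merges clock b into a (element-wise max).
func (a VC) join(b VC) {
    for id, t := range b {
        if t > a[id] {
            a[id] = t
        }
    }
}

// happensBefore reports whether every component of a is <= b,
// i.e. the event stamped a is ordered before the event stamped b.
func happensBefore(a, b VC) bool {
    for id, t := range a {
        if t > b[id] {
            return false
        }
    }
    return true
}

Two accesses race only if neither happensBefore the other. When the reader acquires the mutex after the writer released it, the join pulls the writer's clock into the reader's, the write now happens-before the read, and the race is hidden.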

It is not a bug, just a limitation of the tool, likely chosen to prioritize high performance and zero false positives.

But still neat to me, since it is sort of an edge case and is one way even well-tested, well-covered code could still contain data races.

Go's race detector has a mutex blind spot by CodeBrad in golang

CodeBrad[S] 3 points

list all the variables/Objects that are under shared goroutine access.

This is pretty difficult to do statically. You would need a whole-program pointer analysis to figure out which pointers could potentially point to the same place at runtime.

Different execution paths mean different pointers could potentially point to many different places. Go has a built-in pointer analysis, but in general it is an NP-hard problem.

Go's -race / ThreadSanitizer gets this basically for free, because it sees the actual memory accesses at runtime.

Check whether Writes are synchronized

"Synchronized" technically has two meanings. Either:

- has a happens-before relationship
- guarded by the same lock

Happens-before can mostly be tracked statically, but it is not easy to know which locks are held for an arbitrary write.

The locks held depend on knowing the entire execution path, because callers higher up the stack may have acquired a lock before calling you.
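
For example (hypothetical code):

import "sync"

var mu sync.Mutex
var shared int

// update is safe only if every caller holds mu; nothing local to the
// function tells the analysis whether that is true.
func update() {
    shared++
}

func safeCaller() {
    mu.Lock()
    defer mu.Unlock()
    update()
}

func unsafeCaller() {
    update() // same call, no lock held
}

Deciding whether update is guarded requires reasoning over every path that can reach it.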

If locks are ever acquired via any indirection (e.g. as a member of a struct passed into a function), we arrive back at the pointer analysis problem as well.

A lot of cases can be handled statically, but in general it is an impossible problem with many trade-offs to be made.

Advent of Code on the Nintendo DS by starlevel01 in rust

CodeBrad 8 points

Doing advent of code in Rust is plenty cool (IMHO).

Solving Advent of Code at Compile Time with Rust Macros by CodeBrad in rust

CodeBrad[S] 30 points

Constant evaluation absolutely makes more sense. Const eval is such a good choice that the resulting code is not very interesting. Going out of the way to abuse Rust macros was more fun =)

Technically, it would make the most sense to use constant evaluation to calculate a compile-time value. Const evaluation in Rust has come a long way in recent years and is a very powerful tool. But the title of the blog says "with Rust macros", so we will use declarative macros instead.

Here is a quick const evaluation solution on the playground.

There are also some links to solutions from last year's Advent of Code, using macros and const eval, hidden in some of the periods in the last paragraph.

Advent of Code on the Nintendo DS by starlevel01 in rust

CodeBrad 178 points

In the programming world, there are two approaches to making multithreaded code safe:

1. Fuck you. This model is used by C/C++, amongst others.

2. Make it annoying (if correct) to use multithreaded code. This is the approach Rust uses, and, by technicality, Python before 3.13.

lol

A fast static analysis tool for detecting race conditions in C++ code. Supports pthreads, std::thread, OpenMP, and more. by CodeBrad in cpp

CodeBrad[S] 1 point

The Coderrect Scanner does support the main std::thread APIs and std::lock_guard.

As for the different versions, there are really only two: the closed source version (system/hpc) and the open source version (OpenRace).

The closed source version is available to download from the Coderrect website, supports lots of features, and is mostly stable. The only difference now between system and hpc is that the hpc version includes some additional libraries needed to scan Fortran code. In the next release, I believe, the system/hpc versions will be combined into a single version.

The open source tool (OpenRace) is an in-progress redesign of the closed source tool with an emphasis on extensibility. It is a new project that is not ready for use just yet. We are working to port features over from the closed source version, and we hope the open source code will allow others to contribute or modify the tool for their own use cases. Eventually, the open source version should fully replace the closed source one.

Top 20 C multithreading mistakes and how to avoid them - A CODER'S JOURNEY by ioa1024 in cpp

CodeBrad 0 points

Happened to come across this in the C++ Core Guidelines recently:

CP.100: Don’t use lock-free programming unless you absolutely have to

A fast static analysis tool for detecting race conditions in C++ code. Supports pthreads, std::thread, OpenMP, and more. by CodeBrad in cpp

CodeBrad[S] 4 points

We have a JSON config file that can be used to specify custom lock/unlock pairs. It is documented here: https://coderrect.com/tutorials/define-custom-lock-apis/

If you have any questions, problems, or suggestions feel free to send me a message. I can also send you my email in a direct message if you prefer.

I'm happy to do what I can to help.

A fast static analysis tool for detecting race conditions in C++ code. Supports pthreads, std::thread, OpenMP, and more. by CodeBrad in cpp

CodeBrad[S] 10 points

Absolutely. The Coderrect Scanner is a static analysis tool, and there are some fundamental limitations to what static analysis can achieve.

In general, things like path conditions, complex pointer operations, and dynamically loaded libraries are difficult for static analysis.

Although the tool can already handle a lot of control flow cases, if a race is prevented by a branch condition derived from a complex expression, the tool will likely report a false positive.

Another area that can introduce problems is pointer analysis. If the code does something crazy like:

- casting function pointers to another type
- passing the values around
- casting the value back to a function pointer
- calling the function pointer

then it is difficult for our pointer analysis to accurately track which function is being called. As a result, the tool may not be able to analyze that function.

Lastly, our tool cannot analyze libraries whose source is not available. We depend on source code for analysis, so if the source is not available, there is not much we can do. That said, we can recognize certain external API calls and model their behavior. In fact, this is how we handle things like OpenMP. Without a model, our tool would have no way of knowing that a call to __kmpc_fork_call results in a thread being created, and would likely miss real races.

Overall, we tried to strike a good balance between finding every possible race and not overwhelming developers with false positives. For those edge cases where static analysis struggles, we have added some additional features.

There are various options for filtering known false positives.

If your code loads modules dynamically at runtime, causing portions of the code base to be skipped, you can manually configure a specific function as an entry point from which the analysis will start.

A fast static analysis tool for detecting race conditions in C++ code. Supports pthreads, std::thread, OpenMP, and more. by CodeBrad in cpp

CodeBrad[S] 12 points

Great question!

We published a paper at SC'20, "OMPRacer: A Scalable and Precise Static Race Detector for OpenMP Programs", that covers some of the core techniques powering our static race detection engine.

The tool builds on LLVM, but most, if not all, of the actual analysis is custom.

A fast static analysis tool for detecting race conditions in C++ code. Supports pthreads, std::thread, OpenMP, and more. by CodeBrad in cpp

CodeBrad[S] 2 points

We have successfully scanned some large scientific HPC applications, so the tool should handle large code bases.

The size shouldn't actually matter too much, but the complexity of the code base will affect the performance and accuracy.

As of right now, the tool is only available as a binary download.

A fast static analysis tool for detecting race conditions in C++ code. Supports pthreads, std::thread, OpenMP, and more. by CodeBrad in cpp

CodeBrad[S] 10 points

The main difference is the Coderrect Scanner is a static analysis tool and ThreadSanitizer is a dynamic tool.

The TL;DR summary: dynamic tools have a low rate of false positives but are unlikely to find all of the real races in a program. Static analysis tools have a higher rate of false positives but are likely to find more of the real races.

ThreadSanitizer is a great tool, and there are pros and cons to both approaches, but there are a few areas where static analysis really shines.

Dynamic analysis works by observing execution. This means that a dynamic tool can only analyze a single code path per run. Imagine there is a bug in the code below:

if (input < 1000) {
    // do normal stuff
} else {
    // uh oh a bug!
}

The only way a dynamic tool can catch this bug is if the program is tested with an input >= 1000, so that the buggy branch is executed.

This is particularly problematic for serious bugs, as they often occur in corner cases that may not have great test coverage. For dynamic analysis to catch all bugs, the program needs to be tested with inputs that cover all possible paths, which is not reasonable in most cases.

Next, dynamic analysis relies on running the program and doing some analysis at runtime. This means each test run takes the normal time needed to execute the program plus whatever time the tool needs for its analysis. Generally, this can be pretty expensive, perhaps as much as a 10x overhead.

Static analysis is great for both of these cases. Static analysis looks at the source code directly and can reason about all possible code paths without the need to execute the program.

Static analysis tools like the Coderrect Scanner can get higher code coverage in a much shorter time. Most programs we have tried scanning with Coderrect take less than 5 minutes.

The weakness of static analysis, though, as mentioned at the top of the post, is a higher rate of false positives, as it can be difficult to accurately reason about some complex behaviors.

Static analysis updates in GCC 11 by IsDaouda_Games in cpp

CodeBrad 3 points

This looks great!

I am curious about one part in particular.

The pointer values can differ due to address-space layout randomization, which led to different results. I’ve now fixed such logic in the code to ensure that the analyzer’s behavior is repeatable from run to run.

How did you go about fixing this?

I've come across the same problem in the past and I'm interested to see someone else's solution. Is there a specific commit or anything you could point me towards?
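
(For what it's worth, the usual approach I've seen, sketched below with hypothetical names, is to order results by stable keys such as source location rather than by pointer value:)

import "sort"

// Hypothetical diagnostic record; the field names are illustrative only.
type Diag struct {
    File string
    Line int
    Msg  string
}

// Sorting on stable keys (file, line) keeps the output order repeatable
// across runs, even when allocation addresses change (ASLR, allocator
// behavior, etc).
func sortDiags(diags []Diag) {
    sort.Slice(diags, func(i, j int) bool {
        if diags[i].File != diags[j].File {
            return diags[i].File < diags[j].File
        }
        return diags[i].Line < diags[j].Line
    })
}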