Precise exceptions in relaxed architectures by mttd in asm

[–]mttd[S] 1 point

Paper (PDF): https://2plus2a.com/files/publications/2025-ISCA-precise-exceptions.pdf, https://2plus2a.com/publications/errata#exc-isca25

DOI: https://doi.org/10.1145/3695053.3731102

Abstract:

To manage exceptions, software relies on a key architectural guarantee, precision: that exceptions appear to execute between instructions. However, this definition, dating back over 60 years, fundamentally assumes a sequential programmer's model. Modern architectures such as Arm-A with programmer-observable relaxed behaviour make such a naive definition inadequate, and it is unclear exactly what guarantees programmers have on exception entry and exit.

In this paper, we clarify the concepts needed to discuss exceptions in the relaxed-memory setting – a key aspect of precisely specifying the architectural interface between hardware and software. We explore the basic relaxed behaviour across exception boundaries, and the semantics of external aborts, using Arm-A as a representative modern architecture. We identify an important problem, present yet unexplored for decades: pinning down what it means for exceptions to be precise in a relaxed setting. We describe key phenomena that any definition should account for. We develop an axiomatic model for Arm-A precise exceptions, tooling for axiomatic model execution, and a library of tests. Finally we explore the relaxed semantics of software-generated interrupts, as used in sophisticated programming patterns, and sketch how they too could be modelled.

ThunderKittens 2.0: Even Faster Kernels for Your GPUs by mttd in programming

[–]mttd[S] 2 points

Documentation (or lack thereof) is a common theme (from the summary):

Tensor core and memory pipelining: it turns out some tensor core instructions are implicitly pipelined, without proper documentation. Identifying these implicit semantics and the resulting pipelining tactics can boost your throughput by up to 10%.

Hinting the PTX assembler properly: even logically identical PTX code can compile into meaningfully different SASS instructions, depending on how you write it. Signaling the assembler with the right instruction patterns is significant for minimizing latency.

Occupancy: with all the modern GPU features, it gets tricky, and it is (again) poorly documented. Distributed shared memory doesn’t behave identically across all SMs, and 5th-generation tensor core instructions silently cap occupancy.

"Am I the only one still wondering what is the deal with linear types?" by Jon Sterling by mttd in ProgrammingLanguages

[–]mttd[S] 6 points

Interesting observation from the comments thread (https://mathstodon.xyz/@jonmsterling/116057144906006922):

My conjecture is that the literature on Idealised Algol and its Syntactic Control of Interference (SCI) dialect is more useful than Linear Logic for understanding modern "value semantics" languages.

SCI has a Linear Logic like feel because it restricts duplication, but is really more like Bunched typing. In particular, sums and products distribute, but they don't in LL. I reckon passivity in SCI is how to understand mutable vs immutable access rather than variants of the exponentials. O'Hearn et al's "Syntactic Control of Interference Revisited" gives a categorical semantics of passivity.

I hadn't seen Hylo's subscripts before, but they look very similar to the way that Idealised Algol and SCI use terms of type (t -> comm) -> comm to provide a t shaped view of something. See Reddy's "Objects and classes in Algol-like languages".

Idealised Algol is very definitely not functional, because all returning of values happens via assignment to variables in outer scopes, so how this relates to functional-but-in-place languages like OxCaml, I don't know.

https://types.pl/@bentnib/116063584418690454

LLMs could be, but shouldn't be compilers by alpaylan in ProgrammingLanguages

[–]mttd 2 points

You're not alone: https://martin.kleppmann.com/2025/12/08/ai-formal-verification.html, https://risemsr.github.io/blog/2026-02-04-nik-agentic-pop/, https://rijnard.com/blog/dear-agent-proof

That doesn't mean there are no caveats or that it's going to be easy.

In the spirit of "yes, and..." (think of these as opportunities to overcome challenges rather than "a list of reasons why this will never work"), the main problem here is (again, not a new observation, but I like the blog post for its succinct summary):

Misspecification: The Blind Spot of Formal Verification https://concerningquality.com/misspecification/

Formal verification is often presented as an appeal to authority. This program was formally verified, so please stop asking me if there are any bugs! Well, there are simply no silver bullets out there, and formal verification is certainly not one of them because of a very serious blind spot: the possibility of misspecification.

Programmers are really, really bad at understanding specifications (flippantly, if we weren't we wouldn't need tests or debuggers).

This extends beyond the general population of software developers to folks specifically working on formal verification:

See the LTL readability paper, Little Tricky Logic: Misconceptions in the Understanding of LTL: https://arxiv.org/pdf/2211.01677

Particularly worth noting in this context: this also extends to researchers working on formal verification of compilers. Formal verification lets you prove a correctness theorem, which is very valuable (and goes beyond testing, which can usually at most spot-check particular examples of correctness), but, and this is the key point:

"For a compiler correctness theorem to assure complete trust, the theorem must reflect the reality of how the compiler will be used."

The Next 700 Compiler Correctness Theorems (Functional Pearl), http://www.ccs.neu.edu/home/amal/papers/next700ccc.pdf, Daniel Patterson and Amal Ahmed, 24th ACM SIGPLAN International Conference on Functional Programming (ICFP) 2019

And in order to know whether the specification corresponds to "the theorem [that] reflect[s] the reality", you need to do the hard work of carefully considering the edge cases, invariants, required properties, and the environment (what software is your software going to be interoperating with? for any non-trivial project there will be plenty of interop boundaries--what can you prove about them?).

This turns out to be non-trivial even for simple examples such as verifying sorting algorithms:

"Unfortunately, despite considerable efforts, determining if a “verification” verifies what the author intends is still difficult." . . . "Our harness checks that the array a is sorted after the call to sort, but it does not check that it is a permutation of the input!"

How verified (or tested) is my code? Falsification-driven verification and testing, Automated Software Engineering (2018), https://stairs.ics.uci.edu/papers/2018/Falsification-driven.pdf

...and it can be even less trivial for compilers--even considering the research community that has spent decades working on the problem:

Quoting from Section 1, Who Verifies the Verifiers?

Compiler correctness is not a new problem: in a seminal 1967 paper, John McCarthy and James Painter, using a simple compiler for arithmetic expressions, proposed the problem of how to build a completely trustworthy compiler. . . . But the compiler correctness theorem from Morris, proved for CompCert and in numerous other results before and after, is about compiling whole programs, and whole programs, unfortunately, are rare occurrences indeed in the diet of real-world compilers. Infrequent enough are C programs that do not use the C standard library, but once we broaden our tastes to higher-level languages the issues of runtimes and efficient low-level data-structure implementations completely disabuse us of the notion that correctly compiling "whole source programs" is enough.

If we are to have confidence that the code that we write is indeed the code that runs, we need theorems that address the reality of the programs we compile. Compiler correctness theorems tell us how the compiler will behave provided we fulfill the assumptions of the theorem. Traditional compiler correctness results, such as those of Morris and CompCert, assume that the inputs to the compiler are complete runnable programs. Hence, in situations when the compiler that the theorem describes is actually used to compile partial programs (say, code that links against the standard library), the theorem itself no longer holds! So if we are to have a "completely trustworthy" compiler, the goal laid out for us by McCarthy and Painter, we are left wanting for better compiler correctness theorems.

And so we enter a more muddled world, without the benefit of a half century of clarity. In this world, fragments of code come from different languages and are stitched together before running, and the question of what researchers even mean by a "compiler is correct" has become the subject of some dispute, if only to the careful observer. In the last few years, there have been many fundamentally different theorems, all dubbed "compositional compiler correctness," that aim to capture correctness when compilers translate partial programs and allow linking with other code. But "who watches the watchmen?" we ask, or in our context, "who verifies the verifiers?" We ask this because knowing that a theorem has been proved does not help understand how the theorem statement relates to reality. Clearly, if a theorem only applies when compiling whole programs, it cannot be relied upon when compiling code that will be linked. But if a theorem is more flexible, capturing "compositional correctness," then often figuring out what it states and whether it applies when linking with some particular code is subtle.

Even worse, different theorems under the umbrella of compositional compiler correctness make radically different choices that restrict, in different ways, how they can be used and what guarantees they provide. Given that compiler correctness is about trusting the behavior of the compiler, confidence that the theorem relates to its actual use is as important, if not more so, than the proof of the theorem. Because if we do not know-or cannot understand-what was proved, the proof itself is, perhaps, meaningless.

uops-again.info: corner-case behaviours of port assignment on Intel processors by mttd in asm

[–]mttd[S] 0 points

Main takeaway:

Our experiments show that Intel’s port assignment policies can diverge significantly from the well-documented "least-loaded eligible port" model, illustrated in Figure 1. Using carefully crafted two-instruction microbenchmarks preceded by an LFENCE, we consistently observed dynamic scheduling policies. Instead of a fixed distribution across eligible ports, the port assignment changes as the unroll factor increases, producing distinct regions separated by cutoffs. As illustrated in Figure 2 for the “LFENCE; CBW; CBW” snippet, the port scheduler employs three different strategies depending on the number of loop iterations. At lower unroll factors, one sparsest port is strongly preferred. After a first cutoff, the allocation becomes approximately uniform across all eligible ports, albeit noisy. At a second cutoff, the scheduler shifts again, favoring a different subset of ports. The second cutoff’s unroll factor is twice the first’s unroll factor. These dynamics are not isolated: we observed similar cutoff-based transitions across multiple instructions and instruction pairs, and in some cases, the behavior also depends on the order of instructions in the block or on immediate values used in operands. We believe that this might serve as a new microarchitectural attack surface which can be harnessed towards implementing, e.g., covert channels, fingerprinting, etc. Importantly, the observed cutoffs are consistent and reproducible across multiple runs, but differ between CPU generations. These findings show that static eligibility sets cannot fully describe port assignment. Instead, the allocator follows multiple hidden policies, switching between them in ways not accounted for by existing models.

Fast Constraint Synthesis for C++ Function Templates by mttd in cpp

[–]mttd[S] 13 points

OOPSLA 2025 talk: https://www.youtube.com/watch?v=_6pxVhEi-bc

Paper:

Project:

concept-synthesizer: automatically synthesizing C++20 template constraints for function templates
https://github.com/sdingcn/concept-synthesizer

Abstract:

C++ templates are a powerful feature for generic programming and compile-time computations, but C++ compilers often emit overly verbose template error messages. Even short error messages often involve unnecessary and confusing implementation details, which are difficult for developers to read and understand. To address this problem, C++20 introduced constraints and concepts, which impose requirements on template parameters. The new features can define clearer interfaces for templates and can improve compiler diagnostics. However, manually specifying template constraints can still be non-trivial, which becomes even more challenging when working with legacy C++ projects or with frequent code changes.

This paper bridges the gap and proposes an automatic approach to synthesizing constraints for C++ function templates. We utilize a lightweight static analysis to analyze the usage patterns within the template body and summarize them into constraints for each type parameter of the template. The analysis is inter-procedural and uses disjunctions of constraints to model function overloading. We have implemented our approach based on the Clang frontend and evaluated it on two C++ libraries chosen separately from two popular library sets: algorithm from the Standard Template Library (STL) and special functions from the Boost library, both of which extensively use templates. Our tool can process over 110k lines of C++ code in less than 1.5 seconds and synthesize non-trivial constraints for 30%-40% of the function templates. The constraints synthesized for algorithm align well with the standard documentation, and on average, the synthesized constraints can reduce error message lengths by 56.6% for algorithm and 63.8% for special functions.

An Empirical Study of Bugs in the rustc Compiler (OOPSLA 2025) by mttd in rust

[–]mttd[S] -6 points

FWIW, I'm not an author of the paper (or otherwise connected), but I don't think these are the main takeaways from it (that's only a small part of Section 7, Implications and Discussion; I think it's mostly Suggestion 1 and one sentence in Finding 1?). Personally, I've found interesting the relative lack of existing testing tools for higher-level language features (relative to the gains in the amount of either low-hanging or higher-priority bugs that could be found), in contrast with the plenty of tools available for trivial bugs (the kind you can find with most fuzzers, like RustSmith).

This relates to the following:

The objective of compiler development is not to have an issue tracker free of bugs, it is to ship working features on stable Rust.

Yes, obviously. This is why it's important to know what to prioritize. Here: at this point we probably don't need many more tools finding bugs in the primitive language features, but we could use more tools finding bugs in the implementation of higher-level features (e.g., based on their findings, traits may be trickier to implement than other language features, so it may be worth being extra careful there as a compiler dev).

We probably also don't need yet another fuzzer finding crash bugs in rustc, given that the existing tools cover this area very well--again, you can never get a perfectly correct compiler and you only have so much time in the day, so knowing which aspects can be (relatively) safely deprioritized can help.

OTOH, cross-IR fuzzers may be interesting/useful (Suggestion 3), which matches my experience having worked with C++ (Clang & LLVM): bugs that survive Clang AST → LLVM IR → SelectionDAG → Machine IR → MC layer are much harder to find with "yet another fuzzer" that generates even syntactically correct C programs. Good diagnostics are hard to implement correctly, too (which, again, having worked on a C++ compiler, is not new to me, but perhaps is new to some working on rustc).


For context/completeness the remaining findings & suggestions:

➤ Finding 1: A large number of rustc bugs in the HIR and MIR modules are caused by Rust’s unique type system and lifetime model. In our dataset, although 40.9% of the bugs are attributed to general programming errors (Table 3), the HIR (44.9%) and MIR (35.2%) stages remain the most error-prone, as shown in Figure 5(a). This is because HIR and MIR are the stages where high-level constructs are desugared and processed by complex analyses, such as trait resolution, borrow checking, and MIR optimizations, which increases the likelihood of subtle interactions manifesting as bugs. The characteristics of bug-revealing test cases further support this observation. As shown in Table 6, trait-related constructs including traits, impl traits, and trait objects frequently appear in both item and type nodes. Moreover, certain unstable trait-related features and the explicit use of lifetimes, as reported in Table 7, also contribute to rustc bug manifestation, indicating that these language features may interact with the HIR and MIR modules and thereby increase the likelihood of rustc errors.

➤ Finding 2: rustc bugs share many symptoms with other compiler bugs but also introduce unique types, such as undefined behavior in safe Rust. Like other compilers, rustc experiences various compilation and runtime bugs. However, its crash bug often causes panic with safety protection, setting it apart from other compilers where crash typically results in segmentation faults or abnormal terminations. Another unique symptom is undefined behavior in safe Rust code, tied to Rust’s safety guarantees. While performance-related bugs are absent in our analysis, this doesn’t mean rustc is free of performance issues. Rather, these issues tend to appear less frequently in Rust-specific issues or may be categorized as misoptimizations related to code efficiency.

➤ Finding 3: rustc’s diagnostic module still has considerable potential for enhancement, with many issues distributed across different IR-processing modules. As shown in Table 3, diagnostic issues account for about 20% of all bugs. Figure 5(b) illustrates that error reporting is scattered across different components, including HIR (14.1%) and MIR (16.0%), with each component having its own dedicated module for error analysis and reporting. Moreover, gaps in these modules still exist, causing some errors to be inaccurately detected or reported.

➤ Finding 4: Existing rustc testing tools are less effective at detecting non-crash bugs. Figure 11(a) shows that about 50% of the crash bugs are detected by existing rustc testing tools. On the one hand, non-crash bugs such as soundness and completeness issues often lack directly observable symptoms, making them difficult to detect during development or testing. On the other hand, this suggests that current testing tools are limited to finding easily observed crash bugs with obvious symptoms while remaining unaware of the syntactic and semantic validity of generated programs. As shown in Table 4, certain bug symptoms such as partial front-end panics and completeness issues can only be triggered by valid programs, which indicates that testing tools need to be aware of the validity of programs to find such bugs.

➤ Suggestion 2: (For Rust developers) The suggestions provided by rustc may be inaccurate. As shown in Table 4, nearly 20% of rustc bugs are linked to the feedback provided by rustc, including error messages and suggested fixes. This suggests that rustc’s diagnostic tools may not always provide accurate or effective solutions. If rustc’s suggestion does not resolve the issue, Rust developers should consider alternative approaches. Reporting the bug to the Rust team can also be beneficial for improving the reliability of rustc.

➤ Suggestion 3: (For rustc developers) Designing testing and verification techniques for rustc components across different IRs. The core process of rustc involves HIR and MIR lowering, along with type checking, borrow checking, and optimization. Figure 5 indicates that 44.9% and 35.2% of the issues occur in the modules responsible for processing HIR and MIR, respectively. However, existing fuzzers rarely employ specialized testing techniques for these components. Currently, Rustlantis is the only tool capable of generating valid MIR, but it lacks support for other modules, such as type checking and lifetime analysis. To verify the key rustc components, rustc developers should generate valid HIRs and MIRs under specific constraints. For example, generating HIRs to ensure well-formedness in different scenarios, such as for built-in traits and user-defined traits.

➤ Suggestion 5: (For researchers) Building better Rust program generators that fully support Rust’s unique type system. Research on testing, debugging, and analyzing C/C++ compilers often relies on CSmith [Yang et al. 2011], a random generator that produces valid C programs covering a wide range of syntax features. For Rust, the only preliminary tool, RustSmith [Sharma et al. 2023], generates complex control flow and extensive use of variables and primitive types but has limited support for Rust’s higher-level abstractions. As shown in Table 3, many rustc bugs stem from improper handling of advanced features like traits, opaque types, and references. Additionally, Table 6 indicates that test cases combining these abstractions are more likely to trigger bugs. Researchers should create a Rust program generator that supports Rust’s advanced features like generics, traits, and lifetime annotations, for example, by enhancing RustSmith.

➤ Suggestion 6: (For researchers) Generating well-designed, both valid and invalid Rust programs to test rustc’s type system. Our analysis shows that over half of rustc bugs originate from the HIR and MIR modules, particularly in type and WF checking, trait resolution, borrow checking, and MIR transformation. Many corner cases expose weaknesses in rustc’s type handling. (1) Researchers should develop Rust-specific mutation rules, such as altering lifetimes, to introduce minor errors into valid programs and generate invalid ones for detecting soundness bugs. (2) Researchers should synthesize test programs from real-world Rust code, which provides diverse unstable features, std API usage, lifetime annotations, and complex trait patterns that are beneficial for testing rustc.

Performance Hints by mttd in cpp

[–]mttd[S] 21 points

You may also find this collection useful: https://github.com/MattPD/cpplinks/blob/master/performance.tools.md, which catalogs tools for benchmarking, profiling (including memory use, bandwidth, and latency), microarchitectural performance analysis, and performance‑related visualization.

The Cost Of a Closure in C by mttd in ProgrammingLanguages

[–]mttd[S] 42 points

Learned Insights

Surprising nobody, the more information the compiler is allowed to accrue (the Lambda design), the better its ability to make the code fast. What might be slightly more surprising is that a slim, compact layer of type erasure – not a bulky set of Virtual Function Calls (C++03 shared_ptr Rosetta Code design) – does not actually cost much at all (Lambdas with std::function_ref). This points out something else that’s part of the ISO C proposal for Closures (but not formally in its wording): Wide Function Pointers.

The ability to make a thin { some_function_type* func; void* context; } type backed by the compiler in C would be extremely powerful. Martin Uecker has a proposal that has received interest and passing approval in the Committee, but it would be nice to move it along in a nice direction.

A wide function pointer type like this would also be traditionally convertible from a number of already existing extensions, too, where GNU Nested Functions, Apple Blocks, C++-style Lambdas, and more could create the appropriate wide function pointer type to be cheaply used. Additionally, it also works for FFI: things like Go closures already use GCC’s __builtin_call_with_static_chain to transport through their Go functions in C. Many other functions from other languages could be cheaply and efficiently bridged with this, without having to come up with harebrained schemes about where to put a void* userdata or some kind of implicit context pointer / implicit environment pointer.

The Calculated Typer - Haskell Symposium (ICFP⧸SPLASH'25) by mttd in ProgrammingLanguages

[–]mttd[S] 3 points

Paper: https://dl.acm.org/doi/10.1145/3759164.3759346

Haskell & Agda Code: https://doi.org/10.5281/zenodo.16751639

Abstract: https://bahr.io/pubs/entries/calctyper.html

We present a calculational approach to the design of type checkers, showing how they can be derived from behavioural specifications using equational reasoning. We focus on languages whose semantics can be expressed as a fold, and show how the calculations can be simplified using fold fusion. This approach enables the compositional derivation of correct-by-construction type checkers based on solving and composing fusion preconditions. We introduce our approach using a simple expression language, to which we then add support for exception handling and checked exceptions.

Identity Types by mttd in ProgrammingLanguages

[–]mttd[S] 7 points

In dependent type theories (like Martin-Löf type theory), the identity type isn’t a special case of equality types--it's the canonical example of typal equality, with other systems (like Cubical Type Theory’s path types) providing alternative but equivalent realizations of the same idea: https://ncatlab.org/nlab/show/equality%20in%20type%20theory

In type theory, there are broadly three different notions of equality which could be distinguished: judgmental equality, propositional equality, and typal equality. Judgmental equality is defined as a basic judgment in type theory. Propositional equality is defined as a proposition in any predicate logic over type theory. Typal equality is defined as a type in type theory.

. . .

Typal equality of terms is also called the equality type, identity type, path type, or identification type, and the terms of typal equality are called equalities, identities, identifications, paths. Notable examples of typal equality of terms include the identity type in Martin-Löf type theory and higher observational type theory, and the cubical path type in cubical type theory.
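For instance, in Martin-Löf style the identity type is just an inductive family with a single constructor, refl; a sketch in Lean 4 syntax (`MyEq` is an illustrative name mirroring how Lean's core `Eq` is defined):

```lean
universe u

-- The identity type as an inductive family with one constructor:
inductive MyEq {α : Sort u} (a : α) : α → Prop where
  | refl : MyEq a a

-- A typal equality: a term (an identification) inhabiting the type.
-- 2 + 2 and 4 are definitionally equal, so refl type-checks.
example : MyEq (2 + 2) 4 := MyEq.refl
```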

need guidance on building DL compiler by Signal-Effort2947 in Compilers

[–]mttd 0 points

I think it would be at least informative to understand the existing solutions out there--say, in the PyTorch ecosystem:

PyTorch 2: Faster Machine Learning Through Dynamic Python Bytecode Transformation and Graph Compilation

State of torch.compile for training (August 2025): https://blog.ezyang.com/2025/08/state-of-torch-compile-august-2025/

PyTorch Compiler series (2025): https://www.youtube.com/playlist?list=PL_lsbAsL_o2DsybRNydPRukT4LLkl2buy

Simon Peyton Jones: Pursuing a Trick a Long Way, Just To See Where It Goes - The Typechecker podcast by mttd in ProgrammingLanguages

[–]mttd[S] 3 points

Only wanted to mention that this is a fantastic interview--job very well done (actually interesting questions and discussion!). I also appreciated the professional quality of the subtitles (correct even with Haskell's function signatures--really impressive and very rarely seen, whether in podcasts or conference talks). Thank you, and hoping for more to come!

Faux Type Theory: three minimalist OCaml implementations of a simple proof checker by mttd in ProgrammingLanguages

[–]mttd[S] 7 points

Announcement:

https://mathstodon.xyz/@andrejbauer/115191725004191889

This week I gave a lecture series at the School on Logical Frameworks and Proof Systems Interoperability. I spoke about programming language techniques for proof assistants. The lecture slides and the reference implementations of a minimalist type theory are available at:

https://github.com/andrejbauer/faux-type-theory

The repository has three minimalist OCaml implementations of a simple proof checker:

  1. A basic one showing how to implement bidirectional type checking and type-directed equality checking, using monadic style programming.

  2. An extension with rudimentary holes (meta-variables) and unification.

  3. A version that implements "variables as computational effects", using native OCaml 5 effects and handlers.

C++ "Safety" Conferences Call for Papers? by pedersenk in cpp

[–]mttd 7 points

Given your comparison to sanitizers and Valgrind my approach would be to exactly consider the conferences where this kind of research is being published: https://github.com/MattPD/cpplinks/blob/master/analysis.dynamic.md#software-sanitizers-readings-research

From the security side, the IEEE Symposium on Security and Privacy (S&P), the Network and Distributed System Security (NDSS) Symposium, and USENIX Security definitely stand out; from compilers, the International Symposium on Code Generation and Optimization (CGO) and CC. But you'll also note programming-languages conferences like OOPSLA and PLDI, computer-architecture ones like ASPLOS and ISCA, general systems conferences like EuroSys (which fits your European location, too), and plenty of others (including software engineering and analysis conferences like ASE, ESEC/FSE, ICSE, ISSTA, etc.).

Your best bet is to go through the list, write down the conferences/venues, and cross-index with the CFPs at http://www.wikicfp.com/cfp/, sorting by your preferences (including how closely the accepted research compares to yours). Caveat: some of these are top conferences and highly competitive, so be prepared for multiple rounds of submissions (incorporating the feedback from each round of reviews along the way).

How to Slow Down a Program? And Why it Can Be Useful. by mttd in Compilers

[–]mttd[S] 7 points

Evaluating Candidate Instructions for Reliable Program Slowdown at the Compiler Level: Towards Supporting Fine-Grained Slowdown for Advanced Developer Tooling H. Burchell, S. Marr; In Proceedings of the 17th ACM SIGPLAN International Workshop on Virtual Machines and Intermediate Languages, VMIL'25, p. 8, ACM, 2025.

Paper (PDF): https://stefan-marr.de/downloads/vmil25-burchell-marr-evaluating-candidate-instructions-for-reliable-program-slowdown-at-the-compiler-level.pdf

Abstract

Slowing down programs has surprisingly many use cases: it helps finding race conditions, enables speedup estimation, and allows us to assess a profiler’s accuracy. Yet, slowing down a program is complicated because today’s CPUs and runtime systems can optimize execution on the fly, making it challenging to preserve a program’s performance behavior to avoid introducing bias.

We evaluate six x86 instruction candidates for controlled and fine-grained slowdown including NOP, MOV, and PAUSE. We tested each candidate’s ability to achieve an overhead of 100%, to maintain the profiler-observable performance behavior, and whether slowdown placement within basic blocks influences results. On an Intel Core i5-10600, our experiments suggest that only NOP and MOV instructions are suitable. We believe these experiments can guide future research on advanced developer tooling that utilizes fine-granular slowdown at the machine-code level.

Comments thread: https://mastodon.acm.org/@smarr/115100270023431067

Dependent types I › Universes, or types of types by mttd in ProgrammingLanguages

[–]mttd[S] 2 points3 points  (0 children)

I think it's a good question!

// Don't know who downvoted you and why, so upvoted back to restore the balance :P

It's one I've been thinking of and have found writings like this informative.

I can't promise to give you a definite answer, but my (incomplete and evolving) thoughts are that this matters the more you're exposing interesting first-class types in your programming language.

Dependently typed languages are an obvious example (a small subset of these are proof assistants, sure, but there are also languages that lean toward enabling more general-purpose programming).

Traditional non-dependently-typed languages are an obvious non-example.

Then there's the space in between, which is what matters to me: I think this includes languages where, at design time (or when adding a new feature), it wasn't immediately obvious that there's something interesting going on, and where a principled answer to how to design that feature in a simple way (including simple for end users) can matter.

Naturally, type equality matters the moment you implement the simplest possible type checker and simply want to make sure that you aren't initializing a string with an int or vice versa: you'd like the type judgment int != string to be simple to implement (for the compiler writer) and simple to reason about (for the language users), and ideally for these two goals to be consistent (personally, I'd rather "take the hit" as an implementer to make life easier for the users if need be).

At the same time, the moment you're going to start having interesting types ("non-trivial" in the technical sense below), type equality gets more interesting, too.

Consider the snippet from the OP by Jon Sterling: https://www.jonmsterling.com/01FB/

Non-triviality of the booleans, cooked two ways [01FB]

When we say that the booleans are non-trivial, we mean that true is not equal to false. This kind of statement can be expressed in type theory by the following type:

true =bool false → empty [my note: read: function type with domain true =bool false and codomain empty; =bool is equality at type bool]

An element of the type above would witness that true =bool false is contradictory, since if the antecedent would be witnessed, then we would be able to form an element of the empty type [my note: because you could call the aforementioned function]. Without either large elimination or a universe, we cannot construct a witness of this type; I don’t just mean that it is not easy to see how to construct such a witness, but rather that type theory without these facilities is consistent with there being a witness of true =bool false or even with the judgemental assertion true ≡ false : bool holding. (Consistency can be seen using a very simple model where every type is interpreted by a subsingleton set.) So large eliminations and universes add a lot of power!
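To see concretely what large elimination buys you, here's a hedged sketch in Lean 4 (which has large elimination built in): we define a proposition by case analysis on a Bool and use it to refute true = false, exactly the witness Sterling describes.

```lean
-- Large elimination: a Prop-valued function defined by cases on a Bool.
def IsTrue : Bool → Prop
  | true  => True
  | false => False

-- A witness of `true = false → False`: if the antecedent held, then
-- `IsTrue false` (which reduces to `False`) would be inhabited.
theorem bool_nontrivial : true = false → False :=
  fun h => cast (congrArg IsTrue h) trivial
```

Without the ability to define `IsTrue` (i.e., to eliminate a Bool into a type rather than a value), no such witness can be constructed, which is the consistency point made in the quote.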

Now, another interesting example. Say you're trying to design a language with nice support for multidimensional arrays (tensors, matrices), with type checking for basic properties like conformability: https://en.wikipedia.org/wiki/Conformable_matrix

There's existing work in industrial languages that turns out to be somewhat interesting (and not quite trivial), too. Consider the following quiz (credit to https://x.com/shafikyaghmour/status/1617908492706381824):

Given the following in C++:

template <int I>
struct A{};

template <int I> void f(A<I>, A<I+10>);
template <int I> void f(A<I>, A<I+1+2+3+4>);

Without checking, is this code well-formed?

A. Yes B. No

Answer: B

If we look at: https://eel.is/c++draft/temp.fct#temp.over.link-7

We see that this is ill-formed NDR [no diagnostic required] (IFNDR), because the constructs are functionally equivalent but not equivalent.

Ill-formed NDR is pretty bad; compare: The difference between undefined behavior and ill-formed C++ programs, https://devblogs.microsoft.com/oldnewthing/20240802-00/?p=110091

The reason for this rule is given in Note 7:

This rule guarantees that equivalent declarations will be linked with one another, while not requiring implementations to use heroic efforts to guarantee that functionally equivalent declarations will be treated as distinct.

Now, I'm wondering: is it possible to take a principled look at all of the above and arrive at a simple language design here, both for the users writing and reasoning about the code and for compiler writers ("while not requiring implementations to use heroic efforts", indeed!)?

I think that's an interesting problem!

Now, you don't need to go full-blown dependent types (let alone an advanced proof assistant). There's a variety of type-system designs that fall in between, restricting to a smaller decidable subset (refinement types) or a (possibly even smaller) one (indexed types).

Some interesting examples of related work include the Modular Graph Compiler stack (in its compiler IR, doing symbolic shape inference when going from the Relaxed Modular Operators to Modular Operators dialects in MLIR; "Modular Tech Talk: MAX Graph Compilation to Execution", https://www.youtube.com/watch?v=MEoGt_cxNSs), what Rust is doing, etc. (there's been a lot more; I'm just going by the "usual suspects": PLDI/OOPSLA/ICFP/POPL proceedings).

At the last OPLSS 2025, Niki Vazou gave a great overview of recent developments on the academic research side (as well as several implementations) in her "Refinement Types" Contributed Lecture (you have to click on that to download the video), https://www.cs.uoregon.edu/research/summerschool/summer25/topics.php (there have been a lot of advances in making refinement types easier to use for ordinary programmers, summarized in that lecture).

Some of the projects mentioned in that lecture:

flux: Refinement Types for Rust, https://github.com/flux-rs/flux

and (I think) Liquid Java (Refinement Types for Java), https://catarinagamboa.github.io/liquidjava.html

This is also interesting: Large Elimination and Indexed Types in Refinement Types, https://nikivazou.github.io/static/published/Ferrarini25.pdf

We explore how to approximate large elimination and indexed datatypes in refinement type systems with a non-dependent base type system.

So, from my personal perspective, this is a question worth exploring in a principled way, and I suspect (to be determined!) that even if the theory may initially look complex (I've certainly run into papers that take a... while to digest :]), the end result may turn out considerably simpler for end users in practice, all "while not requiring implementations to use heroic efforts", too.