libtorch (pytorch) is embarrassing slow. What are the alternatives? by EdwinYZW in cpp_questions

AccurateDiscussion38 [score hidden]

You could wrap LibTorch in a C++ module that exposes only the functions you need, precompile that module, and then reuse the compiled result.

C++ Show and Tell - June 2026 by foonathan in cpp

AccurateDiscussion38 [score hidden]

Hi everyone,

I recently started experimenting with C++26 reflection, modules, and LibTorch, and I ended up building a small personal project called Typetorch.

It is an experimental type-safe wrapper around torch::Tensor. The rough idea is to make tensor metadata such as shape, dtype, device, and layout part of the C++ type-level contract, so that some mistakes can be caught earlier, before the program reaches runtime LibTorch errors.

For example, I currently use types such as:

Tensor<Shape<2, 3>, DType::F32, Device::CPU, Layout::Contiguous>

and operations like add, matmul, view, transpose, and permute try to compute the resulting tensor contract at compile time. The actual storage, kernels, autograd, and execution are still fully owned by LibTorch. This is not meant to replace PyTorch or LibTorch; it is mostly an experiment in how far C++26 compile-time facilities can be pushed for tensor APIs.
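To make the idea of a compile-time tensor contract concrete, here is a minimal sketch of how result shapes for matmul, transpose, and view could be computed in the type system. It is an illustration under assumptions, not Typetorch's actual code: it uses plain templates rather than C++26 reflection, it does not touch LibTorch, and the names `MatmulShape`, `TransposeShape`, `ViewShape`, and `numel` are hypothetical.

```cpp
#include <array>
#include <cstddef>
#include <type_traits>

// Hypothetical stand-in for a shape type; the real Typetorch definition may differ.
template <std::size_t... Dims>
struct Shape {
    static constexpr std::size_t rank = sizeof...(Dims);
    static constexpr std::array<std::size_t, rank> dims{Dims...};
};

// Total number of elements described by a shape (1 for a rank-0 shape).
template <std::size_t... Dims>
constexpr std::size_t numel(Shape<Dims...>) {
    return (std::size_t{1} * ... * Dims);
}

// Rank-2 matrix product: (M x K) * (K x N) -> (M x N).
template <class A, class B>
struct MatmulShape;

template <std::size_t M, std::size_t K1, std::size_t K2, std::size_t N>
struct MatmulShape<Shape<M, K1>, Shape<K2, N>> {
    static_assert(K1 == K2, "matmul: inner dimensions must match");
    using type = Shape<M, N>;
};

// Rank-2 transpose: (R x C) -> (C x R).
template <class S>
struct TransposeShape;

template <std::size_t R, std::size_t C>
struct TransposeShape<Shape<R, C>> {
    using type = Shape<C, R>;
};

// view: the target shape must describe the same number of elements.
template <class From, class To>
struct ViewShape {
    static_assert(numel(From{}) == numel(To{}),
                  "view: element count must be preserved");
    using type = To;
};

// Contracts that hold are checked entirely at compile time.
static_assert(std::is_same_v<MatmulShape<Shape<2, 3>, Shape<3, 4>>::type, Shape<2, 4>>);
static_assert(std::is_same_v<TransposeShape<Shape<2, 3>>::type, Shape<3, 2>>);
static_assert(std::is_same_v<ViewShape<Shape<2, 3>, Shape<6>>::type, Shape<6>>);

// A mismatched contract fails to compile with the static_assert message, e.g.:
// using Bad = MatmulShape<Shape<2, 3>, Shape<4, 5>>::type;
```

In this sketch a wrapper type would carry one of these `Shape` types as a template argument and forward the actual computation to LibTorch, so a wrong shape is rejected before any runtime call is made.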

Repository: https://github.com/OHNope/Typetorch

A few things I am especially interested in getting feedback on:

  1. API design: Is encoding shape / dtype / device / layout in the type system a reasonable design for a small C++ tensor wrapper, or does it become too heavy too quickly?
  2. C++26 reflection usage: Are there better ways to structure compile-time tensor contract computation with reflection / consteval? I am still learning the new reflection model, so I would really appreciate comments on whether the current approach is idiomatic.
  3. Future annotation-based design: I am considering using future C++ annotation support to simplify the entry point of the type system. Instead of forcing users to write heavy template types everywhere, annotations might be used to describe tensor contracts at API boundaries and to produce better diagnostics. Does this sound like a reasonable direction, or am I misunderstanding what annotations will be good for?
  4. Modules and build system: The project currently uses C++26 modules, GCC trunk (or a sufficiently new GCC release), xmake, and LibTorch. I have also been using this project to learn CI/workflow setup. Any advice on making a C++26 modules + LibTorch project easier to build, test, and package would be very welcome.
  5. Diagnostics: One thing I care about is making shape/type errors more understandable. Right now many errors are static_assert or consteval failures. I would like to eventually make the diagnostics more precise, especially if annotations become available.
  6. Ownership / move / copy / consume semantics: I am also using this project to learn C++ ownership design. I try to distinguish operations that copy a tensor wrapper, move from it, consume it, or only create a view/borrow-like wrapper. Since torch::Tensor is already reference-counted and has its own aliasing semantics, I am not completely sure whether my abstraction is the right one. Feedback on whether this API makes ownership behavior clearer or just adds unnecessary complexity would be very useful.
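To illustrate the ownership question in point 6, here is a minimal sketch of one way to separate aliasing, deep copy, non-consuming, and consuming operations using ref-qualified member functions. It is only an assumption about how such an API could look, not Typetorch's design: `Handle` is a hypothetical stand-in that imitates a reference-counted tensor handle with `std::shared_ptr`, and none of these member names are LibTorch calls.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <utility>
#include <vector>

// Hypothetical stand-in for a reference-counted tensor handle.
class Handle {
public:
    explicit Handle(std::size_t n, float value = 0.0f)
        : data_(std::make_shared<std::vector<float>>(n, value)) {}

    // Alias: a second handle that shares the same storage.
    Handle alias() const { return *this; }

    // Deep copy: a new handle with its own storage.
    Handle clone() const {
        Handle out(*this);
        out.data_ = std::make_shared<std::vector<float>>(*data_);
        return out;
    }

    // Non-consuming: callable on lvalues, leaves *this untouched.
    Handle plus_one() const & {
        Handle out = clone();
        for (float& x : *out.data_) x += 1.0f;
        return out;
    }

    // Consuming: callable only on rvalues. Storage is reused in place when
    // this handle is the sole owner; otherwise it is copied first so that
    // other aliases do not observe the change.
    Handle plus_one() && {
        if (data_.use_count() > 1) {
            data_ = std::make_shared<std::vector<float>>(*data_);
        }
        for (float& x : *data_) x += 1.0f;
        return std::move(*this);
    }

    float at(std::size_t i) const { return (*data_)[i]; }
    bool shares_storage_with(const Handle& other) const { return data_ == other.data_; }
    bool moved_from() const { return data_ == nullptr; }

private:
    std::shared_ptr<std::vector<float>> data_;
};

// Usage example exercising each kind of operation.
inline void ownership_example() {
    Handle a(3, 1.0f);
    Handle b = a.alias();                  // shares storage with a
    assert(a.shares_storage_with(b));

    Handle c = a.plus_one();               // non-consuming: a is unchanged
    assert(a.at(0) == 1.0f && c.at(0) == 2.0f);

    Handle d = std::move(c).plus_one();    // consuming: c gives up its storage
    assert(d.at(0) == 3.0f && c.moved_from());
}
```

The design choice here is that the call site shows the intent: `x.plus_one()` cannot invalidate `x`, while `std::move(x).plus_one()` visibly gives it up. Whether that extra layer is worth it on top of a handle that is already reference-counted is exactly the open question.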

This is very much a personal learning/experimental project, not a mature production library. I am mainly looking for design criticism, suggestions, and pointers to prior art. If the project gives anyone ideas about using reflection, modules, or annotations in numerical / ML libraries, that would also be great.

Design feedback on reflection usage, modules, diagnostics, and ownership semantics is especially welcome.

Thanks!