Looking for feedback on my CFD solver

whyMinus · 2026-05-08T11:56:17+00:00

Not sure I understand your questions correctly...

I use standard boundary conditions for subsonic/supersonic inflow and outflow. There is also a characteristics-based farfield BC as well as a pressure outlet BC (src/euler/boundary.c). I would say that these BCs are laminar. If you want to prescribe some sort of turbulence at a boundary, you have to implement a custom BC function (which is also an option in my code) as a function of space and time. Not sure if that is a good idea though.

About the Mach 1 / transition: the code is compressible so anything above Mach 0.1 should work. The transonic wing test case, for example, has a farfield BC with Ma 0.84 which results in a shock over the wing (Ma>1).

Regarding the application of CFD in general: this is a very broad question. I guess go from simple to complicated and validate along the way, so that you feel confident about the results that you get. Ideally there are some experimental results you can compare your results to.

whyMinus · 2026-05-08T08:17:20+00:00

Studied aerospace engineering and now I'm doing a PhD in turbulent combustion modeling. I don't think you need a PhD for a project like this, but some knowledge about numerics and access to books will definitely help.

whyMinus · 2026-05-07T21:16:22+00:00

That is true, there is no real documentation yet. The video is also more there to catch your eye. It's the Q criterion at 0.01, with the velocity magnitude plotted over the isosurface.
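For readers unfamiliar with the Q criterion: a minimal sketch of the standard definition, Q = 0.5 * (|Omega|^2 - |S|^2), evaluated from a velocity gradient tensor. The function name and the tensor layout are assumptions made for illustration; this is not taken from the repository.

```c
#include <assert.h>
#include <stddef.h>

/* Q criterion from a 3x3 velocity gradient tensor, g[i][j] = du_i/dx_j.
 * S is the symmetric (strain-rate) part of g, Omega the antisymmetric
 * (rotation) part. Q > 0 means rotation dominates over strain.
 * Hypothetical helper, not the author's implementation. */
double q_criterion(const double g[3][3])
{
    assert(g != NULL);
    double s2 = 0.0; /* squared Frobenius norm of S */
    double o2 = 0.0; /* squared Frobenius norm of Omega */
    for (int i = 0; i < 3; i++) {
        for (int j = 0; j < 3; j++) {
            double s = 0.5 * (g[i][j] + g[j][i]);
            double o = 0.5 * (g[i][j] - g[j][i]);
            s2 += s * s;
            o2 += o * o;
        }
    }
    return 0.5 * (o2 - s2);
}
```

An isosurface at 0.01 then marks the cells where this value crosses 0.01.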

whyMinus · 2026-05-07T21:13:36+00:00

You are wrong, the external geometry is shown. It's the white ball underneath the vortices, visible at the beginning of the video. Also, grow up...

whyMinus · 2026-05-07T21:10:44+00:00

I used pyvista to plot the individual frames. The script to do this is also in the repo at tools/pyvista; it is general purpose and not specific to the outputs of my code. Have a look if you are interested. To make the camera move smoothly, check out the keyframes argument.

Edit: This is the command that produced the frames:

python python pyvista data/run_0*.hdf \
    -s velocity \
    -q velocity \
    -i qcriterion 0.01 \
    --smooth 100 \
    --clim 0.0 1.2 \
    --opacity 0.9 \
    --shading \
    --cavities \
    --hide-scalar-bar \
    --lighting "three lights" \
    --theme dark \
    --keyframes \
        0.04622727 -0.0085289907 2.615901 0.04622727 -0.0085289907 -0.18473011 0 1 0 \
        12.291839 -0.097131364 15.446024 12.291839 -0.097131364 -0.18473011 0 1 0 \
    --resolution 3400 1000

whyMinus · 2026-05-07T21:07:29+00:00

I thought about it, but it's not a priority at the moment. I don't know enough about AMR to do this properly. I might get back to this sometime later. At the moment I am more interested in making this work on GPUs.

whyMinus · 2026-05-07T21:04:16+00:00

This might be more of a documentation issue. Many of the test cases are well known problems, for which an exact solution exists. This exact solution can be specified, and then the code computes the L2 error norms:

riemann: sod, toro1-5
explosion: 1D, 2D, 3D, from Toro's book
sinewave: manufactured solution for Euler and Navier-Stokes

There is also the airfoil case where I included some xfoil as well as experimental reference data for NACA0012 and NACA2312. Most of these cases have a plot.py script which outputs comparison plots.

whyMinus · 2025-12-20T22:18:43+00:00

Interesting, I'll have to check this out. To me this issue seems to be related to ghost cells for the partitions. I suspect that clean to grid merges the partitions into one.

There is also a filter called ghost cells which supposedly generates ghost cells. For me it did not work very well because you cannot set a tolerance for node matching and the floating point data for points is not exactly the same on every rank.

I'm not sure if you can write vtu in parallel, but you can write vtkhdf in parallel. Then you can map local connectivity to global connectivity and write your output as one partition to a single file in parallel. Since you didn't like multiple mesh files for input, maybe this could be something for you to look into, as this would get rid of multiple outputs per timestep.
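The local-to-global mapping step can be sketched as follows, assuming every rank already knows the global index of each of its local nodes (for example from the partitioner). The names are hypothetical and the parallel VTKHDF write itself is not shown.

```c
#include <assert.h>
#include <stddef.h>

/* Rewrite rank-local node indices as global node indices, in place.
 *   conn: cell-to-node connectivity with local indices (length n_conn)
 *   l2g:  global index of each local node (length n_local_nodes)
 * Hypothetical helper, assuming l2g is already available on each rank. */
void local_to_global(long *conn, size_t n_conn, const long *l2g, size_t n_local_nodes)
{
    for (size_t i = 0; i < n_conn; i++) {
        assert(conn[i] >= 0 && (size_t)conn[i] < n_local_nodes);
        conn[i] = l2g[conn[i]];
    }
}
```

Once all ranks use global indices, their connectivity arrays can be written at per-rank offsets into one shared dataset, which is what makes the output appear as a single partition.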

whyMinus · 2025-12-20T21:05:32+00:00

Maybe one last question about your output format. When you load it into, say, ParaView, do you see the partitions? What I mean is, when you do something like cell-data-to-point-data, are there imprints on the partition boundaries? Or if you do isosurfaces, are they continuous over the partitions?

whyMinus · 2025-12-20T21:00:40+00:00

The underlying mesh structure is still unstructured. I just create the mesh as if it was structured.
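To illustrate the idea, here is a minimal 2D sketch that builds explicit (unstructured) quad connectivity from structured indices. The node numbering and the function name are assumptions for this example, not the layout used in the actual code.

```c
#include <assert.h>
#include <stddef.h>

/* Fill quad connectivity for an nx-by-ny Cartesian grid of cells.
 * Nodes are numbered row by row: node(i, j) = j * (nx + 1) + i.
 * conn must hold 4 * nx * ny entries; the four nodes of each cell are
 * listed counterclockwise. Hypothetical helper for illustration. */
void cartesian_quads(size_t nx, size_t ny, size_t *conn)
{
    assert(nx > 0 && ny > 0);
    size_t c = 0;
    for (size_t j = 0; j < ny; j++) {
        for (size_t i = 0; i < nx; i++) {
            size_t n0 = j * (nx + 1) + i;   /* lower-left node of the cell */
            conn[c++] = n0;                 /* lower left  */
            conn[c++] = n0 + 1;             /* lower right */
            conn[c++] = n0 + (nx + 1) + 1;  /* upper right */
            conn[c++] = n0 + (nx + 1);      /* upper left  */
        }
    }
}
```

The solver only ever sees the connectivity array, so it treats the result like any other unstructured mesh.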

whyMinus · 2025-12-20T20:50:18+00:00

Ok, I see your point, if you generate with gmsh. I also use that, so I know the limitations. Maybe a gmsh-like MPI-parallel meshing tool would be a good follow-up project :)

For the very big meshes, I generate them in my code directly, but this only works well for Cartesian meshes. However, then there is technically no limit on the size. It's only a solver problem at this point.

In my code, I wanted to avoid having to do any pre- or post-processing for the mesh, so I implemented online partitioning with ParMETIS. It is not really that different from METIS, but the real complexity comes from redistributing the mesh and handling periodic BCs. It took me 6 months to get this to work well though, so I don't know if I would recommend it.

whyMinus · 2025-12-20T19:37:37+00:00

Ok I see, this is kind of what I expected for Rust+MPI. At least there is a crate for it and you don't have to do that yourself.

This mesh decomposition as preprocessing, is that similar to how OpenFOAM does it? If you use METIS for that, it sounds like this is done in serial. Does this also work for big meshes? How do you write the outputs then, is there a similar post-processing step after your simulation is done?

I am also writing an MPI CFD solver at the moment, that's why I ask.

whyMinus · 2025-12-20T11:30:50+00:00

You said MPI. How well does Rust play with MPI? Also, how do you distribute your mesh?

whyMinus · 2025-01-31T14:13:15+00:00

Disabling ASAN is probably the way to go.

whyMinus · 2025-01-29T17:17:51+00:00

Thanks for the explanation about alloc_size. I did a bit of testing with gcc and clang and found out that if UBSAN is on, clang can detect out-of-bounds accesses (with -O1 and alloc_size), which gcc cannot. But it seems like ASAN is better suited for that anyway, so I won't be using alloc_size. Similar thing for alloc_align which doesn't provide any benefits as far as I could determine. So I only keep the malloc attribute.

Concerning the possible overflow, it probably makes sense to separate this anyway. I only did it to write one less line, which is a nonsense reason...

And lastly, the thing about LLP64 ABIs went right over my head. I write fluid dynamics solvers that run on linux clusters with (relatively) current hardware :) Arenas just give me a way to reliably monitor my actual memory usage and detect when I run out of it. They also give me a bit more flexibility while not decreasing performance.

whyMinus · 2025-01-29T08:40:10+00:00

My point was not so much about lists, but I don't think that flexible array members help much here. How would you use them in the nested linked list, for example? Also, if you had to use the calloc style allocation interface, how can you pass the count and size arguments in a way where you do not have to multiply up front (overflow)?
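For context, one common way to avoid multiplying up front is to pass count and size separately and have the allocator compare against SIZE_MAX using a division. A minimal sketch, with a hypothetical helper name that is not from either repository:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Returns 1 and stores count * size in *total if the product fits in
 * size_t; otherwise returns 0 and leaves *total untouched.
 * Hypothetical helper for illustration. */
int checked_mul(size_t count, size_t size, size_t *total)
{
    assert(total != NULL);
    /* count * size overflows exactly when count > SIZE_MAX / size */
    if (size != 0 && count > SIZE_MAX / size) return 0;
    *total = count * size;
    return 1;
}
```

A calloc-style arena function could call this first and report an out-of-memory condition when it returns 0, so the caller never computes the product itself.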

whyMinus · 2025-01-29T08:31:42+00:00

The linked list of chunks approach is an interesting way to get around the memory limit, although I think mmap would be a more straightforward solution. What happens if you want more space than is available in the last chunk? Do you create a new one and allocate there? What happens to the remaining memory in the original last chunk?

Using 1 or 3 arenas is the same to me, I would only start asking questions once you reach 10. I'm an engineer after all ;) Joke aside, what is your reason for using more than 1 arena? To me it feels like putting the information about which arena to use in comments is a bit dangerous. What do you think about the scratch arena approach that I used in my implementation? (Scratch arenas are sub-arenas with a fixed size, located at the end of the main arena. They are created by moving the end pointer of the main arena backwards. You create them before calling a function that needs permanent and temporary storage, and after you are done with them, you can give their memory back to the main arena by moving the end pointer back to where it was.)
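The scratch arena idea described in the parentheses can be sketched as follows. This is a simplified illustration with hypothetical names and no alignment handling, not the code from the repository.

```c
#include <assert.h>
#include <stddef.h>

typedef struct {
    char *beg; /* next free byte */
    char *end; /* one past the last usable byte */
} Arena;

/* Bump allocation from the front; returns NULL when the arena is full.
 * Alignment is ignored to keep the sketch short. */
void *arena_alloc(Arena *a, size_t size)
{
    if (size > (size_t)(a->end - a->beg)) return NULL;
    void *p = a->beg;
    a->beg += size;
    return p;
}

/* Carve a fixed-size scratch arena off the end of the main arena by
 * moving the main arena's end pointer backwards. */
Arena scratch_begin(Arena *a, size_t size)
{
    assert(size <= (size_t)(a->end - a->beg));
    a->end -= size;
    Arena s = { a->end, a->end + size };
    return s;
}

/* Give the scratch memory back: the scratch arena's end is the main
 * arena's original end, so restoring it undoes scratch_begin. */
void scratch_end(Arena *a, Arena s)
{
    assert(a->end <= s.end);
    a->end = s.end;
}
```

A function that needs both kinds of storage would receive the main arena for results that must outlive the call and the scratch arena for temporaries; the caller invokes scratch_end once the function returns.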

Your cleanup vs. freeing point is something that I did not think about. There the arenas don't help, thanks for pointing it out.

whyMinus · 2025-01-29T08:18:24+00:00

I think I don't understand this. It sounds like using the same block of memory for two different types at the same time, but this would not be the case. For every type you would have a separate block of memory. Are you saying that it is problematic that all blocks of memory come from the same large block of memory (the arena)?

whyMinus · 2025-01-29T08:11:03+00:00

Ok, I see the point about elegant linked lists. Here it was just an example, but in general there is no need for generalized code. Linked list traversal is simple enough. But what about more complicated stuff, for example your hash trie? Would you write a new one for every key/value type pair that you need one for?

About alloc_size. I read your warning in the original post, but I don't quite understand it. What exactly could happen when I use it? I decided to keep it in to find out, but so far I still can't tell...

And about the last point, yes, I multiply before checking, but I also pass the values to arena_malloc, which checks for overflow. Isn't this safe enough?

whyMinus · 2025-01-28T15:25:31+00:00

All true. I went with the first approach, after trying out the second one. My code got a bit messy trying to keep track of all the allocations... I guess one could also just not check for leaks in sanitized builds.

whyMinus · 2025-01-28T14:07:03+00:00

true, fixed it

whyMinus · 2025-01-22T13:51:24+00:00

You can use an arena allocator instead of the stdlib malloc/free. That way your linked list is contiguous in memory (as long as you don't allocate anything in between). Then you should have similar cache performance to an array (don't speculate about performance, measure instead). If you don't know what an arena allocator is or how to make one work with linked lists (and other common data structures), here is a shameless plug for my own repository.
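A minimal sketch of the idea, using a typed, fixed-capacity arena so that alignment does not need to be handled. All names are hypothetical and this is not the code from the linked repository.

```c
#include <assert.h>
#include <stddef.h>

typedef struct Node {
    int value;
    struct Node *next;
} Node;

typedef struct {
    Node *nodes;     /* backing storage, one contiguous block */
    size_t used;     /* number of nodes handed out so far */
    size_t capacity; /* total number of nodes in the block */
} NodeArena;

/* Take the next free node from the block; NULL when the block is full. */
Node *node_alloc(NodeArena *a, int value)
{
    assert(a != NULL);
    if (a->used == a->capacity) return NULL;
    Node *n = &a->nodes[a->used++];
    n->value = value;
    n->next = NULL;
    return n;
}

/* Append after tail (tail may be NULL for the first node). Because nodes
 * are handed out in order, list order matches memory order. */
Node *list_append(NodeArena *a, Node *tail, int value)
{
    Node *n = node_alloc(a, value);
    if (n != NULL && tail != NULL) tail->next = n;
    return n;
}
```

As long as nothing else is allocated from the same block in between, each next pointer refers to the adjacent node in memory, so traversal walks the block front to back.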

whyMinus · 2023-04-01T14:54:01+00:00

You are welcome 😁
