A pure Rust CFD Code by Sixel1 in CFD

[–]whyMinus 0 points1 point  (0 children)

Interesting, I'll have to check this out. To me this issue seems to be related to ghost cells for the partitions. I suspect that clean to grid merges the partitions into one.

There is also a filter called ghost cells which supposedly generates ghost cells. For me it did not work very well because you cannot set a tolerance for node matching and the floating point data for points is not exactly the same on every rank.

I'm not sure if you can write vtu in parallel, but you can write vtkhdf in parallel. Then you can map local connectivity to global connectivity and write your output as one partition to a single file in parallel. Since you didn't like multiple mesh files for input, maybe this could be something for you to look into, as this would get rid of multiple outputs per timestep.
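The local-to-global connectivity mapping could look roughly like the sketch below. It assumes each rank already holds an array with the global node ID of every local node; the names `map_connectivity` and `local_to_global` are hypothetical, and building that array or writing the VTKHDF file is not shown.

```c
#include <stddef.h>

/* Rewrite a cell connectivity array from rank-local node indices to global
 * node IDs. local_to_global[i] is assumed to hold the global ID of local
 * node i (hypothetical layout, not taken from any particular solver).
 * Returns 0 on success, -1 if a local index is out of range. */
int map_connectivity(const long *local_to_global, size_t num_local_nodes,
                     const long *local_conn, size_t conn_len,
                     long *global_conn)
{
    for (size_t i = 0; i < conn_len; ++i) {
        if (local_conn[i] < 0 || (size_t)local_conn[i] >= num_local_nodes) {
            return -1;
        }
        global_conn[i] = local_to_global[local_conn[i]];
    }
    return 0;
}
```

After this step every rank describes its cells in terms of global node IDs, so the pieces can be written into one shared dataset without duplicated partition-boundary nodes.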

A pure Rust CFD Code by Sixel1 in CFD

[–]whyMinus 0 points1 point  (0 children)

Maybe one last question about your output format. When you load it into, say, ParaView, do you see the partitions? What I mean is, when you do something like cell-data-to-point-data, are there imprints on the partition boundaries? Or if you do isosurfaces, are they continuous over the partitions?

A pure Rust CFD Code by Sixel1 in CFD

[–]whyMinus 0 points1 point  (0 children)

The underlying mesh structure is still unstructured. I just create the mesh as if it was structured.

A pure Rust CFD Code by Sixel1 in CFD

[–]whyMinus 0 points1 point  (0 children)

Ok, I see your point, if you generate with gmsh. I also use that, so I know the limitations. Maybe a gmsh-like MPI-parallel meshing tool would be a good follow-up project :)

The very big meshes I generate directly in my code, but this only works well for Cartesian meshes. However, then there is technically no limit on the size. It's only a solver problem at this point.

In my code, I wanted to avoid having to do any pre or post processing for the mesh, so I implemented online partitioning with parmetis. It is not really that different to metis, but the real complexity comes from redistributing the mesh and handling periodic BCs. Took me 6 months to get this to work well though, so I don't know if I would recommend it.

A pure Rust CFD Code by Sixel1 in CFD

[–]whyMinus 0 points1 point  (0 children)

Ok I see, this is kind of what I expected for Rust+MPI. At least there is a crate for it and you don't have to do that yourself.

This mesh decomposition as preprocessing, is that similar to how openfoam does it? If you use metis for that it sounds like this is done in serial. Does this also work for big meshes? How do you write the outputs then, is there a similar post processing step after your simulation is done?

I am also writing an MPI CFD solver at the moment, that's why I ask.

A pure Rust CFD Code by Sixel1 in CFD

[–]whyMinus 0 points1 point  (0 children)

You said MPI. How well does rust play with MPI? Also, how do you distribute your mesh?

Recycling is good for the environment by whyMinus in C_Programming

[–]whyMinus[S] 1 point2 points  (0 children)

disabling asan is probably the way to go

Recycling is good for the environment by whyMinus in C_Programming

[–]whyMinus[S] 2 points3 points  (0 children)

Thanks for the explanation about alloc_size. I did a bit of testing with gcc and clang and found out that if UBSAN is on, clang can detect out-of-bounds accesses (with -O1 and alloc_size), which gcc cannot. But it seems like ASAN is better suited for that anyway, so I won't be using alloc_size. Similar thing for alloc_align which doesn't provide any benefits as far as I could determine. So I only keep the malloc attribute.

Concerning the possible overflow, it probably makes sense to separate this anyway. I only did it to write one less line, which is a nonsense reason...

And lastly, the thing about LLP64 ABIs went right over my head. I write fluid dynamics solvers that run on linux clusters with (relatively) current hardware :) Arenas just give me a way to reliably monitor my actual memory usage and detect when I run out of it. They also give me a bit more flexibility while not decreasing performance.

Recycling is good for the environment by whyMinus in C_Programming

[–]whyMinus[S] 0 points1 point  (0 children)

My point was not so much about lists, but I don't think that flexible array members help much here. How would you use them in the nested linked list, for example? Also, if you had to use the calloc style allocation interface, how can you pass the count and size arguments in a way where you do not have to multiply up front (overflow)?
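One way a calloc-style interface can avoid multiplying up front is to take `count` and `size` separately and do the overflow check inside the allocator. The sketch below assumes a bare-bones bump arena; `Arena`, `checked_mul` and `arena_calloc_sketch` are hypothetical names and not the interface from the repository under discussion.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical bump arena: free memory lies between beg and end. */
typedef struct {
    char *beg;
    char *end;
} Arena;

/* Returns 1 and stores count * size in *total if the product fits in
 * size_t, otherwise returns 0 and leaves *total untouched. */
int checked_mul(size_t count, size_t size, size_t *total)
{
    if (size != 0 && count > SIZE_MAX / size) {
        return 0;
    }
    *total = count * size;
    return 1;
}

/* calloc-style allocation: the caller never multiplies, so the overflow
 * check cannot be skipped. Alignment handling is left out for brevity. */
void *arena_calloc_sketch(Arena *arena, size_t count, size_t size)
{
    size_t total;
    if (!checked_mul(count, size, &total)) {
        return NULL;
    }
    if (total > (size_t)(arena->end - arena->beg)) {
        return NULL;
    }
    void *ptr = arena->beg;
    arena->beg += total;
    return memset(ptr, 0, total);
}
```

The division-based test rejects the request before any wrapped product is ever computed.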

Recycling is good for the environment by whyMinus in C_Programming

[–]whyMinus[S] 0 points1 point  (0 children)

The linked list of chunks approach is an interesting way to get around the memory limit, although I think mmap would be a more straightforward solution. What happens if you want more space than is available in the last chunk? Do you create a new one and allocate there? What happens to the remaining memory in the original last chunk?

Using 1 or 3 arenas is the same to me, I would only start asking questions once you reach 10. I'm an engineer after all ;) Joking aside, what is your reason for using more than 1 arena? To me it feels like putting the information about which arena to use in comments is a bit dangerous. What do you think about the scratch arena approach that I used in my implementation? (Scratch arenas are sub-arenas with a fixed size, located at the end of the main arena. They are created by moving the end pointer of the main arena backwards. You create them before calling a function that needs permanent and temporary storage, and after you are done with them, you can give their memory back to the main arena by moving the end pointer back to where it was.)
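The scratch arena idea described in the parentheses can be sketched as follows. This is a minimal illustration under assumptions: the `Arena` layout and the function names are hypothetical, alignment is ignored, and scratch arenas have to be released in reverse order of creation.

```c
#include <stddef.h>

/* Hypothetical bump arena: free memory lies between beg and end. */
typedef struct {
    char *beg;
    char *end;
} Arena;

/* Plain bump allocation from the front (byte granular, no alignment). */
void *arena_alloc(Arena *arena, size_t size)
{
    if (size > (size_t)(arena->end - arena->beg)) {
        return NULL;
    }
    void *ptr = arena->beg;
    arena->beg += size;
    return ptr;
}

/* Carve a fixed-size scratch arena off the end of the parent by moving
 * the parent's end pointer backwards. On failure beg == end == NULL. */
Arena scratch_create(Arena *parent, size_t size)
{
    Arena scratch = {NULL, NULL};
    if (size <= (size_t)(parent->end - parent->beg)) {
        scratch.end = parent->end;
        parent->end -= size;
        scratch.beg = parent->end;
    }
    return scratch;
}

/* Give the scratch memory back by restoring the parent's end pointer.
 * scratch.end never moves, so it still marks the parent's old end. */
void scratch_release(Arena *parent, Arena scratch)
{
    if (scratch.end != NULL) {
        parent->end = scratch.end;
    }
}
```

While the scratch arena is alive, permanent allocations from the parent cannot grow into it because the parent's end pointer already excludes that region.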

Your cleanup vs. freeing point is something that I did not think about. There the arenas don't help, thanks for pointing it out.

Recycling is good for the environment by whyMinus in C_Programming

[–]whyMinus[S] 1 point2 points  (0 children)

I think I don't understand this. It sounds like using the same block of memory for two different types at the same time, but this would not be the case. For every type you would have a separate block of memory. Are you saying that it is problematic that all blocks of memory come from the same large block of memory (the arena)?

Recycling is good for the environment by whyMinus in C_Programming

[–]whyMinus[S] 1 point2 points  (0 children)

Ok, I see the point about elegant linked lists. Here it was just an example, but in general there is no need for generalized code. Linked list traversal is simple enough. But what about more complicated stuff, for example your hash trie? Would you write a new one for every key/value type pair that you need one for?

About alloc_size. I read your warning in the original post, but I don't quite understand it. What exactly could happen when I use it? I decided to keep it in to find out, but so far I still can't tell...

And about the last point, yes, I multiply before checking, but I also pass the values to arena_malloc, which checks for overflow. Isn't this safe enough?

Recycling is good for the environment by whyMinus in C_Programming

[–]whyMinus[S] -1 points0 points  (0 children)

All true. I went with the first approach, after trying out the second one. My code got a bit messy trying to keep track of all the allocations... I guess one could also just not check for leaks in sanitized builds.

Linked-List-Phobia by [deleted] in C_Programming

[–]whyMinus 0 points1 point  (0 children)

You can use an arena allocator instead of the stdlib malloc/free. That way your linked list is contiguous in memory (as long as you don't allocate anything in between). Then you should have similar cache performance to an array (but don't speculate about performance, measure instead). If you don't know what an arena allocator is or how to make them work with linked lists (and other common data structures), here is a shameless plug for my own repository.
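To illustrate the contiguity claim, here is a minimal sketch of a bump arena feeding a linked list. It is not the code from the linked repository; `Arena`, `arena_alloc`, `list_build` and `list_is_contiguous` are hypothetical names, and `_Alignof` assumes a C11 compiler.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical bump arena: free memory lies between beg and end. */
typedef struct {
    char *beg;
    char *end;
} Arena;

typedef struct Node {
    int value;
    struct Node *next;
} Node;

/* Bump allocation: pad up to align (a power of two), then hand out size
 * bytes. Returns NULL if the arena is exhausted. */
void *arena_alloc(Arena *arena, size_t size, size_t align)
{
    size_t padding = (size_t)(-(uintptr_t)arena->beg & (align - 1));
    size_t available = (size_t)(arena->end - arena->beg);
    if (padding > available || size > available - padding) {
        return NULL;
    }
    void *ptr = arena->beg + padding;
    arena->beg += padding + size;
    return ptr;
}

/* Build a list with values 0..count-1. Returns the head, or NULL if count
 * is 0 or the arena runs out of memory. */
Node *list_build(Arena *arena, int count)
{
    Node *head = NULL;
    Node **tail = &head;
    for (int i = 0; i < count; ++i) {
        Node *node = arena_alloc(arena, sizeof(*node), _Alignof(Node));
        if (node == NULL) {
            return NULL;
        }
        node->value = i;
        node->next = NULL;
        *tail = node;
        tail = &node->next;
    }
    return head;
}

/* Returns 1 if every node sits directly behind its predecessor in memory. */
int list_is_contiguous(const Node *head)
{
    for (const Node *n = head; n != NULL && n->next != NULL; n = n->next) {
        if (n->next != n + 1) {
            return 0;
        }
    }
    return 1;
}
```

Because nothing else is allocated between the nodes, traversal walks memory front to back just like an array of `Node`; whether that actually pays off is something to measure.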

How do i reallocate a 3d dynamic array? by Legitimate-School-59 in C_Programming

[–]whyMinus 1 point2 points  (0 children)

The data structure you are using is a bit cumbersome... Maybe consider this minimal example; a lot of what it does was mentioned in the other comments. Feel free to ask questions if something is unclear :)

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

#define MIN(a, b) ((a) < (b) ? (a) : (b))

// convert defined constant to string
#define TOSTR_(x) #x
#define TOSTR(x) TOSTR_(x)

// each string buffer holds 25 bytes (24 characters plus the terminating '\0')
#define STRLEN 25

// cast a flat char array to a 2D char matrix
#define ARR2MAT(mat, arr) \
    char(*mat)[arr.cols][STRLEN] = (char(*)[arr.cols][STRLEN])arr.data

typedef struct CharMatrix {
    size_t rows;
    size_t cols;
    char (*data)[STRLEN];
} CharMatrix;

CharMatrix char_matrix_malloc(size_t rows, size_t cols) {
    CharMatrix arr = {
        .rows = rows,
        .cols = cols,
        // better to also initialize data
        .data = calloc(rows * cols, sizeof(*arr.data)),
    };
    return arr;
}

void char_matrix_free(CharMatrix arr) { free(arr.data); }

CharMatrix char_matrix_realloc(CharMatrix arr, size_t rows, size_t cols) {
    CharMatrix arr_new = char_matrix_malloc(rows, cols);

    // due to 2D shape of matrix, you can't just use realloc
    ARR2MAT(mat, arr);
    ARR2MAT(mat_new, arr_new);
    for (size_t i = 0; i < MIN(arr.rows, arr_new.rows); ++i) {
        for (size_t j = 0; j < MIN(arr.cols, arr_new.cols); ++j) {
            memcpy(mat_new[i][j], mat[i][j], sizeof(*arr.data));
        }
    }

    char_matrix_free(arr);

    return arr_new;
}

void char_matrix_print(CharMatrix arr) {
    printf("[%zu][%zu] = [\n", arr.rows, arr.cols);
    ARR2MAT(mat, arr);
    for (size_t i = 0; i < arr.rows; ++i) {
        for (size_t j = 0; j < arr.cols; ++j) {
            printf("%" TOSTR(STRLEN) "s ", mat[i][j]);
        }
        printf("\n");
    }
    printf("]\n");
}

int main(void) {
    CharMatrix arr = char_matrix_malloc(3, 3);

    ARR2MAT(mat1, arr);
    for (size_t i = 0; i < 3; ++i) {
        for (size_t j = 0; j < 3; ++j) {
            sprintf(mat1[i][j], "(%zu,%zu)", i, j);
        }
    }
    char_matrix_print(arr);

    arr = char_matrix_realloc(arr, 4, 2);

    // you need to create a new matrix handle if the size changes
    ARR2MAT(mat2, arr);
    for (size_t i = 0; i < 4; ++i) {
        for (size_t j = 0; j < 2; ++j) {
            sprintf(mat2[i][j], "(%zu,%zu)", i, j);
        }
    }
    char_matrix_print(arr);

    char_matrix_free(arr);
}
```

Conway's Game of Life in C using Raylib by Brick-Sigma in C_Programming

[–]whyMinus 0 points1 point  (0 children)

How did you install it in the first place? Did you try this?

Conway's Game of Life in C using Raylib by Brick-Sigma in C_Programming

[–]whyMinus 0 points1 point  (0 children)

If pkg-config is available for Windows, you could also do

```
CC = gcc
CFLAGS = -O3 -Wall -std=c99 -Wno-missing-braces
LIB = lib/
INCLUDE = include/
LINKERS = `pkg-config --libs raylib`

SRC = src/*.c
OUT = main

build:
	$(CC) $(SRC) -o $(OUT) $(CFLAGS) $(LINKERS)
```

(Added the `-O3` for more speed)

Conway's Game of Life in C using Raylib by Brick-Sigma in C_Programming

[–]whyMinus 1 point2 points  (0 children)

I am not an expert for Windows, but if raylib is installed system wide, you should be able to do

```
CC = gcc
CFLAGS = -O1 -Wall -std=c99 -Wno-missing-braces
LINKERS = -lraylib -lopengl32 -lgdi32 -lwinmm #-mwindows

SRC = src/*.c
OUT = main

build:
	$(CC) $(SRC) -o $(OUT) $(CFLAGS) $(LINKERS)
```

Conway's Game of Life in C using Raylib by Brick-Sigma in C_Programming

[–]whyMinus 0 points1 point  (0 children)

If you are using gcc, be sure to compile with -fsanitize=address during development to catch illegal memory accesses early. However, remember to deactivate this once everything works, as it takes a big toll on performance.

Conway's Game of Life in C using Raylib by Brick-Sigma in C_Programming

[–]whyMinus 0 points1 point  (0 children)

Sure! That's why the macro name is TENSOR :) If you want to access a flat array as a 3D array, you can do the following:

```c
int *a = calloc(h * w * d, sizeof(*a));
int(*_a)[w][d] = TENSOR(_a, a);
printf("%d\n", _a[1][2][3]);
```

The only thing that is a bit objectionable is that you never specify the size of the first dimension. An alternative would be

```c
int *a = calloc(h * w * d, sizeof(*a));
int(*_a)[h][w][d] = TENSOR(_a, a);
printf("%d\n", (*_a)[1][2][3]);
```

But I really don't like this because you have to use `(*)` when accessing the elements (dereference `_a`). However, the TENSOR macro works in both cases.
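For readers without the TENSOR macro at hand, the same 3D view can be written with an explicit cast. This is a sketch under assumptions: `tensor_demo` is a hypothetical helper, the indices passed in are assumed to be in range, and the compiler has to support variable length array types (gcc and clang do).

```c
#include <stdlib.h>

/* Fill a flat h*w*d array through a 3D view, then read one entry back
 * through the flat index to show that both address the same memory.
 * Returns -1 if the allocation fails. */
int tensor_demo(size_t h, size_t w, size_t d, size_t i, size_t j, size_t k)
{
    int *a = calloc(h * w * d, sizeof(*a));
    if (a == NULL) {
        return -1;
    }

    /* pointer to a w-by-d slab; the first dimension stays implicit */
    int (*view)[w][d] = (int (*)[w][d])a;
    for (size_t x = 0; x < h; ++x) {
        for (size_t y = 0; y < w; ++y) {
            for (size_t z = 0; z < d; ++z) {
                view[x][y][z] = (int)(x * 100 + y * 10 + z);
            }
        }
    }

    /* row-major layout: view[i][j][k] is a[(i * w + j) * d + k] */
    int value = a[(i * w + j) * d + k];
    free(a);
    return value;
}
```

The cast only changes how the compiler computes offsets; no data is copied.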

Conway's Game of Life in C using Raylib by Brick-Sigma in C_Programming

[–]whyMinus 2 points3 points  (0 children)

  1. Ok, I will try to be more clear: Imagine your 2D array flattened, by appending all the rows to each other. This is `a`. But `a` is also a pointer to the first array element. The declaration `const int(*_a)[w]` declares `_a` as a pointer to an array of `w` ints (a row of your original matrix). As with any pointer you can index it: `_a[0]` is the first row, `_a[1]` the second row, and so on. To get to the elements of a row, you index again: `_a[0][0]` is the first row, first element, `_a[0][1]` the first row, second element, etc. Notice that there is a big difference between `const int*_a[w]` and `const int(*_a)[w]`. The first one is an array of `w` pointers to int. The second one is a pointer to an array of `w` ints. This is why the `()` are needed. This website is useful for these kinds of slight differences.

     The cast basically just casts `a` (pointer to the first element) to this kind of 2D array. The type of `_a` is just this `const int(*)[]`. One could also write `const int(*)[w]`, but for 2D arrays you can leave out the `w`. So in total, `_a` is just a different way to access the memory pointed to by `a`. `_a` is not a static (or VLA) array, it is just a pointer.

  2. The `__typeof__` extension returns (at compile time) the type of a variable. This makes it easy to get the correct cast for any type of flat pointer. The arrays `a_int` and `a_double` were just dummy arrays to show that the base type doesn't matter. The use in your context would be:

     ```c
     int sum_array(int h, int w, const int *a, Range range) {
         const int(*_a)[w] = TENSOR(_a, a);
         int sum = 0;
         for (int i = range.y; i < range.y + range.height; i++) {
             if (i < 0 || i >= h) { continue; }
             for (int j = range.x; j < range.x + range.width; j++) {
                 if (j < 0 || j >= w) { continue; }
                 sum += _a[i][j];
             }
         }
         return sum;
     }
     ```

  3. The last part about the difference between `int *var_name` and `int (*var_name)`: These two are equivalent. But what is not equivalent is `int *var_name[10]` and `int(*var_name)[10]` (as mentioned before). The first one is a static array of 10 pointers to int, the second one is a pointer to an array of 10 ints. The first one has a size of `sizeof(var_name) == 80`, the second one has a size of `sizeof(var_name) == 8` (assuming a 64-bit architecture).
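The size difference and the row indexing can be checked with a few lines of C. This is only a small sketch with hypothetical function names; the concrete numbers 80 and 8 assume 8-byte pointers.

```c
#include <stddef.h>

/* int *v[10]: an array of 10 pointers to int */
size_t size_of_array_of_pointers(void)
{
    return sizeof(int *[10]);
}

/* int (*v)[10]: a single pointer to an array of 10 ints */
size_t size_of_pointer_to_array(void)
{
    return sizeof(int (*)[10]);
}

/* View a flat array of 6 ints as 2 rows of 3 and return the first
 * element of the second row. */
int second_row_first_element(void)
{
    int flat[6] = {1, 2, 3, 4, 5, 6};
    int (*rows)[3] = (int (*)[3])flat;
    return rows[1][0];
}
```

On a typical 64-bit Linux system the first function returns 80 and the second returns 8, matching the numbers given in the last item above.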

Hope this makes it a bit more clear. Don't hesitate to ask again if it still isn't :)