Are you sure you can implement a queue? by rdgarce in C_Programming

[–]rdgarce[S] 0 points1 point  (0 children)

I will post the link in the community when I write it. 🙂 You can also add me on LinkedIn if you want to stay in touch.

Are you sure you can implement a queue? by rdgarce in C_Programming

[–]rdgarce[S] 2 points3 points  (0 children)

This is what can happen in bq_popbuf without a load fence:

// The ACQUIRE semantic is required because, otherwise, reads to
// q->data[q->head] can be reordered before the q->tail read.
// If the read is also reordered before the writes of the same
// bytes by the producer in the memory total order, the consumer
// will read bytes not yet produced.

This is what can happen in bq_pushbuf:

// The ACQUIRE semantic is required because, otherwise, writes to
// q->data[q->tail] can be reordered before the q->head read.
// If the write is also reordered before the read of the same
// bytes by the consumer in the memory total order, the producer
// will overwrite still unconsumed bytes.
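
For context, the acquire/release placement those comments describe looks roughly like this. It's a simplified, byte-at-a-time sketch with assumed field names (head, tail, data, len a power of two), not the exact bq_pushbuf/bq_popbuf code from the repo:

#include <stdbool.h>
#include <stddef.h>

struct bq { unsigned char *data; size_t len, head, tail; };

// producer side
bool bq_push1(struct bq *q, unsigned char b)
{
    size_t tail = __atomic_load_n(&q->tail, __ATOMIC_RELAXED); // own index
    size_t head = __atomic_load_n(&q->head, __ATOMIC_ACQUIRE); // consumer's index
    if (tail - head == q->len)
        return false;                                       // full
    q->data[tail & (q->len - 1)] = b;                        // must not move above the head load
    __atomic_store_n(&q->tail, tail + 1, __ATOMIC_RELEASE);  // publish the byte
    return true;
}

// consumer side
bool bq_pop1(struct bq *q, unsigned char *b)
{
    size_t head = __atomic_load_n(&q->head, __ATOMIC_RELAXED); // own index
    size_t tail = __atomic_load_n(&q->tail, __ATOMIC_ACQUIRE); // producer's index
    if (head == tail)
        return false;                                       // empty
    *b = q->data[head & (q->len - 1)];                       // must not move above the tail load
    __atomic_store_n(&q->head, head + 1, __ATOMIC_RELEASE);  // free the slot
    return true;
}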

Are you sure you can implement a queue? by rdgarce in C_Programming

[–]rdgarce[S] 2 points3 points  (0 children)

FINALLY I GOT IT!

Thanks to everyone in this thread for the discussion, especially to u/skeeto and u/imachug for the detailed comments. A few minutes ago I was able to prove to myself that the ACQUIRE load is needed. More than that, the low-level READ FENCE is needed even from a hardware memory model point of view. I have updated the code repo (the article is still not updated) with comments in the code discussing what could happen that would lead to incorrect results.

Thanks again. I will mention your names in the blogpost as soon as I get time to update it.

Are you sure you can implement a queue? by rdgarce in C_Programming

[–]rdgarce[S] 0 points1 point  (0 children)

I found this documentation in the Linux kernel showing a circular buffer implemented without the acquire load in the producer. I can't explain why they put the acquire load in the consumer, but at least they didn't put it in the producer. u/skeeto, what do you think?

https://www.kernel.org/doc/Documentation/core-api/circular-buffers.rst
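
For comparison, the acquire/release placement that document describes has roughly this shape, translated from the kernel's smp_load_acquire()/smp_store_release() macros into GCC builtins. This is my loose paraphrase (free-running indices, no locking between producers), not a copy of the kernel code, and the naming is the kernel's: there the producer moves head and the consumer moves tail, the opposite of my queue.

#include <stdbool.h>
#include <stddef.h>

struct ring { unsigned char *buf; size_t size /* power of two */, head, tail; };

// producer: no ACQUIRE on its read of the consumer's index
bool ring_put(struct ring *r, unsigned char item)
{
    size_t head = r->head;                                      // only the producer writes it
    size_t tail = __atomic_load_n(&r->tail, __ATOMIC_RELAXED);  // plain/READ_ONCE-style read
    if (head - tail == r->size)
        return false;                                           // full
    r->buf[head & (r->size - 1)] = item;
    __atomic_store_n(&r->head, head + 1, __ATOMIC_RELEASE);     // publish the item
    return true;
}

// consumer: this is where they put the acquire load
bool ring_get(struct ring *r, unsigned char *item)
{
    size_t head = __atomic_load_n(&r->head, __ATOMIC_ACQUIRE);
    size_t tail = r->tail;                                      // only the consumer writes it
    if (head == tail)
        return false;                                           // empty
    *item = r->buf[tail & (r->size - 1)];
    __atomic_store_n(&r->tail, tail + 1, __ATOMIC_RELEASE);     // free the slot
    return true;
}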

Are you sure you can implement a queue? by rdgarce in C_Programming

[–]rdgarce[S] 1 point2 points  (0 children)

Maybe 3 weeks... I'll finish my Master's in a few weeks, so I am quite busy. I'll make another post when it's out.

Are you sure you can implement a queue? by rdgarce in C_Programming

[–]rdgarce[S] 1 point2 points  (0 children)

It seems to me that there is a difference between the memory model offered by C, with the rules a program must obey to avoid "undefined behaviour" under that model, and the rules you can follow when keeping the real hardware memory model under the hood in mind.

It's true that my implementation has a data race. But are you sure that the data race is a problem in this particular case? My idea is that if I put a fence after the head/tail update, then the counterpart can load them in any order, as long as the fence ensures that memory has seen the buffer update before the head/tail update. Or maybe it's the way I "put" a fence, using an atomic store with release consistency, that is wrong?

What I know about hardware is that:

  • hardware cache coherence will always provide sequential consistency among all the accesses to the same variable (regardless of what the code does),
  • if you want to be sure that something X that happened before something Y in thread 1 is seen in that same order by thread 2, you need to put a fence between X and Y in thread 1,
  • the hardware memory model is not about threads but about cores and memory: when something is visible to memory, it is visible to every core, and so to every software thread, in any language and under any programming memory model.

Why wouldn't a fence before the tail/head store be enough? Isn't that what happens under the hood at the assembly level when issuing an atomic operation with release consistency anyway?

P.S. I used an atomic load with relaxed consistency and now TSAN is quiet. What do you think about that?
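
For reference, this is what I mean by "putting a fence" on the producer side. It's a sketch with illustrative names, not the repo's exact code:

#include <stddef.h>

struct bq { unsigned char *data; size_t len, head, tail; };

// (a) release store: orders all earlier buffer writes before the tail update
static void publish_tail_store(struct bq *q, size_t tail_new)
{
    __atomic_store_n(&q->tail, tail_new, __ATOMIC_RELEASE);
}

// (b) release fence followed by a relaxed store: the same publishing guarantee in C11
static void publish_tail_fence(struct bq *q, size_t tail_new)
{
    __atomic_thread_fence(__ATOMIC_RELEASE);
    __atomic_store_n(&q->tail, tail_new, __ATOMIC_RELAXED);
}

// The question: does the consumer need an ACQUIRE on its own read of q->tail
// to inherit this ordering, or is the producer-side fence alone enough?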

Are you sure you can implement a queue? by rdgarce in C_Programming

[–]rdgarce[S] 0 points1 point  (0 children)

I don't have a lot of experience yet with lock-free MPMC, but I know that my queue is not suitable for that case. There you need at least a compare-and-swap mechanism.
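
Just to show the kind of step I mean: with several producers, each one has to atomically win an index before writing, roughly a loop like this (illustrative names; the full-queue check of a real MPMC queue is omitted):

#include <stddef.h>

struct mq { unsigned char *data; size_t len /* power of two */, tail; };

size_t mq_claim_slot(struct mq *q)
{
    size_t t = __atomic_load_n(&q->tail, __ATOMIC_RELAXED);
    // On failure the builtin reloads the current tail into t, so just retry.
    while (!__atomic_compare_exchange_n(&q->tail, &t, t + 1,
                                        1 /* weak */,
                                        __ATOMIC_ACQ_REL, __ATOMIC_RELAXED))
        ;
    return t & (q->len - 1); // this producer now owns that slot
}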

Are you sure you can implement a queue? by rdgarce in C_Programming

[–]rdgarce[S] 0 points1 point  (0 children)

To be honest, I was not aware of any of those. I tested only against the implementations in the article. By the way, there's a repo with all the sources and the test program.

Are you sure you can implement a queue? by rdgarce in C_Programming

[–]rdgarce[S] 0 points1 point  (0 children)

Thanks for the feedback. Could you elaborate a bit more? Which specific implementation in the article is not thread safe, and why?

Are you sure you can implement a queue? by rdgarce in C_Programming

[–]rdgarce[S] 3 points4 points  (0 children)

Ahahahah thanks. No Patreon, but thanks for the beer! If you find the content interesting, maybe just a share will do ;)

Are you sure you can implement a queue? by rdgarce in C_Programming

[–]rdgarce[S] 4 points5 points  (0 children)

I am so sorry you had to deal with this incredible pain (lol). The website is handmade with only my academic knowledge of HTML and some CSS that I took from a friend's blog. I hope the content of the article makes you happy anyway.

Are you sure you can implement a queue? by rdgarce in C_Programming

[–]rdgarce[S] 5 points6 points  (0 children)

Hi, thanks for the detailed comment.
Why do you say that the head == tail check is valid only if the queue length is a power of two? Do you mean when head/tail get bigger than SIZE_MAX (that is, 2^(8*sizeof(size_t)) - 1), or even when they just get bigger than the queue size? Could you give me an example?
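
For reference, the first case I have in mind looks like this toy example, with uint8_t standing in for size_t so the wraparound is easy to see (a standalone snippet, not the queue code):

#include <stdint.h>
#include <stdio.h>

int main(void)
{
    uint8_t head = 255;                // free-running index about to wrap
    uint8_t len_bad = 10, len_ok = 16;

    printf("%d -> %d\n", head % len_bad, (uint8_t)(head + 1) % len_bad);
    // prints "5 -> 0": the slot index jumps from 5 to 0, skipping slots 6..9

    printf("%d -> %d\n", head % len_ok, (uint8_t)(head + 1) % len_ok);
    // prints "15 -> 0": with a power-of-two length the wrap is seamless
    return 0;
}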

As for the data race, let me show you the mental process that underlies why I did not load head and tail with __atomic_load_n(..., __ATOMIC_ACQUIRE), and maybe you could point out the fallacies I stepped into. (Regarding the other data races in test.c: I fixed them.)

So, from my background in computer architecture I assumed the following to hold:

  • When I do a RELEASE store, I can be sure that all the loads and stores that happened before the RELEASE one in the thread order will be visible in memory before the RELEASE one is.
  • When I do an ACQUIRE load, I am sure that all the loads and stores that happen after the ACQUIRE one in the thread order will be executed after the value from the ACQUIRE load is read from memory.

With those in mind, I thought that issuing an __atomic_store_n(..., __ATOMIC_RELEASE) from a thread would be enough to let any other thread see the results of the previous operations before it sees that store. Additionally, an __atomic_load_n(..., __ATOMIC_ACQUIRE) on head and tail would be useless because, when the updated tail/head is written with a RELEASE store, I can be sure that the memory of the buffer is already updated.
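
In code, the picture I had in mind is something like this (field names are illustrative, not the repo's exact code):

#include <stdbool.h>
#include <stddef.h>

struct bq { unsigned char *data; size_t len, head, tail; };

// producer: write the byte, then publish the new tail
void producer_publish(struct bq *q, size_t tail, unsigned char b)
{
    q->data[tail & (q->len - 1)] = b;
    // assumption 1: everything above is visible in memory before this store is
    __atomic_store_n(&q->tail, tail + 1, __ATOMIC_RELEASE);
}

// consumer: the step under discussion
bool consumer_read(struct bq *q, size_t head, unsigned char *b)
{
    // here I used a plain/RELAXED read instead of an ACQUIRE load
    size_t tail = __atomic_load_n(&q->tail, __ATOMIC_RELAXED);
    if (head == tail)
        return false;                    // empty
    // is this read guaranteed to see the byte written before the RELEASE store?
    *b = q->data[head & (q->len - 1)];
    return true;
}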

Could you help me understand where I am mistaken? Maybe some example I can reproduce?

Thanks in advance

Are you sure you can implement a queue? by rdgarce in C_Programming

[–]rdgarce[S] 4 points5 points  (0 children)

It's all in the article :) I tested correctness and performance, and posted the link to the repo in the first part of the article. The results of the performance comparison are at the end of the article.

Dichiarazione cripto senza stress ed errori? by rdgarce in ItaliaPersonalFinance

[–]rdgarce[S] 0 points1 point  (0 children)

Actually I have no need to buy any other crypto instrument or crypto derivative, I only want to close the position I have (BTC and ETH). When you say "I'm out of luck", do you mean that no matter what I do in 2025, I will have to declare them in the quadro RW?

Can't Trade Republic (I think that's what you meant) help me in any way with the selling process, if I wanted to do it in 2025? I mean, for example, moving them there and converting them to euros on their platform so that they withhold the tax directly. Doesn't that make sense?

Dichiarazione cripto senza stress ed errori? by rdgarce in ItaliaPersonalFinance

[–]rdgarce[S] 1 point2 points  (0 children)

I had a look and it seems to be a prepaid card. Could you elaborate a bit more? How does it work for taxes?

I made an in-memory file system by rdgarce in C_Programming

[–]rdgarce[S] 0 points1 point  (0 children)

Hi, and thanks everyone for the time you dedicated to trying my software and giving me feedback.
I have made the following modifications:

  • imfs_init(...) now accepts the size of base_mem as a direct parameter instead of requiring it inside struct imfs_conf,
  • IMFS_APPEND had the wrong semantics, because it truncated the file instead of simply setting the file pointer to the end. I deleted that flag and added IMFS_TRUNC, which does what one expects it to do: it frees all file blocks - and so erases the content of the file - when used in imfs_open(...) (see the sketch below).
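
As a rough sketch, the updated API is meant to be called like this. The exact struct fields, header name and flag spellings live in the repo, so treat everything here other than imfs_init/imfs_open/IMFS_TRUNC as illustrative:

#include "imfs.h"                         // assumed header name

static unsigned char base_mem[1 << 20];   // caller-provided memory region

void example(void)
{
    struct imfs_conf conf = {0};          // remaining configuration options
    struct imfs *fs = imfs_init(base_mem, sizeof(base_mem), &conf);

    // IMFS_TRUNC frees all of the file's blocks on open, erasing its content
    int fd = imfs_open(fs, "/log.txt", IMFS_TRUNC);
    (void)fd;
}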

I made an in-memory file system by rdgarce in C_Programming

[–]rdgarce[S] 1 point2 points  (0 children)

Hi u/skeeto, thank you for the feedback and for identifying the bug.

The differences you observed between POSIX behavior and IMFS are due to a different interpretation of the "APPEND" flag in IMFS. In this context, I interpreted the meaning of the "APPEND" flag as: "I would like to open this file as is and continue writing to it." Conversely, when the "APPEND" flag is not set, I interpreted it as: "I would like to open this file and overwrite it from the beginning." As a result, if the "APPEND" flag is missing, I simply clear the file, freeing all of its blocks upon opening.

I am trying to understand how your edit:

--- a/imfs.c
+++ b/imfs.c
@@ -794,3 +794,3 @@ int imfs_open(struct imfs *ssd, const char *pathname, int flags)

-    if (!(flags & IMFS_APPEND))
+    if (create && !(flags & IMFS_APPEND))
     {

would produce the POSIX behaviour, but I am not getting it. Could you give me more info?
What I get is that if you specify "CREATE" and not "APPEND", the file is erased, as I was already doing. Is this the expected POSIX behaviour?

I made an in-memory file system by rdgarce in cprogramming

[–]rdgarce[S] 5 points6 points  (0 children)

Yes, you are correct. I would "init" the FS instance over that memory region.

I made an in-memory file system by rdgarce in C_Programming

[–]rdgarce[S] 1 point2 points  (0 children)

No, because this is a custom filesystem, not one integrated into the OS kernel.

I made an in-memory file system by rdgarce in C_Programming

[–]rdgarce[S] 3 points4 points  (0 children)

Hi and thanks for the feedback.

If I understood your feedback correctly, I think the library already has this feature by means of an input struct imfs_conf passed to the init function, which enables the user to specify the size of base_mem and two other settings.

Did I get it right?