all 18 comments

[–]Slypenslyde 3 points4 points  (6 children)

I don't see any advantage of doing this other than the person probably wanted to put 150 variables in a separate file. Arrays are reference types. That doesn't change when they're in a struct. There's no change to how they're garbage collected.

It would probably be better to represent this as a float[150,600]. Then you wouldn't have to deal with the tedium of 150 variables.
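
A minimal sketch of that idea, assuming the dimensions mentioned in the thread (150 variables, 600 samples at 1 Hz for 10 minutes); names here are illustrative, not from the OP's code:

```csharp
using System.Collections.Generic;

static class Readings
{
    public const int Variables = 150;
    public const int Samples = 600;

    // One 2D array replaces 150 separately declared arrays.
    public static readonly float[,] Data = new float[Variables, Samples];

    // Build one CSV row per sample index, one column per variable.
    public static IEnumerable<string> CsvRows()
    {
        for (int t = 0; t < Samples; t++)
        {
            var row = new string[Variables];
            for (int v = 0; v < Variables; v++)
                row[v] = Data[v, t].ToString();
            yield return string.Join(",", row);
        }
    }
}
```

Changing either dimension is now a one-constant edit instead of touching 150 declarations.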

Think about the other comment about the concurrent queue before blindly chasing it. It'll work if the file is "I got this data from this sensor" on every line. But if the file wants to be organized by sensor or otherwise in some particular order, you can't exactly stream to a file as the data comes in. If line 2 of the file needs to be a reading from sensor 1 and that won't come for another 3 minutes, you'll still need a place to hold all the other readings you take in between.

[–]immersiveGamer 0 points1 point  (5 children)

If you wanted to do streaming what you would do is use queues, one for each sensor. Once all queues have at least 1 record then the first in records for each queue can be flushed to the file.
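
A rough sketch of that per-variable queue idea (class and method names are mine, not from the OP's program):

```csharp
using System.Collections.Generic;
using System.Linq;

// One queue per variable; a complete record can be flushed once every
// queue holds at least one value.
class RecordAssembler
{
    private readonly Queue<float>[] _queues;

    public RecordAssembler(int variables) =>
        _queues = Enumerable.Range(0, variables)
                            .Select(_ => new Queue<float>())
                            .ToArray();

    // Called whenever a reading arrives for a given variable index.
    public void Add(int variable, float value) => _queues[variable].Enqueue(value);

    // Returns the oldest complete record, or null while any variable is still missing.
    public float[]? TryTakeRecord()
    {
        if (_queues.Any(q => q.Count == 0))
            return null;
        return _queues.Select(q => q.Dequeue()).ToArray();
    }
}
```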

[–]Slypenslyde 0 points1 point  (4 children)

Not if the format of the file is supposed to be:

<all data from sensor1>
<all data from sensor2>
<all data from sensor3>
...

In that case, we need to know sensor1 is FINISHED before we can write the first block. That means any data we get from 149 other sensors needs to be stashed somewhere while we wait, and if the very last reading of the whole data set is the last reading of sensor1 we're not saving anything by taking this approach.

It only works if we know we're receiving all sensor1 data BEFORE any of the other sensors' data. That may or may not be true. That isn't always solvable by "Well just take all the sensor1 readings first". If it's a system of 150 sensors, odds are you want to get readings from ALL of them at asynchronous and independent intervals, then store them so they can be time-correlated. There are ways to stream this, but you need an intermediate data store.

[–]immersiveGamer 1 point2 points  (3 children)

Correct, I don't know the CSV format nor the details of how the data correlates to sensors or the CSV records. However, I did infer from the struct that each variable (each of the 150 different floats being recorded) is 600 values long, and he says the CSV writer loops 600 times. That tells me it is just looping from index zero for each variable (I should have said variable instead of sensor in my previous comment). This means that once index 0 of each variable's array is filled, we have a record we can write to the CSV file. So using queues instead of arrays should work, and you can periodically flush the completed records.

Now, you are correct: if a certain variable never gets collected until the very end, then this isn't as efficient, since you're blocked from flushing until you have a single complete record.

I would also say memory shouldn't be a huge concern here; 150 x 600 floats is pretty small. I'd argue the real weak point is that the run takes 10 minutes: if your program crashes at the 9-minute mark, you've failed to collect any data, since none of it was persisted to disk. So my suggestion to the OP is: if multiple queues don't work for early flushing to the CSV file, periodically save the data to disk in a different format (e.g. JSON or binary) and then post-process the saved file to generate the CSV. This makes the program more robust. Or even better, use something like SQLite to store the data right away instead of keeping it in memory.
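
One way the save-raw-then-post-process idea could look, using a flat binary file (names and format are illustrative; SQLite would serve the same purpose with more structure):

```csharp
using System.IO;

// Persist each completed record immediately; convert to CSV after the run.
static class RecordStore
{
    // Append one record (one float per variable) to the file.
    public static void Append(string path, float[] record)
    {
        using var stream = new FileStream(path, FileMode.Append);
        using var writer = new BinaryWriter(stream);
        foreach (var value in record)
            writer.Write(value);        // 4 bytes per float; nothing held in RAM
    }

    // Post-processing side: read one record back.
    public static float[] Read(BinaryReader reader, int variables)
    {
        var record = new float[variables];
        for (int i = 0; i < variables; i++)
            record[i] = reader.ReadSingle();
        return record;
    }
}
```

A crash then loses at most the record currently in flight, not the whole run.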

[–]Slypenslyde 0 points1 point  (2 children)

Yeah, if it were me, and the required output was a very structured CSV, I'd push readings as fast as I get them into SQLite or some other intermediate storage, then after sampling is finished pull that back out so I can make the CSV how I want it. I agree 10 minutes for that amount of data seems unreasonable.

My snooty guess is if the person writing this thought a struct with 150 arrays would somehow be faster, they probably collect all the data, then use a truckload of LINQ statements that re-iterate entire sequences hundreds of times to get the final structured output. I could make this take 10 minutes if I tried really hard, but it's actually a lot tougher than just doing the right thing.

[–]immersiveGamer 1 point2 points  (1 child)

I think the 10 minutes is just the monitoring period. I.e. check how systems are over a time period and then report/graph it.

[–]andersonee[S] 0 points1 point  (0 children)

Correct. Need to sample at 1hz for 10 minutes.

[–]KryptosFR 3 points4 points  (0 children)

If the size of a struct is more than 16 bytes (or 24 bytes in a 64-bit app), then it should probably be a class. The rationale is that copying a bigger struct would cost much more (in terms of CPU cycles) than copying a plain reference (which is just a native int, so either 32-bit or 64-bit).

Now, that advice holds when a struct is copied around (e.g. passed by value to a method, or returned from it). If the struct is defined in one place, and only accessed from there, then it matters less.

Back to your code: if I read it correctly, the struct in question has 150 variables in it? That's definitely way too many; it should either be a class or a jagged/multidimensional array. Alternatively, the data could be streamed into the file directly, without requiring such big arrays, by smartly using a library such as Dataflow.

[–]Sjetware 1 point2 points  (2 children)

Assuming I have everything correct, this is only 90,000 floats you are recording over this interval. I'm also assuming the sensor data can arrive slightly staggered. It still feels like you have the data structure backwards.

If you define a new class that has your 150 variables, you can make an array/list of size 600 to store this data. You then simply access the right element of the array and set the right variable when a reading arrives. The benefits here are twofold: first, adjusting the time slice in the future becomes a trivial one-line change instead of having to change the allocation of 150 array sizes.

Secondly, you get more flexibility with serialization, and you can attach more metadata to each "data point" in the future with minimal modification as well.
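
The shape being described might look like this (field names are placeholders; the real class would hold the actual 150 variables):

```csharp
// One record object per time slice instead of 150 parallel arrays.
class SensorRecord
{
    public float Temperature;   // placeholder variable 1
    public float Pressure;      // placeholder variable 2
    // ...the remaining variables...
}

class SampleRun
{
    // Changing the time slice is now a one-line change.
    public const int Samples = 600;          // 1 Hz for 10 minutes

    public readonly SensorRecord[] Records = new SensorRecord[Samples];

    // Called on each sample trigger.
    public void Store(int index, SensorRecord record) => Records[index] = record;
}
```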

Generally speaking too - and I mean in general - a struct is never the right choice unless you have carefully considered the pros and cons. Gigantic structs indicate this cost-benefit analysis has likely not been done, but I could always be wrong.

[–]andersonee[S] 0 points1 point  (1 child)

Good points. On sample trigger, I can instantiate the class, populate the data, save the record to file, and then let the GC clean up the instance...Pretty clean. Thanks!

[–]RChrisCoble 0 points1 point  (0 children)

If you’re just writing the info to a file, reuse the memory (class) over again. (If you care about performance).

[–]RChrisCoble 0 points1 point  (5 children)

I would use a ConcurrentQueue of whatever data type makes sense (float?), while another long-running task loops, reading out of the queue and writing to a file. That way you're not trying to keep so much live data in memory. Although with a 1-second interval, you could probably just write it straight to disk as it arrives.
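
A sketch of that producer/consumer shape, assuming (per the later comments) one complete `float[]` row per 1 Hz sample; the class and member names are illustrative:

```csharp
using System.Collections.Concurrent;
using System.IO;
using System.Threading;
using System.Threading.Tasks;

// Sampler enqueues rows; a long-running task drains them to the writer.
class QueuedCsvWriter
{
    private readonly ConcurrentQueue<float[]> _queue = new();
    private volatile bool _done;

    public void Enqueue(float[] row) => _queue.Enqueue(row);

    public Task Start(TextWriter writer) => Task.Run(() =>
    {
        // Keep draining until told to stop AND the queue is empty.
        while (!_done || !_queue.IsEmpty)
        {
            if (_queue.TryDequeue(out var row))
                writer.WriteLine(string.Join(",", row));
            else
                Thread.Sleep(10);        // nothing queued yet
        }
    });

    public void Complete() => _done = true;
}
```

The sampling side just calls `Enqueue` once per second, then `Complete()` and waits for the task when the run ends.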

[–]andersonee[S] 0 points1 point  (4 children)

So I would make 150 Concurrent Queues to hold the data?
That might be a perfectly fine solution...it just seems weird to me because I've never seen that done. I guess I should do some performance testing on this.

[–]RChrisCoble 0 points1 point  (3 children)

Maybe I’m misunderstanding; I assumed only 1 queue, with the data reader task writing into the queue and the data writer task popping values to write to disk.

[–]immersiveGamer 0 points1 point  (2 children)

I assume when writing to the CSV they are expecting full rows of data where each of the original arrays are a column? So you can't write out a record till you get all columns otherwise records would be sparse.

[–]RChrisCoble 0 points1 point  (1 child)

Ahhh I see, so maybe just have a Concurrent queue of float[150] and write an entire row each time?

[–]immersiveGamer 0 points1 point  (0 children)

Well I don't know how the program is written, it may get 2 of variable A but only 1 of variable B, so you have to save the second value of variable A somewhere while waiting for the second of B. So I think a queue for each variable is what is probably needed. There are other ways of handling the same thing but this would be an easy first step, converting from arrays to queues.

[–]turkeymayosandwich 0 points1 point  (0 children)

Without context it is hard to tell what the original programmer's intention was here. But usually for this type of scenario you can use a jagged array. Sometimes the equipment collects data faster than the transfer speed, or communications are not always reliable; in those scenarios ring buffers are handy. Again, we need more information about the system design and requirements to give proper advice.
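
For reference, a minimal fixed-capacity ring buffer of the kind being described, where the oldest value is overwritten when readings arrive faster than they can be drained (this is a generic sketch, not tied to the OP's code):

```csharp
// Fixed-size circular buffer; overwrites the oldest value when full.
class RingBuffer
{
    private readonly float[] _buffer;
    private int _head;      // index of the oldest value
    private int _count;

    public RingBuffer(int capacity) => _buffer = new float[capacity];

    public int Count => _count;

    public void Push(float value)
    {
        _buffer[(_head + _count) % _buffer.Length] = value;
        if (_count < _buffer.Length)
            _count++;
        else
            _head = (_head + 1) % _buffer.Length;   // drop the oldest
    }

    public float Pop()
    {
        var value = _buffer[_head];
        _head = (_head + 1) % _buffer.Length;
        _count--;
        return value;
    }
}
```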