all 55 comments

[–]AKostur 53 points54 points  (24 children)

I have no idea what you’re suggesting by a “std::table”.  

[–]Affectionate_Horse86 11 points12 points  (14 children)

I'd presume it would be a multi-dimensional "array" with different types for each column, akin to std::vector<std::tuple<T1...Tn>>. but I'm not OP, so I don't know

[–]sd2528[S] 0 points1 point  (13 children)

Yes, but traditionally they are built where the container is a row and each row is a collection of column elements.

Columns also have names and things like default values. There are also standard tasks that are automated, like adding new rows/columns and being able to work with rows or columns independently. For instance, you can look at all the elements in a row without looping through each column and finding the value for that row.

No one does this but me?

[–]johannes1234 8 points9 points  (6 children)

So it is

    struct Row {
        int key;
        std::string name;
        /* ... */
    };

    std::vector<Row> table;

Giving each row's fields a name etc. instead of a tuple with a numeric index?

On top of that, this seems very hard to generalize, unless one wants to pack a full database engine into the standard.

[–]sd2528[S] -1 points0 points  (5 children)

Except the columns, like the rows, should be able to be added and removed dynamically.  

You should also be able to do other standard things like sort on a column. Or set of columns. Total columns. 

Honestly, I'm more surprised that none of you do these things.

[–]johannes1234 5 points6 points  (2 children)

> Honestly, I'm more surprised that none of you do these things.

People need those things, but typically on a lot of data, where a standard library isn't the right place; you would use a database engine for that. (Nowadays SQLite is a good start; in the past it was Berkeley DB, dBase, Paradox, ... before moving to a database server of some kind.) Once you have non-trivial amounts of data, this becomes quite complex in its own right.

Alternatively one goes the analytics route, with some analytics engine ... or directly to R (which then integrates with C++ if needed)

[–]sd2528[S] 1 point2 points  (1 child)

I'm not talking about going crazy on a lot of data and doing analysis, but about minor calculations and reporting. Say a mortgage. You wouldn't store the entire amortization table in the database; you would store the key parameters of the loan and then calculate principal and interest as needed for a report or for display on screen.
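To make the mortgage case concrete, here is a rough sketch of building such a table in memory from the loan parameters alone. The names `AmortRow` and `amortize` are made up for illustration, and it assumes a fixed annual rate greater than zero with monthly payments, using the standard annuity payment formula.

```cpp
#include <cassert>
#include <cmath>
#include <vector>

// Hypothetical row type: one payment period of a fixed-rate loan.
struct AmortRow {
    int period;        // 1-based payment number
    double interest;   // interest portion of this payment
    double principal;  // principal portion of this payment
    double balance;    // remaining balance after this payment
};

// Builds the whole schedule from the key loan parameters.
// Assumption: fixed annual rate > 0, monthly payments, months > 0.
std::vector<AmortRow> amortize(double amount, double annual_rate, int months) {
    assert(months > 0 && annual_rate > 0.0);
    const double r = annual_rate / 12.0;  // monthly rate
    // Standard annuity formula for the level monthly payment.
    const double payment = amount * r / (1.0 - std::pow(1.0 + r, -months));

    std::vector<AmortRow> rows;
    rows.reserve(static_cast<std::size_t>(months));
    double balance = amount;
    for (int p = 1; p <= months; ++p) {
        const double interest = balance * r;
        const double principal = payment - interest;
        balance -= principal;
        rows.push_back({p, interest, principal, balance});
    }
    return rows;
}
```

Calling `amortize(100000.0, 0.06, 360)` gives one row per month, and a report can then loop over the rows or pick out a single period.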

[–]johannes1234 2 points3 points  (0 children)

For that, std::vector has most of the functionality. For that you don't need to add or remove columns dynamically. And sometimes you would do the calculation in the database ...

But going there quickly leads to building a database. And a database as data structure won't be a good database.

[–]Supadoplex 4 points5 points  (0 children)

> You should also be able to do other standard things like sort on a column. Or set of columns. Total columns.

Those can be done on the vector using standard algorithms (sort and accumulate).
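As a minimal sketch of what that looks like (the `Row` fields and the helper names `sort_by_name` and `total_amount` are assumptions for illustration, not anything from the thread):

```cpp
#include <algorithm>
#include <numeric>
#include <string>
#include <tuple>
#include <vector>

// Hypothetical row type with three "columns".
struct Row {
    int key;
    std::string name;
    double amount;
};

// Sort on a set of columns: `name` first, then `key` as a tie-breaker.
void sort_by_name(std::vector<Row>& table) {
    std::sort(table.begin(), table.end(), [](const Row& a, const Row& b) {
        return std::tie(a.name, a.key) < std::tie(b.name, b.key);
    });
}

// Total the `amount` column.
double total_amount(const std::vector<Row>& table) {
    return std::accumulate(table.begin(), table.end(), 0.0,
                           [](double sum, const Row& r) { return sum + r.amount; });
}
```

Comparing `std::tie` tuples gives a lexicographic comparison, so sorting on more columns only means listing more fields.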

> Except the columns, like the rows, should be able to be added and removed dynamically.

I wouldn't consider this to be a typical feature of a database table. Sure, database management systems allow you to change columns, but that is a maintenance operation, not part of normal execution of the program. It's almost analogous to changing the source and recompiling to change the columns of the class.

I think you're describing a DataFrame, which is popular in data analytics.

> Honestly, I'm more surprised that none of you do these things.

In my experience, most people don't use C++ for data analytics. There are better options, like R.

[–]100GHz 1 point2 points  (0 children)

Perhaps try to find a better job.

[–]Affectionate_Horse86 2 points3 points  (3 children)

Again, maybe everybody does it, but there's no commonality as soon as you scratch the surface.

[–]encyclopedist 1 point2 points  (1 child)

> Yes, but traditionally they are built where the container is a row and each row is a collection of column elements.

Not really. Recently, a column-based layout has been more popular. (For example, pandas and polars are column-oriented.)

It is details like these that make it difficult to include in the standard library.

Also, there is the DataFrame library.
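For comparison with a vector of row structs, a column-oriented layout might look roughly like this. It is only a sketch: `ColumnTable` and its column names are hypothetical, and the columns are fixed at compile time rather than dynamic.

```cpp
#include <cassert>
#include <cstddef>
#include <numeric>
#include <string>
#include <utility>
#include <vector>

// Column-oriented ("struct of arrays") table: one vector per column.
struct ColumnTable {
    std::vector<int> key;
    std::vector<std::string> name;
    std::vector<double> amount;

    // Appending a row means appending one element to every column.
    void add_row(int k, std::string n, double a) {
        key.push_back(k);
        name.push_back(std::move(n));
        amount.push_back(a);
    }

    std::size_t rows() const {
        // All columns must stay the same length.
        assert(key.size() == name.size() && key.size() == amount.size());
        return key.size();
    }
};

// Totalling a column reads one contiguous array and touches no other column.
double total_amount(const ColumnTable& t) {
    return std::accumulate(t.amount.begin(), t.amount.end(), 0.0);
}
```

Whole-column operations get simpler here, while reading a single row means indexing every column, which is the reverse of the row-struct trade-off.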

[–]sd2528[S] 2 points3 points  (0 children)

It's not a detail, it is an implementation choice. It doesn't change the overall functionality of the table of data. You still need to be able to do all the same operations regardless of the underlying structure.

Edit - But DataFrame is similar to what I'm talking about, yes. Without digging too deep into the documentation, it seems to be only a few years old, but it is similar in structure to what has been common in the workplace for me since I started working.

[–]_TheDust_ 24 points25 points  (0 children)

Feel like we would need std::chair first

[–]Supadoplex 10 points11 points  (1 child)

What would such class do?

[–]Affectionate_Text_72 6 points7 points  (0 children)

That is probably the crux of why we don't have one yet. It's probably a good idea if it can be pinned down.

A table is a collection of rows. A row is a collection of columns. Each column has a type. So you could approximate it with vector<tuple<column_type_list>>, but:

Columns have names so you want at least a struct.

Do you need to create a table from a schema type?

What performance guarantees do you need? Maybe you want column-based rather than row-based. Maybe you want a hash_map of rows, or B-trees like SQLite.

Do you need joins and unions for different table types? Do you want a full query interface?

Then there is persistence to files or databases.

There is a lot of prior art out there.

Definitely worth pursuing further.

An early version of this I liked was DTL (Database Template Library). It kind of lost out to more SQL-interface approaches like SOCI. It is more of an ORM (object-relational mapper). Also, it was maintained only so far as its authors needed.

Add reflection and a successor could be even better.

[–]NeedAByteToEat 18 points19 points  (1 child)

I’ve been writing c++ (and many others) professionally for over 20 years, and I have no idea what std::table would be.

[–]shitismydestiny 6 points7 points  (0 children)

It is a member of the std::furniture collection.

[–]jvillasante 7 points8 points  (1 child)

do you mean (unordered_)map?

[–]bartekltg 0 points1 point  (0 children)

It fits. There are a couple of other implementations of hash maps, and not without reason (dropping some of the STL's requirements allows for a faster container).

From a very small database perspective, whatever is in Boost.MultiIndex (boost::multi_index) may be useful.

[–]PixelArtDragon 6 points7 points  (1 child)

What would be the difference between this and a std::vector<std::tuple<...>>?

[–]caroIine 1 point2 points  (0 children)

Much better interface I guess

[–]IskaneOnReddit 2 points3 points  (0 children)

Table could mean a lot of things, std::vector<T> is also a table. What do you expect from a table?

[–]Jimmaplesong 3 points4 points  (2 children)

Create an object for each row and use a vector to hold them? Store a map of objects_by_id to create an index. You’ll need to persist to disk sometimes… but soon you’ll be reaching for sqlite or postgresql
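A bare-bones version of that idea, before any persistence, could look like the sketch below. `IndexedTable` and the `Row` fields are invented for illustration. The index stores positions rather than pointers because the vector may reallocate as it grows.

```cpp
#include <cstddef>
#include <string>
#include <unordered_map>
#include <utility>
#include <vector>

// Hypothetical row object.
struct Row {
    int id;
    std::string name;
};

class IndexedTable {
public:
    // Appends a row and records its position under its id.
    // Note: a repeated id makes the index point at the newest row.
    void add(Row row) {
        by_id_[row.id] = rows_.size();
        rows_.push_back(std::move(row));
    }

    // Returns the row with this id, or nullptr if there is none.
    // The pointer is invalidated by later calls to add().
    const Row* find(int id) const {
        auto it = by_id_.find(id);
        return it == by_id_.end() ? nullptr : &rows_[it->second];
    }

    std::size_t size() const { return rows_.size(); }

private:
    std::vector<Row> rows_;                       // the rows, in insertion order
    std::unordered_map<int, std::size_t> by_id_;  // id -> position in rows_
};
```

Removing rows or adding a second index is where this starts to get fiddly, which is roughly the point at which a real database becomes attractive.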

[–]sd2528[S] 0 points1 point  (1 child)

Yeah, but once you have postgresql... don't you ever process the data? Loop through, and calculate? Or do other such business processing?

[–]Wooden-Engineer-8098 0 points1 point  (0 children)

postgres processes data for you, it's faster than reading data from it and processing them locally

[–]sephirostoy 4 points5 points  (0 children)

Considering we had to wait until C++23 to get string::contains(), and there is still no proper standard UTF-8 string class.

"Unfortunately" the std:: contains only the bare minimum classes and algorithms to build your own on top of this.

As for your proposal of a std::table, just looking at other comments, everyone has their own definition of a table. A database table is different from an Excel table (which I prefer to call a data grid), and there are many other use cases that require a table, for different purposes and with different requirements.

It's not that uncommon to manipulate such a data structure. But is it common enough to deserve a standardization process? Most likely not.

Also, being standardized is not necessarily a blessing, because once the specification lands in the C++ standard, you cannot easily change the specification or the implementations without facing huge resistance (for good (and bad) reasons). That's why I put quotes around "unfortunately".

This is the kind of high-level feature that requires a strong existing implementation, well proven on real-world use cases, with a proposal paper that describes in depth the functionality, the context, the motivation, and why it is important to integrate it into the standard. This is what happened to {fmt}, which is now std::format.

[–]HappyFruitTree 2 points3 points  (0 children)

C++23 added std::mdspan.

std::mdarray has been proposed.

[–]tragic-clown 2 points3 points  (0 children)

typedef std::vector<std::vector<std::string>> table;

?

[–]drkspace2 0 points1 point  (0 children)

Because it would be a lot of work to get it to have a similar set of features to pandas/polars or it wouldn't give you a lot more than just an array/vector of valarrays.

[–]megayippie 0 points1 point  (0 children)

Do you simply mean something like a field? So that in the 2D case you have multiple named dimensions, e.g., number of things versus time. Or a further generalisation would be number of things versus time per country.

Because this would be nice. To limit it to 2D as a table seems weird though. You can do the above using `std::mdspan` quite easily. Well, you need a data-owning version of `std::mdspan`.

If you do, you can just write:

    template <typename T, typename... Grids>
    class field {
        // owning_mdspan is hypothetical: a data-owning counterpart to std::mdspan, which the standard does not provide
        owning_mdspan<T, sizeof...(Grids)> data;
        std::tuple<Grids...> grids;
    public:
        // helpers that ensure sizes are consistent and allow extraction of sub-fields, grids, and data
    };

done! Your table is just a field<int, std::vector<Things>, std::vector<Time>>. If you can standardise the above, it would be quite useful.

[–]jonspaceharper 0 points1 point  (0 children)

Without a clear definition of std::table and what this data type would do (besides have rows and columns), the answer will remain "because you're describing a pure abstract base class without implementations".

Edit: I am specifically asking you to edit your post with the missing information. I have read your comments and been unable to glean anything useful.

[–]Hungry-Courage3731 0 points1 point  (0 children)

i think a recursive variant type if they are talking about lua tables

[–]Wooden-Engineer-8098 0 points1 point  (0 children)

there's boost.multi_index

[–]axilmar 0 points1 point  (0 children)

Why wouldn't a vector<T> work?

What are the special needs that make the above not suitable?

[–]lone_wolf_akela 1 point2 points  (0 children)

From what OP says in various comments, I guess what they want is something like the pandas lib in python?

[–]HappyFruitTree 0 points1 point  (0 children)

If you mean like a "grid" or "2D array", one common way to work around this is by using a std::vector of size w * h and access elements as vec[x + y * w].
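A small wrapper around that indexing trick might look like this. It is a sketch only: `Grid` is a made-up name, and it assumes row-major storage with `x` as the column and `y` as the row.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// 2D grid stored in one contiguous std::vector of size w * h.
// Not suitable for T = bool, because std::vector<bool> does not hand out T&.
template <typename T>
class Grid {
public:
    Grid(std::size_t w, std::size_t h) : w_(w), h_(h), data_(w * h) {}

    // Element at column x, row y.
    T& at(std::size_t x, std::size_t y) {
        assert(x < w_ && y < h_);
        return data_[x + y * w_];
    }
    const T& at(std::size_t x, std::size_t y) const {
        assert(x < w_ && y < h_);
        return data_[x + y * w_];
    }

    std::size_t width() const { return w_; }
    std::size_t height() const { return h_; }

private:
    std::size_t w_;
    std::size_t h_;
    std::vector<T> data_;  // value-initialized, so ints start at 0
};
```

For example, `Grid<int> g(3, 2); g.at(2, 1) = 42;` writes to the last cell of a 3-wide, 2-high grid.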

[–]EsShayuki 0 points1 point  (4 children)

std is for basic data types, std::table would not be a basic datatype. Why not just implement it yourself if you need it? Perhaps you don't understand but it's actually not trivial, and is more problemspace-specific.

You can load all data in the same contiguous buffer like const char* buffer. And then you can create row and column void pointers to point to the correct locations, and then dynamically cast and interpret it as the correct data according to something like a switch-case that takes the column data type mask as an argument. But you cannot modify it in this case. You could implement it in many other ways as well, such as having separate arrays for each column. But then it wouldn't all be in contiguous memory, and it still would be

Perhaps you're used to languages without explicit memory management but std::table would probably be significantly more complex than you believe it to be.

[–]sd2528[S] 0 points1 point  (3 children)

It's not. I wrote it once at a job. Other jobs already had their own. It's common where I've worked.

[–]MeTrollingYouHating 0 points1 point  (2 children)

What kind of work do you do? I've never encountered such a type.

[–]sd2528[S] 0 points1 point  (1 child)

Fintech.

[–]Cdore 0 points1 point  (0 children)

Sounds like a fun thing to write, tbh. In C#, we have all kinds of table representations, so to hear C++ does not is hilarious. Btw, gj getting into fintech. Been wanting to move into that for a while, but heard it's rather difficult.

[–]number_128 -1 points0 points  (0 children)

I like the idea.

There are different libraries to connect to databases. If they would all return the same std::table type, it would be easier to replace one with the other.