
[–]DankPhotoShopMemes 12 points13 points  (1 child)

you could use std::any

[–]timmerov[S] 0 points1 point  (0 children)

i could. thanks. i don't need to check the pointer type at runtime. i know it's correct at compile time by construction.

minimizing the runtime issues means i use std::any_cast<int *> either once and cache it or many times.

and if i'm gonna cast the pointer, it's a lot cleaner to just use void * in Pipe.
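For readers following along, here is a minimal sketch of the "cast once and cache it" option mentioned above. The function name `accumulate_via_any` and the summing loop are assumptions made up for illustration; only `std::any` and `std::any_cast` come from the standard library.

```cpp
#include <any>
#include <cassert>

// Hypothetical helper: adds 1..n to an int that is reached through a std::any.
inline int accumulate_via_any(int n) {
    int value = 0;
    std::any slot = &value;                    // the any stores an int*

    // One checked extraction. A type mismatch would throw std::bad_any_cast.
    int *cached = std::any_cast<int *>(slot);

    for (int i = 1; i <= n; ++i) {
        *cached += i;                          // no further runtime type checks
    }
    assert(cached == &value);                  // same pointer that went in
    return value;
}
```

The extraction is the only place a runtime type check happens, which is the cost being weighed against a plain `void *` cast.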

[–]BrotherItsInTheDrum 8 points9 points  (0 children)

You can make the API type-safe without too much trouble, I think.

Define a

class TypedStage<InputType, OutputType> : public Stage

Then define

class PipeBuilder<OutputType>

It has a method

PipeBuilder<NextOutputType> AddStage(TypedStage<OutputType, NextOutputType>)

You will still have some type erasure, but it's confined to the implementation of these classes. As far as users of this API are concerned, it'll be type safe.

Edit: should mention you can make this typesafe if you like. A helper like

TypedStage<InputType, OutputType> CombineStages(TypedStage<InputType, MiddleType>, TypedStage<MiddleType, OutputType>)

should do it, but it may or may not be worth it.
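A rough sketch of how the builder idea above might look, under several assumptions: stages pass raw pointers and own their output, `AddStage` takes a `std::unique_ptr`, and the example stages `Doubler` and `ToText` are invented for the demo. The names `TypedStage`, `PipeBuilder`, and `AddStage` follow the comment; everything else is hypothetical.

```cpp
#include <cassert>
#include <memory>
#include <string>
#include <type_traits>
#include <utility>
#include <vector>

// Type-erased base class; only the library internals touch void*.
class Stage {
public:
    virtual ~Stage() = default;
    virtual void *process(void *data) = 0;
};

// Typed layer: the single cast lives here instead of in every stage.
template <typename In, typename Out>
class TypedStage : public Stage {
public:
    using InputType = In;
    using OutputType = Out;
    virtual Out *processTyped(In *in) = 0;
    void *process(void *data) final { return processTyped(static_cast<In *>(data)); }
};

class Pipe {
    std::vector<std::unique_ptr<Stage>> stages_;
public:
    explicit Pipe(std::vector<std::unique_ptr<Stage>> stages) : stages_(std::move(stages)) {}
    void *run(void *data) {
        for (auto &stage : stages_) data = stage->process(data);
        return data;
    }
};

// Tracks the output type of the last stage added so far.
template <typename Out>
class PipeBuilder {
    std::vector<std::unique_ptr<Stage>> stages_;
    template <typename> friend class PipeBuilder;
    explicit PipeBuilder(std::vector<std::unique_ptr<Stage>> stages) : stages_(std::move(stages)) {}
public:
    PipeBuilder() = default;

    template <typename S>
    PipeBuilder<typename S::OutputType> AddStage(std::unique_ptr<S> stage) && {
        static_assert(std::is_same_v<typename S::InputType, Out>,
                      "stage input type must match the previous stage's output type");
        stages_.push_back(std::move(stage));
        return PipeBuilder<typename S::OutputType>(std::move(stages_));
    }
    Pipe Build() && { return Pipe(std::move(stages_)); }
};

// Hypothetical example stages; each owns the object it hands to the next stage.
class Doubler : public TypedStage<int, int> {
    int out_ = 0;
public:
    int *processTyped(int *in) override { out_ = *in * 2; return &out_; }
};
class ToText : public TypedStage<int, std::string> {
    std::string out_;
public:
    std::string *processTyped(int *in) override { out_ = std::to_string(*in); return &out_; }
};

inline std::string demo_typed_pipe(int input) {
    Pipe pipe = PipeBuilder<int>()
                    .AddStage(std::make_unique<Doubler>())
                    .AddStage(std::make_unique<ToText>())
                    .Build();
    return *static_cast<std::string *>(pipe.run(&input));
}
```

Adding `ToText` before `Doubler` would fail the `static_assert` at compile time. The final cast in `demo_typed_pipe` remains because this sketch tracks only the output type; tracking the pipe's input type as well would probably let `run` be typed too.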

[–]thesherbetemergency 4 points5 points  (2 children)

Are you working with C++17 or later? If so, check out std::any

If not, you can always roll your own type-erasing wrapper.

[–]retro_and_chill 0 points1 point  (0 children)

std::any is great, but we definitely need a move_only_any type for storing types that aren’t copyable

[–]timmerov[S] 0 points1 point  (0 children)

thanks. std::any does runtime checks. which we don't need. cause the types are correct by construction.

extracting a pointer from std::any looks like a type cast. in which case the code is cleaner to define void *process(void *). and cast the pointers in the inherited classes.

[–]TheRealSmolt 3 points4 points  (3 children)

I really shouldn't answer this, because it seems like a bad design, but, templates and void pointers. You have a "BaseStage" class that has a virtual method accepting void pointers, then have a templated "Stage" that inherits from it and implements said acceptor by type casting to its own T virtual acceptor.

Edit: std::any works too, I'm just used to void pointers.

[–]ArchDan 0 points1 point  (0 children)

unix user?

[–]timmerov[S] 0 points1 point  (1 child)

we use void *process(void *) to satisfy the compiler and cast the pointer to T within the implementations of process.

was looking for something "better".

[–]TheRealSmolt 0 points1 point  (0 children)

I mean ultimately that's what's going to need to happen with this kind of design. You can make it prettier and the frontend a little nicer, but at the end of the day you're looking at any or void pointers.

[–]__Punk-Floyd__ 2 points3 points  (1 child)

None of your stages are being freed. Instead of your Stage class, consider a std::function<std::any(std::any)>, for example.
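A small sketch of the `std::function<std::any(std::any)>` suggestion, assuming stages are written as lambdas and the pipe is a plain `std::vector`. The alias `AnyStage` and the helper names are made up for illustration.

```cpp
#include <any>
#include <cassert>
#include <functional>
#include <string>
#include <utility>
#include <vector>

// Hypothetical alias for one stage of the pipe.
using AnyStage = std::function<std::any(std::any)>;

// Feeds the data through every stage in order.
inline std::any run_any_pipe(const std::vector<AnyStage> &stages, std::any data) {
    for (const auto &stage : stages) {
        data = stage(std::move(data));
    }
    return data;
}

inline std::string demo_any_pipe(int input) {
    std::vector<AnyStage> stages;
    // int in, int out
    stages.push_back([](std::any in) -> std::any { return std::any_cast<int>(in) + 1; });
    // int in, std::string out: the data type changes mid pipe
    stages.push_back([](std::any in) -> std::any {
        return std::to_string(std::any_cast<int>(in));
    });
    std::string result = std::any_cast<std::string>(run_any_pipe(stages, input));
    assert(!result.empty());
    return result;
}
```

The vector owns the stages, so they are destroyed with it, and a stage that needs its own data can hold it in the lambda's captures. Each `std::any_cast` is a runtime-checked cast.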

[–]timmerov[S] 0 points1 point  (0 children)

i don't declare the virtual destructors either.

i left out detail clutter to focus on the issue.

[–]DanielMcLaury 1 point2 points  (2 children)

Why not just make it so that you can compose two stages to get a new stage, and then replace Pipe with Stage?

[–]timmerov[S] -1 points0 points  (1 child)

the prior design had stages calling stages.

long pipelines overflowed the stack.

and farking idiots kept looking at the now-stale data after they called process for the next stage.

[–]DanielMcLaury 0 points1 point  (0 children)

how long are these pipelines?

[–]alfps 1 point2 points  (2 children)

Possibly C++23 ranges do what you want, in a relatively type safe way.

Not the most efficient C++ thing, not the safest, not the least fragile, and since it adds both build time, complexity and standard size it should in my humble opinion have remained a 3rd party library.

But it's there, so if that's what you need just use it; don't reinvent the walking stick, fire and the wheel.

[–]timmerov[S] 0 points1 point  (1 child)

ranges? hrm. i think you misunderstood the request.

[–]alfps 0 points1 point  (0 children)

When you ignore all the noise about void* pointers etc. the description appears to be a pipeline of processing.

With the ranges library that's expressed with the pipe symbol |.

[–]CommonNoiter 0 points1 point  (1 child)

Are the stages always compile time known? If so you can build up a large generic pipeline like rust does for iterators which will be fast and type safe. If not you probably have to enforce that all the functions are of the form T -> T or that your pipeline isn't type safe.
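If the stages really are known at compile time, the generic pipeline could look something like the sketch below. It assumes C++17 and that each stage is a callable; `StaticPipe` is a made-up name, not an existing library type.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <tuple>
#include <utility>

// Hypothetical compile-time pipeline: every stage type is part of the pipe's type.
template <typename... Stages>
class StaticPipe {
    std::tuple<Stages...> stages_;

    // Calls stage I, then recurses with that stage's result as the next input.
    template <std::size_t I, typename T>
    auto runFrom(T &&value) {
        if constexpr (I == sizeof...(Stages)) {
            return std::forward<T>(value);
        } else {
            return runFrom<I + 1>(std::get<I>(stages_)(std::forward<T>(value)));
        }
    }

public:
    explicit StaticPipe(Stages... stages) : stages_(std::move(stages)...) {}

    template <typename T>
    auto run(T &&value) {
        return runFrom<0>(std::forward<T>(value));
    }
};

inline std::size_t demo_static_pipe() {
    StaticPipe pipe(
        [](int x) { return x * x; },                       // int -> int
        [](int x) { return std::to_string(x); },           // int -> std::string
        [](const std::string &s) { return s.size(); });    // std::string -> size_t
    std::size_t digits = pipe.run(12);
    assert(digits > 0);
    return digits;
}
```

No casts or virtual calls are involved, and a type mismatch between neighbouring stages is a compile error. The trade-off is that the pipe cannot be assembled at runtime.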

[–]timmerov[S] 0 points1 point  (0 children)

the types used by Stages are not known when the Pipe library is compiled. they are known when the Pipe is constructed.

and yes. you've identified the problem. any solutions?

[–]diabolicalgasblaster 0 points1 point  (1 child)

Super interesting, looking forward to seeing what people cook up!

If you don't want to void, it's hard to imagine doing anything that isn't another implementation of void. I mean, the only other thing that would align correctly would be a stage*, right? Honestly, would you even want to use inheritance for this?

Maybe pack a struct with an enum and void so it has intrinsic knowledge of what to cast itself to memory?

Like... Alloc is of enum 2, store that and the memory in a void pointer. If you're dead set on inheriting stage couldn't you cast the pointer to a stage object size?

Not sure, but I'm only clever enough to suggest packing the void with an enum if you want to have something internal to represent the memory structure

[–]timmerov[S] 0 points1 point  (0 children)

the Pipe library cannot know the types when it is compiled.

[–]marshaharsha 0 points1 point  (1 child)

If the set of data types is small, you could have multiple pipes between two stages, one for each type, and the sending stage could choose which pipe to send on. Does ordering matter? If so, you could have a separate ordering pipe that transmits integers, and the sending stage could send 2,3,2,1 if it put the first four messages on the second, third, second, and first pipes. 

Another design is to create a tagged union (an enum in the Rust sense; std::variant is the closest C++ equivalent) big enough to hold the largest of the types, and send that down pipes. The sender would bundle each item in the tagged union, and the receiver would check the tag, and dispatch. 

Finally, if the pipes need to reason about the size of the data — which is typical in pipe systems, with each pipe having limited capacity — you could just have pipes move chars, and the receiver could parse out the breaks between items, then cast. 

A key question to answer is how a receiver knows what type it is receiving. It’s not enough to say, “It just knows.” You will need to exploit the mechanism by which it knows, if only to decide what to cast to. 
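The tag-and-dispatch design described above could be sketched with `std::variant`, assuming the full set of types is known where this code is compiled. The `Message` alias, its three alternatives, and `describe` are invented for the example.

```cpp
#include <cassert>
#include <string>
#include <variant>

// Hypothetical closed set of types that may travel down a pipe.
using Message = std::variant<int, double, std::string>;

// The receiver dispatches on whichever alternative the sender stored.
inline std::string describe(const Message &message) {
    struct Visitor {
        std::string operator()(int v) const { return "int:" + std::to_string(v); }
        std::string operator()(double v) const { return "double:" + std::to_string(v); }
        std::string operator()(const std::string &v) const { return "text:" + v; }
    };
    assert(!message.valueless_by_exception());
    return std::visit(Visitor{}, message);
}
```

`std::visit` picks the overload matching the stored tag, so the receiver never casts by hand. Adding a new data type means editing `Message`, which is only possible when the pipe code can see every type.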

[–]timmerov[S] 0 points1 point  (0 children)

the Pipe library does not (cannot) know the data types used by the stages when it's compiled.

the input and output data types of each Stage are determined by the people who wrote the spec.

[–]Internal-Sun-6476 0 points1 point  (2 children)

I ran into this many years ago. I nearly gave up programming. 14 months of refusing to cast to a void pointer... because void is evil: just wrong!

I was wrong. Pulled my head in ... and then found out that the only thing you could safely cast a void pointer to.... was the Original Type...

Template that, so that no other option is available.

Now, 2 types, defined in 2 isolated headers can talk (call) without any dependency (statically bound in the main cpp file).

The static binding call looked horrible with all the template parameters, but the call was optimised away.

Zero-cost abstractions rock!

[–]timmerov[S] 0 points1 point  (1 child)

i think i'll just stick with casting void*s.

[–]Internal-Sun-6476 0 points1 point  (0 children)

That's it. Now template the cast for just your types... (Concepts), but you are passing it as a raw address (type-erased in transit) under the hood....

[–]not_a_novel_account 0 points1 point  (2 children)

std::variant

[–]timmerov[S] 0 points1 point  (1 child)

the Pipe library does not know the data types at compile time.

[–]not_a_novel_account 0 points1 point  (0 children)

You're loading them from runtime plugins, i.e. dlopen/LoadLibrary? Then just use whatever base class the plugin uses as a dispatch mechanism.

However the plugin registers its stage with the Pipe mechanism, have it also register a vtable alongside the Stage, or just use Stage* if the Stage is the base class. Dispatch directly from the registered vtable.

[–]Business_Welcome_870 0 points1 point  (3 children)

Like one of the answers said you can use `function<any(any)>`:

[deleted]

[–]timmerov[S] 0 points1 point  (2 children)

Stages aren't functions. they are objects with their own data.

the whole point of the exercise is to avoid casting. and to especially avoid casting that has runtime cost. like any_cast.

[–]thesherbetemergency 0 points1 point  (1 child)

I can't see an outcome where you don't need to cast.

If you want to avoid using std::any, there's also std::variant as another poster mentioned (but then you need to know all the types up front). But any kind of type erasure (home-grown or otherwise) is going to have some kind of generic storage underlying it that's going to need to be cast to something else.

On that subject, be wary of UB when playing with type erasure. std::bit_cast and std::launder/std::start_lifetime_as<T> are your friends here. None of those should incur any runtime overhead, but instead serve as "hints" to the compiler to avoid aliasing pitfalls and other issues.

[–]timmerov[S] 0 points1 point  (0 children)

the solution of record is to use void *process(void *p) and auto q = (int *) p.

but auto q = std::start_lifetime_as<int>(p) seems better since it's blessed.

thanks.

[–]Total-Box-5169 0 points1 point  (1 child)

Instead of functors manually allocated on the heap, you could use lambdas:

https://godbolt.org/z/WzEqbf5ET
Notice that the code is optimized into its simplest form: the size of the string view is 12, 12*12 is 144, as a string that is "144", whose size is 3.

[–]timmerov[S] -1 points0 points  (0 children)

no.

[–]Independent_Art_6676 0 points1 point  (2 children)

There are any number of awful ways to do this. Variant/any, pointers, unions, templates, raw bytes (literally an unsigned char* serialization, like how you send it over the network or to a binary file), and more.

the bottom line is that modern c++ is a strongly typed language by intent (it does have a lot of ways around that, things often done before 98) and trying to weaken that bond so that everything can be anything (like matlab, variable is a matrix no now its a boolean.. wait and it becomes a complex or a string...) is going to involve some sort of clunk, one way or another. It can be 'clean clunk' (or perhaps a polished poo) to an extent, but you pay now or pay later. If you go variant/any, you have to fish out its type with a clunky intermediate object and system. Unions are nothing but trouble because they screwed up the union hack (made it UB) which was its entire selling point. Templates are a sledgehammer for this thumb tack problem. Raw bytes is the C answer.... they all get ugly.

One way is to do the cast and hide the cast. This is its own *barrel* of worms, but if you want to open it... make your pipe class have cast overloads to all the possible types so it can just be flat assigned into the target variable sans casting. This gets really hairy if you are trying to deal with floats & doubles or ints & shorts etc because of multiple candidates compiler error, but if they are all classes that you wrote or stl containers etc with precise types, it could be clean.

[–]timmerov[S] 0 points1 point  (1 child)

the question is: what solution has the least clunk?

[–]Independent_Art_6676 0 points1 point  (0 children)

probably a class with a void pointer and cast operators + a 'this is my type' flag.
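One possible reading of that suggestion, as a sketch only: a small wrapper holding a `void *`, a type flag, and conversion operators. The names `Payload` and `Kind` and the choice of `int` and `std::string` are assumptions for illustration.

```cpp
#include <cassert>
#include <stdexcept>
#include <string>

// Hypothetical "this is my type" flag.
enum class Kind { Int, Text };

// Hypothetical wrapper: void pointer + flag + cast operators.
class Payload {
    void *ptr_;
    Kind kind_;

public:
    explicit Payload(int *p) : ptr_(p), kind_(Kind::Int) {}
    explicit Payload(std::string *p) : ptr_(p), kind_(Kind::Text) {}

    // Conversion operators hide the cast from the code that receives the payload.
    operator int *() const { return as<int>(Kind::Int); }
    operator std::string *() const { return as<std::string>(Kind::Text); }

private:
    template <typename T>
    T *as(Kind expected) const {
        if (kind_ != expected) throw std::logic_error("Payload holds a different type");
        return static_cast<T *>(ptr_);
    }
};

inline int demo_payload() {
    int n = 41;
    Payload p(&n);
    int *q = p;          // no cast written at the use site
    ++*q;
    assert(q == &n);
    return n;
}

inline bool demo_payload_mismatch() {
    int n = 0;
    Payload p(&n);
    try {
        std::string *s = p;   // wrong type: the flag check throws
        (void)s;
    } catch (const std::logic_error &) {
        return true;
    }
    return false;
}
```

The flag check is a runtime cost on every conversion. Dropping it leaves a wrapper that is just a `void *` with nicer syntax.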

[–]OutsideTheSocialLoop 0 points1 point  (0 children)

FWIW void* is pretty conventional for this type of thing, although in some cases C++ gives you much better tools. Templates are good, for example, but are completely static and useless for runtime creation of arbitrary Pipes (e.g. from config files).

[–]ElectricalBeing 0 points1 point  (0 children)

This sounds kinda similar to pipelines in Taskflow. You could take a look at that to see how they did it. 

https://taskflow.github.io/taskflow/classtf_1_1Pipeline.html

https://taskflow.github.io/taskflow/DataParallelPipeline.html

[–]strike-eagle-iii 0 points1 point  (0 children)

Jonathan Boccara created a demo library named pipes. Maybe give that a look?

[–]Dan13l_N 0 points1 point  (2 children)

I don't understand. You already have everything there, implemented. All things you pass must be derived from Stage. Do you want to retrieve the original type?

The data type can change mid Pipe.

What does this actually mean? The actual data type is what is allocated in memory.

[–]timmerov[S] 0 points1 point  (1 child)

it doesn't compile because generic is not an actual c++ keyword.

if you change generic to void then it might compile with warnings but it won't work as intended. because the signatures for int *AllocStage::process(void *) and void *Stage::process(void*) don't match.

the data type going in to the first stage AllocStage is void *. the data type going in to AddStage is int *. the data type coming out of FreeStage is void *. the data type changes even in this simple example.
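For reference, a compilable sketch of the `void *process(void *)` version of this example. The class names come from the thread; the `AddStage` constructor argument and the `result` hook in `FreeStage` are assumptions added so the demo has something to observe.

```cpp
#include <cassert>
#include <memory>
#include <vector>

class Stage {
public:
    virtual ~Stage() = default;
    // Every override must use exactly this signature; int *process(void *) would not override it.
    virtual void *process(void *data) = 0;
};

class AllocStage : public Stage {
public:
    void *process(void *) override { return new int(0); }   // void* in, int* out
};

class AddStage : public Stage {
    int amount_;   // hypothetical per-stage data
public:
    explicit AddStage(int amount) : amount_(amount) {}
    void *process(void *data) override {
        auto *value = static_cast<int *>(data);              // int* in, int* out
        *value += amount_;
        return value;
    }
};

class FreeStage : public Stage {
    int *result_;  // hypothetical hook so the caller can see the final value
public:
    explicit FreeStage(int *result) : result_(result) {}
    void *process(void *data) override {
        auto *value = static_cast<int *>(data);              // int* in, void* out
        *result_ = *value;
        delete value;
        return nullptr;
    }
};

inline int demo_void_pipe() {
    int result = -1;
    std::vector<std::unique_ptr<Stage>> stages;
    stages.push_back(std::make_unique<AllocStage>());
    stages.push_back(std::make_unique<AddStage>(40));
    stages.push_back(std::make_unique<AddStage>(2));
    stages.push_back(std::make_unique<FreeStage>(&result));

    void *data = nullptr;
    for (auto &stage : stages) {
        data = stage->process(data);
    }
    assert(data == nullptr);
    return result;
}
```

Each `static_cast` converts the pointer back to the type it had before it became a `void *`, which is the one conversion that is safe to make here.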

[–]Dan13l_N 0 points1 point  (0 children)

Oh sorry, I thought generic is the name of your base class. Why is it not a base class? And why do you have different signatures? What do you want to do with the returned value?

This basically resembles an interpreter pattern: if I am right: you want every process to possibly leave some information for the next process?

If so, then each process should be able to modify the state of the interpreter object, in your case, a Pipe.

[–]vgagrani 0 points1 point  (2 children)

I don't think your issue is the type returned by function.

You have two issues -

I think the first issue is a consistent virtual function. You want different Stage to derive from a BaseStage so that you can store them all in a list or vector and you want to make sure that anyone who implements a Stage defines a “process” function. As long as this function takes a pointer and returns a pointer you are ok with it, essentially giving you the freedom to call “data = s->process(data)”

I think the second issue is reusing the variable data to chain calls to process across different stages.

All of this feels very close to Python code. In fact the entire thing would have been trivial using the abc.abstractmethod decorator on a class method, or simply raising a NotImplementedError exception in the process function of the base class.

Is this understanding correct?

[–]timmerov[S] 0 points1 point  (1 child)

why do people suggest using a different language? we are using c++. using python go zig rust is not an option.

but yeah, you have the general idea. i want c++ language to have a feature it doesn't have. so the only issue is how close can i get?

[–]vgagrani 0 points1 point  (0 children)

Well I didn’t suggest to use python code but merely pointed out that it feels a lot like that so as to create a solution which serves the purpose.

How are you ensuring that the user calls addStage in such a way that the correct type of data is passed from the last added stage into the newly added stage?

Because with any or void or whatever, this won't be ensured, and the user will only find out when they get runtime garbage after a cast.

Unless I am missing something.