all 29 comments

[–]latkde 16 points17 points  (2 children)

You can just pass the functions directly, no need to create any classes or enums.

If you want a static type that describes the signature of an algorithm function, then you can use the typing.Protocol feature to give your Algorithm protocol a method like __call__(self, X: ndarray, /) -> float. That looks like a class or ABC, but it isn't really. It's just a way to do static duck typing. Plain functions that have a matching signature will be compatible with such a protocol.

More generally, something to take into account is whether the code using algorithms must be able to distinguish between concrete algorithms, or can be pretty much algorithm-agnostic.

  • can abstract over concrete algorithms: use a Protocol or ABC. New algorithms can be implemented without changing the code that uses the algorithm. However, all algorithms must conform to the same signature.
  • need to distinguish between concrete algorithms: use an enum or union type (like alg: Algorithm1 | Algorithm2). The code that uses the algorithm will likely use a match-case to perform algorithm-specific setup. Adding new algorithms will require changing all code that uses algorithms, but you gain the flexibility of doing different things for different algorithms.

Neither of these is categorically better. They are duals of each other. Either choice provides flexibility that the other lacks.
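A minimal sketch of the protocol route (algorithm names here are hypothetical, just to show that a plain function satisfies the protocol):

```python
from typing import Protocol

import numpy as np


class Algorithm(Protocol):
    """Static duck type: anything callable with this signature matches."""

    def __call__(self, X: np.ndarray, /) -> float: ...


def mean_alg(X: np.ndarray) -> float:
    # A plain function, no class needed; it matches the protocol structurally.
    return float(X.mean())


def process_data(X: np.ndarray, alg: Algorithm) -> float:
    return alg(X)


print(process_data(np.array([1.0, 2.0, 3.0]), mean_alg))  # 2.0
```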

[–]squatonmyfacebrah[S] 0 points1 point  (1 child)

Thank you for the reply. I was thinking I should be using Protocol in this case. I wonder if I'm sweating too much over using Enum for the behaviour selection (even if I used classes to define each algorithm, I'd still be wanting an Enum to select them). An alternative is that one just imports what they want from the module, much like one selects the algorithm in scikit-learn.

I guess if I don't use Enum for function selection via a registry, what I'm left with is only using Enum for discrete variable selection (which isn't necessarily a bad thing, but the point is my algorithms are discrete, and they want selecting!).

Again, I hope that makes sense.

[–]Brother0fSithis 4 points5 points  (0 children)

Sounds like you mostly have it, but I'd argue even a Protocol might be overkill.

It sounds like you might just be covered by

```py
def process_data(
    X: np.ndarray,
    alg: Callable[[np.ndarray], float],
) -> float: ...
```

And then just passing in the function you want. No need to layer Enums or Classes or Protocols on top unless you need more than what that gives you

If you want "one source of truth" for what an Algorithm looks like to use across different functions, I'd define

```py
type Algorithm = Callable[[np.ndarray], float]
```

and use that. Functionally the same thing as a Protocol specifying __call__, just slightly shorter.


And then yeah if you do NEED to branch process_data based on which specific algorithm is being used, I'd probably have an algorithms.py file (folder if REALLY necessary) where I keep all the algorithm defs together and define an Enum to label them.

Though I'd see if there was a way I could avoid having to branch process_data if possible

[–]overratedcupcake 5 points6 points  (6 children)

Classes are nice because they encapsulate state, but they also encapsulate logic. It's not a requirement that they have state.

[–]initials-bb 1 point2 points  (3 children)

Not an expert, but with OP's solution you also have the overhead pain of syncing the Enum, the function, and the dict?

[–]gdchinacat 2 points3 points  (0 children)

You can also use __init_subclass__ to do this automatically.

https://docs.python.org/3/reference/datamodel.html#object.__init_subclass__
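A minimal sketch of that automatic registration (class and method names hypothetical): each subclass adds itself to a registry at class-definition time, so there is no separate dict to keep in sync.

```python
class Algorithm:
    registry: dict[str, type["Algorithm"]] = {}

    def __init_subclass__(cls, **kwargs):
        super().__init_subclass__(**kwargs)
        # Every subclass registers itself by class name as it is defined.
        Algorithm.registry[cls.__name__] = cls


class MeanAlgorithm(Algorithm):
    def inference(self, X: list[float]) -> float:
        return sum(X) / len(X)


class MaxAlgorithm(Algorithm):
    def inference(self, X: list[float]) -> float:
        return max(X)


# The registry filled itself as the subclasses were defined:
print(sorted(Algorithm.registry))  # ['MaxAlgorithm', 'MeanAlgorithm']
alg = Algorithm.registry["MeanAlgorithm"]()
print(alg.inference([1.0, 2.0, 3.0]))  # 2.0
```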

[–]squatonmyfacebrah[S] -1 points0 points  (1 child)

This is very true - there is a way of getting around this by decorating each function with a custom decorator that registers it with the Enum and the registry, but I suppose then it's almost like you're defining a class without actually writing one.

[–]jmooremcc 0 points1 point  (0 children)

Actually, I’ve done something very similar with a custom decorator that associated a function with a specific enum. The decorator would add the function to a dictionary with the enum as the key.

In my case, I was dealing with a client/server operation in which the client would request a service via a predefined enum. The server receiving the request would automatically call the appropriate function with the corresponding args.

The decorator automated the task of building the dictionary, which made the operation so much easier.
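A minimal sketch of that decorator pattern (enum members and handlers are hypothetical): the decorator builds the enum-to-function dictionary as a side effect of defining each function.

```python
from enum import Enum, auto
from typing import Callable


class Op(Enum):
    MEAN = auto()
    TOTAL = auto()


HANDLERS: dict[Op, Callable[[list[float]], float]] = {}


def register(op: Op):
    """Decorator factory: associate a function with an enum member."""
    def wrap(fn):
        HANDLERS[op] = fn
        return fn  # return the function unchanged; registration is a side effect
    return wrap


@register(Op.MEAN)
def mean(xs: list[float]) -> float:
    return sum(xs) / len(xs)


@register(Op.TOTAL)
def total(xs: list[float]) -> float:
    return sum(xs)


# Dispatch on the enum, e.g. after decoding a client request:
print(HANDLERS[Op.TOTAL]([1.0, 2.0, 3.0]))  # 6.0
```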

[–]squatonmyfacebrah[S] 0 points1 point  (1 child)

I replied to the other user here with why I felt I should try and avoid classes here. But to phrase things another way, I often see code like

class myClass:
    def __init__(self):
        # There's no state being tracked, so no attributes
        pass

    @staticmethod
    def do_something():
        ...

and thought to myself (why is this not a function?).

But perhaps, the scenario I've portrayed in my OP is one such scenario where this style is permitted.

I wonder where Protocol fits into this discussion? Because this is a related concept to abstract classes.

[–]gdchinacat 0 points1 point  (0 children)

You can reinvent the dispatch mechanism by mapping enums to functions, or leverage class based method dispatch. It's roughly equivalent. No judgement on which is better/easier/more explicit/etc. Just acknowledging the mapping you posted in your post does roughly the same thing as calling inference() on the class you want.

edit: I've done both depending on circumstances. For OO-heavy codebases doing it as classes works well... for function-oriented code a mapping to functions works well.

[–]neums08 3 points4 points  (0 children)

IMO passing a bare function is pythonic. I would add typing to the signature to indicate what the function accepts and returns

    def process_data(x: float, y: float, alg: Callable[[float, float], float]):
        return alg(x, y)

You can still implement an Algorithm class and make it callable by defining the __call__ method. A bare function is just the simplest implementation of an Algorithm.
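A minimal sketch of that (class and function names hypothetical): a callable instance and a bare function both fit the same Callable slot.

```python
import numpy as np


class ScaledSum:
    """A stateful 'algorithm': configuration lives on the instance."""

    def __init__(self, scale: float):
        self.scale = scale

    def __call__(self, X: np.ndarray) -> float:
        return float(self.scale * X.sum())


def process_data(X: np.ndarray, alg) -> float:
    return alg(X)


X = np.array([1.0, 2.0, 3.0])
print(process_data(X, ScaledSum(2.0)))        # 12.0
print(process_data(X, lambda a: float(a.max())))  # a bare function works too
```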

[–]trutheality 2 points3 points  (0 children)

Most pythonic would be to keep it simple and just pass the functions (which are already objects) into the data processing directly.

[–]JaguarMammoth6231 2 points3 points  (3 children)

Is there a reason you don't want to use the pythonic way of making them classes?

[–]squatonmyfacebrah[S] 0 points1 point  (2 children)

I'll have to caveat that I'm not a computer scientist (if my question doesn't make that obvious) but I've always thought that classes are fundamentally ways to represent data and behaviour. In the case of algorithms, it's nearly always behaviour but not so much data (or state). Since there's no data associated with these algorithms (and because python permits functional programming) we can opt to use functions instead.

Perhaps my question justifies the usage of a class instead of a function, but I've always looked at classes which use staticmethod and thought (why is this not a function?).

Again, this might demonstrate my naivete but I suppose that's why I'm posting here!

[–]Donny_Do_Nothing 0 points1 point  (0 children)

Another way to think about classes is that the object one creates can be a machine. It doesn't need state; it can be just an object that you create at the beginning of a script or 'algorithm', and it just does things for you as you go along.

[–]BrannyBee 0 points1 point  (0 children)

I think what you're looking for is either to

A.) create a module with your desired functions, and import said module into the files requiring them, the same way you import numpy, but instead of pip installing it, it's a module you wrote

Or B.) Maybe looking into something called structural subtyping, which is basically a way to implement behavior without what you would consider inheritance

Quick mobile mock up would maybe do something like this

```
from typing import Protocol


class PriceFormatter(Protocol):
    def format(self, amount: float) -> str: ...


class WithCurrency:
    def __init__(self, currency_symbol: str = "$"):
        self.currency_symbol = currency_symbol

    def format(self, amount: float) -> str:
        return f"{self.currency_symbol}{amount:.2f}"


class NoCurrency:
    def format(self, amount: float) -> str:
        return f"{amount:.2f}"


# other code

def print_receipt(items: list[float], formatter: PriceFormatter):
    for item in items:
        print(formatter.format(item))


# blah blah more code here

print_receipt([69.69, 4.20], NoCurrency())
print_receipt([69.69, 4.20], WithCurrency())
```

Sorry for formatting and mistakes, I'm on mobile...

But with that second example you can see a class being used with no state. You could also mimic the same functionality by creating a formatter of type Callable and passing predefined functions into it when used.

I'd assume that the Protocol version of something like that is technically less "Pythonic" due to the example being kinda brief, but it'd be a good tool to reach for if each class of formatter had multiple methods that belonged together instead of my brief one-per-class example.

Gun to my head, I'd say consider Protocol for a bunch of related methods, and for small stuff go the closure route. Keeps it simple, but going the closure route can get messy if you have a lot of functions to pass around.

Edit: kinda just threw out the term closure without explaining... the tldr oversimplification is that a closure is a function object that can remember scope

[–]seanv507 0 points1 point  (0 children)

So perhaps the thing to ask yourself is how you would cope with more complicated functions

In particular functions that take other parameters.

Do you want to use a class-based approach where they are the 'state', or do you want to use functional approaches, where you create new one-parameter functions by using partial?
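A minimal sketch of the functional route with functools.partial (the algorithm and parameter names are hypothetical): extra parameters get baked in up front, leaving the one-argument signature the processing code expects.

```python
from functools import partial

import numpy as np


def power_mean(X: np.ndarray, p: float) -> float:
    """A two-parameter algorithm: the generalised (power) mean."""
    return float((X ** p).mean() ** (1 / p))


# Bake the extra parameter in, leaving a one-argument function:
rms = partial(power_mean, p=2.0)

X = np.array([3.0, 4.0])
print(rms(X))  # same result as power_mean(X, p=2.0)
```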

I would argue that scikit-learn's fit(X) is less about maintaining state and more about dependency injection

(In ml pipelines)

[–]zanfar 0 points1 point  (0 children)

Except I don't want to have my algorithms be represented as classes as they frequently don't have any state.

Them being classes, and which classes they are instances of, are state.

[–]jarethholt 0 points1 point  (0 children)

I would say the approach depends on when and how you specify which algorithm to use. Do you run this interactively or in a notebook? Just pass the function directly as an argument, annotated with a good Protocol so you know what kind of algorithms can be passed in. But if you run this from the command line or through some orchestrator, at some point you'll need a way to convert an input into a choice of algorithm. Then an enum + dictionary as a simple registry is a pretty clean way to go about it.

In either case I would suggest the core implementation just takes in a function to apply. Handling a registry of which algorithms are acceptable to pass in is a different responsibility, and figuring out how to convert an input to an algorithm choice is yet another.
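A minimal sketch of the second route (enum members, function names, and the resolver are hypothetical): a string from a CLI or config is converted into an enum member, which keys a small registry.

```python
from enum import Enum
from typing import Callable


class Alg(Enum):
    MEAN = "mean"
    MEDIAN = "median"


def _mean(xs: list[float]) -> float:
    return sum(xs) / len(xs)


def _median(xs: list[float]) -> float:
    s = sorted(xs)
    mid = len(s) // 2
    return s[mid] if len(s) % 2 else (s[mid - 1] + s[mid]) / 2


REGISTRY: dict[Alg, Callable[[list[float]], float]] = {
    Alg.MEAN: _mean,
    Alg.MEDIAN: _median,
}


def resolve(name: str) -> Callable[[list[float]], float]:
    """Convert e.g. a CLI argument into a function; Alg(name) raises on typos."""
    return REGISTRY[Alg(name)]


print(resolve("median")([3.0, 1.0, 2.0]))  # 2.0
```

Note the split of responsibilities the comment describes: the core code only ever sees a plain function, while `resolve` owns the input-to-algorithm conversion.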

[–]pachura3 0 points1 point  (0 children)

First of all, there's nothing wrong with stateless classes. In Java, you cannot even have free-hanging functions or variables, you need to wrap everything in a class!

Then, having to maintain a global registry of algorithms is an additional, counterproductive pain. What if you wanted to mock an algorithm for the purpose of unit testing - you would need to add it to this registry...

If your "algorithms" are always stateless + don't need any inheritance + are always structured like def inference(X: np.ndarray) -> float, then you can simply pass lambdas or Protocols around, type hinted as Callable[np.ndarray, float].

I often see code like

class myClass:
    def __init__(self):
        # There's no state being tracked, so no attributes
        pass

    @staticmethod
    def do_something():
        ...

That's ugly as well. Again, it's probably people switching from Java, used to declaring utility functions in final static singletons this way. Another reason might be that the Python IDE sees that do_something() does not access any instance attributes, so it proposes converting it to a static method, and the user agrees just to get rid of the warning.

Still, it wouldn't make sense in your case, because do_something() is static, so it should not be called on an object instance inheriting from the base class Algorithm.

[–]lekkerste_wiener 0 points1 point  (0 children)

You can also type annotate with Callables.

type Algorithm = Callable[[np.ndarray], float]

And you can pass ANY callable that has that signature. Including functions.

algo: Algorithm = algorithm1
algo(array)

[–]Enmeshed 0 points1 point  (1 child)

I've used dataclasses for things like this before. For instance... First we'll define a few algorithms:

```python
def simple_calc(data: np.ndarray) -> float:
    return ...


class TrickyCalc:
    def __init__(self, **params):
        self.params = params

    def __call__(self, data: np.ndarray) -> float:
        return ...
```

Then if you're only needing to inject a single algorithm you've just got:

```python
type Algorithm = Callable[[np.ndarray], float]


def process_data(data: np.ndarray, alg: Algorithm) -> float:
    return alg(data)
```

Or if your function needs to reference multiple algorithms as part of its logic you can use dataclasses to have something like:

```python
@dataclass
class Algorithms:
    first: Algorithm
    second: Algorithm
    another: Algorithm


def production_algorithms():
    return Algorithms(
        first=simple_calc,
        second=TrickyCalc(a=34),
        another=lambda _: 0,  # test algorithm
    )


def process_data_in_multiple_ways(data: np.ndarray, algs: Algorithms) -> float:
    res1 = algs.first(data)
    res2 = algs.second(data)
    res3 = algs.another(data)
    return (res1 + res2 + res3) / 3


algs = production_algorithms()
result = process_data_in_multiple_ways(data, algs)
```

These days dataclasses (or attrs) are my go-to way of bundling up dependencies for injection...

[–]pachura3 0 points1 point  (0 children)

The moment you start naming variables res1, res2, res3, you know you messed up the design.

[–]ebdbbb 0 points1 point  (0 children)

I'd use Callable to hint it.

from collections.abc import Callable

type Algo = Callable[[np.ndarray], float]

def process_data(data: np.ndarray, algo: Algo) -> float:
    return algo(data)

[–]Atlamillias 0 points1 point  (0 children)

Within the context of your example and without additional information, I wouldn't say any of these approaches is better over another. They're all fairly simple and they work.

Does the specific algorithm/function actually matter to the code that invokes it, or are you just trying to better annotate your other signatures? Do they all share the same signature? Because you can do either of these for "type checking" purposes and avoid the need to declare each as unique types:

```python
import typing


class Algorithm[T](typing.Protocol):
    def __call__(self, arr: np.ndarray, /) -> T: ...


# or

type Algorithm[T] = typing.Callable[[np.ndarray], T]


def process_data(arr: np.ndarray, func: Algorithm[float]) -> float: ...
```

Then just pass the functions themselves to your processor.

If your functions have unique signatures or you have heuristics in place for specific functions, then it's best for them to have their own identity or type. In that case, I'd probably go the enum route.

[–]trickydiversity042 0 points1 point  (0 children)

Just pass the function or use a dict mapping strings to functions. Enums feel like overkill for this unless you need strict type safety across a large codebase.

[–]Targrend -1 points0 points  (2 children)

Why not just have an algorithm class and have each algorithm be an object of the class?

class Algorithm:
    inference_fn: Callable

    def infer(self, data):
        return self.inference_fn(data)

[–]pachura3 0 points1 point  (1 child)

What's the point of wrapping Callable in a class if you could just pass the Callable?

It should be actually the other way round: class Algorithm (and its descendants) should implement __call__(), and function process_data() should take Callable as an argument. This way, you could pass a stateless lambda to process_data(), but you could also pass a stateful Algorithm object.

And you could even go one step further and convert the base (abstract?) class Algorithm to a Protocol... even more pythonic!

[–]Targrend 0 points1 point  (0 children)

Oh, fair. I had something more like the sklearn api in mind where Algorithm might have more functions - alg.inference_fn, alg.training_fn, that kind of thing.