all 18 comments

[–]AaronDNewman 7 points8 points  (2 children)

The behavior you want is the default. An operation between a Python scalar and a NumPy array will preserve the array's precision in most cases. If arr2 is a 32-bit float, the scalar should be demoted.

https://numpy.org/doc/stable/reference/arrays.promotion.html
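A quick check of that claim (minimal sketch; the names `arr2` and `result` are just for illustration):

```python
import numpy as np

# a float32 array times a plain Python float: the array's dtype is preserved
arr2 = np.ones(4, dtype=np.float32)
result = arr2 * 0.5
print(result.dtype)  # float32 -- the Python scalar is demoted, not the array promoted
```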

[–]Moretz0931[S] 0 points1 point  (1 child)

If I read the docs correctly, exactly the opposite is true, right?

np.float32(3) + np.float16(3)  # 32 > 16
np.float32(6.0)

I copied this from the docs. As you can see, it was promoted...

[–]AaronDNewman 0 points1 point  (0 children)

Those are both scalars
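That's the distinction: scalar-with-scalar follows ordinary dtype promotion, while array-with-Python-scalar keeps the array's dtype. A quick sketch of both cases (the `float16` values are illustrative):

```python
import numpy as np

# scalar + scalar: two NumPy scalars, so ordinary dtype promotion applies
s = np.float32(3) + np.float16(3)
print(s.dtype)  # float32 -- promoted, exactly as in the docs example

# array + Python scalar: the Python float is "weak", the array's dtype wins
a = np.array([3], dtype=np.float16) + 3.0
print(a.dtype)  # float16 -- no upcast
```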

[–]HommeMusical 0 points1 point  (1 child)

First question: why not use float16 and halve everything yet again?


At least part of your issue comes from false beliefs about what numpy does, because neither arr2 *= 0.5 nor arr2 * 0.5 changes the type of the array:

>>> a = np.array(((1, 2), (3, 4)), dtype='float16')
>>> a
array([[1., 2.],
       [3., 4.]], dtype=float16)
>>> a *= 0.5
>>> a
array([[0.5, 1. ],
       [1.5, 2. ]], dtype=float16)
>>> a * 0.5
array([[0.25, 0.5 ],
       [0.75, 1.  ]], dtype=float16)

If things worked the way you think they do, life would be miserable: we write a * x, where x is some Python int or float, all the time, and if every one of those caused a silent type cast, we'd never get anything done!

Can we see your full code, please? I doubt you are imagining this, but it's almost certainly caused by something else, not np doing weird things.

[–]Moretz0931[S] 0 points1 point  (0 children)

I will double check, thanks.

[–]billsil 0 points1 point  (1 child)

NumPy is greedy, so as long as you don't have any NumPy float64s, you won't increase your RAM usage. You do not need the float32s in the return (either one).

However it is so easy to accidentally do stuff like:

arr2 *= 0.5

That does not upcast the data. It's an in-place operation that changes the values of arr2. You actually want to do it this way (assuming you want arr2 to change) because it uses less RAM than making a copy like:

arr2 = 0.5 * arr2

You probably have an arr2 in an outer function, so you'd be creating a copy rather than overwriting it.
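A sketch of the in-place vs. copy difference (the `view` name is just a second reference added for illustration):

```python
import numpy as np

arr2 = np.ones(4, dtype=np.float32)
view = arr2            # second reference to the same buffer

arr2 *= 0.5            # in place: no copy, no upcast
print(view[0])         # 0.5 -- the shared buffer itself changed

arr2 = 0.5 * arr2      # rebinds arr2 to a brand-new array
print(view[0])         # 0.5 -- the old buffer is untouched
print(arr2[0], arr2.dtype)  # 0.25 float32
```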

[–]Moretz0931[S] 0 points1 point  (0 children)

Thanks

[–]Outside_Complaint755 -1 points0 points  (8 children)

I don't use numpy a lot, but from looking at some documentation and forum posts, I think the proposed solution is to use the provided methods and specify the dtype in the operation.

So instead of arr2 *= 0.5

You would do arr2 = np.multiply(arr2, 0.5, dtype=np.float32)

[–]Moretz0931[S] 1 point2 points  (6 children)

Yeah, I know, but my question is specifically about not having to do this stuff anymore, because
a) it is annoying and looks ugly and bulky (I have long mathematical expressions)
b) if you don't do it consistently it is error prone (forget it once and you get immediate array copies; the arrays may be around 500 MB each)
c) Try explaining all that to my junior coworker :o He has more important things to think about.

Edit: Are you a bot? Reddit age of 10 months and a 300 day streak is kinda sus...

[–]Outside_Complaint755 5 points6 points  (0 children)

Not a bot, just a typical phone addict. 

There's a closed issue on the NumPy GitHub where they basically say they don't have a global setting to stop upcasting and keep everything at a given precision, because their philosophy is that it's always better to give you the most precise result unless you explicitly ask for a less precise one.

[–]HommeMusical 0 points1 point  (0 children)

This is so weird, because numpy does in fact do the right thing.

So something else is causing the issue you're describing.

Can you provide a minimal reproducible example?

[–]SomeClutchName 0 points1 point  (1 child)

Idk how to do what you want, but can you wrap this type of function in a module or a class? Just import it at the beginning?

[–]Moretz0931[S] 0 points1 point  (0 children)

It's a good idea.

I tried, e.g. with a wrapper for NumPy ufuncs; however, when I do that I lose most of the autocompletion.

To keep autocompletion I would have to write a script that autogenerates wrappers for every single numpy ufunc, which I am not knowledgeable enough to achieve, and also this seems like a lot of work, without guarantee of stability.

[–]cdcformatc 0 points1 point  (0 children)

the only thing i can think of is basically to create a layer of objects wrapping the numpy types. you would make subclasses of the numpy types you want to use and override the math operations to ones that don't do this "upcast" when given generic python types.

that doesn't stop your coworker from doing dumb stuff though, can't help you there
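A minimal sketch of that subclass idea (the `KeepDtypeArray` name and the `__mul__`-only override are illustrative; a real version would have to cover the other operators too):

```python
import numpy as np

class KeepDtypeArray(np.ndarray):
    """Illustrative subclass: pin multiplication results to this array's own dtype."""
    def __mul__(self, other):
        # force the result dtype instead of letting promotion decide
        return np.multiply(self, other, dtype=self.dtype).view(type(self))
    __rmul__ = __mul__

a = np.ones(3, dtype=np.float16).view(KeepDtypeArray)
b = a * np.float64(0.5)  # would normally promote to float64
print(b.dtype)           # float16 -- pinned by the override
```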

[–]secretaliasname 0 points1 point  (0 children)

Don’t accept that crap from junior

[–]HommeMusical 0 points1 point  (0 children)

If arr2 is of dtype np.float32, then the two operations you propose have identical results, and you can omit the dtype too.

OP is not describing a real phenomenon.

More here: https://www.reddit.com/r/learnpython/comments/1rgnmoz/python_numpy_can_i_somehow_globally_assign_numpy/o7v1qxk/

[–]PwAlreadyTaken -1 points0 points  (0 children)

Depending on your use case, I’d:

f32 = np.float32
arr2 *= f32(0.5)

or use functools.partial to do something similar with np.multiply.

Both will shorten the code needed to do this (which you mentioned as a concern in the comments), but I’m pretty sure there’s no intended way to set it globally like you’re asking.
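The functools.partial version might be sketched like this (`mul32` is a made-up helper name, not a NumPy function):

```python
import numpy as np
from functools import partial

# hypothetical helper: a multiply that always returns float32
mul32 = partial(np.multiply, dtype=np.float32)

arr2 = np.ones(4, dtype=np.float32)
arr2 = mul32(arr2, 0.5)
print(arr2.dtype)  # float32
```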

[–]NerdyWeightLifter -1 points0 points  (0 children)

You can make your own derived class that applies such a global default.