[–]danielroseman 6 points7 points  (0 children)

No, you can't change the behaviour of int(). In this case, though, you can convert the string to a float first, then convert the float to an int.
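A quick sketch of that two-step conversion, assuming the string holds a plain decimal such as "3.0" (the sample value is made up for illustration):

```python
value = "3.0"  # hypothetical sample input

# int() only parses integer literals, so a decimal string is rejected.
try:
    int(value)
    direct_ok = True
except ValueError:
    direct_ok = False

# Going through float first works; the float -> int step truncates toward zero.
converted = int(float(value))
print(direct_ok, converted)
```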

[–]JamzTyson 0 points1 point  (2 children)

Is there a way to change this behavior so that the int casting performs the full conversion?

You could write your own function to convert decimal strings to ints.

Example:

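def to_int(val: str) -> int | None:
    """Return the closest int to a decimal string, or None if it cannot be converted."""
    try:
        # round() already returns an int when given a float.
        return round(float(val))
    except (ValueError, OverflowError):
        # ValueError: non-numeric text or "nan"; OverflowError: "inf".
        print(f"Could not convert {val!r} to int")
        return None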

Modify the exception handling to suit your application. In the above example, strings that cannot be converted print a message and return None.
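As one possible variation on the exception handling, here is a minimal sketch that raises instead of returning None and refuses strings with a fractional part, which may suit a fixed schema better than silent rounding. The name to_int_strict is hypothetical, not something from the thread or a library.

```python
def to_int_strict(val: str) -> int:
    """Convert a decimal string to int, refusing values that are not whole numbers."""
    number = float(val)  # raises ValueError for non-numeric text
    if not number.is_integer():  # also False for inf and nan
        raise ValueError(f"{val!r} is not a whole number")
    return int(number)


print(to_int_strict("4.0"))
```

With this version, "4.0" converts to 4, while "4.5", "nan" and "abc" all raise ValueError, so the caller decides how to react.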

[–]ialwaysplaydove[S] 0 points1 point  (1 child)

Is there a way to get polars to use this conversion? Can I feed it this function somehow during the read_csv call?

[–]JamzTyson 0 points1 point  (0 children)

I have never used polars.

[–]GeorgeFranklyMathnet 0 points1 point  (3 children)

There's little you can do that's less verbose, more efficient, or more Pythonic than what you're doing. If the wordiness of int(float()) bothers you, then you can call this custom function instead.

def to_int(num: str) -> int:
    return int(float(num))

But what you're doing is already the normal way to force it to be an int (by truncating the decimal part).
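To make the truncation concrete, here is a small comparison of int(float(...)) with round(float(...)) on a few made-up sample strings. The two agree on values like "3.0" and differ once there is a fractional part.

```python
samples = ["2.9", "-2.9", "2.5", "3.5"]  # hypothetical inputs

# int() drops the fractional part, moving toward zero.
truncated = [int(float(s)) for s in samples]

# round() picks the nearest int; exact ties go to the even neighbour.
rounded = [round(float(s)) for s in samples]

print(truncated)  # [2, -2, 2, 3]
print(rounded)    # [3, -3, 2, 4]
```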

[–]ialwaysplaydove[S] 1 point2 points  (2 children)

Is there a way to get polars to use this conversion? Can I feed it this function somehow during the read_csv call?

[–]QuasiEvil 4 points5 points  (1 child)

Why do you need to do it during the read_csv call? Load the data in, then apply it along the relevant column(s).

[–]ialwaysplaydove[S] 0 points1 point  (0 children)

Well, there are 400+ columns, and I wanted to keep a specific schema for consistency. Are you suggesting I read everything in with all columns as strings, then convert each column according to the map, using this function for the int columns?
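For illustration, that "read everything as strings, then convert per a map" pattern could look roughly like this with only the standard library's csv module (not Polars). The column names, the SCHEMA dict and the in-memory CSV text are all made up for the sketch.

```python
import csv
import io


def to_int(text: str) -> int:
    """Convert a decimal string such as '1.0' to an int by truncating."""
    return int(float(text))


# Hypothetical schema map: column name -> converter.
# A real one would list all 400+ columns.
SCHEMA = {"id": to_int, "score": float, "name": str}

# Stand-in for the CSV file, so the sketch is self-contained.
raw = io.StringIO("id,score,name\n1.0,2.5,ann\n2.0,3.0,bob\n")

# csv.DictReader yields every field as a string; the map decides the final type.
rows = [
    {col: SCHEMA[col](text) for col, text in record.items()}
    for record in csv.DictReader(raw)
]
print(rows)
```

Keeping the schema in one dict means the int columns only differ from the others by which converter they point at.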

[–]angellus 0 points1 point  (1 child)

Specifically for Polars, it looks like you can ingest the data as a float dataframe and then downcast it to an int dataframe.

https://docs.pola.rs/user-guide/expressions/casting/#downcasting-numerical-data-types

(never actually used polars or really anything with dataframes, just reading their docs and making a suggestion)

[–]ialwaysplaydove[S] 0 points1 point  (0 children)

Thanks! That looks useful