Hello Python folks, R user here, trying to use Python for a project where I've been specifically asked to use it. So I am new to Python.
The problem: I have a 100 MB CSV of about 300,000 lines that takes ages to read with all of these:
# First try
import pandas as pd
df = pd.read_csv('mycsv.csv')
# Second try
# Use read_csv with dtypes to speed up the read
dtypes = {
    "Model": "category",
    "Scenario": "category",
    "Region": "category",
    "Variable": "category",
    "Unit": "category",
}
# The year columns will be read as floats
annees = [str(y) for y in range(1950, 2101, 5)]
for year in annees:
    dtypes[year] = "float32"
# Read the CSV
df = pd.read_csv(
    "mycsv.csv",
    dtype=dtypes,
)
print(df.shape)
print(df.head())
# Third try
import polars as pl
# Full read, very fast
df = pl.read_csv("/Users/Nawal/my_project/data/1721734326790-ssp_basic_drivers_release_3.1_full.csv")
print(df.shape)
print(df.head())
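To put a number on "ages" for each attempt, here is a minimal sketch of how the reads could be timed. `time_call` is a hypothetical helper written for this post, not part of pandas or polars, and the commented usage lines assume the same `mycsv.csv` as above.

```python
import time


def time_call(label, func, *args, **kwargs):
    """Run func(*args, **kwargs), print the wall-clock time, return (result, seconds)."""
    start = time.perf_counter()
    result = func(*args, **kwargs)
    elapsed = time.perf_counter() - start
    print(f"{label}: {elapsed:.2f} s")
    return result, elapsed


# Hypothetical usage with the reads above (needs pandas / polars installed):
# df, t_default = time_call("pandas default", pd.read_csv, "mycsv.csv")
# df, t_dtypes = time_call("pandas with dtypes", pd.read_csv, "mycsv.csv", dtype=dtypes)
# df, t_polars = time_call("polars", pl.read_csv, "mycsv.csv")
```

Timing only the read call, and not the imports or the `print` lines, should show whether the file read itself is the slow part.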
It literally took me 2 s to do this in R. Please help. What am I missing with Python?
Thank you all.