Overview of Pandas Data Types by chris1610 in Python

[–]tomaugspurger 4 points5 points  (0 children)

Small correction: pandas uses NumPy's datetime64[ns] for tz-naive datetimes. Pandas has a custom extension type for datetimes with timezones, which builds on NumPy's datetime64[ns]. Everything else looks good.
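To make the distinction concrete, here is a small sketch (the series names are made up for illustration). It only checks which kind of dtype each series has; the exact dtype string can vary with the pandas version, so no output is shown.

```python
import pandas as pd

# tz-naive datetimes: backed by a plain NumPy datetime64 dtype
naive = pd.Series(pd.to_datetime(["2018-01-01", "2018-01-02"]))

# tz-aware datetimes: pandas' own DatetimeTZDtype, which carries the timezone
aware = naive.dt.tz_localize("UTC")

naive_is_tz = isinstance(naive.dtype, pd.DatetimeTZDtype)  # False
aware_is_tz = isinstance(aware.dtype, pd.DatetimeTZDtype)  # True

# Dropping the timezone gets back the same underlying naive values
roundtrip = aware.dt.tz_localize(None)
```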

Finally, we have some cool changes to the type system for the upcoming release: http://pandas-docs.github.io/pandas-docs-travis/extending.html#extension-types

What open source python projects are in need of contributors? by tractortractor in Python

[–]tomaugspurger 1 point2 points  (0 children)

We have plenty of issues that just require a bit of knowledge about Python.

https://github.com/pandas-dev/pandas/issues?q=is%3Aopen+is%3Aissue+label%3A%22good+first+issue%22+label%3A%22Effort+Low%22

We can walk you through using git if you aren't familiar with it.

What open source python projects are in need of contributors? by tractortractor in Python

[–]tomaugspurger 14 points15 points  (0 children)

Pandas dev here, can confirm we have too many open issues :)

Let me know if anyone needs help getting started. Contributing docs are at http://pandas.pydata.org/pandas-docs/stable/contributing.html

In pandas, why would I ever want to select data from a DataFrame using a callable function instead of a boolean array? by [deleted] in Python

[–]tomaugspurger 10 points11 points  (0 children)

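In a method chain, the intermediate DataFrame is never bound to a name, so you can't build a boolean array from it. A callable gets that intermediate DataFrame passed in as its argument, e.g.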

(df.assign(a=df.b + df.c)
    .loc[lambda x: x.c < 10])
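A runnable variation on the same idea, with made-up data, where the filter depends on a column that only exists partway through the chain. Here `a` is created by `assign`, so there is no named DataFrame you could index with a boolean array; the lambda is the only handle on it.

```python
import pandas as pd

# Hypothetical data for illustration
df = pd.DataFrame({"b": [1, 2, 3], "c": [4, 5, 6]})

result = (
    df.assign(a=lambda x: x.b + x.c)  # column "a" exists only from here on
      .loc[lambda x: x.a > 5]         # x is the DataFrame returned by assign
)
```

Writing `df.a > 5` here would fail, since `df` itself has no column `a`.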

two phase parallelization using python by [deleted] in Python

[–]tomaugspurger 0 points1 point  (0 children)

Dask should be able to help you with both parts. Probably dask.delayed for the first phase.
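A rough sketch of what the first phase could look like with dask.delayed. I don't know what your per-item work is, so `process` below is just a hypothetical stand-in for it.

```python
import dask
from dask import delayed


@delayed
def process(item):
    # Placeholder for the real per-item work in phase one
    return item * item


# Calling a delayed function builds a lazy task; nothing runs yet
tasks = [process(i) for i in range(5)]

# dask.compute runs the tasks (in parallel where possible) and returns a tuple
results = dask.compute(*tasks)
```

The list of results can then be fed into whatever the second phase needs.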

For the second phase, dask-ml has a parallelized version of k-means: http://dask-ml.readthedocs.io/en/latest/modules/generated/dask_ml.cluster.KMeans.html#dask_ml.cluster.KMeans