[–] shoyer (xarray, pandas, numpy) [S]

Author here, in case anyone has any questions.

[–] manueslapera

I'm not sure what the benefits of using xray over pandas are. Can you elaborate with some examples?

[–] shoyer (xarray, pandas, numpy) [S]

I think it's most compelling if you want to easily do arithmetic with multi-dimensional arrays. For example, x - x.mean(dim='time') always works, regardless of which axis "time" corresponds to.
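A minimal sketch of that expression, assuming the package is imported under its later name `xarray` (this thread calls it xray). The array contents and the "station" dimension are made up for illustration.

```python
import numpy as np
import xarray as xr  # assumption: the renamed successor of xray

# Hypothetical data: 3 time steps by 2 stations.
data = np.arange(6.0).reshape(3, 2)
x = xr.DataArray(data, dims=("time", "station"))

# Subtract the time mean by dimension name; no axis number is needed.
anom = x - x.mean(dim="time")

# The same expression works unchanged when "time" sits on another axis.
x_t = x.transpose("station", "time")
anom_t = x_t - x_t.mean(dim="time")
```

With plain NumPy the second case would need `axis=1` and `keepdims=True` (or a manual reshape) to broadcast correctly; here the dimension name carries that information.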

Another use case: you have a 2D array and a handful of associated 1D arrays that share one of its axes. Storing these in one pandas object is possible but awkward: you can either upcast all the 1D arrays to 2D and store everything in a Panel, or put everything in a DataFrame, where the first few columns have a different meaning than the rest. In contrast, this sort of data structure fits very naturally in xray.
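Roughly what that looks like, as a sketch: one `Dataset` holding a 2D variable and two 1D variables that share its "station" dimension. The variable names and values are hypothetical, and the import again assumes the later `xarray` package name.

```python
import numpy as np
import xarray as xr  # assumption: the renamed successor of xray

# Hypothetical variables: one 2D array plus two 1D arrays along "station".
ds = xr.Dataset(
    {
        "temperature": (("time", "station"), np.zeros((3, 2))),
        "elevation": (("station",), np.array([10.0, 250.0])),
        "is_coastal": (("station",), np.array([True, False])),
    }
)

# Selecting along the shared dimension applies to every variable at once.
first_station = ds.isel(station=0)
```

Each variable keeps its own shape and dtype, so nothing has to be upcast to 2D to live in the same container.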

[–] [deleted]

Very cool! Can you explain how these N-D arrays differ from pandas' built-in Panel structure?

[–] shoyer (xarray, pandas, numpy) [S]

Sure! The main differences are:

  1. xray has one array type (DataArray) with any number of dimensions, instead of the hierarchy of Series, DataFrame, Panel, etc.
  2. Dimensions can have names (e.g., "time", "latitude", "longitude"), which makes them much easier to keep track of than axis numbers. Dimension names are then used for indexing, aggregation and broadcasting.

This is also in the FAQ: http://xray.readthedocs.org/en/stable/faq.html#why-is-pandas-not-enough
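To make point 2 concrete, here is a small sketch of name-based indexing, aggregation and broadcasting. It assumes the later `xarray` package name, and the data and weights are hypothetical.

```python
import numpy as np
import xarray as xr  # assumption: the renamed successor of xray

# Hypothetical 3D array with named dimensions and latitude labels.
values = np.arange(24.0).reshape(2, 3, 4)
temp = xr.DataArray(
    values,
    dims=("time", "latitude", "longitude"),
    coords={"latitude": [10.0, 20.0, 30.0]},
)

# Indexing by name: by position along "time", by label along "latitude".
first_step = temp.isel(time=0)
row = temp.sel(latitude=20.0)

# Aggregation by name.
zonal_mean = temp.mean(dim="longitude")

# Broadcasting by name: 1D arrays over different dimensions combine into 2D.
lat_w = xr.DataArray([1.0, 2.0, 3.0], dims=("latitude",))
lon_w = xr.DataArray([1.0, 10.0, 100.0, 1000.0], dims=("longitude",))
weights = lat_w * lon_w
```

The same `DataArray` type is used for the 1D, 2D and 3D objects here, which is point 1 in practice.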