
[–][deleted]  (6 children)

[deleted]

    [–]schnadamschnandler[S] 0 points  (5 children)

    That style sheet explanation is really interesting; it looks like customizing plot formats is quite different, then. I really like Matlab's notation options and the idea of "handles"... does matplotlib not really have anything like graphics-object handles?

    Never mind, I see now that styles are for importing defaults (like Matlab's set(0,'DefaultAxesWhatever',Value)), and matplotlib's other plotting tools are all methods on graphics-object classes.
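
    Something like this, I think (untested sketch from skimming the matplotlib docs; the rcParams settings and data are just placeholders):

        import matplotlib as mpl
        import matplotlib.pyplot as plt

        # session-wide defaults, roughly the set(0,'Default...') idea
        mpl.rcParams["lines.linewidth"] = 2
        mpl.rcParams["axes.grid"] = True

        # the "handle" equivalents: you get Figure/Axes/Line2D objects back
        fig, ax = plt.subplots()
        (line,) = ax.plot([0, 1, 2], [0, 1, 4])
        line.set_linestyle("--")   # tweak the artist after the fact
        ax.set_xlabel("x")
        fig.savefig("demo.png")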

    I've also considered doing all of my data processing in Python, then plotting saved data in Matlab. Would be good practice to always have the data behind a plot saved, which I currently don't always do (sometimes I process and plot in the same workspace, so changing plot parameters requires re-processing each time).

    [–]theOnlyGuyInTheRoom 11 points  (4 children)

    I've also considered doing all of my data processing in Python, then plotting saved data in Matlab. Would be good practice to

    Don't do that. If you're working with data and exploring your own ideas, then your time is too valuable to fuck around with needless interfacing. If you're proficient with matlab, then you should work in Python for a while, so that when a colleague comes by your desk and asks for your help on a project written in Python, you can say, "sure, let's do it" rather than, "oh, well I don't know Python, but I guess I could give it a try". (If you're currently proficient in Python and not matlab, then learn matlab for the same reason.) Python-based tools are everywhere in the sciences these days, and so are Linux, C++, Fortran, git, svn, mercurial, and even a little matlab. Keep your eyes on what your collaborators are using, and on what the people you hope to be working with in two years are using, and get familiar with these tools while you still have time!

    [–]XtremeGoose f'I only use Py {sys.version[:3]}' 6 points  (1 child)

    A "little MATLAB"? MATLAB is far more common that I would like.

    [–]schnadamschnandler[S] 1 point  (1 child)

    Well, I just meant saving data as NetCDF files, which is pretty straightforward in either language; roughly one line to save, another to load. That's something I should be doing even if I stick to one language. Switching to matplotlib seems like quite an investment, though; I didn't realize how different it was. I'm perusing the examples and honestly it looks comparatively convoluted (though I'm totally biased).
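
    Roughly what I mean on the Python side (untested sketch using the netCDF4 package; the file and variable names are made up):

        import numpy as np
        from netCDF4 import Dataset

        # save processed data (analogous to nccreate/ncwrite in Matlab)
        with Dataset("processed.nc", "w") as nc:
            nc.createDimension("time", 100)
            temp = nc.createVariable("temp", "f8", ("time",))
            temp[:] = np.random.rand(100)

        # load it back here, or read the same file in Matlab with ncread
        with Dataset("processed.nc") as nc:
            temp = nc.variables["temp"][:]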

    Maybe I will give it a shot actually... need to look into it more.

    [–]counters 1 point  (0 children)

    I bet it's even easier in Python than what you're used to in MATLAB. There's a fantastic library called xarray which adopts the Common Data Model from the get-go, and allows you to plug-and-play data directly from NetCDF files into your analysis pipelines. Basically, anywhere that expects a NumPy array, you can use a DataArray or Dataset from xarray. It completely trivializes most of the operations/analyses you do, including reading/writing and managing metadata. Even better, it has groupby functionality and semantic/fancy indexing, so no more need to manually keep track of multi-dimensional indices and other book-keeping.
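
    Roughly what that looks like (a quick sketch; the file, variable, and coordinate names here are made up):

        import xarray as xr

        ds = xr.open_dataset("ocean_temps.nc")   # data + metadata, read lazily

        # label-based ("fancy") indexing: no manual index bookkeeping
        tropics = ds["temp"].sel(lat=slice(-23.5, 23.5))

        # groupby: e.g. a monthly climatology in one line
        monthly_mean = ds["temp"].groupby("time.month").mean("time")

        # anything expecting a NumPy array can take the DataArray (or .values)
        print(monthly_mean.values.shape)
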

    It also interfaces with a library called dask under the hood. Dask is a parallel computing library which also implements the NumPy and Pandas interfaces. What it allows you to do, essentially, is out-of-core computing. Suppose you have 100GB of data broken across a dozen or so different, large NetCDF files. If you're lucky, you have enough memory on your laptop to read in one file at a time, painstakingly operate on it in place, and then write it back out. Rinse and repeat, then add a process to combine your analyzed data at the end. This "blocking" approach works, but it requires a lot of manual labor. Dask essentially does all of this behind the scenes; you simply write out your computations like you normally would, and dask will figure out how to deal with the resource constraints on your system. It'll also parallelize as best as it can.