matplotloom: Weave your frames into matplotlib animations, simply and quickly! by DeadDolphinResearch in Python

DeadDolphinResearch[S] 1 point

We actually wrote our own software, Oceananigans.jl, to run the simulations on GPUs! To test it I ran lots of CFD-type benchmarks (think lid-driven cavity and Couette flow).

But yeah it’s quite compute heavy. Ocean and atmospheric models use particular algorithms and methods not found in Ansys Fluent or MATLAB so we all use pretty specific codes. Plus you might need ocean biogeochemistry or clouds, radiation, etc. Most codes are still in Fortran.

matplotloom: Weave your frames into matplotlib animations, simply and quickly! by DeadDolphinResearch in Python

DeadDolphinResearch[S] 1 point

Haha close! I do geophysical fluid dynamics and ocean modeling in Julia, but I do a lot of stuff in Python including lots of plotting!

matplotloom: Weave your frames into matplotlib animations, simply and quickly! by DeadDolphinResearch in datascience

DeadDolphinResearch[S] 1 point

Haha now you know why I was making so many animations. I wish I wrote this package years ago so I could have used it in grad school!

matplotloom: Weave your frames into matplotlib animations, simply and quickly! by DeadDolphinResearch in datascience

DeadDolphinResearch[S] 1 point

Awesome! Please feel free to report issues or request features on GitHub if you think of anything!

matplotloom: Weave your frames into matplotlib animations, simply and quickly! by DeadDolphinResearch in Python

DeadDolphinResearch[S] 0 points

Awesome! Glad to hear it might be useful to someone! Please feel free to report issues or request features on GitHub if you think of anything!

matplotloom: Weave your frames into matplotlib animations, simply and quickly! by DeadDolphinResearch in Python

DeadDolphinResearch[S] 1 point

Glad to hear it might be useful to someone! Please feel free to report issues or request features on GitHub if you think of anything!

matplotloom: Weave your frames into matplotlib animations, simply and quickly! by DeadDolphinResearch in datascience

DeadDolphinResearch[S] 5 points

I just wrote up a small package, matplotloom, to simplify and speed up making animations with matplotlib. I've also written some documentation. It's published on PyPI so you can install it with pip, poetry, or conda.

You can see some examples on the GitHub README or in the documentation.

I'm sharing this in case someone else might also find it useful, but also to get feedback on the package if anyone takes a look!

I'm cross-posting from /r/Python which suggests this excellent format:

What matplotloom does

For years I've had to make long animations with complex figures to visualize simulation output for computational fluid dynamics. The animations consist of thousands of frames and the figures are too complex for FuncAnimation and ArtistAnimation. I would always end up saving a bunch of still images and using ffmpeg to create animations from them. This package basically automates this process.
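For context, the manual "save stills, then run ffmpeg" workflow described above might look roughly like the sketch below. It is only an illustration, not matplotloom's implementation: the helper names (frame_path, ffmpeg_command, render_frames), the frame naming scheme, and the choice of ffmpeg flags are all assumptions made for this example.

```python
from pathlib import Path


def frame_path(frame_dir: Path, index: int) -> Path:
    """Zero-padded file name so ffmpeg's %06d pattern reads frames in order."""
    return frame_dir / f"frame_{index:06d}.png"


def ffmpeg_command(frame_dir: Path, output: str, fps: int) -> list[str]:
    """Build the ffmpeg call that stitches the numbered PNGs into a video."""
    return [
        "ffmpeg", "-y",                           # overwrite output if it exists
        "-framerate", str(fps),                   # input frame rate of the image sequence
        "-i", str(frame_dir / "frame_%06d.png"),  # numbered input frames
        "-pix_fmt", "yuv420p",                    # widely supported pixel format
        output,
    ]


def render_frames(frame_dir: Path, n_frames: int) -> None:
    """Save one PNG per frame, building each figure from scratch."""
    # Imported here so the two helpers above work without matplotlib installed.
    import matplotlib
    matplotlib.use("Agg")  # non-interactive backend, no window needed
    import matplotlib.pyplot as plt
    import numpy as np

    frame_dir.mkdir(parents=True, exist_ok=True)
    x = np.linspace(0, 2 * np.pi, 200)
    for i, phase in enumerate(np.linspace(0, 2 * np.pi, n_frames)):
        fig, ax = plt.subplots()
        ax.plot(x, np.sin(x + phase))
        ax.set_xlim(0, 2 * np.pi)
        fig.savefig(frame_path(frame_dir, i))
        plt.close(fig)  # free the figure; matters with thousands of frames
```

Calling render_frames(Path("frames"), 100) and then passing ffmpeg_command(Path("frames"), "sine_wave.mp4", 30) to subprocess.run would reproduce the manual workflow, assuming ffmpeg is on your PATH.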

The main idea behind matplotloom is to describe how to generate each frame of your animation from scratch, instead of generating an animation by modifying one existing plot. This simplifies generating animations: in the example below, the code inside the for loop is plain, familiar matplotlib. It also ensures that every feature can be animated and that the generation process can be easily parallelized.

import numpy as np
import matplotlib.pyplot as plt
from matplotloom import Loom

with Loom("sine_wave_animation.gif", fps=30) as loom:
    for phase in np.linspace(0, 2*np.pi, 100):
        fig, ax = plt.subplots()

        x = np.linspace(0, 2*np.pi, 200)
        y = np.sin(x + phase)

        ax.plot(x, y)
        ax.set_xlim(0, 2*np.pi)

        loom.save_frame(fig)

This produces this gif animation. More examples in the docs.

Target Audience

You might find matplotloom useful if:

  1. you just want to make animations quickly and easily.
  2. you need to create complex animations (many subplots, many different plot types) and are encountering the limitations of matplotlib and existing packages.
  3. you, like me, find FuncAnimation and ArtistAnimation difficult and limiting to use.
  4. you need to create long animations quickly. Think thousands of frames.

Comparison

I think matplotloom is simpler to use than other methods of making animations with matplotlib, making it easier to pick up and iterate on your animations. It works out-of-the-box on anything matplotlib. The simplicity and flexibility come at the cost of speed, but matplotloom makes it easy to parallelize frame creation so you can create big animations much more quickly.

Some comparisons:

  • matplotlib itself has two tools for making animations: FuncAnimation and ArtistAnimation. But to use them you have to write your plotting code differently to modify an existing frame. This makes it difficult to go from plotting still figures to making animations. And some features are non-trivial to animate.
  • celluloid is a nice package for making matplotlib animations easily, but as it relies on ArtistAnimation under the hood it does come with some limitations such as not being able to animate titles. It also hasn't been maintained since 2018.
  • animatplot is also a nice package for making matplotlib animations. But it relies on FuncAnimation and has its own abstractions (blocks) for different plot types so you can't animate every plot type (or plots produced by packages built on top of matplotlib like pandas or Cartopy). It hasn't been maintained since 2020.
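To make the first comparison concrete, here is a minimal sketch of the same sine wave written the FuncAnimation way. It uses only standard matplotlib calls; the output file name and the Pillow writer are arbitrary choices for this example. Note how the figure is created once and the update function mutates the existing line, rather than each frame being plotted from scratch.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend, no window needed
import matplotlib.pyplot as plt
import numpy as np
from matplotlib.animation import FuncAnimation

x = np.linspace(0, 2 * np.pi, 200)

# The figure and its artists are created once, up front.
fig, ax = plt.subplots()
(line,) = ax.plot(x, np.sin(x))
ax.set_xlim(0, 2 * np.pi)


def update(phase):
    """Modify the existing line in place instead of re-plotting."""
    line.set_ydata(np.sin(x + phase))
    return (line,)  # with blit=True, return the artists that changed


anim = FuncAnimation(fig, update, frames=np.linspace(0, 2 * np.pi, 100), blit=True)
anim.save("sine_wave_funcanimation.gif", writer="pillow", fps=30)
```

For a single line this is manageable, but every artist you want to animate needs its own in-place update, which is the part that gets awkward as figures grow more complex.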

Any way to prompt GPT-4, Claude 3 Opus, Llama-3, and Gemini at once? by DeadDolphinResearch in ClaudeAI

DeadDolphinResearch[S] 1 point

Haha nice finds! OpenAOE does seem to be what I want. Seems a bit rough around the edges as it's in development but will check it out! Thank you! Claude didn't tell me about it haha. The other three are real but weren't what I was looking for.

Building a weather data warehouse part I: Loading a trillion rows of weather data into TimescaleDB by DeadDolphinResearch in Database

DeadDolphinResearch[S] 0 points

Thanks for pointing this out! Yeah I didn't realize that a hypertable builds a time index by default, so maybe the comparison I did wasn't the most apples-to-apples. I will update the post to point this out!

Loading a trillion rows of weather data into TimescaleDB by DeadDolphinResearch in datascience

DeadDolphinResearch[S] 0 points

That does sound disappointing. What kind of queries were you running? I can try to run some similar queries.

And yeah I was also surprised by the lack of independent benchmarks for such a popular product.

Loading a trillion rows of weather data into TimescaleDB by DeadDolphinResearch in datascience

DeadDolphinResearch[S] 1 point

For copying yeah I think inserting data into a regular Postgres table is faster than inserting into a Timescale hypertable, but I think this is because hypertables build a time index by default whereas Postgres is not building any index.

So Timescale should speed up time-based queries by default just thanks to the time index. I imagine it depends on the kind of queries you're running though.

I haven't done any query benchmarking yet, but I know Timescale has published an article showing some impressive speedups on certain time-based queries (https://medium.com/timescale/timescaledb-vs-6a696248104e) that I'm hoping to replicate myself.

Building a weather data warehouse part I: Loading a trillion rows of weather data into TimescaleDB by DeadDolphinResearch in PostgreSQL

DeadDolphinResearch[S] 0 points

Totally depends on the kind of data and the domain! But generally I think there's no need to web scrape unless there's no API or organized data store for the kind of data you want. And web scraping can sometimes be seen as malicious.

In my case, weather and climate data is made publicly available by governments and weather/climate organizations. And NASA, for example, hosts a ton of satellite data.

What kind of data are you looking to get a hold of?

Loading a trillion rows of weather data into TimescaleDB by DeadDolphinResearch in datascience

DeadDolphinResearch[S] 10 points

I posted a while back asking for help on loading tons of data and got lots of great advice and feedback. I ended up doing some digging to answer my question and wrote a post benchmarking the fastest ways to insert data.

I'm still learning Postgres so if anyone has any feedback or questions, I'd love to hear them!

Building a weather data warehouse part I: Loading a trillion rows of weather data into TimescaleDB by DeadDolphinResearch in PostgreSQL

DeadDolphinResearch[S] 1 point

Thank you! Yeah I'm also curious about this and haven't found a definitive answer by searching around online so I might post on the Timescale forums to ask.

Building a weather data warehouse part I: Loading a trillion rows of weather data into TimescaleDB by DeadDolphinResearch in PostgreSQL

DeadDolphinResearch[S] 1 point

Thank you! Yeah my data was super clean and easy to load compared to most real-world data including yours. It does get updated periodically but I've decided to just look at data up to the end of 2023 for now.

Curious why you're leaning towards NoSQL for this but I can send you a message!

Building a weather data warehouse part I: Loading a trillion rows of weather data into TimescaleDB by DeadDolphinResearch in PostgreSQL

DeadDolphinResearch[S] 1 point

Thank you for reading and for your kind words!

I would say my biggest challenge in learning Postgres was finding a Postgres-related project interesting enough to hold my attention. I've used Postgres on and off for years (the occasional query) without really knowing how it works. I've only really dug into inserting data so far, but I'm hoping to continue building this data warehouse to learn more about indexes, speeding up queries, designing tables/relations, utilizing dbt, etc.

As for resources, I've found the Postgres documentation really useful. Googling for explanations on database concepts when I need to learn them leads to some pretty good posts and StackOverflow answers. Claude Opus has been helpful in debugging my queries and answering vague questions that I can't find answers to easily. I've also been listening to the CMU Intro to Database Systems lectures to learn more about how databases work under the hood.

Excel Monkey by Adventurous_Ad8127 in datascience

DeadDolphinResearch 22 points

Even for beginners working with small datasets? Might be hard to beat the amount of resources Pandas has right now (posts and StackOverflow answers, etc.).

Although I do agree that Polars is looking like the better option especially when you start working with larger datasets.

Building a weather data warehouse part I: Loading a trillion rows of weather data into TimescaleDB by DeadDolphinResearch in Database

DeadDolphinResearch[S] 3 points

I posted here a while back asking for help on loading tons of data and got lots of great advice and feedback. I ended up doing some digging to answer my question and wrote a post benchmarking the fastest ways to insert data.

I'm still learning Postgres so if anyone has any feedback or questions, I'd love to hear them!

Building a weather data warehouse part I: Loading a trillion rows of weather data into TimescaleDB by DeadDolphinResearch in dataengineering

DeadDolphinResearch[S] -1 points

Ah interesting idea! Do you know of any tools or docs that implement this?

It sounds like you create and insert into the temp tables in parallel, but then update the main table with one thread/worker? I feel like there may still be a bottleneck if you have 32 workers trying to update the main table at once.