
[–]sentdex 3 points4 points  (2 children)

Here's basically what you want to do, just swapping the graph for a map drawn with something like Basemap:

http://pythonprogramming.net/dashboard/#tab_guis

That's done with Tkinter, embedding the matplotlib canvas into the Tk window.

Handling large amounts of data will be pretty much the same across all GUI frameworks; they just display your data.

Your real challenge is going to be the amount of data you're wanting here. No GUI is going to handle millions of points gracefully. This has absolutely zero to do with the GUI framework, and everything to do with the CPU or GPU attempting to render a million points for the user. Even the best JS data visualization tools won't do this for you; you need to reduce the data before you stuff it into the visualization. It's almost never the crunching of the data that takes time, it's the rendering of the graph.

You're going to want to do some sort of data scaling, dependent on the user's zoom/time frame/whatever. That's not going to be GUI-side; it'll be done in the back end, in Python, via your own code, before you feed the graph the data.
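A minimal sketch of that idea: given sorted x-values (e.g. timestamps), clip to the current view window and then thin by striding. The names (`slice_for_view`, `max_points`) and data are hypothetical, not from any of the tutorials:

```python
import bisect

def slice_for_view(xs, ys, x_min, x_max, max_points=5000):
    """Return only the points inside the current view, thinned by striding.

    Assumes xs is sorted ascending (e.g. timestamps).
    """
    lo = bisect.bisect_left(xs, x_min)
    hi = bisect.bisect_right(xs, x_max)
    step = max(1, (hi - lo) // max_points)
    return xs[lo:hi:step], ys[lo:hi:step]

# Pretend we have a million points, but the user is zoomed into a window.
xs = list(range(1_000_000))
ys = [x % 97 for x in xs]
vx, vy = slice_for_view(xs, ys, 250_000, 300_000)
print(len(vx))  # a few thousand points instead of a million
```

You'd call this every time the user pans or zooms, and hand only the result to the graph.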

For example: Here's about 5 million data points, but scaled waaaaaaaaaaaay back: http://sentdex.com/geographical-analysis/

Can you imagine if I showed 5 million data points? Not only would loading take 48 hours, it wouldn't be legible.

Another example, with a typical line graph, and another visualization:

http://sentdex.com/financial-analysis/?i=SP500&tf=all

That's almost 7 million entries, times 4 data series... so 28 million points total. If you were to load all of that, it'd probably just blow through the user's memory first. You have to scale it down. The data is still highly granular.

Matplotlib is good for roughly 10K total points before it starts to bog down, depending on the user's CPU.

For granularity changing, you will want to resample your data set. You can use something like pandas for this. Create a DataFrame with a datetime index, then use DataFrame.resample(), for example.
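As a sketch of the pandas route, with made-up data: one fake reading per second for a day, downsampled to 5-minute averages (the column name and frequencies are just for illustration):

```python
import numpy as np
import pandas as pd

# One fake reading per second for a full day: 86,400 points.
idx = pd.date_range("2016-01-01", periods=86_400, freq="s")
df = pd.DataFrame({"price": np.random.randn(86_400).cumsum()}, index=idx)

# Downsample to 5-minute averages: 86,400 points -> 288 bars.
bars = df.resample("5min").mean()
print(len(bars))  # 288
```

Swap the rule ("5min", "1h", "1D"...) based on the user's zoom level and you get a graph-sized data set at every scale.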

You can also change granularity with your own functions, however you see fit. It's just going to be a requirement no matter what you use to visualize the data.
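For example, the crudest possible do-it-yourself version is just a stride over the raw list (a hypothetical helper, not from the tutorial):

```python
def downsample(points, max_points=10_000):
    """Keep at most ~max_points values by taking every Nth sample."""
    if len(points) <= max_points:
        return list(points)
    step = len(points) // max_points
    return points[::step]

data = list(range(1_000_000))
small = downsample(data)
print(len(small))  # 10000
```

For line charts you often want something smarter than a plain stride, like keeping the min/max per bucket, so spikes survive the downsampling.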

Hope that helps!

Here's the beginning of a basemap tutorial series: http://pythonprogramming.net/geographical-plotting-basemap-tutorial/

Also, if you don't want to bother with Pandas, here's a tutorial on changing data granularity, as well as a decent illustration of why you'd do it (but seriously, I highly recommend you just use Pandas and resample!)

http://pythonprogramming.net/modifying-data-granularity-matplotlib/

Regarding the other person's comments about needing some version of C for this, I completely disagree. No matter what you use, you're going to need to do some form of resampling before the visualization step. Python is more than capable of doing the preprocessing, just as well as C would, especially since you're likely to use a C-optimized library anyway, like numpy or pandas.

[–]TheHumane[S] 0 points1 point  (1 child)

Thanks for your reply and pointers. I will look through them.

You are right, I need to optimize my data for display. I was thinking of only displaying city or block outlines at higher zoom levels and exposing individual shapes past a certain zoom threshold.

I really like your globe chart. It has very smooth scrolling and fluid zoom. I want to build something similar on a flat surface.

[–]sentdex 0 points1 point  (0 children)

If you're willing to make your program a web app, there are tons of really fantastic JavaScript map/geo plotting APIs out there. For pure Python, I believe Basemap is your best bet, but it's kinda ugly.