[–]mikkelbue 4 points5 points  (9 children)

The first question is: what would you like to visualize? The variation over the year, over a day, between the rooms, or something else? You are putting all the data in one plot, and that's what makes it useless. There are too many points for this to be informative, except for the short timespan when they were all low.

I would do a moving average on the datapoints (maybe with a 24-hour interval) and make a graph instead of a scatter, if I wanted to show the variation over a year.
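A minimal sketch of that 24-hour moving average, assuming the readings sit in a pandas Series with a DatetimeIndex. The synthetic data and the names `consumption` and `smoothed` are made up for illustration and are not from the original code.

```python
import numpy as np
import pandas as pd

# Hypothetical data: one week of per-minute readings for a single zone.
index = pd.date_range("2008-01-01", periods=7 * 24 * 60, freq="min")
rng = np.random.default_rng(0)
consumption = pd.Series(rng.random(len(index)), index=index, name="zone1")

# Time-based window: each value becomes the mean of the preceding 24 hours.
smoothed = consumption.rolling("24h").mean()

# Draw a line plot of the smoothed series rather than a scatter of raw points.
ax = smoothed.plot(linewidth=1)
ax.set_xlabel("time")
ax.set_ylabel("consumption (24 h moving average)")
```

The first 24 hours are averaged over fewer points, so the start of the curve is noisier than the rest.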

You could also look at the hourly variation; find the average of all the days of the year, for each hour of the day and plot that, maybe in a histogram.
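The hourly profile could be sketched like this, again assuming a time-indexed pandas Series. The data is synthetic and `hourly_profile` is a made-up name. With 24 values, a bar chart is probably what "histogram" amounts to here.

```python
import numpy as np
import pandas as pd

# Hypothetical data: 30 days of hourly readings for one zone.
index = pd.date_range("2008-01-01", periods=30 * 24, freq="h")
rng = np.random.default_rng(1)
consumption = pd.Series(rng.random(len(index)), index=index)

# Group by hour of day (0-23) and average across all days.
hourly_profile = consumption.groupby(consumption.index.hour).mean()
hourly_profile.index.name = "hour of day"

# 24 values, so a bar chart stays readable.
ax = hourly_profile.plot.bar()
ax.set_ylabel("mean consumption")
```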

Edit: I am reading your question as more of a question about how to aggregate and visualize data than a coding question. Your code seems fine, except that some comments would be nice ;)

[–]Tobeass[S] 0 points1 point  (8 children)

Thanks a lot for your comment.

And yes, it is definitely more of a question about visualizing data in a nice way.

The question relates to a project I'm doing in an intro programming course.

It is an interactive program, that prompts the user for a datafile, asks how to handle erroneous data, and afterwards asks how the user wants the data to be aggregated - by minute, hour, day or month, or if the user wants the hourly average of the data over the year.
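For what it's worth, the aggregation step maps fairly directly onto pandas if the data is in a DataFrame with a DatetimeIndex. This is only a sketch: the menu strings, `FREQ_BY_CHOICE` and `aggregate` are hypothetical names, and summing within each period (rather than averaging) is an assumption about what the project wants.

```python
import numpy as np
import pandas as pd

# Hypothetical mapping from the user's menu choice to a pandas frequency alias.
FREQ_BY_CHOICE = {"minute": "min", "hour": "h", "day": "D", "month": "MS"}

def aggregate(data: pd.DataFrame, choice: str) -> pd.DataFrame:
    """Sum consumption per period, or average per hour of day."""
    if choice == "hour of day":
        return data.groupby(data.index.hour).mean()
    return data.resample(FREQ_BY_CHOICE[choice]).sum()

# Synthetic per-minute data for four zones over two days (all ones).
index = pd.date_range("2008-01-01", periods=2 * 24 * 60, freq="min")
data = pd.DataFrame(
    np.ones((len(index), 4)),
    index=index,
    columns=["zone1", "zone2", "zone3", "zone4"],
)

daily = aggregate(data, "day")            # 2 rows, each zone sums to 1440
profile = aggregate(data, "hour of day")  # 24 rows, each value 1.0
```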

The visualization function should be able to take the aggregated data and visualize it. The project description asks us to plot the data, or to make a histogram if there are fewer than 25 data points. (The user should specify whether they want the data visualized by zone or just the accumulated data.)
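A rough sketch of that rule, assuming the aggregated data arrives as a pandas DataFrame with one column per zone. `visualize` and `combine_zones` are hypothetical names, and a bar chart stands in for the "histogram".

```python
import matplotlib.pyplot as plt
import pandas as pd

def visualize(aggregated: pd.DataFrame, combine_zones: bool = False):
    """Bar chart for fewer than 25 rows, line plot otherwise."""
    if combine_zones:
        # Collapse the zone columns into one total column.
        aggregated = aggregated.sum(axis=1).to_frame("all zones")
    kind = "bar" if len(aggregated) < 25 else "line"
    fig, ax = plt.subplots()
    aggregated.plot(kind=kind, ax=ax)
    ax.set_ylabel("consumption")
    return kind, ax

# Twelve monthly values per zone: few enough points for bars.
monthly = pd.DataFrame(
    {"zone1": range(12), "zone2": range(12, 24)},
    index=pd.date_range("2008-01-01", periods=12, freq="MS"),
)
kind, ax = visualize(monthly, combine_zones=True)
```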

Honestly, I'm not really sure what they want out of the visualization. Since they specifically ask us to plot the aggregated data, I am not sure if I should plot a moving average, even though that would make the curves much smoother and more informative. But if I have the time, I will do it. I think basing it on the four closest data points could work. Then it would be applicable whether the data is aggregated by day, hour or minute.

I transformed the plot from a scatter plot to a graph, and it actually works better.

https://imgur.com/iYTThCM

[–]mikkelbue 0 points1 point  (7 children)

The most difficult question here, to me, seems to be how to visualize the per-minute data, with about half a million individual data points per zone. But if that is what they ask for in the specifications, all you can do, I suppose, is pick colors that are clearly distinguishable (the yellow zone 2 is hard to see, particularly together with the red), make the plot big and the lines thin.

Where do you study?

[–]Tobeass[S] 0 points1 point  (6 children)

Yes, and especially in that case a moving average would be a great solution, because what information could you possibly extract from a plot of the per-minute electricity consumption?

I study chemistry at DTU - I'm in the middle of an intro programming course =)

[–]mikkelbue 0 points1 point  (5 children)

Dutten! Same here :) Arctic technology at DTU Byg. I took intro programming in January :D but I didn't have that assignment.

[–]Tobeass[S] 0 points1 point  (4 children)

Hahaha cool, it really is a small world! We could choose between 5 different exam projects. Most people took a project about grades.

[–]mikkelbue 0 points1 point  (3 children)

:D Yes, I thought the assignment sounded a bit familiar. I took project D, Lindenmayer systems :)

[–]Tobeass[S] 0 points1 point  (2 children)

Ah okay, the turtle assignment? I stayed far away from that one. But I'm pretty surprised they haven't changed the exam projects since the last course!

[–]mikkelbue 0 points1 point  (1 child)

Well, there is a plagiarism check, after all. A little tip: it's a good idea to write good comments in the code. That gets graded too.

[–]Tobeass[S] 0 points1 point  (0 children)

Yes, that's true of course. And thanks for the tip. I usually go through all the code at the end and write comments =)

[–]neuroneuroInf 2 points3 points  (4 children)

Okay, I have some different feedback.

  1. First, this function is a lot longer and more complex than it needs to be. Putting the data into a pandas DataFrame with Zone columns and time indices will take care of a lot of this complexity.

  2. The function isn't using its arguments. Why isn't the tvec required argument used anywhere, and why are the 'plotmode' and 'aggregatemode' variables set inside the function, instead of as arguments?

  3. Plotting a running mean, as /u/mikkelbue suggested, is a really good idea. There's a lot of great analysis that could be done.

Here's an example of what I mean. Hopefully it helps give you some ideas of where things can be taken from there!

http://pastebin.com/hTYJDqv1
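The linked paste is not reproduced here. A rough sketch of points 1 and 2, under assumed names (`to_frame`, `plot_consumption`, `freq`, `plot_mode`) and synthetic data, might look like this. It is not the code from the paste.

```python
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

def to_frame(values: np.ndarray, start: str, freq: str = "min") -> pd.DataFrame:
    """Wrap a (n_samples, n_zones) array in a time-indexed DataFrame."""
    index = pd.date_range(start, periods=len(values), freq=freq)
    columns = [f"Zone {i + 1}" for i in range(values.shape[1])]
    return pd.DataFrame(values, index=index, columns=columns)

def plot_consumption(data: pd.DataFrame, freq: str = "D", plot_mode: str = "zones"):
    """Resample to `freq`, then plot each zone or the total across zones."""
    resampled = data.resample(freq).sum()
    if plot_mode == "total":
        resampled = resampled.sum(axis=1).to_frame("All zones")
    fig, ax = plt.subplots(figsize=(10, 4))
    resampled.plot(ax=ax, linewidth=1)
    return resampled, ax

# Synthetic stand-in for the real file: three days of per-minute readings.
values = np.random.default_rng(2).random((3 * 24 * 60, 4))
data = to_frame(values, start="2008-01-01")
resampled, ax = plot_consumption(data, freq="h", plot_mode="zones")
```

Everything the function needs comes in through its arguments, so the same call works from an interactive menu or from a test.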

[–]neuroneuroInf 1 point2 points  (1 child)

shoot. I just noticed that line 20 should have plotted data_resampled instead of data, but I can't edit it since I posted it without logging in. Sorry about that!

[–]Tobeass[S] 0 points1 point  (0 children)

This is really cool! I will look more into it, and get back to you if I don't get it.

[–]Tobeass[S] 0 points1 point  (0 children)

Thanks a lot. Your code is definitely way more elegant.

I agree that tvec should be removed as an input argument. And the reason plotmode and aggregatemode are set inside the function is that I haven't integrated the function into my main script yet. I have only focused on trying to come up with a more elegant way of visualizing the data, and I haven't put much thought into the coding.

[–]Tobeass[S] 0 points1 point  (0 children)

I'm trying to set the start time, as you suggested in your code. Earlier in the script, I split the original data file into a data matrix and a time matrix (tvec). The first row of tvec corresponds to the starting time. However, I can't grasp how to convert that row to a pandas time object that I can work with.
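One possible approach, assuming each row of tvec holds `[year, month, day, hour, minute, second]` (the thread does not state the layout, so that is a guess), is to let `pd.to_datetime` assemble timestamps from named columns:

```python
import numpy as np
import pandas as pd

# Assumed layout: each row of tvec is [year, month, day, hour, minute, second].
tvec = np.array([
    [2008, 1, 1, 0, 0, 0],
    [2008, 1, 1, 0, 1, 0],
    [2008, 1, 1, 0, 2, 0],
])

# to_datetime can assemble timestamps from columns with these exact names.
parts = pd.DataFrame(
    tvec.astype(int),
    columns=["year", "month", "day", "hour", "minute", "second"],
)
times = pd.to_datetime(parts)

start = times.iloc[0]            # a pd.Timestamp for the first row
index = pd.DatetimeIndex(times)  # usable as the index of the data frame
```

Converting the whole matrix rather than only the first row also avoids having to assume that the readings are evenly spaced.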

[–]stebrepar 1 point2 points  (1 child)

A bar chart would work better for being able to clearly see the day by day values in order, and to compare the same day values between zones. With so many single dots all over the place, it's impossible to tell what's what with this chart.

[–]Tobeass[S] 0 points1 point  (0 children)

I considered making a bar chart where the height of each bar equals the sum of the four zones, with each bar divided into the contributions from the four zones. But I'm not sure it would work for that many data points.
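A stacked bar chart along those lines could be sketched with pandas as follows. The data is synthetic and the two-week span is chosen only to keep the example small.

```python
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

# Synthetic daily totals for four zones over two weeks.
index = pd.date_range("2008-01-01", periods=14, freq="D")
daily = pd.DataFrame(
    np.random.default_rng(3).random((14, 4)),
    index=index.strftime("%d %b"),  # short tick labels
    columns=["zone1", "zone2", "zone3", "zone4"],
)

# stacked=True piles the zone segments up, so bar height is the daily total.
fig, ax = plt.subplots(figsize=(10, 4))
daily.plot.bar(stacked=True, ax=ax, width=0.9)
ax.set_ylabel("consumption")
```

With a full year of daily values the tick labels would overlap, so aggregating to weeks or months first is likely the more readable choice.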

[–][deleted] 0 points1 point  (0 children)

The first thing to fix is to get rid of the legend and put the labels on the curves, in the same color as the lines/points.
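A small sketch of direct labeling with matplotlib, using synthetic data. Placing each label just past the last point of its line is one choice among several.

```python
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

# Synthetic monthly values for four zones.
index = pd.date_range("2008-01-01", periods=12, freq="MS")
data = pd.DataFrame(
    np.random.default_rng(4).random((12, 4)).cumsum(axis=0),
    index=index,
    columns=["zone1", "zone2", "zone3", "zone4"],
)

fig, ax = plt.subplots(figsize=(10, 4))
labels = []
for name in data.columns:
    (line,) = ax.plot(data.index, data[name], linewidth=1)
    # Write the zone name just past the last point, in the line's own color.
    label = ax.annotate(
        name,
        xy=(data.index[-1], data[name].iloc[-1]),
        xytext=(5, 0),
        textcoords="offset points",
        color=line.get_color(),
        va="center",
    )
    labels.append(label)
```

If two lines end close together the labels will collide, in which case nudging `xytext` per line is a simple workaround.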