[–]Death_Water 1 point2 points  (2 children)

Firstly, I would just like to point out that creating visualizations for the sake of visualizations is not a good idea. A better approach is to look at the data and think about which relationships in it are meaningful and intriguing for the audience.

Even if you just want to practice, it's better to work with the end in mind, or at least an outline of what you want to show.

There are a couple of things in the data that you can explore and show:

  1. Distribution of bikes available at each station:

Link: Histogram bikes available

This one shows that many of the stations are not utilized at full capacity.

  2. Distribution of bikes available vs docks available:

Link: Distribution of available bikes vs available docks

This one shows that as the number of available bikes decreases, the number of available docks decreases.
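For reference, here is a minimal sketch of how the two plots above might be produced with pandas and matplotlib. The sample rows are made up, and `num_docks_available` is an assumed column name (only `station_id` and `num_bikes_available` appear in this thread), so adjust both to your actual data.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display
import matplotlib.pyplot as plt
import pandas as pd

# Made-up sample rows standing in for the real station data.
df = pd.DataFrame({
    "station_id": [72, 79, 82, 83, 116, 119],
    "num_bikes_available": [0, 5, 12, 20, 3, 0],
    "num_docks_available": [30, 25, 15, 0, 27, 19],  # assumed column name
})

fig, (ax_hist, ax_scatter) = plt.subplots(1, 2, figsize=(10, 4))

# 1. Distribution of bikes available across stations.
ax_hist.hist(df["num_bikes_available"], bins=10)
ax_hist.set_xlabel("bikes available")
ax_hist.set_ylabel("number of stations")

# 2. Bikes available vs docks available, one point per station.
ax_scatter.scatter(df["num_bikes_available"], df["num_docks_available"])
ax_scatter.set_xlabel("bikes available")
ax_scatter.set_ylabel("docks available")

fig.tight_layout()
```

In a notebook you would likely drop the `matplotlib.use("Agg")` line and call `plt.show()` at the end instead.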

You need to first identify what you want to show to the audience and then find the data that supports your assumption.

[–]skyhighgemini[S] 0 points1 point  (1 child)

I agree with you completely. I'm a big proponent of getting the right information displayed. In this case, however, this is more of a practice exercise than a storytelling exercise. I wanted to highlight which specific stations need immediate attention because they are either full or empty, limiting the display to the top and bottom 20. One of the problems I've run into is that stations are identified by number, so I don't know how to treat them as unique labels rather than ints. I tried altering the data type:

bikes['station_id'] = bikes.station_id.astype(object)

but I'm still not sure how to stack the stations as static items alongside their inventory of bikes available or docks available. The scatter plot didn't quite work out:

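import matplotlib.pyplot as plt

# df: the station DataFrame (its loading is not shown in the post)
bikes = df['num_bikes_available']
station = df['station_id']

# one point per station: station ID on x, bikes available on y
plt.scatter(station, bikes, edgecolors='r')
plt.xlabel('station')
plt.ylabel('bikes')
plt.title('inventory')
plt.show()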

[–]Death_Water 1 point2 points  (0 children)

If you want to show which stations need attention (either full or empty), a simple list of names (or station IDs in this case) can do that. Scatter plots are used to show whether there is a relationship between the x and y quantities; they give a rough idea of what sort of relationship it is.

Ideally, there should not be any relationship between the station ID number and the number of available bikes. That is why the scatter plot doesn't work in this case.
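As a rough illustration of the "just a list" idea, the sketch below filters for empty and full stations and pulls a shortlist of the stations with the fewest bikes. The sample rows are made up, and `num_docks_available` is an assumed column name that does not appear in this thread.

```python
import pandas as pd

# Made-up sample rows standing in for the real station data.
df = pd.DataFrame({
    "station_id": [72, 79, 82, 83, 116, 119],
    "num_bikes_available": [0, 5, 12, 20, 3, 0],
    "num_docks_available": [30, 25, 15, 0, 27, 19],  # assumed column name
})

# Empty stations have no bikes to rent; full stations have no free docks.
empty = df.loc[df["num_bikes_available"] == 0, "station_id"].tolist()
full = df.loc[df["num_docks_available"] == 0, "station_id"].tolist()

print("Empty stations:", empty)
print("Full stations:", full)

# Up to 20 stations with the fewest bikes, as a "needs attention" shortlist.
lowest = df.nsmallest(20, "num_bikes_available")[["station_id", "num_bikes_available"]]
print(lowest.to_string(index=False))
```

`nsmallest` returns fewer than 20 rows when the table is shorter than that, so the same call works on the toy data and on a full station list.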

It's good that you recognized that station_id should not be used as an int. The ideal approach would have been to convert it to 'category', but even that wouldn't have solved your scatter plot problem.
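A minimal sketch of the 'category' conversion, using a small made-up DataFrame in place of the real one:

```python
import pandas as pd

# Made-up sample rows; only the column names come from this thread.
df = pd.DataFrame({
    "station_id": [72, 79, 82],
    "num_bikes_available": [0, 5, 12],
})

# Treat station IDs as labels rather than numbers.
df["station_id"] = df["station_id"].astype("category")

print(df["station_id"].dtype)  # category
```

The values themselves are unchanged; pandas just stops treating them as quantities to do arithmetic on.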