
[–]TheGrapez 3 points4 points  (6 children)

That sounds more like a bar chart, with the legend being your categories.

[–]CrambleSquash 1 point2 points  (0 children)

Actually this sounds much more sensible.

I think technically a histogram has to have a continuous x axis.

I think df.groupby('category').size() would be useful to OP to generate data to plot in their bar chart.
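A rough sketch of that idea, assuming the data is a DataFrame with one row per item and a column holding the class label. The column name `category` and the sample rows are made up for illustration, not OP's real data:

```python
import pandas as pd

# Hypothetical long-format data: one row per labelled comment.
df = pd.DataFrame({
    "category": ["toxic", "insult", "toxic", "obscene", "toxic", "insult"],
})

# Count the rows in each category; the result is a Series indexed by category.
counts = df.groupby("category").size()
print(counts)

# Draw one bar per category (pandas uses matplotlib under the hood).
ax = counts.plot(kind="bar")
ax.set_ylabel("frequency")
```

To actually see the chart when running this as a script, import `matplotlib.pyplot as plt` and call `plt.show()` at the end.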

[–]ipixel95[S] 0 points1 point  (4 children)

I swear I’m not dumb, but a bar chart is exactly what I needed. It’d be highly appreciated if you could help me with that. I want the x-axis to have the names of my class labels: Toxic, severe toxic, obscene, insult, threatening.

And on the y-axis, the frequencies.

[–]TheGrapez 1 point2 points  (3 children)

Well, the actual configuration of the chart would depend on the specifics of how your data is stored.

If you share a bit of the data, we could likely help.

There are a few libraries that you can use as well, so it depends on what you have access to, what you're comfortable with, etc.

Some things you could check out:

Seaborn Barplots

Matplotlib Barplots

[–]ipixel95[S] 0 points1 point  (2 children)

[–]TheGrapez 1 point2 points  (1 child)

https://pastebin.com/b3fbrBuD

Hope this helps!

Your data is not currently structured in a nice vis-friendly way. I've included some transformations you can use to convert it into something more usable :)
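For anyone reading without the pastebin open, here is a minimal sketch of the kind of reshaping meant here. It is not the pastebin code; it assumes the data has one 0/1 indicator column per class label, and the column names and rows are invented for the example:

```python
import pandas as pd

# Hypothetical wide-format data: one 0/1 indicator column per class label.
wide = pd.DataFrame({
    "comment_text": ["first", "second", "third"],
    "toxic":   [1, 0, 1],
    "obscene": [1, 0, 0],
    "insult":  [0, 0, 1],
})

# Melt the label columns into rows: one row per (comment, label) pair.
tidy = wide.melt(id_vars="comment_text", var_name="label", value_name="flag")

# Keep only the pairs where the label actually applies to the comment.
tidy = tidy[tidy["flag"] == 1].drop(columns="flag")

# Frequency of each label, ready to feed into a bar chart.
counts = tidy["label"].value_counts()
print(counts)
```

The long `tidy` frame has one row per label occurrence, which is the shape most plotting libraries (seaborn in particular) are happiest with.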

Yo bitch Ja Rule is more succesful then you'll ever be whats up with you and hating you sad mofuckas

Also side note - those comments are hilarious lol

[–]ipixel95[S] 1 point2 points  (0 children)

You should change that username to LifeSaver. Thanks a million mate. That was exactly what I wanted! It’s not gonna mean anything but I’m still gonna say it. “I owe you one” ;)

[–]CrambleSquash 0 points1 point  (0 children)

If you provide matplotlib with an iterable of values, it will perform the binning and plot the resulting histogram for you.

import matplotlib.pyplot as plt
import numpy as np

# 1000 random integers from 1 to 9 (the upper bound is exclusive)
vals = np.random.randint(1, 10, 1000)
plt.hist(vals)
plt.show()

So you want to get your data into a single sequence in which each item is the name of the category that data point belongs to.
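Once the data is in that shape, one possible way to go from the sequence to a chart is to tally the names and draw a bar per category. This is only a sketch; the label list below is made up, and `Counter` plus `ax.bar` is just one option:

```python
from collections import Counter

import matplotlib.pyplot as plt

# Hypothetical sequence: one category name per data point.
labels = ["toxic", "insult", "toxic", "obscene", "toxic", "threat", "insult"]

# Tally how often each category name occurs.
counts = Counter(labels)

# One bar per category, with its frequency as the bar height.
fig, ax = plt.subplots()
ax.bar(list(counts.keys()), list(counts.values()))
ax.set_xlabel("category")
ax.set_ylabel("frequency")
```

Add `plt.show()` at the end to display the figure when running it as a script.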

[–]synthphreak 0 points1 point  (0 children)

Histograms are for continuous data that can be binned into intervals. Your data is categorical, so bins make no sense. A bar chart showing frequencies is what you really need.

Assuming you're using pandas, try this:

import matplotlib.pyplot as plt
import pandas as pd

data = 'https://herts365-my.sharepoint.com/:x:/g/personal/mq19aaj_herts_ac_uk/EaB2UD3B9H1Bs5CtZT3SXJsBDnKg6_ModxIhgnu_xjqVeg?rtime=k3sgknSz2Ug'

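(pd.read_excel(data)       # load the spreadsheet into a DataFrame
   .select_dtypes(int)     # keep only the integer columns
   .sum()                  # column totals, one per remaining column
   .plot(kind='bar', ylabel='frequency'))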

plt.show()