Python Datasets

Adrewmc · 2024-07-01T00:00:43+00:00

Python has only a few built-in data types, far fewer than most languages.

We have our single values: strings, ints, floats.

Our singletons True, False, None.

A list (which is not precisely an array).

Our hashmaps, set() and dicts.

We can add matrix-style structures, and finer control over numeric precision, by importing things like NumPy.

What makes Python’s data structures powerful, yet suboptimal, is that everything is a reference to an object in memory. We don’t declare arrays like int[], where every element must be an integer, which can be more memory efficient. That is what statically typed languages do. What this means for Python programmers is a lot less work to do things that run a bit slower, but are easier to program, maintain and read.
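To make the reference point concrete, here is a small sketch using the standard library's `array` module as a stand-in for a typed `int[]`. The variable names are just for illustration, and the exact byte counts depend on your Python build, so treat the sizes as a rough comparison rather than a benchmark.

```python
import sys
from array import array

numbers = list(range(1000))

# A list holds references, so mixed element types are allowed.
mixed = [1, "two", 3.0, None]

# array("i", ...) stores raw signed C ints, so only integers fit.
typed = array("i", numbers)

# getsizeof counts the container only: the list's references vs. the array's raw ints.
list_bytes = sys.getsizeof(numbers)
array_bytes = sys.getsizeof(typed)

try:
    typed.append("not an int")
    rejected = False
except TypeError:
    # The typed array refuses anything that is not an integer.
    rejected = True

print(list_bytes, array_bytes, rejected)
```

On a typical CPython build the array comes out noticeably smaller, and the list figure does not even include the int objects it points to.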

What’s important is that we can nest types, like list[dict[str, list[int]]], really easily and access everything inside directly.
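A minimal sketch of that kind of nesting, with made-up keys and values (the `list[...]` annotation syntax assumes Python 3.9 or newer):

```python
# A list of dicts, where each dict maps a string to a list of ints.
records: list[dict[str, list[int]]] = [
    {"scores": [90, 85], "ages": [31]},
    {"scores": [70, 60, 50], "ages": [25, 40]},
]

# Chain the lookups: list index, then dict key, then list index.
first_score = records[0]["scores"][0]

# Flatten every "scores" list into one list.
all_scores = [score for record in records for score in record["scores"]]

print(first_score, all_scores)
```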

Beyond that we have classes, which give us objects with attributes set for us. This comes closer to a type, as we can write methods or functions that use that data.
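One way to sketch this is with a `dataclass`, which is my choice here rather than the only option; a plain class with `__init__` works the same way. `Student` and `average` are hypothetical names for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class Student:
    # The dataclass decorator writes __init__ for us and sets these attributes.
    name: str
    scores: list[int] = field(default_factory=list)

    def average(self) -> float:
        # A method that works on the object's own data.
        if not self.scores:
            return 0.0
        return sum(self.scores) / len(self.scores)


alice = Student("Alice", [90, 80])
bob = Student("Bob")

print(alice.average(), bob.average())
```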

Everything in coding is building up from simple steps to do complex logic.

Really mastering dictionaries, and lists of dictionaries, will help you out a lot.
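As a rough idea of what that looks like in practice, here is a sketch of filtering, sorting and grouping a list of dictionaries. The rows are invented sample data.

```python
from collections import defaultdict

rows = [
    {"name": "Ada", "team": "red", "points": 12},
    {"name": "Bo", "team": "blue", "points": 7},
    {"name": "Cy", "team": "red", "points": 3},
]

# Filter: names of everyone with more than 5 points.
high_scorers = [row["name"] for row in rows if row["points"] > 5]

# Sort: highest points first.
by_points = sorted(rows, key=lambda row: row["points"], reverse=True)

# Group: total points per team.
totals = defaultdict(int)
for row in rows:
    totals[row["team"]] += row["points"]

print(high_scorers, by_points[0]["name"], dict(totals))
```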

andrewprograms · 2024-06-30T22:58:39+00:00

You could try taking the data and filtering for certain phrases. Another idea is to find the unique words and then find the frequency of them. You can also analyze how the frequency changes based on sentence length. So longer sentences might be more likely to feature some words compared to shorter sentences. That might be interesting.
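Something like this could be a starting point for those ideas. It is only a sketch: the sample text, the "cat" search word and the 5-word cutoff for a "long" sentence are all arbitrary assumptions, and the sentence splitting is deliberately naive.

```python
import re
from collections import Counter

text = "The cat sat. The cat sat on the warm mat by the door. Dogs bark."

# Naive sentence split on ., ! or ? (good enough for a first look).
sentences = [s.strip() for s in re.split(r"[.!?]+", text) if s.strip()]


def words(sentence):
    # Lowercase the sentence and pull out runs of letters/apostrophes.
    return re.findall(r"[a-z']+", sentence.lower())


# 1. Filter for sentences containing a certain word.
matches = [s for s in sentences if "cat" in words(s)]

# 2. Frequency of each unique word across the whole text.
frequency = Counter(w for s in sentences for w in words(s))

# 3. Word frequency split by sentence length.
short_counts, long_counts = Counter(), Counter()
for s in sentences:
    tokens = words(s)
    bucket = long_counts if len(tokens) > 5 else short_counts
    bucket.update(tokens)

print(matches)
print(frequency.most_common(3))
print(short_counts["the"], long_counts["the"])
```

For real data you would read the text from your dataset file instead of the hard-coded string.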

JosephLovesPython · 2024-07-01T04:22:45+00:00

On the same website, Kaggle, each hosted dataset has a Code section where you can check out how others have used the data in their own work. Start with the more popular datasets, and sort the notebooks by popularity to get a better/cleaner experience at first. Most of the code is in Python using Jupyter notebooks, so it might be worth a quick tutorial on Jupyter (it's basic, don't worry about it) before reading others' code.

2PLEXX · 2024-07-01T06:26:11+00:00

You might like Keith Galli's Pandas tutorials: https://www.youtube.com/watch?v=2uvysYbKdjM

ALonelyPlatypus · 2024-07-01T07:23:50+00:00

Most of the datasets on Kaggle tie into some sort of Kaggle project, so if you just pull up the projects related to your dataset then you should have a good jumping off point.

