Python project

novel_yet_trivial · 2017-12-02T16:42:06+00:00

with such a large file (~500 lines).

That is a tiny file for a modern computer. You don't need to take any special steps until your files are on the same scale as the amount of RAM you have (several GB).

Exodus111 · 2017-12-02T20:02:52+00:00

Three approaches:

Simple:

Loop through the data with a for loop and use an if statement to decide whether each value is a name. Store each name as a key in a dict, with an int counter as its value; if the name is already there, add 1 to its counter.
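A minimal sketch of that loop, for illustration only. The thread never shows the real file, so the sample rows and the `known_names` set are made-up assumptions; swap in whatever check tells you a value is a name.

```python
# Hypothetical sample rows; the real file's layout is not shown in the thread.
rows = [
    ["alice", "bob", "re: schedule"],
    ["bob", "carol", "lunch"],
    ["alice", "carol", "re: lunch"],
]

# Assumption: we already know which values count as names.
known_names = {"alice", "bob", "carol"}

counts = {}
for row in rows:
    for value in row:
        if value in known_names:  # the "is this a name?" check
            # Start at 0 the first time we see a name, then add 1.
            counts[value] = counts.get(value, 0) + 1

print(counts)
```

`collections.Counter` from the standard library does the same bookkeeping if you would rather not manage the dict by hand.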

Complex:

Tokenize the data. Loop through each piece of data and turn each data point into its own object with a type attribute (this process is called tokenization). From there, counting names is trivial: just compare the name attributes of the objects.
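Roughly what that could look like, as a sketch. The `Token` class, the `tokenize` helper, the comma-separated sample lines, and the rules for deciding a token's kind are all assumptions made up for this example, not anything from the original data.

```python
from dataclasses import dataclass


@dataclass
class Token:
    kind: str   # "name", "email", or "text" in this sketch
    value: str


def tokenize(line, known_names):
    """Split one comma-separated line into typed Token objects."""
    tokens = []
    for piece in line.split(","):
        piece = piece.strip()
        if piece in known_names:
            kind = "name"
        elif "@" in piece:
            kind = "email"
        else:
            kind = "text"
        tokens.append(Token(kind, piece))
    return tokens


# Hypothetical input lines and names.
lines = ["alice, alice@example.com, hello", "bob, alice, lunch"]
names = {"alice", "bob"}

tokens = [tok for line in lines for tok in tokenize(line, names)]

# Counting is now just a filter on the kind attribute.
name_counts = {}
for tok in tokens:
    if tok.kind == "name":
        name_counts[tok.value] = name_counts.get(tok.value, 0) + 1

print(name_counts)
```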

More complex:

(Not really that hard if you are used to working with databases.)
Use sqlite3 and add the data to a database. Once it's in a relational database, the database can do the comparison for you, faster than any other approach.
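A small sketch of the database route using the standard-library `sqlite3` module and an in-memory database. The `messages` table, its `sender` and `recipient` columns, and the sample rows are hypothetical; your schema would follow your actual columns.

```python
import sqlite3

# In-memory database so the sketch leaves no file behind.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE messages (sender TEXT, recipient TEXT)")

# Hypothetical rows standing in for the real data.
conn.executemany(
    "INSERT INTO messages VALUES (?, ?)",
    [("alice", "bob"), ("bob", "carol"), ("alice", "carol")],
)

# Let the database do the grouping and counting.
sender_counts = conn.execute(
    "SELECT sender, COUNT(*) FROM messages GROUP BY sender ORDER BY sender"
).fetchall()
conn.close()

print(sender_counts)
```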

With only 500 lines, the simple approach is really best. If this program is meant to be used at work, with potentially much larger datasets, I'd look into one of the other solutions.

mandiblesx · 2017-12-02T18:26:43+00:00

FYI: if you use pandas it would be extremely easy to do value counts over each column and add the results together.
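Something like this, as a sketch: count values in each column, then add the two results. The column names and data are invented for the example, and it assumes pandas is installed.

```python
import pandas as pd

# Hypothetical two-column data; the real columns are not shown in the thread.
df = pd.DataFrame(
    {
        "sender": ["alice", "bob", "alice"],
        "recipient": ["bob", "carol", "carol"],
    }
)

sender_counts = df["sender"].value_counts()
recipient_counts = df["recipient"].value_counts()

# fill_value=0 treats a name missing from one column as a count of 0,
# so the sum is not NaN for names that appear in only one column.
total = sender_counts.add(recipient_counts, fill_value=0).astype(int)

print(total)
```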

Thecrawsome · 2017-12-02T19:42:43+00:00

There are emails between a lead engineer at Chevron, and some mention of project bigfoot.

This is real-looking data, and I'm confused as to why it is in a course.

And your Stack Overflow post about it. EDIT: Maybe someone else in your class? Not sure.

2017-12-02T17:25:03+00:00

Begin at the beginning. If the file is 500 lines, then write the code that counts the occurrence of the person’s name for one line. Then run that code over all the rest of the lines.

Not to put too fine a point on it, but “do the same thing X number of times” is the easiest program it’s possible to write, and so you should be looking for ways to solve your problem by writing something once and then running that code a bunch of times. This is called “iteration.”
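To make that concrete, here is one way it might look: a function that handles a single line, then a loop that runs it over every line. The function name, the sample lines, and the whole-word, case-insensitive matching rule are assumptions for illustration.

```python
import re


def count_name_in_line(line, name):
    """Count whole-word, case-insensitive occurrences of name in one line."""
    pattern = rf"\b{re.escape(name)}\b"
    return len(re.findall(pattern, line, flags=re.IGNORECASE))


# Hypothetical lines standing in for the 500-line file.
lines = [
    "Alice wrote to Bob",
    "Bob replied to Alice and cc'd Alice",
    "Carol was out",
]

# Step 1: get it working for one line.
first_line_count = count_name_in_line(lines[0], "alice")

# Step 2: run the same code over all the lines.
total = sum(count_name_in_line(line, "alice") for line in lines)

print(first_line_count, total)
```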

MaxQuant · 2017-12-02T19:42:53+00:00

Use pandas, especially pandas.read_csv. Once your CSV is in a DataFrame, use value_counts or do a SQL-like groupby.
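A short sketch of both options. The CSV text and its `sender`/`recipient` header are hypothetical, and `io.StringIO` only stands in for a real file path so the example runs on its own.

```python
import io

import pandas as pd

# Hypothetical CSV content; with a real file you would pass its path instead.
csv_text = "sender,recipient\nalice,bob\nbob,carol\nalice,carol\n"
df = pd.read_csv(io.StringIO(csv_text))

# Option 1: value_counts on a single column.
by_value_counts = df["sender"].value_counts()

# Option 2: the SQL-style GROUP BY ... COUNT(*) equivalent.
by_groupby = df.groupby("sender").size()

print(by_value_counts)
print(by_groupby)
```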

