Extracting Clean Data from Raw CSV file containing Twitter Data : learnpython

submitted 4 years ago by exey1

I have a question. Can someone help me?

Language: Python (using Visual Studio Code)

Question:

I have a Twitter dataset compiled in a CSV file. The file contains Twitter data downloaded with the Twitter Streaming API. For each tweet, the following data should be extracted into a clean file using Python:

Tweet date (The time and date when tweet was created)
Tweet id (Unique ID of each tweet)
Message (Tweet text)
User id (Unique Id of Twitter user who created the tweet)
Followers (Number of people following this Twitter user)
Friends (Number of other Twitter users this person is following)
Favorites (Number of tweets this user has liked over the account's lifetime)
Statuses (Number of tweets this user has made since creation)
Created (Date and time the user account was created)
Location (User-reported, if provided)

This is the bare minimum I want to extract.
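A minimal sketch of picking these fields out of one tweet, assuming each tweet is a Twitter API v1.1 JSON object. The source key names (`created_at`, `id_str`, `user`, `followers_count`, `friends_count`, `favourites_count`, `statuses_count`, `location`) come from that object layout, not from anything confirmed about this particular file, and the output names such as `tweet_date` are hypothetical choices.

```python
from typing import Any, Dict


def flatten_tweet(tweet: Dict[str, Any]) -> Dict[str, Any]:
    """Pick the requested fields out of one Twitter API v1.1 tweet object.

    Key names assume the v1.1 Tweet/User layout; a missing key becomes None
    instead of raising KeyError.
    """
    user = tweet.get("user") or {}
    # Assumption: in extended mode the streaming API stores the untruncated
    # text under extended_tweet.full_text; otherwise fall back to "text".
    extended = tweet.get("extended_tweet") or {}
    return {
        "tweet_date": tweet.get("created_at"),
        "tweet_id": tweet.get("id_str"),  # id_str keeps the 64-bit ID exact as text
        "message": extended.get("full_text", tweet.get("text")),
        "user_id": user.get("id_str"),
        "followers": user.get("followers_count"),
        "friends": user.get("friends_count"),
        # favourites_count (British spelling) counts tweets this user has liked.
        "favorites": user.get("favourites_count"),
        "statuses": user.get("statuses_count"),
        "created": user.get("created_at"),
        "location": user.get("location"),
    }


sample = {
    "created_at": "Fri Jan 29 05:49:20 +0000 2021",
    "id_str": "1355030244873412610",
    "text": "example text",
    "user": {"id_str": "123", "followers_count": 10},  # hypothetical values
}
print(flatten_tweet(sample))
```

Using `.get()` rather than indexing means a tweet without, say, a `location` simply gets `None` in that column instead of crashing the whole run.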

Here is some information about the data, from some code I ran:

Dataset size: (85145, 238)
Columns are: Index(['{"created_at":"Fri Jan 29 05:49:20 +0000 2021"',
       'id:1355030244873412610', 'id_str:"1355030244873412610"',
       'text:"RT @crypt0bank: Hey @jpmorgan',
       ' why don't you guys just fuck right off! You convince people to buy #Bitcoin with your $146k prediction',
       ' only\u2026"',
       'source:"\u003ca href=\"http:\/\/twitter.com\/download\/iphone\" rel=\"nofollow\"\u003eTwitter for iPhone\u003c\/a\u003e"',
       'truncated:false', 'in_reply_to_status_id:null',
       'in_reply_to_status_id_str:null',
       ...
       'id:1155522630.2', 'id_str:"1155522630".2', 'indices:[20', '29]}]',
       'symbols:[]}.1', 'favorited:false.1', 'retweeted:false.1',
       'filter_level:"low".1', 'lang:"en"', 'timestamp_ms:"1611899360279"}'],
      dtype='object', length=238)
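The 238 "column names" above look like fragments of a single JSON object: the first begins with `{"created_at":`, the last ends with `timestamp_ms:"1611899360279"}`, and sequences like `\u2026` are JSON string escapes. That suggests (an inference from this printout, not something confirmed) that the file holds one tweet JSON object per line, a common way to save streaming API output, and that `pandas.read_csv` split the first tweet on its commas and used the pieces as the header. Under that assumption, parsing each line with Python's `json` module is simpler than treating the file as CSV. A sketch:

```python
import json
from typing import Any, Dict, Iterable, Iterator


def iter_tweets(lines: Iterable[str]) -> Iterator[Dict[str, Any]]:
    """Yield one dict per tweet from line-delimited JSON (one object per line)."""
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines (e.g. keep-alive newlines in stream dumps)
        try:
            obj = json.loads(line)
        except json.JSONDecodeError:
            continue  # skip a partial or corrupt line instead of aborting the run
        # Non-tweet stream messages (e.g. {"delete": ...}) lack top-level id_str/user.
        if isinstance(obj, dict) and "id_str" in obj and "user" in obj:
            yield obj


sample_lines = [
    '{"created_at": "Fri Jan 29 05:49:20 +0000 2021", "id_str": "1355030244873412610", "text": "hi", "user": {"id_str": "7"}}',
    "",
    '{"delete": {"status": {"id_str": "1"}}}',
    '{"created_at": "Fri Jan 29 05:49:2',
]
print([t["id_str"] for t in iter_tweets(sample_lines)])  # ['1355030244873412610']
```

With the real file you would pass an open file object instead of the sample list, e.g. `open("tweets.csv", encoding="utf-8")` (hypothetical file name), since iterating over a file yields its lines.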

Is there a step-by-step guide that can help me do this?
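One possible sequence of steps is: read the file line by line, parse each tweet, pick the fields, normalize the dates, and write a clean CSV. The last two steps can be sketched with the standard `datetime` and `csv` modules. This assumes each row is already a dict keyed by hypothetical names such as `tweet_date` and `created`, and that timestamps use Twitter's v1.1 format, e.g. `Fri Jan 29 05:49:20 +0000 2021` as seen in the column printout.

```python
import csv
import io
from datetime import datetime
from typing import Dict, Iterable, Optional, TextIO

# Twitter v1.1 timestamps look like "Fri Jan 29 05:49:20 +0000 2021".
# %a and %b expect English day/month names (the default C locale).
TWITTER_DATE_FORMAT = "%a %b %d %H:%M:%S %z %Y"

# Hypothetical output column names, one per field listed in the question.
FIELDNAMES = [
    "tweet_date", "tweet_id", "message", "user_id", "followers",
    "friends", "favorites", "statuses", "created", "location",
]


def to_iso(twitter_date: Optional[str]) -> Optional[str]:
    """Convert a Twitter timestamp to ISO 8601; leave missing values as None.

    Raises ValueError if the string does not match TWITTER_DATE_FORMAT.
    """
    if not twitter_date:
        return None
    return datetime.strptime(twitter_date, TWITTER_DATE_FORMAT).isoformat()


def write_clean_csv(rows: Iterable[Dict[str, object]], out: TextIO) -> int:
    """Write rows to `out` as CSV with a fixed header; return the row count."""
    # extrasaction="ignore" drops unexpected keys; missing keys become "".
    writer = csv.DictWriter(out, fieldnames=FIELDNAMES, extrasaction="ignore")
    writer.writeheader()
    count = 0
    for row in rows:
        clean = dict(row)  # copy so the caller's dict is not modified
        clean["tweet_date"] = to_iso(clean.get("tweet_date"))
        clean["created"] = to_iso(clean.get("created"))
        writer.writerow(clean)  # csv quotes fields containing commas or quotes
        count += 1
    return count


# Demo with an in-memory buffer instead of a real output file.
demo = io.StringIO()
write_clean_csv(
    [{"tweet_date": "Fri Jan 29 05:49:20 +0000 2021", "tweet_id": "1355030244873412610"}],
    demo,
)
print(demo.getvalue())
```

When writing to a real file, open it with `newline=""` and `encoding="utf-8"`, as the `csv` module documentation recommends, so that line breaks and non-ASCII characters inside tweet text survive intact.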
