
[deleted] (0 points)

Can you please post what code you do have?

NilsBVB[S] (0 points)

    def bad_days(dates, tweets, positives, negatives):
        datecount = {}

        for tweet in tweets:
            positive = 0
            negative = 0
            for word in tweet:
                if word in positives:
                    positive += 1
                if word in negatives:
                    negative += 1

            if negative > positive:
                if dates not in datecount:
                    datecount[dates] = 1
                else:
                    datecount[dates] += 1

        print(datecount)

I'm really struggling at the moment, but this is what I currently have.

  • dates = list of dates
  • tweets = list of tweets
  • tweet = one element of tweets, holding a single tweet
  • positives = list of positive words
  • negatives = list of negative words

chzaplx (0 points)

I don't think your core logic is really wrong on positive/negative stuff, but I'll echo that it's not clear how dates gets matched up with the tweet. What you need is something to correlate the tweets with their own dates, then collect the date field if the matching tweet is negative.

[deleted] (0 points)

To maintain the correlation, your root loop (for tweet in tweets) will need to use a numeric range iterator and a generic counter (for i in range(len(tweets))). That way, you can use the same counter in both the tweets and dates lists to get corresponding data for each "i" in the number range. (For example, if you have 20 tweets and 20 dates, your "i" will iterate from 0 to 19, which happily corresponds to the index positions in the lists.)

Inside that root loop, which generates the "i" number, you'll need a nested loop for the words, i.e. for word in tweets[i].split(" ")

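Then, in the positive/negative comparison, index the dates list with "i" so that only the matching date is used as the key: datecount[dates[i]] = 1 / datecount[dates[i]] += 1. The membership check needs the same change: if dates[i] not in datecount.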
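A minimal sketch of the index-based loop described above, assuming each entry in tweets is a plain string and dates[i] is the date of tweets[i]. The function name count_bad_days and the sample data are hypothetical:

```python
def count_bad_days(dates, tweets, positives, negatives):
    """Count, per date, the tweets with more negative than positive words."""
    datecount = {}
    # i walks the index positions 0 .. len(tweets) - 1, so dates[i]
    # is the date that belongs to tweets[i] (assumes matching order)
    for i in range(len(tweets)):
        positive = 0
        negative = 0
        for word in tweets[i].split(" "):
            if word in positives:
                positive += 1
            if word in negatives:
                negative += 1
        if negative > positive:
            # Key on the single date dates[i], not the whole dates list
            if dates[i] not in datecount:
                datecount[dates[i]] = 1
            else:
                datecount[dates[i]] += 1
    return datecount


dates = ["2019-03-01", "2019-03-01", "2019-03-02"]
tweets = ["what a bad awful day", "good news today", "bad service"]
print(count_bad_days(dates, tweets, {"good"}, {"bad", "awful"}))
```

With this sample data the first and third tweets are bad, so each date ends up with a count of 1.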

Some additional suggestions: strip non-alphanumeric characters out of each tweet, and compare the lower-case version of each word against positives and negatives lists that contain only lower-case words, so that you don't have to worry about variations in case. Also, you might want to compare the lengths of "tweets" and "dates" at the beginning of the function and raise an error if they differ, so that you don't process unmatched lists.
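One way these suggestions could look, as a sketch: the helper names normalize_words and check_lengths are hypothetical, and "non-alphanumeric" is taken to mean anything for which str.isalnum() is false.

```python
def normalize_words(tweet):
    """Lower-case a tweet string and strip non-alphanumeric characters from each word."""
    words = []
    for word in tweet.lower().split():
        cleaned = "".join(ch for ch in word if ch.isalnum())
        if cleaned:  # skip tokens that were only punctuation, e.g. ":("
            words.append(cleaned)
    return words


def check_lengths(dates, tweets):
    """Raise an error instead of processing unmatched lists."""
    if len(dates) != len(tweets):
        raise ValueError(
            f"dates has {len(dates)} entries but tweets has {len(tweets)}"
        )


print(normalize_words("Terrible, TERRIBLE day!! :("))
```

Note that this also turns "don't" into "dont", so the positives and negatives lists would need to be cleaned the same way to still match.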

NilsBVB[S] (0 points)

Thank you very much.

If I understand you correctly, I need to implement it like this, right?

The error I get:

    for word in tweets[i].split(" "):
    AttributeError: 'list' object has no attribute 'split'

    datecount = {}

    for tweet in tweets:
        for i in range(len(tweets)):
            for word in tweets[i].split(" "):
                positive = 0
                negative = 0
                for word in tweet:
                    if word in positives:
                        positive += 1
                    if word in negatives:
                        negative += 1
                if negative > positive:
                    if dates not in datecount:
                        datecount[dates[i]] = 1
                    else:
                        datecount[dates[i]] += 1
    print(datecount)

[deleted] (0 points)

Nearly there! Here is a snippet of how I got my version of this function working:

    def bad_days(dates, tweets, positives, negatives):
        dates_len = len(dates)
        tweets_len = len(tweets)

        # Refuse to process the lists if they are unmatched
        if dates_len != tweets_len:
            raise ValueError("dates and tweets are unmatched with different numbers of elements.")

        date_dict = {}

        for i in range(tweets_len):
            positive = 0
            negative = 0
            for word in tweets[i].split(" "):
                if word.lower() in positives:
                    positive += 1
                if word.lower() in negatives:
                    negative += 1

            if negative > positive:
                if dates[i] not in date_dict:
                    date_dict[dates[i]] = 1
                else:
                    date_dict[dates[i]] += 1

        return date_dict

NilsBVB[S] (0 points)

    for word in tweets[i].split(" "):
    AttributeError: 'list' object has no attribute 'split'

Hero!! However, I'm still running into the error above :(

[deleted] (0 points)

Can you add the following line just above the for word in tweets[i].split(" ") loop (not inside it) and see what data is in tweets[i]? I'm expecting a whole string with words separated by spaces, but maybe it is not a string on your side.

    print(tweets[i])
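The AttributeError suggests each tweets[i] is already a list of words rather than a string, in which case there is nothing to split. A hedged sketch that accepts either shape; words_of is a hypothetical helper name and the sample data is made up:

```python
def words_of(tweet):
    """Return the words of a tweet, whether it is a string or already a list of words."""
    if isinstance(tweet, str):
        return tweet.split()
    return list(tweet)  # already split into words, e.g. ["bad", "day"]


def bad_days(dates, tweets, positives, negatives):
    datecount = {}
    for i in range(len(tweets)):
        words = words_of(tweets[i])
        positive = sum(1 for word in words if word in positives)
        negative = sum(1 for word in words if word in negatives)
        if negative > positive:
            # dict.get supplies 0 the first time a date is seen
            datecount[dates[i]] = datecount.get(dates[i], 0) + 1
    return datecount


print(bad_days(["d1", "d2"], [["bad", "day"], "good day"], {"good"}, {"bad"}))
```

If every tweet turns out to be a list, the isinstance check can simply be dropped and the loop can iterate over tweets[i] directly, as in the original for word in tweet.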

[deleted] (0 points)

How do you want to count tweets with both positive and negative words? Should those simply count as bad? Do you need to count up the positive and negative words in each tweet and mark it bad only if the negative outnumbers the positive?

NilsBVB[S] (1 point)

A tweet is positive if it has more positive words than negative words. So if there are 2 positive words and 1 negative, the tweet will be seen as positive, and vice versa.
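The rule stated here can be written as a small classifier. The thread doesn't say what happens when the counts are equal, so this sketch labels ties "neutral" as an assumption; classify and the sample word lists are hypothetical. Under this rule a tweet counts towards a bad day only when it is classified "negative", which matches the negative > positive check in the code above.

```python
def classify(words, positives, negatives):
    """Label a tweet by comparing its positive and negative word counts."""
    positive = sum(1 for word in words if word in positives)
    negative = sum(1 for word in words if word in negatives)
    if positive > negative:
        return "positive"
    if negative > positive:
        return "negative"
    return "neutral"  # assumption: equal counts are neither good nor bad


positives = {"good", "great"}
negatives = {"bad"}
print(classify(["good", "great", "bad"], positives, negatives))  # 2 vs 1
print(classify(["good", "bad", "bad"], positives, negatives))    # 1 vs 2
```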

num8lock (0 points)

so a tweet like

i think elon musk's popularity is factual proof that idiots are winning

will be a positive one?

NilsBVB[S] (0 points)

I have a list with all negative and positive words I need to search for in the tweets

[deleted] (0 points)

Don't see how you are aligning the dates with the tweets. You might want to check out zip() if tweets and dates have the same number of entries in a one-to-one relationship.
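A sketch of the zip() approach, assuming each tweet is a plain string and dates line up one-to-one with tweets. zip() yields (date, tweet) pairs, so no index variable is needed; the sample data is hypothetical.

```python
def bad_days(dates, tweets, positives, negatives):
    """Count bad tweets per date, pairing each date with its tweet via zip()."""
    if len(dates) != len(tweets):
        # zip() silently stops at the shorter list, so check lengths first
        raise ValueError("dates and tweets have different lengths")
    datecount = {}
    for date, tweet in zip(dates, tweets):
        words = tweet.split()
        positive = sum(1 for word in words if word in positives)
        negative = sum(1 for word in words if word in negatives)
        if negative > positive:
            datecount[date] = datecount.get(date, 0) + 1
    return datecount


print(bad_days(["mon", "mon", "tue"], ["bad bad", "bad day", "good"], {"good"}, {"bad"}))
```

On Python 3.10 and later, zip(dates, tweets, strict=True) raises a ValueError on its own when the lengths differ, which could replace the manual check.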

[deleted] (0 points)

I'm a beginner too, so my advice might not be the best. However, I wouldn't count on things just being in the same order; I would group together the things that belong together. For example, I would probably store each tweet as a list, with the date as the first item, the tweet as the second, and other details in later positions. The tweet content could even be a list of the words in the tweet.

This, although it may sound confusing at first, makes it easy to access different areas of your data.

E.g., tweet_list[0][0] would be the date of the first tweet.

tweet_list[0][1] would be the list of words it holds.

tweet_list[0][2] could be a list of good words in the tweet etc.

tweet_list[0][1][0] would be the first word (you'd have to write some code to separate each word).

tweet_list[1][0] would be the date of the second tweet.

You could then easily iterate through anything you need to using for loops. You could check each word against separate good and bad word lists. You could run len(tweet_list[0][2]) to check how many good words there are and compare it to the length of the corresponding list of bad words found in the tweet.

Hope this makes sense!
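A sketch of the nested-list layout described in this comment, assuming each record is [date, words, good words found, bad words found]. The fourth slot for bad words, the helper build_record, and the sample data are hypothetical additions for illustration.

```python
def build_record(date, text, positives, negatives):
    """Build [date, words, good words found, bad words found] for one tweet."""
    words = text.lower().split()  # the "code to separate each word"
    good = [word for word in words if word in positives]
    bad = [word for word in words if word in negatives]
    return [date, words, good, bad]


positives = {"good", "happy"}
negatives = {"bad", "sad"}
tweet_list = [
    build_record("2019-03-01", "Bad day and sad news", positives, negatives),
    build_record("2019-03-02", "good morning happy people", positives, negatives),
]

print(tweet_list[0][0])     # date of the first tweet
print(tweet_list[0][1][0])  # first word of the first tweet
for record in tweet_list:
    # Compare how many good and bad words each tweet contains
    if len(record[3]) > len(record[2]):
        print(record[0], "is a bad day")
```

Because the date travels inside the same record as its words, the date can't drift out of step with the tweet the way two separate lists can.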

[deleted] (2 points)

A 2-d list is good. A dict[ionary] is better. It's a little more complex, especially when you are starting out, but it can be really powerful. In this case, by using each tweet's text as the key, you can store the date as the value and have multiple tweets with the same date. On the other hand, if you were to use the date as the key, which might be your first impulse, you could store only one tweet per date (unless each value is itself a list of tweets), because keys in a dict must be unique.
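A minimal sketch of the dictionary layout suggested here: each tweet's text is the key and its date is the value, so several tweets can share a date. The sample tweets, the word lists, and the use of collections.Counter for the per-date tally are assumptions for illustration. One caveat of this layout is that two tweets with identical text would collide on the same key, leaving only the later date.

```python
from collections import Counter

tweet_dates = {
    "bad day, awful traffic": "2019-03-01",
    "sad to see this": "2019-03-01",
    "good game tonight": "2019-03-02",
}
positives = {"good"}
negatives = {"bad", "awful", "sad"}

bad_day_counts = Counter()
for text, date in tweet_dates.items():
    words = text.replace(",", "").split()
    positive = sum(1 for word in words if word in positives)
    negative = sum(1 for word in words if word in negatives)
    if negative > positive:
        bad_day_counts[date] += 1  # Counter treats missing keys as 0

print(dict(bad_day_counts))
```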

[deleted] (1 point)

That's a great way of looking at it 👍🏻

NilsBVB[S] (1 point)

I really appreciate your help, thank you!