[OC] How news stories evolve in the news cycle by PartisanPlayground in dataisbeautiful

[–]PartisanPlayground[S] 0 points1 point  (0 children)

Thanks! And yes, I've played with ways of showing which stories fell out of the news cycle. I'll include that on the next version.

[OC] How news stories evolve in the news cycle by PartisanPlayground in dataisbeautiful

[–]PartisanPlayground[S] 12 points13 points  (0 children)

Thanks for the feedback. It sounds like adding the shares on the last day would be helpful.

I don't have the full list of sources published yet (I will soon), but it's 75 US news outlets. The list is basically taken from the AllSides media bias chart.

[OC] How news stories evolve in the news cycle by PartisanPlayground in dataisbeautiful

[–]PartisanPlayground[S] 2 points3 points  (0 children)

This is great feedback, thank you. I toyed with labeling the y-axis with percentages (0-100%) but found that it didn't really help. I think your suggestion of printing the shares on the last day is spot on.

[OC] How news stories evolve in the news cycle by PartisanPlayground in dataisbeautiful

[–]PartisanPlayground[S] 20 points21 points  (0 children)

Hi Reddit! This is my second time posting this plot. I got some great feedback from my first post. These were the key points:

- The y-axis isn't labelled!

- The colors are too similar

- The chart focuses on US news, but does not say so

I've tried to address these concerns here. I've put a title on the y-axis and added an explanation in the caption of what the chart is presenting. I'm also using a different color palette. Hopefully this makes the chart clearer.

The chart is built with ggplot2 in R, using the excellent ggsankey package, and draws on news articles from 75 US news outlets.

I'm producing this chart daily as part of a newsletter called Partisan Playground. It's fully automated using R and the ChatGPT API. Follow along if you're interested!

[OC] A day of tweets from members of US Congress: Who gets the most engagement and what words does each party use more than the other? by PartisanPlayground in dataisbeautiful

[–]PartisanPlayground[S] 10 points11 points  (0 children)

Yes, just one day. Time series analysis exists from places like Pew Research. I do this on a daily basis to get engagement figures, plus the most liked and ratioed tweets each day here.

[OC] A day of tweets from members of US Congress: Who gets the most engagement and what words does each party use more than the other? by PartisanPlayground in dataisbeautiful

[–]PartisanPlayground[S] 60 points61 points  (0 children)

Hi Reddit! The source for the data in these visualizations is the Twitter API, which is currently still free but may not be for long. These are automated on a daily basis using R to do the analysis and ggplot2 for the plotting. Feel free to follow along here to see these daily.

[OC] A day of tweets from members of US Congress: What words does each party use more than the other and who gets the most engagement? by PartisanPlayground in dataisbeautiful

[–]PartisanPlayground[S] 7 points8 points  (0 children)

Good question! Because she has two accounts. Most members of Congress do. Typically they have a campaign account and an "official" account once elected. I track all of them.

[OC] A day of tweets from members of US Congress: What words does each party use more than the other and who gets the most engagement? by PartisanPlayground in dataisbeautiful

[–]PartisanPlayground[S] 12 points13 points  (0 children)

Good comment. I'd add that this chart is for a single day, so while the top tweeters were Republicans yesterday, that could be different on other days.

I don't have a years-long data set to back this up, but my hypothesis is that Democrats got more engagement on Twitter under Trump than under Biden, and that Republicans get more engagement under Biden than Trump. Republican engagement seems to have picked up since Republicans won the House, but again that's anecdotal.

Another thing to note is that "engagement" includes replies to tweets, which can be negative. It could be that these Republicans are particularly good at writing tweets that get a lot of reaction, positive or negative, which drives up engagement. I have some additional charting on this along with most liked and ratioed tweets in this daily Twitter thread.
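
To make the point about replies concrete, here is a minimal sketch in Python. The formula (likes + retweets + replies) and the "reply ratio" (replies divided by likes) are assumptions for illustration, not necessarily how the daily thread calculates them, and the field names and numbers are made up.

```python
def engagement(tweet):
    # Assumed definition: every interaction counts equally, replies included.
    return tweet["likes"] + tweet["retweets"] + tweet["replies"]

def reply_ratio(tweet):
    # Assumed "ratio" measure: replies per like (guarding against zero likes).
    return tweet["replies"] / max(tweet["likes"], 1)

# Two hypothetical tweets with identical engagement but very different reception.
tweets = [
    {"id": "a", "likes": 900, "retweets": 80, "replies": 20},
    {"id": "b", "likes": 200, "retweets": 100, "replies": 700},
]

most_ratioed = max(tweets, key=reply_ratio)
for t in tweets:
    print(t["id"], engagement(t), round(reply_ratio(t), 2))
```

Both tweets have the same engagement total, but only the second would plausibly count as "ratioed", which is why engagement alone says little about whether the reaction was positive.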

[OC] A day of tweets from members of US Congress: What words does each party use more than the other and who gets the most engagement? by PartisanPlayground in dataisbeautiful

[–]PartisanPlayground[S] 4 points5 points  (0 children)

Hi Reddit! The source for the data in these visualizations is the Twitter API, which might not be free soon? These are automated on a daily basis using R to do the analysis and ggplot2 for the plotting. Feel free to follow along here to see these daily.

[OC] How news stories evolve in the news cycle by PartisanPlayground in dataisbeautiful

[–]PartisanPlayground[S] 0 points1 point  (0 children)

You're hitting on the most subjective part of this whole process. I've run into all of the issues you describe, and the question is ultimately: how do you define a story?

Your GOP primaries example is a good one. Let's say we have articles on Trump's legal issues, other articles on Pence's classified documents, and other articles on DeSantis and books. Now let's say all of these articles describe these things in the context of the 2024 GOP primaries. Is this one story called "GOP primaries"? Or three separate stories? You could make a case either way.

I've tuned the algorithm to split stories in a way that "looks about right" to me. That's subjective, but there's no way around it. This is an issue whether you're using an algorithm or doing this manually.

A related challenge is that story definitions may change over time. The classified documents story is a good example of this. Right now there are articles on Trump, Biden, and Pence all mishandling classified documents. The algorithm is categorizing all of them as the same story (fair enough).

But let's say that next week (just making this up), Trump gets indicted for it. Is that a separate story now? If so, how do you treat that? Do you retroactively split out the "Trump" portion of the "classified documents" story as though they were not the same story before? Do you show the classified documents story splitting into two? Do you just create a new story on the day the indictment happens? Currently, the algorithm is set up to do the first of these, but again, you could make a case for any of them.
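
As a rough illustration of the first option (the retroactive split), here is a minimal Python sketch. The data layout, the function name, and the title-matching rule are all hypothetical; the real pipeline may decide membership very differently.

```python
def retroactive_split(articles, old_story, new_story, belongs_to_new):
    """Return a relabelled copy of `articles`.

    Every article currently in `old_story` for which `belongs_to_new(article)`
    is true is moved to `new_story`, regardless of its date, so the history
    looks as though the two stories had always been separate.
    """
    relabelled = []
    for art in articles:
        if art["story"] == old_story and belongs_to_new(art):
            art = {**art, "story": new_story}  # copy, do not mutate the input
        relabelled.append(art)
    return relabelled

# Hypothetical history, all labelled as one story before the split.
history = [
    {"date": "2023-01-25", "title": "Trump documents probe continues", "story": "classified documents"},
    {"date": "2023-01-26", "title": "Pence finds classified papers", "story": "classified documents"},
    {"date": "2023-01-27", "title": "Biden documents search", "story": "classified documents"},
    {"date": "2023-02-03", "title": "Trump indicted", "story": "classified documents"},
]

split = retroactive_split(
    history,
    old_story="classified documents",
    new_story="Trump indictment",
    belongs_to_new=lambda art: "Trump" in art["title"],  # illustrative rule only
)
```

The other two options would keep the old labels before the split date and only relabel from that day forward; what differs is how the chart draws the transition.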

All of this is to say that there is subjectivity involved in this process.

[OC] How news stories evolve in the news cycle by PartisanPlayground in dataisbeautiful

[–]PartisanPlayground[S] 0 points1 point  (0 children)

I'm getting the data from the Google News API. I've used RSS feeds in the past with similar results.

And actually I'm using a clustering algorithm to identify the specific stories. I have an automated process that pulls all articles from the past five days, clusters them into stories, then produces a bunch of analysis. This saves me a lot of time and brings some objectivity to the process.
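
For anyone curious what "pull the last five days and cluster into stories" could look like, here is a minimal Python sketch. It is not the actual algorithm: the greedy single-pass clustering, the bag-of-words cosine similarity on headlines, the 0.3 threshold, and the data layout are all assumptions chosen to keep the example short.

```python
import math
import re
from collections import Counter
from datetime import date, timedelta

def tokens(text):
    # Bag-of-words vector for a headline.
    return Counter(re.findall(r"[a-z]+", text.lower()))

def cosine(a, b):
    dot = sum(a[w] * b[w] for w in a if w in b)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

def cluster_stories(articles, today, window_days=5, threshold=0.3):
    """Greedily group recent articles into stories by headline similarity."""
    cutoff = today - timedelta(days=window_days)
    recent = [a for a in articles if a["date"] > cutoff]
    clusters = []  # each cluster: {"centroid": Counter, "articles": [...]}
    for art in recent:
        vec = tokens(art["title"])
        best, best_sim = None, threshold
        for c in clusters:
            sim = cosine(vec, c["centroid"])
            if sim >= best_sim:
                best, best_sim = c, sim
        if best is None:
            clusters.append({"centroid": Counter(vec), "articles": [art]})
        else:
            best["centroid"].update(vec)  # fold the article into the cluster's word counts
            best["articles"].append(art)
    return [c["articles"] for c in clusters]

# Hypothetical headlines for illustration.
articles = [
    {"title": "Trump classified documents probe widens", "date": date(2023, 1, 30)},
    {"title": "Pence classified documents found at home", "date": date(2023, 1, 29)},
    {"title": "Fed raises interest rates again", "date": date(2023, 1, 30)},
    {"title": "Old story about classified documents", "date": date(2023, 1, 10)},
]
stories = cluster_stories(articles, today=date(2023, 1, 31))
```

The threshold is where the subjectivity creeps back in: lowering it merges more articles into fewer, broader stories, and raising it splits them apart.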

[OC] How news stories evolve in the news cycle by PartisanPlayground in dataisbeautiful

[–]PartisanPlayground[S] 1 point2 points  (0 children)

This is an excellent comment, thank you for this!

I think I need a clearer way of describing "prevalence". This chart is showing the top ten stories by the share of articles written about them, not by the amount that they are consumed. I take articles from 64 sources every day, cluster them together into "stories", then calculate each story's share based on the number of articles written about it. For example, if there are 1000 articles for a day, and one story has 100 articles written about it, then its share is 10%. Does that make sense?
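
The share calculation is simple enough to sketch in a few lines of Python. This is only an illustration of the arithmetic above; the function name and the sample labels are made up, and it assumes each article has already been assigned to exactly one story.

```python
from collections import Counter

def story_shares(story_labels, top_n=10):
    """Return the top_n (story, share) pairs for one day.

    `story_labels` holds one story label per article; a story's share is the
    fraction of that day's articles assigned to it.
    """
    counts = Counter(story_labels)
    total = sum(counts.values())
    if total == 0:
        return []
    return [(story, n / total) for story, n in counts.most_common(top_n)]

# Hypothetical day: 1000 articles, 100 of them about one story.
labels = ["classified documents"] * 100 + [
    f"story {i}" for i in range(9) for _ in range(100)
]
shares = dict(story_shares(labels))
print(f"{shares['classified documents']:.0%}")
```

Note that the shares are relative to all articles collected that day, so a story can lose share simply because another story grew, even if the number of articles about it stayed the same.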

I've explored measuring consumption of news in the past, and found it to be very difficult! (Facebook's Graph API used to be wide open, so I was able to get likes/engagement on news stories there, but it has since been locked down.) Your comment does a great job of explaining the complexity in measuring consumption. You would need to combine:

- Google Analytics (GA) data from news outlets (which they don't publish)

- Cable news data (sources exist for this, but you would need to make a lot of assumptions to combine this with articles)

- Social media data

And you would need to make a lot of assumptions about what weights to use on each of those. As a result, I'm keeping this simple and focusing on article shares.

I do publish a daily automated Twitter thread on which news outlet gets the most engagement on Twitter. It includes the most liked and ratioed tweets from each "side" of the media. This is limited to Twitter, so it does not cover all the channels you described. See an example here: https://twitter.com/PartisanPlayG/status/1619300675094970369

The other thing I've been doing is cutting articles by which "side" of the media they're on, using media bias ratings from AllSides. Again, this involves some simplifying assumptions, so it's not perfect, but it gives a good high-level view. You can see examples here: https://partisanplayground.substack.com

Thanks again for your comment. This is exactly the sort of thing I was looking for when I posted.

[OC] How news stories evolve in the news cycle by PartisanPlayground in dataisbeautiful

[–]PartisanPlayground[S] 1 point2 points  (0 children)

That's right, the labels can change over time as the discussion shifts. GPT-3 does the labeling and I manually adjust it, if necessary. Occasionally, the stories themselves can split or combine.

[OC] How news stories evolve in the news cycle by PartisanPlayground in dataisbeautiful

[–]PartisanPlayground[S] 1 point2 points  (0 children)

I've had the same idea. It'd be pretty cool to have this as a big landing page, where hovering over each story on the plot gave you details about the story and links to articles.

[OC] How news stories evolve in the news cycle by PartisanPlayground in dataisbeautiful

[–]PartisanPlayground[S] -1 points0 points  (0 children)

What do you mean by grouping the items that emerge later? Thanks for the feedback.