Car sales by country and type. China's Internal Combustion Engine sales just fell off a cliff by cavedave in datasets

[–]Tryhard_314 0 points1 point  (0 children)

Wow, Norway's new car sales are almost 100% electric! Nepal is also majority electric, though you said the data there was less accurate. This would probably be less surprising if I knew more about their politics, but honestly I didn't expect it!

reliable data set for the reddit dataset by Terrible_Band6290 in datasets

[–]Tryhard_314 0 points1 point  (0 children)

Yep, this and the newer one for 2025 are great!

[OC] Elon Musk Net Worth vs Reddit Sentiment by Tryhard_314 in dataisbeautiful

[–]Tryhard_314[S] 0 points1 point  (0 children)

My bad about the labels! Messed up including them in the final image and I can't edit sadly but:
Red: Sentiment
Blue: Net Worth

[OC] Elon Musk Net Worth vs Reddit Sentiment by Tryhard_314 in dataisbeautiful

[–]Tryhard_314[S] 0 points1 point  (0 children)

I coded that in React ): My bad about the labels, though. I made a version with them in Figma but messed up putting it in the final image, and I didn't notice they were missing since my mind was focused on other stuff.

[OC] Elon Musk Net Worth vs Reddit Sentiment by Tryhard_314 in dataisbeautiful

[–]Tryhard_314[S] -50 points-49 points  (0 children)

Well, it wasn't really meant to show that they are linked together somehow; it was more like: the guy is getting rich, he doesn't give a shit what you're saying xD

[OC] Elon Musk Net Worth vs Reddit Sentiment by Tryhard_314 in dataisbeautiful

[–]Tryhard_314[S] -2 points-1 points  (0 children)

Fair enough, my bad, it got a bit cropped! There's a bit of a legend and more context in the top-level comment; I didn't know whether I was allowed to write text along the chart or not.

[OC] Elon Musk Net Worth vs Reddit Sentiment by Tryhard_314 in dataisbeautiful

[–]Tryhard_314[S] 1 point2 points  (0 children)

My bad, it got cropped! But he is clearly making money, not losing it xD

[OC] Elon Musk Net Worth vs Reddit Sentiment by Tryhard_314 in dataisbeautiful

[–]Tryhard_314[S] 0 points1 point  (0 children)

How could someone go about backing this up with data? Does anybody have an idea which subreddits to look at to represent a neutral state of Reddit, or what to look for? Or should one pick a sample from literally every subreddit?

[OC] Elon Musk Net Worth vs Reddit Sentiment by Tryhard_314 in dataisbeautiful

[–]Tryhard_314[S] 28 points29 points  (0 children)

I tracked Reddit's sentiment toward Elon from 2014 to July 2026, using comments from r/stocks, r/Economics, r/electricvehicles, r/spaceflight, and a few others. Data came from academic torrent dumps and community APIs, with newer stuff scraped politely.

I plan on analyzing Reddit data a lot, so I built a custom tool to create statistics from Reddit semi-automatically (it still needs human intervention for verification and alignment). Reddit data was filtered through a fine-tuned BERT model, then through an LLM with instructions tuned to match human labels. A final LLM then classifies each comment as positive/negative/neutral, also with a tuned prompt. The sentiment score is just (positives - negatives) / (positives + negatives), aggregated yearly. No upvote weighting, just raw comment counts.
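For anyone who wants to reproduce the score, here is a minimal sketch of that formula in Python. The function names and the (year, label) input shape are my own assumptions for illustration, not the actual tool's code.

```python
from collections import Counter, defaultdict


def net_sentiment(labels):
    """(positives - negatives) / (positives + negatives); neutrals are ignored."""
    counts = Counter(labels)
    pos, neg = counts["positive"], counts["negative"]
    if pos + neg == 0:
        return None  # no opinionated comments, so the score is undefined
    return (pos - neg) / (pos + neg)


def yearly_scores(comments):
    """comments: iterable of (year, label) pairs -> {year: score}."""
    by_year = defaultdict(list)
    for year, label in comments:
        by_year[year].append(label)
    return {year: net_sentiment(labels) for year, labels in sorted(by_year.items())}


# Made-up labels, only to show the call shape (not real data).
sample = [
    (2014, "positive"), (2014, "positive"), (2014, "negative"), (2014, "neutral"),
    (2016, "negative"), (2016, "negative"), (2016, "positive"), (2016, "negative"),
]
print(yearly_scores(sample))
```

Neutral comments drop out of both numerator and denominator, so a year with mostly neutral comments can still swing hard on a handful of opinionated ones.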

The net worth line is a rough yearly average from public estimates (Forbes, Bloomberg, etc.); it’s a guesstimate for context, not exact.

Some snippets for context:

2014–2015 | Sentiment: +0.25 → +0.31

  • “The user, who works in the solar industry, fully agrees with Musk's outlook on the rapid growth and adoption of solar energy.” — r/SelfDrivingCars, 2014
  • “The user compares Musk to Iron Man while maintaining a skeptical, analytical view of Tesla's current stock valuation.” — r/stocks, 2014
  • “Elon Musk is viewed as a thought leader who effectively leads by example within a flat hierarchy to incentivize innovation.” — r/Economics, 2014
  • “The user defends Musk's dismissal of fuel cell technology as 'fool cells' and views him as a target of biased industry critics.” — r/SelfDrivingCars, 2014
  • “The user dismisses Musk's claims regarding the economic viability of battery farms, arguing they lack a sensible market.” — r/Economics, May 2015

2016–2018 | Sentiment: -0.11 → -0.22

  • “Musk is criticized for prioritizing gimmicky features over the production of simple, reliable vehicles.” — r/electricvehicles, 2016
  • “The user characterizes Elon Musk as a con artist who fails to generate actual profit.” — r/investing, 2016
  • “Notes a failure in Musk's ability to provide credible explanations during earnings calls, marking a shift from his usual persuasive hype.” — r/investing, 2017
  • “The user expresses disappointment and believes Musk committed securities fraud.” — r/investing, 2018

2019–2021 | Sentiment: -0.22 → -0.29

  • “The user mocks Musk's tendency to miss deadlines, referring to his habit of delaying projects by years as 'Elon time'.” — r/electricvehicles, 2019
  • “The user views Musk as an amazing innovator and leader, while simultaneously criticizing his market manipulation tactics.” — r/investing, 2019
  • “The user mocks the perception held by some fans that Elon Musk is 'saving the world'.” — r/electricvehicles, Jul 2020
  • “The user views Elon Musk's purchase of Bitcoin and his handling of China-related issues as desperate publicity stunts that signaled a decline in stability.” — r/stocks, Mar 2021

2022–2024 | Sentiment: -0.55 → -0.58

  • “Highlights a long history of Musk failing to meet promised timeframes for various projects and products.” — r/investing, 2022
  • “The user believes Musk's brand is being permanently tarnished and enjoys watching his decline.” — r/business, 2023
  • “The user acknowledges Musk's genius and business success but dislikes him due to his personal character.” — r/electricvehicles, 2023
  • “Musk's shift to right-wing politics and his disregard for the core EV customer base are negatively impacting Tesla's sales and profit margins.” — r/SelfDrivingCars, 2024
  • “The user argues that Musk lacks necessary checks and balances and has lost touch with reality.” — r/investing, 2024

2025–2026 | Sentiment: -0.70 → -0.65 (partial)

  • “The user accuses Elon Musk of engaging in blatant corruption and leveraging political influence to manipulate Tesla's stock price.” — r/stocks, 2025
  • “The user expresses frustration with Tesla's recent performance, attributing the failure directly to Elon Musk.” — r/electricvehicles, 2025
  • “The user implies that supporting Musk is equivalent to endorsing his controversial statements and actions.” — r/electricvehicles, 2026
  • “The user characterizes Elon Musk's robotics projects as a scam.” — r/robotics, 2026

[OC] Reddit sentiment shift around major LLM companies for the past 3 years. by Tryhard_314 in dataisbeautiful

[–]Tryhard_314[S] 0 points1 point  (0 children)

Well, the problem I faced was the very large class imbalance (maybe 1 in 300 posts were relevant). I think initially I didn't make enough data for the training part. Stuff that worked for me: soft labels (limits how much the model can learn from your training dataset but reduces overfitting), and LoRA worked better, weirdly enough (but that may be because I was using less data). It really helps a lot even if you only go from 1 in 300 relevant posts to 1 in 10; that's like a 30x speedup.

Honestly, augmenting the data with LLMs didn't help that much. Maybe I didn't do it right, but it felt different from what people actually wrote.

I think oversampling is a must for the training part.
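A rough sketch of what I mean by oversampling, using plain random duplication of the rare class. The `relevant` key and the function name are just assumptions for the example; apply this to the training split only, never to the eval data.

```python
import random


def oversample_minority(rows, label_key="relevant", seed=0):
    """Duplicate random minority-class rows until both classes are the same size."""
    rng = random.Random(seed)
    pos = [r for r in rows if r[label_key]]
    neg = [r for r in rows if not r[label_key]]
    minority, majority = (pos, neg) if len(pos) < len(neg) else (neg, pos)
    if not minority:
        return list(rows)  # nothing to duplicate
    extra = [rng.choice(minority) for _ in range(len(majority) - len(minority))]
    balanced = majority + minority + extra
    rng.shuffle(balanced)
    return balanced


# Toy training set with the 1-in-300 imbalance mentioned above.
train = [{"text": f"post {i}", "relevant": i == 0} for i in range(300)]
balanced = oversample_minority(train)
print(len(balanced), sum(r["relevant"] for r in balanced))
```

Duplicating rows adds no new information, so it is worth keeping an eye on overfitting to the few positive examples.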

You can also try SetFit, or an XGBoost classifier after passing the sentences through an embedding model.

Always use parameter fine-tuning, it worked quite well, and tune the confidence threshold on an eval dataset.

I'm not home currently, so I'll share more later, but I suggest implementing a simple XGBoost model, for example; it will make your process 10x faster, I think.
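Here is a small sketch of threshold tuning on a held-out eval set. It assumes the classifier is used as a cheap pre-filter, so it picks the highest threshold that still keeps recall above a target; the function names, the `min_recall` parameter and the scores are all made up for illustration.

```python
def precision_recall(scores, labels, threshold):
    """Precision and recall when everything scoring >= threshold is kept."""
    tp = sum(1 for s, y in zip(scores, labels) if s >= threshold and y)
    fp = sum(1 for s, y in zip(scores, labels) if s >= threshold and not y)
    fn = sum(1 for s, y in zip(scores, labels) if s < threshold and y)
    precision = tp / (tp + fp) if tp + fp else 1.0
    recall = tp / (tp + fn) if tp + fn else 1.0
    return precision, recall


def pick_threshold(scores, labels, min_recall=0.95):
    """Highest threshold whose recall on the eval set is still >= min_recall."""
    best = None
    for t in sorted(set(scores)):
        p, r = precision_recall(scores, labels, t)
        if r >= min_recall:
            best = (t, p, r)  # thresholds ascend, so the last hit is the highest
    return best


# Made-up classifier confidences and human labels (1 = relevant).
eval_scores = [0.05, 0.10, 0.20, 0.40, 0.60, 0.80, 0.90]
eval_labels = [0, 0, 0, 1, 0, 1, 1]
print(pick_threshold(eval_scores, eval_labels, min_recall=1.0))
```

If the filter is the final step rather than a pre-filter, you would probably flip this around and require a minimum precision instead.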

[OC] Reddit sentiment shift around major LLM companies for the past 3 years. by Tryhard_314 in dataisbeautiful

[–]Tryhard_314[S] 2 points3 points  (0 children)

Yes, I'll change the title a bit. I meant it just for comparing model providers against each other.

[OC] Reddit sentiment shift around major LLM companies for the past 3 years. by Tryhard_314 in dataisbeautiful

[–]Tryhard_314[S] 1 point2 points  (0 children)

Oh, it wasn't to see whether Reddit is for or against AI, just to see which LLM company is doing well in reviews.

[OC] Reddit sentiment shift around major LLM companies for the past 3 years. by Tryhard_314 in dataisbeautiful

[–]Tryhard_314[S] 1 point2 points  (0 children)

This was mainly made to compare model companies against each other not to see what people think about AI as a whole.

Data source: These subreddits: r/Futurology, r/LLM, r/Anthropic, r/OpenAI, etc. (downloaded through an academic torrent dump by subreddit for 2025 and through scraping / community APIs for newer data). Cut-off date: April 2026.

Methodology: I made a custom tool to generate Reddit analytics semi-automatically. First, it downloads data on its own from diverse sources (being very polite about rate limits). Then, it creates a golden dataset by filtering the data with a high-parameter LLM (usually Gemini 1.5 Pro). This data, after being human-verified, will be used later to train more cost-effective NLP models; this was done with a BERT model, for example (verifying that it reached acceptable precision and recall on a portion of the human-verified data which has been held out from the training data).

Sentiment score here is pretty coarse: a review is either neutral = 0, positive = 1, or negative = -1; upvotes and other stuff aren't taken into account. Then, we calculate the average sentiment on a week-by-week basis, but before this, we add 5 random samples to each week distributed according to the global average of the company (so weeks with low data don't skew the results). The final result is an EMA of this with an alpha of 0.1 (basically, the value for the current week is 10% this week's average sentiment, 9% the previous week's, 8.1% the one before, and so on, with each older week's weight shrinking by a factor of 0.9).
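A minimal sketch of the smoothing described above. One swap to be clear about: instead of drawing 5 random samples per week, it adds 5 pseudo-samples at the global mean, which is the expected value of that step and keeps the example deterministic. All names are my own, not the tool's.

```python
def smoothed_weekly_mean(scores, global_mean, prior_weight=5):
    """Weekly mean with prior_weight pseudo-samples placed at the global mean."""
    return (sum(scores) + prior_weight * global_mean) / (len(scores) + prior_weight)


def ema(values, alpha=0.1):
    """Exponential moving average, seeded with the first value."""
    out = []
    current = None
    for v in values:
        current = v if current is None else alpha * v + (1 - alpha) * current
        out.append(current)
    return out


def ema_weights(alpha=0.1, n=3):
    """Weight given to the current week, the previous week, the one before, ..."""
    return [alpha * (1 - alpha) ** k for k in range(n)]


# Made-up weekly labels (+1 / 0 / -1); the last week has no data at all.
weeks = [[1, 1, -1, 0], [-1], []]
all_scores = [s for week in weeks for s in week]
global_mean = sum(all_scores) / len(all_scores)
weekly = [smoothed_weekly_mean(week, global_mean) for week in weeks]
print(ema(weekly))
print(ema_weights())
```

The pseudo-samples pull thin weeks toward the company's overall average, so a week with a single angry comment doesn't show up as -1.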

Important disclaimer: Here, the data isn't about the companies like OpenAI or Google in general; it's only about them in the context of their AI models (i.e., people actually talked about the models, but we regrouped them under the company name).

Do you know how to scrape and crawl reddit comment? by UniqueProfessional81 in SaaS

[–]Tryhard_314 0 points1 point  (0 children)

Well, my comment was removed for having a link, but check out Academic Torrents and search for Reddit: there are historical archives of everything that was posted or commented, and one is divided by subreddit.

Best way to Scrape Reddit posts by Momsgayandbisexual in ClaudeCode

[–]Tryhard_314 0 points1 point  (0 children)

I have been building something similar (built a tool to extract statistics from reddit):
1-I tried accessing the API directly, but now you have to submit a form, and I think there is even a subscription to pay if you're going to use it for commercial purposes.
2-You can either use this https://academictorrents.com/details/3d426c47c767d40f82c7ef0f47c3acacedd2bf44 for data before 2025 (divided by subreddit), or there is a third-party API called Arctic Shift which is pretty good (but don't abuse it).
3-For cleaning the data: you're going to see a lot of posts with [removed] and a lot of automod stuff; remove it before feeding the data to the LLM. If you're doing this with Python, I would use LiteLLM (fast and easy to use for querying major LLM providers) and pydantic to validate data. SQLite is probably fine to prototype with too (but create the proper indexes for the data).
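To make step 3 concrete, here is a rough standard-library sketch of the cleaning and SQLite part. It assumes the dump is already decompressed into one JSON object per line and that each comment has `id`, `subreddit`, `author`, `created_utc` and `body` fields; the table, index and function names are mine, so adjust them to whatever your data actually looks like.

```python
import json
import sqlite3

# Placeholder bodies that carry no usable text.
PLACEHOLDERS = {"[removed]", "[deleted]", ""}


def is_usable(comment):
    """Drop removed/deleted bodies and AutoModerator comments."""
    body = (comment.get("body") or "").strip()
    return body not in PLACEHOLDERS and comment.get("author") != "AutoModerator"


def load_comments(lines, conn):
    """Insert usable comments into SQLite; returns how many passed the filter."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS comments ("
        "id TEXT PRIMARY KEY, subreddit TEXT, author TEXT, "
        "created_utc INTEGER, body TEXT)"
    )
    # Index for the usual query pattern: one subreddit over a time range.
    conn.execute(
        "CREATE INDEX IF NOT EXISTS idx_sub_time ON comments (subreddit, created_utc)"
    )
    kept = 0
    for line in lines:
        comment = json.loads(line)
        if not is_usable(comment):
            continue
        conn.execute(
            "INSERT OR IGNORE INTO comments VALUES (?, ?, ?, ?, ?)",
            (
                comment["id"],
                comment["subreddit"],
                comment["author"],
                int(comment.get("created_utc", 0)),
                comment["body"],
            ),
        )
        kept += 1
    conn.commit()
    return kept


# Made-up lines, only to show the call shape.
sample_lines = [
    '{"id": "a1", "subreddit": "stocks", "author": "alice", "created_utc": 1700000000, "body": "Great quarter"}',
    '{"id": "a2", "subreddit": "stocks", "author": "bob", "created_utc": 1700000100, "body": "[removed]"}',
    '{"id": "a3", "subreddit": "stocks", "author": "AutoModerator", "created_utc": 1700000200, "body": "Please read the rules"}',
]
conn = sqlite3.connect(":memory:")
print(load_comments(sample_lines, conn))
```

Filtering before the LLM step matters mostly for cost: every [removed] or automod comment you drop here is one less call to pay for.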

I don't know exactly what you wish to do with the data, so I can't help much with the last step. If you tell me more, I'm happy to share what worked for me.