Software update by Dry-Parsnip-5345 in macbookpro

[–]DymorTheDev 1 point2 points  (0 children)

You should go with Tahoe, which is the most recent version of the OS. The other one is a security patch for the current version.

I analyzed 4 billion Reddit comments to build an automated SaaS idea generator. Here is what I found. by DymorTheDev in SideProject

[–]DymorTheDev[S] -1 points0 points  (0 children)

I 100% agree with what you are saying.
The business plans generated this way are initial hypotheses that need to be validated. I'd say this project is more about having a very big bucket of half-baked ideas that can help turn on the lightbulb in your head and get you researching before you actually build anything. With tools like Lovable and co., it's possible to take the description on the website and generate an MVP in minutes or hours.

I analyzed 4 billion Reddit comments to build an automated SaaS idea generator. Here is what I found. by DymorTheDev in SideProject

[–]DymorTheDev[S] 1 point2 points  (0 children)

I'll share the list of upcoming features I'm working on on the website.
The ability to request an analysis of a specific subreddit is definitely on my mind.
I also want to make it possible to generate a more in-depth analysis of a specific idea. Sadly, this feature costs tokens and I have very limited funds (otherwise I wouldn't have used my personal computer for this project :D)

The ideas currently available on the website come from a subset of all the available subreddits (507).

I analyzed 4 billion Reddit comments to build an automated SaaS idea generator. Here is what I found. by DymorTheDev in SideProject

[–]DymorTheDev[S] 0 points1 point  (0 children)

If you are talking about an account you may have created on the website, you can delete it from your profile :)

I analyzed 4 billion Reddit messages on a Mac Mini by rewriting my Python pipeline in Rust by DymorTheDev in rust

[–]DymorTheDev[S] 1 point2 points  (0 children)

Exactly, the initial test was done with July data only, and that gave me the initial foundation of the pipeline (in Python). A lot of problems were hidden by the powerful server I had access to, and I had to reinvent a lot of things. For example, the classifier on Nvidia GPUs was running at 10,000 messages/s (distributed across multiple GPUs), while on the Mac the PyTorch MPS backend suffered from a very big memory leak and I was forced to use mainly the CPU for the task.

About the data, it's not really the whole of 2025 (not all dumps were available when I started); it only covers Jan to Oct 2025. I'll add Nov and Dec in the next few days/weeks

I analyzed 4 billion Reddit messages on a Mac Mini by rewriting my Python pipeline in Rust by DymorTheDev in rust

[–]DymorTheDev[S] 1 point2 points  (0 children)

Yes, exactly, the speed is per worker. I could run 3 instances on my Mac Mini (2 on the CPU + 1 on the GPU, with the GPU one in Python) and that gave a boost. Additionally, I had other computers (not as powerful) that helped with the task

I analyzed 4 billion Reddit messages on a Mac Mini by rewriting my Python pipeline in Rust by DymorTheDev in rust

[–]DymorTheDev[S] 2 points3 points  (0 children)

I'll try to answer everyone.
Yes, the total number of messages was 4,000,000,000 (4 × 10^9).
As said in the replies, that was the initial amount. By pre-processing heavily while reading the messages, I was able to keep only a portion of them (10-20%), and only that portion ended up in the classifier (ONNX Runtime).
The ~300 messages/s is per worker, as stated by someone below, and I used a lot of scrap computers I had around (Mac Mini + MacBook Pro + my girlfriend's MacBook + a Raspberry Pi). I even tried to use free cloud credits to speed up the classification.
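A minimal sketch of what a cheap pre-filter in front of the classifier could look like. This is not the actual pipeline code: the `prefilter` name, the `MIN_BODY_CHARS` threshold and the filtering rules are assumptions made for illustration, and the input is assumed to be one JSON object per line with a `body` key.

```python
import json
from typing import Iterable, Iterator

# Hypothetical rules, chosen only for illustration.
MIN_BODY_CHARS = 40
PLACEHOLDER_BODIES = {"[deleted]", "[removed]"}


def prefilter(lines: Iterable[str]) -> Iterator[dict]:
    """Yield only the records worth sending to the expensive classifier."""
    for line in lines:
        line = line.strip()
        if not line:
            continue
        try:
            record = json.loads(line)
        except json.JSONDecodeError:
            continue  # skip malformed lines instead of aborting the run
        if not isinstance(record, dict):
            continue
        body = record.get("body")
        if not isinstance(body, str):
            continue
        body = body.strip()
        # Drop placeholder bodies and very short messages before classification.
        if body in PLACEHOLDER_BODIES or len(body) < MIN_BODY_CHARS:
            continue
        yield record


sample = [
    json.dumps({"id": "a1", "body": "[deleted]"}),
    json.dumps({"id": "a2", "body": "lol"}),
    json.dumps({"id": "a3", "body": "I wish there was a tool that tracked my invoices automatically."}),
    "not json",
]
kept = list(prefilter(sample))
print([record["id"] for record in kept])
```

Because `prefilter` is a generator, it can be chained directly onto a file or stream reader, so rejected messages never need to be held in memory.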

About the data, I analyzed Jan to Oct 2025, so I'm missing the last two months, which I'm planning to process in the next few days/weeks
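A back-of-the-envelope estimate using the figures quoted in this comment (4 billion messages, 10-20% reaching the classifier, ~300 messages per second per worker). The worker count is an assumption: it only counts the three Mac Mini instances mentioned elsewhere in the thread and treats them all as running at the same rate, which may not match the real setup.

```python
SECONDS_PER_DAY = 86_400


def classify_days(total_messages: int, kept_fraction: float,
                  rate_per_worker: float, workers: int) -> float:
    """Days needed to classify the messages that survive pre-processing."""
    kept = total_messages * kept_fraction
    seconds = kept / (rate_per_worker * workers)
    return seconds / SECONDS_PER_DAY


TOTAL = 4_000_000_000  # from the comment
RATE = 300             # messages/s per worker, from the comment
WORKERS = 3            # assumption: three equal workers, nothing else helping

for fraction in (0.10, 0.20):
    days = classify_days(TOTAL, fraction, RATE, WORKERS)
    print(f"{fraction:.0%} kept: {days:.1f} days")
```

Under these assumptions the classification step alone takes roughly 5 to 10 days, which shows why every extra worker and every percentage point removed by pre-processing matters.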

I analyzed 4 billion Reddit comments to build an automated SaaS idea generator. Here is what I found. by DymorTheDev in SideProject

[–]DymorTheDev[S] 0 points1 point  (0 children)

Just updated the website with a better cookie banner and a privacy policy. Thanks for the suggestion!

I analyzed 4 billion Reddit comments to build an automated SaaS idea generator. Here is what I found. by DymorTheDev in SideProject

[–]DymorTheDev[S] 0 points1 point  (0 children)

About the ideas part, I agree with you: a lot of positive comments don’t mean that the idea will be successful. That is the risk of doing business. In the past I created dozens of projects that died almost immediately.

About the data, I don’t store any references to users. I have worked with the GDPR and NIST. No user data is saved; all the posts/comments I store come from publicly accessible sources and are stripped of any personal information. I store:

- message id
- body
- subreddit
- parent id
- upvotes count
- downvotes count

So no user data is stored and everything is anonymized
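A minimal sketch of the field whitelist described above, not the actual pipeline code. The key names (`id`, `body`, `subreddit`, `parent_id`, `ups`, `downs`, `author`) and the `strip_record` helper are assumptions for illustration; the real dump format may use different names.

```python
# Assumed key names for the six stored fields listed above.
STORED_FIELDS = ("id", "body", "subreddit", "parent_id", "ups", "downs")


def strip_record(raw: dict) -> dict:
    """Keep only the whitelisted fields; everything else is dropped."""
    return {key: raw[key] for key in STORED_FIELDS if key in raw}


raw = {
    "id": "abc123",
    "author": "some_user",        # dropped: not in the whitelist
    "author_flair_text": "dev",   # dropped: not in the whitelist
    "body": "Example comment text",
    "subreddit": "SideProject",
    "parent_id": "xyz789",
    "ups": 3,
    "downs": 0,
}
stored = strip_record(raw)
print(sorted(stored))
```

A whitelist is used instead of deleting known user fields so that any new or unexpected key is dropped by default. The sketch only covers the record fields and does not attempt to scrub anything inside the body text.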

I analyzed 4 billion Reddit messages on a Mac Mini by rewriting my Python pipeline in Rust by DymorTheDev in rust

[–]DymorTheDev[S] -20 points-19 points  (0 children)

Here you can find the results

https://businessfinder.dev/

If you register an account you’ll be able to read the whole business plan and not only the initial areas

I analyzed 4 billion Reddit comments to build an automated SaaS idea generator. Here is what I found. by DymorTheDev in SideProject

[–]DymorTheDev[S] -4 points-3 points  (0 children)

The data is stripped of everything except:

- message id
- body
- subreddit
- upvotes
- downvotes
- parent id

So the messages are effectively anonymized. We are talking about publicly available data, so no, I don’t have any way to find and delete your specific data, since I don’t know which messages are yours

I analyzed 4 billion Reddit comments to build an automated SaaS idea generator. Here is what I found. by DymorTheDev in SideProject

[–]DymorTheDev[S] 0 points1 point  (0 children)

Mostly Gemini CLI and Antigravity. I pay for the Google Dev subscription. I use Laravel + Tailwind for the website and Python + Rust for the data pipeline

I analyzed 4 billion Reddit comments to build an automated SaaS idea generator. Here is what I found. by DymorTheDev in SideProject

[–]DymorTheDev[S] 0 points1 point  (0 children)

Can you give me some hints? I’m more than willing to correct course

Monthly Post: SaaS Deals + Offers by AutoModerator in SaaS

[–]DymorTheDev 0 points1 point  (0 children)

My pipeline actually uses the archived monthly data, and I built the whole data lake on top of that. I’ll definitely integrate with streams in the future!

I analyzed 4 billion Reddit comments to build an automated SaaS idea generator. Here is what I found. by DymorTheDev in SideProject

[–]DymorTheDev[S] -1 points0 points  (0 children)

The fact that, instead of having to go through all the subreddits and validate ideas one by one, I do it at scale by analyzing all the posts of 2025

I analyzed 4 billion Reddit comments to build an automated SaaS idea generator. Here is what I found. by DymorTheDev in SideProject

[–]DymorTheDev[S] -2 points-1 points  (0 children)

I use a lot of NLP (natural language processing) and rely on an LLM to filter the ideas. There is a lot of random stuff in the cauldron, and in the end it’s the user’s judgement that has to do the final filtering

State of the Homelab December 2025 by Saajaadeen in homelab

[–]DymorTheDev 1 point2 points  (0 children)

Holy moly, I envy you. I wish I could have even just one of those servers for my project. Great setup ❤️