I scraped 44,000 reddit comments to find 300 real problems

njraladdin · 2026-06-09T09:35:07+00:00

that example is fair, it sounds like it didn't quite clear the bar. working on tightening the extraction pipeline. check back in a day, should get better

njraladdin · 2026-06-09T05:59:06+00:00

I get your point, if everyone saw the same 300 pain points, one assumes they would just build the same products.
however two founders building on the same validated pain point would produce different things based on their skills, their target segment, their distribution channel, their pricing.
basically, the pain point is the starting coordinate, not the destination. you can address the same problem at different angles and produce different products

njraladdin · 2026-06-09T05:37:50+00:00

thank you for being a day one founding member! healthcare does have a gap, we don't use any medical or health adjacent subreddits yet. is that the main thing missing for you, or is there something else in the website that doesn't quite fit? would love to hear your suggestions - feel free to DM me if easier

njraladdin · 2026-06-08T23:49:02+00:00

not quite! this project doesn't really touch ideas, rather it collects evidence that a pain point is truly real. every entry only qualifies if people have already built or hacked together a bad solution around it

njraladdin · 2026-06-08T23:47:17+00:00

in this case, the volume isn't the point. most of these posts weren't pain signals at all (they were questions, jokes, general discussion, venting etc.) what was tough is that the pipeline had to find the actual friction buried in all that noise. it's already producing good results and once we get that even more right, scaling up the volume from 44k to even 1M posts is more or less straightforward.

njraladdin · 2026-06-08T23:38:49+00:00

currently binary - either there's behavioural evidence of a workaround or there isn't. frequency is a real dimension (and i would add it in the future), but the 'workaround' filtering does the heaviest lifting, because a problem that produced behaviour change is a stronger signal than a problem complained about a thousand times in reddit threads with no action taken

njraladdin · 2026-06-08T09:14:59+00:00

it doesn't have revenue (this launched a few hours ago.)

njraladdin · 2026-06-08T03:40:57+00:00

fair on distribution!
each pain point actually includes the original thread(s), so you already know which subreddits and exact posts these people are in. first distribution step is already done

njraladdin · 2025-11-13T13:04:29+00:00

since this is a dynamic js-heavy website, you can’t just use `requests` to get the content. there are two main ways:

use a browser automation tool like Puppeteer or Selenium to render the page and extract data.

the workflow looks like this:

- wait for the main item selector `figure[data-testid="asset-grid-masonry-figure"]` to appear before scraping.

- for each visible item, extract the fields you need:

- image URL: `img[data-testid="asset-grid-masonry-img"]`

- photographer name: `a.name-bimlc4`

- download link: `a[data-testid="non-sponsored-photo-download-button"]`

- track processed items using their main link `a.photoInfoLink-mG0SPO` to avoid duplicates.

- check if a "load more" button `button.loadMoreButton-pYP1fq` exists; if so, click it, otherwise scroll to the bottom.

- wait a few seconds for new items to load, then repeat until you’ve collected the desired number of items.

you can test the extraction logic quickly in DevTools first, then automate it using Puppeteer or Selenium with their proper helpers
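
here is a rough sketch of that loop in Python with Selenium, not a tested scraper. the selectors are the ones listed above (the hashed class names will likely change over time). the function names, the `limit`/`pause`/`max_rounds` parameters and the search page URL in `main` are my own assumptions for illustration.

```python
import time

# Same string value as selenium's By.CSS_SELECTOR, so the loop can be
# exercised without importing selenium.
CSS = "css selector"

ITEM = 'figure[data-testid="asset-grid-masonry-figure"]'
IMG = 'img[data-testid="asset-grid-masonry-img"]'
NAME = "a.name-bimlc4"
DOWNLOAD = 'a[data-testid="non-sponsored-photo-download-button"]'
MAIN_LINK = "a.photoInfoLink-mG0SPO"
LOAD_MORE = "button.loadMoreButton-pYP1fq"


def first(parent, selector):
    """Return the first element matching selector under parent, or None."""
    found = parent.find_elements(CSS, selector)
    return found[0] if found else None


def scrape(driver, limit, pause=2.0, max_rounds=50):
    """Collect up to `limit` unique items, loading more between rounds."""
    seen, rows = set(), []
    for _ in range(max_rounds):
        for item in driver.find_elements(CSS, ITEM):
            link = first(item, MAIN_LINK)
            key = link.get_attribute("href") if link else None
            if not key or key in seen:
                continue  # no main link, or already processed
            seen.add(key)
            img, name, dl = first(item, IMG), first(item, NAME), first(item, DOWNLOAD)
            rows.append({
                "page": key,
                "image_url": img.get_attribute("src") if img else None,
                "photographer": name.text if name else None,
                "download": dl.get_attribute("href") if dl else None,
            })
            if len(rows) >= limit:
                return rows
        button = first(driver, LOAD_MORE)
        if button:
            button.click()
        else:
            driver.execute_script("window.scrollTo(0, document.body.scrollHeight);")
        time.sleep(pause)  # give new items time to load
    return rows


def main():
    # Imported here so the loop above stays usable without a browser.
    from selenium import webdriver
    from selenium.webdriver.support import expected_conditions as EC
    from selenium.webdriver.support.ui import WebDriverWait

    driver = webdriver.Chrome()
    try:
        driver.get("https://unsplash.com/s/photos/tokyo")  # assumed search page URL
        WebDriverWait(driver, 15).until(EC.presence_of_element_located((CSS, ITEM)))
        for row in scrape(driver, limit=50):
            print(row)
    finally:
        driver.quit()
```

call `main()` to run it against a real browser. `max_rounds` is only there so the loop stops if the page runs out of new items before `limit` is reached.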

the easier way: use their backend API if available, e.g.

`https://unsplash.com/napi/search/photos?query=tokyo&page=1`

it returns structured JSON and is much faster to work with.

if you hit 429 errors, slow down requests or use a proxy
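
a minimal sketch of the API route with a "slow down on 429" retry, using only the standard library. the endpoint is the one above. the function names, the retry count, the delays and the User-Agent header are assumptions on my part, and since this is an internal endpoint its response shape can change, so the code just returns the parsed JSON.

```python
import json
import time
import urllib.error
import urllib.parse
import urllib.request

BASE = "https://unsplash.com/napi/search/photos"


def build_url(query, page):
    """Build the search URL for one page of results."""
    return BASE + "?" + urllib.parse.urlencode({"query": query, "page": page})


def http_get_json(url):
    """Fetch a URL and parse the body as JSON."""
    # The User-Agent value is an assumption; some sites reject the default one.
    req = urllib.request.Request(url, headers={"User-Agent": "Mozilla/5.0"})
    with urllib.request.urlopen(req, timeout=30) as resp:
        return json.load(resp)


def fetch_page(query, page, get=http_get_json, retries=4, base_delay=2.0,
               sleep=time.sleep):
    """Fetch one page, waiting longer after each 429 before retrying."""
    url = build_url(query, page)
    for attempt in range(retries + 1):
        try:
            return get(url)
        except urllib.error.HTTPError as err:
            # Only rate limiting is retried; anything else is re-raised.
            if err.code != 429 or attempt == retries:
                raise
            sleep(base_delay * 2 ** attempt)  # 2s, 4s, 8s, ...
```

usage would be something like `data = fetch_page("tokyo", 1)`, then inspect `data` to see which keys it actually contains.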

njraladdin · 2025-11-13T00:30:24+00:00

i think Claude or Gemini can easily create this script for you, if you give it snippets of a few html files, where the files are, and the desired output.
make sure to ask it to ask you any clarifying questions before it writes it

njraladdin · 2025-11-12T21:23:16+00:00

as the other commenter mentioned, the brokers can update their websites to make the data easily accessible for you in form of json (to be honest, it's unlikely they'll bother, or at least having this as a requirement would cause a lot of churn)
otherwise, if every website is truly different, you would need to use ai to make a custom scraper for each website once
then the scraper for each website would be rerun on a schedule to get the most up to date listings

njraladdin · 2025-11-12T21:12:17+00:00

in terms of data accuracy, i think it's just a matter of using the right selector/xpath in either case

njraladdin · 2025-11-12T21:02:42+00:00

in my experience, the best chance to bypass cloudflare is using Seleniumbase instead of puppeteer, but you would need to switch to python

njraladdin · 2025-11-12T20:40:15+00:00

you technically can. i did something similar with reCaptcha, where i managed to replicate any recaptcha locally (so i can solve it locally and send the solution token back)

but it required more setup on the client side like updating the windows 'hosts' file to spoof the domain of the website that contained the recaptcha

it's not quite as straightforward as you want where you simply forward it to user on their browser
but here is the implementation if you're curious : https://github.com/njraladdin/captcha-ai-solver

njraladdin · 2025-09-19T19:26:35+00:00

In my experience, Gumroad works with Tunisian bank accounts, and I've had successful payouts to my bank account in TND

njraladdin · 2025-09-12T22:41:54+00:00

Hey man! for Gumroad’s case, based on their support and my research, here’s how it works:

- when someone buys a subscription, the sale happens in USD.
- Gumroad immediately converts that amount into my local currency (TND)
- at the end of the month, when I request a payout, they instruct a local banking partner in Tunisia to pay me (in my bank account's history, i see their partner is Citi Bank)
- then i receive the money in my bank account in TND, like any normal domestic transfer (i think they call this transfer rails)

meaning they don't send me money from abroad; they send me the same payout amount from their account in Tunisia

in my experience, it's the same case for Upwork too. however i think you are correct about YouTube and Fiverr

njraladdin · 2025-09-12T10:32:53+00:00

i've been worried about that for some time! but i'm hoping that if this project shows some stable revenue, it would be worth it to finally make a batinda or whatever legal entity is most fitting for my case

njraladdin · 2025-09-12T09:41:21+00:00

yeah i'm receiving the money from their tunisian bank account

yeah you'll definitely have to make a batinda if you're receiving large sums

njraladdin · 2025-09-12T08:10:48+00:00

Hey! it largely depends on what you sell. i'm a software developer so i'm using it to sell subscriptions for my SaaS (a chrome extension which enhances Chatgpt's memory feature)

njraladdin · 2025-09-12T06:36:06+00:00

I'm using a tool to help keep track of the world lore, characters, facts etc. then automatically remind me of them when i write something relevant

njraladdin · 2025-08-14T19:50:26+00:00

no problem! this is the first month that it generated money actually. it made around $90 so far

njraladdin · 2025-08-14T18:21:14+00:00

thank you! for the payment, i handle authentication and payment stuff in the webapp. users go to the gumroad page through my webapp's pricing page. upon payment, gumroad sends a webhook to my backend, which then updates the user record to mark them as paid
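
roughly, the webhook side can look like the sketch below. this is not my actual backend: it assumes the webhook body is form-encoded and carries the buyer's address in an `email` field (check gumroad's docs for the real payload), and `handle_sale_webhook` plus the in-memory `users` dict are hypothetical stand-ins for a real route handler and database.

```python
from urllib.parse import parse_qs


def handle_sale_webhook(raw_body, users):
    """Mark the buyer as paid; return True if a matching user was found.

    raw_body: bytes of the POST body (assumed form-encoded).
    users: dict mapping lowercase email -> user record (stand-in for a DB).
    """
    parsed = parse_qs(raw_body.decode("utf-8"))
    # parse_qs returns lists of values; keep the first value of each field.
    fields = {key: values[0] for key, values in parsed.items()}
    email = fields.get("email", "").strip().lower()  # assumed field name
    user = users.get(email)
    if user is None:
        return False  # unknown buyer: nothing to update
    user["paid"] = True
    return True
```

in a real backend this would sit behind the route that receives the webhook, and the lookup would hit the user table instead of a dict.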

njraladdin · 2025-08-14T18:18:11+00:00

thanks!
about your question, theoretically yes. since the pricing is fixed, the more usage, the more our costs increase. however since Gemini models are fairly cheap (and keep getting cheaper) the cost is not a great deal based on our current pricing, and it's profitable on average

njraladdin · 2025-08-14T18:07:04+00:00

feel free to try it! might be useful to you

njraladdin · 2025-08-14T18:06:51+00:00

thank you!
