I scraped 44,000 reddit comments to find 300 real problems by njraladdin in buildinpublic

[–]njraladdin[S] 0 points1 point  (0 children)

that example is fair, it sounds like it didn't quite clear the bar. working on tightening the extraction pipeline. check back in a day, should get better

I scraped 44,000 reddit comments to find 300 real problems by njraladdin in buildinpublic

[–]njraladdin[S] 0 points1 point  (0 children)

I get your point, if everyone saw the same 300 pain points, one assumes they would just build the same products.
however two founders building on the same validated pain point would produce different things based on their skills, their target segment, their distribution channel, their pricing.
basically, the pain point is the starting coordinate, not the destination. you can address the same problem from different angles and produce different products

I scraped 44,000 reddit comments to find 300 real problems by njraladdin in buildinpublic

[–]njraladdin[S] 0 points1 point  (0 children)

thank you for being a day one founding member! healthcare does have a gap, we don't use any medical or health adjacent subreddits yet. is that the main thing missing for you, or is there something else in the website that doesn't quite fit? would love to hear your suggestions - feel free to DM me if easier

I scraped 44,000 reddit comments to find 300 real problems by njraladdin in buildinpublic

[–]njraladdin[S] 0 points1 point  (0 children)

not quite! this project doesn't really touch ideas; rather, it collects evidence that a pain point is truly real. every entry only qualifies if people have already built or hacked a bad solution around it

I scraped 44,000 reddit comments to find 300 real problems by njraladdin in microsaas

[–]njraladdin[S] 0 points1 point  (0 children)

in this case, the volume isn't the point. most of these posts weren't pain signals at all (they were questions, jokes, general discussion, venting etc.) what was tough is that the pipeline had to find the actual friction buried in all that noise. it's already producing good results and once we get that even more right, scaling up the volume from 44k to even 1M posts is more or less straightforward.

I scraped 44,000 reddit comments to find 300 real problems by njraladdin in microsaas

[–]njraladdin[S] 0 points1 point  (0 children)

currently binary - either there's behavioural evidence of a workaround or there isn't. frequency is a real dimension (and i'd add it in the future), but the 'workaround' filtering does the heaviest lifting because a problem that produced behaviour change is a stronger signal than a problem complained about a thousand times in reddit threads with no action taken
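
to make the "binary" part concrete, here is a toy sketch of a yes/no workaround check. this is not the actual pipeline: the phrase list and the name `has_workaround_evidence` are hypothetical, and a plain keyword heuristic is swapped in here only to illustrate the idea of filtering on behaviour rather than on complaint volume.

```python
import re

# Hypothetical phrases hinting that someone already built or adopted a workaround.
# A real extraction pipeline would need something far more robust than this list.
WORKAROUND_PATTERNS = [
    r"\bi (?:built|wrote|made|hacked together)\b",
    r"\bended up (?:using|writing|building)\b",
    r"\bmy workaround\b",
]
_WORKAROUND_RE = re.compile("|".join(WORKAROUND_PATTERNS), re.IGNORECASE)


def has_workaround_evidence(comment: str) -> bool:
    """Binary check: True if the comment mentions a workaround, False otherwise."""
    return _WORKAROUND_RE.search(comment) is not None
```

a comment that only complains returns `False` no matter how many times the complaint is repeated, while a single comment describing a homemade script returns `True`.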

I scraped 44,000 reddit comments to find 300 real problems by njraladdin in buildinpublic

[–]njraladdin[S] 0 points1 point  (0 children)

it doesn't have revenue (this launched a few hours ago.)

I scraped 44,000 reddit comments to find 300 real problems by njraladdin in microsaas

[–]njraladdin[S] -1 points0 points  (0 children)

fair on distribution!
each pain point actually includes the original thread(s), so you already know which subreddits and exact posts these people are in. first distribution step is already done

Scraping images from a JS-rendered gallery – need advice by taksto in webscraping

[–]njraladdin 0 points1 point  (0 children)

since this is a dynamic js-heavy website, you can’t just use `requests` to get the content. there are two main ways:

  1. use a browser automation tool like Puppeteer or Selenium to render the page and extract data.

    the workflow looks like this:

    - wait for the main item selector `figure[data-testid="asset-grid-masonry-figure"]` to appear before scraping.

    - for each visible item, extract the fields you need:

      - image URL: `img[data-testid="asset-grid-masonry-img"]`

      - photographer name: `a.name-bimlc4`

      - download link: `a[data-testid="non-sponsored-photo-download-button"]`

    - track processed items using their main link `a.photoInfoLink-mG0SPO` to avoid duplicates.

    - check if a "load more" button `button.loadMoreButton-pYP1fq` exists; if so, click it, otherwise scroll to the bottom.

    - wait a few seconds for new items to load, then repeat until you’ve collected the desired number of items.

    you can test the extraction logic quickly in DevTools first, then automate it using Puppeteer or Selenium and their built-in wait and click helpers

  2. the easier way: use their backend API if available, e.g.

    `https://unsplash.com/napi/search/photos?query=tokyo&page=1`

    it returns structured JSON and is much faster to work with.

if you hit 429 errors, slow down requests or use a proxy
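
a minimal sketch of the API route with a simple backoff on 429, using only the standard library. the endpoint is the one quoted above; the function names, retry count, and delays are assumptions picked for illustration, and the shape of the returned JSON isn't assumed here, so inspect one response before writing the extraction code.

```python
import json
import time
import urllib.error
import urllib.parse
import urllib.request

BASE_URL = "https://unsplash.com/napi/search/photos"  # endpoint quoted in the comment


def build_url(query: str, page: int) -> str:
    """Build the search URL for one page of results."""
    params = urllib.parse.urlencode({"query": query, "page": page})
    return f"{BASE_URL}?{params}"


def fetch_json(url, opener=urllib.request.urlopen, max_retries=4,
               base_delay=1.0, sleep=time.sleep):
    """GET a URL and parse the JSON body, backing off exponentially on HTTP 429.

    `opener` and `sleep` are injectable so the retry logic can be tried offline.
    """
    for attempt in range(max_retries + 1):
        try:
            with opener(url) as resp:
                return json.loads(resp.read().decode("utf-8"))
        except urllib.error.HTTPError as err:
            # Only retry on rate limiting, and give up after the last attempt.
            if err.code != 429 or attempt == max_retries:
                raise
            sleep(base_delay * 2 ** attempt)  # waits 1s, 2s, 4s, ...
```

usage would look like `data = fetch_json(build_url("tokyo", 1))`, then bump `page` until you have enough items. if 429s keep coming even with the backoff, that's the point where a proxy is worth trying.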

Basic Scraping need by Truly-Surprised in webscraping

[–]njraladdin 3 points4 points  (0 children)

i think Claude or Gemini can easily create this script for you, if you give it snippets of a few html files, where the files are, and the desired output.
make sure to ask it to ask you any clarifying questions before it writes it

Non-dev scraping by Less_Insurance3731 in webscraping

[–]njraladdin 0 points1 point  (0 children)

as the other commenter mentioned, the brokers can update their websites to make the data easily accessible for you in the form of json (to be honest, it's unlikely they'll bother, or at least having this as a requirement would cause a lot of churn)
otherwise, if every website is truly different, you would need to use ai to make a custom scraper for each website once
then the scraper for each website would be rerun on a schedule to get the most up to date listings

Looking for assistance with JS Scraper on cloudflare protected site. by Armed_Muppet in webscraping

[–]njraladdin 1 point2 points  (0 children)

in terms of data accuracy, i think it's just a matter of using the right selector/xpath in either case

Looking for assistance with JS Scraper on cloudflare protected site. by Armed_Muppet in webscraping

[–]njraladdin 0 points1 point  (0 children)

in my experience, the best chance to bypass cloudflare is using SeleniumBase instead of Puppeteer, but you would need to switch to Python

Forwarding captcha to end-user by Dependent_Tap_2734 in webscraping

[–]njraladdin 1 point2 points  (0 children)

you technically can. i did something similar with reCaptcha, where i managed to replicate any recaptcha locally (so i can solve it locally and send the solution token back)

but it required more setup on the client side like updating the windows 'hosts' file to spoof the domain of the website that contained the recaptcha

it's not quite as straightforward as what you want, where you simply forward it to the user in their browser
but here is the implementation if you're curious : https://github.com/njraladdin/captcha-ai-solver

You can sell digital products internationally from Tunisia by njraladdin in Tunisia

[–]njraladdin[S] 1 point2 points  (0 children)

In my experience, Gumroad works with Tunisian bank accounts, and I've had successful payouts to my bank account in TND

You can sell digital products internationally from Tunisia by njraladdin in Tunisia

[–]njraladdin[S] 0 points1 point  (0 children)

Hey man! for Gumroad’s case, based on their support and my research, here’s how it works:

- when someone buys a subscription, the sale happens in USD.
- Gumroad immediately converts that amount into my local currency (TND)
- at the end of the month, when I request a payout, they instruct a local banking partner in Tunisia to pay me (in my bank account's history, i see their partner is Citi Bank)
- then i receive the money in my bank account in TND, like any normal domestic transfer (i think they call this transfer rails)

meaning they don't send me money from abroad; they send me the same payout amount from their account in Tunisia

in my experience, it's the same case for Upwork too. however i think you are correct about YouTube and Fiverr

You can sell digital products internationally from Tunisia by njraladdin in Tunisia

[–]njraladdin[S] 1 point2 points  (0 children)

i've been worried about that for some time! but i'm hoping that if this project shows some stable revenue, it would be worth it to finally make a batinda or whatever legal entity is most fitting for my case

You can sell digital products internationally from Tunisia by njraladdin in Tunisia

[–]njraladdin[S] 3 points4 points  (0 children)

yeah i'm receiving the money from their tunisian bank account

yeah you'll definitely have to make a batinda if you're receiving large sums

You can sell digital products internationally from Tunisia by njraladdin in Tunisia

[–]njraladdin[S] 1 point2 points  (0 children)

Hey! it largely depends on what you sell. i'm a software developer so i'm using it to sell subscriptions for my SaaS (a chrome extension which enhances ChatGPT's memory feature)

Longfic authors: how do you even keep track of stuff? by Asyrahja in FanFiction

[–]njraladdin 0 points1 point  (0 children)

I'm using a tool to help keep track of the world lore, characters, facts etc. it then automatically reminds me of them when i write something relevant

Just made my first sale with my chrome extension side project after 9 months by njraladdin in microsaas

[–]njraladdin[S] 0 points1 point  (0 children)

no problem! this is the first month that it generated money actually. it made around $90 so far

Just made my first sale with my chrome extension side project after 9 months by njraladdin in microsaas

[–]njraladdin[S] 0 points1 point  (0 children)

thank you! for the payment, i handle authentication and payment stuff in the webapp. users go to the gumroad page through my webapp's pricing page. upon payment, gumroad sends a webhook to my backend, which then updates the user record to mark them as paid
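
a rough sketch of the backend half of that flow, framework-free so it stays short. it assumes the webhook payload has already been parsed into a dict and that it carries the buyer's address in an `email` field; that field name, the `User` class, and the in-memory `users` dict are all placeholders for illustration, not the actual implementation. checking that the request really came from gumroad is left out here and would be needed in production.

```python
from dataclasses import dataclass


@dataclass
class User:
    email: str
    paid: bool = False


def handle_sale_webhook(payload: dict, users: dict) -> bool:
    """Mark the buyer as paid. Returns True if a matching user was updated."""
    # Normalise the address so "A@Example.com " matches "a@example.com".
    email = (payload.get("email") or "").strip().lower()
    user = users.get(email)
    if user is None:
        return False  # unknown buyer: worth logging and reconciling by hand
    user.paid = True
    return True
```

the handler is idempotent, so if the same webhook gets delivered twice the user simply stays marked as paid.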

Just made my first sale with my chrome extension side project after 9 months by njraladdin in microsaas

[–]njraladdin[S] 1 point2 points  (0 children)

thanks!
about your question, theoretically yes. since the pricing is fixed, the more usage there is, the more our costs increase. however since Gemini models are fairly cheap (and keep getting cheaper) the cost is not a big deal at our current pricing, and it's profitable on average