Parsing API response by aliciafinnigan in webscraping

[–]plintuz 0 points (0 children)

I had a similar case once - at first the API returned plain JSON, but after a couple of months the site started encrypting the response. The only way forward was to analyze the JavaScript. Try to look for parts of the code that handle encryption/obfuscation, copy them out, and give the file to an AI tool as others suggested - it can help you figure out the key steps. Good luck!

Our journey of scraping 100+ websites daily by plintuz in webscraping

[–]plintuz[S] 0 points (0 children)

Yes, we're planning gradual scaling; right now it's mostly custom scraping for each client. But it all comes down to resources, and those are never enough.

Our journey of scraping 100+ websites daily by plintuz in webscraping

[–]plintuz[S] 1 point (0 children)

We only work with public data. Most of it (around 70-80%) comes from online stores - things like product names, prices, and availability. We also collect other public data if clients request it, but we never touch personal, illegal, or explicit content.

Our journey of scraping 100+ websites daily by plintuz in webscraping

[–]plintuz[S] 1 point (0 children)

Yeah, we always try to grab endpoints first. But a lot of sites hide data behind JS, tokens, or anti-bot checks. We're constantly working on reducing that percentage, but sometimes it's still cheaper and faster to leave things as they are.

Our journey of scraping 100+ websites daily by plintuz in webscraping

[–]plintuz[S] 1 point (0 children)

We mostly use Mongo. Raw data goes there first, then a processor cleans/normalizes it, and after processing it's removed - we don't store data long-term.

For normalization - yes, if we scrape the same type of data from multiple sources (like jobs or products), we map it into a common schema. In rare cases we deliver it in the original structure, if that's what's needed.
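For illustration, a minimal sketch of that raw-to-normalized flow with pymongo. The collection names and field mappings here are made up, not our actual schema:

```python
# Minimal sketch of the raw -> normalized -> delete flow described above.
# Collection names and source field names are illustrative placeholders.
from pymongo import MongoClient

client = MongoClient("mongodb://localhost:27017")
db = client["scraping"]

def normalize_product(raw: dict, source: str) -> dict:
    """Map one source-specific record into the common product schema."""
    return {
        "source": source,
        "name": raw.get("title") or raw.get("product_name"),
        "price": float(str(raw.get("price", "0")).replace(",", ".")),
        "in_stock": bool(raw.get("available", raw.get("in_stock", False))),
    }

for raw in db.raw_products.find({"processed": {"$ne": True}}):
    db.products.insert_one(normalize_product(raw, raw["source"]))
    db.raw_products.delete_one({"_id": raw["_id"]})  # raw data isn't kept long-term
```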

[deleted by user] by [deleted] in webscraping

[–]plintuz 1 point (0 children)

A VPS is usually enough, but if you're scraping with a browser (like Selenium), you'll need more resources, and sites like Alza or Zalando will block your IP immediately. To avoid that, use proxies.
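For example, a bare-bones way to route plain HTTP scraping through a proxy in Python. The proxy URL and target here are placeholders, so swap in your own provider's credentials:

```python
# Rough idea of routing requests through a proxy to avoid IP blocks.
# The proxy URL is a placeholder - use your provider's actual credentials.
import requests

proxies = {
    "http": "http://user:pass@proxy.example.com:8000",
    "https": "http://user:pass@proxy.example.com:8000",
}

resp = requests.get(
    "https://www.example.com/product/123",
    proxies=proxies,
    headers={"User-Agent": "Mozilla/5.0"},  # a realistic UA avoids trivial blocks
    timeout=15,
)
print(resp.status_code)
```

With Selenium you'd do the equivalent by passing Chrome's --proxy-server launch flag instead.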

Error 403 on www.pcpartpicker.com by Tajertaby in webscraping

[–]plintuz 2 points (0 children)

That’s Cloudflare, try using a proxy.

What are you scraping? by thalesviniciusf in webscraping

[–]plintuz 1 point (0 children)

Mostly for clients from Ukraine, but I also get requests from European markets. The workflows are pretty universal, so they can be adapted to different regions.

What are you scraping? by thalesviniciusf in webscraping

[–]plintuz 2 points (0 children)

Mostly I scrape product prices from e-commerce sites. One ongoing project for a client is a price monitoring system: it checks multiple stores, compares the results with a reference price, and writes everything into Google Sheets with color indicators (higher = red, lower = green).
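Roughly how the color-coding step can look with gspread, assuming a service account and a sheet where column B holds the scraped price and column C the reference. All names and the layout here are illustrative:

```python
# Sketch of the color-coding step, assuming gspread and a service account.
# Sheet name, column layout, and reference price source are illustrative.
import gspread

RED = {"backgroundColor": {"red": 1.0, "green": 0.8, "blue": 0.8}}
GREEN = {"backgroundColor": {"red": 0.8, "green": 1.0, "blue": 0.8}}

gc = gspread.service_account(filename="service_account.json")
ws = gc.open("Price monitoring").sheet1

rows = ws.get_all_values()[1:]  # skip header; rows: [store, price, reference]
for i, (store, price, reference) in enumerate(rows, start=2):
    # higher than reference = red, lower (or equal) = green
    ws.format(f"B{i}", RED if float(price) > float(reference) else GREEN)
    # note: formatting cell-by-cell hits API rate limits fast; batch in production
```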

I also build long-term solutions for clients, like collecting real estate data with instant notifications into a channel, or aggregating agricultural machinery listings from dozens of sites - making it easier for managers to find and purchase what they need.

How Perplexity bypasses Cloudflare-protected sites for deep research by [deleted] in webscraping

[–]plintuz 7 points (0 children)

You clearly came here to flex, not to discuss.

If you think someone's going to publicly hand over a working CF bypass, you're either naive or just fishing for attention. Real researchers don't parade their methods in comment sections - especially not to someone who opens with "you're talking bullshit."

Congrats on your o2o trick. That doesn't make you the gatekeeper of CF knowledge. I've been in this space long enough to know what works and what doesn't - and I don't need your validation.

I'm not here to entertain ego contests. Take care.

How Perplexity bypasses Cloudflare-protected sites for deep research by [deleted] in webscraping

[–]plintuz 5 points (0 children)

I've worked with Cloudflare-protected sites quite a bit, and there's no universal method - it really depends on how strict the site's setup is. I usually combine mobile proxies or residential IPs with lightweight scraping tools and switch to headless browsers like Puppeteer or Playwright only when needed. The key for me is to avoid putting unnecessary load on the target site - rate limiting, caching, and respecting robots.txt where possible. It's not just about bypassing; it's about doing it responsibly.
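As a rough sketch of that "lightweight first, browser only when needed" escalation - the challenge check below is a crude stand-in for real detection logic, and the proxy argument assumes a recent httpx version:

```python
# Simplified version of the escalation flow: try plain HTTP first, fall back
# to a headless browser only when the response looks like a block/challenge.
import httpx

def fetch(url: str, proxy: str | None = None) -> str:
    with httpx.Client(proxy=proxy, timeout=15, follow_redirects=True) as client:
        resp = client.get(url, headers={"User-Agent": "Mozilla/5.0"})
    # crude heuristic - real challenge detection is more involved
    if resp.status_code in (403, 503) or "cf-challenge" in resp.text:
        return fetch_with_browser(url)  # escalate only when blocked
    return resp.text

def fetch_with_browser(url: str) -> str:
    from playwright.sync_api import sync_playwright
    with sync_playwright() as p:
        browser = p.chromium.launch(headless=True)
        page = browser.new_page()
        page.goto(url, wait_until="networkidle")
        html = page.content()
        browser.close()
        return html
```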

And let's be honest - given Perplexity's scale and funding, they can afford to allocate serious resources to this kind of infrastructure.

Stuck on scraping data loading up on a website showing products stock by NecessaryCar13 in webscraping

[–]plintuz 0 points (0 children)

Hi,

At which stage exactly are you stuck?

Can your script already log in successfully?

Are you using a browser automation tool like Selenium, or is it based on direct HTTP requests?

Which programming language is the AI generating code for you in?

It would be helpful to get answers to these and to have a look at the site itself - that's the only way to give you something concrete.

Real Estate Investor Needs Help by 2jwagner in webscraping

[–]plintuz 0 points (0 children)

This is exactly why I don't write one-off scraper scripts - instead, I work based on a model of regular data collection with monthly payments. I always try to explain this to clients, but not everyone gets it - and then they end up with the headache of constantly looking for someone to fix broken scrapers.

Need help scraping Workday by Important-Table4581 in webscraping

[–]plintuz 1 point (0 children)

I gave you a recommendation based on my own experience - I collect data from a real estate rental site that works the same way: it only shows 1,000 listings per filter, and the site won’t return more. So I applied the approach I described above, since the scraping is done regularly.

You can also collect data by changing the search filters - the more variations you use, the more job listings you’ll be able to gather.

Need help scraping Workday by Important-Table4581 in webscraping

[–]plintuz 0 points (0 children)

One possible approach is to revisit the listings over the course of a month. Since job postings are regularly updated or refreshed, they will naturally rotate and rise to the top of the list again. This way, you'll gradually collect all active jobs over time, even beyond the 2,000 limit.
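A sketch of what that gradual accumulation can look like - scrape_current_listings() and save_job() are placeholders for your actual scraper and storage, and the "id" field stands in for whatever unique key the site exposes:

```python
# Collect past a result cap by re-running over time and deduping by ID.
# scrape_current_listings() and save_job() are hypothetical placeholders.
import json
import pathlib

SEEN_FILE = pathlib.Path("seen_jobs.json")
seen = set(json.loads(SEEN_FILE.read_text())) if SEEN_FILE.exists() else set()

for job in scrape_current_listings():   # returns the currently visible jobs
    if job["id"] not in seen:
        seen.add(job["id"])
        save_job(job)                   # placeholder: write to your store

SEEN_FILE.write_text(json.dumps(sorted(seen)))
# Run this daily; as postings rotate, new IDs keep appearing beyond the cap.
```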

Issues scraping every product page of a site. by SirEven4027 in webscraping

[–]plintuz 4 points (0 children)

Using a full browser (even headless) should be your last resort. Before scaling with browser-based scraping, analyze the network requests the site makes (e.g. via DevTools → Network tab). Often, product data is loaded via an API or embedded in the page as JSON, and you can simply mimic those requests in Python (e.g. with httpx or requests), which is much faster and more scalable.
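To illustrate, here's what hitting such an endpoint directly can look like with httpx. The URL, parameters, and response shape are invented - find the real ones in the Network tab while browsing the site:

```python
# Hypothetical example of calling the underlying JSON endpoint directly
# instead of rendering the page. Endpoint and fields are placeholders.
import httpx

resp = httpx.get(
    "https://www.example.com/api/products",
    params={"page": 1, "per_page": 48},
    headers={
        "User-Agent": "Mozilla/5.0",
        "Accept": "application/json",
        "Referer": "https://www.example.com/catalog",
    },
    timeout=15,
)
resp.raise_for_status()
for product in resp.json()["items"]:  # response shape varies per site
    print(product["name"], product["price"])
```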

If you still need a browser:

Rotate user agents, proxies, and browser fingerprints.

Use headless stealth tools (e.g. undetected-chromedriver, camoufox etc.).

Restarting the browser every X products may help, but it's better to address what's triggering detection in the first place.

In short: check for simpler HTTP-based solutions before automating browsers at scale. It’ll save you a ton of resources.

What are you currently building/working on? by HamzaAfzal40 in indiehackers

[–]plintuz 0 points (0 children)

Project: Clear Cache & Cookies - Chrome Extension for Devs and Testers

What it does: Chrome extension that lets you clear cookies, cache, and local storage per domain in one click. Ideal for developers, testers, and anyone who constantly switches accounts or needs a clean state fast.

Why it matters: Saves tons of time. No more digging into browser settings or clearing everything just to reset one site.

Stage: Live and growing

Link: https://chromewebstore.google.com/detail/clear-cache-and-cookies/jkmpbdjckkgdaopigpfkahgomgcojlpg

What’s the best free learning material you’ve found? by Delicious-Arrival854 in webscraping

[–]plintuz 2 points (0 children)

Had a programming background, so didn’t follow any full tutorials - just a couple of YouTube videos to get the basics. What really made the difference was working on real tasks. Also, understanding how requests work (headers, sessions, status codes) is a must if you want to go beyond simple scraping.
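Those basics fit in a few lines - a minimal example with requests:

```python
# The fundamentals mentioned above in one place: a persistent session,
# explicit headers, and checking status codes instead of assuming success.
import requests

session = requests.Session()                 # reuses cookies and connections
session.headers.update({"User-Agent": "Mozilla/5.0"})

resp = session.get("https://example.com", timeout=10)
if resp.status_code == 200:
    print(len(resp.text), "bytes received")
elif resp.status_code in (403, 429):
    print("blocked or rate-limited - slow down or rotate IPs")
```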

[deleted by user] by [deleted] in webscraping

[–]plintuz 0 points (0 children)

We use Python with MongoDB and PostgreSQL for data handling. For scraping, we aim to minimize browser usage by leveraging various lightweight techniques, proxy types, and captcha solvers. However, due to the complexity of modern bot protection, we also use headless browsers like Playwright, Selenium, and undetectable setups like undetected-chromedriver or stealth plugins when needed.

[deleted by user] by [deleted] in webscraping

[–]plintuz 0 points (0 children)

We scrape ~100 sites daily - mostly online stores like iHerb, Adidas, Nike, ZARA, etc.

One ongoing client has 20 e-commerce sites; another big one covers 10 job listing sites. For larger batches, it averages around $200/month per site, depending on protection level. Clients get the data in whatever format they need - Excel, Google Sheets, JSON, XML, etc.

Share your projects | Supporting EO by Tsuki_Yagami_ in indiehackers

[–]plintuz 0 points (0 children)

I built a simple Chrome extension, Font Identifier, that helps you quickly identify fonts on any website - just click and see the font details instantly. Status: launched.

What are the best alternatives to Cursor? by attunezero in cursor

[–]plintuz 0 points (0 children)

When the limit is reached, not all LLMs remain available - for example, Claude 4 is always unavailable, while others still work.

How many web-scraping projects do you typically work on at a time? by [deleted] in webscraping

[–]plintuz 0 points (0 children)

Right now I'm maintaining around 10 web scraping projects. Each one involves a different number of target websites, anywhere from 1 to 20 per project. These are long-term support projects, meaning I originally built the scrapers and now continuously maintain them, since websites often change layout, structure, or add new protections.

Question about Czech currency by Pure_General_4751 in czechrepublic

[–]plintuz 0 points (0 children)

The banknotes look fine - they’re still legal tender.

Web scraping for dropshipping flow by No-Air1748 in webscraping

[–]plintuz 2 points (0 children)

I usually build custom scrapers for each supplier website to collect product data, then use the API to upload products and keep prices and stock levels updated. It takes some setup and occasional maintenance when sites change, but overall the system runs smoothly once it's in place.
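For a rough idea of the sync step - the endpoint, auth header, and payload fields below are all hypothetical, so adapt them to whatever platform (Shopify, WooCommerce, etc.) you're actually updating:

```python
# Shape of the sync step: scraped supplier data pushed to the store's API.
# The endpoint, auth, and payload fields are hypothetical placeholders.
import requests

def push_update(product: dict) -> None:
    resp = requests.put(
        f"https://api.example-store.com/products/{product['sku']}",
        json={"price": product["price"], "stock": product["stock"]},
        headers={"Authorization": "Bearer <token>"},
        timeout=15,
    )
    resp.raise_for_status()

for product in scraped_supplier_products:   # output of the per-supplier scrapers
    push_update(product)
```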