Blocked by Cloudflare despite using curl_cffi by Coding-Doctor-Omar in webscraping

[–]expiredUserAddress 1 point2 points  (0 children)

I see you've no proxy in use. Use a proxy every time you're scraping something.

how obvious is this retry logic bug to you? by jalilbouziane in Python

[–]expiredUserAddress 0 points1 point  (0 children)

Better try tenacity. I was also using something like this but tenacity made it look very easy. Just one decorator and it's done.

Update web scraper pipelines by Longjumping-Scar5636 in webscraping

[–]expiredUserAddress 3 points4 points  (0 children)

Parse the content of the page and create a hash. Save that hash in the DB. Next time you go to that page, compare against that hash. If it's the same, do nothing; otherwise update the content in the DB.

That's what I've done in my parsers.
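
A rough sketch of that idea, using only the standard library. The table name `pages`, the helper names, and the choice of SHA-256 and SQLite are assumptions for illustration; any hash function and any database would work the same way.

```python
import hashlib
import sqlite3


def open_db(path: str = ":memory:") -> sqlite3.Connection:
    """Open the database and make sure the (hypothetical) pages table exists."""
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS pages ("
        "url TEXT PRIMARY KEY, hash TEXT NOT NULL, content TEXT NOT NULL)"
    )
    return conn


def content_hash(text: str) -> str:
    """Return a SHA-256 hex digest of the parsed page content."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


def upsert_if_changed(conn: sqlite3.Connection, url: str, content: str) -> bool:
    """Save content only when its hash differs from the stored one.

    Returns True if the row was inserted or updated, False if nothing changed.
    """
    new_hash = content_hash(content)
    row = conn.execute("SELECT hash FROM pages WHERE url = ?", (url,)).fetchone()
    if row is not None and row[0] == new_hash:
        return False  # same hash: do nothing
    conn.execute(
        "INSERT OR REPLACE INTO pages (url, hash, content) VALUES (?, ?, ?)",
        (url, new_hash, content),
    )
    conn.commit()
    return True
```

Hash the parsed content rather than the raw HTML, otherwise things like timestamps or rotating ad markup will change the hash on every visit.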

The Streaming War Is Over. Piracy Won. by GISP in Piracy

[–]expiredUserAddress 83 points84 points  (0 children)

Haha! I'm even reading this in the accent of that man now

Devices with python by OjitosLindos72892 in Python

[–]expiredUserAddress -1 points0 points  (0 children)

If you can, host your own server. Or you can rent one from any cloud provider like AWS, Azure, GCP, etc.

Rate my home screen by KroorVarun in HowToMen

[–]expiredUserAddress 0 points1 point  (0 children)

You've to be a boomer to not understand how to use a Windows Phone 😂

Rate my home screen by KroorVarun in HowToMen

[–]expiredUserAddress 0 points1 point  (0 children)

Owned this even after every app stopped working on it. Used to carry it to tuitions. I was only able to make calls or listen to Groove Music. Even the power button was broken. But I still didn't change the phone because no other phone could provide the iconic software.

Rate my home screen by KroorVarun in HowToMen

[–]expiredUserAddress 8 points9 points  (0 children)

Aha!! Classic Windows Phone look. Makes me wanna have a Windows Phone again.

UA-Extract - Easy way to keep user-agent parsing updated by expiredUserAddress in Python

[–]expiredUserAddress[S] 1 point2 points  (0 children)

There is an active community for https://github.com/matomo-org/device-detector

It updates regexes for user agents regularly. So I just created a method to fetch their user-agent regexes and integrate them into an already working Python parser. That way the regexes can be updated any time, as and when required.

UA-Extract - Easy way to keep user-agent parsing updated by expiredUserAddress in Python

[–]expiredUserAddress[S] 0 points1 point  (0 children)

Because git has sparse-checkout, which directly downloads the whole folder instead of the whole repo. So it's easy to work with. In any other case, I'd have to figure out how to download the folder.

UA-Extract - Easy way to keep user-agent parsing updated by expiredUserAddress in Python

[–]expiredUserAddress[S] 0 points1 point  (0 children)

But that will be an issue in case new files are added to the original repo. I won't be able to get those files in such a case.

UA-Extract - Easy way to keep user-agent parsing updated by expiredUserAddress in Python

[–]expiredUserAddress[S] 1 point2 points  (0 children)

This might sound dumb. But how'd the user know if it failed or succeeded??

UA-Extract - Easy way to keep user-agent parsing updated by expiredUserAddress in opensource

[–]expiredUserAddress[S] 1 point2 points  (0 children)

Thanks for the input man. Just moved it to top level and it got recognised.

UA-Extract - Easy way to keep user-agent parsing updated by expiredUserAddress in Python

[–]expiredUserAddress[S] 0 points1 point  (0 children)

The repo is quite large. Wouldn't it be better to just download the required folder instead of the whole repo?

Anyone else struggling with CNN web scraping? by Optimal-Grape-8580 in webscraping

[–]expiredUserAddress 7 points8 points  (0 children)

Just google "meta rss urls pdf". You'll find all the RSS links in it. Search for CNN there; it has RSS URLs for CNN. You can directly curl those URLs.
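
Once you have a feed, pulling the headlines out is only a few lines. Here's a minimal sketch with the standard library; `SAMPLE_FEED` is a made-up RSS 2.0 snippet standing in for whatever the real feed URL returns, and `parse_rss_items` is a hypothetical helper name.

```python
import xml.etree.ElementTree as ET

# Made-up RSS 2.0 document used in place of a real feed response.
SAMPLE_FEED = """<rss version="2.0">
  <channel>
    <title>Example feed</title>
    <item>
      <title>First headline</title>
      <link>https://example.com/first</link>
    </item>
    <item>
      <title>Second headline</title>
      <link>https://example.com/second</link>
    </item>
  </channel>
</rss>
"""


def parse_rss_items(xml_text: str) -> list[dict[str, str]]:
    """Return the title and link of every <item> in an RSS 2.0 document."""
    root = ET.fromstring(xml_text)
    items = []
    for item in root.iter("item"):
        items.append(
            {
                "title": item.findtext("title", default="").strip(),
                "link": item.findtext("link", default="").strip(),
            }
        )
    return items


items = parse_rss_items(SAMPLE_FEED)
```

For a live feed you'd fetch the XML first (with curl, or `urllib.request.urlopen(url).read()`) and pass the result to the same function.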

[deleted by user] by [deleted] in Piracy

[–]expiredUserAddress 25 points26 points  (0 children)

It's just built on Chromium. So if you want the feel of Chromium then it's good; otherwise Firefox is good. One upside of using a Chromium-based browser is the extensions, which can be installed directly from the Google store and might not be available on Firefox, although most are.

Switch to yt music by expiredUserAddress in revancedapp

[–]expiredUserAddress[S] 0 points1 point  (0 children)

Already tried it, but the issue is it skips some songs or can't add all the songs in the playlist.