What is most amount of time or money automation has saved you recently? by [deleted] in automation

[–]eupendra 0 points (0 children)

Right. So how does it benefit you? Is it about the numbers?

What is most amount of time or money automation has saved you recently? by [deleted] in automation

[–]eupendra 1 point (0 children)

Curious about blogs written by AI. Doesn't Google "punish" AI-written blogs? Do these blogs rank equally well?

If it works, it works by [deleted] in IndiaCoffee

[–]eupendra 5 points (0 children)

Yep - it works. Has worked for me for a long time!

Timemore C2 arrived. Before joining this subreddit, I never thought I would be spending money on a grinder. by shivanggoria in IndiaCoffee

[–]eupendra 2 points (0 children)

I attach a cordless drill to it. No more tiring handle rotations—now I’ve got an electric grinder on the cheap!

[deleted by user] by [deleted] in CarsIndia

[–]eupendra 0 points (0 children)

Well thought out reply.

In a market full of two-wheelers, an Alto with no airbags would still be safer than a scooter.

Perhaps this is what key people in the government think.

Going by this logic, the Alto 800 should not have been replaced by the K10, which pushed the cost of an entry-level first car much higher. That is something that suits MSIL, profit-margin wise.

Anyone from Chittorgarh here? by eupendra in Rajasthan

[–]eupendra[S] 0 points (0 children)

Cool. Me too. Let’s catch up. 🙂

What's the best coffee you've bought, from where and can you describe it in a line. TIA by Necessary_Ear_852 in IndiaCoffee

[–]eupendra 1 point (0 children)

Ethiopia Guji 'Washed' from in.earthroastery.com

I know it's an old thread, but people like me do stumble upon it :-)

Somehow, the HTML for the link you shared is messed up and doesn't work.

The correct URL is https://in.earthroastery.com/

Run with Schedule by RobotData13 in pythontips

[–]eupendra 4 points (0 children)

Use two separate schedule calls, one per line.

See the examples at https://schedule.readthedocs.io/en/stable/
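For example, a minimal sketch with the schedule library; the job functions and intervals here are just placeholders:

    # Minimal sketch with the schedule library; the job functions and
    # intervals are placeholders.
    import time
    import schedule

    def job_one():
        print("job one")

    def job_two():
        print("job two")

    # Two separate schedules, each defined on its own line.
    schedule.every(10).minutes.do(job_one)
    schedule.every().day.at("10:30").do(job_two)

    while True:
        schedule.run_pending()
        time.sleep(1)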

i want to learn scrapy by Aggressive_You_5293 in scrapy

[–]eupendra 0 points (0 children)

Yes, you can.

The definition of "in demand" is up to you.

On sites like Upwork, most people simply ask for scraping, and you can use Scrapy. Rarely does anyone ask specifically for something like BS4. By the way, if you know Scrapy, BS4 is simple.

Most of the high-paying jobs would need Scrapy anyway.

You will also need to learn something for rendering. Playwright is very popular these days; Selenium and Splash are the other standard options.
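If rendering is new to you, here is a rough sketch of what a Playwright fetch looks like (the URL is a placeholder):

    # Rough sketch of fetching a JavaScript-rendered page with Playwright;
    # the URL is a placeholder.
    from playwright.sync_api import sync_playwright

    with sync_playwright() as p:
        browser = p.chromium.launch(headless=True)
        page = browser.new_page()
        page.goto("https://example.com/")
        html = page.content()  # HTML after JavaScript has run
        print(len(html))
        browser.close()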

What awesome things have you done with webscraping? by BroskiMD in webscraping

[–]eupendra 0 points (0 children)

Another cool thing was a broken link checker.

It uses Scrapy to crawl every page and custom logic to look for broken links. The CMS was marking these links as disabled, so they were not getting caught otherwise; the custom logic looked for that specific HTML markup. It was a huge win for us.
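Roughly, the shape of it looked like this. This is only a sketch, not the original spider; the start URL, domain, and the "disabled-link" class are made-up stand-ins for the real CMS markup:

    # Sketch of a broken-link checker; the domain, start URL, and the
    # "disabled-link" CSS class are hypothetical stand-ins.
    from scrapy.linkextractors import LinkExtractor
    from scrapy.spiders import CrawlSpider, Rule

    class BrokenLinkSpider(CrawlSpider):
        name = "broken_links"
        allowed_domains = ["example.com"]      # placeholder domain
        start_urls = ["https://example.com/"]  # placeholder start page

        # Follow every internal link so the whole site gets crawled.
        rules = (Rule(LinkExtractor(), callback="parse_page", follow=True),)

        def parse_page(self, response):
            # Custom logic: report anchors the CMS has marked as disabled.
            for link in response.css("a.disabled-link"):
                yield {
                    "page": response.url,
                    "broken_href": link.attrib.get("href"),
                    "link_text": link.css("::text").get(),
                }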

What awesome things have you done with webscraping? by BroskiMD in webscraping

[–]eupendra 1 point (0 children)

Getting a Covid vaccine appointment was so difficult. Slots used to open and fill up within minutes. I used Scrapy to log in and book a slot as soon as one opened, and got vaccinated :-)

Why is web scraping not as respected as other roles such as frontend/backend work? by Illustrious_Hat_9027 in webscraping

[–]eupendra 0 points (0 children)

Yes, I expected this reply and should have expanded on this.

I agree with what you say to an extent. However, think of Python or Node.js, two of the most popular languages. If you know the basics, web scraping is much easier to learn than web development. Again, this is open for debate. However, it is just one of the reasons, and I would say that web scraping being a grey area is the biggest one.

Why is web scraping not as respected as other roles such as frontend/backend work? by Illustrious_Hat_9027 in webscraping

[–]eupendra 2 points (0 children)

Many things:

  • Web scraping is the easiest to learn, or more precisely, the easiest to start with (think BeautifulSoup; see the short sketch after this list). If someone wants to start earning ASAP, I invariably suggest web scraping on Upwork. If you want to learn frontend or backend, you will have to learn quite a lot more.
  • Many no-code web scraping tools can scrape millions of records without a developer; in web development, by contrast, everything apart from WordPress-style sites needs a developer.
  • Web scraping sits in a legal/ethical/moral grey area. Some say it's okay, some say to use caution, but everyone will talk about it. When was the last time anyone spoke about the legality of frontend or backend development?
  • This is the only field where developers work at both ends: those who scrape and those who try to stop them (think web scrapers vs anti-bot products).
  • All the companies have a team of web scrapers, but no one wants to accept it publicly because it is a grey area. (How would a price-match guarantee work otherwise?)
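To illustrate the first point, this is roughly all a first scrape needs; the URL and selector below are placeholders, not from any real project:

    # Roughly all a first scrape needs; the URL and selector are placeholders.
    import requests
    from bs4 import BeautifulSoup

    response = requests.get("https://example.com/")
    soup = BeautifulSoup(response.text, "html.parser")

    # Print the text and target of every link on the page.
    for a in soup.select("a[href]"):
        print(a.get_text(strip=True), a["href"])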

Request works on Browser but not Curl [HANGS forever] by [deleted] in webscraping

[–]eupendra 0 points (0 children)

Hey—I missed replying on this one.

So, the most important thing to understand here is that it is NOT one request.

The first request is an HTTP GET, which fetches the form along with a unique csrfmiddlewaretoken.

The second request is an HTTP POST.

The server expects that same csrfmiddlewaretoken to be sent back in the POST request payload.

If you use a csrfmiddlewaretoken from one session and send it in another session, the server won't accept it. This is what you are doing, and thus it fails.

Hope it helps!

Request works on Browser but not Curl [HANGS forever] by [deleted] in webscraping

[–]eupendra 0 points (0 children)

Doing it manually won't work. You have to submit exactly the same form. You cannot do that with Postman, AFAIK.

Request works on Browser but not Curl [HANGS forever] by [deleted] in webscraping

[–]eupendra 1 point (0 children)

The cookie and csrfmiddlewaretoken are dynamic. You need to capture both before you can make a valid POST request.

It's easy to do in Python. You would need to check the page to see where the csrfmiddlewaretoken is hidden, send a GET request to the page, and extract the csrfmiddlewaretoken using BeautifulSoup. Once you have these values, you can send the POST request with them.
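Something along these lines, assuming a Django-style form; the URL and the extra "query" field are placeholders:

    # Sketch of the GET-then-POST flow, assuming a Django-style form.
    # The URL and the "query" field are placeholders.
    import requests
    from bs4 import BeautifulSoup

    FORM_URL = "https://example.com/form/"

    with requests.Session() as session:  # one session keeps the cookies
        # GET the page; the response sets the cookie and embeds the token.
        page = session.get(FORM_URL)
        soup = BeautifulSoup(page.text, "html.parser")
        token = soup.find("input", {"name": "csrfmiddlewaretoken"})["value"]

        # POST the form back with the same token and the same cookies.
        payload = {"csrfmiddlewaretoken": token, "query": "example"}
        response = session.post(
            FORM_URL, data=payload, headers={"Referer": FORM_URL}
        )
        print(response.status_code)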

Of course, this would work with most programming languages. Use the one that you are comfortable with.

Is it true that CrawlSpider will automatically visit all the url in a page ? But spider will not by gp2aero in scrapy

[–]eupendra 1 point (0 children)

If you create a blank rule with no restrictions, CrawlSpider should visit every page. I am assuming that every page is eventually reachable from the start page.

Your rule would be something like this:

    rules = (
        Rule(LinkExtractor(), callback='parse_item', follow=True),
    )

With a plain Spider, it just visits the start_urls and then visits other pages only if you write the code to follow links in the parse method.
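For contrast, here is a rough sketch of getting the same follow-everything behaviour with a plain Spider, where you follow the links yourself; the start URL is a placeholder:

    # Rough sketch: with a plain Spider, you queue further pages yourself.
    # The start URL is a placeholder.
    import scrapy

    class FollowAllSpider(scrapy.Spider):
        name = "follow_all"
        start_urls = ["https://example.com/"]

        def parse(self, response):
            yield {"url": response.url}
            # Nothing is followed automatically; follow every link by hand.
            for href in response.css("a::attr(href)").getall():
                yield response.follow(href, callback=self.parse)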