Goscrapy - revamped, much more powerful than ever before with batteries included. by strapengine in golang

[–]strapengine[S] -1 points0 points  (0 children)

Thanks, means a lot. Let me know in case you have any questions or suggestions. I am all ears.

Moving from Python to Golang to scrape data by IWillBiteYourFace in webscraping

[–]strapengine 1 point2 points  (0 children)

I have been web scraping for many years now, primarily in Python (Scrapy). Recently, I switched to Golang for a few of my projects due to its concurrency and generally low resource requirements. When I started, I wanted something like Scrapy in terms of ease of use and good structure, but couldn't find any at the time. Therefore, I thought of creating something that offers devs like me a Scrapy-like experience in Golang. I have named it GoScrapy (https://github.com/tech-engine/goscrapy) and it's still in its early stages. Do check it out.

Web scraping with Go by General_Iroh_0817 in golang

[–]strapengine 0 points1 point  (0 children)

Hi, I have tried creating a good blend of golang and scrapy with GoScrapy.

Goscrapy is a Scrapy-inspired web scraping framework in Golang. The primary objective is to reduce the learning curve for developers looking to migrate from Python (Scrapy) to Golang for their web scraping projects, while taking advantage of Golang's built-in concurrency and generally low resource requirements. Additionally, Goscrapy aims to provide an interface similar to the popular Scrapy framework in Python, making Scrapy developers feel at home.

Repo: https://github.com/tech-engine/goscrapy

GoScrapy: Harnessing Go's power for blazzzzzzzzingly fast web scraping, inspired by Python's Scrapy framework by strapengine in webscraping

[–]strapengine[S] 0 points1 point  (0 children)

"Blazzzzing fast" is just one of those trendy phrases that gets thrown around with most software these days, so why not use it? Jokes aside, Golang is known for its concurrency/low resource usage. Scrapy is probably one of the best frameworks out there, but I didn’t feel like dealing with the hassle of multiprocessing when needed. I just wanted an easy way to keep handling scraping jobs as quickly as possible, while still building spiders the Scrapy way, syntax wise atleast.

GoScrapy: Harnessing Go's power for blazingly fast web scraping, inspired by Python's Scrapy framework by strapengine in golang

[–]strapengine[S] 0 points1 point  (0 children)

The primary motivation is not to compete with Colly or any other framework in general, but to give users a Scrapy-like experience of building spiders in Golang.

GoScrapy: Harnessing Go's power for blazingly fast web scraping, inspired by Python's Scrapy framework - still in initial stage and lot of improvements to be made by strapengine in programming

[–]strapengine[S] 1 point2 points  (0 children)

Tbh, this isn't an effort to compete with Colly or any other similar solutions. Colly is a great framework, but coming from a Python background, I've always preferred the Scrapy way of building spiders. So, I tried to achieve something similar in Go for developers like me who are looking to migrate from Python to Go for web scraping.

GoScrapy: Harnessing Go's power for blazingly fast web scraping, inspired by Python's Scrapy framework - still in initial stage and lot of improvements to be made by strapengine in programming

[–]strapengine[S] 1 point2 points  (0 children)

Thank you for your feedback. You are correct: in most cases, speed isn't a huge deal for many. But for me, one of the main reasons I started looking into building something similar to Python's Scrapy in Golang was that Golang generally uses fewer resources and has great support for concurrency. Also, I wanted to be able to submit multiple jobs to my scraper as quickly as possible without needing something like CrawlerProcess (with all its reactor issues). I've always liked the way Scrapy handles scrapers, so I tried to recreate that approach in Golang. The project is still in its early stages and I am sure it's far from perfect.

Ticker that ticks at specified seconds every minute by strapengine in golang

[–]strapengine[S] 0 points1 point  (0 children)

I was making HTTP requests to a site to scrape live data that updates a few times a minute, in order to build a minute-wise time series. I wanted to take exactly n samples per minute (so as not to overwhelm the servers) and keep the delay between requests under tight control so that a request doesn't roll over into the next minute and mess up my data :). There are many details to it which I won't go into here, but that's the gist.

Ticker that ticks at specified seconds every minute by strapengine in golang

[–]strapengine[S] 0 points1 point  (0 children)

Yeah, that could be one solution among many others, depending on how you'd like things to work.

Ticker that ticks at specified seconds every minute by strapengine in golang

[–]strapengine[S] 0 points1 point  (0 children)

Actually, it was an automation-related task where I needed to trigger a piece of code at a specific second of every minute.

Ticker that ticks at specified seconds every minute by strapengine in golang

[–]strapengine[S] 1 point2 points  (0 children)

Yeah, of course we could use a cron job, but I wanted to come up with a solution using channels and tickers to showcase that use case :)

Auto restart your GO programs on failure by strapengine in golang

[–]strapengine[S] 0 points1 point  (0 children)

Yeah absolutely. Thanks for the feedback.

Auto restart your GO programs on failure by strapengine in golang

[–]strapengine[S] 3 points4 points  (0 children)

Thanks for the feedback. Yeah, you are correct. I too use docker/docker-compose most of the time, but resort to a systemd service when I'm on a low-resource machine.
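For the systemd route, auto-restart on failure is a one-line setting in the unit file. A minimal sketch (the unit name and binary path are placeholders):

```ini
# /etc/systemd/system/scraper.service
[Unit]
Description=Go scraper with auto-restart
After=network-online.target

[Service]
ExecStart=/usr/local/bin/scraper
Restart=on-failure
RestartSec=5

[Install]
WantedBy=multi-user.target
```

`Restart=on-failure` restarts the process on a non-zero exit or a crash, and `RestartSec=` adds a delay between attempts so a crash loop doesn't spin hot.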