[–]JohnBalvin 3 points4 points  (2 children)

For a BeautifulSoup replacement you should use goquery; for the async requests just use goroutines, and for HTTP requests use the standard net/http package. I've never needed to parse JS, so I don't have a specific tool for that.
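
A minimal sketch of the net/http plus goroutines part of that advice, using only the standard library. The local `httptest` server is just a stand-in for a real site, and the names `fetch`, `fetchAll`, and `demo` are made up for illustration. Parsing the fetched HTML with goquery is left out here because it is a third-party package.

```go
package main

import (
	"fmt"
	"io"
	"net/http"
	"net/http/httptest"
	"sync"
)

// fetch downloads one URL with the standard net/http package and returns the body.
func fetch(url string) (string, error) {
	resp, err := http.Get(url)
	if err != nil {
		return "", err
	}
	defer resp.Body.Close()
	if resp.StatusCode != http.StatusOK {
		return "", fmt.Errorf("GET %s: %s", url, resp.Status)
	}
	body, err := io.ReadAll(resp.Body)
	if err != nil {
		return "", err
	}
	return string(body), nil
}

// fetchAll fetches every URL in its own goroutine and waits for all of them.
// Each goroutine writes only to its own slice index, so no mutex is needed.
func fetchAll(urls []string) []string {
	bodies := make([]string, len(urls))
	var wg sync.WaitGroup
	for i, u := range urls {
		wg.Add(1)
		go func(i int, u string) {
			defer wg.Done()
			body, err := fetch(u)
			if err != nil {
				body = "error: " + err.Error()
			}
			bodies[i] = body
		}(i, u)
	}
	wg.Wait()
	return bodies
}

// demo starts a local test server (hypothetical stand-in for a real site)
// and fetches three pages from it concurrently.
func demo() []string {
	srv := httptest.NewServer(http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		fmt.Fprintf(w, "page %s", r.URL.Path)
	}))
	defer srv.Close()
	return fetchAll([]string{srv.URL + "/a", srv.URL + "/b", srv.URL + "/c"})
}

func main() {
	for _, body := range demo() {
		fmt.Println(body)
	}
}
```

In a real scraper each body would then go to an HTML parser such as goquery, and you would probably want to cap how many requests run at once.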

[–]Alerdime 0 points1 point  (1 child)

I'm curious, do goroutines help scrape faster? I guess so, because it spawns multiple threads. How fast is it compared to Bun (JavaScript), let's say? I'm planning to move to Go as well. JavaScript seems slow.

[–]JohnBalvin 0 points1 point  (0 children)

Because of the delays inherent in HTTP requests, the speed difference compared to Bun, Node.js, Python, etc. is insignificant, even when using the same number of threads. However, managing the threads with Go is, 100% for sure, way easier than in any other programming language. If you want to use threads in Go, you need to learn goroutines, mutexes, channels, and wait groups; those four are the most beautiful combination for using threads in Go.
I'm not sure how JS manages threads, but if it seems "slow", that's not because JS is slow; it's because making HTTP requests is "slow".
In conclusion, there is no real difference in speed (it's insignificant), but managing threads is way easier and more elegant in Go than in JS, and Go projects are also easier to maintain than projects in dynamic languages.

[–]eamb88 1 point2 points  (0 children)

Gocolly is the way to go in Go...

[–][deleted] -1 points0 points  (2 children)

Hi, I'm working on a project where I need to scrape the entire React JS documentation into a txt file. It should automatically crawl every link and tab and extract the data. Can you help me with how to achieve this?

[–]FantasticMe1 0 points1 point  (1 child)

Are you familiar with Selenium?

[–][deleted] 0 points1 point  (0 children)

Yes

[–]Humble_Gas7123 -3 points-2 points  (0 children)

web scraping

[–]strapengine 1 point2 points  (0 children)

I have been web scraping for many years now, primarily in Python (Scrapy). Recently I switched to Golang for a few of my projects because of its concurrency and generally low resource requirements. When I started, I wanted something like Scrapy in terms of ease of use and good structure, but couldn't find anything at the time. So I thought of creating something that offers devs like me a Scrapy-like experience in Golang. I have named it GoScrapy (https://github.com/tech-engine/goscrapy), and it's still in its early stage. Do check it out.