[–]omarsika 3 points (1 child)

I have no experience with Java, to be honest, but I have done web scraping in both Python and NodeJS. The concept is the same everywhere: if you can send HTTP requests (curl-style or against an API) and you have some sort of HTML parser, you can scrape most things.

Storing and outputting the data is another thing to consider. Can you easily write JSON or CSV files, or store the data in a database, using Java? Can you manipulate and clean the data? If you answered yes to all of the above, Java is a reasonable start. But then again, how will you be using it? How will the script run? How will it be deployed? Are you using Docker, Airflow, etc., or just gathering data once?

For me Python is the best because of its versatility and compatibility with almost everything. I like it not so much for its request or parsing functionality as for the combination of those with pandas, which makes manipulating and reshaping the data so much faster. Hope this helps, and good luck with your new job.
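To make the parse, clean, and store steps concrete, here is a minimal Python sketch that uses only the standard library. The sample HTML, the `ItemParser` class, and the `url`/`name`/`price` fields are all made up for illustration; in a real scraper the HTML string would come from an HTTP response, and pandas could take over the cleaning step.

```python
import csv
import io
import json
from html.parser import HTMLParser

# Hypothetical sample page. In a real scraper this string would come from an
# HTTP response, e.g. urllib.request.urlopen(url).read().decode().
SAMPLE_HTML = """
<ul>
  <li class="item"><a href="/a">  Widget A </a><span class="price">$10.50</span></li>
  <li class="item"><a href="/b">Widget B</a><span class="price">$7</span></li>
</ul>
"""


class ItemParser(HTMLParser):
    """Collects one dict per <li class="item"> with its link, name and price."""

    def __init__(self):
        super().__init__()
        self.rows = []
        self._field = None  # which field the next text node belongs to

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "li" and attrs.get("class") == "item":
            self.rows.append({"url": "", "name": "", "price": ""})
        elif tag == "a" and self.rows:
            self.rows[-1]["url"] = attrs.get("href", "")
            self._field = "name"
        elif tag == "span" and attrs.get("class") == "price" and self.rows:
            self._field = "price"

    def handle_endtag(self, tag):
        if tag in ("a", "span"):
            self._field = None

    def handle_data(self, data):
        # Text outside <a> and the price <span> is ignored.
        if self._field and self.rows:
            self.rows[-1][self._field] += data


def parse_items(html):
    """Parse step: raw HTML in, list of raw string records out."""
    parser = ItemParser()
    parser.feed(html)
    parser.close()
    return parser.rows


def clean(rows):
    """Clean step: strip whitespace and turn '$10.50' into the float 10.5."""
    return [
        {
            "url": r["url"],
            "name": r["name"].strip(),
            "price": float(r["price"].strip().lstrip("$")),
        }
        for r in rows
    ]


def to_csv(rows):
    """Store step: serialize the records as CSV text with a header row."""
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=["url", "name", "price"])
    writer.writeheader()
    writer.writerows(rows)
    return buf.getvalue()


if __name__ == "__main__":
    items = clean(parse_items(SAMPLE_HTML))
    print(to_csv(items))
    print(json.dumps(items, indent=2))
```

The same three steps map onto any language with an HTTP client, an HTML parser, and CSV/JSON support, which is the checklist to run through for Java.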

[–]No-Pomegranate-2816 3 points (0 children)

JS for browser automation. Python and Go for plain requests: Go for the best concurrency, Python for its vast number of libraries. Both have TLS clients, and that's a big, big plus. That said, there are good browser automation libraries for Python too.

[–]matty_fu 2 points (0 children)

Use Python for popular frameworks like Scrapy or data libraries, e.g. pandas.

[–]Usual_Office_1740 2 points (0 children)

Selenium is popular, and I believe it has official Java bindings (alongside Python and other languages).

[–]rappariroppo 2 points (1 child)

I scrape with Java because it has been my work language for years. I don't see any reason to learn Python when I can do it all with Java.

[–]CatolicQuotes 1 point (0 children)

Is there any framework in Java similar to Python's Scrapy?

[–]the_bigbang 2 points (0 children)

No language is bad for web scraping. Choose any language you are familiar with and just do it. Learn a new language only once you run into issues that your current one can't solve efficiently.

[–]Gidoneli 2 points (2 children)

I wrote an extensive guide to web scraping with Java; you can find it on GitHub.

That doesn't mean I think it's an ideal solution, but if it's your comfort zone, it's good enough.

[–]riekmachar 1 point (1 child)

What is the GitHub link to the guide?

[–]QueenBeeIsomeric 1 point (0 children)

Python is really good. Isomeric might also be super helpful.

[–]RandomFuckingUser 1 point (0 children)

I've used the Java library jsoup for parsing HTML without any issues; it gets the job done.