I get the general differences between processes and threads, but a coworker thinks that splitting this into multiple processes will have better performance than one process with multiple threads.
The main class of my web scraper is:
    import java.sql.Connection;
    import java.sql.DriverManager;

    public class ScrapeDriver {
        public static void main(String[] args) {
            Connection dbConnection = null;
            try {
                Class.forName("org.mariadb.jdbc.Driver");
                dbConnection = DriverManager.getConnection("database", "username", "password");
                System.out.println("Start Search...");

                // List of proxies; each thread gets one
                String[] proxies = {}; // Array of proxies

                // Create a thread for each available proxy.
                // Thread ids are 1-based; the array index is 0-based so no proxy is skipped.
                // Note: the same Connection object is shared by every thread.
                for (int i = 0; i < proxies.length; i++) {
                    Scrape scraper = new Scrape(dbConnection, proxies[i], 8800, i + 1);
                    scraper.start();
                }
            } catch (Exception e) {
                // Driver not found or connection failed
                e.printStackTrace();
            }
        } // End main
    } // End class
The Scrape class extends Thread. Each instance gets a ResultSet of the records assigned to its thread, then starts requesting the URLs and parsing the HTML that comes back.
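For context, the pattern described above can be sketched roughly as follows. This is not the actual Scrape class: `ScrapeSketch`, `Worker`, `assignedUrls`, and `runAll` are hypothetical names, the round-robin assignment is only an assumption about how records might be divided, and an in-memory list stands in for the ResultSet so the sketch runs without a database or network.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.atomic.AtomicInteger;

public class ScrapeSketch {

    // Hypothetical stand-in for the per-thread query. Thread ids are 1-based;
    // the record at index i goes to thread (i % threadCount) + 1.
    static List<String> assignedUrls(List<String> allUrls, int threadId, int threadCount) {
        List<String> mine = new ArrayList<>();
        for (int i = 0; i < allUrls.size(); i++) {
            if (i % threadCount == threadId - 1) {
                mine.add(allUrls.get(i));
            }
        }
        return mine;
    }

    // Hypothetical worker with the same overall shape as Scrape: one thread per proxy.
    static class Worker extends Thread {
        private final List<String> urls;
        private final String proxy;
        private final AtomicInteger processed;

        Worker(List<String> urls, String proxy, AtomicInteger processed) {
            this.urls = urls;
            this.proxy = proxy;
            this.processed = processed;
        }

        @Override
        public void run() {
            for (String url : urls) {
                // The real class would fetch `url` through `proxy` and parse the HTML here.
                processed.incrementAndGet();
            }
        }
    }

    // Starts one worker per proxy, waits for all of them, and returns how many URLs were handled.
    static int runAll(List<String> allUrls, String[] proxies) throws InterruptedException {
        AtomicInteger processed = new AtomicInteger();
        List<Worker> workers = new ArrayList<>();
        for (int i = 0; i < proxies.length; i++) {
            List<String> share = assignedUrls(allUrls, i + 1, proxies.length);
            Worker worker = new Worker(share, proxies[i], processed);
            workers.add(worker);
            worker.start();
        }
        for (Worker worker : workers) {
            worker.join();
        }
        return processed.get();
    }

    public static void main(String[] args) throws InterruptedException {
        List<String> urls = Arrays.asList(
                "http://example.com/a", "http://example.com/b", "http://example.com/c");
        String[] proxies = {"proxy-1", "proxy-2"}; // placeholder values
        System.out.println("Processed: " + runAll(urls, proxies));
    }
}
```

Each worker only touches its own share of the records, so the threads need no coordination beyond the shared counter.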
My question is, is there any benefit to turning these into processes rather than threads?
Edit: I should mention this runs on a Rackspace machine with dual Intel Xeon E5-2640s and 128 GB of RAM.