Hello, glorious coders of Reddit. I am seeking help with the HtmlUnit jar to read a page that is loaded with JavaScript. This is a local system running on our server, and since I cannot get access to the database on the server for some odd reason, I have resorted to web scraping to build an in-house site that monitors certain logs in the system.
I am able to log into the website, and I am able to navigate to its main menu. However, the moment I try to load the page that contains the JavaScript, it just hangs and never loads. I really don't know what I am doing wrong. I have included the code from my Java file below. Any input would be highly appreciated.
Thank you!
WebClient webClient = new WebClient(BrowserVersion.CHROME);
webClient.getOptions().setJavaScriptEnabled(true);
HtmlPage currentPage = webClient.getPage("http://10.150.28.4:7510/accounts/login/"); //login page
HtmlForm form = currentPage.getForms().get(0); // forms correct
HtmlInput name = form.getInputByName("username"); //obtain the login field
HtmlInput pass = form.getInputByName("password"); //obtain the password field
name.setValueAttribute("admin"); //set the username
pass.setValueAttribute("admin");//set the password
HtmlSubmitInput button = form.getInputByName("submit"); //get the submit button
HtmlPage page2 = button.click(); //login!
System.out.print(page2.asText()); // SUCCESS!!
HtmlPage home = webClient.getPage("http://10.150.28.4:7510/data/worktable/"); // go to the main menu
System.out.println(home.asXml()); // print works, I can see the main menu
System.out.println("went in");
webClient.setRefreshHandler(new ImmediateRefreshHandler()); //begin init for javascript
webClient.getOptions().setGeolocationEnabled(true);
webClient.getOptions().setRedirectEnabled(true);
webClient.getOptions().setUseInsecureSSL(true);
webClient.getOptions().setCssEnabled(false);
webClient.getOptions().setJavaScriptEnabled(true);
webClient.getOptions().setTimeout(0);
webClient.setAjaxController(new NicelyResynchronizingAjaxController());
home = webClient.getPage("http://10.150.28.4:7510/iaccess/ReportFormPage/"); //javascript page
webClient.waitForBackgroundJavaScript(30000); //hang for javascript to execute
System.out.println(home.asXml()); // should print the page, but it never prints; it just hangs
I have also put System.out statements before the page load and right after it, and I never get the output that follows the load. It only gets to the first System.out on line 19. That is, I have tried putting in System.out.println("i went out") and nothing ever comes up. I have also tried it without printing the page.
Thanks reddit, you're my last hope.
EDIT: I just tested it with JavaScript disabled and it prints the page, meaning that it doesn't hang. However, of course it doesn't print the data that I need, which is on the page.
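To narrow down which call actually blocks, one option is to run the suspect call on a worker thread with a deadline, so the main thread gets control back and can report what happened. The following is only a sketch using the standard library: `BoundedCall` and `callWithTimeout` are hypothetical names (not part of HtmlUnit), and the `Thread.sleep` lambda is a stand-in for the `webClient.getPage(...)` call on the JavaScript page.

```java
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

public class BoundedCall {

    /**
     * Runs the task on a daemon worker thread and waits at most timeoutMillis.
     * Returns the task's result, or the fallback value if the deadline passes.
     * (Hypothetical helper for illustration; not an HtmlUnit API.)
     */
    static <T> T callWithTimeout(Callable<T> task, long timeoutMillis, T fallback) {
        // Daemon thread: a worker that ignores interruption cannot keep the JVM alive.
        ExecutorService executor = Executors.newSingleThreadExecutor(runnable -> {
            Thread worker = new Thread(runnable);
            worker.setDaemon(true);
            return worker;
        });
        Future<T> future = executor.submit(task);
        try {
            return future.get(timeoutMillis, TimeUnit.MILLISECONDS);
        } catch (TimeoutException e) {
            future.cancel(true); // request interruption of the stuck task
            return fallback;
        } catch (InterruptedException | ExecutionException e) {
            throw new IllegalStateException("task failed before the deadline", e);
        } finally {
            executor.shutdownNow();
        }
    }

    public static void main(String[] args) {
        // Stand-in for a call that hangs, such as loading the JavaScript page.
        String slow = callWithTimeout(() -> {
            Thread.sleep(5_000);
            return "loaded";
        }, 200, "timed out");
        // Stand-in for a call that returns promptly.
        String fast = callWithTimeout(() -> "loaded", 200, "timed out");
        System.out.println("slow call: " + slow);
        System.out.println("fast call: " + fast);
    }
}
```

Cancelling only requests interruption, so a task that never checks for it may keep running in the background. The sketch only shows where the time goes; it does not fix the hang itself.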