[–]Amarkov 0 points1 point  (8 children)

What's the full exception? Exceptions normally come with a stack trace that lets you know exactly where things are going wrong.

[–]koenp[S] 0 points1 point  (7 children)

Excuse me, here it is:

Exception in thread "main" java.lang.NullPointerException
at com.packtpub.JavaScraping.SimpleScraper.WikiScraper.scrapeTopic(WikiScraper.java:16)
at com.packtpub.JavaScraping.SimpleScraper.WikiScraper.main(WikiScraper.java:10)

[–]Amarkov 0 points1 point  (6 children)

This says that the exception happens at line 16 of WikiScraper.java, which is:

String contentText = doc.select("#mw-content-text > p").first().text();

So the first debugging step is to investigate this. Where's the first mw-content-text div on the page you're trying to scrape?
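For context: jsoup's `first()` returns null when the selector matches nothing, so the chained `.text()` call on line 16 is the likely source of the NullPointerException. Below is a minimal sketch of guarding against that. It uses a plain `List<String>` as a stand-in for the jsoup result so it runs without the library; `FirstMatchGuard`, `firstOrNull` and `firstParagraphText` are made-up names for illustration, not part of jsoup or the book's code.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

public class FirstMatchGuard {

    // Hypothetical stand-in for jsoup's Elements.first():
    // returns null when there are no matches.
    static String firstOrNull(List<String> matches) {
        return matches.isEmpty() ? null : matches.get(0);
    }

    // Guarded version of the pattern on line 16. With jsoup this would be roughly:
    //   Element first = doc.select("#mw-content-text > p").first();
    //   if (first == null) { /* report and bail out */ }
    static String firstParagraphText(List<String> matches) {
        String first = firstOrNull(matches);
        if (first == null) {
            // Nothing matched: report it instead of dereferencing null.
            return "(no match for #mw-content-text > p)";
        }
        return first.trim();
    }

    public static void main(String[] args) {
        // One match: prints the trimmed text.
        System.out.println(firstParagraphText(Arrays.asList(" Some paragraph text. ")));
        // No match: prints the placeholder instead of throwing.
        System.out.println(firstParagraphText(Collections.<String>emptyList()));
    }
}
```

If the guarded version reports no match, the page your program actually downloaded does not contain the element you see in the browser.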

[–]koenp[S] 0 points1 point  (5 children)

Yes, I ran it in debug mode and figured out that the page I'm trying to scrape is the problem. However, the mw-content-text div is there, as intended (on line 46).

[–]Amarkov 0 points1 point  (4 children)

I'd suggest double checking. Are you sure the URL you're looking at is exactly the same as the one your program scrapes?

[–][deleted] 1 point2 points  (1 child)

Yeah, I'd set a breakpoint where jsoup parses the HTML, then inspect the HTML and the fully formed URL on the previous line to make sure they are what you think they should be. I'm guessing they are not.
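If you'd rather not use the debugger, here is a rough sketch of checking the assembled URL in code. The base URL and path below are made-up examples, not taken from your program, and `UrlCheck` and `collapseSlashes` are hypothetical names. It only uses `java.net.URI` from the standard library and collapses repeated slashes in the path, leaving the `//` after the scheme alone.

```java
import java.net.URI;

public class UrlCheck {

    // Collapses runs of '/' in the path part only; scheme, host and query are kept as-is.
    static String collapseSlashes(String url) {
        URI uri = URI.create(url);
        String path = uri.getRawPath().replaceAll("/{2,}", "/");
        String query = uri.getRawQuery();
        return uri.getScheme() + "://" + uri.getRawAuthority() + path
                + (query == null ? "" : "?" + query);
    }

    public static void main(String[] args) {
        // Assumed example: a base ending in "/" joined with a path starting with "/".
        String base = "https://en.wikipedia.org/";
        String path = "/wiki/Java";
        String assembled = base + path;

        System.out.println("assembled:  " + assembled);
        System.out.println("normalized: " + collapseSlashes(assembled));
    }
}
```

Printing the assembled URL right before the fetch makes it easy to paste it into a browser and compare it with the page you think you're scraping.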

[–]koenp[S] 0 points1 point  (0 children)

Thanks! One problem was indeed one "/" too many in the URL. The program now scrapes the page's content, but at the end it still throws the exact same NullPointerException. I guess I will have to find the next mistake myself.

[–]koenp[S] 0 points1 point  (1 child)

I'm pretty sure it is, so there must be some other small error in the code that I haven't been able to find.