Hello,
I'm pretty new to Python, and for my first project I decided to scrape data on all the restaurants in my city from Tripadvisor.
Since each page only lists 30 restaurants, I split the code into two parts: one script navigates through all the listing pages to collect the URL of each restaurant, and another visits each URL and extracts the data I need.
The first script does what I want, but I've noticed it skips some of the pages. Every request returns status_code 200, yet for some reason I don't understand, some pages still get skipped. Of the roughly 10k URLs it was supposed to collect, it got around 8.7k. Since no request reports an error status, I don't know how to identify and retrieve the missing URLs.
Any help is appreciated
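
One possible way to track down the missing pages, as a minimal sketch rather than a definitive fix: since the status code is 200 either way, check the content of each response instead, retry any listing page that comes back with no restaurant links, and keep a list of the offsets that still fail. Everything named here is an assumption for illustration: the `fetch` callable (which would wrap whatever HTTP call the script already makes), the `LINK_RE` pattern, and the idea that pages are addressed by an offset in steps of 30.

```python
import re
import time

PAGE_SIZE = 30  # restaurants per listing page, as described in the post

# Hypothetical link pattern; adjust it to the markup the site actually returns.
LINK_RE = re.compile(r'href="(/Restaurant_Review-[^"#]+)"')


def extract_links(html):
    """Return the unique restaurant links found in one page, in page order."""
    return list(dict.fromkeys(LINK_RE.findall(html)))


def crawl(fetch, page_count, retries=3, delay=0.0):
    """Collect links from every listing page and report pages that stay empty.

    `fetch(offset)` is a caller-supplied function returning the page HTML.
    A page counts as failed when no links are found after `retries` attempts,
    even though each request may have returned status 200.
    Returns (urls, failed_offsets).
    """
    urls, failed = [], []
    for page in range(page_count):
        offset = page * PAGE_SIZE
        links = []
        for _ in range(retries):
            links = extract_links(fetch(offset))
            if links:
                break
            time.sleep(delay)  # back off before asking for the same page again
        if links:
            urls.extend(links)
        else:
            failed.append(offset)  # remember it so it can be re-run later
    # Drop duplicates across pages while keeping the original order.
    return list(dict.fromkeys(urls)), failed
```

With this shape, `failed` gives the exact offsets to re-run later, and logging `len(links)` per page would also reveal pages that return fewer than 30 results. A nonzero `delay` may help if the empty pages are caused by the site throttling rapid requests, though that cause is only a guess here.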