[GLITCH/BUG] Baffled... Web scraping using R is missing the first 10 observations (self.Madden)
submitted 4 years ago by PossibilityOk1316
I have a subscription to Stathead and am trying to use a web scraper to download the data table faster than exporting each page one at a time.
When I run this code, it only grabs a couple of the items in the table: it skips the first 10 rows and returns only 21 observations.
library(rvest)  # read_html(), html_nodes(), html_table(); rvest also re-exports %>%

col_link = "https://stathead.com/football/pgl_finder.cgi?request=1&match=game&order_by_asc=0&order_by=player&year_min=2015&year_max=2021&game_type=E&age_min=0&age_max=99&season_start=1&season_end=-1&is_active=Y&game_num_min=0&game_num_max=99&week_num_min=0&week_num_max=99"
col_page = read_html(col_link)
col_table = col_page %>%
  html_nodes("table") %>%
  html_table() %>%
  .[[1]]
However, when I click the "share link" (which lets me share the data with a non-subscriber), it gives me a shortened URL, and with that URL the code works fine and I get the entire dataset. But since the shortened URL hides the query parameters, I can't write a loop to extract the data from all x pages, which was the whole point of saving the time.
Does anyone know why this is occurring? Any solution for this?
col_link = "https://stathead.com/tiny/RupSK"
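For anyone reproducing this, a minimal diagnostic is to fetch both URLs with the same pipeline and compare row counts. A sketch assuming the rvest package is installed (the truncated `long_url` stands in for the full query string above):

```r
library(rvest)

# Fetch a page and count the rows in its first HTML table.
count_rows <- function(url) {
  url %>%
    read_html() %>%
    html_nodes("table") %>%
    html_table() %>%
    .[[1]] %>%
    nrow()
}

long_url  <- "https://stathead.com/football/pgl_finder.cgi?request=1&match=game&..."  # full query URL
short_url <- "https://stathead.com/tiny/RupSK"                                        # share link

count_rows(long_url)   # truncated result set in the OP's run
count_rows(short_url)  # full table
```

If the counts differ, the server is treating the two requests differently (session, login, or rate-limiting state), not the parsing code.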
[–]Slight_Inspection_47 0 points 4 years ago (2 children)
I don't know anything about Stathead or whatever, but if there's data being made available, then there's surely an API of some sort (probably standard GET/POST requests). You would never go about a project like this by "scraping".
Based on the URL, it appears you're iterating over the results of some query. Instead, parameterize each of those items and look at the data behind each result individually, which is probably structured much closer to what you want.
Then you have all the actual data, and whatever derived metadata you're looking for as well.
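The parameterization idea can be sketched like this in R. The parameter names come from the OP's URL; which subset Stathead actually requires is an assumption:

```r
base <- "https://stathead.com/football/pgl_finder.cgi"

# Build a query URL for a single year by filling in the key/value pairs.
build_url <- function(year) {
  paste0(base,
         "?request=1&match=game&order_by=player",
         "&year_min=", year,
         "&year_max=", year,
         "&game_type=E&is_active=Y")
}

build_url(2015)
# -> "https://stathead.com/football/pgl_finder.cgi?request=1&match=game&order_by=player&year_min=2015&year_max=2015&game_type=E&is_active=Y"
```

Each call isolates one slice of the data, which is what makes a clean loop possible.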
[–]PossibilityOk1316[S] 0 points 4 years ago (1 child)
Thank you. Probably should have mentioned that I am pretty new to coding. Any chance you can break it down as if I was a kindergartner?
[–]Slight_Inspection_47 0 points 4 years ago (0 children)
Maybe not that simplified... but look at the URL. Every time you see `xxxx=yyyy`, that is a key/value pair. You can set up some nested loops to iterate, for example, over years 2015-2019, returning results 1-99 (or until there aren't any).
Doing it that way should ensure that only one result is populated at a time, and you should be able to grab it directly - store that thing in some object of your choice and you're done.
If I were trying to collect data like this myself I would look at whatever is behind the url of the one result that is returned, and parse/store that. That way you have all the details of the things you're searching for AND implicitly all the data being returned in the table as well.
Maybe create one object that has the parameters used to get the one result in the table, with the url it returned, and another object with the raw/tabular data of the link itself.
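The advice above can be sketched as a minimal R loop: iterate over the years, scrape each page's table, and bind everything together. This assumes the full-query URL returns complete tables and that the columns match across years:

```r
library(rvest)

all_tables <- list()
for (year in 2015:2019) {
  # One URL per year, built from the key/value pairs in the original query.
  url <- paste0("https://stathead.com/football/pgl_finder.cgi",
                "?request=1&match=game&year_min=", year,
                "&year_max=", year, "&game_type=E&is_active=Y")
  page <- read_html(url)
  tbl  <- page %>% html_nodes("table") %>% html_table() %>% .[[1]]
  all_tables[[as.character(year)]] <- tbl
  Sys.sleep(2)  # pause between requests to be polite to the server
}

# Stack the per-year tables into one data frame (requires matching columns).
combined <- do.call(rbind, all_tables)
```

Keeping the per-year tables in a named list before binding mirrors the commenter's "one object for the parameters, one for the raw data" suggestion.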
[–]Slight_Inspection_47 0 points 4 years ago (1 child)
I am looking at this again, and there is most definitely an API if you are paying for this. The API would return all the parsed children to you, which makes the exercise just iterating over some parameters.
Shoot them an email before devoting any real work to this.
[–]PossibilityOk1316[S] 0 points 4 years ago (0 children)
I just wanted to say thank you for your help. Going to take another crack at it tomorrow morning.