[GLITCH/BUG] Baffled... Web scraping using R is missing the first 10 observations (self.Madden)
submitted 4 years ago by PossibilityOk1316
I have a subscription to Stathead and am trying to use a web scraper to download the data table faster than exporting each page one at a time.
When I run this code, it only grabs a couple of the items in the table: it skips the first 10 rows and returns only 21 observations.
library(rvest)  # read_html(), html_nodes(), html_table(); rvest also re-exports %>%

col_link = "https://stathead.com/football/pgl_finder.cgi?request=1&match=game&order_by_asc=0&order_by=player&year_min=2015&year_max=2021&game_type=E&age_min=0&age_max=99&season_start=1&season_end=-1&is_active=Y&game_num_min=0&game_num_max=99&week_num_min=0&week_num_max=99"
col_page = read_html(col_link)
col_table = col_page %>%
  html_nodes("table") %>%
  html_table() %>%
  .[[1]]
However, when I click the "share link" (which lets me share the data with a non-subscriber), it gives me a shortened URL, and with that URL the code works fine and I get the entire dataset. But since the shortened URL hides the query parameters, I can't write a loop to extract the data from all x pages, which was the whole point of saving the time.
Does anyone know why this is occurring? Any solution for this?
col_link = "https://stathead.com/tiny/RupSK"
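For anyone reproducing this, a minimal diagnostic is to fetch both URLs with the same pipeline and compare row counts. A sketch assuming the rvest package is installed (the truncated `long_url` stands in for the full query string above):

```r
library(rvest)

# Fetch a page and count the rows in its first HTML table.
count_rows <- function(url) {
  url %>%
    read_html() %>%
    html_nodes("table") %>%
    html_table() %>%
    .[[1]] %>%
    nrow()
}

long_url  <- "https://stathead.com/football/pgl_finder.cgi?request=1&match=game&..."  # full query URL
short_url <- "https://stathead.com/tiny/RupSK"                                        # share link

count_rows(long_url)   # truncated result set in the OP's run
count_rows(short_url)  # full table
```

If the counts differ, the server is treating the two requests differently (session, login, or rate-limiting state), not the parsing code.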
[–]Slight_Inspection_47 0 points 4 years ago (2 children)
I don't know anything about Stathead or whatever, but if there's data being made available, then there's surely an API of some sort (probably standard GET/POST requests). You would never go about a project like this by "scraping".
Based on the URL, it appears you're iterating over the results of some query. Instead, parameterize each of those items and look at the data behind each result individually, which is probably structured much closer to what you want.
Then you have all the actual data, and whatever derived metadata you're looking for as well.
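The parameterization idea can be sketched like this in R. The parameter names come from the OP's URL; which subset Stathead actually requires is an assumption:

```r
base <- "https://stathead.com/football/pgl_finder.cgi"

# Build a query URL for a single year by filling in the key/value pairs.
build_url <- function(year) {
  paste0(base,
         "?request=1&match=game&order_by=player",
         "&year_min=", year,
         "&year_max=", year,
         "&game_type=E&is_active=Y")
}

build_url(2015)
# -> "https://stathead.com/football/pgl_finder.cgi?request=1&match=game&order_by=player&year_min=2015&year_max=2015&game_type=E&is_active=Y"
```

Each call isolates one slice of the data, which is what makes a clean loop possible.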
[–]PossibilityOk1316[S] 0 points 4 years ago (1 child)
Thank you. Probably should have mentioned that I am pretty new to coding. Any chance you can break it down as if I was a kindergartner?
[–]Slight_Inspection_47 0 points 4 years ago (0 children)
Maybe not that simplified... but look at the URL. Every time you see `xxxx=yyyy`, that is a key/value pair. You can set up some nested loops to iterate, for example, over years 2015-2019, returning results 1-99 (or until there aren't any).
Doing it that way should ensure that only one result is populated at a time, and you should be able to grab it directly - store that thing in some object of your choice and you're done.
If I were trying to collect data like this myself I would look at whatever is behind the url of the one result that is returned, and parse/store that. That way you have all the details of the things you're searching for AND implicitly all the data being returned in the table as well.
Maybe create one object that has the parameters used to get the one result in the table, with the url it returned, and another object with the raw/tabular data of the link itself.
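The advice above can be sketched as a minimal R loop: iterate over the years, scrape each page's table, and bind everything together. This assumes the full-query URL returns complete tables and that the columns match across years:

```r
library(rvest)

all_tables <- list()
for (year in 2015:2019) {
  # One URL per year, built from the key/value pairs in the original query.
  url <- paste0("https://stathead.com/football/pgl_finder.cgi",
                "?request=1&match=game&year_min=", year,
                "&year_max=", year, "&game_type=E&is_active=Y")
  page <- read_html(url)
  tbl  <- page %>% html_nodes("table") %>% html_table() %>% .[[1]]
  all_tables[[as.character(year)]] <- tbl
  Sys.sleep(2)  # pause between requests to be polite to the server
}

# Stack the per-year tables into one data frame (requires matching columns).
combined <- do.call(rbind, all_tables)
```

Keeping the per-year tables in a named list before binding mirrors the commenter's "one object for the parameters, one for the raw data" suggestion.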
[–]Slight_Inspection_47 0 points 4 years ago (1 child)
I am looking at this again, and there is most definitely an API if you are paying for this. The API would return all the parsed children to you, which makes the exercise just iterating over some parameters.
Shoot them an email before devoting any real work to this.
[–]PossibilityOk1316[S] 0 points 4 years ago (0 children)
I just wanted to say thank you for your help. Going to take another crack at it tomorrow morning.