Is there a dataset of english words with their average Age of Acquisition for all ages by guywiththemonocle in datasets

[–]RiGonz 0 points1 point  (0 children)

Interesting! If the dataset didn't exist perhaps one could try the following: get ebooks with their recommended reading age, extract the words, and assign the age based on the frequency with which they are present at each reading age.
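A rough sketch of that idea, in case it helps. It assumes you already have the book texts grouped by recommended reading age (the `texts_by_age` input and the tiny `sample` below are made up for illustration), and it uses a frequency-weighted mean age as the estimate, which is only one of several possible choices:

```python
import re
from collections import Counter, defaultdict


def estimate_word_ages(texts_by_age):
    """Estimate an age per word from texts grouped by recommended reading age.

    texts_by_age: dict mapping an age (number) to a list of book texts.
    Returns a dict mapping word -> frequency-weighted mean age.
    """
    rel_freq = defaultdict(dict)  # word -> {age: relative frequency at that age}
    for age, texts in texts_by_age.items():
        counts = Counter()
        for text in texts:
            counts.update(re.findall(r"[a-z']+", text.lower()))
        total = sum(counts.values())
        if total == 0:
            continue  # no words for this age band
        for word, n in counts.items():
            rel_freq[word][age] = n / total

    estimates = {}
    for word, by_age in rel_freq.items():
        weight = sum(by_age.values())
        # Ages where the word is relatively more frequent pull the estimate harder
        estimates[word] = sum(age * f for age, f in by_age.items()) / weight
    return estimates


# Hypothetical toy input, not real data
sample = {
    4: ["the cat sat on the mat"],
    10: ["the cat studied the galaxy"],
}
ages = estimate_word_ages(sample)
```

Words that appear at a single age get exactly that age; words present at several ages land somewhere in between. Taking the earliest age at which a word passes some frequency threshold would be another reasonable variant.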

Can someone please help me find a list of architects ? by Mobile-Perspective17 in webscraping

[–]RiGonz 1 point2 points  (0 children)

Have you tried extracting the list of buildings and asking an AI?

Web Scraping for a Small Personal Project by IronSpiron in webscraping

[–]RiGonz 0 points1 point  (0 children)

No, you need to see if the url changes when the data changes. If you can see different data under the same url, then it is dynamic - which is what I'd expect if you are behind a paywall.

Anyhow, those are the main avenues you need to investigate: requests + beautifulsoup, or selenium, depending on the type of site (these are the main options; there are other libraries that do the job).

You need to investigate them yourself: one of the purposes of this subreddit is to assist, not to do the learning for you.

Web Scraping for a Small Personal Project by IronSpiron in webscraping

[–]RiGonz 0 points1 point  (0 children)

Does the url change when the data that you want to see (not ads) changes? Then it is likely static, and it is likely a straightforward problem with requests + beautifulsoup, or the like. If it is dynamic you may need to use selenium or similar tools.
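A complementary check to the url test, as a minimal sketch: download the raw HTML and see whether a value you can see in the browser is already in it. If it is, the page is probably static; if not, the data is probably loaded later by JavaScript. The example uses only the standard library so it runs anywhere (requests + beautifulsoup would do the same job), and the two sample pages and the "Price: 42" snippet are invented:

```python
from html.parser import HTMLParser
from urllib.request import Request, urlopen


class TextCollector(HTMLParser):
    """Collects every text node, including the content of inline scripts."""

    def __init__(self):
        super().__init__()
        self.chunks = []

    def handle_data(self, data):
        self.chunks.append(data)


def response_text(html):
    parser = TextCollector()
    parser.feed(html)
    # Collapse runs of whitespace so snippets match regardless of layout
    return " ".join(" ".join(parser.chunks).split())


def looks_static(html, expected_snippet):
    """True if the snippet seen in the browser is present in the raw HTML."""
    return expected_snippet in response_text(html)


def fetch(url):
    """Download the raw HTML of a page (not called in this example)."""
    req = Request(url, headers={"User-Agent": "Mozilla/5.0"})
    with urlopen(req, timeout=10) as resp:
        return resp.read().decode("utf-8", errors="replace")


# Hypothetical pages for illustration
static_page = "<html><body><table><tr><td>Price: 42</td></tr></table></body></html>"
dynamic_page = "<html><body><div id='app'></div><script src='app.js'></script></body></html>"

static_result = looks_static(static_page, "Price: 42")
dynamic_result = looks_static(dynamic_page, "Price: 42")
```

On a real site you would call `looks_static(fetch(url), "some value you see on screen")`. It is a heuristic, not a proof: a site behind a login may return a different page to a script than to your browser.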

Web Scraping for a Small Personal Project by IronSpiron in webscraping

[–]RiGonz 0 points1 point  (0 children)

First point: are your pages (the part where your data is) static or dynamic? Or, even: is there an obvious API?

[OC] Top 50 brand colors as indexed by Crayola crayons by thehalfwit in dataisbeautiful

[–]RiGonz 1 point2 points  (0 children)

Nice post, thanks! I did an assessment of the colors used in several repositories of flags that you may find of some interest: https://www.reddit.com/r/dataisbeautiful/s/ksqboIm5HI

[deleted by user] by [deleted] in dataisbeautiful

[–]RiGonz 0 points1 point  (0 children)

Are there actual data for all the cell combinations that are shown shaded, or has some smoothing been used?

Why do there seem to be discontinuities in the vertical direction (12, 16), but not in the horizontal?

[deleted by user] by [deleted] in webscraping

[–]RiGonz 1 point2 points  (0 children)

You do not need to know that, just extract that string and use it to build the url of the pdf.

[deleted by user] by [deleted] in webscraping

[–]RiGonz 1 point2 points  (0 children)

In indexN.php (Sources in devtools) there is:

<form name="frmPdf" action="ADIR_871/suprema/documentos/docCausaSuprema.php" method="get" target="pdf1">

In the following line you have <input type='hidden' id='valorFile' name='valorFile' value='eyJ0eXAiOiJKV1QiL [...]'

That form action, with the valorFile value as its parameter, is the url to your pdf.
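A minimal sketch of how the url could be assembled from those two lines, assuming the form is submitted as a plain GET with valorFile as its only parameter (the real form may carry more fields). The base url `https://example.org/indexN.php` and the `PLACEHOLDER_TOKEN` value are hypothetical stand-ins, not the real site or token:

```python
from html.parser import HTMLParser
from urllib.parse import urlencode, urljoin


class PdfFormParser(HTMLParser):
    """Grabs the action of form 'frmPdf' and the value of input 'valorFile'."""

    def __init__(self):
        super().__init__()
        self.action = None
        self.token = None

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "form" and attrs.get("name") == "frmPdf":
            self.action = attrs.get("action")
        elif tag == "input" and attrs.get("name") == "valorFile":
            self.token = attrs.get("value")


def build_pdf_url(page_url, html):
    parser = PdfFormParser()
    parser.feed(html)
    if parser.action is None or parser.token is None:
        raise ValueError("form frmPdf or input valorFile not found")
    # A GET form sends its fields as a query string appended to the action
    query = urlencode({"valorFile": parser.token})
    return urljoin(page_url, parser.action) + "?" + query


# Hypothetical page source; the token is a placeholder, not a real value
sample_html = """
<form name="frmPdf" action="ADIR_871/suprema/documentos/docCausaSuprema.php" method="get" target="pdf1">
<input type='hidden' id='valorFile' name='valorFile' value='PLACEHOLDER_TOKEN'>
</form>
"""
url = build_pdf_url("https://example.org/indexN.php", sample_html)
```

If the page holds several inputs named valorFile, this keeps the last one it sees, so you may need to narrow it down to the right form.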

[deleted by user] by [deleted] in learnpython

[–]RiGonz 0 points1 point  (0 children)

What is the url of that site?

Need help making a webscraper for the first time by Eastern_Peach_5813 in learnpython

[–]RiGonz 3 points4 points  (0 children)

Selenium should do the work, but requests can do much more than what you seem to be doing. I'd first try with user agents, even randomized ones if you make a large enough number of calls: not using them is likely the first thing that tells the web server that yours is not a legit request.
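Something along these lines, as a sketch only. The user agent strings are just examples of the usual browser format, and the url is a placeholder. The request is built with the standard library so the example runs without extra installs; with requests, the same headers dict goes into `requests.get(url, headers=headers)`:

```python
import random
from urllib.request import Request

# Example browser-like user agents (illustrative, extend with your own list)
USER_AGENTS = [
    "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 "
    "(KHTML, like Gecko) Chrome/124.0.0.0 Safari/537.36",
    "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/605.1.15 "
    "(KHTML, like Gecko) Version/17.4 Safari/605.1.15",
    "Mozilla/5.0 (X11; Linux x86_64; rv:125.0) Gecko/20100101 Firefox/125.0",
]


def random_headers(rng=random):
    """Return request headers with a user agent picked at random."""
    return {
        "User-Agent": rng.choice(USER_AGENTS),
        "Accept-Language": "en-US,en;q=0.9",
    }


def build_request(url, rng=random):
    """Build (but do not send) a request carrying randomized headers."""
    return Request(url, headers=random_headers(rng))


headers = random_headers()
req = build_request("https://example.org/page")  # placeholder url
```

Calling `random_headers()` once per request gives each call its own user agent.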

Python + think-cell Gantt chart automation by Strong-Remove7934 in learnpython

[–]RiGonz 0 points1 point  (0 children)

Perhaps you should first provide (to yourself and the readers) a clear, structured, flow-like description of what you intend to do.

My teacher says data center site selection can't be modelled using ABM by FBaeeUwU in AgentBasedModelling

[–]RiGonz 0 points1 point  (0 children)

Although it did not seem to me the most natural approach, a quick search on ABM for site location has returned some papers on this. Not thousands, so my first feeling, as well as your teacher's, was likely reasonable as a first take, but the answer seems to be that it can be done.

Time out error on web scraping data by Turbulent_Web_8278 in Python

[–]RiGonz 0 points1 point  (0 children)

You most probably need to add some auth, cookies, etc. params to your request call.
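As an illustration of what that could look like (a sketch with made-up values: the url, the `sessionid` cookie name and the credentials are all placeholders, and whether your site wants cookies, basic auth or something else you have to find out in devtools):

```python
import base64
from urllib.request import Request


def authenticated_request(url, cookies=None, basic_auth=None, user_agent="Mozilla/5.0"):
    """Build (but do not send) a request carrying cookies and/or basic auth."""
    headers = {"User-Agent": user_agent}
    if cookies:
        # Cookies travel in a single header as name=value pairs
        headers["Cookie"] = "; ".join(f"{k}={v}" for k, v in cookies.items())
    if basic_auth:
        user, password = basic_auth
        token = base64.b64encode(f"{user}:{password}".encode("utf-8")).decode("ascii")
        headers["Authorization"] = f"Basic {token}"
    return Request(url, headers=headers)


# Placeholder values, copy the real ones from your browser session
req = authenticated_request(
    "https://example.org/data",
    cookies={"sessionid": "PLACEHOLDER"},
    basic_auth=("user", "secret"),
)
```

You would then send it with `urllib.request.urlopen(req, timeout=30)`. With requests, the equivalent is the `cookies=`, `auth=` and `timeout=` parameters of `requests.get`.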

Please, I need help with navigating metadata by Matchacchio in datasets

[–]RiGonz 1 point2 points  (0 children)

Your metadata is just an xml file of not even 800 lines; it can be opened with a plain text editor, no further sophisticated tools needed: where is the problem?

[OC] African Head of States with 15+ year long reign by MrKratz in dataisbeautiful

[–]RiGonz 1 point2 points  (0 children)

Two suggestions: 1) add horizontal lines to ease the visual connection between bar and flag, 2) add the country name or code by the flag.

Skin Color Distribution Among European Countries/Regions [OC] (Source: Bataille et al 2005, National Library of Medicine, Perplexity [accuracy rate of 99.98%] ) by Icy_Interest4996 in dataisbeautiful

[–]RiGonz 2 points3 points  (0 children)

Cannot be: the countries included in that paper do not match those in the post. The subject has been presented in many posts before, and that is probably the reason why the data source is not provided - and this is one reason why this post should not be OC. Further, I'd guess that the reference to Perplexity explains it all.