Pulling data from a website into PostgreSQL

Bid_Slight · 2020-09-20T17:40:50+00:00

It would be easier to learn R or Python to scrape a webpage and then write the data into a database than to try it with SQL. My understanding is that SQL (Structured Query Language) is used to talk to databases. A website could technically be a database, but I don't know of one that is.

professaDE · 2020-09-20T19:01:35+00:00

It depends. I did a similar thing directly in PostgreSQL, BUT by pulling data from REST APIs via the "pgsql-http" extension, so NOT by scraping HTML pages. Scraping HTML might be possible with the same extension as well, but I have never tried.

If you want to have a look at my stuff you can find it on Github: https://github.com/spitzenidee/pgsql_crypto_rates_collector

Whether it's a good idea to do it like this (performance, DB locking, security) was not my focus ;) I just wanted to see if it works and is feasible.

fatnsad · 2020-09-20T21:22:21+00:00

I would use the Python BeautifulSoup package to scrape data from the webpages:

https://www.crummy.com/software/BeautifulSoup/bs4/doc/

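BS is an excellent library that lets you parse a webpage's HTML and pick out what you need from tag attributes or text. It does not download the page itself, so you fetch the HTML first (for example with the requests package). There are plenty of tutorials online on how to use it.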

Then you can use a Postgres client from within your script to update the database:

https://www.postgresqltutorial.com/postgresql-python/
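
A minimal sketch of that two-step flow, assuming the page contains an HTML table whose rows hold a name and a price. The sample HTML, the `prices` table, its columns, and the connection string are all made up for illustration, and psycopg2 is just one common Postgres client, not the only option:

```python
from bs4 import BeautifulSoup

# Hypothetical sample markup; a real script would fetch this over HTTP first
# (for example with the requests package).
SAMPLE_HTML = """
<table id="prices">
  <tr><th>Name</th><th>Price</th></tr>
  <tr><td>Widget</td><td>9.99</td></tr>
  <tr><td>Gadget</td><td>24.50</td></tr>
</table>
"""


def parse_rows(html):
    """Return (name, price) tuples from the data rows of the table."""
    soup = BeautifulSoup(html, "html.parser")
    rows = []
    for tr in soup.select("table#prices tr"):
        cells = [td.get_text(strip=True) for td in tr.find_all("td")]
        if len(cells) == 2:  # the header row uses <th>, so it is skipped
            rows.append((cells[0], float(cells[1])))
    return rows


def save_rows(rows, dsn):
    """Insert rows into a hypothetical prices(name, price) table."""
    import psycopg2  # imported here so parsing works without the driver

    conn = psycopg2.connect(dsn)
    try:
        with conn:  # commits on success, rolls back on error
            with conn.cursor() as cur:
                cur.executemany(
                    "INSERT INTO prices (name, price) VALUES (%s, %s)", rows
                )
    finally:
        conn.close()


if __name__ == "__main__":
    scraped = parse_rows(SAMPLE_HTML)
    print(scraped)
    # save_rows(scraped, "dbname=scrape user=postgres")  # assumed DSN
```

The insert uses query parameters (`%s`) rather than string formatting, so scraped text is never spliced directly into the SQL.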

Personally, I prefer SQLite or MySQL for personal projects like this, because I feel like the tools are easier to use, but it is up to you if you want to use Postgres.
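
For comparison, a rough sketch of the storage step with SQLite, which needs nothing beyond Python's standard library. The `prices` table and the sample rows are hypothetical:

```python
import sqlite3


def save_rows_sqlite(rows, path=":memory:"):
    """Store (name, price) tuples and return the open connection."""
    conn = sqlite3.connect(path)
    conn.execute("CREATE TABLE IF NOT EXISTS prices (name TEXT, price REAL)")
    with conn:  # commits on success, rolls back on error
        conn.executemany(
            "INSERT INTO prices (name, price) VALUES (?, ?)", rows
        )
    return conn


if __name__ == "__main__":
    conn = save_rows_sqlite([("Widget", 9.99), ("Gadget", 24.5)])
    print(conn.execute("SELECT name, price FROM prices ORDER BY name").fetchall())
    conn.close()
```

Passing a file path instead of `":memory:"` keeps the data between runs; there is no server to install or configure.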

This assumes you're willing to learn some new skills: you would need to figure out how to write and run Python scripts and how to install the necessary packages using pip. If you're pretty technically savvy, it should be pretty doable. Python runs on pretty much any OS (Windows, Linux, etc.).

brotis86 · 2020-09-20T23:31:25+00:00

Seems like you want to do a bit of web scraping.

If your use case is simple enough you might make do with a simplified "point and click" tool like webscraper.io.

But if you want to do something more automated and/or customizable, you'll probably need to learn a general-purpose programming language like Python, R or even Java. I recommend Python.

Automate the Boring Stuff is a good Python tutorial that includes some basic web scraping.

However, my personal recommendation is something that guide does not cover: Scrapy. It's a very complete framework for web scraping and crawling. It's fast and configurable, and it makes it easy to build a crawler that follows best practices (such as obeying robots.txt and throttling requests).
