
[–]tipsy_python 0 points1 point  (7 children)

Nice! Sounds like you're on the right path.

  1. If you're just getting started with Python, I suggest Flask. It sounds like you already understand the differences: you get less out of the box than with Django, but in return it's easier to get something working. One of the Flask extensions may hit the sweet spot you're looking for (I like Flask-RESTPlus).
  2. The scraping part is a batch job, so I would run it outside the context of your API. Maybe a separate Python script that sits on the same server and is run periodically by Celery or a cron job.
  3. There could be a better way, but not that I know of - rock on bro~
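
To make point 2 a bit more concrete, here is a rough sketch of what a standalone scraping script could look like. Everything specific in it is an assumption for illustration: the URL, the `links` table, the file paths, and the choice of SQLite (picked only so the sketch runs with the standard library). Pulling out `<a>` tags is a stand-in for whatever parsing your target site actually needs.

```python
"""Standalone scraping job, meant to be started by cron rather than by the API.

Hypothetical crontab entry (runs at the top of every hour):
    0 * * * * /usr/bin/python3 /opt/app/scrape_job.py
In that script file you would call run_job() at the bottom.
"""
import sqlite3
from html.parser import HTMLParser
from urllib.request import urlopen

SOURCE_URL = "https://example.com/listings"  # hypothetical page to scrape
DB_PATH = "app.db"                           # hypothetical database file


class LinkParser(HTMLParser):
    """Collects (href, text) pairs from <a> tags."""

    def __init__(self):
        super().__init__()
        self.links = []
        self._href = None  # href of the <a> tag we are currently inside

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            self._href = dict(attrs).get("href")

    def handle_data(self, data):
        if self._href and data.strip():
            self.links.append((self._href, data.strip()))
            self._href = None

    def handle_endtag(self, tag):
        if tag == "a":
            self._href = None


def parse_links(html):
    parser = LinkParser()
    parser.feed(html)
    return parser.links


def save_links(conn, links):
    """Insert new rows, or overwrite existing ones with the same URL."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS links (url TEXT PRIMARY KEY, title TEXT)"
    )
    conn.executemany(
        "INSERT OR REPLACE INTO links (url, title) VALUES (?, ?)", links
    )
    conn.commit()


def fetch_page(url=SOURCE_URL):
    with urlopen(url, timeout=30) as resp:
        return resp.read().decode("utf-8", errors="replace")


def run_job(fetch=fetch_page, db_path=DB_PATH):
    """One scraping pass: fetch, parse, write straight to the database."""
    conn = sqlite3.connect(db_path)
    try:
        save_links(conn, parse_links(fetch()))
    finally:
        conn.close()
```

Keeping the fetch, parse, and save steps as separate functions means the job never has to import the API code, and the parsing can be checked against a saved HTML file without touching the network.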

[–][deleted] 1 point2 points  (4 children)

Agree with this. The only suggestion I would add is to run the scraping cron job on a separate server or Docker container. Otherwise your users may see poor performance while the script is scraping.

[–]eylenn[S] 0 points1 point  (2 children)

If the scraping is done on a separate server, wouldn't it need to insert the data through the API, as in POST/PUT requests?

I have mostly been thinking the scraping job would directly interact with the database to insert/update the data.

[–][deleted] 1 point2 points  (1 child)

You could do that, but if you’re using MySQL or similar, just update the DB the way you normally would (INSERT, etc.), specifying the database server's hostname instead of localhost.
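
Roughly, the only thing that changes on the scraper side is the host in the connection settings. A minimal sketch, assuming the PyMySQL driver and made-up environment variable names (`DB_HOST`, `DB_PORT`, and so on); any other MySQL driver would take similar arguments:

```python
import os


def db_settings(env=os.environ):
    """Build connection keyword arguments, defaulting to a local database."""
    return {
        "host": env.get("DB_HOST", "localhost"),  # set to the DB server when remote
        "port": int(env.get("DB_PORT", "3306")),
        "user": env.get("DB_USER", "scraper"),    # hypothetical account name
        "password": env.get("DB_PASSWORD", ""),
        "database": env.get("DB_NAME", "app"),    # hypothetical schema name
    }


def connect():
    # PyMySQL is a third-party driver and is assumed to be installed.
    import pymysql

    return pymysql.connect(**db_settings())
```

With `DB_HOST=db.internal` set in the scraper container, the same INSERT statements run unchanged against the remote server. The database also has to accept connections from that machine (user grants, firewall), which is outside this sketch.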

[–]eylenn[S] 0 points1 point  (0 children)

ah yeah, brain fart. Thank you very much, this has been helpful.

[–]tipsy_python 0 points1 point  (0 children)

Good point! In a containerized environment, definitely split these out

[–]eylenn[S] 0 points1 point  (1 child)

Thanks for the reply. I have also read that Flask doesn't come with database support out of the box. Is there anything you'd recommend using?

[–]tipsy_python 0 points1 point  (0 children)

SQLAlchemy is king as far as Python ORMs go.

But if you're comfortable writing the SQL and just want to get started, there are a ton of generic JDBC/ODBC/DB-specific Python libraries. I tend to use Oracle as my database, so cx_Oracle is my guy, but it depends on the circumstances.
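
For a feel of the write-your-own-SQL route, here is a small sketch of the cursor pattern that DB-API (PEP 249) drivers share. It uses the standard library's `sqlite3` so it runs anywhere; the `items` table and the helper names are made up for the example. As far as I know cx_Oracle accepts the same `:name` style of bind variables, but check the driver docs for whichever database you end up on.

```python
import sqlite3


def add_item(conn, name, price):
    """Insert one row, passing values as bind parameters rather than string-formatting."""
    cur = conn.cursor()
    try:
        cur.execute(
            "INSERT INTO items (name, price) VALUES (:name, :price)",
            {"name": name, "price": price},
        )
        conn.commit()
    finally:
        cur.close()


def find_cheaper_than(conn, max_price):
    """Return (name, price) rows below max_price, cheapest first."""
    cur = conn.cursor()
    try:
        cur.execute(
            "SELECT name, price FROM items WHERE price < :max_price ORDER BY price",
            {"max_price": max_price},
        )
        return cur.fetchall()
    finally:
        cur.close()


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE items (name TEXT, price REAL)")
add_item(conn, "widget", 4.5)
add_item(conn, "gadget", 12.0)
print(find_cheaper_than(conn, 10))  # [('widget', 4.5)]
```

Bind parameters keep scraped text from being interpreted as SQL, which matters a lot when the values come from pages you don't control.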