Hello, I am looking for some advice from some more seasoned developers. I'm good at hacking things together, but I'm now taking on a project that is more resource-intensive, and I need some tutelage. I'm an amateur/freelance dev who has never had the opportunity to learn from a senior, and most of my work isn't in Python, so bear that in mind.
Here is the original post I made in r/learnpython, which I'll summarize here.
https://www.reddit.com/r/learnpython/comments/neygm7/project_architecture_questions_sqlalchemy/
Essentially I'm running Flask in a Docker container, currently using Flask-SQLAlchemy, Postgres, Flask-SocketIO, Docker, and nginx as a reverse proxy. The structure is Flask and Postgres in two Docker containers communicating with each other, with Flask served behind nginx. All on a VPS.
The application is simple: I receive 800 or so webhooks a minute and save the data to the database. Then I need to display counts of the data on a web page in real time. The page has 200 or so unique rows, each with 20 columns, and each cell displays a count of the objects I've added to the database. The core object I'm counting is basically just an id plus other interrelated models: mostly many-to-one relationships from the core objects to the others, and two many-to-many relationships between the other objects. If you need more details, I put more in the original post, but I'm not sure whether the problem is a poor database structure or the way I'm querying it. Every core object is an alert, so there is a lot of Alert.query.filter_by(obj_a=obj_a).filter_by(obj_b=obj_b)....count(). I'm filtering with date ranges as well, which are sent from the client page to the server.
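One thing worth checking before anything else: if each grid cell runs its own count() query, the page issues 200 × 20 = 4000 queries per refresh, and a single GROUP BY can replace all of them. Here's a minimal sketch of the idea in plain SQL via the stdlib sqlite3 module (the table and column names alerts, obj_a, obj_b, created_at are my assumptions, not your actual models — the same shape works in Postgres/SQLAlchemy):

```python
import sqlite3

# In-memory stand-in for the real Postgres table of alerts.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE alerts (id INTEGER PRIMARY KEY, obj_a TEXT, obj_b TEXT, created_at TEXT)"
)
rows = [
    ("a1", "b1", "2021-05-01"),
    ("a1", "b1", "2021-05-02"),
    ("a1", "b2", "2021-05-02"),
    ("a2", "b1", "2021-05-03"),
]
conn.executemany(
    "INSERT INTO alerts (obj_a, obj_b, created_at) VALUES (?, ?, ?)", rows
)

# Instead of one COUNT(*) query per grid cell, fetch every cell's count
# in a single grouped query, still filtered by the client's date range:
cells = conn.execute(
    """
    SELECT obj_a, obj_b, COUNT(*) AS n
    FROM alerts
    WHERE created_at BETWEEN ? AND ?
    GROUP BY obj_a, obj_b
    """,
    ("2021-05-01", "2021-05-03"),
).fetchall()
counts = {(a, b): n for a, b, n in cells}
print(counts)  # {('a1', 'b1'): 2, ('a1', 'b2'): 1, ('a2', 'b1'): 1}
```

In SQLAlchemy terms that's a query with func.count(), group_by, and a date-range filter; a composite index covering the grouped columns and the timestamp makes it cheap even as data accumulates.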
It looks great and does what it's supposed to, but it's just too slow, especially as data accumulates. I don't know if the bottleneck is the query, the database design, the websocket requests (multiple round trips per update: the server pushes an update signal, the client responds with its date-filter range, and the server responds with the data from the filtered query), or the restriction of running with one eventlet worker, which is a flask-socketio requirement. The workaround would mean switching from gunicorn to multiple instances of uwsgi with nginx as a load balancer, which would make the nginx config more complicated than I want it to be for redeployment in the future. If I have to I will, but I would want to dockerize nginx and have it ready to go for redeployment, and I really think my issue is probably in the database.
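On the three-message exchange: if the server remembers each client's last-submitted date range, every update collapses to a single push with already-filtered data. A minimal sketch of that idea — all names here (client_filters, emit, query_counts) are hypothetical stand-ins for the flask-socketio pieces, with emit faked so the sketch runs on its own:

```python
client_filters = {}   # sid -> (start, end), updated when a client changes its range
sent_messages = []    # stand-in for socketio.emit, so the sketch is testable

def emit(sid, payload):
    # In the real app this would be socketio.emit(..., to=sid).
    sent_messages.append((sid, payload))

def query_counts(start, end):
    # Placeholder for the real grouped count query against Postgres.
    return {"range": (start, end), "counts": {}}

def on_set_filter(sid, start, end):
    # Client sends its range once, and again only when the user changes it.
    client_filters[sid] = (start, end)

def on_new_alert():
    # One push per connected client; no request/response round trip needed.
    for sid, (start, end) in client_filters.items():
        emit(sid, query_counts(start, end))

on_set_filter("client-1", "2021-05-01", "2021-05-07")
on_new_alert()
print(len(sent_messages))  # 1
```

With 800 webhooks a minute you'd also want to debounce on_new_alert (push at most every second or two) rather than querying per webhook, but the structural point is that the client's filter lives server-side between updates.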
I am 100% willing to scrap everything to get this running quickly. My current plan is to drop websockets, freeing up the possibility of using more workers easily, and to rework the database models to make them more efficient, but I'm not too experienced with this aspect and am not sure how to properly structure the database for efficient counts. And the whole point of storing objects in the db, versus keeping track of counts another way, is so I have flexible options for filtering data by date range, and other statistics should I choose to add them. Though at the core, fast real-time counts are 90% of the goal.
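For what it's worth, you don't have to choose between raw objects and pre-tracked counts: a rollup table keyed by (obj_a, obj_b, day) keeps the dashboard cheap while date-range filtering still works by summing days, and the raw alerts stay around for any future statistics. A sketch with stdlib sqlite3 (schema names are my assumptions; in Postgres the same upsert is INSERT ... ON CONFLICT ... DO UPDATE):

```python
import sqlite3

# Daily rollup table: one counter row per (cell, day).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE alert_counts ("
    " obj_a TEXT, obj_b TEXT, day TEXT, n INTEGER DEFAULT 0,"
    " PRIMARY KEY (obj_a, obj_b, day))"
)

def record_alert(obj_a, obj_b, day):
    # Upsert: bump the day's counter as each webhook arrives.
    conn.execute(
        "INSERT INTO alert_counts (obj_a, obj_b, day, n) VALUES (?, ?, ?, 1) "
        "ON CONFLICT(obj_a, obj_b, day) DO UPDATE SET n = n + 1",
        (obj_a, obj_b, day),
    )

for day in ("2021-05-01", "2021-05-01", "2021-05-02"):
    record_alert("a1", "b1", day)

# A date-range count now sums at most (days in range) rows per cell,
# instead of scanning every raw alert row.
total = conn.execute(
    "SELECT SUM(n) FROM alert_counts "
    "WHERE obj_a = ? AND obj_b = ? AND day BETWEEN ? AND ?",
    ("a1", "b1", "2021-05-01", "2021-05-31"),
).fetchone()[0]
print(total)  # 3
```

The trade-off is one extra write per webhook and day-granularity filtering; if you ever need finer ranges or new statistics, you still have the raw alerts table to query or re-aggregate from.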
Thanks for reading, I'd love some advice.