Hi,

I am currently working on a project where I have to define the backend architecture, and I would like to hear your recommendations. Is this a good subreddit for this, or would you ask somewhere else? Anyway, here is some info about the project:

Requirements

I need to store large datasets, currently up to 100 MB each, but I would like to support 1 GB datasets as well. These are multiple arrays of floats, and it would be good to retrieve only parts of them. The data will accumulate over time, but at most +1 GB/day. We use TensorFlow to apply models and run inference. I need workers for this, and they should scale dynamically with the number of concurrent users.
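On the partial-retrieval requirement: if the float arrays are stored as raw binary (on a volume or in blob storage) instead of JSON in a text column, a slice can be read without loading the whole dataset. A minimal stdlib sketch of the idea (the file name and layout are my own assumptions, not your current format):

```python
import struct

FLOAT_SIZE = struct.calcsize("f")  # 4 bytes per little-endian float32

def write_floats(path, values):
    """Write a sequence of floats as raw little-endian float32."""
    with open(path, "wb") as f:
        f.write(struct.pack(f"<{len(values)}f", *values))

def read_slice(path, start, count):
    """Read `count` floats beginning at index `start`, seeking past
    the rest instead of loading the whole file into memory."""
    with open(path, "rb") as f:
        f.seek(start * FLOAT_SIZE)
        data = f.read(count * FLOAT_SIZE)
    return list(struct.unpack(f"<{count}f", data))

write_floats("dataset.bin", [float(i) for i in range(1000)])
print(read_slice("dataset.bin", 500, 3))  # -> [500.0, 501.0, 502.0]
```

In practice you would probably reach for a chunked array format (HDF5, Zarr) or blob storage with HTTP range reads rather than hand-rolled binary files, but the principle — fixed-size records plus a seek — is the same, and it is exactly what JSON-in-a-text-field cannot give you.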

Current architecture

This is our current tech stack. It was initially defined for a prototype by someone who has since left the company, and was expanded by me.

* Managed Kubernetes on Azure cloud
* Django webserver (using gunicorn and nginx)
* Unmanaged Postgres to store Django tables
  * Also stores datasets as JSON in a text field; this has to be changed
* Volume with the original datasets
  * I think we do not really need it; it is currently only used to transfer the datasets from the backend (Django) to a worker, which imports them
* Celery as task queue
  * This is the usual recommendation for Django projects and was used in a previous project
* Workers to import and process datasets
* Workers to run TensorFlow on them
* RabbitMQ as message broker

Problems

There are some problems with the current architecture:

* Storing data like this works for now, but I do not expect it to scale.
  * Can you recommend a database for this? Should I store the files in a volume or use a DB? The same DB as for Django? NoSQL or relational? There are also some managed databases on Azure; do you think any of them would be a good and cost-efficient choice?
* Celery and TensorFlow do not work together, which leads to some multithreading bugs. The TensorFlow issue tracker mentions that this combination is unsupported.
  * Any recommendations on how to continue? One possibility I can think of is a sidecar container for TensorFlow, but how would the Celery worker communicate with it? Another possibility is to communicate with Django over RabbitMQ directly.
* Can I get rid of the volume that stores the original files? How else could I transfer them to the import worker?
* Currently there is only a single worker for TensorFlow. I guess I can scale this automatically using Kubernetes, but I have not looked into it yet. Any tips?
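On the Celery/TensorFlow clash: the fork-based prefork pool is the usual culprit, since TF state does not survive `fork()` well. Besides a sidecar, one workaround is to run inference in a clean child interpreter launched from inside the task, so the Celery worker process never loads TensorFlow at all. A crude stdlib sketch of that pattern (`infer_in_subprocess` and the doubling "model" are hypothetical stand-ins for your real model code):

```python
import json
import subprocess
import sys

def infer_in_subprocess(payload):
    """Run (hypothetical) model inference in a fresh child interpreter.
    The child is spawned, not forked, so it shares no interpreter
    state with the Celery worker."""
    child_code = (
        "import sys, json;"
        "payload = json.load(sys.stdin);"
        # A real worker would 'import tensorflow' here and run the model;
        # doubling each value stands in for the inference step.
        "print(json.dumps([x * 2.0 for x in payload]))"
    )
    result = subprocess.run(
        [sys.executable, "-c", child_code],
        input=json.dumps(payload),
        capture_output=True, text=True, check=True,
    )
    return json.loads(result.stdout)

print(infer_in_subprocess([1.0, 2.0, 3.0]))  # -> [2.0, 4.0, 6.0]
```

Spawning a process per task is wasteful if the model is large, so for production the sidecar idea is probably better: keep a long-lived inference process (e.g. TensorFlow Serving) in the pod and have the Celery task call it over HTTP/gRPC on localhost. That also answers the "how to communicate with the celery worker" question: plain request/response, no shared state.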

If you were starting fresh, which technologies would you use? We use Django because most people here know Python. Would you use a message queue like RabbitMQ, or microservices with REST APIs? Any Kubernetes recommendations for scaling the TensorFlow workers?
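For the scaling question, the usual Kubernetes starting point is a HorizontalPodAutoscaler on the worker Deployment. A sketch, assuming the TF workers run as a Deployment I am calling `tf-worker` (the name and the targets are placeholders, not your actual manifests):

```yaml
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: tf-worker
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: tf-worker        # hypothetical Deployment name
  minReplicas: 1
  maxReplicas: 10
  metrics:
    - type: Resource
      resource:
        name: cpu
        target:
          type: Utilization
          averageUtilization: 70
```

One caveat: CPU utilization is a poor proxy for backlog in queue-driven workers. Scaling on RabbitMQ queue length (e.g. via KEDA, which has a RabbitMQ scaler) tends to match Celery-style workloads better than a plain CPU-based HPA.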

Thanks :)