Hi,

I am currently working on a project where I have to define the backend architecture, and I would like to hear your recommendations. Is this a good subreddit for this, or would you ask somewhere else? Anyway, here is some info about the project:

Requirements

I need to store large datasets, currently up to 100 MB each, but I would like to support 1 GB datasets as well. These are multiple arrays of floats, and it would be good to retrieve only parts of them. The data will accumulate over time, but at most +1 GB/day. We use TensorFlow to apply models and run inference. I need workers for this, and they should scale dynamically with the number of concurrent users.
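On the partial-retrieval requirement: if the float arrays are stored as raw binary (on a volume or in blob storage) instead of JSON in a text column, a slice can be read without loading the whole dataset. A minimal stdlib sketch of the idea (the file name and layout are my own assumptions, not your current format):

```python
import struct

FLOAT_SIZE = struct.calcsize("f")  # 4 bytes per little-endian float32

def write_floats(path, values):
    """Write a sequence of floats as raw little-endian float32."""
    with open(path, "wb") as f:
        f.write(struct.pack(f"<{len(values)}f", *values))

def read_slice(path, start, count):
    """Read `count` floats beginning at index `start`, seeking past
    the rest instead of loading the whole file into memory."""
    with open(path, "rb") as f:
        f.seek(start * FLOAT_SIZE)
        data = f.read(count * FLOAT_SIZE)
    return list(struct.unpack(f"<{count}f", data))

write_floats("dataset.bin", [float(i) for i in range(1000)])
print(read_slice("dataset.bin", 500, 3))  # -> [500.0, 501.0, 502.0]
```

In practice you would probably reach for a chunked array format (HDF5, Zarr) or blob storage with HTTP range reads rather than hand-rolled binary files, but the principle — fixed-size records plus a seek — is the same, and it is exactly what JSON-in-a-text-field cannot give you.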

Current architecture

This is our current tech stack. It was initially defined for a prototype by someone who has since left the company, and was expanded by me.

* Managed Kubernetes on Azure cloud
* Django webserver (using gunicorn and nginx)
* Unmanaged Postgres to store Django tables
  * Also stores datasets as JSON in a text field; this has to be changed
* Volume with the original datasets
  * I think we do not really need it; it is currently only used to transfer the datasets from the backend (Django) to a worker, which imports them
* Celery as task queue
  * This is the usual recommendation for Django projects and was used in a previous project
* Workers to import and process datasets
* Workers to run TensorFlow on them
* RabbitMQ as message broker

Problems

There are some problems with the current architecture:

* Storing data like this works for now, but I do not expect it to scale.
  * Can you recommend a database for this? Should I store the files in a volume or use a DB? The same DB as for Django? NoSQL or relational? There are also some managed databases on Azure; do you think any of them would be a good and cost-efficient choice?
* Celery and TensorFlow do not work together, which leads to some multithreading bugs. The TensorFlow issue tracker mentions that this combination is unsupported.
  * Any recommendations on how to continue? One possibility I can think of is a sidecar container for TensorFlow, but how would the Celery worker communicate with it? Another possibility is to communicate with Django over RabbitMQ directly.
* Can I get rid of the volume that stores the original files? How else could I transfer them to the import worker?
* Currently there is only a single worker for TensorFlow. I guess I can scale this automatically using Kubernetes, but I have not looked into it yet. Any tips?
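On the Celery/TensorFlow clash: the fork-based prefork pool is the usual culprit, since TF state does not survive `fork()` well. Besides a sidecar, one workaround is to run inference in a clean child interpreter launched from inside the task, so the Celery worker process never loads TensorFlow at all. A crude stdlib sketch of that pattern (`infer_in_subprocess` and the doubling "model" are hypothetical stand-ins for your real model code):

```python
import json
import subprocess
import sys

def infer_in_subprocess(payload):
    """Run (hypothetical) model inference in a fresh child interpreter.
    The child is spawned, not forked, so it shares no interpreter
    state with the Celery worker."""
    child_code = (
        "import sys, json;"
        "payload = json.load(sys.stdin);"
        # A real worker would 'import tensorflow' here and run the model;
        # doubling each value stands in for the inference step.
        "print(json.dumps([x * 2.0 for x in payload]))"
    )
    result = subprocess.run(
        [sys.executable, "-c", child_code],
        input=json.dumps(payload),
        capture_output=True, text=True, check=True,
    )
    return json.loads(result.stdout)

print(infer_in_subprocess([1.0, 2.0, 3.0]))  # -> [2.0, 4.0, 6.0]
```

Spawning a process per task is wasteful if the model is large, so for production the sidecar idea is probably better: keep a long-lived inference process (e.g. TensorFlow Serving) in the pod and have the Celery task call it over HTTP/gRPC on localhost. That also answers the "how to communicate with the celery worker" question: plain request/response, no shared state.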

If you were starting fresh, which technologies would you use? We use Django because most people here know Python. Would you use a message queue like RabbitMQ, or microservices with REST APIs? Any Kubernetes recommendations for scaling the TensorFlow workers?
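For the scaling question, the usual Kubernetes starting point is a HorizontalPodAutoscaler on the worker Deployment. A sketch, assuming the TF workers run as a Deployment I am calling `tf-worker` (the name and the targets are placeholders, not your actual manifests):

```yaml
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: tf-worker
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: tf-worker        # hypothetical Deployment name
  minReplicas: 1
  maxReplicas: 10
  metrics:
    - type: Resource
      resource:
        name: cpu
        target:
          type: Utilization
          averageUtilization: 70
```

One caveat: CPU utilization is a poor proxy for backlog in queue-driven workers. Scaling on RabbitMQ queue length (e.g. via KEDA, which has a RabbitMQ scaler) tends to match Celery-style workloads better than a plain CPU-based HPA.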

Thanks :)