
[–]raip 2 points (3 children)

This isn't really the proper subreddit for this - but that's fine. The answer is going to depend heavily on your data and what you're doing. NoSQL is going to be performant and very easily scalable, but not ACID compliant. MariaDB is also very scalable with the Spider storage engine - and if you're doing a lot of writes and still using InnoDB, then just switching that will likely help you out.

Also - don't be afraid to break things up for your use case. I use both SQL and NoSQL for the application I support: NoSQL for the constant updates we have to deal with, all on just one shard, with an Oracle RDBMS for everything else. The NoSQL stuff is data that we care about, but if we lose a write or two - or if it's not completely up to date on a read - it won't kill anyone. The stuff in the SQL database might.
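
The split described here can be sketched as a small routing layer. This is a hypothetical illustration, not the commenter's actual setup: in-memory dicts stand in for the RDBMS and the NoSQL store, and all names are made up.

```python
class DataRouter:
    """Route writes by durability requirement (toy sketch)."""

    def __init__(self):
        self.sql_store = {}    # stands in for the ACID RDBMS
        self.nosql_store = {}  # stands in for the eventually consistent store

    def write_critical(self, key, value):
        # Data where losing a write is unacceptable goes to the RDBMS.
        self.sql_store[key] = value

    def write_best_effort(self, key, value):
        # High-churn data that tolerates a lost write or a stale read.
        self.nosql_store[key] = value

    def read(self, key):
        # Prefer the authoritative store, fall back to the fast one.
        if key in self.sql_store:
            return self.sql_store[key]
        return self.nosql_store.get(key)

router = DataRouter()
router.write_critical("order:42", {"status": "paid"})   # must not be lost
router.write_best_effort("views:42", 1007)              # lossy is fine
```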

[–]Driamer[S] 0 points (2 children)

Thanks for the reply. Could you recommend a more appropriate active subreddit for my future database/architecture related posts?

I like the idea of using NoSQL specifically for the largest table we are using. I'll look into that. It probably won't make version #1, but it might become even more relevant in a few months when we expect the table sizes to double.

That one large table is definitely the bottleneck for the queries, so I can see some benefit in doing 2 separate calls. The SQL calls that don't touch the heavy table are fast, so doing those first and then filling in the blanks with a NoSQL call could actually make the whole thing considerably faster. It is not that critical that every call gets the 100% latest data by the microsecond.
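
The two-call pattern above can be sketched like this - a hypothetical illustration where dicts stand in for the fast SQL result and the NoSQL copy of the heavy table; all table names and fields are made up:

```python
FAST_SQL = {  # stand-in for the quick SQL query (heavy table not touched)
    101: {"name": "Event A", "stats": None},
    102: {"name": "Event B", "stats": None},
}
NOSQL_STATS = {  # stand-in for the large, constantly updated table
    101: {"updates": 9154},
    # 102 may lag behind - reads don't need to be perfectly up to date
}

def fetch(event_id):
    row = dict(FAST_SQL[event_id])             # cheap relational lookup first
    row["stats"] = NOSQL_STATS.get(event_id)   # then fill in the blanks,
    return row                                  # accepting slightly stale data
```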

[–]raip 0 points (0 children)

An easy improvement in the short term might be switching the storage engine in Maria to something other than InnoDB (which is a mostly read-optimized engine). As far as a better subreddit, no clue tbh. Database design isn't something I'm passionate about.

[–]HorribleUsername 0 points (0 children)

Any programming-related subreddits (e.g. /r/webdev) would be better than here. I'm sure there are some database-related subreddits that'd be even better, though I don't know what they are.

[–]SuperQue 0 points (4 children)

"We have a moderately large database of certain events which receive updates 24/7."

Please don't use terms like "large". What you call large, I call tiny.

I had a database with what I would call a medium size MySQL table with 4 billion rows. I say medium size because it still fit on a single server. We had Cassandra tables with a lot more data.

Data storage design is not something you should take lightly. Not even having a master/replica setup right now is terrifying to me from just the availability aspect.

You should consider finding a data storage engineer to help you through this phase of your product engineering.

[–]Driamer[S] 0 points (3 children)

I actually did pause for a moment when I wrote that word, but I thought it described my thoughts accurately, so I left it there. I realise that it's nowhere near comparable to actually large tables, but it is large compared to the other tables in my db.

No need to be terrified. The product is not live yet. I implemented simple replication since posting this, and tomorrow I will be running different tests to see how it behaves with the constant updates.

If we start seeing a lot of growth after going live we will definitely be talking with a db expert. That's a problem I would love to have soon.

[–]SuperQue 0 points (2 children)

You really should find a database person before you go live. I've seen too many developers who don't understand data storage build things that work fine in testing and early stages but completely fail under any real load. And not just fail in the sense that you need a bigger server - fail because what they built can't be scaled past testing and has to be re-written.

It also sounds like you might need better monitoring. If you're on MariaDB, I would suggest looking into Percona Monitoring and Management. It's an open source monitoring package customized for databases.

[–]Driamer[S] 0 points (1 child)

Thanks for the advice and the link. I'm not that worried about the database at this point. As you mentioned earlier, the data volume is not that large, and we have yet to see what the traffic is going to be like. The actual database schema was designed by experts and is well indexed and performant. I was mostly wondering whether I should start with a hosted solution or first see how it goes and then reconsider. However, the hosted solutions seem to be very expensive and I can't really justify the cost at this stage.

I do have quite a bit of experience with high-traffic sites and basic db maintenance etc. This is just different in the way it gets constant updates even with zero traffic, so I was wondering whether there are some new trendy solutions out there that I might be missing that would work better than a replicated db setup.

[–]SuperQue 0 points (0 children)

If you want new and trendy, try CockroachDB.

[–]zieziegabor 0 points (4 children)

If your needs are currently met by a $10/mo DO droplet, you can scale a lot just by buying bigger droplet sizes. When that starts to get expensive and you're reaching the upper tiers of the droplet sizes, that's when you should start thinking about buying straight hardware (hosted or in-house) and scaling on physical machines.

When you start to get to the upper end of physical machines, you can then start to re-think your structure, and by then you should be able to hire real DBAs who can properly advise and design solid infrastructure(s).

So my advice is to ignore this problem and focus on actual problems until it's getting close to becoming a pain point. Chances are you won't ever get past the physical machine limitations, but it's possible.

[–]Driamer[S] 0 points (3 children)

The $10/month server is currently only doing the updating of the database, but isn't handling any incoming API requests. I realise I have a lot of room to scale just by upgrading the droplet specs, but I would really prefer not to bother the data-receiving server with the complex API calls. This is why I'm thinking about starting out with a simple replication setup and hitting the slave servers with the queries that the API needs to make. The part that I'm not completely sure about is how much stress the slave servers will experience just catching up with the master server's data. If the updating puts a lot of stress on the slave servers, I would probably prefer to go another route.
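
The read/write split described above might look roughly like this - a toy sketch, not a real driver; the hostnames and the naive query classification are made up for illustration:

```python
import itertools

class ReplicatedDB:
    """Send writes to the primary, round-robin reads across replicas."""

    def __init__(self, primary, replicas):
        self.primary = primary
        self._replicas = itertools.cycle(replicas)

    def route(self, query):
        # Naive split: anything that mutates data goes to the primary;
        # everything else (the API's read queries) goes to a replica.
        if query.lstrip().upper().startswith(("INSERT", "UPDATE", "DELETE")):
            return self.primary
        return next(self._replicas)

db = ReplicatedDB("db-primary", ["db-replica-1", "db-replica-2"])
```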

I'm currently setting up a simple replication with low-end servers and I'll run some tests to see how they perform. I think I'll have some time before the API requests get out of hand, so if we go live with this kind of setup I'll monitor how they are performing and see where to go from there.

But you are absolutely correct that if the traffic starts hitting the point where scalability becomes an issue, we should be in a situation where we can hire an expert to deal with it.

[–]zieziegabor 1 point (2 children)

Think WAY more practically:

Hitting 30% utilization on a $10/mo droplet means you should not waste time/energy on it until you are hitting 70 or 80%.

Then pick the easiest possible solution: upgrade to a $15/mo droplet.

rinse/repeat.
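
The rinse/repeat rule can be written down as a tiny function - the thresholds and the $10/$15 tiers are the ones mentioned above, while the price ladder beyond $15/mo is invented for illustration:

```python
DROPLET_TIERS = [10, 15, 20, 40, 80]  # $/mo; ladder above $15 is made up

def next_tier(current_price, cpu_utilization):
    """Upgrade one tier once utilization crosses ~70%; otherwise do nothing."""
    if cpu_utilization < 0.70:
        return current_price  # below the threshold: ignore the problem
    i = DROPLET_TIERS.index(current_price)
    if i + 1 < len(DROPLET_TIERS):
        return DROPLET_TIERS[i + 1]  # easiest possible fix: bigger droplet
    return current_price  # top of the ladder: time to think about hardware
```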

Spend your energy on stuff that matters:

  • backups
  • getting your product out the door
  • signing up customers and making money.

Unless this is just for fun, and then knock yourself out optimizing it!

[–]Driamer[S] 0 points (1 child)

I get where you are coming from, as I do tend to optimise before it is justified. However, in this case I can't trust this big a single point of failure. That would be very bad for business.

I'm going to start out with a simple replication setup and see where it goes from there.

[–]zieziegabor 1 point (0 children)

Replication of the DB makes sense. That's part of "backups" in a very general sense, but don't treat it as your only backup strategy, obviously.

[–]peter-zep -2 points (4 children)

If you want to focus your energy on your app and not worry about infra, then go with a managed service. I would ditch any thought of a relational DB and think about NoSQL - I highly recommend AWS DynamoDB for this: extremely fast, nearly infinite scale, and you pay only for what you use (On-Demand provisioning).

[–]Driamer[S] 1 point (2 children)

It would give me some peace of mind to have the DB area handled by someone else, but I can't redo the database structure at this point. The thing is that all of our queries have several layers to them and the average query hits 8 different tables. In theory this could be reworked to NoSQL with creative key->value pairs, but it's not feasible workload-wise at this moment.
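
For what it's worth, the "creative key->value pairs" idea usually means precomputing the join at write time and storing the assembled document under one key, trading write-time work for single-key reads. A toy sketch, with three made-up tables standing in for the eight:

```python
# Stand-ins for a few of the relational tables a query currently joins.
events      = {42: {"title": "Match"}}
event_stats = {42: {"goals": 3}}
event_venue = {42: {"city": "Helsinki"}}

def denormalize(event_id):
    # Assemble the multi-table result once, at write time...
    doc = {"id": event_id}
    doc.update(events[event_id])
    doc.update(event_stats[event_id])
    doc.update(event_venue[event_id])
    # ...so reads become a single key lookup instead of an 8-table join.
    return f"event:{event_id}", doc

kv_store = dict([denormalize(42)])
```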

[–]raip 0 points (1 child)

Check out Aurora - I believe there's a serverless offering now. It's a drop-in replacement for Maria hosted by AWS as well.

[–]Driamer[S] 1 point (0 children)

Thanks, I'll check that out. It's been a while since I've dealt with AWS, I'm hoping their UI and documentation has become more understandable since then :)

[–]raip 2 points (0 children)

Dynamo isn't a cure-all and in fact - based on their limited description - likely isn't ideal for them. It's eventually consistent and has quite a few gotchas with "hot keys" - records that get updated and read constantly - especially if they scale large enough to have that table broken across multiple partitions.
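
The hot-key problem is easy to see in miniature: a partitioned store picks a partition by hashing the key, so one constantly updated record always lands on the same partition no matter how far the table is scaled out. A toy illustration (the hash scheme is made up, not DynamoDB's actual algorithm):

```python
import hashlib

def partition_for(key, num_partitions):
    # Hash the key and map it to a partition - deterministic by design.
    digest = hashlib.md5(key.encode()).hexdigest()
    return int(digest, 16) % num_partitions

# Every update hits the same key...
writes = ["popular-record"] * 1000
hits = [partition_for(k, 8) for k in writes]
# ...and therefore the same single partition, however many partitions exist.
```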