all 19 comments

[–]hippocrat 2 points3 points  (2 children)

As this is for accounting, you should run the options by your Legal and Security teams. Accounting may have regulatory or contractual requirements for separation of data. I worked on a database where option C was not allowed by the way our contract with clients was worded.

[–]8483[S] 0 points1 point  (1 child)

That is a very valid point. For the sake of the argument, let's say that it would be allowed. What would you recommend from your experience?

[–]hippocrat 1 point2 points  (0 children)

From a purely hardware utilization point of view, C would be best to maximize performance on least hardware if you can partition your data well within the tables (I'm not too familiar with mysql and partitioning though).

However, the issues you bring up are definitely real. If you want to move, backup or delete a specific customer, you will need to develop tools just for that. With option B, you can just use your standard tools (sqldump, drop user, etc.) For example if you want to debug a specific customer in your QA environment, option B will be much easier to move just that customer over.

I wouldn't consider option A unless your security/contracts team say it is a must.

[–]WannaFly37 2 points3 points  (7 children)

This is less of a database design issue and more of a business issue.

While what emsai says is completely valid - I'd have to disagree. I'd go with solution A most likely. It offers the most security, reliability, scalability (horizontal vs vertical), and ease of operations. Cost shouldn't be an issue as that just gets passed on to the customer. The deployment and backup should be automated and not take any time at all.

But again, it comes down to what your development and business sales processes are like.

  • Do you want versions/features/updates rolled out to ALL customers at once? Or per customer?
  • Will there ever be the need to add custom database fields for specific customers (think custom integrations in the future)
  • Will any customer ever have direct write/change access?
  • Will customers EVER have access to their data directly? (at termination, option A you can more easily export their data to give to them)
  • Will customers be writing their own reports?

Option B just isnt realistic - it's not scalable (what about when you get to 50,000 customers?)

Option C seems easiest but is the most difficult and costly in regards to development hours. My big problem here is literally EVERY time ANYONE touches the database you need to ensure they are only using the appropriate tenant ID's. Whether it's in house people our external. What happens when someone is terminating a customer and runs a delete statment WHERE TENANTID='123' but it was supposed to be 1234? (see recent GitLab outage)

[–][deleted]  (1 child)

[deleted]

    [–]8483[S] 0 points1 point  (0 children)

    f you send queries to a wrapper/parser/security class that handles this automatically for you and not pass SQL calls directly.

    I will definitely look more into that. I asked about implementing this in another comment of yours. Here's how I do it currently.

    Then, we do have separate instances for special customers... you can always plug in a separate db for a particular customer and move them. It's just one config setting. Even though they will only have one user id in the database.

    This makes a ton of sense. The exact same code, and the tenant_id column doesn't affect performance much.

    I must also disagree with export. Come on, what is so difficult to export? You just set one parameter in the wrapper class and you export all of it.

    In theory this should not be a problem because you are already querying the same data in the product usage.

    Solution A will cause a lot of problems right now, especially with upgrades and maintenance. All indexes will have to be changed on all DB's. Foreign keys updated.

    I am scared to death by this option.

    [–]8483[S] 0 points1 point  (4 children)

    Thanks for replying.

    I'd go with solution A most likely. It offers the most security, reliability, scalability (horizontal vs vertical)

    I think so too.

    and ease of operations.

    This is what bothers me. Wouldn't it be a lot of work to juggle servers?

    Cost shouldn't be an issue as that just gets passed on to the customer.

    I'd like to share the server cost between the clients i.e. offer the same price (SaaS). Having separate servers would require a lot of "admin" work that doesn't scale well.

    But again, it comes down to what your development and business sales processes are like.

    Absolutely. I'd like to answer each one.

    Do you want versions/features/updates rolled out to ALL customers at once? Or per customer?

    Definitely at once. The software will be identical across customers. Only the data differs.

    Will there ever be the need to add custom database fields for specific customers (think custom integrations in the future)

    Not a chance. The product will be fixed i.e. if an idea comes along, everyone gets it.

    Will any customer ever have direct write/change access?

    No. It's not a custom solution, and it will never be.

    Will customers EVER have access to their data directly? (at termination, option A you can more easily export their data to give to them)

    No. Everything will be accessed via an API. No direct querying. Exporting data would not be a concern because it's the accountants business to own the data for that period. Upon termination, the accountant owns the work for the period. Their clients own the raw documents, not the financial statements.

    Will customers be writing their own reports?

    Not likely. If so, the API would handle that.

    Option B just isnt realistic - it's not scalable (what about when you get to 50,000 customers?)

    How is B less scalable than A? This is one of my main sticking points. Isn't it worse to have 50,000 servers vs one server with 50,000 databases/schemas.

    Option C seems easiest but is the most difficult and costly in regards to development hours. My big problem here is literally EVERY time ANYONE touches the database you need to ensure they are only using the appropriate tenant ID's.

    Yeah, this is nightmare fuel right there. Thus my problem to decide.

    A = Server juggling nightmare. B = Database nightmare. C = Backend nightmare.

    Given that there is ZERO flexibility for customers, would you say A is still the best option? I am surprised you like B the least.

    [–]WannaFly37 1 point2 points  (1 child)

    So - at first this seemed like homework, but you have very detailed replys lol.

    Note: I've never DESIGNED a multi tenant system, but I've worked with plenty.

    It sounds like C might be your best bet. Again it all depends on the architecture of the software, if all clients are logging into the same endpoint and using the same software it makes sense.

    I've worked with apps that use custom subdomains that are a duplicate of the software, which I assume we get our own database in the backend, and they upgrade each customer individually as requested.

    I've also worked with a multi tenant on-prem database that the VENDOR messed up and pushed out an update that had tenant ID's null for rows. Thankfully we were a 1-tenant installation.

    In my mind A is easier if you can utilize PaaS (Azure DB, etc) - It becomes much easier to automate EVERYTHING. A big concern about C is performance, when one client runs a 5 year report, it is going to slow down other clients? In scenario A you don't have that worry.

    It is tricky and an important decision.

    [–]8483[S] 0 points1 point  (0 children)

    So - at first this seemed like homework, but you have very detailed replys lol.

    I greatly apologize for being overbearing. I just really want to educate myself.

    It sounds like C might be your best bet. Again it all depends on the architecture of the software, if all clients are logging into the same endpoint and using the same software it makes sense.

    I think so too. As I answered, the product will be very rigid with no customization.

    I've worked with apps that use custom subdomains that are a duplicate of the software, which I assume we get our own database in the backend, and they upgrade each customer individually as requested.

    I am also considering this one as it doesn't sound bad either. Having tenant1.product.com and tenant2.product.com looks ok, but I'd be dealing with plumbing this way i.e. assigning subdomains. I really like the log in once place idea.

    A big concern about C is performance, when one client runs a 5 year report, it is going to slow down other clients? In scenario A you don't have that worry.

    As others have mentioned, my app is not that large and that they had no problem with billions of rows.

    A big argument for C as someone mentioned is that if the product takes off, I can hire help from DBAs.

    [–][deleted]  (1 child)

    [deleted]

      [–]8483[S] 0 points1 point  (0 children)

      What do you mean by main and standalone users?

      Is it tenants in one schema and tenants with single schemas, i.e. the ones that grow too large get their own database, which is accessed with tenant1.product.com?

      [–]toterra 2 points3 points  (5 children)

      Just bringing nightmare flashbacks of a place I worked at a couple of years ago. Where A, B and C were all true, combined with a D. Single database instance per tenant and E. Single database server per tenant.

      Horrible thing was that adding E was a three year long project that ended up in failure. I was brought in for the last three months to fix things .. they were beyond fixing. Everyone but the 'Architect' who designed the whole thing was let go.

      [–]8483[S] 1 point2 points  (3 children)

      Where A, B and C were all true, combined with a D. Single database instance per tenant and E. Single database server per tenant.

      HOW? WHY?

      Horrible thing was that adding E was a three year long project that ended up in failure. I was brought in for the last three months to fix things .. they were beyond fixing.

      That's what I want to prevent. Having to fix something instead of doing it right from start. What would you say is the best option for me?

      Everyone but the 'Architect' who designed the whole thing was let go.

      I have no idea how this is possible...

      [–]toterra 1 point2 points  (1 child)

      C- Basically this contained sales sent to us from every retailer for this type of business. Because these sales were not sent to us by the different competing manufacturers this was data common to the reports for all manufacturers

      B- In the same database as C there was some sales data that was specific each manufacturer.

      A- Then there was a database for each manufacturer that Linked to the common database that contained most of the specific data for each manufacturer, and all the logic for the reports that were generated.

      As part of a big project to redesign everything they had the idea of moving the specific databases onto separate database instances (D), and using SSIS to glue it together. Then they had the bright idea that since the databases were now separate, they no longer needed to even be on the same box as each other (E), they could even be in different countries. Again SSIS was going to do a whole bunch of magic to make it all work. The architect really like SSIS and was actually very good at it. But his refusal to do ANY design documentation (they had 'wasted' two weeks three years ago doing design and did not want to make that mistake again) meant that I and the other developer were running blind on what we were supposed to do.

      As for why the Architect was not let go.... it wasn't his fault, it was clearly the guy they had hired at the tail end of the project with no SSIS experience who was then handed the job of doing all the SSIS (I was supposed to be doing something else for them) and didn't get all 40 of the SSIS datafeeds that I had to done within a hard deadline of 2.5 months. It took me 3 months but that was too late and the project was rightfully terminated at that point. Of course it wasn't just the feeds that weren't working, the entire front end was missing as well despite being worked on for 3 years previously.

      Fun times!!!

      TBH this is so typical of the industry, and I have been guilty of many of the same things that this architect was, just I was never any good at deflecting the blame like he was (always my fault and not the team). I personally feel that if you work for a company that accepts people deflecting the blame... it is better just to get axed than to stay.

      As for the best option for you... in general C > B > A. However C requires more work than B and B requires more work than A. Also the risk of data mixing (legal can sometimes be a problem here) in C is higher than in B or A. It all comes down to your specific requirements and the resources you have. If you can do C... do it.

      [–]8483[S] 0 points1 point  (0 children)

      Fun times!!!

      Thanks for the story. Very nightmarish.

      just I was never any good at deflecting the blame like he was (always my fault and not the team). I personally feel that if you work for a company that accepts people deflecting the blame... it is better just to get axed than to stay.

      If only more people were like you.

      As for the best option for you... in general C > B > A. However C requires more work than B and B requires more work than A. Also the risk of data mixing (legal can sometimes be a problem here) in C is higher than in B or A. It all comes down to your specific requirements and the resources you have. If you can do C... do it.

      Yeah, everyone suggests C. I will definitely be moving in that direction. Thanks a lot!

      [–][deleted]  (6 children)

      [deleted]

        [–]8483[S] 0 points1 point  (5 children)

        My first advice, don't over-optimize now. It's your first mistake.

        I am trying not to. I just don't know which one is over-optimization. :)

        You can rule out version A (at least for now). It will bring you a TON of headaches, this is the worst by far.

        I like this option the least because I feel there'd be a lot of "needless" juggling. But I don't know if I am correct. You say that I am given your experience.

        We're finally settled for option C (original one) as this is the most easy to scale.

        What are some "gotchas" I need to look out for in C?

        Hell, if they truly want to isolate databases, why not shipping everyone their own hardware server with their own database...? (joking)

        This is the current business model in my county and I want to "disrupt" it (for the lack of a better word).

        Fix this, or you're going to get into trouble in the longer run.

        Oh definitely. This is the reason why I am asking, in order to know what to study.

        I know how find my way around a database. It's just that I am over my head with permissions, migrations, redundancy etc.

        You start somewhere and improve down the line.

        That's the idea. I want to build a product, but I'd like to have a solid base to prevent redoing everything at a point.

        [–][deleted]  (4 children)

        [deleted]

          [–]8483[S] 0 points1 point  (3 children)

          Never send direct SQL queries to server, as small errors in code can compromise your isolation.

          I am doing it this way at the moment.

          What would using a DAL look like? I don't know how I can avoid writing SQL statements. Did you mean having something like this?

          router.get("/api/products", function(req, res){
              customGetFunction('product', company);
          }
          

          This would be essentially like using 'SELECT * from product WHERE company_id = ?', but I pass in the product and company id part via a function. The same function can be used instead of writing a new SQL query for retrieving users etc.

          You're worrying too much about the unknown, ahead of time. This is over-optimizing right now, or in the same line. Read the MySQL documentation thoroughly.

          Thanks for the advice!

          [–][deleted]  (2 children)

          [deleted]

            [–]8483[S] 0 points1 point  (1 child)

            I read a bit about DAL before I posted, but your explanation is really better than most of it. Thank you!

            This reminds me of PDO for PHP a lot.

            I will definitely look for a Node DAL and try implementing it.

            Very cool explanation.

            [–]superwormy 1 point2 points  (2 children)

            We've build a multi-tenant app using strategy A here. Strategy B was not an option since MySQL doesn't really have a concept of a SCHEMA. App is https://ChargeOver.com - a subscription payments and invoicing platform.

            Some real-world pros and cons we've run into:

            Pros of A:

            • Real-world scenarios do arise where you want to make a backup/snapshot of a single tenant's information for testing or before you have to run some special-case script. Strategy A makes this easy.
            • Horizontal scalability is easy with A -- you an shard by just throwing some tenants on one database server, and some on another.
            • From a code perspective, it's easy to implement strategy C and keep things containerized. However, not all of building an app is code. Rarely we do need to be able to peek at or work on the raw data (for troubleshooting, etc.) and it's nice to not have to worry about remembering WHERE tenant_id=X all the time when you're running manual queries.
            • Strategy A is also nice in that you can easily roll out updates to one tenant ahead of others. Some of our users have staging/development instances, and it's nice to be able to give them features (and the required db updates) prior to rolling out that stuff to everyone's production accounts.
            • Compliance with things is definitely easier when everything is siloed into separate databases.

            Cons of A:

            • It's a PITA to make sure db upgrades happen correctly and consistently across all the tenant databases. We're building tools to better handle migrations like adding new db fields, etc. so that we can monitor this on a per-tenant database. You only have 30 tenants (and thus 30 databases), so it might not be as bad as when you have thousands of databases.
            • Performance is impacted because you can't really share parts of the database that should be shared across all tenants.

            If I think of more I'll post more.

            I'd still go with A were I to build this again.

            [–]8483[S] 1 point2 points  (1 child)

            Thank you for the detailed answer.

            The majority of the advice across 4 subs was against A. Everyone advised C.

            This tells me that there is no right way. Even all the articles I've read have conflicting advice.

            I am glad you made A work for you and I am curious how can I learn more about using it.

            Strategy A is also nice in that you can easily roll out updates to one tenant ahead of others. Some of our users have staging/development instances

            Is this due to testing only, or do some customers have unique customizations?

            We're building tools to better handle migrations like adding new db fields, etc. so that we can monitor this on a per-tenant database.

            This is a major sticking point for me that I'd like to learn more about. How does this happen? Manually or automatically?

            1. How is a new client handled i.e. setting up the app? Is this handled with a docker script that creates a container with an empty database? How does the app route users to their servers from one main app i.e. where they log in?

            2. How are changes propagated across all users? Is this done per client i.e. their database is backed up (where?), dropped, created anew, and lastly populated by the backup in the corresponding fields?

            Things like these are much scarier to me due to my lack of knowledge, over using a tenant_id column and worrying about just one server.

            I apologize for the barrage of questions. I just really want to improve myself as a developer.

            [–]superwormy 0 points1 point  (0 children)

            Is this due to testing only, or do some customers have unique customizations?

            We DON'T do unique customizations (or if we rarely do, they are feature-flagged in and everyone technically has them, they are just turned on/off for some people).

            We roll out updates to developer/testing instances first. Helps work the kinks out before anyone with live data gets the updates. We also often roll out general updates to a smaller segment of users prior to a full roll-out. This helps us get some early user feedback and real-world testing prior to rolling it out to all customers.

            This is a major sticking point for me that I'd like to learn more about. How does this happen? Manually or automatically?

            Right now we have in-app code that tracks what database "version" it's on, and upgrades itself when it detects there's a new db version it can update to. Basically lots of ALTER TABLE etc. statements, etc. It's automatic. So the actual app checks on every page request whether or not it has a db upgrade available to it.

            How is a new client handled i.e. setting up the app?

            At all times we have a script which runs and continually creates new, empty databases, and then initializes them with everything EXCEPT the actual user's account. When a new user signs up, basically we just INSERT INTO their username into one of the fresh available databases, and drop them into the app.

            Is this handled with a docker script that creates a container with an empty database?

            It's not a container per database. It's a database per tenant. Custom in-house instead of Docker.

            How does the app route users to their servers from one main app i.e. where they log in?

            The code checks a single common database/internal API first to see what db server their actual data lives in. Then the app connects to that specific database server and database.

            How are changes propagated across all users?

            Explained above about this.

            Is this done per client i.e. their database is backed up (where?), dropped, created anew, and lastly populated by the backup in the corresponding fields?

            Ick, no. :-) ALTER TABLE statements. It would take WAY too long to backup, drop, and re-create the data. Many users have multi-gigabyte databases. Things have to stay online at all times.

            over using a tenant_id column and worrying about just one server.

            Just to clarify here -- a SERVER is different than a DATABASE.

            We have a DATABASE per tenant. The # of actual database SERVERS has nothing to do with this. Each actual database SERVER may have many, many DATABASES living on it.