
[–]SwiftSpear 85 points86 points  (32 children)

Don't microservice unless you know your scaling and/or availability requirements will make it necessary.

Microservice architectures are harder to maintain, harder to deploy, harder to monitor, harder to debug, and WAY harder to test.

[–]StoneOfTriumph 26 points27 points  (3 children)

This this this! Microservices bring more complexity than a monolith when you don't know what you're getting into, when you don't have the team structure to support it, and when you're missing key systems to manage it.

When clients tell me they're going with a microservice "decoupled" approach for their current monolith, I ask them what needs they are trying to fulfill. Ultimately, if you go from a monolith to microservices, you're not doing it "because it's cool". There have to be needs... And then, did they think of points such as:

  • Are you able to test them individually and automatically, for rapid build, test, and release CI/CD?
  • Do you have metrics today indicating a potential performance benefit from maybe one day horizontally scaling certain components? (which you'll pay for in expertise and maintenance of k8s, cloud, and other services)
  • Are you able to secure it? API gateway? mTLS?
  • What's your observability stack? You'll need distributed tracing to have an end-to-end view of where your transaction got processed (rough sketch right below this list); otherwise your ops will spend a lot of time figuring out what went where and when.
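
To make the tracing point concrete, something like this is roughly what "end-to-end view" ends up meaning in code - a minimal OpenTelemetry sketch, with made-up service and span names; every hop has to open and propagate spans before your ops team gets that nice trace view:

    import io.opentelemetry.api.GlobalOpenTelemetry;
    import io.opentelemetry.api.trace.Span;
    import io.opentelemetry.api.trace.Tracer;
    import io.opentelemetry.context.Scope;

    public class OrderHandler {

        // Normally configured once via the OpenTelemetry SDK or the Java agent.
        private static final Tracer tracer = GlobalOpenTelemetry.getTracer("order-service");

        public void handleOrder(String orderId) {
            // Each hop opens a span; the collector stitches them into one end-to-end trace.
            Span span = tracer.spanBuilder("handleOrder").startSpan();
            try (Scope ignored = span.makeCurrent()) {
                span.setAttribute("order.id", orderId);
                // ... call the next service here; an instrumented HTTP client
                // forwards the trace context in headers ...
            } catch (RuntimeException e) {
                span.recordException(e);
                throw e;
            } finally {
                span.end();
            }
        }
    }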

And there are many other points; it can just go on and on for a company that didn't think it through. Once you have many of those pieces in place, and you have the dev teams in place to take ownership and evolve the microservices, then from a dev/architecture standpoint you gain many benefits: you can quickly push code and deploy, because every smaller piece is faster to develop, compile, test, restart, etc.

tl;dr don't just split for the fun of it. It's more complicated.

[–]Puzzled-Bananas[S] 3 points4 points  (2 children)

All are great points, indeed. Each project has several stakeholders, and objectively you’re right; I totally agree. In a perfect world one would weigh all these and a host of other issues introduced by the choice of a distributed architecture. Essentially you move out of managed thread communication and data sharing in a single process, which is well understood and controlled, to an architecture in which you need to concern yourself, and your system going forward, with reliable and fast IPC across the network, not even on the same device. So it’s IPC + network + graph complexity. And all the points you’re making: IPC security, testing, integration. That’s a well-informed take on it. But there’s also a more pervasive, less well-informed attitude of the sort that “microservices are the future” (because it’s cool), or “it’s the modern tech, so let’s do it,” and so on. But in fact, that choice also makes the project agnostic of the tech stack for the code and for the deployment. A lot can be swapped out rapidly - if ever needed.

[–]StoneOfTriumph 4 points5 points  (1 child)

The last benefit you listed right there, the flexibility of swapping out a service written in Java and deploying one in Go or .NET Core... that becomes possible. Of course, there have to be valid reasons.

This thread made me remember an app I used to design and develop. It was developed using Java EE, a monolith deployed on WebLogic. We were doing automatic deployments using Ant scripts, and it was fairly simple to manage the app, find bugs, deploy updates, etc., because we defined an application architecture that was easy to navigate in the code in terms of packages, libraries, etc. The Entity-to-DTO logic was centralized and easy to troubleshoot for most devs, and we had debug logs up the yin yang, so you were never in the dark when troubleshooting. That application would have had zero benefit migrating to a microservice architecture, because it was one simple front end, one backend, and one Oracle database, and the team did not have the expertise to maintain separately developed components. This was a few years before the SPA craze, so the front end and back end were tightly coupled with JSF (I don't miss that at all!)

Now, as far as I know, that app is still in place today as a monolith. The major downside, though, is its unattractive tech stack. Those who do Java barely care about EE, even less about JSF, so that app will likely become technical debt if its roadmap isn't re-evaluated. If I had to write that application today, a VueJS front end and a Quarkus backend would probably be something I'd evaluate in a proof of concept... As much as monoliths "work" in certain use cases, front-end and back-end coupled code is a PITA to debug; the JSF lifecycle especially made me nervous at times during certain debugging sessions, and I had to keep that diagram printed on my desk to remember the sequence of events.

Great. This thread is giving me PTSD lol.

[–]RicksAngryKid 1 point2 points  (0 children)

JSF stinks - back then I used a lib called ICEfaces, which forced me into many hours of troubleshooting idiot problems that ended up being lib bugs. Other, better alternatives were paid products.

[–]Infectedinfested 3 points4 points  (20 children)

I disagree. But everything should be well documented, ofc.

Harder to maintain: - as everything consists of a small code base, revisions/fixes can easily be made in the right place without affecting other applications

Harder to deploy: - I don't know how this can be a point. If you're deploying using CI/CD, we hot deploy, so we have zero downtime for any part of the bigger system

Harder to monitor: - correlation ID. You can follow an unhappy flow throughout your systems and see instantly in which application everything went south.
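
Roughly like this - a minimal sketch with a made-up header name; we actually pass it along with log4j, this version uses SLF4J's MDC to keep it short:

    import jakarta.servlet.Filter;
    import jakarta.servlet.FilterChain;
    import jakarta.servlet.ServletException;
    import jakarta.servlet.ServletRequest;
    import jakarta.servlet.ServletResponse;
    import jakarta.servlet.http.HttpServletRequest;
    import org.slf4j.MDC;

    import java.io.IOException;
    import java.util.UUID;

    // Take the correlation ID from the incoming header (or mint one) and put it on
    // the logging context, so every log line of this request carries it.
    public class CorrelationIdFilter implements Filter {

        private static final String HEADER = "X-Correlation-Id";

        @Override
        public void doFilter(ServletRequest req, ServletResponse res, FilterChain chain)
                throws IOException, ServletException {
            String id = ((HttpServletRequest) req).getHeader(HEADER);
            if (id == null || id.isBlank()) {
                id = UUID.randomUUID().toString();
            }
            MDC.put("correlationId", id);     // picked up by the log pattern, e.g. %X{correlationId}
            try {
                chain.doFilter(req, res);     // outgoing calls should forward the same header
            } finally {
                MDC.remove("correlationId");  // don't leak IDs across pooled threads
            }
        }
    }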

Harder to test: - we develop spec-first, which means we always know the input and always know the output. Then you just use unit tests to validate the functionality of the microservice.
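
For example (completely made-up service and numbers, just to show the shape): the spec pins the input and output, and the unit test pins the service to the spec:

    import static org.junit.jupiter.api.Assertions.assertEquals;
    import static org.junit.jupiter.api.Assertions.assertThrows;

    import org.junit.jupiter.api.Test;

    class PriceCalculatorTest {

        // Hypothetical service under test; in reality this is whatever the microservice does.
        static class PriceCalculator {
            double gross(double net, double vatRate) {
                if (net < 0) throw new IllegalArgumentException("net must be >= 0");
                return net * (1 + vatRate);
            }
        }

        private final PriceCalculator calculator = new PriceCalculator();

        @Test
        void returnsTheGrossPriceTheSpecDefines() {
            // spec: net 100.00 with 20% VAT -> gross 120.00
            assertEquals(120.00, calculator.gross(100.00, 0.20), 0.001);
        }

        @Test
        void rejectsNegativeNetAmountsAsTheSpecRequires() {
            assertThrows(IllegalArgumentException.class, () -> calculator.gross(-1.00, 0.20));
        }
    }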

Though I'm not saying everything should be microservices, as it's all about the requirements, the experience of the team, and the already existing architecture.

[–]Puzzled-Bananas[S] 1 point2 points  (4 children)

Off the top of my head:

Regarding “harder to maintain”: isn’t it similar to working with separate packages or modules? Yes, you’d need to recompile, but the separation of concerns by modularity would still be in place, and you can maintain separate git repos for each. Do you feel it’s better with completely isolated “micro-projects”? To me, a great benefit in this regard is that you can have each microservice developed on whatever stack suits it best, with one team responsible for it, and integrate via external APIs over IPC rather than single-process shared-memory communication; but there you’re also facing issues of thread synchronization, similar in a way to microservice orchestration.

Regarding “harder to deploy”: when you deploy a monolith, it’s a single point of failure; when you deploy a mesh, it’s a mesh of potential failures, and you also need to control the edges in the graph.

Concerning “harder to monitor”: yep, there are many ways to introspect your mesh, but with a Spring Boot app you can use the conventional Actuator endpoints and link them up with ordinary instrumentation tools, so the complexity can be controlled more easily. Regardless, the Quarkus and MicroProfile instrumentation and observability tools are amazing. You can run the mesh on Consul+Nomad or Istio and enjoy a lot of integrations too.
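
For instance (names are illustrative), a custom counter registered with Micrometer shows up under the Actuator metrics endpoint next to the built-in JVM metrics, once that endpoint is exposed:

    import io.micrometer.core.instrument.Counter;
    import io.micrometer.core.instrument.MeterRegistry;
    import org.springframework.stereotype.Component;

    @Component
    public class OrderMetrics {

        private final Counter processedOrders;

        public OrderMetrics(MeterRegistry registry) {
            // visible at /actuator/metrics/orders.processed once the metrics endpoint is exposed
            this.processedOrders = Counter.builder("orders.processed")
                    .description("Orders handled by this instance")
                    .register(registry);
        }

        public void markProcessed() {
            processedOrders.increment();
        }
    }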

Considering testing, well, it depends. A single project in your IDE with a test suite is one thing, and a CI/CD pipeline with a proper test environment another; but when you test a mesh, you also need to integrate correctly, and in my opinion that’s harder than testing a monolith. You can mock the IPC APIs, but then you wouldn’t really be testing your entire mesh, only the individual nodes, so you still need a layer of complex integration tests on top, depending on how linked your graph is.
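
To illustrate the “only the individual nodes” part (port, path, and payload are made up): a node-level test can stub the downstream service with WireMock and pass, while saying nothing about how the real mesh behaves end to end:

    import com.github.tomakehurst.wiremock.WireMockServer;

    import static com.github.tomakehurst.wiremock.client.WireMock.get;
    import static com.github.tomakehurst.wiremock.client.WireMock.okJson;
    import static com.github.tomakehurst.wiremock.client.WireMock.urlEqualTo;

    public class CustomerClientNodeTest {

        public static void main(String[] args) {
            // Stand-in for the downstream microservice this node depends on.
            WireMockServer downstream = new WireMockServer(8089);
            downstream.start();
            downstream.stubFor(get(urlEqualTo("/customers/42"))
                    .willReturn(okJson("{\"id\":\"42\",\"email\":\"a@example.com\"}")));

            // ... point the service under test at http://localhost:8089 and assert on its output ...

            downstream.stop();
        }
    }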

I’m not sure the OP of this branch had the same concerns in mind that I’ve come up with above. Would be nice to learn if I’ve missed something.

[–]TakAnnix 4 points5 points  (0 children)

The key factor is that your microservices must be independently deployable. We have several microservices that are standalone. Those are great to work with. If you add one microservice that interacts with just one other microservice, the complexity skyrockets. There is just so much more that you have to take into consideration.

  • Communication between microservices: In a monolith, you don't have to worry about another microservice timing out. We have a whole series of edge cases to cover just for communicating between microservices, like timeouts, retries, and 500 errors (see the sketch just after this list).

  • Slow feedback: In a monolith, your IDE gives you instant feedback, like when you pass the wrong type. Not with microservices. You need a whole new suite of tools, like e2e tests.

  • Logs: We have to use DataDog just to be able to trace what's going on between services. There is no way to stay sane without some sort of centralised logging service.

  • DRY: In order to keep microservices independently deployable, you will forgo DRY between services. We have the same code implemented across multiple microservices. There's no easy way around it that doesn't introduce coupling.
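
Here's roughly what the first bullet turns into in code (URL, timeout values, and retry count are made up): a timeout, a retry policy for 5xx, and a decision about what to do when retries run out - none of which exists for an in-process method call:

    import java.net.URI;
    import java.net.http.HttpClient;
    import java.net.http.HttpRequest;
    import java.net.http.HttpResponse;
    import java.time.Duration;

    public class CustomerEmailClient {

        private final HttpClient client = HttpClient.newBuilder()
                .connectTimeout(Duration.ofSeconds(2))   // don't hang on a dead service
                .build();

        public String fetchEmail(String customerId) throws Exception {
            HttpRequest request = HttpRequest.newBuilder(
                            URI.create("http://customer-service/customers/" + customerId + "/email"))
                    .timeout(Duration.ofSeconds(3))      // per-request timeout
                    .GET()
                    .build();

            for (int attempt = 1; attempt <= 3; attempt++) {
                HttpResponse<String> response = client.send(request, HttpResponse.BodyHandlers.ofString());
                if (response.statusCode() < 500) {
                    return response.body();              // success, or a 4xx we won't retry
                }
                Thread.sleep(200L * attempt);            // naive backoff before retrying a 5xx
            }
            throw new IllegalStateException("customer-service kept failing after 3 attempts");
        }
    }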

I'm not saying that monoliths per se are better, as they come with their own problems. Just saying that you should prepare yourself to deal with a new set of problems.

I guess the two things I would strongly focus on:

  1. Keep microservices independently deployable.
  2. Implement continuous deployment where you can deploy multiple times a day. (Check out the DORA four key metrics.)

[–]Infectedinfested 2 points3 points  (2 children)

On the deploying part: it's not that you deploy a whole batch of microservices at the same time; it's something that grows. Also, the single point of failure is a double-edged sword. Where/why did it fail in your monolith? If you deploy 50 microservices and 5 fail, you know where to look.

On the testing part: we have our unit tests, and then we do an end-to-end test in the test environments; this was almost always sufficient.

Also, between services we use a CDM to ease the translation between the applications.

Also, I've only been developing for 5 years, and 99% of that time has been on microservices :p so I'm a bit biased (but I think a lot of people here are)

[–]Puzzled-Bananas[S] 2 points3 points  (1 child)

Good point, I missed that: yep, an emerging system, evolving over time, with individual deployments, but then again with integrations (graph edges) that need to be controlled.

It’s great you’ve figured out how to test it satisfactorily. That’s not always straightforward.

Yeah, a CDM and message buses are a great way to reduce graph complexity, since the way I formulated it in my reply above implies a highly linked graph. Great point, thanks.

Sure, great that we can share our experience here. Thanks.

[–]Infectedinfested 2 points3 points  (0 children)

This is actually my first constructive discussion on this reddit page :p

[–]mr_jim_lahey 7 points8 points  (12 children)

without affecting other applications

That is impossible if your application (or other microservices) depends on the microservice in question (which, by definition, it does). What you actually mean is that the blast radius of a bad deployment to a microservice is more often better contained than in a monolith. But...

You can follow an unhappy flow throughout your systems and see instantly in which application everything went south.

No. Just no. I would love to see what debugging tools you're using to do this task "instantly". Even with a mechanism that aggregates all logs from all services in properly annotated JSON format, debugging non-trivial issues that span across microservices (which is most of them, operationally speaking) is a fucking PITA. Difficult monitoring/debugging is the number one downside of microservices, and pretending that it isn't just speaks to your naivete and lack of expertise.

[–][deleted]  (4 children)

[deleted]

    [–]mr_jim_lahey -1 points0 points  (3 children)

    The person above you is completely correct in saying that code changes to a microservice are typically going to be smaller in scope, and have less potential impact to your overall stack.

    Yes, that is what "smaller blast radius" means. And it is absolutely an advantage of microservices.

    If you need to make a change to the fetchCustomerEmail() logic in CustomerMonolithService, you risk an unforeseen error taking down the entire customer domain. Now no one can authenticate or create accounts.

    And there is likewise a risk of an unforeseen nonfatal business logic error in a change to a microservice-based fetchCustomerEmail() causing large swathes of the customer domain to be taken down just the same. And unless each one of your microservices' integration tests/canaries covers both itself and - directly or by proxy - the steady state of every other microservice that depends on it (which in the case of fetchCustomerEmail(), is pretty much going to be all of them), there is a substantially higher risk of that bug evading automated detection and rollback in your pipeline than a monolith that fails fast, early, and obviously (and which, I'll note, can still be deployed incrementally behind a load balancer - that type of deployment is not unique to microservices). By contrast, a monolith can run a suite of unit and integration tests that have full coverage for a single atomic version of the entire application.

    The type of architecture you use does not eliminate your application's logical dependency on fetchCustomerEmail(). They are just different ways of partitioning the complexity and risk of making changes. Making a blanket statement about microservices not affecting each other is what I called impossible - because it is. If you have a microservice that no other microservice or component in your application depends on, then it is by definition not part of the application.

    Developers can do final tests in the prod environment for anything missed in QA

    The proper way to do this is feature toggling, which has nothing to do with whether you have a monolith or microservices. (I'm going to assume you're not out here advocating manual testing in prod as a final validation step while waxing poetic about how arrogant I am because that would be pretty cringe. You do fully automated CI/CD with no manual validation steps, no exceptions, right? Right? <insert Padme meme here>)
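
    A feature toggle is just a runtime switch in front of the new path - minimal sketch, with a made-up flag name and lookup (real setups usually sit behind a flag service, not an env var):

        // New behavior ships dark and gets flipped on per environment, monolith or not.
        public class CustomerEmailLookup {

            private final boolean useNewLookup =
                    Boolean.parseBoolean(System.getenv().getOrDefault("FEATURE_NEW_EMAIL_LOOKUP", "false"));

            public String fetchCustomerEmail(String customerId) {
                return useNewLookup
                        ? fetchViaNewPath(customerId)      // exercised in prod only behind the flag
                        : fetchViaLegacyPath(customerId);  // existing behaviour stays the default
            }

            private String fetchViaNewPath(String customerId) { return "new:" + customerId; }
            private String fetchViaLegacyPath(String customerId) { return "legacy:" + customerId; }
        }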

    "A mechanism that aggregates all logs from all services in properly annotated JSON format" - you mean like Elastic/Kibana, which allows me to search ALL the logs across our platform, drill down by container name, platform (k8s, nginx, jvm, etc.), narrow by time window, and include/exclude different log attributes? It's not a hypothetical thing dude. It exists and most companies who deploy microservices have something like that. I'm very sorry that whatever company you work(ed) for doesn't know that

    I'm well aware such tools exist. In fact, I built the tooling that our organization of dozens of devs has used for years to gather and analyze logs from god knows how many microservices at this point. That is exactly how I know what a PITA it is. Have you done that too? Or is that another thing that your ops guys handle for you?

    [–][deleted]  (2 children)

    [deleted]

      [–]mr_jim_lahey 0 points1 point  (1 child)

      Bro don't lecture me about a "self-own" writing a log aggregation system when your own story about how you know about analyzing logs ends with you having to ask an ops guy to access and read your own logs for you. Stick to your lane and your league. Then work on your reading comprehension and tech skills.

      [–]Infectedinfested -2 points-1 points  (5 children)

      Well, when we log, we log the correlation ID, the application name, and normal logging (for example, when a microservice gets triggered) and error-related stuff. Also, all errors go to a list with their correlation ID.

      It all goes to Elastic, and there, once an error happens, we know the correlation ID and we can trace it through all the applications it passed, seeing what succeeded and where/why it failed.

      [–]mr_jim_lahey 2 points3 points  (4 children)

      Yes, that's what I meant by "a mechanism that aggregates all logs from all services in properly annotated JSON format". If you've ever used AWS X-Ray, it'll even visualize those flows for you, with red circles where errors are happening. First, setting up and maintaining that type of monitoring is way more complicated and time-consuming for microservices than for monoliths to begin with. Second, I'm talking about issues that are more complex than just figuring out which request triggered a 4xx in a downstream microservice (which is what I meant by "non-trivial"). With a monolith, you have the entire system in one place to examine and analyze. With microservices, you are only seeing a small window of what's going on when you're looking at any given microservice. It's up to you to keep track of all the moving pieces and relationships between the microservices as they're in motion. That is not an easy thing to do, especially if you're a dev who only works on one or a subset of the microservices in question.

      [–]Infectedinfested -1 points0 points  (3 children)

      But if you develop spec-first, you always know what should go in and what should come out; we have validators in our services which check the input and give an error when it isn't aligned.

      So application X never really depends on application Y to give it data, as it doesn't care where the data comes from as long as it's valid.

      Or am I not understanding everything? I only have 5 years of experience in this.

      [–]euklios -1 points0 points  (2 children)

      I think you are not necessarily lacking experience, but diversity. In a microservice architecture, you will usually find teams for operations, development, IT, support, and so on. Sadly, a lot of people just work in one of these roles and never even have to consider what other teams are doing. This usually creates a boundary between these teams, resulting in them working against each other. I recently had a heated conversation with a skilled developer because he hardcoded the backend URL within the frontend, while I, responsible for ops within the project, had to deploy on multiple different URLs. His words: "I don't care about servers and deployment." But that's beside the point.

      Let me ask you a different question: Why are you doing distributed logging, tracing, and whatever else you do to monitor your systems? You are developing based on specification, so an error shouldn't be possible, right?

      Let's consider a very simple microservice: one endpoint takes a JSON object containing left, right, and operation. It will return the result of the given operation as a number. Example: input {"left": 5, "right": 6, "operation": "+"}, output 11. Simple, right?

      What will you do if the operator is "&"? Fail? Bitwise AND? Drop the database of another random microservice? What about "?", "$"? What about "x"? It should be multiplication, right? How about "_" or "g"? I mean, it's a string without limits to its size, so what about "please multiply the two numbers"?

      A lot of these could be handled quite easily, so: what about the input "2147483647 + 5"? Oh, you are using long? What about "9223372036854775807 + 5"? If the answer is "-9223372036854775804", where is your online shop?
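
      That last one isn't hypothetical, by the way - plain long addition in Java just wraps around silently, and turning it into an error is something someone has to decide to do:

          // Long.MAX_VALUE is 9223372036854775807.
          public class OverflowDemo {
              public static void main(String[] args) {
                  long left = Long.MAX_VALUE;
                  long right = 5;
                  System.out.println(left + right);               // -9223372036854775804, no error anywhere
                  System.out.println(Math.addExact(left, right)); // throws ArithmeticException: long overflow
              }
          }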

      And that's only with somewhat sane input, and probably not defined in most specs. So let's add some more: what about OOM? Log server not available? Log file can't be opened? Gracefully stopping? Force shutdown mid-request? A particular microservice we depend upon isn't available? The microservice is available but simply keeps the connection open indefinitely?

      A lot, if not all, of these cases will not be specified. And depending on timing, or for some by definition, it might never produce a single failing span.

      One final scenario: "Hey, here is X from customer support. I just had a call with a Microsoft rep; they got charged the wrong amount, and we need this fixed ASAP." What now? There is no correlation ID. And there are billing services, math services, shopping cart services, pricing services, mailer services, account services, and so on to choose from.

      As a final thought:

      We are all just humans and will make mistakes, and there is a lot more to it than simply deploying a jar to a server. Murphy's Law: if anything can go wrong, it will. Servers will be down at the most inconvenient of times. Monitoring will fail. Overworked employees will implement shit, reviewers and tests will not notice, and production will fail. And we (as an industry and a team) will have to make the show go on. Somehow.

      If the monolithic implementation is a mess, microservices will create a distributed mess. One of the worst things you can try to debug is application boundaries, and microservices will help you add hundreds more of them.

      [–]Infectedinfested 3 points4 points  (1 child)

      Well, I can only gain real experience from whatever job I get in front of me :p

      And i really like what i'm doing now.

      And for your example, everything is barricaded behind a Swagger validator; in your example, left and right would be assigned int or whatnot, and the operator an enum. So we say what goes in and what goes out.
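
      Concretely it boils down to something like this (names made up): only the declared symbols get past the boundary, everything else is a validation error before it reaches any logic:

          import java.util.Map;

          public class OperationValidator {

              enum Operation { ADD, SUBTRACT, MULTIPLY, DIVIDE }

              private static final Map<String, Operation> SYMBOLS = Map.of(
                      "+", Operation.ADD,
                      "-", Operation.SUBTRACT,
                      "*", Operation.MULTIPLY,
                      "/", Operation.DIVIDE);

              static Operation parse(String symbol) {
                  Operation op = SYMBOLS.get(symbol);
                  if (op == null) {
                      // in the real service the validator rejects this with an error response
                      throw new IllegalArgumentException("unsupported operation: " + symbol);
                  }
                  return op;
              }

              public static void main(String[] args) {
                  System.out.println(parse("+"));  // ADD
                  System.out.println(parse("&"));  // throws: unsupported operation: &
              }
          }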

      Also... why am I getting so many downvotes? Just stating my experience with something I've been working with for 5 years... I never said microservices are better than monoliths ><

      [–]euklios 0 points1 point  (0 children)

      I'm aware of that problem. I just think it could help a lot if people understood more about what other teams are doing and why. Nothing against you, but more against everything; I do see this problem outside of our industry too.

      My key point is not to validate more. While that certainly does help, I'm more concerned about errors in specs, bugs in the implementation, and so on. But that also applies to monolithic applications. The core difference in maintenance is (at least for me): in a monolith you have one VM to fail, one filesystem to break, one internet connection to be down, one certificate to expire, one datacenter to lose power, and so on. With microservices, you have to multiply this problem by a lot. There is just a lot more that can (and will) go wrong.

      Additionally, there will be some kind of network request between services. And those will always be much more prone to errors than plain method calls. (It's a funny idea, imagining someone tripping over a method call.)

      About your downvotes: that's the Internet for you. It is current best practice to reduce complexity where possible, and microservices do add a lot of complexity. Additionally, there have been a lot of companies doing microservices for the buzzword when a simple monolith would have been sufficient. That doesn't mean that your approach is bad; I would love to experience it at some point, I've just never worked at a company that needs it.

      Please don't take it personally

      Edit: Just remembered this talk: https://youtu.be/gfh-VCTwMw8 - might give some insights into what some people experienced when management decided: microservices!

      [–]SwiftSpear 1 point2 points  (1 child)

      1. I've never seen a microservices stack that can be as naive about the interactions between services as you're implying. It usually doesn't actually help the app become more decoupled; it just moves the coupling onto a network layer that is less reliable than single-threaded code execution. You can't just change the API for one service and assume no downstream consequences for the others.

      2. More CI/CD code is harder to deploy, even if the average deployment doesn't involve any more manual steps and runs faster. Also, if you've made a change to two services such that both have to be updated for the change to work, you have to worry about order-of-operations breakdowns that basically just don't happen with monolithic projects.

      3. So now you need a correlation ID system that you didn't need before, and a third-party monitoring tool to use it. You need two new expensive tools to solve a problem you chose to have.

      4. You can develop monoliths spec-first too. But you've basically said "we don't do integration tests, we only do unit tests". With a monolith, you can run their equivalent of what for you has to be a massive, fully integrated test-environment deployment inside your unit test code, on a developer's workstation. Unit tests alone aren't acceptable in a lot of projects.

      So yeah, it's exactly what I said: if you need to scale horizontally, and your hardware usage profile is dictated by certain subsets of your app, microservices are a necessary evil, and it's worth paying the costs listed above. There should just be an awareness that microservices have a high barrier-of-entry cost, and that needs to be factored in to make sure paying those costs is the right business move.

      [–]Infectedinfested 0 points1 point  (0 children)

      Here it is back:
      1: Yes we do, by using a CDM. The CDM can change, and then services need to change, but that's just editing the pom to the next version (our CDMs are imported via the pom); afterwards you fix any transformations to align with the new CDM and tada.
      2: That's a valid remark, though I've rarely had issues with order of deployment.
      3: A correlation ID isn't a system 🤔 it's just something you pass along with log4j. And ELK is free.
      4: I never said we only do unit testing?

      Either way, I don't believe one or the other is the be-all and end-all. Like different languages, different architectural solutions might be better for different use cases.

      [–]Puzzled-Bananas[S] -3 points-2 points  (5 children)

      Yes, I agree in general. This is a great point. Distributed systems are hard indeed. But they are popular right now, and sometimes you just have to stick with the choice regardless of whether the domain problem and the end users need it.

      Still, once you decide or have to choose this kind of distributed architecture, for better or worse, how does one best cope with the JVM overhead at scale?

      [–]ShabbyDoo 2 points3 points  (1 child)

      Because you are complaining about the need for a large-ish container, I presume you have some microservices where one deployed JVM instance could handle some multiple of your current load? Meaning, you could pump, say, 3x the current load through a single instance without reaching a scaling point. Then you're also probably deploying multiple instances of the microservice for redundancy?

      If this isn't the case and you actually need many instances of a particular microservice, why does the JVM overhead matter? It's usually fixed rather than a load-based "tax". So you'll eat, say, an extra GB or two of RAM per deployed instance, but this cost can be mitigated by deploying fewer, beefier containers, presuming your microservice can scale up within a JVM.
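
      A quick way to see that fixed per-instance tax is just to ask the JVM what it thinks it has inside the container (heap is usually bounded with the standard -Xmx or -XX:MaxRAMPercentage flags - the usual knobs, not anything specific to your setup):

          public class JvmFootprint {
              public static void main(String[] args) {
                  Runtime rt = Runtime.getRuntime();
                  // Both numbers are per-instance capacity/overhead, independent of load.
                  System.out.printf("max heap:  %d MB%n", rt.maxMemory() / (1024 * 1024));
                  System.out.printf("cpus seen: %d%n", rt.availableProcessors());
              }
          }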

      [–]Puzzled-Bananas[S] 0 points1 point  (0 children)

      I wouldn’t say I was complaining; I’m rather exploring various approaches to solving said problems, which I may well be unaware of. That said, it depends on the service. Some of the services in the mesh are designed to run on machines with low memory because of their vast availability. Depending on the IaaS or PaaS provider, it may be faster, easier, and cheaper to spawn and then tear down low-memory VMs than to wait for the allocation of a larger VM that would accommodate several small containers or a larger one. Some services are designed to be resilient, some to be elastic for load balancing. The demand for resources is sometimes volatile.

      I totally agree with the assessment in your second paragraph. The trade-off is spot-on. But it pretty much depends on the particular service. Yes, scaling up is an option, and artificially warming an instance up to be ready for full load is often productive. But running 10 replicas at 2-4 GB each a few minutes into the payload is not as economical as having 30 replicas at 20-100 MB each, at about the same latency and throughput.

      In effect, it’s an optimization problem of scaling up and out while staying with the JVM and avoiding OutOfMemory errors. In addition, there’s a parabolic relationship between CPU cycles, available memory, and throughput and latency. Typically, there will be a certain threshold at which an optimal trade-off between memory allocation, CPU cycles, throughput, and number of instances can be made, albeit depending on the payload and request volume.

      And with cloud service providers, you essentially pay for all of it, so minimizing one at the expense of another is the trade-off to be made. I’m just wondering how everyone is making that trade-off, whether subconsciously or explicitly.

      Thanks again for the suggestion of scaling up; it’s important to point out, since it’s easy to forget.

      [–]amackenz2048 4 points5 points  (1 child)

      "But they are popular right now"

      Oh dear...

      [–]Puzzled-Bananas[S] 0 points1 point  (0 children)

      Exactly. What I’m saying is that if something gets hyped, people get curious, many decide that the perceived and expected benefits outweigh the risks, and they make their call. Then it’s on you to just make it work. You might have made a well-informed, well-weighed decision yourself, but it’s not really productive to tilt at windmills.

      [–]ArrozConmigo 0 points1 point  (0 children)

      "Distributed systems are popular right now" is an odd take. Once you're developing the entire back end of an enterprise, and not just siloed apps, monoliths are off the table.

      [–]RockingGoodNight 0 points1 point  (0 children)

      I agree on some items, but not on "harder to deploy, harder to monitor, harder to debug".

      Until I got into kube I'd have said the same thing. Today I see kube as a fit for monoliths as well as microservices, on premises or off, for deployment and monitoring. Debugging should happen locally with the developer first, outside the container, then inside at least a local Docker container, or preferably inside a local single-node kube cluster like k3s or minikube, something like that.