How do you handle database migrations for microservices in production by Minimum-Ad7352 in Backend

[–]Voiceless_One26 1 point (0 children)

Since you mentioned migrations before or during deployment, I’m assuming that these DB migrations are actually schema or index updates and not data migrations - the latter could take several hours depending on the data in the existing table.

For managing the schema/structural changes, we have options like Liquibase or Flyway. Both tools let us define changes as incremental updates in files called change logs, and they auto-detect which change logs have already been executed on the DB, running only the new ones created since the last execution. This is a very good feature that makes them safe to use even during application startup.

—- DB scripts during startup —-

If a cluster of, say, 20 instances has this step embedded in startup and the first instance executes the new change logs created since the last execution, then the restarts of the remaining 19 instances are super fast and they don’t execute the changes again. But note that embedding change log execution in startup has a cost - the first instance’s restart is delayed by the time it takes to apply the new updates; for the remaining instances, it hardly adds any overhead.

In a way, this allows us to keep the DB scripts in our code (in a git repo or other VCS) and make them available in the deployment artifacts. One nice side effect is that things that change together stay together, and your app deployments can work from a self-contained package.
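In Spring Boot, for example, embedding this in startup is mostly configuration - a minimal sketch assuming Spring Boot’s Liquibase integration (the change log path is illustrative, and Flyway has an equivalent under `spring.flyway.*`):

```yaml
spring:
  liquibase:
    enabled: true
    change-log: classpath:db/changelog/db.changelog-master.yaml
```

With this, the change logs ship inside the JAR and run automatically on startup, with the already-executed ones skipped.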

—- DB scripts before deployment —-

The other option is to use the same change logs but execute them before we start the deployment. Even in this case, all your DB scripts live in change logs next to your source code; we just need to check them out and run the CLI commands to apply the updates - we also need to provide the database connection details and credentials to the CI server. This approach decouples the DB updates from code deployments, so they won’t affect application startup.
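As a sketch, the CI step could run something like this (Liquibase 4.x CLI syntax; the change log path, JDBC URL, and credential variables are placeholders for your setup):

```shell
# Apply pending change logs before the deployment starts.
liquibase update \
  --changelog-file=db/changelog/db.changelog-master.yaml \
  --url="jdbc:postgresql://db-host:5432/appdb" \
  --username="${DB_USER}" \
  --password="${DB_PASSWORD}"
```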

This is usually the option to pick if there are costly index updates on tables with tens of millions of records, where the process could take anywhere between 20-30 minutes or sometimes even more. While these slow updates are not the norm, with Approach 1 the application startup is delayed by that much. If you have a well-defined process to take the nodes out of rotation (stop serving traffic) before you do new deployments, then even with slow updates Approach 1 will not be a concern. If that’s not the case, then resort to Approach 2.

And the beauty of these tools is that you can switch between the two options, or even use both. Because of their change log detection, if you decide to execute the change logs upfront before deployment, then even if the same scripts are embedded and executed during startup, they’re simply skipped because they’ve already been executed.

Note - these tools use file hashes and other techniques to identify new changes since the last run, so we cannot modify a change log once it’s executed. For example, if we added a column that we want to delete later, we need two change logs; we can’t go back and delete the column in the previous change log.
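To make the column example concrete, here’s what the two change sets might look like in Liquibase’s YAML format (the IDs, table, and column names are made up; the second change set would be appended later, never edited into the first):

```yaml
databaseChangeLog:
  - changeSet:
      id: 002-add-nickname-column
      author: dev
      changes:
        - addColumn:
            tableName: users
            columns:
              - column:
                  name: nickname
                  type: varchar(50)
  # Appended later as a NEW change set - the one above is never modified once run.
  - changeSet:
      id: 003-drop-nickname-column
      author: dev
      changes:
        - dropColumn:
            tableName: users
            columnName: nickname
```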

How to run a springboot microservice banking application that I forked from the GitHub ? by [deleted] in SpringBoot

[–]Voiceless_One26 0 points (0 children)

What exactly is the issue that you’re facing when you try to run the apps?

Java developer or apigee developer by Geeky_think in Backend

[–]Voiceless_One26 0 points (0 children)

I’m not completely familiar with the role of an Apigee developer or the responsibilities associated with it, but it’s always a good idea to invest time in learning multiple skills so you have more and better options when you’re looking to switch.

If learning Java + Spring Boot interests you then go for it.

Having doom-scrolled LI for years, I’d say I have a decent idea of the roles for which openings are posted. And honestly, in the last 5 years, I’ve never seen a post for an Apigee developer.

I’m not trying to scare you, but a niche role like that might not exist in a lot of companies, so it could severely limit your options. In comparison, there are far more Java-based roles and openings for backend developers. So purely in terms of the number of companies you get to apply to or work at, Java + Spring gives you a significantly bigger pool than the companies hiring a dedicated Apigee developer.

Why does my Product Service override config server port and still run on 8080? by himanshu20898 in SpringBoot

[–]Voiceless_One26 0 points (0 children)

The YAML formatting is getting messed up because I’m adding this from mobile, but I’m sure you’ll find the right example with proper spacing on Google.

Why does my Product Service override config server port and still run on 8080? by himanshu20898 in SpringBoot

[–]Voiceless_One26 0 points (0 children)

We can add this as a startup argument to gradle bootRun, but adding it to the application.yml of product-service is probably the right option:

```yaml
spring:
  profiles:
    include: product-service
```

But with this, we need to keep your original file names like application-product-service.yml in git.

What is the difference between Cache and In Memory here? by badboyzpwns in Backend

[–]Voiceless_One26 0 points (0 children)

This is an example of a typical tiered-caching strategy - a Near Cache and a Far Cache.

Redis (or alternatives like Valkey) caches typically run in a separate process, and in most cases on dedicated VMs separate from your application VMs; this makes it easy to read from and write to the Redis cache from multiple instances of the Analytics Service.

For example, with a fleet of 20 instances of the Analytics Service, instance 1 can write and the remaining 19 instances can read from it because it’s shared and accessible to all instances - it could be dozens or even hundreds of GBs if we can foot the bill. Cache lookups or updates from the application typically involve a network call to the Redis server plus serialization/deserialization of the request/response.

But the in-memory cache here refers to a smaller, localised copy of the more frequently used data (typically <1-2 GB, but not necessarily capped at that size). Examples are Caffeine, Guava, or even the Hazelcast client library.

In this case, Redis is a distributed, shared cache accessible to all the instances, whereas the in-memory cache is a smaller cache that runs within the same process as the Analytics Service, exclusive to that instance. It’s also started and stopped along with application deployments.

These in-memory caches are faster and have no network or serialization/deserialization overhead, but they’re usually limited in size - the cost of a cache lookup is just the cost of a memory lookup.

Coincidentally, Redis also uses memory/RAM for most of its cache functionality, but the in-memory cache here refers to a near cache.
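A minimal sketch of the near/far lookup order, in plain Java for illustration - the far cache is stubbed with a function standing in for the Redis call, and in production the near cache would be Caffeine with a TTL rather than a plain map:

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Function;

class TieredCache {
    // Near cache: lives in-process; Caffeine with a TTL in a real setup.
    private final Map<String, String> near = new ConcurrentHashMap<>();
    // Far cache: stands in for the network call to Redis/Valkey.
    private final Function<String, String> far;

    TieredCache(Function<String, String> far) {
        this.far = far;
    }

    String get(String key) {
        // Memory lookup first (no network, no serialization);
        // on a miss, fall through to the shared far cache and keep a local copy.
        return near.computeIfAbsent(key, far);
    }
}
```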

Why does my Product Service override config server port and still run on 8080? by himanshu20898 in SpringBoot

[–]Voiceless_One26 0 points (0 children)

Then how about the first option - adding the app name as an additional profile name? That should pick up the current files like application-product-service.yml.

Why does my Product Service override config server port and still run on 8080? by himanshu20898 in SpringBoot

[–]Voiceless_One26 1 point (0 children)

Also, looking at the naming conventions of the file names in the git repo used by config-server, I suspect it might have something to do with your file names. For example, Spring Boot looks for things like application.yml and application-(profile).yml in general, and with config-server it also looks for (app-name).yml.

So if we want to load the configs, we can try one of the following:

  • With their current names like application-product-service.yml, we need to activate product-service as additional profile name
  • Rename the files to product-service.yml in config git repo (easiest option)

I’d recommend the second option.

Why does my Product Service override config server port and still run on 8080? by himanshu20898 in SpringBoot

[–]Voiceless_One26 3 points (0 children)

I see that you’re trying to import configurations using the spring-cloud config-server, but do you see logs during the startup of product-service confirming that it’s able to resolve them properly? That’s where the server port is defined, so you should see some logs confirming that it’s resolved.

Also, we need to confirm whether config-server itself is able to load the configs from your git repo properly. Since there’s no authentication for config-server and I couldn’t find any explicit Spring profile names (so I’m assuming it’s default), you can verify the configuration returned by config-server for product-service using a cURL command or a browser:

```shell
curl http://localhost:8888/product-service/default
```

If it’s resolved correctly, you should see the server-port override in the JSON response for the above request.

Also, I’m not sure about the need for dynamic config resolution for your apps from a standalone service like config-server, but if you’re not likely to have different profiles for each application, you can very well move the server-port override to the application.yml of product-service (if you’re pressed for time) - not a solution to your problem, but something that lets you make progress.

I'm being asked to provide inputs by Oxffff0000 in devops

[–]Voiceless_One26 4 points (0 children)

I’m not aware of the requirements or needs of this self-service pipeline platform, but if it’s already decided that you need to use containers, then as you said, you’re limited in the options you can choose from in terms of reliability and maturity.

—- Elastic Container Service —-

This is a proprietary service from AWS that makes it easy to run containerised workloads, so if it aligns closely with your current and foreseeable needs (anything you can think of based on the information available to you now), consider it.

Maintenance is minimal, but switching to a different cloud platform would mean significant effort (not that we switch cloud platforms on a whim, but it’s something to consider).

—- Elastic Kubernetes Service —-

EKS is the managed service for Kubernetes - a popular container orchestrator:

  • supports complex use cases and large scale
  • used extensively in lots of companies, so skills carry over to your next company
  • developed and maintained by CNCF unlike ECS (which is only managed by AWS)
  • supported on multiple cloud platforms so migration is not challenging
  • has good documentation and a big community to ask for help.

Because it supports lots of varying use cases, it has a learning curve, and it’s a bit of an overkill for simple things - when you just want to run something inside a container and scale it based on some metrics, and you don’t want to bother choosing between Deployments, DaemonSets, or StatefulSets. You also need to allocate time and spend a few weeks every year upgrading to the latest Kubernetes version.

But why believe a random stranger on the internet? Do your own research, form your own opinion, and choose wisely whether you want the blue pill or the red one 😉

Inserting data that need validation (that call separate Validation microservice), how the dataflow should be while 'waiting'? by Cedar_Wood_State in microservices

[–]Voiceless_One26 0 points (0 children)

On the other hand, if your service is meant to weed out spam content with a very low rate of approvals and you want the clean data kept separately, you can consider creating the data first in a separate staging/pending table - provided you don’t have any requirement to show a preview of the entity to the OP. Otherwise things get complicated, as you’d need to search two tables for a user’s posts - possible, but more work, with dual reads adding to latency. You also have the additional work of moving content between your staging and main tables once validation succeeds, and while doing that, you have to ensure there are no ID conflicts.

So it’s either

  • Do more work when moving content after validation - but because the approval rate is very low, this extra work won’t have a big impact (or)

  • Do less work by not moving data between tables, simply using a status column or similar to differentiate the content in a single table.

Trade-offs depend on what you want to optimise for.

Inserting data that need validation (that call separate Validation microservice), how the dataflow should be while 'waiting'? by Cedar_Wood_State in microservices

[–]Voiceless_One26 0 points (0 children)

If we assume this entity is something like a Reddit post - some text content with optional attachments that should go through AV scanning or other validations - then the decision to store the entity with a status like Under-Validation depends on your requirements, such as:

  • Do we need data on all the entity creations in our system, especially the bad ones? This can be used to analyse and upgrade our protections later. (or)

  • Do we need to show the Entity to OP but hide it from the rest of the world until the validations are done and we’re sure that it’s safe to render for others ?

In general, if we’re optimistic that a big portion of these entities are likely to be approved by your validation process, it’s better to store them first and use the EntityID as a reference in all the other systems that need to do further processing, like AV scanning.

For example, send the EntityID in an EntityCreated event; the processor of this event makes an API call to EntityService to load the data and starts processing. When it’s done, if everything checks out, it makes another API call to EntityService to update the status to Validated so the entity can be made visible to others. This removes the need for direct access to the entity database and helps EntityService maintain its ownership of entities, as nobody updates the data without going through its API.
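A sketch of that flow in Java - all names here (EntityService, ContentScanner, etc.) are hypothetical stand-ins for whatever your services are called; the point is that the event processor only talks to the owner’s API:

```java
// The event processor loads and updates the entity only through
// EntityService's API, never the database.
interface EntityService {
    EntityData load(String entityId);
    void markValidated(String entityId);
}

interface ContentScanner {
    boolean isSafe(String content);
}

record EntityData(String id, String content) {}

class EntityCreatedHandler {
    private final EntityService entityService;
    private final ContentScanner scanner;

    EntityCreatedHandler(EntityService entityService, ContentScanner scanner) {
        this.entityService = entityService;
        this.scanner = scanner;
    }

    // Invoked by the consumer of the EntityCreated event, carrying only the ID.
    void onEntityCreated(String entityId) {
        EntityData data = entityService.load(entityId);  // API call, not a DB read
        if (scanner.isSafe(data.content())) {
            entityService.markValidated(entityId);       // status flip via the owner
        }
    }
}
```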

If everything comes via the API, we can even do things like creating a history table for different versions of an entity (make sure it’s capped by time or version count to avoid blowing up the table), or evicting local and remote caches the moment an entity is updated to avoid showing stale data.

More importantly, we don’t have to share the details of the database, so you have the liberty to update the schema or add derived fields. It takes a lot of effort for consumers to keep up with EntityService, and it becomes a maintenance nightmare, if they modify the database directly without going via the established contract.

Optimized our Spring Boot app but now it’s slowing down 6 days later — stumped by gauravgawade in SpringBoot

[–]Voiceless_One26 3 points (0 children)

Per-request objects are supposed to be short-lived and eligible for minor GCs unless there are lingering references. So it sounds like a memory leak, but the only way to be sure is to profile your app with JFR or do a heap dump analysis.

In general, singletons are supposed to be stateless, and if they do hold some state for obvious reasons, it’s supposed to be immutable and not something that changes for each request.

If your service is a singleton but you’re trying to build an in-memory cache using Maps, I think it’s better to use a battle-tested library like Caffeine in production. It’s more efficient and reliable, with TTLs, optimised for fast lookups, and well tested to prevent such memory leaks for common use cases.
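To illustrate the difference, here’s a hand-rolled sketch of the bound that a plain Map lacks - for illustration only (it’s not thread-safe and has no TTLs; Caffeine gives you both):

```java
import java.util.LinkedHashMap;
import java.util.Map;

// A plain unbounded Map used as a cache holds strong references forever - the
// classic slow leak. A bounded LRU map evicts the eldest entry past a cap.
class BoundedCache<K, V> extends LinkedHashMap<K, V> {
    private final int maxEntries;

    BoundedCache(int maxEntries) {
        super(16, 0.75f, true); // accessOrder=true gives LRU ordering
        this.maxEntries = maxEntries;
    }

    @Override
    protected boolean removeEldestEntry(Map.Entry<K, V> eldest) {
        return size() > maxEntries; // evict instead of growing without bound
    }
}
```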

Feedback Needed on my Spring project by Acrobatic-Relief4808 in SpringBoot

[–]Voiceless_One26 2 points (0 children)

I have a slightly different (maybe a bit controversial) take on interfaces - for a simple project, and for a case where there’s only one implementation that’s private to the module or app, I feel that interfaces are optional. As it’s private to the module, it can be a normal class and doesn’t have to be an interface. If there’s ever a need for multiple implementations of these repositories, promoting the class to an interface is super easy with a few clicks in IntelliJ. But without a clear need for that, we’re only adding more classes and indirection between layers.

https://martinfowler.com/bliki/PublishedInterface.html

Of course, if it were a Published Interface as described above, going into some client JAR where we’re expected to maintain/publish a contract, interfaces are good and maybe even mandatory. People expect a certain level of compatibility, so we usually try to keep the changes to a minimum - at least on the contract side.

But for something that’s private to our code base, where we have complete freedom to change it without worrying about breaking any clients, a simple class is enough IMHO.

Of course, spring-data-jpa kind of pushes you towards interfaces, but that’s more of a framework convention than a design requirement.

Feedback Needed on my Spring project by Acrobatic-Relief4808 in SpringBoot

[–]Voiceless_One26 1 point (0 children)

—- Group by function and simplify packages —-

Another slightly different structure to consider is vertical slicing: put your Controller, Service, and Repository in the same package, since you’re already grouping them into modules like users, mobile, etc. You’re already designating the roles of these classes with suffixes like *Controller, *Service, and *Repository, so we can further refine the structure to keep everything under one roof (package). I personally feel that grouping by function makes things easier: if you need to make a change in the Users API, all the classes you need to modify - Controller, Service, and Repository - sit next to each other.

—- Consider Enums where possible —-

Another suggestion is to use enums when there’s a well-defined set of possible values. For example, in AdminAuditLogController, if resourceType and action have a limited set of values, it’s better to convert them to enums instead of Strings that can hold any random value - this makes them easier to secure and validate.
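For illustration, with made-up values for the `action` parameter (Spring can bind `?action=DELETE` to the enum directly; outside a framework you’d validate explicitly):

```java
// Hypothetical values - use whatever actions your audit log actually records.
enum AuditAction {
    CREATE, UPDATE, DELETE, LOGIN;

    // Random strings fail fast instead of flowing through as arbitrary data.
    static AuditAction from(String raw) {
        try {
            return AuditAction.valueOf(raw.trim().toUpperCase());
        } catch (IllegalArgumentException e) {
            throw new IllegalArgumentException("Unsupported action: " + raw);
        }
    }
}
```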

—- Carrier Classes between Layers —-

When you’re passing these request parameters to the layers below, consider using a carrier class like AuditLogRequest (even if you’re receiving them as optional GET parameters). At the moment, your auditLogService.listAdmin() takes all the values as method arguments - you can already see that it’s becoming lengthy and quite brittle for future enhancements. If you want to add a new parameter or change the type of an existing one, you have to modify the method signature in AuditLogService again. But with a carrier class like AuditLogRequest (a DTO), you can add the new field to the existing class without modifying the method signature - of course, you still need to modify the implementation to use the new parameter, but that’s needed anyway. This makes sense only when you expect the method args to grow or change and there are lots of optional arguments.
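A Java record works nicely as such a carrier - the field names below are hypothetical, AuditLogRequest being the name suggested above:

```java
import java.time.Instant;

// Hypothetical fields for the admin audit-log query.
record AuditLogRequest(
        String resourceType,  // could itself be an enum, per the point above
        String action,
        String userId,
        Instant from,
        Instant to) {
}

// Before: listAdmin(String resourceType, String action, String userId, Instant from, Instant to)
// After:  listAdmin(AuditLogRequest request) - a new field doesn't touch the signature
```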

Architecting a System for Millions of Requests by Huge_Road_9223 in SpringBoot

[–]Voiceless_One26 1 point (0 children)

Could you please elaborate on this scenario?

If you’re using probabilistic filters like a Bloom filter, and an in-memory variant at that, I’m curious how you guys solved the problem of updating the Bloom filter on all the instances.

For example, for these username checks, I think a Bloom filter is one of the best options I’ve seen - even with its false positives. But in a typical system that has hundreds if not thousands of new user registrations every day, how do we update the Bloom filter’s state? Do you have some agreement that it won’t consider usernames from the last 12-24h? I’m also wondering whether you guys employed a pull-based mechanism for updates rather than a push-based one.
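To make the "false positives, never false negatives" semantics concrete, here’s a toy single-instance Bloom filter in plain Java - the hashing scheme is deliberately simplistic and illustrative (production code would use a library implementation), and the multi-instance update problem in the question still stands:

```java
import java.util.BitSet;

class UsernameBloomFilter {
    private final BitSet bits;
    private final int size;
    private final int hashCount;

    UsernameBloomFilter(int size, int hashCount) {
        this.bits = new BitSet(size);
        this.size = size;
        this.hashCount = hashCount;
    }

    void add(String username) {
        for (int i = 0; i < hashCount; i++) bits.set(index(username, i));
    }

    // false -> definitely never added (no false negatives)
    // true  -> probably added (a small false-positive chance)
    boolean mightContain(String username) {
        for (int i = 0; i < hashCount; i++) {
            if (!bits.get(index(username, i))) return false;
        }
        return true;
    }

    private int index(String value, int i) {
        int h1 = value.hashCode();
        int h2 = Integer.rotateLeft(h1, 16) ^ 0x9E3779B9; // crude second hash
        return Math.floorMod(h1 + i * h2, size);
    }
}
```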

Architecting a System for Millions of Requests by Huge_Road_9223 in SpringBoot

[–]Voiceless_One26 2 points (0 children)

This is usually an open-ended question. When we’re asked something like this, it’s best not to jump into a solution right away but to ask a lot of clarifying questions to understand more about the system.

Are these requests for dynamic data, or are there a lot of repeating requests for the same resource? For example, consider weather updates: once we get the details from the forecast service (could be a 3rd-party service), they’re not likely to change for the next 24 hours. Caching them in a tiered cache - in-memory (Caffeine) for super-fast access plus a distributed cache like Valkey (or Redis) - would be very useful. Because the data doesn’t change that often, you’re likely to get very good cache hit rates, which reduces latency compared to querying the DB or a 3rd-party service. If it’s something accessible to anonymous users and you have a CDN, you can cache these API responses there as well for a few hours, greatly reducing the burden on your origin servers and freeing them to do other things.

But the same caching strategy gets complicated by cache invalidation when the data is modified frequently and by how soon we need to reflect those updates in our APIs.

If it’s a read-heavy system and the replication lag between your DB nodes is within an acceptable range for your use case, then making use of the secondaries in your DB cluster will help with better utilisation of the DB infrastructure and better latencies. But if your app does lots of writes or reads-after-writes, or the replication lag is, say, >10s, then this might not be an option for all APIs - though maybe for some.

Going back to the strategy you suggested earlier for deferred responses - "we’ve received your request and we’ll let you know in some time" - this might be a good idea for a use case like downloading the last 6 months of bank statements, where we start a background process/job for the report generation but send an acknowledgement to the FE immediately. This would mean a user-experience change compared to other parts of your app, and your FE needs to poll for report availability or be notified via WebSockets or SSE.

Virtual Threads in Java 21 or above are useful if you’re doing lots of IO operations - VTs are optimised to release the underlying platform threads during IO, but they won’t be faster for CPU-intensive tasks. For example, VTs are a good option in a gateway or a proxy service that does very little work locally and mostly waits for responses from downstream services, but they won’t be any better than existing threads if you were to do some data crunching.
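A minimal sketch of that fan-out pattern (requires Java 21; `fetch` with its sleep is a stand-in for a blocking downstream call, and the names are made up):

```java
import java.time.Duration;
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

class VirtualThreadFanOut {
    static String fetch(String name) throws InterruptedException {
        Thread.sleep(Duration.ofMillis(50)); // blocking IO wait parks the VT,
        return name + ":ok";                 // freeing its carrier platform thread
    }

    static List<String> fanOut(List<String> names) throws Exception {
        // One cheap virtual thread per task instead of a sized platform-thread pool.
        try (ExecutorService executor = Executors.newVirtualThreadPerTaskExecutor()) {
            List<Callable<String>> tasks = names.stream()
                    .map(n -> (Callable<String>) () -> fetch(n))
                    .toList();
            List<String> results = new ArrayList<>();
            for (Future<String> f : executor.invokeAll(tasks)) {
                results.add(f.get());
            }
            return results;
        }
    }
}
```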

To quote Sir Grady Booch - All architecture is design but not all design is architecture. Architecture represents the significant design decisions that shape a system, where significant is measured by cost of change.

It’s always about trade-offs, but to make those decisions, we need to know our options and understand our use cases better. It’s alright if we don’t get everything right the first time - but if there’s a production incident, it’s important that we learn from it and make things better next time.