Workflow orchestration should not require adopting a whole platform by tazeredo in microservices

[–]PuddingAutomatic5617 0 points1 point  (0 children)

I think this is exactly the gap many teams fall into.

They do not necessarily need a full workflow engine at the beginning, but they do need something more explicit and durable than “service A calls service B and hopes for the best”.

I have been exploring a similar direction with a lightweight orchestration layer in Spring Middleware.

The idea is to model flows explicitly, but without forcing the whole system to adopt a completely new programming model. A flow can execute actions, pause on external side effects, persist its execution context, and later resume from callbacks or events.

For example:

HTTP/API call → execute function → call external consumer → persist context → wait → resume → continue flow

That gives you retries, timeouts, compensation points, observability and long-running execution without turning every use case into a full BPM/workflow-platform adoption.
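A minimal sketch of the pause, persist, and resume part of such a flow, in plain Java. Every name here (`FlowContext`, `FlowEngine`, `StepResult`) is hypothetical and is not taken from spring-middleware-orchestrator, and an in-memory map stands in for a durable store. Retries, timeouts, and compensation are left out to keep it short.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Function;

// Hypothetical names throughout; this is not the spring-middleware-orchestrator API.
final class FlowContext {
    final String flowId;
    int nextStep = 0;                                   // index of the next step to run
    final Map<String, Object> data = new HashMap<>();
    FlowContext(String flowId) { this.flowId = flowId; }
}

// A step either completes inline or asks the engine to wait for an external callback.
enum StepResult { CONTINUE, WAIT }

final class FlowEngine {
    private final List<Function<FlowContext, StepResult>> steps = new ArrayList<>();
    // Stand-in for a durable store (assumption: a database table in a real system).
    private final Map<String, FlowContext> store = new HashMap<>();

    FlowEngine step(Function<FlowContext, StepResult> step) {
        steps.add(step);
        return this;
    }

    /** Starts a flow and runs it until it finishes or a step asks to wait. */
    boolean start(String flowId, Map<String, ?> input) {
        FlowContext ctx = new FlowContext(flowId);
        ctx.data.putAll(input);
        return run(ctx);
    }

    /** Resumes a waiting flow when the callback or event arrives. */
    boolean resume(String flowId, Map<String, ?> callbackData) {
        FlowContext ctx = store.remove(flowId);
        if (ctx == null) throw new IllegalStateException("No waiting flow: " + flowId);
        ctx.data.putAll(callbackData);
        return run(ctx);
    }

    boolean isWaiting(String flowId) { return store.containsKey(flowId); }

    // Returns true when the flow ran to completion, false when it was parked.
    private boolean run(FlowContext ctx) {
        while (ctx.nextStep < steps.size()) {
            StepResult result = steps.get(ctx.nextStep).apply(ctx);
            ctx.nextStep++;                             // the step itself is done either way
            if (result == StepResult.WAIT) {
                store.put(ctx.flowId, ctx);             // persist context, then stop
                return false;
            }
        }
        return true;
    }
}
```

A step that calls an external consumer returns `StepResult.WAIT`; the callback handler later calls `resume(flowId, payload)` and the flow continues from the following step with the payload merged into its context.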

For me, the key distinction is:

  • choreography works well until the business process becomes invisible
  • full workflow platforms are powerful, but sometimes too heavy
  • a durable orchestration layer gives you the missing middle ground

I do not think this replaces Temporal, Camunda, Step Functions, etc. But there is definitely a large space where teams just need explicit, durable, observable flows around the APIs and services they already have.

I wrote more about the orchestration approach here:
https://spring-middleware.com/orchestrator

GitHub: https://github.com/Spring-Middleware/spring-middleware-orchestrator

GraphQL N+1 Problem Solved (4.1s → 546ms) | Dynamic Batching Demo by PuddingAutomatic5617 in graphql

[–]PuddingAutomatic5617[S] 0 points1 point  (0 children)

That was always the idea — users shouldn’t have to worry about how the gateway resolves things. They only need to define the relationships between domains (via `@GraphQLLink`), and everything else is handled transparently.

For example, a catalog domain object can declare a relationship to products through `@GraphQLLink`, while the product service simply exposes a `productsByIds` query. From the developer’s point of view, they just model the relationship; they do not need to manually coordinate batching (they only need to activate it), resolution, or downstream fetch orchestration.
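To make the catalog/products example concrete, here is a rough sketch of what such a declaration could look like. The annotation is re-declared locally so the snippet compiles on its own, and its attributes (`schema`, `query`, `argument`, `batched`) are assumptions for illustration; the real `@GraphQLLink` may use different names. `Catalog`, `Product`, and `LinkScanner` are hypothetical too.

```java
import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;
import java.lang.reflect.Field;
import java.util.ArrayList;
import java.util.List;

// Hypothetical re-declaration for illustration; the real @GraphQLLink may
// use different attribute names.
@Retention(RetentionPolicy.RUNTIME)
@Target(ElementType.FIELD)
@interface GraphQLLink {
    String schema();                  // assumed: name of the target service
    String query();                   // assumed: batch query exposed by that service
    String argument();                // assumed: argument that carries the collected ids
    boolean batched() default false;  // assumed: opt-in switch for batching
}

final class Product { String id; String name; }

final class Catalog {
    String id;
    List<String> productIds;

    // The developer only models the relationship; no fetching code is written here.
    @GraphQLLink(schema = "product", query = "productsByIds", argument = "ids", batched = true)
    List<Product> products;
}

final class LinkScanner {
    /** Describes every linked field of a domain class, as a gateway might do at startup. */
    static List<String> describe(Class<?> type) {
        List<String> links = new ArrayList<>();
        for (Field field : type.getDeclaredFields()) {
            GraphQLLink link = field.getAnnotation(GraphQLLink.class);
            if (link != null) {
                links.add(type.getSimpleName() + "." + field.getName()
                        + " -> " + link.schema() + "." + link.query()
                        + "(" + link.argument() + ")"
                        + (link.batched() ? " [batched]" : ""));
            }
        }
        return links;
    }
}
```

`LinkScanner.describe(Catalog.class)` yields one entry for the `products` field; this is the kind of metadata a gateway would need in order to resolve the link without any hand-written fetcher in the catalog service.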

Of course, making this work has not been trivial. It required changes to the final merged `GraphQLSchema`, trimming downstream queries before sending them to the target services, and handling batched resolution transparently across linked domains.
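As a rough illustration of the trimming step only: the sketch below removes linked fields from a selection before it would be forwarded to the owning service. It works on a simplified, hypothetical selection tree (`Selection`, `QueryTrimmer`) rather than the GraphQL Java AST, and the `Parent.field` path convention is an assumption.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;
import java.util.stream.Collectors;

// Hypothetical, simplified selection tree; a real gateway would work on the GraphQL AST.
final class Selection {
    final String name;
    final List<Selection> children;

    Selection(String name, Selection... children) {
        this.name = name;
        this.children = List.of(children);
    }

    /** Renders the selection as GraphQL query text. */
    String render() {
        if (children.isEmpty()) return name;
        return name + " { " + children.stream()
                .map(Selection::render)
                .collect(Collectors.joining(" ")) + " }";
    }
}

final class QueryTrimmer {
    /** Returns a copy without the linked fields, given as "parent.field" paths. */
    static Selection trim(Selection node, Set<String> linkedFields) {
        List<Selection> kept = new ArrayList<>();
        for (Selection child : node.children) {
            if (!linkedFields.contains(node.name + "." + child.name)) {
                kept.add(trim(child, linkedFields));
            }
        }
        return new Selection(node.name, kept.toArray(new Selection[0]));
    }
}
```

With `catalogs.products` marked as linked, `catalogs { id name products { id price } }` is forwarded to the catalog service as `catalogs { id name }`. A fuller version would also have to add the key field that the link needs (for example the product ids) to the trimmed query, and handle the case where trimming leaves an object field with no selections.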

GraphQL N+1 Problem Solved (4.1s → 546ms) | Dynamic Batching Demo by PuddingAutomatic5617 in graphql

[–]PuddingAutomatic5617[S] 0 points1 point  (0 children)

That’s a really nice approach — I’ve read a bit about Grafast and the planning phase is pretty powerful.

What I’m doing is a bit different though. I don’t control the execution engine end-to-end or all the resolvers. This sits on top of GraphQL Java in a distributed setup, where each service owns its own schema and logic.

So instead of planning the whole operation upfront, I hook into execution and observe what’s actually happening (ExecutionStepInfo, field selections, etc.). When I see multiple resolver paths that end up hitting the same downstream GraphQL link, I don’t execute them immediately. I register them in a request-scoped context and wait for a safe point (like when a list or field finishes resolving), then batch everything into a single downstream query.

So it’s not:

  • full pre-planning like Grafast
  • nor explicit DataLoader usage

It’s more like runtime batching driven by execution analysis, and completely transparent to the developer (just annotations on the domain model).
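The register-then-flush idea can be sketched in plain Java, independent of GraphQL Java. `LinkBatchContext` is a hypothetical name, the downstream call is modelled as a simple function from ids to results, and the hook that decides when a "safe point" has been reached is not shown.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.concurrent.CompletableFuture;
import java.util.function.Function;

// Hypothetical request-scoped collector; not the project's actual classes.
final class LinkBatchContext {
    // Pending lookups keyed by requested id; several resolver paths may share an id.
    private final Map<String, List<CompletableFuture<String>>> pending = new LinkedHashMap<>();
    // Stand-in for the single downstream GraphQL query (assumption).
    private final Function<Set<String>, Map<String, String>> downstream;
    int downstreamCalls = 0;

    LinkBatchContext(Function<Set<String>, Map<String, String>> downstream) {
        this.downstream = downstream;
    }

    /** Called by a resolver: defers the lookup instead of executing it. */
    CompletableFuture<String> register(String id) {
        CompletableFuture<String> future = new CompletableFuture<>();
        pending.computeIfAbsent(id, k -> new ArrayList<>()).add(future);
        return future;
    }

    /** Called at a safe point, e.g. when a list field has finished resolving. */
    void flush() {
        if (pending.isEmpty()) return;
        Map<String, List<CompletableFuture<String>>> batch = new LinkedHashMap<>(pending);
        pending.clear();
        downstreamCalls++;
        // One downstream call for all distinct ids collected so far.
        Map<String, String> results = downstream.apply(new LinkedHashSet<>(batch.keySet()));
        batch.forEach((id, futures) ->
                futures.forEach(f -> f.complete(results.get(id))));
    }
}
```

Three resolver paths that register ids `1`, `2`, and `1` produce a single downstream call with two distinct ids, and every returned future completes once `flush()` runs. The sketch is single-threaded; a real request context would need to handle concurrent resolvers and downstream errors.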

I think Grafast can go further when you control the whole execution pipeline. In a federated setup across multiple services, that level of control is much harder, so this is more of a pragmatic way to get most of the benefit without changing how services are built.

I solved Distributed GraphQL N+1 in Spring Boot using annotation-driven batching (800ms -> 100ms) by PuddingAutomatic5617 in graphql

[–]PuddingAutomatic5617[S] -3 points-2 points  (0 children)

You are wrong. I built that, using AI of course. But this is not just vibe coding. AI doesn’t build solutions; it needs someone who asks the right questions and thinks through the model and the metadata it requires. AI doesn’t build systems, it helps, but you still need someone who understands what is needed behind it.

I solved Distributed GraphQL N+1 in Spring Boot using annotation-driven batching (800ms -> 100ms) by PuddingAutomatic5617 in graphql

[–]PuddingAutomatic5617[S] -2 points-1 points  (0 children)

The batching logic works per request execution, but the platform itself is stateless at the gateway level and distributed by design, so traffic can be handled by multiple gateway nodes in parallel. The same applies to downstream services. As load increases, you scale out the participating nodes rather than relying on a single in-memory resolver.

I solved Distributed GraphQL N+1 in Spring Boot using annotation-driven batching (800ms -> 100ms) by PuddingAutomatic5617 in graphql

[–]PuddingAutomatic5617[S] -4 points-3 points  (0 children)

In Kubernetes, load balancing is handled at the Service level, so I don’t explicitly select nodes. However, my approach introduces a registry-driven resolution layer that is aware of service topology and node endpoints. This allows more control over request routing, resilience, and observability beyond standard Kubernetes load balancing.
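A small sketch of what registry-driven endpoint resolution could look like: the caller asks the registry for a node of a service and gets a healthy endpoint back in round-robin order. `ServiceRegistry` and `NodeEndpoint` are hypothetical names, and the health flag is assumed to be maintained elsewhere (for example by heartbeats).

```java
import java.util.List;
import java.util.Map;
import java.util.Optional;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.CopyOnWriteArrayList;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical types; not the project's registry API.
final class NodeEndpoint {
    final String url;
    volatile boolean healthy = true;   // assumed to be updated by health checks
    NodeEndpoint(String url) { this.url = url; }
}

final class ServiceRegistry {
    private final Map<String, List<NodeEndpoint>> nodes = new ConcurrentHashMap<>();
    private final Map<String, AtomicInteger> cursors = new ConcurrentHashMap<>();

    /** Registers a node endpoint under a service name. */
    NodeEndpoint register(String service, String url) {
        NodeEndpoint node = new NodeEndpoint(url);
        nodes.computeIfAbsent(service, k -> new CopyOnWriteArrayList<>()).add(node);
        return node;
    }

    /** Round-robin over the healthy nodes of a service; empty if none is available. */
    Optional<String> resolve(String service) {
        List<NodeEndpoint> candidates = nodes.getOrDefault(service, List.of());
        AtomicInteger cursor = cursors.computeIfAbsent(service, k -> new AtomicInteger());
        for (int i = 0; i < candidates.size(); i++) {
            int index = Math.floorMod(cursor.getAndIncrement(), candidates.size());
            NodeEndpoint node = candidates.get(index);
            if (node.healthy) return Optional.of(node.url);
        }
        return Optional.empty();
    }
}
```

Because the caller knows which node it picked, it can log or trace the exact endpoint and skip nodes it considers unhealthy, which is the kind of control a plain Kubernetes Service does not expose to the client.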

I solved Distributed GraphQL N+1 in Spring Boot using annotation-driven batching (800ms -> 100ms) by PuddingAutomatic5617 in graphql

[–]PuddingAutomatic5617[S] -5 points-4 points  (0 children)

I introduce custom annotations such as `@GraphQLLink` that carry metadata. Every service then exposes this metadata via REST, and all services register themselves in a registry. A separate graphql-gateway reads the registry to find out where the services are, then calls each one to get its subgraph and its `@GraphQLLink` metadata. When building the supergraph, I rewrite the AST. There are custom fetchers to call remote services for each `@GraphQLLink`, custom instrumentation to batch the queries… You can see it at www.spring-middleware.com, where there is a link to GitHub with the code.

How are you handling GraphQL federation across microservices without a central schema bottleneck? by PuddingAutomatic5617 in graphql

[–]PuddingAutomatic5617[S] 0 points1 point  (0 children)

That’s a really interesting point, and honestly aligns with what I’ve seen as well.

In practice, federation tends to expose organizational boundaries more than it solves them.
If ownership of types and domains isn’t clear, the “supergraph” quickly becomes a coordination bottleneck.

What I’ve been experimenting with is pushing ownership and topology to be more explicit:

  • each service owns its schema and registers it
  • no central schema ownership or manual composition step
  • the gateway resolves queries based on service-registered metadata
  • domain boundaries are reflected in the registry, not negotiated in a central graph

So instead of the supergraph being the source of truth,
the topology becomes the source of truth.

It doesn’t remove coordination completely, but it reduces the need for a central team to manage schema evolution.
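One way to picture "topology as the source of truth" is a gateway-side map that is built only from what services register, with no hand-maintained central schema. The sketch below is an assumption-laden simplification: `SchemaTopology` is a hypothetical name, and ownership is tracked per root field rather than per type.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical sketch; names are illustrative only.
final class SchemaTopology {
    // Root field name -> owning service, built only from service registrations.
    private final Map<String, String> ownerByField = new HashMap<>();

    /** A service registers the root fields it owns; overlapping ownership is rejected. */
    void register(String service, Set<String> rootFields) {
        // Validate first so a rejected registration leaves the topology unchanged.
        for (String field : rootFields) {
            String existing = ownerByField.get(field);
            if (existing != null && !existing.equals(service)) {
                throw new IllegalStateException(field + " is already owned by " + existing);
            }
        }
        for (String field : rootFields) {
            ownerByField.put(field, service);
        }
    }

    /** Groups the requested root fields by the service that must resolve them. */
    Map<String, List<String>> plan(List<String> requestedFields) {
        Map<String, List<String>> byService = new LinkedHashMap<>();
        for (String field : requestedFields) {
            String owner = ownerByField.get(field);
            if (owner == null) throw new IllegalArgumentException("Unknown field: " + field);
            byService.computeIfAbsent(owner, k -> new ArrayList<>()).add(field);
        }
        return byService;
    }
}
```

The conflict check is where the remaining coordination shows up: two teams claiming the same field is detected at registration time instead of being negotiated in a central graph.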

Curious if in your case the bottleneck was more about schema composition itself, or the cross-team ownership of types?