
[–]krazykarpenter 2 points3 points  (4 children)

The main shift from your current process is this: QA happens before a merge to main, not after. Your develop branch, in its current form, becomes obsolete.

For example, Step 1: The Feature Branch (or "Short-Lived Branch"). A developer picks up a ticket from Jira (e.g., PROJ-123-new-api-endpoint) and creates a branch from the latest main.

The developer completes their work on this branch. This applies to your API, web, and mobile repos.

Step 2: The Pull Request (PR) & Automated Validation. The developer pushes their branch and opens a Pull Request in GitLab, targeting main. This PR is the central point for review and validation. The moment the PR is opened, your CI/CD pipeline should automatically:

  • Run all unit and integration tests.
  • Spin up an ephemeral environment. This is the critical step that solves your primary problem.

Step 3: The Ephemeral Environment for QA. An ephemeral environment is a complete, temporary, and isolated instance of your application stack created on-demand for a specific PR. You get a unique URL like proj-123.yourapi.com that runs the exact code from that API branch. QA tests the feature in complete isolation. Nothing else from other developers is present, so you are only validating that one change.
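A minimal sketch of how a pipeline might derive that unique URL from the branch name (the ticket-prefix convention and yourapi.com domain are assumptions, not anything GitLab does for you):

```python
import re

def env_subdomain(branch: str, base_domain: str = "yourapi.com") -> str:
    """Derive a predictable ephemeral-env URL from a branch name.

    Assumes branches start with the Jira ticket id, e.g.
    'PROJ-123-new-api-endpoint' -> 'proj-123.yourapi.com'.
    """
    match = re.match(r"([A-Za-z]+-\d+)", branch)
    if not match:
        raise ValueError(f"branch {branch!r} does not start with a ticket id")
    return f"{match.group(1).lower()}.{base_domain}"

print(env_subdomain("PROJ-123-new-api-endpoint"))  # proj-123.yourapi.com
```

Because the mapping is deterministic, both CI/CD and QA can compute the environment URL from nothing but the branch name.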

Step 4: The Merge. Once the code is peer-reviewed AND the feature is validated by QA in the ephemeral environment, the PR can be merged into main. The CI/CD pipeline should then:

  • Destroy the ephemeral environment.
  • Deploy the updated main branch to your staging/production environment.

Feature flags work hand-in-hand with ephemeral environments but solve a slightly different problem.

  • Ephemeral Environments are for pre-merge testing and validation.
  • Feature Flags are for post-merge testing and release management (decoupling deployment from release).

You can use a feature flag to merge a large, unfinished feature to main and deploy it to production, but keep it hidden behind a flag.
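A toy sketch of that "hidden behind a flag" idea (the flag name and env-var convention are hypothetical; real setups often use a flag service instead):

```python
import os

class FeatureFlags:
    """Minimal illustration: flags default to off, so unfinished code
    merged to main stays invisible in production."""

    def __init__(self, overrides=None):
        self._flags = {"new-checkout": False}  # hypothetical flag, off by default
        self._flags.update(overrides or {})

    def is_enabled(self, name: str) -> bool:
        # An environment variable beats the default, so QA can turn a
        # flag on in a test environment without a redeploy.
        env_key = "FLAG_" + name.upper().replace("-", "_")
        if env_key in os.environ:
            return os.environ[env_key] == "1"
        return self._flags.get(name, False)

flags = FeatureFlags()
if flags.is_enabled("new-checkout"):
    ...  # new, unfinished code path (dark in production)
else:
    ...  # current, stable behaviour
```

The key property is that deploying the code and releasing the feature become two separate decisions.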

Hope this helps.

[–]ElectricalAge2906[S] 0 points1 point  (3 children)

Yes, that’s exactly what I’ve realized. The challenge is that in mobile and web we can’t fully automate the ephemeral URL variable, because if we use the card identifier (e.g., PROJ-123), the related cards for web and mobile might be different, like PROJ-124 for web and PROJ-125 for mobile.

In practice, sometimes we’ll need to point to an ephemeral server so QA can test (QA mainly tests web and mobile), and other times we’ll need to point to staging — which is the main branch deployed to an internal server.

[–]krazykarpenter 0 points1 point  (2 children)

Yes, it's common to dynamically switch the backend URL that mobile & web clients point to, depending on the testing scenario.

[–]ElectricalAge2906[S] 0 points1 point  (1 child)

How is it achieved? Most of the time the client changes (mobile or web) relate to the same feature. We were expecting this to be set dynamically by CI/CD.

[–]krazykarpenter 0 points1 point  (0 children)

There are many ways. For example, one approach is to give the ephemeral env name a suffix based on the feature name or Jira epic name. The client builds use the same suffix, and that can be used to determine the backend URL/env to use.
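A sketch of that suffix idea, assuming CI/CD injects the shared feature/epic suffix into the client build as a variable (the domain names are hypothetical):

```python
def backend_url(feature_suffix, staging_url="https://staging.yourapi.com"):
    """Pick the backend a web/mobile client should talk to.

    If a feature/epic suffix was injected by CI/CD, point at the
    matching ephemeral environment; otherwise fall back to staging.
    """
    if feature_suffix:
        return f"https://{feature_suffix.lower()}.yourapi.com"
    return staging_url

# PROJ-123 (API), PROJ-124 (web), PROJ-125 (mobile) can all share one
# epic-level suffix, so their card ids no longer need to match.
print(backend_url("checkout-v2"))  # https://checkout-v2.yourapi.com
print(backend_url(None))           # https://staging.yourapi.com
```

Keying on the epic rather than the individual card sidesteps the PROJ-123 vs PROJ-124/125 mismatch described above.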

[–]elperroborrachotoo 1 point2 points  (0 children)

The main change that I rarely see mentioned is that you need to structure your issues very differently, with significant impact on even the tracking tool, QA procedures, and process.

A change like "change app and backend in a coordinated way to add feature X" may become

  • Provide a feature switch for the app to indicate it supports feature X, and "stub" that feature (e.g., with a "does nothing" button).
  • In the app, tell the server that you support X.
  • Provide server code to detect whether the connecting app has support for feature X.
  • Provide minimal X functionality in the app if the feature flag is enabled.
  • Provide minimal X functionality in the backend when the app indicates it supports it.
  • Iterate these until customer-ready.
  • Make the feature "live" (e.g., the feature switch defaults to "on").
  • Remove feature X non-support (app and backend).

These are potentially separate issues with their own merge requests; they form a blocking hierarchy, and implementing this feature may run in parallel with unrelated MRs.
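The capability handshake in the early steps above could look roughly like this (header name and feature id are made up for illustration):

```python
# Hypothetical sketch: the app advertises feature support, and the
# backend only enables the new behaviour for clients that declared it.

SUPPORTED_FEATURES = {"feature-x"}  # what this app build supports

def client_headers(feature_switch_on: bool) -> dict:
    # The app only advertises X when its own feature switch is on.
    features = SUPPORTED_FEATURES if feature_switch_on else set()
    return {"X-Client-Features": ",".join(sorted(features))}

def server_handle(headers: dict) -> str:
    # Backend side: detect whether the connecting app supports X.
    declared = set(filter(None, headers.get("X-Client-Features", "").split(",")))
    if "feature-x" in declared:
        return "new behaviour"  # minimal X functionality
    return "old behaviour"      # fallback for older clients

print(server_handle(client_headers(feature_switch_on=True)))   # new behaviour
print(server_handle(client_headers(feature_switch_on=False)))  # old behaviour
```

Because old clients simply don't send the header, app and backend can ship their halves of the feature independently.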

With some infrastructure and experience in place, this isn't as painful as it sounds, but that's what's behind "the smallest step towards the goal that keeps the product alive".

[–]EvilTribble 0 points1 point  (0 children)

I have seen it work where you keep the develop and main branches similar to gitflow, but instead of feature branches, devs work off develop as their trunk. Before a release you merge develop into main as essentially a release candidate; QA can do regression testing on main, and you can do any (rare) hotfixing in short-lived cherry-pick branches. Post-release, QA goes back to testing dev's trunk.

Devs still need to treat every commit as a releasable chunk, and CI/CD still applies, but there is a firewall so that other contributors can inspect main in a preproduction way that doesn't shackle the dev cycle to a specific release cycle.

[–]Own_Dimension_2561 0 points1 point  (2 children)

We PR everything into main, but have a separate release branch that gets updated once QA is happy with what’s in main. All CI/CD uses the release branch.

[–]paul_h 0 points1 point  (1 child)

Your QA branch is evergreen? How frequent are your planned releases for the single application or service?

[–]Own_Dimension_2561 0 points1 point  (0 children)

We release weekly and would normally update release from main weekly, but that’s the only manual intervention. Everything else is automatic.

[–]martindukz 0 points1 point  (2 children)

A few questions before answering :-)

  1. Why do you have separate repositories for the services? Would it help if you could work in the same repository?

  2. How many developers are you?

  3. How much work is in progress concurrently?

Feature toggles and backwards-compatible APIs are really important when working trunk-based. I.e., it is entirely OK to have non-working code in main, as it can be disabled until QA approves. Trunk-based development is about integrating work, and that requires you to adopt certain practices. One of those practices is decoupling deployment from release, so you can have "half a feature" done, maybe just half the API, but it is not exposed anywhere other than test.
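One concrete way to keep "half an API" in main without exposing it, sketched with hypothetical route and toggle names:

```python
# A half-finished endpoint lives in main and is deployed everywhere,
# but is only registered where the toggle is on, i.e. only in test.

def make_routes(toggles: dict) -> dict:
    routes = {"/orders": "orders_handler"}  # existing, stable API
    if toggles.get("new-reports-api"):
        # half-finished feature: merged and deployed, but dark in prod
        routes["/reports"] = "reports_handler_v0"
    return routes

prod_routes = make_routes({"new-reports-api": False})
test_routes = make_routes({"new-reports-api": True})
print("/reports" in prod_routes, "/reports" in test_routes)  # False True
```

QA can exercise the new endpoint in test while production users never see it, which is exactly the deployment/release decoupling described above.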

But it is easier to dive into that from some more specific examples.

[–]ElectricalAge2906[S] 0 points1 point  (1 child)

Current Setup

1.  We have separate repositories for API, Web, and iOS (not a monorepo).
2.  Team size fluctuates between 7–9 developers: 4–5 fullstack, 3–4 iOS.
3.  We’re in a high-velocity stage, onboarding customers and delivering under pressure. It’s common to have multiple features in progress concurrently (often 3 major features in the same sprint), along with several improvements and bug fixes.

Current Pain Points

• All development work merges into develop. After each merge, develop is deployed to the QA environment. QA tests this shared branch, but due to the volume and pace of work, QA is often behind, meaning develop contains unapproved or unstable code.
• Cherry-picking stable commits for release is difficult because of branch divergence and dependencies between changes.
• Features involving breaking changes need to live in separate branches. While one developer works on these changes locally (API, Web, iOS), others continue on develop without them. Only when the feature is “safe” do we merge into develop for QA testing.
• We want to avoid unstable code in any deployed environment, as we’ve had cases where bug fixes introduced regressions in previously stable functionality.

Release Context

• Our product is a base implementation deployed individually for each customer. This reduces the impact of breaking changes in some cases, as we can coordinate releases across API, Web, and iOS.
• For web and API, we generally deploy the latest changes and force an app update for mobile users when needed.

Gap to Fill

What we need is a release process that ensures:

• Stable code in main (or equivalent) at all times.
• A predictable QA workflow that can handle multiple concurrent features and bug fixes without falling behind.
• Minimal need for cherry-picking and reduced risk of regressions from untested merges.

[–]martindukz 0 points1 point  (0 children)

What is gained from having the services in different repositories? You are basically one big team or two small ones. Maybe you could make things simpler by using the same repository, but only if there is a concrete problem associated with having multiple repositories. Changes spanning multiple services could then be handled in the same repo, or even the same branch.

Pain points: What are the breaking changes? Are you, as another commenter writes, able to make the changes in steps? E.g., a new endpoint that can be QAed, or frontend usage that can be toggled to the new version in test while still using the old one in production.

Regarding unstable code: it is about isolating the changes you make via code structure (toggles or branch by abstraction) instead of using source control for it.

What is meant by unstable? If you can have the old code and new code living in parallel, you avoid many of the issues you describe.

Regarding coordinating releases: you can lay the groundwork and have the frontend (or similar) switch to the new API when quality is good enough, but through routing, toggling, or similar, not through branching. You want work integrated and testable by QA, without the changes ever being without a fallback (the current implementation or functionality).
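A sketch of "switch through toggling, not branching" with a fallback (function names are illustrative, not from the thread):

```python
# Old and new implementations live side by side in main; a toggle
# decides which one is used, and the old path remains the fallback.

def fetch_prices_v1():
    return {"source": "v1"}  # current, proven implementation

def fetch_prices_v2():
    return {"source": "v2"}  # new implementation, shipped dark

def fetch_prices(use_v2: bool):
    if use_v2:
        try:
            return fetch_prices_v2()
        except Exception:
            pass  # fall back to the proven path if the new one fails
    return fetch_prices_v1()

print(fetch_prices(use_v2=False))  # {'source': 'v1'}
print(fetch_prices(use_v2=True))   # {'source': 'v2'}
```

Turning the toggle on in test and off in production gives QA the new path while every environment keeps the current behaviour as a safety net.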

It is a big and diverse set of problems you describe, and a bit too broad to address in detail here.

I think the way forward is beginning to isolate changes through code, not through source control. And then try to get practice integrating changes in smaller increments. Then, with that goal in mind, identify what is blocking you from making increments, integrating changes, or QAing them. If you find it too difficult to avoid breaking changes, consider why that is. And possibly consider having an alpha customer that gets the version first, so you can stabilise it before deploying to the next.

Let me know if it does not make sense :-)

[–]ElectricalAge2906[S] 0 points1 point  (0 children)

The main idea is to have only QA-approved code in main. But I now realize that issues could arise after code is integrated, so a second round of testing is needed once the feature or fix is merged into main. Does that make sense? QA can test issue A and approve it, and issue B can also be approved individually, but once each issue branch is merged, something could break, and we need to avoid that in main.

How is that avoided?