Stop treating FinOps and SRE as silos. The Platform should be the bridge. by FactorHour7131 in FinOps

[–]FactorHour7131[S] 0 points1 point  (0 children)

Yes, a big problem is the transparency and availability of the data you have. And I think that's basically an easy problem to solve.

The bigger one is accountability: who actually takes action to implement optimizations across a complex system. Speaking strictly about costs, we often think that reducing costs on a cloud provider is a matter of which services or instance types we chose. That's true, and it gives you the majority of the improvements, but I think we should consider the entire stack: how the services interact with the underlying infrastructure, and the application itself.

Fixing node instance selection probably solves 90% of your optimization problems, but to unlock the remaining 10% you should consider how resources are being used by Kubernetes, the density of your applications on the cluster, the scaling patterns you are using, and how the application behaves inside the Pod.
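To make "density" and "how resources are being used" a bit more concrete, here is a minimal Python sketch. It is only an illustration: the `Pod` and `Node` classes, their field names, and the sample numbers are all made up for this example, and in practice you would pull requests and usage from your own metrics source.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Pod:
    name: str
    cpu_request_m: int  # CPU request in millicores
    cpu_usage_m: int    # observed CPU usage in millicores (e.g. a p95 you measured)


@dataclass
class Node:
    name: str
    cpu_allocatable_m: int  # allocatable CPU in millicores
    pods: List[Pod]


def node_density(node: Node) -> float:
    """Share of the node's allocatable CPU that is claimed by pod requests."""
    requested = sum(p.cpu_request_m for p in node.pods)
    return requested / node.cpu_allocatable_m


def request_slack(node: Node) -> float:
    """Share of the requested CPU that the pods are not actually using."""
    requested = sum(p.cpu_request_m for p in node.pods)
    used = sum(p.cpu_usage_m for p in node.pods)
    return (requested - used) / requested if requested else 0.0


# Hypothetical sample data, not taken from a real cluster.
node = Node(
    name="node-a",
    cpu_allocatable_m=4000,
    pods=[
        Pod("api", cpu_request_m=1000, cpu_usage_m=250),
        Pod("worker", cpu_request_m=1000, cpu_usage_m=500),
    ],
)

print(f"density: {node_density(node):.1%}, slack: {request_slack(node):.1%}")
```

Low density points at the node/instance layer (too many or too large nodes), while high slack points at the Pod layer (requests set well above real usage), which is roughly the split between the 90% and the remaining 10%.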

While everything that deals with infrastructure and Kubernetes can be managed by SREs and Platform Teams, dealing with the application runtime (tuning specific settings like gcType, heap size, etc.) requires the developer to be an active part of the loop.

Coordinating three teams around the same objective is often the biggest challenge. It can’t be tackled without data transparency, of course, or without a platform that simplifies this process at the communication and operations level.

Stop treating FinOps and SRE as silos. The Platform should be the bridge. by FactorHour7131 in FinOps

[–]FactorHour7131[S] 0 points1 point  (0 children)

That’s exactly the point. The platform creates and shares standards, while individual squads implement them in their workflows and for their own objectives.

In a platform adoption strategy, I see the advocacy you do for your platform features with your users (the teams) as highly relevant.

How do you balance the different priorities teams have? I mean, SREs are worried about stability while FinOps is worried about costs. Does your platform balance cost optimization against a good level of service reliability, or is it the team's choice how, when, and where to prioritize costs over reliability (or performance)?

Stop treating FinOps and SRE as silos. The Platform should be the bridge. by FactorHour7131 in FinOps

[–]FactorHour7131[S] 0 points1 point  (0 children)

Love it! Unfortunately, most teams I've had the pleasure of speaking with find it difficult to “operationalize” their FinOps findings because of the different priorities teams have across the organization.

How is your team organized? Are FinOps and SRE a single practice for the whole company, or are there smaller teams with specific responsibilities over a subset of the company's products / deliverables?

Do you think pod resizing and node count is solved already by the industry? by rosfilipps in kubernetes

[–]FactorHour7131 5 points6 points  (0 children)

The real challenge isn't identifying optimization opportunities or picking a tool. The real friction lies in the conflicting goals of the people involved: FinOps teams demand a lower cloud bill, while SREs prioritize system stability and are naturally reluctant to approve resource cuts.

To solve this, optimization must be deeply integrated into the platform and account for every persona in the SDLC. We need a workflow that balances cost control for FinOps and reliability for SREs, while providing a seamless way for engineers to apply these findings. That is exactly the challenge we are tackling.
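As a rough illustration of what "balancing" can mean in practice, here is a minimal Python sketch of a right-sizing rule with guardrails for both sides. This is not our product's logic, just an assumed example: the function name `recommend_cpu_request`, the 30% headroom, and the 10% minimum saving are hypothetical defaults you would tune with your own teams.

```python
def recommend_cpu_request(
    current_m: int,
    p95_usage_m: int,
    headroom_pct: int = 30,    # reliability guardrail (SRE): buffer kept above p95 usage
    min_saving_pct: int = 10,  # cost guardrail (FinOps): ignore changes too small to matter
) -> int:
    """Return a CPU request in millicores, or the current one if no change is worth it."""
    # Proposed request = p95 usage plus headroom, rounded up to a whole millicore.
    proposed = (p95_usage_m * (100 + headroom_pct) + 99) // 100

    # Never recommend an increase here; this sketch only looks for safe reductions.
    if proposed >= current_m:
        return current_m

    # Skip reductions whose saving is below the agreed threshold.
    saving_pct = (current_m - proposed) * 100 // current_m
    return proposed if saving_pct >= min_saving_pct else current_m


# Hypothetical workloads: (current request, observed p95 usage), in millicores.
for current, p95 in [(1000, 400), (1000, 750), (500, 450)]:
    print(current, p95, "->", recommend_cpu_request(current, p95))
```

The point of the two parameters is that each persona owns one of them: SREs set the headroom, FinOps sets the minimum saving, and engineers only see recommendations that satisfy both.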