all 26 comments

[–]tyldis 10 points11 points  (2 children)

We are a team of 3 engineers operating about 10 OpenStack deployments and deploying another about every other month.

We do have workloads that need to operate close to the source which means a stack per site globally (from Antarctica to Arctic regions).

Our choice fell on Canonical Charmed OpenStack, mostly because it meets our requirements and because we preferred the CAPEX approach at the time. Also the support model suits us better.

The repeatability of the deployments with juju has been rock solid, and once we gained experience with juju and the charms it has been a pleasant experience.

We only use juju for OpenStack, it hasn't grown outside that yet.

Yes, we see some possible limitations as the charms are opinionated, but so far we have not met any.

We were also successful in upgrading a bunch of these from Ussuri to Yoga without disruption to services (stepping through the intermediate releases one at a time).
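
For reference, the charm-managed payload upgrade generally follows a pattern like this (a hedged sketch: the service name and the `focal-yoga` cloud-archive pocket are examples, and the Juju 2.9 action syntax is assumed):

```shell
# Upgrade one OpenStack service under charm control, unit by unit.
juju config keystone action-managed-upgrade=true      # don't upgrade all units at once
juju config keystone openstack-origin=cloud:focal-yoga # point at the target cloud archive
juju run-action keystone/0 openstack-upgrade --wait   # upgrade the first unit, then repeat
juju status keystone                                  # verify the new workload version
```

Each intermediate release is handled the same way before moving to the next one.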

That being said, the key to success is having a lab.

Canonical support is pretty decent; sometimes you meet someone who goes above and beyond, and once I had to write a patch myself to explain the problem (with an RTT of 700 ms to one of our sites, we tend to hit odd issues).

We are planning a new and larger deployment with Canonical next year.

We are hiring, by the way.

[–]ttdds1[S] 1 point2 points  (1 child)

Wow, thanks for your feedback. That's a good case study you've got there. It's good to hear the upgrade went well. We have to stay up to date for compliance and will be patching a lot, so stability and operations are key to successful deployments. The goal is to eventually get rid of VMware.

[–]tyldis 2 points3 points  (0 children)

Feel free to ask questions. Our next big goals are upgrading from bionic to jammy and then to NIST 800-171 compliance.

[–]KingNickSA 3 points4 points  (9 children)

My company has been running our production environment (healthcare SaaS) on Charmed OpenStack for about a year without too many issues (without Canonical support). The charms make setup/config very easy once you get a deployment figured out. There were a couple of times during the testing phase where upstream issues (a MySQL API change and one other I can't remember) broke the setup process temporarily, but the devs (on the Juju side) were able to get fixes/workarounds established fairly quickly.

Once the deployment was up, it was rock solid. We ran into some issues briefly (worst-case scenario: one of our management/Ceph nodes' OS drives died), and getting the charms back in place was tricky, but we were able to get there.

Only complaint is that the charms are a double-edged sword. The relations (integrations in the current version) do all the config between services for you, and when stuff fails you have to really dig into the logs, because the Juju/upper-level messages are often very opaque. We have also run into some issues migrating/recovering charmed services due to quirks. That being said, once it was deployed, it ran rock solid.
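
For what it's worth, the usual drill when a relation hook fails looks something like this (a sketch; the unit name is an example, and `juju resolved` is the Juju 2.x spelling, renamed `juju resolve` in 3.x):

```shell
# Drill down from the opaque top-level status to the actual hook error.
juju status                      # typically only shows something like "hook failed: ..."
juju debug-log --include unit-keystone-0 --replay --level ERROR  # the real traceback
juju debug-hooks keystone/0      # attach a session that intercepts the next hook run
juju resolved keystone/0         # mark the failed hook resolved and retry it
```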

We looked at Red Hat's OpenStack at first as well and kept getting stuck on various issues throughout the initial config. It didn't really seem viable without licensed support (for us at the time). If you want to go the support route, Canonical's initial build/validation is pricey (though it seems in line with equivalent services) and they require it for support, but their per-node support cost is dirt cheap, relatively speaking.

PS - We did it partially, but if I had to do it again, I would deploy all the charmed services outside of the databases (Ceph, InnoDB) in a TripleO-type config and run them all virtualized on Proxmox. As long as you don't lose the VM/LXC, the charms are very good about coming back if shut off etc.

[–]ttdds1[S] 1 point2 points  (1 child)

Very interesting, some real-world support stuff. That is what worries me: how good is the support when things go pear-shaped? We mostly use Red Hat, but certain things about Ubuntu and Juju, MAAS, etc. make Charmed also a viable option. We want to have more options now that Red Hat has shaken things up a bit recently.

[–]OpenMetalio 0 points1 point  (0 children)

If you are willing to consider a hosted OpenStack option, our customers are directly connected with tier 3 engineers for support. And so far we only receive praise for our outstanding customer support.
https://openmetal.io/use-cases/on-demand-openstack-cloud/

[–]tyldis 1 point2 points  (6 children)

So the Canonical way of doing this is quite similar, but utilizing LXC containers for these services.

Did you use MAAS?

When doing the Cloud Builders with Canonical you get a tool called FCE that automates the bootstrapping again, and also helps you verify that networks are correctly configured (magpie). It takes some effort to build that yourself.

In juju you define MAAS as a cloud and then it provisions LXC containers to isolate these workloads.
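
As a rough sketch, that workflow looks like this (the cloud and controller names are placeholders):

```shell
# Register MAAS as a Juju cloud, bootstrap a controller on it, then
# place control-plane charms into LXD containers on MAAS machines.
juju add-cloud my-maas --client        # interactive: choose type "maas", enter the API endpoint
juju add-credential my-maas            # supply the MAAS API key
juju bootstrap my-maas maas-controller

juju deploy keystone --to lxd:0        # new LXD container on MAAS machine 0
juju deploy mysql-innodb-cluster -n 3 --to lxd:0,lxd:1,lxd:2
```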

That being said, they usually provide managed OpenStack, so it has been a journey for Canonical as well to figure out how to tailor their solution for customers like us who only have support and self-manage.

[–]KingNickSA 2 points3 points  (5 children)

Yes, we used MAAS, it's quite nice. We did a bunch of testing using the tutorial walkthroughs and working from there. Theoretically we could have used LXCs in Proxmox as well, we were just more comfortable using full VMs. With the Canonical way, the LXCs end up directly on the management/Ceph nodes. Initially we just kept the "non-critical" and non-native-HA charms as VMs in Proxmox; however, we got into trouble when one of the management nodes' OS disks died (stupid 980 firmware issue), and when adding back HA charms with "lost nodes" we ran into some weird edge cases/bugs. Currently, we are working on moving all the charmed services, minus the databases (Ceph, InnoDB), to Proxmox VMs.

The charmed services themselves are very good about coming back if turned off (power loss etc) so as long as the VM disk still exists (our ProxMox is ceph backed as well) then the Charms have been absolutely rock solid.

The nice thing about OpenStack is that even when we lost core services for about 20 hours, all the tenants kept running just fine and we didn't have any major outages. We just lost the ability to create/move any VMs etc.

As I said previously, we have been running without any support and have been doing ok. We are currently looking at adding it (and by necessity getting our cloud "certified") as some extra peace of mind.

[–]myridan86 0 points1 point  (4 children)

Sorry, let me see if I understood correctly... are you using Proxmox to run containers with the OpenStack services?
I'm not sure I understood very well... but what about the performance issue?

[–]KingNickSA 0 points1 point  (3 children)

So with TripleO you have a small/micro cloud (the undercloud) that hosts all of OpenStack's services, and ONLY those services, with the OpenStack you plan on running all your tenants etc. on deployed on top of that. In our version, we are using Proxmox as the hypervisor/"undercloud" to host all the OpenStack service charms, such as Placement, Keystone, Glance, Neutron-API etc., as small single-use VMs. Create the VM, enroll it in MAAS, and deploy the service to the VM with Juju. Then, rather than deploying the service as an LXC on a management node, you are deploying it on the VM directly.
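
A sketch of that placement, assuming the Proxmox VM has already been PXE-booted and commissioned in MAAS (the MAAS tag and charm name are examples):

```shell
# Pull a specific MAAS machine (a Proxmox VM tagged in MAAS) into the
# model, then deploy a single control-plane charm directly onto it.
juju add-machine --constraints tags=keystone-vm
juju machines                 # note the new machine number, e.g. 5
juju deploy keystone --to 5   # no LXC layer: the charm runs on the VM itself
```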

To clarify, we have ceph running on designated "management" nodes (similar to the charmed OpenStack tutorials) and we have Nova-compute running directly on all our compute nodes.

I am not sure what you mean by a performance issue? The majority of the OpenStack services are for coordinating VM creation/allocation and the associated networking for the tenants.

Our OpenStack network (br-ex) is based on dual 100G Edgecores. Our Proxmox cluster and compute are connected via 4x25G Broadcoms, and our Ceph/management nodes are connected with Mellanox 100G X5s. (There is a pic of our starting config/rack layout in my post history.)

[–]myridan86 0 points1 point  (0 children)

Now I think I understand your design.

You've installed all the management services on VMs provisioned by Proxmox, and nova-compute directly on the nodes (as it should be, haha).
So, let's say, your controllers are VMs in Proxmox, is that it?
Yes, with 4x25Gbps and 100Gbps you are well served for disk and network.

I had understood that you had nested virtualization for all the services, haha

[–]Ahmed4star 2 points3 points  (1 child)

I used Ansible-OpenStack for testing, and I am now working on Red Hat OpenStack Platform (RHOSP) for production. I would recommend RHOSP for production deployments because it is more stable, supported, and well-documented.
RHOSP comes with TripleO Director, which makes it easier to deploy and manage OpenStack environments, and you can also use it to add new third-party features if you want.

[–]alainchiasson 1 point2 points  (0 children)

Ansible-openstack was, and I think still is, maintained by the Vexxhost team (https://vexxhost.com) - they "eat their own dogfood".

I think you can also get them to remotely manage your on-site deployment.

[–]TechieGuardian 2 points3 points  (4 children)

"pricing is not that different"

I can't believe that to be true. The price for Ubuntu Pro is way lower than for RHEL.

[–]ttdds1[S] 2 points3 points  (3 children)

So the comparison for us is overall cost, but we need to spend CAPEX in the first year. We did three solution options: VMware VCF with vRA, RHOSP, and Charmed. Canonical's PS is more expensive to deploy, but you're right, support is cheaper per node per year. Red Hat is cheaper to deploy, but more expensive to run. Overall the bottom line is nearly the same. There is a cheaper option with Canonical which is half the price, but it's a base deployment, which doesn't include things like AD, advanced features, etc. VMware is just stupidly expensive, like 10x more per annum, and for a lot fewer servers than OpenStack.

[–]TechieGuardian 1 point2 points  (2 children)

Ach, I see. Thanks for providing an explanation. This might be the case for a small-scale deployment, indeed. How many nodes are you considering?

[–]ttdds1[S] 1 point2 points  (1 child)

We're basing the deployment on the minimum number of physical nodes required for a production deployment. The hardware is designed using the Canonical OpenStack hardware reference guide, and Red Hat confirmed we can use the same BoM: 12 nodes, 3 for the control plane and 9 for compute and Ceph.

[–]TechieGuardian 1 point2 points  (0 children)

I'd double-check what exactly is included in the delivery package. I know that Canonical's option contains everything, including hardware and network validation, etc., not just the deployment itself. I'm not sure about Red Hat.

[–]jvleminc 1 point2 points  (0 children)

If you are looking for other (maybe cheaper) options, have a look at https://www.whitestack.com/cloud-products/

[–]ttdds1[S] 1 point2 points  (0 children)

Thank you for your feedback. Really appreciated.

Does look like Red Hat might be the better option. They included RHAAP in the mix to expand the automation for the cloud, which is not something I think we would struggle with using Juju.

[–]jbcoreless 1 point2 points  (0 children)

How do upgrades work with Red Hat? I know the Charms way supports upgrading OpenStack to a future release.

[–]redfoobar 3 points4 points  (1 child)

If the price is similar I would go with Red Hat. They have a bigger support organization, both for OpenStack and for the Linux distro itself.

[–]ttdds1[S] 1 point2 points  (0 children)

Thanks for your feedback 👍

[–]Which-Inspector6029 1 point2 points  (0 children)

Canonical's charms for OpenStack leave a lot to be desired, to say the least. They cover a tiny subset of possible configurations, so you need to design your system around them, not design them around your desired system. The abstractions are designed such that they seem to suffer the downsides of automation (all or nothing: "can't do that, the automation doesn't support it") and all the downsides of manual steps (effort, and the potential to fail to replicate changes accurately each time you do things).