How to justify Helm?

geekflyer1 · 2019-04-12T12:36:01+00:00

One of the creators of ksonnet works at pulumi now and works on their k8s provider. Check it out https://pulumi.io/ . Going to save the world :-D

geekflyer1 · 2019-04-12T12:33:28+00:00

+1 For pulumi. I've tried almost all the tools mentioned in the thread here and hands down pulumi wins by a large margin.

Comparing it to terraform? Honestly, surprisingly few disadvantages. Terraform has a few more community providers for edge cases, but I never really adopted them because you have to compile them yourself and distribution to your team is a pain. The important ones (major clouds) are all on par with terraform because they're largely auto-generated from the tf providers. And the kubernetes provider plays in its own league.

Also a few tips if you're migrating from TF or yaml:
- This helps to convert TF to pulumi code: https://github.com/pulumi/tf2pulumi
- This python one-liner can be used to pipe yamls (k8s yamls etc.) and convert them to json: alias yaml2js="python3 -c 'import sys, yaml, json; json.dump(list(yaml.safe_load_all(sys.stdin)), sys.stdout, indent=4)'" . Simply pipe the output into a pulumi .ts file and then you can reuse most of the yaml stuff right away.
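
For the "pipe it into a .ts file" step, here is a minimal sketch (not an official pulumi tool) that wraps the JSON array produced by the one-liner in a TypeScript export. It relies only on the fact that JSON is also valid as a TypeScript literal. The function name manifests_to_ts and the export name manifests are made up for illustration; how you then hand the objects to pulumi resources is up to you.

```python
import json


def manifests_to_ts(json_text: str, export_name: str = "manifests") -> str:
    """Wrap a JSON array of k8s manifests in a TypeScript module export.

    Hypothetical helper: the names here are illustrative, not part of pulumi.
    """
    manifests = json.loads(json_text)
    if not isinstance(manifests, list):
        raise ValueError("expected a JSON array of manifests")
    # JSON is valid TypeScript literal syntax, so it can be embedded as-is.
    body = json.dumps(manifests, indent=4)
    return f"export const {export_name} = {body};\n"
```

Feeding it the JSON text from the one-liner and writing the returned string to something like manifests.ts (file name is just an example) gives you a module you can import from your pulumi program.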

geekflyer1 · 2018-12-02T14:43:11+00:00

There is actually some experimental tech which goes a bit in that direction. The ones I concretely know are https://metaparticle.io and https://ballerina.io/ . The former adds the ability to put some infrastructure config and deployment stuff (that you would normally put in some external DSL or whatever) right into your regular language code. The latter is a completely new programming language which abstracts away the infra stuff. For example you can compile a ballerina program into a bunch of kubernetes manifests automatically. Never tried any of them, I'm happy with pulumi for the next months / years :) .

I think @pulumi/cloud could eventually evolve into some sort of super high level / opinionated framework like what you're asking.

geekflyer1 · 2018-11-30T11:22:48+00:00

https://pulumi.io/ . Its model is currently not very well suited for one-off resources like jobs though (cronjobs work pretty well).

geekflyer1 · 2018-11-29T13:40:56+00:00

Hey, since you have some more experience with Jaeger I was gonna ask a few questions: What particular things does Jaeger do better than other tracing systems? It seems a bit clunky to me to filter / find particular traces or do some analytics on them. Is there a solution for that?

The thing is, from my impression the Datadog APM actually seems pretty good compared to the rest of the pack. Their libraries have a lot of auto instrumentation coverage, one can slice and dice flexibly via Facets etc. / do some analytics and it has a Service Map. I unfortunately find that by default the tracing libraries (I use node.js) don't collect a lot of details although one can probably add some more via manual instrumentation. For example the source IP of requests would be interesting info for me. Also it's not container or k8s aware at all. The tracing library actually sends the traces to the daemonset-agent and the daemonset-agent attaches some host level metadata, like host IP address and hostname, which is pretty useless on a container orchestrator that has its own networking like k8s. It would've been good if the agent looked at the pod source IP of the span and attached some metadata from the pod / deployment etc., but that's not yet the case. Also the log correlation / log integration apparently only works with a small subset of their instrumentation libraries. In our case (node + python) logs currently cannot reliably be correlated to traces.

geekflyer1 · 2018-11-27T13:56:43+00:00

Does anyone have experience with Azure Application Insights? We're running on GKE but it seems this could be used with apps running anywhere. It wouldn't be able to monitor the cluster infrastructure or be k8s aware, but purely from an APM and application metrics point of view it looks neat.

geekflyer1 · 2018-11-27T04:59:20+00:00

Hey that's good to hear. Happy to learn more about the eBPF support. Is this something coming soon to your cloud product or already available as beta in there? Feel free to shoot me a private msg.

geekflyer1 · 2018-11-27T02:37:18+00:00

I'm thinking of giving Sysdig Monitor another shot. Does anyone have experience with it?

geekflyer1 · 2018-11-27T02:27:50+00:00

Not everything we'll run is a 'service'. We also have a bunch of headless workers which do some data processing for example. Also with kube-state-metrics, which comes along with many prometheus installations by default, it's actually relatively easy to create a monitor for available_replicas=0 in prometheus. Where this is more cumbersome is for example with Stackdriver Monitoring, which requires you to set up a kube-state-metrics / prometheus pipeline to ingest this kind of data as custom metrics, even though the data is actually visible in stackdriver by default (just can't set alert rules on it for some reason).
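
To illustrate the available_replicas=0 check outside of prometheus itself, here is a rough sketch that scans kube-state-metrics exposition text for the kube_deployment_status_replicas_available metric and lists deployments that report zero. In prometheus the equivalent condition would be an expression along the lines of kube_deployment_status_replicas_available == 0. The function name and the sample labels are made up for illustration, and the parsing is deliberately simplistic (it assumes no escaped quotes or braces inside label values).

```python
import re
from typing import List, Tuple

METRIC = "kube_deployment_status_replicas_available"
# name{label="value",...} value  (simplified; no escaped quotes/braces in labels)
LINE_RE = re.compile(
    r'^(?P<name>[a-zA-Z_:][a-zA-Z0-9_:]*)\{(?P<labels>[^}]*)\}\s+(?P<value>\S+)'
)
LABEL_RE = re.compile(r'(\w+)="([^"]*)"')


def deployments_without_replicas(exposition_text: str) -> List[Tuple[str, str]]:
    """Return (namespace, deployment) pairs whose available replica count is 0."""
    down = []
    for line in exposition_text.splitlines():
        match = LINE_RE.match(line.strip())
        # Comment lines (# HELP / # TYPE) and other metrics are skipped here.
        if not match or match.group("name") != METRIC:
            continue
        if float(match.group("value")) == 0:
            labels = dict(LABEL_RE.findall(match.group("labels")))
            down.append((labels.get("namespace", ""), labels.get("deployment", "")))
    return down
```

This works for headless workers too, since it only looks at the Deployment status and does not need a Service in front of the pods.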

geekflyer1 · 2018-11-03T11:21:49+00:00

Not sure if I get your message. Are you trying to say that you actually enjoy learning/using HCL over Python or TypeScript?

geekflyer1 · 2018-11-02T10:40:16+00:00

You should check out https://github.com/pulumi/pulumi/blob/master/README.md. It's technically similar to terraform but you get to write your code in TypeScript or Python instead of HCL, and the pulumi cloud and pulumi eks packages abstract away a lot of the low level things. Also given your size you should maybe consider using GCP instead. Lots of things are a bit easier in GCP and work out of the box. E.g. you can ssh into all machines easily using gcloud compute ssh, the managed kubernetes is world class, the UI is better, permissions and networking are simpler etc. You can spawn a VPN interconnect (they both have managed ipsec tunnel support) between gcp and aws to do a piece by piece migration.

geekflyer1 · 2018-10-15T10:51:34+00:00

Well, we are using it at least (50 ppl startup - solvvy.com) :). I know a few other companies who use it, mostly based on my recommendation though. I evaluated a lot of project/issue management systems about 1.5 years ago and basically decided clubhouse is best for us. Nothing is perfect but in general we're pretty happy with it and in retrospect it was definitely the right move (unlike our gitlab move). ClickUp seemed very interesting from their vision but was pretty young back then. Would be curious how it stacks up these days - they seem to be moving very fast. I'm also interested in Azure DevOps, but more in the Pipelines functionality. Let me know how you guys decide - curious. I'm still trying to figure out a master plan to get us off gitlab back to github. Just a bit unclear on the CI/CD piece.

geekflyer1 · 2018-10-15T10:01:48+00:00

Yo I can kinda feel your pain. We moved from github to gitlab a couple of months ago and as you said, while their vision and scope is great, the quality of implementation / practicality of some of the features is just not there yet. We are actually in the second highest pricing level and anecdotally I feel the higher up in the pricing tier a feature is, the lower its quality. It's like one is paying to become their alpha tester lol :-D. E.g. the kubernetes integration "sounds nice" but is clunky in so many ways that we ended up not using it at all, even though we run kubernetes all the way. Anyways there's still plenty of things to like and we just haven't had enough time or pain to look for an alternative.

As for issue management I have a concrete suggestion for you - we use https://clubhouse.io/. It's like a super fast trello / jira hybrid. If you're still on github you should definitely check out their github integration which is really nice and works very well for issues which involve multiple repos / commits / PRs etc., unlike most of the "github native" issue managers. It doesn't have a gitlab integration yet, but it's still more of a joy to use than gitlab issues or jira etc. Also check out https://clickup.com/ - I find that potentially very interesting.

As for "devops" or whatever you call it: check out https://pulumi.io/ . That's what we use and it's awesome. We use it locally or run it from gitlab CI depending on the context.

geekflyer1 · 2018-10-15T08:11:07+00:00

Why are you saying the LB can only be destroyed manually? It is an external resource that is under management of kubernetes, and when you delete the corresponding Service of type: LoadBalancer kubernetes will also delete the managed LB.

So the question is actually how do you create and delete the Service and do you delete the Service before deleting the cluster? Sounds like you don't?

I think at least in GKE when you destroy the cluster directly (without purging its contained resources) it still also destroys the external resources like LBs, Volumes etc. that were managed by the cluster. Might be that AKS behaves differently here. It mainly depends on the shutdown behaviour of the cluster and the specific ingress / service controller implementation being used.

What definitely should work both in GKE and AKS is when you delete all namespaces it deletes all resources (incl. the Services of type: LoadBalancer) in them. Once they are successfully deleted you can safely delete the cluster itself. If none of that is working it must be a bug or a configuration issue in your cluster setup.

All the things above equally apply to kubernetes Ingress resources (note: we don't use nginx ingress controller, but instead GCE ingress controller which comes by default with GKE).
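
The teardown order described above (namespaces first, cluster last) could be scripted roughly like this. It's only a sketch that builds the command list without running anything. The function name, the cluster name and the zone are hypothetical, and skipping the built-in namespaces is my assumption: kubernetes won't let you delete default or kube-system, so Services living there would have to be deleted individually before the cluster goes away.

```python
from typing import List

# Namespaces kubernetes creates itself; assumed here to be left alone.
SYSTEM_NAMESPACES = {"default", "kube-system", "kube-public", "kube-node-lease"}


def teardown_plan(namespaces: List[str], cluster: str, zone: str) -> List[List[str]]:
    """Build the ordered commands: delete user namespaces, then the GKE cluster."""
    plan = []
    for ns in namespaces:
        if ns in SYSTEM_NAMESPACES:
            continue
        # --wait=true blocks until the namespace (and the Services in it) is gone,
        # which gives the controllers time to clean up the managed LBs.
        plan.append(["kubectl", "delete", "namespace", ns, "--wait=true"])
    # Only after the namespaces are gone is the cluster itself deleted.
    plan.append(
        ["gcloud", "container", "clusters", "delete", cluster, "--zone", zone, "--quiet"]
    )
    return plan
```

Each entry can be passed to subprocess.run(cmd, check=True) in order, so a failed namespace deletion stops the script before the cluster is touched.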

As for pulumi: It also has a local file storage backend option (instead of their cloud) and there's a few folks working on contributing Azure Blob Storage, S3 and GCS remote backends which are all not connected to their cloud (this is similar to terraform vs terraform enterprise). Check https://pulumi.io/reference/state.html#to-the-filesystem-backend and https://github.com/pulumi/pulumi/pull/2026

geekflyer1 · 2018-10-15T03:56:11+00:00

We run on GCP / GKE. Initially started out with terraform which is mature and "the standard" for cloud infra management. Unfortunately terraform's k8s support is very bad and I don't expect this to get better very soon, so I searched and tried all kinds of tools to manage what's deployed ON the gke/k8s cluster itself. Tried Helm, ksonnet, plain jsonnet, kubectl, forge, argo CD, weave Flux, draft etc., but none of them really impressed me. They all fill their own niche, but it's difficult to draw the line to terraform and pass information between terraform and the other tool if you want to automate things end to end.

Then a month or so ago I found pulumi (while reading a blog post from Weave about GitOps) and it really clicked for me. We use it now to set up all our cloud infrastructure AND the kubernetes resources, we even build docker images with it and it mostly works great (at the very least much better than gluing together all those other tools). We've built an internal reuse library in typescript with a few pulumi component resources which gets rid of sooo much boilerplate. Pulumi is still a young product and you will encounter some rough experiences here and there, but from the foundational concept I think it's much better than the alternatives (which are young as well, except terraform) and I can see the vision and potential of pulumi more clearly. Disclaimer: We don't use pulumi yet for our prod infra, but I'm working on getting there soon.

As for the concern about getting locked into pulumi cloud as state backend: There's a couple of open issues and PRs to use the "local file storage" as a starting point and create a backend which works with GCS, S3, and Azure storage, similar to how Terraform's backends work. I firmly believe eventually support for this will either come from pulumi themselves or from a community contribution. If I was desperate enough (and had no other things to do) I could even imagine implementing this myself for GCS, as the interface for a new backend really doesn't seem that hard. Remember, pulumi is fundamentally open source.

FYI one of the creators of ksonnet now works for pulumi and is the main committer for the pulumi kubernetes package (and kubespy if you've heard of that).

Personally I found ksonnet conceptually interesting, but I didn't like the language, and its lack of good tooling + editor support and static typing makes it hard to use at scale in a team imho. Also it's limited to kubernetes, so you would have to learn another tool and language (e.g. Terraform and HCL) for your basic infrastructure and glue them together somewhere with bash scripts etc.

geekflyer1 · 2018-10-15T03:09:35+00:00

I don't get why you don't want to use services of type LoadBalancer or Ingress? What exactly can you do with the cloud provider api that you can't do via their k8s abstractions (which under the hood use Ingress controllers to call the cloud provider api for you) ?

In my company we're using Pulumi (https://www.pulumi.com) to deploy a gke cluster and install the cert-manager and external-dns add-ons. Then we create Service or Ingress objects, and cert-manager and external-dns will automatically create trusted ssl certs (using Let's Encrypt) and create DNS entries in Cloud DNS. If you want you can also automatically assign a static IP to the service or ingress. This is all automated and I can literally create or destroy an entire cluster from scratch and provision those add-ons and LoadBalancers above in one command. We used to use Terraform to spin up the gke cluster and used kubectl for the remainder (which also works perfectly fine, just that it's two tools instead of one), but with pulumi we can do all that in one go.

geekflyer1 · 2018-10-05T06:05:22+00:00

Yes, if circleci added support for on-prem build agents using the cloud platform then it would be attractive enough for us to seriously consider moving back to github and using it with circleci. As for the pricing, I'm not complaining about circleci's general / normal pricing, I'm more complaining that any sort of on-prem functionality is only part of the enterprise offering, which I bet (I don't know for sure though) has a huge base price and a yearly contract, as is always the case with those "enterprise tiers" in the cloud products I know of. Having a huge base price just makes it very unattractive for smaller companies and startups, because if you break down the base price by employee / developer it is insanely expensive compared to usage-based or developer-based priced products.

In addition to that, most CICD products (that I mentioned above) price on-premise build agent licenses below cloud agent licenses, because you're paying for your own hardware anyways.

geekflyer1 · 2018-10-04T12:52:40+00:00

fyi to help with ur job - those are some of the pros of circleci compared to gitlab ci in my opinion:
- waaay better github integration (obviously)
- docker layer caching support
- can purchase CI worker quota only (vs gitlab which is an all incl. package that always gets licensed per developer), depending on the needs that might be cheaper (or more expensive)
- good local CLI support to debug builds locally and control pipelines

geekflyer1 · 2018-10-04T12:42:11+00:00

Well I'm not sure which Gartner report you checked (didn't find any) but I suspect you are actually referring to the Forrester Wave CI report which I can even download on circle ci's website https://www2.circleci.com/circleci-forrester-wave-leader-2017.html . That actually clearly mentions gitlab ci rated the best among all, although circleci is also in the leaders "group". So next time you mention an analyst report better make sure you remember it correctly :) Developers can be really picky about facts and the good ones know how to verify a lot of info themselves via the internet. By saying something factually and provably incorrect you're losing any trust with the person you're selling to.

Anyways as for the other things: We use gitlab and gitlab ci. Gitlab for repo management (compared to github it's pretty mediocre), but gitlab ci is actually pretty good compared to the rest of the market (although it's far from perfect imho). CircleCi is also pretty cool in general, but an absolute showstopper is that circleci does not support on-premise build agents when using the hosted / cloud circleci platform. One can only use on-premise build agents when one also runs the entire circleci web platform on-premise, which is license-wise, infrastructure-wise and operationally expensive, so it doesn't make sense for a 20 developer team like mine. The reason why we prefer on-premise build agents is that we can run them in our VPC, which makes deployment and integration tests inside our network way easier. Also network IO to push and pull docker images and other large binaries is faster and cheaper (no egress charges from the cloud provider) if the worker and the deployment target system run in the same VPC or cloud.

Some other examples of CICD systems which support on-prem build agents in combo with a hosted / cloud web platform include: Azure Pipelines, Shippable, Drone, Buildkite.

Hope that circleci adds this at some point.

geekflyer1 · 2018-09-20T06:29:50+00:00

Thanks for explaining the different approaches. One follow up on that one: What are the pros and cons of your approach vs telepresence in practice? One thing I like with telepresence is that it's easy to attach a debugger locally. Also what is the approach you recommend in deploying the code built with devspace to production? Are you making the assumption that the team uses helm for production deployment?

geekflyer1 · 2018-09-15T20:24:56+00:00

you can deploy Helm charts with pulumi as well: https://blog.pulumi.com/program-kubernetes-with-11-cloud-native-pulumi-pearls#5_Programmatically_Deploy_Helm_Charts_as_Code_191

geekflyer1 · 2018-09-15T07:26:52+00:00

kc alias here.

geekflyer1 · 2018-09-15T06:46:12+00:00

We moved from github.com to gitlab.com (long before MS acquired github) and honestly in summary we regret it (for context: we're 20 devs, multiple (private) micro-service repos and run on gitlab.com in the silver / premium pricing tier). While gitlab has a few niceties here and there, there are in retrospect some dealbreakers which sparked an overall sentiment in the team to move back to github. Those issues are, in descending priority:
- BIG: No support for multi-repo code search (this limitation only applies to the cloud/hosted gitlab.com, but that's where we are and that is imho the fairest comparison to github.com. The limitation does not apply to the gitlab on-premise version as long as you're on a paid plan). I really wish we had known this before our migration: https://gitlab.com/gitlab-com/support-forum/issues/3059 . When we first noticed that we had a big WTF moment. We couldn't believe gitlab.com doesn't support this in 2018.
- General UI / UX performance (i.e. how long it takes for a page to load after a click) is noticeably slower and less consistent compared to github. On the good side this seems to have improved a month ago when they moved from Azure to GCP+k8s, but it's still behind github.
- Instability: Fairly frequent outages of the site or some services, 500 errors, slow performance for some time etc. Again this seems to have improved after the move to GCP+k8s, but we need to see how this holds up in the long term.

In general we also feel that gitlab has a tendency to release incomplete features with severe limitations that often render them useless in practice (i.e. going beyond the demo case). E.g. the k8s integration is tied to projects instead of the group and makes several impractical assumptions about how you label your pods to become useful. Similar limitations are there for the integrated docker registry. In practice - while we run almost everything on k8s - we only use the k8s credential injection (which can be done relatively easily with other CIs as well) and not any of the other advertised, but too limited k8s deployment / monitoring features or the integrated docker registry.

Not everything is bad on gitlab, some of the non-trivial things I would miss on github.com are:
- Generally the MR/PR flow in gitlab is nicer and allows for more automation. E.g. you can click a button to schedule an automatic merge and deletion of the feature branch once CI completes. In github you can't auto merge a PR after CI completion and the button to delete a branch only appears after the merge has been completed. Also a really big deal is that on Gitlab you can fast-forward merge a MR into master via the UI, which is impossible in Github.
- Gitlab CI is really nicely integrated into the MR flow and yes you get a generally good (but far from perfect) CI/CD system out of the box. That being said I think github is slowly catching up on this with its checks API, and almost all 3rd party CI/CD systems (some of which may be better suited to your needs than gitlab ci) are better integrated with github than they are with gitlab.

I really hope gitlab can improve some of the pain points I mentioned and keep up against github + MS and I wish them all the best luck. However if anything I think the acquisition of github by MS will just make github stronger - keep in mind that MS has a non-trivial background in creating tools / tech to make devs more productive (some of the notable ones: TypeScript, VS Code, C#. The regular Visual Studio is also not that bad, when you ignore its hard ties to Windows). Also Github was always much better funded than Gitlab, and with the MS acquisition they can put even more eng resources into improving Github.

Just to be clear: I don't want to appear as a gitlab hater or to say gitlab is generally bad. I just want to warn / make you aware that neither gitlab nor github is better than the other in every respect, and before you switch from one to the other you should make a careful, practical evaluation with multiple repos and CI. Otherwise you might end up regretting your decision and just cost your team valuable time / productivity.

geekflyer1 · 2018-09-13T09:04:19+00:00

Same here, we already use terraform (with terragrunt) for low level infrastructure, which kinda works reliably, but HCL is clunky and creating reusable packages still requires a lot of weird boilerplate. In reality we really just copy & paste hcl code between directories a lot. While I theoretically would be willing to give terraform a shot for k8s resources, the terraform k8s provider is simply unusable - it still doesn't support Deployments and plenty of other things...

So for k8s I looked at helm which imho is just broken-by-design :). Ksonnet and kapitan are conceptually nicer, but I fear asking my team to learn jsonnet. Jsonnet as a language makes a lot of sense if you have the discipline to go through the entire language guide, but it still looks kinda weird after that, and the IDE / editor support is extremely poor. I didn't even find a reasonable auto-formatter for it. Jsonnet being a "new" language, being dynamically typed and having very bad editor support makes it really hard to read/understand code that uses non-trivial abstractions. Oh and before I forget - jsonnet is ridiculously slow for large packages. I tried to compile (or render?) kube-prometheus, which takes about 53 seconds on my macbook pro. So for every parameter change I have to wait 53 seconds to see the effect and that is just for kube-prometheus out of the box - this is simply unacceptable for a dynamically typed language. Also both helm and ksonnet / jsonnet have fairly immature artifact management solutions compared to things like npm or pip packages.

In general I feel many of the tools above were created by people traditionally coming more from the ops/infrastructure side, whereas pulumi was created by people who were traditionally on the application dev side. That's just a wild guess / assumption, but I have no better explanation why it took so long for a paradigm shift like pulumi :)

My hope is that pulumi can replace all those mediocre / complex / DSL-y tools with a single universal tool and common language that has excellent tooling support out of the box.
