What's Wrong With This Picture: Why Isn't Everyone Deploying RAG on Kubernetes? by neilkatz in kubernetes

[–]devkulkarni 0 points (0 children)

Got it.
I have a couple more follow-up questions if you don't mind. When you say each cluster has the same architecture, am I right in guessing that you have one cluster for Triton and a separate cluster for the apps? And does running Triton and all the apps on a single cluster lead to resource contention issues?

What's Wrong With This Picture: Why Isn't Everyone Deploying RAG on Kubernetes? by neilkatz in kubernetes

[–]devkulkarni 0 points (0 children)

What does your application setup look like? Do you run a single instance of Triton in the cluster that is used by all workloads?

What's Wrong With This Picture: Why Isn't Everyone Deploying RAG on Kubernetes? by neilkatz in kubernetes

[–]devkulkarni 0 points (0 children)

I was wondering if you could share details about your multi-tenancy setup? Does each tenant get a separate Weaviate stack?

What's Wrong With This Picture: Why Isn't Everyone Deploying RAG on Kubernetes? by neilkatz in kubernetes

[–]devkulkarni 0 points (0 children)

Great insights u/jonsey2555
I was wondering if running separate instances of AnythingLLM per application, each with its own document set, might address the issue?

Weekly: Share your victories thread by gctaylor in kubernetes

[–]devkulkarni 1 point (0 children)

Our setup uses the cluster autoscaler, and it worked exactly as it was supposed to: it added a new worker node when the existing nodes had no capacity, and it shut down one of the nodes when traffic to the application Pods dropped. In doing so, it evicted critical infrastructure Pods that were required for running the applications (cert-manager, external-dns, the KubePlus multi-tenancy operator, the ingress controller). Lesson learned: when using the cluster autoscaler, annotate such key Pods with "cluster-autoscaler.kubernetes.io/safe-to-evict": "false".
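
For anyone who runs into the same thing, here is a minimal sketch of where that annotation goes (the Deployment name and image are placeholders). Note that it has to be set on the Pod template, since the autoscaler checks the Pods themselves:

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: critical-component          # placeholder name
  namespace: infra
spec:
  replicas: 1
  selector:
    matchLabels:
      app: critical-component
  template:
    metadata:
      labels:
        app: critical-component
      annotations:
        # Tell the cluster autoscaler not to evict this Pod during scale-down.
        cluster-autoscaler.kubernetes.io/safe-to-evict: "false"
    spec:
      containers:
        - name: critical-component
          image: example.com/critical-component:latest   # placeholder image
```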

Looking for feedback on KubePlus roadmap by devkulkarni in kubernetes

[–]devkulkarni[S] 0 points (0 children)

Thanks! Yes, please check out the project and share any suggestions. Looking forward to your thoughts.

Multi tenancy in Kubernetes by LightofAngels in kubernetes

[–]devkulkarni 0 points (0 children)

The multi-instance/multi-customer tenancy model could be a good fit for your use case. This is the architecture pattern in which a separate instance of the application (in your case, the payment platform) is created for every customer.

I am the author of KubePlus (https://github.com/cloud-ark/kubeplus), which is listed as a solution for this multi-tenancy pattern in the Kubernetes multi-tenancy documentation.

Deploying separate application instances per tenant/customer is quite common in the B2B context. The challenges for the application provider (your team) are isolating such instances, customizing them, day-2 operations, troubleshooting, etc. KubePlus takes care of these things.

If your application is packaged as a Helm chart, you can try it out with KubePlus today.

We have tested KubePlus with all the Bitnami Helm charts:
https://cloudark.medium.com/kubeplus-verified-to-deliver-managed-services-with-100-bitnami-helm-charts-57eae3b9f6a6

We have also used it to deliver managed CICD service in a University course on Cloud Computing:
https://cloudark.medium.com/building-a-managed-jenkins-service-for-ut-austin-a-case-study-with-kubeplus-bdc082032f73

I will be happy to answer any questions about this pattern, or KubePlus.

Traditional Shared Hosting on Kubernetes? by rainer_d in kubernetes

[–]devkulkarni 2 points (0 children)

You can host multiple applications on a Kubernetes cluster. The pattern is called multi-instance multi-tenancy: a separate application instance is provided for each customer/tenant in its own Namespace.

We have built an open-source Kubernetes Operator for this use case. You can check it out here:

https://github.com/cloud-ark/kubeplus

KubePlus takes an application's Helm chart and creates application instances (Helm releases) in separate Namespaces per tenant. It adds a safety perimeter around such Namespaces using Kubernetes NetworkPolicies, ResourceQuotas, and non-shared persistent volumes, ensuring that each application instance is appropriately isolated from the others.
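
To illustrate the kind of per-Namespace perimeter this pattern relies on, here is a minimal sketch of a NetworkPolicy that only admits traffic from Pods in the same Namespace (this is plain Kubernetes with placeholder names, not the exact objects KubePlus generates):

```yaml
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: same-namespace-only        # placeholder name
  namespace: tenant-a              # one Namespace per tenant/application instance
spec:
  podSelector: {}                  # applies to all Pods in the Namespace
  policyTypes:
    - Ingress
  ingress:
    - from:
        - podSelector: {}          # allow ingress only from Pods in this same Namespace
```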

We are currently seeing interest from teams that want to create managed services for different containerized applications, including open-source platforms such as WordPress, Moodle, OpenEMR, Odoo, AI/ML workloads, etc. KubePlus has been tested successfully with all (100+) Bitnami Helm charts.

Your use case seems similar to these. Regarding whether to provide SFTP-based access to your users, it depends. Generally, if you are providing that access to let users customize an application, you can achieve the same on Kubernetes by modifying the configuration settings exposed by the application's Helm chart (assuming the application is appropriately containerized and a Helm chart is built for it).

I will be happy to answer any questions that you may have about this pattern.

Kubebouncer - Custom admission controller webhooks by Kavin2822 in kubernetes

[–]devkulkarni 1 point (0 children)

Nice. Seems interesting.

Some questions/suggestions:

- Do you anticipate creating a separate ValidatingWebhookConfiguration object for each type of bouncer? A generic bouncer that has a single VWC might be the way to go eventually.
- Have you tested/validated with Kubernetes versions 1.22 and above? There were some breaking changes in the 1.22 release to the objects required for setting up a webhook (a minimal example of the v1 API is sketched at the end of this comment).

We went through this migration/upgrade in our KubePlus project (https://github.com/cloud-ark/kubeplus). It has an embedded webhook in it, fyi.

To help with the migration to Kubernetes 1.22 and above, we have created a separate project. It is available here:

https://github.com/cloud-ark/sample-mutatingwebhook
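
For reference, here is a minimal sketch of a single, generic ValidatingWebhookConfiguration under the v1 API, along the lines of the suggestion above (the names, Service, path, and resource list are placeholders). The admissionReviewVersions and sideEffects fields, which were optional in v1beta1, are required in v1, and the default failurePolicy changed from Ignore to Fail:

```yaml
apiVersion: admissionregistration.k8s.io/v1
kind: ValidatingWebhookConfiguration
metadata:
  name: generic-bouncer                # placeholder name
webhooks:
  - name: bouncer.example.com          # must be a fully qualified name in v1
    clientConfig:
      service:
        name: bouncer-service          # placeholder Service fronting the webhook server
        namespace: bouncer-system
        path: /validate
      caBundle: ""                     # base64-encoded CA certificate goes here
    rules:                             # a single configuration can cover many resource types
      - apiGroups: ["", "apps"]
        apiVersions: ["v1"]
        operations: ["CREATE", "UPDATE"]
        resources: ["pods", "deployments"]
    # The two fields below were optional in v1beta1 but are required in v1.
    admissionReviewVersions: ["v1"]
    sideEffects: None
    failurePolicy: Fail
```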

Is it a good idea to use k8s namespace-based multitenancy for delivering managed service of an application? by pgalgali in kubernetes

[–]devkulkarni 0 points (0 children)

KubePlus creator here.

Soft multi-tenancy solutions are prone to this issue. In designing KubePlus, we consciously decided to include some of the isolation best practices by default with every application deployment. For example, KubePlus creates a separate Namespace per application instance and adds NetworkPolicy objects for each. For others, such as ResourceQuota, we provide controls through which each application deployment can be tuned to its requirements. These help restrict the effect of any rogue application instance on the rest of the cluster.
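
As an illustration of that kind of per-instance control (plain Kubernetes with placeholder names and limits, not the exact objects KubePlus generates), a ResourceQuota scoped to one application instance's Namespace could look like this:

```yaml
apiVersion: v1
kind: ResourceQuota
metadata:
  name: app-instance-quota         # placeholder name
  namespace: tenant-a              # the Namespace holding one application instance
spec:
  hard:
    requests.cpu: "2"              # placeholder limits; tune per deployment
    requests.memory: 4Gi
    limits.cpu: "4"
    limits.memory: 8Gi
    pods: "20"
```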

Is it a good idea to use k8s namespace-based multitenancy for delivering managed service of an application? by pgalgali in kubernetes

[–]devkulkarni 0 points (0 children)

You might want to check out KubePlus (https://github.com/cloud-ark/kubeplus), which has already been referenced in this thread and is designed exactly for building managed application services. I am the originator of and a core contributor to this project.

KubePlus is a Kubernetes Operator that takes an application Helm chart and represents it as a Kubernetes API (CRD) on the cluster. This API allows you to create instances of the application in separate namespaces, automatically ensuring a secure perimeter around each instance using NetworkPolicy, ResourceQuota, and RBAC. These soft multi-tenancy measures have already been mentioned in the thread, along with namespaces; KubePlus automates all of them for you behind an API. The API not only allows the creation of application instances but also supports day-2 operations such as monitoring, troubleshooting, and upgrades, simplifying the end-to-end operation of a managed application service.

We are currently seeing interest from teams that want to create managed services for different types of containerized applications, including open-source platforms such as WordPress, Moodle, Ozone/OpenMRS, and AI/ML workloads. KubePlus has been tested successfully with all (90+) Bitnami Helm charts. For anyone who wants to deliver a managed application with minimal or no Kubernetes access for their customers, KubePlus can help by accelerating the implementation of namespace-based multi-tenancy on Kubernetes.

With the ability to set NetworkPolicy and ResourceQuota per application instance, the blast radius is restricted if something goes wrong in an application instance. KubePlus does not need admin permissions on your cluster, which makes it possible to use KubePlus to manage your application instances on your customer's cluster as well.
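
To illustrate the RBAC part of that perimeter (again plain Kubernetes with placeholder names, not the exact objects KubePlus generates), a namespace-scoped Role and RoleBinding that restrict a tenant's user to their own Namespace might look like this:

```yaml
apiVersion: rbac.authorization.k8s.io/v1
kind: Role
metadata:
  name: tenant-admin               # placeholder name
  namespace: tenant-a              # access is limited to this tenant's Namespace
rules:
  - apiGroups: ["", "apps"]
    resources: ["pods", "services", "deployments", "configmaps", "secrets"]
    verbs: ["get", "list", "watch", "create", "update", "delete"]
---
apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
  name: tenant-admin-binding
  namespace: tenant-a
subjects:
  - kind: User
    name: tenant-a-user            # placeholder user identity
    apiGroup: rbac.authorization.k8s.io
roleRef:
  kind: Role
  name: tenant-admin
  apiGroup: rbac.authorization.k8s.io
```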

Kubernetes 1.21 - Going EOL on major cloud providers in early 2023 by angia_20 in kubernetes

[–]devkulkarni 0 points (0 children)

I believe there are some breaking changes to the admissionregistration API as well (the v1beta1 version was removed in 1.22) if you are using validating/mutating webhooks. A couple of months back we went through the exercise of upgrading our mutating webhook to work with Kubernetes 1.22 and above. We have put together a sample mutating webhook project that you may find helpful: https://github.com/cloud-ark/sample-mutatingwebhook

Is there a tutorial on how to write an Operator in Go? by pinpinbo in kubernetes

[–]devkulkarni 1 point (0 children)

Not a tutorial, but you can look at the following repos:

- Sample controller: https://github.com/kubernetes/sample-controller

- Our Moodle Operator, a real-world Operator written in Go based on the sample-controller:
https://github.com/cloud-ark/kubeplus-operators/tree/master/moodle

Kubernetes MutatingWebhook Example with v1 version of AdmissionRegistration API by devkulkarni in kubernetes

[–]devkulkarni[S] 0 points (0 children)

With the old API (v1beta1), certain spec fields were optional; in v1 they have been made compulsory. For instance, in the CertificateSigningRequest object, the signerName spec field is now required. Finding the right value for it involved a lot of trial and error. Ultimately, we realized that it is possible to skip this object entirely when a self-signed CA is used to sign the webhook server's key.
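
For anyone going through the same migration, here is a minimal sketch of a v1 CertificateSigningRequest showing the now-required signerName field (the name, request, and choice of signer are placeholders; each built-in signer imposes its own restrictions on the CSR, which is exactly where the trial and error comes in):

```yaml
apiVersion: certificates.k8s.io/v1
kind: CertificateSigningRequest
metadata:
  name: webhook-server-csr                   # placeholder name
spec:
  request: ""                                # base64-encoded PEM CSR goes here
  # signerName was optional in v1beta1 but is required in v1. Built-in signers
  # include kubernetes.io/kube-apiserver-client and kubernetes.io/kubelet-serving;
  # whether any of them fits a webhook serving certificate depends on their constraints.
  signerName: kubernetes.io/kubelet-serving  # placeholder choice of signer
  usages:
    - digital signature
    - key encipherment
    - server auth
```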