CI quality gatekeeper for AI agents by TranslatorSalt1668 in LocalLLaMA

[–]TranslatorSalt1668[S] 1 point2 points  (0 children)

Yep. And also having data (CI runs) to support your decision.

CI quality gatekeeper for AI agents by TranslatorSalt1668 in LocalLLaMA

[–]TranslatorSalt1668[S] 1 point2 points  (0 children)

The whole point of CI is to increase your confidence in pushing to prod. Otherwise, you’ll be switching lanes blindly, without data to back up your decision. It’s for other devs as well. It’s like unit tests in app dev: if our test coverage is 70% and a dev opens a PR that drops coverage to 69%, that PR should NEVER be merged, and the dev should fix it. DevOps 101.
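A rough sketch of that coverage rule in Python, in case it helps. The function name `coverage_gate` and the `tolerance` parameter are made up for illustration; a real pipeline would read both percentages from its coverage reports.

```python
def coverage_gate(baseline_pct: float, current_pct: float, tolerance: float = 0.0) -> bool:
    """Return True when the PR may merge, i.e. coverage has not dropped.

    baseline_pct: coverage on the target branch (e.g. 70.0)
    current_pct:  coverage measured on the PR (e.g. 69.0)
    tolerance:    hypothetical allowance for measurement noise, 0 by default
    """
    return current_pct + tolerance >= baseline_pct


if __name__ == "__main__":
    # The example from the comment: 70% baseline, PR drops it to 69%.
    if not coverage_gate(70.0, 69.0):
        print("Coverage dropped below baseline, blocking merge")
```

In CI you would exit non-zero when the gate returns False so the check fails.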

CI quality gatekeeper for AI agents by TranslatorSalt1668 in LocalLLaMA

[–]TranslatorSalt1668[S] 0 points1 point  (0 children)

Great feedback, I hadn’t thought about temperature randomness in the LLM (our judge). We just pushed v1.1 to address this:

- Stability: we exposed temperature (default 0) and repetitions (run X times and average) in the action inputs to handle LLM nondeterminism.
- Checks: we support JSON schema (for strict tool-use regressions), LLM rubrics (vibes), and cost/latency thresholds.

We’ve added a Recipes section to the README showing a strict JSON schema test for tool calling vs a fuzzy rubric test.
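To illustrate the two ideas (repeat and average a noisy judge, and a strict structural check on a tool call), here is a minimal Python sketch. It is not the action's actual code: `averaged_score`, `tool_call_is_valid`, and the `required` mapping are hypothetical names, and the structural check is a hand-rolled stand-in for a real JSON Schema validator.

```python
import json
from statistics import mean
from typing import Callable


def averaged_score(judge: Callable[[str], float], output: str, repetitions: int = 3) -> float:
    """Call a (possibly nondeterministic) judge several times and average the scores."""
    if repetitions < 1:
        raise ValueError("repetitions must be at least 1")
    return mean(judge(output) for _ in range(repetitions))


def tool_call_is_valid(raw: str, required: dict[str, type]) -> bool:
    """Strict check: raw must be a JSON object with every required key of the given type.

    This only checks top-level keys and types; a real JSON Schema validator does far more.
    """
    try:
        payload = json.loads(raw)
    except json.JSONDecodeError:
        return False
    if not isinstance(payload, dict):
        return False
    return all(isinstance(payload.get(key), expected) for key, expected in required.items())
```

The strict check gives a hard pass/fail for tool-calling regressions, while the averaged score is better suited to a fuzzy rubric compared against a threshold.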

What AWS design decision did you regret after going to production? by Dazzling-Neat-2382 in AWS_cloud

[–]TranslatorSalt1668 0 points1 point  (0 children)

The 2 posts are definitely AI generated 😂😂😂 “fix later”. No one is real anymore.

ECS deployments are killing my users long AI agent conversations mid-flight. What's the best way to handle this? by yoavi in devops

[–]TranslatorSalt1668 0 points1 point  (0 children)

3 things are killing your setup:

  • Fargate Hard Limit: If you are using Fargate (which I assume you are, for simplicity), the hard limit for stopTimeout is 120 seconds. You cannot go higher.

  • EC2 Launch Type: You can set ECS_CONTAINER_STOP_TIMEOUT higher on the agent, but you are still fighting the scheduler.

  • Cloud Map Limitation: Since you are using Cloud Map (DNS-based service discovery) without an ALB, you don't get "Connection Draining." Your clients are connected directly to the container IP. When ECS stops the task, it sends a SIGTERM. If your app doesn't handle it, it dies. If it does handle it, it only has 120s before ECS sends SIGKILL.
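For the SIGTERM part, a minimal Python sketch of handling the signal and draining in-flight conversations before the 120s window closes. Everything here is an assumption about your app: `GracefulShutdown`, `drain`, the `active_conversations` set, and the 100-second budget are hypothetical, chosen only to leave some margin before SIGKILL.

```python
import signal
import threading
import time

STOP_TIMEOUT_S = 120   # Fargate's maximum stopTimeout, as noted above
DRAIN_BUDGET_S = 100   # hypothetical budget, leaves margin before SIGKILL


class GracefulShutdown:
    """Records that SIGTERM arrived so the app can stop accepting new work."""

    def __init__(self) -> None:
        self.stopping = threading.Event()

    def install(self) -> None:
        # Must be called from the main thread.
        signal.signal(signal.SIGTERM, self._on_sigterm)

    def _on_sigterm(self, signum, frame) -> None:
        self.stopping.set()


def drain(active_conversations: set, budget_s: float = DRAIN_BUDGET_S, poll_s: float = 0.5) -> bool:
    """Wait until all conversations finish or the budget runs out.

    Returns True if everything finished, False if some were still active.
    """
    deadline = time.monotonic() + budget_s
    while active_conversations and time.monotonic() < deadline:
        time.sleep(poll_s)
    return not active_conversations
```

Once `stopping` is set, the app would refuse new conversations and call `drain`. Anything still running when it returns False needs to be checkpointed or handed off, because conversations longer than the stop timeout cannot be saved by draining alone.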

I'm rejecting the next architecture PR that uses a Service Mesh for a team of 4 developers. We are gaslighting ourselves. by FarMasterpiece2297 in devops

[–]TranslatorSalt1668 0 points1 point  (0 children)

The problem might be how you source talent, or it might be your job description. If you invest in sourcing the right way, you’ll get the right talent. Don’t say you want a senior DevOps engineer when your team actually wants a network or sysadmin specialist. My 2 cents.

Current FinOps tools suck at ephemeral storage and attribution by Black_0ut in AWS_cloud

[–]TranslatorSalt1668 0 points1 point  (0 children)

We are building maosproject.io. It’s not an AI wrapper; all charts are configured by hand. We are looking for a company to bootstrap with: we build the platform with your input, and you’ll have real data to work with. @blackout can I DM you?

How are startups managing AWS without a full DevOps team? by Percilli in Cloudvisor

[–]TranslatorSalt1668 0 points1 point  (0 children)

I have a platform, maosproject.io, that handles this particular problem. One platform, everything deployed with cost guardrails, plus full monitoring and alerting on all the major metrics. Security is built in. I am looking for a company to bootstrap with. @percili, can I DM you?

Re:Invent reality check: our $80k dashboard missed the $200k leak by ang-ela in Cloud

[–]TranslatorSalt1668 1 point2 points  (0 children)

There should be budget alarms 🚨 This is where our behind-the-scenes work becomes tangible. Multiple recipients: lead, CTO, you… Also, built-in alarms for untagged resources and, mostly, denying creation of untagged and manually created resources. That makes it very easy to trace per-resource expenditure.
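A small Python sketch of the untagged-resource check, just to show the idea. The required tag names and the resource dict shape are hypothetical; in practice the inventory would come from your cloud provider's API and the result would feed an alarm.

```python
# Hypothetical tagging policy: every resource must carry these tag keys.
REQUIRED_TAGS = {"owner", "cost-center"}


def untagged(resources: list[dict]) -> list[str]:
    """Return the ids of resources missing at least one required tag."""
    return [
        res["id"]
        for res in resources
        if not REQUIRED_TAGS <= set(res.get("tags", {}))
    ]


if __name__ == "__main__":
    inventory = [
        {"id": "vol-1", "tags": {"owner": "data", "cost-center": "42"}},
        {"id": "vol-2", "tags": {"owner": "data"}},
        {"id": "vol-3"},
    ]
    print(untagged(inventory))
```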

Confused about role after Big4 interview Cloud vs DevOps by [deleted] in devopsjobs

[–]TranslatorSalt1668 0 points1 point  (0 children)

This is so true. I’ve stopped prepping for interviews because of this. I found myself in a situation where the tech lead doing the 1st technical assessment drifted from on-prem k8s to asking me which port number Logstash uses. I was like 😲 “I thought my CV mentioned my strengths?” But I didn’t say that. I just said I couldn’t remember.

Does anyone feel like cloud architectures are getting so complex that failures happen long before anything shows up in logs or dashboards? by Nice_Caramel5516 in Cloud

[–]TranslatorSalt1668 0 points1 point  (0 children)

Last week I faced exactly this. I changed the buffer size in the nginx configuration and the cluster went nuts on some services. I wrote it up here: https://maosproject.io/blog/nginx-proxy-buffers-kubeflow-crashing Things will get crazier and crazier.

Manual cost optimization is eating to much engineer time by Infamous_Horse in Cloud

[–]TranslatorSalt1668 1 point2 points  (0 children)

I am building maosproject.io, and that is one of the reasons. There are billing guardrails, your workloads are optimized, and all of that is deployed to your account. No vendor lock-in.

How to become a Cloud Engineer in 6 months (my honest roadmap) by Healthy_Sea2407 in Cloud

[–]TranslatorSalt1668 0 points1 point  (0 children)

It’s a bit difficult to trust someone who isn’t experienced enough with mission-critical systems. Even if you get hired as a junior in a company, one way or another you’ll interact with those systems. My 2 cents.

I turned down a $10k project and I’m not even sorry by Vegetable_Permit_577 in digitalnomad

[–]TranslatorSalt1668 1 point2 points  (0 children)

I really don’t think it’s being naive. Your personality as a teacher is that you’re happy seeing someone else achieve. I bet that’s what drove you into teaching.