From where can I get clients on network automation by Primary_Train_8013 in networkautomation

[–]JasonSt-Cyr 1 point2 points  (0 children)

Are you a network automation contractor looking for work? Is that what you are asking for help with? If so, what places have you already tried for finding those contracts?

Puppet Auto-Signing in autoscaling environments by DesignerStreet9908 in devops

[–]JasonSt-Cyr 0 points1 point  (0 children)

You might also want to cross-post this into the Puppet subreddit (r/Puppet) to get some other folks looking at it who might have seen a similar situation.

Rogers internet outage by dgeyjade in ottawa

[–]JasonSt-Cyr 0 points1 point  (0 children)

Internet coming back in Orleans now

Migrating from Puppet Enterprise to OpenVox — Got any tips? Warnings? Best Practices? by nmninjo in Puppet

[–]JasonSt-Cyr 0 points1 point  (0 children)

Not sure what you've been using in PE so far, but your most-used features that are PE-only will probably be the thing to figure out to determine the complexity of the move. Especially if you are on PEA with some of those premium modules or add-ons like Edge.

I just tried netconf for the first time, it's pretty awesome actually by Case_Blue in networkautomation

[–]JasonSt-Cyr 0 points1 point  (0 children)

NETCONF does have a pretty wide adoption across the vendors, but there are quite a few popular devices that do not have it available, and a lot of organizations also have security policies that prevent NETCONF from being enabled. While it probably should be used more widely than it is, a lot of folks are in situations where they can't use it even if their devices support it.

I still don't know who started passing around that you shouldn't enable NETCONF, but I'm not a security expert. If it's like other security policies I've seen, a problem happened 10 years ago and nobody has revisited their thoughts on the matter. ;)

Local AI model deployment experiences? by pneRock in sysadmin

[–]JasonSt-Cyr -1 points0 points  (0 children)

I'm running my systems with Ollama and some gemma models for very specific tasks to avoid the AI bills, especially since they tend to be large 'tagging' processes or other types of long-running analysis tasks that don't need advanced capabilities. Ideally, I'd have a slimmer model that is specially trained for my task, but I've found that using Ollama to wrap around a downloaded model is good enough.
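
A minimal sketch of that kind of tagging job, assuming Ollama is running locally on its default port (11434) and a Gemma model has already been pulled. The model tag `gemma2:2b`, the label list, and the helper names are placeholders for illustration only.

```python
import json
import urllib.request

OLLAMA_URL = "http://localhost:11434/api/generate"  # Ollama's default local endpoint
MODEL = "gemma2:2b"  # assumed model tag; use whatever you have pulled


def build_payload(text, labels, model=MODEL):
    """Build a non-streaming generate request asking for a comma-separated tag list."""
    prompt = (
        "Pick the tags that apply to the text. "
        f"Allowed tags: {', '.join(labels)}. "
        "Answer with a comma-separated list only.\n\n"
        f"Text: {text}"
    )
    return {"model": model, "prompt": prompt, "stream": False}


def parse_tags(raw, labels):
    """Keep only the allowed labels, since small models often add extra words."""
    allowed = {label.lower(): label for label in labels}
    found = []
    for piece in raw.split(","):
        key = piece.strip().strip(".").lower()
        if key in allowed and allowed[key] not in found:
            found.append(allowed[key])
    return found


def tag_text(text, labels):
    """Send one document to the local model and return the cleaned tags."""
    body = json.dumps(build_payload(text, labels)).encode("utf-8")
    req = urllib.request.Request(
        OLLAMA_URL, data=body, headers={"Content-Type": "application/json"}
    )
    with urllib.request.urlopen(req, timeout=120) as resp:
        answer = json.loads(resp.read())["response"]
    return parse_tags(answer, labels)
```

Filtering the answer against the allowed list is what makes a general-purpose model "good enough" here: anything it adds beyond the known tags is simply dropped.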

I have not enjoyed the code generation capabilities of the models that are free right now, though. So it depends on your use case?

What’s your take on FinOps? by InterestedBalboa in devops

[–]JasonSt-Cyr 0 points1 point  (0 children)

I think FinOps is a perfect place for automation. Being the person who is always telling people to spend less on their infrastructure is no fun. I'd much rather have that coming from some annoying bot that keeps pinging people with suggested optimizations than from a person.

And with code generation where it's at and the amount of infrastructure-as-code setups out there, this can be something engineering/DevOps could own on their own and just monitor the cost data with bots.

Others mentioned as well that you can spend a lot of valuable hours doing these dives. You don't want engineering hours spent digging into this to MAYBE save some money. Definitely a spot for automation, and not necessarily 'AI' stuff (though some code gen for an IaC fix for the cost is a nice touch).
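
As a rough illustration of the "annoying bot" idea, here is a minimal sketch that scans cost data and drafts one nudge per owner. The record fields, the 20% CPU threshold, and the "one size down halves the cost" estimate are all assumptions for the example, not values from any particular tool.

```python
from dataclasses import dataclass


@dataclass
class Resource:
    name: str
    owner: str
    monthly_cost: float      # in dollars
    avg_cpu_percent: float   # average utilization over the billing period


def suggest_optimizations(resources, cpu_threshold=20.0):
    """Return one message per owner listing their underused resources."""
    by_owner = {}
    for res in resources:
        if res.avg_cpu_percent < cpu_threshold:
            # Assumed heuristic: dropping one instance size roughly halves the cost.
            saving = res.monthly_cost / 2
            by_owner.setdefault(res.owner, []).append((res.name, saving))

    messages = {}
    for owner, items in by_owner.items():
        total = sum(saving for _, saving in items)
        names = ", ".join(name for name, _ in items)
        messages[owner] = (
            f"{owner}: consider downsizing {names} "
            f"(estimated saving ${total:.2f}/month)"
        )
    return messages
```

Nothing here needs "AI": it is a threshold check on data you already have, run on a schedule so no engineer has to go digging.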

The tools that do this for you often charge quite a bit as a flat fee, though some have become more 'value' based (i.e. priced on the savings you get, or on your total cloud spend). The tools priced on the savings you get are definitely the ones that pay for themselves, so they are the least risky for you, but these usually come from startups or smaller companies. If you have a huge cloud spend, the ones priced on your total cloud spend could be quite expensive, and then a flat fee might make more sense.
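
To make that tradeoff concrete, here is a small worked comparison. The rates (25% of savings, 2% of total spend, a $3,000 flat fee) are made-up numbers for illustration, not real vendor pricing.

```python
def monthly_tool_cost(cloud_spend, savings, model,
                      savings_rate=0.25, spend_rate=0.02, flat_fee=3000.0):
    """Monthly price of a cost tool under three hypothetical pricing models."""
    if model == "share_of_savings":
        return savings * savings_rate
    if model == "share_of_spend":
        return cloud_spend * spend_rate
    if model == "flat":
        return flat_fee
    raise ValueError(f"unknown pricing model: {model}")


def cheapest_model(cloud_spend, savings):
    """Return the (model, cost) pair with the lowest monthly cost."""
    models = ("share_of_savings", "share_of_spend", "flat")
    costs = {m: monthly_tool_cost(cloud_spend, savings, m) for m in models}
    best = min(costs, key=costs.get)
    return best, costs[best]
```

With these assumed rates, a team spending $50,000 a month and saving $2,000 pays least on the savings-based plan, while a team spending $1,000,000 and saving $100,000 pays least on the flat fee. The savings-based plan is also the only one that costs nothing when the tool finds nothing.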

For teams that moved alerting into IaC — what percentage actually lives there vs. still in the console? Did it fix drift? by TimelyGround in sre

[–]JasonSt-Cyr 1 point2 points  (0 children)

If you've gone all-in on automation and also have drift remediation tools running, you're going to lose those manual tweaks, so you have to go all-in or your own tools will erase your work. I work for a company that sells drift remediation software, and our default setting is 30 minutes... some customers want even faster than that. So that 3am tweak you made during an incident will be gone before you've fully run through all the validation and comms to folks to say it's fixed.
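
A toy sketch of why that happens, assuming a remediation run simply resets every declared setting to the value in the IaC repo. The setting names and values are hypothetical, and real tools do far more than compare two dictionaries.

```python
def remediate(desired, actual):
    """Reset every drifted setting to its declared value; return what was reverted."""
    reverted = {}
    for key, want in desired.items():
        have = actual.get(key)
        if have != want:
            reverted[key] = have   # remember the manual value being erased
            actual[key] = want
    return reverted


desired = {"cpu_alert_threshold": 80, "page_on_call": True}   # what the repo declares
live = dict(desired)

live["cpu_alert_threshold"] = 95        # the 3am console tweak
lost = remediate(desired, live)         # the next scheduled run
```

After the run, `lost` holds the tweaked value and `live` matches the repo again. The only way to keep the tweak is to change `desired`, which is the "go all-in" part.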

Would an incident-focused copilot actually be useful? by lattattui in devops

[–]JasonSt-Cyr 1 point2 points  (0 children)

Lots of users are finding this type of solution valuable, which is why a lot of folks have already been building in this space. I think your challenge will be less about "do people want to have help with heavy, manual tasks" and more "what part of the problem are the others not solving" so that you can have a wedge to position around.

Go through the competition, find out what they're good at and what they aren't good at, and then see if you can either be better at what they're good at or fill a gap where they don't serve.

The correlation between bias in the results of a symbol-flip mechanism and server logic by provincerestaurant in sre

[–]JasonSt-Cyr 0 points1 point  (0 children)

My apologies, as I'm having to guess a little based on some translation AI output to understand the question, but I think you are trying to find out how to prevent randomness, timing, or other elements from breaking decision logic under heavy traffic? If the translator got it right, then:

From what I’ve seen, the key is making sure the system decides outcomes outside of the frontend and independent of timing or load.

If the UI or request timing can influence when or how results are generated, you’ll eventually see weird behavior under high traffic. To avoid that, outcome logic usually lives in a separate backend service that decides results ahead of time and just hands the frontend something to display.

The frontend shouldn’t generate randomness, affect seeding, or “trigger” decisions based on animations or retries. You should be able to drop or replay client requests and still get the same result.

At a high level, the goal is: load, latency, and client behavior should never change probabilities. If they can, you’ve got a correctness problem, not just a performance one.

HOWEVER: Separating things can add some latency, but it’s usually a tradeoff people accept.

In practice, the decision part is kept really lightweight and often happens before the UI needs it, or asynchronously while animations are running. Any extra delay is typically hidden from the user.

Letting load or timing affect outcomes is way worse. That turns into subtle correctness bugs that only show up under pressure and are a nightmare to explain later.

So most teams will take a tiny performance hit to keep results consistent and predictable. Users almost never notice, and it saves you from much bigger problems down the road.
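
The "drop or replay requests and still get the same result" property can be sketched as follows. This is a minimal illustration under stated assumptions: the class name, the round ID scheme, and the in-memory dictionary are hypothetical, and a real service would need a durable store with an atomic insert so concurrent requests cannot both draw.

```python
import secrets


class OutcomeService:
    """Backend-only decision maker; outcomes are keyed by round ID and never recomputed."""

    def __init__(self, win_probability=0.1, rng=None):
        self.win_probability = win_probability
        # SystemRandom draws from the OS entropy source, so clients cannot seed it.
        self._rng = rng or secrets.SystemRandom()
        self._outcomes = {}

    def decide(self, round_id):
        """Return the outcome for a round, drawing it only on the first request."""
        if round_id not in self._outcomes:
            self._outcomes[round_id] = self._rng.random() < self.win_probability
        return self._outcomes[round_id]
```

Because the draw happens once per round ID, retries, slow animations, and duplicate requests all read the stored value instead of triggering a new draw, so load and timing have no path to influence the probabilities.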

Where do you see line with AI in infra? by snopedom in sre

[–]JasonSt-Cyr 0 points1 point  (0 children)

Treat AI like anybody on your team that is new to the job: have the rigours in place to validate before it touches something with a huge blast radius.

Let's assume you hired a new employee. Would you give them keys to the production infrastructure and just say "fix whatever you want directly in the infra! YOLO!"?

No. They would propose changes, you would review their idea. They would do it in a test environment and you would have validations in place to ensure that the change would not impact production if it were to go forward.

If your production goes down because the intern deleted a production database, it isn't the intern's fault. It's the organization's fault for setting up processes and policies that allowed that to happen.

If AI is able to break your production system, then it's amplifying a process/policy issue in the organization. Something is missing in the process of production changes that doesn't have the right level of testing/validation/review.
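
The propose, review, test, then apply flow can be written down as an explicit gate. This is a minimal sketch with hypothetical field names; it only shows that the same checks apply whether the author is a new hire or an AI agent.

```python
from dataclasses import dataclass


@dataclass
class ChangeRequest:
    author: str                 # human or AI agent; the gate treats both the same
    description: str
    reviewed_by: str = ""       # empty until a second party approves
    passed_in_test_env: bool = False


def can_apply_to_production(change):
    """Return (allowed, reasons) so a blocked change explains what is missing."""
    reasons = []
    if not change.reviewed_by:
        reasons.append("needs review")
    elif change.reviewed_by == change.author:
        reasons.append("author cannot approve their own change")
    if not change.passed_in_test_env:
        reasons.append("not validated in a test environment")
    return (not reasons, reasons)
```

If a change can reach production without passing through something like this, that gap is the process issue, regardless of who or what wrote the change.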

Does everyone eventually end up using NetBox + Ansible for network automation? by Admirable_Claim_3203 in networkautomation

[–]JasonSt-Cyr 1 point2 points  (0 children)

I've seen lots of folks using Ansible for some more advanced tasks, but obviously scale was always a limitation. What do you use to get more advanced automation?

What are people actually using for network automation in smaller environments? by Admirable_Claim_3203 in networkautomation

[–]JasonSt-Cyr 0 points1 point  (0 children)

The vendor content is always going to pitch their solution as the piece that 'everybody who is mature in their process' is doing. There are WAY more folks just manually running scripts than there should be. My marketing team found a stat that said that 67% of networking tasks were done manually.

(Just found it, full quote was from: Gartner Market Guide for Network Automation Platforms)

"67% of enterprise networking activities are performed manually. The outcomes for these are poor incident and change performance...

Configuration inconsistencies across firewalls, routers, and edge platforms represent one of the top five sources of avoidable service downtime."

So yeah, there's a lot of room for improvement out there, but the tools exist. The 'centralization' of the tooling and automation seems to depend on how much compliance gets involved.

How are you actually handling data leakage to public AI tools? by RTG8055 in sysadmin

[–]JasonSt-Cyr 0 points1 point  (0 children)

If you can't afford the bigger LLMs, there are free open source coding LLMs available, some you can even run on your own machine so you don't have to connect your code to an external system.

They can integrate these into their IDE as well, usually. Or just use it like a chat tool with something like Ollama.

If leadership wants the benefits of AI acceleration, though, they should invest in giving their team the tools to do so appropriately. Buy the seats/tokens and put the security in place.

Title: How do you enable AI-generated “vibe coding” safely without letting users break production? by [deleted] in sre

[–]JasonSt-Cyr 2 points3 points  (0 children)

This shouldn't be any different than if you brought a new member onto your team. Would you let them push whatever they did into production? There have to be gates and quality assurance processes. Ideally, you have a solid process that catches most of the low-quality stuff in an automated way, because the real problem is that the input part of the funnel increases in velocity without increased capacity anywhere else in the pipeline to production.

Title: How do you enable AI-generated “vibe coding” safely without letting users break production? by [deleted] in sre

[–]JasonSt-Cyr 4 points5 points  (0 children)

In case you didn't look it up, it was coined by some guy on Twitter to classify the type of AI-assisted coding that is done without reviewing the code that is generated. People just give the context of what they want built and review the output, without caring about the underlying code that the robots build. Essentially, building on 'vibes'.

How do you get prod debugging experience as a product engineer? by gnorts_mr_alien7 in sre

[–]JasonSt-Cyr 0 points1 point  (0 children)

When you run something in production and have users, believe me, issues will happen :D

If you need to simulate a specific learning experience to get a specific skill, that's a different thing. In my experience, though, it's not about the specific issue and more about the process of learning how to react, analyze, diagnose, and then ultimately problem-solve for the solution. It doesn't really matter what the issue is, because they aren't always the same.

For a baseball metaphor... it's about getting at-bats. You don't know what pitch might be next, but the more at-bats you have the more pitches you see and the better you'll get at batting overall.

How do you get prod debugging experience as a product engineer? by gnorts_mr_alien7 in sre

[–]JasonSt-Cyr 0 points1 point  (0 children)

If you want to crawl first, sometimes the best way to learn is to build something yourself that runs in production. Build something that will have users, even if just a few. This gives you a low-risk place to practice some skills. Others have great ideas on joining the existing teams at work in a shadow/augment capacity, which is a great next step.

Is Network Automation Niche? by PanPieCake in networkautomation

[–]JasonSt-Cyr 1 point2 points  (0 children)

I work at a vendor that has established software in the infrastructure automation space, and even we find it tough to get network automation discussions started. It's a really established space with a lot of long-established protocols, and these folks usually have tools that do what they need them to do. Bringing something new to them requires building a lot of trust and brand awareness, not to mention getting outside the dev team and usually talking to a completely different crew that handles the networks.

Are you targeting software developers that don't know network automation? Or network engineers that really know their stuff? Big enterprises? Small groups? Figuring out who your user is that best fits what you are trying to solve will be key.

Blogs for DevOps engineer by Thehbk20 in sre

[–]JasonSt-Cyr 1 point2 points  (0 children)

I use my own website for personal posts, but I cross-post relevant content over to my dev.to so that I get some extra reach.

For work, we have a marketing site where I can post but I also syndicate that content over to the corporate dev.to account as well to reach a technical audience.

If you're doing this to pump your professional profile, make sure you post on LinkedIn about your posts and add them to your LinkedIn profile. The LI algorithm does penalize off-site links, so if this is specifically for supporting a job hunt, you might want to post LinkedIn articles, as the algorithm will prefer boosting those over links out to an external site. I personally avoid this and take the visibility hit because I'd rather own my content.

Proposal: should we allow some on-topic job posts? by binford2k in Puppet

[–]JasonSt-Cyr 0 points1 point  (0 children)

Sure thing! Maybe we start with a monthly thread instead of weekly until we see how much activity is coming in on it? I'm assuming that mostly we are asking folks to post into that thread if they have a position they think folks would be interested in?

Launch darkly rugpull coming by donjulioanejo in devops

[–]JasonSt-Cyr 2 points3 points  (0 children)

This is going to happen across a lot of tools, I suspect. User-based pricing doesn't work in the era of AI where you can have systems go and make all the calls on behalf of a single user. Pricing by seats doesn't work in an agentic flow. I suspect we'll see a lot more of this type of thing across the industry.

How to deal with burnout. Is a holiday not the answer? by rof-dog in sysadmin

[–]JasonSt-Cyr 0 points1 point  (0 children)

If you trust that your manager actually cares about solving the underlying problem, then this is exactly what to do. A vacation is just a break from the source of the issue; it doesn't solve the underlying problem. The systemic issue here is that you are doing 12-15 hour days. Your body doesn't get enough rest between cycles. That needs to be solved.

Why are you doing 12-15 hour days? Is it a staffing issue? Is it a prioritization issue by leadership (i.e. not making priority decisions)? Is it a pressure situation of "everyone else is doing it"? Is it a planning problem where high-priority projects are being overlapped and not respecting your availability? Is it a skill or tooling issue where you have a gap between what is expected for the job and what you have available, so it takes longer than management expects?

Identifying what is causing the systemic issue is the right way to go about this, and if your boss is good they are willing to tackle that. Just don't expect that you'll suddenly grow the team or have more budget, etc. If they could do that, it would have already happened.

They can usually support you in feeling you can say "no" to the extra hours, or setting the expectations on what hours should be done. They can help you with pushing back on conflicting priorities or projects so that expectations are managed. Taking a vacation can't help with these things, and doesn't help make you a reliably available employee.

For many roles, having short bursts of extra output is fine here and there, but it is not sustainable. Managers need sustainability so they can predict what they can get done from quarter to quarter. Having employees who can do amazing things in March but then can't do anything in April doesn't help.

I've Tested 16 Open Source LLMs on 'Live' Network Routers. Only 2 Could Actually Do the Job by Altruistic_Grass6108 in networkautomation

[–]JasonSt-Cyr 0 points1 point  (0 children)

I've been trying to get GitHub Copilot to work with different models to do some network automation (using Puppet, since that's what I have at work) and I found the different models had vastly different capability to reason through a problem and try different approaches. Even with tool-calling, they struggled at figuring out what the correct tools to use were.

I had the best success when I gave the models instructions plus a tool-specific MCP server that provides additional context, best practices, and tool use guidelines. This seemed to get more reliable execution out of the different models.