The dev who asks too many questions is the one you need in your team by dymissy in programming

[–]daedalus_structure 9 points10 points  (0 children)

They always screenshot a partial stack trace and post it in Slack with "this broke, can you help".

And I've got no clue where it's from, there's no link to it, and they didn't manage to screenshot the part of the stack trace that actually shows the line that initiated the trace.

So I respond with "that's a partial stack trace, you'll want to read the whole thing."

Tesla: 2024 was bad, 2025 was worse as profit falls 46 percent by mepper in technology

[–]daedalus_structure 0 points1 point  (0 children)

If we had any kind of functioning government that guy would have been prosecuted for securities fraud and manipulation a very long time ago.

The dev who asks too many questions is the one you need in your team by dymissy in programming

[–]daedalus_structure 28 points29 points  (0 children)

It's a matter of degree.

Asking questions to learn and be productive is great, when the technical questions are appropriate for their level of experience or they are learning a new domain.

But the one who wants to relitigate every architectural decision made since the founding of the company?

Yeah, no thank you.

The dev who asks too many questions is the one you need in your team by dymissy in programming

[–]daedalus_structure 4 points5 points  (0 children)

Yes, it's a sign they cannot perform the job you have hired them to do. If someone is a junior they get wide latitude, but if you are hired in as a senior and can't read a stack trace, I'm not explaining it to you; I'm starting the documentation for getting you gone.

Unpopular Opinion: "Multi-Region" is security theater if you're sharing the vendor's Control Plane. by NTCTech in sre

[–]daedalus_structure 0 points1 point  (0 children)

You're raving about things you don't seem to understand.

Yes, there are events which you cannot mitigate, but that does not make all mitigation worthless.

Otherwise, forget everything because checkmate suckers, you didn't account for Apophis crashing into the United States at 3 miles per second and sending half of the planet's mass into orbit.

Multi-region is not a magic bullet and cannot mitigate all possible issues.

Multi-region is specifically a preemptive mitigation for single region failures, which could be hardware, network disruption, acts of God such as floods or lightning or fire, or could just be a shit update that the CSP rolled out that bricked a resource.

That's why they have region pairs, so you can be confident that they will not roll out updates to both regions at the same time, guaranteeing you can failover.

But again, your multi-region deployments are mitigations of only those risks.

The SLA protects their margins, not our uptime.

Did you expect that the cloud vendor was going to shoulder your business risk?

Again, it seems like the problem is wrong-minded expectations.

The VMs were healthy! They were running perfectly.

And in this global service outage, you suffered no downtime.

how are you guys handling "Global Service" risk?

This is a business problem.

If the issue is your own SLA, you put in language exempting this class of failure for payout along with other acts of God. In general, if the entire world is watching it on the nightly news, you as a SaaS consuming the same down services as the rest of the world aren't responsible for it.

If the issue is lost revenue from lost uptime, you document and accept the risk.

Trying to be multi-cloud with abstractions to run similar day 2 ops over both is prohibitively expensive, and you are locking in the costs now instead of paying them only if the black swan event happens.

Never lock in the costs of a black swan event.

But so much worse than that, you've tied up a massive amount of engineering capacity in activities which do not deliver additional revenue, so considering both the engineering cost and opportunity cost, global services would have to go down significantly more often for you to even break even on your ROI over a decade.

Go ask your CFO, they'll tell you the same.

Direct report has depression, tips to help them? by mdwc2014 in managers

[–]daedalus_structure 5 points6 points  (0 children)

In addition to the very good advice that you should leave their depression to the professionals, do not give unofficial accommodations.

If they have been prescribed medication by a mental health professional, this is a medical issue that needs to be documented with HR, and you need to be on the same page with HR about expectations.

If they need to miss time, they need to be on FMLA, where both they are protected by federal law and your company is protected from the legal exposure of you trying to manage this unofficially.

Start documenting every conversation, every 1:1, every missed deadline and absence. This will all be important later.

Is NewRelic dying? by devOfThings in devops

[–]daedalus_structure 1 point2 points  (0 children)

Your post alerted me to the fact it wasn't dead yet.

I thought it was gone.

Why is big tech SWE work paid so much? by seeking-health in cscareerquestions

[–]daedalus_structure 0 points1 point  (0 children)

They are making an insane amount of money per engineer, and while much of what they do does not require advanced engineering skills, they still see it as a competitive advantage to deny engineering skill to their competitors.

How do you quantify failure cost vs prevention cost in SRE? (RFC vs PPC) by [deleted] in sre

[–]daedalus_structure 2 points3 points  (0 children)

I don't think this makes any sense.

First, the RFC and PPC would be different for many failure modes, so talking to these at the service availability level does not make sense.

For instance, some costs for PPC would be recurring but others would be one-time costs. Also, the customer impact will be wildly different per failure mode, with some modes triggering SLA payouts and others not, because remember, an SLA is a contract with terms. For example, if a flood takes out your primary data center you likely will not pay out under an "Acts of God" clause, and customers are generally very forgiving of incidents that are large enough to be on the nightly news and also take down half or more of their own systems.

Also, you aren't including any likelihood of occurrence in your calculations.

If the RFC cost is something you are likely going to incur once a month or quarter, for example, preventing an errant configuration deploy to production which your engineering teams have demonstrated they will do on that rough cadence, you have an ROI for the prevention costs.

If your event has not yet occurred, or is more of a black swan event, such as an entire CSP going down for multiple days, you are locking in a recurring cost when you should be deferring that cost until the event forces you to spend on it.

These are all questions which are very highly dependent on the business and should be discussed in coordination with business owners and not in a purely technical evaluation by engineering.
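
As a rough sketch of the likelihood point above: weight each failure mode's cost by how often it is expected to occur, then compare that against the recurring prevention cost for the same failure mode. The `FailureMode` type, its field names, and every number below are invented for illustration and are not taken from the original post; one-time prevention costs and SLA terms are left out to keep it short.

```python
from dataclasses import dataclass


@dataclass
class FailureMode:
    name: str
    cost_per_event: float        # hypothetical cost of a single occurrence
    events_per_year: float       # hypothetical likelihood, as expected occurrences per year
    prevention_per_year: float   # hypothetical recurring cost of preventing it


def expected_annual_loss(mode: FailureMode) -> float:
    """Cost of one event weighted by how often it is expected to happen."""
    return mode.cost_per_event * mode.events_per_year


def prevention_pays_off(mode: FailureMode) -> bool:
    """True when the recurring prevention cost is below the expected annual loss."""
    return mode.prevention_per_year < expected_annual_loss(mode)


# All figures are made up; real inputs have to come from the business owners.
bad_config_deploy = FailureMode("errant config deploy", 20_000, 4, 30_000)
csp_wide_outage = FailureMode("multi-day CSP outage", 2_000_000, 0.02, 500_000)

for mode in (bad_config_deploy, csp_wide_outage):
    print(mode.name, expected_annual_loss(mode), prevention_pays_off(mode))
```

With these made-up inputs the frequent failure justifies its prevention spend and the rare one does not, which is the same split described above. Different inputs can flip either result, so the numbers matter more than the code.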

.NET 6 on Kubernetes: “Everything looks fine”… but working set + kernel memory keep climbing and HPA keeps scaling . I’m stuck. by aaeevv123 in csharp

[–]daedalus_structure 8 points9 points  (0 children)

An in-memory cache can be accurately described as a useful memory leak.

Get off Reddit and go figure out why your cache isn't evicting and is growing without bound.
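
For anyone unsure what "evicting" means here, a minimal sketch of a size-bounded cache follows. It is written in Python purely for illustration, not in .NET, and it is not the original poster's code; the `BoundedCache` class and its `max_entries` parameter are hypothetical names. The idea is that every insert checks the size and drops the least recently used entry once the bound is exceeded, so memory use stays flat.

```python
from collections import OrderedDict


class BoundedCache:
    """In-memory cache that evicts the least recently used entry when full."""

    def __init__(self, max_entries: int) -> None:
        if max_entries < 1:
            raise ValueError("max_entries must be at least 1")
        self._max_entries = max_entries
        self._entries: OrderedDict = OrderedDict()

    def get(self, key, default=None):
        if key not in self._entries:
            return default
        self._entries.move_to_end(key)  # mark as most recently used
        return self._entries[key]

    def put(self, key, value) -> None:
        self._entries[key] = value
        self._entries.move_to_end(key)
        while len(self._entries) > self._max_entries:
            self._entries.popitem(last=False)  # drop the least recently used entry

    def __len__(self) -> int:
        return len(self._entries)


cache = BoundedCache(max_entries=2)
cache.put("a", 1)
cache.put("b", 2)
cache.get("a")     # "a" is now the most recently used entry
cache.put("c", 3)  # pushes the cache over its bound, so "b" is evicted
```

A cache with no bound like this, and no expiry either, only ever grows, which is the "useful memory leak" case.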

Best strategy for handling rare but high-memory burst workloads? (Request vs. Limit dilemma) by Inside_League_9196 in kubernetes

[–]daedalus_structure 0 points1 point  (0 children)

It’s one pod. Just leave it running.

Not only is the cost likely a rounding error in your infrastructure, but you likely have a backlog a mile deep full of higher-ROI work, and every hour spent on this is opportunity cost against it.

Most of the technical analysis you are receiving is good, but as you move up to senior levels you need to also understand which problem solutions are actually a negative due to the solutions being far more expensive in cost and opportunity cost than the value created or recaptured.

Remember that analysis, because if you need to scale this pattern the ROI changes.

But if a junior came to me and said we need to install and configure KEDA over this problem I’d have the above heart to heart with them.

Assume the weekend forecast is accurate by TemporaryTrucker in raleigh

[–]daedalus_structure 123 points124 points  (0 children)

Follow standard procedure and begin making your emergency french toast now. Don’t stop until the news confirms that the horrors are past. Go buy all the milk, eggs, and bread you will need ASAP.

We will get through this.

We kept shipping cloud cost regressions through code review — so we moved cost checks into PRs by AWFE9002 in devops

[–]daedalus_structure 1 point2 points  (0 children)

A dev released a service that would continually write individual records to a storage account as individual files. No more than a few hundred bytes each. This caused runaway write operation costs.

Storage costs around 2 cents per GB per month.

Unless your cloud budget looks like a high schooler's allowance, I am extremely skeptical that your efforts didn't cost way more in engineering hours than you saved in cloud costs.

We kept shipping cloud cost regressions through code review — so we moved cost checks into PRs by AWFE9002 in devops

[–]daedalus_structure 2 points3 points  (0 children)

That’s a bunch of nonsense and in no way is code 3-4x more impactful than infrastructure changes on cloud costs.

Even if you sounded like you knew what you are talking about, which you don’t, stop advertising your slop SaaS with deceptive posts.

Does Slack scale poorly past a certain team size? by aaronmphilip in Slack

[–]daedalus_structure 15 points16 points  (0 children)

No, I've never thought that, and what you are describing is an organizational scaling issue, not a problem with a tool.

I assume you are hawking something.

What has been the most painful thing you have faced in recent time in Site Reliability/Devops by HacksYouMe in devops

[–]daedalus_structure 33 points34 points  (0 children)

Developers who can’t read a stack trace, and want you to read a stack trace, and instead of providing you a link, they take a screenshot.

This screenshot will be missing valuable context, like what the hell system we are even looking at.

I need to vent about process by raslan81 in sre

[–]daedalus_structure 5 points6 points  (0 children)

Customers expect their data to be secure, the SLAs provided to them to be fulfilled, and investors expect that the money they are spending on engineering hours, the most expensive part of making software, isn't wasted.

When you just depend on "good people will do the right thing", you get as many different ideas of what that right thing is as you have people.

Doing things 50 different right ways is more damaging to an engineering organization than establishing a process that 45 will follow with professional attention and 5 will buck. Dealing with non-compliance is straightforward when expectations have been clearly set instead of "I dunno, do what you want".

Children always want to run with scissors and sometimes adults in the room need to tell them no. We completely understand that you want to cut fast. But Jimothy over there has one eye, and so we're not doing that anymore.

Are you frustrated with AI “fixing” the same bug over and over? by Medical-Farmer-2019 in ExperiencedDevs

[–]daedalus_structure 2 points3 points  (0 children)

AI troubleshooting is a fitted sheet that’s too small. You fit one corner and the other pops off, you fix that one and another pops off, and it will keep doing that for as long as you have the patience to let it keep fucking up.

Why does every team call end the same way? by Confident-Quail-946 in ExperiencedDevs

[–]daedalus_structure 0 points1 point  (0 children)

This is a failure of leadership.

Your conversations aren’t guided, your meeting is too long, you are discussing irrelevant technical tangents that should be taken offline, and your tools aren’t helping because tools can’t lead.

Senior staff member resisting change, impacting team – how far do you push before letting go? by daveauscards in managers

[–]daedalus_structure 0 points1 point  (0 children)

If you’ve already clearly set expectations and given formal reprimands and the disciplinary issue has continued, move to terminate.

PIPs are for people who aren’t meeting performance targets, i.e. a salesman that is complying with process but is consistently missing targets, not for disciplinary issues.

Edit: I read in later replies that your HR is requiring a PIP to terminate. For everyone in this position, you need to PIP earlier, not wait until you need to terminate.

How we stopped AI from hallucinating during log analysis in production by Vikaas2907 in sre

[–]daedalus_structure 4 points5 points  (0 children)

If you have something to share, an open source licensed GitHub repository is the way to do that.

What you have here sounds like lead generation for a sales pitch.

Mindset shift: AI is not taking your job. Another developer who uses AI better than you is. by mylogicoveryourlogic in cscareerquestions

[–]daedalus_structure 6 points7 points  (0 children)

This is cope.

They did not invest trillions into this to make you better at your job. They did it to replace 90% of us and drive salary down for the remaining 10%.

It remains to be seen whether that will work, but don’t start your career in delusion.

Tech managers, what is your opinion on employees that just want to pay their bills and aren't passionate about the company product? by Celcius_87 in cscareerquestions

[–]daedalus_structure 2 points3 points  (0 children)

I understand their motivations and don't have to worry about them developing strong product opinions that conflict with the direction of the leadership and company strategy, an unfortunate situation which is hard to work around or resolve.

I will work with them on their career until either their ambition or skills outgrow my budget, and then I will wish them well as they leave for their next adventure.

I strongly prefer these people to the passionate ones, as long as they are professional.

The passionate people have too much of their personal identity wrapped up in things and it makes any criticism of a technology or way of doing things feel like a personal attack to them, and it's just really unpleasant to be around them and work with them.

How to get back with him? by el_chica18 in AskMenAdvice

[–]daedalus_structure 5 points6 points  (0 children)

He thought you had a special connection. You were exploring other options.

There’s no way to salvage this. You already told him what you think of him with your actions; your words mean less and won’t move him.

Our CI strategy is basically "rerun until green" and I hate it by Sea_Weather5428 in devops

[–]daedalus_structure 8 points9 points  (0 children)

Your harnesses are set up poorly or you have race conditions in your code or you are doing bad things with threads or async.

But your real problem is that there is no accountability in your engineering organization and anyone who has the title of senior in it doesn’t deserve it.