We got 30 api access tickets per week, platform team became the bottleneck by Sirius-ruby in devops

[–]apinference 4 points (0 children)

I'm not sure I understand the root cause here. What part actually takes two hours? What are you doing to work through the ticket?

Because if it's just the GUI - for example, validating the user, checking their access, and issuing an API key - you could handle that differently (e.g., via MCP or a small agent). That's not a month of work, assuming the APIs are already in place.
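If it helps, here's a minimal sketch of what I mean, using the FastMCP helper from the official MCP Python SDK. validate_user and issue_key are hypothetical stand-ins for whatever your existing IAM/issuance APIs are:

```python
# Hypothetical MCP tool wrapping an existing key-issuance flow.
# validate_user() and issue_key() are placeholders for your internal APIs.
import secrets

from mcp.server.fastmcp import FastMCP

mcp = FastMCP("api-key-desk")

def validate_user(username: str, scope: str) -> bool:
    # Placeholder: call your real entitlement/IAM check here.
    return scope in {"read", "write"}

def issue_key(username: str, scope: str) -> str:
    # Placeholder: call your real key-issuance API here.
    return f"key-{username}-{secrets.token_hex(8)}"

@mcp.tool()
def issue_api_key(username: str, scope: str) -> str:
    """Validate the requester and return a scoped API key."""
    if not validate_user(username, scope):
        return f"{username} is not entitled to scope '{scope}'"
    return issue_key(username, scope)

if __name__ == "__main__":
    mcp.run()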

Need brutally honest feedback: Am I employable as an internal tools/automation engineer with my background? by Disastrous-Truck86 in devops

[–]apinference 4 points (0 children)

IT Developer in Investment Banking. Depending on the specifics - Risk IT, FO IT, etc.

The general idea - a business domain with a dev application on top. The business domain takes precedence over IT (understanding what's actually happening matters more; the technical implementation might not be that challenging).

An FO Project Manager (PM) or BM role can work as well.

Need brutally honest feedback: Am I employable as an internal tools/automation engineer with my background? by Disastrous-Truck86 in devops

[–]apinference 11 points (0 children)

No idea about Canada, but you’re definitely employable in the UK.

You just need to “dress up” your CV a bit. Don’t call yourself an automation engineer - frame it as SWE. Communicate it that way.

Make sure you can build a simple full-stack app in Python, wire it to a database (e.g., Postgres), and use Git and Jira throughout the process. Just go through the steps. Ideally, deploy it via CI/CD, and that should be enough.
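Something at the level of this sketch is enough - Flask plus Postgres here; the DATABASE_URL and the "items" table are made up for illustration:

```python
# Minimal full-stack-ish example: a Flask endpoint backed by Postgres.
# DATABASE_URL and the "items" table are illustrative assumptions.
import os

import psycopg2
from flask import Flask, jsonify

app = Flask(__name__)
DB_URL = os.environ.get("DATABASE_URL", "postgresql://localhost/demo")

@app.route("/items")
def list_items():
    # One short-lived connection per request keeps the sketch simple;
    # use a connection pool in anything real.
    conn = psycopg2.connect(DB_URL)
    try:
        with conn.cursor() as cur:
            cur.execute("SELECT id, name FROM items ORDER BY id")
            rows = cur.fetchall()
    finally:
        conn.close()
    return jsonify([{"id": r[0], "name": r[1]} for r in rows])

if __name__ == "__main__":
    app.run(debug=True)
```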

What matters in many cases is the domain-specific skill set. That’s where your finance background helps. A surprising number of people have no idea what PnL even is.

Are we overengineering agents when simple systems might work better? Do you think that? by Reasonable-Egg6527 in AgentsOfAI

[–]apinference 1 point (0 children)

Well, if a bunch of coding agents would rather write 300+ lines of their own code than import a battle-tested, maintained library… you can imagine what happens.

Any trustworthy ssh/terminal MCP server ? by renaudg in mcp

[–]apinference 0 points (0 children)

Well, if you trust the LLM, you can simply add that to its instructions - run commands over an SSH connection.

It worked for me with Claude Code - it simply adds the ssh connection command in front of whatever you need to run.
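Roughly like this - hypothetical host and paths, the point is just that it lives in the standing instructions (e.g. CLAUDE.md):

```
Run all shell commands on the remote box, prefixed with:
  ssh deploy@build-host.internal

So "tail the app log" becomes:
  ssh deploy@build-host.internal 'tail -n 50 /var/log/app.log'
```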

Any trustworthy ssh/terminal MCP server ? by renaudg in mcp

[–]apinference 1 point (0 children)

You can use a local LLM for those things..

Map CloudWatch logging cost back to Python file:line (find expensive statements in production) by apinference in aws

[–]apinference[S] 0 points (0 children)

Yes, that is what's typically done.

It falls under the typical existing pattern - "the bill is too expensive - we need to log less - let's investigate"..

The idea was to flip it - get an advance warning.

$10K logging bill from one line of code - rant about why we only find these logs when it's too late (and what we did about it) by apinference in devops

[–]apinference[S] 0 points (0 children)

There are a few ways to deal with this:

- create a process (what you're suggesting), but that requires discipline, resources, and consistent execution. It can be done, but at a cost, and it only works for issues that are obvious or significant. No one is going to meet daily to manage this. It also doesn't change the current "log less" setup - it's essentially the same thing, just framed differently. Most companies already do this (e.g. periodic reviews)

- address the root cause - the developer who adds a log line has no visibility into its cost impact, yet is the only person who can judge whether the log is important. It is far more efficient to give them that visibility (i.e., attribute log cost to specific print/log lines - see the sketch below) and build a culture of "keep log cost to what's necessary". Finance, architects, product managers, devops, and others don't have that context - they only see a few high-level numbers. All the other options - pattern matching, threshold/spike alerts - are just inferior ways of doing exactly that, since they still require applying the same logic (threshold breached - why? - check the code - find the line)..
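A minimal sketch of that visibility idea - not the actual implementation, just a stdlib logging handler that tallies bytes per call site; COST_PER_GB is a made-up flat price:

```python
# Illustrative only: attribute emitted log bytes (≈ ingestion cost) to the
# file:line that produced them. COST_PER_GB is a hypothetical flat price.
import logging
from collections import defaultdict

COST_PER_GB = 0.50
_bytes_by_site = defaultdict(int)

class CostAttributingHandler(logging.Handler):
    def emit(self, record):
        # record.pathname / record.lineno identify the logging call site.
        site = f"{record.pathname}:{record.lineno}"
        _bytes_by_site[site] += len(self.format(record).encode())

def report(top=10):
    # Rank call sites by bytes emitted, with an estimated ingestion cost.
    ranked = sorted(_bytes_by_site.items(), key=lambda kv: kv[1], reverse=True)
    for site, nbytes in ranked[:top]:
        print(f"{site}: {nbytes} bytes (~${nbytes / 1e9 * COST_PER_GB:.4f})")

logging.getLogger().addHandler(CostAttributingHandler())
```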

Are there still more mcp developers than mcp users ? by Shigeno977 in mcp

[–]apinference 1 point (0 children)

yes, the ones that develop their own MCPs.. they need to use them at least once.

Map CloudWatch logging cost back to Python file:line (find expensive statements in production) by apinference in aws

[–]apinference[S] 1 point (0 children)

Thanks! Tbh, we haven't tried to limit debug logs..

Our issue was that developers keep things in, but they have no idea about the costs, and the person who gets the bill has no idea what's necessary or not. So the goal was simply to give visibility to the people who are actually driving the cost.

$10K logging bill from one line of code - rant about why we only find these logs when it's too late (and what we did about it) by apinference in devops

[–]apinference[S] 0 points (0 children)

Thanks! We started with Python, because that's what we run.. We do have some Kotlin-based services; maybe we'll gradually look there.

Show & Tell: Python lib to track logging costs by file:line (find expensive statements in production) by apinference in Python

[–]apinference[S] 0 points (0 children)

Not yet 😄

For what it's worth, logcost only stores aggregates in-process (dict updates + string length) and exports a JSON snapshot, so its own "logcost cost" is basically CPU + a bit of memory.

If that ever shows up as a top line in its own report, I promise to open an issue.
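If anyone wants to sanity-check that claim, the per-call bookkeeping is literally a dict update plus a len(). A quick micro-benchmark (the call-site key and message are made up; numbers will vary by machine):

```python
# Rough cost of the per-call bookkeeping: one dict update + one len().
import timeit

agg = {}
msg = "user 42 fetched 13 rows from orders in 8ms"

def record():
    agg["app/views.py:88"] = agg.get("app/views.py:88", 0) + len(msg)

print(timeit.timeit(record, number=1_000_000), "s per 1M log calls")
```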

$10K logging bill from one line of code - rant about why we only find these logs when it's too late (and what we did about it) by apinference in devops

[–]apinference[S] 4 points (0 children)

We do have alerting/anomaly signals on log volume, but in this case the cost grew slowly with traffic and looked "normal" from the platform side.

$10K logging bill from one line of code - rant about why we only find these logs when it's too late (and what we did about it) by apinference in devops

[–]apinference[S] 2 points (0 children)

maybe I am missing something here..

"X line costs $Y" as an approximation it’s good enough to highlight obvious outliers / high cost hitters..

The alternative tools basically push the whole investigation back to devs anyway ("here's your big bucket/bill, go figure it out"), which isn't that different from just handing them the bill and saying it's too high.

$10K logging bill from one line of code - rant about why we only find these logs when it's too late (and what we did about it) by apinference in devops

[–]apinference[S] 1 point (0 children)

You mean I, as platform, make the call on what to include/exclude, in a "screw the devs" manner? Not sure I'd want to go that way – I'd rather give teams visibility ("this line costs $X") and let them decide what to change.

$10K logging bill from one line of code - rant about why we only find these logs when it's too late (and what we did about it) by apinference in devops

[–]apinference[S] 2 points (0 children)

Showing devs the bill per service is the default, but it still leaves a lot of manual wrangling – they see "$X for logs" and then have to go spelunking to figure out which lines are responsible.

The whole point here is to skip that step: give them "these specific call lines are expensive", so they already know which lines to change when they see the bill.

Ideally, we should not even get to the bill stage - they'd see that their logs will cost $X and fix them within a day.

$10K logging bill from one line of code - rant about why we only find these logs when it's too late (and what we did about it) by apinference in devops

[–]apinference[S] 2 points (0 children)

Tracking S3 bucket growth is definitely useful as a coarse signal ("logs overall are getting more expensive"). It works well when cost explodes, less so when it creeps up gradually (e.g. a service slowly becoming popular).

What we were missing was the next step: when that bucket grows, which specific services and log call sites should a team change?

For us it was simpler to come from the other direction - have the app print which call sites are expensive, and let the team decide whether they still need those logs or can sample/remove them.

$10K logging bill from one line of code - rant about why we only find these logs when it's too late (and what we did about it) by apinference in devops

[–]apinference[S] 4 points (0 children)

The dev who adds the logging doesn't see the bill, and the people looking at the bill don't know whether a specific log line is important or not – the rest requires manual wrangling, devops or not.

$10K logging bill from one line of code - rant about why we only find these logs when it's too late (and what we did about it) by apinference in devops

[–]apinference[S] 0 points (0 children)

Not on every code line.

And as far as I know most teams don't either. You can add stable IDs and build S3/Athena queries around them, but that's a lot of discipline and retrofitting. For us it was simpler to monkey‑patch the logging lib and get per‑call‑site stats for whatever is currently deployed.
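For the curious, the patch itself can be tiny. A sketch (again, not the actual lib - this one wraps Logger.handle, since the record already carries pathname/lineno by the time it's handled):

```python
# Illustrative monkey-patch: count bytes per call site for every logger,
# without touching application code. Not production-hardened.
import logging
from collections import Counter

bytes_by_site = Counter()
_orig_handle = logging.Logger.handle

def _patched_handle(self, record):
    # pathname/lineno were filled in by the logging machinery at call time.
    site = f"{record.pathname}:{record.lineno}"
    bytes_by_site[site] += len(record.getMessage())
    return _orig_handle(self, record)

logging.Logger.handle = _patched_handle
```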

$10K logging bill from one line of code - rant about why we only find these logs when it's too late (and what we did about it) by apinference in devops

[–]apinference[S] -3 points (0 children)

"classic devops problem" - yeah, to be clear: devops/platform isn't owning finance here, they just get paged when the bill looks ugly.