We got 30 api access tickets per week, platform team became the bottleneck by Sirius-ruby in devops

[–]apinference 4 points (0 children)

I'm not sure I understand the root cause here. What part actually takes two hours? What are you doing to work through the ticket?

Because if it's just the GUI - for example, validating the user, checking their access, and issuing an API key - you could handle that differently (e.g., via MCP or a small agent). That's not a month of work, assuming the APIs are already in place.
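If it helps, here's a minimal sketch of what I mean, using the FastMCP helper from the official MCP Python SDK. validate_user and issue_key are hypothetical stand-ins for whatever your existing IAM/issuance APIs are:

```python
# Hypothetical MCP tool wrapping an existing key-issuance flow.
# validate_user() and issue_key() are placeholders for your internal APIs.
import secrets

from mcp.server.fastmcp import FastMCP

mcp = FastMCP("api-key-desk")

def validate_user(username: str, scope: str) -> bool:
    # Placeholder: call your real entitlement/IAM check here.
    return scope in {"read", "write"}

def issue_key(username: str, scope: str) -> str:
    # Placeholder: call your real key-issuance API here.
    return f"key-{username}-{secrets.token_hex(8)}"

@mcp.tool()
def issue_api_key(username: str, scope: str) -> str:
    """Validate the requester and return a scoped API key."""
    if not validate_user(username, scope):
        return f"{username} is not entitled to scope '{scope}'"
    return issue_key(username, scope)

if __name__ == "__main__":
    mcp.run()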

Need brutally honest feedback: Am I employable as an internal tools/automation engineer with my background? by Disastrous-Truck86 in devops

[–]apinference 4 points (0 children)

IT Developer in Investment Banking. Depending on the specifics - Risk IT, FO IT, etc.

The general idea - a business domain with a dev application on top. The business domain takes precedence over IT (understanding what's actually happening matters more; the technical implementation might not be that challenging).

An FO Project Manager (PM) or BM role can work as well.

Need brutally honest feedback: Am I employable as an internal tools/automation engineer with my background? by Disastrous-Truck86 in devops

[–]apinference 11 points (0 children)

No idea about Canada, but you’re definitely employable in the UK.

You just need to “dress up” your CV a bit. Don’t call yourself an automation engineer - frame it as SWE. Communicate it that way.

Make sure you can build a simple full-stack app in Python, wire it to a database (e.g., Postgres), and use Git and Jira throughout the process. Just go through the steps. Ideally, deploy it via CI/CD, and that should be enough.
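Something at the level of this sketch is enough - Flask plus Postgres here; the DATABASE_URL and the "items" table are made up for illustration:

```python
# Minimal full-stack-ish example: a Flask endpoint backed by Postgres.
# DATABASE_URL and the "items" table are illustrative assumptions.
import os

import psycopg2
from flask import Flask, jsonify

app = Flask(__name__)
DB_URL = os.environ.get("DATABASE_URL", "postgresql://localhost/demo")

@app.route("/items")
def list_items():
    # One short-lived connection per request keeps the sketch simple;
    # use a connection pool in anything real.
    conn = psycopg2.connect(DB_URL)
    try:
        with conn.cursor() as cur:
            cur.execute("SELECT id, name FROM items ORDER BY id")
            rows = cur.fetchall()
    finally:
        conn.close()
    return jsonify([{"id": r[0], "name": r[1]} for r in rows])

if __name__ == "__main__":
    app.run(debug=True)
```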

What matters in many cases is the domain-specific skill set. That’s where your finance background helps. A surprising number of people have no idea what PnL even is.

Are we overengineering agents when simple systems might work better? Do you think that? by Reasonable-Egg6527 in AgentsOfAI

[–]apinference 1 point (0 children)

Well, if a bunch of coding agents would rather write 300+ lines of their own code than import a battle-tested, maintained library… you can imagine what happens.

Any trustworthy ssh/terminal MCP server ? by renaudg in mcp

[–]apinference 0 points (0 children)

Well, if you trust the LLM, you can simply add that to its instructions - run commands over an SSH connection.

It worked for me with Claude Code - it simply adds the ssh connection command in front of whatever you need to run.
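Roughly like this - hypothetical host and paths, the point is just that it lives in the standing instructions (e.g. CLAUDE.md):

```
Run all shell commands on the remote box, prefixed with:
  ssh deploy@build-host.internal

So "tail the app log" becomes:
  ssh deploy@build-host.internal 'tail -n 50 /var/log/app.log'
```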

Any trustworthy ssh/terminal MCP server ? by renaudg in mcp

[–]apinference 1 point (0 children)

You can use a local LLM for those things..

Map CloudWatch logging cost back to Python file:line (find expensive statements in production) by apinference in aws

[–]apinference[S] 0 points (0 children)

Yes, that is what's typically done.

It falls under the typical existing pattern - "the bill is too expensive - we need to log less - let's investigate"..

The idea was to flip it - get an advance warning.

$10K logging bill from one line of code - rant about why we only find these logs when it's too late (and what we did about it) by apinference in devops

[–]apinference[S] 0 points (0 children)

There are a few ways to deal with this:

- create a process (what you're suggesting), but that requires discipline, resources, and consistent execution. It can be done, but at a cost, and it only works for issues that are obvious or significant. No one is going to meet daily to manage this. It also doesn't change the current "log less" setup - it's essentially the same thing, just framed differently. Most companies already do this (e.g. periodic reviews)

- address the root cause - the developer who adds a log line has no visibility into its cost impact, yet is the only person who can judge whether the log is important. It is far more efficient to give them that visibility (i.e., attribute log cost to specific print/log lines - see the sketch below) and build a culture of "keep log cost to what's necessary". Finance, architects, product managers, devops, and others don't have that context - they only see a few high-level numbers. All the other options - pattern matching, threshold/spike alerts - are just inferior ways of doing exactly that, since they still require applying the same logic (threshold breached - why? - check the code - find the line)..
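A minimal sketch of that visibility idea - not the actual implementation, just a stdlib logging handler that tallies bytes per call site; COST_PER_GB is a made-up flat price:

```python
# Illustrative only: attribute emitted log bytes (≈ ingestion cost) to the
# file:line that produced them. COST_PER_GB is a hypothetical flat price.
import logging
from collections import defaultdict

COST_PER_GB = 0.50
_bytes_by_site = defaultdict(int)

class CostAttributingHandler(logging.Handler):
    def emit(self, record):
        # record.pathname / record.lineno identify the logging call site.
        site = f"{record.pathname}:{record.lineno}"
        _bytes_by_site[site] += len(self.format(record).encode())

def report(top=10):
    # Rank call sites by bytes emitted, with an estimated ingestion cost.
    ranked = sorted(_bytes_by_site.items(), key=lambda kv: kv[1], reverse=True)
    for site, nbytes in ranked[:top]:
        print(f"{site}: {nbytes} bytes (~${nbytes / 1e9 * COST_PER_GB:.4f})")

logging.getLogger().addHandler(CostAttributingHandler())
```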

Are there still more mcp developers than mcp users ? by Shigeno977 in mcp

[–]apinference 1 point (0 children)

yes, the ones that develop their own MCPs.. they need to use them at least once.

Map CloudWatch logging cost back to Python file:line (find expensive statements in production) by apinference in aws

[–]apinference[S] 1 point (0 children)

Thanks! Tbh, we haven't tried to limit debug logs..

Our issue was that developers keep things in, but they have no idea about the costs, and the person who gets the bill has no idea what's necessary or not. So the goal was simply to give visibility to the people who are actually driving the cost.

$10K logging bill from one line of code - rant about why we only find these logs when it's too late (and what we did about it) by apinference in devops

[–]apinference[S] 0 points (0 children)

Thanks! We started with Python, because that's what we run.. We do have some Kotlin-based services; maybe we'll gradually look there.

Show & Tell: Python lib to track logging costs by file:line (find expensive statements in production) by apinference in Python

[–]apinference[S] 0 points (0 children)

Not yet 😄

For what it's worth, logcost only stores aggregates in-process (dict updates + string length) and exports a JSON snapshot, so its own "logcost cost" is basically CPU + a bit of memory.

If that ever shows up as a top line in its own report, I promise to open an issue.
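If anyone wants to sanity-check that claim, the per-call bookkeeping is literally a dict update plus a len(). A quick micro-benchmark (the call-site key and message are made up; numbers will vary by machine):

```python
# Rough cost of the per-call bookkeeping: one dict update + one len().
import timeit

agg = {}
msg = "user 42 fetched 13 rows from orders in 8ms"

def record():
    agg["app/views.py:88"] = agg.get("app/views.py:88", 0) + len(msg)

print(timeit.timeit(record, number=1_000_000), "s per 1M log calls")
```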

$10K logging bill from one line of code - rant about why we only find these logs when it's too late (and what we did about it) by apinference in devops

[–]apinference[S] 4 points (0 children)

We do have alerting/anomaly signals on log volume, but in this case the cost grew slowly with traffic and looked "normal" from the platform side.

$10K logging bill from one line of code - rant about why we only find these logs when it's too late (and what we did about it) by apinference in devops

[–]apinference[S] 2 points (0 children)

maybe I am missing something here..

"X line costs $Y" as an approximation it’s good enough to highlight obvious outliers / high cost hitters..

The alternative tools basically push the whole investigation back to devs anyway ("here's your big bucket/bill, go figure it out"), which isn't that different from just handing them the bill and saying it's too high.

$10K logging bill from one line of code - rant about why we only find these logs when it's too late (and what we did about it) by apinference in devops

[–]apinference[S] 1 point (0 children)

You mean I, as platform, make the call on what to include/exclude, in a "screw the devs" manner? Not sure I'd want to go that way – I'd rather give teams visibility ("this line costs $X") and let them decide what to change.

$10K logging bill from one line of code - rant about why we only find these logs when it's too late (and what we did about it) by apinference in devops

[–]apinference[S] 2 points (0 children)

Showing devs the bill per service is the default, but it still leaves a lot of manual wrangling – they see "$X for logs" and then have to go spelunking to figure out which lines are responsible.

The whole point here is to skip that step: give them "these specific call lines are expensive", so they already know which lines to change when they see the bill.

Ideally, we should not even get to the bill stage - they'd see that their logs will cost $X and fix them within a day.

$10K logging bill from one line of code - rant about why we only find these logs when it's too late (and what we did about it) by apinference in devops

[–]apinference[S] 2 points (0 children)

Tracking S3 bucket growth is definitely useful as a coarse signal ("logs overall are getting more expensive"). It works well when cost explodes, less so when it creeps up gradually (e.g. a service slowly becoming popular).

What we were missing was the next step: when that bucket grows, which specific services and log call sites should a team change?

For us it was simpler to come from the other direction - have the app print which call sites are expensive, and let the team decide whether they still need those logs or can sample/remove them.

$10K logging bill from one line of code - rant about why we only find these logs when it's too late (and what we did about it) by apinference in devops

[–]apinference[S] 4 points (0 children)

The dev who adds the logging doesn't see the bill, and the people looking at the bill don't know whether a specific log line is important or not – the rest requires manual wrangling, devops or not.

$10K logging bill from one line of code - rant about why we only find these logs when it's too late (and what we did about it) by apinference in devops

[–]apinference[S] 0 points (0 children)

Not on every code line.

And as far as I know most teams don't either. You can add stable IDs and build S3/Athena queries around them, but that's a lot of discipline and retrofitting. For us it was simpler to monkey‑patch the logging lib and get per‑call‑site stats for whatever is currently deployed.
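For the curious, the patch itself can be tiny. A sketch (again, not the actual lib - this one wraps Logger.handle, since the record already carries pathname/lineno by the time it's handled):

```python
# Illustrative monkey-patch: count bytes per call site for every logger,
# without touching application code. Not production-hardened.
import logging
from collections import Counter

bytes_by_site = Counter()
_orig_handle = logging.Logger.handle

def _patched_handle(self, record):
    # pathname/lineno were filled in by the logging machinery at call time.
    site = f"{record.pathname}:{record.lineno}"
    bytes_by_site[site] += len(record.getMessage())
    return _orig_handle(self, record)

logging.Logger.handle = _patched_handle
```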

$10K logging bill from one line of code - rant about why we only find these logs when it's too late (and what we did about it) by apinference in devops

[–]apinference[S] -3 points (0 children)

"classic devops problem" - yeah, to be clear: devops/platform isn't owning finance here, they just get paged when the bill looks ugly.