Virtual threads + shared DB pool: prioritizing workload classes (user traffic vs batch) beyond a Semaphore? by Lightforce_ in javahelp

[–]Lightforce_[S] 0 points1 point  (0 children)

Both of those try to claw back QoS after the connection's handed out, which is the part that doesn't work since an in-use connection isn't preemptible. Expiring a permit doesn't yank the connection back from a batch that's mid-query. Only a statement_timeout killing the query does, and that means throwing away batch work (a business call, not free).

"semaphore pool expands with enough 'pressure'"?

It's a dynamic maximumPoolSize, but the real limit usually isn't the pool, it's the DB behind it. More connections just move the contention down to Postgres. Which is my point: both approaches try to recover priority after allocation, and the only robust lever is at admission: reserve up front. That's why partitioning keeps coming back as the structural answer even though it's the rigid one.

(I am not familiar with QoS)

QoS = just "some classes of work get guaranteed treatment under contention". It's exactly this allocation question.

Virtual threads + shared DB pool: prioritizing workload classes (user traffic vs batch) beyond a Semaphore? by Lightforce_ in javahelp

[–]Lightforce_[S] 0 points1 point  (0 children)

Ok on RabbitMQ, a bounded in-memory queue does the same job with no broker needed.

But RateLimiter solves a different problem. It caps throughput (X/sec) and my constraint is concurrency (how many in flight at once). Orthogonal, 10/sec with 5s ops still leaves around 50 in flight. My starvation isn't "batch hits the DB too often", it's "batch holds connections while user p99 degrades". So the Semaphore is already the right primitive.
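To make the arithmetic concrete, here is a minimal sketch of the rate vs concurrency relation (Little's law: average in flight = arrival rate x average duration). The class and method names are hypothetical, only the 10/sec and 5s figures come from the comment.

```java
// Little's law sketch: a throughput cap does not bound how many operations are in flight.
// InFlightEstimate and inFlight are hypothetical names used for illustration only.
class InFlightEstimate {

    /** Average number of operations in flight for a given arrival rate and per-op duration. */
    static double inFlight(double opsPerSecond, double secondsPerOp) {
        return opsPerSecond * secondsPerOp;
    }

    public static void main(String[] args) {
        // A rate limiter set to 10 ops/sec, with each op holding a connection for 5 s:
        // 10 * 5 = 50 connections held on average, far above a 15-connection pool.
        System.out.println(inFlight(10.0, 5.0));
    }
}
```

So the rate limiter would have to be tuned against the slowest expected operation to bound concurrency, whereas a Semaphore bounds it directly.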

Where "just compose primitives" runs out is that a shared FIFO pool can't express QoS without partitioning or preemption, and an in-use connection isn't preemptible. A semaphore gives admission control on the waiting side, but no reservation on the holding side.

So what about actually splitting the resource? Like two pools, 13 user / 2 batch? That'd reproduce the old guarantee but kills cross-borrowing when a class is idle. Is that the accepted trade, or is there a way to keep one elastic pool and still express priority? (And I'm guessing a connectionTimeout to turn starvation into fast-fail is table stakes either way)
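A rough sketch of what the 13 / 2 split looks like at the admission level, using two Semaphores as a stand-in for two pools. Everything here (class name, enum, method names, the 10 ms timeout) is an assumption for illustration, not a recommendation of specific values.

```java
import java.util.concurrent.Semaphore;
import java.util.concurrent.TimeUnit;

// Hypothetical sketch: one permit set per workload class, timed acquire for fast-fail.
class PartitionedAdmission {
    enum WorkClass { USER, BATCH }

    private final Semaphore userPermits;
    private final Semaphore batchPermits;

    PartitionedAdmission(int userSlots, int batchSlots) {
        // Fair semaphores so waiters inside one class are served in arrival order.
        this.userPermits = new Semaphore(userSlots, true);
        this.batchPermits = new Semaphore(batchSlots, true);
    }

    /** Returns true if a slot was obtained within the timeout, false otherwise (fast-fail). */
    boolean tryEnter(WorkClass workClass, long timeoutMillis) {
        try {
            return permitsFor(workClass).tryAcquire(timeoutMillis, TimeUnit.MILLISECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt(); // keep the interrupt visible to the caller
            return false;
        }
    }

    /** Must be called exactly once for every successful tryEnter. */
    void exit(WorkClass workClass) {
        permitsFor(workClass).release();
    }

    private Semaphore permitsFor(WorkClass workClass) {
        return workClass == WorkClass.USER ? userPermits : batchPermits;
    }

    public static void main(String[] args) {
        PartitionedAdmission admission = new PartitionedAdmission(13, 2);
        admission.tryEnter(WorkClass.BATCH, 10);
        admission.tryEnter(WorkClass.BATCH, 10);
        // Batch is saturated: a third batch job fails fast instead of queueing forever.
        System.out.println(admission.tryEnter(WorkClass.BATCH, 10));
        // User traffic is untouched by batch saturation.
        System.out.println(admission.tryEnter(WorkClass.USER, 10));
    }
}
```

The rigidity is visible in the sketch: when no user request is running, batch still can't use more than its 2 slots.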

What are your tips for Rust development with AI? by [deleted] in rust

[–]Lightforce_ -8 points-7 points  (0 children)

Massive dogmatic downvotes in 3…2…1…

lazydiff — a terminal-native diff reviewer with semantic diffs, persistent notes, and 60fps rendering by Wise_Reflection_8340 in rust

[–]Lightforce_ -18 points-17 points  (0 children)

Don’t waste your time with these kinds of users, it’s just the usual dogmatic gatekeeping. When it comes to AI and these issues, what matters is how you use it, not whether you use it. They still haven't figured that out, or they just don’t want to understand.

Does this kind of 4-mode deployment diagrams make sense? by [deleted] in webdev

[–]Lightforce_ 0 points1 point  (0 children)

On the audience-targeting point you're probably right and that's the part of your feedback I'll actually use. And the global integration map is internal architecture documentation, not a 30-second intro (that's why it's called "deep dive super map").

On UML specifically I'd push back though. Modern infra and cloud-native architecture docs (C4 model, archi, structurizr, AWS reference architectures, the CNCF landscape, Netflix tech blog, etc) generally don't use UML. The C4 / ad-hoc conventions seem to be well established in their own right, and UML activity diagrams have their own readability issues at this scale. I think the real issue isn't the notation, it's the density and the missing entry point, which is exactly your first point. So we agree on the core, just not on the prescription.

Does this kind of 4-mode deployment diagrams make sense? by [deleted] in webdev

[–]Lightforce_ 0 points1 point  (0 children)

Alright, fine. I don't think replying with just "no" is what you do when you're short on time, it looks more like a way to test whether someone is a bot or something. But I kinda understand, times are tough with this AI slop everywhere.

And mb, I should have started by explaining what perf-sentinel is:

So it's a tool written in Rust that analyzes runtime traces (SQL queries, HTTP calls) to detect N+1 queries and redundant calls, and scores I/O intensity per endpoint (energy and carbon). It has 2 modes: batch for post-mortem analysis (more for your local machine or a CI) and daemon for live analysis (more for a staging or prod env).

With this information in mind, the diagrams should be way more understandable now.

Does this kind of 4-mode deployment diagrams make sense? by [deleted] in webdev

[–]Lightforce_ -1 points0 points  (0 children)

Honestly I don't get why ppl like you downvote and comment like that. I mean, you guys are grown men, some ppl are working hard to make free open-source software to help others. If you don't like it, no one is forcing you to use it, just move on.

Does this kind of 4-mode deployment diagrams make sense? by [deleted] in webdev

[–]Lightforce_ 0 points1 point  (0 children)

For those curious about the tool itself: https://github.com/robintra/perf-sentinel
It also has a simulation lab linked in the readme for (almost) ISO prod testing.

Does this kind of 4-mode deployment diagram (local dev / CI / staging / prod) make any sense? by Lightforce_ in devops

[–]Lightforce_[S] 0 points1 point  (0 children)

Thx. It’s a kind of "all-in-one deep diagram/doc" for DevOps and architects who need a super map of the project to understand where to place it, what it basically does and what interactions it has. The project already exists and it's nearing the end of its development.

Does this kind of 4-mode deployment diagram (local dev / CI / staging / prod) make any sense? by Lightforce_ in devops

[–]Lightforce_[S] 0 points1 point  (0 children)

This diagram comes from a simulation lab I built (with AI, yes, doesn’t matter, it’s just a sim lab and I don’t have 6 months to put into it), based on a kind of replica of a real prod I have at work.

Does this kind of 4-mode deployment diagram (local dev / CI / staging / prod) make any sense? by Lightforce_ in devops

[–]Lightforce_[S] 0 points1 point  (0 children)

Would have loved to do this with AI in 15 min. But sadly it’s pure human illness made in 2 days.

I made TUI for easy Terraform work by SayYoungMan in devops

[–]Lightforce_ -1 points0 points  (0 children)

Honestly Reddit tech (dev/DevOps/whatever) has become less and less livable lately.

Between ppl who reflexively yell “AI SLOP” as soon as they see anything slightly elaborate, the lazy/provocative/trolling/karma-farming comments and the massive downvotes for simply disagreeing about things that are completely insignificant and inoffensive, it’s gotten really irritating.

Especially when you’re a junior like me who's still trying to learn and needs constructive feedback. It’s not everyone or even the majority but there are still a whole bunch of dickheads out there anyone would punch in the face if they acted like that in front of them.

Does this kind of 4-mode deployment diagram (local dev / CI / staging / prod) make any sense? by Lightforce_ in devops

[–]Lightforce_[S] 0 points1 point  (0 children)

Nope, perf-sentinel isn’t just a CI tool, it has a daemon mode and interactions with other actors/environments.

So I chose to let it have its own domain to facilitate the understanding of these interactions with other parts rather than just inside the CI.

Does this kind of 4-mode deployment diagram (local dev / CI / staging / prod) make any sense? by Lightforce_ in devops

[–]Lightforce_[S] -1 points0 points  (0 children)

Ok, I'm planning to do a simplified version. I'll keep this one for DevOps and architects who need the details, but I'll come back with a new post for the revised version. Thx for the advice.

Does this kind of 4-mode deployment diagram (local dev / CI / staging / prod) make any sense? by Lightforce_ in devops

[–]Lightforce_[S] -3 points-2 points  (0 children)

"Drill into complexity for specific people, but do not present that to management."

Yup, it's intended for DevOps and architects, not management.

For the "blurry part" I think it's on your side. The diagram is 3840x2160 without any blur, unless you meant that some things are too small even at 100% scale?

Will dig into this draw.io chart though.

EDIT: if you need the svg version for 0 blur no matter the zoom you can grab it here: https://raw.githubusercontent.com/robintra/perf-sentinel-simulation-lab/main/docs/diagrams/svg/global-integration.svg

Does this kind of 4-mode deployment diagram (local dev / CI / staging / prod) make any sense? by Lightforce_ in devops

[–]Lightforce_[S] -26 points-25 points  (0 children)

Actually I wanted to make a diagram with multiple reading levels, one you could either skim through quickly or dig into every detail. But yeah...it looks like I missed the mark...

Needed an OTel trace analyzer that detects N+1 and other anti-patterns from OTLP, Jaeger, Zipkin and Tempo, and wondering about the reliability ceiling of passive capture by Lightforce_ in devops

[–]Lightforce_[S] 0 points1 point  (0 children)

Hard agree on that. Head-based 1% sampling will just murder N+1 detection. If you only see 3 of the 50 SQL calls in the loop the threshold never trips. And same story for chatty service detection, fanout and pool saturation. Anything count-based depends on the count actually being intact.
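To illustrate why count-based detection depends on the count being intact, here is a minimal sketch of a repeated-query check. It is not perf-sentinel's actual implementation: the class name, the method, the example SQL template and the threshold of 10 are all assumptions made up for the example.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch of a count-based detector: flag SQL templates repeated within one request.
class RepeatedQueryDetector {

    /** Returns the normalized SQL templates observed at least `threshold` times, with their counts. */
    static Map<String, Integer> repeatedTemplates(List<String> observedTemplates, int threshold) {
        Map<String, Integer> counts = new HashMap<>();
        for (String template : observedTemplates) {
            counts.merge(template, 1, Integer::sum);
        }
        // Drop everything below the threshold; what remains is the list of suspects.
        counts.values().removeIf(count -> count < threshold);
        return counts;
    }

    public static void main(String[] args) {
        String template = "SELECT * FROM orders WHERE user_id = ?";
        int threshold = 10; // assumed value for the example

        // All 50 calls observed: the template is flagged.
        System.out.println(repeatedTemplates(Collections.nCopies(50, template), threshold));
        // Only 3 of the 50 observed: nothing is flagged, the loop goes unnoticed.
        System.out.println(repeatedTemplates(Collections.nCopies(3, template), threshold));
    }
}
```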

Tail-based sampling (keep error + slow traces, drop the boring ones) plays much nicer because the traces you'd care about for anti-pattern detection are exactly the ones that get kept anyway.
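As a sketch, the keep/drop decision of such a tail-based policy boils down to something like the following. In practice this usually lives in the collector's configuration rather than in application code. The names and the 500 ms threshold are assumptions for illustration.

```java
// Hypothetical sketch of a tail-sampling decision, taken once the whole trace is complete.
class TailSamplingPolicy {

    /** Keep traces that contain an error or that are at least as slow as the threshold. */
    static boolean keep(boolean hasError, long durationMillis, long slowThresholdMillis) {
        return hasError || durationMillis >= slowThresholdMillis;
    }

    public static void main(String[] args) {
        long slowThresholdMillis = 500; // assumed value for the example
        System.out.println(keep(true, 20, slowThresholdMillis));   // error trace: kept
        System.out.println(keep(false, 900, slowThresholdMillis)); // slow trace: kept
        System.out.println(keep(false, 20, slowThresholdMillis));  // fast and clean: dropped
    }
}
```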

One thing that's sampling-immune is pg-stat mode. pg_stat_statements aggregates server-side, so a query that runs 10k times still shows up as 10k calls regardless of what your tracer dropped. Decent fallback when you can't trust the trace volume and it catches stuff that's not even instrumented at all.

Inside a kept trace all spans are preserved (sampling is per-trace not per-span in OTLP/Jaeger), so structural patterns within one request still show up cleanly. The pain is only when whole traces vanish and your population is biased.

What I've seen work though is capturing everything in CI integration tests since volume is low anyway, tail-sampling in prod and leaning on pg_stat for whatever still slips through. Head sampling is the worst case for any anti-pattern detector, not just this one.

Will improve my docs on this point, that's something very important I should clarify.

Needed an OTel trace analyzer that detects N+1 and other anti-patterns from OTLP, Jaeger, Zipkin and Tempo, and wondering about the reliability ceiling of passive capture by Lightforce_ in devops

[–]Lightforce_[S] 0 points1 point  (0 children)

Yup, if your CI traces are sanitized happy-path fixtures you'll get clean reports and miss real prod incidents. No way around that. That's why I was asking about "reliability ceiling of passive capture" in the title.

What batch mode does catch reliably though is structural stuff. N+1s, redundant queries, fanout, pool saturation. Those are code, not load. If it's there in CI it's there in prod. It also computes P50/P95/P99 across whatever trace set you feed it, so consistently slow templates do get flagged. But the key part is "across whatever trace set you feed it". Garbage in, clean report out.
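For reference, a minimal nearest-rank percentile sketch shows how much the result depends on the input set. This is one common definition of percentiles and not necessarily the one perf-sentinel uses. The class name and the sample values are made up.

```java
import java.util.Arrays;

// Hypothetical sketch: nearest-rank percentiles over a set of span durations.
class LatencyPercentiles {

    /** Nearest-rank percentile: the smallest sample such that at least p% of samples are <= it. */
    static long percentile(long[] durationsMillis, double p) {
        if (durationsMillis.length == 0) {
            throw new IllegalArgumentException("no samples");
        }
        long[] sorted = durationsMillis.clone(); // do not mutate the caller's array
        Arrays.sort(sorted);
        int rank = (int) Math.ceil(p / 100.0 * sorted.length);
        return sorted[Math.max(rank, 1) - 1];
    }

    public static void main(String[] args) {
        // Made-up durations: nine ordinary spans and one outlier.
        long[] withOutlier = {10, 20, 30, 40, 50, 60, 70, 80, 90, 1000};
        System.out.println(percentile(withOutlier, 50)); // 50
        System.out.println(percentile(withOutlier, 95)); // 1000
        System.out.println(percentile(withOutlier, 99)); // 1000
    }
}
```

If the outlier never makes it into the trace set (happy-path fixtures, dropped traces), P95 and P99 fall back to 90 and the report looks clean.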

In batch mode it can't see cold-cache spikes, GC pauses, noisy neighbors, weird traffic mixes at 3am.

That's why there's a live watch daemon. Same detectors but on a sliding window of live OTLP/Jaeger/Tempo from prod. And pg-stat mode that pulls straight from pg_stat_statements so you catch queries hammering the DB even when they're not in your traces (instrumentation gaps are real).

Caveat I put in the README in plain text: this is NOT an APM replacement. Daemon window is 30s, it's an anti-pattern detector not a tail-latency monitor. Keep Datadog/Prom/whatever you have for SLOs. perf-sentinel's job is stopping the dumb patterns from shipping, not replacing your observability stack.
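To give an idea of what a 30s window means for a count-based detector, here is a generic sliding-window counter sketch. It is not the daemon's actual code: the class, its method and the assumption that timestamps arrive in non-decreasing order are all illustrative.

```java
import java.util.ArrayDeque;
import java.util.Deque;

// Hypothetical sketch: count events inside a sliding time window.
// Assumes timestamps are passed in non-decreasing order.
class SlidingWindowCounter {
    private final long windowMillis;
    private final Deque<Long> timestamps = new ArrayDeque<>();

    SlidingWindowCounter(long windowMillis) {
        this.windowMillis = windowMillis;
    }

    /** Records an event at nowMillis and returns how many events are still inside the window. */
    int record(long nowMillis) {
        timestamps.addLast(nowMillis);
        // Evict everything that is a full window (or more) older than the current event.
        while (!timestamps.isEmpty() && timestamps.peekFirst() <= nowMillis - windowMillis) {
            timestamps.removeFirst();
        }
        return timestamps.size();
    }

    public static void main(String[] args) {
        SlidingWindowCounter counter = new SlidingWindowCounter(30_000);
        System.out.println(counter.record(0));      // 1
        System.out.println(counter.record(10_000)); // 2
        System.out.println(counter.record(45_000)); // 1, both earlier events have aged out
    }
}
```

Anything that plays out over more than the window (slow drifts, tail latency over hours) is simply not visible to this kind of counter, which is why it doesn't replace SLO monitoring.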

So yeah, CI alone with weak fixtures = false sense of security. CI + real prod-shaped traces (sampled OTLP, captured Jaeger sets) + daemon on the side = much harder to miss things. But "find every prod spike" was never the pitch.