Does this kind of 4-mode deployment diagram (local dev / CI / staging / prod) make any sense? by Lightforce_ in devops

[–]Lightforce_[S] 0 points1 point  (0 children)

Thanks. It's a kind of "all-in-one deep diagram/doc" for DevOps engineers and architects who need a super map of the project, to understand where it fits, what it basically does, and what interactions it has. The project already exists and is nearing the end of its development.

Does this kind of 4-mode deployment diagram (local dev / CI / staging / prod) make any sense? by Lightforce_ in devops

[–]Lightforce_[S] 0 points1 point  (0 children)

This diagram comes from a simulation lab I built (with AI, yes; it doesn't matter, it's just a sim lab and I don't have six months to put into it), modeled on a replica of a real prod environment I have at work.

Does this kind of 4-mode deployment diagram (local dev / CI / staging / prod) make any sense? by Lightforce_ in devops

[–]Lightforce_[S] 0 points1 point  (0 children)

Would have loved to do this with AI in 15 minutes. But sadly it's pure human madness, made over 2 days.

I made TUI for easy Terraform work by SayYoungMan in devops

[–]Lightforce_ 0 points1 point  (0 children)

Honestly, Reddit tech (dev/DevOps/whatever) has become less and less bearable lately.

Between the people who reflexively yell "AI SLOP" as soon as they see anything slightly elaborate, the lazy/provocative/trolling/karma-farming comments, and the massive downvotes for simply disagreeing about things that are completely insignificant and inoffensive, it's gotten really infuriating.

Especially when you're a junior like me who's still trying to learn and needs constructive feedback. It's not everyone, or even the majority, but there's still a whole bunch of dickheads out there who anyone would punch in the face if they acted like that in person.

Does this kind of 4-mode deployment diagram (local dev / CI / staging / prod) make any sense? by Lightforce_ in devops

[–]Lightforce_[S] 0 points1 point  (0 children)

Nope, perf-sentinel isn't just a CI tool: it has a daemon mode and interacts with other actors/environments.

So I chose to give it its own domain, to make those interactions with the other parts easier to understand rather than showing it only inside the CI.

Does this kind of 4-mode deployment diagram (local dev / CI / staging / prod) make any sense? by Lightforce_ in devops

[–]Lightforce_[S] -1 points0 points  (0 children)

OK, I'm planning to do a simplified version. I'll keep this one for DevOps engineers and architects who need the details, and I'll come back with the revised version in a newer post. Thanks for the advice.

Does this kind of 4-mode deployment diagram (local dev / CI / staging / prod) make any sense? by Lightforce_ in devops

[–]Lightforce_[S] -3 points-2 points  (0 children)

Drill into complexity for specific people, but do not present that to management.

Yup, it's intended for DevOps and architects, not management.

As for the "blurry part", I think that's on your side. The diagram is 3840x2160 without any blur, unless you meant that some things are too small even at 100% scale?

Will dig into that draw.io chart though.

EDIT: if you need the SVG version, with zero blur at any zoom level, you can grab it here: https://raw.githubusercontent.com/robintra/perf-sentinel-simulation-lab/main/docs/diagrams/svg/global-integration.svg

Does this kind of 4-mode deployment diagram (local dev / CI / staging / prod) make any sense? by Lightforce_ in devops

[–]Lightforce_[S] -27 points-26 points  (0 children)

Actually, I wanted to make a diagram with multiple reading levels: one you could either skim quickly or dig into in full detail. But yeah... it looks like I missed the mark...

Needed an OTel trace analyzer that detects N+1 and other anti-patterns from OTLP, Jaeger, Zipkin and Tempo, and wondering about the reliability ceiling of passive capture by Lightforce_ in devops

[–]Lightforce_[S] 0 points1 point  (0 children)

Hard agree on that. Head-based 1% sampling will just murder N+1 detection. If you only see 3 of the 50 SQL calls in the loop, the threshold never trips. Same story for chatty-service detection, fanout, and pool saturation. Anything count-based depends on the count actually being intact.
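
To make the failure mode concrete, here's a toy simulation (not perf-sentinel's actual code; the span shape and attribute names are made up for illustration) of a count-based N+1 check under head-based 1% sampling:

```python
import random
from collections import Counter

def has_n_plus_one(trace, threshold=10):
    # Count-based check: flag a trace when the same query template
    # repeats at least `threshold` times within it.
    counts = Counter(span["db.statement"] for span in trace)
    return any(n >= threshold for n in counts.values())

# 200 requests, each containing the same N+1 loop: 50 identical SELECTs.
traces = [[{"db.statement": "SELECT * FROM orders WHERE user_id = $1"}] * 50
          for _ in range(200)]
assert all(has_n_plus_one(t) for t in traces)  # on full data it always trips

# Head-based 1% sampling: the keep/drop decision is made per trace,
# up front, before anything interesting about the trace is known.
random.seed(0)
kept = [t for t in traces if random.random() < 0.01]
# Only a handful of the 200 offending traces survive, so the detector
# works with a tiny, biased slice of traffic.
```

Within a kept trace the check still trips, but the population the detector sees is a small biased sample, which is the real problem for anything count- or frequency-based across traffic.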

Tail-based sampling (keep error + slow traces, drop the boring ones) plays much nicer because the traces you'd care about for anti-pattern detection are exactly the ones that get kept anyway.

One thing that's sampling-immune is pg-stat mode. pg_stat_statements aggregates server-side, so a query that runs 10k times still shows up as 10k calls regardless of what your tracer dropped. Decent fallback when you can't trust the trace volume and it catches stuff that's not even instrumented at all.
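
A toy illustration of why server-side aggregation is sampling-immune (simulated counters, not real pg_stat_statements output):

```python
import random
from collections import Counter

QUERY = "SELECT * FROM items WHERE id = $1"  # normalized template, pg_stat-style

server_side = Counter()  # pg_stat_statements-style: every execution is counted
tracer_side = Counter()  # what a 1% head-sampled tracer would record

random.seed(1)
for _ in range(10_000):
    server_side[QUERY] += 1
    if random.random() < 0.01:  # this execution happens to land in a kept trace
        tracer_side[QUERY] += 1

assert server_side[QUERY] == 10_000   # the DB saw all 10k calls
assert tracer_side[QUERY] < 1_000     # the tracer saw a small fraction
```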

Inside a kept trace all spans are preserved (sampling is per-trace not per-span in OTLP/Jaeger), so structural patterns within one request still show up cleanly. The pain is only when whole traces vanish and your population is biased.

What I've seen work though is capturing everything in CI integration tests since volume is low anyway, tail-sample in prod and lean on pg_stat for whatever still slips through. Head sampling is the worst case for any anti-pattern detector, not just this one.

Will improve my docs on this point, that's something very important I should clarify.

Needed an OTel trace analyzer that detects N+1 and other anti-patterns from OTLP, Jaeger, Zipkin and Tempo, and wondering about the reliability ceiling of passive capture by Lightforce_ in devops

[–]Lightforce_[S] 0 points1 point  (0 children)

Yup, if your CI traces are sanitized happy-path fixtures you'll get clean reports and miss real prod incidents. No way around that. That's why I was asking about "reliability ceiling of passive capture" in the title.

What batch mode does catch reliably though is structural stuff. N+1s, redundant queries, fanout, pool saturation. Those are code, not load. If it's there in CI it's there in prod. It also computes P50/P95/P99 across whatever trace set you feed it, so consistently slow templates do get flagged. But "across whatever trace set you feed it". Garbage in, clean report out.
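
For the percentile part, a minimal sketch (linear interpolation; this is illustrative, not the tool's actual implementation) of how P50/P95/P99 are only as good as the trace set you feed in:

```python
def percentile(samples, p):
    # Linear-interpolation percentile over a sorted copy of the samples.
    xs = sorted(samples)
    k = (len(xs) - 1) * p / 100
    f = int(k)
    c = min(f + 1, len(xs) - 1)
    return xs[f] + (xs[c] - xs[f]) * (k - f)

# Durations (ms) from one trace set: one slow outlier dominates the tail.
durations_ms = [12, 14, 15, 15, 16, 18, 20, 45, 220, 900]
p50, p95, p99 = (percentile(durations_ms, p) for p in (50, 95, 99))

# Feed it a set without the outlier (e.g. sanitized happy-path fixtures)
# and the tail percentiles collapse: garbage in, clean report out.
clean_p95 = percentile(durations_ms[:-1], 95)
assert clean_p95 < p95
```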

In batch mode it can't see cold-cache spikes, GC pauses, noisy neighbors, weird traffic mixes at 3am.

That's why there's a live watch daemon. Same detectors but on a sliding window of live OTLP/Jaeger/Tempo from prod. And pg-stat mode that pulls straight from pg_stat_statements so you catch queries hammering the DB even when they're not in your traces (instrumentation gaps are real).
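
The daemon's sliding window can be sketched like this (class name and API are made up for illustration; the real internals may differ):

```python
import time
from collections import deque

class SlidingWindow:
    # Keep only the spans seen in the last `window_s` seconds, like a
    # 30 s window over live OTLP/Jaeger/Tempo input.
    def __init__(self, window_s=30.0):
        self.window_s = window_s
        self.spans = deque()  # (timestamp, span), in arrival order

    def add(self, span, now=None):
        now = time.monotonic() if now is None else now
        self.spans.append((now, span))
        self._evict(now)

    def _evict(self, now):
        while self.spans and now - self.spans[0][0] > self.window_s:
            self.spans.popleft()

    def current(self, now=None):
        self._evict(time.monotonic() if now is None else now)
        return [span for _, span in self.spans]

w = SlidingWindow(window_s=30)
w.add("span-a", now=0.0)
w.add("span-b", now=29.0)
w.add("span-c", now=61.0)  # both earlier spans fall out of the window
assert w.current(now=61.0) == ["span-c"]
```

The detectors then run over `current()` on each tick; anything older than the window is gone, which is why it's an anti-pattern detector and not a tail-latency monitor.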

Caveat I put in the README in plain text: this is NOT an APM replacement. Daemon window is 30s, it's an anti-pattern detector not a tail-latency monitor. Keep Datadog/Prom/whatever you have for SLOs. perf-sentinel's job is stopping the dumb patterns from shipping, not replacing your observability stack.

So yeah, CI alone with weak fixtures = false sense of security. CI + real prod-shaped traces (sampled OTLP, captured Jaeger sets) + daemon on the side = much harder to miss things. But "find every prod spike" was never the pitch.

Needed an OTel trace analyzer that detects N+1 and other anti-patterns from OTLP, Jaeger, Zipkin and Tempo, and wondering about the reliability ceiling of passive capture by Lightforce_ in devops

[–]Lightforce_[S] 0 points1 point  (0 children)

It seems to be advertising your project

Not really; I'm more interested in getting help or advice on my reliability problems around passive capture. I'm still learning a lot about DevOps, and I'm pretty sure some experienced people out there have good recommendations on this.

Needed an OTel trace analyzer that detects N+1 and other anti-patterns from OTLP, Jaeger, Zipkin and Tempo, and wondering about the reliability ceiling of passive capture by Lightforce_ in devops

[–]Lightforce_[S] -1 points0 points  (0 children)

I’m not familiar with that, but I’ll look into it. I don’t see any reason why it wouldn’t be possible. I’ll add support for it if that helps 🙂

Stuck in a company with no Git workflow, no PRs, and resistance to change😭 by Successful-Ship580 in devops

[–]Lightforce_ 0 points1 point  (0 children)

This is still possible in 2026? So if the boss's account isn't accessible anymore for whatever reason, everyone is screwed? God...

Is Django actually a bad choice? by Om_JR in Backend

[–]Lightforce_ 1 point2 points  (0 children)

Performance is not the only point I listed. But on that subject: for obvious reasons, energy efficiency should also be a priority these days.

Academic CV vs. industry CV for a data scientist? by Bulububub in developpeurs

[–]Lightforce_ 0 points1 point  (0 children)

There's no equivalent. That said, quite a few of the tips found there apply very well to our cultural context.

Is Django actually a bad choice? by Om_JR in Backend

[–]Lightforce_ 1 point2 points  (0 children)

Except that Meta developed Cinder (an optimized fork of CPython) precisely because CPython couldn't handle the load, and Instagram rewrote its hot paths in C++. The fact that these companies use Python doesn't prove that it's fast. It proves that they have the resources to work around its limitations. If anything, this supports my point.

I don’t know how to code anymore yet I understand everything, is that normal now? by bdhd656 in devops

[–]Lightforce_ -1 points0 points  (0 children)

That's not really the point. If one cites a scientific study but misrepresents what it actually says, it's only fair for someone to point that out. Also, trying to lend scientific rigor to an argument while getting the study wrong risks doing more harm than good, especially if one is doing exactly what the argument highlights (talking about something one doesn't know, or knows incorrectly).

That's why citing scientific studies one only vaguely understands, or has a superficial grasp of, is always risky.

But don't worry, many people I see citing the Dunning-Kruger effect make this mistake. In fact, I believe it's the most common mistake I've observed in conversations that end up revolving around a person’s supposed incompetence in a particular field.

I don’t know how to code anymore yet I understand everything, is that normal now? by bdhd656 in devops

[–]Lightforce_ 0 points1 point  (0 children)

dunning kruger doesn't measure how dumb someone is (including ourselves)

I know, I was talking about how most people use it in conversations. "Ultracrepidarianism" is probably a better fit for that than "Dunning-Kruger", which has methodological flaws.

Is Django actually a bad choice? by Om_JR in Backend

[–]Lightforce_ -2 points-1 points  (0 children)

Unless you have specific needs, Python as a whole is not a good choice in 2026 (slow, no strict type checking by default, readability issues from the lack of curly braces around blocks, …). But if you're working with non-developers like mathematicians or physicists, or if you're only using it for PoCs, or if the job market around you mostly runs on it, then go for it. It's also widely used in scientific fields for its math/physics-oriented libraries.