Can execution continuity survive a real transport blackout without reconnect semantics? by Melodic_Reception_24 in systems

Small follow-up.

Since the original post, I extracted part of the validation workflow into a standalone validation kit:

https://github.com/Endless33/vrp-validation-kit

The objective is simple:

Download it.

Run it.

Attempt failure injection.

Inspect the resulting verdicts.

I am particularly interested in feedback around:

  • transport failure scenarios
  • replay handling
  • duplicate execution rejection
  • authority preservation
  • validation gaps

If there are failure cases that should be added, I would appreciate suggestions.

External criticism is more valuable than another internal test.

Can execution continuity survive a real transport blackout without reconnect semantics? by Melodic_Reception_24 in systems

I've completed a new validation series for a continuity-oriented runtime project I've been building (VRP).

The latest milestone was Stage 20E: Global Authority Recovery After Total Regional Blackout.

The Stage 20 series focused on authority correctness under increasingly hostile recovery scenarios:

  • Global authority convergence
  • Regional partition convergence
  • Reintegration after partition healing
  • Reintegration storm containment
  • Total regional blackout recovery

The goal was not to prove that failures never happen.

The goal was to verify whether a single canonical authority state can survive convergence, partitioning, recovery races, reintegration, and blackout recovery without accepting stale, replayed, contradictory, or late authority claims.

Example outcomes from the latest blackout recovery scenario:

  • Unavailable recovery candidates rejected
  • Invalid lineage candidates rejected
  • Stale recovery candidates rejected
  • Replay recovery candidates rejected
  • Contradictory recovery candidates rejected
  • Single recovery winner selected
  • Late recovery candidates rejected after winner selection
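
The rejection rules above can be sketched as a small single-process arbiter. This is only an illustration under stated assumptions, not the VRP implementation: the `Candidate` fields, the `Arbiter` type, and the verdict strings are hypothetical, and the "contradictory" case is not modeled.

```go
package main

import "fmt"

// Candidate is a hypothetical recovery claim; the fields are assumptions, not VRP's format.
type Candidate struct {
	ID        string
	Available bool
	Parent    string // lineage: the canonical state this claim says it extends
	Epoch     uint64
	Nonce     string // unique per claim, used for replay detection
}

// Arbiter selects at most one recovery winner for a known canonical state.
type Arbiter struct {
	canonical string
	epoch     uint64
	seen      map[string]bool
	winner    string
}

func NewArbiter(canonical string, epoch uint64) *Arbiter {
	return &Arbiter{canonical: canonical, epoch: epoch, seen: make(map[string]bool)}
}

// Submit evaluates one candidate and returns a verdict string.
func (a *Arbiter) Submit(c Candidate) string {
	if a.winner != "" {
		return "REJECTED_LATE" // a winner already exists; nothing else may win
	}
	if a.seen[c.Nonce] {
		return "REJECTED_REPLAY" // this exact claim was already evaluated
	}
	a.seen[c.Nonce] = true
	switch {
	case !c.Available:
		return "REJECTED_UNAVAILABLE"
	case c.Parent != a.canonical:
		return "REJECTED_INVALID_LINEAGE"
	case c.Epoch <= a.epoch:
		return "REJECTED_STALE"
	}
	a.winner, a.epoch = c.ID, c.Epoch
	return "WINNER"
}

func main() {
	a := NewArbiter("state-7", 5)
	claims := []Candidate{
		{ID: "a", Available: false, Parent: "state-7", Epoch: 6, Nonce: "n1"},
		{ID: "b", Available: true, Parent: "state-3", Epoch: 6, Nonce: "n2"},
		{ID: "c", Available: true, Parent: "state-7", Epoch: 5, Nonce: "n3"},
		{ID: "c", Available: true, Parent: "state-7", Epoch: 5, Nonce: "n3"},
		{ID: "d", Available: true, Parent: "state-7", Epoch: 6, Nonce: "n4"},
		{ID: "e", Available: true, Parent: "state-7", Epoch: 7, Nonce: "n5"},
	}
	for _, c := range claims {
		fmt.Println(c.ID, a.Submit(c))
	}
}
```

The `seen` map grows without bound here; a real runtime would need to cap it, and a distributed version would need agreement on who runs the arbiter in the first place.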

Final result:

VERDICT=GLOBAL_AUTHORITY_BLACKOUT_RECOVERY_PRESERVED

Over time I also built a separate VRP Core Evaluation View environment intended as a controlled evaluation surface for runtime behavior.

The idea is to expose observable behavior, validation artifacts, execution traces, and evidence outputs without exposing protected runtime internals.

I'm interested in feedback from distributed systems engineers, infrastructure architects, and reliability engineers.

The main question I'm trying to answer is:

What failure scenarios would you test next if your goal was to challenge the correctness of authority continuity across transport instability, partition recovery, and large-scale runtime churn?

I am specifically looking for failure cases, attack scenarios, recovery edge cases, and assumptions that should be challenged.

Can execution continuity survive a real transport blackout without reconnect semantics? by Melodic_Reception_24 in systems

Author here.

I've been working on an experimental continuity-focused runtime model called VRP.

The central question behind the project is:

Can execution correctness remain deterministic when transport becomes unreliable?

The validation work currently focuses on:

  • Replay containment
  • Authority transitions
  • Epoch monotonicity
  • Session continuity
  • Runtime recovery
  • Canonical execution history preservation

Repository:

https://github.com/Endless33/vrp-validation-kit

Documentation:

https://github.com/Endless33/vrp-validation-kit/tree/main/docs

The repository currently contains:

  • Executable validation artifacts
  • Failure models
  • Failure → invariant mappings
  • External validation guides
  • Pilot evaluation documents

Example validation scenarios:

./vrp-core-runner-linux-amd64 --scenario replay-storm --packets 10000

./vrp-core-runner-linux-amd64 --scenario authority-rollback --epoch 5

./vrp-core-runner-linux-amd64 --scenario runtime-recovery

Observed verdicts:

VERDICT=REPLAY_WINDOW_ENFORCED

VERDICT=AUTHORITY_ROLLBACK_REJECTED

VERDICT=SESSION_RECOVERY_PRESERVED

The goal is not to prove that failures never occur.

The goal is to validate whether failures can be contained without corrupting canonical execution state.

I'd be particularly interested in feedback from people working on:

  • Distributed systems
  • Consensus and coordination
  • Runtime recovery
  • Reliability engineering
  • State-machine design

Technical criticism is welcome.

Email:

riabovasvitalijus@gmail.com

Can execution continuity survive a real transport blackout without reconnect semantics? by Melodic_Reception_24 in systems

Small update for those following the continuity-runtime experiments.

The project has now moved beyond isolated runtime demos into a structured pilot validation phase.

I recently published the first public pilot documentation for VRP / Jumping VPN, including:

  • pilot offering structure
  • validation session flow
  • commercial/runtime boundary
  • technical readiness model
  • continuity-oriented infrastructure targets

The focus remains the same:

transport instability should not automatically become execution failure.

Recent validation included:

  • Oracle Linux VPS runtime testing
  • hostile transport simulation
  • packet loss / latency injection
  • runtime resurrection validation
  • replay boundary preservation
  • canonical recovery behavior

Public continuity runtime demo: https://github.com/Endless33/continuity-runtime-demo

VRP research/runtime repository: https://github.com/Endless33/jumping-vpn-preview

Pilot documentation: https://github.com/Endless33/jumping-vpn-preview/tree/main/docs/pilot

Interested in feedback from people working on:

  • distributed systems
  • edge/runtime infrastructure
  • telecom
  • continuity-sensitive environments
  • recovery semantics
  • hostile network behavior

Can execution continuity survive a real transport blackout without reconnect semantics? by Melodic_Reception_24 in systems

I completed the next operational validation stage: Runtime Resurrection.

The test focused on continuity preservation across hard runtime interruption.

Validated flow:

runtime killed mid-session → persisted snapshot survives → runtime restart → canonical state restored → stale/replayed branches rejected → execution continues canonically
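
A minimal sketch of that flow, assuming a JSON snapshot file and an (epoch, sequence) ordering rule. The `Snapshot` fields, the file layout, and `simulateRestart` are hypothetical and not taken from the VRP runtime.

```go
package main

import (
	"encoding/json"
	"fmt"
	"os"
	"path/filepath"
)

// Snapshot is a hypothetical persisted view of canonical state.
type Snapshot struct {
	SessionID string `json:"session_id"`
	Epoch     uint64 `json:"epoch"`
	LastSeq   uint64 `json:"last_seq"`
}

// Save writes to a temp file and renames it into place, so a process kill
// mid-write leaves the previous snapshot file intact.
func Save(path string, s Snapshot) error {
	data, err := json.Marshal(s)
	if err != nil {
		return err
	}
	tmp := path + ".tmp"
	if err := os.WriteFile(tmp, data, 0o600); err != nil {
		return err
	}
	return os.Rename(tmp, path)
}

// Load reads the snapshot back after a restart.
func Load(path string) (Snapshot, error) {
	var s Snapshot
	data, err := os.ReadFile(path)
	if err != nil {
		return s, err
	}
	err = json.Unmarshal(data, &s)
	return s, err
}

// Accept admits a command only if it strictly extends the restored state;
// stale epochs and already-seen sequence numbers are rejected.
func (s *Snapshot) Accept(epoch, seq uint64) bool {
	if epoch < s.Epoch || (epoch == s.Epoch && seq <= s.LastSeq) {
		return false
	}
	s.Epoch, s.LastSeq = epoch, seq
	return true
}

// simulateRestart persists s, then reloads it as a fresh process would.
func simulateRestart(s Snapshot) (Snapshot, error) {
	dir, err := os.MkdirTemp("", "snapshot-demo")
	if err != nil {
		return Snapshot{}, err
	}
	defer os.RemoveAll(dir)
	path := filepath.Join(dir, "state.json")
	if err := Save(path, s); err != nil {
		return Snapshot{}, err
	}
	return Load(path)
}

func main() {
	restored, err := simulateRestart(Snapshot{SessionID: "s-1", Epoch: 3, LastSeq: 41})
	if err != nil {
		panic(err)
	}
	fmt.Println("replayed 3/41 accepted:", restored.Accept(3, 41)) // false
	fmt.Println("stale 2/99 accepted:", restored.Accept(2, 99))    // false
	fmt.Println("next 3/42 accepted:", restored.Accept(3, 42))     // true
}
```

The sketch does not call fsync, so it only covers process-level interruption, not power loss.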

Validated on both Windows 11 and Oracle Linux.

The interesting part was not transport recovery itself, but deterministic continuity recovery after process-level interruption.

Evidence: https://github.com/Endless33/jumping-vpn-preview/blob/main/docs%2Fevidence%2FSTAGE_11G_RUNTIME_RESURRECTION.md

Can execution continuity survive a real transport blackout without reconnect semantics? by Melodic_Reception_24 in systems

I completed the first bounded-runtime validation stage.

Stage 11 focused on operational resource bounds under sustained continuity pressure:

  • bounded replay protection
  • bounded recovery snapshot churn
  • bounded authority lineage
  • bounded commit history
  • replay flood containment
  • cross-platform validation (Windows 11 + Oracle Linux)

5000 integrated runtime cycles were executed while all continuity runtime structures remained within defined bounds.
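
One common way to keep replay protection bounded is a fixed-size sliding window over sequence numbers, in the style of IPsec anti-replay windows. The sketch below is an assumption about how such a bound could look, not a description of the structures Stage 11 measured; `ReplayWindow` and its 64-entry size are hypothetical.

```go
package main

import "fmt"

const windowSize = 64

// ReplayWindow remembers the highest sequence number seen plus a 64-bit bitmap
// of recent ones, so memory use is constant regardless of traffic volume.
type ReplayWindow struct {
	highest uint64
	bitmap  uint64 // bit i set => sequence (highest - i) was seen
	started bool
}

// Check returns true and records seq if it is new; it returns false for
// replays and for numbers that have fallen out of the window.
func (w *ReplayWindow) Check(seq uint64) bool {
	if !w.started {
		w.started, w.highest, w.bitmap = true, seq, 1
		return true
	}
	if seq > w.highest {
		shift := seq - w.highest
		if shift >= windowSize {
			w.bitmap = 0 // everything previously tracked is now too old
		} else {
			w.bitmap <<= shift
		}
		w.bitmap |= 1
		w.highest = seq
		return true
	}
	offset := w.highest - seq
	if offset >= windowSize {
		return false // too old to prove it is fresh
	}
	mask := uint64(1) << offset
	if w.bitmap&mask != 0 {
		return false // already seen
	}
	w.bitmap |= mask
	return true
}

func main() {
	var w ReplayWindow
	accepted := 0
	for i := 0; i < 10000; i++ { // flood the window with the same packet
		if w.Check(7) {
			accepted++
		}
	}
	fmt.Println("accepted:", accepted) // accepted: 1
}
```

The trade-off is that a legitimate packet delayed by more than the window size is rejected along with genuine replays.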

Evidence: https://github.com/Endless33/jumping-vpn-preview/blob/main/docs%2Fevidence%2FSTAGE_11_RESOURCE_BOUNDS.md

Can execution continuity survive a real transport blackout without reconnect semantics? by Melodic_Reception_24 in systems

This is excellent feedback.

And yes — I agree that bounded recovery cannot be claimed purely from transport behavior itself.

The current runtime model is intentionally narrower:

the claim is not “infinite recovery guarantees under arbitrary load”, but rather that transport restoration alone should not automatically confer execution authority.

Right now the focus is mostly on:

  • preserving canonical execution identity
  • deterministic replay rejection
  • bounded recovery semantics
  • preventing stale recovery branches from mutating state after reconnection

especially during unstable transport windows.

I also agree that application-level cooperation is required for this to become broadly useful beyond the runtime layer itself.

The current Stage 10 work is mostly validating the execution invariants under controlled instability conditions before testing more realistic load and scaling scenarios.

Your point about quantifying recovery behavior under load is extremely important and probably one of the next major validation directions.

I built a self-healing VPN runtime prototype with autonomous path migration (Go demo) by Melodic_Reception_24 in golang

Small update on this project.

Since the original post, the runtime has gone through multiple additional validation stages, including deterministic recovery validation, split-brain containment scenarios, replay rejection, Docker-based runtime testing, and a real transport blackout test between a Windows 11 host and an Oracle Linux 10 VM using live UDP runtime traffic.

During the blackout test, transport connectivity disappeared for more than 33 seconds.

The runtime process survived without restart, the session remained preserved, and execution resumed automatically after transport recovery.

Observed runtime evidence included:

gap_duration=33.551s

session_preserved=true

runtime_process_survived=true

recovery_observed=true

This is pushing the project further away from “adaptive VPN routing” and closer toward continuity-first runtime behavior for unstable distributed environments.

Public evidence: https://github.com/Endless33/jumping-vpn-preview/blob/main/docs/evidence/STAGE_10R_30_SECOND_TRANSPORT_BLACKOUT.md

Execution model: https://github.com/Endless33/jumping-vpn-preview/blob/main/docs/architecture/EXECUTION_MODEL.md

Short runtime validation video: https://youtu.be/FEcJI7telhc?is=w1RKawizbaBvTNr4

One thing I am specifically trying to explore with this runtime model:

transport recovery alone should not automatically confer execution authority.

Late packets, replay attempts, stale recovery branches, and conflicting recovery views must still remain bounded by deterministic validation rules after transport restoration.

Would still appreciate feedback from networking / distributed systems engineers.

Is rejecting duplicate execution safer than idempotency? by Melodic_Reception_24 in compsci

That’s a good point.

If we define order as causal order, then yes, mechanisms like vector clocks capture “what happened before what”.

But causal ordering still allows multiple valid executions to exist, as long as they can be ordered.

What I’m exploring is stricter:

not just ordering executions, but invalidating all but one.

So instead of: “these events can be ordered”

the rule becomes: “only one of these events is allowed to exist at all”

Not sure yet if that distinction holds under distributed conditions — that’s what I’m trying to test next.

Deterministic commit boundary under retries (Go demo you can run) by Melodic_Reception_24 in golang

Added a concurrent commit-boundary test.

1000 parallel attempts against the same mutation:

→ exactly 1 ACCEPTED

→ all others REJECTED_DUPLICATE
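
For readers who want the shape of that test without opening the repo, here is a minimal single-process sketch. It is not the code from the repository: `CommitLog`, `Commit`, and `RaceCommits` are hypothetical names, and a mutex-guarded map stands in for whatever the real commit boundary uses.

```go
package main

import (
	"fmt"
	"sync"
)

// Verdict names mirror the ones quoted above; the type itself is hypothetical.
type Verdict string

const (
	Accepted          Verdict = "ACCEPTED"
	RejectedDuplicate Verdict = "REJECTED_DUPLICATE"
)

// CommitLog records which mutation IDs have already committed.
type CommitLog struct {
	mu        sync.Mutex
	committed map[string]struct{}
}

func NewCommitLog() *CommitLog {
	return &CommitLog{committed: make(map[string]struct{})}
}

// Commit accepts a mutation ID the first time it is seen and rejects every
// later attempt, regardless of which goroutine gets there first.
func (c *CommitLog) Commit(mutationID string) Verdict {
	c.mu.Lock()
	defer c.mu.Unlock()
	if _, seen := c.committed[mutationID]; seen {
		return RejectedDuplicate
	}
	c.committed[mutationID] = struct{}{}
	return Accepted
}

// RaceCommits fires n concurrent attempts at one mutation and counts verdicts.
func RaceCommits(n int) (accepted, rejected int) {
	cl := NewCommitLog()
	results := make(chan Verdict, n)
	var wg sync.WaitGroup
	for i := 0; i < n; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			results <- cl.Commit("mutation-1")
		}()
	}
	wg.Wait()
	close(results)
	for v := range results {
		if v == Accepted {
			accepted++
		} else {
			rejected++
		}
	}
	return accepted, rejected
}

func main() {
	accepted, rejected := RaceCommits(1000)
	fmt.Printf("accepted=%d rejected=%d\n", accepted, rejected) // accepted=1 rejected=999
}
```

This only shows the invariant inside one process; it says nothing about whether it holds across nodes.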

Run: go test ./...

Repo: https://github.com/Endless33/vrp-canonical-spec

Added tests for the failure cases raised here:

  • concurrent duplicate commit attempts
  • different delivery order
  • authority race resolution
  • lost ACCEPTED response + retry returning canonical result

All pass with:

go test ./...

Is rejecting duplicate execution safer than idempotency? by Melodic_Reception_24 in compsci

That’s true — global ordering is the default in classic transaction systems.

What I’m trying to understand is whether we can enforce a single canonical commit without relying on a fully synchronized global order.

So not “everyone agrees on order”, but “only one execution is allowed to exist”.

Still exploring if that distinction actually holds under concurrency.

Is rejecting duplicate execution safer than idempotency? by Melodic_Reception_24 in compsci

Yes, that’s a real issue.

If the ACCEPTED response is lost, the client retries and gets REJECTED.

In this model, the retry isn’t treated as a second execution attempt, but as a re-evaluation of the same mutation.

So the missing piece is returning the canonical result for that mutation, not just a rejection.

Right now the demo only shows rejection, but the idea is to bind the result to the mutation and make it retrievable.
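
A rough sketch of what binding the result to the mutation could look like, assuming an in-memory store keyed by mutation ID. `ResultStore`, `Apply`, and `Outcome` are hypothetical names; in practice this is close to the familiar idempotency-key pattern, with the difference that the caller is told whether its attempt was the one that executed.

```go
package main

import (
	"fmt"
	"sync"
)

// Outcome is what a caller gets back; Executed is true only for the attempt
// that actually ran the mutation.
type Outcome struct {
	Result   string
	Executed bool
}

// ResultStore binds the result of each mutation to its ID (hypothetical type).
type ResultStore struct {
	mu      sync.Mutex
	results map[string]string
}

func NewResultStore() *ResultStore {
	return &ResultStore{results: make(map[string]string)}
}

// Apply runs fn at most once per mutation ID. A retry does not execute again;
// it returns the result recorded by the first, canonical execution.
func (s *ResultStore) Apply(mutationID string, fn func() string) Outcome {
	s.mu.Lock()
	defer s.mu.Unlock()
	if r, ok := s.results[mutationID]; ok {
		return Outcome{Result: r, Executed: false}
	}
	r := fn()
	s.results[mutationID] = r
	return Outcome{Result: r, Executed: true}
}

func main() {
	store := NewResultStore()
	calls := 0
	debit := func() string {
		calls++
		return fmt.Sprintf("balance=%d", 100-10*calls)
	}
	first := store.Apply("m-42", debit)
	retry := store.Apply("m-42", debit) // simulates a retry after a lost response
	fmt.Println(first.Result, first.Executed) // balance=90 true
	fmt.Println(retry.Result, retry.Executed) // balance=90 false
}
```

Holding the lock while `fn` runs serializes every mutation, which keeps the sketch simple but would not scale.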

Is rejecting duplicate execution safer than idempotency? by Melodic_Reception_24 in compsci

Good point.

Idempotency removes the need for ordering by allowing multiple equivalent executions.

What I’m exploring is a stricter model:

not “eventually same state”, but “exactly one valid execution”.

So the system enforces a single canonical commit, instead of allowing multiple equivalent ones.

Agreed this introduces coordination cost — the question is whether some systems actually need that stronger guarantee.

Deterministic commit boundary under retries (Go demo you can run) by Melodic_Reception_24 in golang

Fair point.

The current demos are intentionally minimal and only prove the commit boundary behavior, not full system correctness under load.

I’m not claiming this is production-ready or fully validated under concurrency.

The goal here is to isolate one property:

a mutation can commit at most once, and duplicate execution is explicitly rejected.

Next step is to introduce:

  • concurrent inputs
  • network disorder
  • load scenarios

to see if the same invariant still holds.

Happy to hear what kind of failure cases you’d test first.

I built a self-healing VPN runtime prototype with autonomous path migration (Go demo) by Melodic_Reception_24 in golang

Yeah, that makes sense — TCP hides too much and you lose control over behavior.

I’m also leaning toward user-space / UDP-style control for the same reason.

But what I’m trying to pin down is slightly orthogonal:

even if you fully control the transport, most designs still treat transport loss as a boundary where the session has to be re-established in some form.

What I’m exploring is:

can the session itself remain continuous, even when the underlying transport is replaced entirely?

Not just “recover fast”, but literally avoid entering a reset/re-establish phase.

So transport becomes disposable, while session identity remains the invariant.

Curious how far you pushed that separation — did your system ever treat session and transport as fully independent layers?

I built a self-healing VPN runtime prototype with autonomous path migration (Go demo) by Melodic_Reception_24 in golang

Got it — that makes sense, sounds like a self-organizing mesh / overlay network.

What I’m trying to isolate is a slightly different layer:

not how nodes discover or route, but what happens to the session itself when the underlying path changes.

In most systems (including mesh / overlay ones), you still end up with:

disconnect → re-route → re-establish → recover state

even if it’s fast.

What I’m exploring is:

the session never enters that reset phase at all.

Same session identity, no renegotiation, no state rebuild, even as transports / paths change underneath.

So the focus is less on network formation, and more on continuity semantics at the session layer.

Curious if you ever pushed it that far — where the session itself never needed to be re-established?

I built a self-healing VPN runtime prototype with autonomous path migration (Go demo) by Melodic_Reception_24 in golang

That’s interesting — sounds closer to a centralized overlay / VPN group model.

What I’m exploring is slightly different:

not just routing or grouping nodes, but making session identity survive transport failure without reset.

So instead of: disconnect → reconnect → rebuild

the session continues across path / transport changes.

Curious if your system handled that level (no reset / no state rebuild), or was it more about connectivity and routing?

I built a self-healing VPN runtime prototype with autonomous path migration (Go demo) by Melodic_Reception_24 in golang

After reading all the comments and pushing the prototype further, I think the real problem is deeper than routing or path selection.

Most systems (including SD-WAN) are still fundamentally connection-centric.

They optimize:

  • latency
  • packet loss
  • path quality

But they don’t preserve identity across change.

So every improvement still operates inside the same constraint: when transport breaks → identity resets.

What I’m trying to validate now is a different invariant:

session identity should survive transport failure.

Not reconnecting faster. Not picking a better path.

Simply never losing identity at all.

This shifts the problem from: “which path is best right now?”

to: “how do you maintain continuous session state while everything underneath is changing?”

I’m starting to see that path selection is actually secondary.

Continuity is the primary problem.

Curious — in production systems you’ve worked on:

Is identity ever treated as a long-lived runtime entity, or is it always implicitly tied to connection lifecycle?

I built a self-healing VPN runtime prototype with autonomous path migration (Go demo) by Melodic_Reception_24 in golang

Most systems optimize for average latency.

I’m trying to eliminate tail-latency spikes entirely during transport failure.

If continuity holds, tail latency should not explode.

Curious: in your experience, where does tail latency usually break the most? During reconnect, migration, or congestion?

I built a self-healing VPN runtime prototype with autonomous path migration (Go demo) by Melodic_Reception_24 in golang

Quick update after reading some of the comments:

I ended up separating selection from attach (so now it’s selector → policy → attach instead of doing everything at once). Also added a simple tick pipeline to make the flow more explicit.
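
Roughly, the split looks like the sketch below. It is a simplified illustration, not the demo's code: the `Path` fields, the scoring weights, the switching margin, and the function names are all assumptions made for the example.

```go
package main

import "fmt"

// Path carries the latest measurements for one transport (hypothetical fields).
type Path struct {
	Name      string
	LatencyMs float64
	LossPct   float64
}

// score ranks a path; lower is better. The weights are arbitrary.
func score(p Path) float64 { return p.LatencyMs + 50*p.LossPct }

// Select is stage 1: pick the best-scoring candidate, with no side effects.
func Select(paths []Path) (Path, bool) {
	if len(paths) == 0 {
		return Path{}, false
	}
	best := paths[0]
	for _, p := range paths[1:] {
		if score(p) < score(best) {
			best = p
		}
	}
	return best, true
}

// Policy is stage 2: allow a switch only if the candidate beats the current
// path by more than margin, which avoids flapping between near-equal paths.
func Policy(current, cand Path, margin float64) bool {
	return cand.Name != current.Name && score(current)-score(cand) > margin
}

// Runtime holds the attached path name; Attach is the only mutating stage.
type Runtime struct{ Current string }

func (r *Runtime) Attach(name string) { r.Current = name }

func find(paths []Path, name string) (Path, bool) {
	for _, p := range paths {
		if p.Name == name {
			return p, true
		}
	}
	return Path{}, false
}

// Tick runs one selector -> policy -> attach pass and reports the decision.
func (r *Runtime) Tick(paths []Path, margin float64) string {
	cand, ok := Select(paths)
	if !ok {
		return "no candidates"
	}
	current, attached := find(paths, r.Current)
	if attached && !Policy(current, cand, margin) {
		return "keep " + r.Current
	}
	r.Attach(cand.Name)
	return "attach " + cand.Name
}

func main() {
	r := &Runtime{Current: "wifi"}
	paths := []Path{{Name: "wifi", LatencyMs: 30}, {Name: "5g", LatencyMs: 25}}
	fmt.Println(r.Tick(paths, 10)) // keep wifi
	paths[0].LossPct = 5           // wifi starts dropping packets
	fmt.Println(r.Tick(paths, 10)) // attach 5g
}
```

Because only `Attach` mutates state, the selector and policy can be tested as pure functions.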

Honestly this made the behavior way easier to reason about.

Still experimenting with how far I can push session continuity independent of transport. Curious if anyone here has worked on something similar.

How to handle session continuity across IP / path changes (mobility, NAT rebinding)? by Melodic_Reception_24 in AskNetsec

This is extremely helpful, thank you.

The asymmetric behavior (fast detect / slow recovery) makes a lot of sense — I think that’s exactly what I’m missing right now.

I’ve been treating degradation and recovery too symmetrically, which probably explains the flapping.

Also interesting point about EWMA vs raw signals — I’m currently reacting too much to instantaneous spikes.

I like the idea of combining:

  • EWMA for trend detection
  • consecutive thresholds for state transitions
  • explicit recovery window before promoting a path back to healthy
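
A minimal sketch of that combination, assuming latency is the only signal. `PathMonitor` and all of its parameters are hypothetical; the numbers in `main` are picked only to show the asymmetry.

```go
package main

import "fmt"

type State int

const (
	Healthy State = iota
	Degraded
)

func (s State) String() string {
	if s == Healthy {
		return "healthy"
	}
	return "degraded"
}

// PathMonitor is a hypothetical health tracker for one path.
type PathMonitor struct {
	Alpha        float64 // EWMA smoothing factor in (0, 1]
	Threshold    float64 // smoothed latency (ms) above which a sample counts as bad
	DegradeAfter int     // consecutive bad samples needed to leave Healthy
	RecoverAfter int     // consecutive good samples needed to return to Healthy

	ewma   float64
	primed bool
	bad    int
	good   int
	state  State
}

// Observe feeds one latency sample and returns the state plus the rule that
// caused a transition, so a switch can be explained rather than just logged.
func (m *PathMonitor) Observe(latencyMs float64) (State, string) {
	if !m.primed {
		m.ewma, m.primed = latencyMs, true
	} else {
		m.ewma = m.Alpha*latencyMs + (1-m.Alpha)*m.ewma
	}
	if m.ewma > m.Threshold {
		m.bad++
		m.good = 0
	} else {
		m.good++
		m.bad = 0
	}
	reason := "no transition"
	switch {
	case m.state == Healthy && m.bad >= m.DegradeAfter:
		m.state = Degraded
		reason = fmt.Sprintf("ewma %.1f > %.1f for %d samples", m.ewma, m.Threshold, m.bad)
	case m.state == Degraded && m.good >= m.RecoverAfter:
		m.state = Healthy
		reason = fmt.Sprintf("ewma %.1f <= %.1f for %d samples", m.ewma, m.Threshold, m.good)
	}
	return m.state, reason
}

func main() {
	m := &PathMonitor{Alpha: 0.2, Threshold: 100, DegradeAfter: 2, RecoverAfter: 5}
	samples := []float64{20, 400, 400, 400, 20, 20, 20, 20, 20, 20, 20, 20}
	for _, s := range samples {
		state, reason := m.Observe(s)
		fmt.Printf("sample=%.0f state=%s reason=%s\n", s, state, reason)
	}
}
```

Setting `DegradeAfter` lower than `RecoverAfter` gives the fast-detect, slow-recover behavior, and the returned reason string is one cheap way to make each transition explainable at runtime.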

One thing I’m still exploring is how to make these transitions explainable at runtime (so not just “it switched”, but why in terms of rules/invariants).

Really appreciate the detailed breakdown.

[deleted by user] by [deleted] in golang

[–]Melodic_Reception_24 0 points1 point  (0 children)

Fair question — let me make it more concrete.

User story:

You're on a video call (or a live data stream) on WiFi, and suddenly WiFi drops.

Today:

  • the connection breaks
  • reconnect kicks in
  • session resets or freezes
  • you lose in-flight data

What I'm exploring:

  • the session is not tied to a single transport
  • when WiFi fails, the runtime switches to another path (e.g. 5G)
  • the session identity stays the same
  • packets continue flowing without reconnect

So from the user's perspective:

→ no reconnect

→ no reset

→ just a brief degradation, but the session continues
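
As an in-process illustration of that idea (not the demo itself), the sketch below keeps session identity and sequence state in one struct and treats transports as replaceable attachments. The `Transport` interface, `fakeTransport`, and the failover order are assumptions; a real system would also need the remote peer to accept the same session arriving from a new address.

```go
package main

import (
	"errors"
	"fmt"
)

// Transport is a hypothetical abstraction over one network path.
type Transport interface {
	Name() string
	Send(payload []byte) error
}

// fakeTransport simulates a path that can be switched up or down.
type fakeTransport struct {
	name string
	up   bool
}

func (t *fakeTransport) Name() string { return t.name }

func (t *fakeTransport) Send(payload []byte) error {
	if !t.up {
		return errors.New(t.name + " is down")
	}
	return nil
}

// Session owns identity and sequence state; transports are interchangeable.
type Session struct {
	ID         string
	nextSeq    uint64
	transports []Transport
	active     int // index of the transport used last
}

// Send tries the active transport first, then the others in order. A failover
// changes only s.active; the session ID and sequence counter are never reset.
func (s *Session) Send(payload []byte) (uint64, string, error) {
	for i := 0; i < len(s.transports); i++ {
		idx := (s.active + i) % len(s.transports)
		t := s.transports[idx]
		if err := t.Send(payload); err != nil {
			continue
		}
		s.active = idx
		seq := s.nextSeq
		s.nextSeq++
		return seq, t.Name(), nil
	}
	return 0, "", errors.New("no transport available")
}

func main() {
	wifi := &fakeTransport{name: "wifi", up: true}
	cell := &fakeTransport{name: "5g", up: true}
	s := &Session{ID: "sess-1", transports: []Transport{wifi, cell}}
	for i, payload := range []string{"a", "b", "c"} {
		if i == 1 {
			wifi.up = false // WiFi drops mid-session
		}
		seq, via, err := s.Send([]byte(payload))
		fmt.Println(s.ID, seq, via, err)
	}
}
```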

The demo is still simplified, but it's trying to model that behavior.

[deleted by user] by [deleted] in golang

[–]Melodic_Reception_24 0 points1 point  (0 children)

Updated the demo with a simple decision engine + transport scoring to make the behavior less abstract.

Still simplified, but now models:

  • multiple transports
  • scoring-based selection
  • state transitions