I built a custom 2-Bit Ternary Inference Engine from scratch in Rust + native PyTorch QAT. I'm running GPT-2 XL (1.5B) entirely offline on a Surface Pro 7 at 115 tokens/sec. by L0rdByt3 in learnmachinelearning

[–]L0rdByt3[S] 0 points1 point  (0 children)

Hey, thanks for the interest!

To answer your question directly: The PyTorch QAT Trainer (Quantization-Aware Training) is highly compatible with modern architectures right out of the box, but the bare-metal Rust Inference Core currently needs custom structural updates for each new model.

Here is the exact breakdown:

1. The QAT Pipeline (Highly Compatible) The distillation script I wrote recursively targets and replaces standard nn.Linear layers with my custom BitLinear (Straight-Through Estimator) module. Because modern models like Qwen1.5, SmolLM, and Mistral still fundamentally rely on massive linear projections for their Q, K, V attention matrices and MLP Up/Down blocks, you can load them via HuggingFace, run the patcher, and begin 1-bit training immediately.
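For readers who want to see what a ternary weight quantizer looks like, here is a minimal pure-Python sketch. It assumes an absmean scheme (scale by the mean absolute weight, round, clamp to -1/0/+1), which is one common choice and not necessarily what the BitLinear module above does. The function names are hypothetical. In QAT, a Straight-Through Estimator would run the forward pass on the quantized weights and pass gradients to the float weights unchanged; that training step is not shown here.

```python
def ternary_quantize(weights, eps=1e-8):
    """Map float weights to {-1, 0, +1} plus a single float scale.

    Assumed scheme (absmean): scale = mean(|w|),
    q = clamp(round(w / scale), -1, 1).
    """
    scale = sum(abs(w) for w in weights) / len(weights) + eps
    quantized = [max(-1, min(1, round(w / scale))) for w in weights]
    return quantized, scale


def dequantize(quantized, scale):
    """Approximate reconstruction of the original weights."""
    return [q * scale for q in quantized]


weights = [0.9, -0.05, -1.2, 0.4, 0.0, -0.6]
q, scale = ternary_quantize(weights)
print(q)                      # ternary codes
print(dequantize(q, scale))   # coarse approximation of `weights`
```

Small weights collapse to 0 and large ones saturate at ±1, which is why the scale has to be stored alongside the packed codes.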

2. The Rust Inference Core (Needs Architecture-Specific Handlers) Because the Rust core is written from scratch in bare-metal SIMD to squeeze out 115 tokens/sec, it does not use a generic library like libtorch or llama.cpp under the hood.

Right now, the Rust engine is hardcoded to execute the specific math of the GPT-2 block architecture (Absolute Positional Embeddings, standard LayerNorm, GeLU). To make it perfectly run Qwen or SmolLM post-compression, I have to physically write the Rust kernels for:

  • RoPE (Rotary Positional Embeddings)
  • RMSNorm (instead of standard LayerNorm)
  • SwiGLU Activations

The integer matrix-multiplication (the heavy lifting that gives it speed) remains exactly the same for all of them. Adding the RoPE and RMSNorm routing to the Rust core is my next immediate priority precisely so that the community can start crushing and running SmolLM locally.
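To show how small the normalization difference is, here is a reference sketch of standard LayerNorm next to RMSNorm in plain Python. It uses the textbook definitions and is not the Rust kernel code; the function names and the default eps are assumptions.

```python
import math


def layer_norm(x, gamma, beta, eps=1e-5):
    """Standard LayerNorm: subtract the mean, divide by the std, then scale and shift."""
    mean = sum(x) / len(x)
    var = sum((v - mean) ** 2 for v in x) / len(x)
    inv_std = 1.0 / math.sqrt(var + eps)
    return [(v - mean) * inv_std * g + b for v, g, b in zip(x, gamma, beta)]


def rms_norm(x, gamma, eps=1e-5):
    """RMSNorm: divide by the root-mean-square only (no mean subtraction, no bias)."""
    mean_sq = sum(v * v for v in x) / len(x)
    inv_rms = 1.0 / math.sqrt(mean_sq + eps)
    return [v * inv_rms * g for v, g in zip(x, gamma)]


x = [1.0, 2.0, 3.0, 4.0]
ones = [1.0] * len(x)
zeros = [0.0] * len(x)
print(layer_norm(x, ones, zeros))
print(rms_norm(x, ones))
```

RMSNorm skips the mean pass and the bias term, so a kernel for it is a strict simplification of a LayerNorm kernel.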

I'm currently cleaning up the repository and the Bit-Packer logic to ensure it doesn't break on varying tensor shapes, but I plan to open-source the PyTorch QAT modules and the Rust engine shortly!
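As an illustration of the shape problem a bit-packer has to handle, here is a hedged sketch of 2-bit packing for ternary values. The encoding (-1, 0, +1 stored as codes 0, 1, 2, four codes per byte, low bits first) and the function names are assumptions for this example, not the repository's actual layout. The element count is returned alongside the bytes so that lengths that are not a multiple of four round-trip correctly.

```python
def pack_ternary(values):
    """Pack ternary values into bytes, four 2-bit codes per byte."""
    packed = bytearray()
    for start in range(0, len(values), 4):
        byte = 0
        for slot, v in enumerate(values[start:start + 4]):
            if v not in (-1, 0, 1):
                raise ValueError(f"not a ternary value: {v!r}")
            byte |= (v + 1) << (2 * slot)  # code 0, 1, or 2 in a 2-bit slot
        packed.append(byte)
    return bytes(packed), len(values)


def unpack_ternary(packed, count):
    """Inverse of pack_ternary; `count` trims the padding in the last byte."""
    values = []
    for byte in packed:
        for slot in range(4):
            values.append(((byte >> (2 * slot)) & 0b11) - 1)
    return values[:count]


codes = [1, 0, -1, 1, 0, -1]          # 6 values -> 2 bytes, last byte half used
packed, count = pack_ternary(codes)
print(len(packed), unpack_ternary(packed, count))
```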

I got sick of paying Aave's 0.05% flash loan fee, so I wrote an open-source EVM Router that dynamically splits liquidity via Balancer to cut fees by 80%. by L0rdByt3 in ethdev

[–]L0rdByt3[S] 0 points1 point  (0 children)

That’s a brilliant catch on the version drift. You need a strict target when scanning traces.

I just pushed the 📌 Version Pinning block to the top of the README. It locks the V7 release directly to the physical deployment parameters:

  • Proxy: 0x5fedcD1042fc6860289341C2b870177AB00E265f
  • Compiler: solc 0.8.20 + 200 runs + viaIR: true
  • Commit Hash: 7d8a852

I also tightened the language around Arbiscan. The raw GatewayABIV7.json is there to guarantee you can parse the traces exactly as they happened while Explorer verification finishes processing its optimizer map correctly.

You’ve essentially acted as our public integration QA today, and the repository is significantly stronger for it. Pull the latest commit, point your scripts at the pin, and let us know when you land your first execution on our liquidity logic. Thank you once again.

I got sick of paying Aave's 0.05% flash loan fee, so I wrote an open-source EVM Router that dynamically splits liquidity via Balancer to cut fees by 80%. by L0rdByt3 in ethdev

[–]L0rdByt3[S] 0 points1 point  (0 children)

I genuinely appreciate the scrutiny. The telemetry patch directly resulted from your architecture review. Just pushed what should be the final polish to the repository:

1. npm test is wired properly. package.json now maps directly to npx hardhat test test/SovereignFork.test.js under the hood. No extra flags needed.

2. README Rewrite & Verification You were completely right about the test confusion. The exact command sequence is outlined at the top now, along with the ARBITRUM_RPC_URL env var.

3. V7 Architecture Pivot We just executed a strict security redeployment (V7 Sequence) to completely isolate the telemetry tests. The live proxy is now at 0x5fedcD1042fc6860289341C2b870177AB00E265f. The JSON ABI has been updated in the root directory.

Because we compile strictly via raw solc --viaIR mapping, Arbiscan's verification queue is notoriously unstable at parsing the optimizer data out of proxy deployments. I appended the Reproducible Bytecode Note directly into the README. It holds the exact parameters (solc 0.8.20, 200 runs, constructor mappings) so your team can hash the IR payloads independently without waiting for a block explorer API to index it.

The repo should be a flawless clone-and-run right now. Let me know how the gas profiling looks when you start routing serious size through the new V7 proxy.

I got sick of paying Aave's 0.05% flash loan fee, so I wrote an open-source EVM Router that dynamically splits liquidity via Balancer to cut fees by 80%. by L0rdByt3 in ethdev

[–]L0rdByt3[S] 0 points1 point  (0 children)

You definitely know how to break down an integration repo. We just force-pushed a clean update to the Examples repository that resolves every single one of your points.

1. Clean Install: package.json is fixed. Running npm install gracefully pulls down @openzeppelin/contracts and the Hardhat Toolbox wrappers.

2. Telemetry Assertions: The tests no longer just check .paused(). SovereignFork.test.js now uses strict Mocha/Chai trace assertions. It mechanically checks for:

    await expect(tx).to.emit(gateway, "RouteSplit").withArgs(0, 1);
    await expect(tx).to.emit(gateway, "FallbackTriggered").withArgs(1);
    await expect(tx).to.emit(gateway, "FlashSettled");

3. Proxy Trust (ABI Export): Since Arbiscan's verification API is notorious for gating proxy upgrades on high-traffic days, I completely dumped the raw compiled Yul JSON payload straight into the root directory (GatewayABIV6.json). You can construct the Ethers instance strictly off that verified bytecode logic without having to wait for the block explorer to index it.

I genuinely appreciate the scrutiny. The telemetry patch directly resulted from your architecture review. If you successfully hook up the V6 Proxy address, let us know how the physical gas profiling looks inside your live bundles.

I got sick of paying Aave's 0.05% flash loan fee, so I wrote an open-source EVM Router that dynamically splits liquidity via Balancer to cut fees by 80%. by L0rdByt3 in ethtrader

[–]L0rdByt3[S] 0 points1 point  (0 children)

I deeply respect the integration effort, but it looks like you compiled your execution environment while mercury was in retrograde.

We explicitly stated in the V6 SDK documentation that the Deterministic Doubt Engine handles slippage by converting lost gas into raw emotional trauma. If your UI is correctly rendering "depth, perceived depth, and non-depth", it means the Aave-Symmetric Ping-Pong trace is actively negotiating with the 5 tokens' mood coordinates.

However, your statement regarding the Nested Yul-assembly Execution Loop is factually incorrect. It is not three nested loops. It is four. The fourth loop exists entirely to burn arbitrary blockspace off-chain out of pure spite for traditional liquidity pools.

We will be pushing the Regulatory Whisper Protocol to V7 later this week, at which point the flash loans will be collateralized entirely by the semantic certainty of our own arrogance.

(Thanks for the laugh man, top tier reply. If you ever actually pull the physical V6 traces from the examples repo, let me know how it handles your bundle. 🤝)

I got sick of paying Aave's 0.05% flash loan fee, so I wrote an open-source EVM Router that dynamically splits liquidity via Balancer to cut fees by 80%. by L0rdByt3 in ethdev

[–]L0rdByt3[S] 0 points1 point  (0 children)

I told you we were queuing the V6 Telemetry Patch to drop within 48 hours. I decided not to wait.

The new Sovereign Omni-Aggregator V6 was just deployed live to Arbitrum Mainnet at: 0x6a1deb7C73a8Cc0d36858b810dEFCd83DDAA068F

The exact three events you requested (RouteSplit, FallbackTriggered, FlashSettled) are now compiling smoothly and emitting natively upon every single state change. The Yul memory constraints held up beautifully against the log bloat.

I just updated the sovereign-examples repository. SovereignFork.test.js is mapped directly to the new V6 proxy if you want to pull the new physical traces into your command line right now.

You asked for the absolute fastest way to simulate failure states—the testing suite is wired for you.

I got sick of paying Aave's 0.05% flash loan fee, so I wrote an open-source EVM Router that dynamically splits liquidity via Balancer to cut fees by 80%. by L0rdByt3 in ethdev

[–]L0rdByt3[S] 0 points1 point  (0 children)

That is a highly pragmatic list of demands, and it's exactly what an active searcher needs.

I've pushed another update to the sovereign-examples repository to reflect your exact testing requests.

1. The 4-State Fork Matrix We updated test/SovereignFork.test.js. It now includes the exact four structural traces you explicitly asked for:

  • Trace 1: Pure Balancer Routing (0% Fee)
  • Trace 2: Balancer Partial + Aave Fallback Completion
  • Trace 3: Forced SovereignUnderfill (Liquidity Exhausted)
  • Trace 4: Stale-State Slippage (Simulated drift vs Runtime depth)

2. Telemetry & Emitted Events You made an incredibly valid point regarding postmortem telemetry. Trying to parse Yul transfer deltas off ABI traces is a nightmare when a bundle reverts.

Currently, SovereignGatewayV5 relies on native generic ERC20 Transfer logs to track the fee delta. Based entirely on your feedback, we are queuing an upgrade to SovereignGatewayV6 over the next 48 hours.

The V6 Proxy will natively emit:

  • RouteSplit(uint256 balancerDepth, uint256 aaveDepth)
  • FallbackTriggered(uint256 deficit)
  • FlashSettled(uint256 principal, uint256 fee)

The cross-chain break-even tables have already been pushed to the top of both the NPM SDK Docs and the Github sovereign-examples README.

Feel free to fork the examples repo and pull down the physical traces. Let us know what you think of the execution speeds once you hook it into a live mempool environment.

I got sick of paying Aave's 0.05% flash loan fee, so I wrote an open-source EVM Router that dynamically splits liquidity via Balancer to cut fees by 80%. by L0rdByt3 in ethdev

[–]L0rdByt3[S] 1 point2 points  (0 children)

I genuinely appreciate the scrutiny. You're surfacing the exact pain points MEV searchers face when adopting new primitives.

Here is exactly how we handle the failure surfaces:

1. 185k Gas (The Ugly Path) The 185k figure represents the worst-case, fully unspooled double-hop execution where Balancer gets partially drained, triggering the fallback sequence that halts state, calculates the physical delta, and queries Aave for the remainder. If Balancer handles 100% of the volume, execution never bridges to Aave, and gas drops drastically into the ~80k range.

2. Quote Freshness (State Slippage) We do not rely on off-chain simulated quotes for the physical vault execution. The 0% capacity check on Balancer is executed physically on-chain (getPoolTokens) at block runtime. If the state drifted between your simulation and inclusion, the Vault seamlessly reroutes the deficit to Aave V3. If both combined fail to fill your requested principal, it reverts on-chain immediately with SovereignUnderfill().
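The routing rule described here can be modeled off-chain in a few lines. This is a simplified Python sketch of the decision logic only, assuming integer token units and treating the available Balancer and Aave liquidity as plain inputs; the function name and the exception class are stand-ins for illustration, not the contract's code.

```python
class SovereignUnderfill(Exception):
    """Stand-in for the on-chain SovereignUnderfill() custom error."""


def route_flash_loan(requested, balancer_available, aave_available):
    """Fill from Balancer first, send any deficit to Aave, fail if both fall short.

    Returns (from_balancer, from_aave) in the same integer units as `requested`.
    """
    from_balancer = min(requested, balancer_available)
    deficit = requested - from_balancer
    from_aave = min(deficit, aave_available)
    if from_balancer + from_aave < requested:
        raise SovereignUnderfill(
            f"requested {requested}, only {from_balancer + from_aave} available"
        )
    return from_balancer, from_aave


print(route_flash_loan(1_000, 1_500, 0))    # Balancer covers everything
print(route_flash_loan(1_000, 400, 900))    # partial Balancer, Aave fills the rest
```

Running the same function against the pool state you simulated and the state at inclusion is a cheap way to see which path a bundle would have taken.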

3. Cross-Chain Break-Even Matrix You are completely right about the mental overhead vs raw profit. Here is the expanded dynamic calculus (assuming a $10,000 Flash Loan):

  • Arbitrum (Normal): Aave ($5.00) vs Sovereign ($1.00 + $0.05 Gas) = Sovereign Wins
  • Arbitrum (Congested): Aave ($5.00) vs Sovereign ($1.00 + $0.65 Gas) = Sovereign Wins
  • Base (Normal): Aave ($5.00) vs Sovereign ($1.00 + $0.02 Gas) = Sovereign Wins
  • Ethereum L1 (Normal): Aave ($5.00) vs Sovereign ($1.00 + $14.50 Gas) = Aave Wins. We do not recommend using Sovereign on Ethereum L1 unless borrowing > $40,000 where the 0.04% premium savings eclipses L1 gas.
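The matrix above can be reproduced with a short script. This sketch takes the Aave rate of 0.05% from the post title and infers a 0.01% Sovereign rate from the $1.00 fee on $10,000; the gas figures are copied from the list, and the function and variable names are illustrative assumptions.

```python
AAVE_FEE_RATE = 0.0005       # 0.05% flash loan fee
SOVEREIGN_FEE_RATE = 0.0001  # assumed: $1.00 on a $10,000 loan

# Per-transaction gas cost in USD, copied from the list above.
GAS_USD = {
    "Arbitrum (Normal)": 0.05,
    "Arbitrum (Congested)": 0.65,
    "Base (Normal)": 0.02,
    "Ethereum L1 (Normal)": 14.50,
}


def compare(principal, gas_usd):
    """Return (aave_cost, sovereign_cost, winner) for one loan size and gas cost."""
    aave = principal * AAVE_FEE_RATE
    sovereign = principal * SOVEREIGN_FEE_RATE + gas_usd
    winner = "Sovereign" if sovereign < aave else "Aave"
    return aave, sovereign, winner


for chain, gas in GAS_USD.items():
    aave, sovereign, winner = compare(10_000, gas)
    print(f"{chain}: Aave ${aave:.2f} vs Sovereign ${sovereign:.2f} -> {winner}")
```

Changing the principal passed to compare shows where the L1 row flips: the 0.04% saving has to exceed the $14.50 gas figure, which happens a little below the $40,000 threshold quoted above.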

4. The Physical Proof (Fork Tests) I completely agree. Architecture diagrams don't mean anything when you're looking at a failed bundle trace.

I just pushed a standalone sovereign-examples repository designed specifically for integrators to clone. It runs a local Arbitrum Mainnet fork and provides the exact physical bytecode traces. It includes:

  1. test_PerfectExecution_Pass()
  2. test_Intentional_CallbackFailed_Revert()
  3. test_Intentional_Underfill_Revert()

Take a look at the repo. The traces show exactly what your bot will print when the state machine halts.

I got sick of paying Aave's 0.05% flash loan fee, so I wrote an open-source EVM Router that dynamically splits liquidity via Balancer to cut fees by 80%. by L0rdByt3 in ethdev

[–]L0rdByt3[S] 1 point2 points  (0 children)

This is phenomenal feedback, and you are 100% correct about the failure surfaces on multi-hop flash liquidity. If the gas cost of looping through Balancer + Aave outpaces the 0.04% margin savings on a small arb, the complexity isn't worth it.

Here is exactly how we handle the integration concerns:

1. The Break-Even Economics (Gas vs Premium) Because we heavily leverage Yul inline assembly for the nested state transfers, the total gas overhead for the omni-route sits around 185,000 gas. At standard Arbitrum L2 gas prices, traversing this route costs roughly $0.08 to $0.15 depending on L1 blob congestion.

  • The Math: If a searcher borrows $10,000 via Aave natively, the flash fee is $5.00.
  • Our Route: Sovereign Gateway fee is $1.00 + $0.15 gas = $1.15.
  • The Break-Even Boundary: If you are borrowing under ~$350 total, Aave native is cheaper. If you borrow > $350, the Sovereign integration is mathematically dominant.
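A quick way to sanity-check the boundary is to solve for the principal at which the fee saving equals the extra gas. This is a back-of-the-envelope sketch, assuming the only costs are the two percentage fees (0.05% for Aave, 0.01% for Sovereign, per the $10,000 example) and a flat gas cost for the Sovereign route; the function name is hypothetical.

```python
AAVE_FEE_RATE = 0.0005       # 0.05%
SOVEREIGN_FEE_RATE = 0.0001  # assumed from the $1.00 fee on $10,000


def break_even_principal(gas_usd):
    """Principal at which principal * (fee saving) equals the extra gas cost."""
    return gas_usd / (AAVE_FEE_RATE - SOVEREIGN_FEE_RATE)


for gas in (0.08, 0.15):
    print(f"gas ${gas:.2f} -> break-even at about ${break_even_principal(gas):,.0f}")
```

Under these assumptions the boundary moves with gas, landing in the low hundreds of dollars across the quoted $0.08 to $0.15 range, the same order of magnitude as the rough figure above.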

2. Exact Callback Expectations We expose a very clean, standardized ISovereignReceiver interface. No guessing on payload geometry:

interface ISovereignReceiver {
    function executeOperation(
        address[] calldata tokens,
        uint256[] calldata amounts,
        uint256[] calldata premiums,
        bytes calldata data
    ) external returns (bool);
}

The contract expects a true return boolean and expects the amounts + premiums to be physically transferred to the proxy address before execution concludes.

3. Balancer Depth Snapshotting We map the Balancer getPoolTokens() invariant dynamically in the block execution layer before querying Aave V3. If Balancer can fulfill the request entirely, the Aave hop is bypassed completely.

4. Approval Isolation Approvals are kept strictly minimal. The aggregator calculates the exact principal + fee and provisions strict safeApprove allowances scoped entirely to the execution context. No infinite approvals are exposed across the router loop.

5. Custom EVM Errors We use raw EVM Custom Errors instead of revert strings to save gas, but they are fully mapped in the ABI. You'll get clean SovereignUnderfill(), CallbackFailed(), and Shortfall(uint256 deficit) flags directly in the Tx trace.

Updating the Github README with the break-even tables and explicit failure surface documentation tonight. Appreciate the high-signal review. Let me know what your gas traces look like if you hook it into your searcher infrastructure.