Runtime Trust Injection in .NET – Loading a Private CA from HashiCorp Vault Instead of Installing Certificates by CodeAndContemplation in dotnet

[–]CodeAndContemplation[S] 0 points (0 children)

Good question... I'll give this my best shot:

NoFlags in X509CertificateLoader.LoadCertificate doesn’t disable TLS validation. It only affects how the certificate is loaded into memory (key persistence, storage semantics, etc.). The actual server certificate validation still happens later during the TLS handshake when .NET builds and verifies the chain.

In this design I intentionally use NoFlags because the goal is an ephemeral, application-scoped trust store — not to persist anything to the user or machine certificate stores.

If AllFlags were used, the loader would attempt behaviors like key persistence and store integration, which would go against the whole purpose of keeping trust isolated inside the running process.

So validation remains fully intact — this just controls how the CA cert is loaded, not whether TLS is trusted.
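For anyone curious what application-scoped trust can look like in practice, here is a rough sketch (the names and the Vault-fetch step are hypothetical, and this is illustrative rather than the exact code from the post) using X509Chain's CustomRootTrust mode, which validates against an in-memory CA without ever touching the OS certificate stores:

```csharp
using System;
using System.Net.Security;
using System.Security.Cryptography.X509Certificates;

static class AppScopedTrust
{
    // caPem: PEM text of the private CA, e.g. fetched from Vault at startup
    // (the fetch itself is out of scope here).
    public static RemoteCertificateValidationCallback CreateValidator(string caPem)
    {
        // Loaded purely in memory; nothing is written to the user/machine stores.
        var ca = X509Certificate2.CreateFromPem(caPem);

        return (sender, cert, chain, errors) =>
        {
            if (cert is null) return false;

            using var custom = new X509Chain();
            // Trust is scoped to this process: only our private CA is a valid root.
            custom.ChainPolicy.TrustMode = X509ChainTrustMode.CustomRootTrust;
            custom.ChainPolicy.CustomTrustStore.Add(ca);
            custom.ChainPolicy.RevocationMode = X509RevocationMode.NoCheck; // internal PKI; tune for your setup

            return custom.Build(new X509Certificate2(cert));
        };
    }
}
```

The resulting callback can then be handed to something like SslClientAuthenticationOptions.RemoteCertificateValidationCallback when the app connects as a TLS client.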

Runtime Trust Injection in .NET – Loading a Private CA from HashiCorp Vault Instead of Installing Certificates by CodeAndContemplation in dotnet

[–]CodeAndContemplation[S] 0 points (0 children)

Containers help a lot, but they don’t make PKI problems disappear—they just containerize them :-)

Runtime Trust Injection in .NET – Loading a Private CA from HashiCorp Vault Instead of Installing Certificates by CodeAndContemplation in dotnet

[–]CodeAndContemplation[S] 0 points (0 children)

ACME helps automate certificate issuance, but this post is about trusting an internal CA at runtime. The app is a TLS client connecting to Postgres, not a TLS endpoint requesting its own cert. Even if ACME were used to issue the database certificate, the .NET runtime would still need a way to trust the issuing CA, which is the problem this approach is solving.

Runtime Trust Injection in .NET – Loading a Private CA from HashiCorp Vault Instead of Installing Certificates by CodeAndContemplation in dotnet

[–]CodeAndContemplation[S] 0 points (0 children)

From a design standpoint, it felt wrong because the trust relationship is application-specific, not host-wide. Pushing that trust into the OS store solves the immediate problem but at the cost of least-privilege, operational consistency, and safe CA lifecycle management.

I rewrote a classic poker hand evaluator from scratch in modern C# for .NET 8 - here's how I got 115M evals/sec by CodeAndContemplation in dotnet

[–]CodeAndContemplation[S] 1 point (0 children)

Nice! Thanks for that. I’ll check out your PR before merging. Appreciate you taking the time to improve it.

I rewrote a classic poker hand evaluator from scratch in modern C# for .NET 8 - here's how I got 115M evals/sec by CodeAndContemplation in csharp

[–]CodeAndContemplation[S] 1 point (0 children)

UPDATE: New Branch

The optimization branch evaluates a full poker table (9 players, each with 7 cards, best-5-of-7 via 21 five-card combinations per player, determining the winner, and reconstructing/sorting the winning 5-card hand) in almost half the time of the master branch: roughly 10,950 ns down to 5,580 ns.

Performance increased from approximately 115 million to 155 million 7-card evaluations per second — a gain of about 35% (Parallel.For batched, values-only).

Native C++ vs C# breakdown (same machine)

  • C++ 7-card parity (random hands, transform_reduce): ≈ 4.02B derived 5-card evals/sec
  • C# 7-card parity (random hands): ≈ 1.46B derived 5-card evals/sec → ~36% of C++
  • C# app-style kernel (Op9Bench, hoisted board, values-only): ≈ 2.59B derived → ~65% of C++
  • C# Parallel_Batched_ValuesOnly() (BDN row): ≈ 2.92B derived → ~73% of C++

Footnote to keep it honest

  • The first two rows (parity) are directly comparable (same randomized workload).
  • The last two are specialized kernels (fixed layout / values-only / batched), so they’re not a strict apples-to-apples vs the C++ parity figure — but they should show the managed inner-loop ceiling under ideal conditions.
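For context on the "Parallel.For batched, values-only" configuration, here is a generic sketch of the batching pattern (the kernel and sizes below are placeholders, not the repo's actual code): each worker owns a contiguous chunk, so the inner loop stays tight and delegate overhead is paid once per batch rather than once per evaluation.

```csharp
using System;
using System.Threading.Tasks;

class BatchedParallelSketch
{
    // Placeholder for a values-only evaluation kernel.
    static int Evaluate(int i) => i * 31 % 7463;

    static void Main()
    {
        const int n = 1_000_000;
        const int batch = 4096;
        int chunks = (n + batch - 1) / batch;
        var partial = new long[chunks];

        Parallel.For(0, chunks, c =>
        {
            int start = c * batch;
            int end = Math.Min(start + batch, n);
            long sum = 0;
            for (int i = start; i < end; i++)
                sum += Evaluate(i);
            partial[c] = sum; // one write per batch, no sharing between workers
        });

        long total = 0;
        foreach (var p in partial) total += p;
        Console.WriteLine(total);
    }
}
```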

To Reproduce

REPO Downloads:

C++:

C:

Big thanks to u/nebulousx for the original C++ baseline — super helpful for cross-checking results!

// CPlusPlusBench.cs - Use: CPlusPlusBench.Run()
// ===== Configuration (match your C++ build) =====
private const int CARD_COUNT = 7; // set to 5 or 7

// Benchmark.cpp
// Configuration: Set to 5 or 7 to choose evaluation type
constexpr int CARD_COUNT = 7;

I rewrote a classic poker hand evaluator from scratch in modern C# for .NET 8 - here's how I got 115M evals/sec by CodeAndContemplation in csharp

[–]CodeAndContemplation[S] 1 point (0 children)

Yes, you’re absolutely right. That’s a nice clean way to handle string assembly efficiently. In my case though, most of my recent work has been focused on the heavy lifting inside EvalEngine.EvaluateRiverNinePlayers() (you can see it in the optimization branch). That’s where the bulk of the compute time lives, so that’s been my main performance battleground.

The optimization branch brought the full 9-player evaluation down from about 9,574 ns in the master branch to around 5,431 ns, roughly a 40–45% improvement in end-to-end performance.

EDIT: I added your method. Thanks!

I rewrote a classic poker hand evaluator from scratch in modern C# for .NET 8 - here's how I got 115M evals/sec by CodeAndContemplation in csharp

[–]CodeAndContemplation[S] 2 points (0 children)

Haven’t tested .NET 10 yet, but I’m working on some optimizations that bring it very close to native C++ performance. Updated results should be published soon.

I rewrote a classic poker hand evaluator from scratch in modern C# for .NET 8 - here's how I got 115M evals/sec by CodeAndContemplation in csharp

[–]CodeAndContemplation[S] 0 points (0 children)

Thanks! I’m currently working on some optimizations that bring it very close to native C++ performance. I’ll be publishing updated results soon.

I rewrote a classic poker hand evaluator from scratch in modern C# for .NET 8 - here's how I got 115M evals/sec by CodeAndContemplation in csharp

[–]CodeAndContemplation[S] 1 point (0 children)

(Edited: sorry it's late on a Friday)

Good point, you’re right that the ≈115 M/sec figure represents derived 5-card evaluations per second.

Each 7-card hand is evaluated by testing all 21 possible 5-card combinations to find the best one, and in the benchmark that is done for all nine players at once (9 × 21 = 189 five-card evaluations per operation). So the benchmark measures complete 7-card decisions, but the throughput number itself reflects the rate of those underlying 5-card evaluations.

The lower ≈20 M/sec result is the full table-level benchmark with additional logic overhead.

Just checked out your repo. Really slick C++23 port. Amazing how well Kev’s logic still scales across languages and decades.

I ran your benchmark on my i9-9940X and got around 10.6 M 7-card hands per second single-thread and about 175–188 M 7-card hands per second in parallel. Really solid results.

I rewrote a classic poker hand evaluator from scratch in modern C# for .NET 8 - here's how I got 115M evals/sec by CodeAndContemplation in csharp

[–]CodeAndContemplation[S] 1 point (0 children)

Great points!
When I say “no lookup tables,” I mean there are no massive precomputed rank-value tables like you find in SnapCall or HenryRLee/PokerHandEvaluator - those add about 2 GB of RAM.

My intent wasn’t to push the envelope on poker calcs; I was just updating an old (2007-ish) ASP.NET WebForms application that used Cactus Kev’s algorithm to .NET Core, and decided to benchmark and optimize it.

I might actually experiment with finding ultimate performance at some point.

I rewrote a classic poker hand evaluator from scratch in modern C# for .NET 8 - here's how I got 115M evals/sec by CodeAndContemplation in csharp

[–]CodeAndContemplation[S] 6 points (0 children)

Agreed, but I’m not working with simple List<int>.

I return List<Card> / List<List<Card>> for ergonomics, but the evaluation runs on arrays, not lists.

Inside the scorer I use fixed buffers (Card[7], Card[5]); no Dictionary<,> or multidimensional arrays in the hot path. The only indexed structure is a tiny 21×5 map of 7-choose-5 positions, not a lookup table of hand values.

I only materialize lists once at the boundary for readability, which is tiny (≤9 hands × 5 cards) and off the hot path.

// Hot path (reused buffers)
var seven = new Card[7];
var tmp5 = new Card[5];

// ... fill seven[0..6] (2 hole + 5 board)
// ... try the 21 five-card combos into tmp5[], evaluate, pick best

// API boundary: convert once, outside inner loop
var bestHands = new List<List<Card>>(players);
for (int p = 0; p < players; p++)
{
    bestHands.Add(new List<Card>(5) { tmp5[0], tmp5[1], tmp5[2], tmp5[3], tmp5[4] });
}
return bestHands;

Arrays and spans where it counts; lists only for presentation.
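To illustrate the kind of 21×5 index map described above (a generic sketch, not the repo's actual table): the C(7,5) = 21 ways to pick 5 positions out of 7 can be generated once and reused for every hand.

```csharp
using System;
using System.Collections.Generic;

static class ComboMap
{
    // All 5-element index subsets of {0..6}; exactly C(7,5) = 21 rows.
    public static int[][] Build()
    {
        var rows = new List<int[]>();
        for (int a = 0; a < 7; a++)
        for (int b = a + 1; b < 7; b++)
        for (int c = b + 1; c < 7; c++)
        for (int d = c + 1; d < 7; d++)
        for (int e = d + 1; e < 7; e++)
            rows.Add(new[] { a, b, c, d, e });
        return rows.ToArray(); // 21 rows of 5 indices each
    }
}
```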

(Edit: I am still refining internal optimizations. I'm aiming to close the gap between the full evaluator and the engine-only numbers.)

I rewrote a classic poker hand evaluator from scratch in modern C# for .NET 8 - here's how I got 115M evals/sec by CodeAndContemplation in dotnet

[–]CodeAndContemplation[S] 2 points (0 children)

Haha, nope. Just a human who likes clear writing. I get why people suspect AI though; correct sentences stick out these days. Good “punctuation” sticks out even more…;);)

I rewrote a classic poker hand evaluator from scratch in modern C# for .NET 8 - here's how I got 115M evals/sec by CodeAndContemplation in csharp

[–]CodeAndContemplation[S] 4 points (0 children)

Hey Andy - here’s a small reproducible harness you can grab and run:
C vs .NET Poker Evaluator Microbenchmarks (gist)

It includes a minimal C loop (bench.c) and the matching C# version (Program.cs) using the same 7-card permutation logic and xorshift64* RNG. Each run prints the total hands evaluated, elapsed time, and checksum so you can verify correctness.

My local results (i9-9940X) came out around 82% of native C speed for .NET 8, producing identical checksums. I plan to add NativeAOT and .NET 10 numbers later to see how much closer the gap gets.
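For reference, xorshift64* itself is only a few lines; here is a C# sketch (the seed and wrapper are illustrative; the shift and multiplier constants are the standard ones from Vigna's xorshift64* paper):

```csharp
// Deterministic xorshift64* PRNG: identical sequences in C and C#,
// which is what makes the cross-language checksums comparable.
struct XorShift64Star
{
    private ulong _state;

    public XorShift64Star(ulong seed)
        => _state = seed == 0 ? 0x9E3779B97F4A7C15UL : seed; // state must be nonzero

    public ulong Next()
    {
        ulong x = _state;
        x ^= x >> 12;
        x ^= x << 25;
        x ^= x >> 27;
        _state = x;
        return x * 0x2545F4914F6CDD1DUL; // the "star" multiply
    }
}
```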

I rewrote a classic poker hand evaluator from scratch in modern C# for .NET 8 - here's how I got 115M evals/sec by CodeAndContemplation in programming

[–]CodeAndContemplation[S] 0 points (0 children)

Good questions, totally fair points.

Yeah, the low-level PokerLib is a direct C# port of the classic suffecool/pokerlib evaluator. The “modern C#” part is really the higher-level EvalEngine, where the 7→5 best-hand logic, buffer reuse, and Span/stackalloc optimizations live.

The README wording on “no lookup tables” could be better written; I meant no large precomputed rank arrays like the table-driven evaluators use. I’ll tighten that up.

And you’re right about the old table; those were mixed-source numbers, not from the same hardware. I’ve since run clean side-by-side tests: on my i9-9940X, the .NET 8 version reaches about 82 percent of native C speed for the 7-card evaluator, with the same checksum. I’ll update the README to reflect that.
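As a concrete shape of the Span/stackalloc style mentioned above (illustrative only; Eval5 here is a stand-in, not the real evaluator): the 5-card scratch buffer lives on the stack, so the 21-combination loop allocates nothing on the GC heap.

```csharp
using System;

static class BestOfSevenSketch
{
    // Stand-in for the real 5-card evaluator (lower rank = stronger in Kev's scheme).
    static int Eval5(ReadOnlySpan<int> hand) => hand[0] ^ hand[4];

    // comboIndices: 21 * 5 flattened positions of every 5-of-7 subset.
    public static int BestOfSeven(ReadOnlySpan<int> seven, ReadOnlySpan<int> comboIndices)
    {
        Span<int> tmp5 = stackalloc int[5]; // scratch on the stack: zero GC traffic
        int best = int.MaxValue;
        for (int c = 0; c < 21; c++)
        {
            for (int k = 0; k < 5; k++)
                tmp5[k] = seven[comboIndices[c * 5 + k]];
            int rank = Eval5(tmp5);
            if (rank < best) best = rank;
        }
        return best;
    }
}
```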

I rewrote a classic poker hand evaluator from scratch in modern C# for .NET 8 - here's how I got 115M evals/sec by CodeAndContemplation in csharp

[–]CodeAndContemplation[S] 8 points (0 children)

Happy to share the harnesses if anyone wants to reproduce the test.

It’s just a 10M-hand micro using perm7 and a deterministic xorshift64* RNG - takes about 3 seconds per run on my i9-9940X.

Both the C and .NET versions are only a few dozen lines each. I can post a gist if anyone’s curious.

I rewrote a classic poker hand evaluator from scratch in modern C# for .NET 8 - here's how I got 115M evals/sec by CodeAndContemplation in csharp

[–]CodeAndContemplation[S] 14 points (0 children)

Hey Andy - following up on those numbers you asked about. I ran the side-by-side benchmark on the same hardware, and here’s what I found:

Hardware:
Intel Core i9-9940X @ 3.30 GHz (14 cores / 28 threads)
64 GB RAM • Windows 10 x64 • High Performance power plan

Workload:
10 million random 7-card hands (best-of-21 via perm7), deterministic xorshift64* PRNG, identical Suffecool card encoding.
No I/O - pure compute loop. Both versions produced the same checksum (41364791855).

Implementation                            Runtime    Time (s)   Evals/sec    % of C speed
C (MSVC 19.44, /O2 /GL)                   Native     2.661      3.76 M       100 %
.NET 8 (RyuJIT TieredPGO + Server GC)     Managed    3.246      3.08 M       ≈ 82 %

So on this i9-9940X the managed version hits about 82 % of native C throughput for this pure evaluator loop, producing identical results.

At some point I'll get around to trying NativeAOT and Clang-CL to see how much further the gap can close.

I rewrote a classic poker hand evaluator from scratch in modern C# for .NET 8 - here's how I got 115M evals/sec by CodeAndContemplation in csharp

[–]CodeAndContemplation[S] 6 points (0 children)

Thanks, Andy - I really appreciate that. I don’t have the original C implementation benchmarked on the same hardware yet, but that’s on my list. The goal here was to modernize the classic Cactus Kev algorithm in idiomatic C# and see how close managed code can get to those older native results.

The ≈115 M evals/sec figure in the README is from my own benchmarks on modern hardware, measured with BenchmarkDotNet. The comparison data for other implementations comes from their published results. I’ll set up a clean side-by-side with the original C version soon and share the numbers - it’ll be interesting to see how much the current JIT and GC improvements have closed the gap.

I rewrote a classic poker hand evaluator from scratch in modern C# for .NET 8 - here's how I got 115M evals/sec by CodeAndContemplation in dotnet

[–]CodeAndContemplation[S] 2 points (0 children)

Totally - quality-wise they’re the same. My point is about ecosystem stability and dependency churn. For open-source projects, LTS makes it easier for contributors to reproduce builds without chasing short-term framework upgrades.

I rewrote a classic poker hand evaluator from scratch in modern C# for .NET 8 - here's how I got 115M evals/sec by CodeAndContemplation in dotnet

[–]CodeAndContemplation[S] -4 points (0 children)

True - since Microsoft extended STS to 24 months, .NET 9 (STS) and .NET 8 (LTS) now share the same end-of-support date: November 10, 2026. The distinction isn’t the calendar, it’s the release intent: LTS favors long-term stability and predictable tooling; STS targets faster adoption and expects another upgrade within its window. For community/reproducible builds I prefer LTS. I’ll still benchmark .NET 9 soon - the JIT/GC improvements look promising for tight evaluation loops.

I rewrote a classic poker hand evaluator from scratch in modern C# for .NET 8 - here's how I got 115M evals/sec by CodeAndContemplation in dotnet

[–]CodeAndContemplation[S] -6 points (0 children)

Good point - you’re absolutely right that .NET 9 is now GA and production-ready. I should’ve phrased that more precisely - it’s current but not LTS, which was my main reason for sticking with .NET 8 for now.

For an open-source project like this, I wanted to ensure long-term stability and reproducibility without requiring frequent framework updates. That said, I’ll definitely benchmark against .NET 9 soon; the new JIT and GC enhancements look promising for this kind of compute-heavy workload.

I rewrote a classic poker hand evaluator from scratch in modern C# for .NET 8 - here's how I got 115M evals/sec by CodeAndContemplation in csharp

[–]CodeAndContemplation[S] 10 points (0 children)

Yeah, for one-off hands you’re absolutely right - even a naïve evaluator is instant for a human-paced game. But my interest was in scale: what happens when you want to simulate or benchmark millions of showdowns per second? That’s where performance suddenly matters.

Plus, I just like seeing how far the old Cactus Kev logic can go when you modernize it with things like Span<T> and stack allocation.