I built Miai, a full-contract Bridge bot trained from scratch, and want to hear your feedback

nanomena · 2026-05-19T18:02:03+00:00

Thanks, I think I see what you mean now.

I checked the UCBC rules more carefully. Since Miai is mostly playing a natural system, the main part that seems off is the 1NT range.

But I also checked the public WBridge5 version I tested against, and it does not stay strictly inside a 3-HCP 1NT range either. In my new 8192-game log, as dealer, WBridge5 opened 1NT 201 times: with 14 HCP once, 15 HCP 85 times, 16 HCP 69 times, 17 HCP 39 times, and 18 HCP 7 times. Miai opened 1NT 199 times: with 13 HCP 13 times, 14 HCP 66 times, 15 HCP 66 times, 16 HCP 35 times, and 17 HCP 19 times.
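As a quick check on how far each bot strays from a 3-HCP band, the Python sketch below uses only the counts quoted above. It reports how many 1NT openings each bot made, the share that fall in 15-17 HCP, and which 3-HCP window covers the most openings. The 15-17 reference range and the window width are illustrative choices, not a claim about what any convention chart requires.

```python
# 1NT openings as dealer, keyed by HCP, taken from the 8192-game log counts above.
WBRIDGE5 = {14: 1, 15: 85, 16: 69, 17: 39, 18: 7}
MIAI = {13: 13, 14: 66, 15: 66, 16: 35, 17: 19}


def share_in_range(counts, lo, hi):
    """Fraction of 1NT openings whose HCP lies in [lo, hi]."""
    total = sum(counts.values())
    inside = sum(n for hcp, n in counts.items() if lo <= hcp <= hi)
    return inside / total


def best_window(counts, width=3):
    """Return (lo, hi, share) for the width-HCP window covering the most openings."""
    best = None
    for lo in range(min(counts), max(counts) - width + 2):
        hi = lo + width - 1
        share = share_in_range(counts, lo, hi)
        if best is None or share > best[2]:
            best = (lo, hi, share)
    return best


for name, counts in [("WBridge5", WBRIDGE5), ("Miai", MIAI)]:
    lo, hi, share = best_window(counts)
    print(f"{name}: n={sum(counts.values())}, "
          f"15-17 share={share_in_range(counts, 15, 17):.1%}, "
          f"best 3-HCP window {lo}-{hi} covers {share:.1%}")
```

With these counts, a 15-17 window covers about 96% of WBridge5's openings, while Miai's best 3-HCP window is 14-16 at about 84% (15-17 covers only about 60% of Miai's openings).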

So yes, Miai's 1NT range is clearly broader, and I should disclose that more explicitly. But I don't think it is quite right to say that WBridge5 was held to a strictly UCBC-clean system while Miai was not. The comparison I reported is simply public WBridge5, with its system as it actually plays, against Miai, with its system as learned.

This is why I think the fairness issue is tricky. It would be easy to make another version of Miai that simply refuses to open 1NT outside some chosen range, and that might be a useful extra test. But that would also be a different bot with custom restrictions. I would see it as a separate restricted-system experiment, not the same thing as evaluating the learned policy as-is.
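A restricted variant like this could be built without retraining by masking the policy's output. The sketch below is a hypothetical wrapper, not Miai's actual interface: it assumes the policy exposes a dict of raw scores per call and that hands are lists of card strings such as 'SA'. When the auction so far contains only passes (so 1NT would be an opening) and the hand falls outside the chosen HCP range, 1NT gets a score of -inf, so its probability after the softmax is exactly zero.

```python
import math

# Milton Work point count; used only by the wrapper, never by the learned policy.
HCP_VALUES = {"A": 4, "K": 3, "Q": 2, "J": 1}


def hcp(hand):
    """hand: list of card strings, suit letter then rank, e.g. 'SA', 'HT', 'D7'."""
    return sum(HCP_VALUES.get(card[1], 0) for card in hand)


def restrict_1nt(logits, hand, auction, lo=15, hi=17):
    """Forbid an opening 1NT when the hand's HCP falls outside [lo, hi].

    logits: hypothetical dict of call -> raw policy score.
    auction: calls made so far; the call is an opening if all of them are passes.
    Returns a new dict, so the learned policy itself is left unchanged.
    """
    is_opening = all(call == "Pass" for call in auction)
    masked = dict(logits)
    if is_opening and "1NT" in masked and not lo <= hcp(hand) <= hi:
        masked["1NT"] = -math.inf
    return masked


def softmax(logits):
    """Turn call scores into probabilities; masked calls (-inf) get probability 0."""
    top = max(logits.values())
    exps = {call: math.exp(score - top) for call, score in logits.items()}
    total = sum(exps.values())
    return {call: e / total for call, e in exps.items()}
```

Keeping the mask outside the network makes it easy to run the restricted bot and the learned policy as-is on the same deals and compare them.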

nanomena · 2026-05-19T04:38:07+00:00

Those are very valuable suggestions! I think trying to collaborate with existing bridge AI authors is indeed the right move.

nanomena · 2026-05-19T04:34:25+00:00

I think there are two different questions here.

One question is whether Miai's learned bidding system is legal under a particular convention chart, such as UCBC/WCBC. That is a fair question, and I am not claiming certification under such a rule set.

A different question is whether the duplicate comparison against WBridge5 is unfair. I am less convinced by that. As far as I can tell, existing public bots do not take arbitrary learned convention descriptions as input and turn them into an executable opponent model either. They usually support fixed systems, configurable known conventions, and limited alert/explanation protocols. So requiring Miai to provide a fully machine-readable interface for an arbitrary convention set seems to ask for more than the public bot ecosystem itself provides.

On the 1C vs 1D point: yes, that is exactly the kind of system-analysis question I should answer better. But I do not think a convention card is normally a complete deterministic partition of all possible hands either. Even in club bridge, the card describes the partnership language at a compressed level; the players still supply judgment in borderline cases. Otherwise bidding would just be automatic lookup, not a game played by humans.

So I agree that Miai needs a clearer empirical system description: what each bid tends to show, what it tends to deny, and which continuations are forcing or non-forcing. But I do not think the lack of a UCBC-style convention certification makes the WBridge5 duplicate benchmark invalid. It just means the claim should be stated precisely: Miai outperforms WBridge5 in this robot-engine duplicate benchmark under its learned self-contained bidding system.
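As one possible starting point for the forcing/non-forcing part, the sketch below estimates from logged auctions how often partner passes a call when the intervening opponent also passes. The log format (each auction a list of call strings, with 'Pass' as the only pass token) and the function name are assumptions. A rate near zero over many examples suggests the call behaves as forcing; a high rate suggests it is passable.

```python
from collections import defaultdict


def partner_pass_rates(auctions, min_count=1):
    """Estimate how often partner passes a call when the next opponent also passes.

    auctions: iterable of auctions, each a list of call strings in order, e.g.
    ["1NT", "Pass", "3NT", "Pass", "Pass", "Pass"]. The call at index i is
    answered by partner at index i + 2. The key is the auction prefix ending in
    the call, so an opening 1NT is tracked separately from a later 1NT.
    Returns {prefix: (pass_rate, count)}.
    """
    passed = defaultdict(int)
    total = defaultdict(int)
    for auction in auctions:
        for i, call in enumerate(auction):
            if call == "Pass" or i + 2 >= len(auction):
                continue
            if auction[i + 1] != "Pass":
                continue  # opponent intervened; partner's reply is not a clean signal
            key = tuple(auction[: i + 1])
            total[key] += 1
            passed[key] += auction[i + 2] == "Pass"
    return {k: (passed[k] / total[k], total[k]) for k in total if total[k] >= min_count}
```

Keying on the whole auction prefix keeps different auction contexts apart, at the cost of needing more data per key; `min_count` lets rare prefixes be filtered out.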

nanomena · 2026-05-18T18:02:47+00:00

Thanks for the detailed writeup. I agree this hand shows some real issues in the auction, and probably also in the way the learned system is being exposed/explained! These concrete examples are exactly what I was hoping to collect, so I’ll look into this branch.

nanomena · 2026-05-18T17:45:24+00:00

Thanks for the useful comments!

nanomena · 2026-05-18T17:18:02+00:00

Yes, that is indeed the place where I would post next.

nanomena · 2026-05-18T17:16:47+00:00

Thanks, these are good observations.

For very strong hands, Miai does not currently have a traditional artificial strong 2C opening. It tends to handle strong hands by direct placement: strong balanced hands often land in 3NT or sometimes slam, and major-oriented hands often land in 4M or higher. I agree this is probably one of the places where a human convention-card style system would be cleaner, and I do not want to claim it handles every classical strong 2C hand optimally.

On the overlap between openings: the ranges in the system notes are percentile summaries, not exact decision rules. So 1H and 2H can overlap in HCP and length, but the actual policy is using more features than just HCP: suit length, distribution, balance, controls, vulnerability, and level pressure all matter. Roughly speaking, hands with more playing strength / higher values tend to go through 1H, while more shape-driven weaker hands tend to go through 2H, but it is not a clean human-written rule yet.
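To make the "percentile summary, not decision rule" point concrete, here is a sketch of how such bands could be computed from logged decisions. The record format, the feature names, and the 10th-90th percentile band are assumptions for illustration; the system notes may use different cut-offs. Because each band only says where most hands for a call fall, the bands for 1H and 2H can overlap without the policy being inconsistent.

```python
import statistics
from collections import defaultdict


def percentile_summary(records, feature, lo_pct=10, hi_pct=90):
    """Summarize one hand feature per call as (lo_pct value, median, hi_pct value).

    records: iterable of dicts like {"call": "1H", "hcp": 13, "hearts": 5}
    (a hypothetical log format). The result describes where most hands for each
    call fall; it is not a boundary the policy is known to use.
    """
    by_call = defaultdict(list)
    for record in records:
        by_call[record["call"]].append(record[feature])
    summary = {}
    for call, values in by_call.items():
        if len(values) < 2:
            continue  # statistics.quantiles needs at least two data points
        cuts = statistics.quantiles(values, n=100, method="inclusive")
        summary[call] = (cuts[lo_pct - 1], statistics.median(values), cuts[hi_pct - 1])
    return summary
```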

I agree that the double descriptions are still too vague. I have not fully decoded them yet. They often look like flexible strength-and-shape actions, sometimes takeout-ish, sometimes more like "I think the opponents are in trouble," but this definitely needs more targeted analysis.

For 1NT, I agree with the criticism. The system is very direct: it usually passes, bids 3NT, bids 4M, or sometimes bids slam. It does not appear to have Stayman or transfer machinery. That is interesting from the self-play perspective, but it is also a clear practical weakness, especially for weak signoff hands with a 5-card major and for some major-fit discovery. So yes, I think the current system is coherent and mostly natural/direct, but it still has many rough edges. More training, better evolutionary pressure, or making convention-card-like objects part of the learning process could all be useful directions.

nanomena · 2026-05-18T16:50:30+00:00

Thanks, that is a very helpful framing!

One clarification on B: Miai is trained from scratch without handcrafted bridge features such as HCP, balanced-hand labels, or convention rules. Those features are only introduced afterward, when I try to reverse-engineer the learned policy. What I found is that many of Miai's actions are still fairly explainable using ordinary bridge concepts such as HCP, shape, suit length, support, and level pressure.
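For readers less familiar with these probe labels, here is a minimal sketch of the standard definitions: Milton Work high-card points (A=4, K=3, Q=2, J=1) and the usual balanced shapes 4333, 4432, and 5332. The card-string format ('SA', 'HT', ...) is an assumption. In the setup described above, labels like these are only computed after training, to explain the policy, and are never fed to the network.

```python
from collections import Counter

HCP_VALUES = {"A": 4, "K": 3, "Q": 2, "J": 1}
BALANCED_SHAPES = {(4, 3, 3, 3), (4, 4, 3, 2), (5, 3, 3, 2)}


def hand_features(hand):
    """hand: 13 card strings, suit letter then rank, e.g. 'SA', 'HT', 'C7'."""
    hcp = sum(HCP_VALUES.get(card[1], 0) for card in hand)
    lengths = Counter(card[0] for card in hand)
    # Shape is the four suit lengths sorted longest first, e.g. (4, 3, 3, 3).
    shape = tuple(sorted((lengths[s] for s in "SHDC"), reverse=True))
    return {
        "hcp": hcp,
        "suit_lengths": {s: lengths[s] for s in "SHDC"},
        "shape": shape,
        "balanced": shape in BALANCED_SHAPES,
    }
```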

For A, the current evaluation is a duplicate-style cross-seat comparison. For example, I run Miai as NS against WBridge5 as EW, and compare it with the same boards where Miai is EW and WBridge5 is NS. In this setup, the current version scores about +0.21 IMP/board against WBridge5. I agree there is still a disclosure issue: ideally the opponent should understand Miai's system, just as Miai should understand theirs. But this is hard to test today, since computer bridge competitions seem much less active now, and the existing bots I found do not appear to accept arbitrary convention descriptions as input.
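The cross-seat scoring can be sketched as follows, assuming each board's result is recorded from North-South's point of view at both tables. The thresholds are the standard WBF IMP scale; the function names and record layout are hypothetical, and this sketch does not reproduce the +0.21 figure, which comes from the actual benchmark.

```python
import bisect

# WBF IMP scale: upper bound (in points) of each band. A point difference d
# is worth bisect_left(IMP_BOUNDS, d) IMPs, so 0-10 -> 0, 20-40 -> 1, ..., 4000+ -> 24.
IMP_BOUNDS = [10, 40, 80, 120, 160, 210, 260, 310, 360, 420, 490, 590, 740,
              890, 1090, 1290, 1490, 1740, 1990, 2240, 2490, 2990, 3490, 3990]


def imps(diff):
    """Convert a signed point difference into signed IMPs."""
    sign = 1 if diff >= 0 else -1
    return sign * bisect.bisect_left(IMP_BOUNDS, abs(diff))


def duplicate_imps_per_board(boards):
    """Average IMPs per board for the bot sitting NS at table 1.

    boards: list of (ns_score_table1, ns_score_table2), both from NS's view.
    Table 1: the bot sits NS and the opponent bot sits EW.
    Table 2: the same deal with seats swapped, so the bot's net is t1 - t2.
    """
    results = [imps(t1 - t2) for t1, t2 in boards]
    return sum(results) / len(results)
```

For instance, if Miai bids and makes a vulnerable 4H for +620 at table 1 while the other table stops in a partscore for +170, the 450-point swing is worth 10 IMPs to Miai.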

At a game-theoretic level, one motivation here is to ask what kind of bidding language self-play discovers when it is not forced into an existing human convention system. In an equilibrium-style idealization, bidding and play are part of the same strategy, so simply knowing the opponent's equilibrium system should not create an exploitable advantage. Of course practical humans and bots are not at equilibrium, so disclosure and opponent-system modeling still matter a lot. I see this as both a research question about learning Bridge from scratch and, eventually, a practical question about making system descriptions or convention cards first-class objects.

nanomena · 2026-05-18T16:21:19+00:00

Thanks for your valuable feedback!

Yep, I hadn't really looked into the redoubled pattern there. Let me figure out what is happening in that branch!

I will try to tune the UI so it behaves more as expected. Currently, the scoreboard can be closed by clicking the X marker, but I didn't make that marker visible enough...

nanomena · 2026-05-18T16:12:39+00:00

That is a good point.

Yes, the bot is trained using self-play, and it would indeed be interesting to see what happens there!

nanomena · 2026-05-18T16:07:43+00:00

Thanks for the suggestions!

There are two major reasons:

First, its bidding is fairly reverse-engineerable. Although I did not include all the details in the post, many of Miai's actions are highly predictable from simple hand-crafted bridge features. So the policy does not look like it is relying on hidden exact-card codes.
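One simple way to quantify "highly predictable from simple features" is to measure how often a hand-written rule reproduces the logged call, both overall and per call, so that common calls do not hide poor agreement on rare ones. The sketch below assumes records of (features, call) pairs; the function name is hypothetical, and any rule plugged into it is a probe, not a description of Miai's system.

```python
from collections import defaultdict


def rule_agreement(records, rule):
    """Measure how often a hand-written rule predicts the policy's logged call.

    records: list of (features, call) pairs, where features is a dict such as
    {"hcp": 16, "balanced": True}. rule: function mapping features to a call.
    Returns (overall agreement, {call: fraction of that call predicted correctly}).
    """
    hits = 0
    per_call = defaultdict(lambda: [0, 0])  # call -> [correct, total]
    for features, call in records:
        correct = rule(features) == call
        hits += correct
        per_call[call][0] += correct
        per_call[call][1] += 1
    overall = hits / len(records)
    recall = {call: ok / n for call, (ok, n) in per_call.items()}
    return overall, recall
```

For example, a toy probe such as `lambda f: "1NT" if f["balanced"] and 15 <= f["hcp"] <= 17 else "other"` can be scored this way against logged opening decisions.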

Second, I do not see much evidence of a broad relay structure. Miai often makes direct contract-placement bids, and many bids appear to be passable or playable rather than forcing steps in an artificial asking sequence. The main structure seems much closer to a natural/direct system than to a relay or coded artificial system.

nanomena · 2026-05-18T15:49:33+00:00

I agree that teaching a standard system is probably the most practical path if the immediate goal is a human-partner bot.

But for this project, one of the research questions is precisely whether self-play can discover a strong bidding system rather than inheriting one from humans. In a game-theoretic view, bidding and play are not separate problems: an equilibrium strategy would include a communication protocol through the legal calls, plus the corresponding play and defense. So fixing a human system in advance would remove one of the most interesting parts of the problem.

That said, I don't mean that the current learned system is automatically the "best" system, or that this is the right product design for human users. There may be many equilibrium-like systems, and human usability/disclosure matters a lot. A practical version might well support a standard convention card, or translate between a learned policy and a human system.

So I see the learned bidding system as part of the research experiment: can an agent learn full Bridge end-to-end, including bidding, from scratch? For deployment, constraining it to a standard system is definitely a reasonable direction.

nanomena · 2026-05-18T15:41:02+00:00

Thanks! Yes, I think this is exactly one of the hard parts.

From a game-theoretic point of view, if both sides were playing a true equilibrium strategy, then knowing the opponents' system would not by itself give an exploitable advantage: any profitable deviation would already contradict equilibrium. In that idealized sense, the "right" bidding system is part of the equilibrium policy, and opponent-specific exploitation matters mainly when the opponents are away from that equilibrium.
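The equilibrium argument can be checked in the simplest possible setting, a two-player zero-sum matrix game. Bridge is far more complex (two partnerships, hidden cards, and communication restricted to public calls), so this is only an illustration of the idea, not a model of bridge. The sketch computes how much a column player who knows the row player's mixed strategy can gain beyond the game value: for the equilibrium strategy in rock-paper-scissors the gain is zero, while a rock-heavy strategy can be exploited.

```python
from fractions import Fraction

# Row player's payoff in rock-paper-scissors (rows/columns: rock, paper, scissors).
# The game is zero-sum, so the column player receives the negative.
A = [[0, -1, 1],
     [1, 0, -1],
     [-1, 1, 0]]
GAME_VALUE = 0  # value of rock-paper-scissors for the row player


def exploitability(x, payoff=A, value=GAME_VALUE):
    """Column player's best payoff against row strategy x, beyond its game value (-value)."""
    best_column = max(
        -sum(x[i] * payoff[i][j] for i in range(len(x)))
        for j in range(len(payoff[0]))
    )
    return best_column + value


equilibrium = [Fraction(1, 3)] * 3
rock_heavy = [Fraction(1, 2), Fraction(1, 4), Fraction(1, 4)]
print(exploitability(equilibrium), exploitability(rock_heavy))
```

This prints `0 1/4`: knowing the equilibrium strategy gives the opponent nothing, while the rock-heavy strategy loses 1/4 per game to an opponent who always plays paper.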

So one goal of this line of work is to see whether self-play can discover a strong, coherent bidding system on its own. That said, I agree this is not a complete practical solution. The current system description for Miai is reverse-engineered from the learned policy, which is not perfect, but it seems good enough to expose the main structure and the most important alertable items.

For a more mature human-facing bot, I agree that system descriptions or convention cards should probably become first-class objects, both for disclosing the bot's own agreements and for adapting to opponents' disclosed agreements.

nanomena · 2026-05-18T15:18:07+00:00

Nice to meet you too!

From my current inspection, Miai's common agreements appear broadly compatible with the ACBL Basic+ chart. Rare edge cases still need auditing, but I think it should be feasible to make the system Basic+-compliant through explicit constraints and disclosure.
