Is this tuned to tournament or cash GTO? Regarding the OP's shock about pocket 4s (I think that's what they meant by "4-pair"), folding 4s preflop in early position when there's no raise would be fairly standard in tournament GTO (although the stage of the tournament and the number of BBs can change things significantly), but definitely less standard in cash (probably almost never).
This also wouldn't even be a close contest. I think Pluribus demonstrated a solid win rate against professional players in a test.
As I was developing this project, one thought that kept coming to mind was the cost-versus-performance comparison between a purpose-built AI such as Pluribus and a general-purpose LLM. I think Pluribus cost ~$144 in cloud computing credits to train.
For my implementation, I'm passing in the current hand's action history (e.g. Player 1 raises to $X preflop, Player 2 calls, Player 3 calls. Flop is A B C, Player 2 checks, etc.) whenever the action is on the player.
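To make that concrete, here's a minimal sketch of how a hand's action history could be rendered into text for the prompt. This is not the project's actual code: the `Action` dataclass, the `format_history` function, the street names, and the card notation are all hypothetical names made up for illustration.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional

# Hypothetical record of a single action; not taken from the actual project.
@dataclass
class Action:
    street: str                   # "preflop", "flop", "turn", or "river"
    player: str                   # e.g. "Player 1"
    move: str                     # e.g. "raises to", "calls", "checks", "folds"
    amount: Optional[int] = None  # dollar amount, if the move has one

def format_history(actions: List[Action], board: Dict[str, List[str]]) -> str:
    """Render the hand so far as one line of text per street."""
    lines = []
    for street in ("preflop", "flop", "turn", "river"):
        street_actions = [a for a in actions if a.street == street]
        if not street_actions:
            continue  # nothing has happened on this street yet
        parts = []
        for a in street_actions:
            text = f"{a.player} {a.move}"
            if a.amount is not None:
                text += f" ${a.amount}"
            parts.append(text)
        # Show the community cards for this street, if any were dealt.
        cards = board.get(street)
        prefix = street.capitalize()
        if cards:
            prefix += f" ({' '.join(cards)})"
        lines.append(f"{prefix}: " + ", ".join(parts) + ".")
    return "\n".join(lines)

# Example hand, mirroring the one described above.
actions = [
    Action("preflop", "Player 1", "raises to", 6),
    Action("preflop", "Player 2", "calls"),
    Action("preflop", "Player 3", "calls"),
    Action("flop", "Player 2", "checks"),
]
board = {"flop": ["As", "7d", "2c"]}

print(format_history(actions, board))
```

The resulting text would be rebuilt and sent along with the player's hole cards each time the action is on that player, so the model only sees the hand when it actually has a decision to make.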
Your idea of passing it in real time and having the LLM build a chain of thought even when the action isn't on it is interesting. I'd be curious to see whether it would result in improved play.
Good question! The player rooms have a rate limit per day. As for the main table, it's actually a replay of hands I recorded of the LLMs playing against each other over an extended period, which eventually loops.
there are plenty of published preflop charts and GTO ranges
in fact, a fun project would be to take a non-reasoning model, have it play a lesser-known game format, and see if it has an "aha" moment or starts explicitly simulating moves ahead
The bot you're playing is outdated compared to modern ones. Within the last few years, there's been a lot of research done similar to what we saw with AlphaGo: https://en.wikipedia.org/wiki/Pluribus_(poker_bot).
You may also enjoy diving into academic papers related to modern game theory for No Limit Poker. That's the current meta for high-level competitive play (last time I checked).