How to Build a Chess Engine and Fail

131 points by xlinux 3 days ago

janalsncm 2 days ago

I am curious why the author chose a genetic algorithm rather than standard backprop to distill the evil. Logistic regression seems like a pretty reasonable choice and it’ll be a lot faster than a genetic algorithm. Add an L1 penalty for sparsity.

In the past I’ve tried distilling not just the eval function but the outcome of the search (something like depth 20) in a neural net. It kind of works, but not very well until the net is pretty big. Deepmind did something similar.

obrhubr 2 days ago

I didn’t think of that, but I guess the conclusion as to performance of the model would have remained the same.
- janalsncm 2 days ago
  
  Reading over your article again, it seems like you made the same “mistake” as I did. In other words, the evals you’re seeing in the Lichess PGN aren’t the raw outputs of the Stockfish evaluation function, they’re the output of the search, which is a highly nonlinear function of millions of evals. If I recall from their docs, it’s something like depth 18, so your chances of distilling it with 20k parameters is essentially zero.
  (I put mistake in quotes here because Deepmind showed that with a server farm of TPUs it is possible to distill the search as well. So it’s not impossible per se.)
  But that’s ok! You’ve created a super fast evaluation function. Instead of trying to distill the depth 18 output, it will be much more realistic to distill the depth zero output, the NNUE. If you rerun those FENs in Stockfish you can pretty quickly create an easier dataset.

vunderba 2 days ago

As part of my undergrad work, I used similar principles to the article (steady state genetic algorithms) to create a bot capable of playing Reversi where the fitness function was loosely defined as a set of "weights" on each square across the 8x8 board. These were used as part of the evaluation function in the classic minimax algorithm.

We trained over the course of 5-6 days, and the end generation was capable of beating an intermediate player but got completely destroyed by experts. It was a fun project!

renox 2 days ago

I remember coding a Reversi as a school project: players liked my project because it was so weak that they could beat it!

senthilnayagam 2 days ago

I built a list of simple games in pygame with code and solver generated by AI.

Have implemented 2048, Minesweeper, Sudoku and chess. First three are single player games have made good progress.

I still don't understand UCI interface and have not thought through chess solving. Hope will take another try this weekend

https://github.com/muonium-ai/simplegames

skeltoac 2 days ago

I especially enjoyed the link to The Bitter Lesson by Rich Sutton, which I hadn't read before. Now I wonder what "discoveries" have been built into today's AI models and how they might come to be detrimental.

http://www.incompleteideas.net/IncIdeas/BitterLesson.html

janalsncm 2 days ago

Probably LLM maximalist ideas that suggest infinite “scaling laws” for LLMs (they are not laws), leading to ridiculous conclusions like building a $1 trillion cluster is the fastest way to AGI. People like Leopold Aschenbrenner are in this camp.
Imagine if LLMs were the only way we had to play chess. You’d need a centralized server and peak performance wouldn't even best a grandmaster. You spend $1 trillion building a super cluster because that’s all you know.
< - - This is where AI is today.
And then some startup creates Stockfish, a chess engine better than any LLM or grandmaster and can run on a smartphone.
PaulHoule 2 days ago

Has there ever been a serious effort to play chess by "rule-based" methods as opposed to search?
(... other than the evaluation function being based on handwritten or learned rules)
- janalsncm 2 days ago
  
  In the past, yes. That was essentially the approach up until the 1980s. Computers were too slow to run millions of evaluations per move, so you got a lot of efforts in a more heuristic-driven direction.
  Examples of heuristic engines are Hans Berliner’s CAPS-II and PARADISE by David Wilkins. PARADISE used a database of 200 rules.
  There are also engines that try to retroactively apply rules to Stockfish evals but they’re fairly confusing in my experience. It’s like being told the answer and having to come up with a reason after the fact.

zelphirkalt 2 days ago

All this I guess comes after laying the ground work, like implementing a bitboard representation or something else, and implementing the logic for being able to tell, whether a move is valid or invalid. That in itself is already a lot of work. iiuc the idea here is "merely" writing the engine, and take everything else as a given.

obrhubr 2 days ago

The game’s implementation itself was furnished with the competition by Sebastian Lague. I completely agree that writing the move logic, validation, etc… is a difficult undertaking especially when it comes to optimisation which is what allows the bots built on top to perform well.
- zelphirkalt 2 days ago
  
  That makes sense for such a competition. Thanks for making this even clearer!
  Of course another interesting competition could be to develop _all_ of a chess game and engine and see how low in code people can go. But that would be an entirely different competition.
- james_marks 2 days ago
  
  Is there not an open standard for this? I glanced and didn’t see anything, but seems crazy every chess project would need to define the game.
  - qsort 2 days ago
    
    There are open components for generic purposes, e.g. draw the board given a PGN, show the moves, check if moves are legal and so on, but they aren't suitable for engine internals. There are several tricks with bits to store positions, hashing schemes for transposition tables and the likes that are key to unlock the maximum performance possible, and they are usually tied to the engine implementation.
    
    james_marks 13 hours ago
    
    Interesting, that makes sense.
    Given that, I think what I’m imagining is a PGN linter & renderer.

snickerbockers 2 days ago

This is something that's been on my bucket list for a while now, although I'd do it using "normal" programming instead of DL.

i feel like if you can bruteforce every possible board configuration for the next 3 turns and then make the move that leads to more desirable outcomes, that ought to be enough to thrash most amateurs. Probably not good enough to beat somebody who actually understands chess tactics on a deeper level, but I expect most non-masters are just "winging it" and making things up as they go along, so a machine that can think farther ahead than you would win more often than not.

But like I said, this is all just me fantasizing about a project I haven't even started yet so my hypothesis could easily be wrong.

vunderba 2 days ago

> bruteforce every possible board configuration for the next 3 turns and then make the move that leads to more desirable outcomes
I'd recommend familiarizing yourself with the minimax algorithm - since that's exactly what it does where "desirable outcome" is essentially the tunable evaluation aspect of the equation.
https://en.wikipedia.org/wiki/Minimax
immibis 2 days ago

I did exactly this when the Laser chess variant was posted on Hacker News about last year. It does indeed work pretty well. Players eventually learn patterns that look good in 3 moves but not with a greater depth, though.
It's how real chess engines like Stockfish work, but they are highly optimized so they can search extreme depths.
janalsncm 2 days ago

> make the move that leads to more desirable outcomes
This is kind of begging the question, right? How do you know what a “desirable” outcome is, other than mate-in-N positions? (In your case, N<=3.) The point of an evaluation function is to take a position and return a value which describes how desirable it is.
- snickerbockers 2 days ago
  
  It's basically just a depth-first tree traversal with some heuristics to weigh different turns against each other (where a turn is defined as a possible board configuration that can be reached from the current configuration).
  First you'd have a heuristic that ranks outcomes from turn 3, which would just mean weighing the pieces the computer lost against the pieces the opponent lost. Then you rank each turn-2 node based on how desirable the turn-3 leaf-nodes it leads to are and how likely the opponent is to actually make the moves that lead to it. Then you'd do the same for turn-1 and then pick the move that scored highest.
  - Quekid5 a day ago
    
    Ah, yes... "just". I think you're seriously underestimating the difficulty here.

binarymax 2 days ago

Doesn’t look like failure to me - looks like a fun project that disproved your distillation hypothesis!

Great write up and thanks for sharing.

obrhubr 2 days ago

Thanks for the kind words!

bob1029 2 days ago

> To be exact every participant has a maximum of 1024 tokens at his disposal to craft the best chess bot they can.

I'd be quite tempted to try linear genetic programming with a variant of a Brainfuck-style UTM for this kind of constraint. Assuming 1024 tokens = 1024 bytes = 1024 instructions, I think there is some degree of performance that is feasible.

This is really interesting to me because hand-coding BF to do something like chess is absolutely beyond me. I wouldn't even know where to begin. A LGP-coded BF program would almost certainly outperform anything I could design with my own hands.

obrhubr 2 days ago

This is highly specific to C# which was the language imposed on all participants. But I agree that some languages might be especially adept to these kinds of tasks and it would be interesting to see which.

deadbabe 2 days ago

Why build a chess engine these days, just use an LLM?

achierius 2 days ago

LLMs are quite bad at chess actually, even compared to human players -- and certainly compared to proper modern chess engines
- PaulHoule 2 days ago
  
  They usually struggle to always generate valid moves.
  And that's pivotal.
  If you have a program which always makes valid moves and gives up when it has lost you wrote a proper chess playing program. It may play badly, but it plays.
  - ElFitz 2 days ago
    
    That could now be achieved by precomputing all valid moves and using outlines[0] or Structured Outputs[1] to constrain the output.
    [0]: https://github.com/dottxt-ai/outlines
    [1]: https://openai.com/index/introducing-structured-outputs-in-t...
    
    PaulHoule 2 days ago
    
    … or just a valid move checker that prompts it to try again if it fails to make a valid move.
emptybits 2 days ago

Why consume when you can create?
Building is often more fun and almost always more educational than just using. Even in “failure”. :-)
ziddoap 2 days ago

Fun, a learning experience, practicing a skill set, curiosity, etc.
Take your pick.
janalsncm 2 days ago

Your question sounds flippant but it’s actually quite deep. Why aren’t LLMs good at chess? The answer is likely that to be good at chess at some level requires search (and probably some ordering constraint on your evaluation function too that I’m not smart enough to figure out).
LLMs aren’t searching. They are memorizing. This explains their extremely poor performance on out of domain positions, whereas Stockfish can easily understand them.
- deadbabe 2 days ago
  
  I think chess is a great example to use to demonstrate that LLMs don’t actually know anything, and will be entirely unsuitable to replace humans on anything except thoroughly solved and documented problems.

vikhack 2 days ago

I don't really know how to do that but I'm really putting in my effort