

For over a century, the game of chess was considered the ultimate benchmark for measuring intelligence. The belief was that if a machine could excel at chess, it would mimic human thought processes. Chess was widely regarded as a complex game with countless combinations, and the creation of a machine capable of mastering it seemed like an insurmountable task. However, the way chess was eventually conquered by machines offers valuable insights into the development and understanding of current large language models (LLMs).
The Chess Conundrum
As a kid, I spent countless hours playing and mastering chess, and at one point I ranked in the top 1% on MSN Chess. My approach was rather simple: play lots of games against players ranked higher than me. For about a year, I spent roughly eight hours a day doing exactly that. In terms of MSN rankings, the brute-force approach to learning worked out. But, to borrow a math analogy, I achieved only a local maximum: I became a top player within a “local” community of players, by no means a top player in the “global” community. I maybe would have made it to the top 10%. Maybe. It was not an optimal way to become a good chess player, and frankly, a career in chess was not for me.
There are, of course, other approaches to getting good at chess. One is studying combinations from top players: basically, memorizing as many games and lines as possible. But lots of people memorize those combinations and still do not come close to the level of world champions such as Ding Liren, Ian Nepomniachtchi, or Garry Kasparov. There is an element of raw intelligence (some would even say “specialized intelligence”) for the game of chess that you need to reach that level.
On the subject of raw intelligence, I have one striking memory: re-watching Kasparov’s game against the supercomputer. At the time, the chess community believed that only artificial intelligence could truly master the game, given the vast number of possible combinations (a number with 48 zeros). When IBM’s Deep Blue defeated the world champion Kasparov in 1997, it seemed as if artificial intelligence had arrived, finally possessing a level of raw intelligence greater than that of the world champion.
The Misconception About Chess and AI
As it turned out, the approach used to defeat Kasparov was not based on replicating human thought processes or acquiring that “raw intelligence”. Instead of creating a machine that was good at winning, the task was to create a machine that was good at not losing.
Deep Blue leveraged a form of the minimax method, which relies on brute-force computing power. To win, this method does NOT require calculating all possible combinations, only a few moves deeper than what a human can calculate independently.
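For intuition, here is a minimal sketch of depth-limited minimax in Python. It plays a toy take-away game (each player removes 1 to 3 stones; whoever takes the last stone wins) rather than chess, and it is only an illustration of the general technique, not Deep Blue’s implementation, which added alpha-beta pruning, hand-tuned evaluation, and custom hardware. The game and the function names are my own stand-ins.

```python
def minimax(stones, depth, maximizing):
    """Score a position by searching `depth` plies ahead, assuming both
    sides play optimally within that horizon -- "good at not losing"."""
    if stones == 0:
        # The previous player took the last stone: the side to move lost.
        return -1 if maximizing else 1
    if depth == 0:
        return 0  # Horizon reached: fall back to a neutral heuristic score.
    moves = [m for m in (1, 2, 3) if m <= stones]
    scores = [minimax(stones - m, depth - 1, not maximizing) for m in moves]
    return max(scores) if maximizing else min(scores)

def best_move(stones, depth=12):
    """Pick the move with the highest minimax score for the side to move."""
    moves = [m for m in (1, 2, 3) if m <= stones]
    return max(moves, key=lambda m: minimax(stones - m, depth - 1, False))

print(best_move(10))  # 2 -- leaves 8 stones, a losing position for the opponent
```

The point is the depth parameter: the search never enumerates the whole game tree; it just looks a fixed number of moves further ahead than the opponent can, which is the “more horsepower” story below.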
To use my MSN analogy, Deep Blue found a local maximum in which it was able to beat Kasparov. It never mastered a way to become “good” at chess; instead, it relied on putting more horsepower behind a brute-force method, not that different from my own. The victory of Deep Blue over Kasparov was not a testament to true raw (artificial) intelligence but rather an example of exceptional computing power.
Lessons from Chess for Large Language Models
The success and capabilities of large language models (LLMs) have led many to believe that language is a better proxy for human thought processes than chess.
Consequently, some may argue that mastering language through LLMs signifies the achievement of true artificial intelligence. However, the lesson from chess serves as a cautionary tale. Just like the brute force approach in chess, LLMs may be avoiding losing rather than truly winning. The question then arises: Are LLMs genuinely replicating human thought processes, or are they just efficiently leveraging computational power (and storage)?
Why would this matter? I don’t know about you, but when I go to a doctor or look up driving directions today, I naturally expect the best possible outcome, personalized to me. So if we assume AI is going to be integrated into our everyday lives, then we should actually care:
whether we are given the shortest estimated driving route or a slightly-above-average one,
whether our health diagnosis reflects the best possible assessment for us or one normalized to the averages of similar groups.
One way people have recently been testing LLMs is with SAT and GMAT exams. The results are phenomenal, but perhaps a bit overblown. It has been a while, but as I recall, a major winning strategy for both tests, especially in the essay writing section, is a min-max strategy of avoiding mistakes.