
Using LC0's WDL to Analyze Games

@RwSF75 said in #20:
> - Stockfish has used its own games since the WDL output was introduced in the engine.

Where is it explained how they relate that output to the outcomes of their own games?

I thought that, in the first place, the WDL output was based on a conversion curve fitted on another population of games, such as lichess top games.

Is there now a parameter search to make their internal scoring** behave as a WDL predictor for their own games? Something would be floating otherwise, to my blurred mind (blurred, or seeing it from afar), though the appearance of pulling oneself up by one's own bootstraps can be misleading.

I actually think that in all WDL approaches, Lichess's and previous SF's, using populations of games is indeed an improvement on the limited centipawn point of view, but some scrutiny of the assumptions that allow the conversions or relations should be made explicit and explained.

How can a position score within a game completely determine the odds from then on? (I guess the constraint on the input data set, human ratings above a threshold, helps; but then the question of position-space coverage might come in, which LC0 would have somehow addressed through at least some wider exploration early in its self-play training schedule, in its compromise between exploration and "exploitation".)

I also had that question with the Maia paper (Figure 11, back then), which showed how powerful lichess data is when taken over all qualities of play. As is often the case with chess data displays, it lacked any notion of error bars, or I could not find them. I was also not clear on their definition of position complexity. I have a tendency to watch for outer information flow in data analysis, and for other constructs that seem floating and hence likely carry hidden (or obvious-to-the-author) assumptions, or even dissonance.

But I can be wrong about that. I welcome alternative points of view (even questions), so that I am not myself floating forever. As above, I may be asking too much, and not writing it well. But not asking won't give me a chance. Thanks for the above.

** a scoring which is not based on game outcomes directly, but on some loop of NNUE (master?) training with SF itself as a moderate-depth search oracle --- possibly old news, as is all of this post.
@Periastron said in #16:
> ...
> we'd still have some code to write to scan through the game, record the scores and plot them using a nice graphing lib to get these cool graphs , correct?
>
> If so - is there any source available to get started? I can program in Python, but am not familiar with the libs so this would be very helpful.
>

No need for any effort anymore - I drafted a functional version myself. If I can polish the python code enough to become world-visible AND if I find a way to show python code here, I'll share it in this thread.
@Periastron said in #22:
> No need for any effort anymore - I drafted a functional version myself. If I can polish the python code enough to become world-visible AND if I find a way to show python code here, I'll share it in this thread.

OK, here we go. Some remarks on the simple code below, which I consider a kind of template for you to build on. It works, but it is definitely not a production-ready application (your requirements will differ from mine anyway).
- Handles only the first game in a PGN.
- Requires the engine to be configured to send WDL over UCI; an example config for LC0 is in #16.
- Built and exclusively tested with Python 3.12.2, python-chess 1.10.0, matplotlib 3.8.3
- Python has significant whitespace, of course; the indented lines below use four spaces. (The forum software tends to eat leading spaces, so check the indentation after copy-pasting.)
- The PGN standard requires ISO-8859-1 encoding (compare http://www.saremba.de/chessgml/standards/pgn/pgn-complete.htm ). ChessBase, however, apparently exports UTF-8, so that encoding has to be specified with "open". Works with lichess' export format, too. YMMV.
- When adapting the strings for the PGN file and the engine path, use forward slashes instead of backslashes, even on Windows (like I did).
- The code may not be all that Pythonic; I haven't coded in a while. Feel free to publish a cooler version if you like (not necessarily here, but please leave a link here).
- Last but not least: I can't provide coding/fixit support. If required, please refer to the usual forums.

import chess.pgn
import chess.engine

import matplotlib.pyplot

# Read the first game from the PGN file (adapt the path).
pgn = open("Q:/Schach/PartieAnalyseWork/Testpartie.pgn", encoding="UTF-8")
game = chess.pgn.read_game(pgn)

print(game)

# Start the engine (adapt the path). WDL output must be enabled in the engine config.
engine = chess.engine.SimpleEngine.popen_uci("C:/Program Files/ChessBase/Engines.x64/Leela-0.30.0-CUDA/lc0.exe")
board = game.board()

wdlWhiteWins, wdlDraw, wdlBlackWins, moveAxis = [], [], [], []
ply = 0

for move in game.mainline_moves():
    info = engine.analyse(board, chess.engine.Limit(time=0.3))
    # info["wdl"] holds per-mille W/D/L figures; convert to fractions from White's view.
    wdlWhiteWins.append(info["wdl"].white().wins / 1000)
    wdlDraw.append(info["wdl"].white().draws / 1000)
    wdlBlackWins.append(info["wdl"].white().losses / 1000)
    moveAxis.append(1 + ply / 2)

    board.push(move)
    ply += 1

engine.quit()

matplotlib.pyplot.stackplot(moveAxis, wdlWhiteWins, wdlDraw, wdlBlackWins, colors=["#b0b0b0", "#606060", "#010101"])
matplotlib.pyplot.show()
> @RwSF75 said in #3:
> This means that it is damped by this model, which says that a player needs an eval of +3 to have a 50% chance of winning.
@RwSF75 said in #5:
> rawWinningChances(300) returns 50%

*rawWinningChances(300) returns 0.5. You assume that this is a percentage but it is not. The scale is -1 to +1. If you multiply by 50 to get [-50 to + 50] and add 50 to get [0,100], you will get the percentage you desire, which is about 75%. As can be verified with the graph on lichess.org/page/accuracy.

And to confirm, the eval chart and the various eval bars all use winning chances, not raw centipawns. It's more intuitive. Evals of [+4, +6, +8] have winning chances of [81%, 90%, 95%]. You _feel_ like +6 is closer to +8 than +4 even though they all have a difference of +2.
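For reference, the curve behind rawWinningChances is a logistic in the centipawn eval. A sketch of the conversion (the multiplier is the one published on lichess.org/page/accuracy; the function names here are my own, not lichess' actual code):

```python
import math

def raw_winning_chances(cp: float) -> float:
    """Raw winning chances on a -1..+1 scale, from White's point of view."""
    # Multiplier from lichess.org/page/accuracy:
    # Win% = 50 + 50 * (2 / (1 + exp(-0.00368208 * cp)) - 1)
    return 2 / (1 + math.exp(-0.00368208 * cp)) - 1

def winning_percent(cp: float) -> float:
    """Rescale the -1..+1 value to a 0..100 percentage."""
    return 50 + 50 * raw_winning_chances(cp)

print(raw_winning_chances(300))  # ~0.50 on the -1..+1 scale
print(winning_percent(300))      # ~75
```

This reproduces the numbers in the posts above: an eval of +300 cp gives roughly 0.5 on the raw scale, which is about 75% after rescaling, while 0 cp maps to exactly 50%.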
@ItsGam3Tyme said in #25:
> *rawWinningChances(300) returns 0.5. You assume that this is a percentage but it is not. The scale is -1 to +1. If you multiply by 50 to get [-50 to + 50] and add 50 to get [0,100], you will get the percentage you desire, which is about 75%. As can be verified with the graph on lichess.org/page/accuracy.

I don't know what to tell you; 0.5 looks a lot like a percentage. If it reaches 1, it means White has a 100% winning chance, and if it's -1, Black has 100%.
In the chart, at 0 cp the value is 50%, so 75% is not the winning chance, it's the expected game score.
Either way the point is the same: using Lichess's model you need +3 for a 75% expected score, while using Stockfish's you need +1.
@GnocchiPup
thanks for the link: github.com/official-stockfish/WDL_model
This helped with the floating impression I had, although it means they could drop the word "centipawn" completely now. But it is well written: it wraps up the implementation information flow well enough for me that I don't even need to read the details to understand what matters in terms of chess-land interpretation.

> Stockfish's "centipawn" evaluation is decoupled from the classical value of a pawn, and is calibrated such that an advantage of "100 centipawns" means the engine has a 50% probability to win from this position in selfplay at fishtest LTC time control.

It seems to me that once one has such an adjustable internal scoring, rescaled that way, it can follow the evolution of the engine and of its also-evolving niche of other engines (by evolving I do not necessarily mean progress, even if Elo increases, but let's hope there is enough new "blood" in the pool over time, as when A0 appeared in the kingdom of SF8 and LC0 kept following that injection of new positions). The internal scale is then allowed to range over whatever gamut is needed to keep discerning new positions, as the pool of engines keeps serving each other new positions, so that one day, or in the limit, the whole world of positions would be covered (or could be covered). It used to ramp up; now I would guess it might have to consider resolution improvements instead.
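To illustrate what "calibrated so that 100 centipawns means a 50% selfplay win probability" looks like, here is a toy logistic model. The logistic shape matches the family the WDL_model repo fits, but the scale parameter below is made up for illustration and is not Stockfish's fitted value (the real model's parameters also depend on material and move number):

```python
import math

def win_probability(cp: float, anchor: float = 100.0, scale: float = 90.0) -> float:
    """Toy logistic win model, pinned so that cp == anchor gives exactly 50%.

    anchor and scale are illustrative numbers, NOT Stockfish's fitted parameters.
    """
    return 1 / (1 + math.exp((anchor - cp) / scale))

print(win_probability(100))  # 0.5 by construction: "+1.00" = 50% selfplay win probability
```

The point of the calibration is exactly this pinning: whatever the engine's raw internal units do as the engine evolves, the reported scale is re-anchored so that +1.00 always reads as a 50% selfplay win probability.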

I just wonder if there is a relation between the internal score of a position at a certain depth and the odds a moderate depth further on. Perhaps the position itself carries the depth information in a probabilistic way. Well, in reasonable chess, that is; with legal chess I would not be so sure. But wait, SF is still an exhaustive search engine.

Am I running in circles? It would not surprise me.
As these graphs tend to look a bit twitchy, I was wondering, if it makes more sense to just scale draw to a constant (and discarding it) and have white's winning "chances" above the x-axis and black's below.
While this loses the insight of actual probabilities, this might be easier to read visually, right?
Also, I think there is still much more information computers could provide that WDL lacks, like how difficult it is to play for the best result in a given position. The computer might find a sequence of seven only-moves to keep the game even and just say the game is even. This sounds especially helpful in opening preparation; it actually seems to me that pros have resources like this, because there is a tendency to play openings where the opponent might get into a better position, but where the path to get there is rather unlikely to be found.
@affstein said in #29:
> As these graphs tend to look a bit twitchy, I was wondering, if it makes more sense to just scale draw to a constant (and discarding it) and have white's winning "chances" above the x-axis and black's below.
> While this loses the insight of actual probabilities, this might be easier to read visually, right?

I'm not sure what you mean... something like the 3rd chart here? imgur.com/a/gUmscVD
I think if you need a chart to make it easier to read visually then the expected score chart (2nd) is fine and conveys basically the same information.
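For completeness, the expected-score reduction is a one-liner on top of the W/D/L lists from the script in #23: expected score for White is P(win) + 0.5 * P(draw), so 0.5 marks a balanced position. A sketch (the W/D/L fractions here are made-up numbers; the script in #23 produces real ones):

```python
# Expected score for White from W/D/L fractions: P(win) + 0.5 * P(draw).
# Example per-position fractions (made-up numbers for illustration).
wdlWhiteWins = [0.30, 0.35, 0.55, 0.70]
wdlDraw      = [0.50, 0.45, 0.35, 0.20]

expectedScore = [w + 0.5 * d for w, d in zip(wdlWhiteWins, wdlDraw)]
print(expectedScore)  # approximately [0.55, 0.575, 0.725, 0.8]

# To get the 2nd chart, plot this against moveAxis, e.g.:
# matplotlib.pyplot.plot(moveAxis, expectedScore)
# matplotlib.pyplot.axhline(0.5, linestyle="--")  # the balance line
```

This discards the draw share as a separate band, as suggested in #29, while keeping a single curve that sits above 0.5 when White is better and below when Black is.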