lichess.org

Using LC0's WDL to Analyze Games

Are the WDL graph colors the wrong way around? For example, in the Carlsen-Vidit game the blue area decreases over time, which would correspond to blue meaning White's win chance.
@jaxu said in #11:
> Are the WDL graph colors the wrong way around? For example, in the Carlsen-Vidit game the blue area decreases over time, which would correspond to blue meaning White's win chance.

You are correct, I wrote it the wrong way around in my explanation. I've updated the post, thank you.
BTW, Stockfish 16 also has a parameter to output WDL values. I used it to create a visual bar in the Move Assistant feature of LiChess Tools. I actually find it more interesting than the eval-based border color. Yet I can't completely trust the statistical model that generates the values. I mean, they are completely made up, estimations based on some database of past games.
@jk_182 said in #13:
> I used python-chess to analyse the games. At the end of this blog post lczero.org/blog/2020/04/wdl-head/ there is an explanation of how to get the WDL when analysing with Leela.

Got it, and I succeeded in enabling WDL output plus WDL Elo calibration. The lc0.config snippet follows:
--show-wdl=true
--wdl-calibration-elo=2300
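
With WDL output enabled, the engine's UCI `info` lines should carry a `wdl W D L` field in per-mille, from the side to move. A minimal sketch for pulling that triple out of a raw line; `parse_wdl` is a hypothetical helper name and the sample line is made up for illustration, not real lc0 output:

```python
from typing import Optional, Tuple


def parse_wdl(info_line: str) -> Optional[Tuple[int, int, int]]:
    """Return (win, draw, loss) in per-mille from a UCI info line, or None."""
    tokens = info_line.split()
    if not tokens or tokens[0] != "info" or "wdl" not in tokens:
        return None
    i = tokens.index("wdl")
    try:
        # The three integers right after "wdl" are win, draw and loss.
        win, draw, loss = (int(t) for t in tokens[i + 1:i + 4])
    except ValueError:
        # Fewer than three values, or something that is not an integer.
        return None
    return win, draw, loss


# Made-up sample line, for illustration only.
sample = "info depth 12 nodes 8000 score cp 35 wdl 412 560 28 pv e2e4 e7e5"
print(parse_wdl(sample))
```

Since the values are relative to the side to move, win and loss have to be swapped on Black's turns to get a White-centred graph.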

If the python-chess you are referring to is this one:
pypi.org/project/python-chess/
we'd still have some code to write to scan through the game, record the scores and plot them using a nice graphing lib to get these cool graphs, correct?

If so - is there any source available to get started? I can program in Python, but am not familiar with the libs so this would be very helpful.

THANKS a lot for the insightful article in the first place, and also for any code you could point us to.
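
A possible starting point, as a rough sketch rather than the blog author's actual code: it walks the main line of the first game in a PGN with python-chess, asks the engine for an analysis of each position, and records the WDL from White's point of view. It assumes the engine is already configured to report WDL (as with the lc0.config lines above); `as_fractions`, `collect_wdl`, and the default node count are my own choices for illustration.

```python
from typing import List, Tuple


def as_fractions(wins: int, draws: int, losses: int) -> Tuple[float, float, float]:
    """Normalise a WDL triple (usually per-mille) to fractions summing to 1."""
    total = wins + draws + losses
    if total == 0:
        raise ValueError("empty WDL triple")
    return wins / total, draws / total, losses / total


def collect_wdl(pgn_path: str, engine_path: str,
                nodes: int = 10_000) -> List[Tuple[float, float, float]]:
    """Return White-POV (win, draw, loss) fractions for each analysed position."""
    # Imported here so as_fractions() stays usable without python-chess installed.
    import chess.engine
    import chess.pgn

    with open(pgn_path, encoding="utf-8") as handle:
        game = chess.pgn.read_game(handle)
    if game is None:
        raise ValueError("no game found in PGN")

    rows = []
    board = game.board()
    engine = chess.engine.SimpleEngine.popen_uci(engine_path)
    try:
        for move in game.mainline_moves():
            board.push(move)
            if board.is_game_over():
                break  # nothing left to analyse
            info = engine.analyse(board, chess.engine.Limit(nodes=nodes))
            wdl = info.get("wdl")  # None if the engine did not report WDL
            if wdl is not None:
                white = wdl.white()  # always from White's point of view
                rows.append(as_fractions(white.wins, white.draws, white.losses))
    finally:
        engine.quit()
    return rows
```

The three columns of the result can then be handed to any stacked-area plot, for example matplotlib's `stackplot`, with the move index on the x-axis.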
@TotalNoob69 said in #14:
> BTW, Stockfish 16 also has a parameter to output WDL values. I used it to create a visual bar in the Move Assistant feature of LiChess Tools. I actually find it more interesting than the eval-based border color. Yet I can't completely trust the statistical model that generates the values. I mean, they are completely made up, estimations based on some database of past games.

This seems contradictory. How can it be "completely made up" when it's based on the results of millions of recent games?
If you see that White wins 95% of the games in which the eval was +1.5, then outputting a 95% chance of winning for White at that eval seems logical.
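
To make that idea concrete, here is a toy sketch of the counting step: group (eval, result) samples into eval buckets and report the observed W/D/L frequencies per bucket. The data is invented to reproduce the 95% figure, and `empirical_wdl` and the 50 cp bucket width are hypothetical choices; a real calibration would fit a smooth model rather than use raw frequencies.

```python
from collections import defaultdict
from typing import Dict, Iterable, Tuple

RESULT_INDEX = {"1-0": 0, "1/2-1/2": 1, "0-1": 2}


def empirical_wdl(samples: Iterable[Tuple[int, str]],
                  bucket_cp: int = 50) -> Dict[int, Tuple[float, ...]]:
    """Map each eval bucket (centipawns) to observed (win, draw, loss) frequencies."""
    counts = defaultdict(lambda: [0, 0, 0])
    for eval_cp, result in samples:
        # Snap the eval to the nearest multiple of bucket_cp.
        bucket = round(eval_cp / bucket_cp) * bucket_cp
        counts[bucket][RESULT_INDEX[result]] += 1
    return {
        bucket: tuple(c / sum(cs) for c in cs)
        for bucket, cs in sorted(counts.items())
    }


# Invented data: 20 games at +1.5, 19 White wins and 1 draw.
toy_samples = [(150, "1-0")] * 19 + [(150, "1/2-1/2")]
print(empirical_wdl(toy_samples))
```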
OK, maybe I was too blunt in expressing the idea. They are not made up, they are derived statistically. What I meant is that there is a difference between WDL records, i.e. how many actual games from a position were won, drawn or lost, and estimations based on other games and other positions.

You have to take the statistical estimation with a grain of salt. Even if SF gives two positions the exact same evaluation, it doesn't mean the positions are the same. It's the difference between a "losing position based on SF" and a gambit that wins a lot in human games.

Even mathematically, you are translating one numeric value into three other values; there is bound to be some information loss. Also note that in the SF engine the WDL setting is boolean; I didn't see any Elo level setting like the one for LC0. Obviously, players with different ratings would have different WDL ratios from the same position.
Both lichess and SF are using lichess or equivalent human databases to compute their WDL conversion curves, with some statistical assumptions built in. LC0 uses its own games, and directly uses WDL outcomes in its position evaluation and policy probabilities.

This might be old news, but it is still my current understanding. Anyone is welcome to correct this point of view. I just gave up trying to figure out more from the source code or near-source-code documentation. This does not mean it has not improved over the past 3 years. But here is a good place to update each other, thanks to the blog author's judicious efforts.
@dboing said in #19:
> Both lichess and SF are using lichess or equivalent human databases to compute their WDL conversion curves, with some statistical assumptions built in. LC0 uses its own games, and directly uses WDL outcomes in its position evaluation and policy probabilities.
>
> This might be old news, but it is still my current understanding. Anyone is welcome to correct this point of view. I just gave up trying to figure out more from the source code or near-source-code documentation. This does not mean it has not improved over the past 3 years. But here is a good place to update each other, thanks to the blog author's judicious efforts.

- Lichess uses 2300+ Elo rated rapid games from June 2022.
github.com/lichess-org/lila/pull/11148

- Stockfish has used its own games since the WDL output was introduced in the engine. For the calibration of SF 16.1, 2.1M games (130M positions) were used.
github.com/official-stockfish/Stockfish/commit/1100688
github.com/official-stockfish/Stockfish/commit/5c2b385
github.com/official-stockfish/Stockfish/wiki/UCI-&-Commands#setoption (UCI_ShowWDL)
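
For the lichess side, the conversion can be sketched as a single logistic curve from centipawns to a win percentage. The coefficient below is the one I recall being quoted on lichess's accuracy page as coming out of that PR, so treat it as an assumption and check it against the PR; `lichess_win_percent` is a hypothetical name. Note that this gives one number (expected win chance), not a full W/D/L triple.

```python
import math

# Assumed coefficient, as recalled from lichess's accuracy page; verify before use.
LICHESS_COEFF = 0.00368208


def lichess_win_percent(centipawns: float) -> float:
    """Map a centipawn eval (White's point of view) to a win percentage in (0, 100)."""
    return 50 + 50 * (2 / (1 + math.exp(-LICHESS_COEFF * centipawns)) - 1)


for cp in (-300, 0, 150, 300):
    print(cp, round(lichess_win_percent(cp), 1))
```

By construction the curve is symmetric: an eval of 0 maps to exactly 50%, and +x and -x always add up to 100%.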