lichess.org

Using LC0's WDL to Analyze Games

@RwSF75 said in #30:
> I'm not sure what you mean... something like the 3rd chart here? imgur.com/a/gUmscVD
> I think if you need a chart to make it easier to read visually then the expected score chart (2nd) is fine and conveys basically the same information.
Pretty much. What I tried to describe was your 3rd chart, but with the value at each move divided by the drawing probability. A minor difference, and I think I like your 3rd chart better anyway.
The advantage over WDL is that you can very easily compare white and black winning probability (as distance above/below the x-axis).
Based on the 3rd chart, I also came to the conclusion that it should use bars instead of lines: just before move 40, where the evaluation flips, the graph makes it look as if there is a point where both sides have winning chances, but this does not seem to be the case.
Actually, I think this is the point where I disagree with "score chart [...] conveys basically the same information". Your example lacks a situation where both White and Black have decent winning chances. This is the whole point of alternative evaluations like WDL as the probability of the game ending in a draw and who is in favor are two theoretically independent pieces of information, which cannot be conveyed by just one graph-line.
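To make the bar idea concrete, here is a minimal sketch of how the bar heights could be computed, assuming each move's evaluation is a (win, draw, loss) triple from White's point of view in any consistent unit (percent or permille). The name signed_bars and the sample numbers are made up for illustration; this is not the code behind the charts in the blog.

```python
from typing import List, Tuple


def signed_bars(wdl_per_move: List[Tuple[float, float, float]]) -> List[Tuple[float, float]]:
    """Return one (white_bar, black_bar) pair per move.

    white_bar is White's win probability, drawn above the x-axis.
    black_bar is Black's win probability, negated so it is drawn below it.
    The draw probability is whatever is left over and is not drawn at all.
    """
    bars = []
    for win, draw, loss in wdl_per_move:
        total = win + draw + loss  # normalise, so percent and permille both work
        bars.append((win / total, -loss / total))
    return bars


# Hypothetical evaluations for three consecutive moves (percent).
example = [(10, 80, 10), (40, 20, 40), (30, 60, 10)]
bars = signed_bars(example)
```

With separate bars per move, a position where both sides have real winning chances shows a tall bar on both sides of the axis, while a flip in the evaluation shows bars switching sides without an interpolated in-between point.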
@affstein said in #31:
> This is the whole point of alternative evaluations like WDL as the probability of the game ending in a draw and who is in favor are two theoretically independent pieces of information, which cannot be conveyed by just one graph-line.

Well said, and I think that is what the blog meant in its first paragraphs**: WDL is more informative than centipawns. The issue is not only the centipawn as a unit of measure, but that it is only one dimension of evaluation, a single number or quantity.

But when focused on winning, in performance mode, some people neglect that the "physics" of chess can be learned or explored at any crossing of a boundary between outcome classes. The room in between might also be worth experiencing, to get a finer sense of how big chess can be. The complement of sharpness is also something to learn; it teaches what sharpness might be.

3 outcomes. 2 boundaries. 3 "regions". And then the middle region: yuck, drawish... Hmm, I think study mode might benefit from changing that aesthetic.

** It was not my first thought that it would include population information (games, player pairs).
@Periastron please use GitHub or at least pastebin for code! don't subject us to pasted in code as a comment
@affstein
Lichess already does this. Draws are divided evenly between W% and B%. It looks directly correlated with centipawns, though, because the basis for their score rate is the historical 2300+ rapid results for a given centipawn score.

However, WDL as done by Lc0 is different as these are playouts from any given position.

So a sharp mutually difficult line would appear to have a very narrow draw area and large White and Black areas.

If I understand your proposal correctly, 10 80 10 would be graphed the same as 40 20 40? But it seems to me that 40 20 40 implies the position is sharp and more difficult, more so if it extends for several moves.
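A quick sketch of the arithmetic behind that question, assuming the expected score is computed by splitting draws evenly (win + draw/2). The helper names are hypothetical, and the two triples are just the ones from this post.

```python
def expected_score(win: float, draw: float, loss: float) -> float:
    """Expected score for White, with draws split evenly between both sides."""
    total = win + draw + loss
    return (win + 0.5 * draw) / total


def decisive_share(win: float, draw: float, loss: float) -> float:
    """Probability that the game does not end in a draw."""
    total = win + draw + loss
    return (win + loss) / total


quiet = (10, 80, 10)   # mostly drawn
sharp = (40, 20, 40)   # either side can win

quiet_score = expected_score(*quiet)   # 0.5
sharp_score = expected_score(*sharp)   # 0.5
quiet_decisive = decisive_share(*quiet)  # 0.2
sharp_decisive = decisive_share(*sharp)  # 0.8
```

Both positions land on the same point of a single expected-score line, while the share of decisive outcomes differs by a factor of four. That second number is the information a one-line graph cannot carry.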
@jaxu said in #34:
> @Periastron please use GitHub or at least pastebin for code! don't subject us to pasted in code as a comment
If your comment targets the usability of the code provided this way: you can safely ignore it if you're brilliant enough so that writing it yourself is more efficient than getting the format right.

If it targets cluttering a discussion forum with code: the author provided cool graphs with no initial indication of how to get them. That's totally fine and a valuable contribution, but the question raised shortly after:

@GnocchiPup said in #8:
> How were the WDL graphs made? Nibbled?
is obviously legitimate. At least three participants were interested even after the author stated that coding is required. I provided the answer in place, which is clearly on-topic. It has the downside of cluttering the forum with around 30 lines of code. It has the upside that anyone who reads this blogpost can find it, even years after - even if pastebin/whatever, link stability, my account there, or myself are no longer alive by then.

True, the post is useless for strict non-coders, so they need to scroll past some lines where it would otherwise not have been required. But even for those with little coding skill, it gives an indication of how easy chess programming has become these days. And for those interested, it saves significant work. For me, this was a one-time thing, and I clearly stated that I cannot provide support in the forum. I also encouraged followups (which may lead to this code snippet turning into a multi-person project) to choose a different location.

My judgement is: the value added outweighs the downsides. And because I add net value, the choice of "how" should be mine in the first place.
@GnocchiPup said in #35:
> However, WDL as done by Lc0 is different as these are playouts from any given position.

Within the self-play RL batches during training, no? Those might involve more exploration in the first batches and, in my possibly wrong understanding, more of the "any" notion in early positions than in late positions (as there might be more early game termination in the more exploratory batches).

But I do agree that high-level humans might not be very exploratory (and SF might not be either).

However, setting aside what we know by other means, the mechanics after the WDL conversion might be the same.

But you make a point; I only added a nuance, and I am asking whether that nuance makes sense to you.
@dboing

Take this with a grain of salt.

But here's how I understand a0 and Lc0 to implement search.

Self play to generate the NN. This provides the engine with the bias on which positions look good, and which to search first.

During the actual game, self-play Monte Carlo search. This is where its WDL comes from, rather than from an a priori value. In theory an SF 0.0 would produce a WDL of 0 100 0, but the same position could look very different for Lc0. It could be 10 80 10, or it could be 30 60 10. In both cases best play is a draw, but the second case is harder for Black. I have a position in mind to test this and might do a post in the future.

This is how I understand how they implement it.