Diminishing Returns and Values of Positions

9.3 Diminishing Returns and Values of Positions

9.3.1 CRAFTY Goes Deep

Several researchers used CRAFTY for their go-deep experiments. However, none had such a large set of test positions at their disposal as we had (over 40,000 positions).

Steenhuisen (2005) observed the deep-search behavior of CRAFTY on different test sets and reported that both the Best Change rates and their decreases differed between test sets. This and the following section will show that Best Change rates strongly depend on the values of the positions included in a test set.

As described in Section 8.2, to determine the best available approximation of the utility value of each analyzed position, the backed-up evaluation at the deepest search depth served as an “oracle”. We devised six different groups of positions based on their estimated utility values, as given in Table 9.1. In the usual terms of chess players, the positions of Groups 1 and 6 could be labeled as positions with a “decisive advantage” and the positions of Groups 2 and 5 as positions with a “large advantage”, while Groups 3 and 4 consist of positions regarded as approximately equal or with a “small advantage” at most.

Table 9.1: Number of positions in each of the six groups of data. The groups were devised based on backed-up heuristic evaluation values obtained at a search depth of 12 plies (CRAFTY).

Group            1       2         3        4        5       6
Evaluation (x)   x<-2    -2≤x<-1   -1≤x<0   0≤x<1    1≤x<2   x≥2
CRAFTY           4,011   3,571     10,169   18,038   6,008   6,203
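The grouping rule of Table 9.1 can be sketched in a few lines of Python. The function names `group_of` and `count_groups` and the sample evaluations are hypothetical; only the interval boundaries come from the table.

```python
from bisect import bisect_right
from collections import Counter

# Boundaries between Groups 1..6 from Table 9.1; each interval is
# closed on the left and open on the right (e.g. Group 4 is 0 <= x < 1).
BOUNDARIES = [-2.0, -1.0, 0.0, 1.0, 2.0]


def group_of(evaluation: float) -> int:
    """Return the group number (1-6) for a backed-up evaluation."""
    return bisect_right(BOUNDARIES, evaluation) + 1


def count_groups(evaluations):
    """Count how many evaluations fall into each group."""
    return Counter(group_of(x) for x in evaluations)


# Hypothetical 12-ply evaluations, for illustration only.
sample = [-3.10, -2.00, -0.45, 0.00, 0.37, 1.00, 2.00, 4.80]
counts = count_groups(sample)
```

Using `bisect_right` places a value that equals a boundary in the higher group, which matches the "≤" on the left side of each interval.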

The results for each of the six groups are presented in Figure 9.1. The curves clearly show that the deep-search behavior of the program differs between the groups, depending on the estimated value of the positions they consist of. The chance of new best moves being discovered at higher depths is significantly higher for balanced positions than for positions with a decisive advantage. It is interesting to observe that this phenomenon does not yet occur at the shallowest search depths, while in the results of RYBKA it manifests itself at each level of search (see Section 9.3.2).
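As an illustration of how such go-deep statistics might be computed, the sketch below derives the Best Change rate and its breakdown from the best move recorded at each search depth. The definitions used here are assumptions, not taken from this section: a Best Change at depth d means the best move differs from the one at depth d-1; a change counts as Fresh Best if the new move was never best at any shallower depth, and otherwise as (d-k) Best for the most recent shallower depth d-k at which it was best. The function name `go_deep_stats`, the starting depth of 2, and the toy data are likewise hypothetical.

```python
from collections import Counter


def go_deep_stats(positions, depth, min_depth=2):
    """Best Change rate and its breakdown at one search depth.

    positions: list of dicts mapping search depth -> best move found there.
    Returns (best_change_percent, breakdown), where breakdown maps
    "fresh", "d-2", "d-3", ... to percentages of all best changes.
    """
    kinds = Counter()
    for best in positions:
        move = best[depth]
        if move == best[depth - 1]:
            continue  # same best move as one ply shallower: no change
        # Depths below depth-1 at which this move had already been best.
        earlier = [d for d in range(min_depth, depth - 1) if best[d] == move]
        if earlier:
            kinds[f"d-{depth - max(earlier)}"] += 1  # most recent one counts
        else:
            kinds["fresh"] += 1
    changes = sum(kinds.values())
    rate = 100.0 * changes / len(positions)
    breakdown = {k: 100.0 * v / changes for k, v in kinds.items()}
    return rate, breakdown


# Toy data: best move per depth for four made-up positions.
toy = [
    {2: "e4", 3: "e4", 4: "d4", 5: "e4"},    # change, last best at d-2
    {2: "Nf3", 3: "c4", 4: "c4", 5: "c4"},   # no change at depth 5
    {2: "a3", 3: "b3", 4: "c3", 5: "d3"},    # change, fresh best
    {2: "g3", 3: "h3", 4: "h4", 5: "g3"},    # change, last best at d-3
]
rate, breakdown = go_deep_stats(toy, depth=5)
```

Under these assumed definitions the categories are mutually exclusive, so the breakdown percentages sum to 100 whenever every shallower depth is covered by a category.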

Tables 9.2 and 9.3 show the results for Groups 4 and 6. The results resemble those obtained by Steenhuisen [Ste05] on the 4,500 positions in the sense that both Best Change and Fresh Best rates decrease consistently with increasing search depth; nevertheless, the rates differ significantly between the two groups of positions.

The 95% confidence bounds for Best Change (calculated using Equation 9.2) at the highest level of search performed for the samples of 18,038 and 6,203 positions of Groups 4 and 6 are [22.56;23.97] and [16.91;18.82], respectively.

Figure 9.1: Go-deep results of CRAFTY on the six different groups of positions.

Table 9.2: Results of CRAFTY for the 18,038 positions of Group 4.

Search   Best Change    Fresh Best   (d-2) Best   (d-3) Best   Mean
depth    in % (SE)      in %         in %         in %         evaluation
3        35.96 (0.36)   100.00       -            -            0.36
4        34.47 (0.35)   74.88        25.12        -            0.37
5        33.18 (0.35)   64.16        27.34        8.50         0.37
6        32.34 (0.35)   54.38        28.44        11.38        0.37
7        30.48 (0.34)   49.53        31.14        9.51         0.37
8        29.86 (0.34)   42.81        31.45        11.27        0.38
9        27.75 (0.33)   40.02        33.87        10.81        0.38
10       26.48 (0.33)   37.77        33.31        10.57        0.38
11       24.53 (0.32)   34.79        33.48        11.14        0.38
12       23.17 (0.31)   32.26        33.07        12.04        0.39

Table 9.3: Results of CRAFTY for the 6,203 positions of Group 6.

Search   Best Change    Fresh Best   (d-2) Best   (d-3) Best   Mean
depth    in % (SE)      in %         in %         in %         evaluation
3        37.42 (0.61)   100.00       -            -            2.64
4        32.27 (0.59)   73.93        26.07        -            2.76
5        30.13 (0.58)   64.85        24.83        10.33        2.84
6        26.60 (0.56)   55.70        28.06        9.70         2.95
7        26.21 (0.56)   49.88        27.37        10.52        3.04
8        23.99 (0.54)   39.92        31.18        11.02        3.17
9        22.44 (0.53)   37.21        32.18        12.72        3.29
10       20.47 (0.51)   36.30        30.79        11.50        3.42
11       18.30 (0.49)   31.37        32.42        12.07        3.54
12       17.85 (0.49)   29.27        29.99        13.91        3.68
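The SE values reported next to the Best Change rates are consistent with the standard error of a binomial proportion, sqrt(p(1-p)/n). The short check below assumes that this is how they were obtained; the section itself does not state the formula, and the function name `best_change_se` is hypothetical.

```python
from math import sqrt


def best_change_se(rate_percent: float, n: int) -> float:
    """Standard error (in percentage points) of a rate observed on n positions.

    Assumes the rate is a binomial proportion; this is an assumption,
    not a formula quoted from the text.
    """
    p = rate_percent / 100.0
    return 100.0 * sqrt(p * (1.0 - p) / n)


# Depth 3 and depth 12 for Group 4 (18,038 positions), depth 12 for Group 6.
se_group4_d3 = round(best_change_se(35.96, 18038), 2)
se_group4_d12 = round(best_change_se(23.17, 18038), 2)
se_group6_d12 = round(best_change_se(17.85, 6203), 2)
```

Rounded to two decimals, these three values agree with the SE entries reported for the corresponding rows of the CRAFTY tables for Groups 4 and 6.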

9.3.2 RYBKA Goes Deep

The results of the go-deep experiments with RYBKA not only confirm that Best Change rates depend on the values of the positions, but also demonstrate that the chance of new best moves being discovered at higher depths is lower at all depths than for CRAFTY, which is rated more than 300 rating points lower on the Swedish (SSDF) Rating List [Kar08]. Table 9.4 presents the subsets evaluated by RYBKA, analogous to those presented in Table 9.1 and evaluated by CRAFTY.

Table 9.4: Number of positions in each of the six groups of data. The groups were devised based on backed-up heuristic evaluation values obtained at a search depth of 12 plies (RYBKA).

Group            1       2         3        4        5       6
Evaluation (x)   x<-2    -2≤x<-1   -1≤x<0   0≤x<1    1≤x<2   x≥2
RYBKA₁₂          1,263   1,469     9,808    22,644   3,152   2,133

The results of RYBKA presented in Figure 9.2 resemble the results of CRAFTY in Figure 9.1, except that all the curves appear significantly lower on the vertical scale. This result seems to be in line with the observation, based on the results by Sadikov and Bratko [SB06], that the amount of knowledge a program has (or the quality of the evaluation function) influences the deep-search behavior of a program.

The big difference in strength between the two programs is likely to be the consequence of RYBKA having a stronger evaluation function; it is also commonly known that chess players prefer the evaluations of this program to CRAFTY’s evaluations. In their study, Sadikov and Bratko [SB06] claim that diminishing returns will start to manifest themselves earlier with a program that has a stronger evaluation function. Their claim is based on experiments performed on chess endgames, and they suspect that similar results would be obtained with more pieces on the board. The results presented here seem to be in accordance with that claim.

Figure 9.2: Go-deep results of RYBKA on the six different groups of positions.

Tables 9.5 and 9.6 (results of RYBKA) are the analogues of Tables 9.2 and 9.3 (results of CRAFTY). We will just briefly mention here that the mean evaluations of both programs in won positions increase monotonically with increasing search depth, which is in accordance with our findings presented in Chapter 8.


Table 9.5: Results of RYBKA for the 22,644 positions of Group 4.

Search   Best Change    Fresh Best   (d-2) Best   (d-3) Best   Mean
depth    in % (SE)      in %         in %         in %         evaluation
3        28.59 (0.30)   100.00       -            -            0.31
4        27.36 (0.30)   71.42        28.58        -            0.31
5        27.00 (0.30)   62.95        27.12        9.93         0.31
6        25.44 (0.29)   53.32        28.13        10.45        0.31
7        24.00 (0.28)   49.91        26.63        11.21        0.30
8        22.88 (0.28)   45.78        26.85        11.37        0.30
9        22.50 (0.28)   42.97        25.63        11.46        0.30
10       20.73 (0.27)   37.17        28.46        11.31        0.30
11       20.03 (0.27)   36.16        27.76        11.78        0.30
12       19.01 (0.26)   34.08        27.87        11.85        0.30

Table 9.6: Results of RYBKA for the 2,133 positions of Group 6.

Search   Best Change    Fresh Best   (d-2) Best   (d-3) Best   Mean
depth    in % (SE)      in %         in %         in %         evaluation
3        22.36 (0.90)   100.00       -            -            2.49
4        20.39 (0.87)   77.24        22.76        -            2.60
5        17.63 (0.83)   66.76        24.20        9.04         2.77
6        16.41 (0.80)   54.86        25.43        10.57        2.89
7        16.32 (0.80)   49.71        26.44        10.06        3.01
8        15.24 (0.78)   44.00        23.69        13.23        3.14
9        14.49 (0.76)   45.63        24.60        10.36        3.27
10       13.31 (0.74)   42.61        23.94        12.68        3.42
11       12.61 (0.72)   37.92        24.16        8.55         3.59
12       12.19 (0.71)   36.54        30.00        7.31         3.75

The 95% confidence bounds for Best Change at the highest level of search performed for the samples of 22,644 and 2,133 positions of Groups 4 and 6 are [18.51;19.53] and [10.87;13.65], respectively.
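Equation 9.2 is not reproduced in this section, so the interval formula in the sketch below is an assumption. It uses a Wilson score interval with z = 1.96, chosen only because it gives bounds close to the ones reported for RYBKA; the function name `wilson_interval` is hypothetical, and the inputs are the rounded percentages from the tables rather than the underlying counts.

```python
from math import sqrt


def wilson_interval(rate_percent: float, n: int, z: float = 1.96):
    """Wilson score interval (in percent) for a rate observed on n positions.

    Assumed stand-in for Equation 9.2, which is not shown in this section.
    """
    p = rate_percent / 100.0
    denom = 1.0 + z * z / n
    centre = (p + z * z / (2.0 * n)) / denom
    half = z * sqrt(p * (1.0 - p) / n + z * z / (4.0 * n * n)) / denom
    return 100.0 * (centre - half), 100.0 * (centre + half)


# Best Change at depth 12 for RYBKA, Group 4 and Group 6.
low4, high4 = wilson_interval(19.01, 22644)
low6, high6 = wilson_interval(12.19, 2133)
```

With the rounded inputs, both intervals land within about 0.01 percentage points of the reported bounds; the small remaining differences would be expected if the original calculation used exact counts instead of rounded rates.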


9.4 Diminishing Returns and Quality of Evaluation
