Friday, February 22, 2008

Engine match mania

Over the last week, I tested several versions of my latest checkers engine (which carries the rather bland name Cake 1.8), and I did this on all three of my computers. I own an AMD64 desktop, a CoreDuo laptop and a CoreQuad desktop. These machines are all about equally fast; the CoreQuad is a bit faster than the other two, and it also runs 64-bit Windows. Here are the results of my engine matches against KingsRow 1.16c, given as wins and losses from Cake's point of view:

Engine           CoreQuad  CoreDuo  AMD64   Total
Cake 1.8 v1      +28-12    +27-18   +27-20  +82-50
Cake 1.8 v2      +19-17    +25-17   +32-15  +76-49
Cake 1.8 v3      +23-15    +24-19   +27-17  +74-51
Cake 1.8 v4      +21-15    +23-17   +26-17  +70-49
Cake Manchester  +23-11    +24-13   +24-21  +71-45

The results are a mess! I ordered the different versions of Cake 1.8 according to their overall result, and while you can see that v1 is best overall, it is worse than v2 on the AMD64 (+27-20 against +32-15) and worse than Cake Manchester on the CoreDuo (+27-18 against +24-13). If I had only been testing on the CoreDuo, I would have been disgusted with the performance of my new program, because it's worse than the old one, and if I had been testing on the AMD64, I would have chosen v2 rather than v1. This of course means that running engine matches on a single computer (as I did for the past 7 years or so) is clearly insufficient. It also casts some doubt on the current methodology of using 3 computers: how do I know that using 3 computers is enough? Wouldn't 10 be better? All in all, this is quite a disgusting discovery, and I'm at a bit of a loss as to how to proceed when optimizing my engine :-(
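One way to put a number on how noisy these totals are is a quick confidence-interval check. The sketch below is only an illustration, not part of the original testing: it takes the totals from the table above, ignores draws (they are not listed), and treats every decisive game as an independent coin flip, which is an assumption. The function names wilson_interval and games_needed are made up for this example, and games_needed is only a rule of thumb based on the normal approximation.

```python
import math

# Wins and losses against KingsRow 1.16c, copied from the "total" column above.
# Draws are not listed in the post, so only decisive games are considered.
totals = {
    "Cake 1.8 v1": (82, 50),
    "Cake 1.8 v2": (76, 49),
    "Cake 1.8 v3": (74, 51),
    "Cake 1.8 v4": (70, 49),
    "Cake Manchester": (71, 45),
}


def wilson_interval(wins, losses, z=1.96):
    """Approximate 95% Wilson score interval for the win share of decisive games."""
    n = wins + losses
    p = wins / n
    denom = 1 + z * z / n
    centre = (p + z * z / (2 * n)) / denom
    half = z * math.sqrt(p * (1 - p) / n + z * z / (4 * n * n)) / denom
    return centre - half, centre + half


def games_needed(p, diff, z=1.96):
    """Rough number of decisive games per version needed before a gap of
    `diff` in win share between two versions exceeds z standard errors.
    Assumes both versions have a win share close to p."""
    return math.ceil(2 * z * z * p * (1 - p) / (diff * diff))


if __name__ == "__main__":
    for name, (wins, losses) in totals.items():
        low, high = wilson_interval(wins, losses)
        share = wins / (wins + losses)
        print(f"{name:16s} {share:.3f}  [{low:.3f}, {high:.3f}]")
    # Hypothetical target: telling apart two versions that differ by
    # 2 percentage points in win share, both near 60%.
    print("decisive games needed per version:", games_needed(0.60, 0.02))
```

With roughly 115 to 130 decisive games per version, each interval is several percentage points wide on either side, and all five intervals overlap. Under these assumptions the ordering in the table cannot be distinguished from chance, whichever machine the games were played on.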


Brandon Corfman said...

When you do your benchmarking, do you pick the same games each time, or is this a random draw?

Without knowing better, I'd assume you picked a test suite of 3-move games that proceeds according to the opening books of each program, and then begins to diverge at some point once the engines begin to calculate.

Otherwise, if you're just testing random games, then you're bound to have some differences that you can't account for. I can't imagine you're testing that way, right?

Regardless, tracking these issues down has got to be a tough job.

Ed Gilbert said...

Brandon, these games are with opening books off. With books on, all games would be draws.

Martin, I think your results indicate that engine matches should be run with many more games. The 2500 variations of 11-man ballots might be worth a try.

-- Ed

aM0k said...

Hello, and first of all, greetings. How can I obtain Cake 1,18 v2?

Regards, and thanks!

SAID said...

Hi Martin!
I would like to inform you that your latest CheckerBoard (1.65, 32-bit) with Cake 1.8 has serious bugs, especially in the evaluation