Friday, February 22, 2008

Engine match mania

Over the last week, I tested several versions of my latest checkers engine (which carries the rather bland name Cake 1.8) - and I did this on all three of my computers: an AMD64 desktop, a CoreDuo laptop and a CoreQuad desktop. These machines are all about equally fast; the CoreQuad is a bit faster than the other two, and it also runs 64-bit Windows. Here are the results of my engine matches against KingsRow 1.16c:

Engine            CoreQuad   CoreDuo   AMD64     total
Cake 1.8 v1       +28-12     +27-18    +27-20    82-50
Cake 1.8 v2       +19-17     +25-17    +32-15    76-49
Cake 1.8 v3       +23-15     +24-19    +27-17    74-51
Cake 1.8 v4       +21-15     +23-17    +26-17    70-49
Cake Manchester   +23-11     +24-13    +24-21    71-45

The results are a mess! I ordered the different versions of Cake 1.8 by their overall result, and while v1 is best overall, it is worse than v2 on the AMD64 and worse than Cake Manchester on the CoreDuo. That means that if I had been testing only on the CoreDuo, I would have been disgusted with the performance of my new program, because it scores worse than the old one; and if I had been testing only on the AMD64, I would have chosen v2 rather than v1. Running engine matches on a single computer (as I did for the past 7 years or so) is therefore clearly insufficient, and it also casts some doubt on my current methodology of using 3 computers - how do I know that 3 computers are enough? Wouldn't 10 be better? All in all, this is quite a disgusting discovery, and I'm at a bit of a loss as to how to proceed when optimizing my engine :-(
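Part of the mess is probably just statistical noise: with only ~130 decisive games per version, the win rates have wide error bars. As a rough sanity check (a sketch only - it uses the normal approximation and, like the table above, counts only decisive games, ignoring draws), one can compute confidence intervals for the totals:

```python
import math

def score_interval(wins, losses, z=1.96):
    """Approximate 95% confidence interval (normal approximation)
    for the win rate among decisive games."""
    n = wins + losses
    p = wins / n
    half = z * math.sqrt(p * (1 - p) / n)
    return p - half, p + half

# Totals from the table above (decisive games only)
for name, w, l in [("v1", 82, 50), ("v2", 76, 49), ("Manchester", 71, 45)]:
    lo, hi = score_interval(w, l)
    print(f"{name}: {w}-{l}, win rate {w/(w+l):.2f}, 95% CI [{lo:.2f}, {hi:.2f}]")
```

The intervals for v1, v2 and Manchester all overlap substantially, which suggests the rankings could flip from one test run to the next regardless of how many computers are used - more games, not necessarily more machines, would tighten them.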

Thursday, February 07, 2008

Bug squashing

Yesterday and today I found bugs in Cake's evaluation. I was looking at some games that Cake had lost, and made a change in the evaluation which should have improved the situation. However, although the evaluation was different, the output of the engine remained the same: the exact same number of nodes searched, with the same value for the position. That made me *very* suspicious, so I looked at the code a bit more closely and discovered that during my code cleanup last summer I had broken a really important piece of knowledge. Still suspicious, I continued looking for bugs and found one minor bug and one cosmetic bug. The match result improved drastically after stomping the first bug, from +19-16 @5s/move to +26-11. I don't have much hope that the minor bug will make a big difference, but soon I will see!

Monday, February 04, 2008

Multi-Core speed

Nowadays, CPUs aren't getting much faster in terms of clock speed, but the manufacturers are putting multiple cores in one CPU. While this gives a lot more bang for the buck, it is a challenge for programmers to make use of the newly available resources. Some problems are easy to parallelize, while others are much more difficult. Unfortunately, tree searching such as in two-player strategy games belongs to the second class: it's not straightforward at all to implement a parallel game tree search. Of course there are solutions for this; the best-known is called YBWC, for Young Brothers Wait Concept. It is quite easy to explain in words, but actually programming it is something else entirely :-(
I haven't managed to find any sensible description of the algorithm with real code, except for the source code of Crafty, a strong free chess engine. While Crafty is documented quite well at the source-code level, the parallel-search changes are spread across a great many source files, and it's all very confusing to me. I wonder if there is a sensible YBWC tutorial anywhere on the net?
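For what it's worth, the core idea in words is: at each node, search the first ("eldest") move serially to establish a good alpha bound, and only then let the remaining ("younger") moves be searched in parallel with that bound. Here is a toy sketch of that idea - nothing like Cake's or Crafty's actual search, just a negamax on a nested-list tree, with the parallel siblings deliberately using the possibly-stale alpha (that is the price YBWC pays):

```python
from concurrent.futures import ThreadPoolExecutor

pool = ThreadPoolExecutor(max_workers=4)

def ybwc(node, alpha, beta, parallel=True):
    # A leaf is a number (static evaluation); an internal node is a list.
    if not isinstance(node, list):
        return node
    # Eldest brother first, searched serially: it usually raises alpha most.
    best = -ybwc(node[0], -beta, -alpha, False)
    alpha = max(alpha, best)
    if alpha >= beta:
        return best                       # cutoff: the brothers never start
    rest = node[1:]
    if parallel:
        # Younger brothers waited for the eldest; now they run concurrently,
        # each with the bound the eldest established (not updated further).
        futures = [pool.submit(ybwc, c, -beta, -alpha, False) for c in rest]
        for f in futures:
            best = max(best, -f.result())
    else:
        for c in rest:
            best = max(best, -ybwc(c, -beta, -alpha, False))
            alpha = max(alpha, best)
            if alpha >= beta:
                break
    return best

tree = [[3, [5, 8]], [2, 9], [4, 1]]
print(ybwc(tree, -100, 100))
```

The real difficulty the sketch hides is everything around the recursion: splitting work at good nodes only, aborting sibling searches after a late cutoff, and sharing the transposition table safely.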

Anyway, my own plans were different to start with: all I wanted to do was make my book generator work in parallel - it is quite easy to have multiple threads running on a multi-core CPU when each thread is just doing a standard search on its own position. The big difficulty only arises when all threads are supposed to help search the same position, because then they have to start communicating with each other.
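Since each book position is searched independently, the whole thing reduces to handing positions to a pool of worker threads. A minimal sketch (the `search` function here is a made-up stand-in; the real book generator would call the thread-safe engine on each position):

```python
from concurrent.futures import ThreadPoolExecutor

def search(position, depth):
    # Hypothetical stand-in for the engine's search: returns a fake score.
    # In the real book generator this would be a full alpha-beta search.
    return len(position) * depth

positions = ["pos-a", "pos-b", "pos-c", "pos-d"]

# No inter-thread communication needed: each worker gets its own position.
with ThreadPoolExecutor(max_workers=4) as pool:
    scores = dict(zip(positions, pool.map(lambda p: search(p, 19), positions)))
print(scores)
```

The only shared state in this scheme is whatever the searches themselves touch behind the scenes - which is exactly where the locking question below comes from.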

The only problem in having multiple threads running simultaneously is the following: the endgame database access code has to be protected against simultaneous access by multiple threads, since two threads trying to load data from disk at the same time could corrupt the shared state. In such situations programmers use a "lock" to prevent other threads from entering critical code while one thread is using it. That thread has to release the lock again once it is done, so that the other threads can proceed. Obviously, this whole process might slow down the program a lot, because one thread can block the others - and the only way of knowing whether this is a problem is to try! I made Cake thread-safe last summer, and finally tested the speed on my new quad-core yesterday. Here is the result:

1 thread: 2050 kN/s
2 threads: 2000 kN/s
4 threads: 1815 kN/s

Running 2 threads simultaneously thus results in a speed loss of about 2.5%, while running 4 threads gives a loss of about 11%. Of course, that is not a big price to pay, since in total you are getting 4 x 89% = 356% out of your CPU compared to running a single thread. I also have to admit that I just put the whole database lookup call in a critical section; perhaps it would be more efficient to protect only the parts of the lookup that touch the database cache. Nevertheless, I'm afraid I have no excuses any more and should be working on a multi-threaded book generator now...
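To illustrate the finer-grained idea, here is a sketch (hypothetical names, and it assumes the disk layer tolerates concurrent reads - otherwise the disk call needs its own lock): only the shared cache is guarded, and the lock is released during the slow disk fetch. Two threads may then occasionally load the same block twice, which is wasted work but harmless.

```python
import threading

cache_lock = threading.Lock()   # guards only the shared database cache
cache = {}                      # position -> database value

def load_from_disk(position):
    # Hypothetical stand-in for the real endgame database fetch.
    return len(position) % 3 - 1    # fake value: -1/0/1 = loss/draw/win

def db_lookup(position):
    # Narrower critical section than locking the whole lookup:
    with cache_lock:
        if position in cache:
            return cache[position]
    # Slow part runs unlocked; concurrent threads can fetch in parallel.
    value = load_from_disk(position)
    with cache_lock:
        cache[position] = value
    return value
```

Whether this actually beats the coarse lock depends on how much of the 11% is spent waiting inside the lookup - as with the thread-count experiment above, the only way to know is to measure.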