Monday, February 04, 2008

Multi-Core speed

Nowadays, CPUs aren't getting much faster in terms of clock speed, but the manufacturers are putting multiple cores in one CPU. While this gives a lot more bang for the buck, it is a challenge for programmers to make use of the newly available resources. Some problems are easy to parallelize, while others are much more difficult. Unfortunately, tree searching such as in two-player-strategy-games belongs to the second class: it's not straightforward at all to implement a parallel game tree search. Of course there are solutions for this, the best-known is called YBWC for young brother wait concept. This is quite easy to explain with words, but really programming it is something different :-(
I haven't managed to find any sensible description of the algorithm with some real code, except for the source code of Crafty, a strong free chess engine. While this is documented quite well on the source code level, there are very many source code files and many of them have the required changes for the parallel search, and it's all very confusing to me. I wonder if there is any sensible YBWC tutorial anywhere on the net?

Anyway, my own plans were different to start with: All I wanted to do was to make my book generator work in parallel - it is quite easy to have multiple threads running on a multi-core-CPU when each thread is just doing a standard search on its own position. The big difficulty only arises when all threads are supposed to help search the same position, because they have to start communicating with each other.

The only problem in having multiple threads running simultaneously is the following: the endgame database access code has to be protected from being accessed by multiple threads, since confusion could occur when two threads try to load data from disk at the same time. In such situations programmers use a "lock" to prevent other threads from accessing critical code when one thread starts using it. This thread has to free up the critical code again once it is done, so that the other threads can also use it. Obviously, this whole process might slow down the program a lot, because one thread can be blocking the others - and the only way of knowing whether this is a problem is to try! I made Cake thread-safe last summer, and finally tested the speed on my new quadcore yesterday. Here is the result:

1 thread: 2050 kN/s
2 threads: 2000 kN/s
4 threads: 1815 kN/s

Running 2 threads simultaneously thus results in a speed loss of about 2.5%, while running 4 threads gives a loss of 11%. Of course, that is not a big price to pay, since in total you are getting 4x89% = 356% out of your CPU compared to having a single thread running. I also have to admit that I just put the whole database lookup call in a critical section; perhaps it is possible to make that more efficient by only putting the parts of the database lookup that affect the database cache in a critical section. Nevertheless, I'm afraid I have no excuses any more and should be working on a multi-threaded book generator now...

No comments: