Sunday, July 13, 2008
CB 1.65 bug
It has been nearly 3 months since I released CheckerBoard 1.65, and I hate to admit that I got bug reports almost instantly after its release. I finally took the time to look into them (partly because my girlfriend is in Spitzbergen). I cleaned up a lot of my source code in CB 1.65 with the goal of making it public with this release, and unfortunately, I managed to add a rather serious synchronization bug in the process. I believe I have found it; some final testing is still required, but an update should be available soon.
Friday, February 22, 2008
Engine match mania
Over the last week, I tested several versions of my latest checkers engine (which carries the rather bland name Cake 1.8) - and I did this on all three of my computers. I own an AMD64 desktop, a CoreDuo laptop and a CoreQuad desktop. These machines are all about equally fast; the CoreQuad is a bit faster than the other two, and it also runs 64-bit Windows. Here are the results of my engine matches against KingsRow 1.16c:
Engine           CoreQuad  CoreDuo  AMD64    total
--------------------------------------------------
Cake 1.8 v1      +28-12    +27-18   +27-20   82-50
Cake 1.8 v2      +19-17    +25-17   +32-15   76-49
Cake 1.8 v3      +23-15    +24-19   +27-17   74-51
Cake 1.8 v4      +21-15    +23-17   +26-17   70-49
Cake Manchester  +23-11    +24-13   +24-21   71-45
The results are a mess! I ordered the different versions of Cake 1.8 according to their overall result, and while you can see that v1 is best overall, it is worse than v2 on the AMD64 and worse than Cake Manchester on the CoreDuo. This means that if I had been testing only on the CoreDuo, I would have been disgusted with the performance of my new program, because it's worse than the old one; and if I had been testing only on the AMD64, I would have chosen v2 rather than v1. That of course means that running engine matches on a single computer (as I did for the past 7 years or so) is clearly insufficient, and it also casts some doubt on the current methodology of using 3 computers - how do I know that 3 computers are enough? Wouldn't 10 be better? All in all, this is quite a disgusting discovery, and I'm at a bit of a loss as to how to proceed when optimizing my engine :-(
Thursday, February 07, 2008
Bug squashing
Yesterday and today I found bugs in Cake's evaluation. I was looking at some games that Cake had lost, and made a change in the evaluation which should have improved the situation. However, although the evaluation was different, the output of the engine remained the same - the exact same number of nodes searched with the same value for the position. That made me *very* suspicious, and I looked at the code a bit more closely, and discovered that during my code cleanup of last summer I had broken a really important piece of knowledge. Being suspicious, I continued looking for bugs and found one minor bug and one cosmetic bug. The match result improved drastically after stomping the first bug, from +19-16 @5s/move to +26-11. I don't have much hope that the minor bug will make a big difference but soon I will see!
Monday, February 04, 2008
Multi-Core speed
Nowadays, CPUs aren't getting much faster in terms of clock speed, but the manufacturers are putting multiple cores in one CPU. While this gives a lot more bang for the buck, it is a challenge for programmers to make use of the newly available resources. Some problems are easy to parallelize, while others are much more difficult. Unfortunately, tree search, as used in two-player strategy games, belongs to the second class: it's not straightforward at all to implement a parallel game tree search. Of course there are solutions for this; the best-known is YBWC, the Young Brothers Wait Concept. It is quite easy to explain in words, but actually programming it is something else :-(
I haven't managed to find any sensible description of the algorithm with some real code, except for the source code of Crafty, a strong free chess engine. While that code is documented quite well at the source level, it is spread across very many files, many of which contain changes for the parallel search, and it's all very confusing to me. I wonder if there is any sensible YBWC tutorial anywhere on the net?
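In words, the idea goes roughly like this: the first move at a node (the "eldest brother") must be searched serially, and only once it has returned and established a bound may the remaining moves (the "younger brothers") be handed out to idle threads. Here is a minimal sequential negamax sketch of that structure, with a dummy move generator and evaluation and the split point only marked by a comment - an illustration of the concept, not real parallel code:

```c
#include <assert.h>

#define MAXMOVES 32

static int evaluate(int pos) { return pos % 5 - 2; }   /* dummy leaf evaluation */

static int genmoves(int pos, int moves[]) {            /* dummy move generator */
    int n = (pos % 3) + 1, i;
    for (i = 0; i < n; i++)
        moves[i] = pos + i + 1;
    return n;
}

static int search(int pos, int depth, int alpha, int beta) {
    int moves[MAXMOVES], n, i, value;
    if (depth == 0)
        return evaluate(pos);
    n = genmoves(pos, moves);
    /* eldest brother: must be searched serially by the current thread */
    value = -search(moves[0], depth - 1, -beta, -alpha);
    if (value > alpha) alpha = value;
    if (alpha >= beta) return alpha;
    /* younger brothers: in YBWC these could now be split among idle
       threads, because the first move has already given a real bound */
    for (i = 1; i < n; i++) {
        value = -search(moves[i], depth - 1, -beta, -alpha);
        if (value > alpha) alpha = value;
        if (alpha >= beta) break;   /* beta cutoff */
    }
    return alpha;
}
```

The hard part that this sketch hides is exactly the thread communication: handing younger brothers to helpers, collecting their results, and aborting them when a sibling produces a cutoff.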
Anyway, my own plans were different to start with: All I wanted to do was to make my book generator work in parallel - it is quite easy to have multiple threads running on a multi-core-CPU when each thread is just doing a standard search on its own position. The big difficulty only arises when all threads are supposed to help search the same position, because they have to start communicating with each other.
The only problem in having multiple threads running simultaneously is the following: the endgame database access code has to be protected from being accessed by multiple threads, since confusion could occur when two threads try to load data from disk at the same time. In such situations programmers use a "lock" to prevent other threads from accessing critical code when one thread starts using it. This thread has to free up the critical code again once it is done, so that the other threads can also use it. Obviously, this whole process might slow down the program a lot, because one thread can be blocking the others - and the only way of knowing whether this is a problem is to try! I made Cake thread-safe last summer, and finally tested the speed on my new quadcore yesterday. Here is the result:
1 thread: 2050 kN/s
2 threads: 2000 kN/s
4 threads: 1815 kN/s
Running 2 threads simultaneously thus results in a speed loss of about 2.5%, while running 4 threads gives a loss of 11%. Of course, that is not a big price to pay, since in total you are getting 4x89% = 356% out of your CPU compared to having a single thread running. I also have to admit that I just put the whole database lookup call in a critical section; perhaps it is possible to make that more efficient by only putting the parts of the database lookup that affect the database cache in a critical section. Nevertheless, I'm afraid I have no excuses any more and should be working on a multi-threaded book generator now...
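A minimal sketch of the locking scheme described above, using POSIX threads for illustration (on Windows one would use a CRITICAL_SECTION instead); the names dblookup and db_mutex are hypothetical, not Cake's actual code:

```c
#include <pthread.h>

/* Hypothetical sketch: serializing endgame-database lookups with a lock,
   so that two threads never touch the shared database cache or the disk
   at the same time. */

static pthread_mutex_t db_mutex = PTHREAD_MUTEX_INITIALIZER;
static int cache_accesses = 0;     /* stands in for shared cache state */

static int dblookup(int position) {
    int value;
    pthread_mutex_lock(&db_mutex);    /* enter the critical section */
    cache_accesses++;                 /* shared state is now touched safely */
    value = position % 3;             /* dummy win/loss/draw "lookup" */
    pthread_mutex_unlock(&db_mutex);  /* free it up for the other threads */
    return value;
}
```

The coarse version locks around the entire lookup, which is the simple-but-slow option measured above; a finer-grained version would lock only around the cache bookkeeping and the disk read.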
Thursday, January 17, 2008
Heavyweight match
I just finished running an engine match between the 64-bit versions of Cake and KingsRow. Cake managed a narrow win, but what makes me even happier is that the entire engine match ran without problems, i.e. CB64, Cake64 and KingsRow64 are all stable enough to run for 12 hours without any errors. I will probably publish the 64-bit versions in a couple of weeks.
As an aside, I got so annoyed with Windows Vista UAC (user account control) popping up TWO message boxes to confirm when I just want to rename a file that I ended up disabling UAC. I really wonder what the MS guys were thinking when they invented this feature. One message box for renaming a file would already be a huge PITA, but two... I guess the only people who don't disable UAC are those who don't know how to do it!
Some more Vista madness: I also noticed that by default my system was set to defragment its harddisk daily, run a full virus check daily, and run the pretty much useless Windows Defender. My system (a QuadCore with 4GB RAM) used to access the harddisk constantly for about 15-30 minutes after startup. With all this stuff disabled, it is much better. In my quest for energy efficiency, I bought an 80+ power supply (80plus.org), which is not only energy-efficient but also very quiet (it generates only a little heat and thus needs less ventilation than a standard power supply). However, with a loudish harddisk I didn't get any benefit from that. Now I do :-)
Monday, January 14, 2008
64-bit speedups
I tried to find out how much performance increase a 64-bit compile will give compared to the 32-bit compile. I tested three different programs: my Connect 4, Cake and KingsRow. I also have a speedup reported by Ed Gilbert for his international checkers program, KingsRow-10. Here are the numbers:
- KingsRow: 5.8%
- Cake: 6.4%
- Connect 4: 11.6%
- KingsRow-10: 42.9%
This list shows that all programs are able to benefit from the 64-bit compile, although the gain is rather moderate for all programs except KingsRow-10. Why can some programs benefit more than others? Probably this has to do with how much they make use of 64-bit operations. For example, my Connect 4 program uses a 64-bit representation of the board. KingsRow and Cake use 32-bit representations and don't really have any use for the larger word size on the 64-bit machine. KingsRow-10 on the other hand also uses 64-bit numbers for its board representation and benefits much more than any other program on the list.
With all these numbers I should add some caveats: the KingsRow 32/64-bit versions don't seem to search the exact same tree - in kN/s, KR-64 was 5.8% faster, but its search time was only 2.6% lower. Additionally, KR wasn't using the endgame database during this test, since for some reason it doesn't recognize it on my system. Cake 32/64 searched exactly the same number of nodes in the one-minute search I used for this test. Cake32 makes use of some assembler instructions which Cake64 does not, so the speed difference would probably be a bit larger if I found out how to program these functions in 64-bit assembler. The same is true for Connect 4 (for the experts: I don't have an LSB function in assembler for the 64-bit versions).
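For readers wondering what that missing LSB function does: it returns the index of the lowest set bit in a bitboard word. A portable C fallback looks like the sketch below; modern compilers also offer intrinsics for this (_BitScanForward64 on MSVC, __builtin_ctzll on GCC), which map to a single instruction on x64 and are what the assembler version would be replaced with:

```c
#include <stdint.h>

/* Portable least-significant-bit index for a 64-bit word: returns the
   index (0..63) of the lowest set bit, or -1 if the word is zero.
   This is the slow fallback; an intrinsic or assembler version does
   the same thing in one instruction. */
static int lsb64(uint64_t x) {
    int index = 0;
    if (x == 0)
        return -1;
    while ((x & 1) == 0) {   /* shift right until the lowest set bit arrives */
        x >>= 1;
        index++;
    }
    return index;
}
```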
Sunday, January 13, 2008
Cake 64
I recently bought a new motherboard/CPU for the computer that was generating the opening book database for Cake. My new machine is the most powerful I ever had, with a Core Quad and 4 GB of RAM. For curiosity's sake, I also installed Windows Vista on that machine. I'm not at all convinced by Microsoft's new OS, but at least it gives me the opportunity to finally run my 64-bit compiles myself. Thanks to this, I found the bug in the 64-bit version of Cake, and of course it was a 32/64-bit issue. I was computing the size of a memory block full of pointers for the endgame database with
char *pointer;
int memsize = blocknum * sizeof(int);
pointer = malloc(memsize);
since I had assumed that sizeof(int) would change to 8 bytes on the 64-bit machine. But it stayed at 4 bytes, while pointers do need 8 bytes, and of course the whole thing crashed. I replaced sizeof(int) with sizeof(pointer) and now it works. It appears that this was the only portability issue; I now have a working 64-bit version of Cake on my machine. By a strange coincidence I clocked it at 6.4% faster than the 32-bit version.
Wednesday, August 22, 2007
Housekeeping
I recently inherited the house of my grandfather, and now have a lot more housekeeping to do than before. Somehow, this might have helped me in my decision to do some housekeeping with Cake. Those of you who have programmed probably know the feeling: even if you started out with a more or less clean concept (I certainly didn't with Cake), things eventually get out of hand. You add an idea here, put in a quick fix there, and before you know it, your code is unreadable and unmaintainable. That's about the state that Cake was in before my summer holidays in the Austrian mountains. There, besides reading the Harry Potter finale and climbing some mountains, I cleaned up Cake's source code for two whole weeks. That much time is necessary, since you really have to stand back and get an overview of your whole program to decide exactly how you want to do your cleaning up. You can't just do it in a single day, or fix it by programming one evening per month - by the time you get back to programming, you will have forgotten everything again (at least that's what would happen to me, but then I turned 36 recently!).
The result of two weeks of hard programming is that there are no global variables left in Cake, that Cake can be switched to thread-safe mode (at a small speed penalty), and that much of Cake's code is now much clearer and cleaner than before. It doesn't play any better at all though - but now that I have (nearly) finished the housekeeping part, I might be more inclined to work on it again!
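Removing the globals is also what makes the thread-safe mode possible. The usual refactoring is to bundle the former globals into a state struct that each thread owns and passes down the search explicitly; a minimal sketch with hypothetical names (SEARCHINFO is not necessarily what Cake calls it):

```c
#include <stdint.h>

/* Before cleanup: search state lived in globals, so two threads searching
   at once would trample each other's counters and tables.
   After cleanup: each thread owns one of these and passes it explicitly. */
typedef struct {
    int64_t nodes;      /* nodes searched by this thread */
    int     maxdepth;   /* deepest iteration reached */
    /* ...hash table pointer, killer moves, history table, etc. */
} SEARCHINFO;

static int search(SEARCHINFO *si, int depth, int alpha, int beta) {
    si->nodes++;        /* was: a global node counter */
    /* ...recursive search using only si and local variables... */
    (void)depth; (void)alpha; (void)beta;   /* stub body for the sketch */
    return 0;
}
```

The small speed penalty comes from threading the extra pointer through every call instead of touching globals directly.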
Sunday, February 18, 2007
64-bit versions of CheckerBoard and Cake
Linux users have been able to benefit from powerful new 64-bit processors for quite some time, while M$'s Win64 never really got out of its beta phase. With the release of Windows Vista, this has changed, and 64-bit computing is now also available to the masses. I run neither Linux nor Vista, but I do have a compiler capable of producing 64-bit executables. I wonder whether they will run under 64-bit Vista? If you happen to own 64-bit Vista, you can try it by downloading CB_64.zip (264KB) and installing it in your CheckerBoard directory. To do this, extract checkerboard64.exe to the CheckerBoard directory and CakeM64.dll to your engines directory. Run checkerboard64.exe and choose CakeM64.dll as your engine. Let me know what happens!
Friday, February 09, 2007
Wow! CheckerBoard runs under Vista
I haven't been doing any checkers-related stuff lately. I'm all the more pleased to hear that CheckerBoard and Cake run under M$'s newest OS - that means I don't have to do any checkers-related stuff in the near future :-)
And no, of course I didn't try this myself: I'm going to avoid Vista for as long as possible!