Thursday, August 07, 2008

4 in a row source code

I just published the source code of my 4 in a row program on my four in a row page. While this doesn't appear to fit in a computer checkers blog, I think it does belong for two reasons: first, it is an example of how to write a high-performance game program. Second, this is the second part of my plan to publish some of my source code (CheckerBoard came in first). More to follow...

Tuesday, July 22, 2008

CheckerBoard source code release

My interest in computer checkers is waning - partly because the game is solved, partly because other interests are taking more of my time these days. I still sometimes get mail asking for new features in CheckerBoard, but I haven't done much lately. Since I don't think this will change anytime soon, I have made the source code of CheckerBoard (in version 1.651) public. This code can be used in any way you might see fit; I'm not going to bother with some kind of public license.
I am very much aware that the code is neither particularly good nor clean; CheckerBoard was - like perhaps most software - not really designed well, i.e. it started out as a simple program and then I kept on adding more and more things to it, and so the whole program structure isn't nice. Some of the source code is really old, written at a time when my programming skills were even less developed than they are now. And of course, I'm not a trained programmer who learned how to properly develop software. All of this will make the CheckerBoard source code rather hard to understand, but if you have a lot of patience and a decent working knowledge of standard C, then you can probably get to grips with it.

Friday, July 18, 2008

CheckerBoard 1.651

I just published CheckerBoard 1.651 on my webpage. This release fixes four things:

1) a serious synchronization bug between the multiple threads of CheckerBoard, which could show up in autoplay and engine match mode; on my two newer machines that I usually use for development, this bug didn't show, but on my oldest machine it was very obvious. Even if the bug didn't show clearly on the newer machines, it is possible that it affects engine match outcomes - an upgrade is therefore strongly recommended if you run engine matches.

2) a small bug with graphics updating when changing the piece set (the board would update, the pieces only on the next window resizing).

3) export to HTML now also works from setup positions for Italian checkers (it always worked for English checkers)

4) I forgot to include the file 11man_FEN.txt which is used for 11-man engine matches. It contains 2400 of the 2500 possible 11-man starting positions. 100 of the positions are omitted, because they are likely wins for one side. The newest release includes this file, so that 11-man engine matches with 4800 games can be run (if you have the patience...) for better statistics than with standard engine matches.

If I get no negative feedback on this release, I will also publish the source code of CheckerBoard, since I find it increasingly difficult to find the time to work on CB - making the source public is a better alternative to stopping the development altogether.

But now, enjoy your bug-free 11-man matches!

Sunday, July 13, 2008

CB 1.65 bug

It has been nearly 3 months since I released CheckerBoard 1.65, and I hate to admit that I got bug reports nearly instantly after its release. I finally took the time to look into them (partly because my girlfriend is in Spitzbergen). I cleaned up a lot of my source code in CB 1.65 with the goal of making it public with this release, and unfortunately, I managed to add a rather serious synchronization bug to it. I believe I have found it; some final testing is still required, but there will probably be an update available soon.

Friday, February 22, 2008

Engine match mania

Over the last week, I tested several versions of my latest checkers engine (which carries the rather bland name Cake 1.8) - and I did this on all three of my computers. I own an AMD64 desktop, a CoreDuo laptop and a CoreQuad desktop. These machines are all about equally fast; the CoreQuad is a bit faster than the other two, and it also runs 64-bit Windows. Here are the results of my engine matches against KingsRow 1.16c:



Engine            CoreQuad   CoreDuo   AMD64    total
-----------------------------------------------------
Cake 1.8 v1       +28-12     +27-18    +27-20   82-50
Cake 1.8 v2       +19-17     +25-17    +32-15   76-49
Cake 1.8 v3       +23-15     +24-19    +27-17   74-51
Cake 1.8 v4       +21-15     +23-17    +26-17   70-49
Cake Manchester   +23-11     +24-13    +24-21   71-45



The results are a mess! I ordered the different versions of Cake 1.8 according to their overall result, and while you can see that v1 is best overall, it is worse than v2 on the AMD64 and worse than Cake Manchester on the CoreDuo. This means that if I had only been testing on the CoreDuo, I would have been disgusted with the performance of my new program, because it's worse than the old one, and if I had been testing on the AMD64, I would have chosen v2 rather than v1. That in turn means that running engine matches on a single computer (as I did for the past 7 years or so) is clearly insufficient, and it also casts some doubt on the current methodology of using 3 computers - how do I know that using 3 computers is enough? Wouldn't 10 be better? All in all, this is quite a disgusting discovery, and I'm at a bit of a loss as to how to proceed when optimizing my engine :-(
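To make the comparison concrete, here is a small sketch that ranks the engines by net score (wins minus losses) on each machine. The numbers are copied from the table above; ranking by net score is my assumption for illustration, and the names MatchRow, net_score and best_engine are made up for this example.

```c
#include <assert.h>

#define NUM_MACHINES 3   /* 0 = CoreQuad, 1 = CoreDuo, 2 = AMD64 */

typedef struct {
    const char *name;
    int wins[NUM_MACHINES];
    int losses[NUM_MACHINES];
} MatchRow;

/* Results against KingsRow 1.16c, taken from the table in this post. */
static const MatchRow results[] = {
    {"Cake 1.8 v1",     {28, 27, 27}, {12, 18, 20}},
    {"Cake 1.8 v2",     {19, 25, 32}, {17, 17, 15}},
    {"Cake 1.8 v3",     {23, 24, 27}, {15, 19, 17}},
    {"Cake 1.8 v4",     {21, 23, 26}, {15, 17, 17}},
    {"Cake Manchester", {23, 24, 24}, {11, 13, 21}},
};
#define NUM_ENGINES ((int)(sizeof results / sizeof results[0]))

/* Wins minus losses on one machine; machine == -1 sums over all machines. */
int net_score(const MatchRow *row, int machine)
{
    assert(machine >= -1 && machine < NUM_MACHINES);
    if (machine >= 0)
        return row->wins[machine] - row->losses[machine];
    int total = 0;
    for (int m = 0; m < NUM_MACHINES; m++)
        total += row->wins[m] - row->losses[m];
    return total;
}

/* Index into results[] of the engine with the highest net score. */
int best_engine(int machine)
{
    int best = 0;
    for (int i = 1; i < NUM_ENGINES; i++) {
        if (net_score(&results[i], machine) > net_score(&results[best], machine))
            best = i;
    }
    return best;
}
```

With these numbers, best_engine(-1) picks v1 overall, but best_engine(1) picks Cake Manchester on the CoreDuo and best_engine(2) picks v2 on the AMD64, which is exactly the mess described above.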

Thursday, February 07, 2008

Bug squashing

Yesterday and today I found bugs in Cake's evaluation. I was looking at some games that Cake had lost, and made a change in the evaluation which should have improved the situation. However, although the evaluation was different, the output of the engine remained the same - the exact same number of nodes searched with the same value for the position. That made me *very* suspicious, and I looked at the code a bit more closely, and discovered that during my code cleanup of last summer I had broken a really important piece of knowledge. Being suspicious, I continued looking for bugs and found one minor bug and one cosmetic bug. The match result improved drastically after stomping the first bug, from +19-16 @5s/move to +26-11. I don't have much hope that the minor bug will make a big difference but soon I will see!

Monday, February 04, 2008

Multi-Core speed

Nowadays, CPUs aren't getting much faster in terms of clock speed, but the manufacturers are putting multiple cores in one CPU. While this gives a lot more bang for the buck, it is a challenge for programmers to make use of the newly available resources. Some problems are easy to parallelize, while others are much more difficult. Unfortunately, tree searching such as in two-player strategy games belongs to the second class: it's not straightforward at all to implement a parallel game tree search. Of course there are solutions for this; the best-known is called YBWC, for Young Brothers Wait Concept. This is quite easy to explain with words, but really programming it is something different :-(
I haven't managed to find any sensible description of the algorithm with some real code, except for the source code of Crafty, a strong free chess engine. While this is documented quite well on the source code level, there are very many source code files and many of them have the required changes for the parallel search, and it's all very confusing to me. I wonder if there is any sensible YBWC tutorial anywhere on the net?

Anyway, my own plans were different to start with: All I wanted to do was to make my book generator work in parallel - it is quite easy to have multiple threads running on a multi-core-CPU when each thread is just doing a standard search on its own position. The big difficulty only arises when all threads are supposed to help search the same position, because they have to start communicating with each other.

The only problem in having multiple threads running simultaneously is the following: the endgame database access code has to be protected from being used by several threads at the same time, since confusion could occur when two threads try to load data from disk at once. In such situations programmers use a "lock" to prevent other threads from entering critical code while one thread is using it. This thread has to free up the critical code again once it is done, so that the other threads can also use it. Obviously, this whole process might slow down the program a lot, because one thread can be blocking the others - and the only way of knowing whether this is a problem is to try! I made Cake thread-safe last summer, and finally tested the speed on my new quadcore yesterday. Here is the result:

1 thread: 2050 kN/s
2 threads: 2000 kN/s
4 threads: 1815 kN/s

Running 2 threads simultaneously thus results in a speed loss of about 2.5%, while running 4 threads gives a loss of 11%. Of course, that is not a big price to pay, since in total you are getting 4x89% = 356% out of your CPU compared to having a single thread running. I also have to admit that I just put the whole database lookup call in a critical section; perhaps it is possible to make that more efficient by only putting the parts of the database lookup that affect the database cache in a critical section. Nevertheless, I'm afraid I have no excuses any more and should be working on a multi-threaded book generator now...
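For readers who have not seen such a lock, here is a minimal sketch of the "whole lookup in one critical section" approach. It is not Cake's code: it uses a POSIX mutex as a stand-in for a Windows critical section, and the cache layout and the names db_lookup, load_from_disk and search_worker are invented for this example.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

#define CACHE_SLOTS 64

/* One lock guards the whole database cache (hypothetical layout). */
static pthread_mutex_t db_lock = PTHREAD_MUTEX_INITIALIZER;
static int cache_key[CACHE_SLOTS];
static int cache_value[CACHE_SLOTS];
static int cache_valid[CACHE_SLOTS];

/* Stand-in for the slow disk read; a real engine would read a database block. */
static int load_from_disk(int position_index)
{
    return position_index % 3;   /* pretend result: 0 = draw, 1 = win, 2 = loss */
}

/* Thread-safe lookup: the entire call runs inside the critical section. */
int db_lookup(int position_index)
{
    assert(position_index >= 0);
    int slot = position_index % CACHE_SLOTS;
    int value;

    pthread_mutex_lock(&db_lock);            /* other threads wait here */
    if (!cache_valid[slot] || cache_key[slot] != position_index) {
        cache_value[slot] = load_from_disk(position_index);
        cache_key[slot] = position_index;
        cache_valid[slot] = 1;
    }
    value = cache_value[slot];
    pthread_mutex_unlock(&db_lock);          /* free the critical code again */
    return value;
}

/* Thread entry point: arg points to a long that receives the sum of results. */
void *search_worker(void *arg)
{
    long *sum = arg;
    for (int i = 0; i < 1000; i++)
        *sum += db_lookup(i % 100);
    return NULL;
}
```

Each search thread would be started with pthread_create(&tid, NULL, search_worker, &its_own_sum). Holding the lock only while the cache is actually modified, rather than for the whole call, is the refinement mentioned above.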

Thursday, January 17, 2008

Heavyweight match

I just finished running an engine match between the 64-bit versions of Cake and KingsRow. Cake managed a narrow win, but what makes me even happier is that the entire engine match ran without problems, i.e. CB64 and Cake64 and KingsRow64 are all stable enough to run for 12 hours without any errors. I will probably publish the 64-bit versions in a couple of weeks.

As an aside, I got so annoyed with Windows Vista UAC (user account control) popping up TWO message boxes to confirm when I just want to rename a file that I ended up disabling UAC. I really wonder what the MS guys were thinking when they invented this feature. One message box for renaming a file would already be a huge PITA, but two... I guess the only people who don't disable UAC are those who don't know how to do it!

Some more Vista madness: I also noticed that by default my system was set to defragment its harddisk daily, run a full virus check daily, and run the pretty much useless Windows Defender. My system (a QuadCore with 4GB RAM) used to access the harddisk continuously for about 15-30 minutes after the system start. With all this stuff disabled, it is much better. In my quest for energy efficiency, I bought an 80+ power supply (80plus.org), which is not only energy efficient but also very silent (it generates only a little heat and thus needs less ventilation than a standard power supply). However, with a loudish harddisk I didn't have any benefit of that. Now I do :-)

Monday, January 14, 2008

64-bit speedups

I tried to find out how much performance increase a 64-bit compile will give compared to the 32-bit compile. I tested three different programs: my Connect 4, Cake and KingsRow. I also have a speedup reported by Ed Gilbert for his international checkers program, KingsRow-10. Here are the numbers:


  • KingsRow: 5.8%
  • Cake: 6.4%
  • Connect 4: 11.6%
  • KingsRow-10: 42.9%


This list shows that all programs are able to benefit from the 64-bit compile, although the gain is rather moderate for all programs except KingsRow-10. Why can some programs benefit more than others? Probably this has to do with how much they make use of 64-bit operations. For example, my Connect 4 program uses a 64-bit representation of the board. KingsRow and Cake use 32-bit representations and don't really have any use for the larger word size on the 64-bit machine. KingsRow-10 on the other hand also uses 64-bit numbers for its board representation and benefits much more than any other program on the list.
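To illustrate why a 64-bit board representation gains from 64-bit registers, here is a sketch of one well-known Connect 4 layout. This is an assumption for illustration, not necessarily the layout my program uses: each of the 7 columns gets 7 bits (6 squares plus one always-empty spacer bit), so one side's stones fit in a single 64-bit word and a win test is a handful of shifts and ANDs. The names square_bit and has_four_in_a_row are made up.

```c
#include <assert.h>
#include <stdint.h>

#define COLUMNS 7
#define ROWS 6
#define COLUMN_STRIDE (ROWS + 1)   /* one spacer bit on top of each column */

/* Bit for the square in column col (0..6), row (0..5, counted from the bottom). */
uint64_t square_bit(int col, int row)
{
    assert(col >= 0 && col < COLUMNS && row >= 0 && row < ROWS);
    return (uint64_t)1 << (col * COLUMN_STRIDE + row);
}

/* Returns 1 if the stones of one side contain four in a row, else 0. */
int has_four_in_a_row(uint64_t stones)
{
    /* vertical, horizontal, and the two diagonal directions */
    static const int shifts[4] = {
        1, COLUMN_STRIDE, COLUMN_STRIDE - 1, COLUMN_STRIDE + 1
    };
    for (int i = 0; i < 4; i++) {
        /* pairs: squares whose neighbour in this direction is also set */
        uint64_t pairs = stones & (stones >> shifts[i]);
        /* two pairs two steps apart make four in a row */
        if (pairs & (pairs >> (2 * shifts[i])))
            return 1;
    }
    return 0;
}
```

On a 32-bit machine every one of these shifts and ANDs has to be split into two 32-bit operations, which would be consistent with the larger speedup seen for the programs that use 64-bit boards.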

A few caveats apply to all these numbers: The KingsRow 32/64-bit versions don't seem to search the exact same tree - in kN/s searched, KR-64 was 5.8% faster, but its search time was only 2.6% lower. Additionally, KR wasn't using the endgame database during this test, since it doesn't recognize it for some reason on my system. Cake 32/64 searches exactly the same number of nodes on the one-minute search I used for this test. Cake32 makes use of some assembler instructions which Cake64 does not, so probably the speed difference would be a bit larger if I found out how to program these functions in 64-bit assembler. The same is true for Connect 4 (For the experts: I don't have an LSB function in assembler for the 64-bit versions).
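For non-experts: an LSB function returns the index of the lowest set bit in a word, which is how a bitboard program finds the next piece to look at. Here is a portable C sketch of such a function for 64-bit words. It is a plain binary search, not the assembler version and not the code from Cake or Connect 4; the names lsb64 and pop_lsb are made up. Compilers also offer intrinsics for this (for example __builtin_ctzll in GCC and Clang, or _BitScanForward64 in Microsoft's compiler), which are likely faster.

```c
#include <assert.h>
#include <stdint.h>

/* Index (0..63) of the least significant set bit of x, or -1 if x is 0. */
int lsb64(uint64_t x)
{
    int index = 0;
    if (x == 0)
        return -1;
    /* Halve the search range each step: skip the low half if it is empty. */
    if ((x & 0xFFFFFFFFu) == 0) { index += 32; x >>= 32; }
    if ((x & 0xFFFFu) == 0)     { index += 16; x >>= 16; }
    if ((x & 0xFFu) == 0)       { index += 8;  x >>= 8;  }
    if ((x & 0xFu) == 0)        { index += 4;  x >>= 4;  }
    if ((x & 0x3u) == 0)        { index += 2;  x >>= 2;  }
    if ((x & 0x1u) == 0)        { index += 1; }
    return index;
}

/* Typical bitboard use: return the lowest set bit's index and clear that bit. */
int pop_lsb(uint64_t *board)
{
    assert(board != 0 && *board != 0);
    int index = lsb64(*board);
    *board &= *board - 1;   /* clears the lowest set bit */
    return index;
}
```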

Sunday, January 13, 2008

Cake 64

I recently bought a new motherboard/CPU for the computer that was generating the opening book database for Cake. My new machine is the most powerful I ever had, with a Core Quad and 4 GB of RAM. For curiosity's sake, I also installed Windows Vista on that machine. I'm not at all convinced by Microsoft's new OS, but at least it gives me the opportunity to finally run my 64-bit compiles myself. Thanks to this, I found the bug in the 64-bit version of Cake, and of course it was a 32/64-bit issue. I was computing the size of a memory block full of pointers for the endgame database with


char *pointer;
/* bug on 64-bit: the block is meant to hold pointers, but is sized in ints */
int memsize = blocknum * sizeof(int);
pointer = malloc(memsize);


since I had assumed that sizeof(int) would change to 8 bytes on the 64-bit machine. But it stayed at 4 bytes, while pointers do need 8 bytes, so the block was only half as large as required, and of course the whole thing crashed. I replaced the sizeof(int) with sizeof(pointer) and now it works. It appears that this was the only portability issue; I now have a working 64-bit version of Cake on my machine. By a strange coincidence I clocked it to be 6.4% faster than the 32-bit version.
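A more defensive way to write such an allocation is to take the element size from the pointer being assigned, so that it follows the type automatically on both 32-bit and 64-bit builds. This is only a sketch, not the code in Cake; the function name alloc_block_pointers and the char ** type are assumptions for the example.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Allocates a table of blocknum pointers, all set to NULL.
   Returns NULL if the size would overflow or the allocation fails. */
char **alloc_block_pointers(size_t blocknum)
{
    assert(blocknum > 0);
    char **table;

    /* guard against overflow in the multiplication below */
    if (blocknum > SIZE_MAX / sizeof *table)
        return NULL;

    /* sizeof *table is the size of one char *, whatever the platform */
    table = malloc(blocknum * sizeof *table);
    if (table == NULL)
        return NULL;

    for (size_t i = 0; i < blocknum; i++)
        table[i] = NULL;
    return table;
}
```

Using size_t instead of int for the byte count also avoids trouble once the table grows beyond what an int can hold.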