Monday, December 18, 2006

Core Duo!

It's been a long time since I last bought a new computer (21 months, to be exact, a record for me in the last 10 years!) - but I just did it again: I got myself a new laptop with an Intel Core Duo CPU - unfortunately not one of the even better Core 2 Duos. However, even this machine is absolutely great! Cake runs about as fast on my Core Duo at 2GHz as on my Athlon 64 at 2GHz. But the cool thing is that I can run two instances of Cake at the same speed, if I want to. Not that this really helps me in any way... it just shows why these dual core machines are so cool. The days of single core CPUs are numbered (as far as PCs go, anyway), and the first quad-cores for consumers are already on the market. Which gives me a bit of a headache, because it means that I should try to rewrite Cake so that it can use these new CPUs with multiple cores! All good chess programs have this capability, but the only checkers program that I know of that could use multiple processors was Chinook. However, the speedup from multiple processors there was quite lousy compared to what is reported for chess programs. For chess, it's something like 1.9x faster for dual cores and 3.5x faster for quad-cores, for Chinook it was approximately 3x faster on 16 CPUs [1]. It's not quite clear to me whether checkers is inherently tougher to parallelize than chess or whether the implementation in Chinook was poor. Any opinions?


arhuaco said...

I read the reference (lu93parallel.pdf) some time ago and I found the conclusions discouraging. I think that you should try the parallel version, Anyway, I think that you can avoid locking when querying hash tables (adding a checksum field).

Also, the caches (level 1 and level 2) of these new PCs are huge. So, I guess that it would be nice to try a parallel version of the game in these computers. The inter-process communication is a lot faster nowadays. In the old days, these guys used machines with slower IPC.

Having A LOT of ram also helps, since a lot of disk accesses will be cached.

I got a dual core this year, and I haven't done anything fun with it.

I switched to UNIX in 1999 because I found it easier to use POSIX threads. I did not like the threading model of WINNT.

arhuaco said...

I insist. These new PCs have a lot of power. Perhaps you could try (as an experiment) a very simple multi-threading strategy with shared hash table and a simple-minded parallel search of the root tree. Just to see what happens. I'd love to know the results.

I would like to do my master thesis in parallel checkers programming. (I'm kinda lazy, so I will have to do something fun in order to study formally, again).

Databases are so important in checkers that I've thought that perhaps the best way to make a checkers program faster is to use RAID and those things. Make the queries to the endgame databases faster (using different disks in different buses).

Martin Fierz said...

Hi Nelson!

I would definitely love to work on a parallel version of Cake, but the constraints on my life probably won't allow me to do it :-(. Besides, as you say, the Chinook results are very discouraging! I will try to continue improving CB, because there I can make one small change at the time, and it doesn't take much thinking. Quite unlike trying to parallelize Cake!

I don't really think the database issue is such a big problem nowadays. This topic is by far overrated in the Chinook publications; perhaps it was very important then, but it certainly isn't today when you can allocate 1GB database cache. I see my search speed drop by about a factor of 2 when going from an opening to an endgame position; that means that the endgame database lookup is expensive, but not all that much more than the regular evaluation. The trick is not to look up a position when it is "too close" to the leafs of the tree unless you already have it in RAM. "too close" is a function of the speed of your harddisk/CPU and the amount of RAM you have. If you look up all positions from disk, then your search slows down to a crawl, up to 100x slower than without the database lookup IIRC.

About RAID etc I don't really know: the big problem with checkers databases is latency, not throughput. I always only load 1KB from disk when I look up a position, and in terms of throughput that just doesn't cost anything these days, where you have close to 100MB/s transfer rate: 1KB takes 0.01 ms to fetch. The random access seek time is about 10ms and completely dominates the time it takes to load stuff from disk.

best regards

arhuaco said...

>About RAID etc I don't really
>know: the big problem with
>checkers databases is latency,
>not throughput.

Let's not think of raid. I was wrong here.

If you have N disk buses, and N disks, and you make queries to them in parallel (let's say that each thread queries a different database), then there will be a benefit in latency :) In fact, since some processes will sleep when you're querying a database, I guess you can launch more threads (let's say 3 or 4) than the number of processors you have.

Let's hope we can test this someday.

You you use RAID 1 it _should_ happen automatically, but RAID is not really needed. Each thread could access a different disk.

I think my PC has 4 sata buses! So having expensive disks could really make a difference here.

arhuaco said...

>You you use RAID 1 it _should_
>happen automatically, but RAID
>is not really needed. Each thread
>could access a different disk.

A friend told me that this does not happen automatically.

So, different threads could use different disks.