15 February 2006

The queue

You'll have to pardon me if I've been a bit incomprehensible recently; I've been busy writing yet another paper. Hopefully this'll be the last one before I start my write-up in earnest. The gist is that it's a paper on squashing brains, which always makes for some rather confusing after-dinner conversation.

Anyway, I've been wanting to keep up with Hiren's stream-of-consciousness dissertation on programming, so here's my attempt at turning your own brain to a liquefied pulp.

I've had the recent displeasure of needing to build ATLAS, LAPACK and CVMLIB, which together provide an optimised linear algebra library that we need for our real-time simulation work. Building on x86 presents few problems, with ATLAS only needing three hours to build (!!), giving me a bunch of static (.a) libraries which I can happily link against. For me, a good set of gcc optimisation flags on Pentium 4 processors is:

-O3 -pipe -march=pentium4 -ffast-math -mfpmath=sse -msse2

On Pentium M (e.g. Centrino laptops):

-O3 -pipe -march=pentium-m -ffast-math -mfpmath=sse -msse2

On x86_64 (notably AMD64 chips such as the Athlon 64 and Opteron), it's a whole different kettle of fish. Here, I've discovered, if you build your own libraries as shared (.so), then the libraries you link against also need to be built as shared libraries, with their objects compiled as position-independent code. This is complicated by the fact that the build scripts for LAPACK, ATLAS and CVMLIB are mostly written to generate static objects :|

The solution lay in manually adding -fPIC to pretty much every compile line when building the libraries (usually by editing the top-level make include file), and then repackaging the resulting static libraries as shared .so files. The repackaging can be done with a small shell script, saved here as conv:

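#!/bin/sh
# conv: repackage the static library NAME.a as the shared library NAME.so
set -e                          # stop at the first failing command
mkdir tmp
cd tmp
ar x "../$1.a"                  # unpack the objects (must be built with -fPIC)
gcc -shared *.o -o "../$1.so"   # link them into a shared object
cd ..
rm -rf tmp                      # remove the scratch directory and its objects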

To create foo.so from foo.a, simply run:

./conv foo
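With several libraries to convert, typing out each name gets tedious. Here's a minimal sketch of a helper that lists the name of every .a file in a directory with the suffix stripped, ready to hand to conv. The function name static_lib_stems is made up for illustration; it isn't part of any of the libraries' build systems.

```shell
# Print the stem (file name without the .a suffix) of every static
# library in a directory, one per line. Defaults to the current directory.
# static_lib_stems is a hypothetical helper name, not part of any build system.
static_lib_stems() {
    dir="${1:-.}"
    for f in "$dir"/*.a; do
        [ -e "$f" ] || continue   # the glob matched nothing
        basename "$f" .a
    done
}
```

Assuming the library names contain no spaces, running for s in $(static_lib_stems .); do ./conv "$s"; done in the library directory would then convert the lot.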

It's a little ugly, but it does work. The optimisation flags that work for me on x86_64 are:

-fPIC -funroll-loops -march=k8 -ffast-math -fpeel-loops -m64

The -m64 does make a significant difference; I've found that, in practice, the linear-algebra routines I run (matrix inversion etc.) run about 10-20% faster with -m64 on AMD platforms. If I had time, I'd like to properly check that out. Oh well.
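To save copying these flag sets between machines, they could be collected in one small shell function. This is only a sketch: the flags are exactly the ones listed in this post, but the function name flags_for and the labels pentium4, pentium-m and k8 are my own illustrative choices.

```shell
# Print the gcc flags from this post for a given CPU label.
# flags_for and its labels are hypothetical; the flag sets are the ones above.
flags_for() {
    case "$1" in
        pentium4)  echo "-O3 -pipe -march=pentium4 -ffast-math -mfpmath=sse -msse2" ;;
        pentium-m) echo "-O3 -pipe -march=pentium-m -ffast-math -mfpmath=sse -msse2" ;;
        k8)        echo "-fPIC -funroll-loops -march=k8 -ffast-math -fpeel-loops -m64" ;;
        *)         echo "unknown CPU label: $1" >&2; return 1 ;;
    esac
}
```

A compile line would then look something like gcc $(flags_for k8) -c foo.c, with the unquoted substitution left to split into separate flags.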

I've also been keeping up on building a variety of KDE packages for Mandriva 2006; the latest selection of (mostly eye-candy) RPMs is here, including a 1.3.7 build of the comix window decoration.

And now, since the medium of blogging seems to demand it: indulge yourself with some alternative entertainment.
