I have created an image on Grid5000 for Opteron-based
clusters featuring QCG-OMPI and various tools. The name of the
image is QosCosGrid_XP and it is registered under my account
(ccoti).
As a reminder, if you want to reserve machines and deploy an
image on them, you must issue your reservation in the "deploy"
queue. For example, if you want N machines during H hours in
interactive mode, on the gdx cluster:
$ oarsub -I -l nodes=N,walltime=H -t deploy -p "cluster='gdx'"
To make an advance reservation instead, replace the -I (for
"interactive") option with -r "<date and time you want your
reservation to start>". For example:
$ oarsub -r "2009-05-02 20:00:00" -l nodes=N,walltime=H -t deploy -p "cluster='gdx'"
To be allowed to use the machines, you must be connected
within the reservation. When you reserve machines in interactive
mode, the shell that issued the request is part of the
reservation: it is "holding" it, which means that if you close
this shell, you lose your reservation and your machines reboot.
To connect to a reservation from another shell, obtain the
reservation number with oarstat, and connect to it:
$ oarsub -C <reservation number>
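For example, to list your jobs and then connect to one of them
(the job number 123456 below is made up):
$ oarstat -u ccoti
$ oarsub -C 123456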
Then once your reservation starts you can deploy the image as
follows:
$ kadeploy -f $OAR_FILE_NODES -l ccoti -e QosCosGrid_XP
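$OAR_FILE_NODES points to a file containing the hostnames of
your reserved nodes, one per line (a node can appear several
times, once per CPU). It looks like this (the hostnames are just
an example):
gdx-182.orsay.grid5000.fr
gdx-183.orsay.grid5000.fr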
Not all the nodes may deploy correctly. You will see the
status of your machines at the end of the deployment. A machine
that fails to deploy is usually not a serious problem: sometimes
you have to redeploy some machines 2 or 3 times before they are
all up and running. It can be caused by a machine taking too
long to reply to a ping or to reboot, etc.
If you want to make changes to the image, you will have to
save it and register it under your own account. To save the
image, log in on the front-end machine of the cluster and use
ssh and tar. If the image you want to save is on gdx-182, you
will type:
$ ssh root@gdx-182 "tar --posix --numeric-owner --one-file-system -zcf - /" > images/qcgompi.tgz
The simplest way to register an environment is to write a
file that describes its characteristics. You will find one in
my home directory: /home/orsay/ccoti/save_env.
Pass this file to the environment registration tool as
follows:
$ karecordenv -fe save_env
/!\ Be very careful when specifying the paths to the kernel
and the initrd, otherwise your machines will not be able to
boot.
If they don't boot and you don't understand why, you can open
a remote console on a machine with kaconsole:
$ kaconsole -m <machine name>
Don't do this in an important shell (like the one that is
holding your reservation), because it does not return.
The image has a user account and, of course, a super-user
account. The passwords are the usual ones; ask me if you don't
know them. The user account can use sudo with no password.
I installed QCG-OMPI in the ~/qcgompi folder and some tools
in the ~/tools folder. You can also find vi(m), emacs, gcc, g77,
g++, gdb and the other usual tools on the machine. It also has
NFS server/client modules. To deploy your own NFS in synchronous
mode:
- on the server:
$ sudo exportfs -o rw,sync \*:/home
- to mount your NFS partition on the clients, listed in a
machinefile (let's call the server gdx-XXX):
$ for i in $(cat machinefile) ; do ssh $i sudo mount gdx-XXX:/home /home ; done
Please don't try to deploy an NFS on the whole grid, but
stick to a '1 cluster = 1 NFS' rule.
/!\ Remove the server from the machinefile, otherwise it will
try to mount /home on /home, masking the "real" /home
underneath. If you do this, there is nothing to do but
re-deploy.
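A simple way to build the machinefile without the server is to
filter it out of the node list, e.g. if the server is gdx-XXX:
$ grep -v gdx-XXX $OAR_FILE_NODES | sort -u > machinefile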
You will have to deploy the grid infrastructure yourself
(on the QCG testbed, I am usually the one who deploys it). A
deployment script is in ~/INRIA-dynamic. You have to pass it a
configuration file that describes your clusters. It follows
the usual format: you describe each cluster one by one, and
list the machines in each cluster.
The description syntax is the following:
<cluster #> <# of machines> <interconnection technique> <front-end> <broker> <proxy's IP> <port min> <port max>
<machine 1>
...
<machine n>
Interconnection techniques:
- 1 -> direct
- 2 -> traversing TCP
- 3 -> proxy
On Grid5000 all the ports are open (as long as you stay
inside g5k), so traversing TCP is equivalent to direct. Just
put anything for <port min> and <port max>, as long as it is
greater than 1024 (ports below that can only be used by the
super-user) and you change <port min> between two successive
deployments.
A little example:
1 5 1 qcg.lri.fr qcg.lri.fr 129.175.22.112 4010 4100
qcg.lri.fr
qcg1.lri.fr
qcg2.lri.fr
qcg3.lri.fr
qcg4.lri.fr
2 1 1 matrix.scic.ulst.ac.uk 129.175.22.112 25000 25100
matrix.scic.ulst.ac.uk
3 4 3 node2.qoscosgrid.man.poznan.pl qcg.lri.fr 129.175.22.112 5000 5100
node2.qoscosgrid.man.poznan.pl
node3.qoscosgrid.man.poznan.pl
node4.qoscosgrid.man.poznan.pl
node1.qoscosgrid.man.poznan.pl
Deploy the infrastructure with the following command:
$ qcg-run.sh -c <config file> -p /home/mpi/qcgompi/bin -m <port min> -M <port max>
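For example, with the configuration above saved in a file
called grid.conf (the file name and the port range are just an
illustration; I assume -m/-M take the same range you put in the
configuration):
$ qcg-run.sh -c grid.conf -p /home/mpi/qcgompi/bin -m 4010 -M 4100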
You will have to do this each time you deploy the machines,
but not each time you run an application. The infrastructure is
persistent but it must be deployed manually when the machines
have just booted.
Once the infrastructure is deployed, you can have fun with
your applications, doing exactly what you do on the testbed. A
list of the available machines is in /tmp/machinefile, and the
MCA parameters you need to use are in /tmp/mcaparams.conf. Don't
forget to set the JOBID on every machine.
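For example, a launch can look like the line below; my_app and
the way the MCA parameter file is passed are assumptions on my
side, adapt them to what you usually do on the testbed:
$ mpirun --mca mca_param_files /tmp/mcaparams.conf -machinefile /tmp/machinefile -np 8 ./my_app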
You can play with colors (this is the definitive version). To
ensure compatibility with the other layers of the QCG stack,
colors must be strings (i.e., arrays of characters). I have put
an example of how to handle string colors in ~/test_colors.c.
Since MPI_Comm_split() takes an integer as a color, a
QCG_ColorToInt() function is provided. I extended the MPI API so
that it can be called like any MPI function. So far I have
implemented only the C binding, but if you need it in Fortran or
C++ I can implement the other bindings.
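To give you an idea, here is a minimal sketch of how string
colors can be used; the exact prototype of QCG_ColorToInt() is
an assumption on my side, ~/test_colors.c is the authoritative
example:

#include <stdio.h>
#include <mpi.h>

/* Assumption: QCG_ColorToInt() takes the string color and
 * returns an integer usable by MPI_Comm_split(); check
 * ~/test_colors.c for the real prototype. */
int QCG_ColorToInt(const char *color);

int main(int argc, char **argv)
{
    int rank, subrank;
    MPI_Comm subcomm;
    const char *color = "c1"; /* string color of this process */

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    /* Split the communicator exactly as with an integer color. */
    MPI_Comm_split(MPI_COMM_WORLD, QCG_ColorToInt(color), rank, &subcomm);
    MPI_Comm_rank(subcomm, &subrank);
    printf("world rank %d -> sub-communicator rank %d\n", rank, subrank);
    MPI_Comm_free(&subcomm);
    MPI_Finalize();
    return 0;
}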
The topology has to be passed in an environment variable
called QCG_TOPOLOGY. In the full QosCosGrid stack this variable
is set by the deployment mechanism, but on the image we only
have QCG-OMPI, so you will have to set it yourself. First give
the depths, separated by one pipe (|), then two pipes (||), and
then the list of colors, separated by one pipe. The example you
will find in ~/.bashrc on the image is:
export QCG_TOPOLOGY='3|3|3|3|3||c1|c2|c4|c1|c2|c4|c1|c3|c4|c1|c3|c4|c1|c3|c4'
It defines a depth of 3 for each of the 5 processes: the colors
are read depth-by-depth for each process, so process 0 has the
colors (c1,c2,c4), process 1 (c1,c2,c4), process 2 (c1,c3,c4),
and so on. You don't have to define this variable if you are not
using the topology discovery features.
/!\ It may not work for uneven topologies. I tested it
quickly and it seems to work, but this code does not work on
the QCG testbed (probably some uninitialized variable that does
not happen to have a suitable value on qcg). An example of an
uneven topology:
2|2|3|3|3||c1|c2|c1|c2|c1|c3|c4|c1|c3|c4|c1|c3|c4
Which will give you (output of test_colors):
0    1    2    3    4     <---- ranks
2    2    3    3    3     <---- depths
c1   c1   c1   c1   c1
c2   c2   c3   c3   c3
          c4   c4   c4
-- integers --
0    1    2    3    4     <---- ranks
2    2    3    3    3     <---- depths
0    0    0    0    0
1    1    2    2    2
          3    3    3