I have created an image on Grid5000 for Opteron-based
clusters featuring QCG-OMPI and various tools. The name of the
image is QosCosGrid_XP and it is registered under my account
(ccoti).
As a reminder, if you want to reserve machines and deploy an
image on them, you must issue your reservation in the "deploy"
queue. For example, if you want N machines during H hours in
interactive mode, on the gdx cluster:
$ oarsub -I -l nodes=N,walltime=H -t deploy -p "cluster='gdx'"
To make an advance reservation instead, replace the -I (for
"interactive") option with -r "<date and time you want your
reservation to start>". For example:
$ oarsub -r "2009-05-02 20:00:00" -l nodes=N,walltime=H -t deploy -p "cluster='gdx'"
To be allowed to use the machines, you must be connected
within the reservation. When you reserve machines in interactive
mode, the shell that issued the request is part of the
reservation: it is "holding" it, which means that if you close
this shell, you lose your reservation and your machines reboot.
To connect to a reservation from another shell, obtain the
reservation number with oarstat, and connect to it:
$ oarsub -C <reservation number>
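For example, to list your jobs and then connect to one of them
(the job number 123456 below is made up):
$ oarstat -u ccoti
$ oarsub -C 123456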
Then once your reservation starts you can deploy the image as
follows:
$ kadeploy -f $OAR_FILE_NODES -l ccoti -e QosCosGrid_XP
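$OAR_FILE_NODES points to a file containing the hostnames of
your reserved nodes, one per line (a node can appear several
times, once per CPU). It looks like this (the hostnames are just
an example):
gdx-182.orsay.grid5000.fr
gdx-183.orsay.grid5000.fr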
Not all the nodes may deploy correctly. You will see the
status of your machines at the end of the deployment. A machine
that fails to deploy is usually not a serious problem: sometimes
you have to redeploy some machines 2 or 3 times before they are
all up and running. It can be caused by a machine taking too
long to reply to a ping or to reboot, etc.
If you want to make changes to the image, you will have to
save it and register it under your own account. To save the
image, log in on the front-end machine of the cluster and use
ssh and tar. If the image you want to save is on gdx-182, you
will type:
$ ssh root@gdx-182 "tar --posix --numeric-owner --one-file-system -zcf - /" > images/qcgompi.tgz
The simplest way to register an environment is to write a
file that describes its characteristics. You will find one in
my home directory: /home/orsay/ccoti/save_env.
Pass this file to the environment registration tool as
follows:
$ karecordenv -fe save_env
/!\ Be very careful when specifying the paths to the kernel
and the initrd, otherwise your machines will not be able to
boot.
If they don't boot and you don't understand why, you can open
a remote console on a machine with kaconsole:
$ kaconsole -m <machine name>
Don't do this in an important shell (like the one that is
holding your reservation), because it does not return.
The image has a user account and, of course, a super-user
account. The passwords are the usual ones; ask me if you don't
know them. The user account can use sudo with no password.
I installed QCG-OMPI in the ~/qcgompi folder and some tools
in the ~/tools folder. You can also find vi(m), emacs, gcc, g77,
g++, gdb and the other usual tools on the machine. It also has
NFS server/client modules. To deploy your own NFS in synchronous
mode:
- on the server:
$ sudo exportfs -o rw,sync \*:/home
- to mount your NFS partition on the clients, listed in a
machinefile (let's call the server gdx-XXX):
$ for i in $(cat machinefile) ; do ssh $i sudo mount gdx-XXX:/home /home ; done
Please don't try to deploy an NFS on the whole grid, but
stick to a '1 cluster = 1 NFS' rule.
/!\ Remove the server from the machinefile, otherwise it will
try to mount /home on /home, masking the "real" /home
underneath. If you do this, there is nothing to do but
re-deploy.
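A simple way to build the machinefile without the server is to
filter it out of the node list, e.g. if the server is gdx-XXX:
$ grep -v gdx-XXX $OAR_FILE_NODES | sort -u > machinefile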
You will have to deploy the grid infrastructure yourself
(on the QCG testbed, I am usually the one who deploys it). A
deployment script is in ~/INRIA-dynamic. You have to pass it a
configuration file that describes your clusters. It follows
the usual format: you describe each cluster one by one, and
list the machines in each cluster.
The description syntax is the following:
<cluster #> <# of machines> <interconnection technique> <front-end> <broker> <proxy's IP> <port min> <port max>
<machine 1>
...
<machine n>
Interconnection techniques:
- 1 -> direct
- 2 -> traversing TCP
- 3 -> proxy
On Grid5000 all the ports are open (as long as you stay
inside g5k), so traversing TCP is equivalent to direct. Just
put anything for <port min> and <port max>, as long as it is
greater than 1024 (ports below that can only be used by the
super-user) and you change <port min> between two successive
deployments.
A little example:
1 5 1 qcg.lri.fr qcg.lri.fr 129.175.22.112 4010 4100
qcg.lri.fr
qcg1.lri.fr
qcg2.lri.fr
qcg3.lri.fr
qcg4.lri.fr
2 1 1 matrix.scic.ulst.ac.uk 129.175.22.112 25000 25100
matrix.scic.ulst.ac.uk
3 4 3 node2.qoscosgrid.man.poznan.pl qcg.lri.fr 129.175.22.112 5000 5100
node2.qoscosgrid.man.poznan.pl
node3.qoscosgrid.man.poznan.pl
node4.qoscosgrid.man.poznan.pl
node1.qoscosgrid.man.poznan.pl
Deploy the infrastructure with the following command:
$ qcg-run.sh -c <config file> -p /home/mpi/qcgompi/bin -m <port min> -M <port max>
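For example, with the configuration above saved in a file
called grid.conf (the file name and the port range are just an
illustration; I assume -m/-M take the same range you put in the
configuration):
$ qcg-run.sh -c grid.conf -p /home/mpi/qcgompi/bin -m 4010 -M 4100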
You will have to do this each time you deploy the machines,
but not each time you run an application. The infrastructure is
persistent but it must be deployed manually when the machines
have just booted.
Once the infrastructure is deployed, you can have fun with
your applications, doing exactly what you do on the testbed. A
list of the available machines is in /tmp/machinefile, and the
MCA parameters you need to use are in /tmp/mcaparams.conf. Don't
forget to set the JOBID on every machine.
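For example, a launch can look like the line below; my_app and
the way the MCA parameter file is passed are assumptions on my
side, adapt them to what you usually do on the testbed:
$ mpirun --mca mca_param_files /tmp/mcaparams.conf -machinefile /tmp/machinefile -np 8 ./my_app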
You can play with colors (this is the definitive version). To
ensure compatibility with the other layers of the QCG stack,
colors must be strings (i.e., arrays of characters). I have put
an example of how to handle string colors in ~/test_colors.c.
Since MPI_Comm_split() takes an integer as a color, a
QCG_ColorToInt() function is provided. I extended the MPI API so
that it can be called like any MPI function. So far I have
implemented only the C binding, but if you need it in Fortran or
C++ I can implement the other bindings.
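To give you an idea, here is a minimal sketch of how string
colors can be used; the exact prototype of QCG_ColorToInt() is
an assumption on my side, ~/test_colors.c is the authoritative
example:

#include <stdio.h>
#include <mpi.h>

/* Assumption: QCG_ColorToInt() takes the string color and
 * returns an integer usable by MPI_Comm_split(); check
 * ~/test_colors.c for the real prototype. */
int QCG_ColorToInt(const char *color);

int main(int argc, char **argv)
{
    int rank, subrank;
    MPI_Comm subcomm;
    const char *color = "c1"; /* string color of this process */

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    /* Split the communicator exactly as with an integer color. */
    MPI_Comm_split(MPI_COMM_WORLD, QCG_ColorToInt(color), rank, &subcomm);
    MPI_Comm_rank(subcomm, &subrank);
    printf("world rank %d -> sub-communicator rank %d\n", rank, subrank);
    MPI_Comm_free(&subcomm);
    MPI_Finalize();
    return 0;
}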
The topology has to be passed in an environment variable
called QCG_TOPOLOGY. In the full QosCosGrid stack this variable
is set by the deployment mechanism, but on the image we only
have QCG-OMPI, so you will have to set it yourself. First give
the depths, separated by one pipe (|), then two pipes (||), and
then the list of colors, separated by one pipe. The example you
will find in ~/.bashrc on the image is:
export QCG_TOPOLOGY='3|3|3|3|3||c1|c2|c4|c1|c2|c4|c1|c3|c4|c1|c3|c4|c1|c3|c4'
It defines a depth of 3 for each of the 5 processes: the colors
are read depth-by-depth for each process, so process 0 has the
colors (c1,c2,c4), process 1 (c1,c2,c4), process 2 (c1,c3,c4),
and so on. You don't have to define this variable if you are not
using the topology discovery features.
/!\ It may not work for uneven topologies. I tested it
quickly and it seems to work, but this code does not work on
the QCG testbed (probably some uninitialized variable that does
not happen to have a suitable value on qcg). An example of an
uneven topology:
2|2|3|3|3||c1|c2|c1|c2|c1|c3|c4|c1|c3|c4|c1|c3|c4
Which will give you (output of test_colors):
0    1    2    3    4     <---- ranks
2    2    3    3    3     <---- depths
c1   c1   c1   c1   c1
c2   c2   c3   c3   c3
          c4   c4   c4
-- integers --
0    1    2    3    4     <---- ranks
2    2    3    3    3     <---- depths
0    0    0    0    0
1    1    2    2    2
          3    3    3