Christophe Cérin Research Activities

January 6th, 2009

Meta Grid Middleware and fully Decentralized Grid Middleware

— Posted by Christophe Cérin @ 11:17 am —

BonjourGrid is a meta Desktop Grid middleware, meaning that it can instantiate multiple Desktop Grid middlewares within the same infrastructure. The principle of the approach is to create, dynamically and in a decentralized way, a specific execution environment for each user, in which any type of application can be executed without any intervention from a system administrator. If one environment fails, the others are not affected.

Each user, sitting at a desktop machine in his office, can submit an application. Note that BonjourGrid can handle both bag-of-tasks applications and distributed applications with precedence constraints between tasks. BonjourGrid deploys a master (coordinator) locally on the user's machine and requests participants (workers). Negotiations then take place to select them. Using a publish/subscribe infrastructure, each machine publishes its state (idle, worker or master) whenever it changes, together with information about its local load or its usage cost, in order to provide useful metrics for selecting participants. Under these assumptions, the master can select a subset of worker nodes according to a strategy that balances the "power" of a node against the "price" of using it. The master and the set of selected workers form the Computing Element (CE), which executes, manages and controls the user's application. When a CE finishes the application, its master becomes free and returns to the idle state, and it releases all its workers, which also return to the idle state. When no application is submitted, all machines are in the idle state.
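The selection step and the idle/worker/master life cycle can be illustrated with a minimal Python sketch. The post does not give BonjourGrid's actual selection strategy, so the `score` formula, the `alpha` weight and every name below are hypothetical. In the real system, machine states arrive through Bonjour publish/subscribe rather than from a shared Python list.

```python
from dataclasses import dataclass
from enum import Enum


class State(Enum):
    IDLE = "idle"
    WORKER = "worker"
    MASTER = "master"


@dataclass
class Machine:
    name: str
    power: float       # hypothetical capacity metric (e.g. relative CPU speed)
    price: float       # hypothetical cost of using the machine
    load: float = 0.0  # local load in [0, 1], as published by the machine
    state: State = State.IDLE


def score(m: Machine, alpha: float) -> float:
    # Hypothetical trade-off: power still available on the node minus its
    # price weighted by alpha (the post does not give the real formula).
    return m.power * (1.0 - m.load) - alpha * m.price


def build_ce(master: Machine, pool: list[Machine], k: int,
             alpha: float = 0.5) -> list[Machine]:
    """Make `master` a master and recruit up to k idle workers into its CE."""
    master.state = State.MASTER
    idle = [m for m in pool if m is not master and m.state is State.IDLE]
    workers = sorted(idle, key=lambda m: score(m, alpha), reverse=True)[:k]
    for w in workers:
        w.state = State.WORKER
    return workers


def release_ce(master: Machine, workers: list[Machine]) -> None:
    """The application is finished: master and workers return to IDLE."""
    master.state = State.IDLE
    for w in workers:
        w.state = State.IDLE


pool = [Machine("a", 1.0, 0.1), Machine("b", 2.0, 0.5),
        Machine("c", 1.5, 0.0, load=0.5), Machine("d", 3.0, 2.0)]
ce = build_ce(pool[0], pool, k=2)
print([w.name for w in ce])  # ['d', 'b']
release_ce(pool[0], ce)
```

Here machine "c" loses despite being free of charge because half of its power is already consumed by local load. A different `alpha` shifts the balance between cheap and powerful nodes.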

The key idea of BonjourGrid is to rely on existing Institutional Desktop Grid middlewares and to orchestrate and coordinate multiple instances of them, i.e., multiple CEs, through a publish/subscribe system. Each CE is owned by the user who started the master on his machine, and it is responsible for executing one or more applications for that user. At the user level, user A (resp. B) deploys his application on his own machine and the execution appears to be local. Level 1 (the middleware layer) shows that, in fact, a CE with 4 (resp. 5) workers has been created dynamically, specifically for user A (resp. B). Level 0 shows that all machines are interconnected and available to any user.
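How several CEs can coexist on top of a publish/subscribe system is shown by the toy, in-process Python sketch below. The `Bus` class is a hypothetical stand-in for Bonjour, and all names are illustrative. Each master keeps its own view of machine states, built only from published events, so no central server is involved. Unlike the real system, the sketch is sequential and therefore ignores the race in which two masters try to recruit the same idle machine at the same moment.

```python
from collections import defaultdict
from typing import Callable


class Bus:
    """Toy in-process publish/subscribe bus (hypothetical stand-in for Bonjour)."""

    def __init__(self) -> None:
        self._subs: dict[str, list[Callable[[str, str], None]]] = defaultdict(list)

    def subscribe(self, topic: str, callback: Callable[[str, str], None]) -> None:
        self._subs[topic].append(callback)

    def publish(self, topic: str, machine: str, state: str) -> None:
        for callback in self._subs[topic]:
            callback(machine, state)


class Directory:
    """A master's local view of machine states, fed only by published events."""

    def __init__(self, bus: Bus) -> None:
        self.states: dict[str, str] = {}
        bus.subscribe("state", self.on_state)

    def on_state(self, machine: str, state: str) -> None:
        self.states[machine] = state

    def idle(self) -> list[str]:
        return sorted(m for m, s in self.states.items() if s == "idle")


def start_ce(bus: Bus, view: Directory, master: str, k: int) -> list[str]:
    """Announce a new master, then recruit up to k idle machines as workers."""
    bus.publish("state", master, "master")
    workers = view.idle()[:k]
    for w in workers:
        bus.publish("state", w, "worker")
    return workers


def stop_ce(bus: Bus, master: str, workers: list[str]) -> None:
    """Application finished: master and workers publish that they are idle."""
    for m in [master, *workers]:
        bus.publish("state", m, "idle")


bus = Bus()
view_a, view_b = Directory(bus), Directory(bus)  # one view per user machine
for i in range(1, 7):
    bus.publish("state", f"m{i}", "idle")
ce_a = start_ce(bus, view_a, "m1", 2)  # user A's CE
ce_b = start_ce(bus, view_b, "m6", 2)  # user B's CE
```

User B's master never talks to user A's master: it simply sees, from the published events, that m2 and m3 are already workers and recruits m4 and m5 instead.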

To realize this approach, we decompose BonjourGrid into three fundamental parts: a) a fully decentralized resource discovery layer, based on the Bonjour protocol; b) a CE, using DG middlewares such as XtremWeb (XW), Condor or Boinc, which executes and manages the various tasks of the applications; c) a fully decentralized coordination protocol between a) and b) to manage and control all resources, services and CEs.

BonjourGrid has been deployed and tested on Grid'5000 over 1000 nodes (thanks to virtualization), and we have demonstrated how to use it as a job scheduler. BonjourGrid is written entirely in Python.

Current work focuses on making the coordinator fault tolerant. Indeed, it is important to continue executing the application when the coordinator (the user's machine) fails, for instance when it is disconnected. The second issue is the reservation of participants: in the current version, BonjourGrid allocates available resources to a user without any reservation rules. Thus, if a user requests all the available machines for a long time, BonjourGrid allocates them all to him. The third issue is extending BonjourGrid to wide area networks. The current version works only on a local network infrastructure, and it is important to lift this constraint. Integrating the new Bonjour package from Apple (Wide Area Bonjour) seems to be a good way to solve this problem.

PastryGrid is another Grid middleware, but it proposes a decentralized system for managing Desktop Grids (DG). The idea is to avoid the main drawback of existing systems, which put all the control on a single master that can fail. Here, each node can alternately play the role of client or server. Our main contribution is the design of the PastryGrid protocol (based on Pastry, an overlay network) for DGs, in order to decentralize the execution of a distributed application with precedence constraints between tasks. The protocol is close to Vigne but extends it in many directions. For instance, the selection of the "next computing node" is fully decentralized and built "on the fly". We evaluate our approach against a centralized system on 205 machines executing 2500 tasks. The results show that our decentralized system performs better than the same system configured as master/slave, because it incurs less overhead.
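The post does not detail how PastryGrid picks the next computing node. The Python sketch below only illustrates the underlying Pastry idea: a key is delivered to the live node whose 128-bit identifier is numerically closest to it on a circular id space. That idea is combined with precedence-driven scheduling of a task graph. Several assumptions apply. A global list of node ids replaces Pastry's prefix routing. Tasks are mapped to hosts by hashing their names. The names `pastry_id`, `closest_node` and `run_dag` are hypothetical. It is not the PastryGrid protocol.

```python
import hashlib

ID_BITS = 128          # Pastry node ids and keys are 128-bit integers
RING = 1 << ID_BITS    # the id space is circular


def pastry_id(name: str) -> int:
    """Map a node or task name to a 128-bit id (an MD5 digest is 128 bits)."""
    return int.from_bytes(hashlib.md5(name.encode()).digest(), "big")


def closest_node(key: int, ids: list[int]) -> int:
    """Return the id numerically closest to key on the circular id space."""
    def ring_dist(n: int) -> int:
        d = abs(n - key)
        return min(d, RING - d)
    return min(ids, key=lambda n: (ring_dist(n), n))  # ties go to the smaller id


def run_dag(preds: dict[str, set[str]], nodes: list[str]) -> list[tuple[str, str]]:
    """Sequentially simulate a task graph; return (task, host) in run order.

    A task becomes ready once all its predecessors are done. Its host is the
    node closest to the task's key, so no central master is consulted.
    """
    by_id = {pastry_id(n): n for n in nodes}
    done: set[str] = set()
    ready = sorted(t for t, p in preds.items() if not p)
    order: list[tuple[str, str]] = []
    while ready:
        task = ready.pop(0)
        host = by_id[closest_node(pastry_id(task), list(by_id))]
        order.append((task, host))
        done.add(task)
        # The node that just finished checks which successors are now ready.
        for t, p in sorted(preds.items()):
            if t not in done and t not in ready and p <= done:
                ready.append(t)
    return order
```

For a diamond-shaped graph `{"A": set(), "B": {"A"}, "C": {"A"}, "D": {"B", "C"}}`, `run_dag` runs A first, then B and C, and D only after both have completed. Each task's host is derived from its own key.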

People involved: Heithem Abbes, Mohamed Jemni, Christophe Cérin

August 19th, 2011

Middleware for Clouds: the Resilience Project

— Posted by Christophe Cérin @ 2:13 pm —
The SlapOS Manifesto by Jean-Paul Smets – Romain Courteaud – Christophe Cérin – Rafael Monerat – Luke Nowak – Cédric Saint Martin.
  • Cloud Monopolies raise growing concern for citizen privacy, business freedom and technical resilience of the information society. SlapOS is a community project driven by Free Software pioneers which aims at powering the information society with a Cloud Computing operating system respectful of the ideals of Freedom, Competition and Resilience of the Internet as they were imagined by Louis Pouzin and Vinton Cerf.
  • While most open source Cloud Computing initiatives focus on technology, SlapOS is a technology-neutral, lightweight operating system which primarily focuses on disrupting how we do business on the Cloud. SlapOS implements the concept of Free Market and Free Competition at the core of every end-point of the Cloud. SlapOS helps everyone do business: it is an accounting and billing system which can turn any existing application or technology into SaaS, PaaS or IaaS in a matter of hours.
  • Thanks to SlapOS, everyone can now start competing with Google, Salesforce, Amazon or Microsoft and provide thousands of alternatives to Cloud Monopolies. The more companies and citizens join the SlapOS community, the more resilience, privacy and freedom irrigate the Information Society.
Join the SlapOS community!

August 19th, 2011

High Performance Computing: efficient estimation of properties of large scale logical networks

— Posted by Christophe Cérin @ 12:00 am —

As the number of processors embedded in high performance computing platforms keeps growing, developers must improve the scalability of their codes in order to exploit all the resources of these platforms. This often requires new algorithms, techniques and methods for code development that give the application new properties: the presence of faults is no longer an occasional event but a challenge in its own right.

Scalability and fault-tolerance issues are also present in hidden parts of any platform: the overlay network that must be built to control the application, and the runtime support for messaging, which must also be scalable and fault tolerant. In this project, we focus on the computational challenges of experimenting with large scale (many millions of nodes) logical topologies. We compute fault-tolerance properties of different variants of Binomial Graphs (BMG) that are generated at random. For instance, we exhibit interesting properties relating the number of links to some desired fault-tolerance properties, and we compare different metrics using the Binomial Graph structure as the reference.
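To make the reference structure concrete, here is a minimal Python sketch that builds a Binomial Graph and computes, exactly, three of the metrics named in this post: diameter, average distance, and the diameter after some nodes have failed. It assumes the definition in which node i is linked to nodes i ± 2^k (mod n) for every 2^k < n. The function names are hypothetical, and the sketch does not reproduce the randomly generated BMG variants studied in the project. The exact method runs one breadth-first search per node, costing O(n·m), which is what becomes impractical at millions of nodes.

```python
from collections import deque


def binomial_graph(n: int) -> dict[int, set[int]]:
    """BMG on n nodes: node i is linked to (i + 2**k) % n and (i - 2**k) % n
    for every power of two 2**k < n; duplicate links collapse in the set."""
    adj: dict[int, set[int]] = {i: set() for i in range(n)}
    step = 1
    while step < n:
        for i in range(n):
            adj[i].add((i + step) % n)
            adj[i].add((i - step) % n)
        step *= 2
    return adj


def bfs(adj, src):
    """Hop distances from src to every node reachable from it."""
    dist = {src: 0}
    queue = deque([src])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist


def diameter_and_avg(adj):
    """Exact diameter and average distance (n >= 2), one BFS per node."""
    diam, total, pairs = 0, 0, 0
    for s in adj:
        d = bfs(adj, s)
        if len(d) < len(adj):
            return float("inf"), float("inf")  # the graph is disconnected
        diam = max(diam, max(d.values()))
        total += sum(d.values())
        pairs += len(d) - 1
    return diam, total / pairs


def remove_nodes(adj, failed):
    """Graph left after the nodes in `failed` crash (used for fault diameter)."""
    return {u: nbrs - failed for u, nbrs in adj.items() if u not in failed}
```

For n = 8, node 0 is linked to {1, 2, 4, 6, 7}; the diameter is 2, and it is still 2 after node 1 fails.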

A software tool has been developed for this study, and we show experimental results with topologies containing 21000 nodes. We also explain the computational challenge of dealing with such large scale topologies, and we introduce various probabilistic algorithms for computing the conventional metrics (average distance, message density, node connectivity, link connectivity, diameter and fault diameter).
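The Python sketch below shows a common probabilistic shortcut: run breadth-first search from a small random sample of source nodes rather than from all n nodes. The mean of the sampled distances estimates the average distance, and the largest sampled eccentricity gives a lower bound on the diameter. This is a standard sampling technique shown for illustration. It is not necessarily one of the algorithms introduced in this work, and the function names and parameters are hypothetical.

```python
import random
from collections import deque


def circulant(n, offsets):
    """Node i is linked to (i + o) % n and (i - o) % n for each offset o.
    A Binomial Graph is the circulant graph with offsets 1, 2, 4, ... < n."""
    return {i: {(i + s * o) % n for o in offsets for s in (1, -1)} - {i}
            for i in range(n)}


def bfs(adj, src):
    """Hop distances from src to every node reachable from it."""
    dist = {src: 0}
    queue = deque([src])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist


def sampled_metrics(adj, samples, seed=0):
    """Run BFS from `samples` random sources instead of from all n nodes.

    Returns (estimated average distance, lower bound on the diameter).
    Cost is O(samples * m) instead of O(n * m). Assumes a connected graph.
    """
    rng = random.Random(seed)
    sources = rng.sample(list(adj), min(samples, len(adj)))
    total = pairs = ecc_max = 0
    for s in sources:
        d = bfs(adj, s)
        total += sum(d.values())
        pairs += len(d) - 1
        ecc_max = max(ecc_max, max(d.values()))
    return total / pairs, ecc_max
```

For example, `sampled_metrics(circulant(1 << 12, [1 << k for k in range(12)]), samples=16)` needs 16 searches instead of 4096. On a vertex-transitive graph such as a BMG, every source sees the same distances, so the estimate is exact. On randomly generated variants, its accuracy depends on the number of samples.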
