Response to the

GrADS Inter-Module Communication Problem

A Case for the use of Services rather than Libraries

 

Dave Angulo

University of Chicago

Argonne National Laboratory

 

Gregor von Laszewski

Mathematics and Computer Science Division

Argonne National Laboratory

Ian Foster

Mathematics and Computer Science Division

Argonne National Laboratory

Problem

The GrADS project comprises several large modules that are interconnected through the exchange of information.  Traditionally, such information is exchanged through function-call APIs to libraries.  We advocate here a more sophisticated and more widely accessible method that facilitates debugging and helps ensure that the modules work together as a whole, without increasing development effort or discernibly increasing overhead.  This technique is based on standard Internet-based protocols used together with standardized data-formatting tools.

Requirements

The GrADS project has several large functional components, each of which will be developed in isolation at a different, geographically dispersed site.  A major concern is therefore that the interconnection of these components might fail: the project will suffer a major setback if the output prepared by the developers of one module does not correspond to the input expected by the developers of the next.

A second major concern is imposed on the project by the applications groups (which include the developers of Cactus applications).  These developers have stressed strongly that they must be able to intervene manually at the interface between each pair of components.  They require that each component be treated as a service and not as a library (according to the head of the applications team, Dr. Gannon of Indiana University).

Example Scenario

It is useful at this point to describe one of the scenarios given for communication between a pair of these modules.  The example involves the module that performs scheduling and resource selection and the module that performs performance prediction (called the Selector and the Predictor herein for the sake of brevity).  The Selector passes a selection of nodes to the Predictor and asks for a prediction of performance.  This is done iteratively until the Predictor returns a prediction that fits the requirements of the Selector.

The communication between this pair of modules consists of the Selector sending a connected set of nodes to the Predictor for analysis.  The Predictor then replies with a performance prediction.  The connected set of nodes will be sent (abstractly) as a weighted directed acyclic graph.  The weights on the connections represent the communication cost.  The performance prediction might be a single numeric entry.
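
The iterative exchange described above can be sketched in a few lines of Python.  This is only an illustration under stated assumptions, not the GrADS design: the `select` and `toy_predict` names, the dictionary representation of a candidate, and the reading of the prediction as a run time that must not exceed a limit are all hypothetical.

```python
from typing import Callable, Dict, Iterable, Optional, Tuple

# Hypothetical candidate: maps a directed link (origin, destination) to its
# communication cost.  The node set is implied by the link endpoints.
Candidate = Dict[Tuple[str, str], float]


def select(candidates: Iterable[Candidate],
           predict: Callable[[Candidate], float],
           max_seconds: float) -> Optional[Tuple[Candidate, float]]:
    """Offer candidates to the Predictor until one prediction is acceptable."""
    for candidate in candidates:
        predicted = predict(candidate)  # one request/reply exchange
        if predicted <= max_seconds:
            return candidate, predicted
    return None  # no candidate met the requirement


def toy_predict(candidate: Candidate) -> float:
    """Stand-in Predictor (assumption): total communication cost."""
    return sum(candidate.values())


candidates = [
    {("a", "b"): 9.0, ("b", "c"): 4.0},  # predicted 13.0, rejected
    {("a", "c"): 5.0},                   # predicted 5.0, accepted
]
result = select(candidates, toy_predict, max_seconds=6.0)
```

Because the Predictor is passed in as a plain callable, the same loop works whether the prediction comes from a local function or from a remote service.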

The challenge is to find a means of representing this weighted directed acyclic graph that is efficient, verifiable by the team members for both modules (in isolation from each other), human readable (to fulfill the requirements of Dr. Gannon’s applications team), and that does not increase development effort.

Choices

There are several potential technologies that can be used to transmit data between the modules.  These technologies include (1) traditional function-call APIs to access libraries, (2) Internet services encapsulated in CORBA objects, and (3) standard Internet-based protocols used together with standardized data-formatting tools.  We examine the benefits and drawbacks of each approach.

Traditional APIs fail to meet the requirements of this project in several ways.  First, they do not lend themselves to being invoked from a different physical machine, which should be a high priority in a distributed environment.  In particular, when the subsystems need to be presented as services available to all, their entry and exit points should not be hidden in API calls that are inaccessible to users on other machines.

Further, using traditional function calls limits the interoperability with modules written in other languages.  Libraries written in C++ are difficult to call from Fortran programs, and vice versa, thus further limiting the utility of the modules as services.

Finally, traditional APIs do not lend themselves to enforcing the data format agreed upon by the two teams of developers.  Directed acyclic graphs (from our example) could be put into data structures, but these are not human readable (contrary to the requirements of the Cactus programmers).

The benefits of traditional function calls are few.  They are well understood by programmers, however.  Programmers may thus feel more comfortable using this technique.

CORBA comes closer to providing a general-purpose service.  CORBA objects are accessible as services over the Internet and thus lend themselves to human intervention at the interfaces.  CORBA objects are still limiting, though, and their limitations are orthogonal to the language limitations imposed by APIs.  CORBA’s limitations are platform based, as CORBA objects are not readily accessible from Microsoft platforms.  Additionally, since the specifications are ambiguous, vendors’ implementations are not always compatible, and many vendors have added extensions to the standard.  CORBA objects and IDL specifications are also somewhat difficult and outside the comfort zone of many programmers.  Moreover, CORBA interfaces cannot easily be inspected, although custom objects could be created to allow inspection at each module interface.

Standard protocols combined with standard data-formatting tools overcome many of the difficulties of the other two alternatives.  Since we wish to use standard Internet protocols, the modules are inherently packaged as services available to all.  These protocols confer language and platform independence.  Since we limit our choices to protocols that are standardized, there will be many tools available for humans to inspect or intervene at the module interface boundaries.  Additionally, the choice of standardized protocols implies that the available tools will aid the development teams in debugging and in ensuring that the interfaces on both sides match each other, even though they are developed in isolation.

The use of standard data-formatting tools ensures that tools will be available to enforce rigid adherence to a compatible format.  This further helps to guarantee that the interfaces match when the modules are finally put together.  Standard data-formatting tools likewise give humans the ability to inspect and intervene at the module interfaces, because these tools allow the data to be displayed and entered in a human readable format.  This aids program development and debugging, and it gives the Cactus programmers their required access.

There are two potential objections to this approach.  The first is the question of overhead, and the second is whether the learning curve is too steep for programmers.  It is true that this option adds a slight amount of overhead, but most of that overhead lies in opening a socket.  The technology involved is not daunting, and with Internet technology becoming so ubiquitous, it can be assumed that most programmers will have been exposed to similar techniques.

Specifics

A specific standard must be selected for both the protocol and the data-formatting tool.  We propose HTTPS as the protocol of choice.  This protocol is well understood, and many tools are available to allow inspection of, or intervention in, the communications.

Next, we turn to selecting a standard data-formatting tool.  This tool may not be needed for all communications between modules.  Specifically, if an executable is being sent as the only form of communication, it can be sent as part of the HTTP request as a MIME-typed object.  But when the Selector sends a directed acyclic graph to the Predictor, it is essential to format that data in a standardized, human readable way and to ensure that the data sent adheres to strict formatting rules.
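
As a rough illustration of such an exchange, the following Python sketch has a toy Predictor accept a graph over HTTP and reply with a single number.  It is a sketch under stated assumptions only: it uses plain HTTP on the loopback interface rather than HTTPS, and the `/predict` path, the `PredictorHandler` and `request_prediction` names, and the use of the node count as the "prediction" are hypothetical placeholders.

```python
import threading
import urllib.request
import xml.etree.ElementTree as ET
from http.server import BaseHTTPRequestHandler, HTTPServer


class PredictorHandler(BaseHTTPRequestHandler):
    """Toy Predictor (assumption): replies with the number of posted nodes."""

    def do_POST(self):
        length = int(self.headers.get("Content-Length", 0))
        graph = ET.fromstring(self.rfile.read(length))
        prediction = str(len(graph.findall("./nodes/node"))).encode("ascii")
        self.send_response(200)
        self.send_header("Content-Type", "text/plain")
        self.send_header("Content-Length", str(len(prediction)))
        self.end_headers()
        self.wfile.write(prediction)

    def log_message(self, format, *args):
        pass  # keep the example quiet


def request_prediction(url, graph_xml):
    """Selector side: POST the graph and parse the numeric reply."""
    request = urllib.request.Request(
        url, data=graph_xml, headers={"Content-Type": "application/xml"})
    # An empty ProxyHandler bypasses any proxy configured in the environment.
    opener = urllib.request.build_opener(urllib.request.ProxyHandler({}))
    with opener.open(request, timeout=5) as response:
        return float(response.read().decode("ascii"))


server = HTTPServer(("127.0.0.1", 0), PredictorHandler)  # port 0: any free port
threading.Thread(target=server.serve_forever, daemon=True).start()
try:
    url = "http://127.0.0.1:%d/predict" % server.server_address[1]
    graph = b'<dag><nodes><node id="113"/><node id="242"/></nodes></dag>'
    prediction = request_prediction(url, graph)
finally:
    server.shutdown()
    server.server_close()
```

Because the request body is ordinary text sent over a standard protocol, the same exchange can be reproduced by hand with any generic HTTP client, which is the kind of human intervention the applications team asked for.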

For these purposes, XML seems an ideal tool.  It is a well-established standard and has tools available for creation, verification, and display on all platforms.  Since many people are familiar with HTML, XML is quite easy to learn.  The main difficulty with XML lies in writing the Document Type Definition (DTD), but since that needs to be done only once, aid from an expert can be enlisted.
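
To suggest how little effort creation requires, here is a minimal Python sketch that builds a graph document with the standard library.  The `build_dag` helper is hypothetical, and the element and attribute names simply follow the hypothetical markup used in this paper's example; none of this is a fixed GrADS format.

```python
import xml.etree.ElementTree as ET


def build_dag(nodes, connections):
    """Build a <dag> element from plain dictionaries of attributes."""
    dag = ET.Element("dag")
    nodes_el = ET.SubElement(dag, "nodes")
    for attrs in nodes:
        # XML attribute values must be strings, so convert every value.
        ET.SubElement(nodes_el, "node", {k: str(v) for k, v in attrs.items()})
    conns_el = ET.SubElement(dag, "connections")
    for attrs in connections:
        ET.SubElement(conns_el, "connect",
                      {k: str(v) for k, v in attrs.items()})
    return dag


dag = build_dag(
    nodes=[
        {"id": 113, "domain": "fermat.cs.uchicago.edu", "mflops": 166},
        {"id": 242, "domain": "euler.cs.uchicago.edu", "mflops": 233},
    ],
    connections=[
        {"id": 747, "idorigin": 113, "idterminate": 242, "mbpersec": 50},
    ],
)
document = ET.tostring(dag, encoding="unicode")  # serialized as a str
```

The resulting `document` string is what one module would hand to the transport layer, and it can be opened in any text editor or XML viewer for inspection.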

Use of XML is preferred over the creation of a new domain-specific language.  The reason is that tools already exist to verify adherence to the language rules in XML, whereas for a new language they would have to be created and would not be available on all platforms.  The Globus team has recognized this fact and has an XML alternative to its Resource Specification Language.

An example is the best way to explain why XML could be so useful.  The following hypothetical markup describes a very simple directed acyclic graph:

<dag>
  <nodes>
    <node id="113"
          domain="fermat.cs.uchicago.edu"
          mflops="166"
          mbmemory="128"/>
    <node id="242"
          domain="euler.cs.uchicago.edu"
          mflops="233"/>
  </nodes>
  <connections>
    <connect id="747"
             idorigin="113"
             idterminate="242"
             mbpersec="50"/>
  </connections>
</dag>
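
The receiving module could check a document of this shape before using it.  The Python sketch below is one possible approach under stated assumptions: the `load_dag` and `check_acyclic` helpers are hypothetical, `mbmemory` is treated as optional because the second node omits it, and `mbpersec` is taken as the connection weight.

```python
import xml.etree.ElementTree as ET

EXAMPLE = """<dag>
  <nodes>
    <node id="113" domain="fermat.cs.uchicago.edu" mflops="166" mbmemory="128"/>
    <node id="242" domain="euler.cs.uchicago.edu" mflops="233"/>
  </nodes>
  <connections>
    <connect id="747" idorigin="113" idterminate="242" mbpersec="50"/>
  </connections>
</dag>"""


def check_acyclic(nodes, edges):
    """Kahn's algorithm: repeatedly remove nodes with no incoming edges."""
    incoming = {node: 0 for node in nodes}
    for _, dest in edges:
        incoming[dest] += 1
    ready = [node for node, count in incoming.items() if count == 0]
    removed = 0
    while ready:
        node = ready.pop()
        removed += 1
        for origin, dest in edges:
            if origin == node:
                incoming[dest] -= 1
                if incoming[dest] == 0:
                    ready.append(dest)
    if removed != len(nodes):  # leftover nodes all lie on a cycle
        raise ValueError("graph contains a cycle")


def load_dag(text):
    """Return (nodes, edges); raise ValueError if the graph is malformed."""
    root = ET.fromstring(text)
    nodes = {n.get("id"): dict(n.attrib) for n in root.findall("./nodes/node")}
    edges = {}
    for c in root.findall("./connections/connect"):
        origin, dest = c.get("idorigin"), c.get("idterminate")
        if origin not in nodes or dest not in nodes:
            raise ValueError("connection %s names an unknown node" % c.get("id"))
        edges[(origin, dest)] = float(c.get("mbpersec"))
    check_acyclic(nodes, edges)
    return nodes, edges


nodes, edges = load_dag(EXAMPLE)
```

Each team can run a check of this kind against sample documents on its own, which is how agreement on the interface can be verified while the modules are still developed in isolation.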

 

Conclusions

We believe that standardized protocols and data formatting tools will ease the development task of the GrADS modules and will help to ensure that these modules, developed in isolation, will interact correctly with each other (avoiding the English Channel tunnel mishap).  We also believe that utilization of this methodology will allow each module to be used as a service, fulfilling the requirements of the Cactus developers.