Science Application

The purpose of the science application is to execute a project-specific algorithm on a work unit in order to produce result data that the project can later analyze. The science application is unusual in that it is used in two different ways.

The primary use of the science application is to execute a project's science algorithm on each volunteer's computer. The project client acts as a proxy between the project server and the science application: it requests work units from the project server, starts the science application, and sends it each work unit. The client then waits to receive the result, which it returns to the project server. The client instructs the science application to compute the result for a work unit by executing an XML-RPC on the science application, so the science application must implement an XML-RPC server.

The work unit generator and result validator do not need an XML-RPC server because they regularly poll the project server instead of being contacted by it. Polling is not a good fit for the science application, for two reasons. First, if the science application crashed while computing the result for a work unit, a project client that was simply waiting for the science application to send a result would have no way to detect the crash. If instead the project client initiates the connection by executing a synchronous XML-RPC, a crash causes the connection to be reset, and the project client is notified by a Java exception. The client can then restart the science application. The second reason relates to the second use of the science application: computing the accepted spot-check results on the computer where the project server is running.

If the project uses spot-checks, the project server starts the science application and instructs it to compute the results for one or more work units, which then become the spot-check work units and results. The figure below shows an overview of the science application's control flow for both of its uses:

Science Application Control Flow
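The first reason for avoiding polling can be illustrated from the client's side. The following is a hypothetical Java outline, not the SLINC project client's actual code: ScienceAppConnection stands in for a synchronous XML-RPC call to sciapp.computeResult, ScienceAppLauncher stands in for whatever starts the science application, and an IOException is assumed to be how the XML-RPC library reports a reset connection.

```java
import java.io.IOException;

// Hypothetical stand-in for a synchronous XML-RPC call to sciapp.computeResult.
interface ScienceAppConnection {
    byte[] computeResult(byte[] workUnit) throws IOException;
}

// Hypothetical hook that starts (or restarts) the science application.
interface ScienceAppLauncher {
    ScienceAppConnection start() throws IOException;
}

final class ResultRequester {
    private final ScienceAppLauncher launcher;
    private final int maxRestarts;

    ResultRequester(ScienceAppLauncher launcher, int maxRestarts) {
        if (maxRestarts < 0) {
            throw new IllegalArgumentException("maxRestarts must be >= 0");
        }
        this.launcher = launcher;
        this.maxRestarts = maxRestarts;
    }

    /** Blocks until a result arrives, restarting the science application after each failure. */
    byte[] requestResult(byte[] workUnit) throws IOException {
        IOException last = null;
        for (int attempt = 0; attempt <= maxRestarts; attempt++) {
            ScienceAppConnection app = launcher.start();
            try {
                // Synchronous call: if the science application dies mid-computation,
                // the reset connection surfaces here as an exception instead of
                // leaving the client waiting forever for a result.
                return app.computeResult(workUnit);
            } catch (IOException e) {
                last = e; // crash detected; loop around and restart the application
            }
        }
        throw last; // non-null: the loop ran at least once and every attempt failed
    }
}
```

Because the call blocks until either a result or an error arrives, the client never has to guess whether a silent science application is still working or has crashed.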

When the science application is started, it should immediately initialize its XML-RPC server and begin listening for XML-RPCs. Example code for creating an XML-RPC server can be found in the Original Components section and in the example directory of the latest framework release. There are three RPC handlers that the science application must implement. Before explaining the implementation of these RPCs, it is necessary to discuss an important feature of SLINC: check-pointing.

If the science application is shut down before it finishes a computation and returns a result, the computation would normally have to start again from the beginning the next time the science application is started. This is wasteful when computing the result for a work unit takes more than a few minutes. Check-pointing addresses this problem by allowing the science application to periodically save its state and to retrieve that state the next time it is started. This saved state information is called a check-point. By periodically saving check-points, the science application can resume a previously started computation by retrieving the last check-point and initializing the science algorithm to begin at the appropriate point in the work unit data. Check-points could be saved by writing a file to disk and retrieved by reading that file, but the project client provides RPCs that perform these functions, so the science application does not need to perform disk I/O itself.
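Because a check-point is just an opaque byte array, the science application decides what state to save and how to encode it. As a minimal sketch, assume a hypothetical algorithm that processes the work unit record by record while keeping a running total; its state then consists of the next record index and the partial sum, which the java.io data streams pack into 12 bytes. CheckpointState is an illustrative name, not part of SLINC.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;

/** Hypothetical check-point state: where to resume and what has been computed so far. */
final class CheckpointState {
    final int nextRecord;    // index of the first record not yet processed
    final double runningSum; // partial result accumulated over records [0, nextRecord)

    CheckpointState(int nextRecord, double runningSum) {
        this.nextRecord = nextRecord;
        this.runningSum = runningSum;
    }

    /** Encodes the state as a byte array suitable for a check-point (4 + 8 bytes). */
    byte[] toBytes() {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(buffer)) {
            out.writeInt(nextRecord);
            out.writeDouble(runningSum);
        } catch (IOException e) {
            // Writing to an in-memory buffer does not fail in practice.
            throw new IllegalStateException(e);
        }
        return buffer.toByteArray();
    }

    /** Rebuilds the state from the bytes of a previously saved check-point. */
    static CheckpointState fromBytes(byte[] data) throws IOException {
        try (DataInputStream in = new DataInputStream(new ByteArrayInputStream(data))) {
            return new CheckpointState(in.readInt(), in.readDouble());
        }
    }
}
```

Keeping the encoding in one class means the save and resume paths cannot drift apart in their byte layout.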

To save a check-point, the science application should execute the client.saveCheckpoint RPC on the project client, which always listens for connections on the localhost (127.0.0.1) address. This RPC takes a single byte array as its parameter: the data to be saved in the check-point. To retrieve a previously saved check-point, the science application can execute the client.getCheckpoint RPC on the project client, which takes no parameters. This RPC returns a vector containing a byte array. If the byte array has length 0, the project client did not find a previously saved check-point; otherwise, the array contains the data from the last check-point that was saved.
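The length-0 convention for client.getCheckpoint can be handled in one place. In this sketch, CheckpointClient is a hypothetical interface whose two methods stand in for the XML-RPC calls to the project client on 127.0.0.1 (the XML-RPC library calls themselves are omitted), and the check-point is assumed to hold only a 4-byte record index.

```java
import java.nio.ByteBuffer;
import java.util.Vector;

/** Hypothetical wrapper; each method would issue an XML-RPC to the project client. */
interface CheckpointClient {
    void saveCheckpoint(byte[] data);   // stands in for client.saveCheckpoint
    Vector<Object> getCheckpoint();     // stands in for client.getCheckpoint
}

final class Checkpoints {
    private Checkpoints() {}

    /** Returns the saved check-point bytes, or null if the project client found none. */
    static byte[] load(CheckpointClient client) {
        Vector<Object> reply = client.getCheckpoint();
        byte[] data = (byte[]) reply.get(0); // the vector holds a single byte array
        return data.length == 0 ? null : data;
    }

    /** Where to start: the saved record index, or 0 for a fresh computation. */
    static int startingRecord(CheckpointClient client) {
        byte[] data = load(client);
        return data == null ? 0 : ByteBuffer.wrap(data).getInt();
    }

    /** Saves the index of the next record to process as a 4-byte check-point. */
    static void save(CheckpointClient client, int nextRecord) {
        client.saveCheckpoint(ByteBuffer.allocate(4).putInt(nextRecord).array());
    }
}
```

A compute thread would call Checkpoints.startingRecord once before running the algorithm and Checkpoints.save at whatever interval suits the algorithm.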

The science application must implement two RPCs for computing results. The first, sciapp.computeResult, is executed by the project client when it has a new work unit for which a result must be computed. The second, sciapp.computeSpotCheckResult, is executed by the project server to compute spot-check results, if the project uses spot-checking. These two RPCs perform exactly the same function; the project client and project server call separate RPCs because check-pointing is available only through the project client, not through the project server. When the sciapp.computeSpotCheckResult RPC is executed, the science application should therefore disable check-pointing, if it uses check-pointing.
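One way to keep the two RPCs identical while disabling check-pointing for spot-checks is to share a single implementation and pass it a do-nothing check-point object on the spot-check path. The sketch below assumes hypothetical CheckpointHooks and ScienceAlgorithm interfaces; SciAppHandler is meant to be the object whose public methods an XML-RPC library would expose under the sciapp prefix, but registering it with a server depends on the library and is not shown.

```java
import java.util.Vector;

/** Hypothetical check-point hooks; load() returns an empty array when there is no check-point. */
interface CheckpointHooks {
    byte[] load();
    void save(byte[] state);
}

/** Hypothetical science algorithm that may use the hooks to resume and save progress. */
interface ScienceAlgorithm {
    byte[] run(byte[] workUnit, CheckpointHooks checkpoints);
}

final class SciAppHandler {
    /** Used for spot-checks: reports no saved check-point and discards saves. */
    static final CheckpointHooks NO_CHECKPOINTS = new CheckpointHooks() {
        @Override public byte[] load() { return new byte[0]; }
        @Override public void save(byte[] state) { /* the project server keeps no check-points */ }
    };

    private final ScienceAlgorithm algorithm;
    private final CheckpointHooks clientCheckpoints; // backed by the project client's RPCs

    SciAppHandler(ScienceAlgorithm algorithm, CheckpointHooks clientCheckpoints) {
        this.algorithm = algorithm;
        this.clientCheckpoints = clientCheckpoints;
    }

    /** sciapp.computeResult: called by the project client, check-pointing enabled. */
    public Vector<Object> computeResult(byte[] workUnit) {
        return wrap(algorithm.run(workUnit, clientCheckpoints));
    }

    /** sciapp.computeSpotCheckResult: called by the project server, check-pointing disabled. */
    public Vector<Object> computeSpotCheckResult(byte[] workUnit) {
        return wrap(algorithm.run(workUnit, NO_CHECKPOINTS));
    }

    /** Both RPCs return the encoded result wrapped in a vector. */
    private static Vector<Object> wrap(byte[] result) {
        Vector<Object> reply = new Vector<>();
        reply.add(result);
        return reply;
    }
}
```

With this null-object approach, the science algorithm contains no spot-check special case: on the spot-check path it simply sees no saved check-point, and its saves are discarded.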

When one of the computeResult RPCs is executed, the science application should convert the given byte array into values that can be passed to the science algorithm. It should then spawn a new thread, which we will refer to as the compute thread, in which to execute the science algorithm. This thread should have the lowest possible priority so that the science algorithm uses processor time only when no higher-priority thread needs it. Running the algorithm at a higher priority may disrupt volunteers' use of their own computers, which may reduce volunteer participation.

Before executing the science algorithm, the compute thread should first execute the client.getCheckpoint RPC on the project client to determine whether the science algorithm should resume a previously started computation. If the project client returns a check-point, the science algorithm should be initialized to resume the previous computation. Otherwise, the science algorithm should start at the beginning of the work unit. During its execution, the science algorithm may periodically call the client.saveCheckpoint RPC on the project client, passing it a byte array containing state information so that the computation can be resumed if the science application is shut down before the current computation completes. When the science algorithm has finished computing the result for its assigned work unit, the compute thread should terminate. The science application should then encode the result into a byte array, wrap it in a vector, and return that vector from the computeResult RPC that was executed, which delivers it to the caller (the project client, or the project server for spot-checks).
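The compute-thread pattern might look like the following Java sketch. The work-unit format (a sequence of big-endian ints), the sum-of-squares "algorithm", and the ComputeThreadRunner name are all placeholders, and check-point handling is left out to keep the focus on the thread. The RPC handler blocks on join(), which also guarantees that the compute thread's write to sum[0] is visible when the result is encoded.

```java
import java.nio.ByteBuffer;
import java.util.Vector;

final class ComputeThreadRunner {
    private ComputeThreadRunner() {}

    /** Hypothetical work-unit format: a sequence of big-endian 4-byte ints. */
    static int[] decodeWorkUnit(byte[] workUnit) {
        ByteBuffer buf = ByteBuffer.wrap(workUnit);
        int[] values = new int[workUnit.length / 4];
        for (int i = 0; i < values.length; i++) {
            values[i] = buf.getInt();
        }
        return values;
    }

    /** Runs the placeholder algorithm on a minimum-priority thread and waits for it. */
    static Vector<Object> computeResult(byte[] workUnit) throws InterruptedException {
        final int[] values = decodeWorkUnit(workUnit);
        final long[] sum = new long[1]; // written by the compute thread, read after join()

        Thread compute = new Thread(() -> {
            // Placeholder science algorithm: sum of squares of the work-unit values.
            long acc = 0;
            for (int v : values) {
                acc += (long) v * v;
            }
            sum[0] = acc;
        }, "compute");
        compute.setPriority(Thread.MIN_PRIORITY); // request the lowest priority Java offers
        compute.start();
        compute.join(); // the RPC handler blocks until the compute thread terminates

        // Encode the result as a byte array and wrap it in a vector for the RPC reply.
        Vector<Object> reply = new Vector<>();
        reply.add(ByteBuffer.allocate(8).putLong(sum[0]).array());
        return reply;
    }
}
```

Java thread priorities are only hints to the scheduler; on some JVMs and operating systems, Thread.MIN_PRIORITY may have little or no effect.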

It is very important to provide a script called run_sci_app that starts the science application; the project client executes this script to start it. If the science application is designed to run on Windows platforms, the script should be called run_sci_app.bat; if it is designed to run on UNIX-like platforms, it should be called run_sci_app.sh. If it is designed to run on both, both scripts are needed. The project client detects which platform it is running on and executes the appropriate script for that platform. If the run_sci_app.sh script is used, it is very important that all users have read and execute permissions on the script (for example, mode 755). Each subdirectory of the example directory in the framework release contains example run_sci_app scripts for both Windows and UNIX.
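The platform check and the 755 permission can both be expressed with standard Java APIs. This sketch assumes that the project client decides based on the os.name system property, which the text does not specify; LaunchScripts and its methods are hypothetical names, not part of SLINC.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.attribute.PosixFilePermission;
import java.nio.file.attribute.PosixFilePermissions;
import java.util.Locale;
import java.util.Set;

final class LaunchScripts {
    private LaunchScripts() {}

    /** Picks run_sci_app.bat for Windows and run_sci_app.sh otherwise, from an os.name value. */
    static String scriptFor(String osName) {
        return osName.toLowerCase(Locale.ROOT).startsWith("windows")
                ? "run_sci_app.bat"
                : "run_sci_app.sh";
    }

    /** Script name for the platform this JVM is running on. */
    static String scriptForCurrentPlatform() {
        return scriptFor(System.getProperty("os.name"));
    }

    /** Mode 755 (rwxr-xr-x): everyone may read and execute, only the owner may write. */
    static Set<PosixFilePermission> mode755() {
        return PosixFilePermissions.fromString("rwxr-xr-x");
    }

    /** Applies mode 755; throws UnsupportedOperationException on non-POSIX file systems. */
    static void makeRunnable(Path script) throws IOException {
        Files.setPosixFilePermissions(script, mode755());
    }
}
```

Setting the mode from the build or packaging step, rather than relying on whatever permissions a file happens to have after unpacking, is one way to make sure the script stays executable for all users.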