Work Unit Generator

The purpose of the work unit generator is to partition a project's science data into smaller work units to be computed by the science application, which runs on each volunteer's computer. There is no limit to the amount of data that can be stored in a work unit. The figure below shows an overview of the work unit generator's control flow:

Work Unit Generator Control Flow

The first step taken by the work unit generator after it starts is to execute the server.getLastWorkUnit RPC on the project server. This RPC instructs the server to return all information about the most recently generated work unit, including its ID, data, creation date, point value, and priority. Complete specifications of this RPC and all other RPCs available to project developers can be found in the XML-RPC Interface Specification. The generator executes this RPC at startup because it needs to know at what position in the science data to resume partitioning. If the work unit generator is being started for the first time, there will be no work units in the project database; the project server reports this by throwing an exception, and the generator then starts generating work units from the beginning of the science data.

Thus, the server.getLastWorkUnit RPC has two possible return types: a vector, or some other object representing an exception. If the return value is not a vector, then an exception has occurred; what that other object actually is does not matter. How to determine whether the return value is a vector depends on the XML-RPC library in use.

With the xmlrpc-c library, query the status of the xmlrpc_c::rpcPtr object after the RPC has been executed. If its isSuccessful() method returns true, the RPC completed successfully and a vector containing the relevant work unit information has been returned. If that method returns false, the server threw an exception, meaning there were no work units in the project database.

The process is simpler with the Apache library. The return value of the RPC is initially a java.lang.Object. To determine whether that object is a vector, cast it to a java.util.Vector inside a try block. If the cast throws a java.lang.ClassCastException, the object was an exception. If the cast succeeds, the server did not throw an exception, and the individual elements of the vector can be accessed.
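The cast-based check described for the Apache library can be sketched as follows. This is a minimal illustration rather than project code: the class name LastWorkUnitResult and the method asWorkUnit are hypothetical, and the sketch assumes the RPC result has already been obtained as a java.lang.Object.

```java
import java.util.Optional;
import java.util.Vector;

// Hypothetical helper; the class and method names are illustrative only.
final class LastWorkUnitResult {
    private LastWorkUnitResult() {}

    /**
     * Returns the work unit vector if the RPC result is one, or an empty
     * Optional if the result is any other kind of object (or null).
     */
    static Optional<Vector<?>> asWorkUnit(Object rpcResult) {
        try {
            // The cast succeeds only when the server returned a vector.
            Vector<?> workUnit = (Vector<?>) rpcResult;
            return Optional.ofNullable(workUnit);
        } catch (ClassCastException e) {
            // Any non-vector object means the server reported an exception,
            // i.e. there were no work units in the project database.
            return Optional.empty();
        }
    }
}
```

A caller could then branch on whether the Optional is present: if it is, read the work unit data at index 1 of the vector; if not, start partitioning from the beginning of the science data.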

If there were no work units in the database, the work unit generator should generate one or more work units to be sent to the project server. The first time the work unit generator is started, we recommend that several work units be generated so that there will be enough in the system to distribute to clients when they begin connecting. After generating these work units, the work unit generator should execute one of the server.addWorkUnit RPCs on the project server. There are several variants of this RPC, each accepting different combinations of parameters. These RPCs will be explained shortly.
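As a rough sketch of generating an initial batch, the code below cuts the first few work units out of the science data. It rests on assumptions that are not part of the framework: the science data is taken to be a single byte array, every work unit is a fixed-size slice of it, and the names InitialBatch and partition are hypothetical. A real project would partition along whatever boundaries its science application expects.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical partitioner: fixed-size slices of an in-memory byte array.
final class InitialBatch {
    private InitialBatch() {}

    /**
     * Cuts up to maxUnits slices of chunkSize bytes out of scienceData,
     * starting at offset. The final slice may be shorter than chunkSize.
     */
    static List<byte[]> partition(byte[] scienceData, int offset, int chunkSize, int maxUnits) {
        if (offset < 0 || chunkSize <= 0 || maxUnits < 0) {
            throw new IllegalArgumentException("offset and maxUnits must be >= 0, chunkSize must be > 0");
        }
        List<byte[]> units = new ArrayList<>();
        int pos = offset;
        while (pos < scienceData.length && units.size() < maxUnits) {
            // Compute the slice end in long arithmetic to avoid int overflow.
            int end = (int) Math.min((long) pos + chunkSize, scienceData.length);
            units.add(Arrays.copyOfRange(scienceData, pos, end));
            pos = end;
        }
        return units;
    }
}
```

On a first start the generator would call this with an offset of 0; on later starts the offset would come from inspecting the last work unit returned by the server.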

If the server did return a work unit, the work unit generator should inspect that work unit to determine what part of the science data to partition next. In most cases, it is only necessary to inspect the data contained in the work unit, which is at index 1 in the vector. Once the work unit generator has determined what data will be assigned to the next work unit, there are two options for how to proceed.

The first option is to query the server for the number of ingress work units in the database. An ingress work unit is a work unit that has never been sent to a volunteer to be processed. Although ingress work units are not the only type of work unit that can be sent to a client, there should always be ingress work units in the system to guarantee that one is available whenever a project client requests a work unit. The number of ingress work units can be queried by executing the server.getNumWorkUnits RPC. This RPC takes no parameters and returns an integer greater than or equal to zero indicating the number of ingress work units in the project. The work unit generator should decide whether that number is sufficient, and if not, generate some number of work units and send them to the project server via one of the server.addWorkUnit RPCs.

The second option is to skip the server.getNumWorkUnits RPC and instead generate a single work unit and send it to the server by executing one of the server.addWorkUnit RPCs. These RPCs all return an integer indicating the number of clients that are waiting for work units. A client is said to be waiting for a work unit if it requested one when the project server had none to distribute. Waiting clients reduce the efficiency of a project because, rather than performing useful computations, they sleep until they request a work unit from the server again at a later time. After sending the newly generated work unit to the project server, the work unit generator can decide whether the number of waiting clients is acceptable, and it may choose to generate and send additional work units.
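The two options can be sketched against a stand-in for the project server. Everything named here is hypothetical: the WorkUnitServer interface merely mirrors the two RPCs so the logic can be shown without an XML-RPC library, the data-only addWorkUnit signature is assumed to correspond to one of the variants, and the thresholds minIngress, maxWaiting, and maxExtra are policy choices left to the project.

```java
import java.util.function.Supplier;

// Hypothetical stand-in for the project server's RPC interface.
interface WorkUnitServer {
    int getNumWorkUnits();         // mirrors server.getNumWorkUnits (ingress count)
    int addWorkUnit(byte[] data);  // mirrors a server.addWorkUnit variant; returns waiting clients
}

final class Replenisher {
    private Replenisher() {}

    /** Option 1: query the ingress count and top it up to minIngress. */
    static int topUp(WorkUnitServer server, Supplier<byte[]> nextWorkUnit, int minIngress) {
        int missing = minIngress - server.getNumWorkUnits();
        int sent = 0;
        for (int i = 0; i < missing; i++) {
            server.addWorkUnit(nextWorkUnit.get());
            sent++;
        }
        return sent;
    }

    /**
     * Option 2: send one work unit, then keep sending while more than
     * maxWaiting clients are waiting, up to maxExtra additional work units.
     */
    static int sendUntilAcceptable(WorkUnitServer server, Supplier<byte[]> nextWorkUnit,
                                   int maxWaiting, int maxExtra) {
        int sent = 0;
        int waiting;
        do {
            waiting = server.addWorkUnit(nextWorkUnit.get());
            sent++;
        } while (waiting > maxWaiting && sent <= maxExtra);
        return sent;
    }
}
```

Both methods return the number of work units sent. The cap in the second method guards against looping indefinitely if the waiting count does not fall as work units are added.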

There are several variants of the server.addWorkUnit RPC to allow some degree of control over the work units, if desired, while providing a simple interface for projects that do not need the advanced control features. The version of this RPC that allows for the most control over the work units takes five parameters:

Parameter Index   Type      Description
0                 String    The work unit ID to use.
1                 Byte[]    The work unit data.
2                 Integer   The priority of this work unit, where priority >= 0.
3                 Integer   The point value of this work unit, where points >= 0.
4                 Double    The number of seconds until the work unit should expire, where seconds >= 0.

Recall that clients executing this RPC must add each of these parameters to a vector in the order specified in the Parameter Index column, and then execute the RPC by passing that vector to the XML-RPC library. All XML-RPC parameters must be wrapped in a vector, even if only one parameter is required, so to be concise we will no longer mention adding parameters to a vector before executing an RPC.
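To illustrate the parameter ordering, the sketch below assembles the vector for the five-parameter variant and checks the constraints listed in the table. The class AddWorkUnitParams and its build method are hypothetical names. Passing the resulting vector to the XML-RPC library is left out because that call depends on the library in use.

```java
import java.util.Vector;

// Hypothetical helper that builds the parameter vector for the
// five-parameter server.addWorkUnit variant.
final class AddWorkUnitParams {
    private AddWorkUnitParams() {}

    static Vector<Object> build(String id, byte[] data, int priority,
                                int points, double expireSeconds) {
        if (id == null || data == null) {
            throw new IllegalArgumentException("id and data are required");
        }
        // The negated comparison also rejects NaN for the expiration time.
        if (priority < 0 || points < 0 || !(expireSeconds >= 0)) {
            throw new IllegalArgumentException("priority, points and seconds must be >= 0");
        }
        Vector<Object> params = new Vector<>();
        params.add(id);             // index 0: work unit ID (String)
        params.add(data);           // index 1: work unit data (byte array)
        params.add(priority);       // index 2: priority (boxed as Integer)
        params.add(points);         // index 3: point value (boxed as Integer)
        params.add(expireSeconds);  // index 4: seconds until expiration (boxed as Double)
        return params;
    }
}
```

For example, build("wu-42", data, 0, 1, 86400.0) yields a vector describing a work unit with the lowest priority, a point value of 1, and an expiration time of one day.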

The first parameter specifies the work unit ID. This ID is displayed in the log files when the work unit is assigned and when results are returned for it. When the results are extracted from the database using the extract_results script, a directory is created for each work unit, and each directory is named after the work unit ID. If one of the server.addWorkUnit variants that does not require a work unit ID is executed, the server generates a work unit ID.

The second parameter is the byte array representing the data to be stored in the work unit.

The third parameter is the priority of the work unit. A priority value of 0 indicates the lowest priority, and higher values indicate higher priorities. Work units with higher priorities are guaranteed to be distributed to clients before work units with lower priorities. Priorities are relative; if two work units have the same priority, they are distributed in first-in-first-out (FIFO) order. If one of the server.addWorkUnit variants that does not require a priority is executed, the default priority of 0 is used.

The fourth parameter is the point value for the work unit. Some work units may require more computation than others, so volunteers can be rewarded differently depending on the difficulty of the work unit assigned to them. The point value of a work unit is added to a volunteer's score after the volunteer returns a result for that work unit. If the result is later found to be invalid, the volunteer's score is decreased by the point value of the work unit. If one of the server.addWorkUnit variants that does not require a point value is executed, the default point value of 1 is used.

The last parameter is the work unit's expiration time, specified in seconds. This parameter imposes a limit on the amount of time a work unit has between being added to the project and being retired. The transitioner optimistically assumes that all clients will return results promptly. It therefore assigns a work unit only a certain number of times: the minimum number of results required for each work unit, as defined in the project configuration. However, if a client is assigned a work unit and never returns a result, that work unit might never be retired. The expiration time prevents this by limiting the amount of time a work unit can spend in the system. If a work unit is not retired before its expiration time, it can be assigned to other volunteers. Note that if the expiration time is shorter than the average time a client needs to complete a work unit, many work units will expire unnecessarily, and work will be wasted because those work units will be assigned to additional volunteers. If one of the server.addWorkUnit variants that does not require an expiration time is executed, the default expiration time of one day is used.
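The distribution order described for priorities (higher priority first, first-in-first-out among equal priorities) can be expressed as a comparator. This sketch only illustrates the ordering rule and is not the project server's implementation. The class QueuedWorkUnit and its arrival field, a counter recording insertion order, are assumptions made for the example.

```java
import java.util.Comparator;

// Hypothetical record of a work unit awaiting distribution.
final class QueuedWorkUnit {
    final String id;
    final int priority;  // 0 is the lowest priority
    final long arrival;  // insertion sequence number; smaller means added earlier

    QueuedWorkUnit(String id, int priority, long arrival) {
        this.id = id;
        this.priority = priority;
        this.arrival = arrival;
    }

    /** Higher priority first; ties are broken by earlier arrival (FIFO). */
    static final Comparator<QueuedWorkUnit> DISTRIBUTION_ORDER =
            Comparator.comparingInt((QueuedWorkUnit w) -> w.priority)
                      .reversed()
                      .thenComparingLong(w -> w.arrival);
}
```

Placing such objects in a java.util.PriorityQueue constructed with DISTRIBUTION_ORDER and polling repeatedly returns them in the order in which they would be distributed under this rule.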

Different projects may have different needs for the level of control over work units, so there are several variants of the server.addWorkUnit RPC, each allowing different combinations of parameters to be used. All of these variants are presented in the XML-RPC Interface Specification.