Frequently Asked Questions


OPUS Applications

What are the advantages of using OPUS?

OPUS is designed to help you distribute processing over a local network of machines and to allow you to start up a number of separate instances of any task. The central objective is throughput: performing the analysis of a number of independent datasets robustly and efficiently.

A useful feature of this system is the ability to monitor the status of both datasets and processes.


What kind of processes can be run in the pipeline?

Any process which can be run from a shell script: a simple shell script itself, or an executable invoked from a shell script. Of course, those executables can be written in any language including IRAF and IDL.

The default size of the "PROCESS" field in a process status file is 6 characters, which effectively limits your process names to this length. Although the OAPI supports changes to the process status file structure, including the size of this field, the current version of the PMG does not.


Can a script read from STDIN (standard input)?

No.

The script should get its arguments from the command line, from environment variables, or from the process resource file. All keywords prefixed with ENV. in the process resource file become environment variables, so those values are easily accessible.
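For illustration, a minimal task body might gather its inputs this way (the MY_INPUT_DIR keyword, the default paths, and the dataset name are hypothetical; MY_INPUT_DIR would be defined as ENV.MY_INPUT_DIR in the process resource file):

```shell
# Hypothetical OPUS task body: inputs come from the command line
# and the environment, never from stdin.
# MY_INPUT_DIR would be defined as ENV.MY_INPUT_DIR in the
# process resource file (the keyword name is made up here).
MY_INPUT_DIR="${MY_INPUT_DIR:-/data/incoming}"
dataset="${1:-o4qs01010}"   # first command-line argument

msg="processing $dataset from $MY_INPUT_DIR"
echo "$msg"
```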


Can I use command line arguments for my tasks?

Yes.

This is a standard way to get information into your task. OPUS pipeline tasks run in the background, but command line arguments can be specified in the process resource file. This is a convenient way to use the same task for slightly different functions.
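As a sketch, a process resource file might pass fixed arguments on the task's COMMAND line like this (the script name and arguments are hypothetical; only the COMMAND keyword is shown):

```
   COMMAND = calibrate.csh -mode quicklook
```

Two resource files that differ only in these arguments can then drive the same script to perform slightly different functions.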


How can I send a password to my task?

Since processes run within a pipeline cannot interact with the user, OPUS provides a method for prompting the user who starts a pipeline task for a password and then sending the password to the task.

This is accomplished by adding a particular entry in the task's process resource file called PASSWORD. The PASSWORD entry in the resource file indicates to the Process Manager (PMG) that it should prompt the user starting the process for a password. Once the password is received by the PMG, it is encrypted and passed on to the task as a pair of extra parameters on the task's COMMAND line.

If you repeatedly start a group of processes on the same machine, all of which require the same password, you can avoid getting prompted separately for each task password by setting a common PASSWORD_PROMPT resource value in the process resource files. When the PMG starts a group of processes which contain PASSWORD resource entries, it will also check the PASSWORD_PROMPT resource for each one, and will only prompt the user for a password for each unique combination of PASSWORD_PROMPT value and remote node name (the node name is part of the grouping because of the password encryption scheme used). The entered password will be supplied to all of the processes with the same PASSWORD_PROMPT value and node name in the group.

Note that if you do not provide the PASSWORD_PROMPT resource value, the process is assigned to the default password group (a blank prompt). This may be OK if you only have a single process requiring a password, but if multiple processes requiring passwords are started together on the same node, and all are assigned to the default password group, only a single password is prompted for and is passed to each of the processes. To avoid this, it is highly recommended that the PASSWORD and PASSWORD_PROMPT resource items be used together. The PASSWORD_PROMPT can also provide the nice feature of reminding the user what kind of password they are being asked to supply (userid, machine name, etc.).
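As a sketch, the two entries might appear together in the process resource file like this (the prompt text is hypothetical, and the PASSWORD value is shown as a placeholder since its required form is not described here):

```
   PASSWORD = <placeholder>
   PASSWORD_PROMPT = archive database password
```

All processes started together on the same node with this same PASSWORD_PROMPT value would then share a single prompt, and the user sees a meaningful description of which password is wanted.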


How are the values of environment variables set?

Keywords in the process resource file that are prefixed with ENV. are defined, along with their values, as environment variables. Note that the ENV. prefix does not become part of the variable name. These environment variables are available only in the shell in which your task runs.
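For example, assuming a hypothetical resource-file line ENV.CAL_REF_DIR = /data/cal, the task would find CAL_REF_DIR in its environment (simulated here by exporting the variable directly):

```shell
# Simulate what OPUS does with a resource-file keyword:
#    ENV.CAL_REF_DIR = /data/cal      (hypothetical example)
# The ENV. prefix is stripped; the task sees CAL_REF_DIR.
CAL_REF_DIR=/data/cal
export CAL_REF_DIR

# The task simply reads the variable:
echo "reference files in $CAL_REF_DIR"
```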

There are additional environment variables defined by the OPUS system as a task is started in response to an event. First, all tasks have access to the EVNT_TYPE variable that takes on one of the following three possible values:

   EVNT_FILE_EVENT, EVNT_OSF_EVENT, or EVNT_TIME_EVENT

Each of these values corresponds to the type of trigger that caused the event. The number of items in the event is placed in the EVNT_NUM variable. Unless you have configured your application to handle more than one item per event, EVNT_NUM will be 1.

Tasks that are triggered by a file event additionally have access to the EVENT_NAME variable:

   EVENT_NAME      The name of the file that triggered the event.

If EVNT_NUM is greater than 1, then additional environment variables of the form EVENT_NAME1, EVENT_NAME2, etc. are defined.

Tasks that are triggered by an OSF have access to all the information in that OSF:

   OSF_DATASET     The name of the exposure that triggered the task.
   OSF_DATA_ID     The type of the exposure (by default, a 3 character descriptor).
   OSF_DCF_NUM     An arbitrary sequence number.
   OSF_START_TIME  The time the exposure started in the pipeline.

As in the case of file events, if EVNT_NUM is greater than 1, additional environment variables with a numeric suffix (e.g., OSF_DATA_ID1) are defined for each item in the event.

Time events have no event-related environment variables defined other than EVNT_TYPE.
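A task can branch on these variables; the sketch below simulates the values OPUS would set for an OSF event (the dataset name and actions are made up):

```shell
# Simulate the variables OPUS sets for an OSF-triggered task
# (values are made up for illustration).
EVNT_TYPE=EVNT_OSF_EVENT
EVNT_NUM=1
OSF_DATASET=o4qs01010
OSF_DATA_ID=cal

# Dispatch on the trigger type.
case "$EVNT_TYPE" in
    EVNT_OSF_EVENT)  msg="calibrating $OSF_DATASET (type $OSF_DATA_ID)" ;;
    EVNT_FILE_EVENT) msg="new file: $EVENT_NAME" ;;
    EVNT_TIME_EVENT) msg="time-triggered housekeeping" ;;
esac
echo "$msg"
```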

You can use the values of these environment variables as command line arguments to the tasks you write, or in the bodies of the tasks themselves. See the path file section for more details on the relationship between path file variables and the environment variables from process resource files.


What is the difference between an external and an internal poller?

External polling processes are programs or scripts that have no knowledge of how the OPUS blackboard works. These processes are invoked through the OPUS task XPOLL (eXternal POLLer). Most of the sample pipeline applications are external pollers. The g2f task is the only exception; it was implemented using the OAPI.

The OPUS system uses information in the process resource file to decide when to activate a process. In the case of external pollers, xpoll responds to an event by spawning its associated process that in turn communicates back to xpoll how successful it was in processing the event through an exit status code. The code is mapped to specific keyword values in the process resource file by xpoll, and the OPUS system is informed of the disposition of the event. External pollers are started by the OPUS system each time work is required, then they exit to be started again later by the OPUS system when more work is needed.
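The contract between xpoll and the spawned process is simply the exit status; the sketch below is illustrative (the function, dataset name, and status codes are made up, and the resource-file keywords that map each code are not shown):

```shell
# Sketch of a task run under xpoll: it reports its disposition
# through its exit status, which xpoll maps to keyword values in
# the process resource file (the mapping keywords are not shown).
process_dataset() {
    # Illustrative work: fail if no dataset name was supplied.
    [ -n "$1" ] || return 1
    echo "processed $1"
    return 0
}

process_dataset o4qs01010   # succeeds
ok=$?
process_dataset ""          # fails: no dataset name
bad=$?
```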

Internal polling processes, like g2f, are programs written with knowledge of how the OPUS blackboard works. They are typically processes with some significant start-up overhead (e.g. database reading, etc.). The process is written to perform the start-up overhead and then enter a polling loop to wait for pipeline events. The process stays active as it polls for and processes events. Internal pollers are built using the OAPI to communicate with the OPUS system.


How do I add a processing step to a pipeline?

There are three things that are required: a new script for that step, a new corresponding process resource file, and an update to the pipeline.stage file.


Are there any limitations on naming a new task?

Yes.

The name of the task is used in the construction of the process status file name, and that file has a limited number of characters to hold the task name. The default limit is six (6) characters, and even though this limit can be changed, the current PMG expects the default value.


What are some of the common gotchas I should beware of when I write my OPUS tasks?

Besides the traditional dangers of memory leaks and unclosed files, you should be attuned to the possibility that many copies of your task might be running simultaneously. Thus it is important to open files for reading only (in C, use 'r', not 'r+') when possible, to expect collisions when updating databases, and always to terminate with a known status.
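These habits can be sketched in a task body (illustrative; the file name and status codes are made up):

```shell
# A task that may run in many simultaneous copies: read shared
# inputs read-only and always finish with a known exit status.
run_step() {
    workfile="$1"
    if [ ! -r "$workfile" ]; then
        echo "cannot read $workfile" >&2
        return 2            # known, documented failure status
    fi
    # ... process the file here, reading only ...
    echo "done with $workfile"
    return 0                # known success status
}

run_step /no/such/file      # unreadable path triggers the error leg
status=$?
```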

Status messages to the standard output device will automatically be kept in a process log file. It is extraordinarily useful to write to the log file both wisely and often to document the actions taken by a pipeline task.

Also keep in mind that an external polling process has access to process resource file keywords and values through its environment only for those keywords prepended with ENV..


What kind of message reporting does OPUS provide?

The severity of the OPUS messages reported to any of the log files can be selected with the MSG_REPORT_LEVEL environment variable. This allows the user to specify which types of messages should be reported. Ordinarily the number of 'Debug' messages can be quite large, and during normal operations 'Informational' messages (and those more severe) will be sufficient. The user can set the report level to one of the following values: ALL, DIAG, INFO, WARN, ERROR, FATAL, NONE. The report levels are cumulative: the WARN level, for example, receives WARN, ERROR, and FATAL messages. The default report level is INFO.

Note that when the report level is set to NONE, no log files are produced.
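For example, to suppress everything below the WARN level before starting a pipeline process:

```shell
# Report only WARN, ERROR, and FATAL messages to the log files
# (the report levels are cumulative).
MSG_REPORT_LEVEL=WARN
export MSG_REPORT_LEVEL
```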

