-
What is OPUS?
-
OPUS is a distributed pipeline system that allows multiple instances of
multiple processes to run on multiple nodes over multiple paths. OPUS is a
generic pipeline system: it is not tied to any particular processing
environment or to any particular mission. It is flexible enough to support
the development of a pipeline for your own telemetry processing.
The OPUS system is designed to support a sequence of independent applications which
take datasets from a raw form and process them to some intermediate or
final state.
The OPUS system does not supply you with the mission-specific applications themselves.
Instead, OPUS provides a fully distributed "pipeline" processing
environment that helps you organize your applications, monitor the
processing, and control what is going on in your pipeline.
Monitoring is done with two Motif applications: the Process Manager (PMG)
and the Observation Manager (OMG). The first gives you the tools you need to
control the individual tasks in your pipeline. The second gives you a
quick view of the status of datasets in your pipeline and allows you to
control individual observations.
Beyond monitoring, the OPUS system makes it easy for you to
distribute processing over multiple nodes, running several instances of
the same task either on one machine or on several machines.
The OPUS system is designed to serve a variety of purposes
simultaneously. You can establish a production OPUS pipeline while
running a reprocessing OPUS pipeline at the same time. You can use the
OPUS system to control and monitor a variety of calibration sequences
which simultaneously share reference data and input datasets.
-
What is included in the OPUS System?
-
The OPUS System includes, for several operating systems, the eXternal
POLLer (XPOLL), the OPUS Process Manager (PMG), and the OPUS Observation
Manager (OMG). These three components allow you to add your own
applications and construct your own production pipeline.
In addition, the OPUS Application Programming Interface (OAPI) is
included. The OAPI is distributed as an object library and C++ header
files with which you can write applications that interface to the OPUS
system directly from within C++ code, or even extend the functionality of
OPUS to suit your needs.
On their own, these components do not process any data. They simply
provide the capability for you to construct a distributed, automated
production pipeline designed to process telemetry for your
instruments. Your pipeline consists of tasks you have written,
which are then started, monitored, and managed by OPUS.
-
What are the OPUS "Managers"?
-
Two pipeline "managers" come with the OPUS system. They are Motif GUI
applications that assist the user in monitoring the system. The Process
Manager (PMG) not only assists with configuring the system, but also
monitors which processes are running on which nodes, and what they are
currently doing.
The Observation Manager (OMG) takes a second view of the pipeline
activities, monitoring what datasets are where in the pipeline and alerting
the operator when observations are unable to complete pipeline processing.
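To make the Observation Manager's role more concrete, here is a minimal Python sketch of the kind of summary it presents: for each dataset, which stage it has reached, plus a list of datasets that need operator attention. The stage names, the one-character status codes, and the functions `locate` and `observation_view` are all hypothetical; this is not the OMG's implementation or display format.

```python
from __future__ import annotations

# Hypothetical stage list and status codes, for illustration only:
#   'c' complete, 'p' processing, 'e' error, '_' not yet reached
STAGES = ("ingest", "convert", "calibrate", "archive")


def locate(status: str) -> str:
    """Summarize where one dataset sits, given one status character per stage."""
    for stage, code in zip(STAGES, status):
        if code == "e":
            return f"FAILED in {stage}"
        if code == "p":
            return f"processing in {stage}"
        if code == "_":
            return f"waiting for {stage}"
    return "complete"


def observation_view(board: dict[str, str]) -> tuple[dict[str, str], list[str]]:
    """Per-dataset summary plus the datasets an operator should be alerted about."""
    view = {dataset: locate(status) for dataset, status in sorted(board.items())}
    alerts = [dataset for dataset, where in view.items() if where.startswith("FAILED")]
    return view, alerts


if __name__ == "__main__":
    view, alerts = observation_view({"obs001": "cc__", "obs002": "ce__", "obs003": "cccc"})
    for dataset, where in view.items():
        print(dataset, where)
    print("alerts:", alerts)  # alerts: ['obs002']
```

Keeping the summary a pure function of the status table mirrors the text's point that the OMG only takes "a second view" of the pipeline: it reads status, it does not drive processing.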
-
What is not included in the distribution?
-
The HST-specific applications are not included. These have limited
applicability to other missions and are designed to process the
telemetry for the specific instruments aboard the Hubble Space
Telescope. However, there is sufficient software
on the demonstration CD to enable you to build a complete production
pipeline with your own applications.
-
What does the OPUS Sample Pipeline demonstrate?
-
In addition to the OPUS System, a simple set of applications is included
in the CD-ROM distribution. This "sample pipeline"
demonstrates some of the capabilities of the OPUS system. It allows you to
run the pipeline, understand what happens when you modify process resource
files, experiment with the OPUS Managers (OMG and PMG), and test the OPUS
capability to distribute processing.
The sample pipeline was developed to test the functionality of the OPUS
system. It is used at the Space Telescope Science Institute (STScI) to
verify the correctness of new OPUS builds and installations.
-
Why can't I just use a shell script to tie my applications together?
-
Certainly you can, and for a low-volume pipeline this might be the
low-cost solution. However, as the volume of data increases, as the number
of applications increases, and as the complexity of the processing grows,
distributing processing over multiple nodes and monitoring the status of
each process and each observation in the pipeline become more complex.
This distribution and monitoring task is what OPUS is designed to
handle in a robust way.
-
Why are multiple instances of applications important?
-
Not all applications are equal. Some run for a significant amount
of time, others are quite speedy. Some require a large amount of
resources, others are not so demanding. You can increase the total
throughput of your pipeline by running multiple copies of an application
simultaneously, perhaps on different machines, so that the pipeline can
process several datasets at once.
OPUS allows you to tailor the mix of processes and to add multiple copies
of critical applications to the pipeline.
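The Python sketch below shows one way several copies of the same application can share work without processing any dataset twice: each copy claims a dataset by atomically renaming its status file. The `.pending`/`.busy`/`.done` naming and the functions `try_claim` and `run_instance` are assumptions made for illustration, not the OPUS mechanism. Threads stand in for copies running on different machines, and the guarantee relies on `rename` being atomic, which holds on a local POSIX file system but should be checked for any network file system.

```python
from __future__ import annotations

import os
import threading
from pathlib import Path

# Hypothetical convention: "<dataset>.pending" marks a dataset waiting for this step.
# An instance claims it by renaming the file to "<dataset>.busy.<instance>".


def try_claim(board: Path, dataset: str, instance: str) -> bool:
    """Atomically claim a dataset; at most one competing instance can succeed."""
    try:
        os.rename(board / f"{dataset}.pending", board / f"{dataset}.busy.{instance}")
        return True
    except FileNotFoundError:
        return False  # another instance renamed it first


def run_instance(board: Path, instance: str, done: list[str], lock: threading.Lock) -> None:
    """One copy of the application: keep claiming pending datasets until none remain."""
    while True:
        claimed = None
        for entry in sorted(board.glob("*.pending")):
            dataset = entry.name[: -len(".pending")]
            if try_claim(board, dataset, instance):
                claimed = dataset
                break
        if claimed is None:
            return
        # ... the real processing for `claimed` would happen here ...
        os.rename(board / f"{claimed}.busy.{instance}", board / f"{claimed}.done")
        with lock:
            done.append(claimed)


if __name__ == "__main__":
    import tempfile

    with tempfile.TemporaryDirectory() as tmp:
        board = Path(tmp)
        for i in range(12):
            (board / f"obs{i:03d}.pending").touch()
        done: list[str] = []
        lock = threading.Lock()
        copies = [threading.Thread(target=run_instance, args=(board, f"copy{n}", done, lock))
                  for n in range(3)]
        for t in copies:
            t.start()
        for t in copies:
            t.join()
        print(len(done), "datasets processed, each exactly once:", len(set(done)) == len(done))
```

Adding throughput in this scheme means starting more copies of `run_instance`; no copy needs to know how many others exist.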
-
How many steps can there be in the pipeline?
-
The more the merrier!
Having more steps in the pipeline (decoupling processes so that each does
only a single task) is essential in constructing an efficient and
flexible pipeline. The OPUS motto has always been: decouple, decouple,
decouple. Only with a modular system can you use your own resources
efficiently to attain the throughput you need.
The default observation status file (OSF) structure accommodates up to
24 stages. More stages can be supported by reconfiguring the OSF size.
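A minimal sketch of an OSF-like record, assuming one status character per stage stored in pipeline order. Only the 24-stage default comes from the text; the class `StatusRecord`, its methods, and the `_`/`c` codes are hypothetical and do not reproduce the real OSF layout.

```python
from __future__ import annotations

from dataclasses import dataclass, field

DEFAULT_STAGE_LIMIT = 24  # default OSF capacity quoted in the text

# Hypothetical one-character stage codes used only in this sketch.
NOT_STARTED, COMPLETE = "_", "c"


@dataclass
class StatusRecord:
    """OSF-like record: one status character per pipeline stage, in pipeline order."""
    dataset: str
    stages: list[str]
    status: str = field(init=False)

    def __post_init__(self) -> None:
        if len(self.stages) > DEFAULT_STAGE_LIMIT:
            raise ValueError(
                f"{len(self.stages)} stages exceed the default limit of {DEFAULT_STAGE_LIMIT}")
        self.status = NOT_STARTED * len(self.stages)

    def mark(self, stage: str, code: str) -> None:
        """Overwrite the status character for one stage."""
        i = self.stages.index(stage)
        self.status = self.status[:i] + code + self.status[i + 1:]

    def ready_for(self, stage: str) -> bool:
        """A stage may start once all earlier stages are complete and it has not started."""
        i = self.stages.index(stage)
        return all(c == COMPLETE for c in self.status[:i]) and self.status[i] == NOT_STARTED


if __name__ == "__main__":
    rec = StatusRecord("obs001", ["ingest", "convert", "calibrate"])
    rec.mark("ingest", COMPLETE)
    print(rec.status, rec.ready_for("convert"))  # c__ True
```

Because each stage owns exactly one character, adding a decoupled step costs one more column, which is why the stage count is bounded by the record width.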
-
What are paths?
-
A path is a set of directories used when processing data in the pipeline.
Multiple pipelines with identical steps but different paths can run
simultaneously without interfering with one another. For example, at
STScI a real-time pipeline must operate at the same time that a
production pipeline is processing, while a reprocessing pipeline may be
converting science images in the background and another pipeline may be
processing engineering telemetry.
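As an illustration of why identical steps on different paths do not collide, the Python sketch below models a path as a mapping from logical directory roles to concrete directories. The role names, the `/data/opus` root, and the helpers `make_path` and `resolve` are hypothetical; real OPUS path definitions may look quite different.

```python
from __future__ import annotations

from pathlib import Path

# Hypothetical logical directory names; a step refers to these, never to real locations.
ROLES = ("input", "blackboard", "work", "output")


def make_path(name: str, root: Path) -> dict[str, Path]:
    """A 'path' in this sketch: a mapping from logical role to a concrete directory."""
    return {role: root / name / role for role in ROLES}


def resolve(path: dict[str, Path], role: str, filename: str) -> Path:
    """Steps name files by logical role; the path decides the real directory."""
    return path[role] / filename


if __name__ == "__main__":
    base = Path("/data/opus")  # illustrative root; nothing is created on disk
    for name in ("realtime", "reprocess"):
        print(name, resolve(make_path(name, base), "input", "obs001.fits"))
```

The same step code, handed a different path, reads and writes a disjoint set of directories, which is what lets a real-time and a reprocessing pipeline share step definitions safely.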
-
How does OPUS work?
-
The success of OPUS can be attributed in part to its adoption of a
blackboard architecture for interprocess communication. Processes do not
communicate with each other directly; they post and update information
about their status on a common blackboard.
The blackboard is implemented simply as an ordinary directory on an
ordinary disk. The status messages are implemented simply as ordinary
files. The files are empty; the pertinent information is contained in the
filenames.
This technique effectively decouples the communicating processes from one
another and makes the entire system more robust. The standard file system
provided by the operating system gives OPUS a simple, robust blackboard.
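The following Python sketch illustrates the idea of a file-based blackboard under an assumed naming convention, `<dataset>.<stage>.<state>`: posting a status creates an empty file, updating it renames the file, and reading the blackboard needs nothing but a directory listing. The convention and the functions `post` and `read_board` are hypothetical and are not the actual OSF filename format.

```python
from __future__ import annotations

from pathlib import Path

# Hypothetical naming convention for illustration only (not the real OPUS OSF format):
#   <dataset>.<stage>.<state>, e.g. "obs001.calib.pending"
# Dataset and stage names must therefore not contain dots.


def post(board: Path, dataset: str, stage: str, state: str) -> Path:
    """Post or update a status entry as an empty file whose name carries the information."""
    target = board / f"{dataset}.{stage}.{state}"
    existing = sorted(board.glob(f"{dataset}.{stage}.*"))
    if not existing:
        target.touch()
    elif existing[0] != target:
        # A rename within one directory is atomic on POSIX file systems, so a reader
        # sees either the old state or the new one, never a partial update.
        existing[0].rename(target)
    return target


def read_board(board: Path) -> dict[tuple[str, str], str]:
    """Rebuild the (dataset, stage) -> state table from directory entries alone."""
    table = {}
    for entry in board.iterdir():
        dataset, stage, state = entry.name.split(".", 2)
        table[(dataset, stage)] = state
    return table


if __name__ == "__main__":
    import tempfile

    with tempfile.TemporaryDirectory() as tmp:
        board = Path(tmp)  # the "blackboard" is just a directory
        post(board, "obs001", "calib", "pending")
        post(board, "obs001", "calib", "complete")
        print(read_board(board))  # {('obs001', 'calib'): 'complete'}
```

No process ever opens another process's file for reading or writing; everything needed is in the directory listing, which is what keeps the processes decoupled.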
The next release of the OPUS system will offer a distributed-object
alternative to file-system-based blackboards, one that improves the
scalability of the system.
-
Wasn't OPUS developed for the HST project?
-
Yes. OPUS was first developed for the Hubble Space Telescope and
constitutes the production pipeline system at the Space Telescope Science
Institute (STScI). It takes the incoming telemetry stream from Goddard
Space Flight Center, converts the data to standard FITS files, and stages
the observations for calibration. When processing for an observation is
complete, all of its data are staged for insertion into the DADS archive.
Special-purpose OPUS applications were developed for the FUSE (Far
Ultraviolet Spectroscopic Explorer) mission. Although the HST applications
were not directly applicable to the FUSE telemetry (code reuse was
nonetheless significant in this project), the FUSE group uses the OPUS
system, the OPUS pipeline infrastructure, and the OPUS managers.
Since OPUS was first distributed on CD-ROM in 1997, a number of
astronomical institutions have been reviewing its capabilities. The
International Gamma-Ray Astrophysics Laboratory (INTEGRAL), whose science
data centre is in Switzerland, is one of the missions that have decided
to use OPUS as their pipeline platform.
Other groups have adopted OPUS and are using, or planning to use, it as
the platform for their own projects. These include the Chandra X-Ray
Observatory and the Mount Stromlo and Siding Spring Observatories in
Australia (MSSSO), for their large camera mosaic project.
Recently, the Space Infrared Telescope Facility (SIRTF) has joined the
OPUS platform group and will use OPUS to control and monitor its
production pipeline after launch.
-
Isn't OPUS too elaborate for a small mission?
-
First, OPUS is not a large system. It is small, designed to solve a
specific problem in a robust way: distributed processing with controlled
monitoring. Even if your processing steps comprise simple shell scripts,
OPUS can provide the glue which ties everything together.
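As a rough illustration of that glue, the Python sketch below wraps an existing script or program as a pipeline step by running it on one dataset and translating its exit status into a status code. The function `run_external_step` and the `c`/`e` codes are hypothetical; the text does not describe how XPOLL is configured, so treat this only as the general pattern of adapting a program that knows nothing about OPUS.

```python
from __future__ import annotations

import subprocess
import sys

# Hypothetical status codes for this sketch: 'c' = complete, 'e' = error.


def run_external_step(command: list[str], dataset: str, timeout: float | None = None) -> str:
    """Run an existing script or program on one dataset; map its exit status to a code."""
    try:
        result = subprocess.run(command + [dataset], capture_output=True, text=True,
                                timeout=timeout)
    except (FileNotFoundError, subprocess.TimeoutExpired):
        return "e"  # missing program or runaway task: flag it for the operator
    return "c" if result.returncode == 0 else "e"


if __name__ == "__main__":
    # Stand-in for a shell script: succeeds only for dataset names starting with "obs".
    step = [sys.executable, "-c",
            "import sys; sys.exit(0 if sys.argv[1].startswith('obs') else 1)"]
    print(run_external_step(step, "obs001"), run_external_step(step, "junk"))  # c e
```

The wrapped program needs no changes: its exit status is the only contract, which is what lets simple shell scripts serve as pipeline steps.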
Second, OPUS frees your talented engineering and science staff to do
the more "interesting" work. Your mission is to understand the instrument
and the science, not to build pipelines.