The sample pipeline consists of seven separate tasks and one interactive utility:
% source ~/.cshrc

Performing that step will also source opus_login.csh, because the installation script now adds a line to your ~/.cshrc file that sources opus_login.csh. This change is due to the new start-up procedures for processes.
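The added line looks something like this (a sketch only; the actual location of opus_login.csh depends on where OPUS was installed on your system):

source /usr/opus/definitions/opus_login.csh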
Next run the Process Manager (PMG) as a background task:
% pmg >& /dev/null &

You will get a Motif application window that looks something like this:
Now start up the Observation Manager (OMG) to monitor datasets as they progress through the sample pipeline. Type:
% omg >& /dev/null &

Another Motif application window should appear on your display:
The OMG can monitor only a single path at a time; by default it monitors the sample pipeline path, g2f. The first time you bring up the OMG, no datasets will be listed, since you haven't started any processes yet.
Next go back to the PMG and start the sample pipeline processes. From the "File" menu, click on "Select Process".
You should bring up one copy of all the processes listed:
Note that "g2f" is the name of one of the pipeline processes, as well as the name we've chosen for our sample path. Don't let that confuse you. This simply means we have the following files in the OPUS_DEFINITIONS_DIR directory:
g2f.path
g2f.resource

The path file describes the directories for all the processes we plan to bring up in the sample pipeline; the resource file describes the attributes of the single process g2f.
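If you want to inspect either file, one way (a sketch; the osfile_stretch_file utility is described in more detail later in this document) is:

% more `osfile_stretch_file OPUS_DEFINITIONS_DIR:g2f.path`
% more `osfile_stretch_file OPUS_DEFINITIONS_DIR:g2f.resource`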
See the question under the PMG section about bringing up a pipeline process for more details about selecting processes. Once you have finished your process selection, click on the "Start" button. You should see a number of processes listed in the PMG, all in the IDLE state. They are waiting for work to do.
To start data flowing through the pipeline, the input GIF data files need to be moved to the input directory for the sample pipeline. This input directory is defined in the resource file for the gifin process, under the INPATH entry, which looks something like:
INPATH = gif_data    ! Directory where the input files are found
                     ! This entry is in gifin.resource

Find the corresponding entry for "gif_data" in your g2f.path file:
gif_data = ~/opus_test/g2f/input/

So in this example the gifin process will look for input data files in the directory ~/opus_test/g2f/input (the translation of INPATH for the gifin process).
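You can confirm the translation yourself with something like the following (a sketch; osfile_stretch_file is explained later in this document):

% set pathname = `osfile_stretch_file OPUS_DEFINITIONS_DIR:g2f.path`
% grep gif_data $pathname
gif_data = ~/opus_test/g2f/input/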
You might wonder why you have to move the data from the configured area to your local data tree -- why the software doesn't just look for the files in the configured tree. There are two reasons.
First, the data files get renamed during pipeline processing. If more than one user is trying to run the pipeline from the same configured tree at the same time, the renaming of these files would conflict between users. Therefore, it's safer and cleaner to move the files to be processed into a user's local area. (Or, if you're running the pipeline from the CD-ROM, you cannot rename the files in place!)
The other reason for copying the input data files into your own directory is that by doing so you can choose just how much data you want to process at a given time. There are 256 GIF data files. You may not want to drop them in the pipeline all at once! You certainly can do that, but you may want to start out with just a few until you are satisfied that your environment is set up correctly.
We have provided an interactive utility to help you move the data. The utility is called dat2gif (it will also rename the files from "*.dat" to "*.gif" during the move). It requires two input parameters; the script will prompt you for them. Wildcards are accepted:
% dat2gif
File(s) to copy: /home/joe/gif/gif95*
Path name (e.g., g2f): g2f

As files appear in the gifin input directory, the pipeline processes listed in the PMG should begin to list dataset names -- the status fields should change from the IDLE state to indicate work is being done.
At the same time, datasets should start to appear on the OMG display as well, indicating which processing stages have completed (c), are awaiting processing (w), are currently processing (p), or have encountered an error (e).
When you have reached this stage you are running an OPUS pipeline!
How much disk space do I need to run the sample pipeline?
The sample pipeline uses only flat files for input and output. However, there is nothing to prevent you from writing your own database-dependent applications. (The HST pipeline uses a database extensively, for example.) But in order to keep the sample pipeline simple, we have removed any database dependency.
First of all, the public release images (the original GIF files in the sample pipeline) are often composites, not raw science data. They have been processed to make a scientific point, but in so doing, some of the original signal has been modified or lost.
Second, the header information in the FITS files is simulated. Since the original public release images were often composites of separate exposures it was not possible to correlate the image with a specific observation. The keyword values are taken from an exposure which might be similar to one of the composite observations.
First, on the same machine, bring up several copies of the g2f task. Use the Process Manager (PMG) to select the same task on the same machine several times. The pipect task will monitor the pipeline throughput and produce a summary report when it is terminated. That report, described in the first question, as well as the wall clock time required to complete the sample dataset, should demonstrate the benefit of multiple instances.
It is still possible to swamp the resources of a single machine with too many tasks. Try bringing up several copies of the g2f process on different machines to determine what mix of tasks and machines best suits your configuration.
This will delete all the data from the processing directories. You will not have to terminate or restart any of the pipeline processes to rerun the data, but note that you will still be using the same pipect.log file as in the initial run. (pipect.log is the output of the pipect process -- the pipeline process that tracks pipeline statistics.) If you wish to start with a new pipect.log file, you will have to terminate your pipect process and then manually delete (or rename, if you wish to save it) the file:
~/opus_test/[path]/fits/pipect.log

(where [path] is the path the data is being processed through, e.g., "g2f"). You can then start up a new pipect process. All that you need to do now is run the dat2gif command to copy the gif*.dat files into the input directory as you did for the initial run of the data.
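Put together, a rerun with a fresh log might look like this (a sketch for the g2f path, run after terminating the pipect process from the PMG; the /home/joe file names are the same illustrative ones used earlier):

% mv ~/opus_test/g2f/fits/pipect.log ~/opus_test/g2f/fits/pipect.log.run1
% dat2gif
File(s) to copy: /home/joe/gif/gif95*
Path name (e.g., g2f): g2f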
You will want to make sure that the pipeline output directories are clean before rerunning the data; otherwise, anomalous pipeline behavior will result.
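If you need to clean them by hand, something like the following would do it (a sketch; adjust the directory names to match your own path file, and note that the exact set of output file types is an assumption based on the sample tasks described in this document):

% rm ~/opus_test/g2f/input/*.gif
% rm ~/opus_test/g2f/fits/*.fits ~/opus_test/g2f/fits/*.lis ~/opus_test/g2f/fits/*.hdr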
The observation which was being processed when the task exited can easily be identified. One way is to examine messages in the process log file just before the crash. It is good practice to have each process print the name of the exposure it is beginning to process for this kind of troubleshooting. Another way to determine what the process was doing when it exited is to use the OMG. The OMG column for the "absent" process should be marked with an "x" on one of the lines on the display.
Often the problem can be traced to "bad data", an unexpected value in the data stream which was not handled correctly by the software. If you have confidence that no other observation is likely to have the same problem, then you can use the PMG to start up another copy of the failed process. Or, if you already have multiple instances of the failed process running in the pipeline, then you probably will not notice any consequence of the failed process besides the failure of a single observation; other instances will just have to do the work of the failed process.
gif9508x01
gif9508x02
gif9508x03
gif9508x04
gif9508x05

We intentionally inserted a few error cases to demonstrate how the Observation Status Files (OSF) are set in the case of a processing error.
You can check this by listing the files in your OPUS_HOME_DIR directory:
% ls $OPUS_HOME_DIR/*_*

The process status files probably all contain at least one underscore.
Alternatively, to determine where these files are kept, search the path file for the OPUS_OBSERVATIONS_DIR definition:
% set pathname = `osfile_stretch_file OPUS_DEFINITIONS_DIR:g2f.path`
% grep OBS $pathname
OPUS_OBSERVATIONS_DIR = /home/mydir/obs
% ls /home/mydir/obs/

The OPUS utility osfile_stretch_file finds the first disk file g2f.path under the "stretched" environment variable OPUS_DEFINITIONS_DIR (similar to a Unix path). Since OPUS_DEFINITIONS_DIR can be defined to stretch through one or more local directories and then through the OPUS system directories, the utility searches the directories in the stretch for the first occurrence of the file. This allows the user to create local copies of some OPUS system files that override the official copies in the OPUS system directory tree.
To force the use of the sort option of glob, set an environment variable in your login shell or in your OPUS_DEFINITIONS_DIR:opus_login.csh file:
setenv OPUS_SORT_FILES
This will cause OPUS to perform all file searches using the sorted glob (which collates in LC_COLLATE order; see your operating system documentation for more information). This will typically result in alphabetically ordered searches.
However, you cannot necessarily run multiple copies of every process. For instance, if you attempt to start up more than one gifin task, you will get only one instance of it. This is because gifin is restricted in the OPUS_DEFINITIONS_DIR:pmg_restrictions.dat file.
Then, in the Process Manager (PMG), when selecting the processes to run in your pipeline, specify the new path you have defined.
The easiest way to do this is to save your pipeline as a file, edit the path names in that file, save it under a new name, and load that pipeline definition in the PMG, as sketched below.
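For example, assuming the PMG saved your pipeline definition to a file called mypipeline.pmg (a hypothetical name; use whatever name you chose when saving), you could substitute a new path name for g2f in one step:

% sed 's/g2f/mypath/g' mypipeline.pmg > mypath_pipeline.pmg

Here mypath stands for the path you have defined. Check the result before loading it into the PMG, since g2f is also the name of a pipeline process, and a blanket substitution will touch those entries as well.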
Note that a "block" is 512 bytes. An easy way to view the resource file is described in the Process Resource Files document.
OUTPATH = fits_data ! Directory where output files are written
And in the g2f.path file (if that's the path you are running in), "fits_data" is defined as:
fits_data = ~/opus_home/g2f/fits/

So in this example the g2f process will place output data files in the directory ~/opus_home/g2f/fits (the translation of OUTPATH for the g2f process).
An easy way to view the resource file is described in the Process Resource Files document. An easy way to look at your path file is to bring up the "Select Process..." option in the "File" menu of the Process Manager.
Then double-click on the name of the path you are interested in.
xv can display the FITS files as well as the GIF files. The FITS files can also be viewed with a true FITS display package: SAOimage can handle FITS files, as can IDL. xv will also decompress compressed images, so it can be used to view the compressed FITS files as well.
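For example (a sketch; the file names below are the sample dataset names used elsewhere in this document, and the names on your disk may differ):

% xv gif9508x01.gif &
% xv picasso_raw.fits &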
% fitsverify picasso_raw.fits

FITS++ Verification Program Version 1.10
Wed Aug 20 11:27:18 1997
===================================================================
FITS Verification for file: picasso_raw.fits
===================================================================
Summary contents of FITS file: picasso_raw.fits
  0: Primary Array ( BYTE ) 2 dims [370,495]
     183150 bytes, 243 header lines, 71 FITS blocks
No special records.
===================================================================
No problems were encountered.

Note that the version number may change over time.
We have included in the distribution a task called listhead which will list the headers of the FITS file. The task runs as part of the sample pipeline and produces ASCII files of header keywords and values in the output directory.
The task can also be run interactively. For example, just type:
% listhead 9707_raw.fits

This command will produce a file called 9707_raw.lis which contains the keyword lines in ASCII format. The sample pipeline would name the ASCII header file for this example gif9707.hdr.
The listhead task can now handle wildcards in the source argument, as well as directory/logical names in the destination argument:

listhead \*.fits
listhead \*.fits N_DADS_DIR
listhead OCAL:\*.fits O_DADS_DIR
listhead o3s41010q.fits -f MYDISK:o3s41010q.hdr

The first form simply produces *.lis files. The "\" is required before the wildcard. The "-f filename" syntax is REQUIRED to use an output filename other than inputfilename.lis.
The normal pipeline processing is linear and sequential:

IN------>KW------>FT------->LH------->CZ

OPUS allows you to have any number of these processes up at a time:
                  FT
IN------>KW------>FT------->LH------->CZ
                  FT                  CZ
                  FT

In addition we have provided another sample task that runs in parallel with the getkw (KW) task. Since the KW and the HD tasks do not require exclusive access to the same resources, they can and do run simultaneously:
         HD------>FT
IN------>KW------>FT------->LH------->CZ

The g2f (FT) process will wait until both the HD and the KW tasks are complete for a dataset before proceeding.
OSF_RANK = 1                   ! First Trigger
OSF_TRIGGER1.FT = w            ! Need a 'Wait' in FT stage
OSF_TRIGGER1.KW = c            ! Need a 'Complete' flag in KW column
OSF_TRIGGER1.HD = c            ! Need a 'Complete' flag in HD column
OSF_TRIGGER1.DATA_ID = gif     ! Also need the Data_id set to GIF
By way of example, assume you wanted the following three pipelines:
production pipeline:
         HD------>FT
IN------->KW------>FT-------->LH------->CZ
                   FT                   CZ
                   FT

quicklook pipeline:

IN------->KW------>FT-------->LH

reprocessing pipeline:

         HD------>FT
IN------->KW------>FT-------->CZ

By slightly modifying the process resource files, and changing their names to make them distinct, you can set up a variety of pipelines which can be run simultaneously.
The TASK line of your new file must point to the name of the new resource file. So, for example, if you are copying and renaming the g2f.resource file to myg2f.resource, you need to change the task line from

TASK = < g2f -p $PATH_FILE >

to

TASK = < g2f -p $PATH_FILE -r myg2f >

Also, the process resource files must not refer to stages which are not present in the path. Thus if the g2f process ordinarily refers to the HD process:
OSF_TRIGGER1.FT = w            ! Need a 'Wait' in FT stage
OSF_TRIGGER1.KW = c            ! Need a 'Complete' flag in KW column
OSF_TRIGGER1.HD = c            ! Need a 'Complete' flag in HD column

and if one pipeline does not contain an HD task, then OPUS will complain violently about the third line above. To ensure that the task will run in such a pipeline, modify your new resource file and remove any reference to the HD task. If the task is mentioned in more than one resource file, be sure to copy, rename, and modify all of them.
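In the quicklook pipeline above, for example, the trigger block in the renamed resource file would reduce to the following (a sketch showing only the removal of the HD reference; any other entries in your resource file stay as they are):

OSF_TRIGGER1.FT = w            ! Need a 'Wait' in FT stage
OSF_TRIGGER1.KW = c            ! Need a 'Complete' flag in KW column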
The next step is to determine how the data in each of the pipelines is to flow, that is, which directories will be used. In the simple sample pipeline you might just want three different input directories and three different output directories. In that case you need to create those directories in your environment. For example, if you started with:
Input directory:  /home/my_name/opus_test/g2f/input/
Output directory: /home/my_name/opus_test/g2f/fits/

You might then create a unique set of directories for each path:
Input directories:
  /home/my_name/opus_test/prod/input/
  /home/my_name/opus_test/quick/input/
  /home/my_name/opus_test/repro/input/

Output directories:
  /home/my_name/opus_test/prod/fits/
  /home/my_name/opus_test/quick/fits/
  /home/my_name/opus_test/repro/fits/
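One way to create these directories (a sketch; it also creates the obs directories that the path files below will reference):

% mkdir -p /home/my_name/opus_test/prod/input /home/my_name/opus_test/prod/fits /home/my_name/opus_test/prod/obs
% mkdir -p /home/my_name/opus_test/quick/input /home/my_name/opus_test/quick/fits /home/my_name/opus_test/quick/obs
% mkdir -p /home/my_name/opus_test/repro/input /home/my_name/opus_test/repro/fits /home/my_name/opus_test/repro/obs

The next step is to define three different path files. You can use the g2f.path as a template for the three paths. You might start out with: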
STAGE_FILE = OPUS_DEFINITIONS_DIR:g2f_pipeline.stage
OPUS_OBSERVATIONS_DIR = /home/my_name/opus_test/g2f/obs/
gif_data = /home/my_name/opus_test/g2f/input/
fits_data = /home/my_name/opus_test/g2f/fits/
hdr_data = /home/my_name/opus_test/g2f/fits/
sample_db = /home/my_name/opus/db/

You would want to create three paths with the names of the different directories substituted:
prod.path
STAGE_FILE = OPUS_DEFINITIONS_DIR:prod_pipeline.stage
OPUS_OBSERVATIONS_DIR = /home/my_name/opus_test/prod/obs/
gif_data = /home/my_name/opus_test/prod/input/
fits_data = /home/my_name/opus_test/prod/fits/
hdr_data = /home/my_name/opus_test/prod/fits/
sample_db = /home/my_name/opus/db/

quick.path
STAGE_FILE = OPUS_DEFINITIONS_DIR:quick_pipeline.stage
OPUS_OBSERVATIONS_DIR = /home/my_name/opus_test/quick/obs/
gif_data = /home/my_name/opus_test/quick/input/
fits_data = /home/my_name/opus_test/quick/fits/
hdr_data = /home/my_name/opus_test/quick/fits/
sample_db = /home/my_name/opus/db/

repro.path
STAGE_FILE = OPUS_DEFINITIONS_DIR:repro_pipeline.stage
OPUS_OBSERVATIONS_DIR = /home/my_name/opus_test/repro/obs/
gif_data = /home/my_name/opus_test/repro/input/
fits_data = /home/my_name/opus_test/repro/fits/
hdr_data = /home/my_name/opus_test/repro/fits/
sample_db = /home/my_name/opus/db/

Finally, since each path has a different number of steps, each needs its own path-specific pipeline.stage file. The prod_pipeline.stage file would be identical to the g2f_pipeline.stage file, and only has to be copied:
% cd /home/my_name/opus_test/definitions
% set fname = `osfile_stretch_file OPUS_DEFINITIONS_DIR:g2f_pipeline.stage`
% cp $fname prod_pipeline.stage

However, the other two paths have fewer steps and require modification of the g2f_pipeline.stage file. For example, the quick_pipeline.stage file would appear as:
NSTAGE = 4
STAGE01.TITLE = IN
STAGE01.DESCRIPTION = "GIF INIT"
STAGE02.TITLE = KW
STAGE02.DESCRIPTION = "Database select"
STAGE03.TITLE = FT
STAGE03.DESCRIPTION = "GIF to FITS"
STAGE04.TITLE = LH
STAGE04.DESCRIPTION = "List FITS Header"
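By the same pattern, the repro_pipeline.stage file might look like the following (a sketch; the stage order and the HD and CZ descriptions are assumptions based on the g2f and quicklook examples above):

NSTAGE = 5
STAGE01.TITLE = IN
STAGE01.DESCRIPTION = "GIF INIT"
STAGE02.TITLE = KW
STAGE02.DESCRIPTION = "Database select"
STAGE03.TITLE = HD
STAGE03.DESCRIPTION = "List GIF Header"
STAGE04.TITLE = FT
STAGE04.DESCRIPTION = "GIF to FITS"
STAGE05.TITLE = CZ
STAGE05.DESCRIPTION = "Compress FITS"

Then you need to change which path is being monitored in the OMG. Cleaning, copying, or deleting the data you create in your new path will also require corresponding configuration files.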