1.\"
2.\" Copyright (c) 2004-2007 The Trustees of Indiana University and Indiana
3.\"                         University Research and Technology
4.\"                         Corporation.  All rights reserved.
5.\" Copyright (c) 2009      Sun Microsystems, Inc.  All rights reserved.
6.\"
7.\" Man page for OPAL's CRS Functionality
8.\"
9.\" .TH name     section center-footer   left-footer  center-header
10.TH OPAL_CRS 7 "Dec 08, 2009" "1.4" "Open MPI"
11
12.\" **************************
13.\"    Name Section
14.\" **************************
15.SH NAME
16.
17Open PAL MCA Checkpoint/Restart Service (CRS) \- Overview of Open PAL's CRS
18framework, and selected modules.  Open MPI 1.4.
19.
20.\" **************************
21.\"    Description Section
22.\" **************************
23.SH DESCRIPTION
24.
25.PP
26Open PAL can involuntarily checkpoint and restart sequential programs.
27Doing so requires that Open PAL was compiled with thread support and
28that the back-end checkpointing systems are available at run-time.
29.
30.SS Phases of Checkpoint / Restart
31.PP
32Open PAL defines three phases for checkpoint / restart support in a
33process:
34.
35.TP 4
36Checkpoint
37When the checkpoint request arrives, the process is notified of the
38request before the checkpoint is taken.
39.
40.TP 4
41Continue
42After a checkpoint has successfully completed, the process that was
43checkpointed is notified that it is continuing execution.
44.
45.TP 4
46Restart
47After a checkpoint has successfully completed, a new / restarted
48process is notified of its successful restart.
49.
50.PP
51The Continue and Restart phases are identical except for the process
52in which they are invoked. The Continue phase is invoked in the same process
53in which the Checkpoint phase was invoked. The Restart phase is invoked only in
54newly restarted processes.
55.
56.\" **************************
57.\"    General Process Requirements Section
58.\" **************************
59.SH GENERAL PROCESS REQUIREMENTS
60.PP
61In order for a process to use the Open PAL CRS components it must adhere to a
62few programmatic requirements.
63.PP
64First, the program must call \fIOPAL_INIT\fR early in its execution. This
65should only be called once, and it is not possible to checkpoint the process
66without it first having called this function.
67.PP
68The program must call \fIOPAL_FINALIZE\fR before termination. This does a
69significant amount of cleanup. If it is not called, then it is very likely that
70remnants are left in the filesystem.
71.PP
72To checkpoint and restart a process you must use the Open PAL tools.
73Using the backend checkpointer's own checkpoint and restart tools will lead
74to undefined behavior.
75To checkpoint a process use \fIopal_checkpoint\fR (opal_checkpoint(1)).
76To restart a process use \fIopal_restart\fR (opal_restart(1)).
77.
78.\" **********************************
79.\"    Available Components Section
80.\" **********************************
81.SH AVAILABLE COMPONENTS
82.PP
83Open PAL ships with two CRS components: \fIself\fR and \fIblcr\fR.
84.
85.PP
86The following MCA parameters apply to all components:
87.
88.TP 4
89crs_base_verbose
90Set the verbosity level for all components. Default is 0, or silent except on error.
91.
92.TP
93crs_base_snapshot_dir
94The directory in which to store the checkpoint snapshots. Default is \fB/tmp\fP.
95.
96.\"   Self Component
97.\" ******************
98.SS self CRS Component
99.PP
100The \fIself\fR component invokes user-defined functions to save and restore
101checkpoints. It is simply a mechanism for user-defined functions to be invoked
102at Open PAL's Checkpoint, Continue, and Restart phases. Hence, the only data
103that is saved during the checkpoint is what is written in the user's checkpoint
104function. No library state is saved at all.
105.
106.PP
107As such, the model for the \fIself\fR component is slightly different from that of
108other components. Specifically, the Restart function is not invoked in the
109process image of the process that was checkpointed. The Restart phase is
110invoked during \fBOPAL_INIT\fR of the new instance of the application (i.e., it
111starts over from main()).
112.
113.PP
114The \fIself\fR component has the following MCA parameters:
115.TP 4
116crs_self_prefix
117Specify a string prefix for the name of the checkpoint, continue, and restart
118functions that Open PAL will invoke during the respective stages. That is,
119specifying "-mca crs_self_prefix foo" means that Open PAL expects to find
120three functions at run-time:
121
122   int foo_checkpoint()
123
124   int foo_continue()
125
126   int foo_restart()
127
128By default, the prefix is set to "opal_crs_self_user".
129.
130.TP 4
131crs_self_priority
132Set the \fIself\fR component's default priority.
133.
134.TP 4
135crs_self_verbose
136Set the verbosity level. Default is 0, or silent except on error.
137.
138.TP 4
139crs_self_do_restart
140This is mostly used internally. A general user should never need to set this
141value. It is set to non-0 when the new process should invoke the restart
142callback in \fIOPAL_INIT\fR. Default is 0, or normal execution.
143.
144.\"   BLCR Component
145.\" ******************
146.SS blcr CRS Component
147.PP
148Berkeley Lab Checkpoint/Restart (BLCR) is a single-process checkpointing
149software system developed at Lawrence Berkeley National Laboratory. See the
150project website for more details:
151
152   \fI http://ftg.lbl.gov/CheckpointRestart/CheckpointRestart.shtml \fR
153.
154.PP
155The \fIblcr\fR component has the following MCA parameters:
156.TP 4
157crs_blcr_priority
158Set the \fIblcr\fR component's default priority.
159.
160.TP 4
161crs_blcr_verbose
162Set the verbosity level. Default is 0, or silent except on error.
163.
164.\"   Special 'none' option
165.\" ************************
166.SS none CRS Component
167.PP
168The \fInone\fP component simply selects no CRS component. All of the CRS
169function calls return immediately with OPAL_SUCCESS.
170.
171.PP
172This component is the last component to be selected by default. This means that if
173another component is available, and the \fInone\fP component was not explicitly
174requested, then OPAL will attempt to activate all of the available components
175before falling back to this component.
176.
177.\" **************************
178.\"    See Also Section
179.\" **************************
180.
181.SH SEE ALSO
182  opal_checkpoint(1), opal_restart(1)
183.\", orte_crs(7), ompi_crs(7)