.\"
.\" Copyright (c) 2004-2007 The Trustees of Indiana University and Indiana
.\" University Research and Technology
.\" Corporation. All rights reserved.
.\" Copyright (c) 2008-2009 Sun Microsystems, Inc. All rights reserved.
.\"
.\" Man page for ORTE's SnapC Functionality
.\"
.\" .TH name section center-footer left-footer center-header
.TH ORTE_SNAPC 7 "Dec 08, 2009" "1.4" "Open MPI"
.\" **************************
.\" Name Section
.\" **************************
.SH NAME
.
Open RTE MCA Snapshot Coordination (SnapC) Framework \- Overview of Open RTE's SnapC
framework, and selected modules. Open MPI 1.4
.
.\" **************************
.\" Description Section
.\" **************************
.SH DESCRIPTION
.
.PP
Open RTE can coordinate the generation of a global snapshot of a parallel job
from many distributed local snapshots. The components in this framework
determine how to initiate the checkpoint of the parallel application, gather
the many distributed local snapshots, and provide the user with a
global snapshot handle reference that can be used to restart the parallel
application.
.
.\" **************************
.\" General Process Requirements Section
.\" **************************
.SH GENERAL PROCESS REQUIREMENTS
.PP
In order for a process to use the Open RTE SnapC components it must adhere to a
few programmatic requirements.
.PP
First, the program must call \fIORTE_INIT\fR early in its execution. This
function should be called only once, and a process cannot be checkpointed
until it has called this function.
.PP
The program must call \fIORTE_FINALIZE\fR before termination.
.PP
A user may initiate a checkpoint of a parallel application with the
orte-checkpoint(1) command, and restart the application from the resulting
global snapshot with the orte-restart(1) command.
.
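.PP
As an illustration only, a checkpoint and restart cycle might look like the
following sketch. It assumes a build with checkpoint/restart support enabled.
The application name, process count, launcher PID, and global snapshot handle
shown are hypothetical, and the exact options may differ between releases
(see orte-checkpoint(1) and orte-restart(1)).
.PP
.nf
# Launch the job with fault tolerance enabled (./my_app is hypothetical)
shell$ mpirun \-np 4 \-am ft\-enable\-cr ./my_app
# From another terminal: 1234 stands in for the PID of mpirun
shell$ orte\-checkpoint 1234
# Restart from the global snapshot handle reported by the checkpoint
# (this handle name is a placeholder)
shell$ orte\-restart ompi_global_snapshot_1234.ckpt
.fi
.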
.\" **********************************
.\" Available Components Section
.\" **********************************
.SH AVAILABLE COMPONENTS
.PP
Open RTE ships with one SnapC component: \fIfull\fR.
.
.PP
The following MCA parameters apply to all components:
.
.TP 4
snapc_base_verbose
Set the verbosity level for all components. Default is 0, or silent except on error.
.
.TP
snapc_base_global_snapshot_dir
The directory to store the checkpoint snapshots. Default is \fB/tmp\fP.
.
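.PP
As a sketch of how these parameters might be set, MCA parameters are
commonly passed on the launcher command line with \fB\-mca\fP or through
environment variables prefixed with \fBOMPI_MCA_\fP. The directory, verbosity
value, process count, and application name below are hypothetical.
.PP
.nf
# On the command line (/scratch/ckpt and ./my_app are placeholders)
shell$ mpirun \-mca snapc_base_verbose 10 \\
           \-mca snapc_base_global_snapshot_dir /scratch/ckpt \\
           \-np 4 \-am ft\-enable\-cr ./my_app
# Or through the environment, before launching
shell$ export OMPI_MCA_snapc_base_global_snapshot_dir=/scratch/ckpt
.fi
.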
.\" full Component
.\" ******************
.SS full SnapC Component
.PP
The \fIfull\fR component gathers the local snapshots onto the disk local
to the Head Node Process (HNP) before completing the checkpoint of the process. This
component does not currently support replicated HNPs or timer-based gathering
of local snapshot data. It uses a 3-tiered hierarchy of coordinators.
.
.PP
The \fIfull\fR component has the following MCA parameters:
.
.TP 4
snapc_full_priority
The component's priority to use when selecting the most appropriate component
for a run.
.
.TP 4
snapc_full_verbose
Set the verbosity level for this component. Default is 0, or silent except on
error.
.
.\" Special 'none' option
.\" ************************
.SS none SnapC Component
.PP
The \fInone\fP component simply selects no SnapC component. All of the SnapC
function calls return immediately with ORTE_SUCCESS.
.
.PP
This component is the last component to be selected by default. This means that if
another component is available, and the \fInone\fP component was not explicitly
requested, then ORTE will attempt to activate all of the available components
before falling back to this component.
.
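.PP
As a minimal sketch, the \fInone\fP component could be requested explicitly
with the usual MCA component selection syntax, which names the framework and
then the component. The process count and application name are hypothetical.
.PP
.nf
# Explicitly select the none component for the snapc framework
shell$ mpirun \-mca snapc none \-np 4 ./my_app
.fi
.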
.\" **************************
.\" See Also Section
.\" **************************
.
.SH SEE ALSO
orte-checkpoint(1), orte-restart(1), opal-checkpoint(1), opal-restart(1),
orte_filem(7), opal_crs(7)
.