.\"
.\" Copyright (c) 2004-2007 The Trustees of Indiana University and Indiana
.\" University Research and Technology
.\" Corporation. All rights reserved.
.\" Copyright (c) 2008-2009 Sun Microsystems, Inc. All rights reserved.
.\"
.\" Man page for ORTE's SnapC Functionality
.\"
.\" .TH name section center-footer left-footer center-header
.TH ORTE_SNAPC 7 "Dec 08, 2009" "1.4" "Open MPI"
.\" **************************
.\" Name Section
.\" **************************
.SH NAME
.
Open RTE MCA Snapshot Coordination (SnapC) Framework \- Overview of Open RTE's SnapC
framework, and selected modules. Open MPI 1.4
.
.\" **************************
.\" Description Section
.\" **************************
.SH DESCRIPTION
.
.PP
Open RTE can coordinate the generation of a global snapshot of a parallel job
from many distributed local snapshots. The components in this framework
determine how to initiate the checkpoint of the parallel application, gather
the distributed local snapshots, and provide the user with a global snapshot
handle reference that can be used to restart the parallel application.
.
.\" **************************
.\" General Process Requirements Section
.\" **************************
.SH GENERAL PROCESS REQUIREMENTS
.PP
In order for a process to use the Open RTE SnapC components, it must adhere to a
few programmatic requirements.
.PP
First, the program must call \fIORTE_INIT\fR early in its execution. This
function should be called only once, and a process cannot be checkpointed
until it has called this function.
.PP
Second, the program must call \fIORTE_FINALIZE\fR before termination.
.PP
A user may initiate a checkpoint of a parallel application with the
orte-checkpoint(1) command, and restart the application from the resulting
global snapshot with the orte-restart(1) command.
.
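.PP
The checkpoint and restart steps can be sketched as a short shell session.
This is a dry run that only prints the two commands. The PID and the snapshot
handle name are hypothetical placeholders; the real handle is whatever
orte-checkpoint(1) reports for the job.
.PP
.nf
```shell
#!/bin/sh
# Dry-run sketch: prints the commands instead of executing them.
# MPIRUN_PID is a hypothetical placeholder for the PID of the running
# mpirun (orterun) process that launched the job.
MPIRUN_PID=12345

# Assumed handle name, for illustration only; use the global snapshot
# reference that orte-checkpoint actually prints.
SNAPSHOT_HANDLE="ompi_global_snapshot_${MPIRUN_PID}.ckpt"

CHECKPOINT_CMD="orte-checkpoint ${MPIRUN_PID}"
RESTART_CMD="orte-restart ${SNAPSHOT_HANDLE}"

echo "${CHECKPOINT_CMD}"
echo "${RESTART_CMD}"
```
.fi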
.\" **********************************
.\" Available Components Section
.\" **********************************
.SH AVAILABLE COMPONENTS
.PP
Open RTE ships with one SnapC component: \fIfull\fR. A special \fInone\fR
option that selects no SnapC component is also available.
.
.PP
The following MCA parameters apply to all components:
.
.TP 4
snapc_base_verbose
Set the verbosity level for all components. Default is 0, or silent except on error.
.
.TP
snapc_base_global_snapshot_dir
The directory in which to store the checkpoint snapshots. Default is \fB/tmp\fP.
.
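.PP
As an illustration, both base parameters can be passed on the launch command
line with the \-\-mca option. The sketch below only prints the command. The
snapshot directory, verbosity value, process count, and application name are
hypothetical, and any other options a given build needs to enable
checkpoint/restart are omitted.
.PP
.nf
```shell
#!/bin/sh
# Dry-run sketch: prints the launch command instead of executing it.
# SNAPSHOT_DIR, the verbosity level, the process count, and ./my_app
# are hypothetical values chosen for illustration.
SNAPSHOT_DIR="/scratch/checkpoints"

LAUNCH_CMD="mpirun"
LAUNCH_CMD="${LAUNCH_CMD} --mca snapc_base_verbose 10"
LAUNCH_CMD="${LAUNCH_CMD} --mca snapc_base_global_snapshot_dir ${SNAPSHOT_DIR}"
LAUNCH_CMD="${LAUNCH_CMD} -np 4 ./my_app"

echo "${LAUNCH_CMD}"
```
.fi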
.\" Full Component
.\" ******************
.SS full SnapC Component
.PP
The \fIfull\fR component gathers the local snapshots onto the disk local
to the Head Node Process (HNP) before completing the checkpoint of the process. This
component does not currently support replicated HNPs or timer-based gathering
of local snapshot data. It uses a 3-tiered hierarchy of coordinators.
.
.PP
The \fIfull\fR component has the following MCA parameters:
.
.TP 4
snapc_full_priority
The component's priority, used when selecting the most appropriate component
for a run.
.
.TP 4
snapc_full_verbose
Set the verbosity level for this component. Default is 0, or silent except on
error.
.
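.PP
The parameters of the \fIfull\fR component, including the priority value in
effect, can usually be inspected with ompi_info(1). The sketch below assumes
the \-\-param option as found in Open MPI 1.4 and falls back to printing the
command when ompi_info is not installed.
.PP
.nf
```shell
#!/bin/sh
# Lists the MCA parameters of the snapc "full" component if ompi_info
# is installed; otherwise prints the command that would be run.
QUERY_CMD="ompi_info --param snapc full"

if command -v ompi_info >/dev/null 2>&1; then
    ${QUERY_CMD}
else
    echo "ompi_info not found; would run: ${QUERY_CMD}"
fi
```
.fi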
.\" Special 'none' option
.\" ************************
.SS none SnapC Component
.PP
The \fInone\fP component simply selects no SnapC component. All of the SnapC
function calls return immediately with ORTE_SUCCESS.
.
.PP
This component is the last component to be selected by default. This means that if
another component is available and the \fInone\fP component was not explicitly
requested, ORTE will attempt to activate all of the available components
before falling back to this component.
.
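.PP
One way to request the \fInone\fP component explicitly is the general MCA
convention of naming a framework's component in an OMPI_MCA_ environment
variable. The sketch below assumes that convention applies to the snapc
framework unchanged; it only sets and prints the variable.
.PP
.nf
```shell
#!/bin/sh
# Requests the "none" component explicitly through the environment.
# OMPI_MCA_<name> is the environment-variable form of an MCA parameter;
# here <name> is the framework name, snapc.
OMPI_MCA_snapc=none
export OMPI_MCA_snapc

echo "snapc component requested: ${OMPI_MCA_snapc}"
```
.fi
.PP
Jobs launched from the same shell afterwards inherit the setting; the
equivalent command-line form is \-\-mca snapc none.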
.\" **************************
.\" See Also Section
.\" **************************
.
.SH SEE ALSO
orte-checkpoint(1), orte-restart(1), opal-checkpoint(1), opal-restart(1),
orte_filem(7), opal_crs(7)
.