.\" Copyright (c) 2009 Cisco Systems, Inc. All rights reserved.
.\" Copyright (c) 2008-2009 Sun Microsystems, Inc. All rights reserved.
.\"
.\" Man page for ORTE's orterun command
.\"
.\" .TH name section center-footer left-footer center-header
.TH MPIRUN 1 "Dec 08, 2009" "1.4" "Open MPI"
.\" **************************
.\" Name Section
.\" **************************
.SH NAME
.
orterun, mpirun, mpiexec \- Execute serial and parallel jobs in Open MPI.

.B Note:
\fImpirun\fP, \fImpiexec\fP, and \fIorterun\fP are all synonyms for each
other. Using any of the names will produce the same behavior.
.
.\" **************************
.\" Synopsis Section
.\" **************************
.SH SYNOPSIS
.
.PP
Single Program, Multiple Data (SPMD) Model:

.B mpirun
[ options ]
.B <program>
[ <args> ]
.P

Multiple Instruction Multiple Data (MIMD) Model:

.B mpirun
[ global_options ]
[ local_options1 ]
.B <program1>
[ <args1> ] :
[ local_options2 ]
.B <program2>
[ <args2> ] :
\&... :
[ local_optionsN ]
.B <programN>
[ <argsN> ]
.P

Note that in both models, invoking \fImpirun\fP via an absolute path
name is equivalent to specifying the \fI--prefix\fP option with a
\fI<dir>\fR value equivalent to the directory where \fImpirun\fR
resides, minus its last subdirectory. For example:

    \fB%\fP /usr/local/bin/mpirun ...

is equivalent to

    \fB%\fP mpirun --prefix /usr/local

.
.\" **************************
.\" Quick Summary Section
.\" **************************
.SH QUICK SUMMARY
.
If you are simply looking for how to run an MPI application, you
probably want to use a command line of the following form:

    \fB%\fP mpirun [ -np X ] [ --hostfile <filename> ] <program>

This will run X copies of \fI<program>\fR in your current run-time
environment, scheduling (by default) in a round-robin fashion by CPU
slot. If running under a supported resource manager, Open MPI's
\fImpirun\fR will usually automatically use the corresponding resource
manager process starter, as opposed to, for example, \fIrsh\fR or
\fIssh\fR, which require the use of a hostfile; otherwise it will
default to running all X copies on the localhost. See the rest of this
page for more details.
.
.\" **************************
.\" Options Section
.\" **************************
.SH OPTIONS
.
.I mpirun
will send the name of the directory where it was invoked on the local
node to each of the remote nodes, and attempt to change to that
directory. See the "Current Working Directory" section below for further
details.
.\"
.\" Start options listing
.\" Indent 10 characters from start of first column to start of second column
.TP 10
.B <program>
The program executable. This is identified as the first non-recognized argument
to mpirun.
.
.
.TP
.B <args>
Pass these run-time arguments to every new process. These must always
be the last arguments to \fImpirun\fP. If an app context file is used,
\fI<args>\fP will be ignored.
.
.
.TP
.B -h\fR,\fP --help
Display help for this command.
.
.
.TP
.B -q\fR,\fP --quiet
Suppress informative messages from orterun during application execution.
.
.
.TP
.B -v\fR,\fP --verbose
Be verbose.
.
.
.TP
.B -V\fR,\fP --version
Print version number. If no other arguments are given, this will also
cause orterun to exit.
.
.
.
.
.P
To specify which hosts (nodes) of the cluster to run on:
.
.
.TP
.B -H\fR,\fP -host\fR,\fP --host \fR<host1,host2,...,hostN>\fP
List of hosts on which to invoke processes.
.
.
.TP
.B
-hostfile\fR,\fP --hostfile \fR<hostfile>\fP
Provide a hostfile to use.
.\" JJH - Should have man page for how to format a hostfile properly.
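A hostfile lists one node per line, optionally followed by a "slots"
count (this short example is illustrative; the same format is used in
the "Specifying Host Nodes" section below):
.sp
.nf
    \fB%\fP cat myhostfile
    aa slots=2
    bb slots=2
.fi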
---|
.
.
.TP
.B -machinefile\fR,\fP --machinefile \fR<machinefile>\fP
Synonym for \fI-hostfile\fP.
.
.
.
.
.P
To specify the number of processes to launch:
.
.
.TP
.B -c\fR,\fP -n\fR,\fP --n\fR,\fP -np \fR<#>\fP
Run this many copies of the program on the given nodes. This option
indicates that the specified file is an executable program and not an
application context. If no value is provided for the number of copies to
execute (i.e., neither the "-np" option nor any of its synonyms is
provided on the command line), Open MPI will automatically execute a copy
of the program on each process slot (see below for description of a
"process slot"). This feature, however, can only be used in the SPMD model
and will return an error (without beginning execution of the application)
otherwise.
.
.
.TP
.B -npersocket\fR,\fP --npersocket <#persocket>
On each node, launch this many processes times the number of processor
sockets on the node.
The \fI-npersocket\fP option also turns on the \fI-bind-to-socket\fP option.
.
.
.TP
.B -npernode\fR,\fP --npernode <#pernode>
On each node, launch this many processes.
.
.
.TP
.B -pernode\fR,\fP --pernode
On each node, launch one process -- equivalent to \fI-npernode\fP 1.
.
.
.
.
.P
To map processes to nodes:
.
.
.TP
.B -loadbalance\fR,\fP --loadbalance
Uniform distribution of ranks across all nodes. See the more detailed
description below.
.
.TP
.B -nolocal\fR,\fP --nolocal
Do not run any copies of the launched application on the same node as
orterun is running. This option will override listing the localhost
with \fB--host\fR or any other host-specifying mechanism.
.
.TP
.B -nooversubscribe\fR,\fP --nooversubscribe
Do not oversubscribe any nodes; error (without starting any processes)
if the requested number of processes would cause oversubscription.
This option implicitly sets "max_slots" equal to the "slots" value for
each node.
.
.TP
.B -bynode\fR,\fP --bynode
Launch processes one per node, cycling by node in a round-robin
fashion. This spreads processes evenly among nodes and assigns
ranks in a round-robin, "by node" manner.
.
.
.
.
.P
For process binding:
.
.TP
.B -bycore\fR,\fP --bycore
Associate processes with successive cores
if used with one of the \fI-bind-to-*\fP options.
.
.TP
.B -bysocket\fR,\fP --bysocket
Associate processes with successive processor sockets
if used with one of the \fI-bind-to-*\fP options.
.
.TP
.B -cpus-per-proc\fR,\fP --cpus-per-proc <#perproc>
Bind this many cores to each process
if used with one of the \fI-bind-to-*\fP options.
.
.TP
.B -cpus-per-rank\fR,\fP --cpus-per-rank <#perrank>
Alias for \fI-cpus-per-proc\fP.
.
.TP
.B -bind-to-core\fR,\fP --bind-to-core
Bind processes to cores.
.
.TP
.B -bind-to-socket\fR,\fP --bind-to-socket
Bind processes to processor sockets.
.
.TP
.B -bind-to-none\fR,\fP --bind-to-none
Do not bind processes. (Default.)
.
.TP
.B -report-bindings\fR,\fP --report-bindings
Report any bindings for launched processes.
.
.TP
.B -slot-list\fR,\fP --slot-list <slots>
List of processor IDs to be used for binding MPI processes. The specified
bindings will be applied to all MPI processes. See the explanation below
for syntax.
.
.
.
.
.P
For rankfiles:
.
.
.TP
.B -rf\fR,\fP --rankfile <rankfile>
Provide a rankfile.
.
.
.
.
.P
To manage standard I/O:
.
.
.TP
.B -output-filename\fR,\fP --output-filename \fR<filename>\fP
Redirect the stdout, stderr, and stddiag of all ranks to a rank-unique
version of the specified filename. Any directories in the filename will
automatically be created. Each output file will consist of filename.rank,
where the rank will be left-filled with zeros for correct ordering in
listings.
.
.
.TP
.B -stdin\fR,\fP --stdin <rank>
The MPI rank that is to receive stdin. The default is to forward stdin to
rank 0, but this option can be used to forward stdin to any rank. It is
also acceptable to specify \fInone\fP, indicating that no ranks are to
receive stdin.
.
.
.TP
.B -tag-output\fR,\fP --tag-output
Tag each line of output to stdout, stderr, and stddiag with
\fB[jobid, rank]<stdxxx>\fP indicating the process jobid
and rank that generated the output, and the channel which generated it.
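For illustration, a tagged stdout line from rank 0 would look roughly
like the following (the jobid value shown here is made up; it varies
from run to run):
.sp
.nf
    [4217, 0]<stdout>Hello, world
.fi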
---|
.
.
.TP
.B -timestamp-output\fR,\fP --timestamp-output
Timestamp each line of output to stdout, stderr, and stddiag.
.
.
.TP
.B -xml\fR,\fP --xml
Provide all output to stdout, stderr, and stddiag in an XML format.
.
.
.TP
.B -xterm\fR,\fP --xterm \fR<ranks>\fP
Display the specified ranks in separate xterm windows. The ranks are
specified as a comma-separated list of ranges, with a -1 indicating all.
A separate window will be created for each specified rank.
.B Note:
In some environments, xterm may require that the executable be in the
user's path, or be specified in absolute or relative terms. Thus, it may
be necessary to specify a local executable as "./foo" instead of just
"foo". If xterm fails to find the executable, mpirun will hang, but still
respond correctly to a ctrl-c. If this happens, please check that the
executable is being specified correctly and try again.
.
.
.
.
.P
To manage files and runtime environment:
.
.
.TP
.B -path\fR,\fP --path \fR<path>\fP
<path> that will be used when attempting to locate the requested
executables. This is used prior to using the local PATH setting.
.
.
.TP
.B --prefix \fR<dir>\fP
Prefix directory that will be used to set the \fIPATH\fR and
\fILD_LIBRARY_PATH\fR on the remote node before invoking Open MPI or
the target process. See the "Remote Execution" section, below.
.
.
.TP
.B --preload-binary
Copy the specified executable(s) to remote machines prior to starting
remote processes. The executables will be copied to the Open MPI session
directory and will be deleted upon completion of the job.
.
.
.TP
.B --preload-files <files>
Preload the comma-separated list of files to the current working directory
of the remote machines where processes will be launched, prior to starting
those processes.
.
.
.TP
.B --preload-files-dest-dir <path>
The destination directory to be used for preload-files, if other than the
current working directory. By default, the absolute and relative paths
provided by --preload-files are used.
.
.
.TP
.B --tmpdir \fR<dir>\fP
Set the root for the session directory tree for mpirun only.
.
.
.TP
.B -wd \fR<dir>\fP
Synonym for \fI-wdir\fP.
.
.
.TP
.B -wdir \fR<dir>\fP
Change to the directory <dir> before the user's program executes.
See the "Current Working Directory" section for notes on relative paths.
.B Note:
If the \fI-wdir\fP option appears both on the command line and in an
application context, the context will take precedence over the command
line.
.
.
.TP
.B -x \fR<env>\fP
Export the specified environment variables to the remote nodes before
executing the program. Only one environment variable can be specified
per \fI-x\fP option. Existing environment variables can be specified,
or new variable names can be specified with corresponding values. For
example:

    \fB%\fP mpirun -x DISPLAY -x OFILE=/tmp/out ...

The parser for the \fI-x\fP option is not very sophisticated; it does
not even understand quoted values. Users are advised to set variables
in the environment, and then use \fI-x\fP to export (not define) them.
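For example, a variable whose value contains spaces can be set in the
shell first and then exported by name (the variable name here is
illustrative):
.sp
.nf
    \fB%\fP export MY_APP_OPTS="one two three"
    \fB%\fP mpirun -x MY_APP_OPTS ./a.out
.fi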
---|
.
.
.
.
.P
Setting MCA parameters:
.
.
.TP
.B -gmca\fR,\fP --gmca \fR<key> <value>\fP
Pass global MCA parameters that are applicable to all contexts.
\fI<key>\fP is the parameter name; \fI<value>\fP is the parameter value.
.
.
.TP
.B -mca\fR,\fP --mca <key> <value>
Send arguments to various MCA modules. See the "MCA" section, below.
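For example, to restrict the \fIbtl\fP (byte transfer layer) framework to
its TCP and self (process-loopback) components:
.sp
.nf
    \fB%\fP mpirun -mca btl tcp,self -np 4 ./a.out
.fi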
---|
.
.
.
.
.P
For debugging:
.
.
.TP
.B -debug\fR,\fP --debug
Invoke the user-level debugger indicated by the \fIorte_base_user_debugger\fP
MCA parameter.
.
.
.TP
.B -debugger\fR,\fP --debugger
Sequence of debuggers to search for when \fI--debug\fP is used (i.e.,
a synonym for the \fIorte_base_user_debugger\fP MCA parameter).
.
.
.TP
.B -tv\fR,\fP --tv
Launch processes under the TotalView debugger.
This is a deprecated backwards-compatibility flag; it is a synonym for
\fI--debug\fP.
.
.
.
.
.P
There are also other options:
.
.
.TP
.B -aborted\fR,\fP --aborted \fR<#>\fP
Set the maximum number of aborted processes to display.
.
.
.TP
.B --app \fR<appfile>\fP
Provide an appfile, ignoring all other command line options.
.
.
.TP
.B -cf\fR,\fP --cartofile \fR<cartofile>\fP
Provide a cartography file.
.
.
.TP
.B --hetero
Indicates that multiple app_contexts are being provided that are a mix of
32/64-bit binaries.
.
.
.TP
.B -leave-session-attached\fR,\fP --leave-session-attached
Do not detach OmpiRTE daemons used by this application. This allows error
messages from the daemons, as well as from the underlying environment
(e.g., when failing to launch a daemon), to be output.
.
.
.TP
.B -ompi-server\fR,\fP --ompi-server <uri or file>
Specify the URI of the Open MPI server, or the name of the file
(specified as file:filename) that contains that info. The Open MPI server
is used to support multi-application data exchange via the MPI-2
MPI_Publish_name and MPI_Lookup_name functions.
.
.
.TP
.B -wait-for-server\fR,\fP --wait-for-server
Pause mpirun before launching the job until ompi-server is detected. This
is useful in scripts where ompi-server may be started in the background,
followed immediately by an \fImpirun\fP command that wishes to connect to
it. Mpirun will pause until either the specified ompi-server is contacted
or the server-wait-time is exceeded.
.
.
.TP
.B -server-wait-time\fR,\fP --server-wait-time <secs>
The maximum amount of time (in seconds) mpirun should wait for the
ompi-server to start. The default is 10 seconds.
.
.
.
.
.P
The following options are useful for developers; they are not generally
useful to most ORTE and/or MPI users:
.
.TP
.B -d\fR,\fP --debug-devel
Enable debugging of the OmpiRTE (the run-time layer in Open MPI).
This is not generally useful for most users.
.
.
.TP
.B --debug-daemons
Enable debugging of any OmpiRTE daemons used by this application.
.
.
.TP
.B --debug-daemons-file
Enable debugging of any OmpiRTE daemons used by this application, storing
output in files.
.
.
.TP
.B -launch-agent\fR,\fP --launch-agent
Name of the executable that is to be used to start processes on the remote
nodes. The default is "orted". This option can be used to test new daemon
concepts, or to pass options back to the daemons without having mpirun
itself see them. For example, specifying a launch agent of
\fRorted -mca odls_base_verbose 5\fR allows the developer to ask the orted
for debugging output without clutter from mpirun itself.
.
.
.TP
.B --noprefix
Disable the automatic --prefix behavior.
.
.
.P
There may be other options listed with \fImpirun --help\fP.
.
.
.\" **************************
.\" Description Section
.\" **************************
.SH DESCRIPTION
.
One invocation of \fImpirun\fP starts an MPI application running under
Open MPI. If the application is single program, multiple data (SPMD), the
application can be specified on the \fImpirun\fP command line.

If the application is multiple instruction multiple data (MIMD),
consisting of multiple programs, the set of programs and arguments can be
specified in one of two ways: extended command line arguments, or an
application context.
.PP
An application context describes the MIMD program set, including all
arguments, in a separate file.
.\"See appcontext(5) for a description of the application context syntax.
This file essentially contains multiple \fImpirun\fP command lines, less
the command name itself. The ability to specify different options for
different instantiations of a program is another reason to use an
application context.
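For example (an illustrative appfile; the program names are hypothetical),
each line gives the options for one program, just as they would appear on
an \fImpirun\fP command line:
.sp
.nf
    \fB%\fP cat my_appfile
    -np 1 ./master
    -np 4 ./worker
    \fB%\fP mpirun --app my_appfile
.fi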
---|
| 548 | .PP |
---|
| 549 | Extended command line arguments allow for the description of the application |
---|
| 550 | layout on the command line using colons (\fI:\fP) to separate the specification |
---|
| 551 | of programs and arguments. Some options are globally set across all specified |
---|
| 552 | programs (e.g. --hostfile), while others are specific to a single program |
---|
| 553 | (e.g. -np). |
---|
| 554 | . |
---|
| 555 | . |
---|
| 556 | . |
---|
| 557 | .SS Specifying Host Nodes |
---|
| 558 | . |
---|
| 559 | Host nodes can be identified on the \fImpirun\fP command line with the \fI-host\fP |
---|
| 560 | option or in a hostfile. |
---|
| 561 | . |
---|
| 562 | .PP |
---|
| 563 | For example, |
---|
| 564 | . |
---|
| 565 | .TP 4 |
---|
| 566 | mpirun -H aa,aa,bb ./a.out |
---|
| 567 | launches two processes on node aa and one on bb. |
---|
| 568 | . |
---|
| 569 | .PP |
---|
| 570 | Or, consider the hostfile |
---|
| 571 | . |
---|
| 572 | |
---|
| 573 | \fB%\fP cat myhostfile |
---|
| 574 | aa slots=2 |
---|
| 575 | bb slots=2 |
---|
| 576 | cc slots=2 |
---|
| 577 | |
---|
| 578 | . |
---|
| 579 | .PP |
---|
| 580 | Here, we list both the host names (aa, bb, and cc) but also how many "slots" |
---|
| 581 | there are for each. Slots indicate how many processes can potentially execute |
---|
| 582 | on a node. For best performance, the number of slots may be chosen to be the |
---|
| 583 | number of cores on the node or the number of processor sockets. If the hostfile |
---|
| 584 | does not provide slots information, a default of 1 is assumed. |
---|
| 585 | When running under resource managers (e.g., SLURM, Torque, etc.), |
---|
| 586 | Open MPI will obtain both the hostnames and the number of slots directly |
---|
| 587 | from the resource manger. |
---|
| 588 | . |
---|
| 589 | .PP |
---|
| 590 | . |
---|
| 591 | .TP 4 |
---|
| 592 | mpirun -hostfile myhostfile ./a.out |
---|
| 593 | will launch two processes on each of the three nodes. |
---|
| 594 | . |
---|
| 595 | .TP 4 |
---|
| 596 | mpirun -hostfile myhostfile -host aa ./a.out |
---|
| 597 | will launch two processes, both on node aa. |
---|
| 598 | . |
---|
| 599 | .TP 4 |
---|
| 600 | mpirun -hostfile myhostfile -host dd ./a.out |
---|
| 601 | will find no hosts to run on and abort with an error. |
---|
| 602 | That is, the specified host dd is not in the specified hostfile. |
---|
| 603 | . |
---|
| 604 | .SS Specifying Number of Processes |
---|
| 605 | . |
---|
| 606 | As we have just seen, the number of processes to run can be set using the |
---|
| 607 | hostfile. Other mechanisms exist. |
---|
| 608 | . |
---|
| 609 | .PP |
---|
| 610 | The number of processes launched can be specified as a multiple of the |
---|
| 611 | number of nodes or processor sockets available. For example, |
---|
| 612 | . |
---|
| 613 | .TP 4 |
---|
| 614 | mpirun -H aa,bb -npersocket 2 ./a.out |
---|
| 615 | launches processes 0-3 on node aa and process 4-7 on node bb, |
---|
| 616 | where aa and bb are both dual-socket nodes. |
---|
| 617 | The \fI-npersocket\fP option also turns on the \fI-bind-to-socket\fP option, |
---|
| 618 | which is discussed in a later section. |
---|
| 619 | . |
---|
| 620 | .TP 4 |
---|
| 621 | mpirun -H aa,bb -npernode 2 ./a.out |
---|
| 622 | launches processes 0-1 on node aa and processes 2-3 on node bb. |
---|
| 623 | . |
---|
| 624 | .TP 4 |
---|
| 625 | mpirun -H aa,bb -npernode 1 ./a.out |
---|
| 626 | launches one process per host node. |
---|
| 627 | . |
---|
| 628 | .TP 4 |
---|
| 629 | mpirun -H aa,bb -pernode ./a.out |
---|
| 630 | is the same as \fI-npernode\fP 1. |
---|
| 631 | . |
---|
| 632 | . |
---|
| 633 | .PP |
---|
| 634 | Another alternative is to specify the number of processes with the |
---|
| 635 | \fI-np\fP option. Consider now the hostfile |
---|
| 636 | . |
---|
| 637 | |
---|
| 638 | \fB%\fP cat myhostfile |
---|
| 639 | aa slots=4 |
---|
| 640 | bb slots=4 |
---|
| 641 | cc slots=4 |
---|
| 642 | |
---|
| 643 | . |
---|
| 644 | .PP |
---|
| 645 | Now, |
---|
| 646 | . |
---|
| 647 | .TP 4 |
---|
| 648 | mpirun -hostfile myhostfile -np 6 ./a.out |
---|
| 649 | will launch ranks 0-3 on node aa and ranks 4-5 on node bb. The remaining |
---|
| 650 | slots in the hostfile will not be used since the \fI-np\fP option indicated |
---|
| 651 | that only 6 processes should be launched. |
---|
| 652 | . |
---|
| 653 | .SS Mapping Processes to Nodes |
---|
| 654 | . |
---|
| 655 | The examples above illustrate the default mapping of process ranks |
---|
| 656 | to nodes. This mapping can also be controlled with various |
---|
| 657 | \fImpirun\fP options. Here, we consider the same hostfile as |
---|
| 658 | above with \fI-np\fP 6 again: |
---|
| 659 | . |
---|
| 660 | |
---|
| 661 | node aa node bb node cc |
---|
| 662 | |
---|
| 663 | mpirun 0 1 2 3 4 5 |
---|
| 664 | |
---|
| 665 | mpirun -loadbalance 0 1 2 3 4 5 |
---|
| 666 | |
---|
| 667 | mpirun -bynode 0 3 1 4 2 5 |
---|
| 668 | |
---|
| 669 | mpirun -nolocal 0 1 2 3 4 5 |
---|
| 670 | . |
---|
| 671 | .PP |
---|
| 672 | The \fI-loadbalance\fP option tries to spread processes out fairly among the |
---|
| 673 | nodes. |
---|
| 674 | . |
---|
| 675 | .PP |
---|
| 676 | The \fI-bynode\fP option does likewise but numbers the processes in "by node" |
---|
| 677 | in a round-robin fashion. |
---|
| 678 | . |
---|
| 679 | .PP |
---|
| 680 | The \fI-nolocal\fP option prevents any processes from being mapped onto the |
---|
| 681 | local host (in this case node aa). While \fImpirun\fP typically consumes |
---|
| 682 | few system resources, \fI-nolocal\fP can be helpful for launching very |
---|
| 683 | large jobs where \fImpirun\fP may actually need to use noticable amounts |
---|
| 684 | of memory and/or processing time. |
---|
| 685 | . |
---|
| 686 | .PP |
---|
| 687 | Just as \fI-np\fP can specify fewer processes than there are slots, it can |
---|
| 688 | also oversubscribe the slots. For example, with the same hostfile: |
---|
| 689 | . |
---|
| 690 | .TP 4 |
---|
| 691 | mpirun -hostfile myhostfile -np 14 ./a.out |
---|
| 692 | will launch processes 0-3 on node aa, 4-7 on bb, and 8-11 on cc. It will |
---|
| 693 | then add the remaining two processes to whichever nodes it chooses. |
---|
| 694 | . |
---|
| 695 | .PP |
---|
| 696 | One can also specify limits to oversubscription. For example, with the same |
---|
| 697 | hostfile: |
---|
| 698 | . |
---|
| 699 | .TP 4 |
---|
| 700 | mpirun -hostfile myhostfile -np 14 -nooversubscribe ./a.out |
---|
| 701 | will produce an error since \fI-nooversubscribe\fP prevents oversubscription. |
---|
| 702 | . |
---|
| 703 | .PP |
---|
| 704 | Limits to oversubscription can also be specified in the hostfile itself: |
---|
| 705 | . |
---|
| 706 | % cat myhostfile |
---|
| 707 | aa slots=4 max_slots=4 |
---|
| 708 | bb max_slots=4 |
---|
| 709 | cc slots=4 |
---|
| 710 | . |
---|
| 711 | .PP |
---|
| 712 | The \fImax_slots\fP field specifies such a limit. When it does, the |
---|
| 713 | \fIslots\fP value defaults to the limit. Now: |
---|
| 714 | . |
---|
| 715 | .TP 4 |
---|
| 716 | mpirun -hostfile myhostfile -np 14 ./a.out |
---|
| 717 | causes the first 12 processes to be launched as before, but the remaining |
---|
| 718 | two processes will be forced onto node cc. The other two nodes are |
---|
| 719 | protected by the hostfile against oversubscription by this job. |
---|
| 720 | . |
---|
| 721 | .PP |
---|
| 722 | Using the \fI--nooversubscribe\fR option can be helpful since Open MPI |
---|
| 723 | currently does not get "max_slots" values from the resource manager. |
---|
| 724 | . |
---|
| 725 | .PP |
---|
| 726 | Of course, \fI-np\fP can also be used with the \fI-H\fP or \fI-host\fP |
---|
| 727 | option. For example, |
---|
| 728 | . |
---|
| 729 | .TP 4 |
---|
| 730 | mpirun -H aa,bb -np 8 ./a.out |
---|
| 731 | launches 8 processes. Since only two hosts are specified, after the first |
---|
| 732 | two processes are mapped, one to aa and one to bb, the remaining processes |
---|
| 733 | oversubscribe the specified hosts. |
---|
| 734 | . |
---|
| 735 | .PP |
---|
| 736 | And here is a MIMD example: |
---|
| 737 | . |
---|
| 738 | .TP 4 |
---|
| 739 | mpirun -H aa -np 1 hostname : -H bb,cc -np 2 uptime |
---|
| 740 | will launch process 0 running \fIhostname\fP on node aa and processes 1 and 2 |
---|
| 741 | each running \fIuptime\fP on nodes bb and cc, respectively. |
---|
| 742 | . |
---|
.SS Process Binding
.
Processes may be bound to specific resources on a node. This can
improve performance if the operating system is placing processes
suboptimally. For example, it might oversubscribe some multi-core
processor sockets, leaving other sockets idle; this can lead
processes to contend unnecessarily for common resources. Or, it
might spread processes out too widely; this can be suboptimal if
application performance is sensitive to interprocess communication
costs. Binding can also keep the operating system from migrating
processes excessively, regardless of how optimally those processes
were placed to begin with.
.
.PP
To bind processes, one must first associate them with the resources
on which they should run. For example, the \fI-bycore\fP option
associates the processes on a node with successive cores. Or,
\fI-bysocket\fP associates the processes with successive processor sockets,
cycling through the sockets in a round-robin fashion if necessary.
And \fI-cpus-per-proc\fP indicates how many cores to bind per process.
.
.PP
But, such association is meaningless unless the processes are actually
bound to those resources. The binding option specifies the granularity
of binding -- say, with \fI-bind-to-core\fP or \fI-bind-to-socket\fP.
One can also turn binding off with \fI-bind-to-none\fP, which is
typically the default.
.
.PP
Finally, \fI-report-bindings\fP can be used to report bindings.
.
.PP
As an example, consider a node with two processor sockets, each comprising
four cores. We run \fImpirun\fP with \fI-np 4 -report-bindings\fP and
the following additional options:
.

% mpirun ... -bycore -bind-to-core
[...] ... binding child [...,0] to cpus 0001
[...] ... binding child [...,1] to cpus 0002
[...] ... binding child [...,2] to cpus 0004
[...] ... binding child [...,3] to cpus 0008

% mpirun ... -bysocket -bind-to-socket
[...] ... binding child [...,0] to socket 0 cpus 000f
[...] ... binding child [...,1] to socket 1 cpus 00f0
[...] ... binding child [...,2] to socket 0 cpus 000f
[...] ... binding child [...,3] to socket 1 cpus 00f0

% mpirun ... -cpus-per-proc 2 -bind-to-core
[...] ... binding child [...,0] to cpus 0003
[...] ... binding child [...,1] to cpus 000c
[...] ... binding child [...,2] to cpus 0030
[...] ... binding child [...,3] to cpus 00c0

% mpirun ... -bind-to-none
.
.PP
Here, \fI-report-bindings\fP shows the binding of each process as a mask.
In the first case, the processes bind to successive cores, as indicated by
the masks 0001, 0002, 0004, and 0008. In the second case, processes bind
to all cores on successive sockets, as indicated by the masks 000f and 00f0.
The processes cycle through the processor sockets in a round-robin fashion
as many times as are needed. In the third case, the masks show us that
two cores have been bound per process. In the fourth case, binding is
turned off and no bindings are reported.
.
.PP
Open MPI's support for process binding depends on the underlying
operating system. Therefore, process binding may not be available
on every system.
.
.PP
Process binding can also be set with MCA parameters.
Their usage is less convenient than that of the \fImpirun\fP options.
On the other hand, MCA parameters can be set not only on the \fImpirun\fP
command line, but alternatively in a system or user mca-params.conf file
or as environment variables, as described in the MCA section below.
The correspondences are:
.

mpirun option        MCA parameter key            value

-bycore              rmaps_base_schedule_policy   core
-bysocket            rmaps_base_schedule_policy   socket
-bind-to-core        orte_process_binding         core
-bind-to-socket      orte_process_binding         socket
-bind-to-none        orte_process_binding         none
.
.PP
The \fIorte_process_binding\fP value can also take on the
\fI:if-avail\fP attribute. This attribute means that processes
will be bound only if this is supported on the underlying
operating system. Without the attribute, if there is no
such support, the binding request results in an error.
For example, you could have
.

% cat $HOME/.openmpi/mca-params.conf
rmaps_base_schedule_policy = socket
orte_process_binding = socket:if-avail
.
.
.SS Rankfiles
.
Rankfiles provide a means for specifying detailed information about
how process ranks should be mapped to nodes and how they should be bound.
Consider the following:
.

cat myrankfile
rank 0=aa slot=1:0-2
rank 1=bb slot=0:0,1
rank 2=cc slot=1-2
mpirun -H aa,bb,cc,dd -rf myrankfile ./a.out
.
This means that

Rank 0 runs on node aa, bound to socket 1, cores 0-2.
Rank 1 runs on node bb, bound to socket 0, cores 0 and 1.
Rank 2 runs on node cc, bound to cores 1 and 2.
.
.
.SS Application Context or Executable Program?
.
To distinguish the two different forms, \fImpirun\fP
looks on the command line for the \fI--app\fP option. If
it is specified, then the file named on the command line is
assumed to be an application context. If it is not
specified, then the file is assumed to be an executable program.
.
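.PP
As an illustration, an application context file is a plain text file in which
each line gives the local options and program for one application, in the same
form accepted on the \fImpirun\fP command line. The file name and program
names below are hypothetical examples only:

% cat my_appfile
-np 2 ./master
-np 4 ./worker
% mpirun --app my_appfile
.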
.SS Locating Files
.
If no relative or absolute path is specified for a file, Open
MPI will first look for files by searching the directories specified
by the \fI--path\fP option. If there is no \fI--path\fP option set or
if the file is not found at the \fI--path\fP location, then Open MPI
will search the user's PATH environment variable as defined on the
source node(s).
.PP
If a relative directory is specified, it must be relative to the initial
working directory determined by the specific starter used. For example, when
using the rsh or ssh starters, the initial directory is $HOME by default. Other
starters may set the initial directory to the current working directory from
the invocation of \fImpirun\fP.
.
.
.
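.PP
For example, the following invocation (the directory and program names are
illustrative only) causes Open MPI to search /opt/app/bin for the executable
before falling back to the user's PATH:

\fB%\fP mpirun --path /opt/app/bin -np 4 a.out
.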
.SS Current Working Directory
.
The \fI\-wdir\fP mpirun option (and its synonym, \fI\-wd\fP) allows
the user to change to an arbitrary directory before the program is
invoked. It can also be used in application context files to specify
working directories on specific nodes and/or for specific
applications.
.PP
If the \fI\-wdir\fP option appears both in a context file and on the
command line, the context file directory will override the command
line value.
.PP
If the \fI-wdir\fP option is specified, Open MPI will attempt to
change to the specified directory on all of the remote nodes. If this
fails, \fImpirun\fP will abort.
.PP
If the \fI-wdir\fP option is \fBnot\fP specified, Open MPI will send
the directory name where \fImpirun\fP was invoked to each of the
remote nodes. The remote nodes will try to change to that
directory. If they are unable (e.g., if the directory does not exist on
that node), then Open MPI will use the default directory determined by
the starter.
.PP
All directory changing occurs before the user's program is invoked; it
does not wait until \fIMPI_INIT\fP is called.
.
.
.
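.PP
For example, the following (directory name chosen for illustration) runs each
process with /tmp/run1 as its working directory, provided that directory
exists on every node:

\fB%\fP mpirun -wdir /tmp/run1 -np 2 a.out
.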
.SS Standard I/O
.
Open MPI directs UNIX standard input to /dev/null on all processes
except the MPI_COMM_WORLD rank 0 process. The MPI_COMM_WORLD rank 0 process
inherits standard input from \fImpirun\fP.
.B Note:
The node that invoked \fImpirun\fP need not be the same as the node where the
MPI_COMM_WORLD rank 0 process resides. Open MPI handles the redirection of
\fImpirun\fP's standard input to the rank 0 process.
.PP
Open MPI directs UNIX standard output and error from remote nodes to the node
that invoked \fImpirun\fP and prints it on the standard output/error of
\fImpirun\fP.
Local processes inherit the standard output/error of \fImpirun\fP and transfer
to it directly.
.PP
Thus it is possible to redirect standard I/O for Open MPI applications by
using the typical shell redirection procedure on \fImpirun\fP.

\fB%\fP mpirun -np 2 my_app < my_input > my_output

Note that in this example \fIonly\fP the MPI_COMM_WORLD rank 0 process will
receive the stream from \fImy_input\fP on stdin. The stdin on all the other
nodes will be tied to /dev/null. However, the stdout from all nodes will
be collected into the \fImy_output\fP file.
.
.
.
.SS Signal Propagation
.
When orterun receives a SIGTERM or SIGINT, it will attempt to kill
the entire job by sending all processes in the job a SIGTERM, waiting
a small number of seconds, then sending all processes in the job a
SIGKILL.
.
.PP
SIGUSR1 and SIGUSR2 signals received by orterun are propagated to
all processes in the job.
.
.PP
One can turn on forwarding of SIGSTOP and SIGCONT to the program executed
by mpirun by setting the MCA parameter orte_forward_job_control to 1.
A SIGTSTP signal to mpirun will then cause a SIGSTOP signal to be sent
to all of the programs started by mpirun, and likewise a SIGCONT signal
to mpirun will cause a SIGCONT signal to be sent to them.
.
.PP
Other signals are not currently propagated
by orterun.
.
.
.SS Process Termination / Signal Handling
.
During the run of an MPI application, if any rank dies abnormally
(either exiting before invoking \fIMPI_FINALIZE\fP, or dying as the result of a
signal), \fImpirun\fP will print out an error message and kill the rest of the
MPI application.
.PP
User signal handlers should probably avoid trying to clean up MPI state
(Open MPI is, currently, neither thread-safe nor async-signal-safe).
For example, if a segmentation fault occurs in \fIMPI_SEND\fP (perhaps because
a bad buffer was passed in) and a user signal handler is invoked, if this user
handler attempts to invoke \fIMPI_FINALIZE\fP, Bad Things could happen since
Open MPI was already "in" MPI when the error occurred. Since \fImpirun\fP
will notice that the process died due to a signal, it is safest (and probably
sufficient) for the user's handler to clean up only non-MPI state.
.
.
.
.SS Process Environment
.
Processes in the MPI application inherit their environment from the
Open RTE daemon on the node on which they are running. The
environment is typically inherited from the user's shell. On remote
nodes, the exact environment is determined by the boot MCA module
used. The \fIrsh\fR launch module, for example, uses either
\fIrsh\fR or \fIssh\fR to launch the Open RTE daemon on remote nodes, and
typically executes one or more of the user's shell-setup files before
launching the Open RTE daemon. When running dynamically linked
applications which require the \fILD_LIBRARY_PATH\fR environment
variable to be set, care must be taken to ensure that it is correctly
set when booting Open MPI.
.PP
See the "Remote Execution" section for more details.
.
.
.SS Remote Execution
.
Open MPI requires that the \fIPATH\fR environment variable be set to
find executables on remote nodes (this is typically only necessary in
\fIrsh\fR- or \fIssh\fR-based environments -- batch/scheduled
environments typically copy the current environment to the execution
of remote jobs, so if the current environment has \fIPATH\fR and/or
\fILD_LIBRARY_PATH\fR set properly, the remote nodes will also have it
set properly). If Open MPI was compiled with shared library support,
it may also be necessary to have the \fILD_LIBRARY_PATH\fR environment
variable set on remote nodes as well (especially to find the shared
libraries required to run user MPI applications).
.PP
However, it is not always desirable or possible to edit shell
startup files to set \fIPATH\fR and/or \fILD_LIBRARY_PATH\fR. The
\fI--prefix\fR option is provided for some simple configurations where
this is not possible.
.PP
The \fI--prefix\fR option takes a single argument: the base directory
on the remote node where Open MPI is installed. Open MPI will use
this directory to set the remote \fIPATH\fR and \fILD_LIBRARY_PATH\fR
before executing any Open MPI or user applications. This allows
running Open MPI jobs without having pre-configured the \fIPATH\fR and
\fILD_LIBRARY_PATH\fR on the remote nodes.
.PP
Open MPI adds the basename of the current
node's "bindir" (the directory where Open MPI's executables are
installed) to the prefix and uses that to set the \fIPATH\fR on the
remote node. Similarly, Open MPI adds the basename of the current
node's "libdir" (the directory where Open MPI's libraries are
installed) to the prefix and uses that to set the
\fILD_LIBRARY_PATH\fR on the remote node. For example:
.TP 15
Local bindir:
/local/node/directory/bin
.TP
Local libdir:
/local/node/directory/lib64
.PP
If the following command line is used:

\fB%\fP mpirun --prefix /remote/node/directory

Open MPI will add "/remote/node/directory/bin" to the \fIPATH\fR
and "/remote/node/directory/lib64" to the \fILD_LIBRARY_PATH\fR on the
remote node before attempting to execute anything.
.PP
Note that \fI--prefix\fR can be set on a per-context basis, allowing
for different values for different nodes.
.PP
The \fI--prefix\fR option is not sufficient if the installation paths
on the remote node are different from those on the local node (e.g., if "/lib"
is used on the local node, but "/lib64" is used on the remote node),
or if the installation paths are something other than a subdirectory
under a common prefix.
.PP
Note that executing \fImpirun\fR via an absolute pathname is
equivalent to specifying \fI--prefix\fR without the last subdirectory
in the absolute pathname to \fImpirun\fR. For example:

\fB%\fP /usr/local/bin/mpirun ...

is equivalent to

\fB%\fP mpirun --prefix /usr/local
.
.
.
.SS Exported Environment Variables
.
All environment variables that are named in the form OMPI_* will automatically
be exported to new processes on the local and remote nodes.
The \fI\-x\fP option to \fImpirun\fP can be used to export specific environment
variables to the new processes. While the syntax of the \fI\-x\fP
option allows the definition of new variables, note that the parser
for this option is currently not very sophisticated -- it does not even
understand quoted values. Users are advised to set variables in the
environment and use \fI\-x\fP to export them, not to define them.
.
.
.
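.PP
For example (the variable and program names are illustrative only), the
recommended pattern is to set and export the variable in the shell, then pass
only its name to \fI\-x\fP:

\fB%\fP export FOO=bar
\fB%\fP mpirun -x FOO -np 2 a.out
.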
.SS Setting MCA Parameters
.
The \fI-mca\fP switch allows the passing of parameters to various MCA
(Modular Component Architecture) modules.
.\" Open MPI's MCA modules are described in detail in ompimca(7).
MCA modules have direct impact on MPI programs because they allow tunable
parameters to be set at run time (such as which BTL communication device driver
to use, what parameters to pass to that BTL, etc.).
.PP
The \fI-mca\fP switch takes two arguments: \fI<key>\fP and \fI<value>\fP.
The \fI<key>\fP argument generally specifies which MCA module will receive the value.
For example, the \fI<key>\fP "btl" is used to select which BTL is to be used for
transporting MPI messages. The \fI<value>\fP argument is the value that is
passed.
For example:
.
.TP 4
mpirun -mca btl tcp,self -np 1 foo
Tells Open MPI to use the "tcp" and "self" BTLs, and to run a single copy of
"foo" on an allocated node.
.
.TP
mpirun -mca btl self -np 1 foo
Tells Open MPI to use the "self" BTL, and to run a single copy of "foo" on an
allocated node.
.\" And so on. Open MPI's BTL MCA modules are described in ompimca_btl(7).
.PP
The \fI-mca\fP switch can be used multiple times to specify different
\fI<key>\fP and/or \fI<value>\fP arguments. If the same \fI<key>\fP is
specified more than once, the \fI<value>\fPs are concatenated with a comma
(",") separating them.
.PP
Note that the \fI-mca\fP switch is simply a shortcut for setting environment variables.
The same effect may be accomplished by setting corresponding environment
variables before running \fImpirun\fP.
The form of the environment variables that Open MPI sets is:

OMPI_MCA_<key>=<value>
.PP
Thus, the \fI-mca\fP switch overrides any previously set environment
variables. The \fI-mca\fP settings similarly override MCA parameters set
in the
$OPAL_PREFIX/etc/openmpi-mca-params.conf or $HOME/.openmpi/mca-params.conf
file.
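.PP
Under this naming rule, the following two invocations set the "btl" parameter
in equivalent ways ("foo" is a placeholder program name):

\fB%\fP mpirun -mca btl tcp,self -np 1 foo
\fB%\fP OMPI_MCA_btl=tcp,self mpirun -np 1 foo
.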
.
.PP
Unknown \fI<key>\fP arguments are still set as
environment variables -- they are not checked (by \fImpirun\fP) for correctness.
Illegal or incorrect \fI<value>\fP arguments may or may not be reported -- it
depends on the specific MCA module.
.PP
To find the available component types under the MCA architecture, or to find the
available parameters for a specific component, use the \fIompi_info\fP command.
See the \fIompi_info(1)\fP man page for detailed information on the command.
.
.\" **************************
.\" Examples Section
.\" **************************
.SH EXAMPLES
Be sure also to see the examples throughout the sections above.
.
.TP 4
mpirun -np 4 -mca btl ib,tcp,self prog1
Run 4 copies of prog1 using the "ib", "tcp", and "self" BTLs for the transport
of MPI messages.
.
.
.TP 4
mpirun -np 4 -mca btl tcp,sm,self
.br
--mca btl_tcp_if_include ce0 prog1
.br
Run 4 copies of prog1 using the "tcp", "sm", and "self" BTLs for the transport of
MPI messages, with TCP using only the ce0 interface to communicate. Note that
other BTLs have similar if_include MCA parameters.
.
.\" **************************
.\" Diagnostics Section
.\" **************************
.
.\" .SH DIAGNOSTICS
.\".TP 4
.\"Error Msg:
.\"Description
.
.\" **************************
.\" Return Value Section
.\" **************************
.
.SH RETURN VALUE
.
\fImpirun\fP returns 0 if all ranks started by \fImpirun\fP exit after calling
MPI_FINALIZE. A non-zero value is returned if an internal error occurred in
\fImpirun\fP, or if one or more ranks exited before calling MPI_FINALIZE. If an
internal error occurred in \fImpirun\fP, the corresponding error code is returned.
In the event that one or more ranks exit before calling MPI_FINALIZE,
\fImpirun\fP returns the exit status of the first such rank that it notices
died before calling MPI_FINALIZE. Note that, in general, this will
be the first rank that died, but that is not guaranteed.
.
.\" **************************
.\" See Also Section
.\" **************************
.
.\" .SH SEE ALSO
.\" orted(1), ompi-server(1)