.\" Copyright (c) 2009 Cisco Systems, Inc. All rights reserved.
.\" Copyright (c) 2008-2009 Sun Microsystems, Inc. All rights reserved.
.\"
.\" Man page for ORTE's orterun command
.\"
.\" .TH name section center-footer left-footer center-header
.TH MPIRUN 1 "Dec 08, 2009" "1.4" "Open MPI"
.\" **************************
.\"    Name Section
.\" **************************
.SH NAME
.
orterun, mpirun, mpiexec \- Execute serial and parallel jobs in Open MPI.

.B Note:
\fImpirun\fP, \fImpiexec\fP, and \fIorterun\fP are all synonyms for each
other. Using any of the names will produce the same behavior.
.
.\" **************************
.\"    Synopsis Section
.\" **************************
.SH SYNOPSIS
.
.PP
Single Process Multiple Data (SPMD) Model:

.B mpirun
[ options ]
.B <program>
[ <args> ]
.P

Multiple Instruction Multiple Data (MIMD) Model:

.B mpirun
[ global_options ]
[ local_options1 ]
.B <program1>
[ <args1> ] :
[ local_options2 ]
.B <program2>
[ <args2> ] :
\&... :
[ local_optionsN ]
.B <programN>
[ <argsN> ]
.P

Note that in both models, invoking \fImpirun\fP via an absolute path
name is equivalent to specifying the \fI--prefix\fP option with a
\fI<dir>\fR value equivalent to the directory where \fImpirun\fR
resides, minus its last subdirectory. For example:

\fB%\fP /usr/local/bin/mpirun ...

is equivalent to

\fB%\fP mpirun --prefix /usr/local

.
.\" **************************
.\"    Quick Summary Section
.\" **************************
.SH QUICK SUMMARY
.
If you are simply looking for how to run an MPI application, you
probably want to use a command line of the following form:

\fB%\fP mpirun [ -np X ] [ --hostfile <filename> ] <program>

This will run X copies of \fI<program>\fR in your current run-time
environment, scheduling (by default) in a round-robin fashion by CPU
slot. If you are running under a supported resource manager, Open MPI's
\fImpirun\fR will usually automatically use the corresponding resource
manager process starter; otherwise, starters such as \fIrsh\fR or
\fIssh\fR require the use of a hostfile, or \fImpirun\fR will default
to running all X copies on the localhost. See the rest of this page for
more details.
.
.\" **************************
.\"    Options Section
.\" **************************
.SH OPTIONS
.
.I mpirun
will send the name of the directory where it was invoked on the local
node to each of the remote nodes, and attempt to change to that
directory. See the "Current Working Directory" section below for further
details.
.\"
.\" Start options listing
.\" Indent 10 characters from start of first column to start of second column
.TP 10
.B <program>
The program executable. This is identified as the first non-recognized
argument to mpirun.
.
.
.TP
.B <args>
Pass these run-time arguments to every new process. These must always
be the last arguments to \fImpirun\fP. If an app context file is used,
\fI<args>\fP will be ignored.
.
.
.TP
.B -h\fR,\fP --help
Display help for this command.
.
.
.TP
.B -q\fR,\fP --quiet
Suppress informative messages from orterun during application execution.
.
.
.TP
.B -v\fR,\fP --verbose
Be verbose.
.
.
.TP
.B -V\fR,\fP --version
Print version number. If no other arguments are given, this will also
cause orterun to exit.
.
.
.
.
.P
To specify which hosts (nodes) of the cluster to run on:
.
.
.TP
.B -H\fR,\fP -host\fR,\fP --host \fR<host1,host2,...,hostN>\fP
List of hosts on which to invoke processes.
.
.
.TP
.B
-hostfile\fR,\fP --hostfile \fR<hostfile>\fP
Provide a hostfile to use.
.\" JJH - Should have man page for how to format a hostfile properly.
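As a hedged sketch of the hostfile format used in the examples later on
this page (the hostnames node0 through node2 and the file name "myhosts"
are placeholders; see "Specifying Host Nodes" for the slots syntax):

```shell
# Hypothetical hostfile "myhosts": one host per line, with an optional
# "slots" count saying how many processes may run there (default: 1).
cat > myhosts <<'EOF'
node0 slots=2
node1 slots=2
node2
EOF
# Launching with it requires an Open MPI installation, e.g.:
#   mpirun --hostfile myhosts ./a.out
```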
.
.
.TP
.B -machinefile\fR,\fP --machinefile \fR<machinefile>\fP
Synonym for \fI-hostfile\fP.
.
.
.
.
.P
To specify the number of processes to launch:
.
.
.TP
.B -c\fR,\fP -n\fR,\fP --n\fR,\fP -np \fR<#>\fP
Run this many copies of the program on the given nodes. This option
indicates that the specified file is an executable program and not an
application context. If no value is provided for the number of copies
to execute (i.e., neither the "-np" option nor its synonyms are
provided on the command line), Open MPI will automatically execute a
copy of the program on each process slot (see below for description of
a "process slot"). This feature, however, can only be used in the SPMD
model and will return an error (without beginning execution of the
application) otherwise.
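For instance, a minimal invocation might look as follows (a sketch;
\fI./a.out\fP stands in for a compiled MPI executable):

```shell
# Run four copies of the hypothetical program ./a.out; with no hostfile
# or resource manager, all four processes run on the local host.
mpirun -np 4 ./a.out
```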
.
.
.TP
.B -npersocket\fR,\fP --npersocket <#persocket>
On each node, launch this many processes times the number of processor
sockets on the node.
The \fI-npersocket\fP option also turns on the \fI-bind-to-socket\fP option.
.
.
.TP
.B -npernode\fR,\fP --npernode <#pernode>
On each node, launch this many processes.
.
.
.TP
.B -pernode\fR,\fP --pernode
On each node, launch one process -- equivalent to \fI-npernode\fP 1.
.
.
.
.
.P
To map processes to nodes:
.
.
.TP
.B -loadbalance\fR,\fP --loadbalance
Uniform distribution of ranks across all nodes. See more detailed
description below.
.
.TP
.B -nolocal\fR,\fP --nolocal
Do not run any copies of the launched application on the same node as
orterun is running. This option will override listing the localhost
with \fB--host\fR or any other host-specifying mechanism.
.
.TP
.B -nooversubscribe\fR,\fP --nooversubscribe
Do not oversubscribe any nodes; error (without starting any processes)
if the requested number of processes would cause oversubscription.
This option implicitly sets "max_slots" equal to the "slots" value for
each node.
.
.TP
.B -bynode\fR,\fP --bynode
Launch processes one per node, cycling by node in a round-robin
fashion. This spreads processes evenly among nodes and assigns
ranks in a round-robin, "by node" manner.
.
.
.
.
.P
For process binding:
.
.TP
.B -bycore\fR,\fP --bycore
Associate processes with successive cores
if used with one of the \fI-bind-to-*\fP options.
.
.TP
.B -bysocket\fR,\fP --bysocket
Associate processes with successive processor sockets
if used with one of the \fI-bind-to-*\fP options.
.
.TP
.B -cpus-per-proc\fR,\fP --cpus-per-proc <#perproc>
Bind each process to the given number of cores
if used with one of the \fI-bind-to-*\fP options.
.
.TP
.B -cpus-per-rank\fR,\fP --cpus-per-rank <#perrank>
Alias for \fI-cpus-per-proc\fP.
.
.TP
.B -bind-to-core\fR,\fP --bind-to-core
Bind processes to cores.
.
.TP
.B -bind-to-socket\fR,\fP --bind-to-socket
Bind processes to processor sockets.
.
.TP
.B -bind-to-none\fR,\fP --bind-to-none
Do not bind processes. (Default.)
.
.TP
.B -report-bindings\fR,\fP --report-bindings
Report any bindings for launched processes.
.
.TP
.B -slot-list\fR,\fP --slot-list <slots>
List of processor IDs to be used for binding MPI processes. The
specified bindings will be applied to all MPI processes. See
explanation below for syntax.
.
.
.
.
.P
For rankfiles:
.
.
.TP
.B -rf\fR,\fP --rankfile <rankfile>
Provide a rankfile file.
.
.
.
.P
To manage standard I/O:
.
.
.TP
.B -output-filename\fR,\fP --output-filename \fR<filename>\fP
Redirect the stdout, stderr, and stddiag of all ranks to a rank-unique
version of the specified filename. Any directories in the filename will
automatically be created. Each output file will consist of
filename.rank, where the rank will be left-filled with zeros for
correct ordering in listings.
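A hedged example of the naming scheme described above (\fI./a.out\fP
and the \fIout/\fP directory are placeholders):

```shell
# Hypothetical run: each rank's stdout/stderr/stddiag goes to its own
# file; missing directories in the given path are created automatically.
mpirun -np 4 --output-filename out/app ./a.out
# Per the description above, output lands in rank-unique files such
# as out/app.0, out/app.1, ..., with ranks left-filled with zeros.
```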
.
.
.TP
.B -stdin\fR,\fP --stdin <rank>
The MPI rank that is to receive stdin. The default is to forward stdin
to rank 0, but this option can be used to forward stdin to any rank. It
is also acceptable to specify \fInone\fP, indicating that no ranks are
to receive stdin.
.
.
.TP
.B -tag-output\fR,\fP --tag-output
Tag each line of output to stdout, stderr, and stddiag with
\fB[jobid, rank]<stdxxx>\fP, indicating the process jobid and rank that
generated the output, and the channel which generated it.
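As a sketch (illustrative only; the tag text follows the
\fB[jobid, rank]<stdxxx>\fP pattern described above, and \fI./a.out\fP
is a placeholder application):

```shell
# Hypothetical run: every output line is prefixed with the emitting
# process's jobid, rank, and channel (stdout, stderr, or stddiag).
mpirun -np 2 -tag-output ./a.out
```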
.
.
.TP
.B -timestamp-output\fR,\fP --timestamp-output
Timestamp each line of output to stdout, stderr, and stddiag.
.
.
.TP
.B -xml\fR,\fP --xml
Provide all output to stdout, stderr, and stddiag in an XML format.
.
.
.TP
.B -xterm\fR,\fP --xterm \fR<ranks>\fP
Display the output of the specified ranks in separate xterm windows.
The ranks are specified as a comma-separated list of ranges, with a -1
indicating all. A separate window will be created for each specified
rank.
.B Note:
In some environments, xterm may require that the executable be in the
user's path, or be specified in absolute or relative terms. Thus, it
may be necessary to specify a local executable as "./foo" instead of
just "foo". If xterm fails to find the executable, mpirun will hang,
but still respond correctly to a ctrl-c. If this happens, please check
that the executable is being specified correctly and try again.
.
.
.
.
.P
To manage files and runtime environment:
.
.
.TP
.B -path\fR,\fP --path \fR<path>\fP
<path> that will be used when attempting to locate the requested
executables. This is used prior to using the local PATH setting.
.
.
.TP
.B --prefix \fR<dir>\fP
Prefix directory that will be used to set the \fIPATH\fR and
\fILD_LIBRARY_PATH\fR on the remote node before invoking Open MPI or
the target process. See the "Remote Execution" section, below.
.
.
.TP
.B --preload-binary
Copy the specified executable(s) to remote machines prior to starting
remote processes. The executables will be copied to the Open MPI
session directory and will be deleted upon completion of the job.
.
.
.TP
.B --preload-files <files>
Preload the comma-separated list of files to the current working
directory of the remote machines where processes will be launched prior
to starting those processes.
.
.
.TP
.B --preload-files-dest-dir <path>
The destination directory to be used for preload-files, if other than
the current working directory. By default, the absolute and relative
paths provided by --preload-files are used.
.
.
.TP
.B --tmpdir \fR<dir>\fP
Set the root for the session directory tree for mpirun only.
.
.
.TP
.B -wd \fR<dir>\fP
Synonym for \fI-wdir\fP.
.
.
.TP
.B -wdir \fR<dir>\fP
Change to the directory <dir> before the user's program executes.
See the "Current Working Directory" section for notes on relative paths.
.B Note:
If the \fI-wdir\fP option appears both on the command line and in an
application context, the context will take precedence over the command
line.
.
.
.TP
.B -x \fR<env>\fP
Export the specified environment variables to the remote nodes before
executing the program. Only one environment variable can be specified
per \fI-x\fP option. Existing environment variables can be specified,
or new variable names can be specified with corresponding values. For
example:

\fB%\fP mpirun -x DISPLAY -x OFILE=/tmp/out ...

The parser for the \fI-x\fP option is not very sophisticated; it does
not even understand quoted values. Users are advised to set variables
in the environment and then use \fI-x\fP to export (not define) them.
.
.
.
.
.P
Setting MCA parameters:
.
.
.TP
.B -gmca\fR,\fP --gmca \fR<key> <value>\fP
Pass global MCA parameters that are applicable to all contexts.
\fI<key>\fP is the parameter name; \fI<value>\fP is the parameter value.
.
.
.TP
.B -mca\fR,\fP --mca <key> <value>
Send arguments to various MCA modules. See the "MCA" section, below.
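For example, a common pattern is selecting point-to-point transports
via the "btl" framework (a sketch; \fI./a.out\fP is a placeholder):

```shell
# Hypothetical run: tell the byte transfer layer (btl) framework to use
# only the TCP and self (loopback) components for MPI communication.
mpirun -mca btl tcp,self -np 2 ./a.out
```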
.
.
.
.
.P
For debugging:
.
.
.TP
.B -debug\fR,\fP --debug
Invoke the user-level debugger indicated by the
\fIorte_base_user_debugger\fP MCA parameter.
.
.
.TP
.B -debugger\fR,\fP --debugger
Sequence of debuggers to search for when \fI--debug\fP is used (i.e.,
a synonym for the \fIorte_base_user_debugger\fP MCA parameter).
.
.
.TP
.B -tv\fR,\fP --tv
Launch processes under the TotalView debugger.
Deprecated backwards-compatibility flag. Synonym for \fI--debug\fP.
.
.
.
.
.P
There are also other options:
.
.
.TP
.B -aborted\fR,\fP --aborted \fR<#>\fP
Set the maximum number of aborted processes to display.
.
.
.TP
.B --app \fR<appfile>\fP
Provide an appfile, ignoring all other command line options.
.
.
.TP
.B -cf\fR,\fP --cartofile \fR<cartofile>\fP
Provide a cartography file.
.
.
.TP
.B --hetero
Indicates that multiple app_contexts are being provided that are a mix
of 32/64-bit binaries.
.
.
.TP
.B -leave-session-attached\fR,\fP --leave-session-attached
Do not detach OmpiRTE daemons used by this application. This allows
error messages from the daemons as well as the underlying environment
(e.g., when failing to launch a daemon) to be output.
.
.
.TP
.B -ompi-server\fR,\fP --ompi-server <uri or file>
Specify the URI of the Open MPI server, or the name of the file
(specified as file:filename) that contains that info. The Open MPI
server is used to support multi-application data exchange via the
MPI-2 MPI_Publish_name and MPI_Lookup_name functions.
.
.
.TP
.B -wait-for-server\fR,\fP --wait-for-server
Pause mpirun before launching the job until ompi-server is detected.
This is useful in scripts where ompi-server may be started in the
background, followed immediately by an \fImpirun\fP command that wishes
to connect to it. Mpirun will pause until either the specified
ompi-server is contacted or the server-wait-time is exceeded.
.
.
.TP
.B -server-wait-time\fR,\fP --server-wait-time <secs>
The maximum amount of time (in seconds) mpirun should wait for
ompi-server to start. The default is 10 seconds.
.
.
.
.
.P
The following options are useful for developers; they are not generally
useful to most ORTE and/or MPI users:
.
.TP
.B -d\fR,\fP --debug-devel
Enable debugging of the OmpiRTE (the run-time layer in Open MPI).
This is not generally useful for most users.
.
.
.TP
.B --debug-daemons
Enable debugging of any OmpiRTE daemons used by this application.
.
.
.TP
.B --debug-daemons-file
Enable debugging of any OmpiRTE daemons used by this application,
storing output in files.
.
.
.TP
.B -launch-agent\fR,\fP --launch-agent
Name of the executable that is to be used to start processes on the
remote nodes. The default is "orted". This option can be used to test
new daemon concepts, or to pass options back to the daemons without
having mpirun itself see them. For example, specifying a launch agent
of \fIorted -mca odls_base_verbose 5\fR allows the developer to ask the
orted for debugging output without clutter from mpirun itself.
.
.
.TP
.B --noprefix
Disable the automatic --prefix behavior.
.
.
.P
There may be other options listed with \fImpirun --help\fP.
.
.
.\" **************************
.\"    Description Section
.\" **************************
.SH DESCRIPTION
.
One invocation of \fImpirun\fP starts an MPI application running under
Open MPI. If the application is single process multiple data (SPMD),
the application can be specified on the \fImpirun\fP command line.

If the application is multiple instruction multiple data (MIMD),
comprising multiple programs, the set of programs and arguments can be
specified in one of two ways: Extended Command Line Arguments, and
Application Context.
.PP
An application context describes the MIMD program set including all
arguments in a separate file.
.\" See appcontext(5) for a description of the application context syntax.
This file essentially contains multiple \fImpirun\fP command lines,
less the command name itself. The ability to specify different options
for different instantiations of a program is another reason to use an
application context.
.PP
Extended command line arguments allow for the description of the
application layout on the command line using colons (\fI:\fP) to
separate the specification of programs and arguments. Some options are
globally set across all specified programs (e.g., --hostfile), while
others are specific to a single program (e.g., -np).
.
.
.
.SS Specifying Host Nodes
.
Host nodes can be identified on the \fImpirun\fP command line with the
\fI-host\fP option or in a hostfile.
.
.PP
For example,
.
.TP 4
mpirun -H aa,aa,bb ./a.out
launches two processes on node aa and one on bb.
.
.PP
Or, consider the hostfile
.

\fB%\fP cat myhostfile
aa slots=2
bb slots=2
cc slots=2

.
.PP
Here, we list both the host names (aa, bb, and cc) and how many "slots"
there are for each. Slots indicate how many processes can potentially
execute on a node. For best performance, the number of slots may be
chosen to be the number of cores on the node or the number of processor
sockets. If the hostfile does not provide slots information, a default
of 1 is assumed. When running under resource managers (e.g., SLURM,
Torque, etc.), Open MPI will obtain both the hostnames and the number
of slots directly from the resource manager.
.
.PP
.
.TP 4
mpirun -hostfile myhostfile ./a.out
will launch two processes on each of the three nodes.
.
.TP 4
mpirun -hostfile myhostfile -host aa ./a.out
will launch two processes, both on node aa.
.
.TP 4
mpirun -hostfile myhostfile -host dd ./a.out
will find no hosts to run on and abort with an error.
That is, the specified host dd is not in the specified hostfile.
.
.SS Specifying Number of Processes
.
As we have just seen, the number of processes to run can be set using
the hostfile. Other mechanisms exist.
.
.PP
The number of processes launched can be specified as a multiple of the
number of nodes or processor sockets available. For example,
.
.TP 4
mpirun -H aa,bb -npersocket 2 ./a.out
launches processes 0-3 on node aa and processes 4-7 on node bb,
where aa and bb are both dual-socket nodes.
The \fI-npersocket\fP option also turns on the \fI-bind-to-socket\fP
option, which is discussed in a later section.
.
.TP 4
mpirun -H aa,bb -npernode 2 ./a.out
launches processes 0-1 on node aa and processes 2-3 on node bb.
.
.TP 4
mpirun -H aa,bb -npernode 1 ./a.out
launches one process per host node.
.
.TP 4
mpirun -H aa,bb -pernode ./a.out
is the same as \fI-npernode\fP 1.
.
.
.PP
Another alternative is to specify the number of processes with the
\fI-np\fP option. Consider now the hostfile
.

\fB%\fP cat myhostfile
aa slots=4
bb slots=4
cc slots=4

.
.PP
Now,
.
.TP 4
mpirun -hostfile myhostfile -np 6 ./a.out
will launch ranks 0-3 on node aa and ranks 4-5 on node bb. The
remaining slots in the hostfile will not be used since the \fI-np\fP
option indicated that only 6 processes should be launched.
.
.SS Mapping Processes to Nodes
.
The examples above illustrate the default mapping of process ranks
to nodes. This mapping can also be controlled with various
\fImpirun\fP options. Here, we consider the same hostfile as
above with \fI-np\fP 6 again:
.

                       node aa    node bb    node cc

 mpirun                0 1 2 3    4 5

 mpirun -loadbalance   0 1        2 3        4 5

 mpirun -bynode        0 3        1 4        2 5

 mpirun -nolocal                  0 1 2 3    4 5
.
.PP
The \fI-loadbalance\fP option tries to spread processes out fairly
among the nodes.
.
.PP
The \fI-bynode\fP option does likewise but numbers the processes "by
node" in a round-robin fashion.
.
.PP
The \fI-nolocal\fP option prevents any processes from being mapped onto
the local host (in this case node aa). While \fImpirun\fP typically
consumes few system resources, \fI-nolocal\fP can be helpful for
launching very large jobs where \fImpirun\fP may actually need to use
noticeable amounts of memory and/or processing time.
.
.PP
Just as \fI-np\fP can specify fewer processes than there are slots, it
can also oversubscribe the slots. For example, with the same hostfile:
.
.TP 4
mpirun -hostfile myhostfile -np 14 ./a.out
will launch processes 0-3 on node aa, 4-7 on bb, and 8-11 on cc. It
will then add the remaining two processes to whichever nodes it
chooses.
.
.PP
One can also specify limits to oversubscription. For example, with the
same hostfile:
.
.TP 4
mpirun -hostfile myhostfile -np 14 -nooversubscribe ./a.out
will produce an error since \fI-nooversubscribe\fP prevents
oversubscription.
.
.PP
Limits to oversubscription can also be specified in the hostfile
itself:
.
\fB%\fP cat myhostfile
aa slots=4 max_slots=4
bb max_slots=4
cc slots=4
.
.PP
The \fImax_slots\fP field specifies such a limit. When it does, the
\fIslots\fP value defaults to the limit. Now:
.
.TP 4
mpirun -hostfile myhostfile -np 14 ./a.out
causes the first 12 processes to be launched as before, but the
remaining two processes will be forced onto node cc. The other two
nodes are protected by the hostfile against oversubscription by this
job.
.
.PP
Using the \fI--nooversubscribe\fR option can be helpful since Open MPI
currently does not get "max_slots" values from the resource manager.
.
.PP
Of course, \fI-np\fP can also be used with the \fI-H\fP or \fI-host\fP
option. For example,
.
.TP 4
mpirun -H aa,bb -np 8 ./a.out
launches 8 processes. Since only two hosts are specified, after the
first two processes are mapped, one to aa and one to bb, the remaining
processes oversubscribe the specified hosts.
.
.PP
And here is a MIMD example:
.
.TP 4
mpirun -H aa -np 1 hostname : -H bb,cc -np 2 uptime
will launch process 0 running \fIhostname\fP on node aa and processes 1
and 2 each running \fIuptime\fP on nodes bb and cc, respectively.
.
743 | .SS Process Binding |
---|
744 | . |
---|
745 | Processes may be bound to specific resources on a node. This can |
---|
746 | improve performance if the operating system is placing processes |
---|
747 | suboptimally. For example, it might oversubscribe some multi-core |
---|
748 | processor sockets, leaving other sockets idle; this can lead |
---|
749 | processes to contend unnecessarily for common resources. Or, it |
---|
750 | might spread processes out too widely; this can be suboptimal if |
---|
751 | application performance is sensitive to interprocess communication |
---|
752 | costs. Binding can also keep the operating system from migrating |
---|
753 | processes excessively, regardless of how optimally those processes |
---|
754 | were placed to begin with. |
---|
755 | . |
---|
756 | .PP |
---|
757 | To bind processes, one must first associate them with the resources |
---|
758 | on which they should run. For example, the \fI-bycore\fP option |
---|
759 | associates the processes on a node with successive cores. Or, |
---|
760 | \fI-bysocket\fP associates the processes with successive processor sockets, |
---|
761 | cycling through the sockets in a round-robin fashion if necessary. |
---|
762 | And \fI-cpus-per-proc\fP indicates how many cores to bind per process. |
---|
763 | . |
---|
764 | .PP |
---|
765 | But, such association is meaningless unless the processes are actually |
---|
766 | bound to those resources. The binding option specifies the granularity |
---|
767 | of binding -- say, with \fI-bind-to-core\fP or \fI-bind-to-socket\fP. |
---|
768 | One can also turn binding off with \fI-bind-to-none\fP, which is |
---|
769 | typically the default. |
---|
770 | . |
---|
771 | .PP |
---|
772 | Finally, \fI-report-bindings\fP can be used to report bindings. |
---|
773 | . |
---|
774 | .PP |
---|
775 | As an example, consider a node with two processor sockets, each comprising |
---|
776 | four cores. We run \fImpirun\fP with \fI-np 4 -report-bindings\fP and |
---|
777 | the following additional options: |
---|
778 | . |
---|
779 | |
---|
780 | % mpirun ... -bycore -bind-to-core |
---|
781 | [...] ... binding child [...,0] to cpus 0001 |
---|
782 | [...] ... binding child [...,1] to cpus 0002 |
---|
783 | [...] ... binding child [...,2] to cpus 0004 |
---|
784 | [...] ... binding child [...,3] to cpus 0008 |
---|
785 | |
---|
786 | % mpirun ... -bysocket -bind-to-socket |
---|
787 | [...] ... binding child [...,0] to socket 0 cpus 000f |
---|
788 | [...] ... binding child [...,1] to socket 1 cpus 00f0 |
---|
789 | [...] ... binding child [...,2] to socket 0 cpus 000f |
---|
790 | [...] ... binding child [...,3] to socket 1 cpus 00f0 |
---|
791 | |
---|
792 | % mpirun ... -cpus-per-proc 2 -bind-to-core |
---|
793 | [...] ... binding child [...,0] to cpus 0003 |
---|
794 | [...] ... binding child [...,1] to cpus 000c |
---|
795 | [...] ... binding child [...,2] to cpus 0030 |
---|
796 | [...] ... binding child [...,3] to cpus 00c0 |
---|
797 | |
---|
798 | % mpirun ... -bind-to-none |
---|
799 | . |
---|
800 | .PP |
---|
801 | Here, \fI-report-bindings\fP shows the binding of each process as a mask. |
---|
802 | In the first case, the processes bind to successive cores as indicated by |
---|
803 | the masks 0001, 0002, 0004, and 0008. In the second case, processes bind |
---|
804 | to all cores on successive sockets as indicated by the masks 000f and 00f0. |
---|
805 | The processes cycle through the processor sockets in a round-robin fashion |
---|
806 | as many times as are needed. In the third case, the masks show us that |
---|
807 | two cores have been bound per process. In the fourth case, binding is |
---|
808 | turned off and no bindings are reported. |
---|
809 | . |
---|
810 | .PP |
---|
811 | Open MPI's support for process binding depends on the underlying |
---|
812 | operating system. Therefore, process binding may not be available |
---|
813 | on every system. |
---|
814 | . |
---|
815 | .PP |
---|
816 | Process binding can also be set with MCA parameters. |
---|
817 | Their usage is less convenient than that of \fImpirun\fP options. |
---|
818 | On the other hand, MCA parameters can be set not only on the \fImpirun\fP |
---|
819 | command line, but alternatively in a system or user mca-params.conf file |
---|
820 | or as environment variables, as described in the MCA section below. |
---|
821 | The correspondences are: |
---|
822 | . |
---|
823 | |
---|
824 | mpirun option MCA parameter key value |
---|
825 | |
---|
826 | -bycore rmaps_base_schedule_policy core |
---|
827 | -bysocket rmaps_base_schedule_policy socket |
---|
828 | -bind-to-core orte_process_binding core |
---|
829 | -bind-to-socket orte_process_binding socket |
---|
830 | -bind-to-none orte_process_binding none |
---|
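For example, based on the table above, a \fI-bysocket -bind-to-socket\fP run can equivalently be expressed with MCA parameters on the command line (a sketch; \fI./a.out\fP is a placeholder executable):

```shell
# The mpirun-option form:
mpirun -np 4 -bysocket -bind-to-socket ./a.out

# The equivalent MCA-parameter form, per the table above:
mpirun -np 4 -mca rmaps_base_schedule_policy socket \
       -mca orte_process_binding socket ./a.out
```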
831 | . |
---|
832 | .PP |
---|
833 | The \fIorte_process_binding\fP value can also take on the |
---|
834 | \fI:if-avail\fP attribute. This attribute means that processes |
---|
835 | will be bound only if this is supported on the underlying |
---|
836 | operating system. Without the attribute, if there is no |
---|
837 | such support, the binding request results in an error. |
---|
838 | For example, you could have |
---|
839 | . |
---|
840 | |
---|
841 | % cat $HOME/.openmpi/mca-params.conf |
---|
842 | rmaps_base_schedule_policy = socket |
---|
843 | orte_process_binding = socket:if-avail |
---|
844 | . |
---|
845 | . |
---|
846 | .SS Rankfiles |
---|
847 | . |
---|
848 | Rankfiles provide a means for specifying detailed information about |
---|
849 | how process ranks should be mapped to nodes and how they should be bound. |
---|
850 | Consider the following: |
---|
851 | . |
---|
852 | |
---|
853 | cat myrankfile |
---|
854 | rank 0=aa slot=1:0-2 |
---|
855 | rank 1=bb slot=0:0,1 |
---|
856 | rank 2=cc slot=1-2 |
---|
857 | mpirun -H aa,bb,cc,dd -rf myrankfile ./a.out |
---|
858 | . |
---|
859 | So that |
---|
860 | |
---|
861 | Rank 0 runs on node aa, bound to socket 1, cores 0-2. |
---|
862 | Rank 1 runs on node bb, bound to socket 0, cores 0 and 1. |
---|
863 | Rank 2 runs on node cc, bound to cores 1 and 2. |
---|
864 | . |
---|
865 | . |
---|
866 | .SS Application Context or Executable Program? |
---|
867 | . |
---|
868 | To distinguish the two different forms, \fImpirun\fP |
---|
869 | looks on the command line for the \fI--app\fP option. If |
---|
870 | it is specified, then the file named on the command line is |
---|
871 | assumed to be an application context. If it is not |
---|
872 | specified, then the file is assumed to be an executable program. |
---|
873 | . |
---|
874 | . |
---|
875 | . |
---|
876 | .SS Locating Files |
---|
877 | . |
---|
878 | If no relative or absolute path is specified for a file, Open |
---|
879 | MPI will first look for files by searching the directories specified |
---|
880 | by the \fI--path\fP option. If there is no \fI--path\fP option set or |
---|
881 | if the file is not found at the \fI--path\fP location, then Open MPI |
---|
882 | will search the user's PATH environment variable as defined on the |
---|
883 | source node(s). |
---|
884 | .PP |
---|
885 | If a relative directory is specified, it must be relative to the initial |
---|
886 | working directory determined by the specific starter used. For example, when |
---|
887 | using the rsh or ssh starters, the initial directory is $HOME by default. Other |
---|
888 | starters may set the initial directory to the current working directory from |
---|
889 | the invocation of \fImpirun\fP. |
---|
890 | . |
---|
891 | . |
---|
892 | . |
---|
893 | .SS Current Working Directory |
---|
894 | . |
---|
895 | The \fI\-wdir\fP mpirun option (and its synonym, \fI\-wd\fP) allows |
---|
896 | the user to change to an arbitrary directory before the program is |
---|
897 | invoked. It can also be used in application context files to specify |
---|
898 | working directories on specific nodes and/or for specific |
---|
899 | applications. |
---|
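As an illustrative sketch (\fI/tmp/run1\fP and \fI./a.out\fP are placeholder names; the directory must exist on all nodes):

```shell
# Run two copies of a.out, each starting in /tmp/run1 on its node.
mpirun -wdir /tmp/run1 -np 2 ./a.out
```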
900 | .PP |
---|
901 | If the \fI\-wdir\fP option appears both in a context file and on the |
---|
902 | command line, the context file directory will override the command |
---|
903 | line value. |
---|
904 | .PP |
---|
905 | If the \fI-wdir\fP option is specified, Open MPI will attempt to |
---|
906 | change to the specified directory on all of the remote nodes. If this |
---|
907 | fails, \fImpirun\fP will abort. |
---|
908 | .PP |
---|
909 | If the \fI-wdir\fP option is \fBnot\fP specified, Open MPI will send |
---|
910 | the directory name where \fImpirun\fP was invoked to each of the |
---|
911 | remote nodes. The remote nodes will try to change to that |
---|
912 | directory. If they are unable to (e.g., if the directory does not exist on |
---|
913 | that node), then Open MPI will use the default directory determined by |
---|
914 | the starter. |
---|
915 | .PP |
---|
916 | All directory changing occurs before the user's program is invoked; it |
---|
917 | does not wait until \fIMPI_INIT\fP is called. |
---|
918 | . |
---|
919 | . |
---|
920 | . |
---|
921 | .SS Standard I/O |
---|
922 | . |
---|
923 | Open MPI directs UNIX standard input to /dev/null on all processes |
---|
924 | except the MPI_COMM_WORLD rank 0 process. The MPI_COMM_WORLD rank 0 process |
---|
925 | inherits standard input from \fImpirun\fP. |
---|
926 | .B Note: |
---|
927 | The node that invoked \fImpirun\fP need not be the same as the node where the |
---|
928 | MPI_COMM_WORLD rank 0 process resides. Open MPI handles the redirection of |
---|
929 | \fImpirun\fP's standard input to the rank 0 process. |
---|
930 | .PP |
---|
931 | Open MPI directs UNIX standard output and error from remote nodes to the node |
---|
932 | that invoked \fImpirun\fP and prints it on the standard output/error of |
---|
933 | \fImpirun\fP. |
---|
934 | Local processes inherit the standard output/error of \fImpirun\fP and write |
---|
935 | to it directly. |
---|
936 | .PP |
---|
937 | Thus it is possible to redirect standard I/O for Open MPI applications by |
---|
938 | using the typical shell redirection procedure on \fImpirun\fP. |
---|
939 | |
---|
940 | \fB%\fP mpirun -np 2 my_app < my_input > my_output |
---|
941 | |
---|
942 | Note that in this example \fIonly\fP the MPI_COMM_WORLD rank 0 process will |
---|
943 | receive the stream from \fImy_input\fP on stdin. The stdin on all the other |
---|
944 | nodes will be tied to /dev/null. However, the stdout from all nodes will |
---|
945 | be collected into the \fImy_output\fP file. |
---|
946 | . |
---|
947 | . |
---|
948 | . |
---|
949 | .SS Signal Propagation |
---|
950 | . |
---|
951 | When orterun receives a SIGTERM or SIGINT, it will attempt to kill |
---|
952 | the entire job by sending all processes in the job a SIGTERM, waiting |
---|
953 | a small number of seconds, then sending all processes in the job a |
---|
954 | SIGKILL. |
---|
955 | . |
---|
956 | .PP |
---|
957 | SIGUSR1 and SIGUSR2 signals received by orterun are propagated to |
---|
958 | all processes in the job. |
---|
959 | . |
---|
960 | .PP |
---|
961 | One can turn on forwarding of SIGSTOP and SIGCONT to the program executed |
---|
962 | by mpirun by setting the MCA parameter orte_forward_job_control to 1. |
---|
963 | A SIGTSTP signal to mpirun will then cause a SIGSTOP signal to be sent |
---|
964 | to all of the programs started by mpirun, and likewise a SIGCONT signal |
---|
965 | to mpirun will cause a SIGCONT signal to be sent to them. |
---|
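For example, job-control forwarding might be enabled like this (a sketch; \fI./a.out\fP is a placeholder executable):

```shell
# Enable forwarding of SIGSTOP/SIGCONT to the launched programs.
mpirun -mca orte_forward_job_control 1 -np 2 ./a.out
# Suspending mpirun (e.g., Ctrl-Z, which delivers SIGTSTP) now sends
# SIGSTOP to all ranks; resuming it (SIGCONT) resumes them.
```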
966 | . |
---|
967 | .PP |
---|
968 | Other signals are not currently propagated |
---|
969 | by orterun. |
---|
970 | . |
---|
971 | . |
---|
972 | .SS Process Termination / Signal Handling |
---|
973 | . |
---|
974 | During the run of an MPI application, if any rank dies abnormally |
---|
975 | (either exiting before invoking \fIMPI_FINALIZE\fP, or dying as the result of a |
---|
976 | signal), \fImpirun\fP will print out an error message and kill the rest of the |
---|
977 | MPI application. |
---|
978 | .PP |
---|
979 | User signal handlers should probably avoid trying to cleanup MPI state |
---|
980 | (Open MPI is, currently, neither thread-safe nor async-signal-safe). |
---|
981 | For example, if a segmentation fault occurs in \fIMPI_SEND\fP (perhaps because |
---|
982 | a bad buffer was passed in) and a user signal handler is invoked, if this user |
---|
983 | handler attempts to invoke \fIMPI_FINALIZE\fP, Bad Things could happen since |
---|
984 | Open MPI was already "in" MPI when the error occurred. Since \fImpirun\fP |
---|
985 | will notice that the process died due to a signal, it is both unnecessary |
---|
986 | and safest for the user to clean up only non-MPI state. |
---|
987 | . |
---|
988 | . |
---|
989 | . |
---|
990 | .SS Process Environment |
---|
991 | . |
---|
992 | Processes in the MPI application inherit their environment from the |
---|
993 | Open RTE daemon upon the node on which they are running. The |
---|
994 | environment is typically inherited from the user's shell. On remote |
---|
995 | nodes, the exact environment is determined by the boot MCA module |
---|
996 | used. The \fIrsh\fR launch module, for example, uses |
---|
997 | \fIrsh\fR or \fIssh\fR to launch the Open RTE daemon on remote nodes, and |
---|
998 | typically executes one or more of the user's shell-setup files before |
---|
999 | launching the Open RTE daemon. When running dynamically linked |
---|
1000 | applications which require the \fILD_LIBRARY_PATH\fR environment |
---|
1001 | variable to be set, care must be taken to ensure that it is correctly |
---|
1002 | set when booting Open MPI. |
---|
1003 | .PP |
---|
1004 | See the "Remote Execution" section for more details. |
---|
1005 | . |
---|
1006 | . |
---|
1007 | .SS Remote Execution |
---|
1008 | . |
---|
1009 | Open MPI requires that the \fIPATH\fR environment variable be set to |
---|
1010 | find executables on remote nodes (this is typically only necessary in |
---|
1011 | \fIrsh\fR- or \fIssh\fR-based environments -- batch/scheduled |
---|
1012 | environments typically copy the current environment to the execution |
---|
1013 | of remote jobs, so if the current environment has \fIPATH\fR and/or |
---|
1014 | \fILD_LIBRARY_PATH\fR set properly, the remote nodes will also have it |
---|
1015 | set properly). If Open MPI was compiled with shared library support, |
---|
1016 | it may also be necessary to have the \fILD_LIBRARY_PATH\fR environment |
---|
1017 | variable set on remote nodes as well (especially to find the shared |
---|
1018 | libraries required to run user MPI applications). |
---|
1019 | .PP |
---|
1020 | However, it is not always desirable or possible to edit shell |
---|
1021 | startup files to set \fIPATH\fR and/or \fILD_LIBRARY_PATH\fR. The |
---|
1022 | \fI--prefix\fR option is provided for some simple configurations where |
---|
1023 | this is not possible. |
---|
1024 | .PP |
---|
1025 | The \fI--prefix\fR option takes a single argument: the base directory |
---|
1026 | on the remote node where Open MPI is installed. Open MPI will use |
---|
1027 | this directory to set the remote \fIPATH\fR and \fILD_LIBRARY_PATH\fR |
---|
1028 | before executing any Open MPI or user applications. This allows |
---|
1029 | running Open MPI jobs without having pre-configured the \fIPATH\fR and |
---|
1030 | \fILD_LIBRARY_PATH\fR on the remote nodes. |
---|
1031 | .PP |
---|
1032 | Open MPI adds the basename of the current |
---|
1033 | node's "bindir" (the directory where Open MPI's executables are |
---|
1034 | installed) to the prefix and uses that to set the \fIPATH\fR on the |
---|
1035 | remote node. Similarly, Open MPI adds the basename of the current |
---|
1036 | node's "libdir" (the directory where Open MPI's libraries are |
---|
1037 | installed) to the prefix and uses that to set the |
---|
1038 | \fILD_LIBRARY_PATH\fR on the remote node. For example: |
---|
1039 | .TP 15 |
---|
1040 | Local bindir: |
---|
1041 | /local/node/directory/bin |
---|
1042 | .TP |
---|
1043 | Local libdir: |
---|
1044 | /local/node/directory/lib64 |
---|
1045 | .PP |
---|
1046 | If the following command line is used: |
---|
1047 | |
---|
1048 | \fB%\fP mpirun --prefix /remote/node/directory |
---|
1049 | |
---|
1050 | Open MPI will add "/remote/node/directory/bin" to the \fIPATH\fR |
---|
1051 | and "/remote/node/directory/lib64" to the \fILD_LIBRARY_PATH\fR on the |
---|
1052 | remote node before attempting to execute anything. |
---|
1053 | .PP |
---|
1054 | Note that \fI--prefix\fR can be set on a per-context basis, allowing |
---|
1055 | for different values for different nodes. |
---|
1056 | .PP |
---|
1057 | The \fI--prefix\fR option is not sufficient if the installation paths |
---|
1058 | on the remote node are different than the local node (e.g., if "/lib" |
---|
1059 | is used on the local node, but "/lib64" is used on the remote node), |
---|
1060 | or if the installation paths are something other than a subdirectory |
---|
1061 | under a common prefix. |
---|
1062 | .PP |
---|
1063 | Note that executing \fImpirun\fR via an absolute pathname is |
---|
1064 | equivalent to specifying \fI--prefix\fR without the last subdirectory |
---|
1065 | in the absolute pathname to \fImpirun\fR. For example: |
---|
1066 | |
---|
1067 | \fB%\fP /usr/local/bin/mpirun ... |
---|
1068 | |
---|
1069 | is equivalent to |
---|
1070 | |
---|
1071 | \fB%\fP mpirun --prefix /usr/local |
---|
1072 | . |
---|
1073 | . |
---|
1074 | . |
---|
1075 | .SS Exported Environment Variables |
---|
1076 | . |
---|
1077 | All environment variables that are named in the form OMPI_* will automatically |
---|
1078 | be exported to new processes on the local and remote nodes. |
---|
1079 | The \fI\-x\fP option to \fImpirun\fP can be used to export specific environment |
---|
1080 | variables to the new processes. While the syntax of the \fI\-x\fP |
---|
1081 | option allows the definition of new variables, note that the parser |
---|
1082 | for this option is currently not very sophisticated - it does not even |
---|
1083 | understand quoted values. Users are advised to set variables in the |
---|
1084 | environment and use \fI\-x\fP to export them, not to define them. |
---|
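As an illustrative sketch (\fIMY_VAR\fP and \fI./a.out\fP are placeholder names):

```shell
# Set the variable in the local environment, then export it
# to all launched processes with -x.
export MY_VAR=some_value
mpirun -x MY_VAR -np 2 ./a.out
```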
1085 | . |
---|
1086 | . |
---|
1087 | . |
---|
1088 | .SS Setting MCA Parameters |
---|
1089 | . |
---|
1090 | The \fI-mca\fP switch allows the passing of parameters to various MCA |
---|
1091 | (Modular Component Architecture) modules. |
---|
1092 | .\" Open MPI's MCA modules are described in detail in ompimca(7). |
---|
1093 | MCA modules have direct impact on MPI programs because they allow tunable |
---|
1094 | parameters to be set at run time (such as which BTL communication device driver |
---|
1095 | to use, what parameters to pass to that BTL, etc.). |
---|
1096 | .PP |
---|
1097 | The \fI-mca\fP switch takes two arguments: \fI<key>\fP and \fI<value>\fP. |
---|
1098 | The \fI<key>\fP argument generally specifies which MCA module will receive the value. |
---|
1099 | For example, the \fI<key>\fP "btl" is used to select which BTL to be used for |
---|
1100 | transporting MPI messages. The \fI<value>\fP argument is the value that is |
---|
1101 | passed. |
---|
1102 | For example: |
---|
1103 | . |
---|
1104 | .TP 4 |
---|
1105 | mpirun -mca btl tcp,self -np 1 foo |
---|
1106 | Tells Open MPI to use the "tcp" and "self" BTLs, and to run a single copy of |
---|
1107 | "foo" on an allocated node. |
---|
1108 | . |
---|
1109 | .TP |
---|
1110 | mpirun -mca btl self -np 1 foo |
---|
1111 | Tells Open MPI to use the "self" BTL, and to run a single copy of "foo" on an |
---|
1112 | allocated node. |
---|
1113 | .\" And so on. Open MPI's BTL MCA modules are described in ompimca_btl(7). |
---|
1114 | .PP |
---|
1115 | The \fI-mca\fP switch can be used multiple times to specify different |
---|
1116 | \fI<key>\fP and/or \fI<value>\fP arguments. If the same \fI<key>\fP is |
---|
1117 | specified more than once, the \fI<value>\fPs are concatenated with a comma |
---|
1118 | (",") separating them. |
---|
1119 | .PP |
---|
1120 | Note that the \fI-mca\fP switch is simply a shortcut for setting environment variables. |
---|
1121 | The same effect may be accomplished by setting corresponding environment |
---|
1122 | variables before running \fImpirun\fP. |
---|
1123 | The form of the environment variables that Open MPI sets is: |
---|
1124 | |
---|
1125 | OMPI_MCA_<key>=<value> |
---|
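For example, the following two invocations should be equivalent (a sketch; "foo" is a placeholder program, reusing the \fIbtl\fP example above):

```shell
# Pass the parameter on the command line:
mpirun -mca btl tcp,self -np 1 foo

# ...or set the corresponding environment variable first:
export OMPI_MCA_btl=tcp,self
mpirun -np 1 foo
```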
1126 | .PP |
---|
1127 | Thus, the \fI-mca\fP switch overrides any previously set environment |
---|
1128 | variables. The \fI-mca\fP settings similarly override MCA parameters set |
---|
1129 | in the |
---|
1130 | $OPAL_PREFIX/etc/openmpi-mca-params.conf or $HOME/.openmpi/mca-params.conf |
---|
1131 | file. |
---|
1132 | . |
---|
1133 | .PP |
---|
1134 | Unknown \fI<key>\fP arguments are still set as |
---|
1135 | environment variables -- they are not checked (by \fImpirun\fP) for correctness. |
---|
1136 | Illegal or incorrect \fI<value>\fP arguments may or may not be reported -- it |
---|
1137 | depends on the specific MCA module. |
---|
1138 | .PP |
---|
1139 | To find the available component types under the MCA architecture, or to find the |
---|
1140 | available parameters for a specific component, use the \fIompi_info\fP command. |
---|
1141 | See the \fIompi_info(1)\fP man page for detailed information on the command. |
---|
1142 | . |
---|
1143 | .\" ************************** |
---|
1144 | .\" Examples Section |
---|
1145 | .\" ************************** |
---|
1146 | .SH EXAMPLES |
---|
1147 | Be sure also to see the examples throughout the sections above. |
---|
1148 | . |
---|
1149 | .TP 4 |
---|
1150 | mpirun -np 4 -mca btl ib,tcp,self prog1 |
---|
1151 | Run 4 copies of prog1 using the "ib", "tcp", and "self" BTLs for the transport |
---|
1152 | of MPI messages. |
---|
1153 | . |
---|
1154 | . |
---|
1155 | .TP 4 |
---|
1156 | mpirun -np 4 -mca btl tcp,sm,self |
---|
1157 | .br |
---|
1158 | --mca btl_tcp_if_include ce0 prog1 |
---|
1159 | .br |
---|
1160 | Run 4 copies of prog1 using the "tcp", "sm" and "self" BTLs for the transport of |
---|
1161 | MPI messages, with TCP using only the ce0 interface to communicate. Note that |
---|
1162 | other BTLs have similar if_include MCA parameters. |
---|
1163 | . |
---|
1164 | .\" ************************** |
---|
1165 | .\" Diagnostics Section |
---|
1166 | .\" ************************** |
---|
1167 | . |
---|
1168 | .\" .SH DIAGNOSTICS |
---|
1169 | .\".TP 4 |
---|
1170 | .\"Error Msg: |
---|
1171 | .\"Description |
---|
1172 | . |
---|
1173 | .\" ************************** |
---|
1174 | .\" Return Value Section |
---|
1175 | .\" ************************** |
---|
1176 | . |
---|
1177 | .SH RETURN VALUE |
---|
1178 | . |
---|
1179 | \fImpirun\fP returns 0 if all ranks started by \fImpirun\fP exit after calling |
---|
1180 | MPI_FINALIZE. A non-zero value is returned if an internal error occurred in |
---|
1181 | mpirun, or one or more ranks exited before calling MPI_FINALIZE. If an |
---|
1182 | internal error occurred in mpirun, the corresponding error code is returned. |
---|
1183 | In the event that one or more ranks exit before calling MPI_FINALIZE, the |
---|
1184 | exit status of the process that \fImpirun\fP first notices died |
---|
1185 | before calling MPI_FINALIZE will be returned. Note that, in general, this |
---|
1186 | will be the first rank that died, but this is not guaranteed to be so. |
---|
1187 | . |
---|
1188 | .\" ************************** |
---|
1189 | .\" See Also Section |
---|
1190 | .\" ************************** |
---|
1191 | . |
---|
1192 | .\" .SH SEE ALSO |
---|
1193 | .\" orted(1), ompi-server(1) |
---|