1.\" Copyright (c) 2009      Cisco Systems, Inc.  All rights reserved.
2.\" Copyright (c) 2008-2009 Sun Microsystems, Inc.  All rights reserved.
3.\"
4.\" Man page for ORTE's orterun command
5.\"
6.\" .TH name     section center-footer   left-footer  center-header
7.TH MPIRUN 1 "Dec 08, 2009" "1.4" "Open MPI"
8.\" **************************
9.\"    Name Section
10.\" **************************
11.SH NAME
12.
13orterun, mpirun, mpiexec \- Execute serial and parallel jobs in Open MPI.
14
15.B Note:
16\fImpirun\fP, \fImpiexec\fP, and \fIorterun\fP are all synonyms for each
17other.  Using any of the names will produce the same behavior.
18.
19.\" **************************
20.\"    Synopsis Section
21.\" **************************
22.SH SYNOPSIS
23.
24.PP
Single Program Multiple Data (SPMD) Model:

.B mpirun
[ options ]
.B <program>
[ <args> ]
.P

Multiple Instruction Multiple Data (MIMD) Model:

.B mpirun
[ global_options ]
       [ local_options1 ]
.B <program1>
[ <args1> ] :
       [ local_options2 ]
.B <program2>
[ <args2> ] :
       ... :
       [ local_optionsN ]
.B <programN>
[ <argsN> ]
.P

Note that in both models, invoking \fImpirun\fP via an absolute path
name is equivalent to specifying the \fI--prefix\fP option with a
\fI<dir>\fR value equivalent to the directory where \fImpirun\fR
resides, minus its last subdirectory.  For example:

    \fB%\fP /usr/local/bin/mpirun ...

is equivalent to

    \fB%\fP mpirun --prefix /usr/local

.
.\" **************************
.\"    Quick Summary Section
.\" **************************
.SH QUICK SUMMARY
.
If you are simply looking for how to run an MPI application, you
probably want to use a command line of the following form:

    \fB%\fP mpirun [ -np X ] [ --hostfile <filename> ]  <program>

This will run X copies of \fI<program>\fR in your current run-time
environment, scheduling (by default) in a round-robin fashion by CPU
slot.  If running under a supported resource manager, Open MPI's
\fImpirun\fR will usually automatically use the corresponding resource
manager process starter (as opposed to, for example, \fIrsh\fR or
\fIssh\fR, which require the use of a hostfile); otherwise it will
default to running all X copies on the localhost.  See the rest of
this page for more details.
.
.\" **************************
.\"    Options Section
.\" **************************
.SH OPTIONS
.
.I mpirun
will send the name of the directory where it was invoked on the local
node to each of the remote nodes, and attempt to change to that
directory.  See the "Current Working Directory" section below for further
details.
.\"
.\" Start options listing
.\"    Indent 10 characters from start of first column to start of second column
.TP 10
.B <program>
The program executable. This is identified as the first non-recognized argument
to mpirun.
.
.
.TP
.B <args>
Pass these run-time arguments to every new process.  These must always
be the last arguments to \fImpirun\fP. If an app context file is used,
\fI<args>\fP will be ignored.
.
.
.TP
.B -h\fR,\fP --help
Display help for this command
.
.
.TP
.B -q\fR,\fP --quiet
Suppress informative messages from orterun during application execution.
.
.
.TP
.B -v\fR,\fP --verbose
Be verbose
.
.
.TP
.B -V\fR,\fP --version
Print version number.  If no other arguments are given, this will also
cause orterun to exit.
.
.
.
.
.P
To specify which hosts (nodes) of the cluster to run on:
.
.
.TP
.B -H\fR,\fP -host\fR,\fP --host \fR<host1,host2,...,hostN>\fP
List of hosts on which to invoke processes.
.
.
.TP
.B
-hostfile\fR,\fP --hostfile \fR<hostfile>\fP
Provide a hostfile to use.
.\" JJH - Should have man page for how to format a hostfile properly.
.
.
.TP
.B -machinefile\fR,\fP --machinefile \fR<machinefile>\fP
Synonym for \fI-hostfile\fP.
.
.
.
.
.P
To specify the number of processes to launch:
.
.
.TP
.B -c\fR,\fP -n\fR,\fP --n\fR,\fP -np \fR<#>\fP
Run this many copies of the program on the given nodes.  This option
indicates that the specified file is an executable program and not an
application context. If no value is provided for the number of copies to
execute (i.e., neither the "-np" option nor any of its synonyms is
provided on the command line), Open MPI will automatically execute a copy
of the program on each process slot (see below for a description of a
"process slot"). This feature, however, can only be used in the SPMD model
and will return an error (without beginning execution of the application)
otherwise.
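For example, using the illustrative hostfile above:

    \fB%\fP mpirun -np 4 -hostfile myhostfile ./a.out

runs four copies of \fIa.out\fP on the listed hosts.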
.
.
.TP
.B -npersocket\fR,\fP --npersocket <#persocket>
On each node, launch this many processes times the number of processor
sockets on the node.
The \fI-npersocket\fP option also turns on the \fI-bind-to-socket\fP option.
.
.
.TP
.B -npernode\fR,\fP --npernode <#pernode>
On each node, launch this many processes.
.
.
.TP
.B -pernode\fR,\fP --pernode
On each node, launch one process -- equivalent to \fI-npernode\fP 1.
.
.
.
.
.P
To map processes to nodes:
.
.
.TP
.B -loadbalance\fR,\fP --loadbalance
Uniform distribution of ranks across all nodes. See more detailed description below.
.
.TP
.B -nolocal\fR,\fP --nolocal
Do not run any copies of the launched application on the same node as
orterun is running.  This option will override listing the localhost
with \fB--host\fR or any other host-specifying mechanism.
.
.TP
.B -nooversubscribe\fR,\fP --nooversubscribe
Do not oversubscribe any nodes; error (without starting any processes)
if the requested number of processes would cause oversubscription.
This option implicitly sets "max_slots" equal to the "slots" value for
each node.
.
.TP
.B -bynode\fR,\fP --bynode
Launch processes one per node, cycling by node in a round-robin
fashion.  This spreads processes evenly among nodes and assigns
ranks in a round-robin, "by node" manner.
.
.
.
.
.P
For process binding:
.
.TP
.B -bycore\fR,\fP --bycore
Associate processes with successive cores
if used with one of the \fI-bind-to-*\fP options.
.
.TP
.B -bysocket\fR,\fP --bysocket
Associate processes with successive processor sockets
if used with one of the \fI-bind-to-*\fP options.
.
.TP
.B -cpus-per-proc\fR,\fP --cpus-per-proc <#perproc>
Associate the specified number of cores with each process,
if used with one of the \fI-bind-to-*\fP options.
.
.TP
.B -cpus-per-rank\fR,\fP --cpus-per-rank <#perrank>
Alias for \fI-cpus-per-proc\fP.
.
.TP
.B -bind-to-core\fR,\fP --bind-to-core
Bind processes to cores.
.
.TP
.B -bind-to-socket\fR,\fP --bind-to-socket
Bind processes to processor sockets.
.
.TP
.B -bind-to-none\fR,\fP --bind-to-none
Do not bind processes.  (Default.)
.
.TP
.B -report-bindings\fR,\fP --report-bindings
Report any bindings for launched processes.
.
.TP
.B -slot-list\fR,\fP --slot-list <slots>
List of processor IDs to be used for binding MPI processes. The specified bindings will
be applied to all MPI processes. See explanation below for syntax.
.
.
.
.
.P
For rankfiles:
.
.
.TP
.B -rf\fR,\fP --rankfile <rankfile>
Provide a rankfile.
.
.
.
.
.P
To manage standard I/O:
.
.
.TP
.B -output-filename\fR,\fP --output-filename \fR<filename>\fP
Redirect the stdout, stderr, and stddiag of all ranks to a rank-unique version of
the specified filename. Any directories in the filename will automatically be created.
Each output file will consist of filename.rank, where the rank will be left-filled with
zeros for correct ordering in listings.
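For example (the exact zero-fill width may vary):

    \fB%\fP mpirun -np 2 -output-filename /tmp/job/out ./a.out

would write rank 0's output to a file named like \fI/tmp/job/out.0\fP
and rank 1's output to \fI/tmp/job/out.1\fP.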
.
.
.TP
.B -stdin\fR,\fP --stdin <rank>
The MPI rank that is to receive stdin. The default is to forward stdin to rank=0, but this
option can be used to forward stdin to any rank. It is also acceptable to specify \fInone\fP,
indicating that no ranks are to receive stdin.
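For example:

    \fB%\fP mpirun -np 4 -stdin none ./a.out

launches four processes, none of which receives stdin.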
.
.
.TP
.B -tag-output\fR,\fP --tag-output
Tag each line of output to stdout, stderr, and stddiag with \fB[jobid, rank]<stdxxx>\fP indicating the process jobid
and rank that generated the output, and the channel that generated it.
.
.
.TP
.B -timestamp-output\fR,\fP --timestamp-output
Timestamp each line of output to stdout, stderr, and stddiag.
.
.
.TP
.B -xml\fR,\fP --xml
Provide all output to stdout, stderr, and stddiag in an XML format.
.
.
.TP
.B -xterm\fR,\fP --xterm \fR<ranks>\fP
Display the specified ranks in separate xterm windows. The ranks are specified
as a comma-separated list of ranges, with a -1 indicating all. A separate
window will be created for each specified rank.
.B Note:
In some environments, xterm may require that the executable be in the user's
path, or be specified in absolute or relative terms. Thus, it may be necessary
to specify a local executable as "./foo" instead of just "foo". If xterm fails to
find the executable, mpirun will hang, but still respond correctly to a ctrl-c.
If this happens, please check that the executable is being specified correctly
and try again.
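For example:

    \fB%\fP mpirun -np 8 -xterm 0,2-4 ./foo

opens separate xterm windows for ranks 0, 2, 3, and 4.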
.
.
.
.
.P
To manage files and runtime environment:
.
.
.TP
.B -path\fR,\fP --path \fR<path>\fP
<path> that will be used when attempting to locate the requested
executables.  This is used prior to using the local PATH setting.
.
.
.TP
.B --prefix \fR<dir>\fP
Prefix directory that will be used to set the \fIPATH\fR and
\fILD_LIBRARY_PATH\fR on the remote node before invoking Open MPI or
the target process.  See the "Remote Execution" section, below.
.
.
.TP
.B --preload-binary
Copy the specified executable(s) to remote machines prior to starting remote processes. The
executables will be copied to the Open MPI session directory and will be deleted upon
completion of the job.
.
.
.TP
.B --preload-files <files>
Preload the comma separated list of files to the current working directory of the remote
machines where processes will be launched prior to starting those processes.
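For example (the filenames are illustrative):

    \fB%\fP mpirun -np 2 --preload-files input.dat,conf.txt ./a.out

copies \fIinput.dat\fP and \fIconf.txt\fP to the working directory on
each remote machine before launching \fIa.out\fP.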
.
.
.TP
.B --preload-files-dest-dir <path>
The destination directory to be used for preload-files, if other than the current working
directory. By default, the absolute and relative paths provided by --preload-files are used.
.
.
.TP
.B --tmpdir \fR<dir>\fP
Set the root for the session directory tree for mpirun only.
.
.
.TP
.B -wd \fR<dir>\fP
Synonym for \fI-wdir\fP.
.
.
.TP
.B -wdir \fR<dir>\fP
Change to the directory <dir> before the user's program executes.
See the "Current Working Directory" section for notes on relative paths.
.B Note:
If the \fI-wdir\fP option appears both on the command line and in an
application context, the context will take precedence over the command
line.
.
.
.TP
.B -x \fR<env>\fP
Export the specified environment variables to the remote nodes before
executing the program.  Only one environment variable can be specified
per \fI-x\fP option.  Existing environment variables can be specified
or new variable names specified with corresponding values.  For
example:
    \fB%\fP mpirun -x DISPLAY -x OFILE=/tmp/out ...

The parser for the \fI-x\fP option is not very sophisticated; it does
not even understand quoted values.  Users are advised to set variables
in the environment, and then use \fI-x\fP to export (not define) them.
.
.
.
.
.P
Setting MCA parameters:
.
.
.TP
.B -gmca\fR,\fP --gmca \fR<key> <value>\fP
Pass global MCA parameters that are applicable to all contexts. \fI<key>\fP is
the parameter name; \fI<value>\fP is the parameter value.
.
.
.TP
.B -mca\fR,\fP --mca <key> <value>
Send arguments to various MCA modules.  See the "MCA" section, below.
.
.
.
.
.P
For debugging:
.
.
.TP
.B -debug\fR,\fP --debug
Invoke the user-level debugger indicated by the \fIorte_base_user_debugger\fP
MCA parameter.
.
.
.TP
.B -debugger\fR,\fP --debugger
Sequence of debuggers to search for when \fI--debug\fP is used (i.e., a
synonym for the \fIorte_base_user_debugger\fP MCA parameter).
.
.
.TP
.B -tv\fR,\fP --tv
Launch processes under the TotalView debugger.
This is a deprecated backwards-compatibility flag and a synonym for \fI--debug\fP.
.
.
.
.
.P
There are also other options:
.
.
.TP
.B -aborted\fR,\fP --aborted \fR<#>\fP
Set the maximum number of aborted processes to display.
.
.
.TP
.B --app \fR<appfile>\fP
Provide an appfile, ignoring all other command line options.
.
.
.TP
.B -cf\fR,\fP --cartofile \fR<cartofile>\fP
Provide a cartography file.
.
.
.TP
.B --hetero
Indicates that multiple app_contexts are being provided that are a mix of 32/64-bit binaries.
.
.
.TP
.B -leave-session-attached\fR,\fP --leave-session-attached
Do not detach OmpiRTE daemons used by this application. This allows error messages
from the daemons, as well as from the underlying environment (e.g., a failure to
launch a daemon), to be output.
.
.
.TP
.B -ompi-server\fR,\fP --ompi-server <uri or file>
Specify the URI of the Open MPI server, or the name of the file (specified as file:filename) that
contains that info. The Open MPI server is used to support multi-application data exchange via
the MPI-2 MPI_Publish_name and MPI_Lookup_name functions.
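For example (the filename is illustrative):

    \fB%\fP mpirun -np 2 -ompi-server file:server.uri ./a.out

reads the server's contact URI from the file \fIserver.uri\fP.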
.
.
.TP
.B -wait-for-server\fR,\fP --wait-for-server
Pause mpirun before launching the job until ompi-server is detected. This
is useful in scripts where ompi-server may be started in the background, followed immediately by
an \fImpirun\fP command that wishes to connect to it. \fImpirun\fP will pause until either the
specified ompi-server is contacted or the server-wait-time is exceeded.
.
.
.TP
.B -server-wait-time\fR,\fP --server-wait-time <secs>
The maximum amount of time (in seconds) mpirun should wait for the ompi-server to start. The
default is 10 seconds.
.
.
.
.
.P
The following options are useful for developers; they are not generally
useful to most ORTE and/or MPI users:
.
.TP
.B -d\fR,\fP --debug-devel
Enable debugging of the OmpiRTE (the run-time layer in Open MPI).
This is not generally useful for most users.
.
.
.TP
.B --debug-daemons
Enable debugging of any OmpiRTE daemons used by this application.
.
.
.TP
.B --debug-daemons-file
Enable debugging of any OmpiRTE daemons used by this application, storing
output in files.
.
.
.TP
.B -launch-agent\fR,\fP --launch-agent
Name of the executable that is to be used to start processes on the remote nodes. The default
is "orted". This option can be used to test new daemon concepts, or to pass options back to the
daemons without having mpirun itself see them. For example, specifying a launch agent of
\fRorted -mca odls_base_verbose 5\fR allows the developer to ask the orted for debugging output
without clutter from mpirun itself.
.
.
.TP
.B --noprefix
Disable the automatic --prefix behavior.
.
.
.P
There may be other options listed with \fImpirun --help\fP.
.
.
.\" **************************
.\"    Description Section
.\" **************************
.SH DESCRIPTION
.
One invocation of \fImpirun\fP starts an MPI application running under Open
MPI. If the application is single program multiple data (SPMD), the application
can be specified on the \fImpirun\fP command line.

If the application is multiple instruction multiple data (MIMD), comprising
multiple programs, the set of programs and arguments can be specified in one of
two ways: extended command line arguments or an application context.
.PP
An application context describes the MIMD program set, including all arguments,
in a separate file.
.\"See appcontext(5) for a description of the application context syntax.
This file essentially contains multiple \fImpirun\fP command lines, less the
command name itself.  The ability to specify different options for different
instantiations of a program is another reason to use an application context.
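.PP
For example, a minimal application context file (the program names here are
illustrative) could look like:

    \fB%\fP cat my_appfile
    -np 2 ./master
    -np 4 ./worker

    \fB%\fP mpirun --app my_appfile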
.PP
Extended command line arguments allow for the description of the application
layout on the command line using colons (\fI:\fP) to separate the specification
of programs and arguments. Some options are globally set across all specified
programs (e.g., --hostfile), while others are specific to a single program
(e.g., -np).
.
.
.
.SS Specifying Host Nodes
.
Host nodes can be identified on the \fImpirun\fP command line with the \fI-host\fP
option or in a hostfile.
.
.PP
For example,
.
.TP 4
mpirun -H aa,aa,bb ./a.out
launches two processes on node aa and one on bb.
.
.PP
Or, consider the hostfile
.

   \fB%\fP cat myhostfile
   aa slots=2
   bb slots=2
   cc slots=2

.
.PP
Here, we list the host names (aa, bb, and cc) as well as how many "slots"
there are for each.  Slots indicate how many processes can potentially execute
on a node.  For best performance, the number of slots may be chosen to be the
number of cores on the node or the number of processor sockets.  If the hostfile
does not provide slots information, a default of 1 is assumed.
When running under resource managers (e.g., SLURM, Torque, etc.),
Open MPI will obtain both the hostnames and the number of slots directly
from the resource manager.
.
.PP
.
.TP 4
mpirun -hostfile myhostfile ./a.out
will launch two processes on each of the three nodes.
.
.TP 4
mpirun -hostfile myhostfile -host aa ./a.out
will launch two processes, both on node aa.
.
.TP 4
mpirun -hostfile myhostfile -host dd ./a.out
will find no hosts to run on and abort with an error.
That is, the specified host dd is not in the specified hostfile.
.
.SS Specifying Number of Processes
.
As we have just seen, the number of processes to run can be set using the
hostfile.  Other mechanisms exist.
.
.PP
The number of processes launched can be specified as a multiple of the
number of nodes or processor sockets available.  For example,
.
.TP 4
mpirun -H aa,bb -npersocket 2 ./a.out
launches processes 0-3 on node aa and processes 4-7 on node bb,
where aa and bb are both dual-socket nodes.
The \fI-npersocket\fP option also turns on the \fI-bind-to-socket\fP option,
which is discussed in a later section.
.
.TP 4
mpirun -H aa,bb -npernode 2 ./a.out
launches processes 0-1 on node aa and processes 2-3 on node bb.
.
.TP 4
mpirun -H aa,bb -npernode 1 ./a.out
launches one process per host node.
.
.TP 4
mpirun -H aa,bb -pernode ./a.out
is the same as \fI-npernode\fP 1.
.
.
.PP
Another alternative is to specify the number of processes with the
\fI-np\fP option.  Consider now the hostfile
.

   \fB%\fP cat myhostfile
   aa slots=4
   bb slots=4
   cc slots=4

.
.PP
Now,
.
.TP 4
mpirun -hostfile myhostfile -np 6 ./a.out
will launch ranks 0-3 on node aa and ranks 4-5 on node bb.  The remaining
slots in the hostfile will not be used since the \fI-np\fP option indicated
that only 6 processes should be launched.
.
.SS Mapping Processes to Nodes
.
The examples above illustrate the default mapping of process ranks
to nodes.  This mapping can also be controlled with various
\fImpirun\fP options.  Here, we consider the same hostfile as
above with \fI-np\fP 6 again:
.

                          node aa      node bb      node cc

  mpirun                  0 1 2 3      4 5

  mpirun -loadbalance     0 1          2 3          4 5

  mpirun -bynode          0 3          1 4          2 5

  mpirun -nolocal                      0 1 2 3      4 5
.
.PP
The \fI-loadbalance\fP option tries to spread processes out fairly among the
nodes.
.
.PP
The \fI-bynode\fP option does likewise but numbers the processes "by node"
in a round-robin fashion.
.
.PP
The \fI-nolocal\fP option prevents any processes from being mapped onto the
local host (in this case node aa).  While \fImpirun\fP typically consumes
few system resources, \fI-nolocal\fP can be helpful for launching very
large jobs where \fImpirun\fP may actually need to use noticeable amounts
of memory and/or processing time.
.
.PP
Just as \fI-np\fP can specify fewer processes than there are slots, it can
also oversubscribe the slots.  For example, with the same hostfile:
.
.TP 4
mpirun -hostfile myhostfile -np 14 ./a.out
will launch processes 0-3 on node aa, 4-7 on bb, and 8-11 on cc.  It will
then add the remaining two processes to whichever nodes it chooses.
.
.PP
One can also specify limits to oversubscription.  For example, with the same
hostfile:
.
.TP 4
mpirun -hostfile myhostfile -np 14 -nooversubscribe ./a.out
will produce an error since \fI-nooversubscribe\fP prevents oversubscription.
.
.PP
Limits to oversubscription can also be specified in the hostfile itself:
.
 % cat myhostfile
 aa slots=4 max_slots=4
 bb         max_slots=4
 cc slots=4
.
.PP
The \fImax_slots\fP field specifies such a limit.  When it does, the
\fIslots\fP value defaults to the limit.  Now:
.
.TP 4
mpirun -hostfile myhostfile -np 14 ./a.out
causes the first 12 processes to be launched as before, but the remaining
two processes will be forced onto node cc.  The other two nodes are
protected by the hostfile against oversubscription by this job.
.
.PP
Using the \fI--nooversubscribe\fR option can be helpful since Open MPI
currently does not get "max_slots" values from the resource manager.
.
.PP
Of course, \fI-np\fP can also be used with the \fI-H\fP or \fI-host\fP
option.  For example,
.
.TP 4
mpirun -H aa,bb -np 8 ./a.out
launches 8 processes.  Since only two hosts are specified, after the first
two processes are mapped, one to aa and one to bb, the remaining processes
oversubscribe the specified hosts.
.
.PP
And here is a MIMD example:
.
.TP 4
mpirun -H aa -np 1 hostname : -H bb,cc -np 2 uptime
will launch process 0 running \fIhostname\fP on node aa and processes 1 and 2
each running \fIuptime\fP on nodes bb and cc, respectively.
.
.SS Process Binding
.
Processes may be bound to specific resources on a node.  This can
improve performance if the operating system is placing processes
suboptimally.  For example, it might oversubscribe some multi-core
processor sockets, leaving other sockets idle;  this can lead
processes to contend unnecessarily for common resources.  Or, it
might spread processes out too widely;  this can be suboptimal if
application performance is sensitive to interprocess communication
costs.  Binding can also keep the operating system from migrating
processes excessively, regardless of how optimally those processes
were placed to begin with.
.
.PP
To bind processes, one must first associate them with the resources
on which they should run.  For example, the \fI-bycore\fP option
associates the processes on a node with successive cores.  Or,
\fI-bysocket\fP associates the processes with successive processor sockets,
cycling through the sockets in a round-robin fashion if necessary.
And \fI-cpus-per-proc\fP indicates how many cores to bind per process.
.
.PP
But, such association is meaningless unless the processes are actually
bound to those resources.  The binding option specifies the granularity
of binding -- say, with \fI-bind-to-core\fP or \fI-bind-to-socket\fP.
One can also turn binding off with \fI-bind-to-none\fP, which is
typically the default.
.
.PP
Finally, \fI-report-bindings\fP can be used to report bindings.
.
.PP
As an example, consider a node with two processor sockets, each comprising
four cores.  We run \fImpirun\fP with \fI-np 4 -report-bindings\fP and
the following additional options:
.

 % mpirun ... -bycore -bind-to-core
 [...] ... binding child [...,0] to cpus 0001
 [...] ... binding child [...,1] to cpus 0002
 [...] ... binding child [...,2] to cpus 0004
 [...] ... binding child [...,3] to cpus 0008

 % mpirun ... -bysocket -bind-to-socket
 [...] ... binding child [...,0] to socket 0 cpus 000f
 [...] ... binding child [...,1] to socket 1 cpus 00f0
 [...] ... binding child [...,2] to socket 0 cpus 000f
 [...] ... binding child [...,3] to socket 1 cpus 00f0

 % mpirun ... -cpus-per-proc 2 -bind-to-core
 [...] ... binding child [...,0] to cpus 0003
 [...] ... binding child [...,1] to cpus 000c
 [...] ... binding child [...,2] to cpus 0030
 [...] ... binding child [...,3] to cpus 00c0

 % mpirun ... -bind-to-none
.
.PP
Here, \fI-report-bindings\fP shows the binding of each process as a mask.
In the first case, the processes bind to successive cores as indicated by
the masks 0001, 0002, 0004, and 0008.  In the second case, processes bind
to all cores on successive sockets as indicated by the masks 000f and 00f0.
The processes cycle through the processor sockets in a round-robin fashion
as many times as are needed.  In the third case, the masks show us that
two cores have been bound per process.  In the fourth case, binding is
turned off and no bindings are reported.
.
.PP
Open MPI's support for process binding depends on the underlying
operating system.  Therefore, process binding may not be available
on every system.
.
.PP
Process binding can also be set with MCA parameters.
Their usage is less convenient than that of \fImpirun\fP options.
On the other hand, MCA parameters can be set not only on the \fImpirun\fP
command line, but alternatively in a system or user mca-params.conf file
or as environment variables, as described in the MCA section below.
The correspondences are:
.

  mpirun option          MCA parameter key           value

  -bycore                rmaps_base_schedule_policy  core
  -bysocket              rmaps_base_schedule_policy  socket
  -bind-to-core          orte_process_binding        core
  -bind-to-socket        orte_process_binding        socket
  -bind-to-none          orte_process_binding        none
.
.PP
The \fIorte_process_binding\fP value can also take on the
\fI:if-avail\fP attribute.  This attribute means that processes
will be bound only if this is supported on the underlying
operating system.  Without the attribute, if there is no
such support, the binding request results in an error.
For example, you could have
.

  % cat $HOME/.openmpi/mca-params.conf
  rmaps_base_schedule_policy = socket
  orte_process_binding       = socket:if-avail
.
.
.SS Rankfiles
.
Rankfiles provide a means for specifying detailed information about
how process ranks should be mapped to nodes and how they should be bound.
Consider the following:
.

    cat myrankfile
    rank 0=aa slot=1:0-2
    rank 1=bb slot=0:0,1
    rank 2=cc slot=1-2
    mpirun -H aa,bb,cc,dd -rf myrankfile ./a.out
.
The result is:

  Rank 0 runs on node aa, bound to socket 1, cores 0-2.
  Rank 1 runs on node bb, bound to socket 0, cores 0 and 1.
  Rank 2 runs on node cc, bound to cores 1 and 2.
.
.
.SS Application Context or Executable Program?
.
To distinguish the two different forms, \fImpirun\fP
looks on the command line for the \fI--app\fP option.  If
it is specified, then the file named on the command line is
assumed to be an application context.  If it is not
specified, then the file is assumed to be an executable program.
.
.
.
.SS Locating Files
.
If no relative or absolute path is specified for a file, Open
MPI will first look for files by searching the directories specified
by the \fI--path\fP option.  If there is no \fI--path\fP option set or
if the file is not found at the \fI--path\fP location, then Open MPI
will search the user's PATH environment variable as defined on the
source node(s).
.PP
If a relative directory is specified, it must be relative to the initial
working directory determined by the specific starter used. For example, when
using the rsh or ssh starters, the initial directory is $HOME by default. Other
starters may set the initial directory to the current working directory from
the invocation of \fImpirun\fP.
.
.
.
.SS Current Working Directory
.
The \fI\-wdir\fP mpirun option (and its synonym, \fI\-wd\fP) allows
the user to change to an arbitrary directory before the program is
invoked.  It can also be used in application context files to specify
working directories on specific nodes and/or for specific
applications.
.PP
If the \fI\-wdir\fP option appears both in a context file and on the
command line, the context file directory will override the command
line value.
.PP
If the \fI-wdir\fP option is specified, Open MPI will attempt to
change to the specified directory on all of the remote nodes. If this
fails, \fImpirun\fP will abort.
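.PP
For example (the directory name is illustrative):

    \fB%\fP mpirun -np 2 -wdir /scratch/run1 ./a.out

attempts to change to \fI/scratch/run1\fP on every node before launching
\fIa.out\fP.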
.PP
If the \fI-wdir\fP option is \fBnot\fP specified, Open MPI will send
the directory name where \fImpirun\fP was invoked to each of the
remote nodes. The remote nodes will try to change to that
directory. If they are unable (e.g., if the directory does not exist on
that node), then Open MPI will use the default directory determined by
the starter.
.PP
All directory changing occurs before the user's program is invoked; it
does not wait until \fIMPI_INIT\fP is called.
.
.
.
.SS Standard I/O
.
Open MPI directs UNIX standard input to /dev/null on all processes
except the MPI_COMM_WORLD rank 0 process. The MPI_COMM_WORLD rank 0 process
inherits standard input from \fImpirun\fP.
.B Note:
The node that invoked \fImpirun\fP need not be the same as the node where the
MPI_COMM_WORLD rank 0 process resides. Open MPI handles the redirection of
\fImpirun\fP's standard input to the rank 0 process.
.PP
Open MPI directs UNIX standard output and error from remote nodes to the node
that invoked \fImpirun\fP and prints it on the standard output/error of
\fImpirun\fP.
Local processes inherit the standard output/error of \fImpirun\fP and transfer
to it directly.
.PP
Thus it is possible to redirect standard I/O for Open MPI applications by
using the typical shell redirection procedure on \fImpirun\fP.

      \fB%\fP mpirun -np 2 my_app < my_input > my_output

Note that in this example \fIonly\fP the MPI_COMM_WORLD rank 0 process will
receive the stream from \fImy_input\fP on stdin.  The stdin on all the other
nodes will be tied to /dev/null.  However, the stdout from all nodes will
be collected into the \fImy_output\fP file.
.
.
.
.SS Signal Propagation
.
When orterun receives a SIGTERM or SIGINT, it will attempt to kill
the entire job by sending all processes in the job a SIGTERM, waiting
a small number of seconds, then sending all processes in the job a
SIGKILL.
.
.PP
SIGUSR1 and SIGUSR2 signals received by orterun are propagated to
all processes in the job.
.
.PP
One can turn on forwarding of SIGSTOP and SIGCONT to the program executed
by mpirun by setting the MCA parameter orte_forward_job_control to 1.
A SIGTSTP signal to mpirun will then cause a SIGSTOP signal to be sent
to all of the programs started by mpirun, and likewise a SIGCONT signal
to mpirun will cause a SIGCONT to be sent.
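.PP
For example:

    \fB%\fP mpirun -mca orte_forward_job_control 1 -np 2 ./a.out

launches a job in which suspending \fImpirun\fP (e.g., typing ctrl-z, which
sends SIGTSTP) will suspend all of the launched processes, and sending
\fImpirun\fP a SIGCONT will resume them.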
.
.PP
Other signals are not currently propagated
by orterun.
.
.
.SS Process Termination / Signal Handling
.
During the run of an MPI application, if any rank dies abnormally
(either exiting before invoking \fIMPI_FINALIZE\fP, or dying as the result of a
signal), \fImpirun\fP will print out an error message and kill the rest of the
MPI application.
.PP
User signal handlers should probably avoid trying to clean up MPI state
(Open MPI is, currently, neither thread-safe nor async-signal-safe).
For example, if a segmentation fault occurs in \fIMPI_SEND\fP (perhaps because
a bad buffer was passed in) and a user signal handler is invoked, if this user
handler attempts to invoke \fIMPI_FINALIZE\fP, Bad Things could happen since
Open MPI was already "in" MPI when the error occurred.  Since \fImpirun\fP
will notice that the process died due to a signal, it is probably not
necessary for the user handler to clean up MPI state; it is safest to clean
up only non-MPI state.
.SS Process Environment
.
Processes in the MPI application inherit their environment from the
Open RTE daemon on the node on which they are running.  The
environment is typically inherited from the user's shell.  On remote
nodes, the exact environment is determined by the boot MCA module
used.  The \fIrsh\fR launch module, for example, uses either
\fIrsh\fR or \fIssh\fR to launch the Open RTE daemon on remote nodes, and
typically executes one or more of the user's shell-setup files before
launching the Open RTE daemon.  When running dynamically linked
applications which require the \fILD_LIBRARY_PATH\fR environment
variable to be set, care must be taken to ensure that it is correctly
set when booting Open MPI.
.PP
See the "Remote Execution" section for more details.
.
.
.SS Remote Execution
.
Open MPI requires that the \fIPATH\fR environment variable be set to
find executables on remote nodes (this is typically only necessary in
\fIrsh\fR- or \fIssh\fR-based environments -- batch/scheduled
environments typically copy the current environment to the execution
of remote jobs, so if the current environment has \fIPATH\fR and/or
\fILD_LIBRARY_PATH\fR set properly, the remote nodes will also have it
set properly).  If Open MPI was compiled with shared library support,
it may also be necessary to have the \fILD_LIBRARY_PATH\fR environment
variable set on remote nodes as well (especially to find the shared
libraries required to run user MPI applications).
.PP
However, it is not always desirable or possible to edit shell
startup files to set \fIPATH\fR and/or \fILD_LIBRARY_PATH\fR.  The
\fI--prefix\fR option is provided for some simple configurations where
this is not possible.
.PP
The \fI--prefix\fR option takes a single argument: the base directory
on the remote node where Open MPI is installed.  Open MPI will use
this directory to set the remote \fIPATH\fR and \fILD_LIBRARY_PATH\fR
before executing any Open MPI or user applications.  This allows
running Open MPI jobs without having pre-configured the \fIPATH\fR and
\fILD_LIBRARY_PATH\fR on the remote nodes.
.PP
Open MPI adds the basename of the current
node's "bindir" (the directory where Open MPI's executables are
installed) to the prefix and uses that to set the \fIPATH\fR on the
remote node.  Similarly, Open MPI adds the basename of the current
node's "libdir" (the directory where Open MPI's libraries are
installed) to the prefix and uses that to set the
\fILD_LIBRARY_PATH\fR on the remote node.  For example:
.TP 15
Local bindir:
/local/node/directory/bin
.TP
Local libdir:
/local/node/directory/lib64
.PP
If the following command line is used:

    \fB%\fP mpirun --prefix /remote/node/directory

Open MPI will add "/remote/node/directory/bin" to the \fIPATH\fR
and "/remote/node/directory/lib64" to the \fILD_LIBRARY_PATH\fR on the
remote node before attempting to execute anything.
.PP
Note that \fI--prefix\fR can be set on a per-context basis, allowing
for different values for different nodes.
.PP
The \fI--prefix\fR option is not sufficient if the installation paths
on the remote node are different from those on the local node (e.g., if "/lib"
is used on the local node, but "/lib64" is used on the remote node),
or if the installation paths are something other than a subdirectory
under a common prefix.
.PP
Note that executing \fImpirun\fR via an absolute pathname is
equivalent to specifying \fI--prefix\fR without the last subdirectory
in the absolute pathname to \fImpirun\fR.  For example:

    \fB%\fP /usr/local/bin/mpirun ...

is equivalent to

    \fB%\fP mpirun --prefix /usr/local
.
.
.
.SS Exported Environment Variables
.
All environment variables that are named in the form OMPI_* will automatically
be exported to new processes on the local and remote nodes.
The \fI\-x\fP option to \fImpirun\fP can be used to export specific environment
variables to the new processes.  While the syntax of the \fI\-x\fP
option allows the definition of new variables, note that the parser
for this option is currently not very sophisticated - it does not even
understand quoted values.  Users are advised to set variables in the
environment and use \fI\-x\fP to export them, not to define them.
.
.
.
.SS Setting MCA Parameters
.
The \fI-mca\fP switch allows the passing of parameters to various MCA
(Modular Component Architecture) modules.
.\" Open MPI's MCA modules are described in detail in ompimca(7).
MCA modules have direct impact on MPI programs because they allow tunable
parameters to be set at run time (such as which BTL communication device driver
to use, what parameters to pass to that BTL, etc.).
.PP
The \fI-mca\fP switch takes two arguments: \fI<key>\fP and \fI<value>\fP.
The \fI<key>\fP argument generally specifies which MCA module will receive the value.
For example, the \fI<key>\fP "btl" is used to select which BTL to be used for
transporting MPI messages.  The \fI<value>\fP argument is the value that is
passed.
For example:
.
.TP 4
mpirun -mca btl tcp,self -np 1 foo
Tells Open MPI to use the "tcp" and "self" BTLs, and to run a single copy of
"foo" on an allocated node.
.
.TP
mpirun -mca btl self -np 1 foo
Tells Open MPI to use the "self" BTL, and to run a single copy of "foo" on an
allocated node.
.\" And so on.  Open MPI's BTL MCA modules are described in ompimca_btl(7).
.PP
The \fI-mca\fP switch can be used multiple times to specify different
\fI<key>\fP and/or \fI<value>\fP arguments.  If the same \fI<key>\fP is
specified more than once, the \fI<value>\fPs are concatenated with a comma
(",") separating them.
.PP
Note that the \fI-mca\fP switch is simply a shortcut for setting environment variables.
The same effect may be accomplished by setting corresponding environment
variables before running \fImpirun\fP.
The form of the environment variables that Open MPI sets is:

      OMPI_MCA_<key>=<value>
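.PP
For example, the following two command lines are equivalent in effect:

      \fB%\fP mpirun -mca btl tcp,self -np 1 foo
      \fB%\fP OMPI_MCA_btl=tcp,self mpirun -np 1 foo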
.PP
Thus, the \fI-mca\fP switch overrides any previously set environment
variables.  The \fI-mca\fP settings similarly override MCA parameters set
in the
$OPAL_PREFIX/etc/openmpi-mca-params.conf or $HOME/.openmpi/mca-params.conf
file.
.
.PP
Unknown \fI<key>\fP arguments are still set as
environment variables -- they are not checked (by \fImpirun\fP) for correctness.
Illegal or incorrect \fI<value>\fP arguments may or may not be reported -- it
depends on the specific MCA module.
.PP
To find the available component types under the MCA architecture, or to find the
available parameters for a specific component, use the \fIompi_info\fP command.
See the \fIompi_info(1)\fP man page for detailed information on the command.
.
1143.\" **************************
1144.\"    Examples Section
1145.\" **************************
1146.SH EXAMPLES
1147Be sure also to see the examples throughout the sections above.
1148.
1149.TP 4
1150mpirun -np 4 -mca btl ib,tcp,self prog1
1151Run 4 copies of prog1 using the "ib", "tcp", and "self" BTL's for the transport
1152of MPI messages.
1153.
1154.
1155.TP 4
1156mpirun -np 4 -mca btl tcp,sm,self
1157.br
1158--mca btl_tcp_if_include ce0 prog1
1159.br
1160Run 4 copies of prog1 using the "tcp", "sm" and "self" BTLs for the transport of
1161MPI messages, with TCP using only the ce0 interface to communicate.  Note that
1162other BTLs have similar if_include MCA parameters.
1163.
1164.\" **************************
1165.\"    Diagnostics Section
1166.\" **************************
1167.
1168.\" .SH DIAGNOSTICS
1169.\".TP 4
1170.\"Error Msg:
1171.\"Description
1172.
1173.\" **************************
1174.\"    Return Value Section
1175.\" **************************
1176.
1177.SH RETURN VALUE
1178.
1179\fImpirun\fP returns 0 if all ranks started by \fImpirun\fP exit after calling
1180MPI_FINALIZE.  A non-zero value is returned if an internal error occurred in
1181mpirun, or one or more ranks exited before calling MPI_FINALIZE.  If an
1182internal error occurred in mpirun, the corresponding error code is returned.
1183In the event that one or more ranks exit before calling MPI_FINALIZE, the
1184return value of the rank of the process that \fImpirun\fP first notices died
1185before calling MPI_FINALIZE will be returned.  Note that, in general, this will
1186be the first rank that died but is not guaranteed to be so.
1187.
1188.\" **************************
1189.\"    See Also Section
1190.\" **************************
1191.
1192.\" .SH SEE ALSO
1193.\" orted(1), ompi-server(1)