A program containing OpenMP Fortran API compiler directives begins execution as a single process, called the master thread of execution. The master thread executes sequentially until the first parallel construct is encountered.
In OpenMP Fortran API, the PARALLEL and END PARALLEL directives define the parallel construct. When the master thread encounters a parallel construct, it creates a team of threads, with the master thread becoming the master of the team. The program statements enclosed by the parallel construct are executed in parallel by each thread in the team. These statements include routines called from within the enclosed statements.
The statements enclosed lexically within a construct define the static extent of the construct. The dynamic extent also includes the routines called from within the construct. When the END PARALLEL directive is encountered, the threads in the team synchronize at that point, the team is dissolved, and only the master thread continues execution. The other threads in the team enter a wait state.
You can specify any number of parallel constructs in a single program. As a result, thread teams can be created and dissolved many times during program execution.
In routines called from within parallel constructs, you can also use
directives. Directives that are not in the lexical extent of the
parallel construct, but are in the dynamic extent, are called orphaned
directives. Orphaned directives allow you to execute major portions of
your program in parallel with only minimal changes to the sequential
version of the program. Using this functionality, you can code parallel
constructs at the top levels of your program call tree and use
directives to control execution in any of the called routines.
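For example, a DO directive can appear in a subroutine that is called from a parallel region; the directive is orphaned, but it still divides the loop iterations among the team. The following is a minimal sketch; the subroutine, common block, and variable names are illustrative only:

```fortran
!$OMP PARALLEL
      CALL PHASE1              ! PHASE1 executes in the dynamic extent
!$OMP END PARALLEL
      END

      SUBROUTINE PHASE1
      COMMON /WORKDATA/ A(1000), B(1000), N
!$OMP DO                       ! orphaned directive: in the dynamic extent
      DO I = 1, N              ! of the parallel region, but outside its
         A(I) = 2.0 * B(I)     ! lexical extent
      END DO
!$OMP END DO
      END
```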
6.1.5 Controlling the Data Environment
The following sections describe how you control the data environment within parallel and worksharing constructs by using directives and data environment clauses on those directives.
You can make named common blocks private to a thread, but global within the thread by using the THREADPRIVATE directive.
Each thread gets its own copy of the common block with the result that data written to the common block by one thread is not directly visible to other threads. During serial portions and MASTER sections of the program, accesses are to the master thread copy of the common block.
You cannot use a thread private common block or its constituent variables in any clause other than the COPYIN clause.
In the following example, common blocks BLK1 and FIELDS are specified as thread private:
      COMMON /BLK1/ SCRATCH
      COMMON /FIELDS/ XFIELD, YFIELD, ZFIELD
!$OMP THREADPRIVATE(/BLK1/,/FIELDS/)
You can use several directive clauses to control the data scope attributes of variables for the duration of the construct in which you specify them. If you do not specify a data scope attribute clause on a directive, the default is SHARED for those variables affected by the directive.
Each of the data scope attribute clauses accepts a list, which is a comma-separated list of named variables or named common blocks that are accessible in the scoping unit. When you specify named common blocks, they must appear between slashes (/name/).
Not all of the clauses are allowed on all directives, but the directives to which each clause applies are listed in the clause descriptions.
The data scope attribute clauses are: COPYIN, DEFAULT, PRIVATE, FIRSTPRIVATE, LASTPRIVATE, REDUCTION, and SHARED.
Use the COPYIN clause on the PARALLEL, PARALLEL DO, and PARALLEL SECTIONS directives to copy the data in the master thread common block to the thread private copies of the common block. The copy occurs at the beginning of the parallel region. The COPYIN clause applies only to common blocks that have been declared THREADPRIVATE (see Section 6.1.5.1).
You do not have to specify a whole common block to be copied in; you can specify named variables that appear in the THREADPRIVATE common block. In the following example, the common blocks BLK1 and FIELDS are specified as thread private, but only one of the variables in common block FIELDS is specified to be copied in:
      COMMON /BLK1/ SCRATCH
      COMMON /FIELDS/ XFIELD, YFIELD, ZFIELD
!$OMP THREADPRIVATE(/BLK1/, /FIELDS/)
!$OMP PARALLEL DEFAULT(PRIVATE),COPYIN(/BLK1/,ZFIELD)
Use the DEFAULT clause on the PARALLEL, PARALLEL DO, and PARALLEL SECTIONS directives to specify a default data scope attribute for all variables within the lexical extent of a parallel region. Variables in THREADPRIVATE common blocks are not affected by this clause. You can specify only one DEFAULT clause on a directive. The default data scope attribute can be one of the following:

- PRIVATE — Makes all named objects in the lexical extent of the parallel region private to a thread
- SHARED — Makes all named objects in the lexical extent of the parallel region shared among all the threads in the team
- NONE — Declares no default data scope attribute; you must explicitly scope each variable used in the lexical extent of the parallel region
If you do not specify the DEFAULT clause, the default is DEFAULT(SHARED). However, loop control variables are always PRIVATE by default.
You can exempt variables from the default data scope attribute by using other scope attribute clauses on the parallel region as shown in the following example:
!$OMP PARALLEL DO DEFAULT(PRIVATE), FIRSTPRIVATE(I), SHARED(X),
!$OMP& SHARED(R) LASTPRIVATE(I)
Use the PRIVATE clause on the PARALLEL, DO, SECTIONS, SINGLE, PARALLEL DO, and PARALLEL SECTIONS directives to declare variables to be private to each thread in the team.
The behavior of variables declared PRIVATE is as follows:

- A new object of the same type and size is created for each thread in the team, and the new object is no longer storage associated with the original object.
- All references to the original object in the lexical extent of the construct are replaced with references to the private object.
- Private objects are undefined on entry to the construct, and the corresponding original objects are undefined on exit from the construct.
In the following example, the values of I and J are undefined on exit from the parallel region:
      INTEGER I, J
      I = 1
      J = 2
!$OMP PARALLEL PRIVATE(I) FIRSTPRIVATE(J)
      I = 3
      J = J + 2
!$OMP END PARALLEL
      PRINT *, I, J
Use the FIRSTPRIVATE clause on the PARALLEL, DO, SECTIONS, SINGLE, PARALLEL DO, and PARALLEL SECTIONS directives to provide a superset of the PRIVATE clause functionality.
In addition to the PRIVATE clause functionality, private copies of the variables are initialized from the original object existing before the parallel construct.
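The following sketch (the variable name is illustrative) shows the difference from PRIVATE: each thread's private copy of J begins with the value J had before the construct, rather than being undefined:

```fortran
      J = 10
!$OMP PARALLEL FIRSTPRIVATE(J)
      J = J + 1                ! each thread's private copy starts at 10
!$OMP END PARALLEL
```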
Use the LASTPRIVATE clause on the DO, SECTIONS, PARALLEL DO, and PARALLEL SECTIONS directives to provide a superset of the PRIVATE clause functionality.
When the LASTPRIVATE clause appears on a DO or PARALLEL DO directive, the thread that executes the sequentially last iteration updates the version of the object it had before the construct.
When the LASTPRIVATE clause appears on a SECTIONS or PARALLEL SECTIONS directive, the thread that executes the lexically last section updates the version of the object it had before the construct.
Subobjects that are not assigned a value by the last iteration of the DO loop or the lexically last SECTION directive are undefined after the construct.
Correct execution sometimes depends on the value that the last iteration of a loop assigns to a variable. You must list all such variables as arguments to a LASTPRIVATE clause so that the values of the variables are the same as when the loop is executed sequentially. As shown in the following example, the value of I at the end of the parallel region is equal to N+1, as it would be with sequential execution.
!$OMP PARALLEL
!$OMP DO LASTPRIVATE(I)
      DO I=1,N
         A(I) = B(I) + C(I)
      END DO
!$OMP END PARALLEL
      CALL REVERSE(I)
Use the REDUCTION clause on the PARALLEL, DO, SECTIONS, PARALLEL DO, and PARALLEL SECTIONS directives to perform a reduction on the specified variables by using an operator or intrinsic as shown.
REDUCTION ( {operator | intrinsic} : list )
Operator can be one of the following: +, *, -, .AND., .OR., .EQV., or .NEQV..
Intrinsic can be one of the following: MAX, MIN, IAND, IOR, or IEOR.
The specified variables must be named scalar variables of intrinsic type and must be SHARED in the enclosing context. A private copy of each specified variable is created for each thread as if you had used the PRIVATE clause. The private copy is initialized to a value that depends on the operator or intrinsic as shown in Table 6-2.
At the end of the construct to which the reduction applies, the shared variable is updated to reflect the result of combining the original value of the SHARED reduction variable with the final value of each of the private copies using the specified operator.
Except for subtraction, all of the reduction operators are associative and the compiler can freely reassociate the computation of the final value. The partial results of a subtraction reduction are added to form the final value.
The value of the shared variable becomes undefined when the first thread reaches the clause containing the reduction, and it remains undefined until the reduction computation is complete. Normally, the computation is complete at the end of the REDUCTION construct. However, if you use the REDUCTION clause on a construct to which NOWAIT is also applied, the shared variable remains undefined until a barrier synchronization has been performed. This ensures that all of the threads have completed the REDUCTION clause.
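For example, if NOWAIT appears on the END DO directive, a BARRIER directive can be used before the shared variable is read. This is a sketch; the variable and array names are illustrative:

```fortran
!$OMP PARALLEL SHARED(SUM)
!$OMP DO REDUCTION(+: SUM)
      DO I = 1, N
         SUM = SUM + A(I)
      END DO
!$OMP END DO NOWAIT
!     SUM is undefined here until all threads complete the reduction
!$OMP BARRIER
!     after the barrier, SUM holds the final reduced value
!$OMP END PARALLEL
```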
The REDUCTION clause is intended to be used on a region or worksharing construct in which the reduction variable is used only in reduction statements having one of the following forms:
      x = x operator expr
      x = expr operator x  (except for subtraction)
      x = intrinsic (x, expr)
      x = intrinsic (expr, x)
Some reductions can be expressed in other forms. For instance, a MAX reduction might be expressed as follows:
      IF (x .LT. expr) x = expr
Alternatively, the reduction might be hidden inside a subroutine call. Be careful that the operator specified in the REDUCTION clause matches the reduction operation.
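Written with an explicit REDUCTION clause instead, the same computation might look like this sketch (the variable names are illustrative):

```fortran
!$OMP PARALLEL DO REDUCTION(MAX: XMAX)
      DO I = 1, N
         IF (X(I) .GT. XMAX) XMAX = X(I)
      END DO
!$OMP END PARALLEL DO
```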
Table 6-2 lists the valid operators and intrinsics and their canonical initialization values. The actual initialization value will be consistent with the data type of the reduction variable.
Operator/Intrinsic | Initialization
-------------------|-------------------------------
+                  | 0
*                  | 1
-                  | 0
.AND.              | .TRUE.
.OR.               | .FALSE.
.EQV.              | .TRUE.
.NEQV.             | .FALSE.
MAX                | Smallest representable number
MIN                | Largest representable number
IAND               | All bits on
IOR                | 0
IEOR               | 0
Any number of reduction clauses can be specified on the directive, but a variable can appear only once in a REDUCTION clause for that directive as shown in the following example:
!$OMP DO REDUCTION(+: A, Y), REDUCTION(.OR.: AM)
The following example shows how to use the REDUCTION clause:
!$OMP PARALLEL DO DEFAULT(PRIVATE),SHARED(A,B),REDUCTION(+: A,B)
      DO I=1,N
         CALL WORK(ALOCAL,BLOCAL)
         A = A + ALOCAL
         B = B + BLOCAL
      END DO
!$OMP END PARALLEL DO
Use the SHARED clause on the PARALLEL, PARALLEL DO, and PARALLEL SECTIONS directives to make variables shared among all the threads in a team.
In the following example, the variables X and NPOINTS are shared among all the threads in the team:
!$OMP PARALLEL DEFAULT(PRIVATE),SHARED(X,NPOINTS)
      IAM = OMP_GET_THREAD_NUM()
      NP = OMP_GET_NUM_THREADS()
      IPOINTS = NPOINTS/NP
      CALL SUBDOMAIN(X,IAM,IPOINTS)
!$OMP END PARALLEL
A parallel region is a block of code that must be executed by a team of threads in parallel. The PARALLEL and END PARALLEL directives define a parallel region as follows:
!$OMP PARALLEL
      !parallel region
!$OMP END PARALLEL
When a thread encounters a parallel region, it creates a team of threads and becomes the master of the team. The master thread is also a member of the team. You can control the number of threads in a team by the use of an environment variable or a run-time library call, or both. For more information about environment variables, see Table 6-4. For more information about run-time library routines, see Appendix D.
Once a team has been created, the number of threads in it remains constant for the duration of that parallel region. However, you can explicitly change the number of threads used in the next parallel region by calling the OMP_SET_NUM_THREADS run-time library routine from a serial portion of the program. This routine overrides any value you may have set using the OMP_NUM_THREADS environment variable.
Assuming you have used the OMP_NUM_THREADS environment variable to set the number of threads to 6, you can change the number of threads between parallel regions as follows:
      CALL OMP_SET_NUM_THREADS(3)
!$OMP PARALLEL
      .
      .
      .
!$OMP END PARALLEL
      CALL OMP_SET_NUM_THREADS(4)
!$OMP PARALLEL DO
      .
      .
      .
!$OMP END PARALLEL DO
The statements enclosed lexically within a parallel region define the static extent of the region.
In the following example, the !$OMP DO and !$OMP END DO directives and all the statements enclosed by them comprise the static extent of the parallel region:
!$OMP PARALLEL
!$OMP DO
      DO I=1,N
         B(I) = (A(I) + A(I-1)) / 2.0
      END DO
!$OMP END DO
!$OMP END PARALLEL
The statements enclosed by the parallel region, including routines called from within the enclosed statements, define the dynamic extent of the parallel region.
In the following example, the !$OMP DO and !$OMP END DO directives and all the statements enclosed by them, including all statements contained in the WORK subroutine, comprise the dynamic extent of the parallel region:
!$OMP PARALLEL DEFAULT(SHARED)
!$OMP DO
      DO I = 1, N
         CALL WORK(I,N)
      END DO
!$OMP END DO
!$OMP END PARALLEL
When an IF clause is present on the PARALLEL directive, the enclosed code region is executed in parallel only if the scalar logical expression evaluates to .TRUE.. Otherwise, the parallel region is serialized. When there is no IF clause, the region is executed in parallel by default.
In the following example, the statements enclosed within the !$OMP DO and !$OMP END DO directives are executed in parallel only if more than three processors are available. Otherwise, the statements are executed serially:
!$OMP PARALLEL IF (OMP_GET_NUM_PROCS() .GT. 3)
!$OMP DO
      DO I=1,N
         Y(I) = SQRT(Z(I))
      END DO
!$OMP END DO
!$OMP END PARALLEL
If a thread executing a parallel region encounters another parallel region, it creates a new team and becomes the master of that new team. By default, nested parallel regions are always executed by a team of one thread.
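The following sketch shows the default behavior; each thread that encounters the inner PARALLEL directive becomes the master of a new team containing only itself:

```fortran
!$OMP PARALLEL               ! outer region: executed by the full team
      .
      .
!$OMP PARALLEL               ! inner region: by default, each new team
      .                      ! contains only the encountering thread
      .
!$OMP END PARALLEL
!$OMP END PARALLEL
```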
To achieve better performance than sequential execution, a parallel
region must contain one or more worksharing constructs so that the team
of threads can execute work in parallel. It is the contained
worksharing constructs that lead to the performance enhancements
offered by parallel processing. See Section 6.1.7 for information about
worksharing constructs.
6.1.7 Worksharing Constructs
The concept of the worksharing construct is the heart of parallel processing. A worksharing construct divides the execution of the enclosed code region among the members of the team created upon entering the enclosing parallel region construct.
A worksharing construct must be enclosed dynamically within a parallel region if the worksharing directive is to execute in parallel. No new threads are launched and there is no implied barrier upon entry to a worksharing construct.
The worksharing constructs are the DO directive, the SECTIONS directive, and the SINGLE directive.
The DO directive specifies that the iterations of the immediately following DO loop must be dispatched across the existing team of threads so that each iteration is executed by a single thread. The loop that follows a DO directive cannot be a DO WHILE or a DO loop that does not have loop control.
You cannot use a GOTO statement, or any other statement, to transfer control into or out of the DO construct.
If you specify the optional END DO directive, it must appear immediately after the end of the DO loop. If you do not specify the END DO directive, an END DO directive is assumed at the end of the DO loop.
The loop iteration variable is private by default, so it is not necessary to declare it explicitly.
If you do not specify the optional NOWAIT clause on the END DO directive, threads synchronize at the END DO directive. If you specify NOWAIT, threads do not synchronize, and threads that finish early proceed directly to the instructions following the END DO directive.
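For example, when the iterations of a second loop do not depend on the results of a first loop, specifying NOWAIT removes the synchronization between them. This is a sketch; the array names are illustrative:

```fortran
!$OMP PARALLEL
!$OMP DO
      DO I = 1, N
         B(I) = SQRT(A(I))
      END DO
!$OMP END DO NOWAIT          ! threads proceed without synchronizing
!$OMP DO
      DO J = 1, M
         C(J) = 2.0 * D(J)
      END DO
!$OMP END DO
!$OMP END PARALLEL
```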
The DO directive optionally lets you:

- Control the data scope attributes of variables for the duration of the loop
- Specify a schedule type and chunk size for dispatching the loop iterations
Controlling Data Scope Attributes
For information about controlling data scope attributes, see Section 6.1.5.2.
Specifying Schedule Type and Chunk Size
The SCHEDULE clause specifies a scheduling algorithm that determines how iterations of the DO loop are divided among and dispatched to the threads of the team. The SCHEDULE clause applies only to the current DO or PARALLEL DO directive.
Within the SCHEDULE clause, you must specify a schedule type and optionally, a chunk size. Chunk must be a scalar integer expression.
The following list describes the schedule types and how the chunk size affects scheduling:

- STATIC — Iterations are divided into chunks of the specified size, and the chunks are statically assigned to the threads of the team in a round-robin fashion in thread-number order. If you do not specify a chunk size, the iterations are divided among the threads in contiguous chunks of approximately equal size, one chunk per thread.
- DYNAMIC — Iterations are divided into chunks of the specified size and dynamically dispatched to threads; as a thread finishes one chunk, it is dispatched another. If you do not specify a chunk size, the default is 1.
- GUIDED — Chunks of decreasing size are dynamically dispatched to threads; the chunk size specifies the minimum number of iterations to dispatch each time (the default is 1).
- RUNTIME — The schedule type and chunk size are deferred until run time and are determined by the setting of the OMP_SCHEDULE environment variable. You cannot specify a chunk size with the RUNTIME schedule type.
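For example, the following sketch dispatches the loop iterations dynamically in chunks of four iterations; the loop body and the WORK subroutine are illustrative:

```fortran
!$OMP PARALLEL
!$OMP DO SCHEDULE(DYNAMIC,4)
      DO I = 1, N
         CALL WORK(I)         ! iterations vary in cost, so dynamic
      END DO                  ! scheduling balances the load
!$OMP END DO
!$OMP END PARALLEL
```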
You can determine the schedule type used for the current DO or PARALLEL DO directive by using the following prioritized list. The available schedule type closest to the top of the list is used:

1. The schedule type specified in the SCHEDULE clause of the current DO or PARALLEL DO directive
2. For the RUNTIME schedule type, the schedule type specified in the OMP_SCHEDULE environment variable
3. The default schedule type of STATIC
You can determine the chunk size used for the current DO or PARALLEL DO directive by using the following prioritized list. The available chunk size closest to the top of the list is used:

1. The chunk size specified in the SCHEDULE clause of the current DO or PARALLEL DO directive
2. For the RUNTIME schedule type, the chunk size specified in the OMP_SCHEDULE environment variable
3. The default chunk size of 1
Use the noniterative worksharing SECTIONS directive to divide the enclosed sections of code among the team. Each section is executed just one time by one thread.
Precede each section with a SECTION directive. However, the SECTION directive is optional for the first section. The SECTION directive must appear within the lexical extent of the SECTIONS and END SECTIONS directives.
The last section ends at the END SECTIONS directive. When a thread completes its section and there are no remaining undispatched sections, it waits at the END SECTIONS directive unless you specify NOWAIT.
The following example shows how to use the SECTIONS and SECTION directives to execute subroutines XAXIS, YAXIS, and ZAXIS in parallel. The first SECTION directive is optional:
!$OMP PARALLEL
!$OMP SECTIONS
!$OMP SECTION
      CALL XAXIS
!$OMP SECTION
      CALL YAXIS
!$OMP SECTION
      CALL ZAXIS
!$OMP END SECTIONS
!$OMP END PARALLEL
For information about controlling the data scope attributes, see Section 6.1.5.2.