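PGI Compiler Options for Compilation and Optimization Information
This page describes the basic options for obtaining compiler messages and information about the optimizations and parallelization performed when using the PGI F77, F2003, C, and C++ compilers. The examples below mainly use pgfortran, but the options are specified in the same way for the other language compilers.
Last updated February 2, 2012. Copyright © 株式会社ソフテック (SofTek)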
-Minfo prints the optimizations the compiler applied, grouped by routine and source line number:
$ pgfortran -fastsse -Mvect=prefetch -Minfo Test.f
initia:
    195, Loop unrolled 8 times
    204, Loop unrolled 8 times
fielde:
     56, Interchange produces reordered loop nest: 57, 56
     90, 1 loop-carried redundant expression removed with 3 operations
         and 4 arrays
    231, Unrolling inner loop 8 times
         Generated prefetch instructions for 3 loads
    239, Unrolling inner loop 8 times
         Generated prefetch instructions for 2 loads
bounde:
    269, Unrolling inner loop 8 times
         Generated prefetch instructions for 2 loads
    360, Generating vector sse code for inner loop
    383, Generating vector sse code for inner loop
fieldh:
    485, Unrolling inner loop 8 times
         Generated prefetch instructions for 4 loads
boundh:
    515, Generating vector sse code for inner loop
         Generated prefetch instructions for 2 loads
array:
    654, Unrolling inner loop 8 times
    665, Unrolling inner loop 8 times
    676, Unrolling inner loop 8 times
    705, Generating vector sse code for inner loop
    720, Generating vector sse code for inner loop
    735, Generating vector sse code for inner loop
    770, Generating vector sse code for inner loop
    775, Generating vector sse code for inner loop
build_xxx:
    796, Generating vector sse code for inner loop
         Generated prefetch instructions for 3 loads
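-Minfo sub-options:
    all        Interpreted as specifying all of the sub-options below
    accel      Information about PGI Accelerator optimizations
    ccff       Append optimization information to the object file in common compiler feedback format
    ftn        Enable Fortran-specific information
    lre        Enable LRE information
    inline     Information about inlining
    intensity  Print the compute intensity of loops
    ipa        IPA optimization information
    loop       Information about loops, such as vectorization
    mp         Information about OpenMP parallelization
    par        Information about parallelization
    opt        Information about optimization
    pfo        Information about profile feedback optimization
    time       Print compile-time statistics
    unroll     Unroll optimization information
    vect       Information about vectorization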
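The -Minfo feedback shown above has a regular shape: a routine name ending in a colon, then messages that start with a source line number and a comma, sometimes followed by extra lines belonging to the same source line. For a large build it can help to collect this feedback per routine. The following is a minimal Python sketch of such a collector, not a PGI tool; the function name parse_minfo is hypothetical, and it assumes that no continuation line ends in a colon.

```python
import re

# A message line starts with a source line number followed by a comma.
_MESSAGE = re.compile(r"^(\d+),\s*(.*)$")


def parse_minfo(text):
    """Group -Minfo style feedback as {routine: [(line_number, [text, ...]), ...]}."""
    routines = {}
    current = None
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("$"):
            continue  # skip blank lines and the echoed command line
        match = _MESSAGE.match(line)
        if match:
            if current is None:
                # Message seen before any routine header (not expected).
                current = "<unknown>"
                routines.setdefault(current, [])
            routines[current].append((int(match.group(1)), [match.group(2)]))
        elif line.endswith(":"):
            # Routine header such as "fielde:".
            current = line[:-1]
            routines.setdefault(current, [])
        elif current is not None and routines[current]:
            # Extra line for the most recent source line number.
            routines[current][-1][1].append(line)
    return routines


if __name__ == "__main__":
    sample = """fielde:
56, Interchange produces reordered loop nest: 57, 56
231, Unrolling inner loop 8 times
Generated prefetch instructions for 3 loads
"""
    for routine, messages in parse_minfo(sample).items():
        for number, texts in messages:
            print(routine, number, "; ".join(texts))
```

The sketch strips leading whitespace before matching, so it works whether or not the captured output still carries the compiler's indentation.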
-Mneginfo reports the loops that could not be optimized, together with the reason:
$ pgfortran -fastsse -Mneginfo Test.f
main:
     67, Loop not vectorized: contains call
initia:
    195, Loop not vectorized due to data dependency
bound:
    306, Loop not vectorized: contains call
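The message "Loop not vectorized due to data dependency" indicates that one iteration of the loop uses a value produced by another iteration. Vectorization executes several iterations together, which is only safe when the order of the iterations does not affect the result. The Python sketch below illustrates that idea by running two loops in forward and in reverse order. Both loop bodies are hypothetical and are not taken from Test.f.

```python
def independent(b, c, order):
    """a(i) = b(i) + 2*c(i): each iteration touches only its own element."""
    a = [0.0] * len(b)
    for i in order:
        a[i] = b[i] + 2.0 * c[i]
    return a


def recurrence(a0, b, order):
    """a(i) = a(i-1) + b(i): each iteration reads the previous iteration's result."""
    a = list(a0)
    for i in order:
        if i > 0:
            a[i] = a[i - 1] + b[i]
    return a


if __name__ == "__main__":
    n = 4
    forward = list(range(n))
    backward = forward[::-1]
    b = [0.0, 1.0, 2.0, 3.0]
    c = [1.0, 1.0, 1.0, 1.0]
    a0 = [1.0, 0.0, 0.0, 0.0]

    # Order does not matter: safe to execute iterations together.
    print(independent(b, c, forward) == independent(b, c, backward))
    # Order matters: a loop-carried dependency.
    print(recurrence(a0, b, forward) == recurrence(a0, b, backward))
```

A loop of the second kind usually has to be restructured in the source before the compiler can vectorize it.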
When -Mconcur (automatic parallelization) is added, -Minfo also reports which loops were parallelized:
$ pgfortran -fastsse -Mconcur -Minfo Test.f
initia:
    195, Loop unrolled 8 times
    204, Parallel code activated if loop count >= 100; block distribution
         Loop unrolled 8 times
fielde:
    230, Parallel code for non-innermost loop generated; block distribution
    231, Unrolling inner loop 8 times
    238, Parallel code for non-innermost loop generated; block distribution
    239, Unrolling inner loop 8 times
bounde:
    269, Parallel code activated if loop count >= 100; block distribution
         Unrolling inner loop 8 times
    279, Parallel code activated if loop count >= 100; block distribution
    295, Parallel code activated if loop count >= 100; block distribution
    302, Parallel code activated if loop count >= 100; block distribution
    321, Parallel code activated if loop count >= 100; block distribution
    337, Parallel code activated if loop count >= 100; block distribution
    346, Parallel code activated if loop count >= 100; block distribution
    360, Parallel code activated if loop count >= 100; block distribution
         Generating vector sse code for inner loop
    369, Parallel code activated if loop count >= 100; block distribution
    383, Parallel code activated if loop count >= 100; block distribution
         Generating vector sse code for inner loop
fieldh:
    484, Parallel code for non-innermost loop generated; block distribution
    485, Unrolling inner loop 8 times
boundh:
    515, Parallel code activated if loop count >= 100; block distribution
         Generating vector sse code for inner loop
array:
    653, Parallel code for non-innermost loop generated; block distribution
    654, Unrolling inner loop 8 times
    664, Parallel code for non-innermost loop generated; block distribution
    665, Unrolling inner loop 8 times
    675, Parallel code for non-innermost loop generated; block distribution
    676, Unrolling inner loop 8 times
    696, Parallel code for non-innermost loop generated; block distribution
    705, Parallel code activated if loop count >= 100; block distribution
         Generating vector sse code for inner loop
    711, Parallel code for non-innermost loop generated; block distribution
    720, Parallel code activated if loop count >= 100; block distribution
         Generating vector sse code for inner loop
    726, Parallel code for non-innermost loop generated; block distribution
    735, Parallel code activated if loop count >= 100; block distribution
         Generating vector sse code for inner loop
    770, Parallel code activated if loop count >= 100; block distribution
         Generating vector sse code for inner loop
    775, Parallel code activated if loop count >= 100; block distribution
         Generating vector sse code for inner loop
build_xxx:
    796, Parallel code activated if loop count >= 100; block distribution
         Generating vector sse code for inner loop
pml:
    829, Parallel code for non-innermost loop generated; block distribution
    850, Parallel code for non-innermost loop generated; block distribution
    871, Parallel code for non-innermost loop generated; block distribution
    892, Parallel code for non-innermost loop generated; block distribution
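Two phrases recur in the -Mconcur messages above: "block distribution" and "activated if loop count >= 100". The Python sketch below shows one way to read them: the iterations are split into contiguous blocks, one per thread, and the parallel version is used only when the loop is long enough to be worth it. This is an illustration of the concept only. The function names are hypothetical, and how the PGI runtime actually assigns leftover iterations is an assumption here.

```python
def block_ranges(n_iterations, n_threads):
    """Split range(n_iterations) into contiguous blocks, one per thread."""
    base, extra = divmod(n_iterations, n_threads)
    ranges = []
    start = 0
    for thread in range(n_threads):
        # Assumption: the first `extra` threads each take one leftover iteration.
        size = base + (1 if thread < extra else 0)
        ranges.append((start, start + size))
        start += size
    return ranges


def schedule(n_iterations, n_threads, min_count=100):
    """Use the parallel split only when the loop count reaches min_count."""
    if n_iterations < min_count:
        return [(0, n_iterations)]  # serial version: one block, one thread
    return block_ranges(n_iterations, n_threads)


if __name__ == "__main__":
    print(schedule(50, 4))   # short loop stays serial
    print(schedule(100, 4))  # long enough: four contiguous blocks
```

Each pair is a half-open range (start, end) of zero-based iteration indices.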
-Minform sets the minimum severity level of the messages the compiler prints:
$ pgcc -c t.c                     (by default, all messages are printed)
PGC-W-0119-void function main cannot return value (t.c: 13)   <== warning message
PGC/x86-64 Linux 12.4-0: compilation completed with warnings
$ pgcc -c -Minform=severe t.c     (only Severe and Fatal messages are printed)
PGC/x86-64 Linux 12.4-0: compilation completed with warnings
-Mlist generates a listing file of the source:
$ pgfortran -fastsse -Mlist -Minfo mat.f90
PGF90 (Version 11.10) 02/03/2012 16:56:41 page 1
Switches: -noasm -nodclchk -nodebug -nodlines -noline -list
-inform severe -opt 2 -nosave -object -noonetrip
-depchk on -nostandard
-nosymbol -noupcase
Filename: mat.f90
( 1) program mat
( 2) integer i, j, k, size, l, m, n
( 3) parameter (size=16000) ! >2GB
( 4) parameter (m=size,n=size)
( 5) real*8 a(m,n),b(m,n),c(m,n),d
( 6)
( 7) do i = 1, m
( 8) do j = 1, n
( 9) a(i,j)=10000.0D0*dble(i)+dble(j)
( 10) b(i,j)=20000.0D0*dble(i)+dble(j)
( 11) enddo
( 12) enddo
-Manno annotates the generated assembly code with the corresponding source lines:
$ pgfortran -fastsse -Manno -S Test.f
Alternatively:
$ pgcc -fastsse -Manno -Mkeepasm Test.c (for C, add -Mkeepasm)
.LB1_836:
# lineno: 151
# DO 60 j = 1,n
# a(j) = b(j) + scalar*c(j)
# 60 CONTINUE
movapd %xmm0, %xmm1
movapd (%esi,%ecx), %xmm2
movapd 16(%esi,%ecx), %xmm3
subl $8, %eax
mulpd %xmm1, %xmm2
mulpd %xmm1, %xmm3
addpd (%edi,%ecx), %xmm2
addpd 16(%edi,%ecx), %xmm3
movapd %xmm2, (%edx,%ecx)
movapd 32(%esi,%ecx), %xmm2
movapd %xmm3, 16(%edx,%ecx)
mulpd %xmm1, %xmm2
mulpd 48(%esi,%ecx), %xmm1
addpd 32(%edi,%ecx), %xmm2
addpd 48(%edi,%ecx), %xmm1
movapd %xmm2, 32(%edx,%ecx)
movapd %xmm1, 48(%edx,%ecx)
addl $64, %ecx
testl %eax, %eax
jg .LB1_836
# lineno: 154
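In the annotated assembly above, each pass through the loop issues four packed-double multiply/add pairs (mulpd, addpd), each covering two doubles. It then decrements the counter by 8 (subl $8, %eax) and advances the byte offset by 64 (addl $64, %ecx). In other words, one pass appears to handle eight elements of a(j) = b(j) + scalar*c(j). The Python sketch below mirrors that structure for readability. It is not a translation of the compiler's code, and the remainder loop for leftover elements is an assumption, since the excerpt does not show it.

```python
def triad_reference(b, c, scalar):
    """Straightforward form of the source loop: a(j) = b(j) + scalar*c(j)."""
    return [bj + scalar * cj for bj, cj in zip(b, c)]


def triad_unrolled(b, c, scalar, width=2, unroll=4):
    """Same computation, 4 packed operations of 2 doubles each per pass."""
    n = len(b)
    a = [0.0] * n
    step = width * unroll  # 8 elements = 64 bytes of doubles per pass
    j = 0
    while n - j >= step:
        for k in range(0, step, width):
            # One packed operation: multiply by scalar, add b, store to a.
            for lane in range(width):
                idx = j + k + lane
                a[idx] = b[idx] + scalar * c[idx]
        j += step
    # Assumed remainder loop for the elements that do not fill a full pass.
    while j < n:
        a[j] = b[j] + scalar * c[j]
        j += 1
    return a


if __name__ == "__main__":
    b = [float(i) for i in range(19)]
    c = [2.0 * i for i in range(19)]
    print(triad_unrolled(b, c, 3.0) == triad_reference(b, c, 3.0))
```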
$ pgfortran -fastsse -flags -Minfo -Mlist mat.f90
Reading rcfile /usr/pgi/linux86-64/11.10/bin/.pgfortranrc
-M[no]list Generate a listing file
-fastsse == -fast
-fast Common optimizations; includes -O2 -Munroll=c:1 -Mnoframe -Mlre -Mautoinline
== -Mvect=sse -Mscalarsse -Mcache_align -Mflushz -Mpre
-M[no]vect[=[no]altcode|[no]assoc|cachesize:<c>|[no]fuse|[no]gather|[no]idiom|levels:<n>|[no]partial
|prefetch|[no]short|[no]simd|[no]sizelimit[:n]|[no]sse|[no]tile|[no]uniform]
Control automatic vector pipelining
[no]altcode Generate appropriate alternative code for vectorized loops
[no]assoc Allow [disallow] reassociation
cachesize:<c> Optimize for cache size c
[no]fuse Enable [disable] loop fusion
[no]gather Enable [disable] vectorization of indirect array references
[no]idiom Enable [disable] idiom recognition
levels:<n> Maximum nest level of loops to optimize
[no]partial Enable [disable] partial loop vectorization via inner loop distribution
prefetch Generate prefetch instructions
[no]short Enable [disable] short vector operations
[no]simd Generate [don't generate] SIMD instructions
128 Use 128-bit SIMD instructions
256 Use 256-bit SIMD instructions
[no]sizelimit[:n]
Limit size of vectorized loops
[no]sse Generate [don't generate] SSE instructions
[no]tile Enable [disable] loop tiling
[no]uniform Perform consistent optimizations in both vectorized and residual loops;
this may affect the performance of the residual loop
-M[no]scalarsse Generate scalar sse code with xmm registers; implies -Mflushz
-Mcache_align Align long objects on cache-line boundaries
-M[no]flushz Set SSE to flush-to-zero mode
-M[no]pre Enable partial redundancy elimination
-Minfo[=all|accel|ccff|ftn|hpf|inline|intensity|ipa|loop|lre|mp|opt|par|pfo|stat|time|unified|vect]
Generate informational messages about optimizations
all -Minfo=accel,inline,ipa,loop,lre,mp,opt,par,unified,vect
accel Enable Accelerator information
ccff Append information to object file
ftn Enable Fortran-specific information
inline Enable inliner information
intensity Enable compute intensity information
ipa Enable IPA information
loop Enable loop optimization information
lre Enable LRE information
mp Enable OpenMP information
opt Enable optimizer information
par Enable parallelizer information
pfo Enable profile feedback information
time Display time spent in compiler phases
unified Enable unified binary information
vect Enable vectorizer information
$ pgfortran -help=group
Switch Classifications: (displays the categories of option switches)
overall Overall switches
opt Optimization switches
debug Debugging switches
prepro Preprocessor switches
asm Assembler switches
linker Linker switches
language Language-specific switches
target Target-specific switches
other Other switches
For example, to display the option switches in the prepro (preprocessing) category:
$ pgfortran -help=prepro
Preprocessor switches:
-D<macro> Define a preprocessor macro
-dD (C only) Print macros and values from source files
-dI (C only) Print include file names
-dM (C only) Print macros and values, including predefined and command-line macros
-dN (C only) Print macro names from source files
-E Stop after preprocessor; print output on standard output
-F Stop after preprocessing, save output in .f file
-I<incdir> Add directory to include file search path
-Mcpp[=m|md|mm|mmd|line|[no]comment|suffix:<suff> |<suff> |include:<file> ]
Just preprocess the input files
m Print makefile dependencies
md Print makefile dependencies to .d file
mm Print makefile dependencies; ignore system includes
mmd Print makefile dependencies to .d file; ignore system includes
line Insert line numbers into preprocess output
[no]comment Keep comments in preprocessed output
suffix:<suff> Suffix to use for makefile dependencies
<suff> Suffix to use for makefile dependencies
include:<file> Include file before processing source file
-Mnostddef Do not use standard macro definitions
-Mnostdinc Do not use standard include directories
-Mpreprocess Run preprocessor for assembly and Fortran files
-U<macro> Undefine a preprocessor macro
-YI,<incdir> Change standard include directory
-Yp,<preprodir> Change preprocessor directory
$ pgcc -# bigadd.c

[Code generation phase of the PGI compiler]
/usr/pgi/linux86-64/11.10/bin/pgc bigadd.c -opt 1 -x 119 0xa10000 -x 122 0x40 -x 123 0x1000 -x 127 4 -x 127 17 -x 19 0x400000 -x 28 0x40000 -x 120 0x10000000 -x 70 0x8000 -x 122 1 -x 125 0x20000 -quad -x 59 4 -x 59 4 -tp sandybridge -x 120 0x1000 -astype 0 -stdinc /usr/pgi/linux86-64/11.10/include :/usr/local/include: /usr/lib/gcc/x86_64-redhat-linux/4.4.5/include:/usr/lib/gcc/x86_64-redhat-linux/4.4.5/include: /usr/include -def unix -def __unix -def __unix__ -def linux -def __linux -def __linux__ -def __NO_MATH_INLINES -def __x86_64__ -def __LONG_MAX__=9223372036854775807L -def '__SIZE_TYPE__=unsigned long int' -def '__PTRDIFF_TYPE__=long int' -def __THROW= -def __extension__= -def __amd64__ -def __SSE__ -def __MMX__ -def __SSE2__ -def __SSE3__ -def __SSSE3__ -predicate '#machine(x86_64) #lint(off) #system(posix) #cpu(x86_64)' -cmdline '+pgcc bigadd.c -# -mcmodel=medium' -x 123 0x80000000 -x 123 4 -x 2 0x400 -x 119 0x20 -def __pgnu_vsn=40405 -alwaysinline /usr/pgi/linux86-64/11.10/lib/libintrinsics.il 4 -x 120 0x200000 -x 135 1 -x 68 0x1 -asm /tmp/pgccMcbbYxZUBfYm.s
PGC/x86-64 Linux 11.10-0: compilation completed with informational messages

[Phase in which the assembler (as) creates the object file]
/usr/bin/as /tmp/pgccMcbbYxZUBfYm.s -o /tmp/pgccwcbbcwoqwBdv.o

[Phase in which the linker (ld) performs the link, with its options]
/usr/bin/ld /usr/lib64/crt1.o /usr/lib64/crti.o /usr/pgi/linux86-64/11.10/libso/trace_init.o /usr/lib/gcc/x86_64-redhat-linux/4.4.5/crtbegin.o /usr/pgi/linux86-64/11.10/libso/initmp.o -m elf_x86_64 -dynamic-linker /lib64/ld-linux-x86-64.so.2 /usr/pgi/linux86-64/11.10/lib/pgi.ld -L/usr/pgi/linux86-64/11.10/libso -L/usr/pgi/linux86-64/11.10/lib -L/usr/lib64 -L/usr/lib/gcc/x86_64-redhat-linux/4.4.5 /tmp/pgccwcbbcwoqwBdv.o -rpath /usr/pgi/linux86-64/11.10/libso -rpath /usr/pgi/linux86-64/11.10/lib /usr/pgi/linux86-64/11.10/lib/nonuma.o -lpgmp -lpthread -lnspgc -lpgc -lm -lgcc -lc -lgcc /usr/lib/gcc/x86_64-redhat-linux/4.4.5/crtend.o /usr/lib64/crtn.o
(This output makes it possible to check details such as the order in which libraries are linked.)
Unlinking /tmp/pgccMcbbYxZUBfYm.s
Unlinking /tmp/pgccwcbbcwoqwBdv.o