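PGI Compiler Options for Compilation and Optimization Information
This page describes the basic options for obtaining compiler messages and information about the optimizations and parallelization that were performed when using the PGI F77, F2003, C, and C++ compilers. The examples below mostly use pgfortran, but the options are specified in the same way for the other language compilers.
Last updated: February 2, 2012. Copyright © 株式会社ソフテック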
$ pgfortran -fastsse -Mvect=prefetch -Minfo Test.f
initia:
    195, Loop unrolled 8 times
    204, Loop unrolled 8 times
fielde:
     56, Interchange produces reordered loop nest: 57, 56
     90, 1 loop-carried redundant expression removed with 3 operations and 4 arrays
    231, Unrolling inner loop 8 times
         Generated prefetch instructions for 3 loads
    239, Unrolling inner loop 8 times
         Generated prefetch instructions for 2 loads
bounde:
    269, Unrolling inner loop 8 times
         Generated prefetch instructions for 2 loads
    360, Generating vector sse code for inner loop
    383, Generating vector sse code for inner loop
fieldh:
    485, Unrolling inner loop 8 times
         Generated prefetch instructions for 4 loads
boundh:
    515, Generating vector sse code for inner loop
         Generated prefetch instructions for 2 loads
array:
    654, Unrolling inner loop 8 times
    665, Unrolling inner loop 8 times
    676, Unrolling inner loop 8 times
    705, Generating vector sse code for inner loop
    720, Generating vector sse code for inner loop
    735, Generating vector sse code for inner loop
    770, Generating vector sse code for inner loop
    775, Generating vector sse code for inner loop
build_xxx:
    796, Generating vector sse code for inner loop
         Generated prefetch instructions for 3 loads
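-Minfo sub-options:
all        Interpreted as specifying all of the sub-options below
accel      Information about PGI Accelerator optimizations
ccff       Appends optimization information to the object file in the common compiler feedback format
ftn        Enables Fortran-specific information
lre        Enables LRE information
inline     Information about inlining
intensity  Outputs the compute intensity of loops
ipa        IPA optimization information
loop       Information about loops, such as vectorization
mp         Information about OpenMP parallelization
par        Information about parallelization
opt        Information about optimizations
pfo        Information about profile-feedback optimization
time       Outputs compile-time statistics
unroll     Loop unrolling optimization information
vect       Information about vectorization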
$ pgfortran -fastsse -Mneginfo Test.f
main:
     67, Loop not vectorized: contains call
initia:
    195, Loop not vectorized due to data dependency
bound:
    306, Loop not vectorized: contains call
$ pgfortran -fastsse -Mconcur -Minfo Test.f
initia:
    195, Loop unrolled 8 times
    204, Parallel code activated if loop count >= 100; block distribution
         Loop unrolled 8 times
fielde:
    230, Parallel code for non-innermost loop generated; block distribution
    231, Unrolling inner loop 8 times
    238, Parallel code for non-innermost loop generated; block distribution
    239, Unrolling inner loop 8 times
bounde:
    269, Parallel code activated if loop count >= 100; block distribution
         Unrolling inner loop 8 times
    279, Parallel code activated if loop count >= 100; block distribution
    295, Parallel code activated if loop count >= 100; block distribution
    302, Parallel code activated if loop count >= 100; block distribution
    321, Parallel code activated if loop count >= 100; block distribution
    337, Parallel code activated if loop count >= 100; block distribution
    346, Parallel code activated if loop count >= 100; block distribution
    360, Parallel code activated if loop count >= 100; block distribution
         Generating vector sse code for inner loop
    369, Parallel code activated if loop count >= 100; block distribution
    383, Parallel code activated if loop count >= 100; block distribution
         Generating vector sse code for inner loop
fieldh:
    484, Parallel code for non-innermost loop generated; block distribution
    485, Unrolling inner loop 8 times
boundh:
    515, Parallel code activated if loop count >= 100; block distribution
         Generating vector sse code for inner loop
array:
    653, Parallel code for non-innermost loop generated; block distribution
    654, Unrolling inner loop 8 times
    664, Parallel code for non-innermost loop generated; block distribution
    665, Unrolling inner loop 8 times
    675, Parallel code for non-innermost loop generated; block distribution
    676, Unrolling inner loop 8 times
    696, Parallel code for non-innermost loop generated; block distribution
    705, Parallel code activated if loop count >= 100; block distribution
         Generating vector sse code for inner loop
    711, Parallel code for non-innermost loop generated; block distribution
    720, Parallel code activated if loop count >= 100; block distribution
         Generating vector sse code for inner loop
    726, Parallel code for non-innermost loop generated; block distribution
    735, Parallel code activated if loop count >= 100; block distribution
         Generating vector sse code for inner loop
    770, Parallel code activated if loop count >= 100; block distribution
         Generating vector sse code for inner loop
    775, Parallel code activated if loop count >= 100; block distribution
         Generating vector sse code for inner loop
build_xxx:
    796, Parallel code activated if loop count >= 100; block distribution
         Generating vector sse code for inner loop
pml:
    829, Parallel code for non-innermost loop generated; block distribution
    850, Parallel code for non-innermost loop generated; block distribution
    871, Parallel code for non-innermost loop generated; block distribution
    892, Parallel code for non-innermost loop generated; block distribution
$ pgcc -c t.c                    (by default, all messages are issued)
PGC-W-0119-void function main cannot return value (t.c: 13)     <== warning message
PGC/x86-64 Linux 12.4-0: compilation completed with warnings
$ pgcc -c -Minform=severe t.c    (only Severe and Fatal messages are issued)
PGC/x86-64 Linux 12.4-0: compilation completed with warnings
$ pgfortran -fastsse -Mlist -Minfo mat.f90

PGF90 (Version 11.10) 02/03/2012 16:56:41 page 1

Switches: -noasm -nodclchk -nodebug -nodlines -noline -list
          -inform severe -opt 2 -nosave -object -noonetrip
          -depchk on -nostandard -nosymbol -noupcase

Filename: mat.f90

(    1)       program mat
(    2)       integer i, j, k, size, l, m, n
(    3)       parameter (size=16000) ! >2GB
(    4)       parameter (m=size,n=size)
(    5)       real*8 a(m,n),b(m,n),c(m,n),d
(    6)
(    7)       do i = 1, m
(    8)         do j = 1, n
(    9)           a(i,j)=10000.0D0*dble(i)+dble(j)
(   10)           b(i,j)=20000.0D0*dble(i)+dble(j)
(   11)         enddo
(   12)       enddo
$ pgfortran -fastsse -Manno -S Test.f
or
$ pgcc -fastsse -Manno -Mkeepasm Test.c    (for C, add -Mkeepasm)

.LB1_836:
# lineno: 151
# DO 60 j = 1,n
# a(j) = b(j) + scalar*c(j)
# 60 CONTINUE
        movapd  %xmm0, %xmm1
        movapd  (%esi,%ecx), %xmm2
        movapd  16(%esi,%ecx), %xmm3
        subl    $8, %eax
        mulpd   %xmm1, %xmm2
        mulpd   %xmm1, %xmm3
        addpd   (%edi,%ecx), %xmm2
        addpd   16(%edi,%ecx), %xmm3
        movapd  %xmm2, (%edx,%ecx)
        movapd  32(%esi,%ecx), %xmm2
        movapd  %xmm3, 16(%edx,%ecx)
        mulpd   %xmm1, %xmm2
        mulpd   48(%esi,%ecx), %xmm1
        addpd   32(%edi,%ecx), %xmm2
        addpd   48(%edi,%ecx), %xmm1
        movapd  %xmm2, 32(%edx,%ecx)
        movapd  %xmm1, 48(%edx,%ecx)
        addl    $64, %ecx
        testl   %eax, %eax
        jg      .LB1_836
# lineno: 154
$ pgfortran -fastsse -flags -Minfo -Mlist mat.f90
Reading rcfile /usr/pgi/linux86-64/11.10/bin/.pgfortranrc
-M[no]list          Generate a listing file
-fastsse            == -fast
-fast               Common optimizations; includes -O2 -Munroll=c:1 -Mnoframe -Mlre -Mautoinline
                    == -Mvect=sse -Mscalarsse -Mcache_align -Mflushz -Mpre
-M[no]vect[=[no]altcode|[no]assoc|cachesize:<c>|[no]fuse|[no]gather|[no]idiom|levels:<n>|[no]partial
           |prefetch|[no]short|[no]simd|[no]sizelimit[:n]|[no]sse|[no]tile|[no]uniform]
                    Control automatic vector pipelining
    [no]altcode     Generate appropriate alternative code for vectorized loops
    [no]assoc       Allow [disallow] reassociation
    cachesize:<c>   Optimize for cache size c
    [no]fuse        Enable [disable] loop fusion
    [no]gather      Enable [disable] vectorization of indirect array references
    [no]idiom       Enable [disable] idiom recognition
    levels:<n>      Maximum nest level of loops to optimize
    [no]partial     Enable [disable] partial loop vectorization via inner loop distribution
    prefetch        Generate prefetch instructions
    [no]short       Enable [disable] short vector operations
    [no]simd        Generate [don't generate] SIMD instructions
     128            Use 128-bit SIMD instructions
     256            Use 256-bit SIMD instructions
    [no]sizelimit[:n]
                    Limit size of vectorized loops
    [no]sse         Generate [don't generate] SSE instructions
    [no]tile        Enable [disable] loop tiling
    [no]uniform     Perform consistent optimizations in both vectorized and residual loops;
                    this may affect the performance of the residual loop
-M[no]scalarsse     Generate scalar sse code with xmm registers; implies -Mflushz
-Mcache_align       Align long objects on cache-line boundaries
-M[no]flushz        Set SSE to flush-to-zero mode
-M[no]pre           Enable partial redundancy elimination
-Minfo[=all|accel|ccff|ftn|hpf|inline|intensity|ipa|loop|lre|mp|opt|par|pfo|stat|time|unified|vect]
                    Generate informational messages about optimizations
    all             -Minfo=accel,inline,ipa,loop,lre,mp,opt,par,unified,vect
    accel           Enable Accelerator information
    ccff            Append information to object file
    ftn             Enable Fortran-specific information
    inline          Enable inliner information
    intensity       Enable compute intensity information
    ipa             Enable IPA information
    loop            Enable loop optimization information
    lre             Enable LRE information
    mp              Enable OpenMP information
    opt             Enable optimizer information
    par             Enable parallelizer information
    pfo             Enable profile feedback information
    time            Display time spent in compiler phases
    unified         Enable unified binary information
    vect            Enable vectorizer information
$ pgfortran -help=group
Switch Classifications:      (displays the categories of option switches)
overall    Overall switches
opt        Optimization switches
debug      Debugging switches
prepro     Preprocessor switches
asm        Assembler switches
linker     Linker switches
language   Language-specific switches
target     Target-specific switches
other      Other switches

For example, to display the option switches in the prepro (preprocessing) category:

$ pgfortran -help=prepro
Preprocessor switches:
-D<macro>           Define a preprocessor macro
-dD                 (C only) Print macros and values from source files
-dI                 (C only) Print include file names
-dM                 (C only) Print macros and values, including predefined and command-line macros
-dN                 (C only) Print macro names from source files
-E                  Stop after preprocessor; print output on standard output
-F                  Stop after preprocessing, save output in .f file
-I<incdir>          Add directory to include file search path
-Mcpp[=m|md|mm|mmd|line|[no]comment|suffix:<suff>|<suff>|include:<file>]
                    Just preprocess the input files
    m               Print makefile dependencies
    md              Print makefile dependencies to .d file
    mm              Print makefile dependencies; ignore system includes
    mmd             Print makefile dependencies to .d file; ignore system includes
    line            Insert line numbers into preprocess output
    [no]comment     Keep comments in preprocessed output
    suffix:<suff>   Suffix to use for makefile dependencies
    <suff>          Suffix to use for makefile dependencies
    include:<file>  Include file before processing source file
-Mnostddef          Do not use standard macro definitions
-Mnostdinc          Do not use standard include directories
-Mpreprocess        Run preprocessor for assembly and Fortran files
-U<macro>           Undefine a preprocessor macro
-YI,<incdir>        Change standard include directory
-Yp,<preprodir>     Change preprocessor directory
$ pgcc -# bigadd.c

[Code generation phase of the PGI compiler]
/usr/pgi/linux86-64/11.10/bin/pgc bigadd.c -opt 1 -x 119 0xa10000 -x 122 0x40 -x 123 0x1000
  -x 127 4 -x 127 17 -x 19 0x400000 -x 28 0x40000 -x 120 0x10000000 -x 70 0x8000 -x 122 1
  -x 125 0x20000 -quad -x 59 4 -x 59 4 -tp sandybridge -x 120 0x1000 -astype 0
  -stdinc /usr/pgi/linux86-64/11.10/include
  :/usr/local/include:
  /usr/lib/gcc/x86_64-redhat-linux/4.4.5/include:/usr/lib/gcc/x86_64-redhat-linux/4.4.5/include:
  /usr/include -def unix -def __unix -def __unix__ -def linux -def __linux -def __linux__
  -def __NO_MATH_INLINES -def __x86_64__ -def __LONG_MAX__=9223372036854775807L
  -def '__SIZE_TYPE__=unsigned long int' -def '__PTRDIFF_TYPE__=long int' -def __THROW=
  -def __extension__= -def __amd64__ -def __SSE__ -def __MMX__ -def __SSE2__ -def __SSE3__
  -def __SSSE3__ -predicate '#machine(x86_64) #lint(off) #system(posix) #cpu(x86_64)'
  -cmdline '+pgcc bigadd.c -# -mcmodel=medium' -x 123 0x80000000 -x 123 4 -x 2 0x400
  -x 119 0x20 -def __pgnu_vsn=40405
  -alwaysinline /usr/pgi/linux86-64/11.10/lib/libintrinsics.il 4
  -x 120 0x200000 -x 135 1 -x 68 0x1 -asm /tmp/pgccMcbbYxZUBfYm.s
PGC/x86-64 Linux 11.10-0: compilation completed with informational messages

[Phase in which the assembler (as) creates the object file]
/usr/bin/as /tmp/pgccMcbbYxZUBfYm.s -o /tmp/pgccwcbbcwoqwBdv.o

[Phase in which the linker ld performs the link, with its options]
/usr/bin/ld /usr/lib64/crt1.o /usr/lib64/crti.o /usr/pgi/linux86-64/11.10/libso/trace_init.o
  /usr/lib/gcc/x86_64-redhat-linux/4.4.5/crtbegin.o /usr/pgi/linux86-64/11.10/libso/initmp.o
  -m elf_x86_64 -dynamic-linker /lib64/ld-linux-x86-64.so.2 /usr/pgi/linux86-64/11.10/lib/pgi.ld
  -L/usr/pgi/linux86-64/11.10/libso -L/usr/pgi/linux86-64/11.10/lib -L/usr/lib64
  -L/usr/lib/gcc/x86_64-redhat-linux/4.4.5 /tmp/pgccwcbbcwoqwBdv.o
  -rpath /usr/pgi/linux86-64/11.10/libso -rpath /usr/pgi/linux86-64/11.10/lib
  /usr/pgi/linux86-64/11.10/lib/nonuma.o -lpgmp -lpthread -lnspgc -lpgc -lm -lgcc -lc -lgcc
  /usr/lib/gcc/x86_64-redhat-linux/4.4.5/crtend.o /usr/lib64/crtn.o
(This makes it possible to check, for example, the order in which libraries are linked.)
Unlinking /tmp/pgccMcbbYxZUBfYm.s
Unlinking /tmp/pgccwcbbcwoqwBdv.o