Options for Displaying Compilation Information

PGI Compile Information and Optimization Information Options

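This page describes the basic options for obtaining compiler messages and information about the optimizations and parallelization performed when using the PGI F77, F2003, C, and C++ compilers. The examples mainly use pgfortran, but the options are specified the same way for the other language compilers.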
Updated February 2, 2012. Copyright © SofTek (株式会社ソフテック)

Options for obtaining optimization information (-Minfo and others)

■ Get information about the code the compiler optimized

$ pgfortran -fastsse -Mvect=prefetch -Minfo Test.f

initia:
   195, Loop unrolled 8 times
   204, Loop unrolled 8 times
fielde:
    56, Interchange produces reordered loop nest: 57, 56
    90, 1 loop-carried redundant expression removed with 3 operations 
        and 4 arrays
   231, Unrolling inner loop 8 times
        Generated prefetch instructions for 3 loads
   239, Unrolling inner loop 8 times
        Generated prefetch instructions for 2 loads
bounde:
   269, Unrolling inner loop 8 times
        Generated prefetch instructions for 2 loads
   360, Generating vector sse code for inner loop
   383, Generating vector sse code for inner loop
fieldh:
   485, Unrolling inner loop 8 times
        Generated prefetch instructions for 4 loads
boundh:
   515, Generating vector sse code for inner loop
        Generated prefetch instructions for 2 loads
array:
   654, Unrolling inner loop 8 times
   665, Unrolling inner loop 8 times
   676, Unrolling inner loop 8 times
   705, Generating vector sse code for inner loop
   720, Generating vector sse code for inner loop
   735, Generating vector sse code for inner loop
   770, Generating vector sse code for inner loop
   775, Generating vector sse code for inner loop
build_xxx:
   796, Generating vector sse code for inner loop
        Generated prefetch instructions for 3 loads

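The -Minfo option prints various kinds of information at compile time. In most cases, specifying it without sub-flags, as above, yields most of the optimization information.
Sub-flags of the form -Minfo=[flag] control which information is displayed. See the compile option list for details of the sub-flags. -Minfo=all enables the main categories at once; the -flags output later on this page shows exactly which ones for version 11.10. The available flags are: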

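  all        Enables the sub-options below collectively (the -flags output later on this page shows the exact set)
  accel      Information about PGI Accelerator optimizations
  ccff       Adds optimization information to the object file in the common compiler feedback format
  ftn        Fortran-specific information
  lre        LRE (loop-carried redundancy elimination) information
  inline     Information about inlining
  intensity  Compute intensity of loops
  ipa        IPA (interprocedural analysis) optimization information
  loop       Loop information such as vectorization
  mp         Information about OpenMP parallelization
  par        Information about parallelization
  opt        Information about optimization
  pfo        Information about profile-feedback optimization
  time       Compile-time statistics
  unroll     Loop unrolling information
  vect       Information about vectorization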
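
As a small illustration of how sub-flags combine, the sketch below builds a -Minfo option string from a list of sub-flags. The comma-separated form is the one that appears in the -flags output later on this page. The helper name `minfo_option` and the set `KNOWN_MINFO_SUBFLAGS` are hypothetical, and the set only mirrors the list above; other compiler versions may accept a different set.

```python
# Sub-flags copied from the list above (an assumption for this sketch;
# check `-help` or `-flags` for the set your compiler version accepts).
KNOWN_MINFO_SUBFLAGS = {
    "all", "accel", "ccff", "ftn", "lre", "inline", "intensity", "ipa",
    "loop", "mp", "par", "opt", "pfo", "time", "unroll", "vect",
}


def minfo_option(subflags=()):
    """Return a -Minfo option string for the given sub-flags.

    With no sub-flags, the plain form "-Minfo" is returned.
    """
    unknown = [flag for flag in subflags if flag not in KNOWN_MINFO_SUBFLAGS]
    if unknown:
        raise ValueError("unknown -Minfo sub-flag(s): " + ", ".join(unknown))
    if not subflags:
        return "-Minfo"
    return "-Minfo=" + ",".join(subflags)


if __name__ == "__main__":
    # Hypothetical usage: request only loop and OpenMP messages.
    print("pgfortran -fastsse " + minfo_option(["loop", "mp"]) + " Test.f")
```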

■ Get information only about code that could not be optimized

$ pgfortran -fastsse -Mneginfo Test.f
main:
    67, Loop not vectorized: contains call
initia:
   195, Loop not vectorized due to data dependency
bound:
   306, Loop not vectorized: contains call

With the -Mneginfo option, the compiler reports the code it could not optimize, together with the reason.
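
The -Minfo and -Mneginfo messages shown above share one layout: a routine name followed by a colon, then indented "line, message" entries. The following is a minimal sketch for grouping such output by routine, for example after redirecting the compiler messages to a file. The function name `parse_info_messages` is hypothetical, and the layout is assumed only from the excerpts on this page.

```python
import re

ROUTINE_RE = re.compile(r"^(\w+):\s*$")          # e.g. "initia:"
MESSAGE_RE = re.compile(r"^\s*(\d+),\s*(.+?)\s*$")  # e.g. "   195, Loop ..."


def parse_info_messages(text):
    """Group messages as {routine: [(source_line, message), ...]}."""
    result = {}
    current = None
    for raw in text.splitlines():
        if not raw.strip():
            continue
        routine = ROUTINE_RE.match(raw)
        if routine:
            current = routine.group(1)
            result.setdefault(current, [])
            continue
        if current is None:
            continue  # text before the first routine header is ignored
        message = MESSAGE_RE.match(raw)
        if message:
            result[current].append((int(message.group(1)), message.group(2)))
        elif result[current]:
            # Indented line without a number: record it under the previous
            # source line. Wrapped text also ends up as its own entry.
            result[current].append((result[current][-1][0], raw.strip()))
    return result


if __name__ == "__main__":
    sample = """main:
    67, Loop not vectorized: contains call
initia:
   195, Loop not vectorized due to data dependency
"""
    for routine, messages in parse_info_messages(sample).items():
        for line_no, message in messages:
            print(routine, line_no, message)
```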

■ Parallelization messages when automatic parallelization is enabled

$ pgfortran -fastsse -Mconcur -Minfo Test.f
initia:
   195, Loop unrolled 8 times
   204, Parallel code activated if loop count >= 100; block distribution
        Loop unrolled 8 times
fielde:
   230, Parallel code for non-innermost loop generated; block distribution
   231, Unrolling inner loop 8 times
   238, Parallel code for non-innermost loop generated; block distribution
   239, Unrolling inner loop 8 times
bounde:
   269, Parallel code activated if loop count >= 100; block distribution
        Unrolling inner loop 8 times
   279, Parallel code activated if loop count >= 100; block distribution
   295, Parallel code activated if loop count >= 100; block distribution
   302, Parallel code activated if loop count >= 100; block distribution
   321, Parallel code activated if loop count >= 100; block distribution
   337, Parallel code activated if loop count >= 100; block distribution
   346, Parallel code activated if loop count >= 100; block distribution
   360, Parallel code activated if loop count >= 100; block distribution
        Generating vector sse code for inner loop
   369, Parallel code activated if loop count >= 100; block distribution
   383, Parallel code activated if loop count >= 100; block distribution
        Generating vector sse code for inner loop
fieldh:
   484, Parallel code for non-innermost loop generated; block distribution
   485, Unrolling inner loop 8 times
boundh:
   515, Parallel code activated if loop count >= 100; block distribution
        Generating vector sse code for inner loop
array:
   653, Parallel code for non-innermost loop generated; block distribution
   654, Unrolling inner loop 8 times
   664, Parallel code for non-innermost loop generated; block distribution
   665, Unrolling inner loop 8 times
   675, Parallel code for non-innermost loop generated; block distribution
   676, Unrolling inner loop 8 times
   696, Parallel code for non-innermost loop generated; block distribution
   705, Parallel code activated if loop count >= 100; block distribution
        Generating vector sse code for inner loop
   711, Parallel code for non-innermost loop generated; block distribution
   720, Parallel code activated if loop count >= 100; block distribution
        Generating vector sse code for inner loop
   726, Parallel code for non-innermost loop generated; block distribution
   735, Parallel code activated if loop count >= 100; block distribution
        Generating vector sse code for inner loop
   770, Parallel code activated if loop count >= 100; block distribution
        Generating vector sse code for inner loop
   775, Parallel code activated if loop count >= 100; block distribution
        Generating vector sse code for inner loop
build_xxx:
   796, Parallel code activated if loop count >= 100; block distribution
        Generating vector sse code for inner loop
pml:
   829, Parallel code for non-innermost loop generated; block distribution
   850, Parallel code for non-innermost loop generated; block distribution
   871, Parallel code for non-innermost loop generated; block distribution
   892, Parallel code for non-innermost loop generated; block distribution

The -Mconcur option enables automatic parallelization. Combined with -Minfo, the compiler reports the parallel code it generated for the loops it identified as parallelizable.
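
The messages "Parallel code activated if loop count >= 100" and "block distribution" can be read as follows: the parallel version of a loop runs only when the trip count reaches the threshold, and the iterations are then divided into contiguous blocks, one per thread. The sketch below is only an illustrative model of that reading. The function names are hypothetical, and how the PGI runtime actually splits a remainder across threads is an assumption not stated on this page.

```python
def block_distribution(n_iterations, n_threads):
    """Split iterations 0..n_iterations-1 into contiguous blocks, one per thread."""
    if n_threads < 1:
        raise ValueError("n_threads must be at least 1")
    base, extra = divmod(n_iterations, n_threads)
    blocks = []
    start = 0
    for thread in range(n_threads):
        # Assumption: the first `extra` threads take one additional iteration.
        size = base + (1 if thread < extra else 0)
        blocks.append(range(start, start + size))
        start += size
    return blocks


def plan_loop(n_iterations, n_threads, threshold=100):
    """Use the parallel plan only when the loop count reaches the threshold."""
    if n_iterations >= threshold:
        return block_distribution(n_iterations, n_threads)
    return [range(n_iterations)]  # serial fallback: a single block


if __name__ == "__main__":
    for count in (50, 100):
        print(count, plan_loop(count, n_threads=4))
```

With four threads, a loop of 50 iterations stays serial in this model, while a loop of 100 iterations is split into four blocks of 25.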

■ Set the output level of compiler messages

$ pgcc -c t.c     (default level: the warning is reported)
PGC-W-0119-void function main cannot return value (t.c: 13) <== warning message
PGC/x86-64 Linux 12.4-0: compilation completed with warnings

$ pgcc -c -Minform=severe t.c     (only severe and fatal messages are printed)
PGC/x86-64 Linux 12.4-0: compilation completed with warnings

The -Minform=level option sets the minimum severity of the compiler messages that are printed.
- -Minform=inform : prints all messages (inform, warn, severe, and fatal)
- -Minform=warn : prints warn, severe, and fatal messages
- -Minform=severe : prints severe and fatal messages
- -Minform=fatal : prints fatal messages only
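
The four levels form an ordered scale, and each setting prints its own level and everything more severe. A minimal sketch of that rule, with hypothetical helper names, might look like this:

```python
# Severity names in increasing order, as listed above.
SEVERITY_ORDER = ["inform", "warn", "severe", "fatal"]


def shown_severities(level):
    """Return the severities printed for a given -Minform=level setting."""
    if level not in SEVERITY_ORDER:
        raise ValueError("unknown level: " + level)
    return SEVERITY_ORDER[SEVERITY_ORDER.index(level):]


def is_shown(message_severity, level):
    """True if a message of this severity is printed at the given level."""
    return message_severity in shown_severities(level)


if __name__ == "__main__":
    # The PGC-W-0119 warning from the example above is hidden at "severe".
    print(is_shown("warn", "warn"), is_shown("warn", "severe"))
```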

■ Create a source listing file

$ pgfortran -fastsse -Mlist -Minfo mat.f90
PGF90 (Version    11.10)          02/03/2012  16:56:41      page 1

Switches: -noasm -nodclchk -nodebug -nodlines -noline -list
          -inform severe -opt 2 -nosave -object -noonetrip
          -depchk on -nostandard
          -nosymbol -noupcase

Filename: mat.f90

(    1) program mat
(    2) integer i, j, k, size, l, m, n
(    3) parameter (size=16000) ! >2GB
(    4) parameter (m=size,n=size)
(    5) real*8 a(m,n),b(m,n),c(m,n),d
(    6)
(    7) do i = 1, m
(    8) do j = 1, n
(    9)   a(i,j)=10000.0D0*dble(i)+dble(j)
(   10)   b(i,j)=20000.0D0*dble(i)+dble(j)
(   11) enddo
(   12) enddo

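The -Mlist option creates a source listing file named {file}.lst. The listing also records the compiler switches, including the optimization settings, that were in effect during compilation.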

■ Output the generated assembly listing annotated with the corresponding source

$ pgfortran -fastsse -Manno -S Test.f
or
$ pgcc -fastsse -Manno -Mkeepasm Test.c     (for C, add -Mkeepasm)

.LB1_836:
# lineno: 151

#           DO 60 j = 1,n
#               a(j) = b(j) + scalar*c(j)
#    60     CONTINUE
        movapd  %xmm0, %xmm1
        movapd  (%esi,%ecx), %xmm2
        movapd  16(%esi,%ecx), %xmm3
        subl    $8, %eax
        mulpd   %xmm1, %xmm2
        mulpd   %xmm1, %xmm3
        addpd   (%edi,%ecx), %xmm2
        addpd   16(%edi,%ecx), %xmm3
        movapd  %xmm2, (%edx,%ecx)
        movapd  32(%esi,%ecx), %xmm2
        movapd  %xmm3, 16(%edx,%ecx)
        mulpd   %xmm1, %xmm2
        mulpd   48(%esi,%ecx), %xmm1
        addpd   32(%edi,%ecx), %xmm2
        addpd   48(%edi,%ecx), %xmm1
        movapd  %xmm2, 32(%edx,%ecx)
        movapd  %xmm1, 48(%edx,%ecx)
        addl    $64, %ecx
        testl   %eax, %eax
        jg      .LB1_836
# lineno: 154

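The -Manno option annotates the assembly code with the source code. Specifying -Manno -S generates the assembly listing file {file}.s, which contains both the source lines and the corresponding assembly.
The -S option stops after generating the assembly file, so no assembling or linking is performed. After the command above, the assembly listing file Test.s contains the assembly annotated with the corresponding source code.
The -Mkeepasm option keeps the generated assembly listing in the xxxx.s file.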
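
Because the annotated file marks each source position with a "# lineno: N" comment, it is straightforward to summarize how much code a source line produced. The following sketch counts the instructions under each marker. The function name is hypothetical, and the file layout is assumed only from the excerpt above; real .s files contain additional directives that this sketch simply skips.

```python
def instructions_per_source_line(asm_text):
    """Count instructions under each '# lineno: N' marker in an annotated .s file."""
    counts = {}
    current = None
    for raw in asm_text.splitlines():
        line = raw.strip()
        if line.startswith("# lineno:"):
            current = int(line.split(":", 1)[1])
            counts.setdefault(current, 0)
        elif not line or line.startswith("#"):
            continue  # blank line or echoed source comment
        elif line.endswith(":") or line.startswith("."):
            continue  # label or assembler directive
        elif current is not None:
            counts[current] += 1
    return counts


if __name__ == "__main__":
    sample = """.LB1_836:
# lineno: 151

#           DO 60 j = 1,n
        movapd  %xmm0, %xmm1
        addl    $64, %ecx
        jg      .LB1_836
# lineno: 154
"""
    print(instructions_per_source_line(sample))
```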

■ Look up the meaning of the specified compile options

$ pgfortran -fastsse -flags -Minfo -Mlist mat.f90

Reading rcfile /usr/pgi/linux86-64/11.10/bin/.pgfortranrc
-M[no]list          Generate a listing file
-fastsse            == -fast
-fast               Common optimizations; includes -O2 -Munroll=c:1 -Mnoframe -Mlre -Mautoinline
                    == -Mvect=sse -Mscalarsse -Mcache_align -Mflushz -Mpre
-M[no]vect[=[no]altcode|[no]assoc|cachesize:<c>|[no]fuse|[no]gather|[no]idiom|levels:<n>|[no]partial
|prefetch|[no]short|[no]simd|[no]sizelimit[:n]|[no]sse|[no]tile|[no]uniform]
                    Control automatic vector pipelining
    [no]altcode     Generate appropriate alternative code for vectorized loops
    [no]assoc       Allow [disallow] reassociation
    cachesize:<c>   Optimize for cache size c
    [no]fuse        Enable [disable] loop fusion
    [no]gather      Enable [disable] vectorization of indirect array references
    [no]idiom       Enable [disable] idiom recognition
    levels:<n>      Maximum nest level of loops to optimize
    [no]partial     Enable [disable] partial loop vectorization via inner loop distribution
    prefetch        Generate prefetch instructions
    [no]short       Enable [disable] short vector operations
    [no]simd        Generate [don't generate] SIMD instructions
     128            Use 128-bit SIMD instructions
     256            Use 256-bit SIMD instructions
    [no]sizelimit[:n]
                    Limit size of vectorized loops
    [no]sse         Generate [don't generate] SSE instructions
    [no]tile        Enable [disable] loop tiling
    [no]uniform     Perform consistent optimizations in both vectorized and residual loops; 
                    this may affect the performance of the residual loop
-M[no]scalarsse     Generate scalar sse code with xmm registers; implies -Mflushz
-Mcache_align       Align long objects on cache-line boundaries
-M[no]flushz        Set SSE to flush-to-zero mode
-M[no]pre           Enable partial redundancy elimination
-Minfo[=all|accel|ccff|ftn|hpf|inline|intensity|ipa|loop|lre|mp|opt|par|pfo|stat|time|unified|vect]
                    Generate informational messages about optimizations
    all             -Minfo=accel,inline,ipa,loop,lre,mp,opt,par,unified,vect
    accel           Enable Accelerator information
    ccff            Append information to object file
    ftn             Enable Fortran-specific information
    inline          Enable inliner information
    intensity       Enable compute intensity information
    ipa             Enable IPA information
    loop            Enable loop optimization information
    lre             Enable LRE information
    mp              Enable OpenMP information
    opt             Enable optimizer information
    par             Enable parallelizer information
    pfo             Enable profile feedback information
    time            Display time spent in compiler phases
    unified         Enable unified binary information
    vect            Enable vectorizer information

With the -flags option, no actual compilation is performed; instead, the compiler displays the meaning of the specified compile options and their flags.
A similar option is -help, which prints the list of all options provided by the PGI compiler and their meanings to standard output. Specifying -help together with -{option} provides the same function as -flags above.

■ Look up option meanings with option help (-help)

$ pgfortran -help=group
Switch Classifications:     (lists the option switch categories)
overall             Overall switches
opt                 Optimization switches
debug               Debugging switches
prepro              Preprocessor switches
asm                 Assembler switches
linker              Linker switches
language            Language-specific switches
target              Target-specific switches
other               Other switches

For example, to list the option switches in the prepro (preprocessing) category:
$ pgfortran -help=prepro

Preprocessor switches:
-D<macro>           Define a preprocessor macro
-dD                 (C only) Print macros and values from source files
-dI                 (C only) Print include file names
-dM                 (C only) Print macros and values, including predefined and command-line macros
-dN                 (C only) Print macro names from source files
-E                  Stop after preprocessor; print output on standard output
-F                  Stop after preprocessing, save output in .f file
-I<incdir>           Add directory to include file search path
-Mcpp[=m|md|mm|mmd|line|[no]comment|suffix:<suff> |<suff> |include:<file> ]
                    Just preprocess the input files
    m               Print makefile dependencies
    md              Print makefile dependencies to .d file
    mm              Print makefile dependencies; ignore system includes
    mmd             Print makefile dependencies to .d file; ignore system includes
    line            Insert line numbers into preprocess output
    [no]comment     Keep comments in preprocessed output
    suffix:<suff>    Suffix to use for makefile dependencies
    <suff>           Suffix to use for makefile dependencies
    include:<file>   Include file before processing source file
-Mnostddef          Do not use standard macro definitions
-Mnostdinc          Do not use standard include directories
-Mpreprocess        Run preprocessor for assembly and Fortran files
-U<macro>           Undefine a preprocessor macro
-YI,<incdir>         Change standard include directory
-Yp,<preprodir>      Change preprocessor directory

The -help option displays the meaning of the option flags, organized by category.

■ View the detailed options of the PGI compiler's internal steps (code generation, assembler, linking)

$ pgcc -# bigadd.c
						
[Code generation phase of the PGI compiler]

/usr/pgi/linux86-64/11.10/bin/pgc bigadd.c -opt 1 -x 119 0xa10000 -x 122 0x40
 -x 123 0x1000 -x 127 4 -x 127 17 -x 19 0x400000 -x 28 0x40000 -x 120 0x10000000
 -x 70 0x8000 -x 122 1 -x 125 0x20000 -quad -x 59 4 -x 59 4 -tp sandybridge -x 120
 0x1000 -astype 0 -stdinc /usr/pgi/linux86-64/11.10/include :/usr/local/include:
 /usr/lib/gcc/x86_64-redhat-linux/4.4.5/include:/usr/lib/gcc/x86_64-redhat-linux/4.4.5/include:
/usr/include -def unix -def __unix -def __unix__ -def linux -def __linux -def __linux__ 
 -def __NO_MATH_INLINES -def __x86_64__ -def __LONG_MAX__=9223372036854775807L
 -def '__SIZE_TYPE__=unsigned long int' -def '__PTRDIFF_TYPE__=long int' -def __THROW= 
 -def __extension__= -def __amd64__ -def __SSE__ -def __MMX__ -def __SSE2__ -def __SSE3__ 
 -def __SSSE3__ -predicate '#machine(x86_64) #lint(off) #system(posix) #cpu(x86_64)' 
 -cmdline '+pgcc bigadd.c -# -mcmodel=medium' -x 123 0x80000000 -x 123 4 -x 2 0x400 
 -x 119 0x20 -def __pgnu_vsn=40405 -alwaysinline /usr/pgi/linux86-64/11.10/lib/libintrinsics.il 4 
 -x 120 0x200000 -x 135 1 -x 68 0x1 -asm /tmp/pgccMcbbYxZUBfYm.s
 PGC/x86-64 Linux 11.10-0: compilation completed with informational messages

[Phase in which the assembler (as) creates the object file]

/usr/bin/as /tmp/pgccMcbbYxZUBfYm.s -o /tmp/pgccwcbbcwoqwBdv.o

[Phase in which the linker (ld) performs linking, with its options]

/usr/bin/ld /usr/lib64/crt1.o /usr/lib64/crti.o 
 /usr/pgi/linux86-64/11.10/libso/trace_init.o /usr/lib/gcc/x86_64-redhat-linux/4.4.5/crtbegin.o
 /usr/pgi/linux86-64/11.10/libso/initmp.o -m elf_x86_64 -dynamic-linker /lib64/ld-linux-x86-64.so.2
 /usr/pgi/linux86-64/11.10/lib/pgi.ld -L/usr/pgi/linux86-64/11.10/libso 
 -L/usr/pgi/linux86-64/11.10/lib -L/usr/lib64 -L/usr/lib/gcc/x86_64-redhat-linux/4.4.5
 /tmp/pgccwcbbcwoqwBdv.o -rpath /usr/pgi/linux86-64/11.10/libso
 -rpath /usr/pgi/linux86-64/11.10/lib /usr/pgi/linux86-64/11.10/lib/nonuma.o -lpgmp 
 -lpthread -lnspgc -lpgc -lm -lgcc -lc -lgcc /usr/lib/gcc/x86_64-redhat-linux/4.4.5/crtend.o 
 /usr/lib64/crtn.o  (the order of the linked libraries can be checked here)
Unlinking /tmp/pgccMcbbYxZUBfYm.s
Unlinking /tmp/pgccwcbbcwoqwBdv.o

With the -# option, the compiler echoes each command of its internal steps (code generation, assembler, linking), together with the full set of detailed options passed to each command.
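This option is useful because it shows how the libraries are linked. For example, when troubleshooting unresolved references during static linking, which is a common problem, you can check the order in which the libraries are passed to the linker.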
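
To check the link order without reading the whole command by eye, the ld line echoed by -# can be split into tokens. The sketch below is one possible way to pull out the -l libraries and -L search directories in the order they appear. The function name `link_order` is hypothetical, and the object file path in the usage example is a placeholder, not taken from real output.

```python
import shlex


def link_order(ld_command):
    """Return (libraries, search_dirs) in command-line order."""
    libraries, search_dirs = [], []
    for token in shlex.split(ld_command):
        if token.startswith("-l") and len(token) > 2:
            libraries.append(token[2:])      # "-lpgc" -> "pgc"
        elif token.startswith("-L") and len(token) > 2:
            search_dirs.append(token[2:])    # "-L/usr/lib64" -> "/usr/lib64"
    return libraries, search_dirs


if __name__ == "__main__":
    # Shortened command; /tmp/example.o is a hypothetical object file.
    command = ("/usr/bin/ld /usr/lib64/crt1.o -m elf_x86_64 -L/usr/lib64 "
               "/tmp/example.o -lpgmp -lpthread -lnspgc -lpgc -lm -lgcc -lc -lgcc")
    libraries, search_dirs = link_order(command)
    print("libraries:", libraries)
    print("search dirs:", search_dirs)
```

A library that appears before the object file or library that needs it is a typical cause of unresolved references in static links, so the order returned here is the first thing to inspect.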
