SPRAB89A September 2011 – March 2014
There is a performance penalty for dynamic linking. Imported functions called via the PLT incur the overhead of an additional call, similar to a trampoline. If the function's address is accessed through the GOT, there is also the overhead of an indirect access to load its address.
There is no penalty for near data addressed via DP. For far data, DP-relative addressing requires three instructions, versus two for position-dependent absolute addressing. For objects addressed via the GOT, there is the overhead of an additional reference to the GOT to load the address.
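The two kinds of indirection described above can be modeled in portable C. This is only an illustrative sketch: the names toy_got, toy_resolve, plt_add, and read_counter_via_got are hypothetical, and a real PLT stub and GOT are generated by the static linker and filled in by the dynamic loader rather than written by hand.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins for a function and a variable imported
   from another module. */
static int imported_add(int a, int b) { return a + b; }
static int imported_counter = 7;

/* Toy "GOT": one address slot per imported symbol. */
typedef int (*add_fn)(int, int);
static struct {
    add_fn add;      /* address of the imported function */
    int   *counter;  /* address of the imported variable */
} toy_got;

/* Models the dynamic loader resolving the imports at load time. */
void toy_resolve(void) {
    toy_got.add = imported_add;
    toy_got.counter = &imported_counter;
}

/* Models a PLT stub: the caller makes one extra call, and the stub
   loads the target address from the GOT before transferring control. */
int plt_add(int a, int b) {
    assert(toy_got.add != NULL);
    return toy_got.add(a, b);
}

/* Models a GOT-based data access: first load the object's address
   from the GOT, then load the value through that address. */
int read_counter_via_got(void) {
    assert(toy_got.counter != NULL);
    return *toy_got.counter;
}
```

After toy_resolve() has run, plt_add(2, 3) reaches imported_add through one extra call and one table load, and read_counter_via_got() performs two loads where a direct access would need one.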
Symbol preemption significantly exacerbates the GOT penalty. Any symbol that may be preempted (that is, any global symbol defined in a shared library) must be treated by the compiler and static linker as if it were imported. Even a locally defined function must be called via the PLT, thereby precluding inlining or specialization. A locally defined variable must be accessed indirectly via the GOT. These restrictions apply to the code generated by the compiler, so the losses generally cannot be recovered even if the symbol is not ultimately preempted.
The penalty due to preemption applies only to shared libraries. Symbols defined in an executable (that is, not a library) cannot be preempted.
Systems employ a handful of techniques to mitigate these effects. In some systems that follow the DLL model (Windows, Palm, Symbian), defined symbols are not considered exported unless specifically declared so.
In UNIX systems (including Linux), all external symbols are potentially dynamically linked, meaning a compiler must generate the inefficient GOT indirection for all such symbols. To alleviate this effect, the UNIX model adopts the import-as-own model, described in Section 15.10.
Toolchains may adopt additional vendor-specific ways of alleviating the preemption penalty, such as options or declaration specifiers that alter the default visibility of extern symbols.
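As one illustration of such a declaration specifier, GCC and Clang provide a visibility attribute, along with the -fvisibility=hidden option to change the default. The sketch below uses that syntax as an assumption; other toolchains may spell the mechanism differently. The macro and function names (LIB_LOCAL, LIB_EXPORT, scale, scale_and_offset) are hypothetical.

```c
/* GCC/Clang attribute syntax, used here as an assumption; consult the
   toolchain documentation for the equivalent specifier or option. */
#define LIB_LOCAL  __attribute__((visibility("hidden")))
#define LIB_EXPORT __attribute__((visibility("default")))

/* Hidden: not part of the library's dynamic interface, so it cannot be
   preempted. The compiler is free to call it directly or inline it. */
LIB_LOCAL int scale(int x) {
    return 3 * x;
}

/* Exported: remains visible to other modules and may still be
   preempted, but its internal call to scale() needs no PLT entry. */
LIB_EXPORT int scale_and_offset(int x, int offset) {
    return scale(x) + offset;
}
```

Marking internal helpers as hidden limits the preemption penalty to the symbols that genuinely form the library's interface.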
The DSBT model introduces overhead in that exported functions must save and restore the DP, a cost of 3 instructions and 2 memory references. There is also the data size overhead of the table itself, which adds N+1 words to the data segment of each executable and library, where N is the maximum index of any library used by the application.
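The table-size overhead can be worked through with a small calculation. The sketch below assumes 32-bit (4-byte) words; the helper names dsbt_words and dsbt_total_bytes are hypothetical and only restate the N+1 rule given above.

```c
#include <stddef.h>

/* Assumed word size: 32 bits. */
enum { DSBT_WORD_BYTES = 4 };

/* Words in each module's DSBT: N+1, where N is the maximum index of
   any library used by the application. */
size_t dsbt_words(size_t max_index) {
    return max_index + 1;
}

/* Total bytes of DSBT storage across all modules (the executable plus
   its libraries), since every module carries its own copy. */
size_t dsbt_total_bytes(size_t module_count, size_t max_index) {
    return module_count * dsbt_words(max_index) * DSBT_WORD_BYTES;
}
```

For example, assuming an executable that uses three libraries whose highest index is 3, each of the four modules carries a 4-word table, so dsbt_total_bytes(4, 3) gives 64 bytes in total.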