
Dynamic relocs, runtime overflows and -fPIC


Philip Withnall
October 01, 2014



While merrily compiling something a little while ago, my linker threw me this gem of an error message (using GNU gold):

error: libmumble.a(libmumble.o): requires dynamic R_X86_64_PC32 reloc against 'g_strdup' which may overflow at runtime; recompile with -fPIC

or, if you’re using GNU ld (the two linkers have different error messages for the same problem):

error: mumble.o: relocation R_X86_64_PC32 against symbol `g_strdup' can not be used when making a shared object; recompile with -fPIC

I recompiled everything with -fPIC, and magically the problem went away. But I didn’t understand why. I finally got a bit of time to investigate, so here we go.

tl;dr: This is caused by linking a shared library (which requires position-independent code, PIC) against a static library (which has not been compiled with PIC). You need to either link the shared library against a shared version of the static code (such as is produced automatically by libtool), or recompile the static library with PIC enabled (-fPIC or -fpic).
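To check which of those flags a given object was actually built with, one option is to ask the preprocessor. The sketch below assumes GCC or Clang, which predefine __PIC__ as 1 for -fpic and 2 for -fPIC, and __PIE__ likewise for -fpie and -fPIE; the function name pic_mode() is hypothetical. (According to the GCC manual, -fpic may hit a machine-specific GOT size limit that -fPIC avoids, although x86 has no such limit.)

```c
/* Hypothetical helper: report the PIC mode this translation unit was
 * compiled in, using macros that GCC and Clang predefine. */
const char *pic_mode(void)
{
#if defined(__PIE__)
    /* Position independent, but intended for executables (-fpie/-fPIE). */
    return __PIE__ == 2 ? "-fPIE" : "-fpie";
#elif defined(__PIC__)
    /* Suitable for shared libraries: 1 means -fpic, 2 means -fPIC. */
    return __PIC__ == 2 ? "-fPIC" : "-fpic";
#else
    return "non-PIC";
#endif
}
```

Compiling this once with plain cc -c and once with cc -fPIC -c should give different answers, unless your compiler builds position-independent executables by default (many distributions now configure GCC that way), in which case the plain build reports -fPIE.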

To understand this, we need a brief introduction to the different types of linking, and how static objects and libraries differ from shared (or dynamic) objects and libraries. Let’s run with a minimal working example: two C files, shared.c and static.c. static.c is compiled to a static archive, libstatic.a (without position-independent code, PIC), and shared.c is compiled to a shared object, libshared.so, which links against libstatic.a.

What is a static object? It’s one where all symbol references are resolved at link time, before the program ever runs. What’s a dynamic object? One where symbol references can be resolved at runtime. This means that dynamic objects have to have relocations performed as they’re loaded, which incurs a load-time penalty, but allows for shared libraries and symbol interposition.

It is these relocations which cause the problem hinted at by the error message above. Each relocation is effectively a note to the runtime loader, instructing it to replace a symbol reference in the dynamic object being loaded with an address calculated at load time.

Relocation types are defined by the platform ABI, since they are specific to the processor’s instruction set. For a more in-depth account of them, see Relocations, Relocations by Michael Guyver, which explains all of this brilliantly. In this case, the compiler chose the R_X86_64_PC32 relocation, which is defined by the AMD64 ABI (Table 4.10). What does that mean? Each relocation type is essentially a formula for computing the value to write at the relocated location, given the information in the various symbol, section and relocation tables in the dynamic object. The ABI defines R_X86_64_PC32 as \(S + A - P\). Less succinctly, it is the offset of the referenced symbol (\(S\)), plus a constant adjustment called the addend (\(A\)), minus the offset of the location being relocated (\(P\)).
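As a worked example of that formula, here is a minimal C sketch of the arithmetic a linker or loader performs for a 4-byte PC-relative field. The function name pc32_value() is hypothetical, and the sketch assumes the usual convention that a call’s relocation carries an addend of -4: P is the address of the 4-byte field itself, but the CPU adds the displacement to the address of the next instruction, which is 4 bytes further on. The example values come from the libshared.so disassembly later in this post, where the call at 0x6ec targets the PLT entry at 0x5f0 (for R_X86_64_PLT32 the ABI uses the PLT entry’s address, L, in place of S).

```c
#include <stdbool.h>
#include <stdint.h>

/* Compute S + A - P for a PC-relative relocation and check that it fits in
 * the signed 32-bit field available in the instruction. Returns false when
 * the value would not fit, i.e. the case the linker warns "may overflow". */
bool pc32_value(uint64_t S, int64_t A, uint64_t P, int32_t *out)
{
    /* Unsigned arithmetic wraps modulo 2^64; the result is then read as a
     * signed distance (the conversion wraps on GCC and Clang). */
    int64_t value = (int64_t)(S + (uint64_t)A - P);

    if (value < INT32_MIN || value > INT32_MAX)
        return false;

    *out = (int32_t)value;
    return true;
}
```

For S = 0x5f0, A = -4 and P = 0x6ed (the byte after the e8 opcode at 0x6ec), this gives -257, or 0xfffffeff, which is stored little-endian as the ff fe ff ff seen in the callq encoding. A target many terabytes away from P, as can happen between a library mapped high in the 64-bit address space and code at a fixed low address, makes it return false.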

So, with our example, we get the error:

$ make libshared.so
cc -Wall -c -o shared.o shared.c
cc -Wall -c -o static.o static.c
ar rcs libstatic.a static.o
cc -shared -o libshared.so shared.o libstatic.a
/usr/bin/ld: error: shared.o: requires dynamic R_X86_64_PC32 reloc against 'my_static_function' which may overflow at runtime; recompile with -fPIC
collect2: error: ld returned 1 exit status
make: *** [libshared.so] Error 1

If we look at the disassembly of the object file shared.o:

$ objdump -d shared.o

shared.o:     file format elf64-x86-64


Disassembly of section .text:

0000000000000000 <my_shared_function>:
   0:    55                       push   %rbp
   1:    48 89 e5                 mov    %rsp,%rbp
   4:    e8 00 00 00 00           callq  9 <my_shared_function+0x9>
   9:    5d                       pop    %rbp
   a:    c3                       retq

we can see at offset 4 that the callq instruction (calling my_static_function()) leaves 4 zeroed bytes for the address of the function to call. (Strictly, callq is instruction-pointer-relative, so the 4 bytes hold the offset of the function from the RIP register, which points at the following instruction; that is why objdump shows the unrelocated call as targeting 9.)
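Decoding such a call by hand makes the instruction-pointer-relative encoding concrete. The sketch below (with a hypothetical function name, call_target()) reads the little-endian 32-bit displacement after the e8 opcode and adds it to the address of the next instruction, which is where RIP points when the call executes. It reproduces objdump’s reading of both calls in this post: the unrelocated one at offset 4 above, and the relocated one at 0x6ec in libshared.so further down.

```c
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

/* Compute the target of a 5-byte "call rel32" (opcode 0xe8) found at
 * address insn_addr. Returns false if the bytes are not such a call. */
bool call_target(const uint8_t insn[5], uint64_t insn_addr, uint64_t *target)
{
    if (insn[0] != 0xe8)
        return false;

    /* Assemble the displacement from its little-endian bytes. */
    uint32_t raw = (uint32_t)insn[1] | (uint32_t)insn[2] << 8 |
                   (uint32_t)insn[3] << 16 | (uint32_t)insn[4] << 24;

    /* Reinterpret the bits as a signed 32-bit offset. */
    int32_t disp;
    memcpy(&disp, &raw, sizeof disp);

    /* RIP holds the address of the following instruction. */
    uint64_t next = insn_addr + 5;
    *target = next + (uint64_t)(int64_t)disp;
    return true;
}
```

For the libshared.so call, the displacement 0xfffffeff is -257, and 0x6f1 - 257 = 0x5f0, the address of my_static_function@plt.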

As the code in libstatic.a is not PIC, it has to be loaded at a fixed offset in a process’s address space. The shared library, libshared.so, must be capable of being loaded anywhere in an address space. This would be fine if the callq instruction could take an absolute address to call, as the linker could substitute in the absolute address of my_static_function() (as is done on 32-bit systems). However, it cannot: it only has 4 bytes of operand to play with (a signed offset reaching about ±2 GiB either side of the call), rather than the 8 needed for an arbitrary 64-bit address, so linking has to fail. And that’s why we get an error which talks about overflow.

What happens if everything is compiled with PIC enabled? Not a whole lot changes, actually. The disassembly of libstatic.a remains unchanged. The relocation in shared.o for the my_static_function() call changes from R_X86_64_PC32 to R_X86_64_PLT32, a procedure linkage table (PLT) relocation, and the linker responds by giving libshared.so a PLT and a global offset table (GOT). We can see that in action in the disassembly of the successfully-linked libshared.so (with irrelevant bits omitted):

$ objdump --disassemble libshared.so 

libshared.so:     file format elf64-x86-64


Disassembly of section .plt:

00000000000005f0 <my_static_function@plt>:
 5f0:    ff 25 fa 13 00 00        jmpq   *0x13fa(%rip)        # 19f0 <_GLOBAL_OFFSET_TABLE_+0x28>
 5f6:    68 02 00 00 00           pushq  $0x2
 5fb:    e9 c0 ff ff ff           jmpq   5c0 <_init+0x20>

Disassembly of section .text:

00000000000006e8 <my_shared_function>:
 6e8:    55                       push   %rbp
 6e9:    48 89 e5                 mov    %rsp,%rbp
 6ec:    e8 ff fe ff ff           callq  5f0 <my_static_function@plt>
 6f1:    5d                       pop    %rbp
 6f2:    c3                       retq   
 6f3:    90                       nop

00000000000006f4 <my_static_function>:
 6f4:    55                       push   %rbp
 6f5:    48 89 e5                 mov    %rsp,%rbp
 6f8:    5d                       pop    %rbp
 6f9:    c3                       retq   
 6fa:    66 90                    xchg   %ax,%ax

The callq instruction in my_shared_function() has now acquired a non-zero operand. This is a constant offset, relative to the instruction pointer (the address of the next instruction), which references the entry for my_static_function() in the PLT, visible as my_static_function@plt in the .plt section. Rather than being the code for my_static_function(), this is actually a ‘trampoline’ which loads the address of my_static_function() from the GOT, then jumps to it. The GOT is set up by the runtime loader, and allows for the address of my_static_function() to be changed; for example when relocating it, or when interposing a different version using LD_PRELOAD. Once resolved, the GOT entry for my_static_function() points to the implementation in the .text section, as linked in from libstatic.a. (With lazy binding, the entry initially points back at the pushq in the PLT stub, so the first call is routed through the dynamic linker’s resolver, which then fills in the real address.)
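The indirection can be modelled in plain C with a function pointer standing in for the GOT slot. This is only a conceptual sketch: every name in it is hypothetical, the functions take an int so there is something to test, and real PLT stubs are linker-generated machine code whose GOT entries are filled in by the dynamic loader. What it does show is why the scheme is both position independent and interposable: callers always go through the stub, and only the GOT slot needs to change.

```c
/* Toy model of a PLT stub jumping through a GOT slot (all names hypothetical). */
typedef int (*fn_ptr)(int);

/* Stands in for the definition linked in from libstatic.a. */
int real_impl(int x) { return x + 1; }

/* Stands in for a definition supplied via LD_PRELOAD. */
int preload_impl(int x) { return x * 100; }

/* The "GOT": one writable slot, initially pointing at the linked-in code. */
fn_ptr got_slot = real_impl;

/* The "PLT stub": a fixed entry point that jumps through the GOT slot. */
int plt_stub(int x) { return got_slot(x); }

/* Library code only ever calls the stub, never the implementation directly. */
int caller(int x) { return plt_stub(x); }

/* What the loader effectively does when another definition wins. */
void interpose(fn_ptr replacement) { got_slot = replacement; }
```

Calling caller(1) returns 2 until interpose(preload_impl) rewrites the slot, after which the same call returns 100 without caller() itself changing.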

This trampolining through a PLT and GOT is the standard solution for producing position independent code, and demonstrates three things:

  • Exported functions incur a runtime cost (in the PLT) on every call. This can be eliminated for private symbols (or internal calls to public symbols, with -Bsymbolic), but not (easily) for public ones, as explained by Ian Lance Taylor. The cost is small: the PLT entry is only three instructions, and once the symbol has been resolved, only the first of them (an indirect jump through the GOT) runs on each call. As it changes control flow, it could be relatively expensive, but modern superscalar 64-bit processors are probably well optimised for it, as the majority of the code they execute makes calls this way. So the cost can be safely ignored for all but rather specific use cases.
  • Position independent code is easy to achieve, and the indirection it requires brings other benefits like the ever-useful LD_PRELOAD, used by developer tools everywhere.
  • Marking internal functions as static is important, because ELF exports functions by default, so internal function calls end up being indirected through the PLT if you omit the static modifier. (Though note that none of the functions here could have been marked as such, as they were all in different compilation units.)
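The last point can be illustrated with three kinds of function definition in a library built with -fPIC. The sketch uses GCC/Clang’s visibility attribute, which this post does not otherwise discuss, and the function names are hypothetical. Roughly: a static function is local to its file, a hidden function is shared between the library’s own object files but left out of its exported symbols, and a default-visibility function is exported and therefore interposable.

```c
/* Local to this file: never exported, so calls to it can be direct. */
static int scale(int x)
{
    return 2 * x;
}

/* Visible to the library's other object files, but hidden from its
 * dynamic symbol table, so it cannot be interposed and needs no PLT entry. */
__attribute__((visibility("hidden")))
int library_internal(int x)
{
    return scale(x) + 1;
}

/* Exported API: interposable, so calls to it from elsewhere in the library
 * normally go through the PLT unless the library is linked with -Bsymbolic. */
int library_api(int x)
{
    return library_internal(x);
}
```

GCC and Clang also accept -fvisibility=hidden to make hidden the default, after which only functions explicitly marked with default visibility are exported.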

So in summary:

  • The “requires dynamic R_X86_64_PC32 reloc against ‘mumble’ which may overflow at runtime; recompile with -fPIC” error is caused by attempting to link a shared library against a static object that was not compiled with PIC.
    • One solution is to compile a position-independent version of the static object. libtool does this automatically, so why aren’t you using libtool?
    • Another (highly related) solution is to link against a shared version of the static object.
  • This isn’t an issue on 32-bit systems, because instruction operands there are wide enough to contain absolute symbol addresses, which can simply be patched in when the library is loaded.
  • Compiling with position independent code introduces a procedure linkage table (PLT) and global offset table (GOT) in each linked shared object or executable, which are very hard to eliminate if you want to avoid the (small) function call overhead they introduce.
    • So you should avoid PIC if compiling for constrained targets like embedded devices.
    • But use it otherwise (e.g. on desktop systems) for the flexibility (the use of shared libraries!) and security (address space layout randomisation) it affords.

Source code for the example here is available on gitorious in the public domain.

 

Original post


Collabora Limited © 2005-2024. All rights reserved. Privacy Notice. Sitemap.