Issue
I am new to C++ symbol tables and libraries and want to understand how symbol tables behave. We have an Android application with native support. While analyzing the symbol tables of our shared libraries, I noticed duplicate symbols in a .so file. Here is a sample from the symbol table:
0162502c w DO .data 00000004 Base boost::asio::error::get_addrinfo_category()::instance
00aaa4f4 w DF .text 0000009c Base boost::asio::error::get_misc_category()
01626334 w DO .bss 00000004 Base guard variable for boost::asio::error::get_misc_category()::instance
00aab4d0 w DF .text 0000003c Base boost::asio::error::detail::misc_category::~misc_category()
00aab368 w DF .text 0000003c Base boost::asio::error::detail::addrinfo_category::~addrinfo_category()
00aab3a4 w DF .text 00000034 Base boost::asio::error::detail::addrinfo_category::name() const
00aab3d8 w DF .text 000000f8 Base boost::asio::error::detail::addrinfo_category::message(int) const
00aab50c w DF .text 0000003c Base boost::asio::error::detail::misc_category::~misc_category()
Here the symbol "boost::asio::error::detail::misc_category::~misc_category()" appears twice.
I want to understand why we get duplicate symbols in the symbol table. I would also like to know why my app runs fine with duplicate symbols (I would expect the linker to report a duplicate-symbol error), and whether duplicate symbols increase the size of the .so and therefore the size of the app.
If they do, how can I make sure the symbol table contains only unique entries? Note: we are using clang.
Solution
I am noticing duplicate symbols present in .so file
Like this?
$ cat foo.c
int foo(void)
{
    return 42;
}
Compile:
$ gcc -Wall -fPIC -c foo.c
Check symbols in the object file for foo:
$ readelf -s foo.o | grep foo
1: 0000000000000000 0 FILE LOCAL DEFAULT ABS foo.c
8: 0000000000000000 11 FUNC GLOBAL DEFAULT 1 foo
One hit.
Make a shared library:
$ gcc -Wall -shared -o libfoo.so foo.o
Check symbols in the shared library for foo:
$ readelf -s libfoo.so | grep foo
5: 000000000000057a 11 FUNC GLOBAL DEFAULT 9 foo
29: 0000000000000000 0 FILE LOCAL DEFAULT ABS foo.c
44: 000000000000057a 11 FUNC GLOBAL DEFAULT 9 foo
Now two hits.
Nothing is wrong here. See some more of the picture:
$ readelf -s foo.o | egrep '(foo|Symbol table|Ndx)'
Symbol table '.symtab' contains 9 entries:
Num: Value Size Type Bind Vis Ndx Name
1: 0000000000000000 0 FILE LOCAL DEFAULT ABS foo.c
8: 0000000000000000 11 FUNC GLOBAL DEFAULT 1 foo
An object file has one symbol table, its static symbol table .symtab, that is used by the linker for link-time symbol resolution. But:
$ readelf -s libfoo.so | egrep '(foo|Symbol table|Ndx)'
Symbol table '.dynsym' contains 11 entries:
Num: Value Size Type Bind Vis Ndx Name
5: 000000000000057a 11 FUNC GLOBAL DEFAULT 9 foo
Symbol table '.symtab' contains 48 entries:
Num: Value Size Type Bind Vis Ndx Name
29: 0000000000000000 0 FILE LOCAL DEFAULT ABS foo.c
44: 000000000000057a 11 FUNC GLOBAL DEFAULT 9 foo
a shared library has two symbol tables: a static symbol table .symtab, like an object file, plus a dynamic symbol table, .dynsym, used by the loader for run-time symbol resolution.
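The two tables that readelf reports can also be counted programmatically. The following is a minimal sketch, assuming a Linux or Android toolchain that provides `<elf.h>`, a 64-bit ELF file, and a file whose byte order matches the host. The names `SymbolTableCounts` and `count_symbol_entries` are hypothetical and not part of any library.

```cpp
#include <elf.h>      // Elf64_Ehdr, Elf64_Shdr, SHT_SYMTAB, SHT_DYNSYM

#include <cassert>    // only needed for the check suggested below
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <fstream>
#include <optional>
#include <string>

// Hypothetical helper type: entry counts for the two kinds of symbol table.
struct SymbolTableCounts {
    std::size_t symtab_entries = 0;  // static table, used at link time
    std::size_t dynsym_entries = 0;  // dynamic table, used by the loader
};

// Returns std::nullopt if the file cannot be read or is not a 64-bit ELF file.
// Assumption: the file uses the same byte order as the host.
std::optional<SymbolTableCounts> count_symbol_entries(const std::string& path) {
    std::ifstream in(path, std::ios::binary);
    Elf64_Ehdr header{};
    if (!in.read(reinterpret_cast<char*>(&header), sizeof header)) {
        return std::nullopt;
    }
    if (std::memcmp(header.e_ident, ELFMAG, SELFMAG) != 0 ||
        header.e_ident[EI_CLASS] != ELFCLASS64) {
        return std::nullopt;
    }

    SymbolTableCounts counts;
    for (std::uint64_t i = 0; i < header.e_shnum; ++i) {
        Elf64_Shdr section{};
        in.seekg(static_cast<std::streamoff>(header.e_shoff + i * header.e_shentsize));
        if (!in.read(reinterpret_cast<char*>(&section), sizeof section)) {
            return std::nullopt;
        }
        if (section.sh_entsize == 0) {
            continue;  // not a table of fixed-size entries
        }
        const auto entries = static_cast<std::size_t>(section.sh_size / section.sh_entsize);
        if (section.sh_type == SHT_SYMTAB) counts.symtab_entries += entries;
        if (section.sh_type == SHT_DYNSYM) counts.dynsym_entries += entries;
    }
    return counts;
}
```

Called on a shared library such as libfoo.so, this should report one count per table, matching the "contains N entries" lines that readelf prints. A stripped library would be expected to show zero .symtab entries while keeping its .dynsym. A quick sanity check is `assert(!count_symbol_entries("/nonexistent").has_value());`.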
When you link object files into a shared library, the linker by default transcribes the GLOBAL symbols from their .symtabs into the .symtab and the .dynsym of the shared library, except for those symbols that have HIDDEN visibility in the object files (which they get from being defined with the attribute of hidden visibility at compilation).
Any GLOBAL symbols with HIDDEN visibility in the object files are transcribed as LOCAL symbols with DEFAULT visibility into the .symtab of the shared library and are not transcribed into the .dynsym of the shared library at all. So when the shared library is linked with anything else, neither the linker nor the loader can see the global symbols that were HIDDEN at compilation.
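As an illustration of how a symbol gets hidden visibility at compilation, here is a minimal sketch using the GCC/Clang `visibility` attribute. The file name `vis.cpp` and the function names `hidden_helper` and `exported_api` are hypothetical.

```cpp
// vis.cpp (hypothetical file name). One possible way to build it:
//   g++ -fPIC -shared -o libvis.so vis.cpp
#include <cassert>    // only needed for the check suggested below

// HIDDEN at compilation: per the description above, this should end up as a
// LOCAL symbol in the library's .symtab and should not appear in .dynsym.
__attribute__((visibility("hidden"))) int hidden_helper() { return 41; }

// DEFAULT visibility: transcribed into both .symtab and .dynsym, so other
// modules can link against it.
__attribute__((visibility("default"))) int exported_api() {
    return hidden_helper() + 1;  // hidden symbols remain usable inside the library
}
```

Inside the library both functions behave normally, so `assert(exported_api() == 42);` holds. Only the view from outside the library changes, which `readelf -s` on the built library should confirm.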
But apart from hidden symbols, of which there are often none, the same global symbols will appear in the .symtab and the .dynsym tables of a shared library. Each defined symbol that appears in both tables addresses the same definition.
Later, OP comments
I took the symbol table by running objdump -T command, which should ideally list symbols present only in dynamic symbol table.
This steers us to a different explanation, because objdump -T does indeed report only the dynamic symbol table (like readelf --dyn-syms).
Notice that the symbol reported twice:
...
00aab4d0 w DF .text 0000003c Base boost::asio::error::detail::misc_category::~misc_category()
...
00aab50c w DF .text 0000003c Base boost::asio::error::detail::misc_category::~misc_category()
...
is classified w in column 2 (as are all the other symbols in your snippet). What objdump means by that is that the symbol is weak.
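Weak binding is also why the linker does not complain about multiple definitions. A minimal sketch, assuming GCC or Clang on an ELF platform; the names `widget`, `default_hook` and `use_widget` are hypothetical:

```cpp
#include <cassert>    // only needed for the check suggested below

// A member function defined inside the class body is implicitly inline. Every
// translation unit that uses it may emit its own copy, and GCC and Clang mark
// those copies weak so the linker keeps one instead of reporting an error.
struct widget {
    int value() const { return 7; }
};

// An explicitly weak definition (GCC/Clang extension). A non-weak definition
// of the same symbol in another object file would take precedence at link time.
extern "C" __attribute__((weak)) int default_hook() { return 0; }

int use_widget() { return widget{}.value() + default_hook(); }
```

With only this file linked in, `assert(use_widget() == 7);` holds. Compiling it with `-c` and running `nm -C` on the object file should show `default_hook` flagged `W`. `widget::value()` should be flagged `W` too if the compiler emits an out-of-line copy, which it may not do when optimization inlines the call.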
Let's reproduce the observation:
foo.hpp
#pragma once
#include <iostream>
struct foo
{
    explicit foo(int i)
    : _i{i}
    {
        std::cout << __PRETTY_FUNCTION__ << std::endl;
    }
    ~foo()
    {
        std::cout << __PRETTY_FUNCTION__ << std::endl;
    }
    int _i = 0;
};
bar.cpp
#include "foo.hpp"
foo bar()
{
    return foo(2);
}
gum.cpp
#include "foo.hpp"
foo gum()
{
    return foo(1);
}
Compile and make a shared library:
$ g++ -Wall -Wextra -c -fPIC bar.cpp gum.cpp
$ g++ -shared -o libbargum.so bar.o gum.o
See what dynamic symbols objdump reports from struct foo:
$ objdump -CT libbargum.so | grep 'foo::'
00000000000009bc w DF .text 0000000000000046 Base foo::foo(int)
00000000000009bc w DF .text 0000000000000046 Base foo::foo(int)
Duplicate weak exports of the constructor foo::foo(int). Just like what you noticed.
Hang on a tick though. foo::foo(int) is a C++ method signature, but not actually a symbol that the linker can recognise. Let's do that again, this time without demangling:
$ objdump -T libbargum.so | grep 'foo'
00000000000009bc w DF .text 0000000000000046 Base _ZN3fooC1Ei
00000000000009bc w DF .text 0000000000000046 Base _ZN3fooC2Ei
Now we see the symbols the linker sees, and the duplication is no longer to be seen: _ZN3fooC1Ei != _ZN3fooC2Ei, although both symbols have the same address and
$ c++filt _ZN3fooC1Ei
foo::foo(int)
$ c++filt _ZN3fooC2Ei
foo::foo(int)
they both demangle to the same thing, foo::foo(int). There are in fact 5 distinct symbols - _ZN3fooCNEi, for 1 <= N <= 5 - that demangle to foo::foo(int).
(And g++ actually uses _ZN3fooC1Ei, _ZN3fooC2Ei and _ZN3fooC5Ei in the object files bar.o and gum.o).
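The many-to-one demangling can also be checked from code. This is a minimal sketch, assuming a GCC or Clang C++ runtime that provides `abi::__cxa_demangle` in `<cxxabi.h>`; the wrapper name `demangle` is hypothetical.

```cpp
#include <cxxabi.h>   // abi::__cxa_demangle

#include <cassert>    // only needed for the check suggested below
#include <cstdlib>
#include <string>

// Returns the demangled form of a mangled C++ symbol name, or the input
// unchanged if the runtime cannot demangle it.
std::string demangle(const char* mangled) {
    int status = 0;
    // With a null output buffer, __cxa_demangle allocates the result with malloc.
    char* text = abi::__cxa_demangle(mangled, nullptr, nullptr, &status);
    if (status != 0 || text == nullptr) {
        return mangled;
    }
    std::string result(text);
    std::free(text);
    return result;
}
```

Both `demangle("_ZN3fooC1Ei")` and `demangle("_ZN3fooC2Ei")` yield `foo::foo(int)`, the same result c++filt gives above, so `assert(demangle("_ZN3fooC1Ei") == demangle("_ZN3fooC2Ei"));` holds even though the mangled names differ.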
So in reality, there are no duplicated symbols in the dynamic symbol table: the sneaky many-to-one nature of the name-demangling mapping just makes it look that way.
But why?
I'm afraid the answer to that is too long and complicated for here.
Executive Summary
The GCC C++ compiler employs the two weak symbols that demangle identically to refer to a global inline class-method in different ways, as part of its stock formula for enabling the successful linkage of global inline class-methods in Position Independent Code. This is a non-negligible problem for any compiler, and the GCC formula for it is not the only possible one. Clang has a different solution, that does not involve the use of synonymous but distinct symbols and so doesn't give rise to the illusory "duplication" of symbols that you've seen.
Answered By - Mike Kinghan