Both C++ and Java provide overloaded functions and methods, which are functions or methods with the same name but different parameter lists. Selecting the correct version is done at compile time. Though the overloaded functions have the same name in the source code, they need to be translated into different assembler-level names, since typical assemblers and linkers cannot handle overloading. This process of encoding the parameter types with the method name into a unique name is called name mangling. The inverse process is called demangling.
It is convenient that C++ and Java use compatible mangling schemes, since this makes life easier for tools such as gdb, and it eases integration between C++ and Java.
Note there is also a standard "Java Native Interface" (JNI), which implements a different calling convention and uses a different mangling scheme. The JNI is a rather abstract ABI so Java can call methods written in C or C++; we are concerned here with a lower-level interface primarily intended for methods written in Java, but that can also be used for C++ (and, less easily, C).
C++ mangles a method by emitting the function name, followed by `__', followed by encodings of any method qualifiers (such as const), followed by the mangling of the method's class, followed by the mangling of the parameters, in order.
For example Foo::bar(int, long) const is mangled as `bar__C3Fooil'.
For a constructor, the method name is left out. A constructor cannot be const-qualified, so no `C' qualifier appears: Foo::Foo(int, long) is mangled as `__3Fooil'.
GNU Java does the same.
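The ordering rule above can be sketched in a few lines of C++. This is only an illustration of the rule as stated, not the compiler's own code: mangle_method and mangle_simple_name are hypothetical helpers, the parameter codes are assumed to be mangled already, and only plain ASCII class names and the const qualifier are handled.

```cpp
#include <string>
#include <vector>

// Hypothetical helper: length-prefixed encoding of a plain ASCII class name.
std::string mangle_simple_name(const std::string& name) {
    return std::to_string(name.size()) + name;
}

// Hypothetical helper: assembles a mangled method name.
// param_codes holds already-mangled parameter types (for example "i", "l").
// Pass an empty method_name to model the constructor case.
std::string mangle_method(const std::string& method_name,
                          bool is_const,
                          const std::string& class_name,
                          const std::vector<std::string>& param_codes) {
    std::string out = method_name;          // left out for constructors
    out += "__";                            // separator after the name
    if (is_const) out += 'C';               // qualifier precedes the class
    out += mangle_simple_name(class_name);  // e.g. "3Foo"
    for (const std::string& code : param_codes) {
        out += code;                        // parameters, in order
    }
    return out;
}
```

With these assumptions, mangle_method("bar", true, "Foo", {"i", "l"}) yields `bar__C3Fooil', matching the example in the text.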
The C++ types int, long, short, char, and long long are mangled as `i', `l', `s', `c', and `x', respectively. The corresponding unsigned types have `U' prefixed to the mangling. The type signed char is mangled `Sc'.
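As a rough illustration of the integral-type codes just listed, the following sketch maps a spelled-out C++ type name to its code. The function name mangle_cxx_integral and the choice of plain strings such as "unsigned long" as input are assumptions made for this example; a real compiler works on its internal type representation, not on strings.

```cpp
#include <map>
#include <string>

// Hypothetical helper: mangles the C++ integral types listed in the text.
// Returns an empty string for any type this sketch does not cover.
std::string mangle_cxx_integral(const std::string& type) {
    static const std::map<std::string, std::string> base = {
        {"int", "i"}, {"long", "l"}, {"short", "s"},
        {"char", "c"}, {"long long", "x"}};
    const std::string unsigned_prefix = "unsigned ";

    if (type == "signed char") return "Sc";  // the one explicitly signed case
    if (type.compare(0, unsigned_prefix.size(), unsigned_prefix) == 0) {
        // Unsigned variant: 'U' followed by the code of the base type.
        auto it = base.find(type.substr(unsigned_prefix.size()));
        return it == base.end() ? std::string() : "U" + it->second;
    }
    auto it = base.find(type);
    return it == base.end() ? std::string() : it->second;
}
```

For instance, mangle_cxx_integral("unsigned long") gives `Ul' and mangle_cxx_integral("long long") gives `x'.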
The C++ and Java floating-point types float and double are mangled as `f' and `d' respectively.
The C++ bool type and the Java boolean type are mangled as `b'.
The C++ wchar_t and the Java char types are mangled as `w'.
The Java integral types byte, short, int and long are mangled as `c', `s', `i', and `x', respectively.
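The Java side of the table can be summarized in a small lookup. The sketch below is only a restatement of the codes given in the text; mangle_java_primitive is a hypothetical name. Note how Java long shares `x' with C++ long long and Java byte shares `c' with C++ char.

```cpp
#include <map>
#include <string>

// Hypothetical helper: maps a Java primitive type name to the one-letter
// code given in the text. Returns an empty string for any other name.
std::string mangle_java_primitive(const std::string& type) {
    static const std::map<std::string, std::string> codes = {
        {"boolean", "b"}, {"char", "w"}, {"byte", "c"}, {"short", "s"},
        {"int", "i"}, {"long", "x"}, {"float", "f"}, {"double", "d"}};
    auto it = codes.find(type);
    return it == codes.end() ? std::string() : it->second;
}
```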
C++ code that has included javatypes.h will mangle the typedefs jbyte, jshort, jint and jlong as `c', `s', `i', and `x', respectively. (This has not been implemented yet.)
A simple class, package, template, or namespace name is encoded as the number of characters in the name, followed by the actual characters. Thus the class Foo is encoded as `3Foo'.
If any of the characters in the name are not alphanumeric (i.e. not one of the standard ASCII letters, digits, or '_'), or the initial character is a digit, then the name is mangled as a sequence of encoded Unicode letters. A Unicode encoding starts with a `U' to indicate that Unicode escapes are used, followed by the number of bytes used by the Unicode encoding, followed by the bytes representing the encoding. ASCII letters and non-initial digits are encoded without change. However, all other characters (including underscore and initial digits) are translated into a sequence starting with an underscore, followed by the big-endian 4-hex-digit lower-case encoding of the character.
If a method name contains Unicode-escaped characters, the entire mangled method name is followed by a `U'.
For example, the method X\u0319::M\u002B(int) is encoded as `M_002b__U6X_0319iU'.
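The name-encoding rules above (length prefix for plain names, `U' plus escapes otherwise) might be sketched as follows. This is an illustrative reading of the text, not the compiler's implementation: the helper names is_plain, escape_name and mangle_name are hypothetical, and names are assumed to arrive as UTF-16 code units in a std::u16string.

```cpp
#include <cstddef>
#include <cstdio>
#include <string>

bool is_ascii_letter(char16_t c) {
    return (c >= u'A' && c <= u'Z') || (c >= u'a' && c <= u'z');
}
bool is_ascii_digit(char16_t c) { return c >= u'0' && c <= u'9'; }

// True if the name needs no escapes: only ASCII letters, digits and '_',
// and the first character is not a digit.
bool is_plain(const std::u16string& name) {
    if (!name.empty() && is_ascii_digit(name[0])) return false;
    for (char16_t c : name) {
        if (!is_ascii_letter(c) && !is_ascii_digit(c) && c != u'_') return false;
    }
    return true;
}

// Escaped form: ASCII letters and non-initial digits are copied unchanged;
// every other character becomes '_' plus four lower-case hex digits.
std::string escape_name(const std::u16string& name) {
    std::string out;
    for (std::size_t i = 0; i < name.size(); ++i) {
        char16_t c = name[i];
        if (is_ascii_letter(c) || (i > 0 && is_ascii_digit(c))) {
            out += static_cast<char>(c);
        } else {
            char buf[8];
            std::snprintf(buf, sizeof buf, "_%04x", static_cast<unsigned>(c));
            out += buf;
        }
    }
    return out;
}

// Class, package or namespace name: "<length><chars>" when plain,
// otherwise 'U' + length of the escaped form + the escaped form.
std::string mangle_name(const std::u16string& name) {
    if (is_plain(name)) {
        std::string ascii;
        for (char16_t c : name) ascii += static_cast<char>(c);
        return std::to_string(ascii.size()) + ascii;
    }
    std::string escaped = escape_name(name);
    return "U" + std::to_string(escaped.size()) + escaped;
}
```

Under these assumptions, mangle_name(u"X\u0319") gives `U6X_0319' and escape_name(u"M+") gives `M_002b'. Joining the escaped method name, `__', the class encoding, the parameter code `i' and the trailing `U' reproduces the example in the text.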
A C++ pointer type is mangled as `P' followed by the mangling of the type pointed to.
A C++ reference type is mangled as `R' followed by the mangling of the type referenced.
A Java object reference type is equivalent to a C++ pointer parameter, so we mangle such a parameter type as `P' followed by the mangling of the class name.
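The pointer and reference rules are simple prefixes, which the following sketch makes explicit. The three function names are hypothetical, and each takes the already-mangled code of the underlying type as a string.

```cpp
#include <string>

// Hypothetical helpers: each takes the already-mangled code of the
// underlying type and adds the prefix described in the text.
std::string mangle_pointer(const std::string& pointee_code) {
    return "P" + pointee_code;
}

std::string mangle_reference(const std::string& referent_code) {
    return "R" + referent_code;
}

// A Java object reference is treated like a C++ pointer to the class.
std::string mangle_java_object_ref(const std::string& class_code) {
    return mangle_pointer(class_code);
}
```

So a C++ parameter of type int* would come out as `Pi', and a Java parameter of class type Foo as `P3Foo', the same as the C++ type Foo*.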
Both C++ and Java allow a class to be lexically nested inside another class. C++ also supports namespaces (not yet implemented by G++). Java also supports packages.
These are all mangled the same way: First the letter `Q' indicates that we are emitting a qualified name. That is followed by the number of parts in the qualified name. If that number is 9 or less, it is emitted with no delimiters. Otherwise, an underscore is written before and after the count. Then follows each part of the qualified name, as described above.
For example Foo::\u0319::Bar is encoded as `Q33FooU5_03193Bar'.
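A minimal sketch of the qualified-name rule, assuming each part has already been encoded individually (for example `3Foo' or `U5_0319'). The function name mangle_qualified is hypothetical.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical helper: joins already-encoded name parts into a qualified name.
std::string mangle_qualified(const std::vector<std::string>& encoded_parts) {
    std::string out = "Q";
    const std::size_t count = encoded_parts.size();
    if (count <= 9) {
        out += std::to_string(count);              // single digit, no delimiters
    } else {
        out += "_" + std::to_string(count) + "_";  // larger counts are delimited
    }
    for (const std::string& part : encoded_parts) {
        out += part;
    }
    return out;
}
```

With this sketch, mangle_qualified({"3Foo", "U5_0319", "3Bar"}) gives `Q33FooU5_03193Bar', and the Java class java.lang.String becomes `Q34java4lang6String'.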
A class template instantiation is encoded as the letter `t', followed by the encoding of the template name, followed by the number of template parameters, followed by the encoding of the template parameters. If a template parameter is a type, it is written as a `Z' followed by the encoding of the type.
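The class-template rule can be illustrated with a short sketch. It assumes a plain ASCII template name and handles type parameters only, since the text gives no encoding for non-type parameters here; mangle_class_template is a hypothetical name, and the arguments are passed as already-mangled type codes.

```cpp
#include <string>
#include <vector>

// Hypothetical helper: encodes a class template instantiation whose
// parameters are all types. type_arg_codes holds already-mangled types.
std::string mangle_class_template(const std::string& template_name,
                                  const std::vector<std::string>& type_arg_codes) {
    std::string out = "t";
    out += std::to_string(template_name.size()) + template_name;  // e.g. "6JArray"
    out += std::to_string(type_arg_codes.size());                 // parameter count
    for (const std::string& code : type_arg_codes) {
        out += "Z" + code;                                        // 'Z' marks a type
    }
    return out;
}
```

For example, mangle_class_template("JArray", {"i"}) gives `t6JArray1Zi'.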
A function template specialization (either an instantiation or an explicit specialization) is encoded by an `H' followed by the encoding of the template parameters, as described above, followed by an `_', the encoding of the argument types of the function template (not the specialization), another `_', and the return type. (Like the argument types, the return type is the return type of the function template, not the specialization.) Template parameters in the argument and return types are encoded by an `X' for type parameters, or a `Y' for constant parameters, and an index indicating their position in the template parameter list declaration.
C++ array types are mangled by emitting `A', followed by the length of the array, followed by an `_', followed by the mangling of the element type. Of course, array parameter types normally decay into pointer types, so you don't see this.
Java arrays are objects. A Java type T[] is mangled as if it were the C++ type JArray<T>.
For example java.lang.String[] is encoded as `Pt6JArray1ZPQ34java4lang6String'.
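Putting the pointer and template rules together, a Java array type might be encoded as in the sketch below. The function name mangle_java_array is hypothetical, and the element type is passed as an already-mangled code (a primitive code such as `i', or `P' plus a class encoding for object elements). The leading `P' reflects that the array itself is a Java object reference.

```cpp
#include <string>

// Hypothetical helper: encodes the Java type T[] as a pointer to JArray<T>.
// element_code is the already-mangled element type.
std::string mangle_java_array(const std::string& element_code) {
    // 'P' (object reference) + 't' + "6JArray" + one parameter + 'Z' + type.
    return "Pt6JArray1Z" + element_code;
}
```

With the element code `PQ34java4lang6String' for java.lang.String, this reproduces the example in the text; following the same rule, int[] would come out as `Pt6JArray1Zi'.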
The following special characters are used in mangling:
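`b': the C++ bool type, and the Java boolean type.
`c': the C++ char type, and the Java byte type.
`C': a modifier to indicate a const type. Also used to indicate a const member function (in which case it precedes the encoding of the method's class).
`d': the C++ and Java double types.
`e': extra unknown arguments, written `...'.
`f': the C++ and Java float types.
`i': the C++ and Java int types.
`l': the C++ long type.
`r': the GNU C++ long double type.
`s': the C++ and Java short types.
`S': a modifier indicating that the following integer type is signed; only used with char. Also used as a modifier to indicate a static member function.
`v': the C++ and Java void types.
`V': a modifier for a volatile type or method.
`w': the C++ wchar_t type, and the Java char type.
`x': the C++ long long type, and the Java long type.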
The letters `G', `M', `O', and `p' also seem to be used for obscure purposes ...