DEV Community

Prashant Sharma

Is there a way to determine if a shared object (.so) file was generated from C or C++ code?

Shared object files (.so) are the backbone of dynamic linking in Linux and Unix systems. They allow programs to share code libraries efficiently, reducing memory usage and enabling modular development. But have you ever wondered whether a .so file was compiled from C or C++ source code?

This question, asked on Stack Overflow, is more nuanced than it seems. While there’s no foolproof method, several techniques can help you make an educated guess. Let’s explore them.


Why Does It Matter?

Before diving into solutions, let’s understand why this distinction is useful:

  1. Debugging & Reverse Engineering – Knowing the original language helps when analyzing binaries or debugging crashes.
  2. Dependency Management – Some tools or scripts may behave differently based on the language.
  3. Security Audits – C++ binaries might use different runtime features (e.g., exceptions, RTTI) than C.
  4. Compatibility Checks – Ensuring ABI (Application Binary Interface) compatibility between libraries.

Methods to Identify the Source Language

1. Check for C++-Specific Symbols

C++ introduces name mangling to support function overloading, namespaces, and classes. Tools like nm or objdump can reveal these mangled names.

Using nm:

bash
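nm -D /path/to/library.so | grep -E '_Z|__cxa_throw|__cxa_begin_catch|__gxx_personality_v0'

  • _Z prefixes indicate mangled C++ symbols under the Itanium C++ ABI used by GCC and Clang (e.g., _Z3fooi for foo(int)). Add -C to nm to print demangled names.
  • __cxa_throw, __cxa_begin_catch, and __gxx_personality_v0 belong to the C++ exception-handling runtime. Matching on __cxa alone is too broad: __cxa_finalize and __cxa_atexit are provided by the C library and show up in plain C shared objects too.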

Using objdump:

bash
objdump -tT /path/to/library.so | grep -E '_Z|__cxa_throw|__cxa_begin_catch|__gxx_personality_v0'

Takeaway: If you see mangled names or C++ ABI symbols, the .so was likely compiled from C++.
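
For scripting, the symbol check can be wrapped in a small function. The following is a minimal sketch, assuming GNU binutils nm output (the symbol name is the last field on each line) and Itanium C++ ABI mangling; the function name count_cxx_symbols is hypothetical.

```shell
# Sketch: count Itanium-ABI mangled names in `nm -D` style output read from stdin.
# Assumption: the symbol name is the last whitespace-separated field on each line.
count_cxx_symbols() {
    awk '
        NF >= 2 { total++; if ($NF ~ /^_Z/) mangled++ }
        END { printf "%d %d\n", mangled + 0, total + 0 }
    '
}

# Usage (not run here):
#   nm -D /path/to/library.so | count_cxx_symbols
# Prints "<mangled> <total>".
```

A nonzero first number suggests C++ code is present. A handful of mangled names among many plain ones may point to a mostly C library with a small C++ component, or to C++ hidden behind a C-style interface.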


2. Inspect the ELF Header and Sections

ELF (Executable and Linkable Format) files contain metadata that can hint at the language.

Using readelf:

bash
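readelf -p .comment /path/to/library.so
readelf --debug-dump=info /path/to/library.so | grep -E 'DW_AT_(language|producer)'

Look for:

  • .comment section: Usually contains the compiler version string (e.g., GCC: (GNU) 11.3.0). gcc and g++ write the same string, so it identifies the toolchain rather than the language.
  • DWARF debug info (.debug_info): If the library was built with -g and not stripped, each compile unit records DW_AT_language and its source file name (.cpp vs .c). A .gnu.debuglink section only names a separate debug file, which must be available for this check.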

Using file:

bash
file /path/to/library.so

Output like ELF 64-bit LSB shared object, x86-64 won’t directly reveal the language, but it’s a starting point.

Takeaway: Compiler metadata can indirectly suggest the language, but it’s not definitive.
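
When a library still carries DWARF debug information, each compile unit records its source language in a DW_AT_language attribute, which is a more direct hint than the compiler version string. The sketch below summarizes those values. It assumes the line layout printed by GNU readelf, which may differ between versions, and the function name dwarf_languages is hypothetical.

```shell
# Sketch: list the distinct DW_AT_language values found in
# `readelf --debug-dump=info` output read from stdin.
# Assumption: GNU readelf prints lines such as
#   <10>   DW_AT_language    : 12   (ANSI C99)
# and the language name is the text inside the final parentheses.
dwarf_languages() {
    sed -n 's/.*DW_AT_language.*(\(.*\))[[:space:]]*$/\1/p' | sort -u
}

# Usage (not run here):
#   readelf --debug-dump=info /path/to/library.so | dwarf_languages
```

Empty output means no debug information was found, which is common for release builds. Output listing both a C and a C++ variant indicates a mixed-language library.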


3. Analyze Runtime Dependencies

C++ binaries often link against libstdc++ (GCC) or libc++ (LLVM).

Using ldd:

bash
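ldd /path/to/library.so | grep -E 'libstdc\+\+|libc\+\+'

If these libraries appear, the .so was likely compiled from C++. The plus signs are escaped because + is a regex quantifier; an unescaped libc++ would also match libc.so.6, which nearly every library depends on. Keep in mind that ldd lists transitive dependencies and should not be run on untrusted binaries (see its man page). readelf -d /path/to/library.so | grep NEEDED is a static alternative that shows only direct dependencies.

Takeaway: A direct dependency on a C++ runtime is a strong indicator of C++, but its absence proves little, since the runtime can be linked statically (e.g., with -static-libstdc++).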
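
As a static alternative to ldd, the direct dependencies recorded in the dynamic section can be filtered for a C++ runtime. This is a minimal sketch, assuming the NEEDED line format printed by GNU readelf and a grep that supports -o (GNU and BSD grep do); the function name needed_cxx_runtime is hypothetical.

```shell
# Sketch: report which C++ runtime, if any, appears among the direct
# dependencies in `readelf -d` output read from stdin.
# Assumption: GNU readelf prints entries such as
#   0x0000000000000001 (NEEDED)  Shared library: [libstdc++.so.6]
needed_cxx_runtime() {
    # The escaped plus signs keep libc.so.6 from matching.
    grep -F '(NEEDED)' | grep -E -o 'lib(stdc|c)\+\+(abi)?\.so[^]]*' || echo "none"
}

# Usage (not run here):
#   readelf -d /path/to/library.so | needed_cxx_runtime
```

The function prints each matching library name on its own line, or none when no C++ runtime is listed as a direct dependency.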


4. Check for C++-Specific Features

C++ introduces features absent in C, such as:

  • Exceptions: Look for __cxa_throw or __cxa_begin_catch.
  • RTTI (Run-Time Type Information) and virtual functions: Symbols starting with _ZTI (typeinfo), _ZTS (typeinfo name), or _ZTV (vtable).
  • Standard Library Usage: Strings (std::string), containers (std::vector), etc.

Using strings:

bash
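strings /path/to/library.so | grep -E '_ZTI|_ZTV|_ZSt|_ZNSt|_ZNKSt'

Standard library names are stored mangled, so search for the _ZSt, _ZNSt, and _ZNKSt prefixes rather than the literal text std::, or demangle first with nm -DC.

Takeaway: These features are almost exclusively C++.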
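
To see which of these features a library appears to use, the individual patterns can be grouped by category. The following is a rough sketch, assuming Itanium C++ ABI mangling; the function name cxx_features and the category labels are hypothetical.

```shell
# Sketch: name the C++ features hinted at by symbol names read from stdin
# (for example the output of `nm -D` or `strings`).
# Assumption: Itanium C++ ABI mangling, as used by GCC and Clang on Linux.
cxx_features() {
    awk '
        /__cxa_throw|__cxa_begin_catch|__gxx_personality_v0/ { exc = 1 }
        /_ZTI|_ZTS/                                          { rtti = 1 }
        /_ZTV/                                               { vtbl = 1 }
        /_ZSt|_ZNSt|_ZNKSt/                                  { stl = 1 }
        END {
            if (exc)  print "exceptions"
            if (rtti) print "rtti"
            if (vtbl) print "vtables"
            if (stl)  print "standard-library"
        }
    '
}

# Usage (not run here):
#   nm -D /path/to/library.so | cxx_features
```

Substring matching like this is only a heuristic, so treat the result as a hint rather than proof.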


5. Use Binary Analysis Tools

Advanced tools include:

  • Ghidra (NSA’s reverse engineering tool)
  • IDA Pro (commercial disassembler)
  • radare2 (open-source framework)

These tools can disassemble or decompile parts of the .so and reveal language-specific patterns (e.g., C++ class layouts, virtual tables).

Takeaway: For deep analysis, these tools are invaluable but require expertise.


Limitations and Caveats

  1. Mixed-Language Binaries: A .so might contain both C and C++ code (e.g., a C++ library with C-compatible APIs).
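  2. Compiler and Linker Options: Inlining, link-time optimization, and hidden visibility (-fvisibility=hidden) can keep internal functions out of the dynamic symbol table.
  3. Stripped Binaries: strip removes the static symbol table and debug info. The dynamic symbols that nm -D reads remain, but clues such as DW_AT_language and source file names are gone.
  4. extern "C" Blocks: C++ functions declared extern "C" are exported with unmangled names, so a C++ library can expose a purely C-looking interface.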

Pro Tip: Combine multiple methods for higher confidence.


Developer Takeaways

  1. For Debugging: Start with nm and ldd for quick checks.
  2. For Security: Use readelf and strings to audit dependencies.
  3. For Reverse Engineering: Leverage Ghidra or IDA Pro for deep dives.
  4. For Automation: Script these checks (e.g., grep for _Z in nm output).

Conclusion

While there’s no 100% reliable way to determine if a .so file was generated from C or C++, combining symbol analysis, ELF metadata, and runtime dependencies can give you a strong indication. C++ leaves more distinctive fingerprints (mangled names, RTTI, exceptions), whereas C binaries tend to be simpler.

Happy hacking! 🚀
