Demystifying C++ - Virtual Classes (Polymorphism)

From emmtrix Wiki

Polymorphism, a core concept of object-oriented programming (OOP), allows different objects to be treated uniformly, typically through a common interface. C++ natively supports polymorphism through its class syntax, especially with the use of virtual methods.

C has no native support for polymorphism because it lacks classes and virtual methods. Polymorphism can nevertheless be emulated in C with structures, function pointers, and manually initialized virtual function tables, which is essentially what a C++ compiler generates at the assembly level. The C code in the following examples is simplified: global variable names have been changed, and virtual function table elements that are only relevant to multiple inheritance are omitted.

To implement polymorphism in C, a hidden pointer member, __vptr, is first inserted into the base class. It points to a table of function pointers, the virtual function table. Which table it points to depends on the object's actual class, that is, on which class was derived from Base and which virtual methods that class overrides. The derived class does not need a __vptr of its own; it reuses the one in its base class subobject.

Virtual Classes
class Base {
public:
  virtual void func1() {}
  virtual void func2() {}
};

class Derived : public Base {
public:
  void func1() override {}
  virtual void func3() {}
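};

Equivalent C code:
/* Forward declarations so that the function pointer types below refer to
   the file-scope struct types. */
struct Base;
struct Derived;

struct Base_vftable1 {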
    void (*func1)(struct Base *);
    void (*func2)(struct Base *);
};

struct Derived_vftable1 {
    struct Base_vftable1 base1;
    void (*func3)(struct Derived *);
};

struct Base {
    const struct Base_vftable1 *__vptr;
};

struct Derived {
    struct Base __base1;
};
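
The claim that Derived reuses the __vptr of Base can be checked indirectly through object sizes. The following is a minimal sketch, assuming a common ABI such as the Itanium C++ ABI used by GCC and Clang; the C++ standard does not guarantee these sizes, and the helper function names are hypothetical.

```cpp
// Same class shapes as in the article.
struct Base {
    virtual void func1() {}
    virtual void func2() {}
};

struct Derived : Base {
    void func1() override {}
    virtual void func3() {}
};

// Assumption: on common ABIs a polymorphic class without data members
// consists only of the hidden table pointer.
bool base_holds_one_pointer() { return sizeof(Base) == sizeof(void*); }

// Assumption: with single inheritance the derived class adds no second
// table pointer, even though it declares a new virtual function.
bool derived_adds_no_vptr() { return sizeof(Derived) == sizeof(Base); }
```

On such an ABI both helpers return true: the number of virtual functions affects the size of the table, not the size of each object.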

If a class with virtual methods does not declare a constructor, the compiler generates an implicit one. A constructor first calls the constructors of the base classes and then sets the __vptr. In this example, Derived_ctor_complete first calls Base_ctor_base, so the __vptr is set twice in sequence: first to the table for Base, then to the table for Derived. Although this may initially seem inefficient and unintuitive, there is a crucial reason for it: while the Base constructor runs, virtual calls must resolve to the Base versions, because the Derived part of the object has not been initialized yet. In practice, the Base constructor often makes no virtual calls, and later compiler phases can then optimize away the first __vptr store, so the double initialization typically costs nothing.

Constructor Implicit
const struct Base_vftable1 Base_vtable = {&Base_func1, &Base_func2};
const struct Derived_vftable1 Derived_vtable = {{&Derived_func1, &Base_func2}, &Derived_func3};

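/* Base-object constructor: installs the table for Base. */
static inline void Base_ctor_base(struct Base *this) {
    this->__vptr = &Base_vtable;
}

/* Complete-object constructor: constructs the base subobject first,
   then overwrites __vptr with the table for Derived. */
static inline void Derived_ctor_complete(struct Derived *this) {
    Base_ctor_base(&this->__base1);
    this->__base1.__vptr = &Derived_vtable.base1;
}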
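
The effect of setting the __vptr in two steps can be observed from ordinary C++. The following is a minimal sketch; the member seen_in_ctor, the virtual function name() and the two helper functions are hypothetical names introduced only for illustration.

```cpp
#include <string>

struct Base {
    std::string seen_in_ctor;

    // Virtual call while only the Base part of the object exists.
    Base() { seen_in_ctor = name(); }
    virtual ~Base() = default;

    virtual std::string name() const { return "Base"; }
};

struct Derived : Base {
    std::string name() const override { return "Derived"; }
};

// What the Base constructor saw when it called name().
std::string name_seen_during_construction() {
    Derived d;
    return d.seen_in_ctor;
}

// What a virtual call through Base sees once construction has finished.
std::string name_seen_after_construction() {
    Derived d;
    const Base& b = d;
    return b.name();
}
```

The first helper returns "Base" and the second returns "Derived", which matches the order of the two __vptr stores described above.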

Calling a virtual method is relatively simple: the function pointer is loaded from the table that __vptr points to, and an indirect function call is made through it, with the object passed as the this argument.

Virtual Method Call
void call_virtual_method(Base* base) {
  base->func1();
}

Equivalent C code:
void call_virtual_method(struct Base *base) {
    (*base->__vptr->func1)(base);
}
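
The pieces shown so far (table, constructor, indirect call) can be combined into one small program. The following is a hand-written sketch in a C-like subset of C++, not compiler output. It assumes a single table type for brevity, uses vptr and self instead of __vptr and this because those names are reserved in C++, and lets the functions return integers so that the dispatch is observable.

```cpp
struct Base;

// Table of function pointers, one slot per virtual function.
struct BaseVTable {
    int (*func1)(const Base*);
    int (*func2)(const Base*);
};

struct Base {
    const BaseVTable* vptr;  // stands in for the article's __vptr
};

struct Derived {
    Base base;  // base subobject first, so its vptr is reused
};

int Base_func1(const Base*) { return 1; }
int Base_func2(const Base*) { return 2; }
int Derived_func1(const Base*) { return 10; }  // overrides func1 only

const BaseVTable Base_vtable = {&Base_func1, &Base_func2};
const BaseVTable Derived_vtable = {&Derived_func1, &Base_func2};

void Base_ctor(Base* self) { self->vptr = &Base_vtable; }

void Derived_ctor(Derived* self) {
    Base_ctor(&self->base);              // first store: table for Base
    self->base.vptr = &Derived_vtable;   // second store: table for Derived
}

// Indirect calls: load the slot from the table, then call through it.
int call_func1(const Base* b) { return b->vptr->func1(b); }
int call_func2(const Base* b) { return b->vptr->func2(b); }

int func1_on_base() { Base b; Base_ctor(&b); return call_func1(&b); }
int func1_on_derived() { Derived d; Derived_ctor(&d); return call_func1(&d.base); }
int func2_on_derived() { Derived d; Derived_ctor(&d); return call_func2(&d.base); }
```

call_func1 contains no knowledge of Derived, yet func1_on_derived returns 10 because the object's table pointer selects the override, while func2_on_derived returns 2 because that slot still holds the Base version.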

Indirect function calls are not only more expensive for the processor than direct calls but also prevent compiler optimizations such as inlining. The compiler therefore tries to resolve them at compile time whenever possible, a technique called devirtualization. C++11 introduced the 'final' specifier, which can be applied to both virtual methods and classes. It is not only a design safeguard, comparable to 'public', 'private', and 'protected', but also helps the compiler devirtualize by guaranteeing that a method cannot be overridden or that a class cannot be derived from.

Devirtualized Method Call
void call_devirtualized_method() {
  Derived d;

  d.func1();
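}

Equivalent C code after devirtualization:
void call_devirtualized_method(void) {
  struct Derived d;

  Derived_ctor_complete(&d);
  /* The dynamic type of d is known to be Derived: direct call, no __vptr lookup. */
  Derived_func1(&d);
}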
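
The use of 'final' can be sketched as follows. The class shapes follow the article, while the integer return values and the helper function names are hypothetical and only make the result observable. Whether a particular compiler actually devirtualizes these calls depends on the compiler and its optimization settings; the program's behavior is the same either way.

```cpp
struct Base {
    virtual ~Base() = default;
    virtual int func1() const { return 1; }
};

// 'final' on the class: nothing can derive from Derived, so a Derived&
// always refers to an object whose dynamic type is exactly Derived.
struct Derived final : Base {
    int func1() const override { return 2; }
};

// The static type is a final class, so the compiler is permitted to call
// Derived::func1 directly (and possibly inline it).
int call_through_final(const Derived& d) { return d.func1(); }

// Only Base is known here, so in general the call goes through the table.
int call_through_base(const Base& b) { return b.func1(); }

int result_through_final() { Derived d; return call_through_final(d); }
int result_through_base() { Derived d; return call_through_base(d); }
```

Both helpers return 2. Devirtualization changes how the call is made, not which function is called.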

In C++, the dynamic_cast operator performs a checked cast from a base class to a derived class. Behind the scenes, the compiler calls the helper function __dynamic_cast, which is provided by the C++ ABI runtime library (libcxxabi in the LLVM project). It receives the run-time type information (RTTI) of the two classes, which describes the entire class hierarchy. The base class is then searched for within the class hierarchy of the derived class, and if a match is found, the pointer is adjusted by an offset. If the cast fails, __dynamic_cast returns a null pointer; for a cast to a reference type, the generated code then throws std::bad_cast. In the example, the RTTI is represented by the global variables typeinfo_*, whose contents are not shown. Interested readers can view the LLVM implementation of the __dynamic_cast function at [1].

Dynamic Cast
Derived& dynamic_cast_ref(Base& base) {
  return dynamic_cast<Derived&>(base);
}

Equivalent C code:
struct Derived* dynamic_cast_ref(struct Base* base) {
    struct Derived* tmpPtr1 = __dynamic_cast(base, &typeinfo_Base, &typeinfo_Derived, 0);
    if (!tmpPtr1) {
        /* A reference cannot be null, so a failed cast throws std::bad_cast. */
        __cxa_bad_cast();
    }
    return tmpPtr1;
}
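
The two failure modes of dynamic_cast can be compared in a short C++ sketch. The class Other and the helper functions are hypothetical names introduced for illustration; Base has a virtual destructor here because dynamic_cast requires a polymorphic source type.

```cpp
#include <typeinfo>  // std::bad_cast

struct Base { virtual ~Base() = default; };
struct Derived : Base {};
struct Other : Base {};  // unrelated sibling, used to make the cast fail

// Pointer form: a failed cast yields a null pointer.
bool pointer_cast_succeeds(Base* b) {
    return dynamic_cast<Derived*>(b) != nullptr;
}

// Reference form: a failed cast throws std::bad_cast.
bool reference_cast_throws(Base& b) {
    try {
        (void)dynamic_cast<Derived&>(b);
        return false;
    } catch (const std::bad_cast&) {
        return true;
    }
}

bool pointer_cast_on_derived() { Derived d; return pointer_cast_succeeds(&d); }
bool pointer_cast_on_other() { Other o; return pointer_cast_succeeds(&o); }
bool reference_cast_on_derived_throws() { Derived d; return reference_cast_throws(d); }
bool reference_cast_on_other_throws() { Other o; return reference_cast_throws(o); }
```

The pointer cast succeeds only for the Derived object, and the reference cast throws only for the Other object, which corresponds to the null check followed by __cxa_bad_cast in the C code above.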