Hot reloading: polymorphism

This is part of an ongoing series on hot reloading, a core feature of Sanable Engine. It is closely related to Unreal’s live coding and Visual Studio’s edit-and-continue, but where those systems patch only the machine code that changed, Sanable’s hot reload unloads and reloads an entire DLL while also fixing object layouts.

The most fundamental issue of any hot-reload or live coding system is that code may grow or shrink. When a DLL is unloaded and loaded again, we cannot guarantee that it will land at the same memory address. This means that if any polymorphic objects from that DLL are still alive, they are now ticking time bombs. The next time our program uses virtual functions, dynamic_cast, or typeid on them, we’d crash [1], because the built-in mechanisms behind those features now point to bad memory.

Some tools like Visual Studio/MSVC pad their functions ahead of time so they can keep the existing machine code loaded: if a function shrinks they can just rewrite it in place, and if it grows they can redirect it to a new block of memory. Others like Unity avoid the problem entirely by serializing before a reload, then deserializing to restore state, but anything unserializable is lost.

Almost all modern compilers implement virtual dispatch (override polymorphism) with a vtable, which typically also anchors the type’s runtime type info. Short for “virtual function table”, it contains pointers to the correct version of each virtual function for the corresponding type. By convention, compilers quietly insert a pointer to that type’s vtable (a vptr) at the start of the object layout. After a reload this pointer still refers to the old DLL’s address range, which is what causes the access violation exception.
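To make the hidden pointer a little more tangible, here is a minimal sketch that peeks at the first pointer-sized word of an object. The names Base, Derived, and peekVptr are made up for illustration, and the code assumes the vptr sits at offset 0, which holds on the common ABIs but is not promised by the C++ standard.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

// Hypothetical example types, not part of Sanable.
struct Base {
    virtual ~Base() {}
    virtual int id() const { return 1; }
};

struct Derived : Base {
    int id() const override { return 2; }
};

// Copies out the first pointer-sized word of a polymorphic object.
// ASSUMPTION: the compiler stores the vptr at offset 0.
template <class T>
std::uintptr_t peekVptr(const T& obj) {
    static_assert(sizeof(T) >= sizeof(void*), "object too small to hold a vptr");
    std::uintptr_t raw = 0;
    std::memcpy(&raw, static_cast<const void*>(&obj), sizeof(raw));
    return raw;
}
```

Under that assumption, two Base objects should report the same word, while a Derived object should report a different one, since each type gets its own vtable.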

We can see vptrs in full detail using Visual Studio’s new memory layout viewing feature. Note: Base (vfptr) is shorthand here for Derived1-in-Base (vfptr), which is relevant for multiple inheritance.

Sanable addresses the reload problem using a technique called vptr jamming or pointer hydration, which copies vptrs from a freshly constructed object into every living object of the same type. A naive implementation (like Sanable’s first iteration) might rewrite just the first 4 or 8 bytes, i.e. a single pointer:

MyObject dummy; //Never used directly, just stealing a fresh vptr

memcpy(badObj, &dummy, sizeof(void*)); //All pointers have the same size, regardless of the pointed-to type

badObj->update(); //No longer crashes!
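For anyone who wants to try this outside an engine, the following is a self-contained sketch of the same idea. There is no real DLL reload here: the stale vptr is simulated by zeroing it, and jamVptr and simulateReloadAndUpdate are hypothetical names. It relies on the vptr living at offset 0 and pokes at object bytes in a way the standard does not bless, so treat it as a demonstration rather than production code.

```cpp
#include <cassert>
#include <cstring>
#include <new>

struct MyObject {
    int ticks = 0;
    virtual ~MyObject() {}
    virtual int update() { return ++ticks; }
};

// Overwrites the first pointer-sized word of `stale` with the one from `fresh`.
// ASSUMPTION: single inheritance, vptr at offset 0.
void jamVptr(void* stale, const void* fresh) {
    std::memcpy(stale, fresh, sizeof(void*));
}

// Builds an object, wipes its vptr to mimic a reload, repairs it, then calls
// a virtual function. Member data (ticks) is left untouched throughout.
int simulateReloadAndUpdate() {
    alignas(MyObject) unsigned char storage[sizeof(MyObject)];
    MyObject* obj = new (storage) MyObject;
    obj->ticks = 41;

    std::memset(storage, 0, sizeof(void*)); // the vptr now points nowhere

    MyObject dummy;                         // never used directly, only donates its vptr
    jamVptr(storage, &dummy);

    int result = obj->update();             // dispatches through the repaired vptr
    obj->~MyObject();
    return result;
}
```

The object’s own state survives the repair, so the call should continue counting from 41.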

While this is easy to implement and test, it isn’t reliable. With multiple or virtual inheritance an object holds several vptrs, and overwriting only the first would lock us out of using interface classes (IUpdatable). Additionally, compilers are allowed to store metadata however they see fit: MSVC sometimes adds an additional type info field right after the vptr, and some compilers skip the vtable to put function pointers in the object itself.
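A small sketch shows why the single-pointer copy falls short with interfaces. IDrawable, Sprite, and drawableOffset are hypothetical names; the exact offsets are up to the compiler, but on the common ABIs the second base subobject starts at least one pointer into the object and carries its own vptr.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical interface classes for illustration.
struct IUpdatable {
    virtual ~IUpdatable() {}
    virtual int update() = 0;
};

struct IDrawable {
    virtual ~IDrawable() {}
    virtual int draw() = 0;
};

struct Sprite : IUpdatable, IDrawable {
    int frame = 0;
    int update() override { return ++frame; }
    int draw() override { return frame; }
};

// Byte distance from the start of the object to its IDrawable subobject.
std::ptrdiff_t drawableOffset(const Sprite& s) {
    const IDrawable* asDrawable = &s; // the implicit upcast adjusts the address
    return reinterpret_cast<const unsigned char*>(asDrawable) -
           reinterpret_cast<const unsigned char*>(&s);
}
```

A nonzero offset means there is a second vptr somewhere past the first word, and a jam that only rewrites offset 0 would leave calls through IDrawable pointing at stale memory.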

We can refine our approach if we know the exact layout of an object in memory, which normally would mean writing a compiler plugin. However, that would only yield information for that specific compiler. Thanks to the offsetof macro, we can identify which bytes correspond to explicitly defined fields, and by process of elimination which bytes were generated by the compiler. These gaps must be either compiler-generated constants (metadata we should capture), or padding generated to respect alignment requirements (safe to write/ignore).
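The process of elimination can be sketched roughly as follows. Sample, FieldSpan, findGaps, and sampleGaps are hypothetical names, and the field list is written by hand here purely for illustration. Note that offsetof on a polymorphic (non-standard-layout) class is only conditionally supported, which is why GCC and Clang warn about it; the pragma silences that warning on those compilers.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

#if defined(__GNUC__)
#pragma GCC diagnostic ignored "-Winvalid-offsetof"
#endif

// Hypothetical polymorphic class with three explicit fields.
struct Sample {
    virtual ~Sample() {}
    int a = 0;
    char b = 0;
    double d = 0.0;
};

struct FieldSpan {
    std::size_t offset;
    std::size_t size;
};

// Returns one flag per byte: true means no listed field covers that byte,
// so it is either compiler-generated metadata or padding.
std::vector<bool> findGaps(std::size_t objectSize, const std::vector<FieldSpan>& fields) {
    std::vector<bool> isGap(objectSize, true);
    for (const FieldSpan& f : fields) {
        assert(f.offset + f.size <= objectSize);
        for (std::size_t i = 0; i < f.size; ++i) {
            isGap[f.offset + i] = false;
        }
    }
    return isGap;
}

std::vector<bool> sampleGaps() {
    return findGaps(sizeof(Sample), {
        {offsetof(Sample, a), sizeof(int)},
        {offsetof(Sample, b), sizeof(char)},
        {offsetof(Sample, d), sizeof(double)},
    });
}
```

On a typical 64-bit target the gaps would be the vptr at the front plus a few alignment bytes after b, though the exact picture depends on the compiler.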

Note: most optimizing compilers will append Derived’s virtual members to the “Derived cast to BaseA” vtable, so only one vptr would be emitted. (Source: Effective C++)

Writing out the members of every class by hand is fragile and hard to reliably check for errors, so I wrote a dedicated tool for Sanable v2. While this uses Clang’s Abstract Syntax Tree, it runs as a pre-build step and generates source code that any compiler can use.

We can take this approach one step further and differentiate between padding and implicit fields. We create multiple dummy objects, first filling the memory each will occupy with a known value unique to that dummy. Constructors shouldn’t [2] touch padding bytes (see this post), so bytes that always match the known value must be padding. Bytes that change between dummies but don’t match the known value are instead marked as unknown/anomalous and ignored.
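A rough sketch of that comparison, using two dummies with fill values 0xAA and 0x55. Probe, ByteKind, snapshot, and classify are hypothetical names. For brevity it classifies every byte, whereas in practice only the gap bytes would be fed in. One caveat: reading pre-filled bytes after construction is not something the standard guarantees, and optimizers may discard the fill (GCC’s -flifetime-dse is one documented case), so results can vary with compiler and flags.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <new>
#include <vector>

// Hypothetical class with a vptr, alignment padding after `tag`, and two fields.
struct Probe {
    char tag;
    double value;
    Probe() : tag('x'), value(1.0) {}
    virtual ~Probe() {}
};

enum class ByteKind { Padding, Constant, Unknown };

// Constructs a Probe on top of storage pre-filled with `fill` and returns
// a copy of the resulting raw bytes.
std::vector<unsigned char> snapshot(unsigned char fill) {
    alignas(Probe) unsigned char storage[sizeof(Probe)];
    std::memset(storage, fill, sizeof(storage));
    Probe* p = new (storage) Probe;
    std::vector<unsigned char> bytes(storage, storage + sizeof(storage));
    p->~Probe();
    return bytes;
}

// Padding:  the byte kept each dummy's fill value, so nothing wrote to it.
// Constant: both dummies hold the same written value (e.g. part of a vptr).
// Unknown:  the byte was written but differs between dummies.
std::vector<ByteKind> classify(const std::vector<unsigned char>& a, unsigned char fillA,
                               const std::vector<unsigned char>& b, unsigned char fillB) {
    assert(a.size() == b.size());
    std::vector<ByteKind> kinds(a.size(), ByteKind::Unknown);
    for (std::size_t i = 0; i < a.size(); ++i) {
        if (a[i] == fillA && b[i] == fillB) {
            kinds[i] = ByteKind::Padding;
        } else if (a[i] == b[i]) {
            kinds[i] = ByteKind::Constant;
        }
    }
    return kinds;
}
```

Using two different fill values matters: a byte of real metadata might coincide with one fill value by chance, but it cannot match both.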

Further steps

A significant drawback with this approach is that constructors must be called. Default constructors are the easy option, followed by constructors with trivial arguments that can be cast from 0, and default-constructible arguments. Nested nontrivial constructors are almost impossible to programmatically resolve. This also assumes that the constructors have no side effects.
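As an illustration of the first two tiers only, a generated helper could pick a construction strategy with standard type traits. This is a sketch, not how Sanable’s tool works: DummyStrategy, pickDummyStrategy, and the three example classes are all hypothetical, and the "default-constructible arguments" tier is left out.

```cpp
#include <cassert>
#include <type_traits>

enum class DummyStrategy { DefaultConstruct, ZeroArgument, Unsupported };

// Prefers a default constructor, then a single-argument constructor
// that accepts an int (so it can be fed 0), otherwise gives up.
template <class T>
constexpr DummyStrategy pickDummyStrategy() {
    return std::is_default_constructible<T>::value ? DummyStrategy::DefaultConstruct
         : std::is_constructible<T, int>::value    ? DummyStrategy::ZeroArgument
                                                   : DummyStrategy::Unsupported;
}

// Hypothetical classes covering each outcome.
struct Plain {
    virtual ~Plain() {}
};

struct NeedsCount {
    explicit NeedsCount(int c) : count(c) {}
    virtual ~NeedsCount() {}
    int count;
};

struct NeedsOwner {
    explicit NeedsOwner(Plain& o) : owner(&o) {}
    virtual ~NeedsOwner() {}
    Plain* owner;
};
```

NeedsOwner lands in the unsupported bucket because a reference parameter cannot be conjured from 0, which is the nested case described above as nearly impossible to resolve automatically.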

One potential workaround would be to analyze the constructor’s machine code at runtime. Although it would require per-platform instruction definitions, it would work for any constructor and would not trigger side effects, since the constructor is never actually run. The main instructions to look for would be mov/lea for setting vptrs, and call/ret to follow the full constructor hierarchy.

  1. An access violation (Windows) or segmentation fault (*nix). There’s no well-defined way to catch or recover from a crash like this, and even platform-specific methods may not work consistently. atexit hooks are not guaranteed to run either.
  2. GCC is known to pre-fill objects’ entire memory, padding included.