Pointer Tagging in C++: The Art of Packing Bits Into a Pointer
Using tagged pointers to save memory, speed up dynamic dispatch, and compact data structures
A 64-bit pointer can address over 18 exabytes (18 billion gigabytes) of memory, which far exceeds the needs of even the most top-end supercomputers¹. For this reason, most modern desktop CPUs only use 48 bits for virtual addresses², leaving the upper 16 bits unused.
The bottom bits are typically underutilized, too. Most malloc implementations align allocations to 16-byte boundaries, so the bottom four bits are always set to zero. These bottom four bits can be repurposed to store extra data, as long as those bits are cleared back to zero before the pointer is dereferenced.
Taken together, a 16-byte aligned pointer on a CPU with 48 bits of virtual address space has a total of 20 bits available for packing extra data. This concept may sound abstract — after all, who cares about some measly 20 bits? — but tagging pointers with extra data is a very common technique, so much so that x64 and ARM have included features to support it at the ISA level³.
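As a minimal sketch of the idea (assuming a 48-bit address space and 16-byte-aligned allocations — both platform-dependent — with illustrative `Pack`/`UnpackPtr`/`UnpackTag` helpers, not a standard API):

```cpp
#include <cassert>
#include <cstdint>

// Assumed layout: 16 free high bits (48-bit addresses) + 4 free low bits
// (16-byte alignment) = 20 bits of storage per pointer.
constexpr uint64_t LowBits  = 4;
constexpr uint64_t AddrMask = ((1ULL << 48) - 1) & ~((1ULL << LowBits) - 1);

uint64_t Pack(void *Ptr, uint32_t Tag) {
    uint64_t Addr = (uint64_t)Ptr;
    // Check our assumptions: aligned, user-space pointer and a 20-bit tag
    assert((Addr & ~AddrMask) == 0 && Tag < (1u << 20));
    uint64_t Low  = Tag & ((1ULL << LowBits) - 1);     // bottom 4 tag bits
    uint64_t High = (uint64_t)(Tag >> LowBits) << 48;  // remaining 16 tag bits
    return High | Addr | Low;
}

void *UnpackPtr(uint64_t Packed) {
    // Clear both the low and high tag bits before using the address
    return (void *)(Packed & AddrMask);
}

uint32_t UnpackTag(uint64_t Packed) {
    uint32_t Low  = (uint32_t)(Packed & ((1ULL << LowBits) - 1));
    uint32_t High = (uint32_t)(Packed >> 48);
    return (High << LowBits) | Low;
}

// A 16-byte-aligned buffer, like the results of a typical malloc
alignas(16) unsigned char ExampleBuffer[64];
```

The round trip is lossless as long as the alignment and address-width assumptions hold.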
Any data structure that uses a significant number of pointers can benefit immensely from these techniques. Some examples include:
Dynamic type information in high-level, untyped languages
Chrome’s V8 engine uses pointer tagging to distinguish whether a value represents a raw integer or a reference to a heap-allocated object⁴.
Objective-C uses a similar approach: if an object is small enough to fit within a pointer’s worth of memory, the runtime stores its data directly in the pointer (along with a tag) instead of allocating it on the heap⁵.
Tree-like data structures where each node can be interpreted multiple ways
In the Linux kernel’s implementation of a red–black tree, one bit of each node’s parent pointer is used to determine whether a node is red or black⁶.
Dynamic Polymorphism
PBRT uses tagged pointers in place of a per-object v-table, reducing the overhead of dynamic dispatch⁷.
The benefits go beyond space savings alone. Rather than chasing the pointer every single time, the packed data travels with the pointer itself, so it is likely already in a CPU cache line. The tag can be used to decide very quickly whether the pointer needs to be dereferenced at all.
An implementation in C++
Pointer Bit Manipulation In Practice
To make the details of tagged pointers more concrete, we can look at an implementation in C++. For this article, we’ll assume a 48-bit canonical address space (though keep in mind that this detail is platform-dependent), giving us 16 high bits to work with.
0x0200 0048 1030 8C90
└─┬──┘ └──────┬─────┘
Data Bits   48-bit Address
These topmost 16 bits aren’t completely ignored by the CPU: dereferencing a non-canonical address faults. The packed data bits therefore have to be masked off before the pointer is used. In the case of x64, the address also has to be sign-extended from the 47ᵗʰ bit to restore its canonical form.
We can wrap a normal C++ pointer in a small helper type that handles all the shifting and masking automatically.
template <typename T>
struct tagged_ptr {
    T *Ptr;
};
As mentioned before, packing the data into a pointer is a set of relatively straightforward bitwise operations: shift the packed data up by 48 bits, clear the top bits of the target pointer, and bitwise OR the results together. Extracting the packed data is just a matter of bitshifting the data back down by 48 bits.
static constexpr u64 CanonicalAddressSize = 48;
static constexpr u64 PtrMask = 0x0000FFFFFFFFFFFF;

// ...

tagged_ptr() : Ptr(0) { }

tagged_ptr(T *InPtr, u16 Data = 0) : Ptr(InPtr) {
    PackData(Data);
}

void PackData(u16 Data) {
    u64 PackedData = (u64)Data << CanonicalAddressSize;
    u64 MaskedPtr = (u64)this->Ptr & PtrMask;
    u64 Result = PackedData | MaskedPtr;
    this->Ptr = (T *)Result;
}

u16 ExtractData() const {
    u64 Result = (u64)this->Ptr >> CanonicalAddressSize;
    return (u16)Result;
}

// ...
Getting the canonical pointer address requires that the compacted data in the upper bits of the pointer be cleared. x64 also requires that the most significant bits be sign-extended.
// ...

T *GetRawPointer() const {
    u64 MaskedPtr = (u64)this->Ptr & PtrMask;
#if defined(__x86_64__) || defined(_M_X64)
    // Set all top bits to one if bit 47 is set
    u64 SignBit = 1ULL << 47ULL;
    u64 SignExtensionBits = 0xFFFF000000000000;
    if (MaskedPtr & SignBit) {
        MaskedPtr |= SignExtensionBits;
    }
#endif
    T *Result = (T *)MaskedPtr;
    return Result;
}

void SetRawPointer(T *Other) {
    u16 Data = ExtractData();
    Ptr = Other;
    PackData(Data);
}

// ...
The standard C++ pointer operators can be overloaded to act on the untagged part of the pointer, automatically masking out the data bits when needed.
// ...

bool operator==(tagged_ptr<T> Other) const {
    return this->GetRawPointer() == Other.GetRawPointer();
}

bool operator!=(tagged_ptr<T> Other) const {
    return this->GetRawPointer() != Other.GetRawPointer();
}

explicit operator bool() const {
    return GetRawPointer() != nullptr;
}

T &operator*() const {
    return *GetRawPointer();
}

T *operator->() const {
    return GetRawPointer();
}
};
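Assembled into a single self-contained unit (with `u16`/`u64` written out as the fixed-width typedefs the snippets assume), the class can be exercised like an ordinary pointer:

```cpp
#include <cassert>
#include <cstdint>

using u16 = uint16_t;
using u64 = uint64_t;

template <typename T>
struct tagged_ptr {
    static constexpr u64 CanonicalAddressSize = 48;
    static constexpr u64 PtrMask = 0x0000FFFFFFFFFFFF;

    T *Ptr;

    tagged_ptr() : Ptr(nullptr) { }
    tagged_ptr(T *InPtr, u16 Data = 0) : Ptr(InPtr) { PackData(Data); }

    void PackData(u16 Data) {
        u64 PackedData = (u64)Data << CanonicalAddressSize;
        this->Ptr = (T *)(PackedData | ((u64)this->Ptr & PtrMask));
    }

    u16 ExtractData() const {
        return (u16)((u64)this->Ptr >> CanonicalAddressSize);
    }

    T *GetRawPointer() const {
        u64 MaskedPtr = (u64)this->Ptr & PtrMask;
#if defined(__x86_64__) || defined(_M_X64)
        // Sign-extend from bit 47 so kernel-space addresses stay canonical
        if (MaskedPtr & (1ULL << 47)) {
            MaskedPtr |= 0xFFFF000000000000;
        }
#endif
        return (T *)MaskedPtr;
    }

    explicit operator bool() const { return GetRawPointer() != nullptr; }
    T &operator*() const { return *GetRawPointer(); }
    T *operator->() const { return GetRawPointer(); }
};

// The tag rides along in the pointer's upper 16 bits
void example() {
    int Value = 42;
    tagged_ptr<int> P(&Value, 0xBEEF);
    assert(P.ExtractData() == 0xBEEF); // the tag round-trips
    assert(*P == 42);                  // dereferencing ignores the tag
}
```

On a user-space x64 pointer, bit 47 is zero, so the sign-extension branch is a no-op and the round trip is exact.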
An Example With Tagged Unions
Basic run-time polymorphism
Tagged unions implement low-level polymorphism by storing an enum tag alongside data members that share the same memory location. The tag determines how the data should be interpreted.
enum class tagged_union_type : u32 {
    Int,
    Float
};

struct tagged_union {
    tagged_union_type Type;
    union {
        f32 Float;
        s32 Int;
    };
};
tagged_union T = /* ... */;

switch (T.Type) {
    case tagged_union_type::Int: {
        s32 Value = T.Int;
        // Interpret the data as an int
    } break;
    case tagged_union_type::Float: {
        f32 Value = T.Float;
        // Interpret the data as a float
    } break;
}
A more real-world example might be an abstract syntax tree, where some nodes represent an integer constant, while others represent a math operation involving a left and a right side (addition, for example). Each node in the abstract syntax tree has to be interpreted differently, depending on what type it is.
enum class ast_node_type : u32 {
    Addition,
    Subtraction,
    Multiplication,
    Division,
    IntConstant
};

struct ast_node;

struct binary_op_node {
    ast_node *Left;
    ast_node *Right;
};

struct ast_node {
    // Type is stored in each node
    ast_node_type Type;
    union {
        binary_op_node BinaryOp;
        u64 IntConstant;
    };
};
Rather than storing the type enum in the node itself, it can be embedded in the pointer referencing it. In the case of a binary_op_node, both the Left and Right pointers can be tagged with the enum type of the node that they are pointing to. This removes the 4-byte enum — and, on a typical 64-bit platform, the 4 bytes of alignment padding that come with it — from every one of the N nodes in the data structure.
union ast_node;

struct binary_op_node {
    tagged_ptr<ast_node> Left;
    tagged_ptr<ast_node> Right;
};

union ast_node {
    binary_op_node BinaryOp;
    u64 IntConstant;
};
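The size difference is easy to check with `sizeof`. A small sketch of the comparison, with hypothetical `fat_node`/`slim_node` types and `uint64_t` standing in for the pointer-sized `tagged_ptr`:

```cpp
#include <cstdint>

// Old layout: the type enum is stored in every node
struct fat_node;
struct fat_binary_op {
    fat_node *Left;   // 8 bytes
    fat_node *Right;  // 8 bytes
};
struct fat_node {
    uint32_t Type;    // 4 bytes, plus 4 bytes of alignment padding
    union {
        fat_binary_op BinaryOp; // 16 bytes
        uint64_t IntConstant;   // 8 bytes
    };
};

// New layout: the type tag lives in the pointers that reference the node,
// so the node is just the union (uint64_t stands in for tagged_ptr,
// which is also pointer-sized)
struct slim_binary_op {
    uint64_t Left;
    uint64_t Right;
};
union slim_node {
    slim_binary_op BinaryOp;
    uint64_t IntConstant;
};

// On common 64-bit ABIs the union forces 8-byte alignment, so the 4-byte
// enum drags 4 bytes of padding with it: 24 bytes vs. 16 bytes per node.
```

Per node, that is an 8-byte saving on a typical LP64 platform, multiplied across every node in the tree.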
tagged_ptr<ast_node> Node = /* ... */;
ast_node_type Type = (ast_node_type)Node.ExtractData();

switch (Type) {
    case ast_node_type::Addition:
    case ast_node_type::Subtraction:
    case ast_node_type::Multiplication:
    case ast_node_type::Division: {
        binary_op_node BinaryOpNode = Node->BinaryOp;
        tagged_ptr<ast_node> Left = BinaryOpNode.Left;
        tagged_ptr<ast_node> Right = BinaryOpNode.Right;
        // ...
    } break;
    case ast_node_type::IntConstant: {
        u64 IntConstant = Node->IntConstant;
        // ...
    } break;
}
While this specific example is a bit contrived — a more straightforward tagged union would be just fine — the general technique of using tagged pointers for run-time polymorphism is quite common. Dynamically typed languages, where every value is inherently polymorphic, use tagged pointers to compactly encode a value’s type and whether it resides on the heap. A more elaborate example appears in PBRT, which uses tagged pointers to implement virtual functions without per-object v-tables. They use compile-time C++ templates to map each type to an integer. At run-time, a switch statement casts the pointer to the correct type before calling the corresponding “virtual” function — not far off from what was shown in this article!
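A minimal sketch of that dispatch pattern (not PBRT’s actual code — `circle`, `square`, and `type_tag` are invented here for illustration, and a 48-bit user-space address is assumed):

```cpp
#include <cassert>
#include <cstdint>

struct circle { double Radius; };
struct square { double Side; };

// Compile-time type -> tag mapping (a tiny stand-in for PBRT's approach)
template <typename T> struct type_tag;
template <> struct type_tag<circle> { static constexpr uint64_t Value = 1; };
template <> struct type_tag<square> { static constexpr uint64_t Value = 2; };

struct shape_ptr {
    static constexpr uint64_t PtrMask = 0x0000FFFFFFFFFFFF;

    uint64_t Bits = 0;

    // Store the type's tag in the pointer's upper 16 bits
    template <typename T>
    shape_ptr(T *Ptr)
        : Bits(((uint64_t)Ptr & PtrMask) | (type_tag<T>::Value << 48)) { }

    uint64_t Tag() const { return Bits >> 48; }
    void *Raw() const { return (void *)(Bits & PtrMask); }

    // A "virtual" function without a v-table: switch on the embedded tag,
    // cast to the concrete type, and call the right implementation
    double Area() const {
        switch (Tag()) {
        case type_tag<circle>::Value: {
            circle *C = (circle *)Raw();
            return 3.14159265358979 * C->Radius * C->Radius;
        }
        case type_tag<square>::Value: {
            square *S = (square *)Raw();
            return S->Side * S->Side;
        }
        }
        return 0.0;
    }
};
```

Because the tag is recovered from the pointer itself, no per-object v-table pointer is stored, and the switch gives the compiler a full view of every possible callee.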
Conclusion
Know your hardware
Hardware limitations often shape how data structures are built. In a 64-bit pointer, not all bits participate in the virtual address calculation. These spare bits can be repurposed to store additional information, enabling more compact data layouts and reducing the cost of run-time polymorphism.
Pointer tagging is just one trick among many. Modern CPUs have surprising attributes that, once understood, can be used to solve hard problems. And sometimes, creatively using a few extra bits per pointer is all you need.
¹ The Pleiades Supercomputer at NASA, for example, has a total memory capacity of 903TB (https://www.nas.nasa.gov/hecc/resources/pleiades.html)
² Some more recent x64 CPUs use 5-level paging, which would increase this number to 57 bits (https://en.wikipedia.org/wiki/Intel_5-level_paging). However, this isn’t available in most consumer-level operating systems. ARM and RISC-V also typically use 48 bits of virtual address, but newer specifications expand the address space further.
³ AMD’s Upper Address Space Ignore (UAI), Intel’s Linear Address Masking (LAM), and ARM’s Top Byte Ignore (TBI) are all hardware features that allow software to store metadata in the unused high-order bits of a pointer without affecting address translation.