| ======================================================= |
| Hardware-assisted AddressSanitizer Design Documentation |
| ======================================================= |
| |
| This page is a design document for |
| **hardware-assisted AddressSanitizer** (or **HWASAN**), |
| a tool similar to :doc:`AddressSanitizer`, |
| but based on partial hardware assistance. |
| |
| |
| Introduction |
| ============ |
| |
| :doc:`AddressSanitizer` |
| tags every 8 bytes of the application memory with a 1 byte tag (using *shadow memory*), |
| uses *redzones* to find buffer-overflows and |
| *quarantine* to find use-after-free. |
| The redzones, the quarantine, and, to a lesser extent, the shadow, are the |
| sources of AddressSanitizer's memory overhead. |
| See the `AddressSanitizer paper`_ for details. |
| |
| AArch64 has `Address Tagging`_ (or top-byte-ignore, TBI), a hardware feature that allows |
| software to use the 8 most significant bits of a 64-bit pointer as |
| a tag. HWASAN uses `Address Tagging`_ |
| to implement a memory safety tool, similar to :doc:`AddressSanitizer`, |
| but with smaller memory overhead and slightly different (mostly better) |
| accuracy guarantees. |
| |
| Algorithm |
| ========= |
| * Every heap/stack/global memory object is forcibly aligned by `TG` bytes |
| (`TG` is e.g. 16 or 64). We call `TG` the **tagging granularity**. |
| * For every such object a random `TS`-bit tag `T` is chosen (`TS`, or tag size, is e.g. 4 or 8). |
| * The pointer to the object is tagged with `T`. |
| * The memory for the object is also tagged with `T` (using a `TG=>1` shadow memory). |
| * Every load and store is instrumented to read the memory tag and compare it |
| with the pointer tag; an exception is raised on tag mismatch. |
| |
| For a more detailed discussion of this approach see https://arxiv.org/pdf/1802.09517.pdf |
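| |
| The steps above can be sketched in C over a toy arena. This is a minimal |
| illustration, not the HWASAN runtime: `TG` is fixed at 16, the tag occupies |
| the top byte, offsets into the arena stand in for pointers, and the names |
| `tag_pointer`, `tag_memory` and `access_ok` are hypothetical. The tag is |
| passed in by the caller here; a real tool would pick it at random. |
| |
```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Illustrative parameters, not the values of any particular build. */
#define TG 16u            /* tagging granularity in bytes */
#define TAG_SHIFT 56      /* the tag lives in the top byte of a 64-bit pointer */
#define ARENA_SIZE 1024u  /* toy "application memory" */

static uint8_t shadow[ARENA_SIZE / TG]; /* one tag byte per TG bytes */

/* Put tag `t` into the top byte of an arena offset (standing in for a pointer). */
static uint64_t tag_pointer(uint64_t offset, uint8_t t) {
    return (offset & ~(0xffULL << TAG_SHIFT)) | ((uint64_t)t << TAG_SHIFT);
}

/* Tag the memory of a TG-aligned object of `size` bytes with `t`. */
static void tag_memory(uint64_t offset, size_t size, uint8_t t) {
    assert(offset % TG == 0 && offset + size <= ARENA_SIZE);
    for (uint64_t g = offset / TG; g < (offset + size + TG - 1) / TG; g++)
        shadow[g] = t;
}

/* The check done before every load and store: pointer tag vs. memory tag. */
static bool access_ok(uint64_t tagged) {
    uint64_t offset = tagged & ~(0xffULL << TAG_SHIFT);
    assert(offset < ARENA_SIZE);
    return (uint8_t)(tagged >> TAG_SHIFT) == shadow[offset / TG];
}
```
| |
| An access through a pointer tagged `0x2a` passes while it stays inside memory |
| tagged `0x2a` and fails as soon as it reaches a granule carrying another tag. |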
| |
| Instrumentation |
| =============== |
| |
| Memory Accesses |
| --------------- |
| All memory accesses are prefixed with an inline instruction sequence that |
| verifies the tags. Currently, the following sequence is used: |
| |
| |
| .. code-block:: asm |
| |
| // int foo(int *a) { return *a; } |
| // clang -O2 --target=aarch64-linux -fsanitize=hwaddress -c load.c |
| foo: |
| 0: 08 00 00 90 adrp x8, 0 <__hwasan_shadow> |
| 4: 08 01 40 f9 ldr x8, [x8] // shadow base (to be resolved by the loader) |
| 8: 09 dc 44 d3 ubfx x9, x0, #4, #52 // shadow offset |
| c: 28 69 68 38 ldrb w8, [x9, x8] // load shadow tag |
| 10: 09 fc 78 d3 lsr x9, x0, #56 // extract address tag |
| 14: 3f 01 08 6b cmp w9, w8 // compare tags |
| 18: 61 00 00 54 b.ne 24 // jump on mismatch |
| 1c: 00 00 40 b9 ldr w0, [x0] // original load |
| 20: c0 03 5f d6 ret |
| 24: 40 20 21 d4 brk #0x902 // trap |
| |
| Alternatively, memory accesses are prefixed with a function call. |
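| |
| For readers less familiar with AArch64 assembly, the inline sequence above can |
| be restated in C. This is only a reading aid under the same assumptions as the |
| listing (16-byte granules, tag in bits 56..63); the function names and the |
| `shadow_base` parameter are hypothetical, and the real check traps instead of |
| returning a value. |
| |
```c
#include <assert.h>
#include <stdint.h>

/* Mirrors `ubfx x9, x0, #4, #52`: drop the low 4 bits (16-byte granules)
   and keep the next 52 bits, which also discards the tag in bits 56..63. */
static inline uint64_t shadow_offset(uint64_t addr) {
    return (addr >> 4) & ((1ULL << 52) - 1);
}

/* Mirrors `lsr x9, x0, #56`: the tag is the top byte of the pointer. */
static inline uint8_t address_tag(uint64_t addr) {
    return (uint8_t)(addr >> 56);
}

/* Mirrors `ldrb` / `cmp` / `b.ne`: returns nonzero when the access would trap.
   `shadow_base` is a hypothetical pointer to the start of the shadow. */
static inline int tags_mismatch(const uint8_t *shadow_base, uint64_t addr) {
    assert(shadow_base != 0);
    return address_tag(addr) != shadow_base[shadow_offset(addr)];
}
```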
| |
| Heap |
| ---- |
| |
| Tagging the heap memory/pointers is done by `malloc`. |
| This can be based on any malloc that forces all objects to be TG-aligned. |
| `free` tags the memory with a different tag. |
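| |
| A rough sketch of how an allocator could do this, assuming a toy bump |
| allocator over a fixed arena, 8-bit tags drawn with `rand()`, and `TG` = 16. |
| All names (`toy_malloc`, `toy_free`, `tags_match`) are hypothetical, and |
| `toy_free` takes the size explicitly only to keep the example short. |
| |
```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

#define TG 16u
#define HEAP_SIZE 4096u

static uint8_t heap_shadow[HEAP_SIZE / TG]; /* memory tags, one per granule */
static uint64_t heap_top;                   /* bump-allocator cursor */

static uint8_t random_tag(void) { return (uint8_t)(rand() & 0xff); }

static void set_tags(uint64_t off, uint64_t size, uint8_t t) {
    for (uint64_t g = off / TG; g < (off + size) / TG; g++) heap_shadow[g] = t;
}

/* Hypothetical malloc: rounds the size up to TG, tags memory and pointer. */
static uint64_t toy_malloc(uint64_t size) {
    uint64_t rounded = (size + TG - 1) / TG * TG;
    assert(rounded > 0 && heap_top + rounded <= HEAP_SIZE);
    uint64_t off = heap_top;
    heap_top += rounded;
    uint8_t t = random_tag();
    set_tags(off, rounded, t);
    return off | ((uint64_t)t << 56);
}

/* Hypothetical free: retags the memory so stale pointers no longer match. */
static void toy_free(uint64_t tagged, uint64_t size) {
    uint64_t off = tagged & ~(0xffULL << 56);
    uint64_t rounded = (size + TG - 1) / TG * TG;
    uint8_t old = (uint8_t)(tagged >> 56), t;
    do { t = random_tag(); } while (t == old); /* force a different tag */
    set_tags(off, rounded, t);
}

/* The access check: pointer tag against the tag of the addressed granule. */
static int tags_match(uint64_t tagged) {
    uint64_t off = tagged & ~(0xffULL << 56);
    return (uint8_t)(tagged >> 56) == heap_shadow[off / TG];
}
```
| |
| After `toy_free`, the stale pointer still carries the old tag, so a later |
| access through it fails the check unless the memory happens to be retagged |
| with the same value again. |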
| |
| Stack |
| ----- |
| |
| Stack frames are instrumented by aligning all non-promotable allocas |
| by `TG` and tagging stack memory in function prologue and epilogue. |
| |
| Tags for different allocas in one function are **not** generated |
| independently; doing that in a function with `M` allocas would require |
| maintaining `M` live stack pointers, significantly increasing register |
| pressure. Instead we generate a single base tag value in the prologue, |
| and build the tag for alloca number `M` as `ReTag(BaseTag, M)`, where |
| ReTag can be as simple as exclusive-or with constant `M`. |
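| |
| As an illustration of the exclusive-or variant mentioned above, a possible |
| `ReTag` could look like the following. The function name `retag` and the |
| choice of `TS` = 8 are assumptions made for this sketch. |
| |
```c
#include <assert.h>
#include <stdint.h>

#define TS 8u /* tag size in bits; illustrative */

/* One possible ReTag: exclusive-or the base tag with the alloca index.
   Distinct indices below 2**TS yield distinct tags for the same base tag. */
static uint8_t retag(uint8_t base_tag, unsigned index) {
    assert(index < (1u << TS));
    return (uint8_t)((base_tag ^ index) & ((1u << TS) - 1));
}
```
| |
| Only the base tag has to stay live across the function; each alloca's tag is |
| recomputed from it and a compile-time constant. |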
| |
| Stack instrumentation is expected to be a major source of overhead, |
| but could be optional. |
| |
| Globals |
| ------- |
| |
| TODO: details. |
| |
| Error reporting |
| --------------- |
| |
| Errors are generated by the `BRK` instruction and are handled by a signal handler. |
| |
| Attribute |
| --------- |
| |
| HWASAN uses its own LLVM IR Attribute `sanitize_hwaddress` and a matching |
| C function attribute. An alternative would be to re-use ASAN's attribute |
| `sanitize_address`. The reasons to use a separate attribute are: |
| |
| * Users may need to disable ASAN but not HWASAN, or vice versa, |
| because the tools have different trade-offs and compatibility issues. |
| * LLVM (ideally) does not use flags to decide which pass is being used; |
| ASAN or HWASAN is applied based on the function attributes. |
| |
| This does mean that users of HWASAN may need to add the new attribute |
| to the code that already uses the old attribute. |
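| |
| In C source this could look as follows. The sketch uses Clang's `no_sanitize` |
| function attribute and assumes a compiler recent enough to accept the |
| `hwaddress` argument; the macro names and `peek_first_byte` are hypothetical. |
| |
```c
#include <assert.h>

/* Assumption: the compiler accepts no_sanitize("hwaddress") next to the
   older ASAN opt-out no_sanitize("address"). */
#define NO_SANITIZE_ASAN   __attribute__((no_sanitize("address")))
#define NO_SANITIZE_HWASAN __attribute__((no_sanitize("hwaddress")))

/* A function that already carried the ASAN opt-out needs the HWASAN one
   as well if it must stay uninstrumented under both tools. */
NO_SANITIZE_ASAN NO_SANITIZE_HWASAN
static int peek_first_byte(const unsigned char *p) {
    assert(p != 0);
    return p[0];
}
```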
| |
| |
| Comparison with AddressSanitizer |
| ================================ |
| |
| HWASAN: |
| * Is less portable than :doc:`AddressSanitizer` |
| as it relies on hardware `Address Tagging`_ (AArch64). |
| Address Tagging can be emulated with compiler instrumentation, |
| but it will require the instrumentation to remove the tags before |
| any load or store, which is infeasible in any realistic environment |
| that contains non-instrumented code. |
| * May have compatibility problems if the target code uses higher |
| pointer bits for other purposes. |
| * May require changes in OS kernels (e.g. Linux seems to dislike |
| tagged pointers passed from user space: |
| https://www.kernel.org/doc/Documentation/arm64/tagged-pointers.txt). |
| * **Does not require redzones to detect buffer overflows**, |
| but the buffer overflow detection is probabilistic, with roughly |
| `(2**TS-1)/(2**TS)` probability of catching a bug. |
| * **Does not require quarantine to detect heap-use-after-free, |
| or stack-use-after-return**. |
| The detection is similarly probabilistic. |
| |
| The memory overhead of HWASAN is expected to be much smaller |
| than that of AddressSanitizer: |
| `1/TG` extra memory for the shadow |
| and some overhead due to `TG`-aligning all objects. |
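| |
| To make the two formulas concrete, here is a small sketch that evaluates them. |
| It assumes uniformly random `TS`-bit tags and one shadow byte per `TG` bytes; |
| the function names are hypothetical. |
| |
```c
#include <assert.h>

/* Probability that a bad access lands on memory with a different tag,
   assuming uniformly random TS-bit tags: (2**TS - 1) / 2**TS. */
static double detection_probability(unsigned ts) {
    assert(ts >= 1 && ts < 32);
    double tags = (double)(1u << ts);
    return (tags - 1.0) / tags;
}

/* Shadow size for `bytes` of application memory at granularity `tg`. */
static unsigned long long shadow_bytes(unsigned long long bytes, unsigned tg) {
    assert(tg > 0);
    return bytes / tg;
}
```
| |
| For `TS` = 4 this gives 15/16 = 0.9375 and for `TS` = 8 it gives 255/256; |
| with `TG` = 16, 1 GiB of application memory needs 64 MiB of shadow. |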
| |
| Supported architectures |
| ======================= |
| HWASAN relies on `Address Tagging`_ which is only available on AArch64. |
| For other 64-bit architectures it is possible to remove the address tags |
| before every load and store by compiler instrumentation, but this variant |
| will have limited deployability since not all of the code is |
| typically instrumented. |
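| |
| A minimal sketch of what such tag removal amounts to, assuming 64-bit |
| pointers whose top byte is zero for ordinary user-space addresses. The names |
| `with_tag` and `load_int_untagged` are hypothetical. |
| |
```c
#include <assert.h>
#include <stdint.h>

static_assert(sizeof(uintptr_t) == 8, "sketch assumes 64-bit pointers");

#define TAG_MASK (0xffULL << 56)

/* Store a tag in the top byte of a pointer value. */
static uintptr_t with_tag(const void *p, uint8_t t) {
    return (uintptr_t)(((uint64_t)(uintptr_t)p & ~TAG_MASK) | ((uint64_t)t << 56));
}

/* What the instrumentation would have to emit before each access when the
   hardware does not ignore the top byte: clear the tag, then dereference. */
static int load_int_untagged(uintptr_t tagged) {
    const int *p = (const int *)(uintptr_t)((uint64_t)tagged & ~TAG_MASK);
    return *p;
}
```
| |
| Code that was not compiled with this instrumentation would dereference the |
| tagged value directly, which is why this variant is hard to deploy. |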
| |
| HWASAN's approach is not applicable to 32-bit architectures. |
| |
| |
| Related Work |
| ============ |
| * `SPARC ADI`_ implements a similar tool mostly in hardware. |
| * `Effective and Efficient Memory Protection Using Dynamic Tainting`_ discusses |
| similar approaches ("lock & key"). |
| * `Watchdog`_ discusses a heavier, but still somewhat similar |
| "lock & key" approach. |
| * *TODO: add more "related work" links. Suggestions are welcome.* |
| |
| |
| .. _Watchdog: http://www.cis.upenn.edu/acg/papers/isca12_watchdog.pdf |
| .. _Effective and Efficient Memory Protection Using Dynamic Tainting: https://www.cc.gatech.edu/~orso/papers/clause.doudalis.orso.prvulovic.pdf |
| .. _SPARC ADI: https://lazytyped.blogspot.com/2017/09/getting-started-with-adi.html |
| .. _AddressSanitizer paper: https://www.usenix.org/system/files/conference/atc12/atc12-final39.pdf |
| .. _Address Tagging: http://infocenter.arm.com/help/index.jsp?topic=/com.arm.doc.den0024a/ch12s05s01.html |
| |