| =============== |
| ShadowCallStack |
| =============== |
| |
| .. contents:: |
| :local: |
| |
| Introduction |
| ============ |
| |
| ShadowCallStack is an **experimental** instrumentation pass, currently only |
| implemented for x86_64 and aarch64, that protects programs against return |
| address overwrites (e.g. stack buffer overflows.) It works by saving a |
| function's return address to a separately allocated 'shadow call stack' |
| in the function prolog and checking the return address on the stack against |
| the shadow call stack in the function epilog. |
| |
| Comparison |
| ---------- |
| |
| To optimize for memory consumption and cache locality, the shadow call stack |
| stores an index followed by an array of return addresses. This is in contrast |
| to other schemes, like :doc:`SafeStack`, that mirror the entire stack and |
| trade-off consuming more memory for shorter function prologs and epilogs with |
| fewer memory accesses. Similarly, `Return Flow Guard`_ consumes more memory with |
| shorter function prologs and epilogs than ShadowCallStack but suffers from the |
| same race conditions (see `Security`_). Intel `Control-flow Enforcement Technology`_ |
| (CET) is a proposed hardware extension that would add native support to |
| use a shadow stack to store/check return addresses at call/return time. It |
| would not suffer from race conditions at calls and returns and not incur the |
| overhead of function instrumentation, but it does require operating system |
| support. |
| |
| .. _`Return Flow Guard`: https://xlab.tencent.com/en/2016/11/02/return-flow-guard/ |
| .. _`Control-flow Enforcement Technology`: https://software.intel.com/sites/default/files/managed/4d/2a/control-flow-enforcement-technology-preview.pdf |
| |
| Compatibility |
| ------------- |
| |
| ShadowCallStack currently only supports x86_64 and aarch64. A runtime is not |
| currently provided in compiler-rt so one must be provided by the compiled |
| application. |
| |
| On aarch64, the instrumentation makes use of the platform register ``x18``. |
| On some platforms, ``x18`` is reserved, and on others, it is designated as |
| a scratch register. This generally means that any code that may run on the |
| same thread as code compiled with ShadowCallStack must either target one |
| of the platforms whose ABI reserves ``x18`` (currently Darwin, Fuchsia and |
| Windows) or be compiled with the flag ``-ffixed-x18``. |
| |
| Security |
| ======== |
| |
| ShadowCallStack is intended to be a stronger alternative to |
| ``-fstack-protector``. It protects from non-linear overflows and arbitrary |
| memory writes to the return address slot; however, similarly to |
| ``-fstack-protector`` this protection suffers from race conditions because of |
| the call-return semantics on x86_64. There is a short race between the call |
| instruction and the first instruction in the function that reads the return |
| address where an attacker could overwrite the return address and bypass |
| ShadowCallStack. Similarly, there is a time-of-check-to-time-of-use race in the |
| function epilog where an attacker could overwrite the return address after it |
| has been checked and before it has been returned to. Modifying the call-return |
| semantics to fix this on x86_64 would incur an unacceptable performance overhead |
| due to return branch prediction. |
| |
| The instrumentation makes use of the ``gs`` segment register on x86_64, |
| or the ``x18`` register on aarch64, to reference the shadow call stack |
| meaning that references to the shadow call stack do not have to be stored in |
| memory. This makes it possible to implement a runtime that avoids exposing |
| the address of the shadow call stack to attackers that can read arbitrary |
| memory. However, attackers could still try to exploit side channels exposed |
| by the operating system `[1]`_ `[2]`_ or processor `[3]`_ to discover the |
| address of the shadow call stack. |
| |
| .. _`[1]`: https://eyalitkin.wordpress.com/2017/09/01/cartography-lighting-up-the-shadows/ |
| .. _`[2]`: https://www.blackhat.com/docs/eu-16/materials/eu-16-Goktas-Bypassing-Clangs-SafeStack.pdf |
| .. _`[3]`: https://www.vusec.net/projects/anc/ |
| |
| On x86_64, leaf functions are optimized to store the return address in a |
| free register and avoid writing to the shadow call stack if a register is |
| available. Very short leaf functions are uninstrumented if their execution |
| is judged to be shorter than the race condition window intrinsic to the |
| instrumentation. |
| |
| On aarch64, the architecture's call and return instructions (``bl`` and |
| ``ret``) operate on a register rather than the stack, which means that |
| leaf functions are generally protected from return address overwrites even |
| without ShadowCallStack. It also means that ShadowCallStack on aarch64 is not |
| vulnerable to the same types of time-of-check-to-time-of-use races as x86_64. |
| |
| Usage |
| ===== |
| |
| To enable ShadowCallStack, just pass the ``-fsanitize=shadow-call-stack`` |
| flag to both compile and link command lines. On aarch64, you also need to pass |
| ``-ffixed-x18`` unless your target already reserves ``x18``. |
| |
| Low-level API |
| ------------- |
| |
| ``__has_feature(shadow_call_stack)`` |
| ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ |
| |
| In some cases one may need to execute different code depending on whether |
| ShadowCallStack is enabled. The macro ``__has_feature(shadow_call_stack)`` can |
| be used for this purpose. |
| |
| .. code-block:: c |
| |
| #if defined(__has_feature) |
| # if __has_feature(shadow_call_stack) |
| // code that builds only under ShadowCallStack |
| # endif |
| #endif |
| |
| ``__attribute__((no_sanitize("shadow-call-stack")))`` |
| ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ |
| |
| Use ``__attribute__((no_sanitize("shadow-call-stack")))`` on a function |
| declaration to specify that the shadow call stack instrumentation should not be |
| applied to that function, even if enabled globally. |
| |
| Example |
| ======= |
| |
| The following example code: |
| |
| .. code-block:: c++ |
| |
| int foo() { |
| return bar() + 1; |
| } |
| |
| Generates the following x86_64 assembly when compiled with ``-O2``: |
| |
| .. code-block:: gas |
| |
| push %rax |
| callq bar |
| add $0x1,%eax |
| pop %rcx |
| retq |
| |
| or the following aarch64 assembly: |
| |
| .. code-block:: none |
| |
| stp x29, x30, [sp, #-16]! |
| mov x29, sp |
| bl bar |
| add w0, w0, #1 |
| ldp x29, x30, [sp], #16 |
| ret |
| |
| |
| Adding ``-fsanitize=shadow-call-stack`` would output the following x86_64 |
| assembly: |
| |
| .. code-block:: gas |
| |
| mov (%rsp),%r10 |
| xor %r11,%r11 |
| addq $0x8,%gs:(%r11) |
| mov %gs:(%r11),%r11 |
| mov %r10,%gs:(%r11) |
| push %rax |
| callq bar |
| add $0x1,%eax |
| pop %rcx |
| xor %r11,%r11 |
| mov %gs:(%r11),%r10 |
| mov %gs:(%r10),%r10 |
| subq $0x8,%gs:(%r11) |
| cmp %r10,(%rsp) |
| jne trap |
| retq |
| |
| trap: |
| ud2 |
| |
| or the following aarch64 assembly: |
| |
| .. code-block:: none |
| |
| str x30, [x18], #8 |
| stp x29, x30, [sp, #-16]! |
| mov x29, sp |
| bl bar |
| add w0, w0, #1 |
| ldp x29, x30, [sp], #16 |
| ldr x30, [x18, #-8]! |
| ret |