A large number of attacks on embedded devices rely on being able to write to memory that code can execute from. Dump your shellcode into a buffer. Overwrite the return address on the stack. Presto, you're running unauthorized code! This isn't a new problem: modern servers, legacy UNIX workstations and Windows have all had such issues. All will continue to have such issues.
There's something unique for embedded systems that works in our favor though. Most RTOSes and bare metal code are dead simple. You don't need to carry 50 years of legacy baggage with you. Nor do you (hopefully) need self-modifying code. That means that delineation of what needs to be executable code, what is read-only data, and what needs to be modifiable is easy to identify.
As a part of the ARMv7M architecture, ARM introduced an optional feature called the 'Memory Protection Unit' or MPU. The MPU is not page-based, nor is it anything resembling a traditional MMU - it is much more limited in scope.
With the MPU, you define regions of the device's address space by base address and size. There is a limited number of these regions (8 on the Cortex-M3), so you want to use them with care. Each region's size must be a power of two of at least 32 bytes, and its base must be aligned to that size. Each region carries its own access permissions and memory attributes, which control how the CPU treats accesses that fall inside it. An address receives the treatment of the highest-priority MPU region that covers it.
Some examples of these attributes are:
- Read/Write/Execute (what we're interested in);
- Device/Strongly-Ordered;
- Cache policy (non-cacheable, write-back, write-through, etc.).
Some attributes (e.g. cache policy) are not applicable to certain ARMv7M implementations. The relevance of these is beyond the scope of this blog post, alas.
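To make the attribute list a little more concrete, here is a minimal sketch of how a region's size, permissions and memory type pack into the MPU's Region Attribute and Size Register (RASR). The bit positions follow the ARMv7-M Architecture Reference Manual; the helper names (`mpu_size_field`, `mpu_rasr`) and the `MEM_*` and `AP_*` constants are made up for illustration, and only three of the access-permission encodings are shown.

```c
#include <assert.h>
#include <stdint.h>

/* MPU_RASR field positions (ARMv7-M Architecture Reference Manual). */
#define RASR_ENABLE   (1u << 0)
#define RASR_SIZE(n)  ((uint32_t)(n) << 1)   /* region covers 2^(n+1) bytes */
#define RASR_B        (1u << 16)
#define RASR_TEX(n)   ((uint32_t)(n) << 19)
#define RASR_AP(n)    ((uint32_t)(n) << 24)
#define RASR_XN       (1u << 28)             /* execute never */

/* Illustrative names for a few memory types (TEX/C/B combinations). */
#define MEM_STRONGLY_ORDERED  0u             /* TEX=0 C=0 B=0 */
#define MEM_DEVICE            RASR_B         /* TEX=0 C=0 B=1 */
#define MEM_NORMAL            RASR_TEX(1)    /* TEX=1 C=0 B=0, non-cacheable */

/* A subset of the access-permission encodings. */
enum { AP_NONE = 0, AP_RW = 3, AP_RO = 6 };

/* Convert a power-of-two byte count (32 B to 2 GiB) into the SIZE field.
 * Returns -1 if the count cannot be encoded. */
static int mpu_size_field(uint32_t bytes)
{
    if (bytes < 32u || (bytes & (bytes - 1u)) != 0u)
        return -1;
    int log2 = 0;
    while ((bytes >> log2) != 1u)
        log2++;
    return log2 - 1;
}

/* Build a RASR value for an enabled region with no subregions disabled. */
static uint32_t mpu_rasr(uint32_t bytes, uint32_t ap, int executable,
                         uint32_t memtype)
{
    int size = mpu_size_field(bytes);
    assert(size >= 0);  /* caller must pass a legal region size */
    return RASR_ENABLE | RASR_SIZE(size) | RASR_AP(ap) |
           (executable ? 0u : RASR_XN) | memtype;
}
```

For example, `mpu_rasr(128u * 1024u, AP_RW, 0, MEM_NORMAL)` describes a 128 KiB read/write, non-executable region of normal memory.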
Let's use the Marvell MV88MC200 as an example. This part implements the ARMv7M MPU, and has several disjoint memory regions:
- A boot ROM (4 KiB at 0x00000000);
- 2 contiguous 192 KiB Code/Data SRAM blocks (starting at 0x00100000);
- 2 contiguous 64 KiB Data SRAM blocks (starting at 0x20000000);
- An 'Always On' 4KiB SRAM block (starting at 0x480c0000);
- APB I/O space (from 0x44000000 through to 0x4a000000);
- The usual Cortex-M3 peripherals (0xE0000000 to 0xE0100000).
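One wrinkle in this map: the two Code/Data SRAM blocks add up to 384 KiB, which is not a power of two, so a single MPU region cannot describe them exactly. A region of 256 bytes or more is split into eight equal subregions that can be disabled individually, and that can be used to trim a 512 KiB region down to size. The sketch below works out the subregion-disable mask; the function names are hypothetical, and it only handles blocks that start at the base of the enclosing region.

```c
#include <stddef.h>
#include <stdint.h>

/* Smallest MPU-legal region size (power of two, at least 32 bytes) that
 * can hold `size` bytes. Sizes above 2 GiB are not handled here. */
static uint32_t mpu_round_size(uint32_t size)
{
    uint32_t p = 32u;
    while (p < size && p < 0x80000000u)
        p <<= 1;
    return p;
}

/* Returns the 8-bit subregion-disable mask that trims the enclosing
 * power-of-two region down to `size` bytes, or -1 if the block cannot be
 * covered by one region. If region_size is non-NULL, the enclosing
 * region size is stored there. */
static int mpu_srd_mask(uint32_t base, uint32_t size, uint32_t *region_size)
{
    uint32_t region = mpu_round_size(size);
    if (region_size != NULL)
        *region_size = region;
    if (size == 0u || size > region || (base & (region - 1u)) != 0u)
        return -1;              /* empty, too large, or misaligned */
    if (size == region)
        return 0;               /* exact fit, nothing to disable */
    if (region < 256u)
        return -1;              /* subregions need at least 256 bytes */
    uint32_t sub = region / 8u;
    if (size % sub != 0u)
        return -1;              /* not a whole number of subregions */
    /* Disable every subregion above the ones the block occupies. */
    return (int)((0xFFu << (size / sub)) & 0xFFu);
}
```

For the Code/Data SRAM, `mpu_srd_mask(0x00100000u, 384u * 1024u, NULL)` gives 0xC0: a 512 KiB region with its top two 64 KiB subregions switched off. The Data SRAM (128 KiB) and the 'Always On' SRAM (4 KiB) are already aligned powers of two, so they need no mask.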
We could mark each memory region as Read/Execute, Read-Only or Read/Write. The ARMv7M MPU has an interesting feature: regions can overlap. Where they do, the region with the highest priority, which is the one with the highest region number, determines how the CPU treats an access to that address.
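The overlap rule is easy to model on a desktop machine. The following is a rough simulation, not driver code: it walks a table from the highest region number down and reports the first enabled region that covers an address. The `struct region` layout and the example table are assumptions for illustration, and subregion handling is left out.

```c
#include <stdint.h>

/* Simplified model of one MPU region. */
struct region {
    uint32_t first;      /* first address covered */
    uint32_t last;       /* last address covered, inclusive */
    int enabled;
    const char *access;  /* human-readable permission, illustration only */
};

/* Hypothetical layout: the array index is the MPU region number. */
static const struct region example_regions[] = {
    { 0x00000000u, 0xFFFFFFFFu, 1, "none" },          /* 0: background */
    { 0x00100000u, 0x0015FFFFu, 1, "read/execute" },  /* 1: code SRAM */
    { 0x20000000u, 0x2001FFFFu, 1, "read/write" },    /* 2: data SRAM */
};

/* Returns the number of the region that governs `addr`: the
 * highest-numbered enabled region covering it, or -1 if none does. */
static int mpu_match(const struct region *r, int count, uint32_t addr)
{
    for (int i = count - 1; i >= 0; i--)
        if (r[i].enabled && addr >= r[i].first && addr <= r[i].last)
            return i;
    return -1;
}
```

With this table, an address in code SRAM resolves to region 1 even though the background region also covers it, while an address in the boot ROM only matches region 0 and so gets no access.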
In a hypothetical MV88MC200-based device, the first step is to set up the MPU as follows:
- Mark the code memory as Read/Execute only;
- If you can, mark read-only data memory as Read-Only, with no execute permission;
- Mark the stack, initialized data and uninitialized data regions as Read/Write, with no execute permission;
- Apply appropriate mappings for peripherals (e.g. Read/Write with Device ordering);
- Create a 4 GiB background region that marks all other memory as inaccessible, and give it the lowest region number so that every other mapping overrides it;
- Enable the MPU.
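The steps above can be sketched as a region table plus a small routine that loads it. Treat this as a sketch under several assumptions rather than a drop-in driver: it assumes all code and read-only data live in the Code/Data SRAM and all writable data in the Data SRAM (your linker script decides that), it covers the whole 0x40000000 to 0x5FFFFFFF peripheral space with one coarse Device region, and the names `mpu_regs`, `mpu_entry`, `mc200_regions` and `mpu_apply` are invented here. The register layout and bit positions come from the ARMv7-M Architecture Reference Manual. Passing the register block as a pointer lets the same routine run against a fake struct on a host.

```c
#include <stdint.h>

/* MPU register block; on ARMv7-M it starts at 0xE000ED90. */
struct mpu_regs {
    volatile uint32_t TYPE;  /* bits [15:8] = number of regions */
    volatile uint32_t CTRL;  /* bit 0 = ENABLE */
    volatile uint32_t RNR;   /* region number select */
    volatile uint32_t RBAR;  /* region base address */
    volatile uint32_t RASR;  /* region attributes and size */
};
#define MPU_HW ((struct mpu_regs *)0xE000ED90u)

#define RASR_ENABLE   (1u << 0)
#define RASR_LOG2(n)  ((uint32_t)((n) - 1) << 1)  /* region of 2^n bytes */
#define RASR_SRD(m)   ((uint32_t)(m) << 8)        /* subregion disable */
#define RASR_DEVICE   (1u << 16)                  /* TEX=0 C=0 B=1 */
#define RASR_NORMAL   (1u << 19)                  /* TEX=1 C=0 B=0 */
#define RASR_AP_NONE  (0u << 24)
#define RASR_AP_RW    (3u << 24)
#define RASR_AP_RO    (6u << 24)
#define RASR_XN       (1u << 28)

struct mpu_entry { uint32_t base; uint32_t rasr; };

/* Index = region number, so later entries override earlier ones. */
static const struct mpu_entry mc200_regions[] = {
    /* 0: 4 GiB background, no access */
    { 0x00000000u, RASR_ENABLE | RASR_LOG2(32) | RASR_AP_NONE | RASR_XN },
    /* 1: peripheral space, 512 MiB, Device, read/write, no execute */
    { 0x40000000u, RASR_ENABLE | RASR_LOG2(29) | RASR_DEVICE |
                   RASR_AP_RW | RASR_XN },
    /* 2: code SRAM, 512 KiB region trimmed to 384 KiB, read/execute */
    { 0x00100000u, RASR_ENABLE | RASR_LOG2(19) | RASR_SRD(0xC0) |
                   RASR_NORMAL | RASR_AP_RO },
    /* 3: data SRAM, 128 KiB, read/write, no execute */
    { 0x20000000u, RASR_ENABLE | RASR_LOG2(17) | RASR_NORMAL |
                   RASR_AP_RW | RASR_XN },
    /* 4: 'Always On' SRAM, 4 KiB, read/write; overrides region 1 */
    { 0x480C0000u, RASR_ENABLE | RASR_LOG2(12) | RASR_NORMAL |
                   RASR_AP_RW | RASR_XN },
};

/* Program the first n regions from tbl, disable the rest, then enable
 * the MPU. Returns -1 if the MPU is absent or has too few regions. */
static int mpu_apply(struct mpu_regs *mpu, const struct mpu_entry *tbl,
                     uint32_t n)
{
    uint32_t total = (mpu->TYPE >> 8) & 0xFFu;
    if (total == 0u || n > total)
        return -1;
    mpu->CTRL = 0u;                   /* disable while reprogramming */
    for (uint32_t i = 0; i < total; i++) {
        mpu->RNR = i;
        if (i < n) {
            mpu->RBAR = tbl[i].base;  /* VALID bit clear: RNR selects */
            mpu->RASR = tbl[i].rasr;
        } else {
            mpu->RASR = 0u;           /* unused region stays disabled */
        }
    }
    mpu->CTRL = 1u;                   /* ENABLE only */
    /* On hardware, follow this with DSB and ISB barriers. */
    return 0;
}
```

On the target this would be called as `mpu_apply(MPU_HW, mc200_regions, 5)` early in start-up. The Cortex-M3 system peripherals at 0xE0000000 get no entry because accesses to the Private Peripheral Bus always use the default memory map. Leaving PRIVDEFENA clear in CTRL means privileged code is bound by the table as well.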
There's an interesting side effect to the second-to-last step: we never set up access for the boot ROM, so it falls under the background region and becomes inaccessible. The boot ROM is static code at a fixed location, and it is identical for all MV88MC200 device variants. Some of the more clever attacks (e.g. return-oriented programming, or ROP) rely on fixed code at a fixed location to build 'gadgets': chunks of known code that perform certain actions given a crafted stack. With the ROM non-executable after start-up, these attacks become harder to carry out in a software-agnostic way.
We've glossed over the concept of privilege in the ARMv7M. For how the MPU behaves with code running at different privilege levels, see the ARMv7M MPU documentation. We'll discuss code privilege for ARMv7M at a later date.
Note that this post mostly focuses on the ARMv7M implementation. Older ARM variants have other tools available, and other architectures (MIPS, Blackfin, etc.) have their own twist on the same functionality.