Far too often, cryptography gets treated as a mysterious salve. Apply some ECC here, some RSA there, then use some AES to round it all off. But several factors limit how secure a channel can be. Since we're discussing memory-limited ARM Cortex-M3s and similar parts, let's focus our discussion there. Alas, it is all about trade-offs at this point.
Asymmetric or Symmetric Keys?
We've discussed the challenges around building a secure channel using only symmetric keys. Key management will always be a complicating factor. Both ends of the communication need to be pre-built with a list and schedule of keys to use. This isn't a bad thing: the NSA uses this technique with certain sensitive communications. But the security of the key store will dictate how secure the link can be. On top of this, you will need policy built around storing and managing keys.
There are practical concerns around this decision. First, do you have the resources to perform asymmetric operations in real time? Does your microcontroller provide a modular arithmetic accelerator? Devices like the BCM58xx and the MAX326x1 both provide accelerated RSA and ECC support. Do you have an AES accelerator? Does it perform various block cipher modes for you? Or will you have to drive the cipher in ECB mode and build the other block modes yourself in software?
The second practical concern is where you will store the keys. Do you have internal RAM that you can use? Is it battery backed, or will you need to write your keys out to external flash? Do you even care if someone with physical access to the device steals the keys?
Of course, it goes without saying, each device needs unique keys. Sharing the same set of keys for all devices means that a single key theft will compromise all your devices. That's bad news, even for a rudimentary security model.
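One way to give every device its own key without hand-managing a key list per unit is to derive each key from a master secret and the device's serial number at manufacturing time. The sketch below is only an illustration of that idea: the function name, the serial number format, and the use of HMAC-SHA256 as the derivation function are all assumptions, not something prescribed here. Note the trade-off: the master secret now becomes the thing you must protect, so it should never be stored on a device.

```python
import hashlib
import hmac


def derive_device_key(master_secret: bytes, device_id: bytes,
                      purpose: bytes = b"link-key") -> bytes:
    """Derive a 256-bit per-device key (hypothetical helper, illustrative only)."""
    # Length-prefix the device ID so different (id, purpose) pairs
    # can never produce the same HMAC input.
    msg = len(device_id).to_bytes(2, "big") + device_id + purpose
    return hmac.new(master_secret, msg, hashlib.sha256).digest()


# Placeholder master secret for the demo; a real one stays in the
# provisioning system and is never flashed onto a device.
master = bytes(range(32))

key_a = derive_device_key(master, b"SN-0001")
key_b = derive_device_key(master, b"SN-0002")
```

Stealing key_a from one device tells an attacker nothing useful about key_b, which is the property the paragraph above asks for.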
Block Cipher Modes: Choose and Use With Care
AES has several NIST-approved block cipher operating modes. Cipher Block Chaining (CBC), Counter (CTR) and Galois/Counter Mode (GCM) are all approved. Each mode has different intended use cases, and each has limitations an engineer needs to know. All three modes need different inputs. To make life even more difficult, the three modes differ in how unique, unpredictable or secret those inputs must be.
CBC requires an initialization vector (IV) that is one AES block (128 bits) long. The IV should be unique, and unpredictable to an attacker, each time you start encrypting under a given key. Without that, replay attacks and information leaks can occur. Imagine if you encrypted the same 'HELLO' message over and over with the same key and the same IV. We would see the same ciphertext every time. The IV just needs to be a random number, and you don't even need to keep it a secret. Unfortunately, we know most IoT devices don't have a lot of entropy available to them. We need to be smart about when we rotate IVs and start new sessions.
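To make the repeated 'HELLO' problem concrete, here is a minimal sketch of CBC chaining built on top of a single-block encrypt primitive, which is what you end up writing when your accelerator only offers ECB. Python's standard library has no AES, so toy_block_encrypt is a small Feistel network standing in for the hardware AES block operation. It is an assumption made purely for illustration and must not be used as a real cipher. The fixed IVs are also for demonstration only; in practice the IV comes from your random number generator.

```python
import hashlib
import hmac

BLOCK = 16  # bytes, the same size as an AES block


def toy_block_encrypt(key: bytes, block: bytes) -> bytes:
    """Stand-in for a hardware AES single-block operation (4-round Feistel)."""
    left, right = block[:8], block[8:]
    for rnd in range(4):
        f = hmac.new(key, bytes([rnd]) + right, hashlib.sha256).digest()[:8]
        left, right = right, bytes(a ^ b for a, b in zip(left, f))
    return left + right


def cbc_encrypt(key: bytes, iv: bytes, plaintext: bytes) -> bytes:
    """CBC: XOR each plaintext block with the previous ciphertext block."""
    if len(iv) != BLOCK or len(plaintext) % BLOCK:
        raise ValueError("IV must be one block; plaintext must be block-aligned")
    prev, out = iv, bytearray()
    for i in range(0, len(plaintext), BLOCK):
        mixed = bytes(a ^ b for a, b in zip(plaintext[i:i + BLOCK], prev))
        prev = toy_block_encrypt(key, mixed)  # the IV seeds the first block
        out += prev
    return bytes(out)


key = b"k" * 16
msg = b"HELLO".ljust(BLOCK, b"\x00")  # zero-padded to one block for the demo

iv_old = bytes(16)
iv_new = bytes([1]) + bytes(15)

same_iv_a = cbc_encrypt(key, iv_old, msg)
same_iv_b = cbc_encrypt(key, iv_old, msg)  # identical to same_iv_a
fresh_iv = cbc_encrypt(key, iv_new, msg)   # differs from same_iv_a
```

With the IV held constant, an eavesdropper can tell that the same message was sent twice without ever learning the key. Changing the IV is enough to hide that.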
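CTR requires a similar value, but in this case it's a nonce (number used once). The nonce does not have to be secret, or even random, but it must never repeat under the same key. Reusing a nonce produces the same keystream twice, and XORing the two ciphertexts cancels that keystream out. A message counter kept in non-volatile storage satisfies the requirement without consuming any entropy, which suits entropy-starved MCUs. If you generate your nonces randomly instead, you need access to good quality entropy.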
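Whatever else you require of the nonce, reusing one under the same key is fatal in counter mode. The sketch below shows why. It is not AES-CTR: the keystream function uses HMAC-SHA256 as a stand-in for the block cipher (an assumption, chosen only because it is available in Python's standard library), but the structure of encrypting nonce-plus-counter and XORing the result with the data is the same. The messages and names are made up for the demo.

```python
import hashlib
import hmac


def keystream(key: bytes, nonce: bytes, length: int) -> bytes:
    """Counter-mode keystream: PRF(key, nonce || counter) for counter = 0, 1, ..."""
    out = bytearray()
    counter = 0
    while len(out) < length:
        block_input = nonce + counter.to_bytes(4, "big")
        out += hmac.new(key, block_input, hashlib.sha256).digest()
        counter += 1
    return bytes(out[:length])


def xor_bytes(a: bytes, b: bytes) -> bytes:
    return bytes(x ^ y for x, y in zip(a, b))


def ctr_xor(key: bytes, nonce: bytes, data: bytes) -> bytes:
    """Encrypt or decrypt: the operation is its own inverse."""
    return xor_bytes(data, keystream(key, nonce, len(data)))


key = b"\x01" * 16
nonce = (7).to_bytes(12, "big")

p1 = b"UNLOCK DOOR 0001"
p2 = b"REPORT TEMP 21.5"

c1 = ctr_xor(key, nonce, p1)
c2 = ctr_xor(key, nonce, p2)  # the mistake: same key, same nonce

# The keystream cancels out, leaving the XOR of the two plaintexts.
leak = xor_bytes(c1, c2)
```

An attacker who sees c1 and c2 and knows or guesses p1 recovers p2 with a single XOR, without touching the key.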
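GCM is a cross between counter mode and an authentication mode. Alongside the counter-mode encryption, GCM runs the ciphertext blocks (and any additional authenticated data) through a hash built on multiplication in GF(2^128) to calculate an authentication 'tag'. You then send this tag along with the ciphertext. The recipient recomputes the tag to determine whether the message was tampered with. Like CTR, GCM needs a nonce that never repeats under the same key. If your hardware has support for the carryless multiply needed for GCM, then it's a sound choice.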
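To show what that carryless multiply involves, here is a sketch of the GF(2^128) multiplication at the heart of GCM's tag calculation, written as the plain bit-by-bit algorithm using GCM's bit ordering. The function and constant names are my own. It is meant to illustrate the cost, not to serve as a GCM implementation: it is not constant time, and it covers only the multiply, not the full tag computation.

```python
# Reduction constant: x^128 = x^7 + x^2 + x + 1, written in GCM bit order
# (the most significant bit of the integer is the coefficient of x^0).
R = 0xE1 << 120


def gf128_mul(x: int, y: int) -> int:
    """Multiply two GF(2^128) elements held as 128-bit integers."""
    z = 0
    v = y
    for i in range(128):
        if (x >> (127 - i)) & 1:  # walk x from its x^0 coefficient upward
            z ^= v                # addition in this field is XOR, no carries
        carry = v & 1             # coefficient of x^127 is about to overflow
        v >>= 1                   # multiply v by x
        if carry:
            v ^= R                # fold x^128 back into the field
    return z


ONE = 1 << 127    # the field's multiplicative identity
ALPHA = 1 << 126  # the element "x"

a = 0x0123456789ABCDEF0123456789ABCDEF
b = 0xFEDCBA9876543210FEDCBA9876543210
product = gf128_mul(a, b)
```

That is 128 loop iterations for every 16-byte block of ciphertext, on top of the AES work. On a Cortex-M3 class part with no hardware assist, this is the cost you are weighing when you pick GCM.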
Session Keys, Time-Based Keys, Oh My!
When working in a performance-limited environment, you have to make trade-offs. If you're building a secure channel, you have a heavy-weight protocol stack that it will ride on. We're already burning precious compute resources and we aren't even doing our real job yet!
Managing keys is anything but trivial, as we know. Entropy is scarce in most MCUs*. Exchanging keys can be cost-prohibitive in time - an RSA private key operation is heavy-weight. It seems clear we have to do some serious work, but can we amortize the cost of it?
Again, it's all about trade-offs. Manufacturing a device with its own asymmetric key pair is about the best you can hope for. The beauty of Diffie-Hellman, ElGamal and similar schemes is that your 'session' lasts as long as you need it to, so one key agreement can cover many TCP or BLE connections with the same endpoint. A simple policy to rotate keys weekly, or after encrypting a certain amount of data, bounds how long any one key stays in use. Mix what little entropy you have into a larger pool over days, and use it to seed a DRBG/CSPRNG that generates new symmetric keys. Of course, this only works well if you have a limited number of endpoints, because you end up maintaining a fair amount of state for each one.
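A rough sketch of that policy might look like the following. Everything here is an assumption made for illustration: the class name, the SHA-256 hash chain used as the pool, the 1 MiB data limit, and passing the clock in as a parameter. In particular, this is not a validated DRBG, and it only covers generating and retiring keys on one side. Getting each new key to the other endpoint is a separate step.

```python
import hashlib
import hmac

WEEK_SECONDS = 7 * 24 * 3600


class KeyRotator:
    """Hypothetical helper: pool entropy slowly, rotate keys by age or usage."""

    def __init__(self, seed: bytes, max_bytes: int = 1 << 20,
                 max_age: int = WEEK_SECONDS):
        self.pool = hashlib.sha256(b"pool-init" + seed).digest()
        self.max_bytes = max_bytes
        self.max_age = max_age
        self.generation = 0
        self._rekey(now=0)

    def add_entropy(self, sample: bytes) -> None:
        # Hash-chain every sample into the pool, however weak it is.
        self.pool = hashlib.sha256(self.pool + sample).digest()

    def _rekey(self, now: int) -> None:
        self.generation += 1
        label = b"session-key" + self.generation.to_bytes(4, "big")
        self.key = hmac.new(self.pool, label, hashlib.sha256).digest()
        # Ratchet the pool so a later pool state cannot rebuild this key.
        self.pool = hmac.new(self.pool, b"ratchet", hashlib.sha256).digest()
        self.key_born = now
        self.bytes_used = 0

    def key_for(self, nbytes: int, now: int) -> bytes:
        """Return the key to use for the next nbytes of traffic."""
        too_much = self.bytes_used + nbytes > self.max_bytes
        too_old = now - self.key_born >= self.max_age
        if too_much or too_old:
            self._rekey(now)
        self.bytes_used += nbytes
        return self.key


rotator = KeyRotator(b"boot-seed", max_bytes=100)
rotator.add_entropy(b"adc-noise-sample")  # e.g. jitter or ADC readings
first = rotator.key_for(60, now=10)
same = rotator.key_for(30, now=20)    # still under both limits
rotated = rotator.key_for(30, now=30) # would exceed 100 bytes, so rekey
```

The state mentioned above is visible here: a pool, a current key, a byte count and a timestamp, and you would need one such record per endpoint.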
So in the end we don't have many solutions, just trade-offs. There are many options that you can use to establish a secure channel. When you have limited resources at your disposal, the trade-offs will be many. This is why you need to have a well-defined threat model. We've glossed over implementation details here, too. Unintentional oracles and other implementation hazards are another discussion for another time.
* A note to MCU vendors: please make TRNGs a standard feature. Please get them verified by the CAVP/CMVP process, too. Thanks!