Over-The-Air update system
OTA updates have become a must-have for any connected embedded device, both to keep improving functionality and to mitigate cybersecurity risks over its lifetime. However, traditional OTA solutions imported from the phone or IT domains are generally not well suited to industrial IoT: limited bandwidth and resources, severe cybersecurity constraints, and testing and certification requirements all call for partial update capabilities. Changing only what is required is the key challenge.
To do so, the only way is to integrate the OTA system natively from the very beginning of the CI/CD process, so that every change is traceable end to end (the goal is to establish an update mechanism per transaction). Unfortunately, many OTA systems are added at the end and only have access to the final binary image, with no way to walk back up the dependency chain. That can be enough to push updates to the device, but the key difficulty is to calculate, test and approve update differentials before deploying them.
The first requirement is binary reproducibility, which makes a fully controlled CI/CD process mandatory. Next, the testing environment should be able to simulate the update across every version hop you want to support. Last but not least, in case of an unsupported version hop or a critical error, an automatic rollback feature with a factory reset to your latest stable version is necessary. redpesk Factory offers an end-to-end solution with all the key features: a package-based design allowing smaller updates and quicker deployment times, a fully controlled CI/CD process for binary reproducibility, an automated testing facility (through Rackable Test Modules) and various update upload solutions (Mender is proposed as a reference). The following highlights how updates are deployed from the CI/CD system:
To perform package-based updates, the CI/CD computes the differences between the packages currently running on a board (retrieved as part of the board fleet inventory) and the contents of the new release. This delta is then assembled into an artifact that can be deployed via any uploader solution.
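The delta computation described above can be illustrated with a minimal Python sketch. It is not the redpesk Factory implementation: the package names, the version strings, the `PackageDelta` structure and the `compute_delta` function are all hypothetical, and versions are compared by plain string inequality rather than by RPM version ordering.

```python
from dataclasses import dataclass, field


@dataclass
class PackageDelta:
    """Difference between a board's packages and a target release (hypothetical)."""
    install: dict = field(default_factory=dict)  # name -> version to add
    change: dict = field(default_factory=dict)   # name -> (current, target)
    remove: list = field(default_factory=list)   # names absent from the release


def compute_delta(installed: dict, release: dict) -> PackageDelta:
    """Compare an inventory {name: version} with a release {name: version}."""
    delta = PackageDelta()
    for name, version in release.items():
        if name not in installed:
            delta.install[name] = version
        elif installed[name] != version:
            # Assumption: any version mismatch is treated as a change,
            # without deciding whether it is an upgrade or a downgrade.
            delta.change[name] = (installed[name], version)
    delta.remove = sorted(name for name in installed if name not in release)
    return delta


# Hypothetical inventory reported by one board, and a new release manifest.
board_inventory = {
    "core-os-base": "1.0-1",
    "sensor-gateway": "2.1-4",
    "old-diag-tool": "0.9-1",
}
new_release = {
    "core-os-base": "1.0-1",
    "sensor-gateway": "2.2-1",
    "customer-app": "3.0-1",
}

delta = compute_delta(board_inventory, new_release)
print(delta)
```

Only the packages listed in the delta would need to go into the update artifact, which is what keeps a package-based update smaller than a full image.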
OTA update design
In addition to keeping the system up to date, one of the main goals of the target updater system is to be as reliable as possible, so that no board ends up non-operational because of a failed update. This is achieved using an A’-B design made of:
- a main partition (named B) hosting the full system runtime. While operational, the system always boots on it. Its purpose is to be incrementally updated via RPM updates pushed through the uploader agent. These packages can be either ‘Core OS’ ones (i.e. targeting the base system) or ‘Application level’ ones (for a customer application).
- a recovery partition (named A’) which contains a minimal OS that is nonetheless fully equipped to perform a complete restore of the B partition. This partition is not meant to be updated frequently, but it can be if needed (for instance if a new WiFi protocol needs to be supported).
Failover and recovery from a failed update, pushed by the uploader agent running as a service of the main system, rely on the transactional database support of the RPM toolkit. RPM provides the ability to commit updates atomically and to revert to a previous package version if something fails.
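The all-or-nothing behaviour described above can be pictured with a small Python simulation. This is only a sketch of the idea, not RPM's API or the uploader agent's code: `apply_transaction`, `UpdateError` and the `fails_on` parameter (used to simulate a failing package step) are hypothetical names introduced for illustration.

```python
class UpdateError(Exception):
    """Raised when a single package step fails (hypothetical)."""


def apply_transaction(installed: dict, changes: dict, fails_on=frozenset()):
    """Apply every change or none of them.

    installed: {name: version} currently on the main partition.
    changes:   {name: version} to set, or {name: None} to remove a package.
    fails_on:  package names whose step should fail (simulation only).
    Returns (resulting_state, committed).
    """
    staged = dict(installed)  # work on a copy; the live state stays untouched
    try:
        for name, version in changes.items():
            if name in fails_on:
                raise UpdateError(f"step failed for {name}")
            if version is None:
                staged.pop(name, None)
            else:
                staged[name] = version
    except UpdateError:
        return installed, False  # roll back: keep the previous state
    return staged, True          # commit: all steps succeeded


installed = {"core-os-base": "1.0-1", "customer-app": "3.0-1"}
changes = {"core-os-base": "1.1-1", "customer-app": "3.1-1"}

ok_state, ok = apply_transaction(installed, changes)
bad_state, bad = apply_transaction(installed, changes, fails_on={"customer-app"})
print(ok, ok_state)
print(bad, bad_state)
```

In the failing case the first package step had already been staged, yet the returned state is the original one: a partially applied update is never exposed, which is the property the transactional approach is meant to provide.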
The recovery system itself includes WiFi access point functionality (with a factory preset SSID), an HTML5 web-based administration user interface (supporting web or USB image download) and an SSH server for remote diagnostics and control. This tooling offers flexible ways to restore the B partition to a fully working state. Recovery mode (that is, booting on the A’ system to restore the contents of the B partition) can be triggered via a hardware reset procedure, for example a long press on a physical button. Another possibility would be a hardware watchdog which increments a failed boot counter checked by the device bootloader. The target device partition scheme is laid out as follows to support the update mechanism described above:
redpesk OTA benefits
- secure, reliable, faster and cost-efficient update mechanism via package-based streams
- optimized time-to-market: releases can be done sooner, as more updates can be performed after devices have been deployed
- ability to deliver OTA updates along the entire product lifecycle: engineering, deployment and production
- continuous updates help mitigate cybersecurity threats
- transactional differentials allow per-target update traceability and limit the testing required