What This Is
A two-node embedded system in which one MCU can crash or hang while the other keeps the system running. The STM32 handles sensor acquisition and control timing under FreeRTOS. The ESP32 acts as a supervisor: it watches heartbeat signals, monitors sequence continuity, and drives recovery based on what it observes. The two nodes communicate over UART with CRC-framed packets.
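As a rough illustration of the supervisor's two checks (heartbeat timing and sequence continuity), here is a minimal sketch in portable C. The 8-bit sequence counter, the 500 ms timeout, and every name in it (link_monitor_t, monitor_on_packet, and so on) are assumptions made for the example, not details taken from the project.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical threshold; the real timeout is not stated in this write-up. */
#define HEARTBEAT_TIMEOUT_MS 500u

typedef struct {
    uint32_t last_seen_ms;  /* tick time of the last valid packet */
    uint8_t  last_seq;      /* sequence number of the last valid packet */
    bool     have_seq;      /* false until the first packet arrives */
    uint32_t dropped;       /* running count of packets inferred missing */
} link_monitor_t;

void monitor_init(link_monitor_t *m, uint32_t now_ms)
{
    m->last_seen_ms = now_ms;
    m->last_seq = 0;
    m->have_seq = false;
    m->dropped = 0;
}

/* Record a valid packet; returns how many packets were skipped before it. */
uint8_t monitor_on_packet(link_monitor_t *m, uint8_t seq, uint32_t now_ms)
{
    uint8_t gap = 0;
    m->last_seen_ms = now_ms;
    if (m->have_seq) {
        if (seq == m->last_seq) {
            return 0;  /* same number again: treat as a retransmit, not a gap */
        }
        /* The cast wraps modulo 256, so 255 -> 0 counts as no gap. */
        gap = (uint8_t)(seq - m->last_seq - 1u);
    }
    m->last_seq = seq;
    m->have_seq = true;
    m->dropped += gap;
    return gap;
}

/* True when no valid packet has arrived within the timeout window. */
bool monitor_heartbeat_lost(const link_monitor_t *m, uint32_t now_ms)
{
    /* Unsigned subtraction stays correct across tick-counter wraparound. */
    return (uint32_t)(now_ms - m->last_seen_ms) > HEARTBEAT_TIMEOUT_MS;
}
```

In this sketch every valid packet doubles as a heartbeat. A design with a dedicated heartbeat line would update last_seen_ms from that signal instead.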
The point was not "build a system that doesn't crash." The point was "build a system where crashes are survivable and auditable." Those are different problems.
[UART TELEMETRY LINK] // [WATCHDOG POLICY TREE]
Architecture
The split between the two nodes is intentional: the supervisor needs to be independent from the thing it is supervising. If both roles lived on the same MCU, a runaway task or stack corruption that took out the control node would also take out recovery. Separate silicon gives the supervisor a real chance to intervene.
The STM32 runs four FreeRTOS tasks: sensor acquisition, control output, telemetry TX, and a local watchdog kick. Task priorities are assigned so the watchdog kick cannot be starved by sensor or output tasks. If the watchdog misses its kick deadline, that event itself gets logged before reset.
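One way to keep the "watchdog kick cannot be starved" property from drifting is to pin it at compile time. The sketch below is an assumption about how that could look: the numeric values and the relative order of the other three tasks are invented for illustration. The only fact it relies on is that FreeRTOS treats a larger number as a higher priority.

```c
/* Hypothetical priority assignment for the four tasks.
 * In FreeRTOS a larger number means a higher priority; 0 is the idle task.
 * These values would be passed as the uxPriority argument of xTaskCreate(). */
enum task_priority {
    PRIO_TELEMETRY_TX = 1,
    PRIO_SENSOR_ACQ   = 2,
    PRIO_CONTROL_OUT  = 3,
    PRIO_WATCHDOG     = 4
};

/* Fail the build if someone later raises a task above the watchdog kick. */
_Static_assert(PRIO_WATCHDOG > PRIO_SENSOR_ACQ,
               "watchdog kick must outrank sensor acquisition");
_Static_assert(PRIO_WATCHDOG > PRIO_CONTROL_OUT,
               "watchdog kick must outrank control output");
_Static_assert(PRIO_WATCHDOG > PRIO_TELEMETRY_TX,
               "watchdog kick must outrank telemetry TX");
```

The asserts cost nothing at runtime and turn a priority mistake into a build error instead of a field failure.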
Failure Policy Tree
Failures are not all the same. Treating every fault as a hard reset burns the reboot budget and makes logs useless. The policy tree grades the response by fault severity.
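A graded response can be sketched as a small decision function. The mapping below is an assumption: this write-up only names tier 1, tier 2, and a tier-3 degrade mode, so the fault types, the action assigned to each tier, and the reboot budget of 3 are all hypothetical.

```c
#include <stdint.h>

/* Hypothetical fault classes observed by the supervisor. */
typedef enum {
    FAULT_CRC_ERROR,       /* a single corrupted packet */
    FAULT_SEQ_GAP,         /* one or more packets missing */
    FAULT_HEARTBEAT_LOST   /* control node has gone silent */
} fault_t;

typedef enum {
    ACTION_LOG_ONLY,    /* tier 1 (assumed): record it and carry on */
    ACTION_RESET_NODE,  /* tier 2 (assumed): reset the control node */
    ACTION_DEGRADE      /* tier 3: enter degrade mode */
} action_t;

#define REBOOT_BUDGET 3u  /* hypothetical number of resets allowed */

typedef struct {
    uint8_t resets_used;  /* resets already spent from the budget */
} policy_state_t;

/* Pick the mildest response that fits the fault. */
action_t policy_decide(policy_state_t *s, fault_t f)
{
    switch (f) {
    case FAULT_CRC_ERROR:
    case FAULT_SEQ_GAP:
        return ACTION_LOG_ONLY;
    case FAULT_HEARTBEAT_LOST:
        if (s->resets_used < REBOOT_BUDGET) {
            s->resets_used++;
            return ACTION_RESET_NODE;
        }
        return ACTION_DEGRADE;  /* budget exhausted: stop rebooting */
    }
    return ACTION_DEGRADE;  /* unknown fault: choose the conservative path */
}
```

Keeping the decision in one pure function makes each tier easy to unit test, and it gives the log a single place to record which tier fired and why.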
Packet Framing
Raw UART without framing is unreliable under electrical noise or partial writes. Every packet uses a fixed header, a payload type byte, a 16-bit CRC, and an end delimiter. The supervisor ACKs packets that pass the CRC check and NACKs those that fail it. Packets that get no response within the timeout window are retransmitted up to three times before the link is marked degraded.
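The framing and retry rules above can be sketched in portable C. Several details are assumptions, because the write-up does not specify them: the delimiter values 0xA5 and 0x5A, the added length byte, the 64-byte payload cap, the choice of CRC-16/CCITT-FALSE, and treating a NACK the same as a timeout. All function and type names are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

/* Assumed layout: [SOF][type][len][payload...][crc_hi][crc_lo][EOF] */
#define FRAME_SOF         0xA5u
#define FRAME_EOF         0x5Au
#define FRAME_OVERHEAD    6u
#define FRAME_MAX_PAYLOAD 64u
#define MAX_RETRANSMITS   3u

/* CRC-16/CCITT-FALSE: poly 0x1021, init 0xFFFF, MSB first, no final XOR. */
uint16_t crc16_ccitt(const uint8_t *data, size_t len)
{
    uint16_t crc = 0xFFFFu;
    for (size_t i = 0; i < len; i++) {
        crc ^= (uint16_t)((uint16_t)data[i] << 8);
        for (int bit = 0; bit < 8; bit++) {
            crc = (crc & 0x8000u) ? (uint16_t)((crc << 1) ^ 0x1021u)
                                  : (uint16_t)(crc << 1);
        }
    }
    return crc;
}

/* Build a frame into out; returns its total length, or 0 if it does not fit. */
size_t frame_encode(uint8_t type, const uint8_t *payload, uint8_t len,
                    uint8_t *out, size_t out_cap)
{
    size_t total = (size_t)len + FRAME_OVERHEAD;
    if (len > FRAME_MAX_PAYLOAD || out_cap < total) {
        return 0;
    }
    out[0] = FRAME_SOF;
    out[1] = type;
    out[2] = len;
    for (size_t i = 0; i < len; i++) {
        out[3 + i] = payload[i];
    }
    /* The CRC covers type, length, and payload, but not the delimiters. */
    uint16_t crc = crc16_ccitt(&out[1], (size_t)len + 2u);
    out[3 + len] = (uint8_t)(crc >> 8);
    out[4 + len] = (uint8_t)(crc & 0xFFu);
    out[5 + len] = FRAME_EOF;
    return total;
}

/* Returns 1 if the frame is well-formed and the CRC matches (ACK), else 0. */
int frame_check(const uint8_t *frame, size_t n)
{
    if (n < FRAME_OVERHEAD || frame[0] != FRAME_SOF ||
        frame[n - 1] != FRAME_EOF) {
        return 0;
    }
    size_t len = frame[2];
    if (len > FRAME_MAX_PAYLOAD || n != len + FRAME_OVERHEAD) {
        return 0;
    }
    uint16_t want = (uint16_t)((frame[n - 3] << 8) | frame[n - 2]);
    return crc16_ccitt(&frame[1], len + 2u) == want;
}

typedef enum { TX_DONE, TX_RESEND, TX_LINK_DEGRADED } tx_result_t;
typedef struct { uint8_t retransmits; } tx_state_t;

/* Call once per ACK (acked = 1) or per NACK/timeout (acked = 0). */
tx_result_t tx_on_response(tx_state_t *s, int acked)
{
    if (acked) {
        s->retransmits = 0;
        return TX_DONE;
    }
    if (s->retransmits < MAX_RETRANSMITS) {
        s->retransmits++;
        return TX_RESEND;
    }
    return TX_LINK_DEGRADED;
}
```

With these definitions a packet is sent once and then retransmitted at most three times, so the fourth consecutive failure is what marks the link degraded. The end delimiter is checked before the CRC so that obviously truncated frames are rejected cheaply.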
What's Left
The architecture is stable and the tier-1 and tier-2 paths work in isolation. Tier-3 degrade mode and the full NV write-on-fault sequence are still being validated. The flash ring buffer logic passes its unit tests but hasn't been stress-tested for power loss mid-write. That's the active front right now.