BUILD #02 // MAIN PROJECT

SELF-HEALING I2C STACK
FAULT INJECTION RIG

Adversarial peripheral emulation with staged recovery: bus clear, timeout fencing, quarantine, and trace logging

STM32 // ESP32 // I2C // Fault Injection // SPI Flash // UART CLI
IN PROGRESS — ACTIVE BUILD
4 Fault Types
3 Recovery Stages
CLI Scriptable Faults
NOR Trace Log Storage

What This Is

A test rig for I2C reliability. The STM32 is the master under test. The ESP32 acts as a hostile peripheral emulator: it injects faults on command and measures recovery time. The goal is to stress an I2C stack systematically and reproducibly across firmware revisions, not just once.

The rig is instrumented: every fault injection, recovery attempt, and final outcome gets timestamped into a SPI flash ring buffer. When a new recovery strategy is tested, you can diff the log to see whether it actually improved anything.

STM32 MASTER // ESP32 ADVERSARIAL EMULATOR
[I2C BUS] // [UART CLI CONTROL]

Injected Fault Set

Faults are triggered by UART CLI commands sent to the ESP32. Each fault type is parameterized by duration, severity, and repeat count, which makes test campaigns repeatable and comparable.
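A minimal sketch of how one CLI line could be turned into a fault descriptor. The command syntax (`inject <fault> <duration_us> <severity> <repeat>`), the keywords, and every type and function name here are assumptions for illustration; the rig's actual CLI grammar may differ.

```c
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* The four fault types from the rig's fault set. Identifier names are illustrative. */
typedef enum {
    FAULT_NACK_BURST,
    FAULT_CLOCK_STRETCH,
    FAULT_STUCK_SDA,
    FAULT_ADDR_COLLISION,
    FAULT_COUNT
} fault_type_t;

typedef struct {
    fault_type_t type;
    uint32_t duration_us; /* how long each injection holds the fault */
    uint32_t severity;    /* fault-specific, e.g. number of bytes to NACK */
    uint32_t repeat;      /* number of injections in the campaign */
} fault_cmd_t;

/* Hypothetical CLI keywords, indexed by fault_type_t. */
static const char *const fault_keywords[FAULT_COUNT] = {
    "nack", "stretch", "stuck_sda", "collide"
};

/* Parses "inject <fault> <duration_us> <severity> <repeat>".
 * Returns false on unknown fault names, missing fields, or trailing junk. */
bool fault_cmd_parse(const char *line, fault_cmd_t *out)
{
    char name[16];
    char trailing;
    unsigned long duration, severity, repeat;

    int fields = sscanf(line, "inject %15s %lu %lu %lu %c",
                        name, &duration, &severity, &repeat, &trailing);
    if (fields != 4) {
        return false; /* fewer than 4: missing field; 5: trailing junk */
    }
    if (duration > UINT32_MAX || severity > UINT32_MAX || repeat > UINT32_MAX) {
        return false;
    }
    for (int t = 0; t < FAULT_COUNT; t++) {
        if (strcmp(name, fault_keywords[t]) == 0) {
            out->type = (fault_type_t)t;
            out->duration_us = (uint32_t)duration;
            out->severity = (uint32_t)severity;
            out->repeat = (uint32_t)repeat;
            return true;
        }
    }
    return false;
}
```

Keeping the whole fault in one flat descriptor means a campaign is just a text file of such lines, which is what makes runs comparable across firmware revisions.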

NACK BURST
ESP32 NACKs every incoming byte for N cycles. Tests whether the master retries correctly without hanging the bus.
CLOCK STRETCHING
Peripheral holds SCL low for a configurable, abnormally long duration. Tests the master's clock-stretch timeout logic and whether it recovers or deadlocks.
STUCK SDA
SDA line held low, simulating a hung peripheral that never releases the bus. Tests the 9-pulse recovery sequence.
ADDRESS COLLISION
ESP32 responds to addresses it does not own, creating unexpected ACKs. Tests address validation in the master's RX path.
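The 9-pulse sequence referenced under STUCK SDA can be sketched as below. This is a host-testable approximation, not the rig's firmware: the pin access goes through a hypothetical `i2c_pins_t` callback table (on the STM32 these would wrap open-drain GPIO writes), and the `sim_*` functions are a stand-in peripheral that releases SDA after a set number of clocks. Stopping early once SDA reads high is a choice made here; always sending all nine pulses is an equally valid variant.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical pin abstraction. "high" means released (open drain, pulled up). */
typedef struct {
    void (*scl_write)(void *ctx, bool high);
    void (*sda_write)(void *ctx, bool high);
    bool (*sda_read)(void *ctx);
    void (*half_period_delay)(void *ctx);
    void *ctx;
} i2c_pins_t;

#define BUS_CLEAR_MAX_PULSES 9

/* Clocks SCL up to 9 times until SDA is released, then issues a STOP.
 * Returns the number of pulses used, or -1 if SDA is still held low. */
int i2c_bus_clear(const i2c_pins_t *p)
{
    int pulses = 0;

    p->sda_write(p->ctx, true); /* make sure the master is not holding SDA */
    p->scl_write(p->ctx, true);
    p->half_period_delay(p->ctx);

    while (!p->sda_read(p->ctx) && pulses < BUS_CLEAR_MAX_PULSES) {
        p->scl_write(p->ctx, false);
        p->half_period_delay(p->ctx);
        p->scl_write(p->ctx, true);
        p->half_period_delay(p->ctx);
        pulses++;
    }
    if (!p->sda_read(p->ctx)) {
        return -1; /* peripheral never let go; caller should escalate */
    }

    /* STOP condition: SDA rises while SCL is high. */
    p->scl_write(p->ctx, false);
    p->half_period_delay(p->ctx);
    p->sda_write(p->ctx, false);
    p->half_period_delay(p->ctx);
    p->scl_write(p->ctx, true);
    p->half_period_delay(p->ctx);
    p->sda_write(p->ctx, true);
    p->half_period_delay(p->ctx);
    return pulses;
}

/* Host-side stand-in: a peripheral that holds SDA low until it has seen
 * clocks_until_release rising edges on SCL. */
typedef struct {
    bool scl;
    bool sda_master;
    int clocks_until_release;
} sim_bus_t;

static void sim_scl_write(void *ctx, bool high)
{
    sim_bus_t *b = ctx;
    if (high && !b->scl && b->clocks_until_release > 0) {
        b->clocks_until_release--; /* count rising edges only */
    }
    b->scl = high;
}

static void sim_sda_write(void *ctx, bool high) { ((sim_bus_t *)ctx)->sda_master = high; }

static bool sim_sda_read(void *ctx)
{
    const sim_bus_t *b = ctx;
    return b->sda_master && b->clocks_until_release == 0; /* wired-AND */
}

static void sim_delay(void *ctx) { (void)ctx; }
```

Returning the pulse count rather than a bare pass/fail gives the trace log something to record, which helps when tuning the sequence timing.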

Recovery Stack

Recovery runs in stages, not as a single nuclear reset. This matters because a hard bus reset can disturb other peripherals on the same bus that were operating fine. The rig tries the least invasive option first.

// RECOVERY SEQUENCE
Stage 1: Timeout fence - abort transaction, release bus, wait for line idle
Stage 2: Bus clear - 9 SCL pulses to unstick any SDA-holding peripheral
Stage 3: Quarantine - mark faulting device address, reroute traffic, retry later
On failure: Log fault context to SPI flash, escalate to soft I2C peripheral reset
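The "wait for line idle" part of the timeout fence might look like the sketch below. The `bus_probe_t` callbacks and the `sim_*` helpers are hypothetical; on target, `now_us` would read a free-running microsecond timer and `lines_idle` would sample SCL and SDA. The one detail worth getting right is the unsigned subtraction, which keeps the deadline correct when the 32-bit timer wraps.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical probes for time and line state. */
typedef struct {
    uint32_t (*now_us)(void *ctx);  /* free-running microsecond counter */
    bool (*lines_idle)(void *ctx);  /* true when SCL and SDA both read high */
    void *ctx;
} bus_probe_t;

/* Polls until the bus is idle or timeout_us has elapsed.
 * Returns true if idle was observed, false if the fence expired. */
bool wait_bus_idle(const bus_probe_t *probe, uint32_t timeout_us)
{
    const uint32_t start = probe->now_us(probe->ctx);

    while (!probe->lines_idle(probe->ctx)) {
        /* Unsigned subtraction stays correct across counter wraparound. */
        uint32_t elapsed = probe->now_us(probe->ctx) - start;
        if (elapsed >= timeout_us) {
            return false;
        }
    }
    return true;
}

/* Host-side stand-in: time advances step_us per read, and the bus becomes
 * idle after polls_until_idle failed polls. */
typedef struct {
    uint32_t t_us;
    uint32_t step_us;
    uint32_t polls_until_idle;
} sim_clock_t;

static uint32_t sim_now_us(void *ctx)
{
    sim_clock_t *s = ctx;
    uint32_t t = s->t_us;
    s->t_us += s->step_us;
    return t;
}

static bool sim_lines_idle(void *ctx)
{
    sim_clock_t *s = ctx;
    if (s->polls_until_idle > 0) {
        s->polls_until_idle--;
        return false;
    }
    return true;
}
```

A false return is the signal to move on to the bus-clear stage rather than keep waiting.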

Each stage outcome gets logged. If stage 1 is sufficient 90% of the time, that's information worth having. Staging also keeps the bus-clear sequence from running unnecessarily on transient glitches.
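One way to express the escalation ladder and its per-stage bookkeeping is sketched below. All names are hypothetical, and the single `attempt` callback is a simplification: it is assumed to run the given stage and report whether the bus was usable afterwards. `stub_attempt` is a host-side stand-in that fails the first N stages.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

typedef enum {
    STAGE_TIMEOUT_FENCE, /* least invasive, tried first */
    STAGE_BUS_CLEAR,
    STAGE_QUARANTINE,
    STAGE_COUNT          /* also used as the "all stages failed" result */
} recovery_stage_t;

/* Hypothetical hook: runs one stage, returns true if the bus is usable after. */
typedef bool (*stage_attempt_fn)(recovery_stage_t stage, uint8_t addr, void *ctx);

typedef struct {
    stage_attempt_fn attempt;
    void *ctx;
    uint32_t resolved_at[STAGE_COUNT]; /* how often each stage was sufficient */
    uint32_t escalated;                /* how often every stage failed */
} recovery_t;

/* Tries stages in order and stops at the first one that succeeds.
 * Returns that stage, or STAGE_COUNT when the caller must escalate
 * (log context, soft-reset the I2C peripheral). */
recovery_stage_t recovery_run(recovery_t *r, uint8_t addr)
{
    for (int s = 0; s < STAGE_COUNT; s++) {
        if (r->attempt((recovery_stage_t)s, addr, r->ctx)) {
            r->resolved_at[s]++;
            return (recovery_stage_t)s;
        }
    }
    r->escalated++;
    return STAGE_COUNT;
}

/* Share of recoveries resolved at a given stage, in whole percent. */
uint32_t recovery_stage_percent(const recovery_t *r, recovery_stage_t stage)
{
    uint64_t total = r->escalated;
    for (int s = 0; s < STAGE_COUNT; s++) {
        total += r->resolved_at[s];
    }
    return total ? (uint32_t)((100u * (uint64_t)r->resolved_at[stage]) / total) : 0;
}

/* Host-side stand-in: ctx points to an int N; the first N stages fail. */
static bool stub_attempt(recovery_stage_t stage, uint8_t addr, void *ctx)
{
    (void)addr;
    return (int)stage >= *(const int *)ctx;
}
```

The counters are what turn "stage 1 is usually enough" from a hunch into a number that can be compared between firmware revisions.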

Trace Logging

The SPI flash ring buffer stores timestamped entries: fault type, fault parameters, recovery path taken, recovery latency, and final outcome. It wraps on overflow rather than halting, so sustained fault campaigns don't cause a different kind of failure.
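A sketch of the wrap-on-overflow behavior, under stated assumptions: the entry layout, the 4 KiB sector size, and the four-sector log region are invented for illustration, and a RAM array stands in for the NOR part so the logic can run on a host. The NOR-specific constraint it models is that programming can only clear bits, so a sector has to be erased before the write pointer re-enters it.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define SECTOR_SIZE  4096u /* assumed erase granularity */
#define SECTOR_COUNT 4u    /* deliberately small log region for a host demo */

/* Hypothetical 16-byte entry covering the fields the log stores. */
typedef struct {
    uint32_t timestamp_us;
    uint8_t  fault_type;
    uint8_t  recovery_stage;
    uint8_t  outcome;
    uint8_t  reserved;
    uint32_t fault_param;
    uint32_t recovery_latency_us;
} trace_entry_t;

#define ENTRIES_PER_SECTOR (SECTOR_SIZE / sizeof(trace_entry_t))
#define CAPACITY           (ENTRIES_PER_SECTOR * SECTOR_COUNT)

static uint8_t flash[SECTOR_SIZE * SECTOR_COUNT]; /* RAM stand-in for NOR */

static void flash_erase_sector(uint32_t sector)
{
    memset(&flash[sector * SECTOR_SIZE], 0xFF, SECTOR_SIZE);
}

static void flash_program(uint32_t addr, const void *src, size_t len)
{
    const uint8_t *s = src;
    for (size_t i = 0; i < len; i++) {
        flash[addr + i] &= s[i]; /* NOR programming only moves bits 1 -> 0 */
    }
}

typedef struct {
    uint32_t head;  /* next slot to write */
    uint32_t total; /* entries written since start, including overwritten ones */
} trace_log_t;

/* Appends one entry. Entering a sector erases it first, which drops the
 * oldest sector's entries instead of halting when the log is full. */
void trace_log_append(trace_log_t *log, const trace_entry_t *e)
{
    uint32_t slot = log->head;
    if (slot % ENTRIES_PER_SECTOR == 0) {
        flash_erase_sector((uint32_t)(slot / ENTRIES_PER_SECTOR));
    }
    flash_program((uint32_t)(slot * sizeof *e), e, sizeof *e);
    log->head = (uint32_t)((slot + 1u) % CAPACITY);
    log->total++;
}

void trace_log_read(uint32_t slot, trace_entry_t *out)
{
    memcpy(out, &flash[slot * sizeof *out], sizeof *out);
}
```

The cost of this scheme is that a wrap discards a whole sector of old entries at once. On real hardware the head position would also need to be recovered at boot, for example by scanning for the first erased slot; that is left out here.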

The reason this matters: I2C flakiness in production embedded systems is often blamed on "noise" or "hardware issues" with no real data behind the claim. The rig generates that data. If recovery consistently takes 12 ms for stuck-SDA but 200 µs for NACK bursts, you know where to optimize, and after a change you can prove whether it helped.
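Comparing two campaigns comes down to summarizing recovery latency per fault type and looking at the difference. A minimal sketch, assuming the log has already been read back into an array of samples; the struct and function names are hypothetical, and fault types are taken to be numbered 0 to 3.

```c
#include <stddef.h>
#include <stdint.h>

#define FAULT_TYPE_COUNT 4u

typedef struct {
    uint8_t  fault_type;          /* 0..FAULT_TYPE_COUNT-1 */
    uint32_t recovery_latency_us;
} latency_sample_t;

typedef struct {
    uint32_t count;
    uint32_t max_us;
    uint64_t sum_us;
} latency_stats_t;

/* Buckets samples by fault type. Samples with an unknown type are skipped. */
void latency_summarize(const latency_sample_t *samples, size_t n,
                       latency_stats_t out[FAULT_TYPE_COUNT])
{
    for (size_t t = 0; t < FAULT_TYPE_COUNT; t++) {
        out[t] = (latency_stats_t){0};
    }
    for (size_t i = 0; i < n; i++) {
        uint8_t t = samples[i].fault_type;
        if (t >= FAULT_TYPE_COUNT) {
            continue;
        }
        out[t].count++;
        out[t].sum_us += samples[i].recovery_latency_us;
        if (samples[i].recovery_latency_us > out[t].max_us) {
            out[t].max_us = samples[i].recovery_latency_us;
        }
    }
}

/* Integer mean in microseconds; 0 when the bucket is empty. */
uint32_t latency_mean_us(const latency_stats_t *st)
{
    return st->count ? (uint32_t)(st->sum_us / st->count) : 0;
}

/* Change in mean latency between two runs; negative means faster recovery. */
int64_t latency_mean_delta_us(const latency_stats_t *before,
                              const latency_stats_t *after)
{
    return (int64_t)latency_mean_us(after) - (int64_t)latency_mean_us(before);
}
```

Tracking the maximum alongside the mean matters for recovery paths, since a rare slow recovery is often the one that causes trouble in the field.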

Current State

NACK burst and clock stretch faults are working with reliable recovery detection. Stuck-SDA recovery is implemented but the 9-pulse sequence timing needs tightening on the STM32 side. Address collision handling is still in design. The UART CLI is functional for all existing fault types.
