CGA-bench/analysis/custom_task_diagnosis.md

# Custom Task Diagnosis

This note records why the original custom-task experiment should not be used as
direct CGA evidence.

## Current failure causes

- `i2c_controller`
  - The original DUT references `scl_in` without defining it.
  - The CGA baseline already fails at the Verilator compile step, so `coverage = 0.0`
    reflects a build failure and is not evidence about branch reachability.
  - The original testbench is stimulus-only and does not provide a clear pass/fail
    contract or a slave-side bus model.

- `spi_controller`
  - The original DUT leaves `spi_clk` effectively disconnected from observable
    transaction progress.
  - The original `ERROR/mode_fault` path is not meaningfully driven.
  - The pipeline spends most of its time in `TBcheck -> reboot`, so the run is
    dominated by the unstable quality of the generated TB rather than by CGA
    coverage search.
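
One way to keep a compile failure from being misread as a coverage result is to record "no measurement" separately from a measured `0.0`. The sketch below is illustrative only: `RunResult`, `normalize`, and `usable_as_coverage_evidence` are hypothetical names and are not part of the CGA pipeline.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class RunResult:
    """Hypothetical per-task record; field names are assumptions."""
    task: str
    compiled: bool
    coverage: Optional[float]  # None means "not measured", not "zero"


def normalize(task: str, compiled: bool, raw_coverage: Optional[float]) -> RunResult:
    # A failed compile produces no measurement, so any placeholder
    # value (such as 0.0) reported alongside it is discarded.
    return RunResult(task, compiled, raw_coverage if compiled else None)


def usable_as_coverage_evidence(result: RunResult) -> bool:
    # Only runs that compiled and produced a number say anything
    # about branch reachability.
    return result.compiled and result.coverage is not None
```

Under this convention the original `i2c_controller` run would be stored with `coverage = None` and excluded from any coverage comparison, while a task that compiles and genuinely reaches no branches would still be kept with `coverage = 0.0`.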

## Clean-data policy

- Keep the original `data/myproject/combined.jsonl` as the historical failure case.
- Use `data/myproject/combined_clean.jsonl` with:
  - compilable DUTs,
  - self-checking golden testbenches,
  - descriptions that match the actual reachable protocol behavior.
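
The first two criteria could be enforced mechanically when building the clean file. The following is a minimal sketch under stated assumptions: the flag names `dut_compiles` and `tb_self_checking` are hypothetical and may not match the actual schema of `combined_clean.jsonl`. The third criterion (descriptions matching reachable behavior) still needs manual review.

```python
import json

# Hypothetical boolean fields; the real JSONL schema may differ.
REQUIRED_FLAGS = ("dut_compiles", "tb_self_checking")


def split_clean(jsonl_text):
    """Split JSONL text into (kept records, rejected (line, reasons) pairs)."""
    kept, rejected = [], []
    for lineno, line in enumerate(jsonl_text.splitlines(), start=1):
        if not line.strip():
            continue  # tolerate blank lines between records
        record = json.loads(line)
        # A flag must be present and exactly True to count as satisfied.
        failed = [flag for flag in REQUIRED_FLAGS if record.get(flag) is not True]
        if failed:
            rejected.append((lineno, failed))
        else:
            kept.append(record)
    return kept, rejected
```

Keeping the rejected line numbers and reasons makes it easy to see which original tasks were excluded from the clean set and why.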

## Paper usage

- Present the original custom-task run as an example of the "task definition not yet valid" failure mode.
- Use the clean custom tasks only for:
  - structural coverage comparison,
  - CGA iteration case studies,
  - qualitative discussion of protocol-style controllers.