CGA-bench/analysis/custom_task_diagnosis.md
2026-05-22 10:02:42 +08:00

Custom Task Diagnosis

This note records why the original custom-task experiment should not be used as direct CGA evidence.

Current failure causes

  • i2c_controller

    • The original DUT references scl_in without defining it.
    • The CGA baseline already fails at the Verilator compile step, so the reported coverage = 0.0 reflects a build failure and is not evidence about branch reachability.
    • The original testbench is stimulus-only and does not provide a clear pass/fail contract or a slave-side bus model.
  • spi_controller

    • The original DUT leaves spi_clk effectively disconnected from observable transaction progress.
    • The original ERROR/mode_fault path is not meaningfully driven.
    • The pipeline spends most of its time in TBcheck -> reboot, so the run mostly reflects the unstable quality of the generated testbenches rather than the CGA coverage search.
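
The distinction drawn above, between a coverage number that was actually measured and one that is only a side effect of a broken task, can be made explicit when results are post-processed. The following is a minimal sketch, not the pipeline's actual code: the `RunResult` record, its field names, and the three labels are hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class RunResult:
    """Hypothetical per-task summary; field names are assumptions."""
    task: str
    compile_ok: bool            # did the DUT + testbench build succeed?
    coverage: Optional[float]   # None when no coverage was recorded


def interpret(result: RunResult) -> str:
    """Label a run so build failures are never read as coverage data."""
    if not result.compile_ok:
        # A coverage value attached to a failed build carries no information
        # about reachability, whatever number was logged.
        return "invalid_task"
    if result.coverage is None:
        return "no_measurement"
    return "valid_measurement"


if __name__ == "__main__":
    # Illustrative values only; not taken from any real run.
    print(interpret(RunResult("i2c_controller", compile_ok=False, coverage=0.0)))
    print(interpret(RunResult("example_task", compile_ok=True, coverage=0.0)))
```

Under this labeling, a genuine 0.0 from a task that builds remains a valid measurement, while the same number from a task that fails to compile is excluded from any coverage comparison.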

Clean-data policy

  • Keep the original data/myproject/combined.jsonl as the historical failure case.
  • For new experiments, use data/myproject/combined_clean.jsonl, which is restricted to tasks with:
    • compilable DUTs,
    • self-checking golden testbenches,
    • descriptions that match the actual reachable protocol behavior.
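
The first two criteria above lend themselves to an automated gate when building the clean file. The sketch below is one possible way to express that gate, under loud assumptions: the JSONL field names (`task_id`, `testbench`) are hypothetical, the compile check is passed in as a callable rather than implemented, and the self-checking test is a crude text heuristic. The third criterion, whether a description matches reachable protocol behavior, is assumed to need manual review and is not covered.

```python
import json
from typing import Callable, Dict, Iterable, List

Task = Dict[str, str]


def has_pass_fail_contract(task: Task) -> bool:
    """Crude heuristic (an assumption): a self-checking testbench is
    expected to contain at least one $fatal or $error call."""
    tb = task.get("testbench", "")
    return "$fatal" in tb or "$error" in tb


def select_clean_tasks(
    jsonl_lines: Iterable[str],
    dut_compiles: Callable[[Task], bool],
    tb_is_self_checking: Callable[[Task], bool] = has_pass_fail_contract,
) -> List[Task]:
    """Keep only the records that pass both automated gates."""
    clean: List[Task] = []
    for line in jsonl_lines:
        line = line.strip()
        if not line:
            continue  # tolerate blank lines in the JSONL file
        task = json.loads(line)
        if dut_compiles(task) and tb_is_self_checking(task):
            clean.append(task)
    return clean


if __name__ == "__main__":
    # Toy records for illustration; not the contents of combined.jsonl.
    lines = [
        json.dumps({"task_id": "a", "testbench": "initial begin #10; end"}),
        json.dumps({"task_id": "b", "testbench": "if (q !== 1) $fatal;"}),
    ]
    kept = select_clean_tasks(lines, dut_compiles=lambda t: True)
    print([t["task_id"] for t in kept])
```

In practice `dut_compiles` could wrap a tool invocation such as `verilator --lint-only` on the DUT source, but how the DUT is stored in each record is not specified here, so that step is left to the caller.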

Paper usage

  • Cite the original custom-task run only as an example of the "task definition not yet valid" failure mode, not as a CGA result.
  • Use the clean custom tasks only for:
    • structural coverage comparison,
    • CGA iteration case studies,
    • qualitative discussion of protocol-style controllers.