CGA-bench/analysis/paper_runs/paper_fsm/stats_summary.txt

2026-05-22 10:02:42 +08:00
CorrectBench Paper Experiment Summary
==================================================
Total run-level rows: 4
Total paired rows: 2

[Overall Coverage Delta]
Mean delta: 0.0
95% bootstrap CI: [0.0, 0.0]
Wilcoxon signed-rank: n=0, p=1.000000, method=degenerate

[Overall Semantic Coverage Delta]
Mean delta: 0.0
95% bootstrap CI: [0.0, 0.0]
Wilcoxon signed-rank: n=0, p=1.000000, method=degenerate

[Per-Task Paired Coverage Delta]
qwen-max | 2012_q2fsm: mean=0.0 CI=[0.0, 0.0] n=1
qwen-max | 2013_q2afsm: mean=0.0 CI=[0.0, 0.0] n=1
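
The summary above reports n=0 for the Wilcoxon test even though there are 2 paired rows. One plausible reading, sketched below, is that every paired delta is exactly zero: the classic Wilcoxon signed-rank procedure discards zero differences, so no observations remain and the test is degenerate. This is a minimal illustration, not the benchmark's actual analysis code. The function names (`paired_deltas`, `bootstrap_ci`, `wilcoxon_effective_n`), the percentile bootstrap, the resample count, and the coverage values are all assumptions made for the example.

```python
import random
from statistics import mean


def paired_deltas(treatment, baseline):
    """Per-pair difference (treatment minus baseline)."""
    return [t - b for t, b in zip(treatment, baseline)]


def bootstrap_ci(deltas, n_resamples=2000, alpha=0.05, seed=0):
    """Percentile bootstrap CI for the mean delta (assumed method)."""
    rng = random.Random(seed)
    # Resample the deltas with replacement and record each resample's mean.
    means = sorted(
        mean(rng.choices(deltas, k=len(deltas))) for _ in range(n_resamples)
    )
    lower = means[int((alpha / 2) * n_resamples)]
    upper = means[int((1 - alpha / 2) * n_resamples) - 1]
    return lower, upper


def wilcoxon_effective_n(deltas):
    """Number of pairs left after discarding zero differences."""
    return sum(1 for d in deltas if d != 0)


# Hypothetical coverage values: both runs of each pair score the same.
deltas = paired_deltas([0.5, 0.75], [0.5, 0.75])
print("Mean delta:", mean(deltas))
print("95% bootstrap CI:", bootstrap_ci(deltas))
print("Wilcoxon effective n:", wilcoxon_effective_n(deltas))
```

With identical scores in every pair, the mean delta and both CI bounds collapse to 0.0 and the effective sample size is 0, which is consistent with the `method=degenerate` lines in the summary. A result like this says the two conditions were indistinguishable on these two tasks; it does not by itself show that they are equivalent in general.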