CorrectBench Paper Experiment Summary
==================================================
Total run-level rows: 0
Total paired rows: 0

[Overall Coverage Delta]
Mean delta: None
95% bootstrap CI: [None, None]
Wilcoxon signed-rank: n=0, p=1.000000, method=degenerate

[Per-Task Paired Coverage Delta]