| model | condition | task_id | n_runs | coverage_mean | coverage_std | coverage_best | semantic_coverage_mean | semantic_coverage_std | semantic_coverage_best | eval2_mean | eval2_std | eval2_best | time_mean_sec | time_std_sec | token_cost_mean | token_cost_std | first_improvement_iter_mean |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| qwen-max | baseline | 2012_q2fsm | 1 | 91.67 | 0.0 | 91.67 | 61.17 | 0.0 | 61.17 | 0.7 | 0.0 | 0.7 | 348.97 | 0.0 | 0.62682 | 0.0 |  |
| qwen-max | baseline | 2013_q2afsm | 1 | 92.31 | 0.0 | 92.31 | 73.51 | 0.0 | 73.51 |  | 0.0 |  | 519.09 | 0.0 | 0.78504 | 0.0 |  |
| qwen-max | cga | 2012_q2fsm | 1 | 91.67 | 0.0 | 91.67 | 61.17 | 0.0 | 61.17 | 0.3 | 0.0 | 0.3 | 489.72 | 0.0 | 0.72136 | 0.0 |  |
| qwen-max | cga | 2013_q2afsm | 1 | 92.31 | 0.0 | 92.31 | 73.51 | 0.0 | 73.51 |  | 0.0 |  | 1803.02 | 0.0 | 2.66142 | 0.0 |  |
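The table does not say how its aggregate columns were computed. The following is a minimal sketch that assumes each row summarizes per-run records grouped by (model, condition, task_id). It takes `_best` as the maximum value (higher is better) and `_std` as the population standard deviation; both are assumptions. The record layout and the `summarize` function are hypothetical. The two records reuse the single-run values reported for 2012_q2fsm. With `n_runs` = 1, this scheme gives mean equal to best and a std of 0.0, consistent with the rows above.

```python
from collections import defaultdict
from statistics import mean, pstdev

# Hypothetical per-run records. The values are the single-run figures reported
# in the table for qwen-max on 2012_q2fsm; the record layout is an assumption.
RUNS = [
    {"model": "qwen-max", "condition": "baseline", "task_id": "2012_q2fsm",
     "coverage": 91.66666666666666, "eval2": 0.7, "time_sec": 348.97},
    {"model": "qwen-max", "condition": "cga", "task_id": "2012_q2fsm",
     "coverage": 91.66666666666666, "eval2": 0.3, "time_sec": 489.72},
]


def summarize(runs, metric, with_best=True):
    """Aggregate one metric per (model, condition, task_id) group.

    Returns {key: row}, where row holds n_runs, <metric>_mean, <metric>_std and,
    optionally, <metric>_best (assumed to be the maximum, i.e. higher is better).
    Population std (pstdev) is assumed, so a single run yields 0.0.
    """
    groups = defaultdict(list)
    for run in runs:
        groups[(run["model"], run["condition"], run["task_id"])].append(run)

    table = {}
    for key, group in groups.items():
        # Missing values (None or absent key) are excluded from the statistics,
        # but the run still counts toward n_runs.
        values = [r[metric] for r in group if r.get(metric) is not None]
        row = {"n_runs": len(group)}
        row[f"{metric}_mean"] = mean(values) if values else None
        row[f"{metric}_std"] = pstdev(values) if values else None
        if with_best:
            row[f"{metric}_best"] = max(values) if values else None
        table[key] = row
    return table


if __name__ == "__main__":
    for key, row in summarize(RUNS, "eval2").items():
        print(key, row)
    # Time and cost columns in the table have no _best, hence with_best=False.
    for key, row in summarize(RUNS, "time_sec", with_best=False).items():
        print(key, row)
```

Calling `summarize` with `with_best=False` mirrors the table, which reports `_best` only for the quality metrics and not for time or token cost.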