TBgen_App/__pycache__/analyze.cpython-312.pyc

2026-03-30 16:46:48 +08:00
<EFBFBD>
<00><>pi<06><00><00><00>dZddlZddlmZddlmZmZmZddl m
Z
ddl Z ddl Z dZ dZdZe j j#ee<0F>ZdgZd Zd
Zd <0B>Zd <0C>Zd <0A>Zd<0E>Zddeezfd<10>ZGd<11>de<07>ZGd<13>de<08>Zd<15>Zd<16>Ze dk(re<16>yy)z<>
Description : analyze the output from autoline mode.
Author : Ruidi Qiu (r.qiu@tum.de)
Time : 2023/12/12 17:35:00
LastEdited : 2024/9/17 23:35:03
<EFBFBD>N)<03>dictlist<73>HDLBitsProbset<65> muti_dictlist)<01>PRICING_MODELSg<53><67><EFBFBD><EFBFBD><EFBFBD><EFBFBD><EFBFBD>?<3F>analysiszanalyze_out.log<6F>zjsaves_inEDA/DATE25/Main_Results/CorrectBench/disc_70wrong_25correct_20240831_181427/Chatbench_RunInfo.jsonz,saves_inEDA/DATE25/Main_Results/CorrectBenchc<00><00>t<00>y<00>N)<01> regular_main<69><00><00>-/home/zhang/CorrectBench/TBgen_App/analyze.py<70>mainr!s<00><00><10>Nr c<00><><00>tjt<00>}t|<00>}|j <00>|j
D]5}|j dd<02>s<01>|j|<02>s<01>(t|d<00><00>7dg}tt|<03>}|xjdz c_ |j <00>|j<00>|j<00>|j<00>tdddi<01> <09>}tddd
i<01> <09>}|j }|j }tt|<03>} | j#|d <0B> <0C>| xjd z c_ | j <00>| j<00>| j<00>| jt$j&j)t*dt,z<00><00>tt|<03>}
|
j#|d <0B> <0C>|
xjdz c_ |
j <00>|
j<00>|
j<00>|
jt$j&j)t*dt,z<00><00>y)N<> TB_correctedF<64>task_idrz1
#################### TOTAL ####################
z'data/HDLBits/HDLBits_circuit_type.jsonl<6E> circuit_type<70>CMB)<01>filter_content<6E>SEQT)<01> del_by_listz/
#################### CMB ####################
<EFBFBD>CMB_z/
#################### SEQ ####################
<EFBFBD>SEQ_)<17>ls<6C>load_json_dict<63>CHATBENCH_RUNINFO_PATH<54>Analyzer<65>run<75>data<74>get<65>
Eval2_pass<EFBFBD>print<6E> MultiAnalyzer<65> MULTI_DIR<49>messages<65>get_avg_tokens_one_task<73>get_avg_pass_by_disc_and_corr<72>saver<00> task_id_list<73> del_items<6D>os<6F>path<74>join<69>DEFAULT_SAVING_DIR<49>DEFAULT_LOG_NAME) <0B>Chatbench_RunInfo<66>analyzer<65>i<>k_list<73>multi_analyzer<65>CMB_set<65>SEQ_set<65> CMB_tasks<6B> SEQ_tasks<6B>multi_analyzer_CMB<4D>multi_analyzer_SEQs r<00>diy_mainr;&s<><00><00><1B>)<29>)<29>*@<40>A<><15><17>)<29>*<2A>H<EFBFBD> <0C>L<EFBFBD>L<EFBFBD>N<EFBFBD> <15>]<5D>]<5D> <20><01> <0C>5<EFBFBD>5<EFBFBD><1E><15> '<27>H<EFBFBD>,?<3F>,?<3F><01>,B<> <11>!<21>I<EFBFBD>,<2C> <1F> <20>.<10>S<EFBFBD>F<EFBFBD>#<23>9<EFBFBD>f<EFBFBD>5<>N<EFBFBD><12><1B><1B>T<>T<><1B><12><16><16><18><12>*<2A>*<2A>,<2C><12>0<>0<>2<><12><17><17><19><1C>F<>Xf<58>hm<68>Wn<57>o<>G<EFBFBD><1C>F<>Xf<58>hm<68>Wn<57>o<>G<EFBFBD><17>$<24>$<24>I<EFBFBD><17>$<24>$<24>I<EFBFBD>&<26>y<EFBFBD>&<26>9<><16><16> <20> <20><19><04> <20>=<3D><16><1F><1F>#V<>V<><1F><16><1A><1A><1C><16>.<2E>.<2E>0<><16>4<>4<>6<><16><1B><1B>B<EFBFBD>G<EFBFBD>G<EFBFBD>L<EFBFBD>L<EFBFBD>);<3B>V<EFBFBD>FV<46>=V<>W<>X<>&<26>y<EFBFBD>&<26>9<><16><16> <20> <20><19><04> <20>=<3D><16><1F><1F>#V<>V<><1F><16><1A><1A><1C><16>.<2E>.<2E>0<><16>4<>4<>6<><16><1B><1B>B<EFBFBD>G<EFBFBD>G<EFBFBD>L<EFBFBD>L<EFBFBD>);<3B>V<EFBFBD>FV<46>=V<>W<>X<>R r c<00>b<00>tt<00>}|j<00>|j<00>yr
)r#r$rr()r4s r<00>regular_multiA_mainr=<00>s$<00><00>"<22>9<EFBFBD>-<2D>N<EFBFBD><12><16><16><18><12><17><17>r c<00>8<00>tjt<00>}t|<00>}|j <00>t t d<01>5}|j|j<00>|jdtj<00>z<00>ddd<00>y#1swYyxYw<01>N<>w<>analysis time: %s
) rrrrr<00>open<65>DEFAULT_LOG_PATH<54>writer%<00>utils<6C>get_time)r0r1<00>fs rr r <00>su<00><00><1A>)<29>)<29>*@<40>A<><15><17>)<29>*<2A>H<EFBFBD> <0C>L<EFBFBD>L<EFBFBD>N<EFBFBD> <0A><1E><03> $<24><<3C><01> <09><07><07><08>!<21>!<21>"<22> <09><07><07>%<25><15><1E><1E>)9<>:<3A>;<3B><<3C><<3C><<3C>s <00>AB<03>B<07>subsetc<00><00>t|t<00>r t|<00>}nt|t<00>r|}n td<01><00>|j}t t j|<01><00>}|j|d<02>|xjd|<02>d<04>z c_ |j<00>ttd<05>5}|j|j<00>|jdtj <00>z<00>ddd<07>y#1swYyxYw)a!
    This function analyzes only a subset of the runinfo data.
- subset (only the task_ids are needed):
- str: path of the subset
- HDLBitsProbset: the subset
- runinfo_path: path of the Chatbench_RunInfo.json
- subset_name: the name of the subset
z+subset should be a path or a HDLBitsProbsetFz
#################### z ####################
<EFBFBD>arAN)<11>
isinstance<EFBFBD>strr<00> TypeErrorr)rrrr*<00>out_txtrrBrCrDr%rErF)rH<00> runinfo_path<74> subset_name<6D> subset_tasksr1rGs r<00>analyze_subsetrR<00>s<><00><00><12>&<26>#<23><1E><1F><06>'<27><06> <13>F<EFBFBD>N<EFBFBD> +<2B><17><06><17>E<>F<>F<><19>&<26>&<26>L<EFBFBD><17><02>)<29>)<29>,<2C>7<>8<>H<EFBFBD> <0C><16><16>|<7C>U<EFBFBD>+<2B> <0C><14><14>1<>+<2B><1D>>U<>V<>V<><14> <0C>L<EFBFBD>L<EFBFBD>N<EFBFBD> <0A><1E><03> $<24><<3C><01> <09><07><07><08>!<21>!<21>"<22> <09><07><07>%<25><15><1E><1E>)9<>:<3A>;<3B><<3C><<3C><<3C>s <00>0AC;<03>;Dc<00><><00><00>eZdZd+<2B>fd<01> Zd<02>Zd<03>Zd<04>Zd,d<05>Zed<06><00>Z ed<07><00>Z
ed<08><00>Z ed <09><00>Z ed
<EFBFBD><00>Z ed <0B><00>Zed <0C><00>Zed <0A><00>Zed<0E><00>Zed<0F><00>Zed<10><00>Zed<11><00>Zed<12><00>Zed<13><00>Zed<14><00>Zed<15><00>Zd<16>Zd<17>Zd<18>Zd<19>Zd<1A>Zd<1B>Zd<1C>Zd<1D>Z d<1E>Z!d<1F>Z"d <20>Z#d!<21>Z$d"<22>Z%d#<23>Z&d$<24>Z'd%<25>Z(d&<26>Z)d'<27>Z*d(<28>Z+d)<29>Z,d*<2A>Z-<2D>xZ.S)-rc<00><><00><01>t<00>|<00><00>||_|j<00>||_d|_t |_y)N<>)<08>super<65>__init__r<00>check_existance<63> pricing_modelrN<00> LOOSE_FACTOR<4F> loose_factor)<04>selfr0rY<00> __class__s <20>rrWzAnalyzer.__init__<5F>s8<00><><00> <0A><07><18><1A>%<25><04> <09> <0C><1C><1C><1E>*<2A><04><1A><19><04> <0C>(<28><04>r c<00>J <00>|xjdz c_|xjdz c_|jr"|xjd|jzz c_|xjd|jzz c_|xjd|jzz c_|xjd|j
zz c_|xjd|j zz c_|xjd|j |j
z
zz c_|jr"|xjd |jzz c_|jr"|xjd
|jzz c_d }d }d }d }|jD]/}|jdd <0B>}|dkDrd}||z }||kDr|}||ks<01>.|}<03>1|j dkDr||j z nd }|xjdz c_|xjd|zz c_|r1|xjd|zz c_|xjd|zz c_n|xjdz c_|xjdz c_|xjd|j|j z zz c_|xjd|j|j z zz c_|xjd|jzz c_|xjd|j zz c_|xjdz c_|xjd|j"zz c_|xjdz c_|xj|j%<00>z c_|xjdz c_|xj|j'<00>z c_|xjdz c_|xjd d!<21>d"d#d$<24>d%<25>z c_|xjd&z c_|jD]C}|jd'd(<28>}|jdd <0B>}|xj|d!<21>d"|d)<29>d*<2A>z c_<00>E|jr8|xjd+z c_|xj|j)<00>z c_|xjd,|j*zz c_y)-Nz4
########## Analyze of Chatbench_RunInfo ##########
z
#### pass numbers:
z Eval2b: %d
z Eval2 : %d
z Eval1 : %d
z Eval0 : %d
z total : %d z (Failed: %d)
z9passed TB by autoline reboot action (from TB3_check): %d
z'
passed TB by functional corrector: %d
<EFBFBD>gY@F<>coveragerTz
#### CGA Coverage Info:
zAverage Coverage : %.2f%%
zMax Coverage : %.2f%%
zMin Coverage : %.2f%%
z!(No coverage data found in JSON)
z
#### tokens and cost:
<EFBFBD>average prompt tokens: %d
<EFBFBD>average completion tokens: %d
ztotal cost: %.4f
zaverage cost: %.4f
z
#### time:
zaverage time: %.2fs
z
#### debug info table:
z
#### Eval2 ratio:
z
#### CGA Coverage Detail List:
zTask IDz<25z | <20>Coveragez<10<31>
z)----------------------------------------
r<00>Unknownz.2fz%
z
#### Eval2b ratio:
<EFBFBD>&
loose Eval2 pass metric applied: %s
)rN<00> Eval2b_exist<73>Eval2bpass_num<75> fullpass_num<75> Eval1pass_num<75> Eval0pass_num<75> total_num<75>reboot_times_exist<73>autoline_reboot_task_num<75>TB_corrected_exist<73> corrected_numrr <00>prompt_tokens_num<75>completion_tokens_num<75>cost<73>avg_cost<73>avg_time<6D>get_debug_infotable<6C>get_eval2_ratio_each_problem<65>get_eval2b_ratio_each_problemr[) r\<00>total_coverage<67> max_coverage<67> min_coverage<67> has_cov_data<74>task<73>cov<6F>avg_cov<6F>tids rrz Analyzer.run<75>s<><00><00> <0C> <0C> <0C>P<>P<> <0C> <0C> <0C> <0C>0<>0<> <0C> <0F> <1C> <1C> <10>L<EFBFBD>L<EFBFBD>N<EFBFBD>T<EFBFBD>-@<40>-@<40>@<40> @<40>L<EFBFBD> <0C> <0C> <0C><0E><14>):<3A>):<3A>:<3A>:<3A> <0C> <0C> <0C> <0C><0E><14>);<3B>);<3B>;<3B>;<3B> <0C> <0C> <0C> <0C><0E><14>);<3B>);<3B>;<3B>;<3B> <0C> <0C> <0C> <0C> <0A><04><0E><0E>6<>6<> <0C> <0C> <0C> <0C>(<28>D<EFBFBD>N<EFBFBD>N<EFBFBD>T<EFBFBD>=O<>=O<>,O<>P<>P<> <0C> <0F> "<22> "<22> <10>L<EFBFBD>L<EFBFBD>X<>[_<>[x<>[x<>x<> x<>L<EFBFBD> <0F> "<22> "<22> <10>L<EFBFBD>L<EFBFBD>G<>$<24>J\<5C>J\<5C>\<5C> \<5C>L<EFBFBD><1D><0E><1A> <0C><1C> <0C><1C> <0C><18>I<EFBFBD>I<EFBFBD> 6<>D<EFBFBD><16>(<28>(<28>:<3A>s<EFBFBD>+<2B>C<EFBFBD><12>Q<EFBFBD>w<EFBFBD>t<EFBFBD> <0C> <1A>c<EFBFBD> !<21>N<EFBFBD><12>\<5C>!<21>#<23><<3C><12>\<5C>!<21>#<23><<3C>  6<>6:<3A>^<5E>^<5E>a<EFBFBD>5G<35>.<2E>4<EFBFBD>><3E>><3E>1<>S<EFBFBD><07> <0C> <0C> <0C>5<>5<> <0C> <0C> <0C> <0C>5<><07>?<3F>?<3F> <0C> <17> <10>L<EFBFBD>L<EFBFBD>9<>L<EFBFBD>H<> H<>L<EFBFBD> <10>L<EFBFBD>L<EFBFBD>9<>L<EFBFBD>H<> H<>L<EFBFBD> <10>L<EFBFBD>L<EFBFBD>@<40> @<40>L<EFBFBD> <0A> <0C> <0C>3<>3<> <0C> <0C> <0C> <0C>5<><14>9O<39>9O<39>RV<52>R`<60>R`<60>9`<60>a<>a<> <0C> <0C> <0C> <0C>9<>T<EFBFBD>=W<>=W<>Z^<5E>Zh<5A>Zh<5A>=h<>i<>i<> <0C> <0C> <0C> <0C>,<2C>t<EFBFBD>y<EFBFBD>y<EFBFBD>8<>8<> <0C> <0C> <0C> <0C>.<2E><14><1D><1D>><3E>><3E> <0C> <0C> <0C> <0C>(<28>(<28> <0C> <0C> <0C> <0C>/<2F>$<24>-<2D>-<2D>?<3F>?<3F> <0C> <0C> <0C> <0C>4<>4<> <0C> <0C> <0C> <0C><04>0<>0<>2<>2<> <0C> <0C> 
<0C> <0C>/<2F>/<2F> <0C> <0C> <0C> <0C><04>9<>9<>;<3B>;<3B> <0C> <0A> <0C> <0C><<3C><<3C> <0C> <0C> <0C> <0C>9<EFBFBD>S<EFBFBD>/<2F><13>Z<EFBFBD><03>,<<3C>B<EFBFBD>?<3F>?<3F> <0C> <0C> <0C> <0C><0F>'<27> <0C><18>I<EFBFBD>I<EFBFBD> 8<>D<EFBFBD><16>(<28>(<28>9<EFBFBD>i<EFBFBD>0<>C<EFBFBD><16>(<28>(<28>:<3A>s<EFBFBD>+<2B>C<EFBFBD> <10>L<EFBFBD>L<EFBFBD>s<EFBFBD>3<EFBFBD>i<EFBFBD>s<EFBFBD>3<EFBFBD>s<EFBFBD>)<29>3<EFBFBD>7<> 7<>L<EFBFBD> 8<>
<10> <1C> <1C> <10>L<EFBFBD>L<EFBFBD>4<> 4<>L<EFBFBD> <10>L<EFBFBD>L<EFBFBD>D<EFBFBD>><3E>><3E>@<40> @<40>L<EFBFBD>
<0A> <0C> <0C>C<>d<EFBFBD>FW<46>FW<46>W<>W<> r c<00>$<00>|jddi<01>g}|jD],}|jdd<04>dk(s<01>|j|d<00><00>.|xjdt |<01>zz c_|D]}|xj|dzz c_<00>y)N<>sim_passr<00>
Eval1_passzNO datarzfake Eval0 pass: %d
rd)<06>filterrr <00>appendrN<00>len)r\<00>task_ids_fake_eval0passr2s r<00>find_fake_eval0passzAnalyzer.find_fake_eval0passPs<><00><00> <0C> <0B> <0B>Z<EFBFBD><11>O<EFBFBD>$<24>"$<24><1F><15><19><19> =<3D>A<EFBFBD><10>u<EFBFBD>u<EFBFBD>\<5C>)<29>,<2C> <09>9<>'<27>.<2E>.<2E>q<EFBFBD><19>|<7C><<3C> =<3D> <0A> <0C> <0C>/<2F>#<23>6M<36>2N<32>N<>N<> <0C>(<28> %<25>A<EFBFBD> <10>L<EFBFBD>L<EFBFBD>A<EFBFBD><04>H<EFBFBD> $<24>L<EFBFBD> %r c<00>6<00>d|_d|_d|_|jD]u}d|j <00>vrd|_d|j <00>vrd|_d|j <00>vrd|_|js<01>[|js<01>h|js<01>uyy)NF<4E> Eval2b_passTr<00> reboot_times)rgrormr<00>keys<79>r\r2s rrXzAnalyzer.check_existanceZs<><00><00>!<21><04><19>"'<27><04><1F>"'<27><04><1F><15><19><19> <16>A<EFBFBD><1C><01><06><06><08>(<28>$(<28><04>!<21><1D><11><16><16><18>)<29>*.<2E><04>'<27><1D><11><16><16><18>)<29>*.<2E><04>'<27><13> <20> <20>T<EFBFBD>%<<3C>%<<3C><14>AX<41>AX<41><15> r c<00><><00>ddlm}ddl}|jd<02>g}|jD]w}|j |<05>s<01>|j |<05>s<01>'|jdd<00>}|<06><01><|jd<04>\}}t|<07>t|<08>z } |j| <09><00>y|j|dd<06><07>|jtjjt |<01><00>|j#<00>y)Nr<00>Agg<67> Eval2_ratio<69>/<2F>
<00>rr<00><02>bins<6E>range)<12>matplotlib.pyplot<6F>pyplot<6F>
matplotlib<EFBFBD>user<00>
Eval0_passr<EFBFBD>r <00>split<69>floatr<74><00>hist<73>savefigr+r,r-r.<00>close)
r\<00>
figurename<EFBFBD>pltr<74><00>ratiosr2<00> ratio_str<74> numerator<6F> denominator<6F>ratios
r<00>draw_Eval2_histogramzAnalyzer.draw_Eval2_histogramis<><00><00>'<27><19><12><0E><0E>u<EFBFBD><1D><13><06><15><19><19> %<25>A<EFBFBD><13><EFBFBD><EFBFBD>q<EFBFBD>!<21>d<EFBFBD>o<EFBFBD>o<EFBFBD>a<EFBFBD>&8<><1D>E<EFBFBD>E<EFBFBD>-<2D><14>6<> <09><1C>$<24><1C>)2<><1F><1F><13>)=<3D>&<26> <09>;<3B><1D>i<EFBFBD>(<28>5<EFBFBD><1B>+=<3D>=<3D><05><16> <0A> <0A>e<EFBFBD>$<24> %<25> <0C><08><08><16>b<EFBFBD><05><08>.<2E> <0B> <0B> <0B>B<EFBFBD>G<EFBFBD>G<EFBFBD>L<EFBFBD>L<EFBFBD>!3<>Z<EFBFBD>@<40>A<> <0B> <09> <09> r c<00><00>|jSr
)rN<00>r\s rr%zAnalyzer.messages}s <00><00><13>|<7C>|<7C>r c<00>f<00>t|d<01>st|j<00>|_|jS)N<>
_total_num)<04>hasattrr<72>rr<>r<>s rrlzAnalyzer.total_num<75>s%<00><00><16>t<EFBFBD>\<5C>*<2A>!<21>$<24>)<29>)<29>n<EFBFBD>D<EFBFBD>O<EFBFBD><13><EFBFBD><EFBFBD>r c<00><><00>t|d<01>sdd|_|jD]N}|j|<01>s<01>|j dd<02>s<01>(|j |<01>s<01>:|xjdz c_<00>P|jS<00>N<> _fullpass_numrr<>r)r<>r<>rr<>r r!r<>s rrizAnalyzer.fullpass_num<75>sk<00><00><16>t<EFBFBD>_<EFBFBD>-<2D>!"<22>D<EFBFBD> <1E><19>Y<EFBFBD>Y<EFBFBD> ,<2C><01><17>?<3F>?<3F>1<EFBFBD>%<25>!<21>%<25>%<25> <0C>Q<EFBFBD>*?<3F>D<EFBFBD>O<EFBFBD>O<EFBFBD>TU<54>DV<44><18>&<26>&<26>!<21>+<2B>&<26> ,<2C><14>!<21>!<21>!r c<00>$<00>t|d<01>syd|_|jD]c}|j|<01>s<01>|j dd<02>s<01>(|j |<01>s<01>:|j |<01>dk(s<01>O|xjdz c_<00>e|jSr<>)r<>r<>rr<>r r!<00>
debug_iterr<EFBFBD>s r<00>fullpass_num_nodebugzAnalyzer.fullpass_num_nodebug<75>s<><00><00><16>t<EFBFBD>_<EFBFBD>-<2D>!"<22>D<EFBFBD> <1E><19>Y<EFBFBD>Y<EFBFBD> ,<2C><01><17>?<3F>?<3F>1<EFBFBD>%<25>!<21>%<25>%<25> <0C>Q<EFBFBD>*?<3F>D<EFBFBD>O<EFBFBD>O<EFBFBD>TU<54>DV<44>[_<>[j<>[j<>kl<6B>[m<>qr<71>[r<><18>&<26>&<26>!<21>+<2B>&<26> ,<2C><14>!<21>!<21>!r c<00><00>t|d<01>sqd|_|jr^|jD]O}|j |<01>s<01>|j dd<02>s<01>(|j dd<02>s<01>;|xjdz c_<00>Q|jS)N<>_Eval2bpass_numrr<>r<>r)r<>r<>rgrr<>r r<>s rrhzAnalyzer.Eval2bpass_num<75>sw<00><00><16>t<EFBFBD>.<2E>/<2F>#$<24>D<EFBFBD> <20><13> <20> <20><1D><19><19>2<>A<EFBFBD><1B><EFBFBD><EFBFBD>q<EFBFBD>)<29>a<EFBFBD>e<EFBFBD>e<EFBFBD>L<EFBFBD><11>.C<><01><05><05>m<EFBFBD>\]<5D>H^<5E><1C>,<2C>,<2C><01>1<>,<2C>2<><14>#<23>#<23>#r c<00><><00>t|d<01>s?d|_|jD])}|j|<01>s<01>|xjdz c_<00>+|jS)N<>_Eval0pass_numrr)r<>r<>rr<>r<>s rrkzAnalyzer.Eval0pass_num<75>sS<00><00><16>t<EFBFBD>-<2D>.<2E>"#<23>D<EFBFBD> <1F><19>Y<EFBFBD>Y<EFBFBD> -<2D><01><17>?<3F>?<3F>1<EFBFBD>%<25><18>'<27>'<27>1<EFBFBD>,<2C>'<27> -<2D><14>"<22>"<22>"r c<00><><00>t|d<01>sRd|_|jD]<}|j|<01>s<01>|j dd<02>s<01>(|xjdz c_<00>>|jS)N<>_Eval1pass_numrr<>r)r<>r<>rr<>r r<>s rrjzAnalyzer.Eval1pass_num<75>s`<00><00><16>t<EFBFBD>-<2D>.<2E>"#<23>D<EFBFBD> <1F><19>Y<EFBFBD>Y<EFBFBD> -<2D><01><17>?<3F>?<3F>1<EFBFBD>%<25>!<21>%<25>%<25> <0C>Q<EFBFBD>*?<3F><18>'<27>'<27>1<EFBFBD>,<2C>'<27> -<2D><14>"<22>"<22>"r c<00><><00>t|d<01>s^d|_|jrK|jD]<}|j dd<02>s<01>|j |<01>s<01>(|xjdz c_<00>>|jS)N<>_corrected_numrrr)r<>r<>rorr r!r<>s rrpzAnalyzer.corrected_num<75>si<00><00><16>t<EFBFBD>-<2D>.<2E>"#<23>D<EFBFBD> <1F><13>&<26>&<26><1D><19><19>1<>A<EFBFBD><18>u<EFBFBD>u<EFBFBD>^<5E>A<EFBFBD>.<2E>4<EFBFBD>?<3F>?<3F>1<EFBFBD>3E<33><1C>+<2B>+<2B>q<EFBFBD>0<>+<2B>1<><14>"<22>"<22>"r c<00><><00>t|d<01>sad|_|jrN|jD]?}|j dd<02>dkDs<01>|j |<01>s<01>+|xjdz c_<00>A|jS)N<>_autoline_reboot_task_numrr<>r)r<>r<>rmrr r!r<>s rrnz!Analyzer.autoline_reboot_task_num<75>sn<00><00><16>t<EFBFBD>8<>9<>-.<2E>D<EFBFBD> 
*<2A><13>&<26>&<26><1D><19><19><<3C>A<EFBFBD><19><05><05>n<EFBFBD>Q<EFBFBD>/<2F>!<21>3<><14><1F><1F><11>9K<39><1C>6<>6<>!<21>;<3B>6<><<3C><14>-<2D>-<2D>-r c<00><><00>t|d<01>sEd}|jD]}||jdd<02>z }<01>|t|j<00>z |_|jS)N<> _avg_timer<00>time)r<>rr r<>r<>)r\<00>time_sumr2s rruzAnalyzer.avg_time<6D>sV<00><00><16>t<EFBFBD>[<5B>)<29><18>H<EFBFBD><19>Y<EFBFBD>Y<EFBFBD> ,<2C><01><18>A<EFBFBD>E<EFBFBD>E<EFBFBD>&<26><11>O<EFBFBD>+<2B><08> ,<2C>%<25><03>D<EFBFBD>I<EFBFBD>I<EFBFBD><0E>6<>D<EFBFBD>N<EFBFBD><13>~<7E>~<7E>r c<00><><00>t|d<01>sWd}d}|jD],}||jdd<02>z }||jdd<02>z }<02>.||_||_||z|_|j
S)N<> _tokens_numr<00> prompt_tokens<6E>completion_tokens)r<>rr <00>_prompt_tokens_num<75>_completion_tokens_numr<6D>)r\<00>prompt_tokens_sum<75>completion_tokens_sumr2s r<00>
tokens_numzAnalyzer.tokens_num<75>s<><00><00><16>t<EFBFBD>]<5D>+<2B> !<21> <1D>$%<25> !<21><19>Y<EFBFBD>Y<EFBFBD> F<01><01>!<21>Q<EFBFBD>U<EFBFBD>U<EFBFBD>?<3F>1<EFBFBD>%=<3D>=<3D>!<21>%<25><11><15><15>/B<>1<EFBFBD>)E<>E<>%<25> F<01>'8<>D<EFBFBD> #<23>*?<3F>D<EFBFBD> '<27>0<>3H<33>H<>D<EFBFBD> <1C><13><1F><1F>r c<00>J<00>t|d<01>s |j|jS)Nr<4E>)r<>r<>r<>r<>s rrqzAnalyzer.prompt_tokens_num<75>s <00><00><16>t<EFBFBD>1<>2<> <10>O<EFBFBD>O<EFBFBD><13>&<26>&<26>&r c<00>J<00>t|d<01>s |j|jS)Nr<4E>)r<>r<>r<>r<>s rrrzAnalyzer.completion_tokens_num<75>s <00><00><16>t<EFBFBD>5<>6<> <10>O<EFBFBD>O<EFBFBD><13>*<2A>*<2A>*r c<00>n<00>t|d<01>s|j|jz |_|jS)N<> _avg_tokens)r<>r<>rlr<>r<>s r<00>
avg_tokenszAnalyzer.avg_tokens<6E>s-<00><00><16>t<EFBFBD>]<5D>+<2B>#<23><EFBFBD><EFBFBD><14><1E><1E>?<3F>D<EFBFBD> <1C><13><1F><1F>r c<00>\<00>t|d<01>s|j<00>|_|jS)N<>_cost)r<><00>get_total_costr<74>r<>s rrsz Analyzer.cost<73>s&<00><00><16>t<EFBFBD>W<EFBFBD>%<25><1D>,<2C>,<2C>.<2E>D<EFBFBD>J<EFBFBD><13>z<EFBFBD>z<EFBFBD>r c<00>n<00>t|d<01>s|j|jz |_|jS)N<> _avg_cost)r<>rsrlr<>r<>s rrtzAnalyzer.avg_cost<73>s*<00><00><16>t<EFBFBD>[<5B>)<29>!<21>Y<EFBFBD>Y<EFBFBD><14><1E><1E>7<>D<EFBFBD>N<EFBFBD><13>~<7E>~<7E>r c<00><><00>t|j\}}|j|zdz }|j|zdz }||z}|S)z5
        return the total cost of the data
i<>)rrYrqrr)r\<00>prompt_cost_perk<72>completion_cost_perk<72> prompt_cost<73>completion_cost<73>
total_costs rr<>zAnalyzer.get_total_cost<73>sX<00><00>2@<01><04>@R<>@R<>1S<31>.<2E><18>.<2E><1A>,<2C>,<2C>/?<3F>?<3F>$<24>F<> <0B><1E>4<>4<>7K<37>K<>d<EFBFBD>R<><0F> <20>?<3F>2<>
<EFBFBD><19>r c<00><><00>d}|jD]I}|j|<02>s<01>|jdd<03>s<01>(|d}|jdd<06>}||<03>d|<04>d<08>z }<01>K|S) <09>;
return the ratio of the second evaluation
rUr<>rrr<>zNo Eval2 ratio data<74>: rd)rr<>r <00>r\<00>txt_outr2r<00> eval2_ratios rrwz%Analyzer.get_eval2_ratio_each_problemse<00><00><15><07><15><19><19> ?<3F>A<EFBFBD><13><EFBFBD><EFBFBD>q<EFBFBD>!<21>a<EFBFBD>e<EFBFBD>e<EFBFBD>L<EFBFBD><11>&;<3B><1B>I<EFBFBD>,<2C><07><1F>e<EFBFBD>e<EFBFBD>M<EFBFBD>3H<33>I<> <0B><17><17>+<2B>><3E>><3E><07>  ?<3F>
<17>r c<00><><00>d}|jD]X}|jrH|j|<02>s<01>!|jdd<03>s<01>4|d}|jdd<06>}||<03>d|<04>d<08>z }<01>Wd }<01>Z|S)
r<EFBFBD>rUr<>rr<00> Eval2b_ratiozNo Eval2b ratio datar<61>rdzNo Eval2b data)rrgr<>r r<>s rrxz&Analyzer.get_eval2b_ratio_each_problemss<00><00><15><07><15><19><19> +<2B>A<EFBFBD><13> <20> <20><17>?<3F>?<3F>1<EFBFBD>%<25>!<21>%<25>%<25> <0C>Q<EFBFBD>*?<3F><1F> <09>l<EFBFBD>G<EFBFBD>"#<23>%<25>%<25><0E>8N<38>"O<>K<EFBFBD><1B>W<EFBFBD>k<EFBFBD>B<>B<>G<EFBFBD>*<2A><07> +<2B><17>r c<00><><00>d}d}d}d\}}}}d\}} }
} |js |jrd} nd} |jD<00>]} | dk(r|j| <0A>dk7}n:| dk(r*| j dd<02>dkDxs| j dd<08>}n t d <09><00>||j | <0A>s|rd
ndz }||j | <0A>s|sd
ndz }||j | <0A>r|rd
ndz }||j | <0A>r|sd
ndz }||j | <0A>r| j d d<02>r|rd
ndz }| |j | <0A>r| j d d<02>r|sd
ndz } ||j | <0A>r'| j d d<02>r|j| <0A>r|rd
ndz }|
|j | <0A>r'| j d d<02>r|j| <0A>r|sd
ndz }
|js<02><01><>||j | <0A>r'| j d d<02>r|j| <0A>r|rd
ndz }| |j | <0A>r'| j d d<02>r|j| <0A>r|sd
ndz } <0B><02>||z}||z}|| z}||
z}|jr|| z}|| dk(rd nd dzz }|d| dk(rdndzdzz }| dk(r|dz }n
| dk(r|dz }|d|||fzz }|d|||fzz }|d| ||fzz }|d|
||fzz }|jr |d| |fzz }|S)a<>
return the debug info table:
| un-debugged | debugged | total |
failed | - | 2 | 2 |
Eval0 | 3 | 5 | 8 |
Eval1 | 2 | 2 | 4 |
Eval2 | 1 | 0 | 1 |
    if Eval2b data exists:
Eval2b | 1 | 0 | 1 |
rUr)rrrr<00> funcdebug<75>syndebugr<67>rFz(mode should be 'syndebug' or 'funcdebug'rr<><00> SYNTACTIC<49>
FUNCTIONALz debug info table:
z(debugged here means zsyntactic debuggingzfunctional debuggingz)
z6 | un-synt-debugged | synt-debugged | total |
z6 | un-func-debugged | func-debugged | total |
zfailed | %16d | %13d | %5d |
zEval0 | %16d | %13d | %5d |
zEval1 | %16d | %13d | %5d |
zEval2 | %16d | %13d | %5d |
zEval2b | %16d | %13d | %5d |
)
rmrorr<>r <00>
ValueErrorr<EFBFBD>r!rgr<>)r\r<><00>failed_debugged_num<75>failed_undebugged_num<75>Eval0_debugged_num<75>Eval1_debugged_num<75>Eval2_debugged_num<75>Eval2b_debugged_num<75>Eval0_undebugged_num<75>Eval1_undebugged_num<75>Eval2_undebugged_num<75>Eval2b_undebugged_num<75>moder2<00>debugged<65>
failed_num<EFBFBD> Eval0_num<75> Eval1_num<75> Eval2_num<75>
Eval2b_nums rrvzAnalyzer.get_debug_infotable"s<><00><00><15><07><1F><1B> !<21><1D>Zd<5A>W<><1A>.<2E>0B<30>DW<44>bl<62>_<><1C>2<>4H<34>J_<4A> <0F> "<22> "<22>d<EFBFBD>&=<3D>&=<3D><1E>D<EFBFBD><1D>D<EFBFBD><15><19><19> N<02>A<EFBFBD><13>z<EFBFBD>!<21> <20>O<EFBFBD>O<EFBFBD>A<EFBFBD>.<2E>!<21>3<><08><15><1B>$<24><1D>E<EFBFBD>E<EFBFBD>.<2E>!<21>4<>q<EFBFBD>8<>Z<>Q<EFBFBD>U<EFBFBD>U<EFBFBD>><3E>SX<53>=Y<><08> <20>!K<>L<>L<> <1F>D<EFBFBD>O<EFBFBD>O<EFBFBD>A<EFBFBD>,><3E>8<EFBFBD>1<EFBFBD>QR<51> R<> <1F> !<21>d<EFBFBD>o<EFBFBD>o<EFBFBD>a<EFBFBD>.@<40>(<28>Q<EFBFBD>YZ<59> Z<> !<21> <1E>t<EFBFBD><EFBFBD><EFBFBD>q<EFBFBD>'9<>h<EFBFBD>!<21>A<EFBFBD> M<> <1E> <20><14><1F><1F><11>);<3B>X<EFBFBD>A<EFBFBD>TU<54> U<> <20> <1E>t<EFBFBD><EFBFBD><EFBFBD>q<EFBFBD>'9<>a<EFBFBD>e<EFBFBD>e<EFBFBD>L<EFBFBD>RS<52>>T<>Ya<59>!<21>gh<67> h<> <1E> <20><14><1F><1F><11>);<3B><01><05><05>l<EFBFBD>TU<54>@V<>`h<>A<EFBFBD>op<6F> p<> <20> <1E>t<EFBFBD><EFBFBD><EFBFBD>q<EFBFBD>'9<>a<EFBFBD>e<EFBFBD>e<EFBFBD>L<EFBFBD>RS<52>>T<>Y]<5D>Yh<59>Yh<59>ij<69>Yk<59>px<70>!<21>~<> <> <1E> <20><14><1F><1F><11>);<3B><01><05><05>l<EFBFBD>TU<54>@V<>[_<>[j<>[j<>kl<6B>[m<>w<77>A<EFBFBD>GH<02> H<02> <20><13> <20> <20>#<23>D<EFBFBD>O<EFBFBD>O<EFBFBD>A<EFBFBD>,><3E>1<EFBFBD>5<EFBFBD>5<EFBFBD><1C>WX<57>CY<43>^b<>^n<>^n<>op<6F>^q<>v~<7E>q<EFBFBD>EF<02>F<02>#<23>%<25>d<EFBFBD>o<EFBFBD>o<EFBFBD>a<EFBFBD>.@<40>Q<EFBFBD>U<EFBFBD>U<EFBFBD><<3C>YZ<59>E[<5B>`d<>`p<>`p<>qr<71>`s<>~F<02><11>MN<02>N<02>%<25># N<02>$)<29>+@<40>@<40>
<EFBFBD>&<26>)=<3D>=<3D> <09>&<26>)=<3D>=<3D> <09>&<26>)=<3D>=<3D> <09> <0F> <1C> <1C>,<2C>/D<>D<>J<EFBFBD><0F>4<EFBFBD>:<3A>#5<>K<EFBFBD><<3C>Ka<4B>a<>a<><07><0F>*<2A>t<EFBFBD>z<EFBFBD>GY<47>.C<>_u<5F>v<>y~<7E>~<7E>~<7E><07> <0F>:<3A> <1D> <13>P<> P<>G<EFBFBD> <11>[<5B> <20> <13>P<> P<>G<EFBFBD><0F>5<>9N<39>Pc<50>eo<65>8p<38>p<>p<><07><0F>5<>9M<39>Oa<4F>cl<63>8m<38>m<>m<><07><0F>5<>9M<39>Oa<4F>cl<63>8m<38>m<>m<><07><0F>5<>9M<39>Oa<4F>cl<63>8m<38>m<>m<><07> <0F> <1C> <1C> <13>9<>=R<>Tg<54>is<69><t<>t<> t<>G<EFBFBD><16>r c<00><><00>d}d}d}d}|jD]Q}|j|<05>s<01>t|jdd<01><00>}||kDs|dk(r|}||ks|dk(r|}||z }|dz }<04>S|dk7r||z nd}|dk7r^|xjdz c_|xjd|zz c_|xjd|zz c_|xjd|zz c_yy) Nr_r<00>iv_runing_timerz
#### iv_runing_time info:
zavg_time: %.2fs
zmax_time: %.2fs
zmin_time: %.2fs
)rr<>r<>r rN)r\<00>max_time<6D>min_time<6D>
total_time<EFBFBD>cntr2r<>rus r<00>get_iv_runing_time_infoz Analyzer.get_iv_runing_time_info^s<><00><00><16><08><16><08><18>
<EFBFBD><0F><03><15><19><19> <19>A<EFBFBD><13><EFBFBD><EFBFBD>q<EFBFBD>!<21><1C>Q<EFBFBD>U<EFBFBD>U<EFBFBD>#3<>S<EFBFBD>9<>:<3A><04><18>8<EFBFBD>O<EFBFBD><18>S<EFBFBD><1F>#<23>H<EFBFBD><18>8<EFBFBD>O<EFBFBD><18>S<EFBFBD><1F>#<23>H<EFBFBD><1A>d<EFBFBD>"<22>
<EFBFBD><13>q<EFBFBD><08><03> <19>(+<2B>a<EFBFBD>x<EFBFBD>:<3A><03>#<23>S<EFBFBD><08> <0E>!<21>8<EFBFBD> <10>L<EFBFBD>L<EFBFBD>;<3B> ;<3B>L<EFBFBD> <10>L<EFBFBD>L<EFBFBD>/<2F>(<28>:<3A> :<3A>L<EFBFBD> <10>L<EFBFBD>L<EFBFBD>/<2F>(<28>:<3A> :<3A>L<EFBFBD> <10>L<EFBFBD>L<EFBFBD>/<2F>(<28>:<3A> :<3A>L<EFBFBD> r c<00>`<00>d|j<00>vr|dSd|j<00>vr|dSy)Nr<4E>r<>F)r<><00>r\rs rr<>zAnalyzer.Eval0_passts7<00><00> <17>4<EFBFBD>9<EFBFBD>9<EFBFBD>;<3B> &<26><17> <0C>%<25> %<25> <17>4<EFBFBD>9<EFBFBD>9<EFBFBD>;<3B> &<26><17>
<EFBFBD>#<23> #<23>r c<00>&<00>|jdd<02>S)Nr<4E>F<>r rs rr<>zAnalyzer.Eval1_pass|s<00><00><13>x<EFBFBD>x<EFBFBD> <0C>e<EFBFBD>,<2C>,r c<00><><00>|ddk(r*|jdg<00>gd<04>k(s|j|<01>ryy|j|<01>S)<07>!check if one data pass the Eval 2r<00>m2014_q3<71>Eval2_failed_mutant_idxes)<06><00><00><00><00> r<>TF)r <00>loose_Eval2_passrs rr!zAnalyzer.Eval2_passsK<00><00> <10> <09>?<3F>j<EFBFBD> (<28><13>x<EFBFBD>x<EFBFBD>3<>R<EFBFBD>8<>N<EFBFBD>J<>d<EFBFBD>Nc<4E>Nc<4E>dh<64>Ni<4E><1B><1C><18>(<28>(<28><14>.<2E> .r c<00>L<00>|j|<01>xr|jdd<02>S<00>N<>checklist_workedF<64>r<>r rs r<00>Eval0_scencheck_passzAnalyzer.Eval0_scencheck_pass<73><00>"<00><00><13><EFBFBD><EFBFBD>t<EFBFBD>$<24>L<><14><18><18>2D<32>e<EFBFBD>)L<>Lr c<00>L<00>|j|<01>xr|jdd<02>Sr<00>r<>r rs r<00>Eval1_scencheck_passzAnalyzer.Eval1_scencheck_pass<73>rr c<00>L<00>|j|<01>xr|jdd<02>Sr<00>r!r rs r<00>Eval2_scencheck_passzAnalyzer.Eval2_scencheck_pass<73>rr c<00>N<00>|j|<01>xr|jdd<02> Srrrs r<00>Eval0_noscencheck_passzAnalyzer.Eval0_noscencheck_pass<73><00>%<00><00><13><EFBFBD><EFBFBD>t<EFBFBD>$<24>R<>d<EFBFBD>h<EFBFBD>h<EFBFBD>7I<37>5<EFBFBD>.Q<>*Q<>Rr c<00>N<00>|j|<01>xr|jdd<02> Srrrs r<00>Eval1_noscencheck_passzAnalyzer.Eval1_noscencheck_pass<73>rr c<00>N<00>|j|<01>xr|jdd<02> Srrrs r<00>Eval2_noscencheck_passzAnalyzer.Eval2_noscencheck_pass<73>rr c<00>P<00>|j|<01>xr|j|<01>dk(S<00>Nr)r<>r<>rs r<00>Eval0_nodebug_passzAnalyzer.Eval0_nodebug_pass<73><00>$<00><00><14><0F><0F><04>%<25>G<>D<EFBFBD>O<EFBFBD>O<EFBFBD>D<EFBFBD>,A<>Q<EFBFBD>,F<>Gr c<00>P<00>|j|<01>xr|j|<01>dk(Sr#)r<>r<>rs r<00>Eval1_nodebug_passzAnalyzer.Eval1_nodebug_pass<73>r%r c<00>P<00>|j|<01>xr|j|<01>dk(Sr#)r!r<>rs r<00>Eval2_nodebug_passzAnalyzer.Eval2_nodebug_pass<73>r%r c<00>Z<00>|ddk(r|j|<01>ryy|j|<01>S)rrrTF)r!<00>loose_Eval2b_passrs rr<>zAnalyzer.Eval2b_pass<73>s5<00><00> <10> <09>?<3F>j<EFBFBD> (<28><13><EFBFBD><EFBFBD>t<EFBFBD>$<24><1B><1C><18>)<29>)<29>$<24>/<2F> /r c<00>z<00>|jdd<00><00>|dS|jdd<03>|jdd<03>zS)Nr<4E><00> debug_iter_ivr<00> 
debug_iter_pyrrs rr<>zAnalyzer.debug_iter<65>s?<00><00> <0F>8<EFBFBD>8<EFBFBD>L<EFBFBD>$<24> '<27> 3<><17> <0C>%<25> %<25><17>8<EFBFBD>8<EFBFBD>O<EFBFBD>Q<EFBFBD>/<2F>$<24>(<28>(<28>?<3F>A<EFBFBD>2N<32>N<> Nr c<00><><00>|jdd<02>ry|jdd<05>}|<02>y|jd<06>\}}t|<03>t|<04>}}t|<03>t|<04>z |jk\ryy)<07>pass for 9/10, 8/10 and 4/5r!FTr<54>Nr<4E><00>r r<><00>intr<74>r[<00>r\rr<>r<>r<>s rrzAnalyzer.loose_Eval2_pass<73>st<00><00> <0F>8<EFBFBD>8<EFBFBD>L<EFBFBD>%<25> (<28><17><18>H<EFBFBD>H<EFBFBD>]<5D>D<EFBFBD>1<> <09> <14> <1C><18>!*<2A><1F><1F><13>!5<><1E> <09>;<3B>!$<24>Y<EFBFBD><1E><13>[<5B>1A<31>;<3B> <09> <10><19> <1B>e<EFBFBD>K<EFBFBD>0<> 0<>D<EFBFBD>4E<34>4E<34> E<><17>r c<00><><00>|jdd<02>ry|jdd<05>}|<02>y|jd<06>\}}t|<03>t|<04>}}t|<03>t|<04>z |jk\ryy)r0r<>FTr<54>Nr<4E>r1r3s rr+zAnalyzer.loose_Eval2b_pass<73>st<00><00> <0F>8<EFBFBD>8<EFBFBD>M<EFBFBD>5<EFBFBD> )<29><17><18>H<EFBFBD>H<EFBFBD>^<5E>T<EFBFBD>2<> <09> <14> <1C><18>!
__module__<EFBFBD> __qualname__rWrr<>rXr<><00>propertyr%rlrir<>rhrkrjrprnrur<>rqrrr<>rsrtr<>rwrxrvrr<>r<>r!rrrrrr!r$r'r)r<>r<>rr+<00> __classcell__<5F>r]s@rrr<00>s<><00><><00>)<29>MX<01>b%<25> <16><14>(<0E><1C><0E><1C><0E><1F><0E><1F>
<0E>"<22><0E>"<22><0E>"<22><0E>"<22><0E>$<24><0E>$<24><0E>#<23><0E>#<23><0E>#<23><0E>#<23><0E>#<23><0E>#<23><0E>.<2E><0E>.<2E><0E><1E><0E><1E><0E>
 <20><0E>
 <20><0E>'<27><0E>'<27>
<0E>+<2B><0E>+<2B>
<0E> <20><0E> <20>
<0E><1A><0E><1A>
<0E><1E><0E><1E>
<1A>
<17> <17>:<17>x;<3B>,<19>-<2D>
/<2F>M<01>M<01>M<01>S<01>S<01>S<01>H<01>H<01>H<01>
0<>O<01> <19> r rc<00><><00><00>eZdZdefdef<02>fd<03> Zed<04><00>Zgd<05>fd<06>Zddefd<08>Z d <09>Z
d
<EFBFBD>Z d <0B>Z dd e fd <0A>Zede de de fd<11><04>Z<10>xZS)r#N<> group_dirc<00>\<00><01>t<00>|<00>d<01><02>g|_i|_||_d|_d|_||_tj|<01>D]^}tjj||d<05>}tjj|<04>s<01>D|jj|<04><00>`|jD]9}|jjtt!j"|<05><00><00><00>;||j%d<06>s t'|j(<00>t+d<07><00>y) za
group_dir: includes many subdirs, each subdir contains a Chatbench_RunInfo.json
r)<01>id_keyFrUzChatbench_RunInfo.json<6F>numz*The total_num of the data are not the sameN)rVrW<00> runinfo_paths<68>result<6C>pass_at_k_kvalues<65> exclude_debugr%r<r+<00>listdirr,r-<00>existsr<73><00> dictlistsrrr<00> all_equalr"r?r<>)r\r<rB<00>subdir<69> path_runinfor,r]s <20>rrWzMultiAnalyzer.__init__<5F>s<><00><><00> <0E><07><18> <09><18>*<2A><1F><04><1A><18><04> <0B>!2<><04><1E>"<22><04><1A><1A><04> <0A>"<22><04><0E><18>j<EFBFBD>j<EFBFBD><19>+<2B> 8<>F<EFBFBD><1D>7<EFBFBD>7<EFBFBD><<3C><<3C> <09>6<EFBFBD>;S<>T<>L<EFBFBD><11>w<EFBFBD>w<EFBFBD>~<7E>~<7E>l<EFBFBD>+<2B><14>"<22>"<22>)<29>)<29>,<2C>7<> 8<><19>&<26>&<26> E<01>D<EFBFBD> <10>N<EFBFBD>N<EFBFBD> !<21> !<21>(<28>2<EFBFBD>+<<3C>+<<3C>T<EFBFBD>+B<>"C<> D<> E<01> <0C><13>~<7E>~<7E>e<EFBFBD>$<24> <11>$<24>(<28>(<28>O<EFBFBD><1C>I<>J<> J<>%r c<00><00>|jSr
)rFr<>s r<00> analyzerszMultiAnalyzer.analyzers<72>s <00><00><13>~<7E>~<7E>r )<03>Eval0<6C>Eval1<6C>Eval2c<00><><00>|jdj}|j}|D]}|D]}|j||<05><00><00>|xjdz c_|xjdz c_|xjd|jdjzz c_|xjdt |j<00>zz c_|xjdz c_|j j<00>D]&\}}|xjd||dz||zfzz c_<00>(|xjd |jdjzz c_y)
Nrz5
########## Analyze of Chatbench_RunInfos ##########
z
#### basic info:
ztotal number of tasks: %d
zsample numbers: %d
z
#### pass@k ratios:
z%s: %.2f%% (%.1f)
<EFBFBD>drf) rFrlrB<00>Evalx_ratio_passatkr%r<>rA<00>itemsr[)r\<00>Evals<6C> num_tasks<6B>pass_at<61>Eval_idx<64> pass_at_k<5F>key<65>values rrzMultiAnalyzer.run<75>s'<00><00><18>N<EFBFBD>N<EFBFBD>1<EFBFBD>%<25>/<2F>/<2F> <09><16>(<28>(<28><07><1D> ><3E>H<EFBFBD>$<24> ><3E> <09><14>(<28>(<28><18>9<EFBFBD>=<3D> ><3E> ><3E> <0A> <0A> <0A>R<>R<> <0A> <0C> <0A> <0A>/<2F>/<2F> <0A> <0C> <0A> <0A>6<><14><1E><1E><01>9J<39>9T<39>9T<39>T<>T<> <0A> <0C> <0A> <0A>/<2F>#<23>d<EFBFBD>n<EFBFBD>n<EFBFBD>2E<32>E<>E<> <0A> <0C> <0A> <0A>2<>2<> <0A><1E>+<2B>+<2B>+<2B>+<2B>-<2D> W<01>J<EFBFBD>C<EFBFBD><15> <10>M<EFBFBD>M<EFBFBD>2<>c<EFBFBD>5<EFBFBD><13>9<EFBFBD>e<EFBFBD>I<EFBFBD>o<EFBFBD>5V<35>V<> V<>M<EFBFBD> W<01> <0C> <0A> <0A>D<>t<EFBFBD>~<7E>~<7E>VW<56>GX<47>Ge<47>Ge<47>e<>e<> r r,c<00><><00>|<01>t}t|d<01>5}|j|j<00>|jdt j
<00>z<00>ddd<00>y#1swYyxYwr?)rCrBrDr%rErF)r\r,rGs rr(zMultiAnalyzer.savesY<00><00> <0F><<3C>#<23>D<EFBFBD> <11>$<24><03>_<EFBFBD> @<01><01> <0A>G<EFBFBD>G<EFBFBD>D<EFBFBD>M<EFBFBD>M<EFBFBD> "<22> <0A>G<EFBFBD>G<EFBFBD>)<29>U<EFBFBD>^<5E>^<5E>-=<3D>><3E> ?<3F> @<01> @<01> @<01>s <00>AA <03> A)c<00>J<00>d|_d|_|jD]@}|xj|jz c_|xj|jz c_<00>B|jt|j<00>z |jdjz |_|jt|j<00>z |jdjz |_|xjd|j
zz c_|xjd|j zz c_y)Nrrarb)rqrrrFr<>rl<00>avg_prompt_tokens<6E>avg_completion_tokensr%<00>r\r1s rr&z%MultiAnalyzer.get_avg_tokens_one_task s<><00><00>!"<22><04><1E>%&<26><04>"<22><1C><0E><0E> I<01>H<EFBFBD> <10> "<22> "<22>h<EFBFBD>&@<40>&@<40> @<40> "<22> <10> &<26> &<26>(<28>*H<>*H<> H<> &<26> I<01>"&<26>!7<>!7<>#<23>d<EFBFBD>n<EFBFBD>n<EFBFBD>:M<>!M<>PT<50>P^<5E>P^<5E>_`<60>Pa<50>Pk<50>Pk<50>!k<><04><1E>%)<29>%?<3F>%?<3F>#<23>d<EFBFBD>n<EFBFBD>n<EFBFBD>BU<42>%U<>X\<5C>Xf<58>Xf<58>gh<67>Xi<58>Xs<58>Xs<58>%s<><04>"<22> <0C> <0A> <0A>6<><14>9O<39>9O<39>O<>O<> <0A> <0C> <0A> <0A>:<3A>T<EFBFBD>=W<>=W<>W<>W<> r c<00><><00>d|_d|_|jD]@}|xj|jz c_|xj|jz c_<00>B|xjt |j<00>zc_|xjt |j<00>zc_|xj d|jzz c_|xj d|jzz c_y)Nr_z'passed with functional corrector: %.1f
z)passed with autoline reboot action: %.1f
)<07>pass_by_corrected<65> pass_by_discrFrprnr<>r%r^s rr'z+MultiAnalyzer.get_avg_pass_by_disc_and_corrs<><00><00>!$<24><04><1E><1F><04><19><1C><0E><0E> C<01>H<EFBFBD> <10> "<22> "<22>h<EFBFBD>&<<3C>&<<3C> <<3C> "<22> <10> <1D> <1D><18>!B<>!B<> B<> <1D> C<01> <0A><1E><1E>#<23>d<EFBFBD>n<EFBFBD>n<EFBFBD>"5<>5<><1E> <0C><19><19>S<EFBFBD><14><1E><1E>0<>0<><19> <0C> <0A> <0A>C<>d<EFBFBD>F\<5C>F\<5C>\<5C>\<5C> <0A> <0C> <0A> <0A>E<><04>HY<48>HY<48>Y<>Y<> r c<00><><00>t<00>|_|jj|jdj<00>yr#)r<00> result_dict<63>create_empty_set_via_taskidsrFr)r<>s r<00>renew_result_dictzMultiAnalyzer.renew_result_dicts0<00><00>)<29>+<2B><04><18> <0C><18><18>5<>5<>d<EFBFBD>n<EFBFBD>n<EFBFBD>Q<EFBFBD>6G<36>6T<36>6T<36>Ur rUc<00><><00>tt|dz<00>std|z<00><00>|}t|j<00>}d}|jdj
D]<5D>}t|d<04>r|j j|<06>}d}|jD]^} t| d|z<00>}
|
| j|<06><00>s<01>+|jr#| j| j|<06><00>r<01>Z|dz }<08>`|j|||<08>} || z }t|d<04>s<01><>|d|z<| |d||fz<<00><>||jdjz}||jd||fz<y )
zA
        return the pass@k ratio for the given Evalx stage (for example Eval0)
<20>_passz/The function %s_pass is not defined in Analyzerrrcz%s_passrz %s_pass_numz %s_pass_at_%dN)r<>rr<>r<>rFr)rc<00>access_data_via_taskid<69>getattrrCr<><00>pass_at_k_under_nrlrA) r\rVrU<00>k<>n<>Evalx_pass_at_k_totalr<00> task_result<6C>pass_numr<00>Evalx_pass_funcrWs rrQz!MultiAnalyzer.Evalx_ratio_passatk#sn<00><00>
<17>x<EFBFBD><18>G<EFBFBD>!3<>4<><1C>N<>QY<51>Y<>Z<> Z<> <13><01> <0F><04><0E><0E> <1F><01> !<21><1D><1B>~<7E>~<7E>a<EFBFBD>(<28>5<>5<> G<01>G<EFBFBD><16>t<EFBFBD>]<5D>+<2B>"<22>.<2E>.<2E>E<>E<>g<EFBFBD>N<> <0B><18>H<EFBFBD> <20>N<EFBFBD>N<EFBFBD> &<26><08>")<29>(<28>I<EFBFBD>h<EFBFBD>4F<34>"G<><0F>"<22>#B<>8<EFBFBD>#B<>#B<>7<EFBFBD>#K<>L<> <20>.<2E>.<2E>3F<33>8<EFBFBD>3F<33>3F<33>Gf<47>x<EFBFBD>Gf<47>Gf<47>gn<67>Go<47>3p<33> <20>A<EFBFBD> <0A><08>  &<26><1D>.<2E>.<2E>q<EFBFBD>!<21>X<EFBFBD>><3E>I<EFBFBD> !<21>Y<EFBFBD> .<2E> !<21><16>t<EFBFBD>]<5D>+<2B>6><3E> <0B>M<EFBFBD>(<28>2<>3<>=F<> <0B>O<EFBFBD>X<EFBFBD>q<EFBFBD>M<EFBFBD>9<>:<3A>! G<01>" <1E><14><1E><1E><01>!2<>!<<3C>!<<3C><<3C><1D>7L<37><04> <0B> <0B>O<EFBFBD>x<EFBFBD><11>m<EFBFBD>3<>4r rlrk<00>cc<00>h<00>dtj||z
|<01>tj||<01>z z
S)aj
- n: total number of samples
- k: number of samples we want to pick
- c: number of samples passed
- output: pass@k under n
        - returns pass@k given n samples of which c passed: the probability that, when k of the n samples are picked, at least one of them passed
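
The docstring above describes the usual combinatorial pass@k estimate. A minimal standalone sketch follows, assuming the formula 1 - C(n - c, k) / C(n, k) that the docstring and the `math.comb` references in the surrounding bytecode suggest. The function name and parameter names are taken from the docstring. The input validation is an addition for illustration and is not confirmed by the original source.

```python
import math


def pass_at_k_under_n(n: int, k: int, c: int) -> float:
    """Estimate pass@k from n samples of which c passed.

    Computes 1 - C(n - c, k) / C(n, k): one minus the probability that
    all k samples drawn without replacement come from the n - c failures.
    """
    # Validation is an assumption added for this sketch.
    if not (0 <= c <= n):
        raise ValueError("c must satisfy 0 <= c <= n")
    if not (1 <= k <= n):
        raise ValueError("k must satisfy 1 <= k <= n")
    # math.comb returns 0 when k > n - c, so the result is 1.0
    # whenever fewer than k samples failed.
    return 1.0 - math.comb(n - c, k) / math.comb(n, k)


if __name__ == "__main__":
    # 10 samples, 3 passed: estimate pass@1 and pass@5.
    print(pass_at_k_under_n(10, 1, 3))
    print(pass_at_k_under_n(10, 5, 3))
```

With k = 1 the estimate reduces to the plain pass fraction c / n, which is a quick sanity check on any reimplementation.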
r)<02>math<74>comb)rlrkrqs rrjzMultiAnalyzer.pass_at_k_under_nCs-<00><00><11>D<EFBFBD>I<EFBFBD>I<EFBFBD>a<EFBFBD><01>c<EFBFBD>1<EFBFBD>%<25><04> <09> <09>!<21>Q<EFBFBD><0F>7<>8<>8r r
)rLr)r5r6r7<00>K_LISTrLrWr8rKrr(r&r'rer2rQ<00> staticmethodrjr9r:s@rr#r#<00>s<><00><><00>%)<29>v<EFBFBD>K<01><13>K<01>.<0E><1E><0E><1E>4<> f<01>@<01><03>@<01> X<01> Z<01>V<01>M<01>C<EFBFBD>M<01>@<12>9<>C<EFBFBD>9<>3<EFBFBD>9<>#<23>9<><12>9r r#c <00>~<00>gd<01>}tt|<00>}g}|jD]<5D>}|jD]w}|j |<04>s<01>|j |<04>s<01>'|j dd<03>}|<05><01><|jd<04>\}}t|<06>t|<07>z }|j|<08><00>y<00><>ddl
m } ddl }
|
jd<06>| j|dd<08> <09>| jd
<EFBFBD>| j!d <0B>| j#t%dd d <0A><00>| j'dddddd<12><13>| j)d<14>| j+t%d<15>D<00>cgc]}d|z<00><02> c}<04>| j-t.j0j3t4d<17><00>t7t.j0j3t4d<18>d<19>5} t%d<15>D]R}d|z}t9|D<00> cgc] } t;j<| dz<00>|k(s<01>| <0C><02>"c} <0C>} | j?d|| fz<00><00>T ddd<03>t7t.j0j3t4d<1B>d<19>5} |D]}| j?d|z<00><00> ddd<03>| jA<00>ycc}wcc} w#1swY<00>wxYw#1swY<00>0xYw)zdraw Eval2 histogram)r<00>r<>r<>Nr<4E>rr<>r<>r<>r<>z distribution of Eval2 (Baseline)znumber of tasksi<73><00>2T<>both<74> lightgray<61>-g<00>?)<05>which<63>axis<69>color<6F> linestyle<6C> linewidthr<68><00> g<><67><EFBFBD><EFBFBD><EFBFBD><EFBFBD><EFBFBD>?zeval2_histogram_Baseline.pngzeval2_ratios_Baseline.txtr@z %.1f, %d
zeval2_ratios_bin_Baseline.txtz%.2f
)!r#r$rFrr<>r<>r r<>r<>r<>r<>r<>r<>r<>r<><00>title<6C>ylabel<65>yticksr<73><00>grid<69>xlabel<65>xticksr<73>r+r,r-r.rBr<>rs<00>floorrDr<>)r3r4r<>r1r2r<>r<>r<>r<>r<>r<>rG<00>j<> ratio_nums r<00>Eval2_histogramr<6D>NsX<00><00> <15>F<EFBFBD>"<22>9<EFBFBD>f<EFBFBD>5<>N<EFBFBD> <0F>F<EFBFBD>"<22>,<2C>,<2C> %<25><08><19><1D><1D> %<25>A<EFBFBD><17>"<22>"<22>1<EFBFBD>%<25>(<28>*=<3D>*=<3D>a<EFBFBD>*@<40><1D>E<EFBFBD>E<EFBFBD>-<2D><14>6<> <09><1C>$<24><1C>)2<><1F><1F><13>)=<3D>&<26> <09>;<3B><1D>i<EFBFBD>(<28>5<EFBFBD><1B>+=<3D>=<3D><05><16> <0A> <0A>e<EFBFBD>$<24> %<25> %<25>$<24><15><0E>N<EFBFBD>N<EFBFBD>5<EFBFBD><19><07>H<EFBFBD>H<EFBFBD>V<EFBFBD>"<22>E<EFBFBD>H<EFBFBD>*<2A><07>I<EFBFBD>I<EFBFBD>0<>1<><07>J<EFBFBD>J<EFBFBD> <20>!<21><07>J<EFBFBD>J<EFBFBD>u<EFBFBD>Q<EFBFBD><03>R<EFBFBD> <20>!<21><07>H<EFBFBD>H<EFBFBD>T<EFBFBD><16>f<EFBFBD>K<EFBFBD>3<EFBFBD>Z]<5D>H<EFBFBD>^<5E><07>J<EFBFBD>J<EFBFBD>w<EFBFBD><17><07>J<EFBFBD>J<EFBFBD><15>r<EFBFBD><19>+<2B>A<EFBFBD><03>a<EFBFBD><07>+<2B>,<2C><07>K<EFBFBD>K<EFBFBD><02><07><07> <0C> <0C>/<2F>1O<31>P<>Q<>
<0E>b<EFBFBD>g<EFBFBD>g<EFBFBD>l<EFBFBD>l<EFBFBD>-<2D>/J<>K<>S<EFBFBD> Q<>7<>UV<55><16>r<EFBFBD><19> 7<>A<EFBFBD><17>!<21>G<EFBFBD>E<EFBFBD><1B><06>H<>1<EFBFBD>$<24>*<2A>*<2A>Q<EFBFBD>r<EFBFBD>T<EFBFBD>2B<32>a<EFBFBD>2G<32>Q<EFBFBD>H<>I<>I<EFBFBD> <0A>G<EFBFBD>G<EFBFBD>L<EFBFBD>E<EFBFBD>9<EFBFBD>#5<>5<> 6<> 7<>7<>
<0E>b<EFBFBD>g<EFBFBD>g<EFBFBD>l<EFBFBD>l<EFBFBD>-<2D>/N<>O<>QT<51> U<>"<22>YZ<59><17> "<22>A<EFBFBD> <0A>G<EFBFBD>G<EFBFBD>H<EFBFBD>q<EFBFBD>L<EFBFBD> !<21> "<22>"<22><08>I<EFBFBD>I<EFBFBD>K<EFBFBD><4B>,<2C><>I<01>7<>7<><37> "<22>"<22>s6<00> J<08>J'<03>, J" <0C> J" <0C>J'<03>'J3<03>"J'<03>'J0<07>3J<c<00>^<00>tt<00>}g}|jD]L}g}|jD](}|j |<04>s<01>|j |d<00><00>*|j |<03><00>Ng}t |<01>D]\}}d|vs<01> |j |<06><00>td<03>t|<05>y)Nr<00>countbcdzcountbcd passed at:)r#r$rFrr!r<><00> enumerater")r4<00>pass_taskids_listr1<00> pass_taskidsr<00>idxs<78>idxs r<00>task_eval2passtimes_analyzer<65>{s<><00><00>"<22>9<EFBFBD>-<2D>N<EFBFBD><1A><15>"<22>,<2C>,<2C>/<2F><08><19> <0C><1C>M<EFBFBD>M<EFBFBD> 5<>D<EFBFBD><17>"<22>"<22>4<EFBFBD>(<28><1C>#<23>#<23>D<EFBFBD><19>O<EFBFBD>4<> 5<> <1A> <20> <20><1C>.<2E> /<2F> <0E>D<EFBFBD>&<26>'8<>9<><1D><19><03>\<5C> <15><1C> %<25> <10>K<EFBFBD>K<EFBFBD><03> <1C><1D>
<EFBFBD>
<1F> <20> <09>$<24>Kr <00>__main__)rU)!<21>__doc__<5F> loader_saverr<00> utils.utilsrE<00> data.probsetrrr<00>LLM_callrr+rsrZr.r/r,r-rCrurr$rr;r=r rLrRrr#r<>r<>r5r r r<00><module>r<>s<><00><01><04><1A><1B>@<40>@<40>#<23> <09> <0B><12> <0C><1F><12>$<24><10><15>7<EFBFBD>7<EFBFBD><<3C><<3C> 2<>4D<34>E<><10>
<0B><13><06>F<02><16> ;<3B> <09><13>
e <09>N<1A>
<<3C><<3C>#<23>n<EFBFBD>,<2C><<3C>2^<19>~<7E>^<19>@v9<>M<EFBFBD>v9<>p+<10>Z<10>R <0C>z<EFBFBD><19><08>F<EFBFBD>r