cdtestplant_v1/apps/project/tool/__pycache__/xq_parse.cpython-313.pyc

58 lines
11 KiB

2025-04-29 18:09:00 +08:00
<EFBFBD>
<00><>h<>)<00><00><><00>SSKrSSKrSSKJr SSKJr SSKJr SSKJ r J
r
SSK J r SSK Jr SSKJr "S S
\5r\S :XaS r\"\5r\R-S 5 gg)<0E>N)<01>Document)<01> Paragraph)<01> ImagePart)<02>_Cell<6C>Table)<01>CT_Tbl)<01>CT_P)<01> OrderedDictc<00>n<00>\rSrSrSrSrSrSrSrSr Sr
S \ S
\ 4S jr S \ S
\ 4S jrS rSrSrg)<11>DocxChapterExtractor<6F> c<00>:<00>[R"U5Ulg<00>N)<03>docxr<00>doc)<02>self<6C> docx_paths <20>?E:\pycharmProjects\cdtestplant_v1\apps\project\tool\xq_parse.py<70>__init__<5F>DocxChapterExtractor.__init__ s<00><00><17>=<3D>=<3D><19>+<2B><04><08>c<00><><00>Sn[R"X!5nSnSnU(a3URS5nURS5R5nXE4$[ SUS35 XE4$)u提取章节编号和标题z'^(\d+(?:\.\d+)*)\s+(.*?)(?:\s*\d+)?\s*$N<><00><00>'z
' no match)<05>re<72>match<63>group<75>strip<69>print)r<00>text<78>patternr<00> chapter_num<75>contents r<00>extract_chapter_info<66>)DocxChapterExtractor.extract_chapter_infosl<00><00><<3C><07><12><08><08><17>'<27><05><1A> <0B><16><07> <10><1F>+<2B>+<2B>a<EFBFBD>.<2E>K<EFBFBD><1B>k<EFBFBD>k<EFBFBD>!<21>n<EFBFBD>*<2A>*<2A>,<2C>G<EFBFBD><1B>#<23>#<23> <12>A<EFBFBD>d<EFBFBD>V<EFBFBD>:<3A>&<26> '<27><1A>#<23>#rc<00>B<00>SU-S-n[R"X25SL$)Nz^(\d+(?:\.\d+)*)\s+z(?:\s*\d+)?\s*$)rr)r<00> chaptera_namer!r"s r<00>if_valid_match<63>#DocxChapterExtractor.if_valid_matchs&<00><00>(<28>=<3D>8<>;M<>M<><07><11>x<EFBFBD>x<EFBFBD><07>&<26>d<EFBFBD>2<>2rc<00><00>/nSnSnURRH<>nURXR5(aMSURR
;a3UR UR5up6URX645 SnMpU(dMyURRU5(dM<>SURR
;dM<>UR UR5upvURXv45 M<> U$)u获取目录结构<E7BB93>F<>tocT) r<00>
paragraphsr)r!<00>style<6C>namer%<00>append<6E>
startswith)r<00> chapter_name<6D> directoryr#<00>flag<61> paragraphr$<00>nums r<00>get_chapter_number<65>'DocxChapterExtractor.get_chapter_number s<><00><00><16> <09><18> <0B><14><04><1D><18><18>,<2C>,<2C>I<EFBFBD><13>"<22>"<22><<3C><1E><1E>@<40>@<40>U<EFBFBD>i<EFBFBD>o<EFBFBD>o<EFBFBD>Nb<4E>Nb<4E>Eb<45>'+<2B>'@<40>'@<40><19><1E><1E>'P<>$<24> <0B><19> <20> <20>+<2B>!7<>8<><1B><04><18><13>t<EFBFBD> <09><0E><0E>1<>1<>+<2B>><3E>><3E>5<EFBFBD>I<EFBFBD>O<EFBFBD>O<EFBFBD>L`<60>L`<60>C`<60>#<23>8<>8<><19><1E><1E>H<> <0C><03><19> <20> <20>#<23><1E>0<>-<2D><19>rc<00><00>0n/nUH<>n[U5S:XaUupVnO[U5S:XaUupVOM.URS5n[U5[U5:<3A>a*UR5 [U5[U5:<3A>aM*Un [[U55H0n
X:n X<>;aSR USU
S-5S0S.X<>'X<>Sn M2 U[U5n X<>;aUU0S.X<>'UR 5nM<> U$) u*将线性章节列表转换为嵌套结构<E7BB93>r<00>.Nr<00>[未命名章节])<03>number<65>title<6C>childrenr@)<06>len<65>split<69>pop<6F>range<67>join<69>copy) r<00>chapter_body_list<73> hierarchy<68>path<74>itemr7r$<00>_<>parts<74> current_level<65>i<>part<72> current_parts r<00>build_hierarchy<68>$DocxChapterExtractor.build_hierarchy0s<00><00><16> <09><11><04>%<25>D<EFBFBD><12>4<EFBFBD>y<EFBFBD>A<EFBFBD>~<7E>"&<26><0F><03>a<EFBFBD><14>T<EFBFBD><19>a<EFBFBD><1E>#<23> <0C><03>W<EFBFBD><18><17>I<EFBFBD>I<EFBFBD>c<EFBFBD>N<EFBFBD>E<EFBFBD><15>d<EFBFBD>)<29>s<EFBFBD>5<EFBFBD>z<EFBFBD>)<29><14><08><08>
<EFBFBD><16>d<EFBFBD>)<29>s<EFBFBD>5<EFBFBD>z<EFBFBD>)<29>&<26>M<EFBFBD><1A>3<EFBFBD>t<EFBFBD>9<EFBFBD>%<25><01><1B>w<EFBFBD><04><17>,<2C>"%<25>(<28>(<28>5<EFBFBD><16>!<21>a<EFBFBD>%<25>=<3D>"9<>!4<>$&<26>+<16>M<EFBFBD>'<27>
!.<2E> 3<>J<EFBFBD> ?<3F> <0A>&<26>!<21><13>T<EFBFBD><19>+<2B>L<EFBFBD><1B>0<>!<21>$<24> "<22>/<12> <0A>+<2B> <19>:<3A>:<3A><<3C>D<EFBFBD>E&<26>F<19>rc<00><><00>Sn[R"X!5nU(aAURS5R5nURS5R5nXE4$UnSnXE4$)Nu^(.*?)\s*[(](.*?)[)]$rr)rrrr)r<00>sr"rr?<00>ordinals r<00>extract_title_ordinal<61>*DocxChapterExtractor.extract_title_ordinalYsb<00><00>0<><07><12><08><08><17>$<24><05> <10><19>K<EFBFBD>K<EFBFBD><01>N<EFBFBD>(<28>(<28>*<2A>E<EFBFBD><1B>k<EFBFBD>k<EFBFBD>!<21>n<EFBFBD>*<2A>*<2A>,<2C>G<EFBFBD><15>~<7E><1D><16>E<EFBFBD><1A>G<EFBFBD><14>~<7E>rc <00><><00>SSS/S.n[5nX#S'UGH.n[U5S:XaUupVnURU5up<>O+[U5S:XaUupVURU5up<>SnOMWURS5n
Un [ [U
55H<>n SR U
SU S-5n X<>;atU U [U
5S-
:XaUOS U [U
5S-
:XaU OSU [U
5S-
:XaUOS/S
.nSR U
SU 5nX?n U S R U5 X<>U 'X=n M<> X<>US 'X<>US 'XsUS'GM1 US (aUS S$0$)u直接生成树形JSON结构r,<00>ROOT)r>r?r$r@r;rr<Nrr=)r>r?rUr$r@r@r?rUr$r)r
rArVrBrDrEr1)rrG<00>root<6F>node_maprJr7r3<00>chapter_contentr?rUrL<00> parent_node<64>depth<74> current_num<75>new_node<64>
parent_nums r<00>build_json_tree<65>$DocxChapterExtractor.build_json_treegs<><00><00><1C>v<EFBFBD>"<22>"<22>M<><04><1E>=<3D><08><1B><12> <0C>%<25>D<EFBFBD><12>4<EFBFBD>y<EFBFBD>A<EFBFBD>~<7E>59<35>2<><03>?<3F>!%<25>!;<3B>!;<3B>L<EFBFBD>!I<><0E><05>w<EFBFBD><14>T<EFBFBD><19>a<EFBFBD><1E>$(<28>!<21><03>!%<25>!;<3B>!;<3B>L<EFBFBD>!I<><0E><05>"$<24><0F><18><17>I<EFBFBD>I<EFBFBD>c<EFBFBD>N<EFBFBD>E<EFBFBD><1E>K<EFBFBD><1E>s<EFBFBD>5<EFBFBD>z<EFBFBD>*<2A><05>!<21>h<EFBFBD>h<EFBFBD>u<EFBFBD>Z<EFBFBD>e<EFBFBD>a<EFBFBD>i<EFBFBD>'8<>9<> <0B><1E>.<2E>"-<2D>+0<>C<EFBFBD><05>J<EFBFBD><11>N<EFBFBD>+B<><15>I\<5C>/4<><03>E<EFBFBD>
<EFBFBD>Q<EFBFBD><0E>/F<>7<EFBFBD>R<EFBFBD>7<<3C><03>E<EFBFBD>
<EFBFBD>Q<EFBFBD><0E>7N<37>?<3F>UW<55>$&<26>  <16>H<EFBFBD>"%<25><18><18>%<25><06><15>-<2D>!8<>J<EFBFBD>"*<2A>"6<>K<EFBFBD><1F>
<EFBFBD>+<2B>2<>2<>8<EFBFBD><<3C>,4<>[<5B>)<29>&<26>3<> <0B>+<2B> &+<2B>S<EFBFBD>M<EFBFBD>'<27> "<22>'.<2E>S<EFBFBD>M<EFBFBD>)<29> $<24>'6<>S<EFBFBD>M<EFBFBD>)<29> $<24>?&<26>@'+<2B>:<3A>&6<>t<EFBFBD>J<EFBFBD><1F><01>"<22>><3E>B<EFBFBD>>r<00>graphrc<00><><00>URRS5nUHLnURS5H4nURRUn[ U[
5(dM3 g MN g)u判断段落是否图片<E59BBE>
.//pic:pic<69>.//a:blip/@r:embedTF)<06>_element<6E>xpathrO<00> related_parts<74>
isinstancer<00>rrdr<00>images<65>image<67>img_idrOs r<00>is_image<67>DocxChapterExtractor.is_image<67>s]<00><00><16><1E><1E>%<25>%<25>l<EFBFBD>3<><06><1B>E<EFBFBD><1F>+<2B>+<2B>&:<3A>;<3B><06><1A>x<EFBFBD>x<EFBFBD>-<2D>-<2D>f<EFBFBD>5<><04><1D>d<EFBFBD>I<EFBFBD>.<2E>.<2E><1F><<3C><1C>
rc<00><><00>URRS5nUHYnURS5HAnURRUn[ U[
5(dM3UR s s $ M[ g)u&获取图片字节流类型为bytesrfrgN)rhrirOrjrkr<00>blobrls r<00> get_ImagePart<72>"DocxChapterExtractor.get_ImagePart<72>sd<00><00><16><1E><1E>%<25>%<25>l<EFBFBD>3<><06><1B>E<EFBFBD><1F>+<2B>+<2B>&:<3A>;<3B><06><1A>x<EFBFBD>x<EFBFBD>-<2D>-<2D>f<EFBFBD>5<><04><1D>d<EFBFBD>I<EFBFBD>.<2E>.<2E><1F>9<EFBFBD>9<EFBFBD>$<24><<3C><1C>
rc<00> <00>[U[5(aURRnO-[U[5(a UR
nO [ S5eSn/n/nSnUR5GH<>n[U[5(Ga<>[X<>5n U[U5S-
:GaU RX$S:XaSU RR;aSnMkU RX$S-S:XaESU RR;a+X$[U54-n
URU
5 /nUS- nM<>U(ajUR!X<>5(a#URUR#X<>55 GMU RS:waURU R5 GM6GM9GM<U[U5S-
:Xa<>SU RR;a%X$[U54-n
URU
5 U$UR!X<>5(a#URUR#X<>55 GM<>U RS:waURU R5 GM<>GM<>SnGM<>[U[$5(dGMU(dGM/n ['X<>5R(HHn U R*V s/sHo<>RPM nn U RSR-U55 MJ URU 5 GM<> U$s sn f) u<>
根据目录匹配章节内容
parent: docx解析内容, 传入self.doc
directory: 章节目录结构,例如[('4', '工程需求'), ('4.1', '外部接口需求'),
('4.2', '功能需求'), ('4.2.1', '知识库大模型检索问答功能')]
zsomething's not rightrFr<00>HeadingTr,<00> )rkr<00>element<6E>bodyr<00>_tc<74>
ValueError<EFBFBD> iterchildrenr rrAr!r/r0<00>reprr1rprtrr<00>rows<77>cellsrE)r<00>parentr4<00>
parent_elmrN<00> body_listrzr5<00>childr6<00> new_tuple<6C>table<6C>row<6F>cell<6C>row_texts r<00>iter_block_items<6D>%DocxChapterExtractor.iter_block_items<6D>s~<00><00> <16>f<EFBFBD>h<EFBFBD> '<27> '<27><1F><1E><1E>,<2C>,<2C>J<EFBFBD> <17><06><05> &<26> &<26><1F><1A><1A>J<EFBFBD><1C>4<>5<> 5<> <0A><01><16> <09><11><04><14><04><1F>,<2C>,<2C>.<2E>E<EFBFBD><19>%<25><14>&<26>&<26>%<25>e<EFBFBD>4<> <09><14>s<EFBFBD>9<EFBFBD>~<7E><01>)<29>)<29> <20>~<7E>~<7E><19><1C>a<EFBFBD><1F>8<>Y<EFBFBD>)<29>/<2F>/<2F>J^<5E>J^<5E>=^<5E>#<23><04> <20> <20>~<7E>~<7E><19>q<EFBFBD>5<EFBFBD>)9<>!<21>)<<3C><<3C><19>i<EFBFBD>o<EFBFBD>o<EFBFBD>Nb<4E>Nb<4E>Ab<41>$-<2D>L<EFBFBD>D<EFBFBD><14>J<EFBFBD>=<3D>$@<40> <09>!<21>(<28>(<28><19>3<>!<21><04><19>Q<EFBFBD><06><01> <20><1B><1F>=<3D>=<3D><19>;<3B>;<3B> <20>K<EFBFBD>K<EFBFBD><04>(:<3A>(:<3A>9<EFBFBD>(M<>N<>&<26>^<5E>^<5E>r<EFBFBD>1<> <20>K<EFBFBD>K<EFBFBD> <09><0E><0E>7<>2<> <1C> <17>#<23>i<EFBFBD>.<2E>1<EFBFBD>,<2C>,<2C> <20>I<EFBFBD>O<EFBFBD>O<EFBFBD>$8<>$8<>8<>$-<2D>L<EFBFBD>D<EFBFBD><14>J<EFBFBD>=<3D>$@<40> <09>!<21>(<28>(<28><19>3<><1D>$<19><18>#<1C>}<7D>}<7D>Y<EFBFBD>7<>7<><1C> <0B> <0B>D<EFBFBD>$6<>$6<>y<EFBFBD>$I<>J<>"<22><1E><1E>2<EFBFBD>-<2D><1C> <0B> <0B>I<EFBFBD>N<EFBFBD>N<EFBFBD>3<>.<2E>
!<21>D<EFBFBD><1B>E<EFBFBD>6<EFBFBD>*<2A>*<2A><17>4<EFBFBD><1E>E<EFBFBD>$<24>U<EFBFBD>3<>8<>8<><03>:=<3D>)<29>)<29>#D<>)<29>$<24>I<EFBFBD>I<EFBFBD>)<29><08>#D<><1D> <0C> <0C>T<EFBFBD>Y<EFBFBD>Y<EFBFBD>x<EFBFBD>%8<>9<> 9<>
<19>K<EFBFBD>K<EFBFBD><05>&<26>U/<2F>V<19><18><> $Es<00>8L c<00><><00>URU5n[U5 URURU5n[U5 UR U5n[U5 gr)r8r r<>rrb)rr3r4rG<00> json_trees r<00>main<69>DocxChapterExtractor.main<69>sU<00><00><18>+<2B>+<2B>L<EFBFBD>9<> <09> <0A>i<EFBFBD><18> <20>1<>1<>$<24>(<28>(<28>I<EFBFBD>F<><19> <0A><1F> <20><19>(<28>(<28>):<3A>;<3B> <09> <0A>i<EFBFBD>r)rN)<12>__name__<5F>
__module__<EFBFBD> __qualname__<5F>__firstlineno__rr%r)r8rQrVrbrrrprtr<>r<><00>__static_attributes__<5F>rrr r sX<00><00>,<2C> $<24>3<><19> '<19>R <1E>%?<3F>N<15>i<EFBFBD><15>h<EFBFBD><15><14>9<EFBFBD><14>8<EFBFBD><14><<19>| rr <00>__main__utest - 副本.docxu 工程需求)rr<00> docx.documentr<00>docx.text.paragraphr<00>docx.parts.imager<00>
docx.tablerr<00>docx.oxml.tabler<00>docx.oxml.text.paragraphr <00> collectionsr
<00>objectr r<>r<00> extractorr<72>r<>rr<00><module>r<>sY<00><01> <09> <0B>"<22>)<29>&<26>#<23>"<22>)<29>#<23>^<19>6<EFBFBD>^<19>@ <0C>z<EFBFBD><19>$<24>I<EFBFBD>$<24>Y<EFBFBD>/<2F>I<EFBFBD> <0A>N<EFBFBD>N<EFBFBD>><3E>"<22>r