docx - How to preserve text styles (bold/italic) and extract footnotes from a Word document using Python? - Stack Overflow


I’m working on a Python script to extract content from a Word document (.docx) and insert it into a SQL Server database. The challenge is that I need to preserve text styles like bold and italic, as well as handle line breaks and footnotes from the Word document.

Currently, I'm using the python-docx library to process the document. Line breaks are transferred successfully as <br>, but text styles (bold/italic) and footnotes are not included in the output.

Here's what I’ve attempted so far:

1. For text styles:

I tried looping through paragraph.runs to detect run.bold and run.italic. However, the styled text doesn’t appear in my database output.

2. For footnotes:

I tried extracting footnotes using a custom function with doc.footnotes or checking for the style Footnote Text. While the function doesn’t raise errors, footnotes don’t appear in the final output.

Here’s the snippet of my code for processing styles and footnotes:

text_with_style = []
if paragraph.runs:
    for run in paragraph.runs:
        styled_text = run.text.strip()
        if run.bold:
            styled_text = f"<b>{styled_text}</b>"
        if run.italic:
            styled_text = f"<i>{styled_text}</i>"
        text_with_style.append(styled_text)

formatted_text = " ".join(text_with_style).replace("\n", "<br>")
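A possible direction for the styles part, as a minimal sketch rather than a confirmed fix: the snippet above strips each run and re-joins with spaces, which alters spacing, and it does not escape characters such as < or & before wrapping them in tags. The helper names run_to_html and paragraph_to_html are hypothetical, and the stand-in runs exist only so the sketch can be tried without a .docx file. With python-docx, paragraph.runs would be passed instead. Note that run.bold and run.italic are tri-state (True, False, None), so formatting that comes from a paragraph or character style shows up as None and is not detected here.

```python
from html import escape
from types import SimpleNamespace


def run_to_html(run):
    """Convert one run-like object (with .text, .bold, .italic) to an HTML fragment."""
    # Escape first so literal <, > and & in the document cannot break the markup.
    # No strip(): the spaces between words live inside the runs themselves.
    text = escape(run.text, quote=False).replace("\n", "<br>")
    if not text:
        return ""
    # None means "inherit from the style"; this sketch treats it as not set.
    if run.bold:
        text = f"<b>{text}</b>"
    if run.italic:
        text = f"<i>{text}</i>"
    return text


def paragraph_to_html(runs):
    # Join with "" rather than " " because runs already carry their own whitespace.
    return "".join(run_to_html(r) for r in runs)


# Stand-in runs (hypothetical data) mimicking python-docx Run attributes.
demo_runs = [
    SimpleNamespace(text="Plain, ", bold=None, italic=None),
    SimpleNamespace(text="bold", bold=True, italic=None),
    SimpleNamespace(text=" and ", bold=None, italic=None),
    SimpleNamespace(text="both", bold=True, italic=True),
    SimpleNamespace(text=" <done>\nnext", bold=False, italic=False),
]
demo_html = paragraph_to_html(demo_runs)
print(demo_html)
```

If the tags are present in formatted_text but still missing in the database, it may be worth checking that formatted_text (and not paragraph.text) is the value actually passed to the INSERT statement.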

For footnotes:


def extract_footnotes(doc):
    footnotes_text = []
    if hasattr(doc, 'footnotes'):
        for footnote in doc.footnotes:
            footnotes_text.append(footnote.text.strip())
    return footnotes_text
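One likely explanation for the empty result: as far as I can tell, the python-docx Document object does not expose a footnotes attribute, so the hasattr check is False and the function quietly returns an empty list. A workaround, sketched here under that assumption, is to read the footnotes part directly from the .docx package, which is a ZIP archive that stores footnotes in word/footnotes.xml. This version of extract_footnotes takes a file path or file-like object instead of a Document, and the in-memory demo archive is made-up data so the sketch runs without a real document.

```python
import io
import zipfile
import xml.etree.ElementTree as ET

W_NS = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"
W = f"{{{W_NS}}}"
# Footnote types that are layout helpers rather than real footnotes.
NON_CONTENT_TYPES = {"separator", "continuationSeparator", "continuationNotice"}


def extract_footnotes(docx_file):
    """Return {footnote_id: text} read from word/footnotes.xml.

    docx_file may be a path or a binary file-like object.
    """
    with zipfile.ZipFile(docx_file) as zf:
        if "word/footnotes.xml" not in zf.namelist():
            return {}  # the document has no footnotes part
        root = ET.fromstring(zf.read("word/footnotes.xml"))

    footnotes = {}
    for fn in root.iter(f"{W}footnote"):
        if fn.get(f"{W}type") in NON_CONTENT_TYPES:
            continue
        # One string per paragraph, built from all w:t text nodes inside it.
        paragraphs = [
            "".join(t.text or "" for t in p.iter(f"{W}t"))
            for p in fn.iter(f"{W}p")
        ]
        footnotes[int(fn.get(f"{W}id"))] = "<br>".join(paragraphs).strip()
    return footnotes


# Minimal in-memory archive (hypothetical content) standing in for a .docx file.
demo_xml = (
    f'<w:footnotes xmlns:w="{W_NS}">'
    '<w:footnote w:type="separator" w:id="-1">'
    "<w:p><w:r><w:separator/></w:r></w:p></w:footnote>"
    '<w:footnote w:id="1"><w:p><w:r><w:footnoteRef/></w:r>'
    '<w:r><w:t xml:space="preserve"> First </w:t></w:r>'
    "<w:r><w:t>note.</w:t></w:r></w:p></w:footnote>"
    "</w:footnotes>"
)
buf = io.BytesIO()
with zipfile.ZipFile(buf, "w") as zf:
    zf.writestr("word/footnotes.xml", demo_xml)
buf.seek(0)
demo_footnotes = extract_footnotes(buf)
print(demo_footnotes)
```

The keys correspond to the w:id values of the w:footnoteReference elements in the document body, which is what would let each footnote be matched back to the paragraph that cites it.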

What am I missing? How can I reliably preserve bold/italic styles and extract footnotes so they’re included in the output that gets inserted into SQL Server? Any advice or working examples would be greatly appreciated.
