Working with .docx Files in Python
The python-docx library lets you create, read, and modify Microsoft Word .docx files from Python. It hides the details of the underlying Office Open XML format behind a straightforward object model of documents, paragraphs, runs, and tables.
Library Installation
Install the library using pip (the package is named python-docx on PyPI but is imported as docx):
pip install python-docx
Opening and Parsing Existing Documents
Use the Document class to load an existing .docx file for inspection or editing:
from docx import Document
report = Document('quarterly_report.docx')
# Access and print the text of every body paragraph
for paragraph in report.paragraphs:
    print(paragraph.text)
# Iterate through document tables, row by row and cell by cell
for tbl in report.tables:
    for row in tbl.rows:
        for cell in row.cells:
            print(cell.text)
Generating New Documents
Create a new Document object to build a file from scratch:
from docx import Document
new_doc = Document()
# Add a main title
new_doc.add_heading('Project Overview', level=0)
# Insert a text paragraph
desc_para = new_doc.add_paragraph('This document outlines the project specifications.')
# Create a table with a header row
summary_table = new_doc.add_table(rows=2, cols=3)
header_row = summary_table.rows[0].cells
header_row[0].text = 'Task'
header_row[1].text = 'Owner'
header_row[2].text = 'Deadline'
# Save the created document
new_doc.save('project_brief.docx')
Editing Document Content
Modify elements within a loaded Document object and save the changes:
from docx import Document
# Load the document to be edited
original_doc = Document('draft_proposal.docx')
# Update the text of the initial paragraph
original_doc.paragraphs[0].text = 'Revised introductory text.'
# Append a new row to the first table (assumes it has at least three columns)
first_table = original_doc.tables[0]
new_row = first_table.add_row().cells
new_row[0].text = 'Additional Item'
new_row[1].text = 'Jane Doe'
new_row[2].text = 'Q4'
# Save the modified version
original_doc.save('final_proposal.docx')
Managing Document Styles
Apply and customize paragraph and character styles to control formatting:
from docx import Document
from docx.shared import Pt, RGBColor
from docx.enum.style import WD_STYLE_TYPE
presentation = Document()
# Access the document's style collection
doc_styles = presentation.styles
# Define a custom paragraph style
custom_style = doc_styles.add_style('HighlightText', WD_STYLE_TYPE.PARAGRAPH)
custom_style.font.name = 'Calibri'
custom_style.font.size = Pt(14)
custom_style.font.color.rgb = RGBColor(0xFF, 0x00, 0x00) # Red
# Apply the custom style to a new paragraph
styled_para = presentation.add_paragraph('Important Notice', style='HighlightText')
Inserting Images
Add images to a document by embedding them within a paragraph run:
from docx import Document
from docx.shared import Inches
manual = Document()
manual.add_heading('Figure 1: Setup Diagram', level=2)
# Add a picture from a file, specifying its width
img_paragraph = manual.add_paragraph()
img_run = img_paragraph.add_run()
img_run.add_picture('network_diagram.png', width=Inches(5.5))
Important Considerations
- File system permissions are required to read from or write to document paths.
- The python-docx library focuses on the core Word Open XML specification and may not support advanced features like macros, ActiveX controls, or version-specific formatting.
- For complex automation involving other Office applications (Excel, PowerPoint), dedicated libraries such as openpyxl or python-pptx are more appropriate.
- Merging documents or preserving complex formatting across operations may require custom logic, as styles and document settings are not always perfectly transferred.