Commit 369c05e

add docx metadata extractor tutorial
1 parent e086cab commit 369c05e

5 files changed

+44 -0 lines changed


‎README.md

Lines changed: 1 addition & 0 deletions
@@ -63,6 +63,7 @@ This is a repository of all the tutorials of [The Python Code](https://www.thepy
 - [How to Build a Username Search Tool in Python](https://thepythoncode.com/code/social-media-username-finder-in-python). ([code](ethical-hacking/username-finder))
 - [How to Find Past Wi-Fi Connections on Windows in Python](https://thepythoncode.com/article/find-past-wifi-connections-on-windows-in-python). ([code](ethical-hacking/find-past-wifi-connections-on-windows))
 - [How to Remove Metadata from PDFs in Python](https://thepythoncode.com/article/how-to-remove-metadata-from-pdfs-in-python). ([code](ethical-hacking/pdf-metadata-remover))
+- [How to Extract Metadata from Docx Files in Python](https://thepythoncode.com/article/docx-metadata-extractor-in-python). ([code](ethical-hacking/docx-metadata-extractor))

 - ### [Machine Learning](https://www.thepythoncode.com/topic/machine-learning)
 - ### [Natural Language Processing](https://www.thepythoncode.com/topic/nlp)
Lines changed: 1 addition & 0 deletions
@@ -0,0 +1 @@
+# [How to Extract Metadata from Docx Files in Python](https://thepythoncode.com/article/docx-metadata-extractor-in-python)
Lines changed: 41 additions & 0 deletions
@@ -0,0 +1,41 @@
+import docx  # Import the docx library (python-docx) for working with Word documents.
+from pprint import pprint  # Import the pprint function for pretty printing.
+
+def extract_metadata(docx_file):
+    doc = docx.Document(docx_file)  # Create a Document object from the Word document file.
+    core_properties = doc.core_properties  # Get the core properties of the document.
+
+    metadata = {}  # Initialize an empty dictionary to store metadata.
+
+    # Extract core properties.
+    for prop in dir(core_properties):  # Iterate over all attribute names of the core_properties object.
+        if prop.startswith('_'):  # Skip private and dunder attributes (e.g., _element); they are not metadata.
+            continue
+        value = getattr(core_properties, prop)  # Get the value of the property.
+        if callable(value):  # Skip callable attributes (methods).
+            continue
+        if prop in ('created', 'modified', 'last_printed'):  # Check for datetime properties.
+            if value:
+                value = value.strftime('%Y-%m-%d %H:%M:%S')  # Convert datetime to string format.
+            else:
+                value = None  # The document does not record this timestamp.
+        metadata[prop] = value  # Store the property and its value in the metadata dictionary.
+
+    # Extract custom properties (if available).
+    try:
+        custom_properties = core_properties.custom_properties  # Raises AttributeError if unsupported.
+        if custom_properties:  # Check if custom properties exist.
+            metadata['custom_properties'] = {}  # Initialize a dictionary to store custom properties.
+            for prop in custom_properties:  # Iterate over custom properties.
+                metadata['custom_properties'][prop.name] = prop.value  # Store the custom property name and value.
+    except AttributeError:
+        # Custom properties not available in this version.
+        pass  # Skip custom properties extraction if the attribute is not available.
+
+    return metadata  # Return the metadata dictionary.
+
+
+
+docx_path = 'test.docx'  # Path to the Word document file.
+metadata = extract_metadata(docx_path)  # Call the extract_metadata function.
+pprint(metadata)  # Pretty print the metadata dictionary.
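Review note: the attribute scan in `extract_metadata` combines `dir()`, `getattr()` and `callable()` to collect plain data attributes. The pattern can be tried without python-docx or a `.docx` file. The sketch below is only an illustration under stated assumptions: `FakeCoreProperties` and `collect_public_attributes` are hypothetical names that are not part of this commit or of python-docx, and the sketch chooses to skip every name that starts with a single underscore.

```python
from datetime import datetime


class FakeCoreProperties:
    """Hypothetical stand-in for a document's core properties object."""

    def __init__(self):
        self.author = "Jane Doe"
        self.created = datetime(2024, 1, 15, 9, 30, 0)
        self.last_printed = None       # timestamp that was never recorded
        self._element = object()       # private detail that should be skipped

    def describe(self):
        # A method, so the scan below should skip it.
        return "not metadata"


def collect_public_attributes(obj, datetime_fields=("created", "modified", "last_printed")):
    """Collect non-callable, non-private attributes of obj into a dict."""
    result = {}
    for name in dir(obj):
        if name.startswith("_"):       # skips both _private and __dunder__ names
            continue
        value = getattr(obj, name)
        if callable(value):            # skips methods
            continue
        if name in datetime_fields:
            # Format datetimes as strings; missing timestamps become None.
            value = value.strftime("%Y-%m-%d %H:%M:%S") if value else None
        result[name] = value
    return result


snapshot = collect_public_attributes(FakeCoreProperties())
print(snapshot)
```

Only `author`, `created` and `last_printed` end up in `snapshot`; the method and the underscore-prefixed attribute are filtered out.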
Lines changed: 1 addition & 0 deletions
@@ -0,0 +1 @@
+python-docx
Binary file not shown.
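
Review note: a `.docx` file is a ZIP archive, and its core properties are normally stored in the archive member `docProps/core.xml`. As a cross-check on what the script reports, the same fields can be read with the standard library alone. This is a minimal sketch, not the tutorial's method: `read_core_xml` and `SAMPLE_CORE_XML` are hypothetical names, the sample archive is built in memory purely for demonstration, and a real document that lacks `docProps/core.xml` would raise `KeyError` here.

```python
import io
import zipfile
import xml.etree.ElementTree as ET


def read_core_xml(docx_source):
    """Return {local tag name: text} for each child of docProps/core.xml."""
    with zipfile.ZipFile(docx_source) as archive:
        root = ET.fromstring(archive.read("docProps/core.xml"))
    # Tags look like "{namespace}name"; keep only the local part.
    return {child.tag.split("}", 1)[-1]: child.text for child in root}


# Hypothetical sample data: a tiny core.xml with two properties.
SAMPLE_CORE_XML = (
    '<?xml version="1.0" encoding="UTF-8"?>'
    '<cp:coreProperties '
    'xmlns:cp="http://schemas.openxmlformats.org/package/2006/metadata/core-properties" '
    'xmlns:dc="http://purl.org/dc/elements/1.1/" '
    'xmlns:dcterms="http://purl.org/dc/terms/" '
    'xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">'
    '<dc:creator>Jane Doe</dc:creator>'
    '<dcterms:created xsi:type="dcterms:W3CDTF">2024-01-15T09:30:00Z</dcterms:created>'
    '</cp:coreProperties>'
)

# Build an in-memory archive that contains only the core properties part.
buffer = io.BytesIO()
with zipfile.ZipFile(buffer, "w") as archive:
    archive.writestr("docProps/core.xml", SAMPLE_CORE_XML)
buffer.seek(0)

core = read_core_xml(buffer)
print(core)
```

With a real file, a path such as `'test.docx'` can be passed to `read_core_xml` instead of the in-memory buffer. Note that the XML element names (for example `creator`) differ from the python-docx attribute names (for example `author`).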
