Commit 9934bff

Add a chapter on compliance to the docs
Details how to filter warnings or convert them into exceptions and details the class hierarchy of warnings and errors defined by compoundfiles.
1 parent a44bb44 commit 9934bff

12 files changed: +550 -38 lines changed

‎README.rst

Lines changed: 14 additions & 11 deletions
@@ -4,16 +4,18 @@ compoundfiles
 
 |pypi| |rtd| |travis|
 
-This package provides a library for reading Microsoft's `OLE Compound
-Document`_ format, which also forms the basis of the `Advanced Authoring
-Format`_ (AAF) published by Microsoft Corporation. It is compatible with
-Python 2.7 (or above) and Python 3.2 (or above).
-
-The code is pure Python and should run on any platform. The library has an
-emphasis on rigour and performs numerous validity checks on opened files. By
-default, the library merely warnings when it comes across non-fatal errors in
-source files but this behaviour is configurable by developers through Python's
-``warnings`` mechanisms.
+This package provides a library for reading Microsoft's `Compound File Binary`_
+format (CFB), formerly known as `OLE Compound Documents`_, the `Advanced
+Authoring Format`_ (AAF), or just plain old Microsoft Office files (the non-XML
+sort). This format is also widely used with certain media systems and a number
+of scientific applications (tomography and microscopy).
+
+The code is pure Python and should run on any platform; it is compatible with
+Python 2.7 (or above) and Python 3.2 (or above). The library has an emphasis
+on rigour and performs numerous validity checks on opened files. By default,
+the library merely warns when it comes across non-fatal errors in source files
+but this behaviour is configurable by developers through Python's ``warnings``
+mechanisms.
 
 Links
 =====
@@ -28,7 +30,8 @@ Links
 .. _documentation: http://compound-files.readthedocs.org/
 .. _source code: https://github.com/waveform80/compoundfiles
 .. _bug tracker: https://github.com/waveform80/compoundfiles/issues
-.. _OLE Compound Document: http://www.openoffice.org/sc/compdocfileformat.pdf
+.. _Compound File Binary: http://msdn.microsoft.com/en-gb/library/dd942138.aspx
+.. _OLE Compound Documents: http://www.openoffice.org/sc/compdocfileformat.pdf
 .. _Advanced Authoring Format: http://www.amwa.tv/downloads/specifications/aafcontainerspec-v1.0.1.pdf
 .. _MIT license: http://opensource.org/licenses/MIT
 .. _build status: https://travis-ci.org/waveform80/compoundfiles

‎compoundfiles/reader.py

Lines changed: 16 additions & 17 deletions
@@ -77,25 +77,24 @@
 )
 
 
-# A quick personal rant: the AAF or OLE Compound Document format is yet another
-# example of bad implementations of a bad specification (thanks Microsoft! See
-# the W3C log file format for previous examples of MS' incompetence in this
-# area)...
+# Good grief! Since my last in-source rant it appears someone in MS actually
+# figured out how to write a decent spec! Unfortunately it appears someone in
+# the marketing department also thought that yet another name change was in
+# order so the Advanced Authoring Format (formerly known as OLE Compound
+# Documents) is now known as the Compound File Binary File Format.
 #
-# The specification doesn't try and keep the design simple (the DIFAT could be
-# fully in the header or partially in the header, and the header itself doesn't
-# necessarily match the sector size), whoever wrote the spec didn't quite
-# understand what version numbers are used for (several versions exist, but the
-# spec doesn't specify exactly which bits of the header became relevant in
-# which versions), and the spec has huge amounts of redundancy (always fun as
-# it inevitably leads to implementations getting one bit right and another bit
-# wrong, leaving readers to guess which is correct).
+# Anyway, silly name changes aside, the point is that someone's actually
+# written a decent spec this time rather than the half-assed AAF spec which
+# read like adhoc notes on a reference implementation. The URL is (currently)
 #
-# TL;DR: if you're looking for a nice fast binary format with good random
-# access characteristics this may look attractive, but please don't use it.
-# Ideally, loop-mounting a proper file-system would be the way to go, although
-# it generally involves jumping through several hoops due to mount being a
-# privileged operation.</rant>
+# http://msdn.microsoft.com/en-gb/library/dd942138.aspx
+#
+# But given how MSDN changes its URLs you might just be better off Googling for
+# "MS CFB" which'll find it (assuming they haven't changed the name again for
+# kicks). The file format is still a pile of steaming underwear in places
+# (unicode names with byte-length fields...) but as long as the spec is clear
+# and well written I can forgive that (after all, it's hard to change something
+# as established as this).
 #
 # In the interests of trying to keep naming vaguely consistent and sensible
 # here's a translation list with the names we'll be using first and the names

‎docs/Makefile

Lines changed: 19 additions & 5 deletions
@@ -6,6 +6,10 @@ SPHINXOPTS =
 SPHINXBUILD = sphinx-build
 PAPER =
 BUILDDIR = _build
+DOT_DIAGRAMS = $(wildcard *.dot)
+MSC_DIAGRAMS = $(wildcard *.mscgen)
+SVG_IMAGES = $(wildcard *.svg) $(DOT_DIAGRAMS:%.dot=%.svg) $(MSC_DIAGRAMS:%.mscgen=%.svg)
+PDF_IMAGES = $(SVG_IMAGES:%.svg=%.pdf)
 
 # Internal variables.
 PAPEROPT_a4 = -D latex_paper_size=a4
@@ -41,17 +45,17 @@ help:
 clean:
 	-rm -rf $(BUILDDIR)/*
 
-html:
+html: $(SVG_IMAGES)
 	$(SPHINXBUILD) -b html $(ALLSPHINXOPTS) $(BUILDDIR)/html
 	@echo
 	@echo "Build finished. The HTML pages are in $(BUILDDIR)/html."
 
-dirhtml:
+dirhtml: $(SVG_IMAGES)
 	$(SPHINXBUILD) -b dirhtml $(ALLSPHINXOPTS) $(BUILDDIR)/dirhtml
 	@echo
 	@echo "Build finished. The HTML pages are in $(BUILDDIR)/dirhtml."
 
-singlehtml:
+singlehtml: $(SVG_IMAGES)
 	$(SPHINXBUILD) -b singlehtml $(ALLSPHINXOPTS) $(BUILDDIR)/singlehtml
 	@echo
 	@echo "Build finished. The HTML page is in $(BUILDDIR)/singlehtml."
@@ -95,14 +99,14 @@ epub:
 	@echo
 	@echo "Build finished. The epub file is in $(BUILDDIR)/epub."
 
-latex:
+latex: $(PDF_IMAGES)
 	$(SPHINXBUILD) -b latex $(ALLSPHINXOPTS) $(BUILDDIR)/latex
 	@echo
 	@echo "Build finished; the LaTeX files are in $(BUILDDIR)/latex."
 	@echo "Run \`make' in that directory to run these through (pdf)latex" \
 	      "(use \`make latexpdf' here to do that automatically)."
 
-latexpdf:
+latexpdf: $(PDF_IMAGES)
 	$(SPHINXBUILD) -b latex $(ALLSPHINXOPTS) $(BUILDDIR)/latex
 	@echo "Running LaTeX files through pdflatex..."
 	$(MAKE) -C $(BUILDDIR)/latex all-pdf
@@ -151,3 +155,13 @@ doctest:
 	$(SPHINXBUILD) -b doctest $(ALLSPHINXOPTS) $(BUILDDIR)/doctest
 	@echo "Testing of doctests in the sources finished, look at the" \
 	      "results in $(BUILDDIR)/doctest/output.txt."
+
+%.svg: %.msc
+	mscgen -T svg -o $@ $<
+
+%.svg: %.dot
+	dot -T svg -o $@ $<
+
+%.pdf: %.svg
+	inkscape -A $@ $<
+

‎docs/compliance.rst

Lines changed: 75 additions & 0 deletions
@@ -0,0 +1,75 @@
+.. _compliance:
+
+=====================
+Compliance mechanisms
+=====================
+
+As noted in the `CFB`_ specification, the compound document format presents a
+number of validation challenges. For example, maliciously constructed files
+might include circular references in their FAT table, leading a naive reader
+into an infinite loop, or they may allocate a large number of DIFAT sectors
+hoping to cause resource exhaustion when the reader goes to allocate memory for
+reading the FAT.
+
+The compoundfiles library goes to some lengths to detect erroneous structures
+(whether malicious in intent or otherwise) and work around them where possible.
+Some issues are considered fatal and will always raise an exception (circular
+chains in the FAT are an example of this). Other issues are considered
+non-fatal and will raise a warning (unusual sector sizes are an example of
+this). Python :mod:`warnings` are a special sort of exception with particularly
+flexible handling.
+
+With Python's defaults, a specific warning will print a message to the console
+the first time it is encountered and will then do nothing if it's encountered
+again (this avoids spamming the console in case a warning is raised in a tight
+loop). With some simple code, you can specify alternative behaviours: warnings
+can be raised as full-blown exceptions, or suppressed entirely. The
+compoundfiles library defines a large hierarchy of errors and warnings to
+enable developers to finetune their handling.
+
+For example, consider a developer writing an application for working with
+computed tomography (CT) scans. The files produced by the scanner's software
+are compound documents, but they use an unusual sector size. Whenever the
+developer's Python script opens a file the following warning is emitted::
+
+    /usr/lib/pyshared/python2.7/compoundfiles/compoundfiles/reader.py:275: CompoundFileSectorSizeWarning: unexpected sector size in v3 file (1024)
+
+Other than this, the script runs successfully. The developer decides the
+warning is unimportant (after all there's nothing he can do about it given he
+can't change the scanner's software) and wishes to suppress it entirely, so he
+adds the following line to the top of his script::
+
+    import warnings
+    import compoundfiles as cf
+
+    warnings.filterwarnings('ignore', category=cf.CompoundFileSectorSizeWarning)
+
+Another developer is working on a file validation service. She wishes to use
+the compoundfiles library to extract and examine the contents of such files.
+For safety, she decides to treat any violation of the specification as an
+error, so she adds the following line to the top of her script to tell Python
+to convert all compound file warnings into exceptions::
+
+    import warnings
+    import compoundfiles as cf
+
+    warnings.filterwarnings('error', category=cf.CompoundFileWarning)
+
+The class hierarchies for compoundfiles warnings and errors is illustrated
+below:
+
+.. image:: warnings.*
+    :align: center
+
+.. image:: errors.*
+    :align: center
+
+To set filters on all warnings in the hierarchy, simply use the category
+:exc:`~compoundfiles.CompoundFileWarning`. Otherwise, you can use intermediate
+or leaf classes in the hierarchy for more specific filters. Likewise, when
+catching exceptions you can target the root of the hierarchy
+(:exc:`~compoundfiles.CompoundFileError`) to catch any error that the
+compoundfiles library might raise, or a more specific class to deal with a
+particular error.
+
+.. _CFB: http://msdn.microsoft.com/en-gb/library/dd942138.aspx
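
Reviewer's note: the two filter settings described in the new chapter can be tried without a sample file. The sketch below is only an illustration under stated assumptions. The two warning classes are local stand-ins that borrow the names used in the chapter (they are not imported from compoundfiles), and `open_scan`, `open_quietly` and `open_strictly` are hypothetical helpers. The filters are scoped with `warnings.catch_warnings` so that they do not leak into the rest of a program, which is a design choice of this sketch rather than something the chapter prescribes.

```python
import warnings


# Stand-ins so the sketch runs without the library installed; with
# compoundfiles available you would use cf.CompoundFileWarning and
# cf.CompoundFileSectorSizeWarning instead (names taken from the chapter).
class CompoundFileWarning(Warning):
    pass


class CompoundFileSectorSizeWarning(CompoundFileWarning):
    pass


def open_scan():
    # Hypothetical helper imitating a reader that meets an odd sector size.
    warnings.warn(
        'unexpected sector size in v3 file (1024)',
        CompoundFileSectorSizeWarning)
    return 'opened'


def open_quietly():
    # Record everything, but ignore the sector size warning specifically.
    with warnings.catch_warnings(record=True) as caught:
        warnings.simplefilter('always')
        warnings.filterwarnings(
            'ignore', category=CompoundFileSectorSizeWarning)
        result = open_scan()
    return result, caught


def open_strictly():
    # Promote every warning in the hierarchy to an exception.
    with warnings.catch_warnings():
        warnings.filterwarnings('error', category=CompoundFileWarning)
        try:
            return open_scan()
        except CompoundFileWarning as exc:
            return 'rejected: %s' % exc
```

Filtering on the root class in `open_strictly` also covers the more specific subclass, which is the behaviour the chapter's closing paragraph relies on.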

‎docs/errors.dot

Lines changed: 22 additions & 0 deletions
@@ -0,0 +1,22 @@
+digraph G {
+    graph [rankdir="LR"];
+
+    node [shape=rect,style=filled,color="#000000",fillcolor="#99aadd",fontname=Arial,fontsize=12.0];
+    CompoundFileError->IOError;
+    CompoundFileHeaderError->CompoundFileError;
+    CompoundFileMasterFatError->CompoundFileError;
+    CompoundFileNormalFatError->CompoundFileError;
+    CompoundFileMiniFatError->CompoundFileError;
+    CompoundFileDirEntryError->CompoundFileError;
+    CompoundFileInvalidMagicError->CompoundFileHeaderError;
+    CompoundFileInvalidBomError->CompoundFileHeaderError;
+    CompoundFileLargeNormalFatError->CompoundFileNormalFatError;
+    CompoundFileNormalLoopError->CompoundFileNormalFatError;
+    CompoundFileLargeMiniFatError->CompoundFileMiniFatError;
+    CompoundFileNoMiniFatError->CompoundFileMiniFatError;
+    CompoundFileMasterLoopError->CompoundFileMasterFatError;
+    CompoundFileDirLoopError->CompoundFileDirEntryError;
+    CompoundFileNotFoundError->CompoundFileError;
+    CompoundFileNotStreamError->CompoundFileError;
+}
+
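
Reviewer's note: reading each edge in the graph above as "subclass -> base class" (an interpretation, based on the compliance chapter naming `CompoundFileError` as the root of the hierarchy), a handful of the edges can be mirrored in Python to show how handlers at different depths would behave. The classes below are local stand-ins reusing names from the graph, not the library's own definitions, and `classify` is a hypothetical helper.

```python
# Stand-in classes mirroring four edges of the graph; the real definitions
# live in the compoundfiles package and may differ in detail.
class CompoundFileError(IOError):
    pass


class CompoundFileNormalFatError(CompoundFileError):
    pass


class CompoundFileNormalLoopError(CompoundFileNormalFatError):
    pass


class CompoundFileNotFoundError(CompoundFileError):
    pass


def classify(exc_class):
    # Hypothetical helper: raise the given error class and report which
    # handler catches it, most specific handler first.
    try:
        raise exc_class('simulated failure')
    except CompoundFileNormalFatError:
        return 'FAT problem'
    except CompoundFileError:
        return 'other compound file problem'
    except IOError:
        return 'plain I/O problem'
```

Ordering the handlers from most to least specific matters here: if the `CompoundFileError` clause came first, it would also swallow the FAT errors.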

‎docs/errors.pdf

14.7 KB
Binary file not shown.
