@@ -169,41 +169,3 @@ the following way:
169
169
* If all else fails, the default encoding will be used. This is usually
170
170
`Windows-1252 <http://en.wikipedia.org/wiki/Windows-1252>`_, which is
171
171
a common fallback used by Web browsers.
172
-
173
-
174
- Tokenizers
175
- ----------
176
-
177
- The part of the parser responsible for translating a raw input stream
178
- into meaningful tokens is the tokenizer. Currently html5lib provides
179
- two.
180
-
181
- To set up a tokenizer, simply pass it when instantiating
182
- a :class:`~html5lib.html5parser.HTMLParser`:
183
-
184
- .. code-block:: python
185
-
186
-   import html5lib
187
-   from html5lib import sanitizer
188
-
189
-   p = html5lib.HTMLParser(tokenizer=sanitizer.HTMLSanitizer)
190
-   p.parse("<p>Surprise!<script>alert('Boo!');</script>")
191
-
192
- HTMLTokenizer
193
- ~~~~~~~~~~~~~
194
-
195
- This is the default tokenizer, the heart of html5lib. The implementation
196
- can be found in `html5lib/tokenizer.py
197
- <https://github.com/html5lib/html5lib-python/blob/master/html5lib/tokenizer.py>`_.
198
-
199
- HTMLSanitizer
200
- ~~~~~~~~~~~~~
201
-
202
- This is a tokenizer that removes unsafe markup and CSS styles from the
203
- input. Elements that are known to be safe are passed through and the
204
- rest is converted to visible text. The default configuration of the
205
- sanitizer follows the `WHATWG Sanitization Rules
206
- <http://wiki.whatwg.org/wiki/Sanitization_rules>`_.
207
-
208
- The implementation can be found in `html5lib/sanitizer.py
209
- <https://github.com/html5lib/html5lib-python/blob/master/html5lib/sanitizer.py>`_.