@@ -169,41 +169,3 @@ the following way:
169169* If all else fails, the default encoding will be used. This is usually
170170 `Windows-1252 <http://en.wikipedia.org/wiki/Windows-1252>`_, which is
171171 a common fallback used by Web browsers.
172-
173-
174- Tokenizers
175- ----------
176-
177- The part of the parser responsible for translating a raw input stream
178- into meaningful tokens is the tokenizer. Currently html5lib provides
179- two.
180-
181- To set up a tokenizer, simply pass it when instantiating
182- a :class:`~html5lib.html5parser.HTMLParser`:
183-
184- .. code-block:: python
185-
186-   import html5lib
187-   from html5lib import sanitizer
188-
189-   p = html5lib.HTMLParser(tokenizer=sanitizer.HTMLSanitizer)
190-   p.parse("<p>Surprise!<script>alert('Boo!');</script>")
191-
192- HTMLTokenizer
193- ~~~~~~~~~~~~~
194-
195- This is the default tokenizer, the heart of html5lib. The implementation
196- can be found in `html5lib/tokenizer.py
197- <https://github.com/html5lib/html5lib-python/blob/master/html5lib/tokenizer.py>`_.
198-
199- HTMLSanitizer
200- ~~~~~~~~~~~~~
201-
202- This is a tokenizer that removes unsafe markup and CSS styles from the
203- input. Elements that are known to be safe are passed through and the
204- rest is converted to visible text. The default configuration of the
205- sanitizer follows the `WHATWG Sanitization Rules
206- <http://wiki.whatwg.org/wiki/Sanitization_rules>`_.
207-
208- The implementation can be found in `html5lib/sanitizer.py
209- <https://github.com/html5lib/html5lib-python/blob/master/html5lib/sanitizer.py>`_.