``javascript``: Removes any Javascript, like an ``onclick`` attribute. Also removes stylesheets as they could contain Javascript.
``comments``: Removes any comments.
``style``: Removes any style tags.
``inline_style``: Removes any style attributes. Defaults to the value of the ``style`` option.
``links``: Removes any ``<link>`` tags.
``meta``: Removes any ``<meta>`` tags.
``page_structure``: Removes structural parts of a page: ``<head>``, ``<html>``, ``<title>``.
``processing_instructions``: Removes any processing instructions.
``embedded``: Removes any embedded objects (flash, iframes).
``frames``: Removes any frame-related tags.
``forms``: Removes any form tags.
``annoying_tags``: Removes tags that aren't *wrong*, but are annoying: ``<blink>`` and ``<marquee>``.
``remove_tags``: A list of tags to remove. Only the tags will be removed, their content will get pulled up into the parent tag.
``kill_tags``: A list of tags to kill. Killing also removes the tag's content, i.e. the whole subtree, not just the tag itself.
``allow_tags``: A list of tags to include (default include all).
``remove_unknown_tags``: Remove any tags that aren't standard parts of HTML.
``safe_attrs_only``: If true, only include 'safe' attributes (specifically the list from the feedparser HTML sanitisation web site).
``safe_attrs``: A set of attribute names to override the default list of attributes considered 'safe' (when safe_attrs_only=True).
``add_nofollow``: If true, then any ``<a>`` tags will have ``rel="nofollow"`` added to them.
``host_whitelist``: A list or set of hosts that you can use for embedded content (for content like ``<object>``, ``<link rel="stylesheet">``, etc). You can also implement/override the method ``allow_embedded_url(el, url)`` or ``allow_element(el)`` to implement more complex rules for what can be embedded. Anything that passes this test will be shown, regardless of the value of (for instance) ``embedded``.
Note that this parameter might not work as intended if you do not make the links absolute before doing the cleaning.
Note that you may also need to set ``whitelist_tags``.
``whitelist_tags``: A set of tags that can be included with ``host_whitelist``. The default is ``iframe`` and ``embed``; you may wish to include other tags like ``script``, or you may want to implement ``allow_embedded_url`` for more control. Set to None to include all tags.
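The interaction between ``host_whitelist`` and ``whitelist_tags`` can be illustrated with a small standalone sketch. This is not the library's code: it is a simplified function (taking a tag name rather than an element, which is an assumption made for illustration) that follows the rules described above, and it shows why relative links never pass the host check.

```python
from urllib.parse import urlsplit

# Default taken from the option description above: iframe and embed.
DEFAULT_WHITELIST_TAGS = frozenset({"iframe", "embed"})


def allow_embedded_url(tag, url, host_whitelist,
                       whitelist_tags=DEFAULT_WHITELIST_TAGS):
    """Illustrative check: may `url` be embedded through a `tag` element?"""
    # whitelist_tags=None means "any tag may use the host whitelist".
    if whitelist_tags is not None and tag not in whitelist_tags:
        return False
    parts = urlsplit(url)
    # A relative URL has no scheme or host, so it can never match;
    # this is why links should be made absolute before cleaning.
    if parts.scheme not in ("http", "https"):
        return False
    return parts.hostname in host_whitelist


hosts = {"www.youtube.com"}
print(allow_embedded_url("iframe", "https://www.youtube.com/embed/x", hosts))
print(allow_embedded_url("iframe", "/embed/x", hosts))
print(allow_embedded_url("script", "https://www.youtube.com/a.js", hosts))
```

In this sketch the ``<script>`` URL is rejected only because ``script`` is not in the default tag set; passing ``whitelist_tags=None`` would accept it.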
This modifies the document *in place*.
A cleanup tool for HTML.
Removes unwanted tags and content. See the `Cleaner` class for details.
Breaks any long words found in the body of the text (not attributes).
Doesn't affect any of the tags in avoid_elements, by default ``<textarea>`` and ``<pre>``.
Breaks words by inserting ``&#8203;``, the Unicode zero-width space character (U+200B). This generally takes up no space in rendering, but does copy as a space, and in monospace contexts usually takes up space.
See http://www.cs.tut.fi/~jkorpela/html/nobr.html for a discussion
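The word-breaking idea can be sketched on plain text as follows. This is a simplified illustration, not the library's implementation: the function name and the ``max_width`` and ``break_character`` parameters are assumptions for the example, it breaks at fixed intervals rather than at preferred positions, and it does not walk an element tree or skip ``avoid_elements``.

```python
import re

ZERO_WIDTH_SPACE = "\u200b"  # the character written as &#8203; in HTML


def break_long_words(text, max_width=40, break_character=ZERO_WIDTH_SPACE):
    """Insert break_character into any run of non-space text longer than max_width."""
    # Match only words that are strictly longer than max_width.
    long_word = re.compile(r"\S{%d,}" % (max_width + 1))

    def split_word(match):
        word = match.group(0)
        chunks = [word[i:i + max_width] for i in range(0, len(word), max_width)]
        return break_character.join(chunks)

    return long_word.sub(split_word, text)


broken = break_long_words("a" * 10, max_width=4)
print(broken.split(ZERO_WIDTH_SPACE))  # the chunks between the inserted breaks
```

Because the inserted character has no visible width in most renderers, the text looks unchanged but can now wrap at the break points.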
Turn any URLs into links.
It will search for links identified by the given regular expressions (by default mailto and http(s) links).
It won't link text in an element in avoid_elements, or an element with a class in avoid_classes. It won't link to anything with a host that matches one of the regular expressions in avoid_hosts (default localhost and 127.0.0.1).
If you pass in an element, the element's tail will not be substituted, only the contents of the element.
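As a rough illustration of the linking rules above, the following sketch works on a plain string instead of an element tree. The regular expressions and the function name are assumptions chosen for the example, not the library's defaults, and only http(s) links are handled.

```python
import re
from html import escape
from urllib.parse import urlsplit

# Hypothetical patterns for this sketch only.
LINK_RE = re.compile(r"https?://[^\s<>\"']+", re.I)
AVOID_HOSTS = [re.compile(r"^localhost$", re.I), re.compile(r"^127\.0\.0\.1$")]


def autolink_text(text, link_re=LINK_RE, avoid_hosts=AVOID_HOSTS):
    """Return HTML in which URLs in `text` become links, except avoided hosts."""
    pieces = []
    position = 0
    for match in link_re.finditer(text):
        url = match.group(0)
        host = urlsplit(url).hostname or ""
        # Text between matches is escaped and copied through unchanged.
        pieces.append(escape(text[position:match.start()]))
        if any(pattern.search(host) for pattern in avoid_hosts):
            pieces.append(escape(url))  # leave avoided hosts as plain text
        else:
            pieces.append('<a href="%s">%s</a>' % (escape(url), escape(url)))
        position = match.end()
    pieces.append(escape(text[position:]))
    return "".join(pieces)


print(autolink_text("see http://lxml.de now"))
print(autolink_text("dev server at http://localhost:8000/x"))
```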
Depending on the browser, stuff like ``e x p r e s s i o n(...)`` can get interpreted, or ``expre/* stuff */ssion(...)``. This checks for attempts to do stuff like this.
Typically the response will be to kill the entire style; if you have just a bit of Javascript in the style another rule will catch that and remove only the Javascript from the style; this catches more sneaky attempts.
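One way to catch the obfuscations mentioned above is to normalize the style text before searching it. The sketch below is a minimal illustration under that assumption; the function name and the exact set of checks are hypothetical, and the real check may cover more cases.

```python
import re

CSS_COMMENT = re.compile(r"/\*.*?\*/", re.S)
WHITESPACE = re.compile(r"\s+")


def has_sneaky_javascript(style):
    """Return True if `style` hides an expression(...) call or a javascript: URL."""
    normalized = CSS_COMMENT.sub("", style)        # expre/* stuff */ssion -> expression
    normalized = normalized.replace("\\", "")      # drop CSS backslash escapes
    normalized = WHITESPACE.sub("", normalized)    # e x p r e s s i o n -> expression
    normalized = normalized.lower()
    return "expression(" in normalized or "javascript:" in normalized


print(has_sneaky_javascript("width: e x p r e s s i o n(alert(1))"))
print(has_sneaky_javascript("width: expre/* stuff */ssion(alert(1))"))
print(has_sneaky_javascript("color: red"))
```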
IE conditional comments basically embed HTML that the parser doesn't normally see. We can't allow anything like that, so we'll kill any comments that could be conditional.
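A comment can be recognized as potentially conditional by looking for the ``[if ...]>`` marker in its text. The following is a hedged sketch with an assumed pattern and a hypothetical function name; it only classifies comment text and does not remove anything from a document.

```python
import re

# Assumed pattern for this sketch: "[if", whitespace, a condition, "]", then ">".
CONDITIONAL_COMMENT_RE = re.compile(r"\[if\s+.*?\]\s*>", re.I | re.S)


def could_be_conditional(comment_text):
    """Return True if the text inside <!-- ... --> looks like an IE conditional."""
    return CONDITIONAL_COMMENT_RE.search(comment_text) is not None


print(could_be_conditional("[if IE]><script>alert(1)</script><![endif]"))
print(could_be_conditional(" just a note "))
```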
Decide whether a URL that was found in an element's attributes or text is configured to be accepted or rejected.
:param el: an element. :param url: a URL found on the element. :return: true to accept the URL and false to reject it.
Decide whether an element is configured to be accepted or rejected.
:param el: an element. :return: true to accept the element or false to reject/discard it.
Override to suppress rel="nofollow" on some anchors.
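A subclass along the following lines could exempt trusted hosts from ``rel="nofollow"``. This is a sketch under assumptions: ``TrustingCleaner``, ``trusted_hosts`` and ``link_rels`` are hypothetical names, ``allow_follow`` is assumed to receive the anchor element, and the import fallback is there because newer lxml releases ship the cleaner as the separate ``lxml_html_clean`` package.

```python
from urllib.parse import urlsplit

from lxml.html import fromstring

try:
    from lxml_html_clean import Cleaner  # separate package in newer releases
except ImportError:
    from lxml.html.clean import Cleaner


class TrustingCleaner(Cleaner):
    """Hypothetical subclass: links to trusted hosts keep their follow status."""

    trusted_hosts = frozenset({"lxml.de"})

    def allow_follow(self, anchor):
        # Returning True suppresses rel="nofollow" for this anchor.
        host = urlsplit(anchor.get("href", "")).hostname
        return host in self.trusted_hosts


def link_rels(html):
    """Clean `html` in place and return the rel attribute of each link."""
    doc = fromstring(html)
    TrustingCleaner(add_nofollow=True)(doc)
    return [anchor.get("rel") for anchor in doc.iter("a")]


print(link_rels(
    '<div><a href="http://lxml.de/">docs</a> '
    '<a href="http://other.example/">other</a></div>'
))
```

Overriding the method keeps the policy in one place, so the same cleaner instance can be reused across documents.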