HTML cleaning
The contents of HTML fields can be cleaned both on the client side and on the server side.
Client-side
Client-side HTML cleaning is done by CKEditor itself. This feature is called Advanced Content Filter (ACF). Each plugin and command added to or removed from CKEditor influences the allowed HTML. For example, when there is no plugin to add an image, <img> tags are removed automatically. This filtering also applies to attributes, which can, for instance, be allowed or required.
ACF can also be controlled per editor instance via the configuration property extraAllowedContent. Note that since Bloomreach Experience Manager 12, extraAllowedContent must be specified in JSON object format. For example:
{ extraAllowedContent: {q: {}, cite: {classes: 'myclass'}}}
More information on ACF and how to configure it can be found on the CKEditor documentation website.
Disable client-side HTML cleaning
ACF is enabled by default. To disable ACF, set the CKEditor property allowedContent to true:
ckeditor.config.overlayed.json:
{ allowedContent: true}
Server-side
Server-side HTML cleaning is done by an HTML-processor. The HTML-processor checks, cleans, and corrects the output of rich-text fields, and it also manages internal links and images. Its configuration is based on an allowlist that defines which elements are allowed and which attributes they may contain. If an attribute is not configured as allowed, it is stripped from the output (text nodes from elements are preserved).
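To make the allowlist rules concrete, here is a minimal Java sketch that applies them to a tiny in-memory node tree. It is not the HTML-processor's implementation (which relies on the third-party HtmlCleaner library); the Node, Text, and Element types and the filter and toHtml helpers are hypothetical. Following the parenthetical above, it assumes that an element missing from the allowlist is unwrapped so that its text survives, while an allowed element keeps only its allowed attributes.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical illustration of allowlist filtering; not the HTML-processor's code.
final class AllowlistFilterSketch {

    interface Node {}

    record Text(String value) implements Node {}

    record Element(String name, Map<String, String> attributes, List<Node> children) implements Node {}

    // allowlist: element name -> attribute names allowed on that element.
    static List<Node> filter(List<Node> nodes, Map<String, Set<String>> allowlist) {
        List<Node> out = new ArrayList<>();
        for (Node node : nodes) {
            if (node instanceof Text text) {
                out.add(text); // text nodes are always kept
            } else if (node instanceof Element el) {
                List<Node> children = filter(el.children(), allowlist);
                Set<String> allowedAttributes = allowlist.get(el.name());
                if (allowedAttributes == null) {
                    out.addAll(children); // element not allowed: drop the tag, keep its content
                } else {
                    Map<String, String> kept = new LinkedHashMap<>();
                    for (Map.Entry<String, String> attr : el.attributes().entrySet()) {
                        if (allowedAttributes.contains(attr.getKey())) {
                            kept.put(attr.getKey(), attr.getValue()); // strip unlisted attributes
                        }
                    }
                    out.add(new Element(el.name(), kept, children));
                }
            }
        }
        return out;
    }

    // Minimal serializer for the demo (no escaping, illustration only).
    static String toHtml(List<Node> nodes) {
        StringBuilder sb = new StringBuilder();
        for (Node node : nodes) {
            if (node instanceof Text text) {
                sb.append(text.value());
            } else if (node instanceof Element el) {
                sb.append('<').append(el.name());
                for (Map.Entry<String, String> attr : el.attributes().entrySet()) {
                    sb.append(' ').append(attr.getKey()).append("=\"").append(attr.getValue()).append('"');
                }
                sb.append('>').append(toHtml(el.children())).append("</").append(el.name()).append('>');
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        Map<String, Set<String>> allowlist = Map.of("p", Set.of(), "a", Set.of("href"));
        List<Node> input = List.of(new Element("p", Map.of("style", "color:red"), List.of(
                new Text("See "),
                new Element("a", Map.of("href", "/news", "onclick", "evil()"), List.of(new Text("news"))),
                new Text(" and "),
                new Element("font", Map.of("color", "red"), List.of(new Text("more"))))));
        System.out.println(toHtml(filter(input, allowlist)));
        // prints: <p>See <a href="/news">news</a> and more</p>
    }
}
```

The recursion filters children first, so an unwrapped element contributes already-filtered content to its parent.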
By default, server-side HTML cleaning also removes any usage of the … htmlprocessor.id. This property can be specified either in the cluster.options node of a field of a specific document type, or globally (i.e. for all formatted and/or rich text fields). The value of this property should correspond to the name of an HTML-processor configuration node as defined in the HTML-processor module, which is located at:
/hippo:configuration/hippo:modules/htmlprocessor/hippo:moduleconfig
By default, the CMS is bootstrapped with the following HTML-processor configurations:
- formatted: contains an allowlist of elements used in Formatted fields.
- richtext: contains an allowlist of elements used in Rich Text fields and manages internal links and images.
- no-filter: contains an empty allowlist but does manage internal links and images when applied to Rich Text fields.
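The mapping from an htmlprocessor.id value to its configuration node can be sketched in Java as follows. The class and method names are hypothetical; only the base path and the three bootstrapped configuration names come from the text above, and the node-name check is a simplifying assumption.

```java
import java.util.Set;

// Hypothetical helper; not part of the Bloomreach Experience Manager API.
final class HtmlProcessorIdSketch {

    static final String MODULE_CONFIG =
            "/hippo:configuration/hippo:modules/htmlprocessor/hippo:moduleconfig";

    // Configurations the CMS is bootstrapped with by default.
    static final Set<String> BOOTSTRAPPED = Set.of("formatted", "richtext", "no-filter");

    // The processor id is the name of a configuration node under the module config.
    static String configNodePath(String processorId) {
        if (processorId == null || processorId.isBlank() || processorId.contains("/")) {
            throw new IllegalArgumentException("Not a valid node name: " + processorId);
        }
        return MODULE_CONFIG + "/" + processorId;
    }

    static boolean isBootstrapped(String processorId) {
        return BOOTSTRAPPED.contains(processorId);
    }

    public static void main(String[] args) {
        System.out.println(configNodePath("richtext"));
        // prints: /hippo:configuration/hippo:modules/htmlprocessor/hippo:moduleconfig/richtext
        System.out.println(isBootstrapped("no-filter")); // prints: true
    }
}
```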
The configuration node of an HTML-processor is of node type hipposys:moduleconfig and has the following properties available:
- charset: the character set of the output. Defaults to UTF-8.
- serializer: the type of serializer to use. Valid values are pretty, compact, and simple. Defaults to simple.
- convertLineEndings: whether to convert CRLF to LF when storing HTML, and vice versa when reading HTML. Defaults to true.
- omitComments: whether to strip comments from the HTML. Defaults to false.
- omitJavascriptProtocol: whether JavaScript statements are removed from the HTML. Defaults to true.
- omitDataProtocol: whether the data protocol is removed from the HTML. Defaults to true. Available since 14.3.0.
- filter: whether to apply allowlist filtering. Defaults to true.
- secureTargetBlankLinks: whether external links that open in a new tab or window should be secured using the attribute rel="noopener noreferrer". For more information, see https://web.dev/external-anchors-use-rel-noopener/.
Defaults to true.
Available since 14.7.0.
- allowStyleElements: whether to allow <style> elements. Defaults to false because of the HTML5 specification. For configurations that have filter=true, such as formatted and richtext, a child node named style must also be added to the filter's allowlist.
Available since 15.5.0. This option restores the behavior from before 15.3.0, which changed because of an upgrade of the third-party library HtmlCleaner; see its release notes.
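As a rough illustration of the effect of secureTargetBlankLinks, the Java sketch below adds rel="noopener noreferrer" to the attributes of a link. It is hypothetical and makes assumptions that the documentation does not state: a link counts as external when its href starts with http:// or https://, "opens in a new tab or window" means target="_blank", and rel tokens already present are kept.

```java
import java.util.LinkedHashSet;
import java.util.Locale;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;

// Hypothetical illustration of the secureTargetBlankLinks option; not the processor's code.
final class SecureTargetBlankSketch {

    static Map<String, String> secure(Map<String, String> anchorAttributes) {
        Map<String, String> attrs = new TreeMap<>(anchorAttributes); // sorted for stable output
        String href = attrs.getOrDefault("href", "").trim().toLowerCase(Locale.ROOT);
        String target = attrs.getOrDefault("target", "").trim().toLowerCase(Locale.ROOT);
        // Assumption: only absolute http(s) URLs are treated as external.
        boolean external = href.startsWith("http://") || href.startsWith("https://");
        if (external && target.equals("_blank")) {
            // Assumption: keep existing rel tokens and add the two security tokens once.
            Set<String> rel = new LinkedHashSet<>();
            for (String token : attrs.getOrDefault("rel", "").trim().split("\\s+")) {
                if (!token.isEmpty()) {
                    rel.add(token.toLowerCase(Locale.ROOT));
                }
            }
            rel.add("noopener");
            rel.add("noreferrer");
            attrs.put("rel", String.join(" ", rel));
        }
        return attrs;
    }

    public static void main(String[] args) {
        System.out.println(secure(Map.of("href", "https://example.com", "target", "_blank")));
        // prints: {href=https://example.com, rel=noopener noreferrer, target=_blank}
        System.out.println(secure(Map.of("href", "/internal", "target", "_blank")));
        // prints: {href=/internal, target=_blank}
    }
}
```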
Allowed HTML elements are defined as child nodes of node type hipposys:moduleconfig. The name of such a node corresponds to the allowed element name. These element nodes may contain a multi-valued property called attributes that lists the HTML attributes allowed on the element.
Since Bloomreach Experience Manager 14.3.0, the following two configuration options can also be specified per element:
- omitJavascriptProtocol: whether JavaScript statements are removed from the HTML. Defaults to the value of the global setting.
- omitDataProtocol: whether the data protocol is removed from the HTML. Defaults to the value of the global setting.
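The fallback rule for these per-element options can be sketched in Java: an element-level value applies when it is configured, otherwise the global value does. The record types, the shouldRemove helper, and the detection of the javascript: and data: protocols by a case-insensitive prefix match are hypothetical simplifications, not the HTML-processor's code.

```java
import java.util.Locale;
import java.util.Map;

// Hypothetical illustration of per-element overrides with a global fallback.
final class ProtocolFilterSketch {

    // Global settings; both default to true, as documented.
    record GlobalSettings(boolean omitJavascriptProtocol, boolean omitDataProtocol) {
        static GlobalSettings defaults() {
            return new GlobalSettings(true, true);
        }
    }

    // Per-element settings; null means "not configured, inherit the global value".
    record ElementSettings(Boolean omitJavascriptProtocol, Boolean omitDataProtocol) {}

    static boolean effective(Boolean elementValue, boolean globalValue) {
        return elementValue != null ? elementValue : globalValue;
    }

    // Returns true when the attribute value would be stripped for this element.
    static boolean shouldRemove(String value, GlobalSettings global, ElementSettings element) {
        String v = value.trim().toLowerCase(Locale.ROOT);
        if (v.startsWith("javascript:")) {
            return effective(element.omitJavascriptProtocol(), global.omitJavascriptProtocol());
        }
        if (v.startsWith("data:")) {
            return effective(element.omitDataProtocol(), global.omitDataProtocol());
        }
        return false;
    }

    public static void main(String[] args) {
        GlobalSettings global = GlobalSettings.defaults();
        // Hypothetical element config: <img> keeps data: URIs, <a> inherits both settings.
        Map<String, ElementSettings> perElement = Map.of(
                "img", new ElementSettings(null, false),
                "a", new ElementSettings(null, null));

        System.out.println(shouldRemove("data:image/png;base64,AAAA", global, perElement.get("img"))); // prints: false
        System.out.println(shouldRemove("data:text/html,hi", global, perElement.get("a")));            // prints: true
        System.out.println(shouldRemove(" JavaScript:alert(1)", global, perElement.get("a")));         // prints: true
    }
}
```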
The pretty and compact serializers add whitespace characters to the HTML source to make it human-readable. This may result in unwanted spacing around superscripts or subscripts. For this reason, the default serializer is simple.
Disable server-side HTML cleaning
Set the configuration property htmlprocessor.id to no-filter.
Configuration in delivery tier
HTML cleaning is also used in the delivery tier, notably in the Page Model API and in the default REST API.
Since 15.5.0, the HST properties file supports the configuration option htmlcleaner.allowStyleElements, which defaults to false. Setting it to true includes <style> elements that are part of an HTML field's content in the API output. See also the allowStyleElements option of the HTML-processor module configuration above.
This restores the behavior from before 15.3.0, which changed because of an upgrade of the third-party library HtmlCleaner; see its release notes.
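The effect of a boolean option that defaults to false, such as htmlcleaner.allowStyleElements, can be sketched with java.util.Properties. This is only an illustration: the class and helper names are hypothetical, and the HST loads its properties file through its own configuration mechanism rather than through this code.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.Properties;

// Hypothetical illustration of a false-by-default boolean property; not HST code.
final class AllowStyleElementsSketch {

    static final String KEY = "htmlcleaner.allowStyleElements";

    static boolean allowStyleElements(Properties hstProperties) {
        // Missing key -> "false". Boolean.parseBoolean ignores case and
        // treats any value other than "true" as false.
        return Boolean.parseBoolean(hstProperties.getProperty(KEY, "false").trim());
    }

    static Properties load(String text) {
        Properties props = new Properties();
        try {
            props.load(new StringReader(text));
        } catch (IOException e) {
            throw new UncheckedIOException(e); // not expected for an in-memory reader
        }
        return props;
    }

    public static void main(String[] args) {
        System.out.println(allowStyleElements(load("")));                                      // prints: false
        System.out.println(allowStyleElements(load("htmlcleaner.allowStyleElements = true"))); // prints: true
    }
}
```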