HTMLCleaner helper


1 Presentation

The main purpose of this helper is to clean up an HTML stream produced by a WYSIWYG editor, ensuring that it contains no forbidden code. In this respect, it is a wrapper around the HTMLPurifier library.

It can also convert plain text into an HTML stream.


2 Installation

To work properly, the HTMLCleaner object needs the HTMLPurifier library. There are three ways to install it.


2.1 System installation

The recommended method is to use your operating system's packages. For example, on Ubuntu, HTMLPurifier can be installed with this command:

$ sudo apt install php-htmlpurifier

2.2 Composer installation

It is also possible to use the Composer dependency manager (see the dedicated documentation). To do so, execute the following command from the project root:

$ composer require ezyang/htmlpurifier

2.3 Manual installation

You can also download the files manually and copy them into the lib/ directory of your project.

Warning: if you put the files in a sub-directory (named "htmlpurifier", for example), you must declare this sub-directory in the list of include paths (see the configuration documentation). In that case, add these lines to your etc/temma.php file:

[
    'includePaths' => [
        '/path/to/project/lib/htmlpurifier'
    ]
]
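
Whichever installation method is chosen, it can be useful to check that PHP is actually able to find the library. The following is a minimal diagnostic sketch, not part of Temma: the function name locateHtmlPurifier() is hypothetical, and the check assumes that the library's HTMLPurifier.auto.php loader file sits directly in one of the include paths, which may not match your layout.

```php
<?php
declare(strict_types=1);

/**
 * Hypothetical diagnostic (not part of Temma): reports how HTMLPurifier
 * can be reached from the current PHP process, or null if it cannot.
 */
function locateHtmlPurifier(): ?string
{
    // An already registered autoloader (Composer, for example) can load the class.
    if (class_exists('HTMLPurifier')) {
        return 'autoloader';
    }
    // Otherwise, look for the library's loader file in the include paths.
    // Assumption: HTMLPurifier.auto.php sits directly in one of those directories.
    $path = stream_resolve_include_path('HTMLPurifier.auto.php');
    return ($path === false) ? null : $path;
}

$location = locateHtmlPurifier();
echo ($location === null)
    ? "HTMLPurifier not found\n"
    : "HTMLPurifier available via: $location\n";
```

If the script reports that the library was not found after a manual installation, the include path declared in etc/temma.php is the first thing to check.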

3 clean()

Static function that receives an HTML stream and returns the same stream after cleaning it up. Unauthorized HTML tags are removed. Unauthorized tag attributes are also removed. Multiple carriage returns are transformed into paragraphs.

Method signature:

\Temma\Utils\HTMLCleaner::clean(string $html, ?bool $targetBlank=null, ?bool $nofollow=null, bool $removeNbsp=true) : string

Parameters:

  • $html: HTML stream to be cleaned up.
  • $targetBlank: Indicates whether to add a target="_blank" attribute to links.
    • true to add the attribute to all links.
    • false to never add the attribute.
    • null to add the attribute to external links (starting with http:// or https://).
  • $nofollow: Indicates whether to add a rel="nofollow" attribute to links.
    • true to add the attribute to all links.
    • false to never add the attribute.
    • null to add the attribute to external links (starting with http:// or https://).
  • $removeNbsp: Indicates whether to remove non-breaking spaces.

Return value: The cleaned-up HTML stream.
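
The three-state behavior of $targetBlank and $nofollow can be summarized by a small decision function. This is only a sketch of the rules listed above, not Temma's actual code: the function name shouldAddLinkAttribute() is hypothetical, and "external" is taken literally as "starts with http:// or https://".

```php
<?php
declare(strict_types=1);

/**
 * Hypothetical helper (not part of Temma): decides whether an attribute
 * should be added to a link, following the true/false/null rules.
 */
function shouldAddLinkAttribute(?bool $flag, string $href): bool
{
    if ($flag !== null) {
        // true: every link gets the attribute; false: no link gets it.
        return $flag;
    }
    // null: external links only, i.e. URLs starting with http:// or https://
    return preg_match('#^https?://#', $href) === 1;
}

var_dump(shouldAddLinkAttribute(null, 'https://www.temma.net'));  // bool(true)
var_dump(shouldAddLinkAttribute(null, '/contact.html'));          // bool(false)
var_dump(shouldAddLinkAttribute(false, 'https://www.temma.net')); // bool(false)
var_dump(shouldAddLinkAttribute(true, '/contact.html'));          // bool(true)
```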

Example:

use \Temma\Utils\HTMLCleaner as TµHTMLCleaner;

$input = <<< EOT
<h1>Title
<p onclick="alert('XSS');">Paragraph<br>
<br>
<script>alert('XSS');</script>
EOT;
$output = TµHTMLCleaner::clean($input);
/*
<h1>Title</h1>
<p>Paragraph</p>
*/
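
The optional parameters can be combined as in the following usage sketch. It assumes that Temma's autoloader is already registered (the calls are skipped otherwise), and the sample markup in $html is made up for illustration. No output is shown, because the exact result depends on the HTMLPurifier configuration used by the helper.

```php
<?php
declare(strict_types=1);

// Made-up sample: one external link and one internal link.
$html = '<p>See <a href="https://www.temma.net">Temma</a> or the <a href="/doc">local doc</a>.</p>';

if (class_exists('\Temma\Utils\HTMLCleaner')) {
    // Defaults (null, null): target="_blank" and rel="nofollow" on external links only.
    $default = \Temma\Utils\HTMLCleaner::clean($html);

    // Never add target="_blank", but add rel="nofollow" to every link.
    $strict = \Temma\Utils\HTMLCleaner::clean($html, false, true);

    // Same link handling as the defaults, but non-breaking spaces are kept.
    $keepNbsp = \Temma\Utils\HTMLCleaner::clean($html, null, null, false);
}
```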

4 text2html()

Static function that takes plain text and returns an HTML stream. Line breaks and paragraphs (blocks of text separated by an empty line) are handled, as are links.

Method signature:

\Temma\Utils\HTMLCleaner::text2html(string $text, bool $urlProcess=true, bool $nofollow=true) : string

Parameters:

  • $text: Text to be processed.
  • $urlProcess: true to process URLs. Links to external sites (starting with http:// or https://) open in a new tab (target="_blank").
  • $nofollow: true to add a rel="nofollow" attribute to the generated links.

Return value: The generated HTML stream.

Example:

use \Temma\Utils\HTMLCleaner as TµHTMLCleaner;

$text = "First paragraph,
on two lines.

Site: https://www.temma.net";
$html = TµHTMLCleaner::text2html($text);
/*
<p>First paragraph,<br />
on two lines.</p>
<p>Site: <a target="_blank" rel="nofollow"
href="https://www.temma.net">https://www.temma.net</a></p>
*/
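
To make the transformation more concrete, here is a rough sketch of how such a conversion could be written in plain PHP. It is not Temma's implementation: the function name sketchText2html() is hypothetical, the URL pattern is deliberately naive (trailing punctuation would be included in the link), and edge cases may be handled differently by the real method.

```php
<?php
declare(strict_types=1);

/**
 * Hypothetical re-implementation, for illustration only.
 * This is not Temma's code.
 */
function sketchText2html(string $text, bool $urlProcess = true, bool $nofollow = true): string
{
    // Normalize line endings, then split into paragraphs on empty lines.
    $text = str_replace(["\r\n", "\r"], "\n", trim($text));
    $paragraphs = preg_split('/\n{2,}/', $text, -1, PREG_SPLIT_NO_EMPTY);

    $html = [];
    foreach ($paragraphs as $paragraph) {
        // Escape special characters before adding any markup.
        $p = htmlspecialchars($paragraph, ENT_QUOTES, 'UTF-8');
        if ($urlProcess) {
            $rel = $nofollow ? ' rel="nofollow"' : '';
            // Turn http:// and https:// URLs into links that open in a new tab.
            $p = preg_replace(
                '#https?://[^\s<]+#',
                '<a target="_blank"' . $rel . ' href="$0">$0</a>',
                $p
            );
        }
        // Single line breaks inside a paragraph become <br /> tags.
        $html[] = '<p>' . nl2br($p) . '</p>';
    }
    return implode("\n", $html);
}

echo sketchText2html("First paragraph,\non two lines.\n\nSite: https://www.temma.net");
```

With the sample text from the example above, this sketch produces the same two paragraphs, with the link written on a single line. Escaping the text before inserting any tag is the important design choice: it prevents markup present in the plain text from reaching the output unescaped.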