Documentation

HTMLCleaner helper

Presentation

The main purpose of this helper is to clean up an HTML stream coming from a WYSIWYG editor, to ensure that it contains no forbidden code. In this respect, it is a wrapper around the HTMLPurifier library.

It also offers the possibility of converting plain text into an HTML stream.

Installation

To work properly, the HTMLCleaner object needs the HTMLPurifier library. There are three ways to install it.

System installation

The recommended method is to use your operating system's packages. For example, on Ubuntu, HTMLPurifier can be installed with this command:

sudo apt install php-htmlpurifier

Composer installation

It is also possible to use the Composer dependency manager (see the dedicated documentation). To do so, create a composer.json file with this content:

{
    "require": {
        "htmlpurifier/htmlpurifier": "4.*"
    }
}

Then run the composer update command.

Next, add the path of the installed files to the list of include paths (see the configuration documentation).

In the temma.json file, add the following lines:

{
    "includePaths": [
        "/path/to/project/vendors/htmlpurifiers/src"
    ]
}

Manual installation

You can also download the files manually and copy them into the lib/ directory of your project.

Warning: if you put the files in a subdirectory (named "htmlpurifier" for example), you will have to declare this subdirectory in the list of include paths (see the configuration documentation). In this case, add these lines to your temma.json file:

{
    "includePaths": [
        "/path/to/project/lib/htmlpurifier"
    ]
}

clean()

Static function that receives an HTML stream and returns the same stream after cleaning it up. Unauthorized HTML tags are removed. Unauthorized tag attributes are also removed. Multiple carriage returns are transformed into paragraphs.

Method signature:

\Temma\Utils\HTMLCleaner::clean(string $html, ?bool $targetBlank=null,
                                ?bool $nofollow=null, bool $removeNbsp=true) : string

Parameters:

  • $html: HTML stream to be cleaned up.
  • $targetBlank: Indicates whether to add a target="_blank" attribute to links.
    • true to add the attribute to all links.
    • false to never add the attribute.
    • null to add the attribute to external links (starting with http:// or https://).
  • $nofollow: Indicates whether to add a rel="nofollow" attribute to links.
    • true to add the attribute to all links.
    • false to never add the attribute.
    • null to add the attribute to external links (starting with http:// or https://).
  • $removeNbsp: Indicates whether to remove non-breaking spaces.

Return value: The cleaned-up HTML stream.

Example:

use \Temma\Utils\HTMLCleaner as TµHTMLCleaner;

$input = <<< EOT
<h1>Title
<p onclick="alert('XSS');">Paragraph<br>
<br>
<script>alert('XSS');</script>
EOT;
$output = TµHTMLCleaner::clean($input);
/*
<h1>Title</h1>
<p>Paragraph</p>
*/
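
The three-state behavior of the $targetBlank and $nofollow parameters (true, false, null) can be sketched in plain PHP. This is only an illustration of the rule stated in the parameter list, not the code used by HTMLCleaner; the function name shouldAddAttribute() is hypothetical and does not exist in Temma.

```php
<?php
// Hypothetical helper (not part of Temma): mirrors the documented rule
// for the $targetBlank and $nofollow parameters of clean().
function shouldAddAttribute(?bool $flag, string $href): bool
{
    if ($flag !== null) {
        // true: add the attribute to every link; false: never add it.
        return $flag;
    }
    // null: add the attribute to external links only,
    // i.e. URLs starting with http:// or https://
    return preg_match('#^https?://#', $href) === 1;
}

var_dump(shouldAddAttribute(null, 'https://www.temma.net'));  // bool(true)
var_dump(shouldAddAttribute(null, '/contact'));               // bool(false)
var_dump(shouldAddAttribute(false, 'https://www.temma.net')); // bool(false)
var_dump(shouldAddAttribute(true, '/contact'));               // bool(true)
```

With the default value (null), internal links such as /contact are left untouched, while links to other sites receive the attribute.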

text2html()

Static function that takes plain text and returns an HTML stream. Line breaks and paragraphs (blocks of text separated by an empty line) are handled, and URLs are converted to links.

Method signature:

\Temma\Utils\HTMLCleaner::text2html(string $text, bool $urlProcess=true,
                                    bool $nofollow=true) : string

Parameters:

  • $text: Text to be processed.
  • $urlProcess: true to process URLs. Links to external sites (starting with http:// or https://) open in a new tab (target="_blank").
  • $nofollow: true to add a rel="nofollow" attribute to the generated links.

Return value: The generated HTML stream.

Example:

use \Temma\Utils\HTMLCleaner as TµHTMLCleaner;

$text = "First paragraph,
on two lines.

Site: https://www.temma.net";
$html = TµHTMLCleaner::text2html($text);
/*
<p>First paragraph,<br />
on two lines.</p>
<p>Site: <a target="_blank" rel="nofollow"
href="https://www.temma.net">https://www.temma.net</a></p>
*/
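
To make the paragraph and line-break handling more concrete, here is a minimal sketch in plain PHP. It is a simplified illustration, not the implementation of text2html(): the function name sketchText2html() is hypothetical, URL processing is left out, and the escaping of special characters is an assumption.

```php
<?php
// Simplified illustration only: not Temma's text2html(), no URL processing.
function sketchText2html(string $text): string
{
    // Normalize Windows and old Mac line endings.
    $text = str_replace(["\r\n", "\r"], "\n", trim($text));
    // Paragraphs are blocks of text separated by one or more empty lines.
    $blocks = preg_split('/\n\s*\n/', $text, -1, PREG_SPLIT_NO_EMPTY);
    $paragraphs = [];
    foreach ($blocks as $block) {
        // Escape special characters (assumption), then turn the
        // remaining single line breaks into <br /> tags.
        $paragraphs[] = '<p>' . nl2br(htmlspecialchars($block)) . '</p>';
    }
    return implode("\n", $paragraphs);
}

echo sketchText2html("First paragraph,\non two lines.\n\nSecond paragraph.");
/*
<p>First paragraph,<br />
on two lines.</p>
<p>Second paragraph.</p>
*/
```

The split happens before the line-break conversion, so an empty line always starts a new paragraph and never produces a stray <br /> tag.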