DataFilter helper


1 Presentation

Helper used to filter or validate data in order to check that it complies with a contract. This can be useful to ensure that incoming data in an API matches what is expected.


2 Usage

The \Temma\Utils\DataFilter object provides a static method process(). This method takes as parameters the data to filter and the contract to apply, and returns the filtered data. If the contract syntax is incorrect, the method throws a \Temma\Exceptions\IO exception. If the data does not satisfy the contract, the method throws a \Temma\Exceptions\Application exception.

An optional third parameter makes it possible to specify whether validation should be performed in strict mode or not. By default, validation is non-strict.

Usage example:

use \Temma\Utils\DataFilter as TµDataFilter;
use \Temma\Exceptions\IO as TµIOException;
use \Temma\Exceptions\Application as TµApplicationException;
use \Temma\Base\Log as TµLog;

$contract = 'enum; values: admin, member, guest';
/* equivalent to the previous line:
$contract = [
    'type'   => 'enum',
    'values' => ['admin', 'member', 'guest'],
];
*/
try {
    $data = TµDataFilter::process($data, $contract);
} catch (TµIOException $ie) {
    TµLog::log('myapp', 'WARN', "Invalid contract.");
    throw $ie;
} catch (TµApplicationException $ae) {
    TµLog::log('myapp', 'WARN', "Invalid data.");
    throw $ae;
}

// strict-mode execution
try {
    $data = TµDataFilter::process($data, $contract, true);
} catch (\Exception $e) {
    TµLog::log('myapp', 'WARN', "Validation error.");
    throw $e;
}

3 Contract definition

3.1 Type and parameter

Contracts are used to define the type of data to filter or validate. They can be written as a string or as an associative array.

A contract contains at least the type that the data must respect.
Examples: 'int', 'string'

A contract may also take additional parameters.
Parameters can be written in two different ways:

  • They can be added to the contract configuration string, following the type definition. Parameters are separated by semicolons (;), and parameter names are separated from their values by a colon (:).

    Example: 'int; default: 3'

  • Contracts can also be written as an associative array. The type and parameters are represented as key/value pairs in the array. This format is required when defining sub-contracts.

    Example: ['type' => 'int', 'default' => 3]
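
To make the relationship between the two notations concrete, here is a minimal sketch of how a contract string could be split into its array form. It is not Temma's actual parser: the function name parseContractString() is hypothetical, and parameter values are left as plain strings (no list splitting, no type conversion).

```php
<?php
// Illustrative sketch only, NOT Temma's parser.
// parseContractString() is a hypothetical helper name.
function parseContractString(string $contract): array
{
    // parameters are separated by semicolons
    $chunks = array_map('trim', explode(';', $contract));
    // the first chunk is the type
    $result = ['type' => array_shift($chunks)];
    foreach ($chunks as $chunk) {
        if ($chunk === '') {
            continue; // ignore empty chunks
        }
        // split on the first colon only, so that values may contain colons
        [$name, $value] = array_map('trim', explode(':', $chunk, 2)) + [1 => ''];
        $result[$name] = $value;
    }
    return $result;
}

var_export(parseContractString('int; default: 3'));
```

Splitting on the first colon only is a deliberate choice in this sketch: it keeps values such as '15:00:00' intact.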

Finally, when a contract is written as an array that has no 'type' key, the type 'assoc' is inferred automatically, and the array itself is treated as the value of the 'keys' parameter.

Example: ['name' => 'string']
is equivalent to: ['type' => 'assoc', 'keys' => ['name' => 'string']]


3.2 Nullable types

Any type can be made nullable by prefixing it with a question mark: '?bool', '?false', '?true'


3.3 Multiple types

It is possible to use multiple types. For example: 'null|int|string', 'int|float'


3.4 Automatic validation (pass-through)

If the provided contract is the value null (and not the string "null"), the input data is returned as-is.


3.5 Strict mode

By default, the DataFilter object operates in non-strict mode:

  • If necessary, it performs type casting (for example, converting a string containing digits into an integer).
  • For some parameters, it behaves permissively: a number that exceeds the defined maximum is replaced by the maximum instead of raising an error, and a string that exceeds the defined maximum length is truncated.
  • Undefined keys in an associative array are removed.

To run filtering in strict mode, set the third parameter of process() to true (see the Usage section above).

A contract may also force strict or non-strict evaluation:

  • To enforce strict evaluation, prefix the type with an equal sign (=).
    Examples: '=int', '=string'
  • To enforce non-strict evaluation, prefix the type with a tilde (~).
    Examples: '~int', '~string'
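
The difference between the two modes can be pictured with a small stand-alone function. This is only a sketch of the behaviour described above for an integer with a maximum: filterIntWithMax() is a hypothetical name, and the exception classes are PHP's generic ones, not the ones thrown by DataFilter.

```php
<?php
// Hypothetical helper mimicking the documented behaviour of an
// integer contract with a maximum. Not part of Temma.
function filterIntWithMax($value, int $max, bool $strict): int
{
    if ($strict) {
        // strict mode: no casting, no clamping
        if (!is_int($value)) {
            throw new InvalidArgumentException('Not an integer.');
        }
        if ($value > $max) {
            throw new RangeException('Value exceeds the maximum.');
        }
        return $value;
    }
    // non-strict mode: cast when possible...
    if (!is_numeric($value) && !is_bool($value)) {
        throw new InvalidArgumentException('Not convertible to an integer.');
    }
    // ...and clamp to the maximum instead of failing
    return min((int)$value, $max);
}

echo filterIntWithMax('42', 100, false), PHP_EOL; // 42
echo filterIntWithMax(250, 100, false), PHP_EOL;  // 100
```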

3.6 Default value

Using the default parameter, it is possible to provide a fallback value, which will be used if the input data is invalid. The default value must be of the same type as the type being validated.

Examples:

'bool; default: false'

[
    'type'    => 'list',
    'default' => [1, 2, 3],
]

3.7 Other parameters

  • min: Minimum value (for a number, date, time, or geographic coordinate).
  • max: Maximum value (for a number, date, time, or geographic coordinate).
  • minLen: Minimum length (number of characters for a string or URL, number of elements for a list).
  • maxLen: Maximum length (number of characters for a string or URL, number of elements for a list).
  • mask: Regular expression (for a string, email address, or URL).
  • format: Input and output format for a date/time.
  • inFormat: Input format for a date/time.
  • outFormat: Output format for a date/time.
  • values: Accepted values for an enumeration.
  • contract: Validation contract for the content of a list.
  • keys: List of expected keys in an associative array.
  • mime: List of authorized MIME types (for binary or base64-encoded content).
  • charset: Character set. Can be a simple string ('utf-8') or a list ('utf-8, iso-8859-1'). In the case of a list, the first character set is the conversion target (in non-strict mode), the following ones are accepted alternative character sets.

Parameters that specify a size (minLen and maxLen) can use measurement units (case insensitive): K (kilo), M (mega), G (giga), T (tera), P (peta), E (exa).
Example: 'maxLen: 10M' for 10 megabytes (i.e. 10,485,760).
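
The arithmetic behind these units can be sketched as follows. The example assumes binary multiples (powers of 1024), which matches the 10M = 10,485,760 figure above; parseSize() is a hypothetical helper name, not a Temma function.

```php
<?php
// Hypothetical helper: converts '10M', '512', '2k'... into a number.
// Assumes binary multiples (1 K = 1024), as in the 10M example above.
function parseSize(string $size): int
{
    if (!preg_match('/^\s*(\d+)\s*([KMGTPE])?\s*$/i', $size, $m)) {
        throw new InvalidArgumentException("Invalid size '$size'.");
    }
    $exponents = ['K' => 1, 'M' => 2, 'G' => 3, 'T' => 4, 'P' => 5, 'E' => 6];
    // the unit is optional and case insensitive
    $exponent = (isset($m[2]) && $m[2] !== '') ? $exponents[strtoupper($m[2])] : 0;
    return (int)$m[1] * (1024 ** $exponent);
}

echo parseSize('10M'), PHP_EOL; // 10485760
```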

4 Scalar types

4.1 null

  • The input data must be null.
  • No parameters accepted.

Examples:

'null'
['type' => 'null']

4.2 false, true, bool

false
  • Strict mode: the data must be equal to false.
  • Non-strict mode: the data must be equal to false or any value that PHP automatically converts to false (null, 0, empty string, empty array).
  • Accepted parameter: default

Examples:

// validates a false value
'false'
['type' => 'false']

// returns false even if the data had another value
'=false; default: false'

true
  • Strict mode: the data must be equal to true.
  • Non-strict mode: the data must be equal to true or any value that PHP automatically converts to true (non-zero number, non-empty string, non-empty array, object).
  • Accepted parameter: default

Examples:

// validates a true value
'true'
['type' => 'true']

// validates a true value, always in non-strict mode
'~true'

bool
  • Strict mode: the data must be equal to true or false.
  • Non-strict mode: the data is typecast to a boolean.
  • Accepted parameter: default
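
As a reminder of what "values that PHP automatically converts" means, the sketch below applies PHP's own boolean cast to a few values. It only illustrates the language rule; whether DataFilter treats every one of these values the same way (for instance the string '0') is not stated here and should be checked against the implementation. The function name castsToFalse() is hypothetical.

```php
<?php
// PHP's native boolean conversion, shown on a few representative values.
function castsToFalse($value): bool
{
    return (bool)$value === false;
}

// values that PHP converts to false
$convertedToFalse = [null, 0, 0.0, '', '0', []];
// values that PHP converts to true (note the strings 'false' and '0.0')
$convertedToTrue = [1, -1, 0.5, 'a', 'false', '0.0', [0], new stdClass()];

foreach ($convertedToFalse as $value) {
    var_dump(castsToFalse($value)); // bool(true) for each value
}
foreach ($convertedToTrue as $value) {
    var_dump(castsToFalse($value)); // bool(false) for each value
}
```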

Examples:

// validates a boolean
'bool'
['type' => 'bool']

// boolean with a default false value
'bool; default: false'

// boolean with a default true value
[
    'type'    => 'bool',
    'default' => true,
]

4.3 int, float

int
  • Strict mode: the data must be an integer.
  • Non-strict mode: the data may be a boolean (converted to 0 or 1), an integer, a float (converted to integer), or a string containing digits (converted to integer).
  • Accepted parameters: default, min, max

Examples:

// validates an integer
'int'

// validates an integer in non-strict mode, with a default value
'~int; default: 3'

// validates an integer in strict mode, greater than or equal to 5
'=int; min: 5'

// integer between 5 and 8, with a default value
'int; min: 5; max: 8; default: 6'

// validates an integer
['type' => 'int']

// with a minimum value
[
    'type' => 'int',
    'min'  => 5,
]

float
  • Strict mode: the data must be a float.
  • Non-strict mode: the data may be a boolean (converted to 0.0 or 1.0), a float, an integer (converted to float), or a string containing digits (converted to float).
  • Accepted parameters: default, min, max

Examples:

// validates a float
'float'

// validates a float in non-strict mode, with a minimum value
'~float; min: 2.7'

// validates a float in strict mode, with a maximum value
[
    'type'  => '=float',
    'max'   => 18.5,
]

4.4 string

  • Strict mode: the data must be a string.
  • Non-strict mode: the data may be a string, a boolean (converted to "true" or "false"), or any scalar value (converted to string).
  • Accepted parameters: default, minLen, maxLen, mask, charset

Examples:

// validates a string, with a default value
'string; default: abc'

// validates a string between 3 and 12 characters long
'string; minLen: 3; maxLen: 12'

// validates a string matching a regular expression
'string; mask: ^[Bb][Oo0]..[Oo0].r$'
// equivalent
[
    'type' => 'string',
    'mask' => '^[Bb][Oo0]..[Oo0].r$',
]

// validates a UTF-8 encoded string
'string; charset: utf-8'

// validates a string, and converts it to UTF-8 if necessary (in non-strict mode)
'string; charset: utf-8, iso-8859-1'

5 Advanced types

5.1 email, url, uuid

email
  • The data must be a valid email address.
  • Accepted parameters: default, mask

Examples:

// validates an email address, with a default value
'email; default: contact@domain.com'

// with a regular expression
'email; mask: @domain.com$'

// with a regular expression and a default value
[
    'type'    => 'email',
    'default' => 'contact@domain.com',
    'mask'    => '@domain.com$',
]

url
  • The data must be a valid URL.
  • Accepted parameters: default, minLen, maxLen, mask

Examples:

// validates a URL
'url'

// URL shorter than 200 characters
'url; maxLen: 200'

// with a regular expression
[
    'type' => 'url',
    'mask' => 'https?:..www.domain.com/.$',
]

uuid
  • The data must be a valid UUID.
  • Accepted parameter: default

Example:

// validates a UUID with a default value
'uuid; default: 123e4567-e89b-12d3-a456-426614174003'

5.2 binary, base64

binary
  • Checks if the provided data is a valid binary string.
  • The mime parameter is optional, and accepts one or more MIME types (separated by commas). Types can be specific (e.g., image/jpeg) or generic (image). If the data is not one of the listed types, an exception is thrown.
  • Accepted parameters: default, minLen, maxLen, mime, charset

This type does not return a plain string, but an associative array containing three keys:
  • binary: the binary content.
  • mime: the content's MIME type.
  • charset: the content's character set (if detected, otherwise null).

Examples:

// validate a GIF or PNG image
'binary; mime: image/gif, image/png'

// validate image or PDF
'binary; mime: image, application/pdf'

base64
  • The data must be Base64-encoded.
  • Mode:
    • non-strict: data is decoded.
    • strict: data is decoded, then re-encoded, and the result is compared to the original data.
  • With the mime parameter, it is possible to specify one or more MIME types (comma separated). If the base64-encoded content does not match any of the listed types, an exception is thrown.
  • Accepted parameters: default, minLen, maxLen, mime, charset

This type does not directly return the decoded content of the Base64 stream provided as input. Like the binary type, it returns an associative array containing three keys:
  • binary: the decoded binary content.
  • mime: the content's MIME type.
  • charset: the content's character set (if detected, or null).
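
The strict-mode check described above (decode, re-encode, compare) can be sketched with PHP's built-in functions. This is only an illustration of the principle: decodeBase64() is a hypothetical name, it returns the decoded bytes rather than the three-key array, and MIME and charset detection are left out.

```php
<?php
// Hypothetical helper illustrating the decode / re-encode comparison.
function decodeBase64(string $data, bool $strict): string
{
    // in strict mode, base64_decode() rejects characters outside the alphabet
    $binary = base64_decode($data, $strict);
    if ($binary === false) {
        throw new InvalidArgumentException('Not valid Base64.');
    }
    // strict mode: the re-encoded result must be identical to the input
    if ($strict && base64_encode($binary) !== $data) {
        throw new InvalidArgumentException('Base64 round trip does not match the input.');
    }
    return $binary;
}

echo decodeBase64('aGVsbG8=', true), PHP_EOL; // hello
```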

Examples:

// validate any base64-encoded content
'base64'

// validate a GIF or PNG image
'base64; mime: image/gif, image/png'

// validate image or PDF
'base64; mime: image, application/pdf'

5.3 date, time, datetime

date
  • If the data is an integer, float, or string containing digits, it is interpreted as a Unix timestamp.
  • If the data is a string, it must match the input format (Y-m-d by default).
  • Mode:
    • strict: the date must be valid (the date 2026/12/33 is rejected).
    • non-strict: the date is converted (2026/12/33 becomes 2027/01/02).
  • The data is returned using the specified output format (Y-m-d by default).
  • Accepted parameters: default, format, inFormat, outFormat, min, max
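
The strict and non-strict behaviours described above correspond to two standard PHP functions: checkdate() rejects an impossible date, while mktime() rolls the overflow into the following month. The sketch below uses them directly; normalizeDate() is a hypothetical name, and nothing here claims that DataFilter is implemented this way.

```php
<?php
// fixed timezone so that mktime() and date() agree regardless of the server configuration
date_default_timezone_set('UTC');

// Hypothetical helper: strict mode rejects impossible dates,
// non-strict mode lets PHP roll the overflow forward.
function normalizeDate(int $year, int $month, int $day, bool $strict): string
{
    if ($strict && !checkdate($month, $day, $year)) {
        throw new InvalidArgumentException('Invalid date.');
    }
    return date('Y-m-d', mktime(0, 0, 0, $month, $day, $year));
}

echo normalizeDate(2026, 12, 33, false), PHP_EOL; // 2027-01-02
```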

Examples:

// validates a date with a specified output format
'date; outFormat: d/m/Y'

// specifying the input format, with a minimum date of January 1st, 2000
'date; inFormat: d/m/Y; min: 01/01/2000'

time
  • If the data is an integer, float, or string containing digits, it is interpreted as a Unix timestamp.
  • If the data is a string, it must match the input format (H:i:s by default).
  • Mode:
    • strict: the time must be valid (the time 13:65:34 is rejected).
    • non-strict: the time is converted (13:65:34 becomes 14:05:34).
  • The data is returned using the specified output format (H:i:s by default).
  • Accepted parameters: default, format, inFormat, outFormat, min, max

Examples:

// validates a time with identical input and output format
'time; format: H:i'

// validates a time between 15:00 and 17:00
'time; min: 15:00:00; max: 17:00:00'

datetime
  • If the data is an integer, float, or string containing digits, it is interpreted as a Unix timestamp.
  • If the data is a string, it must match the input format (Y-m-d H:i:s by default).
  • Mode:
    • strict: the date/time must be valid (the datetime 2026/12/33 13:65:34 is rejected).
    • non-strict: the datetime is converted (2026/12/33 13:65:34 becomes 2027/01/02 14:05:34).
  • The data is returned using the specified output format (Y-m-d H:i:s by default).
  • Accepted parameters: default, format, inFormat, outFormat, min, max

Examples:

// validates a datetime using a specific input format
'datetime; inFormat: d/m/Y H:i'

// validates a datetime returned as a timestamp, specifying input format and allowed range
[
    'type'      => 'datetime',
    'inFormat'  => 'd/m/Y H:i:s',
    'outFormat' => 'U',
    'min'       => '2000-01-01 00:00',
    'max'       => '2050-12-31 23:59',
]

5.4 isbn, ean

isbn
  • The data must be a valid ISBN.
  • Accepted parameter: default
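
For reference, the standard ISBN-10 check (weighted sum divisible by 11) can be written in a few lines. This is a generic sketch of the public algorithm, not DataFilter's code: isValidIsbn10() is a hypothetical name, and ISBN-13 (which uses the EAN-13 checksum) is not covered.

```php
<?php
// Generic ISBN-10 checksum: digits are weighted 10 down to 1,
// and the total must be a multiple of 11. Hypothetical helper name.
function isValidIsbn10(string $isbn): bool
{
    // hyphens and spaces are only separators
    $isbn = str_replace(['-', ' '], '', $isbn);
    if (!preg_match('/^\d{9}[\dXx]$/', $isbn)) {
        return false;
    }
    $sum = 0;
    for ($i = 0; $i < 10; $i++) {
        // a final 'X' stands for the value 10
        $digit = ($i === 9 && strtoupper($isbn[$i]) === 'X') ? 10 : (int)$isbn[$i];
        $sum += $digit * (10 - $i);
    }
    return $sum % 11 === 0;
}

var_dump(isValidIsbn10('0-306-40615-2')); // bool(true)
```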

Examples:

// validates an ISBN with a default ISBN-10
'isbn; default: 0-306-40615-2'
// same without hyphens
'isbn; default: 0306406152'

// validates an ISBN with a default ISBN-13
'isbn; default: 978-3-16-148410-0'
// same without hyphens
'isbn; default: 9783161484100'

ean
  • The data must be a valid EAN code.
  • Accepted parameter: default
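
The sketch below shows the standard EAN-13 check digit computation, as an illustration of what "valid" usually means for such a code. It assumes 13-digit codes only (other EAN lengths are ignored), and isValidEan13() is a hypothetical name rather than a Temma function.

```php
<?php
// Generic EAN-13 checksum: the first 12 digits are weighted 1, 3, 1, 3...
// and the 13th digit brings the total up to a multiple of 10.
function isValidEan13(string $code): bool
{
    if (!preg_match('/^\d{13}$/', $code)) {
        return false;
    }
    $sum = 0;
    for ($i = 0; $i < 12; $i++) {
        $sum += (int)$code[$i] * ($i % 2 === 0 ? 1 : 3);
    }
    $checkDigit = (10 - $sum % 10) % 10;
    return $checkDigit === (int)$code[12];
}

var_dump(isValidEan13('4006381333931')); // bool(true)
```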

Example:

// validates an EAN with a default value
'ean; default: 4006381333931'

5.5 ip, ipv4, ipv6

ip
  • The data must be a valid IP address (IPv4 or IPv6).
  • Accepted parameter: default

Examples:

// validates an IP address with a default IPv4
'ip; default: 127.0.0.1'

// validates an IP address with a default IPv6
[
    'type'    => 'ip',
    'default' => '::1',
]

ipv4
  • The data must be a valid IPv4 address.
  • Accepted parameter: default

Example:

// validates an IPv4 with a default value
'ipv4; default: 127.0.0.1'

ipv6
  • The data must be a valid IPv6 address.
  • Accepted parameter: default
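
In plain PHP, the distinction between the three IP types can be reproduced with filter_var() and its IPv4/IPv6 flags. The sketch below is only a point of comparison; ipVersion() is a hypothetical name and the way DataFilter performs this check internally is not documented here.

```php
<?php
// Hypothetical helper: returns 4, 6, or null when the string is not an IP address.
function ipVersion(string $ip): ?int
{
    if (filter_var($ip, FILTER_VALIDATE_IP, FILTER_FLAG_IPV4) !== false) {
        return 4;
    }
    if (filter_var($ip, FILTER_VALIDATE_IP, FILTER_FLAG_IPV6) !== false) {
        return 6;
    }
    return null;
}

var_dump(ipVersion('127.0.0.1')); // int(4)
var_dump(ipVersion('::1'));       // int(6)
```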

Example:

// validates an IPv6 with a default value
'ipv6; default: ::1'

5.6 mac, port

mac
  • The data must be a valid MAC address.
  • Accepted parameter: default

Example:

// validates a MAC address with a default value
'mac; default: 00:1A:2B:3C:4D:5E'

port
  • Strict mode: the data must be an integer between 1 and 65,535.
  • Non-strict mode: the data must be convertible into an integer between 1 and 65,535.
  • Accepted parameters: default, min, max

Example:

// validates a privileged port (below 1024)
'port; max: 1023'

5.7 slug, color

slug
  • Strict mode: the data must be a string containing only lowercase unaccented characters (a to z) and hyphens (-).
  • Non-strict mode: the data is converted using \Temma\Utils\Text::urlize().
  • Accepted parameter: default

color
  • The data must be a string containing a valid hexadecimal color, optionally starting with a hash (#).
  • The returned value always starts with a hash and is lowercase.
  • Accepted parameter: default

5.8 geo, phone

geo
  • The data must be a geographic coordinate.
  • Accepted parameter: default

Example:

// default coordinates of Paris
'geo; default: 48.8566, 2.3522'

phone

Validates phone numbers that meet one of the following conditions:

  • Start with 00 followed by 1 to 15 digits.
  • Start with + followed by 1 to 15 digits.
  • Contain 1 to 15 digits.

Notes:

  • Strict mode: the returned number is stripped of spaces, dashes, dots, and parentheses.
  • Non-strict mode: spaces, dashes, dots, and parentheses are preserved.
  • Accepted parameter: default

6 Complex types

6.1 enum

  • Enumeration whose value must be one of the listed options.
  • Accepted parameters: default, values

Examples:

// enumeration with three possible values and a default value
'enum; values: red, green, blue; default: red'

// equivalent
[
    'type'    => 'enum',
    'values'  => ['red', 'green', 'blue'],
    'default' => 'red',
]

6.2 list, assoc

list
  • The data must be an array.
  • If a sub-contract is given, all elements of the list must validate it.
  • Accepted parameters: default, contract, minLen, maxLen

Examples:

// list where all elements are integers
'list; contract: int'

// equivalent
[
    'type'     => 'list',
    'contract' => 'int',
]

// list of 3 to 5 integers
'list; contract: int; minLen: 3; maxLen: 5'

assoc
  • The data must be an associative array, whose keys may be defined.
  • If the keys are defined:
    • Strict mode: an exception is thrown if a non-defined key is found.
    • Non-strict mode: non-defined keys are silently discarded from the returned data.
    • In any case, if the list of defined keys contains a "..." entry, non-defined keys are accepted as-is.
    • If the list of defined keys contains a "..." entry associated with a contract, non-defined keys are validated with this contract.
  • Accepted parameters: default, keys
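
The handling of non-defined keys can be summarized by the following sketch. It is deliberately reduced to a flat list of key names: mandatory-key checks, per-key contracts, and a "..." entry associated with a contract are left out. filterKeys() is a hypothetical name, and the exception class is PHP's generic one.

```php
<?php
// Hypothetical helper showing the three documented outcomes for unknown keys:
// accepted ("..."), rejected (strict), or silently discarded (non-strict).
function filterKeys(array $data, array $definedKeys, bool $strict): array
{
    // a "..." entry means that non-defined keys are accepted as-is
    if (in_array('...', $definedKeys, true)) {
        return $data;
    }
    $unknown = array_diff(array_keys($data), $definedKeys);
    if ($strict && $unknown) {
        throw new InvalidArgumentException('Unexpected key(s): ' . implode(', ', $unknown));
    }
    // non-strict mode: keep only the defined keys
    return array_intersect_key($data, array_flip($definedKeys));
}

$input = ['id' => 1, 'name' => 'Alice', 'extra' => true];
var_export(filterKeys($input, ['id', 'name'], false));
```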

Example: associative array with 'id' and 'name' keys

'assoc; keys: id, name'

// equivalent
[
    'type' => 'assoc',
    'keys' => [
        'id',
        'name',
    ]
]

Example: associative array with mandatory 'id' and 'name' keys, where other keys are accepted

'assoc; keys: id, name, ...'

// equivalent
[
    'type' => 'assoc',
    'keys' => [
        'id',
        'name',
        '...'
    ]
]

Example: associative array with mandatory 'id' and 'name' keys, and other keys are accepted only if they are integers

[
    'type' => 'assoc',
    'keys' => [
        'id',
        'name',
        '...' => 'int',
    ]
]

Example: list whose elements are associative arrays with defined keys:

[
    'type'     => 'list',
    'contract' => 'assoc; keys: id, name',
]

// equivalent
[
    'type'     => 'list',
    'contract' => [
        'type' => 'assoc',
        'keys' => ['id', 'name'],
    ]
]

Example: associative array with key 'id' (mandatory) and key 'name' (optional)

'assoc; keys: id, name?'

// equivalent
[
    'type' => 'assoc',
    'keys' => [
        'id',
        'name?',
    ]
]

// equivalent
[
    'type' => 'assoc',
    'keys' => [
        'id',
        'name' => [
            'mandatory' => false,
        ],
    ]
]

Example: defining the expected type of some keys

[
    'type' => 'assoc',
    'keys' => [
        'id'   => 'int',
        'name' => 'string',
        'date',
    ]
]

Example: associative array with typed keys, one of them optional

[
    'type' => 'assoc',
    'keys' => [
        'id'   => 'int',
        'name' => [
            'type'      => 'string',
            'mandatory' => false,
        ]
    ]
]

// equivalent
[
    'type' => 'assoc',
    'keys' => [
        'id'    => 'int',
        'name?' => 'string',
    ]
]

Example: associative array with typed keys, accepting other non-defined keys

[
    'type' => 'assoc',
    'keys' => [
        'id'    => 'int',
        'name?' => 'string',
        '...',
    ]
]

Complex example:

[
    'type' => 'assoc',
    'keys' => [
        'id'          => 'int',
        'isCreated'   => 'bool',
        'name'        => 'string; default: abc',
        'color'       => [
            'type'      => 'enum',
            'values'    => ['red', 'green', 'blue'],
            'default'   => 'red',
            'mandatory' => false,
        ],
        'creator'     => [
            'type' => 'assoc',
            'keys' => [
                'id'           => 'int',
                'name',
                'dateCreation',
            ],
        ],
        'children'    => [
            'type'      => 'list',
            'mandatory' => false,
            'contract'  => [
                'type' => 'assoc',
                'keys' => [
                    'id'   => 'int',
                    'name',
                ]
            ],
        ],
        'identifiers' => [
            'type'     => 'list',
            'contract' => 'int',
        ],
    ],
]

6.3 json

  • The data must be a string containing valid JSON.
  • If a sub-contract is given, the JSON content must validate it.
  • Accepted parameters: default, contract
The returned data is the result of JSON deserialization.
It is not a JSON stream.
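
To illustrate what "the result of JSON deserialization" means when a sub-contract is involved, the sketch below decodes a JSON string and then checks that the decoded value is a list of integers. It relies only on PHP's json_decode(); decodeJsonIntList() is a hypothetical name, and the exceptions are PHP's own, not the ones thrown by DataFilter.

```php
<?php
// Hypothetical helper: decode a JSON string, then validate the decoded
// value as a list of integers (roughly what a 'list' of 'int' sub-contract expresses).
function decodeJsonIntList(string $json): array
{
    // throws JsonException when the string is not valid JSON
    $data = json_decode($json, true, 512, JSON_THROW_ON_ERROR);
    // a list is an array whose keys are 0, 1, 2...
    if (!is_array($data) || array_values($data) !== $data) {
        throw new InvalidArgumentException('The JSON content is not a list.');
    }
    foreach ($data as $item) {
        if (!is_int($item)) {
            throw new InvalidArgumentException('The list contains a non-integer value.');
        }
    }
    return $data; // a PHP array, not a JSON string
}

var_export(decodeJsonIntList('[1, 2, 3]'));
```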

Examples:

// validates a JSON stream (regardless of its content)
'json'


// validates a JSON which contains a list of integers
[
    'type'     => 'json',
    'contract' => 'list; contract: int',
]

// validates a JSON which contains an associative array with defined keys
[
    'type'     => 'json',
    'contract' => 'assoc; keys: id, name, role',
]

// equivalent to the previous one
[
    'type'     => 'json',
    'contract' => [
        'type'    => 'assoc',
        'keys'    => [
            'id',
            'name',
            'role',
        ]
    ]
]