DataFilter helper


1 Presentation

Helper used to filter or validate data in order to check that it complies with a contract. This can be useful to ensure that incoming data in an API matches what is expected.


2 Contract definition

2.1 Type and parameters

Contracts are used to define the type of data to filter or validate. They can be written as a string or as an associative array.

A contract contains at least the type that the data must respect.
Examples: 'int', 'string'

A contract may also take additional parameters.
Parameters can be written in two different ways:

  • They can be added to the contract configuration string, following the type definition. Parameters are separated by semicolons (;), and parameter names are separated from their values by a colon (:).

    Example: 'int; default: 3'

  • Contracts can also be written as an associative array. The type and parameters are represented as key/value pairs in the array. This format is required when defining sub-contracts.

    Example: ['type' => 'int', 'default' => 3]
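The correspondence between the two notations can be sketched as follows. This is a minimal illustration, not the helper's actual parser: the function name contractStringToArray() is hypothetical, parameter values are kept as raw strings (no casting), and a value that itself contains a semicolon is not handled.

```php
<?php

/**
 * Hypothetical helper (not part of Temma): converts the string form of a
 * contract into the array form. Parameter values are kept as raw strings.
 */
function contractStringToArray(string $contract): array
{
    // Parameters are separated by semicolons; the first chunk is the type.
    $chunks = array_map('trim', explode(';', $contract));
    $result = ['type' => array_shift($chunks)];
    foreach ($chunks as $chunk) {
        if ($chunk === '') {
            continue; // ignore empty chunks
        }
        // Split on the first colon only, so values such as "15:00:00" or "::1" stay intact.
        [$name, $value] = array_map('trim', explode(':', $chunk, 2) + [1 => '']);
        $result[$name] = $value;
    }
    return $result;
}

var_dump(contractStringToArray('int; default: 3'));
```

Splitting on the first colon only is what lets time and IPv6 values, which contain colons themselves, pass through unchanged in this sketch.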


2.2 Nullable types

Any type can be made nullable by prefixing it with a question mark. Examples: '?bool', '?false', '?true'


2.3 Multiple types

It is possible to use multiple types. For example: 'null|int|string', 'int|float'


2.4 Automatic validation (pass-through)

If the provided contract is the value null (and not the string "null"), the input data is returned as-is.


2.5 Strict mode

By default, the DataFilter object operates in non-strict mode. When necessary, it casts types (for example, converting a string containing digits into an integer). For some parameters it also behaves permissively (for example, if a number exceeds the defined maximum, it returns the maximum instead of throwing an error).

To execute filtering in strict mode, you must set the strict parameter to true (see how to use the DataFilter object below).

A contract may also force strict or non-strict evaluation:

  • To enforce strict evaluation, prefix the type with an equal sign (=).
    Examples: '=int', '=string'
  • To enforce non-strict evaluation, prefix the type with a tilde (~).
    Examples: '~int', '~string'
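
The interaction between the global strict flag, the type prefixes, and the permissive handling of a maximum can be sketched as follows. This is an illustrative stand-in written for PHP 8, not the helper's code: sketchFilterInt() is a hypothetical name, it only covers integers and digit-only strings, and \InvalidArgumentException stands in for the library's own exception.

```php
<?php

/**
 * Illustrative stand-in for the behaviour described above (NOT Temma's code).
 * Filters an integer against an optional maximum.
 *
 * @throws \InvalidArgumentException Stand-in for the library's own exception.
 */
function sketchFilterInt(mixed $data, string $type, ?int $max, bool $strict): int
{
    // A "=" or "~" prefix on the type overrides the global strict flag.
    if (str_starts_with($type, '=')) {
        $strict = true;
    } elseif (str_starts_with($type, '~')) {
        $strict = false;
    }
    if ($strict) {
        // Strict: no casting, and an out-of-range value is an error.
        if (!is_int($data)) {
            throw new \InvalidArgumentException('Not an integer.');
        }
        if ($max !== null && $data > $max) {
            throw new \InvalidArgumentException('Above the maximum.');
        }
        return $data;
    }
    // Non-strict: cast digit-only strings, then clamp to the maximum.
    if (is_string($data) && ctype_digit($data)) {
        $data = (int) $data;
    }
    if (!is_int($data)) {
        throw new \InvalidArgumentException('Not convertible to an integer.');
    }
    return ($max !== null) ? min($data, $max) : $data;
}

var_dump(sketchFilterInt('42', 'int', null, false)); // cast in non-strict mode
var_dump(sketchFilterInt(12, '~int', 8, true));      // "~" wins over the strict flag
```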

2.6 Default value

Using the default parameter, it is possible to provide a fallback value, which will be used if the input data is invalid. The default value must be of the same type as the type being validated.

Examples:

'bool; default: false'

[
    'type'    => 'list',
    'default' => [1, 2, 3],
]

2.7 Other parameters

  • min: Minimum value (for a number, date, time, or geographic coordinate).
  • max: Maximum value (for a number, date, time, or geographic coordinate).
  • minLen: Minimum length (for a string or URL).
  • maxLen: Maximum length (for a string or URL).
  • mask: Regular expression (for a string, email address, or URL).
  • format: Input and output format for a date/time.
  • inFormat: Input format for a date/time.
  • outFormat: Output format for a date/time.
  • values: Accepted values for an enumeration.
  • contract: Validation contract for the content of a list.
  • keys: List of expected keys in an associative array.

3 Scalar types

3.1 null

  • The input data must be null.
  • No parameters accepted.

Examples:

'null'
['type' => 'null']

3.2 false, true, bool

false
  • Strict mode: the data must be equal to false.
  • Non-strict mode: the data must be equal to false or any value that PHP automatically converts to false (null, 0, empty string, empty array).
  • Accepted parameter: default

Examples:

// validates a false value
'false'
['type' => 'false']

// returns false even if the data had another value
'=false; default: false'

true
  • Strict mode: the data must be equal to true.
  • Non-strict mode: the data must be equal to true or any value that PHP automatically converts to true (non-zero number, non-empty string, non-empty array, object).
  • Accepted parameter: default

Examples:

// validates a true value
'true'
['type' => 'true']

// validates a true value, always in non-strict mode
'~true'

bool
  • Strict mode: the data must be equal to true or false.
  • Non-strict mode: the data is typecast to a boolean.
  • Accepted parameter: default

Examples:

// validates a boolean
'bool'
['type' => 'bool']

// boolean with a default false value
'bool; default: false'

// boolean with a default true value
[
    'type'    => 'bool',
    'default' => true,
]

3.3 int, float

int
  • Strict mode: the data must be an integer.
  • Non-strict mode: the data may be a boolean (converted to 0 or 1), an integer, a float (converted to integer), or a string containing digits (converted to integer).
  • Accepted parameters: default, min, max

Examples:

// validates an integer
'int'

// validates an integer in non-strict mode, with a default value
'~int; default: 3'

// validates an integer in strict mode, greater than or equal to 5
'=int; min: 5'

// integer between 5 and 8, with a default value
'int; min: 5; max: 8; default: 6'

// validates an integer
['type' => 'int']

// with a minimum value
[
    'type' => 'int',
    'min'  => 5,
]

float
  • Strict mode: the data must be a float.
  • Non-strict mode: the data may be a boolean (converted to 0.0 or 1.0), a float, an integer (converted to float), or a string containing digits (converted to float).
  • Accepted parameters: default, min, max

Examples:

// validates a float
'float'

// validates a float in non-strict mode, with a minimum value
'~float; min: 2.7'

// validates a float in strict mode, with a maximum value
[
    'type' => '=float',
    'max'  => 18.5,
]

3.4 string

  • Strict mode: the data must be a string.
  • Non-strict mode: the data may be a string, a boolean (converted to "true" or "false"), or any other scalar value (converted to string).
  • Accepted parameters: default, minLen, maxLen, mask

Examples:

// validates a string, with a default value
'string; default: abc'

// validates a string between 3 and 12 characters long
'string; minLen: 3; maxLen: 12'

// validates a string matching a regular expression
'string; mask: ^[Bb][Oo0]..[Oo0].r$'
// equivalent
[
    'type' => 'string',
    'mask' => '^[Bb][Oo0]..[Oo0].r$',
]

4 Advanced types

4.1 email, url, uuid

email
  • The data must be a valid email address.
  • Accepted parameters: default, mask

Examples:

// validates an email address, with a default value
'email; default: contact@domain.com'

// with a regular expression
'email; mask: @domain.com$'

// with a regular expression and a default value
[
    'type'    => 'email',
    'default' => 'contact@domain.com',
    'mask'    => '@domain.com$',
]

url
  • The data must be a valid URL.
  • Accepted parameters: default, minLen, maxLen, mask

Examples:

// validates a URL
'url'

// URL of at most 200 characters
'url; maxLen: 200'

// with a regular expression
[
    'type' => 'url',
    'mask' => 'https?:..www.domain.com/.$',
]

uuid
  • The data must be a valid UUID.
  • Accepted parameter: default

Example:

// validates a UUID with a default value
'uuid; default: 123e4567-e89b-12d3-a456-426614174003'

4.2 date, time, datetime

date
  • If the data is an integer, float, or string containing digits, it is interpreted as a Unix timestamp.
  • If the data is any other string, it must match the input format (Y-m-d by default).
  • Mode:
    • strict: the date must be valid (the date 2026/12/33 is rejected).
    • non-strict: the date is converted (2026/12/33 becomes 2027/01/02).
  • The data is returned using the specified output format (Y-m-d by default).
  • Accepted parameters: default, format, inFormat, outFormat, min, max

Examples:

// validates a date with a specified output format
'date; outFormat: d/m/Y'

// specifying the input format, with a minimum date of January 1st, 2000
'date; inFormat: d/m/Y; min: 01/01/2000'

time
  • If the data is an integer, float, or string containing digits, it is interpreted as a Unix timestamp.
  • If the data is any other string, it must match the input format (H:i:s by default).
  • Mode:
    • strict: the time must be valid (the time 13:65:34 is rejected).
    • non-strict: the time is converted (13:65:34 becomes 14:05:34).
  • The data is returned using the specified output format (H:i:s by default).
  • Accepted parameters: default, format, inFormat, outFormat, min, max

Examples:

// validates a time with identical input and output format
'time; format: H:i'

// validates a time between 15:00 and 17:00
'time; min: 15:00:00; max: 17:00:00'

datetime
  • If the data is an integer, float, or string containing digits, it is interpreted as a Unix timestamp.
  • If the data is any other string, it must match the input format (Y-m-d H:i:s by default).
  • Mode:
    • strict: the date/time must be valid (the datetime 2026/12/33 13:65:34 is rejected).
    • non-strict: the datetime is converted (2026/12/33 13:65:34 becomes 2027/01/02 14:05:34).
  • The data is returned using the specified output format (Y-m-d H:i:s by default).
  • Accepted parameters: default, format, inFormat, outFormat, min, max

Examples:

// validates a datetime using a specific input format
'datetime; inFormat: d/m/Y H:i'

// validates a datetime returned as a timestamp, specifying input format and allowed range
[
    'type'      => 'datetime',
    'inFormat'  => 'd/m/Y H:i:s',
    'outFormat' => 'U',
    'min'       => '2000-01-01 00:00',
    'max'       => '2050-12-31 23:59',
]
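
The difference between rejecting and converting an overflowing date can be reproduced with PHP's native date classes, which roll an out-of-range field over and record a warning. The sketch below relies on that behaviour; whether the helper is built on the same API is an assumption, sketchParseDate() is a hypothetical name, and \InvalidArgumentException stands in for the library's exception.

```php
<?php

/**
 * Sketch based on PHP's native DateTimeImmutable (assumption: the source
 * does not say which date API the helper relies on).
 */
function sketchParseDate(string $input, string $format, bool $strict): string
{
    $date = \DateTimeImmutable::createFromFormat($format, $input);
    if ($date === false) {
        throw new \InvalidArgumentException('Input does not match the format.');
    }
    // PHP records a warning when a field overflowed (for example February 30th).
    // getLastErrors() returns false instead of an array on PHP 8.2+ when all is fine.
    $errors = \DateTimeImmutable::getLastErrors();
    $overflowed = is_array($errors) && $errors['warning_count'] > 0;
    if ($strict && $overflowed) {
        throw new \InvalidArgumentException('The date is not valid.');
    }
    // Non-strict: the rolled-over date is returned.
    return $date->format($format);
}

echo sketchParseDate('2026-02-30', 'Y-m-d', false), "\n"; // rolled over to March
```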

4.3 isbn, ean

isbn
  • The data must be a valid ISBN.
  • Accepted parameter: default

Examples:

// validates an ISBN with a default ISBN-10
'isbn; default: 0-306-40615-2'
// same without hyphens
'isbn; default: 0306406152'

// validates an ISBN with a default ISBN-13
'isbn; default: 978-3-16-148410-0'
// same without hyphens
'isbn; default: 9783161484100'

ean
  • The data must be a valid EAN code.
  • Accepted parameter: default

Example:

// validates an EAN with a default value
'ean; default: 4006381333931'
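
For reference, the standard EAN-13 check digit (also used by ISBN-13) can be verified as sketched below. The source does not state whether the helper verifies the check digit or only the shape of the code, so this is an illustration of the standard rule rather than of the helper's behaviour; sketchIsValidEan13() is a hypothetical name.

```php
<?php

/**
 * Standard EAN-13 / ISBN-13 check-digit verification (illustration only).
 * Expects 13 digits without hyphens or spaces.
 */
function sketchIsValidEan13(string $code): bool
{
    if (preg_match('/^\d{13}$/D', $code) !== 1) {
        return false;
    }
    // The first 12 digits are weighted 1, 3, 1, 3, ... from the left.
    $sum = 0;
    for ($i = 0; $i < 12; $i++) {
        $sum += (int) $code[$i] * (($i % 2 === 0) ? 1 : 3);
    }
    // The check digit brings the weighted sum up to a multiple of 10.
    return (10 - $sum % 10) % 10 === (int) $code[12];
}

var_dump(sketchIsValidEan13('4006381333931'));
var_dump(sketchIsValidEan13('9783161484100'));
```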

4.4 ip, ipv4, ipv6

ip
  • The data must be a valid IP address (IPv4 or IPv6).
  • Accepted parameter: default

Examples:

// validates an IP address with a default IPv4
'ip; default: 127.0.0.1'

// validates an IP address with a default IPv6
[
    'type'    => 'ip',
    'default' => '::1',
]

ipv4
  • The data must be a valid IPv4 address.
  • Accepted parameter: default

Example:

// validates an IPv4 with a default value
'ipv4; default: 127.0.0.1'

ipv6
  • The data must be a valid IPv6 address.
  • Accepted parameter: default

Example:

// validates an IPv6 with a default value
'ipv6; default: ::1'
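
The distinction between the three types can be illustrated with PHP's built-in filter_var() and its FILTER_VALIDATE_IP flags. This is a sketch under the assumption that equivalent checks are performed; the source does not say how the helper validates addresses, and sketchIsIp() is a hypothetical name.

```php
<?php

/**
 * Illustration with PHP's filter extension (not necessarily what the helper uses).
 * $type is one of 'ip', 'ipv4' or 'ipv6'.
 */
function sketchIsIp(string $data, string $type = 'ip'): bool
{
    // Restrict the address family only for the two specialised types.
    $flags = match ($type) {
        'ipv4'  => FILTER_FLAG_IPV4,
        'ipv6'  => FILTER_FLAG_IPV6,
        default => 0,
    };
    return filter_var($data, FILTER_VALIDATE_IP, $flags) !== false;
}

var_dump(sketchIsIp('127.0.0.1'));
var_dump(sketchIsIp('::1', 'ipv4'));
```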

4.5 mac, port

mac
  • The data must be a valid MAC address.
  • Accepted parameter: default

Example:

// validates a MAC address with a default value
'mac; default: 00:1A:2B:3C:4D:5E'

port
  • Strict mode: the data must be an integer between 1 and 65,535.
  • Non-strict mode: the data must be convertible into an integer between 1 and 65,535.
  • Accepted parameters: default, min, max

Example:

// validates a privileged port (below 1024)
'port; max: 1023'

4.6 slug, json, color

slug
  • Strict mode: the data must be a string containing only lowercase unaccented characters (a to z) and hyphens (-).
  • Non-strict mode: the data is converted using \Temma\Utils\Text::urlize().
  • Accepted parameter: default

json
  • The data must be a string containing valid JSON.
  • Accepted parameter: default

color
  • The data must be a string containing a valid hexadecimal color, optionally starting with a hash (#).
  • The returned value always starts with a hash and is lowercase.
  • Accepted parameter: default

4.7 geo, phone

geo
  • The data must be a geographic coordinate.
  • Accepted parameter: default

Example:

// default coordinates of Paris
'geo; default: 48.8566, 2.3522'

phone

Validates phone numbers that meet one of the following conditions:

  • Start with 00 followed by 1 to 15 digits.
  • Start with + followed by 1 to 15 digits.
  • Contain 1 to 15 digits.

Notes:

  • Strict mode: the returned number is stripped of spaces, dashes, dots, and parentheses.
  • Non-strict mode: spaces, dashes, dots, and parentheses are preserved.
  • Accepted parameter: default
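
The three accepted shapes listed above can be expressed as a single regular expression. The sketch below is only an illustration: sketchIsPhone() is a hypothetical name, and ignoring separators before counting digits is an assumption, not something the source states.

```php
<?php

/**
 * Hypothetical check (not Temma's code) for the three accepted shapes:
 * "00" + digits, "+" + digits, or digits alone (1 to 15 digits each).
 * Assumption: separators are ignored before the digits are counted.
 */
function sketchIsPhone(string $data): bool
{
    // Drop the separators mentioned in the notes: spaces, dashes, dots, parentheses.
    $compact = str_replace([' ', '-', '.', '(', ')'], '', $data);
    return preg_match('/^(?:00|\+)?\d{1,15}$/D', $compact) === 1;
}

var_dump(sketchIsPhone('+33 1 23 45 67 89'));
var_dump(sketchIsPhone('not a number'));
```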

5 Complex types

5.1 enum

  • Enumeration whose value must be one of the listed options.
  • Accepted parameters: default, values

Examples:

// enumeration with three possible values and a default value
'enum; values: red, green, blue; default: red'

// equivalent
[
    'type'    => 'enum',
    'values'  => ['red', 'green', 'blue'],
    'default' => 'red',
]

5.2 array, list, assoc

array
  • Strict mode: the data must be an array, whatever its contents.
  • Non-strict mode: if the data is not an array, it is wrapped into one and returned.
  • Accepted parameter: default

list
  • The data must be an array whose elements are all of the same type (defined in a sub-contract).
  • Accepted parameters: default, contract

Examples:

// list where all elements are integers
'list; contract: int'

// equivalent
[
    'type'     => 'list',
    'contract' => 'int',
]

assoc
  • The data must be an associative array, whose keys may be defined.
  • Accepted parameters: default, keys

Example: associative array with 'id' and 'name' keys

'assoc; keys: id, name'

// equivalent
[
    'type' => 'assoc',
    'keys' => [
        'id',
        'name',
    ]
]

Example: list whose elements are associative arrays with defined keys:

[
    'type'     => 'list',
    'contract' => [
        'type' => 'assoc',
        'keys' => ['id', 'name'],
    ]
]

Example: associative array with key 'id' (mandatory) and key 'name' (optional)

'assoc; keys: id, name?'

// equivalent
[
    'type' => 'assoc',
    'keys' => [
        'id',
        'name?',
    ]
]

// equivalent
[
    'type' => 'assoc',
    'keys' => [
        'id',
        'name' => [
            'mandatory' => false,
        ],
    ]
]

Example: defining the expected type of some keys

[
    'type' => 'assoc',
    'keys' => [
        'id'   => 'int',
        'name' => 'string',
        'date',
    ]
]

Example: associative array with typed keys, one of them optional

[
    'type' => 'assoc',
    'keys' => [
        'id'   => 'int',
        'name' => [
            'type'      => 'string',
            'mandatory' => false,
        ]
    ]
]

// equivalent
[
    'type' => 'assoc',
    'keys' => [
        'id'    => 'int',
        'name?' => 'string',
    ]
]
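
To make the equivalences above concrete, the sketch below reduces each notation (bare key, 'name?' suffix, 'mandatory' flag, typed key) to one common shape. It is a hypothetical normaliser written for PHP 8, not the helper's code: the function name and the ['contract' => ..., 'mandatory' => ...] output are assumptions made for illustration.

```php
<?php

/**
 * Hypothetical normaliser (not Temma's code): turns every accepted way of
 * declaring a key into ['contract' => string|array|null, 'mandatory' => bool].
 */
function sketchNormalizeKeys(array $keys): array
{
    $normalized = [];
    foreach ($keys as $index => $spec) {
        // A bare 'id' has a numeric index; 'id' => 'int' has a string index.
        [$name, $contract] = is_int($index) ? [$spec, null] : [$index, $spec];
        $mandatory = true;
        // A trailing question mark marks the key as optional.
        if (str_ends_with($name, '?')) {
            $name = substr($name, 0, -1);
            $mandatory = false;
        }
        // The array form may carry its own 'mandatory' flag.
        if (is_array($contract) && array_key_exists('mandatory', $contract)) {
            $mandatory = (bool) $contract['mandatory'];
            unset($contract['mandatory']);
            $contract = $contract ?: null; // nothing left: the key is untyped
        }
        // ['type' => 'string'] and 'string' describe the same contract.
        if (is_array($contract) && array_keys($contract) === ['type']) {
            $contract = $contract['type'];
        }
        $normalized[$name] = ['contract' => $contract, 'mandatory' => $mandatory];
    }
    return $normalized;
}

var_dump(sketchNormalizeKeys(['id' => 'int', 'name?' => 'string']));
```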

Complex example:

[
    'type' => 'assoc',
    'keys' => [
        'id'          => 'int',
        'isCreated'   => 'bool',
        'name'        => 'string; default: abc',
        'color'       => [
            'type'      => 'enum',
            'values'    => ['red', 'green', 'blue'],
            'default'   => 'red',
            'mandatory' => false,
        ],
        'creator'     => [
            'type' => 'assoc',
            'keys' => [
                'id'           => 'int',
                'name',
                'dateCreation',
            ],
        ],
        'children'    => [
            'type'      => 'list',
            'mandatory' => false,
            'contract'  => [
                'type' => 'assoc',
                'keys' => [
                    'id'   => 'int',
                    'name',
                ]
            ],
        ],
        'identifiers' => [
            'type'     => 'list',
            'contract' => 'int',
        ],
    ],
]

6 Usage

The \Temma\Utils\DataFilter object provides a static method process(). This method takes as parameters the data to filter and the contract to apply, and returns the filtered data. If the contract syntax is incorrect, the method throws a \Temma\Exceptions\IO exception. If the data does not satisfy the contract, the method throws a \Temma\Exceptions\Application exception.

An optional third parameter makes it possible to specify whether validation should be performed in strict mode or not. By default, validation is non-strict.

Usage example:

use \Temma\Utils\DataFilter as TµDataFilter;
use \Temma\Exceptions\IO as TµIOException;
use \Temma\Exceptions\Application as TµApplicationException;
use \Temma\Base\Log as TµLog;

$contract = 'enum; values: admin, member, guest';
/* equivalent to the previous line:
$contract = [
    'type'   => 'enum',
    'values' => ['admin', 'member', 'guest'],
];
*/
try {
    $data = TµDataFilter::process($data, $contract);
} catch (TµIOException $ie) {
    TµLog::log('myapp', 'WARN', "Invalid contract.");
    throw $ie;
} catch (TµApplicationException $ae) {
    TµLog::log('myapp', 'WARN', "Invalid data.");
    throw $ae;
}

// strict-mode execution
try {
    $data = TµDataFilter::process($data, $contract, true);
} catch (\Exception $e) {
    TµLog::log('myapp', 'WARN', "Validation error.");
    throw $e;
}