All pages
Powered by GitBook
1 of 6

Loading...

Loading...

Loading...

Loading...

Loading...

Loading...

Data Quality

Have confidence in your data

Most companies detect issues after their team has used bad data to make decisions or trigger campaigns. Quickly take action on every invalid event with in-app reporting and daily email digests.

Align all the teams in your company around a single data dictionary

In the interface, you will be able to define the schema of your data and define the validation rules that will feed your data quality workflow.

Automate the QA process

Writing allows you to automate the QA process, to feed the in the dashboard, but also to define realtime alerts to react quickly when errors occur on your data.

Fix your data in realtime

To react quickly to data errors, while your IT team corrects the problem at source, you can rely on the , aka .

Control the data delivery

Having a good data quality on each source is essential, but being able to check also the is at least as important. For each destination, you can view the , quickly identify errors and define with a personnalized thresold.

Inspect specific events

In case of doubt or to further investigate a data problem, you can access the logs of the events sent. You benefit both from and to search for a specific event, analyse it's data and better understand the issue.

Event Specification
event specification
Source data quality
live data transformation feature
Data cleansing
quality of the data transmission
event delivery history
realtime alerts
Event source inspector
Destination event inspector

Data cleansing

Transform/fix your events before to send them to destinations

Cleaning, fixing and preparing your data is crucial for success when using the Platform activation capabilities. The no-code approach allows you to create transformations using simple formula based on basic functions and operators.

You can choose between different kind of transformations:

  • Rename event: simply change the name of an incoming event

  • Derive event: create a new event from an existing one (coming soon)

  • Modify properties: apply transformations and functions on properties

  • Filter event: define rules to filter incoming events (coming soon)

  • Custom code: create your own transformations (coming soon)

Rename event

First, give a name to the transformation and add a description if needed.

Select on which sources the rename transformation will be applied.

Select which event should be renamed. It is also possible to add more conditions, like rename the page view event to product view, if 'page_type' contains 'product'.

Last step, enter the new event name.

Modify properties

First, give a name to the transformation and add a description if needed.

Select on which sources the transformation will be applied.

Define conditions on events and/or properties, the transformation will be applied only regarding these conditions.

You can add multiple values (like event name is purchase ; add to cart). In that case, it is an OR condition (event name is purchase OR add to cart)

Select properties to transform. You can:

  • rename the property ('rename to' option)

    • Example: rename 'date_of_birth' to 'birthdate'

  • map/link to an existing property ('copy value from' option)

With the 'set value to' option, you can simply change the value (ex: 'currency' = EUR by default) but you also have access to formulas based on basic and operators. For example, you can hash incoming data with SHA-256 or MD5 functions. You can also REPLACE or SUBSTITUTE a text to replace by another one, or EXTRACT a specific text in a chain.

You can also use operators like AND / OR / NOT and create operations with * / + -, full list .

For example, to calculate the revenue with taxes, you can do: revenue = value*1,2.

Example: map property 'price' to 'revenue'
  • delete a property ('delete' option)

    • Example: always remove property 'tax_amount' from event 'purchase'

  • change the value ('set value to' option)

  • functions
    here

    Event Specification

    In the Event Specification interface, you will be able to define the schema of your data and determine the actions to be taken if an event does not conform your event specifications (aka validation rules)

    Adding event specifications allows you to define the payload and what is expected, so that you can see if the data is meeting those specifications, in case there are any errors.

    Then, in the Source data quality view, you are able to define an automatic alert or see the summary of all specification violations, so that you fix it at your source or with the live Data Cleansing feature

    You can choose to add a standard event (from the event catalog) or a custom specification.

    The schema of standard events is already defined, but you can modify it by adding more properties (standard or custom properties).

    Create custom property

    You can create custom properties and add it to your event specification.

    Precise the data type

    • String: most used type, it corresponds to a text format (ex: name = 'ABC')

    • Number: corresponds to a number (float, entire) (ex: value = '12')

    • Boolean: the property can take only 2 values, true or false (ex: paid = 'true')

    • Object: corresponds to an array of values (ex: items = shirt, pant, shoes)

    Precise the structure type

    • Default (simple value): most used structure, corresponds to a single value. (ex: name = 'ABC')

    • List (array of values): list of all values on the property (ex: items = shirt, pant, pant)

    • Set (array of unique values): list of unique values (ex: items = shirt, pant)

    PII option

    You can define your property as 'Personal Identifiable Information (PII)', meaning this property contains personal information that could identify a user (email address, postal address, Customer ID...).

    If this option is set to true, the property will be considered as PII.

    If this property is stored and set as PII, it will be automatically encrypted in AES-256 before being written to the database.

    Validation Rule Actions

    You can define bespoke responses to data validation scenarios:

    • Missing Required Property: Choose to either accept or reject events entirely when a required property is missing.

    • Unexpected Value Format: Configure the system either to omit properties that don’t match the expected format or to accept/reject event entirely.

    • Unspecified Events: You have the option to accept events that are not predefined in your schema. You can also decide if this action should trigger a warning in the Source Data Quality report.

    Unspecified Properties: Decide whether to omit undefined properties or to accept them, with or without warnings.

    Sources data quality

    On the Health menu, you will find Sources data quality.

    You can visualize the evolution of your event data quality over time, filter by sources or environment (default to Production)

    You can setup an alert to know in realtime if some of your event violate your validation rules.

    You can analyze in detail each event and have more details about errors, and a shortcut to fix the error in the Data Cleansing feature.

    FAILED the event is in error status, can be caused by an error 404, a timeout, etc...

    FILTERED the event has been filtered (based on the conditions created in Normalized Datalayer.)

    INVALID ACCEPTED means the property has been removed, but it is not blocking for the event, this one has been received without the concerned property. You should fix the property to receive it.

    INVALID REJECTED means one or many properties are missing or don't have good values. As a result, the full event was rejected, you should fix it to receive the event.

    VALID the event has been collected correctly, this means your sources are sending the data expected in your specifications.

    You can also visualize the data quality for a specific source, when you click on a source, you have a dedicated tab 'Data Quality'tap available with same info but filtered for the source you selected:

    The quality score of events is represented by a weather icon:

    • sunny if the percent of correct events is equal to 100%

    • cloudy between 95% and 99.99%

    • rainy between 90 and 95%

    • stormy below 90%

    Format a date

    Details about format used for the date functions

    The following table defines the token substrings you can use in a date-time format pattern for the functions DATEPARSE(str, format) and DATEFORMAT(date, format).

    You can define the format field by combining the tokens shown below.

    Usual format token

    Examples below given for 2014-08-06T13:07:04.054 considered as a local time in America/New_York

    To display
    Use this token
    Example

    Advanced format token

    Examples below given for 2014-08-06T13:07:04.054 considered as a local time in America/New_York

    To display
    Use this token
    Example

    month as a padded number

    MM

    08

    month as an abbreviated localized string

    MMM

    Aug

    month as an unabbreviated localized string

    MMMM

    August

    month as a single localized letter

    MMMMM

    A

    day of the month, no padding

    d

    6

    day of the month, padded to 2

    dd

    06

    hour in 12-hour time, no padding

    h

    1

    hour in 12-hour time, padded to 2

    hh

    01

    hour in 24-hour time, no padding

    H

    13

    hour in 24-hour time, padded to 2

    HH

    13

    minute, no padding

    m

    7

    minute, padded to 2

    mm

    07

    second, no padding

    s

    4

    second, padded to 2 padding

    ss

    04

    fractional seconds, functionally identical to SSS

    u

    054

    fractional seconds, between 0 and 99, padded to 2

    uu

    05

    fractional seconds, between 0 and 9

    uuu

    0

    millisecond, no padding

    S

    54

    millisecond, padded to 3

    SSS

    054

    ISO week year, unpadded

    kk

    14

    ISO week year, padded to 4

    kkkk

    2014

    ISO week number, unpadded

    W

    32

    ISO week number, padded to 2

    WW

    32

    Local week year, unpadded

    ii

    14

    Local week year, padded to 4

    iiii

    2014

    Local week number, unpadded

    n

    32

    Local week number, padded to 2

    nn

    32

    day of the week, as number from 1-7 (Monday is 1, Sunday is 7)

    E

    3

    day of the week, as an abbreviate localized string

    EEE

    Wed

    day of the week, as an unabbreviated localized string

    EEEE

    Wednesday

    day of the week, as a single localized letter

    EEEEE

    W

    localized numeric date

    D

    9/4/2017

    localized date with abbreviated month

    DD

    Aug 6, 2014

    localized date with full month

    DDD

    August 6, 2014

    localized date with full month and weekday

    DDDD

    Wednesday, August 6, 2014

    localized time

    t

    9:07 AM

    localized time with seconds

    tt

    1:07:04 PM

    localized time with seconds and abbreviated offset

    ttt

    1:07:04 PM EDT

    localized time with seconds and full offset

    tttt

    1:07:04 PM Eastern Daylight Time

    localized 24-hour time

    T

    13:07

    localized 24-hour time with seconds

    TT

    13:07:04

    localized 24-hour time with seconds and abbreviated offset

    TTT

    13:07:04 EDT

    localized 24-hour time with seconds and full offset

    TTTT

    13:07:04 Eastern Daylight Time

    short localized date and time

    f

    8/6/2014, 1:07 PM

    less short localized date and time

    ff

    Aug 6, 2014, 1:07 PM

    verbose localized date and time

    fff

    August 6, 2014, 1:07 PM EDT

    extra verbose localized date and time

    ffff

    Wednesday, August 6, 2014, 1:07 PM Eastern Daylight Time

    short localized date and time with seconds

    F

    8/6/2014, 1:07:04 PM

    less short localized date and time with seconds

    FF

    Aug 6, 2014, 1:07:04 PM

    verbose localized date and time with seconds

    FFF

    August 6, 2014, 1:07:04 PM EDT

    extra verbose localized date and time with seconds

    FFFF

    Wednesday, August 6, 2014, 1:07:04 PM Eastern Daylight Time

    narrow offset

    Z

    +5

    short offset

    ZZ

    +05:00

    techie offset

    ZZZ

    +0500

    abbreviated named offset

    ZZZZ

    EST

    unabbreviated named offset

    ZZZZZ

    Eastern Standard Time

    IANA zone

    z

    America/New_York

    meridiem

    a

    AM

    unix timestamp in seconds

    X

    1407287224

    unix timestamp in milliseconds

    x

    1407287224054

    year, unpadded

    y

    2014

    two-digit year

    yy

    14

    four- to six- digit year, pads to 4

    yyyy

    2014

    month as an unpadded number

    M

    ordinal (day of year), unpadded

    o

    218

    ordinal (day of year), padded to 3

    ooo

    218

    quarter, no padding

    q

    3

    quarter, padded to 2

    qq

    8

    03

    Supported transformation functions - Data cleansing

    No-code Transformation formulas for Data cleansing.

    In the formula input, you can can write basic or complex expression to transform the data. Expressions are composed of either a value, a function, an operation, or another expression in parenthesis. See below for the description of each one of them.

    Functions can be nested, ex: SHA256(LOWER(user.email))

    Available functions

    FUNCTION NAME
    WHAT IT DOES
    Example

    Operators

    Aggregations

    FUNCTION NAME
    WHAT IT DOES
    Example

    Examples

    1. Scenario: Create a Flag that shows if Consumers’ Primary Address is in California IF(EXTRACT(city_state, '-', 1) == " CA", "TRUE", "FALSE")

    2. Scenario: Set a value with multiple conditions. The value equals "12345" for country FR and environment prod, else equals 98888 for country DE and environment dev: \

    MD5(stringToHash)

    Returns an MD5 hash of the given string

    MD5("test") returns "098f6bcd4621d373cade4e832627b4f6"

    COALESCE(value1, value2, ...)

    Returns the first value from the list that isn’t empty

    COALESCE("123", "hello") returns "123"

    CONCAT(string1, string2, ...)

    Concatenate multiple values into one property. Separators could be specified: CONCAT(prop1, "

    ", prop2, "

    TRIM(text)

    Removes all spaces, new lines, tabulations at the beginning and at the end of a text.

    TRIM(" \Hello World \n") returns "Hello World"

    SELECT(sourceStr, pattern, <position>)

    Returns the first match of the pattern (a regex expression) in sourceStr. If the third parameter is set, it will return the corresponding group inside the match.

    SELECT("From A to Z alphabet", 'A(.*?)Z") returns "A to Z"

    LOWER(text)

    Converts all uppercase letters in a text string to lowercase

    LOWER("ABC") returns "abc"

    UPPER(text)

    Converts all lowercase letters in a text string to uppercase

    UPPER("abc") returns "ABC"

    NUMBER(value1)

    Converts string to number

    NUMBER("123.456") returns 123.456

    NUMBER("not a number") returns null

    STRING(value)

    Converts a number or any value to string

    STRING(123.456) returns "123.456"

    STRING(true) returns "true"

    STRING(null) returns ""

    ENCODE_BASE64(value1)

    Encode a string to base64

    ENCODE_BASE64("hello") returns "aGVsbG8="

    DECODE_BASE64(value1)

    Decode a string from base64

    DECODE_BASE64("aGVsbG8=") returns "hello"

    TIMESTAMP()

    Returns the current date timestamp

    TIMESTAMP () returns a value like 1674832519845 (changes with the current time)

    LEFT(text, length)

    Returns the requested number of characters from the beginning of a string or number.

    LEFT("Hello world", 3) returns "Hel"

    RIGHT(text, length)

    Returns the requested number of characters from the end of a string or number.

    RIGHT("Hello world", 2) returns "ld"

    CHAR(text, position)*

    Returns the character at the specified position (start at 0).

    CHAR("Hello world", 5) returns "o"

    IF(condition,resultIfTrue,resultIfFalse)

    Returns the second argument if the first argument is true, or the third argument otherwise

    IF(city="Paris", ListA, ListB)

    ISEMPTY(value)

    Returns whether the value is empty or not

    ISEMPTY("abc") returns false

    SIZE(array or object)

    Returns the number of elements in the list

    SIZE(items) returns the number of items

    LENGTH(string)

    Returns the number of characters in the string

    LENGTH("abc") returns 3

    SUBSTRING(text, pos1, pos2)

    Returns a new string composed of the characters between the two position defined by pos1 and pos2. The positions start at 0. If a negative number is given, the positions start from the end of the string. If pos2 is omitted, the end of the string is used.

    SUBSTRING("hello beautiful world", 6, 14) returns "beautiful"

    JSON_PARSE(string)

    Parse a JSON string (works for both objects and arrays). Useful to create an object or array property that will contain all the properties or elements present inside a stringified JSON you may have inside another event property

    JSON_PARSE('{"optin":1,"message":"ok"}') returns an object with an optin key equals to 1 and message key equals to "ok"

    JSON_PARSE('["morgan","justine"]') returns an array containing "morgan" and "justine"

    JSON_STRINGIFY(value, defaultValue)

    Stringify a value into JSON. (Optional) You can specify a defaultValue in case value is undefined

    JSON_STRINGIFY({"message":"ok"}) returns a string that equals '{"message":"ok"}'

    VALUE_FROM_JSON(string, path)

    Extract a value from a data string formatted in JSON

    VALUE_FROM_JSON('{"info":{"code":42},"message":"ok"}', 'info.code')

    returns

    42

    GET(property name)

    GET(name) return the value of the given event property. Useful for property names that contains a special character like "-"

    GET("droid-type")

    returns

    "r2d2" GET("invalidProperty") returns null

    GETHEADER(name)

    GETHEADER(name) return the value of the given http header

    GETHEADER("Accept-Language")

    returns

    "fr,fr-FR;q=0.8,en-US;q=0.5,en;q=0.3"

    GETCOOKIE(name)

    GETCOOKIE(name) return the value of the given http cookie

    GETCOOKIE("cart")

    returns

    "empty"

    DATEPARSE(str, format)

    Takes a date in a specific format and converts it to a proper date

    DATEPARSE('2023-05-23', 'yyyy-MM-dd') returns 2023-05-23T00:00:00.000+01:00

    DATEFORMAT(date, format)

    Takes a date in a ISO-8601 or timestamp format and converts it to the specified format

    DATEFORMAT('2023-01-31T23:59:00.000+01:00', 'yyyy-MM-dd HH:mm') returns 2023-01-31 23:59

    DATEADD(date, interval, unit)

    Adds the interval as a unit (years, months, weeks, days, hours, minutes, seconds)

    DATEADD('2023-01-23T00:05:00.000+01:00', 3, 'days') returns 2023-01-26T00:05:00.000+01:00

    DATESUB(date, interval, unit)

    Removes the interval as a unit (years, months, weeks, days, hours, minutes, seconds)

    DATESUB('2023-01-23T00:05:00.000+01:00', 3, 'days') returns 2023-01-20T00:05:00.000+01:00

    DATEDIFF(date1, date2, unit)

    Returns the interval between two dates in unit

    DATEDIFF('2023-01-20T00:05:00.000+01:00', '2023-01-23T00:05:00.000+01:00', 'days') returns 0

    AGE(date, unit)

    Returns the elapsed time since the given date in unit (if unit is not specified, years by default)

    AGE('2022-01-20T00:05:00.000+01:00') returns 1 in 2023

    YEAR(date), MONTH(date), DAY(date), HOUR(date), MINUTES(date), SECONDS(date)

    Returns the dates specified unit

    YEAR('2023-01-20T00:05:00.000+01:00') returns 2023, MONTH for the same date returns 1, etc

    TIMEZONE(date)

    Returns the timezone in UTC

    TIMEZONE('2023-01-20T00:05:00.000+01:00') returns UTC+1

    TODAY()

    Returns the current date in ISO-8601

    TODAY() might return 2023-01-20T00:05:00.000+01:00, assuming the current date is the 20th of January 2023

    Do a boolean NOT on the expression

    NOT(age>18)

    IN()

    Returns true if the left is equals to at least one value on the right.

    currency IN("€", "$", "£")

    !IN

    Returns true if none of the value on the right is equal to the value on the left.

    currency !IN("€", "$", "£")

    <, >, <=, or >=

    Compare two values If one of the values is an array, check that at least one of the value match.

    age > 18

    BETWEEN()

    Check if the left value is between the two values passed as arguments.If the left value is an array, check that at least one of the value match.

    age BETWEEN(7, 80)

    EXISTS()

    Check if the property exists

    EXISTS(my_property)

    ~ or !~

    Check if the left value match the regex in the right value (or doesn’t match if !~). The regex language is the javascript one. If the left value is an array, check that at least one of the value match.

    email ~ "@example.com"

    STARTSWITH(), ENDSWITH(), or CONTAINS()

    Check if the left value starts with, ends with, or contains the right value. If the left value is an array, check that at least one of the value match.

    lastname STARTSWITH("DE")

    * or /

    Multiplication or division

    3*5

    + or -

    Addition / concatenation or substraction

    1+2

    COUNT(array, expression)

    Counts the number of non empty values in an array or from an expression. The expression parameter is optional.

    COUNT(users, IF(ISEMPTY(email), null, email)) Count customers with non empty email condition

    COUNT(items, IF(status = "completed" AND amount > 100, 1, null)) Count completed items over 100

    EXTRACT(text, delimiter, position)

    Extracts from the given text the substring at the given position, after splitting by the given delimiter.

    EXTRACT("a-b-c", "-", 1) returns "b"

    EXTRACT("a-b-c", "-", 0) returns "a"

    SUBSTITUTE(text, search_text,substitute_text)

    Substitutes new_text for old_text in a text string. Use SUBSTITUTE when you want to replace specific text in a text string.

    SUBSTITUTE("The-Clone-Wars", "-", ".")returns "The.Clone.Wars"

    REPLACE(sourceStr, pattern, replaceStr)

    Replaces every match of pattern (a regex expression) in sourceStr.

    REPLACE("The Clone Wars", "\s", ".") returns "The.Clone.Wars"

    REPLACE('0123456789', '([0-9]{2})', '$1-') returns

    01-23-45-67-89

    SHA256(stringToHash)

    Returns an SHA256 hash of the given string

    Operator

    Description

    Example

    =, ==

    Return true if the left part is equal to the right part. If one of the part is an array, check if some values are in equal between the two parts.

    gender = "F"

    !=, <>

    Inverse of the above

    gender != "F"

    AND or &&

    Do a boolean AND between the two parts

    age >18 AND gender = "M"

    OR or ||

    Do a boolean OR between the two parts

    age >18 OR gender = "M"

    AVERAGE(array, expression)

    Calculates the average of values in an array or from an expression. The expression parameter is optional.

    AVERAGE([0, null, 10]) Returns 5 (null values are ignored)

    AVERAGE(items, price) Returns average of price of each items (null values are ignored)

    AVERAGE(items, EXTRACT(price, "€", 0)) Extract numeric value before € and calculate average

    AVERAGE(items, IF(returned_quantity = 0, NUMBER(EXTRACT(product.price, "€",0)), 0)) Average price only for non-returned items

    MIN(array, expression)

    Returns the minimum value from an array or from an expression. The expression parameter is optional.

    MIN([null, 15, 8]) Returns 8 (ignores null)

    MIN(orders, DATEPARSE(order_date, "yyyy-MM-dd")) Find the earliest order date

    MIN(items, IF(status = "active", price, 999999)) Minimum price among active items only

    MAX(array, expression)

    Returns the maximum value from an array or from an expression. The expression parameter is optional.

    MAX([null, 15, 8]) Returns 15 (ignores null)

    MAX(users, AGE(birth_date)) Find the age of the oldest user

    MAX(transactions, IF(type = "purchase", amount, 0)) Maximum purchase amount

    SUM(array, expression)

    Calculates the sum of values in an array or from an expression. The expression parameter is optional.

    SHA256("test") returns "9f86d081884c7d659a2feaa0c55ad015a3bf4f1b2b0b822cd15d6c15b0f00a08"

    NOT(expression)

    SUM(items, quantity * NUMBER(EXTRACT(unit_price, "$", 0))) Calculate total value of all items

    SUM([5, null, 15]) Returns 20 (ignores null)

    SUM(items, IF(returned_quantity = 0, NUMBER(EXTRACT(product.price, "€",0)), 0)) Sum prices only for non returned quantity items condition

    ```xquery
    IF(country = "FR" AND environment_work = "prod", "12345", IF(country = "DE" AND environment_work = "dev", "98888")
    ```
    see more about format
    see more about format