No-code Transformation formulas for Data cleansing.
In the formula input, you can can write basic or complex expression to transform the data. Expressions are composed of either a value, a function, an operation, or another expression in parenthesis. See below for the description of each one of them.
Functions can be nested, ex: SHA256(LOWER(user.email))
EXTRACT(text, delimiter, position)
Extracts from the given text the substring at the given position, after splitting by the given delimiter.
EXTRACT("a-b-c", "-", 1)
returns "b"
EXTRACT("a-b-c", "-", 0)
returns "a"
SUBSTITUTE(text, search_text,substitute_text)
Substitutes new_text for old_text in a text string. Use SUBSTITUTE when you want to replace specific text in a text string.
SUBSTITUTE("The-Clone-Wars", "-", ".")
returns "The.Clone.Wars"
REPLACE(sourceStr, pattern, replaceStr)
Replaces every match of pattern (a regex expression) in sourceStr.
REPLACE("The Clone Wars", "\s", ".")
returns "The.Clone.Wars"
REPLACE('0123456789', '([0-9]{2})', '$1-')
returns
01-23-45-67-89
SHA256(stringToHash)
Returns an SHA256 hash of the given string
SHA256("test")
returns "9f86d081884c7d659a2feaa0c55ad015a3bf4f1b2b0b822cd15d6c15b0f00a08"
MD5(stringToHash)
Returns an MD5 hash of the given string
MD5("test")
returns
"098f6bcd4621d373cade4e832627b4f6"
COALESCE(value1, value2, ...)
Returns the first value from the list that isn’t empty
COALESCE("123", "hello")
returns
"123"
CONCAT(string1, string2, ...)
Concatenate multiple values into one property. Separators could be specified: CONCAT(prop1, "
", prop2, "
TRIM(text)
Removes all spaces, new lines, tabulations at the beginning and at the end of a text.
TRIM(" \Hello World \n")
returns
"Hello World"
SELECT(sourceStr, pattern, <position>)
Returns the first match of the pattern (a regex expression) in sourceStr. If the third parameter is set, it will return the corresponding group inside the match.
SELECT("From A to Z alphabet", 'A(.*?)Z")
returns
"A to Z"
LOWER(text)
Converts all uppercase letters in a text string to lowercase
LOWER("ABC")
returns
"abc"
UPPER(text)
Converts all lowercase letters in a text string to uppercase
UPPER("abc")
returns
"ABC"
NUMBER(value1)
Converts string to number
NUMBER("123.456")
returns
123.456
NUMBER("not a number")
returns
null
ENCODE_BASE64(value1)
Encode a string to base64
ENCODE_BASE64("hello")
returns
"aGVsbG8="
DECODE_BASE64(value1)
Decode a string from base64
DECODE_BASE64("aGVsbG8=")
returns
"hello"
TIMESTAMP()
Returns the current date timestamp
TIMESTAMP ()
returns a value like 1674832519845
(changes with the current time)
LEFT(text, length)
Returns the requested number of characters from the beginning of a string or number.
LEFT("Hello world", 3)
returns
"Hel"
RIGHT(text, length)
Returns the requested number of characters from the end of a string or number.
RIGHT("Hello world", 2)
returns
"ld"
CHAR(text, position)*
Returns the character at the specified position (start at 0).
CHAR("Hello world", 5)
returns
"o"
IF(condition,resultIfTrue,resultIfFalse)
Returns the second argument if the first argument is true, or the third argument otherwise
IF(city="Paris", ListA, ListB)
ISEMPTY(value)
Returns whether the value is empty or not
ISEMPTY("abc") returns false
SIZE(array or object)
Returns the number of elements in the list
SIZE(items)
returns the number of items
LENGTH(string)
Returns the number of characters in the string
LENGTH("abc")
returns
3
SUBSTRING(text, pos1, pos2)
Returns a new string composed of the characters between the two position defined by pos1 and pos2. The positions start at 0. If a negative number is given, the positions start from the end of the string. If pos2 is omitted, the end of the string is used.
SUBSTRING("hello beautiful world", 6, 14)
returns
"beautiful"
JSON_PARSE(string)
Parse a JSON string. Usefull to create an object property that will contain all the properties present inside a stringyfied JSON you may have inside another event property
JSON_PARSE('{"optin":1,"message":"ok"}')
returns an object with an optin
key equals to 1
and message
key equals to "ok"
VALUE_FROM_JSON(string, path)
Extract a value from a data string formatted in JSON
VALUE_FROM_JSON('{"info":{"code":42},"message":"ok"}', 'info.code')
returns
42
GET(property name)
GET(name) return the value of the given event property. Useful for property names that contains a special character like "-"
GET("droid-type")
returns
"r2d2"
GET("invalidProperty")
returns
null
GETHEADER(name)
GETHEADER(name) return the value of the given http header
GETHEADER("Accept-Language")
returns
"fr,fr-FR;q=0.8,en-US;q=0.5,en;q=0.3"
GETCOOKIE(name)
GETCOOKIE(name) return the value of the given http cookie
GETCOOKIE("cart")
returns
"empty"
DATEPARSE(str, format)
Takes a date in a specific format and converts it to a proper date
DATEFORMAT(date, format)
Takes a date in a ISO-8601 format and converts it to the specified format
DATEADD(date, interval, unit)
Adds the interval as a unit (years, months, weeks, days, hours, minutes, seconds)
DATEADD('2023-01-23T00:05:00.000+01:00', 3, 'days')
returns
2023-01-26T00:05:00.000+01:00
DATESUB(date, interval, unit)
Removes the interval as a unit (years, months, weeks, days, hours, minutes, seconds)
DATESUB('2023-01-23T00:05:00.000+01:00', 3, 'days')
returns
2023-01-20T00:05:00.000+01:00
DATEDIFF(date1, date2, unit)
Returns the interval between two dates in unit
DATEDIFF('2023-01-20T00:05:00.000+01:00', '2023-01-23T00:05:00.000+01:00', 'days')
returns
0
AGE(date, unit)
Returns the elapsed time since the given date in unit (if unit is not specified, years by default)
AGE('2022-01-20T00:05:00.000+01:00')
returns
1
in 2023
YEAR(date), MONTH(date), DAY(date), HOUR(date), MINUTES(date), SECONDS(date)
Returns the dates specified unit
YEAR('2023-01-20T00:05:00.000+01:00')
returns
2023
, MONTH
for the same date returns 1
, etc
TIMEZONE(date)
Returns the timezone in UTC
TIMEZONE('2023-01-20T00:05:00.000+01:00')
returns
UTC+1
TODAY()
Returns the current date in ISO-8601
TODAY()
might return 2023-01-20T00:05:00.000+01:00
, assuming the current date is the 20th of January 2023
Operator
Description
Example
=, ==
Return true if the left part is equal to the right part. If one of the part is an array, check if some values are in equal between the two parts.
gender = "F"
!=, <>
Inverse of the above
gender != "F"
AND or &&
Do a boolean AND between the two parts
age >18 AND gender = "M"
OR or ||
Do a boolean OR between the two parts
age >18 OR gender = "M"
NOT(expression)
Do a boolean NOT on the expression
NOT(age>18)
IN()
Returns true if the left is equals to at least one value on the right.
currency IN("€", "$", "£")
!IN
Returns true if none of the value on the right is equal to the value on the left.
currency !IN("€", "$", "£")
<, >, <=, or >=
Compare two values If one of the values is an array, check that at least one of the value match.
age > 18
BETWEEN()
Check if the left value is between the two values passed as arguments.If the left value is an array, check that at least one of the value match.
age BETWEEN(7, 80)
EXISTS()
Check if the property exists
EXISTS(my_property)
~ or !~
Check if the left value match the regex in the right value (or doesn’t match if !~). The regex language is the javascript one. If the left value is an array, check that at least one of the value match.
email ~ "@example.com"
STARTSWITH(), ENDSWITH(), or CONTAINS()
Check if the left value starts with, ends with, or contains the right value. If the left value is an array, check that at least one of the value match.
lastname STARTSWITH("DE")
* or /
Multiplication or division
3*5
+ or -
Addition / concatenation or substraction
1+2
Scenario: Create a Flag that shows if Consumers’ Primary Address is in California
IF(EXTRACT(city_state, '-', 1) == " CA", "TRUE", "FALSE")
Scenario: Set a value with multiple conditions. The value equals "12345" for country FR and environment prod, else equals 98888 for country DE and environment dev: \
On the Health menu, you will find Sources data quality.
You can visualize the evolution of your event data quality over time, filter by sources or environment (default to Production)
You can setup an alert to know in realtime if some of your event violate your validation rules.
You can analyze in detail each event and have more details about errors, and a shortcut to fix the error in the Data Cleansing feature.
FAILED the event is in error status, can be caused by an error 404, a timeout, etc...
FILTERED the event has been filtered (based on the conditions created in Normalized Datalayer.)
INVALID ACCEPTED means the property has been removed, but it is not blocking for the event, this one has been received without the concerned property. You should fix the property to receive it.
INVALID REJECTED means one or many properties are missing or don't have good values. As a result, the full event was rejected, you should fix it to receive the event.
VALID the event has been collected correctly, this means your sources are sending the data expected in your specifications.
You can also visualize the data quality for a specific source, when you click on a source, you have a dedicated tab 'Data Quality'tap available with same info but filtered for the source you selected:
The quality score of events is represented by a weather icon:
DATEPARSE('2023-05-23', 'yyyy-MM-dd')
returns
2023-05-23T00:00:00.000+01:00
DATEFORMAT('2023-01-31T23:59:00.000+01:00', 'yyyy-MM-dd HH:mm')
returns
2023-01-31 23:59
sunny if the percent of correct events is equal to 100%
cloudy between 95% and 99.99%
rainy between 90 and 95%
stormy below 90%
Details about format used for the date functions
The following table defines the token substrings you can use in a date-time format pattern for the functions DATEPARSE(str, format) and DATEFORMAT(date, format).
You can define the format
field by combining the tokens shown below.
Examples below given for 2014-08-06T13:07:04.054
considered as a local time in America/New_York
year, unpadded
y
2014
two-digit year
yy
14
four- to six- digit year, pads to 4
yyyy
2014
month as an unpadded number
M
8
month as a padded number
MM
08
month as an abbreviated localized string
MMM
Aug
month as an unabbreviated localized string
MMMM
August
month as a single localized letter
MMMMM
A
day of the month, no padding
d
6
day of the month, padded to 2
dd
06
hour in 12-hour time, no padding
h
1
hour in 12-hour time, padded to 2
hh
01
hour in 24-hour time, no padding
H
13
hour in 24-hour time, padded to 2
HH
13
minute, no padding
m
7
minute, padded to 2
mm
07
second, no padding
s
4
second, padded to 2 padding
ss
04
fractional seconds, functionally identical to SSS
u
054
fractional seconds, between 0 and 99, padded to 2
uu
05
fractional seconds, between 0 and 9
uuu
0
millisecond, no padding
S
54
millisecond, padded to 3
SSS
054
Examples below given for 2014-08-06T13:07:04.054
considered as a local time in America/New_York
ordinal (day of year), unpadded
o
218
ordinal (day of year), padded to 3
ooo
218
quarter, no padding
q
3
quarter, padded to 2
03
ISO week year, unpadded
kk
14
ISO week year, padded to 4
kkkk
2014
ISO week number, unpadded
W
32
ISO week number, padded to 2
WW
32
Local week year, unpadded
ii
14
Local week year, padded to 4
iiii
2014
Local week number, unpadded
n
32
Local week number, padded to 2
nn
32
day of the week, as number from 1-7 (Monday is 1, Sunday is 7)
E
3
day of the week, as an abbreviate localized string
EEE
Wed
day of the week, as an unabbreviated localized string
EEEE
Wednesday
day of the week, as a single localized letter
EEEEE
W
localized numeric date
D
9/4/2017
localized date with abbreviated month
DD
Aug 6, 2014
localized date with full month
DDD
August 6, 2014
localized date with full month and weekday
DDDD
Wednesday, August 6, 2014
localized time
t
9:07 AM
localized time with seconds
tt
1:07:04 PM
localized time with seconds and abbreviated offset
ttt
1:07:04 PM EDT
localized time with seconds and full offset
tttt
1:07:04 PM Eastern Daylight Time
localized 24-hour time
T
13:07
localized 24-hour time with seconds
TT
13:07:04
localized 24-hour time with seconds and abbreviated offset
TTT
13:07:04 EDT
localized 24-hour time with seconds and full offset
TTTT
13:07:04 Eastern Daylight Time
short localized date and time
f
8/6/2014, 1:07 PM
less short localized date and time
ff
Aug 6, 2014, 1:07 PM
verbose localized date and time
fff
August 6, 2014, 1:07 PM EDT
extra verbose localized date and time
ffff
Wednesday, August 6, 2014, 1:07 PM Eastern Daylight Time
short localized date and time with seconds
F
8/6/2014, 1:07:04 PM
less short localized date and time with seconds
FF
Aug 6, 2014, 1:07:04 PM
verbose localized date and time with seconds
FFF
August 6, 2014, 1:07:04 PM EDT
extra verbose localized date and time with seconds
FFFF
Wednesday, August 6, 2014, 1:07:04 PM Eastern Daylight Time
narrow offset
Z
+5
short offset
ZZ
+05:00
techie offset
ZZZ
+0500
abbreviated named offset
ZZZZ
EST
unabbreviated named offset
ZZZZZ
Eastern Standard Time
IANA zone
z
America/New_York
meridiem
a
AM
unix timestamp in seconds
X
1407287224
unix timestamp in milliseconds
x
1407287224054
Event specification
In the Normalized datalayer interface, you will be able to define the schema of your data and determine the actions to be taken if an event does not conform your event specifications (aka validation rules)
Adding event specifications allows you to define the payload and what is expected, so that you can see if the data is meeting those specifications, in case there are any errors.
Then, in the Source data quality view, you are able to define an automatic alert or see the summary of all specification violations, so that you fix it at your source or with the live Data Cleansing feature
You can choose to add a standard event (from the event catalog) or a custom specification.
The schema of standard events is already defined, but you can modify it by adding more properties (standard or custom properties).
You can create custom properties and add it to your event specification.
String: most used type, it corresponds to a text format (ex: name = 'ABC')
Number: corresponds to a number (float, entire) (ex: value = '12')
Boolean: the property can take only 2 values, true or false (ex: paid = 'true')
Object: corresponds to an array of values (ex: items = shirt, pant, shoes)
Default (simple value): most used structure, corresponds to a single value. (ex: name = 'ABC')
List (array of values): list of all values on the property (ex: items = shirt, pant, pant)
Set (array of unique values): list of unique values (ex: items = shirt, pant)
You can define your property as 'Personal Identifiable Information (PII)', meaning this property contains personal information that could identify a user (email address, postal address, Customer ID...).
If this option is set to true, the property will be considered as PII.
If this property is stored and set as PII, it will be automatically encrypted in AES-256 before being written to the database.
You can define bespoke responses to data validation scenarios:
Missing Required Property: Choose to either accept or reject events entirely when a required property is missing.
Unexpected Value Format: Configure the system either to omit properties that don’t match the expected format or to accept/reject event entirely.
Unspecified Events: You have the option to accept events that are not predefined in your schema. You can also decide if this action should trigger a warning in the Source Data Quality report.
Unspecified Properties: Decide whether to omit undefined properties or to accept them, with or without warnings.
Transform/fix your events before to send them to destinations
Cleaning, fixing and preparing your data is crucial for success when using the Platform activation capabilities. The no-code approach allows you to create transformations using simple formula based on basic functions and operators.
You can choose between different kind of transformations:
Rename event: simply change the name of an incoming event
Derive event: create a new event from an existing one (coming soon)
Modify properties: apply transformations and functions on properties
Filter event: define rules to filter incoming events (coming soon)
Custom code: create your own transformations (coming soon)
First, give a name to the transformation and add a description if needed.
Select on which sources the rename transformation will be applied.
Select which event should be renamed. It is also possible to add more conditions, like rename the page view event to product view, if 'page_type' contains 'product'.
Last step, enter the new event name.
First, give a name to the transformation and add a description if needed.
Select on which sources the transformation will be applied.
Define conditions on events and/or properties, the transformation will be applied only regarding these conditions.
You can add multiple values (like event name is purchase
; add to cart
). In that case, it is an OR condition (event name is purchase
OR add to cart
)
Select properties to transform. You can:
rename the property ('rename to' option)
Example: rename 'date_of_birth' to 'birthdate'
map/link to an existing property ('copy value from' option)
Example: map property 'price' to 'revenue'
delete a property ('delete' option)
Example: always remove property 'tax_amount' from event 'purchase'
change the value ('set value to' option)
With the 'set value to' option, you can simply change the value (ex: 'currency' = EUR by default) but you also have access to formulas based on basic functions and operators.
For example, you can hash incoming data with SHA-256
or MD5
functions. You can also REPLACE
or SUBSTITUTE
a text to replace by another one, or EXTRACT
a specific text in a chain.
You can also use operators like AND / OR / NOT
and create operations with * / + -
, full list here.
For example, to calculate the revenue with taxes, you can do: revenue = value*1,2
.
Most companies detect issues after their team has used bad data to make decisions or trigger campaigns. Quickly take action on every invalid event with in-app reporting and daily email digests.
In the Normalized datalayer interface, you will be able to define the schema of your data and define the validation rules that will feed your data quality workflow.
Writing event specification allows you to automate the QA process, to feed the in the Source data quality dashboard, but also to define realtime alerts to react quickly when errors occur on your data.
To react quickly to data errors, while your IT team corrects the problem at source, you can rely on the live data transformation feature, aka Data cleansing.
Having a good data quality on each source is essential, but being able to check also the quality of the data transmission is at least as important. For each destination, you can view the event delivery history, quickly identify errors and define realtime alerts with a personnalized thresold.
In case of doubt or to further investigate a data problem, you can access the logs of the events sent. You benefit both from Event source inspector and Destination event inspector to search for a specific event, analyse it's data and better understand the issue.