Google Analytics 4 - Proxy Mode
Anonymize data before sending it to Google
The protection of user privacy has become a necessity with the implementation of the GDPR. In accordance with this regulation, and because the Privacy Shield has been invalidated, you should remove any personally identifiable information from user data before transferring it to a tool owned by a US entity.
On June 7, 2022, the CNIL (the French data protection authority) recommended implementing proxification, along with other specific measures, to make the use of GA4 lawful.
The proxy mode option in the GA4 destination lets you anonymize data before sending it to Google.
When enabled, the proxy mode gives you access to a number of options that allow you to choose granularly how each parameter should be anonymized.
Below are the CNIL recommendations, each followed by the solution proxy mode offers to manage anonymization in a user-friendly way:
the absence of transfer of the IP address to the servers of the analytics tool. If a location is transmitted to the servers of the measurement tool, it must be carried out by the proxy server and the level of precision must ensure that this information does not allow the person to be re-identified (for example, by using a geographical mesh ensuring a minimum number of Internet users per cell); Solution: You can choose to obfuscate the IP (the last octet, i.e. the last portion of the IP address, is replaced by 0) or to delete it completely. Obfuscation is often preferred because it removes the identifying character of the IP while preserving country-level geolocation.
the replacement of the user identifier by the proxy server. To ensure effective pseudonymization, the algorithm performing the replacement should ensure a sufficient level of collision (i.e. a sufficient probability that two different identifiers will give an identical result after a hash) and include a time-varying component (adding a value to the hashed data that evolves over time so that the hash result is not always the same for the same identifier); Solution: You can choose to pseudonymize the client ID (cid) and the user ID (uid). This option replaces each ID with a hash of the ID plus a salt: the ID is first concatenated with a salt that changes approximately every 3 hours, then hashed using SHA-256. This produces anonymous IDs that are identical within a session but different from one session to the next, which prevents GA4 from tracking a user over time.
the removal of external referrer information from the site; Solution: You can choose to delete it or keep only internal domains.
the removal of any parameters contained in the collected URLs (e.g. UTMs, but also URL parameters allowing internal routing of the site); Solution: You can choose to delete all URL parameters, keep only specific parameters and/or keep UTMs in some cases.
reprocessing of information that can be used to generate a fingerprint, such as user-agents, to remove the rarest configurations that can lead to re-identification; Solution: Deleting the user agent completely seems to be the best option.
the absence of collection of cross-site or lasting identifiers (CRM ID, unique ID); Solution: Use the Properties Transformation feature or the Data Cleansing feature to handle these identifiers case by case by deleting, hashing or transforming your properties (see Manage custom PII data below). It is often easier to delete the user ID completely.
the deletion of any other data that could lead to re-identification; Solution: Use the Properties Transformation feature or the Data Cleansing feature to handle this data case by case by deleting, hashing or transforming your properties (see Manage custom PII data below).
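The IP obfuscation and identifier pseudonymization options described in the list above can be sketched as follows. This is a minimal TypeScript (Node.js) illustration, not the Commanders Act implementation: it assumes IPv4 input only, and the rotating salt is modeled as a hypothetical secret combined with the index of the current fixed 3-hour window, standing in for the salt rotation described above.

```typescript
import { createHash } from "crypto";

// Zero the last octet of a dotted-quad IPv4 address, e.g. 203.0.113.42 -> 203.0.113.0.
// Anything that is not a valid IPv4 address returns null, i.e. the IP is dropped entirely.
function obfuscateIpv4(ip: string): string | null {
  const octets = ip.split(".");
  const isIpv4 =
    octets.length === 4 &&
    octets.every((o) => /^\d{1,3}$/.test(o) && Number(o) <= 255);
  if (!isIpv4) return null;
  octets[3] = "0";
  return octets.join(".");
}

// ASSUMPTION: the salt is a (hypothetical) secret plus the index of the current
// fixed 3-hour window since the Unix epoch, so it changes every 3 hours.
const SALT_WINDOW_MS = 3 * 60 * 60 * 1000;

function currentSalt(secret: string, now: Date = new Date()): string {
  const windowIndex = Math.floor(now.getTime() / SALT_WINDOW_MS);
  return `${secret}:${windowIndex}`;
}

// Concatenate the ID with the salt, then hash with SHA-256 (hex digest).
function pseudonymizeId(id: string, salt: string): string {
  return createHash("sha256").update(id + salt).digest("hex");
}
```

A client ID would be pseudonymized with `pseudonymizeId(cid, currentSalt(secret))`. Because the salt depends only on the time window, the same cid hashes to the same value within a window and to a different value in the next one; with fixed window boundaries, a session that crosses a boundary also gets a new value.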
In addition to GA4 proxy mode, you can use the Properties Transformation or Data Cleansing feature on any destination to transform, delete or hash any event property before sending it to the partner.
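The per-property treatments mentioned above (delete, hash, or keep) can be pictured as a rule table applied to each event. The sketch below assumes a hypothetical rule format and hypothetical property names (crm_id, user_id, page_location); it does not reproduce the actual Properties Transformation or Data Cleansing configuration.

```typescript
import { createHash } from "crypto";

// Hypothetical rule format: property name -> treatment. Properties without a rule are kept as-is.
type Treatment = "delete" | "hash";
type EventProperties = Record<string, unknown>;

function sha256Hex(value: string): string {
  return createHash("sha256").update(value).digest("hex");
}

// Builds a new properties object; the input event is not modified.
function applyTreatments(
  props: EventProperties,
  rules: Record<string, Treatment>,
): EventProperties {
  const out: EventProperties = {};
  for (const [name, value] of Object.entries(props)) {
    const treatment = rules[name];
    if (treatment === "delete") {
      continue; // drop the property entirely
    }
    if (treatment === "hash" && value !== null && value !== undefined) {
      out[name] = sha256Hex(String(value)); // replace the raw value with its SHA-256 hex digest
      continue;
    }
    out[name] = value; // no rule, or nothing to hash: keep the value unchanged
  }
  return out;
}

// Example: delete the CRM ID, hash the user ID, keep everything else.
const cleaned = applyTreatments(
  { crm_id: "CRM-0042", user_id: "abc", page_location: "/home" },
  { crm_id: "delete", user_id: "hash" },
);
```

An unsalted hash of a low-entropy identifier can be reversed by trying candidate values, so hashing here pseudonymizes the value rather than anonymizing it.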
CNIL recommendation | Analysis | Suggestion/Impact |
---|---|---|
Absence of IP address transfer to measurement tool servers | This point is normal and standard. | Anonymize IPs by replacing the last octet with 0. Impact: location precision may drop from city level to region level. |
Replacement of user identifier by the proxy server | The CNIL suspects that Google may use this data in conjunction with other third-party data. | Add pseudonymization before sending the ID. No impact. |
Removal of external referrer information (or "referrer") from the site | Complete removal of the referrer is a surprising proposition, whereas reducing it to the domain name is common in other tools (Safari, ad blockers, ...). | Reduce the referrer to the domain name, which is a simple statistical measure of audience. If this suggestion is followed, there will be no impact. (If the CNIL recommendation is followed as stated, the tool becomes useless or almost useless.) You can also choose to keep only internal domains; in this case the impact can be significant, especially on traffic source reports. |
Removal of any parameters contained in collected URLs | It is legitimate to remove URL parameters containing personal information, but not necessarily general information such as UTM campaign parameters. | Remove URL parameters case by case if they contain personally identifiable data. UTM campaign parameters can be kept if they are properly managed, but the question arises for advertising click IDs such as fbclid and gclid. If the CNIL recommendation is followed, the tool becomes useless or almost useless, whereas if our recommendation is followed, the impact is small. If gclid is removed, Google Ads campaigns will need to be tagged with UTMs instead. |
Reprocessing of information that could contribute to generating a fingerprint | This request is legitimate and common, and will be implemented in browsers in the future. | Remove only unnecessary information from the user agent, to limit the loss of granular information such as the phone model. Deleting the user agent completely is the easiest option, but its impact is not negligible: device type (device_category) can no longer be distinguished. |
Absence of any cross-site or deterministic (CRM, unique ID) identifier collection | This request is considered irrelevant as long as consent is obtained. These IDs cannot be used by Google for other data cross-referencing. | It is recommended to request consent for the use of these IDs and to handle them securely if consent is given. You may also want to hash all these IDs before sending them to Google (in that case you can use Properties Transformation). |
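The referrer and URL-parameter suggestions in the table (reduce the referrer to its domain name or keep only internal domains; keep UTMs and drop other parameters such as gclid or fbclid) can be illustrated with the standard WHATWG URL API. This is a sketch under assumptions: the function names and example domains are hypothetical, and it does not reflect how the GA4 destination applies these options internally.

```typescript
// Reduce a referrer to its host name. If internalDomains is given, only referrers
// from those domains (or their subdomains) are kept; all others are dropped (null).
function reduceReferrer(referrer: string, internalDomains?: string[]): string | null {
  let host: string;
  try {
    host = new URL(referrer).hostname;
  } catch {
    return null; // not a valid absolute URL: drop it
  }
  if (internalDomains !== undefined) {
    const isInternal = internalDomains.some((d) => host === d || host.endsWith("." + d));
    return isInternal ? host : null;
  }
  return host;
}

// Remove every query parameter for which keep(name) returns false.
function filterUrlParams(pageUrl: string, keep: (name: string) => boolean): string {
  const url = new URL(pageUrl);
  const names: string[] = [];
  url.searchParams.forEach((_value, name) => {
    names.push(name);
  });
  for (const name of names) {
    if (!keep(name)) url.searchParams.delete(name); // removes all occurrences of this name
  }
  return url.toString();
}

// Keep UTM parameters plus an explicit allow-list; click IDs such as gclid/fbclid are dropped.
const keepUtmsAnd = (allowed: string[]) => (name: string): boolean =>
  name.startsWith("utm_") || allowed.includes(name);
```

For example, `filterUrlParams(url, keepUtmsAnd(["lang"]))` keeps the UTMs and a hypothetical routing parameter `lang` while dropping gclid, which is the trade-off described in the table: Google Ads campaigns would then need UTM tagging.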
As with classic GA4 server-side setups, you need to set up a single initial client-side gtag tag, which is triggered only once per visit and sends an empty initialization event. This is necessary due to current limitations of Google's protocol.
When your tracking domain is your first-party subdomain, set in domain management, the transport_url has to be set to:
https://YOUR_1ST_TRACKING_DOMAIN.com/cdp/events?tc_s=YOURSITEID&token=YOURSOURCEKEY&event_name=ga_session_start&ga_url_param=
When your tracking domain is our third-party collection domain, collect.commander1.com, the transport_url has to be set to:
https://collect.commander1.com/events?tc_s=YOURSITEID&token=YOURSOURCEKEY&event_name=ga_session_start&ga_url_param=
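The two transport_url formats above differ only in their base URL, so they can be assembled by a small helper. This is a hypothetical sketch: the helper name, the example host and IDs, and the G-XXXXXXXXXX measurement ID are placeholders, and the trailing, empty ga_url_param is copied verbatim from the documented URLs. transport_url itself is the native gtag parameter named in this section.

```typescript
// Where the initial gtag hit is collected (hypothetical type for this sketch).
type TrackingDomain =
  | { kind: "first-party"; host: string } // your subdomain set in domain management
  | { kind: "third-party" }; // collect.commander1.com

function buildTransportUrl(domain: TrackingDomain, siteId: string, sourceKey: string): string {
  const base =
    domain.kind === "first-party"
      ? `https://${domain.host}/cdp/events`
      : "https://collect.commander1.com/events";
  const query =
    `tc_s=${encodeURIComponent(siteId)}` +
    `&token=${encodeURIComponent(sourceKey)}` +
    "&event_name=ga_session_start&ga_url_param="; // kept exactly as documented
  return `${base}?${query}`;
}

// Browser usage, once the standard gtag.js snippet is on the page (placeholders only):
// gtag("config", "G-XXXXXXXXXX", {
//   transport_url: buildTransportUrl({ kind: "first-party", host: "metrics.example.com" }, "YOURSITEID", "YOURSOURCEKEY"),
// });
```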
Consequently, this first hit is no longer sent to Google but to the Commanders Act server, which transforms it into a CA event. This event is then sent to your GA4 destination, where it is processed (pseudonymized, etc., depending on the chosen settings) before being forwarded to Google.
Apart from this first client-side hit, all other website events can be sent from any source, for instance through our cact('trigger', 'myEventName', ...) function. These events also reach your GA4 destination, where the data is pseudonymized according to the destination's settings.
- In the settings tab, check the "Enable proxy mode" option and choose which pseudonymization/treatment you want to apply.
- If needed, hash your custom PII data through smart mapping, Properties Transformation or Data Cleansing.
Use the Event Inspector to inspect outgoing events.
The particularity of proxy mode is that you have to alter the GA4 hit URL, replacing google-analytics.com with the Commanders Act server-side collection URL. This is done via the native GA parameter transport_url (example code provided below).
The transport_url has to be set to your tracking URL.
Your tracking domain is either: