Clicks to conversions with BigQuery & Data Studio

Ivan Franco
5 min readJun 11, 2018

I’ve been using Google Data Studio for a while to visualise the performance of my team’s products. I find it to be a great tool for bringing together lots of data into a graphical framework. However, despite all its benefits, there are often times when Data Studio lacks the resources to answer the question I’m working to solve. Luckily for us, Data Studio has a bigger brother (Big Query) that can help us to answer almost any question.

The Situation

After a successful product revamp, we created an internal campaign to drive more traffic to our product landing page (and conversions alongside it). My job was to ensure that the team could see how many times our adverts were clicked and whether this resulted in a purchase of the product. On the face of it, this seemed to be a task for vanilla Data Studio: plug in the clicks on our internal adverts, check the resulting conversions — viola!

Unfortunately data like this is hard for Data Studio and Google Analytics (GA) to handle. Whilst it is easy to see clicks on your adverts via event reporting, stitching those events to the purchase of a specific product at checkout isn’t as easily accessible. This is a great example of where BigQuery can be your best friend.

The query I used to get around this goes as follows.

The Query

SELECT
date,
SUM(totals.visits) clicked_the_ad,
# count sessions which bought product X COUNT((
SELECT
DISTINCT(CONCAT(fullvisitorID, CAST(visitID AS string)))
FROM
UNNEST(hits) AS hits,
UNNEST(hits.product) AS product
WHERE
product.v2productName LIKE 'X')) bought_product_x
FROM
`XXXX.ga_sessions_*` GA
# Filter all sessions for those originating from campaign XWHERE
EXISTS(
SELECT
1
FROM
UNNEST(GA.hits)
WHERE
eventinfo.eventAction = 'clicks_on_button')
AND _TABLE_SUFFIX BETWEEN FORMAT_DATE(“%Y%m%d”, DATE_SUB(CURRENT_DATE(), INTERVAL 7 DAY))
AND FORMAT_DATE(“%Y%m%d”, DATE_SUB(CURRENT_DATE(), INTERVAL 1 DAY))
GROUP BY
1`

Part 1 — where exists filter

WHERE
EXISTS(
SELECT
1
FROM
UNNEST(GA.hits)
WHERE
eventinfo.eventAction = 'clicks_on_button')

The first part of this query selects all sessions where someone clicked one of our internal ads (the ‘where exists’ function says ‘select any sessions where such-and-such condition evalutes to true’ — in this instance that they clicked an ad). The useful thing about ‘where exists’ is that the selected sessions are kept intact, and we can drill down further into them to find out more information.

In our example, now that we’ve selected sessions where someone clicked an ad, we can start to find out more information about those sessions, principally (i) the date of the sessions, (ii) how many there were, and (iii) how many products were bought. The first two of these are easy enough (adding a date dimension to the opening select statement and summing visits) but the third step is a little trickier as it utilises a powerful aspect of Standard SQL for BigQuery — unnesting columns on the fly.

Part 2 — unnesting columns within the opening select statement

COUNT((
SELECT
DISTINCT(
CONCAT(fullvisitorID, CAST(visitID AS string)))
FROM
UNNEST(hits) AS hits,
UNNEST(hits.product) AS product
WHERE
product.v2productName LIKE 'X')) conversions
FROM
`XXXX.ga_sessions_*` GA

To explain how this works it’s necessary to give a quick overview of how BigQuery stores GA data. In BigQuery, each GA session is stored in a single row, with columns holding data on different attributes of the session. Some of columns have one value (for instance device category), and others have multiple ‘nested’ values for repeated actions people enacted on your site (e.g. pages viewed, buttons clicked, products bought).

In order to find out whether the sessions we’ve filtered include the purchase of Product X, we count the number of our filtered ‘ad click’ sessions that included a purchase of Product X. To do this we create a sub-select in the opening select statement where we ‘unnest’ the values in the product column and each session that purchased Product X.

There are two important things to note about this part of the query:

  1. The statement is wrapped in double parantheses to tell BigQuery that the expression is scalar (meaning it should only return one row).
  2. The session is concatenated into a single unique row using ‘concat’ and ‘distinct’ functions. We could have summed ‘totals.visits’ but sessions with more than one product would be duplicated.

Part 3 — including a dynamic date range

AND _TABLE_SUFFIX BETWEEN FORMAT_DATE(“%Y%m%d”, DATE_SUB(CURRENT_DATE(), INTERVAL 7 DAY))
AND FORMAT_DATE(“%Y%m%d”, DATE_SUB(CURRENT_DATE(), INTERVAL 1 DAY))
GROUP BY
1`

Finally, in order to make the data useful to our Data Studio dashboard, we add the dynamic date range above, ensuring that the data is always up to date and we can see changes over time, which takes us back to the final query below — voila!

SELECT
date,
SUM(totals.visits) clicked_the_ad,
# count sessions which bought product XCOUNT((
SELECT
DISTINCT(CONCAT(fullvisitorID, CAST(visitID AS string)))
FROM
UNNEST(hits) AS hits,
UNNEST(hits.product) AS product
WHERE
product.v2productName LIKE 'X')) bought_product_x
FROM
`XXXX.ga_sessions_*` GA
# Filter all sessions for those originating from campaign XWHERE
EXISTS(
SELECT
1
FROM
UNNEST(GA.hits)
WHERE
eventinfo.eventAction = 'clicks_on_button')
AND _TABLE_SUFFIX BETWEEN FORMAT_DATE(“%Y%m%d”, DATE_SUB(CURRENT_DATE(), INTERVAL 7 DAY))
AND FORMAT_DATE(“%Y%m%d”, DATE_SUB(CURRENT_DATE(), INTERVAL 1 DAY))
GROUP BY
1`

Caveat

The eagle-eyed among you may have spotted one flaw in the approach: we have not applied a sequence to the interaction. If someone purchased our product and then clicked the ad (i.e. the reverse order to what we want), they would still be picked up here as having converted due to the ad. The first thing to say is that due to the nature of the product, it is unlikely that someone would have bought our product and then clicked an ad for that same product later in the session (I’ve checked in GA with funnel analysis and this checks out). However, you could get around this using a more advanced query that takes into account the hit_number and a concatenated regex of all pages seen in a session. I will provide details on this in a later post.

More Advanced

There are two ways to make this query even more powerful. The first would be to add a sequential component (to address the above), and the way I’ve managed to do this is by using STRING_AGG to group hits in a session into one continuous string. The other way would be by using Google Apps Script to automate pulling the data into a Google Sheets every day, with the Google Sheet acting as a data source for Data Studio. This method is more effective in terms of cost as the query only needs to run over yesterday, rather than the 7 day window in our query.

*This is one of my first forays into BigQuery’s Standard SQL, stay tuned for more posts once I’ve explored the feature further. Also, if you have any tips feel free to get in touch.

--

--