Using BigQuery is a great way to generate some custom in-depth analysis of your Google Analytics data, but to really unlock that data, it helps to know a few tricks. Today, let’s talk about one of the most important ones: Using the
One of the trickier parts of working with Firebase data in BigQuery — and this applies not just to Analytics data but to Crashlytics data, too — is that your data is not organized in nice little rows and columns like this: Instead, your data consists of rows that maybe look a little more like JSON objects, where columns could contain a single bit of data, but they could also contain arrays, structs, or even arrays of structs. Maybe something a little more like this: Working with this data can be weird at first, but once you learn a couple of tricks, it’s not so bad. Let’s start by looking at our public sample data for our game of Flood-it. Feel free to follow along; this data is available for anybody to play with. In particular, I’m interested in the SELECT * If you run this, you’ll see a number of level complete events, along with a couple of other event parameters that seem interesting. We’ve got a You might also notice that each of these parameters doesn’t have a single value column, but several: Basically, you can think of this as kind of a struct that we use to store your parameter values, and in practice most of them end up being nil. But it’s a nice way of making sure we can take a number of different parameters with a bunch of different formats and store them in a consistent way. So, let’s figure out the average user score just by looking at event
parameters where SELECT * Oh, dear. That’s… a really weird error message. The problem here is that This is where the To use a simpler example, let’s imagine we have a
table full of spaceships. And Right now, if I wanted to find spaceships that contain a crew member named Zoe, I couldn’t do that with a line like this. SELECT * FROM `spaceships` WHERE crew = "Zoe" But if I were to write something like this… SELECT * FROM `spaceships` CROSS JOIN UNNEST(crew) as crew_member What BigQuery will do is take every individual member in my crew array, and add it on as a new column in my row called Once I’ve done that, it’s easy to select spaceships with crew members named Zoe. SELECT * FROM `spaceships` You’ll find that in practice, though, most BigQuery developers will replace the SELECT * FROM `spaceships`, It still does the same thing, it just looks cooler. (BigQuery developers are all about looking cool.) So going back to our example, we essentially have an array (or repeated record) of SELECT * FROM `my_analtyics_table`, …what BigQuery will do is take each individual event parameter and expand them out into
a new column called You’ll notice that Since, in practice, it’s a little weird to have the full repeated So with all that in mind, let’s go back to the task of analyzing the average score for users who complete a quickplay level. I can now grab just those events along with the unnested SELECT event_name, param
And then, it’s really easy to grab the actual value of the value parameter by looking at SELECT event_name, param.value.int_value AS score From there, I can perform interesting tasks like calculating the mean, getting some quantiles, and figuring out the standard deviation… SELECT AVG(param.value.int_value) AS average, Hey, maybe while I’m at it, let’s make a quick histogram of all the values! SELECT param.value.int_value AS score, COUNT(1) AS count So… this is interesting. There’s a pretty big peak around 21 or 22 moves… but if you keep looking, it seems like there’s also a couple of smaller peaks around 29 moves and in the 34–36 range. The reason for that is the other parameter I was talking about — the In fact, we can kinda see that by looking at both the board and the value parameters for our event. SELECT param Sure enough, it seems like the “value” parameter is higher when it follows a But how can we actually prove this out in BigQuery? Right now, making this analysis is kinda difficult because we have different values in different rows. Now, there is a way we can bunch these rows together. If we add back the SELECT MAX(if(param.key = "value", param.value.int_value, NULL)) AS score, And then we can analyze score by board type… SELECT ANY_VALUE(board_type) as board, AVG(score) as average_score But this seems like a pretty awkward way of doing things. Wouldn’t it be better if we could find a way of more easily getting multiple values in the same row? Well it turns out you can, and we’ll find out how… in my next post! (Woo! Cliffhanger ending!!) How do you Unnest in BigQuery?To convert an ARRAY into a set of rows, also known as "flattening," use the UNNEST operator. UNNEST takes an ARRAY and returns a table with a single row for each element in the ARRAY . Because UNNEST destroys the order of the ARRAY elements, you may wish to restore order to the table.
What does Unnest function do?The UNNEST function returns a result table that includes a row for each element of the specified array. If there are multiple ordinary array arguments specified, the number of rows will match the array with the largest cardinality.
What is flatten in BigQuery?FLATTEN. When you query nested data, BigQuery automatically flattens the table data for you.
|