Python Google Analytics Client: How to use it and how to help make it better


My Google Analytics Python client (http://github.com/clintecker/python-googleanalytics/tree/master) is at the point now where someone could reliably use it to get data out of their profiles. Let me just give everyone a quick tutorial on how you would do this (there are a lot of examples in tests.py too).

You start by importing the googleanalytics.Connection class. This object authorizes against GA, holds your authorization token, and contains the machinery for making requests. Speaking of your Google credentials, you can specify these in two ways. The first, which I like, is to create a configuration file in your home directory named .pythongoogleanalytics. Populate it like this:

[Credentials] 
google_account_email = youraccount@gmail.com 
google_account_password = yourpassword

The second method is to supply the credentials directly to the Connection object in your code like so:

>>> from googleanalytics import Connection 
>>> connection = Connection('clintecker@gmail.com', 'fakefake')

If you are using the former (~/.pythongoogleanalytics) method, you can just make naked Connection() calls to set up your connection object.
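
For anyone curious how an INI-style file like the one above can be read, here is a minimal sketch using the standard library's configparser. It is not the client's actual loading code; the read_credentials helper is a hypothetical name, and the demo writes a throwaway file instead of touching a real ~/.pythongoogleanalytics.

```python
import configparser
import os
import tempfile


def read_credentials(path):
    """Return (email, password) from an INI-style credentials file.

    Hypothetical helper for illustration; not part of the client.
    """
    # interpolation=None keeps a '%' in a password from being treated specially.
    parser = configparser.ConfigParser(interpolation=None)
    # parser.read() silently skips missing files, so check its return value.
    if not parser.read(path):
        raise IOError("no credentials file at %s" % path)
    section = parser["Credentials"]
    return section["google_account_email"], section["google_account_password"]


# Demonstrate with a temporary file laid out like the example above.
with tempfile.TemporaryDirectory() as tmp:
    demo_path = os.path.join(tmp, ".pythongoogleanalytics")
    with open(demo_path, "w") as handle:
        handle.write(
            "[Credentials]\n"
            "google_account_email = youraccount@gmail.com\n"
            "google_account_password = yourpassword\n"
        )
    email, password = read_credentials(demo_path)
```

Keep in mind that a file like this stores the password in plain text, so it is worth restricting its permissions to your own user.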

Listing/getting accounts

You can retrieve a list of profiles associated with your account like so:

>>> from googleanalytics import Connection 
>>> connection = Connection('clintecker@gmail.com', 'fakefake')
>>> accounts = connection.get_accounts()

This will return a list of account objects which you could use to retrieve data. The Connection.get_accounts method also accepts pagination parameters:

>>> accounts = connection.get_accounts(start_index=10, max_results=5)

Both are optional. start_index defaults to 1 (the listing is 1-indexed), and max_results defaults to None, which means a naked call just returns all your accounts.
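
To make the 1-indexed semantics concrete, here is a small local sketch. The real paging happens on Google's side; the paginate function and the profile names are hypothetical and only mimic how start_index and max_results are described above.

```python
def paginate(items, start_index=1, max_results=None):
    """Slice a list using a 1-indexed start and an optional result cap."""
    if start_index < 1:
        raise ValueError("start_index is 1-indexed and must be >= 1")
    begin = start_index - 1  # convert to Python's 0-indexed slicing
    end = None if max_results is None else begin + max_results
    return items[begin:end]


# Twenty made-up profile names standing in for account objects.
profiles = ["profile-%d" % n for n in range(1, 21)]

everything = paginate(profiles)  # naked call: all items
page = paginate(profiles, start_index=10, max_results=5)  # items 10 through 14
```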

Alternatively: If you know the profile ID you want to use, you can use the Connection.get_account method. This currently does no validation, so if you provide an invalid profile ID you can expect things to break. It works like you might expect:

>>> account = connection.get_account('1234')

Retrieving Data

Once you have an Account object you can start pulling data from it. Reports are built by specifying a combination of dimensions and metrics. Dimensions are things like Browsers, Platforms, PagePath. Metrics are generally numerical data like pageviews, visits, percentages, elapsed time, and so forth. Google has a long reference to these here.

Google specifies all these with ga: prepended to each dimension or metric. Right now I require that you leave off the ga: part: when you want ga:pagePath, you pass in pagePath.
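
As a rough illustration of what that implies, the prefix has to be added back somewhere before a request goes out. The add_ga_prefix helper below is hypothetical, not the client's actual code:

```python
def add_ga_prefix(names):
    """Turn bare names like 'pagePath' into Google's 'ga:pagePath' form."""
    return ["ga:" + name for name in names]


dimensions = add_ga_prefix(["browser", "pagePath"])
metrics = add_ga_prefix(["pageviews"])
```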

In addition to dimensions and metrics you can specify sorting and filtering parameters, both optional. Definitely required though are lower and upper bounds to the time frame you wish to gather data from. These can be datetime.datetime or datetime.date objects. Here’s a really basic call:

>>> from googleanalytics import Connection 
>>> import datetime 
>>> connection = Connection('clintecker@gmail.com', 'fakefake') 
>>> account = connection.get_account('1234')
>>> start_date = datetime.date(2009, 4, 10)
>>> end_date = datetime.date(2009, 4, 10)
>>> account.get_data(start_date=start_date, end_date=end_date)
[]

This will, of course, return no data (no dimensions or metrics specified), but is valid.
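
Both date types can be accepted cheaply because datetime.datetime is a subclass of datetime.date. The sketch below assumes the bounds end up as YYYY-MM-DD strings in the request, which is the format Google's API uses for its date parameters; format_ga_date is a hypothetical name, not a function from the client.

```python
import datetime


def format_ga_date(value):
    """Format a date or datetime as a YYYY-MM-DD string."""
    # strftime exists on both types, so no isinstance check is needed.
    return value.strftime("%Y-%m-%d")


from_date = format_ga_date(datetime.date(2009, 4, 10))
from_datetime = format_ga_date(datetime.datetime(2009, 4, 10, 15, 30))
```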

Here’s one that would give you some good data: a list of browsers that accessed your site in your timeframe and how many page views each of those browsers generated.

>>> from googleanalytics import Connection 
>>> import datetime 
>>> connection = Connection('clintecker@gmail.com', 'fakefake') 
>>> account = connection.get_account('1234')
>>> start_date = datetime.date(2009, 4, 10)
>>> end_date = datetime.date(2009, 4, 10)
>>> account.get_data(start_date=start_date, end_date=end_date, dimensions=['browser',], metrics=['pageviews',])
[<DataPoint: ga:6367750 / ga:browser=Chrome>, <DataPoint: ga:6367750 / ga:browser=Firefox>, <DataPoint: ga:6367750 / ga:browser=Internet Explorer>, <DataPoint: ga:6367750 / ga:browser=Mozilla Compatible Agent>, <DataPoint: ga:6367750 / ga:browser=Safari>]

You could get Google to sort that for you (note Firefox is first now):

>>> from googleanalytics import Connection 
>>> import datetime 
>>> connection = Connection('clintecker@gmail.com', 'fakefake') 
>>> account = connection.get_account('1234')
>>> start_date = datetime.date(2009, 4, 10)
>>> end_date = datetime.date(2009, 4, 10)
>>> account.get_data(start_date=start_date, end_date=end_date, dimensions=['browser',], metrics=['pageviews',], sort=['-pageviews',])
[<DataPoint: ga:6367750 / ga:browser=Firefox>, <DataPoint: ga:6367750 / ga:browser=Internet Explorer>, <DataPoint: ga:6367750 / ga:browser=Safari>, <DataPoint: ga:6367750 / ga:browser=Chrome>, <DataPoint: ga:6367750 / ga:browser=Mozilla Compatible Agent>]

And you could do some fun filtering: get a list of browsers, sorted descending by page views, and filtered to only contain browser strings which match any of the three regexes below (starting with Fire OR Internet OR Saf):

>>> from googleanalytics import Connection 
>>> import datetime 
>>> connection = Connection('clintecker@gmail.com', 'fakefake') 
>>> account = connection.get_account('1234')
>>> start_date = datetime.date(2009, 4, 10)
>>> end_date = datetime.date(2009, 4, 10)
>>> filters = [
...     ['browser', '=~', '^Fire', 'OR'],
...     ['browser', '=~', '^Internet', 'OR'],
...     ['browser', '=~', '^Saf'],
... ]
>>> account.get_data(start_date=start_date, end_date=end_date, dimensions=['browser',], metrics=['pageviews',], sort=['-pageviews',], filters=filters)
[<DataPoint: ga:6367750 / ga:browser=Firefox>, <DataPoint: ga:6367750 / ga:browser=Internet Explorer>, <DataPoint: ga:6367750 / ga:browser=Safari>]
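
To give a feel for what a filter list like this might become on the wire, here is a sketch that flattens it into Google's filter syntax, where a comma joins OR-ed expressions and a semicolon joins AND-ed ones. This is not the client's actual implementation: build_filter_string is a hypothetical name, and both the 'AND' keyword and the AND default for entries without a combinator are assumptions made for the example. It also skips URL-encoding and the escaping of commas or semicolons inside expressions.

```python
def build_filter_string(filters):
    """Flatten [name, operator, expression, combinator] entries into one string."""
    # 'OR' appears in the post; 'AND' is an assumed counterpart.
    separators = {"OR": ",", "AND": ";"}
    pieces = []
    for position, entry in enumerate(filters):
        name, operator, expression = entry[:3]
        pieces.append("ga:" + name + operator + expression)
        is_last = position == len(filters) - 1
        if not is_last:
            # Assumption: an entry without a combinator is AND-ed with the next.
            combinator = entry[3] if len(entry) > 3 else "AND"
            pieces.append(separators[combinator])
    return "".join(pieces)


filters = [
    ["browser", "=~", "^Fire", "OR"],
    ["browser", "=~", "^Internet", "OR"],
    ["browser", "=~", "^Saf"],
]
filter_string = build_filter_string(filters)
```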

Data

At this point you should be asking me how this data is returned to you. In the above examples, the data is returned as a googleanalytics.data.DataSet object, which is essentially a Python list with three “properties” (list/tuple/dict) added to it. This list is populated with googleanalytics.data.DataPoint objects. Each of these has an associated dimension and metric (e.g. “Firefox” and “30293”) and a little more data.

So how do you get useful data? You could iterate over the DataSet and access each DataPoint’s metric and dimension properties directly, or you could output the whole dataset as a list of lists, tuple of tuples, or dictionary. Example:

>>> from googleanalytics import Connection 
>>> import datetime 
>>> connection = Connection('clintecker@gmail.com', 'fakefake') 
>>> account = connection.get_account('1234')
>>> start_date = datetime.date(2009, 4, 10)
>>> end_date = datetime.date(2009, 4, 10)
>>> data = account.get_data(start_date=start_date, end_date=end_date, dimensions=['browser',], metrics=['pageviews',], sort=['-pageviews',])
>>> data.list
[['Firefox', 21], ['Internet Explorer', 17], ['Safari', 17], ['Chrome', 6], ['Mozilla Compatible Agent', 5]]
>>> data.tuple
(('Firefox', 21), ('Internet Explorer', 17), ('Safari', 17), ('Chrome', 6), ('Mozilla Compatible Agent', 5))
>>> data.dict
{'Chrome': 6, 'Internet Explorer': 17, 'Firefox': 21, 'Safari': 17, 'Mozilla Compatible Agent': 5}

If you’re concerned with the sort-order, you shouldn’t really use the dict output as order isn’t guaranteed. list and tuple will retain the sorting order that Google Analytics output the data in.
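
The shape described above, a list with three extra read-only views, can be sketched in a few lines. This is not the real googleanalytics.data code: DataPointSketch and DataSetSketch are hypothetical stand-ins that only mirror the dimension and metric attributes and the list/tuple/dict properties mentioned in the text.

```python
class DataPointSketch(object):
    """Stand-in for a data point: one dimension value and one metric value."""

    def __init__(self, dimension, metric):
        self.dimension = dimension
        self.metric = metric


class DataSetSketch(list):
    """A plain list of points with three convenience views added."""

    @property
    def list(self):
        return [[point.dimension, point.metric] for point in self]

    @property
    def tuple(self):
        return tuple((point.dimension, point.metric) for point in self)

    @property
    def dict(self):
        return dict((point.dimension, point.metric) for point in self)


rows = [("Firefox", 21), ("Internet Explorer", 17), ("Safari", 17)]
data = DataSetSketch(DataPointSketch(name, views) for name, views in rows)

# Iterating directly works too, since the data set is itself a list.
total_pageviews = sum(point.metric for point in data)
```

Because the list and tuple views are built by walking the data set in order, they keep whatever ordering the points arrived in.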

Caveats

The Google Analytics API allows for specifying as many dimensions and metrics in a report as you like. These are really useful and I’ve got basic support for these in this client (you can specify as many as you like and the client will pass them to Google and we’ll get back the right data), but my data processor code only looks for the first dimension and associated metric in each returned row. So you can mess around with specifying multiple metrics/dimensions but please note you may or may not get back the data you wanted. This should be fixed shortly.

Help!

If this Python client doesn’t have a feature you’re looking for, by all means please fork the project on GitHub and start coding. I only ask that if you add any functionality, you also add associated tests so I can verify that I don’t break what you’ve done. If you were super awesome you would also add docs for any new features in the README (which is in dire need of updating!).