Google, Ghostery and the limits of Ad Blocking.

On tracking some of the trackers some of the time.

Ghostery claims to show you “the invisible web” and to block activity that could track you, but it omits to report tracking by Google.

The code is not open source, but it is possible to see how it basically operates: it detects the resources accessed when the web pages you visit are rendered and compares their URLs against a curated list of supposed “trackers”. The list consists of regex strings to be matched against the full URL, i.e. not only the domain name but also the targeted resource, usually a particular script library.
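The matching step described above can be sketched roughly as follows. This is only an illustration in Python, not Ghostery's actual code: the entries in `TRACKER_PATTERNS`, the function names and the example URLs are all hypothetical stand-ins for a curated list and the requests a page might make.

```python
import re

# Hypothetical entries standing in for a curated tracker list.
# Each pattern is matched against the full URL, not just the host name.
TRACKER_PATTERNS = {
    "Google Analytics": re.compile(r"google-analytics\.com/(ga|urchin|analytics)\.js"),
    "Google+ Platform": re.compile(r"apis\.google\.com/js/plusone\.js"),
    "AddThis": re.compile(r"addthis\.com/js/"),
}


def match_trackers(url):
    """Return the names of all list entries whose pattern matches the URL."""
    return [name for name, pattern in TRACKER_PATTERNS.items() if pattern.search(url)]


def unlisted(urls):
    """Return the requested URLs that no pattern in the list matches."""
    return [url for url in urls if not match_trackers(url)]


# Illustrative requests made while a page renders (assumed URLs).
requests_seen = [
    "http://www.google-analytics.com/ga.js",
    "http://translate.google.com/translate_a/element.js",
]
for url in requests_seen:
    print(url, "->", match_trackers(url) or "not in list")
```

With a list like this, any request whose URL matches no pattern is neither shown to the user nor blocked, however many cookies accompany it.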

If a match is found, this is displayed to the user in a popup panel, along with some information about the company responsible for the script. The information is often inaccurate and misleading. Ghostery does let you set the script to be blocked whenever the site is visited, and many of the URIs quoted in its list are used for the delivery of behavioural advertising, but some of the script libraries listed are clearly not “trackers”, just components of a website that happen to be hosted on a different domain. For example, our privacy-enabling product CookieQ, which collects no PII, is loaded from an external domain because it has to be in order to support dashboard manageability, and multiple-site specific and web-wide consent options for people visiting our customers' sites (we have had to re-engineer the product to avoid its privacy-enabling functionality being disabled by Ghostery).

More importantly, the list they do have is surprisingly incomplete. For example, there is no reference in their list to many of the script libraries and other resources using Google's google.com host domain. Even if all the “trackers” in the Ghostery list are set to be blocked, it will fail to stop the majority of references to google.com, so people will continue to be tracked on any site that does reference it. Script libraries supporting products like Google reCAPTCHA, Translate, Custom Search Engine, User Distributed Search, the all-encompassing Google Loader and many other resources are accessed via this domain or one of its subdomains. One or more of them is referenced on the majority of public-facing websites. Google receives cookies on all of them, and places cookies when accessed as a third party on many of them.

The Google+ social networking script and Google Analytics are referenced in Ghostery's list, but people often do not wish to block the former, and have been led to believe by governments and others that Google does not use Analytics data (which is gathered by the separate google-analytics.com domain) for its own purposes. But even if the particular script resource for Google+ is blocked, any cookies in the google.com domain will still be sent if it is referenced by other resources, so tracking is hardly diminished.

For example, the UK National Health Service website contains information about various health conditions and illnesses. One would hope that people visiting these pages would not have their web history tracked, especially as this could be classed as sensitive PII, but this is not the case. Even so, many might assume they will not be tracked on these sites if they use the Ghostery extension; unfortunately this is not the case either. Here is a screenshot of Ghostery (version 5.4.0; see below for our Feb 2015 update on version 5.4.2) on the NHS page about HIV and AIDS. It fails to show the page's reference to Google Translate, which uses a subdomain of google.com, namely translate.google.com. This Firebug trace shows that, even if the Google Translate function is not used and the Ghostery extension is configured to block everything, every visit to this page is recorded by sending a unique user-identifying cookie to google.com.

The White House website contains some trackers that are successfully blocked by Ghostery, such as the AddThis tracker. But because there is a reference to a YouTube video, any visit to the home page will result in citizens being tracked by Google even with all Ghostery blocking turned on. Here is the Firebug trace of this. Google does not place a cookie when this site is visited, but if there are already cookies there, which is usually the case, they will be sent.

As another example, here is a screenshot of Ghostery failing to show tracking via google.com on a randomly selected site that happens to use the Google reCAPTCHA library. It shows Google+, DoubleClick and Google Analytics, but not the fact that UID cookies (such as the ubiquitous PREF and NID) are still being sent to Google via google.com. As can be seen from this Firebug Net activity display, the reCAPTCHA ajax request places a cookie. Once the cookie is placed, as it has been on the majority of people’s browsers, it and any other cookies already stored in the domain will be sent to Google when anyone visits any page referencing google.com or a subdomain of it.
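The reason one stored cookie reaches Google from so many different pages is the browser's domain matching rule: a cookie scoped to a domain is attached to requests for that host and all of its subdomains. The sketch below is a simplified Python illustration of that rule. It ignores cookie paths, the Secure flag and IP-address hosts, and the contents of `JAR` are hypothetical.

```python
def domain_matches(request_host, cookie_domain):
    """Simplified domain match: the host equals the cookie domain or is a subdomain of it."""
    host = request_host.lower().rstrip(".")
    domain = cookie_domain.lower().lstrip(".")
    return host == domain or host.endswith("." + domain)


def cookies_sent(request_host, jar):
    """Return the names of stored cookies a browser would attach to a request for this host."""
    return [name for name, domain in jar if domain_matches(request_host, domain)]


# Hypothetical cookie store: (name, domain) pairs already in the browser.
JAR = [("NID", ".google.com"), ("PREF", ".google.com"), ("_ga", ".example.org")]

for host in ("translate.google.com", "apis.google.com", "www.google-analytics.com"):
    print(host, "->", cookies_sent(host, JAR))
```

Note that google-analytics.com is a different registered domain, so blocking it does nothing to stop cookies scoped to google.com from being sent when any other google.com resource is requested.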

The cookie stored in this case, with its name component set to "NID", is set to expire after a variable duration ranging from 2 days to 6 months. In fact it and the other cookies here are regenerated whenever a similar resource is accessed, and because this happens so frequently they are in effect immortal, leading to the permanent retention of the online activity they link to. Moreover, cookies in the google.com domain are sent via redirection (aka “cookie syncing”) when advertisements from the Google ad networks are rendered or YouTube videos are present on a page, although Ghostery does allow you to block the script from some of these primary vectors if you set it to block the ads.
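To see why a nominal expiry date offers little protection, consider this small Python sketch. It assumes, as a simplification, that every access to a google.com resource pushes the cookie's expiry forward by a fixed lifetime; the six-month figure is the upper bound mentioned above, and the visit dates are invented for illustration.

```python
from datetime import datetime, timedelta

# Assumed lifetime: roughly the six-month upper bound mentioned above.
LIFETIME = timedelta(days=182)


def cookie_survives(visits, lifetime=LIFETIME):
    """Return True if the cookie set on the first visit is still alive at the last one,
    assuming every visit resets the expiry to visit time + lifetime."""
    expires = None
    for visit in sorted(visits):
        if expires is not None and visit > expires:
            return False  # the cookie lapsed before it could be refreshed
        expires = visit + lifetime
    return True


start = datetime(2015, 1, 1)
monthly = [start + timedelta(days=30 * i) for i in range(36)]  # about three years of visits
rare = [start, start + timedelta(days=365)]  # a single visit one year later

print(cookie_survives(monthly))
print(cookie_survives(rare))
```

Under these assumptions the identifier only lapses if the browser goes a full lifetime without touching any page that references the domain, which is unlikely given how widely it is embedded.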

Ghostery’s inability to show the existence of these actual trackers is easily demonstrated by visiting one of Google’s main sites. Here, for example, is a visit to google.co.uk, where Ghostery claims to find no trackers but where in fact cookies are exchanged with many of Google’s properties.

Omissions like these are inevitable with the curated list approach, because the continuously changing web and the multiple techniques for data collection mean the list can never be comprehensive. Moreover, organisational entropy, if not competitive challenge, will inevitably result in annoying false positives. Nevertheless, it is a mystery why a company so associated with the ecosystem of smaller AdTech companies blocks their ability to collect data more efficiently than it does that of their biggest competitor. Only a transparent, widely recognised indication of consent with regulatory backing, as hopefully the Do Not Track signal will become in Europe, can ensure responsible web activity collection will be widespread enough to return trust to the web, and create a level playing field where smaller companies have a chance to prosper.

Another site where a visit causes cookies to be sent to Google is ProPublica.org. Anyone simply visiting its About Us page will have this web activity automatically sent to Google; see this Firebug screenshot.

It is interesting that investigative journalists from ProPublica, on whose board sits a representative of Warburg Pincus, the private equity firm that funds Ghostery Inc., regularly refer to Ghostery and interact with its product and team without appearing to be aware of these issues.

UPDATE 25th Feb. 2015

We have downloaded the latest version of Ghostery (5.4.2) into Firefox and the request to the google.com subdomain (translate.google.com) is still not being detected or blocked, and cookies are still being set in the main domain by the response. Note that this is an embedded third-party resource that is not clicked or interacted with: the request is sent whenever someone visits the UK NHS HIV/AIDS page.

As with the 5.4.0 release, Google Analytics on the google-analytics.com domain is being blocked, but most of the resources in google.com or its subdomains are not.

The URL for the NHS page is http://www.nhs.uk/conditions/HIV/Pages/Introduction.aspx

We also tried Ghostery 5.4.2 on the latest version of Chrome, with the same results.

UPDATE 26th Feb. 2015

We found that translate.google.com has been added to the Ghostery list since we reported yesterday (the 25th). As a result, the NHS HIV page no longer sends tracking information to Google if Ghostery is set to block the Google Translate widget. The Ghostery description of the widget shows Ghostery examined it at 5.33am this morning (almost 5 months after we initially reported it). It must have been done a few hours after our updated post was re-tweeted at midday yesterday.

As far as we have been able to discover, they did not fix the many other tracking references to the google.com domain. For example, this shows that Google's PREF and NID cookies are still being sent (with full Ghostery blocking switched on) from pages that use the reCAPTCHA widget. The same applies to many of the references to apis.google.com, for example this one on Mail Online, where the Google cookies are being set by the third-party access.

Ghostery is also still not detecting other Google-owned domains. Here is a brand site that loads an embedded YouTube video. YouTube is owned by Google (references to youtube.com often cause a redirect to google.com), and here tracking cookies are being sent even when Ghostery is set to full blocking.