How developers can unplug from Google

RecoveringAStudent, Aug 25, 2019, 6:35:08 PM

    This essay is a part of the Plan for Action essay series.


    Unless you've been offline for the last couple of years, you are likely aware of how Google has been actively involved in censorship and manipulation of search results for political ends. Many guides have gone out about how to start getting around Google as a consumer, and that's a good thing. However, I think the real punch to the gut is going to come when businesses and developers start rejecting "free" Google products, as these "free" products are the base of Google's power.

    Fortunately, there are free open source software alternatives to a few of the leading Google products used by businesses. If you have some developer talent, you need to start recommending these in your company or organization. This post is a guide on how to make that case.


Google Analytics + Tag Manager

    If fighting Google's free products were a video game, Google Analytics (GA) would be the final boss, with Google Tag Manager (GTM) being the underboss just before. I would argue that this product alone is the primary source of Google's power. But unlike a game, we're tackling the biggest product first.

    GA is a tracking program that is responsible for following you around on the internet. When you visit a site, GA tags you with a cookie, and after this, virtually everything you do can be tracked, including scrolling, link clicks, information typed into forms, and so on. If two sites have GA (and virtually all big sites do), then GA will follow you from site to site. This following ability is what enabled "retargeting" technology, where looking at something on Amazon causes an ad about the product to appear on other websites.

    GTM is an enhancement to GA, in which you can add scripting to record specific information as custom GA events, or as custom dimensions in GA data.

    I consider these to be the "final boss" because, when people use GA, Google does not need to run a traditional search engine that crawls the web. Instead, Google has companies crawl their own sites and report all the search engine data Google needs.

Why people use GA + GTM

    The popularity of GA + GTM comes from their being highly useful in marketing. In one of my developer jobs, I wrote GTM scripts that fed GA reports so sophisticated that I could tell my boss what percentage of 200,000 visitors looked at a button after scrolling halfway down the page, clicked on it, and then chose not to fill out a sales form after looking at it for 30, 60, or 90 seconds. Even that is scratching the surface of how thoroughly people could be tracked. And it was all free, with no IT setup required.

    However, the main problem we had with GA was that Google had a habit of arbitrarily deleting large amounts of data (so you couldn't do multi-year comparisons), and the data was hard to extract from their API. You didn't "own" your data. And while many aren't thinking about it, how much value are you failing to capture because Google is selling all of your data?

Alternative: Matomo Analytics

    Something I started using on my site as an alternative to GA is the self-hosted Matomo Analytics. For a business that needs tracking data, Matomo is largely identical to GA, and it recently added its own version of Tag Manager. For a web software developer, one extra feature that GA does not have is the ability to integrate error messages from your software and to track the individual users who hit those errors. So you don't have to ask the awkward "what were you doing before the error happened?" You can see it yourself.

Screencap of the "visitor log" feature in Matomo.

    But the most important thing for our discussion is that Matomo can be self-hosted. In other words, it's on your server, you can export and process the data however you want, and you aren't giving Google free data to sell. Since I'm not using GA, I am by definition breaking the retargeting chain, because nothing visitors do on my site goes to Google. Your data also isn't deleted until you delete it, so you have full control. And in addition to tracking via a browser, Matomo makes server-side analytics much easier than GA does.
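
    As a rough illustration of server-side analytics, here is a minimal Python sketch that builds a page-view request for Matomo's HTTP tracking endpoint (matomo.php). The server address, site ID, and function name are assumptions made up for this example; the query parameters (idsite, rec, apiv, url, action_name, _id, rand) are the ones I understand the Matomo Tracking HTTP API to accept, so check them against the documentation for your version.

```python
import secrets
from urllib.parse import urlencode

# Hypothetical location of a self-hosted Matomo install; replace with your own.
MATOMO_ENDPOINT = "https://analytics.example.com/matomo.php"


def build_tracking_url(site_id, page_url, action_name, visitor_id=None):
    """Return a tracking URL that records one page view from the server side."""
    params = {
        "idsite": site_id,           # ID of the website as configured in Matomo
        "rec": 1,                    # must be 1 for the request to be recorded
        "apiv": 1,                   # tracking API version
        "url": page_url,             # full URL of the page being tracked
        "action_name": action_name,  # title shown in Matomo reports
        # Visitor ID: 16 hexadecimal characters; random if none is supplied.
        "_id": visitor_id or secrets.token_hex(8),
        "rand": secrets.token_hex(4),  # random value to defeat caching
    }
    return MATOMO_ENDPOINT + "?" + urlencode(params)


if __name__ == "__main__":
    print(build_tracking_url(1, "https://example.com/pricing", "Pricing"))
```

    Requesting the resulting URL from your backend (for example with urllib.request.urlopen) is what would actually log the visit. The sketch only builds the URL, so you can inspect it before sending anything.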

    Finally, Matomo has a variety of nice dashboards, so if your boss is complaining about GA deleting data again, you can show him some nice dashboards and explain how he can control the data with your awesome Matomo server. I think we can flip some businesses on this, and the more we flip, the less ability Google has to use websites as free data harvesters.


GMail

    If you remember those heady days of the early 2000s when GMail came out, you remember (and probably bought into) the marketing campaign in which someone who already had a GMail account had to give you one of a limited number of invitations (I believe people could only give out ten at a time, originally, and later 50). However, only later did we all learn that Google was reading all of our emails, first for advertising, but later to monitor everything we were saying. From an organization perspective, Google makes money by selling GMail as a service for businesses, in which you get an organization GMail login that integrates with all other Google products. Note the walled garden here: after GA, GMail is a major source of power for Google services, given the single sign-on capability.

Email Alternatives

    Fortunately, there are so many options to do email that I would have trouble making a definitive list. Personally, I am a fan of Protonmail, and love using their encrypted email service. Virtually any website host offers email as a service, and when combined with Mozilla Thunderbird you can set up an email address with any domain you choose. From a business perspective, knowing that Google reads your info also means that your company has no secrets, which for a larger company means a risk of industrial espionage or data loss. If an entry-level Google employee making $15 per hour sees your $20 million secret in an email chain, can you be sure he won't steal it? This should give any medium-size company manager pause.

Login Alternatives

    Developers know about Facebook logins alongside Google's multi-site login ability, but these two are the larger face of a much wider market of distributed authentication. In a professional context, I use the open-source Central Authentication Service (CAS) so that all web sites within a company are tied to a login from one central site (usually the company's .com domain). Alongside CAS, there are also the SAML, LDAP, and OAuth distributed authentication methods. For most IT security you will need to do this anyway. For small businesses, the key is to not be tempted by the convenience of a Google login. If you are developing for a small business, you need to recommend real security instead.
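
    To make the "one central login site" idea concrete, here is a small Python sketch of the two URLs a site typically builds when it delegates login to a CAS server: a redirect to the central login page, and a back-channel call to validate the ticket the user comes back with. The server address and function names are hypothetical. The /login and /serviceValidate paths and the service and ticket parameters follow the CAS protocol as I understand it, so verify them against your CAS server's documentation.

```python
from urllib.parse import urlencode

# Hypothetical central login server for the whole company.
CAS_BASE = "https://login.example.com/cas"


def cas_login_url(service_url):
    """URL to redirect an unauthenticated visitor to the central login page.

    After a successful login, the CAS server sends the browser back to
    service_url with a one-time ticket attached as a query parameter.
    """
    return CAS_BASE + "/login?" + urlencode({"service": service_url})


def cas_validate_url(service_url, ticket):
    """URL the site calls server-to-server to check the returned ticket."""
    query = urlencode({"service": service_url, "ticket": ticket})
    return CAS_BASE + "/serviceValidate?" + query


if __name__ == "__main__":
    app = "https://intranet.example.com/dashboard"
    print(cas_login_url(app))
    print(cas_validate_url(app, "ST-12345-example"))
```

    Every internal site only needs to know the address of the central server. Passwords are entered in one place, and no outside login provider is involved.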


Google Drive + Hangouts

    Google Drive and Hangouts represent an additional class of free services, in which Drive offers free file storage, and Hangouts offers the ability to host free chats. I haven't looked as much into these since I quit using them some time ago, but given the patterns of the other services, we can assume that everything is once again tracked.

Alternative: Nextcloud

    I have had a lot of success with software called Nextcloud, and use it for both personal and professional hosting. It works on the same principle as Dropbox, in that you have files in a folder that is linked to a server, and the server automatically syncs all shared files on all connected machines. If you need to share outside of the organization, you can also generate share links.
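
    Share links can also be created from a script. The Python sketch below prepares (but does not send) a request to what I understand to be Nextcloud's OCS sharing endpoint, where a shareType of 3 asks for a public link. The server address, user name, app password, and file path are placeholders invented for this example, and the endpoint details should be checked against the Nextcloud documentation for your version.

```python
import base64
import urllib.request
from urllib.parse import urlencode

# Hypothetical address of a self-hosted Nextcloud server.
NEXTCLOUD_BASE = "https://cloud.example.com"
SHARES_PATH = "/ocs/v2.php/apps/files_sharing/api/v1/shares"


def build_share_request(path, user, app_password):
    """Prepare a POST request asking Nextcloud for a public link to `path`."""
    # shareType 3 is assumed to mean "public link" in the sharing API.
    body = urlencode({"path": path, "shareType": 3}).encode("ascii")
    credentials = base64.b64encode(f"{user}:{app_password}".encode()).decode()
    return urllib.request.Request(
        NEXTCLOUD_BASE + SHARES_PATH,
        data=body,
        method="POST",
        headers={
            "OCS-APIRequest": "true",  # header the OCS API expects on requests
            "Authorization": "Basic " + credentials,
        },
    )


if __name__ == "__main__":
    req = build_share_request("/Reports/q3.pdf", "alice", "app-password-here")
    print(req.get_method(), req.full_url)
    # urllib.request.urlopen(req) would actually send it to the server.
```

    Using an app password rather than your main password means the script's access can be revoked on its own if it ever leaks.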

Screencap of the web-based interface for Nextcloud.

    Like Matomo, the main advantage of Nextcloud is that you once again have the option to self-host. Some commercial services will retain the option to track your files (read the terms and conditions carefully), but Nextcloud is open source and again, you have total control over who sees what data. Finally, Nextcloud has a built-in app marketplace, which means you have the option to edit files through the website (much like how Google does) with one of the office apps, among many other features.

    If you are concerned about third-party risk from a security perspective, aka the underpaid Google employee that might be able to see your files, Nextcloud is one option for de-Googling yourself.


Video Chat Alternatives

    For Hangouts, there are many free and paid alternatives out there already, and two I use regularly are Zoom and GoToMeeting. Some vendors also like to use Join.me. Of the many alternatives, I would specifically recommend Zoom if you are going the commercial route, as they are Linux-friendly.

    Also, our friend Nextcloud comes through again, as they have a built-in app that lets you run a video chat server from your site. It's one of many interesting things a Nextcloud server can do.


Google's Search Crawler

    This last strategy is more for activists who truly want to de-link from Google, and may not be for everyone. You can write a very simple script to parse your server's access log files and extract a list of user agents. From these, you can ferret out the user agents that Google uses in its search crawler bots, and block them (in a server, you can check for user agents and selectively return an error code). If you can find bot-blocking software to automate this, even better.
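
    A minimal Python sketch of that idea follows. It assumes your server writes the common "combined" log format, where the user agent is the last quoted field on each line, and the list of substrings treated as Google crawlers is my own assumption that you should check against what actually shows up in your logs. The function names are made up for this example.

```python
import re
from collections import Counter

# In the "combined" log format the user agent is the final quoted field.
USER_AGENT_FIELD = re.compile(r'"([^"]*)"\s*$')

# Substrings assumed to identify Google crawlers; adjust after reading your logs.
BLOCKED_TOKENS = ("googlebot", "adsbot-google", "mediapartners-google")


def count_user_agents(lines):
    """Count how often each user agent appears in an iterable of log lines."""
    counts = Counter()
    for line in lines:
        match = USER_AGENT_FIELD.search(line)
        if match:
            counts[match.group(1)] += 1
    return counts


def is_blocked(user_agent):
    """True if the user agent contains one of the blocked substrings."""
    lowered = user_agent.lower()
    return any(token in lowered for token in BLOCKED_TOKENS)


def status_for(user_agent):
    """HTTP status a server could return: 403 for blocked bots, else 200."""
    return 403 if is_blocked(user_agent) else 200


if __name__ == "__main__":
    # The log path is an example; use wherever your server writes its log.
    with open("/var/log/nginx/access.log", encoding="utf-8", errors="replace") as log:
        for agent, hits in count_user_agents(log).most_common(20):
            flag = "BLOCK" if is_blocked(agent) else "     "
            print(f"{hits:8d}  {flag}  {agent}")
```

    The same is_blocked check could sit in your application's request handling, or be translated into a rule in your web server's configuration. Keep in mind that a user agent string is easy to fake, so this only stops crawlers that identify themselves honestly.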

    I do not necessarily recommend this because you are censoring yourself from Google search results, and I certainly can't recommend this to a business that is just trying to make sales. However, if we get to the point that it's time to fully de-link from Google, the option is there.


You can make it happen

    The Plan for Action starts with individual action, but the bottom line of this guide is that you can go beyond yourself to have a very powerful impact. If you are a developer, or if you can make recommendations, recommend these open source alternatives to Google products. The inconvenience factor has been coming down over the years, and the more we learn about Big Tech's data harvesting, the more we are going to see Big Tech as a security risk, given their tendency to harvest and sell data. Be the leader of this decentralizing trend, and urge your employer or IT department to look into these alternatives.