We’re living in a world of “big data”. While you can ignore a lot of this data, it’s better to have more knowledge about the data that belongs to you.
Your history is where you’ve been online, but your chosen browser running on your device can store a lot more data than just that.
Additionally, it can be a huge insight into not only where you’ve been, but where you’re going.
It’s not just your browsing history.
Browser data varies greatly: preferences, passwords, emails, usernames, user experience preferences, cached data, security settings, plugins, extensions, and more.
Like all things, these are best kept secure.
Your browsing data and what you do online are quite valuable to advertisers and a wide variety of organizations.
Maybe you’re concerned that someone can “steal” your data. (We’ll leave the NSA and those with more legitimate spying reasons out of it for now.)
I’m talking about the hackers that want to steal and sell data.
It’s time to understand how your browser stores your data, and what you can do with it.
Chrome Browser Data and Settings
Navigating to browser history and selecting “Clear Browser Data” can be overwhelming to someone unfamiliar with terms such as
- cookies
- cache
- autofill
- hosted apps
- browsing history
- download history
- etc…
All of these terms are quite straightforward and easy to understand if you take them one at a time.
We’ll go through each one and make sure you have enough of an understanding to predict the outcomes of any “Clear Data” decision.
Browser Cookies Defined, Explained, and Understood
Many people are familiar with the term “cookies” but at the same time the term has a veil of mystery to it.
- Are cookies good or bad?
- Do you need them?
- Or can you live without them?
- Are they a threat to your privacy?
To get to the heart of the matter, it makes sense to break down what a cookie is, and the different uses related to them.
This article will start with the basics and then delve into the details.
What is a Cookie
First things first, a cookie is a piece of data that travels between your device and some service through network activity triggered by visiting a website. This is then saved via your browser and potentially persisted on your device.
When you visit a site that employs cookies, your browser uses that saved cookie to let the website’s server know who you are. It identifies you so the server can act accordingly.
Imagine having to login each time you wanted to view a different product on Amazon.
No one would use the internet.
Cookies let you stay signed in to websites, and they allow websites to identify you as an individual.
A cookie isn’t a program or software. It can’t infect your computer with viruses or malware. There’s a lot of misinformation and confusion surrounding cookies.
Let’s iron this out. The term “cookie” started as “magic cookie” but it also has many other names.
- Browser Cookie
- Tracking Cookie
- HTTP Cookie
- Web Cookie
- Session Token
- Internet Cookie
Additionally, there are other types of cookies which have various defining characteristics:
- Persistent Cookie
- Secure Cookie
- These are only transmitted over encrypted connections.
- HttpOnly Cookie
- Implemented in 2002 by Microsoft IE to mitigate risks of XSS (cross-site scripting). This is a bit of a more technical concept so here are the details of HttpOnly Cookies and how they work.
- A simple way to think about it is that it protects the cookie from being read or modified by JavaScript running in the page.
- This is enforced through the browser implementation of network communications.
- Third-Party Cookie – This is the most common “other” type of cookies.
- First-party cookies are issued by the website you’re visiting.
- Therefore, that domain name is documented in the cookie header.
- A third-party cookie is one that is sent to a different domain/server than the one you are currently requesting data from.
- By disabling third-party cookies (instructions later in this article), you prevent your browser from storing and sending cookies for any domain other than the one you’re currently visiting. Requests to those other servers can still happen, but they go out without the cookie that identifies you.
- This is desirable because sending identifying information to other servers doesn’t necessarily do anything for you. Quite the contrary, it can slow down your internet experience and carries other potential detriments.
- However, this can also break features that your app relies on. For example, if your web app uses the Microsoft Authentication Library and you’re using an old version of IE, third-party cookies are used to grant access to the web app, so change cookie settings with caution.
- Zombie Cookie
- This is a bit of Internet archaeology, but this type of cookie usage is a breach of browser security. It involves cookies that are very difficult to remove.
- Used for persistent analytics, it enables tracking across browsers on the same machine.
- It was created to counteract the issue of fragmented analytics:
- when the user manually deletes cookies,
- the analytics system tracks that user as two users,
- because the chain of visits connected by a user ID stored in a cookie is broken.
- UC Berkeley did a study in 2009 revealing how this Flash-based cookie security vulnerability works.
- Regular Ole HTTP Cookies
- To put it simply, cookies are added to browser request headers and sent back to the servers that set them in the first place.
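To make the first-party vs. third-party distinction from the list above concrete, here is a minimal sketch in Python. It is not how any browser actually decides this: real browsers compare "registrable domains" using the Public Suffix List, while this sketch naively compares the last two labels of each hostname (so it gets domains like example.co.uk wrong). The function name and the example URLs are made up for illustration.

```python
from urllib.parse import urlsplit


def naive_site(url: str) -> str:
    """Return the last two hostname labels, e.g. 'example.com'.

    Assumption: this stands in for the registrable domain. Real browsers
    use the Public Suffix List instead of this shortcut.
    """
    host = (urlsplit(url).hostname or "").lower()
    return ".".join(host.split(".")[-2:])


def is_third_party(page_url: str, request_url: str) -> bool:
    """True when the request goes to a different site than the page you are on."""
    return naive_site(page_url) != naive_site(request_url)


# Hypothetical URLs, for illustration only.
page = "https://shop.example.com/cart"
print(is_third_party(page, "https://cdn.example.com/logo.png"))     # False
print(is_third_party(page, "https://tracker.example.net/pixel.gif"))  # True
```

A cookie attached to the second request would be a third-party cookie, which is the kind the "block third-party cookies" setting targets.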
How do Servers Send Browsers Cookies in General
A server response may contain set-cookie headers.
Cookies can have an expiration date. If there’s no expiration, the cookie will expire at the end of the session.
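Here is a small sketch of what that looks like, using Python's standard `http.cookies` module to read two made-up Set-Cookie header values. The cookie names and values are hypothetical, and the `describe_set_cookie` helper is my own; I use `Max-Age` for the expiration, though an `Expires` date serves the same purpose.

```python
from http.cookies import SimpleCookie


def describe_set_cookie(header_value: str) -> dict:
    """Summarize the cookies found in one Set-Cookie header value."""
    cookie = SimpleCookie()
    cookie.load(header_value)
    summary = {}
    for name, morsel in cookie.items():
        # No Expires and no Max-Age means the cookie ends with the session.
        has_expiry = bool(morsel["expires"] or morsel["max-age"])
        summary[name] = {
            "value": morsel.value,
            "kind": "persistent" if has_expiry else "session",
            "secure": bool(morsel["secure"]),
            "httponly": bool(morsel["httponly"]),
        }
    return summary


# Hypothetical header values a server might send.
session = describe_set_cookie("sid=abc123; Path=/; HttpOnly; Secure")
persistent = describe_set_cookie("theme=dark; Max-Age=86400; Path=/")

print(session["sid"]["kind"])       # session
print(persistent["theme"]["kind"])  # persistent
```

The same summary also shows the Secure and HttpOnly flags discussed earlier, since they travel in the same header.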
Set Your Browser’s Handling of Cookies
Try exploring your favorite browser settings to learn more as you read this article.
Brave:
Chrome:
Edge:
Alternatively, navigate to Chrome settings > Select “Show Advanced Settings” > Privacy > “Content settings…”
Once this dialog box appears you will see the options for how you want your browser to handle cookies.
You can also type chrome://settings/cookies into the URL bar to see all the cookies saved in your browser. Cookies have five options:
- The “recommended” option for cookies is “Allow local data to be set”. This will allow first and third-party cookies. You can use this option combined with the fourth option to restrict third party cookies.
- “Keep local data only until you quit your browser”. This option will allow cookies but will automatically delete them when you quit Chrome. This also means that websites that rely on cookies to persist an authenticated state will log you out each time you close your browser. A bit inconvenient.
- “Block sites from setting any data”. This option is the most restrictive because now you won’t be able to sign in anywhere. Not sure why you would use this option but there is probably a good reason to use it. If you set this and forgot, look for this icon in your URL window as a reminder of the restricted option.
Delete Your Browser Data
If you use Chrome, you can go to history in your options and select “clear browsing data”.
Then you can select how far back to erase the data, and you need to decide what to delete.
Some of the options are selected by default.
- Browsing History – A list of the URLs (Uniform Resource Locators) you have visited.
- Download History – Every file you download will have info about the download stored in your browser history.
- Cookies and Other Site and Plug-In Data – If you delete your cookies, websites will once again see you as a new visitor.
- Any preferences associated with a website login will be lost.
- There are other mechanisms of persistence in the browser such as local/session storage, and other less used options, but that’s outside the scope of this topic.
- Cached Images and Files – This enables quicker page loads. A cache of images and files enables your browser to avoid having to request these resources from the originating server.
- One thing to note is that this cache of images and files can grow so large that it actually slows your browser or device down.
Security Settings
There are also some options that aren’t selected by default that you can check, but make sure you understand what they do first.
Passwords
This one is controversial.
Password security is quite important. There’s even an article on this site discussing various password storage applications.
The controversy is mainly that all your saved passwords can be accessed by typing chrome://settings/passwords into the URL bar or just clicking the link.
Alternatively, if you want to find it in the menu, just follow: Chrome > Settings > Advanced > Passwords and forms > Manage Passwords.
- Many people are getting bent out of shape because anyone with physical access to your computer can access your private data. Yes, that seems dangerous. So does going on a two-week vacation to Costa Rica while leaving your front door open, lights on, and music blaring the hit single “Safety Dance” on loop. You just forgot the sign that says “take what you want”.
- Autofill Form Data – This can be potentially dangerous if you have sensitive information being auto-filled. When I think of this, I am mainly thinking of credit card info and addresses. Well, maybe your address isn’t a big deal. What’s the worst-case scenario? Maybe someone mails you something that you don’t want, oh no. The big danger here is that a website can take advantage of autofill data by having an invisible form for your credit card data or maybe even your Social Security number. I will say that if you opt in to many newsletters with your email address, you might consider using autofill. That way you don’t have to type your email constantly.
- Hosted App Data – Hosted, in this case, refers to the Chrome Web Store apps that you may have added to your browser. This is a deep topic, and I’m planning another article that digs into it.
- Content Licenses – This is essentially a collection of licenses to access content that you either paid for or were given access to gratis. Either way, when a publisher of content wants to control who has access to the content, they use a “content license”. There is a reason this is unchecked by default. If you delete your content licenses, you won’t be able to watch any of the media that issued you a license. In other words, you will have to reacquire the various licenses by either paying again or finding a way to reach out to the content providers. The rule of thumb: leave this alone unless you are selling your computer or have some other very specific reason.
- Download History – Pretty straightforward, it’s a list of the files you’ve downloaded, when, from where, etc…
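Going back to the autofill concern in the list above, here is a rough sketch of how you could spot the "invisible form field" trick in a page's HTML. It uses Python's standard `html.parser` and only checks inline `style` attributes, which is a big simplification: a real page could hide fields with CSS classes, off-screen positioning, or scripts. The class name and the sample form are made up for illustration.

```python
from html.parser import HTMLParser


class HiddenFieldFinder(HTMLParser):
    """Collect autocomplete hints of <input> fields hidden via inline style."""

    def __init__(self):
        super().__init__()
        self.hidden_fields = []

    def handle_starttag(self, tag, attrs):
        if tag != "input":
            return
        attributes = dict(attrs)
        # Normalize the inline style so "display: none" and "display:none" match.
        style = (attributes.get("style") or "").replace(" ", "").lower()
        is_hidden = "display:none" in style or "visibility:hidden" in style
        if is_hidden and attributes.get("autocomplete"):
            self.hidden_fields.append(attributes["autocomplete"])


# Hypothetical form: it asks for an email but also hides a credit card field.
SAMPLE_FORM = """
<form>
  <input name="email" autocomplete="email">
  <input name="card" autocomplete="cc-number" style="display: none">
</form>
"""

finder = HiddenFieldFinder()
finder.feed(SAMPLE_FORM)
print(finder.hidden_fields)  # ['cc-number']
```

If a form you only expected to collect your email also contains a hidden credit card field, that is a good reason not to let autofill complete it.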
Cached Images and Files
When I checked mine, I saw that I had 151 MB of cached content. When you see content in a browser, most of the time it comes from another computer, and these days that is most often a Content Delivery Network (CDN) server.
This site runs on WordPress on SiteGround, which offers a free integration with Cloudflare’s free CDN service. It helps you get your content to people faster, and this is very important because people are very impatient.
Additionally, Google seems to prefer providing links to sites that offer a good UX (User Experience) and speed is a big part of that.
A CDN server sits in a physical location closer to you than the website’s host server and stores a copy of the data so you can get it faster.
Quality CDNs update stale copies when the original (on the original website host server) changes. This needs to happen as fast as possible so that the risk of serving stale data is reduced.
Your browser does essentially the same thing. If there’s a high-quality banner image at the top of a website, and it’s a server-side rendered application, the browser would need to fetch that image every time the site loads a new page.
If browser caching didn’t exist, that is.
Caching logic enables the browser to save the resource and save the time it would take to keep getting it from the server. Network communications take time after all, and we want to see the image now.
If caching is disabled (you can do so in Dev Tools), page load times increase enough to be noticeable, so use this feature with knowledge.
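The basic idea can be sketched in a few lines of Python. This is only a toy model, not how a real browser cache works: the `TinyCache` class, the fake fetch function, and the URL are all made up for illustration, and there is no expiry or size limit.

```python
class TinyCache:
    """Toy cache: remember each resource after the first fetch."""

    def __init__(self, fetch):
        self._fetch = fetch       # function that goes "over the network"
        self._store = {}          # url -> saved bytes
        self.network_calls = 0    # how many times we really fetched

    def get(self, url: str) -> bytes:
        if url not in self._store:
            self.network_calls += 1
            self._store[url] = self._fetch(url)
        return self._store[url]


def fake_fetch(url: str) -> bytes:
    # Stand-in for a slow network request.
    return b"banner image bytes for " + url.encode()


cache = TinyCache(fake_fetch)
for _ in range(5):  # five page loads that all show the same banner
    cache.get("https://example.com/banner.png")

print(cache.network_calls)  # 1
```

Five page loads, one trip to the server. Disabling the cache would turn that into five trips.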
Let’s use Cloudflare’s site as an example. The top left has a logo, and when I go to my Elements view in the Developer tools I can see the location of that image.
If I go to that location, and then reload I’ll see that the nature of the network communication changes. The first time we request the resource, we get a Response Status of 200 while the second time we see something different.
There’s a bit of data in this view but I’ll highlight the important stuff for you, in case you don’t feel like pinching and zooming on your phone.
The first thing to notice is the Status Code value of 304. This is a server response that tells the browser that the version of the file it has is up to date. How is this done?
Entity Versioning
When the request for this resource was made, an important request header was included. It’s called if-none-match. It’s a way of telling the server: here’s the version I have; if you have a newer one, please send that, otherwise I’m all set.
If we scroll down in the expanded portion of the network request/response details view, we’ll see that header.
You’ll see that there’s also an if-modified-since header, which carries the timestamp of when our currently cached version of this resource was last modified.
It’s up to the server to generate and track entity versions to support the browser’s ability to cache resources. If the server never replies with a 304 status code, the browser will expect to receive a copy on every request.
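Here is a minimal sketch of the server’s side of that conversation, written in Python. It assumes the server derives the entity version from a hash of the content, which is just one possible scheme; the function names and the sample bytes are made up, and real servers handle more cases (multiple tags, weak validators, and so on).

```python
import hashlib


def make_etag(body: bytes) -> str:
    """Derive a version tag from the content itself (one possible scheme)."""
    return '"' + hashlib.sha256(body).hexdigest()[:16] + '"'


def respond(request_headers: dict, body: bytes) -> tuple:
    """Return (status, response_headers, payload) for a request of `body`."""
    etag = make_etag(body)
    if request_headers.get("if-none-match") == etag:
        # The browser already has this version: send no payload.
        return 304, {"etag": etag}, b""
    return 200, {"etag": etag}, body


logo = b"fake image bytes"

# First visit: no if-none-match header, so we get the full resource.
first_status, first_headers, _ = respond({}, logo)

# Second visit: the browser sends back the tag it received.
second_status, _, second_payload = respond(
    {"if-none-match": first_headers["etag"]}, logo
)

print(first_status, second_status)  # 200 304
```

If the file on the server changes, its hash changes, the tags stop matching, and the server goes back to answering with a 200 and the new content.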
Let’s back up though: how did we get the entity ID in the first place? You might have the answer already, but it’s logical to conclude that it must have come from the server.
Let’s start over, clear our cache, and take a look at getting the resource the first time.
We can see that the server responds with not just the image but some valuable information for the browser.
Entity Tags
- etag shows us the entity tag or version number
- last-modified tells us when the version we have was created
- less accurate than the etag and thus used as a fallback mechanism
Now the browser can attach that to future requests, and the speed of communication is increased since the server can reply with “use the one you have” much faster than it can send the image once again.
The more data (aka 1s and 0s) you send over the network, the slower communications will be. Somewhat obvious but important to highlight.
These types of requests are called conditional requests. They’re called this since the server uses the received information and returns a response conditionally based on the request headers.
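From the browser’s point of view, a conditional request can be sketched roughly like this in Python. The cache entry layout, the function names, and the sample header values are my own assumptions for illustration; a real browser cache tracks far more than this.

```python
from typing import Optional


def conditional_headers(cached: Optional[dict]) -> dict:
    """Build the request headers that describe the copy we already have."""
    headers = {}
    if cached:
        if cached.get("etag"):
            headers["if-none-match"] = cached["etag"]
        if cached.get("last-modified"):
            headers["if-modified-since"] = cached["last-modified"]
    return headers


def apply_response(cached: Optional[dict], status: int,
                   response_headers: dict, body: bytes) -> dict:
    """Return the cache entry to keep after the server answers."""
    if status == 304 and cached is not None:
        return cached  # "use the one you have"
    return {
        "etag": response_headers.get("etag"),
        "last-modified": response_headers.get("last-modified"),
        "body": body,
    }


# First response: a full 200 with validators (hypothetical values).
entry = apply_response(
    None, 200,
    {"etag": '"v1"', "last-modified": "Mon, 01 Jan 2024 00:00:00 GMT"},
    b"logo bytes",
)

# Next request carries those validators; a 304 keeps the cached body.
print(conditional_headers(entry))
entry_after_304 = apply_response(entry, 304, {}, b"")
print(entry_after_304["body"])  # b'logo bytes'
```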
Granted, I used an example with a very small file size, but the point is that this type of caching strategy is useful with all the files that a web app needs.
Browser Cache: Memory and Disk
If you look in the network tab of your developer tools for this request, you’ll see the size of the item in cache.
You’ll see that some files are persisted to memory cache like the above Cloudflare logo image example,
and some to disk cache like this JavaScript file used by the Cloudflare website.
Click the header to sort by the size of the request, and see which files are on disk cache, and which are on memory cache.
There’s probably some complex logic that goes into caching decisions and keeping something in memory until it needs to be put on disk, but that’s currently beyond my understanding.
At my current level of knowledge about internet browser tech, it’s still an opaque piece of technological wizardry that enables a better experience of the internet.
They say there are two difficult things in computer science and software engineering.
1. Naming things
2. Off-by-one errors
3. Caching
The decisions for the caching strategy were likely made by developers working on the browser, and I look forward to getting into the weeds on that, as soon as I get better at reading complex C++.
The Chromium open source project is a lot of code though. But, I digress.
The important thing is that this is a key aspect of internet architecture that enables performant sites/apps.
Potential Future Data Collection
In the future we may see a rise in the collection of other types of data. For example, see existing types of human-computer interaction data such as:
- Eye Tracking
- Keystroke and mouse movement analysis
- Biometrics
- Fingerprints
- Heart Rate
- Galvanic Skin Response
What’s Being Collected About Me
This isn’t always easy to answer. You’d need to look at the third-party code a site uses, because your keystrokes and mouse movements are all subject to being recorded and saved for later viewing and analysis.
Remember, privacy is relative. It is also not a precise term because, generally speaking, a privacy violation can only occur where the average person would expect that kind of privacy.
For that reason, it’s important to understand what data is collected and how it’s used. For example, if it’s anonymized and used to improve a product, I’m all for it. However, if it’s collected and used to penalize my credit score in some way, then I’m not thrilled.