Previously, Principal Analyst Alex Caithness shed some light on IndexedDB on Chrome, one of the methods that websites and web apps can use to store data on a user’s device. In this new post, with accompanying free open-source scripts, he tackles the data structures behind a further two mechanisms that websites can use to persist information: Session Storage and Local Storage.
In previous blog posts we explored the LevelDB file format; how Chrome (and Chrome-esque applications) builds its IndexedDB implementation atop this format; and, perhaps most importantly, why this knowledge is worth having.
IndexedDB is one of a number of web technologies which enable a developer to store data locally on a user’s machine. IndexedDB is very flexible, but that flexibility comes with an overhead which might make it less suitable (or appealing) for basic tasks such as storing simple text keys and values.
For a long time, if a web developer wanted to store textual keys and values on the user’s device, the de facto method was to make use of good ol’ fashioned cookies; however, cookies come with two significant downsides. First: they are size limited (well, actually, IETF RFC 6265 sets a lower limit, not an upper one: browsers should support cookies of at least 4096 bytes, but in practice many browsers also cap cookies at around 4 KB). This practical limit is probably linked in part to the second limitation: cookies are transmitted with every HTTP request, which, if you want to store a lot of data on a client’s device, is going to lead to large, and therefore slow, HTTP requests. Not ideal.
To work around this problem, a new system called “Web Storage” was introduced, which encompassed two related methods which can be used to store data: Session Storage and Local Storage.
Both mechanisms allow a developer to store and manage text-based data in a key-value store. Session Storage data is meant to persist for one browsing session, isolated within a single window or tab: data stored by a site in one window or tab will not be accessible in any other tab, even to the same website. Data stored using Local Storage, on the other hand, is meant to persist “forever” and be accessible to that website across multiple browsing sessions, windows, and tabs (unless, of course, the user explicitly clears their browser’s data).
Storing or retrieving a value takes a single call, such as localStorage.setItem("userid", "alice") or sessionStorage.getItem("userid"), which is significantly less code (and setup) than IndexedDB requires, so for basic text storage Web Storage offers developers an easier ride.
In Chrome and related Chrome-esque applications, Local Storage and Session Storage data was previously stored in a simple SQLite database per domain, which provided a straightforward, albeit fragmented, view of this data from a forensics standpoint. Those heady days of SQLite are, as you might expect, behind us now. Today, the Web Storage data for all websites is combined into two LevelDB data stores, one for Local Storage and one for Session Storage, which, on Windows, can be found in:
“%LocalAppData%\Google\Chrome\User Data\Default\Local Storage\leveldb”; and
“%LocalAppData%\Google\Chrome\User Data\Default\Session Storage”
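When pointing tools at these folders it helps to build the paths from the already-expanded value of %LocalAppData% (typically C:\Users\<user>\AppData\Local) rather than typing them by hand. The sketch below is illustrative only: the function name chrome_web_storage_paths and the user “alice” are hypothetical, and it assumes the default profile folder “Default” (other profiles use their own folder name, commonly “Profile 1” and so on). PureWindowsPath is used so the paths can be built and displayed on any operating system.

```python
from pathlib import PureWindowsPath


def chrome_web_storage_paths(local_app_data: str, profile: str = "Default") -> dict:
    """Return the Local Storage and Session Storage LevelDB folders for a profile.

    local_app_data is the expanded value of %LocalAppData%, which already ends
    in "\\AppData\\Local". The function name is hypothetical, for illustration.
    """
    profile_dir = PureWindowsPath(local_app_data) / "Google" / "Chrome" / "User Data" / profile
    return {
        "local_storage": profile_dir / "Local Storage" / "leveldb",
        "session_storage": profile_dir / "Session Storage",
    }


paths = chrome_web_storage_paths(r"C:\Users\alice\AppData\Local")
print(paths["local_storage"])
```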
Despite being closely related in functionality, for whatever reason, the structures of the Local Storage and Session Storage LevelDB data stores differ, so in this post I will tackle each in turn. Throughout, I will be referring to source code files found in Chromium’s DOM Storage code.
It should be noted that, because we are talking about LevelDB, it will in a great many cases be possible to recover deleted and outdated versions of records. This is especially pertinent when discussing Session Storage: the data which is deleted at the end of a browsing session may well be recoverable for some time. From a forensic standpoint this is of course a boon, but it may also complicate determining the content of “current” entries, so proceed with caution.
As discussed in a previous post, LevelDB is a key-value store where the keys and values are arbitrary BLOBs. LevelDB can store whatever it is told to store; in the case of Session Storage, keys and values are always textual strings – although encoded using a variety of encoding schemes. The Session Storage data store keeps track of the keys and values that are stored, and which hosts (websites) they are associated with.
There are 4 types of records that will be found in the Session Storage LevelDB, and they can all be identified by the key prefix (the start of the key), or in some cases a specific, static key. Keys are always text, encoded using UTF-8.
The first (static) key is “version”, the value of which should be “1” (encoded using UTF-8); if you are reading this in the future and the version is not “1”, suffice to say you should take the rest of this section with a pinch of salt, as you have a newer version of the format!
The next key prefix of interest is “namespace-”. These keys take the form “namespace-<uuid with underscores>-<hostname>”, e.g. “namespace-9c1cba1d_461a_4ceb_b536_ed7b4d890563-https://www.google.com/”.
The value of a “namespace” key is a number, encoded as UTF-8 text; this value is the “map-id” associated with that combination of website and UUID. The map-id is used to identify the records associated with the host named in the key. The UUID identifies a tab or window, so the same UUID can appear in keys for several different websites. Conversely, a single website could be opened in several different windows or tabs and be assigned a different UUID each time, so it is quite usual for a single website to be associated with multiple map-ids.
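As an illustration, a “namespace” record can be split into its parts with a regular expression. This is a minimal sketch rather than the code in our released modules; the function name parse_namespace_record is hypothetical, and it assumes the UUID is a standard 36-character GUID with its hyphens replaced by underscores (as in the example key above), which makes the next “-” an unambiguous separator even when the hostname itself contains hyphens. Returning None for an empty value is a defensive assumption for keys that carry no host or map-id, a case not described above.

```python
import re
from typing import Optional

# "namespace-<uuid with underscores>-<hostname>": the UUID contains no "-",
# so the first "-" after it separates it from the hostname.
_NAMESPACE_KEY = re.compile(
    r"namespace-([0-9a-f]{8}_[0-9a-f]{4}_[0-9a-f]{4}_[0-9a-f]{4}_[0-9a-f]{12})-(.*)",
    re.IGNORECASE | re.DOTALL,
)


def parse_namespace_record(key: bytes, value: bytes) -> tuple[str, str, Optional[int]]:
    """Return (uuid, host, map_id) for a Session Storage "namespace-" record."""
    match = _NAMESPACE_KEY.fullmatch(key.decode("utf-8"))
    if match is None:
        raise ValueError(f"not a namespace key: {key!r}")
    uuid, host = match.groups()
    # The value is the map-id as UTF-8 decimal text; an empty value
    # (assumed possible, see lead-in) is reported as None.
    map_id = int(value.decode("utf-8")) if value else None
    return uuid, host, map_id


print(parse_namespace_record(
    b"namespace-9c1cba1d_461a_4ceb_b536_ed7b4d890563-https://www.google.com/", b"5"))
```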
The key prefix “map-” identifies the records which contain the actual keys and values stored by the website/web app. The keys take the form “map-<map-id>-<key>”, where <map-id> is the value given by a “namespace” record and <key> is the key stored by the site/app, e.g. “map-5-userid”. The value of each of these records is the value for that Session Storage key, encoded as UTF-16-LE text.
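Because the map-id is purely numeric, the first “-” after it marks the start of the site’s own key, and any further hyphens belong to that key. A minimal Python sketch of decoding a “map” record (parse_map_record is a hypothetical name, not part of our modules):

```python
def parse_map_record(key: bytes, value: bytes) -> tuple[int, str, str]:
    """Return (map_id, script_key, script_value) for a Session Storage "map-" record."""
    text_key = key.decode("utf-8")  # keys are UTF-8 text
    if not text_key.startswith("map-"):
        raise ValueError(f"not a map key: {key!r}")
    # Split on the first "-" only: the map-id never contains one,
    # but the site's key may.
    map_id, sep, script_key = text_key[len("map-"):].partition("-")
    if not sep or not map_id.isdigit():
        raise ValueError(f"malformed map key: {key!r}")
    # Values are UTF-16-LE text.
    return int(map_id), script_key, value.decode("utf-16-le")


print(parse_map_record(b"map-5-userid", "alice".encode("utf-16-le")))
```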
The final key is another static key: “next-map-id”, the value of which is the next available map-id number.
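Putting the four record types together, the live Session Storage data can be regrouped per website. The sketch below is illustrative, not our released implementation: group_session_storage is a hypothetical name, and it assumes the current key/value pairs have already been read out of the LevelDB data store by some LevelDB reader (not shown). It checks the “version” record, maps each map-id to its host using the “namespace” records, then files each “map” record under that host. “next-map-id” is simply ignored, and map records whose namespace record is missing are placed under a placeholder host.

```python
from collections import defaultdict

NAMESPACE_PREFIX = b"namespace-"
UUID_LEN = 36  # GUID with underscores in place of hyphens


def group_session_storage(records: dict[bytes, bytes]) -> dict[str, dict[int, dict[str, str]]]:
    """Group live Session Storage records as {host: {map_id: {key: value}}}."""
    version = records.get(b"version")
    if version is not None and version != b"1":
        raise ValueError(f"unexpected Session Storage format version {version!r}")

    # Pass 1: map-id -> host, from "namespace-<uuid>-<host>" records.
    host_for_map: dict[int, str] = {}
    for key, value in records.items():
        if key.startswith(NAMESPACE_PREFIX):
            # Skip the prefix, the 36-character UUID and the "-" separator.
            host = key[len(NAMESPACE_PREFIX) + UUID_LEN + 1:].decode("utf-8")
            if host and value:  # ignore entries with no host or no map-id
                host_for_map[int(value.decode("utf-8"))] = host

    # Pass 2: file each "map-<map-id>-<key>" record under the owning host.
    grouped: dict[str, dict[int, dict[str, str]]] = defaultdict(dict)
    for key, value in records.items():
        if key.startswith(b"map-"):
            map_id_text, _, script_key = key[len(b"map-"):].decode("utf-8").partition("-")
            map_id = int(map_id_text)
            host = host_for_map.get(map_id, "<unknown host>")
            grouped[host].setdefault(map_id, {})[script_key] = value.decode("utf-16-le")
    return dict(grouped)


example = {
    b"version": b"1",
    b"namespace-9c1cba1d_461a_4ceb_b536_ed7b4d890563-https://www.google.com/": b"5",
    b"map-5-userid": "alice".encode("utf-16-le"),
    b"next-map-id": b"6",
}
print(group_session_storage(example))
```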
A significant limitation in this data, from a forensics standpoint, is the lack of timestamps intrinsic to the records. It is worth noting though, that a great many items that are stored in the Session Storage data store do have timestamps embedded in the keys and values. As well as attributing times to records which contain the timestamps, it may also be possible to infer the time-ranges for other records based upon timestamped records found on either side of them.
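One way to approach that inference is to bracket each untimestamped record between the nearest timestamped records before and after it. This is a minimal sketch under a stated assumption: records are ordered by their LevelDB sequence numbers (as provided by your LevelDB reader), and records with higher sequence numbers were written no earlier than those with lower ones. The timestamps are whatever you managed to extract from the records’ own keys or values, in any comparable unit; infer_time_bounds is a hypothetical name.

```python
from typing import Optional

Bounds = tuple[Optional[float], Optional[float]]


def infer_time_bounds(records: list[tuple[int, Optional[float]]]) -> dict[int, Bounds]:
    """Map each untimestamped record's sequence number to (lower, upper) time bounds.

    records: (sequence_number, timestamp_or_None) pairs. ASSUMPTION: records were
    written in sequence-number order; None in a bound means "no neighbour found".
    """
    ordered = sorted(records, key=lambda r: r[0])

    # Forward pass: the nearest earlier timestamp is the lower bound.
    lower: dict[int, Optional[float]] = {}
    previous = None
    for seq, ts in ordered:
        if ts is None:
            lower[seq] = previous
        else:
            previous = ts

    # Backward pass: the nearest later timestamp is the upper bound.
    bounds: dict[int, Bounds] = {}
    following = None
    for seq, ts in reversed(ordered):
        if ts is None:
            bounds[seq] = (lower[seq], following)
        else:
            following = ts
    return bounds


print(infer_time_bounds([(1, 100.0), (2, None), (3, None), (4, 200.0), (5, None)]))
```

Records before the first timestamped record, or after the last, only get a one-sided bound, which is reported as None for the missing side.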
Although Local Storage must store essentially the same structure of data as Session Storage, the records in its LevelDB data store are arranged differently. As with Session Storage, though, it is key prefixes which lead the way through the data. The keys and values mostly consist of text, but unlike Session Storage there are additional markers and delimiters in the data.
Many strings in the keys and values are what I have termed “encoding-prefixed” - they are prefixed with a byte which gives the text encoding scheme used by the string. The prefixes I have encountered are:
• 0x00 – UTF-16-LE
• 0x01 – 8-bit encoding, ISO-8859-1 (latin-1) – this may be dependent on the device’s current culture; it also appears that some sites abuse this encoding scheme to store blobs
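Decoding an encoding-prefixed string means reading the first byte and decoding the remainder accordingly. A minimal Python sketch covering the two prefixes listed above (decode_prefixed_string is a hypothetical name); any other prefix is treated as an error, since it would indicate a format variant not covered here:

```python
def decode_prefixed_string(raw: bytes) -> str:
    """Decode an "encoding-prefixed" string from the Local Storage LevelDB."""
    if not raw:
        raise ValueError("empty buffer: no encoding prefix")
    prefix, body = raw[0], raw[1:]
    if prefix == 0x00:
        return body.decode("utf-16-le")
    if prefix == 0x01:
        # ISO-8859-1 maps every byte value to a code point, so this never
        # fails, even when a site has stored binary data this way.
        return body.decode("iso-8859-1")
    raise ValueError(f"unknown encoding prefix 0x{prefix:02x}")


print(decode_prefixed_string(b"\x01hello"))
```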
Keys with the prefix “META:” (encoded as ISO-8859-1) provide information about a batch of values which has been committed to Local Storage for the specified host. The keys take the form “META:<hostname>”, e.g. “META:http://google.com” (encoded as ISO-8859-1). The value of a META key is a protocol buffer (see: https://developers.google.com/protocol-buffers) containing two values, both encoded as varints (see: local_storage_database.proto in the source code):
• ID 1: timestamp of the batch being committed (microseconds elapsed since 1st of January 1601)
• ID 2: size of the data held (in bytes) in the Local Storage for this URL when this batch was committed
Keys with the prefix “_” (encoded as ISO-8859-1) contain the keys and values stored in the Local Storage. The keys take the form:
The value of these records will be the values for the given Local Storage keys, stored as an encoding-prefixed string.
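A Local Storage data record can then be split into host, key, and value. Note that this sketch rests on an ASSUMPTION that the text above does not spell out: that the key is laid out as “_”, then the host, then a single 0x00 separator byte, then the encoding-prefixed key. Verify this against your own data before relying on it. The function names are hypothetical and this is not the code from our released modules.

```python
def _decode_prefixed(raw: bytes) -> str:
    """Decode an encoding-prefixed string (0x00 = UTF-16-LE, 0x01 = ISO-8859-1)."""
    if raw[:1] == b"\x00":
        return raw[1:].decode("utf-16-le")
    if raw[:1] == b"\x01":
        return raw[1:].decode("iso-8859-1")
    raise ValueError(f"unknown encoding prefix in {raw[:1]!r}")


def parse_local_storage_record(key: bytes, value: bytes) -> tuple[str, str, str]:
    """Return (host, script_key, script_value) for a Local Storage "_" record.

    ASSUMPTION: key = "_" + host + one 0x00 byte + encoding-prefixed script key.
    """
    if not key.startswith(b"_"):
        raise ValueError(f"not a Local Storage data key: {key!r}")
    # Hosts are URLs and contain no NUL byte, so the first 0x00 is taken as the
    # separator; a UTF-16-LE key's own 0x00 prefix byte comes straight after it.
    host, sep, prefixed_key = key[1:].partition(b"\x00")
    if not sep:
        raise ValueError("no 0x00 separator: key does not match the assumed layout")
    return host.decode("iso-8859-1"), _decode_prefixed(prefixed_key), _decode_prefixed(value)


print(parse_local_storage_record(b"_https://www.google.com\x00\x01userid", b"\x01alice"))
```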
Unlike Session Storage, the Local Storage data does provide some timing context for when the keys and values were added or updated.
As noted above, the value of each “META:” key is a protocol buffer which contains a timestamp. This timestamp represents the time at which a batch of records was committed to the data store.
To identify the records that belong to a batch, first find a record with a “META:” key, then take all of the subsequent “_” records which reference the same URL and which form an unbroken chain of consecutive LevelDB sequence numbers. Note that keys can be deleted as part of a batch, so deleted records with the correct form of key may form part of the chain.
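The batching rule translates into a single pass over the records in sequence-number order. In this minimal sketch the Record and Batch classes and group_into_batches are hypothetical stand-ins for whatever your LevelDB reader produces, and matching a data record to its META host assumes the “_” + host + 0x00 key layout, an assumption not spelled out in the text above. Deleted records are kept in the chain, as described.

```python
from dataclasses import dataclass, field


@dataclass
class Record:
    seq: int            # LevelDB sequence number
    key: bytes
    value: bytes
    deleted: bool = False


@dataclass
class Batch:
    meta: Record                                   # the "META:<host>" record
    records: list[Record] = field(default_factory=list)


def group_into_batches(records: list[Record]) -> list[Batch]:
    """Attach "_" records to the preceding META record of the same host,
    for as long as their sequence numbers stay consecutive."""
    batches: list[Batch] = []
    current = None
    data_prefix = b""
    expected_seq = 0
    for rec in sorted(records, key=lambda r: r.seq):
        if rec.key.startswith(b"META:"):
            current = Batch(meta=rec)
            batches.append(current)
            # ASSUMPTION: data keys look like "_" + host + 0x00 + key.
            data_prefix = b"_" + rec.key[len(b"META:"):] + b"\x00"
            expected_seq = rec.seq + 1
        elif current is not None and rec.seq == expected_seq and rec.key.startswith(data_prefix):
            current.records.append(rec)  # deleted records can belong to a batch too
            expected_seq += 1
        else:
            current = None  # the chain is broken; this batch is complete
    return batches


recs = [
    Record(10, b"META:https://a.com", b""),
    Record(11, b"_https://a.com\x00\x01k1", b"\x01v1"),
    Record(12, b"_https://a.com\x00\x01k2", b"", deleted=True),
    Record(14, b"_https://a.com\x00\x01k3", b"\x01v3"),
    Record(20, b"META:https://b.com", b""),
    Record(21, b"_https://b.com\x00\x01x", b"\x01y"),
]
for batch in group_into_batches(recs):
    print(batch.meta.seq, [r.seq for r in batch.records])
```

Here record 14 is left out of the first batch because sequence number 13 is missing from the chain.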
The Local Storage implementation places two limits on how often data can be committed to the database for any host, and these limits affect the accuracy of the timestamps (both limits are referenced in the local_storage_impl.cc source file, in the definition of the StorageAreaHolder class). Firstly, a 5-second delay is placed on each record being committed, to allow other records to be batched up with it. Secondly, a limit of 60 commits per hour is placed upon each host using Local Storage. During testing this resulted in a latency of between 5 and 60 seconds between a script requesting that data be stored and the data being entered into the database as part of a batch (and therefore being assigned a timestamp). This latency appeared to depend on how heavily Local Storage was being used.
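In practice this means a batch timestamp is an upper bound on when a script asked for the data to be stored. A rough request window can be sketched by subtracting the latency range observed during testing. Treat the 5 and 60 second figures as observations rather than guarantees; heavier use of Local Storage may stretch the delay. estimated_request_window is a hypothetical name.

```python
from datetime import datetime, timedelta, timezone

# Latency range observed during testing (not a guarantee): data was committed
# between 5 and 60 seconds after a script asked for it to be stored.
MIN_LATENCY = timedelta(seconds=5)
MAX_LATENCY = timedelta(seconds=60)


def estimated_request_window(batch_commit_time: datetime) -> tuple[datetime, datetime]:
    """Return (earliest, latest) likely times of the script's storage request."""
    return batch_commit_time - MAX_LATENCY, batch_commit_time - MIN_LATENCY


commit = datetime(2021, 3, 1, 12, 0, 0, tzinfo=timezone.utc)
earliest, latest = estimated_request_window(commit)
print(earliest.time(), latest.time())
```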
In the course of performing this research we have built some Python modules and scripts to assist in working with Local Storage and Session Storage data in Chrome and Chrome-esque applications, and we are very happy to announce that we are releasing them as open source.
There are two modules which allow programmatic access to the records in Session Storage and Local Storage LevelDB data stores (ccl_chromium_sessionstorage.py and ccl_chromium_localstorage.py) to use in your own scripts and tools, in addition to two scripts which dump the data stores to an SQLite database with pre-defined views for easy review and searching.
The scripts can be found in our ccl_chrome_indexeddb repo here: https://github.com/cclgroupltd/ccl_chrome_indexeddb