Generative AI chatbots such as ChatGPT, DeepSeek, Gemini and Mistral are all the buzz at the moment. Arun Prasannan from CCL's Research and Development team explores another kind of AI chatbot called PolyBuzz, and some of the web-related forensic artefacts its Android app creates.
There are many kinds of generative AI chatbots, trained to assist with different tasks such as research, writing, teaching, gaming, social media engagement and customer support. Another category in this domain is the Companion AI chatbot which, as the label suggests, is designed to simulate companionship and other human interactions. Users can interact with their virtual friends using text, audio or video. They can also create imaginary scenarios in which to converse with those fictional characters. PolyBuzz, Spicy Chat and Dippy AI are some examples of such services.
There’s been some attention drawn to how even general-purpose AI tools such as Meta AI (which is used in Facebook, Instagram and WhatsApp) are finding it difficult to refrain from sending sexually explicit messages, even to children. Many companion AI chatbots are even more amenable to sexual interactions and other adult content; bear that in mind before you look into them any further. These types of services present some risks to children. One is the lack of effective age verification, which leaves children able to use the services and engage in inappropriate interactions. Another is the ability within some services to generate and share sexualised images of children, including images that are transformations of real photographs.
With that in mind, let's consider one particular AI companion chatbot and explore some of the web-related artefacts that are cached by its Android app.
Launched originally as 'Poly.AI - Create AI Chat Bot' by Cloud Whale Interactive Technology LLC., PolyBuzz describes itself as the best AI chat and roleplay platform, offering users an unparalleled interactive experience. The website and mobile app offer 'Free, Private and Unrestricted' chats with over 20 million AI-generated characters. Users can also create new characters and role-playing scenarios, as well as make AI-generated character images.
The Android app for PolyBuzz (ai.socialapps.speakmaster) uses a couple of web-related technologies which are also used by many other apps: WebView and Volley. Let's explore how PolyBuzz uses these technologies, some of the interesting artefacts that they create, and how these components could be relevant when investigating other apps too.
WebView is a built-in component of Android that is used to show web content directly within apps. The actual web content can vary from being the whole app, to being web pages launched within an app, or even just content displayed on top of the app (e.g. advertisements). WebView provides what is effectively a cut-down web browser within an ordinary app. In fact, WebView is based on Chromium and therefore shares many of its features and characteristics.
The digital forensic examiners amongst you will be used to seeing WebView-related directories within apps and the Chromium-related artefacts stored therein. Cached resources and corresponding HTTP metadata from the WebView cache directory can be decoded using ccl_chromium_cache, part of our free and open-source ccl_chromium_reader repository.
Cached web content downloaded by the PolyBuzz app is stored in the following location:
/data/data/ai.socialapps.speakmaster/cache/WebView/Default/HTTP Cache
This cache is an instance of the Chromium Simple Cache format, which is documented on the Chromium developer website. Artefacts that can be retrieved from the WebView cache include pictures from PolyBuzz, including AI generated character images. One example of these images, delivered in the WebP format, is shown in the following screenshot; the cache key is highlighted in RabbitHole, with the image (starting with its 'RIFF' header) following straight afterwards.
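To make the layout described above more concrete, here is a minimal Python sketch that reads the cache key from the start of a Simple Cache entry file and reports where the cached payload begins. It is an illustration only, not a substitute for ccl_chromium_cache. The header layout (a 64-bit magic number, three 32-bit fields and four bytes of padding, all little-endian) and the magic value are assumptions based on our reading of Chromium's simple_entry_format.h, so check them against the Chromium version you are dealing with. The function name read_simple_cache_key is hypothetical.

```python
import struct

# Assumed entry header layout (from our reading of Chromium's
# simple_entry_format.h): magic (uint64), version, key length and key hash
# (uint32 each), then 4 bytes of alignment padding. All little-endian.
SIMPLE_HEADER = struct.Struct("<QIII4x")
SIMPLE_INITIAL_MAGIC = 0xFCFB6D1BA7725C30


def read_simple_cache_key(data: bytes) -> tuple[str, int]:
    """Return (cache key, offset where the cached payload starts)."""
    if len(data) < SIMPLE_HEADER.size:
        raise ValueError("too short to be a Simple Cache entry file")
    magic, _version, key_length, _key_hash = SIMPLE_HEADER.unpack_from(data, 0)
    if magic != SIMPLE_INITIAL_MAGIC:
        raise ValueError("initial magic number not found")
    key_start = SIMPLE_HEADER.size
    key_end = key_start + key_length
    if key_end > len(data):
        raise ValueError("key length runs past the end of the data")
    # The key is stored immediately after the header; the payload follows it.
    key = data[key_start:key_end].decode("utf-8", errors="replace")
    return key, key_end
```

Given the bytes of one entry file, the returned offset should point at the first byte of the cached resource, which for a WebP image would be the 'RIFF' header.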
Our ccl_chromium_cache script will extract files as well as their metadata from the files in the HTTP cache. One could just use file carving to extract images from the cache instead, but that approach has two big disadvantages. Firstly, the cache entry also includes the URL from which the file was downloaded as well as other metadata from the HTTP server. By ignoring those, important contextual information is lost. Armed with that information, an analyst can attempt to answer questions such as: What is the source of the file? When was it downloaded? When was it created? What is the size of a (partial) file?
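As a rough illustration of how HTTP metadata can feed into those questions, the Python sketch below pulls a few standard response headers out of a dictionary and converts them into comparable values. It assumes the headers have already been extracted from the cache entry by some other means; the function name summarise_http_metadata and the output field names are hypothetical. Bear in mind that the Date and Last-Modified values are whatever the server reported, so they are indications rather than proof of when a file was downloaded or created.

```python
from email.utils import parsedate_to_datetime


def summarise_http_metadata(headers: dict[str, str]) -> dict:
    """Pick out response headers that help to put a cached file in context."""
    # Header names are case-insensitive, so normalise them first.
    lowered = {name.lower(): value for name, value in headers.items()}

    def as_datetime(name: str):
        value = lowered.get(name)
        if value is None:
            return None
        try:
            return parsedate_to_datetime(value)
        except (TypeError, ValueError):
            return None  # header present but not a parseable HTTP date

    size = lowered.get("content-length")
    return {
        "served_at": as_datetime("date"),                # server clock at response time
        "last_modified": as_datetime("last-modified"),   # server's claimed modification time
        "declared_size": int(size) if size and size.isdigit() else None,
        "content_type": lowered.get("content-type"),
    }
```

Comparing the declared size with the number of bytes actually present in the cache entry is one way to spot a partial file.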
The second disadvantage is that the carving tool will only find files with file signatures that are known to it. For example, it was noted during this investigation that WebP files, despite being a fairly ubiquitous image file format these days, were not carved by the current version of Cellebrite Physical Analyzer (version 10.5, decoding engine 15.4).
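For reference, the WebP signature itself is simple to test for: the file is a RIFF container whose first four bytes are 'RIFF', followed by a little-endian 32-bit size, followed by the four bytes 'WEBP'. The Python sketch below checks for that signature at a given offset and returns the total length declared by the container. It is a minimal illustration of a signature check rather than a carving tool, and the function name webp_length_at is hypothetical.

```python
import struct


def webp_length_at(data: bytes, offset: int = 0):
    """Return the declared length of a WebP file at offset, or None."""
    if data[offset:offset + 4] != b"RIFF":
        return None
    if data[offset + 8:offset + 12] != b"WEBP":
        return None
    # The RIFF size field counts every byte that follows it, so the whole
    # file is that value plus the 8 bytes of 'RIFF' and the size field.
    (riff_size,) = struct.unpack_from("<I", data, offset + 4)
    return riff_size + 8
```

A declared length that is larger than the data available is a hint that the file is incomplete.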
Volley is an open-source software library, developed by Google, which implements networking-related capabilities that can be incorporated into Android apps. It offers features such as caching, concurrency, customisation and error-handling. It is described as being especially suitable for apps that communicate with HTTP-based Application Programming Interfaces (APIs), for example to fetch pages of results from a query as 'structured data'. For these reasons, the Volley library is used by many Android applications.
As noted previously, caching is one of the features offered by Volley. The responses to API calls made by an app are likely to be retained in this cache and could be an invaluable source of information about how an app was used. This type of information can also be cached by WebView or other web technologies used by an app, but Volley is designed for this purpose.
The cache of responses obtained by Volley for the PolyBuzz app is stored in the following location:
/data/media/0/Android/data/ai.socialapps.speakmaster/cache/volley
Volley's on-disk cache is defined in the DiskBasedCache class, which is available on GitHub (core/src/main/java/com/android/volley/toolbox/DiskBasedCache.java). By reading the source code, we were able to develop an understanding of the cache format and write our own parser for it. Combined with some testing of the app on one of our devices, this allowed us to identify the URLs and API responses associated with certain actions within the app. Various API responses delivered by PolyBuzz were identified in the cache, including conversations (https://api.polyspeak.ai/speakmaster/conversation/list), scenes (https://api.polyspeak.ai/speakmaster/robot/square) and prompts to generate character images (https://api.polyspeak.ai/speakmaster/pictask/detail). These JSON-format responses can be decoded and transformed for review by an analyst.
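To give a feel for what parsing one of these cache files involves, here is a minimal Python sketch. It is not the parser used for this research. It assumes the entry layout that we understand the current DiskBasedCache source to write: a 32-bit magic value (assumed here to be 0x20150306), the cache key and ETag as length-prefixed UTF-8 strings, four 64-bit values (server date, last modified, TTL and soft TTL, which we take to be in milliseconds), a counted list of response headers, and then the response body up to the end of the file, with all integers little-endian. Older versions of the library may differ, so verify the layout against the version bundled with the app. All function and field names are hypothetical.

```python
import io
import struct

VOLLEY_MAGIC = 0x20150306  # assumed magic value for the current cache format


def _read_exact(stream, count: int) -> bytes:
    if count < 0:
        raise ValueError("negative length")
    data = stream.read(count)
    if len(data) != count:
        raise ValueError("unexpected end of data")
    return data


def _read_int(stream) -> int:
    return struct.unpack("<I", _read_exact(stream, 4))[0]


def _read_long(stream) -> int:
    return struct.unpack("<q", _read_exact(stream, 8))[0]


def _read_string(stream) -> str:
    # Strings are assumed to be a 64-bit length followed by UTF-8 bytes.
    length = _read_long(stream)
    return _read_exact(stream, length).decode("utf-8", errors="replace")


def parse_volley_entry(data: bytes) -> dict:
    """Split one cache file into its header fields and response body."""
    stream = io.BytesIO(data)
    if _read_int(stream) != VOLLEY_MAGIC:
        raise ValueError("unexpected magic value")
    entry = {
        "key": _read_string(stream),
        "etag": _read_string(stream),
        "server_date_ms": _read_long(stream),
        "last_modified_ms": _read_long(stream),
        "ttl_ms": _read_long(stream),
        "soft_ttl_ms": _read_long(stream),
    }
    header_count = _read_int(stream)
    entry["headers"] = [
        (_read_string(stream), _read_string(stream)) for _ in range(header_count)
    ]
    entry["body"] = stream.read()  # everything after the header is the response
    return entry
```

The key is expected to contain the request URL, and the body holds the response as served, which for the API calls described above would be JSON.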
In the example shown in Figure 3, one of the API responses from the Volley cache has been parsed using RabbitHole. The prompt used to generate an image ('eventPrompt' - my attempt to create a character modelled on Marvin the Paranoid Android) can be seen in the screenshot, as can the results ('pics') produced by the service. Each of the 6 URLs corresponds to an image generated by PolyBuzz for my selection. The images are not cached here in the Volley cache, but rather in the WebView cache; note that the second candidate in the list of six is the image found in the WebView cache that was shown in Figure 2 - it doesn't really suit Marvin, does it?
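Once a response body has been recovered, fields such as 'eventPrompt' and 'pics' can be pulled out without knowing exactly where they sit in the JSON. The Python sketch below searches a decoded response recursively for a named key. Only the two field names come from the response discussed above; the nesting, the example values and the URLs in the sample are invented for illustration, and the function name find_values is hypothetical.

```python
import json


def find_values(node, wanted_key: str):
    """Yield every value stored under wanted_key anywhere in decoded JSON."""
    if isinstance(node, dict):
        for key, value in node.items():
            if key == wanted_key:
                yield value
            yield from find_values(value, wanted_key)
    elif isinstance(node, list):
        for item in node:
            yield from find_values(item, wanted_key)


# Hypothetical response body: structure and values are made up.
sample_body = (
    b'{"data": {"eventPrompt": "a gloomy robot", '
    b'"pics": ["https://example.invalid/1.webp", "https://example.invalid/2.webp"]}}'
)

decoded = json.loads(sample_body)
prompts = list(find_values(decoded, "eventPrompt"))
picture_urls = [url for pics in find_values(decoded, "pics") for url in pics]
```

The recovered URLs can then be compared with the cache keys in the WebView cache to locate the corresponding images.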
Conversational AI services and AI-generated media are emerging challenges for forensic examiners. As demonstrated here with the example of PolyBuzz, it may be possible to recover artefacts relating to AI chatbots, such as images and fragments of conversations, that are cached using web technologies such as WebView and Volley. If you have any questions about any of the technologies covered in this article, please do not hesitate to contact us.