Abstract
Prompt design can be understood similarly to query design, as a prompt aiming to understand cultural dimensions in visual research, forcing the AI to make sense of ambiguity as a way to understand its training dataset and biases ( Niederer, S. and Colombo, G., ‘Visual Methods for Digital Research’). It moves away from prompting engineering and efforts to make “code-like” prompts that suppress ambiguity and prevent the AI from bringing biases to the surface. Our idea is to keep the ambiguity present in the image descriptions like in natural language and let it flow through different stages (degrees) of the broken telephone dynamics. This way we have less control over the result or selection of the ideal result and more questions about the dynamics implicit in the biases present in the results obtained.
Different from textual or mathematical results, in which prompt chains or asking the AI to explain how it got the result might be enough, images and visual methods assisted by AI demand new methods to deal with that. Exploring and developing a new approach for it is the main goal of this research project, particularly interested in possible biases and unexplored patterns in AI’s image affordances.
How could we detect small biases in describing images and creating based on descriptions when it comes to AI? What exactly do the words written by AI when describing an image stand for? When detecting a ‘human’ or ‘science’, for example, what elements or archetypes are invisible between prompting, and the image created or described?
Turning an AI’s image description into a new image could help us to have a glimpse behind the scenes. In the broken telephone game, small misperceptions between telling and hearing, coding and decoding, produce big divergences in the final result - and the cultural factors in between have been largely studied. To amplify and understand possible biases, we could check how this new image would be described by AI, starting a broken telephone cycle. This process could shed light not just into the gap between AI image description and its capacity to reconstruct images using this description as part of prompts, but also illuminate biases and patterns in AI image description and creation based on description.
It is in line with previous projects on image clustering and image prompt analysis (see reference links), and questions such as identification of AI image biases, cross AI models analysis, reverse engineering through prompts, image clustering, and analysis of large datasets of images from online image and video-based platforms.
The experiment becomes even more relevant in light of the results from recent studies (Shumailov et al., 2024) that show that AI models trained on AI generated data will eventually collapse.
To frame this analysis, the proposal from Munn, Magee and Arora (2023) titled Unmaking AI Imagemaking introduces three methodological approaches for investigating AI image models: Unmaking the ecosystem, Unmaking the data and Unmaking the outputs.
First, the idea of ecosystem was taken for these authors to describe socio-technical implications that surround the AI models: the place where they have been developed; the owners, partners, or supporters; and their interests, goals, and impositions. “Research has already identified how these image models internalize toxic stereotypes (Birnhane 2021) and reproduce forms of gendered and ethnic bias (Luccioni 2023), to name just two issues” (Munn et al., 2023, p. 2).
There are also differences between the different models that currently dominate the market. Although Stable Diffusion seems to be the most open due to its origin, when working with images with this model, biases appear even more quickly than in other models. “In this framing, Stable Diffusion becomes an internet-based tool, which can be used and abused by “the people,” rather than a corporate product, where responsibility is clear, quality must be ensured, and toxicity must be mitigated” (Munn et al., 2023, p. 5).
To unmaking the data, it is important to ask ourselves about the source and interests for the extraction of the data used. According to the description of their project “Creating an Ad Library Political Observatory”, “This project aims to explore diverse approaches to analyze and visualize the data from Meta’s ad library, which includes Instagram, Facebook, and other Meta products, using LLMs. The ultimate goal is to enhance the Ad Library Political Observatory, a tool we are developing to monitor Meta’s ad business.” That is to say, the images were taken from political advertising on the social network Facebook, as part of an observation process that seeks to make evident the investments in advertising around politics. These are prepared images in terms of what is seen in the background of the image, the position and posture of the characters, the visible objects. In general, we could say that we are dealing with staged images. This is important since the initial information that describes the AI is in itself a representation, a visual creation.
Different from textual or mathematical results, in which prompt chains or asking the AI to explain how it got the result might be enough, images and visual methods assisted by AI demand new methods to deal with that. Exploring and developing a new approach for it is the main goal of this research project, particularly interested in possible biases and unexplored patterns in AI’s image affordances.
How could we detect small biases in describing images and creating based on descriptions when it comes to AI? What exactly do the words written by AI when describing an image stand for? When detecting a ‘human’ or ‘science’, for example, what elements or archetypes are invisible between prompting, and the image created or described?
Turning an AI’s image description into a new image could help us to have a glimpse behind the scenes. In the broken telephone game, small misperceptions between telling and hearing, coding and decoding, produce big divergences in the final result - and the cultural factors in between have been largely studied. To amplify and understand possible biases, we could check how this new image would be described by AI, starting a broken telephone cycle. This process could shed light not just into the gap between AI image description and its capacity to reconstruct images using this description as part of prompts, but also illuminate biases and patterns in AI image description and creation based on description.
It is in line with previous projects on image clustering and image prompt analysis (see reference links), and questions such as identification of AI image biases, cross AI models analysis, reverse engineering through prompts, image clustering, and analysis of large datasets of images from online image and video-based platforms.
The experiment becomes even more relevant in light of the results from recent studies (Shumailov et al., 2024) that show that AI models trained on AI generated data will eventually collapse.
To frame this analysis, the proposal from Munn, Magee and Arora (2023) titled Unmaking AI Imagemaking introduces three methodological approaches for investigating AI image models: Unmaking the ecosystem, Unmaking the data and Unmaking the outputs.
First, the idea of ecosystem was taken for these authors to describe socio-technical implications that surround the AI models: the place where they have been developed; the owners, partners, or supporters; and their interests, goals, and impositions. “Research has already identified how these image models internalize toxic stereotypes (Birnhane 2021) and reproduce forms of gendered and ethnic bias (Luccioni 2023), to name just two issues” (Munn et al., 2023, p. 2).
There are also differences between the different models that currently dominate the market. Although Stable Diffusion seems to be the most open due to its origin, when working with images with this model, biases appear even more quickly than in other models. “In this framing, Stable Diffusion becomes an internet-based tool, which can be used and abused by “the people,” rather than a corporate product, where responsibility is clear, quality must be ensured, and toxicity must be mitigated” (Munn et al., 2023, p. 5).
To unmaking the data, it is important to ask ourselves about the source and interests for the extraction of the data used. According to the description of their project “Creating an Ad Library Political Observatory”, “This project aims to explore diverse approaches to analyze and visualize the data from Meta’s ad library, which includes Instagram, Facebook, and other Meta products, using LLMs. The ultimate goal is to enhance the Ad Library Political Observatory, a tool we are developing to monitor Meta’s ad business.” That is to say, the images were taken from political advertising on the social network Facebook, as part of an observation process that seeks to make evident the investments in advertising around politics. These are prepared images in terms of what is seen in the background of the image, the position and posture of the characters, the visible objects. In general, we could say that we are dealing with staged images. This is important since the initial information that describes the AI is in itself a representation, a visual creation.
Original language | English |
---|---|
Publication status | Published - 2024 |