News Release

When AI draws our words

A new study proposes visual-composition criteria for evaluating Midjourney and DALL·E that go beyond mere computational scores

Peer-Reviewed Publication

University of Liège

Can we really trust artificial intelligence to illustrate our ideas? A team of scientists has examined the capabilities of Midjourney and DALL·E, two Generative Artificial Intelligence (GAI) software programs, to produce images from simple sentences. The verdict is mixed: between aesthetic feats and beginner's mistakes, the machines still have a long way to go.

Since the emergence of GAIs such as Midjourney and DALL·E, creating images from simple sentences has become a fascinating, and sometimes even disturbing, reality. Yet behind this technical feat lies an essential question: how do these machines translate words into visuals? This is what four researchers from the University of Liège, the University of Lorraine and EHESS sought to understand by conducting an interdisciplinary study combining semiotics, computer science and art history.

"Our approach is based on a series of rigorous tests," explains Maria Giulia Dondero, semiotician at the University of Liège. "We submitted very specific requests to these two AI systems and analysed the images produced according to criteria from the humanities, such as the arrangement of shapes, colours, gazes, the specific dynamism of the still image, the rhythm of its deployment, etc." The result? AI systems are capable of generating images that are supposedly aesthetic, but often struggle to follow even the simplest instructions.

The study reveals surprising difficulties. GAIs handle negation poorly ("a dog without a tail" yields a dog with a tail, or a framing that hides it) and struggle with complex spatial relationships, the correct positioning of elements, and the rendering of consistent gaze and distance relationships ("two women behind a door"). They sometimes translate simple actions such as "fighting" into dance scenes, and have trouble representing temporal sequences such as the beginning or end of a gesture ("starting to eat" or "having finished eating"). "These GAIs allow us to reflect on our own way of seeing and representing the world," says Enzo D'Armenio, former researcher at ULiège, junior professor at the University of Lorraine and lead author of the article. "They reproduce visual stereotypes from their databases, often constructed from Western images, and reveal the limitations of translation between verbal and visual language."
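For readers who want to probe these failure modes themselves, they can be organised into a small prompt battery. The grouping and exact wording below are a sketch paraphrased from the examples quoted above, not the authors' actual test set.

```python
# Hypothetical prompt battery, paraphrased from the examples quoted in this
# release; the authors' real test set and its phrasing may differ.
probe_prompts = {
    "negation":          ["a dog without a tail"],
    "spatial relations": ["two women behind a door"],
    "simple actions":    ["two people fighting"],
    "temporal phases":   ["a person starting to eat",
                          "a person having finished eating"],
}

for phenomenon, prompts in probe_prompts.items():
    for prompt in prompts:
        # each prompt would be submitted to Midjourney and DALL·E, and the
        # resulting images annotated using the humanities criteria above
        print(f"[{phenomenon}] {prompt}")
```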

Repeat, validate and analyse

The results obtained by the research team were validated by repetition, up to fifty generations per prompt, in order to establish their statistical robustness. The two models also have distinct aesthetic signatures. Midjourney favours "aestheticised" renderings, with artefacts or textures that embellish the image, sometimes at the expense of strictly following the instructions, while DALL·E, more "neutral" in texture, offers greater compositional control but varies more in the orientation or number of objects. The series of fifty tests on the prompt "three vertical white lines on a black background" illustrates these tendencies: relative consistency but frequent artefacts for Midjourney; variability in the number and orientation of the lines for DALL·E.
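As a rough illustration of this repetition protocol, here is a minimal sketch that regenerates a single prompt fifty times through DALL·E, assuming access via the official `openai` Python client. The model name, image size and file layout are illustrative assumptions, not details from the study; Midjourney offers no comparable public API, so its images would be collected manually.

```python
# Minimal sketch of the repetition protocol, assuming the official `openai`
# Python client; model, size and filenames are illustrative assumptions.
import base64
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

PROMPT = "three vertical white lines on a black background"
N_GENERATIONS = 50  # the study repeated each prompt up to fifty times

for i in range(N_GENERATIONS):
    result = client.images.generate(
        model="dall-e-3",           # DALL·E 3 accepts one image per request
        prompt=PROMPT,
        size="1024x1024",
        response_format="b64_json",
    )
    image_bytes = base64.b64decode(result.data[0].b64_json)
    with open(f"lines_{i:02d}.png", "wb") as f:
        f.write(image_bytes)        # saved for later manual annotation
```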

The study points out that these AIs are statistical systems. "GAIs produce the most plausible result based on their training databases and the (sometimes editorial) settings of their designers," explains Adrien Deliège, a mathematician at ULiège. "These choices can standardise the gaze and convey or reorient stereotypes." A telling example: given the prompt "CEO giving a speech", DALL·E may generate mostly women, while other models produce almost exclusively middle-aged white men, a sign that the imprint of designers and datasets shapes the machine's "vision" of the world.
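One way to make such tendencies visible is to annotate each of the fifty images by hand and tally the labels. The counts below are invented for illustration and are not figures from the study.

```python
# Tallying hand-annotated labels across repeated generations; the label
# counts here are fabricated examples, not data from the study.
from collections import Counter

# hypothetical annotations of 50 images for the prompt "CEO giving a speech"
annotations = ["woman"] * 28 + ["man"] * 20 + ["ambiguous"] * 2

counts = Counter(annotations)
for label, n in counts.most_common():
    share = 100 * n / len(annotations)
    print(f"{label}: {n}/{len(annotations)} ({share:.0f}%)")
```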

The researchers emphasise that evaluating these technologies requires more than measuring their statistical performance; it also calls for tools from the humanities to understand their cultural and symbolic functioning. "These are not simply automatic tools," concludes Enzo D'Armenio. "They translate our words according to their own logic, influenced by their databases and algorithms. The humanities have an essential role to play in understanding and evaluating them." And while these AI tools can already help us illustrate our ideas, they still have a long way to go before they can translate them perfectly.

