Can artistry be built into a machine?
One day recently, on a table in Jean Oh’s lab in the Squirrel Hill neighborhood of Pittsburgh, a robot arm was busy at a canvas. Slowly, as if the air were viscous, it dipped a brush into a pool of light-gray paint on a palette, swung around and stroked the canvas, leaving an inch-long mark amid a cluster of other brushstrokes. Then it pulled back and paused, as if to assess its work.
The strokes, mostly different shades of gray, suggested something abstract – an anthill, maybe. Oh, head of the roBot Intelligence Group at Carnegie Mellon University, dressed in a sweatshirt bearing the words “There Are Artists Among Us,” looked on with approval. Her doctoral student, Peter Schaldenbrand, stood alongside.
Oh’s work, which includes robot vision and topics in autonomous aviation, often touches on what is known as the sim-to-real gap: how machines trained in a simulated environment can act in the real world. In recent years, Schaldenbrand has led an effort to bridge the sim-to-real gap between sophisticated image-generation programs such as Stable Diffusion and physical works of art such as drawings and paintings. This effort has mainly taken shape in the project known as FRIDA, the latest iteration of which was rhythmically whirring away in a corner of the lab. (FRIDA is an acronym for Framework and Robotics Initiative for Developing Arts, although the researchers chose the acronym, inspired by Frida Kahlo, before deciding what it stood for.)
The process of moving from language prompts to pixelated images to brushstrokes can be complicated, as the robot must account for “the noise of the real world,” Oh said. But she, Schaldenbrand and Jim McCann, a roboticist at Carnegie Mellon who also helped develop FRIDA, believe that the research is worth pursuing for two reasons: It could improve the interface between humans and machines, and it could, through art, help connect people to one another.
“These models are trained based on everybody’s data,” McCann said, referring to the large generative models that power tools like ChatGPT and DALL-E. “And so I still think we’re figuring out how projects like this, that use such models, can deliver value back to people.”
The sim-to-real gap poses a surprisingly tricky problem for roboticists and computer engineers. Some artificial intelligence systems can list the steps involved in walking (tighten your quadriceps and flex your tibialis posterior, tilt your weight back and tense your gluteus maximus) and can make a simulated body walk in a virtual world. So, it’s tempting to think that these systems could easily make a physical body walk in the real world.
Not so. In the 1980s, computer scientist Hans Moravec noted that AI was good at engaging in complicated reasoning and parsing vast amounts of data but that it was bad at simple physical activities, such as picking up a bottle of water. This is known as Moravec’s paradox. (The physical superiority of humans might be explained by our body’s long evolutionary history; the tasks that are simple for us are supported by millions of years of Darwinian experimentation.)
Painting, which often mixes high-concept ideas and basic physical actions, throws the paradox into relief: How do we manage to capture the absurdity of human consciousness with the motions of an arm?
AI image-generating tools such as Midjourney, DALL-E and Stable Diffusion are trained by feeding neural networks massive databases of images and corresponding text descriptions. The programmed goal is to model the relationships between the meanings of words and the features of images, and to then use these relationships in a “diffusion model” to create original images that retain the meaning of particular descriptions. (The prompt “A family picnicking in the park” will generate a new image every time it is used; each one will be understandable as a family picnicking in the park.)
But such images exist only in the sim-world of computers, composed of pixels of varying hue and intensity. Leave the simulation and the image stays behind.
To solve this problem, Oh and her colleagues took FRIDA’s physicality into account. Taped to a wall in their lab is a piece of paper with 130 different brushstrokes in black: curlicues and lines, some long and straight, some little more than dots. The marks represent the range of the robot’s motion, and they were programmed into its diffusion model.
“We take pictures of the brushstrokes, model that interaction, and then get a really accurate simulation of brushstrokes grounded in what the robot can actually do,” Schaldenbrand said. When prompted, the model creates an image of, say, a frog ballerina in pixels, but only in configurations that the robot can actually paint using those 130 brushstrokes.
The researchers developed a way for the robot to occasionally step back from its painting, to gauge how close it was to the goal it had generated in pixels, and to then revise that pixelated goal. A wayward mark could become the motion of the frog ballerina leaping, or the raised eyebrow of someone in her audience. So, every few dozen brushstrokes, FRIDA pulled away from the canvas, took a photo of its work thus far, paused and then went back to work.
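The paint, photograph, replan cycle described above can be sketched as a toy simulation. This is a minimal illustration of the idea, not FRIDA's actual algorithm: it greedily stamps strokes from a small made-up library onto a numpy "canvas," and recomputes its error map only at the start of each batch, the way the robot steps back to photograph its work in progress.

```python
import numpy as np

rng = np.random.default_rng(0)

def make_stroke_library(n=8, size=5):
    # Toy stand-in for the 130 photographed brushstrokes: small
    # grayscale stamps that are the only marks the "robot" may place.
    return [rng.random((size, size)) for _ in range(n)]

def paint_with_replanning(target, library, n_strokes=60, batch=12):
    # Alternate between placing a batch of strokes and "photographing"
    # the canvas to recompute the error against the pixel goal.
    canvas = np.zeros_like(target)
    size = library[0].shape[0]
    h, w = target.shape
    error = target - canvas
    for step in range(n_strokes):
        if step % batch == 0:                 # step back and reassess
            error = target - canvas
        # place the next stroke where the canvas is furthest below the goal
        y, x = divmod(int(np.argmax(error[: h - size, : w - size])), w - size)
        stroke = library[step % len(library)]
        canvas[y:y + size, x:x + size] = np.maximum(
            canvas[y:y + size, x:x + size], stroke)
        error[y:y + size, x:x + size] = 0     # don't revisit until next replan
    return canvas
```

A fuller version would also revise the target itself at each replanning step, folding wayward marks into a new pixel goal rather than treating them as errors to paint over.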
“It’s how maybe human artists do this,” Oh said. “Add some brushstrokes, and then go back and look at the full canvas and replan. We wanted to mimic that process.” A process of artistic self-discovery, in a way, albeit a mechanized, algorithmic and statistical one.
The results of these methods are on display in the lab: portraits of professors and historical figures, landscapes, cityscapes, that frog ballerina, even a self-portrait by the first FRIDA robot, all in a distinctive abstract style. The consistency of the paintings suggests a unified artistic vision, for which Schaldenbrand, McCann and Oh decline to claim credit. They attribute each of the works to FRIDA.
But could FRIDA have an oeuvre without a will, a heart or fingernails? Can a robot be an artist?
Amy LaViers, a computer scientist and dancer who runs the Robotics, Automation and Dance Lab, an independent nonprofit, said that such questions wouldn’t seem so crazy, or scary, if people were open to dissolving the hard distinction between the artist and the medium. Everything – whether it’s watercolor or AI image generators or a desire for expressivity – is wrapped up in the art. Even something as simple as paint can seem to have a mind of its own, and a painter has to react to the way it glides on the canvas. LaViers suggested viewing FRIDA as a “robotic paintbrush,” rather than a painting robot.
“There are things you can do with artificial bodies that humans can’t do,” she said. “It broadens the palette of human expression.”
Oh emphasized that humans were still essentially involved in FRIDA’s painting. They prompt the machine and mix the paints, set up the canvas and limit the number of total brushstrokes in each piece. The data sets that FRIDA and other image generators are trained on contain paintings and photographs created by other people. But, Oh added, the goal was never to make something to compete with human artists. “We want to promote human creativity,” she said. “We want people to express their thoughts in different ways.”
In the lab, Schaldenbrand watched as a painting slowly emerged from FRIDA’s deliberate gray brushstrokes: a foggy road, the shapes of cars, taillights. “This is hard to explain,” he said. “I don’t want to give some false notion that there’s a consciousness going on here. But it’s kind of fun sometimes to pretend.”
This article originally appeared in The New York Times.