WEST LAFAYETTE, Ind. - Researchers have demonstrated how to decode what the human brain is seeing by using artificial intelligence to interpret fMRI scans from people watching videos, representing a sort of mind-reading technology.
The advance could aid efforts to improve artificial intelligence and lead to new insights into brain function. Critical to the research is a type of algorithm called a convolutional neural network, which has been instrumental in enabling computers and smartphones to recognize faces and objects.
"That type of network has made an enormous impact in the field of computer vision in recent years," said Zhongming Liu, an assistant professor in Purdue University's Weldon School of Biomedical Engineering and School of Electrical and Computer Engineering. "Our technique uses the neural network to understand what you are seeing."
Convolutional neural networks, a form of "deep-learning" algorithm, have been used to study how the brain processes static images and other visual stimuli. However, the new findings represent the first time such an approach has been used to see how the brain processes movies of natural scenes, a step toward decoding the brain while people are trying to make sense of complex and dynamic visual surroundings, said doctoral student Haiguang Wen.
He is lead author of a new research paper appearing online Oct. 20 in the journal Cerebral Cortex. A YouTube video is available at https:/
The researchers acquired 11.5 hours of fMRI data from each of three female subjects watching 972 video clips, including clips showing people or animals in action and nature scenes. First, the data were used to train the convolutional neural network model to predict activity in the brain's visual cortex while the subjects watched the videos. The researchers then used the model to decode fMRI data from the subjects to reconstruct the videos, even ones the model had never seen before.
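The encoding step described above, predicting visual-cortex activity from the neural network's features, can be illustrated with a small sketch. It is not the researchers' code: it assumes a ridge-regularized linear map from features to voxel responses, which this release does not specify, and it uses synthetic random data in place of real CNN features and fMRI recordings. All names and sizes are hypothetical.

```python
import numpy as np

def fit_ridge(X, Y, alpha=1.0):
    """Closed-form ridge regression: W = (X^T X + alpha I)^-1 X^T Y."""
    n_features = X.shape[1]
    gram = X.T @ X + alpha * np.eye(n_features)
    return np.linalg.solve(gram, X.T @ Y)

def voxelwise_correlation(Y_true, Y_pred):
    """Pearson correlation between measured and predicted responses, per voxel."""
    yt = Y_true - Y_true.mean(axis=0)
    yp = Y_pred - Y_pred.mean(axis=0)
    denom = np.linalg.norm(yt, axis=0) * np.linalg.norm(yp, axis=0)
    return (yt * yp).sum(axis=0) / denom

rng = np.random.default_rng(0)
n_train, n_test, n_features, n_voxels = 400, 100, 20, 50  # hypothetical sizes

# Synthetic stand-ins (assumption): each row is one time point, i.e. one fMRI volume.
true_weights = rng.normal(size=(n_features, n_voxels))
features_train = rng.normal(size=(n_train, n_features))
features_test = rng.normal(size=(n_test, n_features))
voxels_train = features_train @ true_weights + rng.normal(size=(n_train, n_voxels))
voxels_test = features_test @ true_weights + rng.normal(size=(n_test, n_voxels))

# Train on one set of "clips", then score predictions on held-out ones.
weights = fit_ridge(features_train, voxels_train, alpha=10.0)
scores = voxelwise_correlation(voxels_test, features_test @ weights)
print(f"median held-out correlation: {np.median(scores):.2f}")
```

Scoring on held-out time points mirrors the idea of testing on videos that were not used for training.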
The model was able to accurately decode the fMRI data into specific image categories. Actual video images were then presented side-by-side with the computer's interpretation of what the person's brain saw based on fMRI data.
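Decoding runs in the opposite direction, from brain activity to a label. The sketch below is a toy illustration under stated assumptions, not the method in the paper: it simulates voxel responses as a category-specific pattern plus noise, fits a linear ridge decoder to one-hot category labels, and picks the highest-scoring category. The category names are borrowed from examples quoted in this release; everything else is hypothetical.

```python
import numpy as np

rng = np.random.default_rng(1)
categories = ["person", "bird", "turtle", "moon"]  # illustrative labels only
n_voxels, n_train, n_test = 60, 400, 100           # hypothetical sizes

# Assumption: each category evokes its own voxel pattern, observed with noise.
prototypes = rng.normal(size=(len(categories), n_voxels))

def simulate(n):
    """Return simulated voxel responses and their true category indices."""
    labels = rng.integers(len(categories), size=n)
    responses = prototypes[labels] + 0.5 * rng.normal(size=(n, n_voxels))
    return responses, labels

X_train, y_train = simulate(n_train)
X_test, y_test = simulate(n_test)

# Linear decoder: ridge regression from voxel responses to one-hot labels.
one_hot = np.eye(len(categories))[y_train]
alpha = 1.0
W = np.linalg.solve(X_train.T @ X_train + alpha * np.eye(n_voxels),
                    X_train.T @ one_hot)

# The decoded category is the one with the largest predicted score.
predicted = np.argmax(X_test @ W, axis=1)
accuracy = float(np.mean(predicted == y_test))
print(f"held-out decoding accuracy: {accuracy:.2f}")
```

Real fMRI data are far noisier than this simulation, so the accuracy here says nothing about the accuracy reported in the study.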
"For example, a water animal, the moon, a turtle, a person, a bird in flight," Wen said. "I think what is a unique aspect of this work is that we are doing the decoding nearly in real time, as the subjects are watching the video. We scan the brain every two seconds, and the model rebuilds the visual experience as it occurs."
The researchers were able to figure out how certain locations in the brain were associated with specific information a person was seeing. "Neuroscience is trying to map which parts of the brain are responsible for specific functionality," Wen said. "This is a landmark goal of neuroscience. I think what we report in this paper moves us closer to achieving that goal. A scene with a car moving in front of a building is dissected into pieces of information by the brain: one location in the brain may represent the car; another location may represent the building.
"Using our technique, you may visualize the specific information represented by any brain location, and screen through all the locations in the brain's visual cortex. By doing that, you can see how the brain divides a visual scene into pieces, and re-assembles the pieces into a full understanding of the visual scene."
The researchers also were able to use models trained with data from one human subject to predict and decode the brain activity of a different human subject, a process called cross-subject encoding and decoding. This finding is important because it demonstrates the potential for broad applications of such models to study brain function, even for people with visual deficits.
"We think we are entering a new era of machine intelligence and neuroscience where research is focusing on the intersection of these two important fields," Liu said. "Our mission in general is to advance artificial intelligence using brain-inspired concepts. In turn, we want to use artificial intelligence to help us understand the brain. So, we think this is a good strategy to help advance both fields in a way that otherwise would not be accomplished if we approached them separately."
A complete list of co-authors is available in the abstract. The research has been funded by the National Institute of Mental Health. The work is affiliated with the Purdue Institute for Integrative Neuroscience. Data reported in this paper also have been made publicly available on the website of the Laboratory of Integrated Brain Imaging.
Source: Zhongming Liu, 765-496-1872, email@example.com
From left, doctoral student Haiguang Wen, assistant professor Zhongming Liu and former graduate student Junxing Shi review fMRI data of brain scans. The work aims to improve artificial intelligence and lead to new insights into brain function. (Purdue University image/Erin Easterling)
A publication-quality photo is available at https:/
From left, doctoral student Haiguang Wen and former graduate student Junxing Shi prepare to test graduate student Kuan Han. (Purdue University image/Erin Easterling)
A publication-quality photo is available at https:/
Neural Encoding and Decoding with Deep Learning for Dynamic Natural Vision
Haiguang Wen2,3, Junxing Shi2,3, Yizhen Zhang2,3, Kun-Han Lu2,3, Jiayue Cao1,3, Zhongming Liu*1,2,3
1Weldon School of Biomedical Engineering, 2School of Electrical and Computer Engineering, 3Purdue Institute for Integrative Neuroscience, Purdue University, West Lafayette, Indiana, 47906, USA
*Correspondence: Zhongming Liu, 765 496 1872, firstname.lastname@example.org
Convolutional neural network (CNN) driven by image recognition has been shown to be able to explain cortical responses to static pictures at ventral-stream areas. Here, we further showed that such CNN could reliably predict and decode functional magnetic resonance imaging data from humans watching natural movies, despite its lack of any mechanism to account for temporal dynamics or feedback processing. Using separate data, encoding and decoding models were developed and evaluated for describing the bi-directional relationships between the CNN and the brain. Through the encoding models, the CNN-predicted areas covered not only the ventral stream, but also the dorsal stream, albeit to a lesser degree; single-voxel response was visualized as the specific pixel pattern that drove the response, revealing the distinct representation of individual cortical location; cortical activation was synthesized from natural images with high-throughput to map category representation, contrast, and selectivity. Through the decoding models, fMRI signals were directly decoded to estimate the feature representations in both visual and semantic spaces, for direct visual reconstruction and semantic categorization, respectively. These results corroborate, generalize, and extend previous findings, and highlight the value of using deep learning, as an all-in-one model of the visual cortex, to understand and decode natural vision.