Attacking Neural Networks Could Lead to a Better Understanding of AI

Researchers at Los Alamos are looking at new ways to compare neural networks. This image was created with the artificial intelligence software Stable Diffusion, using the prompt “Peeking into the black box of neural networks.” Credit: Los Alamos National Laboratory

A team at Los Alamos National Laboratory has developed a novel approach for comparing neural networks that looks within the “black box” of artificial intelligence to help researchers understand neural network behavior. Neural networks recognize patterns in datasets and are widely used in applications such as virtual assistants, facial recognition systems and self-driving cars.


“The artificial intelligence research community doesn’t necessarily have a complete understanding of what neural networks are doing; they give us good results, but we don’t know how or why,” said Haydn Jones, a researcher in the Advanced Research in Cyber Systems group at Los Alamos. “Our new method does a better job of comparing neural networks, which is a crucial step toward better understanding the mathematics behind AI.”


Jones is the lead author of the paper “If You’ve Trained One You’ve Trained Them All: Inter-Architecture Similarity Increases With Robustness,” which was presented recently at the Conference on Uncertainty in Artificial Intelligence. Beyond studying network similarity, the paper takes a crucial step toward characterizing the behavior of robust neural networks.


Neural networks perform well but are fragile. For example, self-driving cars use neural networks to detect signs. When conditions are ideal, they do this quite well. However, the smallest aberration, such as a sticker on a stop sign, can cause the neural network to misidentify the sign, so the car never stops.


To address this fragility, researchers are looking at ways to improve network robustness. One state-of-the-art approach involves “attacking” networks during their training process. Researchers intentionally introduce aberrations and train the AI to ignore them. This process is called adversarial training and essentially makes it harder to fool the networks.
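
As a rough illustration of the idea, the sketch below attacks a model during training and then trains on the attacked inputs. It is not the Los Alamos team's code. It assumes a toy two-feature logistic regression classifier, synthetic data, and the fast gradient sign method (FGSM) as the attack, none of which come from the article, and all function names are hypothetical.

```python
import math
import random


def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def sign(v):
    return (v > 0) - (v < 0)


def predict(w, b, x):
    # Probability that x belongs to class 1.
    return sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b)


def fgsm(w, b, x, y, eps):
    # The "attack": move each feature by eps in the direction that
    # increases the loss. For logistic loss, d(loss)/dx_i = (p - y) * w_i.
    p = predict(w, b, x)
    return [xi + eps * sign((p - y) * wi) for xi, wi in zip(x, w)]


def train(data, eps=0.0, epochs=50, lr=0.1, seed=0):
    # Plain SGD. With eps > 0, every example is attacked before the
    # update, which is the adversarial training loop in miniature.
    rng = random.Random(seed)
    w, b = [0.0] * len(data[0][0]), 0.0
    data = list(data)
    for _ in range(epochs):
        rng.shuffle(data)
        for x, y in data:
            if eps > 0:
                x = fgsm(w, b, x, y, eps)
            g = predict(w, b, x) - y
            w = [wi - lr * g * xi for wi, xi in zip(w, x)]
            b -= lr * g
    return w, b


def accuracy(w, b, data, eps=0.0):
    # Accuracy on clean inputs (eps == 0) or on attacked inputs (eps > 0).
    correct = 0
    for x, y in data:
        if eps > 0:
            x = fgsm(w, b, x, y, eps)
        correct += int((predict(w, b, x) >= 0.5) == (y == 1))
    return correct / len(data)


def make_data(n, seed):
    # Synthetic two-class data: Gaussian blobs centered at (-1, -1) and (1, 1).
    rng = random.Random(seed)
    data = []
    for _ in range(n):
        y = rng.randint(0, 1)
        c = 1.0 if y == 1 else -1.0
        data.append(([rng.gauss(c, 0.5), rng.gauss(c, 0.5)], y))
    return data


if __name__ == "__main__":
    train_set, test_set = make_data(200, seed=1), make_data(200, seed=2)
    for name, eps in (("standard", 0.0), ("adversarial", 0.3)):
        w, b = train(train_set, eps=eps)
        print(name, "clean:", accuracy(w, b, test_set),
              "attacked:", accuracy(w, b, test_set, eps=0.3))
```

In a linear toy problem like this one, the two models may end up with nearly the same decision boundary, so the point is the structure of the loop rather than the size of the gain. The deep vision networks discussed in the article are typically trained with stronger, multi-step attacks.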


Jones, Los Alamos collaborators Jacob Springer and Garrett Kenyon, and Jones’ mentor Juston Moore applied their new metric of network similarity to adversarially trained neural networks. Surprisingly, they found that as the magnitude of the attack increases, adversarial training causes neural networks in the computer vision domain to converge to very similar data representations, regardless of network architecture.


“We found that when we train neural networks to be robust against adversarial attacks, they begin to do the same things,” Jones said.


There has been extensive effort in industry and in the academic community searching for the “right architecture” for neural networks, but the Los Alamos team’s findings indicate that the introduction of adversarial training narrows this search space substantially. As a result, the AI research community may not need to spend as much time exploring new architectures, knowing that adversarial training causes diverse architectures to converge to similar solutions.


“By finding that robust neural networks are similar to each other, we’re making it easier to understand how robust AI might really work. We might even be uncovering hints as to how perception occurs in humans and other animals,” Jones said.


Reference: Jones H, Springer JM, Kenyon GT, Moore J. If you’ve trained one you’ve trained them all: Inter-architecture similarity increases with robustness. Presented at The 38th Conference on Uncertainty in Artificial Intelligence, Eindhoven, The Netherlands, August 2022.
  

This article has been republished from the following materials. Note: material may have been edited for length and content. For further information, please contact the cited source.