A surprising amount of everyday life is governed by advanced mathematics. Beyond describing the motion of atoms or the rotation of the stars, complex mathematical methods guide social media feeds, medical imaging, and self-driving vehicles.
Rachel Ward, associate professor of mathematics and a data scientist at the Oden Institute, looks for mathematical solutions that aid everyday life. She applies math in new ways for varied purposes: to rigorously justify existing methods for solving practical problems, to find faster or more accurate solutions to those problems, or to find solutions where none currently exist.
“I like doing mathematics driven by practical, real problems — letting the problem tell me what mathematics is needed to help solve the problem,” Ward explained. “The most exciting scenario is that the problem requires new mathematics, which then can also be applied to solve other problems which beforehand seemed unrelated.”
One practical problem that concerned Ward was the length of time patients spent in an MRI machine to undergo a scan. MRI scans typically last 15 to 90 minutes depending on what is being scanned. This is a long time for anyone, but particularly so for children, and the duration makes capturing fast-moving organs like the heart difficult. Each measurement takes a unit of time, so reducing time in the MRI scan is equivalent to reducing the number of measurements that must be taken.
About 15 years ago, empirical observations showed that MRI images can be reconstructed from far fewer measurements than should be necessary according to the Shannon-Nyquist sampling limit. These results relied on certain non-linear reconstruction methods instead of traditional linear ones, and they drew on the mathematical theory of compressive sensing. Compressive sensing says that, provided the signal to be recovered is sparse in a known dictionary, one need only collect a number of measurements on the order of its sparsity (up to logarithmic factors), rather than a “full set” of measurements, and can then use the knowledge of sparsity to “fill in” the missing information. However, the theory of compressive sensing still did not fully explain the remarkable empirical results for MRI reconstructions.
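To make the idea concrete, here is a minimal sketch in Python with NumPy. It is an illustration under our own assumptions, not code from Ward's work: a length-100 signal with only three nonzero entries is measured 50 times through a random Gaussian matrix (a stand-in for MRI's Fourier measurements) and recovered with orthogonal matching pursuit, a greedy non-linear method chosen here for brevity instead of the ℓ1 minimization that compressive sensing theory usually analyzes. The helper `omp` and all sizes are hypothetical.

```python
import numpy as np

def omp(A, y, k):
    """Orthogonal matching pursuit (hypothetical helper): greedily pick k
    columns of A and fit y by least squares on them."""
    support = []
    residual = y.copy()
    coef = np.zeros(0)
    for _ in range(k):
        # Column most correlated with the part of y not yet explained.
        j = int(np.argmax(np.abs(A.T @ residual)))
        if j not in support:
            support.append(j)
        # Re-fit every selected coefficient, then update the residual.
        coef, *_ = np.linalg.lstsq(A[:, support], y, rcond=None)
        residual = y - A[:, support] @ coef
    x_hat = np.zeros(A.shape[1])
    x_hat[support] = coef
    return x_hat

rng = np.random.default_rng(0)
n, m, k = 100, 50, 3                      # length, measurements, sparsity
x = np.zeros(n)
idx = rng.choice(n, size=k, replace=False)
x[idx] = rng.choice([-1.0, 1.0], size=k) * (1.0 + rng.random(k))
A = rng.standard_normal((m, n)) / np.sqrt(m)   # random measurements
y = A @ x                                      # 50 numbers for 100 unknowns
x_hat = omp(A, y, k)
print("max recovery error:", np.max(np.abs(x_hat - x)))
```

Solving y = A x directly would be hopeless here, since 50 equations cannot pin down 100 unknowns; the sparsity assumption is what makes the recovery possible.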
“The theory for compressive sensing in its initial form seemed to suggest that completely random measurements in the frequency domain should be taken for recovering natural images such as those arising in the MRI application, which are sparse with respect to their gradient. That is, a natural image is cartoon-like, with mostly constant patches, separated by jumps only along edges,” Ward said. “However, at the same time, empirically it was observed random sampling did not work at all — instead, certain variable-density sampling schemes were what were producing the magical empirical results.”
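The difference between uniform and variable-density sampling can be sketched as a sampling mask over a grid of frequencies. In this hypothetical example, frequencies are drawn with probability proportional to 1/(1 + k1² + k2²), so low frequencies near the center of k-space are sampled far more often than high ones; the exact density, the grid size, and the helper `variable_density_mask` are illustrative assumptions, not the scheme from Ward's papers.

```python
import numpy as np

def variable_density_mask(N, m, rng):
    """Draw m frequency locations on an N x N grid (with replacement),
    favoring low frequencies: p(k1, k2) proportional to 1 / (1 + k1^2 + k2^2).
    The density is an assumption chosen for illustration."""
    freqs = np.arange(N) - N // 2              # centered frequency indices
    k1, k2 = np.meshgrid(freqs, freqs, indexing="ij")
    density = 1.0 / (1.0 + k1**2 + k2**2)
    density /= density.sum()                   # normalize to a distribution
    idx = rng.choice(N * N, size=m, replace=True, p=density.ravel())
    mask = np.zeros(N * N, dtype=bool)
    mask[idx] = True
    return mask.reshape(N, N), density

rng = np.random.default_rng(1)
N = 64
mask, density = variable_density_mask(N, m=N * N // 8, rng=rng)
center = mask[N // 2 - 4:N // 2 + 4, N // 2 - 4:N // 2 + 4].mean()
edge = mask[:8, :8].mean()                     # a high-frequency corner
print(f"sampled overall: {mask.mean():.3f}, center: {center:.2f}, corner: {edge:.2f}")
```

A uniform scheme would instead pass `p=None` to `rng.choice`, spreading the same measurement budget evenly across all frequencies.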
Ward’s research provided the first provable guarantee of robust image recovery in this setting by developing a more general theory of compressive sensing with importance sampling, which explains how the optimal weighted random sampling scheme relates to the sparsity structure. The work also proved guarantees for total variation minimization, an algorithm originally developed for noise removal that had also produced the best empirical results for recovering MRI images from undersampled measurements.
The research showed rigorously that, given undersampled measurements of a signal, one can search among all signals consistent with those measurements for the one with the lowest total variation, and that this reconstruction has near-optimal error.
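In symbols, one common way to write this search (the notation here is ours, not the article's) treats the image as an N×N array z, keeps only the measured Fourier coefficients y on a sampled set Ω, and allows for measurement noise of size ε:

```latex
\hat{x} \;=\; \arg\min_{z \in \mathbb{C}^{N \times N}} \|z\|_{TV}
\quad \text{subject to} \quad \|\mathcal{F}_{\Omega} z - y\|_2 \le \varepsilon,
\qquad
\|z\|_{TV} \;=\; \sum_{i,j} \sqrt{|z_{i+1,j} - z_{i,j}|^2 + |z_{i,j+1} - z_{i,j}|^2}
```

Here the total variation sums the magnitudes of the discrete image gradient, which is small for the cartoon-like images described above, with mostly constant patches and jumps only along edges.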
“It's been really interesting that one particular set of mathematical tools just isn't enough to really explain things or guide new optimization strategies,” Ward said. “We need to draw tools from traditionally different mathematical areas.”
Applying Compressed Sensing to Autonomous Systems
Since 2018, Ward has applied her research to the goal of creating truly autonomous systems as part of a team awarded a $7.5 million Multidisciplinary University Research Initiative (MURI) grant from the Department of Defense.
Currently, autonomous vehicles can perform well in highly controlled environments but falter when they encounter conditions they have not been trained on. Humans, by contrast, adapt quickly.
“There have been examples in history where you have pilots who are flying and something happens to the aircraft that has never happened before, so they can't possibly have trained exactly for that scenario, but they have some combination of intuition and quick trial and error movements that allows them to somehow figure out how to land on the fly,” Ward explained. “We’re trying to give the machine this capability that humans seem to have.”
But how do you translate the concept of intuition into artificial intelligence?
“It connects back to sparsity priors, I think,” Ward explained. “Humans are born with certain innate abilities passed on from previous generations, like making eye contact. We don't have to learn these things. So, how do we develop algorithms that can both take into account ‘innate knowledge’ while also having the ability to learn new structures?”
The MURI grant supports research that lets intelligent machines incorporate aspects of their physical context mathematically, greatly reducing the size of the search space when they must learn quickly from scarce data.
“Rachel Ward’s research in machine learning, sparse representations, and compressed sensing is a critical component of this MURI success as it explores novel research paths to learn a system’s equations of evolution quickly under scarce data,” said Dr. Frederick Leve, program officer for Dynamics and Controls at the Air Force Office of Scientific Research.
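One well-known way to learn a system's equations of evolution from scarce data with a sparsity prior is sparse regression over a library of candidate terms, the idea behind the SINDy method of Brunton, Proctor and Kutz. The sketch below is a swapped-in illustration, not the MURI team's method: the dynamics dx/dt = x - x³, the four-term library, the noise level, and the helper `stlsq` are all hypothetical, and it assumes the derivatives dx/dt are measured directly.

```python
import numpy as np

def stlsq(theta, dxdt, threshold=0.1, n_iters=10):
    """Sequentially thresholded least squares (hypothetical helper):
    fit dxdt ~ theta @ xi, repeatedly zeroing small coefficients."""
    xi, *_ = np.linalg.lstsq(theta, dxdt, rcond=None)
    for _ in range(n_iters):
        small = np.abs(xi) < threshold
        xi[small] = 0.0                      # prune negligible terms
        keep = ~small
        if not keep.any():
            break
        sol, *_ = np.linalg.lstsq(theta[:, keep], dxdt, rcond=None)
        xi[keep] = sol                       # refit the surviving terms
    return xi

# Hypothetical scarce data: 20 noisy samples of the law dx/dt = x - x^3.
rng = np.random.default_rng(2)
x = rng.uniform(-2.0, 2.0, size=20)
dxdt = x - x**3 + 1e-3 * rng.standard_normal(20)

# Library of candidate terms; the "true" law uses only two of them.
names = ["1", "x", "x^2", "x^3"]
theta = np.column_stack([np.ones_like(x), x, x**2, x**3])
xi = stlsq(theta, dxdt)
for name, coef in zip(names, xi):
    print(f"{name:>4}: {coef:+.3f}")
```

Thresholding prunes the spurious constant and x² terms, leaving only the two terms that actually drive this toy system.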
The MURI project includes researchers from Northeastern University and Princeton University, with expertise in aerospace engineering and control theory. The project held a kick-off meeting in February 2019 and will run for five years.
“This isn't something that we will solve in the next year,” Ward said, “but maybe in 10 years.”
Her work on optimization for machine learning, which Ward developed as a visiting research scientist at Facebook AI Research in 2017–2018, is poised to have a more immediate impact.
One of the challenging aspects of setting up a working neural network is tuning its hyperparameters, parameters whose values are set before the learning process begins. Tuning is typically done by hand and refined through trial and error, and it affects the network's performance and quality.
Ward, like others, believed that mathematical insight could make it possible to automate this process. For Facebook, that would mean fewer engineering hours to develop a new image classification system or chatbot, and less compute time spent training the model.
Neural networks are trained iteratively, refining their parameters over many steps before arriving at a model that can make accurate predictions.
“There's sort of a Goldilocks-type perfect step size that will get you down to a good value that functions as quickly as possible,” Ward said. “But that step size depends on properties of the function that you can't possibly know beforehand because it depends on the training data that you input as a black box.”
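The Goldilocks problem is easy to reproduce with plain gradient descent on a one-dimensional quadratic f(x) = (L/2)x², a toy stand-in for a training loss chosen purely for illustration. Each step multiplies x by (1 - step·L), so the method converges only when the step size is below 2/L, a threshold that depends on the curvature L, which, as Ward notes, a practitioner cannot know in advance.

```python
def gradient_descent(grad, x0, step, n_steps):
    """Plain gradient descent with a constant step size."""
    x = x0
    for _ in range(n_steps):
        x = x - step * grad(x)
    return x

L = 10.0  # curvature of the toy loss f(x) = (L/2) * x**2; unknown in practice

def grad(x):
    return L * x          # derivative of (L/2) * x**2

for step in (0.01, 0.1, 0.25):  # too small, just right, too large (2/L = 0.2)
    x_final = gradient_descent(grad, x0=1.0, step=step, n_steps=50)
    print(f"step={step}: |x| after 50 steps = {abs(x_final):.3g}")
```

With L = 10, a step of 0.01 crawls toward the minimum, 0.1 lands on it in a single step, and 0.25 diverges.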
Ward tackled the question of how to choose the size of the steps taken during training, which determines how quickly or slowly the network converges on an answer.
Her recent paper on AdaGrad, a method developed by researchers from Google, UC Berkeley and Technion – Israel Institute of Technology, showed that the algorithm could achieve results comparable to those of a hand-tuned system in far less time. It also offered a mathematical explanation for the method's effectiveness.
“One of the practical problems involved in training the large-scale multilayer neural networks that became the workhorses of artificial intelligence is the non-convex optimization,” said Léon Bottou, research lead for Facebook AI. “We can basically say that we do not understand mathematically why such an optimization is even possible.”
In practice, many cross-validation experiments are needed to find the best step size schedules, network geometries, and regularization parameters for neural networks. Such experiments cost the industry hundreds of millions of dollars.
Ward first set out to understand precisely why the ubiquitous batch normalization technique often improves both the training time of neural networks and their misclassification rates on a test set.
She initially had some success in a simplified setting using ‘unusual’ mathematical techniques, but the resulting algorithm did not appear very promising, according to Bottou. About a month later, though, Ward realized that the same techniques could be applied to prove that a particular step size schedule, AdaGrad step sizes, would always perform nearly as well as the optimal schedule, regardless of how it is initialized.
“This result essentially alleviates the need to experimentally determine a good step size schedule,” Bottou said. “Instead, the practitioner can now use this AdaGrad schedule with confidence that it is essentially as good as the one he would have found through costly experiments.”
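The rule can be sketched in a few lines. As we understand the "AdaGrad-Norm" variant of AdaGrad step sizes studied in Ward's paper, it uses a single step size η/b, where b² accumulates the squared norms of every gradient seen so far (standard AdaGrad keeps one such accumulator per coordinate). The comparison below against fixed-step gradient descent is a hypothetical illustration: the two-dimensional quadratic, its curvatures (10 and 1, which the optimizer is not told), the step counts, and the helper names are our assumptions.

```python
import math

# Hypothetical test problem: f(x) = 0.5 * (10 * x[0]**2 + 1 * x[1]**2).
CURVATURES = (10.0, 1.0)

def grad(x):
    return [c * xi for c, xi in zip(CURVATURES, x)]

def norm(v):
    return math.sqrt(sum(vi * vi for vi in v))

def adagrad_norm(x0, b0, eta=1.0, n_steps=200):
    """AdaGrad-Norm sketch: step eta / b, with b**2 += ||gradient||**2."""
    x, b_sq = list(x0), b0 ** 2
    for _ in range(n_steps):
        g = grad(x)
        b_sq += norm(g) ** 2              # b_{j+1}^2 = b_j^2 + ||g_j||^2
        step = eta / math.sqrt(b_sq)
        x = [xi - step * gi for xi, gi in zip(x, g)]
    return norm(x)

def fixed_step(x0, step, n_steps=200):
    """Constant-step gradient descent; returns inf once it clearly diverges."""
    x = list(x0)
    for _ in range(n_steps):
        x = [xi - step * gi for xi, gi in zip(x, grad(x))]
        if norm(x) > 1e6:
            return math.inf
    return norm(x)

for b0 in (0.01, 1.0, 10.0):              # initial b, i.e. initial step 1/b0
    print(f"b0={b0}: AdaGrad-Norm |x| = {adagrad_norm([1.0, 1.0], b0):.2e}, "
          f"fixed step 1/b0 |x| = {fixed_step([1.0, 1.0], 1.0 / b0):.2e}")
```

A small b0 means a huge initial step, and fixed-step descent diverges; AdaGrad-Norm instead lets b grow with the first large gradients and settles on a workable step by itself. When b0 happens to be well chosen, the fixed step can converge faster, which fits the claim of nearly optimal, rather than strictly better, performance.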
“Some of this work is proposing a better solution,” Ward said. “But it's also about understanding why it works and hoping that the theory can give people intuition and guidance on how to do things that are better in the future.”
Beyond MRI scans, autonomous vehicles, and neural networks, there is no shortage of problems that require fast analysis of large quantities of data.
“People often think that mathematics is helpful only for idealized problems that don't have anything to do with the real world,” Ward said. “But this is so untrue. I want to contribute simple, efficient new mathematics tricks for the analysis of large data that can improve on existing methods.”
--Written by Aaron Dubrow, Texas Advanced Computing Center
In March 2019, The University of Texas System approved the W. A. “Tex” Moncrief Distinguished Professorship in Computational Engineering and Sciences—Data Science. Rachel Ward will be its first holder, and the professorship's funds will be used to support her work.
"We are thrilled to have Dr. Ward as our newest Distinguished Professorship holder in the Oden Institute," said Oden Institute Director Karen Willcox. "Dr. Ward's research is at the forefront of the interface between data science and computational science, an area that represents so many important challenges and opportunities in research and education."