The Biros Code for Efficient Programming

Today’s high-performance computing (HPC) systems support applications in analysis and prediction that require computational resources that cannot be found on small-scale clusters. These supercomputing systems use hundreds of thousands of specialized processors to enable research and industry alike to analyze data in the petascale, and soon exascale, range.

Poorly written code, however, can undermine it all, making HPC system efficiencies stumble below the desired range of 40 to 50 percent, to values as low as 0.001 percent.

ICES Professor George Biros and his collaborators are working to boost the performance of HPC code and improve the speed and scope with which researchers can investigate important scientific problems.

A two-time Gordon Bell Prize recipient, earning the prestigious processing prize in 2003 and 2010, Biros is developing and optimizing algorithms with the potential to be widely applied across different areas of science and engineering. And with a background and appointment as a mechanical engineering professor, Biros is accustomed to addressing a wide range of issues.

“We identify what the important problems are and then we try and design the appropriate methods to solve these problems,” Biros, who leads the ICES Parallel Algorithms for Data Analysis and Simulation Group, said. “And many times we don’t have the algorithms. We have to redesign, or even discover new algorithms that enable the efficient use of HPC systems.”

Grand Challenge Problems

For codes that run on desktops or small-scale commodity clusters, inefficiency is usually more of an inconvenience than a disruptive problem; one can simply let the computer run as long as the program requires. But when a code becomes so large that it requires HPC (petabytes or larger), inefficiency turns into an expensive, time-consuming problem that can prevent the research of others from progressing.

“If you have something that’s not optimal on a single laptop, that’s no big deal, but if you’re on a large national resource [an HPC system], first you’re preventing other researchers from using the machines, and second, you’re wasting taxpayer money,” Biros said. “Efficient computing is about maximizing science per dollar.”

With funding from the National Science Foundation (NSF), the Department of Energy (DOE), and the Air Force Office of Scientific Research (AFOSR), Biros, along with other members of his group and collaborators from UT and other universities, is looking for efficient programming solutions to what he calls “grand challenge problems,” problems that require HPC resources and affect multiple scientific disciplines.

“We identify important problems and then we try and design the appropriate methods to solve these problems, always having in mind that these methods should be able to use thousands of processors,” Biros said.

For example, work under a recent NSF-funded grant for improving codes for N-body problems, a class of methods used in science, engineering, and data analysis, increased the code efficiency from 1 percent to 60 percent. Other current points of research for Biros include developing algorithms for elliptic and parabolic partial differential equations, which are key tools in modeling phenomena like subsurface flow, geophysics, soft tissue mechanics and transport, and inverse problems.

Although the algorithms are diverse in scope, making them efficient on HPC systems comes down to a core set of objectives, says Biros: using each computing core efficiently; ensuring equal distribution of work across all available cores; and minimizing data transfer between cores.

Divide and Conquer (and Compute)

Simply splitting equal portions of the data among cores like parts of a shopping list is too simplistic, says Biros. Instead, computing parallel algorithms for high-data, high-complexity codes is more akin to a logistics problem, such as those faced by chefs planning and preparing an intricate banquet.

“The first step is you have to change code so you can use multiple ‘cooks,’ but then the question is how to optimize the flow of ingredients and information between the different stations,” Biros said.

Optimization takes other computing factors besides available cores into account. It means accounting for memory transfer speed, which is much slower than processor speed; the effects of strategically using specialized processing units within the HPC systems, such as graphics processing units (GPUs); and how quickly results are needed. An effective equal distribution of work, like a well-run kitchen, says Biros, results in no unnecessary idling of parts, no replicated steps, and no redundant communication: an efficient use of time and resources.
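To make the shopping-list point concrete, here is a minimal Python sketch, not Biros’ method, that compares two ways of dividing tasks among cores. The task costs are made-up numbers, and the function names (equal_count_split, cost_aware_split, imbalance) are hypothetical. The cost-aware version uses a standard greedy heuristic (largest task first, always to the least-loaded core) and ignores communication cost entirely, which a real HPC code could not do.

```python
import heapq


def equal_count_split(costs, n_cores):
    """Naive split: contiguous chunks with the same number of tasks each."""
    chunk = -(-len(costs) // n_cores)  # ceiling division
    return [costs[i * chunk:(i + 1) * chunk] for i in range(n_cores)]


def cost_aware_split(costs, n_cores):
    """Greedy split: hand the next-largest task to the least-loaded core."""
    heap = [(0, core) for core in range(n_cores)]  # (current load, core id)
    bins = [[] for _ in range(n_cores)]
    for cost in sorted(costs, reverse=True):
        load, core = heapq.heappop(heap)
        bins[core].append(cost)
        heapq.heappush(heap, (load + cost, core))
    return bins


def imbalance(bins):
    """Busiest core's load divided by the average load (1.0 is perfect)."""
    loads = [sum(b) for b in bins]
    return max(loads) / (sum(loads) / len(loads))


# Hypothetical task costs: a few expensive tasks and many cheap ones.
task_costs = [8, 7, 6, 5, 1, 1, 1, 1, 1, 1, 1, 1]
naive = equal_count_split(task_costs, 4)
balanced = cost_aware_split(task_costs, 4)
```

With these illustrative costs, the naive split puts the three most expensive tasks on one core while the others sit mostly idle, so imbalance(naive) is well above imbalance(balanced). Every core still receives three tasks in the naive version, which is why counting items alone is a poor measure of balanced work.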

The Language Barrier

Besides designing new algorithms, a major roadblock to achieving good per-core efficiency is the language gap between humans and computers. A poorly written code is often less about bad directions and more about the precision differences between high-level and low-level programming languages.

In short, codes written in high-level languages are more straightforward for humans to write, but leave much of the processing commands automated, offering less precision, says Biros, while low-level languages require all processing aspects to be explicitly coded, making them much more precise, but much more convoluted.

“High-level languages are the antithesis of performance today,” said Dan Stanzione, executive director of the Texas Advanced Computing Center, at the annual meeting of The Academy of Medicine, Engineering and Science of Texas in February. His statement was illustrated with striking examples: One code increased its efficiency by a factor of 2600 after being translated from Python (a high-level language) into Fortran (a low-level language). Another, after receiving a similar treatment, increased efficiency by a factor of 3 million.

However, simply writing HPC-level codes in a low-level language isn’t feasible, says Biros.

“Translating computer languages is, in general, a very, very hard problem,” Biros said. “Writing complicated programs using these primitives, it’s really hard for humans to master.”

Instead, when writing his codes, Biros uses a blended approach, writing the code in a high-level language and then refining bottlenecks (portions of the code that are a drag on efficiency) in a low-level language.

“For these complex problems where we have physics, and data and visualization, [all low-level codes are] impossible. You just have to identify which regions of your code are critical ones,” Biros said.
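The idea of finding the critical regions first can be sketched in Python with the standard library profiler. This is an illustration under stated assumptions, not the workflow Biros’ group uses: pipeline, slow_dot, fast_dot, and hottest_function are hypothetical names, the workload is a toy, and the “low-level” replacement here simply pushes the loop into Python’s compiled built-ins rather than into Fortran or C. It assumes Python 3.9 or later for get_stats_profile.

```python
import cProfile
import operator
import pstats


def slow_dot(a, b):
    """Dot product with an interpreted Python loop: the deliberate bottleneck."""
    total = 0.0
    for i in range(len(a)):
        total += a[i] * b[i]
    return total


def fast_dot(a, b):
    """Same dot product, but the loop runs inside compiled built-ins."""
    return sum(map(operator.mul, a, b))


def pipeline(dot, size=200_000, repeats=5):
    """Toy application: cheap setup, then repeated calls to the kernel."""
    a = [2.0] * size
    b = [3.0] * size
    return [dot(a, b) for _ in range(repeats)]


def hottest_function(func, *args):
    """Profile func(*args) and return the name with the most own time."""
    profiler = cProfile.Profile()
    profiler.enable()
    func(*args)
    profiler.disable()
    profiles = pstats.Stats(profiler).get_stats_profile().func_profiles
    return max(profiles, key=lambda name: profiles[name].tottime)
```

Calling hottest_function(pipeline, slow_dot) points at slow_dot as the place where time is spent, so that one function is the candidate for a rewrite; the rest of the program can stay in the high-level language. Swapping in fast_dot gives the same numerical result for this input.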

Putting It All Together

By applying optimization techniques and careful programming to grand challenge problems, Biros is developing algorithms that can increase the efficiency of HPC-level codes across all scientific fields, thus enabling high-data research and research that was previously too big or too expensive to conduct.

“I think it’s cool to help science move forward by using new instruments,” Biros said. “I was very lucky when back in the '90s I was a graduate student at Carnegie Mellon…to be exposed to the state of the art. And then I have maintained this interest throughout my career.”

According to Stanzione, Biros’ approach is working:

“He regularly gets 60 percent performance on [TACC] systems,” Stanzione said.

George Biros is holder of the W. A. “Tex” Moncrief, Jr. Simulation-Based Engineering Science Chair II.

Posted: Aug. 6, 2014