AI and Real Solutions on Frontera

Broad science spectrum cast by AI, machine learning, and more

In July 2022, it seemed that the floodgates had opened for artificial intelligence (AI) in the sciences. AI uses computers and other machines to mimic the way the brain solves problems and makes decisions.

The AI lab DeepMind announced that it had predicted the structures of nearly all of the roughly 200 million known proteins, an astounding achievement. What’s more, Science magazine had already named DeepMind’s earlier protein-structure work part of its 2021 Breakthrough of the Year.

The Frontera supercomputer at the Texas Advanced Computing Center (TACC) supports a broad range of projects that use AI to find new ways to make progress on daunting scientific problems.

Graph AI

In computer science, an important data structure called a “graph” models a set of objects (nodes) and the relationships between them (edges). Tasks such as modeling physical systems, predicting protein interfaces, and classifying diseases require models that can learn from graph inputs.
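To make the node-and-edge idea concrete, here is a minimal Python sketch. It assumes a small undirected graph stored as an adjacency list, with a single numeric feature per node. The function `aggregate_neighbors` is a hypothetical illustration of one message-passing step, which is the basic operation many graph neural networks repeat: each node's new value blends its own feature with the mean of its neighbors' features. It is not the method used by the MIT team.

```python
from collections import defaultdict

def build_graph(edges):
    """Store an undirected graph as an adjacency list: node -> set of neighbors."""
    adj = defaultdict(set)
    for u, v in edges:
        adj[u].add(v)
        adj[v].add(u)
    return adj

def aggregate_neighbors(adj, features):
    """One message-passing step: blend each node's feature with its neighbors' mean."""
    updated = {}
    for node, value in features.items():
        neighbors = adj.get(node, set())
        if neighbors:
            neighbor_mean = sum(features[n] for n in neighbors) / len(neighbors)
        else:
            neighbor_mean = value  # an isolated node keeps its own value
        updated[node] = 0.5 * value + 0.5 * neighbor_mean
    return updated

# Hypothetical toy graph: a path a - b - c with one feature per node.
graph = build_graph([("a", "b"), ("b", "c")])
feats = {"a": 1.0, "b": 0.0, "c": 3.0}
print(aggregate_neighbors(graph, feats))  # {'a': 0.5, 'b': 1.0, 'c': 1.5}
```

Stacking several such steps lets information flow along edges, so a node's representation reflects the structure around it. That is why graph data with connected or correlated samples needs models of this kind.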

Computer scientist Arvind of the Massachusetts Institute of Technology (MIT) is using Frontera for graph AI, which works with data whose samples are inherently connected or correlated. Graph AI is widely used in social network analysis, product recommendation, fraud detection, drug discovery, and more.

“To scale graph AI computation, we need high performance computers, like Frontera, with a large amount of compute nodes and storage,” said research scientist Xuhao Chen of MIT, who collaborates with Arvind.

PFAS Pollution

Per- and polyfluoroalkyl substances (PFAS) are a large group of manufactured chemicals used in everyday products such as stain-resistant fabrics, nonstick pans, and fire-fighting foam. Unfortunately, PFAS exposure is linked to adverse health effects on people’s metabolism, pregnancy, and more.

Postdoctoral researcher Siva Dasetty is part of a team of scientists at the University of Chicago who, together with Argonne National Laboratory, are using machine learning to perform high-throughput screening of molecular probes.

“Frontera is instrumental in enabling us to perform broad virtual screening of chemical space to efficiently identify high-performing molecular probes for experimental synthesis and testing,” Dasetty said.
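Machine-learning-driven virtual screening generally follows a simple pattern. A model is fitted to a small set of molecules with known performance. That cheap model then scores a much larger pool of candidates, and only the top-ranked ones go on to synthesis and testing. The Python below is a minimal sketch of that pattern, not the Chicago/Argonne team's method. It assumes a single hypothetical numeric descriptor per probe, uses an ordinary least-squares line as the surrogate model, and invents all probe names and values. Real screening campaigns use much richer molecular representations and models.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit of y = slope * x + intercept for one descriptor."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x

def screen(candidates, model, top_k):
    """Score every candidate with the surrogate model and return the top_k names."""
    slope, intercept = model
    scored = [(slope * x + intercept, name) for name, x in candidates.items()]
    scored.sort(reverse=True)  # highest predicted performance first
    return [name for _, name in scored[:top_k]]

# Hypothetical training data: descriptor value -> measured performance.
model = fit_line([1.0, 2.0, 3.0], [2.0, 4.0, 6.0])

# Hypothetical unscreened pool: probe name -> descriptor value.
pool = {"p1": 0.5, "p2": 4.0, "p3": 2.5}
print(screen(pool, model, top_k=2))  # ['p2', 'p3']
```

The design choice that makes this "high-throughput" is that evaluating the fitted model is far cheaper than a simulation or an experiment. That lets a supercomputer sweep a large chemical space before committing lab time to a few candidates.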

The researchers want to incorporate the molecular probes into portable devices for sensitive, real-time detection of PFAS, and into high-performance materials for highly selective sorption, in which one substance preferentially sticks to another.

Cosmic Web

Astrophysicist Yueying Ni of Carnegie Mellon University uses Frontera to develop AI-assisted simulations of the cosmic web, the large-scale structure of the universe encompassing clusters of galaxies at scales of millions of light years.

She helped develop the ASTRID simulation that uses neural networks on Frontera to generate “super-resolution” enhancements of dark matter simulations of galaxy evolution and black holes.
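Super-resolution means producing a higher-resolution version of a low-resolution field. The Python below is only a sketch of the input and output shapes involved. It performs naive nearest-neighbor upsampling of a small 2D density grid, the kind of simple baseline that a trained neural network is meant to improve on. The function `upsample_nearest` and the grid values are hypothetical, and the real work operates on 3D cosmological simulation data rather than a toy 2D grid.

```python
def upsample_nearest(grid, factor):
    """Enlarge a 2D grid by repeating each cell factor x factor times.

    No new small-scale detail is created; values are simply copied.
    """
    out = []
    for row in grid:
        wide_row = [value for value in row for _ in range(factor)]
        for _ in range(factor):
            out.append(list(wide_row))  # copy so rows are independent lists
    return out

# Hypothetical 2 x 2 low-resolution density field.
low_res = [[1.0, 2.0],
           [3.0, 4.0]]
high_res = upsample_nearest(low_res, 2)
for row in high_res:
    print(row)
```

This baseline preserves the mean density but adds no structure below the original cell size. A learned super-resolution model is valuable precisely because it predicts that missing small-scale structure at far lower cost than running the high-resolution simulation directly.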

“We performed the ASTRID simulation on the Frontera supercomputer,” Ni said. “Currently, this is the largest cosmological simulation that covered the epoch of Cosmic Noon, when star formation and supermassive black holes both reached their peak activity.”

Her work shows that deep learning and cosmological simulations can form a powerful combination to model the universe over its full dynamic range.

Future Mars Rover

Data scientist Chris Mattmann at the NASA Jet Propulsion Laboratory (JPL) is using machine learning on Frontera to explore the planet Mars. Frontera is helping his team train and analyze an image captioning algorithm for Martian surface terrain.

JPL calls the effort Drive-By Science. The idea is for a future “smart rover” with onboard graphics processing units (GPUs) to run an image captioning model such as Google’s Show and Tell neural image caption generator.

Instead of returning approximately 200 images a day, such a rover could return about 1,000,000 captions, because text is far less expensive to transmit than images.
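The two figures quoted above imply a simple ratio. The article implies, but does not state, that both figures refer to the same daily downlink budget. Under that assumption, each image costs about as much to send as 5,000 captions. The caption size in the second half of this Python sketch is a hypothetical value chosen only to show the arithmetic.

```python
IMAGES_PER_DAY = 200          # figure quoted in the article
CAPTIONS_PER_DAY = 1_000_000  # figure quoted in the article

# Assuming the same downlink budget covers either option:
captions_per_image = CAPTIONS_PER_DAY // IMAGES_PER_DAY
print(captions_per_image)  # 5000

# Hypothetical: if one caption were about 100 bytes of text,
# the implied size of one image would be about 5000 * 100 bytes.
CAPTION_BYTES = 100
implied_image_bytes = captions_per_image * CAPTION_BYTES
print(implied_image_bytes)  # 500000, i.e. roughly 0.5 MB per image
```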

Mattmann’s team used GPUs on Frontera to train Google’s Show and Tell model on public Mars data from the Mars Science Laboratory (MSL) mission. They then ran inference with the trained model on new images to generate human-readable captions.
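At inference time, an image captioning model such as Show and Tell generates a caption one word at a time. At each step it picks the next word from a probability distribution produced by its trained decoder. The Python below sketches greedy decoding, the simplest such strategy; beam search is a common, more thorough alternative. The next-word table is a hypothetical stand-in for a trained decoder. A real model conditions each step on the image features and on the full caption so far, not only on the previous word.

```python
# Hypothetical stand-in for a trained decoder: for each previous word,
# the probabilities of the next word.
NEXT_WORD_PROBS = {
    "<start>": {"a": 0.6, "rocky": 0.4},
    "a": {"rocky": 0.7, "sandy": 0.3},
    "rocky": {"outcrop": 0.8, "<end>": 0.2},
    "sandy": {"ripple": 0.9, "<end>": 0.1},
    "outcrop": {"<end>": 1.0},
    "ripple": {"<end>": 1.0},
}

def greedy_caption(next_word_probs, max_len=10):
    """Greedy decoding: repeatedly take the most probable next word until <end>."""
    words = []
    current = "<start>"
    for _ in range(max_len):
        candidates = next_word_probs[current]
        current = max(candidates, key=candidates.get)
        if current == "<end>":
            break
        words.append(current)
    return " ".join(words)

print(greedy_caption(NEXT_WORD_PROBS))  # a rocky outcrop
```

The `max_len` cap guarantees termination even if the decoder never predicts the end token. A short text string like this is what the rover would send back instead of the image.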

“The future of space and planetary assets will include GPU-like computing,” Mattmann said.

AlphaFold and Cells

Biophysicist Liao Chen of the University of Texas at San Antonio is using Frontera to perform large-scale all-atom simulations of transporters for neutral soluble molecules in cell-like environments. A good example of such a molecule is pyruvate, the end product of glycolysis, which plays a major role in cell metabolism.

In October 2021, Chen published a study in the Journal of Chemical Information and Modeling that presented a new model of mitochondrial pyruvate carrier proteins.

In addition to running all-atom molecular dynamics simulations, Chen used Frontera to compare the structural models his team developed with those generated by RoseTTAFold and by AlphaFold, the alternative deep-learning algorithm developed by DeepMind.
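One common way to compare two structural models of the same protein is the root-mean-square deviation (RMSD) of matching atom positions. The article does not say which metric Chen's team used, so this Python is only a sketch of RMSD. It assumes the two models have already been superimposed (for example, with the Kabsch algorithm) and list the same atoms in the same order. The coordinates are made up.

```python
import math

def rmsd(coords_a, coords_b):
    """Root-mean-square deviation between two equal-length lists of (x, y, z) positions.

    Assumes the models are already superimposed and list the same atoms in the same order.
    """
    if len(coords_a) != len(coords_b) or not coords_a:
        raise ValueError("models must list the same, non-zero number of atoms")
    total = 0.0
    for (xa, ya, za), (xb, yb, zb) in zip(coords_a, coords_b):
        total += (xa - xb) ** 2 + (ya - yb) ** 2 + (za - zb) ** 2
    return math.sqrt(total / len(coords_a))

# Hypothetical two-atom models (coordinates in angstroms).
model_md = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)]
model_pred = [(0.0, 0.0, 0.0), (1.0, 2.0, 0.0)]
print(rmsd(model_md, model_pred))  # 1.4142135623730951
```

A smaller RMSD means the two models place corresponding atoms closer together. Skipping the superposition step would inflate the value with differences that are only due to overall rotation and translation.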

“The state-of-the-art supercomputing is easily accessible at TACC,” Chen said.