While automation has become the new normal in everyday life, it is also helping scientists make strides in analyzing large data sets. With decades of archival data from radio telescopes around the world sitting in storage, astronomers at West Virginia University (WVU) are using automation and machine learning techniques to dig through it in the hope of uncovering new clues about mysterious cosmic phenomena like Fast Radio Bursts (FRBs).
FRBs are incredibly bright, millisecond-long blasts of radio waves. They were discovered in 2007, in archival data from the Parkes Telescope, by WVU Department of Physics and Astronomy Professor Duncan Lorimer, FRS; his graduate student; and WVU Physics and Astronomy Professor Maura McLaughlin. Today, FRBs are one of the hottest topics in the field of radio astronomy.
The automation in question is a newer, more efficient process created by astronomers to review large amounts of data in less time. It allows researchers to focus on the actual science rather than spend unnecessary hours poring over plots of data, looking for the “needle in a haystack.”
WVU Department of Physics and Astronomy graduate student and Center for Gravitational Waves and Cosmology researcher Graham Doskoch is one of the team members leading the cross-institutional effort known as The Petabyte Project (TPP). TPP uses new automation and machine learning techniques to search archival data spanning from the 1990s to the present day. The data was collected by radio telescopes all over the world, including the Green Bank Telescope and the Arecibo Observatory (United States), the Murriyang telescope at the Parkes Observatory (Australia), and sites in Europe.
While the data is not new, the search techniques are: modern programming, including GPU-accelerated code, makes the search far faster. GPUs, or Graphics Processing Units, enable specialized software to sift through large amounts of data much faster than humans or CPUs (central processing units) could manage. The accelerated pace allows researchers to “crunch” vast amounts of data, with high precision, in a shorter span of time, improving the overall results of the search. Doskoch describes another key advantage of the pipeline, explaining, “We’re running one uniform pipeline on all the data; different surveys had used different search techniques. This means we can better compare our results across different data sets.”
Machine learning models allow researchers to fine-tune their search for FRB candidates in the data. “We can use a machine learning tool to pare down these candidates, removing many that are likely noise or radio frequency interference,” explains Doskoch, “which is useful because our pipeline generates a lot of candidates. We’ve built a database at WVU to store information about the raw data, as well as the candidate sources. After being searched by the codes, promising candidates are then looked at by a human.”
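The article does not say which machine learning tool is used or what it outputs. As a minimal sketch of the general idea, assuming a classifier assigns each candidate a score between 0 and 1 for how likely it is to be astrophysical, the paring-down step might look like the following. The `Candidate` fields, the `select_for_review` helper, the 0.5 threshold and the sample values are all hypothetical and are not taken from TPP.

```python
from dataclasses import dataclass


@dataclass
class Candidate:
    """One pipeline candidate (all fields are hypothetical)."""
    candidate_id: str
    snr: float    # signal-to-noise ratio reported by the search
    score: float  # classifier's estimate (0 to 1) that the signal is astrophysical


def select_for_review(candidates, threshold=0.5):
    """Keep candidates scoring at or above the threshold, best first."""
    kept = [c for c in candidates if c.score >= threshold]
    return sorted(kept, key=lambda c: c.score, reverse=True)


# Made-up candidates: one likely noise or interference, two worth a human look.
candidates = [
    Candidate("cand-001", snr=7.2, score=0.03),
    Candidate("cand-002", snr=12.5, score=0.97),
    Candidate("cand-003", snr=8.1, score=0.61),
]

for_review = select_for_review(candidates)
print([c.candidate_id for c in for_review])
```

Only the candidates that survive the cut would be passed on for a person to inspect. Where the threshold sits is a trade-off between the number of plots to review and the risk of discarding a real burst.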
By guiding machine learning tools to sift data at scale, researchers like Doskoch can spend less time looking over plots (that are predominantly noise) and more time focusing on the plots that may lead to detection. According to Prof. Sarah Burke-Spolaor, WVU Department of Physics and Astronomy and Center for Gravitational Waves and Cosmology researcher, “Our automation greatly reduces the number of plots that graduate students need to look at and greatly raises the probability of detection. It also allows the student to spend more time thinking about science.” TPP uses this added efficiency not to take away educational components, opportunities or jobs, but rather to enrich the process, making the things researchers already do a lot easier and the scientific outcomes a lot clearer.
One goal of TPP is to run the pipeline on archival data in order to better understand FRBs. While the origins of FRBs are not fully understood, one strong family of models suggests that their source or precursor is a magnetar, a type of neutron star with a very powerful magnetic field. Magnetars are the most magnetic stars in the universe, and they are expected to be particularly bright at higher radio frequencies. Another of TPP’s goals is to produce some of the most sensitive FRB rate limits at high frequencies to date.
Collaboration is key in a project of this magnitude.
Enter the Deep Space Network (DSN) and colleagues from NASA’s Jet Propulsion Laboratory (JPL). The DSN is NASA’s international network of facilities used to communicate with faraway spacecraft exploring our solar system. Its powerful collection of antennas exists primarily to support space-based missions, providing the connection for commanding spacecraft and for receiving their never-before-seen images and scientific information on Earth. Due to its unique set of instrumentation, the DSN is also key in producing high-frequency FRB data, which is valuable for TPP.
With the plan in place and the instruments listening, the team is working to refine its data processing techniques to work at higher frequencies with FRBs in mind. According to Doskoch, “it makes sense to look for FRBs at high frequencies, but high frequencies give us new challenges; for example, interstellar effects that are characteristic of astrophysical sources are harder to see at high frequencies. This makes high-frequency bursts look a lot like interference.” The team wants to polish the search parameters and clean up the machine learning side of observing, in the hope of uncovering more details about FRBs. In addition to better detection, the team explains that improved data can be very valuable in improving machine learning algorithms and deep learning models for future data processing.
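The quote does not name the interstellar effect in question. One well-known example is dispersion: free electrons between the source and Earth delay lower radio frequencies more than higher ones, with the delay scaling as the inverse square of the frequency. As a rough sketch, assuming dispersion is the effect meant, the standard cold-plasma delay formula shows how quickly that signature shrinks at high frequencies. The dispersion measure and the band edges below are illustrative values only, not TPP or DSN observing parameters.

```python
# Cold-plasma dispersion constant, in ms GHz^2 per (pc cm^-3).
K_DM_MS = 4.148808


def dispersion_delay_ms(dm, f_lo_ghz, f_hi_ghz):
    """Arrival delay (ms) of the low band edge relative to the high band edge.

    dm is the dispersion measure in pc cm^-3; frequencies are in GHz.
    """
    return K_DM_MS * dm * (f_lo_ghz ** -2 - f_hi_ghz ** -2)


dm = 500.0  # hypothetical dispersion measure for an extragalactic burst

# Two illustrative 400 MHz-wide bands, one low and one high.
low_band = dispersion_delay_ms(dm, 1.2, 1.6)
high_band = dispersion_delay_ms(dm, 8.2, 8.6)

print(f"1.2-1.6 GHz sweep: {low_band:.1f} ms")
print(f"8.2-8.6 GHz sweep: {high_band:.1f} ms")
```

With these numbers the sweep across the low band lasts a sizable fraction of a second, while the sweep across the high band lasts only a few milliseconds, comparable to the length of the burst itself. A burst with almost no visible sweep is much harder to tell apart from terrestrial interference, which shows no dispersion at all.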
Doskoch works closely with NASA/JPL astrophysicist Walid Majid, who studies pulsars and FRBs, to advance TPP. Doskoch is thankful for the collaboration and expertise offered by Majid. “I’m very grateful for Walid and the JPL folks. Having someone who knows the ins and outs of the dataset is super valuable. Looking at new DSN data, and having someone who understands the quirks is also valuable.”
“Even with non-detections, we can still place solid constraints on the data, further polishing the data for future use,” Doskoch states. Describing an ideal outcome for the project, he says, “If you can detect an FRB at high frequencies, and it looks like it has a magnetar-like spectrum, that would certainly support magnetar models.”
“We don’t have many detections of FRBs at high frequencies, so the goal would be to detect a high frequency FRB, and even better, a repeating FRB, which would provide a fantastic opportunity,” Doskoch continues.
The promise of new data, improved search technologies and clues into the location and origin of FRBs drives this project. “I’m excited to be able to assemble a large high frequency dataset and search at frequencies that aren’t commonly looked at,” Doskoch explains. “Maybe we will find something that has previously gone undetected.”
The Petabyte Project is supported by the U.S. National Science Foundation under Grant # 2108673.
hal/12/2025
Contact: Holly Legleiter, Public Relations Specialist
Center for Gravitational Waves and Cosmology
hlegleiter@mail.wvu.edu