HETDEX Opens Massive Cosmic Dataset to Scientists, Novices, and AI
Today, the Hobby-Eberly Telescope Dark Energy Experiment (HETDEX) – which recently completed the largest survey ever taken of the early universe – has released all of its immense, information-rich database to the public. Built from more than half a petabyte of raw and processed data, it will allow astronomers to study how the first galaxies formed and evolved, measure how gas and stars were distributed within these galaxies, map the large-scale structure of the cosmos, and investigate rare and unexpected objects not easily found in traditional surveys.
HETDEX observations make use of a technique called spectroscopy. With it, light is broken apart into its various wavelengths: a spectrum. Astronomers examine spectra for peaks and valleys, which tell them about an object’s chemistry, movement through space, and distance from Earth. The HETDEX database contains a whopping 600 million spectra for a period of history known as Cosmic Noon, 10 billion to 12 billion years ago.
From 2017 to 2024, the Hobby-Eberly Telescope at McDonald Observatory surveyed a region of night sky equivalent to 2,000 full Moons, creating a map of the distant universe. HETDEX is using that map to solve the riddle of dark energy, the unknown substance causing our universe to expand more and more quickly over time. To do this, it is charting the location of over a million early galaxies. However, it has also gathered data on all of the space in between.
“HETDEX has given us a new view of the vast cosmic web in the universe, from the present day to the distant past,” says Eiichiro Komatsu, a founding member of the project, director at the Max Planck Institute for Astrophysics, and co-author on the paper. “We can now see the cosmic web in exquisite detail in the nearby universe. For the first time, we can also see it in the distant universe. The team is working hard to extract information about dark energy from the data. However, the HETDEX data are about more than just dark energy. I am excited that this data release gives everyone a chance to discover something new about the universe.”
In addition to raw data, the release also contains a catalog of every object HETDEX has found so far: over one million distant galaxies, half a million nearby star-forming galaxies, 18,000 supermassive black holes, and over 150,000 stars. Scientists, students, and citizen researchers can download customized subsets of data based on sky location.
While the release is based on half a petabyte of data, the team was able to process it down to a more manageable 10 terabytes. It also developed extensive tutorials and tools to help users – both human and AI – make the most of this massive, complex dataset.
Today’s release marks the first time the full HETDEX dataset and survey catalog have been made available together. While the core survey is now complete, observations are ongoing, calibrations continue to improve, and supplementary releases are expected in the future.












