POEMAS Solar Dataset and Infrastructure
High-resolution solar time-series data, scientific infrastructure, and consolidated FITS products associated with a peer-reviewed publication in Astronomy and Computing.
Published scientific paper
MongoDB scalability for astronomical time series: The POEMAS solar radio telescope evaluation without HPC
This page connects the MackSun portal to the peer-reviewed paper that evaluated MongoDB scalability for astronomical time series under constrained computational resources, using real POEMAS solar radio observations.
The publication demonstrates practical scalability boundaries for scientific data processing without HPC and strengthens the scientific foundation of MackSun as a solar data portal focused on accessibility, reuse, and long-duration astronomical time-series analysis.
About the POEMAS solar dataset
POEMAS (POlarization Emission of Millimeter Activity at the Sun) is a solar radio telescope that monitors the Sun at 45 GHz and 90 GHz, with left and right circular polarization, at a temporal resolution of 10 milliseconds.
This acquisition model generates a dense solar time-series dataset, with millions of records per day and approximately 1.1 billion observations per year. In the production deployment described in the paper, the full historical POEMAS dataset reached approximately 3.3 billion records and generated around 50 GB of consolidated FITS products.
The resulting data products and metadata support scientific reuse in solar physics, astrophysics, radio astronomy, and large-scale scientific data engineering, while improving access to high-resolution solar observations through the MackSun portal.
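As a rough consistency check on these figures, the arithmetic can be sketched in Python. The 10 ms sampling interval and the yearly total come from the text above. The assumptions that each sample is stored as one record and that the instrument observes every day of the year are illustrative only, so the implied observing time is indicative rather than an official figure.

```python
# Back-of-the-envelope check of the POEMAS data volume quoted above.
SAMPLES_PER_SECOND = 100            # 10 ms temporal resolution (from the text)
RECORDS_PER_YEAR = 1_100_000_000    # approximate yearly total (from the text)
DAYS_PER_YEAR = 365                 # assumption: observations on every day

# Assumption: one stored record per 10 ms sample.
samples_per_hour = SAMPLES_PER_SECOND * 3600

records_per_day = RECORDS_PER_YEAR / DAYS_PER_YEAR
implied_hours_per_day = records_per_day / samples_per_hour

print(f"samples per hour: {samples_per_hour:,}")
print(f"records per day:  {records_per_day:,.0f}")
print(f"implied observing time: {implied_hours_per_day:.1f} h/day")
```

Under these assumptions the yearly total corresponds to roughly three million records per day, which agrees with the "millions of records per day" stated above and with a daytime-only observing schedule.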
Why this work matters
Modern solar instruments generate increasingly dense and heterogeneous observational data. This creates practical challenges for scientific teams that need to store, query, aggregate, compress, and preserve large astronomical time series datasets over long time spans.
At CRAAM, this problem was evaluated under a strict resource-constrained scenario: a single virtual machine with 32 GB of RAM, of which 16 GB was allocated to MongoDB. The central question was whether a virtualized sharded cluster on a single physical host could offer practical scalability advantages over a standalone deployment.
This makes the study especially relevant for observatories, laboratories, and research groups that need to manage scientific data at scale without access to dedicated HPC infrastructure.
Astronomical time series evaluation scope
The empirical evaluation used real POEMAS solar radio telescope data and compared performance at three dataset scales:
- 15 million documents
- 150 million documents
- 500 million documents
The workloads included selective queries, range filters, sequential reads, and global aggregations, representing realistic analytical access patterns for scientific and astronomical time series.
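The four workload types can be illustrated with MongoDB query documents built as plain Python dictionaries. This is a minimal sketch, not the benchmark code from the paper: the field names `timestamp` and `flux_45ghz_rcp`, the helper function names, and the example date are all hypothetical, and the real POEMAS schema may differ.

```python
from datetime import datetime, timedelta, timezone

# Hypothetical field names; the real POEMAS schema may differ.
TIME_FIELD = "timestamp"
FLUX_FIELD = "flux_45ghz_rcp"


def selective_query(instant):
    """Selective query: match a single observation time."""
    return {TIME_FIELD: instant}


def range_filter(start, end):
    """Range filter: all samples in the half-open interval [start, end)."""
    return {TIME_FIELD: {"$gte": start, "$lt": end}}


def sequential_read():
    """Sequential read: no filter, documents returned in time order."""
    return {"filter": {}, "sort": [(TIME_FIELD, 1)]}


def global_aggregation():
    """Global aggregation: one summary over every document in a collection."""
    return [
        {"$group": {
            "_id": None,                          # a single group for all documents
            "n_samples": {"$sum": 1},             # count of documents
            "mean_flux": {"$avg": f"${FLUX_FIELD}"},
        }}
    ]


start = datetime(2013, 1, 1, 12, 0, tzinfo=timezone.utc)  # arbitrary example date
one_minute = range_filter(start, start + timedelta(minutes=1))
print(one_minute)
print(global_aggregation())
```

With PyMongo, the filter dictionaries would typically be passed to `Collection.find()` and the pipeline to `Collection.aggregate()`. The sketch only builds the documents, so it runs without a database connection.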
Main results
- Sharding introduced coordination overhead for selective queries.
- Sharding delivered substantial gains for global aggregations.
- Aggregation speedups exceeded 600% in the evaluated scenarios.
- Compression ratios remained near 85%.
- An operational threshold of roughly 150 million documents per collection was identified to sustain stable performance under the available resources.
These results provide practical evidence that billion-scale astronomical time series can be managed efficiently in constrained environments when architecture, partitioning strategy, and workload profile are carefully aligned.
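To give a sense of what the 150 million document threshold implies at full scale, the sketch below counts how many collections would be needed if records were split evenly at that limit. The even split and the helper names are assumptions made for illustration; the partitioning scheme actually used in the paper may differ.

```python
MAX_DOCS_PER_COLLECTION = 150_000_000   # operational threshold from the results
TOTAL_RECORDS = 3_300_000_000           # approximate full POEMAS history
RECORDS_PER_YEAR = 1_100_000_000        # approximate yearly volume


def collections_needed(total_docs, limit=MAX_DOCS_PER_COLLECTION):
    """Smallest number of collections that keeps each one at or below the limit."""
    return (total_docs + limit - 1) // limit   # integer ceiling division


def collection_index(record_number, limit=MAX_DOCS_PER_COLLECTION):
    """Zero-based collection for a record, assuming records are filled in order."""
    return record_number // limit


print(collections_needed(TOTAL_RECORDS))      # 22
print(collections_needed(RECORDS_PER_YEAR))   # 8
print(collection_index(149_999_999), collection_index(150_000_000))  # 0 1
```

Under this simple scheme, the full history would span about 22 collections, or roughly 8 per year of observations.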
MackSun as a solar data portal
MackSun was created to improve access to solar data, solar time series datasets, solar flare observations, and scientific products derived from long-term monitoring instruments. In this context, the POEMAS publication strengthens the portal by linking peer-reviewed evidence to real operational data products.
This page supports discovery and indexing for topics such as:
- solar data
- solar time series data
- solar flare data
- solar radio telescope data
- astronomical time series
- scientific data engineering
- MongoDB scalability for scientific workloads
- FITS products from solar instruments
MackSun provides open access to its solar data products, supporting research in astrophysics, radio astronomy, and large-scale scientific data processing.
Recommended citation
If you use this page, the associated infrastructure description, or the scientific context related to the POEMAS dataset, please cite the published paper below:
MongoDB scalability for astronomical time series: The POEMAS solar radio telescope evaluation without HPC
Astronomy and Computing.
DOI: https://doi.org/10.1016/j.ascom.2025.101053
Scientific and educational value
Beyond database benchmarking, this work serves as a practical reference for research groups, observatories, data engineers, and students interested in handling large astronomical datasets without HPC, while preserving accessibility, analytical performance, and reproducibility.
It also reinforces the role of MackSun as a scientific portal for solar observation products, astronomical time series datasets, and operationally realistic data architectures in astronomy.