Souryadeepta Majumdar, Diploma Level
Data Science has been pivotal to several industries over the past decade, and its impact on the advancement of Astronomy and Astrophysics has been especially profound. A specialized subset, known as Astronomical Data Science, has emerged as a critical component of the field. This article gives an overview of how Data Science has supported different domains of Astronomy.
Integrating Data Science has been essential to enhancing our understanding of the Cosmos and deriving meaningful insights from vast astronomical datasets. This practice, known as Data-Driven Astronomy (DDA), uses data points to draw conclusions and to carry out classification, analysis, prediction and much more.
DDA helps astronomers identify patterns in their data and analyze them to detect anomalies in celestial phenomena. It relies on advanced statistical methods and on tasks such as Image Processing and Data Mining to produce outputs that feed further analysis. This saves astronomers time and lets them focus on interpreting the results for a deeper understanding of the universe. A significant application of the domain is the Galaxy Zoo project (2007), led by Prof. Kevin Schawinski, an astrophysicist then at Oxford University. Its task was to classify about 900,000 images of galaxies, gathered by the Sloan Digital Sky Survey (SDSS) over a period of 7 years, as “elliptical” or “spiral” and, for spirals, to record the direction in which they appeared to rotate. It was estimated that a single person working continuously would need 3-5 years to finish the job. To ease this tedious task, he and his team opened the classification to online volunteers and used Data Science and analytical techniques to combine their answers, arriving at results far sooner. DDA techniques thus cut the time needed for this laborious job drastically.
Data Mining has been the most widely used of the existing Data Science approaches, as it plays a crucial role in Big Data Management (BDM). Classification algorithms such as ANN (Artificial Neural Network), SVM (Support Vector Machine), LVQ (Learning Vector Quantization), DT (Decision Trees), Random Forest, KNN (K-Nearest Neighbours), Naïve Bayes Networks, Gaussian Process and so on are used for tasks like spectral classification, photometric classification, tracking solar activity and morphological classification of galaxies. (Spectral and photometric classification apply to stars, galaxies, quasars and supernovae.)
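As a minimal sketch of one of the classifiers listed above, the following uses K-Nearest Neighbours to assign a class to an object from two photometric colours. The colour values, the class labels and the function name `knn_classify` are invented for illustration; real surveys use many more features and far larger training sets.

```python
import math
from collections import Counter

# Hypothetical training set: (u-g colour, g-r colour) -> class label.
# The numbers are invented for illustration, not taken from any survey.
TRAIN = [
    ((1.2, 0.5), "star"),
    ((1.3, 0.6), "star"),
    ((1.1, 0.4), "star"),
    ((1.8, 0.9), "galaxy"),
    ((1.9, 1.0), "galaxy"),
    ((1.7, 0.8), "galaxy"),
    ((0.2, 0.1), "quasar"),
    ((0.3, 0.2), "quasar"),
    ((0.1, 0.2), "quasar"),
]


def knn_classify(point, train=TRAIN, k=3):
    """Return the majority label among the k nearest training points."""
    # Sort the training examples by Euclidean distance to the query point.
    by_distance = sorted(train, key=lambda item: math.dist(point, item[0]))
    # Let the k closest examples vote on the label.
    votes = Counter(label for _, label in by_distance[:k])
    return votes.most_common(1)[0][0]


if __name__ == "__main__":
    print(knn_classify((0.25, 0.15)))
    print(knn_classify((1.85, 0.95)))
```

The same pattern (labelled examples in, predicted label out) applies to the other classifiers named above; only the decision rule changes.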
Regression techniques are used to estimate photometric redshifts of galaxies and quasars and for SPPM (Stellar Physical Parameter Measurement). Well-known regression models for these applications include the regression variants of the classifier models above (e.g., SVM Regressor, Random Forest Regressor, Gaussian Process Regression) and dedicated models such as LSR (Least Squares Regression) and Partial Least Squares.
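To illustrate the idea behind least squares regression for photometric redshifts, here is a small sketch that fits a straight line relating a single colour index to redshift. The data are synthetic and exactly linear by construction, and the names `fit_least_squares` and `predict` are hypothetical. Actual photometric redshift estimation uses several bands and non-linear models.

```python
def fit_least_squares(xs, ys):
    """Fit y = intercept + slope * x by ordinary least squares."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of squared deviations of x, and of cross deviations of x and y.
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope


def predict(x, intercept, slope):
    """Evaluate the fitted line at x."""
    return intercept + slope * x


# Synthetic example: colour index versus known (spectroscopic) redshift.
colours = [0.5, 1.0, 1.5, 2.0]
redshifts = [0.1, 0.2, 0.3, 0.4]

intercept, slope = fit_least_squares(colours, redshifts)

if __name__ == "__main__":
    # Estimate the redshift of a new object from its colour alone.
    print(predict(1.2, intercept, slope))
```

Objects with known redshifts play the role of the training set here, and the fitted model is then applied to objects for which only photometry is available.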
Clustering by K-means, Cobweb, SOM (Self-Organizing Map) and similar algorithms is another noteworthy approach for classification and rare-object detection. Data Mining is also particularly significant in Astronomy for Anomaly Detection and Feature Extraction. Hierarchical Clustering, PCA (Principal Component Analysis) and even K-means are used for Anomaly Detection, which helps identify rare objects. Feature selection and extraction play a very important role because the datasets that DDA deals with are Big Data, where they help reduce both dimensionality and noise. Common approaches include Random Search, Best First, Genetic Search, Greedy Stepwise, ReliefF, PCA, ICA (Independent Component Analysis), MDS (Multidimensional Scaling), Factor Analysis, Kernel PCA and KPLS (Kernel Partial Least Squares). All of the above Data Mining approaches are applied to Big Data sets that range in size from about 3 TB for DPOSS (The Palomar Digital Sky Survey) to as much as 4.6 EB (≈4,600,000 TB) for the SKA (The Square Kilometre Array).
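The clustering-based anomaly detection mentioned above can be sketched as follows: run K-means, then rank objects by their distance to the nearest cluster centre, so that the objects furthest from every cluster become candidates for rare objects. This is only a toy version under stated assumptions: the two-dimensional points, the starting centroids and the function names `kmeans` and `rank_by_anomaly` are all hypothetical.

```python
import math


def kmeans(points, centroids, iterations=10):
    """Plain Lloyd's algorithm with caller-supplied starting centroids."""
    centroids = list(centroids)
    for _ in range(iterations):
        # Assignment step: attach every point to its nearest centroid.
        clusters = [[] for _ in centroids]
        for p in points:
            nearest = min(
                range(len(centroids)),
                key=lambda i: math.dist(p, centroids[i]),
            )
            clusters[nearest].append(p)
        # Update step: move each centroid to the mean of its members.
        # An empty cluster keeps its previous centroid.
        centroids = [
            tuple(sum(axis) / len(members) for axis in zip(*members))
            if members else old
            for members, old in zip(clusters, centroids)
        ]
    return centroids


def rank_by_anomaly(points, centroids):
    """Sort points so the one furthest from every centroid comes first."""
    return sorted(
        points,
        key=lambda p: min(math.dist(p, c) for c in centroids),
        reverse=True,
    )


# Invented feature vectors: two tight groups plus one isolated object.
POINTS = [
    (1.0, 1.0), (1.2, 0.9), (0.9, 1.1), (1.1, 1.0),
    (5.0, 5.0), (5.1, 4.9), (4.9, 5.2), (5.0, 5.1),
    (9.0, 0.5),
]

CENTROIDS = kmeans(POINTS, centroids=[POINTS[0], POINTS[4]])
RANKED = rank_by_anomaly(POINTS, CENTROIDS)

if __name__ == "__main__":
    print("Most anomalous object:", RANKED[0])
```

Note that the isolated object pulls its cluster's centroid towards itself, which is one reason practical pipelines often combine clustering with dimensionality reduction or more robust distance measures.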
Hence, Data Science has been a great friend to astronomers in their journey of exploring the universe, taking over the tedious work of filtering data and improving the quality of the results. With the help of specific domains like Data Mining, Big Data Management (BDM) and Machine Learning (ML), astronomers are better equipped to uncover the mysteries of the universe and gain insights into the origin, evolution and future of its processes, structures and components.
References:
[1] S. Pati, “How Data Science is Used in Astronomy?”, June 2021, https://www.analyticsinsight.net/how-data-science-is-used-in-astronomy/
[2] Y. Zhang, Y. Zhao, “Astronomy in the Big Data Era”, CODATA Data Science Journal
[3] Anonymous, “Galaxy Zoo”, Wikipedia, https://en.wikipedia.org/wiki/Galaxy_Zoo
[4] Sloan Digital Sky Surveys, SDSS, https://www.sdss4.org/surveys/