
Astroinformatics is an interdisciplinary field of study involving the combination of astronomy, data science, machine learning, informatics, and information/communications technologies.[2][3] The field is closely related to astrostatistics.
Data-driven astronomy (DDA) refers to the use of data science in astronomy. Outputs of telescopic observations and sky surveys are collected, and approaches from data mining and big data management are used to analyze, filter, and normalize the resulting data sets. These data sets are then used for classification, prediction, and anomaly detection through advanced statistical approaches, digital image processing, and machine learning. Astronomers and space scientists use the output of these processes to identify patterns, anomalies, and movements in outer space and to formulate theories and make discoveries about the cosmos.
Astroinformatics is primarily focused on developing the tools, methods, and applications of computational science, data science, machine learning, and statistics for research and education in data-oriented astronomy.[2] Early efforts in this direction included data discovery, metadata standards development, data modeling, astronomical data dictionary development, data access, information retrieval,[4] data integration, and data mining[5] in the astronomical Virtual Observatory initiatives.[6][7][8] Further development of the field, along with endorsement by the astronomy community, was presented to the National Research Council (United States) in 2009 in the astroinformatics "state of the profession" position paper for the 2010 Astronomy and Astrophysics Decadal Survey.[9] That position paper provided the basis for the subsequent, more detailed exposition of the field in the Informatics Journal paper Astroinformatics: Data-Oriented Astronomy Research and Education.[2]
Astroinformatics as a distinct field of research was inspired by work in the fields of geoinformatics, cheminformatics, and bioinformatics, and by the eScience work[10] of Jim Gray at Microsoft Research, whose legacy was remembered and continued through the Jim Gray eScience Awards.[11]
Although the primary focus of astroinformatics is on the large, worldwide, distributed collection of digital astronomical databases, image archives, and research tools, the field also recognizes the importance of legacy data sets, using modern technologies to preserve and analyze historical astronomical observations. Some astroinformatics practitioners help to digitize historical and recent astronomical observations and images into a large database for efficient retrieval through web-based interfaces.[3][12] Another aim is to help develop new methods and software for astronomers, and to facilitate the processing and analysis of the rapidly growing amount of data in astronomy.[13]
Astroinformatics is described as the "fourth paradigm" of astronomical research.[14] There are many research areas involved with astroinformatics, such as data mining, machine learning, statistics, visualization, scientific data management, and semantic science.[7] Data mining and machine learning play significant roles in astroinformatics as a scientific research discipline due to their focus on "knowledge discovery from data" (KDD) and "learning from data".[15][16]
The amount of data collected from astronomical sky surveys has grown from gigabytes to terabytes over the past decade and is predicted to grow in the next decade to hundreds of petabytes with the Large Synoptic Survey Telescope and to exabytes with the Square Kilometre Array.[17] This plethora of new data both enables and challenges effective astronomical research, so new approaches are required. Partly for this reason, data-driven science is becoming a recognized academic discipline. Consequently, astronomy and other scientific disciplines are developing information-intensive and data-intensive sub-disciplines to the extent that these sub-disciplines are now becoming (or have already become) standalone research disciplines and full-fledged academic programs. While many educational institutions do not yet offer an astroinformatics program, such programs are likely to be developed in the near future.
Informatics has recently been defined as "the use of digital data, information, and related services for research and knowledge generation". However, the more commonly used definition is "informatics is the discipline of organizing, accessing, integrating, and mining data from multiple sources for discovery and decision support." The discipline of astroinformatics therefore includes many naturally related specialties, such as data modeling and data organization. It may also include transformation and normalization methods for data integration and information visualization, as well as knowledge extraction, indexing techniques, information retrieval, and data mining methods. Classification schemes (e.g., taxonomies, ontologies, folksonomies, and/or collaborative tagging[18]) and astrostatistics will also be heavily involved. Citizen science projects (such as Galaxy Zoo) also contribute highly valued novelty discovery, feature meta-tagging, and object characterization within large astronomy data sets. All of these specialties enable scientific discovery across varied massive data collections, collaborative research, and data re-use, in both research and learning environments.
In 2007, the Galaxy Zoo project[19] was launched for the morphological classification[20][21] of a large number of galaxies. The project considered 900,000 images for classification, taken by the Sloan Digital Sky Survey (SDSS)[22] over the preceding seven years. The task was to study each picture of a galaxy, classify it as elliptical or spiral, and determine whether it was spinning. The project was run by a team of astrophysicists led by Kevin Schawinski at the University of Oxford; Schawinski and his colleague Chris Lintott estimated that it would take such a team 3–5 years to complete the work.[23] This led them to the idea of using machine learning and data science techniques to analyze and classify the images.[24]
In 2012, two position papers[25][26] presented to the Council of the American Astronomical Society led to the establishment of formal working groups in astroinformatics and astrostatistics for the profession of astronomy within the US and elsewhere.[27]
Astroinformatics provides a natural context for the integration of education and research.[28] The experience of research can now be brought into the classroom to establish and grow data literacy through the easy re-use of data.[29] It also has other uses, such as repurposing archival data for new projects, literature-data links, and intelligent retrieval of information.[30]
The data retrieved from sky surveys first undergo data preprocessing, in which redundancies are removed and the data are filtered. Feature extraction is then performed on the filtered data set, which is passed on for further processing.[31] Some of the renowned sky surveys are listed below:
The size of the data from the above-mentioned sky surveys ranges from 3 TB to almost 4.6 EB.[31] The data mining tasks involved in managing and manipulating these data include classification, regression, clustering, anomaly detection, and time-series analysis, and each of these methods has several approaches and applications.
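The preprocessing steps described above (removing redundancies, filtering, and extracting features) can be sketched in Python as follows. This is a minimal illustration, not a survey pipeline. The toy catalogue rows, the interpretation of "redundancies" as repeated detections of the same source within a 1-arcsecond radius, and every function name are hypothetical.

```python
import math

# Hypothetical toy catalogue rows: (ra_deg, dec_deg, g_mag, r_mag).
# Values are invented for illustration; None marks a missing measurement.
RAW = [
    (150.0000, 2.0000, 19.2, 18.5),
    (150.00001, 2.00001, 19.2, 18.5),  # repeated detection of the first source
    (150.0100, 2.0100, None, 17.9),    # missing g-band magnitude
    (151.0000, 2.5000, 21.0, 20.1),
]

def angular_sep_arcsec(ra1, dec1, ra2, dec2):
    """Flat-sky (small-angle) approximation of the separation, in arcseconds."""
    dra = (ra1 - ra2) * math.cos(math.radians((dec1 + dec2) / 2))
    ddec = dec1 - dec2
    return math.hypot(dra, ddec) * 3600.0

def remove_redundant(rows, radius_arcsec=1.0):
    """Keep the first detection of each source; drop later rows within the radius."""
    kept = []
    for row in rows:
        if all(angular_sep_arcsec(row[0], row[1], k[0], k[1]) > radius_arcsec
               for k in kept):
            kept.append(row)
    return kept

def filter_valid(rows):
    """Drop rows with any missing magnitude."""
    return [r for r in rows if r[2] is not None and r[3] is not None]

def extract_features(rows):
    """Return (r magnitude, g - r color) feature pairs."""
    return [(r[3], round(r[2] - r[3], 3)) for r in rows]

features = extract_features(filter_valid(remove_redundant(RAW)))
print(features)  # [(18.5, 0.7), (20.1, 0.9)]
```

Real cross-matching uses exact spherical separations and spatial indexes; the linear scan here only keeps the example short.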
Classification[40] is used to identify and categorize astronomical data, as in spectral classification, photometric classification, morphological classification, and the classification of solar activity. The approaches of classification techniques are listed below:
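As an illustration of photometric classification, the sketch below uses k-nearest neighbours, one common classification technique chosen here for brevity (not necessarily one of the approaches the article refers to). The training colors, the "star"/"quasar" labels, and the choice k = 3 are invented for the example and do not represent real color loci.

```python
import math
from collections import Counter

# Hypothetical labelled training set: (u - g color, g - r color) -> class.
# The color values are invented for illustration only.
TRAIN_COLORS = [
    ((1.2, 0.5), "star"),
    ((1.3, 0.6), "star"),
    ((1.1, 0.4), "star"),
    ((0.2, 0.1), "quasar"),
    ((0.3, 0.2), "quasar"),
    ((0.1, 0.0), "quasar"),
]

def knn_classify(point, train, k=3):
    """Label a point by majority vote among its k nearest training examples."""
    nearest = sorted(train, key=lambda item: math.dist(point, item[0]))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]

print(knn_classify((1.0, 0.45), TRAIN_COLORS))   # star
print(knn_classify((0.25, 0.15), TRAIN_COLORS))  # quasar
```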
Regression[41] is used to make predictions from the retrieved data through statistical trends and statistical modeling. Its uses include estimating photometric redshifts and measuring the physical parameters of stars.[42] The approaches are listed below:
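A minimal sketch of regression for photometric redshifts, assuming a single color as the only predictor and an ordinary least-squares straight-line fit. The training pairs are hypothetical and deliberately lie on a line. Real photometric-redshift estimation uses several bands and more flexible models.

```python
# Hypothetical training pairs: (g - r color, spectroscopic redshift).
# Invented numbers chosen to lie exactly on z = 0.25 * color - 0.05.
TRAIN_PAIRS = [(0.4, 0.05), (0.6, 0.10), (0.8, 0.15), (1.0, 0.20), (1.2, 0.25)]

def fit_line(pairs):
    """Ordinary least-squares fit of y = a + b * x; returns (a, b)."""
    n = len(pairs)
    mean_x = sum(x for x, _ in pairs) / n
    mean_y = sum(y for _, y in pairs) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in pairs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in pairs)
    b = sxy / sxx
    a = mean_y - b * mean_x
    return a, b

a, b = fit_line(TRAIN_PAIRS)
photo_z = a + b * 0.9  # predicted redshift for an object with g - r = 0.9
print(round(photo_z, 3))  # 0.175
```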
Clustering[43] groups objects according to a similarity measure. In astronomy, it is used for classification as well as for special/rare object detection. The approaches are listed below:
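Clustering can be illustrated with k-means (Lloyd's algorithm), a widely used clustering method chosen here for brevity, with Euclidean distance as the similarity measure. The 2-D points and the initialization from the first k points are hypothetical simplifications. Practical use typically normalizes features and tries several initializations.

```python
import math

# Hypothetical 2-D feature vectors (e.g. two colors) forming two loose groups.
POINTS = [(0.1, 0.2), (2.0, 2.1), (0.2, 0.1), (1.9, 2.0), (0.15, 0.15), (2.1, 1.9)]

def kmeans(points, k, iterations=20):
    """Plain k-means using the first k points as initial centroids."""
    centroids = list(points[:k])
    for _ in range(iterations):
        # Assignment step: attach each point to its nearest centroid.
        clusters = [[] for _ in range(k)]
        for p in points:
            idx = min(range(k), key=lambda i: math.dist(p, centroids[i]))
            clusters[idx].append(p)
        # Update step: move each centroid to the mean of its members.
        new_centroids = []
        for i, members in enumerate(clusters):
            if members:
                new_centroids.append(tuple(sum(c) / len(members) for c in zip(*members)))
            else:
                new_centroids.append(centroids[i])  # keep an empty cluster's centroid
        if new_centroids == centroids:
            break  # assignments no longer change the centroids
        centroids = new_centroids
    return centroids, clusters

centroids, clusters = kmeans(POINTS, k=2)
print([len(c) for c in clusters])  # [3, 3]
```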
Anomaly detection[45] is used to detect irregularities in a data set. In astronomy, the technique is used to detect rare/special objects. The following approaches are used:
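One simple way to flag rare objects is a robust outlier test. The sketch below uses the modified z-score based on the median absolute deviation (MAD), with a commonly recommended cutoff of 3.5. The color values are hypothetical, and the method is an illustrative choice, not the specific approach the article cites.

```python
import statistics

# Hypothetical g - r colors for a small sample; the last entry is unusual.
COLORS = [0.50, 0.52, 0.48, 0.51, 0.49, 0.53, 2.40]

def robust_outliers(values, threshold=3.5):
    """Return values whose modified z-score exceeds `threshold`.

    The modified z-score is 0.6745 * (x - median) / MAD, where MAD is the
    median absolute deviation. Unlike the mean and standard deviation, the
    median and MAD are barely affected by the outliers being sought.
    """
    med = statistics.median(values)
    mad = statistics.median([abs(v - med) for v in values])
    if mad == 0:
        return []  # degenerate case, e.g. most values identical
    return [v for v in values if abs(0.6745 * (v - med) / mad) > threshold]

print(robust_outliers(COLORS))  # [2.4]
```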
Time-series analysis[46] helps in analyzing trends and predicting outputs over time. It is used for trend prediction and novelty detection (detection of previously unknown data). The approaches used here are:
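A minimal novelty-detection sketch for a light curve, assuming evenly sampled measurements. Each point is compared with the mean and standard deviation of the preceding window, and the trailing mean also serves as a naive one-step forecast. The brightness values, window length, and 3-sigma threshold are hypothetical.

```python
import statistics

# Hypothetical, evenly sampled brightness measurements (arbitrary units);
# index 7 contains an injected flare-like jump.
LIGHT_CURVE = [10.0, 10.1, 9.9, 10.0, 10.1, 9.9, 10.0, 13.5, 10.0, 10.1]

def detect_novel_points(series, window=5, n_sigma=3.0):
    """Return indices whose value departs from the trailing-window mean
    by more than n_sigma standard deviations of that window."""
    flagged = []
    for i in range(window, len(series)):
        past = series[i - window:i]
        mean = statistics.fmean(past)
        std = statistics.stdev(past)
        if std > 0 and abs(series[i] - mean) > n_sigma * std:
            flagged.append(i)
    return flagged

def naive_forecast(series, window=5):
    """Predict the next value as the mean of the last `window` values."""
    return statistics.fmean(series[-window:])

print(detect_novel_points(LIGHT_CURVE))  # [7]
print(naive_forecast(LIGHT_CURVE))
```

Because the flare stays in the trailing window for the next few points, it inflates the local standard deviation there; robust statistics or periodic models are common refinements.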
| Year | Place | Link |
|---|---|---|
| 2021 | Caltech | [1] |
| 2020 | Harvard | [2] |
| 2019 | Caltech | [3] |
| 2018 | Heidelberg, Germany | [4] |
| 2017 | Cape Town, South Africa | [5] |
| 2016 | Sorrento, Italy | [6] |
| 2015 | Dubrovnik, Dalmatia | [7] |
| 2014 | University of Chile | [8] |
| 2013 | Australia Telescope National Facility, CSIRO | [9] |
| 2012 | Microsoft Research | [10] |
| 2011 | Sorrento, Italy | [11] |
| 2010 | Caltech | [12] |
Additional conferences and conference lists:
| Item | Link |
|---|---|
| Machine Learning in Astronomy: Possibilities and Pitfalls (2022) | [13] |
| The Astrostatistics and Astroinformatics Portal (ASAIP) big list of conferences | [14] |
| Astronomical Data Analysis Software and Systems (ADASS) annual conferences | [15] |