Anomaly Detection from ASRS Databases of Textual Reports

Published by Dashlink | National Aeronautics and Space Administration | Metadata Last Checked: September 07, 2025 | Last Modified: 2025-03-31

Our primary goal is to automatically analyze textual reports from the Aviation Safety Reporting System (ASRS) database in order to discover the anomaly categories reported by pilots and to assign each report to the appropriate category or categories. We applied two state-of-the-art text analysis models to a subset of all ASRS reports: (i) a mixture of von Mises-Fisher (movMF) distributions, and (ii) latent Dirichlet allocation (LDA). The models achieve reasonably high performance in discovering anomaly categories and clustering reports. Each category is represented by the words that have the highest probability in that category. In addition, because the original inference algorithm for LDA was somewhat slow, we developed a new fast LDA algorithm that is 5 to 10 times more efficient, which makes it more suitable for practical use. Further, we developed a simple visualization tool based on nonlinear manifold embedding (ISOMAP) that generates a 2-D visual representation of each report from its content and topics, giving a direct view of the structure of the whole dataset as well as its outliers.
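To make the clustering idea concrete, here is a minimal sketch in plain Python. It is not the authors' movMF or LDA implementation. It uses spherical k-means, a simpler hard-assignment relative of movMF clustering that also works on unit-length document vectors compared by cosine similarity. The example reports, the function names, and the choice of initial centroids are all invented for illustration and do not come from the ASRS data.

```python
import math
from collections import Counter


def unit_tf_vector(text, vocab):
    """Term-frequency vector over vocab, scaled to unit L2 length."""
    counts = Counter(text.lower().split())
    vec = [float(counts[w]) for w in vocab]
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec] if norm > 0 else vec


def spherical_kmeans(vectors, init_indices, n_iter=10):
    """Cluster unit vectors by cosine similarity (hypothetical helper)."""
    centroids = [list(vectors[i]) for i in init_indices]
    labels = [0] * len(vectors)
    for _ in range(n_iter):
        # Assignment step: choose the centroid with the largest dot product.
        labels = [
            max(
                range(len(centroids)),
                key=lambda k: sum(a * b for a, b in zip(v, centroids[k])),
            )
            for v in vectors
        ]
        # Update step: sum the members, then project back onto the unit sphere.
        for k in range(len(centroids)):
            members = [v for v, lab in zip(vectors, labels) if lab == k]
            if not members:
                continue  # keep the old centroid if a cluster is empty
            total = [sum(col) for col in zip(*members)]
            norm = math.sqrt(sum(x * x for x in total))
            centroids[k] = [x / norm for x in total]
    return labels, centroids


def top_words(centroid, vocab, n=3):
    """Words with the largest centroid weight; ties broken alphabetically."""
    ranked = sorted(zip(centroid, vocab), key=lambda p: (-p[0], p[1]))
    return [w for _, w in ranked[:n]]


# Invented toy reports, not taken from ASRS.
reports = [
    "altitude deviation during descent altitude alert",
    "runway incursion taxi runway clearance",
    "altitude alert descent clearance altitude",
    "taxi runway incursion tower runway",
]
vocab = sorted({w for r in reports for w in r.lower().split()})
vectors = [unit_tf_vector(r, vocab) for r in reports]
labels, centroids = spherical_kmeans(vectors, init_indices=[0, 1])

for k, centroid in enumerate(centroids):
    print(k, top_words(centroid, vocab))
```

Each cluster is summarized by its highest-weight words, analogous to describing a category by its most probable words. A movMF model would replace the hard assignments with soft posterior probabilities and estimate a concentration parameter per cluster.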

Complete Metadata

bureauCode	[ "026:00" ]
identifier	DASHLINK_24
issued	2010-09-10
landingPage	https://c3.nasa.gov/dashlink/resources/24/
programCode	[ "026:029" ]

1 resource available