Computer Science > Software Engineering
arXiv:2205.13457 (cs)
[Submitted on 26 May 2022]
Title:AutoTSG: Learning and Synthesis for Incident Troubleshooting
View a PDF of the paper titled AutoTSG: Learning and Synthesis for Incident Troubleshooting, by Manish Shetty and 4 other authors
View PDFAbstract:Incident management is a key aspect of operating large-scale cloud services. To aid with faster and efficient resolution of incidents, engineering teams document frequent troubleshooting steps in the form of Troubleshooting Guides (TSGs), to be used by on-call engineers (OCEs). However, TSGs are siloed, unstructured, and often incomplete, requiring developers to manually understand and execute necessary steps. This results in a plethora of issues such as on-call fatigue, reduced productivity, and human errors. In this work, we conduct a large-scale empirical study of over 4K+ TSGs mapped to 1000s of incidents and find that TSGs are widely used and help significantly reduce mitigation efforts. We then analyze feedback on TSGs provided by 400+ OCEs and propose a taxonomy of issues that highlights significant gaps in TSG quality. To alleviate these gaps, we investigate the automation of TSGs and propose AutoTSG -- a novel framework for automation of TSGs to executable workflows by combining machine learning and program synthesis. Our evaluation of AutoTSG on 50 TSGs shows the effectiveness in both identifying TSG statements (accuracy 0.89) and parsing them for execution (precision 0.94 and recall 0.91). Lastly, we survey ten Microsoft engineers and show the importance of TSG automation and the usefulness of AutoTSG.
Subjects: | Software Engineering (cs.SE); Artificial Intelligence (cs.AI); Distributed, Parallel, and Cluster Computing (cs.DC); Machine Learning (cs.LG); Programming Languages (cs.PL) |
Cite as: | arXiv:2205.13457 [cs.SE] |
(orarXiv:2205.13457v1 [cs.SE] for this version) | |
https://doi.org/10.48550/arXiv.2205.13457 arXiv-issued DOI via DataCite |
Submission history
From: Manish Shetty Molahalli [view email][v1] Thu, 26 May 2022 16:05:11 UTC (531 KB)
Full-text links:
Access Paper:
- View PDF
- TeX Source
- Other Formats
View a PDF of the paper titled AutoTSG: Learning and Synthesis for Incident Troubleshooting, by Manish Shetty and 4 other authors
Current browse context:
cs.SE
References & Citations
Bibliographic and Citation Tools
Bibliographic Explorer(What is the Explorer?)
Connected Papers(What is Connected Papers?)
Litmaps(What is Litmaps?)
scite Smart Citations(What are Smart Citations?)
Code, Data and Media Associated with this Article
alphaXiv(What is alphaXiv?)
CatalyzeX Code Finder for Papers(What is CatalyzeX?)
DagsHub(What is DagsHub?)
Gotit.pub(What is GotitPub?)
Hugging Face(What is Huggingface?)
Papers with Code(What is Papers with Code?)
ScienceCast(What is ScienceCast?)
Demos
Recommenders and Search Tools
Influence Flower(What are Influence Flowers?)
CORE Recommender(What is CORE?)
arXivLabs: experimental projects with community collaborators
arXivLabs is a framework that allows collaborators to develop and share new arXiv features directly on our website.
Both individuals and organizations that work with arXivLabs have embraced and accepted our values of openness, community, excellence, and user data privacy. arXiv is committed to these values and only works with partners that adhere to them.
Have an idea for a project that will add value for arXiv's community?Learn more about arXivLabs.