@@ -26,6 +26,8 @@ DevOps-Eval is a comprehensive evaluation suite specifically designed for founda
 ## 📜 Table of Contents

 - [🏆 Leaderboard](#-leaderboard)
+- [👀 DevOps](#-devops)
+- [🔥 AIOps](#-aiops)
 - [⏬ Data](#-data)
 - [👀 Notes](#-notes)
 - [🔥 AIOps Sample Example](#-aiops-sample-example)
@@ -36,19 +38,19 @@ DevOps-Eval is a comprehensive evaluation suite specifically designed for founda

 ## 🏆 Leaderboard
 Below are zero-shot and five-shot accuracies from the models that we evaluate in the initial release. We note that five-shot performance is better than zero-shot for many instruction-tuned models.
-
+### DevOps
 #### Zero Shot

 | **ModelName** | plan | code | build | test | release | deploy | operate | monitor | **AVG** |
 | :------------------------: | :-----: | :-----: | :-----: | :------: | :--------: | :------: | :-------: | :--------: | :-----------: |
-| **DevOps-Model-14B-Chat** | 60.61 | 78.35 | 84.86 | 84.65 | 87.26 | 82.75 | 81.34 | 79.17 | **80.34** |
-| **DevOps-Model-14B-Base** | 54.55 | 77.82 | 83.49 | 85.96 | 86.32 | 81.96 | 85.82 | 82.41 | **80.26** |
+| **DevOpsPal-14B-Chat** | 60.61 | 78.35 | 84.86 | 84.65 | 87.26 | 82.75 | 81.34 | 79.17 | **80.34** |
+| **DevOpsPal-14B-Base** | 54.55 | 77.82 | 83.49 | 85.96 | 86.32 | 81.96 | 85.82 | 82.41 | **80.26** |
 | Qwen-14B-Chat | 60.61 | 75.4 | 85.32 | 84.21 | 89.62 | 82.75 | 83.58 | 80.56 | 79.28 |
 | Qwen-14B-Base | 57.58 | 73.81 | 84.4 | 85.53 | 86.32 | 81.18 | 82.09 | 80.09 | 77.92 |
 | Baichuan2-13B-Base | 60.61 | 69.42 | 79.82 | 79.82 | 82.55 | 81.18 | 85.07 | 83.8 | 75.10 |
 | Baichuan2-13B-Chat | 60.61 | 68.43 | 77.98 | 80.7 | 81.6 | 83.53 | 82.09 | 84.72 | 74.60 |
-| **DevOps-Model-7B-Chat** | 54.55 | 69.11 | 83.94 | 82.02 | 76.89 | 80 | 79.85 | 77.78 | **74.00** |
-| **DevOps-Model-7B-Base** | 54.55 | 68.96 | 82.11 | 78.95 | 80.66 | 76.47 | 79.85 | 78.7 | **73.55** |
+| **DevOpsPal-7B-Chat** | 54.55 | 69.11 | 83.94 | 82.02 | 76.89 | 80 | 79.85 | 77.78 | **74.00** |
+| **DevOpsPal-7B-Base** | 54.55 | 68.96 | 82.11 | 78.95 | 80.66 | 76.47 | 79.85 | 78.7 | **73.55** |
 | Qwen-7B-Base | 53.03 | 68.13 | 78.9 | 75.44 | 80.19 | 80 | 83.58 | 80.09 | 73.13 |
 | Qwen-7B-Chat | 57.58 | 66.01 | 80.28 | 79.82 | 76.89 | 77.65 | 80.6 | 79.17 | 71.96 |
 | Baichuan2-7B-Chat | 54.55 | 63.66 | 77.98 | 76.32 | 71.7 | 73.33 | 75.37 | 79.63 | 68.17 |
@@ -61,21 +63,59 @@ Below are zero-shot and five-shot accuracies from the models that we evaluate in

 | **ModelName** | plan | code | build | test | release | deploy | operate | monitor | **AVG** |
 | :------------------------: | :-----: | :-----: | :-----: | :------: | :--------: | :------: | :-------: | :--------: | :---------: |
-| **DevOps-Model-14B-Chat** | 63.64 | 79.49 | 81.65 | 85.96 | 86.79 | 86.67 | 89.55 | 81.48 | **81.77** |
-| **DevOps-Model-14B-Base** | 62.12 | 80.55 | 82.57 | 85.53 | 85.85 | 84.71 | 85.07 | 80.09 | **81.70** |
+| **DevOpsPal-14B-Chat** | 63.64 | 79.49 | 81.65 | 85.96 | 86.79 | 86.67 | 89.55 | 81.48 | **81.77** |
+| **DevOpsPal-14B-Base** | 62.12 | 80.55 | 82.57 | 85.53 | 85.85 | 84.71 | 85.07 | 80.09 | **81.70** |
 | Qwen-14B-Chat | 65.15 | 76 | 82.57 | 85.53 | 84.91 | 84.31 | 85.82 | 81.48 | 79.55 |
 | Qwen-14B-Base | 66.67 | 76.15 | 84.4 | 85.53 | 86.32 | 80.39 | 86.57 | 80.56 | 79.51 |
 | Baichuan2-13B-Base | 63.64 | 71.39 | 80.73 | 82.46 | 81.13 | 84.31 | 91.79 | 85.19 | 77.09 |
 | Qwen-7B-Base | 75.76 | 72.52 | 78.9 | 81.14 | 83.96 | 81.18 | 85.07 | 81.94 | 77.02 |
 | Baichuan2-13B-Chat | 62.12 | 69.95 | 76.61 | 84.21 | 83.49 | 79.61 | 88.06 | 80.56 | 75.32 |
-| **DevOps-Model-7B-Chat** | 66.67 | 69.95 | 83.94 | 81.14 | 80.19 | 82.75 | 82.84 | 76.85 | **75.25** |
-| **DevOps-Model-7B-Base** | 69.7 | 69.49 | 82.11 | 81.14 | 82.55 | 82.35 | 80.6 | 79.17 | **75.17** |
+| **DevOpsPal-7B-Chat** | 66.67 | 69.95 | 83.94 | 81.14 | 80.19 | 82.75 | 82.84 | 76.85 | **75.25** |
+| **DevOpsPal-7B-Base** | 69.7 | 69.49 | 82.11 | 81.14 | 82.55 | 82.35 | 80.6 | 79.17 | **75.17** |
 | Qwen-7B-Chat | 65.15 | 66.54 | 82.57 | 81.58 | 81.6 | 81.18 | 80.6 | 81.02 | 73.62 |
 | Baichuan2-7B-Base | 60.61 | 67.22 | 76.61 | 75 | 77.83 | 78.43 | 80.6 | 79.63 | 72.11 |
 | Internlm-7B-Chat | 60.61 | 63.06 | 79.82 | 80.26 | 67.92 | 75.69 | 73.88 | 77.31 | 71.09 |
 | Baichuan2-7B-Chat | 60.61 | 64.95 | 81.19 | 75.88 | 71.23 | 75.69 | 78.36 | 79.17 | 70.49 |
 | Internlm-7B-Base | 62.12 | 65.25 | 77.52 | 80.7 | 74.06 | 78.82 | 79.85 | 75.46 | 69.17 |

+### AIOps
+#### Zero Shot
+| **ModelName** | LogParsing | RootCauseAnalysis | TimeSeriesAnomalyDetection | TimeSeriesClassification | **AVG** |
+| :-------------------: | :------------: | :------------------: | :---------------------------: | :-------------------------: | :-------: |
+| Qwen-14B-Base | 66.29 | 58.8 | 25.33 | 43.5 | 49.27 |
+| DevOpsPal-14B-Base | 63.14 | 53.6 | 23.33 | 43.5 | 46.55 |
+| DevOpsPal-14B-Chat | 60 | 56 | 24 | 43 | 46.18 |
+| Qwen-14B-Chat | 64.57 | 51.6 | 22.67 | 36 | 45 |
+| Qwen-7B-Base | 50 | 39.2 | 22.67 | 54 | 40.82 |
+| Qwen-7B-Chat | 57.43 | 38.8 | 22.33 | 39.5 | 40.36 |
+| DevOpsPal-7B-Chat | 56.57 | 30.4 | 25.33 | 45 | 40 |
+| Baichuan2-13B-Chat | 64 | 18 | 21.33 | 37.5 | 37.09 |
+| Baichuan2-7B-Chat | 60.86 | 10 | 28 | 34.5 | 35.55 |
+| Baichuan2-7B-Base | 53.43 | 12.8 | 27.67 | 36.5 | 34.09 |
+| Internlm-7B-Base | 48.57 | 18.8 | 23.33 | 37.5 | 32.91 |
+| Baichuan2-13B-Base | 54 | 12.4 | 23 | 34.5 | 32.55 |
+| DevOpsPal-7B-Base | 46.57 | 20.8 | 25 | 34 | 32.55 |
+| Internlm-7B-Chat | 58.86 | 8.8 | 22.33 | 28.5 | 32 |
+
+#### One Shot
+| **ModelName** | LogParsing | RootCauseAnalysis | TimeSeriesAnomalyDetection | TimeSeriesClassification | **AVG** |
+| :-------------------: | :------------: | :------------------: | :---------------------------: | :-------------------------: | :-------: |
+| DevOpsPal-14B-Chat | 66.29 | 80.8 | 23.33 | 44.5 | 53.91 |
+| Qwen-14B-Base | 64.29 | 74.4 | 28 | 48.5 | 53.82 |
+| DevOpsPal-14B-Base | 60 | 74 | 25.33 | 43.5 | 50.73 |
+| Qwen-14B-Chat | 49.71 | 65.6 | 28.67 | 48 | 47.27 |
+| Qwen-7B-Base | 56 | 60.8 | 27.67 | 44 | 47.18 |
+| DevOpsPal-7B-Base | 52.86 | 44.4 | 28 | 44.5 | 42.64 |
+| Qwen-7B-Chat | 54.57 | 52 | 29.67 | 26.5 | 42.09 |
+| Baichuan2-13B-Base | 56 | 43.2 | 24.33 | 41 | 41.73 |
+| Baichuan2-13B-Chat | 57.43 | 44.4 | 25 | 25.5 | 39.82 |
+| Baichuan2-7B-Base | 48.29 | 40.4 | 27 | 42 | 39.55 |
+| Baichuan2-7B-Chat | 58.57 | 31.6 | 27 | 31.5 | 38.91 |
+| DevOpsPal-7B-Chat | 56.57 | 27.2 | 25.33 | 41.5 | 38.64 |
+| Internlm-7B-Base | 48 | 33.2 | 29 | 35 | 37.09 |
+| Internlm-7B-Chat | 62.57 | 12.8 | 22.33 | 21 | 32.73 |
+
+
 ## ⏬ Data
 #### Download
 * Method 1: Download the zip file (you can also simply open the following link with the browser):