@@ -41,14 +41,14 @@ Below are zero-shot and five-shot accuracies from the models that we evaluate in
 
 | **ModelName** | plan| code| build| test| release| deploy| operate| monitor| **AVG** |
 | :------------------------:| :-----:| :-----:| :-----:| :------:| :--------:| :------:| :-------:| :--------:| :-----------:|
-| **DevOpa-Model-14B-Chat** | 60.61| 78.35| 84.86| 84.65| 87.26| 82.75| 81.34| 79.17| **80.34** |
-| **DevOpa-Model-14B-Base** | 54.55| 77.82| 83.49| 85.96| 86.32| 81.96| 85.82| 82.41| **80.26** |
+| **DevOps-Model-14B-Chat** | 60.61| 78.35| 84.86| 84.65| 87.26| 82.75| 81.34| 79.17| **80.34** |
+| **DevOps-Model-14B-Base** | 54.55| 77.82| 83.49| 85.96| 86.32| 81.96| 85.82| 82.41| **80.26** |
 | Qwen-14B-Chat| 60.61| 75.4| 85.32| 84.21| 89.62| 82.75| 83.58| 80.56| 79.28|
 | Qwen-14B-Base| 57.58| 73.81| 84.4| 85.53| 86.32| 81.18| 82.09| 80.09| 77.92|
 | Baichuan2-13B-Base| 60.61| 69.42| 79.82| 79.82| 82.55| 81.18| 85.07| 83.8| 75.10|
 | Baichuan2-13B-Chat| 60.61| 68.43| 77.98| 80.7| 81.6| 83.53| 82.09| 84.72| 74.60|
-| **DevOpa-Model-7B-Chat** | 54.55| 69.11| 83.94| 82.02| 76.89| 80| 79.85| 77.78| **74.00** |
-| **DevOpa-Model-7B-Base** | 54.55| 68.96| 82.11| 78.95| 80.66| 76.47| 79.85| 78.7| **73.55** |
+| **DevOps-Model-7B-Chat** | 54.55| 69.11| 83.94| 82.02| 76.89| 80| 79.85| 77.78| **74.00** |
+| **DevOps-Model-7B-Base** | 54.55| 68.96| 82.11| 78.95| 80.66| 76.47| 79.85| 78.7| **73.55** |
 | Qwen-7B-Base| 53.03| 68.13| 78.9| 75.44| 80.19| 80| 83.58| 80.09| 73.13|
 | Qwen-7B-Chat| 57.58| 66.01| 80.28| 79.82| 76.89| 77.65| 80.6| 79.17| 71.96|
 | Baichuan2-7B-Chat| 54.55| 63.66| 77.98| 76.32| 71.7| 73.33| 75.37| 79.63| 68.17|
@@ -61,15 +61,15 @@ Below are zero-shot and five-shot accuracies from the models that we evaluate in
 
 | **ModelName** | plan| code| build| test| release| deploy| operate| monitor| **AVG** |
 | :------------------------:| :-----:| :-----:| :-----:| :------:| :--------:| :------:| :-------:| :--------:| :---------:|
-| **DevOpa-Model-14B-Chat** | 63.64| 79.49| 81.65| 85.96| 86.79| 86.67| 89.55| 81.48| **81.77** |
-| **DevOpa-Model-14B-Base** | 62.12| 80.55| 82.57| 85.53| 85.85| 84.71| 85.07| 80.09| **81.70** |
+| **DevOps-Model-14B-Chat** | 63.64| 79.49| 81.65| 85.96| 86.79| 86.67| 89.55| 81.48| **81.77** |
+| **DevOps-Model-14B-Base** | 62.12| 80.55| 82.57| 85.53| 85.85| 84.71| 85.07| 80.09| **81.70** |
 | Qwen-14B-Chat| 65.15| 76| 82.57| 85.53| 84.91| 84.31| 85.82| 81.48| 79.55|
 | Qwen-14B-Base| 66.67| 76.15| 84.4| 85.53| 86.32| 80.39| 86.57| 80.56| 79.51|
 | Baichuan2-13B-Base| 63.64| 71.39| 80.73| 82.46| 81.13| 84.31| 91.79| 85.19| 77.09|
 | Qwen-7B-Base| 75.76| 72.52| 78.9| 81.14| 83.96| 81.18| 85.07| 81.94| 77.02|
 | Baichuan2-13B-Chat| 62.12| 69.95| 76.61| 84.21| 83.49| 79.61| 88.06| 80.56| 75.32|
-| **DevOpa-Model-7B-Chat** | 66.67| 69.95| 83.94| 81.14| 80.19| 82.75| 82.84| 76.85| **75.25** |
-| **DevOpa-Model-7B-Base** | 69.7| 69.49| 82.11| 81.14| 82.55| 82.35| 80.6| 79.17| **75.17** |
+| **DevOps-Model-7B-Chat** | 66.67| 69.95| 83.94| 81.14| 80.19| 82.75| 82.84| 76.85| **75.25** |
+| **DevOps-Model-7B-Base** | 69.7| 69.49| 82.11| 81.14| 82.55| 82.35| 80.6| 79.17| **75.17** |
 | Qwen-7B-Chat| 65.15| 66.54| 82.57| 81.58| 81.6| 81.18| 80.6| 81.02| 73.62|
 | Baichuan2-7B-Base| 60.61| 67.22| 76.61| 75| 77.83| 78.43| 80.6| 79.63| 72.11|
 | Internlm-7B-Chat| 60.61| 63.06| 79.82| 80.26| 67.92| 75.69| 73.88| 77.31| 71.09|