XMAiNframe: Language Model for Mainframe Modernization
- Introduction
- Demonstration
- Procedure of Data Construction
- Model Download
- Evaluation Results
- Usage
- License
- Acknowledgements
- Contact Us
- Citation Information
We are introducing XMAiNframe, a state-of-the-art large language model (LLM) specifically designed with knowledge of mainframe legacy systems and COBOL codebases. XMAiNframe is built on top of DeepSeek-Coder 7B and is available with 7B and 10.5B parameters. Additionally, we present MainframeBench, a comprehensive benchmark for assessing mainframe knowledge, including multiple-choice questions, question answering, and COBOL code summarization. Our empirical evaluations demonstrate that XMAiNframe consistently outperforms existing state-of-the-art LLMs across these tasks. Specifically, XMAiNframe achieves 30% higher accuracy than DeepSeek-Coder on multiple-choice questions, doubles the BLEU score of Mixtral-Instruct 8x7B on question answering, and scores six times higher than GPT-3.5 on COBOL summarization. Our work highlights the potential of XMAiNframe to drive significant advancements in managing and modernizing legacy systems, thereby enhancing productivity and saving time for software developers.
In this section, we demonstrate the capabilities of XMAiNframe by comparing it with its base model, DeepSeek-Coder-7B. We evaluate each model by showcasing its responses to a series of realistic questions related to mainframe knowledge, using identical prompts for both models. The responses generated by XMAiNframe are not only accurate but also more detailed and comprehensive than those from DeepSeek-Coder-7B, which makes XMAiNframe particularly valuable for developers seeking a reliable and thorough AI assistant in the mainframe environment.
We utilized two different sources: COBOL projects hosted on GitHub, collected via the GitHub API, and online document data relevant to mainframes. In total, the Mainframe-Training Dataset consists of 236 million tokens from documents about mainframe technology and COBOL constructs. In the pre-training process, we combined our Mainframe-Training Dataset with SlimOrca-Dedup to enrich the model's mainframe knowledge while retaining its general capabilities.
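As a rough illustration of the GitHub collection step, the sketch below lists public COBOL repositories through GitHub's search API; the query, filters, and any post-processing are illustrative assumptions, not the actual crawler used to build the Mainframe-Training Dataset.

```python
import requests

# Illustrative sketch only: list public COBOL repositories via the GitHub search API.
# The real pipeline's licensing checks, cloning, and deduplication are not shown.
response = requests.get(
    "https://api.github.com/search/repositories",
    params={"q": "language:cobol", "sort": "stars", "per_page": 10},
    headers={"Accept": "application/vnd.github+json"},
    timeout=30,
)
response.raise_for_status()
for repo in response.json()["items"]:
    print(repo["full_name"], repo["stargazers_count"])
```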
Mainframe-Instruct is a high-quality synthetic dataset created through 5 steps:
Step 1: 300 seed data instances about Mainframe and COBOL are gathered and annotated by our domain experts.
Step 2: Popular LLMs are used to enrich Mainframe-Instruct from the seed data.
Step 3: GPT-4 is employed as an evaluator to judge model responses, scoring the outputs and ranking responses in a pairwise manner (a sketch of this step follows Step 5 below).
Step 4: The generated data are filtered and manually checked.
Step 5: Dividing Mainframe-Instruct into three tasks: Multiple Choice Questions, Question Answering, and COBOL summarization.
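To make Step 3 concrete, the sketch below shows one way GPT-4 could be prompted to compare two candidate responses pairwise; the prompt wording, model name, and helper function are assumptions for illustration, not the exact setup used to build Mainframe-Instruct.

```python
from openai import OpenAI

# Hypothetical sketch of pairwise judging with GPT-4 (Step 3).
# The prompt, model name, and output format are illustrative assumptions only.
client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment


def judge_pair(question: str, response_a: str, response_b: str) -> str:
    """Ask GPT-4 which of two candidate responses answers the question better."""
    prompt = (
        "You are grading answers about mainframes and COBOL.\n"
        f"Question: {question}\n"
        f"Response A: {response_a}\n"
        f"Response B: {response_b}\n"
        "Reply with exactly 'A' or 'B' for the better response."
    )
    completion = client.chat.completions.create(
        model="gpt-4",
        messages=[{"role": "user", "content": prompt}],
        temperature=0,
    )
    return completion.choices[0].message.content.strip()
```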
Below are the statistics of Mainframe-Instruct Dataset:
| Task | Training Samples | Validation Samples | Testing Samples |
|---|---|---|---|
| Multiple Choice Questions | 13,894 | 1,544 | 1,931 |
| Question Answering | 18,692 | 2,078 | 2,598 |
| COBOL Summarization | 9,081 | 1,010 | 2,523 |
MainframeBench, our benchmark for mainframe knowledge, is the testing set of the Mainframe-Instruct Dataset. It is used to compare our models against other LLMs and is now available as a Hugging Face dataset.
```python
from datasets import load_dataset

# Load each sub-set in MainframeBench
QA_set = load_dataset("Fsoft-AIC/MainframeBench", 'question_answering')
MC_set = load_dataset("Fsoft-AIC/MainframeBench", 'multiple_choice_question')
Summarization_set = load_dataset("Fsoft-AIC/MainframeBench", 'COBOL_code_summarization')
```
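As a quick sanity check, the snippet below prints the splits and the first record of one subset; it relies only on the generic `datasets` API and does not assume any particular split or column names.

```python
from datasets import load_dataset

# Inspect the question-answering subset without assuming split or column names.
qa = load_dataset("Fsoft-AIC/MainframeBench", "question_answering")
print(qa)                          # available splits and their columns
first_split = list(qa.keys())[0]
print(qa[first_split][0])          # first example of that split
```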
We publicly release XMAiNframe with 7B and 10.5B parameters, in both base and instruct variants. XMAiNframe 10.5B is expanded from DeepSeek-Coder 7B using depth up-scaling, without introducing additional modules or dynamic expert-selection mechanisms.
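As a toy illustration of depth up-scaling, the sketch below duplicates an overlapping slice of a small LLaMA-style decoder stack to obtain a deeper model; the layer counts, split points, and config values are illustrative assumptions, not the exact recipe used to build XMAiNframe 10.5B.

```python
import copy

import torch.nn as nn
from transformers import LlamaConfig, LlamaForCausalLM

# Toy depth up-scaling sketch (illustrative assumptions, not the XMAiNframe recipe):
# an overlapping slice of the decoder layers is duplicated to deepen the stack,
# after which the grown model would be continually pre-trained.
config = LlamaConfig(hidden_size=256, intermediate_size=512,
                     num_hidden_layers=8, num_attention_heads=4, vocab_size=1000)
model = LlamaForCausalLM(config)

layers = model.model.layers
front = list(layers[:6])                                # keep the lower part of the stack
back = [copy.deepcopy(layer) for layer in layers[2:]]   # re-use an overlapping upper slice
model.model.layers = nn.ModuleList(front + back)
model.config.num_hidden_layers = len(model.model.layers)
print(model.config.num_hidden_layers)                   # 8 layers grown to 12
```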
| Model | Download |
|---|---|
| XMAiNframe-base-7b | [🤗 HuggingFace](https://huggingface.co/Fsoft-AIC/XMAiNframe-base-7b) |
| XMAiNframe-instruct-7b | [🤗 HuggingFace](https://huggingface.co/Fsoft-AIC/XMAiNframe-instruct-7b) |
| XMAiNframe-base-10.5b | [🤗 HuggingFace](https://huggingface.co/Fsoft-AIC/XMAiNframe-base-10.5b) |
| XMAiNframe-instruct-10.5b | [🤗 HuggingFace](https://huggingface.co/Fsoft-AIC/XMAiNframe-instruct-10.5b) |
Results on the multiple-choice question task of MainframeBench:

| Model | Accuracy (%) |
|---|---|
| GPT-4 | 73.90 |
| GPT-3.5 | 74.56 |
| Mixtral-Instruct 8x7B | 68.12 |
| Mistral-Instruct 7B | 69.29 |
| Neural-Chat | 66.35 |
| DeepSeek-Coder-Instruct 6.7B | 47.49 |
| DeepSeek-Coder-Instruct 33B | 53.29 |
| XMAiNframe-Instruct 7B | 68.57 |
| XMAiNframe-Instruct 10.5B | 77.89 |
Results on the question-answering task of MainframeBench:

| Models | MAP | F1-Score | BERTScore | RougeL | Meteor | BLEU-4 |
|---|---|---|---|---|---|---|
| GPT 4 | 0.12 | 0.19 | 0.88 | 0.18 | 0.34 | 5.71 |
| GPT 3.5 | 0.14 | 0.22 | 0.89 | 0.21 | 0.38 | 7.36 |
| Mixtral-Instruct 8x7B | 0.27 | 0.31 | 0.90 | 0.29 | 0.38 | 11.39 |
| Mistral-Instruct 7B | 0.12 | 0.19 | 0.87 | 0.18 | 0.34 | 5.74 |
| Neural-Chat | 0.13 | 0.21 | 0.88 | 0.20 | 0.36 | 6.45 |
| DeepSeek-Coder-Instruct 6.7B | 0.09 | 0.15 | 0.86 | 0.14 | 0.30 | 4.09 |
| DeepSeek-Coder-Instruct 33B | 0.09 | 0.15 | 0.86 | 0.15 | 0.31 | 4.41 |
| XMAiNframe-Instruct 7B | 0.45 | 0.42 | 0.92 | 0.40 | 0.42 | 20.43 |
| XMAiNframe-Instruct 10.5B | 0.43 | 0.42 | 0.92 | 0.40 | 0.42 | 20.93 |
Results on the COBOL code summarization task, along with more evaluation details and settings, can be found in our paper.
To run the code in this project, first create a Python virtual environment, e.g. with Conda:

```bash
conda create -n xmainframe python=3.10 && conda activate xmainframe
```

You can then clone the repository and install the remaining package dependencies as follows:

```bash
git clone https://github.com/FSoft-AI4Code/XMainframe.git
cd XMainframe
pip install -r requirements.txt
```

You can now check out the `scripts` and `recipes` directories for instructions on how to fine-tune our model 🪁!
Here is a code snippet using `apply_chat_template` that shows how to load the tokenizer and model and how to generate content.
```python
from transformers import AutoTokenizer, AutoModelForCausalLM

tokenizer = AutoTokenizer.from_pretrained("Fsoft-AIC/XMAiNframe-instruct-7b")
model = AutoModelForCausalLM.from_pretrained("Fsoft-AIC/XMAiNframe-instruct-7b")
messages = [
    {'from': 'system', 'value': "You are a helpful assistant"},
    {'from': 'human', 'value': 'What is the future of Mainframe?'},
]
inputs = tokenizer.apply_chat_template(messages, add_generation_prompt=True, return_tensors="pt").to(model.device)
outputs = model.generate(
    inputs,
    max_new_tokens=512,
    do_sample=False,
    top_k=50,
    top_p=0.95,
    num_return_sequences=1,
    eos_token_id=tokenizer.eos_token_id,
)
print(tokenizer.decode(outputs[0][len(inputs[0]):], skip_special_tokens=True))
```
This code repository is licensed under the MIT License.
This codebase is adapted from:
If you have any questions, comments or suggestions, please do not hesitate to contact us.
- Website: fpt-aicenter
- Email: support.ailab@fpt.com
More details can be found in our technical report.
If you're using XMAiNframe, please cite using this BibTeX:
```bibtex
@article{dau2024xmainframe,
  title={XMainframe: A Large Language Model for Mainframe Modernization},
  author={Dau, Anh TV and Dao, Hieu Trung and Nguyen, Anh Tuan and Tran, Hieu Trung and Nguyen, Phong X and Bui, Nghi DQ},
  journal={arXiv preprint arXiv:2408.04660},
  year={2024}
}
```