UnitGen is a data framework for code fine-tuning. It generates fine-tuning data directly from your existing codebase: code completion, test generation, documentation generation, and more.
unit-mesh/unit-gen
Thanks to OpenBayes for providing computing resources.
Fine-tuned model examples:
Name | Model download (HuggingFace) | Fine-tune notebook | Model download (OpenBayes) |
---|---|---|---|
DeepSeek 6.7B | unit-mesh/autodev-coder | finetune.ipynb | AutoDev Coder |
Language support by Chapi:
- supported:
  - Java
  - Kotlin
- in progress:
  - TypeScript/JavaScript
  - Rust
- future:
  - Go
  - Python
  - C/C++
  - C#
  - Scala
Features:
- Code context strategy: Related Code Completion, Similar Code Completion
- Instruction Builder type: inline, block, after block, documentation, test gen
- Code quality filter and pipeline. Code smell, test smell, estimation and more.
Layered Architecture
Workflow
- Unified prompt. Fine-tuning, evaluation, and tooling share one prompt structure.
- Code quality pipeline. Estimates code complexity and checks for bad smells, test bad smells, and other rules.
- Extensible, customizable quality thresholds. Custom rules, custom thresholds, custom quality types, and more.
Keep the same prompt: AutoDev <-> UnitGen <-> UnitEval
AutoDev prompt template example:

````
Write unit test for following code.
${context.coc}
${context.framework}
${context.related_model}
```${context.language}
${context.selection}
```
````
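The template above relies on `${...}` placeholders being replaced with context values. Here is a minimal sketch of that substitution step, assuming a plain string-to-string map. `PromptTemplate`, `render`, and the sample values are hypothetical names chosen for illustration, not part of AutoDev or UnitGen.

```java
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper for illustration; not part of AutoDev or UnitGen.
final class PromptTemplate {
    // Matches placeholders such as ${context.language}.
    private static final Pattern PLACEHOLDER =
        Pattern.compile("\\$\\{([A-Za-z0-9_.]+)\\}");

    // Replaces each ${key} with its value; keys missing from the map render as "".
    static String render(String template, Map<String, String> values) {
        Matcher matcher = PLACEHOLDER.matcher(template);
        StringBuilder out = new StringBuilder();
        while (matcher.find()) {
            String value = values.getOrDefault(matcher.group(1), "");
            // quoteReplacement keeps '$' and '\' in the value literal.
            matcher.appendReplacement(out, Matcher.quoteReplacement(value));
        }
        matcher.appendTail(out);
        return out.toString();
    }

    public static void main(String[] args) {
        String fence = "`".repeat(3); // Markdown code fence
        String template = String.join("\n",
            "Write unit test for following code.",
            "${context.framework}",
            fence + "${context.language}",
            "${context.selection}",
            fence);
        // Sample values are made up for this example.
        System.out.println(render(template, Map.of(
            "context.framework", "Use JUnit 5.",
            "context.language", "java",
            "context.selection", "int add(int a, int b) { return a + b; }")));
    }
}
```

Rendering missing keys as empty strings is a design choice of this sketch. It lets optional sections such as `${context.coc}` drop out without leaving placeholder text in the prompt.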
The Unit Picker prompt should keep the same structure as the AutoDev prompt. Prompt example:

```kotlin
Instruction(
    instruction = "Complete ${it.language} code, return rest code, no explaining",
    output = it.output,
    input = """
        |```${it.language}
        |${it.relatedCode}
        |```
        |
        |Code:
        |```${it.language}
        |${it.beforeCursor}
        |```""".trimMargin()
)
```
The UnitGen prompt should keep the same structure as the AutoDev prompt. Prompt example:

````
Complete ${language} code, return rest code, no explaining

```${language}
${relatedCode}
```

Code:
```${language}
${beforeCursor}
```
````
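For readers using the Java API, the same prompt structure can be assembled without Kotlin string templates. The sketch below is only an illustration: `InstructionSketch` is a hypothetical stand-in for UnitGen's `Instruction` type, with field names taken from the Kotlin example, and `completion` is an invented helper.

```java
// Hypothetical stand-in for UnitGen's Instruction type; field names follow the Kotlin example.
final class InstructionSketch {
    private static final String FENCE = "`".repeat(3); // Markdown code fence

    final String instruction;
    final String input;
    final String output;

    InstructionSketch(String instruction, String input, String output) {
        this.instruction = instruction;
        this.input = input;
        this.output = output;
    }

    // Builds a completion instruction with the layout shown above:
    // related code first, then the code before the cursor.
    static InstructionSketch completion(String language, String relatedCode,
                                        String beforeCursor, String output) {
        String instruction =
            "Complete " + language + " code, return rest code, no explaining";
        String input = String.join("\n",
            FENCE + language,
            relatedCode,
            FENCE,
            "",
            "Code:",
            FENCE + language,
            beforeCursor,
            FENCE);
        return new InstructionSketch(instruction, input, output);
    }

    public static void main(String[] args) {
        // Sample code snippets are made up for this example.
        InstructionSketch sample =
            completion("java", "class Repo {}", "class Service {", "}");
        System.out.println(sample.instruction);
        System.out.println(sample.input);
    }
}
```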
Optional quality types:

```kotlin
enum class CodeQualityType {
    BadSmell,
    TestBadSmell,
    JavaController,
    JavaRepository,
    JavaService,
}
```
Custom thresholds config:

```kotlin
data class BsThresholds(
    val bsLongParasLength: Int = 5,
    val bsIfSwitchLength: Int = 8,
    val bsLargeLength: Int = 20,
    val bsMethodLength: Int = 30,
    val bsIfLinesLength: Int = 3,
)
```
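To make the role of these thresholds concrete, here is a rough sketch of how two of them could gate a method. It is not UnitGen's implementation. The class `ThresholdSketch`, the smell labels, and the "strictly greater than" comparison are all assumptions, and the meaning of each threshold is inferred from its name. Only the default values 5 and 30 come from `BsThresholds`.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch; UnitGen's own checks may differ.
final class ThresholdSketch {
    final int longParasLength; // default mirrors bsLongParasLength = 5
    final int methodLength;    // default mirrors bsMethodLength = 30

    ThresholdSketch(int longParasLength, int methodLength) {
        this.longParasLength = longParasLength;
        this.methodLength = methodLength;
    }

    static ThresholdSketch defaults() {
        return new ThresholdSketch(5, 30);
    }

    // Assumption: a value strictly above its threshold counts as a smell.
    List<String> smells(int parameterCount, int methodLines) {
        List<String> found = new ArrayList<>();
        if (parameterCount > longParasLength) {
            found.add("LongParameterList");
        }
        if (methodLines > methodLength) {
            found.add("LongMethod");
        }
        return found;
    }

    public static void main(String[] args) {
        // With the defaults, six parameters exceed the limit of five.
        System.out.println(defaults().smells(6, 12));
        // Looser custom thresholds accept the same method.
        System.out.println(new ThresholdSketch(8, 50).smells(6, 12));
    }
}
```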
Custom rules:

```kotlin
val apis = apiAnalyser.toContainerServices()
val ruleset = RuleSet(
    RuleType.SQL_SMELL,
    "normal",
    UnknownColumnSizeRule(),
    LimitTableNameLengthRule()
    // more rules
)

val issues = WebApiRuleVisitor(apis).visitor(listOf(ruleset))
// if issues are not empty, then the code has bad smell
```
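The pattern in the snippet above is: group rules into a set, run them over the analysed code, and treat a non-empty issue list as a bad smell. The following self-contained sketch shows that pattern under simplified assumptions. `RuleSketch`, its `Rule` interface, and the 16-character limit are hypothetical and are not the ArchGuard or UnitGen API.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch of a rule set; these types are not the ArchGuard or UnitGen API.
final class RuleSketch {
    // A rule returns an issue message, or null when the input passes.
    interface Rule {
        String check(String tableName);
    }

    // Illustrative rule: flag table names longer than maxLength characters.
    static Rule limitTableNameLength(int maxLength) {
        return name -> name.length() > maxLength
            ? "table name too long: " + name
            : null;
    }

    // Runs every rule against every table name and collects the issues.
    static List<String> visit(List<String> tableNames, List<Rule> rules) {
        List<String> issues = new ArrayList<>();
        for (String name : tableNames) {
            for (Rule rule : rules) {
                String issue = rule.check(name);
                if (issue != null) {
                    issues.add(issue);
                }
            }
        }
        return issues;
    }

    public static void main(String[] args) {
        List<Rule> ruleset = List.of(limitTableNameLength(16));
        List<String> issues = visit(
            List.of("users", "customer_order_history_archive"), ruleset);
        // A non-empty issue list marks the code as having a bad smell.
        System.out.println(issues.isEmpty() ? "clean" : "bad smell: " + issues);
    }
}
```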
For examples, see the examples folder.
See config-examples.
Download the latest version from GitHub Release.
- Configure the project in `processor.yml`.
- Run the picker: `java -jar unit-gen.jar`
See config-example.

1. Add the dependencies:

```kotlin
dependencies {
    implementation("cc.unitmesh:unit-picker:0.1.5")
    implementation("cc.unitmesh:code-quality:0.1.5")
}
```

2. Configure the `unit-gen.yml` file and `connection.yml`.

3. Write the code:
```java
import java.util.ArrayList;
import java.util.List;
// Imports for the UnitGen classes (PickerOption, SimpleCodePicker, ...) are omitted here.

public class App {
    public static void main(String[] args) {
        // Instruction builders to run
        List<InstructionType> builderTypes = new ArrayList<>();
        builderTypes.add(InstructionType.RELATED_CODE_COMPLETION);

        // Code quality filters to apply
        List<CodeQualityType> codeQualityTypes = new ArrayList<>();
        codeQualityTypes.add(CodeQualityType.BadSmell);
        codeQualityTypes.add(CodeQualityType.JavaService);

        PickerOption pickerOption = new PickerOption(
            "https://github.com/unit-mesh/unit-gen-testing", "master", "java",
            ".", builderTypes, codeQualityTypes, new BuilderConfig()
        );

        SimpleCodePicker simpleCodePicker = new SimpleCodePicker(pickerOption);
        List<Instruction> output = simpleCodePicker.blockingExecute();

        // handle output in here
    }
}
```
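One way to handle the output is to write each instruction as a line of JSON, a layout many fine-tuning tools accept. The sketch below assumes that target format and works on plain strings, because the accessors of UnitGen's `Instruction` class are not shown here. `JsonLines`, `escape`, and `toJsonLine` are hypothetical names, and the JSON keys mirror the `instruction`, `input`, and `output` fields from the prompt example.

```java
// Hypothetical sketch: serialises one instruction as a single JSON line.
final class JsonLines {
    // Escapes a string for use inside a JSON string literal.
    static String escape(String text) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < text.length(); i++) {
            char c = text.charAt(i);
            switch (c) {
                case '"':  sb.append("\\\""); break;
                case '\\': sb.append("\\\\"); break;
                case '\n': sb.append("\\n"); break;
                case '\r': sb.append("\\r"); break;
                case '\t': sb.append("\\t"); break;
                default:
                    if (c < 0x20) {
                        // Remaining control characters use the \\uXXXX form.
                        sb.append(String.format("\\u%04x", (int) c));
                    } else {
                        sb.append(c);
                    }
            }
        }
        return sb.toString();
    }

    // Assumption: the three keys match the fields of the Instruction example.
    static String toJsonLine(String instruction, String input, String output) {
        return "{\"instruction\":\"" + escape(instruction)
            + "\",\"input\":\"" + escape(input)
            + "\",\"output\":\"" + escape(output) + "\"}";
    }

    public static void main(String[] args) {
        // Sample values are made up for this example.
        System.out.println(toJsonLine(
            "Complete java code, return rest code, no explaining",
            "class Service {\n",
            "}"));
    }
}
```

In a real project a JSON library would normally replace the hand-written `escape` method. It is spelled out here only to keep the example free of dependencies.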
- Abstract syntax tree: Chapi. Used features: multiple languages mapped to the same data structure.
- Legacy system analysis: Coca. Inspired: bad smell, test bad smell.
- Architecture governance tool: ArchGuard. Used features: estimation, rule lint (API, SQL).
- Code database: CodeDB. Used features: code analysis pipeline.
This code is distributed under the MPL 2.0 license. See LICENSE in this directory.