UnitGen is a data framework for code fine-tuning. It generates fine-tuning data directly from your existing codebase: code completion, test generation, documentation generation, and more.
unit-mesh/unit-gen
Thanks to OpenBayes for providing computing resources.
Finetune Model Examples:
name | model download (HuggingFace) | finetune Notebook | model download (OpenBayes) |
---|---|---|---|
DeepSeek 6.7B | unit-mesh/autodev-coder | finetune.ipynb | AutoDev Coder |
Language support by Chapi
- supported:
  - Java
  - Kotlin
- in progress:
  - TypeScript/JavaScript
  - Rust
- future:
  - Go
  - Python
  - C/C++
  - C#
  - Scala
Features:
- Code context strategy: Related Code Completion, Similar Code Completion
- Instruction Builder type: inline, block, after block, documentation, test gen
- Code quality filter and pipeline. Code smell, test smell, estimation and more.
Layered Architecture
Workflow
- Unified prompt. Fine-tuning, evaluation, and tooling all share the same prompt.
- Code quality pipeline. Estimates code quality from code complexity, bad smells, test bad smells, and other rules.
- Extensible, customizable quality thresholds. Custom rules, custom thresholds, custom quality types, and more.
Keep the same prompt: AutoDev <-> UnitGen <-> UnitEval
AutoDev prompt template example:

    Write unit test for following code.
    ${context.coc}
    ${context.framework}
    ${context.related_model}
    ```${context.language}
    ${context.selection}
    ```
Unit Picker prompt should keep the same structure as the AutoDev prompt. Prompt example:

    Instruction(
        instruction = "Complete ${it.language} code, return rest code, no explaining",
        output = it.output,
        input = """
        |```${it.language}
        |${it.relatedCode}
        |```
        |
        |Code:
        |```${it.language}
        |${it.beforeCursor}
        |```""".trimMargin()
    )
UnitGen prompt should keep the same structure as the AutoDev prompt. Prompt example:

    Complete ${language} code, return rest code, no explaining

    ```${language}
    ${relatedCode}
    ```

    Code:
    ```${language}
    ${beforeCursor}
    ```
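To make the shared prompt structure concrete, here is a minimal Java sketch that assembles the completion prompt from its three variables. `CompletionPromptSketch` and `render` are hypothetical names, not part of the UnitGen API, and the exact placement of blank lines is an assumption based on the Kotlin example above.

```java
import java.util.Objects;

/** Hypothetical helper for illustration; not part of the UnitGen API. */
final class CompletionPromptSketch {
    // Three backticks, built at runtime so this listing stays easy to embed in markdown.
    private static final String FENCE = "`".repeat(3);

    /** Builds a completion prompt laid out like the template above. */
    static String render(String language, String relatedCode, String beforeCursor) {
        Objects.requireNonNull(language, "language");
        Objects.requireNonNull(relatedCode, "relatedCode");
        Objects.requireNonNull(beforeCursor, "beforeCursor");
        return String.join("\n",
                "Complete " + language + " code, return rest code, no explaining",
                "",                      // assumed blank line after the instruction
                FENCE + language,
                relatedCode,
                FENCE,
                "",
                "Code:",
                FENCE + language,
                beforeCursor,
                FENCE);
    }

    public static void main(String[] args) {
        System.out.println(render("java", "class Foo {}", "public int add(int a, int b) {"));
    }
}
```

Keeping the layout in one function like this is one way to ensure that the data generator and the tool consuming the model cannot drift apart.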
Optional quality types:

    enum class CodeQualityType {
        BadSmell,
        TestBadSmell,
        JavaController,
        JavaRepository,
        JavaService,
    }
Custom thresholds config:

    data class BsThresholds(
        val bsLongParasLength: Int = 5,
        val bsIfSwitchLength: Int = 8,
        val bsLargeLength: Int = 20,
        val bsMethodLength: Int = 30,
        val bsIfLinesLength: Int = 3,
    )
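As a rough illustration of how such thresholds could drive a quality filter, the Java sketch below flags a method whose parameter count or length exceeds a limit. Everything in it is hypothetical: `Thresholds` mirrors only two of the `BsThresholds` defaults shown above (5 and 30), `MethodInfo` is a stand-in for parsed code, and the strict "greater than" comparison is an assumption rather than UnitGen's documented behavior.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical sketch of a threshold-based check; not UnitGen's implementation. */
final class ThresholdCheckSketch {
    /** Mirrors two of the BsThresholds defaults; the other fields are omitted. */
    record Thresholds(int longParasLength, int methodLength) {
        static Thresholds defaults() {
            return new Thresholds(5, 30);
        }
    }

    /** Stand-in for facts extracted from a parsed method. */
    record MethodInfo(String name, int parameterCount, int lineCount) {}

    /** Returns one issue label per exceeded threshold; an empty list means the method passes. */
    static List<String> check(MethodInfo method, Thresholds thresholds) {
        List<String> issues = new ArrayList<>();
        if (method.parameterCount() > thresholds.longParasLength()) {
            issues.add("LongParameterList: " + method.name());
        }
        if (method.lineCount() > thresholds.methodLength()) {
            issues.add("LongMethod: " + method.name());
        }
        return issues;
    }

    public static void main(String[] args) {
        Thresholds strict = new Thresholds(3, 20); // custom, tighter limits
        System.out.println(check(new MethodInfo("save", 4, 25), strict));
        System.out.println(check(new MethodInfo("save", 4, 25), Thresholds.defaults()));
    }
}
```

Under these assumptions, the same method fails the tighter limits and passes the defaults, which is the kind of tuning the custom thresholds are meant to allow.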
Custom rules:

    val apis = apiAnalyser.toContainerServices()
    val ruleset = RuleSet(
        RuleType.SQL_SMELL,
        "normal",
        UnknownColumnSizeRule(),
        LimitTableNameLengthRule()
        // more rules
    )

    val issues = WebApiRuleVisitor(apis).visitor(listOf(ruleset))
    // if issues are not empty, then the code has bad smell
For examples, see the examples folder.
For command-line usage, see config-examples.
Download the latest version from GitHub Release, then:
- Configure the project with `processor.yml`.
- Run the picker: `java -jar unit-gen.jar`
To use UnitGen as a library, see config-example:
1. Add the dependencies:

       dependencies {
           implementation("cc.unitmesh:unit-picker:0.1.5")
           implementation("cc.unitmesh:code-quality:0.1.5")
       }
2. Configure the `unit-gen.yml` file and `connection.yml`.
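3. Write the code:

       import java.util.ArrayList;
       import java.util.List;
       // UnitGen imports (InstructionType, CodeQualityType, PickerOption, ...) are omitted here;
       // their package names depend on the library version.

       public class App {
           public static void main(String[] args) {
               // Which kinds of instructions to generate
               List<InstructionType> builderTypes = new ArrayList<>();
               builderTypes.add(InstructionType.RELATED_CODE_COMPLETION);

               // Which quality checks to apply before picking code
               List<CodeQualityType> codeQualityTypes = new ArrayList<>();
               codeQualityTypes.add(CodeQualityType.BadSmell);
               codeQualityTypes.add(CodeQualityType.JavaService);

               PickerOption pickerOption = new PickerOption(
                       "https://github.com/unit-mesh/unit-gen-testing", "master", "java", ".",
                       builderTypes, codeQualityTypes, new BuilderConfig()
               );

               SimpleCodePicker simpleCodePicker = new SimpleCodePicker(pickerOption);
               List<Instruction> output = simpleCodePicker.blockingExecute();

               // handle output in here
           }
       }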
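One plausible way to "handle output" is to write the instructions as JSON Lines, a common input format for fine-tuning scripts. The Java sketch below is an assumption-laden illustration: its `Instruction` record is a stand-in with only the three fields shown in the prompt example (`instruction`, `input`, `output`), not UnitGen's own class, and the JSON escaping is hand-written to keep the example dependency-free.

```java
import java.util.List;
import java.util.stream.Collectors;

/** Hypothetical sketch: serialize instructions as JSON Lines without external libraries. */
final class JsonlOutputSketch {
    /** Stand-in for UnitGen's Instruction; field names follow the prompt example. */
    record Instruction(String instruction, String input, String output) {}

    /** Escapes a string for use inside a JSON string literal. */
    static String escape(String s) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            switch (c) {
                case '"': sb.append("\\\""); break;
                case '\\': sb.append("\\\\"); break;
                case '\n': sb.append("\\n"); break;
                case '\r': sb.append("\\r"); break;
                case '\t': sb.append("\\t"); break;
                default:
                    if (c < 0x20) {
                        sb.append(String.format("\\u%04x", (int) c)); // other control characters
                    } else {
                        sb.append(c);
                    }
            }
        }
        return sb.toString();
    }

    /** Renders one instruction as a single-line JSON object. */
    static String toJsonLine(Instruction i) {
        return "{\"instruction\":\"" + escape(i.instruction())
                + "\",\"input\":\"" + escape(i.input())
                + "\",\"output\":\"" + escape(i.output()) + "\"}";
    }

    /** Joins all instructions, one JSON object per line. */
    static String toJsonl(List<Instruction> items) {
        return items.stream().map(JsonlOutputSketch::toJsonLine).collect(Collectors.joining("\n"));
    }

    public static void main(String[] args) {
        List<Instruction> items = List.of(
                new Instruction("Complete java code, return rest code, no explaining",
                        "int add(int a, int b) {", "return a + b;\n}"));
        System.out.println(toJsonl(items));
    }
}
```

In a real project a JSON library would be the safer choice; the manual escaping here only exists to keep the sketch self-contained.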
- abstract syntax tree: Chapi. Used features: multiple languages mapped to the same data structure.
- legacy system analysis: Coca. Inspired: Bad Smell, Test Bad Smell
- architecture governance tool: ArchGuard. Used features: Estimation, Rule Lint (API, SQL)
- code database: CodeDB. Used features: Code analysis pipeline
This code is distributed under the MPL 2.0 license. See LICENSE in this directory.