UnitGen is a code fine-tuning data framework that generates data from your existing codebase.

unit-mesh/unit-gen

UnitGen

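UnitGen is a data framework for generating code fine-tuning data. It builds the data directly from your codebase: code completion, test generation, documentation generation, and more.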

Docs: https://gen.unitmesh.cc/

Thanks to OpenBayes for providing computing resources.

Finetune Model Examples:

name          | model download (HuggingFace) | finetune Notebook | model download (OpenBayes)
DeepSeek 6.7B | unit-mesh/autodev-coder      | finetune.ipynb    | AutoDev Coder

Language support by Chapi

  • supported:
    • Java
    • Kotlin
  • in progress:
    • TypeScript/JavaScript
    • Rust
  • future:
    • Go
    • Python
    • C/C++
    • C#
    • Scala

Features:

Architecture

Layered Architecture

[Figure: layered architecture diagram]

Workflow

[Figure: UnitGen workflow diagram]

Design Philosophy

  • Unique prompt. The same prompt is used across fine-tuning, evaluation, and tooling.
  • Code quality pipeline. Code is assessed for code complexity, bad smells, test bad smells, and other rules.
  • Extendable, customizable quality thresholds. Custom rules, custom thresholds, custom quality types, and more.

Unique Prompt

Keep the same prompt: AutoDev <-> UnitGen <-> UnitEval

AutoDev prompt

AutoDev prompt template example:

    Write unit test for following code.
    ${context.coc}
    ${context.framework}
    ${context.related_model}
    ```${context.language}
    ${context.selection}
    ```

Unit Picker prompt

Unit Picker prompt should keep the same structure as the AutoDev prompt. Prompt example:

    Instruction(
        instruction = "Complete ${it.language} code, return rest code, no explaining",
        output = it.output,
        input = """
        |```${it.language}
        |${it.relatedCode}
        |```
        |
        |Code:
        |```${it.language}
        |${it.beforeCursor}
        |```""".trimMargin()
    )

UnitGen prompt

UnitGen prompt should keep the same structure as the AutoDev prompt. Prompt example:

    Complete ${language} code, return rest code, no explaining

    ```${language}
    ${relatedCode}
    ```

    Code:
    ```${language}
    ${beforeCursor}
    ```
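To make the shared structure concrete, here is a minimal Java sketch that fills the completion prompt from its three fields. It does not use any UnitGen API: the `PromptSketch` class and its `render` helper are hypothetical, and the exact line breaks in the template are an assumption based on the Unit Picker example.

```java
import java.util.Map;

/** Sketch only: fills a UnitGen-style completion prompt from three fields. */
final class PromptSketch {
    // Template text follows the prompt example; the line breaks are an assumption.
    static final String TEMPLATE = String.join("\n",
            "Complete ${language} code, return rest code, no explaining",
            "",
            "```${language}",
            "${relatedCode}",
            "```",
            "",
            "Code:",
            "```${language}",
            "${beforeCursor}",
            "```");

    /**
     * Hypothetical helper: replaces every ${key} with its value.
     * Naive by design: values that themselves contain ${...} are not protected.
     */
    static String render(Map<String, String> values) {
        String result = TEMPLATE;
        for (Map.Entry<String, String> entry : values.entrySet()) {
            result = result.replace("${" + entry.getKey() + "}", entry.getValue());
        }
        return result;
    }

    public static void main(String[] args) {
        String prompt = render(Map.of(
                "language", "java",
                "relatedCode", "class User { String name; }",
                "beforeCursor", "public String greet(User user) {"));
        System.out.println(prompt);
    }
}
```

Keeping the template in one constant is what lets tooling, data generation, and evaluation stay in sync: a change to the wording happens in a single place.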

Code quality pipeline

[Figure: code quality workflow diagram]

Extendable, customizable quality thresholds

Optional quality type:

    enum class CodeQualityType {
        BadSmell,
        TestBadSmell,
        JavaController,
        JavaRepository,
        JavaService,
    }

Custom thresholds' config:

    data class BsThresholds(
        val bsLongParasLength: Int = 5,
        val bsIfSwitchLength: Int = 8,
        val bsLargeLength: Int = 20,
        val bsMethodLength: Int = 30,
        val bsIfLinesLength: Int = 3,
    )
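As an illustration of how such thresholds might act as a filter, the Java sketch below checks one method against two of the default values. It is not UnitGen's implementation: `Thresholds`, `MethodInfo`, and `check` are hypothetical names, and reading `bsLongParasLength` as a parameter-count limit and `bsMethodLength` as a line-count limit, triggered on "greater than", is an assumption.

```java
import java.util.ArrayList;
import java.util.List;

final class ThresholdSketch {
    /** Mirrors two of the BsThresholds defaults (5 and 30); a stand-in, not the real class. */
    record Thresholds(int longParasLength, int methodLength) {
        static Thresholds defaults() {
            return new Thresholds(5, 30);
        }
    }

    /** Hypothetical summary of one method; not a UnitGen type. */
    record MethodInfo(String name, int parameterCount, int lineCount) {}

    /** Returns the smells found; using "greater than" as the trigger is an assumption. */
    static List<String> check(MethodInfo method, Thresholds thresholds) {
        List<String> smells = new ArrayList<>();
        if (method.parameterCount() > thresholds.longParasLength()) {
            smells.add("LongParameterList");
        }
        if (method.lineCount() > thresholds.methodLength()) {
            smells.add("LongMethod");
        }
        return smells;
    }

    public static void main(String[] args) {
        MethodInfo method = new MethodInfo("createOrder", 7, 12);
        System.out.println(check(method, Thresholds.defaults()));  // prints [LongParameterList]
        // Raising the parameter threshold to 8 lets the same method pass.
        System.out.println(check(method, new Thresholds(8, 30)));  // prints []
    }
}
```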

Custom rules:

    val apis = apiAnalyser.toContainerServices()
    val ruleset = RuleSet(
        RuleType.SQL_SMELL,
        "normal",
        UnknownColumnSizeRule(),
        LimitTableNameLengthRule()
        // more rules
    )

    val issues = WebApiRuleVisitor(apis).visitor(listOf(ruleset))
    // if issues are not empty, then the code has bad smell
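The general shape of this pattern (a set of rules, a visitor that runs them, and a list of issues that decides whether the code is kept) can be sketched in plain Java. Nothing here comes from UnitGen or ArchGuard: `TableNameRule`, `maxLength`, `lowerSnakeCase`, and `visit` are hypothetical names chosen for the illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Optional;

final class RuleSketch {
    /** Hypothetical rule contract: an empty result means the table name passes. */
    interface TableNameRule {
        Optional<String> check(String tableName);
    }

    /** Flags table names longer than maxLength characters. */
    static TableNameRule maxLength(int maxLength) {
        return name -> name.length() > maxLength
                ? Optional.of(name + ": name longer than " + maxLength + " characters")
                : Optional.empty();
    }

    /** Flags table names that are not lower_snake_case. */
    static TableNameRule lowerSnakeCase() {
        return name -> name.matches("[a-z][a-z0-9_]*")
                ? Optional.empty()
                : Optional.of(name + ": name is not lower_snake_case");
    }

    /** Runs every rule against every table name and collects the issues. */
    static List<String> visit(List<String> tableNames, List<TableNameRule> ruleset) {
        List<String> issues = new ArrayList<>();
        for (String name : tableNames) {
            for (TableNameRule rule : ruleset) {
                rule.check(name).ifPresent(issues::add);
            }
        }
        return issues;
    }

    public static void main(String[] args) {
        List<TableNameRule> ruleset = List.of(maxLength(20), lowerSnakeCase());
        List<String> issues = visit(List.of("orders", "CustomerOrderHistoryArchive"), ruleset);
        // A non-empty list means the input would be treated as having a bad smell.
        issues.forEach(System.out::println);
    }
}
```

Adding a rule means adding one more entry to the list, which is the extensibility the section describes.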

Quick Start

For examples, see the examples folder.

Use CLI

See config-examples.

Download the latest version from GitHub Release.

Generate Instructions

  1. Configure the project with processor.yml
  2. Run the picker: java -jar unit-gen.jar

Use Java API

See config-example.

1. Add the dependencies

    dependencies {
        implementation("cc.unitmesh:unit-picker:0.1.5")
        implementation("cc.unitmesh:code-quality:0.1.5")
    }

2. Configure the unit-gen.yml file and connection.yml

3. Write code

    import java.util.ArrayList;
    import java.util.List;

    // Imports for the UnitGen classes (InstructionType, CodeQualityType, PickerOption,
    // BuilderConfig, SimpleCodePicker, Instruction) are omitted here.

    public class App {
        public static void main(String[] args) {
            // Which instruction builders to run
            List<InstructionType> builderTypes = new ArrayList<>();
            builderTypes.add(InstructionType.RELATED_CODE_COMPLETION);

            // Which quality checks to apply
            List<CodeQualityType> codeQualityTypes = new ArrayList<>();
            codeQualityTypes.add(CodeQualityType.BadSmell);
            codeQualityTypes.add(CodeQualityType.JavaService);

            PickerOption pickerOption = new PickerOption(
                    "https://github.com/unit-mesh/unit-gen-testing", "master", "java",
                    ".", builderTypes, codeQualityTypes, new BuilderConfig()
            );

            SimpleCodePicker simpleCodePicker = new SimpleCodePicker(pickerOption);
            List<Instruction> output = simpleCodePicker.blockingExecute();

            // handle output in here
        }
    }
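One plausible way to handle the output is to write each instruction as a line of JSONL, a common format for fine-tuning datasets. The sketch below is an assumption-laden illustration: the `Instruction` record is a stand-in with the three fields shown in the prompt example (`instruction`, `input`, `output`), not the UnitGen class, and the file name `instructions.jsonl` is arbitrary.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;
import java.util.stream.Collectors;

final class JsonlSketch {
    /** Stand-in for UnitGen's Instruction; only the three field names come from the README. */
    record Instruction(String instruction, String input, String output) {}

    /** Escapes a string for use inside a JSON string literal. */
    static String escape(String text) {
        StringBuilder sb = new StringBuilder();
        for (char c : text.toCharArray()) {
            switch (c) {
                case '"' -> sb.append("\\\"");
                case '\\' -> sb.append("\\\\");
                case '\n' -> sb.append("\\n");
                case '\r' -> sb.append("\\r");
                case '\t' -> sb.append("\\t");
                default -> {
                    if (c < 0x20) {
                        sb.append(String.format("\\u%04x", (int) c));
                    } else {
                        sb.append(c);
                    }
                }
            }
        }
        return sb.toString();
    }

    /** Serializes one instruction as a single-line JSON object. */
    static String toJsonLine(Instruction ins) {
        return "{\"instruction\":\"" + escape(ins.instruction())
                + "\",\"input\":\"" + escape(ins.input())
                + "\",\"output\":\"" + escape(ins.output()) + "\"}";
    }

    public static void main(String[] args) throws IOException {
        List<Instruction> output = List.of(new Instruction(
                "Complete java code, return rest code, no explaining",
                "Code:\nint x =",
                " 1;"));
        List<String> lines = output.stream()
                .map(JsonlSketch::toJsonLine)
                .collect(Collectors.toList());
        // One JSON object per line, written as UTF-8.
        Files.write(Path.of("instructions.jsonl"), lines);
    }
}
```

In a real project a JSON library would normally replace the hand-written `escape` helper; it is spelled out here only to keep the sketch free of dependencies.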

Thanks to

  • abstract syntax tree: Chapi. Used features: parsing multiple languages into the same data structure.
  • legacy system analysis: Coca. Inspired: Bad Smell, Test Bad Smell.
  • architecture governance tool: ArchGuard. Used features: Estimation, Rule Lint (API, SQL).
  • code database: CodeDB. Used features: Code analysis pipeline.

LICENSE

This code is distributed under the MPL 2.0 license. See LICENSE in this directory.
