Commit 019ba1d

convert : fix Baichuan2 models by using vocab size in config.json (ggml-org#3299)

Use local GGUF package when possible in Baichuan converter

1 parent beabc8c commit 019ba1d

Lines changed: 8 additions & 2 deletions

Original file line number	Diff line number	Diff line change
`@@ -11,11 +11,14 @@`
`11`	`11`	`from pathlib import Path`
`12`	`12`	`from typing import TYPE_CHECKING, Any`
`13`	`13`	`import itertools`
`14`		`-import gguf`
`15`	`14`	`import numpy as np`
`16`	`15`	`import torch`
`17`	`16`	`from sentencepiece import SentencePieceProcessor  # type: ignore[import]`
`18`	`17`
	`18`	`+if 'NO_LOCAL_GGUF' not in os.environ:`
	`19`	`+    sys.path.insert(1, str(Path(__file__).parent / 'gguf-py' / 'gguf'))`
	`20`	`+import gguf`
	`21`	`+`
`19`	`22`
`20`	`23`	`if TYPE_CHECKING:`
`21`	`24`	`    from typing import TypeAlias`
`@@ -174,8 +177,11 @@ def parse_args() -> argparse.Namespace:`
`174`	`177`	`print("gguf: get sentencepiece tokenizer vocab, scores and token types")`
`175`	`178`
`176`	`179`	`tokenizer = SentencePieceProcessor(str(tokenizer_model_file))`
	`180`	`+vocab_size = hparams.get('vocab_size')`
	`181`	`+if vocab_size is None:`
	`182`	`+    vocab_size = tokenizer.vocab_size()`
`177`	`183`
`178`		`-for i in range(tokenizer.vocab_size()):`
	`184`	`+for i in range(vocab_size):`
`179`	`185`	`    text: bytes`
`180`	`186`	`    score: float`
`181`	`187`
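
The two changes in this diff can be read in isolation as small pure functions. The sketch below is illustrative only: `local_gguf_dir` and `resolve_vocab_size` are hypothetical helper names that do not exist in the converter, and the script path and vocabulary sizes in the usage lines are made-up values. It assumes `hparams` is the dictionary loaded from `config.json`, as the commit title suggests.

```python
from pathlib import Path
from typing import Any, Mapping, Optional


def local_gguf_dir(script_path: Path, environ: Mapping[str, str]) -> Optional[Path]:
    """Hypothetical helper: directory of the bundled gguf package.

    Returns None when NO_LOCAL_GGUF is set, in which case the diff skips the
    sys.path.insert call and `import gguf` resolves to an installed package.
    """
    if 'NO_LOCAL_GGUF' in environ:
        return None
    return script_path.parent / 'gguf-py' / 'gguf'


def resolve_vocab_size(hparams: Mapping[str, Any], tokenizer_vocab_size: int) -> int:
    """Hypothetical helper: prefer vocab_size from config.json.

    Falls back to the tokenizer's own count only when the key is missing,
    mirroring the `if vocab_size is None` branch added in the diff.
    """
    vocab_size = hparams.get('vocab_size')
    if vocab_size is None:
        vocab_size = tokenizer_vocab_size
    return vocab_size


# Usage with made-up values.
script = Path('/repo/convert.py')
print(local_gguf_dir(script, {}))
print(local_gguf_dir(script, {'NO_LOCAL_GGUF': '1'}))
print(resolve_vocab_size({'vocab_size': 1000}, 900))
print(resolve_vocab_size({}, 900))
```

Note that the diff inserts the local path at index 1 of `sys.path` rather than index 0, so the script's own directory keeps priority over the bundled package directory.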
