Sarashina2-8x70B
This repository provides large language models trained by SB Intuitions.
Required Hardware
BF16 Inference:
- 16x H100
- 16x A100 80GB
Model Description
We constructed this Sarashina2-8x70B model, which consists of over 450 billion parameters, by applying the sparse upcycling technique to our Sarashina2-70B model to efficiently build the Mixture-of-Experts model. We trained the Sarashina2-8x70B model using a mix of Japanese and English corpora from web data.
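The core idea of sparse upcycling can be illustrated in a few lines: each expert in a new Mixture-of-Experts layer starts as a copy of the dense model's feed-forward weights, while the router is a fresh, randomly initialized parameter. The sketch below is a minimal, hypothetical illustration of that initialization step using plain Python lists; it is not the actual Sarashina2 training code, and the function name `upcycle_ffn` and the toy weight shapes are our own assumptions.

```python
import copy
import random

def upcycle_ffn(dense_ffn_weights, num_experts=8):
    """Sparse upcycling (sketch): each expert begins as a copy of the
    dense FFN weights; the router is a new randomly initialized matrix."""
    experts = [copy.deepcopy(dense_ffn_weights) for _ in range(num_experts)]
    # Router maps each hidden dimension to a score per expert; it did not
    # exist in the dense model, so it is initialized from scratch.
    router = [[random.gauss(0.0, 0.02) for _ in range(num_experts)]
              for _ in range(len(dense_ffn_weights))]
    return experts, router

dense = [[0.1, 0.2], [0.3, 0.4]]  # toy 2x2 FFN weight matrix
experts, router = upcycle_ffn(dense)
# All 8 experts start identical to the dense weights, then diverge
# during continued training on the MoE objective.
```

Because every expert starts from a well-trained dense checkpoint rather than random initialization, the MoE model reaches strong quality with far less additional compute than training from scratch.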
Tokenization
We use a SentencePiece tokenizer with a unigram language model and byte fallback. We do not apply pre-tokenization with a Japanese tokenizer; thus, users can feed raw sentences directly into the tokenizer.
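Byte fallback means that a piece missing from the vocabulary is encoded as one token per UTF-8 byte (e.g. `<0xE7>`) instead of collapsing to a single unknown token, so no input text is ever lost. The snippet below is a simplified, self-contained sketch of that behavior; the function `byte_fallback` and the tiny vocabulary are illustrative assumptions, not the tokenizer's real implementation.

```python
def byte_fallback(piece, vocab):
    """If `piece` is in the vocabulary, return it as a single token;
    otherwise fall back to one <0xNN> token per UTF-8 byte (sketch)."""
    if piece in vocab:
        return [piece]
    return [f"<0x{b:02X}>" for b in piece.encode("utf-8")]

vocab = {"hello", "world"}
byte_fallback("hello", vocab)  # -> ["hello"]
byte_fallback("猫", vocab)     # -> ["<0xE7>", "<0x8C>", "<0xAB>"]
```

This is why raw sentences, including arbitrary Japanese text, can be fed to the tokenizer directly: any character outside the learned vocabulary is still representable at the byte level.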
Ethical Considerations and Limitations
Sarashina2 has not yet been tuned to follow instructions. Therefore, it may generate meaningless sequences, inaccurate outputs, or biased/objectionable content. Before using Sarashina2, we ask developers to tune the model based on human preferences and safety considerations.
License