Sarashina2-8x70B
This repository provides large language models trained by SB Intuitions.
Required Hardware
BF16 Inference:
- 16x H100
- 16x A100 80GB
Model Description
We constructed this Sarashina2-8x70B model, which consists of over 450 billion parameters, by applying the sparse upcycling technique to our Sarashina2-70B model to efficiently build the Mixture-of-Experts model. We trained the Sarashina2-8x70B model using a mix of Japanese and English corpora from web data.
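The core idea of sparse upcycling can be illustrated in a few lines: each expert in a new Mixture-of-Experts layer starts as a copy of the dense model's feed-forward weights, while the router is a fresh, randomly initialized parameter. The sketch below is a minimal, hypothetical illustration of that initialization step using plain Python lists; it is not the actual Sarashina2 training code, and the function name `upcycle_ffn` and the toy weight shapes are our own assumptions.

```python
import copy
import random

def upcycle_ffn(dense_ffn_weights, num_experts=8):
    """Sparse upcycling (sketch): each expert begins as a copy of the
    dense FFN weights; the router is a new randomly initialized matrix."""
    experts = [copy.deepcopy(dense_ffn_weights) for _ in range(num_experts)]
    # Router maps each hidden dimension to a score per expert; it did not
    # exist in the dense model, so it is initialized from scratch.
    router = [[random.gauss(0.0, 0.02) for _ in range(num_experts)]
              for _ in range(len(dense_ffn_weights))]
    return experts, router

dense = [[0.1, 0.2], [0.3, 0.4]]  # toy 2x2 FFN weight matrix
experts, router = upcycle_ffn(dense)
# All 8 experts start identical to the dense weights, then diverge
# during continued training on the MoE objective.
```

Because every expert starts from a well-trained dense checkpoint rather than random initialization, the MoE model reaches strong quality with far less additional compute than training from scratch.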
Tokenization
We use a SentencePiece tokenizer with a unigram language model and byte fallback. We do not apply pre-tokenization with a Japanese tokenizer; thus, users can feed raw sentences directly into the tokenizer.
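Byte fallback means that a piece missing from the vocabulary is encoded as one token per UTF-8 byte (e.g. `<0xE7>`) instead of collapsing to a single unknown token, so no input text is ever lost. The snippet below is a simplified, self-contained sketch of that behavior; the function `byte_fallback` and the tiny vocabulary are illustrative assumptions, not the tokenizer's real implementation.

```python
def byte_fallback(piece, vocab):
    """If `piece` is in the vocabulary, return it as a single token;
    otherwise fall back to one <0xNN> token per UTF-8 byte (sketch)."""
    if piece in vocab:
        return [piece]
    return [f"<0x{b:02X}>" for b in piece.encode("utf-8")]

vocab = {"hello", "world"}
byte_fallback("hello", vocab)  # -> ["hello"]
byte_fallback("猫", vocab)     # -> ["<0xE7>", "<0x8C>", "<0xAB>"]
```

This is why raw sentences, including arbitrary Japanese text, can be fed to the tokenizer directly: any character outside the learned vocabulary is still representable at the byte level.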
Ethical Considerations and Limitations
Sarashina2 has not yet been tuned to follow instructions. Therefore, it may generate meaningless sequences, inaccurate outputs, or biased/objectionable content. Before using Sarashina2, we ask developers to tune the model based on human preferences and safety considerations.
License