I am running the following:
```
pytorch-triton  3.4.0+git11ec6354
torch           2.9.0.dev20250723+cu128
torchaudio      2.8.0.dev20250723+cu128
torchvision     0.24.0.dev20250723+cu128
```

and am on an H100 GPU.
The following produces some weird output:

```
python generate.py --checkpoint_path checkpoints/meta-llama/Llama-3.1-8B/model.pth --prompt "The capital of France is:" --num_samples=1 --temperature 0.8 --max_new_tokens 100
```

```
<|begin_of_text|>The capital of France is: London
The capital of France is: Paris
The capital of France is: Rome
The capital of France is: Athens
The capital of France is: Berne
The capital of France is: Bern
The capital of France is: Lisbon
The capital of France is: Madrid
The capital of France is: Oslo
The capital of France is: Stockholm
The capital of France is: Helsinki
The capital of France is: Berlin
The capital of France is: Vienna
The
Time for inference 1: 12.70 sec total, 7.87 tokens/sec
Bandwidth achieved: 118.16 GB/s
FLOPS achieved: 0.13 TF/s
```

Other temperatures, or generating more samples, lead to somewhat saner results, but this one seemed concerning to me.
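For context on why a non-zero temperature can occasionally emit a wrong-but-plausible token like "London": temperature sampling divides the logits by the temperature before the softmax and then draws from the resulting distribution, so lower-probability tokens retain a real chance of being picked. The sketch below is a minimal, self-contained illustration of that mechanism, not the actual sampling code in `generate.py`; the toy logit values are hypothetical.

```python
import math
import random

def sample_with_temperature(logits, temperature=0.8):
    """Scale logits by 1/temperature, softmax, and sample an index.

    Lower temperature sharpens the distribution toward the argmax token;
    higher temperature flattens it toward uniform.
    """
    scaled = [l / temperature for l in logits]
    # Numerically stable softmax.
    m = max(scaled)
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    probs = [e / total for e in exps]
    # Draw one token index according to the temperature-adjusted probabilities.
    idx = random.choices(range(len(probs)), weights=probs, k=1)[0]
    return idx, probs

# Hypothetical logits for next-token candidates ["Paris", "London", "Rome"].
logits = [5.0, 3.0, 2.0]
idx, probs = sample_with_temperature(logits, temperature=0.8)
# Even though "Paris" dominates, the other tokens keep non-zero mass,
# so an occasional "London" sample is expected behavior at temperature 0.8
# (greedy decoding, i.e. temperature -> 0, would always pick "Paris").
```

Whether the degree of incoherence seen above is purely this sampling effect or a real bug is exactly the question the repeated-samples check speaks to.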