DEV Community

Avi Kapoor for MojoAuth

Posted on • Originally published at mojoauth.com

DeepSeek-V3: A New Milestone in Language Modeling

DeepSeek-V3 Benchmark Results

Image courtesy of the DeepSeek-V3 Technical Report

Model Architecture and Training Efficiency

DeepSeek-V3 uses a Mixture-of-Experts (MoE) architecture in which only 37 billion of its 671 billion total parameters are activated per token during inference. This sparsity improves efficiency by routing each token to a small subset of specialized experts rather than activating the full model. A new auxiliary-loss-free load-balancing strategy maintains performance while mitigating the training instabilities commonly associated with MoE models.
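DeepSeek-V3's actual routing is more elaborate (fine-grained and shared experts, plus the bias-based auxiliary-loss-free balancer), but the core MoE idea — a gate picking the top-k experts for each token — can be sketched in a few lines of plain Python (illustrative only; expert count and top-k here are arbitrary, not DeepSeek's):

```python
import math
import random

def softmax(xs):
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    s = sum(exps)
    return [e / s for e in exps]

def route_token(gate_logits, top_k=2):
    """Select the top_k experts for one token and renormalize their weights."""
    probs = softmax(gate_logits)
    ranked = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    chosen = ranked[:top_k]
    total = sum(probs[i] for i in chosen)
    return [(i, probs[i] / total) for i in chosen]

# 8 experts exist, but only top_k=2 are activated for this token — the rest
# contribute no compute, which is the source of MoE's inference efficiency.
random.seed(0)
logits = [random.gauss(0, 1) for _ in range(8)]
active = route_token(logits, top_k=2)
print(active)  # two (expert_index, weight) pairs whose weights sum to 1
```

The token's output is then the weighted sum of the chosen experts' outputs; only those experts' parameters are ever touched.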

The model was trained on a compute cluster of 2,048 NVIDIA H800 GPUs, organized in nodes connected via NVLink (intra-node) and InfiniBand (inter-node). The DeepSeek team developed a custom training framework, HAI-LLM, which includes a dual-pipeline parallelism algorithm named DualPipe that overlaps computation with communication, reducing pipeline bubbles and memory pressure.
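DualPipe's schedule is beyond the scope of this summary, but the pipeline-bubble problem it attacks is easy to illustrate with a toy forward-only pipeline model (a simplification I'm introducing here, not DeepSeek's algorithm):

```python
def pipeline_steps(num_stages, num_microbatches):
    """Time steps to push all micro-batches through a simple forward pipeline:
    the first micro-batch takes num_stages steps to reach the end, and each
    additional micro-batch adds one step once the pipeline is full."""
    return num_stages + num_microbatches - 1

def bubble_fraction(num_stages, num_microbatches):
    """Fraction of stage-time slots spent idle (the 'bubble')."""
    total_slots = num_stages * pipeline_steps(num_stages, num_microbatches)
    busy_slots = num_stages * num_microbatches
    return 1 - busy_slots / total_slots

# With 4 stages and 1 micro-batch, most stage-time is idle; more micro-batches
# amortize the bubble. Schedules like DualPipe go further by also overlapping
# computation with cross-node communication.
print(pipeline_steps(4, 1))       # 4 steps for 1 unit of work
print(pipeline_steps(4, 16))      # 19 steps for 16 units of work
print(bubble_fraction(4, 16))     # ~0.16 idle, down from 0.75 at 1 micro-batch
```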

Performance Metrics

DeepSeek-V3 has demonstrated outstanding results in various benchmarks:

  • MMLU accuracy: 87.1%
  • BBH exact match: 87.5%
  • DROP F1 score: 89.0%
  • HumanEval pass@1: 65.2%
  • MBPP pass@1: 75.4%
  • GSM8K exact match: 89.3%
  • MATH exact match: 61.6%

These results highlight the model’s capabilities in coding, mathematical reasoning, and general language processing tasks.
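The HumanEval and MBPP figures are pass@1 scores. The standard unbiased pass@k estimator used across such coding benchmarks (not specific to DeepSeek) can be written as:

```python
from math import comb

def pass_at_k(n, c, k):
    """Unbiased pass@k estimator: given n generated samples per problem,
    of which c pass the tests, estimate the probability that at least one
    of k randomly drawn samples passes: 1 - C(n-c, k) / C(n, k)."""
    if n - c < k:
        return 1.0  # too few failures left to fill k draws: some draw passes
    return 1.0 - comb(n - c, k) / comb(n, k)

# With one sample per problem (n=1, k=1), pass@1 reduces to the fraction of
# problems solved, which is how the headline numbers above read.
print(pass_at_k(1, 1, 1))             # 1.0 (solved)
print(pass_at_k(1, 0, 1))             # 0.0 (not solved)
print(round(pass_at_k(10, 3, 1), 2))  # 0.3
```

Averaging this quantity over all benchmark problems gives the reported pass@1 percentage.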

Deployment and Accessibility

The DeepSeek-V3 model is available for download on Hugging Face, where developers can access both the base and chat-tuned versions. The total model size is approximately 685GB, which includes the weights for the main model and the Multi-Token Prediction (MTP) module.

For local deployment, DeepSeek-V3 offers several options, including:

  1. DeepSeek-Infer Demo: a simple demo for testing inference capabilities.
  2. SGLang: supports FP8 and BF16 inference modes.
  3. LMDeploy: a framework for local and cloud deployments.
  4. TensorRT-LLM and vLLM: both frameworks provide optimized inference for various hardware configurations.

Industry Comparisons

DeepSeek-V3 outperforms several notable models, including:

  • GPT-4o: OpenAI’s flagship model.
  • Llama 3.1: Meta’s contemporary model.
  • Qwen 2.5: Alibaba’s competing model family.

Andrej Karpathy noted DeepSeek-V3’s cost efficiency: it was trained for approximately $5.5 million, whereas OpenAI’s GPT-4 reportedly cost over $100 million to train.

For further details, you can access the DeepSeek-V3 Technical Report or visit the DeepSeek GitHub repository.


For any inquiries or feedback, you can contact the DeepSeek team at service@deepseek.com.

