moe
Here are 149 public repositories matching this topic...
Unified Efficient Fine-Tuning of 100+ LLMs & VLMs (ACL 2024)
- Updated Apr 22, 2025 - Python
SGLang is a fast serving framework for large language models and vision language models.
- Updated Apr 22, 2025 - Python
An unofficial, UI-first https://bgm.tv app client for Android and iOS, built with React Native. An ad-free, hobby-driven, non-profit, Douban-like anime-tracking third-party client for bgm.tv, dedicated to ACG. Redesigned for mobile, it ships many enhanced features that are hard to implement on the web version and offers extensive customization options. It currently supports iOS / Android / WSA, mobile / basic tablet layouts, light / dark themes, and the mobile web.
- Updated Apr 18, 2025 - TypeScript
Mixture-of-Experts for Large Vision-Language Models
- Updated Dec 3, 2024 - Python
MoBA: Mixture of Block Attention for Long-Context LLMs
- Updated Apr 3, 2025 - Python
PyTorch Re-Implementation of "The Sparsely-Gated Mixture-of-Experts Layer" by Noam Shazeer et al. (https://arxiv.org/abs/1701.06538)
- Updated Apr 19, 2024 - Python
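
The entry above reimplements the sparsely-gated mixture-of-experts layer, in which a learned gate assigns each token to a small number of expert feed-forward networks and combines their outputs with the gate weights. Below is a minimal sketch of that idea in PyTorch, not code from the repository or the paper: the class name, sizes, and the plain per-expert loop are illustrative assumptions, and practical implementations add noisy gating, load-balancing losses, and capacity limits.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class SparseMoE(nn.Module):
    """Minimal top-k gated MoE layer (illustrative sketch, not the repo's code)."""
    def __init__(self, d_model: int, d_hidden: int, n_experts: int = 8, k: int = 2):
        super().__init__()
        self.k = k
        self.gate = nn.Linear(d_model, n_experts, bias=False)
        self.experts = nn.ModuleList([
            nn.Sequential(nn.Linear(d_model, d_hidden), nn.ReLU(),
                          nn.Linear(d_hidden, d_model))
            for _ in range(n_experts)
        ])

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (tokens, d_model); each token is routed to its top-k experts.
        logits = self.gate(x)                          # (tokens, n_experts)
        weights, idx = logits.topk(self.k, dim=-1)     # keep the k highest-scoring experts
        weights = F.softmax(weights, dim=-1)           # renormalize over the selected experts
        out = torch.zeros_like(x)
        for e, expert in enumerate(self.experts):
            rows, slot = (idx == e).nonzero(as_tuple=True)
            if rows.numel():                           # only run experts that received tokens
                out[rows] += weights[rows, slot].unsqueeze(-1) * expert(x[rows])
        return out

# Example: y = SparseMoE(d_model=512, d_hidden=2048)(torch.randn(16, 512))
```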
⛷️ LLaMA-MoE: Building Mixture-of-Experts from LLaMA with Continual Pre-training (EMNLP 2024)
- Updated Dec 6, 2024 - Python
Tutel MoE: Optimized Mixture-of-Experts Library, with support for DeepSeek FP8/FP4
- Updated Apr 22, 2025 - Python
Adan: Adaptive Nesterov Momentum Algorithm for Faster Optimizing Deep Models
- Updated Jul 2, 2024 - Python
An open-source solution for full-parameter fine-tuning of DeepSeek-V3/R1 671B, including complete code and scripts from training to inference, as well as practical experiences and conclusions gathered along the way.
- Updated Mar 13, 2025 - Python
Chinese Mixtral mixture-of-experts large language models (Chinese Mixtral MoE LLMs)
- Updated Apr 30, 2024 - Python
MindSpore online courses: Step into LLM
- Updated Jan 6, 2025 - Jupyter Notebook
Official LISTEN.moe Android app
- Updated Apr 20, 2025 - Kotlin
Inferflow is an efficient and highly configurable inference engine for large language models (LLMs).
- Updated Mar 15, 2024 - C++
MoH: Multi-Head Attention as Mixture-of-Head Attention
- Updated Oct 29, 2024 - Python
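
MoH's premise is to treat the attention heads themselves as experts, so a router sends each token to only a subset of heads and weights their outputs. The sketch below illustrates that general idea only; it is not the paper's implementation, and the class name, router design, and hyperparameters are assumptions made here for clarity.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class MixtureOfHeadAttention(nn.Module):
    """Illustrative sketch: route each token to its top-k attention heads."""
    def __init__(self, d_model: int = 512, n_heads: int = 8, k: int = 4):
        super().__init__()
        self.h, self.k, self.d_head = n_heads, k, d_model // n_heads
        self.qkv = nn.Linear(d_model, 3 * d_model)
        self.router = nn.Linear(d_model, n_heads)      # one score per head per token
        self.out = nn.Linear(d_model, d_model)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        b, t, _ = x.shape                               # x: (batch, seq, d_model)
        q, k, v = self.qkv(x).chunk(3, dim=-1)
        q, k, v = (z.reshape(b, t, self.h, self.d_head).transpose(1, 2) for z in (q, k, v))
        heads = F.scaled_dot_product_attention(q, k, v)     # (b, h, t, d_head)
        heads = heads.transpose(1, 2)                       # (b, t, h, d_head)
        scores = self.router(x)                             # (b, t, h)
        topv, topi = scores.topk(self.k, dim=-1)            # pick the k best heads per token
        gate = torch.zeros_like(scores).scatter_(-1, topi, F.softmax(topv, dim=-1))
        mixed = (heads * gate.unsqueeze(-1)).reshape(b, t, -1)  # weight and concatenate heads
        return self.out(mixed)

# Example: y = MixtureOfHeadAttention()(torch.randn(2, 16, 512))
```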
A libGDX cross-platform API for in-app purchasing.
- Updated Jan 2, 2025 - Java
ModuleFormer is a MoE-based architecture that includes two different types of experts: stick-breaking attention heads and feedforward experts. We released a collection of ModuleFormer-based Language Models (MoLM) ranging in scale from 4 billion to 8 billion parameters.
- Updated Apr 10, 2024 - Python
[ICLR 2025] MoE++: Accelerating Mixture-of-Experts Methods with Zero-Computation Experts
- Updated Oct 16, 2024 - Python