Pull requests: pytorch/FBGEMM (Public)
Fork 696
Star 1.5k

604 Open 4,428 Closed

Implement lazy TMEM allocation for Blackwell decode kernel
Labels: cla signed, fb-exported, meta-exported
#5262 opened Dec 18, 2025 by Aya-ZIbra

Repro Zero length lanes
Labels: cla signed, fb-exported, meta-exported
#5261 opened Dec 18, 2025 by Aya-ZIbra

Refactor TBE benchmark reporter to use structured data config
Labels: cla signed, fb-exported, meta-exported
#5260 opened Dec 18, 2025 by gchalump

Fix blackwell CUTLASS attention meta registration + actually test compile
Labels: cla signed, fb-exported, meta-exported
#5259 opened Dec 18, 2025 by jbschlosser

Merge VBE output (frontend)
Labels: cla signed, fb-exported, meta-exported
#5258 opened Dec 18, 2025 by spcyppt

Optimize benchmark index generation with std::sample()
Labels: cla signed, fb-exported, meta-exported
#5254 opened Dec 17, 2025 by terdogan

Remove unused dedup_map and associated includes from benchmarks
Labels: cla signed, fb-exported, meta-exported
#5253 opened Dec 17, 2025 by terdogan

Move the prefetched info to preallocated buffers
Labels: cla signed, fb-exported, meta-exported
#5251 opened Dec 17, 2025 by chouxi

Enable direct MX4→BF16 dequantization to reduce memory (python side) (2/2)
Labels: cla signed, fb-exported, meta-exported
#5250 opened Dec 17, 2025 by armandsauzay

Add aarch64 intrinsic-based dequantization to autovec routine
Labels: cla signed, fb-exported, meta-exported
#5249 opened Dec 17, 2025 by Nicoshev

Choose _autovec version of GenerateEmbeddingSpMDMRowWiseSparse on AArch64
Labels: cla signed, fb-exported, meta-exported
#5247 opened Dec 17, 2025 by MatzeB

Specialize more cases to improve EmbeddingSpMDMNBitBenchmark
Labels: cla signed, fb-exported, meta-exported
#5245 opened Dec 17, 2025 by MatzeB

Add EmbeddingSpMDMNBitRowWiseSparse autovectorized variant
Labels: cla signed, fb-exported, meta-exported
#5244 opened Dec 17, 2025 by MatzeB

Fix tidy warnings
Labels: cla signed
#5243 opened Dec 17, 2025 by cyyever

Fix zero_collision_hash_cpu_out call
Labels: cla signed
#5241 opened Dec 17, 2025 by cyyever

Use if constexpr
Labels: cla signed
#5240 opened Dec 17, 2025 by cyyever

Optimize group_index_select_or_add_2d_kernel on ROCm by adding a separate codepath for small embedding dimensions
Labels: cla signed, module: rocm
#5233 opened Dec 16, 2025 by aryaman-gupta

support object cache in ssd l2 cache and add more unit tests
Labels: cla signed, fb-exported, meta-exported
#5228 opened Dec 16, 2025 by zhaojuanmao

Optimizing 4-bit dequant to FP32 on AArch64 using vectorized intrinsics in EmbeddingSpMDMAutovec
Labels: cla signed
#5224 opened Dec 15, 2025 by marma01

Upgrade GitHub Actions to latest versions
Labels: cla signed
#5223 opened Dec 13, 2025 by salmanmkc

Upgrade GitHub Actions for Node 24 compatibility
Labels: cla signed, module: rocm
#5222 opened Dec 13, 2025 by salmanmkc

Tune max segment length per cta in triton table batched embeddings, and expose the param via cli
Labels: cla signed, fb-exported, meta-exported
#5212 opened Dec 10, 2025 by OmarPavel

Update heuristic to support variant batch sizes
Labels: cla signed, fb-exported, meta-exported
#5211 opened Dec 10, 2025 by zjing14

Use H100 runners for OSS CI
Labels: cla signed, fb-exported, meta-exported
#5205 opened Dec 9, 2025 by q10

Modifying clear_all_staged_data to accomadate KV Tensor Deletion
Labels: cla signed, fb-exported, meta-exported
#5202 opened Dec 9, 2025 by Raahul46
