Pull requests: pytorch/FBGEMM
Implement lazy TMEM allocation for Blackwell decode kernel (labels: cla signed, fb-exported, meta-exported)
#5262 opened Dec 18, 2025 by Aya-ZIbra
Repro Zero length lanes (labels: cla signed, fb-exported, meta-exported)
#5261 opened Dec 18, 2025 by Aya-ZIbra
Refactor TBE benchmark reporter to use structured data config (labels: cla signed, fb-exported, meta-exported)
#5260 opened Dec 18, 2025 by gchalump
Fix blackwell CUTLASS attention meta registration + actually test compile (labels: cla signed, fb-exported, meta-exported)
#5259 opened Dec 18, 2025 by jbschlosser
Merge VBE output (frontend) (labels: cla signed, fb-exported, meta-exported)
#5258 opened Dec 18, 2025 by spcyppt
Optimize benchmark index generation with std::sample() (labels: cla signed, fb-exported, meta-exported)
#5254 opened Dec 17, 2025 by terdogan
Remove unused dedup_map and associated includes from benchmarks (labels: cla signed, fb-exported, meta-exported)
#5253 opened Dec 17, 2025 by terdogan
Move the prefetched info to preallocated buffers (labels: cla signed, fb-exported, meta-exported)
#5251 opened Dec 17, 2025 by chouxi
Enable direct MX4→BF16 dequantization to reduce memory (python side) (2/2) (labels: cla signed, fb-exported, meta-exported)
#5250 opened Dec 17, 2025 by armandsauzay
Add aarch64 intrinsic-based dequantization to autovec routine (labels: cla signed, fb-exported, meta-exported)
#5249 opened Dec 17, 2025 by Nicoshev
Choose _autovec version of GenerateEmbeddingSpMDMRowWiseSparse on AArch64 (labels: cla signed, fb-exported, meta-exported)
#5247 opened Dec 17, 2025 by MatzeB
Specialize more cases to improve EmbeddingSpMDMNBitBenchmark (labels: cla signed, fb-exported, meta-exported)
#5245 opened Dec 17, 2025 by MatzeB
Add EmbeddingSpMDMNBitRowWiseSparse autovectorized variant (labels: cla signed, fb-exported, meta-exported)
#5244 opened Dec 17, 2025 by MatzeB
Optimize group_index_select_or_add_2d_kernel on ROCm by adding a separate codepath for small embedding dimensions (labels: cla signed, module: rocm)
#5233 opened Dec 16, 2025 by aryaman-gupta
Support object cache in SSD L2 cache and add more unit tests (labels: cla signed, fb-exported, meta-exported)
#5228 opened Dec 16, 2025 by zhaojuanmao
Optimizing 4-bit dequant to FP32 on AArch64 using vectorized intrinsics in EmbeddingSpMDMAutovec (labels: cla signed)
#5224 opened Dec 15, 2025 by marma01
Upgrade GitHub Actions for Node 24 compatibility (labels: cla signed, module: rocm)
#5222 opened Dec 13, 2025 by salmanmkc
Tune max segment length per CTA in Triton table batched embeddings, and expose the param via CLI (labels: cla signed, fb-exported, meta-exported)
#5212 opened Dec 10, 2025 by OmarPavel
Update heuristic to support variant batch sizes (labels: cla signed, fb-exported, meta-exported)
#5211 opened Dec 10, 2025 by zjing14
Use H100 runners for OSS CI (labels: cla signed, fb-exported, meta-exported)
#5205 opened Dec 9, 2025 by q10
Modifying clear_all_staged_data to accommodate KV Tensor Deletion (labels: cla signed, fb-exported, meta-exported)
#5202 opened Dec 9, 2025 by Raahul46