
Learning and practice of high performance computing (CUDA, Vulkan, OpenCL, OpenMP, TBB, SSE/AVX, NEON, MPI, coroutines, etc.)


cjmcv/hpc


Application

pocket-ai -- A Portable Toolkit for building AI Infra.

https://github.com/cjmcv/pocket-ai

  • engine/cl: A small computing framework based on OpenCL. It is designed to help you quickly call the OpenCL API to do the calculations you need.

  • engine/vk: A small computing framework based on Vulkan. It is designed to help you quickly call Vulkan's compute API to do the calculations you need.

  • engine/graph: A small multitasking scheduler for quickly building efficient pipelines out of multiple tasks.

  • engine/infer: A tiny inference engine for microprocessors, with a library size of only 10K+.

  • eval/llm: A small tool for quickly verifying that end-to-end results are still correct while you accelerate and optimize a large language model (LLM) inference engine.

  • Other small tools.
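To illustrate the kind of check the eval/llm item describes, here is a minimal sketch in plain Python. It is not taken from pocket-ai: the function names, the tolerance value, and the idea of comparing one step's logits are all assumptions made for illustration. It compares a baseline output against an optimized one and reports whether they still agree.

```python
def argmax(values):
    """Index of the largest element (first one wins on ties)."""
    return max(range(len(values)), key=lambda i: values[i])


def max_abs_diff(ref, out):
    """Largest element-wise absolute difference between two sequences."""
    return max(abs(r - o) for r, o in zip(ref, out))


def outputs_match(ref_logits, opt_logits, atol=1e-3):
    """Hypothetical check: the optimized logits must pick the same token
    as the reference and stay within an absolute tolerance of it."""
    if len(ref_logits) != len(opt_logits):
        return False
    same_token = argmax(ref_logits) == argmax(opt_logits)
    close = max_abs_diff(ref_logits, opt_logits) <= atol
    return same_token and close


if __name__ == "__main__":
    reference = [0.1, 2.0, -1.0]       # baseline engine output (made-up values)
    optimized = [0.1004, 1.9996, -1.0]  # output after an optimization (made-up values)
    print(outputs_match(reference, optimized))
```

A real end-to-end evaluation would typically run whole prompts or benchmark sets rather than a single vector; the sketch only shows the comparison step.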

Reading Notes

ai-infra-notes

sglang, lighteval, cutlass, vllm, mlc-llm

Practice

cux -- An experimental framework for performance analysis and optimization of CUDA kernel functions.

https://github.com/cjmcv/hpc/tree/master/0-frameworks/cux

tag: cuda / simd / openmp.

mrpc -- Mini-RPC, based on asio.

https://github.com/cjmcv/hpc/tree/master/0-frameworks/mrpc

tag: distributed computing.


Learning

Heterogeneous computing

cuda
vulkan
opencl
  • basic_demo : Introduces the basic calling sequence of the OpenCL API (without using pocket-ai).
  • gemm_f32 : FP32 GEMM for discrete graphics cards.
  • gemm_mobile_f32 : FP32 GEMM for integrated graphics cards.
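
For readers new to GEMM, the computation behind the two gemm demos above is C = A x B. The following is a CPU reference sketch in plain Python, not the OpenCL kernels from this repository. The function names, the row-major flat-list layout, and the tile size are assumptions chosen for illustration. The tiled variant shows the blocking idea that GPU kernels commonly build on, and it can be checked against the naive version.

```python
def gemm_naive(a, b, m, n, k):
    """C (m x n) = A (m x k) * B (k x n); all matrices are row-major flat lists."""
    c = [0.0] * (m * n)
    for i in range(m):
        for j in range(n):
            acc = 0.0
            for p in range(k):
                acc += a[i * k + p] * b[p * n + j]
            c[i * n + j] = acc
    return c


def gemm_tiled(a, b, m, n, k, tile=4):
    """Same result as gemm_naive, computed block by block.
    Each (i0, j0) output tile accumulates partial products over k in chunks."""
    c = [0.0] * (m * n)
    for i0 in range(0, m, tile):
        for j0 in range(0, n, tile):
            for p0 in range(0, k, tile):
                for i in range(i0, min(i0 + tile, m)):
                    for j in range(j0, min(j0 + tile, n)):
                        acc = c[i * n + j]
                        for p in range(p0, min(p0 + tile, k)):
                            acc += a[i * k + p] * b[p * n + j]
                        c[i * n + j] = acc
    return c


if __name__ == "__main__":
    a = [1.0, 2.0, 3.0, 4.0]  # 2 x 2
    b = [5.0, 6.0, 7.0, 8.0]  # 2 x 2
    print(gemm_naive(a, b, 2, 2, 2))
    print(gemm_tiled(a, b, 2, 2, 2, tile=1))
```

Keeping a slow but obviously correct reference like gemm_naive around is a common way to validate an optimized kernel's output.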

SIMD

neon
sse/avx

Distributed computing

mpi/mpi4py

Thread

std
openmp
tbb

Coroutines

libco
asyncio
