Chinmay Shrivastava JonSnow1807

MS in Computer Science and Software Developer crafting efficient, scalable solutions. Proficient in C++, Python, JavaScript, SQL, and cloud tech.
  • Boston, MA

JonSnow1807/README.md

Chinmay Shrivastava

Software Engineer × AI/ML Developer × Performance Architect

LinkedIn  Email  Hugging Face

     



About Me

I'm a software engineer who transforms complex challenges into elegant solutions that scale. From optimizing CUDA kernels for 1.46x speedups to building real-time platforms with sub-500ms latency, I thrive at the intersection of technical excellence and business impact.

My approach is simple: measure twice, optimize once, ship constantly. Whether it's achieving 94% accuracy in production ML systems or rendering 1M+ points at 858 FPS, I believe in pushing the boundaries of what's possible while keeping the user experience at the center.

Currently seeking opportunities to tackle meaningful challenges at companies building the future.


🏆 Impact-Driven Projects

🤖 Intelligent Knowledge Assistant

94% accuracy | 120ms latency | Production RAG

Built a production RAG system with fine-tuned Llama-3.1-8B that matches GPT-4 quality at a fraction of the cost. Implemented custom attention caching that reduced latency by 73%, enabling real-time responses.

Technical Deep Dive
  • Architecture: Hierarchical vector indexing with FAISS
  • Innovation: Custom KV-cache optimization for transformers
  • Stack: PyTorch, LangChain, FastAPI, PostgreSQL
  • Deployment: Kubernetes with horizontal autoscaling
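
As a rough illustration of the hierarchical indexing idea above, here is a minimal pure-Python sketch of a two-level vector index: a query is first matched against coarse centroids, then only the documents in the closest bucket are scored. This is a stand-in for the concept, not FAISS and not the project's code; `TwoLevelIndex`, the centroids, and the document IDs are all hypothetical.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


class TwoLevelIndex:
    """Coarse centroids on top, document vectors in buckets underneath.

    Hypothetical stand-in for an inverted-file style index; not the FAISS API.
    """

    def __init__(self, centroids):
        self.centroids = centroids
        self.buckets = [[] for _ in centroids]

    def _nearest_centroids(self, vec, n):
        # Rank centroid indices by similarity to vec, best first.
        order = sorted(
            range(len(self.centroids)),
            key=lambda i: cosine(vec, self.centroids[i]),
            reverse=True,
        )
        return order[:n]

    def add(self, doc_id, vec):
        # Each document is stored in the bucket of its closest centroid.
        bucket = self._nearest_centroids(vec, 1)[0]
        self.buckets[bucket].append((doc_id, vec))

    def search(self, query, k=3, nprobe=1):
        # Only the nprobe closest buckets are scanned, not the whole corpus.
        candidates = []
        for b in self._nearest_centroids(query, nprobe):
            candidates.extend(self.buckets[b])
        candidates.sort(key=lambda item: cosine(query, item[1]), reverse=True)
        return [doc_id for doc_id, _ in candidates[:k]]


# Toy 2-D embeddings (made up for the example).
index = TwoLevelIndex(centroids=[[1.0, 0.0], [0.0, 1.0]])
index.add("gpu-notes", [0.9, 0.1])
index.add("cuda-faq", [0.8, 0.3])
index.add("billing-faq", [0.1, 0.9])

hits = index.search([1.0, 0.2], k=2)
print(hits)  # ['gpu-notes', 'cuda-faq']
```

Raising `nprobe` trades speed for recall: more buckets are scanned, so fewer near neighbors are missed at bucket boundaries.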

🎬 Real-time Collaboration Platform

<500ms sync | WebSocket protocol | 85% bandwidth optimized

Created a video watch party platform that keeps playback synchronized across distributed clients. Engineered a binary WebSocket protocol with delta compression, achieving sub-500ms sync latency for real-time collaboration.

Technical Deep Dive
  • Protocol: Custom binary format over WebSocket
  • Scaling: Redis pub/sub for horizontal distribution
  • Stack: React, NestJS, Socket.IO, Redis
  • Security: JWT with room-based permissions
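
To make the "binary format with delta compression" idea concrete, here is a minimal sketch using Python's standard `struct` module. The frame layout (a one-byte field mask followed by only the changed fields) and the field names `position_ms`, `playing`, and `rate_x100` are assumptions for illustration, not the platform's actual wire format.

```python
import struct

# Hypothetical playback-state fields: (name, mask bit, struct format).
# "<" selects little-endian with standard sizes and no padding.
FIELDS = [
    ("position_ms", 0x01, "<I"),  # playback position in milliseconds
    ("playing", 0x02, "<B"),      # 1 = playing, 0 = paused
    ("rate_x100", 0x04, "<H"),    # playback rate * 100 (100 = 1.0x)
]


def encode_delta(prev, curr):
    """Encode only the fields of curr that differ from prev."""
    mask = 0
    payload = b""
    for name, bit, fmt in FIELDS:
        if prev.get(name) != curr[name]:
            mask |= bit
            payload += struct.pack(fmt, curr[name])
    return struct.pack("<B", mask) + payload


def decode_delta(prev, frame):
    """Apply a delta frame on top of the previously known state."""
    state = dict(prev)
    mask = frame[0]
    offset = 1
    for name, bit, fmt in FIELDS:
        if mask & bit:
            (state[name],) = struct.unpack_from(fmt, frame, offset)
            offset += struct.calcsize(fmt)
    return state


old = {"position_ms": 120_000, "playing": 1, "rate_x100": 100}
new = {"position_ms": 120_500, "playing": 1, "rate_x100": 100}

frame = encode_delta(old, new)     # only the position changed
full = encode_delta({}, new)       # first frame carries every field
print(len(frame), len(full))       # 5 8
```

Because a delta only makes sense relative to the last state both sides agree on, a real protocol would also need a way to resynchronize (for example, a periodic full frame) after a dropped connection.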

⚡ GPU Performance Engineering

1.46x speedup | 95.3% bandwidth utilization | Kernel fusion

Developed fused CUDA kernels for transformer models that reach 95.3% memory bandwidth utilization. Fusing LayerNorm with the activation into a single kernel yields a 1.46x inference speedup for large language models.

Technical Deep Dive
  • Technique: Kernel fusion for LayerNorm + Activation
  • Memory: Coalesced access patterns, shared memory
  • Stack: CUDA C++, PyTorch extensions, nvprof
  • Impact: 46% inference speedup for LLMs
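
The arithmetic behind a fused LayerNorm + activation can be sketched in plain Python. This only illustrates the fusion idea (no intermediate buffer between normalization and activation) and says nothing about GPU performance. GELU is assumed as the activation, and the function names are hypothetical; the actual kernels are written in CUDA C++.

```python
import math


def gelu(x):
    # tanh approximation of GELU; the choice of activation is an assumption.
    return 0.5 * x * (1.0 + math.tanh(math.sqrt(2.0 / math.pi) * (x + 0.044715 * x ** 3)))


def layernorm_then_gelu(row, gamma, beta, eps=1e-5):
    """Unfused reference: LayerNorm writes a buffer, the activation reads it back."""
    mean = sum(row) / len(row)
    var = sum((x - mean) ** 2 for x in row) / len(row)
    inv_std = 1.0 / math.sqrt(var + eps)
    normalized = [(x - mean) * inv_std * g + b for x, g, b in zip(row, gamma, beta)]
    return [gelu(v) for v in normalized]


def fused_layernorm_gelu(row, gamma, beta, eps=1e-5):
    """Fused: one pass for statistics, one pass that normalizes and activates."""
    n = len(row)
    total = 0.0
    total_sq = 0.0
    for x in row:                      # single read of the row for both sums
        total += x
        total_sq += x * x
    mean = total / n
    var = max(total_sq / n - mean * mean, 0.0)   # E[x^2] - E[x]^2, clamped at 0
    inv_std = 1.0 / math.sqrt(var + eps)
    out = [0.0] * n
    for i, x in enumerate(row):        # normalize and activate without a temp buffer
        out[i] = gelu((x - mean) * inv_std * gamma[i] + beta[i])
    return out


row = [0.5, -1.0, 2.0, 0.25]
gamma = [1.0] * 4
beta = [0.0] * 4
reference = layernorm_then_gelu(row, gamma, beta)
fused = fused_layernorm_gelu(row, gamma, beta)
max_diff = max(abs(a - b) for a, b in zip(reference, fused))
```

On a GPU the benefit comes from memory traffic: the unfused version writes the normalized tensor to global memory and reads it again, while the fused version keeps that value in registers.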

🎮 High-Performance 3D Visualization

858 FPS | 1M+ points | 7.2x faster

Built a 3D point cloud viewer that outperforms industry standards by 7.2x. Implemented custom spatial indexing and SIMD optimizations to achieve real-time rendering of massive datasets.

Technical Deep Dive
  • Algorithm: Custom octree with frustum culling
  • Rendering: Instanced drawing with GPU batching
  • Stack: C++17, OpenGL 4.5, GLM, ImGui
  • Optimization: SIMD intrinsics for transforms
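
The octree-plus-frustum-culling approach above can be sketched in a few lines of Python. This is a simplified illustration under stated assumptions, not the viewer's C++ implementation: `OctreeNode`, `box_outside_plane`, and `visible_points` are hypothetical names, planes are given as `(nx, ny, nz, d)` with normals pointing into the visible region, and culling is conservative (a leaf that straddles a plane is kept whole).

```python
class OctreeNode:
    """Axis-aligned cube that splits into up to eight children when crowded."""

    def __init__(self, lo, hi, points, leaf_size=8, depth=0, max_depth=8):
        self.lo, self.hi = lo, hi      # opposite corners of the bounding box
        self.points = points
        self.children = []
        if len(points) > leaf_size and depth < max_depth:
            mid = tuple((a + b) / 2 for a, b in zip(lo, hi))
            groups = {}
            for p in points:
                # One boolean per axis selects the octant.
                key = tuple(p[i] >= mid[i] for i in range(3))
                groups.setdefault(key, []).append(p)
            for key, pts in groups.items():
                c_lo = tuple(mid[i] if key[i] else lo[i] for i in range(3))
                c_hi = tuple(hi[i] if key[i] else mid[i] for i in range(3))
                self.children.append(
                    OctreeNode(c_lo, c_hi, pts, leaf_size, depth + 1, max_depth)
                )
            self.points = []           # interior nodes hold no points themselves


def box_outside_plane(lo, hi, plane):
    """True if the whole box lies on the negative side of the plane."""
    nx, ny, nz, d = plane
    # Test the corner farthest along the normal; if it is outside, all are.
    px = hi[0] if nx >= 0 else lo[0]
    py = hi[1] if ny >= 0 else lo[1]
    pz = hi[2] if nz >= 0 else lo[2]
    return nx * px + ny * py + nz * pz + d < 0


def visible_points(node, planes):
    """Collect points from every leaf whose box is not fully outside a plane."""
    if any(box_outside_plane(node.lo, node.hi, pl) for pl in planes):
        return []                      # the entire subtree is culled at once
    if not node.children:
        return list(node.points)
    out = []
    for child in node.children:
        out.extend(visible_points(child, planes))
    return out


# 4 x 4 x 4 grid of points inside the cube [0, 4]^3.
cloud = [(x + 0.5, y + 0.5, z + 0.5) for x in range(4) for y in range(4) for z in range(4)]
root = OctreeNode((0.0, 0.0, 0.0), (4.0, 4.0, 4.0), cloud)

# A single clipping plane keeping x >= 2.25 (a full frustum would use six).
visible = visible_points(root, [(1.0, 0.0, 0.0, -2.25)])
print(len(visible))  # 32
```

The payoff is that a rejected node discards all of its descendants with one box-versus-plane test, so the cost scales with the number of visible nodes rather than the number of points.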


🛠 Technical Expertise

Python
TypeScript
C++
React
PyTorch
Docker
Kubernetes
Systems
📚 View Complete Tech Stack
Core Languages:
  • Expert: Python, TypeScript, C++, JavaScript
  • Proficient: CUDA, SQL, Bash

AI/ML Stack:
  • Frameworks: PyTorch, Transformers, LangChain, scikit-learn
  • Techniques: Fine-tuning, RAG, Embeddings, Vector Search
  • Production: ONNX, TensorRT, Model Quantization, Batching

Backend Engineering:
  • Python: FastAPI, Django, Flask, Celery
  • Node.js: NestJS, Express, Socket.IO, Bull
  • APIs: REST, GraphQL, gRPC, WebSockets

Frontend Development:
  • Core: React, Next.js, Redux, TypeScript
  • UI: Tailwind CSS, Material-UI, Framer Motion
  • Advanced: Three.js, D3.js, WebRTC, Canvas API

Data & Infrastructure:
  • Databases: PostgreSQL, MongoDB, Redis, Elasticsearch
  • Vector DBs: Pinecone, FAISS, Chroma, Qdrant
  • Message Queues: RabbitMQ, Kafka, Redis Pub/Sub

DevOps & Cloud:
  • Containers: Docker, Docker Compose, Buildkit
  • Orchestration: Kubernetes, Helm, ArgoCD
  • CI/CD: GitHub Actions, GitLab CI, Jenkins
  • Cloud: AWS (EC2, S3, Lambda), GCP, Vercel

Performance & Systems:
  • GPU: CUDA, cuDNN, Thrust, OptiX
  • CPU: SIMD, OpenMP, Threading, Profiling
  • Graphics: OpenGL, Vulkan, Shaders

💡 Engineering Philosophy



User First
Every optimization should improve the user experience


Data Driven
Measure twice, optimize once, validate always


Ship Fast
Perfect tomorrow loses to good today


Think Scale
Build for 10x growth from day one

📈 What I Bring to Your Team

Capability | Evidence
🏗️ Full Product Ownership | Shipped end-to-end solutions from concept to production
⚡ Performance Excellence | 1.46x-7.2x improvements across different domains
📊 Production Experience | Deployed scalable systems with real-world usage
🎯 Technical Precision | 94% ML accuracy, 95.3% GPU efficiency achieved
🚀 Rapid Execution | From idea to MVP in days, not months

🎯 Looking For My Next Adventure

I'm excited about joining teams that are:

  • Building products that matter - Real problems, real impact, real users
  • Pushing technical boundaries - Where "impossible" is just another challenge
  • Moving fast with purpose - Velocity with vision, not just for speed's sake
  • Creating the future - Not just following trends, but setting them

Open to Opportunities In:


📬 Let's Connect


I'm always excited to discuss challenging problems and explore how I can contribute to your team's success.

Whether you're building the next breakthrough in AI, scaling systems to billions, or creating products that change lives - let's talk.


  





Status: Actively seeking new opportunities | Availability: Immediate | Location: Flexible/Remote

Pinned

  1. llm-knowledge-assistant (Public)

    Production-ready RAG system with fine-tuned Llama-3.1-8B for expert-level domain Q&A

    Python

  2. Fused-LayerNorm-CUDA-Operator (Public)

    High-performance CUDA implementation of LayerNorm for PyTorch achieving 1.46x speedup through kernel fusion. Optimized for large language models (4K-8K hidden dims) with vectorized memory access, w…

    Python

  3. Mustard-Watch-Party (Public)

    Real-time video synchronization platform for YouTube watch parties. Built with React, NestJS, Socket.IO WebSockets, PostgreSQL & Prisma ORM. Features <500ms sync latency, multi-user rooms, JWT auth…

    TypeScript

  4. pytorch-autotune (Public)

    🚀 2-4x faster PyTorch training with one line of code. Beats torch.compile by 79%. Zero config, automatic hardware optimization for T4/V100/A100/H100 GPUs.

    Python

  5. student-scheduler (Public)

    Google OR-Tools constraint solver scheduling 500+ students. Flask/PostgreSQL/Redis backend, Docker/K8s deployment, CI/CD. Zero conflicts, 60-second optimization.

    Python

