Chinmay Shrivastava JonSnow1807

Software developer with an MS in Computer Science, crafting efficient, scalable solutions. Proficient in C++, Python, JavaScript, SQL, and cloud technologies.

Boston, MA

Software Engineer × AI/ML Developer × Performance Architect

About Me

I'm a software engineer who transforms complex challenges into elegant solutions that scale. From optimizing CUDA kernels for 1.46x speedups to building real-time platforms with sub-500ms latency, I thrive at the intersection of technical excellence and business impact.

My approach is simple: measure twice, optimize once, ship constantly. Whether it's achieving 94% accuracy in production ML systems or rendering 1M+ points at 858 FPS, I believe in pushing the boundaries of what's possible while keeping the user experience at the center.

Currently seeking opportunities to tackle meaningful challenges at companies building the future.

🏆 Impact-Driven Projects

🤖 Intelligent Knowledge Assistant

94% accuracy · 120ms latency · Production RAG

Built a production RAG system with fine-tuned Llama-3.1-8B that matches GPT-4 quality at a fraction of the cost. Implemented custom attention caching that reduced latency by 73%, enabling real-time responses.

Technical Deep Dive

Architecture: Hierarchical vector indexing with FAISS
Innovation: Custom KV-cache optimization for transformers
Stack: PyTorch, LangChain, FastAPI, PostgreSQL
Deployment: Kubernetes with horizontal autoscaling
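
As a rough illustration of the coarse-to-fine idea behind hierarchical vector indexing, here is a minimal pure-Python sketch. It is not the project's code and does not use FAISS: the `TwoLevelIndex` class, its method names, and the `nprobe` parameter are all assumptions made for this example. Vectors are routed to their nearest centroid, and a query scans only the buckets of its closest centroids instead of the whole collection.

```python
def dist2(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


class TwoLevelIndex:
    """Hypothetical coarse-to-fine index: pick centroids first, then scan buckets."""

    def __init__(self, centroids):
        self.centroids = centroids
        self.buckets = [[] for _ in centroids]  # one list of (id, vector) per centroid

    def _nearest_centroids(self, vec, k):
        # Rank centroid positions by distance to vec and keep the k closest.
        order = sorted(range(len(self.centroids)),
                       key=lambda i: dist2(vec, self.centroids[i]))
        return order[:k]

    def add(self, doc_id, vec):
        bucket = self._nearest_centroids(vec, 1)[0]
        self.buckets[bucket].append((doc_id, vec))

    def search(self, query, top_k=3, nprobe=1):
        # Only the nprobe closest buckets are scanned, not every stored vector.
        candidates = []
        for b in self._nearest_centroids(query, nprobe):
            candidates.extend(self.buckets[b])
        candidates.sort(key=lambda item: dist2(query, item[1]))
        return [doc_id for doc_id, _ in candidates[:top_k]]


index = TwoLevelIndex(centroids=[(0.0, 0.0), (10.0, 10.0)])
index.add("a", (0.1, 0.2))
index.add("b", (0.5, -0.3))
index.add("c", (9.8, 10.1))
index.add("d", (10.5, 9.5))
print(index.search((10.0, 10.0), top_k=2))
```

Raising `nprobe` trades speed for recall: more buckets are scanned, so fewer true neighbors are missed near bucket boundaries.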

🎬 Real-time Collaboration Platform

<500ms sync · WebSocket protocol · 85% bandwidth optimized

Created a video watch party platform that keeps playback synchronized across distributed clients. Engineered a binary WebSocket protocol with delta compression, achieving sub-500ms sync latency for seamless real-time collaboration.

Technical Deep Dive

Protocol: Custom binary format over WebSocket
Scaling: Redis pub/sub for horizontal distribution
Stack: React, NestJS, Socket.IO, Redis
Security: JWT with room-based permissions
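
To make the delta-compression idea concrete, here is a minimal sketch of a binary frame that carries only the fields that changed since the last state. The field names (`position_ms`, `playing`, `rate_pct`), their widths, and the one-byte bitmask layout are assumptions for illustration, not the platform's actual wire format.

```python
import struct

# Hypothetical playback-state fields in a fixed order: (name, struct format code).
FIELDS = [("position_ms", "I"), ("playing", "?"), ("rate_pct", "H")]


def encode_delta(prev, curr):
    """Pack a 1-byte change mask followed by only the changed field values."""
    mask = 0
    payload = b""
    for bit, (name, fmt) in enumerate(FIELDS):
        if prev.get(name) != curr[name]:
            mask |= 1 << bit
            payload += struct.pack("!" + fmt, curr[name])  # network byte order
    return struct.pack("!B", mask) + payload


def apply_delta(prev, frame):
    """Rebuild the current state from the previous state and a delta frame."""
    state = dict(prev)
    mask = frame[0]
    offset = 1
    for bit, (name, fmt) in enumerate(FIELDS):
        if mask & (1 << bit):
            (state[name],) = struct.unpack_from("!" + fmt, frame, offset)
            offset += struct.calcsize("!" + fmt)
    return state


prev = {"position_ms": 1000, "playing": True, "rate_pct": 100}
curr = {"position_ms": 1500, "playing": True, "rate_pct": 100}
frame = encode_delta(prev, curr)
print(len(frame), apply_delta(prev, frame) == curr)
```

In this sketch a position-only update is 5 bytes, while a full snapshot (encoded against an empty previous state) is 8 bytes. A JSON message carrying the same fields would be several times larger.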

⚡ GPU Performance Engineering

1.46x speedup · 95.3% bandwidth utilization · Kernel fusion

Developed fused CUDA kernels for transformer models that reach 95.3% memory bandwidth utilization. Fusing LayerNorm with the activation avoids writing the intermediate result to global memory and reading it back, which yields a 1.46x speedup.

Technical Deep Dive

Technique: Kernel fusion for LayerNorm + Activation
Memory: Coalesced access patterns, shared memory
Stack: CUDA C++, PyTorch extensions, nvprof
Impact: 46% inference speedup for LLMs
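
The arithmetic that a fused LayerNorm + activation kernel performs can be sketched as a CPU reference in plain Python. This is an illustration, not the CUDA implementation: the choice of GELU as the activation, the `eps` default, and the function names are assumptions. The unfused path materializes the normalized vector before applying the activation, while the fused path applies the activation as each normalized element is produced, so no intermediate buffer is needed.

```python
import math


def gelu(v):
    """Exact GELU using the Gaussian error function (assumed activation)."""
    return 0.5 * v * (1.0 + math.erf(v / math.sqrt(2.0)))


def layer_norm(x, gamma, beta, eps=1e-5):
    """Normalize x to zero mean and unit variance, then scale and shift."""
    mean = sum(x) / len(x)
    var = sum((v - mean) ** 2 for v in x) / len(x)
    inv_std = 1.0 / math.sqrt(var + eps)
    return [(v - mean) * inv_std * g + b for v, g, b in zip(x, gamma, beta)]


def unfused(x, gamma, beta):
    normed = layer_norm(x, gamma, beta)   # intermediate vector written out in full
    return [gelu(v) for v in normed]      # second pass reads it back


def fused(x, gamma, beta, eps=1e-5):
    mean = sum(x) / len(x)
    var = sum((v - mean) ** 2 for v in x) / len(x)
    inv_std = 1.0 / math.sqrt(var + eps)
    # Normalize and activate in the same pass; no intermediate vector is stored.
    return [gelu((v - mean) * inv_std * g + b) for v, g, b in zip(x, gamma, beta)]


x = [1.0, 2.0, 3.0, 4.0]
gamma, beta = [1.0] * 4, [0.0] * 4
print(max(abs(a - b) for a, b in zip(unfused(x, gamma, beta), fused(x, gamma, beta))))
```

A reference like this is mainly useful for validating a GPU kernel's output. On a GPU the benefit of fusion comes from saved memory traffic, which this CPU sketch does not measure.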

🎮 High-Performance 3D Visualization

858 FPS · 1M+ points · 7.2x faster

Built a 3D point cloud viewer that outperforms industry standards by 7.2x. Implemented custom spatial indexing and SIMD optimizations to achieve real-time rendering of massive datasets.

Technical Deep Dive

Algorithm: Custom octree with frustum culling
Rendering: Instanced drawing with GPU batching
Stack: C++17, OpenGL 4.5, GLM, ImGui
Optimization: SIMD intrinsics for transforms
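
The octree plus frustum culling combination can be sketched in a few lines of Python. This is a simplified illustration, not the viewer's C++ code: the `OctreeNode` class, the `leaf_size` and `max_depth` defaults, and the plane representation `(nx, ny, nz, d)` with inward-facing normals are assumptions. A node whose bounding box lies entirely behind any plane is skipped together with its whole subtree. Culling is per box, so the result is conservative and may include points in boxes that only partly overlap the frustum.

```python
def aabb_outside_plane(lo, hi, plane):
    """True if the box is entirely on the negative side of plane (nx, ny, nz, d)."""
    nx, ny, nz, d = plane
    # Test only the box corner farthest along the plane normal.
    px = hi[0] if nx >= 0 else lo[0]
    py = hi[1] if ny >= 0 else lo[1]
    pz = hi[2] if nz >= 0 else lo[2]
    return nx * px + ny * py + nz * pz + d < 0


class OctreeNode:
    def __init__(self, lo, hi, points, leaf_size=4, depth=0, max_depth=8):
        self.lo, self.hi = lo, hi
        self.points = points
        self.children = []
        if len(points) > leaf_size and depth < max_depth:
            mid = tuple((a + b) / 2 for a, b in zip(lo, hi))
            octants = {}
            for p in points:
                # Key is one boolean per axis: is the point in the upper half?
                key = tuple(p[i] >= mid[i] for i in range(3))
                octants.setdefault(key, []).append(p)
            for key, pts in octants.items():
                c_lo = tuple(mid[i] if key[i] else lo[i] for i in range(3))
                c_hi = tuple(hi[i] if key[i] else mid[i] for i in range(3))
                self.children.append(
                    OctreeNode(c_lo, c_hi, pts, leaf_size, depth + 1, max_depth))
            self.points = []  # interior nodes keep no points of their own

    def visible(self, planes):
        """Collect points from every leaf whose box is not fully outside a plane."""
        if any(aabb_outside_plane(self.lo, self.hi, pl) for pl in planes):
            return []  # the entire subtree is culled with one test
        if not self.children:
            return list(self.points)
        out = []
        for child in self.children:
            out.extend(child.visible(planes))
        return out


points = [(x + 0.5, y + 0.5, z + 0.5)
          for x in range(4) for y in range(4) for z in range(4)]
root = OctreeNode((0.0, 0.0, 0.0), (4.0, 4.0, 4.0), points)
half_space = [(1.0, 0.0, 0.0, -3.25)]  # keeps the region where x >= 3.25
print(len(root.visible(half_space)))
```

A real view frustum would pass six planes (near, far, left, right, top, bottom); the single plane above just keeps the example easy to follow.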

🛠 Technical Expertise

Python · TypeScript · C++ · React · PyTorch · Docker · Kubernetes · Systems

📚 View Complete Tech Stack

Core Languages:
  Expert: [Python, TypeScript, C++, JavaScript]
  Proficient: [CUDA, SQL, Bash]

AI/ML Stack:
  Frameworks: [PyTorch, Transformers, LangChain, scikit-learn]
  Techniques: [Fine-tuning, RAG, Embeddings, Vector Search]
  Production: [ONNX, TensorRT, Model Quantization, Batching]

Backend Engineering:
  Python: [FastAPI, Django, Flask, Celery]
  Node.js: [NestJS, Express, Socket.IO, Bull]
  APIs: [REST, GraphQL, gRPC, WebSockets]

Frontend Development:
  Core: [React, Next.js, Redux, TypeScript]
  UI: [Tailwind CSS, Material-UI, Framer Motion]
  Advanced: [Three.js, D3.js, WebRTC, Canvas API]

Data & Infrastructure:
  Databases: [PostgreSQL, MongoDB, Redis, Elasticsearch]
  Vector DBs: [Pinecone, FAISS, Chroma, Qdrant]
  Message Queues: [RabbitMQ, Kafka, Redis Pub/Sub]

DevOps & Cloud:
  Containers: [Docker, Docker Compose, BuildKit]
  Orchestration: [Kubernetes, Helm, ArgoCD]
  CI/CD: [GitHub Actions, GitLab CI, Jenkins]
  Cloud: [AWS (EC2, S3, Lambda), GCP, Vercel]

Performance & Systems:
  GPU: [CUDA, cuDNN, Thrust, OptiX]
  CPU: [SIMD, OpenMP, Threading, Profiling]
  Graphics: [OpenGL, Vulkan, Shaders]

💡 Engineering Philosophy

User First
Every optimization should improve the user experience

Data Driven
Measure twice, optimize once, validate always

Ship Fast
Perfect tomorrow loses to good today

Think Scale
Build for 10x growth from day one

📈 What I Bring to Your Team

Capability	Evidence
🏗️ Full Product Ownership	Shipped end-to-end solutions from concept to production
⚡ Performance Excellence	1.46x-7.2x improvements across different domains
📊 Production Experience	Deployed scalable systems with real-world usage
🎯 Technical Precision	94% ML accuracy, 95.3% GPU efficiency achieved
🚀 Rapid Execution	From idea to MVP in days, not months

🎯 Looking For My Next Adventure

I'm excited about joining teams that are:

Building products that matter - Real problems, real impact, real users
Pushing technical boundaries - Where "impossible" is just another challenge
Moving fast with purpose - Velocity with vision, not just for speed's sake
Creating the future - Not just following trends, but setting them

Open to Opportunities In:

📬 Let's Connect

I'm always excited to discuss challenging problems and explore how I can contribute to your team's success.

Whether you're building the next breakthrough in AI, scaling systems to billions, or creating products that change lives - let's talk.

Status: Actively seeking new opportunities | Availability: Immediate | Location: Flexible/Remote

Pinned

llm-knowledge-assistant (Public)
Production-ready RAG system with fine-tuned Llama-3.1-8B for expert-level domain Q&A
Python

Fused-LayerNorm-CUDA-Operator (Public)
High-performance CUDA implementation of LayerNorm for PyTorch achieving 1.46x speedup through kernel fusion. Optimized for large language models (4K-8K hidden dims) with vectorized memory access, w…
Python

Mustard-Watch-Party (Public)
Real-time video synchronization platform for YouTube watch parties. Built with React, NestJS, Socket.IO WebSockets, PostgreSQL & Prisma ORM. Features <500ms sync latency, multi-user rooms, JWT auth…
TypeScript

pytorch-autotune (Public)
🚀 2-4x faster PyTorch training with one line of code. Beats torch.compile by 79%. Zero config, automatic hardware optimization for T4/V100/A100/H100 GPUs.
Python

student-scheduler (Public)
Google OR-Tools constraint solver scheduling 500+ students. Flask/PostgreSQL/Redis backend, Docker/K8s deployment, CI/CD. Zero conflicts, 60-second optimization.
Python
