- Boston, MA
I'm a software engineer who transforms complex challenges into elegant solutions that scale. From optimizing CUDA kernels for 1.46x speedups to building real-time platforms with sub-500ms latency, I thrive at the intersection of technical excellence and business impact.
My approach is simple: measure twice, optimize once, ship constantly. Whether it's achieving 94% accuracy in production ML systems or rendering 1M+ points at 858 FPS, I believe in pushing the boundaries of what's possible while keeping the user experience at the center.
Currently seeking opportunities to tackle meaningful challenges at companies building the future.
Built a production RAG system with fine-tuned Llama-3.1-8B that matches GPT-4 quality at a fraction of the cost. Implemented custom attention caching that reduced latency by 73%, enabling real-time responses. Technical Deep Dive
Created a video watch party platform with perfect synchronization across distributed clients. Engineered a binary WebSocket protocol with delta compression, achieving sub-500ms latency for seamless real-time collaboration. Technical Deep Dive
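To make the delta-compression idea concrete, here is a minimal sketch of a binary sync message that carries only the fields that changed since the last update. The field names (`position_ms`, `playing`, `rate_x100`), the bitmask layout, and the helper names are assumptions for illustration; they are not taken from the project's actual protocol.

```python
import struct

# Hypothetical playback state; the real protocol's fields may differ.
FIELDS = ("position_ms", "playing", "rate_x100")
# Little-endian, fixed-width encodings for each hypothetical field.
_FORMATS = {"position_ms": "<I", "playing": "<B", "rate_x100": "<H"}


def encode_delta(prev: dict, curr: dict) -> bytes:
    """Pack only changed fields: a 1-byte bitmask followed by the changed values."""
    mask = 0
    payload = b""
    for bit, name in enumerate(FIELDS):
        if prev.get(name) != curr[name]:
            mask |= 1 << bit
            payload += struct.pack(_FORMATS[name], curr[name])
    return bytes([mask]) + payload


def apply_delta(prev: dict, message: bytes) -> dict:
    """Rebuild the current state from the previous state and a delta message."""
    state = dict(prev)
    mask = message[0]
    offset = 1
    for bit, name in enumerate(FIELDS):
        if mask & (1 << bit):
            fmt = _FORMATS[name]
            (state[name],) = struct.unpack_from(fmt, message, offset)
            offset += struct.calcsize(fmt)
    return state
```

In this sketch a position-only update costs 5 bytes (1 mask byte plus a 4-byte position), and an unchanged state costs a single byte, which is the kind of saving delta encoding is meant to provide over resending the full state.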
Developed fused CUDA kernels for transformer models, achieving near-theoretical memory bandwidth utilization. This optimization enables significantly faster inference for large language models through innovative kernel fusion techniques. Technical Deep Dive
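As a rough illustration of what a fused LayerNorm computes per row, the following is a plain-Python reference, not the CUDA kernel itself. It assumes a Welford-style single pass for the statistics followed by one pass that normalizes and applies the scale and shift together; whether the actual kernels use this scheme is not stated here, and the function name is hypothetical.

```python
import math


def layernorm_fused_row(x, gamma, beta, eps=1e-5):
    """CPU reference for one row: one statistics pass, one normalize+affine pass."""
    mean = 0.0
    m2 = 0.0
    # Welford's update: running mean and sum of squared deviations in a single pass.
    for n, v in enumerate(x, start=1):
        delta = v - mean
        mean += delta / n
        m2 += delta * (v - mean)
    var = m2 / len(x)  # population variance, as LayerNorm uses
    inv_std = 1.0 / math.sqrt(var + eps)
    # Normalization, scale (gamma), and shift (beta) applied together,
    # so no intermediate normalized tensor is written out and read back.
    return [(v - mean) * inv_std * g + b for v, g, b in zip(x, gamma, beta)]
```

The point of fusing is that the input row is read a small, fixed number of times and the output written once, instead of launching separate operations for mean, variance, normalization, and the affine transform.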
Built a 3D point cloud viewer that outperforms industry standards by 7.2x. Implemented custom spatial indexing and SIMD optimizations to achieve real-time rendering of massive datasets. Technical Deep Dive
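The text above does not say which spatial index the viewer uses. As one common option, here is a minimal uniform-grid sketch: points are bucketed by cell, and a box query only visits the cells that overlap the box. The function names and the `cell_size` parameter are assumptions for illustration.

```python
from collections import defaultdict


def build_grid(points, cell_size):
    """Bucket point indices by the integer grid cell that contains each point."""
    grid = defaultdict(list)
    for i, (x, y, z) in enumerate(points):
        key = (int(x // cell_size), int(y // cell_size), int(z // cell_size))
        grid[key].append(i)
    return grid


def query_box(grid, points, lo, hi, cell_size):
    """Return indices of points inside the axis-aligned box [lo, hi] (inclusive)."""
    lo_key = [int(c // cell_size) for c in lo]
    hi_key = [int(c // cell_size) for c in hi]
    hits = []
    # Visit only the cells overlapping the box, then test each candidate exactly.
    for kx in range(lo_key[0], hi_key[0] + 1):
        for ky in range(lo_key[1], hi_key[1] + 1):
            for kz in range(lo_key[2], hi_key[2] + 1):
                for i in grid.get((kx, ky, kz), ()):
                    p = points[i]
                    if all(lo[a] <= p[a] <= hi[a] for a in range(3)):
                        hits.append(i)
    return hits
```

A renderer could use the same lookup for culling, skipping cells outside the view volume rather than testing every point. Octrees or k-d trees are alternatives when point density varies a lot.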
📚 View Complete Tech Stack
- Core Languages
  - Expert: Python, TypeScript, C++, JavaScript
  - Proficient: CUDA, SQL, Bash
- AI/ML Stack
  - Frameworks: PyTorch, Transformers, LangChain, scikit-learn
  - Techniques: Fine-tuning, RAG, Embeddings, Vector Search
  - Production: ONNX, TensorRT, Model Quantization, Batching
- Backend Engineering
  - Python: FastAPI, Django, Flask, Celery
  - Node.js: NestJS, Express, Socket.IO, Bull
  - APIs: REST, GraphQL, gRPC, WebSockets
- Frontend Development
  - Core: React, Next.js, Redux, TypeScript
  - UI: Tailwind CSS, Material-UI, Framer Motion
  - Advanced: Three.js, D3.js, WebRTC, Canvas API
- Data & Infrastructure
  - Databases: PostgreSQL, MongoDB, Redis, Elasticsearch
  - Vector DBs: Pinecone, FAISS, Chroma, Qdrant
  - Message Queues: RabbitMQ, Kafka, Redis Pub/Sub
- DevOps & Cloud
  - Containers: Docker, Docker Compose, Buildkit
  - Orchestration: Kubernetes, Helm, ArgoCD
  - CI/CD: GitHub Actions, GitLab CI, Jenkins
  - Cloud: AWS (EC2, S3, Lambda), GCP, Vercel
- Performance & Systems
  - GPU: CUDA, cuDNN, Thrust, OptiX
  - CPU: SIMD, OpenMP, Threading, Profiling
  - Graphics: OpenGL, Vulkan, Shaders
| Capability | Evidence |
|---|---|
| 🏗️ Full Product Ownership | Shipped end-to-end solutions from concept to production |
| ⚡ Performance Excellence | 1.46x-7.2x improvements across different domains |
| 📊 Production Experience | Deployed scalable systems with real-world usage |
| 🎯 Technical Precision | 94% ML accuracy, 95.3% GPU efficiency achieved |
| 🚀 Rapid Execution | From idea to MVP in days, not months |
I'm excited about joining teams that are:
- Building products that matter - Real problems, real impact, real users
- Pushing technical boundaries - Where "impossible" is just another challenge
- Moving fast with purpose - Velocity with vision, not just for speed's sake
- Creating the future - Not just following trends, but setting them
I'm always excited to discuss challenging problems and explore how I can contribute to your team's success.
Whether you're building the next breakthrough in AI, scaling systems to billions, or creating products that change lives - let's talk.
Status: Actively seeking new opportunities | Availability: Immediate | Location: Flexible/Remote
Pinned
- llm-knowledge-assistant (Public, Python): Production-ready RAG system with fine-tuned Llama-3.1-8B for expert-level domain Q&A
- Fused-LayerNorm-CUDA-Operator (Public, Python): High-performance CUDA implementation of LayerNorm for PyTorch achieving 1.46x speedup through kernel fusion. Optimized for large language models (4K-8K hidden dims) with vectorized memory access, w…
- Mustard-Watch-Party (Public, TypeScript): Real-time video synchronization platform for YouTube watch parties. Built with React, NestJS, Socket.IO WebSockets, PostgreSQL & Prisma ORM. Features <500ms sync latency, multi-user rooms, JWT auth…
- pytorch-autotune (Public, Python): 🚀 2-4x faster PyTorch training with one line of code. Beats torch.compile by 79%. Zero config, automatic hardware optimization for T4/V100/A100/H100 GPUs.
- student-scheduler (Public, Python): Google OR-Tools constraint solver scheduling 500+ students. Flask/PostgreSQL/Redis backend, Docker/K8s deployment, CI/CD. Zero conflicts, 60-second optimization.