📚 Papers & tech blogs by companies sharing their work on data science & machine learning in production.
eugeneyan/applied-ml
Curated papers, articles, and blogs on data science & machine learning in production. ⚙️
Figuring out how to implement your ML project? Learn how other organizations did it:
- How the problem is framed 🔎 (e.g., personalization as recsys vs. search vs. sequences)
- What machine learning techniques worked ✅ (and sometimes, what didn't ❌)
- Why it works, the science behind it with research, literature, and references 📂
- What real-world results were achieved (so you can better assess ROI ⏰💰📈)
P.S., Want a summary of ML advancements? 👉 ml-surveys
P.P.S., Looking for guides and interviews on applying ML? 👉 applyingML
Table of Contents
- Data Quality
- Data Engineering
- Data Discovery
- Feature Stores
- Classification
- Regression
- Forecasting
- Recommendation
- Search & Ranking
- Embeddings
- Natural Language Processing
- Sequence Modelling
- Computer Vision
- Reinforcement Learning
- Anomaly Detection
- Graph
- Optimization
- Information Extraction
- Weak Supervision
- Generation
- Audio
- Privacy-Preserving Machine Learning
- Validation and A/B Testing
- Model Management
- Efficiency
- Ethics
- Infra
- MLOps Platforms
- Practices
- Team Structure
- Fails
- Reliable and Scalable Data Ingestion at Airbnb
Airbnb2016 - Monitoring Data Quality at Scale with Statistical Modeling
Uber2017 - Data Management Challenges in Production Machine Learning (Paper)
Google2017 - Automating Large-Scale Data Quality Verification (Paper)
Amazon2018 - Meet Hodor — Gojek’s Upstream Data Quality Tool
Gojek2019 - Data Validation for Machine Learning (Paper)
Google2019 - An Approach to Data Quality for Netflix Personalization Systems
Netflix2020 - Improving Accuracy By Certainty Estimation of Human Decisions, Labels, and Raters (Paper)
Facebook2020
- Zipline: Airbnb’s Machine Learning Data Management Platform
Airbnb2018 - Sputnik: Airbnb’s Apache Spark Framework for Data Engineering
Airbnb2020 - Unbundling Data Science Workflows with Metaflow and AWS Step Functions
Netflix2020 - How DoorDash is Scaling its Data Platform to Delight Customers and Meet Growing Demand
DoorDash2020 - Revolutionizing Money Movements at Scale with Strong Data Consistency
Uber2020 - Zipline - A Declarative Feature Engineering Framework
Airbnb2020 - Automating Data Protection at Scale, Part 1 (Part 2)
Airbnb2021 - Real-time Data Infrastructure at Uber
Uber2021 - Introducing Fabricator: A Declarative Feature Engineering Framework
DoorDash2022 - Functions & DAGs: introducing Hamilton, a microframework for dataframe generation
Stitch Fix2021 - Optimizing Pinterest’s Data Ingestion Stack: Findings and Learnings
Pinterest2022 - Lessons Learned From Running Apache Airflow at Scale
Shopify2022 - Understanding Data Storage and Ingestion for Large-Scale Deep Recommendation Model Training
Meta2022 - Data Mesh — A Data Movement and Processing Platform @ Netflix
Netflix2022 - Building Scalable Real Time Event Processing with Kafka and Flink
DoorDash2022
- Apache Atlas: Data Governance and Metadata Framework for Hadoop (Code)
Apache - Collect, Aggregate, and Visualize a Data Ecosystem's Metadata (Code)
WeWork - Discovery and Consumption of Analytics Data at Twitter
Twitter2016 - Democratizing Data at Airbnb
Airbnb2017 - Databook: Turning Big Data into Knowledge with Metadata at Uber
Uber2018 - Metacat: Making Big Data Discoverable and Meaningful at Netflix (Code)
Netflix2018 - Amundsen — Lyft’s Data Discovery & Metadata Engine
Lyft2019 - Open Sourcing Amundsen: A Data Discovery And Metadata Platform (Code)
Lyft2019 - DataHub: A Generalized Metadata Search & Discovery Tool (Code)
LinkedIn2019 - Amundsen: One Year Later
Lyft2020 - Using Amundsen to Support User Privacy via Metadata Collection at Square
Square2020 - Turning Metadata Into Insights with Databook
Uber2020 - DataHub: Popular Metadata Architectures Explained
LinkedIn2020 - How We Improved Data Discovery for Data Scientists at Spotify
Spotify2020 - How We’re Solving Data Discovery Challenges at Shopify
Shopify2020 - Nemo: Data discovery at Facebook
Facebook2020 - Exploring Data @ Netflix (Code)
Netflix2021
- Distributed Time Travel for Feature Generation
Netflix2016 - Building the Activity Graph, Part 2 (Feature Storage Section)
LinkedIn2017 - Fact Store at Scale for Netflix Recommendations
Netflix2018 - Zipline: Airbnb’s Machine Learning Data Management Platform
Airbnb2018 - Feature Store: The missing data layer for Machine Learning pipelines?
Hopsworks2018 - Introducing Feast: An Open Source Feature Store for Machine Learning (Code)
Gojek2019 - Michelangelo Palette: A Feature Engineering Platform at Uber
Uber2019 - The Architecture That Powers Twitter's Feature Store
Twitter2019 - Accelerating Machine Learning with the Feature Store Service
Condé Nast2019 - Feast: Bridging ML Models and Data
Gojek2020 - Building a Scalable ML Feature Store with Redis, Binary Serialization, and Compression
DoorDash2020 - Rapid Experimentation Through Standardization: Typed AI features for LinkedIn’s Feed
LinkedIn2020 - Building a Feature Store
Monzo Bank2020 - Butterfree: A Spark-based Framework for Feature Store Building (Code)
QuintoAndar2020 - Building Riviera: A Declarative Real-Time Feature Engineering Framework
DoorDash2021 - Optimal Feature Discovery: Better, Leaner Machine Learning Models Through Information Theory
Uber2021 - ML Feature Serving Infrastructure at Lyft
Lyft2021 - Near real-time features for near real-time personalization
LinkedIn2022 - Building the Model Behind DoorDash’s Expansive Merchant Selection
DoorDash2022 - Open sourcing Feathr – LinkedIn’s feature store for productive machine learning
LinkedIn2022 - Evolution of ML Fact Store
Netflix2022 - Developing scalable feature engineering DAGs
Metaflow + Hamilton via Outerbounds2022 - Feature Store Design at Constructor
Constructor.io2023
- Prediction of Advertiser Churn for Google AdWords (Paper)
Google2010 - High-Precision Phrase-Based Document Classification on a Modern Scale (Paper)
LinkedIn2011 - Chimera: Large-scale Classification using Machine Learning, Rules, and Crowdsourcing (Paper)
Walmart2014 - Large-scale Item Categorization in e-Commerce Using Multiple Recurrent Neural Networks (Paper)
NAVER2016 - Learning to Diagnose with LSTM Recurrent Neural Networks (Paper)
Google2017 - Discovering and Classifying In-app Message Intent at Airbnb
Airbnb2019 - Teaching Machines to Triage Firefox Bugs
Mozilla2019 - Categorizing Products at Scale
Shopify2020 - How We Built the Good First Issues Feature
GitHub2020 - Testing Firefox More Efficiently with Machine Learning
Mozilla2020 - Using ML to Subtype Patients Receiving Digital Mental Health Interventions (Paper)
Microsoft2020 - Scalable Data Classification for Security and Privacy (Paper)
Facebook2020 - Uncovering Online Delivery Menu Best Practices with Machine Learning
DoorDash2020 - Using a Human-in-the-Loop to Overcome the Cold Start Problem in Menu Item Tagging
DoorDash2020 - Deep Learning: Product Categorization and Shelving
Walmart2021 - Large-scale Item Categorization for e-Commerce (Paper)
DianPing, eBay2012 - Semantic Label Representation with an Application on Multimodal Product Categorization
Walmart2022 - Building Airbnb Categories with ML and Human-in-the-Loop
Airbnb2022
- Using Machine Learning to Predict Value of Homes On Airbnb
Airbnb2017 - Using Machine Learning to Predict the Value of Ad Requests
Twitter2020 - Open-Sourcing Riskquant, a Library for Quantifying Risk (Code)
Netflix2020 - Solving for Unobserved Data in a Regression Model Using a Simple Data Adjustment
DoorDash2020
- Engineering Extreme Event Forecasting at Uber with RNN
Uber2017 - Forecasting at Uber: An Introduction
Uber2018 - Transforming Financial Forecasting with Data Science and Machine Learning at Uber
Uber2018 - Under the Hood of Gojek’s Automated Forecasting Tool
Gojek2019 - BusTr: Predicting Bus Travel Times from Real-Time Traffic (Paper,Video)
Google2020 - Retraining Machine Learning Models in the Wake of COVID-19
DoorDash2020 - Automatic Forecasting using Prophet, Databricks, Delta Lake and MLflow (Paper,Code)
Atlassian2020 - Introducing Orbit, An Open Source Package for Time Series Inference and Forecasting (Paper,Video,Code)
Uber2021 - Managing Supply and Demand Balance Through Machine Learning
DoorDash2021 - Greykite: A flexible, intuitive, and fast forecasting library
LinkedIn2021 - The history of Amazon’s forecasting algorithm
Amazon2021 - DeepETA: How Uber Predicts Arrival Times Using Deep Learning
Uber2022 - Forecasting Grubhub Order Volume At Scale
Grubhub2022 - Causal Forecasting at Lyft (Part 1)
Lyft2022
- Amazon.com Recommendations: Item-to-Item Collaborative Filtering (Paper)
Amazon2003 - Netflix Recommendations: Beyond the 5 stars (Part 1) (Part 2)
Netflix2012 - How Music Recommendation Works — And Doesn’t Work
Spotify2012 - Learning to Rank Recommendations with the k-Order Statistic Loss (Paper)
Google2013 - Recommending Music on Spotify with Deep Learning
Spotify2014 - Learning a Personalized Homepage
Netflix2015 - The Netflix Recommender System: Algorithms, Business Value, and Innovation (Paper)
Netflix2015 - Session-based Recommendations with Recurrent Neural Networks (Paper)
Telefonica2016 - Deep Neural Networks for YouTube Recommendations
YouTube2016 - E-commerce in Your Inbox: Product Recommendations at Scale (Paper)
Yahoo2016 - To Be Continued: Helping you find shows to continue watching on Netflix
Netflix2016 - Personalized Recommendations in LinkedIn Learning
LinkedIn2016 - Personalized Channel Recommendations in Slack
Slack2016 - Recommending Complementary Products in E-Commerce Push Notifications (Paper)
Alibaba2017 - Artwork Personalization at Netflix
Netflix2017 - A Meta-Learning Perspective on Cold-Start Recommendations for Items (Paper)
Twitter2017 - Pixie: A System for Recommending 3+ Billion Items to 200+ Million Users in Real-Time (Paper)
Pinterest2017 - Powering Search & Recommendations at DoorDash
DoorDash2017 - How 20th Century Fox uses ML to predict a movie audience (Paper)
20th Century Fox2018 - Calibrated Recommendations (Paper)
Netflix2018 - Food Discovery with Uber Eats: Recommending for the Marketplace
Uber2018 - Explore, Exploit, and Explain: Personalizing Explainable Recommendations with Bandits (Paper)
Spotify2018 - Talent Search and Recommendation Systems at LinkedIn: Practical Challenges and Lessons Learned (Paper)
LinkedIn2018 - Behavior Sequence Transformer for E-commerce Recommendation in Alibaba (Paper)
Alibaba2019 - SDM: Sequential Deep Matching Model for Online Large-scale Recommender System (Paper)
Alibaba2019 - Multi-Interest Network with Dynamic Routing for Recommendation at Tmall (Paper)
Alibaba2019 - Personalized Recommendations for Experiences Using Deep Learning
TripAdvisor2019 - Powered by AI: Instagram’s Explore recommender system
Facebook2019 - Marginal Posterior Sampling for Slate Bandits (Paper)
Netflix2019 - Food Discovery with Uber Eats: Using Graph Learning to Power Recommendations
Uber2019 - Music recommendation at Spotify
Spotify2019 - Using Machine Learning to Predict what File you Need Next (Part 1)
Dropbox2019 - Using Machine Learning to Predict what File you Need Next (Part 2)
Dropbox2019 - Learning to be Relevant: Evolution of a Course Recommendation System (PAPER NEEDED)
LinkedIn2019 - Temporal-Contextual Recommendation in Real-Time (Paper)
Amazon2020 - P-Companion: A Framework for Diversified Complementary Product Recommendation (Paper)
Amazon2020 - Deep Interest with Hierarchical Attention Network for Click-Through Rate Prediction (Paper)
Alibaba2020 - TPG-DNN: A Method for User Intent Prediction with Multi-task Learning (Paper)
Alibaba2020 - PURS: Personalized Unexpected Recommender System for Improving User Satisfaction (Paper)
Alibaba2020 - Controllable Multi-Interest Framework for Recommendation (Paper)
Alibaba2020 - MiNet: Mixed Interest Network for Cross-Domain Click-Through Rate Prediction (Paper)
Alibaba2020 - ATBRG: Adaptive Target-Behavior Relational Graph Network for Effective Recommendation (Paper)
Alibaba2020 - For Your Ears Only: Personalizing Spotify Home with Machine Learning
Spotify2020 - Reach for the Top: How Spotify Built Shortcuts in Just Six Months
Spotify2020 - Contextual and Sequential User Embeddings for Large-Scale Music Recommendation (Paper)
Spotify2020 - The Evolution of Kit: Automating Marketing Using Machine Learning
Shopify2020 - A Closer Look at the AI Behind Course Recommendations on LinkedIn Learning (Part 1)
LinkedIn2020 - A Closer Look at the AI Behind Course Recommendations on LinkedIn Learning (Part 2)
LinkedIn2020 - Building a Heterogeneous Social Network Recommendation System
LinkedIn2020 - How TikTok recommends videos #ForYou
ByteDance2020 - Zero-Shot Heterogeneous Transfer Learning from RecSys to Cold-Start Search Retrieval (Paper)
Google2020 - Improved Deep & Cross Network for Feature Cross Learning in Web-scale LTR Systems (Paper)
Google2020 - Mixed Negative Sampling for Learning Two-tower Neural Networks in Recommendations (Paper)
Google2020 - Future Data Helps Training: Modeling Future Contexts for Session-based Recommendation (Paper)
Tencent2020 - A Case Study of Session-based Recommendations in the Home-improvement Domain (Paper)
Home Depot2020 - Balancing Relevance and Discovery to Inspire Customers in the IKEA App (Paper)
Ikea2020 - How we use AutoML, Multi-task learning and Multi-tower models for Pinterest Ads
Pinterest2020 - Multi-task Learning for Related Products Recommendations at Pinterest
Pinterest2020 - Improving the Quality of Recommended Pins with Lightweight Ranking
Pinterest2020 - Multi-task Learning and Calibration for Utility-based Home Feed Ranking
Pinterest2020 - Personalized Cuisine Filter Based on Customer Preference and Local Popularity
DoorDash2020 - How We Built a Matchmaking Algorithm to Cross-Sell Products
Gojek2020 - Lessons Learned Addressing Dataset Bias in Model-Based Candidate Generation (Paper)
Twitter2021 - Self-supervised Learning for Large-scale Item Recommendations (Paper)
Google2021 - Deep Retrieval: End-to-End Learnable Structure Model for Large-Scale Recommendations (Paper)
ByteDance2021 - Using AI to Help Health Experts Address the COVID-19 Pandemic
Facebook2021 - Advertiser Recommendation Systems at Pinterest
Pinterest2021 - On YouTube's Recommendation System
YouTube2021 - "Are you sure?": Preliminary Insights from Scaling Product Comparisons to Multiple Shops
Coveo2021 - Mozrt, a Deep Learning Recommendation System Empowering Walmart Store Associates
Walmart2021 - Understanding Data Storage and Ingestion for Large-Scale Deep Recommendation Model Training (Paper)
Meta2021 - The Amazon Music conversational recommender is hitting the right notes
Amazon2022 - Personalized complementary product recommendation (Paper)
Amazon2022 - Building a Deep Learning Based Retrieval System for Personalized Recommendations
eBay2022 - How We Built: An Early-Stage Machine Learning Model for Recommendations
Peloton2022 - Lessons Learned from Building out Context-Aware Recommender Systems
Peloton2022 - Beyond Matrix Factorization: Using hybrid features for user-business recommendations
Yelp2022 - Improving job matching with machine-learned activity features
LinkedIn2022 - Understanding Data Storage and Ingestion for Large-Scale Deep Recommendation Model Training
Meta2022 - Blueprints for recommender system architectures: 10th anniversary edition
Xavier Amatriain2022 - How Pinterest Leverages Realtime User Actions in Recommendation to Boost Homefeed Engagement Volume
Pinterest2022 - RecSysOps: Best Practices for Operating a Large-Scale Recommender System
Netflix2022 - Recommend API: Unified end-to-end machine learning infrastructure to generate recommendations
Slack2022 - Evolving DoorDash’s Substitution Recommendations Algorithm
DoorDash2022 - Homepage Recommendation with Exploitation and Exploration
DoorDash2022 - GPU-accelerated ML Inference at Pinterest
Pinterest2022 - Addressing Confounding Feature Issue for Causal Recommendation (Paper)
Tencent2022
- Amazon Search: The Joy of Ranking Products (Paper,Video,Code)
Amazon2016 - How Lazada Ranks Products to Improve Customer Experience and Conversion
Lazada2016 - Ranking Relevance in Yahoo Search (Paper)
Yahoo2016 - Learning to Rank Personalized Search Results in Professional Networks (Paper)
LinkedIn2016 - Using Deep Learning at Scale in Twitter’s Timelines
Twitter2017 - An Ensemble-based Approach to Click-Through Rate Prediction for Promoted Listings at Etsy (Paper)
Etsy2017 - Powering Search & Recommendations at DoorDash
DoorDash2017 - Applying Deep Learning To Airbnb Search (Paper)
Airbnb2018 - In-session Personalization for Talent Search (Paper)
LinkedIn2018 - Talent Search and Recommendation Systems at LinkedIn (Paper)
LinkedIn2018 - Food Discovery with Uber Eats: Building a Query Understanding Engine
Uber2018 - Globally Optimized Mutual Influence Aware Ranking in E-Commerce Search (Paper)
Alibaba2018 - Reinforcement Learning to Rank in E-Commerce Search Engine (Paper)
Alibaba2018 - Semantic Product Search (Paper)
Amazon2019 - Machine Learning-Powered Search Ranking of Airbnb Experiences
Airbnb2019 - Entity Personalized Talent Search Models with Tree Interaction Features (Paper)
LinkedIn2019 - The AI Behind LinkedIn Recruiter Search and recommendation systems
LinkedIn2019 - Learning Hiring Preferences: The AI Behind LinkedIn Jobs
LinkedIn2019 - The Secret Sauce Behind Search Personalisation
Gojek2019 - Neural Code Search: ML-based Code Search Using Natural Language Queries
Facebook2019 - Aggregating Search Results from Heterogeneous Sources via Reinforcement Learning (Paper)
Alibaba2019 - Cross-domain Attention Network with Wasserstein Regularizers for E-commerce Search
Alibaba2019 - Understanding Searches Better Than Ever Before (Paper)
Google2019 - How We Used Semantic Search to Make Our Search 10x Smarter
Tokopedia2019 - Query2vec: Search query expansion with query embeddings
GrubHub2019 - MOBIUS: Towards the Next Generation of Query-Ad Matching in Baidu’s Sponsored Search
Baidu2019 - Why Do People Buy Seemingly Irrelevant Items in Voice Product Search? (Paper)
Amazon2020 - Managing Diversity in Airbnb Search (Paper)
Airbnb2020 - Improving Deep Learning for Airbnb Search (Paper)
Airbnb2020 - Quality Matches Via Personalized AI for Hirer and Seeker Preferences
LinkedIn2020 - Understanding Dwell Time to Improve LinkedIn Feed Ranking
LinkedIn2020 - Ads Allocation in Feed via Constrained Optimization (Paper,Video)
LinkedIn2020 - AI at Scale in Bing
Microsoft2020 - Query Understanding Engine in Traveloka Universal Search
Traveloka2020 - Bayesian Product Ranking at Wayfair
Wayfair2020 - COLD: Towards the Next Generation of Pre-Ranking System (Paper)
Alibaba2020 - Shop The Look: Building a Large Scale Visual Shopping System at Pinterest (Paper,Video)
Pinterest2020 - Driving Shopping Upsells from Pinterest Search
Pinterest2020 - GDMix: A Deep Ranking Personalization Framework (Code)
LinkedIn2020 - Bringing Personalized Search to Etsy
Etsy2020 - Building a Better Search Engine for Semantic Scholar
Allen Institute for AI2020 - Query Understanding for Natural Language Enterprise Search (Paper)
Salesforce2020 - Things Not Strings: Understanding Search Intent with Better Recall
DoorDash2020 - Query Understanding for Surfacing Under-served Music Content (Paper)
Spotify2020 - Embedding-based Retrieval in Facebook Search (Paper)
Facebook2020 - Towards Personalized and Semantic Retrieval for E-commerce Search via Embedding Learning (Paper)
JD2020 - QUEEN: Neural query rewriting in e-commerce (Paper)
Amazon2021 - Using Learning-to-rank to Precisely Locate Where to Deliver Packages (Paper)
Amazon2021 - Seasonal relevance in e-commerce search (Paper)
Amazon2021 - Graph Intention Network for Click-through Rate Prediction in Sponsored Search (Paper)
Alibaba2021 - How We Built A Context-Specific Bidding System for Etsy Ads
Etsy2021 - Pre-trained Language Model based Ranking in Baidu Search (Paper)
Baidu2021 - Stitching together spaces for query-based recommendations
Stitch Fix2021 - Deep Natural Language Processing for LinkedIn Search Systems (Paper)
LinkedIn2021 - Siamese BERT-based Model for Web Search Relevance Ranking (Paper,Code)
Seznam2021 - SearchSage: Learning Search Query Representations at Pinterest
Pinterest2021 - Query2Prod2Vec: Grounded Word Embeddings for eCommerce
Coveo2021 - 3 Changes to Expand DoorDash’s Product Search Beyond Delivery
DoorDash2022 - Learning To Rank Diversely
Airbnb2022 - How to Optimise Rankings with Cascade Bandits
Expedia2022 - A Guide to Google Search Ranking Systems
Google2022 - Deep Learning for Search Ranking at Etsy
Etsy2022 - Search at Calm
Calm2022
- Vector Representation Of Items, Customer And Cart To Build A Recommendation System (Paper)
Sears2017 - Billion-scale Commodity Embedding for E-commerce Recommendation in Alibaba (Paper)
Alibaba2018 - Embeddings@Twitter
Twitter2018 - Listing Embeddings in Search Ranking (Paper)
Airbnb2018 - Understanding Latent Style
Stitch Fix2018 - Towards Deep and Representation Learning for Talent Search at LinkedIn (Paper)
LinkedIn2018 - Personalized Store Feed with Vector Embeddings
DoorDash2018 - Should we Embed? A Study on Performance of Embeddings for Real-Time Recommendations (Paper)
Moshbit2019 - Machine Learning for a Better Developer Experience
Netflix2020 - Announcing ScaNN: Efficient Vector Similarity Search (Paper,Code)
Google2020 - BERT Goes Shopping: Comparing Distributional Models for Product Representations
Coveo2021 - The Embeddings That Came in From the Cold: Improving Vectors for New and Rare Products with Content-Based Inference
Coveo2022 - Embedding-based Retrieval at Scribd
Scribd2021 - Multi-objective Hyper-parameter Optimization of Behavioral Song Embeddings (Paper)
Apple2022 - Embeddings at Spotify's Scale - How Hard Could It Be?
Spotify2023
- Abusive Language Detection in Online User Content (Paper)
Yahoo2016 - Smart Reply: Automated Response Suggestion for Email (Paper)
Google2016 - Building Smart Replies for Member Messages
LinkedIn2017 - How Natural Language Processing Helps LinkedIn Members Get Support Easily
LinkedIn2019 - Gmail Smart Compose: Real-Time Assisted Writing (Paper)
Google2019 - Goal-Oriented End-to-End Conversational Models with Profile Features in a Real-World Setting (Paper)
Amazon2019 - Give Me Jeans not Shoes: How BERT Helps Us Deliver What Clients Want
Stitch Fix2019 - DeText: A deep NLP Framework for Intelligent Text Understanding (Code)
LinkedIn2020 - SmartReply for YouTube Creators
Google2020 - Using Neural Networks to Find Answers in Tables (Paper)
Google2020 - A Scalable Approach to Reducing Gender Bias in Google Translate
Google2020 - Assistive AI Makes Replying Easier
Microsoft2020 - AI Advances to Better Detect Hate Speech
Facebook2020 - A State-of-the-Art Open Source Chatbot (Paper)
Facebook2020 - A Highly Efficient, Real-Time Text-to-Speech System Deployed on CPUs
Facebook2020 - Deep Learning to Translate Between Programming Languages (Paper,Code)
Facebook2020 - Deploying Lifelong Open-Domain Dialogue Learning (Paper)
Facebook2020 - Introducing Dynabench: Rethinking the way we benchmark AI
Facebook2020 - How Gojek Uses NLP to Name Pickup Locations at Scale
Gojek2020 - The State-of-the-art Open-Domain Chatbot in Chinese and English (Paper)
Baidu2020 - PEGASUS: A State-of-the-Art Model for Abstractive Text Summarization (Paper,Code)
Google2020 - Photon: A Robust Cross-Domain Text-to-SQL System (Paper) (Demo)
Salesforce2020 - GeDi: A Powerful New Method for Controlling Language Models (Paper,Code)
Salesforce2020 - Applying Topic Modeling to Improve Call Center Operations
RICOH2020 - WIDeText: A Multimodal Deep Learning Framework
Airbnb2020 - Dynaboard: Moving Beyond Accuracy to Holistic Model Evaluation in NLP (Code)
Facebook2021 - How we reduced our text similarity runtime by 99.96%
Microsoft2021 - Textless NLP: Generating expressive speech from raw audio (Part 1) (Part 2) (Part 3) (Code and Pretrained Models)
Facebook2021 - Grammar Correction as You Type, on Pixel 6
Google2021 - Auto-generated Summaries in Google Docs
Google2022 - ML-Enhanced Code Completion Improves Developer Productivity
Google2022 - Words All the Way Down — Conversational Sentiment Analysis
PayPal2022
- Doctor AI: Predicting Clinical Events via Recurrent Neural Networks (Paper)
Sutter Health2015 - Deep Learning for Understanding Consumer Histories (Paper)
Zalando2016 - Using Recurrent Neural Network Models for Early Detection of Heart Failure Onset (Paper)
Sutter Health2016 - Continual Prediction of Notification Attendance with Classical and Deep Networks (Paper)
Telefonica2017 - Deep Learning for Electronic Health Records (Paper)
Google2018 - Practice on Long Sequential User Behavior Modeling for Click-Through Rate Prediction (Paper)
Alibaba2019 - Search-based User Interest Modeling with Sequential Behavior Data for CTR Prediction (Paper)
Alibaba2020 - How Duolingo uses AI in every part of its app
Duolingo2020 - Leveraging Online Social Interactions For Enhancing Integrity at Facebook (Paper,Video)
Facebook2020 - Using deep learning to detect abusive sequences of member activity (Video)
LinkedIn2021
- Creating a Modern OCR Pipeline Using Computer Vision and Deep Learning
Dropbox2017 - Categorizing Listing Photos at Airbnb
Airbnb2018 - Amenity Detection and Beyond — New Frontiers of Computer Vision at Airbnb
Airbnb2019 - How we Improved Computer Vision Metrics by More Than 5% Only by Cleaning Labelling Errors
Deepomatic - Making machines recognize and transcribe conversations in meetings using audio and video
Microsoft2019 - Powered by AI: Advancing product understanding and building new shopping experiences
Facebook2020 - A Neural Weather Model for Eight-Hour Precipitation Forecasting (Paper)
Google2020 - Machine Learning-based Damage Assessment for Disaster Relief (Paper)
Google2020 - RepNet: Counting Repetitions in Videos (Paper)
Google2020 - Converting Text to Images for Product Discovery (Paper)
Amazon2020 - How Disney Uses PyTorch for Animated Character Recognition
Disney2020 - Image Captioning as an Assistive Technology (Video)
IBM2020 - AI for AG: Production machine learning for agriculture
Blue River2020 - AI for Full-Self Driving at Tesla
Tesla2020 - On-device Supermarket Product Recognition
Google2020 - Using Machine Learning to Detect Deficient Coverage in Colonoscopy Screenings (Paper)
Google2020 - Shop The Look: Building a Large Scale Visual Shopping System at Pinterest (Paper,Video)
Pinterest2020 - Developing Real-Time, Automatic Sign Language Detection for Video Conferencing (Paper)
Google2020 - Vision-based Price Suggestion for Online Second-hand Items (Paper)
Alibaba2020 - New AI Research to Help Predict COVID-19 Resource Needs From X-rays (Paper,Model)
Facebook2021 - An Efficient Training Approach for Very Large Scale Face Recognition (Paper)
Alibaba2021 - Identifying Document Types at Scribd
Scribd2021 - Semi-Supervised Visual Representation Learning for Fashion Compatibility (Paper)
Walmart2021 - Recognizing People in Photos Through Private On-Device Machine Learning
Apple2021 - DeepFusion: Lidar-Camera Deep Fusion for Multi-Modal 3D Object Detection
Google2022 - Contrastive language and vision learning of general fashion concepts (Paper)
Coveo2022 - Leveraging Computer Vision for Search Ranking
BazaarVoice2023
- Deep Reinforcement Learning for Sponsored Search Real-time Bidding (Paper)
Alibaba2018 - Budget Constrained Bidding by Model-free Reinforcement Learning in Display Advertising (Paper)
Alibaba2018 - Reinforcement Learning for On-Demand Logistics
DoorDash2018 - Reinforcement Learning to Rank in E-Commerce Search Engine (Paper)
Alibaba2018 - Dynamic Pricing on E-commerce Platform with Deep Reinforcement Learning (Paper)
Alibaba2019 - Productionizing Deep Reinforcement Learning with Spark and MLflow
Zynga2020 - Deep Reinforcement Learning in Production (Part 1) (Part 2)
Zynga2020 - Building AI Trading Systems
Denny Britz2020 - Shifting Consumption towards Diverse content via Reinforcement Learning (Paper)
Spotify2022 - Bandits for Online Calibration: An Application to Content Moderation on Social Media Platforms
Meta2022 - How to Optimise Rankings with Cascade Bandits
Expedia2022 - Selecting the Best Image for Each Merchant Using Exploration and Machine Learning
DoorDash2023
- Detecting Performance Anomalies in External Firmware Deployments
Netflix2019 - Detecting and Preventing Abuse on LinkedIn using Isolation Forests (Code)
LinkedIn2019 - Deep Anomaly Detection with Spark and TensorFlow (Hopsworks Video)
Swedbank, Hopsworks2019 - Preventing Abuse Using Unsupervised Learning
LinkedIn2020 - The Technology Behind Fighting Harassment on LinkedIn
LinkedIn2020 - Uncovering Insurance Fraud Conspiracy with Network Learning (Paper)
Ant Financial2020 - How Does Spam Protection Work on Stack Exchange?
Stack Exchange2020 - Auto Content Moderation in C2C e-Commerce
Mercari2020 - Blocking Slack Invite Spam With Machine Learning
Slack2020 - Cloudflare Bot Management: Machine Learning and More
Cloudflare2020 - Anomalies in Oil Temperature Variations in a Tunnel Boring Machine
SENER2020 - Using Anomaly Detection to Monitor Low-Risk Bank Customers
Rabobank2020 - Fighting fraud with Triplet Loss
OLX Group2020 - Facebook is Now Using AI to Sort Content for Quicker Moderation (Alternative)
Facebook2020 - How AI is getting better at detecting hate speech (Part 1, Part 2, Part 3, Part 4)
Facebook2020 - Using deep learning to detect abusive sequences of member activity (Video)
LinkedIn2021 - Project RADAR: Intelligent Early Fraud Detection System with Humans in the Loop
Uber2022 - Graph for Fraud Detection
Grab2022 - Bandits for Online Calibration: An Application to Content Moderation on Social Media Platforms
Meta2022 - Evolving our machine learning to stop mobile bots
Cloudflare2022 - Improving the accuracy of our machine learning WAF using data augmentation and sampling
Cloudflare2022 - Machine Learning for Fraud Detection in Streaming Services
Netflix2022 - Pricing at Lyft
Lyft2022
- Building The LinkedIn Knowledge Graph
LinkedIn2016 - Scaling Knowledge Access and Retrieval at Airbnb
Airbnb2018 - Graph Convolutional Neural Networks for Web-Scale Recommender Systems (Paper)
Pinterest2018 - Food Discovery with Uber Eats: Using Graph Learning to Power Recommendations
Uber2019 - AliGraph: A Comprehensive Graph Neural Network Platform (Paper)
Alibaba2019 - Contextualizing Airbnb by Building Knowledge Graph
Airbnb2019 - Retail Graph — Walmart’s Product Knowledge Graph
Walmart2020 - Traffic Prediction with Advanced Graph Neural Networks
DeepMind2020 - SimClusters: Community-Based Representations for Recommendations (Paper,Video)
Twitter2020 - Metapaths guided Neighbors aggregated Network for Heterogeneous Graph Reasoning (Paper)
Alibaba2021 - Graph Intention Network for Click-through Rate Prediction in Sponsored Search (Paper)
Alibaba2021 - JEL: Applying End-to-End Neural Entity Linking in JPMorgan Chase (Paper)
JPMorgan Chase2021 - How AWS uses graph neural networks to meet customer needs
Amazon2022 - Graph for Fraud Detection
Grab2022
- Matchmaking in Lyft Line (Part 1)(Part 2)(Part 3)
Lyft2016 - The Data and Science behind GrabShare Carpooling (Part 1) (PAPER NEEDED)
Grab2017 - How Trip Inferences and Machine Learning Optimize Delivery Times on Uber Eats
Uber2018 - Next-Generation Optimization for Dasher Dispatch at DoorDash
DoorDash2020 - Optimization of Passengers Waiting Time in Elevators Using Machine Learning
Thyssen Krupp AG2020 - Think Out of The Package: Recommending Package Types for E-commerce Shipments (Paper)
Amazon2020 - Optimizing DoorDash’s Marketing Spend with Machine Learning
DoorDash2020 - Using learning-to-rank to precisely locate where to deliver packages (Paper)
Amazon2021
- Unsupervised Extraction of Attributes and Their Values from Product Description (Paper)
Rakuten2013 - Using Machine Learning to Index Text from Billions of Images
Dropbox2018 - Extracting Structured Data from Templatic Documents (Paper)
Google2020 - AutoKnow: self-driving knowledge collection for products of thousands of types (Paper, Video)
Amazon2020 - One-shot Text Labeling using Attention and Belief Propagation for Information Extraction (Paper)
Alibaba2020 - Information Extraction from Receipts with Graph Convolutional Networks
Nanonets2021
- Snorkel DryBell: A Case Study in Deploying Weak Supervision at Industrial Scale (Paper)
Google2019 - Osprey: Weak Supervision of Imbalanced Extraction Problems without Code (Paper)
Intel2019 - Overton: A Data System for Monitoring and Improving Machine-Learned Products (Paper)
Apple2019 - Bootstrapping Conversational Agents with Weak Supervision (Paper)
IBM2019
- Better Language Models and Their Implications (Paper)
OpenAI2019 - Image GPT (Paper, Code)
OpenAI2019 - Language Models are Few-Shot Learners (Paper) (GPT-3 Blog post)
OpenAI2020 - Deep Learned Super Resolution for Feature Film Production (Paper)
Pixar2020 - Unit Test Case Generation with Transformers
Microsoft2021
- Improving On-Device Speech Recognition with VoiceFilter-Lite (Paper)
Google2020 - The Machine Learning Behind Hum to Search
Google2020
- Federated Learning: Collaborative Machine Learning without Centralized Training Data (Paper)
Google2017 - Federated Learning with Formal Differential Privacy Guarantees (Paper)
Google2022 - MPC-based machine learning: Achieving end-to-end privacy-preserving machine learning (Paper)
Facebook2022
- Overlapping Experiment Infrastructure: More, Better, Faster Experimentation (Paper)
Google2010 - The Reusable Holdout: Preserving Validity in Adaptive Data Analysis (Paper)
Google2015 - Twitter Experimentation: Technical Overview
Twitter2015 - It’s All A/Bout Testing: The Netflix Experimentation Platform
Netflix2016 - Building Pinterest’s A/B Testing Platform
Pinterest2016 - Experimenting to Solve Cramming
Twitter2017 - Building an Intelligent Experimentation Platform with Uber Engineering
Uber2017 - Scaling Airbnb’s Experimentation Platform
Airbnb2017 - Meet Wasabi, an Open Source A/B Testing Platform (Code)
Intuit2017 - Analyzing Experiment Outcomes: Beyond Average Treatment Effects
Uber2018 - Under the Hood of Uber’s Experimentation Platform
Uber2018 - Constrained Bayesian Optimization with Noisy Experiments (Paper)
Facebook2018 - Reliable and Scalable Feature Toggles and A/B Testing SDK at Grab
Grab2018 - Modeling Conversion Rates and Saving Millions Using Kaplan-Meier and Gamma Distributions (Code)
Better2019 - Detecting Interference: An A/B Test of A/B Tests
LinkedIn2019 - Announcing a New Framework for Designing Optimal Experiments with Pyro (Paper) (Paper)
Uber2020 - Enabling 10x More Experiments with Traveloka Experiment Platform
Traveloka2020 - Large Scale Experimentation at Stitch Fix (Paper)
Stitch Fix2020 - Multi-Armed Bandits and the Stitch Fix Experimentation Platform
Stitch Fix2020 - Experimentation with Resource Constraints
Stitch Fix2020 - Computational Causal Inference at Netflix (Paper)
Netflix2020 - Key Challenges with Quasi Experiments at Netflix
Netflix2020 - Making the LinkedIn experimentation engine 20x faster
LinkedIn2020 - Our Evolution Towards T-REX: The Prehistory of Experimentation Infrastructure at LinkedIn
LinkedIn2020 - How to Use Quasi-experiments and Counterfactuals to Build Great Products
Shopify2020 - Improving Experimental Power through Control Using Predictions as Covariate
DoorDash2020 - Supporting Rapid Product Iteration with an Experimentation Analysis Platform
DoorDash2020 - Improving Online Experiment Capacity by 4X with Parallelization and Increased Sensitivity
DoorDash2020 - Leveraging Causal Modeling to Get More Value from Flat Experiment Results
DoorDash2020 - Iterating Real-time Assignment Algorithms Through Experimentation
DoorDash2020 - Spotify’s New Experimentation Platform (Part 1) (Part 2)
Spotify2020 - Interpreting A/B Test Results: False Positives and Statistical Significance
Netflix2021 - Interpreting A/B Test Results: False Negatives and Power
Netflix2021 - Running Experiments with Google Adwords for Campaign Optimization
DoorDash2021 - The 4 Principles DoorDash Used to Increase Its Logistics Experiment Capacity by 1000%
DoorDash2021 - Experimentation Platform at Zalando: Part 1 - Evolution
Zalando2021 - Designing Experimentation Guardrails
Airbnb2021 - How Airbnb Measures Future Value to Standardize Tradeoffs
Airbnb2021 - Network Experimentation at Scale (Paper)
Facebook2021 - Universal Holdout Groups at Disney Streaming
Disney2021 - Experimentation is a major focus of Data Science across Netflix
Netflix2022 - Search Journey Towards Better Experimentation Practices
Spotify2022 - Artificial Counterfactual Estimation: Machine Learning-Based Causal Inference at Airbnb
Airbnb2022 - Beyond A/B Test: Speeding up Airbnb Search Ranking Experimentation through Interleaving
Airbnb2022 - Challenges in Experimentation
Lyft2022 - Overtracking and Trigger Analysis: Reducing sample sizes while INCREASING sensitivity
Booking2022 - Meet Dash-AB — The Statistics Engine of Experimentation at DoorDash
DoorDash2022 - Comparing quantiles at scale in online A/B-testing
Spotify2022 - Accelerating our A/B experiments with machine learning
Dropbox2023 - Supercharging A/B Testing at Uber
Uber
- Operationalizing Machine Learning—Managing Provenance from Raw Data to Predictions
Comcast2018 - Overton: A Data System for Monitoring and Improving Machine-Learned Products (Paper)
Apple2019 - Runway - Model Lifecycle Management at Netflix
Netflix2020 - Managing ML Models @ Scale - Intuit’s ML Platform
Intuit2020 - ML Model Monitoring - 9 Tips From the Trenches
Nubank2021 - Dealing with Train-serve Skew in Real-time ML Models: A Short Guide
Nubank2023
- GrokNet: Unified Computer Vision Model Trunk and Embeddings For Commerce (Paper)
Facebook2020 - How We Scaled Bert To Serve 1+ Billion Daily Requests on CPUs
Roblox2020 - Permute, Quantize, and Fine-tune: Efficient Compression of Neural Networks (Paper)
Uber2021 - GPU-accelerated ML Inference at Pinterest
Pinterest2022
- Building Inclusive Products Through A/B Testing (Paper)
LinkedIn2020 - LiFT: A Scalable Framework for Measuring Fairness in ML Applications (Paper)
LinkedIn2020 - Introducing Twitter’s first algorithmic bias bounty challenge
Twitter2021 - Examining algorithmic amplification of political content on Twitter
Twitter2021 - A closer look at how LinkedIn integrates fairness into its AI products
LinkedIn2022
- Reengineering Facebook AI’s Deep Learning Platforms for Interoperability
Facebook2020 - Elastic Distributed Training with XGBoost on Ray
Uber2021
- Meet Michelangelo: Uber’s Machine Learning Platform
Uber2017 - Operationalizing Machine Learning—Managing Provenance from Raw Data to Predictions
Comcast2018 - Big Data Machine Learning Platform at Pinterest
Pinterest2019 - Core Modeling at Instagram
Instagram2019 - Open-Sourcing Metaflow - a Human-Centric Framework for Data Science
Netflix2019 - Managing ML Models @ Scale - Intuit’s ML Platform
Intuit2020 - Real-time Machine Learning Inference Platform at Zomato
Zomato2020 - Introducing Flyte: Cloud Native Machine Learning and Data Processing Platform
Lyft2020 - Building Flexible Ensemble ML Models with a Computational Graph
DoorDash2021 - LyftLearn: ML Model Training Infrastructure built on Kubernetes
Lyft2021 - "You Don't Need a Bigger Boat": A Full Data Pipeline Built with Open-Source Tools (Paper)
Coveo2021 - MLOps at GreenSteam: Shipping Machine Learning
GreenSteam2021 - Evolving Reddit’s ML Model Deployment and Serving Architecture
Reddit2021 - Redesigning Etsy’s Machine Learning Platform
Etsy2021 - Understanding Data Storage and Ingestion for Large-Scale Deep Recommendation Model Training (Paper)
Meta2021 - Building a Platform for Serving Recommendations at Etsy
Etsy2022 - Intelligent Automation Platform: Empowering Conversational AI and Beyond at Airbnb
Airbnb2022 - DARWIN: Data Science and Artificial Intelligence Workbench at LinkedIn
LinkedIn2022 - The Magic of Merlin: Shopify's New Machine Learning Platform
Shopify2022 - Zalando's Machine Learning Platform
Zalando2022 - Inside Meta's AI optimization platform for engineers across the company (Paper)
Meta2022 - Monzo’s machine learning stack
Monzo2022 - Evolution of ML Fact Store
Netflix2022 - Using MLOps to Build a Real-time End-to-End Machine Learning Pipeline
Binance2022 - Serving Machine Learning Models Efficiently at Scale at Zillow
Zillow2022 - Didact AI: The anatomy of an ML-powered stock picking engine
Didact AI2022 - Deployment for Free - A Machine Learning Platform for Stitch Fix's Data Scientists
Stitch Fix2022 - Machine Learning Operations (MLOps): Overview, Definition, and Architecture (Paper)
IBM2022
- Practical Recommendations for Gradient-Based Training of Deep Architectures (Paper)
Yoshua Bengio2012 - Machine Learning: The High Interest Credit Card of Technical Debt (Paper) (Paper)
Google2014 - Rules of Machine Learning: Best Practices for ML Engineering
Google2018 - On Challenges in Machine Learning Model Management
Amazon2018 - Machine Learning in Production: The Booking.com Approach
Booking2019 - 150 Successful Machine Learning Models: 6 Lessons Learned at Booking.com (Paper)
Booking2019 - Successes and Challenges in Adopting Machine Learning at Scale at a Global Bank
Rabobank2019 - Challenges in Deploying Machine Learning: a Survey of Case Studies (Paper)
Cambridge2020 - Reengineering Facebook AI’s Deep Learning Platforms for Interoperability
Facebook2020 - The problem with AI developer tools for enterprises
Databricks2020 - Continuous Integration and Deployment for Machine Learning Online Serving and Models
Uber2021 - Tuning Model Performance
Uber2021 - Maintaining Machine Learning Model Accuracy Through Monitoring
DoorDash2021 - Building Scalable and Performant Marketing ML Systems at Wayfair
Wayfair2021 - Our approach to building transparent and explainable AI systems
LinkedIn2021 - 5 Steps for Building Machine Learning Models for Business
Shopify2021 - Data Is An Art, Not Just A Science—And Storytelling Is The Key
Shopify2022 - Best Practices for Real-time Machine Learning: Alerting
Nubank2022 - Automatic Retraining for Machine Learning Models: Tips and Lessons Learned
Nubank2022 - RecSysOps: Best Practices for Operating a Large-Scale Recommender System
Netflix2022 - ML Education at Uber: Frameworks Inspired by Engineering Principles
Uber2022 - Building and Maintaining Internal Tools for DS/ML teams: Lessons Learned
Nubank2024
- What is the most effective way to structure a data science team?
Udemy2017 - Engineers Shouldn’t Write ETL: A Guide to Building a High Functioning Data Science Department
Stitch Fix2016 - Building The Analytics Team At Wish
Wish2018 - Beware the Data Science Pin Factory: The Power of the Full-Stack Data Science Generalist
Stitch Fix2019 - Cultivating Algorithms: How We Grow Data Science at Stitch Fix
Stitch Fix - Analytics at Netflix: Who We Are and What We Do
Netflix2020 - Building a Data Team at a Mid-stage Startup: A Short Story
Erikbern2021 - A Behind-the-Scenes Look at How Postman’s Data Team Works
Postman2021 - Data Scientist x Machine Learning Engineer Roles: How are they different? How are they alike?
Nubank2022
- When It Comes to Gorillas, Google Photos Remains Blind
Google2018 - 160k+ High School Students Will Graduate Only If a Model Allows Them to
International Baccalaureate2020 - An Algorithm That ‘Predicts’ Criminality Based on a Face Sparks a Furor
Harrisburg University2020 - It's Hard to Generate Neural Text From GPT-3 About Muslims
OpenAI2020 - A British AI Tool to Predict Violent Crime Is Too Flawed to Use
United Kingdom2020 - More in awful-ai
- AI Incident Database
Partnership on AI2022
P.S., Want a summary of ML advancements? Get up to speed with survey papers 👉ml-surveys