Zhang et al., 2025
| Publication | Title | Publication Date |
|---|---|---|
| Tahboub et al. | How to architect a query compiler, revisited | |
| Crotty et al. | Tupleware: "Big" Data, Big Analytics, Small Clusters | |
| Dyer et al. | Boa: Ultra-large-scale software repository and source-code mining | |
| Alexandrov et al. | The Stratosphere platform for big data analytics | |
| Dyer et al. | Boa: A language and infrastructure for analyzing ultra-large-scale software repositories | |
| Armbrust et al. | Spark SQL: Relational data processing in Spark | |
| Crotty et al. | An architecture for compiling UDF-centric workflows | |
| Gates et al. | Building a high-level dataflow system on top of Map-Reduce: the Pig experience | |
| Sakr et al. | The family of MapReduce and large-scale data processing systems | |
| Beyer et al. | Jaql: A scripting language for large scale semistructured data analysis | |
| Foufoulas et al. | YeSQL: "you extend SQL" with rich and highly performant user-defined functions in relational databases | |
| Gupta et al. | Aggify: Lifting the curse of cursor loops using custom aggregates | |
| Zhang et al. | G-TADOC: Enabling efficient GPU-based text analytics without decompression | |
| Wu et al. | Big data programming models | |
| Birjali et al. | Evaluation of high-level query languages based on MapReduce in Big Data | |
| Phani et al. | UPLIFT: parallelization strategies for feature transformations in machine learning workloads | |
| Manne et al. | CHEX: multiversion replay with ordered checkpoints | |
| Xu et al. | Stochastic gradient descent without full data shuffle: with applications to in-database machine learning and deep learning systems | |
| Kunft et al. | ScootR: Scaling R dataframes on dataflow systems | |
| Zhang et al. | Mitigating the Impedance Mismatch between Prediction Query Execution and Database Engine | |
| Kim et al. | STRADS-AP: Simplifying Distributed Machine Learning Programming without Introducing a New Programming Model | |
| Preethi et al. | Big data analytics using Hadoop tools—Apache Hive vs Apache Pig | |
| Zhang et al. | Optimizing random access to hierarchically-compressed data on GPU | |
| Chandramouli et al. | The Trill incremental analytics engine | |
| Middleton et al. | ECL/HPCC: A unified approach to big data | |