Movatterモバイル変換


[0]ホーム

URL:


Uploaded bytakezoe
PDF, PPTX2,425 views

Testing Distributed Query Engine as a Service

Naoki Takezoe from Treasure Data discussed testing their distributed query engine Presto as a service. They developed a tool called presto-query-simulator to test using production data and queries in a safe manner. The tool reduces testing time by grouping similar queries and narrowing data scans. It also helps analyze results and find problematic queries. Future work includes running tests more frequently and improving coverage.

Embed presentation

Download as PDF, PPTX
Naoki TakezoePresto Conference Tokyo 2020Nov 20, 2020Testing Distributed QueryEngine as a ServiceDeliver our service to customers as safe as possible
© 2020 Treasure DataWho am I?• Naoki Takezoe• Joined Treasure Data in 2018• Work for Presto / Apache Spark• Open Source• GitBucket• Scalatra• Apache PredictionIO• Books• Japanese translation of Scala Puzzlers• Scala 300 recipes, etcTwitter: @takezoenGitHub: https://github.com/takezoe
© 2020 Treasure DataTreasure DataLogsDeviceDataBatchDataPlazmaDBTable SchemaData Collection Cloud Storage Distributed Data ProcessingJobsJob ManagementSQL EditorSchedulerWorkflowsMachineLearningTreasure Data OSSThird Party OSSDataReady to use Cloud Data Platform
© 2020 Treasure DataPresto at Treasure Data• 2010• Presto, developed at Facebook, was open-sourced• Treasure Data was providing Impala As A Service• 2014• Launched Presto As A Service as a replacement of Impala• 2015• 20,000 queries / day• 2019• Reached 1,000,000 queries / day• Presto creators (Martin, Dain and David) left Facebook and founded anNPO Presto Software Foundation (prestosql), then joined Starburst• Hosted Presto Conference in Tokyo
© 2020 Treasure Data
© 2020 Treasure DataDeliver our service to customersas safe as possible
© 2020 Treasure DataTesting distributed database is challenging• Variety of workload• Possible performance degradation• Cluster status• Many corner cases
© 2020 Treasure DataTest can be more important when upgrading Presto• Presto development is super active• 27 releases in 2019• 18 releases in 2020 at this point (Nov 14)• No stable version• Incompatible updates come with bug fixes• Sticking to one version cannot be an option• Backport bug fixes and new features from newer version also getschallenging over timeHow we can upgrade Presto safely...?
© 2020 Treasure DataIn order to minimize the riskUnit test Integration test System testRegular performance provingGradual migration for big updateInternal dogfoodingCluster status monitoringTestRelease processMonitoring
© 2020 Treasure DataWhat are missing?• Covering variety of use cases• Performance degradation in corner cases• Unknown compatibility issues• Production-scale environment• Data size and characteristics• Number of queries, cluster size, etc
© 2020 Treasure DataWhat’s a solution?
© 2020 Treasure Datapresto-query-simulatorTest using production data and queries with security and safetyBase ClusterTarget ClusterQuery Log Hashed ResultsReportQuery SetReal Database Test Databaseread write• Security: We don’t see customer data and query results• Safety: We don’t cause any side-effect on customer dataQuery Metrics
© 2020 Treasure DataChallenges in query-simulator• Query simulation takes very long time• Testing 1-day queries will take 1 day at least, theoretically• Not only time, but also cost of test clusters is the matter• Result verification is not straightforward• Many false positives and duplications• Result analysis tends to depend on personal knowledge
© 2020 Treasure DataMake query simulation faster• Reduce number of queries by grouping by query signature (up to -90%)• Reduce amount of data by narrowing table scan ranges (up to -80%)• Use multiple Presto clusters• Test only long-running queries
© 2020 Treasure DataQuery signatureSELECT time, path, user_agentFROM accessWHERE TD_INTERVAL(time, '-1M')SELECT time, path, user_agentFROM access aINNER JOIN account b ON a.account_id = b.account_idS(T) access->#S(J(T,T)) access->#,account->#Simplified expression of query structureOpen-source Scala implementation is included in Airframe:https://github.com/wvlet/airframe/blob/master/airframe-sql/src/main/scala/wvlet/airframe/sql/analyzer/QuerySignature.scala
© 2020 Treasure DataNarrowing scan rangesTime distribution of recordsUse only x% of total records by adding a time range predicateSELECT time, parh, user_agentFROM accessSELECT time, path, user_agentFROM (SELECT time, path, user_agentFROM access)WHERE TD_TIME_RANGE(time, from, to)Original scan rangeUse this range only
© 2020 Treasure DataWe choose these options depending on thepurpose of query simulation• Reduce number of queries by grouping by query signature (up to -90%)• Reduce amount of data by narrowing table scan ranges (up to -80%)• Use multiple Presto clusters• Test only long-running queriesfor checking compatibility? or for checking performance difference?
© 2020 Treasure DataMake result verification easier• Auto detect non-deterministic query results• Running query multiple times to see if results are the same• Grouping similar errors• Fuzzy comparison of error messages•• List problematic queries based on internal metrics• Performance, resource usage, scan ranges, worker distribution, etc• Finally, check problematic queries by human
© 2020 Treasure DataWe just need to check queries listed on the reportGive a possible reason ofthe inconsistent resultFailures are grouped by thesimilarity of error messagesList only queries morethan 5 min slower
© 2020 Treasure DataFuture work for further improvement• Run query simulation more frequently (hopefully regularly)• Further speed up is required• Maintain small but effective query sets for quick test• Automate test environment provisioning• Improve test coverage• Overcome some system-level restriction• Test with schema and data of that time (like time travel)• Improve the resolution of query grouping• ...and more!!
© 2020 Treasure DataRelated Work
© 2020 Treasure DataRelated Work• Snowtrail: Testing with Production Queries on a Cloud Database• https://resources.snowflake.com/report/snowtrail-testing-with-production-series-on-a-cloud-database• クエリログを使ったAurora MySQLの負荷テスト• https://techlife.cookpad.com/entry/2020/10/13/090000• Building an Automated Testing Framework Based on Chaos Mesh and Argo• https://pingcap.com/blog/building-automated-testing-framework-based-on-chaos-mesh-and-argo

Recommended

PDF
Support Presto as a feature of SaaS
PPTX
Redshift Introduction
PDF
Plazma - Treasure Data’s distributed analytical database -
PPTX
Keeping the Lights On with MongoDB
PPTX
Managing a MongoDB Deployment
PDF
Роман Новиков "Best Practices for MySQL Performance & Troubleshooting with th...
 
PPTX
Amazon EMR
PDF
Architecture at Scale
PDF
Presto: Fast SQL on Everything
PDF
Presto: Fast SQL-on-Anything (including Delta Lake, Snowflake, Elasticsearch ...
PDF
Análisis del roadmap del Elastic Stack
PPTX
IMC Summit 2016 Breakout - Roman Shtykh - Apache Ignite as a Data Processing Hub
PDF
Jeremy Engle's slides from Redshift / Big Data meetup on July 13, 2017
PPTX
Built-In Security for the Cloud
PDF
Getting Started on Google Cloud Platform
PPTX
From PoCs to Production
PDF
Using SparkML to Power a DSaaS (Data Science as a Service): Spark Summit East...
PPTX
Is there a way that we can build our Azure Data Factory all with parameters b...
PDF
Treasure Data and Fluentd
PDF
User Defined Partitioning on PlazmaDB
PPTX
Gs08 modernize your data platform with sql technologies wash dc
PDF
Accelerating Spark Genome Sequencing in Cloud—A Data Driven Approach, Case St...
PDF
Cassandra SF 2015 - Repeatable, Scalable, Reliable, Observable Cassandra
PDF
Never Stop Exploring - Pushing the Limits of Solr: Presented by Anirudha Jadh...
PPTX
Test Automation for NoSQL Databases
PDF
Encryption and Masking for Sensitive Apache Spark Analytics Addressing CCPA a...
PDF
Jump Start on Apache Spark 2.2 with Databricks
PDF
Real-time personal trainer on the SMACK stack
PDF
Internals of Presto Service
PDF
Journey of Migrating Millions of Queries on The Cloud

More Related Content

PDF
Support Presto as a feature of SaaS
PPTX
Redshift Introduction
PDF
Plazma - Treasure Data’s distributed analytical database -
PPTX
Keeping the Lights On with MongoDB
PPTX
Managing a MongoDB Deployment
PDF
Роман Новиков "Best Practices for MySQL Performance & Troubleshooting with th...
 
PPTX
Amazon EMR
PDF
Architecture at Scale
Support Presto as a feature of SaaS
Redshift Introduction
Plazma - Treasure Data’s distributed analytical database -
Keeping the Lights On with MongoDB
Managing a MongoDB Deployment
Роман Новиков "Best Practices for MySQL Performance & Troubleshooting with th...
 
Amazon EMR
Architecture at Scale

What's hot

PDF
Presto: Fast SQL on Everything
PDF
Presto: Fast SQL-on-Anything (including Delta Lake, Snowflake, Elasticsearch ...
PDF
Análisis del roadmap del Elastic Stack
PPTX
IMC Summit 2016 Breakout - Roman Shtykh - Apache Ignite as a Data Processing Hub
PDF
Jeremy Engle's slides from Redshift / Big Data meetup on July 13, 2017
PPTX
Built-In Security for the Cloud
PDF
Getting Started on Google Cloud Platform
PPTX
From PoCs to Production
PDF
Using SparkML to Power a DSaaS (Data Science as a Service): Spark Summit East...
PPTX
Is there a way that we can build our Azure Data Factory all with parameters b...
PDF
Treasure Data and Fluentd
PDF
User Defined Partitioning on PlazmaDB
PPTX
Gs08 modernize your data platform with sql technologies wash dc
PDF
Accelerating Spark Genome Sequencing in Cloud—A Data Driven Approach, Case St...
PDF
Cassandra SF 2015 - Repeatable, Scalable, Reliable, Observable Cassandra
PDF
Never Stop Exploring - Pushing the Limits of Solr: Presented by Anirudha Jadh...
PPTX
Test Automation for NoSQL Databases
PDF
Encryption and Masking for Sensitive Apache Spark Analytics Addressing CCPA a...
PDF
Jump Start on Apache Spark 2.2 with Databricks
PDF
Real-time personal trainer on the SMACK stack
Presto: Fast SQL on Everything
Presto: Fast SQL-on-Anything (including Delta Lake, Snowflake, Elasticsearch ...
Análisis del roadmap del Elastic Stack
IMC Summit 2016 Breakout - Roman Shtykh - Apache Ignite as a Data Processing Hub
Jeremy Engle's slides from Redshift / Big Data meetup on July 13, 2017
Built-In Security for the Cloud
Getting Started on Google Cloud Platform
From PoCs to Production
Using SparkML to Power a DSaaS (Data Science as a Service): Spark Summit East...
Is there a way that we can build our Azure Data Factory all with parameters b...
Treasure Data and Fluentd
User Defined Partitioning on PlazmaDB
Gs08 modernize your data platform with sql technologies wash dc
Accelerating Spark Genome Sequencing in Cloud—A Data Driven Approach, Case St...
Cassandra SF 2015 - Repeatable, Scalable, Reliable, Observable Cassandra
Never Stop Exploring - Pushing the Limits of Solr: Presented by Anirudha Jadh...
Test Automation for NoSQL Databases
Encryption and Masking for Sensitive Apache Spark Analytics Addressing CCPA a...
Jump Start on Apache Spark 2.2 with Databricks
Real-time personal trainer on the SMACK stack

Similar to Testing Distributed Query Engine as a Service

PDF
Internals of Presto Service
PDF
Journey of Migrating Millions of Queries on The Cloud
PDF
Journey of Migrating 1 Million Presto Queries - Presto Webinar 2020
PDF
Presto meetup 2015-03-19 @Facebook
PDF
Presto @ Treasure Data - Presto Meetup Boston 2015
PDF
Presto At Arm Treasure Data - 2019 Updates
PPT
Four Problems You Run into When DIY-ing a “Big Data” Analytics System
PDF
Learn from Case Study; How do people run query on Trino? / Trino japan virtua...
PDF
Choosing the Right Database
PDF
Presto At Treasure Data
PDF
Presto as a Service - Tips for operation and monitoring
PDF
Choosing the right database
PPTX
Partner webinar presentation aws pebble_treasure_data
PDF
Non-Relational Postgres / Bruce Momjian (EnterpriseDB)
 
PDF
SQL on Hadoop in Taiwan
PDF
Treasure Data Cloud Strategy
PDF
Your Timestamps Deserve Better than a Generic Database
PDF
Non-Relational Postgres
byEDB
 
PDF
Overview of data analytics service: Treasure Data Service
PDF
Fluentd meetup #3
Internals of Presto Service
Journey of Migrating Millions of Queries on The Cloud
Journey of Migrating 1 Million Presto Queries - Presto Webinar 2020
Presto meetup 2015-03-19 @Facebook
Presto @ Treasure Data - Presto Meetup Boston 2015
Presto At Arm Treasure Data - 2019 Updates
Four Problems You Run into When DIY-ing a “Big Data” Analytics System
Learn from Case Study; How do people run query on Trino? / Trino japan virtua...
Choosing the Right Database
Presto At Treasure Data
Presto as a Service - Tips for operation and monitoring
Choosing the right database
Partner webinar presentation aws pebble_treasure_data
Non-Relational Postgres / Bruce Momjian (EnterpriseDB)
 
SQL on Hadoop in Taiwan
Treasure Data Cloud Strategy
Your Timestamps Deserve Better than a Generic Database
Non-Relational Postgres
byEDB
 
Overview of data analytics service: Treasure Data Service
Fluentd meetup #3

More from takezoe

PDF
GitBucket: Open source self-hosting Git server built by Scala
PDF
Revisit Dependency Injection in scala
PDF
How to keep maintainability of long life Scala applications
PDF
頑張りすぎないScala
PDF
GitBucket: Git Centric Software Development Platform by Scala
PDF
Non-Functional Programming in Scala
PDF
Scala警察のすすめ
PDF
Scala製機械学習サーバ「Apache PredictionIO」
PDF
The best of AltJava is Xtend
PDF
Scala Warrior and type-safe front-end development with Scala.js
PDF
Tracing Microservices with Zipkin
PDF
Type-safe front-end development with Scala
PDF
Scala Frameworks for Web Application 2016
PDF
Macro in Scala
PDF
Java9 and Project Jigsaw
PDF
Reactive database access with Slick3
PDF
markedj: The best of markdown processor on JVM
PDF
ネタじゃないScala.js
PDF
Excel方眼紙を支えるJava技術 2015
PDF
ビズリーチの新サービスをScalaで作ってみた 〜マイクロサービスの裏側 #jissenscala
GitBucket: Open source self-hosting Git server built by Scala
Revisit Dependency Injection in scala
How to keep maintainability of long life Scala applications
頑張りすぎないScala
GitBucket: Git Centric Software Development Platform by Scala
Non-Functional Programming in Scala
Scala警察のすすめ
Scala製機械学習サーバ「Apache PredictionIO」
The best of AltJava is Xtend
Scala Warrior and type-safe front-end development with Scala.js
Tracing Microservices with Zipkin
Type-safe front-end development with Scala
Scala Frameworks for Web Application 2016
Macro in Scala
Java9 and Project Jigsaw
Reactive database access with Slick3
markedj: The best of markdown processor on JVM
ネタじゃないScala.js
Excel方眼紙を支えるJava技術 2015
ビズリーチの新サービスをScalaで作ってみた 〜マイクロサービスの裏側 #jissenscala

Recently uploaded

PDF
Here’s the case study that shows how companies lose ₹2–6 Cr annually due to m...
PPT
This-Project-Demonstrates-How-to-Create.ppt
PDF
Database Management Systems(DBMS):UNIT-I Introduction to Database(DBMS) BCA S...
PDF
KoderXpert – Odoo, Web & AI Solutions for Growing Businesses
PDF
How Does AI Improve Location-Based Mobile App Development for Businesses.pdf
PDF
Operating System (OS) :UNIT-I Introduction to Operating System BCA SEP SEM-II...
PDF
Navigating SEC Regulations for Crypto Exchanges Preparing for a Compliant Fut...
PDF
Red Hat Summit 2025 - Triton GPU Kernel programming.pdf
PDF
Constraints First - Why Our On-Prem Ticketing System Starts With Limits, Not ...
PPTX
Binance Smart Chain Development Guide.pptx
PPTX
Managed Splunk Partner vs In-House: Cost, Risk & Value Comparison
PDF
Database Management Systems(DBMS):UNIT-II Relational Data Model BCA SEP SEM ...
PDF
How Modern Custom Software is Revolutionizing Mortgage Lending Processes -Ma...
PDF
Blueprint to build quality before the code exists - StackConnect Milan 2025
PDF
Why Zoho Notebook’s AI-Fueled Upgrade Matters for Knowledge Workers in 2026
PDF
Cybersecurity Alert- What Organisations Must Watch Out For This Christmas Fes...
PPTX
Struggling with Pentaho Limitations How Helical Insight Solves Them.pptx
PDF
API_SECURITY CONSULTANCY SERVICES IN USA
PPTX
#15 All About Anypoint MQ - Calicut MuleSoft Meetup Group
PDF
Imed Eddine Bouchoucha | computer engineer | software Architect
Here’s the case study that shows how companies lose ₹2–6 Cr annually due to m...
This-Project-Demonstrates-How-to-Create.ppt
Database Management Systems(DBMS):UNIT-I Introduction to Database(DBMS) BCA S...
KoderXpert – Odoo, Web & AI Solutions for Growing Businesses
How Does AI Improve Location-Based Mobile App Development for Businesses.pdf
Operating System (OS) :UNIT-I Introduction to Operating System BCA SEP SEM-II...
Navigating SEC Regulations for Crypto Exchanges Preparing for a Compliant Fut...
Red Hat Summit 2025 - Triton GPU Kernel programming.pdf
Constraints First - Why Our On-Prem Ticketing System Starts With Limits, Not ...
Binance Smart Chain Development Guide.pptx
Managed Splunk Partner vs In-House: Cost, Risk & Value Comparison
Database Management Systems(DBMS):UNIT-II Relational Data Model BCA SEP SEM ...
How Modern Custom Software is Revolutionizing Mortgage Lending Processes -Ma...
Blueprint to build quality before the code exists - StackConnect Milan 2025
Why Zoho Notebook’s AI-Fueled Upgrade Matters for Knowledge Workers in 2026
Cybersecurity Alert- What Organisations Must Watch Out For This Christmas Fes...
Struggling with Pentaho Limitations How Helical Insight Solves Them.pptx
API_SECURITY CONSULTANCY SERVICES IN USA
#15 All About Anypoint MQ - Calicut MuleSoft Meetup Group
Imed Eddine Bouchoucha | computer engineer | software Architect

Testing Distributed Query Engine as a Service

  • 1.
    Naoki TakezoePresto ConferenceTokyo 2020Nov 20, 2020Testing Distributed QueryEngine as a ServiceDeliver our service to customers as safe as possible
  • 2.
    © 2020 TreasureDataWho am I?• Naoki Takezoe• Joined Treasure Data in 2018• Work for Presto / Apache Spark• Open Source• GitBucket• Scalatra• Apache PredictionIO• Books• Japanese translation of Scala Puzzlers• Scala 300 recipes, etcTwitter: @takezoenGitHub: https://github.com/takezoe
  • 3.
    © 2020 TreasureDataTreasure DataLogsDeviceDataBatchDataPlazmaDBTable SchemaData Collection Cloud Storage Distributed Data ProcessingJobsJob ManagementSQL EditorSchedulerWorkflowsMachineLearningTreasure Data OSSThird Party OSSDataReady to use Cloud Data Platform
  • 4.
    © 2020 TreasureDataPresto at Treasure Data• 2010• Presto, developed at Facebook, was open-sourced• Treasure Data was providing Impala As A Service• 2014• Launched Presto As A Service as a replacement of Impala• 2015• 20,000 queries / day• 2019• Reached 1,000,000 queries / day• Presto creators (Martin, Dain and David) left Facebook and founded anNPO Presto Software Foundation (prestosql), then joined Starburst• Hosted Presto Conference in Tokyo
  • 5.
  • 6.
    © 2020 TreasureDataDeliver our service to customersas safe as possible
  • 7.
    © 2020 TreasureDataTesting distributed database is challenging• Variety of workload• Possible performance degradation• Cluster status• Many corner cases
  • 8.
    © 2020 TreasureDataTest can be more important when upgrading Presto• Presto development is super active• 27 releases in 2019• 18 releases in 2020 at this point (Nov 14)• No stable version• Incompatible updates come with bug fixes• Sticking to one version cannot be an option• Backport bug fixes and new features from newer version also getschallenging over timeHow we can upgrade Presto safely...?
  • 9.
    © 2020 TreasureDataIn order to minimize the riskUnit test Integration test System testRegular performance provingGradual migration for big updateInternal dogfoodingCluster status monitoringTestRelease processMonitoring
  • 10.
    © 2020 TreasureDataWhat are missing?• Covering variety of use cases• Performance degradation in corner cases• Unknown compatibility issues• Production-scale environment• Data size and characteristics• Number of queries, cluster size, etc
  • 11.
    © 2020 TreasureDataWhat’s a solution?
  • 12.
    © 2020 TreasureDatapresto-query-simulatorTest using production data and queries with security and safetyBase ClusterTarget ClusterQuery Log Hashed ResultsReportQuery SetReal Database Test Databaseread write• Security: We don’t see customer data and query results• Safety: We don’t cause any side-effect on customer dataQuery Metrics
  • 13.
    © 2020 TreasureDataChallenges in query-simulator• Query simulation takes very long time• Testing 1-day queries will take 1 day at least, theoretically• Not only time, but also cost of test clusters is the matter• Result verification is not straightforward• Many false positives and duplications• Result analysis tends to depend on personal knowledge
  • 14.
    © 2020 TreasureDataMake query simulation faster• Reduce number of queries by grouping by query signature (up to -90%)• Reduce amount of data by narrowing table scan ranges (up to -80%)• Use multiple Presto clusters• Test only long-running queries
  • 15.
    © 2020 TreasureDataQuery signatureSELECT time, path, user_agentFROM accessWHERE TD_INTERVAL(time, '-1M')SELECT time, path, user_agentFROM access aINNER JOIN account b ON a.account_id = b.account_idS(T) access->#S(J(T,T)) access->#,account->#Simplified expression of query structureOpen-source Scala implementation is included in Airframe:https://github.com/wvlet/airframe/blob/master/airframe-sql/src/main/scala/wvlet/airframe/sql/analyzer/QuerySignature.scala
  • 16.
    © 2020 TreasureDataNarrowing scan rangesTime distribution of recordsUse only x% of total records by adding a time range predicateSELECT time, parh, user_agentFROM accessSELECT time, path, user_agentFROM (SELECT time, path, user_agentFROM access)WHERE TD_TIME_RANGE(time, from, to)Original scan rangeUse this range only
  • 17.
    © 2020 TreasureDataWe choose these options depending on thepurpose of query simulation• Reduce number of queries by grouping by query signature (up to -90%)• Reduce amount of data by narrowing table scan ranges (up to -80%)• Use multiple Presto clusters• Test only long-running queriesfor checking compatibility? or for checking performance difference?
  • 18.
    © 2020 TreasureDataMake result verification easier• Auto detect non-deterministic query results• Running query multiple times to see if results are the same• Grouping similar errors• Fuzzy comparison of error messages•• List problematic queries based on internal metrics• Performance, resource usage, scan ranges, worker distribution, etc• Finally, check problematic queries by human
  • 19.
    © 2020 TreasureDataWe just need to check queries listed on the reportGive a possible reason ofthe inconsistent resultFailures are grouped by thesimilarity of error messagesList only queries morethan 5 min slower
  • 20.
    © 2020 TreasureDataFuture work for further improvement• Run query simulation more frequently (hopefully regularly)• Further speed up is required• Maintain small but effective query sets for quick test• Automate test environment provisioning• Improve test coverage• Overcome some system-level restriction• Test with schema and data of that time (like time travel)• Improve the resolution of query grouping• ...and more!!
  • 21.
    © 2020 TreasureDataRelated Work
  • 22.
    © 2020 TreasureDataRelated Work• Snowtrail: Testing with Production Queries on a Cloud Database• https://resources.snowflake.com/report/snowtrail-testing-with-production-series-on-a-cloud-database• クエリログを使ったAurora MySQLの負荷テスト• https://techlife.cookpad.com/entry/2020/10/13/090000• Building an Automated Testing Framework Based on Chaos Mesh and Argo• https://pingcap.com/blog/building-automated-testing-framework-based-on-chaos-mesh-and-argo

[8]ページ先頭

©2009-2025 Movatter.jp