Movatterモバイル変換


[0]ホーム

URL:


Jonathan Levin, profile picture
Uploaded byJonathan Levin
PDF, PPTX4,567 views

Scaling MySQL Strategies for Developers

Jonathan is a MySQL consultant who specializes in SQL, indexing, and reporting for big data. This tutorial will cover strategies for resolving 80% of performance problems, including indexes, partitioning, intensive table optimization, and finding and addressing bottlenecks. The strategies discussed will be common, established approaches based on the presenter's experience working with MySQL since 2007.

Embed presentation

Download as PDF, PPTX
Who Am I?• Jonathan• MySQL Consultant• Working with MySQL since 2007• Specialize in SQL, Indexing and Reporting (Big Data)
Who is this for?* Smilies indicate ability to control area
What Will I Cover?               Domain Knowledge This Much0%      20%                       100%
Solutions this              tutorial will coverOccurrences              Problems
serenesimplycomplicated.blogspot.co.uk/2012/07/developing-direction.html
http://frabz.com/3bv6
What Will I Cover?• The top 20% of the strategies to resolve 80% of  your performance problems• The strategies that are within reach of  developers• Strategies that are more common and more established  • From my experience.  • To reduce risk
Table of Contents    Part One             Part Three• Indexes             • Read Cache• Finding             • Scaling Reads Bottlenecks          • Reporting   Part Two           • Write Buffers• Partitioning        • Scaling Writes• Intensive Table     • Sharding Optimization
Indexes
What are Indexes?
Indexes• Advantages • Speed – Use the right path• Now used in NoSQL stores• “A properly indexed database will give  you very few problems” – me• My blog – “Indexing and Caching”
B-Tree Indexes(http://20bits.com/article/interview-questions-database-indexes)
(http://www.youtube.com/watch?v=coRJrcIYbF4)
Choosing the best IndexPrevent Readingfrom Disk                      *3                           *2                                       Prevent Extra                                *1     ProcessingPrevent Table Scans
Indexes• 1 Star – EXPLAIN • Type   • const - where id=1   • ref - where location='london'   • eq_ref - where t1.id = t2.id • Extra   • using where • Limitation – type   • range - where id in (1,2,3,4,5)
Indexes• 2 Star – EXPLAIN • Extra   • using where   • Using index • And Not   • Using filesort   • Using temporary • Limitation   • Using temporary - query contains different GROUP    BY and ORDER BY columns
Indexes• 3 Star – EXPLAIN • Type   • index • Extra   • Using index   • Using index for group-by
Index Examples
Regular UsageSELECT , FROM ₸  WHERE = 121;  (PRIMARY KEY )
Range Scan  SELECT ▲, ■ FROM ₸    WHERE = 121AND BETWEEN 1 AND 100       KEY ( , )
Range ScanSELECT ▲, ■ FROM ₸  WHERE = 121AND IN (1,100,30,7)     KEY ( , )
Covering IndexSELECT , FROM ₸  WHERE = 121   KEY ( , , )
Not OptimalSELECT       FROM ₸  WHERE = 121AND ♪ IN (1,100,30,7)     KEY ( ,       )
Broken RangeSELECT ▲, ■ FROM ₸  WHERE = 121AND IN (1,100,30,7);      KEY ( , )
Sub Queries   SELECT ▲, ■ FROM ₸WHERE IN (SELECT ♪ FROM ♠)          KEY ( )
Indexes for SortingSELECT ☼, ☺ FROM ₸  WHERE = 121    GROUP BY    ORDER BY        KEY ( )
Indexes for SortingSELECT ☼, ☺ FROM ₸  WHERE = 121    GROUP BY    ORDER BY      KEY ( ,   )     or KEY ( , )
Indexes for Joins SELECT . , .      FROMINNER JOIN ON . = .       WHERE . = 232            KEY ( )         or KEY( )
WHERE ₸.Ω = 232;   INNER JOIN ♫
FULL SCAN ♫   INNER JOIN ₸               FILTER Ω = 232
SELECT ▲, (SELECT .. FROM WHERE )       FROM ₸ WHERE ♪ IN   (SELECT ♪ FROM ♠ WHERE..); “I need help optimizing the my.cnf”
Clustered PK and                         Secondary Indexes                            KEY ( , , )                           PRIMARY KEY ( )                              KEY ( , )                 (Can be used in GROUP / ORDER BY                 SELECT variables to make Covering index)http://www.dbasquare.com/2012/05/17/can-mysql-use-primary-key-values-from-a-secondary-index/
Index Merge SELECT ☺ FROM ₸WHERE =1 OR =2;      KEY ( )      KEY ( )
Index Merge    SELECT☺FROM ₸ WHERE =1UNION (SELECT☺FROM ₸ WHERE =2)            KEY ( )            KEY ( )* 5.6
Finding Bottlenecks
*        **
Gathering DataWhat you will need:1. MySQL Slow log: MySQL >= 5.1 or   Percona/MariaDB microslow patch2. Set long_query_time = 0 - for 6 to 24 hours   for decent sized slow log. (Make sure host   has enough space)
Log ProcessingWorst Response Queries1. Echo ‘’ > slow.log2. mysql> set global long_query_time=0; set   long_query_time = 0; flush logs;3. Wait X hours and return original value.4. pt-query-digest slow.log > slow.txt                                           Processing should beBulky Queries                              done on another host--filter ‘($event->{Rows_examined} > 1000)’Write Queries--filter '($event->{Rows_affected} > 0)‘
Log Processing• MySQL 5.6 • Statement Digest • No need for log processing to get Digest.
Worst Response Queries                 pt-query-digest slow.log > slow.txtRank   Response time    Calls   R/Call               Item                                       SELECT dp_node dp_usernode 1     480.9273 16.3%   600     0.8015 dp_buddylist dp_users                                       dp_node_access 2     322.4220 4.3% 129258 0.0025 ADMIN INIT DB 3     314.8719 4.2%    30220   0.0104 UPDATE dp_users 4     287.7109 3.8%    51606   0.0056 SET                                       SELECT dp_node dp_usernode 5     269.3434 3.6%    600     0.4489 dp_buddylist dp_users                                       dp_node_access 6     238.8571 3.1% 2902141 0.0001 SELECT dp_url_alias
* Gap Locking   * pt-stalk
mysql tables in use 4, locked 25289 lock struct(s), heap size 620984, 273785 row lock(s), undo log entries363312MySQL thread id 467, OS thread handle 0x7fceab7df700, query id 88914423  SELECT r.exchange_rate INTO destination_exchange_rate      FROM exchange_rate AS r WHERE r.currency_id =  NAME_CONST('destination_currency_id',6) AND r.date =  NAME_CONST('day',_latin1'2012-06-30' COLLATE  'latin1_swedish_ci')Trx read view will not see trx with id >= 10ECC92B0, sees < 10ECC916FTABLE LOCK table `currency` trx id 10ECC90CC lock mode ISRECORD LOCKS space id 0 page no 261 n bits 80 index `fk_currency_status1` oftable `currency` trx id 10ECC90CC lock mode STABLE LOCK table `daily_summary` trx id 10ECC90CC lock mode ISRECORD LOCKS space id 0 page no 34829580 n bits 200 index `PRIMARY` of table`daily_summary` trx id 10ECC90CC lock mode STABLE LOCK table `exchange_rate` trx id 10ECC90CC lock mode ISTOO MANY LOCKS PRINTED FOR THIS TRX: SUPPRESSING FURTHER PRINTS
*** (1) TRANSACTION:TRANSACTION 13DCDF4D9, ACTIVE 0 sec starting index readmysql tables in use 1, locked 1LOCK WAIT 3 lock struct(s), heap size 1248, 2 row lock(s)MySQL thread id 2438176, OS thread handle 0x7f9a37408700, query id118341815748UPDATE sys_doctrine_lock_trackingSET timestamp_obtained = '1341839053' WHERE object_key = '1146' AND user_ident = '158' AND c_type = '137'*** (1) WAITING FOR THIS LOCK TO BE GRANTED:RECORD LOCKS space id 0 page no 48627 n bits 280 index `PRIMARY` of table`sys_doctrine_lock_tracking` trx id 13DCDF4D9 lock_mode X locks rec but notgap waitingRecord lock, heap no 207 PHYSICAL RECORD: n_fields 6; compact format; 0: len 8; hex 43616d706169676e; 1: len 4; hex 31313436; asc 1146;; 2: len 6; hex 00013dc12d7a; asc = -z;; …
Bottlenecks• Locking Queries   • Try to make them complete as fast as possible • JOINs vs sub query • Function wrapped around index • UDF with SQL inside • Long Transactions • Looping with short queries
Misbehaving Optimizer• Optimizer Hints:  • USE INDEX  • FORCE INDEX  • IGNORE INDEX  • STRIGHT_JOIN• Joins  • LEFT sometimes faster then INNER* Too many indexes confuse the optimizer
Bottlenecks• Virtualization  • Increase (obscenely) innodb_log_file_size    • Needs restart + deleting old log files  • EXT3• General I/O improvements  • innodb-flush-log-at-trx-commit  • Sync binlog                              Group Commit  • Xa support
Bottlenecks• General I/O improvements • Percona server   • Better flushing to disk   • Less mutexes • Upgrade MySQL   • Same reasons as above • Innodb-io-capacity  http://www.wmarow.com/strcalc/
Database Upgrades• My Secret Sauce for smooth MySQL Migrations 1. Upgrade the dev/staging DBs with desired    version 2. Wait 1-2 months till silky-smooth   •   All features have been tested   •   All the query issues have been fixed 3. Upgrade servers   •   Down to Up – Slaves first
Bottlenecks• Mutexes • Query Cache   • “Freeing items” in processlist• Network                                       Batch Processes • Skip-name-resolve • net_write_timeout / read_timeout • thread_cache_size                          Lots of connections
Homework• Haven’t talked fully about:   SQL and EXPLAIN• Webinars  • Indexes - percona.tv/percona-webinars/tools-and-   techniques-for-index-design  • Explain - percona.tv/percona-webinars/explain-demystified• Websites  • http://www.myxplain.net
Part Two
Partitioning
ColumnsRows       Big Table
Columns            Partition            Partition            Partition            PartitionAlgorithm            Partition   Rows            Partition            Partition            Partition            Partition
Select                       Algorithm         Inserting                                                             Parallelize Data Unique Key                              Algorithm                                                                   Reduce DataForeign Keys                           Issues                                                     Use Cases Primary Key  Overhead                                    Partitioning                                                                       ID                Benefits                             Automatic                                                                         TimeB-TreeLevels                                  Manual                                                                  Hash         Short Scans                                 Time             80-90%                              Archive                                                                      DB Usage                                           Shards
Partitioning                    Parallelize Data                                                     Reduce Data• Use Cases                           Use Cases • Reducing Data – Only get the partition/table   that you need • Parallelizing Data - Get an equal amount of   data from each partition in parallel• Benefits                                                  Benefits • Shorter table scans                                 B-Tree • Less levels for index scans   Levels                                          Short Scans
Issues• Algorithm                                  Select                                                         Inserting                                           Algorithm • INSERTs                                               Algorithm • SELECTs                                             Issues• Keys               Unique Key • Foreign Key                            Foreign Keys • Unique Key                                     Primary Key                                                   Overhead • Primary Key Overhead   • Increases Table size   • Can change indexes
Partition Types• Range• List                              ID• Hash                     Automatic                                        Time• Key• Columns                        Hash• Sub-partitioning
Partitioning by Usage                        vs Partitioning by MaintenanceRank   Response time      Calls                          Item                             SELECT address FROM        WHERE 1     480.9273 26.3% 129258     BETWEEN ‘2012-11-10’ and ‘2012-11-                             17’                                  SELECT total FROM         WHERE 2     322.4220 14.3%     600                                  BETWEEN ‘2012-11-01’ and ‘2012-11-30’                                  UPDATE             SET active=1 WHERE   = 3       34.8719 4.2% 30220                                  17635376                                  SELECT dispatch_time FROM 4       28.7109 3.8% 51606                                  WHERE = 7387612
CREATE TABLE orders     id int unsigned not null auto_increment,     `date` date not null,     …                                Can also doPRIMARY KEY (date, id),                                PRIMARY KEY (id, date)KEY id (id),                    KEY date (date)..) ENGINE=InnoDB DEFAULT CHARSET=utf8PARTITION BY        (        (date))(PARTITION   VALUES LESS THAN (      (‘2012-01-01’),PARTITION       VALUES LESS THAN (      (‘2012-06-01’)),PARTITION      VALUES LESS THAN           )
Manual Partitioning• Archive - Main table & Archive Table• Time – Create table per year, per month..• Shards – Table per country * Foreign keys             Manual                                        Time                  Archive                               Shards
Intensive Table Optimization
Intensive Table Optimization  Once upon a time, I was researchingways to make a database working set fit asmuch as possible to memory…
mysqlperformanceblog.com/2010/04/08/fast-ssd-or-more-memory/
Intensive Table Optimization1. People are usually very liberal with data type   sizes2. There were (usually) so many indexes that:  1. They multiplied the table size  2. Were not efficient compares to how the table is     used  3. Confused the optimizer3. Discovered partitions were not used or misused  1. Discovered sub partitions  2. Primary Key alignment4. Discovered InnoDB compression
Intensive Table OptimizationThere are tools to help with this, but…
quickmeme.com/meme/3rmy8y/
Intensive Table Optimization                                                               Data Types           Slowest Queries                      BottleneckUser Statistics                                 Optimizations                        Tables                                   Optional    Table Sizes                                                    Query Logs                                       Partitions                  Indexes                                                    Foreign Keys
Gathering DataWhat you will need:1. MySQL Slow log: MySQL >= 5.1 or   Percona/MariaDB microslow patch2. Set long_query_time = 0 - for 6 to 24 hours   for decent sized slow log. (Make sure host   has enough space)                  Slowest Queries                                      Bottleneck Tables
Gathering DataHelpful (Optional):1. Percona/Mariadb user_statistics patch2. Get list of most read/written tables3. Get list of used and un-used indexes4. List of largest tables                                     Table Sizes                   User Statistics                                      Bottleneck Tables
Worst Response Queries             pt-query-digest slow.log > slow.txtRank   Response time      Calls               Item 1     8589.9513 27.5% 231051 UPDATE dp_users 2     4752.6688 15.2%    257235   SELECT dp_cache_menu 3      1606.4946 5.1%    183542   SELECT community_chats 4      1418.9034 4.5%    259939   SELECT dp_cache 5       564.3305 1.8%   7970165   SELECT dp_url_alias 6       495.0092 1.6%    44940    SELECT dp_event dp_node
Table StatisticsSELECT table_name,FROM information_schema.table_statisticsORDER BY            DESC LIMIT 5;                        table_name        rows_read                        dp_users          2302477894ROWS_CHANGED            dp_node           1231318439ROWS_CHANGED_X_INDEXES dp_comments        1071462211                        dp_userpoints     1033073070                        dp_search_index   260154684
Table Statistics          Worst Response Tables            --group-by tablesRank Response time    Calls         Item 1 7975.4487 6.5% 124384 advertisement 2 5554.1435 4.5%     1834 info 3   4915.4816 4.0%   208     placement 4 4902.7644 4.0%     158     advert_summary
Table Sizes             Table_Name                                Rows            Data            Idx Total_size Idxfractotal_daily_summary                                   610M              77G           88G           165G            1.15advert_summary                                        478M              57G           45G           102G            0.78log_messages                                            92M             47G           10G             57G           0.21SELECT CONCAT(TABLE_SCHEMA, '.', TABLE_NAME) AS TABLE_NAME,   CONCAT(ROUND(TABLE_ROWS / 1000000, 2), 'M')                    ROWS,   CONCAT(ROUND(DATA_LENGTH / ( 1024 * 1024 * 1024 ), 2), 'G')        DATA,   CONCAT(ROUND(INDEX_LENGTH / ( 1024 * 1024 * 1024 ), 2), 'G')       IDX,   CONCAT(ROUND(( DATA_LENGTH + INDEX_LENGTH ) / ( 1024 * 1024 * 1024 ), 2), 'G') TOTAL_SIZE,   ROUND(INDEX_LENGTH / DATA_LENGTH, 2)                         IDXFRACFROM INFORMATION_SCHEMA.TABLESORDER BY DATA_LENGTH + INDEX_LENGTH DESC                                          http://www.mysqlperformanceblog.comLIMIT 10;                                                                            /2008/03/17/researching-your-mysql-table-sizes/
Which table needs             your attention?Table Size
Intensive Table Optimization• Table Targeting • The most “worthy” table to focus your   attention on   • Biggest bang for your buck• If you know which table is the most troublesome • Ignore most of the investigations • Apart from slow log • Investigations help understand DB usage
Optimizations      Compression                         Data Types                        Query Logs          Partitions                   IndexesSub Partitioning                                 Don’t Need                        Foreign Keys
Intensive Table Optimization• Datatypes • SELECT * FROM table                     G • Example: Tinyint instead of Bigint: (7 bytes row + 7bytes index) * 350million rows = 4.9Gb • Enum instead of Varchar • Remove NULLs when not needed
Intensive Table Optimization• Compression • Best for tables with a lot of varchar/text • Compress table by x2, x4, x8..   • Need to experiment with innodb_strict = on; • On my tests (5.5) – Very very slow   • Alter tables   • INSERTS/UPDATES/DELETES                                        Optimizations                        Compression
1.                      Slow Log                                  Filtered by     Get Data                    Target Table                  Query Digest                                   New Results                                                       EXPLAIN      3.                 2.                                           Index-Usage                     Make                           Test                  Assumptions                   Assumptions            4.        Deploy
Target Table ProcessingFilter Log:pt-query-digest slow.log--filter '$event->{arg} =~ m/dp_users /'--no-report --print >dp_users.logWorst Queries from new log:pt-query-digest dp_users.log --limit 100%>tbl_dp_users.txt
ResponseRank   time     Calls                        Item     209.2863         UPDATE dp_users SET access = 133******3 WHERE 1             88850      10.7%           = 23****01G     162.2711 3            1309010 SELECT access FROM dp_users WHERE       = 21***4G       8.3%     139.9009            SELECT uid, name FROM dp_users WHERE           =1 4               197       7.1%              ORDER BY          DESCG     133.8691            SELECT * FROM dp_users u WHERE          = 5               327       6.8%              's******s'G                      SELECT name, created, picture FROM dp_users WHERE     109.6903 6              29152 picture !='' AND      = '1' AND        BETWEEN       5.6%                      '133*****0' AND '133*****60'G      92.9095            SELECT dp_node dp_users using (     ) 7              360642       4.7%              dp_node_revisions      74.2426            SELECT * FROM dp_users u WHERE          = hoa****rio' 8               106       3.8%              AND      = '3837********5f9b' AND        = 1G
Partitioning by UsageRank   Response time    Calls                     Item                                SELECT address FROM orders WHERE date 1     480.9273 26.3% 129258                                BETWEEN ‘2012-11-10’ and ‘2012-11-17’                                SELECT total FROM orders WHERE date 2     322.4220 14.3%   600                                BETWEEN ‘2012-11-01’ and ‘2012-11-30’                                UPDATE order SET active=1 WHERE id = 3       34.8719 4.2% 30220                                17635376                                SELECT dispatch_time FROM order WHERE 4       28.7109 3.8% 51606                                id = 7387612
Testing AssumptionsSELECT uid, name FROM             WHERE       = 1 ORDER BY           DESCG1.30secsSELECT uid, name FROMWHERE         = 1 ORDER BYDESCG0.56secs       Query Digest                                  EXPLAIN
Test Environment• Hardware environment similar to live• Data size similar to live environment:  • Replicating slave   •   Cannot change datatypes on MIXED/ROW       replication   •   Create table2 and run queries against it  • Xtrabackup – full replica                         New Results  • Script with Mysqldump + WHERE   •   mysqldump --databases main --tables table1 table2 –where “date >       now() – interval 30 day” > dump.sql   •   Mysqldump –all-database –ignore-table main.table1 main.table2 >>       dump.sql
Final Tweaking(Remember the table log file – dp_users.log ?)pt-index-usage• pt-index-usage dp_users.log --host  127.0.0.1 --tables dp_users  >idx_dp_users.txt• Go over recommendations                                      Index-Usage                            Test                        Assumptions
Deploy Strategies1. Rolling Servers2. pt-online-schema-change                                Deploy3. Two-part move a. Create new table – table2 b. Insert table rows that will not change – INSERT INTO       table2 SELECT * FROM table1 WHERE date <= curdate() – interval 30 day; c.    Short downtime d. Rename table1 to table3; rename table2 to table1; e. INSERT IGNORE INTO table1 SELECT * FROM table3 WHERE date >=      curdate() – interval 30 day;4. Alter table – long downtime (pre 5.6, maybe)
Continuous Self-Learning
Slow Log                           Filtered byGet Data                  Target Table           Query Digest                                                EXPLAIN                            New Results                                                          Index-Usage                                              Test              Make                                          Assumptions           Assumptions            Deploy
Part Three
Streaming           Reverse         File            Proxy                         VolatileBrowser                                                                    Shield Cache                                         Query Cache                  Page Cache                                                                  Denormalize   2nd Level                      Summary                    Column                       Tables                     Cache   3rd Level                                                         Subtotal                                  Attributes Data Warehouse                                              Conditional
Read Cache• Outside the database  • Page Cache  • Query Cache• Inside the database  • Column Cache  • Summary Table* Complexity
Page Cache• Browser Cache  • Etag, Expires, Last-modified• Reverse Proxy  • Squid, Varnish, Nginx, Apache, Proprietary.• File/Full page cache  • mod_file_cache, Zend_Cache_Backend  • W3 Total Cache, sfSuperCache* Stale
memegenerator.net/instance/23247230
Query Cache• Volatile  • Memcached, Redis, Hibernate Cache, Arrays..  • On-Request, Time-to-Live, Stale and Cache   Stampede• Streaming  • Interval / Async, Stale, Common Queries• Shield – Mongo Shield  • Script/Tool Replication, Dependency  • Aggregation  • Complexity / Layers
Mongo             Shield                                                147cmimg.photobucket.com/albums/v158/keris_hanuman/Afbeelding1455.jpg
Sticky          StickySessions                    SessionsMemcached                      Memcached                     Cart      MySQL
Memcached            Sessions   MySQL             Cart
Manipulating Time* Error Handling
Column Cache• Denormalize   • Additional Column(s) to prevent JOINs   • Maintenance, Space on disk   • Example: CustomerID, OrderID, OrderItemID• Sub Total   • Prevent additional slow GROUP BY queries   • Maintenance, Generation, Space on disk   • Example: totalPurchases, moneyOwed* Space vs Speed
Column Cache• Conditional  •   Store conditional (True/False) logic  •   Prevents recalculating result – another query  •   Can prevent rewriting code  •   Example: isDone, hasReview, aboveAvg• Attributes  •   ENUM datatype  •   SET datatype - ARRAY of options  •   Prevents JOINs  •   May save space
Summary TablesAn additional table which consistsof an aggregation of another table orseveral JOIN’d tables.                           Summary                            Tables
SELECT ...FROM main_table t1 INNER JOIN table2 t2 on t1.orderid = t2.id INNER JOIN table3 t3 on t1.customerid = t3.id INNER JOIN table4 t4 on t1.addressid = t4.id INNER JOIN table5 t5 on t2.supplierid = t5.id INNER JOIN table6 t6 on t2.warehouse = t6.id INNER JOIN table2 t7 on t6.addressid = t7.id INNER JOIN table8 t8 on t1.productid = t8.id INNER JOIN table9 t9 on t1.buyerid = t9.id INNER JOIN table10 t10 on t1.officeid = t10.idWHEREt1.date between '2012-11-01' and '2012-11-30'GROUP BY t1.date
Summary TablesProcessed 1.2million rowsReturned 30 rowsTime 17.52 minutes
Summary TablesCREATE TABLE summary_table(primary key (date,addressid,productid)) asSELECT ...FROM main_table t1INNER JOIN table2 t2 on t1.orderid = t2.idINNER JOIN table3 t3 on t1.customerid =t3.idINNER JOIN table4 t4 on t1.addressid = t4.idINNER JOIN table6 t6 on t2.warehouse = t6.idGROUP BY t1.date, t1.addressid, t1.productid
Summary TablesSELECT ...FROM summary_table t1INNER JOIN table5 t5 on t2.supplierid = t5.idINNER JOIN table2 t7 on t6.addressid = t7.idINNER JOIN table8 t8 on t1.productid = t8.idINNER JOIN table9 t9 on t1.buyerid = t9.idINNER JOIN table10 t10 on t1.officeid = t10.idWHEREt1.date between '2012-11-01' and '2012-11-30‘GROUP BY t1.date
Summary TablesProcessed 35000 rowsReturned 30 rowsTime 0.75 seconds
Summary Tables        as anAnalytics Sub-System
Database Design                       Operational System          Analytic System   Comparison                      Execution of a business   Measurement of aPurpose                      process                   business process                      Insert, Update, Query,Primary Interaction                             Query                      DeleteDesign Optimization Update concurrency          High-performance query                      Entity-relationship (ER) Dimensional design (StarDesign Principle                      3rd Normal form (3NF) schema or cube)     amazon.co.uk/Schema-Complete-Reference-Christopher-Adamson/
Data Warehouses   Fact Tables           Dimension Tables• Measurement        •    Context• Narrow             •    Wide• Long               •    Short• Most of the                     •    Filters and                          descriptive  data                    data
Operational Design
Operational DesignCustomers                          Orders                                       Order                                       Items            Addresses                            Products
Star Schema
Star SchemaDate dim                               Address dim                  OrderItems                     Fact Products dim                               Customers dim
Maintenance• Hourly/Daily/Weekly/Monthly Aggregations• Intervals• Off Peak• On-Insert
A lot more                                InnoDB              settings            Read Cache         Buffer Pool                                                   Indexes Sharding                                                     Intensive                           Scaling Reads                        Table                                                             OptimizationGalera               Read                           Partitioning              Slaves                               Better                              Hardware                  Sub Partitioning Another      Read/Write Master        Splitting         IO          Memory
Scaling Reads• InnoDB Buffer Pool  • Cache Warming• Read buffer                        A lot more                                      settings                        InnoDB• Sort Buffer          Buffer Pool• Join Buffer• Temp Table size / on disk
Scaling Reads  • Better Hardware    • Disk I/O    • Memory BetterHardware  IO       Memory
Scaling Reads• Read Slaves  • Read/Write Splitting  • Master/Master  • Galera
Server Architecture                      MySQL
Server Architecture
db3   db1  db2
Columnar                £££                                          A lot more      Store                                                               settings                                                      InnoDB                                                     Buffer Pool                         SummaryHadoop                    Tables                                   Indexes                                                                          Intensive  OLAP                                   Reporting                          Table  Cubes                                                                  Optimization              Sharding                                    Partitioning                             ReportingCross Shard                                       Better                              Slaves   Joins                                         Hardware              Sub Partitioning                Different                Indexes                              IO            Memory
Reporting• Reporting Slaves  • Different Indexes  • No Foreign Keys  • Partitioning  • If ROW-replication:     • Must have same data types                                               Reporting                                                Slaves                                   Different                                   Indexes
Reporting• Sharding  • Cross-shard JOINs  • Go Fish  • Aggregations               Hadoop
Reporting  • Summary tables - aggregations     • Scripts     • Hadoop       • Ready –Made reports       • OLAP Cubes  • Columnar Store     • £££* RDBMS very fast at GROUP BY
Battery                                           A lot more                                                            settings                      RAID                   Innodb Log                    Write-back                 File Size     Local Server                Write Buffers       Storage                                                               MySQL              ETL                                              settings                                        QueueHadoop                                                 Summarized     CRUSH             HandCode                                                   Writes
Write Buffers• RAID card • BBU • Write-back vs write-through • Battery Learning/Drain
Write Buffers• Innodb log file size  • Buffer pool * dirty read (%) * io capacity                                             A lot more                                              settings                               Innodb Log* Virtual Environment            File Size
Write Buffers• innodb-flush-log-at-trx-commit• Sync binlog • Support xa                                   MySQL                                   settings
Write Buffers• ActiveMQ, RabbitMQ, ZeroMQ, Gearman • Not ACID, may need redundancy• Summarized Writes • Memcached counters + interval writes
Write Buffers• Local Storage (web/app servers) • Memcached • SQLite (disk/in-memory) • Log file                      Local Server              <- Most popular      Storage • MySQL• Independent, isolated                   ETL• Need to fetch data • Prevent missed data and duplicates
Write Buffers• Fetching Data • Hand code • ETL tool – Pentaho/Talend • Flume• Aggregation/Processing • Hadoop   <- Very Popular • Google CRUSH
Google CRUSH Toolsgunzip oldlog.log.gz |convdate -f 3 -i "%d/%b/%Y:%H:%M:%S%z" -o "%Y-%m-%d" |reorder -k 3,2 |aggregate -p -k 3,2 -c 1 |csvformat |gzip - > newlog.csv.gz
Innodb Log                                                    File Size                             Write Buffers      Sharding                                                  Indexes                                                             Intensive                            Scaling Writes                                                               Table   Hardware                                                            Optimization                        Remove                                             Partitioning                       BottlenecksOS Settings                                                       Sub Partitioning                               MySQL              Bypass SQL                               Settings                Layer
Scaling Writes• Less Indexes  • Less writes• Partitioning  • Less table maintenance    • Less B-tree levels    • Less need to organize blocks  • Algorithm overhead  • Mutex/Locks
Scaling Writes• Bypass SQL Layer                                SQL Parser • Innodb/Memcached • HandlerSocket                                Optimizer                            Storage Engine                       * Raik
Scaling Writes• IO Scheduler• File System (+ nobarrier, noatime, nodiratime)  • EXT 3  • EXT 4  • XFS  • ZFS• Block Sizes
Scaling Writes• Faster I/O • Faster Disks   • SAS   • SSD • RAID + cache                  Hardware • PCIe SSD   • FusionIO   • Virident
Scaling Writes• Master/Master • Does not scale writes • Writes still need to replicate• Sharding • Does scale writes
App                                   Global Data          Child Data  Function                  DB                                                                       By SchemaProxy                                            Partitioned                   Lookup                           Data                                                                    Shared                                                                    Nothing           Functional                Sharding          Partitioning                                                                            IDGo Fish                  Reporting                  Key                                                                       Area                                                        HashCross Shard              Hadoop
Sharding• Partitioned Data• Splitting the Data  • Vertically    • Main Tables to partition  • Child Tables  • Global Tables• By Schema• Shared Nothing
Sharding• Partitioning by Key • ID – CustomerID, ProductID, App • Area – Country, City, Continent • Hash – Random for equal spread
Sharding• Which shard has the data? • Store it in a DB   • Flexible but slower • Some Function in your App   • Faster, less flexible • Proxy config file   • Faster, less flexible   • Needs some app coding
Sharding• Maintenance • Backups • Slaves• Uptime • Loosely coupled system
Sharding• Functional Partitioning • Different Apps • Share some tables                             Functional                            Partitioning
Sharding• Reporting • Go Fish – One server / Shared nothing • Cross Shard – Many servers • Hadoop – Aggregate to one reporting server
The End
The End• Questions & Answers• Email: contact@jonathanlevin.co.uk• Don’t forget to rate this tutorial
If we have time• MySQL 5.6• NoSQL• ORM• Beyond Hadoop• Bring-Your-Own-Problems

Recommended

PDF
MySQL Performance Schema in Action
PDF
MySQL Query tuning 101
PDF
Understanding Query Execution
PDF
Why Use EXPLAIN FORMAT=JSON?
PDF
PPTX
SQL Tuning, takes 3 to tango
PDF
Introduction into MySQL Query Tuning for Dev[Op]s
PDF
Preparse Query Rewrite Plugins
PDF
Introduction to MySQL Query Tuning for Dev[Op]s
PDF
Cassandra nice use cases and worst anti patterns no sql-matters barcelona
PPTX
Optimizing queries MySQL
PDF
Performance Schema for MySQL Troubleshooting
PDF
PDF
Explaining the MySQL Explain
PDF
Troubleshooting MySQL Performance
PDF
0888 learning-mysql
PDF
New features in Performance Schema 5.7 in action
PDF
Mini Session - Using GDB for Profiling
PDF
Mysql Explain Explained
PDF
Introduction into MySQL Query Tuning
PDF
Moving to the NoSQL side: MySQL JSON functions
PDF
Quick Wins
PDF
Basic MySQL Troubleshooting for Oracle Database Administrators
PDF
Optimizer Histograms: When they Help and When Do Not?
PDF
Troubleshooting MySQL Performance add-ons
PDF
Troubleshooting MySQL Performance
PDF
Performance Schema for MySQL Troubleshooting
PPTX
Oracle JSON internals advanced edition
PDF
Cleanliness is next to Godliness
PDF
Open Source ETL vs Commercial ETL

More Related Content

PDF
MySQL Performance Schema in Action
PDF
MySQL Query tuning 101
PDF
Understanding Query Execution
PDF
Why Use EXPLAIN FORMAT=JSON?
PDF
PPTX
SQL Tuning, takes 3 to tango
PDF
Introduction into MySQL Query Tuning for Dev[Op]s
PDF
Preparse Query Rewrite Plugins
MySQL Performance Schema in Action
MySQL Query tuning 101
Understanding Query Execution
Why Use EXPLAIN FORMAT=JSON?
SQL Tuning, takes 3 to tango
Introduction into MySQL Query Tuning for Dev[Op]s
Preparse Query Rewrite Plugins

What's hot

PDF
Introduction to MySQL Query Tuning for Dev[Op]s
PDF
Cassandra nice use cases and worst anti patterns no sql-matters barcelona
PPTX
Optimizing queries MySQL
PDF
Performance Schema for MySQL Troubleshooting
PDF
PDF
Explaining the MySQL Explain
PDF
Troubleshooting MySQL Performance
PDF
0888 learning-mysql
PDF
New features in Performance Schema 5.7 in action
PDF
Mini Session - Using GDB for Profiling
PDF
Mysql Explain Explained
PDF
Introduction into MySQL Query Tuning
PDF
Moving to the NoSQL side: MySQL JSON functions
PDF
Quick Wins
PDF
Basic MySQL Troubleshooting for Oracle Database Administrators
PDF
Optimizer Histograms: When they Help and When Do Not?
PDF
Troubleshooting MySQL Performance add-ons
PDF
Troubleshooting MySQL Performance
PDF
Performance Schema for MySQL Troubleshooting
PPTX
Oracle JSON internals advanced edition
Introduction to MySQL Query Tuning for Dev[Op]s
Cassandra nice use cases and worst anti patterns no sql-matters barcelona
Optimizing queries MySQL
Performance Schema for MySQL Troubleshooting
Explaining the MySQL Explain
Troubleshooting MySQL Performance
0888 learning-mysql
New features in Performance Schema 5.7 in action
Mini Session - Using GDB for Profiling
Mysql Explain Explained
Introduction into MySQL Query Tuning
Moving to the NoSQL side: MySQL JSON functions
Quick Wins
Basic MySQL Troubleshooting for Oracle Database Administrators
Optimizer Histograms: When they Help and When Do Not?
Troubleshooting MySQL Performance add-ons
Troubleshooting MySQL Performance
Performance Schema for MySQL Troubleshooting
Oracle JSON internals advanced edition

Viewers also liked

PDF
Cleanliness is next to Godliness
PDF
Open Source ETL vs Commercial ETL
PDF
Reporting: From MySQL to Hadoop/Hive
PDF
Quick And Easy Guide To Speeding Up MySQL for web developers
PPT
Caching Business Logic in the Database
PDF
How to Monitor MySQL
PDF
ETL tool evaluation criteria
Cleanliness is next to Godliness
Open Source ETL vs Commercial ETL
Reporting: From MySQL to Hadoop/Hive
Quick And Easy Guide To Speeding Up MySQL for web developers
Caching Business Logic in the Database
How to Monitor MySQL
ETL tool evaluation criteria

Similar to Scaling MySQL Strategies for Developers

KEY
10x Performance Improvements
KEY
10x improvement-mysql-100419105218-phpapp02
PDF
query optimization
PDF
Beyond php - it's not (just) about the code
PDF
Zurich2007 MySQL Query Optimization
PDF
Zurich2007 MySQL Query Optimization
PDF
Mysql query optimization
PDF
MySQL Query Optimization (Basics)
ODP
Beyond php - it's not (just) about the code
ODP
Beyond php - it's not (just) about the code
ODP
Beyond php - it's not (just) about the code
PDF
Fosdem 2012 practical_indexing
PDF
Percona Live 2012PPT: MySQL Query optimization
PDF
U C2007 My S Q L Performance Cookbook
PDF
Advance MySQL Training by Pratyush Majumdar
PPTX
Работа с индексами - лучшие практики для MySQL 5.6, Петр Зайцев (Percona)
 
PDF
MySQL Query Optimisation 101
PDF
MariaDB workshop
PPT
15 protips for mysql users pfz
PDF
Covering indexes
10x Performance Improvements
10x improvement-mysql-100419105218-phpapp02
query optimization
Beyond php - it's not (just) about the code
Zurich2007 MySQL Query Optimization
Zurich2007 MySQL Query Optimization
Mysql query optimization
MySQL Query Optimization (Basics)
Beyond php - it's not (just) about the code
Beyond php - it's not (just) about the code
Beyond php - it's not (just) about the code
Fosdem 2012 practical_indexing
Percona Live 2012PPT: MySQL Query optimization
U C2007 My S Q L Performance Cookbook
Advance MySQL Training by Pratyush Majumdar
Работа с индексами - лучшие практики для MySQL 5.6, Петр Зайцев (Percona)
 
MySQL Query Optimisation 101
MariaDB workshop
15 protips for mysql users pfz
Covering indexes

Recently uploaded

PDF
Day 1 - Cloud Security Strategy and Planning ~ 2nd Sight Lab ~ Cloud Security...
PDF
Digit Expo 2025 - EICC Edinburgh 27th November
PPTX
cybercrime in Information security .pptx
PDF
Decoding the DNA: The Digital Networks Act, the Open Internet, and IP interco...
PDF
Session 1 - Solving Semi-Structured Documents with Document Understanding
PDF
Usage Control for Process Discovery through a Trusted Execution Environment
PDF
Dev Dives: AI that builds with you - UiPath Autopilot for effortless RPA & AP...
PDF
Unser Jahresrückblick – MarvelClient in 2025
PPT
software-security-intro in information security.ppt
PDF
Our Digital Tribe_ Cultivating Connection and Growth in Our Slack Community 🌿...
PDF
Day 2 - Network Security ~ 2nd Sight Lab ~ Cloud Security Class ~ 2020
PPTX
Kanban India 2025 | Daksh Gupta | Modeling the Models, Generative AI & Kanban
PPTX
Software Analysis &Design ethiopia chap-2.pptx
PDF
December Patch Tuesday
 
PDF
The major tech developments for 2026 by Pluralsight, a research and training ...
PPTX
Cloud-and-AI-Platform-FY26-Partner-Playbook.pptx
PPTX
Conversational Agents – Building Intelligent Assistants [Virtual Hands-on Wor...
PDF
Six Shifts For 2026 (And The Next Six Years)
PDF
API-First Architecture in Financial Systems
PDF
Is It Possible to Have Wi-Fi Without an Internet Provider
Day 1 - Cloud Security Strategy and Planning ~ 2nd Sight Lab ~ Cloud Security...
Digit Expo 2025 - EICC Edinburgh 27th November
cybercrime in Information security .pptx
Decoding the DNA: The Digital Networks Act, the Open Internet, and IP interco...
Session 1 - Solving Semi-Structured Documents with Document Understanding
Usage Control for Process Discovery through a Trusted Execution Environment
Dev Dives: AI that builds with you - UiPath Autopilot for effortless RPA & AP...
Unser Jahresrückblick – MarvelClient in 2025
software-security-intro in information security.ppt
Our Digital Tribe_ Cultivating Connection and Growth in Our Slack Community 🌿...
Day 2 - Network Security ~ 2nd Sight Lab ~ Cloud Security Class ~ 2020
Kanban India 2025 | Daksh Gupta | Modeling the Models, Generative AI & Kanban
Software Analysis &Design ethiopia chap-2.pptx
December Patch Tuesday
 
The major tech developments for 2026 by Pluralsight, a research and training ...
Cloud-and-AI-Platform-FY26-Partner-Playbook.pptx
Conversational Agents – Building Intelligent Assistants [Virtual Hands-on Wor...
Six Shifts For 2026 (And The Next Six Years)
API-First Architecture in Financial Systems
Is It Possible to Have Wi-Fi Without an Internet Provider

Scaling MySQL Strategies for Developers

  • 2.
    Who Am I?•Jonathan• MySQL Consultant• Working with MySQL since 2007• Specialize in SQL, Indexing and Reporting (Big Data)
  • 3.
    Who is thisfor?* Smilies indicate ability to control area
  • 5.
    What Will ICover? Domain Knowledge This Much0% 20% 100%
  • 6.
    Solutions this tutorial will coverOccurrences Problems
  • 7.
  • 8.
  • 9.
    What Will ICover?• The top 20% of the strategies to resolve 80% of your performance problems• The strategies that are within reach of developers• Strategies that are more common and more established • From my experience. • To reduce risk
  • 10.
    Table of Contents Part One Part Three• Indexes • Read Cache• Finding • Scaling Reads Bottlenecks • Reporting Part Two • Write Buffers• Partitioning • Scaling Writes• Intensive Table • Sharding Optimization
  • 11.
  • 12.
  • 13.
    Indexes• Advantages •Speed – Use the right path• Now used in NoSQL stores• “A properly indexed database will give you very few problems” – me• My blog – “Indexing and Caching”
  • 14.
  • 15.
  • 18.
    Choosing the bestIndexPrevent Readingfrom Disk *3 *2 Prevent Extra *1 ProcessingPrevent Table Scans
  • 19.
    Indexes• 1 Star– EXPLAIN • Type • const - where id=1 • ref - where location='london' • eq_ref - where t1.id = t2.id • Extra • using where • Limitation – type • range - where id in (1,2,3,4,5)
  • 20.
    Indexes• 2 Star– EXPLAIN • Extra • using where • Using index • And Not • Using filesort • Using temporary • Limitation • Using temporary - query contains different GROUP BY and ORDER BY columns
  • 21.
    Indexes• 3 Star– EXPLAIN • Type • index • Extra • Using index • Using index for group-by
  • 22.
  • 23.
    Regular UsageSELECT ,FROM ₸ WHERE = 121; (PRIMARY KEY )
  • 24.
    Range ScanSELECT ▲, ■ FROM ₸ WHERE = 121AND BETWEEN 1 AND 100 KEY ( , )
  • 25.
    Range ScanSELECT ▲,■ FROM ₸ WHERE = 121AND IN (1,100,30,7) KEY ( , )
  • 26.
    Covering IndexSELECT ,FROM ₸ WHERE = 121 KEY ( , , )
  • 27.
    Not OptimalSELECT FROM ₸ WHERE = 121AND ♪ IN (1,100,30,7) KEY ( , )
  • 28.
    Broken RangeSELECT ▲,■ FROM ₸ WHERE = 121AND IN (1,100,30,7); KEY ( , )
  • 29.
    Sub Queries SELECT ▲, ■ FROM ₸WHERE IN (SELECT ♪ FROM ♠) KEY ( )
  • 30.
    Indexes for SortingSELECT☼, ☺ FROM ₸ WHERE = 121 GROUP BY ORDER BY KEY ( )
  • 31.
    Indexes for SortingSELECT☼, ☺ FROM ₸ WHERE = 121 GROUP BY ORDER BY KEY ( , ) or KEY ( , )
  • 32.
    Indexes for JoinsSELECT . , . FROMINNER JOIN ON . = . WHERE . = 232 KEY ( ) or KEY( )
  • 33.
    WHERE ₸.Ω =232; INNER JOIN ♫
  • 34.
    FULL SCAN ♫ INNER JOIN ₸ FILTER Ω = 232
  • 35.
    SELECT ▲, (SELECT.. FROM WHERE ) FROM ₸ WHERE ♪ IN (SELECT ♪ FROM ♠ WHERE..); “I need help optimizing the my.cnf”
  • 36.
    Clustered PK and Secondary Indexes KEY ( , , ) PRIMARY KEY ( ) KEY ( , ) (Can be used in GROUP / ORDER BY SELECT variables to make Covering index)http://www.dbasquare.com/2012/05/17/can-mysql-use-primary-key-values-from-a-secondary-index/
  • 37.
    Index Merge SELECT☺ FROM ₸WHERE =1 OR =2; KEY ( ) KEY ( )
  • 38.
    Index Merge SELECT☺FROM ₸ WHERE =1UNION (SELECT☺FROM ₸ WHERE =2) KEY ( ) KEY ( )* 5.6
  • 40.
  • 41.
    * **
  • 43.
    Gathering DataWhat youwill need:1. MySQL Slow log: MySQL >= 5.1 or Percona/MariaDB microslow patch2. Set long_query_time = 0 - for 6 to 24 hours for decent sized slow log. (Make sure host has enough space)
  • 44.
    Log ProcessingWorst ResponseQueries1. Echo ‘’ > slow.log2. mysql> set global long_query_time=0; set long_query_time = 0; flush logs;3. Wait X hours and return original value.4. pt-query-digest slow.log > slow.txt Processing should beBulky Queries done on another host--filter ‘($event->{Rows_examined} > 1000)’Write Queries--filter '($event->{Rows_affected} > 0)‘
  • 45.
    Log Processing• MySQL5.6 • Statement Digest • No need for log processing to get Digest.
  • 46.
    Worst Response Queries pt-query-digest slow.log > slow.txtRank Response time Calls R/Call Item SELECT dp_node dp_usernode 1 480.9273 16.3% 600 0.8015 dp_buddylist dp_users dp_node_access 2 322.4220 4.3% 129258 0.0025 ADMIN INIT DB 3 314.8719 4.2% 30220 0.0104 UPDATE dp_users 4 287.7109 3.8% 51606 0.0056 SET SELECT dp_node dp_usernode 5 269.3434 3.6% 600 0.4489 dp_buddylist dp_users dp_node_access 6 238.8571 3.1% 2902141 0.0001 SELECT dp_url_alias
  • 47.
    * Gap Locking * pt-stalk
  • 48.
    mysql tables inuse 4, locked 25289 lock struct(s), heap size 620984, 273785 row lock(s), undo log entries363312MySQL thread id 467, OS thread handle 0x7fceab7df700, query id 88914423 SELECT r.exchange_rate INTO destination_exchange_rate FROM exchange_rate AS r WHERE r.currency_id = NAME_CONST('destination_currency_id',6) AND r.date = NAME_CONST('day',_latin1'2012-06-30' COLLATE 'latin1_swedish_ci')Trx read view will not see trx with id >= 10ECC92B0, sees < 10ECC916FTABLE LOCK table `currency` trx id 10ECC90CC lock mode ISRECORD LOCKS space id 0 page no 261 n bits 80 index `fk_currency_status1` oftable `currency` trx id 10ECC90CC lock mode STABLE LOCK table `daily_summary` trx id 10ECC90CC lock mode ISRECORD LOCKS space id 0 page no 34829580 n bits 200 index `PRIMARY` of table`daily_summary` trx id 10ECC90CC lock mode STABLE LOCK table `exchange_rate` trx id 10ECC90CC lock mode ISTOO MANY LOCKS PRINTED FOR THIS TRX: SUPPRESSING FURTHER PRINTS
  • 49.
    *** (1) TRANSACTION:TRANSACTION13DCDF4D9, ACTIVE 0 sec starting index readmysql tables in use 1, locked 1LOCK WAIT 3 lock struct(s), heap size 1248, 2 row lock(s)MySQL thread id 2438176, OS thread handle 0x7f9a37408700, query id118341815748UPDATE sys_doctrine_lock_trackingSET timestamp_obtained = '1341839053' WHERE object_key = '1146' AND user_ident = '158' AND c_type = '137'*** (1) WAITING FOR THIS LOCK TO BE GRANTED:RECORD LOCKS space id 0 page no 48627 n bits 280 index `PRIMARY` of table`sys_doctrine_lock_tracking` trx id 13DCDF4D9 lock_mode X locks rec but notgap waitingRecord lock, heap no 207 PHYSICAL RECORD: n_fields 6; compact format; 0: len 8; hex 43616d706169676e; 1: len 4; hex 31313436; asc 1146;; 2: len 6; hex 00013dc12d7a; asc = -z;; …
  • 50.
    Bottlenecks• Locking Queries • Try to make them complete as fast as possible • JOINs vs sub query • Function wrapped around index • UDF with SQL inside • Long Transactions • Looping with short queries
  • 51.
    Misbehaving Optimizer• OptimizerHints: • USE INDEX • FORCE INDEX • IGNORE INDEX • STRIGHT_JOIN• Joins • LEFT sometimes faster then INNER* Too many indexes confuse the optimizer
  • 53.
    Bottlenecks• Virtualization• Increase (obscenely) innodb_log_file_size • Needs restart + deleting old log files • EXT3• General I/O improvements • innodb-flush-log-at-trx-commit • Sync binlog Group Commit • Xa support
  • 54.
    Bottlenecks• General I/Oimprovements • Percona server • Better flushing to disk • Less mutexes • Upgrade MySQL • Same reasons as above • Innodb-io-capacity http://www.wmarow.com/strcalc/
  • 55.
    Database Upgrades• MySecret Sauce for smooth MySQL Migrations 1. Upgrade the dev/staging DBs with desired version 2. Wait 1-2 months till silky-smooth • All features have been tested • All the query issues have been fixed 3. Upgrade servers • Down to Up – Slaves first
  • 56.
    Bottlenecks• Mutexes •Query Cache • “Freeing items” in processlist• Network Batch Processes • Skip-name-resolve • net_write_timeout / read_timeout • thread_cache_size Lots of connections
  • 57.
    Homework• Haven’t talkedfully about: SQL and EXPLAIN• Webinars • Indexes - percona.tv/percona-webinars/tools-and- techniques-for-index-design • Explain - percona.tv/percona-webinars/explain-demystified• Websites • http://www.myxplain.net
  • 58.
  • 59.
  • 60.
    ColumnsRows Big Table
  • 61.
    Columns Partition Partition Partition PartitionAlgorithm Partition Rows Partition Partition Partition Partition
  • 62.
    Select Algorithm Inserting Parallelize Data Unique Key Algorithm Reduce DataForeign Keys Issues Use Cases Primary Key Overhead Partitioning ID Benefits Automatic TimeB-TreeLevels Manual Hash Short Scans Time 80-90% Archive DB Usage Shards
  • 63.
    Partitioning Parallelize Data Reduce Data• Use Cases Use Cases • Reducing Data – Only get the partition/table that you need • Parallelizing Data - Get an equal amount of data from each partition in parallel• Benefits Benefits • Shorter table scans B-Tree • Less levels for index scans Levels Short Scans
  • 64.
    Issues• Algorithm Select Inserting Algorithm • INSERTs Algorithm • SELECTs Issues• Keys Unique Key • Foreign Key Foreign Keys • Unique Key Primary Key Overhead • Primary Key Overhead • Increases Table size • Can change indexes
  • 65.
    Partition Types• Range•List ID• Hash Automatic Time• Key• Columns Hash• Sub-partitioning
  • 66.
    Partitioning by Usage vs Partitioning by MaintenanceRank Response time Calls Item SELECT address FROM WHERE 1 480.9273 26.3% 129258 BETWEEN ‘2012-11-10’ and ‘2012-11- 17’ SELECT total FROM WHERE 2 322.4220 14.3% 600 BETWEEN ‘2012-11-01’ and ‘2012-11-30’ UPDATE SET active=1 WHERE = 3 34.8719 4.2% 30220 17635376 SELECT dispatch_time FROM 4 28.7109 3.8% 51606 WHERE = 7387612
  • 67.
    CREATE TABLE orders id int unsigned not null auto_increment, `date` date not null, … Can also doPRIMARY KEY (date, id), PRIMARY KEY (id, date)KEY id (id), KEY date (date)..) ENGINE=InnoDB DEFAULT CHARSET=utf8PARTITION BY ( (date))(PARTITION VALUES LESS THAN ( (‘2012-01-01’),PARTITION VALUES LESS THAN ( (‘2012-06-01’)),PARTITION VALUES LESS THAN )
  • 68.
    Manual Partitioning• Archive- Main table & Archive Table• Time – Create table per year, per month..• Shards – Table per country * Foreign keys Manual Time Archive Shards
  • 70.
  • 71.
    Intensive Table Optimization Once upon a time, I was researchingways to make a database working set fit asmuch as possible to memory…
  • 72.
  • 73.
    Intensive Table Optimization1.People are usually very liberal with data type sizes2. There were (usually) so many indexes that: 1. They multiplied the table size 2. Were not efficient compares to how the table is used 3. Confused the optimizer3. Discovered partitions were not used or misused 1. Discovered sub partitions 2. Primary Key alignment4. Discovered InnoDB compression
  • 74.
    Intensive Table OptimizationThereare tools to help with this, but…
  • 75.
  • 76.
    Intensive Table Optimization Data Types Slowest Queries BottleneckUser Statistics Optimizations Tables Optional Table Sizes Query Logs Partitions Indexes Foreign Keys
  • 77.
    Gathering DataWhat youwill need:1. MySQL Slow log: MySQL >= 5.1 or Percona/MariaDB microslow patch2. Set long_query_time = 0 - for 6 to 24 hours for decent sized slow log. (Make sure host has enough space) Slowest Queries Bottleneck Tables
  • 78.
    Gathering DataHelpful (Optional):1.Percona/Mariadb user_statistics patch2. Get list of most read/written tables3. Get list of used and un-used indexes4. List of largest tables Table Sizes User Statistics Bottleneck Tables
  • 79.
    Worst Response Queries pt-query-digest slow.log > slow.txtRank Response time Calls Item 1 8589.9513 27.5% 231051 UPDATE dp_users 2 4752.6688 15.2% 257235 SELECT dp_cache_menu 3 1606.4946 5.1% 183542 SELECT community_chats 4 1418.9034 4.5% 259939 SELECT dp_cache 5 564.3305 1.8% 7970165 SELECT dp_url_alias 6 495.0092 1.6% 44940 SELECT dp_event dp_node
  • 80.
    Table StatisticsSELECT table_name,FROMinformation_schema.table_statisticsORDER BY DESC LIMIT 5; table_name rows_read dp_users 2302477894ROWS_CHANGED dp_node 1231318439ROWS_CHANGED_X_INDEXES dp_comments 1071462211 dp_userpoints 1033073070 dp_search_index 260154684
  • 81.
    Table Statistics Worst Response Tables --group-by tablesRank Response time Calls Item 1 7975.4487 6.5% 124384 advertisement 2 5554.1435 4.5% 1834 info 3 4915.4816 4.0% 208 placement 4 4902.7644 4.0% 158 advert_summary
  • 82.
    Table Sizes Table_Name Rows Data Idx Total_size Idxfractotal_daily_summary 610M 77G 88G 165G 1.15advert_summary 478M 57G 45G 102G 0.78log_messages 92M 47G 10G 57G 0.21SELECT CONCAT(TABLE_SCHEMA, '.', TABLE_NAME) AS TABLE_NAME, CONCAT(ROUND(TABLE_ROWS / 1000000, 2), 'M') ROWS, CONCAT(ROUND(DATA_LENGTH / ( 1024 * 1024 * 1024 ), 2), 'G') DATA, CONCAT(ROUND(INDEX_LENGTH / ( 1024 * 1024 * 1024 ), 2), 'G') IDX, CONCAT(ROUND(( DATA_LENGTH + INDEX_LENGTH ) / ( 1024 * 1024 * 1024 ), 2), 'G') TOTAL_SIZE, ROUND(INDEX_LENGTH / DATA_LENGTH, 2) IDXFRACFROM INFORMATION_SCHEMA.TABLESORDER BY DATA_LENGTH + INDEX_LENGTH DESC http://www.mysqlperformanceblog.comLIMIT 10; /2008/03/17/researching-your-mysql-table-sizes/
  • 83.
    Which table needs your attention?Table Size
  • 84.
    Intensive Table Optimization•Table Targeting • The most “worthy” table to focus your attention on • Biggest bang for your buck• If you know which table is the most troublesome • Ignore most of the investigations • Apart from slow log • Investigations help understand DB usage
  • 85.
    Optimizations Compression Data Types Query Logs Partitions IndexesSub Partitioning Don’t Need Foreign Keys
  • 86.
    Intensive Table Optimization•Datatypes • SELECT * FROM table G • Example: Tinyint instead of Bigint: (7 bytes row + 7bytes index) * 350million rows = 4.9Gb • Enum instead of Varchar • Remove NULLs when not needed
  • 87.
    Intensive Table Optimization•Compression • Best for tables with a lot of varchar/text • Compress table by x2, x4, x8.. • Need to experiment with innodb_strict = on; • On my tests (5.5) – Very very slow • Alter tables • INSERTS/UPDATES/DELETES Optimizations Compression
  • 88.
    1. Slow Log Filtered by Get Data Target Table Query Digest New Results EXPLAIN 3. 2. Index-Usage Make Test Assumptions Assumptions 4. Deploy
  • 89.
    Target Table ProcessingFilterLog:pt-query-digest slow.log--filter '$event->{arg} =~ m/dp_users /'--no-report --print >dp_users.logWorst Queries from new log:pt-query-digest dp_users.log --limit 100%>tbl_dp_users.txt
  • 90.
    ResponseRanktime Calls Item 209.2863 UPDATE dp_users SET access = 133******3 WHERE 1 88850 10.7% = 23****01G 162.2711 3 1309010 SELECT access FROM dp_users WHERE = 21***4G 8.3% 139.9009 SELECT uid, name FROM dp_users WHERE =1 4 197 7.1% ORDER BY DESCG 133.8691 SELECT * FROM dp_users u WHERE = 5 327 6.8% 's******s'G SELECT name, created, picture FROM dp_users WHERE 109.6903 6 29152 picture !='' AND = '1' AND BETWEEN 5.6% '133*****0' AND '133*****60'G 92.9095 SELECT dp_node dp_users using ( ) 7 360642 4.7% dp_node_revisions 74.2426 SELECT * FROM dp_users u WHERE = hoa****rio' 8 106 3.8% AND = '3837********5f9b' AND = 1G
  • 91.
    Partitioning by UsageRank Response time Calls Item SELECT address FROM orders WHERE date 1 480.9273 26.3% 129258 BETWEEN ‘2012-11-10’ and ‘2012-11-17’ SELECT total FROM orders WHERE date 2 322.4220 14.3% 600 BETWEEN ‘2012-11-01’ and ‘2012-11-30’ UPDATE order SET active=1 WHERE id = 3 34.8719 4.2% 30220 17635376 SELECT dispatch_time FROM order WHERE 4 28.7109 3.8% 51606 id = 7387612
  • 92.
    Testing AssumptionsSELECT uid,name FROM WHERE = 1 ORDER BY DESCG1.30secsSELECT uid, name FROMWHERE = 1 ORDER BYDESCG0.56secs Query Digest EXPLAIN
  • 93.
    Test Environment• Hardwareenvironment similar to live• Data size similar to live environment: • Replicating slave • Cannot change datatypes on MIXED/ROW replication • Create table2 and run queries against it • Xtrabackup – full replica New Results • Script with Mysqldump + WHERE • mysqldump --databases main --tables table1 table2 –where “date > now() – interval 30 day” > dump.sql • Mysqldump –all-database –ignore-table main.table1 main.table2 >> dump.sql
  • 94.
    Final Tweaking(Remember thetable log file – dp_users.log ?)pt-index-usage• pt-index-usage dp_users.log --host 127.0.0.1 --tables dp_users >idx_dp_users.txt• Go over recommendations Index-Usage Test Assumptions
  • 95.
    Deploy Strategies1. RollingServers2. pt-online-schema-change Deploy3. Two-part move a. Create new table – table2 b. Insert table rows that will not change – INSERT INTO table2 SELECT * FROM table1 WHERE date <= curdate() – interval 30 day; c. Short downtime d. Rename table1 to table3; rename table2 to table1; e. INSERT IGNORE INTO table1 SELECT * FROM table3 WHERE date >= curdate() – interval 30 day;4. Alter table – long downtime (pre 5.6, maybe)
  • 96.
  • 97.
    Slow Log Filtered byGet Data Target Table Query Digest EXPLAIN New Results Index-Usage Test Make Assumptions Assumptions Deploy
  • 98.
  • 99.
    Streaming Reverse File Proxy VolatileBrowser Shield Cache Query Cache Page Cache Denormalize 2nd Level Summary Column Tables Cache 3rd Level Subtotal Attributes Data Warehouse Conditional
  • 100.
    Read Cache• Outsidethe database • Page Cache • Query Cache• Inside the database • Column Cache • Summary Table* Complexity
  • 101.
    Page Cache• BrowserCache • Etag, Expires, Last-modified• Reverse Proxy • Squid, Varnish, Nginx, Apache, Proprietary.• File/Full page cache • mod_file_cache, Zend_Cache_Backend • W3 Total Cache, sfSuperCache* Stale
  • 102.
  • 103.
    Query Cache• Volatile • Memcached, Redis, Hibernate Cache, Arrays.. • On-Request, Time-to-Live, Stale and Cache Stampede• Streaming • Interval / Async, Stale, Common Queries• Shield – Mongo Shield • Script/Tool Replication, Dependency • Aggregation • Complexity / Layers
  • 104.
    Mongo Shield 147cmimg.photobucket.com/albums/v158/keris_hanuman/Afbeelding1455.jpg
  • 106.
    Sticky StickySessions SessionsMemcached Memcached Cart MySQL
  • 107.
    Memcached Sessions MySQL Cart
  • 108.
  • 109.
    Column Cache• Denormalize • Additional Column(s) to prevent JOINs • Maintenance, Space on disk • Example: CustomerID, OrderID, OrderItemID• Sub Total • Prevent additional slow GROUP BY queries • Maintenance, Generation, Space on disk • Example: totalPurchases, moneyOwed* Space vs Speed
  • 110.
    Column Cache• Conditional • Store conditional (True/False) logic • Prevents recalculating result – another query • Can prevent rewriting code • Example: isDone, hasReview, aboveAvg• Attributes • ENUM datatype • SET datatype - ARRAY of options • Prevents JOINs • May save space
  • 111.
    Summary TablesAn additionaltable which consistsof an aggregation of another table orseveral JOIN’d tables. Summary Tables
  • 112.
    SELECT ...FROM main_tablet1 INNER JOIN table2 t2 on t1.orderid = t2.id INNER JOIN table3 t3 on t1.customerid = t3.id INNER JOIN table4 t4 on t1.addressid = t4.id INNER JOIN table5 t5 on t2.supplierid = t5.id INNER JOIN table6 t6 on t2.warehouse = t6.id INNER JOIN table2 t7 on t6.addressid = t7.id INNER JOIN table8 t8 on t1.productid = t8.id INNER JOIN table9 t9 on t1.buyerid = t9.id INNER JOIN table10 t10 on t1.officeid = t10.idWHEREt1.date between '2012-11-01' and '2012-11-30'GROUP BY t1.date
  • 113.
    Summary TablesProcessed 1.2millionrowsReturned 30 rowsTime 17.52 minutes
  • 114.
    Summary TablesCREATE TABLEsummary_table(primary key (date,addressid,productid)) asSELECT ...FROM main_table t1INNER JOIN table2 t2 on t1.orderid = t2.idINNER JOIN table3 t3 on t1.customerid =t3.idINNER JOIN table4 t4 on t1.addressid = t4.idINNER JOIN table6 t6 on t2.warehouse = t6.idGROUP BY t1.date, t1.addressid, t1.productid
  • 115.
    Summary TablesSELECT ...FROMsummary_table t1INNER JOIN table5 t5 on t2.supplierid = t5.idINNER JOIN table2 t7 on t6.addressid = t7.idINNER JOIN table8 t8 on t1.productid = t8.idINNER JOIN table9 t9 on t1.buyerid = t9.idINNER JOIN table10 t10 on t1.officeid = t10.idWHEREt1.date between '2012-11-01' and '2012-11-30‘GROUP BY t1.date
  • 116.
    Summary TablesProcessed 35000rowsReturned 30 rowsTime 0.75 seconds
  • 118.
    Summary Tables as anAnalytics Sub-System
  • 119.
    Database Design Operational System Analytic System Comparison Execution of a business Measurement of aPurpose process business process Insert, Update, Query,Primary Interaction Query DeleteDesign Optimization Update concurrency High-performance query Entity-relationship (ER) Dimensional design (StarDesign Principle 3rd Normal form (3NF) schema or cube) amazon.co.uk/Schema-Complete-Reference-Christopher-Adamson/
  • 120.
    Data Warehouses Fact Tables Dimension Tables• Measurement • Context• Narrow • Wide• Long • Short• Most of the • Filters and descriptive data data
  • 121.
  • 122.
    Operational DesignCustomers Orders Order Items Addresses Products
  • 123.
  • 124.
    Star SchemaDate dim Address dim OrderItems Fact Products dim Customers dim
  • 125.
  • 126.
    A lot more InnoDB settings Read Cache Buffer Pool Indexes Sharding Intensive Scaling Reads Table OptimizationGalera Read Partitioning Slaves Better Hardware Sub Partitioning Another Read/Write Master Splitting IO Memory
  • 127.
    Scaling Reads• InnoDBBuffer Pool • Cache Warming• Read buffer A lot more settings InnoDB• Sort Buffer Buffer Pool• Join Buffer• Temp Table size / on disk
  • 128.
    Scaling Reads• Better Hardware • Disk I/O • Memory BetterHardware IO Memory
  • 129.
    Scaling Reads• ReadSlaves • Read/Write Splitting • Master/Master • Galera
  • 130.
  • 131.
  • 135.
    db3db1 db2
  • 136.
    Columnar £££ A lot more Store settings InnoDB Buffer Pool SummaryHadoop Tables Indexes Intensive OLAP Reporting Table Cubes Optimization Sharding Partitioning ReportingCross Shard Better Slaves Joins Hardware Sub Partitioning Different Indexes IO Memory
  • 137.
    Reporting• Reporting Slaves • Different Indexes • No Foreign Keys • Partitioning • If ROW-replication: • Must have same data types Reporting Slaves Different Indexes
  • 138.
    Reporting• Sharding• Cross-shard JOINs • Go Fish • Aggregations Hadoop
  • 139.
    Reporting •Summary tables - aggregations • Scripts • Hadoop • Ready –Made reports • OLAP Cubes • Columnar Store • £££* RDBMS very fast at GROUP BY
  • 140.
    Battery A lot more settings RAID Innodb Log Write-back File Size Local Server Write Buffers Storage MySQL ETL settings QueueHadoop Summarized CRUSH HandCode Writes
  • 141.
    Write Buffers• RAIDcard • BBU • Write-back vs write-through • Battery Learning/Drain
  • 142.
    Write Buffers• Innodblog file size • Buffer pool * dirty read (%) * io capacity A lot more settings Innodb Log* Virtual Environment File Size
  • 143.
    Write Buffers• innodb-flush-log-at-trx-commit•Sync binlog • Support xa MySQL settings
  • 144.
    Write Buffers• ActiveMQ,RabbitMQ, ZeroMQ, Gearman • Not ACID, may need redundancy• Summarized Writes • Memcached counters + interval writes
  • 145.
    Write Buffers• LocalStorage (web/app servers) • Memcached • SQLite (disk/in-memory) • Log file Local Server <- Most popular Storage • MySQL• Independent, isolated ETL• Need to fetch data • Prevent missed data and duplicates
  • 146.
    Write Buffers• FetchingData • Hand code • ETL tool – Pentaho/Talend • Flume• Aggregation/Processing • Hadoop <- Very Popular • Google CRUSH
  • 147.
    Google CRUSH Toolsgunzipoldlog.log.gz |convdate -f 3 -i "%d/%b/%Y:%H:%M:%S%z" -o "%Y-%m-%d" |reorder -k 3,2 |aggregate -p -k 3,2 -c 1 |csvformat |gzip - > newlog.csv.gz
  • 148.
    Innodb Log File Size Write Buffers Sharding Indexes Intensive Scaling Writes Table Hardware Optimization Remove Partitioning BottlenecksOS Settings Sub Partitioning MySQL Bypass SQL Settings Layer
  • 149.
    Scaling Writes• LessIndexes • Less writes• Partitioning • Less table maintenance • Less B-tree levels • Less need to organize blocks • Algorithm overhead • Mutex/Locks
  • 150.
    Scaling Writes• BypassSQL Layer SQL Parser • Innodb/Memcached • HandlerSocket Optimizer Storage Engine * Raik
  • 151.
    Scaling Writes• IOScheduler• File System (+ nobarrier, noatime, nodiratime) • EXT 3 • EXT 4 • XFS • ZFS• Block Sizes
  • 152.
    Scaling Writes• FasterI/O • Faster Disks • SAS • SSD • RAID + cache Hardware • PCIe SSD • FusionIO • Virident
  • 153.
    Scaling Writes• Master/Master• Does not scale writes • Writes still need to replicate• Sharding • Does scale writes
  • 154.
    App Global Data Child Data Function DB By SchemaProxy Partitioned Lookup Data Shared Nothing Functional Sharding Partitioning IDGo Fish Reporting Key Area HashCross Shard Hadoop
  • 155.
    Sharding• Partitioned Data•Splitting the Data • Vertically • Main Tables to partition • Child Tables • Global Tables• By Schema• Shared Nothing
  • 156.
    Sharding• Partitioning byKey • ID – CustomerID, ProductID, App • Area – Country, City, Continent • Hash – Random for equal spread
  • 157.
    Sharding• Which shardhas the data? • Store it in a DB • Flexible but slower • Some Function in your App • Faster, less flexible • Proxy config file • Faster, less flexible • Needs some app coding
  • 158.
    Sharding• Maintenance •Backups • Slaves• Uptime • Loosely coupled system
  • 159.
    Sharding• Functional Partitioning• Different Apps • Share some tables Functional Partitioning
  • 160.
    Sharding• Reporting •Go Fish – One server / Shared nothing • Cross Shard – Many servers • Hadoop – Aggregate to one reporting server
  • 161.
  • 164.
    The End• Questions& Answers• Email: contact@jonathanlevin.co.uk• Don’t forget to rate this tutorial
  • 165.
    If we havetime• MySQL 5.6• NoSQL• ORM• Beyond Hadoop• Bring-Your-Own-Problems

[8]ページ先頭

©2009-2025 Movatter.jp