MySQL Blog Archive
For the latest blogs go toblogs.oracle.com/mysql
MySQL 8.0 Labs: JSON aggregation functions
Posted on Oct 6, 2016 byCatalin Besleaga
Category:Document Store, JSON
Tags:json

In MySQL 5.7 we introduced JSON functionality into the MySQL Server. This work included the introduction of a JSON data type, virtual columns and a set of approximately 20 SQL functions that allow you to manipulate and search JSON data on the server side.

The JSON functionality has been well received, and in MySQL 8.0 we have plans to improve it in a number of ways. This post outlines improvements to the SQL functions with the addition of aggregate functions.

TL;DR

Starting with MySQL 8.0 (lab release)* two  new aggregation functions were added and can be used to combine data into JSON arrays/objects:

  • JSON_ARRAYAGG()
  • JSON_OBJECTAGG()

*(To download MySQL 8.0-labs release go tothis link and choose “MySQL Server 8.0.0 Optimizer)

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
mysql>CREATETABLE`t1`(
    ->  `key`  varchar(8)DEFAULTNULL,
    ->  `grp`varchar(8)DEFAULTNULL,
    ->  `val`varchar(8)
    ->)ENGINE=InnoDBDEFAULTCHARSET=latin1;
QueryOK,0rowsaffected(0,01sec)
 
mysql>
mysql>INSERTINTOt1(`key`,`grp`,`val`)VALUES
    ->("key1","g1","v1"),
    ->("key2","g1","v2"),
    ->("key3","g2","v3");
QueryOK,3rowsaffected(0,01sec)
Records:3  Duplicates:0  Warnings:0
 
mysql>
mysql>SELECTJSON_ARRAYAGG(`key`)AS`keys`FROMt1;
+--------------------------+
|keys                    |
+--------------------------+
|["key1",
    "key2",
    "key3"]
|
+--------------------------+
1rowinset(0,00sec)
 
mysql>SELECTgrp,JSON_ARRAYAGG(`key`)AS`keys_grouped`FROMt1  GROUPBYgrp;
+------+------------------+
|grp  |keys_grouped    |
+------+------------------+
|g1  |["key1","key2"]|
|g2  |["key3"]        |
+------+------------------+
2rowsinset(0,00sec)
 
mysql>SELECTJSON_OBJECTAGG(`key`,val)AS`key_val`FROMt1;
+--------------------------------------------+
|key_val                                    |
+--------------------------------------------+
|{"key1":"v1",
    "key2":"v2",
    "key3":"v3"}
|
+--------------------------------------------+
1rowinset(0,00sec)
 
mysql>SELECTgrp,JSON_OBJECTAGG(`key`,val)AS`key_val_grouped`FROMt1GROUPBYgrp;
+------+------------------------------+
|grp  |key_val_grouped              |
+------+------------------------------+
|g1  |{"key1":"v1","key2":"v2"}|
|g2  |{"key3":"v3"}              |
+------+------------------------------+
2rowsinset(0,00sec)

Now for the less impatient:

Let’s think about this scenario: you have a database which contains both structured and semi-structured data and you’ve decided to adopt the EAV model (Entity–Attribute–Value). The tables will look more or less like this:

You have aproduct table which contains the common attributes:

1
2
3
4
5
6
7
CREATETABLE`product`(
  `id`int(11)NOTNULLAUTO_INCREMENT,
  `name`varchar(120)DEFAULTNULL,
  `manufacturer`varchar(120)DEFAULTNULL,
  `price`int(11)DEFAULTNULL,
  PRIMARYKEY(`id`)
)ENGINE=InnoDBDEFAULTCHARSET=latin1;

Then you have theattribute table containing all the non-common attributes that a product might have:

1
2
3
4
5
6
CREATETABLE`attribute`(
  `id`int(11)NOTNULLAUTO_INCREMENT,
  `name`varchar(120)DEFAULTNULL,
  `description`varchar(256)DEFAULTNULL,
  PRIMARYKEY(`id`)
)ENGINE=InnoDBDEFAULTCHARSET=latin1;

With some possible entries:

1
2
3
4
5
6
7
8
9
10
11
INSERTINTOattribute(id,name)VALUES
(1,"color"),
(2,"material"),
(3,"style"),
(4,"bulb_type"),
(5,"usage"),
(6,"cpu_type"),
(7,"cpu_speed"),
(8,"weight"),
(9,"battery_life"),
(10,"fuel_type");

And finally thevalue table which combines the product key, the attribute key with the actual value.

1
2
3
4
5
6
CREATETABLE`value`(
  `prod_id`int(11)NOTNULL,
  `attribute_id`int(11)NOTNULL,
  `value`text,
  PRIMARYKEY(`prod_id`,`attribute_id`)
)ENGINE=InnoDBDEFAULTCHARSET=latin1;

Now let’s insert a few products and their attributes:

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
INSERTINTOproduct(id,name,manufacturer,price)VALUES
(1,"LED Desk Lamp","X",26);
 
INSERTINTOvalueVALUES
(1,1,"black"),
(1,2,"plastic"),
(1,3,"classic"),
(1,4,"LED"),
(1,5,"Indoor use only");
 
INSERTINTOproduct(id,name,manufacturer,price)VALUES
(2,"Laptop","Y",800);
 
INSERTINTOvalueVALUES
(2,1,"blue"),
(2,6,"quad core"),
(2,7,"3400 mhz"),
(2,8,"2,1 kg"),
(2,9,"9h");
 
INSERTINTOproduct(id,name,manufacturer,price)VALUES
(3,"Grill","Z",300);
 
INSERTINTOvalueVALUES
(3,1,"black"),
(3,8,"5 kg"),
(3,10,"gas");

If you need to select complete products that combines all the attribute keys and values as JSONobject, though combining the structured data (in product) and the semi-structured data (in attribute and in value):  you can use the JSON_OBJECTAGGaggregation function:

MySQL
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
SELECT
JSON_OBJECT("key",p.id,
            "title",p.name,
            "manufacturer",p.manufacturer,
            "price",p.price,
            "specifications",JSON_OBJECTAGG(a.name,v.value))asproduct
FROMproductaspJOINvalueasvONp.id=v.prod_id
    JOINattributeasaONa.id=v.attribute_id
GROUPBYv.prod_id;
 
+---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
|product                                                                                                                                                                                      |
+---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
|{
  "key":1,
  "price":26,
  "title":"LED Desk Lamp",
  "manufacturer":"X",
  "specifications":{
    "color":"black",
    "style":"classic",
    "usage":"Indoor use only",
    "material":"plastic",
    "bulb_type":"LED"
  }
}|
|{
  "key":2,
  "price":800,
  "title":"Laptop",
  "manufacturer":"Y",
  "specifications":{
    "color":"blue",
    "weight":"2,1 kg",
    "cpu_type":"quad core",
    "cpu_speed":"3400 mhz",
    "battery_life":"9h"
  }
}|
|{
  "key":3,
  "price":300,
  "title":"Grill",
  "manufacturer":"Z",
  "specifications":{
    "color":"black",
    "weight":"5 kg",
    "fuel_type":"gas"
  }
}|
+---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
3rowsinset(0,01sec)

To select anarray containing all the possible attribute keys for each product JSON_ARRAYAGG  can be used:

MySQL
1
2
3
4
5
6
7
8
9
10
11
12
13
SELECTp.id,JSON_ARRAYAGG(a.name)asproduct_attributes
FROMproductaspJOINvalueasvONp.id=v.prod_id
    JOINattributeasaONa.id=v.attribute_id
GROUPBYv.prod_id;
 
+----+--------------------------------------------------------------+
|id|product_attributes                                          |
+----+--------------------------------------------------------------+
|  1|["color","style","usage","material","bulb_type"]        |
|  2|["cpu_type","weight","color","cpu_speed","battery_life"]|
|  3|["color","fuel_type","weight"]                            |
+----+--------------------------------------------------------------+
3rowsinset(0,01sec)

The functions can also be used to help you migrate to a new schema having the semi-structured data stored in JSON columns:

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
CREATETABLEcomplete_productAS
(SELECTp.id,p.name,p.manufacturer,p.price,
        JSON_OBJECTAGG(a.name,v.value)assemi_structured_data
FROMproductaspJOINvalueasvONp.id=v.prod_id
      JOINattributeasaONa.id=v.attribute_id
GROUPBYv.prod_id);
 
SELECT *FROMcomplete_product;
+----+---------------+--------------+-------+---------------------------------------------------------------------------------------------------------------+
|id|name          |manufacturer|price|semi_structured_data                                                                                          |
+----+---------------+--------------+-------+---------------------------------------------------------------------------------------------------------------+
|  1|LEDDeskLamp|X            |    26|{"color":"black","style":"classic","usage":"Indoor use only","material":"plastic","bulb_type":"LED"}|
|  2|Laptop        |Y            |  800|{"color":"blue","weight":"2,1 kg","cpu_type":"quad core","cpu_speed":"3400 mhz","battery_life":"9h"}|
|  3|Grill        |Z            |  300|{"color":"black","weight":"5 kg","fuel_type":"gas"}                                                      |
+----+---------------+--------------+-------+---------------------------------------------------------------------------------------------------------------+
3rowsinset(0,00sec)

Can this be achieved in 5.7 too? Yes, but it’s more complicated.

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
SELECTp.id,
      CAST(CONCAT('[',
              GROUP_CONCAT(JSON_QUOTE(a.name))
              ,']')
            ASJSON)ASproduct_attributes
FROMproductaspJOINvalueASvONp.id=v.prod_id
JOINattributeasaONa.id=v.attribute_idGROUPBYv.prod_id;
 
 
SELECT
    JSON_OBJECT("key",p.id,
              "title",p.name,
              "manufacturer",p.manufacturer,
              "price",p.price,
              "specifications",
              CAST(CONCAT('{',
                          GROUP_CONCAT(CONCAT(JSON_QUOTE(a.name),
                                              ':',
                                              JSON_QUOTE(v.value))),
                          '}')
                    ASJSON)
        )ASproduct
FROMproductASpJOINvalueASvONp.id=v.prod_id
  JOINattributeASaONa.id=v.attribute_id
GROUPBYv.prod_id;

 

The addition of aggregate functions makes it easier to aggregate data into JSON. Please try out these new features, and let us know your feedback!

Thank you for using MySQL.