Impute

To impute missing data in Vega-Lite, you can use the impute transform, either via an encoding field definition or via a transform array.

The impute transform groups data and determines missing values of the key field within each group. For each missing value in each group, the impute transform will produce a new tuple with the imputed field generated based on a specified imputation method (by using a constant value or by calculating statistics such as mean within each group).

Impute in Encoding Field Definition

// A Single View or a Layer Specification
{
  ...,
  "mark/layer": ...,
  "encoding": {
    "x": {
      "field": ...,
      "type": "quantitative",
      "impute": {...},               // Impute
      ...
    },
    "y": ...,
    ...
  },
  ...
}

An encoding field definition can include an impute definition object to generate new data objects in place of the missing data.

The impute definition object can contain the following properties:

Property	Type	Description
frame	(Null | Number)[]	A frame specification as a two-element array used to control the window over which the specified method is applied. The array entries should either be a number indicating the offset from the current data object, or null to indicate unbounded rows preceding or following the current data object. For example, the value `[-5, 5]` indicates that the window should include five objects preceding and five objects following the current object. Default value: `[null, null]`, indicating that the window includes all objects.
keyvals	Any[] | ImputeSequence	Defines the key values that should be considered for imputation: an array of key values or an object defining a number sequence. If provided, this will be used in addition to the key values observed within the input data. If not provided, the values will be derived from all unique values of the `key` field. For `impute` in `encoding`, the key field is the x-field if the y-field is imputed, or vice versa. If there is no impute grouping, this property must be specified.
method	String	The imputation method to use for the field value of imputed data objects. One of `"value"`, `"mean"`, `"median"`, `"max"` or `"min"`. Default value: `"value"`
value	Any	The field value to use when the imputation `method` is `"value"`.

For impute in encoding, the grouping fields and the key field (for identifying missing values) are determined automatically. Values are grouped by the fields of the mark property channels, the key channel, and the detail channel. If the x-field is imputed, the y-field is the key field, so any y-value missing from a group leads to a new imputed tuple, and vice versa.

In this example, we impute the y-field ("b"), so the x-field ("a") will be used as the "key". The values are then grouped by the field "c" of the color encoding. The impute transform then determines missing key values within each group. In this case, the data tuple where "a" is 3 and "c" is 1 is missing, so a new tuple with "a" = 3, "c" = 1, and "b" = 0 will be added.
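A minimal sketch of a spec matching this description. The inline data values and the bar mark are assumptions chosen for illustration; only the fields "a", "b", "c" and the missing tuple ("a" = 3, "c" = 1) come from the text above.

```json
{
  "$schema": "https://vega.github.io/schema/vega-lite/v5.json",
  "description": "Illustrative sketch: the data values are assumed, not taken from the original page.",
  "data": {
    "values": [
      {"a": 0, "b": 28, "c": 0}, {"a": 0, "b": 91, "c": 1},
      {"a": 1, "b": 43, "c": 0}, {"a": 1, "b": 55, "c": 1},
      {"a": 2, "b": 81, "c": 0}, {"a": 2, "b": 53, "c": 1},
      {"a": 3, "b": 19, "c": 0}
    ]
  },
  "mark": "bar",
  "encoding": {
    "x": {"field": "a", "type": "nominal"},
    "y": {
      "aggregate": "sum",
      "field": "b",
      "type": "quantitative",
      "impute": {"value": 0}
    },
    "color": {"field": "c", "type": "nominal"}
  }
}
```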

Besides imputing with a constant value, we can also use a method (such as "mean") on existing data points to generate the missing data.
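As a sketch of the method-based variant, the spec below swaps the constant for `"method": "mean"`. The data values and the line mark are again assumptions for illustration.

```json
{
  "$schema": "https://vega.github.io/schema/vega-lite/v5.json",
  "description": "Illustrative sketch: the data values are assumed.",
  "data": {
    "values": [
      {"a": 0, "b": 28, "c": 0}, {"a": 0, "b": 91, "c": 1},
      {"a": 1, "b": 43, "c": 0}, {"a": 1, "b": 55, "c": 1},
      {"a": 2, "b": 81, "c": 0}, {"a": 2, "b": 53, "c": 1},
      {"a": 3, "b": 19, "c": 0}
    ]
  },
  "mark": "line",
  "encoding": {
    "x": {"field": "a", "type": "quantitative"},
    "y": {
      "field": "b",
      "type": "quantitative",
      "impute": {"method": "mean"}
    },
    "color": {"field": "c", "type": "nominal"}
  }
}
```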

The frame property of impute can be used to control the window over which the specified method is applied. Here, the frame is [-2, 2], which indicates that the window for calculating the mean includes two objects preceding and two objects following the current object.
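A sketch of how such a frame might be written. The data set is hypothetical and made slightly longer so that a two-object window on each side differs from the full group; the tuple with "a" = 3 and "c" = 1 is deliberately left out.

```json
{
  "$schema": "https://vega.github.io/schema/vega-lite/v5.json",
  "description": "Illustrative sketch: the data values are assumed.",
  "data": {
    "values": [
      {"a": 0, "b": 28, "c": 0}, {"a": 0, "b": 91, "c": 1},
      {"a": 1, "b": 43, "c": 0}, {"a": 1, "b": 55, "c": 1},
      {"a": 2, "b": 81, "c": 0}, {"a": 2, "b": 53, "c": 1},
      {"a": 3, "b": 19, "c": 0},
      {"a": 4, "b": 62, "c": 0}, {"a": 4, "b": 47, "c": 1},
      {"a": 5, "b": 35, "c": 0}, {"a": 5, "b": 70, "c": 1}
    ]
  },
  "mark": "line",
  "encoding": {
    "x": {"field": "a", "type": "quantitative"},
    "y": {
      "field": "b",
      "type": "quantitative",
      "impute": {"method": "mean", "frame": [-2, 2]}
    },
    "color": {"field": "c", "type": "nominal"}
  }
}
```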

Specifying the Key Values to be Imputed

The keyvals property provides additional key values that should be considered for imputation. If it is not provided, the key values are derived from all unique values of the key field. If there is no grouping field (e.g., no color in the examples given above), then keyvals must be specified. Otherwise, the impute transform will have no effect on the data.

The keyvals property can be an array:
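A minimal sketch of the array form, assuming a small hypothetical data set with no grouping field. The key value 2 does not occur in the data, so it is listed in keyvals as an extra key value to consider for imputation.

```json
{
  "$schema": "https://vega.github.io/schema/vega-lite/v5.json",
  "description": "Illustrative sketch: the data values are assumed.",
  "data": {
    "values": [
      {"a": 0, "b": 28},
      {"a": 1, "b": 43},
      {"a": 3, "b": 19}
    ]
  },
  "mark": "line",
  "encoding": {
    "x": {"field": "a", "type": "quantitative"},
    "y": {
      "field": "b",
      "type": "quantitative",
      "impute": {"value": 0, "keyvals": [2]}
    }
  }
}
```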

Alternatively, the keyvals property can be an object defining a sequence, which can contain the following properties:

Property	Type	Description
start	Number	The starting value of the sequence. Default value: `0`
stop	Number	*Required.* The ending value (exclusive) of the sequence.
step	Number	The step value between sequence entries. Default value: `1` or `-1` if `stop < start`
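The sequence form can be sketched as follows. The data values are hypothetical; with `start` 0, `stop` 6, and `step` 1, the sequence covers the key values 0 through 5, since `stop` is exclusive.

```json
{
  "$schema": "https://vega.github.io/schema/vega-lite/v5.json",
  "description": "Illustrative sketch: the data values are assumed.",
  "data": {
    "values": [
      {"a": 0, "b": 28},
      {"a": 1, "b": 43},
      {"a": 4, "b": 19}
    ]
  },
  "mark": "line",
  "encoding": {
    "x": {"field": "a", "type": "quantitative"},
    "y": {
      "field": "b",
      "type": "quantitative",
      "impute": {
        "value": 0,
        "keyvals": {"start": 0, "stop": 6, "step": 1}
      }
    }
  }
}
```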

Impute Transform

An impute transform can also be specified as a part of the transform array.

// A View Specification
{
  ...
  "transform": [
    ...
    {
      // Impute Transform
      "impute": ...,
      "key": ...,
      "keyvals": ...,
      "groupby": [...],
      "frame": [...],
      "method": ...,
      "value": ...
    }
    ...
  ],
  ...
}

Property	Type	Description
impute	String	*Required.* The data field for which the missing values should be imputed.
key	String	*Required.* A key field that uniquely identifies data objects within a group. Missing key values (those occurring in the data but not in the current group) will be imputed.
keyvals	Any[] | ImputeSequence	Defines the key values that should be considered for imputation: an array of key values or an object defining a number sequence. If provided, this will be used in addition to the key values observed within the input data. If not provided, the values will be derived from all unique values of the `key` field. For `impute` in `encoding`, the key field is the x-field if the y-field is imputed, or vice versa. If there is no impute grouping, this property must be specified.
groupby	String[]	An optional array of fields by which to group the values. Imputation will then be performed on a per-group basis.
frame	(Null | Number)[]	A frame specification as a two-element array used to control the window over which the specified method is applied. The array entries should either be a number indicating the offset from the current data object, or null to indicate unbounded rows preceding or following the current data object. For example, the value `[-5, 5]` indicates that the window should include five objects preceding and five objects following the current object. Default value: `[null, null]`, indicating that the window includes all objects.
method	String	The imputation method to use for the field value of imputed data objects. One of `"value"`, `"mean"`, `"median"`, `"max"` or `"min"`. Default value: `"value"`
value	Any	The field value to use when the imputation `method` is `"value"`.

For example, the same chart as the one with impute in encoding above can be created using the impute transform. Here, we have to manually specify the key and groupby fields, which were inferred automatically for impute in encoding.
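A sketch of the transform-based equivalent, assuming the same hypothetical data as a grouped bar chart with fields "a", "b", and "c". The key ("a") and groupby (["c"]) are written out explicitly.

```json
{
  "$schema": "https://vega.github.io/schema/vega-lite/v5.json",
  "description": "Illustrative sketch: the data values are assumed.",
  "data": {
    "values": [
      {"a": 0, "b": 28, "c": 0}, {"a": 0, "b": 91, "c": 1},
      {"a": 1, "b": 43, "c": 0}, {"a": 1, "b": 55, "c": 1},
      {"a": 2, "b": 81, "c": 0}, {"a": 2, "b": 53, "c": 1},
      {"a": 3, "b": 19, "c": 0}
    ]
  },
  "transform": [
    {"impute": "b", "key": "a", "value": 0, "groupby": ["c"]}
  ],
  "mark": "bar",
  "encoding": {
    "x": {"field": "a", "type": "nominal"},
    "y": {"aggregate": "sum", "field": "b", "type": "quantitative"},
    "color": {"field": "c", "type": "nominal"}
  }
}
```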

Similarly, keyvals must be specified if the groupby property is not specified.
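A minimal sketch of an impute transform without groupby, using hypothetical data. Because there is no grouping, the key values to consider are supplied through keyvals.

```json
{
  "$schema": "https://vega.github.io/schema/vega-lite/v5.json",
  "description": "Illustrative sketch: the data values are assumed.",
  "data": {
    "values": [
      {"a": 0, "b": 28},
      {"a": 1, "b": 43},
      {"a": 3, "b": 19}
    ]
  },
  "transform": [
    {"impute": "b", "key": "a", "keyvals": [2], "value": 0}
  ],
  "mark": "line",
  "encoding": {
    "x": {"field": "a", "type": "quantitative"},
    "y": {"field": "b", "type": "quantitative"}
  }
}
```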
