To impute missing data in Vega-Lite, you can use the `impute` transform, either via an encoding field definition or via a `transform` array.
The impute transform groups data and determines missing values of the `key` field within each group. For each missing value in each group, the impute transform will produce a new tuple with the imputed field generated based on a specified imputation `method` (by using a constant `value` or by calculating statistics such as mean within each group).
```js
// A Single View or a Layer Specification
{
  ...,
  "mark/layer": ...,
  "encoding": {
    "x": {
      "field": ...,
      "type": "quantitative",
      "impute": {...}, // Impute
      ...
    },
    "y": ...,
    ...
  },
  ...
}
```
An encoding field definition can include an `impute` definition object to generate new data objects in place of the missing data.
The `impute` definition object can contain the following properties:
Property | Type | Description |
---|---|---|
frame | (Null \| Number)[] | A frame specification as a two-element array used to control the window over which the specified method is applied. The array entries should either be a number indicating the offset from the current data object, or null to indicate unbounded rows preceding or following the current data object. For example, the value `[-2, 2]` indicates that the window should include two objects preceding and two objects following the current data object. Default value: |
keyvals | Any[] \| ImputeSequence | Defines the key values that should be considered for imputation. An array of key values or an object defining a number sequence. If provided, this will be used in addition to the key values observed within the input data. If not provided, the values will be derived from all unique values of the `key` field. If there is no impute grouping, this property must be specified. |
method | String | The imputation method to use for the field value of imputed data objects, for example a constant value or a statistic such as `"mean"`. Default value: |
value | Any | The field value to use when the imputation `method` is `"value"`. |
For `impute` in `encoding`, the grouping fields and the key field (for identifying missing values) are automatically determined. Values are automatically grouped by the fields specified for mark property channels, the `key` channel, and the `detail` channel. If the x-field is imputed, the y-field is the key field, so any y-value missing from a group leads to a new imputed tuple. The same holds with the roles of x and y swapped.
In this example, we impute the `y`-field (`"b"`), so the `x`-field (`"a"`) will be used as the key. The values are then grouped by the field `"c"` of the `color` encoding. The impute transform then determines missing key values within each group. In this case, the data tuple where `"a"` is `3` and `"c"` is `1` is missing, so a new tuple with `"a"` = `3`, `"c"` = `1`, and `"b"` = `0` will be added.
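The case described above could be written as the following spec. This is a minimal sketch: the inline data values and the `line` mark are illustrative assumptions, chosen only so that the tuple with `"a"` = `3` and `"c"` = `1` is absent.

```json
{
  "$schema": "https://vega.github.io/schema/vega-lite/v5.json",
  "data": {
    "values": [
      {"a": 0, "b": 28, "c": 0}, {"a": 0, "b": 91, "c": 1},
      {"a": 1, "b": 43, "c": 0}, {"a": 1, "b": 55, "c": 1},
      {"a": 2, "b": 81, "c": 0}, {"a": 2, "b": 53, "c": 1},
      {"a": 3, "b": 19, "c": 0}
    ]
  },
  "mark": "line",
  "encoding": {
    "x": {"field": "a", "type": "nominal"},
    "y": {
      "field": "b",
      "type": "quantitative",
      "impute": {"value": 0}
    },
    "color": {"field": "c", "type": "nominal"}
  }
}
```

With this spec, the line for `"c"` = `1` should drop to `0` at `"a"` = `3` instead of ending at `"a"` = `2`.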
Besides imputing with a constant `value`, we can also use a `method` (such as `"mean"`) on existing data points to generate the missing data.
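As a rough illustration, the sketch below swaps the constant for `"method": "mean"`. The data values are made up for this example; only the `impute` object differs from a constant-value spec.

```json
{
  "$schema": "https://vega.github.io/schema/vega-lite/v5.json",
  "data": {
    "values": [
      {"a": 0, "b": 28, "c": 0}, {"a": 0, "b": 91, "c": 1},
      {"a": 1, "b": 43, "c": 0}, {"a": 1, "b": 55, "c": 1},
      {"a": 2, "b": 81, "c": 0}, {"a": 2, "b": 53, "c": 1},
      {"a": 3, "b": 19, "c": 0}
    ]
  },
  "mark": "line",
  "encoding": {
    "x": {"field": "a", "type": "nominal"},
    "y": {
      "field": "b",
      "type": "quantitative",
      "impute": {"method": "mean"}
    },
    "color": {"field": "c", "type": "nominal"}
  }
}
```

Here the missing `"b"` for `"a"` = `3`, `"c"` = `1` is derived from the existing values in the `"c"` = `1` group rather than fixed in advance.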
The `frame` property of `impute` can be used to control the window over which the specified `method` is applied. Here, the `frame` is `[-2, 2]`, which indicates that the window for calculating the mean includes two objects preceding and two objects following the current object.
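A spec using such a frame might look like the sketch below. The longer data series is a hypothetical one, invented so that a window of two objects on either side is narrower than the whole group; the tuple with `"a"` = `3` and `"c"` = `1` is the missing one.

```json
{
  "$schema": "https://vega.github.io/schema/vega-lite/v5.json",
  "data": {
    "values": [
      {"a": 0, "b": 28, "c": 0}, {"a": 0, "b": 10, "c": 1},
      {"a": 1, "b": 43, "c": 0}, {"a": 1, "b": 20, "c": 1},
      {"a": 2, "b": 81, "c": 0}, {"a": 2, "b": 30, "c": 1},
      {"a": 3, "b": 19, "c": 0},
      {"a": 4, "b": 52, "c": 0}, {"a": 4, "b": 50, "c": 1},
      {"a": 5, "b": 24, "c": 0}, {"a": 5, "b": 60, "c": 1}
    ]
  },
  "mark": "line",
  "encoding": {
    "x": {"field": "a", "type": "nominal"},
    "y": {
      "field": "b",
      "type": "quantitative",
      "impute": {"method": "mean", "frame": [-2, 2]}
    },
    "color": {"field": "c", "type": "nominal"}
  }
}
```

Because the offsets count data objects relative to the current one, the imputed result can depend on the order of the objects within each group, so it is worth checking the output against the data at hand.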
The `keyvals` property provides additional key values that should be considered for imputation. If not provided, all of the values will be derived from all unique values of the `key` field. If there is no grouping field (e.g., no `color` in the examples given above), then `keyvals` must be specified. Otherwise, the impute transform will have no effect on the data.
The `keyvals` property can be an array:
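A minimal sketch of the array form follows. The data and the extra key value `4` are assumptions made for illustration: `4` never occurs in the data, so listing it in `keyvals` asks for an imputed tuple at `"a"` = `4` in every group, in addition to the one at `"a"` = `3` for `"c"` = `1`.

```json
{
  "$schema": "https://vega.github.io/schema/vega-lite/v5.json",
  "data": {
    "values": [
      {"a": 0, "b": 28, "c": 0}, {"a": 0, "b": 91, "c": 1},
      {"a": 1, "b": 43, "c": 0}, {"a": 1, "b": 55, "c": 1},
      {"a": 2, "b": 81, "c": 0}, {"a": 2, "b": 53, "c": 1},
      {"a": 3, "b": 19, "c": 0}
    ]
  },
  "mark": "line",
  "encoding": {
    "x": {"field": "a", "type": "quantitative"},
    "y": {
      "field": "b",
      "type": "quantitative",
      "impute": {"method": "mean", "keyvals": [4]}
    },
    "color": {"field": "c", "type": "nominal"}
  }
}
```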
Alternatively, the `keyvals` property can be an object defining a sequence, which can contain the following properties:
Property | Type | Description |
---|---|---|
start | Number | The starting value of the sequence. Default value: |
stop | Number | Required. The ending value (exclusive) of the sequence. |
step | Number | The step value between sequence entries. Default value: |
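The sequence form can be sketched as follows. All three properties are written out explicitly so the example does not rely on any defaults; with `start` 4, `stop` 6 (exclusive), and `step` 1, the sequence should contribute the key values 4 and 5. The data values are illustrative.

```json
{
  "$schema": "https://vega.github.io/schema/vega-lite/v5.json",
  "data": {
    "values": [
      {"a": 0, "b": 28, "c": 0}, {"a": 0, "b": 91, "c": 1},
      {"a": 1, "b": 43, "c": 0}, {"a": 1, "b": 55, "c": 1},
      {"a": 2, "b": 81, "c": 0}, {"a": 2, "b": 53, "c": 1},
      {"a": 3, "b": 19, "c": 0}
    ]
  },
  "mark": "line",
  "encoding": {
    "x": {"field": "a", "type": "quantitative"},
    "y": {
      "field": "b",
      "type": "quantitative",
      "impute": {
        "method": "mean",
        "keyvals": {"start": 4, "stop": 6, "step": 1}
      }
    },
    "color": {"field": "c", "type": "nominal"}
  }
}
```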
An impute transform can also be specified as a part of the `transform` array.
```js
// A View Specification
{
  ...
  "transform": [
    ...
    {
      // Impute Transform
      "impute": ...,
      "key": ...,
      "keyvals": ...,
      "groupby": [...],
      "frame": [...],
      "method": ...,
      "value": ...
    }
    ...
  ],
  ...
}
```
Property | Type | Description |
---|---|---|
impute | String | Required. The data field for which the missing values should be imputed. |
key | String | Required. A key field that uniquely identifies data objects within a group. Missing key values (those occurring in the data but not in the current group) will be imputed. |
keyvals | Any[] \| ImputeSequence | Defines the key values that should be considered for imputation. An array of key values or an object defining a number sequence. If provided, this will be used in addition to the key values observed within the input data. If not provided, the values will be derived from all unique values of the `key` field. If there is no impute grouping, this property must be specified. |
groupby | String[] | An optional array of fields by which to group the values. Imputation will then be performed on a per-group basis. |
frame | (Null \| Number)[] | A frame specification as a two-element array used to control the window over which the specified method is applied. The array entries should either be a number indicating the offset from the current data object, or null to indicate unbounded rows preceding or following the current data object. For example, the value `[-2, 2]` indicates that the window should include two objects preceding and two objects following the current data object. Default value: |
method | String | The imputation method to use for the field value of imputed data objects, for example a constant value or a statistic such as `"mean"`. Default value: |
value | Any | The field value to use when the imputation `method` is `"value"`. |
For example, the same chart with `impute` in `encoding` above can be created using the `impute` transform. Here, we have to manually specify the `key` and `groupby` fields, which were inferred automatically for `impute` in `encoding`.
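A transform-based version might look like the sketch below. The data is illustrative; the point is that `"key": "a"` and `"groupby": ["c"]` are now spelled out, while the encoding itself no longer carries an `impute` object.

```json
{
  "$schema": "https://vega.github.io/schema/vega-lite/v5.json",
  "data": {
    "values": [
      {"a": 0, "b": 28, "c": 0}, {"a": 0, "b": 91, "c": 1},
      {"a": 1, "b": 43, "c": 0}, {"a": 1, "b": 55, "c": 1},
      {"a": 2, "b": 81, "c": 0}, {"a": 2, "b": 53, "c": 1},
      {"a": 3, "b": 19, "c": 0}
    ]
  },
  "transform": [
    {"impute": "b", "key": "a", "value": 0, "groupby": ["c"]}
  ],
  "mark": "line",
  "encoding": {
    "x": {"field": "a", "type": "nominal"},
    "y": {"field": "b", "type": "quantitative"},
    "color": {"field": "c", "type": "nominal"}
  }
}
```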
Similarly, `keyvals` must be specified if the `groupby` property is not specified.
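The ungrouped case can be sketched with a single series. In this hypothetical data, no tuple has `"a"` = `2`, so that key value is never observed and has to be supplied through `keyvals` for a tuple to be imputed there.

```json
{
  "$schema": "https://vega.github.io/schema/vega-lite/v5.json",
  "data": {
    "values": [
      {"a": 0, "b": 28},
      {"a": 1, "b": 43},
      {"a": 3, "b": 19}
    ]
  },
  "transform": [
    {"impute": "b", "key": "a", "keyvals": [0, 1, 2, 3], "value": 0}
  ],
  "mark": "line",
  "encoding": {
    "x": {"field": "a", "type": "quantitative"},
    "y": {"field": "b", "type": "quantitative"}
  }
}
```

Without the `keyvals` entry, every key value seen in the data is already present in the single implicit group, so there would be nothing to impute.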