Categorical data: Feature crosses
Page Summary
Feature crosses are created by combining two or more categorical or bucketed features to capture interactions and nonlinearities within a dataset.
Like polynomial transforms, they let linear models handle nonlinearities, but feature crosses combine categorical data, while polynomial transforms are applied to numerical data.
Feature crosses can be particularly effective when guided by domain expertise, but using neural networks can automate the process of discovering valuable combinations.
Overuse of feature crosses with sparse features should be avoided, as it can lead to excessive sparsity in the resulting feature set.
Feature crosses are created by crossing (taking the Cartesian product of) two or more categorical or bucketed features of the dataset. Like polynomial transforms, feature crosses allow linear models to handle nonlinearities. Feature crosses also encode interactions between features.
For example, consider a leaf dataset with the categorical features:
edges, containing values smooth, toothed, and lobed
arrangement, containing values opposite and alternate
Assume the order above is the order of the feature columns in a one-hot representation, so that a leaf with smooth edges and opposite arrangement is represented as {(1, 0, 0), (1, 0)}.
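A minimal sketch of this one-hot encoding in plain Python, assuming the value order listed above. The names EDGES, ARRANGEMENT, and one_hot are hypothetical, chosen only for illustration.

```python
# Hypothetical vocabularies, in the column order assumed above.
EDGES = ["smooth", "toothed", "lobed"]
ARRANGEMENT = ["opposite", "alternate"]


def one_hot(value, vocabulary):
    """Return a tuple with 1 at the position of `value` and 0 elsewhere."""
    if value not in vocabulary:
        raise ValueError(f"unknown value: {value!r}")
    return tuple(1 if v == value else 0 for v in vocabulary)


# A leaf with smooth edges and opposite arrangement.
leaf = (one_hot("smooth", EDGES), one_hot("opposite", ARRANGEMENT))
print(leaf)  # ((1, 0, 0), (1, 0))
```

Raising an error on an unknown value is one design choice; a real pipeline might instead map unseen values to an extra "out of vocabulary" column.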
The feature cross, or Cartesian product, of these two features would be:
{Smooth_Opposite, Smooth_Alternate, Toothed_Opposite, Toothed_Alternate, Lobed_Opposite, Lobed_Alternate}
where the value of each term is the product of the base feature values, such that:
Smooth_Opposite = edges[0] * arrangement[0]
Smooth_Alternate = edges[0] * arrangement[1]
Toothed_Opposite = edges[1] * arrangement[0]
Toothed_Alternate = edges[1] * arrangement[1]
Lobed_Opposite = edges[2] * arrangement[0]
Lobed_Alternate = edges[2] * arrangement[1]
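The names of the crossed terms are simply the Cartesian product of the two vocabularies. A minimal sketch using Python's itertools.product, which accepts two or more iterables; cross_vocabulary is a hypothetical helper, and joining names with an underscore is only a naming convention borrowed from the list above.

```python
from itertools import product

# Hypothetical vocabularies, in the column order used above.
EDGES = ["Smooth", "Toothed", "Lobed"]
ARRANGEMENT = ["Opposite", "Alternate"]


def cross_vocabulary(*vocabularies):
    """Return the names of all crossed values (the Cartesian product).

    itertools.product varies the last vocabulary fastest, which
    reproduces the order Smooth_Opposite, Smooth_Alternate, ...
    """
    return ["_".join(combo) for combo in product(*vocabularies)]


print(cross_vocabulary(EDGES, ARRANGEMENT))
```

Because the helper takes any number of vocabularies, crossing a third feature with, say, two values would yield 3 × 2 × 2 = 12 names.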
For example, if a leaf has a lobed edge and an alternate arrangement, the feature-cross vector will have a value of 1 for Lobed_Alternate, and a value of 0 for all other terms:
{0, 0, 0, 0, 0, 1}
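The products above amount to flattening the outer product of the two one-hot vectors. A minimal pure-Python sketch, assuming the column order given earlier; cross_one_hot is a hypothetical name.

```python
# One-hot vectors, in the column order assumed above:
# edges = (smooth, toothed, lobed), arrangement = (opposite, alternate).
def cross_one_hot(edges, arrangement):
    """Return the feature-cross vector.

    Each term is edges[i] * arrangement[j]. The outer loop runs over
    edges, so the output order is Smooth_Opposite, Smooth_Alternate,
    ..., Lobed_Alternate, matching the list above.
    """
    return [e * a for e in edges for a in arrangement]


lobed = (0, 0, 1)
alternate = (0, 1)
print(cross_one_hot(lobed, alternate))  # [0, 0, 0, 0, 0, 1]
```

Since each input has exactly one 1, exactly one term of the crossed vector is 1, so a linear model can learn a separate weight for every edge/arrangement combination.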
This dataset could be used to classify leaves by tree species, since these characteristics do not vary within a species.
Feature crosses are somewhat analogous to polynomial transforms. Both combine multiple features into a new synthetic feature that the model can train on to learn nonlinearities. Polynomial transforms typically combine numerical data, while feature crosses combine categorical data.
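The contrast can be sketched side by side. The function names are hypothetical, and second-degree terms are only one possible polynomial transform; the crossed category would then be one-hot encoded like any other categorical value.

```python
# Polynomial transform: combines numerical features by multiplication.
def polynomial_terms(x1, x2):
    """Return the second-degree terms x1^2, x1*x2, x2^2 of two numbers."""
    return [x1 * x1, x1 * x2, x2 * x2]


# Feature cross: combines categorical features into one synthetic category.
def cross_categories(*values):
    """Return a single crossed category such as 'Lobed_Alternate'."""
    return "_".join(values)


print(polynomial_terms(2.0, 3.0))  # [4.0, 6.0, 9.0]
print(cross_categories("Lobed", "Alternate"))  # Lobed_Alternate
```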
When to use feature crosses
Domain knowledge can suggest a useful combination of features to cross. Without that domain knowledge, it can be difficult to determine effective feature crosses or polynomial transforms by hand. It's often possible, if computationally expensive, to use neural networks to automatically find and apply useful feature combinations during training.
Be careful: crossing two sparse features produces an even sparser new feature than the two original features. For example, if feature A is a 100-element sparse feature and feature B is a 200-element sparse feature, a feature cross of A and B yields a 20,000-element sparse feature.
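The size of a cross is the product of the input sizes. A short sketch of that arithmetic, assuming one-hot inputs (exactly one nonzero element each); cross_size is a hypothetical helper built on math.prod (Python 3.8+).

```python
from math import prod


def cross_size(*feature_sizes):
    """Number of elements in the cross of features with the given sizes."""
    return prod(feature_sizes)


a_size, b_size = 100, 200
crossed = cross_size(a_size, b_size)
print(crossed)  # 20000

# With one-hot inputs, the cross also has exactly one nonzero element,
# so the fraction of nonzero elements drops from 1/100 and 1/200 to 1/20000.
print(1 / a_size, 1 / b_size, 1 / crossed)  # 0.01 0.005 5e-05
```

For multi-hot inputs the count of nonzero elements multiplies too, but the vector length grows much faster, so the cross is still sparser than either input.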
Except as otherwise noted, the content of this page is licensed under the Creative Commons Attribution 4.0 License, and code samples are licensed under the Apache 2.0 License. For details, see the Google Developers Site Policies.
Last updated 2025-08-25 UTC.