georgian-io/Multimodal-Toolkit

Multimodal model for text and tabular data with HuggingFace transformers as the building block for text data.
Documentation | Colab Notebook | Blog Post
A toolkit for incorporating multimodal data on top of text data for classification and regression tasks. It uses HuggingFace transformers as the base model for text features. The toolkit adds a combining module that takes the outputs of the transformer, in addition to categorical and numerical features, to produce rich multimodal features for downstream classification/regression layers. Given a pretrained transformer, the parameters of the combining module and transformer are trained based on the supervised task. For a brief literature review, check out the accompanying blog post on Georgian's Impact Blog.
The code was developed in Python 3.7 with PyTorch and Transformers 4.26.1. The multimodal-specific code is in the multimodal_transformers folder.
To install:

```
pip install multimodal-transformers
```

The following Hugging Face Transformers are supported to handle tabular data. See the documentation here.
- BERT from Devlin et al.: BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding (ACL 2019)
- ALBERT from Lan et al.: ALBERT: A Lite BERT for Self-supervised Learning of Language Representations (ICLR 2020)
- DistilBERT from Sanh et al.: DistilBERT, a distilled version of BERT: smaller, faster, cheaper and lighter (NeurIPS 2019)
- RoBERTa from Liu et al.: RoBERTa: A Robustly Optimized BERT Pretraining Approach
- XLM from Lample et al.: Cross-lingual Language Model Pretraining (NeurIPS 2019)
- XLNet from Yang et al.: XLNet: Generalized Autoregressive Pretraining for Language Understanding (NeurIPS 2019)
- XLM-RoBERTa from Conneau et al.: Unsupervised Cross-lingual Representation Learning at Scale (ACL 2020)
This repository also includes three Kaggle datasets which contain text data and rich tabular features:
- Women's Clothing E-Commerce Reviews for Recommendation Prediction (Classification)
- Melbourne Airbnb Open Data for Price Prediction (Regression)
- PetFinder.my Adoption Prediction for Pet Adoption Speed Prediction (Multiclass Classification)
To quickly see these models in action on, say, one of the above datasets with preset configurations, run:

```
$ python main.py ./datasets/Melbourne_Airbnb_Open_Data/train_config.json
```

Or if you prefer command line arguments, run:
```
$ python main.py \
    --output_dir=./logs/test \
    --task=classification \
    --combine_feat_method=individual_mlps_on_cat_and_numerical_feats_then_concat \
    --do_train \
    --model_name_or_path=distilbert-base-uncased \
    --data_path=./datasets/Womens_Clothing_E-Commerce_Reviews \
    --column_info_path=./datasets/Womens_Clothing_E-Commerce_Reviews/column_info.json
```

main.py expects a json file detailing which columns in a dataset contain text, categorical, or numerical input features. It also expects a path to the folder where the data is stored as train.csv and test.csv (and, if given, val.csv). For more details on the arguments see multimodal_exp_args.py.
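For reference, the snippet below writes out a column_info.json of the kind main.py expects, using the Women's Clothing dataset's columns as an example. The key names mirror the sample configs shipped in ./datasets, but treat this as an illustrative sketch and verify it against your copy of the repository:

```python
import json

# Illustrative column_info.json for the Women's Clothing dataset.
# Key names follow the sample configs in ./datasets; double-check them
# against the version of the repository you are using.
column_info = {
    "text_cols": ["Title", "Review Text"],
    "cat_cols": ["Division Name", "Department Name", "Class Name"],
    "num_cols": ["Rating", "Age", "Positive Feedback Count"],
    "label_col": "Recommended IND",
    "label_list": ["Not Recommended", "Recommended"],
}

with open("column_info.json", "w") as f:
    json.dump(column_info, f, indent=2)
```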
To see the modules come together in a notebook, check out the Colab notebook linked above.
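For a sense of what the notebook does, here is a minimal sketch of the programmatic API, following the documented TabularConfig and AutoModelWithTabular pattern. The feature dimensions here are placeholders, and the exact output format may vary between versions:

```python
import torch
from transformers import AutoConfig, AutoTokenizer
from multimodal_transformers.model import AutoModelWithTabular, TabularConfig

# Attach the toolkit's tabular settings to a standard HuggingFace config.
hf_config = AutoConfig.from_pretrained("bert-base-uncased")
tabular_config = TabularConfig(
    num_labels=2,                  # binary classification
    cat_feat_dim=9,                # placeholder: dim of encoded categorical feats
    numerical_feat_dim=3,          # placeholder: number of numerical feats
    combine_feat_method="gating_on_cat_and_num_feats_then_sum",
)
hf_config.tabular_config = tabular_config
model = AutoModelWithTabular.from_pretrained("bert-base-uncased", config=hf_config)

# Forward pass: text tokens plus tensors for the tabular features.
tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
encoded = tokenizer(["Great fit and fabric!"], return_tensors="pt")
outputs = model(
    **encoded,
    labels=torch.tensor([1]),
    cat_feats=torch.rand(1, 9),
    numerical_feats=torch.rand(1, 3),
)  # outputs include the loss and logits
```

The supported combine_feat_method options are summarized in the table below; an illustrative sketch of one of them follows the table.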
| combine feat method | description | requires both cat and num features |
|---|---|---|
| text_only | Uses just the text columns as processed by a HuggingFace transformer before final classifier layer(s). Essentially equivalent to HuggingFace's ForSequenceClassification models | False |
| concat | Concatenate transformer output, numerical feats, and categorical feats all at once before final classifier layer(s) | False |
| mlp_on_categorical_then_concat | MLP on categorical feats then concat transformer output, numerical feats, and processed categorical feats before final classifier layer(s) | False (Requires cat feats) |
| individual_mlps_on_cat_and_numerical_feats_then_concat | Separate MLPs on categorical feats and numerical feats, then concatenation of the transformer output with the processed numerical and categorical feats before final classifier layer(s) | False |
| mlp_on_concatenated_cat_and_numerical_feats_then_concat | MLP on concatenated categorical and numerical feats, then concatenated with transformer output before final classifier layer(s) | True |
| attention_on_cat_and_numerical_feats | Attention-based summation of transformer outputs, numerical feats, and categorical feats, queried by the transformer outputs, before final classifier layer(s) | False |
| gating_on_cat_and_num_feats_then_sum | Gated summation of transformer outputs, numerical feats, and categorical feats before final classifier layer(s). Inspired by Integrating Multimodal Information in Large Pretrained Transformers, which performs the mechanism for each token | False |
| weighted_feature_sum_on_transformer_cat_and_numerical_feats | Learnable weighted feature-wise sum of transformer outputs, numerical feats and categorical feats for each feature dimension before final classifier layer(s) | False |
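To make the table concrete, here is an illustrative PyTorch sketch of the individual_mlps_on_cat_and_numerical_feats_then_concat method. This is not the toolkit's exact implementation; dimensions and layer sizes are examples:

```python
import torch
import torch.nn as nn

class IndividualMLPsThenConcat(nn.Module):
    """Illustrative sketch (not the toolkit's exact code) of
    individual_mlps_on_cat_and_numerical_feats_then_concat."""

    def __init__(self, cat_dim: int, num_dim: int, hidden: int = 32):
        super().__init__()
        # One small MLP per tabular modality.
        self.cat_mlp = nn.Sequential(nn.Linear(cat_dim, hidden), nn.ReLU())
        self.num_mlp = nn.Sequential(nn.Linear(num_dim, hidden), nn.ReLU())

    def forward(self, text_feats, cat_feats, numerical_feats):
        # Concatenate the transformer's pooled text output with the
        # separately processed categorical and numerical features.
        return torch.cat(
            [text_feats, self.cat_mlp(cat_feats), self.num_mlp(numerical_feats)],
            dim=-1,
        )

combiner = IndividualMLPsThenConcat(cat_dim=9, num_dim=3)
combined = combiner(torch.rand(4, 768), torch.rand(4, 9), torch.rand(4, 3))
print(combined.shape)  # torch.Size([4, 832]) -> 768 + 32 + 32
```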
In practice, a strong baseline is to take the categorical and numerical features as they are, tokenize them, and concatenate them to the text columns as extra text sentences. To do that here, specify all the categorical and numerical columns as text columns and set combine_feat_method to text_only. For example, for each of the included sample datasets in ./datasets, change combine_feat_method to text_only and column_info_path to ./datasets/{dataset}/column_info_all_text.json in train_config.json.
In the experiments below, this baseline corresponds to the Combine Feat Method being unimodal; a sketch of the idea follows.
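As a hypothetical illustration of that verbalization (the toolkit achieves the same effect via column_info_all_text.json rather than manual preprocessing), one could concatenate the stringified tabular values onto the text with pandas:

```python
import pandas as pd

# Hypothetical illustration of the unimodal baseline: tabular values are
# stringified and appended to the review text so the transformer sees a
# single string. Column names are examples, not a fixed schema.
df = pd.DataFrame({
    "Review Text": ["Great fit and fabric!"],
    "Department Name": ["Tops"],
    "Rating": [5],
})
tabular_cols = ["Department Name", "Rating"]
df["all_text"] = (
    df["Review Text"] + " " + df[tabular_cols].astype(str).agg(" ".join, axis=1)
)
print(df["all_text"].iloc[0])  # Great fit and fabric! Tops 5
```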
The following tables show the results on the included datasets' respective test sets, obtained by running main.py. Non-specified parameters are the defaults.
Specific training parameters can be seen in datasets/Womens_Clothing_E-Commerce_Reviews/train_config.json.
There are 2 text columns, 3 categorical columns, and 3 numerical columns.
| Model | Combine Feat Method | F1 | PR AUC |
|---|---|---|---|
| Bert Base Uncased | text_only | 0.957 | 0.992 |
| Bert Base Uncased | unimodal | 0.968 | 0.995 |
| Bert Base Uncased | concat | 0.958 | 0.992 |
| Bert Base Uncased | individual_mlps_on_cat_and_numerical_feats_then_concat | 0.959 | 0.992 |
| Bert Base Uncased | attention_on_cat_and_numerical_feats | 0.959 | 0.992 |
| Bert Base Uncased | gating_on_cat_and_num_feats_then_sum | 0.961 | 0.994 |
| Bert Base Uncased | weighted_feature_sum_on_transformer_cat_and_numerical_feats | 0.962 | 0.994 |
Specific training parameters can be seen in datasets/Melbourne_Airbnb_Open_Data/train_config.json.
There are 3 text columns, 74 categorical columns, and 15 numerical columns.
| Model | Combine Feat Method | MAE | RMSE |
|---|---|---|---|
| Bert Base Multilingual Uncased | text_only | 82.74 | 254.0 |
| Bert Base Multilingual Uncased | unimodal | 79.34 | 245.2 |
| Bert Base Uncased | concat | 65.68 | 239.3 |
| Bert Base Multilingual Uncased | individual_mlps_on_cat_and_numerical_feats_then_concat | 66.73 | 237.3 |
| Bert Base Multilingual Uncased | attention_on_cat_and_numerical_feats | 74.72 | 246.3 |
| Bert Base Multilingual Uncased | gating_on_cat_and_num_feats_then_sum | 66.64 | 237.8 |
| Bert Base Multilingual Uncased | weighted_feature_sum_on_transformer_cat_and_numerical_feats | 71.19 | 245.2 |
Specific training parameters can be seen in datasets/PetFindermy_Adoption_Prediction.
There are 2 text columns, 14 categorical columns, and 5 numerical columns.
| Model | Combine Feat Method | F1_macro | F1_micro |
|---|---|---|---|
| Bert Base Multilingual Uncased | text_only | 0.088 | 0.281 |
| Bert Base Multilingual Uncased | unimodal | 0.089 | 0.283 |
| Bert Base Uncased | concat | 0.199 | 0.362 |
| Bert Base Multilingual Uncased | individual_mlps_on_cat_and_numerical_feats_then_concat | 0.244 | 0.352 |
| Bert Base Multilingual Uncased | attention_on_cat_and_numerical_feats | 0.254 | 0.375 |
| Bert Base Multilingual Uncased | gating_on_cat_and_num_feats_then_sum | 0.275 | 0.375 |
| Bert Base Multilingual Uncased | weighted_feature_sum_on_transformer_cat_and_numerical_feats | 0.266 | 0.380 |
We now have a paper you can cite for the Multimodal-Toolkit.
```
@inproceedings{gu-budhkar-2021-package,
    title = "A Package for Learning on Tabular and Text Data with Transformers",
    author = "Gu, Ken and Budhkar, Akshay",
    booktitle = "Proceedings of the Third Workshop on Multimodal Artificial Intelligence",
    month = jun,
    year = "2021",
    address = "Mexico City, Mexico",
    publisher = "Association for Computational Linguistics",
    url = "https://www.aclweb.org/anthology/2021.maiworkshop-1.10",
    doi = "10.18653/v1/2021.maiworkshop-1.10",
    pages = "69--73",
}
```