Multimodal model for text and tabular data with HuggingFace transformers as building block for text data

georgian-io/Multimodal-Toolkit


Documentation | Colab Notebook | Blog Post

A toolkit for incorporating multimodal data on top of text data for classification and regression tasks. It uses HuggingFace transformers as the base model for text features. The toolkit adds a combining module that takes the outputs of the transformer in addition to categorical and numerical features to produce rich multimodal features for downstream classification/regression layers. Given a pretrained transformer, the parameters of the combining module and transformer are trained based on the supervised task. For a brief literature review, check out the accompanying blog post on Georgian's Impact Blog.

Installation

The code was developed in Python 3.7 with PyTorch and Transformers 4.26.1. The multimodal-specific code is in the multimodal_transformers folder.

pip install multimodal-transformers

Supported Transformers

The following Hugging Face Transformers are supported to handle tabular data. See the documentation here.

Included Datasets

This repository also includes Kaggle datasets that contain text data and rich tabular features.

Working Examples

To quickly see these models in action on one of the included datasets with preset configurations, run:

$ python main.py ./datasets/Melbourne_Airbnb_Open_Data/train_config.json

Or if you prefer command line arguments run

$ python main.py \
    --output_dir=./logs/test \
    --task=classification \
    --combine_feat_method=individual_mlps_on_cat_and_numerical_feats_then_concat \
    --do_train \
    --model_name_or_path=distilbert-base-uncased \
    --data_path=./datasets/Womens_Clothing_E-Commerce_Reviews \
    --column_info_path=./datasets/Womens_Clothing_E-Commerce_Reviews/column_info.json

main.py expects a json file detailing which columns in a dataset contain text, categorical, or numerical input features. It also expects a path to the folder where the data is stored as train.csv and test.csv (and, if given, val.csv). For more details on the arguments see multimodal_exp_args.py.
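For orientation, a column info file for a dataset like Womens_Clothing_E-Commerce_Reviews might look like the sketch below. The key names follow the toolkit's documented format; the specific column names and labels here are illustrative assumptions, not a copy of the repository's actual file:

```json
{
  "text_cols": ["Title", "Review Text"],
  "cat_cols": ["Division Name", "Department Name"],
  "num_cols": ["Age", "Positive Feedback Count"],
  "label_col": "Recommended IND",
  "label_list": ["Not Recommended", "Recommended"]
}
```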

Notebook Introduction

To see the modules come together in a notebook:
Open In Colab

Included Methods

| combine feat method | description | requires both cat and num features |
| --- | --- | --- |
| text_only | Uses just the text columns as processed by a HuggingFace transformer before final classifier layer(s). Essentially equivalent to HuggingFace's ForSequenceClassification models | False |
| concat | Concatenate transformer output, numerical feats, and categorical feats all at once before final classifier layer(s) | False |
| mlp_on_categorical_then_concat | MLP on categorical feats then concat transformer output, numerical feats, and processed categorical feats before final classifier layer(s) | False (Requires cat feats) |
| individual_mlps_on_cat_and_numerical_feats_then_concat | Separate MLPs on categorical feats and numerical feats then concatenation of transformer output, processed numerical feats, and processed categorical feats before final classifier layer(s) | False |
| mlp_on_concatenated_cat_and_numerical_feats_then_concat | MLP on concatenated categorical and numerical feats then concatenated with transformer output before final classifier layer(s) | True |
| attention_on_cat_and_numerical_feats | Attention-based summation of transformer outputs, numerical feats, and categorical feats queried by transformer outputs before final classifier layer(s) | False |
| gating_on_cat_and_num_feats_then_sum | Gated summation of transformer outputs, numerical feats, and categorical feats before final classifier layer(s). Inspired by Integrating Multimodal Information in Large Pretrained Transformers, which performs the mechanism for each token | False |
| weighted_feature_sum_on_transformer_cat_and_numerical_feats | Learnable weighted feature-wise sum of transformer outputs, numerical feats, and categorical feats for each feature dimension before final classifier layer(s) | False |
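To make the combine methods concrete, here is a minimal pure-Python sketch of what two of them compute. This is not the toolkit's implementation (which operates on torch tensors inside the model's forward pass, with projection layers so dimensions match); it only illustrates the arithmetic, treating each modality's features as a flat list of floats:

```python
# Illustrative sketch of two combine-feat methods, not the toolkit's code.

def concat(text_feats, cat_feats, num_feats):
    # 'concat': stack all three feature vectors end to end before the
    # final classifier layer(s).
    return text_feats + cat_feats + num_feats

def weighted_feature_sum(text_feats, cat_feats, num_feats, w_cat, w_num):
    # 'weighted_feature_sum_on_transformer_cat_and_numerical_feats':
    # learnable per-dimension weights scale the tabular features before an
    # element-wise sum with the transformer output (t + wc*c + wn*n).
    # All vectors must share one dimension; the toolkit projects them first.
    return [t + wc * c + wn * n
            for t, c, n, wc, wn in zip(text_feats, cat_feats, num_feats,
                                       w_cat, w_num)]

text = [0.5, 1.0]   # stand-in for pooled transformer output
cat  = [1.0, 2.0]   # stand-in for processed categorical feats
num  = [3.0, 4.0]   # stand-in for processed numerical feats
print(concat(text, cat, num))  # [0.5, 1.0, 1.0, 2.0, 3.0, 4.0]
print(weighted_feature_sum(text, cat, num, [0.1, 0.1], [0.2, 0.2]))
```

Note how concat grows the feature dimension (here 2 → 6) while the weighted sum preserves it, which is why the sum-style methods need all modalities projected to a common size first.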

Simple baseline model

In practice, taking the categorical and numerical features as they are, tokenizing them, and concatenating them to the text columns as extra text sentences is a strong baseline. To do that here, specify all the categorical and numerical columns as text columns and set combine_feat_method to text_only. For example, for each of the included sample datasets in ./datasets, in train_config.json change combine_feat_method to text_only and column_info_path to ./datasets/{dataset}/column_info_all_text.json.
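This baseline amounts to serializing every tabular value into the text stream so the transformer's tokenizer sees one long string. A hedged sketch of that idea (illustrative only; the toolkit achieves the same effect through the column_info_all_text.json configuration rather than a helper like this):

```python
def tabular_row_to_text(row, text_cols, cat_cols, num_cols, sep=" "):
    # Serialize one dataset row into a single string: the original text
    # columns first, then categorical and numerical values appended as
    # extra "sentences" of plain text.
    pieces = [str(row[c]) for c in text_cols]
    pieces += [str(row[c]) for c in cat_cols]
    pieces += [str(row[c]) for c in num_cols]
    return sep.join(pieces)

row = {"Review Text": "Lovely dress", "Department Name": "Dresses", "Age": 34}
print(tabular_row_to_text(row, ["Review Text"], ["Department Name"], ["Age"]))
# Lovely dress Dresses 34
```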

In the experiments below this baseline corresponds to the Combine Feat Method unimodal.

Results

The following tables show the results on the included datasets' respective test sets from running main.py. Unspecified parameters use their default values.

Review Prediction

Specific training parameters can be seen in datasets/Womens_Clothing_E-Commerce_Reviews/train_config.json.

There are 2 text columns, 3 categorical columns, and 3 numerical columns.

| Model | Combine Feat Method | F1 | PR AUC |
| --- | --- | --- | --- |
| Bert Base Uncased | text_only | 0.957 | 0.992 |
| Bert Base Uncased | unimodal | 0.968 | 0.995 |
| Bert Base Uncased | concat | 0.958 | 0.992 |
| Bert Base Uncased | individual_mlps_on_cat_and_numerical_feats_then_concat | 0.959 | 0.992 |
| Bert Base Uncased | attention_on_cat_and_numerical_feats | 0.959 | 0.992 |
| Bert Base Uncased | gating_on_cat_and_num_feats_then_sum | 0.961 | 0.994 |
| Bert Base Uncased | weighted_feature_sum_on_transformer_cat_and_numerical_feats | 0.962 | 0.994 |

Pricing Prediction

Specific training parameters can be seen in datasets/Melbourne_Airbnb_Open_Data/train_config.json.

There are 3 text columns, 74 categorical columns, and 15 numerical columns.

| Model | Combine Feat Method | MAE | RMSE |
| --- | --- | --- | --- |
| Bert Base Multilingual Uncased | text_only | 82.74 | 254.0 |
| Bert Base Multilingual Uncased | unimodal | 79.34 | 245.2 |
| Bert Base Uncased | concat | 65.68 | 239.3 |
| Bert Base Multilingual Uncased | individual_mlps_on_cat_and_numerical_feats_then_concat | 66.73 | 237.3 |
| Bert Base Multilingual Uncased | attention_on_cat_and_numerical_feats | 74.72 | 246.3 |
| Bert Base Multilingual Uncased | gating_on_cat_and_num_feats_then_sum | 66.64 | 237.8 |
| Bert Base Multilingual Uncased | weighted_feature_sum_on_transformer_cat_and_numerical_feats | 71.19 | 245.2 |

Pet Adoption Prediction

Specific training parameters can be seen in datasets/PetFindermy_Adoption_Prediction. There are 2 text columns, 14 categorical columns, and 5 numerical columns.

| Model | Combine Feat Method | F1_macro | F1_micro |
| --- | --- | --- | --- |
| Bert Base Multilingual Uncased | text_only | 0.088 | 0.281 |
| Bert Base Multilingual Uncased | unimodal | 0.089 | 0.283 |
| Bert Base Uncased | concat | 0.199 | 0.362 |
| Bert Base Multilingual Uncased | individual_mlps_on_cat_and_numerical_feats_then_concat | 0.244 | 0.352 |
| Bert Base Multilingual Uncased | attention_on_cat_and_numerical_feats | 0.254 | 0.375 |
| Bert Base Multilingual Uncased | gating_on_cat_and_num_feats_then_sum | 0.275 | 0.375 |
| Bert Base Multilingual Uncased | weighted_feature_sum_on_transformer_cat_and_numerical_feats | 0.266 | 0.380 |

Citation

We now have a paper you can cite for the Multimodal-Toolkit.

@inproceedings{gu-budhkar-2021-package,
    title = "A Package for Learning on Tabular and Text Data with Transformers",
    author = "Gu, Ken and Budhkar, Akshay",
    booktitle = "Proceedings of the Third Workshop on Multimodal Artificial Intelligence",
    month = jun,
    year = "2021",
    address = "Mexico City, Mexico",
    publisher = "Association for Computational Linguistics",
    url = "https://www.aclweb.org/anthology/2021.maiworkshop-1.10",
    doi = "10.18653/v1/2021.maiworkshop-1.10",
    pages = "69--73",
}

