CAMERA (CyberAgent Multimodal Evaluation for Ad Text GeneRAtion) for huggingface datasets
creative-graphic-design/huggingface-datasets_CAMERA
- Table of Contents
- Dataset Description
- Dataset Structure
- Dataset Creation
- Considerations for Using the Data
- Additional Information
- Homepage: https://github.com/CyberAgentAILab/camera
- Repository: https://github.com/shunk031/huggingface-datasets_CAMERA
CAMERA (CyberAgent Multimodal Evaluation for Ad Text GeneRAtion) is a Japanese ad text generation dataset. We hope that our dataset will be useful in research for realizing more advanced ad text generation models.
[More Information Needed]
[More Information Needed]
[More Information Needed]
The language data in CAMERA is in Japanese (BCP-47 ja-JP).
When loading a specific configuration, users have to append a version-dependent suffix:
```python
from datasets import load_dataset

dataset = load_dataset("shunk031/CAMERA", name="without-lp-images")

print(dataset)
# DatasetDict({
#     train: Dataset({
#         features: ['asset_id', 'kw', 'lp_meta_description', 'title_org', 'title_ne1', 'title_ne2', 'title_ne3', 'domain', 'parsed_full_text_annotation'],
#         num_rows: 12395
#     })
#     validation: Dataset({
#         features: ['asset_id', 'kw', 'lp_meta_description', 'title_org', 'title_ne1', 'title_ne2', 'title_ne3', 'domain', 'parsed_full_text_annotation'],
#         num_rows: 3098
#     })
#     test: Dataset({
#         features: ['asset_id', 'kw', 'lp_meta_description', 'title_org', 'title_ne1', 'title_ne2', 'title_ne3', 'domain', 'parsed_full_text_annotation'],
#         num_rows: 872
#     })
# })
```
An example of the CAMERA (w/o LP images) dataset looks as follows:
```json
{
    "asset_id": 13861,
    "kw": "仙台 ホテル",
    "lp_meta_description": "仙台のホテルや旅館をお探しなら楽天トラベルへ!楽天ポイントが使えて、貯まって、とってもお得な宿泊予約サイトです。さらに割引クーポンも使える!国内ツアー・航空券・レンタカー・バス予約も!",
    "title_org": "仙台市のホテル",
    "title_ne1": "",
    "title_ne2": "",
    "title_ne3": "",
    "domain": "",
    "parsed_full_text_annotation": {
        "text": [
            "trivago",
            "Oops...AccessDenied 可",
            "Youarenotallowedtoviewthispage!Ifyouthinkthisisanerror,pleasecontacttrivago.",
            "Errorcode:0.3c99e86e.1672026945.25ba640YourIP:240d:1a:4d8:2800:b9b0:ea86:2087:d141AffectedURL:https://www.trivago.jp/ja/odr/%E8%BB%92",
            "%E4%BB%99%E5%8F%B0-%E5%9B%BD%E5%86%85?search=20072325",
            "Backtotrivago"
        ],
        "xmax": [653, 838, 765, 773, 815, 649],
        "xmin": [547, 357, 433, 420, 378, 550],
        "ymax": [47, 390, 475, 558, 598, 663],
        "ymin": [18, 198, 439, 504, 566, 651]
    }
}
```
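The `parsed_full_text_annotation` field in the example instance stores OCR output as parallel lists: entry `i` of `text` belongs with entry `i` of `xmin`, `xmax`, `ymin`, and `ymax`. The following is a minimal sketch of turning those lists into per-span records. It assumes the coordinates are pixel positions of axis-aligned boxes, which the card does not state explicitly; `OcrBox` and `to_boxes` are hypothetical helper names, not part of the dataset or of the `datasets` library.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class OcrBox:
    """One OCR text span with its bounding box (hypothetical helper type)."""

    text: str
    xmin: int
    ymin: int
    xmax: int
    ymax: int

    @property
    def width(self) -> int:
        return self.xmax - self.xmin

    @property
    def height(self) -> int:
        return self.ymax - self.ymin


def to_boxes(annotation: Dict[str, List]) -> List[OcrBox]:
    """Zip the parallel lists of one annotation dict into per-span boxes."""
    keys = ("text", "xmin", "ymin", "xmax", "ymax")
    lengths = {len(annotation[key]) for key in keys}
    if len(lengths) != 1:
        raise ValueError("annotation lists must all have the same length")
    return [OcrBox(*row) for row in zip(*(annotation[key] for key in keys))]


# First and last OCR spans of the example instance (abridged).
annotation = {
    "text": ["trivago", "Backtotrivago"],
    "xmin": [547, 550],
    "xmax": [653, 649],
    "ymin": [18, 651],
    "ymax": [47, 663],
}

boxes = to_boxes(annotation)
for box in boxes:
    print(box.text, box.width, box.height)
```

Keeping the length check in `to_boxes` makes a malformed annotation fail early instead of silently dropping spans, since `zip` would otherwise truncate to the shortest list.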
```python
from datasets import load_dataset

dataset = load_dataset("shunk031/CAMERA", name="with-lp-images")

print(dataset)
# DatasetDict({
#     train: Dataset({
#         features: ['asset_id', 'kw', 'lp_meta_description', 'title_org', 'title_ne1', 'title_ne2', 'title_ne3', 'domain', 'parsed_full_text_annotation', 'lp_image'],
#         num_rows: 12395
#     })
#     validation: Dataset({
#         features: ['asset_id', 'kw', 'lp_meta_description', 'title_org', 'title_ne1', 'title_ne2', 'title_ne3', 'domain', 'parsed_full_text_annotation', 'lp_image'],
#         num_rows: 3098
#     })
#     test: Dataset({
#         features: ['asset_id', 'kw', 'lp_meta_description', 'title_org', 'title_ne1', 'title_ne2', 'title_ne3', 'domain', 'parsed_full_text_annotation', 'lp_image'],
#         num_rows: 872
#     })
# })
```
An example of the CAMERA (w/ LP images) dataset looks as follows:
```
{
    "asset_id": 13861,
    "kw": "仙台 ホテル",
    "lp_meta_description": "仙台のホテルや旅館をお探しなら楽天トラベルへ!楽天ポイントが使えて、貯まって、とってもお得な宿泊予約サイトです。さらに割引クーポンも使える!国内ツアー・航空券・レンタカー・バス予約も!",
    "title_org": "仙台市のホテル",
    "title_ne1": "",
    "title_ne2": "",
    "title_ne3": "",
    "domain": "",
    "parsed_full_text_annotation": {
        "text": [
            "trivago",
            "Oops...AccessDenied 可",
            "Youarenotallowedtoviewthispage!Ifyouthinkthisisanerror,pleasecontacttrivago.",
            "Errorcode:0.3c99e86e.1672026945.25ba640YourIP:240d:1a:4d8:2800:b9b0:ea86:2087:d141AffectedURL:https://www.trivago.jp/ja/odr/%E8%BB%92",
            "%E4%BB%99%E5%8F%B0-%E5%9B%BD%E5%86%85?search=20072325",
            "Backtotrivago"
        ],
        "xmax": [653, 838, 765, 773, 815, 649],
        "xmin": [547, 357, 433, 420, 378, 550],
        "ymax": [47, 390, 475, 558, 598, 663],
        "ymin": [18, 198, 439, 504, 566, 651]
    },
    "lp_image": <PIL.PngImagePlugin.PngImageFile image mode=RGBA size=1200x680 at 0x7F8513446B20>
}
```
Fields of the `without-lp-images` configuration:

- `asset_id`: ids (associated with LP images)
- `kw`: search keyword
- `lp_meta_description`: meta description extracted from LP (i.e., LP Text)
- `title_org`: ad text (original gold reference)
- `title_ne{1-3}`: ad text (additional gold references for multi-reference evaluation)
- `domain`: industry domain (HR, EC, Fin, Edu) for industry-wise evaluation
- `parsed_full_text_annotation`: OCR results for LP images
Fields of the `with-lp-images` configuration:

- `asset_id`: ids (associated with LP images)
- `kw`: search keyword
- `lp_meta_description`: meta description extracted from LP (i.e., LP Text)
- `title_org`: ad text (original gold reference)
- `title_ne{1-3}`: ad text (additional gold references for multi-reference evaluation)
- `domain`: industry domain (HR, EC, Fin, Edu) for industry-wise evaluation
- `parsed_full_text_annotation`: OCR results for LP images
- `lp_image`: Landing page (LP) image
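For multi-reference evaluation, `title_org` is typically combined with whichever of `title_ne1` to `title_ne3` are present. The sketch below shows one way to gather them. It assumes that an empty string means "no additional reference", as in the example instance where all three `title_ne*` fields are empty; `collect_references` and `REFERENCE_FIELDS` are hypothetical names introduced for illustration.

```python
from typing import Dict, List

# Field names as listed in the data fields; order puts the original reference first.
REFERENCE_FIELDS = ("title_org", "title_ne1", "title_ne2", "title_ne3")


def collect_references(example: Dict[str, str]) -> List[str]:
    """Return the non-empty reference ad texts of one example."""
    return [example[field] for field in REFERENCE_FIELDS if example.get(field)]


# Reference fields of the example instance shown earlier.
example = {
    "title_org": "仙台市のホテル",
    "title_ne1": "",
    "title_ne2": "",
    "title_ne3": "",
}

references = collect_references(example)
print(len(references), references)
```

Filtering out empty strings keeps the same helper usable across splits that carry only the original reference and splits that carry additional ones.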
From the official paper:
Split | # of data | # of reference ad text | industry domain label |
---|---|---|---|
Train | 12,395 | 1 | - |
Valid | 3,098 | 1 | - |
Test | 869 | 4 | ✔ |
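The paper lists 869 test examples, while the `load_dataset` output shown earlier in this card reports 872 rows for the test split; the card does not explain the difference. A small check like the one below can surface such mismatches after loading. It is only a sketch: the counts are copied from this card, the mapping of "Valid" to the `validation` split is an assumption, and `count_mismatches` is a hypothetical helper.

```python
from typing import Dict, Optional, Tuple

# Split sizes as listed in the paper table.
paper_counts = {"train": 12395, "validation": 3098, "test": 869}

# Split sizes as printed by load_dataset in this card.
loaded_counts = {"train": 12395, "validation": 3098, "test": 872}


def count_mismatches(
    expected: Dict[str, int], actual: Dict[str, int]
) -> Dict[str, Tuple[int, Optional[int]]]:
    """Return {split: (expected, actual)} for every split whose size differs."""
    return {
        split: (size, actual.get(split))
        for split, size in expected.items()
        if actual.get(split) != size
    }


mismatches = count_mismatches(paper_counts, loaded_counts)
for split, (expected_size, actual_size) in mismatches.items():
    print(f"{split}: paper={expected_size}, loaded={actual_size}")
```

In practice, `loaded_counts` could be built from a loaded `DatasetDict` with `{name: split.num_rows for name, split in dataset.items()}`.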
[More Information Needed]
[More Information Needed]
[More Information Needed]
[More Information Needed]
[More Information Needed]
[More Information Needed]
[More Information Needed]
[More Information Needed]
[More Information Needed]
[More Information Needed]
[More Information Needed]
This work is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License.
```bibtex
@inproceedings{mita-et-al:nlp2023,
  author    = "三田 雅人 and 村上 聡一朗 and 張 培楠",
  title     = "広告文生成タスクの規定とベンチマーク構築",
  booktitle = "言語処理学会 第 29 回年次大会",
  year      = 2023,
}
```
Thanks to Masato Mita, Soichiro Murakami, and Peinan Zhang for creating this dataset.