
[New blog post] Unified multimodal large model evaluation, accelerating multimodal intelligence emergence #1987

Open
kcz358 wants to merge 54 commits into huggingface:main from kcz358:main

Conversation

kcz358 (Author) commented:

Hi @lewtun, this is our blog for lmms-eval. Could you help us check the article and see whether there is anything that could be added, for example user experience or how to add a new model? Also, you might want to add your names to the author list.

Thank you!


This blog introduces a new evaluation pipeline for large vision-language models. Building upon lm-evaluation-harness, this framework has been improved and expanded to provide a unified interface for defining models, datasets, and evaluation metrics, offering a one-stop, efficient solution for evaluating large multimodal models (LMMs). We hope that through this framework, we can collectively drive the iteration cycle of multimodal models and promote their broader application in academia and industry.

lewtun (Member) left a comment:


Thank you very much for this blog post! I left a few minor suggestions and a pointer to include the details in `_blog.yml`.

lmms_eval.md Outdated
@@ -0,0 +1,85 @@
---
title: "Unified multimodal large model evaluation, accelerating multimodal intelligence emergence"
thumbnail: https://github.com/lmms-lab/lmms-eval-blog/blob/master/assets/img/lmms-eval-header.png
Member:


I believe this should live in the blog repo directly to render on hf.co/blog. See here for an example: https://github.com/huggingface/blog/pull/2021/files#diff-a332b83464cf2b650715bacb6e3f07b994af0790acc88a4ea353883ba2ae751eR3853

Note you also need to add the blog details to `_blog.yml`.

Author:


Thank you! I have also noticed that in `_blog.yml` we can only have one author on the list?

Member:


Yes, that's just for the thumbnail, but the blog post itself will show all authors:

(Screenshot, 2024-04-24: rendered blog post showing all listed authors)

lmms_eval.md Outdated
**One-click evaluation**: lmms-eval allows users to easily evaluate their model performance on multiple datasets with a single command, without the need for manual dataset preparation. With just one line of code, users can obtain comprehensive evaluation results within minutes, including detailed logs and sample analysis covering model parameters, inputs and outputs, correct answers, etc. This is suitable for scenarios where advanced models like GPT4 are needed for scoring.

```
accelerate launch --num_processes=8 -m lmms_eval --model llava --model_args pretrained="liuhaotian/llava-v1.5-7b" --tasks mme,mmbench_en --batch_size 1 --log_samples --log_samples_suffix llava_v1.5_mme_mmbenchen --output_path ./logs
```
Member:


Suggested change
accelerate launch --num_processes=8 -m lmms_eval --model llava --model_args pretrained="liuhaotian/llava-v1.5-7b" --tasks mme,mmbench_en --batch_size 1 --log_samples --log_samples_suffix llava_v1.5_mme_mmbenchen --output_path ./logs
# pip install git+https://github.com/huggingface/lmms-eval.git
accelerate launch --multi_gpu --num_processes=8 -m lmms_eval \
--model llava \
--model_args pretrained="liuhaotian/llava-v1.5-7b" \
--tasks mme,mmbench_en \
--batch_size 1 \
--log_samples \
--log_samples_suffix llava_v1.5_mme_mmbenchen \
--output_path ./logs

Author:


I think I will change the link to our current repo since the HF fork is a bit behind, and I will also add `pip install git+https://github.com/haotian-liu/LLaVA.git`.
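For readers following along, here is a rough sketch of the install-and-evaluate flow discussed in this thread. The pip sources are the ones quoted above (the Hugging Face lmms-eval fork and the upstream LLaVA repo) and may differ from what the final blog post ends up recommending.

```
# Install the evaluation harness and the LLaVA dependency mentioned in this thread.
# Repository URLs are taken from the review comments above; the final post may point
# at the authors' current repo instead, as noted in the reply.
pip install git+https://github.com/huggingface/lmms-eval.git
pip install git+https://github.com/haotian-liu/LLaVA.git

# Multi-GPU evaluation of LLaVA-v1.5-7B on MME and MMBench-EN, as in the suggested command.
accelerate launch --multi_gpu --num_processes=8 -m lmms_eval \
  --model llava \
  --model_args pretrained="liuhaotian/llava-v1.5-7b" \
  --tasks mme,mmbench_en \
  --batch_size 1 \
  --log_samples \
  --log_samples_suffix llava_v1.5_mme_mmbenchen \
  --output_path ./logs
```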

lmms_eval.md Outdated

Another challenge lies in data acquisition and processing during the evaluation process, especially when dealing with old datasets that are not widely available. Researchers often need to invest a considerable amount of time and effort in manual searching, downloading, and processing.

To address these issues, researchers from Nanyang Technological University, ByteDance, and other institutions have jointly open-sourced lmms-eval, which is an evaluation framework designed specifically for multimodal large models. Building upon lm-evaluation-harness, this framework has been improved and expanded to provide a unified interface for defining models, datasets, and evaluation metrics, offering a one-stop, efficient solution for evaluating multimodal models (LMMs). We hope that through this framework, we can collectively drive the iteration cycle of multimodal models and promote their broader application in academia and industry. We sincerely look forward to witnessing more breakthroughs and innovations in the field of multimodal AI, jointly advancing towards a more efficient and intelligent future development of artificial intelligence technology.
Member:


Suggested change
To address these issues, researchers from Nanyang Technological University, ByteDance, and other institutions have jointly open-sourced lmms-eval, which is an evaluation framework designed specifically for multimodal large models. Building upon lm-evaluation-harness, this framework has been improved and expanded to provide a unified interface for defining models, datasets, and evaluation metrics, offering a one-stop, efficient solution for evaluating multimodal models (LMMs). We hope that through this framework, we can collectively drive the iteration cycle of multimodal models and promote their broader application in academia and industry. We sincerely look forward to witnessing more breakthroughs and innovations in the field of multimodal AI, jointly advancing towards a more efficient and intelligent future development of artificial intelligence technology.
To address these issues, researchers from Nanyang Technological University, ByteDance, and other institutions have jointly open-sourced lmms-eval, which is an evaluation framework designed specifically for multimodal large models. Building upon lm-evaluation-harness, this framework has been improved and expanded to provide a unified interface for defining models, datasets, and evaluation metrics, offering a one-stop, efficient solution for evaluating large multimodal models (LMMs). We hope that through this framework, we can collectively drive the iteration cycle of multimodal models and promote their broader application in academia and industry. We sincerely look forward to witnessing more breakthroughs and innovations in the field of multimodal AI, jointly advancing towards a more efficient and intelligent future development of artificial intelligence technology.

kcz358 and others added 11 commits April 20, 2024 13:06 (several co-authored by lewtun <lewis.c.tunstall@gmail.com>)
kcz358 (Author) commented:

Hi @lewtun, thank you for your feedback.

I have uploaded the thumbnail picture and fixed several problems in the blog. Could you help us check if there are any more problems to fix in this article?

When we finalize the English version of the article, we will also help to translate everything into Chinese and put it into /blog/zh.

Thank you!

kcz358 requested a review from lewtun, April 24, 2024 04:19
lewtun (Member) left a comment:


Thanks for iterating @kcz358! This all looks good to me and gently pinging @pcuenca for final approval.

Context: this is a blog post about an open-source lib for evaluating multimodal models that the TRL team contributed to, and it's what we recommend in the TRL examples.

pcuenca (Member) left a comment:


Very interesting!

Also cc @merveenoyan for info.

_blog.yml Outdated
title: "Unified multimodal large model evaluation, accelerating multimodal intelligence emergence"
author: kcz358
thumbnail: /blog/assets/lmms_eval/thumbnail.png
date: April 20, 2024
Member:


Reminder to update date before release :)

Member:


(Also I'd move the entry to the end of the file, just in case)

lmms_eval.md Outdated

**Synchronized Online Logging**: We provide detailed logging tools to help you understand the evaluation process and results. Logs include model parameters, generation parameters, input questions, model responses, and ground truth answers. You can record every detail and visualize it in Weights & Biases runs. Users can access results in real-time from anywhere, making it convenient and efficient.

<image src="https://github.com/lmms-lab/lmms-eval-blog/blob/master/assets/img/wandb_table.jpg" alt="wandb_table" />
Member:


I don't think these links will be embedded correctly as images (they are references to the github tree)

Author:


Hi, I tried to change the src to a link on a Hugging Face dataset repo, but I can't see the rendered image on GitHub. May I ask what is the proper way to put image links in the blog?

I have uploaded all the images here but am unable to find a way to let GitHub markdown render them.
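As a side note on the Synchronized Online Logging paragraph quoted above, here is a minimal sketch of how a run might be pointed at Weights & Biases. The `--wandb_args` flag and its key=value format are assumed to be inherited from the upstream lm-evaluation-harness and are not confirmed anywhere in this thread, so check the CLI help before relying on them.

```
# Assumption: lmms-eval inherits lm-evaluation-harness's --wandb_args flag, whose
# comma-separated key=value pairs are forwarded to wandb.init(). Verify with
# `python -m lmms_eval --help`; this is an illustration, not the blog's published command.
accelerate launch --num_processes=8 -m lmms_eval \
  --model llava \
  --model_args pretrained="liuhaotian/llava-v1.5-7b" \
  --tasks mme \
  --batch_size 1 \
  --log_samples \
  --output_path ./logs \
  --wandb_args project=lmms-eval,name=llava_v1.5_mme
```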

lmms_eval.md Outdated

<image src="https://github.com/lmms-lab/lmms-eval-blog/blob/master/assets/img/org_dataset.png" alt="dataset on organization"/>

<image src="https://github.com/lmms-lab/lmms-eval-blog/blob/master/assets/img/viewer.png" alt="viewer" />
Member:


Same comment about the image link.

merveenoyan (Contributor) commented:

thanks a lot for the blog post! I'll give this a spin 😊

merveenoyan (Contributor) left a comment:


mostly nits 😊

lmms_eval.md Outdated
- user: liuziwei7
guest: true
---
# Unified multimodal large model evaluation, accelerating multimodal intelligence emergence
Contributor:


we can make it uppercase for h1 IMO

lmms_eval.md Outdated

Another challenge lies in data acquisition and processing during the evaluation process, especially when dealing with old datasets that are not widely available. Researchers often need to invest a considerable amount of time and effort in manual searching, downloading, and processing.

To address these issues, researchers from Nanyang Technological University, ByteDance, and other institutions have jointly open-sourced `lmms-eval`, which is an evaluation framework designed specifically for multimodal large models. Building upon EleutherAI's [`lm-evaluation-harness`](https://github.com/EleutherAI/lm-evaluation-harness) and [🤗 Accelerate](https://github.com/huggingface/accelerate), this framework has been improved and expanded to provide a unified interface for defining models, datasets, and evaluation metrics, offering a one-stop, efficient solution for evaluating large multimodal models (LMMs). We hope that through this framework, we can collectively drive the iteration cycle of multimodal models and promote their broader application in academia and industry. We sincerely look forward to witnessing more breakthroughs and innovations in the field of multimodal AI, jointly advancing towards a more efficient and intelligent future development of artificial intelligence technology.
Contributor:


would be nice to directly give a link to lmms-eval instead of putting it in code formatting

lmms_eval.md Outdated

<image src="https://github.com/lmms-lab/lmms-eval-blog/blob/master/assets/img/teaser.png" alt="Pipeline"/>

## Overview of the main features
Contributor:


again maybe uppercase main and features

kcz358 and others added 3 commits April 25, 2024 10:23 (co-authored by Pedro Cuenca <pedro@huggingface.co> and Merve Noyan <merveenoyan@gmail.com>)
kcz358 (Author) commented on Apr 25, 2024 (edited):

Hi @pcuenca, @merveenoyan, thank you for your kind feedback.

I have tried to fix most of the issues in the comments and the image source issue. May I kindly ask for a review of this version? I will try to update the date in `_blog.yml` before release.

lewtun (Member) commented:

Thanks for iterating @kcz358! Would you mind resolving the merge conflicts and then we should be pretty good to go!

kcz358 (Author) commented:

Hi @lewtun, I have merged the main branch and added the Chinese version of the blog. I have also updated the date in `_blog.yml`.

kcz358 (Author) commented:

Hi @lewtun, sorry for pinging you again. Do you think we are able to merge the current version?

_blog.yml Outdated
Comment on lines 3915 to 3948
- local: sc2-instruct
title: "StarCoder2-Instruct: Fully Transparent and Permissive Self-Alignment for Code Generation"
thumbnail: /blog/assets/sc2-instruct/sc2-instruct-banner.png
author: yuxiang630
guest: true
date: Apr 29, 2024
tags:
- nlp
- community
- research
- LLM

- local: evaluation-structured-outputs
title: "Improving Prompt Consistency with Structured Generations"
author: willkurt
guest: true
thumbnail: /blog/assets/evaluating-mmlu-leaderboard/thumbnail.png
date: Apr 30, 2024
tags:
- evaluation
- collaboration
- research
- leaderboard

- local: asr-diarization
title: "Powerful ASR + diarization + speculative decoding with Hugging Face Inference Endpoints"
author: sergeipetrov
thumbnail: /blog/assets/asr-diarization/thumbnail.png
date: May 1, 2024
tags:
- audio
- asr
- inference

Member:


hmmm these entries shouldn't be here. Can you try to merge main again and ensure there are no duplicates?

Author:


Thank you for spotting the issue! I have merged main again and deleted the duplicates.

kcz358 and others added 7 commits May 16, 2024 16:02 (several co-authored by Pedro Cuenca <pedro@huggingface.co>)
lewtun (Member) commented:

@pcuenca I resolved the merge conflicts - ok if we merge this? (Feel free to do so if you agree)


Reviewers

@pcuenca approved these changes

@lewtun approved these changes

Awaiting requested review from @merveenoyan

6 participants: @kcz358, @merveenoyan, @lewtun, @pcuenca, @pufanyi, @Luodian
