
[New blog post] Unified multimodal large model evaluation, accelerating multimodal intelligence emergence #1987

Open
kcz358 wants to merge 54 commits into huggingface:main from kcz358:main

Conversation

kcz358 (Author) commented:

Hi @lewtun, this is our blog for lmms-eval. Could you help us check the article and see whether there is anything that could be added, for example user experience or how to add a new model? Also, you might want to add your names to the author list.

Thank you!


This blog introduces a new evaluation pipeline for large vision-language models. Building upon lm-evaluation-harness, this framework has been improved and expanded to provide a unified interface for defining models, datasets, and evaluation metrics, offering a one-stop, efficient solution for evaluating large multimodal models (LMMs). We hope that through this framework, we can collectively drive the iteration cycle of multimodal models and promote their broader application in academia and industry.

lewtun (Member) left a comment:


Thank you very much for this blog post! I left a few minor suggestions and a pointer to include the details in `_blog.yml`.

lmms_eval.md Outdated
@@ -0,0 +1,85 @@
---
title: "Unified multimodal large model evaluation, accelerating multimodal intelligence emergence"
thumbnail: https://github.com/lmms-lab/lmms-eval-blog/blob/master/assets/img/lmms-eval-header.png
Member:


I believe this should live in the blog repo directly to render on hf.co/blog. See here for an example: https://github.com/huggingface/blog/pull/2021/files#diff-a332b83464cf2b650715bacb6e3f07b994af0790acc88a4ea353883ba2ae751eR3853

Note you also need to add the blog details to `_blog.yml`.

Author:


Thank you! I have also noticed that in `_blog.yml` we can only have one author on the list?

Member:


Yes, that's just for the thumbnail, but the blog post itself will show all authors:

(Screenshot, 2024-04-24: rendered blog post showing all listed authors)

lmms_eval.md Outdated
**One-click evaluation**: lmms-eval allows users to easily evaluate their model performance on multiple datasets with a single command, without the need for manual dataset preparation. With just one line of code, users can obtain comprehensive evaluation results within minutes, including detailed logs and sample analysis covering model parameters, inputs and outputs, correct answers, etc. This is suitable for scenarios where advanced models like GPT4 are needed for scoring.

```
accelerate launch --num_processes=8 -m lmms_eval --model llava --model_args pretrained="liuhaotian/llava-v1.5-7b" --tasks mme,mmbench_en --batch_size 1 --log_samples --log_samples_suffix llava_v1.5_mme_mmbenchen --output_path ./logs
```
Member:


Suggested change
accelerate launch --num_processes=8 -m lmms_eval --model llava --model_args pretrained="liuhaotian/llava-v1.5-7b" --tasks mme,mmbench_en --batch_size 1 --log_samples --log_samples_suffix llava_v1.5_mme_mmbenchen --output_path ./logs
# pip install git+https://github.com/huggingface/lmms-eval.git
accelerate launch --multi_gpu --num_processes=8 -m lmms_eval \
--model llava \
--model_args pretrained="liuhaotian/llava-v1.5-7b" \
--tasks mme,mmbench_en \
--batch_size 1 \
--log_samples \
--log_samples_suffix llava_v1.5_mme_mmbenchen \
--output_path ./logs

Author:


I think I will change the link to our current repo since the HF fork is a bit behind, and I will also add `pip install git+https://github.com/haotian-liu/LLaVA.git`.
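For readers following along, here is a rough sketch of the install-and-evaluate flow discussed in this thread. The pip sources are the ones quoted above (the Hugging Face lmms-eval fork and the upstream LLaVA repo) and may differ from what the final blog post ends up recommending.

```
# Install the evaluation harness and the LLaVA dependency mentioned in this thread.
# Repository URLs are taken from the review comments above; the final post may point
# at the authors' current repo instead, as noted in the reply.
pip install git+https://github.com/huggingface/lmms-eval.git
pip install git+https://github.com/haotian-liu/LLaVA.git

# Multi-GPU evaluation of LLaVA-v1.5-7B on MME and MMBench-EN, as in the suggested command.
accelerate launch --multi_gpu --num_processes=8 -m lmms_eval \
  --model llava \
  --model_args pretrained="liuhaotian/llava-v1.5-7b" \
  --tasks mme,mmbench_en \
  --batch_size 1 \
  --log_samples \
  --log_samples_suffix llava_v1.5_mme_mmbenchen \
  --output_path ./logs
```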

lmms_eval.md Outdated

Another challenge lies in data acquisition and processing during the evaluation process, especially when dealing with old datasets that are not widely available. Researchers often need to invest a considerable amount of time and effort in manual searching, downloading, and processing.

To address these issues, researchers from Nanyang Technological University, ByteDance, and other institutions have jointly open-sourced lmms-eval, which is an evaluation framework designed specifically for multimodal large models. Building upon lm-evaluation-harness, this framework has been improved and expanded to provide a unified interface for defining models, datasets, and evaluation metrics, offering a one-stop, efficient solution for evaluating multimodal models (LMMs). We hope that through this framework, we can collectively drive the iteration cycle of multimodal models and promote their broader application in academia and industry. We sincerely look forward to witnessing more breakthroughs and innovations in the field of multimodal AI, jointly advancing towards a more efficient and intelligent future development of artificial intelligence technology.
Member:


Suggested change
To address these issues, researchers from Nanyang Technological University, ByteDance, and other institutions have jointly open-sourced lmms-eval, which is an evaluation framework designed specifically for multimodal large models. Building upon lm-evaluation-harness, this framework has been improved and expanded to provide a unified interface for defining models, datasets, and evaluation metrics, offering a one-stop, efficient solution for evaluating multimodal models (LMMs). We hope that through this framework, we can collectively drive the iteration cycle of multimodal models and promote their broader application in academia and industry. We sincerely look forward to witnessing more breakthroughs and innovations in the field of multimodal AI, jointly advancing towards a more efficient and intelligent future development of artificial intelligence technology.
To address these issues, researchers from Nanyang Technological University, ByteDance, and other institutions have jointly open-sourced lmms-eval, which is an evaluation framework designed specifically for multimodal large models. Building upon lm-evaluation-harness, this framework has been improved and expanded to provide a unified interface for defining models, datasets, and evaluation metrics, offering a one-stop, efficient solution for evaluating large multimodal models (LMMs). We hope that through this framework, we can collectively drive the iteration cycle of multimodal models and promote their broader application in academia and industry. We sincerely look forward to witnessing more breakthroughs and innovations in the field of multimodal AI, jointly advancing towards a more efficient and intelligent future development of artificial intelligence technology.

kcz358 and others added 11 commits April 20, 2024 13:06 (several co-authored by lewtun <lewis.c.tunstall@gmail.com>)
kcz358 (Author) commented:

Hi @lewtun, thank you for your feedback.

I have uploaded the thumbnail picture and fixed several problems in the blog. Could you help us check if there are any more problems to fix in this article?

When we finalize the English version of the article, we will also help to translate everything into Chinese and put it into /blog/zh.

Thank you!

kcz358 requested a review from lewtun, April 24, 2024 04:19
lewtun (Member) left a comment:


Thanks for iterating @kcz358! This all looks good to me and gently pinging @pcuenca for final approval.

Context: this is a blog post about an open-source lib for evaluating multimodal models that the TRL team contributed to, and it's what we recommend in the TRL examples.

pcuenca (Member) left a comment:


Very interesting!

Also cc @merveenoyan for info.

_blog.yml Outdated
title: "Unified multimodal large model evaluation, accelerating multimodal intelligence emergence"
author: kcz358
thumbnail: /blog/assets/lmms_eval/thumbnail.png
date: April 20, 2024
Member:


Reminder to update date before release :)

Member:


(Also I'd move the entry to the end of the file, just in case)

lmms_eval.md Outdated

**Synchronized Online Logging**: We provide detailed logging tools to help you understand the evaluation process and results. Logs include model parameters, generation parameters, input questions, model responses, and ground truth answers. You can record every detail and visualize it in Weights & Biases runs. Users can access results in real-time from anywhere, making it convenient and efficient.

<image src="https://github.com/lmms-lab/lmms-eval-blog/blob/master/assets/img/wandb_table.jpg" alt="wandb_table" />
Member:


I don't think these links will be embedded correctly as images (they are references to the github tree)

Author:


Hi, I tried to change the src to a link on a Hugging Face dataset repo, but I can't see the rendered image on GitHub. May I ask what is the proper way to put image links in the blog?

I have uploaded all the images here but am unable to find a way to let GitHub markdown render them.
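As a side note on the Synchronized Online Logging paragraph quoted above, here is a minimal sketch of how a run might be pointed at Weights & Biases. The `--wandb_args` flag and its key=value format are assumed to be inherited from the upstream lm-evaluation-harness and are not confirmed anywhere in this thread, so check the CLI help before relying on them.

```
# Assumption: lmms-eval inherits lm-evaluation-harness's --wandb_args flag, whose
# comma-separated key=value pairs are forwarded to wandb.init(). Verify with
# `python -m lmms_eval --help`; this is an illustration, not the blog's published command.
accelerate launch --num_processes=8 -m lmms_eval \
  --model llava \
  --model_args pretrained="liuhaotian/llava-v1.5-7b" \
  --tasks mme \
  --batch_size 1 \
  --log_samples \
  --output_path ./logs \
  --wandb_args project=lmms-eval,name=llava_v1.5_mme
```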

lmms_eval.md Outdated

<image src="https://github.com/lmms-lab/lmms-eval-blog/blob/master/assets/img/org_dataset.png" alt="dataset on organization"/>

<image src="https://github.com/lmms-lab/lmms-eval-blog/blob/master/assets/img/viewer.png" alt="viewer" />
Member:


Same comment about the image link.

merveenoyan (Contributor) commented:

thanks a lot for the blog post! I'll give this a spin 😊

merveenoyan (Contributor) left a comment:


mostly nits 😊

lmms_eval.md Outdated
- user: liuziwei7
guest: true
---
# Unified multimodal large model evaluation, accelerating multimodal intelligence emergence
Contributor:


we can make it uppercase for h1 IMO

lmms_eval.md Outdated

Another challenge lies in data acquisition and processing during the evaluation process, especially when dealing with old datasets that are not widely available. Researchers often need to invest a considerable amount of time and effort in manual searching, downloading, and processing.

To address these issues, researchers from Nanyang Technological University, ByteDance, and other institutions have jointly open-sourced `lmms-eval`, which is an evaluation framework designed specifically for multimodal large models. Building upon EleutherAI's [`lm-evaluation-harness`](https://github.com/EleutherAI/lm-evaluation-harness) and [🤗 Accelerate](https://github.com/huggingface/accelerate), this framework has been improved and expanded to provide a unified interface for defining models, datasets, and evaluation metrics, offering a one-stop, efficient solution for evaluating large multimodal models (LMMs). We hope that through this framework, we can collectively drive the iteration cycle of multimodal models and promote their broader application in academia and industry. We sincerely look forward to witnessing more breakthroughs and innovations in the field of multimodal AI, jointly advancing towards a more efficient and intelligent future development of artificial intelligence technology.
Contributor:


would be nice to directly give a link to lmms-eval instead of putting it in code formatting

lmms_eval.md Outdated

<image src="https://github.com/lmms-lab/lmms-eval-blog/blob/master/assets/img/teaser.png" alt="Pipeline"/>

## Overview of the main features
Contributor:


again maybe uppercase main and features

kcz358 and others added 3 commits April 25, 2024 10:23 (co-authored by Pedro Cuenca <pedro@huggingface.co> and Merve Noyan <merveenoyan@gmail.com>)
kcz358 (Author) commented on Apr 25, 2024 (edited):

Hi @pcuenca, @merveenoyan, thank you for your kind feedback.

I have tried to fix most of the issues in the comments and the image source issue. May I kindly ask for a review of this version? I will try to update the date in `_blog.yml` before release.

lewtun (Member) commented:

Thanks for iterating @kcz358! Would you mind resolving the merge conflicts and then we should be pretty good to go!

kcz358 (Author) commented:

Hi @lewtun, I have merged the main branch and added the Chinese version of the blog. I have also updated the date in `_blog.yml`.

kcz358 (Author) commented:

Hi @lewtun, sorry for pinging you again. Do you think we are able to merge the current version?

_blog.yml Outdated
Comment on lines 3915 to 3948
- local: sc2-instruct
title: "StarCoder2-Instruct: Fully Transparent and Permissive Self-Alignment for Code Generation"
thumbnail: /blog/assets/sc2-instruct/sc2-instruct-banner.png
author: yuxiang630
guest: true
date: Apr 29, 2024
tags:
- nlp
- community
- research
- LLM

- local: evaluation-structured-outputs
title: "Improving Prompt Consistency with Structured Generations"
author: willkurt
guest: true
thumbnail: /blog/assets/evaluating-mmlu-leaderboard/thumbnail.png
date: Apr 30, 2024
tags:
- evaluation
- collaboration
- research
- leaderboard

- local: asr-diarization
title: "Powerful ASR + diarization + speculative decoding with Hugging Face Inference Endpoints"
author: sergeipetrov
thumbnail: /blog/assets/asr-diarization/thumbnail.png
date: May 1, 2024
tags:
- audio
- asr
- inference

Member:


hmmm these entries shouldn't be here. Can you try to merge main again and ensure there are no duplicates?

Author:


Thank you for spotting the issue! I have merged main again and deleted the duplicates.

kcz358 and others added 7 commits May 16, 2024 16:02 (several co-authored by Pedro Cuenca <pedro@huggingface.co>)
lewtun (Member) commented:

@pcuenca I resolved the merge conflicts - ok if we merge this? (Feel free to do so if you agree)


Reviewers

@pcuenca approved these changes

@lewtun approved these changes

Awaiting requested review from @merveenoyan

6 participants: @kcz358, @merveenoyan, @lewtun, @pcuenca, @pufanyi, @Luodian
