This repository was archived by the owner on Oct 9, 2024. It is now read-only.

Add configs to run int4 inference #37

Open

RezaYazdaniAminabadi wants to merge 5 commits into huggingface:main from RezaYazdaniAminabadi:int4-bloom

Conversation

@RezaYazdaniAminabadi (Author):

Add some minor config changes to support int4 inference through DeepSpeed-Inference.

The int4 support will be added to DeepSpeed through this PR.

cc: @stas00

@stas00 (Contributor) left a comment (edited):

Amazing work adding int4 support, Reza!

mp_size=world_size,
base_dir=repo_root,
dtype=getattr(torch, infer_dtype),
quantization_bits=8 if args.dtype == 'int8' else 4,
Contributor:

what happens with --dtype float16?

probably best to set this in kwargs only if a quantization dtype is provided

Reply:

The quantization bits should not be used when running in half-precision. But I agree we can do it in the kwargs, and only for quantized inference mode.
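A minimal sketch of that suggestion, reusing the kwargs from the diff above (world_size, repo_root, infer_dtype, args and model are defined by the surrounding script; this is not the PR's final code):

```python
import torch
import deepspeed

# Sketch: only pass the quantization bits when a quantized dtype was requested.
kwargs = dict(
    mp_size=world_size,
    base_dir=repo_root,
    dtype=getattr(torch, infer_dtype),
)
if args.dtype in ('int8', 'int4'):
    kwargs['quantization_bits'] = 8 if args.dtype == 'int8' else 4
model = deepspeed.init_inference(model, **kwargs)
```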

@stas00 (Contributor) — Nov 18, 2022 (edited):

these demos are already used by many users, so let's make them nice and clean configuration-wise, so it's clear to the reader which bits should be enabled and when.

  input_sentences *= math.ceil(args.batch_size / len(input_sentences))

- generate_kwargs = dict(max_new_tokens=num_tokens, do_sample=False)
+ generate_kwargs = dict(max_new_tokens=num_tokens, do_sample=True)
Contributor:

this is already a very different type of change.

If int4 requires do_sample=True, then again, let's change it only if it's --dtype int4
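A one-line sketch of that suggestion (hypothetical; the follow-up below shows it turned out to be unnecessary):

```python
# Hypothetical: sample only when int4 is requested, stay greedy otherwise.
generate_kwargs = dict(max_new_tokens=num_tokens, do_sample=(args.dtype == 'int4'))
```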

Reply:

Sure, I will double-check do_sample=False again to see if the generated text makes sense. If not, I will set it to true for int4.


I just checked with do_sample=False and I see the text is produced in the same way as for FP16 and INT8:

in  = "DeepSpeed is a machine learning framework"
out = "DeepSpeed is a machine learning framework for deep learning. It is a Python library, and it is also a framework. It is a framework, and it is a library. It is a framework, and it is a library. It is a framework, and it is a library. It is a framework, and it is a library. It is a framework, and it is a library. It is a framework, and it is a library. It is a framework, and it is a library. It is a framework, and"

So, I am gonna turn it off for now.

@stas00 (Contributor) commented — Nov 18, 2022 (edited):

Also, we should probably assert if int4 is attempted to be used without deepspeed>=xyz, once the DS PR is merged... We could tentatively set it to the next deepspeed version, perhaps with an XXX note, so that the script can still be used against ds@master.

I can take care of that.
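A sketch of such a guard, with the minimum version left as a placeholder since the DS PR had not yet landed in a release (args comes from the surrounding script):

```python
# Hypothetical version guard; MIN_DS_VERSION is a placeholder to be filled
# in once the DeepSpeed int4 PR ships in a release.
from packaging import version
import deepspeed

MIN_DS_VERSION = "0.0.0"  # XXX: set to the first release with int4 support

if args.dtype == 'int4' and version.parse(deepspeed.__version__) < version.parse(MIN_DS_VERSION):
    raise ValueError(f"--dtype int4 requires deepspeed>={MIN_DS_VERSION} (or ds@master)")
```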


@RezaYazdaniAminabadi (Author):

> Also, we should probably assert if int4 is attempted to be used without deepspeed>=xyz, once the DS PR is merged... We could tentatively set it to the next deepspeed version, perhaps with an XXX note, so that the script can still be used against ds@master.
>
> I can take care of that.

Sounds good to me. Thanks @stas00


  model_name = args.name
- infer_dtype = args.dtype
+ infer_dtype = args.dtype if args.dtype != 'int4' else 'int8'
@stas00 (Contributor) — Nov 18, 2022 (edited):

would it make for a more user-friendly API to

  1. keep the dtype intact
  2. drop quantization_bits
  3. let deepspeed.init_inference derive the number of bits from dtype?

not only is the currently suggested override confusing, I fail to see what purpose is served by carrying the same information in both dtype and quantization_bits

Contributor:

oh, wait, torch.int4 still doesn't exist, does it?

let's find the feature request.

Contributor:

still not implemented: pytorch/pytorch#74627

so that's why you had to do the odd workarounds, right?

Collaborator:

I guess we can drop it once it's implemented, @stas00?
For now, this might be the best way to do it.

Contributor:

it's pointless to wait, since they won't have int3 and int12

@awan-10:

> would it make for a more user-friendly API to
>
>   1. keep the dtype intact
>   2. drop quantization_bits
>   3. let deepspeed.init_inference derive the number of bits from dtype?
>
> not only is the currently suggested override confusing, I fail to see what purpose is served by carrying the same information in both dtype and quantization_bits

@stas00 and @RezaYazdaniAminabadi - just clarifying that we have introduced a new DeepSpeedInferenceConfig that can be passed to init_inference. We are keeping it backwards compatible, but if we are okay making changes to this file, I would advocate writing a config dictionary for DeepSpeed and passing that to init_inference instead of the various kwargs. Please see here for an example: https://gist.github.com/awan-10/6e3d5c756be3a876522e860c6bbf702d#file-bloom-ds-inference-py-L173

Also, see the docs for the new config: https://deepspeed.readthedocs.io/en/latest/inference-init.html
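A rough sketch of that style, with illustrative field names (the linked gist and the inference-init docs are the authoritative references, not this snippet):

```python
import torch
import deepspeed

# Illustrative config dict passed to init_inference instead of loose kwargs;
# exact field names and schema are per the DeepSpeedInferenceConfig docs.
ds_config = {
    "tensor_parallel": {"tp_size": world_size},
    "dtype": torch.float16,
    "quant": {"enabled": True},
}
model = deepspeed.init_inference(model, config=ds_config)
```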

@stas00 (Contributor):

That definitely works.

@awan-10, may I suggest you make the inference config accept dict_or_path just like zero does? It might be easier for some users to write out a separate file.
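The idea in code might look something like this hypothetical helper, mirroring how ZeRO configs can be given either inline or as a path to a JSON file:

```python
import json

def _resolve_inference_config(dict_or_path):
    # Hypothetical: accept an in-memory dict or a path to a JSON config file.
    if isinstance(dict_or_path, dict):
        return dict_or_path
    with open(dict_or_path) as f:
        return json.load(f)
```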

@awan-10:

@stas00 - thanks for the suggestion. Created an issue so we can track it: deepspeedai/DeepSpeed#2532. Mike and I will work on it.

Contributor:

Thank you very much, @awan-10

@stas00 (Contributor) commented — Nov 19, 2022 (edited):

OK, I think I understand the limitations of pytorch, and it'll only get worse when you try int3, etc., even if int4 is supported.
https://github.com/huggingface/transformers-bloom-inference/pull/37/files#r1026981222

I propose we break the currently proposed API and draw up a better one.

I propose having only 2 user-configurable args related to how deepspeed-inference operates:

  1. dtype is the dtype of the original model - so only fp32, fp16 or bf16, never intX (i.e. we drop int8)
  2. quantization_bits: [None, 12, 8, 4, 3]

Now the API is simple, unambiguous and future-proof (as in int12 or int3, or Mixture of Precisions support).

For back-compat, deepspeed.init_inference can simply set quantization_bits=8 if dtype==torch.int8 is passed. So the API will be unbroken.

What do you think, Reza?
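Concretely, the proposed call might look like this (a sketch of the proposal above, not an existing DeepSpeed signature; model comes from the surrounding script):

```python
import torch
import deepspeed

# dtype describes the original model only; quantization is a separate knob.
model = deepspeed.init_inference(
    model,
    dtype=torch.float16,   # fp32/fp16/bf16 only, never intX
    quantization_bits=4,   # None, 12, 8, 4 or 3
)
```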

@mayank31398 (Collaborator):

Huh? Int4?
I will surely test this branch and let you know.
Thanks a lot for this :)

@RezaYazdaniAminabadi (Author):

> ...is simple, unambiguous and future-proof...

Hi @stas00,
I agree with what you said, and we are going down the same route, as you can see from my last commit here.
Thanks for the good suggestion :)
Best,
Reza


@RezaYazdaniAminabadi (Author):

> > ...is simple, unambiguous and future-proof...
>
> Hi @stas00, I agree with what you said, and we are going down the same route, as you can see from my last commit here. Thanks for the good suggestion :) Best, Reza

In this case, we can simply pass the bits to the DeepSpeed-inference config: kwargs['quant']['weight']['num_bits'] = quantization_bits

@stas00 (Contributor):

may I suggest that the just added kwargs['quant']['weight']['num_bits'] isn't the most user-friendly API as far as kwargs go?

why not have a flat structure of simple key=value pairs, and once you get the info on your side you can re-arrange it to any nesting level you want.
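For illustration, a hypothetical library-side helper that re-nests a flat kwarg into the structure mentioned above:

```python
def _nest_quant_kwargs(quantization_bits):
    # Hypothetical: rebuild the nested config shape from a flat key=value pair.
    return {'quant': {'weight': {'num_bits': quantization_bits}}}
```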

@RezaYazdaniAminabadi (Author):

> may I suggest that the just added kwargs['quant']['weight']['num_bits'] isn't the most user-friendly API as far as kwargs go?
>
> why not have a flat structure of simple key=value pairs, and once you get the info on your side you can re-arrange it to any nesting level you want.

I agree, let me work on that and fix it.


@awan-10 commented — Nov 19, 2022 (edited by stas00):

> > may I suggest that the just added kwargs['quant']['weight']['num_bits'] isn't the most user-friendly API as far as kwargs go?
> > why not have a flat structure of simple key=value pairs, and once you get the info on your side you can re-arrange it to any nesting level you want.
>
> I agree, let me work on that and fix it.

@RezaYazdaniAminabadi -- please see my comment above: #37 (comment)

@RezaYazdaniAminabadi (Author):

> > may I suggest that the just added kwargs['quant']['weight']['num_bits'] isn't the most user-friendly API as far as kwargs go?
> > why not have a flat structure of simple key=value pairs, and once you get the info on your side you can re-arrange it to any nesting level you want.
> >
> > I agree, let me work on that and fix it.
>
> @RezaYazdaniAminabadi -- please see my comment above: #37 (comment)

thanks @awan-10. Please go ahead and push your changes.

mayank31398 force-pushed the main branch 10 times, most recently from abfc97f to 9d48dbf on January 24, 2023 11:37
mayank31398 force-pushed the main branch 2 times, most recently from 655179f to 114b912 on March 3, 2023 16:33

Reviewers

@awan-10 left review comments
@stas00 requested changes
@mayank31398 left review comments

Assignees: no one assigned
Labels: none yet
Projects: none yet
Milestone: no milestone

4 participants: @RezaYazdaniAminabadi, @stas00, @mayank31398, @awan-10
