This model was released on 2018-06-11 and added to Hugging Face Transformers on 2023-06-20.
GPT
GPT (Generative Pre-trained Transformer) (blog post) focuses on effectively learning text representations and transferring them to downstream tasks. The model pretrains a Transformer decoder to predict the next word and is then fine-tuned on labeled data.
GPT can generate high-quality text, and its pretrained representations transfer well to a variety of natural language understanding tasks such as textual entailment, question answering, semantic similarity, and document classification.
You can find all the original GPT checkpoints under the OpenAI community organization.
Click on the GPT models in the right sidebar for more examples of how to apply GPT to different language tasks.
The example below demonstrates how to generate text with Pipeline, AutoModel, and from the command line.
import torch
from transformers import pipeline

generator = pipeline(task="text-generation", model="openai-community/openai-gpt", dtype=torch.float16, device=0)
output = generator("The future of AI is", max_length=50, do_sample=True)
print(output[0]["generated_text"])
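The same generation can be written against the model classes directly. This is a minimal sketch using AutoModelForCausalLM; the prompt and sampling settings are illustrative, not prescriptive.

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

device = "cuda" if torch.cuda.is_available() else "cpu"
tokenizer = AutoTokenizer.from_pretrained("openai-community/openai-gpt")
model = AutoModelForCausalLM.from_pretrained("openai-community/openai-gpt").to(device)

inputs = tokenizer("The future of AI is", return_tensors="pt").to(device)
output_ids = model.generate(**inputs, max_length=50, do_sample=True)
print(tokenizer.decode(output_ids[0], skip_special_tokens=True))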
Notes
- Inputs should be padded on the right because GPT uses absolute position embeddings.
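For example, a right-padded batch could be prepared as sketched below. GPT ships without a dedicated padding token, so reusing the unk token here is an assumption rather than something this page prescribes.

from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("openai-community/openai-gpt")
tokenizer.pad_token = tokenizer.unk_token  # assumption: GPT defines no pad token, reuse <unk>
tokenizer.padding_side = "right"           # absolute position embeddings, so pad on the right

batch = tokenizer(["a short prompt", "a noticeably longer prompt for padding"], padding=True, return_tensors="pt")
print(batch["input_ids"].shape, batch["attention_mask"])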
OpenAIGPTConfig
class transformers.OpenAIGPTConfig
<source>( vocab_size = 40478, n_positions = 512, n_embd = 768, n_layer = 12, n_head = 12, afn = 'gelu', resid_pdrop = 0.1, embd_pdrop = 0.1, attn_pdrop = 0.1, layer_norm_epsilon = 1e-05, initializer_range = 0.02, summary_type = 'cls_index', summary_use_proj = True, summary_activation = None, summary_proj_to_labels = True, summary_first_dropout = 0.1, **kwargs )
Parameters
- vocab_size (`int`, *optional*, defaults to 40478) — Vocabulary size of the GPT model. Defines the number of different tokens that can be represented by the `input_ids` passed when calling OpenAIGPTModel or TFOpenAIGPTModel.
- n_positions (`int`, *optional*, defaults to 512) — The maximum sequence length that this model might ever be used with. Typically set this to something large just in case (e.g., 512 or 1024 or 2048).
- n_embd (`int`, *optional*, defaults to 768) — Dimensionality of the embeddings and hidden states.
- n_layer (`int`, *optional*, defaults to 12) — Number of hidden layers in the Transformer decoder.
- n_head (`int`, *optional*, defaults to 12) — Number of attention heads for each attention layer in the Transformer decoder.
- afn (`str` or `Callable`, *optional*, defaults to `"gelu"`) — The non-linear activation function (function or string) in the model. If string, `"gelu"`, `"relu"`, `"silu"` and `"gelu_new"` are supported.
- resid_pdrop (`float`, *optional*, defaults to 0.1) — The dropout probability for all fully connected layers in the embeddings, encoder, and pooler.
- embd_pdrop (`float`, *optional*, defaults to 0.1) — The dropout ratio for the embeddings.
- attn_pdrop (`float`, *optional*, defaults to 0.1) — The dropout ratio for the attention.
- layer_norm_epsilon (`float`, *optional*, defaults to 1e-05) — The epsilon to use in the layer normalization layers.
- initializer_range (`float`, *optional*, defaults to 0.02) — The standard deviation of the truncated_normal_initializer for initializing all weight matrices.
- summary_type (`str`, *optional*, defaults to `"cls_index"`) — Argument used when doing sequence summary, used in OpenAIGPTDoubleHeadsModel. Has to be one of the following options:
  - `"last"`: Take the last token hidden state (like XLNet).
  - `"first"`: Take the first token hidden state (like BERT).
  - `"mean"`: Take the mean of all tokens hidden states.
  - `"cls_index"`: Supply a Tensor of classification token position (like GPT/GPT-2).
  - `"attn"`: Not implemented now, use multi-head attention.
- summary_use_proj (`bool`, *optional*, defaults to `True`) — Argument used when doing sequence summary, used in OpenAIGPTDoubleHeadsModel. Whether or not to add a projection after the vector extraction.
- summary_activation (`str`, *optional*) — Argument used when doing sequence summary, used in OpenAIGPTDoubleHeadsModel. Pass `"tanh"` for a tanh activation to the output, any other value will result in no activation.
- summary_proj_to_labels (`bool`, *optional*, defaults to `True`) — Argument used when doing sequence summary, used in OpenAIGPTDoubleHeadsModel. Whether the projection outputs should have `config.num_labels` or `config.hidden_size` classes.
- summary_first_dropout (`float`, *optional*, defaults to 0.1) — Argument used when doing sequence summary, used in OpenAIGPTDoubleHeadsModel. The dropout ratio to be used after the projection and activation.
This is the configuration class to store the configuration of an OpenAIGPTModel or a TFOpenAIGPTModel. It is used to instantiate a GPT model according to the specified arguments, defining the model architecture. Instantiating a configuration with the defaults will yield a similar configuration to that of the GPT openai-community/openai-gpt architecture from OpenAI.
Configuration objects inherit from PreTrainedConfig and can be used to control the model outputs. Read the documentation from PreTrainedConfig for more information.
Examples:
>>> from transformers import OpenAIGPTConfig, OpenAIGPTModel

>>> # Initializing a GPT configuration
>>> configuration = OpenAIGPTConfig()

>>> # Initializing a model (with random weights) from the configuration
>>> model = OpenAIGPTModel(configuration)

>>> # Accessing the model configuration
>>> configuration = model.config
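The defaults above reproduce the original openai-gpt architecture; overriding them yields a differently sized, randomly initialized model. The smaller values below are arbitrary and purely illustrative.

from transformers import OpenAIGPTConfig, OpenAIGPTModel

# Arbitrary smaller configuration, for illustration only
small_config = OpenAIGPTConfig(n_layer=6, n_head=8, n_embd=512, n_positions=256)
small_model = OpenAIGPTModel(small_config)
print(sum(p.numel() for p in small_model.parameters()))  # parameter count of the randomly initialized model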
OpenAIGPTModel
class transformers.OpenAIGPTModel
<source>(config)
Parameters
- config (OpenAIGPTConfig) — Model configuration class with all the parameters of the model. Initializing with a config file does not load the weights associated with the model, only the configuration. Check out the from_pretrained() method to load the model weights.
The bare OpenAI GPT Model outputting raw hidden-states without any specific head on top.
This model inherits from PreTrainedModel. Check the superclass documentation for the generic methods the library implements for all its models (such as downloading or saving, resizing the input embeddings, pruning heads, etc.).
This model is also a PyTorch torch.nn.Module subclass. Use it as a regular PyTorch Module and refer to the PyTorch documentation for all matters related to general usage and behavior.
forward
<source>( input_ids: typing.Optional[torch.LongTensor] = None, attention_mask: typing.Optional[torch.FloatTensor] = None, token_type_ids: typing.Optional[torch.LongTensor] = None, position_ids: typing.Optional[torch.LongTensor] = None, inputs_embeds: typing.Optional[torch.FloatTensor] = None, output_attentions: typing.Optional[bool] = None, output_hidden_states: typing.Optional[bool] = None, return_dict: typing.Optional[bool] = None ) → transformers.modeling_outputs.BaseModelOutput or tuple(torch.FloatTensor)
Parameters
- input_ids (`torch.LongTensor` of shape `(batch_size, sequence_length)`, *optional*) — Indices of input sequence tokens in the vocabulary. Padding will be ignored by default. Indices can be obtained using AutoTokenizer. See PreTrainedTokenizer.encode() and PreTrainedTokenizer.__call__() for details.
- attention_mask (`torch.FloatTensor` of shape `(batch_size, sequence_length)`, *optional*) — Mask to avoid performing attention on padding token indices. Mask values selected in `[0, 1]`:
  - 1 for tokens that are not masked,
  - 0 for tokens that are masked.
- token_type_ids (`torch.LongTensor` of shape `(batch_size, sequence_length)`, *optional*) — Segment token indices to indicate first and second portions of the inputs. Indices are selected in `[0, 1]`:
  - 0 corresponds to a sentence A token,
  - 1 corresponds to a sentence B token.
- position_ids (`torch.LongTensor` of shape `(batch_size, sequence_length)`, *optional*) — Indices of positions of each input sequence token in the position embeddings. Selected in the range `[0, config.n_positions - 1]`.
- inputs_embeds (`torch.FloatTensor` of shape `(batch_size, sequence_length, hidden_size)`, *optional*) — Optionally, instead of passing `input_ids` you can choose to directly pass an embedded representation. This is useful if you want more control over how to convert `input_ids` indices into associated vectors than the model’s internal embedding lookup matrix.
- output_attentions (`bool`, *optional*) — Whether or not to return the attentions tensors of all attention layers. See `attentions` under returned tensors for more detail.
- output_hidden_states (`bool`, *optional*) — Whether or not to return the hidden states of all layers. See `hidden_states` under returned tensors for more detail.
- return_dict (`bool`, *optional*) — Whether or not to return a ModelOutput instead of a plain tuple.
Returns
transformers.modeling_outputs.BaseModelOutput or tuple(torch.FloatTensor)
A transformers.modeling_outputs.BaseModelOutput or a tuple of torch.FloatTensor (if `return_dict=False` is passed or when `config.return_dict=False`) comprising various elements depending on the configuration (OpenAIGPTConfig) and inputs.
- last_hidden_state (`torch.FloatTensor` of shape `(batch_size, sequence_length, hidden_size)`) — Sequence of hidden-states at the output of the last layer of the model.
- hidden_states (`tuple(torch.FloatTensor)`, *optional*, returned when `output_hidden_states=True` is passed or when `config.output_hidden_states=True`) — Tuple of `torch.FloatTensor` (one for the output of the embeddings, if the model has an embedding layer, + one for the output of each layer) of shape `(batch_size, sequence_length, hidden_size)`. Hidden-states of the model at the output of each layer plus the optional initial embedding outputs.
- attentions (`tuple(torch.FloatTensor)`, *optional*, returned when `output_attentions=True` is passed or when `config.output_attentions=True`) — Tuple of `torch.FloatTensor` (one for each layer) of shape `(batch_size, num_heads, sequence_length, sequence_length)`. Attention weights after the attention softmax, used to compute the weighted average in the self-attention heads.
The OpenAIGPTModel forward method overrides the __call__ special method.
Although the recipe for the forward pass needs to be defined within this function, one should call the Module instance afterwards instead of this, since the former takes care of running the pre- and post-processing steps while the latter silently ignores them.
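As a minimal sketch, the bare model can be called like this to inspect the returned hidden states (the checkpoint is the one referenced throughout this page):

import torch
from transformers import AutoTokenizer, OpenAIGPTModel

tokenizer = AutoTokenizer.from_pretrained("openai-community/openai-gpt")
model = OpenAIGPTModel.from_pretrained("openai-community/openai-gpt")

inputs = tokenizer("Hello, my dog is cute", return_tensors="pt")
with torch.no_grad():
    outputs = model(**inputs, output_hidden_states=True)

print(outputs.last_hidden_state.shape)  # (batch_size, sequence_length, hidden_size)
print(len(outputs.hidden_states))       # embedding output + one entry per layer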
OpenAIGPTLMHeadModel
class transformers.OpenAIGPTLMHeadModel
<source>(config)
Parameters
- config (OpenAIGPTConfig) — Model configuration class with all the parameters of the model. Initializing with a config file does not load the weights associated with the model, only the configuration. Check out the from_pretrained() method to load the model weights.
OpenAI GPT Model transformer with a language modeling head on top (linear layer with weights tied to the input embeddings).
This model inherits from PreTrainedModel. Check the superclass documentation for the generic methods the library implements for all its models (such as downloading or saving, resizing the input embeddings, pruning heads, etc.).
This model is also a PyTorch torch.nn.Module subclass. Use it as a regular PyTorch Module and refer to the PyTorch documentation for all matters related to general usage and behavior.
forward
<source>( input_ids: typing.Optional[torch.LongTensor] = None, attention_mask: typing.Optional[torch.FloatTensor] = None, token_type_ids: typing.Optional[torch.LongTensor] = None, position_ids: typing.Optional[torch.LongTensor] = None, inputs_embeds: typing.Optional[torch.FloatTensor] = None, labels: typing.Optional[torch.LongTensor] = None, output_attentions: typing.Optional[bool] = None, output_hidden_states: typing.Optional[bool] = None, return_dict: typing.Optional[bool] = None, logits_to_keep: typing.Union[int, torch.Tensor] = 0, **kwargs ) → transformers.modeling_outputs.CausalLMOutput or tuple(torch.FloatTensor)
Parameters
- input_ids (`torch.LongTensor` of shape `(batch_size, sequence_length)`, *optional*) — Indices of input sequence tokens in the vocabulary. Padding will be ignored by default. Indices can be obtained using AutoTokenizer. See PreTrainedTokenizer.encode() and PreTrainedTokenizer.__call__() for details.
- attention_mask (`torch.FloatTensor` of shape `(batch_size, sequence_length)`, *optional*) — Mask to avoid performing attention on padding token indices. Mask values selected in `[0, 1]`:
  - 1 for tokens that are not masked,
  - 0 for tokens that are masked.
- token_type_ids (`torch.LongTensor` of shape `(batch_size, sequence_length)`, *optional*) — Segment token indices to indicate first and second portions of the inputs. Indices are selected in `[0, 1]`:
  - 0 corresponds to a sentence A token,
  - 1 corresponds to a sentence B token.
- position_ids (`torch.LongTensor` of shape `(batch_size, sequence_length)`, *optional*) — Indices of positions of each input sequence token in the position embeddings. Selected in the range `[0, config.n_positions - 1]`.
- inputs_embeds (`torch.FloatTensor` of shape `(batch_size, sequence_length, hidden_size)`, *optional*) — Optionally, instead of passing `input_ids` you can choose to directly pass an embedded representation. This is useful if you want more control over how to convert `input_ids` indices into associated vectors than the model’s internal embedding lookup matrix.
- labels (`torch.LongTensor` of shape `(batch_size, sequence_length)`, *optional*) — Labels for language modeling. Note that the labels are shifted inside the model, i.e. you can set `labels = input_ids`. Indices are selected in `[-100, 0, ..., config.vocab_size]`. All labels set to `-100` are ignored (masked), the loss is only computed for labels in `[0, ..., config.vocab_size]`.
- output_attentions (`bool`, *optional*) — Whether or not to return the attentions tensors of all attention layers. See `attentions` under returned tensors for more detail.
- output_hidden_states (`bool`, *optional*) — Whether or not to return the hidden states of all layers. See `hidden_states` under returned tensors for more detail.
- return_dict (`bool`, *optional*) — Whether or not to return a ModelOutput instead of a plain tuple.
- logits_to_keep (`Union[int, torch.Tensor]`, defaults to 0) — If an `int`, compute logits for the last `logits_to_keep` tokens. If `0`, calculate logits for all `input_ids` (special case). Only last token logits are needed for generation, and calculating them only for that token can save memory, which becomes pretty significant for long sequences or large vocabulary size. If a `torch.Tensor`, must be 1D corresponding to the indices to keep in the sequence length dimension. This is useful when using packed tensor format (single dimension for batch and sequence length).
Returns
transformers.modeling_outputs.CausalLMOutput or tuple(torch.FloatTensor)
A transformers.modeling_outputs.CausalLMOutput or a tuple of torch.FloatTensor (if `return_dict=False` is passed or when `config.return_dict=False`) comprising various elements depending on the configuration (OpenAIGPTConfig) and inputs.
- loss (`torch.FloatTensor` of shape `(1,)`, *optional*, returned when `labels` is provided) — Language modeling loss (for next-token prediction).
- logits (`torch.FloatTensor` of shape `(batch_size, sequence_length, config.vocab_size)`) — Prediction scores of the language modeling head (scores for each vocabulary token before SoftMax).
- hidden_states (`tuple(torch.FloatTensor)`, *optional*, returned when `output_hidden_states=True` is passed or when `config.output_hidden_states=True`) — Tuple of `torch.FloatTensor` (one for the output of the embeddings, if the model has an embedding layer, + one for the output of each layer) of shape `(batch_size, sequence_length, hidden_size)`. Hidden-states of the model at the output of each layer plus the optional initial embedding outputs.
- attentions (`tuple(torch.FloatTensor)`, *optional*, returned when `output_attentions=True` is passed or when `config.output_attentions=True`) — Tuple of `torch.FloatTensor` (one for each layer) of shape `(batch_size, num_heads, sequence_length, sequence_length)`. Attention weights after the attention softmax, used to compute the weighted average in the self-attention heads.
The OpenAIGPTLMHeadModel forward method overrides the __call__ special method.
Although the recipe for the forward pass needs to be defined within this function, one should call the Module instance afterwards instead of this, since the former takes care of running the pre- and post-processing steps while the latter silently ignores them.
Example:
>>> import torch
>>> from transformers import AutoTokenizer, OpenAIGPTLMHeadModel

>>> tokenizer = AutoTokenizer.from_pretrained("openai-community/openai-gpt")
>>> model = OpenAIGPTLMHeadModel.from_pretrained("openai-community/openai-gpt")

>>> inputs = tokenizer("Hello, my dog is cute", return_tensors="pt")
>>> outputs = model(**inputs, labels=inputs["input_ids"])
>>> loss = outputs.loss
>>> logits = outputs.logits
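Because this is the class behind causal generation, the same checkpoint can also produce text with generate(). The sketch below additionally shows logits_to_keep restricting the forward pass to the last position; the sampling settings are illustrative.

import torch
from transformers import AutoTokenizer, OpenAIGPTLMHeadModel

tokenizer = AutoTokenizer.from_pretrained("openai-community/openai-gpt")
model = OpenAIGPTLMHeadModel.from_pretrained("openai-community/openai-gpt")

inputs = tokenizer("The future of AI is", return_tensors="pt")
generated = model.generate(**inputs, max_length=40, do_sample=True, top_k=50)
print(tokenizer.decode(generated[0], skip_special_tokens=True))

# Keep only the last position's logits, which is all generation needs per step
last_logits = model(**inputs, logits_to_keep=1).logits
print(last_logits.shape)  # (1, 1, vocab_size)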
OpenAIGPTDoubleHeadsModel
class transformers.OpenAIGPTDoubleHeadsModel
<source>(config)
Parameters
- config (OpenAIGPTConfig) — Model configuration class with all the parameters of the model. Initializing with a config file does not load the weights associated with the model, only the configuration. Check out the from_pretrained() method to load the model weights.
OpenAI GPT Model transformer with a language modeling and a multiple-choice classification head on top, e.g. for RocStories/SWAG tasks. The two heads are two linear layers. The language modeling head has its weights tied to the input embeddings; the classification head takes as input the hidden state at a specified classification token index in the input sequence.
This model inherits from PreTrainedModel. Check the superclass documentation for the generic methods the library implements for all its models (such as downloading or saving, resizing the input embeddings, pruning heads, etc.).
This model is also a PyTorch torch.nn.Module subclass. Use it as a regular PyTorch Module and refer to the PyTorch documentation for all matters related to general usage and behavior.
forward
<source>( input_ids: typing.Optional[torch.LongTensor] = None, attention_mask: typing.Optional[torch.FloatTensor] = None, token_type_ids: typing.Optional[torch.LongTensor] = None, position_ids: typing.Optional[torch.LongTensor] = None, inputs_embeds: typing.Optional[torch.FloatTensor] = None, mc_token_ids: typing.Optional[torch.LongTensor] = None, labels: typing.Optional[torch.LongTensor] = None, mc_labels: typing.Optional[torch.LongTensor] = None, output_attentions: typing.Optional[bool] = None, output_hidden_states: typing.Optional[bool] = None, return_dict: typing.Optional[bool] = None ) → transformers.models.openai.modeling_openai.OpenAIGPTDoubleHeadsModelOutput or tuple(torch.FloatTensor)
Parameters
- input_ids (`torch.LongTensor` of shape `(batch_size, sequence_length)`, *optional*) — Indices of input sequence tokens in the vocabulary. Padding will be ignored by default. Indices can be obtained using AutoTokenizer. See PreTrainedTokenizer.encode() and PreTrainedTokenizer.__call__() for details.
- attention_mask (`torch.FloatTensor` of shape `(batch_size, sequence_length)`, *optional*) — Mask to avoid performing attention on padding token indices. Mask values selected in `[0, 1]`:
  - 1 for tokens that are not masked,
  - 0 for tokens that are masked.
- token_type_ids (`torch.LongTensor` of shape `(batch_size, sequence_length)`, *optional*) — Segment token indices to indicate first and second portions of the inputs. Indices are selected in `[0, 1]`:
  - 0 corresponds to a sentence A token,
  - 1 corresponds to a sentence B token.
- position_ids (`torch.LongTensor` of shape `(batch_size, sequence_length)`, *optional*) — Indices of positions of each input sequence token in the position embeddings. Selected in the range `[0, config.n_positions - 1]`.
- inputs_embeds (`torch.FloatTensor` of shape `(batch_size, sequence_length, hidden_size)`, *optional*) — Optionally, instead of passing `input_ids` you can choose to directly pass an embedded representation. This is useful if you want more control over how to convert `input_ids` indices into associated vectors than the model’s internal embedding lookup matrix.
- mc_token_ids (`torch.LongTensor` of shape `(batch_size, num_choices)`, *optional*, defaults to the index of the last token of the input) — Index of the classification token in each input sequence. Selected in the range `[0, input_ids.size(-1) - 1]`.
- labels (`torch.LongTensor` of shape `(batch_size, sequence_length)`, *optional*) — Labels for language modeling. Note that the labels are shifted inside the model, i.e. you can set `labels = input_ids`. Indices are selected in `[-100, 0, ..., config.vocab_size]`. All labels set to `-100` are ignored (masked), the loss is only computed for labels in `[0, ..., config.vocab_size]`.
- mc_labels (`torch.LongTensor` of shape `(batch_size)`, *optional*) — Labels for computing the multiple choice classification loss. Indices should be in `[0, ..., num_choices - 1]` where `num_choices` is the size of the second dimension of the input tensors (see `input_ids` above).
- output_attentions (`bool`, *optional*) — Whether or not to return the attentions tensors of all attention layers. See `attentions` under returned tensors for more detail.
- output_hidden_states (`bool`, *optional*) — Whether or not to return the hidden states of all layers. See `hidden_states` under returned tensors for more detail.
- return_dict (`bool`, *optional*) — Whether or not to return a ModelOutput instead of a plain tuple.
Returns
transformers.models.openai.modeling_openai.OpenAIGPTDoubleHeadsModelOutput or tuple(torch.FloatTensor)
A transformers.models.openai.modeling_openai.OpenAIGPTDoubleHeadsModelOutput or a tuple of torch.FloatTensor (if `return_dict=False` is passed or when `config.return_dict=False`) comprising various elements depending on the configuration (OpenAIGPTConfig) and inputs.
- loss (`torch.FloatTensor` of shape `(1,)`, *optional*, returned when `labels` is provided) — Language modeling loss.
- mc_loss (`torch.FloatTensor` of shape `(1,)`, *optional*, returned when `mc_labels` is provided) — Multiple choice classification loss.
- logits (`torch.FloatTensor` of shape `(batch_size, num_choices, sequence_length, config.vocab_size)`) — Prediction scores of the language modeling head (scores for each vocabulary token before SoftMax).
- mc_logits (`torch.FloatTensor` of shape `(batch_size, num_choices)`) — Prediction scores of the multiple choice classification head (scores for each choice before SoftMax).
- hidden_states (`tuple[torch.FloatTensor]`, *optional*, returned when `output_hidden_states=True` is passed or when `config.output_hidden_states=True`) — Tuple of `torch.FloatTensor` (one for the output of the embeddings, if the model has an embedding layer, + one for the output of each layer) of shape `(batch_size, sequence_length, hidden_size)`. Hidden-states of the model at the output of each layer plus the optional initial embedding outputs.
- attentions (`tuple[torch.FloatTensor]`, *optional*, returned when `output_attentions=True` is passed or when `config.output_attentions=True`) — Tuple of `torch.FloatTensor` (one for each layer) of shape `(batch_size, num_heads, sequence_length, sequence_length)`. Attention weights after the attention softmax, used to compute the weighted average in the self-attention heads.
The OpenAIGPTDoubleHeadsModel forward method overrides the __call__ special method.
Although the recipe for the forward pass needs to be defined within this function, one should call the Module instance afterwards instead of this, since the former takes care of running the pre- and post-processing steps while the latter silently ignores them.
Examples:
>>> from transformers import AutoTokenizer, OpenAIGPTDoubleHeadsModel
>>> import torch

>>> tokenizer = AutoTokenizer.from_pretrained("openai-community/openai-gpt")
>>> model = OpenAIGPTDoubleHeadsModel.from_pretrained("openai-community/openai-gpt")
>>> tokenizer.add_special_tokens(
...     {"cls_token": "[CLS]"}
... )  # Add a [CLS] to the vocabulary (we should train it also!)
>>> model.resize_token_embeddings(len(tokenizer))

>>> choices = ["Hello, my dog is cute [CLS]", "Hello, my cat is cute [CLS]"]
>>> input_ids = torch.tensor([tokenizer.encode(s) for s in choices]).unsqueeze(0)  # Batch size 1, 2 choices
>>> mc_token_ids = torch.tensor([input_ids.size(-1) - 1, input_ids.size(-1) - 1]).unsqueeze(0)  # Batch size 1

>>> outputs = model(input_ids, mc_token_ids=mc_token_ids)
>>> lm_logits = outputs.logits
>>> mc_logits = outputs.mc_logits
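Continuing the snippet above (it reuses model, input_ids and mc_token_ids), passing mc_labels and labels returns the two losses; the label values here are illustrative only.

# Choice 0 is treated as the correct one purely for illustration
mc_labels = torch.tensor([0])          # shape (batch_size,)
outputs = model(
    input_ids,
    mc_token_ids=mc_token_ids,
    labels=input_ids,                  # language modeling labels (shifted internally)
    mc_labels=mc_labels,               # multiple choice classification labels
)
print(outputs.loss, outputs.mc_loss)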
OpenAIGPTForSequenceClassification
class transformers.OpenAIGPTForSequenceClassification
<source>(config)
Parameters
- config (OpenAIGPTConfig) — Model configuration class with all the parameters of the model. Initializing with a config file does not load the weights associated with the model, only the configuration. Check out the from_pretrained() method to load the model weights.
The original OpenAI GPT Model transformer with a sequence classification head on top (linear layer). OpenAIGPTForSequenceClassification uses the last token in order to do the classification, as other causal models (e.g. GPT-2) do. Since it does classification on the last token, it needs to know the position of the last token. If a pad_token_id is defined in the configuration, it finds the last token that is not a padding token in each row. If no pad_token_id is defined, it simply takes the last value in each row of the batch. Since it cannot guess the padding tokens when inputs_embeds are passed instead of input_ids, it does the same (takes the last value in each row of the batch).
This model inherits from PreTrainedModel. Check the superclass documentation for the generic methods the library implements for all its models (such as downloading or saving, resizing the input embeddings, pruning heads, etc.).
This model is also a PyTorch torch.nn.Module subclass. Use it as a regular PyTorch Module and refer to the PyTorch documentation for all matters related to general usage and behavior.
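Because the classifier looks for the last non-padding token, batched inputs need a pad_token_id on the model config. Reusing the unk token, as sketched below, is a common workaround and an assumption rather than something this page prescribes.

from transformers import AutoTokenizer, OpenAIGPTForSequenceClassification

tokenizer = AutoTokenizer.from_pretrained("openai-community/openai-gpt")
model = OpenAIGPTForSequenceClassification.from_pretrained("openai-community/openai-gpt")

tokenizer.pad_token = tokenizer.unk_token            # assumption: reuse <unk> as the padding token
model.config.pad_token_id = tokenizer.pad_token_id   # lets the model locate the last non-padding token

batch = tokenizer(["a short text", "a noticeably longer piece of text"], padding=True, return_tensors="pt")
print(model(**batch).logits.shape)  # (2, config.num_labels)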
forward
<source>( input_ids: typing.Optional[torch.LongTensor] = None, attention_mask: typing.Optional[torch.FloatTensor] = None, token_type_ids: typing.Optional[torch.LongTensor] = None, position_ids: typing.Optional[torch.LongTensor] = None, inputs_embeds: typing.Optional[torch.FloatTensor] = None, labels: typing.Optional[torch.LongTensor] = None, output_attentions: typing.Optional[bool] = None, output_hidden_states: typing.Optional[bool] = None, return_dict: typing.Optional[bool] = None ) → transformers.modeling_outputs.SequenceClassifierOutput or tuple(torch.FloatTensor)
Parameters
- input_ids (`torch.LongTensor` of shape `(batch_size, sequence_length)`, *optional*) — Indices of input sequence tokens in the vocabulary. Padding will be ignored by default. Indices can be obtained using AutoTokenizer. See PreTrainedTokenizer.encode() and PreTrainedTokenizer.__call__() for details.
- attention_mask (`torch.FloatTensor` of shape `(batch_size, sequence_length)`, *optional*) — Mask to avoid performing attention on padding token indices. Mask values selected in `[0, 1]`:
  - 1 for tokens that are not masked,
  - 0 for tokens that are masked.
- token_type_ids (`torch.LongTensor` of shape `(batch_size, sequence_length)`, *optional*) — Segment token indices to indicate first and second portions of the inputs. Indices are selected in `[0, 1]`:
  - 0 corresponds to a sentence A token,
  - 1 corresponds to a sentence B token.
- position_ids (`torch.LongTensor` of shape `(batch_size, sequence_length)`, *optional*) — Indices of positions of each input sequence token in the position embeddings. Selected in the range `[0, config.n_positions - 1]`.
- inputs_embeds (`torch.FloatTensor` of shape `(batch_size, sequence_length, hidden_size)`, *optional*) — Optionally, instead of passing `input_ids` you can choose to directly pass an embedded representation. This is useful if you want more control over how to convert `input_ids` indices into associated vectors than the model’s internal embedding lookup matrix.
- labels (`torch.LongTensor` of shape `(batch_size,)`, *optional*) — Labels for computing the sequence classification/regression loss. Indices should be in `[0, ..., config.num_labels - 1]`. If `config.num_labels == 1` a regression loss is computed (Mean-Square loss), if `config.num_labels > 1` a classification loss is computed (Cross-Entropy).
- output_attentions (`bool`, *optional*) — Whether or not to return the attentions tensors of all attention layers. See `attentions` under returned tensors for more detail.
- output_hidden_states (`bool`, *optional*) — Whether or not to return the hidden states of all layers. See `hidden_states` under returned tensors for more detail.
- return_dict (`bool`, *optional*) — Whether or not to return a ModelOutput instead of a plain tuple.
Returns
transformers.modeling_outputs.SequenceClassifierOutput or tuple(torch.FloatTensor)
A transformers.modeling_outputs.SequenceClassifierOutput or a tuple of torch.FloatTensor (if `return_dict=False` is passed or when `config.return_dict=False`) comprising various elements depending on the configuration (OpenAIGPTConfig) and inputs.
- loss (`torch.FloatTensor` of shape `(1,)`, *optional*, returned when `labels` is provided) — Classification (or regression if config.num_labels==1) loss.
- logits (`torch.FloatTensor` of shape `(batch_size, config.num_labels)`) — Classification (or regression if config.num_labels==1) scores (before SoftMax).
- hidden_states (`tuple(torch.FloatTensor)`, *optional*, returned when `output_hidden_states=True` is passed or when `config.output_hidden_states=True`) — Tuple of `torch.FloatTensor` (one for the output of the embeddings, if the model has an embedding layer, + one for the output of each layer) of shape `(batch_size, sequence_length, hidden_size)`. Hidden-states of the model at the output of each layer plus the optional initial embedding outputs.
- attentions (`tuple(torch.FloatTensor)`, *optional*, returned when `output_attentions=True` is passed or when `config.output_attentions=True`) — Tuple of `torch.FloatTensor` (one for each layer) of shape `(batch_size, num_heads, sequence_length, sequence_length)`. Attention weights after the attention softmax, used to compute the weighted average in the self-attention heads.
The OpenAIGPTForSequenceClassification forward method overrides the __call__ special method.
Although the recipe for the forward pass needs to be defined within this function, one should call the Module instance afterwards instead of this, since the former takes care of running the pre- and post-processing steps while the latter silently ignores them.
Example of single-label classification:
>>> import torch
>>> from transformers import AutoTokenizer, OpenAIGPTForSequenceClassification

>>> tokenizer = AutoTokenizer.from_pretrained("openai-community/openai-gpt")
>>> model = OpenAIGPTForSequenceClassification.from_pretrained("openai-community/openai-gpt")

>>> inputs = tokenizer("Hello, my dog is cute", return_tensors="pt")

>>> with torch.no_grad():
...     logits = model(**inputs).logits

>>> predicted_class_id = logits.argmax().item()
>>> model.config.id2label[predicted_class_id]
...

>>> # To train a model on `num_labels` classes, you can pass `num_labels=num_labels` to `.from_pretrained(...)`
>>> num_labels = len(model.config.id2label)
>>> model = OpenAIGPTForSequenceClassification.from_pretrained("openai-community/openai-gpt", num_labels=num_labels)

>>> labels = torch.tensor([1])
>>> loss = model(**inputs, labels=labels).loss
>>> round(loss.item(), 2)
...
Example of multi-label classification:
>>> import torch
>>> from transformers import AutoTokenizer, OpenAIGPTForSequenceClassification

>>> tokenizer = AutoTokenizer.from_pretrained("openai-community/openai-gpt")
>>> model = OpenAIGPTForSequenceClassification.from_pretrained("openai-community/openai-gpt", problem_type="multi_label_classification")

>>> inputs = tokenizer("Hello, my dog is cute", return_tensors="pt")

>>> with torch.no_grad():
...     logits = model(**inputs).logits

>>> predicted_class_ids = torch.arange(0, logits.shape[-1])[torch.sigmoid(logits).squeeze(dim=0) > 0.5]

>>> # To train a model on `num_labels` classes, you can pass `num_labels=num_labels` to `.from_pretrained(...)`
>>> num_labels = len(model.config.id2label)
>>> model = OpenAIGPTForSequenceClassification.from_pretrained(
...     "openai-community/openai-gpt", num_labels=num_labels, problem_type="multi_label_classification"
... )

>>> labels = torch.sum(
...     torch.nn.functional.one_hot(predicted_class_ids[None, :].clone(), num_classes=num_labels), dim=1
... ).to(torch.float)
>>> loss = model(**inputs, labels=labels).loss
OpenAIGPTTokenizer
class transformers.OpenAIGPTTokenizer
<source>( vocab_file, merges_file, unk_token = '<unk>', **kwargs )
Construct a GPT Tokenizer. Based on Byte-Pair-Encoding with the following peculiarities:
- lowercases all inputs,
- uses the SpaCy tokenizer and ftfy for pre-BPE tokenization if they are installed, and falls back to BERT's BasicTokenizer if not.
This tokenizer inherits from PreTrainedTokenizer which contains most of the main methods. Users should refer to this superclass for more information regarding those methods.
convert_tokens_to_string
<source>(tokens)
Converts a sequence of tokens (string) into a single string.
OpenAIGPTTokenizerFast
class transformers.OpenAIGPTTokenizerFast
<source>( vocab_file = None, merges_file = None, tokenizer_file = None, unk_token = '<unk>', **kwargs )
Construct a “fast” GPT Tokenizer (backed by HuggingFace’s tokenizers library). Based on Byte-Pair-Encoding with the following peculiarities:
- lowercases all inputs,
- uses BERT’s BasicTokenizer for pre-BPE tokenization
This tokenizer inherits from PreTrainedTokenizerFast which contains most of the main methods. Users should refer to this superclass for more information regarding those methods.
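A short sketch of the lowercasing behavior both tokenizers share:

from transformers import OpenAIGPTTokenizerFast

tokenizer = OpenAIGPTTokenizerFast.from_pretrained("openai-community/openai-gpt")
ids = tokenizer("Hello WORLD")["input_ids"]
print(tokenizer.convert_ids_to_tokens(ids))  # pieces are lowercased before BPE is applied
print(tokenizer.decode(ids))                 # decoded text comes back lowercased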