This model was released on 2018-06-11 and added to Hugging Face Transformers on 2023-06-20.
GPT
GPT (Generative Pre-trained Transformer) (blog post) focuses on effectively learning text representations and transferring them to downstream tasks. The model pretrains a Transformer decoder to predict the next word and is then fine-tuned on labeled data.
GPT can generate high-quality text, and its pretrained representations transfer well to a variety of natural language understanding tasks such as textual entailment, question answering, semantic similarity, and document classification.
You can find all the original GPT checkpoints under the OpenAI community organization.
Click on the GPT models in the right sidebar for more examples of how to apply GPT to different language tasks.
The example below demonstrates how to generate text with Pipeline, AutoModel, and from the command line.
import torch
from transformers import pipeline

generator = pipeline(task="text-generation", model="openai-community/openai-gpt", dtype=torch.float16, device=0)
output = generator("The future of AI is", max_length=50, do_sample=True)
print(output[0]["generated_text"])
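The same generation can be written against the model classes directly. This is a minimal sketch using AutoModelForCausalLM; the prompt and sampling settings are illustrative, not prescriptive.

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

device = "cuda" if torch.cuda.is_available() else "cpu"
tokenizer = AutoTokenizer.from_pretrained("openai-community/openai-gpt")
model = AutoModelForCausalLM.from_pretrained("openai-community/openai-gpt").to(device)

inputs = tokenizer("The future of AI is", return_tensors="pt").to(device)
output_ids = model.generate(**inputs, max_length=50, do_sample=True)
print(tokenizer.decode(output_ids[0], skip_special_tokens=True))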
Notes
- Inputs should be padded on the right because GPT uses absolute position embeddings.
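For example, a right-padded batch could be prepared as sketched below. GPT ships without a dedicated padding token, so reusing the unk token here is an assumption rather than something this page prescribes.

from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("openai-community/openai-gpt")
tokenizer.pad_token = tokenizer.unk_token  # assumption: GPT defines no pad token, reuse <unk>
tokenizer.padding_side = "right"           # absolute position embeddings, so pad on the right

batch = tokenizer(["a short prompt", "a noticeably longer prompt for padding"], padding=True, return_tensors="pt")
print(batch["input_ids"].shape, batch["attention_mask"])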
OpenAIGPTConfig
class transformers.OpenAIGPTConfig
<source>( vocab_size = 40478, n_positions = 512, n_embd = 768, n_layer = 12, n_head = 12, afn = 'gelu', resid_pdrop = 0.1, embd_pdrop = 0.1, attn_pdrop = 0.1, layer_norm_epsilon = 1e-05, initializer_range = 0.02, summary_type = 'cls_index', summary_use_proj = True, summary_activation = None, summary_proj_to_labels = True, summary_first_dropout = 0.1, **kwargs )
Parameters
- vocab_size (`int`, *optional*, defaults to 40478) — Vocabulary size of the GPT model. Defines the number of different tokens that can be represented by the `input_ids` passed when calling OpenAIGPTModel or TFOpenAIGPTModel.
- n_positions (`int`, *optional*, defaults to 512) — The maximum sequence length that this model might ever be used with. Typically set this to something large just in case (e.g., 512 or 1024 or 2048).
- n_embd (`int`, *optional*, defaults to 768) — Dimensionality of the embeddings and hidden states.
- n_layer (`int`, *optional*, defaults to 12) — Number of hidden layers in the Transformer decoder.
- n_head (`int`, *optional*, defaults to 12) — Number of attention heads for each attention layer in the Transformer decoder.
- afn (`str` or `Callable`, *optional*, defaults to `"gelu"`) — The non-linear activation function (function or string) in the model. If string, `"gelu"`, `"relu"`, `"silu"` and `"gelu_new"` are supported.
- resid_pdrop (`float`, *optional*, defaults to 0.1) — The dropout probability for all fully connected layers in the embeddings, encoder, and pooler.
- embd_pdrop (`float`, *optional*, defaults to 0.1) — The dropout ratio for the embeddings.
- attn_pdrop (`float`, *optional*, defaults to 0.1) — The dropout ratio for the attention.
- layer_norm_epsilon (`float`, *optional*, defaults to 1e-05) — The epsilon to use in the layer normalization layers.
- initializer_range (`float`, *optional*, defaults to 0.02) — The standard deviation of the truncated_normal_initializer for initializing all weight matrices.
- summary_type (`str`, *optional*, defaults to `"cls_index"`) — Argument used when doing sequence summary, used in OpenAIGPTDoubleHeadsModel. Has to be one of the following options:
  - `"last"`: Take the last token hidden state (like XLNet).
  - `"first"`: Take the first token hidden state (like BERT).
  - `"mean"`: Take the mean of all tokens hidden states.
  - `"cls_index"`: Supply a Tensor of classification token position (like GPT/GPT-2).
  - `"attn"`: Not implemented now, use multi-head attention.
- summary_use_proj (`bool`, *optional*, defaults to `True`) — Argument used when doing sequence summary, used in OpenAIGPTDoubleHeadsModel. Whether or not to add a projection after the vector extraction.
- summary_activation (`str`, *optional*) — Argument used when doing sequence summary, used in OpenAIGPTDoubleHeadsModel. Pass `"tanh"` for a tanh activation to the output, any other value will result in no activation.
- summary_proj_to_labels (`bool`, *optional*, defaults to `True`) — Argument used when doing sequence summary, used in OpenAIGPTDoubleHeadsModel. Whether the projection outputs should have `config.num_labels` or `config.hidden_size` classes.
- summary_first_dropout (`float`, *optional*, defaults to 0.1) — Argument used when doing sequence summary, used in OpenAIGPTDoubleHeadsModel. The dropout ratio to be used after the projection and activation.
This is the configuration class to store the configuration of an OpenAIGPTModel or a TFOpenAIGPTModel. It is used to instantiate a GPT model according to the specified arguments, defining the model architecture. Instantiating a configuration with the defaults will yield a similar configuration to that of the GPT openai-community/openai-gpt architecture from OpenAI.
Configuration objects inherit from PreTrainedConfig and can be used to control the model outputs. Read the documentation from PreTrainedConfig for more information.
Examples:
>>> from transformers import OpenAIGPTConfig, OpenAIGPTModel

>>> # Initializing a GPT configuration
>>> configuration = OpenAIGPTConfig()

>>> # Initializing a model (with random weights) from the configuration
>>> model = OpenAIGPTModel(configuration)

>>> # Accessing the model configuration
>>> configuration = model.config
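The defaults above reproduce the original openai-gpt architecture; overriding them yields a differently sized, randomly initialized model. The smaller values below are arbitrary and purely illustrative.

from transformers import OpenAIGPTConfig, OpenAIGPTModel

# Arbitrary smaller configuration, for illustration only
small_config = OpenAIGPTConfig(n_layer=6, n_head=8, n_embd=512, n_positions=256)
small_model = OpenAIGPTModel(small_config)
print(sum(p.numel() for p in small_model.parameters()))  # parameter count of the randomly initialized model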
OpenAIGPTModel
class transformers.OpenAIGPTModel
<source>(config)
Parameters
- config (OpenAIGPTConfig) — Model configuration class with all the parameters of the model. Initializing with a config file does not load the weights associated with the model, only the configuration. Check out the from_pretrained() method to load the model weights.
The bare OpenAI GPT Model outputting raw hidden-states without any specific head on top.
This model inherits from PreTrainedModel. Check the superclass documentation for the generic methods the library implements for all its models (such as downloading or saving, resizing the input embeddings, pruning heads, etc.).
This model is also a PyTorch torch.nn.Module subclass. Use it as a regular PyTorch Module and refer to the PyTorch documentation for all matters related to general usage and behavior.
forward
<source>( input_ids: typing.Optional[torch.LongTensor] = None, attention_mask: typing.Optional[torch.FloatTensor] = None, token_type_ids: typing.Optional[torch.LongTensor] = None, position_ids: typing.Optional[torch.LongTensor] = None, inputs_embeds: typing.Optional[torch.FloatTensor] = None, output_attentions: typing.Optional[bool] = None, output_hidden_states: typing.Optional[bool] = None, return_dict: typing.Optional[bool] = None ) → transformers.modeling_outputs.BaseModelOutput or tuple(torch.FloatTensor)
Parameters
- input_ids (`torch.LongTensor` of shape `(batch_size, sequence_length)`, *optional*) — Indices of input sequence tokens in the vocabulary. Padding will be ignored by default. Indices can be obtained using AutoTokenizer. See PreTrainedTokenizer.encode() and PreTrainedTokenizer.__call__() for details.
- attention_mask (`torch.FloatTensor` of shape `(batch_size, sequence_length)`, *optional*) — Mask to avoid performing attention on padding token indices. Mask values selected in `[0, 1]`:
  - 1 for tokens that are not masked,
  - 0 for tokens that are masked.
- token_type_ids (`torch.LongTensor` of shape `(batch_size, sequence_length)`, *optional*) — Segment token indices to indicate first and second portions of the inputs. Indices are selected in `[0, 1]`:
  - 0 corresponds to a sentence A token,
  - 1 corresponds to a sentence B token.
- position_ids (`torch.LongTensor` of shape `(batch_size, sequence_length)`, *optional*) — Indices of positions of each input sequence token in the position embeddings. Selected in the range `[0, config.n_positions - 1]`.
- inputs_embeds (`torch.FloatTensor` of shape `(batch_size, sequence_length, hidden_size)`, *optional*) — Optionally, instead of passing `input_ids` you can choose to directly pass an embedded representation. This is useful if you want more control over how to convert `input_ids` indices into associated vectors than the model’s internal embedding lookup matrix.
- output_attentions (`bool`, *optional*) — Whether or not to return the attentions tensors of all attention layers. See `attentions` under returned tensors for more detail.
- output_hidden_states (`bool`, *optional*) — Whether or not to return the hidden states of all layers. See `hidden_states` under returned tensors for more detail.
- return_dict (`bool`, *optional*) — Whether or not to return a ModelOutput instead of a plain tuple.
Returns
transformers.modeling_outputs.BaseModelOutput or tuple(torch.FloatTensor)
A transformers.modeling_outputs.BaseModelOutput or a tuple of torch.FloatTensor (if `return_dict=False` is passed or when `config.return_dict=False`) comprising various elements depending on the configuration (OpenAIGPTConfig) and inputs.
- last_hidden_state (`torch.FloatTensor` of shape `(batch_size, sequence_length, hidden_size)`) — Sequence of hidden-states at the output of the last layer of the model.
- hidden_states (`tuple(torch.FloatTensor)`, *optional*, returned when `output_hidden_states=True` is passed or when `config.output_hidden_states=True`) — Tuple of `torch.FloatTensor` (one for the output of the embeddings, if the model has an embedding layer, + one for the output of each layer) of shape `(batch_size, sequence_length, hidden_size)`. Hidden-states of the model at the output of each layer plus the optional initial embedding outputs.
- attentions (`tuple(torch.FloatTensor)`, *optional*, returned when `output_attentions=True` is passed or when `config.output_attentions=True`) — Tuple of `torch.FloatTensor` (one for each layer) of shape `(batch_size, num_heads, sequence_length, sequence_length)`. Attention weights after the attention softmax, used to compute the weighted average in the self-attention heads.
The OpenAIGPTModel forward method overrides the __call__ special method.
Although the recipe for the forward pass needs to be defined within this function, one should call the Module instance afterwards instead of this, since the former takes care of running the pre- and post-processing steps while the latter silently ignores them.
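As a minimal sketch, the bare model can be called like this to inspect the returned hidden states (the checkpoint is the one referenced throughout this page):

import torch
from transformers import AutoTokenizer, OpenAIGPTModel

tokenizer = AutoTokenizer.from_pretrained("openai-community/openai-gpt")
model = OpenAIGPTModel.from_pretrained("openai-community/openai-gpt")

inputs = tokenizer("Hello, my dog is cute", return_tensors="pt")
with torch.no_grad():
    outputs = model(**inputs, output_hidden_states=True)

print(outputs.last_hidden_state.shape)  # (batch_size, sequence_length, hidden_size)
print(len(outputs.hidden_states))       # embedding output + one entry per layer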
OpenAIGPTLMHeadModel
class transformers.OpenAIGPTLMHeadModel
<source>(config)
Parameters
- config (OpenAIGPTConfig) — Model configuration class with all the parameters of the model. Initializing with a config file does not load the weights associated with the model, only the configuration. Check out the from_pretrained() method to load the model weights.
OpenAI GPT Model transformer with a language modeling head on top (linear layer with weights tied to the input embeddings).
This model inherits from PreTrainedModel. Check the superclass documentation for the generic methods the library implements for all its models (such as downloading or saving, resizing the input embeddings, pruning heads, etc.).
This model is also a PyTorch torch.nn.Module subclass. Use it as a regular PyTorch Module and refer to the PyTorch documentation for all matters related to general usage and behavior.
forward
<source>( input_ids: typing.Optional[torch.LongTensor] = None, attention_mask: typing.Optional[torch.FloatTensor] = None, token_type_ids: typing.Optional[torch.LongTensor] = None, position_ids: typing.Optional[torch.LongTensor] = None, inputs_embeds: typing.Optional[torch.FloatTensor] = None, labels: typing.Optional[torch.LongTensor] = None, output_attentions: typing.Optional[bool] = None, output_hidden_states: typing.Optional[bool] = None, return_dict: typing.Optional[bool] = None, logits_to_keep: typing.Union[int, torch.Tensor] = 0, **kwargs ) → transformers.modeling_outputs.CausalLMOutput or tuple(torch.FloatTensor)
Parameters
- input_ids (`torch.LongTensor` of shape `(batch_size, sequence_length)`, *optional*) — Indices of input sequence tokens in the vocabulary. Padding will be ignored by default. Indices can be obtained using AutoTokenizer. See PreTrainedTokenizer.encode() and PreTrainedTokenizer.__call__() for details.
- attention_mask (`torch.FloatTensor` of shape `(batch_size, sequence_length)`, *optional*) — Mask to avoid performing attention on padding token indices. Mask values selected in `[0, 1]`:
  - 1 for tokens that are not masked,
  - 0 for tokens that are masked.
- token_type_ids (`torch.LongTensor` of shape `(batch_size, sequence_length)`, *optional*) — Segment token indices to indicate first and second portions of the inputs. Indices are selected in `[0, 1]`:
  - 0 corresponds to a sentence A token,
  - 1 corresponds to a sentence B token.
- position_ids (`torch.LongTensor` of shape `(batch_size, sequence_length)`, *optional*) — Indices of positions of each input sequence token in the position embeddings. Selected in the range `[0, config.n_positions - 1]`.
- inputs_embeds (`torch.FloatTensor` of shape `(batch_size, sequence_length, hidden_size)`, *optional*) — Optionally, instead of passing `input_ids` you can choose to directly pass an embedded representation. This is useful if you want more control over how to convert `input_ids` indices into associated vectors than the model’s internal embedding lookup matrix.
- labels (`torch.LongTensor` of shape `(batch_size, sequence_length)`, *optional*) — Labels for language modeling. Note that the labels are shifted inside the model, i.e. you can set `labels = input_ids`. Indices are selected in `[-100, 0, ..., config.vocab_size]`. All labels set to `-100` are ignored (masked), the loss is only computed for labels in `[0, ..., config.vocab_size]`.
- output_attentions (`bool`, *optional*) — Whether or not to return the attentions tensors of all attention layers. See `attentions` under returned tensors for more detail.
- output_hidden_states (`bool`, *optional*) — Whether or not to return the hidden states of all layers. See `hidden_states` under returned tensors for more detail.
- return_dict (`bool`, *optional*) — Whether or not to return a ModelOutput instead of a plain tuple.
- logits_to_keep (`Union[int, torch.Tensor]`, defaults to 0) — If an `int`, compute logits for the last `logits_to_keep` tokens. If `0`, calculate logits for all `input_ids` (special case). Only last token logits are needed for generation, and calculating them only for that token can save memory, which becomes pretty significant for long sequences or large vocabulary size. If a `torch.Tensor`, must be 1D corresponding to the indices to keep in the sequence length dimension. This is useful when using packed tensor format (single dimension for batch and sequence length).
Returns
transformers.modeling_outputs.CausalLMOutput or tuple(torch.FloatTensor)
A transformers.modeling_outputs.CausalLMOutput or a tuple of torch.FloatTensor (if `return_dict=False` is passed or when `config.return_dict=False`) comprising various elements depending on the configuration (OpenAIGPTConfig) and inputs.
- loss (`torch.FloatTensor` of shape `(1,)`, *optional*, returned when `labels` is provided) — Language modeling loss (for next-token prediction).
- logits (`torch.FloatTensor` of shape `(batch_size, sequence_length, config.vocab_size)`) — Prediction scores of the language modeling head (scores for each vocabulary token before SoftMax).
- hidden_states (`tuple(torch.FloatTensor)`, *optional*, returned when `output_hidden_states=True` is passed or when `config.output_hidden_states=True`) — Tuple of `torch.FloatTensor` (one for the output of the embeddings, if the model has an embedding layer, + one for the output of each layer) of shape `(batch_size, sequence_length, hidden_size)`. Hidden-states of the model at the output of each layer plus the optional initial embedding outputs.
- attentions (`tuple(torch.FloatTensor)`, *optional*, returned when `output_attentions=True` is passed or when `config.output_attentions=True`) — Tuple of `torch.FloatTensor` (one for each layer) of shape `(batch_size, num_heads, sequence_length, sequence_length)`. Attention weights after the attention softmax, used to compute the weighted average in the self-attention heads.
The OpenAIGPTLMHeadModel forward method overrides the __call__ special method.
Although the recipe for the forward pass needs to be defined within this function, one should call the Module instance afterwards instead of this, since the former takes care of running the pre- and post-processing steps while the latter silently ignores them.
Example:
>>> import torch
>>> from transformers import AutoTokenizer, OpenAIGPTLMHeadModel

>>> tokenizer = AutoTokenizer.from_pretrained("openai-community/openai-gpt")
>>> model = OpenAIGPTLMHeadModel.from_pretrained("openai-community/openai-gpt")

>>> inputs = tokenizer("Hello, my dog is cute", return_tensors="pt")
>>> outputs = model(**inputs, labels=inputs["input_ids"])
>>> loss = outputs.loss
>>> logits = outputs.logits
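Because this is the class behind causal generation, the same checkpoint can also produce text with generate(). The sketch below additionally shows logits_to_keep restricting the forward pass to the last position; the sampling settings are illustrative.

import torch
from transformers import AutoTokenizer, OpenAIGPTLMHeadModel

tokenizer = AutoTokenizer.from_pretrained("openai-community/openai-gpt")
model = OpenAIGPTLMHeadModel.from_pretrained("openai-community/openai-gpt")

inputs = tokenizer("The future of AI is", return_tensors="pt")
generated = model.generate(**inputs, max_length=40, do_sample=True, top_k=50)
print(tokenizer.decode(generated[0], skip_special_tokens=True))

# Keep only the last position's logits, which is all generation needs per step
last_logits = model(**inputs, logits_to_keep=1).logits
print(last_logits.shape)  # (1, 1, vocab_size)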
OpenAIGPTDoubleHeadsModel
class transformers.OpenAIGPTDoubleHeadsModel
<source>(config)
Parameters
- config (OpenAIGPTConfig) — Model configuration class with all the parameters of the model. Initializing with a config file does not load the weights associated with the model, only the configuration. Check out the from_pretrained() method to load the model weights.
OpenAI GPT Model transformer with a language modeling and a multiple-choice classification head on top, e.g. for RocStories/SWAG tasks. The two heads are two linear layers. The language modeling head has its weights tied to the input embeddings; the classification head takes as input the hidden state at a specified classification token index in the input sequence.
This model inherits from PreTrainedModel. Check the superclass documentation for the generic methods the library implements for all its models (such as downloading or saving, resizing the input embeddings, pruning heads, etc.).
This model is also a PyTorch torch.nn.Module subclass. Use it as a regular PyTorch Module and refer to the PyTorch documentation for all matters related to general usage and behavior.
forward
<source>( input_ids: typing.Optional[torch.LongTensor] = None, attention_mask: typing.Optional[torch.FloatTensor] = None, token_type_ids: typing.Optional[torch.LongTensor] = None, position_ids: typing.Optional[torch.LongTensor] = None, inputs_embeds: typing.Optional[torch.FloatTensor] = None, mc_token_ids: typing.Optional[torch.LongTensor] = None, labels: typing.Optional[torch.LongTensor] = None, mc_labels: typing.Optional[torch.LongTensor] = None, output_attentions: typing.Optional[bool] = None, output_hidden_states: typing.Optional[bool] = None, return_dict: typing.Optional[bool] = None ) → transformers.models.openai.modeling_openai.OpenAIGPTDoubleHeadsModelOutput or tuple(torch.FloatTensor)
Parameters
- input_ids (`torch.LongTensor` of shape `(batch_size, sequence_length)`, *optional*) — Indices of input sequence tokens in the vocabulary. Padding will be ignored by default. Indices can be obtained using AutoTokenizer. See PreTrainedTokenizer.encode() and PreTrainedTokenizer.__call__() for details.
- attention_mask (`torch.FloatTensor` of shape `(batch_size, sequence_length)`, *optional*) — Mask to avoid performing attention on padding token indices. Mask values selected in `[0, 1]`:
  - 1 for tokens that are not masked,
  - 0 for tokens that are masked.
- token_type_ids (`torch.LongTensor` of shape `(batch_size, sequence_length)`, *optional*) — Segment token indices to indicate first and second portions of the inputs. Indices are selected in `[0, 1]`:
  - 0 corresponds to a sentence A token,
  - 1 corresponds to a sentence B token.
- position_ids (`torch.LongTensor` of shape `(batch_size, sequence_length)`, *optional*) — Indices of positions of each input sequence token in the position embeddings. Selected in the range `[0, config.n_positions - 1]`.
- inputs_embeds (`torch.FloatTensor` of shape `(batch_size, sequence_length, hidden_size)`, *optional*) — Optionally, instead of passing `input_ids` you can choose to directly pass an embedded representation. This is useful if you want more control over how to convert `input_ids` indices into associated vectors than the model’s internal embedding lookup matrix.
- mc_token_ids (`torch.LongTensor` of shape `(batch_size, num_choices)`, *optional*, defaults to the index of the last token of the input) — Index of the classification token in each input sequence. Selected in the range `[0, input_ids.size(-1) - 1]`.
- labels (`torch.LongTensor` of shape `(batch_size, sequence_length)`, *optional*) — Labels for language modeling. Note that the labels are shifted inside the model, i.e. you can set `labels = input_ids`. Indices are selected in `[-100, 0, ..., config.vocab_size]`. All labels set to `-100` are ignored (masked), the loss is only computed for labels in `[0, ..., config.vocab_size]`.
- mc_labels (`torch.LongTensor` of shape `(batch_size)`, *optional*) — Labels for computing the multiple choice classification loss. Indices should be in `[0, ..., num_choices - 1]` where `num_choices` is the size of the second dimension of the input tensors (see `input_ids` above).
- output_attentions (`bool`, *optional*) — Whether or not to return the attentions tensors of all attention layers. See `attentions` under returned tensors for more detail.
- output_hidden_states (`bool`, *optional*) — Whether or not to return the hidden states of all layers. See `hidden_states` under returned tensors for more detail.
- return_dict (`bool`, *optional*) — Whether or not to return a ModelOutput instead of a plain tuple.
Returns
transformers.models.openai.modeling_openai.OpenAIGPTDoubleHeadsModelOutput or tuple(torch.FloatTensor)
A transformers.models.openai.modeling_openai.OpenAIGPTDoubleHeadsModelOutput or a tuple of torch.FloatTensor (if `return_dict=False` is passed or when `config.return_dict=False`) comprising various elements depending on the configuration (OpenAIGPTConfig) and inputs.
- loss (`torch.FloatTensor` of shape `(1,)`, *optional*, returned when `labels` is provided) — Language modeling loss.
- mc_loss (`torch.FloatTensor` of shape `(1,)`, *optional*, returned when `mc_labels` is provided) — Multiple choice classification loss.
- logits (`torch.FloatTensor` of shape `(batch_size, num_choices, sequence_length, config.vocab_size)`) — Prediction scores of the language modeling head (scores for each vocabulary token before SoftMax).
- mc_logits (`torch.FloatTensor` of shape `(batch_size, num_choices)`) — Prediction scores of the multiple choice classification head (scores for each choice before SoftMax).
- hidden_states (`tuple[torch.FloatTensor]`, *optional*, returned when `output_hidden_states=True` is passed or when `config.output_hidden_states=True`) — Tuple of `torch.FloatTensor` (one for the output of the embeddings, if the model has an embedding layer, + one for the output of each layer) of shape `(batch_size, sequence_length, hidden_size)`. Hidden-states of the model at the output of each layer plus the optional initial embedding outputs.
- attentions (`tuple[torch.FloatTensor]`, *optional*, returned when `output_attentions=True` is passed or when `config.output_attentions=True`) — Tuple of `torch.FloatTensor` (one for each layer) of shape `(batch_size, num_heads, sequence_length, sequence_length)`. Attention weights after the attention softmax, used to compute the weighted average in the self-attention heads.
The OpenAIGPTDoubleHeadsModel forward method overrides the __call__ special method.
Although the recipe for the forward pass needs to be defined within this function, one should call the Module instance afterwards instead of this, since the former takes care of running the pre- and post-processing steps while the latter silently ignores them.
Examples:
>>> from transformers import AutoTokenizer, OpenAIGPTDoubleHeadsModel
>>> import torch

>>> tokenizer = AutoTokenizer.from_pretrained("openai-community/openai-gpt")
>>> model = OpenAIGPTDoubleHeadsModel.from_pretrained("openai-community/openai-gpt")
>>> tokenizer.add_special_tokens(
...     {"cls_token": "[CLS]"}
... )  # Add a [CLS] to the vocabulary (we should train it also!)
>>> model.resize_token_embeddings(len(tokenizer))

>>> choices = ["Hello, my dog is cute [CLS]", "Hello, my cat is cute [CLS]"]
>>> input_ids = torch.tensor([tokenizer.encode(s) for s in choices]).unsqueeze(0)  # Batch size 1, 2 choices
>>> mc_token_ids = torch.tensor([input_ids.size(-1) - 1, input_ids.size(-1) - 1]).unsqueeze(0)  # Batch size 1

>>> outputs = model(input_ids, mc_token_ids=mc_token_ids)
>>> lm_logits = outputs.logits
>>> mc_logits = outputs.mc_logits
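Continuing the snippet above (it reuses model, input_ids and mc_token_ids), passing mc_labels and labels returns the two losses; the label values here are illustrative only.

# Choice 0 is treated as the correct one purely for illustration
mc_labels = torch.tensor([0])          # shape (batch_size,)
outputs = model(
    input_ids,
    mc_token_ids=mc_token_ids,
    labels=input_ids,                  # language modeling labels (shifted internally)
    mc_labels=mc_labels,               # multiple choice classification labels
)
print(outputs.loss, outputs.mc_loss)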
OpenAIGPTForSequenceClassification
class transformers.OpenAIGPTForSequenceClassification
<source>(config)
Parameters
- config (OpenAIGPTConfig) — Model configuration class with all the parameters of the model. Initializing with a config file does not load the weights associated with the model, only the configuration. Check out the from_pretrained() method to load the model weights.
The original OpenAI GPT Model transformer with a sequence classification head on top (linear layer). OpenAIGPTForSequenceClassification uses the last token in order to do the classification, as other causal models (e.g. GPT-2) do. Since it does classification on the last token, it needs to know the position of the last token. If a pad_token_id is defined in the configuration, it finds the last token that is not a padding token in each row. If no pad_token_id is defined, it simply takes the last value in each row of the batch. Since it cannot guess the padding tokens when inputs_embeds are passed instead of input_ids, it does the same (takes the last value in each row of the batch).
This model inherits from PreTrainedModel. Check the superclass documentation for the generic methods the library implements for all its models (such as downloading or saving, resizing the input embeddings, pruning heads, etc.).
This model is also a PyTorch torch.nn.Module subclass. Use it as a regular PyTorch Module and refer to the PyTorch documentation for all matters related to general usage and behavior.
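Because the classifier looks for the last non-padding token, batched inputs need a pad_token_id on the model config. Reusing the unk token, as sketched below, is a common workaround and an assumption rather than something this page prescribes.

from transformers import AutoTokenizer, OpenAIGPTForSequenceClassification

tokenizer = AutoTokenizer.from_pretrained("openai-community/openai-gpt")
model = OpenAIGPTForSequenceClassification.from_pretrained("openai-community/openai-gpt")

tokenizer.pad_token = tokenizer.unk_token            # assumption: reuse <unk> as the padding token
model.config.pad_token_id = tokenizer.pad_token_id   # lets the model locate the last non-padding token

batch = tokenizer(["a short text", "a noticeably longer piece of text"], padding=True, return_tensors="pt")
print(model(**batch).logits.shape)  # (2, config.num_labels)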
forward
<source>( input_ids: typing.Optional[torch.LongTensor] = None, attention_mask: typing.Optional[torch.FloatTensor] = None, token_type_ids: typing.Optional[torch.LongTensor] = None, position_ids: typing.Optional[torch.LongTensor] = None, inputs_embeds: typing.Optional[torch.FloatTensor] = None, labels: typing.Optional[torch.LongTensor] = None, output_attentions: typing.Optional[bool] = None, output_hidden_states: typing.Optional[bool] = None, return_dict: typing.Optional[bool] = None ) → transformers.modeling_outputs.SequenceClassifierOutput or tuple(torch.FloatTensor)
Parameters
- input_ids (`torch.LongTensor` of shape `(batch_size, sequence_length)`, *optional*) — Indices of input sequence tokens in the vocabulary. Padding will be ignored by default. Indices can be obtained using AutoTokenizer. See PreTrainedTokenizer.encode() and PreTrainedTokenizer.__call__() for details.
- attention_mask (`torch.FloatTensor` of shape `(batch_size, sequence_length)`, *optional*) — Mask to avoid performing attention on padding token indices. Mask values selected in `[0, 1]`:
  - 1 for tokens that are not masked,
  - 0 for tokens that are masked.
- token_type_ids (`torch.LongTensor` of shape `(batch_size, sequence_length)`, *optional*) — Segment token indices to indicate first and second portions of the inputs. Indices are selected in `[0, 1]`:
  - 0 corresponds to a sentence A token,
  - 1 corresponds to a sentence B token.
- position_ids (`torch.LongTensor` of shape `(batch_size, sequence_length)`, *optional*) — Indices of positions of each input sequence token in the position embeddings. Selected in the range `[0, config.n_positions - 1]`.
- inputs_embeds (`torch.FloatTensor` of shape `(batch_size, sequence_length, hidden_size)`, *optional*) — Optionally, instead of passing `input_ids` you can choose to directly pass an embedded representation. This is useful if you want more control over how to convert `input_ids` indices into associated vectors than the model’s internal embedding lookup matrix.
- labels (`torch.LongTensor` of shape `(batch_size,)`, *optional*) — Labels for computing the sequence classification/regression loss. Indices should be in `[0, ..., config.num_labels - 1]`. If `config.num_labels == 1` a regression loss is computed (Mean-Square loss), if `config.num_labels > 1` a classification loss is computed (Cross-Entropy).
- output_attentions (`bool`, *optional*) — Whether or not to return the attentions tensors of all attention layers. See `attentions` under returned tensors for more detail.
- output_hidden_states (`bool`, *optional*) — Whether or not to return the hidden states of all layers. See `hidden_states` under returned tensors for more detail.
- return_dict (`bool`, *optional*) — Whether or not to return a ModelOutput instead of a plain tuple.
Returns
transformers.modeling_outputs.SequenceClassifierOutput or tuple(torch.FloatTensor)
A transformers.modeling_outputs.SequenceClassifierOutput or a tuple of torch.FloatTensor (if `return_dict=False` is passed or when `config.return_dict=False`) comprising various elements depending on the configuration (OpenAIGPTConfig) and inputs.
- loss (`torch.FloatTensor` of shape `(1,)`, *optional*, returned when `labels` is provided) — Classification (or regression if config.num_labels==1) loss.
- logits (`torch.FloatTensor` of shape `(batch_size, config.num_labels)`) — Classification (or regression if config.num_labels==1) scores (before SoftMax).
- hidden_states (`tuple(torch.FloatTensor)`, *optional*, returned when `output_hidden_states=True` is passed or when `config.output_hidden_states=True`) — Tuple of `torch.FloatTensor` (one for the output of the embeddings, if the model has an embedding layer, + one for the output of each layer) of shape `(batch_size, sequence_length, hidden_size)`. Hidden-states of the model at the output of each layer plus the optional initial embedding outputs.
- attentions (`tuple(torch.FloatTensor)`, *optional*, returned when `output_attentions=True` is passed or when `config.output_attentions=True`) — Tuple of `torch.FloatTensor` (one for each layer) of shape `(batch_size, num_heads, sequence_length, sequence_length)`. Attention weights after the attention softmax, used to compute the weighted average in the self-attention heads.
The OpenAIGPTForSequenceClassification forward method overrides the __call__ special method.
Although the recipe for the forward pass needs to be defined within this function, one should call the Module instance afterwards instead of this, since the former takes care of running the pre- and post-processing steps while the latter silently ignores them.
Example of single-label classification:
>>> import torch
>>> from transformers import AutoTokenizer, OpenAIGPTForSequenceClassification

>>> tokenizer = AutoTokenizer.from_pretrained("openai-community/openai-gpt")
>>> model = OpenAIGPTForSequenceClassification.from_pretrained("openai-community/openai-gpt")

>>> inputs = tokenizer("Hello, my dog is cute", return_tensors="pt")

>>> with torch.no_grad():
...     logits = model(**inputs).logits

>>> predicted_class_id = logits.argmax().item()
>>> model.config.id2label[predicted_class_id]
...

>>> # To train a model on `num_labels` classes, you can pass `num_labels=num_labels` to `.from_pretrained(...)`
>>> num_labels = len(model.config.id2label)
>>> model = OpenAIGPTForSequenceClassification.from_pretrained("openai-community/openai-gpt", num_labels=num_labels)

>>> labels = torch.tensor([1])
>>> loss = model(**inputs, labels=labels).loss
>>> round(loss.item(), 2)
...
Example of multi-label classification:
>>> import torch
>>> from transformers import AutoTokenizer, OpenAIGPTForSequenceClassification

>>> tokenizer = AutoTokenizer.from_pretrained("openai-community/openai-gpt")
>>> model = OpenAIGPTForSequenceClassification.from_pretrained("openai-community/openai-gpt", problem_type="multi_label_classification")

>>> inputs = tokenizer("Hello, my dog is cute", return_tensors="pt")

>>> with torch.no_grad():
...     logits = model(**inputs).logits

>>> predicted_class_ids = torch.arange(0, logits.shape[-1])[torch.sigmoid(logits).squeeze(dim=0) > 0.5]

>>> # To train a model on `num_labels` classes, you can pass `num_labels=num_labels` to `.from_pretrained(...)`
>>> num_labels = len(model.config.id2label)
>>> model = OpenAIGPTForSequenceClassification.from_pretrained(
...     "openai-community/openai-gpt", num_labels=num_labels, problem_type="multi_label_classification"
... )

>>> labels = torch.sum(
...     torch.nn.functional.one_hot(predicted_class_ids[None, :].clone(), num_classes=num_labels), dim=1
... ).to(torch.float)
>>> loss = model(**inputs, labels=labels).loss
OpenAIGPTTokenizer
class transformers.OpenAIGPTTokenizer
<source>( vocab_file, merges_file, unk_token = '<unk>', **kwargs )
Construct a GPT Tokenizer. Based on Byte-Pair-Encoding with the following peculiarities:
- lowercases all inputs,
- uses the SpaCy tokenizer and ftfy for pre-BPE tokenization if they are installed, and falls back to BERT's BasicTokenizer if not.
This tokenizer inherits from PreTrainedTokenizer which contains most of the main methods. Users should refer to this superclass for more information regarding those methods.
convert_tokens_to_string
<source>(tokens)
Converts a sequence of tokens (string) into a single string.
OpenAIGPTTokenizerFast
class transformers.OpenAIGPTTokenizerFast
<source>( vocab_file = None, merges_file = None, tokenizer_file = None, unk_token = '<unk>', **kwargs )
Construct a “fast” GPT Tokenizer (backed by HuggingFace’s tokenizers library). Based on Byte-Pair-Encoding with the following peculiarities:
- lowercases all inputs,
- uses BERT’s BasicTokenizer for pre-BPE tokenization
This tokenizer inherits from PreTrainedTokenizerFast which contains most of the main methods. Users should refer to this superclass for more information regarding those methods.
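A short sketch of the lowercasing behavior both tokenizers share:

from transformers import OpenAIGPTTokenizerFast

tokenizer = OpenAIGPTTokenizerFast.from_pretrained("openai-community/openai-gpt")
ids = tokenizer("Hello WORLD")["input_ids"]
print(tokenizer.convert_ids_to_tokens(ids))  # pieces are lowercased before BPE is applied
print(tokenizer.decode(ids))                 # decoded text comes back lowercased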