Transformers documentation

This model was released on 2021-03-10 and added to Hugging Face Transformers on 2022-02-18.

PLBart

PyTorch FlashAttention SDPA

Overview

The PLBART model was proposed in Unified Pre-training for Program Understanding and Generation by Wasi Uddin Ahmad, Saikat Chakraborty, Baishakhi Ray, and Kai-Wei Chang. It is a BART-like model that can be used for code summarization, code generation, and code translation tasks. The pre-trained model plbart-base has been trained using a multilingual denoising task on Java, Python, and English.

According to the abstract:

Code summarization and generation empower conversion between programming language (PL) and natural language (NL), while code translation avails the migration of legacy code from one PL to another. This paper introduces PLBART, a sequence-to-sequence model capable of performing a broad spectrum of program and language understanding and generation tasks. PLBART is pre-trained on an extensive collection of Java and Python functions and associated NL text via denoising autoencoding. Experiments on code summarization in the English language, code generation, and code translation in seven programming languages show that PLBART outperforms or rivals state-of-the-art models. Moreover, experiments on discriminative tasks, e.g., program repair, clone detection, and vulnerable code detection, demonstrate PLBART’s effectiveness in program understanding. Furthermore, analysis reveals that PLBART learns program syntax, style (e.g., identifier naming convention), logical flow (e.g., if block inside an else block is equivalent to else if block) that are crucial to program semantics and thus excels even with limited annotations.

This model was contributed by gchhablani. The authors’ code can be found here.

Usage examples

PLBart is a multilingual encoder-decoder (sequence-to-sequence) model primarily intended for code-to-text, text-to-code, and code-to-code tasks. Because the model is multilingual, it expects sequences in a specific format: a special language id token is added to both the source and target text. The source text format is X [eos, src_lang_code], where X is the source text. The target text format is [tgt_lang_code] X [eos]. bos is never used.

However, for fine-tuning, the language token is omitted in some cases when a single language is used. Please refer to the paper to learn more about this.

In cases where the language code is needed, the regular __call__() encodes the source text format when you pass texts as the first argument or with the keyword argument text, and encodes the target text format when the text is passed with the text_target keyword argument.
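
As a concrete illustration (a minimal sketch, not part of the original examples; the exact special-token strings depend on the checkpoint's language codes), the snippet below encodes a source string and a target string and inspects where the language code ends up:

>>> from transformers import PLBartTokenizer

>>> tokenizer = PLBartTokenizer.from_pretrained("uclanlp/plbart-base", src_lang="python", tgt_lang="en_XX")

>>> # Source text is encoded as X [eos, src_lang_code]
>>> source = tokenizer("def maximum(a,b,c):NEW_LINE_INDENTreturn max([a,b,c])")
>>> source_suffix = tokenizer.convert_ids_to_tokens(source["input_ids"])[-2:]  # eos followed by the source language code

>>> # Target text passed via `text_target` is encoded with the target language code instead;
>>> # the model later shifts it right so the decoder input starts with the target language code
>>> target = tokenizer(text_target="Returns the maximum value of a b c.")
>>> target_suffix = tokenizer.convert_ids_to_tokens(target["input_ids"])[-2:]  # eos followed by the target language code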

Supervised training

>>> from transformers import PLBartForConditionalGeneration, PLBartTokenizer

>>> tokenizer = PLBartTokenizer.from_pretrained("uclanlp/plbart-base", src_lang="python", tgt_lang="en_XX")
>>> model = PLBartForConditionalGeneration.from_pretrained("uclanlp/plbart-base")
>>> example_python_phrase = "def maximum(a,b,c):NEW_LINE_INDENTreturn max([a,b,c])"
>>> expected_translation_english = "Returns the maximum value of a b c."
>>> inputs = tokenizer(example_python_phrase, text_target=expected_translation_english, return_tensors="pt")
>>> model(**inputs)

Generation

While generating the target text, set decoder_start_token_id to the target language id. The following example shows how to translate Python to English using the uclanlp/plbart-python-en_XX model.

>>> from transformers import PLBartForConditionalGeneration, PLBartTokenizer

>>> tokenizer = PLBartTokenizer.from_pretrained("uclanlp/plbart-python-en_XX", src_lang="python", tgt_lang="en_XX")
>>> example_python_phrase = "def maximum(a,b,c):NEW_LINE_INDENTreturn max([a,b,c])"
>>> inputs = tokenizer(example_python_phrase, return_tensors="pt")
>>> model = PLBartForConditionalGeneration.from_pretrained("uclanlp/plbart-python-en_XX")
>>> translated_tokens = model.generate(**inputs, decoder_start_token_id=tokenizer.lang_code_to_id["en_XX"])
>>> tokenizer.batch_decode(translated_tokens, skip_special_tokens=True)[0]
"Returns the maximum value of a b c."

Resources

PLBartConfig

class transformers.PLBartConfig

<source>

( vocab_size = 50005, max_position_embeddings = 1024, encoder_layers = 6, encoder_ffn_dim = 3072, encoder_attention_heads = 12, decoder_layers = 6, decoder_ffn_dim = 3072, decoder_attention_heads = 12, encoder_layerdrop = 0.0, decoder_layerdrop = 0.0, use_cache = True, is_encoder_decoder = True, activation_function = 'gelu', d_model = 768, dropout = 0.1, attention_dropout = 0.1, activation_dropout = 0.0, init_std = 0.02, classifier_dropout = 0.0, scale_embedding = True, pad_token_id = 1, bos_token_id = 0, eos_token_id = 2, forced_eos_token_id = 2, **kwargs )

Parameters

  • vocab_size (int, optional, defaults to 50005) — Vocabulary size of the PLBART model. Defines the number of different tokens that can be represented by the inputs_ids passed when calling PLBartModel.
  • d_model (int, optional, defaults to 768) — Dimensionality of the layers and the pooler layer.
  • encoder_layers (int, optional, defaults to 6) — Number of encoder layers.
  • decoder_layers (int, optional, defaults to 6) — Number of decoder layers.
  • encoder_attention_heads (int, optional, defaults to 12) — Number of attention heads for each attention layer in the Transformer encoder.
  • decoder_attention_heads (int, optional, defaults to 12) — Number of attention heads for each attention layer in the Transformer decoder.
  • decoder_ffn_dim (int, optional, defaults to 3072) — Dimensionality of the “intermediate” (often named feed-forward) layer in the decoder.
  • encoder_ffn_dim (int, optional, defaults to 3072) — Dimensionality of the “intermediate” (often named feed-forward) layer in the encoder.
  • activation_function (str or function, optional, defaults to "gelu") — The non-linear activation function (function or string) in the encoder and pooler. If string, "gelu", "relu", "silu" and "gelu_new" are supported.
  • dropout (float, optional, defaults to 0.1) — The dropout probability for all fully connected layers in the embeddings, encoder, and pooler.
  • attention_dropout (float, optional, defaults to 0.1) — The dropout ratio for the attention probabilities.
  • activation_dropout (float, optional, defaults to 0.0) — The dropout ratio for activations inside the fully connected layer.
  • classifier_dropout (float, optional, defaults to 0.0) — The dropout ratio for the classifier.
  • max_position_embeddings (int, optional, defaults to 1024) — The maximum sequence length that this model might ever be used with. Typically set this to something large just in case (e.g., 512 or 1024 or 2048).
  • init_std (float, optional, defaults to 0.02) — The standard deviation of the truncated_normal_initializer for initializing all weight matrices.
  • encoder_layerdrop (float, optional, defaults to 0.0) — The LayerDrop probability for the encoder. See the LayerDrop paper (https://huggingface.co/papers/1909.11556) for more details.
  • decoder_layerdrop (float, optional, defaults to 0.0) — The LayerDrop probability for the decoder. See the LayerDrop paper (https://huggingface.co/papers/1909.11556) for more details.
  • scale_embedding (bool, optional, defaults to True) — Scale embeddings by sqrt(d_model).
  • use_cache (bool, optional, defaults to True) — Whether or not the model should return the last key/values attentions (not used by all models).
  • forced_eos_token_id (int, optional, defaults to 2) — The id of the token to force as the last generated token when max_length is reached. Usually set to eos_token_id.

This is the configuration class to store the configuration of a PLBartModel. It is used to instantiate a PLBART model according to the specified arguments, defining the model architecture. Instantiating a configuration with the defaults will yield a similar configuration to that of the PLBART uclanlp/plbart-base architecture.

Configuration objects inherit from PreTrainedConfig and can be used to control the model outputs. Read the documentation from PreTrainedConfig for more information.

Example:

>>> from transformers import PLBartConfig, PLBartModel

>>> # Initializing a PLBART uclanlp/plbart-base style configuration
>>> configuration = PLBartConfig()

>>> # Initializing a model (with random weights) from the uclanlp/plbart-base style configuration
>>> model = PLBartModel(configuration)

>>> # Accessing the model configuration
>>> configuration = model.config
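
The defaults above can also be overridden to sketch a smaller model for quick experiments. The values below are arbitrary illustrations, not a recommended or pre-trained configuration:

>>> from transformers import PLBartConfig, PLBartModel

>>> # Hypothetical small configuration for fast prototyping (values chosen only for illustration)
>>> small_config = PLBartConfig(
...     d_model=256,
...     encoder_layers=3,
...     decoder_layers=3,
...     encoder_attention_heads=4,
...     decoder_attention_heads=4,
...     encoder_ffn_dim=1024,
...     decoder_ffn_dim=1024,
... )
>>> small_model = PLBartModel(small_config)  # randomly initialized weights
>>> n_params = small_model.num_parameters()  # far fewer parameters than the default configuration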

PLBartTokenizer

class transformers.PLBartTokenizer

<source>

( vocab_file, bos_token = '<s>', eos_token = '</s>', sep_token = '</s>', cls_token = '<s>', unk_token = '<unk>', pad_token = '<pad>', mask_token = '<mask>', language_codes = 'base', tokenizer_file = None, src_lang = None, tgt_lang = None, sp_model_kwargs: typing.Optional[dict[str, typing.Any]] = None, additional_special_tokens = None, clean_up_tokenization_spaces = True, **kwargs )

Parameters

  • vocab_file (str) — Path to the vocabulary file.
  • src_lang (str, optional) — A string representing the source language.
  • tgt_lang (str, optional) — A string representing the target language.
  • bos_token (str, optional, defaults to "<s>") — The start of sequence token.
  • eos_token (str, optional, defaults to "</s>") — The end of sequence token.
  • sep_token (str, optional, defaults to "</s>") — The separator token, which is used when building a sequence from multiple sequences, e.g. two sequences for sequence classification or for a text and a question for question answering. It is also used as the last token of a sequence built with special tokens.
  • cls_token (str, optional, defaults to "<s>") — The cls token, which is a special token used as the first token for all tasks.
  • unk_token (str, optional, defaults to "<unk>") — The unknown token. A token that is not in the vocabulary cannot be converted to an ID and is set to be this token instead.
  • pad_token (str, optional, defaults to "<pad>") — The token used for padding, for example when batching sequences of different lengths.
  • mask_token (str, optional, defaults to "<mask>") — The token used for masking values. This is the token used when training this model with masking tasks. It is only used in the "base" tokenizer type. For the "multi" tokenizer, masking is never done for the downstream tasks.
  • language_codes (str, optional, defaults to "base") — What language codes to use. Should be one of "base" or "multi".
  • sp_model_kwargs (dict, optional) — Will be passed to the SentencePieceProcessor.__init__() method. The Python wrapper for SentencePiece can be used, among other things, to set:
    • enable_sampling: Enable subword regularization.
    • nbest_size: Sampling parameters for unigram. Invalid for BPE-Dropout.
      • nbest_size = {0,1}: No sampling is performed.
      • nbest_size > 1: samples from the nbest_size results.
      • nbest_size < 0: assumes that nbest_size is infinite and samples from all hypotheses (lattice) using the forward-filtering-and-backward-sampling algorithm.
    • alpha: Smoothing parameter for unigram sampling, and dropout probability of merge operations for BPE-dropout.

Construct a PLBART tokenizer.

Adapted from RobertaTokenizer and XLNetTokenizer. Based on SentencePiece.

The tokenization method is <tokens> <eos> <language code> for source language documents, and <language code> <tokens> <eos> for target language documents.

Examples:

>>> from transformers import PLBartTokenizer

>>> tokenizer = PLBartTokenizer.from_pretrained("uclanlp/plbart-python-en_XX", src_lang="python", tgt_lang="en_XX")
>>> example_python_phrase = "def maximum(a,b,c):NEW_LINE_INDENTreturn max([a,b,c])"
>>> expected_translation_english = "Returns the maximum value of a b c."
>>> inputs = tokenizer(example_python_phrase, text_target=expected_translation_english, return_tensors="pt")
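
If you want to experiment with subword regularization through sp_model_kwargs, a possible setup looks like the following (a sketch only; the sampling hyperparameters are illustrative and sampling affects the slow, SentencePiece-based tokenizer):

>>> from transformers import PLBartTokenizer

>>> # Enable SentencePiece subword sampling (illustrative hyperparameters)
>>> sampling_tokenizer = PLBartTokenizer.from_pretrained(
...     "uclanlp/plbart-base",
...     src_lang="python",
...     tgt_lang="en_XX",
...     sp_model_kwargs={"enable_sampling": True, "nbest_size": -1, "alpha": 0.1},
... )

>>> # With sampling enabled, repeated calls may segment the same string differently
>>> tokens_a = sampling_tokenizer.tokenize("def maximum(a,b,c): return max([a,b,c])")
>>> tokens_b = sampling_tokenizer.tokenize("def maximum(a,b,c): return max([a,b,c])")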

build_inputs_with_special_tokens

<source>

( token_ids_0: list, token_ids_1: typing.Optional[list[int]] = None ) → list[int]

Parameters

  • token_ids_0 (list[int]) — List of IDs to which the special tokens will be added.
  • token_ids_1 (list[int], optional) — Optional second list of IDs for sequence pairs.

Returns

list[int]

List of input IDs with the appropriate special tokens.

Build model inputs from a sequence or a pair of sequences for sequence classification tasks by concatenating and adding special tokens. A PLBART sequence has the following format, where X represents the sequence:

  • input_ids (for encoder): X [eos, src_lang_code]
  • decoder_input_ids (for decoder): X [eos, tgt_lang_code]

BOS is never used. Pairs of sequences are not the expected use case, but they will be handled without a separator.
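
For instance (a sketch illustrating the format; the exact language-code token string depends on the checkpoint):

>>> from transformers import PLBartTokenizer

>>> tokenizer = PLBartTokenizer.from_pretrained("uclanlp/plbart-base", src_lang="python", tgt_lang="en_XX")

>>> token_ids = tokenizer.convert_tokens_to_ids(tokenizer.tokenize("def maximum(a,b,c): return max([a,b,c])"))
>>> with_special_tokens = tokenizer.build_inputs_with_special_tokens(token_ids)

>>> # Only a suffix is added: the original ids followed by eos and the source language code
>>> suffix = tokenizer.convert_ids_to_tokens(with_special_tokens)[-2:]  # last two tokens are '</s>' and the source language code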

PLBartModel

class transformers.PLBartModel

<source>

(config: PLBartConfig)

Parameters

  • config (PLBartConfig) —Model configuration class with all the parameters of the model. Initializing with a config file does notload the weights associated with the model, only the configuration. Check out thefrom_pretrained() method to load the model weights.

The bare Plbart Model outputting raw hidden-states without any specific head on top.

This model inherits from PreTrainedModel. Check the superclass documentation for the generic methods the library implements for all its models (such as downloading or saving, resizing the input embeddings, pruning heads, etc.).

This model is also a PyTorch torch.nn.Module subclass. Use it as a regular PyTorch Module and refer to the PyTorch documentation for all matters related to general usage and behavior.

forward

<source>

( input_ids: typing.Optional[torch.LongTensor] = None, attention_mask: typing.Optional[torch.LongTensor] = None, decoder_input_ids: typing.Optional[torch.LongTensor] = None, decoder_attention_mask: typing.Optional[torch.Tensor] = None, encoder_outputs: typing.Optional[list[torch.FloatTensor]] = None, past_key_values: typing.Optional[transformers.cache_utils.Cache] = None, inputs_embeds: typing.Optional[torch.FloatTensor] = None, decoder_inputs_embeds: typing.Optional[torch.FloatTensor] = None, use_cache: typing.Optional[bool] = None, output_attentions: typing.Optional[bool] = None, output_hidden_states: typing.Optional[bool] = None, return_dict: typing.Optional[bool] = None, cache_position: typing.Optional[torch.LongTensor] = None ) → transformers.modeling_outputs.Seq2SeqModelOutput or tuple(torch.FloatTensor)

Parameters

  • input_ids (torch.LongTensor of shape(batch_size, sequence_length),optional) —Indices of input sequence tokens in the vocabulary. Padding will be ignored by default.

    Indices can be obtained usingAutoTokenizer. SeePreTrainedTokenizer.encode() andPreTrainedTokenizer.call() for details.

    What are input IDs?

  • attention_mask (torch.LongTensor of shape(batch_size, sequence_length),optional) —Mask to avoid performing attention on padding token indices. Mask values selected in[0, 1]:

    • 1 for tokens that arenot masked,
    • 0 for tokens that aremasked.

    What are attention masks?

  • decoder_input_ids (torch.LongTensor of shape(batch_size, target_sequence_length),optional) —Indices of decoder input sequence tokens in the vocabulary.

    Indices can be obtained usingAutoTokenizer orPLBartMultiTokenizer depending on the checkpoint.SeePreTrainedTokenizer.encode() andPreTrainedTokenizer.call() for details.

    What are decoder input IDs?

    PLBart uses a specific language id token as the starting token fordecoder_input_ids generation thatvaries according to source and target language,e.g. 50003 foren_XX, and 50001 forjava. Ifpast_key_values is used, optionally only the lastdecoder_input_ids have to be input (seepast_key_values).

    For translation and summarization training,decoder_input_ids should be provided. If nodecoder_input_ids is provided, the model will create this tensor by shifting theinput_ids to the rightfor denoising pre-training following the paper.

  • decoder_attention_mask (torch.LongTensor of shape (batch_size, target_sequence_length), optional) — Default behavior: generate a tensor that ignores pad tokens in decoder_input_ids. A causal mask will also be used by default.
  • encoder_outputs (list[torch.FloatTensor],optional) —Tuple consists of (last_hidden_state,optional:hidden_states,optional:attentions)last_hidden_state of shape(batch_size, sequence_length, hidden_size),optional) is a sequence ofhidden-states at the output of the last layer of the encoder. Used in the cross-attention of the decoder.
  • past_key_values (~cache_utils.Cache,optional) —Pre-computed hidden-states (key and values in the self-attention blocks and in the cross-attentionblocks) that can be used to speed up sequential decoding. This typically consists in thepast_key_valuesreturned by the model at a previous stage of decoding, whenuse_cache=True orconfig.use_cache=True.

    OnlyCache instance is allowed as input, see ourkv cache guide.If nopast_key_values are passed,DynamicCache will be initialized by default.

    The model will output the same cache format that is fed as input.

    Ifpast_key_values are used, the user is expected to input only unprocessedinput_ids (those that don’thave their past key value states given to this model) of shape(batch_size, unprocessed_length) instead of allinput_idsof shape(batch_size, sequence_length).

  • inputs_embeds (torch.FloatTensor of shape(batch_size, sequence_length, hidden_size),optional) —Optionally, instead of passinginput_ids you can choose to directly pass an embedded representation. Thisis useful if you want more control over how to convertinput_ids indices into associated vectors than themodel’s internal embedding lookup matrix.
  • decoder_inputs_embeds (torch.FloatTensor of shape(batch_size, target_sequence_length, hidden_size),optional) —Optionally, instead of passingdecoder_input_ids you can choose to directly pass an embeddedrepresentation. Ifpast_key_values is used, optionally only the lastdecoder_inputs_embeds have to beinput (seepast_key_values). This is useful if you want more control over how to convertdecoder_input_ids indices into associated vectors than the model’s internal embedding lookup matrix.

    Ifdecoder_input_ids anddecoder_inputs_embeds are both unset,decoder_inputs_embeds takes the valueofinputs_embeds.

  • use_cache (bool,optional) —If set toTrue,past_key_values key value states are returned and can be used to speed up decoding (seepast_key_values).
  • output_attentions (bool,optional) —Whether or not to return the attentions tensors of all attention layers. Seeattentions under returnedtensors for more detail.
  • output_hidden_states (bool,optional) —Whether or not to return the hidden states of all layers. Seehidden_states under returned tensors formore detail.
  • return_dict (bool,optional) —Whether or not to return aModelOutput instead of a plain tuple.
  • cache_position (torch.LongTensor of shape(sequence_length),optional) —Indices depicting the position of the input sequence tokens in the sequence. Contrarily toposition_ids,this tensor is not affected by padding. It is used to update the cache in the correct position and to inferthe complete sequence length.

Returns

transformers.modeling_outputs.Seq2SeqModelOutput or tuple(torch.FloatTensor)

A transformers.modeling_outputs.Seq2SeqModelOutput or a tuple of torch.FloatTensor (if return_dict=False is passed or when config.return_dict=False) comprising various elements depending on the configuration (PLBartConfig) and inputs.

  • last_hidden_state (torch.FloatTensor of shape(batch_size, sequence_length, hidden_size)) — Sequence of hidden-states at the output of the last layer of the decoder of the model.

    Ifpast_key_values is used only the last hidden-state of the sequences of shape(batch_size, 1, hidden_size) is output.

  • past_key_values (EncoderDecoderCache,optional, returned whenuse_cache=True is passed or whenconfig.use_cache=True) — It is aEncoderDecoderCache instance. For more details, see ourkv cache guide.

    Contains pre-computed hidden-states (key and values in the self-attention blocks and in the cross-attentionblocks) that can be used (seepast_key_values input) to speed up sequential decoding.

  • decoder_hidden_states (tuple(torch.FloatTensor),optional, returned whenoutput_hidden_states=True is passed or whenconfig.output_hidden_states=True) — Tuple oftorch.FloatTensor (one for the output of the embeddings, if the model has an embedding layer, +one for the output of each layer) of shape(batch_size, sequence_length, hidden_size).

    Hidden-states of the decoder at the output of each layer plus the optional initial embedding outputs.

  • decoder_attentions (tuple(torch.FloatTensor),optional, returned whenoutput_attentions=True is passed or whenconfig.output_attentions=True) — Tuple oftorch.FloatTensor (one for each layer) of shape(batch_size, num_heads, sequence_length, sequence_length).

    Attentions weights of the decoder, after the attention softmax, used to compute the weighted average in theself-attention heads.

  • cross_attentions (tuple(torch.FloatTensor),optional, returned whenoutput_attentions=True is passed or whenconfig.output_attentions=True) — Tuple oftorch.FloatTensor (one for each layer) of shape(batch_size, num_heads, sequence_length, sequence_length).

    Attentions weights of the decoder’s cross-attention layer, after the attention softmax, used to compute theweighted average in the cross-attention heads.

  • encoder_last_hidden_state (torch.FloatTensor of shape(batch_size, sequence_length, hidden_size),optional) — Sequence of hidden-states at the output of the last layer of the encoder of the model.

  • encoder_hidden_states (tuple(torch.FloatTensor),optional, returned whenoutput_hidden_states=True is passed or whenconfig.output_hidden_states=True) — Tuple oftorch.FloatTensor (one for the output of the embeddings, if the model has an embedding layer, +one for the output of each layer) of shape(batch_size, sequence_length, hidden_size).

    Hidden-states of the encoder at the output of each layer plus the optional initial embedding outputs.

  • encoder_attentions (tuple(torch.FloatTensor),optional, returned whenoutput_attentions=True is passed or whenconfig.output_attentions=True) — Tuple oftorch.FloatTensor (one for each layer) of shape(batch_size, num_heads, sequence_length, sequence_length).

    Attentions weights of the encoder, after the attention softmax, used to compute the weighted average in theself-attention heads.

The PLBartModel forward method overrides the __call__ special method.

Although the recipe for the forward pass needs to be defined within this function, one should call the Module instance afterwards instead of this, since the former takes care of running the pre- and post-processing steps while the latter silently ignores them.
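
As a minimal usage sketch (not part of the original documentation; the checkpoint and input are only examples), the bare model can be queried for decoder hidden states like this:

>>> import torch
>>> from transformers import AutoTokenizer, PLBartModel

>>> tokenizer = AutoTokenizer.from_pretrained("uclanlp/plbart-base")
>>> model = PLBartModel.from_pretrained("uclanlp/plbart-base")

>>> inputs = tokenizer("def maximum(a,b,c):NEW_LINE_INDENTreturn max([a,b,c])", return_tensors="pt")
>>> with torch.no_grad():
...     outputs = model(**inputs)  # decoder_input_ids are created internally by shifting input_ids to the right

>>> last_hidden_state = outputs.last_hidden_state  # shape: (batch_size, sequence_length, hidden_size)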

PLBartForConditionalGeneration

class transformers.PLBartForConditionalGeneration

<source>

(config: PLBartConfig)

Parameters

  • config (PLBartConfig) —Model configuration class with all the parameters of the model. Initializing with a config file does notload the weights associated with the model, only the configuration. Check out thefrom_pretrained() method to load the model weights.

The PLBART Model with a language modeling head. Can be used for code-to-text, text-to-code and code-to-code.

This model inherits from PreTrainedModel. Check the superclass documentation for the generic methods the library implements for all its models (such as downloading or saving, resizing the input embeddings, pruning heads, etc.).

This model is also a PyTorch torch.nn.Module subclass. Use it as a regular PyTorch Module and refer to the PyTorch documentation for all matters related to general usage and behavior.

forward

<source>

( input_ids: typing.Optional[torch.LongTensor] = None, attention_mask: typing.Optional[torch.LongTensor] = None, decoder_input_ids: typing.Optional[torch.LongTensor] = None, decoder_attention_mask: typing.Optional[torch.Tensor] = None, encoder_outputs: typing.Optional[list[torch.FloatTensor]] = None, past_key_values: typing.Optional[transformers.cache_utils.Cache] = None, inputs_embeds: typing.Optional[torch.FloatTensor] = None, decoder_inputs_embeds: typing.Optional[torch.FloatTensor] = None, labels: typing.Optional[torch.Tensor] = None, use_cache: typing.Optional[bool] = None, output_attentions: typing.Optional[bool] = None, output_hidden_states: typing.Optional[bool] = None, return_dict: typing.Optional[bool] = None, cache_position: typing.Optional[torch.LongTensor] = None ) → transformers.modeling_outputs.Seq2SeqLMOutput or tuple(torch.FloatTensor)

Parameters

  • input_ids (torch.LongTensor of shape(batch_size, sequence_length),optional) —Indices of input sequence tokens in the vocabulary. Padding will be ignored by default.

    Indices can be obtained usingAutoTokenizer. SeePreTrainedTokenizer.encode() andPreTrainedTokenizer.call() for details.

    What are input IDs?

  • attention_mask (torch.LongTensor of shape(batch_size, sequence_length),optional) —Mask to avoid performing attention on padding token indices. Mask values selected in[0, 1]:

    • 1 for tokens that arenot masked,
    • 0 for tokens that aremasked.

    What are attention masks?

  • decoder_input_ids (torch.LongTensor of shape(batch_size, target_sequence_length),optional) —Indices of decoder input sequence tokens in the vocabulary.

    Indices can be obtained usingAutoTokenizer orPLBartMultiTokenizer depending on the checkpoint.SeePreTrainedTokenizer.encode() andPreTrainedTokenizer.call() for details.

    What are decoder input IDs?

    PLBart uses a specific language id token as the starting token fordecoder_input_ids generation thatvaries according to source and target language,e.g. 50003 foren_XX, and 50001 forjava. Ifpast_key_values is used, optionally only the lastdecoder_input_ids have to be input (seepast_key_values).

    For translation and summarization training,decoder_input_ids should be provided. If nodecoder_input_ids is provided, the model will create this tensor by shifting theinput_ids to the rightfor denoising pre-training following the paper.

  • decoder_attention_mask (torch.LongTensor of shape (batch_size, target_sequence_length), optional) — Default behavior: generate a tensor that ignores pad tokens in decoder_input_ids. A causal mask will also be used by default.
  • encoder_outputs (list[torch.FloatTensor],optional) —Tuple consists of (last_hidden_state,optional:hidden_states,optional:attentions)last_hidden_state of shape(batch_size, sequence_length, hidden_size),optional) is a sequence ofhidden-states at the output of the last layer of the encoder. Used in the cross-attention of the decoder.
  • past_key_values (~cache_utils.Cache,optional) —Pre-computed hidden-states (key and values in the self-attention blocks and in the cross-attentionblocks) that can be used to speed up sequential decoding. This typically consists in thepast_key_valuesreturned by the model at a previous stage of decoding, whenuse_cache=True orconfig.use_cache=True.

    OnlyCache instance is allowed as input, see ourkv cache guide.If nopast_key_values are passed,DynamicCache will be initialized by default.

    The model will output the same cache format that is fed as input.

    Ifpast_key_values are used, the user is expected to input only unprocessedinput_ids (those that don’thave their past key value states given to this model) of shape(batch_size, unprocessed_length) instead of allinput_idsof shape(batch_size, sequence_length).

  • inputs_embeds (torch.FloatTensor of shape(batch_size, sequence_length, hidden_size),optional) —Optionally, instead of passinginput_ids you can choose to directly pass an embedded representation. Thisis useful if you want more control over how to convertinput_ids indices into associated vectors than themodel’s internal embedding lookup matrix.
  • decoder_inputs_embeds (torch.FloatTensor of shape(batch_size, target_sequence_length, hidden_size),optional) —Optionally, instead of passingdecoder_input_ids you can choose to directly pass an embeddedrepresentation. Ifpast_key_values is used, optionally only the lastdecoder_inputs_embeds have to beinput (seepast_key_values). This is useful if you want more control over how to convertdecoder_input_ids indices into associated vectors than the model’s internal embedding lookup matrix.

    Ifdecoder_input_ids anddecoder_inputs_embeds are both unset,decoder_inputs_embeds takes the valueofinputs_embeds.

  • labels (torch.LongTensor of shape(batch_size, sequence_length),optional) —Labels for computing the masked language modeling loss. Indices should either be in[0, ..., config.vocab_size] or -100 (seeinput_ids docstring). Tokens with indices set to-100 are ignored(masked), the loss is only computed for the tokens with labels in[0, ..., config.vocab_size].
  • use_cache (bool,optional) —If set toTrue,past_key_values key value states are returned and can be used to speed up decoding (seepast_key_values).
  • output_attentions (bool,optional) —Whether or not to return the attentions tensors of all attention layers. Seeattentions under returnedtensors for more detail.
  • output_hidden_states (bool,optional) —Whether or not to return the hidden states of all layers. Seehidden_states under returned tensors formore detail.
  • return_dict (bool,optional) —Whether or not to return aModelOutput instead of a plain tuple.
  • cache_position (torch.LongTensor of shape(sequence_length),optional) —Indices depicting the position of the input sequence tokens in the sequence. Contrarily toposition_ids,this tensor is not affected by padding. It is used to update the cache in the correct position and to inferthe complete sequence length.

Returns

transformers.modeling_outputs.Seq2SeqLMOutput or tuple(torch.FloatTensor)

A transformers.modeling_outputs.Seq2SeqLMOutput or a tuple of torch.FloatTensor (if return_dict=False is passed or when config.return_dict=False) comprising various elements depending on the configuration (PLBartConfig) and inputs.

  • loss (torch.FloatTensor of shape(1,),optional, returned whenlabels is provided) — Language modeling loss.

  • logits (torch.FloatTensor of shape(batch_size, sequence_length, config.vocab_size)) — Prediction scores of the language modeling head (scores for each vocabulary token before SoftMax).

  • past_key_values (EncoderDecoderCache,optional, returned whenuse_cache=True is passed or whenconfig.use_cache=True) — It is aEncoderDecoderCache instance. For more details, see ourkv cache guide.

    Contains pre-computed hidden-states (key and values in the self-attention blocks and in the cross-attentionblocks) that can be used (seepast_key_values input) to speed up sequential decoding.

  • decoder_hidden_states (tuple(torch.FloatTensor),optional, returned whenoutput_hidden_states=True is passed or whenconfig.output_hidden_states=True) — Tuple oftorch.FloatTensor (one for the output of the embeddings, if the model has an embedding layer, +one for the output of each layer) of shape(batch_size, sequence_length, hidden_size).

    Hidden-states of the decoder at the output of each layer plus the initial embedding outputs.

  • decoder_attentions (tuple(torch.FloatTensor),optional, returned whenoutput_attentions=True is passed or whenconfig.output_attentions=True) — Tuple oftorch.FloatTensor (one for each layer) of shape(batch_size, num_heads, sequence_length, sequence_length).

    Attentions weights of the decoder, after the attention softmax, used to compute the weighted average in theself-attention heads.

  • cross_attentions (tuple(torch.FloatTensor),optional, returned whenoutput_attentions=True is passed or whenconfig.output_attentions=True) — Tuple oftorch.FloatTensor (one for each layer) of shape(batch_size, num_heads, sequence_length, sequence_length).

    Attentions weights of the decoder’s cross-attention layer, after the attention softmax, used to compute theweighted average in the cross-attention heads.

  • encoder_last_hidden_state (torch.FloatTensor of shape(batch_size, sequence_length, hidden_size),optional) — Sequence of hidden-states at the output of the last layer of the encoder of the model.

  • encoder_hidden_states (tuple(torch.FloatTensor),optional, returned whenoutput_hidden_states=True is passed or whenconfig.output_hidden_states=True) — Tuple oftorch.FloatTensor (one for the output of the embeddings, if the model has an embedding layer, +one for the output of each layer) of shape(batch_size, sequence_length, hidden_size).

    Hidden-states of the encoder at the output of each layer plus the initial embedding outputs.

  • encoder_attentions (tuple(torch.FloatTensor),optional, returned whenoutput_attentions=True is passed or whenconfig.output_attentions=True) — Tuple oftorch.FloatTensor (one for each layer) of shape(batch_size, num_heads, sequence_length, sequence_length).

    Attentions weights of the encoder, after the attention softmax, used to compute the weighted average in theself-attention heads.

The PLBartForConditionalGeneration forward method overrides the __call__ special method.

Although the recipe for the forward pass needs to be defined within this function, one should call the Module instance afterwards instead of this, since the former takes care of running the pre- and post-processing steps while the latter silently ignores them.

Example Mask-filling:

>>> from transformers import AutoTokenizer, PLBartForConditionalGeneration

>>> model = PLBartForConditionalGeneration.from_pretrained("uclanlp/plbart-base")
>>> tokenizer = AutoTokenizer.from_pretrained("uclanlp/plbart-base")

>>> # en_XX is the language symbol id <LID> for English
>>> TXT = "<s> Is 0 the <mask> Fibonacci number ? </s> en_XX"
>>> input_ids = tokenizer([TXT], add_special_tokens=False, return_tensors="pt").input_ids

>>> logits = model(input_ids).logits
>>> masked_index = (input_ids[0] == tokenizer.mask_token_id).nonzero().item()
>>> probs = logits[0, masked_index].softmax(dim=0)
>>> values, predictions = probs.topk(5)

>>> tokenizer.decode(predictions).split()
['first', 'same', 'highest', 'result', 'number']

PLBartForSequenceClassification

class transformers.PLBartForSequenceClassification

<source>

( config: PLBartConfig, **kwargs )

Parameters

  • config (PLBartConfig) —Model configuration class with all the parameters of the model. Initializing with a config file does notload the weights associated with the model, only the configuration. Check out thefrom_pretrained() method to load the model weights.

PLBart model with a sequence classification head on top (a linear layer on top of the pooled output), e.g. for GLUE tasks.

This model inherits from PreTrainedModel. Check the superclass documentation for the generic methods the library implements for all its models (such as downloading or saving, resizing the input embeddings, pruning heads, etc.).

This model is also a PyTorch torch.nn.Module subclass. Use it as a regular PyTorch Module and refer to the PyTorch documentation for all matters related to general usage and behavior.

forward

<source>

( input_ids: typing.Optional[torch.LongTensor] = None, attention_mask: typing.Optional[torch.Tensor] = None, decoder_input_ids: typing.Optional[torch.LongTensor] = None, decoder_attention_mask: typing.Optional[torch.LongTensor] = None, encoder_outputs: typing.Optional[list[torch.FloatTensor]] = None, inputs_embeds: typing.Optional[torch.FloatTensor] = None, decoder_inputs_embeds: typing.Optional[torch.FloatTensor] = None, labels: typing.Optional[torch.LongTensor] = None, use_cache: typing.Optional[bool] = None, output_attentions: typing.Optional[bool] = None, output_hidden_states: typing.Optional[bool] = None, return_dict: typing.Optional[bool] = None, cache_position: typing.Optional[torch.LongTensor] = None ) → transformers.modeling_outputs.Seq2SeqSequenceClassifierOutput or tuple(torch.FloatTensor)

Parameters

  • input_ids (torch.LongTensor of shape(batch_size, sequence_length),optional) —Indices of input sequence tokens in the vocabulary. Padding will be ignored by default.

    Indices can be obtained usingAutoTokenizer. SeePreTrainedTokenizer.encode() andPreTrainedTokenizer.call() for details.

    What are input IDs?

  • attention_mask (torch.Tensor of shape(batch_size, sequence_length),optional) —Mask to avoid performing attention on padding token indices. Mask values selected in[0, 1]:

    • 1 for tokens that arenot masked,
    • 0 for tokens that aremasked.

    What are attention masks?

  • decoder_input_ids (torch.LongTensor of shape(batch_size, target_sequence_length),optional) —Indices of decoder input sequence tokens in the vocabulary.

    Indices can be obtained usingAutoTokenizer orPLBartMultiTokenizer depending on the checkpoint.SeePreTrainedTokenizer.encode() andPreTrainedTokenizer.call() for details.

    What are decoder input IDs?

    PLBart uses a specific language id token as the starting token fordecoder_input_ids generation thatvaries according to source and target language,e.g. 50003 foren_XX, and 50001 forjava. Ifpast_key_values is used, optionally only the lastdecoder_input_ids have to be input (seepast_key_values).

    For translation and summarization training,decoder_input_ids should be provided. If nodecoder_input_ids is provided, the model will create this tensor by shifting theinput_ids to the rightfor denoising pre-training following the paper.

  • decoder_attention_mask (torch.LongTensor of shape (batch_size, target_sequence_length), optional) — Default behavior: generate a tensor that ignores pad tokens in decoder_input_ids. A causal mask will also be used by default.
  • encoder_outputs (list[torch.FloatTensor],optional) —Tuple consists of (last_hidden_state,optional:hidden_states,optional:attentions)last_hidden_state of shape(batch_size, sequence_length, hidden_size),optional) is a sequence ofhidden-states at the output of the last layer of the encoder. Used in the cross-attention of the decoder.
  • inputs_embeds (torch.FloatTensor of shape(batch_size, sequence_length, hidden_size),optional) —Optionally, instead of passinginput_ids you can choose to directly pass an embedded representation. Thisis useful if you want more control over how to convertinput_ids indices into associated vectors than themodel’s internal embedding lookup matrix.
  • decoder_inputs_embeds (torch.FloatTensor of shape(batch_size, target_sequence_length, hidden_size),optional) —Optionally, instead of passingdecoder_input_ids you can choose to directly pass an embeddedrepresentation. Ifpast_key_values is used, optionally only the lastdecoder_inputs_embeds have to beinput (seepast_key_values). This is useful if you want more control over how to convertdecoder_input_ids indices into associated vectors than the model’s internal embedding lookup matrix.

    Ifdecoder_input_ids anddecoder_inputs_embeds are both unset,decoder_inputs_embeds takes the valueofinputs_embeds.

  • labels (torch.LongTensor of shape(batch_size,),optional) —Labels for computing the sequence classification/regression loss. Indices should be in[0, ..., config.num_labels - 1]. Ifconfig.num_labels > 1 a classification loss is computed (Cross-Entropy).
  • use_cache (bool,optional) —If set toTrue,past_key_values key value states are returned and can be used to speed up decoding (seepast_key_values).
  • output_attentions (bool,optional) —Whether or not to return the attentions tensors of all attention layers. Seeattentions under returnedtensors for more detail.
  • output_hidden_states (bool,optional) —Whether or not to return the hidden states of all layers. Seehidden_states under returned tensors formore detail.
  • return_dict (bool,optional) —Whether or not to return aModelOutput instead of a plain tuple.
  • cache_position (torch.LongTensor of shape(sequence_length),optional) —Indices depicting the position of the input sequence tokens in the sequence. Contrarily toposition_ids,this tensor is not affected by padding. It is used to update the cache in the correct position and to inferthe complete sequence length.

Returns

transformers.modeling_outputs.Seq2SeqSequenceClassifierOutput or tuple(torch.FloatTensor)

A transformers.modeling_outputs.Seq2SeqSequenceClassifierOutput or a tuple of torch.FloatTensor (if return_dict=False is passed or when config.return_dict=False) comprising various elements depending on the configuration (PLBartConfig) and inputs.

  • loss (torch.FloatTensor of shape(1,),optional, returned whenlabel is provided) — Classification (or regression if config.num_labels==1) loss.

  • logits (torch.FloatTensor of shape(batch_size, config.num_labels)) — Classification (or regression if config.num_labels==1) scores (before SoftMax).

  • past_key_values (EncoderDecoderCache,optional, returned whenuse_cache=True is passed or whenconfig.use_cache=True) — It is aEncoderDecoderCache instance. For more details, see ourkv cache guide.

    Contains pre-computed hidden-states (key and values in the self-attention blocks and in the cross-attentionblocks) that can be used (seepast_key_values input) to speed up sequential decoding.

  • decoder_hidden_states (tuple(torch.FloatTensor),optional, returned whenoutput_hidden_states=True is passed or whenconfig.output_hidden_states=True) — Tuple oftorch.FloatTensor (one for the output of the embeddings, if the model has an embedding layer, +one for the output of each layer) of shape(batch_size, sequence_length, hidden_size).

    Hidden-states of the decoder at the output of each layer plus the initial embedding outputs.

  • decoder_attentions (tuple(torch.FloatTensor),optional, returned whenoutput_attentions=True is passed or whenconfig.output_attentions=True) — Tuple oftorch.FloatTensor (one for each layer) of shape(batch_size, num_heads, sequence_length, sequence_length).

    Attentions weights of the decoder, after the attention softmax, used to compute the weighted average in theself-attention heads.

  • cross_attentions (tuple(torch.FloatTensor),optional, returned whenoutput_attentions=True is passed or whenconfig.output_attentions=True) — Tuple oftorch.FloatTensor (one for each layer) of shape(batch_size, num_heads, sequence_length, sequence_length).

    Attentions weights of the decoder’s cross-attention layer, after the attention softmax, used to compute theweighted average in the cross-attention heads.

  • encoder_last_hidden_state (torch.FloatTensor of shape(batch_size, sequence_length, hidden_size),optional) — Sequence of hidden-states at the output of the last layer of the encoder of the model.

  • encoder_hidden_states (tuple(torch.FloatTensor),optional, returned whenoutput_hidden_states=True is passed or whenconfig.output_hidden_states=True) — Tuple oftorch.FloatTensor (one for the output of the embeddings, if the model has an embedding layer, +one for the output of each layer) of shape(batch_size, sequence_length, hidden_size).

    Hidden-states of the encoder at the output of each layer plus the initial embedding outputs.

  • encoder_attentions (tuple(torch.FloatTensor),optional, returned whenoutput_attentions=True is passed or whenconfig.output_attentions=True) — Tuple oftorch.FloatTensor (one for each layer) of shape(batch_size, num_heads, sequence_length, sequence_length).

    Attentions weights of the encoder, after the attention softmax, used to compute the weighted average in theself-attention heads.

The PLBartForSequenceClassification forward method overrides the __call__ special method.

Although the recipe for the forward pass needs to be defined within this function, one should call the Module instance afterwards instead of this, since the former takes care of running the pre- and post-processing steps while the latter silently ignores them.

Example of single-label classification:

>>> import torch
>>> from transformers import AutoTokenizer, PLBartForSequenceClassification

>>> tokenizer = AutoTokenizer.from_pretrained("uclanlp/plbart-base")
>>> model = PLBartForSequenceClassification.from_pretrained("uclanlp/plbart-base")

>>> inputs = tokenizer("Hello, my dog is cute", return_tensors="pt")

>>> with torch.no_grad():
...     logits = model(**inputs).logits

>>> predicted_class_id = logits.argmax().item()
>>> model.config.id2label[predicted_class_id]
...

>>> # To train a model on `num_labels` classes, you can pass `num_labels=num_labels` to `.from_pretrained(...)`
>>> num_labels = len(model.config.id2label)
>>> model = PLBartForSequenceClassification.from_pretrained("uclanlp/plbart-base", num_labels=num_labels)

>>> labels = torch.tensor([1])
>>> loss = model(**inputs, labels=labels).loss
>>> round(loss.item(), 2)
...

Example of multi-label classification:

>>> import torch
>>> from transformers import AutoTokenizer, PLBartForSequenceClassification

>>> tokenizer = AutoTokenizer.from_pretrained("uclanlp/plbart-base")
>>> model = PLBartForSequenceClassification.from_pretrained("uclanlp/plbart-base", problem_type="multi_label_classification")

>>> inputs = tokenizer("Hello, my dog is cute", return_tensors="pt")

>>> with torch.no_grad():
...     logits = model(**inputs).logits

>>> predicted_class_ids = torch.arange(0, logits.shape[-1])[torch.sigmoid(logits).squeeze(dim=0) > 0.5]

>>> # To train a model on `num_labels` classes, you can pass `num_labels=num_labels` to `.from_pretrained(...)`
>>> num_labels = len(model.config.id2label)
>>> model = PLBartForSequenceClassification.from_pretrained(
...     "uclanlp/plbart-base", num_labels=num_labels, problem_type="multi_label_classification"
... )

>>> labels = torch.sum(
...     torch.nn.functional.one_hot(predicted_class_ids[None, :].clone(), num_classes=num_labels), dim=1
... ).to(torch.float)
>>> loss = model(**inputs, labels=labels).loss

PLBartForCausalLM

class transformers.PLBartForCausalLM

<source>

(config)

Parameters

  • config (PLBartConfig) — Model configuration class with all the parameters of the model. Initializing with a config file does not load the weights associated with the model, only the configuration. Check out the from_pretrained() method to load the model weights.

PLBART decoder with a language modeling head on top (linear layer with weights tied to the input embeddings).

This model inherits from PreTrainedModel. Check the superclass documentation for the generic methods the library implements for all its models (such as downloading or saving, resizing the input embeddings, pruning heads, etc.).

This model is also a PyTorch torch.nn.Module subclass. Use it as a regular PyTorch Module and refer to the PyTorch documentation for all matters related to general usage and behavior.

forward

<source>

( input_ids: typing.Optional[torch.LongTensor] = None, attention_mask: typing.Optional[torch.Tensor] = None, encoder_hidden_states: typing.Optional[torch.FloatTensor] = None, encoder_attention_mask: typing.Optional[torch.FloatTensor] = None, past_key_values: typing.Optional[transformers.cache_utils.Cache] = None, inputs_embeds: typing.Optional[torch.FloatTensor] = None, labels: typing.Optional[torch.LongTensor] = None, use_cache: typing.Optional[bool] = None, output_attentions: typing.Optional[bool] = None, output_hidden_states: typing.Optional[bool] = None, return_dict: typing.Optional[bool] = None, cache_position: typing.Optional[torch.LongTensor] = None, logits_to_keep: typing.Union[int, torch.Tensor] = 0 ) → transformers.modeling_outputs.CausalLMOutputWithCrossAttentions or tuple(torch.FloatTensor)

Parameters

  • input_ids (torch.LongTensor of shape(batch_size, sequence_length),optional) —Indices of input sequence tokens in the vocabulary. Padding will be ignored by default.

    Indices can be obtained usingAutoTokenizer. SeePreTrainedTokenizer.encode() andPreTrainedTokenizer.call() for details.

    What are input IDs?

  • attention_mask (torch.Tensor of shape(batch_size, sequence_length),optional) —Mask to avoid performing attention on padding token indices. Mask values selected in[0, 1]:

    • 1 for tokens that arenot masked,
    • 0 for tokens that aremasked.

    What are attention masks?

  • encoder_hidden_states (torch.FloatTensor of shape(batch_size, sequence_length, hidden_size),optional) —Sequence of hidden-states at the output of the last layer of the encoder. Used in the cross-attentionif the model is configured as a decoder.
  • encoder_attention_mask (torch.FloatTensor of shape(batch_size, sequence_length),optional) —Mask to avoid performing attention on the padding token indices of the encoder input. This mask is used inthe cross-attention if the model is configured as a decoder. Mask values selected in[0, 1]:

    • 1 for tokens that arenot masked,
    • 0 for tokens that aremasked.
  • past_key_values (~cache_utils.Cache,optional) —Pre-computed hidden-states (key and values in the self-attention blocks and in the cross-attentionblocks) that can be used to speed up sequential decoding. This typically consists in thepast_key_valuesreturned by the model at a previous stage of decoding, whenuse_cache=True orconfig.use_cache=True.

    OnlyCache instance is allowed as input, see ourkv cache guide.If nopast_key_values are passed,DynamicCache will be initialized by default.

    The model will output the same cache format that is fed as input.

    Ifpast_key_values are used, the user is expected to input only unprocessedinput_ids (those that don’thave their past key value states given to this model) of shape(batch_size, unprocessed_length) instead of allinput_idsof shape(batch_size, sequence_length).

  • inputs_embeds (torch.FloatTensor of shape(batch_size, sequence_length, hidden_size),optional) —Optionally, instead of passinginput_ids you can choose to directly pass an embedded representation. Thisis useful if you want more control over how to convertinput_ids indices into associated vectors than themodel’s internal embedding lookup matrix.
  • labels (torch.LongTensor of shape(batch_size, sequence_length),optional) —Labels for computing the masked language modeling loss. Indices should either be in[0, ..., config.vocab_size] or -100 (seeinput_ids docstring). Tokens with indices set to-100 are ignored(masked), the loss is only computed for the tokens with labels in[0, ..., config.vocab_size].
  • use_cache (bool,optional) —If set toTrue,past_key_values key value states are returned and can be used to speed up decoding (seepast_key_values).
  • output_attentions (bool,optional) —Whether or not to return the attentions tensors of all attention layers. Seeattentions under returnedtensors for more detail.
  • output_hidden_states (bool,optional) —Whether or not to return the hidden states of all layers. Seehidden_states under returned tensors formore detail.
  • return_dict (bool,optional) —Whether or not to return aModelOutput instead of a plain tuple.
  • cache_position (torch.LongTensor of shape(sequence_length),optional) —Indices depicting the position of the input sequence tokens in the sequence. Contrarily toposition_ids,this tensor is not affected by padding. It is used to update the cache in the correct position and to inferthe complete sequence length.
  • logits_to_keep (Union[int, torch.Tensor], defaults to0) —If anint, compute logits for the lastlogits_to_keep tokens. If0, calculate logits for allinput_ids (special case). Only last token logits are needed for generation, and calculating them only for thattoken can save memory, which becomes pretty significant for long sequences or large vocabulary size.If atorch.Tensor, must be 1D corresponding to the indices to keep in the sequence length dimension.This is useful when using packed tensor format (single dimension for batch and sequence length).

Returns

transformers.modeling_outputs.CausalLMOutputWithCrossAttentions or tuple(torch.FloatTensor)

A transformers.modeling_outputs.CausalLMOutputWithCrossAttentions or a tuple of torch.FloatTensor (if return_dict=False is passed or when config.return_dict=False) comprising various elements depending on the configuration (PLBartConfig) and inputs.

  • loss (torch.FloatTensor of shape(1,),optional, returned whenlabels is provided) — Language modeling loss (for next-token prediction).

  • logits (torch.FloatTensor of shape(batch_size, sequence_length, config.vocab_size)) — Prediction scores of the language modeling head (scores for each vocabulary token before SoftMax).

  • hidden_states (tuple(torch.FloatTensor),optional, returned whenoutput_hidden_states=True is passed or whenconfig.output_hidden_states=True) — Tuple oftorch.FloatTensor (one for the output of the embeddings, if the model has an embedding layer, +one for the output of each layer) of shape(batch_size, sequence_length, hidden_size).

    Hidden-states of the model at the output of each layer plus the optional initial embedding outputs.

  • attentions (tuple(torch.FloatTensor),optional, returned whenoutput_attentions=True is passed or whenconfig.output_attentions=True) — Tuple oftorch.FloatTensor (one for each layer) of shape(batch_size, num_heads, sequence_length, sequence_length).

    Attentions weights after the attention softmax, used to compute the weighted average in the self-attentionheads.

  • cross_attentions (tuple(torch.FloatTensor),optional, returned whenoutput_attentions=True is passed or whenconfig.output_attentions=True) — Tuple oftorch.FloatTensor (one for each layer) of shape(batch_size, num_heads, sequence_length, sequence_length).

    Cross attentions weights after the attention softmax, used to compute the weighted average in thecross-attention heads.

  • past_key_values (Cache,optional, returned whenuse_cache=True is passed or whenconfig.use_cache=True) — It is aCache instance. For more details, see ourkv cache guide.

    Contains pre-computed hidden-states (key and values in the attention blocks) that can be used (seepast_key_values input) to speed up sequential decoding.

The PLBartForCausalLM forward method overrides the __call__ special method.

Although the recipe for the forward pass needs to be defined within this function, one should call the Module instance afterwards instead of this, since the former takes care of running the pre- and post-processing steps while the latter silently ignores them.

Example:

>>> from transformers import AutoTokenizer, PLBartForCausalLM

>>> tokenizer = AutoTokenizer.from_pretrained("uclanlp/plbart-base")
>>> model = PLBartForCausalLM.from_pretrained("uclanlp/plbart-base", add_cross_attention=False)
>>> assert model.config.is_decoder, f"{model.__class__} has to be configured as a decoder."

>>> inputs = tokenizer("Hello, my dog is cute", return_tensors="pt")
>>> outputs = model(**inputs)

>>> logits = outputs.logits
>>> expected_shape = [1, inputs.input_ids.shape[-1], model.config.vocab_size]
>>> list(logits.shape) == expected_shape
True
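
To make the role of past_key_values concrete, here is a hand-rolled greedy decoding loop that reuses the key/value cache between steps (a sketch for illustration only; generate() is the recommended way to do this, and the prompt and step count are arbitrary):

>>> import torch
>>> from transformers import AutoTokenizer, PLBartForCausalLM

>>> tokenizer = AutoTokenizer.from_pretrained("uclanlp/plbart-base")
>>> model = PLBartForCausalLM.from_pretrained("uclanlp/plbart-base", add_cross_attention=False)

>>> input_ids = tokenizer("def maximum(a,b,c):", return_tensors="pt").input_ids
>>> past_key_values = None
>>> for _ in range(5):
...     # After the first step, only the newly generated token is fed; the cache supplies the rest
...     step_input = input_ids if past_key_values is None else input_ids[:, -1:]
...     outputs = model(input_ids=step_input, past_key_values=past_key_values, use_cache=True)
...     past_key_values = outputs.past_key_values
...     next_token = outputs.logits[:, -1, :].argmax(dim=-1, keepdim=True)
...     input_ids = torch.cat([input_ids, next_token], dim=-1)

>>> generated = tokenizer.decode(input_ids[0], skip_special_tokens=True)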

