Machine Learning Glossary: Generative AI

  • Generative AI encompasses models that produce original content like text and images, using techniques like pre-training and fine-tuning to achieve this.

  • Prompt engineering is crucial for guiding LLMs, involving crafting effective prompts to elicit desired outputs, with methods like zero-shot, one-shot, and few-shot prompting.

  • Fine-tuning techniques, including distillation, instruction tuning, and parameter-efficient methods like LoRA, adapt pre-trained models for specific tasks while optimizing performance and resource usage.

  • LLMs utilize context windows to process information, employ contextualized language embeddings for nuanced understanding, and can be integrated into systems like RAG and model cascading for enhanced capabilities.

  • Prompt-based learning allows LLMs to adapt to tasks without extensive retraining, leveraging their knowledge base to respond to various instructions and inputs.

This page contains Generative AI glossary terms.

A

adaptation

#generativeAI

Synonym for tuning or fine-tuning.

agent

#generativeAI

Software that can reason about multimodal user inputs in order to plan and execute actions on behalf of the user.

In reinforcement learning, an agent is the entity that uses a policy to maximize the expected return gained from transitioning between states of the environment.

agentic

#generativeAI

The adjective form of agent. Agentic refers to the qualities that agents possess (such as autonomy).

agentic workflow

#generativeAI

A dynamic process in which an agent autonomously plans and executes actions to achieve a goal. The process may involve reasoning, invoking external tools, and self-correcting its plan.

AI slop

#generativeAI

Output from a generative AI system that favors quantity over quality. For example, a web page with AI slop is filled with cheaply produced, AI-generated, low-quality content.

automatic evaluation

#generativeAI

Using software to judge the quality of a model's output.

When model output is relatively straightforward, a script or program can compare the model's output to a golden response. This type of automatic evaluation is sometimes called programmatic evaluation. Metrics such as ROUGE or BLEU are often useful for programmatic evaluation.
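The programmatic case can be sketched in a few lines of Python (a minimal illustration; `programmatic_eval` and its whitespace/case normalization are assumptions for this sketch, not a standard API):

```python
# A sketch of programmatic evaluation: when there is one right answer,
# a script can simply compare model output to the golden response.

def programmatic_eval(model_output: str, golden_response: str) -> bool:
    # Normalizing whitespace and case is an assumed preprocessing step.
    return model_output.strip().lower() == golden_response.strip().lower()

print(programmatic_eval("NaCl ", "nacl"))   # True
print(programmatic_eval("KCl", "NaCl"))     # False
```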

When model output is complex or has no one right answer, a separate ML program called an autorater sometimes performs the automatic evaluation.

Contrast with human evaluation.

autorater evaluation

#generativeAI

A hybrid mechanism for judging the quality of a generative AI model's output that combines human evaluation with automatic evaluation. An autorater is an ML model trained on data created by human evaluation. Ideally, an autorater learns to mimic a human evaluator.

Prebuilt autoraters are available, but the best autoraters are fine-tuned specifically to the task you are evaluating.

Note: A running autorater is a fully automated process; humans "only" provide data that helps train an autorater.

auto-regressive model

#generativeAI

A model that infers a prediction based on its own previous predictions. For example, auto-regressive language models predict the next token based on the previously predicted tokens. All Transformer-based large language models are auto-regressive.

In contrast, GAN-based image models are usually not auto-regressive since they generate an image in a single forward pass rather than iteratively in steps. However, certain image generation models are auto-regressive because they generate an image in steps.
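The iterative loop at the heart of auto-regression can be sketched as follows (a toy illustration; `next_token` stands in for a real model's prediction step and is hard-coded here as a tiny bigram table):

```python
# Toy sketch of auto-regressive generation: each new token is predicted
# from the tokens generated so far, and the prediction then becomes part
# of the context for the next prediction.

def next_token(context: list[str]) -> str:
    # A hypothetical "model": a hard-coded bigram lookup table.
    bigrams = {"the": "cat", "cat": "sat", "sat": "down"}
    return bigrams.get(context[-1], "<eos>")

def generate(prompt: list[str], max_tokens: int = 10) -> list[str]:
    tokens = list(prompt)
    for _ in range(max_tokens):
        token = next_token(tokens)   # condition on all previous tokens
        if token == "<eos>":
            break
        tokens.append(token)         # prediction joins the context
    return tokens

print(generate(["the"]))  # ['the', 'cat', 'sat', 'down']
```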

B

base model

#generativeAI

A pre-trained model that can serve as the starting point for fine-tuning to address specific tasks or applications.

See also pre-trained model and foundation model.

C

chain-of-thought prompting

#generativeAI

A prompt engineering technique that encourages a large language model (LLM) to explain its reasoning, step by step. For example, consider the following prompt, paying particular attention to the second sentence:

How many g forces would a driver experience in a car that goes from 0 to 60 miles per hour in 7 seconds? In the answer, show all relevant calculations.

The LLM's response would likely:

  • Show a sequence of physics formulas, plugging in the values 0, 60, and 7 in appropriate places.
  • Explain why it chose those formulas and what the various variables mean.

Chain-of-thought prompting forces the LLM to perform all the calculations, which might lead to a more correct answer. In addition, chain-of-thought prompting enables the user to examine the LLM's steps to determine whether or not the answer makes sense.
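For reference, the arithmetic the example prompt asks for works out as follows (a quick sketch using standard unit conversions):

```python
# The calculation the prompt asks the LLM to show, worked out directly:
# convert 60 mph to m/s, compute the average acceleration, divide by g.

MPH_TO_MS = 1609.344 / 3600   # metres per second in one mile per hour
g = 9.80665                   # standard gravity, m/s^2

v = 60 * MPH_TO_MS            # final speed: ~26.82 m/s
a = v / 7                     # average acceleration over 7 s: ~3.83 m/s^2
g_force = a / g               # acceleration expressed in g

print(round(g_force, 2))  # 0.39
```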

chat

#generativeAI

The contents of a back-and-forth dialogue with an ML system, typically a large language model. The previous interaction in a chat (what you typed and how the large language model responded) becomes the context for subsequent parts of the chat.

A chatbot is an application of a large language model.

contextualized language embedding

#generativeAI

An embedding that comes close to "understanding" words and phrases in ways that fluent human speakers can. Contextualized language embeddings can understand complex syntax, semantics, and context.

For example, consider embeddings of the English word cow. Older embeddings such as word2vec can represent English words such that the distance in the embedding space from cow to bull is similar to the distance from ewe (female sheep) to ram (male sheep) or from female to male. Contextualized language embeddings can go a step further by recognizing that English speakers sometimes casually use the word cow to mean either cow or bull.

context window

#generativeAI

The number of tokens a model can process in a given prompt. The larger the context window, the more information the model can use to provide coherent and consistent responses to the prompt.

conversational coding

#generativeAI

An iterative dialog between you and a generative AI model for the purpose of creating software. You issue a prompt describing some software. Then, the model uses that description to generate code. Then, you issue a new prompt to address the flaws in the previous prompt or in the generated code, and the model generates updated code. You two keep going back and forth until the generated software is good enough.

Conversational coding is essentially the original meaning of vibe coding.

Contrast with specificational coding.

D

direct prompting

#generativeAI

Synonym for zero-shot prompting.

distillation

#generativeAI

The process of reducing the size of one model (known as the teacher) into a smaller model (known as the student) that emulates the original model's predictions as faithfully as possible. Distillation is useful because the smaller model has two key benefits over the larger model (the teacher):

  • Faster inference time
  • Reduced memory and energy usage

However, the student's predictions are typically not as good asthe teacher's predictions.

Distillation trains the student model to minimize a loss function based on the difference between the predictions of the student and teacher models.

Compare and contrast distillation with the following terms:

See LLMs: Fine-tuning, distillation, and prompt engineering in Machine Learning Crash Course for more information.
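One common formulation of the distillation objective can be sketched as follows (an assumption about a typical setup, not the definitive recipe): the student minimizes the KL divergence between the teacher's and the student's output distributions.

```python
# A minimal sketch of a distillation loss: the student's distribution
# is pulled toward the teacher's "soft" target distribution.
import math

def softmax(logits):
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def distillation_loss(teacher_logits, student_logits):
    p = softmax(teacher_logits)   # teacher's soft targets
    q = softmax(student_logits)   # student's predictions
    # KL(p || q): zero when the student matches the teacher exactly.
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q))

perfect = distillation_loss([2.0, 0.5, -1.0], [2.0, 0.5, -1.0])
off = distillation_loss([2.0, 0.5, -1.0], [0.0, 0.0, 0.0])
print(round(perfect, 6), round(off, 3))
```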

E

evals

#generativeAI
#Metric

Primarily used as an abbreviation for LLM evaluations. More broadly, evals is an abbreviation for any form of evaluation.

evaluation

#generativeAI
#Metric

The process of measuring a model's quality or comparing different modelsagainst each other.

To evaluate a supervised machine learning model, you typically judge it against a validation set and a test set. Evaluating an LLM typically involves broader quality and safety assessments.

F

factuality

#generativeAI

Within the ML world, a property describing a model whose output is based on reality. Factuality is a concept rather than a metric. For example, suppose you send the following prompt to a large language model:

What is the chemical formula for table salt?

A model optimized for factuality would respond:

NaCl

It is tempting to assume that all models should be based on factuality. However, some prompts, such as the following, should cause a generative AI model to optimize creativity rather than factuality.

Tell me a limerick about an astronaut and a caterpillar.

It is unlikely that the resulting limerick would be based on reality.

Contrast with groundedness.

fast decay

#generativeAI

A training technique to improve the performance of LLMs. Fast decay involves rapidly decreasing the learning rate during training. This strategy helps prevent the model from overfitting to the training data and improves generalization.

few-shot prompting

#generativeAI

A prompt that contains more than one (a "few") example demonstrating how the large language model should respond. For example, the following lengthy prompt contains two examples showing a large language model how to answer a query.

| Parts of one prompt | Notes |
|---|---|
| What is the official currency of the specified country? | The question you want the LLM to answer. |
| France: EUR | One example. |
| United Kingdom: GBP | Another example. |
| India: | The actual query. |

Few-shot prompting generally produces more desirable results than zero-shot prompting and one-shot prompting. However, few-shot prompting requires a lengthier prompt.

Few-shot prompting is a form of few-shot learning applied to prompt-based learning.

See Prompt engineering in Machine Learning Crash Course for more information.
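Assembling such a prompt programmatically can be sketched like this (`build_few_shot_prompt` is a hypothetical helper for illustration, not a real API):

```python
# A sketch of how a few-shot prompt could be assembled: an instruction,
# then example input/output pairs, then the actual query left open.

def build_few_shot_prompt(instruction, examples, query):
    lines = [instruction]
    for inp, out in examples:      # each example shows the desired format
        lines.append(f"{inp}: {out}")
    lines.append(f"{query}:")      # the actual query, for the LLM to complete
    return "\n".join(lines)

prompt = build_few_shot_prompt(
    "What is the official currency of the specified country?",
    [("France", "EUR"), ("United Kingdom", "GBP")],
    "India",
)
print(prompt)
```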

fine-tuning

#generativeAI

A second, task-specific training pass performed on a pre-trained model to refine its parameters for a specific use case. For example, the full training sequence for some large language models is as follows:

  1. Pre-training: Train a large language model on a vast general dataset, such as all the English language Wikipedia pages.
  2. Fine-tuning: Train the pre-trained model to perform a specific task, such as responding to medical queries. Fine-tuning typically involves hundreds or thousands of examples focused on the specific task.

As another example, the full training sequence for a large image model is asfollows:

  1. Pre-training: Train a large image model on a vast general image dataset, such as all the images in Wikimedia Commons.
  2. Fine-tuning: Train the pre-trained model to perform a specific task, such as generating images of orcas.

Fine-tuning can entail any combination of the following strategies:

  • Modifying all of the pre-trained model's existing parameters. This is sometimes called full fine-tuning.
  • Modifying only some of the pre-trained model's existing parameters (typically, the layers closest to the output layer), while keeping other existing parameters unchanged (typically, the layers closest to the input layer). See parameter-efficient tuning.
  • Adding more layers, typically on top of the existing layers closest to the output layer.

Fine-tuning is a form of transfer learning. As such, fine-tuning might use a different loss function or a different model type than those used to train the pre-trained model. For example, you could fine-tune a pre-trained large image model to produce a regression model that returns the number of birds in an input image.

Compare and contrast fine-tuning with the following terms:

See Fine-tuning in Machine Learning Crash Course for more information.

Flash model

#generativeAI

A family of relatively small Gemini models optimized for speed and low latency. Flash models are designed for a wide range of applications where quick responses and high throughput are crucial.

foundation model

#generativeAI
#Metric

A very large pre-trained model trained on an enormous and diverse training set. A foundation model can do both of the following:

  • Respond well to a wide range of requests.
  • Serve as a base model for additional fine-tuning or other customization.

In other words, a foundation model is already very capable in a general sensebut can be further customized to become even more useful for a specific task.

fraction of successes

#generativeAI
#Metric

A metric for evaluating an ML model's generated text. The fraction of successes is the number of "successful" generated text outputs divided by the total number of generated text outputs. For example, if a large language model generated 10 blocks of code, five of which were successful, then the fraction of successes would be 50%.

Although fraction of successes is broadly useful throughout statistics, within ML this metric is primarily useful for measuring verifiable tasks like code generation or math problems.
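The metric itself is a one-line computation; here is the example above as a runnable sketch:

```python
# A sketch of the example above: 5 successful outputs out of 10.

def fraction_of_successes(outcomes):
    # `outcomes` holds one boolean per generated output.
    return sum(outcomes) / len(outcomes)

outcomes = [True] * 5 + [False] * 5   # 5 of 10 generated code blocks succeeded
print(fraction_of_successes(outcomes))  # 0.5
```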

G

Gemini

#generativeAI

The ecosystem comprising Google's most advanced AI. Elements of this ecosystem include:

  • Various Gemini models.
  • The interactive conversational interface to a Gemini model. Users type prompts and Gemini responds to those prompts.
  • Various Gemini APIs.
  • Various business products based on Gemini models; for example, Gemini for Google Cloud.

Gemini models

#generativeAI

Google's state-of-the-art Transformer-based multimodal models. Gemini models are specifically designed to integrate with agents.

Users can interact with Gemini models in a variety of ways, including through an interactive dialog interface and through SDKs.

Gemma

#generativeAI

A family of lightweight open models built from the same research and technology used to create the Gemini models. Several different Gemma models are available, each providing different features, such as vision, code, and instruction following. See Gemma for details.

GenAI or genAI

#generativeAI

Abbreviation for generative AI.

generated text

#generativeAI

In general, the text that an ML model outputs. When evaluating large language models, some metrics compare generated text against reference text. For example, suppose you are trying to determine how effectively an ML model translates from French to Dutch. In this case:

  • The generated text is the Dutch translation that the ML model outputs.
  • The reference text is the Dutch translation that a human translator (or software) creates.

Note that some evaluation strategies don't involve reference text.

generative AI

#generativeAI

An emerging transformative field with no formal definition. That said, most experts agree that generative AI models can create ("generate") content that is all of the following:

  • complex
  • coherent
  • original

Examples of generative AI include:

  • Large language models, which can generate sophisticated original text and answer questions.
  • Image generation models, which can produce unique images.
  • Audio and music generation models, which can compose original music or generate realistic speech.
  • Video generation models, which can generate original videos.

Some earlier technologies, including LSTMs and RNNs, can also generate original and coherent content. Some experts view these earlier technologies as generative AI, while others feel that true generative AI requires more complex output than those earlier technologies can produce.

Contrast with predictive ML.

golden response

#generativeAI

A response known to be good. For example, given the following prompt:

2 + 2

The golden response is hopefully:

4

Note: Some organizations define additional terms such as silver response and platinum response for responses of lower or higher quality, respectively, than the golden response. For example, an organization might use platinum response to indicate a golden response generated by an expert and then further vetted by other experts.

Notes on golden responses and reference text:

Some evaluation metrics, such as ROUGE, compare reference text to a model's generated text. When there is a single right answer to a prompt, the golden response typically serves as the reference text.

Some prompts have no one right answer. For example, the prompt Summarize this document would likely have many right answers. For such prompts, reference text is often impractical because a model can generate a very wide range of possible summaries. However, a golden response might be helpful in this situation. For example, a golden response containing a good document summary can help train an autorater to discover patterns of good document summaries.


GPT (Generative Pre-trained Transformer)

#generativeAI

A family of Transformer-based large language models developed by OpenAI.

GPT variants can apply to multiple modalities, including:

  • image generation (for example, ImageGPT)
  • text-to-image generation (for example, DALL-E)

H

hallucination

#generativeAI

The production of plausible-seeming but factually incorrect output by a generative AI model that purports to be making an assertion about the real world. For example, a generative AI model that claims that Barack Obama died in 1865 is hallucinating.

human evaluation

#generativeAI

A process in which people judge the quality of an ML model's output; for example, having bilingual people judge the quality of an ML translation model. Human evaluation is particularly useful for judging models that have no one right answer.

Contrast with automatic evaluation and autorater evaluation.

human in the loop (HITL)

#generativeAI

A loosely-defined idiom that could mean either of the following:

  • A policy of viewing generative AI output critically or skeptically.
  • A strategy or system for ensuring that people help shape, evaluate, and refine a model's behavior. Keeping a human in the loop enables an AI to benefit from both machine intelligence and human intelligence. For example, a system in which an AI generates code that software engineers then review is a human-in-the-loop system.

I

in-context learning

#generativeAI

Synonym for few-shot prompting.

inference

#fundamentals
#generativeAI

In traditional machine learning, the process of making predictions by applying a trained model to unlabeled examples. See Supervised Learning in the Intro to ML course to learn more.

In large language models, inference is the process of using a trained model to generate a response to an input prompt.

Inference has a somewhat different meaning in statistics. See the Wikipedia article on statistical inference for details.

instruction tuning

#generativeAI

A form of fine-tuning that improves a generative AI model's ability to follow instructions. Instruction tuning involves training a model on a series of instruction prompts, typically covering a wide variety of tasks. The resulting instruction-tuned model then tends to generate useful responses to zero-shot prompts across a variety of tasks.

Compare and contrast with:

L

large language model

#generativeAI

At a minimum, a language model having a very high number of parameters. More informally, any Transformer-based language model, such as Gemini or GPT.

See Large language models (LLMs) in Machine Learning Crash Course for more information.

latency

#generativeAI

The time it takes for a model to process input and generate a response. A high-latency response takes longer to generate than a low-latency response.

Factors that influence the latency of large language models include:

  • Input and output token lengths
  • Model complexity
  • The infrastructure the model runs on

Optimizing for latency is crucial for creating responsive and user-friendlyapplications.

LLM

#generativeAI

Abbreviation for large language model.

LLM evaluations (evals)

#generativeAI
#Metric

A set of metrics and benchmarks for assessing the performance of large language models (LLMs). At a high level, LLM evaluations:

  • Help researchers identify areas where LLMs need improvement.
  • Are useful in comparing different LLMs and identifying the best LLM for a particular task.
  • Help ensure that LLMs are safe and ethical to use.

See Large language models (LLMs) in Machine Learning Crash Course for more information.

LoRA

#generativeAI

Abbreviation for Low-Rank Adaptability.

Low-Rank Adaptability (LoRA)

#generativeAI

A parameter-efficient technique for fine-tuning that "freezes" the model's pre-trained weights (such that they can no longer be modified) and then inserts a small set of trainable weights into the model. This set of trainable weights (also known as "update matrixes") is considerably smaller than the base model and is therefore much faster to train.

LoRA provides the following benefits:

  • Improves the quality of a model's predictions for the domain where the fine-tuning is applied.
  • Fine-tunes faster than techniques that require fine-tuning all of a model's parameters.
  • Reduces the computational cost of inference by enabling concurrent serving of multiple specialized models sharing the same base model.


The update matrixes used in LoRA consist of rank decomposition matrixes, which are derived from the base model to help filter out noise and focus training on the most important features of the model.
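The low-rank idea can be sketched numerically (a schematic, not a real implementation): the frozen weight matrix W is augmented by the product of two small trainable matrices B and A of rank r, so only 2·d·r parameters train instead of d·d.

```python
# Schematic of LoRA's low-rank update: effective weights = W + B @ A,
# where W is frozen and only the small matrices B (d x r) and A (r x d)
# are trainable.

def matmul(X, Y):
    return [[sum(X[i][k] * Y[k][j] for k in range(len(Y)))
             for j in range(len(Y[0]))] for i in range(len(X))]

d = 4   # model dimension (tiny, for illustration)
r = 1   # the "low rank", far smaller than d

W = [[1.0 if i == j else 0.0 for j in range(d)] for i in range(d)]  # frozen
B = [[0.1], [0.2], [0.0], [0.0]]   # d x r, trainable
A = [[1.0, 0.0, 0.0, 1.0]]         # r x d, trainable

delta = matmul(B, A)               # d x d update from only 2*d*r parameters
W_eff = [[W[i][j] + delta[i][j] for j in range(d)] for i in range(d)]

# Trainable parameters: d*r + r*d = 8, versus d*d = 16 for full fine-tuning.
print(W_eff[0])
```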

M

machine translation

#generativeAI

Using software (typically, a machine learning model) to convert text from one human language to another human language, for example, from English to Japanese.

mean average precision at k (mAP@k)

#generativeAI
#Metric

The statistical mean of all average precision at k scores across a validation dataset. One use of mean average precision at k is to judge the quality of recommendations generated by a recommendation system.

Although the phrase "mean average" sounds redundant, the name of the metric is appropriate. After all, this metric finds the mean of multiple average precision at k values.


Suppose you build a recommendation system that generates a personalized list of recommended novels for each user. Based on feedback from selected users, you calculate the following five average precision at k scores (one score per user):

  • 0.73
  • 0.77
  • 0.67
  • 0.82
  • 0.76

The mean average precision at k is therefore:

$$\text{mAP@k} = \frac{0.73 + 0.77 + 0.67 + 0.82 + 0.76}{5} = 0.75$$
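The same computation as a runnable sketch:

```python
# mAP@k is just the arithmetic mean of the per-user average precision
# at k scores from the example above.
ap_at_k_scores = [0.73, 0.77, 0.67, 0.82, 0.76]   # one score per user
map_at_k = sum(ap_at_k_scores) / len(ap_at_k_scores)
print(round(map_at_k, 2))  # 0.75
```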

mixture of experts

#generativeAI

A scheme to increase neural network efficiency by using only a subset of its parameters (known as an expert) to process a given input token or example. A gating network routes each input token or example to the proper expert(s).
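Top-1 routing can be sketched as follows (illustrative only; real gating networks and experts are learned neural networks, and the gating rule here is invented):

```python
# A minimal sketch of top-1 mixture-of-experts routing: a gate scores
# each expert for a given input, and only the winning expert runs.
import math

def softmax(logits):
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    s = sum(exps)
    return [e / s for e in exps]

experts = [lambda x: x + 1, lambda x: x * 10]   # stand-in expert networks

def gate(x):
    # Hypothetical gating rule: prefer the second expert for large inputs.
    return softmax([1.0, x])

def moe_forward(x):
    weights = gate(x)
    chosen = max(range(len(experts)), key=lambda i: weights[i])
    return experts[chosen](x)   # only one expert's parameters are used

print(moe_forward(0.0), moe_forward(5.0))
```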

For details, see either of the following papers:

MMIT

#generativeAI

Abbreviation for multimodal instruction-tuned.

model cascading

#generativeAI

A system that picks the ideal model for a specific inference query.

Imagine a group of models, ranging from very large (lots of parameters) to much smaller (far fewer parameters). Very large models consume more computational resources at inference time than smaller models. However, very large models can typically infer more complex requests than smaller models. Model cascading determines the complexity of the inference query and then picks the appropriate model to perform the inference. The main motivation for model cascading is to reduce inference costs by generally selecting smaller models, and only selecting a larger model for more complex queries.

Imagine that a small model runs on a phone and a larger version of that model runs on a remote server. Good model cascading reduces cost and latency by enabling the smaller model to handle simple requests and only calling the remote model to handle complex requests.
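A cascading router can be sketched like this (the complexity heuristic and both model functions are invented for illustration; real systems typically use a learned model router):

```python
# A sketch of model cascading: cheap queries go to the small on-device
# model, complex ones to the large remote model.

def small_model(query):
    return f"small-model answer to: {query}"

def large_model(query):
    return f"large-model answer to: {query}"

def looks_complex(query):
    # Hypothetical heuristic: long, multi-clause queries count as complex.
    return len(query.split()) > 8

def cascade(query):
    model = large_model if looks_complex(query) else small_model
    return model(query)

print(cascade("What time is it?"))
print(cascade("Compare three approaches to scheduling threads on a NUMA machine"))
```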

See alsomodel router.

model router

#generativeAI

The algorithm that determines the ideal model for inference in model cascading. A model router is itself typically a machine learning model that gradually learns how to pick the best model for a given input. However, a model router could sometimes be a simpler, non-machine learning algorithm.

MOE

#generativeAI

Abbreviation for mixture of experts.

MT

#generativeAI

Abbreviation for machine translation.

N

Nano

#generativeAI

A relatively small Gemini model designed for on-device use. See Gemini Nano for details.

See also Pro and Ultra.

no one right answer (NORA)

#generativeAI

A prompt having multiple correct responses. For example, the following prompt has no one right answer:

Tell me a funny joke about elephants.

Evaluating the responses to no one right answer prompts is usually far more subjective than evaluating prompts with one right answer. For example, evaluating an elephant joke requires a systematic way to determine how funny the joke is.

NORA

#generativeAI

Abbreviation for no one right answer.

Notebook LM

#generativeAI

A Gemini-based tool that enables users to upload documents and then use prompts to ask questions about, summarize, or organize those documents. For example, an author could upload several short stories and ask Notebook LM to find their common themes or to identify which one would make the best movie.

O

one right answer (ORA)

#generativeAI

A prompt having a single correct response. For example, consider the following prompt:

True or false: Saturn is bigger than Mars.

The only correct response is true.

Contrast with no one right answer.

one-shot prompting

#generativeAI

A prompt that contains one example demonstrating how the large language model should respond. For example, the following prompt contains one example showing a large language model how it should answer a query.

| Parts of one prompt | Notes |
|---|---|
| What is the official currency of the specified country? | The question you want the LLM to answer. |
| France: EUR | One example. |
| India: | The actual query. |

Compare and contrast one-shot prompting with the following terms:

ORA

#generativeAI

Abbreviation for one right answer.

P

parameter-efficient tuning

#generativeAI

A set of techniques to fine-tune a large pre-trained language model (PLM) more efficiently than full fine-tuning. Parameter-efficient tuning typically fine-tunes far fewer parameters than full fine-tuning, yet generally produces a large language model that performs as well (or almost as well) as a large language model built from full fine-tuning.

Compare and contrast parameter-efficient tuning with:

Parameter-efficient tuning is also known as parameter-efficient fine-tuning.

Pax

#generativeAI

A programming framework designed for training large-scale neural network models so large that they span multiple TPU accelerator chip slices or pods.

Pax is built on Flax, which is built on JAX.

[Diagram: Pax's position in the software stack. Pax is built on top of JAX and consists of three layers: TensorStore and Flax at the bottom, Optax and Flaxformer in the middle, and the Praxis Modeling Library on top. Fiddle is built on top of Pax.]

PLM

#generativeAI

Abbreviation for pre-trained language model.

post-trained model

#generativeAI

A loosely defined term that typically refers to a pre-trained model that has gone through some post-processing, such as one or more of the following:

pre-trained model

#generativeAI

Although this term could refer to any trained model or trained embedding vector, pre-trained model now typically refers to a trained large language model or other form of trained generative AI model.

See also base model and foundation model.

pre-training

#generativeAI

The initial training of a model on a large dataset. Some pre-trained models are clumsy giants and must typically be refined through additional training. For example, ML experts might pre-train a large language model on a vast text dataset, such as all the English pages in Wikipedia. Following pre-training, the resulting model might be further refined through any of the following techniques:

Pro

#generativeAI

A Gemini model with fewer parameters than Ultra but more parameters than Nano. See Gemini Pro for details.

prompt

#generativeAI

Any text entered as input to a large language model to condition the model to behave in a certain way. Prompts can be as short as a phrase or arbitrarily long (for example, the entire text of a novel). Prompts fall into multiple categories, including those shown in the following table:

| Prompt category | Example | Notes |
|---|---|---|
| Question | How fast can a pigeon fly? | |
| Instruction | Write a funny poem about arbitrage. | A prompt that asks the large language model to do something. |
| Example | Translate Markdown code to HTML. For example: Markdown: `* list item` HTML: `<ul> <li>list item</li> </ul>` | The first sentence in this example prompt is an instruction. The remainder of the prompt is the example. |
| Role | Explain why gradient descent is used in machine learning training to a PhD in Physics. | The first part of the sentence is an instruction; the phrase "to a PhD in Physics" is the role portion. |
| Partial input for the model to complete | The Prime Minister of the United Kingdom lives at | A partial input prompt can either end abruptly (as this example does) or end with an underscore. |

A generative AI model can respond to a prompt with text, code, images, embeddings, videos…almost anything.

prompt-based learning

#generativeAI

A capability of certain models that enables them to adapt their behavior in response to arbitrary text input (prompts). In a typical prompt-based learning paradigm, a large language model responds to a prompt by generating text. For example, suppose a user enters the following prompt:

Summarize Newton's Third Law of Motion.

A model capable of prompt-based learning isn't specifically trained to answer the previous prompt. Rather, the model "knows" a lot of facts about physics, a lot about general language rules, and a lot about what constitutes generally useful answers. That knowledge is sufficient to provide a (hopefully) useful answer. Additional human feedback ("That answer was too complicated." or "What's a reaction?") enables some prompt-based learning systems to gradually improve the usefulness of their answers.

prompt design

#generativeAI

Synonym for prompt engineering.

prompt engineering

#generativeAI

The art of creating prompts that elicit the desired responses from a large language model. Humans perform prompt engineering. Writing well-structured prompts is an essential part of ensuring useful responses from a large language model. Prompt engineering depends on many factors, including:

  • The dataset used to pre-train and possibly fine-tune the large language model.
  • The temperature and other decoding parameters that the model uses to generate responses.

Prompt design is a synonym for prompt engineering.

See Introduction to prompt design for more details on writing helpful prompts.

prompt set

#generativeAI

A group of prompts for evaluating a large language model. For example, the following illustration shows a prompt set consisting of three prompts:

[Illustration: Three prompts to an LLM produce three responses. The three prompts are the prompt set; the three responses are the response set.]

Good prompt sets consist of a sufficiently "wide" collection of prompts to thoroughly evaluate the safety and helpfulness of a large language model.

See also response set.

prompt tuning

#generativeAI

A parameter-efficient tuning mechanism that learns a "prefix" that the system prepends to the actual prompt.

One variation of prompt tuning, sometimes called prefix tuning, is to prepend the prefix at every layer. In contrast, most prompt tuning only adds a prefix to the input layer.


For prompt tuning, the "prefix" (also known as a "soft prompt") is a handful of learned, task-specific vectors prepended to the text token embeddings from the actual prompt. The system learns the soft prompt by freezing all other model parameters and fine-tuning on a specific task.
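The mechanism can be sketched as follows (a schematic with made-up dimensions; in a real system only the prefix vectors would be trainable and the embedding table would come from the frozen model):

```python
# Schematic of soft prompt tuning: learned prefix vectors are prepended
# to the prompt's token embeddings before the (frozen) model runs.

EMBED_DIM = 4
PREFIX_LEN = 2

# Trainable soft prompt: PREFIX_LEN vectors of size EMBED_DIM.
soft_prompt = [[0.1] * EMBED_DIM for _ in range(PREFIX_LEN)]

def embed_tokens(tokens):
    # Hypothetical frozen embedding lookup (fixed, made-up values).
    return [[float(len(t))] * EMBED_DIM for t in tokens]

def model_input(tokens):
    # The model sees the learned prefix followed by the real prompt.
    return soft_prompt + embed_tokens(tokens)

seq = model_input(["summarize", "this"])
print(len(seq))  # 2 prefix vectors + 2 token embeddings = 4
```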


R

reference text

#generativeAI

An expert's response to a prompt. For example, given the following prompt:

Translate the question "What is your name?" from English to French.

An expert's response might be:

Comment vous appelez-vous?

Various metrics (such as ROUGE) measure the degree to which the reference text matches an ML model's generated text.

Note: The expert is typically a human but could be an ML model.
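A simplified, ROUGE-1-style overlap score can be sketched as follows (a sketch of the idea only, not the full ROUGE definition):

```python
# A simplified unigram-recall score: the fraction of reference-text
# words that also appear in the generated text.

def unigram_recall(reference: str, generated: str) -> float:
    ref = reference.lower().split()
    gen = set(generated.lower().split())
    if not ref:
        return 0.0
    return sum(1 for w in ref if w in gen) / len(ref)

score = unigram_recall(
    "Comment vous appelez-vous?",
    "Comment vous appelez-vous?",
)
print(score)  # 1.0
```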

reflection

#generativeAI

A strategy for improving the quality of an agentic workflow by examining (reflecting on) a step's output before passing that output to the next step.

The examiner is often the same LLM that generated the response (though it could be a different LLM). How could the same LLM that generated a response be a fair judge of its own response? The "trick" is to put the LLM in a critical (reflective) mindset. This process is analogous to a writer who uses a creative mindset to write a first draft and then switches to a critical mindset to edit it.

For example, imagine an agentic workflow whose first step is to create text for coffee mugs. The prompt for this step might be:

You are a creative. Generate humorous, original text of less than 50 characters suitable for a coffee mug.

Now imagine the following reflective prompt:

You are a coffee drinker. Would you find the preceding response humorous?

The workflow might then only pass text that receives a high reflection scoreto the next stage.
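The generate-then-reflect loop can be sketched as below. Both `call_llm` and the fixed reflection score are hypothetical placeholders for real LLM calls, and the 0.7 threshold is an arbitrary illustrative choice:

```python
# Sketch of a reflection gate in an agentic workflow.
# `call_llm` and the reflection score are stand-ins for real LLM calls.

def call_llm(prompt: str) -> str:
    # Placeholder: a real implementation would call an LLM service.
    return "Instant human: just add coffee."

def reflect(candidate: str) -> float:
    # A real reflective step would send a critical-mindset prompt to the
    # LLM and parse a score from its response; we return a fixed value.
    review_prompt = (
        "You are a coffee drinker. On a scale of 0 to 1, how humorous "
        f"is this mug text? {candidate!r}"
    )
    return 0.9  # stand-in for the parsed LLM score

candidate = call_llm(
    "You are a creative. Generate humorous, original text of less than "
    "50 characters suitable for a coffee mug."
)

next_stage_input = None
if reflect(candidate) >= 0.7:     # arbitrary quality threshold
    next_stage_input = candidate  # only high-scoring text moves on
```

Low-scoring candidates could instead be regenerated or revised before advancing.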

Reinforcement Learning from Human Feedback (RLHF)

#generativeAI

Using feedback from human raters to improve the quality of a model's responses. For example, an RLHF mechanism can ask users to rate the quality of a model's response with a 👍 or 👎 emoji. The system can then adjust its future responses based on that feedback.
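The first step, turning ratings into numeric reward signals, can be sketched as follows. A real RLHF pipeline would go further, training a reward model and fine-tuning the LLM with reinforcement learning; this only shows the rating-to-reward conversion:

```python
# Sketch: converting 👍/👎 ratings into scalar rewards.
# This is only the data-preparation step of an RLHF pipeline.

ratings = ["👍", "👍", "👎"]  # human feedback on three responses

# Map each rating to a scalar reward.
rewards = [1.0 if r == "👍" else -1.0 for r in ratings]

# An aggregate signal the training loop could use.
mean_reward = sum(rewards) / len(rewards)
```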

response

#generativeAI

The text, images, audio, or video that a generative AI model infers. In other words, a prompt is the input to a generative AI model and the response is the output.

response set

#generativeAI

The collection of responses a large language model returns to an input prompt set.

role prompting

#generativeAI

A prompt, typically beginning with the pronoun you, that tells a generative AI model to pretend to be a certain person or a certain role when generating the response. Role prompting can help a generative AI model get into the right "mindset" in order to generate a more useful response. For example, any of the following role prompts might be appropriate depending on the kind of response you are seeking:

You have a PhD in computer science.

You are a software engineer who enjoys giving patient explanations about Python to new programming students.

You are an action hero with a very particular set of programming skills. Assure me that you will find a particular item in a Python list.
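In code, role prompting is typically just string composition: the role text is prepended to the user's actual query before the combined prompt is sent to the model. A minimal sketch:

```python
# Sketch: composing a role prompt with a user query.
# The combined string would be sent to a generative AI model.

ROLE = (
    "You are a software engineer who enjoys giving patient "
    "explanations about Python to new programming students."
)

question = "What does a list comprehension do?"

# The role text comes first so the model adopts it before the query.
prompt = f"{ROLE}\n\n{question}"
```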

S

soft prompt tuning

#generativeAI

A technique for tuning a large language model for a particular task, without resource-intensive fine-tuning. Instead of retraining all the weights in the model, soft prompt tuning automatically adjusts a prompt to achieve the same goal.

Given a textual prompt, soft prompt tuning typically appends additional token embeddings to the prompt and uses backpropagation to optimize the input.

A "hard" prompt contains actual tokens instead of token embeddings.

specificational coding

#generativeAI

The process of writing and maintaining a file in a human language (for example, English) that describes software. You can then tell a generative AI model or another software engineer to create the software that fulfills that description.

Automatically generated code generally requires iteration. In specificational coding, you iterate on the description file. By contrast, in conversational coding, you iterate within the prompt box. In practice, automatic code generation sometimes involves a combination of both specificational coding and conversational coding.

T

temperature

#generativeAI

A hyperparameter that controls the degree of randomness of a model's output. Higher temperatures result in more random output, while lower temperatures result in less random output.

Choosing the best temperature depends on the specific application and the desired properties of the model's output.
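Mathematically, temperature divides the model's logits before the softmax, so a low temperature sharpens the probability distribution and a high temperature flattens it. A minimal sketch:

```python
import math

# Sketch: temperature-scaled softmax over a toy logit vector.
# Low temperature concentrates probability mass; high temperature
# spreads it out (more random sampling).

def softmax_with_temperature(logits, temperature):
    scaled = [z / temperature for z in logits]
    m = max(scaled)                          # subtract max for stability
    exps = [math.exp(z - m) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]

logits = [2.0, 1.0, 0.1]
cool = softmax_with_temperature(logits, 0.5)  # sharper distribution
warm = softmax_with_temperature(logits, 2.0)  # flatter distribution
```

Here `max(cool)` exceeds `max(warm)`: the low-temperature distribution puts more mass on the top candidate, so sampling from it is less random.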

U

Ultra

#generativeAI

The Gemini model with the most parameters. See Gemini Ultra for details.

See also Pro and Nano.

V

Vertex

#GoogleCloud
#generativeAI
Google Cloud's platform for AI and machine learning. Vertex provides tools and infrastructure for building, deploying, and managing AI applications, including access to Gemini models.

vibe coding

#generativeAI

Prompting a generative AI model to create software. That is, your prompts describe the software's purpose and features, which a generative AI model translates into source code. The generated code doesn't always match your intentions, so vibe coding usually requires iteration.

Andrej Karpathy coined the term vibe coding in this X post. In the X post, Karpathy describes it as "a new kind of coding...where you fully give in to the vibes..." So, the term originally implied an intentionally loose approach to creating software in which you might not even examine the generated code. However, the term has rapidly evolved in many circles to now mean any form of AI-generated coding.

For a more detailed description of vibe coding, see What is vibe coding?

In addition, compare and contrast vibe coding with specificational coding and conversational coding.

Z

zero-shot prompting

#generativeAI

A prompt that does not provide an example of how you want the large language model to respond. For example:

| Parts of one prompt | Notes |
| --- | --- |
| What is the official currency of the specified country? | The question you want the LLM to answer. |
| India: | The actual query. |

The large language model might respond with any of the following:

  • Rupee
  • INR
  • Indian rupee
  • The rupee
  • The Indian rupee

All of the answers are correct, though you might prefer a particular format.
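When a zero-shot prompt permits several correct answer formats like this, an evaluation script often normalizes the response and checks it against a set of acceptable strings. A minimal sketch:

```python
# Sketch: a zero-shot prompt and a lenient correctness check that
# accepts any of the valid answer formats listed above.

prompt = (
    "What is the official currency of the specified country?\n"
    "India:"
)

ACCEPTABLE = {
    "rupee",
    "inr",
    "indian rupee",
    "the rupee",
    "the indian rupee",
}

def is_correct(response: str) -> bool:
    """Normalize the model's response and compare against valid answers."""
    return response.strip().lower() in ACCEPTABLE
```

If you need the answer in one particular format, switch to one-shot or few-shot prompting to demonstrate that format.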

Compare and contrast zero-shot prompting with one-shot prompting and few-shot prompting.

Except as otherwise noted, the content of this page is licensed under the Creative Commons Attribution 4.0 License, and code samples are licensed under the Apache 2.0 License. For details, see the Google Developers Site Policies. Java is a registered trademark of Oracle and/or its affiliates.

Last updated 2025-12-16 UTC.