Snowflake Cortex AISQL (including LLM functions)¶

Preview Feature — Open

Individual functions in the Cortex AISQL suite are in preview. Be sure to check thestatus of each function before using it in production.

Supported regions

Available to all accounts in select regions.

Use Cortex AISQL in Snowflake to run unstructured analytics on text and images with industry-leading LLMs from OpenAI, Anthropic, Meta, Mistral AI, and DeepSeek.Cortex AISQL supports use cases such as:

Extracting entities to enrich metadata and streamline validation
Aggregating insights across customer tickets
Filtering and classifying content by natural language
Sentiment and aspect-based analysis for service improvement
Translating and localizing multilingual content
Parsing documents for analytics and RAG pipelines

All models are fully hosted in Snowflake, ensuring performance, scalability, and governance while keeping your data secure and in place.

Available functions¶

Snowflake Cortex features are provided as SQL functions and are also availablein Python.Cortex AISQL functions can be grouped into the following categories:

AISQL functions¶

Task-specific functions are purpose-built and managed functions that automate routine tasks, like simple summaries andquick translations, that don’t require any customization.

AI_COMPLETE: Generates a completion for a given text string or image using a selected LLM. Use this function for most generative AI tasks.
- This is the updated version ofCOMPLETE (SNOWFLAKE.CORTEX).
AI_CLASSIFY: Classifies text or images into user-defined categories.
- This is the updated version ofCLASSIFY_TEXT (SNOWFLAKE.CORTEX) with support for multi-label and image classification.
AI_FILTER: Returns True or False for a given text or image input, allowing you to filter results inSELECT,WHERE, orJOIN...ON clauses.
AI_AGG: Aggregates a text column and returns insights across multiple rows based on a user-defined prompt. This function isn’t subject to context window limitations.
AI_EMBED: Generates an embedding vector for a text or image input, which can be used for similarity search, clustering, and classification tasks.
- This is the updated version ofEMBED_TEXT_1024 (SNOWFLAKE.CORTEX).
AI_SUMMARIZE_AGG: Aggregates a text column and returns a summary across multiple rows. This function isn’t subject to context window limitations.
AI_SIMILARITY: Calculates the embedding similarity between two inputs.
PARSE_DOCUMENT (SNOWFLAKE.CORTEX): Extracts text (using OCR mode) or text with layout information (using LAYOUT mode) from documents in an internal or external stage.
TRANSLATE (SNOWFLAKE.CORTEX): Translates text between supported languages.
SENTIMENT (SNOWFLAKE.CORTEX): Extracts sentiment scores from text.
EXTRACT_ANSWER (SNOWFLAKE.CORTEX): Extracts the answer to a question from unstructured data, provided that the relevant data exists.
SUMMARIZE (SNOWFLAKE.CORTEX): Returns a summary of the text that you’ve specified.

Note

Functions that were formerly referred to as “LLM functions” are listed in the “SNOWFLAKE.CORTEX” namespace

Helper functions¶

Helper functions are purpose-built and managed functions that reduce cases of failures when running other AISQL functions, for example bygetting the count of tokens in an input prompt to ensure the call doesn’t exceed a model limit.

COUNT_TOKENS (SNOWFLAKE.CORTEX): Given an input text, returns the token count based on the model or Cortexfunction specified.
TRY_COMPLETE (SNOWFLAKE.CORTEX): Works like the COMPLETE function, but returns NULLwhen the function could not execute instead of an error code.

Cortex Guard¶

Cortex Guard is an option of the COMPLETE function designed to filter possible unsafe and harmful responses from alanguage model. Cortex Guard is currently built with Meta’s Llama Guard 3. Cortex Guard works by evaluating the responses of a languagemodel before that output is returned to the application. Once you activate Cortex Guard, language model responses which may be associatedwith violent crimes, hate, sexual content, self-harm, and more are automatically filtered. SeeCOMPLETE arguments for syntax and examples.

Note

Usage of Cortex Guard incurs compute charges based on the number ofinput tokens processed.

Performance considerations¶

Cortex AISQL Functions are optimized for throughput. We recommend using these functions to process numerous inputs such as text from large SQL tables. Batch processing is typically better suited for AISQL Functions. For more interactive use cases where latency is important, use the REST API. These are available for simple inference (Complete API), embedding (Embed API) and agentic applications (Agents API).

Required privileges¶

The CORTEX_USER database role in the SNOWFLAKE database includes the privileges that allow users to call SnowflakeCortex AI functions. By default, the CORTEX_USER role is granted to the PUBLIC role. The PUBLIC role is automatically grantedto all users and roles, so this allows all users in your account to use the Snowflake Cortex AI functions.

If you don’t want all users to have this privilege, you can revoke access to the PUBLIC role and grant access to specific roles.

To revoke the CORTEX_USER database role from the PUBLIC role, run the following commands using the ACCOUNTADMIN role:

REVOKEDATABASE ROLESNOWFLAKE.CORTEX_USERFROMROLEPUBLIC;REVOKEIMPORTEDPRIVILEGESONDATABASESNOWFLAKEFROMROLEPUBLIC;

Copy

You can then selectively provide access to specific roles. The SNOWFLAKE.CORTEX_USER database role cannot be granted directly to a user.For more information, seeUsing SNOWFLAKE database roles. A user with the ACCOUNTADMIN role can grant this role to a custom role inorder to allow users to access Cortex AI functions. In the following example, use the ACCOUNTADMIN role and grant the usersome_userthe CORTEX_USER database role via the account rolecortex_user_role, which you create for this purpose.

USEROLEACCOUNTADMIN;CREATEROLEcortex_user_role;GRANTDATABASE ROLESNOWFLAKE.CORTEX_USERTOROLEcortex_user_role;GRANTROLEcortex_user_roleTOUSERsome_user;

Copy

You can also grant access to Snowflake Cortex AI functions through existing roles commonly used by specific groups ofusers. (SeeUser roles.) For example, if you have created ananalyst role that is usedas a default role by analysts in your organization, you can easily grant these users access to Snowflake Cortex AISQLfunctions with a single GRANT statement.

GRANTDATABASE ROLESNOWFLAKE.CORTEX_USERTOROLEanalyst;

Copy

Control model access¶

There are two methods to control access to models in Snowflake Cortex. You can use one or both methods together for a mix of broad and fine-grained access control:

Model allowlist
Role-based access control

The model allowlist provides a default level of access to models for all users in the account, which can be customized using theCORTEX_MODELS_ALLOWLIST parameter. Role-based access control allows fine-grained access management by granting or revoking privileges to specific model objects through application roles.

Model access control is available for the following services:

Model allowlist¶

Use theCORTEX_MODELS_ALLOWLIST parameter in theALTERACCOUNTSET command to set model access for all users in the account.If you need to provide specific users with access beyond what you’ve specified in the allowlist, you should use role-based access control instead.For more information about role-based access control, seeRole based access control.

When your users make a request, Snowflake Cortex evaluates the parameter to determine whether the user can access the model.

For theCORTEX_MODELS_ALLOWLIST parameter, you can set the following values:

CORTEX_MODELS_ALLOWLIST='All'Provides access to all models.
The following command provides user access to all models:
ALTERACCOUNTSETCORTEX_MODELS_ALLOWLIST='All';
Copy
CORTEX_MODELS_ALLOWLIST='model1,model2,...'
Provides users with access to the models specified in a comma-separated list.
The following command provides users with access to themistral-large2 andllama3.1-70b models:
ALTERACCOUNTSETCORTEX_MODELS_ALLOWLIST='mistral-large2,llama3.1-70b';
Copy
CORTEX_MODELS_ALLOWLIST='None'Prevents users from accessing any model.
The following command prevents user access to any model:
ALTERACCOUNTSETCORTEX_MODELS_ALLOWLIST='None';
Copy

Role based access control¶

Each model in Snowflake Cortex is a unique object in the SNOWFLAKE.MODELS schema with an associated application role.You can use the model objects and application roles to manage access to the model object.

As a role with the ACCOUNTADMIN privilege, run the following command to get access to the latest models:

CALLSNOWFLAKE.MODELS.CORTEX_BASE_MODELS_REFRESH();

Copy

Next, use the following command to list the models that are available for your current role:

SHOWMODELSINSNOWFLAKE.MODELS;

Copy

The command returns a list of models, such as the following:

created_on	name	model_type	database_name	schema_name	owner
2025-04-22 09:35:38.558 -0700	CLAUDE-3-5-SONNET	CORTEX_BASE	SNOWFLAKE	MODELS	SNOWFLAKE
2025-04-22 09:36:16.793 -0700	LLAMA3.1-405B	CORTEX_BASE	SNOWFLAKE	MODELS	SNOWFLAKE
2025-04-22 09:37:18.692 -0700	SNOWFLAKE-ARCTIC	CORTEX_BASE	SNOWFLAKE	MODELS	SNOWFLAKE

Use the following command to list the application roles for these models:

SHOWAPPLICATIONROLESINAPPLICATIONSNOWFLAKE;

Copy

The command returns a list of application roles, such as the following:

created_on	name	owner	comment	owner_role_type
2025-04-22 09:35:38.558 -0700	CORTEX-MODEL-ROLE-ALL	SNOWFLAKE	MODELS	APPLICATION
2025-04-22 09:36:16.793 -0700	CORTEX-MODEL-ROLE-LLAMA3.1-405B	SNOWFLAKE	MODELS	APPLICATION
2025-04-22 09:37:18.692 -0700	CORTEX-MODEL-ROLE-SNOWFLAKE-ARCTIC	SNOWFLAKE	MODELS	APPLICATION

Important

If you do not see models or their associated application roles, make sure to runCALLSNOWFLAKE.MODELS.CORTEX_BASE_MODELS_REFRESH() to get access to the latest models.Only roles with the ACCOUNTADMIN privilege can call this stored procedure.

To grant access to a specific model, you grant the model’s application role to a user role.For example, you can grantCORTEX-MODEL-ROLE-LLAMA3.1-70B, the application role forSNOWFLAKE.MODELS."LLAMA3.1-70B", to a user role.The following command grants theCORTEX-MODEL-ROLE-LLAMA3.1-70B application role to theMY_ROLE user role:

GRANTAPPLICATION ROLESNOWFLAKE."CORTEX-MODEL-ROLE-LLAMA3.1-70B"TOROLEMY_ROLE;

Copy

To make an inference call, use the fully qualified model name.The following is an example of a call users can make:

SELECTAI_COMPLETE('SNOWFLAKE.MODELS."LLAMA3.1-70B"','Hello');

Copy

Important

When a user makes a request, Snowflake Cortex first uses role-based access control to determine whether the user has access to the model.If the user doesn’t have access, Snowflake Cortex evaluates CORTEX_MODELS_ALLOWLIST to determine access to the model.If the model is in the allowlist (or if the value of the allowlist is set to'All'), the user is granted access to the model.To enable granular access to a model, remove the model name from CORTEX_MODELS_ALLOWLIST or set it to'None'.

Availability¶

Snowflake Cortex AI functions are currently available natively in the following regions. If your region is not listed for a particular function,usecross-region inference.

Note

The TRY_COMPLETE function is available in the same regions as COMPLETE.
The COUNT_TOKENS function is available in all regions for any model, but the models themselves are available only in the regions specified in the tables below.

The following models are available in any region viacross-region inference.

Function (Model)	AWS US (Cross-Region)	AWS EU (Cross-Region)	AWS APJ (Cross-Region)	Azure US (Cross-Region)
COMPLETE (`claude-4-sonnet`)	✔
COMPLETE (`claude-4-opus`)	In preview
COMPLETE (`claude-3-7-sonnet`)	✔	✔
COMPLETE (`claude-3-5-sonnet`)	✔
COMPLETE (`llama4-maverick`)	✔
COMPLETE (`llama4-scout`)	✔
COMPLETE (`llama3.2-1b`)	✔
COMPLETE (`llama3.2-3b`)	✔
COMPLETE (`llama3.1-8b`)	✔	✔	✔	✔
COMPLETE (`llama3.1-70b`)	✔	✔	✔	✔
COMPLETE (`llama3.3-70b`)	✔
COMPLETE (`snowflake-llama-3.3-70b`)	✔
COMPLETE (`llama3.1-405b`)	✔			✔
COMPLETE (`openai-gpt-4.1`)				In preview
COMPLETE (`openai-o4-mini`)				In preview
COMPLETE (`snowflake-llama-3.1-405b`)	✔
COMPLETE (`snowflake-arctic`)	✔			✔
COMPLETE (`deepseek-r1`)	✔
COMPLETE (`reka-core`)	✔
COMPLETE (`reka-flash`)	✔		✔
COMPLETE (`mistral-large2`)	✔		✔	✔
COMPLETE (`mixtral-8x7b`)	✔	✔	✔	✔
COMPLETE (`mistral-7b`)	✔	✔	✔	✔
COMPLETE (`jamba-instruct`)	✔	✔	✔	✔
COMPLETE (`jamba-1.5-mini`)	✔	✔	✔	✔
COMPLETE (`jamba-1.5-large`)	✔	✔
COMPLETE (`gemma-7b`)	✔	✔		✔
EMBED_TEXT_768 (`e5-base-v2`)	✔	✔	✔	✔
EMBED_TEXT_768 (`snowflake-arctic-embed-m`)	✔	✔	✔	✔
EMBED_TEXT_768 (`snowflake-arctic-embed-m-v1.5`)	✔	✔	✔	✔
EMBED_TEXT_1024 (`snowflake-arctic-embed-l-v2.0`)	✔	✔	✔	✔
EMBED_TEXT_1024 (`snowflake-arctic-embed-l-v2.0-8k`)	✔	✔	✔	✔
EMBED_TEXT_1024 (`nv-embed-qa-4`)	✔
EMBED_TEXT_1024 (`multilingual-e5-large`)	✔	✔	✔	✔
EMBED_TEXT_1024 (`voyage-multilingual-2`)	✔	✔	✔	✔
AI_CLASSIFY TEXT	✔	✔	✔	✔
AI_CLASSIFY IMAGE
AI_FILTER TEXT	✔	✔	✔	✔
AI_FILTER IMAGE
AI_AGG	✔	✔	✔	✔
AI_SIMILARITY TEXT	✔	✔	✔	✔
AI_SIMILARITY IMAGE	✔	✔
AI_SUMMARIZE_AGG	✔	✔	✔	✔
EXTRACT_ANSWER	✔	✔	✔	✔
SENTIMENT	✔	✔	✔	✔
ENTITY_SENTIMENT	✔	✔	✔	✔
SUMMARIZE	✔	✔	✔	✔
TRANSLATE	✔	✔	✔	✔

The following models are available natively in North American regions.

Function (Model)	AWS US West 2 (Oregon)	AWS US East 1 (N. Virginia)	AWS US East (Commercial Gov - N. Virginia)	Azure East US 2 (Virginia)
COMPLETE (`claude-4-sonnet`)
COMPLETE (`claude-4-opus`)
COMPLETE (`claude-3-7-sonnet`)
COMPLETE (`claude-3-5-sonnet`)	✔	✔
COMPLETE (`llama4-maverick`)	✔
COMPLETE (`llama4-scout`)	✔
COMPLETE (`llama3.2-1b`)	✔
COMPLETE (`llama3.2-3b`)	✔
COMPLETE (`llama3.1-8b`)	✔	✔	✔	✔
COMPLETE (`llama3.1-70b`)	✔	✔	✔	✔
COMPLETE (`llama3.3-70b`)	✔
COMPLETE (`snowflake-llama-3.3-70b`)	✔
COMPLETE (`llama3.1-405b`)	✔	✔	✔	✔
COMPLETE (`openai-gpt-4.1`)				In preview
COMPLETE (`openai-o4-mini`)				In preview
COMPLETE (`snowflake-llama-3.1-405b`)	✔
COMPLETE (`snowflake-arctic`)	✔			✔
COMPLETE (`deepseek-r1`)	✔
COMPLETE (`reka-core`)		✔	✔
COMPLETE (`reka-flash`)	✔	✔	✔
COMPLETE (`mistral-large2`)	✔	✔	✔	✔
COMPLETE (`mixtral-8x7b`)	✔	✔	✔	✔
COMPLETE (`mistral-7b`)	✔	✔	✔	✔
COMPLETE (`jamba-instruct`)	✔
COMPLETE (`jamba-1.5-mini`)	✔
COMPLETE (`jamba-1.5-large`)	✔
COMPLETE (`gemma-7b`)	✔	✔	✔	✔
EMBED_TEXT_768 (`e5-base-v2`)	✔	✔	✔	✔
EMBED_TEXT_768 (`snowflake-arctic-embed-m`)	✔	✔	✔	✔
EMBED_TEXT_768 (`snowflake-arctic-embed-m-v1.5`)	✔	✔	✔	✔
EMBED_TEXT_1024 (`snowflake-arctic-embed-l-v2.0`)	✔	✔	✔	✔
EMBED_TEXT_1024 (`snowflake-arctic-embed-l-v2.0-8k`)	✔	✔	✔	✔
EMBED_TEXT_1024 (`nv-embed-qa-4`)	✔
EMBED_TEXT_1024 (`multilingual-e5-large`)	✔	✔	✔	✔
EMBED_TEXT_1024 (`voyage-multilingual-2`)	✔	✔	✔	✔
AI_CLASSIFY TEXT	✔	✔		✔
AI_CLASSIFY IMAGE	✔	✔
AI_FILTER TEXT	✔	✔		✔
AI_FILTER IMAGE	✔	✔
AI_AGG	✔	✔		✔
AI_SIMILARITY TEXT	✔	✔		✔
AI_SIMILARITY IMAGE	✔	✔
AI_SUMMARIZE_AGG	✔	✔		✔
EXTRACT_ANSWER	✔	✔	✔	✔
SENTIMENT	✔	✔	✔	✔
ENTITY_SENTIMENT	✔	✔	✔	✔
SUMMARIZE	✔	✔	✔	✔
TRANSLATE	✔	✔	✔	✔

The following models are available natively in European regions.

Function (Model)	AWS Europe Central 1 (Frankfurt)	AWS Europe West 1 (Ireland)	Azure West Europe (Netherlands)
COMPLETE (`claude-4-sonnet`)
COMPLETE (`claude-4-opus`)
COMPLETE (`claude-3-7-sonnet`)
COMPLETE (`claude-3-5-sonnet`)
COMPLETE (`llama4-maverick`)
COMPLETE (`llama4-scout`)
COMPLETE (`llama3.2-1b`)
COMPLETE (`llama3.2-3b`)
COMPLETE (`llama3.1-8b`)	✔	✔	✔
COMPLETE (`llama3.1-70b`)	✔	✔	✔
COMPLETE (`llama3.3-70b`)
COMPLETE (`snowflake-llama-3.3-70b`)
COMPLETE (`llama3.1-405b`)
COMPLETE (`openai-gpt-4.1`)
COMPLETE (`openai-o4-mini`)
COMPLETE (`snowflake-llama-3.1-405b`)
COMPLETE (`snowflake-arctic`)
COMPLETE (`deepseek-r1`)
COMPLETE (`reka-core`)
COMPLETE (`reka-flash`)
COMPLETE (`mistral-large2`)	✔	✔	✔
COMPLETE (`mixtral-8x7b`)	✔	✔	✔
COMPLETE (`mistral-7b`)	✔	✔	✔
COMPLETE (`jamba-instruct`)	✔
COMPLETE (`jamba-1.5-mini`)	✔
COMPLETE (`jamba-1.5-large`)
COMPLETE (`gemma-7b`)	✔		✔
EMBED_TEXT_768 (`e5-base-v2`)	✔		✔
EMBED_TEXT_768 (`snowflake-arctic-embed-m`)	✔	✔	✔
EMBED_TEXT_768 (`snowflake-arctic-embed-m-v1.5`)	✔	✔	✔
EMBED_TEXT_1024 (`snowflake-arctic-embed-l-v2.0`)	✔	✔	✔
EMBED_TEXT_1024 (`snowflake-arctic-embed-l-v2.0-8k`)	✔	✔	✔
EMBED_TEXT_1024 (`nv-embed-qa-4`)
EMBED_TEXT_1024 (`multilingual-e5-large`)	✔	✔	✔
EMBED_TEXT_1024 (`voyage-multilingual-2`)	✔	✔	✔
AI_CLASSIFY TEXT	✔	✔	✔
AI_CLASSIFY IMAGE	✔
AI_FILTER TEXT	✔	✔	✔
AI_FILTER IMAGE	✔
AI_AGG	✔	✔	✔
AI_SIMILARITY TEXT	✔	✔	✔
AI_SIMILARITY IMAGE	✔
AI_SUMMARIZE_AGG	✔	✔	✔
EXTRACT_ANSWER	✔	✔	✔
SENTIMENT	✔	✔	✔
ENTITY_SENTIMENT	✔		✔
SUMMARIZE	✔	✔	✔
TRANSLATE	✔	✔	✔

The following models are available natively in Asia Pacific regions.

Function (Model)	AWS AP Southeast 2 (Sydney)	AWS AP Northeast 1 (Tokyo)
COMPLETE (`claude-4-sonnet`)
COMPLETE (`claude-4-opus`)
COMPLETE (`claude-3-7-sonnet`)
COMPLETE (`claude-3-5-sonnet`)
COMPLETE (`llama4-maverick`)
COMPLETE (`llama4-scout`)
COMPLETE (`llama3.2-1b`)
COMPLETE (`llama3.2-3b`)
COMPLETE (`llama3.1-8b`)	✔	✔
COMPLETE (`llama3.1-70b`)	✔	✔
COMPLETE (`llama3.3-70b`)
COMPLETE (`snowflake-llama-3.3-70b`)
COMPLETE (`llama3.1-405b`)
COMPLETE (`openai-gpt-4.1`)
COMPLETE (`openai-o4-mini`)
COMPLETE (`snowflake-llama-3.1-405b`)
COMPLETE (`snowflake-arctic`)
COMPLETE (`deepseek-r1`)
COMPLETE (`reka-core`)
COMPLETE (`reka-flash`)		✔
COMPLETE (`mistral-large2`)	✔	✔
COMPLETE (`mixtral-8x7b`)	✔	✔
COMPLETE (`mistral-7b`)	✔	✔
COMPLETE (`jamba-instruct`)		✔
COMPLETE (`jamba-1.5-mini`)		✔
COMPLETE (`jamba-1.5-large`)
COMPLETE (`gemma-7b`)
EMBED_TEXT_768 (`e5-base-v2`)	✔	✔
EMBED_TEXT_768 (`snowflake-arctic-embed-m`)	✔	✔
EMBED_TEXT_768 (`snowflake-arctic-embed-m-v1.5`)	✔	✔
EMBED_TEXT_1024 (`snowflake-arctic-embed-l-v2.0`)	✔	✔
EMBED_TEXT_1024 (`snowflake-arctic-embed-l-v2.0-8k`)	✔	✔
EMBED_TEXT_1024 (`nv-embed-qa-4`)
EMBED_TEXT_1024 (`multilingual-e5-large`)	✔	✔
EMBED_TEXT_1024 (`voyage-multilingual-2`)	✔	✔
AI_CLASSIFY TEXT	✔	✔
AI_CLASSIFY IMAGE
AI_FILTER TEXT	✔	✔
AI_FILTER IMAGE
AI_AGG	✔	✔
AI_SIMILARITY TEXT	✔	✔
AI_SIMILARITY IMAGE
AI_SUMMARIZE_AGG	✔	✔
EXTRACT_ANSWER	✔	✔
SENTIMENT	✔	✔
ENTITY_SENTIMENT		✔
SUMMARIZE	✔	✔
TRANSLATE	✔	✔

The following Snowflake Cortex AI functions are currently available in the following extended regions.

Function (Model)	AWS US East 2 (Ohio)	AWS CA Central 1 (Central)	AWS SA East 1 (São Paulo)	AWS Europe West 2 (London)	AWS Europe Central 1 (Frankfurt)	AWS Europe North 1 (Stockholm)	AWS AP Northeast 1 (Tokyo)	AWS AP South 1 (Mumbai)	AWS AP Southeast 2 (Sydney)	AWS AP Southeast 3 (Jakarta)	Azure South Central US (Texas)	Azure West US 2 (Washington)	Azure UK South (London)	Azure North Europe (Ireland)	Azure Switzerland North (Zürich)	Azure Central India (Pune)	Azure Japan East (Tokyo, Saitama)	Azure Southeast Asia (Singapore)	Azure Australia East (New South Wales)	GCP Europe West 2 (London)	GCP Europe West 4 (Netherlands)	GCP US Central 1 (Iowa)	GCP US East 4 (N. Virginia)
EMBED_TEXT_768 (`snowflake-arctic-embed-m-v1.5`)	✔	✔	✔	✔	✔	✔	✔	✔	✔	✔	✔	✔	✔	✔	✔	✔	✔	✔	✔	✔	✔	✔	✔
EMBED_TEXT_768 (`snowflake-arctic-embed-m`)	✔	✔	✔	✔	✔	✔	✔	✔	✔	✔	✔	✔	✔	✔	✔	✔	✔	✔	✔	✔	✔	✔	✔
EMBED_TEXT_1024 (`multilingual-e5-large`)	✔	✔	✔	✔	✔	✔	✔	✔	✔	✔	✔	✔	✔	✔	✔	✔	✔	✔	✔	✔	✔	✔	✔

The following table lists legacy models. If you’re just getting started, start with models in the previous tables.

Legacy¶
Function (Model)	AWS US West 2 (Oregon)	AWS US East 1 (N. Virginia)	AWS Europe Central 1 (Frankfurt)	AWS AP Southeast 2 (Sydney)	AWS AP Northeast 1 (Tokyo)	Azure East US 2 (Virginia)	Azure West Europe (Netherlands)
COMPLETE (`llama2-70b-chat`)	✔	✔	✔			✔	✔
COMPLETE (`llama3-8b`)	✔	✔	✔	✔	✔	✔
COMPLETE (`llama3-70b`)	✔	✔	✔		✔	✔
COMPLETE (`mistral-large`)	✔	✔	✔			✔	✔

Cost considerations¶

Snowflake Cortex AI functions incur compute cost based on the number of tokens processed. Refer to theSnowflake Service Consumption Table for each function’s cost in credits per million tokens.

A token is the smallest unit of text processed by Snowflake Cortex AI functions, approximately equal to fourcharacters. The equivalence of raw input or output text to tokens can vary by model.

For functions that generate new text in the response (AI_COMPLETE, AI_CLASSIFY, AI_FILTER, AI_AGG, AI_SUMMARIZE, and TRANSLATE), both input and outputtokens are counted.
For CORTEX GUARD, only input tokens are counted. The number of input tokens is based on the number of output tokens perLLM model used in the COMPLETE function.
For AI_SIMILARITY the EMBED_* functions, only input tokens are counted.
For EXTRACT_ANSWER, the number of billable tokens is the sum of the number of tokens in thefrom_text andquestion fields.
AI_CLASSIFY, AI_FILTER, AI_AGG, AI_SUMMARIZE_AGG, SUMMARIZE, TRANSLATE, EXTRACT_ANSWER, and ENTITY_SENTIMENT, and SENTIMENT add a prompt to the input text in order to generate the response.As a result, the input token count is higher than the number of tokens in the text you provide.
AI_CLASSIFY labels, descriptions, and examples are counted as input tokens for each record processed, not just once for each AI_CLASSIFY call.
For PARSE_DOCUMENT (SNOWFLAKE.CORTEX), billing is based on the number of document pages processed.
TRY_COMPLETE(SNOWFLAKE.CORTEX) does not incur costs for error handling. This means that if the TRY_COMPLETE(SNOWFLAKE.CORTEX) function returns NULL, no costis incurred.
COUNT_TOKENS(SNOWFLAKE.CORTEX) incurs only compute cost to run the function. No additional token based costs are incurred.

Snowflake recommends executing queries that call a Snowflake Cortex AISQL function or a Cortex PARSE_DOCUMENT function with a smallerwarehouse (no larger than MEDIUM) because larger warehouses do not increase performance. The cost associated with keeping a warehouse activewill continue to apply when executing a query that calls a Snowflake Cortex LLM Function. For general information oncompute costs, seeUnderstanding compute cost.

Track costs for AI services¶

To track credits used for AI Services including LLM Functions in your account, use theMETERING_HISTORY view:

SELECT*FROMSNOWFLAKE.ACCOUNT_USAGE.METERING_DAILY_HISTORYWHERESERVICE_TYPE='AI_SERVICES';

Copy

Track credit consumption for AISQL functions¶

To view the credit and token consumption for each AISQL function call, use theCORTEX_FUNCTIONS_USAGE_HISTORY view:

SELECT*FROMSNOWFLAKE.ACCOUNT_USAGE.CORTEX_FUNCTIONS_USAGE_HISTORY;

Copy

You can also view the credit and token consumption for each query within your Snowflake account. Viewing the credit and token consumption for each query helps you identify queries that are consuming the most credits and tokens.

The following example query uses theCORTEX_FUNCTIONS_QUERY_USAGE_HISTORY view to show the credit and token consumption for all of your queries within your account.

SELECT*FROMSNOWFLAKE.ACCOUNT_USAGE.CORTEX_FUNCTIONS_QUERY_USAGE_HISTORY;

Copy

You can also use the same view to see the credit and token consumption for a specific query.

SELECT*FROMSNOWFLAKE.ACCOUNT_USAGE.CORTEX_FUNCTIONS_QUERY_USAGE_HISTORYWHEREquery_id='<query-id>';

Copy

Note

You can’t get granular usage information for requests made with the REST API.

The query usage history is grouped by the models used in the query. For example, if you ran:

SELECTAI_COMPLETE('mistral-7b','Is a hot dog a sandwich'),AI_COMPLETE('mistral-large','Is a hot dog a sandwich');

Copy

The query usage history would show two rows, one formistral-7b and one formistral-large.

Usage quotas¶

Note

On-demand Snowflake accounts without a valid payment method (such as trial accounts) are limited to 10credits per day for Snowflake Cortex AISQL usage. To remove this limit, seeconvert your trial account to a paid account.

Managing costs¶

Snowflake recommends using a warehouse size no larger than MEDIUM when calling Snowflake Cortex AISQLfunctions. Using a larger warehouse than necessary does not increase performance, but can result in unnecessary costs.This recommendation might not apply in the future due to upcoming product updates.

Model restrictions¶

Models used by Snowflake Cortex have limitations on size as described in the table below. Sizes are given in tokens.Tokens generally represent about four characters of text, so the number of words corresponding to a limit isless than the number of tokens. Inputs that exceed the limit result in an error.

The maximum size of the output that a model can produce is limited by the following:

The model’s output token limit.
The space available in the context window after the model consumes the input tokens.

For example,claude-3-5-sonnet has a context window of 200,000 tokens. If 100,000 tokens are used for the input, the model can generate up to 8,192 tokens. However, if 195,000 tokens are used as input, then the model can only generate up to 5,000 tokens for a total of 200,000 tokens.

Important

In the AWS AP Southeast 2 (Sydney) region:

the context window forllama3-8b andmistral-7b is 4,096 tokens.
the context window forllama3.1-8b is 16,384 tokens.
the context window for the Snowflake managed model from the SUMMARIZE function is 4,096 tokens.

In the AWS Europe West 1 (Ireland) region:

the context window forllama3.1-8b is 16,384 tokens.
the context window formistral-7b is 4,096 tokens.

Function	Model	Context window (tokens)	Max output AISQL functions (tokens)
COMPLETE	`llama4-maverick`	128,000	8,192
	`llama4-scout`	128,000	8,192
	`snowflake-arctic`	4,096	8,192
	`deepseek-r1`	32,768	8,192
	`claude-4-opus`	200,000	8,192
	`claude-4-sonnet`	200,000	32,000
	`claude-3-7-sonnet`	200,000	32,000
	`claude-3-5-sonnet`	200,000	8,192
	`mistral-large`	32,000	8,192
	`mistral-large2`	128,000	8,192
	`openai-gpt-4.1`	128,000	8,192
	`openai-o4-mini`	200,000	8,192
	`reka-flash`	100,000	8,192
	`reka-core`	32,000	8,192
	`jamba-instruct`	256,000	8,192
	`jamba-1.5-mini`	256,000	8,192
	`jamba-1.5-large`	256,000	8,192
	`mixtral-8x7b`	32,000	8,192
	`llama2-70b-chat`	4,096	8,192
	`llama3-8b`	8,000	8,192
	`llama3-70b`	8,000	8,192
	`llama3.1-8b`	128,000	8,192
	`llama3.1-70b`	128,000	8,192
	`llama3.3-70b`	128,000	8,192
	`snowflake-llama-3.3-70b`	8,000	8,192
	`llama3.1-405b`	128,000	8,192
	`snowflake-llama-3.1-405b`	8,000	8,192
	`llama3.2-1b`	128,000	8,192
	`llama3.2-3b`	128,000	8,192
	`mistral-7b`	32,000	8,192
	`gemma-7b`	8,000	8,192
EMBED_TEXT_768	`e5-base-v2`	512	n/a
	`snowflake-arctic-embed-m`	512	n/a
EMBED_TEXT_1024	`nv-embed-qa-4`	512	n/a
	`multilingual-e5-large`	512	n/a
	`voyage-multilingual-2`	32,000	n/a
AI_FILTER	Snowflake managed model	128,000	n/a
AI_CLASSIFY / CLASSIFY_TEXT	Snowflake managed model	128,000	n/a
AI_AGG	Snowflake managed model	128,000 per row can be used across multiple rows	8,192
AI_SUMMARIZE_AGG	Snowflake managed model	128,000 per row can be used across multiple rows	8,192
ENTITY_SENTIMENT	Snowflake managed model	2,048	n/a
EXTRACT_ANSWER	Snowflake managed model	2,048 for text 64 for question	n/a
SENTIMENT	Snowflake managed model	512	n/a
SUMMARIZE	Snowflake managed model	32,000	4,096
TRANSLATE	Snowflake managed model	4,096	n/a

Choosing a model¶

The Snowflake Cortex COMPLETE function supports multiple models of varying capability, latency, and cost. These modelshave been carefully chosen to align with common customer use cases. To achieve the bestperformance per credit, choose a model that’s a good match for the content size and complexity of yourtask. Here are brief overviews of the available models.

Large models¶

If you’re not sure where to start, try the most capable models first to establish a baseline to evaluate other models.claude-3-7-sonnet,reka-core, andmistral-large2 are the most capable models offered by Snowflake Cortex,and will give you a good idea what a state-of-the-art model can do.

Claude3-7Sonnet is a leader in general reasoning and multimodal capabilities. It outperforms its predecessors in tasks that require reasoning across different domains and modalities. You can use its large output capacity to get more information from either structured or unstructured queries. Its reasoning capabilities and large context windows make it well-suited for agentic workflows.
deepseek-r1 is a foundation model trained using large-scale reinforcement-learning (RL) without supervised fine-tuning (SFT).It can deliver high performance across math, code, and reasoning tasks.To access the model, set thecross-region inference parameter toAWS_US.
mistral-large2 is Mistral AI’s most advanced large language model with top-tier reasoning capabilities.Compared tomistral-large, it’s significantly more capable in code generation, mathematics, reasoning, andprovides much stronger multilingual support. It’s ideal for complex tasks that require large reasoning capabilitiesor are highly specialized, such as synthetic text generation, code generation, and multilingual text analytics.
llama3.1-405b is an open source model from thellama3.1 model family from Meta with a large 128K context window.It excels in long document processing, multilingual support, synthetic data generation and model distillation.
snowflake-llama3.1-405b is a model derived from the open source llama3.1 model. It uses the<SwiftKVoptimizationshttps://www.snowflake.com/en/blog/up-to-75-lower-inference-cost-llama-meta-llm/> that have been developed by the Snowflake AI research team to deliver up to a 75% inference cost reduction. SwiftKV achieves higher throughput performance with minimal accuracy loss.

Medium models¶

llama3.1-70b is an open source model that demonstrates state-of-the-art performance ideal for chat applications,content creation, and enterprise applications. It is a highly performant, cost effective model that enables diverse usecases with a context window of 128K.llama3-70b is still supported and has a context window of 8K.
snowflake-llama3.3-70b is a model derived from the open source llama3.3 model. It uses the<SwiftKVoptimizationshttps://www.snowflake.com/en/blog/up-to-75-lower-inference-cost-llama-meta-llm/> developed by the Snowflake AI research team to deliver up to a 75% inference cost reduction. SwiftKV achieves higher throughput performance with minimal accuracy loss.
snowflake-arctic is Snowflake’s top-tier enterprise-focused LLM. Arctic excels at enterprise tasks such as SQLgeneration, coding and instruction following benchmarks.
mixtral-8x7b is ideal for text generation, classification, and question answering. Mistral models are optimizedfor low latency with low memory requirements, which translates into higher throughput for enterprise use cases.
Thejamba-Instruct model is built by AI21 Labs to efficiently meet enterprise requirements. It is optimized to offer a 256k tokencontext window with low cost and latency, making it ideal for tasks like summarization, Q&A, and entity extraction on lengthy documentsand extensive knowledge bases.
The AI21 Jamba 1.5 family of models is state-of-the-art, hybrid SSM-Transformer instruction following foundation models. Thejamba-1.5-mini andjamba-1.5-large with a context length of 256K supports use cases such as structured output (JSON), and grounded generation.

Small models¶

Thellama3.2-1b andllama3.2-3b models support context length of 128K tokens and are state-of-the-art in their class for usecases like summarization, instruction following, and rewriting tasks. The Llama 3.2 models deliver multilingual capabilities, withsupport for English, German, French, Italian, Portuguese, Hindi, Spanish and Thai.
llama3.1-8b is ideal for tasks that require low to moderate reasoning. It’s a light-weight, ultra-fast model with a context windowof 128K.llama3-8b andllama2-70b-chat are still supported models that provide a smaller context window and relatively lower accuracy.
mistral-7b is ideal for your simplest summarization, structuration, and question answering tasks that need to bedone quickly. It offers low latency and high throughput processing for multiple pages of text with its 32K contextwindow.
gemma-7b is suitable for simple code and text completion tasks. It has a context window of 8,000 tokens but issurprisingly capable within that limit, and quite cost-effective.

The following table provides information on how popular models perform on various benchmarks,including the models offered by Snowflake Cortex COMPLETE as well as a few other popular models.

Model	Context Window (Tokens)	MMLU (Reasoning)	HumanEval (Coding)	GSM8K (Arithmetic Reasoning)	Spider 1.0 (SQL)
GPT 4.o	128,000	88.7	90.2	96.4	-
Claude 3.5 Sonnet	200,000	88.3	92.0	96.4	-
llama3.1-405b	128,000	88.6	89	96.8	-
reka-core	32,000	83.2	76.8	92.2	-
llama3.1-70b	128,000	86	80.5	95.1	-
mistral-large2	128,000	84	92	93	-
reka-flash	100,000	75.9	72	81	-
llama3.1-8b	128,000	73	72.6	84.9	-
mixtral-8x7b	32,000	70.6	40.2	60.4	-
jamba-instruct	256,000	68.2	40	59.9	-
jamba-1.5-mini	256,000	69.7	-	75.8	-
jamba-1.5-large	256,000	81.2	-	87	-
Snowflake Arctic	4,096	67.3	64.3	69.7	79
llama3.2-1b	128,000	49.3	-	44.4	-
llama3.2-3b	128,000	69.4	-	77.7	-
gemma-7b	8,000	64.3	32.3	46.4	-
mistral-7b	32,000	62.5	26.2	52.1	-
GPT 3.5 Turbo^*	4,097	70	48.1	57.1	-

Previous model versions¶

The Snowflake Cortex COMPLETE function also supports the following older model versions. We recommend using the latest model versions instead of the versions listed in this table.

Model	Context Window (Tokens)	MMLU (Reasoning)	HumanEval (Coding)	GSM8K (Arithmetic Reasoning)	Spider 1.0 (SQL)
mistral-large	32,000	81.2	45.1	81	81
llama-2-70b-chat	4,096	68.9	30.5	57.5	-

Using Snowflake Cortex AISQL with Python¶

You can use Snowflake Cortex AISQL functions in the Snowpark Python API. Within the API, you can use the functions to classify, summarize, and filter both text and image data.

These functions include the following:

Theai_agg() function aggregates a column of text using natural language instructions in a similar manner to how you would ask an analyst to summarize or extract findings from grouped or ungrouped data.

The following example summarizes customer reviews for each product using theai_agg() function. The function takes a column of text and a natural language instruction to summarize the reviews.

fromsnowflake.snowpark.functionsimportai_agg,coldf=session.create_dataframe([[1,"Excellent product!"],[1,"Great battery life."],[1,"A bit expensive but worth it."],[2,"Terrible customer service."],[2,"Won’t buy again."],],schema=["product_id","review"])# Summarize reviews per productsummary_df=df.group_by("product_id").agg(ai_agg(col("review"),"Summarize the customer reviews in one sentence."))summary_df.show()

Copy

Note

Use task descriptions that are detailed and centered around the use case. For example, “Summarize the customer feedback for an investor report”.

Theai_classify() function takes a text or image and classifies it into the categories that you define.

The following example classifies travel reviews into categories such as “travel” and “cooking”. The function takes a column of text and a list of categories to classify the text into.

fromsnowflake.snowpark.functionsimportai_classify,coldf=session.create_dataframe([["I dream of backpacking across South America."],["I made the best pasta yesterday."],],schema=["sentence"])df=df.select("sentence",ai_classify(col("sentence"),["travel","cooking"]).alias("classification"))df.show()

Copy

Note

You can provide up to 500 categories. You can classify both text and images.

Theai_filter() function evaluates a natural language condition and returnsTRUE orFALSE. You can use it to filter or tag rows.

fromsnowflake.snowpark.functionsimportai_filter,prompt,coldf=session.create_dataframe(["Canada","Germany","Japan"],schema=["country"])filtered_df=df.select("country",ai_filter(prompt("Is{0} in Asia?",col("country"))).alias("is_in_asia"))filtered_df.show()

Copy

Note

You can filter on both strings and files. For dynamic prompts, use the prompt() function.For more information, seeSnowpark Python reference.

Existing Snowpark ML functions are still supported inSnowpark ML version 1.1.2and later. SeeUsing Snowflake ML Locally for instructions on setting up Snowpark ML.

If you run your Python script outside of Snowflake, you must create a Snowpark session to use these functions. SeeConnecting to Snowflake for instructions.

The following Python example illustrates calling Snowflake Cortex AI functions on single values:

fromsnowflake.corteximportComplete,ExtractAnswer,Sentiment,Summarize,Translatetext="""    The Snowflake company was co-founded by Thierry Cruanes, Marcin Zukowski,    and Benoit Dageville in 2012 and is headquartered in Bozeman, Montana."""print(Complete("llama2-70b-chat","how do snowflakes get their unique patterns?"))print(ExtractAnswer(text,"When was snowflake founded?"))print(Sentiment("I really enjoyed this restaurant. Fantastic service!"))print(Summarize(text))print(Translate(text,"en","fr"))

Copy

You can pass options that affect the model’s hyperparameters when using the COMPLETE function. The following Python example illustrates calling the COMPLETE function with a modification of the maximum number of output tokens that the model can generate:

fromsnowflake.corteximportComplete,CompleteOptionsmodel_options1=CompleteOptions({'max_tokens':30})print(Complete("llama3.1-8b","how do snowflakes get their unique patterns?",options=model_options1))

Copy

You can also call an AI function on a table column, as shown below. This example requires a session object (stored insession) and a tablearticles containing a text columnabstract_text, and creates a new columnabstract_summary containing a summary of the abstract.

fromsnowflake.corteximportSummarizefromsnowflake.snowpark.functionsimportcolarticle_df=session.table("articles")article_df=article_df.withColumn("abstract_summary",Summarize(col("abstract_text")))article_df.collect()

Copy

Note

The advanced chat-style (multi-message) form of COMPLETE is not currently supported in Python.

Using Snowflake Cortex AI functions with Snowflake CLI¶

Snowflake Cortex AISQL is available inSnowflake CLI version 2.4.0and later. SeeIntroducing Snowflake CLI for more information about using Snowflake CLI.

The following examples illustrate using thesnowcortex commands on single values. The-c parameter specifies which connection to use.

Note

The advanced chat-style (multi-message) form of COMPLETE is not currently supported in Snowflake CLI.

snowcortexcomplete"Is 5 more than 4? Please answer using one word without a period."-c"snowhouse"

Copy

snowcortexextract-answer"what is snowflake?""snowflake is a company"-c"snowhouse"

Copy

snowcortexsentiment"Mary had a little Lamb"-c"snowhouse"

Copy

snowcortexsummarize"John has a car. John's car is blue. John's car is old and John is thinking about buying a new car. There are a lot of cars to choose from and John cannot sleep because it's an important decision for John."

Copy

snowcortextranslateherb--topl

Copy

You can also use files that contain the text you want to use for the commands. For this example, assume that the fileabout_cortex.txt contains the following content:

Snowflake Cortex gives you instant access to industry-leading large language models (LLMs) trained by researchers at companies like Anthropic, Mistral, Reka, Meta, and Google, including Snowflake Arctic, an open enterprise-grade model developed by Snowflake.Since these LLMs are fully hosted and managed by Snowflake, using them requires no setup. Your data stays within Snowflake, giving you the performance, scalability, and governance you expect.Snowflake Cortex features are provided as SQL functions and are also available in Python. The available functions are summarized below.COMPLETE: Given a prompt, returns a response that completes the prompt. This function accepts either a single prompt or a conversation with multiple prompts and responses.EMBED_TEXT_768: Given a piece of text, returns a vector embedding that represents that text.EXTRACT_ANSWER: Given a question and unstructured data, returns the answer to the question if it can be found in the data.SENTIMENT: Returns a sentiment score, from -1 to 1, representing the detected positive or negative sentiment of the given text.SUMMARIZE: Returns a summary of the given text.TRANSLATE: Translates given text from any supported language to any other.

You can then execute thesnowcortexsummarize command by passing in the filename using the--file parameter, as shown:

snowcortexsummarize--fileabout_cortex.txt

Copy

Snowflake Cortex offers instant access to industry-leading language models, including Snowflake Arctic, with SQL functions for completing prompts (COMPLETE), text embedding (EMBED\_TEXT\_768), extracting answers (EXTRACT\_ANSWER), sentiment analysis (SENTIMENT), summarizing text (SUMMARIZE), and translating text (TRANSLATE).

For more information about these commands, seesnow cortex commands.

Legal notices¶

The data classification of inputs and outputs are as set forth in the following table.

Input data classification	Output data classification	Designation
Usage Data	Customer Data	Generally available functions are Covered AI Features. Preview functions are Preview AI Features.[1]

[1]

Represents the defined term used in the AI Terms and Acceptable Use Policy.

For additional information, refer toSnowflake AI and ML.

Movatterモバイル変換

Snowflake Cortex AISQL (including LLM functions)¶

Available functions¶

AISQL functions¶

Helper functions¶

Cortex Guard¶

Performance considerations¶

Required privileges¶

Control model access¶

Model allowlist¶

Role based access control¶

Availability¶

Cost considerations¶

Track costs for AI services¶

Track credit consumption for AISQL functions¶

Usage quotas¶

Managing costs¶

Model restrictions¶

Choosing a model¶

Large models¶

Medium models¶

Small models¶

Previous model versions¶

Using Snowflake Cortex AISQL with Python¶

Using Snowflake Cortex AI functions with Snowflake CLI¶

Legal notices¶