llm_embedding Function
The llm_embedding
function generates vector embeddings that represent the semantic meaning of text from specified table columns.
1. Simple Usage (without data)
1.1 Basic Embedding Generation
SELECT llm_embedding(
{'model_name': 'text-embedding-3-small', 'secret_name': 'embedding_secret'},
{'context_columns': [{'data': product_name}, {'data': product_description}]}
) AS product_embedding
FROM VALUES
('Wireless Headphones', 'Premium noise-cancelling headphones with 30-hour battery life'),
('Gaming Laptop', 'High-performance laptop with RTX graphics and 16GB RAM'),
('Smart Watch', 'Fitness tracker with heart rate monitor and GPS')
AS t(product_name, product_description);
Description: This example generates vector embeddings for each product, combining the product_name
and product_description
columns using the text-embedding-3-small
model. The output is a semantic vector that represents the content of the product's name and description.
1.2 Similarity Search
WITH sample_products AS (
SELECT *
FROM (VALUES
(1, 'Wireless Headphones', 'Premium noise-cancelling headphones with 30-hour battery life'),
(2, 'Bluetooth Earbuds', 'Compact wireless earbuds with charging case'),
(3, 'Gaming Laptop', 'High-performance laptop with RTX graphics and 16GB RAM'),
(4, 'Office Laptop', 'Lightweight laptop perfect for business and productivity')
) AS t(product_id, product_name, product_description)
),
product_embeddings AS (
SELECT
product_id,
product_name,
llm_embedding(
{'model_name': 'text-embedding-3-small'},
{'context_columns': [{'data': product_name}, {'data': product_description}]}
) AS product_embedding
FROM sample_products
)
SELECT
a.product_name,
b.product_name,
array_cosine_similarity(a.product_embedding::DOUBLE[1536], b.product_embedding::DOUBLE[1536]) AS similarity
FROM product_embeddings a
JOIN product_embeddings b
ON a.product_id != b.product_id
WHERE similarity > 0.8;
Description: This example demonstrates how to use the vector embeddings for similarity search. It calculates the cosine similarity between embeddings of different products to find similar items based on their semantic meaning. Only product pairs with a similarity greater than 0.8
are included.
2. Input Parameters
The llm_embedding
function accepts two primary inputs: model configuration and column mappings.
2.1 Model Configuration
- Parameter:
model_name
andsecret_name
2.1.1 Model Selection
- Description: Specifies the model used for text generation.
- Example:
{ 'model_name': 'gpt-4o' }
2.1.2 Model Selection with Secret
- Description: Specifies the model along with the secret name to be used for authentication when accessing the model.
- Example:
{ 'model_name': 'gpt-4o', 'secret_name': 'your_secret_name' }
2.2 Context Columns Configuration
- Parameter:
context_columns
array - Description: Specifies the text columns from the table to be passed to the model for embedding generation. Each column can have two properties:
data
: The SQL column data (required)name
: Custom name for the column (optional)
- Note: The
llm_embedding
function currently supports only text data (no image support) - Example:
{ 'context_columns': [{'data': product_name}, {'data': product_description}] }
3. Output
The function returns a JSON array containing floating-point numbers that represent the semantic vector of the input text.
Example Output:
For a product with the description "Wireless headphones with noise cancellation", the output might look like this:
[0.342, -0.564, 0.123, ..., 0.789]
This array of floating-point numbers encodes the semantic meaning of the product description in high-dimensional space.