Image captions
By default, the loader uses the pre-trained Salesforce BLIP image captioning model.
This notebook shows how to use the ImageCaptionLoader to generate a queryable index of image captions.
%pip install -qU transformers langchain_openai langchain_chroma
import getpass
import os
os.environ["OPENAI_API_KEY"] = getpass.getpass()
Prepare a list of image URLs from Wikimedia
from langchain_community.document_loaders import ImageCaptionLoader
list_image_urls = [
    "https://upload.wikimedia.org/wikipedia/commons/thumb/e/ec/Ara_ararauna_Luc_Viatour.jpg/1554px-Ara_ararauna_Luc_Viatour.jpg",
    "https://upload.wikimedia.org/wikipedia/commons/thumb/0/0c/1928_Model_A_Ford.jpg/640px-1928_Model_A_Ford.jpg",
]
API Reference: ImageCaptionLoader
Create the loader
loader = ImageCaptionLoader(images=list_image_urls)
list_docs = loader.load()
list_docs
[Document(metadata={'image_path': 'https://upload.wikimedia.org/wikipedia/commons/thumb/e/ec/Ara_ararauna_Luc_Viatour.jpg/1554px-Ara_ararauna_Luc_Viatour.jpg'}, page_content='an image of a bird flying in the air [SEP]'),
Document(metadata={'image_path': 'https://upload.wikimedia.org/wikipedia/commons/thumb/0/0c/1928_Model_A_Ford.jpg/640px-1928_Model_A_Ford.jpg'}, page_content='an image of a vintage car parked on the street [SEP]')]
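The captions above end with a literal "[SEP]" marker, which appears to be a separator token left over from the captioning model's tokenizer. If you would rather not index it, the following minimal sketch strips it. `CaptionDoc` and `strip_sep` are hypothetical names introduced here for illustration; `CaptionDoc` is only a stand-in for a LangChain `Document` so that the snippet runs without extra packages.

```python
from dataclasses import dataclass, field

SEP_TOKEN = "[SEP]"  # marker visible at the end of the captions in the output above


@dataclass
class CaptionDoc:
    """Hypothetical stand-in for a LangChain Document (page_content + metadata)."""

    page_content: str
    metadata: dict = field(default_factory=dict)


def strip_sep(caption: str) -> str:
    """Remove one trailing [SEP] marker and surrounding whitespace, if present."""
    caption = caption.strip()
    if caption.endswith(SEP_TOKEN):
        caption = caption[: -len(SEP_TOKEN)].rstrip()
    return caption


docs = [
    CaptionDoc("an image of a bird flying in the air [SEP]"),
    CaptionDoc("an image of a vintage car parked on the street [SEP]"),
]

# Build cleaned copies rather than mutating the originals.
cleaned = [CaptionDoc(strip_sep(d.page_content), d.metadata) for d in docs]
for d in cleaned:
    print(d.page_content)
```

With real `Document` objects, the same `strip_sep` helper could be applied to each `page_content` before the documents are split and indexed.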
import requests
from PIL import Image
Image.open(requests.get(list_image_urls[0], stream=True).raw).convert("RGB")
Create the index
from langchain_chroma import Chroma
from langchain_openai import OpenAIEmbeddings
from langchain_text_splitters import RecursiveCharacterTextSplitter
text_splitter = RecursiveCharacterTextSplitter(chunk_size=1000, chunk_overlap=200)
splits = text_splitter.split_documents(list_docs)
vectorstore = Chroma.from_documents(documents=splits, embedding=OpenAIEmbeddings())
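# Pass k through search_kwargs so the retriever returns at most 2 captions per query
retriever = vectorstore.as_retriever(search_kwargs={"k": 2})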
API Reference: OpenAIEmbeddings | RecursiveCharacterTextSplitter
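To give a feel for what the retriever does with the indexed captions, here is a toy sketch of top-k similarity search. It is not what Chroma or OpenAIEmbeddings compute: the "embedding" is a plain bag-of-words count, and `bag_of_words`, `cosine`, and `top_k` are hypothetical helpers written only for this illustration.

```python
import math
import re
from collections import Counter


def bag_of_words(text: str) -> Counter:
    """Toy 'embedding': lowercase word counts (an assumption, not a real embedding model)."""
    return Counter(re.findall(r"[a-z]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse word-count vectors."""
    dot = sum(a[w] * b[w] for w in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def top_k(query: str, captions: list[str], k: int = 2) -> list[str]:
    """Return the k captions most similar to the query, best match first."""
    q = bag_of_words(query)
    ranked = sorted(captions, key=lambda c: cosine(q, bag_of_words(c)), reverse=True)
    return ranked[:k]


captions = [
    "an image of a bird flying in the air",
    "an image of a vintage car parked on the street",
]
print(top_k("a car on the street", captions, k=1))
```

A real vector store follows the same overall shape (embed the query, score it against stored vectors, keep the best k), but uses learned dense embeddings, so it can also match queries that share no words with a caption.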
Query
from langchain.chains import create_retrieval_chain
from langchain.chains.combine_documents import create_stuff_documents_chain
from langchain_core.prompts import ChatPromptTemplate
from langchain_openai import ChatOpenAI
model = ChatOpenAI(model="gpt-4o", temperature=0)
system_prompt = (
    "You are an assistant for question-answering tasks. "
    "Use the following pieces of retrieved context to answer "
    "the question. If you don't know the answer, say that you "
    "don't know. Use three sentences maximum and keep the "
    "answer concise."
    "\n\n"
    "{context}"
)
prompt = ChatPromptTemplate.from_messages(
    [
        ("system", system_prompt),
        ("human", "{input}"),
    ]
)
question_answer_chain = create_stuff_documents_chain(model, prompt)
rag_chain = create_retrieval_chain(retriever, question_answer_chain)
response = rag_chain.invoke({"input": "What animals are in the images?"})
print(response["answer"])
The images include a bird.
response = rag_chain.invoke({"input": "What kind of images are there?"})
print(response["answer"])
There are images of a bird flying in the air and a vintage car parked on the street.
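The answers above depend on the retrieved captions being "stuffed" into the `{context}` slot of the system prompt. The sketch below shows that step in plain Python, under the assumption that the captions are simply joined with blank lines. `build_messages` is a hypothetical helper for illustration, not part of LangChain, and it leaves out the model call entirely.

```python
SYSTEM_TEMPLATE = (
    "You are an assistant for question-answering tasks. "
    "Use the following pieces of retrieved context to answer "
    "the question. If you don't know the answer, say that you "
    "don't know. Use three sentences maximum and keep the "
    "answer concise."
    "\n\n"
    "{context}"
)


def build_messages(question: str, retrieved: list[str]) -> list[tuple[str, str]]:
    """Fill the prompt with retrieved captions (simplified view of a 'stuff' chain)."""
    # Assumption: retrieved texts are concatenated with a blank line between them.
    context = "\n\n".join(retrieved)
    return [
        ("system", SYSTEM_TEMPLATE.format(context=context)),
        ("human", question),
    ]


messages = build_messages(
    "What animals are in the images?",
    [
        "an image of a bird flying in the air [SEP]",
        "an image of a vintage car parked on the street [SEP]",
    ],
)
for role, text in messages:
    print(f"{role}: {text}\n")
```

Because the model only ever sees these short captions, its answers can be no more detailed than the captions themselves, which is why the first answer mentions only "a bird".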
Related
- Document loader conceptual guide
- Document loader how-to guides