
Support image output #1130


Draft


Kludex wants to merge 1 commit into main from playing-with-gemini-images-output

+61 −2


Member

Kludex commented Mar 15, 2025 • edited

Closes Support for model Gemini Flash 2.0 Image Generation #1126

Still a lot to do and decide... It's still not type-safe, and it can't use `message_history` properly.

The `main.py` in the files already works, though.

Support image output

e8ba35b

Kludex marked this pull request as draft

March 15, 2025 13:20

github-actions bot commented Mar 15, 2025

Docs Preview

commit:	`e8ba35b`
Preview URL:	https://6c9c1503-pydantic-ai-previews.pydantic.workers.dev

Kludex mentioned this pull request

Mar 16, 2025

Support for model Gemini Flash 2.0 Image Generation #1126

Open

Contributor

DouweM commented Apr 30, 2025

@Kludex Are you planning to work on this or are we better off closing it for now?

DouweM assigned Kludex

Apr 30, 2025

Member Author

Kludex commented Apr 30, 2025

This is still on my radar; I'd prefer to keep it open.

ollz272 commented Jun 30, 2025

Hi, I would love access to this feature. Is there an ETA?

DouweM mentioned this pull request

Jul 7, 2025

ModelResponsePart should support multimodal data #2140

Open

lshamis commented Jul 7, 2025

I think something like this will be necessary sooner rather than later. Many models can or will generate interleaved multimodal content.

Slightly philosophical question, but why are the output types of an LLM different from those of ToolCall?

Contributor

DouweM commented Jul 7, 2025

> Slightly philosophical question, but why are the output types of an LLM different from those of ToolCall?

@lshamis Because the types of data LLMs support as input (whether via the user prompt or as a tool call result) are not the same as the types of data they can output. For example, all models support text input and text output, and many support image, video, audio, and document input, but only a handful support image output, and as far as I know none can output, for example, PDF files. So there's necessarily a difference between what we allow tools to output (anything that can be sent back to the model as input) and what models themselves can output.
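To make the asymmetry concrete, here is a minimal sketch in plain Python. Every name in it (`Text`, `Binary`, the two capability tuples, and both helper functions) is hypothetical and invented for illustration; none of it is pydantic-ai's actual API, and the capability lists describe an imaginary model rather than any real one.

```python
from dataclasses import dataclass
from typing import Union


@dataclass
class Text:
    """Hypothetical text part."""
    content: str


@dataclass
class Binary:
    """Hypothetical binary part, identified by its media type."""
    data: bytes
    media_type: str


Part = Union[Text, Binary]

# Assumed capabilities of an imaginary model: it accepts several binary
# media types as input, but can only produce images as binary output.
INPUT_MEDIA_PREFIXES = ("image/", "audio/", "video/", "application/pdf")
OUTPUT_MEDIA_PREFIXES = ("image/",)


def allowed_as_tool_result(part: Part) -> bool:
    """A tool may return anything that can be sent back to the model as input."""
    if isinstance(part, Text):
        return True
    return part.media_type.startswith(INPUT_MEDIA_PREFIXES)


def allowed_as_model_output(part: Part) -> bool:
    """The model itself can only emit the narrower set of output types."""
    if isinstance(part, Text):
        return True
    return part.media_type.startswith(OUTPUT_MEDIA_PREFIXES)


pdf = Binary(data=b"%PDF-", media_type="application/pdf")
png = Binary(data=b"\x89PNG", media_type="image/png")
```

In this sketch, `pdf` passes `allowed_as_tool_result` but fails `allowed_as_model_output`, while `png` and any `Text` part pass both, which is the gap between tool return types and model output types described above.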

DouweM mentioned this pull request

Jul 18, 2025

BinaryContent returned by a tool is replaced with e4fcfe by agent #2243

Closed

2 tasks

Labels

None yet

4 participants
