
jina-ai/serve

☁️ Build multimodal AI applications with a cloud-native stack


Jina-serve is a framework for building and deploying AI services that communicate via gRPC, HTTP and WebSockets. Scale your services from local development to production while focusing on your core logic.

Key Features

  • Native support for all major ML frameworks and data types
  • High-performance service design with scaling, streaming, and dynamic batching
  • LLM serving with streaming output
  • Built-in Docker integration and Executor Hub
  • One-click deployment to Jina AI Cloud
  • Enterprise-ready with Kubernetes and Docker Compose support

Comparison with FastAPI

Key advantages over FastAPI:

  • DocArray-based data handling with native gRPC support (see the sketch after this list)
  • Built-in containerization and service orchestration
  • Seamless scaling of microservices
  • One-command cloud deployment
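
To illustrate the first point: a single DocArray schema can mix text, media URLs and tensors, and the same document travels over gRPC and HTTP alike. A minimal sketch (the class and field names are illustrative, not part of jina-serve):

```python
from typing import Optional

from docarray import BaseDoc
from docarray.typing import ImageUrl, NdArray


class ProductDoc(BaseDoc):
    title: str  # plain text field
    image: ImageUrl  # validated URL pointing at an image
    embedding: Optional[NdArray[512]] = None  # fixed-size vector field
```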

Install

```bash
pip install jina
```

See guides for Apple Silicon and Windows.

Core Concepts

Three main layers:

  • Data: BaseDoc and DocList for input/output
  • Serving: Executors process Documents, Gateway connects services
  • Orchestration: Deployments serve Executors, Flows create pipelines

Build AI Services

Let's create a gRPC-based AI service using StableLM:

```python
from jina import Executor, requests
from docarray import DocList, BaseDoc
from transformers import pipeline


class Prompt(BaseDoc):
    text: str


class Generation(BaseDoc):
    prompt: str
    text: str


class StableLM(Executor):
    def __init__(self, **kwargs):
        super().__init__(**kwargs)
        self.generator = pipeline(
            'text-generation', model='stabilityai/stablelm-base-alpha-3b'
        )

    @requests
    def generate(self, docs: DocList[Prompt], **kwargs) -> DocList[Generation]:
        generations = DocList[Generation]()
        prompts = docs.text
        llm_outputs = self.generator(prompts)
        for prompt, output in zip(prompts, llm_outputs):
            generations.append(Generation(prompt=prompt, text=output))
        return generations
```

Deploy with Python or YAML:

```python
from jina import Deployment
from executor import StableLM

dep = Deployment(uses=StableLM, timeout_ready=-1, port=12345)

with dep:
    dep.block()
```

```yaml
jtype: Deployment
with:
  uses: StableLM
  py_modules:
    - executor.py
  timeout_ready: -1
  port: 12345
```

Use the client:

```python
from jina import Client
from docarray import DocList
from executor import Prompt, Generation

prompt = Prompt(text='suggest an interesting image generation prompt')
client = Client(port=12345)
response = client.post('/', inputs=[prompt], return_type=DocList[Generation])
```

Build Pipelines

Chain services into a Flow:

```python
from jina import Flow

flow = Flow(port=12345).add(uses=StableLM).add(uses=TextToImage)

with flow:
    flow.block()
```
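
The `TextToImage` Executor chained above is assumed to exist already. A minimal sketch of what it could look like, using Hugging Face diffusers (the model choice and output schema are illustrative assumptions):

```python
import numpy as np
from docarray import DocList
from docarray.documents import ImageDoc
from jina import Executor, requests

from executor import Generation  # the output schema defined above


class TextToImage(Executor):
    def __init__(self, **kwargs):
        super().__init__(**kwargs)
        import torch
        from diffusers import StableDiffusionPipeline

        # Illustrative model choice; any text-to-image pipeline fits here.
        self.pipe = StableDiffusionPipeline.from_pretrained(
            'CompVis/stable-diffusion-v1-4', torch_dtype=torch.float16
        ).to('cuda')

    @requests
    def generate_image(
        self, docs: DocList[Generation], **kwargs
    ) -> DocList[ImageDoc]:
        # Run the pipeline on the prompts produced upstream and wrap the
        # resulting PIL images as ImageDoc tensors.
        images = self.pipe([doc.text for doc in docs]).images
        return DocList[ImageDoc]([ImageDoc(tensor=np.array(img)) for img in images])
```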

Scaling and Deployment

Local Scaling

Boost throughput with built-in features:

  • Replicas for parallel processing
  • Shards for data partitioning
  • Dynamic batching for efficient model inference

Example scaling a Stable Diffusion deployment:

```yaml
jtype: Deployment
with:
  uses: TextToImage
  timeout_ready: -1
  py_modules:
    - text_to_image.py
  env:
    CUDA_VISIBLE_DEVICES: RR  # round-robin GPU assignment across replicas
  replicas: 2
  uses_dynamic_batching:
    /default:
      preferred_batch_size: 10
      timeout: 200
```
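
The same options can be set through the Python API. A sketch mirroring the YAML above, assuming `TextToImage` lives in `text_to_image.py` and that `uses_dynamic_batching` accepts the same keys as its YAML form:

```python
from jina import Deployment

from text_to_image import TextToImage

dep = Deployment(
    uses=TextToImage,
    timeout_ready=-1,
    env={'CUDA_VISIBLE_DEVICES': 'RR'},  # round-robin GPU assignment
    replicas=2,  # two parallel copies of the Executor
    uses_dynamic_batching={'/default': {'preferred_batch_size': 10, 'timeout': 200}},
)

with dep:
    dep.block()
```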

Cloud Deployment

Containerize Services

1. Structure your Executor:

```text
TextToImage/
├── executor.py
├── config.yml
├── requirements.txt
```

2. Configure:

```yaml
# config.yml
jtype: TextToImage
py_modules:
  - executor.py
metas:
  name: TextToImage
  description: Text to Image generation Executor
```

3. Push to Hub:

```bash
jina hub push TextToImage
```
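
Once pushed, the containerized Executor can be pulled straight from Executor Hub inside a Flow. A sketch, assuming the `jinaai+docker://` scheme with `<your-username>` as a placeholder for your Hub namespace:

```python
from jina import Flow

# Pull and run the containerized Executor from Executor Hub.
flow = Flow(port=12345).add(uses='jinaai+docker://<your-username>/TextToImage')

with flow:
    flow.block()
```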

Deploy to Kubernetes

```bash
jina export kubernetes flow.yml ./my-k8s
kubectl apply -R -f my-k8s
```

Use Docker Compose

```bash
jina export docker-compose flow.yml docker-compose.yml
docker-compose up
```

JCloud Deployment

Deploy with a single command:

```bash
jina cloud deploy jcloud-flow.yml
```
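
JCloud consumes standard Flow YAML. A minimal sketch of what `jcloud-flow.yml` could contain (the Hub reference is a placeholder):

```yaml
# jcloud-flow.yml: a minimal sketch using the standard Flow schema
jtype: Flow
executors:
  - uses: jinaai+docker://<your-username>/TextToImage
```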

LLM Streaming

Enable token-by-token streaming for responsive LLM applications:

1. Define schemas:

```python
from docarray import BaseDoc


class PromptDocument(BaseDoc):
    prompt: str
    max_tokens: int


class ModelOutputDocument(BaseDoc):
    token_id: int
    generated_text: str
```
2. Initialize the service:

```python
from jina import Executor, requests
from transformers import GPT2Tokenizer, GPT2LMHeadModel

# Module-level tokenizer, shared with the streaming endpoint below.
tokenizer = GPT2Tokenizer.from_pretrained('gpt2')


class TokenStreamingExecutor(Executor):
    def __init__(self, **kwargs):
        super().__init__(**kwargs)
        self.model = GPT2LMHeadModel.from_pretrained('gpt2')
```
3. Implement streaming:

```python
import torch


# This method belongs to TokenStreamingExecutor.
@requests(on='/stream')
async def task(self, doc: PromptDocument, **kwargs) -> ModelOutputDocument:
    input = tokenizer(doc.prompt, return_tensors='pt')
    input_len = input['input_ids'].shape[1]

    for _ in range(doc.max_tokens):
        output = self.model.generate(**input, max_new_tokens=1)
        if output[0][-1] == tokenizer.eos_token_id:
            break
        yield ModelOutputDocument(
            token_id=output[0][-1],
            generated_text=tokenizer.decode(
                output[0][input_len:], skip_special_tokens=True
            ),
        )
        # Feed the generated token back in for the next step.
        input = {
            'input_ids': output,
            'attention_mask': torch.ones(1, len(output[0])),
        }
```
4. Serve and use:

```python
# Server
from jina import Deployment

with Deployment(uses=TokenStreamingExecutor, port=12345, protocol='grpc') as dep:
    dep.block()
```

```python
# Client
import asyncio

from jina import Client


async def main():
    client = Client(port=12345, protocol='grpc', asyncio=True)
    async for doc in client.stream_doc(
        on='/stream',
        inputs=PromptDocument(prompt='what is the capital of France ?', max_tokens=10),
        return_type=ModelOutputDocument,
    ):
        print(doc.generated_text)


asyncio.run(main())
```

Support

Jina-serve is backed by Jina AI and licensed under Apache-2.0.

