🖼️ Classify images and extract data from or describe their contents using machine learning


dwyl/image-classifier


Let's use Elixir machine learning capabilities to build an application that performs image captioning and semantic search to look for uploaded images with your voice! 🎙️



Why? 🤷

Whilst building our app, we consider images an essential medium of communication.

We needed a fully-offline capable (no 3rd party APIs/services) image captioning service using state-of-the-art pre-trained image and embedding models to describe images uploaded in our App.

By adding a way of captioning images, we make it easy for people to suggest meta tags that describe images so they become searchable.

What? 💭

A step-by-step tutorial building a fully functional Phoenix LiveView web application that allows anyone to upload an image and have it described and made searchable.

In addition to this, the app will allow the person to record an audio clip which describes the image they want to find.

The audio will be transcribed into text and be semantically queryable. We do this by encoding the image captions as vectors and running k-NN search on them.

We'll be using three different models:

Who? 👤

This tutorial is aimed at Phoenix beginners who want to start exploring the machine-learning capabilities of the Elixir language within a Phoenix application. We propose to use pre-trained models from Hugging Face via Bumblebee and grasp how to:

If you are completely new to Phoenix and LiveView, we recommend you follow the LiveView Counter Tutorial:

dwyl/phoenix-liveview-counter-tutorial

How? 💻

In these chapters, we'll go over the development process of this small application. You'll learn how to do this yourself, so grab some coffee and let's get cracking!

This guide is divided into two parts. The first goes over image captioning, while the second expands the application by adding semantic search.

Prerequisites

This tutorial requires you to have Elixir and Phoenix installed.

If you don't, please see how to install Elixir and Phoenix.

This guide assumes you know the basics of Phoenix and have some knowledge of how it works. If you don't, we highly suggest you follow our other tutorials first, e.g. github.com/dwyl/phoenix-chat-example.

In addition to this, some knowledge of AWS - what it is, what an S3 bucket is/does - is assumed.

Note

If you have questions or get stuck, please open an issue: github.com/dwyl/image-classifier/issues

🌄 Image Captioning in Elixir

In this section, we'll start building our application with Bumblebee, which supports Transformer models. At the end of this section, you'll have a fully functional application that receives an image, processes it accordingly and captions it.

0. Creating a fresh Phoenix project

Let's create a fresh Phoenix project. Run the following command in a given folder:

mix phx.new . --app app --no-dashboard --no-ecto --no-gettext --no-mailer

We're running mix phx.new to generate a new project without a dashboard, Ecto, gettext or a mailer (email) service, since we don't need those features in our project.

After this, if you run mix phx.server to run your server, you should be able to see the following page.

We're ready to start building.

1. Installing initial dependencies

Now that we're ready to go, let's start by adding some dependencies.

Head over to mix.exs and add the following dependencies to the deps section.

{:bumblebee, "~> 0.5.0"},
{:exla, "~> 0.7.0"},
{:nx, "~> 0.7.0"},
{:hnswlib, "~> 0.1.5"},
# vix is used for the image processing steps later in this guide (version approximate)
{:vix, "~> 0.23"},
  • bumblebee is a framework that allows us to integrate Transformer models in Phoenix. The Transformers (from Hugging Face) are APIs that allow us to easily download and use pre-trained models. The Bumblebee package aims to support all Transformer models, even if some are still lacking. You may check which ones are supported by visiting Bumblebee's repository or by visiting https://jonatanklosko-bumblebee-tools.hf.space/apps/repository-inspector and checking if the model is currently supported.

  • Nx is a library that allows us to work with Numerical Elixir, Elixir's way of doing numerical computing. It supports tensors and numerical computations.

  • EXLA is the Elixir implementation of Google's XLA, a compiler that provides faster linear algebra calculations with TensorFlow models. This backend compiler is needed for Nx. We are installing EXLA because it allows us to compile models just-in-time and run them on CPU and/or GPU.

  • Vix is an Elixir extension for libvips, an image processing library.

In config/config.exs, let's add our :nx configuration to use EXLA.

config :nx, default_backend: EXLA.Backend

2. Adding LiveView capabilities to our project

As it stands, our project is not using LiveView. Let's fix this. LiveView launches a super-powered process that establishes a WebSocket connection between the server and the browser.

In lib/app_web/router.ex, change the scope "/" to the following.

scope "/", AppWeb do
  pipe_through :browser

  live "/", PageLive
end

Instead of using the PageController, we are going to be creating PageLive, a LiveView file.

Let's create our LiveView files. Inside lib/app_web, create a folder called live and create the following file page_live.ex.

# /lib/app_web/live/page_live.ex
defmodule AppWeb.PageLive do
  use AppWeb, :live_view

  @impl true
  def mount(_params, _session, socket) do
    {:ok, socket}
  end
end

This is a simple LiveView controller.

In the same live folder, create a file called page_live.html.heex and use the following code.

<divclass="h-full w-full px-4 py-10 flex justify-center sm:px-6 sm:py-28 lg:px-8 xl:px-28 xl:py-32"><divclass="flex justify-center items-center mx-auto max-w-xl w-[50vw] lg:mx-0"><form><divclass="space-y-12"><div><h2class="text-base font-semibold leading-7 text-gray-900">            Image Classifier</h2><pclass="mt-1 text-sm leading-6 text-gray-600">            Drag your images and we'll run an AI model to caption it!</p><divclass="mt-10 grid grid-cols-1 gap-x-6 gap-y-8 sm:grid-cols-6"><divclass="col-span-full"><divclass="mt-2 flex justify-center rounded-lg border border-dashed border-gray-900/25 px-6 py-10"><divclass="text-center"><svgclass="mx-auto h-12 w-12 text-gray-300"viewBox="0 0 24 24"fill="currentColor"aria-hidden="true"><pathfill-rule="evenodd"d="M1.5 6a2.25 2.25 0 012.25-2.25h16.5A2.25 2.25 0 0122.5 6v12a2.25 2.25 0 01-2.25 2.25H3.75A2.25 2.25 0 011.5 18V6zM3 16.06V18c0 .414.336.75.75.75h16.5A.75.75 0 0021 18v-1.94l-2.69-2.689a1.5 1.5 0 00-2.12 0l-.88.879.97.97a.75.75 0 11-1.06 1.06l-5.16-5.159a1.5 1.5 0 00-2.12 0L3 16.061zm10.125-7.81a1.125 1.125 0 112.25 0 1.125 1.125 0 01-2.25 0z"clip-rule="evenodd"/></svg><divclass="mt-4 flex text-sm leading-6 text-gray-600"><labelfor="file-upload"class="relative cursor-pointer rounded-md bg-white font-semibold text-indigo-600 focus-within:outline-none focus-within:ring-2 focus-within:ring-indigo-600 focus-within:ring-offset-2 hover:text-indigo-500"><span>Upload a file</span><inputid="file-upload"name="file-upload"type="file"class="sr-only"/></label><pclass="pl-1">or drag and drop</p></div><pclass="text-xs leading-5 text-gray-600">                    PNG, JPG, GIF up to 5MB</p></div></div></div></div></div></div></form></div></div>

This is a simple HTML form that uses Tailwind CSS to enhance the presentation of the upload form. We'll also remove the unused header of the page layout while we're at it.

Locate the file lib/app_web/components/layouts/app.html.heex and remove the <header> block. The file should only have the following code:

<main class="px-4 py-20 sm:px-6 lg:px-8">
  <div class="mx-auto max-w-2xl">
    <.flash_group flash={@flash} />
    <%= @inner_content %>
  </div>
</main>

Now you can safely delete the lib/app_web/controllers folder, which is no longer used.

If you run mix phx.server, you should see the following screen:

This means we've successfully added LiveView and changed our view!

3. Receiving image files

Now, let's start by receiving some image files.

With LiveView, we can easily do this by using allow_upload/3 when mounting our LiveView. With this function, we can easily accept file uploads with progress. We can define file types, max number of entries, max file size, validate the uploaded file and much more!

Firstly, let's make some changes to lib/app_web/live/page_live.html.heex.

<divclass="h-full w-full px-4 py-10 flex justify-center sm:px-6 sm:py-28 lg:px-8 xl:px-28 xl:py-32"><divclass="flex justify-center items-center mx-auto max-w-xl w-[50vw] lg:mx-0"><divclass="space-y-12"><divclass="border-gray-900/10 pb-12"><h2class="text-base font-semibold leading-7 text-gray-900">          Image Classification</h2><pclass="mt-1 text-sm leading-6 text-gray-600">          Do simple captioning with this<ahref="https://hexdocs.pm/phoenix_live_view/Phoenix.LiveView.html"class="font-mono font-medium text-sky-500">LiveView</a>          demo, powered by<ahref="https://github.com/elixir-nx/bumblebee"class="font-mono font-medium text-sky-500">Bumblebee</a>.</p><!-- File upload section --><divclass="mt-10 grid grid-cols-1 gap-x-6 gap-y-8 sm:grid-cols-6"><divclass="col-span-full"><divclass="mt-2 flex justify-center rounded-lg border border-dashed border-gray-900/25 px-6 py-10"phx-drop-target="{@uploads.image_list.ref}"><divclass="text-center"><svgclass="mx-auto h-12 w-12 text-gray-300"viewBox="0 0 24 24"fill="currentColor"aria-hidden="true"><pathfill-rule="evenodd"d="M1.5 6a2.25 2.25 0 012.25-2.25h16.5A2.25 2.25 0 0122.5 6v12a2.25 2.25 0 01-2.25 2.25H3.75A2.25 2.25 0 011.5 18V6zM3 16.06V18c0 .414.336.75.75.75h16.5A.75.75 0 0021 18v-1.94l-2.69-2.689a1.5 1.5 0 00-2.12 0l-.88.879.97.97a.75.75 0 11-1.06 1.06l-5.16-5.159a1.5 1.5 0 00-2.12 0L3 16.061zm10.125-7.81a1.125 1.125 0 112.25 0 1.125 1.125 0 01-2.25 0z"clip-rule="evenodd"/></svg><divclass="mt-4 flex text-sm leading-6 text-gray-600"><labelfor="file-upload"class="relative cursor-pointer rounded-md bg-white font-semibold text-indigo-600 focus-within:outline-none focus-within:ring-2 focus-within:ring-indigo-600 focus-within:ring-offset-2 hover:text-indigo-500"><formphx-change="validate"phx-submit="save"><labelclass="cursor-pointer"><.live_file_input upload={@uploads.image_list}/> Upload</label></form></label><pclass="pl-1">or drag and drop</p></div><pclass="text-xs leading-5 text-gray-600">                  PNG, JPG, GIF up to 5MB</p></div></div></div></div></div></div></div></div>

We've added a few features:

  • used <.live_file_input/> for the file upload input.
  • added the phx-drop-target attribute so people can drag and drop images into the upload area.
  • bound the phx-change="validate" and phx-submit="save" events on the form.

Because we've added these bindings, we need to add the event handlers in lib/app_web/live/page_live.ex. Open it and update it to:

defmodule AppWeb.PageLive do
  use AppWeb, :live_view

  @impl true
  def mount(_params, _session, socket) do
    {:ok,
     socket
     |> assign(label: nil, upload_running?: false, task_ref: nil)
     |> allow_upload(:image_list,
       accept: ~w(image/*),
       auto_upload: true,
       progress: &handle_progress/3,
       max_entries: 1,
       chunk_size: 64_000
     )}
  end

  @impl true
  def handle_event("validate", _params, socket) do
    {:noreply, socket}
  end

  @impl true
  def handle_event("remove-selected", %{"ref" => ref}, socket) do
    {:noreply, cancel_upload(socket, :image_list, ref)}
  end

  @impl true
  def handle_event("save", _params, socket) do
    {:noreply, socket}
  end

  defp handle_progress(:image_list, entry, socket) when entry.done? do
    uploaded_file =
      consume_uploaded_entry(socket, entry, fn %{path: _path} = _meta ->
        {:ok, entry}
      end)

    {:noreply, socket}
  end

  defp handle_progress(:image_list, _, socket), do: {:noreply, socket}
end

  • in mount/3, we are creating three socket assigns: label pertains to the model prediction; upload_running? is a boolean referring to whether the model is running or not; task_ref refers to the reference of the task that was created for image classification (we'll delve into this further later down the line). Additionally, we are using the allow_upload/3 function to define our upload configuration. The most important settings here are auto_upload set to true and the progress field. By configuring these two properties, we are telling LiveView that whenever the person uploads a file, it is processed immediately and consumed.

  • the progress field is handled by the handle_progress/3 function. It receives chunks from the client with a built-in UploadWriter function (as explained in the docs). When the chunks are all consumed, we get the boolean entry.done? == true. We consume the file in this function by using consume_uploaded_entry/3. The anonymous function returns {:ok, data} or {:postpone, message}. Whilst consuming the entry/file, we can access its path and then use its content. For now, we don't need to use it. But we will in the future to feed our image classifier with it! After the callback function is executed, this function "consumes the entry", essentially deleting the image from the temporary folder and removing it from the list of uploaded files.

  • the "validate", "remove-selected" and "save" event handlers are called whenever the person uploads an image, wants to remove it from the list of uploaded images, and wants to submit the form, respectively. You may see that we're not doing much with these handlers; we're simply replying with :noreply because we don't need to do anything with them.

And that's it! If you run mix phx.server, nothing will change.

4. Integrating Bumblebee

Now here comes the fun part! It's time to do some image captioning! 🎉

4.1 Nx configuration

We first need to add some initial setup in the lib/app/application.ex file. Head over there and change the start function like so:

@impl true
def start(_type, _args) do
  children = [
    # Start the Telemetry supervisor
    AppWeb.Telemetry,
    # Start the PubSub system
    {Phoenix.PubSub, name: App.PubSub},
    {Nx.Serving, serving: serving(), name: ImageClassifier},
    # Start the Endpoint (http/https)
    AppWeb.Endpoint
  ]

  opts = [strategy: :one_for_one, name: App.Supervisor]
  Supervisor.start_link(children, opts)
end

def serving do
  {:ok, model_info} = Bumblebee.load_model({:hf, "microsoft/resnet-50"})
  {:ok, featurizer} = Bumblebee.load_featurizer({:hf, "microsoft/resnet-50"})

  Bumblebee.Vision.image_classification(model_info, featurizer,
    top_k: 1,
    compile: [batch_size: 10],
    defn_options: [compiler: EXLA]
  )
end

We are using Nx.Serving, which simply allows us to encapsulate tasks; it can be networking, machine learning, data processing or any other task.

In this specific case, we are using it to batch requests. This is extremely useful and important because we are using models that typically run on a GPU. The GPU is really good at parallelizing tasks. Therefore, instead of sending image classification requests one by one, we can batch/bundle them together as much as we can and then send them over.

We can define the batch_size and batch_timeout with Nx.Serving. We're going to use the default values, hence why we're not explicitly defining them.
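If you ever do need to tune these, they can be passed in the child spec. Here's a minimal sketch; the values are illustrative, not recommendations:

# Hypothetical explicit batching options in lib/app/application.ex
{Nx.Serving,
 serving: serving(),
 name: ImageClassifier,
 # group up to 4 requests into a single model run...
 batch_size: 4,
 # ...or flush the batch after 100 ms, whichever happens first
 batch_timeout: 100},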

With Nx.Serving, we define a serving/0 function that is then used by it, which in turn is executed in the supervision tree, since we declare it as a child in the Application module.

In the serving/0 function, we are loading the ResNet-50 model and its featurizer.

Note

A featurizer can be seen as a Feature Extractor. It is essentially a component that is responsible for converting input data into a format that can be processed by a pre-trained model.

It takes raw information and performs various transformations, such as tokenization, padding, and encoding, to prepare the data for model training or inference.
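As a purely illustrative aside (our app never calls the featurizer directly; Nx.Serving does this for us inside the serving pipeline), you could exercise a featurizer on its own in iex like so, where image is assumed to be an image tensor such as the output of the pre_process_image/1 function we write later:

{:ok, featurizer} = Bumblebee.load_featurizer({:hf, "microsoft/resnet-50"})

# Resizes/normalizes `image` into the input map the model expects
inputs = Bumblebee.apply_featurizer(featurizer, image)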

Lastly, this function returns a serving for image classification by calling image_classification/3, where we can define our compiler and task batch size. We gave our serving the name ImageClassifier, as declared in the Application module.

4.2 Async processing the image for classification

Now we're ready to send the image to the model and get a prediction from it!

Every time we upload an image, we are going to run async processing. This means that the task responsible for image classification will run in another process, thus asynchronously, meaning that the LiveView won't have to wait for this task to finish to continue working.

For this scenario, we are going to be using the Task module to spawn processes to complete this task.

Go to lib/app_web/live/page_live.ex and change the following code.

def handle_progress(:image_list, entry, socket) when entry.done? do
  # Consume the entry and get the tensor to feed to the classifier
  tensor =
    consume_uploaded_entry(socket, entry, fn %{} = meta ->
      {:ok, vimage} = Vix.Vips.Image.new_from_file(meta.path)
      pre_process_image(vimage)
    end)

  # Create an async task to classify the image
  task = Task.async(fn -> Nx.Serving.batched_run(ImageClassifier, tensor) end)

  # Update socket assigns to show spinner whilst task is running
  {:noreply, assign(socket, upload_running?: true, task_ref: task.ref)}
end

@impl true
def handle_info({ref, result}, %{assigns: %{task_ref: ref}} = socket) do
  # This is called every time an async task completes.
  # We stop monitoring the task and flush its exit message here.
  Process.demonitor(ref, [:flush])

  # And then destructure the result from the classifier.
  %{predictions: [%{label: label}]} = result

  # Update the socket assigns with the result and stop the spinner.
  {:noreply, assign(socket, label: label, upload_running?: false)}
end

Note

The pre_process_image/1 function is yet to be defined. We'll do that in the following section.

In the handle_progress/3 function, whilst we are consuming the image, we first convert it to a Vix.Vips.Image struct using the file path. We then feed this image to the pre_process_image/1 function that we'll implement later.

What's important is to notice this line:

task = Task.async(fn -> Nx.Serving.batched_run(ImageClassifier, tensor) end)

We are using Task.async/1 to call the Nx.Serving process named ImageClassifier that we've defined earlier, thus initiating a batched run with the image tensor. While the task is spawned, we update the socket assigns with the reference to the task (:task_ref) and set the :upload_running? assign to true, so we can show a spinner or a loading animation.

When the task is spawned using Task.async/1, a couple of things happen in the background. The new process is monitored by the caller (our LiveView), which means that the caller will receive a {:DOWN, ref, :process, object, reason} message once the process it is monitoring dies. In addition, a link is created between both processes.

Therefore, we don't need to use Task.await/2. Instead, we create a new handler to receive the aforementioned message. That's what we're doing in the handle_info({ref, result}, %{assigns: %{task_ref: ref}} = socket) function. The received message contains a {ref, result} tuple, where ref is the monitor's reference. We use this reference to stop monitoring the task, since we received the result we needed and can safely discard the exit message.

In this same function, we destructure the prediction from the model output, assign it to the :label socket assign and set :upload_running? to false.

Quite beautiful, isn't it? With this, we don't have to worry if the person closes the browser tab. The process dies (as does our LiveView), and the work is automatically cancelled, meaning no resources are spent on a process for which nobody expects a result anymore.

4.2.1 Considerations regarding async processes

When a task is spawned using Task.async/1, it is linked to the caller. This means that they're related: if one dies, the other does too.

We ought to take this into account when developing our application. If we don't have control over the result of the task, and we don't want our LiveView to crash if the task crashes, we must use a different way of spawning our task - Task.Supervisor.async_nolink/3 can be used for this effect, meaning we can use it if we want to make sure our LiveView won't die and the error is reported, even if the task crashes. A sketch of that alternative is shown below.
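Here's a minimal sketch of what that could look like, assuming the App.TaskSupervisor we add later in this chapter and Logger already required at the top of the module; the extra handle_info/2 clause is only illustrative:

# Spawn the task without linking it to the LiveView process
task =
  Task.Supervisor.async_nolink(App.TaskSupervisor, fn ->
    Nx.Serving.batched_run(ImageClassifier, tensor)
  end)

# A successful run still arrives as a {ref, result} message, as before.
# If the task crashes, the LiveView keeps running and receives a :DOWN message instead,
# which you could handle with an extra clause such as:
def handle_info({:DOWN, ref, :process, _pid, reason}, %{assigns: %{task_ref: ref}} = socket) do
  Logger.error("Classification task failed: #{inspect(reason)}")
  {:noreply, assign(socket, upload_running?: false, task_ref: nil)}
end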

We've chosen Task.async/1 for this very reason. We are doing something that takes time/is expensive, and we want to stop the task if the LiveView is closed or crashes. However, if you are building something like a report that has to be generated even if the person closes the browser tab, this is not the right solution.

4.2.2 Alternative for better testing

We are spawning async tasks by calling Task.async/1. This creates an unsupervised task. Although that's plausible for this simple app, it's best for us to create a Supervisor that manages its child tasks. This gives us more control over the execution and lifetime of the child tasks.

Additionally, it's better to have these tasks supervised because it makes it possible to create tests for our LiveView. For this, we need to make a couple of changes.

First, head over to lib/app/application.ex and add a supervisor to the start/2 function's children list.

def start(_type, _args) do
  children = [
    AppWeb.Telemetry,
    {Phoenix.PubSub, name: App.PubSub},
    {Nx.Serving, serving: serving(), name: ImageClassifier},
    # add this line
    {Task.Supervisor, name: App.TaskSupervisor},
    AppWeb.Endpoint
  ]

  opts = [strategy: :one_for_one, name: App.Supervisor]
  Supervisor.start_link(children, opts)
end

We are creating a Task.Supervisor with the name App.TaskSupervisor.

Now, in lib/app_web/live/page_live.ex, we create the async task like so:

task =
  Task.Supervisor.async(App.TaskSupervisor, fn ->
    Nx.Serving.batched_run(ImageClassifier, tensor)
  end)

We are now using Task.Supervisor.async, passing the name of the supervisor defined earlier.

And that's it! We are creating async tasks like before; the only difference is that they're now supervised.

In tests, you can create a small module that waits for the tasks to be completed.

defmodule AppWeb.SupervisorSupport do
  @moduledoc """
    This is a support module helper that is meant to wait for all the children of a supervisor to complete.
    If you go to `lib/app/application.ex`, you'll see that we created a `TaskSupervisor`, where async tasks are spawned.
    This module helps us to wait for all the children to finish during tests.
  """

  @doc """
    Find all children spawned by this supervisor and wait until they finish.
  """
  def wait_for_completion() do
    pids = Task.Supervisor.children(App.TaskSupervisor)
    Enum.each(pids, &Process.monitor/1)
    wait_for_pids(pids)
  end

  defp wait_for_pids([]), do: nil

  defp wait_for_pids(pids) do
    receive do
      {:DOWN, _ref, :process, pid, _reason} -> wait_for_pids(List.delete(pids, pid))
    end
  end
end

You can call AppWeb.SupervisorSupport.wait_for_completion() in unit tests so they wait for the tasks to complete. In our case, we do that until the prediction is made.
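For instance, a LiveView test could look roughly like this. This is a sketch assuming `use AppWeb.ConnCase` and `import Phoenix.LiveViewTest`; the form id matches the one we add later in this guide and the fixture path is hypothetical:

test "uploading an image eventually shows a prediction", %{conn: conn} do
  {:ok, lv, _html} = live(conn, "/")

  # Simulate an upload through the form (the fixture file is hypothetical)
  image =
    file_input(lv, "#upload-form", :image_list, [
      %{
        name: "test.png",
        content: File.read!("test/fixtures/test.png"),
        type: "image/png"
      }
    ])

  render_upload(image, "test.png")

  # Wait for the supervised classification task to finish before asserting
  AppWeb.SupervisorSupport.wait_for_completion()

  assert render(lv) =~ "Description:"
end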

4.3 Image pre-processing

As we've noted before, we need to pre-process the image before passing it to the model. For this, we have three main steps:

  • removing the alpha out of the image, flattening it out.
  • convert the image to sRGB colourspace. This is needed to ensure that the image is consistent and aligns with the model's training data images.
  • set the representation of the image as a Tensor to height, width, bands. The image tensor will then be organized as a three-dimensional array, where the first dimension represents the height of the image, the second refers to the width of the image, and the third pertains to the different spectral bands/channels of the image.

Our pre_process_image/1 function will implement these three steps. Let's implement it now!
In lib/app_web/live/page_live.ex, add the following:

# This assumes `alias Vix.Vips.Image, as: Vimage` at the top of the module
defp pre_process_image(%Vimage{} = image) do
  # If the image has an alpha channel, flatten it
  {:ok, flattened_image} =
    case Vix.Vips.Image.has_alpha?(image) do
      true -> Vix.Vips.Operation.flatten(image)
      false -> {:ok, image}
    end

  # Convert the image to sRGB colourspace ----------------
  {:ok, srgb_image} =
    Vix.Vips.Operation.colourspace(flattened_image, :VIPS_INTERPRETATION_sRGB)

  # Converting image to tensor ----------------
  {:ok, tensor} = Vix.Vips.Image.write_to_tensor(srgb_image)

  # We reshape the tensor given a specific format.
  # In this case, we are using {height, width, channels/bands}.
  %Vix.Tensor{data: binary, type: type, shape: {x, y, bands}} = tensor
  format = [:height, :width, :bands]
  shape = {x, y, bands}

  final_tensor =
    binary
    |> Nx.from_binary(type)
    |> Nx.reshape(shape, names: format)

  {:ok, final_tensor}
end

The function receives a Vix image, as detailed earlier. We use flatten/1 to flatten the alpha out of the image.

The resulting image has its colourspace changed by calling colourspace/3, where we change the colourspace to sRGB.

The colourspace-altered image is then converted to a tensor by calling write_to_tensor/1.

We then reshape the tensor according to the format that was previously mentioned.

This function returns the processed tensor, which is then used as input to the model.
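To make the reshape step more concrete, here's a tiny, runnable illustration with dummy data (6 bytes pretending to be a 1x2 image with 3 bands); it is not part of the app itself:

binary = <<0, 1, 2, 3, 4, 5>>

binary
|> Nx.from_binary({:u, 8})
|> Nx.reshape({1, 2, 3}, names: [:height, :width, :bands])
# => a u8 tensor of shape {1, 2, 3}: [[[0, 1, 2], [3, 4, 5]]]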

4.4 Updating the view

All that's left is updating the view to reflect the changes we've made to the LiveView. Head over to lib/app_web/live/page_live.html.heex and change it to this.

<.flash_group flash={@flash}/><divclass="h-full w-full px-4 py-10 flex justify-center sm:px-6 sm:py-28 lg:px-8 xl:px-28 xl:py-32"><divclass="flex justify-center items-center mx-auto max-w-xl w-[50vw] lg:mx-0"><divclass="space-y-12"><divclass="border-gray-900/10 pb-12"><h2class="text-base font-semibold leading-7 text-gray-900">          Image Classification</h2><pclass="mt-1 text-sm leading-6 text-gray-600">          Do simple classification with this<ahref="https://hexdocs.pm/phoenix_live_view/Phoenix.LiveView.html"class="font-mono font-medium text-sky-500">LiveView</a>          demo, powered by<ahref="https://github.com/elixir-nx/bumblebee"class="font-mono font-medium text-sky-500">Bumblebee</a>.</p><!-- File upload section --><divclass="mt-10 grid grid-cols-1 gap-x-6 gap-y-8 sm:grid-cols-6"><divclass="col-span-full"><divclass="mt-2 flex justify-center rounded-lg border border-dashed border-gray-900/25 px-6 py-10"phx-drop-target="{@uploads.image_list.ref}"><divclass="text-center"><svgclass="mx-auto h-12 w-12 text-gray-300"viewBox="0 0 24 24"fill="currentColor"aria-hidden="true"><pathfill-rule="evenodd"d="M1.5 6a2.25 2.25 0 012.25-2.25h16.5A2.25 2.25 0 0122.5 6v12a2.25 2.25 0 01-2.25 2.25H3.75A2.25 2.25 0 011.5 18V6zM3 16.06V18c0 .414.336.75.75.75h16.5A.75.75 0 0021 18v-1.94l-2.69-2.689a1.5 1.5 0 00-2.12 0l-.88.879.97.97a.75.75 0 11-1.06 1.06l-5.16-5.159a1.5 1.5 0 00-2.12 0L3 16.061zm10.125-7.81a1.125 1.125 0 112.25 0 1.125 1.125 0 01-2.25 0z"clip-rule="evenodd"/></svg><divclass="mt-4 flex text-sm leading-6 text-gray-600"><labelfor="file-upload"class="relative cursor-pointer rounded-md bg-white font-semibold text-indigo-600 focus-within:outline-none focus-within:ring-2 focus-within:ring-indigo-600 focus-within:ring-offset-2 hover:text-indigo-500"><formid="upload-form"phx-change="noop"phx-submit="noop"><labelclass="cursor-pointer"><.live_file_input upload={@uploads.image_list}/> Upload</label></form></label><pclass="pl-1">or drag and drop</p></div><pclass="text-xs leading-5 text-gray-600">                  PNG, JPG, GIF up to 5MB</p></div></div></div></div><!-- Prediction text --><divclass="mt-6 flex space-x-1.5 items-center font-bold text-gray-900 text-xl"><span>Description:</span><!-- Spinner --><%= if @upload_running? do %><divrole="status"><divclass="relative w-6 h-6 animate-spin rounded-full bg-gradient-to-r from-purple-400 via-blue-500 to-red-400"><divclass="absolute top-1/2 left-1/2 transform -translate-x-1/2 -translate-y-1/2 w-3 h-3 bg-gray-200 rounded-full border-2 border-white"></div></div></div><% else %><%= if @label do %><spanclass="text-gray-700 font-light"><%= @label %></span><% else %><spanclass="text-gray-300 font-light">Waiting for image input.</span><% end %><% end %></div></div></div></div></div>

In these changes, we've added the output of the model in the form of text. We render a spinner if the :upload_running? socket assign is set to true. Otherwise, we show the :label, which holds the prediction made by the model.

You may have also noticed that we've changed the phx event handlers to noop. This is simply to simplify the LiveView.

Head over to lib/app_web/live/page_live.ex. You can now remove the "validate", "save" and "remove-selected" handlers, because we're not going to be needing them. Replace them with this handler:

@impl true
def handle_event("noop", _params, socket) do
  {:noreply, socket}
end

4.5 Check it out!

And that's it! Our app is now functional. 🎉

If you run the app, you can drag and drop or select an image. After this, a task will be spawned that will run the model against the image that was submitted.

Once a prediction is made, we display it!

You can and should try other models. ResNet-50 is just one of the many that are supported by Bumblebee. You can see the supported models at https://github.com/elixir-nx/bumblebee#model-support.

4.6 Considerations on user images

To keep the app as simple as possible, we are receiving the image from the person as is. Although we are processing the image, we are only doing so to make it processable by the model.

We have to understand that:

  • in most cases, full-resolution images are not necessary, because neural networks work on much smaller inputs (e.g. ResNet-50 works with 224px x 224px images). This means that a lot of data is unnecessarily uploaded over the network, increasing the workload on the server to potentially downsize a large image.
  • decoding an image requires an additional package, meaning more work on the server.

We can avoid both of these downsides by moving this work to the client. We can leverage the Canvas API to decode and downsize the image on the client side, reducing server workload.

You can see an example implementation of this technique in Bumblebee's repository at https://github.com/elixir-nx/bumblebee/blob/main/examples/phoenix/image_classification.exs

However, since we are not using JavaScript for anything, we can (and should!) properly downsize our images so they better fit the training dataset of the model we use. This will allow the model to process them faster, since larger images carry more data that is ultimately unnecessary for models to make predictions.

Open lib/app_web/live/page_live.ex, find the handle_progress/3 function and resize the image before processing it.

file_binary = File.read!(meta.path)

# Get image and resize.
# This is dependent on the resolution of the model's dataset.
# In our case, we want the width to be closer to 640, whilst maintaining aspect ratio.
width = 640

{:ok, thumbnail_vimage} =
  Vix.Vips.Operation.thumbnail(meta.path, width, size: :VIPS_SIZE_DOWN)

# Pre-process it
{:ok, tensor} = pre_process_image(thumbnail_vimage)

# ...

We are using Vix.Vips.Operation.thumbnail/3 to resize our image to a fixed width whilst maintaining aspect ratio. The width variable can depend on the model that you use. For example, ResNet-50 is trained on 224px x 224px pictures, so you may want to resize the image to that width, as sketched below.
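For instance, if you stick with ResNet-50, the only change would be the width value (purely illustrative; the rest of the pipeline stays the same):

# Matches ResNet-50's 224px training resolution
width = 224

{:ok, thumbnail_vimage} =
  Vix.Vips.Operation.thumbnail(meta.path, width, size: :VIPS_SIZE_DOWN)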

Note: We are using the thumbnail/3 function instead of resize/3 because it's much faster.
Check https://github.com/libvips/libvips/wiki/HOWTO----Image-shrinking to know why.

5. Final Touches

Although our app is functional, we can make it better. 🎨

5.1 Setting max file size

In order to better control user input, we should add a limit to the size of the image being uploaded. It will be easier on our server and ultimately save costs.

Let's add a cap of 5MB to our app! Fortunately for you, this is super simple! You just need to add the max_file_size option to the allow_upload/3 function when mounting the LiveView!

def mount(_params, _session, socket) do
  {:ok,
   socket
   |> assign(label: nil, upload_running?: false, task_ref: nil)
   |> allow_upload(:image_list,
     accept: ~w(image/*),
     auto_upload: true,
     progress: &handle_progress/3,
     max_entries: 1,
     chunk_size: 64_000,
     # add this
     max_file_size: 5_000_000
   )}
end

And that's it! The number is in bytes, hence why we set it as 5_000_000.

5.2 Show errors

In case a person uploads an image that is too large, we should show this feedback to the person!

For this, we can leverage the upload_errors/2 function. This function returns the entry errors for an upload. We need to add a handler for one of these errors to show it first.

Head over to lib/app_web/live/page_live.ex and add the following line.

def error_to_string(:too_large), do: "Image too large. Upload a smaller image up to 5MB."
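If you'd like to cover LiveView's other built-in upload errors as well, you can add clauses along these lines (the messages are just suggestions):

def error_to_string(:not_accepted), do: "You have selected an unacceptable file type."
def error_to_string(:too_many_files), do: "You have selected too many files."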

Now, add the following section below the upload form inside lib/app_web/live/page_live.html.heex.

<!-- Show errors --><%= for entry<-@uploads.image_list.entriesdo%><divclass="mt-2"><%= for err<-upload_errors(@uploads.image_list,entry)do%><divclass="rounded-md bg-red-50 p-4 mb-2"><divclass="flex"><divclass="flex-shrink-0"><svgclass="h-5 w-5 text-red-400"viewBox="0 0 20 20"fill="currentColor"aria-hidden="true"><pathfill-rule="evenodd"d="M10 18a8 8 0 100-16 8 8 0 000 16zM8.28 7.22a.75.75 0 00-1.06 1.06L8.94 10l-1.72 1.72a.75.75 0 101.06 1.06L10 11.06l1.72 1.72a.75.75 0 101.06-1.06L11.06 10l1.72-1.72a.75.75 0 00-1.06-1.06L10 8.94 8.28 7.22z"clip-rule="evenodd"/></svg></div><divclass="ml-3"><h3class="text-sm font-medium text-red-800"><%= error_to_string(err) %></h3></div></div></div><% end %></div><% end %>

We are iterating over the errors returned by upload_errors/2 and invoking error_to_string/1, which we've just defined in our LiveView.

Now, if you run the app and try to upload an image that is too large, an error will show up.

Awesome! 🎉

5.3 Show image preview

As of now, even though our app predicts the given images, it is not showing a preview of the image the person submitted. Let's fix this 🛠️.

Let's add a new socket assign pertaining to the base64 representation of the image in lib/app_web/live/page_live.ex:

|> assign(label: nil, upload_running?: false, task_ref: nil, image_preview_base64: nil)

We've added image_preview_base64 as a new socket assign, initializing it as nil.

Next, we need to read the file while consuming it, and properly update the socket assign so we can show it to the person.

In the same file, change the handle_progress/3 function to the following.

def handle_progress(:image_list, entry, socket) when entry.done? do
  # Consume the entry and get the tensor to feed to the classifier
  %{tensor: tensor, file_binary: file_binary} =
    consume_uploaded_entry(socket, entry, fn %{} = meta ->
      file_binary = File.read!(meta.path)

      {:ok, vimage} = Vix.Vips.Image.new_from_file(meta.path)
      {:ok, tensor} = pre_process_image(vimage)
      {:ok, %{tensor: tensor, file_binary: file_binary}}
    end)

  # Create an async task to classify the image
  task =
    Task.Supervisor.async(App.TaskSupervisor, fn ->
      Nx.Serving.batched_run(ImageClassifier, tensor)
    end)

  # Encode the image to base64
  base64 = "data:image/png;base64, " <> Base.encode64(file_binary)

  # Update socket assigns to show spinner whilst task is running
  {:noreply,
   assign(socket, upload_running?: true, task_ref: task.ref, image_preview_base64: base64)}
end

We're using File.read!/1 to retrieve the binary representation of the image that was uploaded. We use Base.encode64/1 to encode this file binary and assign the newly created image_preview_base64 socket assign with this base64 representation of the image.
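As a quick illustration of what that string looks like, here's the same encoding applied to just the 4-byte PNG magic-number prefix (runnable in iex; a real image obviously produces a much longer string):

"data:image/png;base64, " <> Base.encode64(<<0x89, 0x50, 0x4E, 0x47>>)
# "data:image/png;base64, iVBORw=="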

Now, all that's left to do is to render the image in our view. In lib/app_web/live/page_live.html.heex, locate the line:

<div class="text-center"></div>

We are going to update this <div> to show the image with the image_preview_base64 socket assign.

<divclass="text-center"><!-- Show image preview --><%= if @image_preview_base64 do %><formid="upload-form"phx-change="noop"phx-submit="noop"><labelclass="cursor-pointer"><.live_file_input upload={@uploads.image_list}/><imgsrc="{@image_preview_base64}"/></label></form><% else %><svgclass="mx-auto h-12 w-12 text-gray-300"viewBox="0 0 24 24"fill="currentColor"aria-hidden="true"><pathfill-rule="evenodd"d="M1.5 6a2.25 2.25 0 012.25-2.25h16.5A2.25 2.25 0 0122.5 6v12a2.25 2.25 0 01-2.25 2.25H3.75A2.25 2.25 0 011.5 18V6zM3 16.06V18c0 .414.336.75.75.75h16.5A.75.75 0 0021 18v-1.94l-2.69-2.689a1.5 1.5 0 00-2.12 0l-.88.879.97.97a.75.75 0 11-1.06 1.06l-5.16-5.159a1.5 1.5 0 00-2.12 0L3 16.061zm10.125-7.81a1.125 1.125 0 112.25 0 1.125 1.125 0 01-2.25 0z"clip-rule="evenodd"/></svg><divclass="mt-4 flex text-sm leading-6 text-gray-600"><labelfor="file-upload"class="relative cursor-pointer rounded-md bg-white font-semibold text-indigo-600 focus-within:outline-none focus-within:ring-2 focus-within:ring-indigo-600 focus-within:ring-offset-2 hover:text-indigo-500"><formid="upload-form"phx-change="noop"phx-submit="noop"><labelclass="cursor-pointer"><.live_file_input upload={@uploads.image_list}/>          Upload</label></form></label><pclass="pl-1">or drag and drop</p></div><pclass="text-xs leading-5 text-gray-600">PNG, JPG, GIF up to 5MB</p><% end %></div>

As you can see, we are checking if @image_preview_base64 is defined. If so, we simply show the image with it as the src 😊.

Now, if you run the application, you'll see that after dragging the image, it is previewed and shown to the person!

6. What about other models?

Maybe you weren't happy with the results from this model.

That's fair. ResNet-50 is a smaller, "older" model compared to other image captioning/classification models.

What if you wanted to use others? Well, as we've mentioned before, Bumblebee uses Transformer models from Hugging Face. To know if a model is supported (as shown in Bumblebee's docs), we need to check the config.json file in the model repository, copy the class name under "architectures" and search for it in Bumblebee's codebase.
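If you prefer checking from iex rather than reading the codebase, one option (our suggestion, not required by this guide) is to try loading the model spec; an unsupported architecture should come back as an error tuple rather than {:ok, spec}:

# Returns {:ok, spec} if Bumblebee recognises the architecture
Bumblebee.load_spec({:hf, "Salesforce/blip-image-captioning-base"})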

For example, here's one of the more popular image captioning models - Salesforce's BLIP - https://huggingface.co/Salesforce/blip-image-captioning-large/blob/main/config.json.

If you visit Bumblebee's codebase and search for the class name, you'll find that it is supported.

Awesome! Now we can use it!

If you dig around Bumblebee's docs as well (https://hexdocs.pm/bumblebee/Bumblebee.Vision.html#image_to_text/5), you'll see that we've got to use image_to_text/5 with this model. It needs a tokenizer, a featurizer and a generation config so we can use it.

Let's do it! Head over to lib/app/application.ex and change the serving/0 function.

def serving do
  {:ok, model_info} = Bumblebee.load_model({:hf, "Salesforce/blip-image-captioning-base"})
  {:ok, featurizer} = Bumblebee.load_featurizer({:hf, "Salesforce/blip-image-captioning-base"})
  {:ok, tokenizer} = Bumblebee.load_tokenizer({:hf, "Salesforce/blip-image-captioning-base"})

  {:ok, generation_config} =
    Bumblebee.load_generation_config({:hf, "Salesforce/blip-image-captioning-base"})

  Bumblebee.Vision.image_to_text(model_info, featurizer, tokenizer, generation_config,
    compile: [batch_size: 10],
    defn_options: [compiler: EXLA]
  )
end

As you can see, we're using the repository name of BLIP's model from the Hugging Face website.

If you run mix phx.server, you'll see that it downloads the new model, tokenizer, featurizer and generation config before running.

|======================================================================| 100% (989.82 MB)
[info] TfrtCpuClient created.
|======================================================================| 100% (711.39 KB)
[info] Running AppWeb.Endpoint with cowboy 2.10.0 at 127.0.0.1:4000 (http)
[info] Access AppWeb.Endpoint at http://localhost:4000
[watch] build finished, watching for changes...

You may think we're done here. But we are not! ✋

The destructuring of the output of the model may not be the same.
If you try to submit a photo, you'll get this error:

no match of right hand side value: %{results: [%{text: "a person holding a large blue ball on a beach"}]}

This means that we need to make some changes when parsing the output of the model 😀.

Head over to lib/app_web/live/page_live.ex and change the handle_info/2 function that is called after the async task is completed.

def handle_info({ref, result}, %{assigns: %{task_ref: ref}} = socket) do
  Process.demonitor(ref, [:flush])

  # change this line
  %{results: [%{text: label}]} = result

  {:noreply, assign(socket, label: label, upload_running?: false)}
end

As you can see, we are now correctly destructuring the result from the model. And that's it!

If you run mix phx.server, you'll see that we get far more accurate results!

Awesome! 🎉

Note

Be aware that BLIP is a much larger model than ResNet-50. There are even more accurate (and larger) models out there, e.g. blip-image-captioning-large, the larger version of the model we've just used. This is a balancing act: the larger the model, the longer a prediction may take and the more resources your server will need to handle the heavier workload.

Warning

We've created a small module that allows you to have multiple models cached and downloaded and keeps this logic contained.

For this, check the deployment guide.

7. How do I deploy this thing?

There are a few considerations you may want to take into account before deploying this. Luckily for you, we've created a small document that will guide you through deploying this app on fly.io!

Check the deployment.md file for more information.

8. Showing example images

Warning

This section assumes you've made the changes described in the previous section. Therefore, you should follow the instructions in 7. How do I deploy this thing? and come back after you're done.

We have a fully functioning application that predicts images. Now we can add some nice touches to show the person some examples if they are inactive.

For this, we are going to need to make three changes:

  • create a hook in the client (JavaScript) to send an event when there's inactivity after a given number of seconds.
  • change the page_live.ex LiveView to accommodate this new event.
  • change the view page_live.html.heex to show these changes to the person.

Let's go over each one!

8.1 Creating a hook in the client

We are going to detect the inactivity of the person with some JavaScript code.

Head over to assets/js/app.js and change it to the following.

// If you want to use Phoenix channels, run `mix help phx.gen.channel`// to get started and then uncomment the line below.// import "./user_socket.js"// You can include dependencies in two ways.//// The simplest option is to put them in assets/vendor and// import them using relative paths:////     import "../vendor/some-package.js"//// Alternatively, you can `npm install some-package --prefix assets` and import// them using a path starting with the package name:////     import "some-package"//// Include phoenix_html to handle method=PUT/DELETE in forms and buttons.import"phoenix_html";// Establish Phoenix Socket and LiveView configuration.import{Socket}from"phoenix";import{LiveSocket}from"phoenix_live_view";importtopbarfrom"../vendor/topbar";// Hooks to track inactivityletHooks={};Hooks.ActivityTracker={mounted(){// Set the inactivity duration in millisecondsconstinactivityDuration=8000;// 8 seconds// Set a variable to keep track of the timer and if the process to predict example image has already been sentletinactivityTimer;letprocessHasBeenSent=false;letctx=this;// Function to reset the timerfunctionresetInactivityTimer(){// Clear the previous timerclearTimeout(inactivityTimer);// Start a new timerinactivityTimer=setTimeout(()=>{// Perform the desired action after the inactivity duration// For example, send a message to the Elixir process using Phoenix Socketif(!processHasBeenSent){processHasBeenSent=true;ctx.pushEvent("show_examples",{});}},inactivityDuration);}// Call the function to start the timer initiallyresetInactivityTimer();// Reset the timer whenever there is user activitydocument.addEventListener("mousemove",resetInactivityTimer);document.addEventListener("keydown",resetInactivityTimer);},};letcsrfToken=document.querySelector("meta[name='csrf-token']").getAttribute("content");letliveSocket=newLiveSocket("/live",Socket,{hooks:Hooks,params:{_csrf_token:csrfToken},});// Show progress bar on live navigation and form submitstopbar.config({barColors:{0:"#29d"},shadowColor:"rgba(0, 0, 0, .3)"});window.addEventListener("phx:page-loading-start",(_info)=>topbar.show(300));window.addEventListener("phx:page-loading-stop",(_info)=>topbar.hide());// connect if there are any LiveViews on the pageliveSocket.connect();// expose liveSocket on window for web console debug logs and latency simulation:// >> liveSocket.enableDebug()// >> liveSocket.enableLatencySim(1000)  // enabled for duration of browser session// >> liveSocket.disableLatencySim()window.liveSocket=liveSocket;
  • we have added a Hooks variable with a property ActivityTracker. This hook has a mounted() function that is executed when the component that uses this hook is mounted. You can find more information at https://hexdocs.pm/phoenix_live_view/js-interop.html.
  • inside the mounted() function, we create a resetInactivityTimer() function that is executed every time the mouse moves (mousemove event) or a key is pressed (keydown). This function resets the inactivity timer.
  • if the person is inactive for 8 seconds, we push a "show_examples" event. We will create a handler in the LiveView to handle this event later.
  • we add the Hooks variable to the hooks property when initializing the liveSocket.

And that's it!

For this hook to actually be executed, we need to create a component that uses it inside our view file.

For this, we can simply create a hidden component at the top of the lib/app_web/live/page_live.html.heex file.

Add the following hidden component.

<div class="hidden" id="tracker_el" phx-hook="ActivityTracker" />

We use the phx-hook attribute to bind the hook we've created in our app.js file to the component so it's executed. When this component is mounted, the mounted() function inside the hook is executed.

And that's it! 👏

Your app won't work yet because we haven't created a handler to handle the "show_examples" event.

Let's do that right now!

8.2 Handling the example images list event inside our LiveView

Now that we have our client sorted, let's head over to our LiveView at lib/app_web/live/page_live.ex and make the needed changes!

Before anything, let's add the socket assigns that we will need throughout our adventure! At the top of the file, change the socket assigns to the following.

socket
|> assign(
  # Related to the file uploaded by the user
  label: nil,
  upload_running?: false,
  task_ref: nil,
  image_preview_base64: nil,
  # Related to the list of image examples
  example_list_tasks: [],
  example_list: [],
  display_list?: false
)

We've added three new assigns:

  • example_list_tasks is a list of the async tasks that are created for each example image.
  • example_list is a list of the example images with their respective predictions.
  • display_list? is a boolean that tells us if the list is to be shown or not.

Awesome! Let's continue.

As we've mentioned before, we need to create a handler for our "show_examples" event.

Add the following function to the file.

@image_width 640

def handle_event("show_examples", _data, socket) do
  # Only run if the user hasn't uploaded anything
  if is_nil(socket.assigns.task_ref) do
    # Retrieves a random image from Picsum with a given `image_width` dimension
    random_image = "https://picsum.photos/#{@image_width}/#{@image_width}"

    # Spawns prediction tasks for example images from random Picsum images
    tasks =
      for _ <- 1..2 do
        {:req, body} = {:req, Req.get!(random_image).body}
        predict_example_image(body)
      end

    # List to change `example_list` socket assign to show skeleton loading
    display_example_images = Enum.map(tasks, fn obj -> %{predicting?: true, ref: obj.ref} end)

    # Updates the socket assigns
    {:noreply, assign(socket, example_list_tasks: tasks, example_list: display_example_images)}
  else
    {:noreply, socket}
  end
end

Warning

We are using the req package to download the file binary from the URL. Make sure to install it in the mix.exs file.
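For reference, the entry in deps would look something like this (the version is indicative; use whatever is current on hex.pm):

{:req, "~> 0.4"},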

  • we are using the Picsum API, an awesome image API with lots of photos! They provide a /random URL that yields a random photo. In this URL we can even define the dimensions we want! That's what we're doing in the first line of the function. We are using a module constant @image_width 640, so add that to the top of the file. This is relevant because we preferably want to deal with images that have the same resolution as the dataset the model was trained on.
  • we are creating two async tasks that retrieve the binary of the image and pass it on to a predict_example_image/1 function (we will create this function next).
  • the two tasks that we've created are in a list called tasks. We create another list, display_example_images, with the same number of elements as tasks, where each element has two properties: predicting?, meaning the image is being predicted by the model; and ref, the reference of the task.
  • we assign the tasks list to the example_list_tasks socket assign and the display_example_images list to the example_list socket assign. So example_list will temporarily hold objects with :predicting? and :ref properties whilst the model is being executed.

As we've just mentioned, we are making use of a function called predict_example_image/1 to make predictions on a given file binary.

Let's implement it now! In the same file, add:

def predict_example_image(body) do
  with {:vix, {:ok, img_thumb}} <-
         {:vix, Vix.Vips.Operation.thumbnail_buffer(body, @image_width)},
       {:pre_process, {:ok, t_img}} <- {:pre_process, pre_process_image(img_thumb)} do
    # Create an async task to classify the image from Picsum
    Task.Supervisor.async(App.TaskSupervisor, fn ->
      Nx.Serving.batched_run(ImageClassifier, t_img)
    end)
    |> Map.merge(%{base64_encoded_url: "data:image/png;base64, " <> Base.encode64(body)})
  else
    {stage, error} -> {stage, error}
  end
end

For the body of the image to be fed to the model, it needs to go through some pre-processing.

  • we are using thumbnail_buffer/3 to make sure it's properly resized, and then feed the result to our own pre_process_image/1 function so it can be converted to a tensor the model can parse.
  • after these two operations are successfully completed, we spawn an async task (like we've done before) and feed the image to the model. We merge the base64-encoded image into the return value so it can later be shown to the person.
  • if these operations fail, we return an error.

Great job! 👏

Our example images' async tasks have successfully been created and are on their way to the model!

Now we need to handle these newly created async tasks once they are completed. As we know, we are handling async task completion in the def handle_info({ref, result}, %{assigns: %{task_ref: ref}} = socket) function. Let's change it like so.

def handle_info({ref, result}, %{assigns: assigns} = socket) do
  # Flush async call
  Process.demonitor(ref, [:flush])

  # You need to change how you destructure the output of the model depending
  # on the model you've chosen for `prod` and `test` envs on `models.ex`.
  label =
    case Application.get_env(:app, :use_test_models, false) do
      true ->
        App.Models.extract_test_label(result)

      # coveralls-ignore-start
      false ->
        App.Models.extract_prod_label(result)
        # coveralls-ignore-stop
    end

  cond do
    # If the upload task has finished executing, we update the socket assigns.
    Map.get(assigns, :task_ref) == ref ->
      {:noreply, assign(socket, label: label, upload_running?: false)}

    # If an example task has finished executing, we update the socket assigns.
    img = Map.get(assigns, :example_list_tasks) |> Enum.find(&(&1.ref == ref)) ->
      # Update the element in the `example_list` enum to turn `predicting?` to `false`
      updated_example_list =
        Map.get(assigns, :example_list)
        |> Enum.map(fn obj ->
          if obj.ref == img.ref do
            Map.put(obj, :base64_encoded_url, img.base64_encoded_url)
            |> Map.put(:label, label)
            |> Map.put(:predicting?, false)
          else
            obj
          end
        end)

      {:noreply,
       assign(socket,
         example_list: updated_example_list,
         upload_running?: false,
         display_list?: true
       )}
  end
end

The only change we've made is that we've added a cond flow control structure. We are essentially checking if the task reference that has completed is from the image uploaded by the person (the :task_ref socket assign) or from an example image (inside the :example_list_tasks socket assign list).

If it's the latter, we update the example_list socket assign list with the prediction (:label) and the base64-encoded image from the task list (:base64_encoded_url), and set the :predicting? property to false.

And that's it! Great job! 🥳

8.3 Updating the view

Now that we've made all the necessary changes to our LiveView, we need to update our view so it reflects them!

Head over to lib/app_web/live/page_live.html.heex and change it to the following piece of code.

<divclass="hidden"id="tracker_el"phx-hook="ActivityTracker"/><divclass="h-full w-full px-4 py-10 flex justify-center sm:px-6 sm:py-24 lg:px-8 xl:px-28 xl:py-32"><divclass="flex flex-col justify-start"><divclass="flex justify-center items-center w-full"><divclass="2xl:space-y-12"><divclass="mx-auto max-w-2xl lg:text-center"><p><spanclass="rounded-full w-fit bg-brand/5 px-2 py-1 text-[0.8125rem] font-medium text-center leading-6 text-brand"><ahref="https://hexdocs.pm/phoenix_live_view/Phoenix.LiveView.html"target="_blank"rel="noopener noreferrer">                🔥 LiveView</a>              +<ahref="https://github.com/elixir-nx/bumblebee"target="_blank"rel="noopener noreferrer">                🐝 Bumblebee</a></span></p><pclass="mt-2 text-3xl font-bold tracking-tight text-gray-900 sm:text-4xl">            Caption your image!</p><h3class="mt-6 text-lg leading-8 text-gray-600">            Upload your own image (up to 5MB) and perform image captioning with<ahref="https://elixir-lang.org/"target="_blank"rel="noopener noreferrer"class="font-mono font-medium text-sky-500">              Elixir</a>            !</h3><pclass="text-lg leading-8 text-gray-400">            Powered with<ahref="https://elixir-lang.org/"target="_blank"rel="noopener noreferrer"class="font-mono font-medium text-sky-500">              HuggingFace🤗</a>            transformer models, you can run this project locally and perform            machine learning tasks with a handful lines of code.</p></div><divclass="border-gray-900/10"><!-- File upload section --><divclass="col-span-full"><divclass="mt-2 flex justify-center rounded-lg border border-dashed border-gray-900/25 px-6 py-10"phx-drop-target="{@uploads.image_list.ref}"><divclass="text-center"><!-- Show image preview --><%= if @image_preview_base64 do %><formid="upload-form"phx-change="noop"phx-submit="noop"><labelclass="cursor-pointer"><.live_file_input upload={@uploads.image_list}/><imgsrc="{@image_preview_base64}"/></label></form><% else %><svgclass="mx-auto h-12 w-12 text-gray-300"viewBox="0 0 24 24"fill="currentColor"aria-hidden="true"><pathfill-rule="evenodd"d="M1.5 6a2.25 2.25 0 012.25-2.25h16.5A2.25 2.25 0 0122.5 6v12a2.25 2.25 0 01-2.25 2.25H3.75A2.25 2.25 0 011.5 18V6zM3 16.06V18c0 .414.336.75.75.75h16.5A.75.75 0 0021 18v-1.94l-2.69-2.689a1.5 1.5 0 00-2.12 0l-.88.879.97.97a.75.75 0 11-1.06 1.06l-5.16-5.159a1.5 1.5 0 00-2.12 0L3 16.061zm10.125-7.81a1.125 1.125 0 112.25 0 1.125 1.125 0 01-2.25 0z"clip-rule="evenodd"/></svg><divclass="mt-4 flex text-sm leading-6 text-gray-600"><labelfor="file-upload"class="relative cursor-pointer rounded-md bg-white font-semibold text-indigo-600 focus-within:outline-none focus-within:ring-2 focus-within:ring-indigo-600 focus-within:ring-offset-2 hover:text-indigo-500"><formid="upload-form"phx-change="noop"phx-submit="noop"><labelclass="cursor-pointer"><.live_file_input upload={@uploads.image_list}/> Upload</label></form></label><pclass="pl-1">or drag and drop</p></div><pclass="text-xs leading-5 text-gray-600">                  PNG, JPG, GIF up to 5MB</p><% end %></div></div></div></div><!-- Show errors --><%= for entry<-@uploads.image_list.entriesdo%><divclass="mt-2"><%= for err<-upload_errors(@uploads.image_list,entry)do%><divclass="rounded-md bg-red-50 p-4 mb-2"><divclass="flex"><divclass="flex-shrink-0"><svgclass="h-5 w-5 text-red-400"viewBox="0 0 20 20"fill="currentColor"aria-hidden="true"><pathfill-rule="evenodd"d="M10 18a8 8 0 100-16 8 8 0 000 16zM8.28 7.22a.75.75 0 00-1.06 1.06L8.94 10l-1.72 1.72a.75.75 0 101.06 1.06L10 11.06l1.72 
1.72a.75.75 0 101.06-1.06L11.06 10l1.72-1.72a.75.75 0 00-1.06-1.06L10 8.94 8.28 7.22z"clip-rule="evenodd"/></svg></div><divclass="ml-3"><h3class="text-sm font-medium text-red-800"><%= error_to_string(err) %></h3></div></div></div><% end %></div><% end %><!-- Prediction text --><divclass="flex mt-2 space-x-1.5 items-center font-bold text-gray-900 text-xl"><span>Description:</span><!-- Spinner --><%= if @upload_running? do %><divrole="status"><divclass="relative w-6 h-6 animate-spin rounded-full bg-gradient-to-r from-purple-400 via-blue-500 to-red-400"><divclass="absolute top-1/2 left-1/2 transform -translate-x-1/2 -translate-y-1/2 w-3 h-3 bg-gray-200 rounded-full border-2 border-white"></div></div></div><% else %><%= if @label do %><spanclass="text-gray-700 font-light"><%= @label %></span><% else %><spanclass="text-gray-300 font-light">Waiting for image input.</span><% end %><% end %></div></div></div><!-- Examples --><%= if @display_list? do %><divclass="flex flex-col"><h3class="mt-10 text-xl lg:text-center font-light tracking-tight text-gray-900 lg:text-2xl">        Examples</h3><divclass="flex flex-row justify-center my-8"><divclass="mx-auto grid max-w-2xl grid-cols-1 gap-x-6 gap-y-20 sm:grid-cols-2"><%= for example_img<-@example_listdo%><!-- Loading skeleton if it is predicting --><%= if example_img.predicting? == true do %><divrole="status"class="flex items-center justify-center w-full h-full max-w-sm bg-gray-300 rounded-lg animate-pulse"><svgclass="w-10 h-10 text-gray-200 dark:text-gray-600"aria-hidden="true"xmlns="http://www.w3.org/2000/svg"fill="currentColor"viewBox="0 0 20 18"><pathd="M18 0H2a2 2 0 0 0-2 2v14a2 2 0 0 0 2 2h16a2 2 0 0 0 2-2V2a2 2 0 0 0-2-2Zm-5.5 4a1.5 1.5 0 1 1 0 3 1.5 1.5 0 0 1 0-3Zm4.376 10.481A1 1 0 0 1 16 15H4a1 1 0 0 1-.895-1.447l3.5-7A1 1 0 0 1 7.468 6a.965.965 0 0 1 .9.5l2.775 4.757 1.546-1.887a1 1 0 0 1 1.618.1l2.541 4a1 1 0 0 1 .028 1.011Z"/></svg><spanclass="sr-only">Loading...</span></div><% else %><div><imgid="{example_img.base64_encoded_url}"src="{example_img.base64_encoded_url}"class="rounded-2xl object-cover"/><h3class="mt-1 text-lg leading-8 text-gray-900 text-center"><%= example_img.label %></h3></div><% end %><% end %></div></div></div><% end %></div></div>

We've made two changes.

  • we've added some text to better introduce our application at the top of the page.
  • added a section to show the example images list. This section is only rendered if the display_list? socket assign is set to true. If so, we iterate over the example_list socket assign and show a loading skeleton while an image is still being captioned (predicting? is true). If not, it means the image has already been captioned, and we show the base64-encoded image, like we do with the image uploaded by the person.

And that's it! 🎉

8.4 Using URL of image instead of base64-encoded

While our example list is being correctly rendered, we are using additional CPU to base64-encode our images so they can be shown to the person.

Initially, we did this because https://picsum.photos resolves into a different URL every time it is called. This means that the image that was fed into the model would be different from the one shown in the example list if we were to use this URL in our view.

To fix this, we need to follow the redirection when the URL is resolved.

Note

We can do this with Finch, if we wanted to.

We could do something like:

```elixir
def rand_splash do
  %{scheme: scheme, host: host, path: path} =
    Finch.build(:get, "https://picsum.photos")
    |> Finch.request!(MyFinch)
    |> Map.get(:headers)
    |> Enum.filter(fn {a, _b} -> a == "location" end)
    |> List.first()
    |> elem(1)
    |> URI.parse()

  scheme <> "://" <> host <> path
end
```

And then call it, like so.

```elixir
App.rand_splash()
# https://images.unsplash.com/photo-1694813646634-9558dc7960e3
```

Because we are already using req, let's make use of it instead of adding additional dependencies.

Let's first add a function that will do this in lib/app_web/live/page_live.ex. Add the following piece of code at the end of the file.

```elixir
defp track_redirected(url) do
  # Create request
  req = Req.new(url: url)

  # Add tracking properties to req object
  req =
    req
    |> Req.Request.register_options([:track_redirected])
    |> Req.Request.prepend_response_steps(track_redirected: &track_redirected_uri/1)

  # Make request
  {:ok, response} = Req.request(req)

  # Return the final URI
  %{url: URI.to_string(response.private.final_uri), body: response.body}
end

defp track_redirected_uri({request, response}) do
  {request, %{response | private: Map.put(response.private, :final_uri, request.url)}}
end
```

This function adds properties to the request object and tracks the redirection. It will add a URI object inside private.final_uri. This function returns the body of the image and the final url it is resolved to (the URL of the image).

Now all we need to do is use this function! Head over to the handle_event("show_examples"... function and change the loop to the following.

```elixir
tasks =
  for _ <- 1..2 do
    %{url: url, body: body} = track_redirected(random_image)
    predict_example_image(body, url)
  end
```

We are making use of track_redirected/1, the function we've just created. We pass both body and url to predict_example_image/2, which we will now change.

```elixir
def predict_example_image(body, url) do
  with {:vix, {:ok, img_thumb}} <-
         {:vix, Vix.Vips.Operation.thumbnail_buffer(body, @image_width)},
       {:pre_process, {:ok, t_img}} <- {:pre_process, pre_process_image(img_thumb)} do
    # Create an async task to classify the image from Picsum
    Task.Supervisor.async(App.TaskSupervisor, fn ->
      Nx.Serving.batched_run(ImageClassifier, t_img)
    end)
    |> Map.merge(%{url: url})
  else
    {:vix, {:error, msg}} -> {:error, msg}
    {:pre_process, {:error, msg}} -> {:error, msg}
  end
end
```

Instead of using base64_encoded_url, we are now using the url we've acquired.

The last step we need to do in our LiveView is to finally use this url in handle_info/2.

```elixir
def handle_info({ref, result}, %{assigns: assigns} = socket) do
  Process.demonitor(ref, [:flush])

  label =
    case Application.get_env(:app, :use_test_models, false) do
      true -> App.Models.extract_test_label(result)
      false -> App.Models.extract_prod_label(result)
    end

  cond do
    Map.get(assigns, :task_ref) == ref ->
      {:noreply, assign(socket, label: label, upload_running?: false)}

    img = Map.get(assigns, :example_list_tasks) |> Enum.find(&(&1.ref == ref)) ->
      updated_example_list =
        Map.get(assigns, :example_list)
        |> Enum.map(fn obj ->
          if obj.ref == img.ref do
            obj
            |> Map.put(:url, img.url) # change here
            |> Map.put(:label, label)
            |> Map.put(:predicting?, false)
          else
            obj
          end
        end)

      {:noreply,
       assign(socket,
         example_list: updated_example_list,
         upload_running?: false,
         display_list?: true
       )}
  end
end
```

And that's it!

The last thing we need to do is change our view so it uses the :url parameter instead of the obsolete :base64_encoded_url.

Head over to lib/app_web/live/page_live.html.heex and change the <img> being shown in the example list so it uses the :url parameter.

```html
<img id={example_img.url} src={example_img.url} class="rounded-2xl object-cover" />
```

And we're done! 🎉

We are now rendering the image on the client through the URL the Picsum API resolves into, instead of having the LiveView server encode the image. Therefore, we're saving CPU for the thing that matters the most: running our model.

8.5 See it running

Now let's see our application in action! We are expecting the examples to be shown after 8 seconds of inactivity. If the person is inactive for this time duration, we fetch a random image from the Picsum API and feed it to our model!

You should see different images every time you use the app. Isn't that cool? 😎

9. Store metadata and classification info

Our app is shaping up quite nicely! As it stands, it's an application that does inference on images. However, it doesn't save them.

Let's expand our application so it has a database where the image classification results are saved/persisted!

We'll use Postgres for this. Typically, when you create a new Phoenix project with mix phx.new, a Postgres database configuration is generated automatically. Because we skipped this when creating our project, we'll have to configure it ourselves.

Let's do it!

9.1 Installing dependencies

We'll install all the needed dependencies first. In mix.exs, add the following snippet to the deps section.

```elixir
# HTTP Request
{:httpoison, "~> 2.2"},
{:mime, "~> 2.0.5"},
{:ex_image_info, "~> 0.2.4"},

# DB
{:phoenix_ecto, "~> 4.4"},
{:ecto_sql, "~> 3.10"},
{:postgrex, ">= 0.0.0"},
```
  • httpoison, mime and ex_image_info are used to make HTTP requests, get the content type and extract information from an image file, respectively. These will be needed to upload a given image to imgup by making multipart requests.

  • phoenix_ecto, ecto_sql and postgrex are needed to configure the Elixir driver that connects to a Postgres database, in which we will persist data.

Run mix deps.get to install these dependencies.

9.2 Adding Postgres configuration files

Now let's create the needed files to properly connect to a Postgres relational database. Start by going to lib/app and creating repo.ex.

```elixir
defmodule App.Repo do
  use Ecto.Repo,
    otp_app: :app,
    adapter: Ecto.Adapters.Postgres
end
```

This module will be needed in our configuration files so our app knows where the database is.

Next, in lib/app/application.ex, add the following line to the children list in the supervision tree.

```elixir
children = [
  AppWeb.Telemetry,
  App.Repo, # add this line
  {Phoenix.PubSub, name: App.PubSub},
  ...
]
```

Awesome! 🎉

Now let's head over to the files inside the config folder.

In config/config.exs, add these lines.

```elixir
config :app,
  ecto_repos: [App.Repo],
  generators: [timestamp_type: :utc_datetime]
```

We are referencing the module we've previously created (App.Repo) in ecto_repos so ecto knows where the configuration for the database is located.

In config/dev.exs, add:

```elixir
config :app, App.Repo,
  username: "postgres",
  password: "postgres",
  hostname: "localhost",
  database: "app_dev",
  stacktrace: true,
  show_sensitive_data_on_connection_error: true,
  pool_size: 10
```

We are defining the parameters of the database that is used during development.

In config/runtime.exs, add:

```elixir
if config_env() == :prod do
  database_url =
    System.get_env("DATABASE_URL") ||
      raise """
      environment variable DATABASE_URL is missing.
      For example: ecto://USER:PASS@HOST/DATABASE
      """

  maybe_ipv6 = if System.get_env("ECTO_IPV6") in ~w(true 1), do: [:inet6], else: []

  config :app, App.Repo,
    # ssl: true,
    url: database_url,
    pool_size: String.to_integer(System.get_env("POOL_SIZE") || "10"),
    socket_options: maybe_ipv6

  # ...
```

We are defining the database configuration used at runtime in production.

In config/test.exs, add:

```elixir
config :app, App.Repo,
  username: "postgres",
  password: "postgres",
  hostname: "localhost",
  database: "app_test#{System.get_env("MIX_TEST_PARTITION")}",
  pool: Ecto.Adapters.SQL.Sandbox,
  pool_size: 10
```

Here we're defining the database used during testing.

Now let's create a migration file to create our database table. In priv/repo/migrations/, create a file called 20231204092441_create_images.exs (or any other timestamp string) with the following piece of code.

```elixir
defmodule App.Repo.Migrations.CreateImages do
  use Ecto.Migration

  def change do
    create table(:images) do
      add :url, :string
      add :description, :string
      add :width, :integer
      add :height, :integer

      timestamps(type: :utc_datetime)
    end
  end
end
```

And that's it! Those are the files our application needs to properly connect to and persist data into the Postgres database.

You can now run mix ecto.create and mix ecto.migrate to create the database and the "images" table.
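In your terminal:

```sh
mix ecto.create
mix ecto.migrate
```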

9.3 Creating the Image schema

For now, let's work with a simple "images" table in our database that has the following properties:

  • description: the description of the image from the model.
  • width: width of the image.
  • height: height of the image.
  • url: public URL where the image is hosted.

With this in mind, let's create a new file! In lib/app/, create a file called image.ex.

```elixir
defmodule App.Image do
  use Ecto.Schema
  alias App.{Image, Repo}

  @primary_key {:id, :id, autogenerate: true}
  schema "images" do
    field(:description, :string)
    field(:width, :integer)
    field(:url, :string)
    field(:height, :integer)

    timestamps(type: :utc_datetime)
  end

  def changeset(image, params \\ %{}) do
    image
    |> Ecto.Changeset.cast(params, [:url, :description, :width, :height])
    |> Ecto.Changeset.validate_required([:url, :description, :width, :height])
  end

  @doc """
  Adds the image information to the database.
  """
  def insert(image) do
    %Image{}
    |> changeset(image)
    |> Repo.insert!()
  end
end
```

We've just created the App.Image schema with the aforementioned fields.

We've created changeset/1, which is used to cast and validate the properties of a given object before interacting with the database.

insert/1 receives an object, runs it through the changeset and inserts it in the database.
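As a quick sanity check, here's roughly what calling it from iex -S mix could look like (the values below are purely illustrative):

```elixir
# Illustrative values — not real data.
App.Image.insert(%{
  url: "https://imgup.fly.dev/example.jpg",
  description: "a dog running on the beach",
  width: 640,
  height: 480
})
# => returns the inserted %App.Image{} struct (or raises if the changeset is invalid)
```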

9.4 Changing our LiveView to persist data

Now that we have our database set up, let's change some of our code so we persist data into it! In this section, we'll be working in the lib/app_web/live/page_live.ex file.

First, let's alias App.Image and create an ImageInfo struct to hold the image's information throughout the process of uploading and classifying it.

```elixir
defmodule AppWeb.PageLive do
  use AppWeb, :live_view
  alias App.Image # add this alias
  alias Vix.Vips.Image, as: Vimage

  defmodule ImageInfo do
    @doc """
    General information for the image that is being analysed.
    This information is useful when persisting the image to the database.
    """
    defstruct [:mimetype, :width, :height, :url, :file_binary]
  end

  # ...
```

We are going to be using ImageInfo in our socket assigns. Let's add it when the LiveView is mounting!

```elixir
|> assign(
  label: nil,
  upload_running?: false,
  task_ref: nil,
  image_info: nil, # add this line
  image_preview_base64: nil,
  example_list_tasks: [],
  example_list: [],
  display_list?: false
)
```

When the person uploads an image, we want to retrieve its info (namely its height and width) and upload the image to an S3 bucket (we're doing this through imgup) so we can populate the :url field of the schema in the database.

We can retrieve this information while consuming the entry/uploading the image file. For this, go to handle_progress(:image_list, entry, socket) and change the function to the following.

```elixir
def handle_progress(:image_list, entry, socket) when entry.done? do
  # We've changed the object that is returned from `consume_uploaded_entry/3` to return an `image_info` object.
  %{tensor: tensor, image_info: image_info} =
    consume_uploaded_entry(socket, entry, fn %{} = meta ->
      file_binary = File.read!(meta.path)

      # Add this line. It uses `ExImageInfo` to retrieve the info from the file binary.
      {mimetype, width, height, _variant} = ExImageInfo.info(file_binary)

      {:ok, thumbnail_vimage} =
        Vix.Vips.Operation.thumbnail(meta.path, @image_width, size: :VIPS_SIZE_DOWN)

      {:ok, tensor} = pre_process_image(thumbnail_vimage)

      # Add this line. Uploads the image to S3, which returns the `url` and `compressed url`.
      # (we'll implement this function next)
      url = Image.upload_image_to_s3(meta.path, mimetype) |> Map.get("url")

      # Add this line. We are constructing the image_info object to be returned.
      image_info = %ImageInfo{
        mimetype: mimetype,
        width: width,
        height: height,
        file_binary: file_binary,
        url: url
      }

      # Return it
      {:ok, %{tensor: tensor, image_info: image_info}}
    end)

  task =
    Task.Supervisor.async(App.TaskSupervisor, fn ->
      Nx.Serving.batched_run(ImageClassifier, tensor)
    end)

  base64 = "data:image/png;base64, " <> Base.encode64(image_info.file_binary)

  # Change this line so `image_info` is defined when the image is uploaded
  {:noreply,
   assign(socket,
     upload_running?: true,
     task_ref: task.ref,
     image_preview_base64: base64,
     image_info: image_info
   )}

  # else
  #   {:noreply, socket}
  # end
end
```

Check the comment lines for more explanation on the changes that have been made. We are using ExImageInfo to fetch the information from the image and assigning it to the image_info socket assign we defined earlier.
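For context, ExImageInfo.info/1 takes the file binary and returns a tuple with the MIME type, the dimensions and a variant string (the example output below is illustrative):

```elixir
# Illustrative output — the actual values depend on the uploaded file.
{mimetype, width, height, _variant} = ExImageInfo.info(File.read!(meta.path))
# e.g. {"image/png", 960, 640, "PNG"}
```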

We are also using Image.upload_image_to_s3/2 to upload our image to imgup. Let's define this function in lib/app/image.ex.

```elixir
def upload_image_to_s3(file_path, mimetype) do
  extension = MIME.extensions(mimetype) |> Enum.at(0)

  # Upload to Imgup - https://github.com/dwyl/imgup
  upload_response =
    HTTPoison.post!(
      "https://imgup.fly.dev/api/images",
      {:multipart,
       [
         {:file, file_path,
          {"form-data", [name: "image", filename: "#{Path.basename(file_path)}.#{extension}"]},
          [{"Content-Type", mimetype}]}
       ]},
      []
    )

  # Return URL
  Jason.decode!(upload_response.body)
end
```

We're using HTTPoison to make a multipart request to the imgup server, effectively uploading the image. If the upload is successful, it returns the url of the uploaded image.
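For reference, the decoded body of a successful imgup response is a map with a "url" and a "compressed_url" key (the URL values below are illustrative):

```elixir
Jason.decode!(upload_response.body)
# => %{
#      "url" => "https://s3.eu-west-3.amazonaws.com/.../example.jpg",       # illustrative
#      "compressed_url" => "https://s3.eu-west-3.amazonaws.com/.../example_compressed.jpg"
#    }
```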

Let's go back to lib/app_web/live/page_live.ex. Now that we have image_info in the socket assigns, we can use it to insert a row in the "images" table in the database. We only want to do this after the model is done running, so simply change the handle_info/2 function (which is called after the model is done classifying the image).

```elixir
cond do
  # If the upload task has finished executing, we update the socket assigns.
  Map.get(assigns, :task_ref) == ref ->
    # Insert image to database
    image = %{
      url: assigns.image_info.url,
      width: assigns.image_info.width,
      height: assigns.image_info.height,
      description: label
    }

    Image.insert(image)

    # Update socket assigns
    {:noreply, assign(socket, label: label, upload_running?: false)}

  # ...
```

In the cond do statement, we want to change the branch pertaining to the uploaded image, not the example list handled below. We simply create an image variable with the information that is passed down to Image.insert/1, effectively adding the row to the database.

And that's it!

Now every time a person uploads an image and the model is executed, we are saving its location (:url), information (:width and :height) and the result of the classifying model (:description).

🥳

Note

If you're curious and want to see the data in your database, we recommend using DBeaver, an open-source database manager.

You can learn more about it at https://github.com/dwyl/learn-postgresql.

10. Adding double MIME type check and showing feedback to the person in case of failure

Currently, we are not handling any errors in case the upload of the image to imgup fails. Although this is not critical, it'd be better if we could show feedback to the person when that happens. This is good for us as well, because we can monitor and locate the error faster if we log it.

For this, let's head over to lib/app/image.ex and update the upload_image_to_s3/2 function we've implemented.

```elixir
def upload_image_to_s3(file_path, mimetype) do
  extension = MIME.extensions(mimetype) |> Enum.at(0)

  # Upload to Imgup - https://github.com/dwyl/imgup
  upload_response =
    HTTPoison.post(
      "https://imgup.fly.dev/api/images",
      {:multipart,
       [
         {:file, file_path,
          {"form-data", [name: "image", filename: "#{Path.basename(file_path)}.#{extension}"]},
          [{"Content-Type", mimetype}]}
       ]},
      []
    )

  # Process the response and return error if there was a problem uploading the image
  case upload_response do
    # In case it's successful
    {:ok, %HTTPoison.Response{status_code: 200, body: body}} ->
      %{"url" => url, "compressed_url" => _} = Jason.decode!(body)
      {:ok, url}

    # In case it returns HTTP 400 with a specific reason it failed
    {:ok, %HTTPoison.Response{status_code: 400, body: body}} ->
      %{"errors" => %{"detail" => reason}} = Jason.decode!(body)
      {:error, reason}

    # In case the request fails for whatever other reason
    {:error, %HTTPoison.Error{reason: reason}} ->
      {:error, reason}
  end
end
```

As you can see, we are returning {:error, reason} if an error occurs, and providing feedback alongside it. If it's successful, we return {:ok, url}.

Because we've just changed this function, we also need to update def handle_progress(:image_list... inside lib/app_web/live/page_live.ex to properly handle this new function output.

We are also introducing a double MIME type check to ensure that only image files are uploaded and processed. We use GenMagic. It provides supervised and customisable access to libmagic using a supervised external process. This gist explains that magic numbers are the first bytes of a file, which uniquely identify the type of file.

We use the GenMagic server as a daemon; it is started in the Application module and referenced by its name. When we run perform, we obtain a result and compare the MIME type with the one read by ExImageInfo. If they match, we continue; otherwise we stop the process.
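To get a feel for it, a call in iex looks roughly like this (the file path and the result values are illustrative):

```elixir
GenMagic.Server.perform(:gen_magic, "path/to/some_image.png")
# => {:ok,
#     %GenMagic.Result{
#       mime_type: "image/png",
#       encoding: "binary",
#       content: "PNG image data, 384 x 384, 8-bit/color RGBA, non-interlaced"
#     }}
```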

For this to work locally, you should install the libmagic-dev package on your computer.

Note

Depending on your OS, you may install libmagic in different ways. A quick Google search will suffice, but here are a few resources nonetheless:

Definitely read gen_magic's installation section in https://github.com/evadne/gen_magic#installation. You may need to perform additional steps.

You'll need to add gen_magic to mix.exs. This dependency will allow us to access libmagic through Elixir.

```elixir
def deps do
  [
    {:gen_magic, "~> 1.1.1"}
  ]
end
```

In the Application module, you should add the GenMagic daemon (the C lib is loaded once and for all and referenced by its name).

```elixir
# application.ex
children = [
  ...,
  {GenMagic.Server, name: :gen_magic},
]
```

In the Dockerfile (needed to deploy this app), we will install libmagic-dev as well:

```dockerfile
RUN apt-get update -y && \
  apt-get install -y libstdc++6 openssl libncurses5 locales ca-certificates libmagic-dev \
  && apt-get clean && rm -f /var/lib/apt/lists/*_*
```

Add the following function in the module App.Image:

@doc"""  Check file type via magic number. It uses a GenServer running the `C` lib "libmagic".  """defgen_magic_eval(path,accepted_mime)doGenMagic.Server.perform(:gen_magic,path)|>casedo{:error,reason}->{:error,reason}{:ok,%GenMagic.Result{mime_type:mime,encoding:"binary",content:_content}}->ifEnum.member?(accepted_mime,mime),do:{:ok,%{mime_type:mime}},else:{:error,"Not accepted mime type."}{:ok,%GenMagic.Result{}=res}->requireLoggerLogger.warning("⚠️ MIME type error:#{inspect(res)}"){:error,"Not acceptable."}endendend

In the page_live.ex module, add the functions:

@doc"""Use the previous function and eturn the GenMagic reponse from the previous function"""defmagic_check(path)doApp.Image.gen_magic_eval(path,@accepted_mime)|>casedo{:ok,%{mime_type:mime}}->{:ok,%{mime_type:mime}}{:error,msg}->{:error,msg}endend@doc"""Double-checks the MIME type of uploaded file to ensure that the fileis an image and is not corrupted."""defcheck_mime(magic_mime,info_mime)doifmagic_mime==info_mime,do::ok,else::errorend

We are now ready to double-check the file input with ExImageInfo and GenMagic to ensure the safety of the uploads.

defhandle_progress(:image_list,entry,socket)whenentry.done?do# We consume the entry only if the entry is done uploading from the image# and if consuming the entry was successful.with%{tensor:tensor,image_info:image_info}<-consume_uploaded_entry(socket,entry,fn%{path:path}->with{:magic,{:ok,%{mime_type:mime}}}<-{:magic,magic_check(path)},{:read,{:ok,file_binary}}<-{:read,File.read(path)},{:image_info,{mimetype,width,height,_variant}}<-{:image_info,ExImageInfo.info(file_binary)},{:check_mime,:ok}<-{:check_mime,check_mime(mime,mimetype)},# Get image and resize{:ok,thumbnail_vimage}<-Vix.Vips.Operation.thumbnail(path,@image_width,size::VIPS_SIZE_DOWN),# Pre-process it{:ok,tensor}<-pre_process_image(thumbnail_vimage)do# Upload image to S3Image.upload_image_to_s3(path,mimetype)|>casedo{:ok,url}->image_info=%ImageInfo{mimetype:mimetype,width:width,height:height,file_binary:file_binary,url:url}{:ok,%{tensor:tensor,image_info:image_info}}# If S3 upload fails, we return error{:error,reason}->{:ok,%{error:reason}}endelse{:error,reason}->{:postpone,%{error:reason}}endend)do# If consuming the entry was successful, we spawn a task to classify the image# and update the socket assignstask=Task.Supervisor.async(App.TaskSupervisor,fn->Nx.Serving.batched_run(ImageClassifier,tensor)end)# Encode the image to base64base64="data:image/png;base64, "<>Base.encode64(image_info.file_binary){:noreply,assign(socket,upload_running?:true,task_ref:task.ref,image_preview_base64:base64,image_info:image_info)}# Otherwise, if there was an error uploading the image, we log the error and show it to the person.else%{error:reason}->Logger.warning("⚠️ Error uploading image.#{inspect(reason)}"){:noreply,push_event(socket,"toast",%{message:"Image couldn't be uploaded to S3.\n#{reason}"})}_->{:noreply,socket}endend

Phew! That's a lot! Let's go through the changes we've made.

  • we are using the with statement to only feed the image to the model for classification in case the upload to imgup succeeds. We've changed what consume_uploaded_entry/3 returns in case the upload fails - we return {:ok, %{error: reason}}.
  • in case the upload fails, we pattern match the {:ok, %{error: reason}} object and push a "toast" event to the Javascript client (we'll implement these changes shortly).

Because we push an event in case the upload fails, we are going to make some changes to the Javascript client. We are going to show a toast with the error when the upload fails.

10.1 Showing a toast component with error

To show a toast component, we are going to use toastify.js.

Navigate to the assets folder and run:

  pnpm install toastify-js

With this installed, we need to import the toastify styles in assets/css/app.css.

```css
@import "../node_modules/toastify-js/src/toastify.css";
```

All that's left is to handle the "toast" event in assets/js/app.js. Add the following snippet of code to do so.

```js
// Hook to show message toast
Hooks.MessageToaster = {
  mounted() {
    this.handleEvent("toast", (payload) => {
      Toastify({
        text: payload.message,
        gravity: "bottom",
        position: "right",
        style: {
          background: "linear-gradient(to right, #f27474, #ed87b5)",
        },
        duration: 4000,
      }).showToast();
    });
  },
};
```

We use the payload.message we receive from the LiveView (remember when we executed push_event/3 in our LiveView?) to create a Toastify object that is shown when the upload fails.

And that's it! Quite easy, isn't it? 😉

If imgup is down or the image that was sent is, for example, invalid, an error should be shown, like so.

11. Benchmarking image captioning models

You may be wondering: which model is most suitable for me? Depending on the use case, Bumblebee supports different models for different scenarios.

To help you make up your mind, we've created a guide that benchmarks some of the Bumblebee-supported models for image captioning.

Although few models are currently supported, this comparison table will grow as more models are added. So any contribution is more than welcome! 🎉

You may check the guide and all of the code inside the _comparison folder.

🔍 Semantic search

In this section, we will focus on implementing a full-text search query through the captions of the images. At the end of this, you'll be able to transcribe audio, create embeddings from the audio transcription and search for the closest related image.

Note

This section was kindly implemented and documented by @ndrean. It is based on articles written by Sean Moriarty and published on the DockYard blog. Do check him out! 🎉

We can leverage machine learning to greatly improve this search process: we'll look for images whose captions are close in terms of meaning to the search.

In this section, you'll learn how to perform semantic search with machine learning. These techniques are widely used in search engines, including in widespread tools like Elasticsearch.

0. Overview of the process

Let's go over the process in detail so we know what to expect.

As it stands, when images are uploaded and captioned, the URL is saved, as well as the caption, in our local database.

Here's an overview of how semantic search usually works (which is exactly what we'll implement).

Source: https://www.elastic.co/what-is/semantic-search

We will use the following toolchain:

0.1 Audio transcription

We simply let the user start and stop the recording by using a submit button in a form. This can of course be greatly refined by using Voice Detection. You may find an example here.

Firstly, we will:

  • record audio with the MediaRecorder API.
  • run a Speech-To-Text process to produce a text transcription from the audio.

We will use the pre-trained model openai/whisper-small from https://huggingface.co and use it with the help of the Bumblebee.Audio.speech_to_text_whisper function. We get an Nx.Serving that we will use to run this model with an input.

0.2 Creating embeddings

We then want to find images whose captions approximate this text in terms of meaning. This transcription is the "target text". This is where embeddings come into play: they are vector representations of certain inputs, which, in our case, is the text transcription of the audio file recorded by the user. We encode each transcription as an embedding and then use an approximation algorithm to find the closest neighbours.

Our next steps will be to prepare the symmetric semantic search. We will use a transformer model, more specifically the pre-trained sBERT system available on Hugging Face.

We transform a text into a vector with the sentence-transformer model sentence-transformers/paraphrase-MiniLM-L6-v2.

Note

You may find models in the MTEB English leaderboard. We looked for "small" models in terms of file size and dimensions. You may want to try and use GTE small.

We will run the model with the help of the Bumblebee.Text.TextEmbedding.text_embedding function.

This encoding is done for each image caption.

0.3 Semantic search

At this point, we have:

  • the embedding of the text transcription of the recording made by the user (e.g. "a dog").
  • all the embeddings of all the images in our "image bank".

To search for the images that are related to "a dog", we need to apply an algorithm that compares these embeddings!

For this, we will run a k-nearest neighbours (kNN) search. There are several ways to do this.

Note

Note that Supabase can use the pgvector extension, and you can use Supabase with Fly.io.

Warning

Note that you need to save the embeddings (as vectors) into the database, so the database will be intensively used. This may lead to scaling problems and potential race conditions.

  • we can alternatively use the hnswlib library and its Elixir binding HNSWLib. This "externalises" the ANN search from the database, as it uses an in-memory index. This index needs to be persisted on disk, thus at the expense of using the filesystem, again with potential race conditions. It works with an index struct: this struct will allow us to efficiently retrieve vector data.

We will use this last option, mostly because we use Fly.io and pgvector is hard to come by on this platform. We will use a GenServer to wrap all the calls to hnswlib so every write runs synchronously. Additionally, you don't rely on a framework that does the heavy lifting for you. We're here to learn, aren't we? 😃

We will incrementally append the computed embedding of each caption to the Index. We get back an indice, which is simply the position of this embedding in the Index. We then run a "knn_search" algorithm; the input will be the embedding of the audio transcript. This algorithm returns the most relevant position(s) - indices - among the Index entries, the ones that minimize the chosen distance between the input and the existing vectors.

This is where we'll need to save:

  • either the index position,
  • or the embedding

to look up the corresponding image(s), depending on whether you append items one by one or in batches.

In our case, we append items one by one, so we will use the index position to uniquely recover the nearest image whose caption is semantically close to our audio.

Do note that the measured distance is dependent on the similarity metric used by the embedding model. Because the "sentence-transformer" model we've chosen was trained with cosine_similarity, this is what we'll use. Bumblebee may have options to correctly use this metric, but we used a normalisation process which fits our needs.
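To make this flow concrete, here is a minimal sketch of the HNSWLib calls we'll wrap in a GenServer later on. It assumes the embedding serving is registered under the name Embedding (as we'll do in application.ex) and that it returns a %{embedding: tensor} map, as Bumblebee's text embedding serving does:

```elixir
# Minimal sketch — the real code lives in the App.KnnIndex GenServer shown later.
{:ok, index} = HNSWLib.Index.new(:cosine, 384, 200)

# Append the embedding of a caption to the index.
%{embedding: caption_emb} = Nx.Serving.batched_run(Embedding, "a dog running on the beach")
:ok = HNSWLib.Index.add_items(index, caption_emb)

# Embed the audio transcription and look up its nearest neighbour.
%{embedding: query_emb} = Nx.Serving.batched_run(Embedding, "a dog")
{:ok, labels, _distances} = HNSWLib.Index.knn_query(index, query_emb, k: 1)

labels |> Nx.to_flat_list() |> hd()
# => 0 — the position of the closest caption in the index
```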

1. Pre-requisites

We have already installed all the dependencies that we need.

Warning

You will also need to install ffmpeg. Bumblebee uses ffmpeg under the hood to process audio files into tensors, but it uses it as an external dependency.

And now we're ready to rock and roll! 🎸

2. Transcribe an audio recording

Source: https://dockyard.com/blog/2023/03/07/audio-speech-recognition-in-elixir-with-whisper-bumblebee?utm_source=elixir-merge

We first need to capture the audio and upload it to the server.

The process is quite similar to the image upload, except that we use a special Javascript hook to record the audio and upload it to the Phoenix LiveView.

We use a live_file_input in a form to capture the audio and use the Javascript MediaRecorder API. The Javascript code is triggered by an attached hook Audio declared in the HTML. We also let the user listen to their audio by adding an embedded audio element <audio> in the HTML. Its source is the audio blob as a URL object.

2.1 Adding a loading spinner

We also add a spinner to display that the transcription process is running, in the same way as we did for the captioning process. To avoid code duplication, we introduce a Phoenix component "Spinner". Create the file spinner.ex in lib/app_web/components/ and create the Spinner component, like so:

```elixir
# /lib/app_web/components/spinner.ex
defmodule AppWeb.Spinner do
  use Phoenix.Component

  attr :spin, :boolean, default: false

  def spin(assigns) do
    ~H"""
    <div :if={@spin} role="status">
      <div class="relative w-6 h-6 animate-spin rounded-full bg-gradient-to-r from-purple-400 via-blue-500 to-red-400">
        <div class="absolute top-1/2 left-1/2 transform -translate-x-1/2 -translate-y-1/2 w-3 h-3 bg-gray-200 rounded-full border-2 border-white">
        </div>
      </div>
    </div>
    """
  end
end
```

In page_live.html.heex, add the following snippet of code.

```html
<!-- page_live.html.heex -->
<form phx-change="noop">
  <.live_file_input upload={@uploads.speech} />
  <button
    id="record"
    class="bg-blue-500 hover:bg-blue-700 text-white font-bold px-4 rounded"
    type="button"
    phx-hook="Audio"
    disabled={@mic_off?}
  >
    <Heroicons.microphone
      outline
      class="w-6 h-6 text-white font-bold group-active:animate-pulse"
    />
    <span id="text">Record</span>
  </button>
</form>

<p class="flex flex-col items-center">
  <audio id="audio" controls></audio>
  <AppWeb.Spinner.spin spin={@audio_running?} />
</p>
```

You can also use this component to display the spinner when the captioning task is running, so this part of your code will shrink to:

```html
<!-- Spinner -->
<%= if @upload_running? do %>
  <AppWeb.Spinner.spin spin={@upload_running?} />
<% else %>
  <%= if @label do %>
    <span class="text-gray-700 font-light"><%= @label %></span>
  <% else %>
    <span class="text-gray-300 font-light text-justify">Waiting for image input.</span>
  <% end %>
<% end %>
```

2.2 Defining the Javascript hook

Currently, we provide a basic user experience: we let the user click on a button to start and stop the recording.

When doing image captioning, we carefully worked on the size of the image file used for the captioning model to optimize the app's latency.

In the same spirit, we can downsize the original audio file so it's easier for the model to process. This reduces the overhead in our application. Even though the whisper model does downsize the audio's sampling rate, we can do this on the client side to skip this step and marginally reduce the overhead of running the model.

The main parameters we're dealing with are:

  • a lower sampling rate (the higher it is, the more accurate the sound),
  • using mono instead of stereo,
  • and the file type (WAV, MP3).

Since most PC microphones have a single channel (mono) and sample at 48kHz, we will focus on resampling to 16kHz. We will not convert to MP3 here.

Next, we define the hook in a new JS file, located in the assets/js folder.

To resample to 16kHz, we use an AudioContext and pass the desired sampleRate. We then use the method decodeAudioData, which receives an ArrayBuffer and returns an AudioBuffer. We get the ArrayBuffer from the Blob method arrayBuffer().

The important part is the Phoenix.js function upload, to which we pass an identifier "speech": this sends the data as a Blob via a channel to the server.

We use an action button in the HTML and attach Javascript listeners for the "click" event on the button and for the MediaRecorder's "dataavailable" and "stop" events. We also play with the CSS classes to modify the appearance of the action button when recording or not.

Navigate to the assets folder and run the following command. We will use this package to lower the sampling rate and convert the recorded audio file.

npm add"audiobuffer-to-wav"

Create a file called assets/js/micro.js and use the code below.

// /assets/js/micro.jsimporttoWavfrom"audiobuffer-to-wav";exportdefault{mounted(){letmediaRecorder,audioChunks=[];// Defining the elements and styles to be used during recording// and shown on the HTML.constrecordButton=document.getElementById("record"),audioElement=document.getElementById("audio"),text=document.getElementById("text"),blue=["bg-blue-500","hover:bg-blue-700"],pulseGreen=["bg-green-500","hover:bg-green-700","animate-pulse"];_this=this;// Adding event listener for "click" eventrecordButton.addEventListener("click",()=>{// Check if it's recording.// If it is, we stop the record and update the elements.if(mediaRecorder&&mediaRecorder.state==="recording"){mediaRecorder.stop();text.textContent="Record";}// Otherwise, it means the user wants to start recording.else{navigator.mediaDevices.getUserMedia({audio:true}).then((stream)=>{// Instantiate MediaRecordermediaRecorder=newMediaRecorder(stream);mediaRecorder.start();// And update the elementsrecordButton.classList.remove(...blue);recordButton.classList.add(...pulseGreen);text.textContent="Stop";// Add "dataavailable" event handlermediaRecorder.addEventListener("dataavailable",(event)=>{audioChunks.push(event.data);});// Add "stop" event handler for when the recording stops.mediaRecorder.addEventListener("stop",()=>{constaudioBlob=newBlob(audioChunks);// update the source of the Audio tag for the user to listen to his audioaudioElement.src=URL.createObjectURL(audioBlob);// create an AudioContext with a sampleRate of 16000constaudioContext=newAudioContext({sampleRate:16000});// async read the Blob as ArrayBuffer to feed the "decodeAudioData"constarrayBuffer=awaitaudioBlob.arrayBuffer();// decodes the ArrayBuffer into the AudioContext formatconstaudioBuffer=awaitaudioContext.decodeAudioData(arrayBuffer);// converts the AudioBuffer into a WAV formatconstwavBuffer=toWav(audioBuffer);// builds a Blob to pass to the Phoenix.JS.uploadconstwavBlob=newBlob([wavBuffer],{type:"audio/wav"});// upload to the server via a chanel with the built-in Phoenix.JS.upload_this.upload("speech",[wavBlob]);//  close the MediaRecorder instancemediaRecorder.stop();// cleanupsaudioChunks=[];recordButton.classList.remove(...pulseGreen);recordButton.classList.add(...blue);});});}});},};

Now let's import this file and declare our hook object in our liveSocket object. In our assets/js/app.js file, let's do:

```js
// /assets/js/app.js
...
import Audio from "./micro.js";
...

let liveSocket = new LiveSocket("/live", Socket, {
  params: { _csrf_token: csrfToken },
  hooks: { Audio },
});
```

2.3 Handling audio upload in LiveView

We now need to add some server-side code.

The uploaded audio file will be saved on disk as a temporary file in the /priv/static/uploads folder. We will also make this file unique every time a user records audio. We append an Ecto.UUID string to the file name and pass it into the LiveView socket.

The LiveView mount/3 function returns a socket. Let's update it and pass extra assigns - typically booleans for the UI, such as disabling the button and showing the spinner - as well as another allow_upload/3 to handle the upload process of the audio file.

In lib/app_web/live/page_live.ex, we change the code like so:

```elixir
# page_live.ex
@upload_dir Application.app_dir(:app, ["priv", "static", "uploads"])
@tmp_wav Path.expand("priv/static/uploads/tmp.wav")

def mount(_, _, socket) do
  socket
  |> assign(
    ...,
    transcription: nil,
    mic_off?: false,
    audio_running?: false,
    tmp_wav: @tmp_wav
  )
  |> allow_upload(:speech,
    accept: :any,
    auto_upload: true,
    progress: &handle_progress/3,
    max_entries: 1
  )
  |> allow_upload(:image_list, ...)
end
```

We then create a specific handle_progress for the :speech event as we did with the :image_list event. It will launch a task to run the Automatic Speech Recognition model on this audio file. We named the serving "Whisper".

```elixir
def handle_progress(:speech, entry, %{assigns: assigns} = socket) when entry.done? do
  tmp_wav =
    socket
    |> consume_uploaded_entry(entry, fn %{path: path} ->
      tmp_wav = assigns.tmp_wav <> Ecto.UUID.generate() <> ".wav"
      :ok = File.cp!(path, tmp_wav)
      {:ok, tmp_wav}
    end)

  audio_task =
    Task.Supervisor.async(App.TaskSupervisor, fn ->
      # run the model on the unique temporary file we've just created
      Nx.Serving.batched_run(Whisper, {:file, tmp_wav})
    end)

  {:noreply,
   assign(socket,
     audio_ref: audio_task.ref,
     mic_off?: true,
     audio_running?: true,
     tmp_wav: tmp_wav
   )}
end
```

And that's it for the Liveview portion!

2.4 Serving the Whisper model

Now that we are adding several models, let's refactor our models.ex module that manages the models. Since we're dealing with multiple models, we want our app to shut down if there's any problem loading them.

We now add the Whisper model in lib/app/application.ex so it's available throughout the application at runtime.

```elixir
# lib/app/application.ex
def check_models_on_startup do
  App.Models.verify_and_download_models()
  |> case do
    {:error, msg} ->
      Logger.error("⚠️ #{msg}")
      System.stop(0)

    :ok ->
      :ok
  end
end

def start(_type, _args) do
  # model check-up
  :ok = check_models_on_startup()

  children = [
    ...,
    # Nx serving for Speech-to-Text
    {Nx.Serving,
     serving:
       if Application.get_env(:app, :use_test_models) == true do
         App.Models.audio_serving_test()
       else
         App.Models.audio_serving()
       end,
     name: Whisper},
    ...
  ]

  ...
end
```

As you can see, we're using a serving similar to the captioning model we've implemented earlier. For this to work, we need to make some changes to the models.ex module. Recall that this module simply manages the models that are downloaded locally and used in our application.

To implement the functions above, we change the lib/app/models.ex module so it looks like so.

defmoduleModelInfodo@moduledoc"""  Information regarding the model being loaded.  It holds the name of the model repository and the directory it will be saved into.  It also has booleans to load each model parameter at will - this is because some models (like BLIP) require featurizer, tokenizations and generation configuration.  """defstruct[:name,:cache_path,:load_featurizer,:load_tokenizer,:load_generation_config]enddefmoduleApp.Modelsdo@moduledoc"""  Manages loading the modules and their location according to env.  """requireLogger# IMPORTANT: This should be the same directory as defined in the `Dockerfile`# where the models will be downloaded into.@models_folder_pathApplication.compile_env!(:app,:models_cache_dir)# Embedding-------@embedding_model%ModelInfo{name:"sentence-transformers/paraphrase-MiniLM-L6-v2",cache_path:Path.join(@models_folder_path,"paraphrase-MiniLM-L6-v2"),load_featurizer:false,load_tokenizer:true,load_generation_config:true}# Captioning --@captioning_test_model%ModelInfo{name:"microsoft/resnet-50",cache_path:Path.join(@models_folder_path,"resnet-50"),load_featurizer:true}@captioning_prod_model%ModelInfo{name:"Salesforce/blip-image-captioning-base",cache_path:Path.join(@models_folder_path,"blip-image-captioning-base"),load_featurizer:true,load_tokenizer:true,load_generation_config:true}# Audio transcription --@audio_test_model%ModelInfo{name:"openai/whisper-small",cache_path:Path.join(@models_folder_path,"whisper-small"),load_featurizer:true,load_tokenizer:true,load_generation_config:true}@audio_prod_model%ModelInfo{name:"openai/whisper-small",cache_path:Path.join(@models_folder_path,"whisper-small"),load_featurizer:true,load_tokenizer:true,load_generation_config:true}defextract_captioning_test_label(result)do%{predictions:[%{label:label}]}=resultlabelenddefextract_captioning_prod_label(result)do%{results:[%{text:label}]}=resultlabelend@doc"""  Verifies and downloads the models according to configuration  and if they are already cached locally or not.  The models that are downloaded are hardcoded in this function.  """defverify_and_download_models()do{Application.get_env(:app,:force_models_download,false),Application.get_env(:app,:use_test_models,false)}|>casedo{true,true}-># Delete any cached pre-existing modelsFile.rm_rf!(@models_folder_path)with:ok<-download_model(@captioning_test_model),:ok<-download_model(@embedding_model),:ok<-download_model(@audio_test_model)do:okelse{:error,msg}->{:error,msg}end{true,false}-># Delete any cached pre-existing modelsFile.rm_rf!(@models_folder_path)with:ok<-download_model(@captioning_prod_model),:ok<-download_model(@audio_prod_model),:ok<-download_model(@embedding_model)do:okelse{:error,msg}->{:error,msg}end{false,false}-># Check if the prod model cache directory exists or if it's not empty.# If so, we download the prod models.with:ok<-check_folder_and_download(@captioning_prod_model),:ok<-check_folder_and_download(@audio_prod_model),:ok<-check_folder_and_download(@embedding_model)do:okelse{:error,msg}->{:error,msg}end{false,true}-># Check if the test model cache directory exists or if it's not empty.# If so, we download the test models.with:ok<-check_folder_and_download(@captioning_test_model),:ok<-check_folder_and_download(@audio_test_model),:ok<-check_folder_and_download(@embedding_model)do:okelse{:error,msg}->{:error,msg}endendend@doc"""  Serving function that serves the `Bumblebee` captioning model used throughout the app.  This function is meant to be called and served by `Nx` in `lib/app/application.ex`.  
This assumes the models that are being used exist locally, in the @models_folder_path.  """defcaption_servingdoload_offline_model(@captioning_prod_model)|>then(fnresponse->caseresponsedo{:ok,model}->%Nx.Serving{}=Bumblebee.Vision.image_to_text(model.model_info,model.featurizer,model.tokenizer,model.generation_config,compile:[batch_size:1],defn_options:[compiler:EXLA],# needed to run on `Fly.io`preallocate_params:true){:error,msg}->{:error,msg}endend)end@doc"""  Serving function that serves the `Bumblebee` audio transcription model used throughout the app.  """defaudio_servingdoload_offline_model(@audio_prod_model)|>then(fnresponse->caseresponsedo{:ok,model}->%Nx.Serving{}=Bumblebee.Audio.speech_to_text_whisper(model.model_info,model.featurizer,model.tokenizer,model.generation_config,chunk_num_seconds:30,task::transcribe,defn_options:[compiler:EXLA],preallocate_params:true){:error,msg}->{:error,msg}endend)end@doc"""  Serving function for tests only. It uses a test audio transcription model.  """defaudio_serving_testdoload_offline_model(@audio_test_model)|>then(fnresponse->caseresponsedo{:ok,model}->%Nx.Serving{}=Bumblebee.Audio.speech_to_text_whisper(model.model_info,model.featurizer,model.tokenizer,model.generation_config,chunk_num_seconds:30,task::transcribe,defn_options:[compiler:EXLA],preallocate_params:true){:error,msg}->{:error,msg}endend)end@doc"""  Serving function for tests only. It uses a test captioning model.  This function is meant to be called and served by `Nx` in `lib/app/application.ex`.  This assumes the models that are being used exist locally, in the @models_folder_path.  """defcaption_serving_testdoload_offline_model(@captioning_test_model)|>then(fnresponse->caseresponsedo{:ok,model}->%Nx.Serving{}=Bumblebee.Vision.image_classification(model.model_info,model.featurizer,top_k:1,compile:[batch_size:10],defn_options:[compiler:EXLA],# needed to run on `Fly.io`preallocate_params:true){:error,msg}->{:error,msg}endend)end# Loads the models from the cache folder.# It will load the model and the respective the featurizer, tokenizer and generation config if needed,# and return a map with all of these at the end.@specload_offline_model(map())::{:ok,map()}|{:error,String.t()}defpload_offline_model(model)doLogger.info("ℹ️ Loading#{model.name}...")# Loading modelloading_settings={:hf,model.name,cache_dir:model.cache_path,offline:true}Bumblebee.load_model(loading_settings)|>casedo{:ok,model_info}->info=%{model_info:model_info}# Load featurizer, tokenizer and generation config if neededinfo=ifMap.get(model,:load_featurizer)do{:ok,featurizer}=Bumblebee.load_featurizer(loading_settings)Map.put(info,:featurizer,featurizer)elseinfoendinfo=ifMap.get(model,:load_tokenizer)do{:ok,tokenizer}=Bumblebee.load_tokenizer(loading_settings)Map.put(info,:tokenizer,tokenizer)elseinfoendinfo=ifMap.get(model,:load_generation_config)do{:ok,generation_config}=Bumblebee.load_generation_config(loading_settings)Map.put(info,:generation_config,generation_config)elseinfoend# Return a map with the model and respective parameters.{:ok,info}{:error,msg}->{:error,msg}endend# Downloads the pre-trained models according to a given %ModelInfo struct.# It will load the model and the respective the featurizer, tokenizer and generation config if needed.@specdownload_model(map())::{:ok,map()}|{:error,binary()}defpdownload_model(model)doLogger.info("ℹ️ Downloading#{model.name}...")# Download modeldownloading_settings={:hf,model.name,cache_dir:model.cache_path}# Download featurizer, tokenizer and generation config if 
neededBumblebee.load_model(downloading_settings)|>casedo{:ok,_}->ifMap.get(model,:load_featurizer)do{:ok,_}=Bumblebee.load_featurizer(downloading_settings)endifMap.get(model,:load_tokenizer)do{:ok,_}=Bumblebee.load_tokenizer(downloading_settings)endifMap.get(model,:load_generation_config)do{:ok,_}=Bumblebee.load_generation_config(downloading_settings)end:ok{:error,msg}->{:error,msg}endend# Checks if the folder exists and downloads the model if it doesn't.defcheck_folder_and_download(model)do:ok=File.mkdir_p!(@models_folder_path)model_location=Path.join(model.cache_path,"huggingface")ifFile.ls(model_location)=={:error,:enoent}orFile.ls(model_location)=={:ok,[]}dodownload_model(model)|>casedo:ok->:ok{:error,msg}->{:error,msg}endelseLogger.info("ℹ️ No download needed:#{model.name}"):okendendend

That's a lot! But we just need to focus on some new parts we've added:

  • we've created audio_serving_test/0 and audio_serving/0, our audio serving functions that are used in the application.ex file.
  • added @audio_prod_model and @audio_test_model, the Whisper model definitions used to download the models locally.
  • refactored the image captioning model definitions to be clearer.

Now we're successfully serving audio-to-text capabilities in our application!

2.5 Handling the model's response and updating elements in the view

We expect the response of this task to be in the following form:

```elixir
%{
  chunks: [
    %{
      text: "Hi there", # <-- the text of our audio
      start_timestamp_seconds: nil,
      end_timestamp_seconds: nil
    }
  ]
}
```

We capture this response in a handle_info callback, where we simply prune the temporary audio file, update the socket state with the result, and update the booleans used for our UI (the spinner element, the button availability and the reset of the task once done).

```elixir
def handle_info({ref, %{chunks: [%{text: text}]} = _result}, %{assigns: assigns} = socket)
    when assigns.audio_ref == ref do
  Process.demonitor(ref, [:flush])
  File.rm!(assigns.tmp_wav)

  {:noreply,
   assign(socket,
     transcription: String.trim(text),
     mic_off?: false,
     audio_running?: false,
     audio_ref: nil,
     tmp_wav: @tmp_wav
   )}
end
```

And that's it for this section! Our application is now able to record audio and transcribe it. 🎉

3. Embeddings and semantic search

We want to encode every caption and the input text into an embedding, which is a vector of a specific vector space. In other words, we encode a string into a list of numbers.

We chose the transformer "sentence-transformers/paraphrase-MiniLM-L6-v2" model.

This transformer uses a 384-dimensional vector space. Since this transformer is trained with a cosine metric, we endow the vector space of embeddings with the same distance. You can read more about cosine_similarity here.

This model is loaded and served by an Nx.Serving started in the Application module, like all other models.
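As a sketch of the normalisation mentioned earlier (one way to do it with Nx, not necessarily the exact code of this app, and assuming the serving is registered under the name Embedding), you can divide an embedding by its norm before storing or querying it:

```elixir
# Sketch: normalise an embedding so inner products behave like cosine similarity.
%{embedding: raw} = Nx.Serving.batched_run(Embedding, "a dog")
normed = Nx.divide(raw, Nx.LinAlg.norm(raw))
```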

3.1 The HNSWLib Index (GenServer)

This library, HNSWLib, works with an index. We instantiate the Index file in a GenServer which holds the index in its state.

We will use an Index file that is saved locally in our file system. This file will be updated any time we append an embedding; all the client calls and writes to the HNSWLib index are handled by the GenServer. They will happen synchronously. We want to minimize the race conditions in case several users interact with the app. This app is only meant to run on a single node.

It is started in the Application module (application.ex). When the app starts, we either read or create this file. The file is saved in the "/priv/static/uploads" folder.

Because we are deploying with Fly.io, we need to persist the Index file in the database, since the machine - and thus its attached volume - is pruned when inactive.

It is crucial to save the correspondence between the Image table and the Index file to retrieve the correct images. In simple terms, the file in the Index table in the DB must correspond to the Index file in the filesystem.

We therefore prevent a user from uploading the same file several times, as otherwise we would have several index entries for the same picture. This is done through SHA computation.
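A minimal sketch of how such a check could look, assuming a sha column on the images table (that column name is an assumption for illustration, not necessarily this app's exact schema):

```elixir
# Hash the uploaded file binary and look it up before inserting/indexing it.
sha = :crypto.hash(:sha256, file_binary) |> Base.encode16()

case App.Repo.get_by(App.Image, sha: sha) do
  nil -> :not_seen_yet                       # safe to insert and index
  %App.Image{} = image -> {:already_indexed, image}
end
```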

Since running the models is a long process, and because several users may interact with the app, we need several steps to ensure that the information stays synchronized between the database and the index file.

We also endow the vector space with a :cosine pseudo-metric.

Add the following GenServer file: it will load the Index file and also provide a client API to interact with the Index, which is held in the state of the GenServer.

Again, this solution works for a single node only.

defmoduleApp.KnnIndexdouseGenServer@moduledoc"""  A GenServer to load and handle the Index file for HNSWLib.  It loads the index from the FileSystem if existing or from the table HnswlibIndex.  It creates an new one if no Index file is found in the FileSystem  and if the table HnswlibIndex is empty.  It holds the index and the App.Image singleton table in the state.  """requireLogger@dim384@max_elements200@upload_dirApplication.app_dir(:app,["priv","static","uploads"])@saved_indexifApplication.compile_env(:app,:knnindex_indices_test,false),do:Path.join(@upload_dir,"indexes_test.bin"),else:Path.join(@upload_dir,"indexes.bin")# Client API ------------------defstart_link(args)do:ok=File.mkdir_p!(@upload_dir)GenServer.start_link(__MODULE__,args,name:__MODULE__)enddefindex_pathdo@saved_indexenddefsave_index_to_dbdoGenServer.call(__MODULE__,:save_index_to_db)enddefget_countdoGenServer.call(__MODULE__,:get_count)enddefadd_item(embedding)doGenServer.call(__MODULE__,{:add_item,embedding})enddefknn_search(input)doGenServer.call(__MODULE__,{:knn_search,input})enddefnot_empty_indexdoGenServer.call(__MODULE__,:not_empty)end# ---------------------------------------------------@impltruedefinit(args)do# Trying to load the index fileindex_path=Keyword.fetch!(args,:index)space=Keyword.fetch!(args,:space)caseFile.exists?(index_path)do# If the index file doesn't exist, we try to load from the database.false->{:ok,index,index_schema}=App.HnswlibIndex.maybe_load_index_from_db(space,@dim,@max_elements){:ok,{index,index_schema,space}}# If the index file does exist, we compare the one with teh table and check for incoherences.true->Logger.info("ℹ️ Index file found on disk. Let's compare it with the database...")App.Repo.get_by(App.HnswlibIndex,id:1)|>casedonil->{:stop,{:error,"Error comparing the index file with the one on the database. Incoherence on table."}}schema->check_integrity(index_path,schema,space)endendenddefpcheck_integrity(path,schema,space)do# We check the count of the images in the database and the one in the index.withdb_count<-App.Repo.all(App.Image)|>length(),{:ok,index}<-HNSWLib.Index.load_index(space,@dim,path),{:ok,index_count}<-HNSWLib.Index.get_current_count(index),true<-index_count==db_countdoLogger.info("ℹ️ Integrity: ✅"){:ok,{index,schema,space}}# If it fails, we return an error.elsefalse->{:stop,{:error,"Integrity error. 
The count of images from index differs from the database."}}{:error,msg}->Logger.error("⚠️#{msg}"){:stop,{:error,msg}}endend@impltruedefhandle_call(:save_index_to_db,_,{index,index_schema,space}=state)do# We read the index file and try to update the index on the table as well.File.read(@saved_index)|>casedo{:ok,file}->{:ok,updated_schema}=index_schema|>App.HnswlibIndex.changeset(%{file:file})|>App.Repo.update(){:reply,{:ok,updated_schema},{index,updated_schema,space}}{:error,msg}->{:reply,{:error,msg},state}endenddefhandle_call(:get_count,_,{index,_,_}=state)do{:ok,count}=HNSWLib.Index.get_current_count(index){:reply,count,state}enddefhandle_call({:add_item,embedding},_,{index,_,_}=state)do# We add the new item to the index and update it.with:ok<-HNSWLib.Index.add_items(index,embedding),{:ok,idx}<-HNSWLib.Index.get_current_count(index),:ok<-HNSWLib.Index.save_index(index,@saved_index)do{:reply,{:ok,idx},state}else{:error,msg}->{:reply,{:error,msg},state}endenddefhandle_call({:knn_search,nil},_,state)do{:reply,{:error,"No index found"},state}enddefhandle_call({:knn_search,input},_,{index,_,_}=state)do# We search for the nearest neighbors of the input embedding.caseHNSWLib.Index.knn_query(index,input,k:1)do{:ok,labels,_distances}->response=labels[0]|>Nx.to_flat_list()|>hd()|>then(fnidx->App.Repo.get_by(App.Image,%{idx:idx+1})end)# TODO: add threshold on  "distances"{:reply,response,state}{:error,msg}->{:reply,{:error,msg},state}endenddefhandle_call(:not_empty,_,{index,_,_}=state)docaseHNSWLib.Index.get_current_count(index)do{:ok,0}->Logger.warning("⚠️ Empty index."){:reply,:error,state}{:ok,_}->{:reply,:ok,state}endendend

Let's unpack a bit of what we are doing here.

  • we first define the module constants. Here, we add the dimensions of the embedding vector space (these depend on the model you choose). Check the model you've used to tweak these settings optimally.

  • we define the upload directory where the index file will be saved inside the filesystem.

  • when the GenServer is initialized (the init/1 function), we perform several integrity verifications, comparing the Index file in the filesystem with the file in the Index table (from now on, this table will be called HnswlibIndex, after the name of its schema). These validations essentially make sure the content of both files is the same.

  • the other functions provide a basic API for callers to add items to the index and persist it.

3.2 Saving the HNSWLib Index in the database

As you may have seen from the previous GenServer, we are calling functions from a module called App.HnswlibIndex that we have not yet created.

This module pertains to the schema that will hold the information of the HNSWLib table. This table will only have a single row, with the file contents. As we've discussed earlier, we will compare the Index file in this row with the one in the filesystem to check for any inconsistencies that may arise.

Let's implement this module now!

Inside lib/app, create a file called hnswlib_index.ex and use the following code.

defmoduleApp.HnswlibIndexdouseEcto.SchemaaliasApp.HnswlibIndexrequireLogger@moduledoc"""  Ecto schema to save the HNSWLib Index file into a singleton table  with utility functions  """schema"hnswlib_index"dofield(:file,:binary)field(:lock_version,:integer,default:1)enddefchangeset(struct\\%__MODULE__{},params\\%{})dostruct|>Ecto.Changeset.cast(params,[:id,:file])|>Ecto.Changeset.optimistic_lock(:lock_version)|>Ecto.Changeset.validate_required([:id])end@doc"""  Tries to load index from DB.  If the table is empty, it creates a new one.  If the table is not empty but there's no file, an index is created from scratch.  If there's one, we use it and load it to be used throughout the application.  """defmaybe_load_index_from_db(space,dim,max_elements)do# Check if the table has an entryApp.Repo.get_by(HnswlibIndex,id:1)|>casedo# If the table is emptynil->Logger.info("ℹ️ No index file found in DB. Creating new one...")create(space,dim,max_elements)# If the table is not empty but has no fileresponsewhenresponse.file==nil->Logger.info("ℹ️ Empty index file in DB. Recreating one...")# Purge the table and create a new file row in itApp.Repo.delete_all(App.HnswlibIndex)create(space,dim,max_elements)# If the table is not empty and has a fileindex_db->Logger.info("ℹ️ Index file found in DB. Loading it...")# We get the path of the indexwithpath<-App.KnnIndex.index_path(),# Save the file on disk:ok<-File.write(path,index_db.file),# And load it{:ok,index}<-HNSWLib.Index.load_index(space,dim,path)do{:ok,index,index_db}endendenddefpcreate(space,dim,max_elements)do# Inserting the row in the table{:ok,schema}=HnswlibIndex.changeset(%__MODULE__{},%{id:1})|>App.Repo.insert()# Creates index{:ok,index}=HNSWLib.Index.new(space,dim,max_elements)# Builds index for testing onlyifApplication.get_env(:app,:use_test_models,false)doempty_index=Application.app_dir(:app,["priv","static","uploads"])|>Path.join("indexes_empty.bin")HNSWLib.Index.save_index(index,empty_index)end{:ok,index,schema}endend

In this module:

  • we create two fields: lock_version, to track the version of the file; and file, the binary content of the index file.

  • lock_version will be extremely useful to perform optimistic locking, which is what we do in the changeset/2 function. This will allow us to prevent deadlocks when two different people upload the same image at the same time and to overcome any race condition that may occur, maintaining the consistency of the Index file (a short worked example of this behaviour follows below).

  • maybe_load_index_from_db/3 fetches the singleton row in this table and checks if the file exists in the row. If it doesn't, it creates a new one. Otherwise, it just loads the existing one from the row.

  • create/3 creates a new index file. It's a private function that encapsulates creating the Index file so it can be used in the singleton row inside the table.

And that's it! We've added additional code to conditionally create different indexes according to the environment (useful for testing), but you can safely ignore those conditional calls if you're not interested in testing (though you should 😛).
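
To make the optimistic locking concrete, here is a small, hypothetical IEx experiment (not part of the app). Two copies of the same row are updated one after the other; the second update still carries the stale lock_version, so Ecto raises Ecto.StaleEntryError instead of silently overwriting the newer data:

# Hypothetical illustration of optimistic locking with the changeset above.
{:ok, _} =
  App.HnswlibIndex.changeset(%App.HnswlibIndex{}, %{id: 1})
  |> App.Repo.insert()

stale_copy = App.Repo.get!(App.HnswlibIndex, 1)

# First writer wins and bumps :lock_version from 1 to 2.
{:ok, _} =
  App.HnswlibIndex.changeset(stale_copy, %{file: <<1, 2, 3>>})
  |> App.Repo.update()

# Second writer still holds lock_version 1, so the update raises.
try do
  App.HnswlibIndex.changeset(stale_copy, %{file: <<9, 9, 9>>})
  |> App.Repo.update()
rescue
  Ecto.StaleEntryError -> :stale_write_rejected
end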

3.3 The embedding model

We provide a serving for the embedding model in the App.Models module. It should look like this:

# App.Models
@embedding_model %ModelInfo{
  name: "sentence-transformers/paraphrase-MiniLM-L6-v2",
  cache_path: Path.join(@models_folder_path, "paraphrase-MiniLM-L6-v2"),
  load_featurizer: false,
  load_tokenizer: true,
  load_generation_config: true
}

def embedding() do
  load_offline_model(@embedding_model)
  |> then(fn response ->
    case response do
      {:ok, model} ->
        # return an %Nx.Serving{} struct
        %Nx.Serving{} =
          Bumblebee.Text.TextEmbedding.text_embedding(
            model.model_info,
            model.tokenizer,
            defn_options: [compiler: EXLA],
            preallocate_params: true
          )

      {:error, msg} ->
        {:error, msg}
    end
  end)
end

def verify_and_download_models() do
  force_models_download = Application.get_env(:app, :force_models_download, false)
  use_test_models = Application.get_env(:app, :use_test_models, false)

  case {force_models_download, use_test_models} do
    {true, true} ->
      File.rm_rf!(@models_folder_path)
      download_model(@captioning_test_model)
      download_model(@audio_test_model)

    {true, false} ->
      File.rm_rf!(@models_folder_path)
      # new: download the embedding model
      download_model(@embedding_model)
      download_model(@captioning_prod_model)
      download_model(@audio_prod_model)

    {false, false} ->
      # new: check/download the embedding model
      check_folder_and_download(@embedding_model)
      check_folder_and_download(@captioning_prod_model)
      check_folder_and_download(@audio_prod_model)

    {false, true} ->
      check_folder_and_download(@captioning_test_model)
      check_folder_and_download(@audio_test_model)
  end
end

You then add the Nx.Serving for the embeddings:

# application.ex
children = [
  ...,
  {Nx.Serving, serving: App.Models.embedding(), name: Embedding, batch_size: 5},
  ...
]
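
Once the application is running with the serving registered under the name Embedding, you can sanity-check it in IEx. This is only an illustrative snippet; the exact values will differ, but paraphrase-MiniLM-L6-v2 should produce 384-dimensional vectors, which is the dimension the Index has to be created with.

# Quick sanity check in IEx once the Embedding serving is up.
%{embedding: vector} = Nx.Serving.batched_run(Embedding, "a dog playing with a ball")

Nx.shape(vector)
#=> {384}  (expected for paraphrase-MiniLM-L6-v2; the index dimension must match)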

Your application.ex file should look like so:

defmodule App.Application do
  # See https://hexdocs.pm/elixir/Application.html
  # for more information on OTP Applications
  @moduledoc false

  require Logger
  use Application

  @upload_dir Application.app_dir(:app, ["priv", "static", "uploads"])

  @saved_index if Application.compile_env(:app, :knnindex_indices_test, false),
                 do: Path.join(@upload_dir, "indexes_test.bin"),
                 else: Path.join(@upload_dir, "indexes.bin")

  def check_models_on_startup do
    App.Models.verify_and_download_models()
    |> case do
      {:error, msg} ->
        Logger.error("⚠️ #{msg}")
        System.stop(0)

      :ok ->
        Logger.info("ℹ️ Models: ✅")
        :ok
    end
  end

  @impl true
  def start(_type, _args) do
    :ok = check_models_on_startup()

    children = [
      # Start the Telemetry supervisor
      AppWeb.Telemetry,
      # Setup DB
      App.Repo,
      # Start the PubSub system
      {Phoenix.PubSub, name: App.PubSub},
      # Nx serving for the embedding
      {Nx.Serving, serving: App.Models.embedding(), name: Embedding, batch_size: 1},
      # Nx serving for Speech-to-Text
      {Nx.Serving,
       serving:
         if Application.get_env(:app, :use_test_models) == true do
           App.Models.audio_serving_test()
         else
           App.Models.audio_serving()
         end,
       name: Whisper},
      # Nx serving for image classifier
      {Nx.Serving,
       serving:
         if Application.get_env(:app, :use_test_models) == true do
           App.Models.caption_serving_test()
         else
           App.Models.caption_serving()
         end,
       name: ImageClassifier},
      {GenMagic.Server, name: :gen_magic},
      # Adding a supervisor
      {Task.Supervisor, name: App.TaskSupervisor},
      # Start the Endpoint (http/https)
      AppWeb.Endpoint
      # Start a worker by calling: App.Worker.start_link(arg)
      # {App.Worker, arg}
    ]

    # We only start the HNSWLib Index GenServer when we are not testing.
    # Because this GenServer needs the database to be seeded first,
    # we only add it when we're not testing.
    # When testing, you need to spawn this process manually (it is done in the test_helper.exs file).
    children =
      if Application.get_env(:app, :start_genserver, true) == true do
        Enum.concat(children, [{App.KnnIndex, [space: :cosine, index: @saved_index]}])
      else
        children
      end

    # See https://hexdocs.pm/elixir/Supervisor.html
    # for other strategies and supported options
    opts = [strategy: :one_for_one, name: App.Supervisor]
    Supervisor.start_link(children, opts)
  end

  # Tell Phoenix to update the endpoint configuration
  # whenever the application is updated.
  @impl true
  def config_change(changed, _new, removed) do
    AppWeb.Endpoint.config_change(changed, removed)
    :ok
  end
end

Note

We have made a few alterations to how the supervision tree in application.ex is initialized. This is because we test our code, which is why you see some of these changes above.

If you don't want to test the code, you can ignore the conditional changes that are made to the supervision tree according to the environment (which we use to check whether the code is being tested or not).

4. Using the Index and embeddings

In this section, we'll go over how to use the Index and the embeddings and tie everything together to have a working application 😍.

If you want to better understand embeddings and how to use HNSWLib, the math behind it, and see a working example of running an embedding model, you can check the next section. However, it is entirely optional and not necessary for our app.

4.0 Check the folder "hnswlib"

For a working example of how to use the index in hnswlib, you can run the ".exs" file there.
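
If you just want to play with the library in isolation, the snippet below is a rough, self-contained sketch of the same workflow (package versions are indicative; check hex.pm for the current ones):

# hnswlib_playground.exs — a hedged, standalone sketch of the HNSWLib workflow.
Mix.install([{:hnswlib, "~> 0.1"}, {:nx, "~> 0.7"}])

# Create a cosine index for 3-dimensional vectors holding up to 100 elements.
{:ok, index} = HNSWLib.Index.new(:cosine, 3, 100)

vector = Nx.tensor([0.1, 0.2, 0.3], type: :f32)
normed = Nx.divide(vector, Nx.LinAlg.norm(vector))

# Add the vector, then query for its single nearest neighbour.
:ok = HNSWLib.Index.add_items(index, normed)
{:ok, count} = HNSWLib.Index.get_current_count(index)
{:ok, labels, distances} = HNSWLib.Index.knn_query(index, normed, k: 1)

IO.inspect({count, Nx.to_flat_list(labels), Nx.to_flat_list(distances)})
#=> {1, [0], [0.0]} (distance ~0.0: the nearest neighbour is the vector itself)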

4.1 Computing the embeddings in our app

@tmp_wav Path.expand("priv/static/uploads/tmp.wav")

def mount(_, _, socket) do
  {:ok,
   socket
   |> assign(
     ...,
     # Related to the Audio
     transcription: nil,
     mic_off?: false,
     audio_running?: false,
     audio_search_result: nil,
     tmp_wav: @tmp_wav
   )
   |> allow_upload(:speech, ...)
   [...]}
end

Recall that every time you upload an image, you get back a URL from our bucket and you compute a caption as a string. We will now compute an embedding from this string and save it in the Index. This is done in the handle_info callback.

Update the LiveView handle_info callback where we handle the captioning results:

def handle_info({ref, result}, %{assigns: assigns} = socket) do
  # Flush async call
  Process.demonitor(ref, [:flush])

  cond do
    # If the upload task has finished executing,
    # we update the socket assigns.
    Map.get(assigns, :task_ref) == ref ->
      # label is extracted from result as before
      image = %{
        url: assigns.image_info.url,
        width: assigns.image_info.width,
        height: assigns.image_info.height,
        description: label,
        sha1: assigns.image_info.sha1
      }

      with %{embedding: data} <- Nx.Serving.batched_run(Embedding, label),
           # compute a normed embedding (cosine case only) on the text result
           normed_data <- Nx.divide(data, Nx.LinAlg.norm(data)),
           {:check_used, {:ok, pending_image}} <-
             {:check_used, App.Image.check_before_append_to_index(image.sha1)} do
        {:ok, idx} = App.KnnIndex.add_item(normed_data)

        # save the App.Image to the DB
        Map.merge(image, %{idx: idx, caption: label})
        |> App.Image.insert()

        {:noreply,
         socket
         |> assign(upload_running?: false, task_ref: nil, label: label)}
      else
        {:error, msg} ->
          {:noreply,
           socket
           |> put_flash(:error, msg)
           |> assign(upload_running?: false, task_ref: nil, label: nil)}
      end

    [...]
  end
end

Every time we produce an audio file, we transcribe it into text. We then compute the embedding of the transcription and run an ANN search. The last step performs a look-up in the database and should return an %App.Image{} struct (when a match is found). We then update the "audio_search_result" assign with it and display the transcription.

Modify the following handler:

def handle_info({ref, %{chunks: [%{text: text}]} = result}, %{assigns: assigns} = socket)
    when assigns.audio_ref == ref do
  Process.demonitor(ref, [:flush])
  File.rm!(@tmp_wav)

  # compute a normed embedding (cosine case only) on the text result
  # and return an App.Image{} as the result of a "knn_search"
  with %{embedding: input_embedding} <- Nx.Serving.batched_run(Embedding, text),
       normed_input_embedding <-
         Nx.divide(input_embedding, Nx.LinAlg.norm(input_embedding)),
       {:not_empty_index, :ok} <- {:not_empty_index, App.KnnIndex.not_empty_index()},
       #  {:not_empty_index, App.HnswlibIndex.not_empty_index(index)},
       %App.Image{} = result <- App.KnnIndex.knn_search(normed_input_embedding) do
    {:noreply,
     assign(socket,
       transcription: String.trim(text),
       mic_off?: false,
       audio_running?: false,
       audio_search_result: result,
       audio_ref: nil,
       tmp_wav: @tmp_wav
     )}
  else
    # record without entries
    {:not_empty_index, :error} ->
      {:noreply,
       assign(socket,
         mic_off?: false,
         audio_search_result: nil,
         audio_running?: false,
         audio_ref: nil,
         tmp_wav: @tmp_wav
       )}

    nil ->
      {:noreply,
       assign(socket,
         transcription: String.trim(text),
         mic_off?: false,
         audio_search_result: nil,
         audio_running?: false,
         audio_ref: nil,
         tmp_wav: @tmp_wav
       )}
  end
end

We now come back to the knn_search function we defined in the "KnnIndex" GenServer. The "approximate nearest neighbour" search uses the function HNSWLib.Index.knn_query/3. It returns a tuple {:ok, labels, distances}, where "labels" and "distances" hold as many elements as the number of neighbours you ask for via the k option. With k: 1, we ask for a single neighbour.

Note

You may further use a cut-off distance to exclude responses that might not be meaningful.
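
As a rough illustration of both points (assuming index is the loaded %HNSWLib.Index{} and query is a normalised embedding tensor), this is the shape of the raw return value and how a hypothetical cut-off could be applied:

{:ok, labels, distances} = HNSWLib.Index.knn_query(index, query, k: 1)

idx = labels |> Nx.to_flat_list() |> hd()
distance = distances |> Nx.to_flat_list() |> hd()

# Hypothetical threshold: treat anything further than 0.7 (cosine distance) as "no match".
if distance <= 0.7 do
  App.Repo.get_by(App.Image, %{idx: idx + 1})
else
  nil
end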

We will now display the found image with the URL field of the %App.Image{} struct.

Add this to "page_live.html.heex":

<!-- /lib/app_web/live/page_live.html.heex -->
<div :if={@audio_search_result}>
  <img src={@audio_search_result.url} alt="found_image" />
</div>
4.1.1 Changing the Image schema so it's embeddable

Now we'll save the index of the found image. Let's add a column to the Image table. To do this, run a mix task to generate a timestamped migration file.

mix ecto.gen.migration add_idx_to_images

In the"/priv/repo" folder, open the newly created file and add:

defmodule App.Repo.Migrations.AddIdxToImages do
  use Ecto.Migration

  def change do
    alter table(:images) do
      add(:idx, :integer, default: 0)
      add(:sha1, :string)
    end
  end
end

and run the migration with mix ecto.migrate.

Modify the App.Image struct and the changeset:

@primary_key {:id, :id, autogenerate: true}
schema "images" do
  field(:description, :string)
  field(:width, :integer)
  field(:url, :string)
  field(:height, :integer)
  field(:idx, :integer)
  field(:sha1, :string)
  timestamps(type: :utc_datetime)
end

def changeset(image, params \\ %{}) do
  image
  |> Ecto.Changeset.cast(params, [:url, :description, :width, :height, :idx, :sha1])
  |> Ecto.Changeset.validate_required([:width, :height])
  |> Ecto.Changeset.unique_constraint(:sha1, name: :images_sha1_index)
  |> Ecto.Changeset.unique_constraint(:idx, name: :images_idx_index)
end

We've added the fields idx and sha1 to the image schema. The former is the index of the image within the HNSWLib Index file, so we can look the image up. The latter is the sha1 hash of the image contents. This will allow us to check whether two images are the same, so we can avoid adding duplicate images and save some bandwidth in our application.

In our changeset/2 function, we've also added two unique_constraint/3 calls to check the uniqueness of the newly added idx and sha1 fields. These are enforced at the database level so we don't end up with duplicated images.

In addition to these changes, we are going to need functions to calculate the sha1 of the image. Add the following functions to the same file.

def calc_sha1(file_binary) do
  :crypto.hash(:sha, file_binary)
  |> Base.encode16()
end

def check_sha1(sha1) when is_binary(sha1) do
  App.Repo.get_by(App.Image, %{sha1: sha1})
  |> case do
    nil -> nil
    %App.Image{} = image -> {:ok, image}
  end
end
  • calc_sha1/1 uses the :crypto module to hash the file binary and encode it as hexadecimal.
  • check_sha1/1 fetches an image matching a given sha1 code and returns the result (a quick usage example follows).
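
Here is a quick, hypothetical usage of both helpers in IEx (the fixture path is only an example):

file_binary = File.read!("test/fixtures/example.jpg")
sha1 = App.Image.calc_sha1(file_binary)
#=> a 40-character hexadecimal string, e.g. "9C4A73..."

case App.Image.check_sha1(sha1) do
  nil -> :new_image
  {:ok, %App.Image{} = duplicate} -> {:already_uploaded, duplicate.url}
end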

And that's all we need to deal with our images!

4.1.2 Using embeddings in semantic search

Now we have

  • all the embedding models ready to be used,
  • our Index file correctly created and maintained, both in the filesystem and in the database through the hnswlib_index schema,
  • the needed sha1 functions to check for duplicated images.

It's time to bring everything together and use all of these toolsto implement semantic search into our application.

We are going to be working inside lib/app_web/live/page_live.ex from now on.

4.1.2.1 Mount socket assigns

First, we are going to update our socket assigns in mount/3.

@image_width 640
@accepted_mime ~w(image/jpeg image/jpg image/png image/webp)
@tmp_wav Path.expand("priv/static/uploads/tmp.wav")

@impl true
def mount(_params, _session, socket) do
  {:ok,
   socket
   |> assign(
     # Related to the file uploaded by the user
     label: nil,
     upload_running?: false,
     task_ref: nil,
     image_info: nil,
     image_preview_base64: nil,
     # Related to the list of image examples
     example_list_tasks: [],
     example_list: [],
     display_list?: false,
     # Related to the Audio
     transcription: nil,
     mic_off?: false,
     audio_running?: false,
     audio_search_result: nil,
     tmp_wav: @tmp_wav
   )
   |> allow_upload(:image_list,
     accept: ~w(image/*),
     auto_upload: true,
     progress: &handle_progress/3,
     max_entries: 1,
     chunk_size: 64_000,
     max_file_size: 5_000_000
   )
   |> allow_upload(:speech,
     accept: :any,
     auto_upload: true,
     progress: &handle_progress/3,
     max_entries: 1
   )}
end

To reiterate:

  • we've added a few fields related to audio.

    • transcription will hold the result of the audio transcription after the person's recording has been transcribed.
    • mic_off? is simply a toggle to visually show the person whether the microphone is recording or not.
    • audio_running? is a boolean to show the person whether the audio transcription and semantic search are running (loading).
    • audio_search_result is the image whose caption is semantically closest to the transcribed audio.
    • tmp_wav is the path of the temporary audio file that is saved in the filesystem while the audio is being transcribed.
  • additionally, we have also added an allow_upload/3 for the audio upload (it is tagged as :speech and is handled in the same function as the :image_list upload).

These are the socket assigns that will allow us to dynamically update the person using our app with what the app is doing.

4.1.2.2 Consuming image uploads

As you can see, we are using handle_progress/3 with allow_upload/3. As we know, handle_progress/3 is called whenever an upload happens (whether it's an image or a recording of the person's voice). We define two different clauses for how we want to process :image_list uploads and :speech uploads.

Let's start with the first one.

We have added sha1 and idx as fields to our image schema. Therefore, we need to make some changes to the handle_progress/3 clause for :image_list. Change it like so:

defhandle_progress(:image_list,entry,socket)whenentry.done?do# We consume the entry only if the entry is done uploading from the image# and if consuming the entry was successful.consume_uploaded_entry(socket,entry,fn%{path:path}->with{:magic,{:ok,%{mime_type:mime}}}<-{:magic,magic_check(path)},# Check if file can be properly read{:read,{:ok,file_binary}}<-{:read,File.read(path)},# Check the image info{:image_info,{mimetype,width,height,_variant}}<-{:image_info,ExImageInfo.info(file_binary)},# Check mime type{:check_mime,:ok}<-{:check_mime,check_mime(mime,mimetype)},# Get SHA1 code from the image and check itsha1=App.Image.calc_sha1(file_binary),{:sha_check,nil}<-{:sha_check,App.Image.check_sha1(sha1)},# Get image and resize{:ok,thumbnail_vimage}<-Vops.thumbnail(path,@image_width,size::VIPS_SIZE_DOWN),# Pre-process the image as tensor{:pre_process,{:ok,tensor}}<-{:pre_process,pre_process_image(thumbnail_vimage)}do# Create image info to be saved as partial imageimage_info=%{mimetype:mimetype,width:width,height:height,sha1:sha1,description:nil,url:nil,# set a random big int to the "idx" fieldidx::rand.uniform(1_000_000_000_000)*1_000}# Save partial imageApp.Image.insert(image_info)|>casedo{:ok,_}->image_info=Map.merge(image_info,%{file_binary:file_binary}){:ok,%{tensor:tensor,image_info:image_info,path:path}}{:error,changeset}->{:error,changeset.errors}end|>handle_upload()else{:magic,{:error,msg}}->{:postpone,%{error:msg}}{:read,msg}->{:postpone,%{error:inspect(msg)}}{:image_info,nil}->{:postpone,%{error:"image_info error"}}{:check_mime,:error}->{:postpone,%{error:"Bad mime type"}}{:sha_check,{:ok,%App.Image{}}}->{:postpone,%{error:"Image already uploaded"}}{:pre_process,{:error,_msg}}->{:postpone,%{error:"pre_processing error"}}{:error,reason}->{:postpone,%{error:inspect(reason)}}endend)|>casedo# If consuming the entry was successful, we spawn a task to classify the image# and update the socket assigns%{tensor:tensor,image_info:image_info}->task=Task.Supervisor.async(App.TaskSupervisor,fn->Nx.Serving.batched_run(ImageClassifier,tensor)end)# Encode the image to base64base64="data:image/png;base64, "<>Base.encode64(image_info.file_binary){:noreply,assign(socket,upload_running?:true,task_ref:task.ref,image_preview_base64:base64,image_info:image_info)}# Otherwise, if there was an error uploading the image, we log the error and show it to the person.%{error:error}->Logger.warning("⚠️ Error uploading image.#{inspect(error)}"){:noreply,push_event(socket,"toast",%{message:"Image couldn't be uploaded to S3.\n#{error}"})}endend

Let's go over these changes. Some of this code was written before, but for clarity, we'll go over it again.

  • we use consume_uploaded_entry/3 to consume the image that the person uploads. To consume the image successfully, the image goes through an array of validations.

    • we use magic_check/1 to check that the MIME type of the image is valid.
    • we read the contents of the image using ExImageInfo.info/1.
    • we check if the MIME type is valid using check_mime/2.
    • we calculate the sha1 with the App.Image.calc_sha1/1 function we've developed earlier.
    • we resize the image and scale it down to the same width as the images that were used to train the image captioning model we've chosen (to yield better results and to save memory bandwidth). We use Vix.Vips.Operation.thumbnail/3 to resize the image.
    • finally, we convert the resized image to a tensor using pre_process_image/1 so it can be consumed by our image captioning model.

  • after this series of validations, we use the image info we've obtained earlier to create an "early save" of the image. With this, we are saving the image and associating it with the sha1 that was computed from the image contents. We are doing this "partial image saving" in case two identical images are uploaded at the same time. Because we enforce sha1 to be unique at the database level, this race condition is resolved by the database optimistically (see the sketch after this list).

  • afterwards, we call handle_upload/1. This function will upload the image to the S3 bucket. We are going to implement this function in just a second 😉.

  • if the upload is successful, using the tensor and the image information from the previous steps, we spawn the async task to run the model. This step should be familiar to you since we've already implemented this. Finally, we update the socket assigns accordingly.

  • we handle all possible errors in the else clause of the with control-flow statement before the image is uploaded.
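
To picture the race condition mentioned in the list above, here is a small, hypothetical sketch. It assumes a unique index exists on the sha1 column (for example, created with create unique_index(:images, [:sha1]) in a migration), so that the unique_constraint/3 call in the changeset can turn the database error into a changeset error:

# Two uploads of the same picture arrive at (almost) the same time.
attrs = %{width: 640, height: 480, sha1: "da39a3ee5e6b4b0d3255bfef95601890afd80709"}

{:ok, _first} = App.Image.insert(Map.put(attrs, :idx, 42))

# The second insert hits the unique index and comes back as {:error, changeset},
# which the handle_progress/3 clause above reports back to the person.
{:error, changeset} = App.Image.insert(Map.put(attrs, :idx, 43))

changeset.errors[:sha1]
#=> {"has already been taken", [constraint: :unique, constraint_name: "images_sha1_index"]}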

Hopefully, this demystifies some of the code we've just implemented!

Because we are using handle_upload/1 in this function to upload the image to our S3 bucket, let's implement it right now!

def handle_upload({:ok, %{path: path, tensor: tensor, image_info: image_info} = map})
    when is_map(map) do
  # Upload the image to S3
  Image.upload_image_to_s3(path, image_info.mimetype)
  |> case do
    # If the upload is successful, we update the socket assigns with the image info
    {:ok, url} ->
      image_info = struct(%ImageInfo{}, Map.merge(image_info, %{url: url}))
      {:ok, %{tensor: tensor, image_info: image_info}}

    # If S3 upload fails, we return error
    {:error, reason} ->
      Logger.warning("⚠️ Error uploading image: #{inspect(reason)}")
      {:postpone, %{error: "Bucket error"}}
  end
end

def handle_upload({:error, error}) do
  Logger.warning("⚠️ Error creating partial image: #{inspect(error)}")
  {:postpone, %{error: "Error creating partial image"}}
end

This function is fairly easy to understand. We upload the image by calling Image.upload_image_to_s3/2 and, if successful, we add the returned URL to the image info struct. Otherwise, we handle the error and return it.

After this small detour, let's implement the handle_progress/3 clause for the :speech uploads, that is, the audio the person records.

def handle_progress(:speech, entry, %{assigns: assigns} = socket) when entry.done? do
  # We consume the audio file
  tmp_wav =
    socket
    |> consume_uploaded_entry(entry, fn %{path: path} ->
      tmp_wav = assigns.tmp_wav <> Ecto.UUID.generate() <> ".wav"
      :ok = File.cp!(path, tmp_wav)
      {:ok, tmp_wav}
    end)

  # After consuming the audio file, we spawn a task to transcribe the audio
  audio_task =
    Task.Supervisor.async(App.TaskSupervisor, fn ->
      Nx.Serving.batched_run(Whisper, {:file, tmp_wav})
    end)

  # Update the socket assigns
  {:noreply,
   assign(socket,
     audio_ref: audio_task.ref,
     mic_off?: true,
     tmp_wav: tmp_wav,
     audio_running?: true,
     audio_search_result: nil,
     transcription: nil
   )}
end

As we know, this function is called after the upload is completed. In the case of audio uploads, the upload is triggered by the Audio hook in assets/js/app.js when the person records their voice. Similarly to the handle_progress/3 clause for :image_list uploads, we also use consume_uploaded_entry/3 to consume the audio file.

  • we consume the audio file and save it in our filesystem as a .wav file.
  • we spawn the async task and run the whisper audio transcription model on the audio file we've just saved (the shape of the result it sends back is shown below).
  • we update the socket assigns accordingly.

Pretty simple, right?
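
For reference, the result that the Whisper serving sends back (via the async task) has roughly the following shape, which is why the handler in the next section pattern-matches on %{chunks: [%{text: text}]}. This is an illustrative sketch; extra fields may be present depending on the serving options:

# What arrives in handle_info/2 as {ref, result} once the task finishes
# (illustrative output; the actual text obviously depends on the recording).
Nx.Serving.batched_run(Whisper, {:file, "priv/static/uploads/tmp.wav"})
#=> %{chunks: [%{text: " A dog chasing a ball in the park."}]}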

4.1.2.3 Using the embeddings to semantically search images

In this section, we'll finally use our embedding model and semantically search our images!

As you've seen in the previous section, we've spawned the task that feeds the audio into the whisper transcription model. Now we need a handler! For this scenario, add the following function.

@impl true
def handle_info(
      {ref, %{chunks: [%{text: text}]} = _result},
      %{assigns: assigns} = socket
    )
    when assigns.audio_ref == ref do
  Process.demonitor(ref, [:flush])
  File.rm!(assigns.tmp_wav)

  # Compute a normed embedding (cosine case only) on the text result
  # and return an App.Image{} as the result of a "knn_search"
  with {:not_empty_index, :ok} <- {:not_empty_index, App.KnnIndex.not_empty_index()},
       %{embedding: input_embedding} <- Nx.Serving.batched_run(Embedding, text),
       %Nx.Tensor{} = normed_input_embedding <-
         Nx.divide(input_embedding, Nx.LinAlg.norm(input_embedding)),
       %App.Image{} = result <- App.KnnIndex.knn_search(normed_input_embedding) do
    {:noreply,
     assign(socket,
       transcription: String.trim(text),
       mic_off?: false,
       audio_running?: false,
       audio_search_result: result,
       audio_ref: nil,
       tmp_wav: @tmp_wav
     )}
  else
    # Stop transcription if no entries in the Index
    {:not_empty_index, :error} ->
      {:noreply,
       socket
       |> push_event("toast", %{message: "No images yet"})
       |> assign(
         mic_off?: false,
         transcription: "!! The image bank is empty. Please upload some !!",
         audio_search_result: nil,
         audio_running?: false,
         audio_ref: nil,
         tmp_wav: @tmp_wav
       )}

    nil ->
      {:noreply,
       assign(socket,
         transcription: String.trim(text),
         mic_off?: false,
         audio_search_result: nil,
         audio_running?: false,
         audio_ref: nil,
         tmp_wav: @tmp_wav
       )}
  end
end

Let's break down this function:

  • given the recording's text transcription:
    • we check that the Index file is not empty.
    • we run the text transcription through the embedding model and get its result.
    • with the embedding we've received from the model, we normalize it (see the quick check further below).
    • with the normalized embedding, we run it through a knn search. For this, we call the App.KnnIndex.knn_search/1 function defined in the App.KnnIndex GenServer we implemented earlier on.
    • the knn search returns the image that is semantically closest (through its caption) to the audio transcription.
    • upon the success of this process, we update the socket assigns.
    • otherwise, we handle each error case accordingly and update the socket assigns.

And that's it! We just had to call the functions we implemented earlier, one after the other!
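
One detail worth spelling out is the normalisation step: since the Index was created with the :cosine space, we always divide an embedding by its L2 norm before adding it to, or querying, the Index. A quick check in IEx (illustrative; the vector values will differ):

%{embedding: embedding} = Nx.Serving.batched_run(Embedding, "a red car parked on the street")

normed = Nx.divide(embedding, Nx.LinAlg.norm(embedding))

Nx.to_number(Nx.LinAlg.norm(normed))
#=> 1.0 (unit length, so the dot product of two normed vectors is their cosine similarity)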

4.1.2.4 Creating embeddings when uploading images

Now that we have used the embeddings, there's one thing we forgot: keeping track of the embedding of each image that is uploaded. These embeddings are saved in the Index file.

To fix this, we need to create an embedding of the image after it is uploaded and captioned. Head over to the handle_info/2 pertaining to the image captioning, and change it to the following piece of code:

defhandle_info({ref,result},%{assigns:assigns}=socket)do# Flush async callProcess.demonitor(ref,[:flush])# You need to change how you destructure the output of the model depending# on the model you've chosen for `prod` and `test` envs on `models.ex`.)label=caseApplication.get_env(:app,:use_test_models,false)dotrue->App.Models.extract_captioning_test_label(result)# coveralls-ignore-startfalse->App.Models.extract_captioning_prod_label(result)# coveralls-ignore-stopend%{image_info:image_info}=assignsconddo# If the upload task has finished executing, we run the embedding model on the imageMap.get(assigns,:task_ref)==ref->image=%{url:image_info.url,width:image_info.width,height:image_info.height,description:label,sha1:image_info.sha1}# Create embedding taskwith%{embedding:data}<-Nx.Serving.batched_run(Embedding,label),# Compute a normed embedding (cosine case only) on the text resultnormed_data<-Nx.divide(data,Nx.LinAlg.norm(data)),# Check the SHA1 of the image{:check_used,{:ok,pending_image}}<-{:check_used,App.Image.check_sha1(image.sha1)}doEcto.Multi.new()# Save updated Image to DB|>Ecto.Multi.run(:update_image,fn_,_->idx=App.KnnIndex.get_count()+1Ecto.Changeset.change(pending_image,%{idx:idx,description:image.description,url:image.url})|>App.Repo.update()end)# Save Index file to DB|>Ecto.Multi.run(:save_index,fn_,_->{:ok,_idx}=App.KnnIndex.add_item(normed_data)App.KnnIndex.save_index_to_db()end)|>App.Repo.transaction()|>casedo{:error,:update_image,_changeset,_}->{:noreply,socket|>push_event("toast",%{message:"Invalid entry"})|>assign(upload_running?:false,task_ref:nil,label:nil)}{:error,:save_index,_,_}->{:noreply,socket|>push_event("toast",%{message:"Please retry"})|>assign(upload_running?:false,task_ref:nil,label:nil)}{:ok,_}->{:noreply,socket|>assign(upload_running?:false,task_ref:nil,label:label)}endelse{:check_used,nil}->{:noreply,socket|>push_event("toast",%{message:"Race condition"})|>assign(upload_running?:false,task_ref:nil,label:nil)}{:error,msg}->{:noreply,socket|>push_event("toast",%{message:msg})|>assign(upload_running?:false,task_ref:nil,label:nil)}end# If the example task has finished executing, we upload the socket assigns.img=Map.get(assigns,:example_list_tasks)|>Enum.find(&(&1.ref==ref))-># Update the element in the `example_list` enum to turn "predicting?" to `false`updated_example_list=update_example_list(assigns,img,label){:noreply,assign(socket,example_list:updated_example_list,upload_running?:false,display_list?:true)}endend

Let's go over the flow of this function:

  • we extract the caption label from the result of the image captioning model. This code is the same as it was before.
  • afterwards, we take the label and feed it into the embedding model.
  • the embedding model yields the embedding; we normalize it and check whether the sha1 code of the image is already being used.
  • if these three steps succeed, we perform a database transaction where we save the updated image to the database, increment the Index file count and save the Index file to the database (see the sketch after this list).
  • we finally update the socket assigns accordingly.
  • if any of the previous calls fail, we handle these error scenarios and update the socket assigns.
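
The Ecto.Multi wrapper deserves a word: it makes the image update and the Index save an all-or-nothing unit. Here is a stripped-down, hypothetical illustration of that behaviour:

# If any step returns {:error, _}, the whole transaction rolls back.
Ecto.Multi.new()
|> Ecto.Multi.run(:update_image, fn _repo, _changes -> {:ok, :image_updated} end)
|> Ecto.Multi.run(:save_index, fn _repo, _changes -> {:error, :disk_full} end)
|> App.Repo.transaction()
#=> {:error, :save_index, :disk_full, %{update_image: :image_updated}}
# Nothing from :update_image is persisted, which is exactly what we want here.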

And that's it! Our app is fully loaded with semantic search capabilities! 🔋

4.1.2.5 Update the LiveView view

All that's left is updating our view. We are going to add basic elements to make this transition as smooth as possible.

Head over to lib/app_web/live/page_live.html.heex and update it as so:

<divclass="hidden"id="tracker_el"phx-hook="ActivityTracker"/><divclass="h-full w-full px-4 py-10 flex justify-center sm:px-6 sm:py-24 lg:px-8 xl:px-28 xl:py-32"><divclass="flex flex-col justify-start"><divclass="flex justify-center items-center w-full"><divclass="2xl:space-y-12"><divclass="mx-auto max-w-2xl lg:text-center"><p><spanclass="rounded-full w-fit bg-brand/5 px-2 py-1 text-[0.8125rem] font-medium text-center leading-6 text-brand"><ahref="https://hexdocs.pm/phoenix_live_view/Phoenix.LiveView.html"target="_blank"rel="noopener noreferrer">                🔥 LiveView</a>              +<ahref="https://github.com/elixir-nx/bumblebee"target="_blank"rel="noopener noreferrer">                🐝 Bumblebee</a></span></p><pclass="mt-2 text-3xl font-bold tracking-tight text-gray-900 sm:text-4xl">            Caption your image!</p><h3class="mt-6 text-lg leading-8 text-gray-600">            Upload your own image (up to 5MB) and perform image captioning with<ahref="https://elixir-lang.org/"target="_blank"rel="noopener noreferrer"class="font-mono font-medium text-sky-500">              Elixir</a>            !</h3><pclass="text-lg leading-8 text-gray-400">            Powered with<ahref="https://elixir-lang.org/"target="_blank"rel="noopener noreferrer"class="font-mono font-medium text-sky-500">              HuggingFace🤗</a>            transformer models, you can run this project locally and perform            machine learning tasks with a handful lines of code.</p></div><div></div><divclass="border-gray-900/10"><!-- File upload section --><divclass="col-span-full"><divclass="mt-2 flex justify-center rounded-lg border border-dashed border-gray-900/25 px-6 py-10"phx-drop-target="{@uploads.image_list.ref}"><divclass="text-center"><!-- Show image preview --><%= if @image_preview_base64 do %><formid="upload-form"phx-change="noop"phx-submit="noop"><labelclass="cursor-pointer"><%= if not @upload_running? 
do %><.live_file_input upload={@uploads.image_list}/><% end %><imgsrc="{@image_preview_base64}"/></label></form><% else %><svgclass="mx-auto h-12 w-12 text-gray-300"viewBox="0 0 24 24"fill="currentColor"aria-hidden="true"><pathfill-rule="evenodd"d="M1.5 6a2.25 2.25 0 012.25-2.25h16.5A2.25 2.25 0 0122.5 6v12a2.25 2.25 0 01-2.25 2.25H3.75A2.25 2.25 0 011.5 18V6zM3 16.06V18c0 .414.336.75.75.75h16.5A.75.75 0 0021 18v-1.94l-2.69-2.689a1.5 1.5 0 00-2.12 0l-.88.879.97.97a.75.75 0 11-1.06 1.06l-5.16-5.159a1.5 1.5 0 00-2.12 0L3 16.061zm10.125-7.81a1.125 1.125 0 112.25 0 1.125 1.125 0 01-2.25 0z"clip-rule="evenodd"/></svg><divclass="mt-4 flex text-sm leading-6 text-gray-600"><labelfor="file-upload"class="relative cursor-pointer rounded-md bg-white font-semibold text-indigo-600 focus-within:outline-none focus-within:ring-2 focus-within:ring-indigo-600 focus-within:ring-offset-2 hover:text-indigo-500"><formid="upload-form"phx-change="noop"phx-submit="noop"><labelclass="cursor-pointer"><.live_file_input upload={@uploads.image_list}/> Upload</label></form></label><pclass="pl-1">or drag and drop</p></div><pclass="text-xs leading-5 text-gray-600">                  PNG, JPG, GIF up to 5MB</p><% end %></div></div></div></div><!-- Show errors --><%= for entry<-@uploads.image_list.entriesdo%><divclass="mt-2"><%= for err<-upload_errors(@uploads.image_list,entry)do%><divclass="rounded-md bg-red-50 p-4 mb-2"><divclass="flex"><divclass="flex-shrink-0"><svgclass="h-5 w-5 text-red-400"viewBox="0 0 20 20"fill="currentColor"aria-hidden="true"><pathfill-rule="evenodd"d="M10 18a8 8 0 100-16 8 8 0 000 16zM8.28 7.22a.75.75 0 00-1.06 1.06L8.94 10l-1.72 1.72a.75.75 0 101.06 1.06L10 11.06l1.72 1.72a.75.75 0 101.06-1.06L11.06 10l1.72-1.72a.75.75 0 00-1.06-1.06L10 8.94 8.28 7.22z"clip-rule="evenodd"/></svg></div><divclass="ml-3"><h3class="text-sm font-medium text-red-800"><%= error_to_string(err) %></h3></div></div></div><% end %></div><% end %><!-- Prediction text --><divclass="flex mt-2 space-x-1.5 items-center font-bold text-gray-900 text-xl"><span>Description:</span><!-- conditional Spinner or display caption text or waiting text--><AppWeb.Spinner.spinspin="{@upload_running?}"/><%= if @label do %><spanclass="text-gray-700 font-light"><%= @label %></span><% else %><spanclass="text-gray-300 font-light">Waiting for image input.</span><% end %></div></div></div><!-- Audio --><br/><divclass="mx-auto max-w-2xl lg"><h2class="mt-2 text-3xl font-bold tracking-tight text-gray-900 sm:text-4xl text-center">        Semantic search using an audio</h2><br/><p>        Please record a phrase. You can listen to your audio. It will be        transcripted automatically into a text and appear below. The semantic        search for matching images will then run automatically and the found        image appear below.</p><br/><formid="audio-upload-form"phx-change="noop"class="flex flex-col items-center"><.live_file_input upload={@uploads.speech}/><buttonid="record"class="bg-blue-500 hover:bg-blue-700 text-white font-bold px-4 rounded flex"type="button"phx-hook="Audio"disabled="{@mic_off?}"><Heroicons.microphoneoutlineclass="w-6 h-6 text-white font-bold group-active:animate-pulse"/><spanid="text">Record</span></button></form><br/><pclass="flex flex-col items-center"><audioid="audio"controls></audio></p><br/><divclass="flex mt-2 space-x-1.5 items-center font-bold text-gray-900 text-xl"><span>Transcription:</span><%= if @audio_running? 
do %><AppWeb.Spinner.spinspin="{@audio_running?}"/><% else %><%= if @transcription do %><spanclass="text-gray-700 font-light"><%= @transcription %></span><% else %><spanclass="text-gray-300 font-light text-justify">Waiting for audio input.</span><% end %><% end %></div><br/><div:if="{@audio_search_result}"><divclass="border-gray-900/10"><divclass="mt-2 flex justify-center rounded-lg border border-dashed border-gray-900/25 px-6 py-10"><imgsrc="{@audio_search_result.url}"alt="found_image"/></div></div></div></div><!-- Examples --><div:if="{@display_list?}"class="flex flex-col"><h3class="mt-10 text-xl lg:text-center font-light tracking-tight text-gray-900 lg:text-2xl">        Examples</h3><divclass="flex flex-row justify-center my-8"><divclass="mx-auto grid max-w-2xl grid-cols-1 gap-x-6 gap-y-20 sm:grid-cols-2"><%= for example_img<-@example_listdo%><!-- Loading skeleton if it is predicting --><%= if example_img.predicting? == true do %><divrole="status"class="flex items-center justify-center w-full h-full max-w-sm bg-gray-300 rounded-lg animate-pulse"><imgsrc={~p"/images/spinner.svg"} alt="spinner"/><spanclass="sr-only">Loading...</span></div><% else %><div><imgid="{example_img.url}"src="{example_img.url}"class="rounded-2xl object-cover"/><h3class="mt-1 text-lg leading-8 text-gray-900 text-center"><%= example_img.label %></h3></div><% end %></div></div></div></div></div>

As you may have noticed, we've made some changes to the Audio portion of the HTML.

  • we check if the @transcription assign exists. If so, we display the text to the person.
  • we check if the @audio_search_result assign is not nil. If that's the case, the image that is semantically closest to the audio transcription is shown to the person.

And that's it! We are simply showing the person the results.

And with that, you've successfully added semantic search to the application! Pat yourself on the back! 👏

You've expanded your knowledge in key areas of machine learning and artificial intelligence, which are becoming increasingly prevalent!

5. Tweaking our UI

Now that we have all the features we want in our application, let's make it prettier! As it stands, it's responsive enough. But we can always make it better!

We're going to show you the changes you need to make and then explain what they mean!

Head over to lib/app_web/live/page_live.html.heex and change it like so:

<divclass="hidden"id="tracker_el"phx-hook="ActivityTracker"/><divclass="h-full w-full px-4 py-10 flex justify-center sm:px-6 xl:px-28"><divclass="flex flex-col justify-start lg:w-full"><divclass="flex justify-center items-center w-full"><divclass="w-full 2xl:space-y-12"><divclass="mx-auto lg:text-center"><!-- Title pill --><pclass="text-center"><spanclass="rounded-full w-fit bg-brand/5 px-2 py-1 text-[0.8125rem] font-medium text-center leading-6 text-brand"><ahref="https://hexdocs.pm/phoenix_live_view/Phoenix.LiveView.html"target="_blank"rel="noopener noreferrer">            🔥 LiveView</a>            +<ahref="https://github.com/elixir-nx/bumblebee"target="_blank"rel="noopener noreferrer">            🐝 Bumblebee</a></span></p><!-- Toggle Buttons --><divclass="flex justify-center lg:invisible"><spanclass="isolate inline-flex rounded-md shadow-sm mt-2"><buttonid="upload_option"type="button"class="relative inline-flex items-center gap-x-1.5 rounded-l-md bg-blue-500 text-white hover:bg-blue-600 px-3 py-2 text-sm font-semibold ring-1 ring-inset ring-gray-300 focus:z-10"><svgfill="none"viewBox="0 0 24 24"stroke-width="1.5"stroke="currentColor"class="-ml-0.5 h-5 w-5 text-white"><pathstroke-linecap="round"stroke-linejoin="round"d="M3 16.5v2.25A2.25 2.25 0 0 0 5.25 21h13.5A2.25 2.25 0 0 0 21 18.75V16.5m-13.5-9L12 3m0 0 4.5 4.5M12 3v13.5"/></svg>                Upload</button><buttonid="search_option"type="button"class="relative -ml-px inline-flex items-center rounded-r-md bg-white px-3 py-2 text-sm font-semibold text-gray-900 ring-1 ring-inset ring-gray-300 hover:bg-gray-50 focus:z-10"><svgfill="none"viewBox="0 0 24 24"stroke-width="1.5"stroke="currentColor"class="-ml-0.5 h-5 w-5 text-gray-400"><pathstroke-linecap="round"stroke-linejoin="round"d="m21 21-5.197-5.197m0 0A7.5 7.5 0 1 0 5.196 5.196a7.5 7.5 0 0 0 10.607 10.607Z"/></svg>                Search</button></span></div><!-- Containers --><divclass="flex flex-col lg:flex-row lg:justify-around"><!-- UPLOAD CONTAINER --><divid="upload_container"class="mb-6 lg:px-10"><pclass="mt-2 text-center text-3xl font-bold tracking-tight text-gray-900 sm:text-4xl">                Caption your image!</p><divclass="flex gap-x-4 rounded-xl bg-black/5 px-6 py-2 mt-2"><divclass="flex flex-col justify-center items-center"><svgfill="none"viewBox="0 0 24 24"stroke-width="1.5"stroke="currentColor"class="w-7 h-7 text-indigo-400"><pathstroke-linecap="round"stroke-linejoin="round"d="M9 8.25H7.5a2.25 2.25 0 0 0-2.25 2.25v9a2.25 2.25 0 0 0 2.25 2.25h9a2.25 2.25 0 0 0 2.25-2.25v-9a2.25 2.25 0 0 0-2.25-2.25H15m0-3-3-3m0 0-3 3m3-3V15"/></svg></div><divclass="text-sm leading-2 text-justify flex flex-col justify-center"><pclass="text-slate-700">                      Upload your own image (up to 5MB) and perform image captioning with<ahref="https://elixir-lang.org/"target="_blank"rel="noopener noreferrer"class="font-mono font-medium text-sky-500">                      Elixir</a>                      !</p></div></div><pclass="mt-4 text-center text-sm leading-2 text-gray-400">                Powered with<ahref="https://elixir-lang.org/"target="_blank"rel="noopener noreferrer"class="font-mono font-medium text-sky-500">                HuggingFace🤗</a>                transformer models,                you can run this project locally and perform machine learning tasks with a handful lines of code.</p><!-- File upload section --><divclass="border-gray-900/10 mt-4"><divclass="col-span-full"><divclass="mt-2 flex justify-center rounded-lg border border-dashed border-gray-900/25 px-6 
py-10"phx-drop-target={@uploads.image_list.ref}><divclass="text-center"><!-- Show image preview --><%= if @image_preview_base64 do %><formid="upload-form"phx-change="noop"phx-submit="noop"><labelclass="cursor-pointer"><%= if not @upload_running? do %><.live_file_input upload={@uploads.image_list}/><% end %><imgsrc={@image_preview_base64}/></label></form><% else %><svgclass="mx-auto h-12 w-12 text-gray-300"viewBox="0 0 24 24"fill="currentColor"aria-hidden="true"><pathfill-rule="evenodd"d="M1.5 6a2.25 2.25 0 012.25-2.25h16.5A2.25 2.25 0 0122.5 6v12a2.25 2.25 0 01-2.25 2.25H3.75A2.25 2.25 0 011.5 18V6zM3 16.06V18c0 .414.336.75.75.75h16.5A.75.75 0 0021 18v-1.94l-2.69-2.689a1.5 1.5 0 00-2.12 0l-.88.879.97.97a.75.75 0 11-1.06 1.06l-5.16-5.159a1.5 1.5 0 00-2.12 0L3 16.061zm10.125-7.81a1.125 1.125 0 112.25 0 1.125 1.125 0 01-2.25 0z"clip-rule="evenodd"/></svg><divclass="mt-4 flex text-sm leading-6 text-gray-600"><labelfor="file-upload"class="relative cursor-pointer rounded-md bg-white font-semibold text-indigo-600 focus-within:outline-none focus-within:ring-2 focus-within:ring-indigo-600 focus-within:ring-offset-2 hover:text-indigo-500"><formid="upload-form"phx-change="noop"phx-submit="noop"><labelclass="cursor-pointer"><.live_file_input upload={@uploads.image_list}/> Upload</label></form></label><pclass="pl-1">or drag and drop</p></div><pclass="text-xs leading-5 text-gray-600">PNG, JPG, GIF up to 5MB</p><% end %></div></div></div></div><!-- Show errors --><%= for entry<-@uploads.image_list.entriesdo%><divclass="mt-2"><%= for err<-upload_errors(@uploads.image_list,entry)do%><divclass="rounded-md bg-red-50 p-4 mb-2"><divclass="flex"><divclass="flex-shrink-0"><svgclass="h-5 w-5 text-red-400"viewBox="0 0 20 20"fill="currentColor"aria-hidden="true"><pathfill-rule="evenodd"d="M10 18a8 8 0 100-16 8 8 0 000 16zM8.28 7.22a.75.75 0 00-1.06 1.06L8.94 10l-1.72 1.72a.75.75 0 101.06 1.06L10 11.06l1.72 1.72a.75.75 0 101.06-1.06L11.06 10l1.72-1.72a.75.75 0 00-1.06-1.06L10 8.94 8.28 7.22z"clip-rule="evenodd"/></svg></div><divclass="ml-3"><h3class="text-sm font-medium text-red-800"><%= error_to_string(err) %></h3></div></div></div><% end %></div><% end %><!-- Prediction text --><divclass="flex mt-2 space-x-1.5 items-center"><spanclass="font-bold text-gray-900">Description:</span><!-- conditional Spinner or display caption text or waiting text--><%= if @upload_running? do %><AppWeb.Spinner.spinspin={@upload_running?}/><% else %><%= if @label do %><spanclass="text-gray-700 font-light"><%= @label %></span><% else %><spanclass="text-gray-300 font-light text-justify">Waiting for image input.</span><% end %><% end %></div><!-- Examples --><%= if @display_list? do %><div:if={@display_list?}class="mt-16 flex flex-col"><h3class="text-xl text-center font-bold tracking-tight text-gray-900 lg:text-2xl">                    Examples</h3><divclass="flex flex-row justify-center my-8"><divclass="mx-auto grid max-w-2xl grid-cols-1 gap-x-6 gap-y-10 sm:grid-cols-2"><%= for example_img<-@example_listdo%><!-- Loading skeleton if it is predicting --><%= if example_img.predicting? 
== true do %><divrole="status"class="flex items-center justify-center w-full h-full max-w-sm bg-gray-300 rounded-lg animate-pulse"><imgsrc={~p"/images/spinner.svg"} alt="spinner"/><spanclass="sr-only">Loading...</span></div><% else %><div><imgid={example_img.url}src={example_img.url}class="rounded-2xl object-cover"/><h3class="mt-1 text-lg leading-8 text-gray-900 text-center"><%= example_img.label %></h3></div><% end %><% end %></div></div></div><% end %></div><!-- AUDIO SEMANTIC SEARCH CONTAINER --><divid="search_container"class="hidden mb-6 mx-auto lg:block lg:px-10"><h2class="mt-2 text-3xl font-bold tracking-tight text-gray-900 sm:text-4xl text-center">                ...or search it!</h2><divclass="flex gap-x-4 rounded-xl bg-black/5 px-6 py-2 mt-2"><divclass="flex flex-col justify-center items-center"><svgfill="none"viewBox="0 0 24 24"stroke-width="1.5"stroke="currentColor"class="w-7 h-7 text-indigo-400"><pathstroke-linecap="round"stroke-linejoin="round"d="M12 18.75a6 6 0 0 0 6-6v-1.5m-6 7.5a6 6 0 0 1-6-6v-1.5m6 7.5v3.75m-3.75 0h7.5M12 15.75a3 3 0 0 1-3-3V4.5a3 3 0 1 1 6 0v8.25a3 3 0 0 1-3 3Z"/></svg></div><divclass="text-sm leading-2 text-justify flex flex-col justify-center"><pclass="text-slate-700">                      Record a phrase or some key words.                      We'll detect them and semantically search it in our database of images!</p></div></div><pclass="mt-4 text-center text-sm leading-2 text-gray-400">                After recording your audio, you can listen to it. It will be transcripted automatically into text and appear below.</p><pclass="text-center text-sm leading-2 text-gray-400">                Semantic search will automatically kick in and the resulting image will be shown below.</p><!-- Audio recording button --><formid="audio-upload-form"phx-change="noop"class="mt-8 flex flex-col items-center"><.live_file_input upload={@uploads.speech}/><buttonid="record"class="bg-blue-500 hover:bg-blue-700 text-white font-bold p-4 rounded flex"type="button"phx-hook="Audio"disabled={@mic_off?}><Heroicons.microphoneoutlineclass="w-6 h-6 text-white font-bold group-active:animate-pulse"/><spanid="text">Record</span></button></form><!-- Audio preview --><pclass="flex flex-col items-center mt-6"><audioid="audio"controls></audio></p><!-- Audio transcription --><divclass="flex mt-2 space-x-1.5 items-center"><spanclass="font-bold text-gray-900">Transcription:</span><%= if @audio_running? do %><AppWeb.Spinner.spinspin={@audio_running?}/><% else %><%= if @transcription do %><spanid="output"class="text-gray-700 font-light"><%= @transcription %></span><% else %><spanclass="text-gray-300 font-light text-justify">Waiting for audio input.</span><% end %><% end %></div><!-- Semantic search result --><div:if={@audio_search_result}><divclass="border-gray-900/10"><divclass="mt-2 flex justify-center rounded-lg border border-dashed border-gray-900/25 px-6 py-10"><imgsrc={@audio_search_result.url}alt="found_image"/></div></div><spanclass="text-gray-700 font-light"><%= @audio_search_result.description %></span></div></div></div></div></div></div></div></div>

That may look like a lot, but we've made just a handful of changes!

  • we've restructured our HTML so it's easier to read. There are just a few key elements: the pill on top of the page, a toggle we've added (to switch between the upload section and the search section) and two containers holding the upload and search sections, respectively. The code for each section is practically untouched.

  • we've made minor styling changes to each section. In the upload section, we added a small callout. In the search section, we added a small callout as well, plus the description of the image that is found after the audio transcription occurs.

And that's it! What's important is that only one section is shown at a time on mobile devices, while both sections are shown on desktop devices (over 1024 pixels, the lg breakpoint of TailwindCSS). On desktop devices, the toggle button should disappear.

To accomplish this, we need to add a bit of JavaScript and CSS magic. 🪄 Head over to assets/js/app.js and add the following code.

document.getElementById("upload_option").addEventListener("click",function(){document.getElementById("upload_container").style.display="block";document.getElementById("search_container").style.display="none";document.getElementById("upload_option").classList.replace("bg-white","bg-blue-500");document.getElementById("upload_option").classList.replace("text-gray-900","text-white");document.getElementById("upload_option").classList.replace("hover:bg-gray-50","hover:bg-blue-600");document.getElementById("upload_option").getElementsByTagName("svg")[0].classList.replace("text-gray-400","text-white");document.getElementById("search_option").classList.replace("bg-blue-500","bg-white");document.getElementById("search_option").classList.replace("text-white","text-gray-900");document.getElementById("search_option").classList.replace("hover:bg-blue-600","hover:bg-gray-50");document.getElementById("search_option").getElementsByTagName("svg")[0].classList.replace("text-white","text-gray-400");});document.getElementById("search_option").addEventListener("click",function(){document.getElementById("upload_container").style.display="none";document.getElementById("search_container").style.display="block";document.getElementById("search_option").classList.replace("bg-white","bg-blue-500");document.getElementById("search_option").classList.replace("text-gray-900","text-white");document.getElementById("search_option").classList.replace("hover:bg-gray-50","hover:bg-blue-600");document.getElementById("search_option").getElementsByTagName("svg")[0].classList.replace("text-gray-400","text-white");document.getElementById("upload_option").classList.replace("bg-blue-500","bg-white");document.getElementById("upload_option").classList.replace("text-white","text-gray-900");document.getElementById("upload_option").classList.replace("hover:bg-blue-600","hover:bg-gray-50");document.getElementById("upload_option").getElementsByTagName("svg")[0].classList.replace("text-white","text-gray-400");});

The code is self-explanatory. We change the styles of the toggle buttons according to the button that is clicked.

The other thing we need to ensure is that both sections are shown on desktop devices, regardless of which section is currently selected. Luckily, we can override styles by adding this piece of code to assets/css/app.css.

@media (min-width: 1024px) {
  #upload_container,
  #search_container {
    display: block !important; /* Override any inline styles */
  }
}

And that's it! We can see our slightly refactored UI in all of its glory by running mix phx.server!

Please star the repo! ⭐️

If you find this package/repo useful, please star it on GitHub, so that we know! ⭐

Thank you! 🙏

