Movatterモバイル変換


[0]ホーム

URL:


Skip to content

Navigation Menu

Search code, repositories, users, issues, pull requests...

Provide feedback

We read every piece of feedback, and take your input very seriously.

Saved searches

Use saved searches to filter your results more quickly

Sign up

🗯️Illustrate stories in real time using nothing but your voice! Top 3 overall at Hack Harvard 2018

NotificationsYou must be signed in to change notification settings

kumailn/comical

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

46 Commits
 
 
 
 
 
 
 
 
 
 
 
 

Repository files navigation

InspirationWhile struggling to settle on a project idea, our team went on a walk and found ourselves at the Harvard Art Museum. It was here that we came up with the idea to create comical, inspired by this year's Bold Strokes theme. With comical, anyone can express themself creatively and make incredible comics in minutes, using nothing but their voice.

What it doescomical acts as a chatbot that prompts the user to tell a story using a Google Home (or any other Google Assistant powered device). comical converts user voice or text input into a relevant comic style image, complete with narrative text boxes and character speech bubbles. Machine Learning and Sentiment Analysis are applied to the input in order to extract key phrases and determine the tone. This information is used to find appropriate images to display in the comic.

The images are "cartoonified" to provide a comic book feel before comical applies facial recognition software to determine the appropriate location for speech bubbles and text boxes. Once this process is complete, the finalized image is added to a comic book panel display on a web application in real time so that the story is created in front of the user's eyes as they speak.

How we built itOur team built this application using a Vue front-end and a Node.js back-end, using the following external services:

Azure

Sentiment Analysis to determine the tone/emotions of user inputFace API to map the location of faces within imagesKey Phrase Extraction to break user input down into its critical componentsBing Image Search API to find images from extracted keywordsGoogle Cloud Platform

Firebase used to deploy Node.js functionsSession data storage in FirestoreDialogflow used to create the conversation modelActions-on-Google v2 Library for conversational fulfilmentChallenges we ran intoThe biggest issue our team encountered was in the criteria and classification of the image being selected for our comics by the Bing Image Search API. For example, if the user enters "My sister was crying", the key phrase extraction would occasionally feel that the most relevant information was the word "sister"and would generate an image of a smiling woman. This was the reason our team implemented sentiment analysis, in order to understand the context of the input. Our team also lacked experience in many of the Azure/Google Cloud platform features we needed to implement.

Accomplishments that we're proud ofWe're proud of having ideated, designed, developed and tested an ambitious idea in just 36 hours! Our team formed after the start of the Hackathon, and our members came from a variety of countries and backgrounds. We feel our biggest accomplishments are..:

Learning multiple new tools within a short period of timeCreating a project that communicates between multiple endpoints, API's and databasesSleeping ~3 hours per person over a 3 day periodWhat we learnedWe would struggle to fit everything we learned over the past 3 days onto one page, but we feel the biggest lessons were that...:

To always focus on the minimum viable product when time is limitedIt's more fun to work on projects you're excited aboutM.I.T is a further walk than it seems on a mapWhat's next for comicalGiven more time, our team would have improved the training models for our ML products, and would have better utilized sentiment analysis. We also plan on creating better view control animations for our Vue front end, so that the screen zooms and shifts through the panels as the user speaks.


[8]ページ先頭

©2009-2025 Movatter.jp