jakowenko/phrame

AI-powered digital picture frame. Generate captivating and unique art from spoken conversations.

License

MIT license

Phrame

Phrame generates captivating and unique art by listening to conversations around it, transforming spoken words and emotions into visually stunning masterpieces. Unleash your creativity and transform the soundscape around you.

How

Phrame relies on the SpeechRecognition interface of the Web Speech API to transform audio into text. OpenAI condenses this text into a short summary. The summary is then sent to the configured generative AI image services, and the resulting images are saved.
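As a rough illustration of the last step, the sketch below pairs one summary with every style of every enabled image service. The `plan_image_jobs` helper and the way the summary and style are joined into a prompt are assumptions made for illustration; only the `image.enable` and `image.style` keys come from the configuration documented in this README.

```python
from typing import Dict, List, Tuple


def plan_image_jobs(summary: str, services: Dict[str, dict]) -> List[Tuple[str, str]]:
    """Return (service, prompt) pairs for every enabled service and style.

    Assumption: a prompt is the summary followed by the style name.
    Phrame's actual prompt construction may differ.
    """
    jobs = []
    for name, settings in services.items():
        image = settings.get("image", {})
        if not image.get("enable", True):
            continue  # service is configured but image generation is disabled
        for style in image.get("style", []):
            jobs.append((name, f"{summary}, {style}"))
    return jobs


services = {
    "openai": {"image": {"enable": True, "style": ["cinematic"]}},
    "stabilityai": {"image": {"enable": False, "style": ["cinematic"]}},
}
jobs = plan_image_jobs("A quiet harbor at sunrise", services)
print(jobs)  # [('openai', 'A quiet harbor at sunrise, cinematic')]
```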

Donations

If you would like to make a donation to support development, please use GitHub Sponsors.

Features

Create unique AI-generated artwork from spoken conversations
Automatic, manual or voice-activated summary generation for on-demand art
User-friendly UI, optimized for both desktop and mobile
Real-time updates and remote control via WebSockets
Integrated config editor for customization
Support for multiple generative AI image services
Voice commands for image generation and navigation
Manage your gallery effortlessly: browse, favorite, delete images, and navigate using keyboard shortcuts
Access and manage logs for efficient troubleshooting

Supported Architecture

amd64
arm64

Supported AIs

OpenAI
Midjourney*
Stability AI
DeepAI
Dream
Leonardo.Ai

*Midjourney currently uses an unofficial third-party package. Use this integration at your own risk.

Voice Commands

Activate the microphone to interact with Phrame using the following voice commands.

Command	Action
`Hey Phrame`	Wake word to generate images on demand
`Next Image`	Advance to next image
`Previous Image`	Go back to the previous image
`Last Image`	Go back to the previous image
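The table can be read as a phrase-to-action lookup. The sketch below shows one way a transcribed phrase might be matched against it; the action names and the substring matching are assumptions for illustration, not Phrame's actual matching logic.

```python
from typing import Optional

# Phrases from the table above mapped to hypothetical action names.
COMMANDS = {
    "hey phrame": "generate",
    "next image": "next",
    "previous image": "previous",
    "last image": "previous",
}


def match_command(transcript: str) -> Optional[str]:
    """Return the action for the first known phrase found in the transcript."""
    text = " ".join(transcript.lower().split())  # normalize case and whitespace
    for phrase, action in COMMANDS.items():
        if phrase in text:
            return action
    return None


print(match_command("Hey Phrame"))              # generate
print(match_command("ok, Next  image please"))  # next
print(match_command("what a nice day"))         # None
```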

UI

Phrame has a responsive UI available at localhost:3000.

Path	Name
`/`	Controller
`/phrame?mic`	Phrame with microphone support
`/phrame`	Phrame without microphone support
`/gallery`	Gallery
`/config`	Config
`/logs`	Logs

Privacy

Speech recognition in Phrame is managed by the browser, so how audio is handled depends on the browser you use. For instance, Chrome sends the audio to Google's servers for transcription. Review the privacy policy of your chosen browser to understand how speech data is handled.

Once transcribed, Phrame saves these transcriptions into a local database. They are then processed by OpenAI to generate a summary, and immediately after, the original transcriptions are deleted. This summary is used in conjunction with the configured generative AI image services and the final pieces of art are saved locally.

Phrame does not retain your transcripts or transmit them beyond the local device, except for the brief period required to generate the summary through OpenAI. No personal data is used, stored, or transmitted for any other purpose.
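The save, summarize, and delete cycle described above can be sketched with an in-memory SQLite database. The `transcript` table, its columns, and the `summarize` stand-in are hypothetical; Phrame's real schema and its OpenAI call are not shown in this README.

```python
import sqlite3


def summarize(texts):
    """Stand-in for the OpenAI call; a real summary would come from the API."""
    return " / ".join(texts)


def process_transcripts(db: sqlite3.Connection) -> str:
    """Summarize the stored transcripts, then delete the ones that were read."""
    rows = db.execute("SELECT id, text FROM transcript ORDER BY id").fetchall()
    summary = summarize([text for _, text in rows])
    with db:  # commits on success, rolls back if the delete fails
        db.executemany("DELETE FROM transcript WHERE id = ?", [(i,) for i, _ in rows])
    return summary


db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE transcript (id INTEGER PRIMARY KEY, text TEXT)")
db.executemany("INSERT INTO transcript (text) VALUES (?)", [("the garden",), ("rain later",)])
db.commit()

summary = process_transcripts(db)
remaining = db.execute("SELECT COUNT(*) FROM transcript").fetchone()[0]
print(summary)    # the garden / rain later
print(remaining)  # 0
```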

Usage

Phrame operates as a single Docker container and is easily accessible using any modern browser, even without a microphone.

To take advantage of the speech recognition feature, a compatible browser and microphone are required. At this time, Chrome and Safari are the only browsers that support speech recognition.

Artwork within Phrame is displayed according to the image.order value. The latest summary and any favorite images are seamlessly merged, providing an evolving canvas of unique AI-generated art. As new images are created, they are instantly displayed by Phrame.
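One way to picture the merge is sketched below. The `build_playlist` helper, the `id` and `created` fields, and the reading of `recent` as "newest first" are assumptions for illustration; only the two `image.order` values (`random`, `recent`) come from the configuration.

```python
import random
from typing import Dict, List, Optional


def build_playlist(latest: List[Dict], favorites: List[Dict], order: str = "recent",
                   rng: Optional[random.Random] = None) -> List[Dict]:
    """Merge the latest images with favorites, drop duplicates, and order them."""
    merged = {img["id"]: img for img in favorites + latest}  # de-duplicate by id
    images = list(merged.values())
    if order == "random":
        (rng or random).shuffle(images)
    else:  # "recent": assumed to mean newest first
        images.sort(key=lambda img: img["created"], reverse=True)
    return images


latest = [{"id": 3, "created": 30}, {"id": 2, "created": 20}]
favorites = [{"id": 1, "created": 10}, {"id": 2, "created": 20}]
print([img["id"] for img in build_playlist(latest, favorites)])  # [3, 2, 1]
```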

Quick Start

1. Start Phrame
2. Go to localhost:3000/config
   1. Add your OpenAI API key and save
   2. Verify OpenAI shows as configured with a green circle
3. In a new window, go to localhost:3000/phrame?mic and follow the on-screen instructions
4. Go to localhost:3000 and verify the microphone and speech recognition are working

Docker Run

docker run -d --restart=unless-stopped --name=phrame -v phrame:/.storage -p 3000:3000 jakowenko/phrame

Docker Compose

version: '3.9'

volumes:
  phrame:

services:
  phrame:
    container_name: phrame
    image: jakowenko/phrame
    restart: unless-stopped
    volumes:
      - phrame:/.storage
    ports:
      - 3000:3000

Launch on Boot

Modern browsers require a user click to access the microphone. To automatically start Phrame on boot, you can use the following script. It requires ydotool or xdotool (depending on your display server) to be installed, which lets the script simulate keyboard and mouse input.

The script waits 15 seconds for the Docker Engine and Phrame to start before launching Chrome; adjust the delay by changing the sleep value. After launching the browser, it waits another 5 seconds, then sends a click to grant microphone access and start speech recognition.

Depending on your system, you may need to adjust the path to Chrome.

ydotool

#!/bin/bash
export YDOTOOL_SOCKET=/tmp/.ydotool_socket

# wait for the desktop and docker to be fully loaded
sleep 15s

# launch chrome in kiosk mode for microphone access
/usr/bin/google-chrome-stable --kiosk --no-first-run --hide-crash-restore-bubble --password-store=basic "http://localhost:3000/phrame?mic" &

# wait for chrome and phrame to load
sleep 5s

# move the mouse to the coordinates and click the left mouse button
ydotool mousemove --absolute 0 0
ydotool click 0xC0

xdotool

#!/bin/bash

# wait for the desktop and docker to be fully loaded
sleep 15s

# launch chrome in kiosk mode for microphone access
/usr/bin/google-chrome-stable --kiosk --no-first-run --hide-crash-restore-bubble --password-store=basic "http://localhost:3000/phrame?mic" &

# wait for chrome and phrame to load
sleep 5s

# move the mouse to the coordinates and click the left mouse button
xdotool mousemove --sync 0 0 click 1
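A fixed delay can be too short on a slow boot and needlessly long on a fast one. As an alternative, the sketch below polls until a probe succeeds or a timeout expires. It is not part of Phrame; `http_ok` and `wait_until_ready` are hypothetical helpers, and the sketch assumes that a successful HTTP response from http://localhost:3000 means Phrame is ready.

```python
import time
import urllib.error
import urllib.request
from typing import Callable


def http_ok(url: str) -> bool:
    """Return True if the URL answers with a 2xx or 3xx status."""
    try:
        with urllib.request.urlopen(url, timeout=2) as response:
            return 200 <= response.status < 400
    except (urllib.error.URLError, OSError):
        return False


def wait_until_ready(probe: Callable[[], bool], timeout: float = 60.0,
                     interval: float = 1.0) -> bool:
    """Call probe() until it returns True or the timeout expires."""
    deadline = time.monotonic() + timeout
    while True:
        if probe():
            return True
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval)


# Demonstration with a fake probe that succeeds on the third attempt.
attempts = iter([False, False, True])
print(wait_until_ready(lambda: next(attempts), timeout=5, interval=0.01))  # True
```

In a boot script, the probe would be `lambda: http_ok("http://localhost:3000")`, with Chrome launched only once the call returns True.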

Configuration

Configurable options are saved to /.storage/config/config.yml and are editable via the UI at localhost:3000/config.

Note: Default values do not need to be specified in configuration unless they need to be overwritten.
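The note above implies that user settings are layered over built-in defaults. A minimal sketch of that idea, assuming a recursive merge (Phrame's actual merge logic is not shown in this README, and `merge_config` is a hypothetical helper):

```python
import copy


def merge_config(defaults: dict, overrides: dict) -> dict:
    """Recursively overlay user overrides on a copy of the defaults."""
    merged = copy.deepcopy(defaults)
    for key, value in overrides.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            merged[key] = merge_config(merged[key], value)
        else:
            merged[key] = copy.deepcopy(value)
    return merged


defaults = {"image": {"interval": 60, "order": "recent"}, "logs": {"level": "info"}}
user = {"image": {"order": "random"}}
print(merge_config(defaults, user))
# {'image': {'interval': 60, 'order': 'random'}, 'logs': {'level': 'info'}}
```

Only `image.order` is specified by the user here; `image.interval` and `logs.level` keep their defaults.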

`image`

# image settings (default: shown below)
image:
  # time in seconds between image transitions
  interval: 60
  # order of images to display: random, recent
  order: recent

`autogen`

Images can be automatically generated by creating random summaries. This can be scheduled with a cron expression. Keywords can be passed to help guide the summary.

# autogen settings (default: shown below)
autogen:
  # schedule as a cron expression for generating random summaries (at minutes 15 and 45 of every hour)
  cron: '15,45 * * * *'
  prompt: Provide a random short description to describe a picture. It should be no more than one or two sentences. If keywords are provided select a couple at random to help guide the description.
  # keywords to guide the summary
  keywords: []
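To make the schedule concrete, the sketch below expands the minute field of a cron expression into the minutes of the hour it matches. `minutes_for` is a hypothetical helper that handles only `*`, `*/n`, ranges, and comma-separated lists; it is not the cron parser Phrame uses.

```python
def minutes_for(field: str) -> list:
    """Expand the minute field of a cron expression into the matching minutes.

    Supports '*', '*/n', 'a-b', and comma-separated combinations of these.
    """
    minutes = set()
    for part in field.split(","):
        base, _, step = part.partition("/")
        step = int(step) if step else 1
        if base == "*":
            start, end = 0, 59
        elif "-" in base:
            start, end = (int(x) for x in base.split("-"))
        else:
            start = end = int(base)
        minutes.update(range(start, end + 1, step))
    return sorted(minutes)


print(minutes_for("15,45 * * * *".split()[0]))  # [15, 45]
print(minutes_for("*/30 * * * *".split()[0]))   # [0, 30]
```

The default `15,45 * * * *` therefore runs twice an hour, at :15 and :45, while an expression such as `*/30 * * * *` runs at :00 and :30.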

`transcript`

Images are generated by processing transcripts on a schedule defined by a cron expression. On each run, all transcripts recorded within the configured number of minutes (30 by default) are summarized by OpenAI using openai.summary.prompt.

# transcript settings (default: shown below)
transcript:
  # schedule as a cron expression for processing transcripts (at every 30th minute)
  cron: '*/30 * * * *'
  # how many minutes of files to look back for (process the last 30 minutes of transcripts)
  minutes: 30
  # minimum number of transcripts required to process
  minimum: 5
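The interaction between `minutes` and `minimum` can be sketched as follows. `select_transcripts` is a hypothetical helper, and the sketch assumes that a run is skipped entirely when fewer than `minimum` transcripts fall inside the window.

```python
from datetime import datetime, timedelta, timezone
from typing import List, Tuple

Transcript = Tuple[datetime, str]  # (time recorded, text)


def select_transcripts(transcripts: List[Transcript], now: datetime,
                       minutes: int = 30, minimum: int = 5) -> List[str]:
    """Return texts recorded in the last `minutes`, or [] if fewer than `minimum`."""
    cutoff = now - timedelta(minutes=minutes)
    recent = [text for when, text in transcripts if when >= cutoff]
    return recent if len(recent) >= minimum else []


now = datetime(2023, 6, 22, 12, 0, tzinfo=timezone.utc)
transcripts = [(now - timedelta(minutes=m), f"line {m}") for m in (1, 5, 10, 20, 29, 45)]

# Five transcripts fall inside the 30-minute window; the 45-minute-old one is excluded.
print(select_transcripts(transcripts, now))
# Only three fall inside a 10-minute window, which is below the minimum of five.
print(select_transcripts(transcripts, now, minutes=10))  # []
```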

`openai`

To configure OpenAI, obtain an API key and add it to your config as shown here. All other default settings found below will also be applied. You can overwrite the settings by updating your config.yml file.

# openai settings (default: shown below)
openai:
  # api key
  key:
  summary:
    # model name (https://platform.openai.com/docs/models/overview)
    model: gpt-3.5-turbo
    # prompt used to generate a summary from transcripts
    prompt: You will be given a string of random conversations and need to pull out a few keywords and topics that were talked about. You will then turn this into a short description to describe a picture. It should be no more than two or three sentences.
    # prompt used to generate a random summary
    random: Provide a random short description to describe a picture. It should be no more than two or three sentences.
  image:
    # enable or disable image generation
    enable: true
    # trim letterbox and pillarbox images
    trim: false
    # size of the generated images: 256x256, 512x512, or 1024x1024
    size: 512x512
    # number of images to generate for each style
    n: 1
    # used with summary to guide the image model towards a particular style
    style:
      - cinematic

`midjourney`

Midjourney currently uses an unofficial third-party package. Use this integration at your own risk.

To configure Midjourney, you will need the following:

Discord Server ID and Channel ID
- Obtain by going to your Discord channel in a browser; the URL should follow this pattern: https://discord.com/channels/SERVER_ID/CHANNEL_ID
Invite the Midjourney bot to your server
While not necessary, it is also recommended to use a Hugging Face token for security prompts
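The two IDs can be pulled out of the channel URL mechanically. The sketch below follows the URL pattern given above; `parse_channel_url` is a hypothetical helper and the IDs in the example are made up.

```python
import re
from typing import Tuple

# Matches https://discord.com/channels/SERVER_ID/CHANNEL_ID with numeric IDs.
CHANNEL_URL = re.compile(r"^https://discord\.com/channels/(\d+)/(\d+)/?$")


def parse_channel_url(url: str) -> Tuple[str, str]:
    """Extract (server_id, channel_id) from a Discord channel URL."""
    match = CHANNEL_URL.match(url.strip())
    if match is None:
        raise ValueError(f"not a Discord channel URL: {url!r}")
    return match.group(1), match.group(2)


# Made-up IDs for illustration only.
server_id, channel_id = parse_channel_url("https://discord.com/channels/1234567890/9876543210")
print(f"server_id: {server_id}")    # server_id: 1234567890
print(f"channel_id: {channel_id}")  # channel_id: 9876543210
```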

All other default settings found below will also be applied. You can overwrite the settings by updating your config.yml file.

# midjourney settings (default: shown below)
midjourney:
  # discord server id
  server_id:
  # discord channel id
  channel_id:
  # discord token (https://linuxhint.com/get-discord-token)
  token:
  # hugging face token (https://huggingface.co/docs/hub/security-tokens)
  hugging_face_token:
  image:
    # enable or disable image generation
    enable: true
    # trim letterbox and pillarbox images
    trim: false
    # options added to a prompt that change how an image generates (https://docs.midjourney.com/docs/parameter-list)
    parameters: --chaos 80 --no text
    # upscale options (false, random, 1,2,3,4)
    upscale: random
    # used with summary to guide the image model towards a particular style
    style:
      - cinematic

`stabilityai`

To configure Stability AI, obtain an API key and add it to your config as shown here. All other default settings found below will also be applied. You can overwrite the settings by updating your config.yml file.

# stabilityai settings (default: shown below)
stabilityai:
  # api key
  key:
  image:
    # enable or disable image generation
    enable: true
    # trim letterbox and pillarbox images
    trim: false
    # number of seconds before the request times out and is aborted
    timeout: 30
    # engine used for image generation
    engine_id: stable-diffusion-512-v2-1
    # width of the image in pixels, must be in increments of 64
    width: 512
    # height of the image in pixels, must be in increments of 64
    height: 512
    # how strictly the diffusion process adheres to the prompt text (higher values keep your image closer to your prompt)
    cfg_scale: 7
    # number of images to generate for each style
    samples: 1
    # number of diffusion steps to run
    steps: 50
    # image model style (https://platform.stability.ai/rest-api#tag/v1generation/operation/textToImage)
    style:
      - cinematic

`deepai`

To configure DeepAI, obtain an API key and add it to your config as shown here. All other default settings found below will also be applied. You can overwrite the settings by updating your config.yml file.

# deepai settings (default: shown below)
deepai:
  # api key
  key:
  image:
    # enable or disable image generation
    enable: true
    # trim letterbox and pillarbox images
    trim: false
    # number of seconds before the request times out and is aborted
    timeout: 30
    # 1 returns one image and 2 returns four images
    grid_size: 1
    # width of the image in pixels, between 128 and 1536
    width: 512
    # height of the image in pixels, between 128 and 1536
    height: 512
    # indicate what you want to be removed from the image
    negative_prompt:
    # image model style (https://deepai.org/machine-learning-model/text2img)
    style:
      - text2img

`dream`

To configure Dream, obtain an API key and add it to your config as shown here. All other default settings found below will also be applied. You can overwrite the settings by updating your config.yml file.

# dream settings (default: shown below)
dream:
  # api key
  key:
  image:
    # enable or disable image generation
    enable: true
    # trim letterbox and pillarbox images
    trim: false
    # number of seconds before the request times out and is aborted
    timeout: 30
    # width of the image in pixels
    width: 512
    # height of the image in pixels
    height: 512
    # image model style (https://api.luan.tools/api/styles)
    style:
      - buliojourney v2

`leonardoai`

To configure Leonardo.Ai, obtain an API key and add it to your config as shown here. All other default settings found below will also be applied. You can overwrite the settings by updating your config.yml file.

# leonardoai settings (default: shown below)
leonardoai:
  # api key
  key:
  image:
    # enable or disable image generation
    enable: true
    # trim letterbox and pillarbox images
    trim: false
    # number of seconds before the request times out and is aborted
    timeout: 30
    # indicate what you want to be removed from the image
    negative_prompt:
    # model id used for the image generation, if not provided uses sd_version to determine the version of stable diffusion to use
    model_id: 6bef9f1b-29cb-40c7-b9df-32b51c1f67d3
    # base version of stable diffusion to use if not using a custom model
    sd_version: v2
    # number of images to generate for each style
    num_images: 1
    # width of the image in pixels, must be between 32 and 1024 and be a multiple of 8
    width: 512
    # height of the image in pixels, must be between 32 and 1024 and be a multiple of 8
    height: 512
    # number of inference steps to use for the generation, must be between 30 and 60
    num_inference_steps:
    # how strongly the generation should reflect the prompt, must be between 1 and 20
    guidance_scale: 7
    # scheduler to generate images with
    scheduler:
    # style to generate images with
    preset_style: LEONARDO
    # whether the generated images should tile on all axes
    tiling:
    # whether the generated images should show in the community feed
    public:
    # enable to use prompt magic
    prompt_magic:
    # used with summary to guide the image model towards a particular style
    style:
      - cinematic
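Several of these settings carry numeric constraints in their comments. The sketch below checks an `image` block against the documented ranges before it is saved; `validate_leonardo_image` is a hypothetical helper, and whether Phrame or the Leonardo.Ai API performs similar validation is not stated in this README.

```python
def validate_leonardo_image(image: dict) -> list:
    """Return a list of problems found in a leonardoai image block."""
    problems = []
    for side in ("width", "height"):
        value = image.get(side, 512)  # 512 is the documented default
        if not (32 <= value <= 1024) or value % 8 != 0:
            problems.append(f"{side} must be between 32 and 1024 and a multiple of 8")
    steps = image.get("num_inference_steps")
    if steps is not None and not (30 <= steps <= 60):
        problems.append("num_inference_steps must be between 30 and 60")
    scale = image.get("guidance_scale", 7)  # 7 is the documented default
    if not (1 <= scale <= 20):
        problems.append("guidance_scale must be between 1 and 20")
    return problems


print(validate_leonardo_image({"width": 512, "height": 768}))  # []
for problem in validate_leonardo_image({"width": 500, "num_inference_steps": 10}):
    print(problem)
```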

`time`

# time settings (default: shown below)
time:
  # defaults to iso 8601 format with support for token-based formatting
  # https://github.com/moment/luxon/blob/master/docs/formatting.md#table-of-tokens
  format:
  # time zone used in logs
  timezone: UTC

`logs`

# log settings (default: shown below)
logs:
  # options: silent, error, warn, info, http, verbose, debug, silly
  level: info

`telemetry`

# telemetry settings (default: shown below)
# self hosted version of plausible.io
# 100% anonymous, used to help improve project
# no cookies and fully compliant with GDPR, CCPA and PECR
telemetry: true

Development

Run Local Services

Service	Command	URL
UI	`npm run local:frontend`	`localhost:8080`
API	`npm run local:api`	`localhost:3000`

Build Local Docker Image

./.develop/build

About

Docker Hub: hub.docker.com/r/jakowenko/phrame

Releases: 3 (latest: v1.1.0, Jun 22, 2023)
