SYSTEM AND METHOD FOR ACQUIRING OCULAR MEDIA
CROSS-REFERENCE TO RELATED APPLICATIONS
This application claims the benefit of US Patent Application Serial No. 16/985,014, entitled “System And Method For A Portable Eye Examination Camera”, filed August 4, 2020, and US Patent Application Serial No. 17/603,579, entitled “Device Navigation And Capture Of Media Data”, filed April 14, 2020, and incorporates the content of the referenced applications herein by reference in their entireties.
TECHNICAL FIELD
The present invention relates to a novel method of capturing ocular media of a patient of sufficient quality to permit medical diagnosis. It involves a system capable of capturing media, in combination with any variety and combination of sensors and feedback mechanisms and devices that provide feedback to a user to guide them to reliably collect high-quality media. It also involves a process whereby a user and/or patient is guided to capture the desired ocular media, and that media is made available for review by a physician or certified professional.
BACKGROUND OF THE INVENTION
[0001] Only 60% of the 30 million known diabetics in the US actually receive their recommended annual retinal eye screening due to inconvenience, cost, and lack of specialist availability. If a diabetes-related eye disease is detected late, 40-45% of diabetics may lose their eyesight, despite these diseases being treatable. Early detection and treatment, while the patient is still asymptomatic, can prevent up to 98% of visual loss due to diabetes. Retinal eye exams can also diagnose a plethora of other eye diseases, such as glaucoma, age-related macular degeneration, retinal detachment, and cataracts, as well as whole-body diseases.
[0002] Patients with diabetes are recommended to visit an eye care specialist, such as an optometrist or ophthalmologist, every year to get their retina examined; however, very few of them actually get the screening done. The ability to screen for retinal diseases at the primary care site (emergency room, primary care, corporate wellness, the Veterans Administration, pharmacies, and eventually at home) would allow for large cost savings by detecting eye diseases early and implementing any interventions, thereby alleviating dependence on higher-cost specialists for basic vision diagnostics. Retinal examinations can be separated into a data acquisition portion, which is performed at the patient site, and a data interpretation portion, which can be performed, for example, by remote specialists with or without the aid of machine learning (artificial intelligence).
[0003] Current solutions include the use of large and expensive fundus cameras which require specialized training and are currently limited to eye specialists' offices, such as optometry and ophthalmology offices.
[0004] Therefore, a need exists in the field for a device that is simple to use outside of the eye specialists’ offices. Herein, we describe a method in which media (i.e., an image) of the patient's eye, such as the retina, region outside of the eye, etc., is captured with the use of a device. The device is operated by a local user who receives guidance from a system that is able to view and interpret the data captured in real-time. A user is guided by cues or signals, such as auditory instructions, visual indicators, and/or haptic feedback. In some cases, the algorithm initiates the capture of the media, whereas in other cases, a user initiates the capture. Once the media has been captured, the algorithm may determine if the media is suitable, and the media may be stored locally or in cloud services where it can then be retrieved for interpretation at a later point in time.
[0005] In this invention, we disclose a method and a system that are able to guide a user in real-time on how to position the device in order to capture good ocular media. The method and system may be able to assess the suitability of the media captured, and provide additional steps or guidance to a user to capture better media.
[0006] BRIEF DESCRIPTION OF THE DRAWINGS
[0007] FIG. 1 is a block diagram illustrating an overview of a guidance system;
[0008] FIG. 2 is a block diagram showing how multiple inputs can be used to train an instruction model (the Logic Algorithm used in FIG. 5);
[0009] FIG. 3 is a block diagram showing possible data flow for determining the best human interpretable instructions to give to a user;
[0010] FIG. 4 is a high-level view of an algorithm that determines the best interpretable instructions to give a user;
[0011] FIG. 5 is a detailed view of an algorithm that determines the best interpretable instructions to give a user;
[0012] FIG. 6 is an exemplary method of training an object detection model;
[0013] FIGs. 7A and 7B are an exemplary method of training a classification model and the prediction process of a classification model;
[0014] FIG. 8 is a chart depicting an example of how to divide up the screening process into various phases;
[0015] FIG. 9 is an example flowchart of media capture;
[0016] FIG. 10 is an example breakup of instructions to a user and/or patient that can be given by the guidance system;
[0017] FIGs. 11A-11E are an example showing how to find the focus region by using differential “cliffs” for each column/row;
[0018] FIG. 12 is a diagram showing “orders” of blood vessels. The “1st order” blood vessel is the arcade (labeled with a “1”). 2nd order blood vessels are the 2nd branch from a blood vessel (labeled with a “2”). Blood vessels are considered to have 3rd order clarity if you can see 3 branches from a single blood vessel (labeled with a “3”); and
[0019] FIG. 13 is an example flowchart showing how instructions can be given to the user to guide the user to achieve predefined criteria.
DETAILED DESCRIPTION OF THE INVENTION
[0020] The term “Ocular” as used in this specification can refer to the retina, the fundus, optic disc, macula, iris, pupil, or other eye-related anatomical components. As used in this specification, the term “media” refers to an image frame, photo, video, audio, signals, sensor data, or other digital information. A patient is a person or animal whose ocular features are being examined. A user is the person who is handling and guiding the device relative to the patient. The user may be the patient, or the user may be another person handling the device. A specialist refers to someone or something that may be able to read and diagnose eye and retinal images and/or media, such as an optometrist, ophthalmologist, retinal specialist, or algorithm.
[0021] The term “hardware and software” as used in this specification encompasses firmware.
[0022] The term “human interpretable instructions” can refer to synthesized speech instructions, pre-recorded speech instructions, auditory beeps, flashing lights, vibrations or the like.
[0023] The term “3rd order clarity” applies to a blood vessel if you can see 3 branches from a single blood vessel.
[0024] The device refers to a media-capturing (i.e., image capturing) device, such as an ophthalmoscope, retinal camera, or fundus camera. The device may include a communication-enabled device and an optical hardware device. The communication-enabled device is a device, such as a smartphone, that has access to modules, such as a camera, a speaker, a microphone, a vibration or haptic engine, and is capable of transmitting information through various wireless methods such as WiFi, Bluetooth, cellular/Long Term Evolution (LTE), or wired methods such as a Universal Serial Bus (USB), Universal Asynchronous Receiver/Transmitter (UART), Universal Synchronous/Asynchronous Receiver/Transmitter (USART), etc. The optical hardware device is a device that consists of one or more optical lenses configured in such a manner as to enable the camera to focus and capture images and videos. A camera may be located on the communication-enabled device, or as a separate module that communicates with the device. In some configurations, the communication-enabled device and optical hardware device may be the same device.
[0025] The term storage media refers to any form of temporary or permanent electronic or non-electronic storage of data e.g. RAM, ROM, hard disc, firmware, EPROM, SSD, etc.
[0026] A user refers to the person holding and operating the device. The patient is the person or animal of whom we wish to capture ocular media. A user and the patient may be two separate living beings, or may be the same person.
[0027] THE EXAMINATION
[0028] To provide feedback to a user regarding how to move, position, or operate the device such that ocular media is collected, a plurality of sensors are attached to the device, and computational capabilities receive input from the sensors and compute instructions to a user on how to move or position the device or how the patient should move or position the patient’s body or body parts. Below we describe various embodiments regarding the device, how input from the sensor(s) will be evaluated to provide feedback, and the means of providing feedback to a user and/or the patient.
[0029] The device can be any device with sufficient optical capabilities for imaging the eye and/or retina, such as devices in conformance with ISO 10940 or its equivalents, which contain one or more sensors and computational capabilities to analyze input from the sensors. The device may be a hand-held, table-top, or kiosk ophthalmic camera. In another embodiment, the device could be attached to a harness that the patient wears on their chest, hips, back, arms, head, neck, or shoulders, etc. In another embodiment, the device could also be embedded into a VR headset. In another embodiment, the device could be attached to an external object such as a drone, stand, kiosk, etc.
[0030] When a patient wishes to get their retina examined, they can find a user in possession of the device available to perform the procedure. The patient’s eye(s) may or may not be dilated. In one configuration where the patient’s eyes are not dilated, the device may be used in a dark room to allow for natural dilation, or the device may have an enclosure around the patient’s eyes to create darkness, thereby allowing for natural dilation of the patient’s eyes. This enclosure may need to touch the patient around their eyes sufficiently to significantly reduce the presence of external visible light entering the enclosure. This enclosure could be around a single eye individually, or it could be around both eyes at once. If individually, the exam would occur one eye at a time. If the enclosure covers both eyes, both eyes could be examined at once or one eye at a time.
[0031] If the patient’s eyes are enclosed in darkness, any of the plurality of sensors inside the enclosure could be used to detect target areas of the patient such that appropriate instructions could be given to a user regarding how to position the device. Some sensors may use light in the far red wavelength (such as radar, etc.), infrared, ultraviolet, or any wavelength of light not visible to the unaided human eye, in combination with a sensor capable of detecting the same light (far red, infrared, ultraviolet, x-ray, etc.). In another embodiment, ultrasonic or subsonic sound sensors may be used to detect the location of target areas of the patient. In another embodiment, any combination of far red, infrared, ultraviolet, or sonic sensors may be used to detect the location of target areas of the patient.
[0032] Many target areas of the patient may be detected. These include the regions around the eye, and inside the eye. Examples of target areas around the eye to detect would be eye socket(s), eye(s), eyelids, the nose, eyebrows, eyelashes, cornea, the iris, etc. Examples of target areas inside an eye would be the iris, the retina, the lens, the fovea, the optic nerve (disc), the macula, the blood vessels including but not limited to the inferior arcade and superior arcade, etc.
[0033] If the patient’s eyes are dilated, then any number of visible light sources (such as incandescent, halogen, fluorescent, LED, laser, etc.) in combination with a visible-light sensor can be used to view target areas of the patient. In another embodiment, ultrasonic or subsonic sound sensors may be used to detect target areas of the patient. In another embodiment, any combination of far red, infrared, visible light, ultraviolet, or sound sensors may be used to view the patient’s body parts. These light sources can illuminate the region around the eye, or they can illuminate the eye directly, including being shined directly into the eye. Any visible light shone directly into the eye may conform with a light hazard protection standard such as ANSI Z80.36 or its equivalents.
[0034] In one embodiment, the device utilizes visible light sources, i.e., light which is visible to the human eye. In another embodiment, the device utilizes light sources in wavelengths not visible to the human eye. In a third embodiment, it may utilize any combination of light sources, whether visible to the human eye, not visible to the human eye, or both.
[0035] Whether dilated or not, the target areas of the patient may be detectable to various sensors through various means. In one embodiment, a distance sensor may be used to detect the distance between the device and a target area of the patient. The distance sensor may use any form of light (far red, infrared, visible, ultraviolet, etc.) or sound (subsonic, sonic, ultrasonic, etc.) to achieve its range-finding capabilities. In one embodiment, the distance sensor would detect only the distance to the area directly in front of it. In other embodiments, the distance sensor would be an image sensor array (such as a Complementary Metal-Oxide Semiconductor light sensitive circuit (CMOS), a Charge-Coupled Device light sensitive circuit (CCD), etc.) or a sound sensor array. In another embodiment, a touch sensor such as a physical electric switch, or a capacitive touch sensor, could be used to detect the immediate presence of a patient’s skin, cornea, eyelash, or other area. In another embodiment, a moisture sensor could detect the presence of skin via detection of sweat, or the cornea via detection of fluid on the cornea. In another embodiment, a force sensor such as an accelerometer could detect movement of the device in any direction, or a tension/compression sensor could detect contact with a patient, user, or other object. In another embodiment, a directional position of the device could be detected using a magnetometer, to detect either the earth’s magnetic fields, or an induced magnetic field. In another embodiment, temperature sensors could be used to detect the proximity of a heat source such as a patient, a user, or an induced heat source positioned so as to provide further guidance. In another embodiment, a positional sensor such as a Global Positioning System (GPS) unit could be used to determine the position of the device relative to the patient. In another embodiment, a camera light sensor array (CMOS, CCD, etc.) would be used to view the patient with the output being analyzed to recognize target areas of the patient. This visual sensor array could detect far-red light, infrared, visible light, ultraviolet, etc. or any combination of light spectrums, such as with a multispectral camera. In another embodiment, any combination of camera sensor(s), light sensor(s), distance sensor(s), touch sensor(s), force sensor(s), magnetic sensor(s), temperature sensor(s), positional sensor(s), and/or moisture sensor(s) could be used to detect the relative position of the device with respect to the patient. These sensors could touch the patient or could not touch the patient.
[0036] The sensors could be placed outside or inside the device, and they could be placed in any arrangement at any angle so as to optimize the ability to detect the patient. The sensor(s) could be in communication with a plurality of microprocessors in communication with each other or attached to a single microprocessor. The microprocessor would then analyze the input from these sensors to determine an instruction regarding how to position, move, or operate the device. Instructions regarding how to move, operate, or position the device and/or patient would be delivered to a user and/or patient or to a system through one or more guidance systems.
[0037] Additionally, instructions from the guidance system could be communicated to a system of actuators (electrical, air, chemical, or hydraulic) in communication with the camera(s) so as to move the camera(s) into position that will allow for appropriate media to be captured of the patient’s eye. In one embodiment, the guidance system analyzes the  camera feed from one camera, and moves a camera along at least 2 axes until the camera is positioned so as to clearly see the retina of the patient. In another embodiment, the guidance system analyzes the camera feed from two or more cameras, and moves the primary camera along at least 2 axes until the camera is positioned so as to collect media of the desired region of the eye, while the other camera(s) are positioned around the primary camera to observe the area around the primary camera and between the primary camera and the patient and thereby provide additional information needed. Camera movement can be achieved using electrical, pneumatic and/or hydraulic actuators. In another embodiment, in addition to the primary camera, distance, positional, and force sensors and/or additional camera(s) are used by the guidance system to determine how to actuate the primary camera. In another embodiment, in addition to the primary camera, actuators move the light source that illuminates the area to be collected. In another embodiment, actuators move the primary camera as well as any number of sensors or illumination sources in communication with the guidance system. In another embodiment, in addition to the above, actuators move a substance delivery tool so as to administer a substance (such as a mydriatic) to aid in the collection of eye media. In another embodiment, media is collected using multiple cameras at various angles instead of just from the primary camera.
[0038] The user may turn on (activate) the device. The device may be wirelessly connected (automatically or with the aid of the user) to a remote or local support system via the internet. In one embodiment, the user is connected with a guidance system on the device. In another embodiment, the guidance system may be located on a remote server. The user may additionally be connected to a remote secondary advisory user (described in WO2020/214612, entitled “DEVICE NAVIGATION AND CAPTURE OF MEDIA DATA”, which was filed on April 14, 2020).
[0039] Upon activation by the user, the device commences a communication pathway between the user and the guidance system. In one embodiment, the communication pathway is one-way from the guidance system to the user. In another embodiment, the communication is two-way between the user and the guidance system. In another embodiment, the communication is a multi-way communication pathway between the user, the guidance system, and potentially a secondary advisory user. In another  embodiment, the guidance system is the secondary user. In an embodiment, the guidance system is a software application. In another embodiment, the guidance system is a circuit board. In another embodiment, any permutation of communication directionality and inclusion or exclusion of a secondary user is implemented. In another embodiment, the guidance system is both a software application and a circuit board.
[0040] The device can communicate with the user, secondary advisory user, or the guidance system via wireless (e.g. Wi-Fi, Bluetooth, Zigbee, cellular network, etc.) or wired communication methods, using any communication protocol (Transport Layer Security (TLS), Transmission Control Protocol (TCP), USART/UART, USB, Serial Peripheral Interface (SPI), Inter-Integrated Circuit (I2C), Controller Area Network (CAN), custom, etc.).
[0041] As described in FIG. 1 and FIG. 4, the guidance system can receive as input video captured by the device and may receive additional information, including but not limited to audio information and sensor data, such as gyroscopic, distance, touch, force, magnetic, temperature, light, positional, moisture, voltage, additional camera, and eye side data, etc. Sensor data can originate within the device, such as battery charge status, device operation mode, position and proximity sensors, light sensors and power meters, humidity readings, etc., or external to the device, communicated through either wired or wireless means.
[0042] The guidance system returns (or outputs) information back to the user. The information can include auditory information, visual information, and/or haptic information and instructions. When the guidance system is a software algorithm, auditory information can include pre-recorded messages verbally telling the user instructions, for example, “move up,” “move closer,” etc. Auditory information can also be pings, such as sonar noises that vary in volume, pitch, and/or frequency. When the guidance system is a secondary user, the audio commands are whatever the secondary user tells the user or the patient to do so that ocular media can be collected. These auditory commands can and often do include commands like “move up,” “move closer,” etc.
[0043] FIG. 10 describes examples of the kinds of instructions delivered by the guidance system. The guidance system may give any combination of instructions to a user and/or to the patient. As described in FIG. 10, these instructions include moving the device in the six cardinal directions (up, down, left, right, closer and further); rotating the device  (pitch, roll and/or yaw) or giving the patient instructions on how to move their eye, such as looking higher, lower, looking left/right/straight/in the middle, looking at a target in front of the patient such as a light or a display with a movable point to fixate onto.
[0044] Additionally, if there are issues with the imaging, the patient or user may require additional intervention, such as: restart, point the device at the patient’s eye, hold the patient’s eyelid up, hold the patient’s eyelid down, dilate the eyes of the patient more, administer additional drugs to the patient, point light to the top of the eye, point the device or a laser pointer attached to the device at a body part or other fiducial, adjust the hardware in a particular way (adjust light intensity, a sensor, a fiducial attached to the patient, a fiducial behind the patient, adjust a mount, adjust a harness, etc.), adjust the patient in a particular way (sit up, move head left or right, relax shoulders, etc.), etc. These instructions are not exhaustive, and additional prompts may be added as needed. These instructions can be given in real-time, while the user is moving the device, or they can be given before or after the user starts to use the device. The instructions can be given audibly, visually, haptically, or in any combination or permutation thereof, as described below. For example, the audio instructions may be given alone or in conjunction with visual instructions, with haptic instructions, or other types of feedback.
[0045] The audio instructions could be in English, or in any other language. The audio instructions could be noises or sounds, such as a beep or ping that vary in length, frequency, pitch, and volume. The audio instructions could be given as constant audio, or may have gaps and pauses with occasional audio when needed. The audio instructions may be mixed with other audio, such as music. The audio instructions could be given to the user from the device speakers, from headphones connected to the device through a physical wire or through wireless connections such as Bluetooth, Wi-Fi, etc., or from other wired or wirelessly connected audio devices, such as Bluetooth headphones, Virtual Reality (VR) headphones, external speakers, etc.
[0046] Haptic information can include vibrations that the user can feel while holding the device without needing to look at or further interact with the device. These vibrations can vary in intensity, frequency, and length. For example, the haptic feedback could be 3 short and light vibrations, 3 short and hard vibrations, 1 long soft vibration, 1 long hard vibration, 1 soft short and 1 hard long vibration in rapid succession, etc. These vibrations can be used to communicate any number of directional instructions, such as 2 short soft vibrations meaning move left, 1 long soft vibration meaning move right, 3 hard long vibrations meaning pull back and restart, etc. The vibrations could also be used to communicate that certain milestones have been reached. For example, 1 soft short vibration to indicate that the retina has been detected, 2 soft short vibrations to indicate that the optic nerve has been detected, and 3 long vibrations to indicate that appropriate media has been collected and the user is done with the current eye or with all imaging.
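As a minimal illustration of how such vibration patterns might be organized in software, the following Python sketch maps a few of the example events above to hypothetical patterns; the pulse driver and the specific patterns are assumptions for illustration only and are not part of any particular device's haptics API.

import time

# Hypothetical haptic driver; on a real device this would call the platform's
# vibration/haptics API rather than printing and sleeping.
def pulse(duration_s, intensity):
    print(f"vibrate {duration_s:.2f}s at intensity {intensity}")
    time.sleep(duration_s)

# Illustrative mapping of guidance events to vibration patterns,
# where each pattern is a list of (duration_s, intensity) pairs.
HAPTIC_PATTERNS = {
    "move_left":      [(0.1, "soft"), (0.1, "soft")],                   # 2 short soft
    "move_right":     [(0.6, "soft")],                                  # 1 long soft
    "pull_back":      [(0.6, "hard"), (0.6, "hard"), (0.6, "hard")],    # 3 hard long
    "retina_found":   [(0.1, "soft")],                                  # milestone: retina detected
    "media_captured": [(0.6, "soft"), (0.6, "soft"), (0.6, "soft")],    # milestone: capture done
}

def play_haptic(event, gap_s=0.15):
    # Play each pulse in the pattern with a short gap between pulses.
    for duration_s, intensity in HAPTIC_PATTERNS[event]:
        pulse(duration_s, intensity)
        time.sleep(gap_s)

play_haptic("move_left")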
[0047] Visual information can include displaying images or video on a screen attached to the device and/or separate from the device, such as arrows indicating the direction to move the device, animations or live video demonstrations showing how to position the device relative to the patient, animations or live video demonstrations showing how to position, prepare, and/or administer drugs to the patient, or animations or live video demonstrations showing the patient how to move their eye or other body parts. Visual information could also display graphs to guide the user on the distance or location. Visual information for positioning the device could also include video-game style instruction in response to the user’s motions, such as having the user try to accurately position a dot within a circle by moving the device up/down/left/right/closer/further relative to the patient, or “fly” a simulated airplane through various targets by moving the device up/down/left/right/closer/further. Visual information could include flashes of light that vary in intensity, frequency, or color, and/or other visual cues to give guidance and/or feedback to the user and patient.
[0048] In one embodiment, the guidance system could be a software application installed onto a mobile device with at least a touchscreen, camera, speakers, haptic system, microprocessor, transient storage, and persistent storage. The camera feed is analyzed by the guidance system to recognize the patient’s body parts relative to the device, and then audio, visual, and/or haptic feedback is provided to the user regarding how to move the device. When appropriate, audio, visual, and/or haptic feedback is also given to the patient regarding how to move their eyes and eyelids and where to look. When the guidance system recognizes that appropriate ocular media has been collected by a camera or other sensor, it records the media to storage for later retrieval or delivery.
[0049] Additional embodiments can include variations wherein features of the above embodiment are not used, such as the haptic system being excluded or the touchscreen being replaced with a display-only screen. Or no visual feedback is provided, but rather only audio and haptic. Or any other embodiment where only one modality of feedback is provided or any combination of modalities is used to provide feedback to a user. Another embodiment could include one where the guidance system is a hardware circuit board in communication with or installed into the mobile device.
[0050] Additional embodiments can include variations wherein any number, placement, or combination of additional sensors (distance, touch, moisture, force, magnetic, temperature, light, GPS, etc.) provide input to the guidance system. For example, accelerometer(s) can be added to the device in any number of combinations and orientations, and the input from the accelerometer can be smoothed using various high- or low-pass filters or smoothing algorithms to compute the positional change of the device and/or to compute how the device is oriented (upright, horizontal, vertical, facing down, etc.). The information may be used by the guidance system to give instructions to a user on how to move the device. In another embodiment, magnetometer and/or gyroscope sensor(s) could be used to determine the tilt and angle of the device with respect to the Earth. Or any number of external magnetic field(s) could be placed at known location(s) and orthogonal directionality(ies) relative to the patient. The guidance system can then use the input from the magnetometer(s) to determine the location and tilt of the device relative to the earth or external magnetic field(s). In another embodiment, distance sensor(s) could be placed on the device with known orientation(s) relative to the camera. This(ese) sensor(s) could measure the distance from the device to the patient or the distance from the device to a known object (such as the operator, wall, or other mounted structure close to the patient).
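As one hedged example of the accelerometer smoothing described above, the following Python sketch applies a simple exponential moving-average low-pass filter to simulated three-axis accelerometer samples and infers a coarse device orientation from the gravity vector; the threshold values and orientation labels are illustrative assumptions, not prescribed parameters.

import numpy as np

def low_pass(samples, alpha=0.2):
    """Exponential moving-average low-pass filter over (N, 3) accelerometer samples."""
    smoothed = np.empty_like(samples, dtype=float)
    smoothed[0] = samples[0]
    for i in range(1, len(samples)):
        smoothed[i] = alpha * samples[i] + (1.0 - alpha) * smoothed[i - 1]
    return smoothed

def orientation_from_gravity(sample, g=9.81, tol=0.35):
    """Rough device orientation inferred from the dominant gravity component."""
    x, y, z = sample / g
    if z > 1.0 - tol:
        return "face up"
    if z < -(1.0 - tol):
        return "face down"
    if abs(y) > abs(x):
        return "upright" if y > 0 else "upside down"
    return "on its side"

# Simulated noisy samples of a device lying face up (gravity along +z).
rng = np.random.default_rng(0)
raw = rng.normal([0.0, 0.0, 9.81], 0.8, size=(50, 3))
print(orientation_from_gravity(low_pass(raw)[-1]))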
[0051] In another embodiment, the guidance system could be installed as software or a hardware circuit board into a non-mobile ophthalmic camera or kiosk, wherein the device used to collect the media of the eye and surrounding area is movable in its position relative to a patient. The camera feed is analyzed by the guidance system to recognize target areas of the patient relative to the device, and then audio, visual, and/or haptic feedback is provided to a user regarding how to move the camera relative to the patient.  When appropriate, audio, visual, and/or haptic feedback may also be given to the patient regarding how to move their eyes, eyelids and where to look, etc. When the guidance system recognizes acceptable media, it may automatically record the media for later retrieval or delivery, or it may instruct an external user to record the media.
[0052] In another embodiment, the guidance system gives output to both a display facing a user and another display facing the patient. The display to a user informs a user how to position the device, while the display to the patient informs the patient as to where to look or how to move their body. The patient looks at a fiducial on the display and that fiducial could move up/down/left/right, thereby moving the patient’s eye. In one embodiment, both the patient display and a user display are mounted to the same device. In another embodiment, the patient display is separate from the device while a user display is mounted to the device. In another embodiment, a user display is separate from the device, while the patient display is mounted to the device. In another embodiment, both the patient and user displays are mounted separate from the device. Communication between the device and the displays can be either wired or wireless. If separate from the device, the displays may include a dedicated processor to aid in proper display of instructions from the guidance system, or may use the same processor.
[0053] In another embodiment, a fiducial is placed on the patient, such as a sticker or a laser pointer light shone on the patient. The guidance system then tracks the device’s position relative to this fiducial and gives appropriate instructions to guide a user until acceptable media is collected.
[0054] For example, the guidance system may guide a user through the following series of steps to capture media (see the sketch following this list).
a. The patient is asked to look in a fixed direction, for example, straight ahead.
b. A user brings the device to the correct working distance. The correct working distance varies by device and patient.
c. The goal is to position the device at the right location, with the light focus spot located on the patient’s lens or cornea and with the light illuminating inside the pupil.
d. The guidance system may guide a user into position by giving a user information on how to optimally position the device to capture the desired media.
e. The guidance system may trigger the device to capture the desired media (images and/or video and/or data from various other inputs).
f. A user may get some feedback from the guidance system to indicate a successful examination has occurred.
g. The media is stored in a location that can be accessed at a later point.
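The following Python sketch outlines one possible examination loop corresponding to the steps above; the camera, guidance_model, deliver, capture, and store callables are hypothetical interfaces standing in for the device components described elsewhere in this disclosure, not a required implementation.

# Hypothetical component interfaces: a camera that yields frames, a guidance
# model that returns an instruction string or "capture", a deliver() function
# for audio/visual/haptic feedback, and capture()/store() helpers.
def run_examination(camera, guidance_model, deliver, capture, store):
    deliver("Please look straight ahead.")       # step (a): fixed gaze target
    for frame in camera:                         # steps (b)-(d): position the device
        instruction = guidance_model(frame)      # e.g. "move up", "move closer", "capture"
        if instruction == "capture":             # step (e): criteria met, trigger capture
            media = capture()
            deliver("Capture complete.")         # step (f): success feedback
            store(media)                         # step (g): persist for later review
            return media
        deliver(instruction)                     # real-time guidance to the user
    return None                                  # camera stream ended without a capture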
[0055] For the device, a visible light camera (CMOS, CCD, etc.) is used to collect digital camera media of the eye and surrounding area. This media could be in any number of single image formats such as tiff, raw, jpg, png, heic, dicom, etc. Or this media could also be a multi-image media video such as mp4, mp3, mov, xvid, heic, dicom, etc. This media could be a combination of both single-image and multi-image formats.
[0056] One or more media samples could be taken of a single patient’s eye, both eyes separately, or both eyes at the same time. These media samples can be filtered or unfiltered. In one embodiment, the filtering could exclude all media that do not include the retina. In another embodiment, all camera media that are out of focus could be excluded. In another embodiment, all media where the optic disc or fovea are not in a predefined location could be excluded. In another embodiment, all media where the optic disc or fovea are not in a predefined location and are not in focus could be excluded. Any other combination of filters could be used based on clarity, patient areas present, target area position in the media, camera position, patient position, media quality characteristics, events occurring at the time of capture (move up, down, hold still, look up, etc.), etc.
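A minimal sketch of such filtering is shown below in Python, assuming hypothetical contains_retina and disc_in_target predicates (for example, backed by the models described later) and using variance of the Laplacian as one common, illustrative focus heuristic; the threshold value is an assumption.

import cv2

def is_in_focus(frame_bgr, threshold=60.0):
    """Simple sharpness heuristic: variance of the Laplacian of the grey image."""
    grey = cv2.cvtColor(frame_bgr, cv2.COLOR_BGR2GRAY)
    return cv2.Laplacian(grey, cv2.CV_64F).var() >= threshold

def filter_media(frames, contains_retina, disc_in_target):
    """Keep frames that show the retina, are in focus, and have the optic disc
    inside a predefined target region. contains_retina and disc_in_target are
    hypothetical, model-backed predicates."""
    return [f for f in frames
            if contains_retina(f) and is_in_focus(f) and disc_in_target(f)]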
[0057] The device saves the captured media to a storage location. This could be persistent storage (e.g. hard disc, flash, etc.) or transient storage on the device (e.g. Random Access Memory (RAM), etc.). It could be located in removable storage (flash memory, magnetic storage, optical storage, etc.), or it could be located in remote storage (cloud storage provider, local network storage, etc.), each in communication with the device.
[0058] The computation described herein of the guidance system may occur locally on the device, on another nearby device, or remotely, such as in the cloud.
[0059] GUIDANCE SYSTEM LOGIC
[0060] An overview of the guidance system is described in FIG. 1 at reference number 9 (which includes items 3, 4, 5 and 6), which illustrates that the purpose of the guidance system is to receive all inputs (reference numbers 1 and 2 in FIG. 1) and process them to provide an output to a user and/or patient (reference numbers 5 and 6 in FIG. 1), and ultimately the delivery of ocular media (reference numbers 7 and 8 in FIG. 1). One or more algorithms can be used to accomplish the processing of the inputs. These algorithms may be used together to completely automate the process or separately as an aid to a user. The inputs may go through zero, one, or a multitude of pre-processing phase(s) and then zero, one, or a multitude of processing phase(s), ultimately arriving at zero, one, or multiple output(s), which may be subsequently delivered to a user and/or patient. FIG. 13 shows an embodiment of FIG. 1 wherein the guidance system receives inputs (reference numbers 1 and 2 in FIG. 13) and processes them to give regular instructions to a user and/or patient (reference numbers 4, 5, 6, and 7 in FIG. 13), to guide the user to achieve predefined criteria (reference numbers 8 and 9 in FIG. 13) and communicate such status to the user (reference number 10 in FIG. 13).
[0061] In the pre-processing stage (FIG. 1 or FIG. 13, reference number 3), the input media (video stream) may be separated into individual images (such as a video into individual video frames). The camera media may undergo any number of appropriate transformations, filters, etc. to prepare the camera media for processing. For example, the camera media could be cropped, scaled, downsampled, binned, reflected, sheared, rotated, reshaped to a desired aspect ratio, transformed to a different color space (such as greyscale), padded, etc. The camera media may go through morphological, wavelet, Gaussian, linear and non-linear transforms, for example to correct, remove, or introduce distortions, such as chromatic aberrations, spherical aberrations, pincushion/barrel/fisheye distortions, and other aberrations. For filtering, the camera media could undergo zero, one, or a multitude of filters and transforms, such as low pass filters, high pass filters, Fourier transforms, Gaussian, Laplacian, Hough transforms, denoising, texturing, edge detection, etc. These transformations and filters may be applied to the entire camera media, or to any part of the camera media in isolation.
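By way of illustration, the following Python/OpenCV sketch applies a few of the transformations listed above (centre crop, downsampling, greyscale conversion, and Gaussian smoothing); the particular output size and kernel are illustrative assumptions rather than required parameters.

import cv2

def preprocess_frame(frame_bgr, out_size=(512, 512)):
    """Illustrative pre-processing: centre-crop to a square, downsample,
    convert to greyscale, and apply light Gaussian denoising."""
    h, w = frame_bgr.shape[:2]
    side = min(h, w)
    y0, x0 = (h - side) // 2, (w - side) // 2
    cropped = frame_bgr[y0:y0 + side, x0:x0 + side]            # centre crop
    resized = cv2.resize(cropped, out_size, interpolation=cv2.INTER_AREA)
    grey = cv2.cvtColor(resized, cv2.COLOR_BGR2GRAY)           # colour-space reduction
    return cv2.GaussianBlur(grey, (5, 5), 0)                   # low-pass / denoise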
[0062] A part of camera media pre-processing (FIG. 1 or FIG. 13, reference number 3) may include removing parts of the camera media that are not necessary for analysis to reduce file size and speed up subsequent steps. For example, many ISO 10940 cameras utilize spherical lenses to reduce curvature transformation when imaging the retina (which is also spherical), while many camera sensors utilize rectangular photosensitive arrays. This presents a central region in the middle of the camera media which is the focus of analysis (focus region) and is typically circular - though not required to be circular. This focus region can be identified using any number of means. In one embodiment, the camera media are cropped to a square, where the width of the camera media is the length of the shortest side of the sensor, and the height is the same as the width, with the cropped frame centered on the center of the camera media. In another embodiment, the location of the focus region in the camera media is previously known from the device model. In another embodiment, an edge-detection algorithm such as Canny, Deriche, Differential, Hough, Sobel, Prewitt, Roberts, etc. is used to find the outside edges of the focus region. In another embodiment, differentials are utilized to find two “cliffs” in the color intensity indicating the beginning and end of the focus region for each row or column (FIGs. 11A-11E).
[0063] FIG. 11A is an example vertical cross-section of the image that is taken to determine differential “cliffs”.
[0064] FIG. 11B is the corresponding pixel intensity for each pixel in the cross-section selected in FIG. 11A.
[0065] FIG. 11C is an example horizontal cross-section of the image that is taken to determine differential “cliffs”.
[0066] FIG. 11D is the corresponding pixel intensity for each pixel in the cross-section selected in FIG. 11C.
[0067] FIG. 11E: the white circle is the focus region that results if the process described in FIGs. 11A-11B and FIGs. 11C-11D is repeated over each column and each row.
[0068] In another embodiment, template matching is employed to identify the focus region. In another embodiment, a deep learning model is trained using labeled data to identify the focus region. In another embodiment, the images are pre-processed with any number of color, denoising, noising, smoothing, or other computer-vision transformations before employing any of the previously described focus region detecting methods. Any number of the above-described embodiments can be combined for the purpose of identifying the focus region. The focus region can be used as the input for processing or the input for more pre-processing. The above pre-processing items can be applied to a single camera media or to groups of camera media.
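A minimal Python sketch of the row-wise differential “cliff” approach of FIGs. 11A-11E is given below; the jump threshold is an illustrative assumption, and in practice the same scan would also be repeated over columns and could be combined with the other methods described above.

import numpy as np

def focus_region_bounds(grey, min_jump=25):
    """For each row, locate the two intensity 'cliffs' (large jumps in the first
    difference) that mark where the illuminated focus region begins and ends;
    rows without two cliffs are treated as outside the region."""
    bounds = []
    for row in grey.astype(np.int32):
        jumps = np.abs(np.diff(row))
        cliff_cols = np.flatnonzero(jumps >= min_jump)
        if len(cliff_cols) >= 2:
            bounds.append((cliff_cols[0], cliff_cols[-1]))   # first rise, last fall
        else:
            bounds.append(None)
    return bounds

def focus_region_mask(grey, min_jump=25):
    """Build a boolean mask of the focus region from the per-row bounds."""
    mask = np.zeros(grey.shape, dtype=bool)
    for y, b in enumerate(focus_region_bounds(grey, min_jump)):
        if b is not None:
            mask[y, b[0]:b[1] + 1] = True
    return mask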
[0069] In additional camera media pre-processing (FIG. 1 or FIG. 13, reference number 3), the camera media may be reduced to minimize the size of the input camera media to the model. Smaller camera media typically enable the model to analyze the camera media faster. Additionally, camera media may also be increased in size using any number of interpolation techniques such as nearest neighbor, linear, polynomial interpolation, etc., in order to create additional information for the processing step. A certain pattern of camera media may also be removed from the analysis, for example, every other image (or video frame). This reduces the number of images that need to be analyzed, for example, from 30 to 15 images per second.
[0070] Different enhancements may be used in camera media pre-processing (FIG. 1 or FIG. 13, reference number 3) to help the algorithm in its function. These enhancements include converting the camera media to various colorspaces (e.g. greyscale, red-free, blue-free, green-free, CIELAB, CMYK, YUV, HSV, Bayer variants of the aforementioned, etc.), and/or applying various transformations such as Bayer, Gaussian, Top Hat, Fourier transforms, etc. In one embodiment, the camera media is transformed to grayscale by reducing the number of dimensions, combining the various channels in the camera media into a single value. In another embodiment, one or more channels from CIELAB space may be used to isolate a feature of the camera media (such as the focus region or retina).
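As a short illustration in Python/OpenCV, the sketch below produces three of the channel enhancements mentioned above (greyscale, the green “red-free” channel, and the CIELAB lightness channel); which channel is most useful is application-dependent, and this is only one possible selection.

import cv2

def channel_enhancements(frame_bgr):
    """Examples of colour-space conversions: a grey conversion, the green
    ('red-free') channel often used to emphasise retinal vessels, and the
    lightness channel of CIELAB."""
    grey = cv2.cvtColor(frame_bgr, cv2.COLOR_BGR2GRAY)
    red_free = frame_bgr[:, :, 1]                             # green channel (BGR order)
    lab_l = cv2.cvtColor(frame_bgr, cv2.COLOR_BGR2LAB)[:, :, 0]
    return grey, red_free, lab_l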
[0071] Additional pre-processing (FIG. 1 or FIG. 13, reference number 3) may be performed using a deep-learning model such as a convolutional neural net, recursive cortical network, generative adversarial network, etc. These models could be off-the-shelf pre-trained models that enhance the camera media (still image or video) in a way that will improve object detection such as improving lighting, media stitching, providing sharper contrast, creating super-resolution, creating more even lighting, etc., or they could be models trained using previously pre-processed media.
[0072] Additional inputs can be received from a variety of sensors (distance, touch, force, magnetic, temperature, light, positional, etc.) and these sensor inputs can also undergo signal pre-processing (FIG. 1 or FIG. 13, reference number 3) including smoothing and filtering. Filtering can be performed using linear, non-linear, time-variant, time-invariant, causal, and/or discrete time filters. Smoothing can be accomplished using any number of smoothing techniques, such as low and high pass filters being used in either or both the forward and backward directions. Either or both smoothing techniques can be used. In one embodiment, the input data is received in real-time, and in order to perform backward smoothing/filter operations, the guidance system takes regions of samples either by count (5, 7, 10, 20, etc. samples) or time (0.01, 0.5, 1, etc. seconds) and performs the backward filter on that region. In another embodiment, the output from the various sensors could be inputted into one or more deep-learned models, which are trained to perform smoothing or any number of pre-processing operations.
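One hedged example of windowed forward-backward smoothing is sketched below in Python using SciPy’s zero-phase filter; the filter order, cut-off, and window length are illustrative assumptions rather than required values.

import numpy as np
from scipy.signal import butter, filtfilt

# 2nd-order low-pass filter with a normalised cut-off frequency; filtfilt runs it
# forward and backward over the most recent window of real-time samples.
b, a = butter(N=2, Wn=0.2)

def smooth_window(samples, window=20):
    recent = np.asarray(samples[-window:], dtype=float)
    if len(recent) <= 3 * max(len(a), len(b)):   # filtfilt needs enough samples to pad
        return recent[-1]
    return filtfilt(b, a, recent)[-1]            # smoothed latest value

# Example: a noisy distance-sensor stream settling around 40 mm.
rng = np.random.default_rng(1)
stream = list(40 + rng.normal(0, 2, size=50))
print(round(smooth_window(stream), 2))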
[0073] As described in FIG. 1 or FIG. 13, after preprocessing, the preprocessed data is processed, resulting in an output of an instruction or series of instructions to a user and/or patient. The output can be communicated as a floating point number, an integer, a string of characters, a high voltage on a hardware pin or any number of delivery methods, with each output corresponding to a known instruction to give to a user and/or patient.
[0074] Processing (FIG. 1 or FIG. 13 reference number 4): The processing step can be any series of steps necessary to provide the instructions to a user and/or patient. In one embodiment, a single model receives all input and provides the output. In an exemplary embodiment, one or more models receive input, and their output is collated into another model that gives the final output.
[0075] The models can be deep-learning models such as convolutional neural networks, recursive cortical networks, long short-term memory networks, recurrent neural networks, generative adversarial networks, radial basis function networks, large language models, large image models, random decision forest models, decision tree models, tabular data machine learning/deep learning models, multilayer perceptrons, self-organizing maps, deep belief networks, restricted Boltzmann machines, autoencoders, etc. The models could also be regression, clustering, biclustering, covariance estimation, composite estimators, cross decomposition, decomposition, Gaussian mixture, feature selection, Gaussian process, linear, quadratic, discriminant, matrix decomposition, kernel approximation, isotonic regression, manifold learning, or classification models, object-detection models, support vector machines, Naive Bayes algorithms, k-nearest neighbors algorithms, dimensionality-reducing models, random forest models, decision tree models, tabular data machine learning models, etc. Any of these models can be used singularly, or in any ensemble method. When training any model, the term label or labelled refers to a tag used to characterize the media. These labels can be bounding boxes, semantic segmentations, image classifications, keypoints, visual, audio or haptic annotations, etc. For example, an image may be labelled as "up", while another can be labelled as "not up".
[0076] In one embodiment, a classification deep-learning model could determine the acceptability of retinal media. The goal of this model is to remove media within the stream that are not of the retina in general or do not show enough characteristics of the retina for a specialist, such as an ophthalmologist, or an acceptability algorithm to properly work. FIG. 7a illustrates how the model may be trained to predict a label regarding the current state of the device. To train a classification model, images are labeled into at least two categories, such as having the label or not having the label. For example, an image with the retina in it could be labeled as “retina” while an image with no retina in it can be labeled as “not retina.” Labels can be any label and number of labels that are necessary to accomplish the purposes of the device. For example, the label can be a structure in the image (retina, fovea, eyebrow, eyelash, etc.), an instruction to the user that will allow the device to achieve a predetermined criteria (up, down, left, further, etc.), a characteristic of the image (clear, blurry, 3rd-order-clarity, etc.), or any other label that is necessary. Once the model is trained, FIG. 7b illustrates how an image can be given as an input to the model and the model outputs the predicted label. In one embodiment, a classification model or models may be used to label camera media that fall into different stages (FIG. 7b). The stages could represent various levels of media acceptability. For example, if 5 stages are used, Stage 0 video frames could be frames that do not contain a blood vessel or retina; for example, these frames are mostly black, or show the area outside the patient’s eye including furniture or other aspects of the patient’s environment, etc. Stage 1 frames are frames that do contain a blood vessel. Stage 2 frames have at least 50% of the media that is the retina. Stage 3 frames are frames that contain at least 2 features. Features may include: superior arcade, inferior arcade, optic nerve or the fovea. Stage 4 frames are frames that contain 2nd order blood vessels. 1st order blood vessels are the biggest and thickest blood vessels, like an arcade. 2nd order blood vessels are the blood vessels that branch from those blood vessels. If they branch one more time, then they are considered 3rd order blood vessels. 3rd order clarity is the presence of 3rd order blood vessels (see FIG. 12). Additional stages may be added or removed depending on the application. Certain stages may include media that are above or below a certain threshold. The threshold can be based on color content, brightness, contrast, sharpness, signal characteristics, algorithmic transforms, or other heuristics.
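To make the prediction step of FIG. 7b concrete, the following Python sketch runs a toy, untrained convolutional classifier over a placeholder frame and returns a stage label; in practice a trained model of the kind described above would be loaded instead, and the architecture shown here is purely illustrative.

import torch
import torch.nn as nn

class StageClassifier(nn.Module):
    """Toy convolutional classifier standing in for a trained staging model."""
    def __init__(self, n_stages=5):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(3, 16, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(16, 32, 3, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1))
        self.head = nn.Linear(32, n_stages)

    def forward(self, x):
        return self.head(self.features(x).flatten(1))

model = StageClassifier().eval()              # in practice: load trained weights here
frame = torch.rand(1, 3, 224, 224)            # placeholder for a pre-processed video frame
with torch.no_grad():
    stage = int(model(frame).argmax(dim=1))   # predicted stage, 0-4
print(f"Predicted stage: {stage}")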
The output from the model may be used in different ways, for example, as a visual aid for a user. In one instance, the device may display the color red if the frame is Stage 0, yellow if it falls between Stages 1-3, and green if it is Stage 4. In another instance, it can be used to give a user an indicator of how the collected media appears. Any other type of indicator can be given to a user, such as visual, auditory, or haptic, as previously described herein. Another way that the model may be used is to automatically process the video by removing certain frames. In one instance, all Stage 0 frames are removed. With this, we have the possibility of also removing other stages, like 0-1, 0-2, 0-3, 1-2, 2-3, etc.
[0077] In another embodiment, a gradability or quality model could be used to determine the acceptability of ocular media. The algorithm may be looking at specific pathology and features in the media. This algorithm may be used as an aid to a person, such as an eye care specialist, an optometrist or ophthalmologist, or can be used to automatically grade images of the eye, such as the retina. This model could be trained by using images, sensor, user, or patient inputted data. The images, sensor, user, or patient inputted data may be labeled by certified specialists, such as ophthalmologists or optometrists, or labeled by other algorithms (e.g. synthetic data) or trained individuals. This model can be trained to include detection of multiple pathologies and retrained to add more pathologies as needed. The output of this model can be a label of individual video frames where a pathology was detected, a specific region of interest on a frame or signal, or an alert and escalation of the exam with a notice that the patient should be referred to a specialist for follow up. In some cases, the output may include the urgency of the referral, for example, “immediately”, “within the next few days”, “within the next 3 months”, etc. In some cases, a list of one or more specialists may be provided. The list may additionally contain the specialist’s contact information, location, distance from the patient's home, cost range, and next appointment availability. The output may also tell the patient that their scan looks normal, but that they should continue to do tests at regular intervals. The algorithm may be trained by labeling multiple pathologies or one at a time. There may also be multiple algorithms.
[0078] In another embodiment, an object detection model could be trained and used to detect the location of zero, one, or any number of objects in an image frame (FIG. 6). Objects (anatomical features or landmarks) that could be detected might include the iris, the retina, the optic nerve, the arcades, the fovea, microaneurysms, edema, scarring, pathology, eyelashes, eyelids, eyebrows, noses, a face, objects in the patient’s room, regions of the retina with particular attributes (3rd order clarity, disease), etc. One model could detect all objects. In another embodiment, a different model could specialize in each object. In another embodiment, multiple models could be used, with each model specializing in a group of one or more objects. For example, one object detection model detects the iris, retina, optic nerve, fovea, and arcades, while another object detection model detects regions of 3rd order clarity (FIG. 12) and yet another detects pathology. In another embodiment, one model detects the same features as all three models or as any combination of two of the three models.
[0079] In another embodiment, a classification model could detect if a particular object is present, if region(s) of the ocular area with particular attributes is(are) present, or if a particular phase of the screening has been reached. For example, a classification model could be trained to detect whether or not an iris, retina, and/or optic nerve are present or detected in the image or sensor signals (FIG. 7a). Alternatively, a classification model could be trained to detect if regions of 3rd order clarity, or regions of disease, are present. For example, the screening process could be split into different phases such as described in FIG. 8, where each phase is defined by which eye structures are present. For example, Phase 0 could be the target phase when acceptable media is detected (the optic disc is in the correct location, and blood vessels with 3rd order clarity are present for two optic nerve diameters around the fovea). Phase 1 is when the optic disc is present, but not in the correct location. Phase 2 is when the retina is visible, and the optic disc or iris may still be partially visible. Phase 3 is when the full iris is visible, and the retina may or may not be visible. Phase 4 is when no ocular features are visible. In another embodiment, the phases include the presence of the fovea and/or disease, and more phases may be used or introduced with models trained to recognize each phase. In another embodiment, the phases are differentiated by other approaches, such as a model that can detect whether or not the media shows images inside the eye vs outside the eye.
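A minimal sketch of phase assignment from upstream detections, under the illustrative phase definitions above, is shown below in Python; the label names and the disc_in_target and third_order_near_fovea flags are assumptions standing in for outputs of the models and logic described elsewhere in this disclosure.

def screening_phase(detected, disc_in_target=False, third_order_near_fovea=False):
    """Map detected structures to the screening phases described above
    (Phase 0 = acceptable media ... Phase 4 = no ocular features visible).
    `detected` is a set of labels from upstream detection/classification models."""
    if "optic_disc" in detected and disc_in_target and third_order_near_fovea:
        return 0                      # acceptable media detected
    if "optic_disc" in detected:
        return 1                      # disc present but not yet on target
    if "retina" in detected:
        return 2                      # retina visible
    if "iris" in detected:
        return 3                      # full iris visible
    return 4                          # no ocular features visible

print(screening_phase({"retina", "optic_disc"}))   # -> 1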
[0080] In another embodiment, a classification model would receive inputs from one or more object detection models, classification models, pre-processed sensor outputs, and/or unprocessed sensor outputs and output an instruction to a user and/or patient (see FIGs. 2, 3, 4 and 5). FIG. 2 shows the training process for a model that outputs an instruction to the user/patient (e.g. up, down, left, right, look up, etc.). In FIG. 2, object detection model(s) analyze each training image that meets an intended instruction; statistics may or may not be computed about the image, and these values are inputted into the model with the target given a value of 1. Additionally, training images that do not meet the intended instruction are also analyzed and inputted into the model with the target instruction given a value of 0. In this way, if a model is trained with images of video frames where the user needs to move “up” to achieve a predefined criteria and with other images that would need an instruction other than “up” to achieve a predefined criteria, then, after training, the model could be provided with a video frame image and, optionally, some statistics about the image, and it would produce a value between 0 and 1, with a value closer to 1 indicating higher confidence that the user should move the device up and a value closer to 0 indicating lower confidence that a user should move the device up. Using this approach, FIG. 3 shows how dedicated models, trained for each instruction needed to allow the device to achieve its predefined criteria, could each be given the same video image frame and any desired metadata (statistics), and each would output its confidence that the user should move in the direction it was trained to evaluate. Logic could then be used to evaluate the outputs of each model to determine the instruction that should be given to the user to best allow the predefined criteria to be met. This embodiment does not imply that the models can only be trained to one instruction each. Indeed, each model could be trained to output multiple labels, such that values between 0.9 and 1 indicate up, values between 0.8 and less than 0.9 indicate left, values between 0.7 and less than 0.8 indicate down, etc. In another embodiment, one or more models could be trained to give all instructions needed for images of the left eye, while one or more models could be trained to give instructions needed for all images of the right eye. In another embodiment, one or more models could be trained to give instructions needed for the user, while one or more models could be trained to give instructions to the patient. The number of models, the labels to which the models have been trained, and the combination and purpose of each model need not be limited, so long as they allow for the computation of the best instruction that will allow the predefined criteria (FIG. 13, reference number 8) to be met. In another embodiment, a tabular data model would receive inputs from one or more object detection models, classification models, pre-processed sensor outputs, and/or unprocessed sensor outputs and output one or more instructions to a user and/or patient (FIG. 1 or FIG. 13). In another embodiment, various outputs are received into logic with pre-set thresholds that are used to output an instruction to the patient and/or user.
For example, a target region could be defined for the optic disc, and if the object detection model detects the center or edge of the optic disc to the left of and below the target region, logic could return the instruction that would shorten the largest distance the most. The same logic could be used to provide instructions using a separate target region (e.g. retina, arcades, fovea, iris, nose, eyebrow, etc.). Distance could be computed as the distance between the center of the structure and the center of its target, or the distance between the center of the structure and the closest edge of the target, or the center of the target and the closest edge of the structure, or the closest edge of the target and the closest edge of the structure.
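The following Python sketch illustrates the threshold-based logic described above, returning the single instruction that most reduces the offset between a detected structure centre and its target region centre; the mapping from image-space offsets to physical device movements is assumed here to be direct, which in a real device depends on the optics, and the tolerance is an illustrative value.

def instruction_from_detection(structure_center, target_center, tolerance=10):
    """Choose the instruction that shortens the largest remaining offset between
    a detected structure (e.g. the optic disc centre) and the centre of its
    target region, in pixel coordinates (+x right, +y down)."""
    dx = target_center[0] - structure_center[0]
    dy = target_center[1] - structure_center[1]
    if max(abs(dx), abs(dy)) <= tolerance:
        return "hold still"                       # structure already inside the target
    if abs(dx) >= abs(dy):
        return "move right" if dx > 0 else "move left"
    return "move down" if dy > 0 else "move up"

# Example: optic disc detected to the left of and below the target region centre.
print(instruction_from_detection((100, 260), (128, 128)))   # -> "move up"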
[0081] Another embodiment would include various permutations of the above classification or object detection models and/or logic. For example, the object detection model could provide input into a classification model that then outputs the phase. In another example, an object detection model detects appropriate structures while a classification algorithm detects whether the device is inside or outside the eye region; the output of the away-from-eye classification model and the output of the object detection model are then inputted into another classification model, which determines an instruction to a user and/or patient (FIG. 5, which shows one possible combination of classification, object detection, and logic). In another example, an object detection model, here named the structure model, could detect appropriate retina structures on a video frame (e.g. retina, arcades, fovea, iris, disease, etc.), while a second object detection model, here named the 3rd order model, could detect regions with 3rd order clarity on the same video frame. Using the output of the structure model, the location of each 3rd order clarity output on the retina could be determined; by doing the same on subsequent video frames, all of the 3rd order clarity outputs can be mapped onto the retina, creating a union of regions of 3rd order clarity. If the union of regions of 3rd order clarity sufficiently covers a region of interest, an instruction of “done” can be given to a user and/or patient.
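The following Python sketch illustrates, under stated assumptions, how the accumulated union of 3rd order clarity regions could be checked against a region of interest. Bounding boxes are assumed to already be mapped into a retina-fixed coordinate frame, the region of interest is assumed rectangular, and coverage is approximated on a grid; none of these choices is required by the disclosure.

```python
# Illustrative sketch, not the disclosed algorithm: accumulate 3rd order
# clarity regions (as bounding boxes in an assumed retina-fixed coordinate
# frame) across video frames, then approximate how much of a rectangular
# region of interest their union covers using a grid of sample points.

def coverage_fraction(clarity_boxes, roi, step=1.0):
    """Fraction of grid points inside `roi` covered by at least one box.
    Boxes and roi are (x_min, y_min, x_max, y_max) tuples."""
    x_min, y_min, x_max, y_max = roi
    covered = total = 0
    y = y_min
    while y <= y_max:
        x = x_min
        while x <= x_max:
            total += 1
            if any(bx0 <= x <= bx1 and by0 <= y <= by1
                   for bx0, by0, bx1, by1 in clarity_boxes):
                covered += 1
            x += step
        y += step
    return covered / total if total else 0.0

# Boxes gathered from successive frames, mapped onto the retina:
union_boxes = [(0, 0, 6, 10), (5, 0, 10, 10)]
roi = (0, 0, 10, 10)
if coverage_fraction(union_boxes, roi) >= 0.95:   # assumed "sufficient" threshold
    print("done")  # an instruction of "done" can be given to the user/patient
```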
[0082] Another embodiment of the above items would include using time or iterations to provide further input. For example, camera media and/or sensor media could be analyzed only after 10 frames or 0.5 seconds have passed. Any number of media or length of time could be used as the input. Computation may occur for each collected media (image or sensor frame), every 2nd media, every 3rd media, etc., or for the media that occurs every 0.1, 0.2, 0.5, 1, 2, etc. second(s). In other words, an image frame is selected at intervals of a chosen number of frames or seconds.
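A minimal sketch of such interval-based sampling is shown below, assuming a simple frame counter and a monotonic clock. The class name FrameSampler and the particular interval values are illustrative assumptions.

```python
# Illustrative sketch with assumed names: decide whether a newly collected
# media frame should be analyzed, based either on a frame-count interval or
# on an elapsed-time interval, whichever is satisfied first.

import time

class FrameSampler:
    def __init__(self, every_n_frames=None, every_n_seconds=None):
        self.every_n_frames = every_n_frames
        self.every_n_seconds = every_n_seconds
        self.frame_count = 0
        self.last_analysis_time = None

    def should_analyze(self):
        """Return True when the configured frame or time interval has elapsed."""
        self.frame_count += 1
        now = time.monotonic()
        if self.every_n_frames and self.frame_count % self.every_n_frames == 0:
            self.last_analysis_time = now
            return True
        if self.every_n_seconds is not None:
            if (self.last_analysis_time is None
                    or now - self.last_analysis_time >= self.every_n_seconds):
                self.last_analysis_time = now
                return True
        return False

# Analyze every 3rd frame, or at least every 0.5 seconds, whichever comes first.
sampler = FrameSampler(every_n_frames=3, every_n_seconds=0.5)
for frame_number in range(6):
    if sampler.should_analyze():
        print("analyze frame", frame_number)
```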
[0083] Further logic can be applied to ensure that a user and/or patient is not overwhelmed with instructions. In addition to the option of communicating each computed instruction to a user and/or patient immediately, the instructions could be delayed by being added to a list, and this list could be analyzed every 0.1, 0.2, 1, etc. seconds or every 1, 2, 5, 10, 20, etc. media. The analysis would determine the final instruction to be returned to a user. This analysis could include taking an average; a time-, placement-, or value-weighted average; the median; the mode; or any number of statistical analyses to determine the final instruction to the patient/user. A threshold could also be applied that prevents an instruction from being given to a user more often than every 0.25, 0.5, 0.7, 1, 2, etc. seconds. Further logic could also ensure that, before certain instructions are played to a user and/or patient, other instructions are played first. For example, before communicating to a patient to “look left” or “look right”, the device could communicate to a user to “pull back and restart” and, optionally, wait before finally communicating to the patient to “look left” or “look right”. Alternatively, these instructions could be provided to a user and/or patient in real time, in series or simultaneously, with no delay or logical separation. As described previously, these outputs could be communicated through audio, visual, and/or haptic feedback using a variety of mediums. Additionally, any delay in an instruction could be implemented based on time or based on conditions identified from the outputs of sensors, camera(s), and/or models.
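The following sketch illustrates one of the statistical analyses mentioned above, taking the mode of the buffered instructions and enforcing a minimum gap between emitted instructions. The class name, the 0.5-second gap, and the example instruction stream are assumptions for illustration only.

```python
# Illustrative sketch with assumed parameter values: buffer computed
# instructions and emit only the most frequent one (the mode), no more often
# than a minimum interval, so the user is not overwhelmed.

import time
from collections import Counter

class InstructionBuffer:
    def __init__(self, min_gap_seconds=0.5):
        self.pending = []
        self.min_gap_seconds = min_gap_seconds
        self.last_emitted = 0.0

    def add(self, instruction):
        """Queue an instruction computed for one media frame."""
        self.pending.append(instruction)

    def flush(self):
        """Return the mode of buffered instructions if enough time has passed,
        otherwise return None and keep accumulating."""
        now = time.monotonic()
        if not self.pending or now - self.last_emitted < self.min_gap_seconds:
            return None
        instruction, _count = Counter(self.pending).most_common(1)[0]
        self.pending.clear()
        self.last_emitted = now
        return instruction

buffer = InstructionBuffer(min_gap_seconds=0.5)
for computed in ["up", "up", "left", "up"]:
    buffer.add(computed)
print(buffer.flush())  # -> "up" (the most frequent buffered instruction)
```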
[0084] In another embodiment, a combination of computer vision and/or deep learning model(s) and/or logic is used to help a user take or collect acceptable media. This algorithm may use the gyroscope, accelerometer, and other sensors and information from the device, as well as the video feed, to provide prompts to a user on how to position the phone (described earlier). The processing step may contain an algorithm that may be separated into two or more parts. For example, the first part may be navigating to the retina and identifying the optic nerve, and the second part may be the smaller movements needed to bring the retina into the right place. A deep learning algorithm or other algorithms may be used to track the movement of the optic nerve. Computer vision algorithms or other algorithms may be used to track items that can be presented to a secondary user, who then relays them to a user/patient. For example, to navigate to the retina and the optic nerve, computer vision tools may be used to detect features in the image that indicate a specific command. As discussed previously, these commands can be given at different time intervals and in different forms, including voice prompts, visual prompts such as a game user interface, vibrations, haptics, or other methods. Once the retina or optic nerve is identified, either can be tracked, and prompts may be given based on the combined movements after a set amount of time or once a goal is reached. Other structures or regions may be identified and tracked without limitation as to the number, size, or type of structures or regions.
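A minimal sketch of the two-part structure described above follows. The helper names detect_optic_nerve and fine_instruction are hypothetical placeholders standing in for the detection and tracking algorithms; the state names and prompts are likewise assumptions.

```python
# Illustrative sketch of a two-phase guidance loop: a coarse phase that
# searches for the optic nerve, followed by a fine phase that issues the
# smaller corrective movements.  `detect_optic_nerve` and `fine_instruction`
# are placeholders, not functions of any real library.

def guidance_step(frame, state, detect_optic_nerve, fine_instruction):
    """Return (next_state, instruction) for one video frame."""
    if state == "coarse":
        detection = detect_optic_nerve(frame)
        if detection is None:
            return "coarse", "move closer"      # still searching for the retina
        return "fine", "hold steady"            # optic nerve found; switch phases
    if state == "fine":
        return "fine", fine_instruction(frame)  # small movements toward the target
    return state, None

# Toy usage with stand-in detectors:
state, instruction = guidance_step(
    frame=None,
    state="coarse",
    detect_optic_nerve=lambda f: (120, 300),   # pretend the nerve was found
    fine_instruction=lambda f: "up",
)
print(state, instruction)  # -> fine hold steady
```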
[0085] The algorithm may keep track of where any object is located in position space using any and all of the information it is provided. It may use simultaneous localization and mapping (SLAM) or other positioning algorithms to estimate the device’s position relative to the patient.
[0086] The Guidance System may output all the collected ocular media (FIG. 1, reference number 7) in any number of storage formats, such as individual images, video, DICOM, JSON, XML, etc., or other single-media or multimedia formats. The outputted ocular media could be one or more media, each with a different purpose. In one embodiment, the device could output a video of all the camera media collected or collated into one video file. In another embodiment, the device could output a video of all the camera media that is considered “acceptable” or that has been filtered using predefined “acceptability” criteria (FIG. 13, reference numbers 8 and 9). In another embodiment, the device could output a single image that best meets a predefined set of “acceptability” criteria. In another embodiment, the device could output a video of all the media collected or collated into one video file, along with data containing pointers to particular frames or timestamps within the video file where media that meets predefined “acceptability” criteria is located. In another embodiment, any combination of the above outputs could be produced. The types, amounts, and criteria of outputted ocular media and/or output data described here are not exhaustive, and additional media or output data may be added.
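The sketch below illustrates one possible layout for the embodiment in which pointers to “acceptable” frames are saved alongside the video file. The field names (video_file, acceptable_frames, frame_index, timestamp_s) and the example values are assumptions introduced here for illustration and do not reflect a format required by the disclosure.

```python
# Illustrative sketch with an assumed metadata layout: write a JSON file that
# points to the frames within an output video that met the predefined
# "acceptability" criteria.

import json

acceptable_frames = [
    {"frame_index": 42, "timestamp_s": 1.4, "criteria": "3rd order clarity"},
    {"frame_index": 97, "timestamp_s": 3.2, "criteria": "optic disc in focus"},
]

output = {
    "video_file": "ocular_capture.mp4",   # assumed file name
    "acceptable_frames": acceptable_frames,
}

with open("ocular_capture_metadata.json", "w") as f:
    json.dump(output, f, indent=2)
```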
[0087] Furthermore, the Guidance System may optionally or additionally output any variety of data, including data from the various sensors (distance, touch, moisture, force, magnetic, temperature, light, GPS, etc.) or computed values (pre-processed data, image or data statistics, model outputs, bounding boxes, classification labels, pointers to images that meet predefined criteria, etc.), and save the raw, pre-processed, or computed data to one or more files. The format of these files may be text or binary, and may be a raw format or one that supports multiple labeled entries, such as JSON, XML, CSV, etc.
[0088] The Guidance System can continue to provide guidance to the user until predefined criteria are met (FIG. 13, reference numbers 8 and 9), at which point the user is notified (FIG. 13, reference number 10). Any number of criteria can be established. In one embodiment, the predefined criteria may be defined to include portions of the patient’s outer eye region, ensuring that the eyebrows, nose, iris, etc., are in-focus either collectively or individually. Alternatively, instead of or in addition to being in-focus, the predefined criteria may be based on any number of characteristics, whether used individually or together, including image clarity, image focus, image illumination, 1st, 2nd, 3rd, or 4th order clarity, image noise level, structures identified, a region or combination of regions that meets one or more of the previous items, etc. (collectively, characteristics). In another embodiment, predefined criteria may be defined to include any combination of the patient’s outer eye (nose, eyelid, eyebrows, iris, etc.) with the inner eye (retina, optic nerve, retinal reflex, etc.), ensuring that they are in-focus or meet any other characteristic either collectively or individually. In another embodiment, predefined criteria may be defined to include portions of the patient’s inner eye, ensuring that the optic nerve, fovea, arcades, and/or quadrants of the eye or portions thereof are in-focus or meet any other characteristic either collectively or individually. In an exemplary embodiment, the predefined criteria may include the union of bounding boxes across the previous media having 3rd order clarity that covers the circular region two optic disc diameters around the fovea. In another embodiment, the criteria could be the square region three optic disc diameters in length centered around the optic disc. Another embodiment could include the square region 5 mm in length with the lower left corner starting 1 mm to the upper left of the optic nerve. The criteria could include any number or combination of characteristics around or in relation to any number or combination of internal or external eye structures.
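A sketch of the exemplary circular criterion follows, under stated assumptions: bounding boxes and the fovea location share a retina-fixed coordinate frame, “two optic disc diameters around the fovea” is taken as the circle’s radius, the optic disc diameter value is invented for the example, and coverage is approximated on a grid of sample points.

```python
# Illustrative sketch under assumed conventions: check whether the union of
# 3rd order clarity bounding boxes covers a circular region around the fovea.
# The radius convention, coordinate frame, grid approximation, and the
# optic disc diameter value below are assumptions for illustration only.

import math

def circle_covered(clarity_boxes, fovea_xy, radius, step=0.25):
    """Approximate check that every grid point inside the circle lies in at
    least one bounding box (x_min, y_min, x_max, y_max)."""
    fx, fy = fovea_xy
    y = fy - radius
    while y <= fy + radius:
        x = fx - radius
        while x <= fx + radius:
            if math.hypot(x - fx, y - fy) <= radius:
                if not any(bx0 <= x <= bx1 and by0 <= y <= by1
                           for bx0, by0, bx1, by1 in clarity_boxes):
                    return False
            x += step
        y += step
    return True

fovea = (5.0, 5.0)
optic_disc_diameter = 1.8            # assumed value, same units as the boxes
boxes = [(0, 0, 10, 10)]             # union of boxes from previous media
print(circle_covered(boxes, fovea, radius=2 * optic_disc_diameter))  # -> True
```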
[0089] Predefined criteria (FIG. 13, reference numbers 8 and 9) may also be defined to include any combination of characteristics from non-camera data sensors (distance, touch, moisture, force, magnetic, temperature, light, GPS, etc.) in combination with the ocular media.
[0090] Once predefined criteria are met, the device may notify the user (FIG. 13, reference number 10). Predefined criteria need not be limited to one set of criteria, but may include any criteria or sets of criteria. The method of notifying the user can be any of the methods or combination of methods previously described for providing human-interpretable instructions to the user (visual, audio, haptic, etc.). Once notified, the user may indicate to the device to stop the guidance. In another embodiment, the user may not respond to the communication and may continue with the guidance. In another embodiment, the device may automatically stop guidance and notify the user of such. In another embodiment, the device may automatically stop guidance and await the user’s input before taking the next action. In another embodiment, achieving the predefined criteria may result in the user moving to the next eye or continuing to use the device on the same eye. In another embodiment, the user may cease using the device. In another embodiment, achieving a first set of predefined criteria may result in saving media, while achieving a 2nd set of predefined criteria may result in the user completing usage of the device on one eye and moving to the other eye or finishing use of the device altogether. In another embodiment, achieving a first set of criteria may result in communicating feedback to the user about their progress, achieving a 2nd set of predefined criteria may result in saving media, and achieving a 3rd set of predefined criteria may result in communicating to the user that a milestone has been reached. Any number or combination of criteria, responses to the criteria (e.g. delivering media, changing criteria, etc.), or communication methods with the user may be employed while using the device.
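The following sketch illustrates, in simplified form, how multiple independent criteria sets could be mapped to distinct responses such as progress feedback, saving media, and milestone notification. The predicate functions, the coverage metric, and the printed actions are invented placeholders, not the disclosed criteria or responses.

```python
# Illustrative sketch with invented names: map independent sets of predefined
# criteria to distinct responses.  The criteria predicates and actions below
# are placeholders for the model- and sensor-based criteria described above.

def evaluate_criteria_sets(frame_state, criteria_sets):
    """Run each (name, predicate, action) triple and perform the action for
    every criteria set that the current state satisfies."""
    for name, predicate, action in criteria_sets:
        if predicate(frame_state):
            action(frame_state)

criteria_sets = [
    ("progress",  lambda s: s["coverage"] >= 0.25, lambda s: print("25% covered")),
    ("save",      lambda s: s["coverage"] >= 0.50, lambda s: print("saving media")),
    ("milestone", lambda s: s["coverage"] >= 0.95, lambda s: print("eye complete")),
]

evaluate_criteria_sets({"coverage": 0.6}, criteria_sets)  # prints the first two actions
```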
[0091] The device may deliver the media through a variety of means (FIG. 1, reference number 8). The device may save the captured media to persistent storage and/or transient storage (e.g. Random Access Memory (RAM), etc.). This storage could be located on the device (e.g. hard disk, flash, RAM, etc.), in removable storage (flash memory, magnetic storage, optical storage, etc.), or in remote storage (cloud storage provider, local network storage, etc.), each in communication with the device.
[0092] Although the present invention has been described in detail with regard to the preferred embodiments and drawings thereof, it should be apparent to those of ordinary skill in the art that various adaptations and modifications of the present invention may be accomplished without departing from the spirit and the scope of the invention. Accordingly, it is to be understood that the detailed description and the accompanying drawings as set forth hereinabove are not intended to limit the breadth of the present invention.