SYSTEM AND METHOD FOR AUDIO ANNOTATION
Technical Field
[0001] The present invention relates to a system and method for audio annotation on a document.
Background of Invention
[0002] Annotation or note-taking while reading a digital document is a useful tool for document management, comprehension and the like. Annotation in a desktop device environment has its challenges and is made more difficult on mobile devices such as mobile phones and tablets.
[0003] On mobile devices, annotation is difficult when the user is away from their desk or without a stylus, for example. Further, in existing annotation systems, the user may have to carefully touch or select the first word to be highlighted, drag to select the rest of the passage, and then use a virtual keyboard to type the annotation associated with the selected text.
[0004] While existing systems such as Dynomite™, Evernote™ and NiCEBook™ facilitate the structuring, organising and sharing of textual annotations, and others like Sononcent™, OneNote™ and Notability™ enable the reader to make annotations on digital documents by using voice as input, in each case the user is required to select text segments by hand or with a mouse and then anchor a voice annotation to the selection.
[0005] A problem with these types of arrangements, whether desktop or mobile, is that the user must manually append the text passages with the recorded voice annotations. The action of preparing to take the annotation draws the reader's mind away from the annotation they wish to make and instead results in them focusing on the mechanism by which to record the annotation. This has the undesirable effect of adversely affecting their ability to comprehend the reading material.
[0006] Other difficulties associated with existing annotation taking systems are the different contexts of use, as well as the ways in which users hold the device - which can act to further hinder the annotation taking task.
[0007] It would be desirable to provide a system which ameliorates or at least alleviates one or more of the above-mentioned problems or provides a useful alternative.
[0008] A reference herein to a patent document or other matter which is given as prior art is not to be taken as an admission that that document or matter was known or that the information it contains was part of the common general knowledge as at the priority date of any of the claims.
Summary of Invention
[0009] According to a first aspect, the present invention provides a system for anchoring an audio annotation to a passage within an electronic document, the system including: a controller having a processor; a recording component operable by the controller, the recording component including a microphone and an eye-tracker component to capture the gaze of the user, wherein the processor carries out the steps of: in response to an audio input to the microphone, while the audio input is being received, evaluating via the eye-tracker component the user's gaze, thereby determining the passage in the document that the user's gaze is directed to; and mapping the audio input to the passage in the document.
[0010] Advantageously, the present invention provides an interactive gaze assisted audio annotation taking system which enables one or more users to implicitly annotate (which may include voice annotations or voice to text annotations) passages on a document in a seamless manner. The passage in the document may be text, or may be non-text such as a figure, table, chart, picture or the like, or the text may refer to the text associated with a table or graphic (such as a label). Implicit annotation refers to tagging or annotating passages based on natural user behaviours. In the present invention, the behaviour may correspond to the user's gaze activity and the temporal order in which they read the document. Advantageously, the ability to implicitly tag or annotate does not require the user to deliberately perform any action to associate the annotation with the relevant passage.
[0011] The present invention may take the form of a PDF viewer application in which the present invention is embedded, with an eye-tracker component to evaluate a user's gaze and facilitate the annotation of audio with reference to passages, thereby enabling the user to make an annotation at a place in a document accurately and without conscious effort.
[0012] In a further advantage, audio input is hands-free, making it suitable for interaction with mobile devices and users can also make the annotation without taking their attention away from the document.
[0013] Audio input may take any suitable form by way of a microphone and may include natural language processing to make translation between speech and text fast and accurate.
[0014] The present invention allows the task of audio annotation to be seamless, making it easy and convenient for users to interact with digital text on their device by utilising the user's gaze as a resource for anchoring the audio annotation to passages.
[0015] In an embodiment, the evaluation includes determining position data associated with a slider on the electronic document. The position data may include the position of the slider in the document and the page number of the document at the time the audio input was received.
[0016] In an embodiment, the evaluation includes determining fixation gaze data, the fixation gaze data being data that is observed during a window spanning the audio annotation. The fixation gaze data may include one or more gaze points observed in the window and grouped into fixations according to a dispersion and/or duration threshold.
[0017] In an embodiment, the evaluation includes a machine learning component trained on one or more of gaze and/or temporal features of one or more users that reflect the reading and annotation-taking patterns of the user. Any suitable temporal feature may be recorded, and may include, for example, the duration the user has spent reading a passage and the temporal order within which the passage has been read before recording an annotation, or the like.
[0018] In an embodiment, indicia may be displayed for the audio mapped to the passage in the document. A highlight may be provided in the relevant passage of the document upon user engagement with the indicia.
[0019] According to a second aspect, the present invention provides a method for anchoring an audio annotation to a passage within an electronic document, the method including: receiving an audio input to a microphone, and while the audio input is being received, evaluating via an eye-tracker component the user's gaze, thereby determining the passage in the document that the user's gaze is directed to, and mapping the audio input to the passage in the document.
Brief Description of Drawings
[0020] The invention will now be described in further detail by reference to the accompanying drawings. It is to be understood that the particularity of the drawings does not supersede the generality of the preceding description of the invention.
[0021] Figure 1 is a schematic diagram of an example network that can be utilised to give effect to the system according to an embodiment of the invention;
[0022] Figure 2 is a diagram illustrating devices that may be utilised with the system and method of the present invention;
[0023] Figure 3 is a schematic diagram illustrating operation of the system and method of the present invention;
[0024] Figure 4 is a schematic diagram illustrating operation of the system and method of the present invention in use by a user; and
[0025] Figure 5 is a flow diagram illustrating the process steps adopted by the system and method of the present invention.
Detailed Description
[0026] Referring to Figure 1, there is shown a system 100 for automatically and implicitly mapping audio recordings to specific passages on a digital document, with devices making up the system, in accordance with an exemplary embodiment of the present invention. The system 100 includes one or more servers 120 which include one or more databases 125, and one or more devices 110a, 110b, 110c (associated with a user for example) which may be communicatively coupled to a cloud computing environment 130, “the cloud”, and interconnected via a network 115 such as the internet or a mobile communications network. It will also be appreciated that the system and method may reside on the one or more devices 110a, 110b, 110c. Devices 110a, 110b, 110c may take any suitable form and may include for example smartphones, tablets, laptop computers, desktop computers, server computers, among other forms of computer systems. Each of the devices 110a, 110b, 110c includes a microphone and an eye-tracker component, which will be further described with reference to Figure 2.
[0027] Although “cloud” has many connotations, according to embodiments described herein, the term includes a set of network services that are capable of being used remotely over a network, and the method described herein may be implemented as a set of instructions stored in a memory and executed by a cloud computing platform. The software application may provide a service to one or more servers 120, or support other software applications provided by third party servers. Examples of services include a website, a database, software as a service, or other web services.
[0028] The transfer of information and/or data over the network 115 can be achieved using wired communications means or wireless communications means. It will be appreciated that embodiments of the invention may be realised over different networks, such as a MAN (metropolitan area network), WAN (wide area network) or LAN (local area network). Also, embodiments need not take place over a network, and the method steps could occur entirely on a client or server processing system.
[0029] Figure 2 illustrates the devices 110A, 110B and 110C that are utilised with the system and method of the present invention. Whether on a desktop computer 110A, tablet 110B or mobile device 110C, each of the devices includes a microphone 210A, 210B or 210C and an eye-tracker component 215A, 215B or 215C respectively. The eye-tracker component 215A, 215B or 215C captures the user's gaze on the document that is displayed on the device 110A, 110B or 110C. The microphone 210A, 210B or 210C is provided to capture audio input from the user in order to anchor their audio annotation to a particular passage within a document. The passage in the document may be text, or may be non-text such as a figure, table, chart, picture or the like, or the text may refer to the text associated with a table or graphic (such as a label). In an embodiment, a camera associated with the device and the eye-tracker component 215A, 215B or 215C may be one and the same unit where possible (to save on space in the device or on the display of the device). Any suitable eye-tracker component may be provided, such as a Tobii 4C, although it will be appreciated that any particular eye-tracker may be utilised. Any suitable frequency or frame rate may be captured by the eye-tracker component dependent on system resources, and may be, for example, a frequency of 90 Hz or higher. A lower frequency or frame rate may also be possible (i.e. in the order of 30 Hz). The microphone 210A, 210B or 210C may be an internal microphone associated with the device or may be an external microphone.
[0030] Figure 3 is a schematic diagram illustrating the system 300 of the present invention in operation. A user 305 is associated with an input component 310 which may consist of, for example, a device 110C. The device 110C includes a microphone 210C for receiving audio from the user 305 as well as an eye-tracker component 215C for tracking the gaze of the user 305. Also provided is a gaze feature generation component 315 and machine learning predictive model component 320, a text extraction component 325 and an annotated document 330 which is being viewed by the user on the device 110C.
[0031] In operation, a user 305 can open the digital document 330, which is typically a PDF file (but need not be), on their device 110C. Upon loading the digital document 330, passages from the PDF are extracted via computer vision or the like.
[0032] Extraction of passages from the PDF is carried out by converting the PDF pages to images. The extracted images may then be passed to an optical character recognition engine, for example PyTessBaseAPI, which is a library that provides functions to segment images into text components. The paragraphs are then extracted from the images by requesting paragraph-level components from the engine. The extracted paragraphs are then saved as bounding boxes for use by the system and method of the present invention in anchoring the user annotation to the appropriate passage in the document.
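By way of illustration only, the following is a minimal sketch of how paragraph bounding boxes might be extracted with PyTessBaseAPI; the use of pdf2image for page rendering, the 150 dpi setting and the function name extract_passages are assumptions for the sketch rather than features of the described embodiment.

```python
from pdf2image import convert_from_path          # assumed PDF-to-image step
from tesserocr import PyTessBaseAPI, RIL

def extract_passages(pdf_path, dpi=150):
    """Return paragraph bounding boxes for every page of the PDF.

    Each entry is a dict with 'page' plus the 'x', 'y', 'w', 'h' keys
    returned by the OCR engine at paragraph level (RIL.PARA), in image pixels.
    """
    pages = convert_from_path(pdf_path, dpi=dpi)  # one PIL image per PDF page
    passages = []
    with PyTessBaseAPI() as api:
        for page_number, image in enumerate(pages):
            api.SetImage(image)
            # text_only=True skips non-text regions such as figures
            components = api.GetComponentImages(RIL.PARA, True)
            for _, box, _, _ in components:
                passages.append({"page": page_number, **box})
    return passages
```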
[0033] While the user 305 reads the document, the gaze coordinates are continuously recorded by the eye-tracker component 215C. The user 305 would typically press a recording button provided on an interface associated with the device 110C, or may start the recording by way of a voice command. Upon completion of the audio annotation recording, the gaze coordinates are mapped to the page coordinates to keep track of where the user 305 was looking on the page while reading and making the annotations. The page coordinates are allocated to the various extracted passages and region-based gaze and temporal features are calculated.
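A minimal sketch of the mapping from screen-space gaze samples to the extracted passage boxes is given below; the viewer-state parameters (vertical scroll offset and zoom factor) and the function names are illustrative assumptions, not details taken from the described embodiment.

```python
def screen_to_page(gaze_x, gaze_y, scroll_y, zoom):
    """Convert a screen-space gaze sample to page coordinates,
    assuming the viewer exposes its vertical scroll offset and zoom factor."""
    return gaze_x / zoom, (gaze_y + scroll_y) / zoom

def passage_at(page_x, page_y, page_number, passages):
    """Return the extracted passage whose bounding box contains the gaze point,
    using the boxes saved during passage extraction ('x', 'y', 'w', 'h' per page)."""
    for p in passages:
        if (p["page"] == page_number
                and p["x"] <= page_x <= p["x"] + p["w"]
                and p["y"] <= page_y <= p["y"] + p["h"]):
            return p
    return None  # gaze fell outside all passages (e.g. in the margins)
```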
[0034] It will be appreciated that the audio input from the user may be raw audio data which may be processed to extract and provide the audio annotation associated with the document. For example, raw audio data from the microphone may be filtered to remove noise and/or frequency smoothing may be applied. An audio signal threshold may further be applied (for example a threshold of 26 dB) in order to remove silent segments in the audio input. In addition, audio segments within the audio input having a duration below a threshold amount of time may be discarded.
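The following is a minimal sketch of such pre-processing, assuming the 26 dB figure is interpreted as a threshold below the peak signal level and using librosa for segmentation; the 0.5 second minimum duration is an illustrative value only.

```python
import librosa

def non_silent_segments(audio_path, top_db=26, min_duration=0.5):
    """Return (start, end) times in seconds of audio segments that survive
    the silence threshold and the minimum-duration filter."""
    samples, sample_rate = librosa.load(audio_path, sr=None)
    # split() keeps intervals whose level is within top_db of the peak,
    # i.e. it drops the silent portions of the recording
    intervals = librosa.effects.split(samples, top_db=top_db)
    segments = []
    for start, end in intervals:
        duration = (end - start) / sample_rate
        if duration >= min_duration:             # discard very short segments
            segments.append((start / sample_rate, end / sample_rate))
    return segments
```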
[0035] In an embodiment, for each audio input, a Region-of-Analysis (ROA) may be defined. In some situations, gaze patterns while a user records audio may indicate that the audio is not directly related to the passages where the user was looking while they spoke, but may instead be related to the passages the user had read before recording the annotation. The ROA may therefore extend from the end time of a particular audio input to the end time of the successive audio input under consideration. For example, the ROA for each audio input may include the period of silence from the end of the previous audio input until the end of the present audio input.
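A sketch of how ROAs could be derived from the detected audio segments follows; the assumption that the first ROA begins when the document is opened is an illustrative choice, not stated in the description above.

```python
def regions_of_analysis(segments, document_open_time=0.0):
    """Compute a Region-of-Analysis for each audio annotation.

    segments: chronologically ordered (start, end) times of the audio inputs.
    Each ROA runs from the end of the previous audio input (or the time the
    document was opened, for the first annotation) to the end of the present
    audio input, so it covers the silence before the annotation as well as
    the annotation itself.
    """
    roas = []
    previous_end = document_open_time
    for _, end in segments:
        roas.append((previous_end, end))
        previous_end = end
    return roas
```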
[0036] As noted above, computer vision may be utilised to parse what the user is looking at. This may be combined with ROA allowing the image data frames to be fetched within the ROA in order to better map the gaze data to pixels in the document.
[0037] The machine learning component 320 is preferably pre-trained on a feature vector consisting of gaze features extracted from one or more participants while they read and make audio annotations. This assists the system in predicting the text regions to which a user's audio annotations refer.
[0038] The pre-training of the machine learning component may be via data from a number of users, in which each user reads one or more documents and makes audio annotations while their gaze is recorded. In this case, the users explicitly highlight the passages to which each audio annotation corresponds. The features extracted from the gaze data serve as the input features and the passage highlighted by the user may serve as the ground truth for the machine learning model.
[0039] The text is extracted at component 325 and the predicted passages are highlighted in the document 330, with indicia such as a sound icon being anchored to the top of the predicted passage. The user 305 can then retrieve the recorded audio annotation and visualise the anchored passage by tapping on the indicia displayed beside the passage. Tapping on the indicia plays the recorded audio annotation as well as highlighting the relevant passage.
[0040] It will be appreciated that the annotation may be provided in audio format and/or may be converted into text such that the user 305 has the option of listening to the audio annotation or viewing a speech-to-text translation of the audio annotation highlighted at the relevant point in the document. Advantageously, the audio annotations are anchored to a passage implicitly, in that there is no requirement for the user to manually highlight a passage for annotation.
[0041] The machine learning component 320 predicts the passages which the audio annotation is associated with. The goal of the machine learning component 320 is to map a feature vector of the gaze and temporal features computed from the passages to classify whether an audio annotation was related to a specific passage or not.
[0042] For each extracted passage, the classifier may either predict “not annotated” (i.e. the audio annotation was not made with reference to that particular passage) or “annotated” (i.e. audio annotation was made with reference to that particular passage). To solve this binary classification problem for each passage, the classifier may be trained on the whole dataset.
[0043] As annotation-taking behaviour is an individual activity that might differ from user to user, a generic classifier is preferable (i.e. to avoid overfitting the classifier to the audio annotation behaviour of a particular user). The classifier may be trained by way of a “leave-one-user-out” cross-validation.
[0044] The classifier may take any suitable form; for example, a classifier in the form of a random forest classifier may be provided. The classifier is generally used for predicting a category of a new observation based on the data observed during the training phase, with the goal of the classifier being to classify whether an audio annotation was related to a specific passage or not. This can be done in a number of ways, but for example, by first training the classifier on data from a number of users (for example, 32 users) who are taking annotations while reading a document. A set of gaze and temporal features is extracted from the user's reading and annotation-taking behaviour that are indicative of whether a read passage is to be mapped to an audio annotation. Once the classifier is trained and performing reasonably, it may be utilised to make a prediction for an audio annotation recorded by any user.
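A minimal training sketch consistent with the above, assuming scikit-learn, a random forest classifier and "leave-one-user-out" cross-validation implemented as leave-one-group-out; the file names, feature choices and hyperparameters are placeholders rather than details from the described embodiment.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import LeaveOneGroupOut, cross_val_score

# X: one row per (passage, audio annotation) pair, containing gaze and
#    temporal features such as fixation count, total fixation duration,
#    reading time and the temporal order in which the passage was read.
# y: 1 if the user highlighted that passage as the target of the annotation
#    (ground truth), 0 otherwise.
# groups: the user identifier for each row, so that cross-validation never
#    trains and tests on the same person.
X = np.load("gaze_temporal_features.npy")
y = np.load("annotated_labels.npy")
groups = np.load("user_ids.npy")

classifier = RandomForestClassifier(n_estimators=100, random_state=0)
auc_scores = cross_val_score(classifier, X, y, groups=groups,
                             cv=LeaveOneGroupOut(), scoring="roc_auc")
print("mean AUC across held-out users:", auc_scores.mean())
```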
[0045] The mapping of audio annotations to passages within a document, without any additional information from the user, is carried out by way of analysing gaze behaviour during audio annotation. For example, it has been found that mapping is not as straightforward as merely selecting where the user is looking when speaking, but the machine learning component 320 may be provided to overcome this challenge based on a collected data set, such that anchoring audio annotations to the correct passage may be achieved with a reasonable level of performance, for example an AUC (Area Under the Curve) equal to 0.89.
[0046] In an alternative embodiment, the machine learning component 320 may be replaced by a slider component 320 which, while not as effective, can also provide a suitable result. The slider component 320 maps a user's recorded audio annotations to the reference text by way of a slider value on the electronic document which they are viewing.
[0047] For example, a vertical slider value associated with the document may be retrieved together with a page number of the document at the time that the user started to record the audio annotations. The slider value and the page number can then be used to retrieve the image frame which was being displayed on the display when the user started to record the annotations. The audio annotation is then mapped to the paragraph which was displayed at the top of the image frame.
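A sketch of this slider arrangement is given below; it assumes the slider value has been normalised to the range 0 (top of page) to 1 (bottom of page) and re-uses the per-page passage bounding boxes produced at extraction time, so the normalisation and the field names are assumptions for illustration only.

```python
def passage_for_slider(slider_value, page_number, page_height, passages):
    """Map an annotation to the paragraph displayed at the top of the frame
    shown when recording started (slider arrangement).

    slider_value: vertical slider position, normalised to 0..1.
    passages: dicts with 'page', 'y' and 'h' fields in page coordinates.
    """
    top_of_frame = slider_value * page_height
    # passages on the recorded page whose box is at least partly below
    # the top edge of the displayed frame
    visible = [p for p in passages
               if p["page"] == page_number and p["y"] + p["h"] > top_of_frame]
    # the paragraph closest to the top of the displayed frame
    return min(visible, key=lambda p: p["y"], default=None)
```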
[0048] In a further embodiment, the machine learning component 320 may be replaced with a gaze data component 320, which uses gaze data observed and analysed during a window spanning the audio annotation. The gaze points observed in this window may be clustered into "fixation" groups according to a dispersion threshold (by setting the dispersion and duration threshold parameters to, for example, 200 and 100 respectively). The fixation points may be assigned to the nearest passage. The fixation count may be counted as one gaze feature for each passage. In this way the audio annotation is primarily mapped to the passage at which the user has looked the most whilst speaking.
[0049] Fixations may be generated by way of a Dispersion-Threshold Identification algorithm or the like. The Dispersion-Threshold Identification algorithm produces accurate results using only two parameters, a dispersion threshold and a duration threshold, which may be set to 20 and 100 respectively. It will be appreciated that not all fixations will fall within passages, due to calibration offsets and tracking errors. In that regard, each fixation outside the passages may be assigned to the nearest extracted passage by using hierarchical clustering.
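A minimal sketch of Dispersion-Threshold Identification follows, using the 20 and 100 parameter values mentioned above; the assumption that dispersion is measured in pixels and duration in milliseconds is illustrative.

```python
def idt_fixations(samples, dispersion_threshold=20, duration_threshold=100):
    """Dispersion-Threshold Identification (I-DT) over chronologically
    ordered gaze samples given as (t, x, y) tuples.

    A candidate window must span at least duration_threshold; if its
    dispersion (max x - min x) + (max y - min y) stays within
    dispersion_threshold, the window is grown until the threshold is
    exceeded and its centroid is recorded as a fixation."""
    def dispersion(window):
        xs = [p[1] for p in window]
        ys = [p[2] for p in window]
        return (max(xs) - min(xs)) + (max(ys) - min(ys))

    fixations = []
    i = 0
    while i < len(samples):
        # initial window covering the minimum fixation duration
        j = i
        while j < len(samples) and samples[j][0] - samples[i][0] < duration_threshold:
            j += 1
        if j >= len(samples):
            break
        if dispersion(samples[i:j + 1]) <= dispersion_threshold:
            # grow the window while dispersion stays within the threshold
            while j + 1 < len(samples) and dispersion(samples[i:j + 2]) <= dispersion_threshold:
                j += 1
            window = samples[i:j + 1]
            centroid_x = sum(p[1] for p in window) / len(window)
            centroid_y = sum(p[2] for p in window) / len(window)
            fixations.append((window[0][0], window[-1][0], centroid_x, centroid_y))
            i = j + 1
        else:
            i += 1
    return fixations
```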
[0050] A comparison between the three approaches, a slider arrangement, a fixation arrangement and a machine learning arrangement, is provided in Table 1 below, whereby the machine learning arrangement achieves a better result than the slider arrangement and the fixation arrangement.
[0051] The slider and fixation arrangements are limited but useful arrangements. For the slider arrangement, a user must explicitly position the slider at the appropriate point in the document (which would require non-passive effort from the user). The fixation arrangement requires the user to explicitly look at and fixate on (again requiring non-passive effort from the user) the passage which is to be mapped with the recorded audio annotation. The machine learning arrangement, when trained correctly on features that capture the broad reading and annotation taking patterns of users, allows more efficient mapping of audio annotations to passages.
[0052] Depending on the application, the slider arrangement or fixation arrangement may also be provided where, for example, a more simplified arrangement is desired. A slider would be an option when the eye-tracker component is not available or offline. The fixation arrangement may be useful if the user does not want the algorithm to decide and the user wants to direct the system to map an audio annotation to a passage.
[0053] Figure 4 is a schematic diagram illustrating the steps carried out in operation from the point of view of a user using the system and method of the present invention. In operation, software associated with the system 400 is running on the device 110C. At 405 the user clicks the start button on the display of the device 110C, via a keyboard, or alternatively via speech to text commands, on the document on the screen of the device 110C. The user speaks and the system maps audio annotations to the passages in the document, which in this case is a PDF file. The audio annotation from the user, based on the prediction of the machine learning component, is then anchored to the reference passages which are inferred from their gaze behaviour.
[0054] As shown in Figure 4, the system 400 offers two features, notably recording and retrieval. For example, to record an audio annotation while reading the document, the user either presses the recording button at the left side of the document viewer or, in a possible embodiment, issues a speech to text command, and speaks out loud their annotations in relation to the passage that they are looking at. A prediction is then made by the machine learning component regarding the reference passage based on the user's gaze activity. The audio annotation is saved in a memory and an indicia, which may take the form of a sound icon, may then be provided and displayed beside the reference passage to provide an indication that an annotation has been made, as shown in 415. Clicking on the audio annotation indicia plays the recorded audio annotation and preferably also highlights the referenced passages while playback is occurring. As shown in 410, the reader presses the stop button to finish making a recording, or they may do so by a voice to text command. As shown in 420, the relevant text portion is highlighted as the audio annotation is playing.
[0055] Figure 5 is a flow diagram illustrating a method for anchoring an audio annotation to a passage within an electronic document. The method 500 starts at step 505 where a user (say 305) associated with, for example, a device (i.e. 110C), which may be a mobile phone, tablet, desktop computer or the like, has a document open and a microphone and eye-tracker component in operation associated with the device. At step 505, audio input is received from the user 305 by a microphone 210C associated with the device while they are looking at the device. Control then moves to step 510 in which, while the audio input is being received, the user's gaze is evaluated via the eye-tracker component. Control then moves to step 515 in which the passage in the document that the user's gaze is directed to is determined. Control then moves to step 520 where the audio input is mapped to the passage in the document that the user's gaze is directed to, as determined in step 515.
[0056] The evaluation may occur in a number of different ways. In an embodiment, the evaluation may occur by way of the position of a slider associated with the electronic document, the slider having a value ranging from 0, where 0 is the start of the page, to MAX, where MAX is the end of the page, and this value being used to determine the location of the user's gaze.
[0057] In an alternative, the evaluation may occur by way of fixation gaze data, which is gaze data observed and analysed during a window spanning the audio annotation. The gaze points observed in this window may be clustered into "fixations" according to a dispersion threshold (by setting the dispersion and duration threshold parameters to, for example, 200 and 100 respectively). The fixation points may be assigned to the nearest passage. The fixation count may be counted as one gaze feature for each passage. In this way the audio annotation is primarily mapped to the passage at which the user has looked the most whilst speaking. For the fixation-based approach, a classifier may be utilised (which may be, for example, a logistic regression classifier), but the feature vector in this embodiment consists of the fixation count feature for each passage. Therefore, the system and method of the present invention may predict the passage at which the user has looked the most while recording the annotation, as indicated by the fixation count feature.
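A sketch of this fixation-based alternative is shown below, building on the fixation and passage structures from the earlier sketches; the nearest-passage assignment by centre distance stands in for the hierarchical clustering described above, and the scikit-learn logistic regression and placeholder file names are assumptions.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

def fixation_counts(fixations, passages):
    """Count fixations assigned to each passage within the annotation window.

    fixations: (start_t, end_t, x, y) tuples in page coordinates.
    passages: dicts with 'x', 'y', 'w', 'h' bounding-box fields.
    Each fixation is assigned to the nearest passage by centre distance
    (a simple stand-in for hierarchical clustering)."""
    counts = np.zeros(len(passages))
    for _, _, fx, fy in fixations:
        nearest = min(range(len(passages)),
                      key=lambda k: (passages[k]["x"] + passages[k]["w"] / 2 - fx) ** 2
                                  + (passages[k]["y"] + passages[k]["h"] / 2 - fy) ** 2)
        counts[nearest] += 1
    return counts

# One row per passage with a single fixation-count feature; label 1 if the
# annotation referred to that passage (placeholder training data).
X = np.load("fixation_count_feature.npy").reshape(-1, 1)
y = np.load("annotated_labels.npy")
model = LogisticRegression().fit(X, y)
```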
[0058] Preferably, the evaluation occurs by way of a machine learning component trained on one or more of gaze and/or temporal features that reflect the reading and annotation-taking patterns of the user. In operation, the classifier is trained and then fed a feature vector as described with reference to Figure 3.
[0059] The method may further include the step of providing an indicia, associated with the audio annotation, for display on the document; preferably the indicia is an audio icon. The method may further include the step of determining via the audio input the start and stop of an audio annotation. For example, this may be by way of the user dictating a voice command or the microphone sensing a voice command when the software is being used. The method may further include the step of providing playback of the audio annotation to the user and highlighting the relevant passage while the audio annotation is being played.
[0060] While the invention has been described in conjunction with a limited number of embodiments, it will be appreciated by those skilled in the art that many alternatives, modifications and variations in light of the foregoing description are possible. Accordingly, the present invention is intended to embrace all such alternatives, modifications and variations as may fall within the spirit and scope of the invention as disclosed.
[0061] The present application may be used as a basis or priority in respect of one or more future applications and the claims of any such future application may be directed to any one feature or combination of features that are described in the present application. Any such future application may include one or more of the following claims, which are given by way of example and are non-limiting in regard to what may be claimed in any future application.