SYSTEM AND METHOD FOR DETECTION AND LOCALIZATION OF SURGICAL EVENTS IN A SURGICAL SCENE
FIELD OF THE DISCLOSURE
The present disclosure relates to a computer implemented method, a computer program product and a system for detection and localization of surgical events in a surgical scene.
BACKGROUND OF THE DISCLOSURE
Surgical interventions are complex procedures and involve highly variant interactions of the surgical staff, medical devices, tools and instruments, and patients. As the basis for the next generation of intelligent computer aided surgery systems, the systematic and structured description and analysis of surgical interventions is a crucial component, modeling the events, structure, and interactions during such interventions.
In the past, efforts have been made to describe surgical interventions in a systematic way. Surgical Process Modeling (SPM) has been introduced to provide a simplified structural representation of surgeries, representing surgical events of a surgical workflow by decomposing it into its main phases, individual events and sub-events. The field of Surgical Workflow Recognition deals with the (automated) analysis and detection of surgical events based on video data captured in an operating room and can be employed to provide feedback to the surgical staff. It can furthermore be used to provide alarms in the case of adverse events, facilitate the analysis of surgeries for documentation and surgical training, and can be utilized as the basis for automated surgical guidance and assistance systems. In this context, different representations have been proposed in related works, such as surgical phase descriptions, action triplets, or surgical scene graphs.
While the aforementioned approaches have great potential for the automated description and analysis of surgical workflows, they heavily rely on visual data, such as regular color (RGB) images or 3D data from depth (RGB-D) sensors. Despite the advantages of vision-based systems such as software and hardware availability, they are also associated with drawbacks such as line-of-sight issues and tedious calibration and registration procedures.
In previous work, acoustic signals have been identified to have great potential for the development of novel multimodal sensing solutions for medical applications. Using Acoustic Sensing (AS), surgical events which are associated with the emission of acoustic signals, such as coagulation, suction, sawing, milling, drilling, etc. can be detected in the surgical context using airborne or contact microphones. Known AS systems have been developed for a wide variety of intraoperative surgical applications, e.g., for surgical error prevention during surgical drilling, the analysis of surgical milling, the monitoring of an implant insertion process in orthopedic surgery, analysis of tissue penetration during needle insertion, or tissue classification based on analysis of coagulation sounds in minimally invasive surgery.
However, known approaches to the use of AS in a surgical scene are limited to a mere detection of surgical events but do not offer a suitable solution for a true digital representation of a surgical scene enabling understanding (i.e. analysis) of the surgical scene.
SUMMARY OF THE DISCLOSURE
It is an object of this disclosure to provide a method, a computer program product and a system for detection and localization of surgical events in a surgical scene which do not have at least some of the disadvantages of the prior art. In particular, it is an object of this disclosure to provide a method, a computer program product and a system for detection and localization of surgical events in a surgical scene which provide a true digital representation of the one or more surgical events and the respective locations of the one or more surgical events. According to the present disclosure, these objects are addressed by the features of the independent claims. In addition, further advantageous embodiments follow from the dependent claims and the description.
According to the present disclosure, the above-mentioned objects are particularly achieved by a computer implemented method for detection and localization of surgical events in a surgical scene. In a step of the method, an acoustic signal is received by a computing device, from an acoustic capture device arranged to capture acoustic signals originating from an area of interest of the surgical scene.
Thereafter, one or more surgical events are detected, by the computing device, by identifying acoustic patterns associated with one or more surgical procedures in the acoustic signal emitted by a surgical tool. Alternatively, or additionally, one or more surgical events are detected by identifying acoustic patterns associated with one or more surgical procedures in the acoustic signal emitted by an interaction of a surgical tool with one or more entities in the surgical scene. According to embodiments, the one or more entities (the interaction of which with a surgical tool emits the acoustic signal which is analyzed for occurrence of an acoustic pattern) comprise anatomic body parts, other surgical tools, objects present in a surgical scene - such as an operating table or tool rest surfaces - as well as humans, such as surgical staff.
Having detected one or more surgical events, respective locations of the one or more surgical events is/are determined, by the computing device, relative to a spatial model of the surgical scene. Alternatively, or additionally, a spatial model is created based on the detected and localized acoustic events. The locations of the one or more surgical events is/are determined by processing of the acoustic signal. In an embodiment, a beamforming method is applied, whereby a multichannel acoustic signal - captured by an acoustic capture device having an array of microphones with a known geometry - is processed by the computing device. After determining the respective locations of the one or more surgical events relative to a spatial model of the surgical scene, a digital representation of the surgical scene is generated by the computing device. The digital representation of the surgical scene comprises digital representations of the one or more surgical events and the respective locations of the one or more surgical events. In particular, generating the digital representation of the surgical scene comprises generating a spatio-temporal digital representation of the surgical scene, encoding the one or more surgical events in relation to the spatial model of the surgical scene.
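By way of a minimal illustrative sketch (not the normative implementation of this disclosure), such a beamforming localization can be outlined in Python with NumPy, assuming free-field sound propagation, a calibrated microphone geometry and a coarse search grid over the area of interest; the function and variable names are invented for illustration:

```python
import numpy as np

SPEED_OF_SOUND = 343.0  # m/s in air at roughly 20 degrees C (assumption)

def delay_and_sum_power(signals, mic_positions, candidate, fs):
    """Steer the array at a candidate source point and return output power.

    signals:       (n_mics, n_samples) multichannel acoustic signal
    mic_positions: (n_mics, 3) known microphone geometry in metres
    candidate:     (3,) hypothesised source location in the spatial model
    fs:            sampling rate in Hz
    """
    dists = np.linalg.norm(mic_positions - candidate, axis=1)
    # Arrival delays relative to the closest microphone, in samples.
    delays = np.round((dists - dists.min()) / SPEED_OF_SOUND * fs).astype(int)
    n = signals.shape[1] - delays.max()
    # Advance each channel by its delay so all channels add coherently.
    aligned = np.stack([sig[d:d + n] for sig, d in zip(signals, delays)])
    return float(np.mean(aligned.sum(axis=0) ** 2))

def localize_event(signals, mic_positions, grid, fs):
    """Return the grid point whose steered beam yields maximum power."""
    powers = [delay_and_sum_power(signals, mic_positions, g, fs) for g in grid]
    return grid[int(np.argmax(powers))]
```

A deployed system would refine this, e.g. with sub-sample delays, a propagation model accounting for reflections, and a finer or adaptive search grid.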
The digital representation of surgical events in the surgical scene is advantageous as it allows tracking of surgical events both in spatial and temporal dimensions. In other words, surgical acoustic events which could merely be detected (but not localized) using known acoustic sensing systems can now be extended with spatial information to enrich the structured description and digital representation of surgical procedures.
The acoustic localization of surgical events according to the present disclosure is advantageous, as it is not delimited by the field-of-view of vision-based systems.
The localization of acoustic events in a surgical scene has great potential for many relevant applications such as surgical context and workflow modeling and prediction.
According to embodiments, semantic data associated with the surgical tool and/or the one or more entities in the surgical scene is determined by processing of the acoustic signal. For example, a type of surgical tool, such as a drill or a saw, is determined based on the acoustic signal, i.e. sound, emitted by the surgical tool. As a further example, a quality of the tissue material (as an entity), such as bone density, is determined by processing of the acoustic signal, i.e. sound, emitted by the interaction of the surgical tool, e.g. a drill or saw, with the tissue.
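As a hedged illustration of such acoustic determination of tool type, the following sketch classifies a mono frame by comparing its per-band spectral energy profile with stored reference profiles; the band edges and the values in TOOL_PROFILES are invented placeholders, not measured tool signatures:

```python
import numpy as np

def band_energies(signal, fs, bands=((0, 500), (500, 2000), (2000, 8000))):
    """Summarise a mono acoustic frame as normalised energy per band.
    The band edges are illustrative, not measured tool characteristics."""
    spectrum = np.abs(np.fft.rfft(signal)) ** 2
    freqs = np.fft.rfftfreq(len(signal), d=1.0 / fs)
    feats = np.array([spectrum[(freqs >= lo) & (freqs < hi)].sum()
                      for lo, hi in bands])
    return feats / (feats.sum() + 1e-12)  # level-invariant profile

# Hypothetical reference profiles, e.g. averaged from labelled recordings.
TOOL_PROFILES = {
    "drill": np.array([0.1, 0.6, 0.3]),
    "saw":   np.array([0.3, 0.3, 0.4]),
}

def classify_tool(signal, fs):
    """Nearest reference profile gives the most likely tool type."""
    feats = band_energies(signal, fs)
    return min(TOOL_PROFILES,
               key=lambda t: np.linalg.norm(feats - TOOL_PROFILES[t]))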
According to embodiments, generating the digital representation of the surgical scene further comprises encoding, within the spatio-temporal digital representation, semantic data and data representative of interactions between the surgical tool and the one or more entities in the surgical scene.
In particular, encoding, within the spatio-temporal digital representation, of semantic data and data representative of interactions and/or relationships between the surgical tool and the one or more entities in the surgical scene comprises generation of a structured representation of the one or more entities in the surgical scene (such as patient anatomies and/or surgical staff), objects (such as the surgical tool), and their interactions with semantic and spatial encoding. A more reliable modeling of surgical activity and context allows for the extension of medical Acoustic Sensing methods with spatial information and the design of novel surgical guidance and assistance systems. Generating a structured representation - such as a non-hierarchical representation (e.g. a graph) and/or a hierarchical representation (e.g. a tree) - as a spatio-temporal digital representation of a surgical scene has several advantages, especially as compared to storage of video and audio streams for providing an audiovisual record of a surgical scene: a compact representation, reducing the amount of occupied storage space and required processing power for faster, more efficient storage, retrieval and processing; and efficient and reliable retrieval of surgical events, their locations, semantic data associated with entities of the surgical scene, as well as the interactions and/or relationships between these, from the digital representation, without having to re-process an audiovisual record.
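One possible encoding of such a structured spatio-temporal representation is sketched below; the entity and event fields (ids, timestamps, participant lists, parameter dictionaries) are illustrative assumptions rather than a normative schema of the disclosure:

```python
import math
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Tuple

@dataclass
class Entity:
    entity_id: str                           # e.g. "drill-1", "femur"
    entity_type: str                         # semantic class of the entity
    location: Optional[Tuple[float, float, float]] = None  # in spatial model

@dataclass
class SurgicalEvent:
    event_type: str                          # e.g. "drilling"
    timestamp: float                         # seconds since procedure start
    location: Tuple[float, float, float]     # relative to the spatial model
    participants: List[str]                  # entity ids in the interaction
    parameters: Dict[str, float] = field(default_factory=dict)  # e.g. {"rpm": 1200}

@dataclass
class SceneGraph:
    """Spatio-temporal digital representation: entities plus located,
    time-stamped events, queryable without re-processing any recording."""
    entities: Dict[str, Entity] = field(default_factory=dict)
    events: List[SurgicalEvent] = field(default_factory=list)

    def events_near(self, point, radius):
        """Spatial query: all events within `radius` of `point`."""
        return [e for e in self.events
                if math.dist(e.location, point) <= radius]
```

Such a graph stores only events, locations and semantics rather than raw audiovisual streams, which is where the storage and retrieval advantages noted above come from.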
It is an object of further embodiments to track surgical events with reference to a prescribed surgical workflow. This object is addressed according to embodiments, wherein a prescribed surgical workflow is retrieved by the computing device, the prescribed surgical workflow comprising data indicative of one or more prescribed workflow steps. The prescribed surgical workflow is retrieved by the computing device from a datastore comprised by and/or communicatively connected to the computing device. Thereafter, a current workflow step is identified, by the computing device, amongst the one or more prescribed workflow steps corresponding to one or more of the detected surgical events. In particular, a current workflow step is identified by comparing acoustic patterns associated with the one or more prescribed workflow steps with acoustic patterns associated with the detected surgical events.
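A minimal sketch of such a comparison follows, assuming each prescribed workflow step is annotated with the set of acoustic pattern labels expected during that step; the step names and pattern labels are invented for illustration:

```python
def identify_current_step(workflow_steps, detected_patterns):
    """Return the latest prescribed step whose expected acoustic patterns
    overlap the patterns detected in the acoustic signal.

    workflow_steps:    ordered list of dicts with a "name" and the set of
                       acoustic pattern labels expected during that step
    detected_patterns: labels of recently detected surgical events
    """
    observed = set(detected_patterns)
    current = None
    for step in workflow_steps:
        if step["patterns"] & observed:
            current = step["name"]  # keep the last matching step in order
    return current

steps = [
    {"name": "incision",    "patterns": {"scalpel-cut"}},
    {"name": "drilling",    "patterns": {"drill-running", "drill-bone-contact"}},
    {"name": "coagulation", "patterns": {"coagulation-hiss"}},
]
print(identify_current_step(steps, ["drill-running"]))  # -> "drilling"
```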
According to an embodiment, the method further comprises determining a current state within the prescribed surgical workflow based on the prescribed surgical workflow and the digital representation of the surgical scene. The current state may comprise an indication of a progress of the prescribed surgical workflow, e.g. by indication (such as a progress bar) of the current workflow step with reference to all workflow steps of the prescribed surgical workflow.
Alternatively, or additionally, the method further comprises detecting discrepancies between the prescribed workflow steps of the prescribed surgical workflow and the one or more surgical events based on the prescribed surgical workflow and the digital representation of the surgical scene, thereby enabling detection of a surgical step being carried out which does not correspond to the prescribed surgical workflow (be it on purpose or inadvertently).
Alternatively, or additionally, the method further comprises generating one or more control commands for controlling a surgical system - communicatively connected to the computing device - based on the prescribed surgical workflow and the digital representation of the surgical scene. In particular, the control commands for controlling a surgical system are generated based on control commands associated with the current workflow step, e.g. control of a communicatively connected support system like a surgical robot or surgical navigation system.
Alternatively, or additionally, the method further comprises generating user guidance for surgical personnel based on the prescribed surgical workflow and the digital representation of the surgical scene. The user guidance may comprise information retrieved by the computing device in accordance with the current workflow step, such as operating instructions of the surgical tool.
It is an object of further embodiments to enable monitoring of the quality of the surgical procedure being carried out, a quality reflected by one or more event parameters. This object is addressed, according to further embodiments, by determining, representing and comparing event parameters with prescribed event parameters. First, one or more event parameters corresponding to the one or more surgical events are determined, by the computing device, by processing of the acoustic signal. For example, the one or more event parameters are determined in the context of the prescribed surgical workflow (e.g. sawing at a prescribed stage of a surgery). As a further example, a drill breakthrough is detected based on the frequency of the noise emitted by the drill. Thereafter, the one or more event parameters are represented in the digital representation of the surgical scene.
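As a hedged illustration of such event parameter extraction, the sketch below estimates a drill's rotational speed from the dominant spectral peak and flags a possible breakthrough from a sudden upward jump of that peak; the assumption that the fundamental tracks the rotation rate, and the jump_ratio threshold, are illustrative simplifications:

```python
import numpy as np

def dominant_frequency(frame, fs):
    """Strongest spectral peak of a windowed mono frame, in Hz."""
    spectrum = np.abs(np.fft.rfft(frame * np.hanning(len(frame))))
    freqs = np.fft.rfftfreq(len(frame), d=1.0 / fs)
    return float(freqs[np.argmax(spectrum)])

def drill_rpm(frame, fs):
    """Rotational speed, assuming the fundamental tracks the rotation rate."""
    return dominant_frequency(frame, fs) * 60.0

def breakthrough_suspected(prev_frame, cur_frame, fs, jump_ratio=1.4):
    """Heuristic: an unloaded drill spins up abruptly after losing
    resistance, so a sudden upward jump of the fundamental is suspicious."""
    f_prev = dominant_frequency(prev_frame, fs)
    f_cur = dominant_frequency(cur_frame, fs)
    return f_prev > 0 and f_cur > jump_ratio * f_prev
```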
Furthermore, one or more prescribed parameters corresponding to a surgical event are retrieved, by the computing device, e.g. from a datastore comprised by or communicatively connected to the computing device. According to embodiments, the prescribed parameters comprise discrete values, such as discrete numerical values (e.g. the number of incisions to be made in a tissue). Alternatively, or additionally, the prescribed parameters comprise ranges, such as numerical ranges (e.g. a range of rotation speed of a drill). Thereafter, deviations of the one or more event parameters from the one or more prescribed parameters are detected by the computing device using the digital representation of the surgical scene.
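The comparison of determined event parameters against prescribed discrete values and ranges could look like the following sketch; the parameter names and limits are placeholders, not prescribed clinical values:

```python
def check_event_parameters(event_params, prescribed):
    """Compare measured event parameters against prescribed values/ranges.

    prescribed maps a parameter name either to a single discrete value
    (e.g. number of incisions) or to a (low, high) range (e.g. drill rpm).
    Returns a list of human-readable deviations.
    """
    deviations = []
    for name, rule in prescribed.items():
        if name not in event_params:
            deviations.append(f"{name}: not observed")
            continue
        value = event_params[name]
        if isinstance(rule, tuple):           # numerical range
            low, high = rule
            if not (low <= value <= high):
                deviations.append(f"{name}: {value} outside [{low}, {high}]")
        elif value != rule:                   # discrete prescribed value
            deviations.append(f"{name}: {value} != prescribed {rule}")
    return deviations

print(check_event_parameters({"rpm": 2600, "incisions": 2},
                             {"rpm": (800, 2400), "incisions": 2}))
```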
It is an object of embodiments to further enable detection and localization of surgical tools in addition to detection and localization of surgical events. To address this object, according to embodiments, a surgical tool corresponding to one or more of the surgical events is identified by comparing the acoustic signal with a collection of one or more acoustic fingerprints associated with one or more types of surgical tools. Having identified a surgical tool corresponding to one or more of the surgical events, the location of the surgical tool is tracked in the surgical scene by processing the acoustic signal. It is an object of embodiments to not only track the location of surgical tools within the surgical scene but to also enable monitoring of the steps of use. To address this object, according to embodiments, a tool-specific surgical workflow corresponding to the identified surgical tool is retrieved. The computing device retrieves the tool-specific surgical workflow from a datastore comprised by or communicatively connected thereto. The tool-specific surgical workflow comprises data indicative of one or more prescribed tool-specific workflow steps. Thereafter, discrepancies between the prescribed tool-specific workflow steps of the tool-specific surgical workflow and the one or more surgical events are detected by the computing device.
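A minimal sketch of such fingerprint matching is shown below, using a coarse normalised magnitude-spectrum profile and cosine similarity; the binning and the 0.8 similarity threshold are assumptions, and a deployed system might use more discriminative acoustic features:

```python
import numpy as np

def spectral_fingerprint(signal, fs, n_bins=64):
    """Coarse magnitude-spectrum profile used as an acoustic fingerprint."""
    spectrum = np.abs(np.fft.rfft(signal))
    chunks = np.array_split(spectrum, n_bins)
    profile = np.array([c.mean() for c in chunks])
    return profile / (np.linalg.norm(profile) + 1e-12)

def identify_tool(signal, fs, fingerprint_db):
    """Return the tool type whose stored fingerprint correlates best,
    or None if nothing exceeds a (tunable) similarity threshold."""
    probe = spectral_fingerprint(signal, fs)
    best_tool, best_score = None, 0.8  # assumed minimum cosine similarity
    for tool, reference in fingerprint_db.items():
        score = float(np.dot(probe, reference))
        if score > best_score:
            best_tool, best_score = tool, score
    return best_tool
```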
It is an object of embodiments to ensure proper spatial use of surgical tools within a patient anatomy. To address this object, according to embodiments, one or more prescribed locations associated with the identified surgical tool are retrieved (from a datastore comprised by or communicatively connected to the computing device). In particular, the prescribed locations define positions of the surgical tool relative to a patient anatomy. Thereafter, erroneous placement of the surgical tool is detected by detecting a deviation of the location of the surgical tool from the prescribed locations. For example, for a surgical drill, the prescribed location defines the location of the drill with respect to a bone structure as a maximum depth within the bone to avoid an undesired bone breakthrough. A bone breakthrough or an imminent bone breakthrough is detectable by analyzing the acoustic signal emitted by the interaction of the drill with the bone structure.
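As an illustration of such a spatial check, the sketch below measures drilling depth as the projection of the localized tool tip onto a planned drill axis and compares it against a prescribed maximum depth; the planned entry point and axis are assumed to be available from the spatial model:

```python
import numpy as np

def drilling_depth(tool_tip, entry_point, drill_axis):
    """Depth of the tool tip along the planned drill axis, in metres."""
    axis = np.asarray(drill_axis, dtype=float)
    axis /= np.linalg.norm(axis)
    return float(np.dot(np.asarray(tool_tip) - np.asarray(entry_point), axis))

def placement_error(tool_tip, entry_point, drill_axis, max_depth):
    """True if the localized tool tip exceeds the prescribed maximum depth."""
    return drilling_depth(tool_tip, entry_point, drill_axis) > max_depth
```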
It is an object of embodiments to not only track the location of surgical tools within the surgical scene but to also enable monitoring of parameters of their use. To address this object, according to embodiments - prescribed operating parameters associated with the identified surgical tool are retrieved. Thereafter, deviations of an operation of the identified surgical tool from the prescribed operating parameters are detected by processing the acoustic signal.
It is an object of further embodiments to provide a multimodal solution to detection and localization of surgical events in a surgical scene to improve versatility, redundancy, detail level and/or quality. To address this object, according to embodiments - in addition to the acoustic-based localization according to any of the embodiments disclosed herein - a visual signal of the surgical scene is received by the computing device from a visual capture device such as a video camera. The visual capture device is directed to the area of interest of the surgical scene and communicatively connected to the computing device.
One or more of the surgical events are then detected further based on processing the visual signal. In other words, the acoustic signal and the visual signal are both processed to reliably and precisely detect the surgical events.
Alternatively, or additionally, the location of the surgical tool is determined further based on processing of the visual signal. Alternatively, or additionally, the location of the interaction of the surgical tool with one or more entities is determined further based on processing of the visual signal. Such embodiments are particularly advantageous as they provide a high degree of flexibility in the use of the acoustic signal in combination with the visual signal. For example, there may be situations where a surgical event is detected based on processing of the acoustic signal - such as operating a drill, which often cannot be distinguished from merely holding the drill solely by processing of visual signals - while the location of the surgical tool is determined by processing the visual signal, or the visual signal complemented by processing of the acoustic signal.
In order to optimize the consideration of the acoustic respectively of the visual signals, according to embodiments, the presence respectively absence of one or more of the surgical events in the visual signal is detected, and the visual signal and/or the acoustic signal is/are selected for detecting the location of the surgical tool and/or the location of the interaction of the surgical tool with the one or more entities, in dependence of the presence respectively absence of the one or more surgical events in the visual signal.
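A minimal sketch of this modality selection, assuming a boolean visual-detection result and location estimates from each modality, could read:

```python
def select_localization(acoustic_estimate, visual_estimate, event_visible):
    """Prefer the visual estimate when the event is present in the visual
    signal (e.g. the tool is in the camera's field of view); otherwise
    fall back to the acoustic estimate, which is not limited by line of
    sight."""
    if event_visible and visual_estimate is not None:
        return visual_estimate, "visual"
    return acoustic_estimate, "acoustic"
```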
According to embodiments, receiving the acoustic signal from the acoustic capture device comprises receiving a plurality of acoustic streams from a plurality of microphones directed to the area of interest of the surgical scene from different directions. Correspondingly, detection of the location of the surgical tool and/or the location of the interaction of the surgical tool with the one or more entities comprises multilateration of the location using the plurality of acoustic streams and positional data of the plurality of microphones.
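The following sketch illustrates such multilateration under simplifying assumptions (synchronized streams, free-field propagation): time differences of arrival (TDOAs) are estimated by cross-correlation against a reference microphone, and a brute-force grid search picks the point whose predicted pairwise delays best match the measurements:

```python
import numpy as np

SPEED_OF_SOUND = 343.0  # m/s (assumption)

def estimate_tdoa(ref, other, fs):
    """TDOA of `other` relative to `ref`, via the cross-correlation peak
    (in seconds; positive means `other` lags `ref`)."""
    corr = np.correlate(other, ref, mode="full")
    lag = int(np.argmax(corr)) - (len(ref) - 1)
    return lag / fs

def multilaterate(streams, mic_positions, fs, grid):
    """Grid search for the point whose predicted pairwise delays best
    match the measured TDOAs (robust, if brute-force, multilateration)."""
    measured = np.array([estimate_tdoa(streams[0], s, fs) for s in streams[1:]])
    best, best_err = None, np.inf
    for point in grid:
        dists = np.linalg.norm(mic_positions - point, axis=1)
        predicted = (dists[1:] - dists[0]) / SPEED_OF_SOUND
        err = float(np.sum((predicted - measured) ** 2))
        if err < best_err:
            best, best_err = point, err
    return best
```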
According to embodiments, detecting one or more surgical events further comprises detecting verbal expressions of medical staff by applying a speech recognition algorithm, in particular a natural language processing algorithm, on the acoustic signal.
It is an object of further embodiments disclosed herein to make training data available for training of a machine learning-based model for detecting and/or localizing a surgical tool and/or an interaction of a surgical tool with one or more entities. This object is addressed by generating simulated acoustic signals associated with a surgical tool type and/or a known interaction of a surgical tool type with one or more entities. In particular, the simulated acoustic signals are generated by the computing device using a multimodal digital twin of the surgical tool and/or a multimodal digital twin of the surgical scene. According to embodiments, the multimodal digital twin comprises data representing acoustic and visual characteristics of the surgical tool and/or the surgical scene. Acoustic characteristics comprise (but are not limited to) the frequency domain of emitted acoustic signals in operation, in standby, and in interaction with other entities, as well as acoustic reflectance respectively absorbency. Visual characteristics comprise the visual appearance, physical dimensions, and material characteristics such as light reflectance, absorbency, etc. The digital twin can capture realistic surgical scenarios but also generate novel surgical scenarios. Realistic multimodal data is generated using the digital twin, e.g. for the training of deep learning-based computer assisted surgery systems.
After generation of the simulated acoustic signals, synthetic training data is generated by labelling the simulated acoustic signals with data indicative of the known surgical tool type or the known interaction of a surgical tool type with the one or more entities. Generation of synthetic training data is particularly advantageous in the surgical field since real-world training data is sparse and difficult to obtain, due to the fact that providing surgical scenes with the required hardware is expensive and the capture of such training data is subject to strict regulations. Furthermore, even if acoustic signals of sufficient quality were available from real-world surgical scenes (which is not the case, as explained before), labelling such real-world acoustic signals would be a resource-intensive task.
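As a hedged illustration of synthetic training data generation, the toy simulation below produces harmonic tool-like signals and attaches class labels; a real multimodal digital twin would model tool, tissue and room acoustics far more faithfully, and the rpm values and class names here are invented:

```python
import numpy as np

def simulate_tool_signal(fs=16_000, seconds=1.0, rpm=1200, snr_db=10):
    """Toy simulation: a rotation-rate fundamental plus harmonics in noise.
    Stands in for the output of a multimodal digital twin."""
    t = np.arange(int(fs * seconds)) / fs
    f0 = rpm / 60.0
    clean = sum((1.0 / k) * np.sin(2 * np.pi * k * f0 * t) for k in (1, 2, 3))
    noise = np.random.randn(len(t))
    # Scale the noise so the signal-to-noise ratio matches snr_db.
    noise *= np.linalg.norm(clean) / (np.linalg.norm(noise) * 10 ** (snr_db / 20))
    return clean + noise

def make_synthetic_dataset(n_per_class=100):
    """Label each simulated signal with the tool type that 'produced' it."""
    dataset = []
    for rpm, label in [(1200, "drill"), (3000, "saw")]:  # invented classes
        for _ in range(n_per_class):
            dataset.append((simulate_tool_signal(rpm=rpm), label))
    return dataset
```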
Having generated the synthetic training data, a machine learning-based model is trained for the purpose of detecting and/or localizing a surgical tool and/or an interaction of a surgical tool with one or more entities using the synthetic training data. In order to make the synthetic training data more realistic, training of the machine learning-based model comprises providing simulated acoustic signals as an input to the machine learning-based model and providing the labels (data indicative of the known surgical tool type or the known interaction of a surgical tool type with the one or more entities) as expected output of the machine learning-based model. For example, the simulated acoustic signals are provided at an input layer of a neural network (of the machine learning-based model) and the labels are provided as an expected output of an output layer of the neural network, the training of the machine learning-based model comprising optimizing of parameters (e.g. weights) of nodes and connections between nodes of the input layer, the output layer and of a plurality of hidden layers of the neural network connecting the input layer and the output layer.
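A minimal training sketch in Python with PyTorch follows, assuming fixed-length spectral feature vectors as inputs and integer class labels as expected outputs; the layer sizes and feature dimension are illustrative, not the architecture of the disclosure. Calling train(ToolEventNet(), features, labels) with tensors of shapes (N, 64) and (N,) would exercise the sketch:

```python
import torch
import torch.nn as nn

class ToolEventNet(nn.Module):
    """Small network mapping a spectral feature vector to a tool/event
    class; a stand-in for whatever architecture is actually used."""
    def __init__(self, n_features=64, n_classes=2):
        super().__init__()
        self.layers = nn.Sequential(
            nn.Linear(n_features, 128), nn.ReLU(),  # hidden layers whose
            nn.Linear(128, 64), nn.ReLU(),          # weights are optimized
            nn.Linear(64, n_classes),               # output layer: class scores
        )

    def forward(self, x):
        return self.layers(x)

def train(model, features, labels, epochs=20, lr=1e-3):
    """Supervised training: simulated signals in, known labels as the
    expected output, weights adjusted by gradient descent."""
    opt = torch.optim.Adam(model.parameters(), lr=lr)
    loss_fn = nn.CrossEntropyLoss()
    for _ in range(epochs):
        opt.zero_grad()
        loss = loss_fn(model(features), labels)
        loss.backward()
        opt.step()
    return model
```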
After training of the machine learning-based model, the received acoustic signal is input to an input of the machine learning-based model. Then, the surgical tool and/or the interaction of the surgical tool with the one or more entities is detected and localized at an output of the machine learning-based model. In an embodiment, the surgical tool and/or the interaction of the surgical tool with the one or more entities is detected first, followed by its localization. Alternatively, the surgical tool and/or the interaction of the surgical tool with the one or more entities is both detected and localized by a single machine learning-based model in an end-to-end fashion. According to embodiments, the method further comprises training a machine learning-based model for filtering out noise, considering sound reflections and/or occlusions in detecting the location of the surgical tool and/or the location of the interaction of the surgical tool with the one or more entities.
The present disclosure further relates to a computer program product for detection and localization of surgical events in a surgical scene, the computer program product comprising instructions which, when executed by a computing device, cause the computing device to carry out the method for detection and localization of surgical events according to any one of the embodiments disclosed herein.
The present disclosure further relates to a system for detection and localization of surgical events in a surgical scene, the system comprising: an acoustic capture device arranged to capture acoustic signals originating from an area of interest of the surgical scene and configured to capture acoustic patterns emitted by a surgical tool and/or by an interaction of a surgical tool with one or more entities associated with one or more surgical events. The system further comprises a computing device communicatively connected to the acoustic capture device, the computing device being configured to carry out the method for detection and localization of surgical events according to any one of the embodiments disclosed herein.
It is to be understood that both the foregoing general description and the following detailed description present embodiments, and are intended to provide an overview or framework for understanding the nature and character of the disclosure. The accompanying drawings are included to provide a further understanding, and are incorporated into and constitute a part of this specification. The drawings illustrate various embodiments, and together with the description serve to explain the principles and operation of the concepts disclosed.
BRIEF DESCRIPTION OF THE DRAWINGS The herein described disclosure will be more fully understood from the detailed description given herein below and the accompanying drawings, which should not be considered limiting to the disclosure described in the appended claims. In the drawings:
Figure 1 shows a conceptual block diagram of a system for detection and localization of surgical events in a surgical scene;
Figure 2 shows a highly schematic illustration of an acoustic capture device arranged to capture acoustic signals originating from an area of interest of the surgical scene and configured to capture acoustic patterns emitted by a surgical tool and/or by an interaction of a surgical tool with one or more entities associated with one or more surgical events;
Figure 3 shows a flow diagram illustrating steps of a computer implemented method for detection and localization of surgical events in a surgical scene according to an embodiment of the present disclosure;
Figure 4 shows a flow diagram illustrating steps of a computer implemented method for detection and localization of surgical events in a surgical scene according to a further embodiment of the present disclosure, whereby surgical events are tracked with reference to a prescribed surgical workflow;
Figure 5 shows a flow diagram illustrating steps of a computer implemented method for detection and localization of surgical events in a surgical scene according to a further embodiment of the present disclosure, further comprising monitoring of the quality of the surgical procedure being carried out;
Figure 6 shows a flow diagram illustrating steps of a computer implemented method for detection and localization of surgical events in a surgical scene according to a further embodiment of the present disclosure, enabling detection and localization of surgical tools in addition to detection and localization of surgical events;
Figure 7 shows a flow diagram illustrating steps of a computer implemented method for detection and localization of surgical events in a surgical scene according to a further embodiment of the present disclosure, combining acoustic and video signals;
Figure 8 shows a flow diagram illustrating steps of a computer implemented method for generating training data, training of a machine learning-based model and use of the trained machine learning-based model for detection and localization of surgical events in a surgical scene.
DETAILED DESCRIPTION OF THE EMBODIMENTS
Figure 1 shows a conceptual block diagram of a particular embodiment of the system 1 for detection and localization of surgical events in a surgical scene 100. The area delimited by a circle drawn with dashed lines illustrates the area of interest 150 within the surgical scene 100, such as an area where a surgery is performed. As delimited by a rounded rectangle drawn with dashed lines, the system 1 comprises an acoustic capture device 10 arranged to capture acoustic signals originating from an area 150 of interest of the surgical scene 100. The acoustic capture device 10 is configured to capture acoustic patterns emitted by a surgical tool and/or by an interaction of a surgical tool with one or more entities associated with one or more surgical events. The system 1 further comprises a computing device 20 communicatively connected to the acoustic capture device 10, the computing device 20 being configured to carry out the method for detection and localization of surgical events according to any one of the embodiments disclosed herein. The computing device 20 is communicatively connected to a datastore 50 for storing prescribed surgical workflows, prescribed parameters corresponding to surgical events, tool-specific surgical workflows, prescribed locations associated with surgical tools and/or training data. Optionally, according to embodiments, the computing device 20 is communicatively connected to a surgical system 200, such as a drill, or an imaging device such as an X-ray or ultrasound device. The computing device 20 is further communicatively connected to a visual capture device 30 such as a video camera directed to the area of interest 150 of the surgical scene 100.
Figure 2 shows a highly schematic illustration of an acoustic capture device 10 arranged to capture acoustic signals originating from an area 150 of interest of the surgical scene 100 and configured to capture acoustic patterns emitted by a surgical tool and/or by an interaction of a surgical tool with one or more entities associated with one or more surgical events. Correspondingly, receiving the acoustic signal from the acoustic capture device 10 comprises receiving a plurality of acoustic streams from a plurality of microphones directed to the area 150 of interest of the surgical scene 100 from different directions. Correspondingly, detection of the location of the surgical tool and/or the location of the interaction of the surgical tool with the one or more entities comprises multilateration of the location using the plurality of acoustic streams and positional data of the plurality of microphones.
Figure 3 shows a flow diagram illustrating steps of a computer implemented method for detection and localization of surgical events in a surgical scene according to an embodiment of the present disclosure.
In a preparatory step S0, an acoustic capture device 10 is arranged to capture acoustic signals originating from an area 150 of interest of the surgical scene 100 and communicatively connected to the computing device 20.
In a step S10 of the method, an acoustic signal is received by the computing device 20, from the acoustic capture device 10. Thereafter, in a step S20 one or more surgical events are detected, by the computing device 20, by identifying, in the acoustic signal, acoustic patterns associated with one or more surgical procedures, emitted by a surgical tool. Alternatively, or additionally, one or more surgical events are detected by identifying, in the acoustic signal, acoustic patterns associated with one or more surgical procedures, emitted by an interaction of a surgical tool with one or more entities in the surgical scene 100. According to embodiments, the one or more entities (the interaction of which with a surgical tool emits the acoustic signal which is analyzed for occurrence of an acoustic pattern) comprise anatomic body parts, other surgical tools, objects present in a surgical scene - such as an operating table or tool rest surfaces - as well as humans, such as surgical staff.
Having detected one or more surgical events, in a step S30 respective locations of the one or more surgical events is/are determined, by the computing device 20, relative to a spatial model of the surgical scene 100. The locations of the one or more surgical events is/are determined by processing the acoustic signal.
After determining the respective locations of the one or more surgical events relative to a spatial model of the surgical scene, in a step S40 a digital representation of the surgical scene 100 is generated by the computing device 20. The digital representation of the surgical scene 100 comprises digital representations of the one or more surgical events and the respective locations of the one or more surgical events. In particular, generating the digital representation of the surgical scene 100 comprises a step S42 of generating a spatio-temporal digital representation of the surgical scene 100, encoding the one or more surgical events in relation to the spatial model of the surgical scene 100. In a step S44, semantic data and data representative of interactions between the surgical tool 80 and the one or more entities in the surgical scene 100 is encoded in the digital representation. In particular, in a step S45, a structured representation of the one or more entities in the surgical scene 100 (such as patient anatomies and/or surgical staff), objects (such as the surgical tool), and their interactions is generated with semantic and spatial encoding.
Figure 4 shows a flow diagram illustrating steps of a computer implemented method for detection and localization of surgical events in a surgical scene according to a further embodiment of the present disclosure, surgical events being tracked with reference to a prescribed surgical workflow. In a step S50, a prescribed surgical workflow is retrieved by the computing device 20, the prescribed surgical workflow comprising data indicative of one or more prescribed workflow steps. The prescribed surgical workflow is retrieved by the computing device 20 from a datastore 50 comprised by and/or communicatively connected to the computing device 20. Thereafter, in a step S52, a current workflow step is identified, by the computing device 20, amongst the one or more prescribed workflow steps corresponding to one or more of the detected surgical events. In particular, a current workflow step is identified by comparing acoustic patterns associated with the one or more prescribed workflow steps with acoustic patterns associated with the detected surgical events.
According to an embodiment, the method further comprises the step S54 of determining a current state within the prescribed surgical workflow based on the prescribed surgical workflow and the digital representation of the surgical scene 100. The current state may comprise an indication of a progress of the prescribed surgical workflow, e.g. by indication (such as a progress bar) of the current workflow step with reference to all workflow steps of the prescribed surgical workflow.
Alternatively, or additionally, the method further comprises the step S55 of detecting discrepancies between the prescribed workflow steps of the prescribed surgical workflow and the one or more surgical events based on the prescribed surgical workflow and the digital representation of the surgical scene 100, thereby enabling detection of a surgical step being carried out which does not correspond to the prescribed surgical workflow (be it on purpose or inadvertently).
Alternatively, or additionally, the method further comprises the step S56 of generating one or more control commands for controlling a surgical system 200 - communicatively connected to the computing device 20 - based on the prescribed surgical workflow and the digital representation of the surgical scene 100. In particular, the control commands for controlling a surgical system 200 are generated based on control commands associated with the current workflow step. Alternatively, or additionally, the method further comprises the step S57 of generating user guidance for surgical personnel based on the prescribed surgical workflow and the digital representation of the surgical scene 100. The user guidance may comprise information retrieved by the computing device 20 in accordance with the current workflow step, such as operating instructions of the surgical tool.
Figure 5 shows a flow diagram illustrating steps of a computer implemented method for detection and localization of surgical events in a surgical scene according to a further embodiment of the present disclosure, further comprising monitoring of the quality of the surgical procedure being carried out.
In a step S60, one or more event parameters corresponding to the one or more surgical events are determined, by the computing device 20, by processing of the acoustic signal. For example, the current rotational speed of a drill (as surgical tool) is determined based on the frequency of the noise emitted by the drill. Thereafter, in a step S62, the one or more event parameters are represented in the digital representation of the surgical scene 100.
Furthermore, in a step S64, one or more prescribed parameters corresponding to a surgical event are retrieved, by the computing device 20, e.g. from a datastore comprised by or communicatively connected to the computing device 20. According to embodiments, the prescribed parameters comprise discrete values, such as discrete numerical values (e.g. the number of incisions to be made in a tissue). Alternatively, or additionally, the prescribed parameters comprise ranges, such as numerical ranges (e.g. a range of rotation speed of a drill). Thereafter, in a step S66 deviations of the one or more event parameters from the one or more prescribed parameters are detected by the computing device 20 using the digital representation of the surgical scene 100.

Figure 6 shows a flow diagram illustrating steps of a computer implemented method for detection and localization of surgical events in a surgical scene according to a further embodiment of the present disclosure, enabling detection and localization of surgical tools in addition to detection and localization of surgical events.
A surgical tool corresponding to one or more of the surgical events is identified, in a step S70, by comparing the acoustic signal with a collection of one or more acoustic fingerprints associated with one or more types of surgical tools. Having identified a surgical tool corresponding to one or more of the surgical events, in a step S72, the location of the surgical tool is tracked in the surgical scene 100 by processing the acoustic signal.
In order to not only track the location of surgical tools within the surgical scene but to also enable monitoring of the steps of use, in a step S73 a tool-specific surgical workflow corresponding to the identified surgical tool is retrieved. The computing device 20 retrieves the tool-specific surgical workflow from a datastore 50 comprised by or communicatively connected thereto. The tool-specific surgical workflow comprises data indicative of one or more prescribed tool-specific workflow steps. Thereafter, in a step S74, discrepancies between the prescribed tool-specific workflow steps of the tool-specific surgical workflow and the one or more surgical events are detected by the computing device 20.
To ensure proper spatial use of surgical tools within a patient anatomy, in a step S75, one or more prescribed locations associated with the identified surgical tool are retrieved (from a datastore comprised by or communicatively connected to the computing device 20). Thereafter, in a step S76 erroneous placement of the surgical tool is detected by detecting a deviation of the location of the surgical tool from the prescribed locations.
In order to not only track the location of surgical tools within the surgical scene but to also enable monitoring of parameters of their use, in a step S77, prescribed operating parameters associated with the identified surgical tool are retrieved. Thereafter, in a step S78 deviations of an operation of the identified surgical tool from the prescribed operating parameters are detected by processing the acoustic signal.
Figure 7 shows a flow diagram illustrating steps of a computer implemented method for detection and localization of surgical events in a surgical scene according to a further embodiment of the present disclosure, combining acoustic and video signals.
In order to provide a multimodal solution to detection and localization of surgical events in a surgical scene to improve versatility, redundancy, detail level and/or quality, in addition to the acoustic-based localization, a visual signal of the surgical scene 100 is received, in a step S80, by the computing device 20 from a visual capture device 30 such as a video camera. The visual capture device 30 is directed to the area of interest 150 of the surgical scene 100 and communicatively connected to the computing device 20.
One or more of the surgical events are then detected, in a step S85, further based on processing the visual signal. In other words, the acoustic signal and the visual signal are both processed to reliably and precisely detect the surgical events.
Alternatively, or additionally, in a step S86, the location of the surgical tool is determined further based on processing of the visual signal. Alternatively, or additionally, in a step S87, the location of the interaction of the surgical tool with one or more entities is determined further based on processing of the visual signal. Such embodiments are particularly advantageous as they provide a high degree of flexibility in the use of the acoustic signal in combination with the visual signal. For example, there may be situations where a surgical event is detected based on processing of the acoustic signal - such as operating a drill, which often cannot be distinguished from merely holding the drill solely by processing of visual signals - while the location of the surgical tool is determined by processing the visual signal, or the visual signal complemented by processing of the acoustic signal. In order to optimize the consideration of the acoustic respectively of the visual signals, the presence respectively absence of one or more of the surgical events in the visual signal is detected in a step S81, and in a step S82 the visual signal and/or the acoustic signal is/are selected for detecting the location of the surgical tool and/or the location of the interaction of the surgical tool with the one or more entities, in dependence of the presence respectively absence of the one or more surgical events in the visual signal.
Figure 8 shows a flow diagram illustrating steps of a computer implemented method for generating training data, training of a machine learning-based model and use of the trained machine learning-based model for detection and localization of surgical events in a surgical scene.
In order to make training data available for training of a machine learning-based model for detecting a location of a surgical tool and/or a location of an interaction of a surgical tool with one or more entities, in a step S91 simulated acoustic signals associated with a surgical tool type and/or a known interaction of a surgical tool type with one or more entities are generated.
After generation of the simulated acoustic signals, in a step S92 synthetic training data is generated by labelling the simulated acoustic signals with data indicative of the known surgical tool type or the known interaction of a surgical tool type with the one or more entities.
Alternatively, or additionally, in a step S93, (real-life) acoustic signals associated with a surgical tool type and/or a known interaction of a surgical tool type with one or more entities are captured using an acoustic capture device 10. Thereafter, in a step S94, real-life training data is generated using the captured acoustic signals, labeled with the surgical tool type and/or the known interaction of a surgical tool type with the one or more entities.
Having generated the synthetic training data and/or real-life training data, in a step S95 a machine learning-based model is trained for the purpose of detecting a location of a surgical tool and/or a location of an interaction of a surgical tool with one or more entities using the synthetic training data and/or the real-life training data. In particular, training of the machine learning-based model comprises providing the simulated acoustic signals as an input to the machine learning-based model and providing the labels (data indicative of the known surgical tool type or the known interaction of a surgical tool type with the one or more entities) as expected output of the machine learning-based model. For example, the simulated and/or captured acoustic signals are provided at an input layer of a neural network (of the machine learning-based model) and the labels are provided as an expected output of an output layer of the neural network, the training of the machine learning-based model comprising optimizing of parameters (e.g. weights) of nodes and connections between nodes of the input layer, the output layer and of a plurality of hidden layers of the neural network connecting the input layer and the output layer.
After training of the machine learning-based model, in a step S96 the received acoustic signal (and optionally additional multimodal data, such as sensor data or the visual signal) is input to an input of the machine learning-based model. Then, in a step S97 the surgical tool and/or the interaction of the surgical tool with the one or more entities is detected and localized at an output of the machine learning-based model.
It should be noted that, in the description, the sequence of the steps has been presented in a specific order; one skilled in the art will understand, however, that the order of at least some of the steps could be altered without deviating from the scope of the disclosure.
REFERENCE LIST
system 1
acoustic capture device 10
computing device 20
visual capture device 30
datastore 50
surgical tool 80
surgical scene 100
area of interest 150
surgical system 200