CROSS-REFERENCE TO RELATED APPLICATIONS
The present application claims priority from PCT/CN2010/076555 (title: “Video Analytics for Security Systems and Methods”) which was filed in the Chinese Receiving Office on Sep. 2, 2010, from PCT/CN2010/076569 (title: “Video Classification Systems and Methods”) which was filed in the Chinese Receiving Office on Sep. 2, 2010, from PCT/CN2010/076564 (title: “Rho-Domain Metrics”) which was filed in the Chinese Receiving Office on Sep. 2, 2010, and from PCT/CN2010/076567 (title: “Systems And Methods for Video Content Analysis”) which was filed in the Chinese Receiving Office on Sep. 2, 2010, each of these applications being hereby incorporated herein by reference. The present Application is also related to concurrently filed U.S. Patent non-provisional applications entitled “Video Classification Systems and Methods” (attorney docket no. 043497-0393274), “Rho-Domain Metrics” (attorney docket no. 043497-0393276) and “Systems And Methods for Video Content Analysis” (attorney docket no. 043497-0393278), which are expressly incorporated by reference herein.
BRIEF DESCRIPTION OF THE DRAWINGS
FIG. 1 is a block schematic illustrating a simplified example of a video security surveillance analytics architecture according to certain aspects of the invention.
FIG. 2 is a block schematic depicting an example of a video analytics engine according to certain aspects of the invention.
FIG. 3 depicts an example of H.264 standards-defined bitstream syntax.
FIG. 4A is an image that includes both foreground and background objects.
FIG. 4B is the image of FIG. 4A from which foreground objects have been extracted using techniques according to certain aspects of the invention.
FIGS. 5A and 5B are images illustrating virtual line counting according to certain aspects of the invention.
FIG. 6 is a simplified block schematic illustrating a processing system employed in certain embodiments of the invention.
DETAILED DESCRIPTION
Embodiments of the present invention will now be described in detail with reference to the drawings, which are provided as illustrative examples so as to enable those skilled in the art to practice the invention. Notably, the figures and examples below are not meant to limit the scope of the present invention to a single embodiment, but other embodiments are possible by way of interchange of some or all of the described or illustrated elements. Wherever convenient, the same reference numbers will be used throughout the drawings to refer to same or like parts. Where certain elements of these embodiments can be partially or fully implemented using known components, only those portions of such known components that are necessary for an understanding of the disclosed embodiments will be described, and detailed descriptions of other portions of such known components will be omitted so as not to obscure the disclosed embodiments. In the present specification, an embodiment showing a singular component should not be considered limiting; rather, the invention is intended to encompass other embodiments including a plurality of the same component, and vice-versa, unless explicitly stated otherwise herein. Moreover, applicants do not intend for any term in the specification or claims to be ascribed an uncommon or special meaning unless explicitly set forth as such. Further, certain embodiments of the present invention encompass present and future known equivalents to the components referred to herein by way of illustration.
Certain embodiments of the invention comprise systems having an architecture that is operable to perform video analytics for security applications. Video analytics may also be referred to as video content analysis. In a video security surveillance analytics architecture where the server encodes captured video images, certain embodiments provide greatly improved video analytics efficiency for client side processing applications and systems. By improving and/or optimizing client side video analytics efficiency, client-side performance can be greatly improved, consequently enabling processing of an increased number of video channels. Moreover, video analytics metadata (“VAMD”) created on the server side according to certain aspects of the invention can enable high accuracy video analytics on the server side and for the video security surveillance system as a whole. According to certain aspects of the invention, the advantages of a layered video analytics system architecture can include facilitating and/or enabling a balanced partition of video analytics at multiple layers. These layers may include server and client layers, pixel domain layers and motion domain layers. For example, global analytics defined to include information related to background frames, segmented object descriptors and camera parameters can enable cost efficient yet complex video analytics on the receiver side for many advanced video intelligence applications and can enable an otherwise difficult or impossible level of video analytics efficiency in terms of computational complexity and analytic accuracy.
A simplified example of a video security surveillance analytics architecture is shown in FIG. 1. In the example, the system is partitioned into server side 10 and client side 12 elements. The terms server and client are used here to include hardware and software systems, apparatus and other components that perform types of functions that can be attributed to server side 10 and client side 12 operations. It will be appreciated that certain elements may be provided on either or both server side 10 and client side 12, and that at least some client and server functionality may be committed to hardware components such as application specific integrated circuits, sequencers, custom logic devices as needed, typically to improve one or more of efficiency, reliability, processing speed and security. Server side 10 components may be embodied in a security surveillance or other camera.
On server side 10, a video sensor 100 can be configured to capture information representative of a sequence of images, including video data, and to pass the information to a video encoder module 102 adapted for use in embodiments of the invention. One example of such a video encoder module 102 is the TW5864 from Intersil Techwell Inc., which can be adapted and/or configured to generate VAMD 103 related to video bitstream 105. In certain embodiments, video encoder 102 can be configured to generate one or more compressed video bitstreams 105 that comply with industry standards and/or that are generated according to a proprietary specification. The video encoder 102 is typically configurable to produce VAMD 103 that can comprise pixel domain video analytics information, such as information obtained directly from an analog-to-digital (“A/D”) front end (e.g. at the video sensor 100) and/or from an encoding engine 102 as the encoding engine 102 is performing video compression to obtain video bitstream 105. VAMD 103 may comprise block-based video analytics information including, for example, macroblock (“MB”) level information such as motion vector, MB-type and/or number of non-zero coefficients, etc. An MB typically comprises a 16×16 pixel block.
In certain embodiments, VAMD 123 can comprise any video encoding intermediate data such as MB-type, motion vectors, non-zero coefficients (as per the H.264 standard), quantization parameter, DC or AC information, the motion estimation metric sum of absolute differences (“SAD”), etc. VAMD 123 can also comprise useful information such as motionFlag information generated in an analog-to-digital front end module, such module being found, for example, in the TW5864 device referenced above. VAMD is typically processed in video analytics engine (“VAE”) 104 to generate more advanced video intelligence information that may include, for example, motion indexing, background extraction, object segmentation, motion detection, virtual line detection, object counting, motion tracking and speed estimation.
Video analytics engine 104 can be configured to receive the VAMD 103 from the encoder 102 and to process the VAMD 103 using one or more video analytics algorithms based on application requirements. Video analytics engine 104 can generate useful video analytics results, such as a background model, motion alarms, virtual line detections, electronic image stabilization parameters, etc. A more detailed example of a video analytics engine 104 is shown in FIG. 2. Video analytics results can comprise video analytics messages (“VAM”) that may be categorized into a global VAM class and a local VAM class. Global VAM includes video analytics messages applicable to a group of pictures, such as background frames, foreground object segmentation descriptors, camera parameters, coordinates and indices of predefined motion alarm regions, virtual lines, etc. Local VAM can be defined as localized VAM applied to a specific individual video frame, and can include global motion vectors of a current frame, motion alarm region alarm status of the current frame, virtual line counting results, object tracking parameters, camera moving parameters, and so on.
In certain embodiments, an encoder generated video bitstream 105, VAMD 103 and VAM generated by video analytics engine 104 are packed together as a layered structure into a network bitstream 106 following a predefined packaging format. The network bitstream 106 can be sent through a network to the client side of the system. The network bitstream 106 may be stored locally, on a server and/or on a remote storage device for future playback and/or dissemination.
FIG. 3 depicts an example of an H.264 standards-defined bitstream syntax, in which VAM and VAMD 103 can be packed into a supplemental enhancement information (“SEI”) network abstraction layer package unit. Following SPS, PPS and IDR network abstraction layer units, a global video analytics (“GVA”) SEI network abstraction layer unit can be inserted into network bitstream 106. The GVA network abstraction layer unit may include the global video analytics messages for a corresponding group of pictures, a pointer to the location of the first local video analytics SEI network abstraction layer unit within the group of pictures, and a pointer to the next GVA network abstraction layer unit, and may include an indication of the duration of frames over which the GVA is applicable. Following each individual frame which is associated with VAM or VAMD elements, a local video analytics (“LVA”) SEI network abstraction layer unit is inserted right after the frame's payload network abstraction layer unit. The LVA can comprise local VAM, VAMD information and a pointer to a location of the next frame which has an LVA SEI network abstraction layer unit. The amount of VAMD packed into an LVA network abstraction layer unit depends on the network bandwidth condition and the complexity of the user's video analytics requirements. For example, if sufficient network bandwidth is available, additional VAMD can be packed. The VAMD can be used by client side video analytics systems and may simplify and/or optimize performance of certain functions. When network bandwidth is limited, less VAMD may be sent to meet the network bandwidth constraints. While FIG. 3 illustrates a bitstream format for H.264 standards, the principles involved may be applied in other video standards and formats.
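The ordering of units described for FIG. 3 can be illustrated with a minimal Python sketch. It is not an H.264 encoder and does not produce real SEI syntax; the names `Unit`, `pack_gop` and `next_lva` are hypothetical, pointers are modeled as list indices, and the sketch simplifies the LVA pointer to reference the next LVA unit rather than the next frame that carries one. Placing an LVA unit for the IDR frame after the GVA unit is likewise an assumption.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional

@dataclass
class Unit:
    kind: str                        # "SPS", "PPS", "IDR", "P", "GVA_SEI" or "LVA_SEI"
    payload: bytes = b""
    next_lva: Optional[int] = None   # index of the next LVA unit, if any (hypothetical field)

def pack_gop(frames: List[bytes], global_vam: bytes, local: Dict[int, bytes]) -> List[Unit]:
    """Lay out one group of pictures.

    frames[0] is treated as the IDR frame; local maps a frame index to the
    local VAM/VAMD bytes for that frame (frames without an entry get no LVA unit).
    """
    units = [Unit("SPS"), Unit("PPS"), Unit("IDR", frames[0]), Unit("GVA_SEI", global_vam)]
    last = 3  # index of the unit whose next_lva pointer is still open (starts at the GVA unit)
    for i, frame in enumerate(frames):
        if i > 0:
            units.append(Unit("P", frame))
        if i in local:
            # The LVA unit goes right after the frame it describes.
            units.append(Unit("LVA_SEI", local[i]))
            units[last].next_lva = len(units) - 1
            last = len(units) - 1
    return units
```

Under these assumptions, a client can jump from the GVA unit to the first LVA unit and then from one LVA unit to the next without parsing the intervening frame payloads.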
In certain embodiments of the invention, a client side system 12 receives and decodes the network bitstream 106 sent from a server side system 10. The advantages of a layered video analytics system architecture, which can include facilitating and/or enabling a balanced partition of video analytics at multiple layers, become apparent at the client side 12. Layers can include server and client layers, pixel domain layers and motion domain layers. Global video analytics messages such as background frames, segmented object descriptors and camera parameters can enable cost efficient yet complex video analytics on the receiver side for many advanced video intelligence applications. The VAM enables an otherwise difficult or impossible level of video analytics efficiency in terms of computational complexity and analytic accuracy.
In certain embodiments of the invention, the client side system 12 separates the compressed video bitstream 125, the VAMD 123 and the VAM from the network bitstream 106. The video bitstream can be decoded using decoder 124 and provided with VAMD 123 and associated VAM to client application 122. The client application typically employs video analytics techniques appropriate for the application at hand. For example, analytics may include background extraction, motion tracking, object detection, and other functions. Known analytics can be selected and adapted to use the VAMD 103 and VAM that were derived from the encoder 102 and video analytics engine 104 at the server side 10 to obtain richer and more accurate results 120. Adaptations of the analytics may be based on speed requirements, efficiency, and the enhanced information available through the VAM and VAMD 123.
Certain advantages may be accrued from the video analytics system architecture and the layered video analytics information embedded in network bitstreams according to certain aspects of the invention. For example, greatly improved video analytics efficiency can be obtained on the client side 12. In one example, video analytics engine 104 receives and processes encoder feedback VAMD to produce the video analytics information that may be embedded in the network bitstream 106. The use of embedded layered VAM provides users direct access to a video analytics message of interest, and permits use of VAM with limited or no additional processing. In one example, additional processing would be unnecessary to access the motion frame, the number of objects passing a virtual line, object moving speed and classification, etc. In certain embodiments, information related to object tracking may be generated using additional, albeit limited, processing related to the motion of the identified object. Information related to electronic image stabilization may be obtained by additional processing based on the global motion information provided in VAM. Accordingly, in certain embodiments, client side 12 video analytics efficiency can be optimized and performance can be greatly improved, consequently enabling processing of an increased number of channels.
Certain embodiments enable operation of high-accuracy video analytics applications on the client side 12. According to certain aspects of the invention, client side 12 video analytics may be performed using information generated on the server side 10. Without VAM embedded in the network bitstream 106, client side video analytics processing would have to rely on video reconstructed from the decoded video bitstream 125. Decoded bitstream 125 typically lacks some of the detailed information of the original video content (e.g. content provided by video sensor 100), which may be discarded or lost in the video compression process. Consequently, video analytics performed solely on the client side 12 cannot generally preserve the accuracy that can be obtained if the processing is performed at the server side 10, or at the client side 12 using VAMD 123 derived from original video content on the server side 10. Loss of accuracy due to analytics processing that is limited to client side 12 can cause problems in determining the geometric center of an object, in object segmentation, etc. Therefore, embedded VAM can enable improved system-level accuracy.
Certain embodiments of the invention enable fast video indexing, searching and other applications. In particular, embedded, layered VAM in the network bitstream enables fast video indexing, video searching, video classification applications and other applications on the client side. For instance, motion detection information, object indexing, foreground and background partition, human detection and human behavior classification information in the VAM can simplify client-side and/or downstream tasks that include, for example, video indexing, classification and fast searching in the client. Without VAM, a client generally needs vast computational power to process the video data and to rebuild the required video analytics information for a variety of applications including the above-listed applications. It will be appreciated that not all VAM can be accurately reconstructed at the client side 12 using video bitstream 125 and it is possible that certain applications, such as human behavioral analysis applications, cannot even be performed if VAM created at server side 10 is not available.
Certain embodiments of the invention permit the use of more complex server/client algorithms, partitioning of computational capability and balancing of network bandwidth. In certain embodiments, the video analytics system architecture allows video analytics to be partitioned between server and client sides based on network bandwidth availability, server and client computational capability and the complexity of the video analytics. In one example, in response to low network bandwidth conditions, the system can embed more condensed VAM in the network bitstream 106 after processing by the VAE 104. The VAM can include motion frame index, object index, and so on. After extracting the VAM from the bitstream, the client side 12 system can utilize the VAM to assist further video analytics processing. More VAMD 103 can be directly embedded into the network bitstream 106 and processing by the VAE 104 can be limited or halted when computational power is limited on the server side 10. Computational power on the server side 10 may be limited when, for example, the server side 10 system is embodied in a camera, a digital video recorder (“DVR”) or network video recorder (“NVR”). Certain embodiments may use client side 12 systems to process embedded VAMD 123 in order to accomplish the desired video analytics functions. In some embodiments, more video analytics functions can be partitioned and/or assigned to server side 10 when, for example, the client side is required to monitor and/or process multiple channels simultaneously. It will be appreciated, therefore, that a balanced video analytics system can be achieved for a variety of system configurations.
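One way to picture the partitioning described above is as a small decision function that selects which analytics layers to embed in the network bitstream. The following Python sketch is illustrative only: the function name `choose_payload`, the bit-rate figures and the two-way `server_busy` flag are assumptions made for the example and do not appear in the specification.

```python
def choose_payload(spare_kbps: float, server_busy: bool,
                   vam_kbps: float = 8.0, vamd_kbps: float = 64.0) -> dict:
    """Decide which analytics layers to embed, given spare bandwidth.

    spare_kbps  : bandwidth left after the compressed video (assumed known)
    server_busy : True when the server cannot afford to run the VAE
    vam_kbps    : assumed cost of condensed VAM (hypothetical figure)
    vamd_kbps   : assumed cost of raw VAMD (hypothetical figure)
    """
    if server_busy:
        # Limited server compute: skip VAE processing and ship raw VAMD
        # for the client to process, provided it fits.
        return {"vam": False, "vamd": spare_kbps >= vamd_kbps}
    # Server can run the VAE: prefer condensed VAM, and add VAMD only
    # when there is headroom for both.
    return {
        "vam": spare_kbps >= vam_kbps,
        "vamd": spare_kbps >= vam_kbps + vamd_kbps,
    }
```

With low spare bandwidth and an idle server, the sketch sends only condensed VAM; with a busy server and ample bandwidth, it sends VAMD and leaves the analytics to the client.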
EXAMPLES
With reference to FIG. 2, certain embodiments provide electronic image stabilization (“EIS”) capabilities 220. EIS 220 finds wide application, including in video security applications. A currently captured video frame is processed with reference to the previous reconstructed reference frame or frames to generate a global motion vector 202 for the current frame, and the global motion vector is used to compensate the reconstructed image on the client side to reduce or eliminate image instability or shaking.
In a conventional pixel domain EIS algorithm, the current and previous reference frames are fetched, a block based or grey-level histogram based matching algorithm is applied to obtain local motion vectors, and the local motion vectors are processed to generate a pixel domain global motion vector. The drawbacks of the conventional approach include the high computational cost associated with the matching algorithm used to generate local motion vectors and the very high memory bandwidth required to fetch both the current reconstructed frame and the previous reference frames.
In certain embodiments of the invention, the video encoding engine 102 can generate VAMD 103 including block-based motion vectors, MB-type, etc., as a byproduct of video compression processing. VAMD 103 is fed into VAE 104, which can be configured to process the VAMD 103 information in order to generate global motion vector 202 as a VAM. The VAM is then embedded into the network bitstream 106 for transmission to the client side 12, typically over a network. A client side 12 processor can parse the network bitstream 106, extract the global motion information for each frame and apply global motion compensation to accomplish EIS 220.
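The specification does not state how the VAE reduces block-based motion vectors to a single global motion vector. As a minimal sketch under that caveat, the Python example below assumes a component-wise median over inter-coded macroblocks, which is one common robust estimator; the function names and the "inter"/"intra" labels are hypothetical.

```python
from statistics import median

def global_motion_vector(mvs, mb_types):
    """Estimate a global motion vector from per-macroblock motion vectors.

    mvs      : list of (dx, dy) motion vectors, one per macroblock
    mb_types : list of "inter" / "intra" labels (hypothetical encoding of MB-type)
    Assumption: the component-wise median of inter-coded MBs approximates
    camera motion, because a few moving foreground objects are outvoted.
    """
    inter = [mv for mv, kind in zip(mvs, mb_types) if kind == "inter"]
    if not inter:
        return (0, 0)  # no motion information available for this frame
    return (median(dx for dx, _ in inter), median(dy for _, dy in inter))

def compensation_offset(gmv):
    """Client side: shift the decoded frame by the opposite of the global motion."""
    return (-gmv[0], -gmv[1])
```

For example, motion vectors `[(2, 1), (2, 1), (3, 1), (10, -8), (2, 0)]` on inter-coded macroblocks yield a global vector of `(2, 1)`: the outlier `(10, -8)`, which might belong to a moving object, does not affect the estimate.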
Video Background Modeling
Certain embodiments of the invention comprise a video background modeling feature that can construct or reconstruct a background image 222 which can provide highly desired information for use in a wide variety of video surveillance applications, including motion detection, object segmentation, abandoned object detection, etc. Conventional pixel domain background extraction algorithms operate on a statistical model of co-located pixel values across multiple frames. For example, a Gaussian model is used to model the co-located pixels of N continuous frames and to select the mathematically most likely pixel value as the background pixel. If a video frame's height is denoted as H, its width as W, and N continuous frames are needed to satisfy the statistical model requirement, then a total of W*H*N pixels must be processed to generate a background frame.
In certain embodiments, MB-based VAMD 103 is used to generate the background information rather than pixel-based background information. According to certain aspects of the invention, the volume of information generated from VAMD 103 is typically only 1/256 of the volume of pixel-based information. In one example, MB based motion vector and non-zero-count information can be used to distinguish background from foreground moving objects. FIG. 4A shows an original image with background and foreground objects, and FIG. 4B shows a typical background extracted by processing VAMD.
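The 1/256 figure follows from the 16×16 macroblock size: one value per MB replaces 256 pixel values. The Python sketch below works through that arithmetic and adds a simple MB-level background test. The classification rule (zero motion vector and a non-zero coefficient count at or below a threshold) and the names `values_to_process`, `background_mask` and `nz_threshold` are assumptions for illustration; the specification names the inputs but not the decision rule.

```python
MB = 16  # macroblock edge length in pixels

def values_to_process(width, height, n_frames, per_mb=False):
    """Count the values a background model must examine over n_frames."""
    if per_mb:
        return (width // MB) * (height // MB) * n_frames
    return width * height * n_frames

def background_mask(mvs, nonzero_counts, nz_threshold=0):
    """Mark each MB as background (True) or foreground (False).

    mvs            : 2-D grid of (dx, dy) motion vectors, one per MB
    nonzero_counts : 2-D grid of non-zero coefficient counts, one per MB
    Assumed rule: a static MB with (almost) no residual is background.
    """
    return [
        [mv == (0, 0) and nz <= nz_threshold for mv, nz in zip(mv_row, nz_row)]
        for mv_row, nz_row in zip(mvs, nonzero_counts)
    ]
```

For a 1280×720 frame and N = 30, the pixel-domain model examines 27,648,000 values while the MB-domain model examines 108,000, a ratio of exactly 256.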
Certain embodiments of the invention provide systems and methods for motion detection 200 and virtual line counting 201. A motion detector 200 can be used to automatically detect motion of objects including humans, animals and/or vehicles entering predefined regions of interest. Virtual line detection and counting module 201 can detect a moving object that crosses an invisible line defined by user configuration and can count the number of objects crossing the line, as illustrated in FIGS. 5A and 5B. The virtual line can be based on actual lines in the image and can be a delineation of an area defined by a polygon, circle, ellipse or irregular area. In some embodiments, the number of objects crossing one or more lines can be recorded as an absolute number and/or as a statistical frequency and an alarm may be generated to indicate any line crossing, a threshold frequency or absolute number of crossings and/or an absence of crossings within a predetermined time. In certain embodiments, motion detection 200 and virtual line detection and counting 201 can be achieved by processing one or more MB-based VAMDs. Information such as motion alarms and object counts across a virtual line can be packed as VAM for transmission to the client side 12. Motion indexing, object counting or similar customized applications can be easily achieved by extracting the VAM with simple processing. It will be appreciated that configuration information may be provided from client side to server side as a form of feedback, using packed information as a basis for resetting lines, areas of interest and so on.
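Virtual line counting can be sketched as a segment-intersection test applied to successive positions of a tracked object. The Python example below assumes that object centroids have already been derived (for instance from MB-based VAMD, by a step not shown here) and that the virtual line is a single straight segment; the function names are hypothetical, and a move that merely touches the line without passing through it is not counted.

```python
def side(a, b, p):
    """Signed area test: >0 if p is left of the directed line a->b, <0 if right."""
    return (b[0] - a[0]) * (p[1] - a[1]) - (b[1] - a[1]) * (p[0] - a[0])

def crosses(a, b, p, q):
    """True if the move p->q strictly crosses the virtual line segment a-b."""
    return (side(a, b, p) * side(a, b, q) < 0 and
            side(p, q, a) * side(p, q, b) < 0)

def count_crossings(line, track):
    """Count how many consecutive steps of a centroid track cross the line.

    line  : ((x0, y0), (x1, y1)) endpoints of the virtual line
    track : list of (x, y) centroids for one object, in frame order
    """
    a, b = line
    return sum(crosses(a, b, p, q) for p, q in zip(track, track[1:]))
```

An object that moves across the line and then back is counted twice; a direction-sensitive counter could instead inspect the sign of `side(a, b, p)` before and after the move.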
Certain embodiments of the invention provide improved object tracking within a sequence of video frames using VAMD 103. Certain embodiments can facilitate client side measurement of speed of motion of objects and can assist in identifying directions of movement. Furthermore, VAMD 103 can provide useful information related to video mosaics 221, including motion indexing and object counting.
System Description
Turning now to FIG. 6, certain embodiments of the invention employ a processing system that includes at least one computing system 60 deployed to perform certain of the steps described above. Computing system 60 may be a commercially available system that executes commercially available operating systems such as Microsoft Windows®, UNIX or a variant thereof, Linux, a real time operating system and/or a proprietary operating system. The architecture of the computing system may be adapted, configured and/or designed for integration in the processing system, for embedding in one or more of an image capture system, communications device and/or graphics processing systems. In one example, computing system 60 comprises a bus 602 and/or other mechanisms for communicating between processors, whether those processors are integral to the computing system 60 (e.g. 604, 605) or located in different, perhaps physically separated computing systems 60. Typically, processor 604 and/or 605 comprises a CISC or RISC computing processor and/or one or more digital signal processors. In some embodiments, processor 604 and/or 605 may be embodied in a custom device and/or may perform as a configurable sequencer. Device drivers 603 may provide output signals used to control internal and external components and to communicate between processors 604 and 605.
Computing system 60 also typically comprises memory 606 that may include one or more of random access memory (“RAM”), static memory, cache, flash memory and any other suitable type of storage device that can be coupled to bus 602. Memory 606 can be used for storing instructions and data that can cause one or more of processors 604 and 605 to perform a desired process. Main memory 606 may be used for storing transient and/or temporary data such as variables and intermediate information generated and/or used during execution of the instructions by processor 604 or 605. Computing system 60 also typically comprises non-volatile storage such as read only memory (“ROM”) 608, flash memory, memory cards or the like; non-volatile storage may be connected to the bus 602, but may equally be connected using a high-speed universal serial bus (USB), Firewire or other such bus that is coupled to bus 602. Non-volatile storage can be used for storing configuration, and other information, including instructions executed by processors 604 and/or 605. Non-volatile storage may also include mass storage device 610, such as a magnetic disk, optical disk or flash disk that may be directly or indirectly coupled to bus 602 and used for storing instructions to be executed by processors 604 and/or 605, as well as other information.
In some embodiments, computing system 60 may be communicatively coupled to a display system 612, such as an LCD flat panel display, including touch panel displays, electroluminescent display, plasma display, cathode ray tube or other display device that can be configured and adapted to receive and display information to a user of computing system 60. Typically, device drivers 603 can include a display driver, graphics adapter and/or other modules that maintain a digital representation of a display and convert the digital representation to a signal for driving a display system 612. Display system 612 may also include logic and software to generate a display from a signal provided by system 600. In that regard, display 612 may be provided as a remote terminal or in a session on a different computing system 60. An input device 614 is generally provided locally or through a remote system and typically provides for alphanumeric input as well as cursor control 616 input, such as a mouse, a trackball, etc. It will be appreciated that input and output can be provided to a wireless device such as a PDA, a tablet computer or other system suitably equipped to display the images and provide user input.
In certain embodiments, computing system 60 may be embedded in a system that captures and/or processes images, including video images. In one example, computing system 60 may include a video processor or accelerator 617, which may have its own processor, non-transitory storage and input/output interfaces. In another example, video processor or accelerator 617 may be implemented as a combination of hardware and software operated by the one or more processors 604, 605. In another example, computing system 60 functions as a video encoder, although other functions may be performed by computing system 60. In particular, a video encoder that comprises computing system 60 may be embedded in another device such as a camera, a communications device, a mixing panel, a monitor, a computer peripheral, and so on.
According to one embodiment of the invention, portions of the described invention may be performed by computing system 60. Processor 604 executes one or more sequences of instructions. For example, such instructions may be stored in main memory 606, having been received from a computer-readable medium such as storage device 610. Execution of the sequences of instructions contained in main memory 606 causes processor 604 to perform process steps according to certain aspects of the invention. In certain embodiments, functionality may be provided by embedded computing systems that perform specific functions wherein the embedded systems employ a customized combination of hardware and software to perform a set of predefined tasks. Thus, embodiments of the invention are not limited to any specific combination of hardware circuitry and software.
The term “computer-readable medium” is used to define any medium that can store and provide instructions and other data to processor 604 and/or 605, particularly where the instructions are to be executed by processor 604 and/or 605 and/or other peripheral of the processing system. Such medium can include non-volatile storage, volatile storage and transmission media. Non-volatile storage may be embodied on media such as optical or magnetic disks, including DVD, CD-ROM and BluRay. Storage may be provided locally and in physical proximity to processors 604 and 605 or remotely, typically by use of a network connection. Non-volatile storage may be removable from computing system 60, as in the example of BluRay, DVD or CD storage or memory cards or sticks that can be easily connected or disconnected from a computer using a standard interface, including USB, etc. Thus, computer-readable media can include floppy disks, flexible disks, hard disks, magnetic tape, any other magnetic medium, CD-ROMs, DVDs, BluRay, any other optical medium, punch cards, paper tape, any other physical medium with patterns of holes, RAM, PROM, EPROM, FLASH/EEPROM, any other memory chip or cartridge, or any other medium from which a computer can read.
Transmission media can be used to connect elements of the processing system and/or components of computing system 60. Such media can include twisted pair wiring, coaxial cables, copper wire and fiber optics. Transmission media can also include wireless media such as radio, acoustic and light waves. In particular, radio frequency (RF), fiber optic and infrared (IR) data communications may be used.
Various forms of computer readable media may participate in providing instructions and data for execution by processor 604 and/or 605. For example, the instructions may initially be retrieved from a magnetic disk of a remote computer and transmitted over a network or modem to computing system 60. The instructions may optionally be stored in a different storage or a different part of storage prior to or during execution.
Computing system 60 may include a communication interface 618 that provides two-way data communication over a network 620 that can include a local network 622, a wide area network or some combination of the two. For example, an integrated services digital network (ISDN) may be used in combination with a local area network (LAN). In another example, a LAN may include a wireless link. Network link 620 typically provides data communication through one or more networks to other data devices. For example, network link 620 may provide a connection through local network 622 to a host computer 624 or to a wide area network such as the Internet 628. Local network 622 and Internet 628 may both use electrical, electromagnetic or optical signals that carry digital data streams.
Computing system 60 can use one or more networks to send messages and data, including program code and other information. In the Internet example, a server 630 might transmit a requested code for an application program through Internet 628 and may receive in response a downloaded application that provides or augments functional modules such as those described in the examples above. The received code may be executed by processor 604 and/or 605.
Additional Descriptions of Certain Aspects of the Invention
The foregoing descriptions of the invention are intended to be illustrative and not limiting. For example, those skilled in the art will appreciate that the invention can be practiced with various combinations of the functionalities and capabilities described above, and can include fewer or additional components than described above. Certain additional aspects and features of the invention are further set forth below, and can be obtained using the functionalities and components described in more detail above, as will be appreciated by those skilled in the art after being taught by the present disclosure.
Certain embodiments of the invention provide video processing systems and methods. Some of these embodiments comprise a processor configured to receive video frames representative of a sequence of images captured by a video sensor. Some of these embodiments comprise a video encoder operative to encode the video frames according to a desired video encoding standard. Some of these embodiments comprise a video analytics processor that receives video analytics metadata generated by the video encoder from the sequence of images. In some of these embodiments, the video analytics processor is configurable to produce video analytics messages for transmission to a client device. In some of these embodiments, the video analytics messages are used for client side video analytics processing.
In some of these embodiments, the video analytics metadata comprise pixel domain video analytics information. In some of these embodiments, the pixel domain video analytics information includes information received directly from an analog-to-digital front end. In some of these embodiments, the pixel domain video analytics information includes information received directly from an encoding engine as the engine is performing compression. In some of these embodiments, the video analytics messages include information related to one or more of a background model, a motion alarm, a virtual line detection and electronic image stabilization parameters. In some of these embodiments, the video analytics messages comprise video analytics messages related to a group of images, including messages related to one or more of a background frame, a foreground object segmentation descriptor, a camera parameter, a virtual line and a predefined motion alarm region.
In some of these embodiments, the video analytics messages comprise video analytics messages related to an individual video frame, including messages related to one or more of a global motion vector, a motion alarm region alarm status, a virtual line count, an object tracking parameter and a camera motion parameter. In some of these embodiments, the video analytics messages are transmitted to the client device in a layered structure network bitstream comprising an encoder-generated video bitstream and a portion of the video analytics metadata. In some of these embodiments, the video analytics messages and the portion of the video analytics metadata are transmitted in a supplemental enhancement information network abstraction layer package unit of an H.264 bitstream.
Certain embodiments of the invention provide video decoding systems and methods. Some of these embodiments comprise a decoder configured to extract a video frame and one or more video analytics messages from a network bitstream. In some of these embodiments, the video analytics messages provide information related to characteristics of the video frame. Some of these embodiments comprise one or more video processors configured to produce video analytics metadata related to the video frame based on content of the video frame and the video analytics messages.
In some of these embodiments, the video analytics metadata comprise pixel domain video analytics information received directly from an analog-to-digital front end. In some of these embodiments, the video analytics metadata comprise pixel domain video analytics information received directly from an encoding engine as the engine was performing compression. In some of these embodiments, the video analytics messages comprise video analytics messages related to a plurality of video frames, including messages related to one or more of a background frame, a foreground object segmentation descriptor, a camera parameter, a virtual line and a predefined motion alarm region. In some of these embodiments, the video analytics messages comprise video analytics messages related to an individual video frame, including messages related to one or more of a global motion vector, a motion alarm region alarm status, a virtual line count, an object tracking parameter and a camera motion parameter.
In some of these embodiments, the video analytics messages are received in a supplemental enhancement information network abstraction layer package unit of an H.264 bitstream. In some of these embodiments, the video analytics messages are received in a supplemental enhancement information network abstraction layer package unit of an H.264 bitstream together with a portion of the pixel domain video analytics information. In some of these embodiments, the one or more video processors are configured to produce a global motion vector. In some of these embodiments, the one or more video processors provide electronic image stabilization based on the video analytics messages. In some of these embodiments, the one or more video processors extract a background image for a plurality of video frames based on the video analytics messages. In some of these embodiments, the one or more video processors use the video analytics messages to monitor objects crossing a virtual line in a plurality of video frames.
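As a non-limiting illustration of monitoring objects crossing a virtual line, the following Python sketch counts crossings from a sequence of tracked object positions. It assumes that object tracking parameters delivered in the video analytics messages yield one centroid per frame for each object, and it uses a generic segment intersection test rather than any method stated in this specification. The function names crossing_direction and count_crossings are hypothetical.

```python
from typing import List, Tuple

Point = Tuple[float, float]


def _cross(o: Point, a: Point, b: Point) -> float:
    """Z component of the cross product (a - o) x (b - o)."""
    return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])


def crossing_direction(line_a: Point, line_b: Point, prev: Point, curr: Point) -> int:
    """Return +1 or -1 if the move prev -> curr properly crosses the virtual
    line segment line_a -> line_b, and 0 otherwise. The sign is the side of
    the directed line on which the object ends up. Touching the line without
    passing through it is not counted."""
    d1 = _cross(line_a, line_b, prev)
    d2 = _cross(line_a, line_b, curr)
    d3 = _cross(prev, curr, line_a)
    d4 = _cross(prev, curr, line_b)
    if d1 * d2 < 0 and d3 * d4 < 0:
        return 1 if d2 > 0 else -1
    return 0


def count_crossings(line_a: Point, line_b: Point, track: List[Point]) -> Tuple[int, int]:
    """Count crossings along one object's track of per-frame centroids.
    Returns (crossings ending on the positive side, crossings ending on the
    negative side)."""
    positive = negative = 0
    for prev, curr in zip(track, track[1:]):
        direction = crossing_direction(line_a, line_b, prev, curr)
        if direction > 0:
            positive += 1
        elif direction < 0:
            negative += 1
    return positive, negative
```

Keeping separate counts per direction allows, for example, entries and exits across a doorway to be tallied independently. A movement that passes beyond either end of the virtual line segment is ignored because the second pair of cross products does not change sign.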
Although the present invention has been described with reference to specific exemplary embodiments, it will be evident to one of ordinary skill in the art that various modifications and changes may be made to these embodiments without departing from the broader spirit and scope of the invention. Accordingly, the specification and drawings are to be regarded in an illustrative rather than a restrictive sense.