Disclosure of Invention
Embodiments of the present application provide an audio data processing method, an audio data processing apparatus, a terminal, and a storage medium, which enable a user to know, in real time, which object is currently making sound while audio is playing, and which reduce the workload of manual marking.
The embodiment of the application provides an audio data processing method, which comprises the following steps: acquiring target audio data, wherein sound in the target audio data comes from at least one recording object; determining at least one target audio track containing sound in the target audio data; acquiring a track mark of each target audio track in the target audio data; determining a recording object corresponding to each target audio track, and acquiring object information of the recording object; and setting a corresponding relation between the track mark of each target audio track and the object information of the recording object corresponding to each target audio track in the target audio data to obtain the marked audio data.
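As a minimal illustrative sketch (not the claimed implementation; every function and variable name here is hypothetical), the five steps above might be modeled as follows, using plain dictionaries in place of real audio tracks:

```python
# Hypothetical sketch of the claimed flow: keep only tracks that contain
# sound, assign each a track mark, and map marks to object information.
def process_audio(tracks, object_info):
    """tracks: track id -> list of samples;
    object_info: track id -> object information of the recording object."""
    # Steps 1-2: determine the target audio tracks that contain sound.
    target_tracks = {tid: s for tid, s in tracks.items()
                     if any(abs(x) > 0 for x in s)}
    # Step 3: acquire/assign a track mark (T1, T2, ...) per target track.
    marks = {tid: f"T{i + 1}" for i, tid in enumerate(sorted(target_tracks))}
    # Steps 4-5: set the correspondence between track marks and the object
    # information of each track's recording object ("marked" metadata).
    correspondence = {marks[tid]: object_info[tid] for tid in target_tracks}
    return marks, correspondence
```

In this sketch a silent track is simply dropped from the correspondence, matching the requirement that only tracks containing sound become target tracks.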
In an optional embodiment, before the acquiring of the track mark of each target audio track in the target audio data, the method further includes:
marking each target audio track based on the portion of each target audio track where sound appears.
In an alternative embodiment, the marking of each target audio track based on the portion of each target audio track where sound appears includes:
performing audio track analysis on the target audio data to obtain all audio tracks of the target audio data;
identifying at least one target audio track of the audio tracks containing sound;
receiving a mark setting instruction for a target audio track, wherein the mark setting instruction comprises an audio track mark of each target audio track;
and setting a corresponding track mark for each target audio track in the target audio data based on the mark setting instruction.
In an alternative embodiment, each target audio track includes at least one part where sound appears, and the marking of each target audio track based on the part of each target audio track where sound appears includes:
and respectively setting corresponding track marks for the parts of the target track where the sound appears.
In an optional embodiment, before the acquiring of the track mark of each target audio track in the target audio data, the method further includes:
and when the target audio data is recorded, setting a corresponding track mark for each part of the target audio track where sound appears.
In an optional embodiment, the obtaining target audio data, wherein the sound in the target audio data comes from at least one recording object, includes:
collecting sound output by at least one recording object through an audio recording device, wherein different audio recording devices are connected to different channels;
and taking the sound collected by each audio recording device as an audio data component, and synthesizing the target audio data based on the audio data component.
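A hedged sketch of this synthesis step, assuming each recording device contributes one equal-rate list of samples (the function name and the padding convention are illustrative assumptions, not the claimed implementation):

```python
def synthesize(components):
    """Combine per-device audio components into one multi-track structure.
    Each component keeps its own channel/track; shorter components are
    padded with silence so all tracks share a single timeline."""
    length = max(len(c) for c in components)
    return [c + [0] * (length - len(c)) for c in components]
```

Keeping each device on its own track (rather than mixing down to one waveform) is what later allows a track mark to identify a single recording object.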
In an optional embodiment, the setting, based on the object information corresponding to each target audio track, of a correspondence between the track mark of each target audio track and the object information of the recording object corresponding to each target audio track in the target audio data to obtain marked audio data includes:
determining a sound time period in each target audio track;
and setting a corresponding relation of a track mark, a sound time period and object information of each target track in the target audio data based on the object information corresponding to each target track to obtain marked audio data, wherein the corresponding relation is used for displaying the object information corresponding to the target track in the sound time period of the target track when the target audio data is played.
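One hypothetical way to represent this three-way correspondence (track mark, sound time period, object information) is a list of small records, one per target track; the record layout below is an assumption for illustration only:

```python
def build_correspondence(track_marks, sound_periods, object_info):
    """For each target track id, bundle its track mark, its sound time
    periods, and the object information of its recording object into one
    correspondence record stored alongside the audio data."""
    return [{"mark": track_marks[tid],
             "periods": sound_periods[tid],
             "info": object_info[tid]}
            for tid in sorted(track_marks)]
```

At playback time, a player can scan these records to find which object information to display during each sound time period.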
The embodiment of the present application further provides an audio marker display method, including: acquiring marked audio data, wherein sound in the marked audio data comes from at least one recording object, the marked audio data comprises a target audio track corresponding to the at least one recording object, and the marked audio data is provided with a corresponding relation between an audio track mark of the target audio track and object information of the recording object; identifying a track label for a target track contained in the labeled audio data; and selecting object information corresponding to the track mark of the target track from the corresponding relation based on the track mark of the target track, and displaying the object information.
In an optional embodiment, the selecting, based on the track mark of the target track, object information corresponding to the track mark of the target track from the correspondence for display includes:
determining a first target audio track of currently playing sound in the target audio tracks;
acquiring target object information of a recording object of the first target audio track based on the audio track mark of the first target audio track and the corresponding relation;
displaying the target object information until the first target track stops playing sound.
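The lookup described by these three steps might be sketched as follows, assuming the hypothetical correspondence records carry per-track sound time periods (all names are illustrative):

```python
def info_at(records, t):
    """Return the object information of every target track that is voiced
    at playback time t, according to its stored sound time periods."""
    return [r["info"] for r in records
            if any(start <= t < end for start, end in r["periods"])]
```

Calling this on every playback tick yields the object information to show, and it naturally disappears once the first target track's sound time period ends.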
In an optional embodiment, the selecting, based on the track mark of the target track, object information corresponding to the track mark of the target track from the correspondence for display includes:
acquiring object information of a recording object of the target audio track based on the audio track mark of the target audio track and the corresponding relation;
acquiring sound time periods in all target audio tracks based on the marked audio data, and determining sound playing time periods of all recording objects in the marked audio data based on the sound time periods;
displaying a playing progress bar of the marked audio data in a playing page, and determining the position of the sound playing time period of each recording object in the playing progress bar;
and displaying the object information of the corresponding recording object at each position in the playing page.
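Mapping each recording object's sound playing time period onto a position along the progress bar is simple proportional scaling; the sketch below assumes one period per object and hypothetical record/field names:

```python
def bar_positions(records, total_duration, bar_width):
    """Map each recording object's first sound time period onto
    progress-bar coordinates (start_x, end_x), so its object information
    can be displayed at that position on the playing page."""
    positions = {}
    for r in records:
        start, end = r["periods"][0]
        positions[r["info"]] = (start / total_duration * bar_width,
                                end / total_duration * bar_width)
    return positions
```

A UI layer would then anchor each object's avatar or nickname at the returned coordinates above the bar.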
In an optional embodiment, the correspondence includes: the corresponding relation of the track mark, the sound time period and the object information of each target track;
the obtaining of the sound time period in each target audio track based on the marked audio data comprises:
and acquiring the sound time period of each target audio track from the corresponding relation of the marked audio data.
In an optional embodiment, the selecting, based on the track mark of the target track, object information corresponding to the track mark of the target track from the correspondence for display includes:
displaying a playing progress bar of the marked audio data in a playing page, and determining the track mark of the target audio track corresponding to each adjustment moment of the playing progress bar;
and when the playing progress bar is adjusted, displaying the object information corresponding to the track mark at the current adjustment moment of the playing progress bar.
In an optional embodiment, the selecting, based on the track mark of the target track, object information corresponding to the track mark of the target track from the correspondence for display includes:
determining the object information of all recording objects as target object information, and displaying the target object information in a playing page;
and identifying a second target audio track of the current playing sound, and highlighting target object information corresponding to the second target audio track in the playing page.
In an optional embodiment, after the selecting, based on the track mark of the target track, object information corresponding to the track mark of the target track from the correspondence for display, the method further includes:
responding to a touch operation on the object information corresponding to the track mark of the target track, and determining the recording object corresponding to the object information operated on by the touch operation;
and acquiring the object description information of the recording object, and displaying the object description information.
An embodiment of the present application further provides an audio data processing apparatus, including:
a first acquisition unit, configured to acquire target audio data, where sound in the target audio data is from at least one recording object;
a first determination unit configured to determine at least one target track containing sound in the target audio data;
a second acquisition unit configured to acquire a track label of each target track in the target audio data;
a second determining unit, configured to determine the recording object corresponding to each target audio track and acquire the object information of the recording object;
and a marking unit, configured to set, in the target audio data, the correspondence between the track mark of each target audio track and the object information of the recording object corresponding to each target audio track, to obtain the marked audio data.
The embodiment of the present application further provides an audio marker display device, including:
an acquiring unit, configured to acquire marked audio data, wherein sound in the marked audio data comes from at least one recording object, the marked audio data includes a target audio track corresponding to the at least one recording object, and the marked audio data is provided with a correspondence between a track mark of the target audio track and object information of the recording object;
an identification unit configured to identify a track tag of a target track included in the tagged audio data;
and a display unit, configured to select, based on the track mark of the target track, object information corresponding to the track mark of the target track from the correspondence for display.
The embodiment of the present application further provides a terminal, which includes a memory, a processor, and a computer program stored in the memory and executable on the processor, wherein the processor implements the steps of the audio data processing method or the audio mark display method when executing the computer program.
Embodiments of the present application further provide a storage medium having a computer program stored thereon, where the computer program, when executed by a processor, implements the steps of the audio data processing method or the audio marker display method as described above.
The embodiment of the application provides an audio data processing method, an audio data processing apparatus, a terminal and a storage medium. With the method, target audio data can be acquired, wherein sound in the target audio data comes from at least one recording object; at least one target audio track containing sound is determined in the target audio data; a track mark of each target audio track in the target audio data is acquired; a recording object corresponding to each target audio track is determined, and object information of the recording object is acquired; and a corresponding relation between the track mark of each target audio track and the object information of the recording object corresponding to each target audio track is set in the target audio data to obtain the marked audio data. Therefore, the method and the apparatus enable the user to know, in real time, which object is currently making sound while the marked audio data is playing, and reduce the workload of manual marking.
Detailed Description
The technical solutions in the embodiments of the present application will be clearly and completely described below with reference to the drawings in the embodiments of the present application, and it is obvious that the described embodiments are only a part of the embodiments of the present application, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present application.
The embodiment of the application provides an audio data processing method, an audio data processing device, a terminal and a storage medium. In particular, the present embodiment provides an audio data processing method applicable to an audio data processing apparatus that can be integrated in a computer device.
The computer device may be a terminal or other device, such as a mobile phone, a tablet computer, a notebook computer, a desktop computer, or other device. The computer device may also be a device such as a server, and the server may be an independent physical server, a server cluster or a distributed system formed by a plurality of physical servers, or a cloud server providing basic cloud computing services such as a cloud service, a cloud database, cloud computing, a cloud function, cloud storage, a network service, cloud communication, middleware service, a domain name service, a security service, a CDN, and a big data and artificial intelligence platform, but is not limited thereto.
In an embodiment of the present application, an audio data processing method includes: acquiring target audio data, wherein sound in the target audio data comes from at least one recording object; determining at least one target audio track containing sound in the target audio data; acquiring a track mark of each target track in the target audio data; determining a recording object corresponding to each target audio track, and acquiring object information of the recording object; and setting the corresponding relation between the audio track mark of each target audio track and the object information of the recording object corresponding to each target audio track in the target audio data to obtain the marked audio data.
The following are detailed below. It should be noted that the following description of the embodiments is not intended to limit the preferred order of the embodiments.
Embodiments of the present application will be described from the perspective of an audio data processing apparatus, which may be specifically integrated in a computer device.
An embodiment of the present application provides an audio data processing method, and as shown in fig. 1, a flow of the audio data processing method may be as follows:
101. target audio data is obtained, and sound in the target audio data comes from at least one recording object.
In this embodiment of the present application, the terminal may be a terminal device with a recording function. After the terminal starts the recording function, the terminal collects the sound of at least one recording object as the target audio data of the present application, where the sound of the recording object may be, for example, a human voice or music played by the recording object.
The terminal may record sound through an audio recording device such as a microphone. The microphone collects the sound of at least one recording object; because different microphones are connected to different channels, the sound collected by each microphone may be used as one audio data component, and at least one audio data component is synthesized into the target audio data of the present application. The audio recording device may also be a MIDI (Musical Instrument Digital Interface) device or the like.
Optionally, a recorded audio file may also be obtained directly, without using an audio recording device for collection; the sound in the audio file comes from at least one recording object, and the file is used as the target audio data of the present application.
102. At least one target audio track containing sound in the target audio data is determined.
In the embodiment of the present application, the target audio data includes at least one target audio track containing sound, and the target audio track containing sound is determined in the target audio data. For example, each microphone records the sound it captures into its own track, so the output target audio data contains different target audio tracks, one for each microphone.
103. And acquiring the track mark of each target track in the target audio data.
In the embodiment of the application, after at least one target audio track containing sound in target audio data is determined, the terminal acquires an audio track mark of each target audio track.
Optionally, a track mark may already have been set automatically for each target track when the recording object was recorded, in which case the track mark of each target track in the target audio data may be obtained directly.
Optionally, after acquiring the target audio data, the terminal marks each target audio track based on the part of the target audio track where sound appears, so as to obtain the track mark of each target audio track. All audio tracks in the target audio data are obtained by performing audio track analysis on the target audio data. After the terminal identifies, among these audio tracks, at least one target audio track containing sound, the terminal needs to mark each target audio track: it receives a mark setting instruction for the target audio tracks, the mark setting instruction containing the track mark of each target audio track, and, based on the mark setting instruction, sets the corresponding track mark for each target audio track in the obtained target audio data.
Wherein, a target audio track comprises at least one part where sound appears, and corresponding audio track marks are respectively set for the parts where sound appears in the target audio track. For example, a track may have one or more portions that include sound and one or more portions that are relatively silent (which may be distinguished by setting a decibel threshold and/or a duration threshold), wherein the one or more portions that include sound are marked, i.e., a target track may include one or more track marks. For example, in a target audio track corresponding to a recording object, there are three parts including sound, wherein two parts are the voices of the recording object and correspond to one audio track label, and the other part is the music played by the recording object and corresponds to another audio track label.
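The decibel-threshold idea above for separating voiced parts from relatively silent parts might be sketched as follows; the amplitude threshold and all names are illustrative assumptions, not the claimed method:

```python
def sound_periods(samples, rate, amp_threshold=0.01):
    """Return (start, end) time periods, in seconds, where sound appears
    in a track; samples quieter than amp_threshold count as silence."""
    periods, start = [], None
    for i, x in enumerate(samples):
        voiced = abs(x) >= amp_threshold
        if voiced and start is None:
            start = i / rate                    # a sound part begins
        elif not voiced and start is not None:
            periods.append((start, i / rate))   # the sound part ends
            start = None
    if start is not None:                       # sound runs to track end
        periods.append((start, len(samples) / rate))
    return periods
```

A real implementation would likely also apply the duration threshold mentioned above, so that a brief dip below the decibel threshold does not split one sound part in two.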
The track marks may be set to T1, T2, T3, and so on, or may be set by a user; the present application does not limit the form of the track marks, and each target track is given a different track mark. If a target track includes multiple track marks, those marks are likewise distinct from one another.
104. And determining a recording object corresponding to each target audio track, and acquiring object information of the recording object.
In the embodiment of the present application, after completing the track marking for each target track, it is necessary to determine a recording object corresponding to each target track, and obtain object information of the recording object. For example, if recording is performed by using microphones, each microphone corresponds to a recording object, and different microphones are connected to different channels, that is, each microphone corresponds to an audio track, and the recording object corresponding to each target audio track can be determined by using different microphones. The object information of the recording object may be an avatar, a nickname, and the like of the recording object.
105. And setting the corresponding relation between the audio track mark of each target audio track and the object information of the recording object corresponding to each target audio track in the target audio data to obtain the marked audio data.
In the embodiment of the application, after the terminal sets the track marks of the target tracks and acquires the object information of the recording object, the corresponding relation between the track marks of the target tracks and the object information is set in the target audio data based on the object information corresponding to the target tracks. In the area of the terminal for storing the target audio data, a module is arranged for storing the corresponding relation between the audio track mark and the object information, and the target audio data and the corresponding relation module are output to obtain the marked audio data. And when the target audio data is played, acquiring the corresponding relation from the corresponding relation module, wherein the corresponding relation is used for displaying the object information corresponding to the target audio track with sound when the target audio data is played.
For example, if each target audio track is provided with only one track mark, and the acquired target audio data includes 3 target audio tracks, the 3 target audio tracks are marked T1, T2, and T3, respectively. Object information of the recording object corresponding to each target audio track is acquired: the recording object corresponding to target audio track T1 is A, with object information A1; the recording object corresponding to target audio track T2 is B, with object information B1; and the recording object corresponding to target audio track T3 is C, with object information C1. The correspondence between the track marks and the object information is then set as follows: track mark T1 corresponds to object information A1, track mark T2 corresponds to object information B1, and track mark T3 corresponds to object information C1. The set correspondence is stored in the terminal, and when the target audio data is played, the object information of the recording object corresponding to the target audio track with sound is displayed; for example, when the sound part of target audio track T1 is played, the object information A1 corresponding to T1 is displayed. It can be understood that, since the above example sets only one track mark per target track, a target track on which track mark T is set may simply be referred to as target track T.
Alternatively, multiple track marks may be set on a target track. For example, two target tracks are obtained: in target track 1, the part where the human voice appears is given track mark H11, and the part where played music appears is given track mark H12; in target track 2, the part where the human voice appears is given track mark D11, and the part where played music appears is given track mark D12. Object information of the recording object corresponding to each target audio track is acquired: the recording object corresponding to the human voice in target track 1 is a, with object information a11, and the object information corresponding to its music is a12; the recording object corresponding to the human voice in target track 2 is b, with object information b11, and the object information of its music is b12. The correspondence between the track marks and the object information is then set as follows: track mark H11 corresponds to object information a11, track mark H12 corresponds to object information a12, track mark D11 corresponds to object information b11, and track mark D12 corresponds to object information b12. The set correspondence is stored in the terminal, and when the target audio data is played, the object information corresponding to the track mark of the voiced part is displayed; for example, when music in target track 1 is played, the object information a12 corresponding to track mark H12 of the music part is displayed.
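The multi-mark example above might be encoded as a flat lookup table (a hypothetical encoding for illustration; the patent does not prescribe a data structure):

```python
# Each target track carries two track marks (voice part, music part),
# each mapped to distinct object information.
correspondence = {
    "H11": "a11",  # target track 1, human-voice part
    "H12": "a12",  # target track 1, played-music part
    "D11": "b11",  # target track 2, human-voice part
    "D12": "b12",  # target track 2, played-music part
}

def info_for(mark):
    """Select the object information to display for the mark of the
    currently voiced part; None if the mark is unknown."""
    return correspondence.get(mark)
```

During playback, the mark attached to the part currently producing sound selects which object information is shown.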
Alternatively, after the target tracks of the target audio data are determined, the sound time periods in the respective target tracks are determined. And setting the corresponding relation among the audio track mark, the sound time period and the object information of each target audio track in the target audio data based on the object information corresponding to each target audio track to obtain the marked audio data. The corresponding relation is used for displaying object information corresponding to the target audio track in the sound time period of the target audio track when the target audio data is played. Here, since sounds are not necessarily all present in the entire target track, and there may be a time period in which no sound is present, the sound time period refers to a time period in which sound is present in the target track.
It can be understood that, in the embodiment of the present application, the total length of each target audio track is the same, and at the same playing moment, sound may appear in one target audio track or in more than one. For example, when the audio data is played, if only one target audio track is identified as voiced, the corresponding object information is displayed; if more than one target audio track is identified as voiced, more than one piece of corresponding object information is displayed. More than one target audio track may be voiced when, for example, the recording objects perform a chorus while the audio data is recorded, or when more than one recording object speaks or plays music at the same time.
After the marked audio data is obtained, an audio mark display method is further provided in the embodiment of the application. In particular, the present embodiment provides an audio marker display method suitable for an audio marker display apparatus that may be integrated in a computer device. The computer device may be a terminal or other device, such as a mobile phone, a tablet computer, a notebook computer, a desktop computer, or other device.
In an embodiment of the present application, an audio marker display method includes: acquiring marked audio data, wherein sound in the marked audio data comes from at least one recording object, the marked audio data comprises a target audio track corresponding to the at least one recording object, and the marked audio data is provided with a corresponding relation between an audio track mark of the target audio track and object information of the recording object; identifying a track label for a target track contained in the labeled audio data; and selecting object information corresponding to the track mark of the target track from the corresponding relation based on the track mark of the target track, and displaying the object information.
The following are detailed descriptions. It should be noted that the following description of the embodiments is not intended to limit the preferred order of the embodiments.
Embodiments of the present application will be described in the context of an audio marker display device, which may be particularly integrated in a computer device.
An embodiment of the present application provides an audio marker display method, and as shown in fig. 2, a flow of the audio marker display method may be as follows:
201. acquiring marked audio data, wherein sound in the marked audio data comes from at least one recording object, the marked audio data comprises a target audio track corresponding to the at least one recording object, and the marked audio data is provided with a corresponding relation between an audio track mark of the target audio track and object information of the recording object.
The terminal provided in the embodiment of the present application has the function of playing audio, and generates a playing page, such as a player, in the process of playing audio. First, the marked audio data is obtained; the marked audio data produced by the above audio data processing method may be uploaded to the background of the terminal (for example, a player). The sound in the marked audio data comes from at least one recording object, the marked audio data includes a target audio track corresponding to the at least one recording object, and the marked audio data is provided with a correspondence between the track mark of the target audio track and the object information of the recording object.
202. A track marker for a target track contained in the marked audio data is identified.
In the embodiment of the application, the terminal receives a play starting instruction of a user, responds to a touch operation of the user on a play control on a play page, and starts to play the marked audio data, and in the process of playing the marked audio data, the terminal firstly identifies the audio track mark of a target audio track contained in the marked audio data.
203. And selecting object information corresponding to the track mark of the target track from the corresponding relation based on the track mark of the target track, and displaying the object information.
In the embodiment of the application, after the terminal identifies the track mark of the target track, the object information corresponding to the track mark of the target track is selected and displayed in the playing page of the terminal. The target object information can be selected from the object information corresponding to the audio track mark for display, and the target object information includes the object information corresponding to the target audio track of the currently played sound. It can be understood that, in the process of playing the marked audio data, the object information corresponding to all target audio tracks may be displayed in the playing page, or only the object information corresponding to the target audio track of the currently played sound may be displayed.
In the process of playing the marked audio data, the terminal determines, among the target audio tracks, a first target audio track corresponding to the currently played sound; acquires the target object information of the recording object corresponding to the first target audio track based on the track mark of the first target audio track and the correspondence between the track marks and the object information; and displays the target object information on the playing page until the first target audio track stops playing sound. The present application does not limit the position at which the target object information is displayed. In the process of playing the marked audio data, the terminal generates a playing progress; the playing progress may be displayed in the playing page in the form of a playing progress bar, or may be kept directly in the background without being displayed in the playing page. It can be understood that the playing progress is generally displayed in the playing page in the form of a playing progress bar, but it may also be displayed in other forms, which the present application does not limit.
Referring to fig. 3, fig. 3 is a schematic diagram illustrating display of target object information according to an embodiment of the present disclosure. As shown in fig. 3, in the process of playing the marked audio data by the terminal, a play progress bar 302 is displayed on a play page 301 of the terminal. The terminal starts to play the selected audio data in response to the user's touch operation on the play start control in the play controls 305; the shaded area of the play progress bar 302 represents the part of the audio data already played, and the object information 304 of the corresponding recording object A is displayed at the position 303 of the play progress bar corresponding to the currently played sound. The object information of the recording object need not be displayed at the position of the progress bar corresponding to the currently played sound; it may be displayed at any position on the play page 301, which the present application does not limit. In fig. 3, the play controls 305 are arranged in the middle of the bottom of the play page 301 and include a play start control and switch controls: through a touch operation on the play start control, the user can play or pause the selected audio data, and through a touch operation on the switch controls, the user can switch back and forth among the audio data to be played. The touch operation may include clicking, sliding, and the like. The position and form of the play controls 305 in fig. 3 are only examples; they may also be displayed at other positions or in other forms in the play page, which the present application does not limit.
In the process of playing the marked audio data, the terminal acquires the object information of the recording object of each target audio track based on the audio track mark of the target audio track and the correspondence between audio track marks and object information. The terminal acquires the sound time periods in the target audio tracks based on the marked audio data and determines, from those time periods, the sound playing time periods of each recording object in the marked audio data; that is, recognition is performed in real time on the terminal side to determine the sound time periods in the target audio tracks. The terminal then displays the playing progress bar of the marked audio data in the playing page, determines the position of each recording object's sound playing time period on the playing progress bar, and displays the object information of the corresponding recording object at each such position in the playing page. As shown in fig. 4, fig. 4 is another schematic diagram of displaying target object information provided in the embodiment of the present application. Referring to fig. 4, in the process of playing the marked audio, a play page 301 is displayed, and the object information of all recording objects is displayed in a display area on the play progress bar 302. The marked audio played this time has two recording objects, B and C; the position 401 of recording object B's sound playing time period on the playing progress bar and the position 402 of recording object C's sound playing time period on the playing progress bar are determined respectively, the object information 403 of recording object B is displayed throughout at position 401, and the object information 404 of recording object C is displayed throughout at position 402.
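The step of locating each recording object's sound playing time periods on the progress bar amounts to normalizing time intervals by the total duration. A minimal sketch, assuming durations in seconds and the hypothetical name `progress_bar_positions`:

```python
def progress_bar_positions(sound_periods, total_duration):
    """Map each recording object's sound periods, given as (start, end)
    pairs in seconds, to fractional positions in [0, 1] along the
    playing progress bar, keyed by recording object."""
    return {obj: [(start / total_duration, end / total_duration)
                  for start, end in periods]
            for obj, periods in sound_periods.items()}

# Recording object B speaks during 0-30 s and C during 30-60 s
# of a 60-second marked audio clip (values are illustrative).
pos = progress_bar_positions({"B": [(0, 30)], "C": [(30, 60)]}, 60)
```

The fractional positions can then be multiplied by the progress bar's pixel width to place each object's information, as positions 401 and 402 are placed in fig. 4.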
In the process of playing the marked audio data, the user can thus see the object information of all recording objects, drag the progress bar to select a recording object of interest according to his or her needs, and jump directly to the sound time period corresponding to that recording object. The number of recording objects in fig. 4 is only an example; there may be only one recording object, and the specific number of recording objects is not limited in this application.
When the user adjusts the progress bar, whether during playing or while paused, the object information corresponding to the audio track mark at the current adjustment position of the playing progress bar is displayed, so that the object information at any moment of the playing progress bar can be previewed.
Alternatively, displaying the object information of all recording objects in the play page 301 as shown in fig. 4 may cover two cases. In the first, after the user selects the audio to be played and the terminal starts playing it in response to a touch operation on the play control 305, the object information of the corresponding recording objects is displayed at the corresponding positions of the playing progress bar in the playing page. In the second, after the user selects the audio but before the terminal starts playing it, the audio has already been loaded in the background, so the terminal starts recognition to determine the sound time periods in the target tracks and then displays the object information of the corresponding recording objects at the corresponding positions of the playing progress bar in the playing page.
Optionally, the marked audio data includes one of two correspondences: the first is a correspondence between the audio track mark and the object information only, and the second is a correspondence among the audio track mark, the sound time period, and the object information of each target audio track. If the correspondence contained in the marked audio data acquired by the terminal is of the second type, the terminal acquires the sound time periods of the target audio tracks directly from that correspondence, determines the sound playing time period of each recording object in the marked audio data from those periods, then displays the playing progress bar of the marked audio data in the playing page, determines the position of each recording object's sound playing time period on the playing progress bar, and displays the object information of the corresponding recording object at each such position in the playing page.
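The two correspondences can be pictured as two data shapes; the sketch below is an assumed representation (field names such as `object_info` and `sound_periods` are hypothetical), showing how the terminal can read the sound time periods out of the second form directly.

```python
# First correspondence: audio track mark -> object information only.
first_correspondence = {
    "T1": {"nickname": "B"},
    "T2": {"nickname": "C"},
}

# Second correspondence: audio track mark -> object information plus the
# track's sound time periods, as (start, end) pairs in seconds.
second_correspondence = {
    "T1": {"object_info": {"nickname": "B"}, "sound_periods": [(0, 30)]},
    "T2": {"object_info": {"nickname": "C"}, "sound_periods": [(30, 60)]},
}

def sound_periods_from(correspondence):
    """Extract per-track sound periods when the second form is present;
    for the first form (no stored periods) this returns an empty dict,
    and the terminal would fall back to real-time recognition."""
    return {mark: entry["sound_periods"]
            for mark, entry in correspondence.items()
            if "sound_periods" in entry}
```

With the second form, no terminal-side recognition is needed at playback time, which is the trade-off the paragraph above describes.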
Optionally, the object information of all recording objects may be determined as the target object information and displayed in the play page, that is, the object information of all recording objects is displayed. A second target audio track, the one producing the currently played sound, is then identified, and the target object information corresponding to it is highlighted in the playing page; the highlighting method is not limited in this application and may be, for example, head portrait enlargement or nickname font darkening. If a playing progress bar is displayed when the audio is played, each piece of object information corresponding to each sound time period may be displayed in the display area of the playing progress bar, or at any position of the playing page; the target audio track of the currently played sound is then identified and its target object information is highlighted in the playing page. If no playing progress bar is displayed when the audio is played, the object information corresponding to each sound time period of the playing progress may be displayed at any position of the playing page, and the object information corresponding to the target audio track of the currently played sound is then highlighted.
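The "show all, highlight one" behaviour can be sketched as a pure function from the full set of object information plus the currently sounding track to per-object display states; the function name `render_play_page` and the state fields are illustrative only.

```python
def render_play_page(all_object_info, current_track_mark):
    """Return a display state for every recording object's info: all are
    shown, and the one whose track mark matches the currently sounding
    (second target) track is flagged for highlighting, e.g. head
    portrait enlargement or nickname font darkening."""
    return {mark: {"info": info, "highlighted": mark == current_track_mark}
            for mark, info in all_object_info.items()}

# Objects D, E, F are all displayed; E's track is currently sounding,
# matching the fig. 5 scenario (values are illustrative).
states = render_play_page({"TD": "D", "TE": "E", "TF": "F"}, "TE")
```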
As shown in fig. 5, fig. 5 is another schematic diagram of displaying target object information provided in this embodiment of the application. Referring to fig. 5, taking the display of a playing progress bar 302 in a playing page 301 as an example, the corresponding object information is displayed at the position corresponding to the track mark of each target track, and the object information corresponding to the currently played sound is displayed enlarged. The marked audio data played this time includes recording objects D, E, and F, whose object information 504, 505, and 506 is already displayed in the playing page. As can be seen from the playing progress bar 302, the current playing position lies in progress bar area 502; the terminal identifies the second target audio track of the currently played sound, that is, it recognizes that the audio track corresponding to the currently played sound corresponds to recording object E, and displays the object information of recording object E enlarged.
In the embodiment of the present application, since the play page has a size limitation, the object information displayed on it may be simple information about the recording object (such as a nickname, a head portrait, and the like). When the user wants to learn more about a recording object, the terminal, in response to a touch operation (such as clicking or sliding) on the target object information, determines the recording object corresponding to the target object information operated on, acquires the object description information of that recording object, and displays it. The object description information is detailed introduction information about the recording object, such as its works, age, and work experience. The acquired object description information may be displayed on the playing page, or a new page may be created to display it.
Referring to fig. 6, fig. 6 is a flowchart from audio data processing to playing according to an embodiment of the present disclosure. As shown in fig. 6, after acquiring target audio data, the terminal performs track identification on the tracks in the target audio data, identifies two target tracks containing human voice, and marks them with track marks T1 and T2, that is, track mark T1 corresponds to target track 1 and track mark T2 corresponds to target track 2; it then outputs an audio data file carrying the track marks. The terminal determines the recording objects of target track 1 and target track 2, acquires the object information of those recording objects, and, based on the object information corresponding to target tracks 1 and 2, sets in the target audio data the correspondence between track marks T1 and T2 and the object information, obtaining and outputting the marked audio data. The above process constitutes the recording and editing stages of the audio data; the audio data produced by these stages is the marked audio data, which can be uploaded to a terminal with an audio playing function for playback. The marked audio data is uploaded to the background of a playing device; during playback, the target track in which a voice occurs is identified, and the object information corresponding to that track is displayed on the playing page. Let the recording object of target track 1 be G: in fig. 6, the object information of recording object G is displayed on the playing page, indicating that the current playing progress corresponds to the voice of recording object G.
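The recording-and-editing stage of fig. 6 can be condensed into one short sketch: keep the tracks containing human voice, assign marks T1, T2, ..., and attach the mark-to-object-information correspondence. This is an illustrative outline only; `mark_audio_data`, the `has_voice` flag, and the track records are hypothetical.

```python
def mark_audio_data(tracks, object_info_by_track):
    """Sketch of the fig. 6 editing stage: keep only tracks containing
    human voice, assign track marks T1, T2, ..., and set the
    correspondence between each mark and its recording object's info."""
    target_tracks = [t for t in tracks if t["has_voice"]]
    correspondence = {}
    for i, track in enumerate(target_tracks, start=1):
        mark = f"T{i}"
        track["mark"] = mark
        correspondence[mark] = object_info_by_track[track["name"]]
    return {"tracks": target_tracks, "correspondence": correspondence}

# Two voice tracks and one background track; only the voice tracks
# become target tracks (names and values are illustrative).
tracks = [{"name": "track1", "has_voice": True},
          {"name": "track2", "has_voice": True},
          {"name": "bgm", "has_voice": False}]
marked = mark_audio_data(tracks, {"track1": "G", "track2": "H"})
```

The returned structure corresponds to the "marked audio data" that fig. 6 shows being uploaded to the playing device.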
Therefore, in the embodiment of the application, through the marking processing of the audio data, a user can know in real time, while the marked audio data is playing, which object is currently sounding and its information, so that the audio is visualized; and since the audio tracks are identified automatically, no manual proofreading is needed, which improves the display precision and reduces the workload of manual marking.
In order to better implement the above method, correspondingly, the embodiment of the present application further provides an audio data processing apparatus, which may be specifically integrated in a terminal, for example, in the form of a client.
Referring to fig. 7, the audio data processing apparatus includes a first acquisition unit 701, a first determination unit 702, a second acquisition unit 703, a second determination unit 704, and a marking unit 705:
a first obtaining unit 701, configured to obtain target audio data, where sound in the target audio data is from at least one recording object;
a first determining unit 702, configured to determine at least one target track containing sound in the target audio data;
a second obtaining unit 703, configured to obtain a track label of each target track in the target audio data;
a second determining unit 704, configured to determine a recording object corresponding to each target audio track, and obtain object information of the recording object;
a marking unit 705, configured to set, in the target audio data, a correspondence between the track mark of each target track and the object information of the recording object corresponding to that track, to obtain marked audio data.
In an optional embodiment, the second obtaining unit 703 is further configured to:
each target track is labeled based on the portion of sound that appears in the target track.
In an optional embodiment, the second obtaining unit 703 is further configured to:
carrying out audio track analysis on the target audio data to obtain all audio tracks of the target audio data;
identifying at least one target audio track of the audio tracks containing sound;
receiving a mark setting instruction for a target audio track, wherein the mark setting instruction comprises an audio track mark of each target audio track;
and setting a corresponding audio track mark for a target audio track in the target audio data based on the mark setting instruction.
In an optional embodiment, the target audio track includes at least one part where sound appears, and the second obtaining unit 703 is further configured to:
and respectively setting corresponding track marks for the parts of the target track where the sound appears.
In an optional embodiment, the second obtaining unit 703 is further configured to:
and when the target audio data is recorded, setting a corresponding track mark for a part where sound appears in the target track.
In an optional embodiment, the first obtaining unit 701 is further configured to:
collecting sound output by at least one recording object through audio recording devices, wherein different audio recording devices are connected to different channels;
and taking the sound collected by each audio recording device as an audio data component, and synthesizing the target audio data based on the audio data component.
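The multi-device capture described by the two steps above can be sketched as combining one component per device into multitrack target audio data; the function name `synthesize_target_audio` and the list-of-samples representation are illustrative assumptions, not an actual audio API.

```python
def synthesize_target_audio(device_captures):
    """Sketch of multi-device recording: each audio recording device
    feeds its own channel, and every captured channel becomes one
    audio data component (track) of the target audio data."""
    return {f"channel_{i}": samples
            for i, samples in enumerate(device_captures, start=1)}

# Two devices, each recording a different object; the sample values
# stand in for real PCM data and are purely illustrative.
target_audio = synthesize_target_audio([[0.1, 0.2], [0.0, 0.3]])
```

Because each recording object has its own device and channel, each resulting track contains at most one object's voice, which is what later makes per-track marking possible.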
In an optional embodiment, the marking unit 705 is further configured to:
determining a sound time period in each target audio track;
and setting a corresponding relation among the audio track mark, the sound time period and the object information of each target audio track in the target audio data based on the object information corresponding to each target audio track to obtain the marked audio data, wherein the corresponding relation is used for displaying the object information corresponding to the target audio track in the sound time period of the target audio track when the target audio data is played.
In order to better implement the above method, correspondingly, an embodiment of the present application further provides an audio mark display apparatus, which may be specifically integrated in a terminal.
Referring to fig. 8, the audio mark display apparatus includes an acquisition unit 801, a recognition unit 802, and a display unit 803:
an obtaining unit 801, configured to obtain marked audio data, where sound in the marked audio data comes from at least one recording object, the marked audio data includes a target audio track corresponding to the at least one recording object, and a correspondence between the audio track mark of the target audio track and the object information of the recording object is set in the marked audio data;
an identifying unit 802, configured to identify the track label of a target track contained in the marked audio data;
a display unit 803, configured to select and display, based on the track label of the target track, the object information corresponding to that track label from the correspondence.
In an optional embodiment, the display unit 803 is further configured to:
determining a first target audio track of currently playing sound in the target audio tracks;
acquiring target object information of a recording object of the first target audio track based on the audio track mark of the first target audio track and the corresponding relation;
displaying the target object information until the first target audio track stops playing sound.
In an optional embodiment, the display unit 803 is further configured to:
acquiring object information of a recording object of the target audio track based on the audio track mark of the target audio track and the corresponding relation;
acquiring sound time periods in all target audio tracks based on the marked audio data, and determining sound playing time periods of all recording objects in the marked audio data based on the sound time periods;
displaying the playing progress bar of the marked audio data in the playing page, and determining the position of each recording object's sound playing time period on the playing progress bar;
and displaying the object information of the corresponding recording object at each position in the playing page.
In an optional embodiment, the correspondence includes: the correspondence among the audio track mark, the sound time period, and the object information of each target audio track;
the display unit 803 is further configured to:
and acquiring the sound time period of each target audio track from the corresponding relation of the marked audio data.
In an optional embodiment, the display unit 803 is further configured to:
displaying the marked audio data playing progress bar in a playing page, and determining the audio track mark of the target audio track corresponding to each adjusting moment of the playing progress bar;
and when the playing progress bar is adjusted, displaying object information corresponding to the audio track mark corresponding to the current adjusting time of the playing progress bar.
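The adjustment-time lookup above reduces to finding which target track has a sound period containing the adjusted moment. A minimal sketch under the assumption that the second correspondence form (with stored sound periods) is available; `object_info_at_time` and the field names are hypothetical.

```python
def object_info_at_time(second_correspondence, t):
    """Find the target track whose sound period contains moment t
    (seconds) and return its object info, so the speaker at that
    progress-bar position can be previewed; None if no track sounds."""
    for mark, entry in second_correspondence.items():
        for start, end in entry["sound_periods"]:
            if start <= t < end:
                return entry["object_info"]
    return None

# Two tracks with stored periods (illustrative values in seconds).
corr = {"T1": {"object_info": "B", "sound_periods": [(0, 30)]},
        "T2": {"object_info": "C", "sound_periods": [(30, 60)]}}
```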
In an optional embodiment, the display unit 803 is further configured to:
determining the object information of all recorded objects as target object information, and displaying the target object information in a playing page;
and identifying a second target audio track of the currently played sound, and highlighting target object information corresponding to the second target audio track in the playing page.
In an optional embodiment, the display unit 803 is further configured to:
responding to a touch operation on object information corresponding to the track mark of the target track, and determining a recording object corresponding to the target object information operated by the touch operation;
and acquiring the object description information of the recording object and displaying the object description information.
Correspondingly, an embodiment of the present application further provides a terminal, which may be a terminal device such as a smart phone, a tablet computer, a notebook computer, a touch screen device, a game console, a Personal Computer (PC), or a Personal Digital Assistant (PDA). As shown in fig. 9, fig. 9 is a schematic structural diagram of a terminal according to an embodiment of the present application. The terminal 900 includes a processor 901 with one or more processing cores, a memory 902 with one or more computer-readable storage media, and a computer program stored in the memory 902 and executable on the processor. The processor 901 is electrically connected to the memory 902. Those skilled in the art will appreciate that the terminal structure shown in fig. 9 does not constitute a limitation of the terminal, which may include more or fewer components than those shown, combine some components, or arrange the components differently.
The processor 901 is the control center of the terminal 900. It connects the various parts of the entire terminal 900 through various interfaces and lines, and performs the various functions of the terminal 900 and processes data by running or loading software programs and/or modules stored in the memory 902 and calling data stored in the memory 902, thereby monitoring the terminal 900 as a whole.
In this embodiment, the processor 901 in the terminal 900 loads instructions corresponding to the processes of one or more application programs into the memory 902, and the processor 901 runs the application programs stored in the memory 902, thereby implementing various functions:
acquiring target audio data, wherein sound in the target audio data comes from at least one recording object; determining at least one target audio track containing sound in the target audio data; acquiring a track mark of each target track in the target audio data; determining a recording object corresponding to each target audio track, and acquiring object information of the recording object; and setting the correspondence between the track mark of each target audio track and the object information of the recording object corresponding to that track in the target audio data to obtain the marked audio data; or, alternatively,
acquiring marked audio data, wherein sound in the marked audio data comes from at least one recording object, the marked audio data comprises a target audio track corresponding to the at least one recording object, and the marked audio data is provided with a corresponding relation between an audio track mark of the target audio track and object information of the recording object; identifying a track marker for a target track contained in the marked audio data; and selecting object information corresponding to the track mark of the target track from the corresponding relation based on the track mark of the target track, and displaying the object information.
The above operations can be implemented in the foregoing embodiments, and are not described in detail herein.
Optionally, as shown in fig. 9, the terminal 900 further includes: a touch display screen 903, a radio frequency circuit 904, an audio circuit 905, an input unit 906, and a power supply 907. The processor 901 is electrically connected to the touch display screen 903, the radio frequency circuit 904, the audio circuit 905, the input unit 906, and the power supply 907.
The touch display screen 903 may be used to display a graphical user interface and receive operation instructions generated by a user acting on the graphical user interface. The touch display screen 903 may include a display panel and a touch panel. The display panel may be used to display information input by or provided to the user and the various graphical user interfaces of the terminal, which may be made up of graphics, text, icons, video, and any combination thereof. Alternatively, the display panel may be configured in the form of a Liquid Crystal Display (LCD), an Organic Light-Emitting Diode (OLED) display, or the like. The touch panel may be used to collect touch operations of the user on or near it (for example, operations performed on or near the touch panel with a finger, a stylus, or any other suitable object or accessory) and generate corresponding operation instructions, which execute the corresponding programs. Alternatively, the touch panel may include two parts: a touch detection device and a touch controller. The touch detection device detects the direction of the user's touch, detects the signal produced by the touch operation, and transmits the signal to the touch controller; the touch controller receives the touch information from the touch detection device, converts it into touch point coordinates, sends the coordinates to the processor 901, and can receive and execute commands sent by the processor 901. The touch panel may cover the display panel; when the touch panel detects a touch operation on or near it, the operation is transmitted to the processor 901 to determine the type of the touch event, and the processor 901 then provides a corresponding visual output on the display panel according to that type.
In the embodiment of the present application, the touch panel and the display panel may be integrated into the touch display screen 903 to realize the input and output functions. In some embodiments, however, the touch panel and the display panel may be implemented as two separate components to perform the input and output functions; that is, the touch display screen 903 may also serve as part of the input unit 906 to implement the input function. In this embodiment, the touch display screen 903 may be used to display the playback page.
The radio frequency circuit 904 may be used to transmit and receive radio frequency signals so as to establish wireless communication with a network device or another terminal and to exchange signals with that network device or terminal.
The audio circuit 905 may be used to provide an audio interface between the user and the terminal through a speaker and a microphone. The audio circuit 905 can transmit an electrical signal, converted from received audio data, to the speaker, which converts it into a sound signal for output. Conversely, the microphone converts a collected sound signal into an electrical signal, which is received by the audio circuit 905 and converted into audio data; the audio data is then processed by the processor 901 and either sent to another terminal through the radio frequency circuit 904 or output to the memory 902 for further processing. The audio circuit 905 may also include an earbud jack to allow a peripheral headset to communicate with the terminal.
The input unit 906 may be used to receive input numbers, character information, or user characteristic information (e.g., fingerprint, iris, or facial information), and to generate keyboard, mouse, joystick, optical, or trackball signal inputs related to user settings and function control.
The power supply 907 is used to supply power to the various components of the terminal 900. Optionally, the power supply 907 may be logically connected to the processor 901 through a power management system, so that charging, discharging, power consumption management, and the like are implemented through the power management system. The power supply 907 may also include one or more DC or AC power sources, a recharging system, a power failure detection circuit, a power converter or inverter, power status indicators, and other such components.
Although not shown in fig. 9, the terminal 900 may further include a camera, a sensor, a wireless fidelity module, a bluetooth module, etc., which are not described in detail herein.
In the foregoing embodiments, the descriptions of the respective embodiments have respective emphasis, and for parts that are not described in detail in a certain embodiment, reference may be made to related descriptions of other embodiments.
As can be seen from the above, the terminal provided in this embodiment may: acquiring target audio data, wherein sound in the target audio data comes from at least one recording object; determining at least one target audio track containing sound in the target audio data; acquiring an audio track mark of each target audio track in the target audio data; determining a recording object corresponding to each target audio track, and acquiring object information of the recording object; and setting the correspondence between the audio track mark of each target audio track and the object information of the recording object corresponding to that track in the target audio data to obtain the marked audio data; or, alternatively,
acquiring marked audio data, wherein sound in the marked audio data comes from at least one recording object, the marked audio data comprises a target audio track corresponding to the at least one recording object, and the marked audio data is provided with a corresponding relation between an audio track mark of the target audio track and object information of the recording object; identifying a track label for a target track contained in the labeled audio data; and selecting object information corresponding to the track mark of the target track from the corresponding relation based on the track mark of the target track, and displaying the object information.
It will be understood by those skilled in the art that all or part of the steps of the methods of the above embodiments may be performed by a program instructing the relevant hardware; the program may be stored in a computer-readable storage medium and loaded and executed by a processor.
To this end, the present application provides a computer-readable storage medium, in which a plurality of computer programs are stored, and the computer programs can be loaded by a processor to execute the steps of any one of the audio data processing methods or the audio mark display method provided by the embodiments of the present application. For example, the computer program may perform the steps of:
acquiring target audio data, wherein sound in the target audio data comes from at least one recording object; determining at least one target audio track containing sound in the target audio data; acquiring an audio track mark of each target audio track in the target audio data; determining a recording object corresponding to each target audio track, and acquiring object information of the recording object; and setting the correspondence between the audio track mark of each target audio track and the object information of the recording object corresponding to that track in the target audio data to obtain the marked audio data; or, alternatively,
acquiring marked audio data, wherein sound in the marked audio data comes from at least one recording object, the marked audio data comprises a target audio track corresponding to the at least one recording object, and the marked audio data is provided with a corresponding relation between an audio track mark of the target audio track and object information of the recording object; identifying a track label for a target track contained in the labeled audio data; and selecting object information corresponding to the track mark of the target track from the corresponding relation based on the track mark of the target track, and displaying the object information.
The above operations can be implemented in the foregoing embodiments, and are not described in detail herein.
The storage medium may include: a Read-Only Memory (ROM), a Random Access Memory (RAM), a magnetic disk, an optical disk, and the like.
Since the computer program stored in the storage medium may execute the steps of any audio data processing method or audio marker displaying method provided in the embodiments of the present application, the beneficial effects that any audio data processing method or audio marker displaying method provided in the embodiments of the present application can achieve can be achieved, for which details are shown in the foregoing embodiments and are not repeated herein.
The foregoing describes in detail an audio data processing method, an audio data processing apparatus, an audio data processing terminal, and a storage medium provided in the embodiments of the present application, and a specific example is applied in the present application to explain the principles and embodiments of the present application, and the description of the foregoing embodiments is only used to help understand the method and core ideas of the present application; meanwhile, for those skilled in the art, according to the idea of the present application, there may be variations in the specific embodiments and the application scope, and in summary, the content of the present specification should not be construed as a limitation to the present application.