CN106791916B

Movatterモバイル変換

Info

Publication number: CN106791916B
Application number: CN201611057391.6A
Authority: CN
Inventors: 宁可
Original assignee: Guangzhou Kugou Computer Technology Co Ltd
Current assignee: Guangzhou Kugou Computer Technology Co Ltd
Priority date: 2016-11-26
Filing date: 2016-11-26
Publication date: 2020-06-30
Anticipated expiration: 2036-11-26
Also published as: CN106791916A

Abstract

The invention discloses a method, a device and a system for recommending audio data, and belongs to the technical field of computers. The method comprises the following steps: receiving an audio recommendation instruction, and shooting an image to obtain a target image; performing image recognition on the target image to obtain object information of a target object contained in the target image; sending object information of the target object to a server; and receiving display information of the target audio data corresponding to the object information of the target object sent by the server, and displaying the display information of the target audio data. By adopting the invention, the flexibility of recommending the audio data can be improved.

Description

Method, device and system for recommending audio data

Technical Field

The present invention relates to the field of computers, and in particular, to a method, an apparatus, and a system for recommending audio data.

Background

With the development of terminal technology and audio processing technology, listening to various audio data (such as songs, news, stories, etc.) by using terminals such as mobile phones has become a habit for many people.

In the prior art, a method for recommending audio data to a user is proposed, and generally, a server on a network side pushes some very popular audio data to a terminal of the user, and the audio data is displayed to the user for the user to select to play.

In the process of implementing the invention, the inventor finds that the prior art has at least the following problems:

the audio data are recommended in the above way, and the recommended audio data are the same for different users, so that personalized recommendation cannot be performed based on different requirements of different users, and the flexibility of recommending the audio data is poor.

Disclosure of Invention

In order to solve the problems in the prior art, embodiments of the present invention provide a method, an apparatus, and a system for recommending audio data. The technical scheme is as follows:

in a first aspect, a method for recommending audio data is provided, the method comprising:

receiving an audio recommendation instruction, and shooting an image to obtain a target image;

performing image recognition on the target image to obtain object information of a target object contained in the target image;

sending object information of the target object to a server;

and receiving display information of the target audio data corresponding to the object information of the target object sent by the server, and displaying the display information of the target audio data.

Optionally, after the displaying the display information of the target audio data, the method further includes:

when a selection instruction of display information of first audio data in the display information corresponding to the target audio data is received, sending an acquisition request of the first audio data to the server;

and receiving the first audio data sent by the server, and playing the first audio data.

sending an acquisition request of the target audio data to the server;

and receiving the target audio data sent by the server, and playing the target audio data.

Optionally, the article information is text information, article type information, or article image information.

In a second aspect, a method of recommending audio data is provided, the method comprising:

receiving object information of a target object sent by a terminal;

determining target audio data corresponding to the object information of the target object according to a corresponding relation between pre-stored object information and audio data;

and sending the display information of the target audio data to the terminal.

Optionally, after sending the display information of the target audio data to the terminal, the method further includes:

receiving an acquisition request of first audio data in the target audio data sent by the terminal;

and sending the first audio data to the terminal.

receiving an acquisition request of the target audio data sent by the terminal;

and sending the target audio data to the terminal.

In a third aspect, a terminal is provided, where the terminal includes:

the shooting module is used for receiving the audio recommendation instruction, shooting images and acquiring target images;

the identification module is used for carrying out image identification on the target image to obtain object information of a target object contained in the target image;

the sending module is used for sending the object information of the target object to a server;

and the receiving module is used for receiving the display information of the target audio data corresponding to the object information of the target object sent by the server and displaying the display information of the target audio data.

Optionally, the sending module is further configured to send, when receiving a selection instruction of display information of first audio data in display information corresponding to the target audio data, an acquisition request of the first audio data to the server;

the receiving module is further configured to receive the first audio data sent by the server;

the terminal also comprises a playing module used for playing the first audio data.

Optionally, the sending module is further configured to send an acquisition request of the target audio data to the server;

the receiving module is further configured to receive the target audio data sent by the server;

the terminal also comprises a playing module used for playing the target audio data.

In a fourth aspect, a server is provided, the server comprising:

the receiving module is used for receiving object information of a target object sent by the terminal;

the determining module is used for determining target audio data corresponding to the object information of the target object according to the corresponding relation between the pre-stored object information and the audio data;

and the sending module is used for sending the display information of the target audio data to the terminal.

Optionally, the receiving module is further configured to receive an obtaining request of first audio data in the target audio data sent by the terminal;

the sending module is further configured to send the first audio data to the terminal.

Optionally, the receiving module is further configured to receive an acquisition request of the target audio data sent by the terminal;

the sending module is further configured to send the target audio data to the terminal.

In a fifth aspect, a system for recommending audio data is provided, the system comprising a terminal and a server, wherein:

the terminal is used for receiving the audio recommendation instruction, shooting images and acquiring target images; performing image recognition on the target image to obtain object information of a target object contained in the target image; sending object information of the target object to the server; receiving display information of target audio data corresponding to the object information of the target object sent by the server, and displaying the display information of the target audio data;

the server is used for receiving the object information of the target object sent by the terminal; determining target audio data corresponding to the object information of the target object according to a corresponding relation between pre-stored object information and audio data; and sending the display information of the target audio data to the terminal.

The technical scheme provided by the embodiment of the invention has the following beneficial effects:

in the embodiment of the invention, an audio recommendation instruction is received, image shooting is carried out, a target image is obtained, the target image is subjected to image recognition, object information of a target object contained in the target image is obtained, the object information of the target object is sent to a server, display information of target audio data corresponding to the object information of the target object sent by the server is received, and the display information of the target audio data is displayed. Therefore, related audio can be recommended based on the object shot by the user, and the flexibility of recommending audio data can be improved.

Drawings

In order to more clearly illustrate the technical solutions in the embodiments of the present invention, the drawings needed to be used in the description of the embodiments will be briefly introduced below, and it is obvious that the drawings in the following description are only some embodiments of the present invention, and it is obvious for those skilled in the art to obtain other drawings based on these drawings without creative efforts.

FIG. 1 is a diagram of a system architecture according to an embodiment of the present invention;

FIG. 2 is a flowchart illustrating a method for recommending audio data according to an embodiment of the present invention;

FIG. 3 is a flowchart illustrating a method for recommending audio data according to an embodiment of the present invention;

FIG. 4 is a flowchart illustrating a method for recommending audio data according to an embodiment of the present invention;

FIG. 5 is a schematic view of an interface provided by an embodiment of the present invention;

fig. 6 is a schematic structural diagram of a terminal according to an embodiment of the present invention;

fig. 7 is a schematic structural diagram of a server according to an embodiment of the present invention;

fig. 8 is a schematic structural diagram of a terminal according to an embodiment of the present invention;

fig. 9 is a schematic structural diagram of a server according to an embodiment of the present invention.

Detailed Description

In order to make the objects, technical solutions and advantages of the present invention more apparent, embodiments of the present invention will be described in detail with reference to the accompanying drawings.

The embodiment of the invention provides a method for recommending audio data, which can be realized by a terminal and a server together. As shown in fig. 1, the system architecture of the method may be composed of a terminal and a server. The terminal can be a mobile terminal such as a mobile phone and a tablet personal computer, and can be provided with a music playing application program. The server may be a background server for the music playing application.

The server may include a processor, memory, transceiver, etc. The processor, which may be a CPU (central processing Unit), may be configured to determine target audio data corresponding to the object information of the target object according to a correspondence relationship between pre-stored object information and audio data, and perform other processing. The Memory may be a RAM (Random Access Memory), a Flash Memory, or the like, and may be configured to store received data, data required by the processing procedure, data generated in the processing procedure, or the like, such as a correspondence between object information and audio data, target audio data, or the like. A transceiver, which may be used for data transmission with a terminal or other server, for example, to transmit targeted audio data to the terminal, may include an antenna, matching circuitry, a modem, and the like.

The terminal may include a processor, memory, transceiver, image capture component, display component, audio output component, and the like. And the processor can be a CPU and the like. The Memory may be a RAM (Random Access Memory), a Flash Memory, or the like, and may be configured to store received data, data required by the processing procedure, data generated in the processing procedure, or the like, such as a target image, object information of a target object, display information of target audio data, or the like. The transceiver may be used for data transmission with the server, for example, may receive target audio data sent by the server, may send object information of a target object to the server, and may include an antenna, a matching circuit, a modem, and the like. The image capturing means may be a camera for capturing an image of the object. The display component may be used to display presentation information, target images, and the like. The audio output component may be a sound box, an earphone, etc., and may be used to play the target audio data. The terminal may also include an input component, an audio detection component, and the like. The input means may be a touch screen, keyboard, mouse, etc. And the audio detection component can be a microphone and the like.

As shown in fig. 2, the processing flow of the terminal in the method may include the following steps:

step 201, receiving an audio recommendation instruction, and performing image shooting to obtain a target image.

Step 202, performing image recognition on the target image to obtain object information of the target object included in the target image.

Step 203, sending object information of the target object to the server.

And 204, receiving display information of the target audio data corresponding to the object information of the target object sent by the server, and displaying the display information of the target audio data.

As shown in fig. 3, the processing flow of the server in the method may include the following steps:

step 301, receiving object information of a target object sent by a terminal.

Step 302, determining target audio data corresponding to the object information of the target object according to the pre-stored correspondence between the object information and the audio data.

Step 303, sending the display information of the target audio data to the terminal.

As shown in fig. 4, the process of interacting and processing between the terminal and the server in the method may include the following steps:

step 401, the terminal receives an audio recommendation instruction, performs image shooting, and acquires a target image.

The audio in the present embodiment may be music, news, stories, etc.

In implementation, a user may install an audio playback application in the terminal. The audio playback application may be turned on when the user wants to listen to audio. The audio playing application may be provided with a "scan recommended audio" function, as shown in fig. 5, which provides a way for the user to obtain relevant audio by scanning an object. The user can click the function key of 'scanning and recommending audio', and the terminal can receive the audio recommending instruction. At this time, the terminal can be triggered to start the image shooting component to shoot images. At this time, the terminal may display a rectangular frame in the screen for prompting the user to control the image of the object desired to be scanned within the rectangular frame, as shown in fig. 5. Meanwhile, the terminal can display the confirmed key, and when the terminal detects a click instruction of the confirmed key, the terminal can acquire a target image shot at the current moment. Or, the terminal may perform object detection during continuous image capturing, and acquire a current target image when a certain type of object (such as a text, a real object, and the like) specified in advance is detected. Or, when the terminal receives the audio recommendation instruction, the terminal can directly acquire the target image obtained by image shooting at the current moment.

Step 402, the terminal performs image recognition on the target image to obtain object information of the target object included in the target image.

The article information is text information, article type information or article image information. The item type information may be apples, televisions, tables, glasses, etc. The item image information may be the entire target image containing the target item or a portion of the target image containing the target item.

In implementation, after the terminal acquires the target image, the terminal can identify different types of content in the target image in different working modes. If the working mode is a text recognition mode, the target object can be regarded as text content, and the terminal can perform text recognition on the target image to acquire a text contained in the target image. If the working mode is an article type identification mode, the terminal can determine the article type based on a preset object type identification algorithm (which can be an existing third-party identification algorithm), and in the specific algorithm, the terminal can pre-store a large number of image features corresponding to different object types, perform image feature extraction on the target image, determine which object type the target image conforms to, and further determine the object type. If the operation mode is the image obtaining mode, the whole target image may be used as the object information of the target object, or a partial image including the target object in the target image may be used as the object information of the target object.

In the audio playing application program, a working mode option may be set, and a user may select to start one of the working modes. When a user wants to find related audio based on a certain text, the text recognition mode can be turned on, for example, the user can watch the name of a certain TV play displayed on a TV and want to search related songs, or the user can watch a bullet screen comment issued by a certain person on a computer and want to search related articles. The item type recognition mode may be enabled when the user wants to search for relevant audio based on an object, for example, if the user sees a fruit that has not been seen, wants to find an introduction to the fruit, or if the user sees an object, wants to search for a song about the object, the item type recognition mode may be enabled. The user may start the image capturing mode when he wants to find the relevant audio based on a certain picture, for example, he may start the image capturing mode when he sees a poster of a new album of a certain singer and wants to listen to a song in the album, or he may start the image capturing mode when he sees a cover of a certain book and wants to listen to an audio version of the book.

Instep 403, the terminal sends object information of the target object to the server.

Instep 404, the server receives the object information of the target object sent by the terminal.

The above data transmission modes can be various, and the embodiment will not be described in detail.

Step 405, the server determines target audio data corresponding to the object information of the target object according to the correspondence between the pre-stored object information and the audio data.

In implementation, technicians can set audio data related to different article information in advance, establish a corresponding relation between the object information and the audio data, and store the object information and the audio data in a server. For example, the article information is text information "city of sky", and the corresponding audio data may be set as original sound music of song "city of sky", animation "city of sky", introduction audio of animation "city of sky", and the like. For another example, the article information is an apple, and the corresponding audio data may be set to be a song "small apple", some songs related to the apple, introduction audio of related knowledge of the apple, and the like. For another example, the article information is a poster of a certain movie, and the corresponding audio data may be set to be an original sound of the movie, an introduction audio of the movie, related news of the movie, and the like. Each item information may correspond to one or more audio data.

After the server receives the object information of the target object sent by the terminal, the server can search the target audio data corresponding to the object information of the target object in the corresponding relationship.

Step 406, the server sends the display information of the target audio data to the terminal.

In implementation, after the server finds the target audio data, the server may further obtain display information of each locally stored target audio data, where the display information may include a name of the target audio data, and may also include some summary information, related pictures, and the like. Further, the server may transmit presentation information of each target audio data to the terminal.

Step 407, the terminal receives the display information of the target audio data corresponding to the object information of the target object sent by the server, and displays the display information of the target audio data.

In implementation, after receiving the presentation information sent by the server, the terminal may display an audio recommendation list, as shown in fig. 5, display an option of each target audio data in the audio recommendation list, and display corresponding presentation information at a position of the option of each target audio data, so that the user may browse and select to play the presentation information.

In the embodiment of the present invention, the recommended target audio data may be played in various ways, and several feasible processing ways are given as follows:

in the first mode, a user selects audio data to be played from all target audio data, and accordingly, after displaying presentation information of the target audio data, the following processing may be performed:

step one, when a selection instruction of the display information of the first audio data in the display information corresponding to the target audio data is received, the terminal sends an acquisition request of the first audio data to the server.

And step two, the server receives an acquisition request of first audio data in the target audio data sent by the terminal.

And step three, the server sends the first audio data to the terminal.

And step four, the terminal receives the first audio data sent by the server and plays the first audio data.

In implementation, after the terminal displays the audio recommendation list, the user may browse the options of each target audio data displayed therein. In some cases, the recommended target audio data may be quintic, for example, in the above-described manner of determining audio data based on object type information, some songs and item introduction audio may be determined, and news may be provided. The user can search the audio recommendation list for the audio data which the user wants to play. Then, in the audio recommendation list, the option of the audio data (i.e., the first audio data) is clicked, at this time, the terminal receives the corresponding selection instruction, the terminal may be triggered to send an acquisition request of the first audio data to the server, and the server may send the first audio data to the terminal after receiving the acquisition request. After receiving the first audio data, the terminal can automatically play the first audio data.

In the second mode, all the target audio data are played, and accordingly, after the display information of the target audio data is displayed, the following processing may be performed:

step one, a terminal sends an acquisition request of target audio data to a server.

And step two, the server receives an acquisition request of the target audio data sent by the terminal.

And step three, the server sends the target audio data to the terminal.

And step four, the terminal receives the target audio data sent by the server and plays the target audio data.

In implementation, after the terminal displays the audio recommendation list, a play key may be displayed corresponding to the audio recommendation list. In some cases, the recommended target audio data may be highly relevant and may be the audio data desired by the user, for example, in the above manner of determining the audio data based on the article image information, if the user scans the poster of a song album, the recommended audio data may be all the songs of the song album. At this time, the user may click the play key if he wants to play all the target audio, and at this time, the terminal may receive a play instruction corresponding to all the target audio data, and may trigger the terminal to send an acquisition request of the target audio data to the server. After receiving the acquisition request, the server may send all the target audio data to the terminal. After receiving the target audio data, the terminal can play the target audio data sequentially or randomly.

Based on the same technical concept, an embodiment of the present invention further provides a terminal, as shown in fig. 6, where the terminal includes:

theshooting module 610 is used for receiving the audio recommendation instruction, shooting images and acquiring target images;

theidentification module 620 is configured to perform image identification on the target image to obtain object information of a target object included in the target image;

a sendingmodule 630, configured to send object information of the target object to a server;

the receivingmodule 640 is configured to receive display information of the target audio data corresponding to the object information of the target object sent by the server, and display the display information of the target audio data.

Optionally, the sendingmodule 630 is further configured to send, when receiving a selection instruction of display information of first audio data in display information corresponding to the target audio data, an acquisition request of the first audio data to the server;

the receivingmodule 640 is further configured to receive the first audio data sent by the server;

Optionally, the sendingmodule 630 is further configured to send an obtaining request of the target audio data to the server;

the receivingmodule 640 is further configured to receive the target audio data sent by the server;

Based on the same technical concept, an embodiment of the present invention further provides a server, as shown in fig. 7, where the server includes:

areceiving module 710, configured to receive object information of a target object sent by a terminal;

a determiningmodule 720, configured to determine, according to a correspondence between pre-stored object information and audio data, target audio data corresponding to the object information of the target object;

a sendingmodule 730, configured to send the display information of the target audio data to the terminal.

Optionally, the receivingmodule 710 is further configured to receive an obtaining request of first audio data in the target audio data sent by the terminal;

the sendingmodule 730 is further configured to send the first audio data to the terminal.

Optionally, the receivingmodule 710 is further configured to receive an obtaining request of the target audio data sent by the terminal;

the sendingmodule 730 is further configured to send the target audio data to the terminal.

Based on the same technical concept, the embodiment of the invention also provides a system for recommending audio data, which comprises a terminal and a server, wherein:

With regard to the apparatus in the above-described embodiment, the specific manner in which each module performs the operation has been described in detail in the embodiment related to the method, and will not be elaborated here.

It should be noted that: in the terminal and the server provided in the above embodiments, when recommending audio data, only the division of the above functional modules is used as an example, and in practical applications, the above function distribution may be completed by different functional modules according to needs, that is, the internal structures of the terminal and the server are divided into different functional modules to complete all or part of the above described functions. In addition, the terminal, the server and the method for recommending audio data provided by the above embodiments belong to the same concept, and specific implementation processes thereof are described in the method embodiments and are not described herein again.

Referring to fig. 8, a schematic structural diagram of a terminal according to an embodiment of the present invention is shown, where the terminal may be used to implement the method for recommending audio data provided in the above-described embodiment. Specifically, the method comprises the following steps:

the terminal 1200 may include components such as an RF (Radio Frequency)circuit 110, amemory 120 including one or more computer-readable storage media, aninput unit 130, adisplay unit 140, asensor 150, anaudio circuit 160, a WiFi (wireless fidelity)module 170, aprocessor 180 including one or more processing cores, and apower supply 190. Those skilled in the art will appreciate that the terminal structure shown in fig. 8 is not intended to be limiting and may include more or fewer components than those shown, or some components may be combined, or a different arrangement of components. Wherein:

theRF circuit 110 may be used for receiving and transmitting signals during information transmission and reception or during a call, and in particular, receives downlink information from a base station and then sends the received downlink information to the one ormore processors 180 for processing; in addition, data relating to uplink is transmitted to the base station. In general, theRF circuitry 110 includes, but is not limited to, an antenna, at least one Amplifier, a tuner, one or more oscillators, a Subscriber Identity Module (SIM) card, a transceiver, a coupler, an LNA (Low Noise Amplifier), a duplexer, and the like. In addition, theRF circuitry 110 may also communicate with networks and other devices via wireless communications. The wireless communication may use any communication standard or protocol, including but not limited to GSM (Global System for Mobile communications), GPRS (General Packet Radio Service), CDMA (Code Division Multiple Access), WCDMA (Wideband Code Division Multiple Access), LTE (Long Term Evolution), e-mail, SMS (short messaging Service), etc.

Thememory 120 may be used to store software programs and modules, and theprocessor 180 executes various functional applications and data processing by operating the software programs and modules stored in thememory 120. Thememory 120 may mainly include a storage program area and a storage data area, wherein the storage program area may store an operating system, an application program required by at least one function (such as a sound playing function, an image playing function, etc.), and the like; the storage data area may store data (such as audio data, a phonebook, etc.) created according to the use of the terminal 1200, and the like. Further, thememory 120 may include high speed random access memory, and may also include non-volatile memory, such as at least one magnetic disk storage device, flash memory device, or other volatile solid state storage device. Accordingly, thememory 120 may further include a memory controller to provide theprocessor 180 and theinput unit 130 with access to thememory 120.

Theinput unit 130 may be used to receive input numeric or character information and generate keyboard, mouse, joystick, optical or trackball signal inputs related to user settings and function control. In particular, theinput unit 130 may include a touch-sensitive surface 131 as well asother input devices 132. The touch-sensitive surface 131, also referred to as a touch display screen or a touch pad, may collect touch operations by a user on or near the touch-sensitive surface 131 (e.g., operations by a user on or near the touch-sensitive surface 131 using a finger, a stylus, or any other suitable object or attachment), and drive the corresponding connection device according to a predetermined program. Alternatively, the touchsensitive surface 131 may comprise two parts, a touch detection means and a touch controller. The touch detection device detects the touch direction of a user, detects a signal brought by touch operation and transmits the signal to the touch controller; the touch controller receives touch information from the touch sensing device, converts the touch information into touch point coordinates, sends the touch point coordinates to theprocessor 180, and can receive and execute commands sent by theprocessor 180. Additionally, the touch-sensitive surface 131 may be implemented using various types of resistive, capacitive, infrared, and surface acoustic waves. In addition to the touch-sensitive surface 131, theinput unit 130 may also includeother input devices 132. In particular,other input devices 132 may include, but are not limited to, one or more of a physical keyboard, function keys (such as volume control keys, switch keys, etc.), a trackball, a mouse, a joystick, and the like.

Thedisplay unit 140 may be used to display information input by or provided to a user and various graphical user interfaces of the terminal 1200, which may be made up of graphics, text, icons, video, and any combination thereof. TheDisplay unit 140 may include aDisplay panel 141, and optionally, theDisplay panel 141 may be configured in the form of an LCD (Liquid Crystal Display), an OLED (Organic Light-Emitting Diode), or the like. Further, the touch-sensitive surface 131 may cover thedisplay panel 141, and when a touch operation is detected on or near the touch-sensitive surface 131, the touch operation is transmitted to theprocessor 180 to determine the type of the touch event, and then theprocessor 180 provides a corresponding visual output on thedisplay panel 141 according to the type of the touch event. Although in FIG. 8, touch-sensitive surface 131 anddisplay panel 141 are shown as two separate components to implement input and output functions, in some embodiments, touch-sensitive surface 131 may be integrated withdisplay panel 141 to implement input and output functions.

The terminal 1200 can also include at least onesensor 150, such as a light sensor, a motion sensor, and other sensors. Specifically, the light sensor may include an ambient light sensor that may adjust the brightness of thedisplay panel 141 according to the brightness of ambient light, and a proximity sensor that may turn off thedisplay panel 141 and/or a backlight when the terminal 1200 is moved to the ear. As one of the motion sensors, the gravity acceleration sensor can detect the magnitude of acceleration in each direction (generally, three axes), can detect the magnitude and direction of gravity when the mobile phone is stationary, and can be used for applications of recognizing the posture of the mobile phone (such as horizontal and vertical screen switching, related games, magnetometer posture calibration), vibration recognition related functions (such as pedometer and tapping), and the like; as for other sensors such as a gyroscope, a barometer, a hygrometer, a thermometer, and an infrared sensor, which may be further configured on the terminal 1200, detailed descriptions thereof are omitted.

Audio circuitry 160,speaker 161, andmicrophone 162 may provide an audio interface between a user and terminal 1200. Theaudio circuit 160 may transmit the electrical signal converted from the received audio data to thespeaker 161, and convert the electrical signal into a sound signal for output by thespeaker 161; on the other hand, themicrophone 162 converts the collected sound signal into an electric signal, converts the electric signal into audio data after being received by theaudio circuit 160, and then outputs the audio data to theprocessor 180 for processing, and then to theRF circuit 110 to be transmitted to, for example, another terminal, or outputs the audio data to thememory 120 for further processing. Theaudio circuitry 160 may also include an earbud jack to provide communication of peripheral headphones with theterminal 1200.

WiFi belongs to a short-distance wireless transmission technology, and the terminal 1200 may help a user to send and receive e-mails, browse webpages, access streaming media, and the like through theWiFi module 170, and provide the user with wireless broadband internet access. Although fig. 8 shows theWiFi module 170, it is understood that it does not belong to the essential constitution of the terminal 1200, and may be omitted entirely as needed within the scope not changing the essence of the invention.

Theprocessor 180 is a control center of the terminal 1200, connects various parts of the entire mobile phone using various interfaces and lines, and performs various functions of the terminal 1200 and processes data by operating or executing software programs and/or modules stored in thememory 120 and calling data stored in thememory 120, thereby performing overall monitoring of the mobile phone. Optionally,processor 180 may include one or more processing cores; preferably, theprocessor 180 may integrate an application processor, which mainly handles operating systems, user interfaces, application programs, etc., and a modem processor, which mainly handles wireless communications. It will be appreciated that the modem processor described above may not be integrated into theprocessor 180.

The terminal 1200 also includes a power supply 190 (e.g., a battery) for powering the various components, which may preferably be logically coupled to theprocessor 180 via a power management system to manage charging, discharging, and power consumption management functions via the power management system. Thepower supply 190 may also include any component including one or more of a dc or ac power source, a recharging system, a power failure detection circuit, a power converter or inverter, a power status indicator, and the like.

Although not shown, the terminal 1200 may further include a camera, a bluetooth module, and the like, which will not be described herein. Specifically, in this embodiment, the display unit of the terminal 1200 is a touch screen display, and the terminal 1200 further includes a memory and one or more programs, where the one or more programs are stored in the memory and configured to be executed by the one or more processors, and the one or more programs include instructions for:

sending object information of the target object to a server;

sending an acquisition request of the target audio data to the server;

Fig. 9 is a schematic structural diagram of a server according to an embodiment of the present invention. Theserver 1900, which may vary widely in configuration or performance, may include one or more Central Processing Units (CPUs) 1922 (e.g., one or more processors) andmemory 1932, one or more storage media 1930 (e.g., one or more mass storage devices)storing applications 1942 ordata 1944.Memory 1932 andstorage medium 1930 can be, among other things, transient or persistent storage. The program stored in thestorage medium 1930 may include one or more modules (not shown), each of which may include a series of instructions operating on a server. Still further, acentral processor 1922 may be provided in communication with thestorage medium 1930 to execute a series of instruction operations in thestorage medium 1930 on theserver 1900.

Theserver 1900 may also include one ormore power supplies 1926, one or more wired orwireless network interfaces 1950, one or more input-output interfaces 1958, one ormore keyboards 1956, and/or one ormore operating systems 1941, such as Windows Server, Mac OS XTM, UnixTM, LinuxTM, FreeBSDTM, etc.

Server 1900 may include memory, and one or more programs, wherein the one or more programs are stored in the memory and configured to be executed by the one or more processors include instructions for:

receiving object information of a target object sent by a terminal;

and sending the display information of the target audio data to the terminal.

and sending the first audio data to the terminal.

receiving an acquisition request of the target audio data sent by the terminal;

and sending the target audio data to the terminal.

It will be understood by those skilled in the art that all or part of the steps for implementing the above embodiments may be implemented by hardware, or may be implemented by a program instructing relevant hardware, where the program may be stored in a computer-readable storage medium, and the above-mentioned storage medium may be a read-only memory, a magnetic disk or an optical disk, etc.

The above description is only for the purpose of illustrating the preferred embodiments of the present invention and is not to be construed as limiting the invention, and any modifications, equivalents, improvements and the like that fall within the spirit and principle of the present invention are intended to be included therein.

Claims

1. A method for recommending audio data, the method being applied to a terminal, the method comprising:

performing image recognition on the target image according to a selected working mode to obtain object information of a target object corresponding to the working mode, wherein the object information is text information, article type information or article image information, and the working mode comprises a text recognition mode, an article type recognition mode and an image acquisition mode;

sending the object information of the target object to a server so that the server determines target audio data corresponding to the object information of the target object according to a corresponding relation between pre-stored object information and audio data;

receiving display information of target audio data corresponding to the object information of the target object sent by the server, and displaying the display information of the target audio data, wherein the target audio data is music, news or stories;

the displaying the display information of the target audio data comprises:

displaying an audio recommendation list, displaying options of each target audio data in the audio recommendation list, and displaying corresponding display information at the position of the option of each target audio data for a user to browse and select to play.

2. The method of claim 1, wherein after displaying the presentation information of the target audio data, further comprising:

3. The method of claim 1, wherein after displaying the presentation information of the target audio data, further comprising:

sending an acquisition request of the target audio data to the server;

4. A method of recommending audio data, the method comprising:

receiving object information of a target object corresponding to a selected working mode, wherein the object information of the target object is obtained by the terminal through image recognition according to the working mode, the object information is text information, article type information or article image information, and the working mode comprises a text recognition mode, an article type recognition mode and an image acquisition mode;

determining target audio data corresponding to the object information of the target object according to a corresponding relation between pre-stored object information and audio data, wherein the target audio data are music, news or stories;

and sending the display information of the target audio data to the terminal so as to enable the terminal to display an audio recommendation list, displaying the option of each target audio data in the audio recommendation list, and displaying the corresponding display information at the position of the option of each target audio data so as to enable a user to browse and select to play.

5. The method of claim 4, wherein after sending the presentation information of the target audio data to the terminal, further comprising:

and sending the first audio data to the terminal.

6. The method of claim 4, wherein after sending the presentation information of the target audio data to the terminal, further comprising:

receiving an acquisition request of the target audio data sent by the terminal;

and sending the target audio data to the terminal.

7. A terminal, characterized in that the terminal comprises:

the identification module is used for carrying out image identification on the target image according to a selected working mode to obtain object information of a target object corresponding to the working mode, wherein the object information is text information, article type information or article image information, and the working mode comprises a text identification mode, an article type identification mode and an image acquisition mode;

the sending module is used for sending the object information of the target object to a server so that the server can determine target audio data corresponding to the object information of the target object according to the corresponding relation between the pre-stored object information and the audio data;

the receiving module is used for receiving display information of target audio data corresponding to the object information of the target object and sent by the server and displaying the display information of the target audio data, wherein the target audio data is music, news or stories;

the receiving module is further configured to display an audio recommendation list, display an option of each target audio data in the audio recommendation list, and display corresponding presentation information at a position of the option of each target audio data, so that a user can browse and select to play the presentation information.

8. The terminal according to claim 7, wherein the sending module is further configured to send an acquisition request of the first audio data to the server when receiving a selection instruction of presentation information of the first audio data in the presentation information corresponding to the target audio data;

9. The terminal according to claim 7, wherein the sending module is further configured to send a request for obtaining the target audio data to the server;

10. A server, characterized in that the server comprises:

the receiving module is used for receiving object information of a target object corresponding to the selected working mode, the object information of the target object is obtained by the terminal through image recognition according to the working mode, the object information is text information, article type information or article image information, and the working mode comprises a text recognition mode, an article type recognition mode and an image acquisition mode;

the sending module is used for sending the display information of the target audio data to the terminal so that the terminal can display an audio recommendation list, the option of each target audio data is displayed in the audio recommendation list, corresponding display information is displayed at the position of the option of each target audio data so that a user can browse and select to play, and the target audio data are music, news or stories.

11. The server according to claim 10, wherein the receiving module is further configured to receive an obtaining request of first audio data in the target audio data sent by the terminal;

12. The server according to claim 10, wherein the receiving module is further configured to receive an acquisition request of the target audio data sent by the terminal;

13. A system for recommending audio data, the system comprising a terminal and a server, wherein:

the terminal is used for receiving the audio recommendation instruction, shooting images and acquiring target images; performing image recognition on the target image according to the selected working mode to obtain object information of a target object corresponding to the working mode and contained in the target image; sending object information of the target object to the server; receiving display information of the target audio data corresponding to the object information of the target object sent by the server, displaying the display information of the target audio data, and displaying the display information of the target audio data, wherein the display information of the target audio data comprises: displaying an audio recommendation list, displaying options of each target audio data in the audio recommendation list, and displaying corresponding display information at the position of the option of each target audio data for a user to browse and select to play;

the server is used for receiving object information of a target object corresponding to the selected working mode, which is sent by the terminal, wherein the object information of the target object is obtained by the terminal through image recognition according to the working mode; determining target audio data corresponding to the object information of the target object according to a corresponding relation between pre-stored object information and audio data; sending display information of the target audio data to the terminal, wherein the target audio data are music, news or stories;

the object information is text information, article type information or article image information, and the working mode comprises a text recognition mode, an article type recognition mode and an image acquisition mode.