Detailed Description
Example embodiments in the present disclosure provide systems and methods for suggesting bidding keywords. The systems may implement the methods to assist in providing advertising services. Using these systems and methods, an advertiser may receive suggested keywords for bidding in online advertising auctions without providing category information and/or initial seed keywords for its advertising creative. To this end, the systems and methods may implement a two-stage keyword analysis approach. In the stage 1 analysis, the systems and methods may select a plurality of candidate keywords from a keyword database based on feature similarity analysis. In the stage 2 analysis, the systems and methods may further refine the selection by comprehensively evaluating the feature similarity, semantic similarity, and category similarity of the candidate keywords to the ad creative. The final selection may be used by the advertiser as bidding keywords.
The subject matter now will be described more fully hereinafter with reference to the accompanying drawings, which form a part hereof, and which show, by way of illustration, specific example embodiments. However, the subject matter may be embodied in various different forms, and thus, it is intended that covered or claimed subject matter be construed as not limited to any example embodiment set forth herein, which is provided for illustrative purposes only. As such, it is intended to include a reasonably broad range of claimed or covered subject matter. Further, for example, the subject matter may be implemented as a method, apparatus, component, or system. The following detailed description is, therefore, not to be taken in a limiting sense, and the scope of the present invention is defined only by the appended claims.
Throughout the specification and claims, terms may have nuanced meanings suggested or implied in context beyond an explicitly stated meaning. Likewise, the phrase "in one embodiment" as used herein does not necessarily refer to the same embodiment, and the phrase "in another embodiment" as used herein does not necessarily refer to a different embodiment. It is intended, for example, that claimed subject matter include combinations of example embodiments in whole or in part.
In general, terms may be understood at least in part from the context of usage. For example, terms such as "and," "or," or "and/or" as used herein may include various meanings that may depend, at least in part, on the context in which the terms are used. In general, "or," if used to associate a list, such as A, B or C, is intended to mean A, B, and C, here used in the inclusive sense, as well as A, B, or C, here used in the exclusive sense. Furthermore, the term "one or more" as used herein, depending at least in part on the context, may be used to describe any feature, structure, or characteristic in the singular or may be used to describe a combination of features, structures, or characteristics in the plural. Similarly, terms such as "a" or "an" may, again, be understood to convey a singular use or to convey a plural use, depending, at least in part, on the context. Moreover, the term "based on" may be understood as not necessarily intended to convey an exclusive set of factors, but may instead allow for the presence of additional factors (again, depending at least in part on context) that are not necessarily explicitly stated.
The online information system places the advertiser's advertisement within a content service (e.g., a web page, mobile application ("app"), TV app, or other audio or visual content service) that is available to the end user. The advertisement is provided along with other content. The other content may include any combination of text, graphics, audio, video, or links to such content. Advertisements are conventionally selected based on a variety of criteria, including those specified by the advertiser. Conventionally, advertisers define advertising campaigns to control how and when advertisements are made available to users and to specify the content of those advertisements. The content of an advertisement itself is sometimes referred to as an ad creative or ad creatives.
Various monetization techniques or models may be used in conjunction with sponsored advertisements. In an auction-type online advertising marketplace, advertisers may bid in connection with placement of advertisements, but other factors may also be included in determining advertisement selection or ranking. For keyword-based advertising, bids may be associated with one or more search queries associated with one or more keywords or certain specific events (occurrences). Bids may also be associated with amounts paid by advertisers for certain specific events (e.g., for placement or clicking of an advertisement). Advertisers' payments for online advertisements may be divided among parties, including one or more publishers or publisher networks, one or more market promoters or providers, or potentially other parties.
Some models may include guaranteed delivery advertising, in which an advertiser may pay based at least in part on an agreement guaranteeing, or providing some measure of assurance, that the advertiser will receive an agreed-upon number of suitable advertisements, or non-guaranteed delivery advertising, which may include, for example, individual serving opportunities or spot market(s). In various models, an advertiser may pay based at least in part on any of a variety of metrics associated with ad delivery or performance, or associated with a measure or approximation of particular advertiser goal(s). For example, a model may include payment based, among other things, on cost per impression or cost per thousand impressions (CPM), cost per click (CPC), cost per action (CPA) for particular action(s), cost per conversion or purchase, or a cost based, at least in part, on some combination of metrics, which may include online or offline metrics.
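As a purely illustrative sketch of the metric-based payment models named above, the following Python function computes the amount owed under CPM, CPC, or CPA pricing. The function name `campaign_cost`, its parameters, and the example rates are assumptions introduced for illustration; they are not part of the disclosed system.

```python
def campaign_cost(model, rate, impressions=0, clicks=0, actions=0):
    """Return the amount owed under a single pricing model.

    `rate` is interpreted as the price per 1000 impressions (CPM),
    per click (CPC), or per action (CPA). All names are hypothetical.
    """
    if model == "CPM":
        # CPM is quoted per thousand impressions.
        return rate * impressions / 1000
    if model == "CPC":
        return rate * clicks
    if model == "CPA":
        return rate * actions
    raise ValueError(f"unknown pricing model: {model}")


# Example: the same campaign priced three different ways.
cpm_cost = campaign_cost("CPM", 2.0, impressions=50_000)
cpc_cost = campaign_cost("CPC", 0.5, clicks=200)
cpa_cost = campaign_cost("CPA", 10.0, actions=3)
```

A combined model, as the paragraph above contemplates, could simply sum the results of several such calls.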
FIG. 1 is a block diagram of an online information system 100. The online information system 100 in the example embodiment of FIG. 1 may include an account server 102, an account database 104, a search engine 106, an advertisement (ad) server 108, and an advertisement database 110. The online information system 100 may be accessed over the network 120 by one or more advertiser devices (e.g., advertiser devices 122a, 122b) and by one or more user devices (e.g., user devices 124a, 124b). In various examples of such online information systems, a user may search for and obtain content from sources on network 120. Advertisers may provide advertisements for placement on web pages and other communications sent over a network to user devices (e.g., user devices 124a, 124b). In one example, the online information system may be deployed and operated by an online provider such as Yahoo!.
Account server 102 may store account information for advertisers. The account server 102 may be in data communication with an account database 104. The account information may include one or more database records associated with the respective advertisers. Any suitable information may be stored, maintained, updated, and read from the account database 104 by the account server 102. Examples include advertiser identification information, advertiser security information such as passwords and other security credentials, and account balance information. In some embodiments, an online provider managing the online information system 100 may assign one or more account managers to various advertisers, and information about the one or more account managers, as well as information obtained by the account managers and recorded for subsequent access, may be maintained in the account database 104.
The account server 102 may be implemented using any suitable device. For example, the account server 102 may be implemented as a single server, multiple servers, or any other type of computing device known in the art. Access to the account server 102 may be accomplished through a firewall (not shown) that protects the account management program and account information from external tampering. Additional security may be provided via enhancements to standard communication protocols, such as secure HTTP or a secure sockets layer.
The account server 102 may provide an advertiser front end to simplify the process of accessing the advertiser's account information. The advertiser front end may be a program, application, or software routine that forms a user interface. According to an example embodiment of the present disclosure, the advertiser front end may be accessible as a web site having one or more web pages that may be viewed by an accessing advertiser on an advertiser device, such as advertiser devices 122a, 122b. The advertiser may view and edit the account data using the advertiser front end. After editing, the account data may be saved to the account database 104.
Search engine 106 may be a computer system, one or more servers, or any other computing device known in the art. Alternatively, the search engine 106 may be a computer program, instructions, or software code stored on a computer-readable storage medium that runs on a processor of a single server, multiple servers, or any other type of computing device known in the art. The search engine 106 may be accessed over the network 120, for example, by user devices operated by users (e.g., user devices 124a, 124b). The user devices 124a, 124b may transmit user queries to the search engine 106. The search engine 106 may locate matching information and return the information to the user devices 124a, 124b using any suitable protocol or algorithm. The search engine 106 may be designed to assist users in finding information located on the internet or an intranet. According to example embodiments of the disclosure, the search engine 106 may also provide, to the user devices 124a, 124b over the network 120, web pages containing search results, information matching the context of the user query, links to other network destinations or information, and files of information of interest to the user operating the user device 124a, 124b.
The search engine 106 may enable a device (e.g., user device 124a, 124b, or any other client device) to search for files of interest using a search query. In general, the search engine 106 may be accessible by client devices over the network 120 via one or more servers or directly. Search engine 106 may include, for example, a crawler component, an indexer component, an index storage component, a search component, a ranking component, a cache, a profile storage component, a login component, a profile builder, and one or more Application Program Interfaces (APIs). The search engine 106 may be deployed in a distributed fashion (e.g., via a set of distributed servers). The components may be duplicated within the network, for example for redundancy or better access.
The ad server 108 is operable to serve ads to user devices, such as user devices 124a, 124b. An advertisement includes data defining advertisement information that may be of interest to a user of the user device. The advertisement may include text data, graphics data, image data, video data, or audio data. The advertisement may also include data defining one or more links to other network resources that provide such data. Such other locations may be on the internet, on an intranet operated by the advertiser, or at any other accessible location.
For online information providers, advertisements may be displayed on web pages that originate from user-defined searches that are based, at least in part, on one or more search terms. Advertisements may also be displayed based on the content of the web page opened by the user. If the displayed advertisement is relevant to one or more of the user's interests, the advertisement is beneficial to the user, advertiser, or web portal.
The ad server 108 may include logic and data operative to format ad data for transmission to user devices. The ad server 108 may be in data communication with an ad database 110. The advertisement database 110 may store information including data defining advertisements to be served to user devices. This advertisement data may be stored in the advertisement database 110 by another data processing device or by the advertiser.
Additionally, the ad server 108 may be in data communication with the network 120. The ad server 108 may transmit ad data and other information to the devices over the network 120. This information may include advertisement data that is transmitted to the user device. This information may also include advertising data and other information communicated with advertiser devices, such as advertiser devices 122a, 122b. Advertisers operating advertiser devices may access the ad server 108 over a network to access information including ad data. This access may include developing ad creatives, editing ad data, deleting ad data, and other activities.
The ad server 108 may provide an advertiser front end to simplify the process of accessing the advertiser's ad data. The advertiser front end may be a program, application, or software routine that forms a user interface. In one particular embodiment, the advertiser front end may be accessible as a web site having one or more web pages that may be viewed by an accessing advertiser on an advertiser device. The advertiser may view and edit the advertisement data using the advertiser front end. After editing, the advertisement data may be saved to the advertisement database 110 for subsequent communication in advertisements to the user device.
The ad server 108 may be a computer system, one or more servers, or any other computing device known in the art. Alternatively, the ad server 108 may be a computer program, instructions, or software code stored on a computer-readable storage medium that runs on a processor of a single server, multiple servers, or any other type of computing device known in the art.
The account server 102, search engine 106, and advertisement server 108 may be implemented as any suitable computing device. A computing device may be capable of sending or receiving data (e.g., via a wired or wireless network), or may be capable of processing or storing signals (e.g., in memory as physical memory states), and thus may function as a server. Thus, a device capable of functioning as a server may include, for example, a dedicated rack-mounted server, a desktop computer, a laptop computer, a set-top box, an integrated device that incorporates various features (e.g., two or more features) of the aforementioned devices, and so forth.
Network 120 may include any data communication network or combination of networks. The network may couple the devices such that communications may be exchanged, for example, between a server and a client device or other type of device, including between wireless devices coupled via a wireless network. The network may also include mass storage devices, such as Network Attached Storage (NAS), a Storage Area Network (SAN), or other forms of computer or machine readable media. The network may include the internet, one or more Local Area Networks (LANs), one or more Wide Area Networks (WANs), cable type connections, wireless type connections, or any combination thereof. Similarly, sub-networks (e.g., which may employ different architectures or be compliant or compatible with different protocols) may interoperate within a larger network, such as network 120. Various types of devices may be used, for example, to provide interoperability for different architectures or protocols. As one illustrative example, a router may provide a link between LANs that are otherwise separate and independent. The communication links or channels may include, for example, analog telephone lines (e.g., twisted pair), coaxial cable, full or fractional digital lines (including T1, T2, T3, or T4 type lines), Integrated Services Digital Networks (ISDNs), Digital Subscriber Lines (DSLs), wireless links including satellite links, or other communication links or channels known to those skilled in the art. Additionally, a computing device or other related electronic device may be remotely coupled to the network (e.g., via a telephone line or link).
The advertiser devices 122a, 122b may include any data processing device accessible to the online information system 100 via the network 120. The advertiser devices 122a, 122b are operable to interact with the account server 102, the search engine 106, the ad server 108, content servers, and other data processing systems over the network 120. The advertiser devices 122a, 122b may, for example, implement web browsers for viewing web pages and submitting user requests. The advertiser devices 122a, 122b may transmit data to the online information system 100, including data defining web pages and other information. The advertiser devices 122a, 122b may receive communications from the online information system 100 that include data defining web pages and advertising creatives.
The user devices 124a, 124b may comprise any data processing device that may access the online information system 100 through the network 120. The user devices 124a, 124b are operable to interact with the search engine 106 over the network 120. The user devices 124a, 124b may, for example, implement a web browser for viewing web pages and submitting user requests. A user operating a user device 124a, 124b may enter a search request and communicate the search request to the online information system 100. The search request may be processed by the search engine, and search results may be returned to the user devices 124a, 124b. In other examples, a user of a user device 124a, 124b may request data, such as a page of information, from the online information system 100. The data may alternatively be provided in another environment, such as a native mobile application, a TV application, or an audio application. The online information system 100 may provide the data or redirect a browser to another web site. Further, the advertisement server may select advertisements from the advertisement database 110 and include data defining the advertisements in the data provided to the user devices 124a, 124b.
The advertiser devices 122a, 122b and user devices 124a, 124b may act as client devices when accessing information on the online information system. Client devices, such as advertiser devices 122a, 122b and user devices 124a, 124b, may include computing devices capable of sending or receiving data (e.g., via a wired or wireless network). The client devices may include, for example, desktop computers or portable devices such as cellular telephones, smart phones, display pagers, Radio Frequency (RF) devices, Infrared (IR) devices, Personal Digital Assistants (PDAs), handheld computers, tablet computers, laptop computers, set-top boxes, wearable computers, integrated devices incorporating various features (e.g., two or more features) of the aforementioned devices, and the like. In the example of FIG. 1, laptop 124b and smartphone 124a may interchangeably function as advertiser devices or as user devices.
FIG. 2 is a schematic diagram illustrating an example embodiment of a server 200. Server 200 may be used as account server 102, search engine 106, and advertisement server 108 of FIG. 1. Server 200 may vary widely in configuration or capabilities, but it may include one or more central processing units 222 and memory 232, one or more media 230 (e.g., one or more mass storage devices) storing applications 242 or data 244, one or more power supplies 226, one or more wired or wireless network interfaces 250, one or more input/output interfaces 258, and/or one or more operating systems 241 (e.g., Windows Server™, Mac OS X™, Unix™, Linux™, FreeBSD™, etc.). Thus, server 200 may comprise, for example, a dedicated rack-mounted server, a desktop computer, a laptop computer, a set-top box, a mobile computing device such as a smart phone, an integrated device incorporating various features (e.g., two or more features) of the aforementioned devices, and so forth.
The account server 102, search engine 106, content server 112, and advertisement server 108 shown in FIG. 1 may be implemented as, or may be in communication with, a content server. A content server may include a device that includes a configuration to provide content to another device via a network. The content server may host, for example, a site such as a social networking site, examples of which may include, but are not limited to: Yahoo!™, Flickr™, Twitter™, Facebook™, LinkedIn™, or a personal user site (e.g., a blog, microblog, online dating site, etc.). The content server may also host various other sites including, but not limited to: business sites, educational sites, dictionary sites, encyclopedia sites, wikis, financial sites, government sites, and the like. The content server may also provide various services including, but not limited to: web services, third party services, audio services, video services, email services, Instant Messaging (IM) services, SMS services, MMS services, FTP services, Voice Over IP (VOIP) services, calendar services, photo services, and the like. Examples of content may include text, images, audio, video, etc., which may be processed, for example, in the form of physical signals (e.g., electronic signals), or may be stored in memory, for example, as physical states. Examples of devices that may be used as content servers include desktop computers, multiprocessor systems, microprocessor-based or programmable consumer electronics, and the like. The content server may not be under common ownership or control with the one or more ad servers.
FIG. 3 is a schematic diagram illustrating an example embodiment of a client device 300 that may be used as the user devices 124a, 124b and the advertiser devices 122a, 122b. The client device may include means to perform the methods and software systems introduced in this disclosure. Client device 300 may be a computing device capable of executing a software system. The client device 300 may be, for example, a device such as a personal desktop computer or a portable device (e.g., a laptop computer, a tablet computer, a cellular telephone, or a smart phone).
Client device 300 may vary in capabilities and characteristics. The claimed subject matter is intended to cover a broad range of potential variations. For example, client device 300 may include a keyboard/keypad 356. It may also include a display 354 such as a Liquid Crystal Display (LCD) or a display with advanced functionality such as a touch sensitive color 2D or 3D display. In contrast, however, as another example, web-enabled client device 300 may include one or more physical or virtual keyboards and mass storage media 330.
Client device 300 may also include or may execute a variety of operating systems 341, including, for example, a personal computer operating system such as Windows™ or Linux™, or a mobile operating system such as iOS™, Android™, or Windows Mobile™. The client device 300 may include or may execute a variety of possible applications 342, such as an electronic game 345. An application 342 may enable communication with other devices, such as another computer, another client device, or a server, via a network.
Additionally, the client device 300 may include one or more non-transitory processor-readable storage media 330 and one or more processors 322 in communication with the non-transitory processor-readable storage media 330. For example, the non-transitory processor-readable storage medium 330 may be RAM memory, flash memory, ROM memory 334, 340, EPROM memory, EEPROM memory, registers, a hard disk, a removable disk, a CD-ROM, or any other form of non-transitory storage medium known in the art. The one or more non-transitory processor-readable storage media 330 may store sets of instructions, or units and/or modules that include the sets of instructions, for performing the operations and/or method steps described in the present disclosure. Alternatively, the units and/or modules may be hardware disposed in the client device 300 and configured to perform the operations and/or method steps described in the present disclosure. The one or more processors may be configured to execute the sets of instructions and to perform the operations in example embodiments of the present disclosure.
For illustration purposes, only one processor is described in the following example embodiments as running the operations and/or method steps in a client device or server. However, it should be noted that the client devices and servers in the present disclosure may also include multiple processors, and thus operations and/or method steps performed by one processor in the present disclosure may also be performed by multiple processors, either jointly or separately. For example, if a processor performs both step A and step B in this disclosure, it should be understood that steps A and B may also be performed jointly or separately by two different processors in the client device (e.g., a first processor performs step A and a second processor performs step B, or the first and second processors jointly perform steps A and B).
FIG. 4 is an example illustrating a system 400 for providing a web page with query search results. The system 400 may include at least one server 450. Server 450 may be a general representation of servers 102, 106, and 108 in FIG. 1, or may be a general representation of a portion of these servers. The server 450 may communicate with at least one database 452 to provide data for the web page 402. The database 452 may include a content database that includes a plurality of articles and/or web page links to be displayed on the web page 402. An article may be any form of content item. For example, the article may be a text item (e.g., a text report, a story, etc.), multimedia content (e.g., an audio/video clip), or a combination thereof. Database 452 may also include an advertisement database that includes a plurality of advertisements to be displayed in a website. The database 452 may be stored in a non-transitory processor-readable storage medium in communication with the server 450. The web page 402 shown in FIG. 4 is an example internet search page with search results corresponding to the search query "hard mattress". The web page 402 may also be a home page of a website, a landing page, or a web page of a particular topic (e.g., sports, finance, news, etc.). The web page 402 may be displayed on a browser of the user device 124a, 124b.
The web page 402 may include a search input box 440. The user may input a search query 441 in the search input box 440, and the server 450 may return and display the search results on the web page 402. For example, in the web page 402 shown in FIG. 4, the user enters the search query "hard mattress".
The center column of the web page 402 may be a column of web page content 424. The web page content 424 may include a plurality of slots in which a series of items 420, 422, 426, 428, 430, and 432 are displayed item by item. Items 422, 426, 428, 430, and 432 may be search results corresponding to the search query "hard mattress". Each item may include a text summary 412 of the item. Items 422, 426, 428, 430, and 432 may also include graphics/video 416, other data (not shown), and links 414 to additional information for the items. Clicking or otherwise selecting the link 414 may redirect the browser on the user device 124a, 124b to a web page having additional information.
The web page content 424 of items 422, 426, 428, 430, and 432 may include any type of content item. For example, the web page content 424 may include articles, including news, business-related articles, sports-related articles, and so forth. In addition to textual or graphical content, the articles 422, 426, 428, 430, and 432 may include other data, such as audio and video data or applications.
The location of the items 422, 426, 428, 430, and 432 in the web page content 424 may be determined based on relevance. For example, the second item 422 may be an article that is more relevant to the search query "hard mattress" than the sixth item 432. However, the location may or may not be an accurate indicator of the popularity of an item among users. For example, although the second item 422, an article related to back pain, is more relevant to the search query "hard mattress" than the sixth item 432, the sixth item 432, which is associated with the hard mattress provider Ashley Furniture Industries, Inc., may receive more clicks than the second item 422.
On the right-hand side, web page 402 may include a column 444 of advertisements (e.g., advertisement 442). The advertisement 442 may be designed to attract the user's attention and promote the advertiser's product and/or service. For example, the advertisement 442 in FIG. 4 is designed to promote home furniture produced by Ashley Furniture Industries, Inc. Further, advertisements 442 may also be placed in the center column 424, or any other suitable place in web page 402.
The creative of the ad 442 may include a name (e.g., the name of the advertiser); a title 442a (e.g., the title of the advertisement); and a description 442b (e.g., a description of the advertiser's product and/or service). Only the title 442a and description 442b of the creative may be displayed in the advertisement 442. In addition, the title 442a may be displayed as a hyperlink such that a user clicking on the title will be directed to the advertiser's web page 460 (i.e., a landing page for a product and/or service). Table 1 shows an example of a creative for advertisement 442.
TABLE 1
The advertisement 442 displays the title 442a as a hyperlink and the description 442b as plain text on the web page 402. When the user clicks on the hyperlink, the user is directed to the home page of Ashley Furniture Industries, Inc.
Advertisers may display advertisements 442 through an online advertising auction service provided by a publisher (e.g., the website hosting web page 402 or a separate agent of that website). The advertiser may decide its bid based on the similarity between the search query 441 and the list of bidding keywords associated with the advertisement 442. The list of bidding keywords may be provided by the publisher when the advertiser subscribes to the advertisement display service from the publisher, such that the advertiser need not provide seed keywords and/or category information for its own advertisement 442 to the publisher for keyword analysis.
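The disclosure does not specify how the similarity between a search query and a bidding keyword is measured. As a minimal sketch, assuming token-level Jaccard similarity (an assumption chosen only for illustration), the comparison could look as follows in Python. The function names `jaccard` and `best_keyword_match` are hypothetical.

```python
def jaccard(a, b):
    """Token-level Jaccard similarity between two phrases (assumed measure)."""
    tokens_a, tokens_b = set(a.lower().split()), set(b.lower().split())
    if not tokens_a and not tokens_b:
        return 0.0
    return len(tokens_a & tokens_b) / len(tokens_a | tokens_b)


def best_keyword_match(query, bid_keywords):
    """Return (keyword, similarity) for the bidding keyword closest to the query.

    Returns (None, 0.0) when the keyword list is empty.
    """
    scored = ((kw, jaccard(query, kw)) for kw in bid_keywords)
    return max(scored, key=lambda pair: pair[1], default=(None, 0.0))


# Example: compare the query from FIG. 4 against a made-up keyword list.
keyword, score = best_keyword_match(
    "hard mattress", ["firm mattress", "hard mattress sale", "sofa"]
)
```

An advertiser could, for instance, scale its bid by the returned similarity, although the disclosure leaves the bidding rule itself open.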
FIG. 5 is a schematic diagram of a system 500 for providing a bidding keyword suggestion service to an advertiser 502 according to an example embodiment of the present disclosure. The system 500 may belong to an advertisement publisher such that the keyword suggestion service is part of an online advertising service provided by the publisher. Alternatively, the system 500 may be a system that provides bidding keyword suggestion services to advertisers independently of publishers.
The system 500 may include a keyword suggestion engine 504 configured to suggest bidding keywords to advertisers without requiring seed keywords associated with a creative as external input. The keyword suggestion engine 504 may be a server 200 that includes a processor 222 and a non-transitory storage medium 230. The storage medium 230 may have a set of instructions stored therein. The set of instructions may direct the processor 222 to perform predetermined operations. For example, when an advertiser 502 inputs an advertising creative ("creative") 518 to the keyword suggestion engine 504, the processor 222 may execute a relevance model 506 (i.e., a set of instructions) stored in the medium 230 to perform a two-stage keyword analysis. The first stage analysis 508 and the second stage analysis 510 of the creative 518 do not require seed keywords (i.e., keywords that initiate keyword analysis) suggested by the advertiser (or the advertiser's agent) as being associated with the creative. Thus, the keyword suggestion engine 504 may return a list of suggested bidding keywords 520 to the advertiser without requiring the advertiser to input seed keywords or category information for the creative. In an example implementation, the keyword suggestion engine 504 may rely entirely on the input of the creative 518 for keyword analysis. The list of suggested bidding keywords may include a predetermined number (e.g., 50) of bidding keywords ranked according to a relevance score (i.e., a degree of recommendation) with respect to the creative, so that the advertiser 502 may use the list of suggested bidding keywords 520 as keywords for bidding in an online advertising auction to place its advertisement on the publisher's website.
Additionally, the analysis can be made based entirely on the input of the creative 518 and is accurate and efficient enough so that the advertiser 502 is not required to provide its own set of bid keywords for the auction or its own set of seed keywords for extended keyword analysis.
In the first stage analysis 508, the processor 222 may select a predetermined number of candidate keywords from a keyword database based on the input. To this end, the keyword suggestion engine 504 may communicate with a keyword dictionary 512, which is a pre-built database that includes thousands of keywords, related keywords, ranking scores (vectors) of relevance between the keywords, and a feature vector (or feature vectors) corresponding to each keyword. The keywords may be provided by a frequency filter 516, which collects keywords from search queries (e.g., Yahoo! Search queries) entered on the internet by the general public during daily online activities over a past period of time. The frequency filter 516 may be configured to capture user viewing and clicking behavior with respect to a search results page. The frequency filter 516 may serve as a data source from which the hundred million most frequently searched keywords are picked for the keyword dictionary 512. The keywords stored in the keyword dictionary 512 are sufficiently complete that, statistically, they cover almost all keywords a typical advertisement would need for bidding in an advertising auction.
In the second stage analysis 510, the processor 222 may further refine the candidate keywords selected in the first stage analysis into a list of suggested bidding keywords. For example, the processor 222 may use a linear regression algorithm to select 50 keywords from approximately 500 candidate keywords as the suggested bidding keywords. The linear regression algorithm may be pre-optimized through the training model 514 using a set of training data 518 manually determined by an editor.
Fig. 6 is a flow chart illustrating how feature vectors for keywords in a pre-constructed keyword dictionary 512 are determined according to an example embodiment of the present disclosure. The process in the flowchart may be performed by a server such as server 200 that has access to a keyword database held in keyword dictionary 512. The server may independently analyze the keyword database before the advertiser enters a creative into the keyword suggestion engine 504.
In step 602, the server may perform an internet search for each keyword held in the keyword database (hereinafter referred to as a "database keyword") and obtain a list of search results. Each search result may correspond to a URL (uniform resource locator). In addition, the server may rank the list of URLs according to the relevance of the content at each URL to the database keyword. The higher the ranking of a URL, the more relevant its content is to the database keyword.
At step 604, the server may select, from the list of search results, a predetermined number of candidate URLs that have the highest likelihood of being clicked by a general user searching the internet using the database keyword. For example, the server may select only the top 10 URLs from the list of search results. Several factors may be considered in selecting a candidate URL. For example, one factor may be (but is not limited to) the position (i.e., ranking) of a URL in the URL list, i.e., the server may select those URLs whose content is more relevant to the database keyword. Another factor may be the number of times the URL is accessed by the general public over a period of time, i.e., the server may also select the most popular (i.e., most clicked) URLs in the list. Thus, the selected candidate URLs may reflect both the relevance of a URL to the database keyword and its popularity among general users surfing the internet, and hence the likelihood that the URL will be selected by users searching the internet using the corresponding database keyword.
In step 606, the server may extract a plurality of keywords (hereinafter referred to as "URL feature keywords") from the content of the page pointed to by each URL and calculate an importance value for each URL feature keyword. To do so, the server may first extract the content of the URL. For example, the server may extract only the textual content of the page, excluding any non-relevant information such as advertisements. The server may then compare the content to a dictionary (e.g., keyword dictionary 512) to extract URL feature keywords from the content, where the dictionary serves as an encyclopedia-style keyword database. In addition, the server may calculate a value for each URL feature keyword that reflects the importance of the URL feature keyword in the content of the URL. The calculation may be based on semantic values of the URL feature keywords and the likelihood that the corresponding URL will be selected by the user. For example, the server may perform a TF-IDF (term frequency-inverse document frequency) analysis for each URL feature keyword over the entire page content pointed to by the URL and obtain a corresponding TF-IDF value for the URL feature keyword. The server may then use a formula to calculate the importance value of the URL feature keyword, in which d is the document (web page content) pointed to by the URL, f_i^d is the i-th URL feature keyword of d, α is an empirical value, [1 + log(click_d + 1)] is a weight corresponding to the number of clicks the URL received in the past, and a further weight corresponds to the URL's position in the list of URL search results (i.e., the rank, or relevance, of the URL). Considering that repeated searches for the same keyword may not yield the same URL search results, the position may be an average position of the URL over a predetermined number of searches.
The server may perform the above URL feature keyword extraction and importance value calculation for each candidate URL and collect the URL feature keywords together. When a URL feature keyword occurs in content corresponding to more than one candidate URL, the server may add the individual importance values of the URL feature keyword to obtain its overall importance value according to the following formula:
score(f_i) = Σ_d score(f_i^d), where score(f_i^d) is the importance value of URL feature keyword f_i in the page of candidate URL d, and the sum runs over the candidate URLs whose content contains f_i.
In step 608, the server may determine a feature vector (hereinafter referred to as a "database keyword feature vector") for each database keyword in the keyword dictionary 512. To this end, the server may place all words in the dictionary, or all keywords in the keyword database of the keyword dictionary 512, in a predetermined sequence and treat the sequence as a feature vector template, such that each word in the sequence has a fixed position and becomes an element of the feature vector template. Thus, every URL feature keyword of a candidate URL corresponds to an element in the feature vector template. Next, the server may obtain the feature vector for the database keyword by assigning a value to each element in the feature vector template. If an element in the feature vector template is not a URL feature keyword, the server may assign a value of 0 to the element. If the element is a URL feature keyword, the server may assign the overall importance value of that URL feature keyword to the element. Thus, the database keyword feature vector may be:
V(URL_feature_keyword)={0,0,...,0,score(f1),0,...,0,score(f2),0,...,0,score(fi),0,...}
In step 610, the server may save the database keyword feature vector and associate it with the corresponding database keyword. The server may complete the above database keyword feature vector determination for each database keyword in the keyword dictionary 512 before the advertiser 502 enters the creative 518.
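By way of a non-limiting illustration, steps 606 to 608 might be sketched in Python as follows. The sketch assumes that the per-URL importance values have already been computed; the template words, the per-URL values, and the function names are hypothetical and are not taken from the disclosure.

```python
from collections import defaultdict

def aggregate_importance(per_url_scores):
    """Sum each URL feature keyword's importance value over all candidate URLs."""
    totals = defaultdict(float)
    for scores in per_url_scores:              # one dict per candidate URL
        for keyword, value in scores.items():
            totals[keyword] += value
    return dict(totals)

def build_feature_vector(template, totals):
    """Put the overall importance value at each keyword's fixed template position, 0 elsewhere."""
    return [totals.get(word, 0.0) for word in template]

# Hypothetical feature vector template and per-URL importance values.
template = ["chair", "furniture", "home", "sofa", "store"]
per_url_scores = [
    {"furniture": 0.5, "sofa": 0.25},          # candidate URL 1
    {"furniture": 0.25, "store": 0.125},       # candidate URL 2
]
vector = build_feature_vector(template, aggregate_importance(per_url_scores))
print(vector)
```

With a template as large as the keyword dictionary 512, a sparse representation (for example, a dictionary holding only the non-zero elements) would likely be preferable to the dense list used here for readability.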
Fig. 7 is a flow diagram of a first stage analysis 508 according to an example embodiment of the present disclosure. After receiving the creative 518 from the advertiser 502 in step 702, the keyword suggestion engine 504 may determine a feature vector for the creative (hereinafter referred to as a "creative feature vector").
To this end, in step 704, the keyword suggestion engine 504 may extract keywords (hereinafter referred to as "creative keywords") from the creative based on the dictionary in a manner similar to the extraction process in step 606. For example, for the creative in Table 1, the extracted keywords may be:
<Ashley,look,furniture,visit,today,home,furniture,industries,store,...>.
The keyword suggestion engine 504 may then also calculate an importance value for each creative keyword. For example, the keyword suggestion engine 504 may perform a TF-IDF analysis on each creative keyword and obtain its TF-IDF value, which may be taken as the importance value of the corresponding creative keyword. Thus, the importance values of the creative keywords of the creative in Table 1 may be:
<Ashley:0.465,look:0.140,furniture:0.447,visit:0.151,today:0.152,home:10.13,furniture:0.401,industries:0.161,store:0.234,...>.
In step 706, the keyword suggestion engine 504 can determine a creative feature vector for the creative. To do so, the keyword suggestion engine 504 may use the feature vector template described in step 608 and assign an element in the feature vector template a value of 0 if the element is not a creative keyword. If the element is a creative keyword, the keyword suggestion engine 504 may assign the importance value of that creative keyword to the element. Thus, the creative feature vector of the creative in Table 1 may be:
V(creative)={0,...,0.465,...,0.140,...,0.447,...,0.151,...,0.152,...,10.13,...,0.401,...,0.161,...,0.234,...}.
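As a hedged illustration of steps 704 to 706, the following Python sketch computes TF-IDF values for creative keywords and places them in a feature vector template. The disclosure does not specify a TF-IDF variant; the sketch assumes term frequency normalized by the number of terms and a natural-logarithm inverse document frequency. The document frequencies, corpus size, and template are hypothetical.

```python
import math
from collections import Counter

def tf_idf(terms, doc_freq, num_docs):
    """TF-IDF per dictionary term: (count / total terms) * ln(num_docs / document frequency)."""
    counts = Counter(terms)
    total = len(terms)
    return {
        term: (count / total) * math.log(num_docs / doc_freq[term])
        for term, count in counts.items()
        if term in doc_freq                    # keep only words found in the dictionary
    }

def creative_feature_vector(template, importance):
    """Assign each template element its importance value, or 0 if it is not a creative keyword."""
    return [importance.get(word, 0.0) for word in template]

# Hypothetical inputs: extracted creative terms, document frequencies, and corpus size.
terms = ["ashley", "furniture", "home", "furniture"]
doc_freq = {"ashley": 10, "furniture": 100, "home": 1000}
importance = tf_idf(terms, doc_freq, num_docs=10000)
creative_vector = creative_feature_vector(["ashley", "furniture", "home", "sofa"], importance)
print(creative_vector)
```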
In step 708, the keyword suggestion engine 504 may compute a similarity value (e.g., cosine similarity) between the creative feature vector and each database keyword feature vector stored in the keyword dictionary 512. The higher the similarity between the creative feature vector and a database keyword feature vector, the more relevant the creative is to the corresponding database keyword.
The keyword suggestion engine 504 may then select a set of candidate keywords at step 710, which includes a predetermined number (e.g., 500) of database keywords whose database keyword feature vectors have the highest similarity to the creative feature vector. These candidate keywords may represent the keywords that are most relevant to the creative (e.g., the 500 most relevant keywords).
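Steps 708 and 710 might, for example, be sketched as below. The function names and the small example vectors are hypothetical, and the dense-list representation is an assumption made only to keep the illustration short.

```python
import math

def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length vectors; 0.0 if either is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)

def select_candidates(creative_vec, keyword_vecs, k):
    """Return the k database keywords whose feature vectors are most similar to the creative's."""
    scored = [(cosine_similarity(creative_vec, vec), keyword)
              for keyword, vec in keyword_vecs.items()]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [keyword for _, keyword in scored[:k]]

# Hypothetical vectors over a three-element template.
creative_vec = [1.0, 1.0, 0.0]
keyword_vecs = {
    "car rental": [0.0, 0.0, 1.0],
    "sofa": [1.0, 0.0, 0.0],
    "home furniture": [1.0, 1.0, 0.0],
}
candidates = select_candidates(creative_vec, keyword_vecs, k=2)
print(candidates)
```

In the example embodiment k would be on the order of 500; the value 2 is used here only so that the output is easy to inspect.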
In some instances, not all of the candidate keywords are ideal or preferred by advertisers to bid on. For example, an advertiser may determine not to place an advertisement in response to a search query that includes the names of the advertiser's competitors. Thus, the keyword suggestion engine 504 may obtain an exclusion list for the advertiser. The exclusion list may be obtained from a database accessible to the keyword suggestion engine 504 or may be provided by the advertiser. The exclusion list may include the advertiser's competitor name, or may include other keywords that the advertiser does not wish to bid on.
Next, in step 712, the keyword suggestion engine 504 may refine the candidate keywords by filtering out keywords in the exclusion list from the candidate keywords. For example, the keyword suggestion engine 504 may analyze each candidate keyword and extract brand-related terms from the candidate keyword. The keyword suggestion engine 504 may also analyze the creative and extract brand-related terms therein (e.g., Ashley in the creative of Table 1). If the candidate keyword does not include brand-related terms, the candidate keyword may be content-neutral, and further analysis may not be required. Otherwise, the keyword suggestion engine 504 may compare the brand-related terms from the creative with the brand-related terms from the candidate keyword. If the terms have a large overlap (i.e., the two sets of brand-related terms are similar), the keyword suggestion engine 504 may determine that the creative and the candidate keyword are likely to refer to the same product or service brand. However, if the candidate keyword contains a brand-related term that has little or no overlap with the brand-related terms from the creative, the keyword suggestion engine 504 may determine that the brand-related term is associated with a competitor. In that case, the corresponding candidate keyword may be removed from the set of candidate keywords.
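One possible, simplified reading of step 712 is sketched below. It assumes that brand-related terms are recognized by lookup in a known brand vocabulary and that "large overlap" means sharing at least one brand term; both are assumptions of this sketch rather than features stated in the disclosure, and the brand "acme", the creative text, and the exclusion list are hypothetical.

```python
def brand_terms(text, known_brands):
    """Return the brand-related terms found in the text (simple vocabulary lookup)."""
    return {word for word in text.lower().split() if word in known_brands}

def keep_candidate(keyword, creative_brands, known_brands, exclusion_list):
    """Decide whether a candidate keyword survives the refinement step."""
    if keyword.lower() in exclusion_list:
        return False                           # explicitly excluded for this advertiser
    keyword_brands = brand_terms(keyword, known_brands)
    if not keyword_brands:
        return True                            # no brand terms, so no further analysis
    # Assumption: any shared brand term counts as referring to the same brand.
    return bool(keyword_brands & creative_brands)

known_brands = {"ashley", "acme"}              # hypothetical brand vocabulary
exclusion_list = {"cheap chairs"}              # hypothetical exclusion list
creative_brands = brand_terms("Visit an Ashley furniture store today", known_brands)
candidates = ["ashley furniture", "acme sofa", "home furniture", "cheap chairs"]
refined = [kw for kw in candidates
           if keep_candidate(kw, creative_brands, known_brands, exclusion_list)]
print(refined)
```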
Fig. 8 is a flow chart illustrating a second stage analysis according to an example embodiment of the present disclosure. In the second stage analysis, the keyword suggestion engine 504 may evaluate the refined candidate keywords and further select a set of suggested keywords 520. The evaluation and selection may be based on semantic similarity, category similarity, and feature similarity of the candidate keywords to the creative.
In step 802, the keyword suggestion engine 504 may decompose the creative 518 into terms. Any word separated from other words by spaces or punctuation in the creative can be considered a single term. As a result, the keyword suggestion engine 504 may obtain a set of creative terms. For example, for the creative in Table 1, the corresponding term set contains each word of the creative text, including "home" and "furniture". Similarly, the keyword suggestion engine 504 may also obtain a term set for each refined candidate keyword. For example, for the keyword "home furniture suppression", the term set may be: <home furniture suppression>.
In step 804, the keyword suggestion engine 504 may determine a word overlap count for each candidate keyword. The word overlap count may be the number of terms in the candidate keyword term set that also appear in the creative term set. In the above example, the two terms "home" and "furniture" are overlapping terms in that they both appear in both the set of keyword terms and the set of creative terms. Therefore, the word overlap count of the keyword "home furniture suppression" is 2. The word overlap count may reflect an absolute degree of overlap between the candidate keyword and the creative. The larger the value of the word overlap count, the more terms the candidate keyword shares with the creative. Thus, the word overlap count may reflect an aspect of word similarity between the candidate keyword and the creative.
In step 806, the keyword suggestion engine 504 may determine a word overlap ratio for each candidate keyword. The word overlap ratio may be the ratio between the word overlap count and the number of terms in the candidate keyword term set. For example, the term set <home furniture suppression> includes three terms and has a word overlap count equal to 2. Therefore, its word overlap ratio is 2/3. The word overlap ratio may reflect the completeness of the overlap of the candidate keyword with the creative. The larger the word overlap ratio, the more complete the overlap. Thus, the word overlap ratio may reflect another aspect of word similarity between the candidate keyword and the creative.
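Steps 802 to 806 might be illustrated by the following sketch. Lower-casing terms before comparison is an assumption of the sketch, and the creative text shown is a hypothetical stand-in, not the creative of Table 1.

```python
import re

def split_terms(text):
    """Split text into lower-cased terms on whitespace and punctuation."""
    return re.findall(r"[a-z0-9]+", text.lower())

def word_overlap(keyword, creative):
    """Return (word overlap count, word overlap ratio) for one candidate keyword."""
    keyword_terms = split_terms(keyword)
    creative_terms = set(split_terms(creative))
    count = sum(1 for term in keyword_terms if term in creative_terms)
    ratio = count / len(keyword_terms) if keyword_terms else 0.0
    return count, ratio

creative_text = "Visit a home furniture store today."   # hypothetical creative text
count, ratio = word_overlap("home furniture suppression", creative_text)
print(count, ratio)
```

For this input the count is 2 and the ratio is 2/3, matching the worked example in steps 804 and 806.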
In step 808, the keyword suggestion engine 504 may further categorize the creative and each of the refined candidate keywords. For example, the keyword suggestion engine 504 may access category analysis settings that are pre-constructed offline. The category analysis settings may include a category database and may be configured to map each category to a set of navigation keywords. As a result, when the category analysis settings receive a creative, they may search the mappings and determine one or more categories that best match keywords in the creative. For example, the creative in Table 1 can be classified into 3 categories: retail, home, and appliance; the keyword "home furniture suppression" can be classified into 2 categories: retail and home.
In step 810, the keyword suggestion engine 504 may further determine a category similarity between the creative and each of the refined candidate keywords. The category similarity may be calculated according to the following formula: category similarity = category overlap count / number of creative categories. In the above example, the category overlap count of the keyword is 2 because the two categories of the keyword "home furniture suppression" (i.e., "retail" and "home") both overlap with the three categories of the creative (i.e., "retail", "home", and "appliance"). Therefore, the category similarity of the keyword is 2/3.
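The category similarity of step 810 might be computed as in the following minimal sketch, which assumes the categories of the creative and of the keyword have already been determined in step 808. The function name is hypothetical.

```python
def category_similarity(creative_categories, keyword_categories):
    """Category overlap count divided by the number of creative categories."""
    creative = set(creative_categories)
    if not creative:
        return 0.0                             # avoid division by zero for an uncategorized creative
    overlap = len(creative & set(keyword_categories))
    return overlap / len(creative)

similarity = category_similarity(["retail", "home", "appliance"], ["retail", "home"])
print(similarity)
```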
In step 812, the keyword suggestion engine 504 may determine a recommendation for each refined candidate keyword. The determination may be based on feature similarity, word overlap count, word overlap ratio, and category similarity of the candidate keywords with respect to the creative. For example, the keyword suggestion engine 504 may take as input the feature similarity, word overlap count, word overlap ratio, and category similarity of the candidate keywords to the creative and perform a pre-trained linear regression algorithm. The linear regression algorithm may return a score (e.g., 0 to 1) as the degree of recommendation by evaluating the value of the input. The keyword suggestion engine 504 may employ the candidate keyword only if the score is above or equal to a threshold (e.g., 0.4).
Finally, in step 814, the keyword suggestion engine 504 may select the candidate keyword with the highest recommendation as the suggested bid keyword for the advertiser 502 and return the suggested bid keyword.
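As a non-limiting sketch of steps 812 and 814, the following Python code scores each refined candidate keyword with a linear model, discards candidates below the threshold, and returns the highest-scoring keywords. The weights, the bias, and the feature values are hypothetical; in the example embodiment the model parameters would come from the training process rather than being set by hand.

```python
def recommendation_score(features, weights, bias):
    """Linear model: bias plus the weighted sum of the four input features."""
    return bias + sum(w * x for w, x in zip(weights, features))

def suggest_keywords(candidates, weights, bias, threshold=0.4, limit=50):
    """Keep candidates scoring at or above the threshold, highest score first."""
    scored = []
    for keyword, features in candidates.items():
        score = recommendation_score(features, weights, bias)
        if score >= threshold:
            scored.append((score, keyword))
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [keyword for _, keyword in scored[:limit]]

# Feature order: (feature similarity, word overlap count, word overlap ratio, category similarity).
weights = [0.5, 0.0625, 0.125, 0.25]           # hypothetical pre-trained weights
candidates = {
    "car rental": (0.25, 0, 0.0, 0.0),
    "sofa sale": (0.5, 1, 0.5, 0.5),
    "home furniture": (0.5, 2, 1.0, 0.5),
}
suggested = suggest_keywords(candidates, weights, bias=0.0)
print(suggested)
```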
Fig. 9 is a flow chart illustrating a regression training process according to an example embodiment of the present disclosure. This process may be run by a server, such as server 200, and may be used to train the linear regression algorithm applied in step 812.
In step 902, an editor may prepare a set of example creative-keyword pairs. The editor may be a person, such as a designer of the keyword suggestion system 500. The example set of creative-keyword pairs may include approximately 100 creatives, and each creative may be paired with 30 to 50 keywords. Each keyword may be selected based on the creative.
In step 904, the server may determine feature similarity, word overlap count, word overlap ratio, and category similarity for the keywords in the same process as the first and second stage analysis.
In step 906, recommendations may be assigned to the creative-keyword pairs manually based on actual human experience with the creative-keyword pairs. For example, an editor (who is a person) may read each creative-keyword pair and tag the creative-keyword pair with a score that reflects how much he/she recommends keywords (i.e., how well the keywords match the creative based on his/her perception as a person). The score may be a value between 0 and 1. For example, a 1 may represent a perfect match, 0.7 may represent an excellent match, 0.5 may represent a good match, 0.4 may represent a general match, and 0 may represent a poor match. Thus, each creative-keyword pair may have a manually marked value.
In step 908, the scores of the keywords and the feature similarity, word overlap count, word overlap ratio, and category similarity may be used as training data to optimize the linear regression algorithm. As a result, a linear regression algorithm may be used to determine the score (i.e., recommendation) for the candidate keyword, with the feature similarity, word overlap count, word overlap ratio, and category similarity of the candidate keyword as inputs.
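The training of step 908 might, for instance, be carried out by ordinary least squares, as in the sketch below. The disclosure does not state which fitting procedure is used, so the normal-equations approach is an assumption of this sketch. The feature rows are hypothetical, and the labels are generated synthetically from known weights (standing in for the editor-assigned scores) so that the fit can be checked.

```python
def solve_linear_system(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]      # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):                      # back substitution
        tail = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - tail) / m[i][i]
    return x

def fit_linear_regression(rows, labels):
    """Ordinary least squares via the normal equations; returns (weights, bias)."""
    design = [list(row) + [1.0] for row in rows]        # append a constant term
    k = len(design[0])
    xtx = [[sum(d[i] * d[j] for d in design) for j in range(k)] for i in range(k)]
    xty = [sum(d[i] * y for d, y in zip(design, labels)) for i in range(k)]
    solution = solve_linear_system(xtx, xty)
    return solution[:-1], solution[-1]

# Feature order: (feature similarity, word overlap count, word overlap ratio, category similarity).
rows = [
    (0.8, 2, 1.0, 1.0), (0.6, 1, 0.5, 0.5), (0.4, 1, 1.0, 0.0),
    (0.2, 0, 0.0, 0.5), (0.1, 0, 0.0, 0.0), (0.7, 3, 0.75, 1.0),
]
true_weights, true_bias = [0.5, 0.1, 0.1, 0.2], 0.05    # synthetic stand-in for editor scores
labels = [true_bias + sum(w * x for w, x in zip(true_weights, row)) for row in rows]
weights, bias = fit_linear_regression(rows, labels)
print(weights, bias)
```

The solver assumes the training features are not linearly dependent; a degenerate training set would lead to a division by zero and would need more examples or regularization.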
The above example embodiments of the present disclosure provide systems and methods for bidding keyword suggestion. These systems and methods may suggest bidding keywords to advertisers based on creatives submitted by the advertisers. An advertiser need not provide its ad creative with initial seed keywords and/or category information to receive suggested keywords for bidding on online advertising opportunities. To this end, the system performs a two-stage keyword suggestion analysis.
In a first stage of analysis, the system and method may collect a database of keywords from search queries used by the general public. By using the selected search results for each keyword, the system and method may construct a database keyword feature vector for each keyword. When the system and method receives a creative from an advertiser, the system and method may construct a creative feature vector and compare the creative feature vector to the database keyword feature vectors. The system and method may then pick the keywords from the database whose vectors have the highest similarity (e.g., cosine similarity) to the creative feature vector. Finally, the system and method may remove those selected database keywords that contain excluded information (e.g., the name of a competitor) and return the remaining selected database keywords as candidate keywords.
In a second stage analysis, the systems and methods may refine the selection by evaluating each candidate keyword using feature similarity, category similarity, and text similarity between each candidate keyword and the creative. The finally selected candidate keyword may be returned as the suggested keyword. Advertisers may use suggested keywords in bidding for online advertising opportunities.
Although example embodiments of the present disclosure relate to systems and methods for online advertising keyword suggestion, these systems and methods may also be applied in other applications. For example, in addition to suggesting bid keywords for use in a scenario when a user enters a search query, the systems and methods may also be implemented to provide suggested web page content for online advertising. In another example, in addition to analyzing advertising creatives, the systems and methods may also be implemented to analyze the content of web pages.
Thus, the example embodiments shown in Figs. 1-9 serve only as examples of several ways to implement the present disclosure. They should not be construed as limiting the spirit and scope of the example embodiments of the disclosure. It should be noted that various modifications or alterations may still be made by those skilled in the art without departing from the spirit and scope of the example embodiments. Such modifications or variations are intended to fall within the scope of the example embodiments as defined in the appended claims.