Embodiment
The technical solutions in the embodiments of the present invention are described clearly and completely below with reference to the accompanying drawings. Obviously, the described embodiments are only some of the embodiments of the present invention rather than all of them. All other embodiments obtained by a person of ordinary skill in the art based on the embodiments of the present invention without creative effort fall within the protection scope of the present invention.
Referring to Fig. 1, Fig. 1 is a schematic flowchart of a bookmark access method provided by an embodiment of the present invention. In this embodiment, the method may include the following steps.
S101: When a bookmark access instruction is received, obtain the pre-saved web address information of the first web page corresponding to the target bookmark specified by the bookmark access instruction.
To make it easier for users to revisit web pages, existing browsers all provide a bookmark function: while browsing, a user can save a page of interest as a bookmark in the favorites. To visit the page again later, the user simply finds and clicks the corresponding bookmark in the favorites and the page opens quickly, without the user having to type a lengthy URL each time or search at length to find the page. It can be understood that, in the embodiments of the present invention, each bookmark corresponds to one piece of bookmark data, and the bookmark data includes the web address information and the page keywords of the web page corresponding to that bookmark. The bookmark data may be stored on the local terminal or on a bookmark server; the embodiments of the present invention impose no limitation in this respect.
As one implementation of the present invention, when a bookmark access instruction is received, the pre-saved web address information of the first web page corresponding to the target bookmark specified by the instruction is obtained. The web address information may include the URL used to access the first web page.
S102: Send, to a web server according to the web address information, a request for the page data of the first web page, so that the web server responds to the request.
An access request is sent to the web server according to the obtained web address information so that the web server responds to it. The access request is used to obtain the page data of the first web page from the web server, that is, to have the web server return that page data. When the terminal receives the page data returned by the web server, a rendering engine renders the data and the rendered page is displayed on the user interface; in this case, the user can browse the content of the first web page corresponding to the target bookmark. In practice, however, the web server may be unable to return the page data of the first web page normally when the terminal requests it, for example because the first web page has been deleted from the web server, the web address of the first web page has been changed, or the web server has failed. In such cases the web server returns, according to the actual situation, an error message to the requesting terminal indicating that the page data cannot be returned normally, for example: the requested page does not exist; or the requested URL is too long for the server to process.
S103: When it is determined that the web server cannot return the page data normally, search for a second web page related to the first web page according to the pre-saved page keywords of the first web page corresponding to the target bookmark.
In the embodiments of the present invention, whether the web server can normally return the page data of the first web page is determined from the data the web server returns. If it can, the returned page data is loaded and displayed directly in the browser. If it cannot, a second web page related to the first web page is searched for according to the pre-saved page keywords of the first web page, where the second web page is the web page with the greatest similarity to the first web page. It should be noted that there may be one or more such second web pages: the web pages found may be sorted in descending (or ascending) order of their similarity to the first web page, and the one or more pages ranked first (or last, respectively) may be taken as the second web pages with the greatest similarity to the first web page.
For example, the data indicating that the web server cannot return the page data normally may be status information such as "the requested page does not exist", "the server encountered an error and cannot complete the request" or "the server is currently unavailable", or it may be an HTTP status code such as 404 or 503. The specific form depends on the type of data the web server returns and is not limited by the embodiments of the present invention. The cases that count as being unable to return the page data normally may be preset according to that data type. For example, if the web server returns HTTP status codes, the status codes indicating that the web server cannot return the page data normally may be preset as 404, 500 or 503. The HTTP status code returned by the web server is then obtained; if it is one of the preset status codes, the web server cannot return the page data normally, and in this case the second web page related to the first web page is searched for according to the pre-saved page keywords of the first web page corresponding to the target bookmark.
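The preset-status-code check described above can be sketched as follows. This is only a minimal illustration: the names FAILURE_STATUS_CODES and cannot_return_page are hypothetical, and the preset {404, 500, 503} is simply the example set named in the text, which an implementation may configure differently.

```python
# Hypothetical preset: the example set of codes named in the description.
FAILURE_STATUS_CODES = frozenset({404, 500, 503})


def cannot_return_page(status_code, preset=FAILURE_STATUS_CODES):
    """Return True when the status code is one of the preset codes that
    indicate the web server cannot return the page data normally."""
    return status_code in preset


if __name__ == "__main__":
    for code in (200, 404, 503):
        action = "search for second page" if cannot_return_page(code) else "load first page"
        print(code, action)
```

Keeping the preset as a parameter reflects the statement that the set of codes is configurable rather than fixed.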
It is worth noting that the pre-saved web address information and page keywords of the first web page corresponding to the target bookmark, which are obtained in steps S101 and S103, may be read directly from a local database or downloaded from a bookmark server; the embodiments of the present invention impose no specific limitation.
As one implementation of the present invention, the second web page related to the first web page may be found as follows: web pages similar to the first web page are searched for in a search engine according to the pre-saved page keywords of the first web page, and the second web page with the greatest similarity to the first web page is then obtained from the result pages contained in the search results. Usually, when a search engine searches for similar pages by keyword, the result pages it returns are sorted in descending order of similarity, so the second web page with the greatest similarity to the first web page may be obtained by directly selecting the first-ranked page among the result pages. However, different search engines may compute page similarity with different algorithms, and the order of the returned pages may differ accordingly. Therefore, the user may select a specific search engine as needed, or a default search engine may be preset. For example, a user prompt interface is displayed that prompts the user to select, from several search engines offered on the interface (such as Baidu, Google, soso or Netease), one search engine with which to search for pages similar to the first web page. When a search engine selection instruction is received (for example, the user selects the Google search engine), the selected search engine is used; when the user selects no search engine, the system default search engine (for example, Baidu) is used. Pages similar to the first web page are then searched for in that search engine according to the page keywords of the first web page.
As another implementation of the present invention, pages similar to the first web page may be searched for in a search engine according to the pre-saved page keywords of the first web page, and the page with the greatest similarity to the first web page may then be determined from all of the returned result pages. For example: the page keywords of each result page in the search results are obtained by a page keyword extraction method; the similarity between each result page and the first web page is calculated from the page keywords of that result page and the page keywords of the first web page; and then, according to the calculated similarities, the second web page with the greatest similarity value is obtained from all of the result pages. This second web page is the page with the greatest similarity to the first web page.
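The keyword-based comparison described above could be sketched as follows. The description does not specify a similarity measure, so Jaccard similarity over keyword sets is used here purely as an assumption; the function names and the shape of the `candidates` mapping are likewise hypothetical.

```python
def keyword_similarity(keywords_a, keywords_b):
    """Jaccard similarity of two keyword collections (an assumed measure;
    the description leaves the similarity calculation open)."""
    set_a, set_b = set(keywords_a), set(keywords_b)
    if not set_a and not set_b:
        return 0.0
    return len(set_a & set_b) / len(set_a | set_b)


def most_similar_page(first_page_keywords, candidates):
    """Pick the result page whose keywords are most similar to the first page.

    candidates maps each result-page URL to the keywords extracted from it.
    Returns the URL with the highest similarity, or None if there are no results.
    """
    if not candidates:
        return None
    return max(
        candidates,
        key=lambda url: keyword_similarity(first_page_keywords, candidates[url]),
    )


if __name__ == "__main__":
    stored = ["atomic energy", "application", "energy"]
    results = {
        "http://example.com/a": ["atomic energy", "application"],
        "http://example.com/b": ["form"],
    }
    print(most_similar_page(stored, results))
```

A weighted measure (for instance one that also uses the stored term frequencies) would fit the description equally well; the set-based version is chosen only for brevity.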
As a preferred manner of the embodiments of the present invention, to avoid the situation in which a large number of page keywords causes the search engine to return very few result pages or none at all, only some of the page keywords of the first web page may be selected for the search. For convenience of description, assume that M keywords are selected; the M keywords are the M page keywords of the first web page with the greatest relevance to it. It is worth noting that the page keywords of the first web page should be sorted in descending or ascending order of each keyword's relevance to the first web page. Pages similar to the first web page are then searched for in a search engine according to the M page keywords.
It should be noted that, in the embodiments of the present invention, "greatest relevance" means that the keywords are sorted in ascending or descending order of their relevance to the first web page, and the one or more keywords ranked last (for ascending order) or first (for descending order) may be said to have the "greatest relevance"; the term is not limited to the single first-ranked or last-ranked keyword.
It should also be noted that the embodiments of the present invention do not limit how the subset of keywords is selected from the page keywords of the first web page. For example, the relevance of a page keyword to the first web page may be determined by how frequently the keyword occurs in the page content of the first web page, and the M keywords that rank highest by frequency of occurrence in that content may be selected for the search.
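The frequency-based selection mentioned above might look like the following sketch. It assumes the page content has already been split into a list of words (`page_words`); the tokenization step and the function name top_m_keywords are assumptions for illustration only.

```python
from collections import Counter


def top_m_keywords(page_words, candidate_keywords, m):
    """Return the m candidate keywords that occur most often in the page.

    page_words: the page content as a list of words (tokenization assumed done).
    candidate_keywords: the page keywords to rank.
    Ties keep their original order because Python's sort is stable.
    """
    counts = Counter(page_words)
    ranked = sorted(candidate_keywords, key=lambda kw: counts[kw], reverse=True)
    return ranked[:m]


if __name__ == "__main__":
    words = ["energy"] * 3 + ["atomic energy"] * 15 + ["form"]
    print(top_m_keywords(words, ["energy", "atomic energy", "form"], 2))
```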
As another preferred manner of the embodiments of the present invention: when similar pages are searched for by page keyword, the search engine may return a large number of similar pages, usually sorted in descending order of similarity. To improve the accuracy with which the page with the greatest similarity to the first web page is selected, the following may be done: the first L result pages are selected from the result pages contained in the search results, and the page keywords of each of the L result pages are extracted; the similarity between each of the L result pages and the first web page is calculated from the page keywords of that result page and the page keywords of the first web page; and, according to the calculated similarities, the second web page with the greatest similarity value is obtained from the L result pages. This second web page is the page with the greatest similarity to the first web page.
S104: Display the page content of the second web page and/or the web address information of the second web page.
After the second web page related to the first web page is obtained in step S103, the page content of the second web page is loaded and displayed; alternatively, the web address information of the second web page is displayed and the user chooses whether to browse its page content.
In the bookmark access method described in this embodiment of the present invention, when a bookmark access instruction is received, an access request is sent to the web server according to the web address information of the first web page corresponding to the target bookmark specified by the instruction. If it is determined that the web server cannot normally return the page data of the first web page, a second web page related to the first web page is searched for according to the page keywords of the first web page, and the second web page is displayed. With this embodiment, when the web server cannot normally return the page data of the requested page, the page with the greatest similarity to it is shown to the user, which makes it easier for the user to reach content of interest through a saved bookmark. Even when the page data corresponding to the saved URL has been deleted or modified on the web server, the user can still obtain pages related to the page to which the saved URL pointed, so that useful information can still be provided to the user.
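Steps S102 to S104 can be sketched end to end as follows. This is a hedged outline, not the implementation: `fetch` and `search` are hypothetical callables injected by the caller (so no particular HTTP client or search engine API is implied), the preset codes are the example set from the text, and the "not_found" outcome for an empty search result is an added assumption that the description does not cover.

```python
# Assumed preset of failure codes (the example set named in the description).
FAILURE_STATUS_CODES = frozenset({404, 500, 503})


def access_bookmark(url, keywords, fetch, search):
    """Sketch of S102-S104 for one bookmark whose data (S101) is already loaded.

    fetch(url) -> (status_code, body)      hypothetical page request
    search(keywords) -> list of URLs       hypothetical search, best match first
    Returns a (kind, url, body) tuple describing what to display.
    """
    status, body = fetch(url)                     # S102: request the first page
    if status not in FAILURE_STATUS_CODES:
        return ("first_page", url, body)          # page returned normally
    results = search(keywords)                    # S103: look for a related page
    if not results:
        return ("not_found", url, None)           # assumption: nothing to show
    return ("second_page", results[0], None)      # S104: show the best match


if __name__ == "__main__":
    gone = lambda u: (404, "")
    engine = lambda kws: ["http://example.com/similar"]
    print(access_bookmark("http://example.com/old", ["atomic energy"], gone, engine))
```

Passing the network and search operations in as functions keeps the control flow testable without any real server.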
Referring to Fig. 2, Fig. 2 is another schematic flowchart of a bookmark access method provided by an embodiment of the present invention. In this embodiment, the method may include the following steps.
S201: When a bookmark adding instruction for a first web page is received, obtain the page keywords of the first web page and the web address information of the first web page.
When browsing web pages with a browser, a user can save a page of interest as a bookmark so that the page can later be reached quickly through the saved bookmark. On both PCs and mobile terminals, the bookmark is an important browsing aid. As one implementation of the present invention, when the user wants to add the page being browsed to the favorites as a bookmark, the user inputs to the terminal a bookmark adding instruction for saving the page as a bookmark, and the terminal, on receiving the instruction, adds the page to the favorites in the form of a bookmark. For example, the user adds a bookmark by clicking the favorites button of the browser.
It is worth noting that the terminal in the embodiments of the present invention may be a smart device such as a mobile phone, an iPad or a computer, which is not limited by the embodiments of the present invention; any terminal capable of running browser software falls within the protection scope of the embodiments of the present invention.
In the embodiments of the present invention, when a bookmark adding instruction for the first web page is received, the web address information of the first web page, such as its URL, is obtained automatically, and the page keywords of the first web page are obtained. The page keywords may be extracted directly from the tag data or the title of the web page, or may be extracted with a dedicated keyword extraction algorithm. Many algorithms for extracting the page keywords of a web page currently exist, such as word segmentation algorithms, the TF-IDF (Term Frequency-Inverse Document Frequency) algorithm and semantics-based keyword extraction algorithms. The embodiments of the present invention do not limit the specific manner of obtaining the page keywords of the first web page.
For convenience of description, the embodiments of the present invention take extraction of the page keywords of the first web page with the TF-IDF algorithm as an example.
TF-IDF is a statistical method for evaluating how important a word is to one document in a document set or corpus, and is a weighting technique commonly used in information retrieval and data mining. The importance of a word increases in proportion to the number of times it appears in the document, but decreases in inverse proportion to the frequency with which it appears in the corpus. In a given document, the term frequency (TF) is the number of times a given word appears in that document; the inverse document frequency (IDF) is a measure of the general importance of a word; and TF-IDF = TF × IDF. It follows that the more important a word is to an article, the larger its TF-IDF value. Therefore, when the TF-IDF algorithm is used to extract page keywords, the relevance of a page keyword to the first web page is evaluated by the TF-IDF value calculated for that keyword: the larger the TF-IDF value, the greater the relevance to the first web page, and vice versa. Thus, after the TF-IDF values of all words in the article have been calculated and sorted in descending order, the first few words are the key words of the article. The specific calculation process of the TF-IDF algorithm is as follows:
(1) Calculate the term frequency TF.
Term frequency (TF) = number of times a word occurs in the article;
Because articles differ in length, the term frequency TF is normalized so that different articles can be compared:
Term frequency (TF) = number of times a word occurs in the article / total number of words in the article.
(2) Calculate the inverse document frequency IDF.
Calculating the inverse document frequency IDF requires a preset corpus, which is used to simulate the environment in which the language is used.
Inverse document frequency (IDF) = log(total number of documents in the corpus / (number of documents containing the word + 1));
The more common a word is, the more documents contain it, the larger the denominator, and the smaller the inverse document frequency, which approaches 0. Examples are very common function words such as "is", "and" and "in". In the TF-IDF algorithm such words are called "stop words": they contribute nothing to finding the result and must be filtered out. 1 is added to the denominator to avoid a denominator of 0 (the case in which no document contains the word). log denotes taking the logarithm of the resulting value.
(3) Calculate the TF-IDF value.
TF-IDF = term frequency (TF) × inverse document frequency (IDF).
For example, suppose the first web page to be bookmarked contains 1,000 words in total. For convenience of description, take five specific words in it, namely "release", "energy", "atomic energy", "form" and "application", which occur 2, 3, 15, 1 and 5 times respectively. Suppose the corpus contains 25,000,000,000 documents in total, of which 8,900,000,000 web pages contain "release", 1,400,000,000 contain "energy", 48,400,000 contain "atomic energy", 1,000,000,000 contain "form" and 300,000,000 contain "application".
Their term frequency TF, inverse document frequency IDF and TF-IDF are then as shown in Table 1, which lists the TF-IDF values of some of the keywords.
Table 1: TF-IDF values of some of the keywords listed in the embodiments of the present invention
| Keyword | Term frequency TF | Inverse document frequency IDF | TF-IDF |
| Release | 0.002 | 1.92 | 0.00245 |
| Energy | 0.003 | 1.25 | 0.00375 |
| Atomic energy | 0.015 | 2.71 | 0.04065 |
| Form | 0.001 | 1.92 | 0.00192 |
| Application | 0.005 | 0.49 | 0.00384 |
From (1), their term frequencies are calculated as 0.002, 0.003, 0.015, 0.001 and 0.005 respectively, and their inverse document frequencies IDF are 1.92, 1.25, 2.71, 1.92 and 0.49 respectively.
As can be seen from Table 1, the TF-IDF value of "atomic energy" is the highest, followed by "application", "energy" and "release", and that of "form" is the lowest. The keywords are sorted by relevance; assuming descending order, the five listed words are ordered as "atomic energy", "application", "energy", "release" and "form". If N is 1, the keyword with the greatest relevance is "atomic energy"; if N is 2, the keywords with the greatest relevance are "atomic energy" and "application"; if N is 3, the top N keywords with the greatest relevance are "atomic energy", "application" and "energy".
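Calculation steps (1) to (3) can be sketched as follows. The description does not state the base of the logarithm; base 10 is assumed here. The function names are hypothetical, and the inputs in the usage section are the counts given in the example for "energy" and "atomic energy".

```python
import math


def term_frequency(occurrences, total_words):
    """Step (1): normalized term frequency of a word in one article."""
    return occurrences / total_words


def inverse_document_frequency(total_documents, documents_with_word):
    """Step (2): IDF with 1 added to the denominator.

    A base-10 logarithm is an assumption; the text only says "log".
    """
    return math.log10(total_documents / (documents_with_word + 1))


def tf_idf(occurrences, total_words, total_documents, documents_with_word):
    """Step (3): TF-IDF = TF x IDF."""
    return term_frequency(occurrences, total_words) * inverse_document_frequency(
        total_documents, documents_with_word
    )


if __name__ == "__main__":
    corpus_size = 25_000_000_000
    # "energy": 3 occurrences in 1,000 words, 1,400,000,000 documents contain it.
    print(round(inverse_document_frequency(corpus_size, 1_400_000_000), 2))  # 1.25
    # "atomic energy": 15 occurrences, 48,400,000 documents contain it.
    print(round(inverse_document_frequency(corpus_size, 48_400_000), 2))     # 2.71
    print(tf_idf(15, 1000, corpus_size, 48_400_000))
```

Sorting the keywords by the value returned from tf_idf in descending order and taking the first N then gives the top N keywords discussed above.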
Preferably, each page keyword is associated with its term frequency TF, that is, the number of times the page keyword occurs in the web page; because articles differ in length, the term frequency TF is usually normalized so that different articles can be compared. For example, taking the data in Table 1: the term frequency associated with "atomic energy" is 0.015, that associated with "application" is 0.005, that associated with "energy" is 0.003, that associated with "release" is 0.002, and that associated with "form" is 0.001. Associating each page keyword with its term frequency means that, when the user later accesses the saved bookmark and the first web page has been deleted from the web server, the search engine can find the page with the greatest similarity to the original first web page according to the saved page keywords and their term frequencies. That page is then shown to the user, so that the user obtains the content needed.
Usually, a keyword extraction algorithm obtains a very large number of keywords from the first web page. Therefore, after the keywords are obtained, the relevance of each keyword to the first web page is further determined, and the keywords with greater relevance are selected accordingly; the meaning that the first web page is intended to express is generally reflected by these more relevant keywords. Preferably, therefore, at least one keyword is extracted from the page content of the first web page, and the N keywords with the greatest relevance to the first web page are selected from the at least one keyword as the page keywords of the first web page, where N is a positive integer. For example: the relevance value of each keyword to the first web page is calculated, and all keywords are sorted in descending order of their relevance, so that keywords with greater relevance to the first web page rank first; the top N keywords are then selected according to a preset value N (for example, N=100), which yields the N page keywords with the greatest relevance to the first web page. Specifically, the frequency with which each keyword occurs in the page content of the first web page may be calculated, the keywords sorted by that frequency, and the N highest-ranked keywords selected as the page keywords of the first web page. If the first web page is deleted from the web server and the user wants to access it through the bookmark, the search engine can find the page with the greatest similarity to the original first web page according to the saved page keywords, and that page is shown to the user, so that the user obtains the content needed.
S202: Save the page keywords and the web address information of the first web page.
The obtained page keywords and web address information of the first web page are saved in the local database of the terminal; that is, each time a new bookmark is added, one bookmark data record is added to the local database, and this bookmark data includes the page keywords and the web address information of the first web page corresponding to the bookmark. Alternatively, the obtained page keywords and web address information of the first web page may be sent to a bookmark server so that the bookmark server saves them, thereby backing up the bookmark data on the terminal. When the user later accesses the first web page through the bookmark added to the favorites, the bookmark data may be obtained from the local database or the corresponding bookmark data may be downloaded from the bookmark server; the present invention imposes no specific limitation. If the first web page has been deleted by the web server that stored it, or the web server cannot provide access to it, the page with the greatest similarity to the first web page can be obtained according to the page keywords included in the bookmark data and shown to the user. Thus, even if the web server cannot provide access to the page, the user can still reach a page of interest through the saved bookmark.
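Saving one bookmark data record per bookmark in a local database might be sketched as follows. The description does not name a database or schema; Python's built-in sqlite3 module, the table name `bookmarks`, the JSON encoding of the keyword list and the function names are all assumptions for illustration.

```python
import json
import sqlite3


def open_bookmark_db(path=":memory:"):
    """Open (or create) a local bookmark database; the schema is hypothetical."""
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS bookmarks ("
        "url TEXT PRIMARY KEY, keywords TEXT NOT NULL)"
    )
    return conn


def save_bookmark(conn, url, keywords):
    """S202: store the web address and page keywords as one bookmark record."""
    conn.execute(
        "INSERT OR REPLACE INTO bookmarks (url, keywords) VALUES (?, ?)",
        (url, json.dumps(list(keywords))),
    )
    conn.commit()


def load_bookmark(conn, url):
    """Return the saved page keywords for a bookmark, or None if absent."""
    row = conn.execute(
        "SELECT keywords FROM bookmarks WHERE url = ?", (url,)
    ).fetchone()
    return None if row is None else json.loads(row[0])


if __name__ == "__main__":
    db = open_bookmark_db()
    save_bookmark(db, "http://example.com/a", ["atomic energy", "application"])
    print(load_bookmark(db, "http://example.com/a"))
```

If term frequencies are stored alongside the keywords, as the preferred manner above suggests, the keyword column could hold keyword-frequency pairs instead of a plain list.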
As one possible implementation, when a bookmark synchronization instruction is received, the bookmark data on the terminal and on the bookmark server are synchronized. The browser may be set to automatic synchronization or to manual synchronization. In this embodiment, automatic synchronization is assumed: when a bookmark record is added or deleted on the terminal, the same bookmark record is added or deleted on the bookmark server accordingly.
For example: the user logs in to the bookmark server with a user name and password, and it is automatically checked whether the bookmark data saved for that account on the bookmark server is consistent with the bookmark data saved on the local terminal. If it is not, the bookmark data saved for the account on the terminal and on the bookmark server are synchronized, for example by downloading from the bookmark server the bookmark data not saved on the local terminal and/or uploading the bookmark data on the local terminal to the bookmark server. While the user is browsing, when a new bookmark record is added, the bookmark data corresponding to that bookmark is automatically sent to the bookmark server for backup; correspondingly, when a bookmark record is deleted, the bookmark server is notified of the deletion and deletes the corresponding bookmark data. The bookmark data on the terminal and on the bookmark server are thus kept consistent, which better meets user needs.
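The download-and-upload reconciliation described above can be sketched as follows. Both stores are modelled as plain dictionaries mapping a URL to its page keywords, which is an assumption, as are the function names. The sketch only covers records missing on one side; propagating deletions would need extra bookkeeping (for example deletion notices, as the text describes) and is not shown.

```python
def plan_sync(local, server):
    """Work out which bookmark records each side is missing.

    local, server: dicts mapping URL -> page keywords (hypothetical model).
    Returns (to_download, to_upload).
    """
    to_download = {url: server[url] for url in server.keys() - local.keys()}
    to_upload = {url: local[url] for url in local.keys() - server.keys()}
    return to_download, to_upload


def synchronize(local, server):
    """Apply the plan so both sides hold the same set of bookmark records."""
    to_download, to_upload = plan_sync(local, server)
    local.update(to_download)
    server.update(to_upload)


if __name__ == "__main__":
    terminal = {"http://example.com/a": ["atomic energy"]}
    remote = {"http://example.com/b": ["application"]}
    synchronize(terminal, remote)
    print(sorted(terminal) == sorted(remote))
```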
S203: When a bookmark access instruction is received, obtain the pre-saved web address information of the first web page corresponding to the target bookmark specified by the bookmark access instruction.
In this embodiment of the present invention, when a bookmark access instruction is received, the pre-saved web address information and page keywords of the first web page corresponding to the target bookmark specified by the instruction may be read directly from the local database or downloaded from the bookmark server; the embodiments of the present invention impose no specific limitation.
It should be noted that, when the web address information and page keywords of the first web page have been uploaded to and saved on the bookmark server, one optional manner is as follows: after the user logs in with a user name and password and a connection with the bookmark server is established, the web address information and page keywords of the first web page are immediately downloaded from the bookmark server and saved locally, so that they can be obtained locally when the user issues a bookmark access instruction. As another optional manner, the web address information and page keywords of the first web page may be downloaded from the bookmark server in real time when the user issues the bookmark access instruction.
S204: Send, to a web server according to the web address information, a request for the page data of the first web page, so that the web server responds to the request.
When the terminal receives the bookmark access instruction, the bookmark data specified by the instruction is obtained from the bookmark data saved in the local database or on the bookmark server. The bookmark data includes the web address information and page keywords of the first web page corresponding to the target bookmark, and an access request is sent to the web server according to the web address information in the bookmark data.
S205: Obtain the HTTP status code returned by the web server.
At present, a web server usually informs the terminal whether it can normally return the page data of the requested page by returning an HTTP status code. The terminal therefore obtains the HTTP status code returned by the web server after sending the access request. Common status codes include the following: 404 indicates that the web server cannot find the requested page, and is typically returned for pages that do not exist on the web server; 200 indicates that the server has processed the request successfully, which usually means that the server has provided the requested page; 403 indicates that the server refuses the request; 400 indicates that the server cannot understand the syntax of the request (a bad request); 500 indicates that the server encountered an error and cannot complete the request, that is, an internal server error; 503 indicates that the server is currently unavailable (because of overload or maintenance downtime); and 505 indicates that the server does not support the HTTP protocol version used in the request.
S206: if described status code is the status code that the default described web page server of instruction normally can not return webpage data information, second webpage relevant to described first webpage according to the page key words search of the first webpage corresponding to the described target bookmark preserved in advance.
In the embodiment of the present invention, the status code that web page server normally can not return webpage data information can be used to indicate, as 404 status codes by pre-setting.What deserves to be explained is, other one or more status codes that preset state code can return for web page server, such as: 403 status codes, 400 status codes or 500 status codes etc.After receiving the HTTP status code that web page server returns, identify that whether this status code is wherein a kind of in preset state code, if, then illustrate that web page server normally can not return the webpage data information of institute's requested webpage, therefore, the page key words according to the first webpage obtains second webpage maximum with the first webpage similarity, and loads the maximum webpage of this similarity, thus user obtains by accessing the maximum webpage of this similarity the content needing access.Accordingly, if the status code returned is not preset any one that be used to indicate that web page server can not normally return in the status code of webpage data information, such as, return 200 status codes, illustrate that web page server has successfully processed request, usually, this represents that server provides the webpage of request, so, directly load this first webpage, the web page contents of the first webpage is shown to user, and the bookmark namely completed by having collected visits the process of former first webpage.
Usually, when web pages are searched for by keyword in a search engine, the returned results are sorted in descending order of their relevance to the keyword. Therefore, if the status code is one of the preset status codes, an automatic search can be performed in a search engine according to the page keywords of the first web page, and the first search result returned is taken as the most similar web page. To improve accuracy, the web page with the greatest similarity to the first web page may instead be obtained from the page keywords by the method shown in Fig. 3. Fig. 3 is a schematic flowchart of a method for obtaining a similar web page as used in Fig. 2, and the method may include the following steps (S301 to S305).
S301: Select, from the page keywords of the first web page, the M page keywords most relevant to the first web page, where M>0.
If the status code returned by the web server is one of the preset status codes, the M page keywords most relevant to the first web page are selected from the page keywords of the first web page corresponding to the target bookmark specified by the bookmark access instruction. Suppose the first web page has N page keywords, where 0<M≤N. For example, taking the data in Table 1 with N=4 and M=2, the 4 page keywords in the specified bookmark data are "atomic energy", "application", "energy", and "release", and the 2 most relevant page keywords selected from them are "atomic energy" and "application". Usually, when a bookmark is added, the value of N is relatively large so that the N extracted page keywords fully express the meaning of the first web page; however, entering many keywords into a search engine when searching for similar pages may well return no relevant results. Therefore, in this embodiment of the present invention, a small number M of the more relevant page keywords are selected from the larger set of N page keywords and used for the search, which yields web pages related to the original first web page; the web page with the greatest similarity to the original first web page is then extracted from those related pages.
It should be noted that the page keywords of the first web page may be stored in ascending or descending order of relevance, in which case the M keywords at the end or at the beginning of the list, respectively, are the M page keywords most relevant to the first web page. Alternatively, each page keyword may be associated with a value expressing its relevance to the first web page, and the M most relevant page keywords can be obtained according to those values.
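As an illustration of step S301, the following sketch selects the M most relevant page keywords when each keyword is associated with a relevance value. The function name and the numeric relevance values are hypothetical; only the four keywords themselves come from the example above.

```python
def select_top_keywords(keyword_relevance, m):
    """Return the m keywords with the highest relevance values, most relevant first.

    keyword_relevance maps each page keyword to a numeric relevance value.
    """
    if not 0 < m <= len(keyword_relevance):
        raise ValueError("m must satisfy 0 < m <= N")
    ranked = sorted(keyword_relevance.items(), key=lambda item: item[1], reverse=True)
    return [keyword for keyword, _ in ranked[:m]]


# Relevance values below are made up for illustration (N=4, M=2).
bookmark_keywords = {
    "atomic energy": 0.9,
    "application": 0.7,
    "energy": 0.5,
    "release": 0.3,
}
top_two = select_top_keywords(bookmark_keywords, 2)
```

If the keywords are already stored in descending order of relevance, slicing the first M entries gives the same result without sorting.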
S302: Search a search engine for web pages similar to the first web page according to the M page keywords.
In this embodiment of the present invention, the search may be performed with a search engine selected by the user or with a preset default search engine. For example, a user prompt interface is displayed, which prompts the user to select, from the multiple search engines offered by the interface (such as Baidu, Google, soso, or NetEase), one search engine with which to search for web pages similar to the first web page. When a search engine selection instruction is received, for example when the user selects the Google search engine, the similar web pages of the first web page are searched for in the selected search engine according to the M page keywords; when the user does not select a search engine, the preset search engine, such as Baidu, is used instead.
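The choice between a user-selected and a default search engine can be sketched as follows. The constant names, the function names, and the way the M keywords are joined into a query string are assumptions for illustration; the embodiment does not prescribe how the query is formed.

```python
# Engine names come from the example above; the default is an assumed setting.
AVAILABLE_ENGINES = ("Baidu", "Google", "soso", "NetEase")
DEFAULT_ENGINE = "Baidu"


def choose_search_engine(user_choice=None,
                         available=AVAILABLE_ENGINES,
                         default=DEFAULT_ENGINE):
    """Use the engine the user selected, or the preset default if none was selected."""
    if user_choice in available:
        return user_choice
    return default


def build_query(keywords):
    """Join the M selected page keywords into one query string (assumed format)."""
    return " ".join(keywords)


engine = choose_search_engine("Google")
query = build_query(["atomic energy", "application"])
```

Sending the query to the chosen engine is deliberately left out, since it depends on the interface that each search engine exposes.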
S303: Select the first L search result pages from the search results, and extract the page keywords of each of the L search result pages.
Usually, when a search engine searches for similar web pages by keyword, the returned results are sorted in descending order of similarity. Therefore, the second web page with the greatest similarity to the first web page can be found among the first L search result pages. In this embodiment of the present invention, the first L search result pages are selected from the search results, where L is a preset positive integer. After these L more similar pages are obtained, the page keywords of each of them are extracted. For example, when the first web page has N page keywords, N page keywords are extracted for each of the L search result pages. Specifically, the N most relevant page keywords may be extracted for each search result page by the method described in step S201. L data records are thus obtained, each containing N page keywords; preferably, each page keyword is associated with its corresponding term frequency.
S304: Calculate the similarity between each of the L search result pages and the first web page according to the page keywords of each of the L search result pages and the page keywords of the first web page.
In this embodiment of the present invention, optionally, feature vectors are constructed from the term frequencies associated with the N most relevant page keywords of each search result page and from the term frequencies associated with the N page keywords in the specified bookmark data. This gives L+1 feature vectors in total, denoted A1, A2, ..., AL and B, where A1, A2, ..., AL are the feature vectors formed by the N most relevant page keywords of the L search result pages, respectively, and B is the feature vector formed by the N page keywords of the first web page. A similarity such as the cosine similarity or a distance-based similarity between each of A1, A2, ..., AL and B is calculated with a similarity calculation method such as the law of cosines or the Euclidean distance, which gives the similarity between each search result page and the first web page. The larger the similarity value, the more similar the two web pages.
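The cosine-similarity variant of step S304 can be sketched as follows. The description does not state how vectors built from different keyword sets are aligned; this sketch assumes that both vectors are laid out over the union of the two keyword sets, with a term frequency of 0 for a keyword that a page lacks. The function names and the term frequencies are hypothetical.

```python
import math


def aligned_vectors(tf_page, tf_bookmark):
    """Build two term-frequency vectors over the union of both keyword sets
    (assumed alignment; missing keywords count as frequency 0)."""
    vocabulary = sorted(set(tf_page) | set(tf_bookmark))
    a = [tf_page.get(word, 0) for word in vocabulary]
    b = [tf_bookmark.get(word, 0) for word in vocabulary]
    return a, b


def cosine_similarity(a, b):
    """Cosine of the angle between vectors a and b; 0.0 if either is all zeros."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


# Made-up term frequencies for the first web page (B) and one search result (A1).
bookmark_tf = {"atomic energy": 5, "application": 3}
result_tf = {"atomic energy": 4, "release": 1}
vector_a1, vector_b = aligned_vectors(result_tf, bookmark_tf)
similarity = cosine_similarity(vector_a1, vector_b)
```

With non-negative term frequencies the result lies between 0 and 1, and a larger value indicates a more similar pair of pages, in line with the text above.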
S305: Obtain, from the L search result pages, the second web page with the greatest similarity to the first web page according to the calculated similarities.
The web page with the largest calculated similarity value is selected as the second web page with the greatest similarity to the first web page. In this way, the second web page with the greatest similarity to the first web page is obtained from the page keywords of the first web page through steps S301 to S305.
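Step S305 amounts to taking the maximum over the calculated similarities. A minimal sketch, assuming each of the L search result pages is represented as a (URL, similarity) pair; the URLs and scores are placeholders, not real results.

```python
def pick_most_similar(scored_pages):
    """Return the URL with the largest similarity value, or None if the list is empty.

    scored_pages is a list of (url, similarity) pairs, one per search result page.
    """
    if not scored_pages:
        return None
    best_url, _ = max(scored_pages, key=lambda page: page[1])
    return best_url


# Placeholder URLs and similarity values for L=3 search result pages.
candidates = [
    ("https://example.com/a", 0.42),
    ("https://example.com/b", 0.87),
    ("https://example.com/c", 0.65),
]
second_page = pick_most_similar(candidates)
```

When several pages share the largest value, `max` returns the first of them, which here corresponds to the page ranked highest by the search engine.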
S207: Display the page content of the second web page and/or the website information of the second web page.
After the second web page related to the first web page is obtained through the above steps, the page content of the second web page is loaded and displayed; alternatively, the website information of the second web page is displayed, and the user chooses whether to browse its page content.
In the bookmark access method described in this embodiment of the present invention, when a bookmark access instruction is received, an access request is initiated to the web server according to the website information of the first web page corresponding to the target bookmark specified by the bookmark access instruction. If it is determined that the web server cannot normally return the web page data of the first web page, a second web page related to the first web page is searched for according to the page keywords of the first web page, and the second web page is displayed. With this embodiment of the present invention, when the web server cannot normally return the web page data of the requested page, the web page most similar to that page can be displayed to the user, which makes it easier for the user to reach the content of interest through a saved bookmark.
The following are device embodiments of the present invention, which are used to perform the methods of the present invention. For ease of description, only the parts relevant to the device embodiments are shown; for specific technical details not disclosed here, refer to the method embodiments of the present invention.
Referring to Fig. 4, Fig. 4 is a schematic structural diagram of a bookmark access device provided by an embodiment of the present invention. In this embodiment, the bookmark access device includes an acquisition module 101, a sending module 102, a search module 103, and a display module 104.
The acquisition module 101 is configured to, when a bookmark access instruction is received, obtain the pre-saved website information of the first web page corresponding to the target bookmark specified by the bookmark access instruction.
The sending module 102 is configured to send, to the web server according to the website information, a request to obtain the web page data of the first web page, so that the web server responds to the request.
The search module 103 is configured to, when it is determined that the web server cannot normally return web page data, search for a second web page related to the first web page according to the pre-saved page keywords of the first web page corresponding to the target bookmark.
The display module 104 is configured to display the page content of the second web page and/or the website information of the second web page.
In the bookmark access device described in this embodiment of the present invention, when a bookmark access instruction is received, an access request is initiated to the web server according to the website information of the first web page corresponding to the target bookmark specified by the bookmark access instruction. If it is determined that the web server cannot normally return the web page data of the first web page, a second web page related to the first web page is searched for according to the page keywords of the first web page, and the second web page is displayed. With this embodiment of the present invention, when the web server cannot normally return the web page data of the requested page, the web page most similar to that page can be displayed to the user, which makes it easier for the user to reach the content of interest through a saved bookmark.
Referring to Fig. 5, Fig. 5 is another schematic structural diagram of a bookmark access device provided by an embodiment of the present invention. In this embodiment, the bookmark access device includes an acquisition module 201, a sending module 202, a search module 203, a display module 204, and a saving module 205.
The acquisition module 201 is configured to, when a bookmark access instruction is received, obtain the pre-saved website information of the first web page corresponding to the target bookmark specified by the bookmark access instruction.
The sending module 202 is configured to send, to the web server according to the website information, a request to obtain the web page data of the first web page, so that the web server responds to the request.
The search module 203 is configured to, when it is determined that the web server cannot normally return web page data, search for a second web page related to the first web page according to the pre-saved page keywords of the first web page corresponding to the target bookmark.
The display module 204 is configured to display the page content of the second web page and/or the website information of the second web page.
The saving module 205 is configured to:
when a bookmark adding instruction for the first web page is received, obtain the page keywords of the first web page and the website information of the first web page;
save the page keywords and the website information of the first web page.
Preferably, the search module 203 is specifically configured to:
obtain the HTTP status code returned by the web server;
if the status code is a preset status code indicating that the web server cannot normally return web page data, search for a second web page related to the first web page according to the pre-saved page keywords of the first web page corresponding to the target bookmark.
Preferably, the saving module 205 is specifically configured to:
extract at least one keyword from the page content of the first web page;
select, from the at least one keyword, the N keywords most relevant to the first web page as the page keywords of the first web page, where N is a positive integer.
Preferably, the search module 203 may include a search unit 2031, an acquiring unit 2032, and a prompt unit 2033, as shown in Fig. 6. Fig. 6 is a schematic structural diagram of one search module as used in Fig. 5.
The search unit 2031 is configured to, when it is determined that the web server cannot normally return web page data, search a search engine for web pages similar to the first web page according to the pre-saved page keywords of the first web page.
The acquiring unit 2032 is configured to obtain, from the search result pages included in the search results, the second web page with the greatest similarity to the first web page.
The prompt unit 2033 is configured to display a user prompt interface, which prompts the user to select, from the multiple search engines offered by the interface, one search engine with which to search for web pages similar to the first web page.
Preferably, the search unit 2031 is specifically configured to:
when it is determined that the web server cannot normally return web page data, select, from the page keywords of the first web page, the M page keywords most relevant to the first web page, where M>0;
search a search engine for web pages similar to the first web page according to the M page keywords.
Preferably, the acquiring unit 2032 is specifically configured to:
select the first L search result pages from the search results, and extract the page keywords of each of the L search result pages;
calculate the similarity between each of the L search result pages and the first web page according to the page keywords of each of the L search result pages and the page keywords of the first web page;
obtain, from the L search result pages, the second web page with the greatest similarity to the first web page according to the calculated similarities.
Preferably, the search unit 2031 is further configured to, when a search engine selection instruction is received, search the selected search engine for web pages similar to the first web page according to the page keywords of the first web page.
Preferably, the saving module 205 is specifically configured to:
send the page keywords and the website information of the first web page to a bookmark server, so that they are saved in the bookmark server, or save the page keywords and the website information of the first web page locally.
Preferably, when the saving module 205 saves the page keywords and the website information of the first web page in the bookmark server, the acquisition module 201 is further configured to download, from the bookmark server, the page keywords and website information corresponding to the target bookmark specified by the bookmark access instruction.
In the bookmark access device described in this embodiment of the present invention, when a bookmark access instruction is received, an access request is initiated to the web server according to the website information of the first web page corresponding to the target bookmark specified by the bookmark access instruction. If it is determined that the web server cannot normally return the web page data of the first web page, a second web page related to the first web page is searched for according to the page keywords of the first web page, and the second web page is displayed. With this embodiment of the present invention, when the web server cannot normally return the web page data of the requested page, the web page most similar to that page can be displayed to the user, which makes it easier for the user to reach the content of interest through a saved bookmark.
Referring to Fig. 7, Fig. 7 is a schematic structural diagram of a terminal provided by an embodiment of the present invention. In this embodiment, the terminal 10 includes the bookmark access device described above. The terminal may be a device such as a mobile phone, an iPad, or a computer; the embodiment of the present invention places no restriction on this, and any terminal that can run browser software falls within the protection scope of the embodiment of the present invention.
It should be noted that the terminal 10 includes any one of the bookmark access devices described above; for ease of description, one of them is used as an example. While browsing web pages on the terminal, the user can add a web page of interest to the favorites as a bookmark. Specifically, when the terminal 10 receives a bookmark adding instruction, it obtains the page keywords and the website information of the current web page and saves them in the terminal as bookmark data; that is, the bookmark data includes the page keywords and the website information of the current web page. The terminal may also send the obtained bookmark data to a bookmark server for backup, and the bookmark server saves the bookmark data on the server side after receiving it. When the user needs to access the same favorites on different terminals, the terminal can be synchronized with the bookmark data in the bookmark server: the terminal obtains from the bookmark server the bookmark data not recorded in the terminal, and/or uploads to the bookmark server the bookmark data not recorded there, so that the bookmark data in the terminal and in the bookmark server stay consistent.
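The two-way synchronization described above can be sketched as a merge of two bookmark stores. This is only an illustration under stated assumptions: bookmark data is modeled as a dictionary keyed by URL, the function name is hypothetical, and the rule that the server's copy is kept when both sides hold the same URL is an assumption, since the description does not address conflicts.

```python
def sync_bookmarks(terminal_data, server_data):
    """Return the new (terminal, server) bookmark stores after synchronization.

    Each store maps a URL to its bookmark data (for example, page keywords).
    Entries missing on one side are copied from the other side. For a URL
    present on both sides, the server's copy is kept (assumed policy).
    """
    merged = dict(server_data)
    for url, data in terminal_data.items():
        merged.setdefault(url, data)
    # Both sides end up with identical, independent copies.
    return dict(merged), dict(merged)


# Placeholder URLs and keywords.
terminal = {"https://example.com/a": ["atomic energy", "application"]}
server = {"https://example.com/b": ["energy", "release"]}
terminal, server = sync_bookmarks(terminal, server)
```

After the call, both stores contain the bookmarks for both URLs, which matches the stated goal of keeping the terminal and the bookmark server consistent.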
When the terminal receives a bookmark access instruction, it obtains the pre-saved website information of the first web page corresponding to the target bookmark specified by the bookmark access instruction; sends, to the web server according to the website information, a request to obtain the web page data of the first web page, so that the web server responds to the request; when it is determined that the web server cannot normally return web page data, searches for a second web page related to the first web page according to the pre-saved page keywords of the first web page corresponding to the target bookmark; and displays the page content of the second web page and/or the website information of the second web page.
In the terminal described in this embodiment of the present invention, when a bookmark adding instruction for the first web page is received, the page keywords and the website information of the first web page are obtained and saved. When a bookmark access instruction for the first web page is received, an access request is initiated to the web server according to the website information of the first web page corresponding to the target bookmark specified by the bookmark access instruction. If it is determined that the web server cannot normally return the web page data of the first web page, a second web page related to the first web page is searched for according to the page keywords of the first web page, and the second web page is displayed. With this embodiment of the present invention, when the web server cannot normally return the web page data of the requested page, the web page most similar to that page can be displayed to the user, which makes it easier for the user to reach the content of interest through a saved bookmark.
Those skilled in the art may combine the different embodiments described in this specification and the features of the different embodiments. The modules or units in all embodiments of the present invention may be implemented by a general-purpose integrated circuit, such as a CPU (Central Processing Unit), or by an ASIC (Application Specific Integrated Circuit). The steps of the methods in all embodiments of the present invention may be reordered, merged, and deleted according to actual needs; the modules or units of the devices in all embodiments of the present invention may be merged, divided, and deleted according to actual needs.
Any process or method described in a flowchart or otherwise described herein may be understood as representing a module, segment, or portion of code that includes one or more executable instructions for implementing the steps of a specific logical function or process. The scope of the preferred embodiments of the present invention includes other implementations in which functions may be performed out of the order shown or discussed, including substantially concurrently or in the reverse order, depending on the functions involved, as should be understood by those skilled in the art to which the embodiments of the present invention belong.
The logic and/or steps represented in a flowchart or otherwise described herein may, for example, be regarded as an ordered list of executable instructions for implementing logical functions, and may be embodied in any computer-readable medium for use by, or in combination with, an instruction execution system, apparatus, or device (such as a computer-based system, a system including a processor, or another system that can fetch instructions from the instruction execution system, apparatus, or device and execute them). For the purposes of this specification, a "computer-readable medium" may be any means that can contain, store, communicate, propagate, or transmit a program for use by, or in combination with, an instruction execution system, apparatus, or device. More specific examples (a non-exhaustive list) of computer-readable media include the following: an electrical connection (electronic device) having one or more wires, a portable computer diskette (magnetic device), a random access memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber device, and a portable compact disc read-only memory (CDROM). In addition, the computer-readable medium may even be paper or another suitable medium on which the program can be printed, because the program can be obtained electronically, for example by optically scanning the paper or other medium and then editing, interpreting, or otherwise processing it in a suitable manner if necessary, and then stored in a computer memory.
Those skilled in the art will understand that all or some of the steps of the methods in the above embodiments may be completed by a program instructing the relevant hardware. The program may be stored in a computer-readable storage medium and, when executed, performs one of the steps of the method embodiments or a combination thereof.
In addition, the functional units in the embodiments of the present invention may be integrated into one processing module, or each unit may exist physically on its own, or two or more units may be integrated into one module. The integrated module may be implemented in the form of hardware or in the form of a software functional module. If the integrated module is implemented in the form of a software functional module and is sold or used as an independent product, it may also be stored in a computer-readable storage medium.
The storage medium mentioned above may be a read-only memory, a magnetic disk, an optical disc, or the like. Although embodiments of the present invention have been shown and described above, it should be understood that the above embodiments are exemplary and cannot be construed as limiting the present invention; those of ordinary skill in the art may change, modify, replace, and vary the above embodiments within the scope of the present invention.
What is disclosed above is merely the preferred embodiments of the present invention, which of course cannot be used to limit the scope of rights of the present invention. Those of ordinary skill in the art will understand that implementations of all or part of the processes of the above embodiments, and equivalent changes made in accordance with the claims of the present invention, still fall within the scope covered by the invention.