CN105260469B - Method, apparatus and device for processing sitemaps - Google Patents

Method, apparatus and device for processing sitemaps

Info

Publication number
CN105260469B
Authority
CN
China
Prior art keywords
site maps
website
link
keyword
page
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201510676894.0A
Other languages
Chinese (zh)
Other versions
CN105260469A (en
Inventor
梁捷
梁卡喆
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Alibaba China Co Ltd
Original Assignee
Guangzhou Shenma Mobile Information Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Guangzhou Shenma Mobile Information Technology Co Ltd
Priority to CN201510676894.0A
Publication of CN105260469A
Priority to PCT/CN2016/102215 (WO2017063596A1)
Application granted
Publication of CN105260469B
Legal status: Active
Anticipated expiration

Abstract

The present invention discloses a method, apparatus and device for processing sitemaps. The method includes: obtaining the sitemap of a website according to preset information; obtaining the links of the pages in the sitemap and accessing them; deleting, according to the access results, the links in the sitemap that adversely affect search indexing; and generating a new sitemap. The technical solution provided by the invention can improve sitemap quality and increase the likelihood that pages are indexed by search engines, meeting the respective needs of the website and the search engine.

Description

Method, apparatus and device for processing sitemaps
Technical field
The present invention relates to the field of mobile internet technology, and in particular to a method, apparatus and device for processing sitemaps.
Background art
At present, search engines usually find web pages through links within a website and on other websites. A sitemap makes it easy for a website to tell search engines which of its pages are available for crawling. In its simplest form, a sitemap is an XML (Extensible Markup Language) file that lists the URLs of the website together with other metadata about each URL (the time of the last update, the change frequency, its importance relative to other URLs on the site, and so on), so that search engines can crawl the site's content more intelligently. Put simply, a sitemap can be understood as a list of the links on a website. Generating a sitemap and submitting it to search engines makes the site's content easier to index, including pages hidden deep in the site; it is a good way for a website to communicate with search engines.
However, the links included in the sitemaps that many websites currently provide may have many quality problems, such as broken links, or links whose content is low-quality or not updated in time. All of these waste the search engine's crawling resources. As a result, even though the website provides a sitemap, the search engine, depending on what it finds when crawling, will not necessarily index the links in the sitemap. Such links may also trigger the search engine's demotion rules, reducing the number of links indexed for the website and lowering the website's search ranking.
Therefore, existing methods of processing sitemaps cannot meet the respective needs of websites and search engines.
Summary of the invention
To solve the above technical problem, the present invention provides a method, apparatus and device for processing sitemaps that can meet the respective needs of websites and search engines.
According to one aspect of the present invention, a method for processing sitemaps is provided, including:
obtaining the sitemap of a website according to preset information;
obtaining the links of the pages in the sitemap and accessing them;
deleting, according to the access results, the links in the sitemap that adversely affect search indexing;
generating a new sitemap.
Preferably, after obtaining the links of the pages in the sitemap and accessing them, the method further includes:
extracting keywords and a body-text feature value from each accessed page;
deleting the links in the sitemap that adversely affect search indexing, according to the result of comparing the extracted keywords and body-text feature value with pre-stored keywords and body-text feature values.
Preferably, deleting, according to the access results, the links in the sitemap that adversely affect search indexing includes:
deleting the corresponding link when the access result is an HTTP 404 error indicating that the page cannot be accessed; or
deleting the corresponding link when the access result is that the page response time is greater than or equal to a set threshold; or
deleting the corresponding link when the access result is that the page's title, keywords and description are incomplete; or
deleting the corresponding link when the access result is that the page's body content does not match the page's title, keywords and description.
Preferably, deleting the links in the sitemap that adversely affect search indexing, according to the result of comparing the extracted keywords and body-text feature value with the pre-stored keywords and body-text feature values, includes:
when the comparison shows that the extracted keywords and body-text feature value are identical to pre-stored keywords and a pre-stored body-text feature value, determining that the content is a duplicate submission and deleting the corresponding link.
Preferably, the method further includes:
providing the generated new sitemap for search engines to access.
Preferably, the method further includes:
recording the indexing data that results from the search engine's crawling and indexing after it accesses the new sitemap.
According to another aspect of the present invention, an apparatus for processing sitemaps is provided, including:
an acquisition module, configured to obtain the sitemap of a website according to preset information;
an access module, configured to obtain, from the sitemap obtained by the acquisition module, the links of the pages in the sitemap and access them;
a first processing module, configured to delete, according to the access results of the access module, the links in the sitemap that adversely affect search indexing;
a generation module, configured to generate a new sitemap after processing by the first processing module.
Preferably, the apparatus further includes:
a second processing module, configured to extract keywords and a body-text feature value from each accessed page, and to delete the links in the sitemap that adversely affect search indexing according to the result of comparing the extracted keywords and body-text feature value with pre-stored keywords and body-text feature values;
the generation module generates the new sitemap after processing by the first processing module and the second processing module.
Preferably, the apparatus further includes:
an output module, configured to provide the new sitemap generated by the generation module for search engines to access.
Preferably, the apparatus further includes:
a monitoring module, configured to record the indexing data that results from the search engine's crawling and indexing after it accesses the new sitemap.
Preferably, the first processing module includes:
a first deletion unit, configured to delete the corresponding link when the access result is an HTTP 404 error indicating that the page cannot be accessed; or
a second deletion unit, configured to delete the corresponding link when the access result is that the page response time is greater than or equal to a set threshold; or
a third deletion unit, configured to delete the corresponding link when the access result is that the page's title, keywords and description are incomplete; or
a fourth deletion unit, configured to delete the corresponding link when the access result is that the page's body content does not match the page's title, keywords and description.
According to another aspect of the present invention, a processing device is provided, including:
a memory, configured to store programs; and
a processor, configured to execute the following program stored in the memory:
obtaining the sitemap of a website according to preset information;
obtaining the links of the pages in the sitemap and accessing them;
deleting, according to the access results, the links in the sitemap that adversely affect search indexing;
generating a new sitemap.
As can be seen, in the technical solution of the embodiments of the present invention, the links of the pages in the sitemap are first obtained and accessed; when the access results reveal links that adversely affect search indexing, those links are deleted from the sitemap and a new sitemap is generated. The website's original sitemap is thereby optimized, and links with poor content or that are prone to errors are kept out of the sitemap as far as possible, which improves sitemap quality, increases the likelihood of being indexed by search engines, and meets the needs of both the website and the search engine.
Brief description of the drawings
The above and other objects, features and advantages of the present disclosure will become more apparent from the following more detailed description of exemplary embodiments of the disclosure in conjunction with the accompanying drawings, in which the same reference numerals generally denote the same components.
Fig. 1 is a schematic flowchart of a method for processing sitemaps according to an embodiment of the present invention;
Fig. 2 is another schematic flowchart of a method for processing sitemaps according to an embodiment of the present invention;
Fig. 3 is another schematic flowchart of a method for processing sitemaps according to an embodiment of the present invention;
Fig. 4 is a schematic block diagram of an apparatus for processing sitemaps according to the present invention;
Fig. 5 is another schematic block diagram of an apparatus for processing sitemaps according to the present invention;
Fig. 6 is a schematic block diagram of a processing device according to the present invention.
Detailed description
Preferred embodiments of the present disclosure are described in more detail below with reference to the accompanying drawings. Although the drawings show preferred embodiments of the disclosure, it should be understood that the disclosure may be implemented in various forms and should not be limited by the embodiments set forth here. Rather, these embodiments are provided so that the disclosure will be thorough and complete and will fully convey its scope to those skilled in the art.
The present invention provides a method for processing sitemaps that can meet the respective needs of websites and search engines.
Fig. 1 is a schematic flowchart of a method for processing sitemaps according to an embodiment of the present invention.
As shown in Fig. 1, the method includes:
Step 101: obtain the sitemap of a website according to preset information.
In this step, after an agreement has been reached with the website, the sitemap of the website can be obtained according to the configuration information provided by the website.
Step 102: obtain the links of the pages in the sitemap and access them.
In this step, each URL (Uniform Resource Locator) link in the sitemap is obtained, and each URL link is accessed in turn for verification.
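As a rough illustration of the extraction part of step 102, the URLs can be read out of a standard XML sitemap with Python's standard library. This is a minimal sketch rather than the implementation described in the patent: the function name `extract_urls` and the sample data are hypothetical, and it assumes the sitemap uses the sitemaps.org 0.9 namespace.

```python
import xml.etree.ElementTree as ET

# Namespace used by standard sitemaps (assumption: the site follows sitemaps.org 0.9).
SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"

# Hypothetical sample sitemap used only for illustration.
EXAMPLE = (
    '<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">'
    "<url><loc>https://example.com/</loc><lastmod>2015-10-01</lastmod></url>"
    "<url><loc>https://example.com/about</loc></url>"
    "</urlset>"
)


def extract_urls(sitemap_xml: str) -> list:
    """Return the text of every <loc> element, in document order."""
    root = ET.fromstring(sitemap_xml)
    urls = []
    for loc in root.iter(f"{{{SITEMAP_NS}}}loc"):
        if loc.text and loc.text.strip():
            urls.append(loc.text.strip())
    return urls


print(extract_urls(EXAMPLE))
```

Each URL returned this way would then be visited individually, as the step describes.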
Step 103: delete, according to the access results, the links in the sitemap that adversely affect search indexing.
In this step, deleting the links in the sitemap that adversely affect search indexing according to the access results includes:
deleting the corresponding link when the access result is an HTTP 404 error indicating that the page cannot be accessed; or
deleting the corresponding link when the access result is that the page response time is greater than or equal to a set threshold; or
deleting the corresponding link when the access result is that the page's title, keywords and description are incomplete; or
deleting the corresponding link when the access result is that the page's body content does not match the page's title, keywords and description.
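The four deletion conditions listed above can be sketched as a single rule chain. The names `AccessResult` and `removal_reason` are hypothetical, the 1-second default is only one of the example thresholds the description mentions later, and the matching rule (at least one comma-separated keyword appears in the body) is a simplifying assumption, since the patent does not fix how a match is computed.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class AccessResult:
    """What was observed when one sitemap URL was visited (hypothetical structure)."""
    status_code: int
    response_seconds: float
    title: str = ""
    keywords: str = ""      # assumed to be a comma-separated list
    description: str = ""
    body: str = ""


def removal_reason(r: AccessResult, max_seconds: float = 1.0) -> Optional[str]:
    """Return why the link should be deleted, or None if it should be kept."""
    if r.status_code == 404:
        return "http_404"
    if r.response_seconds >= max_seconds:
        return "slow_response"
    if not (r.title.strip() and r.keywords.strip() and r.description.strip()):
        return "incomplete_tkd"
    # Simplified match: at least one declared keyword must occur in the body text.
    terms = [k.strip().lower() for k in r.keywords.split(",") if k.strip()]
    body = r.body.lower()
    if not any(term in body for term in terms):
        return "body_tkd_mismatch"
    return None
```

Returning a reason string rather than a boolean keeps the cause of each deletion available for later analysis by the website.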
Step 104: generate a new sitemap.
In this step, after every link that adversely affects search indexing has been deleted from the sitemap, the remaining links are rearranged and a new sitemap is generated.
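One way to regenerate the sitemap is to drop the rejected `<url>` entries from the original XML and serialize what remains, which preserves metadata such as `<lastmod>` on the surviving entries. The sketch below assumes a sitemaps.org 0.9 document; `filter_sitemap` is a hypothetical helper name, and the patent does not prescribe this approach.

```python
import xml.etree.ElementTree as ET

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"


def filter_sitemap(sitemap_xml: str, urls_to_remove: set) -> str:
    """Return a new sitemap document without the entries listed in urls_to_remove."""
    # Serialize with a default namespace instead of an auto-generated prefix.
    ET.register_namespace("", SITEMAP_NS)
    root = ET.fromstring(sitemap_xml)
    # Iterate over a copy of the list because entries are removed during the loop.
    for entry in list(root.findall(f"{{{SITEMAP_NS}}}url")):
        loc = entry.find(f"{{{SITEMAP_NS}}}loc")
        if loc is not None and (loc.text or "").strip() in urls_to_remove:
            root.remove(entry)
    return ET.tostring(root, encoding="unicode")
```

Filtering the original tree, rather than rebuilding it from a bare URL list, is a design choice that avoids losing per-URL metadata the website supplied.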
As can be seen, in the technical solution of the embodiments of the present invention, the links of the pages in the sitemap are first obtained and accessed; when the access results reveal links that adversely affect search indexing, those links are deleted from the sitemap and a new sitemap is generated. The website's original sitemap is thereby optimized, and links with poor content or that are prone to errors are kept out of the sitemap as far as possible, which improves sitemap quality, increases the likelihood of being indexed by search engines, and meets the needs of both the website and the search engine.
The technical solution is described in more detail below.
Fig. 2 is another schematic flowchart of a method for processing sitemaps according to an embodiment of the present invention.
As shown in Fig. 2, the method includes:
Step 201: obtain the sitemap of a website according to preset information.
For this step, see the description of step 101 above.
Step 202: obtain the links of the pages in the sitemap and access them.
For this step, see the description of step 102 above.
Step 203: delete, according to the access results, the links in the sitemap that adversely affect search indexing.
For this step, see the description of step 103 above.
Step 204: extract keywords and a body-text feature value from each accessed page.
In this step, keywords may be extracted from the page content, and a feature value from the body text, using any of various existing algorithms; the present invention does not limit this.
Step 205: delete the links in the sitemap that adversely affect search indexing, according to the result of comparing the extracted keywords and body-text feature value with pre-stored keywords and body-text feature values.
In this step, when the comparison shows that the extracted keywords and body-text feature value are identical to pre-stored keywords and a pre-stored body-text feature value, the content is determined to be a duplicate submission and the corresponding link is deleted.
Step 206: generate a new sitemap.
Step 207: provide the generated new sitemap for search engines to access.
In this step, the generated new sitemap may replace the website's original sitemap, so that search engines access the new sitemap on the website; alternatively, the website may be configured so that search engines access the new sitemap directly on the service platform. The present invention does not limit this, as long as search engines can access the new sitemap.
It should be noted that the processing of steps 202 and 203 and the processing of steps 204 and 205 need not occur in any fixed order; the above arrangement is only for convenience of description.
It should also be noted that, after step 207, the method may further include: recording the indexing data that results from the search engine's crawling and indexing after it accesses the new sitemap.
As can be seen, the technical solution of the embodiments of the present invention can delete links that adversely affect search indexing both according to the access results and according to the result of comparing the extracted keywords and body-text feature value with the pre-stored keywords and body-text feature values, which improves the optimization effect. In addition, the indexing data that results from the search engine's crawling and indexing after it accesses the new sitemap can be recorded, providing a reference for later modifications to the sitemap or for analysis by the website.
Fig. 3 is another schematic flowchart of a method for processing sitemaps according to an embodiment of the present invention.
As shown in Fig. 3, the method includes:
Step 301: the sitemap service platform extracts data from the website's sitemap according to the configuration information of the website.
In this step, the website reaches an agreement in advance with the sitemap service platform (hereinafter, the service platform). The website sets up a mapping between its sitemap and the service platform, allowing the service platform to process the sitemap according to configuration information that the website provides, such as address information. The website can set up the mapping through XML. Based on the configuration information provided by the website, the service platform can extract data from the sitemap and obtain the URL of each link in it.
Step 302: the service platform checks each URL in the extracted sitemap and determines whether accessing the URL produces an HTTP 404 error indicating that the page cannot be accessed. If so, go to step 311, where the URL is deleted from the sitemap and the reason is recorded; if not, go to step 303.
An HTTP 404 error means that the page the link points to does not exist, that is, the original page's URL is no longer valid. This happens often, for example when the rules for generating page URLs change, when a page file is renamed or moved, or when an inbound link is misspelled, so that the original URL can no longer be accessed. When the web server receives such a request, it returns a 404 status code, telling the browser that the requested resource does not exist. Therefore, an HTTP 404 error on a URL indicates that the URL is no longer valid, and the URL is then deleted from the sitemap and the reason recorded.
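The 404 check in step 302 could be sketched with `urllib` from Python's standard library. The helper names `probe` and `is_dead_link` are hypothetical; the `opener` parameter is an assumption added so the function can be exercised without network access, and other failures (DNS errors, timeouts) are deliberately left to propagate because the text only discusses the 404 case.

```python
import time
import urllib.error
import urllib.request
from email.message import Message


def probe(url, opener=urllib.request.urlopen, timeout=10.0):
    """Visit one URL and return (status_code, elapsed_seconds)."""
    start = time.monotonic()
    try:
        with opener(url, timeout=timeout) as resp:
            status = resp.status
    except urllib.error.HTTPError as err:
        # urlopen raises HTTPError for 4xx/5xx responses; the code is on the exception.
        status = err.code
    return status, time.monotonic() - start


def is_dead_link(status):
    """Step 302 only treats HTTP 404 as grounds for deletion."""
    return status == 404


# Offline demonstration with a stand-in opener that always answers 404.
def fake_404(url, timeout=None):
    raise urllib.error.HTTPError(url, 404, "Not Found", Message(), None)


status, elapsed = probe("https://example.com/missing", opener=fake_404)
print(status, is_dead_link(status))
```

The elapsed time returned alongside the status code is the same measurement the next step uses to judge response speed.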
Step 303: the service platform determines whether the page response speed when accessing the URL is abnormal. If so, go to step 311, where the URL is deleted from the sitemap and the reason is recorded; if not, go to step 304.
When a URL can be accessed normally, the response speed of the page is checked; response speed can be measured by response time. If the response time is greater than or equal to a set threshold, the response speed is considered abnormal; if it is less than the threshold, the response speed is considered normal. The threshold can be chosen empirically, for example 500 milliseconds or 1 second; the present invention does not limit this.
It should be noted that the page's historical response speed can also be compared with its current response speed to judge whether the response speed is abnormal. If the current response time exceeds the historical response time by more than some threshold, the response speed is considered abnormal.
Therefore, abnormal page response speed indicates that the page corresponding to the URL may have a problem, or that the network connection corresponding to the URL may have a problem. Either affects the user's browsing experience, so the URL is deleted from the sitemap and the reason recorded.
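The two response-speed tests described above, a fixed threshold and a comparison against the page's history, might be combined as follows. This is a sketch under stated assumptions: `is_slow` is a hypothetical name, the history is summarized by its median, and "much larger" is interpreted as exceeding the median by a fixed factor, none of which the patent specifies.

```python
from statistics import median


def is_slow(current_s, history_s=(), absolute_limit_s=1.0, relative_factor=3.0):
    """Judge whether a response time (in seconds) is abnormal.

    absolute_limit_s: fixed threshold; the text gives 500 ms or 1 s as examples.
    relative_factor:  assumed multiple of the historical median that counts as abnormal.
    """
    if current_s >= absolute_limit_s:
        return True
    if history_s:
        return current_s > relative_factor * median(history_s)
    return False


print(is_slow(0.4))                      # below the fixed threshold, no history
print(is_slow(0.4, [0.1, 0.1, 0.12]))    # well above this page's usual response time
```

Using the median rather than the mean keeps a single past outlier from masking a genuine slowdown.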
Step 304: the service platform determines whether the TKD of the page is incomplete. If so, go to step 311, where the URL is deleted from the sitemap and the reason is recorded; if not, go to step 305.
TKD is an abbreviation of title, keywords and description. The TKD format may be as follows:
<title>Title content goes here</title>
<meta name="keywords" content="Keyword content goes here"/>
<meta name="description" content="Description content goes here"/>
Keywords are the terms that a webmaster sets for a page of the website so that users can find the page through a search engine; they represent the website's market positioning. The description, also called the "content tag", "description tag" or "summary", reflects the main content of the page.
Generally, only a complete TKD conforms to search engines' search rules. If the TKD is incomplete, it does not conform, and the search engine may not search the page or may not index the linked content. Therefore, when the TKD is found to be incomplete, the URL is deleted from the sitemap and the reason recorded.
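The TKD completeness check can be illustrated with the standard-library HTML parser. The class and function names below (`TKDParser`, `extract_tkd`, `tkd_is_complete`) are hypothetical, and "complete" is taken here to mean that all three fields are present and non-empty, which is an assumption about how the check would be applied.

```python
from html.parser import HTMLParser


class TKDParser(HTMLParser):
    """Collect the <title> text and the keywords/description <meta> tags."""

    def __init__(self):
        super().__init__()
        self.tkd = {"title": "", "keywords": "", "description": ""}
        self._in_title = False

    def handle_starttag(self, tag, attrs):
        if tag == "title":
            self._in_title = True
        elif tag == "meta":
            attr = dict(attrs)
            name = (attr.get("name") or "").lower()
            if name in ("keywords", "description"):
                self.tkd[name] = (attr.get("content") or "").strip()

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False

    def handle_data(self, data):
        if self._in_title:
            self.tkd["title"] += data


def extract_tkd(html):
    parser = TKDParser()
    parser.feed(html)
    parser.close()
    tkd = dict(parser.tkd)
    tkd["title"] = tkd["title"].strip()
    return tkd


def tkd_is_complete(tkd):
    """Assumed rule: title, keywords and description must all be non-empty."""
    return all(tkd[field] for field in ("title", "keywords", "description"))
```

A page whose head contains only a `<title>` would be reported as incomplete by `tkd_is_complete`, which corresponds to the deletion case in step 304.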
Step 305: the service platform determines whether the page's body content fails to match its TKD. If so, go to step 311, where the URL is deleted from the sitemap and the reason is recorded; if not, go to step 306.
In this step, the body content of the page is checked to determine whether the keywords in the TKD appear in the body and whether the body content corresponds to the title and description in the TKD. If the TKD keywords appear in the body and the body content corresponds to the TKD's title and description, the page is a match; otherwise it is a mismatch. A mismatch may mean that the body or the TKD has been set up incorrectly; either affects the search engine's search quality and the user's browsing experience. Therefore, when the page's body content is found not to match the TKD, the URL is deleted from the sitemap and the reason recorded.
Step 306: the service platform extracts keywords from the page content and a feature value from the body text.
In this step, the service platform may extract keywords from the page content, and a feature value from the body text, using any of various existing algorithms; the present invention does not limit this.
For example, keyword extraction may use the existing TF-IDF (term frequency-inverse document frequency) algorithm, which mainly keeps all word information in a dictionary, sorts the dictionary by value, and finally takes the several highest-weighted words as keywords. Body-text feature values may be extracted, for example, using text feature extraction methods based on a context framework or based on ontology.
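A compact TF-IDF keyword extractor in the spirit of the description above might look like the following. It is only a sketch: `tokenize` and `top_keywords` are hypothetical names, the tokenizer handles lowercase ASCII words only (Chinese pages would need a word segmenter), and the smoothed IDF formula is one common variant rather than anything the patent specifies.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Very simple tokenizer: lowercase ASCII letters and digits only."""
    return re.findall(r"[a-z0-9]+", text.lower())


def top_keywords(doc, corpus, k=3):
    """Return the k terms of doc with the highest TF-IDF weight relative to corpus."""
    tf = Counter(tokenize(doc))
    total = sum(tf.values())
    if total == 0:
        return []
    doc_sets = [set(tokenize(d)) for d in corpus]
    n = len(corpus)
    scores = {}
    for term, count in tf.items():
        df = sum(1 for s in doc_sets if term in s)
        # Smoothed inverse document frequency, so unseen terms do not divide by zero.
        idf = math.log((1 + n) / (1 + df)) + 1.0
        scores[term] = (count / total) * idf
    # Sort by descending weight; ties are broken alphabetically for stable output.
    return sorted(scores, key=lambda t: (-scores[t], t))[:k]


corpus = [
    "the cat sat on the mat",
    "the dog sat on the log",
    "the bird flew over the mat",
]
print(top_keywords("the cat chased the cat", corpus, k=2))
```

In this example "the" occurs as often as "cat" in the page but appears in every corpus document, so its weight is lower and it is not selected.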
Step 307: the service platform compares the extracted keywords and body-text feature value with the keywords and body-text feature values stored on the service platform, and checks whether content has been submitted repeatedly. If so, go to step 311, where the URL is deleted from the sitemap and the reason is recorded; if not, go to step 308.
In this step, the extracted keywords and body-text feature value are compared with the keywords and body-text feature values stored on the service platform in order to match text relevance. If identical keywords and an identical body-text feature value are found on the service platform, the content is judged to be a duplicate. This matching check detects whether content has been submitted repeatedly. The service platform stores in advance the keywords and body-text feature value of every page it has checked.
Step 308: the service platform saves the extracted keywords, the body-text feature value and the corresponding link for use in later duplicate checks.
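Steps 307 and 308 together amount to a look-up followed by a store. The sketch below uses a SHA-256 hash of whitespace-normalized, lowercased body text as a stand-in for the "body-text feature value"; that substitution, and the names `text_fingerprint` and `DuplicateIndex`, are assumptions made for illustration, and an exact hash only catches identical text, not near-duplicates.

```python
import hashlib
import re


def text_fingerprint(body):
    """Stand-in feature value: hash of the normalized body text."""
    normalized = re.sub(r"\s+", " ", body).strip().lower()
    return hashlib.sha256(normalized.encode("utf-8")).hexdigest()


class DuplicateIndex:
    """In-memory store of (keywords, feature value) pairs seen so far."""

    def __init__(self):
        self._seen = {}  # (sorted keywords, fingerprint) -> first URL seen

    def check_and_store(self, url, keywords, body):
        """Return the earlier URL with identical content, or None after storing."""
        key = (tuple(sorted(keywords)), text_fingerprint(body))
        first = self._seen.get(key)
        if first is not None and first != url:
            return first          # step 307: duplicate submission detected
        self._seen[key] = url     # step 308: remember this page for later checks
        return None


index = DuplicateIndex()
print(index.check_and_store("https://example.com/a", ["shoes", "red"], "Red  shoes on sale"))
print(index.check_and_store("https://example.com/b", ["red", "shoes"], "red shoes on\nsale"))
```

A production service platform would presumably persist this index rather than hold it in memory; the dictionary here only illustrates the comparison logic.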
Step 309: the service platform generates the new, processed sitemap data for search engines to obtain.
In this step, the website can be configured to direct search engines to obtain the sitemap directly from the service platform; alternatively, the service platform can replace the website's original sitemap with the new sitemap.
Step 310: the service platform monitors the indexing status of the latest sitemap data.
If a URL in the sitemap is indexed by a search engine, marker information is returned. The service platform monitors whether URLs are indexed by search engines, which can serve as a reference for later adjustments to the sitemap.
Step 311: the service platform deletes the link from the sitemap and records the reason for analysis by the website.
In this step, the reason a link was deleted can be recorded in detail for analysis by the website.
As can be seen, the technical solution of the embodiments of the present invention analyzes and filters the obtained sitemap data of the website and verifies the links provided in the sitemap by accessing them. It also extracts keywords and a body-text feature value from the body content and matches them against pre-stored keywords and body-text feature values, so that duplicate or low-quality content is not submitted. Finally, the search engine's indexing of the sitemap can be monitored. Through this processing, the present invention can optimize sitemap quality, increase the number of the website's pages that search engines index, and allow search engines to index the website's pages better. It also addresses the search demotion caused by submitting duplicate or spam content to search engines, and allows the state of the website's content to be monitored better.
The method for processing sitemaps of the present invention has been described in detail above. Correspondingly, the present invention also provides an apparatus for processing sitemaps.
Fig. 4 is a schematic block diagram of an apparatus for processing sitemaps according to the present invention.
As shown in Fig. 4, an apparatus for processing sitemaps includes: an acquisition module 401, an access module 402, a first processing module 403 and a generation module 404. The apparatus for processing sitemaps of the present invention may be a service platform or other equipment.
The acquisition module 401 is configured to obtain the sitemap of a website according to preset information.
After an agreement has been reached with the website, the apparatus can obtain the sitemap of the website through the acquisition module 401, according to the configuration information provided by the website.
The access module 402 is configured to obtain, from the sitemap obtained by the acquisition module 401, the links of the pages in the sitemap and access them.
The access module 402 obtains each URL link in the sitemap and accesses each URL link in turn for verification.
The first processing module 403 is configured to delete, according to the access results of the access module 402, the links in the sitemap that adversely affect search indexing.
The first processing module 403 deletes the links in the sitemap that adversely affect search indexing according to the various different access results.
The generation module 404 is configured to generate a new sitemap after processing by the first processing module 403.
Fig. 5 is another schematic block diagram of an apparatus for processing sitemaps according to the present invention.
As shown in Fig. 5, an apparatus for processing sitemaps includes: an acquisition module 401, an access module 402, a first processing module 403 and a generation module 404; for the function of each module, see the description of Fig. 4.
In addition, the apparatus further includes a second processing module 405.
The second processing module 405 is configured to extract keywords and a body-text feature value from each accessed page, and to delete the links in the sitemap that adversely affect search indexing according to the result of comparing the extracted keywords and body-text feature value with pre-stored keywords and body-text feature values. The generation module 404 generates the new sitemap after processing by the first processing module 403 and the second processing module 405.
When the comparison shows that the extracted keywords and body-text feature value are identical to pre-stored keywords and a pre-stored body-text feature value, the second processing module 405 determines that the content is a duplicate submission and deletes the corresponding link.
The apparatus further includes an output module 406.
The output module 406 is configured to provide the new sitemap generated by the generation module for search engines to access.
The generated new sitemap may replace the website's original sitemap, so that search engines access the new sitemap on the website; alternatively, the website may be configured so that search engines access the new sitemap directly on the service platform. The present invention does not limit this, as long as search engines can access the new sitemap.
The apparatus further includes a monitoring module 407.
The monitoring module 407 is configured to record the indexing data that results from the search engine's crawling and indexing after it accesses the new sitemap.
The first processing module 403 includes: a first deletion unit 4031, a second deletion unit 4032, a third deletion unit 4033, or a fourth deletion unit 4034.
The first deletion unit 4031 is configured to delete the corresponding link when the access result is an HTTP 404 error indicating that the page cannot be accessed.
The second deletion unit 4032 is configured to delete the corresponding link when the access result is that the page response time is greater than or equal to a set threshold.
The third deletion unit 4033 is configured to delete the corresponding link when the access result is that the page's title, keywords and description are incomplete.
The fourth deletion unit 4034 is configured to delete the corresponding link when the access result is that the page's body content does not match the page's title, keywords and description.
The present invention also provides a kind of processing equipment.
Fig. 6 is a kind of schematic block diagram of processing equipment of the present invention.
As shown in fig. 6, processing equipment includes:Memory 601 and processor 602.
Memory 601, for storage program,
Processor 602, the following procedure stored for performing the memory 601:
The site maps of website are obtained according to presupposed information;
Obtain the link of the page in site maps and conduct interviews;
Influence to search for the link included in site maps according to accessing result and deleting;
Generate new site maps.
It should be noted that other programs that memory 601 stores, referring specifically to the description in previous methods flow, hereinRepeat no more, processor 602 is additionally operable to perform other programs that memory 601 stores.
In summary, the technical scheme of the embodiment of the present invention, the sitemap data of the website of acquisition analyzedFilter, conduct interviews checking to the sitemap links provided, also carries out keyword extraction and text characteristic value to body matter in additionExtraction, and the keyword with prestoring and text characteristic value are matched, so as to avoid submitting duplicate contents or poor qualityContent.Search engine can also be finally monitored to sitemap collection situation.By above-mentioned processing, the present invention can be withOptimize sitemap quality, what the searched engine of lifting web site contents was included includes quantity, allows search engine preferably to include netThe page stood, also solve the problems, such as that duplicate contents, rubbish contents search for drop power caused by being submitted to search engine, can also be moreThe situation of good monitoring web site contents.
The technical solution of the present invention has been described in detail above with reference to the accompanying drawings.
In addition, the method according to the present invention may also be implemented as a computer program that includes computer program code instructions for performing the steps defined in the above method of the present invention. Alternatively, the method according to the present invention may be implemented as a computer program product that includes a computer-readable medium on which is stored a computer program for performing the functions defined in the above method of the present invention. Those skilled in the art will also appreciate that the various illustrative logical blocks, modules, circuits, and algorithm steps described in connection with the disclosure herein may be implemented as electronic hardware, computer software, or a combination of both.
The flowcharts and block diagrams in the accompanying drawings illustrate the architecture, functionality, and operation of possible implementations of systems and methods according to multiple embodiments of the present invention. In this regard, each block in a flowchart or block diagram may represent a module, a program segment, or a portion of code that contains one or more executable instructions for implementing the specified logical function. It should also be noted that, in some alternative implementations, the functions noted in the blocks may occur in an order different from that noted in the drawings. For example, two consecutive blocks may in fact be executed substantially in parallel, or they may sometimes be executed in the reverse order, depending on the functionality involved. It should also be noted that each block of the block diagrams and/or flowcharts, and combinations of blocks in the block diagrams and/or flowcharts, can be implemented by a dedicated hardware-based system that performs the specified functions or operations, or by a combination of dedicated hardware and computer instructions.
Various embodiments of the present invention have been described above. The foregoing description is exemplary, not exhaustive, and is not limited to the disclosed embodiments. Many modifications and variations will be apparent to those of ordinary skill in the art without departing from the scope and spirit of the described embodiments. The terminology used herein was chosen to best explain the principles of the embodiments, their practical application, or their technical improvement over technologies in the marketplace, or to enable others of ordinary skill in the art to understand the embodiments disclosed herein.

Claims (10)

CN201510676894.0A | 2015-10-16 | 2015-10-16 | A kind of method, apparatus and equipment for handling site maps | Active | CN105260469B (en)

Priority Applications (2)

Application Number | Priority Date | Filing Date | Title
CN201510676894.0A / CN105260469B (en) | 2015-10-16 | 2015-10-16 | A kind of method, apparatus and equipment for handling site maps
PCT/CN2016/102215 / WO2017063596A1 (en) | 2015-10-16 | 2016-10-14 | Method, apparatus and device for processing sitemap

Applications Claiming Priority (1)

Application Number | Priority Date | Filing Date | Title
CN201510676894.0A / CN105260469B (en) | 2015-10-16 | 2015-10-16 | A kind of method, apparatus and equipment for handling site maps

Publications (2)

Publication Number | Publication Date
CN105260469A (en) | 2016-01-20
CN105260469B (en) | 2017-12-26

Family

ID=55100159

Family Applications (1)

Application Number | Title | Priority Date | Filing Date
CN201510676894.0A | A kind of method, apparatus and equipment for handling site maps (Active, CN105260469B (en)) | 2015-10-16 | 2015-10-16

Country Status (2)

Country | Link
CN (1) | CN105260469B (en)
WO (1) | WO2017063596A1 (en)

Families Citing this family (8)

* Cited by examiner, † Cited by third party
Publication number | Priority date | Publication date | Assignee | Title
CN105260469B (en)* | 2015-10-16 | 2017-12-26 | 广州神马移动信息科技有限公司 | A kind of method, apparatus and equipment for handling site maps
CN106095674B (en)* | 2016-06-07 | 2019-05-24 | 百度在线网络技术(北京)有限公司 | A kind of website automation test method and device
CN107807937B (en)* | 2016-09-09 | 2021-11-30 | 阿里巴巴集团控股有限公司 | Website SEO processing method, device and system
CN108255831B (en)* | 2016-12-28 | 2021-12-17 | 航天信息股份有限公司 | Method and system for generating website map for website
CN111695056B (en)* | 2019-03-12 | 2024-03-22 | 阿里巴巴集团控股有限公司 | Page processing and page return processing methods, devices and equipment
CN112307395B (en)* | 2020-08-10 | 2024-12-06 | 北京沃东天骏信息技术有限公司 | Method and device for generating website map
CN114996558A (en)* | 2022-06-07 | 2022-09-02 | 抖音视界(北京)有限公司 | A web page information processing method, device, system, device and storage medium
CN119249020B (en)* | 2024-09-19 | 2025-09-16 | 广州盈风网络科技有限公司 | Artificial intelligence-based website map generation method, system, device and medium

Citations (3)

* Cited by examiner, † Cited by third party
Publication number | Priority date | Publication date | Assignee | Title
CN1486457A (en)* | 2000-11-21 | 2004-03-31 | [illegible] | A system and process for indirect crawling
CN102057372A (en)* | 2008-04-17 | 2011-05-11 | 谷歌公司 | Generate a sitemap
CN104317938A (en)* | 2014-10-31 | 2015-01-28 | 北京国双科技有限公司 | Webpage validation method and device

Family Cites Families (4)

* Cited by examiner, † Cited by third party
Publication number | Priority date | Publication date | Assignee | Title
US7769742B1 (en)* | 2005-05-31 | 2010-08-03 | Google Inc. | Web crawler scheduler that utilizes sitemaps from websites
US8126869B2 (en)* | 2008-02-08 | 2012-02-28 | Microsoft Corporation | Automated client sitemap generation
US7865497B1 (en)* | 2008-02-21 | 2011-01-04 | Google Inc. | Sitemap generation where last modified time is not available to a network crawler
CN105260469B (en)* | 2015-10-16 | 2017-12-26 | 广州神马移动信息科技有限公司 | A kind of method, apparatus and equipment for handling site maps

Also Published As

Publication number | Publication date
CN105260469A (en) | 2016-01-20
WO2017063596A1 (en) | 2017-04-20

Legal Events

Date | Code | Title | Description
- | C06 | Publication | -
- | PB01 | Publication | -
- | C10 | Entry into substantive examination | -
- | SE01 | Entry into force of request for substantive examination | -
- | GR01 | Patent grant | -
- | TR01 | Transfer of patent right | -

Effective date of registration: 2020-08-12
Address after: 310052 room 508, floor 5, building 4, No. 699, Wangshang Road, Changhe street, Binjiang District, Hangzhou City, Zhejiang Province
Patentee after: Alibaba (China) Co.,Ltd.
Address before: 510627 Guangdong city of Guangzhou province Whampoa Tianhe District Road No. 163 Xiping Yun Lu Yun Ping square B radio tower 12 layer self unit 01
Patentee before: GUANGZHOU SHENMA MOBILE INFORMATION TECHNOLOGY Co.,Ltd.

TR01Transfer of patent right
