US20100287152A1 - System, method and computer readable medium for web crawling - Google Patents

System, method and computer readable medium for web crawling

Info

Publication number
US20100287152A1
Authority
US
United States
Prior art keywords
url
web page
interaction data
data store
web
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Abandoned
Application number
US12/435,774
Inventor
Robert R. Hauser
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Lipari Paul
Oracle America Inc
Suboti LLC
Original Assignee
Suboti LLC
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
2009-05-05
Filing date
2009-05-05
Publication date
2010-11-11
Application filed by Suboti LLC
Priority to US12/435,774
Assigned to LIPARI, PAUL A and SUBOTI, LLC. Assignment of assignors interest (see document for details). Assignor: HAUSER, ROBERT R
Publication of US20100287152A1
Priority to US13/287,535 (US9940391B2)
Assigned to LIPARI, PAUL and SUBOTI LLC. Assignment of assignors interest (see document for details). Assignor: HAUSER, ROBERT R.
Assigned to APRFSH17, LLC. Assignment of assignors interest (see document for details). Assignors: LIPARI, PAUL; SUBOTI LLC
Assigned to Moat, Inc. Assignment of assignors interest (see document for details). Assignor: APRFSH17, LLC
Assigned to Oracle America, Inc. Assignment of assignors interest (see document for details). Assignor: Moat, Inc.
Current legal status: Abandoned

Abstract

In a web crawler, a URL selection module selects URLs for pages to be downloaded. The URL selection module accesses an interaction data store that stores interaction data for web pages, including interaction data that indicates human interactions with the pages. To reduce the effects of link farms, the URL selection module filters the URLs to select only those URLs that have human interaction histories and provides the selected URLs to a download module for web page downloading.

Claims (20)

13. A web crawler comprising:
at least one Uniform Resource Locator (URL) data store that stores a plurality of URLs;
at least one interaction data store that stores interaction data for a plurality of web pages, the interaction data indicating an interaction between a human and a web page corresponding to a URL;
at least one download module that downloads web page content corresponding to a URL; and
at least one URL selection module in communication with the at least one URL data store and the at least one interaction data store;
wherein the at least one URL selection module selects at least one URL from the at least one URL data store that has interaction data in the at least one interaction data store; and
wherein the at least one URL selection module provides the at least one selected URL to the at least one download module.
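
The arrangement recited in claim 13 can be illustrated with a minimal Python sketch. It is not the applicant's implementation: the class names, the dictionary-based interaction store, the event labels, the example URLs, and the `fake_download` stand-in are all assumptions made for illustration. The sketch only shows the claimed data flow, in which the selection module keeps URLs that have interaction data and hands them to the download module.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, Iterable, List


@dataclass
class InteractionDataStore:
    """Maps a URL to recorded human-interaction events (hypothetical schema)."""
    events: Dict[str, List[str]] = field(default_factory=dict)

    def has_interaction(self, url: str) -> bool:
        # A URL qualifies only if at least one event is recorded for it.
        return bool(self.events.get(url))


@dataclass
class UrlSelectionModule:
    """Selects URLs from the URL data store that have interaction data."""
    url_store: Iterable[str]
    interactions: InteractionDataStore

    def select(self) -> List[str]:
        return [u for u in self.url_store if self.interactions.has_interaction(u)]


def crawl(selector: UrlSelectionModule, download: Callable[[str], str]) -> Dict[str, str]:
    """Pass each selected URL to the download module and collect the content."""
    return {url: download(url) for url in selector.select()}


def fake_download(url: str) -> str:
    # Stand-in for a real HTTP fetch so the sketch runs offline (assumption).
    return f"<html>content of {url}</html>"


# Example data (hypothetical URLs and event labels).
store = InteractionDataStore({"https://example.com/a": ["click", "scroll"]})
selector = UrlSelectionModule(
    url_store=["https://example.com/a", "https://linkfarm.example/x"],
    interactions=store,
)
pages = crawl(selector, fake_download)
print(sorted(pages))  # only the URL with a human interaction history is downloaded
```

Keeping the filter in its own module, separate from the downloader, mirrors the claim's separation between the URL selection module and the download module, so the filtering rule could be changed without touching the fetch logic.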

Application US12/435,774 | Priority date 2009-05-05 | Filing date 2009-05-05 | System, method and computer readable medium for web crawling | Abandoned | US20100287152A1 (en)

Priority Applications (2)

Application Number | Publication | Priority Date | Filing Date | Title
US12/435,774 | US20100287152A1 (en) | 2009-05-05 | 2009-05-05 | System, method and computer readable medium for web crawling
US13/287,535 | US9940391B2 (en) | 2009-05-05 | 2011-11-02 | System, method and computer readable medium for web crawling

Applications Claiming Priority (1)

Application Number | Publication | Priority Date | Filing Date | Title
US12/435,774 | US20100287152A1 (en) | 2009-05-05 | 2009-05-05 | System, method and computer readable medium for web crawling

Related Child Applications (1)

Application Number | Relation | Publication | Priority Date | Filing Date | Title
US13/287,535 | Continuation | US9940391B2 (en) | 2009-05-05 | 2011-11-02 | System, method and computer readable medium for web crawling

Publications (1)

Publication Number | Publication Date
US20100287152A1 (en) | 2010-11-11

Family

Family ID: 43062955

Family Applications (2)

Application Number | Status | Publication | Priority Date | Filing Date | Title
US12/435,774 | Abandoned | US20100287152A1 (en) | 2009-05-05 | 2009-05-05 | System, method and computer readable medium for web crawling
US13/287,535 | Active | US9940391B2 (en) | 2009-05-05 | 2011-11-02 | System, method and computer readable medium for web crawling

Family Applications After (1)

Application Number | Status | Publication | Priority Date | Filing Date | Title
US13/287,535 | Active | US9940391B2 (en) | 2009-05-05 | 2011-11-02 | System, method and computer readable medium for web crawling

Country Status (1)

Country | Link
US (2) | US20100287152A1 (en)

Families Citing this family (5)

* Cited by examiner, † Cited by third party
Publication number | Priority date | Publication date | Assignee | Title
US8171156B2 (en) * | 2008-07-25 | 2012-05-01 | JumpTime, Inc. | Method and system for determining overall content values for content elements in a web network and for optimizing internet traffic flow through the web network
US8832257B2 (en) | 2009-05-05 | 2014-09-09 | Suboti, Llc | System, method and computer readable medium for determining an event generator type
US8862569B2 (en) * | 2012-01-11 | 2014-10-14 | Google Inc. | Method and techniques for determining crawling schedule
US20140280554A1 (en) * | 2013-03-15 | 2014-09-18 | Yahoo! Inc. | Method and system for dynamic discovery and adaptive crawling of content from the internet
US11829423B2 (en) * | 2021-06-25 | 2023-11-28 | Microsoft Technology Licensing, Llc | Determining that a resource is spam based upon a uniform resource locator of the webpage

Citations (7)

* Cited by examiner, † Cited by third party
Publication number | Priority date | Publication date | Assignee | Title
US6848108B1 (en) * | 1998-06-30 | 2005-01-25 | Microsoft Corporation | Method and apparatus for creating, sending, and using self-descriptive objects as messages over a message queuing network
US7051042B2 (en) * | 2003-05-01 | 2006-05-23 | Oracle International Corporation | Techniques for transferring a serialized image of XML data
US20070050338A1 (en) * | 2005-08-29 | 2007-03-01 | Strohm Alan C | Mobile sitemaps
US20070239701A1 (en) * | 2006-03-29 | 2007-10-11 | International Business Machines Corporation | System and method for prioritizing websites during a webcrawling process
US20090106221A1 (en) * | 2007-10-18 | 2009-04-23 | Microsoft Corporation | Ranking and Providing Search Results Based In Part On A Number Of Click-Through Features
US20090276399A1 (en) * | 2008-04-30 | 2009-11-05 | Yahoo! Inc. | Ranking documents through contextual shortcuts
US20090287645A1 (en) * | 2008-05-15 | 2009-11-19 | Yahoo! Inc. | Search results with most clicked next objects

Family Cites Families (35)

* Cited by examiner, † Cited by third party
Publication number | Priority date | Publication date | Assignee | Title
JP3696731B2 (en) | 1998-04-30 | 2005-09-21 | 株式会社日立製作所 | Structured document search method and apparatus, and computer-readable recording medium recording a structured document search program
US7653870B1 (en) | 1998-12-08 | 2010-01-26 | Idearc Media Corp. | System and method of dynamically generating index information
US6981040B1 (en) | 1999-12-28 | 2005-12-27 | Utopy, Inc. | Automatic, personalized online information and product services
US6581072B1 (en) | 2000-05-18 | 2003-06-17 | Rakesh Mathur | Techniques for identifying and accessing information of interest to a user in a network environment without compromising the user's privacy
US7877421B2 (en) | 2001-05-25 | 2011-01-25 | International Business Machines Corporation | Method and system for mapping enterprise data assets to a semantic information model
MXPA04011507A (en) | 2002-05-20 | 2005-09-30 | Tata Infotech Ltd | Document structure identifier.
US7111000B2 (en) | 2003-01-06 | 2006-09-19 | Microsoft Corporation | Retrieval of structured documents
US7289983B2 (en) | 2003-06-19 | 2007-10-30 | International Business Machines Corporation | Personalized indexing and searching for information in a distributed data processing system
CA2545940C (en) | 2003-11-14 | 2016-01-05 | Research In Motion Limited | System and method of retrieving and presenting partial (skipped) document content
US7991786B2 (en) | 2003-11-25 | 2011-08-02 | International Business Machines Corporation | Using intra-document indices to improve XQuery processing over XML streams
US20060004725A1 (en) | 2004-06-08 | 2006-01-05 | Abraido-Fandino Leonor M | Automatic generation of a search engine for a structured document
WO2006011819A1 (en) | 2004-07-30 | 2006-02-02 | Eurekster, Inc. | Adaptive search engine
US7702671B2 (en) | 2005-04-29 | 2010-04-20 | Microsoft Corporation | Systems and methods for discovery of data that needs improving or authored using user search results diagnostics
US7693817B2 (en) | 2005-06-29 | 2010-04-06 | Microsoft Corporation | Sensing, storing, indexing, and retrieving data leveraging measures of user activity, attention, and interest
US20100153836A1 (en) | 2008-12-16 | 2010-06-17 | Rich Media Club, Llc | Content rendering control system and method
US7555480B2 (en) | 2006-07-11 | 2009-06-30 | Microsoft Corporation | Comparatively crawling web page data records relative to a template
US20080046218A1 (en) | 2006-08-16 | 2008-02-21 | Microsoft Corporation | Visual summarization of activity data of a computing session
FR2907934B1 (en) | 2006-10-27 | 2009-02-06 | Inst Nat Rech Inf Automat | COMPUTER TOOL FOR MANAGING DIGITAL DOCUMENTS
US20080228910A1 (en) | 2007-03-12 | 2008-09-18 | International Business Machines Corporation | Method for monitoring user interaction to maximize internet web page real estate
US20080270375A1 (en) * | 2007-04-27 | 2008-10-30 | France Telecom | Local news search engine
US7765236B2 (en) | 2007-08-31 | 2010-07-27 | Microsoft Corporation | Extracting data content items using template matching
US8078624B2 (en) | 2007-12-20 | 2011-12-13 | International Business Machines Corporation | Content searching for portals having secure content
US20120191691A1 (en) | 2008-04-07 | 2012-07-26 | Robert Hansen | Method for assessing and improving search engine value and site layout based on passive sniffing and content modification
US8156120B2 (en) | 2008-10-22 | 2012-04-10 | James Brady | Information retrieval using user-generated metadata
US20100114706A1 (en) | 2008-11-04 | 2010-05-06 | Nokia Corporation | Linked Hierarchical Advertisements
US10699235B2 (en) | 2009-05-05 | 2020-06-30 | Oracle America, Inc. | System, method and computer readable medium for placing advertisements into web pages
US9442621B2 (en) | 2009-05-05 | 2016-09-13 | Suboti, Llc | System, method and computer readable medium for determining user attention area from user interface events
US9336191B2 (en) | 2009-05-05 | 2016-05-10 | Suboti, Llc | System, method and computer readable medium for recording authoring events with web page content
US9507870B2 (en) | 2009-05-05 | 2016-11-29 | Suboti, Llc | System, method and computer readable medium for binding authored content to the events used to generate the content
US8327385B2 (en) | 2009-05-05 | 2012-12-04 | Suboti, Llc | System and method for recording web page events
US8832257B2 (en) | 2009-05-05 | 2014-09-09 | Suboti, Llc | System, method and computer readable medium for determining an event generator type
US9330395B2 (en) | 2009-05-05 | 2016-05-03 | Suboti, Llc | System, method and computer readable medium for determining attention areas of a web page
US10303722B2 (en) | 2009-05-05 | 2019-05-28 | Oracle America, Inc. | System and method for content selection for web page indexing
US20100287152A1 (en) | 2009-05-05 | 2010-11-11 | Paul A. Lipari | System, method and computer readable medium for web crawling
US8751628B2 (en) | 2009-05-05 | 2014-06-10 | Suboti, Llc | System and method for processing user interface events

Cited By (39)

* Cited by examiner, † Cited by third party
Publication number | Priority date | Publication date | Assignee | Title
US11704681B2 (en) | 2009-03-24 | 2023-07-18 | Nielsen Consumer Llc | Neurological profiles for market matching and stimulus presentation
US9940391B2 (en) | 2009-05-05 | 2018-04-10 | Oracle America, Inc. | System, method and computer readable medium for web crawling
US10324984B2 (en) | 2009-05-05 | 2019-06-18 | Oracle America, Inc. | System and method for content selection for web page indexing
US10303722B2 (en) | 2009-05-05 | 2019-05-28 | Oracle America, Inc. | System and method for content selection for web page indexing
US10987015B2 (en) | 2009-08-24 | 2021-04-27 | Nielsen Consumer Llc | Dry electrodes for electroencephalography
US11669858B2 (en) | 2009-10-29 | 2023-06-06 | Nielsen Consumer Llc | Analysis of controlled and automatic attention for introduction of stimulus material
US9560984B2 (en) | 2009-10-29 | 2017-02-07 | The Nielsen Company (Us), Llc | Analysis of controlled and automatic attention for introduction of stimulus material
US11481788B2 (en) | 2009-10-29 | 2022-10-25 | Nielsen Consumer Llc | Generating ratings predictions using neuro-response data
US11170400B2 (en) | 2009-10-29 | 2021-11-09 | Nielsen Consumer Llc | Analysis of controlled and automatic attention for introduction of stimulus material
US10068248B2 (en) | 2009-10-29 | 2018-09-04 | The Nielsen Company (Us), Llc | Analysis of controlled and automatic attention for introduction of stimulus material
US10269036B2 (en) | 2009-10-29 | 2019-04-23 | The Nielsen Company (Us), Llc | Analysis of controlled and automatic attention for introduction of stimulus material
US10248195B2 (en) | 2010-04-19 | 2019-04-02 | The Nielsen Company (Us), Llc. | Short imagery task (SIT) research method
US11200964B2 (en) | 2010-04-19 | 2021-12-14 | Nielsen Consumer Llc | Short imagery task (SIT) research method
US9454646B2 (en) | 2010-04-19 | 2016-09-27 | The Nielsen Company (Us), Llc | Short imagery task (SIT) research method
US20120284332A1 (en) * | 2010-11-03 | 2012-11-08 | Anantha Pradeep | Systems and methods for formatting a presentation in webpage based on neuro-response data
US10089405B2 (en) * | 2011-03-18 | 2018-10-02 | Amazon Technologies, Inc. | Addressable network resource selection management
US20140344290A1 (en) * | 2011-03-18 | 2014-11-20 | Amazon Technologies, Inc. | Addressable network resource selection management
US8799455B1 (en) * | 2011-03-18 | 2014-08-05 | Amazon Technologies, Inc. | Addressable network resource selection management
US9495453B2 (en) | 2011-05-24 | 2016-11-15 | Microsoft Technology Licensing, Llc | Resource download policies based on user browsing statistics
US20130174050A1 (en) * | 2011-12-30 | 2013-07-04 | Nokia Corporation | Method and apparatus for downloading third party content within the same web page context
US10881348B2 (en) | 2012-02-27 | 2021-01-05 | The Nielsen Company (Us), Llc | System and method for gathering and analyzing biometric user feedback for use in social media and advertising applications
US9569986B2 (en) | 2012-02-27 | 2017-02-14 | The Nielsen Company (Us), Llc | System and method for gathering and analyzing biometric user feedback for use in social media and advertising applications
US10901730B2 (en) | 2012-08-16 | 2021-01-26 | International Business Machines Corporation | Identifying equivalent javascript events
US10169037B2 (en) | 2012-08-16 | 2019-01-01 | International Business Machines Corporation | Identifying equivalent JavaScript events
US9280268B2 (en) | 2012-08-16 | 2016-03-08 | International Business Machines Corporation | Identifying equivalent javascript events
US10311362B1 (en) * | 2014-12-12 | 2019-06-04 | Amazon Technologies, Inc. | Identification of trending content using social network activity and user interests
US10565588B2 (en) * | 2015-03-12 | 2020-02-18 | International Business Machines Corporation | Cryptographic methods implementing proofs of work in systems of interconnected nodes
US11290779B2 (en) | 2015-05-19 | 2022-03-29 | Nielsen Consumer Llc | Methods and apparatus to adjust content presented to an individual
US10771844B2 (en) | 2015-05-19 | 2020-09-08 | The Nielsen Company (Us), Llc | Methods and apparatus to adjust content presented to an individual
US9936250B2 (en) | 2015-05-19 | 2018-04-03 | The Nielsen Company (Us), Llc | Methods and apparatus to adjust content presented to an individual
US10079738B1 (en) * | 2015-11-19 | 2018-09-18 | Amazon Technologies, Inc. | Using a network crawler to test objects of a network document
US11303714B2 (en) * | 2016-12-14 | 2022-04-12 | Rewardstyle, Inc. | System and method for application traffic control
US11528335B2 (en) | 2016-12-14 | 2022-12-13 | Rewardstyle, Inc. | System and method for application traffic control
US11785108B2 (en) | 2016-12-14 | 2023-10-10 | Rewardstyle, Inc. | System and method for application traffic control
US11979469B2 (en) | 2016-12-14 | 2024-05-07 | Rewardstyle, Inc. | System and method for application traffic control
US12166833B2 (en) | 2016-12-14 | 2024-12-10 | Rewardstyle, Inc. | System and method for application traffic control
US10726069B2 (en) * | 2017-08-18 | 2020-07-28 | Sap Se | Classification of log entry types
US20190057163A1 (en) * | 2017-08-18 | 2019-02-21 | Sap Se | Classification of log entry types
CN109862018A (en) * | 2019-02-21 | 2019-06-07 | 中国工商银行股份有限公司 | Anti-crawler method and system based on user access activity

Also Published As

Publication number | Publication date
US20120047122A1 (en) | 2012-02-23
US9940391B2 (en) | 2018-04-10

Legal Events

Date | Code | Title | Description

AS | Assignment
Owner name: LIPARI, PAUL A, TEXAS
Free format text: ASSIGNMENT OF ASSIGNORS INTEREST; ASSIGNOR: HAUSER, ROBERT R; REEL/FRAME: 022656/0430
Effective date: 2009-05-01
Owner name: SUBOTI, LLC, TEXAS
Free format text: ASSIGNMENT OF ASSIGNORS INTEREST; ASSIGNOR: HAUSER, ROBERT R; REEL/FRAME: 022656/0430
Effective date: 2009-05-01

AS | Assignment
Owner name: SUBOTI LLC, TEXAS
Free format text: ASSIGNMENT OF ASSIGNORS INTEREST; ASSIGNOR: HAUSER, ROBERT R.; REEL/FRAME: 042120/0547
Effective date: 2017-04-17
Owner name: LIPARI, PAUL, TEXAS
Free format text: ASSIGNMENT OF ASSIGNORS INTEREST; ASSIGNOR: HAUSER, ROBERT R.; REEL/FRAME: 042120/0547
Effective date: 2017-04-17
Owner name: APRFSH17, LLC, NEW YORK
Free format text: ASSIGNMENT OF ASSIGNORS INTEREST; ASSIGNORS: SUBOTI LLC; LIPARI, PAUL; REEL/FRAME: 042120/0590
Effective date: 2017-04-17

AS | Assignment
Owner name: MOAT, INC., NEW YORK
Free format text: ASSIGNMENT OF ASSIGNORS INTEREST; ASSIGNOR: APRFSH17, LLC; REEL/FRAME: 043268/0275
Effective date: 2017-07-17

AS | Assignment
Owner name: ORACLE AMERICA, INC., CALIFORNIA
Free format text: ASSIGNMENT OF ASSIGNORS INTEREST; ASSIGNOR: MOAT, INC.; REEL/FRAME: 043288/0748
Effective date: 2017-07-19

STCB | Information on status: application discontinuation
Free format text: ABANDONED -- AFTER EXAMINER'S ANSWER OR BOARD OF APPEALS DECISION

