DotnetSpider, a .NET Standard web crawling library. It is a lightweight, efficient, and fast high-level web crawling & scraping framework.
dotnetcore/DotnetSpider
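Disclaimer: This framework is intended to help developers simplify development and work more efficiently. Do not use it for anything that violates national laws. The authors of this framework are not responsible for anything its users do with it.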
To get the latest beta packages, add the MyGet feed to your NuGet package sources:
<add key="myget.org" value="https://www.myget.org/F/zlzforever/api/v3/index.json" protocolVersion="3" />
Development environment:
Visual Studio 2017 (15.3 or later) or JetBrains Rider
Docker
MySQL
docker run --name mysql -d -p 3306:3306 --restart always -e MYSQL_ROOT_PASSWORD=1qazZAQ! mysql:5.7
Redis (optional)
docker run --name redis -d -p 6379:6379 --restart always redis
SQL Server
docker run --name sqlserver -d -p 1433:1433 --restart always -e 'ACCEPT_EULA=Y' -e 'SA_PASSWORD=1qazZAQ!' mcr.microsoft.com/mssql/server:2017-latest
PostgreSQL (optional)
docker run --name postgres -d -p 5432:5432 --restart always -e POSTGRES_PASSWORD=1qazZAQ! postgres
MongoDB (optional)
docker run --name mongo -d -p 27017:27017 --restart always mongo
RabbitMQ
docker run -d --restart always --name rabbitmq -p 4369:4369 -p 5671-5672:5671-5672 -p 25672:25672 -p 15671-15672:15671-15672 \
    -e RABBITMQ_DEFAULT_USER=user -e RABBITMQ_DEFAULT_PASS=password \
    rabbitmq:3-management
Docker remote API for macOS
docker run -d --restart always --name socat -v /var/run/docker.sock:/var/run/docker.sock -p 2376:2375 bobrik/socat TCP4-LISTEN:2375,fork,reuseaddr UNIX-CONNECT:/var/run/docker.sock
HBase
docker run -d --restart always --name hbase -p 20550:8080 -p 8085:8085 -p 9090:9090 -p 9095:9095 -p 16010:16010 dajobe/hbase
More documentation: https://github.com/dotnetcore/DotnetSpider/wiki
For samples, see the DotnetSpider.Sample project in the solution.
// "博客园爬虫" = "Cnblogs spider"
[DisplayName("博客园爬虫")]
public class EntitySpider(IOptions<SpiderOptions> options, DependenceServices services, ILogger<Spider> logger)
    : Spider(options, services, logger)
{
    public static async Task RunAsync()
    {
        var builder = Builder.CreateDefaultBuilder<EntitySpider>(options =>
        {
            options.Speed = 1;
        });
        builder.UseSerilog();
        builder.IgnoreServerCertificateError();
        await builder.Build().RunAsync();
    }

    protected override async Task InitializeAsync(CancellationToken stoppingToken = default)
    {
        AddDataFlow<DataParser<CnblogsEntry>>();
        AddDataFlow(GetDefaultStorage);
        // "网站" = "website", "博客园" = "Cnblogs"; WebSite below reads this value via SelectorType.Environment
        await AddRequestsAsync(new Request("https://news.cnblogs.com/n/page/1",
            new Dictionary<string, object> { { "网站", "博客园" } }));
    }

    [Schema("cnblogs", "news")]
    [EntitySelector(Expression = ".//div[@class='news_block']", Type = SelectorType.XPath)]
    // "类别" = "category"
    [GlobalValueSelector(Expression = ".//a[@class='current']", Name = "类别", Type = SelectorType.XPath)]
    [GlobalValueSelector(Expression = "//title", Name = "Title", Type = SelectorType.XPath)]
    [FollowRequestSelector(Expressions = ["//div[@class='pager']"])]
    public class CnblogsEntry : EntityBase<CnblogsEntry>
    {
        protected override void Configure()
        {
            HasIndex(x => x.Title);
            HasIndex(x => new { x.WebSite, x.Guid }, true);
        }

        public int Id { get; set; }

        [Required]
        [StringLength(200)]
        [ValueSelector(Expression = "类别", Type = SelectorType.Environment)]
        public string Category { get; set; }

        [Required]
        [StringLength(200)]
        [ValueSelector(Expression = "网站", Type = SelectorType.Environment)]
        public string WebSite { get; set; }

        [StringLength(200)]
        [ValueSelector(Expression = "Title", Type = SelectorType.Environment)]
        // replaces " - 博客园" (" - Cnblogs") in the title with an empty string
        [ReplaceFormatter(NewValue = "", OldValue = " - 博客园")]
        public string Title { get; set; }

        [StringLength(40)]
        [ValueSelector(Expression = "GUID", Type = SelectorType.Environment)]
        public string Guid { get; set; }

        [ValueSelector(Expression = ".//h2[@class='news_entry']/a")]
        public string News { get; set; }

        [ValueSelector(Expression = ".//h2[@class='news_entry']/a/@href")]
        public string Url { get; set; }

        [ValueSelector(Expression = ".//div[@class='entry_summary']")]
        [TrimFormatter]
        public string PlainText { get; set; }

        [ValueSelector(Expression = "DATETIME", Type = SelectorType.Environment)]
        public DateTime CreationTime { get; set; }
    }
}
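The selector attributes on CnblogsEntry are plain XPath. As a rough illustration of what they pick out, the sketch below evaluates the same expressions with System.Xml against a small, hypothetical, well-formed page (not the real news.cnblogs.com markup). It does not use DotnetSpider and is not how its DataParser works internally. Real-world HTML is usually not valid XML, which is presumably why an HTML parser such as HtmlAgilityPack appears in the dependency list. In this sketch the entity XPath yields one node per entry, the value XPaths beginning with "." are evaluated relative to that node, //title is read once per page, and string.Replace and string.Trim stand in for ReplaceFormatter and TrimFormatter (an assumption about what those formatters do).

```csharp
using System;
using System.Collections.Generic;
using System.Xml;

// Hypothetical, simplified markup shaped to match the XPath expressions used by CnblogsEntry.
// It is NOT the real news.cnblogs.com page.
const string Page = @"<html><head><title>News - 博客园</title></head><body>
  <div class='news_block'>
    <h2 class='news_entry'><a href='/n/1/'>First item</a></h2>
    <div class='entry_summary'>  Summary one  </div>
  </div>
  <div class='news_block'>
    <h2 class='news_entry'><a href='/n/2/'>Second item</a></h2>
    <div class='entry_summary'>  Summary two  </div>
  </div>
</body></html>";

var doc = new XmlDocument();
doc.LoadXml(Page); // works only because this sample is well-formed XML

// Page-level ("global") value: evaluated once against the whole document.
// string.Replace is assumed to mimic ReplaceFormatter(NewValue = "", OldValue = " - 博客园").
string title = doc.SelectSingleNode("//title")!.InnerText.Replace(" - 博客园", "");

var entries = new List<(string News, string Url, string Summary)>();

// Entity selector: every matching node becomes one entity.
foreach (XmlNode block in doc.SelectNodes(".//div[@class='news_block']")!)
{
    // Value selectors starting with "." are evaluated relative to the entity node.
    string news = block.SelectSingleNode(".//h2[@class='news_entry']/a")!.InnerText;
    string url = block.SelectSingleNode(".//h2[@class='news_entry']/a/@href")!.Value!;
    // string.Trim is assumed to mimic TrimFormatter.
    string summary = block.SelectSingleNode(".//div[@class='entry_summary']")!.InnerText.Trim();
    entries.Add((news, url, summary));
}

foreach (var e in entries)
    Console.WriteLine($"{title} | {e.News} | {e.Url} | {e.Summary}");
```

Run it as a top-level-statements console app (.NET 6 or later); it prints one line per extracted entity.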
Coming soon
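When using Redis, set the following in redis.conf (timeout 0 disables closing idle client connections; tcp-keepalive 60 sends TCP keepalives to clients every 60 seconds):
timeout 0
tcp-keepalive 60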
Dependencies:
Package | License |
---|---|
Bert.RateLimiters | Apache 2.0 |
MessagePack | MIT |
Newtonsoft.Json | MIT |
Dapper | Apache 2.0 |
HtmlAgilityPack | MIT |
ZCJ.HashedWheelTimer | MIT |
murmurhash | Apache 2.0 |
Serilog.AspNetCore | Apache 2.0 |
Serilog.Sinks.Console | Apache 2.0 |
Serilog.Sinks.RollingFile | Apache 2.0 |
Serilog.Sinks.PeriodicBatching | Apache 2.0 |
MongoDB.Driver | Apache 2.0 |
MySqlConnector | MIT |
AutoMapper.Extensions.Microsoft.DependencyInjection | MIT |
Docker.DotNet | MIT |
BuildBundlerMinifier | Apache 2.0 |
Pomelo.EntityFrameworkCore.MySql | MIT |
Quartz.AspNetCore | Apache 2.0 |
Quartz.AspNetCore.MySqlConnector | Apache 2.0 |
Npgsql | PostgreSQL License |
RabbitMQ.Client | Apache 2.0 |
Polly | BSD 3-Clause |
QQ Group: 477731655
Email: zlzforever@163.com