Commit Graph

1128 Commits (4024230e1762ca24675dbd1a2a0e731249cf0630)
 

Author SHA1 Message Date
Linker Lin b05d605c4b Update pom.xml
In a clean environment, the change above is required.
9 years ago
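zhangheng09 6b179c3d55 This change is made for two reasons: 1) a proxy should be returned to the proxy pool as early as possible, right after the HTTP request has been executed; 2) HTTP proxies are a concern of HttpClientDownloader and should not be handled by Spider, because Spider does not know that its downloader is an HttpClientDownloader 9 years ago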
zhangheng09 5f106c9c69 When page is null, the response status is abnormal and an exception should be thrown; otherwise both the onSuccess and onError methods of SpiderListener are executed 9 years ago
yihua.huang c0b8e8f8ae remove .classpath .project 9 years ago
yihua.huang ce89ebb2b4 remove en_doc zh_docs dir 9 years ago
yihua.huang b9ac76aa8b update readme 9 years ago
yihua.huang a0c74a6a26 readme in zh 9 years ago
yihua.huang cb61820d31 update spiderman link 9 years ago
yihua.huang a8e6de4b90 Merge branch 'master' of git.oschina.net:flashsword20/webmagic 9 years ago
yihua.huang cd08b230b8 update version to 0.5.3 9 years ago
yihua.huang 0fd4623f0a Merge branch 'osc' 9 years ago
yihua.huang ce5495ecd5 remove useless files 9 years ago
yihua.huang 8265c7dade remove submodules for release 9 years ago
yihua.huang 972ccd3d56 update version to 0.5.3-SNAPSHOT 9 years ago
yihua.huang 7edfa26f90 complete javadoc 9 years ago
yihua.huang 8b90b91e33 complete some javadoc 9 years ago
yihua.huang 2b556cf053 update version to 0.5.3-SNAPSHOT 9 years ago
yihua.huang 9c5716a543 complete javadoc 9 years ago
yihua.huang db3cbf6ca5 update version to 0.5.3-SNAPSHOT 9 years ago
yihua.huang 81ce1ffc5f fix ignore 9 years ago
yihua.huang 93764fa2c9 ignore some test 9 years ago
yihua.huang 5706bb90af update xsoup to 0.3.1 9 years ago
yihua.huang 7586e3d75c add some test for github repo downloader 9 years ago
yihua.huang 800f66c4cc Revert "remove some unkown config"
This reverts commit 0e245c9896.
9 years ago
yihua.huang 73ae7a1d52 remove ci for jdk6 9 years ago
yihua.huang 0e245c9896 remove some unkown config 9 years ago
yihua.huang 9ed06ccdf0 update surefire version 9 years ago
Yihua Huang 9d0eeb9000 Merge pull request #218 from bingoko/master
Add support for the PhantomJS headless browser
9 years ago
Yihua Huang 84b046e4c9 Merge pull request #227 from hsqlu/master
update deprecated method
9 years ago
Yihua Huang cfde3b7657 Merge pull request #237 from SpenceZhou/master
Update RedisScheduler.java
9 years ago
SpenceZhou 165e5a72eb Update RedisScheduler.java
Fix the bug in RedisScheduler when getting the total crawl count
9 years ago
Yihua Huang 5f9e1a96f2 Merge pull request #233 from x1ny/master
Fix FileCacheQueueScheduler preventing the program from exiting normally and leaving streams unclosed
9 years ago
Yihua Huang 7d7eb033d3 Merge pull request #234 from chy996633/master
Zhihu crawler
9 years ago
chy996633 afd1617b58 Zhihu crawler 9 years ago
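x1ny 90e14b31b0 Fix FileCacheQueueScheduler preventing the program from exiting normally and leaving streams unclosed
FileCacheQueueScheduler starts a thread that runs periodically to save data, but the thread was not stopped after the crawl finished, so the program could not exit; the IO streams were not closed either.

Solution:
Make FileCacheQueueScheduler implement the Closeable interface and stop the thread and close the streams in its close method.
Add a call that closes the scheduler to Spider's close method.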
9 years ago
Qiannan Lu 155215290f resolve issue #226 10 years ago
Qiannan Lu 21f81bb8c1 update deprecated method 10 years ago
bingoko 5d365f7bf4 update and validate pom.xml
Update selenium and GhostDriver (PhantomJSDriver) to latest version.
10 years ago
bingoko d3bbece202 Add PhantomJS support for selenium
The configuration file is config.ini
The dependencies are updated in pom.xml.
Update SeleniumDownloader and WebDriverPool to support PhantomJS. 
NOTE: The versions of GhostDriver, Selenium, and PhantomJS are stable
and validated.

A GooglePlay Example is under samples package: GooglePlayProcessor.java
10 years ago
yihua.huang 56e0cd513a compile error fix 10 years ago
yihua.huang c5740b1840 change assert #200 10 years ago
yihua.huang 67eb632f4d test for issue #200 10 years ago
Yihua Huang b30ca6ce1e Merge pull request #198 from okuc/master
Fix the bug where site.setHttpProxy() has no effect
10 years ago
高军 590561a6e4 Fix the bug where site.setHttpProxy() has no effect 10 years ago
edwardsbean 19474e4716 add SimpleProxyPool and IProxyPool 10 years ago
Yihua Huang 05a1f39569 Merge pull request #193 from EdwardsBean/fix-mppipeline
Bug fix:MultiPagePipeline and DoubleKeyMap concurrent bug
10 years ago
edwardsbean 74962d69b9 fix bug:MultiPagePipeline and DoubleKeyMap concurrent bug 10 years ago
Yihua Huang 6b9d21fcf3 Merge pull request #188 from EdwardsBean/retry_time
add retry sleep time
10 years ago
edwardsbean 4978665633 add retry sleep time 10 years ago
yihua.huang 862ceee674 groovy demo 10 years ago