1016 Commits (cc2d7af70a5a8adba06071494c1f9d87df7853ca)

Author SHA1 Message Date
yihua.huang 3734865a6a fix package name =.= 11 years ago
yihua.huang e7668e01b8 fix SourceRegion error and add some tests on it #144 11 years ago
yihua.huang 4e5ba02020 fix test cont' 11 years ago
yihua.huang 4446669c24 fix test 11 years ago
yihua.huang 9866297ec4 Disable jsoup entity escape by Default. Set Html.DISABLE_HTML_ENTITY_ESCAPE to false to enable it. #149 11 years ago
yihua.huang 4e6e946dd7 more friendly exception message in PlainText #144 11 years ago
yihua.huang ebb931e0bf update assertj to test scope 11 years ago
yihua.huang af9939622b move thread package out of selector (because it was added by mistake at the beginning) 11 years ago
yihua.huang 2fd8f05fe2 change path separator for different OSes #139 11 years ago
yihua.huang eae37c868b new sample 11 years ago
yihua.huang b3a282e58d some fix for tests #130 11 years ago
yihua.huang b75e64a61b Merge branch 'yxssfxwzy-proxy' 11 years ago
yihua.huang 074d767f45 Merge branch 'proxy' of github.com:yxssfxwzy/webmagic into yxssfxwzy-proxy 11 years ago
zwf 2f89cfc31a add test and fix bug of proxy module 11 years ago
yihua.huang 4efd471840 remove duplicate jar 11 years ago
yihua.huang 435922f00d Merge branch 'stable' of github.com:code4craft/webmagic 11 years ago
yihua.huang eb89d66566 fix test 11 years ago
yihua.huang 2a15bc0289 contributor 11 years ago
yihua.huang 5e8ca02ec6 contributor 11 years ago
yihua.huang baeb919cbe update bin 11 years ago
yihua.huang 8c33be48a6 Merge branch 'stable' of github.com:code4craft/webmagic 11 years ago
yihua.huang db0195babb update version in docs 11 years ago
yihua.huang 5f8c3fd5c5 update version 11 years ago
yihua.huang 0e9042eefa update pom 11 years ago
yihua.huang 03170178c4 update pom 11 years ago
yihua.huang c83b74f0f4 update pom for deploy 11 years ago
yihua.huang 7a64847a3c Bugfix: selector does not work well in element #113 11 years ago
yihua.huang 8d67fd0357 change back return proxy from spider to httpclientdownloader #128 11 years ago
yihua.huang 40bf8ca58f change return proxy from spider to httpclientdownloader #128 11 years ago
yihua.huang 1f21d9cc14 spell mistake fix #128 11 years ago
Yihua Huang e310139d00 Merge pull request #128 from yxssfxwzy/proxy
Management of multiple proxies
11 years ago
yihua.huang b165090434 Bugfix:Type convert error in JsonPathSelector #129 11 years ago
yihua.huang 95bdb30296 update xsoup version to release #113 11 years ago
yihua.huang a5d1b56e44 fix ut #113 11 years ago
yihua.huang 3939074a23 Bugfix: nodes() only return the first element #113 11 years ago
yihua.huang 41c2ea9498 refactor of selectable cont' #113
1. remove lazy init of Html
2. rename strings to sourceTexts for better meaning
3. make getSourceTexts abstract and DO NOT always store strings
4. instead store parsed elements of document in HtmlNode
11 years ago
yihua.huang f9825c214a refactor selectable for html fragment #113 11 years ago
yihua.huang 03d26c169b Enhance auto charset detect #126
1. Only read from content once to fix stream closed exception
2. invite moco as server test
11 years ago
zwf c146e2c7b4 add proxy pool 11 years ago
zwf 07ea04223f change_gitignore 11 years ago
yihua.huang 21982d3460 remove cpdetector temporary #126 11 years ago
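fengwuze fcbfb75608 Refactor the code block that automatically detects the character encoding from the web page, extracting it into a separate method. 11 years ago
fengwuze 95494d3c4d Add logic for handling meta tags.
Outstanding:
3. When the page does not specify an encoding, cpdetector is needed, but cpdetector is currently not available in the Maven Central repository, and it is unclear how to resolve this.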
11 years ago
yihua.huang dde2d89bbe Ignore content in json when bracket when remove padding #124 11 years ago
Yihua Huang 2913da4763 Merge pull request #123 from gsh199449/master
Update JsonFilePipeline.java #122
11 years ago
yihua.huang 928f98dd93 auto create folder in JsonFilePipeline #122 11 years ago
GaoShen 5883ed93d7 Update JsonFilePipeline.java
JsonFilePipeline can automatically create folders that do not yet exist
11 years ago
Yihua Huang 4e65dac249 Merge pull request #121 from ywooer/master
Create a Writer with the specified encoding
11 years ago
ywooer 259f0a16c5 Update FilePipeline.java 11 years ago
ywooer 26d38851b5 add charset to Writer 11 years ago