yihua.huang
1fbfc92de2
Inherit support of Field annotation in Model #103
11 years ago
Yihua Huang
93c4a2afb7
Merge pull request #102 from ccliangbo/waitNewUrl
combine two try-catch blocks into one to make it cleaner.
11 years ago
Bo LIANG
163773af6b
combine two try-catch blocks into one to make it cleaner.
11 years ago
yihua.huang
c8014a9ae6
update readme
11 years ago
yihua.huang
ec446277b1
some refactor in httpclientdownloader
11 years ago
yihua.huang
4a035e729a
extension point for LocalDuplicatedRemovedScheduler #95
11 years ago
yihua.huang
b249e49748
[Bugfix]loop error when add TargetRequest #99
11 years ago
yihua.huang
3a79b1b64a
[Bugfix]formatter property does not work when field is String #100
11 years ago
Yihua Huang
cc9d319fd9
Merge pull request #94 from sebastian1118/master
update:PatternHandler
11 years ago
Yihua Huang
da2f023c12
Merge pull request #96 from ouyanghuangzheng/master
Modified several comments in Spider and Site
11 years ago
yihua.huang
f7950ebcab
fix tests
11 years ago
yihua.huang
b14f0ee479
fix jsonpath in AngularJSProcessor
11 years ago
愤怒的番茄
32ba1b8889
Fixed several comment issues
11 years ago
yihua.huang
84b897f83b
update AngularJSProcessor
11 years ago
yihua.huang
03c251237b
add Json parse support
11 years ago
Tian
99e12aafaa
update:PatternHandler
11 years ago
愤怒的番茄
53184f0390
test
11 years ago
愤怒的番茄
644e8d1f72
Sync with the official source code
11 years ago
愤怒的番茄
610ac42c07
Update
11 years ago
愤怒的番茄
5b254e446b
Update
11 years ago
yihua.huang
843e928c2c
comments on sinablogprocessor sample
11 years ago
yihua.huang
be37d8b216
sinablogprocessor sample
11 years ago
yihua.huang
094f9d1552
rename assets to fix a spelling mistake
11 years ago
yihua.huang
2b023c95c2
qqmeishi demo
11 years ago
yihua.huang
db65dfafb8
add baidunews sample
11 years ago
yihua.huang
3669e73e4a
update News163: use Xsoup 0.2.0 syntax instead of ComboExtract
11 years ago
yihua.huang
02b441ad38
disable NativeObject in Rhino because it is a HotSpot internal API and causes a compile error in OpenJDK #93
11 years ago
yihua.huang
9f5a6494a0
add support for JDK6 #93
11 years ago
yihua.huang
c6c56ad511
Merge branch 'master' of github.com:code4craft/webmagic
11 years ago
yihua.huang
c2873928c8
[prototype] extractrule
11 years ago
Yihua Huang
7cb4e37812
Merge pull request #93 from friddle/master
update the script
11 years ago
friddle
933800147b
update ruby
11 years ago
friddle
37666a7151
update the script
11 years ago
yihua.huang
c1e7207869
add FileCacheQueueScheduler support for cycleRetryTimes
11 years ago
yihua.huang
969ad1766b
change logger style to slf4j for cleaner code
11 years ago
yihua.huang
9b2cb43f47
ConfigurablePageProcessor #91
11 years ago
Yihua Huang
1090d070d9
Merge pull request #90 from ccliangbo/removeUnusedLines
Remove unused variable to make the project cleaner.
11 years ago
Bo LIANG
159eeea2f5
Remove unused variable to make the project cleaner.
11 years ago
yihua.huang
c143fc662c
add SubPageProcessor #86
11 years ago
Yihua Huang
2b2ce9ce13
Merge pull request #89 from ccliangbo/slf4jFormat
change the formatter of log.
11 years ago
Bo LIANG
b043ac76d6
change the formatter of log.
To use slf4j, we should insert {} into the formatter string.
11 years ago
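The slf4j {} placeholder convention mentioned in the commits above lets the logger skip building the message when the level is disabled, whereas string concatenation is evaluated on every call. A minimal, hypothetical sketch of that idea (PlaceholderLogger, format and formatCount are illustrative names, not slf4j's API or its actual MessageFormatter implementation):

```java
// Hypothetical, simplified stand-in for slf4j-style "{}" formatting.
// It only illustrates why placeholders are cheaper than concatenation
// when the log level is disabled; it is not slf4j's implementation.
final class PlaceholderLogger {
    private final boolean debugEnabled;
    int formatCount = 0; // number of messages actually built

    PlaceholderLogger(boolean debugEnabled) {
        this.debugEnabled = debugEnabled;
    }

    // Replace each "{}" in order with the next argument; leftover "{}" stay as-is.
    static String format(String pattern, Object... args) {
        StringBuilder sb = new StringBuilder();
        int argIndex = 0;
        int start = 0;
        int pos;
        while ((pos = pattern.indexOf("{}", start)) >= 0 && argIndex < args.length) {
            sb.append(pattern, start, pos).append(args[argIndex++]);
            start = pos + 2;
        }
        sb.append(pattern.substring(start));
        return sb.toString();
    }

    // Returns the built message, or null when debug is disabled.
    String debug(String pattern, Object... args) {
        if (!debugEnabled) {
            return null; // level disabled: the message string is never built
        }
        formatCount++;
        String message = format(pattern, args);
        System.out.println("DEBUG " + message);
        return message;
    }
}
```

For example, `new PlaceholderLogger(false).debug("push to queue {}", url)` returns immediately without formatting, while `"push to queue " + url` would be concatenated regardless of the level. The varargs array is still allocated in this sketch; slf4j's Logger also provides one- and two-argument overloads that avoid it.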
Yihua Huang
474f785dab
Merge pull request #86 from sebastian1118/master
new feature: PatternProcessor
11 years ago
yihua.huang
8fe967ba8d
[BugFix]exclude log4j.xml from maven jar plugin #82
11 years ago
Tian
38a12f8641
new feature: PatternProcessor
11 years ago
yihua.huang
dafd0b5875
[BugFix]multi model in one pageprocessor will be skipped #85
11 years ago
yihua.huang
7aaf837e15
change logger to slf4j style for performance #84
11 years ago
yihua.huang
f9b157951d
Merge branch 'master' of github.com:code4craft/webmagic
11 years ago
yihua.huang
22c394e629
[doc]
11 years ago
Yihua Huang
3efa774191
Merge pull request #84 from ccliangbo/logInScheduler
Modify the log levels of LocalDuplicatedRemovedScheduler.java
11 years ago
Bo LIANG
762a3973fd
Modify the log levels of LocalDuplicatedRemovedScheduler.java
The old version printed a debug-level log each time the push method was
called. So when an HTML page has multiple links to the same
page, the debug log appears more than once. Also, when we meet a
duplicate URL, it still prints a log, which is confusing.
I changed that log to trace level, and now a debug-level log is printed only
when a URL is actually pushed into the queue.
11 years ago
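The log-level change described above can be sketched as a small de-duplicating queue. This is a hypothetical illustration, not webmagic's LocalDuplicatedRemovedScheduler: DedupScheduler is an invented name, and the java.util.logging levels FINEST and FINE stand in for slf4j's trace and debug.

```java
import java.util.ArrayDeque;
import java.util.HashSet;
import java.util.Queue;
import java.util.Set;
import java.util.logging.Level;
import java.util.logging.Logger;

// Hypothetical sketch, not webmagic's LocalDuplicatedRemovedScheduler.
// java.util.logging stands in for slf4j: FINEST ~ trace, FINE ~ debug.
final class DedupScheduler {
    private static final Logger LOG = Logger.getLogger(DedupScheduler.class.getName());

    private final Set<String> seenUrls = new HashSet<>();
    private final Queue<String> queue = new ArrayDeque<>();

    /** Returns true if the URL was new and was actually enqueued. */
    boolean push(String url) {
        // Every call, duplicates included, is logged only at the finest ("trace") level.
        LOG.log(Level.FINEST, "push called for {0}", url);
        if (!seenUrls.add(url)) {
            return false; // duplicate URL: nothing enqueued, no debug-level log
        }
        queue.add(url);
        // Only a URL that really enters the queue is logged at "debug" level.
        LOG.log(Level.FINE, "push to queue {0}", url);
        return true;
    }

    String poll() {
        return queue.poll();
    }

    int size() {
        return queue.size();
    }
}
```

With the default java.util.logging configuration (INFO), neither message is printed. Lowering the logger and handler levels to FINE shows one line per new URL, and FINEST additionally shows every push call, so repeated links on one page only appear at the most verbose level.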