Commit Graph

545 Commits (snyk-fix-4eb94e676367a3667c00737cdabdf3de)

Author SHA1 Message Date
yihua.huang 8e4814bdc5 fix path separator 11 years ago
yihua.huang e8d4a9be2b fix remove duplicate error #117 11 years ago
yihua.huang 04ade75606 Merge branch 'stable' of github.com:code4craft/webmagic
Conflicts:
	README.md
	pom.xml
	webmagic-avalon/pom.xml
	webmagic-core/pom.xml
	webmagic-extension/pom.xml
	webmagic-lucene/pom.xml
	webmagic-samples/pom.xml
	webmagic-saxon/pom.xml
	webmagic-scripts/pom.xml
	webmagic-selenium/pom.xml
11 years ago
yihua.huang a08d8cb167 update version 11 years ago
yihua.huang 42a2676e8c update version 11 years ago
yihua.huang c25b32f1ca [maven-release-plugin] prepare for next development iteration 11 years ago
yihua.huang 7ff83bb11a [maven-release-plugin] prepare release WebMagic-0.5.0 11 years ago
yihua.huang 1104122979 more abstraction in scheduler 11 years ago
yihua.huang 2770811a10 update monitor example 11 years ago
yihua.huang 5ecd909ef2 add timeout for wait/notify #111 11 years ago
yihua.huang c7afdb516e remove thread utils #110 11 years ago
yihua.huang 17e95f2a7f comments 11 years ago
yihua.huang 05eb7831b6 refactor and comments #110 11 years ago
yihua.huang 375e64e845 more monitor status 11 years ago
yihua.huang 018061d2cd fix error in thread pool 11 years ago
yihua.huang cdc423f2bf log 11 years ago
yihua.huang c6661899fd new thread pool #110 11 years ago
yihua.huang 179baa7a22 return when page is null 11 years ago
yihua.huang 0336f4cdb4 remove IllegalStateException on download error to reduce error logging 11 years ago
yihua.huang 11ba5beb42 [refactor]move monitor to webmagic-extension #98 11 years ago
yihua.huang d61f65cef8 update mbean to mxbean #98 11 years ago
yihua.huang ad6a273b12 update test url 11 years ago
yihua.huang 30af23d003 split monitor to server and client mode #98 11 years ago
yihua.huang ced79630d3 specify jndi and jmx #98 11 years ago
yihua.huang 95d3802e77 add formdata support for post request #108 11 years ago
yihua.huang f49bb877c8 clean some code #109 11 years ago
yihua.huang e1aaf1dd11 fix mistake of guava Table #109 11 years ago
yihua.huang 8ba2da146c request method #108 and more cookie #109 config 11 years ago
yihua.huang b06aa489fb [BugFix]Only one url from sourceRegion can be extracted #107 11 years ago
Bo LIANG 08fa3b01c1 when download error, throw an exception instead of calling onError and returning peacefully. #105 11 years ago
yihua.huang 27b37e8164 extension point and sample for JMX support #98 11 years ago
yihua.huang a5db6cf292 some monitor and JMX support #98 11 years ago
yihua.huang f39aa435cf add null check #104 11 years ago
yihua.huang 42bbe40a37 [Bugfix]Urls will be lost when calling setScheduler() #104 11 years ago
Bo LIANG 163773af6b combine two try-catch blocks into one to make the code cleaner. 11 years ago
yihua.huang ec446277b1 some refactor in httpclientdownloader 11 years ago
yihua.huang a03f6a8431 eclipse project 11 years ago
yihua.huang 4a035e729a extension point for LocalDuplicatedRemovedScheduler #95 11 years ago
yihua.huang b249e49748 [Bugfix]loop error when adding TargetRequest #99 11 years ago
Yihua Huang da2f023c12 Merge pull request #96 from ouyanghuangzheng/master
Modified several comments in Spider and Site
11 years ago
yihua.huang f7950ebcab fix tests 11 years ago
愤怒的番茄 32ba1b8889 Fix several comment issues 11 years ago
yihua.huang 84b897f83b update AngularJSProcessor 11 years ago
yihua.huang 03c251237b add Json parse support 11 years ago
愤怒的番茄 644e8d1f72 Sync with the official upstream source 11 years ago
yihua.huang 969ad1766b change logger style to slf4j for cleaner code 11 years ago
yihua.huang 9b2cb43f47 ConfigurablePageProcessor #91 11 years ago
Bo LIANG b043ac76d6 change the log message format.
To use slf4j, we should insert {} placeholders into the format string.
11 years ago
yihua.huang 7aaf837e15 change logger to slf4j style for performance #84 11 years ago
yihua.huang f9b157951d Merge branch 'master' of github.com:code4craft/webmagic 11 years ago
yihua.huang 22c394e629 [doc] 11 years ago
Bo LIANG 762a3973fd Modify the log levels of LocalDuplicatedRemovedScheduler.java
The old version prints a debug-level log each time the push method is
called. So when an HTML page has multiple links to the same page, the
debug log appears more than once. A log is also printed when a duplicate
URL is encountered, which is confusing.
I changed the level of that log to trace, and a debug-level log is now
printed only when a URL is actually pushed into the queue.
11 years ago
yihua.huang a1c7e826f7 fix dep of slf4j-log4j12 11 years ago
yihua.huang 01848301d4 encode illegal characters in url #80 11 years ago
yihua.huang 2780423e60 enable blank space in quotes in UrlUtils.fixAllRelativeHrefs #80 11 years ago
yihua.huang 97b6f46280 Bugfix: break loop in addTargetRequests #81 11 years ago
yihua.huang 8d8194bee4 Change HashMap to LinkedHashMap in ResultItems for same order of input and output #76 11 years ago
yihua.huang 8b35d79569 Do not cache document in Selectable for selected Html element #73 11 years ago
yihua.huang 6201fd6966 add worker as container 11 years ago
yihua.huang 6c11718566 Clean project structure #70 11 years ago
yihua.huang 9606a173cd fix ZipCodePageProcessor 11 years ago
yihua.huang 4f68368db0 Merge branch 'master' of git.oschina.net:flashsword20/webmagic
Conflicts:
	webmagic-core/src/main/java/us/codecraft/webmagic/selector/RegexSelector.java
11 years ago
yihua.huang 98e2bba099 Merge branch 'master' of github.com:code4craft/webmagic
Conflicts:
	README.md
	pom.xml
	webmagic-core/pom.xml
	webmagic-extension/pom.xml
	webmagic-scripts/pom.xml
11 years ago
yihua.huang 757cc9b942 [maven-release-plugin] prepare for next development iteration 11 years ago
yihua.huang 63ffb5c792 [maven-release-plugin] prepare release webmaigc-0.4.3 11 years ago
yihua.huang 66d4d3c192 Merge branch 'master' into 0.4.x 11 years ago
yihua.huang af07280176 remove defensive code for httpclient 4.3.1 because the issue is fixed in 4.3.3 #59 11 years ago
yihua.huang d5a978e00f update version back to 0.4.3 11 years ago
yihua.huang 55368919df add attribute 'text' support for CssSelector #66 11 years ago
yihua.huang 88b50d4182 bugfix: cycleTry will not work when spawnUrl is set to false #62 11 years ago
yihua.huang 2768a1cae4 add test for cycleTriedTimes and fix cycleTriedTimes inc error #60 11 years ago
yihua.huang bbd0d7e600 update httpclient version to 4.3.3 #59 11 years ago
yihua.huang 571061454a #58 add CYCLE_TRIED_TIMES support to QueueScheduler and PriorityScheduler 11 years ago
yihua.huang 0e98183f74 Change log4j to slf4j #55 11 years ago
yihua.huang fa33b15843 property loader 11 years ago
yihua.huang af809c4d55 update version to 0.5.0-snapshot 11 years ago
Almark Ming 2b46b11e55 Update RegexSelector.java
Optimize regex format check

Conflicts:
	webmagic-core/src/main/java/us/codecraft/webmagic/selector/RegexSelector.java
11 years ago
yihua.huang 2a8e1b654d Merge branch 'master' of git.oschina.net:flashsword20/webmagic into osc
Conflicts:
	pom.xml
11 years ago
Almark Ming 91ed66ecac Update RegexSelector.java 11 years ago
Almark Ming 83926970b2 Check valid left parenthesis 11 years ago
yihua.huang b51fb2696b update ut for cookie 11 years ago
yihua.huang ff2f588c41 #48 nullpointer exception 11 years ago
yihua.huang fc97cb58c5 update lib and version 11 years ago
yihua.huang 7c41bec92f Merge branch 'master' of github.com:code4craft/webmagic
Conflicts:
	README.md
	webmagic-samples/pom.xml
	webmagic-selenium/pom.xml
11 years ago
yihua.huang d274310cb2 [maven-release-plugin] prepare for next development iteration 11 years ago
yihua.huang e8c32a32dc [maven-release-plugin] prepare release webmagic-0.4.2 11 years ago
yihua.huang 6a828e923c #46 Downloader thread hangs on timeout 11 years ago
shijinping 9a524aa364 read httpClient again inside the double-check 11 years ago
yihua.huang fd23cb6dc0 Merge branch 'master' of github.com:code4craft/webmagic
Conflicts:
	README.md
	pom.xml
	webmagic-samples/pom.xml
	webmagic-selenium/pom.xml
11 years ago
yihua.huang e7083dc39d [maven-release-plugin] prepare for next development iteration 11 years ago
yihua.huang ae623567b3 [maven-release-plugin] prepare release webmagic-0.4.1 11 years ago
yihua.huang 59ad4cad27 #42 Add jsonpath in annotation mode for json result 11 years ago
yihua.huang c2d6d495b3 #41 add getThreadAlive(),getStatus,getPageCount() to spider 11 years ago
yihua.huang cf62d707e0 #36 Spider does not exit when success 11 years ago
yihua.huang a01312930a #39 Parsing html after page.getHtml() 11 years ago
yihua.huang f63d33b457 update some comments 11 years ago
yihua.huang 04fcf3193f #38 Change algorithm of SmartContentSelector 11 years ago
yihua.huang 296a68920e fix javadoc and add setPipelines() for spider 11 years ago
yihua.huang 47a0360783 #35 add status code to page 11 years ago
yihua.huang bc5c30de17 update scripts 11 years ago
yihua.huang f9daae39cf [maven-release-plugin] prepare for next development iteration 11 years ago
yihua.huang fdb9441519 [maven-release-plugin] prepare release webmagic-0.4.0 11 years ago
yihua.huang 1d75ae7f5b roll back version to 0.4.0 because deployment did not succeed 11 years ago
yihua.huang df8ca8ad09 add scripts 11 years ago
yihua.huang e40b48e77b Merge tag 'webmagic-0.4.0' of github.com:code4craft/webmagic
[maven-release-plugin]  copy for tag webmagic-0.4.0

Conflicts:
	pom.xml
	webmagic-core/pom.xml
	webmagic-extension/pom.xml
11 years ago
yihua.huang 775eb9732f [maven-release-plugin] prepare for next development iteration 11 years ago
yihua.huang 0b4fadc24d [maven-release-plugin] prepare release webmagic-0.4.0 11 years ago
yihua.huang fe6d9bb2e2 get keep-alive rework 11 years ago
yihua.huang fd6d2fd6f8 try to keepalive TCP connection 11 years ago
yihua.huang 425df08523 update version to 0.4.0 11 years ago
yihua.huang e046bb0723 remove useless code 11 years ago
yihua.huang 6e32a19f80 update api for direct download 11 years ago
yihua.huang 807aefe9df change EntityUtil to IOUtil because of some encoding errors 11 years ago
yihua.huang 00b0a751b4 #33 ignore 'content-encoding' when redirect 11 years ago
yihua.huang 8f774afc84 add direct download 11 years ago
yihua.huang c18b603399 optimize long compare 11 years ago
yihua.huang ed3f3583cc downloader refactor 11 years ago
yihua.huang a37f40e6e6 add cookie support 11 years ago
yihua.huang 3c6fced48e update connection client 11 years ago
yihua.huang 09153ff715 #22 http proxy support #32 update httpclient to 4.3.1 11 years ago
yihua.huang edfc319c45 update httpclient to 4.3.1 11 years ago
yihua.huang 160a149b05 todo bugfix 11 years ago
yihua.huang 583a0eba8c #29 refactor some method name 11 years ago
yihua.huang 6fa82a418b #29 seed urls with more information 11 years ago
yihua.huang 1446ada732 some refactor 11 years ago
yihua.huang 84976c81ec remove useless code 11 years ago
yihua.huang b4fcf41168 add exit-when-complete option 11 years ago
yihua.huang 352887870c remove shutdown call 11 years ago
yihua.huang a3f9ad198f refactor multi thread code in Spider 11 years ago
yihua.huang 7fb44d2eec #30 reuse PoolingClientConnectionManager for HttpClientDownloader 11 years ago
yihua.huang 5a226387e0 #27 nullpointer fix 11 years ago
yihua.huang 16e12e3bc9 #27 customize http header for downloader 11 years ago
yihua.huang 1a2c84ea78 #27 add timeout config to site 11 years ago
yihua.huang 372cc0ad06 update jar 12 years ago
yihua.huang 4acbc19cee [maven-release-plugin] prepare for next development iteration 12 years ago
yihua.huang cc3b787991 [maven-release-plugin] prepare release webmagic-0.3.2 12 years ago
yihua.huang b131878123 add example 12 years ago
yihua.huang 95ab4edec3 some bugfix 12 years ago
yihua.huang fba330872b fix a thread pool exception 12 years ago
yihua.huang 3c79d031bd fix thread pool 12 years ago
yihua.huang a2fba8caa2 update to 0.3.1 12 years ago
yihua.huang fb693a4ac4 [maven-release-plugin] prepare for next development iteration 12 years ago
yihua.huang bfaaa042b9 [maven-release-plugin] prepare release webmagic-parent-0.3.1 12 years ago
yihua.huang c17a31a21d fix null pointer exception #26 12 years ago
yihua.huang d2e0f0cd33 #25 use URL api in UrlUtils.canonicalizeUrl() 12 years ago
yihua.huang ef4cf49fee add stop method to spider #24 12 years ago
yihua.huang 58150a090d update jar 12 years ago
yihua.huang 57556ab879 merge 12 years ago
yihua.huang 692de76f86 fix issue #21 charset detect error 12 years ago
yihua.huang e7bf425df4 [maven-release-plugin] prepare for next development iteration 12 years ago
yihua.huang 77ff252316 [maven-release-plugin] prepare release webmagic-0.3.0 12 years ago
yihua.huang 1fc8e104ab add cycle retry 12 years ago
yihua.huang d141541ef3 add retry 12 years ago
yihua.huang a1ef2523cc update xsoup version 12 years ago
yihua.huang aefd0569a5 update version 12 years ago
yihua.huang 194518fd82 add switch 12 years ago
yihua.huang 326b97c65a update 12 years ago
yihua.huang 2c3574537a refactor in selectors 12 years ago
yihua.huang 85b7cf1563 complete test 12 years ago
yihua.huang d7cd9e5747 update pom 12 years ago
yihua.huang 55d4a76ab7 new selectors 12 years ago
yihua.huang d7abbd0e4b fix compile error 12 years ago
yihua.huang 5e9e8b2541 add TextContentSelector 12 years ago
yihua.huang 0cc0ccee35 allow specifying a charset for easier use of HttpClientDownloader 12 years ago
yihua.huang 91dcccf7b5 add a sample 12 years ago
yihua.huang ad66d33f38 [maven-release-plugin] prepare for next development iteration 12 years ago
yihua.huang 9dc6b11954 [maven-release-plugin] prepare release webmagic-parent-0.2.1 12 years ago
yihua.huang 4f62dfc8a4 release 12 years ago
yihua.huang 74c940c758 [maven-release-plugin] prepare for next development iteration 12 years ago
yihua.huang a4bb4e3429 [maven-release-plugin] prepare release webmagic-parent-0.2.1 12 years ago
yihua.huang 194f16aa75 update 12 years ago
yihua.huang 0f0f1a9bcd release notes 12 years ago
yihua.huang c1471718df extractors 12 years ago
yihua.huang 20705b34ac add more option to extractors 12 years ago
yihua.huang c70ed57025 move PriorityScheduler to core 12 years ago
yihua.huang 7003426898 update pom 12 years ago
yihua.huang 606417fdc7 update pom 12 years ago
yihua.huang d460e136ef update version 12 years ago
yihua.huang c79d6ecf09 complete all comments 12 years ago
yihua.huang 90bbe9b951 webmagic-core 12 years ago
yihua.huang 17f8ead28f update comments for selector 12 years ago
yihua.huang 77e6ca2945 update comments 12 years ago
yihua.huang 5073258237 closable 12 years ago
yihua.huang d01c0eb8ce update comments of spider 12 years ago
yihua.huang 5f1f4cbc46 update comments 12 years ago
yihua.huang 1148450ff9 make filecache more useful 12 years ago
yihua.huang 3ba7a76f44 add combo extract to replace Extract2 Extract3... 12 years ago
yihua.huang 5cb45af3a4 +doc 12 years ago
yihua.huang ef673b985e add a method for httpclientdownloader 12 years ago
yihua.huang 067f3ea0cb add some null pointer check for httpclientdownloader 12 years ago
yihua.huang 9e82256ce3 update docs 12 years ago
yihua.huang 0a902b441c update docs 12 years ago
yihua.huang 0f2c5b5723 update redisscheduler 12 years ago
yihua.huang 787b952932 release notes and docs 12 years ago
yihua.huang 8b15f3c63d add test 12 years ago
yihua.huang ade5714d50 add https support 12 years ago
yihua.huang 21eca688e9 complete docs 12 years ago
yihua.huang 17d2d98cec remove invalid @date 12 years ago
yihua.huang 268bd8d0c4 move saxon to extension 12 years ago
yihua.huang cff943f698 fix path format error 12 years ago
yihua.huang 5ef231a768 update version 12 years ago
yihua.huang 570533cce5 update readme 12 years ago
yihua.huang 36494bcfa5 add xpath2.0 api 12 years ago
yihua.huang 5c96407a3d fix a null domain error 12 years ago
yihua.huang c7005a0227 json fix 12 years ago
yihua.huang e5f4b3916f change file dir 12 years ago
yihua.huang 7d277e84d4 update lucene pipeline 12 years ago
yihua.huang b40cca1122 move model package to plugin 12 years ago
yihua.huang 4eb3d60083 fix nullpointer exception 12 years ago
yihua.huang b0af45f4bb complete redis support 12 years ago
yihua.huang f3a29d9315 fix pagedmodel bug 12 years ago
yihua.huang 629f8ac2d1 add extractors chain 12 years ago
yihua.huang 27ce3fc176 lazy init 12 years ago
yihua.huang dc9f574e27 update request 12 years ago
yihua.huang d56c681be1 add priority to request 12 years ago
yihua.huang 971e7b6ce2 add core 12 years ago
yihua.huang 619a12b303 add paged support 12 years ago
yihua.huang a5c85c3c8b add annotation ExtractByRaw 12 years ago
yihua.huang 1a50c64e33 update name 12 years ago
yihua.huang a3a868f584 rename 12 years ago
yihua.huang 04a7fa037a update pipeline 12 years ago
yihua.huang 21cae2ff2e update package 12 years ago
yihua.huang cfb8990453 update author 12 years ago
yihua.huang b393e38320 add multi entity extract 12 years ago
yihua.huang bfadac756a fix an attribute bug 12 years ago
yihua.huang 145628557d update afterextract api 12 years ago
yihua.huang aca165b132 add and or selector 12 years ago
yihua.huang 69245e8c03 fix Class.assignable bug 12 years ago
yihua.huang 65518f7672 add list support 12 years ago
yihua.huang d4de60a562 skip test 12 years ago
yihua.huang d26cd82d59 rename package 12 years ago
yihua.huang f84b53514f complete objectpipeline 12 years ago
yihua.huang 866ab0a056 update email 12 years ago
yihua.huang 7c9e9ce869 xpath2.0 12 years ago
yihua.huang 7f27c28d4c simplify api 12 years ago
yihua.huang d7899e94ae test saxon and introduce XPath2.0 support 12 years ago
yihua.huang 3fe3d8f044 update 12 years ago
yihua.huang 516ff3310d add failfast 12 years ago
yihua.huang 7a4dbb1f15 introduce notnull 12 years ago
yihua.huang 06a39af0f3 add setter support 12 years ago
yihua.huang abba3b7bff add extract by url 12 years ago
yihua.huang f08ffc34fd rename 12 years ago
yihua.huang c5cf05640a processor 12 years ago
yihua.huang 50edd22ef6 add annotation 12 years ago
yihua.huang 7020b8648d fix a thread problem 12 years ago
yihua.huang 52fd5cfc1c fix encoding 12 years ago
yihua.huang e87aabf8fd Add a new method to downloader for setting the thread count 12 years ago
yihua.huang 18fefa0c0a fix a spider init problem 12 years ago
yihua.huang 54904851ea add list output support 12 years ago
yihua.huang 42508af041 add huaban processor 12 years ago
yihua.huang fe224cbf66 release resource 12 years ago
yihua.huang 86a20eabd9 fix a httpclient pool size bug 12 years ago
yihua.huang fed3c0c98a update readme 12 years ago
yihua.huang d3e527fd6b try introducing selenium 12 years ago
yihua.huang c2142f872b add iteye sample 12 years ago
yihua.huang 65dc372152 update pipeline api 12 years ago
yihua.huang cea866520d update version 12 years ago
yihua.huang de006333c8 update java docs 12 years ago
yihua.huang 827972d80f update java docs 12 years ago
yihua.huang 96454fd74c update java doc 12 years ago
yihua.huang 81e7f7982e introduce jsoup and cssselector 12 years ago
yihua.huang c733046045 +sina blog 12 years ago
yihua.huang 2b34dc9d3f add retry 12 years ago
yihua.huang 5c79550fd9 add offline cache and process 12 years ago
yihua.huang a7316a1f57 add runasync 12 years ago
yihua.huang cad2594a08 add multithread support 12 years ago
yihua.huang 5a6a68a318 add gzip support 12 years ago
yihua.huang adeed3bcaf add extra 12 years ago
yihua.huang a0bcfb8567 add extra for page 12 years ago
yihua.huang 7e17c71c3e add page skip 12 years ago
yihua.huang 9b1ba6e8bc ignore unstable test 12 years ago
yihua.huang 5cfdb10f81 update api to support jdk 1.6 12 years ago
yihua.huang e1e25cb5e7 update javadoc 12 years ago
yihua.huang b1f023ead5 fix spelling error =.= 12 years ago
yihua.huang 7bed01c9f2 update Spider api 12 years ago
yihua.huang 986ae0beaf update Select api: remove x() s() etc. 12 years ago
yihua.huang 586d23ef63 add package infos 12 years ago
yihua.huang 956d5cb3c8 docs 12 years ago
yihua.huang fb0797b65c update docs 12 years ago
yihua.huang 8f954c7997 fix samples 12 years ago
yihua.huang 312e1bce87 fix compile error 12 years ago
yihua.huang 49a4ad66d3 add uuid to spider 12 years ago
yihua.huang 6428e20543 add id 12 years ago
yihua.huang 0ae7adf324 add cookie support & add docs 12 years ago
yihua.huang 8cef8774cb change author info 12 years ago
yihua.huang 328f174d11 fix pom 12 years ago
yihua.huang f0fa1dad07 clean some code 12 years ago
yihua.huang 01f49aad3c fix a pom error 12 years ago
yihua.huang 1c1bf89522 Merge branch 'master' of github.com:code4craft/webmagic 12 years ago
yihua.huang 8774cce7da files 12 years ago
黄亿华 906e68cbfa update comment 12 years ago
yihua.huang ecb61d1385 update pipeline 12 years ago
yihua.huang 755b9aa84e remove samples in test 12 years ago
yihua.huang 9d04fe3a76 split modules 12 years ago
yihua.huang 6dc88fa111 split modules 12 years ago