Commit Graph

291 Commits (24e26c56449b45234ae69c370b159027c87d581b)

Author SHA1 Message Date
yihua.huang 01848301d4 encode illegal characters in url #80 11 years ago
yihua.huang 2780423e60 enable blank space in quotes in UrlUtils.fixAllRelativeHrefs #80 11 years ago
yihua.huang 97b6f46280 Bugfix: break loop in addTargetRequests #81 11 years ago
yihua.huang 8d8194bee4 Change HashMap to LinkedHashMap in ResultItems for same order of input and output #76 11 years ago
yihua.huang 8b35d79569 Do not cache document in Selectable for selected Html element #73 11 years ago
yihua.huang 6201fd6966 add worker as container 11 years ago
yihua.huang 6c11718566 Clean project structure #70 11 years ago
yihua.huang 9606a173cd fix ZipCodePageProcessor 11 years ago
yihua.huang 4f68368db0 Merge branch 'master' of git.oschina.net:flashsword20/webmagic
Conflicts:
	webmagic-core/src/main/java/us/codecraft/webmagic/selector/RegexSelector.java
11 years ago
yihua.huang 98e2bba099 Merge branch 'master' of github.com:code4craft/webmagic
Conflicts:
	README.md
	pom.xml
	webmagic-core/pom.xml
	webmagic-extension/pom.xml
	webmagic-scripts/pom.xml
11 years ago
yihua.huang 757cc9b942 [maven-release-plugin] prepare for next development iteration 11 years ago
yihua.huang 63ffb5c792 [maven-release-plugin] prepare release webmaigc-0.4.3 11 years ago
yihua.huang 66d4d3c192 Merge branch 'master' into 0.4.x 11 years ago
yihua.huang af07280176 remove defend code for httpclient 4.3.1 because it is fixed in 4.3.3 #59 11 years ago
yihua.huang d5a978e00f update version back to 0.4.3 11 years ago
yihua.huang 55368919df add attribute 'text' support for CssSelector #66 11 years ago
yihua.huang 88b50d4182 bugfix: cycleTry will not work when spawnUrl is set to false #62 11 years ago
yihua.huang 2768a1cae4 add test for cycleTriedTimes and fix cycleTriedTimes inc error #60 11 years ago
yihua.huang bbd0d7e600 update httpclient version to 4.3.3 #59 11 years ago
yihua.huang 571061454a #58 add CYCLE_TRIED_TIMES support to QueueScheduler and PriorityScheduler 11 years ago
yihua.huang 0e98183f74 Change log4j to slf4j #55 11 years ago
yihua.huang fa33b15843 property loader 11 years ago
yihua.huang af809c4d55 update version to 0.5.0-snapshot 11 years ago
Almark Ming 2b46b11e55 Update RegexSelector.java
Optimize regex format check

Conflicts:
	webmagic-core/src/main/java/us/codecraft/webmagic/selector/RegexSelector.java
11 years ago
Almark Ming 91ed66ecac Update RegexSelector.java 11 years ago
Almark Ming 83926970b2 Check valid left parenthesis 11 years ago
yihua.huang b51fb2696b update ut for cookie 11 years ago
yihua.huang ff2f588c41 #48 nullpointer exception 11 years ago
yihua.huang fc97cb58c5 update lib and version 11 years ago
yihua.huang 7c41bec92f Merge branch 'master' of github.com:code4craft/webmagic
Conflicts:
	README.md
	webmagic-samples/pom.xml
	webmagic-selenium/pom.xml
11 years ago
yihua.huang d274310cb2 [maven-release-plugin] prepare for next development iteration 11 years ago
yihua.huang e8c32a32dc [maven-release-plugin] prepare release webmagic-0.4.2 11 years ago
yihua.huang 6a828e923c #46 Downloader thread hang up when timeout 11 years ago
shijinping 9a524aa364 fetch the httpClient content once more inside the double-check 11 years ago
yihua.huang fd23cb6dc0 Merge branch 'master' of github.com:code4craft/webmagic
Conflicts:
	README.md
	pom.xml
	webmagic-samples/pom.xml
	webmagic-selenium/pom.xml
11 years ago
yihua.huang e7083dc39d [maven-release-plugin] prepare for next development iteration 11 years ago
yihua.huang ae623567b3 [maven-release-plugin] prepare release webmagic-0.4.1 11 years ago
yihua.huang 59ad4cad27 #42 Add jsonpath in annotation mode for json result 11 years ago
yihua.huang c2d6d495b3 #41 add getThreadAlive(),getStatus,getPageCount() to spider 11 years ago
yihua.huang cf62d707e0 #36 Spider does not exit when success 11 years ago
yihua.huang a01312930a #39 Parsing html after page.getHtml() 11 years ago
yihua.huang f63d33b457 update some comments 11 years ago
yihua.huang 04fcf3193f #38 Change algorithm of SmartContentSelector 11 years ago
yihua.huang 296a68920e fix javadoc and add setPipelines() for spider 11 years ago
yihua.huang 47a0360783 #35 add status code to page 11 years ago
yihua.huang bc5c30de17 update scripts 11 years ago
yihua.huang f9daae39cf [maven-release-plugin] prepare for next development iteration 11 years ago
yihua.huang fdb9441519 [maven-release-plugin] prepare release webmagic-0.4.0 11 years ago
yihua.huang 1d75ae7f5b rollback version to 0.4.0 because not deploy success 11 years ago
yihua.huang df8ca8ad09 add scripts 11 years ago
yihua.huang e40b48e77b Merge tag 'webmagic-0.4.0' of github.com:code4craft/webmagic
[maven-release-plugin]  copy for tag webmagic-0.4.0

Conflicts:
	pom.xml
	webmagic-core/pom.xml
	webmagic-extension/pom.xml
11 years ago
yihua.huang 775eb9732f [maven-release-plugin] prepare for next development iteration 11 years ago
yihua.huang 0b4fadc24d [maven-release-plugin] prepare release webmagic-0.4.0 11 years ago
yihua.huang fe6d9bb2e2 get keep-alive rework 11 years ago
yihua.huang fd6d2fd6f8 try to keepalive TCP connection 11 years ago
yihua.huang 425df08523 update version to 0.4.0 11 years ago
yihua.huang e046bb0723 remove useless code 11 years ago
yihua.huang 6e32a19f80 update api for direct download 11 years ago
yihua.huang 807aefe9df change EntityUtil to IOUtil because some encoding error 11 years ago
yihua.huang 00b0a751b4 #33 ignore 'content-encoding' when redirect 11 years ago
yihua.huang 8f774afc84 add direct download 11 years ago
yihua.huang c18b603399 optimize long compare 11 years ago
yihua.huang ed3f3583cc downloader refactor 11 years ago
yihua.huang a37f40e6e6 add cookie support 11 years ago
yihua.huang 3c6fced48e update connection client 11 years ago
yihua.huang 09153ff715 #22 http proxy support #32 update httpclient to 4.3.1 11 years ago
yihua.huang edfc319c45 update httpclient to 4.3.1 11 years ago
yihua.huang 160a149b05 todo bugfix 11 years ago
yihua.huang 583a0eba8c #29 refactor some method name 11 years ago
yihua.huang 6fa82a418b #29 seed urls with more information 11 years ago
yihua.huang 1446ada732 some refactor 11 years ago
yihua.huang 84976c81ec remove useless code 11 years ago
yihua.huang b4fcf41168 add exit when complete option 11 years ago
yihua.huang 352887870c remove shutdown call 11 years ago
yihua.huang a3f9ad198f refactor multi thread code in Spider 11 years ago
yihua.huang 7fb44d2eec #30 reuse PoolingClientConnectionManager for HttpClientDownloader 11 years ago
yihua.huang 5a226387e0 #27 nullpointer fix 11 years ago
yihua.huang 16e12e3bc9 #27 customize http header for downloader 11 years ago
yihua.huang 1a2c84ea78 #27 add timeout config to site 11 years ago
yihua.huang 372cc0ad06 update jar 12 years ago
yihua.huang 4acbc19cee [maven-release-plugin] prepare for next development iteration 12 years ago
yihua.huang cc3b787991 [maven-release-plugin] prepare release webmagic-0.3.2 12 years ago
yihua.huang b131878123 add example 12 years ago
yihua.huang 95ab4edec3 some bugfix 12 years ago
yihua.huang fba330872b fix a thread pool exception 12 years ago
yihua.huang 3c79d031bd fix thread pool 12 years ago
yihua.huang a2fba8caa2 update to 0.3.1 12 years ago
yihua.huang fb693a4ac4 [maven-release-plugin] prepare for next development iteration 12 years ago
yihua.huang bfaaa042b9 [maven-release-plugin] prepare release webmagic-parent-0.3.1 12 years ago
yihua.huang c17a31a21d fix null pointer exception #26 12 years ago
yihua.huang d2e0f0cd33 #25 use URL api in UrlUtils.canonicalizeUrl() 12 years ago
yihua.huang ef4cf49fee add stop method to spider #24 12 years ago
yihua.huang 58150a090d update jar 12 years ago
yihua.huang 57556ab879 merge 12 years ago
yihua.huang 692de76f86 fix issue #21 charset detect error 12 years ago
yihua.huang e7bf425df4 [maven-release-plugin] prepare for next development iteration 12 years ago
yihua.huang 77ff252316 [maven-release-plugin] prepare release webmagic-0.3.0 12 years ago
yihua.huang 1fc8e104ab add cycle retry 12 years ago
yihua.huang d141541ef3 add retry 12 years ago
yihua.huang a1ef2523cc update xsoup version 12 years ago