You cannot select more than 25 topics Topics must start with a letter or number, can include dashes ('-') and can be up to 35 characters long.
 
 
Go to file
fxsjy 51765aa6dd first commit 13 years ago
jieba first commit 13 years ago
.gitattributes first commit 13 years ago
.gitignore first commit 13 years ago
README.md doc 13 years ago
test.py first commit 13 years ago
test2.py first commit 13 years ago

README.md

jieba

结巴中文分词:像结巴说话一样把所有词都分出来

Usage

  • 将jieba目录放置于当前目录或者site-packages目录

代码示例

#encoding=utf-8
import jieba

seg_list = jieba.cut("我爱北京天安门")
print "/ ".join(seg_list)

Output: * 我/ 爱/ 北/ 北京/ 天/ 天安/ 天安门/ 门

Performance

  • 1.5 MB / Second
  • Test Env: Intel(R) Core(TM) i7-2600 CPU @ 3.4GHz;《围城》.txt

Example