Commit Graph

45 Commits (237dc6625e5c65d7a2714ffdfa5238dba5cae7d4)

Author SHA1 Message
Sun Junyi cb1b0499f7 unittest for jieba.tokenize
Sun Junyi 11a3b10755 new method: jieba.tokenize
Sun Junyi c0816b9bb0 more mixed words
Sun Junyi c9e8da9e63 add more mix words to dict.txt
fxsjy 0087a4e7e3 adjust prob_trans for better support of named entities; fix some bad cases
Sun Junyi 4300f79788 add an example of using sklearn+jieba
Sun Junyi a8f902545c fix some bad cases
cloudaice 9ee20a5293 add generator test
cloudaice 0c050b5eb2 add jieba.posseg test case
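cloudaice b0f9e6721e add cutall test case
cloudaice a7ff398edc add three test cases: cut, set_dictionary, cut_for_search
cloudaice 667203a9ae replace tabs with spaces; use join instead of a loop
cloudaice a2d2078465 replace tabs with spaces; use `is` to check whether an object is None
cloudaice e0434871eb reformat the code in demo.py to conform to PEP 8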
Sun Junyi c1bf815343 update test case
Sun Junyi 94d455b079 hot fix of cut_all=True
Sun Junyi 59d5d3b811 fix bug and change version
fxsjy 8666428fb0 fix a bug of changing dictionary
fxsjy 9bebe6120b utf-8 output is more friendly to Linux
Sun Junyi d3339633d5 in the speed test: initialize first to ignore the time of dict loading
fxsjy bc049090a5 make lazy load thread safe
fxsjy b46166f768 use CRLF as separator to make chunks in parallel mode
fxsjy 6b83593b5a rm stub.log
fxsjy 62cf22121f new feature: parallel segment with multiprocessing
Sun Junyi 8d89e8afda handle 的
fxsjy 45591bb9ab support flag '_'; ignore white space
Sun Junyi 94ad7e7035 support decimal point
Sun Junyi a383f035ba support decimal point: example PI=3.141569 => PI / = / 3.14159
Sun Junyi 8e49199993 keep punctuation marks
Sun Junyi 58c363655c support user defined word tag
Sun Junyi 6cc0e95759 rm 1.log
Sun Junyi d2634a049b fix a bug in pypy
Sun Junyi 06ebc6f71c en-chn mix words in POS
Sun Junyi a8ae0398b4 add one example
Sun Junyi 6517119110 remove 1.log
Sun Junyi 8c05efed68 remove tlbb.txt
Sun Junyi 379cd4933a support en-chn mixed words, like B超
Sun Junyi e0bd9a6a50 version change; doc update
Sun Junyi 176c49d15c remove some files
Sun Junyi 59c3efeb2f improve speed of tagging
fxsjy 1a2a64a13f one more example of POS tagging
fxsjy 90cd4b3014 improve POS tagging
Sun Junyi 15a5a2d50e add a sample script about tags extraction
fxsjy 64b3c0d0e0 add one more example
fxsjy d2bee13d9d add setup.py