Commit Graph

96 Commits (4d7b515801bd6cd509242a717321f1b0644960d6)

Author SHA1 Message Date
fxsjy 5bfa43a781 fix test scripts 10 years ago
fxsjy 8cbb26a7b6 fix test_file.py 10 years ago
Dingyuan Wang 22bcf8be7a Merge master and jieba3k, make the code Python 2/3 compatible 10 years ago
Dingyuan Wang 3dad899ec8 backport 2to3 scripts and changelog 10 years ago
Dingyuan Wang c6b386f65b update jieba3k 10 years ago
Dingyuan Wang a5ecf70f71 update to v0.35 10 years ago
Dingyuan Wang 4a6140081e fix problems in auto2to3 10 years ago
Dingyuan Wang 7a6caa0c3c port extract_tags, etc to jieba3k; add auto2to3 script 10 years ago
walkskyer 6772f0282e fix incorrect call order in the output of the weighted test script 11 years ago
Dingyuan Wang fd9f1f2c0e update README, textrank, etc. 11 years ago
fxsjy f5ca87e088 merge change of @fukuball 11 years ago
Dingyuan Wang bb1e6000c6 fix version; fix spaces at end of line 11 years ago
Dingyuan Wang 51df77831b use prefix dict instead of trie, add a command line interface, and a few small improvements 11 years ago
Dingyuan Wang 6fad5fbb2c update to v0.33 11 years ago
Fukuball Lin b658ee69cb Allow jieba to add its own stop words corpus
1. Add an example stop words corpus
2. To let jieba switch stop words corpora, add the set_stop_words method and rewrite extract_tags
3. Add the extract_tags_stop_words.py test example to test
11 years ago
Fukuball Lin 7198d562f1 Allow jieba to switch idf corpora
1. Add a Traditional Chinese idf corpus
2. To let jieba switch idf corpora, add the get_idf and set_idf_path methods and rewrite extract_tags
3. Add extract_tags_idfpath to test
11 years ago
Dingyuan Wang c04ccd0d12 Update to v0.32 according to the master branch. 11 years ago
fxsjy 18678d50c6 fix bug issue #132 11 years ago
gan 31d5845535 add better support for english. like input: 'this is interesting and interested me'-->output:'this interest interest',which 'interest' match 'interesting interested' 12 years ago
Sun Junyi 7e7fcc1184 add an option to disable HMM 12 years ago
ZoeyYoung d49542c06e fix bug 12 years ago
ZoeyYoung dce353f88b merge from master 12 years ago
ZoeyYoung 2857ae45cc Merge branch 'master' into jieba3k
Conflicts:
	Changelog
	jieba/__init__.py
	jieba/finalseg/__init__.py
	jieba/posseg/__init__.py
	setup.py
	test/parallel/test_file.py
	test/test_file.py
12 years ago
Sun Junyi 81390a2d23 test_file.py: close the file object 12 years ago
fxsjy b77645b3aa modify test_file.py; use less memory 12 years ago
Linker Lin 5d83855088 Automatically detect the number of CPUs and start a suitable number of processes. 12 years ago
Linker Lin 2ceb981da0 Automatically detect the number of CPUs and start a suitable number of processes. 12 years ago
Sun Junyi 6549deabbd merge change from master 12 years ago
Cheng wei 6035bb6320 fix invalid syntax for python3 12 years ago
Sun Junyi 9d0ea771a5 fix bug; decimals & digit-english mixed 12 years ago
Sun Junyi ba5114dc95 update whoosh example 12 years ago
Sun Junyi f424862222 clean the files in tmp 12 years ago
Sun Junyi b18d56d2a3 Merge pull request #72 from linkerlin/master
Add a tmp directory so that test_whoosh.py can run.
12 years ago
Sun Junyi b9b1f1a418 fix conflict of merging 12 years ago
miao.lin becd32b178 made test_whoosh.py happy.
Add a tmp directory so that test_whoosh.py can run.
12 years ago
Sun Junyi c01680c6a8 merge the new file 12 years ago
Sun Junyi b62f052927 PEP8 12 years ago
Sun Junyi 45daf561c7 follow PEP8: change tab to 4 white spaces 12 years ago
Sun Junyi dbec3ad9df add some comments 12 years ago
Sun Junyi efc784312c add ChineseAnalyzer for whoosh search engine 12 years ago
Sun Junyi f08690a2df add 'search mode' for jieba.tokenize 12 years ago
Sun Junyi cb1b0499f7 unittest for jieba.tokenize 12 years ago
Sun Junyi 11a3b10755 new method: jieba.tokenize 12 years ago
Sun Junyi ca97b19951 merge change from master 12 years ago
Sun Junyi c0816b9bb0 more mixed words 12 years ago
Sun Junyi c9e8da9e63 add more mix words to dict.txt 12 years ago
fxsjy 08bfabb9d7 Merge branch 'jieba3k' of https://github.com/fxsjy/jieba into jieba3k 12 years ago
fxsjy be1686654d merge master to jieba3k 12 years ago
fxsjy 0087a4e7e3 adjust prob_trans for better support of name entity; fix some bad cases 12 years ago
Sun Junyi 4300f79788 add an example of using sklearn+jieba 12 years ago