Commit Graph

104 Commits (87734d37857dcb8fd25975ce739ecb8d69545edb)

Author SHA1 Message Date
Dingyuan Wang 99d0fb1a8a use regex and fix encoding related issues in load_userdict 9 years ago
Dingyuan Wang ceb5c26be4 fix self.FREQ in cut_for_search; make pair object iterable 10 years ago
Dingyuan Wang 3b76328f2a allow ignoring word frequency while providing pos tag 10 years ago
Dingyuan Wang 94840a734c wraps most globals in classes
API changes:
* class jieba.Tokenizer, jieba.posseg.POSTokenizer
* class jieba.analyse.TFIDF, jieba.analyse.TextRank
* global functions are mapped to jieba.(posseg.)dt, the default (POS)Tokenizer
* multiprocessing only works with jieba.(posseg.)dt
* new lcut, lcut_for_search functions that return a list
* jieba.analyse.textrank now returns 20 items by default

Tests:
* added test_lock.py to test multithread locking
* demo.py now contains most of the examples in README
10 years ago
Dingyuan Wang 4a552ca94f suggest word frequency, support passing str to add_word 10 years ago
Dingyuan Wang 872a7039f2 Merge branch 'master' of https://github.com/fxsjy/jieba 10 years ago
Dingyuan Wang f808ea0ebb use only one dict to store words and prefixes 10 years ago
fxsjy 5bfa43a781 fix test scripts 10 years ago
Dingyuan Wang f3a53dd2da fix print() in tests 10 years ago
fxsjy 8cbb26a7b6 fix test_file.py 10 years ago
Dingyuan Wang 22bcf8be7a Merge master and jieba3k, make the code Python 2/3 compatible 10 years ago
Dingyuan Wang 3dad899ec8 backport 2to3 scripts and changelog 10 years ago
Dingyuan Wang c6b386f65b update jieba3k 10 years ago
Dingyuan Wang a5ecf70f71 update to v0.35 10 years ago
Dingyuan Wang 4a6140081e fix problems in auto2to3 10 years ago
Dingyuan Wang 7a6caa0c3c port extract_tags, etc to jieba3k; add auto2to3 script 10 years ago
walkskyer 6772f0282e fix the call order error in the output of the weighted test script 10 years ago
Dingyuan Wang fd9f1f2c0e update README, textrank, etc. 10 years ago
fxsjy f5ca87e088 merge change of @fukuball 10 years ago
Dingyuan Wang bb1e6000c6 fix version; fix spaces at end of line 10 years ago
Dingyuan Wang 51df77831b use prefix dict instead of trie, add a command line interface, and a few small improvements 10 years ago
Dingyuan Wang 6fad5fbb2c update to v0.33 11 years ago
Fukuball Lin b658ee69cb allow jieba to use a custom stop words corpus
1. add an example stop words corpus
2. to let jieba switch stop words corpora, add the set_stop_words method and rewrite extract_tags
3. add the extract_tags_stop_words.py test example to test
11 years ago
Fukuball Lin 7198d562f1 allow jieba to switch idf corpora
1. add a Traditional Chinese idf corpus
2. to let jieba switch idf corpora, add the get_idf and set_idf_path methods and rewrite extract_tags
3. add extract_tags_idfpath to test
11 years ago
Dingyuan Wang c04ccd0d12 Update to v0.32 according to the master branch. 11 years ago
fxsjy 18678d50c6 fix bug issue #132 11 years ago
gan 31d5845535 add better support for English, e.g. input 'this is interesting and interested me' --> output 'this interest interest', where 'interest' matches both 'interesting' and 'interested' 12 years ago
Sun Junyi 7e7fcc1184 add an option to disable HMM 12 years ago
ZoeyYoung d49542c06e fix bug 12 years ago
ZoeyYoung dce353f88b merge from master 12 years ago
ZoeyYoung 2857ae45cc Merge branch 'master' into jieba3k
Conflicts:
	Changelog
	jieba/__init__.py
	jieba/finalseg/__init__.py
	jieba/posseg/__init__.py
	setup.py
	test/parallel/test_file.py
	test/test_file.py
12 years ago
Sun Junyi 81390a2d23 test_file.py: close the file object 12 years ago
fxsjy b77645b3aa modify test_file.py; use less memory 12 years ago
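Linker Lin 5d83855088 automatically detect the number of CPUs and start a suitable number of processes. 12 years ago
Linker Lin 2ceb981da0 automatically detect the number of CPUs and start a suitable number of processes. 12 years ago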
Sun Junyi 6549deabbd merge change from master 12 years ago
Cheng wei 6035bb6320 fix invalid syntax for python3 12 years ago
Sun Junyi 9d0ea771a5 fix bug; decimals & digit-english mixed 12 years ago
Sun Junyi ba5114dc95 update whoosh example 12 years ago
Sun Junyi f424862222 clean the files in tmp 12 years ago
Sun Junyi b18d56d2a3 Merge pull request #72 from linkerlin/master
add a tmp directory so that test_whoosh.py can run.
12 years ago
Sun Junyi b9b1f1a418 fix conflict of merging 12 years ago
miao.lin becd32b178 made test_whoosh.py happy.
add a tmp directory so that test_whoosh.py can run.
12 years ago
Sun Junyi c01680c6a8 merge the new file 12 years ago
Sun Junyi b62f052927 PEP8 12 years ago
Sun Junyi 45daf561c7 follow PEP8: change tab to 4 white spaces 12 years ago
Sun Junyi dbec3ad9df add some comments 12 years ago
Sun Junyi efc784312c add ChineseAnalyzer for whoosh search engine 12 years ago
Sun Junyi f08690a2df add 'search mode' for jieba.tokenize 12 years ago
Sun Junyi cb1b0499f7 unittest for jieba.tokenize 12 years ago