Commit Graph

495 Commits (68f2a64f7eee5af264bfdd1b340c461da3b8c351)
 

Author SHA1 Message Date
Sun Junyi 68f2a64f7e
Merge pull request #663 from JimCurryWang/patch-1
Fix __init__ "-" symbol issue
6 years ago
Sun Junyi 4c8479cfa6
Merge pull request #667 from ZhengZixiang/patch-1
fix the error about importing ChineseAnalyzer
6 years ago
imzhengzx ca444fb4da
fix the error about importing ChineseAnalyzer
Because the ChineseAnalyzer interface changed, the statement 'from jieba.analyse import ChineseAnalyzer' in this test file raises an ImportError such as "cannot import name 'ChineseAnalyzer'". Changing the import to 'from jieba.analyse.analyzer import ChineseAnalyzer' fixes it.
6 years ago
CY Wang 36a27302ce
Fix __init__ "-" symbol issue
Fixes the issue where the "-" symbol cannot be analyzed.

For example,
for keywords such as chap-EX喬沛詩 and SK-II,
the present version outputs "chap", "-", "EX喬沛詩", "SK", "-", "II".

After the change,
the new version outputs "chap-EX", "喬沛詩", "SK-II".

PS: I used jieba.load_userdict() and added "chap-EX", "喬沛詩", and "SK-II" to userdict.txt.
7 years ago
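
The effect described in commit 36a27302ce can be illustrated with a minimal sketch, assuming the input is first split into blocks by a regular expression. Both patterns below are illustrative assumptions, not copied from the jieba source, and `split_blocks` is a hypothetical helper.

```python
import re

# Illustrative patterns only (assumption); they are not taken from jieba itself.
BLOCK_WITHOUT_HYPHEN = re.compile(r"([\u4E00-\u9FD5a-zA-Z0-9]+)")
BLOCK_WITH_HYPHEN = re.compile(r"([\u4E00-\u9FD5a-zA-Z0-9\-]+)")


def split_blocks(pattern, text):
    """Split text on the pattern, keeping matched runs and dropping empty strings."""
    return [part for part in pattern.split(text) if part]


# Without "-" in the character class, the hyphen becomes its own piece.
before = split_blocks(BLOCK_WITHOUT_HYPHEN, "SK-II")
# With "-" in the character class, the hyphenated keyword stays in one block.
after = split_blocks(BLOCK_WITH_HYPHEN, "SK-II")
```

In this sketch the wider character class only keeps the hyphenated keyword in one block. As the commit's note says, the author also added entries such as "SK-II" through jieba.load_userdict() to get them back as single words.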
Sun Junyi 7653db2e33
Update README.md 7 years ago
fxsjy cb0de2973b version change 0.39 8 years ago
sunjunyi01 b4dd5b58f3 bug fix, issue: #511, #512 8 years ago
Sun Junyi 4eef868338 Merge pull request #455 from OOCZC/master
Update README.md
8 years ago
OOC b485ae916c Update README.md 8 years ago
OOC ee0ce32bbd Update 8 years ago
Sun Junyi 8ba26cf97e Merge pull request #382 from huntzhan/master
Bugfix for HMM=False in parallelism.
9 years ago
huntzhan 60acefd9b1 Bugfix for HMM=False in parallelism. 9 years ago
Sun Junyi 03cd4b5fb6 Merge pull request #367 from yanyiwu/patch-1
Update README.md
9 years ago
Yanyi Wu 76ae798137 Update README.md 9 years ago
Sun Junyi 0243d568e9 Merge pull request #351 from gumblex/master
fix del_word
9 years ago
Dingyuan Wang 12b2b17741 fix del_word 9 years ago
fxsjy 1d5ea9f061 version change 0.38 9 years ago
Sun Junyi e5c9af78e2 Merge pull request #315 from gumblex/master
Support POS tagging in command-line segmentation
9 years ago
Dingyuan Wang 87734d3785 support POS tagging in __main__ 9 years ago
Sun Junyi 3d29b0c8e8 Merge pull request #310 from gumblex/master
Fix compatibility problem with `with` statement
9 years ago
Dingyuan Wang 1fcd3a417c fix compatibility problem with `with` statement 9 years ago
Sun Junyi 093980647b Merge pull request #303 from jerryday/master
add a withFlag param to extract_tags
9 years ago
Sun Junyi f73a2183a5 Merge pull request #309 from gumblex/master
Load the default dictionary with pkg_resources
9 years ago
Dingyuan Wang 8814e08f9b load default dictionary from pkg_resources and improve the loading method;
change the serialized models from marshal to pickle
9 years ago
Sun Junyi 70f019b669 Merge pull request #307 from gumblex/master
Extend the Chinese character range; fix load_userdict
9 years ago
Dingyuan Wang 5270ed66ff fix typo in type detection in load_userdict 9 years ago
Dingyuan Wang 99d0fb1a8a use regex and fix encoding related issues in load_userdict 9 years ago
Dingyuan Wang 1c33252fce change the recognized Chinese character range to [\u4E00-\u9FD5] 9 years ago
jerryday e5e41a4aad fix pair object in dict problem 9 years ago
jerryday 4f8ca83661 add a withFlag param in textrank 9 years ago
jerryday 26e339f8f7 add a withFlag param to extract_tags 9 years ago
Sun Junyi b6f1ce773e Merge pull request #298 from anderscui/master
Add introduction to jieba.NET port.
10 years ago
andersc 343bfe9783 Add introduction to jieba.NET port. 10 years ago
fxsjy cb414cb861 version update 10 years ago
Sun Junyi 8e99a13aa9 Merge pull request #275 from gumblex/master
Prevent creating the cache across filesystems
10 years ago
Dingyuan Wang d0e68974bf improved doc for tmp_dir and cache_file 10 years ago
Dingyuan Wang 66fe17517d prevent moving across different filesystems at tempfile.mkstemp 10 years ago
Dingyuan Wang be46ddef9a use shutil.move for all platforms in case of different filesystems 10 years ago
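
The idea behind the commits in pull request #275 can be sketched as follows: create the temporary file in the same directory as the final cache file, so the final move stays on one filesystem, and use shutil.move for the last step. This is a minimal sketch under assumptions; `write_cache` and the use of pickle for the payload are hypothetical and not taken from the jieba source.

```python
import os
import pickle
import shutil
import tempfile


def write_cache(data, cache_file):
    """Write data to cache_file through a temporary file in the same directory."""
    cache_dir = os.path.dirname(os.path.abspath(cache_file))
    # Creating the temp file next to the destination avoids a cross-filesystem move.
    fd, tmp_path = tempfile.mkstemp(dir=cache_dir)
    try:
        with os.fdopen(fd, "wb") as tmp:
            pickle.dump(data, tmp)
        # shutil.move also copes with a destination on another filesystem.
        shutil.move(tmp_path, cache_file)
    except BaseException:
        # Clean up the temporary file if anything failed before the move.
        if os.path.exists(tmp_path):
            os.remove(tmp_path)
        raise


# Usage: write a small dictionary into a scratch directory and read it back.
demo_dir = tempfile.mkdtemp()
demo_cache = os.path.join(demo_dir, "demo.cache")
write_cache({"word": 3}, demo_cache)
with open(demo_cache, "rb") as handle:
    loaded = pickle.load(handle)
```

Writing to a temporary file first means a reader never sees a half-written cache, and placing it beside the destination keeps the final step a same-filesystem move.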
Sun Junyi 17652e764f Merge pull request #271 from gumblex/master
Fix cut_for_search; improve the pair object
10 years ago
Dingyuan Wang ceb5c26be4 fix self.FREQ in cut_for_search; make pair object iterable 10 years ago
Sun Junyi 9f4d9376b0 Merge pull request #269 from gumblex/master
Allow custom dictionaries to specify a POS tag while omitting the word frequency
10 years ago
Dingyuan Wang 3b76328f2a allow ignoring word frequency while providing pos tag 10 years ago
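
A minimal sketch of the relaxed user dictionary format, assuming one entry per line in the form `word [freq] [tag]`, with the frequency and the tag each optional. The pattern and the `parse_userdict_line` helper are assumptions for illustration, not the code from this commit.

```python
import re

# Assumed line format: word, optional integer frequency, optional lowercase POS tag.
USERDICT_LINE = re.compile(r"^(.+?)( [0-9]+)?( [a-z]+)?$")


def parse_userdict_line(line):
    """Return (word, freq, tag); freq and tag are None when they are omitted."""
    match = USERDICT_LINE.match(line.strip())
    if match is None:
        raise ValueError("invalid dictionary entry: %r" % line)
    word, freq, tag = match.groups()
    return word, (int(freq) if freq else None), (tag.strip() if tag else None)


full_entry = parse_userdict_line("云计算 5 n")
tag_only = parse_userdict_line("云计算 n")
word_only = parse_userdict_line("云计算")
```

With this pattern, a line that gives a POS tag but no frequency still parses, which is the case the commit message describes.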
Sun Junyi 3ec4c43788 Merge pull request #260 from gumblex/master
Wrap global functions in classes
10 years ago
Dingyuan Wang 94840a734c wraps most globals in classes
API changes:
* class jieba.Tokenizer, jieba.posseg.POSTokenizer
* class jieba.analyse.TFIDF, jieba.analyse.TextRank
* global functions are mapped to jieba.(posseg.)dt, the default (POS)Tokenizer
* multiprocessing only works with jieba.(posseg.)dt
* new lcut, lcut_for_search functions that return a list
* jieba.analyse.textrank now returns 20 items by default

Tests:
* added test_lock.py to test multithread locking
* demo.py now contains most of the examples in README
10 years ago
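
The pattern described in the API changes above (a class holding the state, a default instance, and module-level functions mapped to that instance) can be sketched with a toy example. `WhitespaceTokenizer` is a hypothetical stand-in that only splits on whitespace; it is not jieba.Tokenizer, and only the names `dt`, `cut`, and `lcut` mirror the commit message.

```python
class WhitespaceTokenizer:
    """Toy tokenizer (assumption for illustration): splits on whitespace only."""

    def __init__(self, lowercase=False):
        self.lowercase = lowercase

    def cut(self, sentence):
        """Yield tokens lazily, one at a time."""
        for token in sentence.split():
            yield token.lower() if self.lowercase else token

    def lcut(self, sentence):
        """Return the same tokens as a list."""
        return list(self.cut(sentence))


# Default instance; the module-level functions are its bound methods.
dt = WhitespaceTokenizer()
cut = dt.cut
lcut = dt.lcut

# A separate instance carries its own settings and leaves the default untouched.
custom = WhitespaceTokenizer(lowercase=True)

default_tokens = lcut("Hello World")
custom_tokens = custom.lcut("Hello World")
```

Mapping the global functions to one default instance keeps existing call sites working while letting callers create independent instances with their own configuration.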
Sun Junyi e359d08964 Merge pull request #257 from gip0/gip0-patch-1
fixed an error in load_userdict()
10 years ago
Gilbert Liu f6e57ab2ae fixed an error in load_userdict() 10 years ago
Sun Junyi 60f0028175 Merge pull request #252 from fukuball/master
Update README
10 years ago
Fukuball Lin e712a4de61 Update README
Add information about the PHP version of jieba
10 years ago
fxsjy 29d2b838dc a minor version on pypi, which removes *.pyc 10 years ago
fxsjy c07b7fef54 hot-fix version for pull request #248 10 years ago