Commit Graph

523 Commits (master)
 

Author SHA1 Message Date
Sun Junyi f73a2183a5 Merge pull request #309 from gumblex/master
Load the default dictionary with pkg_resources
9 years ago
Dingyuan Wang 8814e08f9b load default dictionary from pkg_resources and improve the loading method;
change the serialized models from marshal to pickle
9 years ago
Sun Junyi 70f019b669 Merge pull request #307 from gumblex/master
Expand the Chinese character range; fix load_userdict
9 years ago
Dingyuan Wang 5270ed66ff fix typo in type detection in load_userdict 9 years ago
Dingyuan Wang 99d0fb1a8a use regex and fix encoding related issues in load_userdict 9 years ago
Dingyuan Wang 1c33252fce change the recognized Chinese character range to [\u4E00-\u9FD5] 9 years ago
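The commit above sets the recognized Chinese character range to `[\u4E00-\u9FD5]`. The following is a minimal sketch of how a character class over that range picks out runs of Chinese text. The pattern and the `han_runs` helper are illustrative assumptions, not jieba's own regular expression:

```python
import re

# Illustrative pattern: one or more characters in the range U+4E00..U+9FD5.
# Simplified sketch only; this is not jieba's own regular expression.
RE_HAN = re.compile("[\u4E00-\u9FD5]+")


def han_runs(text):
    """Return the maximal runs of characters in the U+4E00..U+9FD5 range."""
    return RE_HAN.findall(text)


print(han_runs("结巴分词abc测试"))  # ['结巴分词', '测试']
```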
jerryday e5e41a4aad fix pair object in dict problem 9 years ago
jerryday 4f8ca83661 add a withFlag param in textrank 9 years ago
jerryday 26e339f8f7 add a withFlag param to extract_tags 9 years ago
Sun Junyi b6f1ce773e Merge pull request #298 from anderscui/master
Add introduction to jieba.NET port.
9 years ago
andersc 343bfe9783 Add introduction to jieba.NET port. 9 years ago
fxsjy cb414cb861 version update 10 years ago
Sun Junyi 8e99a13aa9 Merge pull request #275 from gumblex/master
Prevent creating the cache across filesystems
10 years ago
Dingyuan Wang d0e68974bf improved doc for tmp_dir and cache_file 10 years ago
Dingyuan Wang 66fe17517d prevent moving across different filesystems at tempfile.mkstemp 10 years ago
Dingyuan Wang be46ddef9a use shutil.move for all platforms in case of different filesystems 10 years ago
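The three commits above deal with writing the cache file when the temp directory and the cache location may sit on different filesystems. The following is a minimal sketch, assuming the cache is written as raw bytes. It creates the temporary file in the same directory as the target (so the final move is a plain rename) and uses `shutil.move`, which falls back to copy-and-delete when a rename is impossible. `write_cache` is a hypothetical helper, not jieba's API:

```python
import os
import shutil
import tempfile


def write_cache(cache_path, data):
    """Hypothetical helper: write bytes to cache_path via a sibling temp file."""
    # Creating the temp file next to the target keeps both on one filesystem.
    cache_dir = os.path.dirname(os.path.abspath(cache_path))
    fd, tmp_path = tempfile.mkstemp(dir=cache_dir)
    try:
        with os.fdopen(fd, "wb") as f:
            f.write(data)
        # shutil.move renames when possible, otherwise copies then deletes.
        shutil.move(tmp_path, cache_path)
    except BaseException:
        # Do not leave a stray temp file behind on failure.
        if os.path.exists(tmp_path):
            os.remove(tmp_path)
        raise
```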
Sun Junyi 17652e764f Merge pull request #271 from gumblex/master
Fix cut_for_search; improve the pair object
10 years ago
Dingyuan Wang ceb5c26be4 fix self.FREQ in cut_for_search; make pair object iterable 10 years ago
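Making the pair object iterable lets callers unpack a result directly as `word, flag = p`. The following is a standalone sketch loosely modeled on the posseg pair, which exposes `word` and `flag`. The `Pair` class itself is an illustrative assumption, not jieba's code:

```python
class Pair:
    """Illustrative (word, flag) result object; a sketch, not jieba's class."""

    def __init__(self, word, flag):
        self.word = word
        self.flag = flag

    def __iter__(self):
        # Yielding both fields enables tuple unpacking: word, flag = pair
        return iter((self.word, self.flag))

    def __repr__(self):
        return "Pair(%r, %r)" % (self.word, self.flag)


word, flag = Pair("结巴", "n")
print(word, flag)  # 结巴 n
```

Because each object yields exactly two items, a list of them can also be passed straight to `dict()`.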
Sun Junyi 9f4d9376b0 Merge pull request #269 from gumblex/master
Allow the user dictionary to specify a POS tag while omitting the word frequency
10 years ago
Dingyuan Wang 3b76328f2a allow ignoring word frequency while providing pos tag 10 years ago
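With this change a user-dictionary line can carry a word, an optional frequency and an optional part-of-speech tag, in that order. The following is a minimal parsing sketch, assuming space-separated fields, integer frequencies and lowercase ASCII tags. `parse_userdict_line` is a hypothetical helper, not jieba's API:

```python
import re

# Word, then an optional " <digits>" frequency, then an optional " <letters>" tag.
RE_USERDICT = re.compile(r"^(.+?)( [0-9]+)?( [a-z]+)?$")


def parse_userdict_line(line):
    """Hypothetical helper: return (word, freq, tag); missing parts are None."""
    line = line.strip()
    match = RE_USERDICT.match(line)
    if match is None:
        raise ValueError("empty dictionary line")
    word, freq, tag = match.groups()
    freq = int(freq) if freq is not None else None
    tag = tag.strip() if tag is not None else None
    return word, freq, tag


print(parse_userdict_line("创新办 nz"))  # ('创新办', None, 'nz')
```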
Sun Junyi 3ec4c43788 Merge pull request #260 from gumblex/master
Wrap global functions in classes
10 years ago
Dingyuan Wang 94840a734c wraps most globals in classes
API changes:
* class jieba.Tokenizer, jieba.posseg.POSTokenizer
* class jieba.analyse.TFIDF, jieba.analyse.TextRank
* global functions are mapped to jieba.(posseg.)dt, the default (POS)Tokenizer
* multiprocessing only works with jieba.(posseg.)dt
* new lcut, lcut_for_search functions that return a list
* jieba.analyse.textrank now returns 20 items by default

Tests:
* added test_lock.py to test multithread locking
* demo.py now contains most of the examples in README
10 years ago
Sun Junyi e359d08964 Merge pull request #257 from gip0/gip0-patch-1
fixed an error in load_userdict()
10 years ago
Gilbert Liu f6e57ab2ae fixed an error in load_userdict() 10 years ago
Sun Junyi 60f0028175 Merge pull request #252 from fukuball/master
Update README
10 years ago
Fukuball Lin e712a4de61 Update README
Add information about the PHP port of jieba
10 years ago
fxsjy 29d2b838dc a minor version on pypi, which removes *.pyc 10 years ago
fxsjy c07b7fef54 hot-fix version for pull request #248 10 years ago
Sun Junyi 753c1be49c Merge pull request #248 from wangbin/master
exclude word fragments from FREQ in posseg.cut
10 years ago
Wang Bin 84ffa0d4bf exclude word fragments from FREQ 10 years ago
Sun Junyi 885417aed1 Merge pull request #247 from gumblex/master
Update docs
10 years ago
Dingyuan Wang eeaab012bf update docs 10 years ago
fxsjy 89481cfd84 version update 0.36 10 years ago
Sun Junyi 59aa8b69b1 Merge pull request #246 from gumblex/master
Add automatic word frequency suggestion
10 years ago
Dingyuan Wang 4fa2728fb6 update README about new features 10 years ago
Dingyuan Wang 4a552ca94f suggest word frequency, support passing str to add_word 10 years ago
Sun Junyi 1b4721ebb8 Merge pull request #179 from changyy/master
Add a configurable directory for the generated cache_file, so jieba can run on read-only file systems such as embedded Linux, Google App Engine and Heroku
10 years ago
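Yuan-Yi Chang 62433a3205 Let jieba use a user-specified directory for the generated cache_file, so it can run on a read-only file system
1. Before calling jieba.cut() or related functions, set the directory via jieba.tmp_dir
2. On a read-only file system, jieba can run normally if the cache_file is generated in advance
3. Real-world cases are Google App Engine and Heroku: the former's free tier has only 128 MB of memory and cannot run jieba, while the latter's free environment has 512 MB and runs it normally. Before deploying, generate the cache_file locally, then deploy it together with the application to Google App Engine or Heroku.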
10 years ago
Sun Junyi 4b4aff6d89 Merge pull request #242 from gumblex/master
textrank details; documentation update
10 years ago
Dingyuan Wang f29430f49e details in textrank; update README 10 years ago
Sun Junyi a4fb439070 Merge pull request #241 from sing1ee/master
improve some details based on other committers' advice
10 years ago
zhangcheng 01b7f6efcf improve some details based on other committers' advice 10 years ago
Sun Junyi 4e05cde07e Merge pull request #240 from sing1ee/master
build stable sort for graph iteration
10 years ago
zhangcheng 8b8c6c85d0 remove unused import 10 years ago
zhangcheng a6d1b2479e build stable sort for graph iteration, so results are stable; adapt details for Python 3 10 years ago
zhangcheng 1152db7736 build stable sort for graph iteration, then we can get stable result. 10 years ago
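Sorting the graph's keys matters because a TextRank-style update that overwrites scores in place lets each node see scores already updated earlier in the same pass. Visiting nodes in an arbitrary, hash-dependent order can therefore change the result. The following is a simplified sketch, assuming an undirected weighted graph given as `(u, v, weight)` edges, a damping factor of 0.85 and a fixed 10 iterations. `textrank_scores` is a hypothetical name, not jieba's implementation:

```python
from collections import defaultdict


def textrank_scores(edges, d=0.85, iterations=10):
    """Hypothetical sketch: score nodes of an undirected weighted graph."""
    graph = defaultdict(list)
    for u, v, w in edges:
        graph[u].append((v, w))
        graph[v].append((u, w))

    # Sorted iteration makes every pass visit nodes and neighbours in the
    # same order, independent of insertion order or hashing.
    nodes = sorted(graph)
    out_sum = {n: sum(w for _, w in sorted(graph[n])) for n in nodes}
    score = {n: 1.0 / len(nodes) for n in nodes}

    for _ in range(iterations):
        for n in nodes:
            s = sum(w / out_sum[m] * score[m] for m, w in sorted(graph[n]))
            # In-place update: nodes visited later in this pass see this value.
            score[n] = (1 - d) + d * s
    return score


scores = textrank_scores([("a", "b", 1), ("a", "c", 1), ("a", "d", 1)])
print(max(scores, key=scores.get))  # a
```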
fxsjy 49657c976d make extract_tags behavior compatible with previous version 10 years ago
fxsjy abcaf3e475 fix bug: load_userdict 10 years ago
Jack a06b7d388e fix bug in __main__.py 10 years ago
Sun Junyi 9ca5b69907 Merge pull request #238 from gumblex/master
use str.splitlines to avoid losing line breaks
10 years ago