Sun Junyi
f73a2183a5
Merge pull request #309 from gumblex/master
Load the default dictionary with pkg_resources
9 years ago
Dingyuan Wang
8814e08f9b
load default dictionary from pkg_resources and improve the loading method;
change the serialized models from marshal to pickle
9 years ago
Sun Junyi
70f019b669
Merge pull request #307 from gumblex/master
Extend the Chinese character range; fix load_userdict
9 years ago
Dingyuan Wang
5270ed66ff
fix typo in type detection in load_userdict
9 years ago
Dingyuan Wang
99d0fb1a8a
use regex and fix encoding related issues in load_userdict
9 years ago
Dingyuan Wang
1c33252fce
change the recognized Chinese character range to [\u4E00-\u9FD5]
9 years ago
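The range named in this commit can be written as one regular-expression character class. This is a minimal Python sketch. The helper `han_runs` is hypothetical and not part of jieba.

```python
import re

# Character class for the range named in the commit: U+4E00 through U+9FD5.
RE_HAN = re.compile("[\u4E00-\u9FD5]+")


def han_runs(text):
    """Return maximal runs of characters in U+4E00..U+9FD5 (hypothetical helper)."""
    return RE_HAN.findall(text)


print(han_runs("jieba 结巴分词 v0.38"))  # ['结巴分词']
```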
jerryday
e5e41a4aad
fix pair object in dict problem
9 years ago
jerryday
4f8ca83661
add a withFlag param in textrank
9 years ago
jerryday
26e339f8f7
add a withFlag param to extract_tags
9 years ago
Sun Junyi
b6f1ce773e
Merge pull request #298 from anderscui/master
Add introduction to jieba.NET port.
9 years ago
andersc
343bfe9783
Add introduction to jieba.NET port.
9 years ago
fxsjy
cb414cb861
version update
10 years ago
Sun Junyi
8e99a13aa9
Merge pull request #275 from gumblex/master
Prevent creating the cache across filesystems
10 years ago
Dingyuan Wang
d0e68974bf
improved doc for tmp_dir and cache_file
10 years ago
Dingyuan Wang
66fe17517d
prevent moving across different filesystems at tempfile.mkstemp
10 years ago
Dingyuan Wang
be46ddef9a
use shutil.move for all platforms in case of different filesystems
10 years ago
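These two commits describe a common pattern for writing a cache file safely:

- Create the temporary file in the same directory as the final cache file, so the final move does not cross filesystems.
- Move the file into place with `shutil.move`. If a plain rename is impossible, `shutil.move` falls back to copy-and-delete.

Below is a sketch of that pattern under those assumptions. `write_cache` is a hypothetical name, not jieba's code.

```python
import os
import shutil
import tempfile


def write_cache(cache_path, data):
    """Write bytes to cache_path via a temp file (hypothetical helper).

    The temp file is created in the target's own directory, so the final
    move is normally a same-filesystem rename; shutil.move copies and
    deletes instead if a rename is not possible.
    """
    target_dir = os.path.dirname(os.path.abspath(cache_path))
    fd, tmp_path = tempfile.mkstemp(dir=target_dir)
    try:
        with os.fdopen(fd, "wb") as f:
            f.write(data)
        shutil.move(tmp_path, cache_path)
    except BaseException:
        # Do not leave a stray temp file behind on failure.
        if os.path.exists(tmp_path):
            os.remove(tmp_path)
        raise
```

Because a crash during writing only affects the temporary file, readers either see the old cache file or the complete new one on POSIX filesystems.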
Sun Junyi
17652e764f
Merge pull request #271 from gumblex/master
Fix cut_for_search; improve the pair object
10 years ago
Dingyuan Wang
ceb5c26be4
fix self.FREQ in cut_for_search; make pair object iterable
10 years ago
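Making a result object iterable lets callers unpack it like a tuple. This sketch assumes a result object with `word` and `flag` attributes. The class name `Pair` is a hypothetical stand-in, not jieba's definition.

```python
class Pair(object):
    """Hypothetical (word, flag) result object, not jieba's class."""

    def __init__(self, word, flag):
        self.word = word
        self.flag = flag

    def __repr__(self):
        return "Pair(%r, %r)" % (self.word, self.flag)

    def __iter__(self):
        # Yield the fields in a fixed order so `word, flag = p` works.
        return iter((self.word, self.flag))


word, flag = Pair("北京", "ns")
print(word, flag)  # 北京 ns
```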
Sun Junyi
9f4d9376b0
Merge pull request #269 from gumblex/master
Allow the user dictionary to specify a POS tag while omitting the word frequency
10 years ago
Dingyuan Wang
3b76328f2a
allow ignoring word frequency while providing pos tag
10 years ago
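This sketch assumes user-dictionary entries are space-separated as `word [freq] [tag]`, with both the frequency and the tag optional. Under that assumption, one regular expression can parse an entry that omits only the frequency. The pattern and `parse_userdict_line` are illustrative, not jieba's implementation.

```python
import re

# Hypothetical pattern for this sketch: word, optional " <digits>",
# optional " <lowercase tag>", anchored at both ends.
RE_USERDICT = re.compile(r"^(.+?)( [0-9]+)?( [a-z]+)?$")


def parse_userdict_line(line):
    """Return (word, freq, tag); freq and tag are None when omitted."""
    m = RE_USERDICT.match(line.strip())
    if m is None:
        raise ValueError("malformed entry: %r" % line)
    word, freq, tag = m.groups()
    freq = int(freq) if freq is not None else None
    tag = tag.strip() if tag is not None else None
    return word, freq, tag


print(parse_userdict_line("凱特琳 nz"))  # ('凱特琳', None, 'nz')
```

The lazy `(.+?)` keeps the word as short as possible while still letting the optional groups and `$` match. As a result, a multi-word entry ending in a lowercase token is ambiguous under this format.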
Sun Junyi
3ec4c43788
Merge pull request #260 from gumblex/master
Wrap global functions in classes
10 years ago
Dingyuan Wang
94840a734c
wraps most globals in classes
API changes:
* class jieba.Tokenizer, jieba.posseg.POSTokenizer
* class jieba.analyse.TFIDF, jieba.analyse.TextRank
* global functions are mapped to jieba.(posseg.)dt, the default (POS)Tokenizer
* multiprocessing only works with jieba.(posseg.)dt
* new lcut, lcut_for_search functions that return a list
* jieba.analyse.textrank now returns 20 items by default
Tests:
* added test_lock.py to test multithread locking
* demo.py now contains most of the examples in README
10 years ago
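The layout described above combines three pieces: classes, module-level functions that delegate to a default instance `dt`, and a list-returning `lcut`. A toy class can illustrate it. Only the names `Tokenizer`, `dt`, `cut` and `lcut` come from the commit message. Whitespace splitting stands in for real segmentation, and this is not jieba's code.

```python
class Tokenizer(object):
    """Toy tokenizer (hypothetical); whitespace splitting replaces segmentation."""

    def cut(self, sentence):
        # Generator API: tokens are produced lazily.
        for token in sentence.split():
            yield token

    def lcut(self, sentence):
        # Convenience wrapper that materializes cut() as a list.
        return list(self.cut(sentence))


# Module-level default instance; the "global" functions are its bound methods.
dt = Tokenizer()
cut = dt.cut
lcut = dt.lcut

print(lcut("我 来到 北京"))  # ['我', '来到', '北京']
```

In this pattern, separate instances can hold separate state, and callers of the module-level names are unaffected.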
Sun Junyi
e359d08964
Merge pull request #257 from gip0/gip0-patch-1
fixed an error in load_userdict()
10 years ago
Gilbert Liu
f6e57ab2ae
fixed an error in load_userdict()
10 years ago
Sun Junyi
60f0028175
Merge pull request #252 from fukuball/master
Update README
10 years ago
Fukuball Lin
e712a4de61
Update README
Add information about the PHP version of jieba
10 years ago
fxsjy
29d2b838dc
a minor version on PyPI that removes *.pyc files
10 years ago
fxsjy
c07b7fef54
hot-fix version for pull request #248
10 years ago
Sun Junyi
753c1be49c
Merge pull request #248 from wangbin/master
exclude word fragments from FREQ in posseg.cut
10 years ago
Wang Bin
84ffa0d4bf
exclude word fragments from FREQ
10 years ago
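These commits suggest that `FREQ` held word fragments alongside real words, so a bare `word in FREQ` test could accept a fragment. For illustration, the sketch below assumes that fragments (proper prefixes of dictionary words) are stored with frequency 0. Both function names are hypothetical.

```python
def build_prefix_dict(word_freqs):
    """Map each word to its frequency and each proper prefix to 0 (hypothetical)."""
    pfdict = {}
    for word, freq in word_freqs.items():
        pfdict[word] = freq  # a real word always keeps its own frequency
        for i in range(1, len(word)):
            pfdict.setdefault(word[:i], 0)  # fragment placeholder
    return pfdict


def is_real_word(pfdict, word):
    # Membership alone is not enough: fragments are present with frequency 0.
    return pfdict.get(word, 0) > 0


d = build_prefix_dict({"北京大学": 10, "北京": 20})
print("北京大" in d, is_real_word(d, "北京大"))  # True False
```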
Sun Junyi
885417aed1
Merge pull request #247 from gumblex/master
Update documentation
10 years ago
Dingyuan Wang
eeaab012bf
update docs
10 years ago
fxsjy
89481cfd84
version update 0.36
10 years ago
Sun Junyi
59aa8b69b1
Merge pull request #246 from gumblex/master
Add automatic word frequency
10 years ago
Dingyuan Wang
4fa2728fb6
update README about new features
10 years ago
Dingyuan Wang
4a552ca94f
suggest word frequency, support passing str to add_word
10 years ago
Sun Junyi
1b4721ebb8
Merge pull request #179 from changyy/master
Add a configurable directory for the generated cache_file, so that jieba can run on read-only file systems such as embedded Linux, Google App Engine and Heroku
10 years ago
Yuan-Yi Chang
62433a3205
Let jieba specify the directory where cache_file is generated, so that jieba can run in read-only file system environments
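1. Before calling jieba.cut() or related functions, set the directory via jieba.tmp_dir.
2. When the deployment environment has a read-only file system, generate the cache_file in advance so that jieba can still run normally.
3. Real-world cases are Google App Engine and Heroku. The former's free tier has only 128 MB of memory and cannot run jieba. The latter's free tier has 512 MB and runs it normally. Before deploying, generate the cache_file locally, then deploy it together with the application to Google App Engine or Heroku.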
10 years ago
Sun Junyi
4b4aff6d89
Merge pull request #242 from gumblex/master
Fix details in textrank; update documentation
10 years ago
Dingyuan Wang
f29430f49e
details in textrank; update README
10 years ago
Sun Junyi
a4fb439070
Merge pull request #241 from sing1ee/master
improve some details based on other committers' advice
10 years ago
zhangcheng
01b7f6efcf
improve some details based on other committers' advice
10 years ago
Sun Junyi
4e05cde07e
Merge pull request #240 from sing1ee/master
build a stable sort for graph iteration
10 years ago
zhangcheng
8b8c6c85d0
remove unused import
10 years ago
zhangcheng
a6d1b2479e
build a stable sort for graph iteration so that results are stable, and adapt details for Python 3
10 years ago
zhangcheng
1152db7736
build a stable sort for graph iteration so that results are stable
10 years ago
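A fixed iteration order matters when scores are updated in place. Each update reads neighbor scores that may already have been refreshed in the same pass, so the result depends on visiting order. Unordered iteration, such as over a set, can change from run to run under Python 3's string hash randomization. The following is a simplified weighted TextRank sketch, not jieba's implementation. The function name, damping factor 0.85 and 10 iterations are illustrative choices.

```python
from collections import defaultdict


def textrank(edges, d=0.85, iterations=10):
    """Simplified weighted TextRank over an undirected graph (sketch).

    edges: iterable of (node_a, node_b, weight) with positive weights.
    Returns a dict mapping each node to its (unnormalized) score.
    """
    graph = defaultdict(list)
    for a, b, w in edges:
        graph[a].append((b, w))
        graph[b].append((a, w))

    # Total edge weight attached to each node.
    out_sum = {n: sum(w for _, w in nbrs) for n, nbrs in graph.items()}

    # In-place updates make the result order-dependent, so visit nodes
    # in sorted order rather than in whatever order a container yields.
    nodes = sorted(graph)
    ws = {n: 1.0 / len(nodes) for n in nodes}

    for _ in range(iterations):
        for n in nodes:
            s = sum(w / out_sum[m] * ws[m] for m, w in graph[n])
            ws[n] = (1 - d) + d * s
    return ws


scores = textrank([("结巴", "分词", 1.0), ("分词", "工具", 1.0)])
print(max(scores, key=scores.get))  # 分词
```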
fxsjy
49657c976d
make extract_tags behavior compatible with the previous version
10 years ago
fxsjy
abcaf3e475
fix bug: load_userdict
10 years ago
Jack
a06b7d388e
fix bug in __main__.py
10 years ago
Sun Junyi
9ca5b69907
Merge pull request #238 from gumblex/master
use str.splitlines to avoid losing line breaks
10 years ago
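The commit does not say exactly where line breaks were lost, so the following is only a generic comparison of the two string methods:

- `split("\n")` drops the separators and leaves a stray `\r` on Windows-style line endings.
- `splitlines(True)` recognizes `\r\n`, `\n` and `\r` and keeps each line's ending, so joining the pieces restores the original text.

```python
text = "第一行\r\n第二行\n第三行"

# split("\n") keeps the "\r" of a Windows line ending and discards the "\n".
by_split = text.split("\n")
print(by_split)  # ['第一行\r', '第二行', '第三行']

# splitlines(True) keeps each line's own ending, whatever it is.
by_lines = text.splitlines(True)
print(by_lines)  # ['第一行\r\n', '第二行\n', '第三行']
```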