gitea/modules/indexer/code
Bruno Sofiato f64fbd9b74
Updated tokenizer to better matching when search for code snippets ()
This PR improves the accuracy of Gitea's code search. 

Currently, Gitea does not consider statements such as
`console.log("hello")` as hits when the user searches for `log`. The
culprit is how both ES and Bleve tokenize the file contents (in
both cases, `console.log` is a single token).

In ES' case, we changed the tokenizer to
[simple_pattern_split](https://www.elastic.co/guide/en/elasticsearch/reference/current/analysis-simplepatternsplit-tokenizer.html#:~:text=The%20simple_pattern_split%20tokenizer%20uses%20a,the%20tokenization%20is%20generally%20faster.).
With this tokenizer, tokens are runs of letters and digits. In
Bleve's case, the indexer uses a
[letter](https://blevesearch.com/docs/Tokenizers/) tokenizer.

Resolves 

---------

Signed-off-by: Bruno Sofiato <bruno.sofiato@gmail.com>
..
bleve Updated tokenizer to better matching when search for code snippets ()
elasticsearch Updated tokenizer to better matching when search for code snippets ()
internal Allow code search by filename ()
git.go Fix index too many file names bug ()
indexer.go Fix tautological conditions ()
indexer_test.go Updated tokenizer to better matching when search for code snippets ()
search.go Render embedded code preview by permlink in markdown ()