Another Dev Notes

Dec 22, 2015

Chinese Text Preprocessing with Stanford NLP Software

This article has been republished by cos.name: http://cos.name/2016/01/intro-to-chinese-nlp/

Dec 22, 2015

Converting Two Formats of Chinese Texts with OpenCC

Datasets often mix two forms of Chinese characters: simplified Chinese, used in mainland China, and traditional Chinese, used in other regions. Ignoring this mixture is a bad idea, because the same word then appears in two different written forms, which causes problems in later processing. To handle this, we use OpenCC by BYVoid.
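To make the problem concrete, here is a minimal sketch of what normalizing a mixed dataset to one form looks like. It is not OpenCC: the five-character table `TRAD_TO_SIMP` and the helpers `to_simplified` and `normalize_corpus` are hypothetical names invented for this illustration. OpenCC itself relies on far larger dictionaries and also handles phrase-level conversion, which a per-character table cannot do.

```python
# Toy stand-in for OpenCC: a hand-picked character table for illustration only.
# OpenCC ships much larger dictionaries and converts at the phrase level too.
TRAD_TO_SIMP = str.maketrans({
    "漢": "汉",
    "語": "语",
    "體": "体",
    "學": "学",
    "習": "习",
})


def to_simplified(text: str) -> str:
    """Replace every traditional character found in the toy table."""
    return text.translate(TRAD_TO_SIMP)


def normalize_corpus(lines):
    """Convert each line so the whole dataset uses a single written form."""
    return [to_simplified(line) for line in lines]


# The same sentence written in traditional and in simplified characters.
mixed = ["我在學習漢語", "我在学习汉语"]
normalized = normalize_corpus(mixed)
print(normalized)
```

Before conversion, the two lines count as different strings even though they say the same thing. After conversion they are identical, so later steps such as segmentation or counting treat them as one.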

Dec 17, 2015

General Pipelines for Chinese NLP Engineering with Stanford NLP Software

The Chinese version of this article can be found here.