Converting Between Two Formats of Chinese Text with OpenCC
Datasets often mix two forms of Chinese characters: Simplified Chinese, used in mainland China, and Traditional Chinese, used in other regions. Ignoring the mixed usage is a bad idea because it causes problems in later processing: the same word written in the two scripts is treated as two unrelated tokens. To normalize the text to a single form, we use OpenCC by BYVoid.
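As a rough illustration of that normalization step, here is a minimal Python sketch. It assumes the third-party opencc package is installed and that its "t2s" (Traditional to Simplified) configuration is available. The helper names make_opencc_converter and normalize_corpus are hypothetical and chosen only for this example; they are not part of OpenCC.

```python
from typing import Callable, Iterable, List


def make_opencc_converter(config: str = "t2s") -> Callable[[str], str]:
    """Return a text -> text converter backed by OpenCC.

    Assumption: the third-party `opencc` package is installed and ships a
    "t2s" (Traditional -> Simplified) configuration. It is imported lazily
    so the rest of this sketch works without it.
    """
    import opencc  # third-party dependency, not part of the standard library

    return opencc.OpenCC(config).convert


def normalize_corpus(lines: Iterable[str],
                     convert: Callable[[str], str]) -> List[str]:
    """Apply one converter to every line so the corpus uses a single script."""
    return [convert(line) for line in lines]


# Usage (requires opencc to be installed):
#   to_simplified = make_opencc_converter("t2s")
#   cleaned = normalize_corpus(["漢語", "汉语"], to_simplified)
```

Passing the converter in as a plain function keeps the normalization step independent of any one library, so it can be swapped out or stubbed during testing.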
General Pipelines for Chinese NLP Engineering with Stanford NLP Software
The Chinese version of this article can be found here.