NER Resources

NER is short for Named Entity Recognition, which is one of the fundamental tasks in NLP and is critical to other NLP tasks.

As machine learning develops, more and more new methods have been applied in this area. This resource book attempts to give an overview of these methods.
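To make the task concrete, here is a minimal sketch of what an NER system typically outputs: one tag per token, commonly in the BIO scheme (B- begins an entity, I- continues it, O is outside any entity). The function name `bio_to_spans`, the example sentence, and the PER/LOC labels are illustrative assumptions, not taken from any toolkit listed here.

```python
def bio_to_spans(tokens, tags):
    """Group BIO-tagged tokens into (entity_text, entity_type) pairs."""
    spans = []
    current_tokens, current_type = [], None
    for token, tag in zip(tokens, tags):
        if tag.startswith("B-"):
            # A new entity starts; flush the one that was open, if any.
            if current_tokens:
                spans.append((" ".join(current_tokens), current_type))
            current_tokens, current_type = [token], tag[2:]
        elif tag.startswith("I-") and current_type == tag[2:]:
            # Continuation of the currently open entity.
            current_tokens.append(token)
        else:
            # "O" or an inconsistent I- tag closes any open entity;
            # the token itself is not added to a span.
            if current_tokens:
                spans.append((" ".join(current_tokens), current_type))
            current_tokens, current_type = [], None
    if current_tokens:
        spans.append((" ".join(current_tokens), current_type))
    return spans


tokens = ["Barack", "Obama", "visited", "Paris", "."]
tags = ["B-PER", "I-PER", "O", "B-LOC", "O"]
print(bio_to_spans(tokens, tags))
# [('Barack Obama', 'PER'), ('Paris', 'LOC')]
```

All of the methods below can be read as different ways of predicting such a tag sequence from the tokens.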

Vanilla machine learning methods

CRF

CRF is short for Conditional Random Fields.
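As a rough illustration of how a linear-chain CRF picks a tag sequence at prediction time, the sketch below runs Viterbi decoding over per-position tag scores and tag-to-tag transition scores. It assumes the scores are already given (in a trained CRF they would come from learned feature weights), omits start and stop transitions for brevity, and uses made-up numbers; the function name `viterbi` is our own.

```python
def viterbi(emissions, transitions):
    """Return the highest-scoring tag index sequence for a linear-chain model.

    emissions[t][j]   : score of tag j at position t
    transitions[i][j] : score of moving from tag i to tag j
    """
    if not emissions:
        return []
    n_tags = len(transitions)
    scores = list(emissions[0])  # best score of any path ending in each tag
    backpointers = []
    for t in range(1, len(emissions)):
        new_scores, pointers = [], []
        for j in range(n_tags):
            # Best previous tag for arriving at tag j.
            best_i = max(range(n_tags), key=lambda i: scores[i] + transitions[i][j])
            new_scores.append(scores[best_i] + transitions[best_i][j] + emissions[t][j])
            pointers.append(best_i)
        scores = new_scores
        backpointers.append(pointers)
    # Follow the back-pointers from the best final tag.
    best = max(range(n_tags), key=lambda j: scores[j])
    path = [best]
    for pointers in reversed(backpointers):
        best = pointers[best]
        path.append(best)
    path.reverse()
    return path


# Two tags (0 and 1); transitions reward staying in the same tag.
emissions = [[2.0, 1.0], [1.0, 1.5], [0.5, 2.0]]
transitions = [[0.5, -1.0], [-1.0, 0.5]]
print(viterbi(emissions, transitions))            # [1, 1, 1]
print(viterbi(emissions, [[0.0, 0.0], [0.0, 0.0]]))  # [0, 1, 1]
```

With the transition scores switched off, each position simply takes its own best tag; with them on, the decision at the first position changes because the sequence is scored as a whole.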

Toolkits

CRF++ is a simple, customizable, and open source implementation of Conditional Random Fields (CRFs) for segmenting/labeling sequential data.

CRFsuite: A fast implementation of Conditional Random Fields (CRFs).

python-crfsuite is a Python binding to CRFsuite.

sklearn-crfsuite is a thin CRFsuite (python-crfsuite) wrapper which provides an interface similar to scikit-learn.

CRF in TensorFlow: a linear-chain CRF layer.
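The CRFsuite-based Python toolkits above are usually fed one feature dictionary per token. The following is a minimal sketch of such a feature extractor in plain Python; the feature names (`word.lower`, `suffix3`, `BOS`, and so on) and the helper names `word2features` and `sent2features` are illustrative choices, not a fixed schema required by any of these libraries.

```python
def word2features(sentence, i):
    """Build a feature dict for the token at position i of a tokenized sentence."""
    word = sentence[i]
    features = {
        "bias": 1.0,
        "word.lower": word.lower(),
        "word.istitle": word.istitle(),
        "word.isupper": word.isupper(),
        "word.isdigit": word.isdigit(),
        "suffix3": word[-3:],
    }
    if i > 0:
        features["prev.lower"] = sentence[i - 1].lower()
    else:
        features["BOS"] = True  # beginning of sentence
    if i < len(sentence) - 1:
        features["next.lower"] = sentence[i + 1].lower()
    else:
        features["EOS"] = True  # end of sentence
    return features


def sent2features(sentence):
    """One feature dict per token, in order."""
    return [word2features(sentence, i) for i in range(len(sentence))]


# X is a list of sentences, each a list of per-token feature dicts.
X = [sent2features(["Barack", "Obama", "visited", "Paris"])]
print(X[0][0])
```

Assuming sklearn-crfsuite is installed, an `X` of this shape together with a parallel list of tag sequences `y` (for example `[["B-PER", "I-PER", "O", "B-LOC"]]`) is the kind of input that `sklearn_crfsuite.CRF().fit(X, y)` expects; check the toolkit's own documentation for the available training options.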

Papers

Lafferty, J., McCallum, A., & Pereira, F. (2001). Conditional random fields: Probabilistic models for segmenting and labeling sequence data. ACM.

Sutton, C., & McCallum, A. (2012). An Introduction to Conditional Random Fields. Now Pub.

Sha, F., & Pereira, F. (2003). Shallow parsing with conditional random fields. ACLWeb.

Other readings

Tutorial of using sklearn-crfsuite for NER task

Learning2Search

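Learning to Search (L2S) is a family of structured prediction methods that treat sequence labeling as a series of search decisions. An implementation ships with Vowpal Wabbit.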

Toolkits

Vowpal Wabbit on Github

Papers

Chang, K.-W., He, H., Daumé, H., III, & Langford, J. (2015, March 19). Learning to Search for Dependencies. arXiv.org

Other readings

Named Entity Classification by Themis Mavridis from Booking.com

Deep learning methods

LSTM

Toolkits

NeuroNER is a program that performs named-entity recognition (NER).

Papers

Dernoncourt, F., Lee, J. Y., & Szolovits, P. (2017, May 16). NeuroNER: an easy-to-use program for named-entity recognition based on neural networks. arXiv.org

Other readings

LSTM with CRF

Toolkits

Sequence Tagging with Tensorflow

Papers

Ma, X., & Hovy, E. (2016, March 4). End-to-end Sequence Labeling via Bi-directional LSTM-CNNs-CRF. arXiv.org.

Other readings

Sequence Tagging with Tensorflow