---
title: Natural Language Processing
localeTitle: 自然语言处理
---
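## Natural Language Processing (NLP)

As Wikipedia puts it, "Natural language processing (NLP) is a subfield of computer science, information engineering, and artificial intelligence concerned with the interactions between computers and human (natural) languages, in particular how to program computers to process and analyze large amounts of natural language data." Put simply, it is the process by which natural language produced by humans is made understandable to computers.
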
### Challenges in NLP

#### 1. Easy or mostly solved

*   Spam detection
*   Part-of-speech tagging
*   Named entity recognition

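
To give a feel for why spam detection is usually counted as mostly solved, here is a minimal sketch of a Naive Bayes classifier, a common baseline for this task. The training sentences, the `TRAIN` data, and the function names are made up for illustration; a real filter would train on a much larger labeled corpus.

```python
import math
import re
from collections import Counter

# Tiny made-up training set, for illustration only.
TRAIN = [
    ("win money now", "spam"),
    ("cheap money offer", "spam"),
    ("meeting at noon", "ham"),
    ("lunch meeting tomorrow", "ham"),
]

def tokenize(text):
    """Lowercase the text and keep runs of letters and apostrophes."""
    return re.findall(r"[a-z']+", text.lower())

def train(examples):
    """Count words per label and documents per label."""
    word_counts = {"spam": Counter(), "ham": Counter()}
    doc_counts = Counter()
    for text, label in examples:
        doc_counts[label] += 1
        word_counts[label].update(tokenize(text))
    vocab = set(word_counts["spam"]) | set(word_counts["ham"])
    return word_counts, doc_counts, vocab

def classify(text, model):
    """Return the label with the highest log-probability score."""
    word_counts, doc_counts, vocab = model
    total_docs = sum(doc_counts.values())
    scores = {}
    for label in word_counts:
        # Log prior: share of training documents carrying this label.
        score = math.log(doc_counts[label] / total_docs)
        total_words = sum(word_counts[label].values())
        for word in tokenize(text):
            # Laplace (add-one) smoothing so unseen words do not zero out the score.
            score += math.log(
                (word_counts[label][word] + 1) / (total_words + len(vocab))
            )
        scores[label] = score
    return max(scores, key=scores.get)

model = train(TRAIN)
print(classify("win cheap money", model))
print(classify("lunch meeting", model))
```

With this toy data the first message scores higher as spam and the second as ham, because their words appear only in the corresponding training examples.
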
#### 2. Intermediate or making good progress

*   Sentiment analysis
*   Coreference resolution
*   Word sense disambiguation
*   Parsing
*   Machine translation
*   Information extraction

#### 3. Hard or still needing a lot of work

*   Text summarization
*   Machine dialog systems

### Common techniques

*   Structure extraction
*   Identifying and marking sentence, phrase, and paragraph boundaries
*   Language identification
*   Tokenization
*   Acronym normalization and tagging
*   Lemmatization / stemming
*   Entity extraction
*   Phrase extraction

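
Three of the techniques above (sentence boundary detection, tokenization, and stemming) can be sketched with nothing but Python's standard library. This is a deliberately simplified illustration: the sample text, the function names, and the three-suffix stemming rule are assumptions made for the example, and real stemmers such as the Porter stemmer apply many more rules.

```python
import re

# Illustrative sample text; not taken from any corpus.
TEXT = "NLP is fun. Computers are learning to read! Are they reading yet?"

def split_sentences(text):
    """Naive boundary detection: split after ., ! or ? followed by whitespace."""
    return [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]

def tokenize(sentence):
    """Lowercase the sentence and keep runs of letters, digits, and apostrophes."""
    return re.findall(r"[a-z0-9']+", sentence.lower())

def naive_stem(token):
    """Toy suffix stripping; only a stand-in for a real stemming algorithm."""
    for suffix in ("ing", "ed", "s"):
        # Strip the suffix only if at least three characters of stem remain.
        if token.endswith(suffix) and len(token) - len(suffix) >= 3:
            return token[: -len(suffix)]
    return token

sentences = split_sentences(TEXT)
tokens = [tokenize(s) for s in sentences]
stems = [[naive_stem(t) for t in sent] for sent in tokens]

for sent, stemmed in zip(sentences, stems):
    print(sent, "->", stemmed)
```

The regular-expression splitter will mis-handle abbreviations such as "Dr." or "e.g.", which is one reason the libraries listed in the next section ship trained sentence segmenters instead.
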
### Commonly used libraries

*   NLTK, the most widely mentioned NLP library for Python.
*   spaCy, an industrial-strength NLP library built for performance.
*   Gensim, a library for document similarity analysis.
*   TextBlob, a user-friendly, intuitive interface built on NLTK.
*   CoreNLP, from the Stanford NLP Group.
*   Polyglot, a natural language pipeline that supports massively multilingual applications.

#### More information:

Further reading:

*   Click [here](https://medium.com/@gon.esbuyo/get-started-with-nlp-part-i-d67ca26cc828) for an introductory article on NLP.
*   Click [here](https://en.wikipedia.org/wiki/Natural_language_processing) for the Wikipedia reference.