Deep Learning for Persian Natural Language Processing | Data and Signal Processing Laboratory

With the growth of unstructured text data on the Internet, mainly the result of human interaction on Web 2.0 platforms and social networks, automatic methods for processing this data and extracting knowledge from it have become indispensable. Despite its unstructured format, this data contains valuable knowledge that can be extracted using knowledge discovery and machine learning techniques. There has been great progress in natural language processing tasks such as

Sentiment analysis,
Opinion mining,
Topic identification,
Automatic machine translation,
Named entity recognition,
Part of speech tagging,
Parsing,
Information extraction,
Question answering,
Paraphrase detection,
etc.

In most NLP tasks, we first choose an algorithm and then convert the data into a form that can be fed into it. This step is called feature engineering, and it is very time consuming. In text data, words are usually taken as the features. This approach has two shortcomings: first, word order may be lost; second, the feature vectors are sparse, which affects training time.
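Both shortcomings can be seen in a small bag-of-words sketch. The example below is illustrative only: the English sentences, the `bag_of_words` helper, and the 1000-word vocabulary padded with placeholder entries are assumptions made for the demonstration, not part of the project.

```python
from collections import Counter


def bag_of_words(sentence, vocabulary):
    """Return a word-count vector over a fixed vocabulary (hypothetical helper)."""
    counts = Counter(sentence.lower().split())
    return [counts[word] for word in vocabulary]


# Two sentences with the same words but opposite meanings.
sentences = ["the dog bit the man", "the man bit the dog"]

# Vocabulary built from the sentences, padded with placeholder words
# (an assumption) to mimic the size of a realistic vocabulary.
vocabulary = sorted({w for s in sentences for w in s.split()})
vocabulary += [f"word{i}" for i in range(1000 - len(vocabulary))]

vec_a = bag_of_words(sentences[0], vocabulary)
vec_b = bag_of_words(sentences[1], vocabulary)

# Shortcoming 1: word order is lost, so both sentences get the same vector.
order_lost = vec_a == vec_b

# Shortcoming 2: almost every entry of the vector is zero.
sparsity = vec_a.count(0) / len(vec_a)

print("identical vectors:", order_lost)
print("fraction of zero entries:", sparsity)
```

With a real vocabulary of tens of thousands of words, the fraction of zero entries for a single sentence is higher still.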

The aim of this project is to find a way to extract features automatically from Persian text. We use deep learning to address this problem. A neural network with more than one hidden layer is called a deep network. In this approach, each word is described by a numerical vector, called a distributed representation or word vector, which carries semantic and syntactic information about the word. Sentences are sequences of words, so if words can be described by such vectors, sentences can be too, by combining the vectors of their words. The ways of combining them range from simple mathematical operations such as vector addition or multiplication to recurrent neural networks, recursive auto-encoders, and so on. These representations can improve results on most natural language processing problems, such as POS tagging, NER, topic identification, machine translation, and automatic text summarization.
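The simplest of these combinations, averaging word vectors (addition scaled by sentence length), can be sketched as follows. This is a minimal illustration, not the project's model: the three-dimensional vectors are invented values, and `sentence_vector` and `cosine` are hypothetical helper names. In practice the vectors would be learned from a large corpus.

```python
import math

# Toy 3-dimensional word vectors; the values are invented for illustration.
word_vectors = {
    "good": [0.9, 0.1, 0.3],
    "great": [0.8, 0.2, 0.4],
    "bad": [-0.9, 0.1, 0.3],
    "movie": [0.1, 0.9, 0.5],
}


def sentence_vector(sentence, vectors):
    """Average the vectors of the known words; unknown words are skipped."""
    known = [vectors[w] for w in sentence.lower().split() if w in vectors]
    if not known:
        raise ValueError("sentence contains no known words")
    dim = len(known[0])
    return [sum(v[i] for v in known) / len(known) for i in range(dim)]


def cosine(u, v):
    """Cosine similarity between two dense vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


good = sentence_vector("good movie", word_vectors)
great = sentence_vector("great movie", word_vectors)
bad = sentence_vector("bad movie", word_vectors)

# Sentences with similar meaning end up closer than sentences with opposite meaning.
sim_similar = cosine(good, great)
sim_opposite = cosine(good, bad)
print(sim_similar > sim_opposite)
```

Averaging keeps the vectors dense and low-dimensional, but like bag-of-words it ignores word order. Order-sensitive models such as the recurrent neural networks mentioned above address that.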

Other Projects

Persian Speech Recognition using Deep Learning

Persian Medical Question Answering System

Speaker Recognition

Data Analysis
