+98 21 8609-3065 h.veisi@ut.ac.ir
Persian Medical Question Answering System

In this project, a Persian Question Answering (QA) system was built to ease access to information resources for doctors, health providers, and users. To this end, a set of Persian documents about drugs and diseases was collected. Because processing structured documents improves the performance of a QA system, all documents were converted into semi-structured form.

The developed system consists of three main units:

  1. question processing
  2. document retrieval
  3. answer extraction

The question processing unit, the first and most important module of the QA system, consists of four components that extract keywords/queries in sequence: Question Classifier, N-gram Tokenizer, Patterns Matching, and Advanced Tokenizer. These components use a dictionary of drug/disease names and a dictionary of keywords/queries. This process is shown in the following figure. If one component fails to extract keywords from the question, another component, chosen according to the condition of the question, performs the extraction instead.

In this architecture, the question asked by the user is normalized, and then the drug or disease name that the question refers to is extracted using the Named Entity (NE) Dictionary. If such a name is found, the question is sent to the Question Classifier component, which extracts the phrases that indicate the meaning of the question. Finally, using the concepts in the dictionary, these phrases are mapped to dictionary keywords. If the Question Classifier fails to extract any keywords, the question is sent to the N-gram module, which extracts keywords using the keyword dictionary.
If the question matches a pattern, the keywords are extracted with the Advanced Tokenizer. If none of the components can understand the question, the Advanced Tokenizer, which includes a list of specific stop words, tokenizes the question and extracts the keywords. When this process finishes successfully, the extracted keywords are passed to the next module, the Document Retrieval module.
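The fallback behaviour described above can be illustrated with a minimal Python sketch. It is not the system's actual implementation: the dictionaries, patterns, and stop words are toy English stand-ins for the Persian resources, and every function name is hypothetical. The sketch also makes two assumptions the description leaves open: the Question Classifier is skipped when no drug or disease name is found, and the Patterns Matching stage emits a keyword whenever one of its patterns matches.

```python
import re

# Toy English stand-ins for the Persian dictionaries (all entries hypothetical).
NE_DICTIONARY = {"aspirin", "diabetes"}
CONCEPT_PHRASES = {"side effects": "side_effects", "how much": "dosage"}
KEYWORD_DICTIONARY = {"dose": "dosage", "symptoms": "symptoms"}
PATTERNS = [(re.compile(r"\bsafe (?:for|during) (\w+)"), "contraindications")]
STOP_WORDS = {"what", "is", "are", "the", "of", "a", "i", "about", "tell", "me"}


def normalize(question):
    """Lowercase, replace punctuation with spaces, and collapse whitespace."""
    return " ".join(re.sub(r"[^\w\s]", " ", question.lower()).split())


def find_entity(tokens):
    """Return the first token found in the NE dictionary, or None."""
    return next((t for t in tokens if t in NE_DICTIONARY), None)


def question_classifier(text, tokens):
    """Map known meaning-bearing phrases to dictionary keywords."""
    padded = f" {text} "
    return [kw for phrase, kw in CONCEPT_PHRASES.items() if f" {phrase} " in padded]


def ngram_tokenizer(text, tokens, max_n=3):
    """Look up every 1- to max_n-gram of the question in the keyword dictionary."""
    found = []
    for n in range(1, max_n + 1):
        for i in range(len(tokens) - n + 1):
            gram = " ".join(tokens[i:i + n])
            if gram in KEYWORD_DICTIONARY:
                found.append(KEYWORD_DICTIONARY[gram])
    return found


def pattern_matcher(text, tokens):
    """Emit the keyword attached to each pattern that matches the question."""
    return [kw for pattern, kw in PATTERNS if pattern.search(text)]


def advanced_tokenizer(text, tokens):
    """Last resort: keep every token that is neither a stop word nor the entity."""
    return [t for t in tokens if t not in STOP_WORDS and t not in NE_DICTIONARY]


def process_question(question):
    """Try each component in order; return (entity, keywords, component name)."""
    text = normalize(question)
    tokens = text.split()
    entity = find_entity(tokens)
    stages = [ngram_tokenizer, pattern_matcher, advanced_tokenizer]
    if entity is not None:
        # Assumption: the classifier only runs when a name was extracted.
        stages.insert(0, question_classifier)
    for stage in stages:
        keywords = stage(text, tokens)
        if keywords:
            return entity, keywords, stage.__name__
    return entity, [], None
```

Calling `process_question("Tell me about aspirin interactions")` falls through the first three stages in this toy setup and is handled by `advanced_tokenizer`, while a question containing "side effects" is resolved by `question_classifier` directly.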
In the Answer Extraction module, the appropriate answer is selected from the document retrieved in the previous step. Because the document has been converted to a structured form, answers can be extracted more accurately.
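To see why a structured document makes extraction easier, consider the following hedged Python sketch. It assumes, purely for illustration, that each semi-structured document is stored as a record keyed by drug or disease name, with one field per section. The field names, the placeholder texts, and the functions `retrieve_document` and `extract_answer` are hypothetical and do not come from the actual system.

```python
# Hypothetical semi-structured store: one record per drug or disease,
# one field per section. The texts are placeholders, not medical content.
DOCUMENTS = {
    "aspirin": {
        "dosage": "Placeholder dosage section for aspirin.",
        "side_effects": "Placeholder side-effects section for aspirin.",
    },
}


def retrieve_document(entity):
    """Document retrieval: fetch the record for the extracted name, if any."""
    return DOCUMENTS.get(entity)


def extract_answer(document, keywords):
    """Answer extraction: return the sections whose field matches a keyword."""
    if document is None:
        return []
    return [document[kw] for kw in keywords if kw in document]


# Usage: keywords produced by question processing select the sections directly.
answers = extract_answer(retrieve_document("aspirin"), ["side_effects"])
```

With this layout, answer extraction reduces to a field lookup, whereas an unstructured document would require searching free text for the relevant passage.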

