An Intelligent Q&A System
The goal of this group project was to improve the performance of a question-answering (QA) system built with the VH Toolkit. In the first part of the course, the VH Toolkit had been used to create a QA system, which performed very poorly. We first decided to focus on a closed-domain system, to narrow down the types of questions that could be asked. The topic chosen was the Eiffel Tower, since the landmark is known to many people. Three different QA systems were compared. To gather input, several people were asked to write down 15 questions they would like to ask a tour guide about the Eiffel Tower; the answers to these questions were found online. The first system was a basic QA system created by adding the questions, their answers and some variations to the VH Toolkit.
The second and third systems were both self-made. For these systems, the database consisted of a tree structure of main topics, subtopics and a corresponding answer. Users could ask questions by typing them into a GUI. The basic self-made QA system applied normalisation techniques to the input string, such as removing stray symbols and punctuation, converting all words to lower case, removing abbreviations and converting numbers to words. It also included a spellchecker, which used the Levenshtein distance and an English dictionary. The advanced self-made QA system added lexical, syntactic and semantic analysis. First, a part-of-speech (POS) tagger tagged every word. Next, less important words, such as “the”, were removed, leaving only the important words. “W-words”, such as “why” and “where”, were mapped directly to the database keywords “reason” and “place”. A similarity matcher, using WordNet, matched the remaining words to the keywords in the database. For example, “high” was matched to “size”. Finally, the answering unit took the keywords of the question as input and returned the answer if those keywords were found in the database.
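The pipeline described above can be illustrated with a small sketch. This is not the project's code (which was written in C++ with openFrameworks); it is a minimal Python approximation under several assumptions. All names (DICTIONARY, W_WORDS, SYNONYMS, STOP_WORDS, DATABASE and the functions) are hypothetical, the answers are placeholders, the hand-written SYNONYMS table stands in for the WordNet similarity matcher, POS tagging and abbreviation handling are left out, and only single digits are converted to words.

```python
import re

# Hypothetical stand-ins: the project used a full English dictionary and WordNet.
DICTIONARY = {"how", "high", "is", "the", "eiffel", "tower", "where", "why", "was", "it"}
STOP_WORDS = {"how", "is", "the", "was", "it"}
W_WORDS = {"why": "reason", "where": "place"}   # mapping taken from the text
SYNONYMS = {"high": "size"}                     # replaces the WordNet matcher
DIGITS = dict(zip("0123456789",
                  "zero one two three four five six seven eight nine".split()))

# Tree structure: main topic -> subtopic -> answer (placeholder answers).
DATABASE = {
    "tower": {
        "size": "<answer about the size of the tower>",
        "place": "<answer about the location of the tower>",
    }
}

def normalise(text):
    """Lower-case, strip symbols and punctuation, spell out single digits."""
    text = re.sub(r"[^a-z0-9\s]", " ", text.lower())
    return [DIGITS.get(token, token) for token in text.split()]

def levenshtein(a, b):
    """Edit distance between two strings (row-by-row dynamic programming)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(prev[j] + 1,                # deletion
                            curr[j - 1] + 1,            # insertion
                            prev[j - 1] + (ca != cb)))  # substitution
        prev = curr
    return prev[-1]

def correct_word(word, max_distance=1):
    """Replace a word by its closest dictionary entry, if it is close enough."""
    if word in DICTIONARY:
        return word
    best = min(sorted(DICTIONARY), key=lambda entry: levenshtein(word, entry))
    return best if levenshtein(word, best) <= max_distance else word

def extract_keywords(tokens):
    """Map W-words and synonyms to database keywords and drop stop words."""
    keywords = []
    for token in map(correct_word, tokens):
        if token in W_WORDS:
            keywords.append(W_WORDS[token])
        elif token in SYNONYMS:
            keywords.append(SYNONYMS[token])
        elif token not in STOP_WORDS:
            keywords.append(token)
    return keywords

def answer(question):
    """Return the answer whose main topic and subtopic both occur in the question."""
    keywords = extract_keywords(normalise(question))
    for main_topic, subtopics in DATABASE.items():
        if main_topic in keywords:
            for subtopic, text in subtopics.items():
                if subtopic in keywords:
                    return text
    return None  # no matching entry in the database

if __name__ == "__main__":
    print(answer("Where is the Eifel Tower?"))
    print(answer("How hgh is the towr??"))
```

In this sketch the misspelled “Eifel”, “hgh” and “towr” are each one edit away from a dictionary word, so the spellchecker repairs them before the keyword lookup. The small max_distance threshold is a design choice of the sketch to avoid rewriting unknown words into unrelated dictionary entries.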
The program was written in C++ with openFrameworks. The WordNet functionality only ran in Python, so the system combined Python with C++. Evaluation of the three systems showed that both self-made systems outperformed the VH Toolkit system. Normalisation in particular improved performance, and similarity matching also contributed.
Tools used
C++ with openFrameworks
Python with WordNet