My current research interests fall into four strands, described below. The common thread connecting them is natural language generation (NLG), and in particular building fieldable and/or data-driven NLG systems.
Following on from my PhD, I have an active interest in using corpus analysis to build knowledge sources that can underwrite semantic selectional constraints. I am currently investigating whether fine-grained lexical choice in generation benefits from two resources: a word-distance metric for shared content (called the Russian Doll metric), and a collocational semantic lexicon which describes the domain and range of lexical verbs and the domain of adjectives in English.
Natural Language Interfaces to Databases
This thread (with Catalina Hallett) builds on previous research in the NLG Group at the Open University into WYSIWYM-based NLIDB systems. In previous work the group constructed and evaluated an NLIDB system in the medical domain that was effective and easy to use. However, developing the semantic and linguistic resources required by the system was time-consuming and expensive. We are currently exploring how to automate the construction of these resources for an NLG query engine, given an ontology and a relational database, so that we can build portable NLIDB systems.
Conceptually Aligned Generation
I am working with Richard Power on the construction of a Java-based generation system for controlled natural language texts such as manuals, information leaflets and technical documentation. The system takes semantic input encoded using standard description logic formalisms and produces a connected data structure linking semantic graphs with a syntactic tree representing the text. Our current prototype supports generation in multiple languages with full semantic annotation, so that the generated texts are both machine- and human-readable and can include a range of visualisation controls to make them interactive.
Classification and Rewriting
I am also pursuing research with Catalina Hallett into text classification and rewriting. The domain of interest is medical texts and in particular patient narratives.