Research
I am engaged in projects on various aspects of discourse processing, covering both fundamental and applied research (the latter in collaboration with companies). Below is a list of topics with pointers to publications.
Text understanding and summarization: Can we build text-analyzing systems that are robust [lit] [lit] and also perform "deep", knowledge-based analysis [lit]? One domain for our investigations is information extraction from medical reports [lit], but we are also addressing a variety of other domains in the context of automatic text summarization [lit].
Discourse structure: A prerequisite for any intelligent text understanding is an account of the various aspects of discourse structure. My research has drawn inspiration from Rhetorical Structure Theory (Bill Mann, Sandra Thompson), and I have looked into a variety of individual coherence relations such as Concession [lit], Contrast [lit] or (non-standard!) Substitution [lit]. In recent years, though, I have become sceptical of the structural theses of RST: see [lit] for my criticism of the notion of 'nuclearity', and [lit] for a proposal of a less constrained scheme of multi-level discourse annotation. Coherence relations are often signalled by explicit connectives, for which we developed a computational lexicon [lit]. Further, I am interested in individual connectives such as German 'bei' [lit], and similarly in other discourse particles [lit] [lit]. A distinct level of discourse structure that interests me is argumentation [lit] [lit] in texts that convey opinion and attitude.
Corpora, annotation, linguistic databases: As an empirical basis for discourse analyses, we are building the Potsdam Commentary Corpus [lit]: German newspaper commentaries annotated on a variety of levels, including, for example, discourse connectives [lit]. The engine for viewing this corpus and much other data from SFB 632 is the Annis Linguistic Database [lit], which is geared especially towards multi-level annotation.
Dialog systems: We are developing prototypes of voice dialog systems based on VoiceXML [lit], and also a system for typed dialog: InfoBot [lit] provides information in the tourism domain. This links back to my earlier work on dialog act identification [lit] in the Verbmobil project (see below).
Text generation: In my dissertation, I developed an architecture for knowledge-based, multilingual text generation [lit]. This work was attached to the 'Techdoc' project at FAW Ulm, one of the early projects to use RST for generation [lit] [lit]. My approach emphasized the role of a flexible link between lexicon and knowledge base for generating paraphrases of the same message in more than one language [lit] [lit]. For complex sentences, the lexicon is extended to cover discourse connectives [lit]. Recently, I have concentrated on sentence planning [lit], embedded in our current generation environment 'Polibox' [lit]. Some time ago, we also did an experiment in knowledge-based generation from graphical input [lit].
Lexical semantics: As mentioned, one focus of my generation work was on lexical issues; in particular, I developed a framework for generating several verb alternations [lit]. Two other topics were the borderline between denotation and connotation (or 'lexical style') [lit] [lit], and the lexical relation of hyponymy [lit].
Machine translation: From 1995 to 2000 I worked in the TU Berlin group of the German Verbmobil project, focusing on knowledge-based disambiguation and identification of dialog acts [lit], and on representing and tracking temporal expressions [lit]. Never mind the idea of introducing humour into MT [lit]! At present, however, machine translation is not an active research area of mine.