Linguistic Modeling
I have worked for many years on theoretical
and practical issues of broad-coverage grammar writing.
This work has been situated in the context of the Parallel
Grammar Development project ParGram, based on the Lexical-Functional
Grammar (LFG) formalism. I have been involved
in the motivation
and development of various formal devices for linguistic
modeling such as parametrized categories (Kuhn, 1998a),
Optimality-Theory-style constraint ranking (Kuhn and Rohrer,
1997; Frank et al., 1998, 2001; King et al., 2000, 2004), and a
feature declaration mechanism (Butt et al., 2003). The guiding
principle of the ParGram project is to exploit linguistic insights
into cross-linguistic generalizations in order to write "parallel"
broad-coverage grammars for multiple languages (Butt et al.,
2003; King et al., forthcoming). Parallelism has numerous advantages,
such as easy porting of applications from one
grammar to the others, applicability in machine translation and the
analysis of parallel corpora, and reduced development
effort for new grammars or subgrammars. Most of these advantages
carry over to the PTOLEMAIOS approach, in which the
grammars induced for different languages produce very similar
representations.
Besides linguistic modeling work in the closer context of the
ParGram project, I have made theoretical contributions in syntax
and semantics, applying various formalisms such as LFG,
HPSG, DRT, and Glue Language Semantics (Kuhn, 1994;
Kuhn and Heid, 1994; Kuhn, 1996a,b,c,d; Dogil et al., 1997;
Berman et al., 1998; Kuhn, 1999a, 2001d, in preparation; Denis
et al., 2003).
Formal properties of grammar formalisms
My second
line of research has addressed grammar formalisms, originally
growing out of ParGram-related work (Kuhn, 1999b, 2001c),
but ultimately representing a separate focus, in particular in
the work on Optimality-Theoretic (OT) Syntax. My OT work
(Kuhn, 2000a,c, 2002c, 2003b, and in particular my dissertation, Kuhn
2001b, and the CSLI Publications book, Kuhn 2003c) develops the framework
originally proposed by Joan Bresnan,
which builds on candidate representations from LFG (OT-LFG).
The central questions I have addressed in OT concern the
formalization of the candidate generation function and of the violable
constraints, the "direction" of optimization (i.e., whether
we compare alternative realizations of the same meaning or alternative
analyses of the same string), and the decidability of the
question of whether a given candidate is optimal according to an
OT grammar. The PTOLEMAIOS project builds
directly on a number of insights from the OT formalization
work.
Computational processing and tool building
I have developed
prototype systems for processing tasks related to grammar
formalisms and infrastructure tools in the context of linguistic
engineering. For example, Kuhn (1998b) discusses tools
for testing a grammar against an annotated testsuite; in (Zinsmeister
et al., 2002), a system converting LFG representations
into a dependency treebank format is discussed; Kuhn
(2000b, 2001a) presents a chart-based algorithm for processing
OT-syntactic grammars; Kuhn (2003a) addresses a finite-state
approximation of a feature-based morphological grammar
for word formation in German; in (Kuhn and Mateo-Toledo,
2004), various experiments with NLP tools applied in corpus
construction for the endangered Mayan language Q'anjob'al
are discussed; Palmer et al. (2004) report on experiments using
various linguistic resources and NLP tools for the implementation
of Carlota Smith's discourse-semantic theory.
Corpus-based learning
My fourth and last line of research
has attempted to exploit text corpora in order to acquire linguistic
knowledge, using a variety of techniques and typically
exploiting higher-level linguistic representations or background
knowledge. In (Kuhn et al., 1998), we focus specifically on
acquiring lexical subcategorization information for verbs, using
an existing large-coverage grammar for hypothesis testing.
In (Riezler et al., 2000), we trained a log-linear (or Maximum
Entropy) model for disambiguating the output of the large-scale German
LFG grammar from the
ParGram project. Kuhn and Mateo-Toledo (2004) include experiments
in training a Maximum Entropy part-of-speech tagger
for Q'anjob'al, for which only very limited resources exist;
the Maximum Entropy approach is particularly well suited to combining
many different features in learning, so that the small set of
training data can be used as effectively as possible. I also use
a Maximum Entropy model in ongoing work on learning various
linguistic models, such as a coreference resolution model
for anaphoric expressions exploiting a deep syntactic grammar
and insights from Segmented Discourse Representation Theory
(joint research with Nicholas Asher; Asher et al., 2004).
Corpus-based learning based on an OT grammar architecture
was also one of the topics I addressed in my postdoctoral research
project "Optimization Inside and Outside Grammar: a Formal
Linguistic Approach to Corpus-based Learning" at Stanford University
in 2001/02 (compare Kuhn, 2002a,b).