The 8th Linguistic Annotation Workshop (LAW VIII 2014)

ACL SIGANN Workshop, held in conjunction with Coling 2014

August 23-24, 2014 in Dublin, Ireland

Workshop overview

Linguistic annotation of natural language corpora is the backbone of supervised methods for statistical natural language processing. It also provides valuable data for evaluation of both rule-based and supervised systems and can help formalize and study linguistic phenomena.

The LAW provides a forum for presentation and discussion of innovative research on all aspects of linguistic annotation, including creation/evaluation of annotation schemes, methods for automatic and manual annotation, use and evaluation of annotation software and frameworks, representation of linguistic data and annotations, etc.

Workshop Theme

This year, we particularly welcome contributions that address the workshop theme: The good, the bad, and the perfect: How good does annotation need to be?

It has been said that the perfect is the enemy of the good. This may be true for some machine learning applications, where a small amount of rough annotation yields good results, but the same argument can also be used to justify low-quality annotation, or to give machine learning higher priority or more funding than human annotation.

We solicit evidence for and against "The perfect is the enemy of the good".

In favor of high quality annotation, Manning (2011) suggests that the largest opportunity for improvement in part-of-speech tagging lies in improving the tag set and the accuracy of annotation. (But he also suggests that perfect annotation of words into discrete lexical categories is not possible because some words do not fall cleanly into one category.)

Reidsma and Carletta (2008) advocate caution in deciding how good annotation needs to be. They show that low agreement among annotators may not be harmful to machine learning as long as the disagreements are random, whereas disagreements that follow patterns can lead machine learning astray even when agreement among annotators is high.
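For readers less familiar with agreement measures: raw observed agreement can be inflated by chance, so chance-corrected coefficients such as Cohen's kappa are the usual starting point (though, as Reidsma and Carletta argue, no single number tells the whole story). A minimal sketch in plain Python, using hypothetical part-of-speech labels, not any particular study's data:

```python
from collections import Counter

def cohens_kappa(ann1, ann2):
    """Chance-corrected agreement between two annotators labeling the same items."""
    assert len(ann1) == len(ann2)
    n = len(ann1)
    # Observed agreement: fraction of items given identical labels.
    p_o = sum(a == b for a, b in zip(ann1, ann2)) / n
    # Expected chance agreement, from each annotator's marginal label distribution.
    c1, c2 = Counter(ann1), Counter(ann2)
    p_e = sum(c1[label] * c2[label] for label in c1) / (n * n)
    return (p_o - p_e) / (1 - p_e)

# Hypothetical labels from two annotators over six tokens.
a = ["N", "V", "N", "ADJ", "N", "V"]
b = ["N", "V", "N", "N",   "N", "ADJ"]
print(round(cohens_kappa(a, b), 3))  # → 0.429
```

Here observed agreement is 4/6 ≈ 0.67, but kappa is only about 0.43 once chance agreement on the frequent "N" label is factored out, which illustrates why the coefficient is preferred over raw agreement.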

In a related vein, Min and Grishman (2012) show that, for machine learning, a large amount of less accurate single-pass annotation can be more cost-effective than a smaller amount of more accurate, adjudicated annotation.

Finally, one recent trend focuses on coarse-grained annotation schemes (McDonald et al., 2013; Petrov et al., 2012; Schneider et al., 2013) to speed up annotation and/or benefit cross-lingual training. Coarse-grained annotation schemes are attractive because they are easy to learn, but are they suitable for all applications?

Manning, Christopher D. (2011) "Part-of-Speech Tagging from 97% to 100%: Is It Time for Some Linguistics?" In Alexander Gelbukh (ed.), Computational Linguistics and Intelligent Text Processing, 12th International Conference, CICLing 2011, Proceedings, Part I. Lecture Notes in Computer Science 6608, Springer.

McDonald, Ryan, Joakim Nivre, Yoav Goldberg, Yvonne Quirmbach-Brundage, Dipanjan Das, Kuzman Ganchev, Keith Hall, Slav Petrov, Hao Zhang, Oscar Täckström, Claudia Bedini, Núria Bertomeu Castelló, and Jungmee Lee (2013) "Universal Dependency Annotation for Multilingual Parsing." Proceedings of the Conference of the Association for Computational Linguistics (ACL).

Min, Bonan and Ralph Grishman (2012) "Compensating for Annotation Errors in Training a Relation Extractor." Proceedings of the Conference of the European Chapter of the ACL (EACL).

Petrov, Slav, Dipanjan Das, Ryan McDonald (2012) "A Universal Part-of-Speech Tagset." Proceedings of the 8th International Conference on Language Resources and Evaluation (LREC '12).

Reidsma, Dennis and Jean Carletta (2008) "Reliability Measurement without Limits." Computational Linguistics 34(3): 319-326.

Schneider, Nathan, Brendan O'Connor, Naomi Saphra, David Bamman, Manaal Faruqui, Noah A. Smith, Chris Dyer, and Jason Baldridge (2013) "A framework for (under)specifying dependency syntax without overloading annotators." Proceedings of the 7th Linguistic Annotation Workshop (LAW) and Interoperability with Discourse.

Call for Papers

We welcome submissions of long (8 pages) and short (4 pages) papers, posters, and demonstrations, relating to any aspect of linguistic annotation, including:

(a) Annotation procedures:

  • Innovative automated and manual strategies for annotation
  • Machine learning and knowledge-based methods for automation of corpus annotation
  • Creation, maintenance, and interactive exploration of annotation structures and annotated data

(b) Annotation evaluation:

  • Inter-annotator agreement and other evaluation metrics and strategies
  • Qualitative evaluation of linguistic representation

(c) Annotation access and use:

  • Representation formats/structures for merged annotations of different phenomena, and means to explore/manipulate them
  • Linguistic considerations for merging annotations of distinct phenomena

(d) Annotation guidelines and standards:

  • Best practices for annotation procedures and/or development and documentation of annotation schemes
  • Interoperability of annotation formats and/or frameworks across different systems, tasks, modalities, and languages

(e) Annotation software and frameworks:

  • Development, evaluation and/or innovative use of annotation software frameworks

(f) Annotation schemes:

  • New and innovative annotation schemes
  • Comparison of annotation schemes

Submission Information

Papers should report original, unpublished research on topics of interest to the workshop. Accepted papers are expected to be presented at the workshop and will be published in the workshop proceedings. They should emphasize completed work rather than intended work, and should clearly indicate the state of completion of the reported results.

A paper accepted for presentation at the workshop must not be presented or have been presented at any other meeting with publicly available proceedings.

Submission Format: Submissions must be in PDF format and must be consistent with the Coling 2014 style files.

Maximum Length: The maximum length is eight (8) pages of content for long papers or four (4) pages of content for short papers, posters, and demonstrations, plus up to two (2) pages of references.

Anonymous Review: Reviewing of papers will be double-blind. Therefore, papers must not include the authors' names and affiliations, and self-references that reveal the authors' identity, e.g., "We previously showed (Smith, 1991) ...", should be replaced with citations such as "Smith (1991) previously showed ...". Papers that do not conform to these requirements will be rejected without review.

Double Submitting: Authors of papers that have been or will be submitted to other meetings or publications must provide this information on the START online submission page. Authors of accepted papers must notify the program chairs within 10 days of acceptance if the paper is withdrawn for any reason.

Submission site:

Submission deadline: May 16, 2014, 23:59 GMT. Papers submitted after the deadline will not be reviewed.

Important Dates

May 16, 2014 Submission deadline (general or special theme papers)
June 13, 2014 Notification of acceptance
July 4, 2014 Camera-ready paper due
August 23-24, 2014 Workshop in Dublin, Ireland


Workshop Chairs

Lori Levin (Carnegie Mellon University)
Manfred Stede (University of Potsdam)

Organizing Committee

Stefanie Dipper (Ruhr University Bochum)
Chu-Ren Huang (The Hong Kong Polytechnic University)
Nancy Ide (Vassar College)
Adam Meyers (New York University)
Antonio Pareja-Lora (SIC & ILSA, UCM / ATLAS, UNED)
Massimo Poesio (University of Trento)
Sameer Pradhan (Harvard University)
Katrin Tomanek (University of Jena)
Fei Xia (University of Washington)
Nianwen Xue (Brandeis University)

Programme Committee

Collin Baker (UC Berkeley)
Archna Bhatia (Carnegie Mellon University)
Nicoletta Calzolari (ILC/CNR)
Christian Chiarcos (University of Frankfurt)
Stefanie Dipper (Ruhr University Bochum)
Tomaž Erjavec (Jožef Stefan Institute)
Dan Flickinger (Stanford University)
Udo Hahn (University of Jena)
Chu-Ren Huang (The Hong Kong Polytechnic University)
Nancy Ide (Vassar College)
Aravind Joshi (University of Pennsylvania)
Valia Kordoni (Humboldt University Berlin)
Adam Meyers (New York University)
Antonio Pareja-Lora (SIC & ILSA, UCM / ATLAS, UNED)
Marie-Paule Péry-Woodley (Université de Toulouse 2)
Massimo Poesio (University of Trento)
Sameer Pradhan (Harvard University)
James Pustejovsky (Brandeis University)
Katrin Tomanek (University of Jena)
Yulia Tsvetkov (Carnegie Mellon University)
Andreas Witt (IDS Mannheim)
Fei Xia (University of Washington)
Nianwen Xue (Brandeis University)
Heike Zinsmeister (University of Hamburg)

Contact Information

Send all questions to