Phrase Detectives Corpus
hdl:11272/QCTAL
Version: 1 – Released: Thu May 18 09:31:05 PDT 2017
Cataloging Information
Data Citation
If you use these data, please add the following citation to your scholarly references.
Data Citation Details
Title: Phrase Detectives Corpus
Study Global ID: hdl:11272/QCTAL
Other ID: Linguistic Data Consortium: LDC2017T08; ISBN: 1-58563-798-X; ISLRN: 052-688-100-874-5
Authors: Chamberlain, Jon; Poesio, Massimo; Kruschwitz, Udo
Producer: Linguistic Data Consortium (LDC), University of Pennsylvania
Production Date: May 15, 2017
Production Place: Philadelphia
Distributor: Linguistic Data Consortium (LDC), University of Pennsylvania
Deposit Date: May 18, 2017
Series: LDC, Linguistic Data Consortium
Version: v1.0, May 15, 2017
Original Dataverse
Description and Scope
Description

Introduction

Phrase Detectives Corpus was developed by the School of Computer Science and Electronic Engineering at the University of Essex. It consists of approximately 19,012 words across 40 documents, annotated for anaphora through the Phrase Detectives game, an online interactive "game-with-a-purpose" (GWAP) designed to collect data about English anaphoric coreference.

The use of GWAPs for creating language resources is growing. In general, they employ non-monetary incentives, such as entertainment, to motivate participation and can be successful for large-scale, persistent annotation efforts.

Data

The documents in the corpus are taken from Wikipedia articles and from narrative text in Project Gutenberg. Wikipedia articles and annotation files are presented as XML and Project Gutenberg source files are presented as plain text. All text is encoded as UTF-8. The annotations comprise a gold standard version created by multiple experts and a set created by a large non-expert crowd (via the Phrase Detectives game).

The data was annotated according to a prevalent, linguistically oriented approach to anaphora used in several tasks, including OntoNotes Release 5.0 (LDC2013T19), SemEval-2010 Task 1 OntoNotes English: Coreference Resolution in Multiple Languages (LDC2011T01) and The ARRAU Corpus of Anaphoric Information (LDC2013T22).

Keywords: Linguistics (ACV)
Time Period Covered: 2017 - 2017
Date of Collection: 2017 - 2017
Country/Nation: United States (US); United Kingdom (GB)
Kind of Data: Linguistic data
Data Collection / Methodology
Data Sources: Fiction, web collection
Data Availability
Number of Files: 3
Terms of Use
Conditions

Linguistic Data Consortium Data Use Agreement

A. Except to the extent prohibited by any user agreement, the user shall have the right to

  1. incorporate portions of the LDC (Linguistic Data Consortium) data into its own work products for internal, non-commercial use and not for redistribution,
  2. incorporate small excerpts of text or audio data from the LDC data for display or publication in a scientific or technical context, but only for the purpose of describing the research and related issues, and
  3. publish statistics and other summaries of the LDC data.

B. License

Except as otherwise provided herein, the user shall have no right to copy, redistribute, transmit, publish, sell, transfer, or otherwise use the LDC data for any purpose. The user shall give appropriate attribution to the LDC data in all scholarly or similar publications for which the LDC data or portions thereof have been used.

C. Access to Individual Users

Only individuals who are then-current faculty, students or staff members of LDC Member institutions or consultants or individuals providing services or doing research for Member institutions shall have access to the LDC data.

D. Copyright

The LDC data is protected by copyright as a collective work or compilation under the laws of the United States and other countries. All content, material, and other elements comprising LDC data are also copyrighted works. Users must abide by all additional copyright notices or restrictions contained in the LDC data license agreement supplements.

Dataverse Terms of Use

Terms of Use

  1. Introduction

    1. The "Service" means, collectively, all aspects of the Abacus / NESSTAR and associated services and websites.
    2. The term "Content" means the data, text, graphics, photos, sounds, music, videos, audiovisual combinations, interactive features, software, scripts, and any other electronic materials you may view on or access through the Service.
  2. Your Acceptance of this Agreement

    1. By clicking you agree to the terms and conditions of this Agreement, which supplement the policies, rules and requirements of your institution.
    2. If you do not agree to these Terms of Use you must not log in, access, browse or otherwise use the Service. If you have questions or concerns, please contact research.data@ubc.ca.
  3. Use of the Service and Content

    Use of the Service and Your Content. You may access and use the Content uploaded on the Service strictly in compliance with the copyright terms identified on or associated with such Content.
  4. General Conditions of Use

    1. Without limiting the foregoing and the prohibited uses set out in Policy #104, Acceptable Use and Security of UBC Electronic Information and Systems, which is hereby incorporated by reference, the following is not permitted:
      1. using any automated system, including without limitation, "robots," "spiders," or "offline readers," to harvest or scrape information from the Service or any part(s) thereof, or to send more request messages in a given period of time than a human can reasonably produce in the same period by using a conventional on-line web browser; or
      2. in any way intentionally placing undue burden on the technical systems or networks connected to the Service.
    2. UBC may suspend your account, or access to the Service, if it learns or is credibly notified (as determined by UBC) that your conduct is in violation of these Terms of Use.
  5. Liability and Indemnity

    1. The Service and the Content are provided to you AS IS. You understand that UBC does not endorse any Content submitted to the Service by any user, or any opinion, recommendation, or advice expressed therein, and UBC expressly disclaims any and all liability in connection with Content, including without limitation all direct, indirect, special, incidental or consequential damage or any other damages whatsoever and howsoever caused, arising out of or in connection with the use of the Service or any Content, or in reliance on the Service or the Content.
    2. In addition, the Service may contain links to third party websites. UBC has no control over, and assumes no responsibility for, the content, privacy policies, or practices of any third party websites.
    3. You agree to indemnify and hold harmless UBC, its Board of Governors, agents, contractors, licensors, and licensees against any and all claims arising from or in any way relating to your use of the Service.
  6. Trademarks

    Certain words, phrases, names, designs or logos used on the Site may constitute trademarks, service marks or trade names of the UBC or other entities. The display of any such marks or names on the Site does not imply that UBC or other entities have granted a license or authorization of any kind to use such marks or names. You may not use any of UBC's trademarks, service marks or trade names without UBC's prior written permission.
  7. Choice of Law

    The laws of the Province of British Columbia and the laws of Canada applicable therein shall govern as to the interpretation, validity and effect of this document, notwithstanding any conflict of laws provisions of your domicile, residence or physical location. You hereby consent and submit to the exclusive jurisdiction of the courts of the Province of British Columbia in any action or proceeding instituted under or related to your use of the Service.
Other Information
Notes: Info (DCMI type) Text; Info (Application) Information detection, parsing, information extraction, tagging; Info (Language) English; Info (Language ID) eng
Abacus Dataverse Network - British Columbia Research Library Data Services - Hosted at the University of British Columbia © 2017    

"Phrase Detectives Corpus", hdl:11272/QCTAL