BOLT Information Retrieval Comprehensive Training and Evaluation
Version: 1 – Released: Thu Sep 27 14:59:45 PDT 2018
Cataloging Information
Data Citation
If you use these data, please add the following citation to your scholarly references.
Data Citation Details
Title: BOLT Information Retrieval Comprehensive Training and Evaluation
Study Global ID: hdl:11272/S6XNB
Other ID: Linguistic Data Consortium: LDC2018T18; ISBN: 1-58563-855-2; ISLRN: 1-58563-855-2
Authors: Griffitt, Kira; Strassel, Stephanie
Producer: Linguistic Data Consortium (LDC), University of Pennsylvania
Production Date: September 17, 2018
Production Place: Philadelphia
Grant Number: BOLT - HR0011-11-C-0145, Defense Advanced Research Projects Agency
Distributor: Linguistic Data Consortium (LDC), University of Pennsylvania
Deposit Date: September 27, 2018
Series: LDC, Linguistic Data Consortium
Version: v1.0, September 17, 2018
Description and Scope


BOLT Information Retrieval Comprehensive Training and Evaluation was developed by the Linguistic Data Consortium (LDC) and consists of all data produced in support of the Information Retrieval (IR) task within the DARPA Broad Operational Language Translation (BOLT) Program, including annotations, source documents and scoring software.

The BOLT program developed machine translation and information retrieval for less formal genres, focusing particularly on user-generated content. LDC supported BOLT by collecting informal data sources – discussion forums, text messaging and chat – in Chinese, Egyptian Arabic and English. The collected data was translated and annotated for various tasks including word alignment, treebanking, propbanking and co-reference.

The material in this release relates to the IR task, which sought to support development of systems that could: (1) take as input a natural language English query sentence; (2) return relevant responses to that query from a large corpus of informal documents in the three BOLT languages (Arabic, Chinese, and English); and (3) translate responses from non-English documents into English.

Data

BOLT Information Retrieval Comprehensive Training and Evaluation contains the pilot, dry run, and evaluation data developed for each phase of the BOLT IR task, including: (1) natural-language IR queries, system responses to queries, and manually generated assessment judgments for system responses; (2) discussion forum source documents in Arabic, Chinese and English; (3) scoring software for each evaluation phase; and (4) experimental data developed in Phase 2.

Source data are presented as a series of zip archives containing XML files. Query and response data are also presented as XML. Judgments are provided as tab-delimited files.
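The archives and judgment files described above can be read with Python's standard library. The following is a minimal sketch, not LDC-provided tooling: it assumes only that each zip archive holds `.xml` members and that each judgments file is plain tab-delimited UTF-8 text. The function names `iter_xml_from_zip` and `read_judgments` are hypothetical, and because this page does not document the judgment columns, rows are returned unnamed; consult the corpus documentation for the actual file layout.

```python
import csv
import xml.etree.ElementTree as ET
from pathlib import Path
from typing import Iterator, List, Tuple


def iter_xml_from_zip(zip_path: Path) -> Iterator[Tuple[str, ET.Element]]:
    """Yield (member name, parsed root element) for every .xml member of a zip archive.

    Hypothetical helper: assumes XML members are identified by their .xml suffix.
    """
    import zipfile  # standard library; kept local so the helper is self-describing

    with zipfile.ZipFile(zip_path) as archive:
        for name in archive.namelist():
            # Skip directories and any non-XML members (e.g. text files).
            if name.lower().endswith(".xml"):
                with archive.open(name) as handle:
                    # ET.parse accepts a binary file object and honors the XML declaration.
                    yield name, ET.parse(handle).getroot()


def read_judgments(tsv_path: Path) -> List[List[str]]:
    """Read a tab-delimited judgments file as a list of rows.

    Hypothetical helper: assumes UTF-8 text; columns are left unnamed because
    their meaning is defined by the corpus documentation, not by this sketch.
    """
    with open(tsv_path, newline="", encoding="utf-8") as handle:
        # csv.reader yields [] for blank lines; those are dropped.
        return [row for row in csv.reader(handle, delimiter="\t") if row]
```

For example, `for name, root in iter_xml_from_zip(Path("source.zip")):` walks every XML document in one archive (the archive name here is hypothetical). `ET.parse` raises `xml.etree.ElementTree.ParseError` on a member that is not well-formed XML, so it may be worth catching that error when processing a full release.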

Keywords: Linguistics (ACV)
Time Period Covered: 2012 - 2018
Date of Collection: 2012 - 2018
Country/Nation: United States (US)
Kind of Data: Linguistic data
Data Collection / Methodology
Data Sources: Discussion forum
Data Availability
Number of Files: 5
Terms of Use

Linguistic Data Consortium Data Use Agreement

A. Except to the extent prohibited by any user agreement, the user shall have the right to

  1. incorporate portions of the LDC (Linguistic Data Consortium) data into its own work products for internal, non-commercial use and not for redistribution,
  2. incorporate small excerpts of text or audio data from the LDC data for display or publication in a scientific or technical context, but only for the purpose of describing the research and related issues, and
  3. publish statistics and other summaries of the LDC data.

B. License

Except as otherwise provided herein, the user shall have no right to copy, redistribute, transmit, publish, sell, transfer, or otherwise use the LDC data for any purpose. The user shall give appropriate attribution to the LDC data in all scholarly or similar publications for which the LDC data or portions thereof have been used.

C. Access to Individual Users

Only individuals who are then-current faculty, students or staff members of LDC Member institutions or consultants or individuals providing services or doing research for Member institutions shall have access to the LDC data.

D. Copyright

The LDC data is protected by copyright as a collective work or compilation under the laws of the United States and other countries. All content, material, and other elements comprising LDC data are also copyrighted works. Users must abide by all additional copyright notices or restrictions contained in the LDC data license agreement supplements.

Dataverse Terms of Use

Terms of Use

  1. Introduction

    1. The "Service" means, collectively, all aspects of the Abacus / NESSTAR and associated services and websites.
    2. The term "Content" means the data, text, graphics, photos, sounds, music, videos, audiovisual combinations, interactive features, software, scripts, and any other electronic materials you may view on or access through the Service.
  2. Your Acceptance of this Agreement

    1. By clicking you agree to the terms and conditions of this Agreement, which supplement the policies, rules and requirements of your institution.
    2. If you do not agree to these Terms of Use you must not log in, access, browse or otherwise use the Service. If you have questions or concerns, please contact
  3. Use of the Service and Content

    Use of the Service and Your Content. You may access and use the Content uploaded on the Service strictly in compliance with the copyright terms identified on or associated with such Content.
  4. General Conditions of Use

    1. Without limiting the foregoing and the prohibited uses set out in Policy #104, Acceptable Use and Security of UBC Electronic Information and Systems, which is hereby incorporated by reference, the following is not permitted:
      1. using any automated system, including without limitation, "robots," "spiders," or "offline readers," to harvest or scrape information from the Service or any part(s) thereof, or to send more request messages in a given period of time than a human can reasonably produce in the same period by using a conventional on-line web browser; or
      2. in any way intentionally placing undue burden on the technical systems or networks connected to the Service.
    2. UBC may suspend your account, or access to the Service, if it learns or is credibly notified (as determined by UBC) that your conduct is in violation of these Terms of Use.
  5. Liability and Indemnity

    1. The Service and the Content are provided to you AS IS. You understand that UBC does not endorse any Content submitted to the Service by any user, or any opinion, recommendation, or advice expressed therein, and UBC expressly disclaims any and all liability in connection with Content, including without limitation all direct, indirect, special, incidental or consequential damage or any other damages whatsoever and howsoever caused, arising out of or in connection with the use of the Service or any Content, or in reliance on the Service or the Content.
    2. In addition, the Service may contain links to third party websites. UBC has no control over, and assumes no responsibility for, the content, privacy policies, or practices of any third party websites.
    3. You agree to indemnify and hold harmless UBC, its Board of Governors, agents, contractors, licensors, and licensees against any and all claims arising from or in any way relating to your use of the Service.
  6. Trademarks

    Certain words, phrases, names, designs or logos used on the Site may constitute trademarks, service marks or trade names of UBC or other entities. The display of any such marks or names on the Site does not imply that UBC or other entities have granted a license or authorization of any kind to use such marks or names. You may not use any of UBC's trademarks, service marks or trade names without UBC's prior written permission.
  7. Choice of Law

    The laws of the Province of British Columbia and the laws of Canada applicable therein shall govern as to the interpretation, validity and effect of this document, notwithstanding any conflict of laws provisions of your domicile, residence or physical location. You hereby consent and submit to the exclusive jurisdiction of the courts of the Province of British Columbia in any action or proceeding instituted under or related to your use of the Service.
Other Information
Notes: Info (DCMI type): Text; Info (Application): Information retrieval, machine translation; Info (Language): Egyptian Arabic, Mandarin Chinese, English; Info (Language ID): arz, cmn, eng; Info (Project): BOLT; Info (Misc): This material is based upon work supported by the Defense Advanced Research Projects Agency (DARPA) under Contract No. HR0011-11-C-0145. The content does not necessarily reflect the position or the policy of the Government, and no official endorsement should be inferred.
Abacus Dataverse Network - British Columbia Research Library Data Services - Hosted at the University of British Columbia © 2018

"BOLT Information Retrieval Comprehensive Training and Evaluation", hdl:11272/S6XNB