Studies
Studies: 1992 | Downloads: 44544
Fisher English Training Speech Part 1 Transcripts by Cieri, Christopher; Graff, David; Kimball, Owen; Miller, Dave; Walker, Kevin
Description:

Fisher English Training Speech Part 1 Transcripts represents the first half of a collection of conversational telephone speech (CTS) that was created at LDC in 2003. It contains time-aligned transcript data for 5,850 complete conversations, each lasting up to 10 minutes. In addition to the transcriptions, which are found under the trans directory, there is a complete set of tables describing the speakers, the properties of the telephone calls, and the set of topics that were used to initiate the conversations. The corresponding speech files are contained in Fisher English Training Speech Part 1 Speech (LDC2004S13).

The Fisher telephone conversation collection protocol was created at LDC to address a critical need of developers trying to build robust automatic speech recognition (ASR) systems. Previous collection protocols, such as CALLFRIEND and Switchboard-II, and the resulting corpora have been adapted for ASR research but were in fact developed for language and speaker identification, respectively. Although the CALLHOME protocol and corpora were developed to support ASR technology, they feature small numbers of speakers making telephone calls of relatively long duration with narrow vocabulary across the collection. CALLHOME conversations are challengingly natural and intimate. Under the Fisher protocol, a large number of participants each calls another participant, whom they typically do not know, for a short period of time to discuss the assigned topics. This maximizes inter-speaker variation and vocabulary breadth while also increasing formality.

Previous protocols such as CALLHOME, CALLFRIEND and Switchboard relied upon participant activity to drive the collection. Fisher is unique in being platform driven rather than participant driven. Participants who wish to initiate a call may do so; however, the collection platform initiates the majority of calls. Participants need only answer their phones at the times they specified when registering for the study.

To encourage a broad range of vocabulary, Fisher participants are asked to speak about an assigned topic chosen from a randomly generated list that changes every 24 hours. All participants that day will be assigned subjects from that list. Some topics are inherited or refined from previous Switchboard studies while others were developed specifically for the Fisher protocol.

Data

Overall, about 12% of the conversations were transcribed at LDC, and the rest were transcribed by BBN and WordWave using a significantly different approach to the task. A central goal in both sets was to maximize the speed and economy of the transcription process. This in turn involved forgoing certain aspects of mark-up detail and quality control that may have been common in previous, smaller corpora.

The LDC transcripts were based on automatic segmentation of the audio data, to identify the utterance end-points on both channels of each conversation. Given these time stamps, manual transcription was simply a matter of typing in the words for each segment and doing a rudimentary spell-check. No attempt was made to modify the segmentation boundaries manually, or to locate utterances that the segmenter might have missed. Portions of speech where the transcriber could not be sure exactly what was said were marked with double parentheses – (( … )) – and the transcriber could hazard a guess as to what was said, or leave the region between parentheses blank. The LDC transcription process yields one plain-text transcript file per conversation, in which the first two lines show the call-ID and the fact that the transcript was developed at LDC. The remainder of the file contains one utterance per line (with blank lines separating the utterances), with the start-time, end-time, speaker/channel-ID and utterance text.
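The file layout described above can be read with a few lines of code. The following is a minimal sketch only, not LDC tooling. It assumes the fields on each utterance line are separated by whitespace and that the speaker/channel-ID may end in a colon; neither detail is stated in the description. The sample text and the names `Utterance`, `parse_transcript` and `uncertain_spans` are made up for illustration.

```python
import re
from typing import List, NamedTuple


class Utterance(NamedTuple):
    start: float   # start time of the segment
    end: float     # end time of the segment
    speaker: str   # speaker/channel ID
    text: str      # transcribed words, possibly containing (( ... )) regions


def parse_transcript(content: str) -> List[Utterance]:
    """Parse one transcript file: two header lines, then one utterance per line."""
    utterances = []
    # The first two lines carry the call-ID and the LDC note, so skip them.
    for line in content.splitlines()[2:]:
        line = line.strip()
        if not line:
            continue  # blank lines separate the utterances
        # Assumption: fields are whitespace-separated, text is the remainder.
        parts = line.split(None, 3)
        if len(parts) < 3:
            continue  # not an utterance line under this assumed layout
        text = parts[3] if len(parts) == 4 else ""
        utterances.append(
            Utterance(float(parts[0]), float(parts[1]), parts[2].rstrip(":"), text)
        )
    return utterances


def uncertain_spans(text: str) -> List[str]:
    """Return the transcriber's guesses inside (( ... )); empty string if left blank."""
    return re.findall(r"\(\(\s*(.*?)\s*\)\)", text)


# Hypothetical sample, invented for this sketch (not taken from the corpus).
SAMPLE = (
    "# call-id-placeholder\n"
    "# transcribed at LDC\n"
    "\n"
    "0.50 2.10 A: hello there\n"
    "\n"
    "2.30 4.00 B: (( hi )) how are you\n"
)

utterances = parse_transcript(SAMPLE)
```

Keeping the uncertain regions in the text and extracting them separately lets a user decide later whether to drop them, keep the guess, or treat them as noise.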

Data collection and transcription were sponsored by DARPA and the U.S. Department of Defense, as part of the EARS project for research and development in automatic speech recognition.

hdl:11272/IQON1
0 downloads
Last Released: Oct 19, 2017
Fisher English Training Speech Part 1 Speech by Cieri, Christopher; Graff, David; Kimball, Owen; Miller, Dave; Walker, Kevin
Description:

Fisher English Training Speech Part 1 Speech represents the first half of a collection of conversational telephone speech (CTS) that was created at the LDC during 2003. It contains 5,850 audio files, each one containing a full conversation of up to 10 minutes. Additional information regarding the speakers involved and types of telephones used can be found in the companion text corpus of transcripts, Fisher English Training Speech Part 1 Transcripts (LDC2004T19).

The Fisher telephone conversation collection protocol was created at LDC to address a critical need of developers trying to build robust automatic speech recognition (ASR) systems. Previous collection protocols, such as CALLFRIEND and Switchboard-II, and the resulting corpora have been adapted for ASR research but were in fact developed for language and speaker identification, respectively. Although the CALLHOME protocol and corpora were developed to support ASR technology, they feature small numbers of speakers making telephone calls of relatively long duration with narrow vocabulary across the collection. CALLHOME conversations are challengingly natural and intimate. Under the Fisher protocol, a very large number of participants each make a few calls of short duration, speaking to other participants, whom they typically do not know, about assigned topics. This maximizes inter-speaker variation and vocabulary breadth while also increasing formality.

Previous protocols such as CALLHOME, CALLFRIEND and Switchboard relied upon participant activity to drive the collection. Fisher is unique in being platform driven rather than participant driven. Participants who wish to initiate a call may do so; however, the collection platform initiates the majority of calls. Participants need only answer their phones at the times they specified when registering for the study.

To encourage a broad range of vocabulary, Fisher participants are asked to speak on an assigned topic. The topic is selected at random from a list that changes every 24 hours, and topics from that list are assigned to all subjects paired on that day. Some topics are inherited or refined from previous Switchboard studies while others were developed specifically for the Fisher protocol.

Data

The individual audio files are presented in NIST SPHERE format, and contain two-channel mu-law sample data; “shorten” compression has been applied to all files.
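As a rough illustration of what "two-channel mu-law sample data" means in practice, the sketch below expands 8-bit mu-law bytes to 16-bit linear values using the standard G.711 expansion and splits them into two channels. It assumes the "shorten" compression has already been undone and the SPHERE header already stripped by other tools, and that the two channels are interleaved sample by sample. The function names `decode_mulaw` and `split_channels` are made up for illustration.

```python
from typing import List, Tuple

MULAW_BIAS = 0x84  # bias added by the G.711 mu-law encoder


def decode_mulaw(byte: int) -> int:
    """Expand one 8-bit mu-law code to a signed 16-bit linear sample (G.711)."""
    byte = ~byte & 0xFF                    # mu-law codes are stored bit-inverted
    exponent = (byte & 0x70) >> 4          # 3-bit segment number
    mantissa = byte & 0x0F                 # 4-bit step within the segment
    magnitude = ((mantissa << 3) + MULAW_BIAS) << exponent
    if byte & 0x80:                        # sign bit set means negative
        return MULAW_BIAS - magnitude
    return magnitude - MULAW_BIAS


def split_channels(raw: bytes) -> Tuple[List[int], List[int]]:
    """Decode interleaved two-channel mu-law bytes into two lists of samples.

    Assumption: bytes alternate channel 1, channel 2, channel 1, ...
    """
    first = [decode_mulaw(b) for b in raw[0::2]]
    second = [decode_mulaw(b) for b in raw[1::2]]
    return first, second


# Four hypothetical bytes: silence on one channel, full scale on the other.
left, right = split_channels(bytes([0xFF, 0x80, 0xFF, 0x80]))
```

Each conversation side ends up in its own list, which matches the speaker/channel split used in the companion transcripts.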

Data collection and transcription were sponsored by DARPA and the U.S. Department of Defense, as part of the EARS project for research and development in automatic speech recognition.

hdl:11272/64I08
0 downloads
Last Released: Oct 19, 2017
Description:

The Canadian Centre for Justice Statistics (CCJS), in co-operation with the policing community, collects police-reported crime statistics through the Uniform Crime Reporting Survey (UCR). The UCR Survey was designed to measure the incidence of crime in Canadian society and its characteristics.

UCR data reflect reported crime that has been substantiated by police. Information collected by the survey includes the number of criminal incidents, the clearance status of those incidents and persons-charged information. The UCR Survey produces a continuous historical record of crime and traffic statistics reported by every police agency in Canada since 1962. In 1988, a new version of the survey, UCR2, was created. It has since been referred to as the “incident-based” survey, in which microdata on characteristics of incidents, victims and accused are captured.

Data from the UCR Survey provide key information for crime analysis, resource planning and program development for the policing community. Municipal and provincial governments use the data to aid decisions about the distribution of police resources, to define provincial standards and to make comparisons with other departments and provinces.

To the federal government, the UCR survey provides information for policy and legislative development, evaluation of new legislative initiatives, and international comparisons.

To the public, the UCR survey offers information on the nature and extent of police-reported crime and crime trends in Canada. As well, media, academics and researchers use these data to examine specific issues about crime.

Statistical activity

The survey is currently administered as part of the National Justice Statistics Initiative (NJSI). Since 1981, the Federal, Provincial and Territorial Deputy Ministers responsible for the administration of justice in Canada, with the Chief Statistician, have been working together in an enterprise known as the National Justice Statistics Initiative. The mandate of the NJSI is to provide information to the justice community as well as the public on criminal and civil justice in Canada. Although this responsibility is shared among Federal, Provincial and Territorial departments, the lead responsibility for the development of Canada’s statistical system remains with Statistics Canada.

hdl:11272/O6UQ6
94 downloads
Last Released: Oct 18, 2017
IARPA Babel Lao Language Pack IARPA-babel203b-v3.1a by Benowitz, Daniel; Bills, Aric; Conners, Thomas; Dubinski, Eyal; Fiscus, Jonathan; Harper, Mary; Heighway, Melanie; Le, Hanh; Melot, Jennifer; Onaka, Akiko; Ray, Jessica; Rytting, Anton; Shen, Wade; Silber, Ronnie; Tzoukermann, Evelyne
Description:

Introduction

IARPA Babel Lao Language Pack IARPA-babel203b-v3.1a was developed by Appen for the IARPA (Intelligence Advanced Research Projects Activity) Babel program. It contains approximately 207 hours of Lao conversational and scripted telephone speech collected in 2013 along with corresponding transcripts.

The Babel program focuses on underserved languages and seeks to develop speech recognition technology that can be rapidly applied to any human language to support keyword search performance over large amounts of recorded speech.

Data

The Lao speech in this release represents that spoken in the Vientiane dialect region in Laos. The gender distribution among speakers is approximately equal; speakers’ ages range from 16 years to 60 years. Calls were made using different telephones (e.g., mobile, landline) from a variety of environments including the street, a home or office, a public place, and inside a vehicle.

Audio data is presented as 8kHz 8-bit a-law encoded audio in sphere format and 48kHz 24-bit PCM encoded audio in wav format. Transcripts are encoded in UTF-8. The romanization scheme was developed by Appen and was based on the scheme developed by the American Library Association and Library of Congress. Further information about transcription methodology is contained in the documentation accompanying this release.

Evaluation data is available from NIST in support of OpenKWS.

hdl:11272/EQQQR
0 downloads
Last Released: Oct 17, 2017
Description:

The Tuition and Living Accommodation Costs (TLAC) survey collects data for full-time students at Canadian degree-granting institutions that are publicly funded. This annual survey was developed to provide an overview of tuition and additional compulsory fees, and living accommodation costs for an academic year.

The TLAC survey data are used to

  • provide stakeholders, the public and students with annual tuition costs and changes in tuition fees from the previous year
  • contribute to a better understanding of the costs to obtain a degree
  • contribute to education policy development
  • contribute to the Consumer Price Index
  • facilitate interprovincial comparisons
  • facilitate comparisons between institutions.

Reference period: Academic year (September 1 to April 30)

Collection period: April through June

hdl:11272/LN2IO
53 downloads
Last Released: Oct 13, 2017
Description:

Since the beginning of 2005, the Travel Survey of Residents of Canada (TSRC) has been conducted to measure domestic travel in Canada. It replaces the Canadian Travel Survey (CTS). Featuring several definitional changes and a new questionnaire, this survey provides estimates of domestic travel that are more in line with the international guidelines recommended by the World Tourism Organization (WTO) and the United Nations Statistical Commission. In 2011, TSRC underwent a redesign.

The Travel Survey of Residents of Canada is sponsored by Statistics Canada, the Canadian Tourism Commission, and the provincial governments. It measures the size of domestic travel in Canada from the demand side. The objectives of the survey are to provide information about the volume of trips and expenditures for Canadian residents by trip origin, destination, duration, type of accommodation used, trip reason, mode of travel, etc.; to provide information on travel incidence; and to provide the socio-demographic profile of travellers and non-travellers. Estimates allow quarterly analysis at the national, provincial and tourism region level (with varying degrees of precision) on:

  • total volume of same-day and overnight trips taken by the residents of Canada with destinations in Canada,

  • same-day and overnight visits in Canada,

  • main purpose of the trip/key activities on trip,

  • spending on same-day and overnight trips taken in Canada by Canadian residents in total and by category of expenditure,

  • modes of transportation (main/other) used on the trip,

  • person-visits, household-visits, spending in total and by expense category for each location visited in Canada,

  • person- and household-nights spent in each location visited in Canada, in total and by type of accommodation used,

  • use of travel packages and associated spending and source of payment (household, government, private employer),

  • demographics of adults that took or did not take trips, and

  • travel party composition.

The main users of the TSRC data are Statistics Canada, the Canadian Tourism Commission, the provinces, and tourism boards. Other users include the media, businesses, consultants and researchers.

hdl:11272/10511
0 downloads + analyses
Last Released: Oct 13, 2017
Description:

The Inter-corporate ownership product is the most authoritative and comprehensive source of information available on corporate ownership; a unique directory of "who owns what" in Canada. It provides up-to-date information reflecting recent corporate takeovers and other substantial changes. Ultimate corporate control is determined through a careful study of holdings by corporations, the effects of options, insider holdings, convertible shares and interlocking directorships. The number of corporations that make up the hierarchy of structures totals approximately 45,000.

The information that is presented is based on non-confidential returns filed by Canadian corporations under the Corporations Returns Act and on research using public sources such as internet sites. The data are presented in an easy-to-read tiered format, illustrating at a glance the hierarchy of subsidiaries within each corporate structure. The entries for each corporation provide both the country of control and the country of residence.

The product covers every individual corporation that is part of a group of commonly controlled corporations with combined assets exceeding 600 million dollars or combined revenue exceeding 200 million dollars. Individual corporations with debt obligations or equity owing to non-residents exceeding a net book value of 1 million dollars are covered as well.

hdl:11272/10475
0 downloads
Last Released: Oct 12, 2017
Description:

The interprovincial and international trade flows tables show the origin and destination of trade by product among Canadian provinces and territories and from and to the rest of the world. The information is available at the four hierarchy levels (Detail, Link-1997, Link-1961 and Summary) of the Supply and Use Product Classification (SUPC). The data are provided in spreadsheet format for ease of use.

Please note that the tables for 2010 and 2011 have been replaced and new tables were created for 2012 and 2013. The data are now available at the detail level without any data suppressions.

hdl:11272/10510
0 downloads
Last Released: Oct 12, 2017
Description:

No abstract available.

hdl:11272/YG1DY
0 downloads
Last Released: Oct 12, 2017
University of Victoria Herbarium Specimen Database by Allen, Geraldine A.; Anthony, Wendy
Description:

This data set contains Darwin Core descriptions of the specimens held in the University of Victoria Herbarium. About 1/5 of the total specimens are recorded here.

DOI: 10.18357/Herbarium.2016.data01

The UVic collection includes vascular plants (~50,000 specimens), bryophytes and lichens, mainly from British Columbia (especially Vancouver Island) but also from nearby areas including Washington, Oregon, Idaho, Alberta, and the Yukon. The herbarium houses a large collection of North American asters (Asteraceae), especially the genera Eucephalus and Symphyotrichum. Special collections include those of Arctic plants (Dewline Collection), lacustrine aquatics and macrophytes, and marine algae.

The collection is an important resource for education and scientific research and receives wide use through loans solicited from researchers and institutions around the world, as well as researchers at UVic. Researchers have used UVic specimens to study genetic relationships among plants, species variation and plant phylogeography. This research has contributed to numerous publications. The specimens are classified using the Darwin Core metadata standard.

hdl:11272/10342
18 downloads
Last Released: Oct 11, 2017
 
 
Abacus Dataverse Network - British Columbia Research Library Data Services - Hosted at the University of British Columbia © 2017