The UBC Library data collection includes all data from common data sources for easy, one-stop data shopping.
UBC Library Data Services
Studies: 2067 | Downloads: 75216
Description:

Tables from National Graduates Survey, 2013 (Class of 2009-2010)

This survey was designed to determine such factors as: the extent to which graduates of postsecondary programs had been successful in obtaining employment since graduation; the relationship between the graduates' programs of study and the employment subsequently obtained; the graduates' job and career satisfaction; the rates of under-employment and unemployment; the type of employment obtained related to career expectations and qualification requirements; and the influence of postsecondary education on occupational achievement.

Production Date:2013
Producer:Statistics Canada (Statcan)
Distribution Date:June, 2014
Distributor:Statistics Canada (Statcan)
hdl:11272/10040
326 downloads + analyses
Last Released: Sep 13, 2019
TAC KBP Evaluation Source Corpora 2016-2017 by Ellis, Joe; Getman, Jeremy; Strassel, Stephanie
Description:

TAC KBP Evaluation Source Corpora 2016-2017 was developed by the Linguistic Data Consortium (LDC) and contains the 180,003 Chinese, English and Spanish source documents used in support of all TAC KBP evaluation tracks conducted in 2016 and 2017.

Text Analysis Conference (TAC) is a series of workshops organized by the National Institute of Standards and Technology (NIST). TAC was developed to encourage research in natural language processing and related applications by providing a large test collection, common evaluation procedures, and a forum for researchers to share their results. Through its various evaluations, the Knowledge Base Population (KBP) track of TAC encourages the development of systems that can match entities mentioned in natural texts with those appearing in a knowledge base and extract novel information about entities from a document collection and add it to a new or existing knowledge base.

Data

The source data consists of Chinese, English and Spanish discussion forum and newswire text collected by LDC. Documents are released as UTF-8 encoded XML with corresponding DTDs. Also provided are a series of lists and tables to aid in the recreation of specific test sets. See the included documentation for more information.

Production Date:August 15, 2019
Producer:Linguistic Data Consortium (LDC), University of Pennsylvania
Distributor:Linguistic Data Consortium (LDC), University of Pennsylvania
hdl:11272/NT383
0 downloads
Last Released: Sep 10, 2019
Description:

LFS data are used to produce the well-known unemployment rate as well as other standard labour market indicators such as the employment rate and the participation rate. The LFS also provides employment estimates by industry, occupation, public and private sector, hours worked and much more, all cross-classifiable by a variety of demographic characteristics. Estimates are produced for Canada, the provinces, the territories and a large number of sub-provincial regions. For employees, data on wage rates, union status, job permanency and establishment size are also produced.

These data are used by different levels of government for evaluation and planning of employment programs in Canada. Regional unemployment rates are used by Employment and Social Development Canada to determine eligibility, level and duration of insurance benefits for persons living within a particular employment insurance region. The data are also used by labour market analysts, economists, consultants, planners, forecasters and academics in both the private and public sector.

Production Date:February, 2018
Producer:Statistics Canada (Statcan)
Distribution Date:February 08, 2019
Distributor:Statistics Canada (Statcan)
hdl:11272/10714
45 downloads + analyses
Last Released: Sep 10, 2019
Description:

Corpus of Conversational Persian Transcripts consists of transcripts from approximately 20 hours of naturally occurring informal conversations in the Tehrani dialect of Iranian Persian. The corresponding speech is not included in this release.

Data

This corpus is extracted from 1,201 minutes of conversations among 22 participants, 12 male and 10 female. The participants recorded their daily phone calls and face-to-face interactions in a variety of informal settings. The conversations represent various interaction types, settings, types of relationship, and communicative goals.

The transcripts were annotated for gender, age, and recording method and setting. See the included documentation for more information about the annotations and transcription methodology.

Each conversation is presented as a UTF-8 encoded XML file.

Production Date:August 15, 2019
Producer:Linguistic Data Consortium (LDC), University of Pennsylvania
Distributor:Linguistic Data Consortium (LDC), University of Pennsylvania
hdl:11272/8NRYT
0 downloads
Last Released: Aug 30, 2019
Multi-Language Conversational Telephone Speech 2011 -- East Asian by Jones, Karen; Graff, David; Walker, Kevin; Strassel, Stephanie
Description:

Multi-Language Conversational Telephone Speech 2011 – East Asian was developed by the Linguistic Data Consortium (LDC) and comprises approximately 19 hours of telephone speech in two distinct languages of East Asia: Thai and Lao.

The data were collected primarily to support research and technology evaluation in automatic language identification, and portions of these telephone calls were used in the NIST 2011 Language Recognition Evaluation (LRE). LRE 2011 focused on language pair discrimination for 24 languages/dialects, some of which could be considered mutually intelligible or closely related.

LDC has also released the following as part of the Multi-Language Conversational Telephone Speech 2011 series:

  • Slavic (LDC2016S11)

  • Turkish (LDC2017S09)

  • South Asian (LDC2017S14)

  • Central Asian (LDC2018S03)

  • Central European (LDC2018S08)

  • Spanish (LDC2018S12)

  • Arabic (LDC2019S02)

  • English (LDC2019S06)

Data

Participants were recruited by native speakers who contacted acquaintances in their social network. Those native speakers made one call, up to 15 minutes, to each acquaintance. The data were collected using LDC’s telephone collection infrastructure, which comprises three computer telephony systems. Human auditors labeled calls for callee gender, dialect type and noise. Demographic information about the participants was not collected.

All audio data are presented in FLAC-compressed MS-WAV (RIFF) file format (*.flac); when uncompressed, each file is 2 channels, recorded at 8000 samples/second with samples stored as 16-bit signed integers, representing a lossless conversion from the original mu-law sample data as captured digitally from the public telephone network. The following table summarizes the total number of calls, total number of hours of recorded audio, and the total size of compressed data:

Group    Language  Calls  Hours  Size (MB)
e_asian  lao          63   12.4        539
e_asian  tha          38    6.9        354
totals               101   19.3        893
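
The format details and the table above allow a rough comparison between uncompressed and FLAC-compressed sizes. The following is a minimal sketch, not part of the LDC documentation. It assumes that the hours column counts the duration of each two-channel recording once, that 1 MB means 10**6 bytes, and that WAV header overhead can be ignored. The function and variable names are illustrative only.

```python
# Format parameters stated in the description above.
SAMPLE_RATE_HZ = 8000   # samples per second, per channel
CHANNELS = 2
BYTES_PER_SAMPLE = 2    # 16-bit signed integers

# Per-language figures copied from the summary table.
TABLE = {
    "lao": {"calls": 63, "hours": 12.4, "compressed_mb": 539},
    "tha": {"calls": 38, "hours": 6.9, "compressed_mb": 354},
}


def uncompressed_mb(hours: float) -> float:
    """Approximate raw PCM size in MB (assumed 10**6 bytes), ignoring headers."""
    bytes_per_second = SAMPLE_RATE_HZ * CHANNELS * BYTES_PER_SAMPLE
    return hours * 3600 * bytes_per_second / 1e6


# Recompute the totals row from the per-language rows.
total_calls = sum(row["calls"] for row in TABLE.values())
total_hours = round(sum(row["hours"] for row in TABLE.values()), 1)
total_mb = sum(row["compressed_mb"] for row in TABLE.values())

for lang, row in TABLE.items():
    raw = uncompressed_mb(row["hours"])
    print(f"{lang}: ~{raw:.0f} MB raw vs {row['compressed_mb']} MB FLAC")
```

Under these assumptions, one hour of audio occupies about 115 MB before compression, which gives a sense of how much local storage the decompressed corpus would need.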

Production Date:August 15, 2019
Producer:Linguistic Data Consortium (LDC), University of Pennsylvania
Distributor:Linguistic Data Consortium (LDC), University of Pennsylvania
hdl:11272/AOF7T
1 download
Last Released: Aug 30, 2019
Description:

The National Survey of Giving, Volunteering, and Participating (NSGVP) was undertaken to better understand how Canadians support individuals and communities on their own or through their involvement with charitable and nonprofit organizations. The NSGVP was conducted as a supplement to the Labour Force Survey. For this survey, thousands of Canadians aged 15 and over were asked how they: gave money and other resources to individuals and to organizations; volunteered time to help others and to enhance their communities; and participated in the practices which help give substance to active citizenship. The results from this survey allow this report to tell a story about who Canada's volunteers, charitable donors and civic participators are and the ways in which they contribute to our society.

This survey collected information about: the activities of volunteers, who benefits from these activities and the settings in which the activities take place; the satisfaction people gain from volunteering; the amount and patterns of time that people spend volunteering through organizations; the training and supervision people receive during their volunteer experiences through organizations; and the out-of-pocket expenses connected with voluntary activities through organizations.

There are three data files for the NSGVP: the main answer file, the volunteer event file and the giver event file. To link between files, use the variable MICRO_ID.

Main data file: This is the main answer file and contains one record per respondent. All questions except for those on the volunteer event files and giver event files are located here. In addition, summary derived variables have been created from the volunteer event and the giver event files and placed on the MAIN file. Use the MICRO_ID to link with other files.

Volunteer data file: This is the volunteer organization answer file. It will contain 1-3 records per person who volunteered (1 per organization the respondent volunteered with). This file contains information on the type of organization for which the individual volunteered, and the number of hours volunteered for the organization.

Giver data file: This is the charitable donation answer file. It will contain 1-55 records per person per solicitation method who made a charitable donation. Each record represents 1 donation made to a charitable organization. For each donation made, this file contains information on the type of organization to whom the donation was made as well as the value of the donation.
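
The linkage between the main file and an event file can be sketched as a one-to-many join on MICRO_ID. This is only an illustration under stated assumptions: MICRO_ID is the sole variable name taken from the description, while the other column names (AGE_GROUP, ORG_TYPE, HOURS), the sample values, and the function name are hypothetical placeholders rather than actual NSGVP variables or data.

```python
from collections import defaultdict

# Hypothetical rows. Only MICRO_ID is a variable named in the description;
# AGE_GROUP, ORG_TYPE and HOURS are made-up placeholders.
main_rows = [
    {"MICRO_ID": "0001", "AGE_GROUP": "25-34"},
    {"MICRO_ID": "0002", "AGE_GROUP": "45-54"},
]
volunteer_rows = [
    {"MICRO_ID": "0001", "ORG_TYPE": "sports", "HOURS": 40},
    {"MICRO_ID": "0001", "ORG_TYPE": "health", "HOURS": 12},
]


def link_events(main, events, key="MICRO_ID"):
    """Attach each respondent's event records to their main-file record."""
    by_id = defaultdict(list)
    for event in events:
        by_id[event[key]].append(event)
    # Respondents with no events keep an empty list (e.g. non-volunteers).
    return [{**person, "events": by_id.get(person[key], [])} for person in main]


linked = link_events(main_rows, volunteer_rows)
for person in linked:
    hours = sum(event["HOURS"] for event in person["events"])
    print(person["MICRO_ID"], len(person["events"]), hours)
```

The same function would apply to the giver event file, since it also carries MICRO_ID. Keeping the main file on the left side of the join preserves respondents who have no event records.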

Production Date:2000
Producer:Statistics Canada: Special Surveys Division
Distribution Date:November 19, 2009
hdl:11272/AVB47
24 downloads
Last Released: Aug 27, 2019
Description:

Note (March 2006): Statistics Canada has reweighted the SLID from 1996 - 2002 using 2001 census population counts. These files are original files. Newer, reweighted files can be found by searching for "Survey of Labour and Income Dynamics, Reweighted".

The cross-sectional public-use microdata file for the Survey of Labour and Income Dynamics (SLID) is a collection of income, labour and family variables on persons in Canada and their families. The production of this file includes many safeguards to prevent the identification of any one person. Although often referred to as one file, the SLID public-use microdata file is three separate files: person, economic family and census family. The person file contains identifiers that allow a researcher to group persons into households, economic families and census families as well as to link each of these files.

The following information about SLID is excerpted from Statistics Canada's Labour and Income Dynamics: Survey Overview.

The Survey of Labour and Income Dynamics adds a new dimension to existing survey data on labour market activity and income: the changes experienced by individuals through time. At the heart of the survey's objectives is to understand the economic well-being of Canadians: what economic shifts do individuals and families live through, and how does it vary with changes in their paid work, family make-up, receipt of government transfers, or other factors? The survey's longitudinal dimension makes it possible to see such concurrent and often related events. SLID will be the first household survey ever to provide national data on the fluctuations in income that a typical family or individual experiences through time, which will give greater insight on the nature and extent of poverty in Canada. Starting in 1993, SLID is following the same respondents for six years. A second "panel" will start in 1996, and so on every three years. The size of the first panel is 15,000 households, including about 31,000 adults.

Production Date:1996
Producer:Minister of Industry; Statistics Canada
Distribution Date:November 19, 2009
hdl:11272/GPGL7
7 downloads
Last Released: Aug 27, 2019
Description:

The following information about SLID is excerpted from Statistics Canada's Labour and Income Dynamics: Survey Overview.

The Survey of Labour and Income Dynamics adds a new dimension to existing survey data on labour market activity and income: the changes experienced by individuals through time. At the heart of the survey's objectives is to understand the economic well-being of Canadians: what economic shifts do individuals and families live through, and how does it vary with changes in their paid work, family make-up, receipt of government transfers, or other factors? The survey's longitudinal dimension makes it possible to see such concurrent and often related events. SLID will be the first household survey ever to provide national data on the fluctuations in income that a typical family or individual experiences through time, which will give greater insight on the nature and extent of poverty in Canada. Starting in 1993, SLID is following the same respondents for six years. A second "panel" will start in 1996, and so on every three years. The size of the first panel is 15,000 households, including about 31,000 adults.

Production Date:1994
Producer:Minister of Industry; Statistics Canada
Distribution Date:November 19, 2009
hdl:11272/SZVBY
49 downloads
Last Released: Aug 27, 2019
Description:

The main purpose of this survey is to study the coverage of the employment insurance program. It provides a meaningful picture of who does or does not have access to EI benefits among the jobless and those in a situation of underemployment. The Employment Insurance Coverage Survey also covers access to maternity and parental benefits. The survey was designed to produce a series of precise measures to identify groups with low probability of receiving benefits, for instance, the long-term jobless, labour market entrants and students, people becoming unemployed after uninsured employment, people who have left jobs voluntarily and individuals who are eligible, given their employment history, but do not claim or otherwise receive benefits. The survey provides a detailed description of the characteristics of the last job held as well as reasons for not receiving benefits or for not claiming. Through the survey data, analysts will also be able to observe the characteristics and situation of people not covered by EI and of those who exhausted EI benefits, the job search intensity of the unemployed, expectation of recall to a job, and alternate sources of income and funds. Survey data pertaining to maternity and parental benefits answer questions on the proportion of mothers of an infant who received maternity and parental benefits, the reason why some mothers do not receive benefits and about sharing parental benefits with their spouse. The survey also allows looking at the timing and circumstances related to the return to work, the income adequacy of households with young children and more.

Production Date:2001
Producer:Statistics Canada: Special Surveys Division
Distribution Date:November 19, 2009
hdl:11272/PPHKU
3 downloads
Last Released: Aug 27, 2019
Description:

Individuals Aged 15 Years and Over, With and Without Income, from Statistics Canada's Survey of Consumer Finances, contains income as well as personal and labour-related characteristics of individuals aged 15 years and over. A limited number of characteristics of the individual's economic and census families are also included on the file.

Production Date:1988
Producer:Statistics Canada: Household Surveys Division
Distribution Date:November 18, 2009
hdl:11272/HXY9R
0 downloads
Last Released: Aug 27, 2019
 
 
Abacus Dataverse Network - British Columbia Research Library Data Services - Hosted at the University of British Columbia © 2018