The UBC Library data collection includes all data from common data sources for easy, one-stop data shopping.
UBC Library Data Services
Studies: 1977 | Downloads: 51568
Description:

Our multidisciplinary team of legal, clinician, and perinatal epidemiology experts designed a study to assess the effects of state regulation of midwives on patient access to high quality maternity care in the US. We developed a novel, weighted scoring system that ranks all 50 states and DC on level of midwifery integration, and then linked state scores to maternal and newborn outcomes. In our study we demonstrate that greater integration of midwives is associated with significantly higher rates of physiologic birth outcomes, lower rates of obstetric interventions, and fewer adverse neonatal outcomes. Our new Midwifery Integration Scoring System provides an evidence-informed tool that can identify barriers to effective health human resource allocation in maternity care, based on population-level health outcomes data. In the current context of the Sustainable Development Goals to facilitate equitable access to skilled maternity providers, we believe that our findings will be of great interest to your readers. We uploaded the 1) Midwifery Integration Scoring System and 2) the data set that includes all data points needed to replicate the results presented in our paper. Most of the data is for the year 2014 and comes from the CDC. Other data sources are detailed in the publication and a short data dictionary will be uploaded soon.

hdl:11272/10528
6 downloads
Last Released: Feb 21, 2018
Description:

LFS data are used to produce the well-known unemployment rate as well as other standard labour market indicators such as the employment rate and the participation rate. The LFS also provides employment estimates by industry, occupation, public and private sector, hours worked and much more, all cross-classifiable by a variety of demographic characteristics. Estimates are produced for Canada, the provinces, the territories and a large number of sub-provincial regions. For employees, data on wage rates, union status, job permanency and establishment size are also produced.

These data are used by different levels of government for evaluation and planning of employment programs in Canada. Regional unemployment rates are used by Employment and Social Development Canada to determine eligibility, level and duration of insurance benefits for persons living within a particular employment insurance region. The data are also used by labour market analysts, economists, consultants, planners, forecasters and academics in both the private and public sector.

Production Date:February 09, 2018
Producer:Statistics Canada (Statcan)
Distributor:Statistics Canada (Statcan)
hdl:11272/10575
4 downloads + analyses
Last Released: Feb 13, 2018
DIRHA English WSJ Audio by Ravanelli, Mirco; Cristoforetti, Luca; Omologo, Maurizio
Description:

Introduction

DIRHA English WSJ Audio was developed as part of the Distant-Speech Interaction for Robust Home Applications (DIRHA) Project which addressed natural spontaneous speech interaction with distant microphones in a domestic environment. It comprises approximately 85 hours of real and simulated read speech by six native American English speakers. The target utterances were taken from CSR-I (WSJ0) Complete (LDC93S6A), specifically, the 5,000 word subset of read speech from Wall Street Journal news text.

This release contains signals of different characteristics in terms of noise and reverberation, making it suitable for various multi-microphone signal processing and distant speech recognition tasks. The corpus can be coupled with related Kaldi baselines and tools.

Data

Speech was collected in a real apartment setting with typical domestic background noise and inter/intra-room reverberation effects. A total of 32 microphones were placed in the living-room (26 microphones) and in the kitchen (6 microphones). The original recordings were made at a sampling frequency of 48 kHz. However, for the sake of compactness, the released signals in this publication are in wav format with 16 kHz sampling frequency and 16 bit resolution.
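As a rough illustration of the compactness trade-off described above, the sketch below compares the uncompressed PCM size of one hour of audio per microphone channel at 48 kHz and at 16 kHz. It assumes 16-bit mono PCM at both rates (the bit depth of the original 48 kHz recordings is not stated here), and the function name `pcm_bytes` is hypothetical.

```python
def pcm_bytes(duration_s: float, sample_rate_hz: int,
              bits_per_sample: int = 16, channels: int = 1) -> int:
    """Size in bytes of uncompressed PCM audio, excluding any file header."""
    return int(duration_s * sample_rate_hz * (bits_per_sample // 8) * channels)

ONE_HOUR_S = 3600

# Assumption: 16-bit samples at both rates; only the released format is documented.
original = pcm_bytes(ONE_HOUR_S, 48_000)
released = pcm_bytes(ONE_HOUR_S, 16_000)

print(f"48 kHz: {original / 1e6:.1f} MB per channel-hour")
print(f"16 kHz: {released / 1e6:.1f} MB per channel-hour")
print(f"reduction factor: {original / released:.0f}x")
```

Under these assumptions the size scales linearly with the sampling frequency, so the released signals need one third of the storage per channel.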

Annotations for each acoustic sequence, such as microphone positions, speaker id, speaker gender and speaker position, are included in XML format. Additional metadata about the speakers and images of the apartment setting are also provided. Consult the documentation accompanying this release for more information about the collection.

Production Date:January 16, 2018
Producer:Linguistic Data Consortium (LDC), University of Pennsylvania
Distributor:Linguistic Data Consortium (LDC), University of Pennsylvania
hdl:11272/IMDPJ
0 downloads
Last Released: Jan 25, 2018
TRAD Chinese-French Parallel Text -- Blog by Linguistic Data Consortium; ELDA
Description:

Introduction

TRAD Chinese-French Parallel Text – Blog was developed by ELDA as part of the PEA-TRAD project. It contains French translations of a subset of approximately 10,000 Chinese words from GALE Phase 1 Chinese Blog Parallel Text (LDC2008T06).

The PEA-TRAD project (Translation as a Support for Document Analysis) was supported by the French Ministry of Defense (DGA). Its purpose was to develop speech-to-speech translation technology for multiple languages (e.g., Arabic, Chinese, Pashto) from a variety of domains. ELDA developed several corpora for this effort.

Data

This release consists of 444 segments (translation units) from 17 documents. The source data is Chinese blog text collected and translated into English by LDC for the DARPA GALE (Global Autonomous Language Exploitation) program. Information about the ELDA translation team, translation guidelines and validation results is contained in the documentation accompanying this release.

The Chinese source file contains 15,809 characters and the French reference translation contains 11,769 words. The data is presented in two Unicode-encoded XML files along with an associated DTD.

Production Date:January 16, 2018
Producer:Linguistic Data Consortium (LDC), University of Pennsylvania; ELDA (ELDA), European Languages Resources Association
Distributor:Linguistic Data Consortium (LDC), University of Pennsylvania
hdl:11272/1DVNQ
0 downloads
Last Released: Jan 25, 2018
DEFT Spanish Treebank by Taulé, Mariona; Martí, Maria Antonia; Bies, Ann; Garí, Aina; Nofre, Montserrat; Song, Zhiyi; Strassel, Stephanie; Ellis, Joe
Description:

Introduction

DEFT Spanish Treebank was developed by the Linguistic Data Consortium (LDC) and the Language and Computation Center (CLiC), University of Barcelona. It contains treebank annotation of international Spanish newswire text and Latin American Spanish discussion forum data created for the DARPA Deep Exploration and Filtering of Text (DEFT) program.

DEFT aimed to improve state-of-the-art capabilities in automated deep natural language processing with a particular focus on technologies dealing with inference, causal relationships and anomaly detection across several languages. DEFT Spanish Treebank supported the program's goal of deep natural language understanding.

Data

Newswire source files were selected from Spanish Gigaword Third Edition (LDC2011T12) and were manually sentence-segmented for DEFT. Discussion forum source files were selected from Spanish discussion forum source data collected by LDC, consisting of continuous multi-posts of 100-1000 words.

This release contains 114 files (54,394 tokens) of newswire data and 60 files (55,307 tokens) of discussion forum data all of which were annotated with constituents and syntactic functions. The annotation guidelines for DEFT Spanish Treebank are included in the documentation accompanying this release.

Source documents are presented as plain text files with one sentence unit per line. Treebank annotation files are in XML.

Production Date:January 16, 2018
Producer:Linguistic Data Consortium (LDC), University of Pennsylvania; Language and Computation Center (CLiC), University of Barcelona
Distributor:Linguistic Data Consortium (LDC), University of Pennsylvania
hdl:11272/NX0EH
0 downloads
Last Released: Jan 23, 2018
Description:

CanMap Content Suite contains over 100 unique and rich content layers. Each layer has a unique file and layer name with associated definitions, descriptions, attribution and metadata. All layers, with a few exceptions, are vector data consisting of polygon, polyline, or point geometry representation.

Production Date:December 15, 2017
Producer:DMTI Spatial Inc. (DMTI); Statistics Canada (Statcan)
Distributor:DMTI Spatial, Inc. (DMTI); Statistics Canada (Statcan)
hdl:11272/MJRF2
520 downloads
Last Released: Jan 19, 2018
Description:

The Postal Codes by Federal Ridings File (PCFRF) is a digital file which provides a link between the six-character postal code and Canada’s federal electoral districts (which are also known as federal ridings). Elections Canada defines a federal electoral district (FED) as any place or territorial area entitled to return a Member of Parliament (MP) to serve in the House of Commons. Federal electoral district legal limits and descriptions are the responsibility of the Chief Electoral Officer, and are usually revised every ten years after the results of the decennial census. There are 338 FEDs in the 2013 Representation Order, the most recent revision of the federal electoral district limits.

Production Date:December 13, 2017
Producer:Statistics Canada (Statcan)
Distribution Date:January 16, 2018
Distributor:Statistics Canada (Statcan)
hdl:11272/10537
3 downloads
Last Released: Jan 16, 2018
Description:

The Postal Code Conversion File (PCCF) is a digital file which provides a correspondence between the Canada Post Corporation (CPC) six-character postal code and Statistics Canada’s standard geographic areas for which census data and other statistics are produced. Through the link between postal codes and standard geographic areas, the PCCF permits the integration of data from various sources.

The geographic coordinates, which represent the standard geostatistical areas linked to each postal code on the PCCF, are commonly used to map the distribution of data for spatial analysis (e.g., clients, activities). The location information is a powerful tool for marketing, planning, or research purposes. In April 1983, the Statistical Registers and Geography Division released the first version of the PCCF, which linked postal codes to 1981 Census geographic areas and included geographic coordinates. Since then, the file has been updated on a regular basis to reflect changes.

For this release of the PCCF, the vast majority of the postal codes are directly geocoded to 2016 Census geography while others are linked via various conversion processes. A quality indicator for the confidence of this linkage is available in the PCCF.
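The kind of linkage the PCCF enables can be sketched as a lookup keyed on postal code, filtered by a link-quality flag. Everything in this sketch is hypothetical: the table layout, the area labels, the `high`/`low` quality values and the sample postal codes are placeholders and do not reflect the actual PCCF record layout, which is described in the file's own documentation.

```python
from collections import Counter

def normalize(postal_code: str) -> str:
    """Uppercase and strip spaces so 'a1a 1a1' matches 'A1A1A1'."""
    return postal_code.replace(" ", "").upper()

# Hypothetical conversion table: postal code -> (area label, link-quality flag).
conversion = {
    "A1A1A1": ("Area 1", "high"),
    "B2B2B2": ("Area 2", "high"),
    "C3C3C3": ("Area 2", "low"),
}

# Hypothetical client records from some other data source.
clients = ["a1a 1a1", "B2B2B2", "c3c3c3", "Z9Z 9Z9"]

def count_by_area(postal_codes, table, accepted=("high",)):
    """Count records per area, keeping only links with an accepted quality flag."""
    counts = Counter()
    for raw in postal_codes:
        match = table.get(normalize(raw))
        if match is None or match[1] not in accepted:
            counts["unmatched"] += 1
        else:
            counts[match[0]] += 1
    return dict(counts)

print(count_by_area(clients, conversion))
```

Tracking unmatched and low-confidence records separately from the area totals keeps the effect of linkage quality visible in any downstream tabulation.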

Production Date:December 13, 2017
Producer:Statistics Canada (Statcan)
Distribution Date:January 16, 2018
Distributor:Statistics Canada (Statcan)
hdl:11272/10536
20 downloads
Last Released: Jan 16, 2018
Description:

The Labour Force Survey provides estimates of employment and unemployment which are among the most timely and important measures of performance of the Canadian economy. With the release of the survey results only 10 days after the completion of data collection, the LFS estimates are the first of the major monthly economic data series to be released.

The Canadian Labour Force Survey was developed following the Second World War to satisfy a need for reliable and timely data on the labour market. Information was urgently required on the massive labour market changes involved in the transition from a war to a peace-time economy. The main objective of the LFS is to divide the working-age population into three mutually exclusive classifications - employed, unemployed, and not in the labour force - and to provide descriptive and explanatory data on each of these.

LFS data are used to produce the well-known unemployment rate as well as other standard labour market indicators such as the employment rate and the participation rate. The LFS also provides employment estimates by industry, occupation, public and private sector, hours worked and much more, all cross-classifiable by a variety of demographic characteristics. Estimates are produced for Canada, the provinces, the territories and a large number of sub-provincial regions. For employees, data on wage rates, union status, job permanency and workplace size are also produced. For a full listing and description of LFS variables, see the Guide to the Labour Force Survey (71-543-G).
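The three headline indicators follow directly from the three mutually exclusive classifications. The sketch below uses the standard definitions (unemployment rate as a share of the labour force; employment and participation rates as shares of the working-age population). The function name and the input counts are made up for illustration, and the sketch ignores survey weights and seasonal adjustment.

```python
def labour_market_rates(employed: float, unemployed: float,
                        not_in_labour_force: float) -> dict:
    """Compute headline rates (in percent) from the three LFS classifications."""
    labour_force = employed + unemployed
    population = labour_force + not_in_labour_force  # working-age population
    return {
        "unemployment_rate": 100 * unemployed / labour_force,
        "employment_rate": 100 * employed / population,
        "participation_rate": 100 * labour_force / population,
    }

# Illustrative counts only (e.g. in thousands of persons), not LFS estimates.
rates = labour_market_rates(employed=600, unemployed=40, not_in_labour_force=360)
for name, value in rates.items():
    print(f"{name}: {value:.2f}%")
```

Note that the unemployment rate uses the labour force as its denominator, while the other two rates use the whole working-age population.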

These data are used by different levels of government for evaluation and planning of employment programs in Canada. Regional unemployment rates are used by Employment and Social Development Canada to determine eligibility, level and duration of insurance benefits for persons living within a particular employment insurance region. The data are also used by labour market analysts, economists, consultants, planners, forecasters and academics in both the private and public sector.

Important note -- 4 August 2017

Labour Force Survey (LFS) data from January 2017 – July 2017 contained errors with numerical variables. Variables such as HRLYARN and UHRSMAIN were missing decimal placeholders. As such, their values were off by a factor of 100. The issue has been addressed and the data for the year have been re-released.

Production Date:January, 2017
Producer:Statistics Canada (Statcan)
Distribution Date:February, 2017
Distributor:Statistics Canada (Statcan)
hdl:11272/10439
171 downloads + analyses
Last Released: Jan 16, 2018
Description:

The Social Policy Simulation Database and Model (SPSD/M) is a tool designed to assist those interested in analyzing the financial interactions of governments and individuals in Canada. It can help one to assess the cost implications or income redistributive effects of changes in the personal taxation and cash transfer system. As the name implies, SPSD/M consists of two integrated parts: a database (SPSD), and a model (SPSM). The SPSD is a non-confidential, statistically representative database of individuals in their family context, with enough information on each individual to compute taxes paid to and cash transfers received from government. The SPSM is a static accounting model which processes each individual and family on the SPSD, calculates taxes and transfers using legislated or proposed programs and algorithms, and reports on the results. A sophisticated software environment gives the user a high degree of control over the inputs and outputs to the model and can allow the user to modify existing programs or test proposals for entirely new programs. The model comes with full documentation including an on-line help facility.
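The idea of a static accounting model can be sketched in a few lines: apply a rule set to every weighted record, sum the results, then rerun with a changed rule and compare. This is a toy sketch only. The `Person` fields, the flat tax with an exemption, the per-child benefit and all numbers are hypothetical, and none of it reflects the SPSM's actual algorithms or the SPSD's variables.

```python
from dataclasses import dataclass

@dataclass
class Person:
    income: float    # annual income in dollars
    children: int    # number of dependent children
    weight: float    # number of people this record represents

def net_position(p, tax_rate, exemption, child_benefit):
    """Return (tax paid, transfer received) for one record under a toy rule set."""
    tax = tax_rate * max(0.0, p.income - exemption)
    transfer = child_benefit * p.children
    return tax, transfer

def simulate(people, **rules):
    """Weighted totals of taxes and transfers across the whole database."""
    total_tax = total_transfer = 0.0
    for p in people:
        tax, transfer = net_position(p, **rules)
        total_tax += p.weight * tax
        total_transfer += p.weight * transfer
    return total_tax, total_transfer

# Hypothetical three-record database standing in for a representative sample.
database = [
    Person(income=20_000, children=2, weight=100),
    Person(income=50_000, children=0, weight=200),
    Person(income=90_000, children=1, weight=50),
]

baseline = simulate(database, tax_rate=0.20, exemption=10_000, child_benefit=1_000)
proposal = simulate(database, tax_rate=0.20, exemption=10_000, child_benefit=1_500)
print("extra transfer cost:", proposal[1] - baseline[1])
```

Because the model is static, the comparison captures only the mechanical cost of the rule change; it does not model any behavioural response to the new rules.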

Users and Applications

The SPSD/M has been used in hundreds of sites across Canada. These sites have diverse research interests in the area of income tax-transfer and commodity tax systems in Canada as well as varied experience in micro-simulation. Our growing client base includes federal departments, provincial governments, universities, interest groups, corporate divisions, and private consultants. The diverse applications of the SPSD/M can be seen in the following examples of studies and published research reports:

  • Costing out proposals for amendments to the Income Tax Act affecting the tax treatment of seniors and the disabled
  • Estimating the fiscal viability of major personal tax reform options, including three flat tax scenarios
  • The comparison of low income (poverty) measures and their effect on the estimates of the number of poor
  • An Analysis of the Distributional Impact of the Goods and Services Tax
  • Married and Unmarried Couples: The Tax Question
  • Taxes and Transfers in Rural Canada
  • Equivalencies in Canadian Public Policy
  • When the Baby Boom Grows Old: Impact on Canada's Public Sector

Some potential uses of the model are illustrated by the following list of questions which may be answered using the SPSM:

  • How large an increase in the federal Child Tax Benefit could be financed by allocating an additional $500 million to the program?
  • Which province would have the most advantageous tax structure for an individual with $45,000 earned income, 2 children and $15,000 of investment income?
  • What is the after-tax value of the major federal child support programs on a per child basis, and how are these benefits distributed across family types and income groups?
  • How many individuals otherwise paying no tax would have to pay tax under various minimum tax systems, and what would additional government revenues be?
  • How much money would be needed to raise all low income families and persons to Statistics Canada's low income cut-offs in 2014?
  • How much would average household "consumable" income rise if a province eliminated its gasoline taxes?
  • How much would federal government revenue rise if the GST rate were increased?

Producer:Statistics Canada (Statcan)
Distributor:Statistics Canada (Statcan)
hdl:11272/10535
4 downloads
Last Released: Jan 15, 2018
 
 
Abacus Dataverse Network - British Columbia Research Library Data Services - Hosted at the University of British Columbia © 2018