UBC Dataverse Network
Studies: 1967 | Downloads: 46301

The population of Metro Vancouver (20110729Regional Growth Strategy Projections Population, Housing and Employment 2006 – 2041 File) is projected to increase greatly by 2040, so finding new sources of drinking water (2015_ Water Consumption_ Statistics File) will be essential. Drinking water supply therefore needs to be optimized and estimated (Data Mining file) to support the region's development. Metro Vancouver currently draws on three reservoirs, Capilano, Seymour, and Coquitlam, from which treated water is supplied to customers. The linear programming (LP) model (Optimization, Sensitivity Report File) gives the amount of drinking water for each reservoir and region. The B.C. government has a specific strategy for the growing population to 2040. A further factor is a new drinking water source (wells), which needs to be estimated and monitored to anticipate how feasible it will be until 2040. The government will therefore have to decide how much groundwater to use. The project has two goals: (1) an optimization model for the three water reservoirs, and (2) an estimate of the new water source to 2040.

The data are analyzed with six software tools: Trifacta Wrangler, AMPL, Excel Solver, ArcGIS, SQL, and Tableau (for visualization). The process is as follows:
1. Trifacta Wrangler cleans the data (Data Mining file).
2. AMPL and Excel Solver optimize drinking water consumption for Metro Vancouver (data in the Optimization and Sensitivity Report file).
3. ArcMap (ArcGIS) combines the raw data with the reservoir optimization results and the population estimates to 2040 (GIS Map for Tableau file).
4. SQL queries in Tableau are used to visualize, estimate, and optimize the sources of drinking water for Metro Vancouver to 2040 (export tableau data file).

Last Released: Nov 24, 2017

Orthorectified aerial imagery of the UBC Vancouver campus, 2017. Ortho Pixel size - 10 cm

Last Released: Nov 24, 2017
Cross-Laboratory Analysis of Brain Cell Type Transcriptomes with Applications to Interpretation of Bulk Tissue Data
by Mancarci, B. Ogan; Toker, Lilah; Tripathy, Shreejoy J; Li, Brenna; Rocco, Brad; Sibille, Etienne; Pavlidis, Paul

Establishing the molecular diversity of cell types is crucial for the study of the nervous system. We compiled a cross-laboratory database of mouse brain cell type-specific transcriptomes from 36 major cell types from across the mammalian brain using rigorously curated published data from pooled cell type microarray and single cell RNA-sequencing studies. We used these data to identify cell type-specific marker genes, discovering a substantial number of novel markers, many of which we validated using computational and experimental approaches. We further demonstrate that summarized expression of marker gene sets in bulk tissue data can be used to estimate the relative cell type abundance across samples. To facilitate use of this expanding resource, we provide a user-friendly web interface at Neuroexpresso.org.

Last Released: Nov 23, 2017
Modeling sources of inter-laboratory variability in electrophysiological properties of mammalian neurons
by Tebaykin, Dmitry; Tripathy, Shreejoy J.; Binnion, Nathalie; Li, Brenna; Gerkin, Richard C.; Pavlidis, Paul

Patch-clamp electrophysiology is widely used to characterize neuronal electrical phenotypes. However, there are no standard experimental conditions for in vitro whole-cell patch-clamp electrophysiology, complicating direct comparisons between datasets. Here, we sought to understand how basic experimental conditions differ among labs and how these differences might impact measurements of electrophysiological parameters. We curated the compositions of external bath solutions (ACSF), internal pipette solutions, and other methodological details such as animal strain and age from 509 published neurophysiology articles studying rodent neurons. We found that very few articles used the exact same experimental solutions as any other and some solution differences stem from recipe inheritance from adviser to advisee as well as changing trends over the years. Next, we used statistical models to understand how the use of different experimental conditions impacts downstream electrophysiological measurements such as resting potential and action potential width. While these experimental condition features could explain up to 43% of the study-to-study variance in electrophysiological parameters, the majority of the variability was left unexplained. Our results suggest that there are likely additional experimental factors that contribute to cross-laboratory electrophysiological variability, and identifying and addressing these will be important to future efforts to assemble consensus descriptions of neurophysiological phenotypes for mammalian cell types.

1 download
Last Released: Nov 23, 2017

The 2016 Census Geographic Attribute File contains information at the dissemination block level, based on 2016 Census standard geographic areas. The data available include population counts, dwelling counts and land area. In addition, the 2016 Census Geographic Attribute File contains higher level standard geographic codes, names and, where applicable, types and classes. Data for higher level standard geographic areas can be derived by aggregating dissemination block-level data. The dissemination area representative point coordinates are also included in the 2016 Census Geographic Attribute File.

This version of the Geographic Attribute File is a dissemination block (DB)-level dataset which also includes data for the following 2016 Census standard geographic areas:

  • province and territory (PR)
  • economic region (ER)
  • census division (CD)
  • census consolidated subdivision (CCS)
  • census subdivision (CSD)
  • designated place (DPL)
  • federal electoral district (FED) (2013 Representation Order)
  • census metropolitan area (CMA), census agglomeration (CA) and census metropolitan influenced zone (MIZ)
  • census tract (CT)
  • population centre (POPCTR) and rural area (RA)
  • aggregate dissemination area (ADA)
  • dissemination area (DA)
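
The aggregation of dissemination block-level data into higher-level geographic areas mentioned above can be sketched as follows. This is a minimal illustration only: the record layout, field names, area codes and counts are all hypothetical and are not taken from the actual Geographic Attribute File.

```python
from collections import defaultdict

# Hypothetical dissemination-block records. Field names and values are
# illustrative assumptions, not the real file layout.
blocks = [
    {"db": "B001", "csd": "CSD-1", "population": 120, "dwellings": 50, "land_area_km2": 0.25},
    {"db": "B002", "csd": "CSD-1", "population": 80, "dwellings": 30, "land_area_km2": 0.5},
    {"db": "B003", "csd": "CSD-2", "population": 40, "dwellings": 20, "land_area_km2": 1.0},
]


def aggregate(records, level):
    """Sum population, dwellings and land area over the given geographic level."""
    totals = defaultdict(lambda: {"population": 0, "dwellings": 0, "land_area_km2": 0.0})
    for rec in records:
        area = totals[rec[level]]
        area["population"] += rec["population"]
        area["dwellings"] += rec["dwellings"]
        area["land_area_km2"] += rec["land_area_km2"]
    return dict(totals)


# Roll the block-level counts up to the (hypothetical) census subdivision codes.
csd_totals = aggregate(blocks, "csd")
print(csd_totals)
```

The same function could be pointed at any other higher-level code carried on each block record, provided that code is present as a field.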

Last Released: Nov 22, 2017

This series of cross-tabulations present a portrait of Canada based on the various census topics. They range in complexity and are available for various levels of geography.

Last Released: Nov 20, 2017
Last Released: Nov 9, 2017
Abstract Meaning Representation (AMR) Annotation Release 2.0
by Knight, Kevin; Badarau, Bianca; Baranescu, Laura; Bonial, Claire; Bardocz, Madalina; Griffitt, Kira; Hermjakob, Ulf; Marcu, Daniel; Palmer, Martha; O'Gorman, Tim; Schneider, Nathan

Abstract Meaning Representation (AMR) Annotation Release 2.0 was developed by the Linguistic Data Consortium (LDC), SDL/Language Weaver, Inc., the University of Colorado’s Computational Language and Educational Research group and the Information Sciences Institute at the University of Southern California. It contains a sembank (semantic treebank) of over 39,260 English natural language sentences from broadcast conversations, newswire, weblogs and web discussion forums.

AMR captures “who is doing what to whom” in a sentence. Each sentence is paired with a graph that represents its whole-sentence meaning in a tree-structure. AMR utilizes PropBank frames, non-core semantic roles, within-sentence coreference, named entity annotation, modality, negation, questions, quantities, and so on to represent the semantic structure of a sentence largely independent of its syntax.

LDC also released Abstract Meaning Representation (AMR) Annotation Release 1.0 (LDC2014T12).


The source data includes discussion forums collected for the DARPA BOLT and DEFT programs, transcripts and English translations of Mandarin Chinese broadcast news programming from China Central TV, Wall Street Journal text, translated Xinhua news texts, various newswire data from NIST OpenMT evaluations and weblog data used in the DARPA GALE program. The following table summarizes the number of training, dev, and test AMRs for each dataset in the release. Totals are also provided by partition and dataset:

Dataset                  Training    Dev   Test   Totals
BOLT DF MT                   1061    133    133     1327
Broadcast conversation        214      0      0      214
Weblog and WSJ                  0    100    100      200
BOLT DF English              6455    210    229     6894
DEFT DF English             19558      0      0    19558
Guidelines AMRs               819      0      0      819
2009 Open MT                  204      0      0      204
Proxy reports                6603    826    823     8252
Weblog                        866      0      0      866
Xinhua MT                     741     99     86      926
Totals                      36521   1368   1371    39260

For those interested in utilizing a standard/community partition for AMR research (for instance in development of semantic parsers), data in the “split” directory contains 39,260 AMRs split roughly 93%/3.5%/3.5% into training/dev/test partitions, with most smaller datasets assigned to one of the splits as a whole. Note that splits observe document boundaries. The “unsplit” directory contains the same 39,260 AMRs with no train/dev/test partition.
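
As a quick check of the stated split, the partition shares can be recomputed from the partition totals given in the release summary. This is a small sketch; the variable names are illustrative.

```python
# Partition totals as listed in the release summary.
partition_totals = {"training": 36521, "dev": 1368, "test": 1371}

total = sum(partition_totals.values())

# Share of each partition, as a percentage rounded to one decimal place.
shares = {name: round(100 * count / total, 1) for name, count in partition_totals.items()}

print(total)   # 39260
print(shares)  # {'training': 93.0, 'dev': 3.5, 'test': 3.5}
```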

1 download
Last Released: Nov 6, 2017

The Labour Force Survey provides estimates of employment and unemployment which are among the most timely and important measures of performance of the Canadian economy. With the release of the survey results only 10 days after the completion of data collection, the LFS estimates are the first of the major monthly economic data series to be released.

The Canadian Labour Force Survey was developed following the Second World War to satisfy a need for reliable and timely data on the labour market. Information was urgently required on the massive labour market changes involved in the transition from a war to a peace-time economy. The main objective of the LFS is to divide the working-age population into three mutually exclusive classifications - employed, unemployed, and not in the labour force - and to provide descriptive and explanatory data on each of these.

LFS data are used to produce the well-known unemployment rate as well as other standard labour market indicators such as the employment rate and the participation rate. The LFS also provides employment estimates by industry, occupation, public and private sector, hours worked and much more, all cross-classifiable by a variety of demographic characteristics. Estimates are produced for Canada, the provinces, the territories and a large number of sub-provincial regions. For employees, wage rates, union status, job permanency and workplace size are also produced. For a full listing and description of LFS variables, see the Guide to the Labour Force Survey (71-543-G), available through the "Publications" link above.
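
The three indicators follow directly from the three-way classification of the working-age population. The sketch below uses the standard definitions (unemployment rate relative to the labour force; employment and participation rates relative to the working-age population). The function name and the counts are made up for illustration and are not LFS estimates.

```python
def labour_market_rates(employed, unemployed, not_in_labour_force):
    """Return the standard labour market indicators as percentages."""
    labour_force = employed + unemployed
    working_age_population = labour_force + not_in_labour_force
    return {
        # Unemployed persons as a share of the labour force.
        "unemployment_rate": 100 * unemployed / labour_force,
        # Employed persons as a share of the working-age population.
        "employment_rate": 100 * employed / working_age_population,
        # Labour force as a share of the working-age population.
        "participation_rate": 100 * labour_force / working_age_population,
    }


# Illustrative counts only (for example, in thousands of persons).
rates = labour_market_rates(employed=600, unemployed=50, not_in_labour_force=350)
for name, value in rates.items():
    print(f"{name}: {value:.1f}%")
```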

These data are used by different levels of government for evaluation and planning of employment programs in Canada. Regional unemployment rates are used by Employment and Social Development Canada to determine eligibility, level and duration of insurance benefits for persons living within a particular employment insurance region. The data are also used by labour market analysts, economists, consultants, planners, forecasters and academics in both the private and public sector.

Important note -- 4 August 2017

Labour Force Survey (LFS) data from January 2017 – July 2017 contained errors in numerical variables. Variables such as HRLYARN and UHRSMAIN were missing decimal place holders, so their values were off by a factor of 100. The issue has been addressed and the data for the year have been re-released.

116 downloads + analyses
Last Released: Nov 3, 2017
Metalogue Multi-Issue Bargaining Dialogue
by Petukhova, Volha; Malchanau, Andrei; Oualil, Youssef; Klakow, Dietrich; Stevens, Christopher; Weerd, Harmen de; Taatgen, Niels

Metalogue Multi-Issue Bargaining Dialogue was developed by the Metalogue Consortium under the European Community’s Seventh Framework Programme for Research and Technological Development. This release consists of approximately 2.5 hours of semantically annotated English dialogue data that includes speech and transcripts.

The goal of the Metalogue project was to develop a dialogue system with flexible dialogue management to enable the system’s behavior in setting goals, choosing strategies and monitoring various processes. Participants were involved in a multi-issue bargaining scenario in which a representative of a city council and a representative of small business owners negotiated the implementation of new anti-smoking regulations. The negotiation involved four issues, each with four or five options. Participants received a preference profile for each scenario and negotiated for an agreement with the highest value based on their preference information. Negotiators were not allowed to accept an agreement with a negative value or to share their preference profiles with other participants.


Six unique subjects (undergraduates between 19 and 25 years of age) participated in the collection. The dialogue speech was captured with two headset microphones and saved in 16kHz, 16-bit mono linear PCM FLAC format. Speech signal files are of two types: full dialogue session; and segmented speech signal, cut per speaker and roughly per turn.

Transcripts were produced semi-automatically, using an automatic speech recognizer followed by manual correction.

Seven types of annotation were performed manually using the Anvil tool: dialogue act annotations; discourse structure acts; contact management acts; task management dialogue acts; negotiation moves; rhetorical relations; and disfluencies in speech production. More information about the annotation process is included in the documentation.

All text is presented in UTF-8 as either plain text or XML.

Last Released: Nov 2, 2017
Abacus Dataverse Network - British Columbia Research Library Data Services - Hosted at the University of British Columbia © 2015