Studies: 2009 | Downloads: 47358

The population of Metro Vancouver (20110729Regional Growth Strategy Projections Population, Housing and Employment 2006 – 2041 File) is projected to grow substantially by 2040, so finding new sources of drinking water (2015_ Water Consumption_ Statistics File) will be essential. Drinking water supply therefore needs to be estimated and optimized (Data Mining file) to support the region's development. Metro Vancouver currently draws on three reservoirs, Capilano, Seymour, and Coquitlam, from which treated water is supplied to customers. The linear programming (LP) optimization model (Optimization, Sensitivity Report File) shows the amount of drinking water for each reservoir and region. The B.C. government has a specific strategy for accommodating the growing population until 2040. In addition, a new drinking water source (wells) needs to be estimated and monitored to anticipate how feasible it will be until 2040, so the government will have to decide how much groundwater to use. The project has two goals: (1) an optimization model for the three water reservoirs, and (2) an estimate of the new water source needed through 2040.

The data were analyzed and visualized with six software tools: Trifacta Wrangler, AMPL, Excel Solver, ArcGIS, SQL, and Tableau. The process has four steps:

1. Trifacta Wrangler cleans the data (Data Mining file).
2. AMPL and Excel Solver optimize drinking water consumption for Metro Vancouver (data in the Optimization and Sensitivity Report file).
3. ArcMap, part of the ArcGIS software, combines the raw data with the reservoir optimization results and the population estimates to 2040 (GIS Map for Tableau file).
4. Tableau, using SQL, visualizes the estimated and optimized sources of drinking water for Metro Vancouver until 2040 (export tableau data file).

Last Released: Nov 24, 2017

Orthorectified aerial imagery of the UBC Vancouver campus, 2017. Ortho pixel size: 10 cm.

Last Released: Nov 24, 2017
Cross-Laboratory Analysis of Brain Cell Type Transcriptomes with Applications to Interpretation of Bulk Tissue Data by Mancarci, B. Ogan; Toker, Lilah; Tripathy, Shreejoy J; Li, Brenna; Rocco, Brad; Sibille, Etienne; Pavlidis, Paul

Establishing the molecular diversity of cell types is crucial for the study of the nervous system. We compiled a cross-laboratory database of mouse brain cell type-specific transcriptomes from 36 major cell types from across the mammalian brain using rigorously curated published data from pooled cell type microarray and single cell RNA-sequencing studies. We used these data to identify cell type-specific marker genes, discovering a substantial number of novel markers, many of which we validated using computational and experimental approaches. We further demonstrate that summarized expression of marker gene sets in bulk tissue data can be used to estimate the relative cell type abundance across samples. To facilitate use of this expanding resource, we provide a user-friendly web interface at

Last Released: Nov 23, 2017
Modeling sources of inter-laboratory variability in electrophysiological properties of mammalian neurons by Tebaykin, Dmitry; Tripathy, Shreejoy J.; Binnion, Nathalie; Li, Brenna; Gerkin, Richard C.; Pavlidis, Paul

Patch-clamp electrophysiology is widely used to characterize neuronal electrical phenotypes. However, there are no standard experimental conditions for in vitro whole-cell patch-clamp electrophysiology, complicating direct comparisons between datasets. Here, we sought to understand how basic experimental conditions differ among labs and how these differences might impact measurements of electrophysiological parameters. We curated the compositions of external bath solutions (ACSF), internal pipette solutions, and other methodological details such as animal strain and age from 509 published neurophysiology articles studying rodent neurons. We found that very few articles used the exact same experimental solutions as any other and some solution differences stem from recipe inheritance from adviser to advisee as well as changing trends over the years. Next, we used statistical models to understand how the use of different experimental conditions impacts downstream electrophysiological measurements such as resting potential and action potential width. While these experimental condition features could explain up to 43% of the study-to-study variance in electrophysiological parameters, the majority of the variability was left unexplained. Our results suggest that there are likely additional experimental factors that contribute to cross-laboratory electrophysiological variability, and identifying and addressing these will be important to future efforts to assemble consensus descriptions of neurophysiological phenotypes for mammalian cell types.

1 download
Last Released: Nov 23, 2017

The 2016 Census Geographic Attribute File contains information at the dissemination block level, based on 2016 Census standard geographic areas. The data available include population counts, dwelling counts and land area. In addition, the 2016 Census Geographic Attribute File contains higher level standard geographic codes, names and, where applicable, types and classes. Data for higher level standard geographic areas can be derived by aggregating dissemination block-level data. The dissemination area representative point coordinates are also included in the 2016 Census Geographic Attribute File.

This version of the Geographic Attribute File is a dissemination block (DB)-level dataset which also includes data for the following 2016 Census standard geographic areas:

  • province and territory (PR)
  • economic region (ER)
  • census division (CD)
  • census consolidated subdivision (CCS)
  • census subdivision (CSD)
  • designated place (DPL)
  • federal electoral district (FED) (2013 Representation Order)
  • census metropolitan area (CMA), census agglomeration (CA) and census metropolitan influenced zone (MIZ)
  • census tract (CT)
  • population centre (POPCTR) and rural area (RA)
  • aggregate dissemination area (ADA)
  • dissemination area (DA)
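
The aggregation described above, deriving figures for a higher-level geographic area by summing its dissemination block records, can be sketched as follows. This is a minimal illustration under stated assumptions, not Statistics Canada's procedure: the field names (`db_id`, `csd_code`, `population`, `dwellings`, `land_area_km2`) and the three sample records are hypothetical and are not taken from the file.

```python
from collections import defaultdict

# Hypothetical dissemination block (DB) records. Field names and values
# are illustrative only; they do not come from the actual file.
blocks = [
    {"db_id": "A001", "csd_code": "X1", "population": 120, "dwellings": 50, "land_area_km2": 0.5},
    {"db_id": "A002", "csd_code": "X1", "population": 80, "dwellings": 30, "land_area_km2": 0.25},
    {"db_id": "B001", "csd_code": "X2", "population": 200, "dwellings": 90, "land_area_km2": 1.0},
]


def aggregate(records, key, fields=("population", "dwellings", "land_area_km2")):
    """Sum the given fields over all records that share the same `key` value."""
    totals = defaultdict(lambda: dict.fromkeys(fields, 0))
    for rec in records:
        for field in fields:
            totals[rec[key]][field] += rec[field]
    return dict(totals)


# Roll block-level counts up to the (hypothetical) census subdivision code.
csd_totals = aggregate(blocks, "csd_code")
print(csd_totals)
```

The same function would work for any of the higher-level codes carried on each block record, simply by passing a different `key`.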

Last Released: Nov 22, 2017

This series of cross-tabulations presents a portrait of Canada based on the various census topics. The tables range in complexity and are available for various levels of geography.

Last Released: Nov 20, 2017
Mapping permeability of saturated terrestrial lithologies over the surface of the Earth (data) by Gleeson, Tom; Smith, Leslie; Moosdorf, Nils; Hartmann, Jens; Durr, Hans; Manning, Andrew; van Beek, Ludovicus; Jellinek, A M

Auxiliary Materials for ‘Mapping permeability over the surface of the earth’ [Gleeson et al.]. Table S1 is a compilation of horizontal intrinsic permeability, vertical anisotropy and horizontal unit length from peer-reviewed, calibrated models with hydrolithologic units that are >5 km in length with a shallow upper contact (<100 m depth). We compiled two hundred and thirty hydrogeologic units from calibrated models, which are grouped into seven hydrolithologic categories.

Last Released: Nov 17, 2017
Global volume and distribution of modern groundwater: groundwater age transport modeling results (data) by Gleeson, Tom; Befus, Kevin; Jasechko, Scott; Luijendijk, Elco; Cardenas, M. Bayani

The authors combine geochemical, geologic, hydrologic and geospatial data sets with numerical simulations of groundwater and analyse tritium ages to show that less than 6% of the groundwater in the uppermost portion of Earth’s landmass is modern. We find that the total groundwater volume in the upper 2 km of continental crust is approximately 22.6 million km³, of which 0.1–5.0 million km³ is less than 50 years old. Although modern groundwater represents a small percentage of the total groundwater on Earth, the volume of modern groundwater is equivalent to a body of water with a depth of about 3 m spread over the continents. This water resource dwarfs all other components of the active hydrologic cycle. For each continent, we present the geomatic assignment of hydrologic parameters and the resulting simulation-based modern groundwater equivalent (D_eq50) for the purely geomatic assignment of parameters, an estimate pairing models to watersheds using groundwater recharge and strict lithology control, and an estimate using recharge and strict water table gradient control. These files have a two-letter acronym for the continent/landmass followed by _globalws_results_Gleesonetal_NatGeo.csv. The corresponding watershed data can be downloaded at Geomatic analyses used updated, unpublished HydroSHEDS watershed boundaries that are slightly different from those available on (Bernhard Lehner, personal communication 2014). Therefore, in the data presented here, we used a spatial join to assign the modeling results and geomatic data to the currently downloadable HydroSHEDS zeroth-level watersheds. Nearly all of the watersheds were very similar in extent; however, a small, variable percentage (< 0.1%) of watersheds in each continent were not located in the currently downloadable HydroSHEDS data.

Last Released: Nov 17, 2017
Last Released: Nov 9, 2017
Abstract Meaning Representation (AMR) Annotation Release 2.0 by Knight, Kevin; Badarau, Bianca; Baranescu, Laura; Bonial, Claire; Bardocz, Madalina; Griffitt, Kira; Hermjakob, Ulf; Marcu, Daniel; Palmer, Martha; O'Gorman, Tim; Schneider, Nathan

Abstract Meaning Representation (AMR) Annotation Release 2.0 was developed by the Linguistic Data Consortium (LDC), SDL/Language Weaver, Inc., the University of Colorado’s Computational Language and Educational Research group and the Information Sciences Institute at the University of Southern California. It contains a sembank (semantic treebank) of over 39,260 English natural language sentences from broadcast conversations, newswire, weblogs and web discussion forums.

AMR captures “who is doing what to whom” in a sentence. Each sentence is paired with a graph that represents its whole-sentence meaning in a tree-structure. AMR utilizes PropBank frames, non-core semantic roles, within-sentence coreference, named entity annotation, modality, negation, questions, quantities, and so on to represent the semantic structure of a sentence largely independent of its syntax.
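
As an illustration of the sentence-to-graph pairing, the sketch below encodes the AMR commonly used as an introductory example, for the sentence "The boy wants to go", as a small set of triples and prints it in the bracketed notation AMRs are usually written in. The sentence, the Python data layout and the helper `to_penman` are illustrative choices, not part of the release. Note that the variable `b` is reused: the boy is both the one who wants and the one who goes.

```python
# AMR for "The boy wants to go", stored as node labels plus
# (source, role, target) edges. Variable names w, b, g are arbitrary.
instances = {"w": "want-01", "b": "boy", "g": "go-01"}
edges = [
    ("w", ":ARG0", "b"),  # the boy is the one who wants
    ("w", ":ARG1", "g"),  # the going is the thing wanted
    ("g", ":ARG0", "b"),  # the boy is also the one who goes (reused variable)
]


def to_penman(node, seen=None):
    """Serialize the graph rooted at `node` into bracketed AMR notation."""
    seen = set() if seen is None else seen
    if node in seen:
        return node  # already introduced: refer back by variable name
    seen.add(node)
    parts = [f"{node} / {instances[node]}"]
    for src, role, tgt in edges:
        if src == node:
            parts.append(f"{role} {to_penman(tgt, seen)}")
    return "(" + " ".join(parts) + ")"


penman = to_penman("w")
print(penman)  # (w / want-01 :ARG0 (b / boy) :ARG1 (g / go-01 :ARG0 b))
```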

LDC also released Abstract Meaning Representation (AMR) Annotation Release 1.0 (LDC2014T12).


The source data includes discussion forums collected for the DARPA BOLT and DEFT programs, transcripts and English translations of Mandarin Chinese broadcast news programming from China Central TV, Wall Street Journal text, translated Xinhua news texts, various newswire data from NIST OpenMT evaluations and weblog data used in the DARPA GALE program. The following table summarizes the number of training, dev, and test AMRs for each dataset in the release. Totals are also provided by partition and dataset:

Dataset                  Training    Dev    Test    Totals
BOLT DF MT                   1061    133     133      1327
Broadcast conversation        214      0       0       214
Weblog and WSJ                  0    100     100       200
BOLT DF English              6455    210     229      6894
DEFT DF English             19558      0       0     19558
Guidelines AMRs               819      0       0       819
2009 Open MT                  204      0       0       204
Proxy reports                6603    826     823      8252
Weblog                        866      0       0       866
Xinhua MT                     741     99      86       926
Totals                      36521   1368    1371     39260

For those interested in utilizing a standard/community partition for AMR research (for instance in development of semantic parsers), data in the “split” directory contains 39,260 AMRs split roughly 93%/3.5%/3.5% into training/dev/test partitions, with most smaller datasets assigned to one of the splits as a whole. Note that splits observe document boundaries. The “unsplit” directory contains the same 39,260 AMRs with no train/dev/test partition.
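
The rough 93%/3.5%/3.5% figure can be checked from the per-dataset counts in the table above. The following is a small sketch of that arithmetic only; the variable names are arbitrary and nothing here reads the actual release files.

```python
# (training, dev, test) AMR counts per dataset, copied from the table above.
counts = {
    "BOLT DF MT": (1061, 133, 133),
    "Broadcast conversation": (214, 0, 0),
    "Weblog and WSJ": (0, 100, 100),
    "BOLT DF English": (6455, 210, 229),
    "DEFT DF English": (19558, 0, 0),
    "Guidelines AMRs": (819, 0, 0),
    "2009 Open MT": (204, 0, 0),
    "Proxy reports": (6603, 826, 823),
    "Weblog": (866, 0, 0),
    "Xinhua MT": (741, 99, 86),
}

# Sum each column across datasets to get the partition sizes.
train_n, dev_n, test_n = (sum(column) for column in zip(*counts.values()))
total_n = train_n + dev_n + test_n

# Share of each partition, in percent, rounded to one decimal place.
shares = [round(100 * n / total_n, 1) for n in (train_n, dev_n, test_n)]

print(train_n, dev_n, test_n, total_n)  # 36521 1368 1371 39260
print(shares)                           # [93.0, 3.5, 3.5]
```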

1 download
Last Released: Nov 6, 2017
Abacus Dataverse Network - British Columbia Research Library Data Services - Hosted at the University of British Columbia © 2017