Research and experimental data from principal investigators and others at the University of British Columbia.


If you are a UBC researcher who wishes to deposit data, please see UBC's research data site and we will work with you to make your data available, or contact us at

UBC Research Data Collection
Studies: 40 | Downloads: 2695

The population of Metro Vancouver (20110729Regional Growth Strategy Projections Population, Housing and Employment 2006 – 2041 File) is projected to increase greatly by 2040, and finding new sources of drinking water (2015_ Water Consumption_ Statistics File) will be essential. Drinking water supply needs to be optimized and estimated (Data Mining file) to support the development of the region. Metro Vancouver currently has three water reservoirs, Capilano, Seymour, and Coquitlam, from which treated water is supplied to customers. The linear optimization (LP) model (Optimization, Sensitivity Report File) gives the amount of drinking water for each reservoir and region. The B.C. government has a specific strategy for the growing population until 2040. In addition, a new source of drinking water (wells) needs to be estimated and monitored to anticipate the feasible supply until 2040, so the government will have to decide how much groundwater is used. The project has two goals: (1) an optimization model for the three water reservoirs, and (2) an estimate of the new water source to 2040.

The project uses six software tools: Trifacta Wrangler, AMPL, Excel Solver, ArcGIS, and SQL for analysis, and Tableau for visualization. The data analysis proceeds in four steps:
1. Trifacta Wrangler cleans the data (Data Mining file).
2. AMPL and Excel Solver optimize drinking water consumption for Metro Vancouver (data in the Optimization and Sensitivity Report file).
3. ArcMap (ArcGIS) combines the raw data with the results of the reservoir optimization and the population estimates to 2040 (GIS Map for Tableau file).
4. Tableau, with SQL, visualizes the estimated and optimized sources of drinking water for Metro Vancouver until 2040 (export tableau data file).

Production Date: November 23, 2017
Producer: University of British Columbia (UBC), Master of Engineering Leadership
Last Released: Nov 24, 2017
Cross-Laboratory Analysis of Brain Cell Type Transcriptomes with Applications to Interpretation of Bulk Tissue Data by Mancarci, B. Ogan; Toker, Lilah; Tripathy, Shreejoy J; Li, Brenna; Rocco, Brad; Sibille, Etienne; Pavlidis, Paul

Establishing the molecular diversity of cell types is crucial for the study of the nervous system. We compiled a cross-laboratory database of mouse brain cell type-specific transcriptomes from 36 major cell types from across the mammalian brain using rigorously curated published data from pooled cell type microarray and single cell RNA-sequencing studies. We used these data to identify cell type-specific marker genes, discovering a substantial number of novel markers, many of which we validated using computational and experimental approaches. We further demonstrate that summarized expression of marker gene sets in bulk tissue data can be used to estimate the relative cell type abundance across samples. To facilitate use of this expanding resource, we provide a user-friendly web interface at
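The idea of summarizing marker gene expression to compare relative cell type abundance across samples can be illustrated with a minimal sketch. It assumes a toy expression table and hypothetical gene names, and it summarizes markers as the mean of per-gene z-scores, which is a simple stand-in and not necessarily the summary statistic used in the publication.

```python
from statistics import mean, pstdev


def zscores(values):
    """Standardize one gene's expression values across samples."""
    mu = mean(values)
    sd = pstdev(values)
    if sd == 0:
        # A gene with no variation carries no information about relative abundance.
        return [0.0 for _ in values]
    return [(v - mu) / sd for v in values]


def marker_profile(expression, markers):
    """Average z-scored expression of a marker gene set, per sample.

    expression: dict mapping gene name -> list of values (one per sample)
    markers: marker genes for one cell type (hypothetical names in the toy data)
    """
    present = [g for g in markers if g in expression]
    if not present:
        raise ValueError("none of the marker genes are in the expression table")
    standardized = [zscores(expression[g]) for g in present]
    n_samples = len(standardized[0])
    return [mean(row[i] for row in standardized) for i in range(n_samples)]


# Toy data: three bulk tissue samples, invented values.
expression = {
    "GeneA": [1.0, 2.0, 3.0],
    "GeneB": [10.0, 20.0, 30.0],
    "GeneC": [5.0, 5.0, 5.0],
}
profile = marker_profile(expression, ["GeneA", "GeneB", "Missing"])
```

A higher value in `profile` suggests a relatively higher abundance of the cell type in that sample compared with the other samples; it is a relative score, not an absolute cell count.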

Distribution Date: November 23, 2017
Related Publications: Mancarci, B. Ogan, Lilah Toker, Shreejoy J. Tripathy, Brenna Li, Brad Rocco, Etienne Sibille, and Paul Pavlidis. “Cross-Laboratory Analysis of Brain Cell Type Transcriptomes with Applications to Interpretation of Bulk Tissue Data.” eNeuro, November 20, 2017, ENEURO.0212-17.2017.
Last Released: Nov 23, 2017
Modeling sources of inter-laboratory variability in electrophysiological properties of mammalian neurons by Tebaykin, Dmitry; Tripathy, Shreejoy J.; Binnion, Nathalie; Li, Brenna; Gerkin, Richard C.; Pavlidis, Paul

Patch-clamp electrophysiology is widely used to characterize neuronal electrical phenotypes. However, there are no standard experimental conditions for in vitro whole-cell patch-clamp electrophysiology, complicating direct comparisons between datasets. Here, we sought to understand how basic experimental conditions differ among labs and how these differences might impact measurements of electrophysiological parameters. We curated the compositions of external bath solutions (ACSF), internal pipette solutions, and other methodological details such as animal strain and age from 509 published neurophysiology articles studying rodent neurons. We found that very few articles used the exact same experimental solutions as any other and some solution differences stem from recipe inheritance from adviser to advisee as well as changing trends over the years. Next, we used statistical models to understand how the use of different experimental conditions impacts downstream electrophysiological measurements such as resting potential and action potential width. While these experimental condition features could explain up to 43% of the study-to-study variance in electrophysiological parameters, the majority of the variability was left unexplained. Our results suggest that there are likely additional experimental factors that contribute to cross-laboratory electrophysiological variability, and identifying and addressing these will be important to future efforts to assemble consensus descriptions of neurophysiological phenotypes for mammalian cell types.

Production Date: November 22, 2017
Producer: Pavlidis lab (PavLab), University of British Columbia
1 download
Last Released: Nov 23, 2017
Transcriptomic correlates of neuron electrophysiological diversity by Tripathy, Shreejoy; Toker, Lilah; Li, Brenna; Crichlow, Cindy-Lee; Tebaykin, Dmitry; Mancarci, Ogan; Shreejoy Tripathy

How neuronal diversity emerges from complex patterns of gene expression remains poorly understood. Here we present an approach to understand electrophysiological diversity through gene expression by integrating pooled- and single-cell transcriptomics with intracellular electrophysiology. Using neuroinformatics methods, we compiled a brain-wide dataset of 34 neuron types with paired gene expression and intrinsic electrophysiological features from publicly accessible sources, the largest such collection to date. We identified 420 genes whose expression levels significantly correlated with variability in one or more of 11 physiological parameters. We next trained statistical models to infer cellular features from multivariate gene expression. Such models were predictive of gene-electrophysiological relationships in an independent collection of 12 visual cortex cell types from the Allen Institute, suggesting that these correlations might reflect general principles relating expression patterns to phenotypic diversity across very different cell types. Many associations reported here have the potential to provide new insights into how neurons generate functional diversity, and correlations of ion channel genes like Gabrd and Scn1a (Nav1.1) with resting potential and spiking frequency are consistent with known causal mechanisms. Our work highlights the promise and inherent challenges in using cell type-specific transcriptomics to understand the mechanistic origins of neuronal diversity.
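As a rough illustration of correlating a gene's expression with an electrophysiological parameter across cell types, the sketch below computes a Pearson correlation coefficient. The choice of Pearson correlation, the function name, and the toy values are assumptions made for illustration; the abstract does not state which statistic or significance test was used.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation between two equal-length numeric sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / sqrt(var_x * var_y)


# Toy data: one hypothetical gene measured in four cell types (invented values),
# paired with each cell type's resting membrane potential in mV.
gene_expression = [1.0, 2.0, 3.0, 4.0]
resting_potential_mv = [-70.0, -68.0, -66.0, -64.0]
r = pearson(gene_expression, resting_potential_mv)
```

In practice such a coefficient would be computed per gene and per parameter, followed by a correction for multiple testing before calling any correlation significant.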

Last Released: Sep 11, 2017
Multivariate maps of forest attributes for management units in Canada's boreal forest by Kyle Lochhead; Valerie LeMay; Gary Bull; Olaf Schwab; James Halperin

A spatially explicit “wall to wall” forest inventory of percent crown closure (CC), average height (Ht), average age (Age), and commercial species percentages derived using a multi-source inventory approach from Lochhead et al. (2017). This process involved compiling a suite of spatial layers including Landsat TM/ETM+ imagery, interpolated climate variables, topographic variables and other remote sensing products to be used as predictors and Canada’s National Forest Inventory photo plot information (1986-2010) as the source of dependent variables. Following Lochhead et al. (2017), the application used a method called kriging with external drift, which relies on a system of non-linear models with spatially varying parameters to produce multivariate maps for the year 2010 at a 90 m x 90 m pixel resolution. The resulting maps are clipped to each forest management area (FMA) boundary, which was sourced online. Where a 90 m x 90 m pixel window overlapped non-vegetated area (e.g., waterbodies), the forest attribute information was averaged using only the vegetated pixels (i.e., prorating). For more information on methods refer to the publication.
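The prorating step described above can be sketched as follows. This is an illustrative sketch only: the function name, the use of `None` to mark non-vegetated pixels, and the toy values are assumptions, not part of the published workflow.

```python
def prorated_mean(window):
    """Average a forest attribute over the vegetated pixels of one window.

    window: list of pixel values; None marks a non-vegetated pixel
    (e.g., a waterbody). Returns None if no pixel is vegetated.
    """
    vegetated = [value for value in window if value is not None]
    if not vegetated:
        return None
    return sum(vegetated) / len(vegetated)


# Toy window of crown closure (%) values with two water pixels (invented values).
window = [60.0, 80.0, None, 70.0, None]
crown_closure = prorated_mean(window)  # averages 60, 80 and 70 only
```

Dividing by the number of vegetated pixels rather than the full window size keeps water or other non-vegetated cover from pulling the attribute toward zero.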

Production Date: July 26, 2017
Producer: Kyle Lochhead
Distribution Date: July 26, 2017
Related Publications: Lochhead, K., LeMay, V., Bull, G., Schwab, O., Halperin, J. 2017. Multivariate imputation for accurate and logically-consistent maps of forest attributes at macroscales. (In press)
Last Released: Aug 15, 2017
UBC Research Data Management Survey: Health Sciences by Barsky, Eugene; Brown, Helen; Ellis, Ursula; Ishida, Mayu; Janke, Robert; Menzies, Erin; Miller, Katherine; Mitchell, Marjorie; Vis-Dunbar, Mathew

In 2016, the Canadian federal funding agencies introduced the Tri-Agency Statement of Principles on Digital Data Management, which advocates for developing data management plans (DMPs) and making data available for future research. A data management plan addresses questions about research data types and formats, metadata standards, ethics and legal compliance, data storage and reuse, assignment of data management responsibilities, and resource requirements. Anticipating that DMPs will increasingly be required in grant applications, librarians at the University of British Columbia surveyed researchers about their research data management (RDM) practices and needs in three phases, each targeting different disciplines: 1) the Sciences and Engineering (fall 2015), 2) the Social Sciences and Humanities (fall 2016), and 3) the Health Sciences (spring 2017). The surveys illuminate disciplinary differences in RDM, and will inform the University in developing infrastructure and services to support researchers in RDM. This report describes findings from the third survey at UBC, targeting researchers in the Health Sciences.

Production Date: 2017
Producer: UBC Library; UBC ARC
Distribution Date: May 16, 2017
Related Material: Barsky, E., Brown, H., Ellis, U., Ishida, M., Janke, R., Menzies, E., … Vis-Dunbar, M. (2017, May 31). UBC Research Data Management Survey : Health Sciences : Report. Available at
Last Released: Jun 8, 2017
Protease-inhibitor interaction predictions: Lessons on the complexity of protein-protein interactions by Fortelny, Nikolaus; Butler, Georgina; Overall, Christopher; Pavlidis, Paul

Protein interactions shape proteome function and thus biology. Identification of protein interactions is a major goal in molecular biology, but biochemical methods, although improving, remain limited in coverage and accuracy. Whereas computational predictions can guide biochemical experiments, low validation rates of predictions remain a major limitation. Here, we investigated computational methods in the prediction of a specific type of interaction, the inhibitory interactions between proteases and their inhibitors. Proteases generate thousands of proteoforms that dynamically shape the functional state of proteomes. Despite the important regulatory role of proteases, knowledge of their inhibitors remains largely incomplete with the vast majority of proteases lacking an annotated inhibitor. To link inhibitors to their target proteases on a large scale, we applied computational methods to predict inhibitory interactions between proteases and their inhibitors based on complementary data including coexpression, phylogenetic similarity, structural information, co-annotation, and colocalization, and also surveyed general protein interaction networks for potential inhibitory interactions. In testing nine predicted interactions biochemically, we validated the inhibition of kallikrein 5 by serpin B12. Despite the use of a wide array of complementary data, we found a high false positive rate of computational predictions in biochemical follow-up. Based on a protease-specific definition of true negatives derived from the biochemical classification of proteases and inhibitors, we analyzed prediction accuracy of individual features. Thereby we identified feature-specific limitations, which also affected general protein interaction prediction methods. Interestingly, proteases were often not coexpressed with most of their functional inhibitors, contrary to what is commonly assumed and extrapolated predominantly from cell culture experiments. Predictions of inhibitory interactions were indeed more challenging than predictions of non-proteolytic and non-inhibitory interactions. In summary, we describe a novel and well-defined but difficult protein interaction prediction task, and thereby highlight limitations of computational interaction prediction methods.

Production Date: 2017
Last Released: Apr 3, 2017

Type 2 innate lymphoid cells (ILC2) potentiate adaptive immune responses; however, whether they have a role in mediating cancer immunity has not been assessed. Here, we report that mice genetically lacking ILC2s have significantly increased tumour growth rates and higher frequency of circulating tumour cells (CTCs) and metastasis to distal organs. Our data support the conclusions that tumour-infiltrating ILC2s help mediate tumour immune-surveillance by promoting adaptive T cell responses and that ILC2s play a hitherto undescribed role in controlling metastasis. Furthermore, we demonstrate that adoptive transfer of ILC2s mediates a dramatic regression in cancer growth. Therefore, the adoptive transfer of ILC2s provides a new immunotherapeutic approach with the potential to aid in the eradication of cancers.

Producer: University of British Columbia (UBC), The Michael Smith Laboratories
Related Material: This is the link for citing the paper
Last Released: Mar 29, 2017
UBC Research Data Management Survey: Humanities and Social Sciences by Barsky, Eugene; Farrar, Paula; Meredith-Lobay, Megan; Mitchell, Marjorie; Naslund, Jo-Anne; Sylka, Christina

Executive Summary

Background

In June 2016, the Tri-Council Agencies released a statement regarding Digital Data Management for grant applications. In preparation to support researchers facing new requirements, UBC librarians on both the Vancouver and Okanagan campuses initially surveyed faculty in the Sciences in Fall 2015, to determine both the actual practices of Research Data Management (RDM) employed by these researchers, and areas where the researchers would like help. Acknowledging disciplinary differences, a second survey was administered to all faculty and graduate students in Humanities and Social Sciences in October 2016. The results of these surveys will assist the University in making evidence-based decisions about what expertise will be needed to support and assist faculty in improving their data management practices to meet new requirements from funding bodies.

Findings

Researchers are collecting and working with a wide variety of data ranging from numerical and text data to multimedia files, software, instrument specific data, geospatial data, and many other types of data. Researchers identified four broad areas where they would like additional help and support:
1. Data Storage (including preservation and sharing)
2. Data Management Plans
3. Data Repository access
4. Data Education (workshops and personalized training)
These areas present opportunities for the Library and campus partners to bolster research excellence by supporting strong RDM practices of Faculty, Students and Staff.

Recommendations

1. The Library continues to collaborate with VPR’s Advanced Research Computing (ARC) unit, UBC Ethics, UBC IT Services, and other campus partners to plan and coordinate services for researchers around the management of research data.
2. UBC ensures that a robust infrastructure is available to researchers to store, preserve, and share their research data.
3. UBC implements a campus-wide service to support a Data Management Repository (or suite of repositories) which would include the Abacus Dataverse (currently operated by the Library).

Conclusions

A more detailed statistical analysis is underway, but initial results show that the majority of survey respondents indicated that they need assistance with storage and security of research data, with crafting data management plans, with a centralized research data repository, and with workshops about research data best practices for faculty and especially for graduate students. Further, understanding the particular needs or habits within specific research disciplines will provide insights into how these researchers think about and work with data, and can also identify areas for future research and investigation. Finally, this survey has provided a fuller understanding of the RDM needs and perceived barriers and benefits, which can now enable more targeted and nuanced conversations between librarians, researchers, and IT research support personnel. These results will assist the Library and other campus partners with the development of specific programs and infrastructure to bolster a strategic direction for RDM support.

Production Date: 2016
Distribution Date: February 03, 2017
Related Material: Full report for this survey is available -
Last Released: Feb 23, 2017

This minimum dataset includes variables reported in our publication entitled 'The Mother’s Autonomy in Decision Making (MADM) Scale: Patient-led development and psychometric testing of a new instrument to evaluate experience of maternity care', published by PLOS ONE. Secondary use of this data must be approved by an ethics committee/institutional review board. Data is available in csv format and in SPSS (version 23). Please contact Saraswathi Vedam at or Kathrin Stoll at if you would like access to the data. For your information, the original ethics application number at the University of British Columbia was H12-02418.

Related Publications: Saraswathi Vedam, Kathrin Stoll, Kelsey Martin, Nicholas Rubashkin, Sarah Partridge, Dana Thordarson, Ganga Jolicoeur & the Changing Childbirth in BC Steering Council. The Mother’s Autonomy in Decision Making (MADM) Scale: Patient-led development and psychometric testing of a new instrument to evaluate experience of maternity care.
7 downloads + analyses
Last Released: Feb 6, 2017
Abacus Dataverse Network - British Columbia Research Library Data Services - Hosted at the University of British Columbia © 2017