GCC2015 Lightning Talks

Lightning talks are a mixture of topics selected in advance and topics solicited during the meeting. They will be presented during Session 4 on Tuesday and Session 8 on Wednesday.

The slides for all lightning talks will be made available on this page, and the talks may be videotaped and posted here as well.


Lightning Talk Session 1

Tuesday, 17:15-18:00

Cancer Genomics in Galaxy

Marco Albuquerque1, Bruno Grande1, Dr. Ryan Morin1

1 Simon Fraser University

Slides

An inherent difficulty in data-driven biology is the multidisciplinary skill set a scientist needs to draw meaningful inferences from complex data sets. Cancer genomics epitomizes this problem: the advent of next-generation sequencing (NGS) has brought a concomitant need for computational analysis. Despite the myriad of available algorithms, data analysis remains a bottleneck because of cryptic command-line parameters, inflexible system environments with difficult installations, and demanding hardware requirements. Our project addresses these issues directly by building a cancer genomics toolbox of parallelized tools and workflows for the cloud-ready Galaxy platform. The toolbox will contain some 50 new Galaxy tools spanning several sub-categories, notably variant calling, visualization, and helper tools for integrating and summarizing results. These will be combined with existing tools to form Galaxy workflows. Following Galaxy best practices, users will be able to install our tools automatically. To maximize accuracy, workflow design and tool parameterization will be informed by benchmarking results from the ICGC-TCGA DREAM challenges. All tools and workflows will be developed for optimal parallelism in a cluster environment, and we will extend Galaxy's currently incomplete Map-Reduce parallelization framework with new split and merge functions for the NGS data types our tools use. Ultimately, this will provide a competitive graphical user interface for cancer genome analysis that we hope will find a home in clinics around the world, advancing the field of personalized medicine.
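
In outline, Galaxy's split/merge parallelism breaks an input dataset into chunks, runs the tool on each chunk, and merges the per-chunk outputs. The sketch below illustrates that idea for FASTQ. It is not the authors' code and not Galaxy's datatype API. It assumes plain four-line FASTQ records and splits only on record boundaries, so no read is ever cut in half. The function names are hypothetical.

```python
def read_fastq_records(lines):
    """Group FASTQ lines into (header, sequence, separator, quality) records.

    Assumes plain four-line records (no multi-line sequences).
    """
    lines = [line.rstrip("\n") for line in lines]
    if len(lines) % 4 != 0:
        raise ValueError("FASTQ input must contain a multiple of 4 lines")
    return [tuple(lines[i:i + 4]) for i in range(0, len(lines), 4)]


def split_records(records, n_chunks):
    """Split records into at most n_chunks contiguous, near-equal chunks.

    Splitting on whole records keeps every read intact; empty chunks are
    dropped when there are fewer records than requested chunks.
    """
    if n_chunks < 1:
        raise ValueError("n_chunks must be at least 1")
    size, extra = divmod(len(records), n_chunks)
    chunks, start = [], 0
    for i in range(n_chunks):
        # The first `extra` chunks each take one additional record.
        end = start + size + (1 if i < extra else 0)
        if end > start:
            chunks.append(records[start:end])
        start = end
    return chunks


def merge_chunks(chunks):
    """Concatenate per-chunk results in their original order."""
    return [record for chunk in chunks for record in chunk]
```

Because the chunks are contiguous and merged in order, merging the split output reproduces the original record order. A real tool would apply this to its own output files rather than to in-memory lists.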

Factors impacting bacterial gut microbiome community composition in wild non-human primates in Taï National Park, Côte d’Ivoire

Jan F. Gogarten1,2,3, Michael C. Nelson4, Joerg Graf4, Johann P. Gogarten4, Roman Wittig2, Sébastien Calvignac-Spencer3, Fabian H. Leendertz3

1 Department of Biology, McGill University, Montreal, Canada
2 Department of Primatology, Max Planck Institute for Evolutionary Anthropology, Leipzig, Germany
3 Project Group Epidemiology of Highly Pathogenic Microorganisms, Robert Koch Institute, Berlin, Germany
4 Department of Molecular and Cell Biology, University of Connecticut, Storrs, Connecticut, USA

Slides

Host microbiomes are estimated to make up as much as 90% of the cells and 99% of the unique genes in host organisms. Increasing evidence suggests that the microbiome affects a broad array of processes, including a host’s ability to access nutrients and host health through mechanisms such as pathogen exclusion and immune-system priming. Yet little is known about how the microbiomes of individual hosts assemble, particularly in the ecological contexts in which they evolved. We examined several factors hypothesized to play a role in gut bacterial microbiome assembly within an ecosystem, such as a host’s species, social group, relatedness, and interactions with pathogens. We collected fecal samples from wild primates that regularly form mixed-species associations in Taï National Park, Côte d’Ivoire. We generated amplicons covering the 16S rRNA V4 hypervariable region and sequenced them on an Illumina MiSeq. To understand factors influencing within-species variation, we concentrated our sampling on sooty mangabeys (Cercocebus atys). Using non-metric multidimensional scaling and the envfit function from the R package “vegan”, we found that different samples from the same individual had consistent bacterial community compositions, and that differences between individuals were related to the host’s species, social group, and familial relationships. In contrast to humans and chimpanzees infected with primate immunodeficiency viruses, and mirroring results from gorillas, infection with simian immunodeficiency viruses was not a significant predictor of a host’s bacterial community. We hope to use this dataset to examine whether social factors such as grooming and spatial proximity influence the composition of gut bacterial communities.

JBrowse as a Galaxy Tool

Eric Rasche1

1 Center for Phage Technology

Slides

JBrowse as a Galaxy tool is a novel addition that makes it simple to summarise the results of genome annotation workflows as a single user-friendly, interactive HTML dataset.

Unlike traditional visualisations, it can be used as a component of workflows and downloaded to disk at any time.

Giving Galaxy Hands

Cameron Smith1, Torsten Houwaart1, Bjoern Gruening1

At the first Freiburg open data hackathon, the Freiburg Galaxy team jumped at the opportunity to take Galaxy outside the box. We built a proof of concept: ev3, a robot controlled by Galaxy.

GenAP-Galaxy on cvmfs

David Morais, Michel Barrette, David Bujold, Carol Gauthier, Kuang Chung Chen, Simon Nderitu, Maxime Levesque, Bryan Caron, Alain Veilleux, Pierre-Etienne Jacques, Guillaume Bourque

GenAP is a Canadian platform that hosts genetic and genomic tools. In GenAP, users can instantiate their own Galaxy through a web portal. The platform was designed to be installed at all major HPC centres in Canada.

We use the CernVM File System (CVMFS) to install software and propagate it across HPC sites.

Galaxy itself is installed in CVMFS, which provides a single central point of installation and maintenance wherever in Canada it is running.

bioaRchive: Enabling reproducibility of Bioconductor analyses in Galaxy

Nitesh Turaga1, Eric Rasche2, Enis Afgan1, Dannon Baker1, Galaxy team

1 Johns Hopkins University
2 Center for Phage Technology, TAMU

Project Website: bioarchive.galaxyproject.org
Source Code: https://github.com/bioarchive/aRchive_source_code
License: MIT

Slides (PDF)

The Bioconductor suite provides bioinformatics tools in the form of R packages, which are upgraded frequently. However, once a Bioconductor package is upgraded, its previous versions are difficult to retrieve, which causes interoperability problems between Galaxy and Bioconductor. The Galaxy Tool Shed lets Galaxy administrators easily install specific versions of tools, including Bioconductor packages, but not every Bioconductor package version is available. To address this, we have implemented bioaRchive, a repository of all versions of all Bioconductor packages from which any version can be easily retrieved.

Lightning Talk Session 2

Wednesday, 17:00-17:50

Dynamic Job Mapping in Galaxy Simplified

Daniel Bouchard, Philip Mabon, Eric Enns

Dynamic job mapping is a very useful Galaxy feature for administrators, but it may be underused. We have developed a dynamic job mapper that is suitable for any tool. Adding a rule for a tool is simple, and new rules take effect without restarting the Galaxy server.

BioMAJ2Galaxy: automatic update of reference data in Galaxy using BioMAJ

Anthony Bretaudeau1,2, Cyril Monjeaud2, Yvan Le Bras2, Fabrice Legeai1,3, Olivier Collin2

1 INRA, UMR Institut de Génétique, Environnement et Protection des Plantes (IGEPP), BioInformatics Platform for Agroecosystems Arthropods (BIPAA), Campus Beaulieu, 35042 Rennes, France
2 INRIA, IRISA, GenOuest Core Facility, Campus de Beaulieu, 35042 Rennes, France
3 INRIA, IRISA, GenScale, Campus de Beaulieu, 35042 Rennes, France

Slides

BioMAJ is widely used, stable software designed to automate the download and transformation of data from various sources. To make it easier to provide reference data in Galaxy, we have developed the BioMAJ2Galaxy module. With this module, BioMAJ can be configured to automatically download reference data, convert and/or index it into various formats, and then make it available on a Galaxy server through data libraries or data managers.

These developments allow us to integrate reference data into Galaxy in an automatic, reliable, and disk-space-saving way. The code is freely available on the GenOuest GitHub account (https://github.com/genouest/biomaj2galaxy).

Call me, maybe – Integrating candiSNP visualisation into Galaxy

Christian Schudoma1, Martin Page1, Dan MacLean1

1 The Sainsbury Laboratory, Norwich, UK

Slides

CandiSNP generates per-chromosome plots of SNPs across genomes. It shows the location of each SNP and uses colour-coded palettes to indicate whether the SNP lies in a coding or non-coding region and whether it is synonymous or non-synonymous. We built a Galaxy tool that communicates with the CandiSNP server and pulls in its reply, integrating SNP visualisation into variant calling and annotation workflows.
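
The synonymous/non-synonymous colouring depends on whether a coding SNP changes the encoded amino acid. The abstract does not say how CandiSNP computes this. The sketch below shows the standard check, under two assumptions: the reference codon containing the SNP is already known, and the standard genetic code applies. `classify_snp` is a hypothetical helper, not part of CandiSNP or its server interface.

```python
# Standard genetic code, with bases ordered T, C, A, G at each codon position.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    first + second + third: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, first in enumerate(BASES)
    for j, second in enumerate(BASES)
    for k, third in enumerate(BASES)
}


def classify_snp(ref_codon, position, alt_base):
    """Classify a SNP at 0-based `position` within `ref_codon` (hypothetical helper).

    Returns "synonymous" if the encoded amino acid (or stop, '*') is unchanged,
    otherwise "non-synonymous". Assumes the standard genetic code.
    """
    ref_codon = ref_codon.upper()
    alt_base = alt_base.upper()
    if ref_codon not in CODON_TABLE:
        raise ValueError(f"not a DNA codon: {ref_codon!r}")
    if position not in (0, 1, 2) or alt_base not in BASES:
        raise ValueError("position must be 0-2 and alt_base one of A, C, G, T")
    # Substitute the alternate base into the codon and compare translations.
    alt_codon = ref_codon[:position] + alt_base + ref_codon[position + 1:]
    if CODON_TABLE[ref_codon] == CODON_TABLE[alt_codon]:
        return "synonymous"
    return "non-synonymous"
```

For example, `classify_snp("CTT", 2, "C")` returns "synonymous" because both codons encode leucine. `classify_snp("GAA", 0, "A")` returns "non-synonymous" because glutamate becomes lysine. SNPs outside coding regions would need to be filtered out first using a gene annotation, which this sketch does not cover.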

iPlant Collaborative – Community Cyberinfrastructure for Life Science

Jason Williams1, John Fonner2

1 Cold Spring Harbor Laboratory
2 Texas Advanced Computing Center, University of Texas at Austin

iPlant Collaborative

Slides

The iPlant Collaborative (www.iplantcollaborative.org) develops a comprehensive cyberinfrastructure (CI) for the storage, sharing, and analysis of large datasets – from genomes to phenotype data, and beyond. iPlant offers easy-to-use tools that cover a variety of genotype-phenotype analyses (e.g. genome assembly, annotation, RNA-Seq, GWAS, and image analysis) in a platform that accommodates every level of user – from “bench biologist” to bioinformatician. Computational resources include generous storage allocations as well as access to high-performance and cloud computing. iPlant platforms are extensible and customizable via application programming interfaces (APIs), RESTful services, and web-based systems for data access, tool integration, and analysis. Training and online learning materials place people and collaboration at the center of the CI. Funded by the National Science Foundation (#DBI-0735191), iPlant is driven by and freely available to the community.

Training Microbiologists in NGS Analysis: from CLI to Galaxy: Pain to Gain

Ali Al-Shahib1

1 Public Health England, London

Slides

Training non-bioinformaticians to analyse NGS data can be a challenging task. Before Galaxy, we at Public Health England (PHE) taught lab scientists how to analyse NGS data using the command line. Now we have introduced Galaxy into our training programmes, and the scientists love it! I will present the methodology of, and the feedback from, the training we provide at PHE.

Publish your tools: for real!

Rob Davidson1, Scott Edmunds, Peter Li, Chris Hunter, Jesse Xiao

1 GigaScience

Slides

The Galaxy community is always talking about publishing tools and workflows, meaning putting them in the Tool Shed. GigaScience is an Open Access journal for big data and analyses in the biosciences. We offer to peer-review your Galaxy workflows and then provide them with a DOI: now that’s publishing for real. We’ll quickly talk through our Open Peer Review and other Open processes, including our own GigaGalaxy server. GigaScience is currently calling for submissions to our ‘Galaxy Series’, with a discount for any project presented at GCC2015.

A system to validate published data and studies

Tazro Ohta1, Ryota Yamanaka2, Osamu Ogasawara3, Yoshinobu Masatani4, Shigetoshi Yokoyama4, Kento Aida4

1 Database Center for Life Science, http://dbcls.rois.ac.jp

We will introduce our project to develop a system that validates sequencing data submitted to public repositories such as DDBJ’s Sequence Read Archive. As a proof of concept, we have developed a system that fetches data from a public repository and runs a workflow on a Docker-based Galaxy system on a large computing infrastructure.

GCC2015 Hackathon Report

GCC2015 Hackathon Organizers (Code and Data!)

Presented by Carrie Ganote

This talk gives a brief overview of our two hackathon sessions, Code and Data, summarising the major points of the projects we worked on.

Background

Goals

Lightning talks are your opportunity to give an impassioned and enthralling talk about something that you care about – but you only have 300 seconds. Make every one count, because your audience may include people suffering from limited attention spans this late in the proceedings.

Timing

  • Lightning talks are 5 minutes followed by 2 minutes for questions.
  • At 5 minutes in, thunder will be played.
  • At 6 minutes in we will take over the presentation laptop and start switching to the next set of slides.
  • At 7 minutes the next talk will start, no matter what.

Slides

  • Your slides (as PDF or PowerPoint) should be on the presentation computer before the session starts (talk to Dave Clements), to minimize the risks of bringing your own device (BYOD).
  • You can BYOD (your own computer or whatever) but you are advised not to.
  • If you do BYOD, we will start swapping out your device at 2 minutes left, rather than 1.
  • Connection and fiddling time beyond the first minute comes out of your 5 minutes and is painful, for everyone.

Gratuitous Advice

From Ross Lazarus, the former Benevolent Lightning Session Moderator for Life

  • Good lightning talks are well rehearsed and very, very focussed.
  • Plan on talking to 5 or 6 slides
  • Don’t try to cram a 30 minute talk into 5 minutes. It won’t fit.
  • 5 minutes is not long enough to explain anything in detail. Just give the big picture.
  • Practice your talk at least 3 times to make sure it works and fits in 5 minutes.
  • If you have more than 5 or 6 slides, you are probably screwed before you start and stand a high risk of being cut off in mid-flight unless you have rehearsed a few times with a timer to be sure you can fit everything in.
  • You are advised not to read your acknowledgements out loud. It’s a lightning talk for heaven’s sake.