GCC2015 Posters

The deadline for poster presentations has passed. However, late poster abstracts are still being accepted and will be considered until the poster space is full, or as cancellations occur.

Odd-numbered posters will be presented on Tuesday from 15:00 to 16:20, and even-numbered posters will be presented on Wednesday from 15:00 to 16:20.

Posters are size A0, 1189 mm wide by 841 mm tall, and will be hung with push-pins.

P01: Towards Bioinformatics for All: Galaxy at UoM

Peter Briggs¹, Ian Donaldson¹, Sarah Griffiths¹, Leo Zeef¹

¹ University of Manchester

Poster

As part of the Bioinformatics Core Facility (BCF) at the University of Manchester (UoM) we have developed a number of bespoke Galaxy tools to support local researchers conducting next generation sequencing (NGS) analyses. The tools are accessible via a private local Galaxy instance maintained by the BCF, but are also available to the wider Galaxy community via the public Galaxy toolshed.

In collaboration with researchers we were able to help improve the detection of microsatellites by implementing Trimmomatic and PALfinder as Galaxy tools. This now allows non-bioinformaticians to analyse their own data, circumventing installation and use of command line programs. Additionally we have developed a set of ChIP-seq analysis tools (Trimmomatic, MACS2, CEAS, Weeder2, RnaChipIntegrator) that allows our users to further explore their data after it has left the BCF. The tools also provide a framework for tutorials about ChIP-seq analysis.

Our ongoing aim is to maintain and develop a local Galaxy instance that provides researchers with the means to run bioinformatics tools that they would not otherwise be able to use, and provide a means of easily rerunning analyses.

P02: The Galaxy framework as a concept for a national system for monitoring and surveillance of infectious disease

Arnold Knijn¹, Massimiliano Orsini², Valeria Michelacci¹, Stefano Morabito¹

¹ Istituto Superiore di Sanità, Rome, Italy
² Istituto Zooprofilattico Sperimentale dell’Abruzzo e Molise, Teramo, Italy

Poster

A proposal has been submitted to a national call of the Italian Ministry of Health concerning the creation of a National Information System for the collection of genomic data in the field of veterinary public health, with the aim of deploying a state of the art molecular epidemiology approach to the surveillance of food-borne zoonoses and infectious diseases at the human and animal interface. The concept described revolves around the creation of a nation-wide distributed cluster of pathogen-specific databases of NGS and epidemiological data hosted on servers present at each of the participating institutes. A common framework for the comparison of such data will complete the system, with the aim of detecting clusters of cases and providing convincing evidence to link cases of disease and sources of infection. The databases will be replicated on each server constituting the DB-cluster. The redundancy arising from the replication process is meant to guarantee distributed access to the Information System, high availability of the hosted data and a geographically distributed disaster recovery capability, and to enable load-balancing of queries at each node, increasing the performance of access to the analytical pipelines in case of heavy traffic on any of the servers. All the nodes of the network will use the same Information System, implemented in the open source framework Galaxy.

P03: Gene identifier matching to join publicly available databases for the generation of a Mammalian Ortholog and Annotation Database with access from Galaxy-server

Jochen Bick¹, Mark Robinson², Susanne E. Ulbrich¹, Stefan Bauersachs¹

¹ Animal Physiology at ETH Zurich
² Institute of Molecular Life Sciences at University of Zurich

Poster

A number of well-organized databases already contain useful information regarding orthologous genes, e.g., the EnsemblCompara ortholog database (Ecodb). The main problem when using information derived from different databases is correctly assigning the different gene, transcript or protein identifiers. For example, Ecodb does not provide NCBI EntrezGene identifiers, and the assignment available in BioMart is incomplete and contains errors. However, because the NCBI annotation is the most comprehensive for most species, we need to map information from other databases to EntrezGene IDs. This is an important issue for the generation of a Mammalian Ortholog and Annotation Database (MOA-Db), which will be partially based on information from publicly available databases that needs to be collected, analyzed, and connected. Since each public source database uses its own unique identifiers, it is necessary to assign the corresponding database-specific identifiers. Existing lists that assign corresponding genes, e.g. between Ensembl and EntrezGene, are incomplete and/or contain errors. Therefore, missing information needs to be calculated and duplicates need to be handled. R Bioconductor packages were used to find overlapping gene and exon positions, which were integrated as a lookup table into the MySQL database to handle the comparison of different database sources. Finally, this database will be integrated into our local Galaxy-server to give easy access to all our research groups and provide a useful interface with various options to parse information via SQL queries. The MOA-Db provides a basis for optimal across-species comparisons of transcriptome datasets from different mammalian species, accessible within a Galaxy-server.
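The position-based matching described above can be illustrated with a minimal sketch. The abstract states that R Bioconductor packages were used; the Python below is only an illustration of the general idea of linking genes from two annotation sources by genomic overlap. The function names, the tuple layout, the 50% overlap threshold and the example identifiers are all assumptions made for this sketch, not details of MOA-Db.

```python
def overlap(a_start, a_end, b_start, b_end):
    """Number of bases shared by two closed intervals (0 if disjoint)."""
    return max(0, min(a_end, b_end) - max(a_start, b_start) + 1)


def match_by_position(source_genes, target_genes, min_fraction=0.5):
    """Build a lookup table {source_id: [target_id, ...]}.

    Each gene is a tuple (gene_id, chrom, strand, start, end).
    Two genes are linked when they lie on the same chromosome and strand
    and share at least `min_fraction` of the shorter gene's length.
    The threshold is a hypothetical choice for illustration.
    """
    by_locus = {}
    for gene_id, chrom, strand, start, end in target_genes:
        by_locus.setdefault((chrom, strand), []).append((gene_id, start, end))

    lookup = {}
    for gene_id, chrom, strand, start, end in source_genes:
        hits = []
        for other_id, o_start, o_end in by_locus.get((chrom, strand), []):
            shared = overlap(start, end, o_start, o_end)
            shorter = min(end - start, o_end - o_start) + 1
            if shared / shorter >= min_fraction:
                hits.append(other_id)
        # An empty list marks a gene that still needs manual assignment;
        # more than one hit marks a duplicate that needs to be resolved.
        lookup[gene_id] = sorted(hits)
    return lookup


# Made-up identifiers and coordinates, for illustration only.
ensembl_like = [
    ("ENSG_A", "1", "+", 100, 200),
    ("ENSG_B", "1", "+", 500, 600),
    ("ENSG_C", "2", "-", 100, 200),
]
entrez_like = [
    ("101", "1", "+", 150, 250),
    ("102", "1", "-", 100, 200),
    ("103", "1", "+", 590, 900),
]
lookup_table = match_by_position(ensembl_like, entrez_like)
```

A table of this shape could then be loaded into a relational database as a two-column lookup table; genes with no hit or with several hits are the cases the abstract refers to as missing information and duplicates.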

P04: GenAP: A platform to provide Biomedical tools throughout Canadian HPCS

David Anderson de Lima Morais¹, Michel Barrette¹, David Bujold², Carol Gauthier¹, Kuang Chung Chen², Simon Nderitu², Maxime Levesque¹, Bryan Caron², Alain Veilleux¹, Pierre-Etienne Jacques¹, Guillaume Bourque²

¹ Université de Sherbrooke, Sherbrooke, Quebec, Canada
² McGill University, Montreal, Quebec, Canada

Poster

The Genetics and Genomics Analysis Platform (GenAP) is a computing platform for life sciences researchers. GenAP offers three components: a web portal from which users have access to tools (UCSC browser) and platforms (Galaxy); bioinformatics software and libraries, distributed via CERN Virtual Machine File System (CVMFS); and bioinformatics software pipelines.

In Galaxy-GenAP, we use a hybrid system involving cloud images on an HPC facility to provide private Galaxy instances to our users. These private instances are only available to a project Principal Investigator (PI), their group members, and any external member that the PI chooses to add. GenAP is fully integrated with Compute Canada (CC) and all Galaxy jobs are counted toward the users’ CC resource allocation in any HPC cluster.

GenAP was designed to be portable to any HPC center in Canada and in our second phase we will increase the number of hosts. To facilitate the installation of the platform we are currently integrating Galaxy and the CERN Virtual Machine File System (CVMFS). In this case Galaxy will be installed on the main CVMFS repository (stratum 0) and any HPC facility running a mirror client (stratum 1) will receive the Galaxy code, tools and updates automatically.

Through GenAP, Galaxy has been integrated into curricular courses at McGill and Sherbrooke Universities and is a fundamental part of several workshops. We aim to have GenAP and Galaxy integrated in most major HPC centers.

P05: Galaxy in teaching computational methods of genome analysis for master degree students in Medical Genetics program at the Faculty of Medicine, Vilnius University

Erinija Pranckeviciene¹, Laima Ambrozaityte¹, Ingrida Uktveryte¹, Algirdas Utkus¹, Vaidutis Kucinskas¹

¹ Department of Human and Medical Genetics, Faculty of Medicine, Vilnius University

Poster

A Master’s program in Medical Genetics is offered by the Department of Human and Medical Genetics, Faculty of Medicine, Vilnius University. In this program, computational analysis of genomic data constitutes a considerable part of the practical exercises. In the “Genome analysis” seminar and the “Biotechnology and fundamentals of bioinformatics analysis” course, students are introduced to a computational pipeline for next generation sequencing (NGS) data analysis, starting with quality assessment of raw exome sequencing data and ending with interpretation of the identified genomic variants. In class, students use data from scientific articles and have to reproduce some of the published results.

For these courses Galaxy runs on a Hardware-as-a-Service server (2 hexa-core Intel Xeon E5-2630L CPUs, 8 processing units, 8192 MB RAM and 320 GB disk space).

Using Galaxy in teaching and learning is a novel approach at the Department of Human and Medical Genetics. A noted benefit of this approach is that students without previous exposure to bioinformatics efficiently grasp complex concepts and share “know-how” of tools. Little effort is needed to get used to the Galaxy interface and its visualization capabilities. Practical computations are evaluated directly in the students’ named histories by step-by-step inspection. The benefits of using Galaxy in teaching will be evaluated at the end of the course by qualitative analysis (interviews with students).

P06: Galaxy in Public Health: the Microbial Genomics Virtual Laboratory

Simon Gladman¹, Nuwan Goonasekera¹, Clare Sloggett¹, Dieter Bulach¹, Torsten Seemann¹, Andrew Lonie¹

¹ VLSCI, University of Melbourne, Australia

Poster

The uptake of genomics in public health and clinical microbiology laboratories is being slowed by the perceived requirement that each laboratory establish and evaluate its own tools and infrastructure, which is counterproductive and will result in a lack of standardisation of methods.

An easily instantiated computer image based around Galaxy with a defined set of microbial-specific tools and reference data is an ideal solution for enabling standardisation between laboratories. We have established the Genomics Virtual Laboratory [GVL: http://genome.edu.au] to empower laboratories to establish their own private operating environment to securely analyse their own data using software and analysis methods that are widely used for microbial genomics, in a reproducible manner suited to government accreditation.

The GVL consists of a set of machine images for performing genomics analyses in a scalable, reproducible manner, plus web tools for instantiating and managing the images on multiple cloud architectures. The images incorporate a number of pre-configured genomic analysis platforms including Galaxy, the Linux command line, RStudio and IPython Notebook.

The GVL images are constructed from Ansible scripts, which makes them straightforward to customise. Here we present a flavour of the GVL fully tailored to microbial genomics (MGVL) by incorporating various microbial analysis pipelines and tools for both the Galaxy environment and the command line.

The Genomics Virtual Laboratory project is funded by the federal NeCTAR and ANDS programs (http://nectar.org.au; http://ands.org.au).

P07: 16S rDNA amplicon sequencing data analysis in Galaxy

Loïc Bourgeois¹, Amalia Soenens¹, Nuria Lozano¹, Juan Imperial¹

¹ Centro de Biotecnología y Genómica de Plantas (CBGP), Universidad Politécnica de Madrid, Campus de Montegancedo, 28223 Pozuelo de Alarcón, Madrid, Spain

Poster

Most biologists can easily access NGS technologies and data in order to characterize the microbial diversity of a sample with 16S rDNA amplicon sequencing. However, the output of this kind of experiment can be challenging to handle. We assessed the different options for 16S rDNA amplicon data analysis in Galaxy, and will highlight the benefits and drawbacks of the existing solutions. Even though the bioinformatics community now provides numerous tools for treating this sort of data, determining which software best fits the user’s needs is not trivial for several reasons. To begin with, some of this software is not easy to install, which can be a first barrier. In line with this, most tools do not provide any GUI, which can be tedious for people not used to the UNIX environment. Finally, a critical point is that even if the available software usually provides similar core steps for the analysis of 16S rDNA amplicon data, the tools do not always use the same approaches. Moreover, many different algorithms can be used for each step of the analysis. The choice of software and algorithms is important, as it will impact the output of the experiment, and it depends on the characteristics of the data and the user’s experience. Galaxy removes the tool installation and GUI barriers, on top of its other intrinsic benefits, which allows users to focus on the data analysis itself.

P08: A Galaxy approach to microbial data integration: the USMI Galaxy Demonstrator

Daniele Pierpaolo Colobraro¹, Paolo Romano¹

¹ IRCCS AOU San Martino IST

Poster

Many application domains, such as health, food, energy and waste management, exploit research on micro-organisms, information on which is distributed across many heterogeneous repositories.

The Microbial Resource Research Infrastructure (MIRRI) aims to orchestrate European microBiological Resource Centers (mBRCs) with the goal of providing improved and extended services and integrated access to data. In this context, the aims are i) integrating the information on microorganisms, ii) assessing available information, iii) pointing out discrepancies, errors and gaps, iv) carrying out in-silico analyses, and v) curating mBRC catalogues’ data.

The USMI Galaxy Demonstrator, which is under active development, is available at http://galaxy.nettab.org:8088/. All tools are written in Python.

The tools menu includes a section devoted to MIRRI tools, where three categories are shown, related to retrieval of data from MIRRI catalogues, extension of catalogues contents with data from external resources, and data integration applications.

Tools of the “data_source” type are available for importing both full catalogues and single strain data in Galaxy. Information is archived by using an extended version of the Microbiological Common Language (MCL, http://www.straininfo.net/projects/mcl/reference).

The external data sources that have already been taken into account are NCBI Taxonomy, BRENDA, PubMed, UniProt and ENA, which are respectively queried in order to retrieve information on taxon identifiers, EC numbers, PubMed identifiers and DOIs, UniProt identifiers, and rRNA sequences. These are linked by using either the strain numbers, or the enzyme and species names, or the bibliographic references.

Outputs are provided in tabular form, making them both human- and machine-readable.

P09: A Galaxy metagenomic workflow for reference-tree based phylogenetic placement (MG-RTPP)

Ambrose Andongabo¹*, Ian M. Clark¹*, Dariush Rowlands¹, Keywan Hassani-Pak¹, Penny R. Hirsch¹, Elisa Loza¹, Andy Neal¹*

¹ Rothamsted Research, Harpenden, United Kingdom
* Contributed equally

Poster

Background: High-throughput sequencing of environmental nucleic acids is revolutionizing and dramatically expanding our understanding of the diversity and functionality of complex microbial communities. There are a number of tools which allow community structure to be surveyed using metagenomics or meta-transcriptomics at the rRNA level, or by using COG- or KEGG-based functional assignments. However, there are limited complementary approaches to investigate the phylogenetic diversity of functionally important individual genes in large sequence databases.

Results: We have designed a workflow for reference-tree based phylogenetic placement (MG-RTPP) of metagenomics and meta-transcriptomics samples. The inputs to the workflow are unassembled reads, a multiple sequence alignment (MSA) of the genes of interest and large public sequence databases. Reference nucleotide profile hidden Markov models (pHMMs) are built from the MSA and are used as queries. Homologous reads are checked for accuracy before being placed on a reference phylogenetic tree, maximising phylogenetic likelihood. The workflow retains considerable flexibility, allowing for tuning of redundancy in the nucleotide pHMMs used as queries to recover as many true hits as possible.

Conclusions: MG-RTPP facilitates fast interrogation of sequence databases in a flexible and robust fashion. It avoids misidentification of false positives while pHMM tuning allows for maximum recovery of sequences. Phylogenetic placement provides unique visualization approaches which reveal the phylogenetic relationships between environment-derived sequences and sequenced organisms and between samples. The approach complements tools such as QIIME, MG-RAST and MEGAN in allowing interrogation of individual gene abundance and diversity in samples. Keywords: metagenome, metatranscriptome, assembly-free, community analysis, functional genes, phylogeny.

P10: IRIDA: A Genomic Epidemiology Platform Built on top of Galaxy

Aaron Petkau¹, Franklin Bristow¹, Thomas Matthews¹, Josh Adam¹, Philip Mabon¹, Eric Enns¹, Jennifer Cabral¹,², Joel Thiessen¹,², Cameron Sieffert¹, Natalie Knox¹, Damion Dooley³, Emma Griffiths⁵, Geoff Winsor⁵, Matthew Laird⁵, Mélanie Courtot³,⁵, Peter Kruczkiewicz⁶, Alex Keddy⁷, Robert G. Beiko⁷, William Hsiao³,⁴, Gary Van Domselaar¹,², Fiona Brinkman⁵

¹National Microbiology Laboratory, Winnipeg, Canada
²University of Manitoba, Winnipeg, Canada
³BC Public Health Microbiology and Reference Laboratory, Vancouver, Canada
⁴University of British Columbia, Vancouver, Canada
⁵Simon Fraser University, Burnaby, Canada
⁶Laboratory for Foodborne Zoonoses, Lethbridge, Canada
⁷Dalhousie University, Halifax, Canada

Poster

Whole genome sequencing (WGS) is revolutionizing epidemiological methods for identification and investigation of infectious disease outbreaks. However, the routine use of WGS has been hindered due to the complexity in data management and the lack of pipelines supporting quality control and data analysis standards. While an increasing number of pipelines for genomic epidemiology are being developed, each typically has different installation and execution requirements. This leads to a difficulty in the integration of these pipelines into a single genomic epidemiology system.

Galaxy offers a solution by providing a system to integrate, execute, and maintain data analysis pipelines. In addition, Galaxy provides a community of developers who contribute and maintain the bioinformatics tools used for genomic epidemiology. Our project, IRIDA (Integrated Rapid Infectious Disease Analysis), builds a platform for genomic epidemiology on top of Galaxy. IRIDA provides a system for the storage and management of sequencing data and sample metadata, an interface for the execution of data analysis pipelines, and the storage, auditing and visualization of results. Within IRIDA, we provide standard pipelines for genomic epidemiology including SNVPhyl, our SNV (Single Nucleotide Variant) phylogeny pipeline. These pipelines are executed using a Galaxy instance internal to IRIDA, and additional support is provided for exporting genomic sequence data to external Galaxy instances.

By building on top of Galaxy we hope to simplify the process of pipeline integration, to share our pipelines with the bioinformatics community, and to contribute to the development of standards for genomic epidemiology. More information can be found at http://irida.ca.

P11: Galaxy – a platform for teaching the analysis and interpretation of clinical NGS data

Ang Davies¹, Jan Taylor², Mike Cornell¹, Peter Briggs¹, Sanjeev Bhaskar³, Andy Brass¹

¹ The University of Manchester
² St James’ Hospital Leeds, The University of Manchester
³ St Mary’s Hospital Manchester

Poster

The University of Manchester delivers a masters programme in Clinical Bioinformatics which provides the education for trainee clinical bioinformaticians on the NHS Scientist Training Programme, training to become registered healthcare scientists. This programme is led under the direction of the Manchester Academy for Healthcare Scientist Education (MAHSE). Clinical bioinformaticians within the NHS are at the forefront of genomic medicine in their roles, often responsible for building and validating bioinformatic workflows that are used in the analysis and interpretation of clinical Next Generation Sequencing (NGS) data. Within the diagnostic genomic medical centres across the UK, bioinformaticians are building and using NGS pipelines to analyse sequencing data from gene panels, whole exomes and now whole genomes for those centres involved in the 100,000 Genomes Project. Within the masters programme we used Galaxy to teach the trainees how to analyse anonymised clinical NGS gene panel data, kindly provided by the Manchester Centre for Genomic Medicine for teaching purposes. The pipeline the trainees built included quality control, alignment, annotation, interpretation and viewing on a genome browser, enabling trainees to identify a potential causal pathogenic variant from the original FASTQ file. The analysis was undertaken on a local installation of Galaxy configured by the Bioinformatics Core Facility at the university. For more information contact angela.davies@manchester.ac.uk.

P12: Bioinformatics Evolving at Canada’s National Microbiology Laboratory

Eric Enns¹, Philip Mabon¹, Jennifer Cabral¹,², Mariam Iskander¹,², Cameron Sieffert¹, Natalie Knox¹, Heather Kent¹, Shane Thiessen¹, Paul Williams¹, Brian Yeo², Joel Thiessen¹,², Josh Adam¹, Aaron Petkau¹, Thomas Matthews¹, Franklin Bristow¹, Gary Van Domselaar¹,²

¹ National Microbiology Laboratory, Public Health Agency of Canada, Winnipeg, MB, Canada
² Department of Computer Science, University of Manitoba, Winnipeg, MB, Canada

Poster

The National Microbiology Laboratory (NML) is Canada’s leading public health laboratory, responsible for the identification, control and prevention of infectious diseases. The bioinformatics core facility at the NML deployed our first instance of Galaxy in 2010. The introduction of the Galaxy platform has revolutionized bioinformatics at the NML by bridging the gap between bioinformaticians and biologists.

Prior to Galaxy, most of our in-house tools and pipelines required an extensive background in UNIX command line and high performance computing to operate. This requirement demanded that bioinformaticians be intimately involved in projects with significant computational requirements. Galaxy was selected to be the bioinformatics analysis platform at the NML, as it made our tools and pipelines accessible to biologists. Bioinformaticians are able to focus more time on tool and pipeline development, as their project involvement has been reduced. Biologists are able to perform analyses on their own as Galaxy lowers the barrier to carrying out complex analysis in a high performance computing environment. As a result NGS (Next-generation sequencing) projects are able to progress at a much faster rate. Moving forward, we are developing a Galaxy-powered infectious disease analysis platform for our standardized analyses while retaining our traditional Galaxy environment for ad hoc pipeline development and bioinformatics analysis.

P13: Galaxy Flavours – your highly portable, configurable local Galaxy distributions with preinstalled workflows – for Linux, MacOSX and Windows

Christian Rausch¹, Jeroen Galle², Stef van Lieshout¹, Wim van Criekinge², Björn Grüning³

¹ Cancer Center Amsterdam, VU University Medical Center, Amsterdam, The Netherlands
² Biobix, Lab of Bioinformatics and Computational Genomics, Ghent University, Ghent, Belgium
³ Chair of Bioinformatics, University of Freiburg, Germany

Poster

Galaxy makes it easy for biologists to use advanced bioinformatics software through graphical web-browser-based user interfaces. However, when using one of the public Galaxy servers, such as usegalaxy.org, is not an option (e.g. in the case of sensitive data), setting up a local Galaxy installation still requires Linux administrator skills.

Therefore we are developing installers for the Linux, Macintosh and Windows operating systems that make use of portable Docker software containers.

Another aspect that makes Galaxy increasingly difficult to use, especially for less advanced users, is the growing number of available tools: how does one find the right tool for a given task? Here we want to help by providing a useful selection of tools and workflows for typical problems in biomedical data analysis, preconfigured in the Galaxy Flavours we provide.

On the poster we present the current status of our work, future plans and further ideas, such as a configurator for tailored Galaxy Docker images. Please join the discussion on the prioritisation of future Galaxy developments at our poster and at the conference in general.

P14: GIO: Standards-compliant Galaxy workflows for proteomics informed by transcriptomics

Jun Fan¹, Shyamasree Saha¹, Adelyne Sue Li Chan¹, David A Matthews², Conrad Bessant¹

¹ School of Biological and Chemical Sciences, Queen Mary University of London, Mile End Road, London E1 4NS. UK
² School of Cellular and Molecular Medicine, University of Bristol, University Walk, Bristol. BS8 1TD. UK.

Poster

The most common method of identifying proteins in a complex sample is to perform liquid chromatography tandem mass spectrometry (LC-MS/MS) then search the acquired spectra against a reference proteome downloaded from a database such as UniProt. This approach has the major drawback of not being able to identify gene products that are not already known. The recently developed proteomics informed by transcriptomics (PIT) methodology tackles this problem by using RNA-seq to generate sample-specific protein databases that the LC-MS/MS data can be searched against. This allows the detection and quantitation of previously unknown proteins, protein variants and other exotic translated genomic elements. This is of particular utility when studying non-model organisms and samples with very dynamic proteomes, e.g. stem cells, cancer cells and virus-infected cells. The analysis of PIT data is complex and computationally intensive, requiring the integration of multiple third party tools from the proteomics, transcriptomics and genomics communities. To make this analysis tractable and repeatable we have produced GIO (Galaxy Integrated Omics) – a Galaxy-based framework containing the key tools and workflows needed to analyse data from PIT experiments in a reliable and repeatable way.
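To make the idea of a sample-specific protein database more concrete, here is a minimal sketch of one conceptual step: translating a transcript sequence in all six reading frames and keeping methionine-initiated open reading frames as candidate proteins. The abstract does not describe how GIO implements this step, so the function names, the `min_length` cut-off and the example sequence are assumptions for illustration, and real pipelines typically rely on dedicated ORF-finding tools.

```python
from itertools import product

# Standard genetic code (NCBI translation table 1), codons ordered T, C, A, G.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def translate(seq):
    """Translate a DNA sequence codon by codon; unknown codons become 'X'."""
    seq = seq.upper()
    return "".join(
        CODON_TABLE.get(seq[i:i + 3], "X") for i in range(0, len(seq) - 2, 3)
    )


def reverse_complement(seq):
    """Reverse complement of a DNA sequence made of A, C, G and T."""
    return seq.upper().translate(str.maketrans("ACGT", "TGCA"))[::-1]


def candidate_proteins(transcript, min_length=3):
    """Return sorted, de-duplicated candidate proteins from all six frames.

    A candidate starts at the first 'M' after a stop codon (or the frame
    start) and runs to the next stop codon or to the end of the sequence,
    so partial ORFs without a stop codon are kept as well.
    `min_length` is a hypothetical cut-off; real analyses use larger values.
    """
    proteins = set()
    for strand in (transcript, reverse_complement(transcript)):
        for frame in range(3):
            for segment in translate(strand[frame:]).split("*"):
                start = segment.find("M")
                if start != -1 and len(segment) - start >= min_length:
                    proteins.add(segment[start:])
    return sorted(proteins)


# Made-up transcript fragment, for illustration only.
example_transcript = "CCATGGCTAAATGACC"
example_proteins = candidate_proteins(example_transcript)
```

Candidates collected this way over all assembled transcripts could be written to a FASTA file and used as the search database in place of a reference proteome, which is the substitution that PIT relies on.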

P15: Reproducible Galaxy: Administration and Development

Aarif Mohamed Nazeer Batcha¹, Sebastian Schaaf, Guokun Zhang, Sandra Fischer, Ashok Varadharajan, Ulrich Mansmann

¹ Ludwig-Maximilians-University Munich

Poster

Establishing a structured IT infrastructure for processing NGS data is a challenge on multiple levels. Dealing with such challenges in the field of molecular diagnostics and medical research, which often demands reproducibility, makes it all the more interesting. The user-friendly, open source and modular Galaxy framework was of great help in facing those challenges, although the reproducibility part was a bit questionable. Over three years of dedicated efforts, the Munich NGS-FabLab was built up as a running IT system, based on an assessment of requirements, constraints and given structural conditions. Aiming for a structured approach to resolving reproducibility issues and improving cross-connections between hospitals and research institutes associated with us, we came up with Ansible playbook setup scripts to recreate our IT infrastructure. The scripts include setting up dedicated file servers, creating production, test and development environments, PostgreSQL database setup, Apache configurations and a grid engine for distributed management systems, along with Galaxy installation procedures. Although the playbooks were developed on SLES, blank Unix systems with SSL connectivity and an ini file are all that is necessary for the scripts to run. The scripts can be used to create and recreate an IT infrastructure and a reproducible environment for processing NGS data, which is in high demand in medical research and diagnostics. We also hope to return our playbooks to the community that offered a great deal of support in developing our NGS-FabLab for processing medical sequencing data.

P16: Mass spectrometry proteomics analysis with diverse tools for hundreds of runs

Jorrit Boekel¹,², Rui Mamede-Branca², Henric Zazzi¹,³, Yafeng Zhu², Matthew The⁴, Lukas Käll⁴, and Janne Lehtiö²

¹ Bioinformatics Infrastructure for Life Sciences (BILS), Sweden
² Department of Oncology-Pathology, Science for Life Laboratory, Karolinska Institute, Stockholm, Sweden
³ PDC Center for High Performance Computing, Royal Institute of Technology – KTH, Stockholm, Sweden
⁴ School of Biotechnology, Science for Life Laboratory, Royal Institute of Technology – KTH, Stockholm, Sweden

Poster

The mass spectrometry (MS) field is currently undergoing rapid growth and is seeing an increasing number of datasets per experiment. The growth is caused by sample size increase, meta-analyses and prefractionation, but analysis is constrained by MS computing environments, which often lean towards proprietary software on Windows systems. A transition of typical analysis platforms to more powerful and flexible infrastructure is necessary to make large scale analysis available to users without access to, or in-depth knowledge of, powerful bioinformatics tools and platforms.

We have combined a number of tools for spectra search (MSGF+), quantification (OpenMS, Hardklör/Krönik) and statistical scoring (Percolator) in the Galaxy framework. Since freely available MS tools do not always interact in all sought-after combinations, we have written software called msstitch to manipulate input and output files for a number of tools, including doing protein grouping and keeping an SQLite database of results. The resulting pipeline is under continuous development and can currently deliver data-repository-ready mzIdentML, and PSM, peptide and protein tables for end-users.

P17: Read Between the Lines: Closing Gaps of Materials and Methods to Build Workflow from the Publication

Tazro Ohta¹, Osamu Ogasawara², Yoshinobu Masatani³, Shigetoshi Yokoyama³, Kento Aida³

¹ Database Center for Life Science
² DNA Data Bank of Japan
³ National Institute of Informatics

Poster

Publishing and sharing data analysis workflows using the Galaxy platform has spectacularly reduced the cost of reproducing one’s research, but following the description of a data analysis performed by other researchers to get the exact same result is still a big challenge. To evaluate the cost of rebuilding a data analysis workflow from its natural language description, we rebuilt, on the Galaxy platform, the CAGE sequencing data processing workflow used by the FANTOM5 team. Though the project has already published a set of papers with extensive supplementary methods and online protocols, it was not straightforward to get the same result from the raw sequencing data available in the public data repository. The results processed by the rebuilt workflow are compared with the results published online by the FANTOM5 team. This case study showed that some of the information needed to rebuild the workflow is missing even in well-described documents, for example, the location of the older source code, or the parameters for command execution. As the speed of biological data production increases, it will become more important to build a framework for cost-effective research reproducibility, such as an automated evaluation process for published workflows. We will provide the details of our case study, and discuss how we can assure reproducibility with Galaxy and other possible ways to perform, share, and publish the workflow as “executable materials and methods”.

P18: Galaxy-M: A galaxy workflow for processing and analysing direct infusion and liquid chromatography mass spectrometry-based metabolomics data

Riccardo Di Guida¹, Ralf J. M. Weber¹, Robert L. Davidson², Haoyu Liu¹, Archana Sharma-Oates¹, Warwick B. Dunn¹, Mark R. Viant¹

¹ School of Biosciences, University of Birmingham, Birmingham, B15 2TT, United Kingdom
² GigaScience, BGI-Hong Kong Co. Ltd, 16 Dai Fu Street, Tai Po Industrial Estate, NT, Hong Kong

Poster

Motivation: Metabolomics is increasingly recognised as an invaluable tool in the biological, medical and environmental sciences, yet its bioinformatics methodology is less mature than that of other ‘omics fields, specifically genomics and transcriptomics. To achieve its full potential, standardisation and reproducibility of computational tools must be improved significantly. Here we report the development of Galaxy-M and describe further work to validate pre-processing methods for implementation into Galaxy-M.

Development of Galaxy-M: We have developed an end-to-end mass spectrometry metabolomics pipeline in Galaxy for direct infusion mass spectrometry (DIMS) and liquid chromatography mass spectrometry (LC-MS) metabolomics. The tools presented range from the processing of raw data (e.g. peak picking and alignment), through data pre-processing, to principal components analysis (PCA) and the associated statistical evaluation. To aid accessibility, the tools, Galaxy and data will all be provided via download. Additionally, source code, executables and installation instructions are available from GitHub.

Validation of pre-processing methods for LC-MS metabolomics: To provide a robust module for liquid chromatography-mass spectrometry (LC-MS) data pre-processing, we have assessed the influence of different missing value imputation, normalisation, scaling and transformation methods on univariate and multivariate analysis. We show that normalisation by sum or PQN provides the most robust results for univariate analysis, while additional KNN missing value imputation and glog transformation are optimal for multivariate analysis. These methods are currently being implemented into Galaxy-M.
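
As an illustration of the normalisation step, the following minimal Python sketch applies normalisation by sum followed by probabilistic quotient normalisation (PQN) to a small peak-intensity matrix. It is not the Galaxy-M implementation: the function names, the choice of the median spectrum as the PQN reference, and the toy data are assumptions made for this example.

```python
from statistics import median


def normalise_by_sum(sample):
    """Scale one sample so that its peak intensities sum to 1."""
    total = sum(sample)
    return [x / total for x in sample]


def pqn(samples):
    """Probabilistic quotient normalisation of a samples x features matrix.

    Assumption for this sketch: the reference spectrum is the per-feature
    median across all (sum-normalised) samples.
    """
    scaled = [normalise_by_sum(s) for s in samples]
    n_features = len(scaled[0])
    reference = [median(s[j] for s in scaled) for j in range(n_features)]
    normalised = []
    for s in scaled:
        # Quotient of each peak against the reference, skipping empty reference peaks.
        quotients = [s[j] / reference[j] for j in range(n_features) if reference[j] > 0]
        factor = median(quotients)  # most probable dilution factor for this sample
        normalised.append([x / factor for x in s])
    return normalised


# Toy data (hypothetical): the second sample is a two-fold "dilution" of the first.
toy = [[1.0, 2.0, 3.0, 4.0], [2.0, 4.0, 6.0, 8.0]]
result = pqn(toy)
```

In this toy case both rows of `result` come out the same, because the only difference between the two samples is an overall scaling factor, which is what these normalisations are meant to remove.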

P19: Integrating Galaxy in the Mr.SymBioMath Cloud Infrastructure

Óscar Torreño¹, Johan Karlsson², Alex Upton³, Michael Krieger¹, Oswaldo Trelles³

¹ RISC Software GmbH, 4232 Hagenberg, Austria
² Integromics S.L., 18100 Armilla Granada, Spain
³ University of Malaga, 29071 Malaga, Spain

Poster

Workflows are an increasingly important paradigm in bioinformatics and biomedicine; complex analyses are often performed by separate software packages that are later connected to form a complete pipeline. A number of workflows in both the bioinformatics and biomedicine domains are being developed in the Mr.SymBioMath project. GECKO¹, a biological sequence comparison workflow that studies the similarities between two or more genomes, together with its post-processing steps, is the main development in the bioinformatics use case of the project. Genome-wide association studies (GWAS) of SNPs² and Multi-SNPs (epistatic interactions)³ are the principal implementations in the biomedicine use case. However, these workflows are command-line based, which makes them difficult for inexperienced users to use. Consequently, we have decided to use Galaxy in the project to facilitate their execution and distribution, whilst ensuring that all experiments are reproducible. Our current architecture comprises three nodes deployed in a cloud infrastructure: 1) Gateway – which proxies client requests to the web server; 2) Web server – which runs the Galaxy web interface served through nginx; 3) DB server – which holds the metadata queried by the Galaxy web server. The present configuration executes tasks on the second node, but we are currently working on executing them in a separate Torque cluster that will be auto-scaled depending on the system load. The customised Mr.SymBioMath Galaxy configuration ensures that a wide spectrum of end users is able to obtain results as quickly and easily as possible.

Notes:

¹ Andrés Rodríguez Moreno, Óscar Torreño Tirado, and Oswaldo Trelles Salazar. Out of core computation of HSPs for large biological sequences. In Advances in Computational Intelligence, pages 189–199. Springer, 2013.
² P. Heinzlreiter, J.R. Perkins, O. Torreño, J. Karlsson, J.A. Ranea, A. Mitterecker, M. Blanca, O. Trelles: A Cloud-based GWAS Analysis Pipeline for Clinical Researchers. In Proc. of the 4th International Conference on Cloud Computing and Services Science (CLOSER 2014), ISBN 978-989-758-019-2, Barcelona, Spain, pp. 387-394, April 2014, DOI 10.5220/0004802103870394
³ Alex Upton, Oswaldo Trelles, James Perkins, Epistatic Analysis of Clarkson Disease, Procedia Computer Science, Volume 51, 2015, Pages 725-734, ISSN 1877-0509, http://dx.doi.org/10.1016/j.procs.2015.05.191.

P20: Deep Proteome Coverage Through Ribosome Profiling and MS Integration

Elvis Ndah¹, Jeroen Crappé¹, Alexander Koch¹, Sandra Steyaert¹, Gerben Menschaert¹, Petra V. Damme²

¹ Biobix, University of Gent
² VIB Department of Medical Protein Research, University of Gent

Poster

Note: This poster will not be presented at GCC2015 due to visa issues.

The novel ribosome profiling (RIBO-seq) approach provides genome-wide information about protein synthesis by monitoring mRNA entering the translation machinery, while highly sensitive mass spectrometry (MS) provides information about the protein composition of a sample. Integrating these technologies provides more intuitive information about the protein synthesis and the identification of novel translation products as well as a better understanding of the translation mechanism.

We developed a proteogenomic pipeline, called PROTEOFORMER, that automatically processes data from RIBO-seq experiments, resulting in the genome-wide visualization of ribosome occupancy. The tool includes pre-processing, mapping to a reference genome, sequence variation analysis and identification of translation initiation sites, allowing the delineation of the open reading frames of all translation products. A complete protein synthesis-based sequence database can thus be compiled for MS-based identification from shotgun proteomics and N-terminomics experiments. The tool is freely available as a stand-alone pipeline and has been implemented in the Galaxy framework, allowing easy integration with available proteomics tools such as SearchGUI and PeptideShaker in a multi-omics setting.

To evaluate the pipeline we performed matching RIBO-seq, gel-free shotgun and N-terminal COFRADIC proteomics experiments on mouse and human cell samples. We were able to observe an overall increase in protein identification rates, detection of 5’-extended proteoforms, upstream ORF translation and near-cognate (non-AUG) translation start sites. Furthermore, integration of RIBO-seq and N-terminomics data through the PROTEOFORMER pipeline provided evidence for the translation of non-coding genes in the Arabidopsis genome, indicative of mis-annotation in The Arabidopsis Information Resource (TAIR10).

P21: The de.NBI RNA Bioinformatics Center

Cameron Smith¹, Torsten Houwaart¹, Anika Erxleben¹, Sebastian Will³, Altuna Akalin², Uwe Ohler², Nikolaus Rajewsky², Peter F. Stadler³, Björn Grüning¹, Rolf Backofen¹

¹ Universität Freiburg
² MDC Berlin
³ Universität Leipzig

Poster

Genome-wide sequencing revealed pervasive transcription, where the majority of the DNA encodes non-coding RNAs. Non-coding RNAs and RNA-protein interactions play a fundamental role in cellular regulation; consequently they have received increasing attention over the past decade. Recent advances in high-throughput sequencing as well as in the genome-wide identification of miRNAs and RNA-protein interactions have shown that the complexity of post-transcriptional gene regulation is equivalent to that of transcriptional gene regulation.

The recently launched German Network for Bioinformatics Infrastructure (de.NBI) aims to provide comprehensive bioinformatics services to users in life sciences research, industry and medicine. Within this network, the RNA Bioinformatics Center (RBC) is responsible for supporting RNA-related research in Germany, such as the detection of non-coding RNAs and RNA structure prediction. The RBC aspires to build and advance a movement of Galaxy-based RNA bioinformatics and to help foster a community of users and developers in this field.

This poster details the infrastructure, services and methods the RBC will employ to meet this challenge and how Galaxy is used to provide an integrated workbench for RNA analysis.

P22: VAPoR: A Visual web pipeline for Annotation of host/pathogen interactions in Plant Resistance

Benedikt Rauscher¹, Benjamen White¹, Manuel Corpas¹, Burkhard Rost²

¹ The Genome Analysis Centre, Norwich, UK
² Department for Bioinformatics and Computational Biology, Technical University of Munich

Poster

Plants are engaged in a continuous co-evolutionary struggle for dominance with their pathogens. The outcomes of these interactions are of particular importance to human activities as they can have dramatic effects on agricultural systems. Plant defence systems such as those conferring Effector Triggered Immunity (ETI) allow recognition of specific pathogen effectors (i.e., proteins secreted by pathogens into host cells to enhance infection). R (Resistance) genes play a crucial role in controlling a broad set of disease resistance responses whose induction is often sufficient to stop further pathogen growth and spread.

We introduce VAPoR, a novel tool specifically tailored to annotation of resistance genes in uncharacterised genomes. Taking as input a putative genome sequence for an R gene, it gathers relevant information about experimentally annotated homologues as well as their evolutionary relationship with the candidate gene from UniProt and STRING. The information is then displayed in an interactive and intuitive way.

We tested VAPoR with two datasets: 1) a set of known R genes in Brachypodium spp. and 2) a putative set of R genes for Dioscorea alata, a species of yam. Our application is written purely in JavaScript, using the BioJS and Galaxy platforms. By exploiting Galaxy’s powerful data transformation facilities and the variety and interactivity of BioJS components, we are able to display an abundance of relevant information in a concentrated and intuitive way.

P23: Enabling large scale Genotype-Tissue Expression studies using Galaxy

Genna Gliner¹, Ian McDowell², Barbara E Engelhardt³

¹ Operations Research and Financial Engineering Department, Princeton University
² Computational Biology and Bioinformatics, Duke University
³ Computer Science Department and Center for Statistics and Machine Learning, Princeton University

Poster

The Princeton BEEHIVE Group develops statistical models and methods for high-dimensional genomic data. As part of the Genotype-Tissue Expression (GTEx) consortium, we are involved in processing vast quantities of RNA-sequencing and whole genome sequence data for different types of statistical and functional genomics studies, including cis- and trans-eQTLs, non-coding RNA regulation, and allele specific expression studies. The creation, testing, and deployment of the processing pipelines for each of these different study types require comprehensive analysis of large datasets through a dedicated pipeline used by all members of the group. With the ability to create custom tools and share and modify workflows, Galaxy provides a robust framework to develop this pipeline for use across our lab, but incorporating our diverse set of analysis tools into Galaxy is a non-trivial task.

In this poster we chronicle the evolution of the Princeton BEEHIVE Galaxy Pipeline. We illustrate our vision for a flexible, scalable, and streamlined pipeline using Galaxy for statistical genomics studies. We explore how our pipeline evolved by highlighting how our lab addressed the challenges of tool creation and integration, data processing and organization, and training lab members to use our Galaxy instance.

P24: A French Galaxy Tool Shed to federate the national infrastructures and offering quality assessed tools

Loraine BRILLET-GUÉGUEN¹, Christophe CARON¹, Valentin LOUX², the French Galaxy Working Group³

Presented by Olivier Inizan

¹ ABIMS, FR2424 CNRS-UPMC, Station Biologique, Place Georges Teissier, 29680, Roscoff, France
² UR1404 Mathématiques et Informatique Appliquées du Génome à l’Environnement, INRA, F-78352 Jouy-en-Josas, France
³ Institut Français de Bioinformatique [ANR-11-INBS-0013], France Génomique [ANR-10-INBS-0009] and MetaboHUB [ANR-11-INBS-0010]

Poster

The Galaxy environment, notably dedicated to bio-analyses, is enjoying growing success within the bioinformatics and biology communities. In 2013 the “Institut Français de Bioinformatique” (IFB) commissioned a Working Group around the Galaxy platform. This group brings together several national platforms and organises community events (Galaxy Day, thematic schools, etc.) as well as activities to structure the user and developer communities (training, good-practice guides, etc.).

In addition, as part of the bioinformatics work packages funded by the “France Génomique” project, the community has developed or evaluated many tools and set up analysis workflows. Making these pipelines usable by, and available to, people unfamiliar with the command line now relies on a common platform (Galaxy) and a common repository (Tool Shed). With this in mind, and building on the momentum of the Working Group, the IFB offers a reference repository to centralize and promote the bio-analysis tools of the French community. The scope of this repository, initially dedicated to “France Génomique” NGS pipelines, is now extending to other national infrastructures (MetaboHUB, etc.) and to training activities (e.g. “Ecole NGS AVIESAN”).

The IFB Tool Shed is part of a strategy to federate the community around good practices for integrating tools into Galaxy and to train engineers from the platforms concerned. Particular attention is paid to the quality of tool and workflow integration, with functional tests and validation procedures.

P25: Statistical method for filtering sequencing error from minor clonal mutation in sequencing data and implementation

Vojtech Kulvait¹, Katerina Machova Polakova², Tomas Stopka¹

¹ Charles University in Prague
² The Institute of Hematology and Blood Transfusion

Poster

When analyzing data from current NGS technologies, one has to deal with sequencing and amplification errors. In clonal disorders (we mainly study cancer and leukemia), a patient sample may contain subclones present in low relative amounts (~1%). Each of these subclones has its own mutational profile. Because NGS data contain technical errors, we present a statistical method to distinguish biologically relevant mutations in subclones from technical errors. The method fits a negative binomial distribution to sequencing data from control samples to obtain a null distribution, which is then used to detect mutations in patient samples. The method is implemented in Java.
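
The idea of a negative binomial null distribution can be sketched as follows. This is only an illustration in Python, not the authors' Java implementation: the abstract does not say how the distribution is fitted, so the method-of-moments fit, the function names and the control counts below are all assumptions made for the example.

```python
from math import exp, lgamma, log
from statistics import mean, variance


def fit_negative_binomial(counts):
    """Method-of-moments fit (an assumption of this sketch) returning (r, p)."""
    m = mean(counts)
    v = variance(counts)
    if v <= m:
        raise ValueError("a negative binomial fit needs variance > mean")
    r = m * m / (v - m)  # dispersion ("number of successes") parameter
    p = m / v            # success probability
    return r, p


def nb_pmf(k, r, p):
    """P(X = k) for a negative binomial with real-valued r."""
    return exp(lgamma(k + r) - lgamma(k + 1) - lgamma(r)
               + r * log(p) + k * log(1 - p))


def nb_tail(k, r, p):
    """P(X >= k): chance of seeing k or more error reads under the null."""
    return max(0.0, 1.0 - sum(nb_pmf(i, r, p) for i in range(k)))


# Hypothetical counts of non-reference reads at one position in control samples.
control_counts = [0, 1, 0, 2, 5, 1, 0, 3, 0, 1]
r, p = fit_negative_binomial(control_counts)

# Tail probability for a hypothetical patient sample showing 12 variant reads.
p_value = nb_tail(12, r, p)
```

A small `p_value` would suggest that the observed variant reads are unlikely to be explained by technical error alone; any significance threshold and multiple-testing correction are left open here.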

P26: Yet another Galaxy Genome viewer

Thomas Darde¹, François Moreews², Yvan le Bras³, Cyril Monjeaud³, Frédéric Chalmel¹

¹ INSERM U625 – Rennes, France
² Genscale team – IRISA -Rennes, France
³ Genouest Bioinformatics facility – INRIA/IRISA – Rennes, France

Poster

Galaxy has its own genome viewer, Trackster¹. Another popular genome viewer is JBrowse². We developed a server application that acts as a gateway between Galaxy and the JBrowse viewer.

Unlike Trackster, which benefits from tight Galaxy integration, we used a loosely coupled service architecture. Our gateway application exposes services that can retrieve or update the configuration of a genome view, using JSON data produced by a set of scripts wrapped as Galaxy tools. These dedicated Galaxy tools pre-process BAM, SAM, BED, GTF or GFF files to produce the JSON configuration files used by JBrowse to display new tracks.

Our application includes a session mechanism that allows a user to restore, display or update the configuration of an existing JBrowse genome view.

For each session, the corresponding configuration files can easily be exported. In this way, we combine Galaxy and JBrowse so that any user-defined custom genome view can be downloaded and redeployed, independently of the processing environment, locally or on any web server. This option is particularly useful when data are processed in a cloud-based Galaxy instance. Unlike other integrations of JBrowse within Galaxy³, we provide a generic way to display data produced by Galaxy in JBrowse.

The application has been used successfully⁴ to interpret multifaceted “omic” data. We therefore consider this work to be i) a contribution to improving the interoperability of open-source bioinformatics software and ii) a way to deploy and share pre-populated genome browsers with minimal technical skills.

Notes:

¹ Goecks, J., Coraor, N., Nekrutenko, A. and Taylor, J. (2012) NGS analyses by visualization with Trackster. Nat. Biotechnol., 30, 1036–1039.
² Skinner, M.E., Uzilov, A.V., Stein, L.D., Mungall, C.J. and Holmes, I.H. (2009) JBrowse: a next-generation genome browser. Genome Res., 19, 1630–1638.
³ Venter Institute Cloud Viral Browser: https://github.com/JCVI-Cloud/VICVB
⁴ The ReproGenomics Viewer: an integrative cross-species toolbox for the reproductive science community, Thomas A. Darde; Olivier Sallou; Emmanuelle Becker; Bertrand Evrard; Cyril Monjeaud; Yvan Le Bras; Bernard Jegou; Olivier Collin; Antoine D. Rolland; Frederic Chalmel. Nucleic Acids Research 2015; doi: 10.1093/nar/gkv345

P27: Integration of Mechanical Testing Process in the Galaxy Environment

R. Créac’Hcadec¹, E. Poupart², Y. Le Bras³, O. Collin³, D. Malfondet⁴, Y. Quéré⁵

¹ LBMS EA 4325 – ENSTA-Bretagne / Université de Brest / ENIB, France
² Université Européenne de Bretagne, Pôle Système d’information – Direction des usages et services numériques, France
³ e-Biogenouest project, CNRS UMR 6074 IRISA-INRIA, France
⁴ Université de Bretagne Occidentale – UFR Sciences, France
⁵ Lab-Sticc CNRS UMR 6074, Université de Bretagne Occidentale – UFR Sciences, France

Poster

The aim of integrating the mechanical testing process into the Galaxy environment is to help improve the efficiency of the overall experimental process.

The concepts underlying the experimental life cycle are similar in the biological and mechanical fields. Because Galaxy can be configured flexibly, adapting it to the needs of mechanical testing appears relevant.

The initial step is to manage raw data provided by devices such as connected measurement instruments or dedicated personal computers. This relies on various sub-processes such as collecting, flagging, synchronizing, preprocessing and storing data.

Because raw data are expensive to produce, they should be automatically linked with metadata, with the processing scripts used to generate numerical studies, and finally with a project entity, so that management is reliable from the bottom up. All these associations should be stored so that previous numerical studies can be replayed or improved, and so that publication criteria can be met.

A demonstrator will illustrate some of the points above and help to plan further developments.

P28: BioMAJ2Galaxy: automatic update of reference data in Galaxy using BioMAJ

Anthony Bretaudeau¹,², Cyril Monjeaud², Yvan Le Bras², Fabrice Legeai¹,³, Olivier Collin²

¹ INRA, UMR Institut de Génétique, Environnement et Protection des Plantes (IGEPP), BioInformatics Platform for Agroecosystems Arthropods (BIPAA), Campus Beaulieu, 35042 Rennes, France
² INRIA, IRISA, GenOuest Core Facility, Campus de Beaulieu, 35042 Rennes, France
³ INRIA, IRISA, GenScale, Campus de Beaulieu, 35042 Rennes, France

Poster

Many bioinformatics tools use reference data, such as genome assemblies or sequence databanks. Galaxy offers multiple ways to give access to this data through its web interface. However, the process of adding new reference data was customarily manual and time consuming, even more so when this data needed to be indexed in a variety of formats (e.g. Blast, Bowtie, BWA, or 2bit).

BioMAJ is a widely used and stable software tool designed to automate the download and transformation of data from various sources. This data can be used directly from the command line, in more complex systems such as Mobyle, or through a REST API.

To ease the process of giving access to reference data in Galaxy, we have developed the BioMAJ2Galaxy module, which bridges the gap between BioMAJ and Galaxy. With this module, it is now possible to configure BioMAJ to automatically download reference data, convert and/or index it in various formats, and then make it available in a Galaxy server using data libraries or data managers.

The developments presented in this paper allow us to integrate reference data in Galaxy in an automatic, reliable, and disk-space-saving way. The code is freely available on the GenOuest GitHub account (https://github.com/genouest/biomaj2galaxy).

P29: Colib’read on Galaxy: A tools suite dedicated to biological information extraction from raw NGS reads

Yvan Le Bras¹, Olivier Collin¹, Cyril Monjeaud¹, Vincent Lacroix², Eric Rivals³, Claire Lemaitre⁴, Vincent Miele², Gustavo Sacomoto², Camille Marchet², Bastien Cazaux³, Amal Makrini³, Leena Salmela⁵, Susete Alves-Carvalho⁴, Alexan Andrieux⁴, Raluca Uricaru⁶, Pierre Peterlongo⁴

¹ GenOuest Core Facility, UMR6074 IRISA CNRS/INRIA/Université de Rennes1, France
² BAMBOO team, INRIA Grenoble Rhône-Alpes & Laboratoire Biométrie et Biologie Évolutive, UMR5558 CNRS
³ MAB team, UMR5506 CNRS
⁴ INRIA/IRISA, Genscale team, UMR6074 IRISA CNRS/INRIA/Université de Rennes1
⁵ Department of Computer Science and Helsinki Institute for Information Technology HIIT
⁶ University of Bordeaux, LaBRI/CNRS & CBiB

Poster

With NGS technologies, life sciences face a raw data deluge. Classical analysis processes of such data often begin with an assembly step, needing large amounts of computing resources, and potentially removing or modifying parts of the biological information contained in the data. Our approach proposes to directly focus on biological questions, by considering raw unassembled NGS data, through a suite of six command-line tools.

Dedicated to “whole genome assembly-free” treatments, the Colib’read tools suite uses optimized algorithms for various analyses of NGS datasets, such as variant calling or read set comparisons. Because they are based on de Bruijn graphs and Bloom filters, such analyses can be performed in a few hours using small amounts of memory. Applications to real data demonstrate the good accuracy of these tools compared to classical approaches. To facilitate data analysis and dissemination of the tools, we developed Galaxy tools and Tool Shed repositories.
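
To give a feel for why a Bloom filter keeps memory use low, the following minimal Python sketch stores the k-mers of a read in a fixed-size bit array and then walks de Bruijn graph neighbours by querying the filter. It is a toy illustration and not Colib’read code: the class, the hashing scheme (salted SHA-256), the filter size and the example read are all assumptions made for this example.

```python
import hashlib


class BloomFilter:
    """Fixed-size probabilistic set: no false negatives, rare false positives."""

    def __init__(self, n_bits=1 << 16, n_hashes=3):
        self.n_bits = n_bits
        self.n_hashes = n_hashes
        self.bits = bytearray(n_bits // 8 + 1)

    def _positions(self, item):
        # Derive n_hashes bit positions from salted SHA-256 digests.
        for seed in range(self.n_hashes):
            digest = hashlib.sha256(f"{seed}:{item}".encode()).digest()
            yield int.from_bytes(digest[:8], "big") % self.n_bits

    def add(self, item):
        for pos in self._positions(item):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def __contains__(self, item):
        return all(self.bits[pos // 8] & (1 << (pos % 8))
                   for pos in self._positions(item))


def kmers(read, k):
    """Yield every substring of length k from a read."""
    return (read[i:i + k] for i in range(len(read) - k + 1))


def successors(kmer, seen):
    """de Bruijn neighbours: k-mers overlapping by k-1 bases that the filter reports."""
    return [kmer[1:] + base for base in "ACGT" if kmer[1:] + base in seen]


seen = BloomFilter()
for km in kmers("ACGTAC", 4):  # hypothetical read; its 4-mers are ACGT, CGTA, GTAC
    seen.add(km)

next_kmers = successors("ACGT", seen)
```

The graph is never stored explicitly: edges are recovered on demand by asking the filter which of the four possible extensions were seen, at the cost of occasional false positives.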

With the Colib’read Galaxy tools suite, we enable a broad range of life scientists to analyze raw NGS data. More importantly, our approach retains as much as possible of the biological information in the data while keeping a very low memory footprint.

P30: Galaxy for biological image analysis

Sylvain Prigent¹, Yvan Le Bras¹

¹ Biogenouest

Poster

Imaging technologies for biology are manifold (photography, photonic microscopy, electron microscopy, scanners, MRI…) and evolve rapidly. They produce amounts of data that can no longer be analyzed manually. The image analysis community has developed software to address image analysis tasks. We can distinguish generic packages such as ImageJ or Icy from specific tools developed by researchers in Matlab, Java, R, Python or C. Most of these programs are aimed at a single dedicated image analysis task and do not work together. For this reason, the bio-analysis community is moving towards a single interface that brings all these tools together and makes them easy for a biologist to use. One solution is Galaxy. We made a first attempt to use Galaxy in the context of a participatory project involving imaging.

In the north-west of France, ray populations are assessed by collecting and identifying ray egg capsules on beaches. To make this process easier, an application has been developed that allows people to take pictures of egg capsules on the beach and upload them to a dedicated website. The images can then be analyzed, by specialists or by any citizen, to identify the species of the eggs. To automate this identification step, we are developing a Galaxy workflow based on ImageJ tools that extract egg shape features.

P31: Cancer Genomics in Galaxy

Marco Albuquerque¹, Bruno Grande¹, Dr. Ryan Morin¹

¹ Simon Fraser University

Poster

An inherent difficulty in data-driven biology is the multi-disciplinary skill set required of the scientist to draw meaningful inferences from complex data sets. Cancer genomics epitomizes this problem with the advent of next-generation sequencing (NGS) and the concomitant need for computational analysis. Despite the myriad available algorithms, a bottleneck in data analysis remains because of cryptic command-line parameters, inflexible system environments with difficult installations, and demanding hardware requirements. Our project directly addresses these issues by building a cancer genomics toolbox consisting of parallelized tools and workflows for the cloud-ready Galaxy platform. Our toolbox will contain some 50 new Galaxy tools spanning several sub-categories, notably variant calling, visualization and additional helper tools for integrating and summarizing results. These will be assembled with existing tools to form Galaxy workflows. Following Galaxy best practices, users will be able to seamlessly install our tools automatically. To ensure optimal accuracy, workflow design and tool parameterization will be informed by the benchmarking results from the ICGC-TCGA DREAM challenges. All tools and workflows will be developed to ensure optimal parallelism on a cluster environment. The incomplete Map-Reduce parallelization framework offered by Galaxy will be expanded, including new merge and split functions for NGS data types used by our tools. Ultimately, this will provide a competitive graphical user interface for performing cancer genome analyses and hopefully find a home in clinics around the world, advancing the field of personalized medicine.

P32: Trinity Galaxy Portal

Carrie Ganote¹, Ben Fulton², Brian Haas³

¹ National Center for Genome Analysis Support
² Indiana University
³ Broad Institute of MIT and Harvard

Poster

Large memory requirements, long running times and large-scale CPU consumption are some of the barriers to providing bioinformatics services through Galaxy, especially when limited hardware is available to power these services. We will give a brief overview of our hardware and software setup and provide benchmarking data that we have collected using the Trinity RNA-Seq Assembler through the Trinity Galaxy portal at https://galaxy.ncgas-trinity.indiana.edu. This should provide a starting point for other institutions that may want to implement similar workflows for their users.

P33: NeLS: Norwegian e-Infrastructure for Life Sciences

Sveinung Gundersen¹, Christian Andreetta², Abdulrahman Azab¹, Kjetil Klepper³, Inge Alexander Raknes⁴, Jeevan Karloss⁵, Teshome D. Mulugeta⁵, Xiaxi Li², Patcharee Thongtra⁵, Kai Trengereid¹, Tim Kahlke⁴, Erik Semb⁴, Kidane M. Tekle²

¹ University of Oslo
² University of Bergen
³ Norwegian University of Science and Technology
⁴ University of Tromsø
⁵ Norwegian University of Life Sciences

Poster

NeLS is one of the packages of the ELIXIR.NO project and aims to provide a national Norwegian e-infrastructure allowing users within the life sciences community to efficiently and safely store, share, analyse and publish their genomics-scale data. The e-infrastructure maintains a web portal at https://nels.bioinfo.no that functions as the central point of access to the NeLS resources, which include data storage and analysis pipelines. NeLS relies on Galaxy as its primary platform for data analysis. Each of the five participating universities hosts its own Galaxy server with different types of analysis tools and workflows reflecting the research focus of the hosting groups. Workflows include differential gene expression analysis of RNA-seq data, variant calling in somatic and germline cells, and taxonomic classification of shotgun metagenomic sequences. For analysis of human patient data, NeLS collaborates closely with the Norwegian “services for sensitive data (TSD)”. The servers either run on dedicated hardware or act as a front-end to shared computer clusters. Users are authenticated through the common electronic identity provider for the Norwegian educational sector (FEIDE), with an alternative identity provider soon to enter production.

P34: High Quality Library Construction and Reliable Quantitation with NEBNext Reagents

Bradley W. Langhorst¹, Erbay Yigit¹, Eileen T. Dimalanta¹, Theodore B. Davis¹

¹ New England Biolabs

Poster

Both the Quant Kit and the Ultra II Library Prep Kit were developed using the SeqResults tool described in the accompanying talk. This poster provides more detailed supporting information and shows several figures generated by SeqResults.

An expanded role for NGS in the clinic and research will depend on continued improvement of the upstream processes required to produce high quality NGS data. In particular, maximizing data output and minimizing instrument run failure are imperative. In this poster, we present data showing: 1) the improvements we have made in library preparation with the development of the NEBNext Ultra II Library Prep kit; and 2) the development of the NEBNext Library Quant Kit for Illumina, a simple and robust method for quantitation of Illumina libraries. In the first part, we show that libraries made with the Ultra II kit have low DNA input requirements; additionally, we significantly improved library yields and reduced sequence bias. Moreover, the workflow is simple and streamlined, greatly reducing both the time required to produce high quality DNA libraries and the possibility of errors. In the second part, we demonstrate the effectiveness of the NEBNext Library Quant Kit for a broad range of library types and sizes, as well as the advantages offered by qPCR quantitation for obtaining optimal cluster density and user-to-user consistency. The NEBNext Quant Kit offers an efficient and cost-effective qPCR library quantitation workflow for users looking to optimize both sequencing yield and throughput.

Poster Printing in Norwich

The conference does not have an official relationship with a poster printer in Norwich. However, Jarrold’s Print and Copy Shop is well-known locally. A0 poster printing is £25 for full colour.

Submit a late abstract

The deadlines for oral and poster presentations have passed. However, late oral and poster abstracts are still being accepted and will be considered as cancellations occur or space opens up.

Abstracts are submitted electronically and should be 250 words or fewer of plain text. See the GCC2014 abstracts list for the broad range of topics presented in 2014.

There will also be an opportunity for lightning talks, which will be solicited during the meeting.

Oral presentations will be 15 or 20 minutes long.

Talks and posters on any topics of interest to the Galaxy community are welcome.

Please Note: By submitting an abstract you:

Agree to make your slides/posters freely available on this web site no later than 15 August 2015.
Agree, if you are giving an oral presentation, to have your presentation videotaped and made publicly available during and after the conference.

Submit a late oral or poster presentation abstract