GCC2015 Posters
The deadline for poster presentations has passed. However, late poster abstracts are still being accepted and will be considered until the poster space is full, or as cancellations occur.
Odd-numbered posters will be presented on Tuesday from 15:00 to 16:20, and even-numbered posters will be presented on Wednesday from 15:00 to 16:20.
Posters are size A0, 1189 mm wide by 841 mm tall, and will be hung with push-pins.
Contents
- 1 GCC2015 Posters
- 1.1 P01: Towards Bioinformatics for All: Galaxy at UoM
- 1.2 P02: The Galaxy framework as a concept for a national system for monitoring and surveillance of infectious disease
- 1.3 P03: Gene identifier matching to join publicly available databases for the generation of a Mammalian Ortholog and Annotation Database with access from Galaxy-server
- 1.4 P04: GenAP: A platform to provide Biomedical tools throughout Canadian HPCS
- 1.5 P05: Galaxy in teaching computational methods of genome analysis for master degree students in Medical Genetics program at the Faculty of Medicine, Vilnius University
- 1.6 P06: Galaxy in Public Health: the Microbial Genomics Virtual Laboratory
- 1.7 P07: 16S rDNA amplicon sequencing data analysis in Galaxy
- 1.8 P08: A Galaxy approach to microbial data integration: the USMI Galaxy Demonstrator
- 1.9 P09: A Galaxy metagenomic workflow for reference-tree based phylogenetic placement (MG-RTPP)
- 1.10 P10: IRIDA: A Genomic Epidemiology Platform Built on top of Galaxy
- 1.11 P11: Galaxy – a platform for teaching the analysis and interpretation of clinical NGS data
- 1.12 P12: Bioinformatics Evolving at Canada’s National Microbiology Laboratory
- 1.13 P13: Galaxy Flavours – your highly portable, configurable local Galaxy distributions with preinstalled workflows – for Linux, MacOSX and Windows
- 1.14 P14: GIO: Standards-compliant Galaxy workflows for proteomics informed by transcriptomics
- 1.15 P15: Reproducible Galaxy: Administration and Development
- 1.16 P16: Mass spectrometry proteomics analysis with diverse tools for hundreds of runs
- 1.17 P17: Read Between the Lines: Closing Gaps of Materials and Methods to Build Workflow from the Publication
- 1.18 P18: Galaxy-M: A galaxy workflow for processing and analysing direct infusion and liquid chromatography mass spectrometry-based metabolomics data
- 1.19 P19: Integrating Galaxy in the Mr.SymBioMath Cloud Infrastructure
- 1.20 P20: Deep Proteome Coverage Through Ribosome Profiling and MS Integration
- 1.21 P21: The de.NBI RNA Bioinformatics Center
- 1.22 P22: VAPoR: A Visual web pipeline for Annotation of host/pathogen interactions in Plant Resistance
- 1.23 P23: Enabling large scale Genotype-Tissue Expression studies using Galaxy
- 1.24 P24: A French Galaxy Tool Shed to federate the national infrastructures and offering quality assessed tools
- 1.25 P25: Statistical method for filtering sequencing error from minor clonal mutation in sequencing data and implementation
- 1.26 P26: Yet another Galaxy Genome viewer
- 1.27 P27: Integration of Mechanical Testing Process in the Galaxy Environment
- 1.28 P28: BioMAJ2Galaxy: automatic update of reference data in Galaxy using BioMAJ
- 1.29 P29: Colib’read on Galaxy: A tools suite dedicated to biological information extraction from raw NGS reads
- 1.30 P30: Galaxy for biological image analysis
- 1.31 P31: Cancer Genomics in Galaxy
- 1.32 P32: Trinity Galaxy Portal
- 1.33 P33: NeLS: Norwegian e-Infrastructure for Life Sciences
- 1.34 P34: High Quality Library Construction and Reliable Quantitation with NEBNext Reagents
- 2 Poster Printing in Norwich
- 3 Submit a late abstract
P01: Towards Bioinformatics for All: Galaxy at UoM
Peter Briggs1,
As part of the Bioinformatics Core Facility (BCF) at the University of Manchester (UoM) we have developed a number of bespoke Galaxy tools to support local researchers conducting next generation sequencing (NGS) analyses. The tools are accessible via a private local Galaxy instance maintained by the BCF, but are also available to the wider Galaxy community via the public Galaxy toolshed.
In collaboration with researchers we were able to help improve the detection of microsatellites by implementing Trimmomatic and PALfinder as Galaxy tools. This now allows non-bioinformaticians to analyse their own data, circumventing installation and use of command line programs. Additionally we have developed a set of ChIP-seq analysis tools (Trimmomatic, MACS2, CEAS, Weeder2, RnaChipIntegrator) that allows our users to further explore their data after it has left the BCF. The tools also provide a framework for tutorials about ChIP-seq analysis.
Our ongoing aim is to maintain and develop a local Galaxy instance that provides researchers with the means to run bioinformatics tools that they would not otherwise be able to use, and provide a means of easily rerunning analyses.
P02: The Galaxy framework as a concept for a national system for monitoring and surveillance of infectious disease
Arnold Knijn1, Massimiliano Orsini2, Valeria Michelacci1, Stefano Morabito1
2 Istituto Zooprofilattico Sperimentale dell’Abruzzo e Molise, Teramo, Italy
A proposal has been submitted to a national call of the Italian Ministry of Health concerning the creation of a National Information System for the collection of genomic data in the field of veterinary public health, with the aim of deploying a state-of-the-art molecular epidemiology approach to the surveillance of food-borne zoonoses and infectious diseases at the human and animal interface. The concept described revolves around the creation of a nation-wide distributed cluster of pathogen-specific databases of NGS and epidemiological data hosted on servers present at each of the participating institutes. A common framework for the comparison of such data will complete the system, with the aim of detecting clusters of cases and providing convincing evidence to link cases of disease and sources of infection. The databases will be replicated on each server constituting the DB-cluster. The redundancy created by the replication process is meant to guarantee distributed access to the Information System, high availability of the hosted data, and a geographically distributed disaster recovery capability, and to enable load-balancing of queries at each node, improving the performance of access to the analytical pipelines in case of heavy traffic on any of the servers. All the nodes of the network will use the same Information System, implemented in the open source framework Galaxy.
P03: Gene identifier matching to join publicly available databases for the generation of a Mammalian Ortholog and Annotation Database with access from Galaxy-server
Jochen Bick1, Mark Robinson2, Susanne E. Ulbrich1, Stefan Bauersachs1
2 Institute of Molecular Life Sciences at University of Zurich
There are a number of well-organized databases that contain useful information regarding orthologous genes, e.g., the EnsemblCompara ortholog database (Ecodb). The main problem when using information derived from different databases is to correctly assign different gene, transcript or protein identifiers. For example, Ecodb does not provide NCBI EntrezGene identifiers, and the assignment available in BioMart is incomplete and contains errors. However, because NCBI annotation is the most comprehensive for most species, we need to map information from other databases to EntrezGene IDs. This is an important issue for the generation of a Mammalian Ortholog and Annotation Database (MOA-Db), which will be partially based on information from publicly available databases that needs to be collected, analyzed, and connected. Since each public source database uses its own unique identifiers, it is necessary to assign the corresponding database-specific identifiers. Existing lists that assign corresponding genes, e.g. between Ensembl and EntrezGene, are incomplete and/or contain errors. Therefore, missing information needs to be calculated and duplicates need to be handled. R BioConductor packages were used to find overlapping gene and exon positions, which were integrated as a lookup table into the MySQL database to handle the comparison of different database sources. Finally, this database will be integrated into our local Galaxy server to give easy access to all our research groups and provide a useful interface with various options to parse information via SQL queries. The MOA-Db provides a basis for optimal cross-species comparisons of transcriptome datasets from different mammalian species, accessible within a Galaxy server.
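The idea of matching identifiers by overlapping gene positions can be illustrated with a short sketch. This is not the authors' implementation (the abstract states that R BioConductor packages and MySQL were used); it is a minimal Python illustration in which the `Gene` class, the `min_fraction` threshold, the identifiers and the coordinates are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Gene:
    gene_id: str
    chrom: str
    strand: str
    start: int  # 1-based, inclusive
    end: int    # 1-based, inclusive


def overlap_fraction(a: Gene, b: Gene) -> float:
    """Fraction of the shorter gene covered by the overlap (0.0 if none)."""
    if a.chrom != b.chrom or a.strand != b.strand:
        return 0.0
    overlap = min(a.end, b.end) - max(a.start, b.start) + 1
    if overlap <= 0:
        return 0.0
    shorter = min(a.end - a.start, b.end - b.start) + 1
    return overlap / shorter


def build_lookup(source_genes, target_genes, min_fraction=0.8):
    """Map each source gene to its best-overlapping target gene, if any.

    min_fraction is an assumed cut-off, not a value from the abstract.
    """
    lookup = {}
    for src in source_genes:
        scored = [(overlap_fraction(src, tgt), tgt.gene_id) for tgt in target_genes]
        best_score, best_id = max(scored, default=(0.0, None))
        if best_score >= min_fraction:
            lookup[src.gene_id] = best_id
    return lookup


# Hypothetical records standing in for two annotation sources.
ensembl_like = [
    Gene("ENSG_A", "1", "+", 100, 500),
    Gene("ENSG_B", "1", "-", 1000, 2000),
]
entrez_like = [
    Gene("1001", "1", "+", 120, 520),
    Gene("1002", "2", "+", 1000, 2000),
]

lookup = build_lookup(ensembl_like, entrez_like)
print(lookup)
```

Genes without a sufficiently overlapping partner are simply left out of the lookup table, which makes the gaps that still need manual handling easy to list.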
P04: GenAP: A platform to provide Biomedical tools throughout Canadian HPCS
David Anderson de Lima Morais1, Michel Barrette1, David Bujold2, Carol Gauthier1, Kuang Chung Chen2, Simon Nderitu2, Maxime Levesque1, Bryan Caron2, Alain Veilleux1, Pierre-Etienne Jacques1, Guillaume Bourque2
2 McGill University, Montreal, Quebec, Canada
The Genetics and Genomics Analysis Platform (GenAP) is a computing platform for life sciences researchers. GenAP offers three components: a web portal from which users have access to tools (UCSC browser) and platforms (Galaxy); bioinformatics software and libraries, distributed via CERN Virtual Machine File System (CVMFS); and bioinformatics software pipelines.
In Galaxy-GenAP, we use a hybrid system involving cloud images on an HPC facility to provide private Galaxy instances to our users. These private instances are only available to a project's Principal Investigator (PI), the PI's group members, and any external members that the PI chooses to add. GenAP is fully integrated with Compute Canada (CC) and all Galaxy jobs are counted toward the users’ CC resource allocation in any HPC cluster.
GenAP was designed to be portable to any HPC center in Canada and in our second phase we will increase the number of hosts. To facilitate the installation of the platform we are currently integrating Galaxy and the CERN Virtual Machine File System (CVMFS). In this case Galaxy will be installed on the main CVMFS repository (stratum 0) and any HPC facility running a mirror client (stratum 1) will receive the Galaxy code, tools and updates automatically.
Through GenAP, Galaxy has been integrated into curricular courses at McGill and Sherbrooke Universities and is a fundamental part of several workshops. We aim to have GenAP and Galaxy integrated in most major HPC centers.
P05: Galaxy in teaching computational methods of genome analysis for master degree students in Medical Genetics program at the Faculty of Medicine, Vilnius University
Erinija Pranckeviciene1, Laima Ambrozaityte1, Ingrida Uktveryte1, Algirdas Utkus1, Vaidutis Kucinskas1
The Master's program in Medical Genetics is offered by the Department of Human and Medical Genetics, Faculty of Medicine, Vilnius University. In this program, computational analysis of genomic data constitutes a considerable part of the practical exercises. In the “Genome analysis” seminar and the “Biotechnology and fundamentals of bioinformatics analysis” course, students are introduced to a computational pipeline for next generation sequencing (NGS) data analysis, starting with quality assessment of raw exome sequencing data and ending with interpretation of the identified genomic variants. In class, students use data from scientific articles and have to reproduce some of the published results.
For these courses Galaxy runs on a Hardware-as-a-Service server (2 Hexa core Intel Xeon CPU E5-2630L, 8 processing units, 8192 MB RAM and 320 GB disk space).
Using Galaxy in teaching and learning is a novel approach at the Department of Human and Medical Genetics. A noted benefit of this approach is that students without previous exposure to bioinformatics efficiently grasp complex concepts and share “know how” about the tools. Little effort is needed to get used to the Galaxy interface and its visualization capabilities. Practical computations are evaluated directly in the students' named histories by step-by-step inspection. The benefits of using Galaxy in teaching will be evaluated at the end of the course by qualitative analysis (student interviews).
P06: Galaxy in Public Health: the Microbial Genomics Virtual Laboratory
Simon Gladman1, Nuwan Goonasekera1, Clare Sloggett1, Dieter Bulach1, Torsten Seemann1, Andrew Lonie1
The uptake of genomics in public health and clinical microbiology laboratories is being slowed by the perceived requirement that each laboratory establish and evaluate its own tools and infrastructure, which is counterproductive and will result in a lack of standardisation of methods.
An easily instantiated computer image based around Galaxy with a defined set of microbial-specific tools and reference data is an ideal solution for enabling standardisation between laboratories. We have established the Genomics Virtual Laboratory [GVL: http://genome.edu.au] to empower laboratories to establish their own private operating environment to securely analyse their own data using software and analysis methods that are widely used for microbial genomics, in a reproducible manner suited to government accreditation.
The GVL consists of a set of machine images for performing genomics analyses in a scalable, reproducible manner, plus web tools for instantiating and managing the images on multiple cloud architectures. The images incorporate a number of pre-configured genomic analysis platforms including Galaxy, the Linux command line, RStudio and IPython Notebook.
The GVL images are constructed from Ansible scripts, which makes them straightforward to customise. Here we present a flavour of the GVL fully tailored to microbial genomics (MGVL) by incorporating various microbial analysis pipelines and tools for both the Galaxy environment and the command line.
The Genomics Virtual Laboratory project is funded by the federal NeCTAR and ANDS programs (http://nectar.org.au; http://ands.org.au).
P07: 16S rDNA amplicon sequencing data analysis in Galaxy
Loïc Bourgeois1, Amalia Soenens1, Nuria Lozano1, Juan Imperial1
Most biologists can easily access NGS technologies and data in order to characterize the microbial diversity of a sample with 16S rDNA amplicon sequencing. However, the output of this kind of experiment can be challenging to handle. We assessed the different options to address 16S rDNA amplicon data analysis in Galaxy, and will highlight the benefits and drawbacks of the existing solutions. Indeed, even if the bioinformatics community now provides numerous tools for handling this sort of data, determining which software best fits the user’s needs is not trivial for several reasons. To begin with, some of this software is not easy to install, which can be a first barrier. In line with this, most tools do not provide any GUI, which can be tedious for people not used to the UNIX environment. Finally, a critical point is that even if the available software usually provides similar core steps to perform the analysis of 16S rDNA amplicon data, the tools do not always use the same approaches. Moreover, there are many different algorithms that can be used for each step of the analysis. The choice of the software and the algorithms one should use is important, as it will impact the output of the experiment, and it depends on the characteristics of the data and the user's experience. Galaxy can remove the tool installation and GUI barriers, on top of its other intrinsic benefits, which allows users to focus on the data analysis itself.
P08: A Galaxy approach to microbial data integration: the USMI Galaxy Demonstrator
Daniele Pierpaolo Colobraro1, Paolo Romano1
Many application domains, such as health, food, energy and waste management, exploit research on micro-organisms, information on which is distributed across many heterogeneous repositories.
The Microbial Resource Research Infrastructure (MIRRI) aims to orchestrate European microBiological Resource Centers (mBRCs) with the goal of providing improved and extended services and integrated access to data. In this context, the aims are i) integrating the information on microorganisms, ii) assessing available information, iii) pointing out discrepancies, errors and gaps, iv) carrying out in-silico analyses, and v) curating mBRC catalogues’ data.
USMI Galaxy Demonstrator, which is under active development, is available at http://galaxy.nettab.org:8088/. All tools are written in Python.
The tools menu includes a section devoted to MIRRI tools, where three categories are shown, related to retrieval of data from MIRRI catalogues, extension of catalogues contents with data from external resources, and data integration applications.
Tools of the “data_source” type are available for importing both full catalogues and single strain data in Galaxy. Information is archived by using an extended version of the Microbiological Common Language (MCL, http://www.straininfo.net/projects/mcl/reference).
The external data sources that have been already taken into account are NCBI Taxonomy, BRENDA, Pubmed, UNIPROT and ENA, which are respectively queried in order to retrieve information on taxon identifiers, EC numbers, Pubmed identifiers and DOIs, UniProt identifiers, and rRNA sequences. These are linked by using either the strain numbers, or the enzyme and species names, or the bibliographic references.
Outputs are provided in tabular form, making them both human- and machine-readable.
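The linking step described above, in which catalogue records are extended with data from an external resource and written out as a table, might look roughly like the following sketch. It is only an illustration under stated assumptions: the function names, field names and strain numbers are hypothetical, the taxonomy dictionary stands in for a real query to an external resource, and none of it is taken from the USMI tools themselves.

```python
import csv
import io

# Hypothetical catalogue records keyed by strain number.
catalogue = [
    {"strain_number": "STRAIN 0001", "species": "Escherichia coli"},
    {"strain_number": "STRAIN 0002", "species": "Bacillus subtilis"},
    {"strain_number": "STRAIN 0003", "species": "Unnamed isolate"},
]

# Stand-in for an external lookup of taxon identifiers by species name.
taxonomy = {"Escherichia coli": "562", "Bacillus subtilis": "1423"}


def extend_catalogue(records, taxon_by_species):
    """Add a taxon_id column; an empty value marks a gap to be curated."""
    return [
        {**record, "taxon_id": taxon_by_species.get(record["species"], "")}
        for record in records
    ]


def to_tsv(rows):
    """Serialise the rows as tab-separated text with a header line."""
    buffer = io.StringIO()
    writer = csv.DictWriter(
        buffer,
        fieldnames=["strain_number", "species", "taxon_id"],
        delimiter="\t",
        lineterminator="\n",
    )
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


extended = extend_catalogue(catalogue, taxonomy)
tsv = to_tsv(extended)
print(tsv)
```

Leaving the identifier empty rather than dropping the row keeps discrepancies and gaps visible in the output table.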
P09: A Galaxy metagenomic workflow for reference-tree based phylogenetic placement (MG-RTPP)
Ambrose Andongabo1*, Ian M. Clark1*, Dariush Rowlands1, Keywan Hassani-Pak1, Penny R. Hirsch1, Elisa Loza1, Andy Neal1*
Background: High-throughput sequencing of environmental nucleic acids is revolutionizing and dramatically expanding our understanding of the diversity and functionality of complex microbial communities. There are a number of tools which allow community structure to be surveyed using metagenomics or meta-transcriptomics at the rRNA level, or by using COG- or KEGG-based functional assignments. However, there are limited complementary approaches to investigate the phylogenetic diversity of functionally important individual genes in large sequence databases.
Results: We have designed a workflow for reference-tree based phylogenetic placement (MG-RTPP) of metagenomics and meta-transcriptomics samples. The inputs to the workflow are unassembled reads, a multiple sequence alignment (MSA) of the genes of interest and large public sequence databases. Reference nucleotide profile hidden Markov models (pHMMs) are built from the MSA and are used as queries. Homologous reads are checked for accuracy before being placed on a reference phylogenetic tree, maximising phylogenetic likelihood. The workflow retains considerable flexibility, allowing for tuning of redundancy in the nucleotide pHMMs used as queries to recover as many true hits as possible.
Conclusions: MG-RTPP facilitates fast interrogation of sequence databases in a flexible and robust fashion. It avoids misidentification of false positives while pHMM tuning allows for maximum recovery of sequences. Phylogenetic placement provides unique visualization approaches which reveal the phylogenetic relationships between environment-derived sequences and sequenced organisms and between samples. The approach complements tools such as QIIME, MG-RAST and MEGAN in allowing interrogation of individual gene abundance and diversity in samples. Keywords: metagenome, metatranscriptome, assembly-free, community analysis, functional genes, phylogeny.
P10: IRIDA: A Genomic Epidemiology Platform Built on top of Galaxy
Aaron Petkau1, Franklin Bristow1, Thomas Matthews1, Josh Adam1, Philip Mabon1, Eric Enns1, Jennifer Cabral1,2, Joel Thiessen1,2, Cameron Sieffert1, Natalie Knox1, Damion Dooley3, Emma Griffiths5, Geoff Winsor5, Matthew Laird5, Mélanie Courtot3,5, Peter Kruczkiewicz6, Alex Keddy7, Robert G. Beiko7, William Hsiao3,4, Gary Van Domselaar1,2, Fiona Brinkman5
2 University of Manitoba, Winnipeg, Canada
3 BC Public Health Microbiology and Reference Laboratory, Vancouver, Canada
4 University of British Columbia, Vancouver, Canada
5 Simon Fraser University, Burnaby, Canada
6 Laboratory for Foodborne Zoonoses, Lethbridge, Canada
7 Dalhousie University, Halifax, Canada
Whole genome sequencing (WGS) is revolutionizing epidemiological methods for identification and investigation of infectious disease outbreaks. However, the routine use of WGS has been hindered by the complexity of data management and the lack of pipelines supporting quality control and data analysis standards. While an increasing number of pipelines for genomic epidemiology are being developed, each typically has different installation and execution requirements. This makes it difficult to integrate these pipelines into a single genomic epidemiology system.
Galaxy offers a solution by providing a system to integrate, execute, and maintain data analysis pipelines. In addition, Galaxy provides a community of developers who contribute and maintain the bioinformatics tools used for genomic epidemiology. Our project, IRIDA (Integrated Rapid Infectious Disease Analysis), builds a platform for genomic epidemiology on top of Galaxy. IRIDA provides a system for the storage and management of sequencing data and sample metadata, an interface for the execution of data analysis pipelines, and the storage, auditing and visualization of results. Within IRIDA, we provide standard pipelines for genomic epidemiology including SNVPhyl, our SNV (Single Nucleotide Variant) phylogeny pipeline. These pipelines are executed using a Galaxy instance internal to IRIDA, and additional support is provided for exporting genomic sequence data to external Galaxy instances.
By building on top of Galaxy we hope to simplify the process of pipeline integration, to share our pipelines with the bioinformatics community, and to contribute to the development of standards for genomic epidemiology. More information can be found at http://irida.ca.
P11: Galaxy – a platform for teaching the analysis and interpretation of clinical NGS data
Ang Davies1, Jan Taylor2, Mike Cornell1, Peter Briggs1, Sanjeev Bhaskar3, Andy Brass1
2 St James’ Hospital Leeds, The University of Manchester
3 St Mary’s Hospital Manchester
The University of Manchester delivers a masters programme in Clinical Bioinformatics which provides the education for trainee clinical bioinformaticians on the NHS Scientist Training Programme, training to become registered healthcare scientists. This programme is led under the direction of the Manchester Academy for Healthcare Scientist Education (MAHSE). Clinical bioinformaticians within the NHS are at the forefront of genomic medicine in their roles, often responsible for building and validating bioinformatic workflows that are used in the analysis and interpretation of clinical Next Generation Sequencing (NGS) data. Within the diagnostic genomic medical centres across the UK, bioinformaticians are building and using NGS pipelines to analyse sequencing data from gene panels, whole exomes and now whole genomes for those centres involved in the 100,000 Genomes Project. Within the masters programme we used Galaxy to teach the trainees how to analyse anonymised clinical NGS gene panel data, kindly provided by the Manchester Centre for Genomic Medicine for teaching purposes. The pipeline the trainees built included quality control, alignment, annotation, interpretation and viewing in a genome browser, enabling trainees to identify a potential causal pathogenic variant starting from the original FASTQ file. The analysis was undertaken on a local installation of Galaxy configured by the Bioinformatics Core Facility at the university. For more information contact angela.davies@manchester.ac.uk.
P12: Bioinformatics Evolving at Canada’s National Microbiology Laboratory
Eric Enns1, Philip Mabon1, Jennifer Cabral1,2, Mariam Iskander1,2, Cameron Sieffert1, Natalie Knox1, Heather Kent1, Shane Thiessen1, Paul Williams1, Brian Yeo2, Joel Thiessen1,2, Josh Adam1, Aaron Petkau1, Thomas Matthews1, Franklin Bristow1, Gary Van Domselaar1,2
2 Department of Computer Science, University of Manitoba, Winnipeg, MB, Canada
The National Microbiology Laboratory (NML) is Canada’s leading public health laboratory, responsible for the identification, control and prevention of infectious diseases. The bioinformatics core facility at the NML deployed our first instance of Galaxy in 2010. The introduction of the Galaxy platform has revolutionized bioinformatics at the NML by bridging the gap between bioinformaticians and biologists.
Prior to Galaxy, most of our in-house tools and pipelines required an extensive background in UNIX command line and high performance computing to operate. This requirement demanded that bioinformaticians be intimately involved in projects with significant computational requirements. Galaxy was selected to be the bioinformatics analysis platform at the NML, as it made our tools and pipelines accessible to biologists. Bioinformaticians are able to focus more time on tool and pipeline development, as their project involvement has been reduced. Biologists are able to perform analyses on their own as Galaxy lowers the barrier to carrying out complex analysis in a high performance computing environment. As a result NGS (Next-generation sequencing) projects are able to progress at a much faster rate. Moving forward, we are developing a Galaxy-powered infectious disease analysis platform for our standardized analyses while retaining our traditional Galaxy environment for ad hoc pipeline development and bioinformatics analysis.
P13: Galaxy Flavours – your highly portable, configurable local Galaxy distributions with preinstalled workflows – for Linux, MacOSX and Windows
Christian Rausch1, Jeroen Galle2, Stef van Lieshout1, Wim van Criekinge2, Björn Grüning3,
2 Biobix, Lab of Bioinformatics and Computational Genomics, Ghent University, Ghent, Belgium
3 Chair of Bioinformatics, University of Freiburg, Germany
Galaxy makes it easy for biologists to use advanced bioinformatics software through graphical web-browser-based user interfaces. However, when using a public Galaxy server such as usegalaxy.org is not an option (e.g. in the case of sensitive data), setting up a local Galaxy installation still requires Linux administrator skills.
Therefore we are developing installers for the Linux, Macintosh and Windows operating systems that make use of portable Docker software containers.
Another aspect that makes Galaxy increasingly difficult to use, especially for less advanced users, is the growing number of available tools: how does one find the right tool for a given task? Here we want to help by providing a useful selection of tools and workflows for typical problems in biomedical data analysis, preconfigured in the Galaxy Flavours we provide.
On the poster we present the current status of our work, future plans and further ideas such as a configurator for tailored Galaxy Docker images. Please join the discussion on the prioritisation of future Galaxy developments at our poster and at the conference in general.
P14: GIO: Standards-compliant Galaxy workflows for proteomics informed by transcriptomics
Jun Fan1, Shyamasree Saha1, Adelyne Sue Li Chan1, David A Matthews2, Conrad Bessant1
2 School of Cellular and Molecular Medicine, University of Bristol, University Walk, Bristol. BS8 1TD. UK.
The most common method of identifying proteins in a complex sample is to perform liquid chromatography tandem mass spectrometry (LC-MS/MS) and then search the acquired spectra against a reference proteome downloaded from a database such as UniProt. This approach has the major drawback of not being able to identify gene products that are not already known. The recently developed proteomics informed by transcriptomics (PIT) methodology tackles this problem by using RNA-seq to generate sample-specific protein databases that the LC-MS/MS data can be searched against. This allows the detection and quantitation of previously unknown proteins, protein variants and other exotic translated genomic elements. This is of particular utility when studying non-model organisms and samples with very dynamic proteomes, e.g. stem cells, cancer cells and virus-infected cells. The analysis of PIT data is complex and computationally intensive, requiring the integration of multiple third party tools from the proteomics, transcriptomics and genomics communities. To make this analysis tractable and repeatable we have produced GIO (Galaxy Integrated Omics), a Galaxy-based framework containing the key tools and workflows needed to analyse data from PIT experiments in a reliable and repeatable way.
P15: Reproducible Galaxy: Administration and Development
Aarif Mohamed Nazeer Batcha1, Sebastian Schaaf, Guokun Zhang, Sandra Fischer, Ashok Varadharajan, Ulrich Mansmann
Establishing a structured IT infrastructure for processing NGS data is a challenge on multiple levels. Dealing with such challenges in the field of molecular diagnostics and medical research, which often demands reproducibility, makes it much more interesting. The user-friendly, open source and modular Galaxy framework was of great help in facing those challenges, although the reproducibility part was a bit questionable. Over three years of dedicated efforts, the Munich NGS-FabLab was built up as a running IT system, based on an assessment of requirements, constraints and given structural conditions. Aiming for a structured approach to resolving reproducibility issues and improving cross-connections between hospitals and research institutes associated with us, we came up with Ansible playbooks (setup scripts) to recreate our IT infrastructure. The scripts include setting up dedicated file servers, creating production, test and development environments, PostgreSQL database setup, Apache configurations and a grid engine for distributed management systems, along with Galaxy installation procedures. Although the playbooks were developed on SLES, blank Unix systems with SSL connectivity and an ini file are all that is necessary for the scripts to run. The scripts can be used to create and recreate an IT infrastructure and a reproducible environment for processing NGS data, which is in high demand in medical research and diagnostics. We also hope to return our playbooks to the community that offered a great deal of support in developing our NGS-FabLab for processing medical sequencing data.
P16: Mass spectrometry proteomics analysis with diverse tools for hundreds of runs
Jorrit Boekel1,2, Rui Mamede-Branca2, Henric Zazzi1,3, Yafeng Zhu2, Matthew The4, Lukas Käll4, and Janne Lehtiö2
2 Department of Oncology-Pathology, Science for Life Laboratory, Karolinska Institute, Stockholm, Sweden
3 PDC Center for High Performance Computing, Royal Institute of Technology – KTH, Stockholm, Sweden
4 School of Biotechnology, Science for Life Laboratory, Royal Institute of Technology – KTH, Stockholm, Sweden
The mass spectrometry (MS) field is currently undergoing rapid growth and is seeing an increasing number of datasets per experiment. The growth is caused by sample size increase, meta-analyses and prefractionation, but analysis is constrained by MS computing environments, which often lean towards proprietary software on Windows systems. Transition of typical analysis platforms to more powerful and flexible infrastructure is necessary to make large scale analysis available to users without access to, or in-depth knowledge of, powerful bioinformatics tools and platforms.
We have combined a number of tools for spectra search (MSGF+), quantification (OpenMS, Hardklör/Krönik) and statistical scoring (Percolator) in the Galaxy framework. Since freely available MS tools do not always interact in all sought-after combinations, we have written software called msstitch to manipulate input and output files for a number of tools, including doing protein grouping and keeping an SQLite database of results. The resulting pipeline is under continuous development and can currently deliver data-repository-ready mzIdentML, and PSM, peptide and protein tables for end-users.
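To make the idea of keeping results in an SQLite database and deriving peptide tables from them more concrete, here is a minimal sketch using Python's standard `sqlite3` module. It does not reproduce msstitch or its schema: the table layout, column names, scores and the assumption that a higher score is better are all invented for illustration.

```python
import sqlite3

# Invented PSM records: (spectrum_id, peptide, protein, score).
psms = [
    ("scan1", "PEPTIDEK", "protA", 0.91),
    ("scan2", "PEPTIDEK", "protA", 0.97),
    ("scan3", "SAMPLER", "protB", 0.88),
]

# An in-memory database keeps the sketch self-contained.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE psm (spectrum_id TEXT, peptide TEXT, protein TEXT, score REAL)"
)
conn.executemany("INSERT INTO psm VALUES (?, ?, ?, ?)", psms)

# Collapse PSMs into a peptide table: count the supporting spectra and
# keep the best score (assuming higher is better) for each peptide.
peptide_table = conn.execute(
    "SELECT peptide, protein, COUNT(*) AS n_psms, MAX(score) AS best_score "
    "FROM psm GROUP BY peptide, protein ORDER BY peptide"
).fetchall()

for row in peptide_table:
    print(row)
```

Storing the PSMs once and deriving peptide or protein tables by query means that outputs of different tools can be recombined without re-parsing their files.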
P17: Read Between the Lines: Closing Gaps of Materials and Methods to Build Workflow from the Publication
Tazro Ohta1, Osamu Ogasawara2, Yoshinobu Masatani3, Shigetoshi Yokoyama3, Kento Aida3
2 DNA Data Bank of Japan
3 National Institute of Informatics
Poster
Publishing and sharing data analysis workflows using the Galaxy platform has spectacularly reduced the cost of reproducing one’s own research, but following the description of a data analysis performed by other researchers to get exactly the same result is still a big challenge. To evaluate the cost of rebuilding a data analysis workflow from its natural language description, we rebuilt the CAGE sequencing data processing workflow of the FANTOM5 team on the Galaxy platform. Though the project has already published a set of papers with extensive supplementary methods and online protocols, it was not straightforward to get the same result from the raw sequencing data available in the public data repository. The results processed by the rebuilt workflow are compared with the results published online by the FANTOM5 team. This case study showed that some of the information needed to rebuild the workflow is missing even from well-described documents, for example the location of older source code or the parameters for command execution. As the speed of biological data production increases, it will become more important to build a framework for cost-effective research reproducibility, such as an automated evaluation process for published workflows. We will provide the details of our case study, and discuss how we can assure reproducibility with Galaxy and other possible ways to perform, share, and publish the workflow as “executable materials and methods”.
P18: Galaxy-M: A Galaxy workflow for processing and analysing direct infusion and liquid chromatography mass spectrometry-based metabolomics data
Riccardo Di Guida1, Ralf J. M. Weber1, Robert L. Davidson2, Haoyu Liu1, Archana Sharma-Oates1, Warwick B. Dunn1, Mark R. Viant1
2 GigaScience, BGI-Hong Kong Co. Ltd, 16 Dai Fu Street, Tai Po Industrial Estate, NT, Hong Kong
Poster
Motivation: Metabolomics is increasingly recognised as an invaluable tool in the biological, medical and environmental sciences yet lags behind the methodological bioinformatics maturity of other ‘omics fields, specifically genomics and transcriptomics. To achieve its full potential, standardisation and reproducibility of computational tools must be improved significantly. Here we report the development of Galaxy-M and describe further work to validate pre-processing methods for implementation into Galaxy-M.
Development of Galaxy-M: We have developed an end-to-end mass spectrometry metabolomics pipeline in Galaxy for direct infusion mass spectrometry (DIMS) and liquid chromatography mass spectrometry (LC-MS) metabolomics. The range of tools presented spans from the processing of raw data, e.g. peak picking and alignment, through data pre-processing to principal components analysis (PCA) and the associated statistical evaluation. To aid accessibility, the tools, Galaxy and data will all be provided via download. Additionally, source code, executables and installation instructions are available from GitHub.
Validation of pre-processing methods for LC-MS metabolomics: To provide a robust module for liquid chromatography-mass spectrometry (LC-MS) data pre-processing we have assessed the influence of different missing value imputation, normalisation, scaling and transformation methods on univariate and multivariate analysis. We show that normalisation by sum or PQN provides the most robust results for univariate analysis, while further KNN missing value imputation and glog transformation are optimal for multivariate analysis. These methods are currently being implemented into Galaxy-M.
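To make the two normalisation methods named above concrete, here is a minimal Python sketch of normalisation by sum followed by probabilistic quotient normalisation (PQN). It is an illustration only and not the Galaxy-M implementation: the function names, the choice of the per-peak median across samples as reference spectrum, and the assumption of a complete peak matrix (no missing values, positive intensities) are all ours.

```python
from statistics import median


def sum_normalise(sample):
    """Scale one sample so that its peak intensities sum to 1."""
    total = sum(sample)
    return [x / total for x in sample]


def pqn_normalise(samples):
    """Probabilistic quotient normalisation of a peak matrix.

    samples: list of samples, each a list of positive peak intensities
    (same peaks in the same order, no missing values).
    """
    # Step 1: normalise every sample by its sum.
    scaled = [sum_normalise(s) for s in samples]
    n_peaks = len(scaled[0])

    # Step 2: reference spectrum = median intensity of each peak across samples
    # (an assumption of this sketch; other references are possible).
    reference = [median(s[j] for s in scaled) for j in range(n_peaks)]

    normalised = []
    for s in scaled:
        # Step 3: quotient of each peak against the reference spectrum.
        quotients = [s[j] / reference[j] for j in range(n_peaks) if reference[j] > 0]
        # Step 4: the median quotient estimates the sample's dilution factor.
        factor = median(quotients)
        # Step 5: divide the whole sample by that factor.
        normalised.append([x / factor for x in s])
    return normalised


if __name__ == "__main__":
    # Two samples that differ only in one strongly changed peak (the last one).
    for row in pqn_normalise([[10, 20, 30, 40], [10, 20, 30, 400]]):
        print([round(x, 4) for x in row])
```

In this toy input the single large peak distorts plain sum normalisation, whereas after PQN the three unchanged peaks line up again between the two samples.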
P19: Integrating Galaxy in the Mr.SymBioMath Cloud Infrastructure
Óscar Torreño1, Johan Karlsson2, Alex Upton3, Michael Krieger1, Oswaldo Trelles3
2 Integromics S.L., 18100 Armilla Granada, Spain
3 University of Malaga, 29071 Malaga, Spain
Poster
Workflows are an increasingly important paradigm in bioinformatics and biomedicine; complex analyses are often performed by separate software packages that are later connected to form a complete pipeline. A number of workflows in both the bioinformatics and biomedicine domains are being developed in the Mr.SymBioMath project. GECKO1, a biological sequence comparison workflow that studies the similarities between two or more genomes, and its post-processing steps, is the main development in the bioinformatics use case of the project. Genome wide association studies (GWAS) of SNPs2 and Multi-SNPs (epistatic interactions)3 are the principal implementations in the biomedicine use case. However, these workflows are command-line based, making their exploitation difficult for inexperienced users. Consequently, we have decided to use Galaxy in the project in order to facilitate their execution and distribution, whilst ensuring that all the experiments are reproducible. Our current architecture comprises three nodes deployed in a cloud infrastructure: 1) Gateway – which proxies the client requests to the web server; 2) Web server – which runs the Galaxy web application served through nginx; 3) DB server – which contains the metadata queried by the Galaxy web server. The present configuration executes the tasks on the second node, but we are currently working on executing the tasks in a separate Torque cluster which will be auto-scaled depending on the system load. The customised Mr.SymBioMath Galaxy configuration ensures that a wide spectrum of end users is able to obtain results as quickly and easily as possible.
Notes:
2 P. Heinzlreiter, J.R. Perkins, O. Torreño, J. Karlsson, J.A. Ranea, A. Mitterecker, M. Blanca, O. Trelles: A Cloud-based GWAS Analysis Pipeline for Clinical Researchers. In Proc. of the 4th International Conference on Cloud Computing and Services Science (CLOSER 2014), ISBN 978-989-758-019-2, Barcelona, Spain, pp. 387-394, April 2014, DOI 10.5220/0004802103870394
3 Alex Upton, Oswaldo Trelles, James Perkins, Epistatic Analysis of Clarkson Disease, Procedia Computer Science, Volume 51, 2015, Pages 725-734, ISSN 1877-0509, http://dx.doi.org/10.1016/j.procs.2015.05.191.
P20: Deep Proteome Coverage Through Ribosome Profiling and MS Integration
Elvis Ndah1, Jeroen Crappé1, Alexander Koch1, Sandra Steyaert1, Gerben Menschaert1, Petra V. Damme2
2 VIB Department of Medical Protein Research, University of Gent
Poster
Note: This poster will not be presented at GCC2015 due to visa issues.
The novel ribosome profiling (RIBO-seq) approach provides genome-wide information about protein synthesis by monitoring mRNA entering the translation machinery, while highly sensitive mass spectrometry (MS) provides information about the protein composition of a sample. Integrating these technologies provides more intuitive information about the protein synthesis and the identification of novel translation products as well as a better understanding of the translation mechanism.
We developed a proteogenomic pipeline, called PROTEOFORMER, that automatically processes data from RIBO-seq experiments, resulting in the genome-wide visualization of ribosome occupancy. The tool includes pre-processing, mapping to a reference genome, sequence variation analysis and identification of translation initiation sites, allowing the delineation of the open reading frames of all translation products. A complete protein synthesis-based sequence database can thus be compiled for MS-based identification from shotgun proteomics and N-terminomics experiments. The tool is freely available as a stand-alone pipeline and has been implemented in the GALAXY framework allowing easy integration with available proteomics tools such as SearchGUI and PeptideShaker in a multi-omics setting.
To evaluate the pipeline we performed matching RIBO-seq, gel-free shotgun and N-terminal COFRADIC proteomics experiments on mouse and human cell samples. We were able to observe an overall increase in protein identification rates, detection of 5’-extended proteoforms, upstream ORF translation and near-cognate (non-AUG) translation start sites. Furthermore, integration through the PROTEOFORMER pipeline of RIBO-seq and N-terminomics data evidenced the translation of non-coding genes in the Arabidopsis genome, indicative of mis-annotation in The Arabidopsis Information Resource (TAIR10).
P21: The de.NBI RNA Bioinformatics Center
Cameron Smith1, Torsten Houwaart1, Anika Erxleben1, Sebastian Will3, Altuna Akalin2, Uwe Ohler2, Nikolaus Rajewsky2, Peter F. Stadler3, Björn Grüning1, Rolf Backofen1
Genome-wide sequencing revealed pervasive transcription, where the majority of the DNA encodes non-coding RNAs. Non-coding RNAs and RNA-protein interactions play a fundamental role in cellular regulation; consequently they have received increasing attention over the past decade. Recent advances in high-throughput sequencing as well as in the genome-wide identification of miRNAs and RNA-protein interactions have shown that the complexity of post-transcriptional gene regulation is equivalent to that of transcriptional gene regulation.
The recently launched German Network for Bioinformatics Infrastructure (de.NBI) aims to provide comprehensive bioinformatics services to users in life sciences research, industry and medicine. Within this network, the RNA Bioinformatics Center (RBC) is responsible for supporting RNA related research in Germany, such as the detection of noncoding RNAs and RNA structure prediction. The RBC aspires to build and advance a movement of Galaxy based RNA bioinformatics and help foster a community of users and developers in this field.
This poster details the infrastructure, services and methods the RBC will employ to meet this challenge and how Galaxy is used to provide an integrated workbench for RNA analysis.
P22: VAPoR: A Visual web pipeline for Annotation of host/pathogen interactions in Plant Resistance
Benedikt Rauscher1, Benjamen White1, Manuel Corpas1, Burkhard Rost2
2 Department for Bioinformatics and Computational Biology, Technical University of Munich
Poster
Plants are engaged in a continuous co-evolutionary struggle for dominance with their pathogens. The outcomes of these interactions are of particular importance to human activities as they can have dramatic effects on agricultural systems. Agricultural systems such as those conferring Effector Triggered Immunity (ETI) allow recognition of specific pathogen effectors (i.e., proteins secreted by pathogens into host cells to enhance infection). R (Resistance) genes play a crucial role in controlling a broad set of disease resistance responses whose introduction is often sufficient to stop further pathogen growth and spread.
We introduce VAPoR, a novel tool specifically tailored to annotation of resistance genes in uncharacterised genomes. Taking as input a putative genome sequence for an R gene, it gathers relevant information about experimentally annotated homologues as well as their evolutionary relationship with the candidate gene from UniProt and STRING. The information is then displayed in an interactive and intuitive way.
We tested VAPoR with two datasets: 1) a set of known R genes in Brachypodium spp. and 2) a putative set of R genes for Dioscorea alata, a species of yam. Our application is written purely in JavaScript, using the BioJS and Galaxy platforms. By exploiting Galaxy’s powerful data transformation facilities and the variety and interactivity of BioJS components, we are able to display an abundance of relevant information in a concentrated and intuitive way.
P23: Enabling large scale Genotype-Tissue Expression studies using Galaxy
Genna Gliner1, Ian McDowell2, Barbara E Engelhardt3
2 Computational Biology and Bioinformatics, Duke University
3 Computer Science Department and Center for Statistics and Machine Learning, Princeton University
Poster
The Princeton BEEHIVE Group develops statistical models and methods for high-dimensional genomic data. As part of the Genotype-Tissue Expression (GTEx) consortium, we are involved in processing vast quantities of RNA-sequencing and whole genome sequence data for different types of statistical and functional genomics studies, including cis- and trans-eQTLs, non-coding RNA regulation, and allele specific expression studies. The creation, testing, and deployment of the processing pipelines for each of these different study types require comprehensive analysis of large datasets through a dedicated pipeline used by all members of the group. With the ability to create custom tools and share and modify workflows, Galaxy provides a robust framework to develop this pipeline for use across our lab, but incorporating our diverse set of analysis tools into Galaxy is a non-trivial task.
In this poster we chronicle the evolution of the Princeton BEEHIVE Galaxy Pipeline. We illustrate our vision for a flexible, scalable, and streamlined pipeline using Galaxy for statistical genomics studies. We explore how our pipeline evolved by highlighting how our lab addressed the challenges of tool creation and integration, data processing and organization, and training lab members to use our Galaxy instance.
P24: A French Galaxy Tool Shed to federate the national infrastructures and offering quality assessed tools
Loraine BRILLET-GUÉGUEN1, Christophe CARON1, Valentin LOUX2, the French Galaxy Working Group3
Presented by Olivier Inizan
2 UR1404 Mathématiques et Informatique Appliquées du Génome à l’Environnement, INRA, F-78352 Jouy-en-Josas, France
3 Institut Français de Bioinformatique [ANR-11-INBS-0013], France Génomique [ANR-10-INBS-0009] and MetaboHUB [ANR-11-INBS-0010]
Poster
The Galaxy environment, notably dedicated to bio-analyses, is enjoying growing success within the bioinformatics and biology communities. In 2013 the “Institut Français de Bioinformatique” (IFB) commissioned a Working Group around the Galaxy platform. This group gathers several national platforms and organises community events (Galaxy Day, thematic schools, etc.) as well as actions to structure the user and developer communities (training, good practice guides, etc.).
In addition, as part of the bioinformatics work packages funded by the “France Génomique” project, the community has developed or evaluated many tools and set up analysis workflows. Making these pipelines usable by, and available to, people unfamiliar with the command line now relies on using a common platform (Galaxy) and on creating a common repository (Tool Shed). From this perspective, and building on the momentum of the Working Group, the IFB offers a reference repository to centralize and promote the bio-analysis tools of the French community. The scope of this repository, initially dedicated to “France Génomique” NGS pipelines, is now extending to other national infrastructures (MetaboHUB, etc.) and to training actions (e.g. “Ecole NGS AVIESAN”).
The IFB Tool Shed is part of a strategy to federate the community around good practices for integrating tools into Galaxy and for training engineers from the platforms concerned. A special effort is made on the quality of tool and workflow integration, with functional tests and validation procedures.
P25: Statistical method for filtering sequencing error from minor clonal mutation in sequencing data and implementation
Vojtech Kulvait1, Katerina Machova Polakova2, Tomas Stopka1
When analyzing data from current NGS technologies one has to deal with sequencing and amplification errors. For clonal disorders (we study mainly cancer and leukemia), subclones may be present in the patient sample in low relative amounts (~1%). These subclones have individual mutational profiles. Since NGS data contain technical errors, we present a statistical method to distinguish biologically relevant mutations in subclones from technical errors. This method is based on fitting a negative binomial distribution to the sequencing data from control samples to obtain a null distribution. The distribution is then used to detect mutations in samples. The method is implemented in Java.
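The idea of a negative binomial null distribution can be sketched in a few lines of Python. This is a hedged illustration, not the authors' Java implementation: the abstract does not say how the distribution is fitted, so the method-of-moments fit, the function names and the toy control counts below are all assumptions made for the example.

```python
from math import exp, lgamma, log
from statistics import mean, variance


def fit_negative_binomial(counts):
    """Method-of-moments fit (an assumption of this sketch).

    Returns (r, p) for the parameterisation with mean r(1-p)/p
    and variance r(1-p)/p**2.
    """
    m = mean(counts)
    v = variance(counts)
    if v <= m:
        raise ValueError("counts are not overdispersed; a Poisson model may fit better")
    return m * m / (v - m), m / v


def nb_pmf(k, r, p):
    """P(X = k), computed in log space so that real-valued r is allowed."""
    return exp(lgamma(k + r) - lgamma(k + 1) - lgamma(r) + r * log(p) + k * log(1 - p))


def nb_upper_tail(k, r, p):
    """P(X >= k): chance of seeing k or more error reads under the null."""
    return max(0.0, 1.0 - sum(nb_pmf(i, r, p) for i in range(k)))


if __name__ == "__main__":
    # Hypothetical non-reference read counts at one position in ten control samples.
    control_counts = [2, 5, 1, 8, 3, 0, 6, 4, 2, 9]
    r, p = fit_negative_binomial(control_counts)
    for observed in (4, 15, 40):
        print(observed, nb_upper_tail(observed, r, p))
```

A position in a patient sample would be flagged as a candidate subclonal mutation when the upper-tail probability of its non-reference read count falls below a chosen significance threshold; in practice the threshold would also need correcting for the number of positions tested.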
P26: Yet another Galaxy Genome viewer
Thomas Darde1, François Moreews2, Yvan le Bras3, Cyril Monjeaud3, Frédéric Chalmel1
2 Genscale team – IRISA -Rennes, France
3 Genouest Bioinformatics facility – INRIA/IRISA – Rennes, France
Poster
Galaxy has its own genome viewer1. Another popular genome viewer is JBrowse2. We developed a server application that acts as a gateway between Galaxy and the JBrowse viewer.
Unlike Trackster, which benefits from strong Galaxy integration, we used a loosely coupled service architecture. Our gateway application exposes services that can retrieve or update the configuration of a genome view, using JSON data produced by a set of scripts wrapped as Galaxy tools. These dedicated Galaxy tools pre-process BAM, SAM, BED, GTF or GFF files to produce the JSON configuration files used by JBrowse to display new tracks.
Our application includes a session mechanism allowing a user to restore, display or update the configuration of an existing JBrowse genome view.
For each session, a feature allows easy export of the corresponding configuration files. In this way, we combine the Galaxy and JBrowse systems so that any user-defined custom genome view can be downloaded and redeployed, independently of the processing environment, locally or within any web server. This option is particularly useful when data are processed in a cloud-based Galaxy instance. Unlike other integrations of JBrowse within Galaxy3, we provide a generic way to display data produced by Galaxy in JBrowse.
It was successfully used4 to interpret multifaceted “omic” data. Thus, we consider this work as i) a contribution to improving the interoperability of open source bioinformatics software and ii) a way to deploy and spread pre-populated genome browsers with minimal technical skills.
Notes:
2 Skinner, M.E., Uzilov, A.V., Stein, L.D., Mungall, C.J. and Holmes, I.H. (2009) JBrowse: a next-generation genome browser. Genome Res., 19, 1630–1638.
3 Venter Institute Cloud Viral Browser: https://github.com/JCVI-Cloud/VICVB
4 The ReproGenomics Viewer: an integrative cross-species toolbox for the reproductive science community, Thomas A. Darde; Olivier Sallou; Emmanuelle Becker; Bertrand Evrard; Cyril Monjeaud; Yvan Le Bras; Bernard Jegou; Olivier Collin; Antoine D. Rolland; Frederic Chalmel. Nucleic Acids Research 2015; doi: 10.1093/nar/gkv345
P27: Integration of Mechanical Testing Process in the Galaxy Environment
R. Créac’Hcadec1, E. Poupart2, Y. Le Bras3, O. Collin3, D. Malfondet4, Y. Quéré5
2 Université Européenne de Bretagne, Pôle Système d’information – Direction des usages et services numériques, France
3 e-Biogenouest project, CNRS UMR 6074 IRISA-INRIA, France
4 Université de Bretagne Occidentale – UFR Sciences, France
5 Lab-Sticc CNRS UMR 6074, Université de Bretagne Occidentale – UFR Sciences, France
Poster
The aim of integrating the mechanical testing process into the Galaxy environment is to help improve the efficiency of the overall experimental process.
The concepts dealing with the experimental life cycle are similar both in biological and mechanical fields. Thus, as Galaxy presents flexible setups, it appears that adapting this tool to the needs of the mechanical approach could be relevant.
The initial step is to manage raw data provided by devices such as connected measurement instruments or specific personal computers. It relies on various sub-processes such as collecting, flagging, synchronizing, preprocessing and storing data.
As raw data are produced by expensive methods, they should be automatically linked with metadata, with treatment scripts in order to generate numerical studies, and finally with a project entity so that we get reliable bottom-up management. All these associations should be stored so that previous numerical studies can be replayed or improved, and so that publication criteria can be met.
A demonstrator will show some points mentioned above and help to schedule further developments.
P28: BioMAJ2Galaxy: automatic update of reference data in Galaxy using BioMAJ
Anthony Bretaudeau1,2, Cyril Monjeaud2, Yvan Le Bras2, Fabrice Legeai1,3, Olivier Collin2
2 INRIA, IRISA, GenOuest Core Facility, Campus de Beaulieu, 35042 Rennes, France
3 INRIA, IRISA, GenScale, Campus de Beaulieu, 35042 Rennes, France
Poster
Many bioinformatics tools use reference data, such as genome assemblies or sequence databanks. Galaxy offers multiple ways to give access to this data through its web interface. However, the process of adding new reference data was customarily manual and time consuming, even more so when this data needed to be indexed in a variety of formats (e.g. Blast, Bowtie, BWA, or 2bit).
BioMAJ is a widely used and stable software package designed to automate the download and transformation of data from various sources. This data can be used directly from the command line, in more complex systems such as Mobyle, or through a REST API.
To ease the process of giving access to reference data in Galaxy, we have developed the BioMAJ2Galaxy module, which enables the gap between BioMAJ and Galaxy to be bridged. With this module, it is now possible to configure BioMAJ to automatically download some reference data, to then convert and/or index it in various formats, and then make this data available in a Galaxy server using data libraries or data managers.
The developments presented in this paper allow us to integrate the reference data in Galaxy in an automatic, reliable, and diskspace-saving way. The code is freely available on the GenOuest GitHub account (https://github.com/genouest/biomaj2galaxy).
P29: Colib’read on Galaxy: A tools suite dedicated to biological information extraction from raw NGS reads
Yvan Le Bras1, Olivier Collin1, Cyril Monjeaud1, Vincent Lacroix2, Eric Rivals3, Claire Lemaitre4, Vincent Miele2, Gustavo Sacomoto2, Camille Marchet2, Bastien Cazaux3, Amal Makrini3, Leena Salmela5, Susete Alves-Carvalho4, Alexan Andrieux4, Raluca Uricaru6, Pierre Peterlongo4
2 BAMBOO team, INRIA Grenoble Rhône-Alpes & Laboratoire Biométrie et Biologie Évolutive, UMR5558 CNRS
3 MAB team, UMR5506 CNRS
4 INRIA/IRISA, Genscale team, UMR6074 IRISA CNRS/INRIA/Université de Rennes1
5 Department of Computer Science and Helsinki Institute for Information Technology HIIT
6 University of Bordeaux, LaBRI/CNRS & CBiB
Poster
With NGS technologies, life sciences face a raw data deluge. Classical analysis processes of such data often begin with an assembly step, needing large amounts of computing resources, and potentially removing or modifying parts of the biological information contained in the data. Our approach proposes to directly focus on biological questions, by considering raw unassembled NGS data, through a suite of six command-line tools.
Dedicated to “whole genome assembly-free” treatments, the Colib’read tools suite uses optimized algorithms for various analyses of NGS datasets, such as variant calling or read set comparisons. Because they are based on de Bruijn graphs and Bloom filters, such analyses can be performed in a few hours, using small amounts of memory. Applications on real data demonstrate the good accuracy of these tools compared to classical approaches. To facilitate data analysis and tool dissemination, we developed Galaxy tools and Tool Shed repositories.
With the Colib’read Galaxy tools suite, we enable a broad range of life scientists to analyze raw NGS data. More importantly, our approach retains as much of the biological information in the data as possible while keeping a very low memory footprint.
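As a rough illustration of why this combination is memory-efficient, the Python sketch below stores the k-mers of a read set in a small Bloom filter and walks the implicit de Bruijn graph by testing the four possible successors of a k-mer. It is not the Colib’read code: the class, the function names, the filter size, the number of hash functions and the use of SHA-256 are assumptions chosen for readability rather than speed.

```python
import hashlib


class BloomFilter:
    """Fixed-size bit array queried through several hash functions.

    Membership tests never give false negatives but may give
    false positives at a rate that depends on size and load.
    """

    def __init__(self, size_bits=1 << 16, num_hashes=3):
        self.size = size_bits
        self.num_hashes = num_hashes
        self.bits = bytearray(size_bits // 8 + 1)

    def _positions(self, item):
        # Derive num_hashes bit positions by salting one hash function.
        for seed in range(self.num_hashes):
            digest = hashlib.sha256(f"{seed}:{item}".encode()).digest()
            yield int.from_bytes(digest[:8], "big") % self.size

    def add(self, item):
        for pos in self._positions(item):
            self.bits[pos >> 3] |= 1 << (pos & 7)

    def __contains__(self, item):
        return all(self.bits[pos >> 3] & (1 << (pos & 7)) for pos in self._positions(item))


def kmers(read, k):
    """All overlapping substrings of length k in a read."""
    return [read[i:i + k] for i in range(len(read) - k + 1)]


def successors(kmer, bloom):
    """Neighbours of a k-mer in the implicit de Bruijn graph.

    Edges are never stored: each of the four possible extensions
    is simply looked up in the Bloom filter.
    """
    return [kmer[1:] + base for base in "ACGT" if kmer[1:] + base in bloom]


if __name__ == "__main__":
    bloom = BloomFilter()
    for read in ["ACGTAC", "CGTACG"]:
        for km in kmers(read, 4):
            bloom.add(km)
    print(successors("ACGT", bloom))
```

Only the bit array is kept in memory, never the k-mers themselves, which is where the saving comes from; the price is an occasional false-positive edge that real tools have to account for.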
P30: Galaxy for biological image analysis
Sylvain Prigent1, Yvan Le Bras1
Poster
Imaging technologies for biology are manifold (photography, photonic microscopy, electron microscopy, scanners, MRI…) and evolve rapidly. They produce a huge amount of data that can no longer be analyzed manually. The image analysis community has developed software to address image analysis tasks. We can distinguish generic software such as ImageJ or Icy from specific tools developed by researchers in Matlab, Java, R, Python or C. Most of these programs target a single dedicated image analysis task and do not work together. This is why the bio-analysis community tends towards developing a single interface that merges all these tools and makes them easy for a biologist to use. One solution is Galaxy. We made a first attempt to use Galaxy in the context of a participatory project that uses imaging.
In north-west France, ray populations are assessed by collecting and identifying ray egg capsules on the beaches. To make this process easier, an application has been developed that allows people to take pictures of the egg capsules on the beaches and upload them to a dedicated website. The images can then be analyzed by specialists or any citizen to identify the species of the eggs. To automate this identification step, we are developing a Galaxy workflow based on ImageJ tools that extract the egg shape features.
P31: Cancer Genomics in Galaxy
Marco Albuquerque1, Bruno Grande1, Dr. Ryan Morin1
Poster
An inherent difficulty in data-driven biology is the multi-disciplinary skill set required of the scientist to draw meaningful inferences from complex data sets. Cancer genomics epitomizes this problem with the advent of next-generation sequencing (NGS) and the concomitant need for computational analysis. Despite the myriad available algorithms, a bottleneck in data analysis remains because of cryptic command-line parameters, inflexible system environments with difficult installations, and demanding hardware requirements. Our project directly addresses these issues by building a cancer genomics toolbox consisting of parallelized tools and workflows for the cloud-ready Galaxy platform. Our toolbox will contain some 50 new Galaxy tools spanning several sub-categories, notably variant calling, visualization and additional helper tools for integrating and summarizing results. These will be assembled with existing tools to form Galaxy workflows. Following Galaxy best practices, users will be able to seamlessly install our tools automatically. To ensure optimal accuracy, workflow design and tool parameterization will be informed by the benchmarking results from the ICGC-TCGA DREAM challenges. All tools and workflows will be developed to ensure optimal parallelism on a cluster environment. The incomplete Map-Reduce parallelization framework offered by Galaxy will be expanded, including new merge and split functions for NGS data types used by our tools. Ultimately, this will provide a competitive graphical user interface for performing cancer genome analyses and hopefully find a home in clinics around the world, advancing the field of personalized medicine.
P32: Trinity Galaxy Portal
Carrie Ganote1, Ben Fulton2, Brian Haas3
2 Indiana University
3 Broad Institute of MIT and Harvard
Poster
Large memory requirements, long running times and large-scale CPU consumption are some of the barriers to providing bioinformatics services through Galaxy, especially when limited hardware is available to power these services. We will give a brief overview of our hardware and software setup and provide benchmarking data that we have collected using the Trinity RNA-Seq Assembler through the Trinity Galaxy portal at https://galaxy.ncgas-trinity.indiana.edu. This should provide a starting point for other institutions that may want to implement similar workflows for their users.
P33: NeLS: Norwegian e-Infrastructure for Life Sciences
Sveinung Gundersen1, Christian Andreetta2, Abdulrahman Azab1, Kjetil Klepper3, Inge Alexander Raknes4, Jeevan Karloss5, Teshome D. Mulugeta5, Xiaxi Li2, Patcharee Thongtra5, Kai Trengereid1, Tim Kahlke4, Erik Semb4, Kidane M. Tekle2
2 University of Bergen
3 Norwegian University of Science and Technology
4 University of Tromsø
5 Norwegian University of Life Sciences
Poster
NeLS is one of the packages of the ELIXIR.NO project and aims to provide a national Norwegian e-infrastructure allowing users within the life sciences community to efficiently and safely store, share, analyse and publish their genomics scale data. The e-infrastructure maintains a web portal at https://nels.bioinfo.no that functions as the central point of access to the NeLS resources, which include data storage and analysis pipelines. NeLS relies on Galaxy as its primary platform for data analysis. Each of the five participating universities hosts its own Galaxy server with different types of analysis tools and workflows reflecting the research focus of the hosting groups. Workflows include differential gene expression analysis of RNA-seq data, variant calling in somatic and germline cells, and taxonomic classification of shotgun metagenomic sequences. For analysis of human patient data, NeLS collaborates closely with the Norwegian “services for sensitive data (TSD)”. The servers are either running on dedicated hardware or as a front-end to shared computer clusters. Authentication of users is done with the common electronic identity provider for the Norwegian educational sector (FEIDE), with an alternative identity provider soon in production.
P34: High Quality Library Construction and Reliable Quantitation with NEBNext Reagents
Bradley W. Langhorst1, Erbay Yigit1, Eileen T. Dimalanta1, Theodore B. Davis1
Both the Quant Kit and the Ultra II Library Prep Kit were developed using the SeqResults tool described in the talk (link to talk). This poster provides more detailed supporting information and shows several figures generated by SeqResults.
An expanded role for NGS in the clinic and research will depend on continued improvement of the upstream processes required to produce high quality NGS data. In particular, maximizing data output & minimizing instrument run failure are imperative. In this poster, we present data showing: 1) the improvements we have made in library preparation with the development of the NEBNext Ultra II Library Prep kit; and 2) the development of the NEBNext Library Quant Kit for Illumina, a simple and robust method for quantitation of Illumina libraries. In the first part, we show that libraries made with the Ultra II kit have low DNA input requirements; additionally, we significantly improved library yields and reduced sequence bias. Moreover, the workflow is simple and streamlined, greatly reducing the time required to produce high quality DNA libraries and the possibility of errors. In the second part, we demonstrate the effectiveness of the NEBNext Library Quant Kit for a broad range of library types and sizes as well as advantages offered by qPCR quantitation for obtaining optimal cluster density and user-to-user consistency. The NEBNext Quant Kit offers an efficient and cost-effective qPCR library quantitation workflow for users looking to optimize both sequencing yield and throughput.
Poster Printing in Norwich
The conference does not have an official relationship with a poster printer in Norwich. However, Jarrold’s Print and Copy Shop is well-known locally. A0 poster printing is £25 for full colour.
Submit a late abstract
The deadlines for oral and poster presentations have passed. However, late oral and poster abstracts are still being accepted and will be considered as cancellations occur, or space opens up.
Abstracts are submitted electronically and should be 250 words or fewer of plain text. See the GCC2014 abstracts list to see the broad range of topics presented in 2014.
There will also be an opportunity for lightning talks, which will be solicited during the meeting.
Oral presentations will be 15 or 20 minutes long.
Talks and posters on any topics of interest to the Galaxy community are welcome.
Please Note: By submitting an abstract you:
- Agree to make your slides/posters freely available on this web site no later than 15 August 2015.
- Agree, if giving an oral presentation, to have your presentation videotaped and made publicly available during and after the conference.