GCC2015 Training Day(s)
The 2015 Galaxy Community Conference (GCC2015) starts with Training. Monday, 6 July is a Training Day featuring multiple tracks covering a wealth of topics.
Something new for GCC2015 is Training SunDay, an additional day of training offered immediately before the conference and featuring a single track with the most in-demand topics. You can attend both Training Days, or just one.
Training SunDay
The Training SunDay track offers the three topics that received the most votes from the community. These three topics are also offered on Monday, so you can register for both Training Days (and attend 6 sessions) or just one.

Time | Topic |
---|---|
08:00 | Registration |
08:45 | Introduction to Galaxy |
11:00 | Break |
11:15 | RNA-Seq Analysis with Galaxy, Part 1 (Saskia Hiltemann, Youri Hoogstrate) |
12:45 | Catered Lunch |
13:45 | RNA-Seq Analysis with Galaxy, Part 2 (Saskia Hiltemann, Youri Hoogstrate) |
14:45 | Visualisation of NGS Data, Part 1 |
15:30 | Break |
15:50 | Visualisation of NGS Data, Part 2 |
17:35 | Done |
Training Day
Monday, 6 July is a Training Day featuring five parallel tracks, each with three two-and-a-half-hour workshops. There are topics on using Galaxy, interacting with it programmatically, and deploying, administering, and extending Galaxy. No matter what you do with Galaxy, there are workshops for you.

Time | Auditorium | Watson G34 | Crick G35 | Wilkins G36 | Franklin G37 |
---|---|---|---|---|---|
08:00 | Registration | | | | |
09:15 | Setting up a Galaxy instance as a service | Scripting Galaxy using the API and BioBlend | Galaxy Interactive Environments | Introduction to Galaxy (Daniel Blankenberg), sold out | Finding causative mutations in genomes with a Candidate SNP approach |
11:45 | Catered Lunch | | | | |
13:00 | Introduction to Writing Galaxy Tools & Publishing in Galaxy ToolShed | Advanced Workflows and Variables (Jennifer Hillman-Jackson), sold out | Running Galaxy on Docker and StarCluster | The Galaxy Database Schema | RNA-Seq Analysis with Galaxy (Saskia Hiltemann, Youri Hoogstrate) |
15:30 | Break | | | | |
16:00 | Test-Driven Development of Galaxy Tools with Planemo & Advanced Topics in Tool Creation | Visualisation of NGS Data | Variant Detection with Galaxy | Mass Spectrometry-based Proteomics Data Analysis using Galaxy-P | Galaxy Architecture |
18:30 | Training Sessions Done | | | | |
18:50 | Dinner (on your own) | | | | |
22:00 | Finish | | | | |
Prerequisites
All sessions are hands-on, and participants should bring a fully charged, Wi-Fi-enabled laptop to each session. Each session also has its own additional prerequisites.
Topics: Using Galaxy
Introduction to Galaxy
Daniel Blankenberg, Penn State University
Slides
New to Galaxy? This session will introduce you to the Galaxy Project and the Galaxy Community, and walk you through a simple use case demonstrating what Galaxy can do. It is recommended for anyone who has never used Galaxy, or uses it only rarely.
Prerequisites:
- Little or no knowledge of Galaxy
Finding causative mutations in genomes with a Candidate SNP approach
Dan MacLean, The Sainsbury Laboratory
Training materials
Mapping mutations by position, either using classical methods or whole genome high-throughput sequencing (HTS), largely relies on the analysis of genome-wide polymorphisms in F2 recombinant populations.
We will study high-throughput genomic sequence from back- and out-crossed bulks of plants to identify a genetic mutation caused by EMS mutagenisation of bulk segregants. The workflow demonstrated and implemented by the attendees will QC paired Illumina reads, align them against the Arabidopsis reference genome using BWA, generate a BAM file, identify SNPs using SAMtools, and separate SNPs by allele frequency. We will then use SnpEff to annotate SNPs by their effect and location in genes, and generate plots that allow us to compare the relative densities of SNP classes across the genome and reveal candidate positions of the causative mutation.
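Separating SNPs by allele frequency and comparing their densities along the genome can be illustrated in a few lines of plain Python. This is a minimal sketch under stated assumptions, not the workflow used in the session: SNP calls are assumed to be already reduced to hypothetical `Snp` records carrying reference and alternative read counts, and both the 0.9 allele-frequency cutoff and the 1 Mb bin size are illustrative choices. In this simplified model, bins where homozygous-like SNPs outnumber heterozygous-like ones are candidate regions for the causative mutation.

```python
from collections import Counter
from typing import Dict, Iterable, List, NamedTuple, Tuple


class Snp(NamedTuple):
    """A called SNP with read counts (hypothetical, simplified record)."""
    chrom: str
    pos: int
    ref_reads: int
    alt_reads: int


def allele_frequency(snp: Snp) -> float:
    """Fraction of reads supporting the alternative allele."""
    depth = snp.ref_reads + snp.alt_reads
    return snp.alt_reads / depth if depth else 0.0


def split_by_frequency(snps: Iterable[Snp],
                       hom_cutoff: float = 0.9) -> Tuple[List[Snp], List[Snp]]:
    """Separate SNPs into homozygous-like (AF >= hom_cutoff) and heterozygous-like."""
    hom: List[Snp] = []
    het: List[Snp] = []
    for snp in snps:
        (hom if allele_frequency(snp) >= hom_cutoff else het).append(snp)
    return hom, het


def density(snps: Iterable[Snp], bin_size: int) -> Counter:
    """Count SNPs per (chromosome, bin index) window."""
    return Counter((s.chrom, s.pos // bin_size) for s in snps)


def hom_het_ratio(snps: Iterable[Snp], hom_cutoff: float = 0.9,
                  bin_size: int = 1_000_000) -> Dict[Tuple[str, int], float]:
    """Per-bin ratio of homozygous to heterozygous counts (+1 avoids division by zero)."""
    hom, het = split_by_frequency(snps, hom_cutoff)
    hom_d, het_d = density(hom, bin_size), density(het, bin_size)
    return {b: hom_d[b] / (het_d[b] + 1) for b in sorted(set(hom_d) | set(het_d))}


calls = [
    Snp("Chr1", 150_000, 1, 19),    # AF 0.95, homozygous-like
    Snp("Chr1", 420_000, 0, 12),    # AF 1.00, homozygous-like
    Snp("Chr1", 2_300_000, 10, 9),  # AF ~0.47, heterozygous-like
    Snp("Chr1", 2_700_000, 8, 11),  # AF ~0.58, heterozygous-like
]
ratios = hom_het_ratio(calls)
print(max(ratios, key=ratios.get))  # ('Chr1', 0)
```

Working on read counts rather than genotype calls keeps the frequency split explicit, so the cutoff can be tuned to the expected allele frequencies of the crossing scheme.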
Prerequisites:
- General knowledge of Galaxy, or attendance at the “Introduction to Galaxy” session.
- Basic understanding of genetics.
RNA-Seq Analysis with Galaxy
Saskia Hiltemann, Erasmus MC
Youri Hoogstrate, Erasmus MC
Slides, with notes
Practicals
Practicals with Answers
This hands-on workshop will demonstrate basic RNA-Seq analysis pipelines including quality control, alignment, and differential expression analysis in Galaxy.
Sample datasets small enough to be processed within the session will be provided. Participants will perform the analyses themselves on the provided cloud instance of Galaxy.
Prerequisites:
- General knowledge of Galaxy, or attendance at the “Introduction to Galaxy” session.
Advanced Workflows and Variables
Jennifer Hillman-Jackson, Penn State University
This workshop will teach participants everything they need to know to create their own publication- and/or production-quality Galaxy Workflows:
- Basic and Advanced Workflow Editor functions.
- Demystify the magic variables defined by the workflow engine, with special emphasis on tracking data inputs and outputs: use labels inherited from existing datasets, prompt for user-defined labels, and/or create custom-specified labels (or portions of labels) within the Workflow itself.
- Hands-on examples for batch processing, including how to execute using multiple input streams or Dataset Collections.
- Tips for preparing a Workflow so it may be used effectively by others: annotation options, run-time parameter changes, and proper input selection.
- Best Practices for Sharing or Publishing a Workflow on a Galaxy instance, be it stand-alone or embedded within a Page.
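The inherited and prompted labels described above can be modelled as simple string templates. The sketch below is a toy, hypothetical model, not Galaxy's implementation: it assumes a `#{input_name}` placeholder is replaced with the label of the input dataset bound to that name, and a `${variable}` placeholder with a value the user supplies at run time, which mirrors the syntax used by Galaxy's workflow rename action; check the Workflow Editor of your Galaxy release for the exact behaviour.

```python
import re
from typing import Dict, List

# Toy model of workflow label templating (hypothetical, not Galaxy's code):
#   #{name} -> label inherited from the input dataset bound to `name`
#   ${name} -> value the user is prompted for when the workflow is run
TEMPLATE_VAR = re.compile(r"([#$])\{(\w+)\}")


def render_label(template: str, inputs: Dict[str, str], runtime: Dict[str, str]) -> str:
    """Expand both kinds of variables in one pass; unknown names raise KeyError."""
    def expand(match):
        sigil, name = match.group(1), match.group(2)
        return inputs[name] if sigil == "#" else runtime[name]

    return TEMPLATE_VAR.sub(expand, template)


def runtime_prompts(template: str) -> List[str]:
    """Names of the ${...} variables a user must supply before running."""
    return [name for sigil, name in TEMPLATE_VAR.findall(template) if sigil == "$"]


template = "#{input1} mapped to ${genome}"
print(runtime_prompts(template))  # ['genome']
print(render_label(template, {"input1": "sampleA.fastq"}, {"genome": "hg19"}))
# sampleA.fastq mapped to hg19
```

Expanding both placeholder kinds in a single regex pass means an inherited label that happens to contain `${...}` text is not expanded a second time.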
Prerequisites:
- General knowledge of Galaxy, or attendance at the “Introduction to Galaxy” session.
- A Wi-Fi-enabled laptop with a modern web browser. Google Chrome, Firefox, and Safari will work best.
Visualisation of NGS Data
Jeremy Goecks, George Washington University
Aysam Guerler, Johns Hopkins University
Carl Eberhard, Johns Hopkins University
Workshop materials, slides
This workshop will cover visualisation of both primary NGS analyses (alignments, variants, annotations) and downstream options such as heat maps, charts, and graphs.
Prerequisites:
- General knowledge of Galaxy, or attendance at the “Introduction to Galaxy” session.
- A Wi-Fi-enabled laptop with a modern web browser. Google Chrome, Firefox, and Safari will work best.
Variant Detection with Galaxy
Andrew Lonie, University of Melbourne
Clare Sloggett, University of Melbourne
Tutorial
The tutorial is designed to introduce the tools, datatypes, and workflow of variant detection in human genomic DNA, using a small set of sequencing reads from chromosome 20. In this session we will:
- Evaluate the quality of the short-read data. If the quality is poor, adjustments can be made, e.g. trimming the reads or adjusting your expectations of the final outcome.
- Map each of the individual reads in the sample FASTQ readsets to a reference genome, so that we can then identify the sequence changes with respect to the reference genome. Some variant callers need extra information about the source of the reads to select the correct error profiles for their statistical variant detection model, so we add this read group metadata during the alignment step; the generated BAM file then contains the metadata the variant caller expects.
- Call variants using the GATK Unified Genotyper, a Bayesian variant caller and genotyper from the Broad Institute. Many users consider the GATK to be best practice in human variant calling.
- Try an alternative caller: SAMtools mpileup.
- Evaluate known variations. We know a lot about variation in humans from many empirical studies, including the 1000 Genomes Project, so we have some expectations of what we should see when we detect variants in a new sample.
- Annotate the detected variants against the Ensembl database and interpret the annotation output.
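The first step, checking read quality and trimming poor-quality ends, can be sketched in Python. This is a minimal illustration, not the tool used in the tutorial: it assumes FASTQ qualities use the Phred+33 encoding, and the Q20 cutoff and the rule of trimming only from the 3' end are illustrative choices.

```python
from typing import List, Tuple

PHRED_OFFSET = 33  # assumed Sanger / Illumina 1.8+ quality encoding


def phred_scores(quality: str) -> List[int]:
    """Decode a FASTQ quality string into integer Phred scores."""
    return [ord(ch) - PHRED_OFFSET for ch in quality]


def mean_quality(quality: str) -> float:
    """Average Phred score of a read (0.0 for an empty read)."""
    scores = phred_scores(quality)
    return sum(scores) / len(scores) if scores else 0.0


def trim_3prime(seq: str, quality: str, min_q: int = 20) -> Tuple[str, str]:
    """Drop bases from the 3' end while their Phred score is below min_q."""
    if len(seq) != len(quality):
        raise ValueError("sequence and quality strings differ in length")
    scores = phred_scores(quality)
    end = len(scores)
    while end > 0 and scores[end - 1] < min_q:
        end -= 1
    return seq[:end], quality[:end]


seq, qual = "ACGTACGT", "IIIII#+#"  # 'I' = Q40, '#' = Q2, '+' = Q10
print(mean_quality(qual))           # 26.75
print(trim_3prime(seq, qual))       # ('ACGTA', 'IIIII')
```

Trimming from the 3' end only reflects the common pattern of Illumina quality dropping towards the end of a read; dedicated trimmers also offer sliding-window and adapter-aware modes.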
Prerequisites:
- General knowledge of Galaxy, or attendance at the “Introduction to Galaxy” session.
Mass Spectrometry-based Proteomics Data Analysis using Galaxy-P
Tim Griffin, University of Minnesota
Pratik Jagtap, University of Minnesota
James Johnson, University of Minnesota
Course Documentation (PDF)
Conceptual Overview (PDF)
This hands-on workshop will take participants through the essential steps for using Galaxy to analyse mass spectrometry (MS)-based proteomics data, focusing on protein identification from large-scale datasets. After a short introduction to the basic MS-based proteomics data types and the concepts that underlie protein identification from this data, the workshop will be organized around three integrated modules, presented in this order:
- Module 1: Basic proteomic workflows for protein identification
  - Attendees will be taken on a tour of the MS-based proteomics tools available in the Tool Shed. Using some of these tools, attendees will learn methods for protein sequence database construction and manipulation, the Galaxy-based tools available for sequence database searching, the resulting output data types, and tools for collating results.
- Module 2: Advanced proteomic workflows
  - Building on knowledge gained in module 1, attendees will learn about advanced applications in protein identification, focusing on applications that integrate genomic/transcriptomic data with proteomics data. Attendees will learn methods to construct protein databases from RNA-seq data, and downstream tools designed to evaluate the quality of protein identifications matching genomic/transcriptomic-derived protein sequences.
- Module 3: Visualization and interpretation of results
  - Attendees will gain exposure to the mechanics of visualization in Galaxy and to a variety of tools for visualizing the protein identifications produced by upstream workflows, including tools for data quality control. Visualization tools for interpreting results from proteogenomics applications, via mapping of identified peptides to reference genomes, will also be demonstrated.
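Module 2 mentions constructing protein databases from RNA-seq data. One common approach is to translate assembled transcripts in their reading frames and keep the stretches between stop codons as candidate protein sequences. The sketch below illustrates that idea in plain Python; it is not the Galaxy-P tool chain, and it assumes forward-strand transcript sequences, the standard genetic code, a hypothetical minimum length of 7 residues, and a made-up FASTA header scheme.

```python
from itertools import product
from typing import List, Tuple

# Standard genetic code with codons enumerated in TCAG order ('*' = stop).
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}


def translate(dna: str, frame: int = 0) -> str:
    """Translate one forward reading frame; codons with other letters (e.g. N) become 'X'."""
    dna = dna.upper().replace("U", "T")
    return "".join(CODON_TABLE.get(dna[i:i + 3], "X")
                   for i in range(frame, len(dna) - 2, 3))


def three_frame_peptides(name: str, dna: str, min_len: int = 7) -> List[Tuple[str, str]]:
    """Split each forward-frame translation at stop codons and keep long stretches."""
    entries = []
    for frame in range(3):
        for k, peptide in enumerate(translate(dna, frame).split("*")):
            if len(peptide) >= min_len:
                entries.append((f">{name}_f{frame}_{k}", peptide))
    return entries


def to_fasta(entries: List[Tuple[str, str]]) -> str:
    """Format (header, sequence) pairs as FASTA text."""
    return "".join(f"{header}\n{seq}\n" for header, seq in entries)


print(to_fasta(three_frame_peptides("tx1", "ATGGCCTAA", min_len=2)), end="")
```

For stranded RNA-seq a three-frame translation of the transcript strand is often enough; genomic or unstranded sequences would need the reverse complement as well (six frames).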
Participants will be given temporary accounts on a local Galaxy instance at the University of Minnesota for the hands-on workshop activities.
Prerequisites:
- General knowledge of Galaxy, or attendance at the “Introduction to Galaxy” session.
Topics: Using Galaxy Programmatically
Scripting Galaxy using the API and BioBlend
Nicola Soranzo, The Genome Analysis Centre (TGAC)
Dannon Baker, Johns Hopkins University
Carl Eberhard, Johns Hopkins University
Tutorial materials
Galaxy has a growing API that allows external programs to upload and download data, manage histories and datasets, run tools and workflows, and even perform admin tasks. This session will cover programmatic access to the API, either by direct REST web requests or by using the BioBlend Python library.
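A direct REST call boils down to an HTTP request against a path under `/api/`. The sketch below uses only the Python standard library and separates building the request from sending it. It assumes the `/api/histories` endpoint and authentication via the `key` query parameter; the server URL and API key are placeholders.

```python
import json
from typing import Any, Dict, Optional
from urllib.parse import urlencode, urljoin
from urllib.request import Request, urlopen


def api_request(base_url: str, path: str, api_key: str,
                params: Optional[Dict[str, str]] = None) -> Request:
    """Build (but do not send) a GET request for a Galaxy API path such as 'histories'.

    Assumes the API key is passed as the `key` query parameter.
    """
    query = dict(params or {}, key=api_key)
    url = urljoin(base_url.rstrip("/") + "/", "api/" + path.lstrip("/"))
    return Request(f"{url}?{urlencode(query)}", headers={"Accept": "application/json"})


def call_api(request: Request, timeout: float = 30.0) -> Any:
    """Send the request and decode the JSON response body."""
    with urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))


# Placeholder server and key; nothing is sent until call_api() is used.
req = api_request("https://galaxy.example.org", "histories", "YOUR_API_KEY")
print(req.full_url)  # https://galaxy.example.org/api/histories?key=YOUR_API_KEY
```

With BioBlend installed, the equivalent listing is typically `GalaxyInstance(url=..., key=...).histories.get_histories()`; the library takes care of URL building, authentication, and JSON decoding.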
Prerequisites:
- Basic understanding of Galaxy from a developer point of view.
- Python programming.
Topics: Deploying, Administering, and Wrapping Tools for Galaxy
Setting up a Galaxy instance as a service
Hans-Rudolf Hotz, Friedrich Miescher Institute for Biomedical Research
Nikolay Vazov, University of Oslo
Jochen Bick, ETH Zürich
Slides 1, Slides 2, and Slides 3
The premise: you have been given the task of setting up a Galaxy instance for others (for example, as a core service at your institute), and you are not very familiar with Galaxy.
In this workshop, you will learn what is important when you set up a Galaxy server from scratch, which pitfalls you might run into, how to interact with the potential users of the service you are going to offer, and how to make sure the Galaxy instance you have set up is actually used in the end. After a general introduction, several Galaxy installations will be presented. The session will finish with a panel discussion of questions from the workshop participants.
Prerequisites:
- Basic knowledge of the Unix/Linux command line interface
- Familiarity with the bioinformatics problems (and their solutions) that wet-lab scientists run into.
Galaxy Interactive Environments
Björn Grüning, University of Freiburg
Eric Rasche, Texas A&M University
Cameron Smith, University of Freiburg
John Chilton, Penn State University
Slides (PDF)
In this session you will get an introduction to Interactive Environments (IEs), an easy and powerful way to integrate arbitrary interactive web services into Galaxy. We will demonstrate the IPython Galaxy project and the general concept of IEs. We will also create an IE on the fly to get you started!
Prerequisites:
- Basic understanding of Galaxy from a developer point of view.
Running Galaxy on Docker and StarCluster
Gaurav Kaul, Intel Corporation
Robert Sugar, Intel Corporation
Training materials, Docker Security Slides, Galaxy on Docker
Two different methods of running Galaxy are covered:
- As a Docker container: we will cover the fundamentals of Docker containers and why you would want to use them to run your pipeline. After the overview, there will be a hands-on session running the Galaxy Docker image and the deepTools pipeline.
- Managing Galaxy using StarCluster: the STAR (Software Tools for Academics and Researchers) program at MIT provides a command-line tool called StarCluster. This tool has a number of subcommands that can be used to create, manage, log in to, stop, and destroy clusters of one or more VM instances on EC2. Although StarCluster does not natively support Galaxy (yet), it provides a convenient command-line tool chain for managing EC2 AMIs (which could be CloudMan instances running Galaxy servers). The real utility of StarCluster comes when developing for the Galaxy ToolShed; we will demonstrate this workflow as part of the hands-on session.
Prerequisites:
- Python
- Linux Shell Scripting
Introduction to Writing Galaxy Tools and Publishing in Galaxy ToolShed
Martin Čech, Penn State University
Björn Grüning, University of Freiburg
Dan Blankenberg, Penn State University
Dave Bouvier, Penn State University
John Chilton, Penn State University
Peter Cock, The James Hutton Institute
Eric Rasche, Texas A&M University
Nicola Soranzo, The Genome Analysis Centre (TGAC)
Slides (PDF)
Data and Examples
This tutorial will teach developers and bioinformaticians how to take a working script or application and turn it into a Galaxy tool. It will cover the basics of wrapping, common parameters, tool linting, best practices, loading tools into Galaxy, adding citations, and publishing tools to GitHub and the Galaxy Tool Shed. Common tips and tricks will be discussed, as well as insights from some of the best tool developers out there.
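A Galaxy tool wrapper is an XML file that describes the command line, inputs, outputs, and help text. To stay in one language, the sketch below generates a deliberately minimal wrapper with Python's standard `xml.etree.ElementTree`. The tool id, the `awk` one-liner, and the parameter names are hypothetical, and a wrapper intended for the Tool Shed would normally also declare requirements, tests, and citations.

```python
import xml.etree.ElementTree as ET


def minimal_tool_xml(tool_id: str, name: str, version: str, command: str) -> str:
    """Return a bare-bones Galaxy tool wrapper as an XML string (illustrative only)."""
    tool = ET.Element("tool", id=tool_id, name=name, version=version)
    ET.SubElement(tool, "command").text = command
    inputs = ET.SubElement(tool, "inputs")
    ET.SubElement(inputs, "param", name="input1", type="data", format="fasta",
                  label="Input sequences")
    outputs = ET.SubElement(tool, "outputs")
    ET.SubElement(outputs, "data", name="output1", format="txt")
    ET.SubElement(tool, "help").text = "Counts the sequences in a FASTA file."
    return ET.tostring(tool, encoding="unicode")


xml_text = minimal_tool_xml(
    "count_fasta_seqs", "Count FASTA sequences", "0.1.0",
    # $input1 / $output1 refer to the param and data names declared above;
    # awk exits 0 even when no headers match, unlike `grep -c`.
    "awk '/^>/ {n++} END {print n+0}' '$input1' > '$output1'",
)
print(xml_text)
```

Generating the XML here is only a way to show the structure; in practice wrappers are usually written by hand and then checked with a linter before publishing.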
Prerequisites:
- General knowledge of Galaxy, or attendance at the “Introduction to Galaxy” session.
- Familiarity with the Unix command line and text editors
Test-Driven Development of Galaxy Tools with Planemo & Advanced Topics in Tool Creation
John Chilton, Penn State University
Martin Čech, Penn State University
Björn Grüning, University of Freiburg
Dan Blankenberg, Penn State University
Dave Bouvier, Penn State University
Peter Cock, The James Hutton Institute
Eric Rasche, Texas A&M University
Nicola Soranzo, The Genome Analysis Centre (TGAC)
Slides (PDF)
Tutorial
This tutorial is aimed at people with some experience developing tools and will cover more advanced topics in tool development, more complex tools, and recent enhancements to the Galaxy tool development process including:
- Using Planemo, a new command-line application to aid Galaxy tool development, to develop Galaxy tools using a test-driven development methodology.
- Designing tools for use with dataset collections.
- Publishing complex tools to the Galaxy Tool Shed.
- Maintaining Galaxy Tools.
Prerequisites:
- Basic knowledge of Galaxy tools, or attendance at the “Introduction to Writing Galaxy Tools and Publishing in Galaxy ToolShed” session.
The Galaxy Database Schema
Dave Clements, Johns Hopkins University
Nitesh Turaga, Johns Hopkins University
Tutorial
When running a production Galaxy server, you sometimes end up in a situation where you need to interact with the database manually, e.g. to change the state of a job to ‘error’. This is always a risky undertaking. A situation that is not risky at all: you want to extract usage information that cannot be gathered with the existing report tools. In both cases, you need a good understanding of the Galaxy database schema.
You will learn some of the design concepts of the database, which parts of the schema are stable, and which will change in the foreseeable future.
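The two kinds of database work described above, a read-only usage report and a guarded state change, can be sketched with Python's built-in `sqlite3`. This is only an illustration on a simplified stand-in: the `job` table here keeps just a few columns named like Galaxy's, the tool ids are hypothetical, and a production Galaxy database is usually PostgreSQL with a much larger schema. Check the schema of your release (and take a backup) before running anything similar.

```python
import sqlite3


def make_demo_db() -> sqlite3.Connection:
    """In-memory stand-in for a (much larger) Galaxy job table."""
    conn = sqlite3.connect(":memory:")
    conn.execute("""CREATE TABLE job (
        id INTEGER PRIMARY KEY,
        tool_id TEXT NOT NULL,
        state TEXT NOT NULL,
        user_id INTEGER,
        create_time TEXT)""")
    conn.executemany(
        "INSERT INTO job (id, tool_id, state, user_id, create_time) VALUES (?, ?, ?, ?, ?)",
        [
            (1, "bwa_mem", "ok", 1, "2015-07-06 09:30:00"),
            (2, "bwa_mem", "running", 2, "2015-07-06 10:00:00"),
            (3, "fastqc", "ok", 1, "2015-07-06 10:15:00"),
        ],
    )
    return conn


def jobs_per_tool(conn: sqlite3.Connection) -> list:
    """Read-only usage report: number of jobs per tool, most used first."""
    return conn.execute(
        "SELECT tool_id, COUNT(*) FROM job GROUP BY tool_id "
        "ORDER BY COUNT(*) DESC, tool_id"
    ).fetchall()


def mark_job_error(conn: sqlite3.Connection, job_id: int,
                   expected_state: str = "running") -> bool:
    """Set one job to 'error', but only if it is still in expected_state."""
    with conn:  # commits on success, rolls back on exception
        cur = conn.execute(
            "UPDATE job SET state = 'error' WHERE id = ? AND state = ?",
            (job_id, expected_state),
        )
    return cur.rowcount == 1


conn = make_demo_db()
print(jobs_per_tool(conn))      # [('bwa_mem', 2), ('fastqc', 1)]
print(mark_job_error(conn, 2))  # True
```

Matching on both the job id and its current state means the update touches nothing if the job has already moved on, and the `with conn:` block applies the change as a single transaction.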
Prerequisites:
- Experience maintaining a production Galaxy server (recommended)
- Basic knowledge of relational databases and SQL statements
Galaxy Architecture
James Taylor, Johns Hopkins University
Nate Coraor, Penn State University
Want to know the big picture about what is going on inside Galaxy? This workshop will introduce participants to the high-level architecture of Galaxy internals, and to the project’s coding practices and standards.
Prerequisites:
- General knowledge of Galaxy, or attendance at the “Introduction to Galaxy” session.
- Knowledge of a programming or scripting language.
Feedback on the GCC2014 Training Day
- All the instructors were amazing.
- Have attended many training days/tutorials in my 15yr career. This was for me the most fruitful (and thus the best) so far.
- Intense but great
- I could definitely tell a lot of time and energy went into planning this training day/conference, and for that I want to say thank you.
- Overall a great experience