GCC2015 Training Day(s)

 

The 2015 Galaxy Community Conference (GCC2015) starts with Training. Monday, 6 July is a Training Day featuring multiple tracks covering a wealth of topics.

Something new for GCC2015 is Training SunDay, an additional day of training offered immediately before the conference, and featuring a single track with the most in-demand topics. You can attend both Training Days, or just one.

Training SunDay

Training SunDay is an additional day of training offered immediately before the conference. It features a single track with the three topics that received the most votes from the community. These three topics are also offered on Monday.

You can register for both Training Days (and attend 6 sessions), or just one.

Time Topic
08:00 Registration
08:45 Introduction to Galaxy
11:00 Break
11:15 RNA-Seq Analysis with Galaxy, Part 1
Saskia Hiltemann, Youri Hoogstrate
12:45 Catered Lunch
13:45 RNA-Seq Analysis with Galaxy, Part 2
Saskia Hiltemann, Youri Hoogstrate
14:45 Visualisation of NGS Data, Part 1
15:30 Break
15:50 Visualisation of NGS Data, Part 2
17:35 Done

Training Day

Monday, 6 July is a Training Day featuring five parallel tracks, each with three two-and-a-half-hour workshops. There are topics on using Galaxy, interacting with it programmatically, and deploying, administering, and extending Galaxy. No matter what you do with Galaxy, there are workshops for you.
Time | Auditorium | Watson G34 | Crick G35 | Wilkins G36 | Franklin G37
08:00 | Registration
09:15 | Setting up a Galaxy instance as a service | Scripting Galaxy using the API and BioBlend | Galaxy Interactive Environments (Björn Grüning, Eric Rasche, Cameron Smith, John Chilton; in the Chris Lamb Lounge) | Introduction to Galaxy | Finding causative mutations in genomes with a Candidate SNP approach
11:45 | Catered Lunch
13:00 | Introduction to Writing Galaxy Tools & Publishing in Galaxy ToolShed | Advanced Workflows and Variables | Running Galaxy on Docker and StarCluster | The Galaxy Database Schema | RNA-Seq Analysis with Galaxy (Saskia Hiltemann, Youri Hoogstrate)
15:30 | Break
16:00 | Test-Driven Development of Galaxy Tools with Planemo & Advanced Topics in Tool Creation | Visualisation of NGS Data | Variant Detection with Galaxy | Mass Spectrometry-based Proteomics Data Analysis using Galaxy-P | Galaxy Architecture
18:30 | Training Sessions Done
18:50 | Dinner (on your own)
22:00 | Finish

Prerequisites

All sessions are hands-on, and participants should bring a wifi-enabled, fully charged laptop to each session. Each session also has additional prerequisites, listed below.

Topics: Using Galaxy

Introduction to Galaxy

Daniel Blankenberg, Penn State University

Slides

New to Galaxy? This workshop introduces the Galaxy Project and the Galaxy Community, and walks you through a simple use case demonstrating what Galaxy can do. It is recommended for anyone who has not used Galaxy, or who uses it only rarely.

Prerequisites:

  • Little or no knowledge of Galaxy
 

Finding causative mutations in genomes with a Candidate SNP approach

Dan MacLean, The Sainsbury Laboratory

Training materials

Mapping mutations by position, either using classical methods or whole genome high-throughput sequencing (HTS), largely relies on the analysis of genome-wide polymorphisms in F2 recombinant populations.

We will study high-throughput genomic sequence from genomes of back- and out-crossed bulks of plants to identify a genetic mutation caused by EMS mutagenesis of bulk segregants. The workflow demonstrated and implemented by the attendees will QC paired Illumina reads, align them against the Arabidopsis reference genome using BWA, generate a BAM file, identify SNPs using SAMtools, and separate SNPs by allele frequency. We will then use SnpEff to annotate SNPs as to their effect and location in genes, and generate plots that will allow us to compare the relative densities of SNP classes across the genome and reveal the candidate positions of the causative mutation.
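The "separate SNPs by allele frequency" step can be illustrated with a minimal Python sketch. This is not the workshop's workflow: the `Snp` record, the example read counts, and the 0.8 threshold are all assumptions made for illustration.

```python
from typing import List, NamedTuple, Tuple


class Snp(NamedTuple):
    """Hypothetical minimal SNP record (not a real VCF parser)."""
    chrom: str
    pos: int
    ref_reads: int  # reads supporting the reference allele
    alt_reads: int  # reads supporting the alternate allele


def allele_frequency(snp: Snp) -> float:
    """Fraction of reads at this position that carry the alternate allele."""
    depth = snp.ref_reads + snp.alt_reads
    return snp.alt_reads / depth if depth else 0.0


def split_by_frequency(snps: List[Snp], threshold: float = 0.8) -> Tuple[List[Snp], List[Snp]]:
    """Split SNPs into (high, low) frequency groups.

    The 0.8 threshold is an assumed value for illustration only.
    """
    high, low = [], []
    for snp in snps:
        (high if allele_frequency(snp) >= threshold else low).append(snp)
    return high, low


# Made-up example data.
snps = [
    Snp("Chr1", 1200, 2, 18),   # frequency 0.9
    Snp("Chr1", 5400, 10, 10),  # frequency 0.5
    Snp("Chr3", 880, 0, 25),    # frequency 1.0
]
high, low = split_by_frequency(snps)
print([s.pos for s in high])  # [1200, 880]
print([s.pos for s in low])   # [5400]
```

In a bulk segregant design, SNPs near the causative mutation are expected to approach a frequency of 1.0 in the mutant bulk, which is why the high-frequency group is the one plotted along the genome.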

Prerequisites:

  1. General knowledge of Galaxy, or attendance at the “Introduction to Galaxy” session.
  2. Basic understanding of genetics.
 

RNA-Seq Analysis with Galaxy

Saskia Hiltemann, Erasmus MC
Youri Hoogstrate, Erasmus MC

Slides, with notes
Practicals
Practicals with Answers

This hands-on workshop will demonstrate basic RNA-Seq analysis pipelines including quality control, alignment, and differential expression analysis in Galaxy.

Sample datasets small enough to be successfully processed during the course of the seminar will be provided. Participants will perform the analyses themselves on the provided cloud instance of Galaxy.

Prerequisites:

  1. General knowledge of Galaxy, or attendance at the “Introduction to Galaxy” session.
 

Advanced Workflows and Variables

Jennifer Hillman-Jackson, Penn State University

This workshop will teach participants all they need to know in order to create their own publication- and/or production-quality Galaxy Workflows. Topics include:

  1. Basic and Advanced Workflow Editor functions.
  2. Demystify the magic variables defined by the Workflow’s engine with a special emphasis on how to track data inputs and outputs: utilize labels inherited from existing datasets, prompt for user-defined labels, and/or create custom-specified labels (or portions of labels) within the Workflow itself.
  3. Hands-on examples for batch processing, including how to execute using multiple input streams or Dataset Collections.
  4. Tips for preparing a Workflow so it may be used effectively by others: annotation options, run-time parameter changes, and proper input selection.
  5. Best Practices for Sharing or Publishing a Workflow on a Galaxy instance, be it stand-alone or embedded within a Page.

Prerequisites:

  1. General knowledge of Galaxy, or attendance at the “Introduction to Galaxy” session.
  2. A wi-fi enabled laptop with a modern web browser. Google Chrome, Firefox and Safari will work best.
 

Visualisation of NGS Data

Jeremy Goecks, George Washington University
Aysam Guerler, Johns Hopkins University
Carl Eberhard, Johns Hopkins University

Workshop materials, slides

This workshop will cover visualisation of both primary NGS analyses (alignments, variants, annotations) and downstream options such as heat maps, charts, and graphs.

Prerequisites:

  1. General knowledge of Galaxy, or attendance at the “Introduction to Galaxy” session.
  2. A wi-fi enabled laptop with a modern web browser. Google Chrome, Firefox and Safari will work best.

Variant Detection with Galaxy

Andrew Lonie, University of Melbourne
Clare Sloggett, University of Melbourne

Tutorial

The tutorial is designed to introduce the tools, datatypes and workflow of variation detection in human genomic DNA, using a small set of sequencing reads from chromosome 20. In this session we will:

  • Evaluate the quality of the short read data. If the quality is poor, then adjustments can be made, e.g. trimming the short reads, or adjusting your expectations of the final outcome.
  • Map each of the individual reads in the sample FASTQ readsets to a reference genome, so that we can then identify the sequence changes with respect to the reference genome. Some of the variant callers need extra information regarding the source of reads in order to identify the correct error profiles to use in their statistical variant detection model, so we add more information into the alignment step so that the generated BAM file contains the metadata the variant caller expects.
  • Call variants using the GATK Unified Genotyper. The GATK Unified Genotyper is a Bayesian variant caller and genotyper from the Broad Institute. Many users consider the GATK to be best practice in human variant calling.
  • Try an alternative caller: Mpileup
  • Evaluate known variations. We know a lot about variation in humans from many empirical studies, including the 1000Genomes project, so we have some expectations on what we should see when we detect variants in a new sample.
  • Annotate the detected variants against the Ensembl database and interpret the annotation output.
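The first step above (evaluating read quality and trimming poor reads) can be sketched in a few lines of Python. This is an illustrative sketch only, not the tool used in the tutorial: it assumes Phred+33 encoded FASTQ quality strings, and the minimum quality of 20 and the example read are made up.

```python
from typing import List, Tuple


def phred_scores(quality: str, offset: int = 33) -> List[int]:
    """Decode a FASTQ quality string into Phred scores (Phred+33 assumed)."""
    return [ord(ch) - offset for ch in quality]


def trim_3prime(seq: str, quality: str, min_q: int = 20) -> Tuple[str, str]:
    """Remove trailing bases whose quality is below min_q.

    A deliberately simple trimming rule; real trimmers typically use
    sliding windows or more elaborate criteria.
    """
    scores = phred_scores(quality)
    end = len(scores)
    while end > 0 and scores[end - 1] < min_q:
        end -= 1
    return seq[:end], quality[:end]


# Made-up read: six high-quality bases followed by two poor ones.
seq, qual = trim_3prime("ACGTACGT", "IIIIII#!")
print(seq, qual)  # ACGTAC IIIIII
```

Here `I` decodes to a Phred score of 40, while `#` and `!` decode to 2 and 0, so the last two bases are removed.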

Prerequisites:

  1. General knowledge of Galaxy, or attendance at the “Introduction to Galaxy” session.
   

Mass Spectrometry-based Proteomics Data Analysis using Galaxy-P

Tim Griffin, University of Minnesota
Pratik Jagtap, University of Minnesota
James Johnson, University of Minnesota

Course Documentation (PDF)
Conceptual Overview (PDF)

This hands-on workshop will take participants through the essential steps for using Galaxy for the analysis of mass spectrometry (MS)-based proteomics data, focusing on protein identification from large-scale datasets. After a short introduction on the basics of MS-based proteomics data types and the concepts that underlie protein identification from this data, the workshop will be organized around three integrated modules, presented in this order:

  1. Basic proteomic workflows for protein identification
    • Attendees will be taken on a tour of MS-based proteomics tools available in the Tool Shed. Using some of these tools, attendees will learn methods for protein sequence database construction and manipulation, the available Galaxy-based tools for sequence database searching, the output data types, and tools for collating results.
  2. Advanced proteomic workflows
    • Building on knowledge gained in module 1, attendees will learn about advanced applications in protein identification, focusing on applications that integrate genomic/transcriptomic data with proteomics data. Attendees will learn methods to construct protein databases from RNA-seq data, and downstream tools designed to evaluate the quality of protein identifications matching to genomic/transcriptomic-derived protein sequences.
  3. Visualization and interpretation of results
    • Attendees will gain exposure to the mechanics of visualization in Galaxy and to a variety of tools for visualizing the protein identifications output by upstream workflows. These include tools for data quality control. Visualization tools for interpreting results from proteogenomics applications, via mapping of identified peptides to reference genomes, will also be demonstrated.

At the end of the workshop, attendees will have working knowledge of MS-based proteomics tools in the Tool Shed, and experience in setting up basic workflows for protein identification as well as more advanced applications in proteogenomics. They will also gain an understanding of the available tools for results visualization and interpretation.

Participants will be given temporary accounts on a local Galaxy instance at the University of Minnesota to participate in the hands-on workshop activities.

Prerequisites:

  1. General knowledge of Galaxy, or attendance at the “Introduction to Galaxy” session.
 

Topics: Using Galaxy Programmatically

Scripting Galaxy using the API and BioBlend

Nicola Soranzo, The Genome Analysis Centre (TGAC)
Dannon Baker, Johns Hopkins University
Carl Eberhard, Johns Hopkins University

Tutorial materials

Galaxy has a growing API that allows for external programs to upload and download data, manage histories and datasets, run tools and workflows, and even perform admin tasks. This session will cover programmatic access of the API either by direct REST web requests or by using the BioBlend Python library.
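As a rough illustration of the "direct REST web requests" route, the sketch below builds a request that lists histories using only the Python standard library. The server URL and API key are placeholders, and the helper names are hypothetical; the block only constructs the request and does not contact a server unless `list_history_names` is called.

```python
import json
import urllib.parse
import urllib.request
from typing import List


def build_histories_request(galaxy_url: str, api_key: str) -> urllib.request.Request:
    """Build a GET request for the histories endpoint, passing the API key
    as the `key` query parameter."""
    query = urllib.parse.urlencode({"key": api_key})
    url = galaxy_url.rstrip("/") + "/api/histories?" + query
    return urllib.request.Request(url, headers={"Accept": "application/json"})


def list_history_names(galaxy_url: str, api_key: str) -> List[str]:
    """Fetch the histories and return their names (needs a reachable server)."""
    request = build_histories_request(galaxy_url, api_key)
    with urllib.request.urlopen(request) as response:
        return [history["name"] for history in json.load(response)]


# Placeholder server and key, for illustration only.
req = build_histories_request("https://galaxy.example.org/", "YOUR_API_KEY")
print(req.get_method(), req.full_url)
# GET https://galaxy.example.org/api/histories?key=YOUR_API_KEY
```

With BioBlend, the same listing is roughly `GalaxyInstance(url=..., key=...).histories.get_histories()` from `bioblend.galaxy`, which hides the URL construction and JSON decoding.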

Prerequisites:

  1. Basic understanding of Galaxy from a developer point of view.
  2. Python programming.
 

Topics: Deploying, Administering, and Wrapping Tools for Galaxy

Setting up a Galaxy instance as a service

Hans-Rudolf Hotz, Friedrich Miescher Institute for Biomedical Research
Nikolay Vazov, University of Oslo
Jochen Bick, ETH Zürich

Slides 1, Slides 2, and Slides 3

The premise: You are given the task to set up a Galaxy instance for others (i.e. as a core service in your institute) and you are not really familiar with Galaxy.

In this workshop, you will learn what is important when you set up a Galaxy server from scratch, what pitfalls you might run into, how to interact with the potential users of the service you are going to offer, and how to make sure the Galaxy instance you have set up is really used in the end. After a general introduction, several Galaxy installations will be presented. The session will finish with a panel discussion, where we intend to discuss questions from the workshop participants.

Prerequisites:

  1. Basic knowledge of the Unix/Linux command line interface
  2. Familiarity with the bioinformatics problems (and their solutions) that wet lab scientists run into.

Galaxy Interactive Environments

Björn Grüning, University of Freiburg
Eric Rasche, Texas A&M University
Cameron Smith, University of Freiburg
John Chilton, Penn State University

Slides (PDF)

In this session you will get an introduction to Interactive Environments (IEs) as an easy and powerful way to integrate arbitrary interactive web services into Galaxy. We will demonstrate the IPython Galaxy project and the general concept of IEs. Moreover, we will create an IE on-the-fly to get you started!

Prerequisites:

  1. Basic understanding of Galaxy from a developer point of view.
 

Running Galaxy on Docker and StarCluster

Gaurav Kaul, Intel Corporation
Robert Sugar, Intel Corporation

Training materials, Docker Security Slides, Galaxy on Docker

Two different methods of running Galaxy are covered:

  1. As a Docker container: here we will cover the fundamentals of Docker containers and why you would want to use them for running your pipeline. After the overview we will have a hands-on session running the Galaxy Docker image and the deepTools pipeline.
  2. Managing Galaxy using StarCluster: The STAR (Software Tools for Academics and Researchers) program at MIT provides a command-line tool called StarCluster. This tool has a number of subcommands, which can be used to create, manage, log in to, stop, and destroy clusters of one or more VM instances on EC2. Although StarCluster does not natively support Galaxy (yet), it provides a convenient command-line tool chain to manage EC2 AMIs (which could be the CloudMan instances running Galaxy servers). The real utility of StarCluster comes when doing development on the Galaxy ToolShed, whose workflow we will demonstrate as part of the hands-on session.

Prerequisites:

  1. Python
  2. Linux Shell Scripting
 

Introduction to Writing Galaxy Tools and Publishing in Galaxy ToolShed

Martin Čech, Penn State University
Björn Grüning, University of Freiburg
Dan Blankenberg, Penn State University
Dave Bouvier, Penn State University
John Chilton, Penn State University
Peter Cock, The James Hutton Institute
Eric Rasche, Texas A&M University
Nicola Soranzo, The Genome Analysis Centre (TGAC)

Slides (PDF)
Data and Examples

This tutorial will teach developers and bioinformaticians how to take a working script or application and turn it into a Galaxy tool. It will cover the basics of wrapping, common parameters, tool linting, best practices, loading tools into Galaxy, adding citations, and publishing tools to GitHub and the Galaxy Tool Shed. Common tips and tricks will be discussed, as well as insights from some of the best tool developers out there.

Prerequisites:

  1. General knowledge of Galaxy, or attendance at the “Introduction to Galaxy” session.
  2. Familiarity with Unix command line and text editors
 

Test-Driven Development of Galaxy Tools with Planemo & Advanced Topics in Tool Creation

John Chilton, Penn State University
Martin Čech, Penn State University
Björn Grüning, University of Freiburg
Dan Blankenberg, Penn State University
Dave Bouvier, Penn State University
Peter Cock, The James Hutton Institute
Eric Rasche, Texas A&M University
Nicola Soranzo, The Genome Analysis Centre (TGAC)

Slides (PDF)
Tutorial

This tutorial is aimed at people with some experience developing tools and will cover more advanced topics in tool development, more complex tools, and recent enhancements to the Galaxy tool development process including:

  • Using Planemo, a new command-line application to aid Galaxy tool development, to develop Galaxy tools using a test driven development methodology.
  • Designing tools for use with the dataset collections.
  • Publishing complex tools to the Galaxy Tool Shed.
  • Maintaining Galaxy Tools.

Prerequisites:

  1. Basic Knowledge of Galaxy Tools, or attendance at the Introduction to Writing Galaxy Tools and Publishing in Galaxy Tool Shed session.
 

The Galaxy Database Schema

Dave Clements, Johns Hopkins University
Nitesh Turaga, Johns Hopkins University

Tutorial

When running a production Galaxy server, you sometimes end up in a situation where you need to interact with the database manually, e.g. to change the state of a job to ‘error’. This is always a very risky adventure. Or, in a not-at-all risky situation, you may want to extract usage information that cannot be gathered using the given report tools. In both cases, you need a good understanding of the Galaxy database schema.

Learn some of the design concepts of the database, which parts of the schema are stable, and which will be changing in the foreseeable future.
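The two situations described above (a read-only usage query and a risky manual state change) can be sketched with Python's built-in `sqlite3` module. The `job` table here is a heavily simplified, hypothetical stand-in with only three columns; the real Galaxy schema is far larger, and the table contents and helper names are made up for illustration.

```python
import sqlite3

# In-memory stand-in for a production database (an assumption for illustration).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE job (id INTEGER PRIMARY KEY, tool_id TEXT, state TEXT)")
conn.executemany(
    "INSERT INTO job (tool_id, state) VALUES (?, ?)",
    [("bwa", "ok"), ("bwa", "ok"), ("bwa", "error"), ("snpeff", "ok"), ("upload1", "running")],
)
conn.commit()


def jobs_per_tool(connection: sqlite3.Connection) -> dict:
    """Read-only usage report: number of jobs per tool."""
    rows = connection.execute(
        "SELECT tool_id, COUNT(*) FROM job GROUP BY tool_id ORDER BY tool_id"
    )
    return dict(rows.fetchall())


def mark_job_as_error(connection: sqlite3.Connection, job_id: int) -> int:
    """Risky manual fix: set one running job to 'error'.

    The update runs inside a transaction and is restricted to a single id
    and the expected current state, so it cannot touch other rows.
    Returns the number of rows changed.
    """
    with connection:  # commits on success, rolls back on exception
        cursor = connection.execute(
            "UPDATE job SET state = 'error' WHERE id = ? AND state = 'running'",
            (job_id,),
        )
    return cursor.rowcount


usage = jobs_per_tool(conn)
print(usage)  # {'bwa': 3, 'snpeff': 1, 'upload1': 1}
changed = mark_job_as_error(conn, 5)
print(changed)  # 1
```

Guarding the `UPDATE` with both the job id and its expected state is one way to limit the damage of a manual change; on a real server, a backup and a test against a copy of the database would come first.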

Prerequisites:

  1. Experience maintaining a production Galaxy server (recommended)
  2. Basic knowledge of relational databases and SQL statements
 

Galaxy Architecture

James Taylor, Johns Hopkins University
Nate Coraor, Penn State University

Want to know the big picture about what is going on inside Galaxy? This workshop will introduce participants to the high-level architecture of Galaxy internals, and to the project’s coding practices and standards.

Prerequisites:

  1. General knowledge of Galaxy, or attendance at the “Introduction to Galaxy” session.
  2. Knowledge of programming or a scripting language.
 

Feedback on the GCC2014 Training Day

  • All the instructors were amazing.
  • Have attended many training days/tutorials in my 15yr career. This was for me the most fruitful (and thus the best) so far.
  • Intense but great
  • I could definitely tell a lot of time and energy went into planning this training day/conference, and for that I want to say thank you.
  • Overall a great experience

Thanks to everyone who voted on the Training Day topic nominations.