
This year, GCC 2015 will feature something new and experimental: the first ever Data Wrangling Hackathon!

This event will precede the main conference and is being held in conjunction with the Developers (Coding) Hackathon. We are inviting Galaxy power end-users, as well as scientists and bioinformatics support researchers who design Galaxy tools and/or workflows, to join us for this inaugural Data Hack.

The main goals of this Hackathon are to generate “Best Practices” analysis workflows and tools, and to build pipelines that simplify common data manipulations. We also want to identify Galaxy usability issues and other bottlenecks from an end-user perspective and resolve them together with the participants of the Coding Hackathon. If the two Hackathons interact successfully, we hope to see the proposed changes turned directly into production code.

The results of the Hackathon will be annotated and published as shared solutions on the main public Galaxy instance (http://usegalaxy.org), with some promoted as Tutorials on the Galaxy Biostars forum. Successfully implemented tools and/or workflows will be published in GigaScience. Every participant involved in the publication, coder or analyst alike, will be a co-author.

To meet these goals, interaction with the Coding Hackathon is a must. The venues of both GCC2015 Hackathons will be physically close to facilitate communication and coordination between participants. We will also share meals.

What’s a Hackathon?

A hackathon is an intense event at which a group of individuals with different backgrounds and skills collaborate hands-on and face-to-face to develop a product. Here analysts and developers will be collaborating on working code or workflow Best Practices that are useful to the Galaxy community as a whole.

Hackathons gather people in a room where they can focus on the task at hand (designing and developing solutions), free from the distractions that are inevitable during normal working hours.

Hackathons are driven by participants. Participants propose and coalesce around a set of core goals early on the first day, and then spend the rest of their time working towards those goals. With two Hackathons working on partially common goals, there will of course be interaction between the different events. A shared Trello board will be opened about a month before the event to facilitate communication.

Who Should Attend?


This event is for you if you:

  • are a Galaxy power user or bioinformatics specialist interested in generating analysis and workflow “Best Practices”
  • are passionate about improving Galaxy’s usability in cooperation with developers
  • have 2 or 3 free days this July

Schedule

Schedule? At a hackathon?

Hackathons are self-organising events. Attempts to impose anything as detailed as an agenda should be met with resistance and humor. With that in mind, this schedule tries to provide the absolute minimum amount of structure needed to ensure that something actually gets done.

Both hackathons are “officially” scheduled for three days, starting on Saturday, 4 July. The last two “official” days overlap with the Training SunDay and Training Day. The exact location for the first two days is almost settled. It will have plenty of space (and outlets!) for individuals and groups to break off and work.

We keep putting “official” in quotes because we hope that activity will continue throughout the conference and afterward as well. If you have experience with hackathons, you know that follow-up is key. The project will do all it can to support follow-up efforts; in particular, those involved in GigaScience projects will work together to finalise the publication. We very much want to make sure that all of the efforts from both GCC 2015 hackathons make it into the Galaxy code base, Tutorials, and/or Best Practices.

Minimum Amount of Structure

Here’s a proposal for “the absolute minimum amount of structure needed to ensure that something actually gets done.”

Pre-Hackathon

One challenge with hackathons is getting things done in the short time the event itself lasts. To help with this, we’ll use a Trello board (see the GCC 2014 example; an updated board for GCC 2015 will be created before the event). The board helps organise both ideas and people into concrete projects and teams. The hackathon organisers will contribute actively to the board, both to seed ideas (though please add your own as well!) and to comment on proposed projects. We’ll use Trello throughout the event to keep track of project status and changes. New tutorials and workflows will transition to finished, published work headed for Galaxy Main, Galaxy Biostars, GalaxyProject.org (wiki), and/or the GigaScience publication. Completed coding projects and related documentation will transition to the regular Galaxy or CloudMan development Trello boards.

Not familiar with Trello yet? Don’t worry, we’ll be posting additional help for getting started on this web site soon. If you wish to get involved early, visit the GalaxyProject.org Trello Support wiki and Trello’s Tour.

Day 0: Saturday, 4 July

Welcome and participant introductions. Participants will have a few minutes each to say who they are and what they’re interested in hacking on. We’ll have brief task proposals, followed by some Brownian motion and coalescing into groups of folks who want to work on similar things. The rest of the day will be filled with hacking, lunch, hacking, dinner, more hacking.

Day 1: Sunday, 5 July

The GCC 2015 Training Days start today with Training SunDay.

Day 2: Monday, 6 July

This is the second GCC 2015 Training Day. Conference meals start today. We aren’t yet sure if there will be one place for participants to meet or if teams will be distributed around the venue.

There is probably no formal hackathon schedule for today other than a single evening meeting at which everyone presents their work so far. During the day, teams will meet according to their own agreements and continue work.

Finally, each team will email its final pre-conference team report for the day before midnight.

Post-Hackathon: GCC2015

Teams are encouraged to continue work and meet during breaks and other unscheduled time.

Post-GCC2015

Join us in IRC (irc.freenode.net #galaxyproject) and keep on hacking! Data Wranglers are strongly encouraged to bring their voice into the conversation. Not sure how to use IRC? See the GalaxyProject.org IRC Support wiki.

Costs

There will likely be no registration cost for the hackathon. Why? Please thank the sponsors. Repeatedly.

Participants will need to cover their own lodging costs and most of their own meal costs. We will do our best to keep caffeine flowing throughout the event. Details about lodging options will be included here soon.

The Johns Hopkins University Data Science Specialization Program offered on Coursera is the exclusive sponsor for both the GCC2015 Coding Hackathon and the Data Wrangling Hackathon.

This program includes a Genomic Data Science course covering

  • Basic computational, biological, and statistical skills for analyzing big genomic data
  • Foundational tools for successfully engaging in this rapidly changing field
  • How to go from unprocessed next generation sequencing data to meaningful biological results

and introduces Galaxy, Bioconductor, Python, the command line, and many other tools.

The series of nine MOOCs is now open for enrollment and free to anyone.

Twitter Hashtag

Please use #usegalaxy as the Twitter hashtag for the Hackathon (and anything else related to GCC2015).

Is this part of GCC2015?

Mostly yes, but there is a separate organising committee.