Software and data required for the course

Important

  • The course instructions assume you are using a Linux environment.

  • If you’re using a Mac or Windows computer, you might find it easier to set up a Linux virtual machine using software like VirtualBox. (Instructions below.)

  • However, all of the software used should also be installable on Mac or Windows computers.

Assumed file system structure

All of the practical sessions refer to data stored under a root directory called /course. If you’re using a Virtual Machine, you can simply create this directory (sudo mkdir /course) and put the data there. If you’re using your own computer and put the data elsewhere, such as in your home folder, you’ll need to adjust the paths in the course instructions accordingly.
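For the own-computer case, one way to keep track of the alternative location is a shell variable. This is only a sketch: COURSE_ROOT is a hypothetical name chosen for illustration, not something the course materials define, and the default of $HOME/course is an assumption.

```shell
# COURSE_ROOT is a hypothetical variable name (not defined by the course materials).
# It defaults to a folder in your home directory if you have not set it yourself.
COURSE_ROOT="${COURSE_ROOT:-$HOME/course}"

# Create the directory if it does not exist yet, then move into it.
mkdir -p "$COURSE_ROOT"
cd "$COURSE_ROOT"

echo "Course data directory: $(pwd)"
```

Wherever the instructions mention /course, you would then substitute "$COURSE_ROOT". On a Virtual Machine, note that a directory made with sudo mkdir is owned by root, so you may need something like sudo chown "$USER" /course before you can write to it as a normal user.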

Setting up a Linux virtual machine for the course

Installing software for the course

Docker

Docker allows you to run “containers”: reproducible builds of certain tools. Install Docker Desktop (or alternatives like Podman).

Anaconda

Conda allows you to create “environments”: sets of tools and libraries that depend on each other. Install the Anaconda distribution.

Sirius

Sirius is a tool for analysing metabolite data. Install Sirius 4.

MZmine

MZmine is a tool for processing mass-spectrometry data. Install MZmine 3.

Gemma

Gemma is a tool for working with genome-wide association studies. Install Gemma 0.98.3.

Bedtools

Bedtools is a set of tools for genomic analysis. Install Bedtools 2.30.0.
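After installing, a quick check of which command-line tools are visible on your PATH can save time later. The sketch below assumes the executables are called docker, conda, git, gemma and bedtools; those names are assumptions and may differ depending on how each tool was installed. Sirius and MZmine are left out because their launcher names vary by platform. check_tools is a hypothetical helper name.

```shell
# Hypothetical helper: report which of the named commands are on PATH.
# Succeeds only if every command was found.
check_tools() {
  missing=0
  for tool in "$@"; do
    if command -v "$tool" >/dev/null 2>&1; then
      echo "found:   $tool"
    else
      echo "missing: $tool"
      missing=$((missing + 1))
    fi
  done
  [ "$missing" -eq 0 ]
}

# Executable names below are assumptions; adjust them to match your installation.
check_tools docker conda git gemma bedtools || echo "Some tools are not on PATH yet."
```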

Dependencies

cd /course (assuming you are using a Virtual Machine, see notes above)

This fetches the course notes, some code notebooks, and various dependencies and datasets: git clone https://github.com/ebi-metagenomics/holofood-course.git docs

This creates Conda environments with the dependencies required for the practical sessions: cd docs/sessions/Metabolomics/

conda env create -f Metabolomics.yml

cd /course/docs/sessions/metagenomics/notebooks/

conda create --name jupyter -c conda-forge jupyterlab

conda activate jupyter

pip install -r requirements.txt

conda create --name r --channel conda-forge "r-base>=4.0.3" r-devtools

conda activate r

conda install -c conda-forge r-reshape2 r-ggplot2
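Once the commands above have run, it can be worth confirming that the named environments exist. The following is a sketch that checks for the jupyter and r environments created above by reading the output of conda env list. env_in_list is a hypothetical helper, and the check assumes the environment name is the first column of each line of that output.

```shell
# Hypothetical helper: reads `conda env list` output on stdin and succeeds
# if a line's first column equals the given environment name.
env_in_list() {
  awk -v name="$1" '$1 == name { found = 1 } END { exit !found }'
}

if command -v conda >/dev/null 2>&1; then
  for name in jupyter r; do
    if conda env list | env_in_list "$name"; then
      echo "environment present: $name"
    else
      echo "environment missing: $name"
    fi
  done
else
  echo "conda is not on PATH"
fi
```

The environment created from Metabolomics.yml is not checked here, because its name is set inside that file.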

Copying data for the course

For the MAG generation practical

Download all of the data from this EBI-hosted FTP site.

Unzip any of the .tar.gz files, using e.g. tar -xzf eukaryotes.tar.gz.
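If there are several archives, a small loop avoids typing the tar command for each one. This is a sketch only: extract_all is a hypothetical helper name, and it assumes each archive should be unpacked into the directory that contains it.

```shell
# Hypothetical helper: extract every .tar.gz file found in the given
# directory (default: current directory) into that same directory.
extract_all() {
  dir="${1:-.}"
  for archive in "$dir"/*.tar.gz; do
    # Skip the unexpanded pattern when no archives are present.
    [ -e "$archive" ] || continue
    echo "Extracting $archive"
    tar -xzf "$archive" -C "$dir"
  done
}
```

For example, running extract_all . in the download directory would unpack eukaryotes.tar.gz along with any other .tar.gz files there.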

For the multi-kingdom metagenomics practical

wget http://ftp.ebi.ac.uk/pub/databases/metagenomics/mgnify_courses/biata_2021/virify_tutorial.tar.gz
or
rsync -av --partial --progress rsync://ftp.ebi.ac.uk/pub/databases/metagenomics/mgnify_courses/biata_2021/virify_tutorial.tar.gz .
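Large downloads are sometimes interrupted, so it may help to confirm that the archive is readable before going further. The sketch below lists the archive contents with tar and discards the listing. This catches truncated or corrupt files but is not a checksum comparison. archive_ok is a hypothetical helper name.

```shell
# Hypothetical helper: succeeds if tar can read the whole gzip-compressed archive.
archive_ok() {
  tar -tzf "$1" >/dev/null 2>&1
}

archive="virify_tutorial.tar.gz"
if [ ! -f "$archive" ]; then
  echo "$archive not found in $(pwd)"
elif archive_ok "$archive"; then
  echo "$archive looks complete"
else
  echo "$archive could not be read; try re-running the download"
fi
```

If the check fails after an rsync transfer, re-running the same rsync command with --partial should resume from where it stopped.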

Once downloaded, extract the files from the tarball:

tar -xzvf virify_tutorial.tar.gz

Now change into the virify_tutorial directory and set up the environment by running the following commands in your current terminal session:

cd virify_tutorial
docker load --input docker/virify.tar
docker run --rm -it -v "$(pwd)/data:/opt/data" virify
mkdir obs_results