This repository was archived by the owner on Apr 25, 2025. It is now read-only.

adding fmt tutorial #353

Merged
thermokarst merged 3 commits into qiime2:fmt_cdiff from antgonza:fmt-tutorial on Oct 7, 2018


@antgonza
Member

Brief summary of the Pull Request, including any issues it may fix using the GitHub closing syntax:

https://help.github.com/articles/closing-issues-using-keywords/

Also, include any co-authors or contributors using the GitHub coauthor tag:

https://help.github.com/articles/creating-a-commit-with-multiple-authors/


Include any questions for reviewers, screenshots, sample outputs, etc.

@gregcaporaso gregcaporaso self-assigned this Sep 20, 2018
Member

@nbokulich nbokulich left a comment

Looks great, thanks @antgonza ! I have a bunch of minor changes requested.

Comment thread source/tutorials/fmt-cdiff.rst Outdated

.. note:: This guide assumes you have installed QIIME 2 using one of the procedures in the :doc:`install documents <../install/index>`.

In this tutorial you'll use QIIME 2 to perform an analysis
Member

this sentence is redundant with the next line

Member Author

thanks

Comment thread source/tutorials/fmt-cdiff.rst Outdated
Sample metadata
---------------

Before starting the analysis, explore the sample metadata to familiarize yourself with the samples used in this study. [ToDo: add metadata to Google spreadsheets?]. The following command will download the sample metadata as tab-separated text and save it in the file  `sample-metadata.tsv`. This  `sample-metadata.tsv` file is used throughout the rest of the tutorial.
Member

having a google spreadsheet could be useful for data exploration, but not necessary

Contributor

I think it would make sense, too, please see my comment below (related).

Member Author

Will add, do we have an "official" account for this?

Contributor

@thermokarst thermokarst Oct 4, 2018

Contributor

> Will add, do we have an "official" account for this?

Yep --- I shared the file with you at your gmail account earlier this morning.
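Once `sample-metadata.tsv` has been downloaded, a quick first look at the samples is to count how many fall under each value of a column. A minimal standard-library sketch, assuming the usual QIIME 2 metadata layout (a header row with the sample ID in the first column, optionally followed by a `#q2:types` row); `count_by_column` is a hypothetical helper and `animations_gradient` is used only as an example column name.

```python
import csv
from collections import Counter


def count_by_column(lines, column):
    """Count how many samples share each value of one metadata column.

    `lines` can be an open file handle or any iterable of tab-separated lines.
    Assumes the first non-empty row is the header (sample ID first) and that
    an optional "#q2:types" directive row may follow it.
    """
    rows = [row for row in csv.reader(lines, delimiter="\t") if row and row[0].strip()]
    header, body = rows[0], rows[1:]
    # Drop directive/comment rows such as "#q2:types" that follow the header.
    body = [row for row in body if not row[0].startswith("#")]
    idx = header.index(column)
    return Counter(row[idx] for row in body)


# Example usage (animations_gradient is only an example column):
# with open("sample-metadata.tsv", newline="") as fh:
#     print(count_by_column(fh, "animations_gradient"))
```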

Comment thread source/tutorials/fmt-cdiff.rst Outdated
Obtaining and importing data
----------------------------

Next, you’ll download the multiplexed reads. You will download three `fastq.gz` files, corresponding to the forward, reverse, and barcode (i.e., index) reads. These files contain a subset of the reads in the full data set generated for this study, which allows for the following commands to be run relatively quickly. If you are only planning to run through the commands presented here to get experience, you can use the 1% subsample data set so that the commands will run quickly. If you’re planning to work through the questions presented at the end of this document to gain more experience with QIIME analysis and data interpretation, you should use the 10% subsample data set so that the analysis results will be supported by more sequence data.
Member

> These files contain a subset of the reads

should specify what %

> you can use the 1% subsample data

where can that be obtained?

Member Author

good point, adding.
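The thread does not say how the 1% and 10% subsets were produced. If you make your own subset of barcoded paired-end data, one detail matters: the same reads must be kept in the forward, reverse, and barcode files, or the three files fall out of sync. A minimal standard-library sketch of that idea, assuming the three files list reads in the same order; all function names are hypothetical, and this is not necessarily how the tutorial's subsets were made.

```python
import gzip
import random
from itertools import islice


def read_records(handle):
    """Yield FASTQ records (lists of four lines) from a text iterator."""
    while True:
        record = list(islice(handle, 4))
        if not record:
            return
        if len(record) != 4:
            raise ValueError("truncated FASTQ record")
        yield record


def subsample_in_sync(handles, fraction, seed=42):
    """Yield tuples of records, keeping each read index with probability `fraction`.

    One random draw is made per read index, so the forward, reverse and
    barcode records for the same read are kept or dropped together.
    """
    rng = random.Random(seed)
    for records in zip(*(read_records(h) for h in handles)):
        if rng.random() < fraction:
            yield records


def subsample_files(in_paths, out_paths, fraction, seed=42):
    """Write a synchronized random subset of gzipped FASTQ files; return reads kept."""
    inputs = [gzip.open(p, "rt") for p in in_paths]
    outputs = [gzip.open(p, "wt") for p in out_paths]
    kept = 0
    try:
        for records in subsample_in_sync(inputs, fraction, seed):
            kept += 1
            for out, record in zip(outputs, records):
                out.writelines(record)
    finally:
        for h in inputs + outputs:
            h.close()
    return kept
```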

Comment thread source/tutorials/fmt-cdiff.rst Outdated
--output-path emp-paired-end-sequences.qza

.. tip::
Links are included to view and download precomputed QIIME 2 artifacts and visualizations created by commands in the documentation. For example, the command above created a pared ``emp-paired-end-sequences.qza`` file, and a corresponding precomputed file is linked above. You can view precomputed QIIME 2 artifacts and visualizations without needing to install additional software (e.g. QIIME 2).
Member

pared --> paired

Comment thread source/tutorials/fmt-cdiff.rst Outdated
Demultiplexing sequences
------------------------

To demultiplex sequences we need to know which barcode sequence is associated with each sample. This information is contained in the `sample metadata`_ file. You can run the following commands to demultiplex the sequences (the ``demux emp-paired`` command refers to the fact that these sequences are barcoded according to the `Earth Microbiome Project`_ protocol, and are pared-end reads). The ``demux.qza`` QIIME 2 artifact will contain the demultiplexed sequences. Additionally, we are passing the parameter ```--p-rev-comp-mapping-barcodes```, reverse complements the barcode sequences in the sample metadata prior to demultiplexing.
Member

pared --> paired

Member

reverse complements the barcode --> which reverse complements the barcode
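What `--p-rev-comp-mapping-barcodes` asks for can be illustrated in a few lines: each barcode from the sample metadata is reverse complemented before it is matched against the barcode reads. A sketch only; the `{sample_id: barcode}` layout and the helper names are hypothetical, and only A/C/G/T/N are handled.

```python
# Complement table for A/C/G/T/N in either case (N maps to N).
_COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence."""
    return seq.translate(_COMPLEMENT)[::-1]


def reverse_complement_barcodes(barcodes):
    """Reverse complement every barcode in a {sample_id: barcode} mapping."""
    return {sample: reverse_complement(bc) for sample, bc in barcodes.items()}


print(reverse_complement("ACGTTG"))  # → CAACGT
```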


In the next sections we'll begin to explore the taxonomic composition of the samples, and again relate that to sample metadata. The first step in this process is to assign taxonomy to the sequences in our ``FeatureData[Sequence]`` QIIME 2 artifact. We'll do that using a pre-trained Naive Bayes classifier and the ``q2-feature-classifier`` plugin. This classifier was trained on the Greengenes 13_8 99% OTUs, where the sequences have been trimmed to only include 250 bases from the region of the 16S that was sequenced in this analysis (the V4 region, bound by the 515F/806R primer pair). We'll apply this classifier to our sequences, and we can generate a visualization of the resulting mapping from sequence to taxonomy.

.. note:: Taxonomic classifiers perform best when they are trained based on your specific sample preparation and sequencing parameters, including the primers that were used for amplification and the length of your sequence reads. Therefore in general you should follow the instructions in :doc:`Training feature classifiers with q2-feature-classifier <../tutorials/feature-classifier>` to train your own taxonomic classifiers. We provide some common classifiers on our :doc:`data resources page <../data-resources>`, including Silva-based 16S classifiers, though in the future we may stop providing these in favor of having users train their own classifiers which will be most relevant to their sequence data.
Member

Did Greg mention the part about not providing these files in the future? Unless Greg wants that there, I would recommend removing it.

Member Author

wasn't aware, @gregcaporaso ?

Member

oh nm I see now that you grabbed this from the MP tutorial, @antgonza

still, @gregcaporaso maybe we should reassess whether we want to leave that statement in there.

Member Author

adding to list

Comment thread source/tutorials/fmt-cdiff.rst Outdated
--o-visualization taxonomy.qzv

.. question::
Recall that our ``rep-seqs.qzv`` visualization allows you to easily BLAST the sequence associated with each feature against the NCBI nt database. Using that visualization and the ``taxonomy.qzv`` visualization created here, compare the taxonomic assignments with the taxonomy of the best BLAST hit for a few features. How similar are the assignments? If they're dissimilar, at what *taxonomic level* do they begin to differ (e.g., species, genus, family, ...)?
Member

You should note the issues with relying on NCBI BLAST taxonomy, and explain why the incomplete taxonomic assignments provided by the classifier are more reliable for short sequence reads.

Member Author

Could you add it? It sounds like you are more familiar with it, especially after your reply on GitHub.

Member

Maybe just remove this question. It has caused some confusion with folks on the forum who think that that visualization takes the place of taxonomy classification.

Member Author

done

Comment thread source/tutorials/fmt-cdiff.rst Outdated
--o-visualization taxa-bar-plots.qzv

.. question::
Visualize the samples at *Level 2* (which corresponds to the phylum level in this analysis), and then sort the samples by animations_gradient, and then by animations_gradient. What are the dominant phyla in before and after the FMT?
Member

> then sort the samples by animations_gradient, and then by animations_gradient

did you mean to use a different column for the second sort?

Member Author

fixed
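The *Level 2* view in the taxa bar plot amounts to summing feature counts over taxonomy strings truncated to their first two ranks, which for Greengenes-style strings is the phylum. A simplified sketch for one sample, assuming semicolon-delimited taxonomy strings held in plain dictionaries; the helper names are hypothetical and this is an illustration, not the code `taxa barplot` runs.

```python
from collections import defaultdict


def collapse_to_level(feature_counts, taxonomy, level):
    """Sum per-feature counts into taxa truncated at `level` (1 = kingdom, 2 = phylum).

    feature_counts: {feature_id: count} for one sample.
    taxonomy: {feature_id: "k__...; p__...; c__..."} semicolon-delimited strings.
    Strings with fewer ranks than `level` are kept at whatever depth they have.
    """
    collapsed = defaultdict(int)
    for feature, count in feature_counts.items():
        ranks = [r.strip() for r in taxonomy[feature].split(";")]
        collapsed[";".join(ranks[:level])] += count
    return dict(collapsed)


def relative_abundance(collapsed):
    """Convert collapsed counts to fractions of the sample total."""
    total = sum(collapsed.values())
    return {taxon: count / total for taxon, count in collapsed.items()}


counts = {"f1": 3, "f2": 1, "f3": 4}
taxonomy = {
    "f1": "k__Bacteria; p__Firmicutes; c__Clostridia",
    "f2": "k__Bacteria; p__Firmicutes; c__Bacilli",
    "f3": "k__Bacteria; p__Bacteroidetes",
}
print(relative_abundance(collapse_to_level(counts, taxonomy, 2)))
```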

Comment thread source/tutorials/fmt-cdiff.rst Outdated
ANCOM can be applied to identify features that are differentially abundant (i.e. present in different abundances) across sample groups. As with any bioinformatics method, you should be aware of the assumptions and limitations of ANCOM before using it. We recommend reviewing the `ANCOM paper`_ before using this method.

.. note::
Differential abundance testing in microbiome analysis is an active area of research. There are two QIIME 2 plugins that can be used for this: ``q2-gneiss`` and ``q2-composition``. This section uses ``q2-composition``, but there is :doc:`another tutorial which uses gneiss <gneiss>` on a different dataset if you are interested in learning more.
Member

another tutorial which uses gneiss --> a separate gneiss tutorial

Member Author

k
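ANCOM works on log-ratios between features rather than on raw counts, and zeros have to be replaced first (in QIIME 2 this is typically done with `qiime composition add-pseudocount` before `qiime composition ancom`). The sketch shows only those two ingredients for a single sample: a pseudocount of 1 and the log-ratio for every feature pair. ANCOM then tests each ratio across sample groups and, for each feature, counts how many of its ratios differ significantly (the W statistic); that testing step is omitted. Helper names and data are hypothetical.

```python
import math
from itertools import combinations


def add_pseudocount(counts, pseudocount=1):
    """Add a pseudocount so zero counts can be log-transformed."""
    return {feature: c + pseudocount for feature, c in counts.items()}


def pairwise_log_ratios(counts):
    """Return log(x_a / x_b) for every unordered pair of features in one sample."""
    return {(a, b): math.log(counts[a] / counts[b])
            for a, b in combinations(sorted(counts), 2)}


sample = add_pseudocount({"f1": 9, "f2": 0, "f3": 2})
ratios = pairwise_log_ratios(sample)
```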

.. question::
Which genera differ in abundance across Subject? In which subject is each genus more abundant?


Member

as a conclusion, maybe link to the last section in the overview tutorial, which gives an overview of other plugins that can be used downstream

Member Author

k

Contributor

@thermokarst thermokarst left a comment

Hey @antgonza, this is really exciting! I have a handful of questions, comments, and suggestions, inline. Please let me know if you have questions, need clarification, or need a hand with anything. Thanks!

Comment thread README.md Outdated
1. Install the dependencies necessary to build the docs:
1. Install the QIIME 2 framework. Typically you'll want the latest development version (i.e. master branch). You can find the [latest builds for your operating system here](https://github.com/qiime2/environment-files/tree/master/latest/staging). Make sure you get the raw link for your OS (e.g. `linux-64` or `osx-64`), download that file and then you can use `conda env create -n qiime2-docs --file path-to-yml` to install.

2. Install the dependencies necessary to build the docs, you need to run this command within the `qiime2-docs` folder:
Contributor

qiime2-docs is probably too specific - I know it isn't called that on my computers! How about we either stick with docs (the name of the repo), or we could reword to something like "you need to run this command within the QIIME 2 User Docs directory:"

Member Author

Changing. Note that the repo name, just docs, is kind of broad, so I named my local copy qiime2-docs and forgot that I did ...

Comment thread source/tutorials/fmt-cdiff.rst Outdated
Before starting the analysis, explore the sample metadata to familiarize yourself with the samples used in this study. [ToDo: add metadata to Google spreadsheets?]. The following command will download the sample metadata as tab-separated text and save it in the file  `sample-metadata.tsv`. This  `sample-metadata.tsv` file is used throughout the rest of the tutorial.

.. download::
:url: https://drive.google.com/open?id=1X58tm2G1FcUazL5jnAEE08pIfJP182yG
Contributor

This URL isn't wgetable, since it links to the Drive preview (this ultimately saves a bunch of HTML and JS to a tsv file, which obviously doesn't work).

Related to @nbokulich's comment above - if we host this as a Google Spreadsheet, we can get it both ways. If you are fine with this @antgonza I can set this up, just let me know.
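Because a preview URL like this still downloads "successfully", a cheap guard after fetching is to inspect the first bytes: an HTML page starts with a doctype or `<html>` tag, whereas a gzipped FASTQ starts with the gzip magic bytes `1f 8b`. A heuristic sketch only; the function names are hypothetical.

```python
def looks_like_html(first_bytes):
    """Heuristic: does the start of a download look like an HTML page?"""
    head = first_bytes.lstrip().lower()
    return head.startswith(b"<!doctype html") or head.startswith(b"<html")


def is_gzip(first_bytes):
    """True if the data starts with the gzip magic number 0x1f 0x8b."""
    return first_bytes[:2] == b"\x1f\x8b"


def check_download(path, n=512):
    """Raise if `path` appears to contain an HTML page instead of data."""
    with open(path, "rb") as handle:
        if looks_like_html(handle.read(n)):
            raise ValueError(f"{path} looks like HTML; the URL probably points "
                             "to a preview page rather than the raw file")
```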

Contributor

@thermokarst thermokarst Oct 4, 2018

Comment thread source/tutorials/fmt-cdiff.rst Outdated
mkdir emp-paired-end-sequences

.. download::
:url: https://drive.google.com/open?id=1OE9KIrSqi4lFI5lFURdvKbJxqaKTUnZ0
Contributor

Same issue as above - I can rehost these data on data.qiime2.org, just let me know if you are okay with that.

Contributor

@thermokarst thermokarst Oct 4, 2018

Comment thread source/tutorials/fmt-cdiff.rst Outdated
:saveas: emp-paired-end-sequences/barcodes.fastq.gz

.. download::
:url: https://drive.google.com/open?id=1gwVANNI0qh4sT1a7Z4G7wr3dXTTA1SR_
Contributor

Same issue as above - I can rehost these data on data.qiime2.org, just let me know if you are okay with that.

Member Author

That will be great, I'll PM you ...

Contributor

@thermokarst thermokarst Oct 4, 2018

--m-metadata-file sample-metadata.tsv \
--o-visualization core-metrics-results/faith-pd-group-significance.qzv

qiime diversity alpha-group-significance \
Contributor

This command fails for me:

qiime diversity alpha-group-significance \
  --i-alpha-diversity core-metrics-results/evenness_vector.qza \
  --m-metadata-file sample-metadata.tsv \
  --o-visualization core-metrics-results/evenness-group-significance.qzv \
  --verbose
Traceback (most recent call last):
  File "/Users/matthew/.conda/envs/q2dev/lib/python3.5/site-packages/q2cli/commands.py", line 274, in __call__
    results = action(**arguments)
  File "<decorator-gen-379>", line 2, in alpha_group_significance
  File "/Users/matthew/.conda/envs/q2dev/lib/python3.5/site-packages/qiime2/sdk/action.py", line 231, in bound_callable
    output_types, provenance)
  File "/Users/matthew/.conda/envs/q2dev/lib/python3.5/site-packages/qiime2/sdk/action.py", line 424, in _callable_executor_
    ret_val = self._callable(output_dir=temp_dir, **view_args)
  File "/Users/matthew/.conda/envs/q2dev/lib/python3.5/site-packages/q2_diversity/_alpha/_visualizer.py", line 101, in alpha_group_significance
    groups[j])
  File "/Users/matthew/.conda/envs/q2dev/lib/python3.5/site-packages/scipy/stats/mstats_basic.py", line 1058, in kruskal
    T = 1. - np.sum(v*(k**3-k) for (k,v) in iteritems(ties))/float(ntot**3-ntot)
ZeroDivisionError: float division by zero

Plugin error from diversity:

  float division by zero

See above for debug info.

We might need to open up an issue on q2-diversity for this, too.
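For reference, the failing line of the traceback computes the Kruskal-Wallis tie correction T = 1 - sum(v * (k^3 - k)) / (N^3 - N), where N is the number of observations and v is the number of distinct values tied k times. The denominator is zero whenever N is 0 or 1, which is one situation that produces exactly this `float division by zero`; whether that is what happens with this table is not established in the thread. A pure-Python sketch mirroring that line (not SciPy's code; the helper name is hypothetical):

```python
from collections import Counter


def tie_correction(values):
    """Kruskal-Wallis tie-correction factor, mirroring the traceback's formula.

    T = 1 - sum(v * (k**3 - k)) / (N**3 - N), where N = len(values) and v is
    the number of distinct values that are tied exactly k times (k > 1).
    """
    n = len(values)
    # Map tie size k -> number of distinct values tied exactly k times.
    ties = Counter(k for k in Counter(values).values() if k > 1)
    numerator = sum(v * (k**3 - k) for k, v in ties.items())
    # For N <= 1 the denominator is 0.0 and Python raises
    # ZeroDivisionError("float division by zero"), as in the traceback.
    return 1.0 - numerator / float(n**3 - n)


print(tie_correction([1.0, 2.0, 3.0]))  # → 1.0 (no ties)
print(tie_correction([2.0, 2.0, 2.0]))  # → 0.0 (every value tied)
```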

Member Author

Yup, my theory is that it's due to the singletons generated by DADA2; we could try to remove them ... suggestions?

Member Author

Adding to list ...
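If the singleton theory is tested, dropping low-frequency features is simple on an in-memory table. The sketch treats a feature table as a hypothetical `{feature_id: {sample_id: count}}` dictionary and keeps only features whose total count across samples reaches a threshold; on real artifacts the analogous QIIME 2 step would be `qiime feature-table filter-features` with `--p-min-frequency`. Whether this avoids the error above is untested.

```python
def filter_low_frequency(table, min_frequency=2):
    """Drop features whose total count across all samples is below min_frequency.

    table: {feature_id: {sample_id: count}} (hypothetical in-memory layout).
    With the default of 2, features observed only once overall (singletons)
    are removed.
    """
    return {feature: counts for feature, counts in table.items()
            if sum(counts.values()) >= min_frequency}


table = {
    "f1": {"S1": 10, "S2": 3},
    "f2": {"S1": 1, "S2": 0},  # singleton: one read in total
    "f3": {"S1": 0, "S2": 7},
}
print(sorted(filter_low_frequency(table)))  # → ['f1', 'f3']
```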

Comment thread source/tutorials/fmt-cdiff.rst Outdated
qiime diversity alpha-rarefaction \
--i-table table.qza \
--i-phylogeny rooted-tree.qza \
--p-max-depth 4000 \
Contributor

Why only to 4000? Maybe it makes sense to go all the way for this tutorial, that way users can see everything.

Member Author

K
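Going "all the way" means setting `--p-max-depth` to the largest per-sample total frequency, which the feature table summary reports. A sketch that derives it from made-up per-sample totals and lays out evenly spaced depths in the spirit of the plugin's `--p-steps` parameter; exactly how the plugin spaces and rounds its depths is an assumption here, and the helper name is hypothetical.

```python
def rarefaction_depths(sample_totals, min_depth=1, steps=10):
    """Evenly spaced integer depths from min_depth up to the deepest sample.

    sample_totals: {sample_id: total sequence count}. Using the largest total
    as the max depth lets the curves run as far as any sample allows.
    """
    max_depth = max(sample_totals.values())
    if steps < 2:
        return [max_depth]
    span = max_depth - min_depth
    return [min_depth + (span * i) // (steps - 1) for i in range(steps)]


totals = {"S1": 4000, "S2": 10000, "S3": 7600}  # made-up numbers
print(rarefaction_depths(totals))
```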

Comment thread source/tutorials/fmt-cdiff.rst Outdated
.. question::
What differences do you observe between the unweighted UniFrac and Bray-Curtis PCoA plots?

Alpha rarefaction plotting
Contributor

It might be cool to move this section earlier, since many folks will consider this plot when picking a rarefaction depth.
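When this plot is used to pick a rarefaction depth, the trade-off is that rarefying to depth d discards every sample with fewer than d sequences. A companion check is how many samples survive each candidate depth; a sketch with made-up per-sample totals and a hypothetical helper name.

```python
def samples_retained(sample_totals, depth):
    """Sample IDs that still have at least `depth` sequences."""
    return sorted(s for s, total in sample_totals.items() if total >= depth)


totals = {"S1": 4000, "S2": 10000, "S3": 7600}  # made-up numbers
for depth in (1000, 5000, 9000):
    kept = samples_retained(totals, depth)
    print(depth, len(kept), kept)
```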


.. _`fmt cdiff taxonomy`:

Taxonomic analysis
Contributor

Prior to taxonomic analysis, I wonder if this tutorial could include an example of filtering, as well as a section including q2-longitudinal and/or q2-sample-classifier.

Member Author

Sounds like a good plan ... perhaps a good example would be to filter out the healthy samples and just do ANCOM on pre/post FMT.

Contributor

Oh nice, I like that!
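On the artifacts themselves, this filtering would normally be done with `qiime feature-table filter-samples`, which accepts the metadata file and an SQLite-style `--p-where` expression. The selection logic is easy to prototype on the metadata first; in this sketch the column name `sample_type` and the value `healthy` are hypothetical placeholders, since the thread does not name the column that marks the healthy samples.

```python
import csv


def sample_ids_excluding(lines, column, value):
    """Return sample IDs whose `column` does not equal `value`.

    `lines` is an open TSV handle or a list of tab-separated lines; the first
    row is the header with the sample ID first, and rows starting with "#"
    (such as "#q2:types") after it are skipped.
    """
    reader = csv.reader(lines, delimiter="\t")
    header = next(reader)
    idx = header.index(column)
    return [row[0] for row in reader
            if row and not row[0].startswith("#") and row[idx] != value]


# Example usage (hypothetical column and value):
# with open("sample-metadata.tsv", newline="") as fh:
#     keep = sample_ids_excluding(fh, "sample_type", "healthy")
```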


.. _`fmt cdiff ancom`:

Differential abundance testing with ANCOM
Contributor

Maybe we should skip ANCOM and use q2-gneiss, instead?

Member

q2-gneiss may be too much to show in this tutorial! gneiss is complicated enough / has enough steps that it really needs its own tutorial.

Contributor

@nbokulich, I think that is a fair point, but, is ANCOM any "easier" to explain? I think the beauty of docs like this is that you can cross-reference topics.

Member Author

I agree with @nbokulich

Contributor

@thermokarst thermokarst left a comment

Just set up data.qiime2.org links for your data resources --- see new URLs inline. Thanks!

Comment thread source/tutorials/fmt-cdiff.rst Outdated
:saveas: emp-paired-end-sequences/forward.fastq.gz

.. download::
:url: https://drive.google.com/open?id=1DIW8kQmZN0TI3xVOqwbXZkmo_9eK21jW
Contributor

@thermokarst thermokarst Oct 4, 2018


Member Author

@antgonza antgonza left a comment

Addressing most of the comments ...

FMT for recurrent Clostridium difficile infection Tutorial
==========================================================

.. note:: This guide assumes you have installed QIIME 2 using one of the procedures in the :doc:`install documents <../install/index>`.
Member Author

Interesting, not sure how to do that, or do you just reference it here? Either way, IMO, as in a class/workshop, duplication of concepts doesn't hurt ...

Comment thread source/tutorials/fmt-cdiff.rst Outdated
.. note:: This guide assumes you have installed QIIME 2 using one of the procedures in the :doc:`install documents <../install/index>`.

In this tutorial you'll use QIIME 2 to perform an analysis
In this tutorial you’ll use QIIME 2 to perform an analysis of fecal human microbiome samples looking at short- and long-term changes in patients with multiply recurrent Clostridium difficile infection that were refractory to antibiotic therapy and treated using fecal microbiota transplantation. A study based on these samples was originally published in [Weingarden et al. (2015)](https://www.ncbi.nlm.nih.gov/pubmed/25825673) and it has been used in [animations](https://www.youtube.com/watch?v=-FFDqhM4pks) and [meta-analyses](https://github.com/knightlab-analyses/qiita-paper). The data used in this tutorial were sequenced on an Illumina MiSeq using the [Earth Microbiome Project](http://earthmicrobiome.org/) hypervariable region 4 (V4) 16S rRNA sequencing protocol.
Member Author

good point ... had to read about this ... I think in the medical field there is a difference between multiple recurrent and recurrent, for example: https://www.ncbi.nlm.nih.gov/pubmed/3951243. I'd rather leave it as it was defined in the paper: multiple recurrent.

Comment thread source/tutorials/fmt-cdiff.rst Outdated
Sample metadata
---------------

Before starting the analysis, explore the sample metadata to familiarize yourself with the samples used in this study. [ToDo: add metadata to Google spreadsheets?]. The following command will download the sample metadata as tab-separated text and save it in the file  `sample-metadata.tsv`. This  `sample-metadata.tsv` file is used throughout the rest of the tutorial.
Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Will add, do we have an "official" account for this?


In the next sections we'll begin to explore the taxonomic composition of the samples, and again relate that to sample metadata. The first step in this process is to assign taxonomy to the sequences in our ``FeatureData[Sequence]`` QIIME 2 artifact. We'll do that using a pre-trained Naive Bayes classifier and the ``q2-feature-classifier`` plugin. This classifier was trained on the Greengenes 13_8 99% OTUs, where the sequences have been trimmed to only include 250 bases from the region of the 16S that was sequenced in this analysis (the V4 region, bound by the 515F/806R primer pair). We'll apply this classifier to our sequences, and we can generate a visualization of the resulting mapping from sequence to taxonomy.

.. note:: Taxonomic classifiers perform best when they are trained based on your specific sample preparation and sequencing parameters, including the primers that were used for amplification and the length of your sequence reads. Therefore in general you should follow the instructions in :doc:`Training feature classifiers with q2-feature-classifier <../tutorials/feature-classifier>` to train your own taxonomic classifiers. We provide some common classifiers on our :doc:`data resources page <../data-resources>`, including Silva-based 16S classifiers, though in the future we may stop providing these in favor of having users train their own classifiers which will be most relevant to their sequence data.
Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

wasn't aware, @gregcaporaso ?

Comment thread source/tutorials/fmt-cdiff.rst Outdated
--o-visualization taxonomy.qzv

.. question::
Recall that our ``rep-seqs.qzv`` visualization allows you to easily BLAST the sequence associated with each feature against the NCBI nt database. Using that visualization and the ``taxonomy.qzv`` visualization created here, compare the taxonomic assignments with the taxonomy of the best BLAST hit for a few features. How similar are the assignments? If they're dissimilar, at what *taxonomic level* do they begin to differ (e.g., species, genus, family, ...)?

Member Author

Could you add it? It sounds like you are more familiar with it, even more so after your reply on GitHub.

Comment thread source/tutorials/fmt-cdiff.rst Outdated
--o-visualization taxa-bar-plots.qzv

.. question::
Visualize the samples at *Level 2* (which corresponds to the phylum level in this analysis), and then sort the samples by animations_gradient, and then by animations_gradient. What are the dominant phyla in before and after the FMT?

Member Author

fixed
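Below is a minimal sketch of the command that produces ``taxa-bar-plots.qzv``. It assumes the feature table is ``table-deblur.qza`` (the deblur output shown further down this page) and the classification is ``taxonomy.qza``. The tutorial may instead use a filtered table.

```rst
.. command-block::

   qiime taxa barplot \
     --i-table table-deblur.qza \
     --i-taxonomy taxonomy.qza \
     --m-metadata-file sample-metadata.tsv \
     --o-visualization taxa-bar-plots.qzv
```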


.. _`fmt cdiff ancom`:

Differential abundance testing with ANCOM

Member Author

I agree with @nbokulich

Comment thread source/tutorials/fmt-cdiff.rst Outdated
ANCOM can be applied to identify features that are differentially abundant (i.e. present in different abundances) across sample groups. As with any bioinformatics method, you should be aware of the assumptions and limitations of ANCOM before using it. We recommend reviewing the `ANCOM paper`_ before using this method.

.. note::
Differential abundance testing in microbiome analysis is an active area of research. There are two QIIME 2 plugins that can be used for this: ``q2-gneiss`` and ``q2-composition``. This section uses ``q2-composition``, but there is :doc:`another tutorial which uses gneiss <gneiss>` on a different dataset if you are interested in learning more.

Member Author

k
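The sketch below shows how ``q2-composition`` is typically invoked for this step. ANCOM log-transforms the counts, so ``q2-composition ancom`` expects a ``FeatureTable[Composition]`` without zeros, which ``add-pseudocount`` produces. The input table name ``table-deblur.qza`` and the metadata column ``subject`` are assumptions, since the column name in ``sample-metadata.tsv`` is not shown here. Adjust both to match the tutorial.

```rst
.. command-block::

   qiime composition add-pseudocount \
     --i-table table-deblur.qza \
     --o-composition-table comp-table.qza

   qiime composition ancom \
     --i-table comp-table.qza \
     --m-metadata-file sample-metadata.tsv \
     --m-metadata-column subject \
     --o-visualization ancom-subject.qzv
```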

@antgonza antgonza mentioned this pull request Oct 5, 2018
8 tasks

Member

@nbokulich nbokulich left a comment

thanks @antgonza ! This is looking fantastic!

I just have a few minor formatting changes. I also could not get tests to pass with make preview DEBUG=tutorials/fmt-cdiff, so we will need to make sure this works before we merge.

At the end of the tutorial, maybe you should conclude by linking to the last section of the "grand overview" (I love the adjective you've introduced to describe that doc), which just discusses some additional downstream analysis options. See here

Comment thread source/tutorials/fmt-cdiff.rst Outdated

.. note:: This guide assumes you have installed QIIME 2 using one of the procedures in the :doc:`install documents <../install/index>`.

In this tutorial you’ll use QIIME 2 to perform an analysis of fecal human microbiome samples looking at short- and long-term changes in patients with multiple recurrent Clostridium difficile infection that were refractory to antibiotic therapy and treated using fecal microbiota transplantation. A study based on these samples was originally published in [Weingarden et al. (2015)](https://www.ncbi.nlm.nih.gov/pubmed/25825673) and it has been used in [animations](https://www.youtube.com/watch?v=-FFDqhM4pks) and [meta-analyses](https://github.com/knightlab-analyses/qiita-paper). The data used in this tutorial were sequenced on an Illumina MiSeq using the [Earth Microbiome Project](http://earthmicrobiome.org/) hypervariable region 4 (V4) 16S rRNA sequencing protocol.

Member

looks like there are some links here that are in markdown. Pls convert to rst

Member Author

k
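For reference, here are the paragraph's Markdown links rewritten as reStructuredText inline hyperlinks (`` `text <URL>`_ ``). The wording and URLs are unchanged.

```rst
A study based on these samples was originally published in
`Weingarden et al. (2015) <https://www.ncbi.nlm.nih.gov/pubmed/25825673>`_
and it has been used in `animations <https://www.youtube.com/watch?v=-FFDqhM4pks>`_
and `meta-analyses <https://github.com/knightlab-analyses/qiita-paper>`_. The
data used in this tutorial were sequenced on an Illumina MiSeq using the
`Earth Microbiome Project <http://earthmicrobiome.org/>`_ hypervariable
region 4 (V4) 16S rRNA sequencing protocol.
```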

Comment thread source/tutorials/fmt-cdiff.rst Outdated
Obtaining and importing data
----------------------------

Next, you’ll download the multiplexed reads. You will download three `fastq.gz` files, corresponding to the forward, reverse, and barcode (i.e., index) reads. These files contain a subset of the reads (10%) in the full data set generated for this study, which allows for the following commands to be run relatively quickly. If you are only planning to run through the commands presented here to get experience, you can use the (1% subsample data set)[add_link] so that the commands will run quickly. If you’re planning to work through the questions presented at the end of this document to gain more experience with QIIME analysis and data interpretation, you should use the 10% subsample data set so that the analysis results will be supported by more sequence data.

Member

(1% subsample data set)[add_link]

Markdown will not render here. I would recommend just putting the downloads for the 10% and 1% subsample data sets together below; see the current FMT tutorial for an example.

Member Author

fixed

Comment thread source/tutorials/fmt-cdiff.rst Outdated
--o-table table-deblur.qza \
--o-stats deblur-stats.qza

.. note:: The deblur command used above generates QIIME 2 artifacts containing summary statistics. To view those summary statistics, you can visualize them using ``qiime metadata tabulate`` and ``qiime deblur visualize-stats``, respectively:

Member

It looks a bit weird that the command block below appears outside of this note. I'd recommend just making this main text (remove the .. note:: part) or put the commands inside the note (based on how you did it above, it looks like you just indent the command block to appear in the note)

Member Author

k
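One way to follow the second suggestion is sketched below. Every line of the command block is indented to the note's content column, so Sphinx treats it as part of the note. Only the ``qiime deblur visualize-stats`` call is shown, because the input to ``qiime metadata tabulate`` is not visible in this excerpt. The output name ``deblur-stats.qzv`` is an assumption.

```rst
.. note:: The deblur command used above generates QIIME 2 artifacts containing
   summary statistics. To view those summary statistics, you can visualize them
   using ``qiime metadata tabulate`` and ``qiime deblur visualize-stats``,
   respectively:

   .. command-block::

      qiime deblur visualize-stats \
        --i-deblur-stats deblur-stats.qza \
        --o-visualization deblur-stats.qzv
```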

Comment thread source/tutorials/fmt-cdiff.rst Outdated
.. question::
Based on the plots you see in ``demux.qzv``, what values would you choose for ``--p-trunc-len`` and ``--p-trim-left`` in this case?

In the ``demux.qzv`` quality plots, we see that the quality of the initial bases seems to be high, so we won't trim any bases from the beginning of the sequences. The quality seems to drop off around position 150, so we'll truncate our sequences at 120 bases. This next command may take up to 10 minutes to run, and is the slowest step in this tutorial.

Member

so we'll truncate our sequences at 120

do you mean 150? That is what is used below and consistent with the previous clause

Member Author

thanks for catching

Comment thread source/tutorials/fmt-cdiff.rst Outdated

.. command-block::

qiime fragment-insertion filter-features \

Member

the indentation is incorrect in this command block. It is causing these commands to display incorrectly, and is causing the next section of text to appear in the command block!

Member Author

good catch!
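The sketch below shows the indentation the review asks for:

- a blank line after the directive;
- every command line indented by the same three spaces;
- a blank line followed by unindented text to end the block.

The argument names follow the q2-fragment-insertion ``filter-features`` interface as we understand it. The file names other than ``table-deblur.qza`` are hypothetical, so check them against ``qiime fragment-insertion filter-features --help``.

```rst
.. command-block::

   qiime fragment-insertion filter-features \
     --i-table table-deblur.qza \
     --i-tree insertion-tree.qza \
     --o-filtered-table filtered-table-deblur.qza \
     --o-removed-table removed-table.qza

Unindented text after a blank line is no longer part of the command block.
```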

Comment thread source/tutorials/fmt-cdiff.rst Outdated
------------------

In the next sections we'll begin to explore the taxonomic composition of the samples, and again relate that to sample metadata. The first step in this process is to assign taxonomy to the sequences in our ``FeatureData[Sequence]`` QIIME 2 artifact. We'll do that using a pre-trained Naive Bayes classifier and the ``q2-feature-classifier`` plugin. This classifier was trained on the Greengenes 13_8 99% OTUs, where the sequences have been trimmed to only include 250 bases from the region of the 16S that was sequenced in this analysis (the V4 region, bound by the 515F/806R primer pair). We'll apply this classifier to our sequences, and we can generate a visualization of the resulting mapping from sequence to taxonomy.
In the next sections we'll begin to explore the taxonomic composition of the samples, and again relate that to sample metadata. The first step in this process is to assign taxonomy to the sequences in our ``FeatureData[Sequence]`` QIIME 2 artifact. We'll do that using a pre-trained Naive Bayes classifier and the ``q2-feature-classifier`` plugin. This classifier was trained on the Greengenes 13_8 99% OTUs, where the sequences have been trimmed to only include 250 bases from the region of the 16S that was sequenced in this analysis (the V4 region, bound by the 515F/806R primer pair). We'll apply this classifier to our sequences, and we can generate a visualization of the resulting mapping from sequence to taxonomy. You can read more about this in the :doc:`grand overview <overview>`.

Member

pls point to the correct section in the grand overview (feel free to add a ref point if that's needed)

Member Author

k
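Below is a sketch of the reviewer's suggestion, assuming a new label is added to ``overview.rst``. The label ``overview-taxonomy`` and the heading text are hypothetical. A ``:ref:`` role links to the labelled section, whereas ``:doc:`` links to the top of the document. The label uses the same target syntax as ``.. _`fmt cdiff ancom`:`` in this tutorial.

```rst
.. In overview.rst, place a label directly above the relevant heading:

.. _overview-taxonomy:

Taxonomic analysis
------------------

.. In fmt-cdiff.rst, link to the label instead of the whole document:

You can read more about this in the
:ref:`taxonomy section of the grand overview <overview-taxonomy>`.
```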

@antgonza antgonza changed the base branch from master to fmt_cdiff October 5, 2018 19:15

Member Author

@antgonza antgonza left a comment

I'm gonna check the issue with make preview DEBUG=tutorials/fmt-cdiff and then I'll push the changes. Note that I did not introduce the term "grand"; I borrowed it from another tutorial.

--m-metadata-file sample-metadata.tsv \
--o-visualization core-metrics-results/faith-pd-group-significance.qzv

qiime diversity alpha-group-significance \

Member Author

Adding to list ...

qiime emperor plot \
--i-pcoa core-metrics-results/unweighted_unifrac_pcoa_results.qza \
--m-metadata-file sample-metadata.tsv \
--p-custom-axes animations_gradient \

Member Author

added to list

Comment thread source/tutorials/fmt-cdiff.rst Outdated
------------------

In the next sections we'll begin to explore the taxonomic composition of the samples, and again relate that to sample metadata. The first step in this process is to assign taxonomy to the sequences in our ``FeatureData[Sequence]`` QIIME 2 artifact. We'll do that using a pre-trained Naive Bayes classifier and the ``q2-feature-classifier`` plugin. This classifier was trained on the Greengenes 13_8 99% OTUs, where the sequences have been trimmed to only include 250 bases from the region of the 16S that was sequenced in this analysis (the V4 region, bound by the 515F/806R primer pair). We'll apply this classifier to our sequences, and we can generate a visualization of the resulting mapping from sequence to taxonomy.
In the next sections we'll begin to explore the taxonomic composition of the samples, and again relate that to sample metadata. The first step in this process is to assign taxonomy to the sequences in our ``FeatureData[Sequence]`` QIIME 2 artifact. We'll do that using a pre-trained Naive Bayes classifier and the ``q2-feature-classifier`` plugin. This classifier was trained on the Greengenes 13_8 99% OTUs, where the sequences have been trimmed to only include 250 bases from the region of the 16S that was sequenced in this analysis (the V4 region, bound by the 515F/806R primer pair). We'll apply this classifier to our sequences, and we can generate a visualization of the resulting mapping from sequence to taxonomy. You can read more about this in the :doc:`grand overview <overview>`.

Member Author

k


In the next sections we'll begin to explore the taxonomic composition of the samples, and again relate that to sample metadata. The first step in this process is to assign taxonomy to the sequences in our ``FeatureData[Sequence]`` QIIME 2 artifact. We'll do that using a pre-trained Naive Bayes classifier and the ``q2-feature-classifier`` plugin. This classifier was trained on the Greengenes 13_8 99% OTUs, where the sequences have been trimmed to only include 250 bases from the region of the 16S that was sequenced in this analysis (the V4 region, bound by the 515F/806R primer pair). We'll apply this classifier to our sequences, and we can generate a visualization of the resulting mapping from sequence to taxonomy.

.. note:: Taxonomic classifiers perform best when they are trained based on your specific sample preparation and sequencing parameters, including the primers that were used for amplification and the length of your sequence reads. Therefore in general you should follow the instructions in :doc:`Training feature classifiers with q2-feature-classifier <../tutorials/feature-classifier>` to train your own taxonomic classifiers. We provide some common classifiers on our :doc:`data resources page <../data-resources>`, including Silva-based 16S classifiers, though in the future we may stop providing these in favor of having users train their own classifiers which will be most relevant to their sequence data.

Member Author

adding to list

Comment thread source/tutorials/fmt-cdiff.rst Outdated
--o-visualization taxonomy.qzv

.. question::
Recall that our ``rep-seqs.qzv`` visualization allows you to easily BLAST the sequence associated with each feature against the NCBI nt database. Using that visualization and the ``taxonomy.qzv`` visualization created here, compare the taxonomic assignments with the taxonomy of the best BLAST hit for a few features. How similar are the assignments? If they're dissimilar, at what *taxonomic level* do they begin to differ (e.g., species, genus, family, ...)?

Member Author

done

.. question::
Which genera differ in abundance across Subject? In which subject is each genus more abundant?


Member Author

k

@antgonza
Member Author

antgonza commented Oct 7, 2018

make preview DEBUG=tutorials/fmt-cdiff should work fine; note that it takes over an hour ...

@thermokarst thermokarst merged commit 96f9f9b into qiime2:fmt_cdiff Oct 7, 2018
@antgonza antgonza mentioned this pull request Jun 21, 2019
Labels

None yet

Projects

None yet

4 participants