adding fmt tutorial by antgonza · Pull Request #353 · qiime2/docs

antgonza · 2018-09-20T22:46:01Z

nbokulich

Looks great, thanks @antgonza ! I have a bunch of minor changes requested.

nbokulich · 2018-09-21T14:58:31Z

+
+.. note:: This guide assumes you have installed QIIME 2 using one of the procedures in the :doc:`install documents <../install/index>`.
+
+In this tutorial you'll use QIIME 2 to perform an analysis


this sentence is redundant with the next line

nbokulich · 2018-09-21T15:00:03Z

+Sample metadata
+---------------
+
+Before starting the analysis, explore the sample metadata to familiarize yourself with the samples used in this study. [ToDo: add metadata to Google spreadsheets?]. The following command will download the sample metadata as tab-separated text and save it in the file  `sample-metadata.tsv`. This  `sample-metadata.tsv` file is used throughout the rest of the tutorial.


having a google spreadsheet could be useful for data exploration, but not necessary

I think it would make sense, too, please see my comment below (related).

Will add, do we have an "official" account for this?

New URL:

~~https://data.qiime2.org/2018.11/tutorials/cdiff-fmt/sample_metadata~~
https://data.qiime2.org/2018.11/tutorials/fmt-cdiff/sample_metadata

Yep --- I shared the file with you at your gmail account earlier this morning.

nbokulich · 2018-09-21T15:02:00Z

+Obtaining and importing data
+----------------------------
+
+Next, you’ll download the multiplexed reads. You will download three `fastq.gz` files, corresponding to the forward, reverse, and barcode (i.e., index) reads. These files contain a subset of the reads in the full data set generated for this study, which allows for the following commands to be run relatively quickly. If you are only planning to run through the commands presented here to get experience, you can use the 1% subsample data set so that the commands will run quickly. If you’re planning to work through the questions presented at the end of this document to gain more experience with QIIME analysis and data interpretation, you should use the 10% subsample data set so that the analysis results will be supported by more sequence data.


These files contain a subset of the reads

should specify what %

you can use the 1% subsample data

where can that be obtained?

good point, adding.

nbokulich · 2018-09-21T15:03:19Z

+     --output-path emp-paired-end-sequences.qza
+
+.. tip::
+   Links are included to view and download precomputed QIIME 2 artifacts and visualizations created by commands in the documentation. For example, the command above created a pared ``emp-paired-end-sequences.qza`` file, and a corresponding precomputed file is linked above. You can view precomputed QIIME 2 artifacts and visualizations without needing to install additional software (e.g. QIIME 2).


pared --> paired

nbokulich · 2018-09-21T15:04:23Z

+Demultiplexing sequences
+------------------------
+
+To demultiplex sequences we need to know which barcode sequence is associated with each sample. This information is contained in the `sample metadata`_ file. You can run the following commands to demultiplex the sequences (the ``demux emp-paired`` command refers to the fact that these sequences are barcoded according to the `Earth Microbiome Project`_ protocol, and are pared-end reads). The ``demux.qza`` QIIME 2 artifact will contain the demultiplexed sequences. Additionally, we are passing the parameter ```--p-rev-comp-mapping-barcodes```, reverse complements the barcode sequences in the sample metadata prior to demultiplexing.


pared --> paired

reverse complements the barcode --> which reverse complements the barcode

nbokulich · 2018-09-21T15:49:10Z

+
+In the next sections we'll begin to explore the taxonomic composition of the samples, and again relate that to sample metadata. The first step in this process is to assign taxonomy to the sequences in our ``FeatureData[Sequence]`` QIIME 2 artifact. We'll do that using a pre-trained Naive Bayes classifier and the ``q2-feature-classifier`` plugin. This classifier was trained on the Greengenes 13_8 99% OTUs, where the sequences have been trimmed to only include 250 bases from the region of the 16S that was sequenced in this analysis (the V4 region, bound by the 515F/806R primer pair). We'll apply this classifier to our sequences, and we can generate a visualization of the resulting mapping from sequence to taxonomy.
+
+.. note:: Taxonomic classifiers perform best when they are trained based on your specific sample preparation and sequencing parameters, including the primers that were used for amplification and the length of your sequence reads. Therefore in general you should follow the instructions in :doc:`Training feature classifiers with q2-feature-classifier <../tutorials/feature-classifier>` to train your own taxonomic classifiers. We provide some common classifiers on our :doc:`data resources page <../data-resources>`, including Silva-based 16S classifiers, though in the future we may stop providing these in favor of having users train their own classifiers which will be most relevant to their sequence data.


Did Greg mention the part about not providing these files in the future? Unless Greg wants that in there, I would recommend removing it.

wasn't aware, @gregcaporaso ?

oh nm I see now that you grabbed this from the MP tutorial, @antgonza

still, @gregcaporaso maybe we should reassess whether we want to leave that statement in there.

adding to list

nbokulich · 2018-09-21T15:50:44Z

+     --o-visualization taxonomy.qzv
+
+.. question::
+    Recall that our ``rep-seqs.qzv`` visualization allows you to easily BLAST the sequence associated with each feature against the NCBI nt database. Using that visualization and the ``taxonomy.qzv`` visualization created here, compare the taxonomic assignments with the taxonomy of the best BLAST hit for a few features. How similar are the assignments? If they're dissimilar, at what *taxonomic level* do they begin to differ (e.g., species, genus, family, ...)?


You should note the issues with relying on NCBI BLAST taxonomy... and explain why the incomplete taxonomic assignments provided by the classifier are more reliable for short sequence reads

Could you add it? It sounds like you are more familiar with this, even more so after your reply on GitHub.

Maybe just remove this question. It has caused some confusion with folks on the forum who think that that visualization takes the place of taxonomy classification.

nbokulich · 2018-09-21T15:51:22Z

+     --o-visualization taxa-bar-plots.qzv
+
+.. question::
+    Visualize the samples at *Level 2* (which corresponds to the phylum level in this analysis), and then sort the samples by animations_gradient, and then by animations_gradient. What are the dominant phyla in before and after the FMT?


then sort the samples by animations_gradient, and then by animations_gradient

did you mean to use a different column for the second sort?

nbokulich · 2018-09-21T15:52:32Z

+ANCOM can be applied to identify features that are differentially abundant (i.e. present in different abundances) across sample groups. As with any bioinformatics method, you should be aware of the assumptions and limitations of ANCOM before using it. We recommend reviewing the `ANCOM paper`_ before using this method.
+
+.. note::
+   Differential abundance testing in microbiome analysis is an active area of research. There are two QIIME 2 plugins that can be used for this: ``q2-gneiss`` and ``q2-composition``. This section uses ``q2-composition``, but there is :doc:`another tutorial which uses gneiss <gneiss>` on a different dataset if you are interested in learning more.


another tutorial which uses gneiss --> a separate gneiss tutorial

nbokulich · 2018-09-21T15:54:28Z

+.. question::
+   Which genera differ in abundance across Subject? In which subject is each genus more abundant?
+
+


as a conclusion, maybe link to the last section in the overview tutorial, which gives an overview of other plugins that can be used downstream

thermokarst

Hey @antgonza, this is really exciting! I have a handful of questions, comments, and suggestions, inline. Please let me know if you have questions, need clarification, or need a hand with anything. Thanks!

thermokarst · 2018-09-28T15:04:33Z

-1. Install the dependencies necessary to build the docs:
+1. Install the QIIME 2 framework. Typically you'll want the latest development version (i.e. master branch). You can find the [latest builds for your operating system here](https://github.com/qiime2/environment-files/tree/master/latest/staging). Make sure you get the raw link for your OS (e.g. `linux-64` or `osx-64`), download that file and then you can use `conda env create -n qiime2-docs --file path-to-yml` to install.
+
+2. Install the dependencies necessary to build the docs, you need to run this command within the `qiime2-docs` folder:


qiime2-docs is probably too specific - I know it isn't called that on my computers! How about we either stick with docs (the name of the repo), or we could reword to something like "you need to run this command within the QIIME 2 User Docs directory:"

Changing. Note that the repo being named just docs is kind of broad, so I named my local copy qiime2-docs and forgot that I did ...

thermokarst · 2018-09-28T15:11:21Z

+Before starting the analysis, explore the sample metadata to familiarize yourself with the samples used in this study. [ToDo: add metadata to Google spreadsheets?]. The following command will download the sample metadata as tab-separated text and save it in the file  `sample-metadata.tsv`. This  `sample-metadata.tsv` file is used throughout the rest of the tutorial.
+
+.. download::
+   :url: https://drive.google.com/open?id=1X58tm2G1FcUazL5jnAEE08pIfJP182yG


This URL isn't wgetable, since it links to the Drive preview (this ultimately saves a bunch of HTML and JS to a tsv file, which obviously doesn't work).

Related to @nbokulich's comment above - if we host this as a Google Spreadsheet, we can have it both ways. If you are fine with this @antgonza I can set this up, just let me know.

New URL:

~~https://data.qiime2.org/2018.11/tutorials/cdiff-fmt/sample_metadata.tsv~~
https://data.qiime2.org/2018.11/tutorials/fmt-cdiff/sample_metadata.tsv
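
For anyone checking a download link like the ones discussed in this thread, here is a minimal sketch of a sanity check, assuming the failure mode described above (a preview page saved under a `.tsv` name). The function names and the sample header are hypothetical and are not part of the docs build or of QIIME 2.

```python
def looks_like_html(first_bytes: bytes) -> bool:
    """Return True if the start of a downloaded file looks like an HTML page."""
    head = first_bytes.lstrip().lower()
    return head.startswith(b"<!doctype html") or head.startswith(b"<html")


def looks_like_tsv_header(first_line: str) -> bool:
    """Return True if the first line has at least two tab-separated fields."""
    return len(first_line.rstrip("\n").split("\t")) >= 2


if __name__ == "__main__":
    # Hypothetical examples: a preview page versus a metadata header line.
    print(looks_like_html(b"<!DOCTYPE html><html><head>"))   # HTML page
    print(looks_like_tsv_header("sample-id\tsubject\n"))      # TSV-like header
```

Reading the first few bytes of the saved file and passing them to `looks_like_html` would flag a link that returns a web page instead of the raw data.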

thermokarst · 2018-09-28T15:11:41Z

+Sample metadata
+---------------
+
+Before starting the analysis, explore the sample metadata to familiarize yourself with the samples used in this study. [ToDo: add metadata to Google spreadsheets?]. The following command will download the sample metadata as tab-separated text and save it in the file  `sample-metadata.tsv`. This  `sample-metadata.tsv` file is used throughout the rest of the tutorial.


I think it would make sense, too, please see my comment below (related).

thermokarst · 2018-09-28T15:13:11Z

+   mkdir emp-paired-end-sequences
+
+.. download::
+   :url: https://drive.google.com/open?id=1OE9KIrSqi4lFI5lFURdvKbJxqaKTUnZ0


Same issue as above - I can rehost these data on data.qiime2.org, just let me know if you are okay with that.

New URL:

~~https://data.qiime2.org/2018.11/tutorials/cdiff-fmt/barcodes.fastq.gz~~
https://data.qiime2.org/2018.11/tutorials/fmt-cdiff/barcodes.fastq.gz

thermokarst · 2018-09-28T15:13:15Z

+   :saveas: emp-paired-end-sequences/barcodes.fastq.gz
+
+.. download::
+  :url: https://drive.google.com/open?id=1gwVANNI0qh4sT1a7Z4G7wr3dXTTA1SR_


Same issue as above - I can rehost these data on data.qiime2.org, just let me know if you are okay with that.

That will be great, I'll PM you ...

New URL:

~~https://data.qiime2.org/2018.11/tutorials/cdiff-fmt/forward.fastq.gz~~
https://data.qiime2.org/2018.11/tutorials/fmt-cdiff/forward.fastq.gz

thermokarst · 2018-09-28T17:02:28Z

+     --m-metadata-file sample-metadata.tsv \
+     --o-visualization core-metrics-results/faith-pd-group-significance.qzv
+
+   qiime diversity alpha-group-significance \


This command fails for me:

qiime diversity alpha-group-significance \
  --i-alpha-diversity core-metrics-results/evenness_vector.qza \
  --m-metadata-file sample-metadata.tsv \
  --o-visualization core-metrics-results/evenness-group-significance.qzv \
  --verbose
Traceback (most recent call last):
  File "/Users/matthew/.conda/envs/q2dev/lib/python3.5/site-packages/q2cli/commands.py", line 274, in __call__
    results = action(**arguments)
  File "<decorator-gen-379>", line 2, in alpha_group_significance
  File "/Users/matthew/.conda/envs/q2dev/lib/python3.5/site-packages/qiime2/sdk/action.py", line 231, in bound_callable
    output_types, provenance)
  File "/Users/matthew/.conda/envs/q2dev/lib/python3.5/site-packages/qiime2/sdk/action.py", line 424, in _callable_executor_
    ret_val = self._callable(output_dir=temp_dir, **view_args)
  File "/Users/matthew/.conda/envs/q2dev/lib/python3.5/site-packages/q2_diversity/_alpha/_visualizer.py", line 101, in alpha_group_significance
    groups[j])
  File "/Users/matthew/.conda/envs/q2dev/lib/python3.5/site-packages/scipy/stats/mstats_basic.py", line 1058, in kruskal
    T = 1. - np.sum(v*(k**3-k) for (k,v) in iteritems(ties))/float(ntot**3-ntot)
ZeroDivisionError: float division by zero

Plugin error from diversity:

  float division by zero

See above for debug info.

We might need to open up an issue on q2-diversity for this, too.

Yup, my theory is that this is due to the singletons generated by dada2. We could try to remove them ... suggestions?

Adding to list ...
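
As a reading aid for the traceback above, here is a minimal sketch of the tie-correction expression from its last frame, rewritten in plain Python. It only illustrates the arithmetic: the denominator `ntot**3 - ntot` is zero when `ntot` is 0 or 1, so the error suggests that at most one usable observation reached the test for some pair of groups. Whether singletons are the cause is the theory stated above and is not verified here. The function names and the guard are assumptions, not the SciPy or q2-diversity implementation.

```python
def unguarded_tie_correction(ties, n_total):
    """Plain-Python version of the expression in the traceback.

    `ties` maps a tie size k to the number v of tied groups of that size.
    Raises ZeroDivisionError when n_total is 0 or 1.
    """
    return 1.0 - sum(v * (k ** 3 - k) for k, v in ties.items()) / float(n_total ** 3 - n_total)


def tie_correction(ties, n_total):
    """Same expression, but returns None when the denominator is zero."""
    denominator = float(n_total ** 3 - n_total)
    if denominator == 0.0:
        # With 0 or 1 observations the correction is undefined.
        return None
    return 1.0 - sum(v * (k ** 3 - k) for k, v in ties.items()) / denominator


if __name__ == "__main__":
    print(tie_correction({2: 1}, 4))  # one pair of tied values among four observations
    print(tie_correction({}, 1))      # a single observation: undefined
    try:
        unguarded_tie_correction({}, 1)
    except ZeroDivisionError as err:
        print("unguarded version failed:", err)
```

A guard like this only avoids the crash. Deciding what to report for a comparison with too few observations is a separate design question for the issue mentioned above.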

thermokarst · 2018-09-28T17:50:09Z

+   qiime diversity alpha-rarefaction \
+     --i-table table.qza \
+     --i-phylogeny rooted-tree.qza \
+     --p-max-depth 4000 \


Why only to 4000? Maybe it makes sense to go all the way for this tutorial, that way users can see everything.

thermokarst · 2018-09-28T17:55:10Z

+.. question::
+    What differences do you observe between the unweighted UniFrac and Bray-Curtis PCoA plots?
+
+Alpha rarefaction plotting


It might be cool to move this section up sooner, since many folks will consider this plot when picking a rarefaction depth.

thermokarst · 2018-09-28T17:56:28Z

+
+.. _`fmt cdiff taxonomy`:
+
+Taxonomic analysis


Prior to taxonomic analysis, I wonder if this tutorial could include an example of filtering, as well as a section including q2-longitudinal and/or q2-sample classifier.

Sounds like a good plan ... perhaps a good example will be to filter out the healthy samples and just do ancom on pre/post fmt

Oh nice, I like that!

thermokarst · 2018-09-28T17:56:47Z

+
+.. _`fmt cdiff ancom`:
+
+Differential abundance testing with ANCOM


Maybe we should skip ANCOM and use q2-gneiss, instead?

q2-gneiss may be too much to show in this tutorial! gneiss is complicated enough / has enough steps that it really needs its own tutorial.

@nbokulich, I think that is a fair point, but, is ANCOM any "easier" to explain? I think the beauty of docs like this is that you can cross-reference topics.

I agree with @nbokulich

thermokarst

Just set up data.qiime2.org links for your data resources --- see new URLs inline. Thanks!

thermokarst · 2018-10-04T13:44:19Z

+  :saveas: emp-paired-end-sequences/forward.fastq.gz
+
+.. download::
+   :url: https://drive.google.com/open?id=1DIW8kQmZN0TI3xVOqwbXZkmo_9eK21jW


New URL:

~~https://data.qiime2.org/2018.11/tutorials/cdiff-fmt/reverse.fastq.gz~~
https://data.qiime2.org/2018.11/tutorials/fmt-cdiff/reverse.fastq.gz

thermokarst · 2018-10-04T13:44:32Z

+   :saveas: emp-paired-end-sequences/barcodes.fastq.gz
+
+.. download::
+  :url: https://drive.google.com/open?id=1gwVANNI0qh4sT1a7Z4G7wr3dXTTA1SR_


New URL:

~~https://data.qiime2.org/2018.11/tutorials/cdiff-fmt/forward.fastq.gz~~
https://data.qiime2.org/2018.11/tutorials/fmt-cdiff/forward.fastq.gz

thermokarst · 2018-10-04T13:44:42Z

+   mkdir emp-paired-end-sequences
+
+.. download::
+   :url: https://drive.google.com/open?id=1OE9KIrSqi4lFI5lFURdvKbJxqaKTUnZ0


New URL:

~~https://data.qiime2.org/2018.11/tutorials/cdiff-fmt/barcodes.fastq.gz~~
https://data.qiime2.org/2018.11/tutorials/fmt-cdiff/barcodes.fastq.gz

thermokarst · 2018-10-04T13:45:18Z

+Before starting the analysis, explore the sample metadata to familiarize yourself with the samples used in this study. [ToDo: add metadata to Google spreadsheets?]. The following command will download the sample metadata as tab-separated text and save it in the file  `sample-metadata.tsv`. This  `sample-metadata.tsv` file is used throughout the rest of the tutorial.
+
+.. download::
+   :url: https://drive.google.com/open?id=1X58tm2G1FcUazL5jnAEE08pIfJP182yG


New URL:

~~https://data.qiime2.org/2018.11/tutorials/cdiff-fmt/sample_metadata.tsv~~
https://data.qiime2.org/2018.11/tutorials/fmt-cdiff/sample_metadata.tsv

thermokarst · 2018-10-04T13:45:45Z

+Sample metadata
+---------------
+
+Before starting the analysis, explore the sample metadata to familiarize yourself with the samples used in this study. [ToDo: add metadata to Google spreadsheets?]. The following command will download the sample metadata as tab-separated text and save it in the file  `sample-metadata.tsv`. This  `sample-metadata.tsv` file is used throughout the rest of the tutorial.


New URL:

~~https://data.qiime2.org/2018.11/tutorials/cdiff-fmt/sample_metadata~~
https://data.qiime2.org/2018.11/tutorials/fmt-cdiff/sample_metadata

antgonza

Addressing most of the comments ...

antgonza · 2018-10-02T11:14:28Z

-1. Install the dependencies necessary to build the docs:
+1. Install the QIIME 2 framework. Typically you'll want the latest development version (i.e. master branch). You can find the [latest builds for your operating system here](https://github.com/qiime2/environment-files/tree/master/latest/staging). Make sure you get the raw link for your OS (e.g. `linux-64` or `osx-64`), download that file and then you can use `conda env create -n qiime2-docs --file path-to-yml` to install.
+
+2. Install the dependencies necessary to build the docs, you need to run this command within the `qiime2-docs` folder:


Changing. Note that the repo being named just docs is kind of broad, so I named my local copy qiime2-docs and forgot that I did ...

antgonza · 2018-10-02T11:16:10Z

+FMT for recurrent Clostridium difficile infection Tutorial
+==========================================================
+
+.. note:: This guide assumes you have installed QIIME 2 using one of the procedures in the :doc:`install documents <../install/index>`.


Interesting, not sure how to do that, or do you just reference it here? Either way, IMO, as in a class/workshop, duplication of concepts doesn't hurt ...

antgonza · 2018-10-02T11:16:31Z

+
+.. note:: This guide assumes you have installed QIIME 2 using one of the procedures in the :doc:`install documents <../install/index>`.
+
+In this tutorial you'll use QIIME 2 to perform an analysis


antgonza · 2018-10-02T11:20:14Z

+.. note:: This guide assumes you have installed QIIME 2 using one of the procedures in the :doc:`install documents <../install/index>`.
+
+In this tutorial you'll use QIIME 2 to perform an analysis
+In this tutorial you’ll use QIIME 2 to perform an analysis of fecal human microbiome samples looking at short- and long-term changes in patients with multiply recurrent Clostridium difficile infection that were refractory to antibiotic therapy and treated using fecal microbiota transplantation. A study based on these samples was originally published in [Weingarden et al. (2015)](https://www.ncbi.nlm.nih.gov/pubmed/25825673) and it has been used in [animations](https://www.youtube.com/watch?v=-FFDqhM4pks) and [meta-analyses](https://github.com/knightlab-analyses/qiita-paper). The data used in this tutorial were sequenced on an Illumina MiSeq using the [Earth Microbiome Project](http://earthmicrobiome.org/) hypervariable region 4 (V4) 16S rRNA sequencing protocol.


Good point ... I had to read about this ... I think in the medical field there is a difference between multiple recurrent and recurrent, for example: https://www.ncbi.nlm.nih.gov/pubmed/3951243. I'd rather leave it as it was defined in the paper: multiple recurrent.

antgonza · 2018-10-02T11:21:39Z

+Sample metadata
+---------------
+
+Before starting the analysis, explore the sample metadata to familiarize yourself with the samples used in this study. [ToDo: add metadata to Google spreadsheets?]. The following command will download the sample metadata as tab-separated text and save it in the file  `sample-metadata.tsv`. This  `sample-metadata.tsv` file is used throughout the rest of the tutorial.


Will add, do we have an "official" account for this?

antgonza · 2018-10-04T23:17:31Z

+
+In the next sections we'll begin to explore the taxonomic composition of the samples, and again relate that to sample metadata. The first step in this process is to assign taxonomy to the sequences in our ``FeatureData[Sequence]`` QIIME 2 artifact. We'll do that using a pre-trained Naive Bayes classifier and the ``q2-feature-classifier`` plugin. This classifier was trained on the Greengenes 13_8 99% OTUs, where the sequences have been trimmed to only include 250 bases from the region of the 16S that was sequenced in this analysis (the V4 region, bound by the 515F/806R primer pair). We'll apply this classifier to our sequences, and we can generate a visualization of the resulting mapping from sequence to taxonomy.
+
+.. note:: Taxonomic classifiers perform best when they are trained based on your specific sample preparation and sequencing parameters, including the primers that were used for amplification and the length of your sequence reads. Therefore in general you should follow the instructions in :doc:`Training feature classifiers with q2-feature-classifier <../tutorials/feature-classifier>` to train your own taxonomic classifiers. We provide some common classifiers on our :doc:`data resources page <../data-resources>`, including Silva-based 16S classifiers, though in the future we may stop providing these in favor of having users train their own classifiers which will be most relevant to their sequence data.


wasn't aware, @gregcaporaso ?

antgonza · 2018-10-04T23:18:00Z

+     --o-visualization taxonomy.qzv
+
+.. question::
+    Recall that our ``rep-seqs.qzv`` visualization allows you to easily BLAST the sequence associated with each feature against the NCBI nt database. Using that visualization and the ``taxonomy.qzv`` visualization created here, compare the taxonomic assignments with the taxonomy of the best BLAST hit for a few features. How similar are the assignments? If they're dissimilar, at what *taxonomic level* do they begin to differ (e.g., species, genus, family, ...)?


Could you add it? It sounds like you are more familiar with this, even more so after your reply on GitHub.

antgonza · 2018-10-04T23:39:20Z

+     --o-visualization taxa-bar-plots.qzv
+
+.. question::
+    Visualize the samples at *Level 2* (which corresponds to the phylum level in this analysis), and then sort the samples by animations_gradient, and then by animations_gradient. What are the dominant phyla in before and after the FMT?


antgonza · 2018-10-04T23:40:11Z

+
+.. _`fmt cdiff ancom`:
+
+Differential abundance testing with ANCOM


I agree with @nbokulich

antgonza · 2018-10-04T23:40:52Z

+ANCOM can be applied to identify features that are differentially abundant (i.e. present in different abundances) across sample groups. As with any bioinformatics method, you should be aware of the assumptions and limitations of ANCOM before using it. We recommend reviewing the `ANCOM paper`_ before using this method.
+
+.. note::
+   Differential abundance testing in microbiome analysis is an active area of research. There are two QIIME 2 plugins that can be used for this: ``q2-gneiss`` and ``q2-composition``. This section uses ``q2-composition``, but there is :doc:`another tutorial which uses gneiss <gneiss>` on a different dataset if you are interested in learning more.


nbokulich

thanks @antgonza ! This is looking fantastic!

I just have a few minor formatting changes. I also could not get tests to pass with make preview DEBUG=tutorials/fmt-cdiff, so we will need to make sure this works before we merge.

At the end of the tutorial, maybe you should conclude by linking to the last section of the "grand overview" (I love the adjective you've introduced to describe that doc), which just discusses some additional downstream analysis options. See here

nbokulich · 2018-10-05T15:32:21Z

+
+.. note:: This guide assumes you have installed QIIME 2 using one of the procedures in the :doc:`install documents <../install/index>`.
+
+In this tutorial you’ll use QIIME 2 to perform an analysis of fecal human microbiome samples looking at short- and long-term changes in patients with multiple recurrent Clostridium difficile infection that were refractory to antibiotic therapy and treated using fecal microbiota transplantation. A study based on these samples was originally published in [Weingarden et al. (2015)](https://www.ncbi.nlm.nih.gov/pubmed/25825673) and it has been used in [animations](https://www.youtube.com/watch?v=-FFDqhM4pks) and [meta-analyses](https://github.com/knightlab-analyses/qiita-paper). The data used in this tutorial were sequenced on an Illumina MiSeq using the [Earth Microbiome Project](http://earthmicrobiome.org/) hypervariable region 4 (V4) 16S rRNA sequencing protocol.


Looks like some of the links here are in Markdown. Please convert them to rst.
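
To illustrate the conversion requested here, a minimal sketch that rewrites inline Markdown links of the form `[text](url)` as rst external links of the form `` `text <url>`_ ``. The helper name and regular expression are assumptions for illustration. The sketch does not handle URLs containing parentheses or the reversed `(text)[link]` form.

```python
import re

# Matches [link text](url); the URL may not contain spaces or ")".
MD_LINK = re.compile(r"\[([^\]]+)\]\(([^)\s]+)\)")


def md_links_to_rst(text: str) -> str:
    """Rewrite inline Markdown links as rst external hyperlinks."""
    return MD_LINK.sub(r"`\1 <\2>`_", text)


if __name__ == "__main__":
    sample = (
        "sequenced using the "
        "[Earth Microbiome Project](http://earthmicrobiome.org/) protocol"
    )
    print(md_links_to_rst(sample))
```

If the same link text is used for different URLs in one document, rst may warn about duplicate target names. In that case the anonymous form with a double trailing underscore is the usual alternative.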

nbokulich · 2018-10-05T15:37:32Z

+Obtaining and importing data
+----------------------------
+
+Next, you’ll download the multiplexed reads. You will download three `fastq.gz` files, corresponding to the forward, reverse, and barcode (i.e., index) reads. These files contain a subset of the reads (10%) in the full data set generated for this study, which allows for the following commands to be run relatively quickly. If you are only planning to run through the commands presented here to get experience, you can use the (1% subsample data set)[add_link] so that the commands will run quickly. If you’re planning to work through the questions presented at the end of this document to gain more experience with QIIME analysis and data interpretation, you should use the 10% subsample data set so that the analysis results will be supported by more sequence data.


(1% subsample data set)[add_link]

markdown will not render. I would recommend just having downloads for 10% and 1% together below, see the current FMT tutorial for an example

nbokulich · 2018-10-05T15:53:55Z

+     --o-table table-deblur.qza \
+     --o-stats deblur-stats.qza
+
+.. note:: The deblur command used above generates QIIME 2 artifacts containing summary statistics. To view those summary statistics, you can visualize them using ``qiime metadata tabulate`` and ``qiime deblur visualize-stats``, respectively:


It looks a bit weird that the command block below appears outside of this note. I'd recommend just making this main text (remove the .. note:: part) or put the commands inside the note (based on how you did it above, it looks like you just indent the command block to appear in the note)

nbokulich · 2018-10-05T15:56:02Z

+.. question::
+  Based on the plots you see in ``demux.qzv``, what values would you choose for ``--p-trunc-len`` and ``--p-trim-left`` in this case?
+
+In the ``demux.qzv`` quality plots, we see that the quality of the initial bases seems to be high, so we won't trim any bases from the beginning of the sequences. The quality seems to drop off around position 150, so we'll truncate our sequences at 120 bases. This next command may take up to 10 minutes to run, and is the slowest step in this tutorial.


so we'll truncate our sequences at 120

do you mean 150? That is what is used below and consistent with the previous clause

thanks for catching

nbokulich · 2018-10-05T16:00:20Z

+
+.. command-block::
+
+     qiime fragment-insertion filter-features \


the indentation is incorrect in this command block. It is causing these commands to display incorrectly, and is causing the next section of text to appear in the command block!

good catch!

nbokulich · 2018-10-05T16:07:43Z

 ------------------

-In the next sections we'll begin to explore the taxonomic composition of the samples, and again relate that to sample metadata. The first step in this process is to assign taxonomy to the sequences in our ``FeatureData[Sequence]`` QIIME 2 artifact. We'll do that using a pre-trained Naive Bayes classifier and the ``q2-feature-classifier`` plugin. This classifier was trained on the Greengenes 13_8 99% OTUs, where the sequences have been trimmed to only include 250 bases from the region of the 16S that was sequenced in this analysis (the V4 region, bound by the 515F/806R primer pair). We'll apply this classifier to our sequences, and we can generate a visualization of the resulting mapping from sequence to taxonomy.
+In the next sections we'll begin to explore the taxonomic composition of the samples, and again relate that to sample metadata. The first step in this process is to assign taxonomy to the sequences in our ``FeatureData[Sequence]`` QIIME 2 artifact. We'll do that using a pre-trained Naive Bayes classifier and the ``q2-feature-classifier`` plugin. This classifier was trained on the Greengenes 13_8 99% OTUs, where the sequences have been trimmed to only include 250 bases from the region of the 16S that was sequenced in this analysis (the V4 region, bound by the 515F/806R primer pair). We'll apply this classifier to our sequences, and we can generate a visualization of the resulting mapping from sequence to taxonomy. You can read more about this in the :doc:`grand overview <overview>`.


Please point to the correct section in the grand overview (feel free to add a ref point if that's needed).

antgonza

I'm gonna check the issue with the make preview DEBUG=tutorials/fmt-cdiff and then I'll push the changes. Note that I did not introduce the term grand; I borrowed it from another tutorial.

antgonza · 2018-10-06T11:10:56Z

+     --m-metadata-file sample-metadata.tsv \
+     --o-visualization core-metrics-results/faith-pd-group-significance.qzv
+
+   qiime diversity alpha-group-significance \


Adding to list ...

antgonza · 2018-10-06T11:11:49Z

+   qiime emperor plot \
+     --i-pcoa core-metrics-results/unweighted_unifrac_pcoa_results.qza \
+     --m-metadata-file sample-metadata.tsv \
+     --p-custom-axes animations_gradient \


added to list

antgonza · 2018-10-06T11:13:48Z

 ------------------

-In the next sections we'll begin to explore the taxonomic composition of the samples, and again relate that to sample metadata. The first step in this process is to assign taxonomy to the sequences in our ``FeatureData[Sequence]`` QIIME 2 artifact. We'll do that using a pre-trained Naive Bayes classifier and the ``q2-feature-classifier`` plugin. This classifier was trained on the Greengenes 13_8 99% OTUs, where the sequences have been trimmed to only include 250 bases from the region of the 16S that was sequenced in this analysis (the V4 region, bound by the 515F/806R primer pair). We'll apply this classifier to our sequences, and we can generate a visualization of the resulting mapping from sequence to taxonomy.
+In the next sections we'll begin to explore the taxonomic composition of the samples, and again relate that to sample metadata. The first step in this process is to assign taxonomy to the sequences in our ``FeatureData[Sequence]`` QIIME 2 artifact. We'll do that using a pre-trained Naive Bayes classifier and the ``q2-feature-classifier`` plugin. This classifier was trained on the Greengenes 13_8 99% OTUs, where the sequences have been trimmed to only include 250 bases from the region of the 16S that was sequenced in this analysis (the V4 region, bound by the 515F/806R primer pair). We'll apply this classifier to our sequences, and we can generate a visualization of the resulting mapping from sequence to taxonomy. You can read more about this in the :doc:`grand overview <overview>`.


antgonza · 2018-10-06T11:13:57Z

+
+In the next sections we'll begin to explore the taxonomic composition of the samples, and again relate that to sample metadata. The first step in this process is to assign taxonomy to the sequences in our ``FeatureData[Sequence]`` QIIME 2 artifact. We'll do that using a pre-trained Naive Bayes classifier and the ``q2-feature-classifier`` plugin. This classifier was trained on the Greengenes 13_8 99% OTUs, where the sequences have been trimmed to only include 250 bases from the region of the 16S that was sequenced in this analysis (the V4 region, bound by the 515F/806R primer pair). We'll apply this classifier to our sequences, and we can generate a visualization of the resulting mapping from sequence to taxonomy.
+
+.. note:: Taxonomic classifiers perform best when they are trained based on your specific sample preparation and sequencing parameters, including the primers that were used for amplification and the length of your sequence reads. Therefore in general you should follow the instructions in :doc:`Training feature classifiers with q2-feature-classifier <../tutorials/feature-classifier>` to train your own taxonomic classifiers. We provide some common classifiers on our :doc:`data resources page <../data-resources>`, including Silva-based 16S classifiers, though in the future we may stop providing these in favor of having users train their own classifiers which will be most relevant to their sequence data.


adding to list

antgonza · 2018-10-06T11:14:37Z

+     --o-visualization taxonomy.qzv
+
+.. question::
+    Recall that our ``rep-seqs.qzv`` visualization allows you to easily BLAST the sequence associated with each feature against the NCBI nt database. Using that visualization and the ``taxonomy.qzv`` visualization created here, compare the taxonomic assignments with the taxonomy of the best BLAST hit for a few features. How similar are the assignments? If they're dissimilar, at what *taxonomic level* do they begin to differ (e.g., species, genus, family, ...)?


antgonza · 2018-10-06T11:16:19Z

+.. question::
+   Which genera differ in abundance across Subject? In which subject is each genus more abundant?
+
+


antgonza · 2018-10-07T01:13:41Z

`make preview DEBUG=tutorials/fmt-cdiff` should work fine; note that it takes over an hour ...

adding fmt tutorial

67b26e8

gregcaporaso self-assigned this Sep 20, 2018

nbokulich suggested changes Sep 21, 2018

thermokarst mentioned this pull request Sep 28, 2018

Plot zoomed in and panned over when initially loaded biocore/emperor#696

Closed

thermokarst suggested changes Sep 28, 2018

thermokarst reviewed Oct 4, 2018

addressing @nbokulich and @thermokarst comments

246be1c

antgonza commented Oct 5, 2018

antgonza mentioned this pull request Oct 5, 2018

FMT tutorial improvements. #357

Open

8 tasks

nbokulich suggested changes Oct 5, 2018

antgonza changed the base branch from master to fmt_cdiff October 5, 2018 19:15

antgonza commented Oct 6, 2018

addressing @nbokulich comments

780dabd

thermokarst approved these changes Oct 7, 2018

thermokarst merged commit 96f9f9b into qiime2:fmt_cdiff Oct 7, 2018

antgonza mentioned this pull request Jun 21, 2019

Fmt cdiff khanna #424

Closed



		.. _`fmt cdiff ancom`:

		Differential abundance testing with ANCOM


		In this tutorial you’ll use QIIME 2 to perform an analysis of fecal human microbiome samples looking at short- and long-term changes in patients with multiple recurrent Clostridium difficile infection that were refractory to antibiotic therapy and treated using fecal microbiota transplantation. A study based on these samples was originally published in [Weingarden et al. (2015)](https://www.ncbi.nlm.nih.gov/pubmed/25825673) and it has been used in [animations](https://www.youtube.com/watch?v=-FFDqhM4pks) and [meta-analyses](https://github.com/knightlab-analyses/qiita-paper). The data used in this tutorial were sequenced on an Illumina MiSeq using the [Earth Microbiome Project](http://earthmicrobiome.org/) hypervariable region 4 (V4) 16S rRNA sequencing protocol.


		.. command-block::

		qiime fragment-insertion filter-features \
