adding fmt tutorial #353
|
|
||
| .. note:: This guide assumes you have installed QIIME 2 using one of the procedures in the :doc:`install documents <../install/index>`. | ||
|
|
||
| In this tutorial you'll use QIIME 2 to perform an analysis |
this sentence is redundant with the next line
| Sample metadata | ||
| --------------- | ||
|
|
||
| Before starting the analysis, explore the sample metadata to familiarize yourself with the samples used in this study. [ToDo: add metadata to Google spreadsheets?]. The following command will download the sample metadata as tab-separated text and save it in the file `sample-metadata.tsv`. This `sample-metadata.tsv` file is used throughout the rest of the tutorial. |
having a google spreadsheet could be useful for data exploration, but not necessary
I think it would make sense, too, please see my comment below (related).
Will add, do we have an "official" account for this?
Will add, do we have an "official" account for this?
Yep --- I shared the file with you at your gmail account earlier this morning.
| Obtaining and importing data | ||
| ---------------------------- | ||
|
|
||
| Next, you’ll download the multiplexed reads. You will download three `fastq.gz` files, corresponding to the forward, reverse, and barcode (i.e., index) reads. These files contain a subset of the reads in the full data set generated for this study, which allows for the following commands to be run relatively quickly. If you are only planning to run through the commands presented here to get experience, you can use the 1% subsample data set so that the commands will run quickly. If you’re planning to work through the questions presented at the end of this document to gain more experience with QIIME analysis and data interpretation, you should use the 10% subsample data set so that the analysis results will be supported by more sequence data. |
These files contain a subset of the reads
should specify what %
you can use the 1% subsample data
where can that be obtained?
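For context on the 1% and 10% subsamples discussed here, this is a minimal sketch of how a synchronized subsample of forward, reverse, and barcode reads could be drawn. It is not the procedure used to build the tutorial data, which is not described in this thread; the function names, the seed, and the toy records are all hypothetical.

```python
import random


def choose_record_indices(n_records, fraction, seed=0):
    """Pick which record positions to keep for a given subsample fraction."""
    if not 0 < fraction <= 1:
        raise ValueError("fraction must be in (0, 1]")
    if n_records == 0:
        return []
    rng = random.Random(seed)  # fixed seed so the subsample is reproducible
    n_keep = min(n_records, max(1, round(n_records * fraction)))
    return sorted(rng.sample(range(n_records), n_keep))


def subsample(records, indices):
    """Keep only the records at the chosen positions, preserving order."""
    return [records[i] for i in indices]


# Toy stand-ins for the three read files (hypothetical data).
forward = [f"fwd{i}" for i in range(1000)]
reverse = [f"rev{i}" for i in range(1000)]
barcodes = [f"bc{i}" for i in range(1000)]

# The same indices are applied to all three files so reads stay paired.
keep = choose_record_indices(len(forward), 0.10, seed=42)
sub_forward = subsample(forward, keep)
sub_reverse = subsample(reverse, keep)
sub_barcodes = subsample(barcodes, keep)
```

The point of the sketch is that one index list is shared by all three files; sampling each file independently would break the pairing between forward, reverse, and barcode reads.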
| --output-path emp-paired-end-sequences.qza | ||
|
|
||
| .. tip:: | ||
| Links are included to view and download precomputed QIIME 2 artifacts and visualizations created by commands in the documentation. For example, the command above created a paired-end ``emp-paired-end-sequences.qza`` file, and a corresponding precomputed file is linked above. You can view precomputed QIIME 2 artifacts and visualizations without needing to install additional software (e.g. QIIME 2). |
| Demultiplexing sequences | ||
| ------------------------ | ||
|
|
||
| To demultiplex sequences we need to know which barcode sequence is associated with each sample. This information is contained in the `sample metadata`_ file. You can run the following commands to demultiplex the sequences (the ``demux emp-paired`` command refers to the fact that these sequences are barcoded according to the `Earth Microbiome Project`_ protocol, and are paired-end reads). The ``demux.qza`` QIIME 2 artifact will contain the demultiplexed sequences. Additionally, we are passing the parameter ``--p-rev-comp-mapping-barcodes``, reverse complements the barcode sequences in the sample metadata prior to demultiplexing. |
reverse complements the barcode --> which reverse complements the barcode
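For readers unfamiliar with the operation, here is a minimal sketch of what reverse complementing a barcode means. It only illustrates the concept and is not the plugin's implementation; the function name and the example barcode are made up.

```python
# Translation table mapping each base to its complement (upper and lower case).
_COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(seq):
    """Complement every base, then reverse the sequence."""
    return seq.translate(_COMPLEMENT)[::-1]


# Hypothetical barcode as it might appear in a metadata file.
barcode = "AACG"
flipped = reverse_complement(barcode)
```

Applying the function twice returns the original sequence, which is a quick sanity check when deciding whether the barcodes in a metadata file need to be flipped.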
|
|
||
| In the next sections we'll begin to explore the taxonomic composition of the samples, and again relate that to sample metadata. The first step in this process is to assign taxonomy to the sequences in our ``FeatureData[Sequence]`` QIIME 2 artifact. We'll do that using a pre-trained Naive Bayes classifier and the ``q2-feature-classifier`` plugin. This classifier was trained on the Greengenes 13_8 99% OTUs, where the sequences have been trimmed to only include 250 bases from the region of the 16S that was sequenced in this analysis (the V4 region, bound by the 515F/806R primer pair). We'll apply this classifier to our sequences, and we can generate a visualization of the resulting mapping from sequence to taxonomy. | ||
|
|
||
| .. note:: Taxonomic classifiers perform best when they are trained based on your specific sample preparation and sequencing parameters, including the primers that were used for amplification and the length of your sequence reads. Therefore in general you should follow the instructions in :doc:`Training feature classifiers with q2-feature-classifier <../tutorials/feature-classifier>` to train your own taxonomic classifiers. We provide some common classifiers on our :doc:`data resources page <../data-resources>`, including Silva-based 16S classifiers, though in the future we may stop providing these in favor of having users train their own classifiers which will be most relevant to their sequence data. |
did Greg mention the part about not providing these files in the future? Unless if Greg wants that there, I would recommend removing.
oh nm I see now that you grabbed this from the MP tutorial, @antgonza
still, @gregcaporaso maybe we should reassess whether we want to leave that statement in there.
| --o-visualization taxonomy.qzv | ||
|
|
||
| .. question:: | ||
| Recall that our ``rep-seqs.qzv`` visualization allows you to easily BLAST the sequence associated with each feature against the NCBI nt database. Using that visualization and the ``taxonomy.qzv`` visualization created here, compare the taxonomic assignments with the taxonomy of the best BLAST hit for a few features. How similar are the assignments? If they're dissimilar, at what *taxonomic level* do they begin to differ (e.g., species, genus, family, ...)? |
You should note the issues with relying on NCBI BLAST taxonomy... and explain why the incomplete taxonomic assignments provided by the classifier are more reliable for short sequence reads
Could you add? It sounds like you are more familiar with it, and more after your reply in github.
Maybe just remove this question. It has caused some confusion with folks on the forum who think that that visualization takes the place of taxonomy classification.
| --o-visualization taxa-bar-plots.qzv | ||
|
|
||
| .. question:: | ||
| Visualize the samples at *Level 2* (which corresponds to the phylum level in this analysis), and then sort the samples by animations_gradient, and then by animations_gradient. What are the dominant phyla before and after the FMT? |
then sort the samples by animations_gradient, and then by animations_gradient
did you mean to use a different column for the second sort?
| ANCOM can be applied to identify features that are differentially abundant (i.e. present in different abundances) across sample groups. As with any bioinformatics method, you should be aware of the assumptions and limitations of ANCOM before using it. We recommend reviewing the `ANCOM paper`_ before using this method. | ||
|
|
||
| .. note:: | ||
| Differential abundance testing in microbiome analysis is an active area of research. There are two QIIME 2 plugins that can be used for this: ``q2-gneiss`` and ``q2-composition``. This section uses ``q2-composition``, but there is :doc:`another tutorial which uses gneiss <gneiss>` on a different dataset if you are interested in learning more. |
another tutorial which uses gneiss --> a separate gneiss tutorial
| .. question:: | ||
| Which genera differ in abundance across Subject? In which subject is each genus more abundant? | ||
|
|
||
|
|
as a conclusion, maybe link to the last section in the overview tutorial, which gives an overview of other plugins that can be used downstream
thermokarst left a comment
Hey @antgonza, this is really exciting! I have a handful of questions, comments, and suggestions, inline. Please let me know if you have questions, need clarification, or need a hand with anything. Thanks!
| 1. Install the dependencies necessary to build the docs: | ||
| 1. Install the QIIME 2 framework. Typically you'll want the latest development version (i.e. master branch). You can find the [latest builds for your operating system here](https://github.com/qiime2/environment-files/tree/master/latest/staging). Make sure you get the raw link for your OS (e.g. `linux-64` or `osx-64`), download that file and then you can use `conda env create -n qiime2-docs --file path-to-yml` to install. | ||
|
|
||
| 2. Install the dependencies necessary to build the docs, you need to run this command within the `qiime2-docs` folder: |
qiime2-docs is probably too specific - I know it isn't called that on my computers! How about we either stick with docs (the name of the repo), or we could reword to something like "you need to run this command within the QIIME 2 User Docs directory:"
Changing. Note that the repo being named just docs is kind of broad, so I named my local copy qiime2-docs and forgot that I did ...
| Before starting the analysis, explore the sample metadata to familiarize yourself with the samples used in this study. [ToDo: add metadata to Google spreadsheets?]. The following command will download the sample metadata as tab-separated text and save it in the file `sample-metadata.tsv`. This `sample-metadata.tsv` file is used throughout the rest of the tutorial. | ||
|
|
||
| .. download:: | ||
| :url: https://drive.google.com/open?id=1X58tm2G1FcUazL5jnAEE08pIfJP182yG |
This URL isn't wgetable, since it links to the Drive preview (this ultimately saves a bunch of HTML and JS to a tsv file, which obviously doesn't work).
Related to @nbokulich's comment above - if we host this as a Google Spreadsheet, we can get it both way. If you are fine with this @antgonza I can set this up, just let me know.
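One way to catch this kind of failure early is to look at the first bytes of the saved file and check whether it is an HTML page rather than tab-separated text. The sketch below assumes nothing about the hosting service; the function names and the sample header line are hypothetical.

```python
def looks_like_html(head: bytes) -> bool:
    """Return True if the leading bytes look like the start of an HTML page."""
    text = head.lstrip().lower()
    return text.startswith(b"<!doctype html") or text.startswith(b"<html")


def download_is_plausible_tsv(path, n_bytes=512):
    """Read the start of a downloaded file and reject obvious HTML payloads."""
    with open(path, "rb") as fh:
        return not looks_like_html(fh.read(n_bytes))


# Hypothetical examples of what each kind of payload starts with.
html_head = b"  <!DOCTYPE html><html><head>"
tsv_head = b"sample-id\tbarcode-sequence\n"
```

This only rules out the obvious case of a preview page saved under a `.tsv` name; it does not validate the metadata itself.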
| mkdir emp-paired-end-sequences | ||
|
|
||
| .. download:: | ||
| :url: https://drive.google.com/open?id=1OE9KIrSqi4lFI5lFURdvKbJxqaKTUnZ0 |
Same issue as above - I can rehost these data on data.qiime2.org, just let me know if you are okay with that.
| :saveas: emp-paired-end-sequences/barcodes.fastq.gz | ||
|
|
||
| .. download:: | ||
| :url: https://drive.google.com/open?id=1gwVANNI0qh4sT1a7Z4G7wr3dXTTA1SR_ |
Same issue as above - I can rehost these data on data.qiime2.org, just let me know if you are okay with that.
That will be great, I'll PM you ...
| --m-metadata-file sample-metadata.tsv \ | ||
| --o-visualization core-metrics-results/faith-pd-group-significance.qzv | ||
|
|
||
| qiime diversity alpha-group-significance \ |
This command fails for me:
qiime diversity alpha-group-significance \
--i-alpha-diversity core-metrics-results/evenness_vector.qza \
--m-metadata-file sample-metadata.tsv \
--o-visualization core-metrics-results/evenness-group-significance.qzv \
--verbose
Traceback (most recent call last):
File "/Users/matthew/.conda/envs/q2dev/lib/python3.5/site-packages/q2cli/commands.py", line 274, in __call__
results = action(**arguments)
File "<decorator-gen-379>", line 2, in alpha_group_significance
File "/Users/matthew/.conda/envs/q2dev/lib/python3.5/site-packages/qiime2/sdk/action.py", line 231, in bound_callable
output_types, provenance)
File "/Users/matthew/.conda/envs/q2dev/lib/python3.5/site-packages/qiime2/sdk/action.py", line 424, in _callable_executor_
ret_val = self._callable(output_dir=temp_dir, **view_args)
File "/Users/matthew/.conda/envs/q2dev/lib/python3.5/site-packages/q2_diversity/_alpha/_visualizer.py", line 101, in alpha_group_significance
groups[j])
File "/Users/matthew/.conda/envs/q2dev/lib/python3.5/site-packages/scipy/stats/mstats_basic.py", line 1058, in kruskal
T = 1. - np.sum(v*(k**3-k) for (k,v) in iteritems(ties))/float(ntot**3-ntot)
ZeroDivisionError: float division by zero
Plugin error from diversity:
float division by zero
See above for debug info.
We might need to open up an issue on q2-diversity for this, too.
Yup, my theory is due to the singletons generated by dada2, we could try to remove ... suggestions?
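To make the failing line in the traceback easier to reason about, here is a minimal pure-Python sketch of the same tie-correction expression. It mirrors only the arithmetic on that one line, not scipy's implementation, and the function name is hypothetical. The denominator `ntot**3 - ntot` is zero whenever the total number of observations is 0 or 1; whether that is what happened with this dataset is an assumption, not something confirmed in this thread.

```python
from collections import Counter


def tie_correction(values):
    """Compute 1 - sum(v * (k**3 - k)) / (ntot**3 - ntot) over tied groups.

    k is the size of a group of equal values and v is how many groups
    have that size, matching the expression shown in the traceback.
    """
    ntot = len(values)
    # Map each tie-group size to the number of groups with that size.
    group_sizes = Counter(Counter(values).values())
    tied = {k: v for k, v in group_sizes.items() if k > 1}
    numerator = sum(v * (k**3 - k) for k, v in tied.items())
    return 1.0 - numerator / float(ntot**3 - ntot)


# One pair of tied values among four observations.
correction = tie_correction([1, 2, 2, 3])

# A single observation makes the denominator zero.
try:
    tie_correction([0.5])
    raised = False
except ZeroDivisionError:
    raised = True
```

If the cause is groups with too few observations, filtering those groups out before the test would avoid the error, but that would need to be checked against the actual metadata.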
| qiime diversity alpha-rarefaction \ | ||
| --i-table table.qza \ | ||
| --i-phylogeny rooted-tree.qza \ | ||
| --p-max-depth 4000 \ |
Why only to 4000? Maybe it makes sense to go all the way for this tutorial, that way users can see everything.
| .. question:: | ||
| What differences do you observe between the unweighted UniFrac and Bray-Curtis PCoA plots? | ||
|
|
||
| Alpha rarefaction plotting |
It might be cool to move this section up sooner, since many folks will consider this plot when picking a rarefaction depth.
|
|
||
| .. _`fmt cdiff taxonomy`: | ||
|
|
||
| Taxonomic analysis |
Prior to taxonomic analysis, I wonder if this tutorial could include an example of filtering, as well as a section including q2-longitudinal and/or q2-sample classifier.
Sounds like a good plan ... perhaps a good example will be to filter out the healthy samples and just do ancom on pre/post fmt
|
|
||
| .. _`fmt cdiff ancom`: | ||
|
|
||
| Differential abundance testing with ANCOM |
Maybe we should skip ANCOM and use q2-gneiss, instead?
q2-gneiss may be too much to show in this tutorial! gneiss is complicated enough / has enough steps that it really needs its own tutorial.
@nbokulich, I think that is a fair point, but, is ANCOM any "easier" to explain? I think the beauty of docs like this is that you can cross-reference topics.
thermokarst left a comment
Just set up data.qiime2.org links for your data resources --- see new URLs inline. Thanks!
| :saveas: emp-paired-end-sequences/forward.fastq.gz | ||
|
|
||
| .. download:: | ||
| :url: https://drive.google.com/open?id=1DIW8kQmZN0TI3xVOqwbXZkmo_9eK21jW |
antgonza left a comment
Addressing most of the comments ...
| FMT for recurrent Clostridium difficile infection Tutorial | ||
| ========================================================== | ||
|
|
||
| .. note:: This guide assumes you have installed QIIME 2 using one of the procedures in the :doc:`install documents <../install/index>`. |
Interesting, not sure how to do that or if you just reference it here? Either way, IMO, as in a class/workshop, duplication of concepts doesn't hurt ...
| .. note:: This guide assumes you have installed QIIME 2 using one of the procedures in the :doc:`install documents <../install/index>`. | ||
|
|
||
| In this tutorial you'll use QIIME 2 to perform an analysis | ||
| In this tutorial you’ll use QIIME 2 to perform an analysis of fecal human microbiome samples looking at short- and long-term changes in patients with multiply recurrent Clostridium difficile infection that were refractory to antibiotic therapy and treated using fecal microbiota transplantation. A study based on these samples was originally published in [Weingarden et al. (2015)](https://www.ncbi.nlm.nih.gov/pubmed/25825673) and it has been used in [animations](https://www.youtube.com/watch?v=-FFDqhM4pks) and [meta-analyses](https://github.com/knightlab-analyses/qiita-paper). The data used in this tutorial were sequenced on an Illumina MiSeq using the [Earth Microbiome Project](http://earthmicrobiome.org/) hypervariable region 4 (V4) 16S rRNA sequencing protocol. |
good point ... had to read about this ... I think in the medical field there is a difference between multiple recurrent and recurrent, for example: https://www.ncbi.nlm.nih.gov/pubmed/3951243. I'd rather leave it as it was defined in the paper: multiple recurrent
nbokulich left a comment
thanks @antgonza ! This is looking fantastic!
I just have a few minor formatting changes. I also could not get tests to pass with make preview DEBUG=tutorials/fmt-cdiff, so we will need to make sure this works before we merge.
At the end of the tutorial, maybe you should conclude by linking to the last section of the "grand overview" (I love the adjective you've introduced to describe that doc), which just discusses some additional downstream analysis options. See here
|
|
||
| .. note:: This guide assumes you have installed QIIME 2 using one of the procedures in the :doc:`install documents <../install/index>`. | ||
|
|
||
| In this tutorial you’ll use QIIME 2 to perform an analysis of fecal human microbiome samples looking at short- and long-term changes in patients with multiple recurrent Clostridium difficile infection that were refractory to antibiotic therapy and treated using fecal microbiota transplantation. A study based on these samples was originally published in [Weingarden et al. (2015)](https://www.ncbi.nlm.nih.gov/pubmed/25825673) and it has been used in [animations](https://www.youtube.com/watch?v=-FFDqhM4pks) and [meta-analyses](https://github.com/knightlab-analyses/qiita-paper). The data used in this tutorial were sequenced on an Illumina MiSeq using the [Earth Microbiome Project](http://earthmicrobiome.org/) hypervariable region 4 (V4) 16S rRNA sequencing protocol. |
looks like there are some links here that are in markdown. Pls convert to rst
| Obtaining and importing data | ||
| ---------------------------- | ||
|
|
||
| Next, you’ll download the multiplexed reads. You will download three `fastq.gz` files, corresponding to the forward, reverse, and barcode (i.e., index) reads. These files contain a subset of the reads (10%) in the full data set generated for this study, which allows for the following commands to be run relatively quickly. If you are only planning to run through the commands presented here to get experience, you can use the (1% subsample data set)[add_link] so that the commands will run quickly. If you’re planning to work through the questions presented at the end of this document to gain more experience with QIIME analysis and data interpretation, you should use the 10% subsample data set so that the analysis results will be supported by more sequence data. |
(1% subsample data set)[add_link]
markdown will not render. I would recommend just having downloads for 10% and 1% together below, see the current FMT tutorial for an example
| --o-table table-deblur.qza \ | ||
| --o-stats deblur-stats.qza | ||
|
|
||
| .. note:: The deblur command used above generates QIIME 2 artifacts containing summary statistics. To view those summary statistics, you can visualize them using ``qiime metadata tabulate`` and ``qiime deblur visualize-stats``, respectively: |
It looks a bit weird that the command block below appears outside of this note. I'd recommend just making this main text (remove the .. note:: part) or put the commands inside the note (based on how you did it above, it looks like you just indent the command block to appear in the note)
| .. question:: | ||
| Based on the plots you see in ``demux.qzv``, what values would you choose for ``--p-trunc-len`` and ``--p-trim-left`` in this case? | ||
|
|
||
| In the ``demux.qzv`` quality plots, we see that the quality of the initial bases seems to be high, so we won't trim any bases from the beginning of the sequences. The quality seems to drop off around position 150, so we'll truncate our sequences at 120 bases. This next command may take up to 10 minutes to run, and is the slowest step in this tutorial. |
so we'll truncate our sequences at 120
do you mean 150? That is what is used below and consistent with the previous clause
|
|
||
| .. command-block:: | ||
|
|
||
| qiime fragment-insertion filter-features \ |
the indentation is incorrect in this command block. It is causing these commands to display incorrectly, and is causing the next section of text to appear in the command block!
| ------------------ | ||
|
|
||
| In the next sections we'll begin to explore the taxonomic composition of the samples, and again relate that to sample metadata. The first step in this process is to assign taxonomy to the sequences in our ``FeatureData[Sequence]`` QIIME 2 artifact. We'll do that using a pre-trained Naive Bayes classifier and the ``q2-feature-classifier`` plugin. This classifier was trained on the Greengenes 13_8 99% OTUs, where the sequences have been trimmed to only include 250 bases from the region of the 16S that was sequenced in this analysis (the V4 region, bound by the 515F/806R primer pair). We'll apply this classifier to our sequences, and we can generate a visualization of the resulting mapping from sequence to taxonomy. | ||
| In the next sections we'll begin to explore the taxonomic composition of the samples, and again relate that to sample metadata. The first step in this process is to assign taxonomy to the sequences in our ``FeatureData[Sequence]`` QIIME 2 artifact. We'll do that using a pre-trained Naive Bayes classifier and the ``q2-feature-classifier`` plugin. This classifier was trained on the Greengenes 13_8 99% OTUs, where the sequences have been trimmed to only include 250 bases from the region of the 16S that was sequenced in this analysis (the V4 region, bound by the 515F/806R primer pair). We'll apply this classifier to our sequences, and we can generate a visualization of the resulting mapping from sequence to taxonomy. You can read more about this in the :doc:`grand overview <overview>`. |
pls point to the correct section in the grand overview (feel free to add a ref point if that's needed)
antgonza left a comment
I'm gonna check the issue with the make preview DEBUG=tutorials/fmt-cdiff and then I'll push the changes. Note that I did not introduce the term grand; I borrowed it from another tutorial.
| qiime emperor plot \ | ||
| --i-pcoa core-metrics-results/unweighted_unifrac_pcoa_results.qza \ | ||
| --m-metadata-file sample-metadata.tsv \ | ||
| --p-custom-axes animations_gradient \ |