3 Formatting all inputs
Adriano Rutz
2023-08-29
Source:vignettes/III-formatting.Rmd
III-formatting.Rmd
This vignette describes how to format all your files.
Structural annotations of your features
For the moment, we support annotations coming from 4 different annotation tools:
MS1-based
These annotations are of the lowest possible quality. However, they allow to annotate unusual adducts, in-source fragments thanks to different small tricks implemented. Try to really restrict the adduct list and structure-organism pairs you want to consider as possibilities explode rapidly.
source(file = "inst/scripts/annotate_masses.R")
Spectral
In order to perform MS2 annotation based on an In Silico DataBase, please follow the following steps.
ISDB
source(file = "inst/scripts/annotate_spectra.R")
We use the spectral entropy
from https://doi.org/10.1038/s41592-021-01331-z for
matching.
In case, a python implementation of the spectral matching steps is also available at: https://github.com/mandelbrot-project/spectral_lib_matcher. The python version also includes other similarity measures.
And as before:
source(file = "inst/scripts/prepare_annotations_spectra.R")
GNPS
We also provide an example GNPS job id, which is:
41c6068e905d482db5698ad81d145d7c
Before running the corresponding code, do not forget to modify
inst/params/user/prepare_*yourAnnotationTool*.yaml
.
Depending on the annotation tool you used, you can format its results using:
source(file = "inst/scripts/prepare_annotations_gnps.R")
You now have all your spectral annotations:
Fingerprint-based
Sirius
As SIRIUS jobs are long to perform, we provide an already computed SIRIUS Workspace. It has been generated on the same MGF as the GNPS and ISDB jobs with the following command:
config --IsotopeSettings.filter true --FormulaSearchDB BIO,COCONUT,GNPS,KNAPSACK,UNDP,PLANTCYC --Timeout.secondsPerTree 0 --FormulaSettings.enforced HCNOP --Timeout.secondsPerInstance 0 --AdductSettings.detectable [[M + H3N + H]+, [M - H2O + H]+, [M + K]+, [M - H4O2 + H]+, [M + H]+, [M + Na]+] --UseHeuristic.mzToUseHeuristicOnly 650 --AlgorithmProfile orbitrap --IsotopeMs2Settings IGNORE --MS2MassDeviation.allowedMassDeviation 5.0ppm --NumberOfCandidatesPerIon 1 --UseHeuristic.mzToUseHeuristic 300 --FormulaSettings.detectable B,Cl,Br,Se,S --NumberOfCandidates 10 --ZodiacNumberOfConsideredCandidatesAt300Mz 10 --ZodiacRunInTwoSteps true --ZodiacEdgeFilterThresholds.minLocalConnections 10 --ZodiacEdgeFilterThresholds.thresholdFilter 0.95 --ZodiacEpochs.burnInPeriod 2000 --ZodiacEpochs.numberOfMarkovChains 10 --ZodiacNumberOfConsideredCandidatesAt800Mz 50 --ZodiacEpochs.iterations 20000 --AdductSettings.enforced , --AdductSettings.fallback [[M + K]+, [M + H]+, [M + Na]+] --FormulaResultThreshold true --InjectElGordoCompounds true --StructureSearchDB BIO,COCONUT,GNPS,KNAPSACK,UNDP,PLANTCYC --RecomputeResults false formula zodiac fingerprint structure canopus
These parameters were not optimized and were only used to give an example output.
Then, the summaries have been generated using:
sirius -i inst/extdata/interim/annotations/sirius_example/ write-summaries -c --digits 3
You can get the example running:
source(file = "inst/scripts/get_example_sirius.R")
Then prepare it:
source(file = "inst/scripts/prepare_annotations_sirius.R")
You now have your annotations well prepared and can keep on formatting the rest of your metadata. Your features are not only informed with structural information but also, chemical class information. The latter might be corresponding or not to the chemical class of your annotated structure, depending on the consistency of your annotations.
Chemical class annotation of your features
Within our workflow, we offer a new way to attribute chemical classes to your features. It is analog to Network Annotation Propagation, but uses the edges of your network instead of the clusters. This makes more sense in our view, as also recently illustrated by CANOPUS.
All steps can take both manual inputs or GNPS metadata directly from your GNPS job ID.
We are currently also working on CANOPUS integration for chemical class annotation but this implies way heavier computations and we want to offer our users a fast solution.
Case when no network available
If no network was generated previously on GNPS (no GNPS job ID provided), it can be generated using:
source(file = "inst/scripts/create_edges_spectra.R")
Again, the edges are created based on the
spectral entropy similarity
calculated between your spectra
(see https://doi.org/10.1038/s41592-021-01331-z).
If needed, you can get an example of what your minimal feature table should look like by running (no parameters needed):
source(file = "inst/scripts/get_example_features.R")
Formatting
Before running the corresponding code, do not forget to modify
inst/params/user/prepare_features_tables.yaml
,
inst/params/user/prepare_features_edges.yaml
,
inst/params/user/create_components.yaml
, and
inst/params/user/prepare_features_components.yaml
accordingly.
source(file = "inst/scripts/prepare_features_tables.R")
source(file = "inst/scripts/prepare_features_edges.R")
source(file = "inst/scripts/create_components.R")
source(file = "inst/scripts/prepare_features_components.R")
Biological source annotation
This step allows you to attribute biological source information to your features. If all your features come from a single extract, it will attribute the biological source of your extract to all your features. If you have multiple extracts aligned, it will take the n (according to your parameters) highest intensities of your aligned feature table and attribute the biological source of corresponding extracts. It can take both manual inputs or GNPS metadata directly from your GNPS job ID.
Before running the corresponding code, do not forget to modify
inst/params/user/prepare_taxa.yaml
.
source(file = "inst/scripts/prepare_taxa.R")
Filter annotations (based on retention time)
This step allows you to filter out the annotation of all the tools used, based on your own internal (experimental or predicted) retention times library. It is optional. If you do not have one, it will simply group the annotations of all tools.
Before running the corresponding code, do not forget to modify
inst/params/user/filter_annotations.yaml
.
source(file = "inst/scripts/filter_annotations.R")
You are almost there! See already all the steps accomplished!
We now recommend you to read the next vignette.