3 Formatting all inputs

This vignette describes how to format all your files.

Structural annotations of your features

At the moment, we support annotations from four different annotation tools:

MS¹ exact mass-based library search
GNPS
SIRIUS
Formatted results of ISDB annotation.

MS¹-based

These annotations are of the lowest possible quality. However, they make it possible to annotate unusual adducts and in-source fragments, thanks to a few small tricks implemented in the workflow. Restrict the adduct list and the structure-organism pairs you consider as much as you can, because the number of possibilities explodes rapidly.

library("timaR")

source(file = "inst/scripts/annotate_masses.R")
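To make the combinatorial growth concrete, here is a minimal sketch of exact-mass matching. It is not the code run by annotate_masses.R: the candidate table, the adduct table, the function name `match_masses`, and the 10 ppm default are all assumptions made for illustration. The mass shifts are the usual values for singly charged positive adducts.

```r
# Hypothetical candidate library: neutral monoisotopic masses.
candidates <- data.frame(
  name = c("quercetin", "caffeine"),
  exact_mass = c(302.042653, 194.080376),
  stringsAsFactors = FALSE
)

# Hypothetical adduct list: mass shift added to the neutral mass.
adducts <- data.frame(
  adduct = c("[M+H]+", "[M+Na]+", "[M+K]+"),
  shift = c(1.007276, 22.989218, 38.963158),
  stringsAsFactors = FALSE
)

match_masses <- function(mz, candidates, adducts, tolerance_ppm = 10) {
  # Every candidate x adduct combination is a possible explanation
  # (by = NULL makes merge() return the Cartesian product).
  grid <- merge(candidates, adducts, by = NULL)
  grid$theoretical_mz <- grid$exact_mass + grid$shift
  grid$error_ppm <- (mz - grid$theoretical_mz) / grid$theoretical_mz * 1e6
  hits <- grid[abs(grid$error_ppm) <= tolerance_ppm, ]
  hits[order(abs(hits$error_ppm)),
       c("name", "adduct", "theoretical_mz", "error_ppm")]
}

hits <- match_masses(mz = 303.0499, candidates, adducts)
print(hits)
```

Two candidates and three adducts already give six hypotheses per feature. With a full structure library and a long adduct list the grid grows multiplicatively, which is why restricting both is worthwhile.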

Spectral

To perform MS² annotation against an In Silico DataBase (ISDB), follow the steps below.

ISDB

source(file = "inst/scripts/annotate_spectra.R")

We use the spectral entropy similarity described in https://doi.org/10.1038/s41592-021-01331-z for matching.
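As a rough illustration of the measure, the unweighted entropy similarity from the cited paper can be sketched as below. This is not the implementation used by annotate_spectra.R. It assumes the two spectra have already been aligned onto the same m/z bins (so each is just a vector of intensities), and it omits the intensity weighting that the paper applies to low-entropy spectra. The function names are hypothetical.

```r
# Shannon entropy of a normalized intensity vector (zero peaks are skipped).
shannon_entropy <- function(p) {
  p <- p[p > 0]
  -sum(p * log(p))
}

# Unweighted entropy similarity between two aligned spectra.
# a, b: intensity vectors of equal length, one value per shared m/z bin.
entropy_similarity <- function(a, b) {
  stopifnot(length(a) == length(b), sum(a) > 0, sum(b) > 0)
  pa <- a / sum(a)
  pb <- b / sum(b)
  pab <- (pa + pb) / 2 # 1:1 mixture of the two normalized spectra
  1 - (2 * shannon_entropy(pab) - shannon_entropy(pa) - shannon_entropy(pb)) /
    log(4)
}

entropy_similarity(c(1, 2, 3), c(1, 2, 3)) # identical spectra give 1
entropy_similarity(c(1, 0), c(0, 1))       # no shared peak: close to 0
```

The score is bounded between 0 and 1 and does not depend on the absolute intensity scale, since both spectra are normalized first.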

If needed, a Python implementation of the spectral matching steps is also available at https://github.com/mandelbrot-project/spectral_lib_matcher. The Python version also includes other similarity measures.

Then, as before, prepare the results:

source(file = "inst/scripts/prepare_annotations_spectra.R")

GNPS

We also provide an example GNPS job ID: 41c6068e905d482db5698ad81d145d7c

Before running the corresponding code, do not forget to modify inst/params/user/prepare_*yourAnnotationTool*.yaml.

Depending on the annotation tool you used, you can format its results using:

source(file = "inst/scripts/prepare_annotations_gnps.R")

You now have all your spectral annotations.

Fingerprint-based

SIRIUS

Because SIRIUS jobs take a long time to run, we provide a precomputed SIRIUS workspace. It was generated from the same MGF as the GNPS and ISDB jobs, with the following command:

config --IsotopeSettings.filter true --FormulaSearchDB BIO,COCONUT,GNPS,KNAPSACK,UNDP,PLANTCYC --Timeout.secondsPerTree 0 --FormulaSettings.enforced HCNOP --Timeout.secondsPerInstance 0 --AdductSettings.detectable [[M + H3N + H]+, [M - H2O + H]+, [M + K]+, [M - H4O2 + H]+, [M + H]+, [M + Na]+] --UseHeuristic.mzToUseHeuristicOnly 650 --AlgorithmProfile orbitrap --IsotopeMs2Settings IGNORE --MS2MassDeviation.allowedMassDeviation 5.0ppm --NumberOfCandidatesPerIon 1 --UseHeuristic.mzToUseHeuristic 300 --FormulaSettings.detectable B,Cl,Br,Se,S --NumberOfCandidates 10 --ZodiacNumberOfConsideredCandidatesAt300Mz 10 --ZodiacRunInTwoSteps true --ZodiacEdgeFilterThresholds.minLocalConnections 10 --ZodiacEdgeFilterThresholds.thresholdFilter 0.95 --ZodiacEpochs.burnInPeriod 2000 --ZodiacEpochs.numberOfMarkovChains 10 --ZodiacNumberOfConsideredCandidatesAt800Mz 50 --ZodiacEpochs.iterations 20000 --AdductSettings.enforced , --AdductSettings.fallback [[M + K]+, [M + H]+, [M + Na]+] --FormulaResultThreshold true --InjectElGordoCompounds true --StructureSearchDB BIO,COCONUT,GNPS,KNAPSACK,UNDP,PLANTCYC --RecomputeResults false formula zodiac fingerprint structure canopus

These parameters were not optimized and were only used to give an example output.

The summaries were then generated using:

sirius -i inst/extdata/interim/annotations/sirius_example/ write-summaries -c --digits 3

You can get the example by running:

source(file = "inst/scripts/get_example_sirius.R")

Then prepare it:

source(file = "inst/scripts/prepare_annotations_sirius.R")

Your annotations are now prepared, and you can continue formatting the rest of your metadata. Your features carry not only structural information but also chemical class information. Depending on the consistency of your annotations, the latter may or may not correspond to the chemical class of the annotated structure.

Chemical class annotation of your features

Within our workflow, we offer a new way to attribute chemical classes to your features. It is analogous to Network Annotation Propagation, but uses the edges of your network instead of the clusters. In our view this makes more sense, as also recently illustrated by CANOPUS.

All steps accept either manual inputs or GNPS metadata retrieved directly from your GNPS job ID.

We are also working on CANOPUS integration for chemical class annotation, but this requires much heavier computation, and we want to offer our users a fast solution.
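The idea of using edges rather than clusters can be sketched as follows. This is only an illustration under strong assumptions, not the scoring used by the workflow: the feature IDs, the class labels, the function name `neighbour_class`, and the plain majority vote are all hypothetical.

```r
# Hypothetical undirected edges between features.
edges <- data.frame(
  source = c("f1", "f1", "f1", "f2"),
  target = c("f2", "f3", "f4", "f3"),
  stringsAsFactors = FALSE
)

# Hypothetical chemical class of the best structural annotation per feature.
classes <- c(
  f1 = "Alkaloids",
  f2 = "Flavonoids",
  f3 = "Flavonoids",
  f4 = "Terpenoids"
)

# Most frequent class among the direct neighbours of a feature.
neighbour_class <- function(feature, edges, classes) {
  # Edges are undirected, so collect partners on both sides.
  neighbours <- c(
    edges$target[edges$source == feature],
    edges$source[edges$target == feature]
  )
  votes <- table(classes[neighbours])
  if (length(votes) == 0) {
    return(NA_character_) # feature has no edge in the network
  }
  names(votes)[which.max(votes)]
}

neighbour_class("f1", edges, classes)
```

Here the direct neighbours of f1 are mostly flavonoids, while its own annotation is an alkaloid. This is the kind of inconsistency that an edge-based class assignment brings to light.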

Case when no network available

If no network was generated previously on GNPS (no GNPS job ID provided), it can be generated using:

source(file = "inst/scripts/create_edges_spectra.R")

Again, the edges are created based on the spectral entropy similarity calculated between your spectra (see https://doi.org/10.1038/s41592-021-01331-z).
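Turning pairwise similarities into an edge list can be sketched as below. The similarity values, the 0.7 threshold, and the function name `similarity_to_edges` are assumptions for illustration. They are not the defaults of create_edges_spectra.R.

```r
# Hypothetical pairwise similarity matrix for three spectra.
ids <- c("f1", "f2", "f3")
similarity <- matrix(
  c(1.0, 0.8, 0.1,
    0.8, 1.0, 0.4,
    0.1, 0.4, 1.0),
  nrow = 3, byrow = TRUE,
  dimnames = list(ids, ids)
)

similarity_to_edges <- function(similarity, threshold = 0.7) {
  # Look at the upper triangle only, so each unordered pair is kept once
  # and self-matches on the diagonal are ignored.
  keep <- which(upper.tri(similarity) & similarity >= threshold,
                arr.ind = TRUE)
  data.frame(
    source = rownames(similarity)[keep[, "row"]],
    target = colnames(similarity)[keep[, "col"]],
    score = similarity[keep],
    stringsAsFactors = FALSE
  )
}

edges <- similarity_to_edges(similarity)
print(edges)
```

Only the f1/f2 pair passes the threshold in this toy example. Lowering the threshold yields a denser network.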

If needed, you can get an example of what your minimal feature table should look like by running (no parameters needed):

source(file = "inst/scripts/get_example_features.R")

Formatting

Before running the corresponding code, do not forget to modify inst/params/user/prepare_features_tables.yaml, inst/params/user/prepare_features_edges.yaml, inst/params/user/create_components.yaml, and inst/params/user/prepare_features_components.yaml accordingly.

source(file = "inst/scripts/prepare_features_tables.R")

source(file = "inst/scripts/prepare_features_edges.R")

source(file = "inst/scripts/create_components.R")

source(file = "inst/scripts/prepare_features_components.R")

Biological source annotation

This step attributes biological source information to your features. If all your features come from a single extract, the biological source of that extract is attributed to all of them. If you have multiple aligned extracts, the step takes the n highest intensities (n is set in your parameters) in your aligned feature table and attributes the biological sources of the corresponding extracts. It accepts either manual inputs or GNPS metadata retrieved directly from your GNPS job ID.

Before running the corresponding code, do not forget to modify inst/params/user/prepare_taxa.yaml.

source(file = "inst/scripts/prepare_taxa.R")
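The "n highest intensities" rule can be sketched as below. The intensity table, the extract-to-organism mapping, and the function name `top_sources` are hypothetical. The actual inputs and column names expected by prepare_taxa.R are defined in its parameter file.

```r
# Hypothetical aligned feature table: rows are features, columns are extracts.
intensities <- matrix(
  c(900, 50, 10,
    20, 700, 650),
  nrow = 2, byrow = TRUE,
  dimnames = list(c("f1", "f2"), c("extract_A", "extract_B", "extract_C"))
)

# Hypothetical mapping from extract to biological source.
organisms <- c(
  extract_A = "Gentiana lutea",
  extract_B = "Swertia chirayita",
  extract_C = "Gentiana lutea"
)

# For each feature, return the organisms of its n most intense extracts.
top_sources <- function(intensities, organisms, n = 1) {
  result <- lapply(rownames(intensities), function(feature) {
    row <- intensities[feature, ]
    ranked <- names(sort(row, decreasing = TRUE))
    top_extracts <- ranked[seq_len(min(n, length(ranked)))]
    unique(unname(organisms[top_extracts]))
  })
  names(result) <- rownames(intensities)
  result
}

top_sources(intensities, organisms, n = 1)
top_sources(intensities, organisms, n = 2)
```

With n = 1 each feature gets a single biological source. Larger values of n let a feature that is abundant in several extracts inherit several sources.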

Filter annotations (based on retention time)

This optional step filters the annotations from all the tools used, based on your own in-house (experimental or predicted) retention time library. If you do not have one, it simply groups the annotations of all tools.

Before running the corresponding code, do not forget to modify inst/params/user/filter_annotations.yaml.

source(file = "inst/scripts/filter_annotations.R")
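A retention time filter of this kind can be sketched as below. The tables, the column names, the 0.2 minute tolerance, and the function name `filter_by_rt` are assumptions for illustration. In particular, keeping annotations that have no library entry is a design choice of this sketch, not a documented behaviour of filter_annotations.R.

```r
# Hypothetical annotations: one row per feature/structure candidate,
# with the retention time (in minutes) of the feature.
annotations <- data.frame(
  feature = c("f1", "f1", "f2"),
  structure = c("S1", "S2", "S3"),
  rt = c(5.10, 5.10, 8.00),
  stringsAsFactors = FALSE
)

# Hypothetical in-house retention time library (minutes).
rt_library <- data.frame(
  structure = c("S1", "S2"),
  rt_library = c(5.05, 9.40),
  stringsAsFactors = FALSE
)

filter_by_rt <- function(annotations, rt_library, tolerance_min = 0.2) {
  # Left join: structures absent from the library get NA as library RT.
  merged <- merge(annotations, rt_library, by = "structure", all.x = TRUE)
  rt_error <- abs(merged$rt - merged$rt_library)
  # Keep annotations without a library entry and those within tolerance.
  merged[is.na(rt_error) | rt_error <= tolerance_min, ]
}

kept <- filter_by_rt(annotations, rt_library)
print(kept)
```

In this toy example S2 is discarded because its library retention time is far from that of the feature, while S3 is kept because the library has no entry for it.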

You are almost there! Look at all the steps you have already accomplished!

We now recommend reading the next vignette.

Adriano Rutz

2023-08-29
