2 Inputs: Getting everything you need

This vignette describes how to prepare:

All parameters you need
All inputs you need

Parameters

As mentioned previously, almost all steps require parameters. Some pre-defined parameters are given in the inst/params/default directory. They will help you understand how the workflow works, so you can later adapt them to your own data.

So first, let’s have a look at the structure of the parameters:

Copy

Now, you want to copy them and adapt them to your own files to get something like:

Prepare

To do so, simply:

source(file = "inst/scripts/prepare_params.R")

And now, you should get the following:

See how your targets changed? The values you adapted in inst/prepare_params.yaml were automatically propagated to all sub-files in inst/params/user.
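To make the idea of propagation more concrete, here is a minimal sketch in base R. It is not the code used by the package: the parameter names (`ms`, `polarity`, `organism`, ...) and the `propagate()` helper are hypothetical, and plain lists stand in for the YAML files.

```r
# Hypothetical top-level values, standing in for what you might set
# in inst/prepare_params.yaml (names are illustrative only).
user_values <- list(
  ms = list(polarity = "neg"),
  organism = "Gentiana lutea"
)

# Hypothetical default parameters of two steps.
default_steps <- list(
  annotate = list(ms = list(polarity = "pos", tolerance_ppm = 10)),
  weight = list(organism = NA_character_, candidates = 3)
)

# In each step, overwrite only the entries that the user provided
# and that the step actually knows about. Everything else is kept.
propagate <- function(step, values) {
  shared <- intersect(names(step), names(values))
  utils::modifyList(step, values[shared])
}

user_steps <- lapply(default_steps, propagate, values = user_values)
str(user_steps)
```

Because `modifyList()` works recursively on nested lists, the tolerance of the `annotate` step is left untouched while its polarity is updated.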

And everything is stored, so nothing needs to be recomputed if you run it again. That is the beauty of targets!

source(file = "inst/scripts/prepare_params.R")

You can easily fine-tune all parameters in each file, and only the steps affected by these modifications will run again!
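If you wonder how a step can be skipped, the toy example below illustrates the principle in base R. It is a deliberately simplified sketch and not how targets is implemented (targets tracks hashes of code, data and upstream targets); the `cache` environment and the `run_step()` helper are hypothetical.

```r
# Toy cache: one entry per step, remembering the parameters last used.
cache <- new.env()

run_step <- function(name, params, fun) {
  previous <- cache[[name]]
  # Skip the computation when the parameters did not change.
  if (!is.null(previous) && identical(previous$params, params)) {
    message(name, ": up to date, skipped")
    return(previous$result)
  }
  message(name, ": running")
  result <- fun(params)
  cache[[name]] <- list(params = params, result = result)
  result
}

square <- function(p) p$x^2

run_step("step_a", list(x = 2), square) # runs
run_step("step_a", list(x = 2), square) # skipped, same parameters
run_step("step_a", list(x = 3), square) # runs again, parameters changed
```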

Adapt

Your final prepared parameters now look like:

Inputs

Dictionaries

The package already comes with some default dictionaries. These dictionaries are needed throughout the pipeline, and we recommend NOT modifying them unless you are confident in doing so.

Your own file(s)

The most important file you need to provide is an MGF file containing the spectra you want to annotate.

If you already ran a GNPS job, you can simply do:

source(file = "inst/scripts/get_gnps_tables.R")

To get an example of an MGF file (corresponding to the spectra to annotate, not to the library to query), just run:

source(file = "inst/scripts/get_example_spectra.R")
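If you have never opened an MGF file, the sketch below writes a tiny, made-up one and reads it back with base R only. The spectra and their values are purely illustrative, and `parse_block()` is a hypothetical helper, not a function of the package. Only the generic `BEGIN IONS` / `END IONS`, `TITLE`, `PEPMASS` and `CHARGE` fields are used; your own files will likely carry more fields.

```r
# A tiny, made-up MGF file with two spectra (all values are illustrative).
mgf_lines <- c(
  "BEGIN IONS",
  "TITLE=example_1",
  "PEPMASS=303.0499",
  "CHARGE=1+",
  "153.0182 100",
  "229.0495 35",
  "END IONS",
  "",
  "BEGIN IONS",
  "TITLE=example_2",
  "PEPMASS=357.1180",
  "CHARGE=1+",
  "177.0546 100",
  "END IONS"
)
path <- tempfile(fileext = ".mgf")
writeLines(mgf_lines, path)

# Each spectrum is delimited by BEGIN IONS / END IONS.
lines <- readLines(path)
blocks <- split(lines, cumsum(lines == "BEGIN IONS"))

# Extract the precursor m/z and the peak list of one spectrum.
parse_block <- function(block) {
  is_peak <- grepl("^[0-9]", block)
  peaks <- do.call(rbind, strsplit(block[is_peak], "[ \t]+"))
  pepmass <- sub("^PEPMASS=", "", grep("^PEPMASS=", block, value = TRUE))
  list(
    precursor_mz = as.numeric(strsplit(pepmass, "[ \t]+")[[1]][1]),
    mz = as.numeric(peaks[, 1]),
    intensity = as.numeric(peaks[, 2])
  )
}

spectra <- lapply(blocks, parse_block)
length(spectra)
```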

Libraries

Retention times

This library is optional. As no standard LC method is shared (for now) among laboratories, this library will be heavily laboratory-dependent. It could also be a library of in silico predicted retention times.

Before running the corresponding code, do not forget to modify inst/params/user/prepare_libraries_rt.yaml.

source(file = "inst/scripts/prepare_libraries_rt.R")
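To give an idea of what such a library is used for, here is a minimal sketch in base R. It is not the format expected by the pipeline: the column names, the retention times and the tolerance are all made up for illustration.

```r
# Hypothetical in-house retention time library (illustrative values).
rt_library <- data.frame(
  structure_name = c("quercetin", "kaempferol", "rutin"),
  rt_library_min = c(5.21, 5.87, 3.94),
  stringsAsFactors = FALSE
)

# Hypothetical features measured with the same LC method.
features <- data.frame(
  feature_id = c(1, 2),
  rt_feature_min = c(5.25, 4.60)
)

# Assumed tolerance, in minutes.
tolerance_min <- 0.1

# Build all feature-library pairs, then keep those within tolerance.
candidates <- merge(features, rt_library, by = NULL)
delta <- abs(candidates$rt_feature_min - candidates$rt_library_min)
candidates <- candidates[delta <= tolerance_min, ]
candidates
```

Since retention times depend on the LC method, such a comparison only makes sense when the library and the features were acquired (or predicted) under the same conditions.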

Structure-Organism Pairs

LOTUS

As we developed LOTUS ¹ with Taxonomically Informed Metabolite Annotation in mind, we provide it here as a starting point for your structure-organism pairs library.

You can easily get it by running:

source(file = "inst/scripts/get_lotus.R")

You then need to format it as expected by the pipeline. To do so:

source(file = "inst/scripts/prepare_libraries_sop_lotus.R")

As you can see, the target seems outdated. In reality, we force it to check each time whether a new version of LOTUS exists. If a newer version exists, it will fetch it and re-run the needed steps accordingly.

ECMDB

If you want, you can also complement LOTUS pairs with the ones coming from ECMDB.

To do so, you need to:

source(file = "inst/scripts/prepare_libraries_sop_ecmdb.R")

HMDB

You can do the same with the ones coming from HMDB (not run by default, as it takes quite long):

source(file = "inst/scripts/get_hmdb.R")

As previously, you then need to prepare the library:

source(file = "inst/scripts/prepare_hmdb.R")

For these first steps, you do not need to change any parameters as they are implemented by default.

Other libraries

As we want our tool to be flexible, you can also add your own library to LOTUS. You just need to format it in order to be compatible. As an example, we prepared some ways to format closed, in-house libraries. If you need help formatting your library or would like to share it with us for it to be implemented, feel free to contact us.

Before running the corresponding code, do not forget to modify inst/params/user/prepare_libraries_sop_closed.yaml.

source(file = "inst/scripts/prepare_libraries_sop_closed.R")

Merging

Once you have all your sub-libraries prepared, you are ready to merge them into a single file that will be used for the next steps.

Before running the corresponding code, do not forget to modify inst/params/user/prepare_libraries_sop_merged.yaml.

At this step, you can restrict your library to specific taxa if you want to. We do not advise doing so, but the option is available.

source(file = "inst/scripts/prepare_libraries_sop_merged.R")
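Conceptually, the merging and the optional taxonomic restriction boil down to something like the sketch below. It uses base R and two toy tables; the column names and the way the taxon filter is expressed are assumptions made for illustration, not the actual format used by the pipeline.

```r
# Two toy structure-organism pairs libraries (illustrative content).
lotus <- data.frame(
  structure = c("quercetin", "gentiopicroside"),
  organism = c("Allium cepa", "Gentiana lutea"),
  organism_family = c("Amaryllidaceae", "Gentianaceae"),
  stringsAsFactors = FALSE
)
in_house <- data.frame(
  structure = c("gentiopicroside", "swertiamarin"),
  organism = c("Gentiana lutea", "Swertia chirayita"),
  organism_family = c("Gentianaceae", "Gentianaceae"),
  stringsAsFactors = FALSE
)

# Stack the sub-libraries and drop the pairs reported more than once.
merged <- unique(rbind(lotus, in_house))

# Optional: keep only the pairs belonging to a given taxon.
taxon <- "Gentianaceae"
restricted <- merged[merged$organism_family == taxon, ]

nrow(merged)
nrow(restricted)
```

Restricting the library this way removes every candidate reported outside the chosen taxon, which is one reason to use this option with care.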

You now should have a nice custom structure-organism pairs library for the next steps!

Adducts

The next library you need is an adducts library! As adduct detection and coverage are a main limitation of current annotation tools, an adducts library can also be generated in order to perform MS¹ annotation later on.

As you can see, it depends on the previously built library.

Before running the corresponding code, do not forget to modify inst/params/user/prepare_libraries_adducts.yaml.

source(file = "inst/scripts/prepare_libraries_adducts.R")
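To illustrate what an adducts library contains, the sketch below computes the expected m/z of a few common singly charged adducts for one structure and compares them to an observed m/z. It is a simplified illustration in base R, not the code of the package; the adduct list, the observed value and the 10 ppm tolerance are assumptions.

```r
# Monoisotopic mass of quercetin (C15H10O7), used as an example.
exact_mass <- 302.042653

# A few singly charged adducts and their mass shifts (in Da).
adducts <- data.frame(
  adduct = c("[M+H]+", "[M+Na]+", "[M-H]-"),
  mass_shift = c(1.007276, 22.989218, -1.007276),
  stringsAsFactors = FALSE
)
adducts$mz <- exact_mass + adducts$mass_shift

# Compare to a hypothetical observed m/z, with an assumed tolerance.
observed_mz <- 303.0502
tolerance_ppm <- 10
adducts$error_ppm <- (observed_mz - adducts$mz) / adducts$mz * 1e6
adducts[abs(adducts$error_ppm) <= tolerance_ppm, ]
```

Repeating this for every structure of the structure-organism pairs library gives one expected m/z per structure and adduct, which is why this library depends on the previous one.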

Spectra

Finally, you need a spectral library to perform MS²-based annotation.

Experimental

You can of course use your own experimental spectral library to perform MS² annotation. We currently only support spectral libraries in MGF format.

To get a small example:

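# Download a small example spectral library.
# `paths` is assumed to be already loaded in your session.
get_file(
  url = paths$urls$examples$spectral_lib_mini$with_rt,
  export = paths$data$source$libraries$spectra$exp$with_rt
)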

We are currently working to get all MONA and GNPS experimental spectra easily available.

However, programmatic download of MONA is currently not supported, and GNPS spectra require some pre-processing.

So for now, use your GNPS job ID and download MONA manually. Helpers to use them later on are already available (see inst/scripts/prepare_libraries_spectra_exp_mona.R and inst/scripts/prepare_libraries_spectra_is_hmdb.R).

In case you want to format your own spectral library to use it for spectral matching, adapt the steps in inst/params/user/prepare_libraries_spectra.yaml and inst/params/user/annotate_spectra.yaml.
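As a reminder of what spectral matching means, here is a generic binned cosine similarity in base R. It is only a sketch of the general idea and not the scoring used by the pipeline; the `cosine_similarity()` helper, the bin width and the peak lists are hypothetical.

```r
# Cosine similarity between two spectra after binning their m/z values.
cosine_similarity <- function(a, b, bin_width = 0.01) {
  bins_a <- round(a$mz / bin_width)
  bins_b <- round(b$mz / bin_width)
  all_bins <- union(bins_a, bins_b)
  # Summed intensity per bin (0 when a spectrum has no peak in the bin).
  xa <- vapply(all_bins, function(k) sum(a$intensity[bins_a == k]), numeric(1))
  xb <- vapply(all_bins, function(k) sum(b$intensity[bins_b == k]), numeric(1))
  sum(xa * xb) / sqrt(sum(xa^2) * sum(xb^2))
}

# Made-up query and library spectra.
query <- list(mz = c(153.0182, 229.0495), intensity = c(100, 35))
library_hit <- list(mz = c(153.0182, 229.0495), intensity = c(90, 40))
library_miss <- list(mz = c(121.0284, 177.0546), intensity = c(100, 60))

cosine_similarity(query, library_hit)  # close to 1
cosine_similarity(query, library_miss) # no shared peak
```

Real implementations usually match peaks within a tolerance rather than with fixed bins and may also consider the precursor mass difference, so scores from this sketch are only indicative.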

In silico

As the availability of experimental spectra is limited, we can take advantage of in silico generated spectra.

LOTUS

We generated an in silico spectral library of the structures found in LOTUS using CFM4. For more info, see https://doi.org/10.5281/zenodo.5607185.

In order to get it:

source(file = "inst/scripts/get_isdb_lotus.R")

Then, as previously, prepare it:

source(file = "inst/scripts/prepare_libraries_spectra_is_lotus.R")

As you can see, both MS polarities are available!

HMDB

Again, you can also complement these with the in silico spectra from HMDB (not run by default, as it takes quite long):

source(file = "inst/scripts/get_isdb_hmdb.R")

You are finally ready for the next step, with all dictionaries, inputs, and libraries you need!

We now recommend that you read the next vignette.

Adriano Rutz

2023-08-29
