3 Performing Taxonomically Informed Metabolite Annotation
Adriano Rutz
2025-02-21
Source: vignettes/articles/III-processing.Rmd
This vignette describes how Taxonomically Informed Metabolite Annotation is performed. If you completed all previous steps successfully, this one should be a piece of cake. You deserve it!
targets::tar_make(names = tidyselect::matches("ann_pre$"))
#> ✔ skipping targets (1 so far)...
#> ▶ dispatched target par_def_wei_ann
#> ● completed target par_def_wei_ann [0.026 seconds, 4.96 kilobytes]
#> ▶ dispatched target lib_sop_lot
#> A file with the same size is already present. Skipping
#> ● completed target lib_sop_lot [1.235 seconds, 92.98 megabytes]
#> ✔ skipping targets (46 so far)...
#> ▶ dispatched target lib_spe_is_wik_neg
#> File already exists. Skipping.
#> ● completed target lib_spe_is_wik_neg [0 seconds, 687.328 megabytes]
#> ▶ dispatched target lib_spe_is_wik_pos
#> File already exists. Skipping.
#> ● completed target lib_spe_is_wik_pos [0 seconds, 863.95 megabytes]
#> ▶ dispatched target par_usr_wei_ann
#> 2025-02-21 17:24:45 Loading default params
#> 2025-02-21 17:24:45 All params
#> 2025-02-21 17:24:45 Small params
#> 2025-02-21 17:24:45 Advanced params
#> 2025-02-21 17:24:45 Changing params
#> 2025-02-21 17:24:45 Changing filenames
#> 2025-02-21 17:24:46 Exporting params ...
#> ● completed target par_usr_wei_ann [1.283 seconds, 1.739 kilobytes]
#> ✔ skipping targets (54 so far)...
#> ▶ dispatched target par_wei_ann
#> ● completed target par_wei_ann [0.001 seconds, 921 bytes]
#> ▶ dispatched target ann_pre
#> 2025-02-21 17:24:46 Loading files ...
#> 2025-02-21 17:24:46 ... components
#> 2025-02-21 17:24:46 ... edges
#> 2025-02-21 17:24:46 ... structure-organism pairs
#> 2025-02-21 17:24:57 ... canopus
#> 2025-02-21 17:24:57 ... formula
#> 2025-02-21 17:24:57 ... annotations
#> 2025-02-21 17:24:58 Got c("ISDB - Wikidata", "MassBank - 2024.11", "SIRIUS", "TIMA MS1") initial annotations
#> 2025-02-21 17:24:58 Got c(4556, 31, 479, 188552) initial annotations
#> 2025-02-21 17:24:59 Re-arranging annotations
#> 2025-02-21 17:25:00 adding biological organism metadata
#> 2025-02-21 17:25:00 performing taxonomically informed scoring
#> 2025-02-21 17:25:00 filtering top 1 candidates and keeping only MS1 candidates with minimum
#> 0 biological score
#> OR 0 chemical score
#>
#> 2025-02-21 17:25:00 adding "notClassified"
#>
#> 2025-02-21 17:25:01 calculating biological score at all levels ...
#>
#> 2025-02-21 17:25:01 ... domain
#>
#> 2025-02-21 17:25:01 ... kingdom
#>
#> 2025-02-21 17:25:01 ... phylum
#>
#> 2025-02-21 17:25:01 ... class
#>
#> 2025-02-21 17:25:01 ... order
#>
#> 2025-02-21 17:25:01 ... family
#>
#> 2025-02-21 17:25:01 ... tribe
#>
#> 2025-02-21 17:25:01 ... genus
#>
#> 2025-02-21 17:25:01 ... species
#>
#> 2025-02-21 17:25:01 ... varietas
#>
#> 2025-02-21 17:25:01 ... keeping best biological score
#>
#> 2025-02-21 17:25:02 ... calculating weighted biological score
#>
#> 2025-02-21 17:25:02 taxonomically informed scoring led to
#> 8537 annotations reranked at the kingdom level,
#> 8479 annotations reranked at the phylum level,
#> 8036 annotations reranked at the class level,
#> 1411 annotations reranked at the order level,
#> 432 annotations reranked at the family level,
#> 45 annotations reranked at the tribe level,
#> 41 annotations reranked at the genus level,
#> 24 annotations reranked at the species level,
#> and 0 annotations reranked at the variety level.
#> WITHOUT TAKING CONSISTENCY SCORE INTO ACCOUNT! (for later predictions)
#> 2025-02-21 17:25:02 calculating chemical consistency
#> features with at least 2 neighbors ...
#>
#> 2025-02-21 17:25:03 ... among all edges ...
#>
#> 2025-02-21 17:25:03 ... at the (classyfire) kingdom level
#>
#> 2025-02-21 17:25:03 ... at the (NPC) pathway level
#>
#> 2025-02-21 17:25:03 ... at the (classyfire) superclass level
#>
#> 2025-02-21 17:25:03 ... at the (NPC) superclass level
#>
#> 2025-02-21 17:25:04 ... at the (classyfire) class level
#>
#> 2025-02-21 17:25:04 ... at the (NPC) class level
#>
#> 2025-02-21 17:25:05 ... at the (classyfire) parent level
#>
#> 2025-02-21 17:25:06 splitting already computed predictions
#>
#> 2025-02-21 17:25:06 joining all except -1 together
#>
#> 2025-02-21 17:25:08 adding dummy consistency for features
#> with less than 2 neighbors
#>
#> 2025-02-21 17:25:08 adding already computed predictions back
#>
#> 2025-02-21 17:25:09 calculating chemical score at all levels ...
#>
#> 2025-02-21 17:25:09 ... (classyfire) kingdom
#>
#> 2025-02-21 17:25:09 ... (NPC) pathway
#>
#> 2025-02-21 17:25:09 ... (classyfire) superclass
#>
#> 2025-02-21 17:25:09 ... (NPC) superclass
#>
#> 2025-02-21 17:25:09 ... (classyfire) class
#>
#> 2025-02-21 17:25:09 ... (NPC) class
#>
#> 2025-02-21 17:25:09 ... (classyfire) parent
#>
#> 2025-02-21 17:25:09 ... keeping best chemical score
#>
#> 2025-02-21 17:25:10 ... calculating weighted chemical score
#>
#> 2025-02-21 17:25:10 chemically informed scoring led to
#> 32716 annotations reranked at the (classyfire) kingdom level,
#> 16267 annotations reranked at the (NPC) pathway level,
#> 12675 annotations reranked at the (classyfire) superclass level,
#> 6261 annotations reranked at the (NPC) superclass level,
#> 12668 annotations reranked at the (classyfire) class level,
#> 6257 annotations reranked at the (NPC) class level, and
#> 6213 annotations reranked at the (classyfire) parent level.
#> WITHOUT TAKING CONSISTENCY SCORE INTO ACCOUNT!
#> 2025-02-21 17:25:10 Keeping high confidence candidates only...
#> 2025-02-21 17:25:10 Removed 194170 low confidence candidates out of the 194296 total ones.
#> 2025-02-21 17:25:10 126 high confidence candidates remaining.
#> 2025-02-21 17:25:10 adding initial metadata (RT, etc.) and simplifying columns
#>
#> 2025-02-21 17:25:10 adding references
#>
#> 2025-02-21 17:25:13 selecting columns to export
#>
#> 2025-02-21 17:25:13 adding consensus again to droped candidates
#>
#> 2025-02-21 17:25:14 Exporting ...
#> Directory data/processed/250221_172514_example created.
#> 2025-02-21 17:25:14 ... path to used parameters is data/processed/250221_172514_example
#> 2025-02-21 17:25:14 ... path to used parameters is data/processed/250221_172514_example
#> 2025-02-21 17:25:14 ... path to export is data/processed/250221_172514_example/example_results.tsv
#> ● completed target ann_pre [27.918 seconds, 885.312 kilobytes]
#> ▶ ended pipeline [32.829 seconds]
#>
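The call above selects targets with a regular expression, since `tidyselect::matches()` treats its argument as a pattern. The other targets in the log are most likely upstream dependencies that `targets` brought up to date before building `ann_pre`. The following minimal sketch mimics that selection in base R with `grepl()`, using a handful of target names copied from the log; it only illustrates the pattern and does not show how `targets` resolves names internally.

```r
# A few target names copied from the log above (not the full pipeline).
target_names <- c(
  "par_def_wei_ann",
  "lib_sop_lot",
  "par_usr_wei_ann",
  "par_wei_ann",
  "ann_pre"
)

# "ann_pre$" keeps only the names that end in "ann_pre".
# ignore.case = TRUE mirrors the default of tidyselect::matches().
selected <- target_names[grepl("ann_pre$", target_names, ignore.case = TRUE)]
print(selected)
```

Only `ann_pre` ends with the pattern, so it is the single target requested explicitly.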
The final exported file is formatted so that it can be easily imported into Cytoscape to further explore your data!
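Before moving to Cytoscape, it can help to take a quick look at the exported table in R. The sketch below assumes the file is a tab-separated table with a header row, as the `.tsv` extension suggests. `read_results()` is a hypothetical helper written for this example and is not part of TIMA. The path is copied from the log above, and the timestamped directory will differ for each run.

```r
# Hypothetical helper (not part of TIMA): read a tab-separated results file.
read_results <- function(path) {
  stopifnot(file.exists(path))
  # check.names = FALSE keeps the column names exactly as exported.
  utils::read.delim(path, stringsAsFactors = FALSE, check.names = FALSE)
}

# Path copied from the log above; adapt it to your own run.
results_path <- "data/processed/250221_172514_example/example_results.tsv"

if (file.exists(results_path)) {
  results <- read_results(results_path)
  print(dim(results))          # number of rows and columns
  print(utils::head(names(results)))  # first few column names
} else {
  message("No results file found at ", results_path)
}
```

The `file.exists()` guard lets the snippet run even if the pipeline has not been executed yet.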
We hope you enjoyed using TIMA and would be pleased to hear from you!
For any remarks or suggestions, please open an issue or feel free to contact us directly.