3 Performing Taxonomically Informed Metabolite Annotation
Adriano Rutz
2024-11-22
Source: vignettes/articles/III-processing.Rmd
This vignette describes how to perform Taxonomically Informed Metabolite Annotation. If you completed all previous steps successfully, this one should be a piece of cake. You deserve it!
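The command that follows restricts the run to targets whose name matches the regular expression `ann_pre$`. As a quick illustration of what that pattern selects, the sketch below applies it to a handful of target names copied from the log further down. It uses base R `grepl()` as a stand-in for `tidyselect::matches()`, so treat it as an approximation of the selection rather than the exact mechanism.

```r
# Target names copied from the pipeline log; the pattern is the one passed to
# tidyselect::matches() in the tar_make() call.
target_names <- c(
  "ann_pre",
  "ann_ms1_pre",
  "ann_sir_pre",
  "ann_spe_pre",
  "ann_ms1_pre_ann",
  "par_def_pre_ann_spe"
)

# `$` anchors the pattern at the end of the name, so only names that end in
# "ann_pre" are kept.
selected <- target_names[grepl(pattern = "ann_pre$", x = target_names)]
print(selected)
```

Only `ann_pre` is selected. The `targets` package also checks everything this target depends on, which is why the log lists many upstream targets as skipped or dispatched.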
targets::tar_make(names = tidyselect::matches("ann_pre$"))
#> ✔ skipped target yaml_paths
#> ✔ skipped target paths
#> ✔ skipped target par_pre_par
#> ✔ skipped target par_def_ann_mas
#> ✔ skipped target paths_test_mode
#> ✔ skipped target par_def_pre_lib_sop_ecm
#> ✔ skipped target par_def_pre_fea_edg
#> ✔ skipped target paths_urls_massbank_url
#> ✔ skipped target paths_data_source_libraries_sop_lotus
#> ✔ skipped target par_def_pre_lib_spe
#> ✔ skipped target paths_urls_massbank_version
#> ✔ skipped target par_pre_par2
#> ✔ skipped target par_def_pre_lib_sop_mer
#> ✔ skipped target par_def_ann_spe
#> ✔ skipped target par_def_pre_ann_spe
#> ▶ dispatched target par_def_wei_ann
#> ● completed target par_def_wei_ann [0.001 seconds, 4.96 kilobytes]
#> ✔ skipped target par_def_cre_edg_spe
#> ✔ skipped target paths_data_source_libraries_sop_ecmdb
#> ✔ skipped target paths_urls_ecmdb_metabolites
#> ✔ skipped target paths_urls_examples_spectral_lib_pos
#> ✔ skipped target paths_data_source_libraries_spectra_is_lotus_neg
#> ✔ skipped target paths_data_source_spectra
#> ✔ skipped target paths_urls_examples_spectral_lib_neg
#> ✔ skipped target par_def_pre_fea_tab
#> ✔ skipped target par_def_pre_lib_sop_lot
#> ✔ skipped target paths_urls_lotus_pattern
#> ✔ skipped target paths_data_source_libraries_spectra_is_lotus_pos
#> ✔ skipped target par_def_pre_fea_com
#> ✔ skipped target par_def_fil_ann
#> ✔ skipped target par_def_pre_ann_sir
#> ✔ skipped target paths_data_source_libraries_sop_hmdb
#> ✔ skipped target par_def_pre_lib_sop_clo
#> ✔ skipped target paths_urls_hmdb_structures
#> ✔ skipped target par_def_cre_com
#> ✔ skipped target par_def_pre_lib_sop_hmd
#> ✔ skipped target par_def_pre_ann_gnp
#> ✔ skipped target par_def_pre_lib_rt
#> ✔ skipped target par_def_pre_tax
#> ✔ skipped target paths_urls_examples_spectra_mini
#> ✔ skipped target paths_urls_massbank_file
#> ✔ skipped target paths_urls_lotus_doi
#> ✔ skipped target par_fin_par
#> ✔ skipped target par_fin_par2
#> ✔ skipped target lib_sop_ecm
#> ▶ dispatched target lib_spe_is_lot_neg
#> File already exists. Skipping.
#> ● completed target lib_spe_is_lot_neg [0 seconds, 168.722 megabytes]
#> ▶ dispatched target lib_spe_is_lot_pos
#> File already exists. Skipping.
#> ● completed target lib_spe_is_lot_pos [0 seconds, 221.809 megabytes]
#> ✔ skipped target lib_sop_hmd
#> ✔ skipped target lib_spe_exp_mb_raw
#> ▶ dispatched target lib_sop_lot
#> A file with the same size is already present. Skipping
#> ● completed target lib_sop_lot [1.303 seconds, 92.98 megabytes]
#> ✔ skipped target par_usr_ann_mas
#> ✔ skipped target par_usr_pre_ann_gnp
#> ✔ skipped target par_usr_pre_lib_sop_ecm
#> ✔ skipped target par_usr_ann_spe
#> ✔ skipped target par_usr_pre_ann_spe
#> ▶ dispatched target par_usr_wei_ann
#> 2024-11-22 10:52:47 Loading default params
#> 2024-11-22 10:52:47 All params
#> 2024-11-22 10:52:47 Small params
#> 2024-11-22 10:52:47 Changing params
#> 2024-11-22 10:52:47 Changing filenames
#> 2024-11-22 10:52:48 Exporting params ...
#> ● completed target par_usr_wei_ann [0.467 seconds, 1.739 kilobytes]
#> ✔ skipped target par_usr_fil_ann
#> ✔ skipped target par_usr_pre_fea_com
#> ✔ skipped target par_usr_pre_lib_sop_clo
#> ✔ skipped target par_usr_pre_fea_edg
#> ✔ skipped target par_usr_pre_ann_sir
#> ✔ skipped target par_usr_pre_lib_sop_lot
#> ✔ skipped target par_usr_pre_lib_spe
#> ✔ skipped target par_usr_pre_lib_sop_hmd
#> ✔ skipped target par_usr_cre_edg_spe
#> ✔ skipped target par_usr_pre_lib_sop_mer
#> ✔ skipped target par_usr_pre_lib_rt
#> ✔ skipped target par_usr_pre_fea_tab
#> ✔ skipped target par_usr_cre_com
#> ✔ skipped target par_usr_pre_tax
#> ✔ skipped target lib_spe_is_lot_pre_neg
#> ✔ skipped target lib_spe_is_lot_pre_pos
#> ✔ skipped target lib_spe_exp_mb_pre
#> ✔ skipped target par_ann_mas
#> ✔ skipped target par_pre_ann_gnp
#> ✔ skipped target par_pre_lib_sop_ecm
#> ✔ skipped target par_ann_spe
#> ✔ skipped target par_pre_ann_spe
#> ▶ dispatched target par_wei_ann
#> ● completed target par_wei_ann [0.001 seconds, 921 bytes]
#> ✔ skipped target par_fil_ann
#> ✔ skipped target par_pre_fea_com
#> ✔ skipped target par_pre_lib_sop_clo
#> ✔ skipped target par_pre_fea_edg
#> ✔ skipped target par_pre_ann_sir
#> ✔ skipped target par_pre_lib_sop_lot
#> ✔ skipped target par_pre_lib_spe
#> ✔ skipped target par_pre_lib_sop_hmd
#> ✔ skipped target par_cre_edg_spe
#> ✔ skipped target par_pre_lib_sop_mer
#> ✔ skipped target par_pre_lib_rt
#> ✔ skipped target par_pre_fea_tab
#> ✔ skipped target par_cre_com
#> ✔ skipped target par_pre_tax
#> ✔ skipped target lib_spe_exp_mb_pre_pos
#> ✔ skipped target lib_spe_exp_mb_pre_neg
#> ✔ skipped target lib_spe_exp_mb_pre_sop
#> ✔ skipped target lib_sop_ecm_pre
#> ✔ skipped target input_spectra
#> ✔ skipped target lib_sop_clo_pre
#> ✔ skipped target lib_sop_lot_pre
#> ✔ skipped target lib_spe_exp_int_pre
#> ✔ skipped target lib_sop_hmd_pre
#> ✔ skipped target lib_rt
#> ✔ skipped target input_features
#> ✔ skipped target fea_edg_spe
#> ✔ skipped target lib_spe_exp_int_pre_pos
#> ✔ skipped target lib_spe_exp_int_pre_neg
#> ✔ skipped target lib_spe_exp_int_pre_sop
#> ✔ skipped target lib_rt_sop
#> ✔ skipped target lib_rt_rts
#> ✔ skipped target fea_pre
#> ✔ skipped target edg_spe
#> ✔ skipped target ann_spe_pos
#> ✔ skipped target ann_spe_neg
#> ✔ skipped target lib_sop_mer
#> ✔ skipped target lib_mer_str_met
#> ✔ skipped target lib_mer_str_nam
#> ✔ skipped target lib_mer_str_stereo
#> ✔ skipped target lib_mer_str_tax_cla
#> ✔ skipped target lib_mer_str_tax_npc
#> ✔ skipped target lib_mer_key
#> ✔ skipped target lib_mer_org_tax_ott
#> ✔ skipped target ann_sir_pre
#> ✔ skipped target ann_spe_exp_gnp_pre
#> ✔ skipped target ann_spe_pre
#> ✔ skipped target ann_ms1_pre
#> ✔ skipped target tax_pre
#> ✔ skipped target ann_sir_pre_for
#> ✔ skipped target ann_sir_pre_can
#> ✔ skipped target ann_sir_pre_str
#> ✔ skipped target ann_ms1_pre_edg
#> ✔ skipped target ann_ms1_pre_ann
#> ✔ skipped target fea_edg_pre
#> ✔ skipped target ann_fil
#> ✔ skipped target fea_com
#> ✔ skipped target int_com
#> ✔ skipped target fea_com_pre
#> ▶ dispatched target ann_pre
#> 2024-11-22 10:52:48 Loading files ...
#> 2024-11-22 10:52:48 ... components
#> 2024-11-22 10:52:48 ... edges
#> 2024-11-22 10:52:48 ... structure-organism pairs
#> 2024-11-22 10:52:53 ... canopus
#> 2024-11-22 10:52:53 ... formula
#> 2024-11-22 10:52:53 ... annotations
#> 2024-11-22 10:52:55 Got c("ISDB", "TIMA MS1") initial annotations
#> 2024-11-22 10:52:55 Got c(976, 289567) initial annotations
#> 2024-11-22 10:52:56 Re-arranging annotations
#> 2024-11-22 10:52:58 adding biological organism metadata
#> 2024-11-22 10:52:58 performing taxonomically informed scoring
#> 2024-11-22 10:52:58 filtering top 3 candidates and keeping only MS1 candidates with minimum
#> 0 biological score
#> OR 0 chemical score
#>
#> 2024-11-22 10:52:58 adding "notClassified"
#>
#> 2024-11-22 10:52:59 calculating biological score at all levels ...
#>
#> 2024-11-22 10:52:59 ... domain
#>
#> 2024-11-22 10:52:59 ... kingdom
#>
#> 2024-11-22 10:52:59 ... phylum
#>
#> 2024-11-22 10:52:59 ... class
#>
#> 2024-11-22 10:52:59 ... order
#>
#> 2024-11-22 10:52:59 ... family
#>
#> 2024-11-22 10:52:59 ... tribe
#>
#> 2024-11-22 10:52:59 ... genus
#>
#> 2024-11-22 10:52:59 ... species
#>
#> 2024-11-22 10:53:00 ... varietas
#>
#> 2024-11-22 10:53:00 ... keeping best biological score
#>
#> 2024-11-22 10:53:01 ... calculating weighted biological score
#>
#> 2024-11-22 10:53:01 taxonomically informed scoring led to
#> 47463 annotations reranked at the kingdom level,
#> 47000 annotations reranked at the phylum level,
#> 39309 annotations reranked at the class level,
#> 11409 annotations reranked at the order level,
#> 9386 annotations reranked at the family level,
#> 1538 annotations reranked at the tribe level,
#> 1220 annotations reranked at the genus level,
#> 464 annotations reranked at the species level,
#> and 0 annotations reranked at the variety level.
#> WITHOUT TAKING CONSISTENCY SCORE INTO ACCOUNT! (for later predictions)
#> 2024-11-22 10:53:02 calculating chemical consistency
#> features with at least 2 neighbors ...
#>
#> 2024-11-22 10:53:02 ... among all edges ...
#>
#> 2024-11-22 10:53:02 ... at the (classyfire) kingdom level
#>
#> 2024-11-22 10:53:02 ... at the (NPC) pathway level
#>
#> 2024-11-22 10:53:02 ... at the (classyfire) superclass level
#>
#> 2024-11-22 10:53:02 ... at the (NPC) superclass level
#>
#> 2024-11-22 10:53:03 ... at the (classyfire) class level
#>
#> 2024-11-22 10:53:04 ... at the (NPC) class level
#>
#> 2024-11-22 10:53:05 ... at the (classyfire) parent level
#>
#> 2024-11-22 10:53:06 splitting already computed predictions
#>
#> 2024-11-22 10:53:07 joining all except -1 together
#>
#> 2024-11-22 10:53:09 adding dummy consistency for features
#> with less than 2 neighbors
#>
#> 2024-11-22 10:53:09 adding already computed predictions back
#>
#> 2024-11-22 10:53:11 calculating chemical score at all levels ...
#>
#> 2024-11-22 10:53:11 ... (classyfire) kingdom
#>
#> 2024-11-22 10:53:11 ... (NPC) pathway
#>
#> 2024-11-22 10:53:11 ... (classyfire) superclass
#>
#> 2024-11-22 10:53:11 ... (NPC) superclass
#>
#> 2024-11-22 10:53:11 ... (classyfire) class
#>
#> 2024-11-22 10:53:11 ... (NPC) class
#>
#> 2024-11-22 10:53:11 ... (classyfire) parent
#>
#> 2024-11-22 10:53:11 ... keeping best chemical score
#>
#> 2024-11-22 10:53:11 ... calculating weighted chemical score
#>
#> 2024-11-22 10:53:12 chemically informed scoring led to
#> 36547 annotations reranked at the (classyfire) kingdom level,
#> 24111 annotations reranked at the (NPC) pathway level,
#> 17683 annotations reranked at the (classyfire) superclass level,
#> 10017 annotations reranked at the (NPC) superclass level,
#> 17683 annotations reranked at the (classyfire) class level,
#> 9704 annotations reranked at the (NPC) class level, and
#> 9175 annotations reranked at the (classyfire) parent level.
#> WITHOUT TAKING CONSISTENCY SCORE INTO ACCOUNT!
#> 2024-11-22 10:53:12 Keeping high confidence candidates only...
#> 2024-11-22 10:53:12 Removed 289254 low confidence candidates out of the 291283 total ones.
#> 2024-11-22 10:53:12 2029 high confidence candidates remaining.
#> 2024-11-22 10:53:13 adding initial metadata (RT, etc.) and simplifying columns
#>
#> 2024-11-22 10:53:13 adding references
#>
#> 2024-11-22 10:53:15 selecting columns to export
#>
#> 2024-11-22 10:53:15 adding consensus again to droped candidates
#>
#> 2024-11-22 10:53:17 Exporting ...
#> Directory data/processed/241122_105317_example created.
#> 2024-11-22 10:53:17 ... path to used parameters is data/processed/241122_105317_example
#> 2024-11-22 10:53:17 ... path to used parameters is data/processed/241122_105317_example
#> 2024-11-22 10:53:17 ... path to export is data/processed/241122_105317_example/example_results.tsv
#> ● completed target ann_pre [29.332 seconds, 2.331 megabytes]
#> ▶ ended pipeline [33.206 seconds]
#>
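The log above reports that a biological score is calculated at each taxonomic level, that the best one is kept, and that a weighted score is then derived from it. The following base R sketch illustrates that idea under stated assumptions. The per-level scores, the weight, the example taxonomies and the helper name `best_biological_score()` are all hypothetical, and this is not TIMA's actual implementation.

```r
# Hypothetical per-level scores: deeper taxonomic matches score higher.
# These numbers are illustrative only, not TIMA's defaults.
level_scores <- c(
  domain = 0.1, kingdom = 0.2, phylum = 0.3, class = 0.4, order = 0.5,
  family = 0.6, tribe = 0.7, genus = 0.8, species = 0.9, varietas = 1.0
)

# Taxonomy of the analysed sample (illustrative example).
sample_taxonomy <- c(
  domain = "Eukaryota", kingdom = "Archaeplastida", phylum = "Streptophyta",
  class = "Magnoliopsida", order = "Gentianales", family = "Gentianaceae",
  tribe = NA, genus = "Gentiana", species = "Gentiana lutea", varietas = NA
)

# Taxonomy of an organism in which a candidate structure was reported
# (illustrative example: same family, different genus).
candidate_taxonomy <- c(
  domain = "Eukaryota", kingdom = "Archaeplastida", phylum = "Streptophyta",
  class = "Magnoliopsida", order = "Gentianales", family = "Gentianaceae",
  tribe = NA, genus = "Swertia", species = "Swertia chirayita", varietas = NA
)

# Hypothetical helper: compare both taxonomies level by level and return the
# highest score among the matching levels (0 when nothing matches).
best_biological_score <- function(sample_tax, candidate_tax, scores) {
  levels <- names(scores)
  # A level matches when both names are known and identical.
  matched <- !is.na(sample_tax[levels]) &
    !is.na(candidate_tax[levels]) &
    sample_tax[levels] == candidate_tax[levels]
  if (!any(matched)) {
    return(0)
  }
  max(scores[levels][matched])
}

best <- best_biological_score(sample_taxonomy, candidate_taxonomy, level_scores)

# Hypothetical weight given to the biological component of the final score.
weight_biological <- 0.5
weighted <- best * weight_biological

print(best)
print(weighted)
```

In this toy case the deepest shared level is the family, so the family score is the one kept before weighting.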
The final exported file is formatted so that it can be easily imported into Cytoscape to explore your data further!
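Before moving to Cytoscape, it can help to take a quick look at the exported table in R. The sketch below reads the tab-separated file from the path reported at the end of the log. The helper name `read_tima_results()` is hypothetical, and the timestamped directory name will differ for your own run.

```r
# Hypothetical helper: read a tab-separated results file into a data frame.
read_tima_results <- function(path) {
  if (!file.exists(path)) {
    stop("Results file not found: ", path)
  }
  utils::read.delim(
    file = path,
    sep = "\t",
    quote = "",
    check.names = FALSE,
    stringsAsFactors = FALSE
  )
}

# Path reported at the end of the log; the timestamped directory name changes
# on every run, so adapt it to your own output.
results_path <- "data/processed/241122_105317_example/example_results.tsv"

if (file.exists(results_path)) {
  results <- read_tima_results(results_path)
  # Inspect dimensions and column names before importing into Cytoscape.
  print(dim(results))
  print(colnames(results))
}
```

Setting `check.names = FALSE` keeps the column names exactly as exported, which makes it easier to recognise them later in the Cytoscape import dialog.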
We hope you enjoyed using TIMA and would be pleased to hear from you!
For any remarks or suggestions, please file an issue or feel free to contact us directly.