3 Performing Taxonomically Informed Metabolite Annotation

Adriano Rutz

Published

March 6, 2026

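This vignette describes how Taxonomically Informed Metabolite Annotation (TIMA) is performed. If you followed all previous steps successfully, this should be a piece of cake. You deserve it! The single call below runs the whole workflow and logs each step as it is dispatched and completed.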
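Before launching the run, it can help to confirm that the input files are where the workflow expects them. The following is a minimal sketch, not part of tima: `check_inputs()` is a hypothetical helper written for illustration, and the two paths are the example files that appear in the run log on this page. It assumes those relative paths resolve against your current working directory, so adjust them to your own project.

```r
# Hypothetical helper (NOT a tima function): report which expected
# input files exist on disk.
check_inputs <- function(paths) {
  data.frame(
    path = paths,
    exists = file.exists(paths),
    stringsAsFactors = FALSE
  )
}

# Example paths taken from the run log; assumed to be relative to the
# current working directory. Replace with your own files as needed.
inputs <- c(
  "data/source/example_spectra.mgf",
  "data/source/example_features.csv"
)

status <- check_inputs(inputs)
print(status)

if (all(status$exists)) {
  message("All inputs found.")
} else {
  message(
    "Missing inputs: ",
    paste(status$path[!status$exists], collapse = ", ")
  )
}
```

If a file is reported missing, revisit the previous steps before starting the full run, since the downloads and spectral matching take several minutes.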

tima::run_tima()
#> + par_def_pre_lib_sop_lot dispatched
#> ✔ par_def_pre_lib_sop_lot completed [28ms, 494 B]
#> + par_def_pre_ann_sir dispatched
#> ✔ par_def_pre_ann_sir completed [1ms, 1.93 kB]
#> + par_def_pre_lib_sop_ecm dispatched
#> ✔ par_def_pre_lib_sop_ecm completed [1ms, 492 B]
#> + par_def_pre_ann_gnp dispatched
#> ✔ par_def_pre_ann_gnp completed [1ms, 1.42 kB]
#> + par_def_pre_lib_sop_mer dispatched
#> ✔ par_def_pre_lib_sop_mer completed [1ms, 3.39 kB]
#> + par_def_ann_spe dispatched
#> ✔ par_def_ann_spe completed [0ms, 2.14 kB]
#> + par_def_cre_edg_spe dispatched
#> ✔ par_def_cre_edg_spe completed [0ms, 1.42 kB]
#> + par_def_pre_fea_com dispatched
#> ✔ par_def_pre_fea_com completed [1ms, 358 B]
#> + par_def_pre_fea_tab dispatched
#> ✔ par_def_pre_fea_tab completed [1ms, 860 B]
#> + par_def_pre_lib_rt dispatched
#> ✔ par_def_pre_lib_rt completed [1ms, 2.20 kB]
#> + par_def_pre_tax dispatched
#> ✔ par_def_pre_tax completed [1ms, 1.51 kB]
#> + par_def_pre_lib_sop_hmd dispatched
#> ✔ par_def_pre_lib_sop_hmd completed [1ms, 492 B]
#> + par_def_pre_lib_sop_big dispatched
#> ✔ par_def_pre_lib_sop_big completed [1ms, 314 B]
#> + par_def_cre_com dispatched
#> ✔ par_def_cre_com completed [1ms, 375 B]
#> + par_def_wei_ann dispatched
#> ✔ par_def_wei_ann completed [1ms, 5.34 kB]
#> + par_def_pre_fea_edg dispatched
#> ✔ par_def_pre_fea_edg completed [1ms, 706 B]
#> + yaml_paths dispatched
#> ✔ yaml_paths completed [1ms, 11.70 kB]
#> + par_def_fil_ann dispatched
#> ✔ par_def_fil_ann completed [0ms, 1.34 kB]
#> + par_def_pre_lib_sop_clo dispatched
#> ✔ par_def_pre_lib_sop_clo completed [1ms, 523 B]
#> + par_def_pre_lib_spe dispatched
#> ✔ par_def_pre_lib_spe completed [1ms, 1.57 kB]
#> + par_def_pre_ann_mzm dispatched
#> ✔ par_def_pre_ann_mzm completed [1ms, 1.43 kB]
#> + par_def_ann_mas dispatched
#> ✔ par_def_ann_mas completed [0ms, 6.09 kB]
#> + par_def_pre_ann_spe dispatched
#> ✔ par_def_pre_ann_spe completed [1ms, 1.46 kB]
#> + paths dispatched
#> ✔ paths completed [1ms, 2.55 kB]
#> + lib_sop_ecm dispatched
#> [2026-03-06 06:21:56.583] [INFO ] > Starting: download_file [url=https://ecmdb.ca/download/ecmdb.json.zip, destination=data/source/libraries/sop/ecmdb.json.zip]
#> [2026-03-06 06:21:58.066] [INFO ] [OK] Completed: download_file [size_bytes=1334921] (1.5s)
#> ✔ lib_sop_ecm completed [1.5s, 1.33 MB]
#> + lib_spe_exp_mer_pre_pos dispatched
#> [2026-03-06 06:21:58.228] [INFO ] > Starting: download_file [url=https://github.com/adafede/SpectRalLibRaRies/raw/main/data/interim/libraries/spectra/exp/merlin_13911806_pos.rds, destination=data/interim/libraries/spectra/exp/merlin_13911806_pos.rds]
#> [2026-03-06 06:22:00.751] [INFO ] [OK] Completed: download_file [size_bytes=84936306] (2.5s)
#> ✔ lib_spe_exp_mer_pre_pos completed [2.5s, 84.94 MB]
#> + lib_spe_exp_gnp_pre_pos dispatched
#> [2026-03-06 06:22:00.941] [INFO ] > Starting: download_file [url=https://github.com/adafede/SpectRalLibRaRies/raw/main/data/interim/libraries/spectra/exp/gnps_11566051_pos.rds, destination=data/interim/libraries/spectra/exp/gnps_11566051_pos.rds]
#> [2026-03-06 06:22:10.468] [INFO ] [OK] Completed: download_file [size_bytes=481271483] (9.5s)
#> ✔ lib_spe_exp_gnp_pre_pos completed [9.5s, 481.27 MB]
#> + lib_spe_exp_mb_pre_pos dispatched
#> [2026-03-06 06:22:10.795] [INFO ] > Starting: download_file [url=https://github.com/adafede/SpectRalLibRaRies/raw/main/data/interim/libraries/spectra/exp/massbank_2025051_pos.rds, destination=data/interim/libraries/spectra/exp/massbank_2025051_pos.rds]
#> [2026-03-06 06:22:11.901] [INFO ] [OK] Completed: download_file [size_bytes=19411864] (1.1s)
#> ✔ lib_spe_exp_mb_pre_pos completed [1.1s, 19.41 MB]
#> + lib_spe_is_wik_pre_pos dispatched
#> [2026-03-06 06:22:12.069] [INFO ] > Starting: download_file [url=https://github.com/taxonomicallyinformedannotation/tima-isdb-pos/raw/main/wikidata_5607185_pos.rds, destination=data/interim/libraries/spectra/is/wikidata_5607185_pos.rds]
#> [2026-03-06 06:22:28.863] [INFO ] [OK] Completed: download_file [size_bytes=863950396] (16.8s)
#> ✔ lib_spe_is_wik_pre_pos completed [16.8s, 863.95 MB]
#> + lib_spe_exp_mer_pre_sop dispatched
#> [2026-03-06 06:22:29.328] [INFO ] > Starting: download_file [url=https://github.com/Adafede/SpectRalLibRaRies/raw/main/data/interim/libraries/sop/merlin_13911806_prepared.tsv.gz, destination=data/interim/libraries/sop/merlin_13911806_prepared.tsv.gz]
#> [2026-03-06 06:22:29.747] [INFO ] [OK] Completed: download_file [size_bytes=1190284] (419ms)
#> ✔ lib_spe_exp_mer_pre_sop completed [422ms, 1.19 MB]
#> + lib_sop_lot dispatched
#> [2026-03-06 06:22:29.900] [INFO ] Retrieving latest version from Zenodo: 10.5281/zenodo.5794106
#> [2026-03-06 06:22:33.562] [INFO ] Downloading 230106_frozen_metadata.csv.gz from https://doi.org/10.5281/zenodo.5794106
#> [2026-03-06 06:22:33.564] [INFO ] > Starting: download_file [url=https://zenodo.org/records/7534071/files/230106_frozen_metadata.csv.gz, destination=data/source/libraries/sop/lotus.csv.gz]
#> [2026-03-06 06:23:20.337] [INFO ] [OK] Completed: download_file [size_bytes=92979778] (46.8s)
#> [2026-03-06 06:23:20.339] [INFO ] Download completed: data/source/libraries/sop/lotus.csv.gz
#> ✔ lib_sop_lot completed [50.4s, 92.98 MB]
#> + lib_spe_exp_gnp_pre_neg dispatched
#> [2026-03-06 06:23:20.535] [INFO ] > Starting: download_file [url=https://github.com/adafede/SpectRalLibRaRies/raw/main/data/interim/libraries/spectra/exp/gnps_11566051_neg.rds, destination=data/interim/libraries/spectra/exp/gnps_11566051_neg.rds]
#> [2026-03-06 06:23:23.417] [INFO ] [OK] Completed: download_file [size_bytes=154124724] (2.9s)
#> ✔ lib_spe_exp_gnp_pre_neg completed [2.9s, 154.12 MB]
#> + par_pre_par dispatched
#> ✔ par_pre_par completed [1ms, 1.54 kB]
#> + lib_spe_exp_mb_pre_neg dispatched
#> [2026-03-06 06:23:23.795] [INFO ] > Starting: download_file [url=https://github.com/adafede/SpectRalLibRaRies/raw/main/data/interim/libraries/spectra/exp/massbank_2025051_neg.rds, destination=data/interim/libraries/spectra/exp/massbank_2025051_neg.rds]
#> [2026-03-06 06:23:24.458] [INFO ] [OK] Completed: download_file [size_bytes=7057574] (663ms)
#> ✔ lib_spe_exp_mb_pre_neg completed [664ms, 7.06 MB]
#> + lib_spe_exp_mer_pre_neg dispatched
#> [2026-03-06 06:23:24.622] [INFO ] > Starting: download_file [url=https://github.com/adafede/SpectRalLibRaRies/raw/main/data/interim/libraries/spectra/exp/merlin_13911806_neg.rds, destination=data/interim/libraries/spectra/exp/merlin_13911806_neg.rds]
#> [2026-03-06 06:23:25.942] [INFO ] [OK] Completed: download_file [size_bytes=31540426] (1.3s)
#> ✔ lib_spe_exp_mer_pre_neg completed [1.3s, 31.54 MB]
#> + lib_spe_is_wik_pre_neg dispatched
#> [2026-03-06 06:23:26.115] [INFO ] > Starting: download_file [url=https://github.com/taxonomicallyinformedannotation/tima-isdb-neg/raw/main/wikidata_5607185_neg.rds, destination=data/interim/libraries/spectra/is/wikidata_5607185_neg.rds]
#> [2026-03-06 06:23:39.955] [INFO ] [OK] Completed: download_file [size_bytes=687328159] (13.8s)
#> ✔ lib_spe_is_wik_pre_neg completed [13.8s, 687.33 MB]
#> + lib_sop_hmd dispatched
#> [2026-03-06 06:23:40.363] [INFO ] > Starting: download_file [url=https://hmdb.ca/system/downloads/current/structures.zip, destination=data/source/libraries/sop/hmdb/structures.zip]
#> [2026-03-06 06:23:40.508] [WARN ] file download failed (attempt 1/3), retrying in 1s: HTTP 403 Forbidden.
#> [2026-03-06 06:23:41.556] [WARN ] file download failed (attempt 2/3), retrying in 2s: HTTP 403 Forbidden.
#> [2026-03-06 06:23:43.605] [WARN ] HMDB download failed. Creating minimal placeholder SDF file.
#> ✔ lib_sop_hmd completed [3.3s, 337 B]
#> + lib_spe_is_wik_pre_sop dispatched
#> [2026-03-06 06:23:43.864] [INFO ] > Starting: download_file [url=https://github.com/taxonomicallyinformedannotation/tima-example-files/raw/main/wikidata_spectral_5607185_prepared.tsv.gz, destination=data/interim/libraries/sop/wikidata_5607185_prepared.tsv.gz]
#> [2026-03-06 06:23:45.022] [INFO ] [OK] Completed: download_file [size_bytes=37904410] (1.2s)
#> ✔ lib_spe_is_wik_pre_sop completed [1.2s, 37.90 MB]
#> + par_pre_par2 dispatched
#> ✔ par_pre_par2 completed [0ms, 21.97 kB]
#> + lib_spe_exp_mb_pre_sop dispatched
#> [2026-03-06 06:23:45.359] [INFO ] > Starting: download_file [url=https://github.com/Adafede/SpectRalLibRaRies/raw/main/data/interim/libraries/sop/massbank_2025051_prepared.tsv.gz, destination=data/interim/libraries/sop/massbank_2025051_prepared.tsv.gz]
#> [2026-03-06 06:23:45.618] [INFO ] [OK] Completed: download_file [size_bytes=480970] (258ms)
#> ✔ lib_spe_exp_mb_pre_sop completed [260ms, 480.97 kB]
#> + lib_spe_exp_gnp_pre_sop dispatched
#> [2026-03-06 06:23:45.779] [INFO ] > Starting: download_file [url=https://github.com/Adafede/SpectRalLibRaRies/raw/main/data/interim/libraries/sop/gnps_11566051_prepared.tsv.gz, destination=data/interim/libraries/sop/gnps_11566051_prepared.tsv.gz]
#> [2026-03-06 06:23:46.207] [INFO ] [OK] Completed: download_file [size_bytes=1416699] (427ms)
#> ✔ lib_spe_exp_gnp_pre_sop completed [430ms, 1.42 MB]
#> + par_fin_par dispatched
#> ✔ par_fin_par completed [1ms, 332 B]
#> + par_fin_par2 dispatched
#> ✔ par_fin_par2 completed [2ms, 3.04 kB]
#> + par_usr_pre_lib_sop_lot dispatched
#> ✔ par_usr_pre_lib_sop_lot completed [1.6s, 174 B]
#> + par_usr_pre_ann_sir dispatched
#> ✔ par_usr_pre_ann_sir completed [1.5s, 900 B]
#> + par_usr_pre_lib_sop_ecm dispatched
#> ✔ par_usr_pre_lib_sop_ecm completed [1.5s, 176 B]
#> + par_usr_ann_spe dispatched
#> ✔ par_usr_ann_spe completed [1.6s, 1.01 kB]
#> + par_usr_ann_mas dispatched
#> ✔ par_usr_ann_mas completed [1.5s, 2.75 kB]
#> + par_usr_pre_lib_sop_mer dispatched
#> ✔ par_usr_pre_lib_sop_mer completed [1.5s, 1.60 kB]
#> + par_usr_wei_ann dispatched
#> ✔ par_usr_wei_ann completed [1.5s, 1.80 kB]
#> + par_usr_cre_edg_spe dispatched
#> ✔ par_usr_cre_edg_spe completed [1.5s, 434 B]
#> + par_usr_pre_fea_edg dispatched
#> ✔ par_usr_pre_fea_edg completed [1.5s, 328 B]
#> + par_usr_pre_fea_com dispatched
#> ✔ par_usr_pre_fea_com completed [1.5s, 200 B]
#> + par_usr_pre_lib_rt dispatched
#> ✔ par_usr_pre_lib_rt completed [1.5s, 487 B]
#> + par_usr_pre_tax dispatched
#> ✔ par_usr_pre_tax completed [1.5s, 438 B]
#> + par_usr_pre_fea_tab dispatched
#> ✔ par_usr_pre_fea_tab completed [1.5s, 274 B]
#> + par_usr_pre_lib_sop_hmd dispatched
#> ✔ par_usr_pre_lib_sop_hmd completed [1.5s, 178 B]
#> + par_usr_pre_lib_sop_big dispatched
#> ✔ par_usr_pre_lib_sop_big completed [1.5s, 107 B]
#> + par_usr_pre_ann_spe dispatched
#> ✔ par_usr_pre_ann_spe completed [1.5s, 731 B]
#> + par_usr_cre_com dispatched
#> ✔ par_usr_cre_com completed [1.5s, 200 B]
#> + par_usr_pre_ann_mzm dispatched
#> ✔ par_usr_pre_ann_mzm completed [1.5s, 710 B]
#> + par_usr_pre_lib_sop_clo dispatched
#> ✔ par_usr_pre_lib_sop_clo completed [1.5s, 267 B]
#> + par_usr_pre_lib_spe dispatched
#> ✔ par_usr_pre_lib_spe completed [1.5s, 298 B]
#> + par_usr_pre_ann_gnp dispatched
#> ✔ par_usr_pre_ann_gnp completed [1.5s, 708 B]
#> + par_usr_fil_ann dispatched
#> ✔ par_usr_fil_ann completed [1.7s, 739 B]
#> + par_pre_lib_sop_lot dispatched
#> ✔ par_pre_lib_sop_lot completed [1ms, 186 B]
#> + par_pre_ann_sir dispatched
#> ✔ par_pre_ann_sir completed [2ms, 405 B]
#> + par_pre_lib_sop_ecm dispatched
#> ✔ par_pre_lib_sop_ecm completed [1ms, 191 B]
#> + par_ann_spe dispatched
#> ✔ par_ann_spe completed [1ms, 495 B]
#> + par_ann_mas dispatched
#> ✔ par_ann_mas completed [2ms, 1.14 kB]
#> + par_pre_lib_sop_mer dispatched
#> ✔ par_pre_lib_sop_mer completed [1ms, 559 B]
#> + par_wei_ann dispatched
#> ✔ par_wei_ann completed [3ms, 967 B]
#> + par_cre_edg_spe dispatched
#> ✔ par_cre_edg_spe completed [2ms, 387 B]
#> + par_pre_fea_edg dispatched
#> ✔ par_pre_fea_edg completed [1ms, 244 B]
#> + par_pre_fea_com dispatched
#> ✔ par_pre_fea_com completed [1ms, 184 B]
#> + par_pre_lib_rt dispatched
#> ✔ par_pre_lib_rt completed [2ms, 375 B]
#> + par_pre_tax dispatched
#> ✔ par_pre_tax completed [2ms, 330 B]
#> + par_pre_fea_tab dispatched
#> ✔ par_pre_fea_tab completed [1ms, 278 B]
#> + par_pre_lib_sop_hmd dispatched
#> ✔ par_pre_lib_sop_hmd completed [1ms, 191 B]
#> + par_pre_lib_sop_big dispatched
#> ✔ par_pre_lib_sop_big completed [1ms, 155 B]
#> + par_pre_ann_spe dispatched
#> ✔ par_pre_ann_spe completed [2ms, 334 B]
#> + par_cre_com dispatched
#> ✔ par_cre_com completed [1ms, 191 B]
#> + par_pre_ann_mzm dispatched
#> ✔ par_pre_ann_mzm completed [1ms, 341 B]
#> + par_pre_lib_sop_clo dispatched
#> ✔ par_pre_lib_sop_clo completed [1ms, 232 B]
#> + par_pre_lib_spe dispatched
#> ✔ par_pre_lib_spe completed [1ms, 404 B]
#> + par_pre_ann_gnp dispatched
#> ✔ par_pre_ann_gnp completed [1ms, 336 B]
#> + par_fil_ann dispatched
#> ✔ par_fil_ann completed [2ms, 360 B]
#> + lib_sop_lot_pre dispatched
#> [2026-03-06 06:24:27.191] [INFO ] > Starting: prepare_libraries_sop_lotus [input=data/source/libraries/sop/lotus.csv.gz]
#> [2026-03-06 06:24:37.948] [INFO ] [OK] Completed: prepare_libraries_sop_lotus [n_pairs=791809] (10.8s)
#> [2026-03-06 06:24:37.951] [INFO ] > Starting: export_output [file=data/interim/libraries/sop/lotus_prepared.tsv.gz, n_rows=791809]
#> [2026-03-06 06:24:41.407] [INFO ] [OK] Completed: export_output [size_bytes=46518841] (3.5s)
#> ✔ lib_sop_lot_pre completed [14.2s, 46.52 MB]
#> + lib_sop_ecm_pre dispatched
#> [2026-03-06 06:24:41.772] [INFO ] Preparing ECMDB structure-organism pairs
#> [2026-03-06 06:24:42.423] [INFO ] Exporting parameters to: data/interim/params/260306_062442_prepare_libraries_sop_ecmdb.yaml
#> [2026-03-06 06:24:42.425] [INFO ] > Starting: export_output [file=data/interim/libraries/sop/ecmdb_prepared.tsv.gz, n_rows=3760]
#> [2026-03-06 06:24:42.441] [INFO ] [OK] Completed: export_output [size_bytes=177466] (16ms)
#> ✔ lib_sop_ecm_pre completed [671ms, 177.47 kB]
#> + par_ann_spe_fil_spe_raw dispatched
#> ✔ par_ann_spe_fil_spe_raw completed [0ms, 7.77 MB]
#> + lib_sop_mer_str_pro dispatched
#> [2026-03-06 06:24:42.830] [INFO ] > Starting: download_file [url=https://github.com/taxonomicallyinformedannotation/tima-example-files/raw/main/processed.csv.gz, destination=data/interim/libraries/sop/merged/structures/processed.csv.gz]
#> [2026-03-06 06:24:45.131] [INFO ] [OK] Completed: download_file [size_bytes=80668186] (2.3s)
#> ✔ lib_sop_mer_str_pro completed [2.3s, 80.67 MB]
#> + lib_rt dispatched
#> [2026-03-06 06:24:45.342] [INFO ] Preparing retention time libraries
#> [2026-03-06 06:24:45.356] [WARN ] No retention time library found, returning empty retention time and sop tables.
#> [2026-03-06 06:24:45.398] [INFO ] Exporting parameters to: data/interim/params/260306_062445_prepare_libraries_rt.yaml
#> [2026-03-06 06:24:45.400] [INFO ] > Starting: export_output [file=data/interim/libraries/rt/prepared.tsv.gz, n_rows=1]
#> [2026-03-06 06:24:45.402] [INFO ] [OK] Completed: export_output [size_bytes=86] (2ms)
#> [2026-03-06 06:24:45.405] [INFO ] > Starting: export_output [file=data/interim/libraries/sop/rt_prepared.tsv.gz, n_rows=1]
#> [2026-03-06 06:24:45.407] [INFO ] [OK] Completed: export_output [size_bytes=105] (1ms)
#> ✔ lib_rt completed [67ms, 191 B]
#> + par_pre_fea_tab_fil_fea_raw dispatched
#> ✔ par_pre_fea_tab_fil_fea_raw completed [1ms, 451.55 kB]
#> + lib_sop_hmd_pre dispatched
#> [2026-03-06 06:24:45.768] [INFO ] > Starting: prepare_libraries_sop_hmdb [input=data/source/libraries/sop/hmdb/structures.zip]
#> [2026-03-06 06:24:45.794] [WARN ] Empty dataframe in select_sop_columns
#> [2026-03-06 06:24:45.799] [INFO ] > Starting: export_output [file=data/interim/libraries/sop/hmdb_prepared.tsv.gz, n_rows=0]
#> [2026-03-06 06:24:45.800] [INFO ] [OK] Completed: export_output [size_bytes=257] (1ms)
#> [2026-03-06 06:24:45.801] [INFO ] [OK] Completed: prepare_libraries_sop_hmdb [n_pairs=0] (33ms)
#> ✔ lib_sop_hmd_pre completed [35ms, 257 B]
#> + lib_sop_big_pre dispatched
#> [2026-03-06 06:24:45.982] [INFO ] Preparing BiGG structure-organism pairs
#> [2026-03-06 06:25:07.252] [INFO ] > Starting: process_smiles [n_structures=946]
#> [2026-03-06 06:25:07.254] [INFO ] Processing SMILES with RDKit
#> [2026-03-06 06:25:08.158] [INFO ] Processing 945 new SMILES with RDKit
#> [2026-03-06 06:25:08.159] [INFO ] Starting SMILES processing pipeline
#> [2026-03-06 06:25:08.160] [INFO ] Input: /tmp/Rtmp75FvI8/file2710d6d2173.smi
#> [2026-03-06 06:25:08.160] [INFO ] Output: /tmp/Rtmp75FvI8/file2710153e4829.csv.gz
#> [2026-03-06 06:25:08.160] [INFO ] Input file validated: /tmp/Rtmp75FvI8/file2710d6d2173.smi
#> [2026-03-06 06:25:08.160] [INFO ] Output file validated: /tmp/Rtmp75FvI8/file2710153e4829.csv.gz
#> [2026-03-06 06:25:08.160] [INFO ] Processing parameters: workers=8, batch_size=1000, progress_interval=10000
#> [2026-03-06 06:25:08.160] [INFO ] SMILES supplier initialized
#> [2026-03-06 06:25:09.523] [INFO ] Processing complete. Total molecules processed: 945
#> [2026-03-06 06:25:09.564] [INFO ] Successfully processed 945 SMILES
#> [2026-03-06 06:25:09.571] [INFO ] [OK] Completed: process_smiles [n_processed=945] (2.3s)
#> [2026-03-06 06:25:13.066] [INFO ] > Starting: process_smiles [n_structures=755]
#> [2026-03-06 06:25:13.068] [INFO ] Processing SMILES with RDKit
#> [2026-03-06 06:25:13.078] [INFO ] Processing 450 new SMILES with RDKit
#> [2026-03-06 06:25:13.079] [INFO ] Starting SMILES processing pipeline
#> [2026-03-06 06:25:13.080] [INFO ] Input: /tmp/Rtmp75FvI8/file27106bf61a29.smi
#> [2026-03-06 06:25:13.080] [INFO ] Output: /tmp/Rtmp75FvI8/file27106b72ece9.csv.gz
#> [2026-03-06 06:25:13.080] [INFO ] Input file validated: /tmp/Rtmp75FvI8/file27106bf61a29.smi
#> [2026-03-06 06:25:13.080] [INFO ] Output file validated: /tmp/Rtmp75FvI8/file27106b72ece9.csv.gz
#> [2026-03-06 06:25:13.080] [INFO ] Processing parameters: workers=8, batch_size=1000, progress_interval=10000
#> [2026-03-06 06:25:13.080] [INFO ] SMILES supplier initialized
#> [2026-03-06 06:25:13.707] [INFO ] Processing complete. Total molecules processed: 450
#> [2026-03-06 06:25:13.741] [INFO ] Successfully processed 450 SMILES
#> [2026-03-06 06:25:13.747] [INFO ] [OK] Completed: process_smiles [n_processed=450] (681ms)
#> [2026-03-06 06:25:13.820] [INFO ] > Starting: export_output [file=data/interim/libraries/sop/bigg_prepared.tsv.gz, n_rows=858]
#> [2026-03-06 06:25:13.826] [INFO ] [OK] Completed: export_output [size_bytes=31573] (6ms)
#> ✔ lib_sop_big_pre completed [27.8s, 31.57 kB]
#> + lib_sop_clo_pre dispatched
#> [2026-03-06 06:25:14.100] [INFO ] Preparing closed structure-organism pairs library
#> [2026-03-06 06:25:14.101] [WARN ] Closed resource not accessible at: ~/Git/lotus-processor/data/processed/240412_closed_metadata.csv.gz. Returning empty template instead.
#> [2026-03-06 06:25:14.117] [INFO ] Exporting parameters to: data/interim/params/260306_062514_prepare_libraries_sop_closed.yaml
#> [2026-03-06 06:25:14.119] [INFO ] > Starting: export_output [file=data/interim/libraries/sop/closed_prepared.tsv.gz, n_rows=1]
#> [2026-03-06 06:25:14.120] [INFO ] [OK] Completed: export_output [size_bytes=273] (1ms)
#> ✔ lib_sop_clo_pre completed [22ms, 273 B]
#> + lib_spe_exp_int_pre dispatched
#> [2026-03-06 06:25:14.365] [INFO ] > Starting: prepare_libraries_spectra [library_name=internal, n_input_files=1]
#> [2026-03-06 06:25:14.370] [WARN ] Input file(s) not found; creating empty library template
#> [2026-03-06 06:25:15.736] [INFO ] > Starting: export_output [file=data/interim/libraries/sop/internal_prepared.tsv.gz, n_rows=1]
#> [2026-03-06 06:25:15.737] [INFO ] [OK] Completed: export_output [size_bytes=106] (2ms)
#> [2026-03-06 06:25:15.804] [INFO ] Exporting parameters to: data/interim/params/260306_062515_prepare_libraries_spectra.yaml
#> [2026-03-06 06:25:15.805] [INFO ] [OK] Completed: prepare_libraries_spectra [n_structures=1, n_spectra_total=2, files_exported=3] (1.4s)
#> ✔ lib_spe_exp_int_pre completed [1.4s, 1.30 kB]
#> + input_spectra dispatched
#> ✔ input_spectra completed [1ms, 7.77 MB]
#> + lib_rt_rts dispatched
#> ✔ lib_rt_rts completed [0ms, 86 B]
#> + lib_rt_sop dispatched
#> ✔ lib_rt_sop completed [1ms, 105 B]
#> + input_features dispatched
#> ✔ input_features completed [0ms, 451.55 kB]
#> + lib_spe_exp_int_pre_pos dispatched
#> ✔ lib_spe_exp_int_pre_pos completed [0ms, 599 B]
#> + lib_spe_exp_int_pre_neg dispatched
#> ✔ lib_spe_exp_int_pre_neg completed [0ms, 599 B]
#> + lib_spe_exp_int_pre_sop dispatched
#> ✔ lib_spe_exp_int_pre_sop completed [0ms, 106 B]
#> + fea_edg_spe dispatched
#> [2026-03-06 06:25:18.667] [INFO ] > Starting: create_edges_spectra [method=gnps, threshold=0.7, n_input_files=1]
#> [2026-03-06 06:25:18.668] [INFO ] Creating spectral similarity network edges
#> [2026-03-06 06:25:18.670] [INFO ] Importing spectra from: data/source/example_spectra.mgf
#> [2026-03-06 06:25:18.696] [INFO ] Reading MGF file (7.41 MB) with optimized parser: data/source/example_spectra.mgf
#> [2026-03-06 06:25:20.637] [INFO ] Processed 10000 spectra...
#> [2026-03-06 06:25:22.173] [INFO ] Total spectra read: 16282
#> [2026-03-06 06:25:29.338] [INFO ] Loaded 16282 spectra from file
#> [2026-03-06 06:25:29.352] [INFO ] Combining replicate spectra by FEATURE_ID
#> [2026-03-06 06:25:32.104] [INFO ] Combined replicates: 12195 -> 4087 spectra
#> [2026-03-06 06:25:32.107] [INFO ] Sanitizing 4087 spectra (cutoff: dynamic)
#> [2026-03-06 06:25:33.444] [INFO ] Sanitization complete: 3281/4087 spectra retained (80.3%, 806 removed)
#> [2026-03-06 06:25:33.445] [INFO ] Import complete: 3281 spectra ready for analysis
#> [2026-03-06 06:25:33.446] [INFO ] ======================================
#> [2026-03-06 06:25:33.447] [INFO ] Take yourself a break, you deserve it.
#> [2026-03-06 06:25:33.448] [INFO ] ======================================
#> [2026-03-06 06:25:33.449] [INFO ] > Starting: create_edges [n_spectra=3281, method=gnps, threshold=0.7, min_peaks=6]
#> [2026-03-06 06:25:51.328] [INFO ] Processed 500 / 3280 queries
#> [2026-03-06 06:26:06.036] [INFO ] Processed 1000 / 3280 queries
#> [2026-03-06 06:26:17.817] [INFO ] Processed 1500 / 3280 queries
#> [2026-03-06 06:26:26.925] [INFO ] Processed 2000 / 3280 queries
#> [2026-03-06 06:26:32.879] [INFO ] Processed 2500 / 3280 queries
#> [2026-03-06 06:26:35.978] [INFO ] Processed 3000 / 3280 queries
#> [2026-03-06 06:26:36.479] [INFO ] Here is the distribution of edge similarity scores (0.1 bins) BEFORE filtering:
#> [2026-03-06 06:26:36.481] [INFO ] 
#>        bin       N
#>    [0,0.1] 3973955
#>  (0.1,0.2]  855228
#>  (0.2,0.3]  306308
#>  (0.3,0.4]  127457
#>  (0.4,0.5]   58044
#>  (0.5,0.6]   29495
#>  (0.6,0.7]   15740
#>  (0.7,0.8]    8197
#>  (0.8,0.9]    4629
#>    (0.9,1]    1787
#> [2026-03-06 06:26:36.484] [INFO ] [OK] Completed: create_edges [n_edges=7199, n_comparisons=5380840, pass_rate=0.1%] (1m 3s)
#> [2026-03-06 06:26:36.555] [INFO ] Exporting parameters to: data/interim/params/260306_062636_create_edges_spectra.yaml
#> [2026-03-06 06:26:36.557] [INFO ] > Starting: export_output [file=data/interim/features/example_edgesSpectra.tsv, n_rows=9138]
#> [2026-03-06 06:26:36.560] [INFO ] [OK] Completed: export_output [size_bytes=433440] (3ms)
#> [2026-03-06 06:26:36.561] [INFO ] [OK] Completed: create_edges_spectra [n_edges=9138] (1m 18s)
#> ✔ fea_edg_spe completed [1m 17.9s, 433.44 kB]
#> + fea_pre dispatched
#> [2026-03-06 06:26:36.928] [INFO ] > Starting: prepare_features_tables [input=data/source/example_features.csv, candidates=1]
#> [2026-03-06 06:26:37.043] [INFO ] Prepared 5328 feature-sample pairs
#> [2026-03-06 06:26:37.045] [INFO ] [OK] Completed: prepare_features_tables [n_features=5328] (117ms)
#> [2026-03-06 06:26:37.069] [INFO ] Exporting parameters to: data/interim/params/260306_062637_prepare_features_tables.yaml
#> [2026-03-06 06:26:37.071] [INFO ] > Starting: export_output [file=data/interim/features/example_features.tsv.gz, n_rows=5328]
#> [2026-03-06 06:26:37.085] [INFO ] [OK] Completed: export_output [size_bytes=95629] (14ms)
#> ✔ fea_pre completed [160ms, 95.63 kB]
#> + ann_spe_pos dispatched
#> [2026-03-06 06:26:37.483] [INFO ] ============================================================
#> [2026-03-06 06:26:37.485] [INFO ] Data Sanitizing: Pre-flight Checks
#> [2026-03-06 06:26:37.486] [INFO ] ============================================================
#> [2026-03-06 06:26:37.487] [INFO ] Checking MGF file...
#> [2026-03-06 06:26:38.620] [INFO ] [OK] MGF file: 12195 MS2 spectra found
#> [2026-03-06 06:26:38.622] [INFO ] ============================================================
#> [2026-03-06 06:26:38.623] [INFO ] [OK] All pre-flight checks passed!
#> [2026-03-06 06:26:38.624] [INFO ] Data validation complete. Ready to proceed.
#> [2026-03-06 06:26:38.625] [INFO ] ============================================================
#> [2026-03-06 06:26:38.626] [INFO ] Starting spectral annotation in pos mode
#> [2026-03-06 06:26:38.628] [INFO ] Importing spectra from: data/source/example_spectra.mgf
#> [2026-03-06 06:26:38.630] [INFO ] Reading MGF file (7.41 MB) with optimized parser: data/source/example_spectra.mgf
#> [2026-03-06 06:26:40.631] [INFO ] Processed 10000 spectra...
#> [2026-03-06 06:26:42.212] [INFO ] Total spectra read: 16282
#> [2026-03-06 06:26:49.366] [INFO ] Loaded 16282 spectra from file
#> [2026-03-06 06:26:49.385] [INFO ] Combining replicate spectra by FEATURE_ID
#> [2026-03-06 06:26:50.475] [INFO ] Combined replicates: 12195 -> 4087 spectra
#> [2026-03-06 06:26:50.476] [INFO ] Sanitizing 4087 spectra (cutoff: dynamic)
#> [2026-03-06 06:26:52.209] [INFO ] Sanitization complete: 3281/4087 spectra retained (80.3%, 806 removed)
#> [2026-03-06 06:26:52.211] [INFO ] Import complete: 3281 spectra ready for analysis
#> [2026-03-06 06:26:52.212] [INFO ] Importing spectra from: data/interim/libraries/spectra/is/wikidata_5607185_pos.rds
#> [2026-03-06 06:27:16.209] [INFO ] Loaded 998198 spectra from file
#> [2026-03-06 06:27:18.356] [INFO ] Import complete: 998198 spectra ready for analysis
#> [2026-03-06 06:27:18.358] [INFO ] Importing spectra from: data/interim/libraries/spectra/exp/internal_pos.rds
#> [2026-03-06 06:27:18.359] [INFO ] Loaded 1 spectra from file
#> [2026-03-06 06:27:18.362] [INFO ] Import complete: 0 spectra ready for analysis
#> [2026-03-06 06:27:18.363] [INFO ] Importing spectra from: data/interim/libraries/spectra/exp/gnps_11566051_pos.rds
#> [2026-03-06 06:27:26.089] [INFO ] Loaded 354789 spectra from file
#> [2026-03-06 06:27:26.290] [INFO ] Import complete: 354788 spectra ready for analysis
#> [2026-03-06 06:27:26.292] [INFO ] Importing spectra from: data/interim/libraries/spectra/exp/massbank_2025051_pos.rds
#> [2026-03-06 06:27:27.032] [INFO ] Loaded 66388 spectra from file
#> [2026-03-06 06:27:27.080] [INFO ] Import complete: 66388 spectra ready for analysis
#> [2026-03-06 06:27:27.082] [INFO ] Importing spectra from: data/interim/libraries/spectra/exp/merlin_13911806_pos.rds
#> [2026-03-06 06:27:29.663] [INFO ] Loaded 208280 spectra from file
#> [2026-03-06 06:27:29.817] [INFO ] Import complete: 208273 spectra ready for analysis
#> [2026-03-06 06:27:43.336] [INFO ] 
#>          library spectra unique_connectivities
#>  ISDB - Wikidata  998198                998198
#>             gnps  354788                 22675
#>           merlin  208273                 26197
#>         massbank   66388                  5901
#> [2026-03-06 06:27:44.130] [INFO ] > Starting: calculate_entropy_similarity [n_library=478616, n_query=3281, method=gnps]
#> [2026-03-06 06:27:44.132] [INFO ] Calculating entropy and similarity for 3281 spectra
#> [2026-03-06 06:27:44.134] [WARN ] Unsanitized library spectra detected. Sanitizing 478616 spectra in-place before scoring. Consider using import_spectra(sanitize = TRUE) upstream.
#> [2026-03-06 06:28:27.878] [INFO ] Processed 500 / 3281 queries
#> [2026-03-06 06:28:42.105] [INFO ] Processed 1000 / 3281 queries
#> [2026-03-06 06:28:55.462] [INFO ] Processed 1500 / 3281 queries
#> [2026-03-06 06:29:09.146] [INFO ] Processed 2000 / 3281 queries
#> [2026-03-06 06:29:22.246] [INFO ] Processed 2500 / 3281 queries
#> [2026-03-06 06:29:34.189] [INFO ] Processed 3000 / 3281 queries
#> [2026-03-06 06:29:40.356] [INFO ] Processed 3281 / 3281 queries
#> [2026-03-06 06:29:40.429] [INFO ] [OK] Completed: calculate_entropy_similarity [n_comparisons=910214] (1m 56s)
#> [2026-03-06 06:29:40.442] [INFO ] > Starting: harmonize_adducts [n_rows=478616]
#> [2026-03-06 06:29:42.134] [INFO ] [OK] Completed: harmonize_adducts [n_unique_before=48, n_unique_after=48] (1.7s)
#> [2026-03-06 06:29:43.311] [INFO ] Here is the distribution of annotation similarity scores (0.1 bins):
#> [2026-03-06 06:29:43.313] [INFO ] 
#>        bin      N
#>    [0,0.1] 470278
#>  (0.1,0.2]  54772
#>  (0.2,0.3]  21641
#>  (0.3,0.4]   9789
#>  (0.4,0.5]   4501
#>  (0.5,0.6]   1948
#>  (0.6,0.7]    683
#>  (0.7,0.8]    464
#>  (0.8,0.9]    362
#>    (0.9,1]    171
#> [2026-03-06 06:29:43.369] [INFO ] 296731 Candidates annotated on 3126 features (threshold >= 0).
#> [2026-03-06 06:29:43.372] [INFO ] Exporting parameters to: data/interim/params/260306_062943_annotate_spectra.yaml
#> [2026-03-06 06:29:43.374] [INFO ] > Starting: export_output [file=data/interim/annotations/example_spectralMatches_pos.tsv.gz, n_rows=564609]
#> [2026-03-06 06:29:44.931] [INFO ] [OK] Completed: export_output [size_bytes=32602502] (1.6s)
#> ✔ ann_spe_pos completed [3m 7.5s, 32.60 MB]
#> + ann_spe_neg dispatched
#> [2026-03-06 06:29:46.731] [INFO ] ============================================================
#> [2026-03-06 06:29:46.732] [INFO ] Data Sanitizing: Pre-flight Checks
#> [2026-03-06 06:29:46.733] [INFO ] ============================================================
#> [2026-03-06 06:29:46.734] [INFO ] Checking MGF file...
#> [2026-03-06 06:29:47.766] [INFO ] [OK] MGF file: 12195 MS2 spectra found
#> [2026-03-06 06:29:47.767] [INFO ] ============================================================
#> [2026-03-06 06:29:47.768] [INFO ] [OK] All pre-flight checks passed!
#> [2026-03-06 06:29:47.769] [INFO ] Data validation complete. Ready to proceed.
#> [2026-03-06 06:29:47.769] [INFO ] ============================================================
#> [2026-03-06 06:29:47.770] [INFO ] Starting spectral annotation in neg mode
#> [2026-03-06 06:29:47.771] [INFO ] Importing spectra from: data/source/example_spectra.mgf
#> [2026-03-06 06:29:47.772] [INFO ] Reading MGF file (7.41 MB) with optimized parser: data/source/example_spectra.mgf
#> [2026-03-06 06:29:49.501] [INFO ] Processed 10000 spectra...
#> [2026-03-06 06:29:50.569] [INFO ] Total spectra read: 16282
#> [2026-03-06 06:29:56.078] [INFO ] Loaded 16282 spectra from file
#> [2026-03-06 06:29:56.087] [INFO ] Combining replicate spectra by FEATURE_ID
#> [2026-03-06 06:29:56.091] [INFO ] Combined replicates: 0 -> 0 spectra
#> [2026-03-06 06:29:56.092] [WARN ] No spectra to sanitize
#> [2026-03-06 06:29:56.093] [INFO ] Import complete: 0 spectra ready for analysis
#> [2026-03-06 06:29:56.094] [WARN ] No query spectra loaded
#> [2026-03-06 06:29:56.097] [INFO ] Exporting parameters to: data/interim/params/260306_062956_annotate_spectra.yaml
#> [2026-03-06 06:29:56.098] [WARN ] Returning empty annotation template
#> [2026-03-06 06:29:56.100] [INFO ] > Starting: export_output [file=data/interim/annotations/example_spectralMatches_neg.tsv.gz, n_rows=1]
#> [2026-03-06 06:29:56.102] [INFO ] [OK] Completed: export_output [size_bytes=237] (1ms)
#> ✔ ann_spe_neg completed [9.4s, 237 B]
#> + lib_sop_mer dispatched
#> [2026-03-06 06:29:56.733] [INFO ] > Starting: prepare_libraries_sop_merged [n_libraries=11, filter_enabled=FALSE, filter_level=none]
#> [2026-03-06 06:30:02.061] [INFO ] Splitting SOP library into standardized components
#> [2026-03-06 06:30:03.990] [INFO ] > Starting: process_smiles [n_structures=1348503]
#> [2026-03-06 06:30:03.992] [INFO ] Processing SMILES with RDKit
#> [2026-03-06 06:30:10.792] [INFO ] Processing 141 new SMILES with RDKit
#> [2026-03-06 06:30:10.794] [INFO ] Starting SMILES processing pipeline
#> [2026-03-06 06:30:10.794] [INFO ] Input: /tmp/Rtmp75FvI8/file27105b14352d.smi
#> [2026-03-06 06:30:10.794] [INFO ] Output: /tmp/Rtmp75FvI8/file27107bef0505.csv.gz
#> [2026-03-06 06:30:10.794] [INFO ] Input file validated: /tmp/Rtmp75FvI8/file27105b14352d.smi
#> [2026-03-06 06:30:10.794] [INFO ] Output file validated: /tmp/Rtmp75FvI8/file27107bef0505.csv.gz
#> [2026-03-06 06:30:10.794] [INFO ] Processing parameters: workers=8, batch_size=1000, progress_interval=10000
#> [2026-03-06 06:30:10.794] [INFO ] SMILES supplier initialized
#> [06:30:10] Explicit valence for atom # 1 N, 3, is greater than permitted
#> [06:30:10] ERROR: Could not sanitize molecule on line 134
#> [06:30:10] ERROR: Explicit valence for atom # 1 N, 3, is greater than permitted
#> [06:30:10] Explicit valence for atom # 0 He, 1, is greater than permitted
#> [06:30:10] ERROR: Could not sanitize molecule on line 137
#> [06:30:10] ERROR: Explicit valence for atom # 0 He, 1, is greater than permitted
#> [06:30:10] Explicit valence for atom # 0 He, 1, is greater than permitted
#> [06:30:10] ERROR: Could not sanitize molecule on line 138
#> [06:30:10] ERROR: Explicit valence for atom # 0 He, 1, is greater than permitted
#> [06:30:10] Explicit valence for atom # 0 He, 1, is greater than permitted
#> [06:30:10] ERROR: Could not sanitize molecule on line 139
#> [06:30:10] ERROR: Explicit valence for atom # 0 He, 1, is greater than permitted
#> [06:30:11] Explicit valence for atom # 56 P, 7, is greater than permitted
#> [2026-03-06 06:30:11.030] [WARNING] Failed to process SMILES 'CC(C)=CCCC(C)=CCCC(C)=CCCC(C)=CCCC(C)=CCCC(C)=CCCC(C)=CCCC(C)=CCCC(C)=CCCC(C)=CCCC(C)=CCO[P-]([O])(=O)=O': Explicit valence for atom # 56 P, 7, is greater than permitted
#> [06:30:11] Explicit valence for atom # 7 Si, 6, is greater than permitted
#> [2026-03-06 06:30:11.031] [WARNING] Failed to process SMILES 'C1COCC[NH+]1C[Si-]23(OC(C(O2)(C4=CC=CC=C4)C5=CC=CC=C5)(C6=CC=CC=C6)C7=CC=CC=C7)OC(C(O3)(C8=CC=CC=C8)C9=CC=CC=C9)(C1=CC=CC=C1)C1=CC=CC=C1': Explicit valence for atom # 7 Si, 6, is greater than permitted
#> [06:30:11] Explicit valence for atom # 7 As, 7, is greater than permitted
#> [2026-03-06 06:30:11.032] [WARNING] Failed to process SMILES 'C1=CC=C2C(=C1)O[As-]34(O2)(OC5=CC=CC=C5O3)OC6=CC=CC=C6O4': Explicit valence for atom # 7 As, 7, is greater than permitted
#> [06:30:11] Explicit valence for atom # 4 P, 7, is greater than permitted
#> [2026-03-06 06:30:11.032] [WARNING] Failed to process SMILES '[H][C@](O)(CO[P-]([O])(=O)=O)C=O': Explicit valence for atom # 4 P, 7, is greater than permitted
#> [2026-03-06 06:30:11.033] [WARNING] Batch processing: 4/137 molecules failed
#> [2026-03-06 06:30:11.036] [INFO ] Processing complete. Total molecules processed: 133
#> [2026-03-06 06:30:11.068] [INFO ] Successfully processed 133 SMILES
#> [2026-03-06 06:30:14.495] [INFO ] [OK] Completed: process_smiles [n_processed=1288163] (10.5s)
#> [2026-03-06 06:30:33.062] [INFO ] Referenced structure-organism pairs (662,375)
#> [2026-03-06 06:30:34.848] [INFO ] Structures: 213,721 stereoisomers, 998,094 without stereochemistry, 1,036,478 constitutional isomers
#> [2026-03-06 06:30:50.510] [INFO ] Unique organisms (36,801)
#> [2026-03-06 06:30:50.596] [INFO ] Processing 919 organism name(s) for OTT taxonomy lookup
#> [2026-03-06 06:30:50.895] [INFO ] Querying OTT API in 10 batches
#> [2026-03-06 06:30:53.313] [INFO ] Retrieving detailed taxonomy for 13 unique OTT IDs
#> [2026-03-06 06:30:53.949] [INFO ] Got OTTaxonomy!
#> [2026-03-06 06:30:53.984] [INFO ] Exporting parameters to: data/interim/params/260306_063053_prepare_libraries_sop_merged.yaml
#> [2026-03-06 06:30:53.986] [INFO ] > Starting: export_output [file=data/interim/libraries/sop/merged/keys.tsv.gz, n_rows=662375]
#> [2026-03-06 06:30:54.847] [INFO ] [OK] Completed: export_output [size_bytes=13528697] (861ms)
#> [2026-03-06 06:30:54.849] [INFO ] > Starting: export_output [file=data/interim/libraries/sop/merged/organisms/taxonomies/ott.tsv.gz, n_rows=35896]
#> [2026-03-06 06:30:54.936] [INFO ] [OK] Completed: export_output [size_bytes=939193] (87ms)
#> [2026-03-06 06:30:54.937] [INFO ] > Starting: export_output [file=data/interim/libraries/sop/merged/structures/stereo.tsv.gz, n_rows=1213224]
#> [2026-03-06 06:30:57.914] [INFO ] [OK] Completed: export_output [size_bytes=38337653] (3s)
#> [2026-03-06 06:30:57.915] [INFO ] > Starting: export_output [file=data/interim/libraries/sop/merged/structures/metadata.tsv.gz, n_rows=1257144]
#> [2026-03-06 06:30:59.419] [INFO ] [OK] Completed: export_output [size_bytes=32528295] (1.5s)
#> [2026-03-06 06:30:59.421] [INFO ] > Starting: export_output [file=data/interim/libraries/sop/merged/structures/names.tsv.gz, n_rows=216763]
#> [2026-03-06 06:30:59.838] [INFO ] [OK] Completed: export_output [size_bytes=7424153] (417ms)
#> [2026-03-06 06:30:59.840] [INFO ] > Starting: export_output [file=data/interim/libraries/sop/merged/structures/taxonomies/classyfire.tsv.gz, n_rows=146393]
#> [2026-03-06 06:30:59.987] [INFO ] [OK] Completed: export_output [size_bytes=2492954] (147ms)
#> [2026-03-06 06:30:59.989] [INFO ] > Starting: export_output [file=data/interim/libraries/sop/merged/structures/taxonomies/npc.tsv.gz, n_rows=141815]
#> [2026-03-06 06:31:00.278] [INFO ] [OK] Completed: export_output [size_bytes=2395030] (289ms)
#> [2026-03-06 06:31:00.279] [INFO ] [OK] Completed: prepare_libraries_sop_merged [n_pairs=662375, n_structures=1213224, n_organisms=35896, files_exported=7] (1m 4s)
#> ✔ lib_sop_mer completed [1m 3.5s, 97.65 MB]
#> + edg_spe dispatched
#> ✔ edg_spe completed [0ms, 433.44 kB]
#> + lib_mer_str_met dispatched
#> ✔ lib_mer_str_met completed [0ms, 32.53 MB]
#> + lib_mer_str_nam dispatched
#> ✔ lib_mer_str_nam completed [0ms, 7.42 MB]
#> + lib_mer_str_stereo dispatched
#> ✔ lib_mer_str_stereo completed [1ms, 38.34 MB]
#> + lib_mer_str_tax_cla dispatched
#> ✔ lib_mer_str_tax_cla completed [0ms, 2.49 MB]
#> + lib_mer_str_tax_npc dispatched
#> ✔ lib_mer_str_tax_npc completed [0ms, 2.40 MB]
#> + lib_mer_org_tax_ott dispatched
#> ✔ lib_mer_org_tax_ott completed [0ms, 939.19 kB]
#> + lib_mer_key dispatched
#> ✔ lib_mer_key completed [1ms, 13.53 MB]
#> + ann_spe_pre dispatched
#> [2026-03-06 06:31:04.256] [INFO ] Preparing spectral matching annotations from 2 file(s)
#> [2026-03-06 06:31:07.896] [INFO ] > Starting: complement_metadata [n_input=564609]
#> [2026-03-06 06:31:34.181] [INFO ] [OK] Completed: complement_metadata [n_enriched=564609] (26.3s)
#> [2026-03-06 06:31:34.197] [INFO ] Exporting parameters to: data/interim/params/260306_063134_prepare_annotations_spectra.yaml
#> [2026-03-06 06:31:34.199] [INFO ] > Starting: export_output [file=data/interim/annotations/example_spectralMatchesPrepared.tsv.gz, n_rows=564609]
#> [2026-03-06 06:31:36.687] [INFO ] [OK] Completed: export_output [size_bytes=52491326] (2.5s)
#> ✔ ann_spe_pre completed [32.4s, 52.49 MB]
#> + ann_spe_exp_gnp_pre dispatched
#> [2026-03-06 06:31:37.636] [INFO ] > Starting: prepare_annotations_gnps [n_files=1]
#> [2026-03-06 06:31:37.637] [WARN ] No GNPS annotations found, returning an empty file instead
#> [2026-03-06 06:31:37.639] [INFO ] [OK] Completed: prepare_annotations_gnps [n_annotations=1] (4ms)
#> [2026-03-06 06:31:37.657] [INFO ] Exporting parameters to: data/interim/params/260306_063137_prepare_annotations_gnps.yaml
#> [2026-03-06 06:31:37.659] [INFO ] > Starting: export_output [file=data/interim/annotations/example_gnpsPrepared.tsv.gz, n_rows=1]
#> [2026-03-06 06:31:37.660] [INFO ] [OK] Completed: export_output [size_bytes=237] (1ms)
#> ✔ ann_spe_exp_gnp_pre completed [31ms, 237 B]
#> + ann_spe_exp_mzm_pre dispatched
#> [2026-03-06 06:31:38.044] [INFO ] > Starting: prepare_annotations_mzmine [n_files=1]
#> [2026-03-06 06:31:38.045] [WARN ] No mzmine annotations found, returning an empty file instead
#> [2026-03-06 06:31:38.047] [INFO ] [OK] Completed: prepare_annotations_mzmine [n_annotations=1] (4ms)
#> [2026-03-06 06:31:38.062] [INFO ] Exporting parameters to: data/interim/params/260306_063138_prepare_annotations_mzmine.yaml
#> [2026-03-06 06:31:38.064] [INFO ] > Starting: export_output [file=data/interim/annotations/example_mzminePrepared.tsv.gz, n_rows=1]
#> [2026-03-06 06:31:38.065] [INFO ] [OK] Completed: export_output [size_bytes=237] (2ms)
#> ✔ ann_spe_exp_mzm_pre completed [26ms, 237 B]
#> + ann_sir_pre dispatched
#> [2026-03-06 06:31:38.410] [INFO ] > Starting: prepare_annotations_sirius [version=6]
#> [2026-03-06 06:31:38.563] [INFO ] > Starting: complement_metadata [n_input=479]
#> [2026-03-06 06:31:52.690] [INFO ] [OK] Completed: complement_metadata [n_enriched=479] (14.1s)
#> [2026-03-06 06:31:52.699] [INFO ] [OK] Completed: prepare_annotations_sirius [n_canopus=14, n_formulas=14, n_structures=479] (14.3s)
#> [2026-03-06 06:31:52.722] [INFO ] Exporting parameters to: data/interim/params/260306_063152_prepare_annotations_sirius.yaml
#> [2026-03-06 06:31:52.723] [INFO ] > Starting: export_output [file=data/interim/annotations/example_canopusPrepared.tsv.gz, n_rows=14]
#> [2026-03-06 06:31:52.725] [INFO ] [OK] Completed: export_output [size_bytes=784] (1ms)
#> [2026-03-06 06:31:52.726] [INFO ] > Starting: export_output [file=data/interim/annotations/example_formulaPrepared.tsv.gz, n_rows=14]
#> [2026-03-06 06:31:52.727] [INFO ] [OK] Completed: export_output [size_bytes=471] (1ms)
#> [2026-03-06 06:31:52.729] [INFO ] > Starting: export_output [file=data/interim/annotations/example_siriusPrepared.tsv.gz, n_rows=479]
#> [2026-03-06 06:31:52.733] [INFO ] [OK] Completed: export_output [size_bytes=24146] (4ms)
#> ✔ ann_sir_pre completed [14.3s, 25.40 kB]
#> + tax_pre dispatched
#> [2026-03-06 06:31:53.816] [INFO ] > Starting: prepare_taxa [taxon=NULL]
#> [2026-03-06 06:31:53.972] [INFO ] Processing 2 organism name(s) for OTT taxonomy lookup
#> [2026-03-06 06:31:54.094] [INFO ] Querying OTT API in 1 batches
#> [2026-03-06 06:31:54.194] [INFO ] Retrying failed queries using genus names only
#> [2026-03-06 06:31:54.201] [INFO ] Retrying with 1 genus names: blk 
#> [2026-03-06 06:31:54.298] [INFO ] Retrieving detailed taxonomy for 1 unique OTT IDs
#> [2026-03-06 06:31:54.383] [INFO ] Got OTTaxonomy!
#> [2026-03-06 06:31:54.816] [INFO ] [OK] Completed: prepare_taxa [n_features=5328] (1s)
#> [2026-03-06 06:31:54.847] [INFO ] Exporting parameters to: data/interim/params/260306_063154_prepare_taxa.yaml
#> [2026-03-06 06:31:54.849] [INFO ] > Starting: export_output [file=data/interim/taxa/example_taxed.tsv.gz, n_rows=5328]
#> [2026-03-06 06:31:54.855] [INFO ] [OK] Completed: export_output [size_bytes=19697] (6ms)
#> ✔ tax_pre completed [1s, 19.70 kB]
#> + ann_ms1_pre dispatched
#> [2026-03-06 06:31:55.282] [INFO ] > Starting: annotate_masses [ms_mode=pos, tolerance_ppm=10, tolerance_rt=0.02]
#> [2026-03-06 06:31:55.283] [INFO ] Starting mass-based annotation
#> [2026-03-06 06:31:55.284] [INFO ] ============================================================
#> [2026-03-06 06:31:55.285] [INFO ] Data Sanitizing: Pre-flight Checks
#> [2026-03-06 06:31:55.286] [INFO ] ============================================================
#> [2026-03-06 06:31:55.286] [INFO ] Checking features file...
#> [2026-03-06 06:31:55.323] [INFO ] [OK] Features file: 5328 rows, 5 columns
#> [2026-03-06 06:31:55.324] [INFO ] ============================================================
#> [2026-03-06 06:31:55.325] [INFO ] [OK] All pre-flight checks passed!
#> [2026-03-06 06:31:55.325] [INFO ] Data validation complete. Ready to proceed.
#> [2026-03-06 06:31:55.326] [INFO ] ============================================================
#> [2026-03-06 06:31:55.363] [INFO ] Processing 5328 features for annotation
#> [2026-03-06 06:32:10.568] [INFO ] Already 2112 adducts previously detected
#> [2026-03-06 06:32:10.570] [INFO ] > Starting: harmonize_adducts [n_rows=5328]
#> [2026-03-06 06:32:10.580] [INFO ] [OK] Completed: harmonize_adducts [n_unique_before=13, n_unique_after=13] (10ms)
#> [2026-03-06 06:32:10.641] [INFO ] Here are the top 10 observed m/z differences inside the RT windows:
#> [2026-03-06 06:32:10.643] [INFO ] 
#>              bin   N
#>  (4.8501,5.0366] 352
#>  (21.822,22.009] 283
#>   (16.973,17.16] 208
#>  (17.906,18.092] 192
#>  (15.854,16.041] 172
#>    (39.914,40.1] 143
#>  (38.981,39.168] 137
#>  (34.878,35.065] 115
#>  (77.962,78.148] 114
#>  (1.8659,2.0524] 108
#> [2026-03-06 06:32:10.644] [INFO ] These differences may help identify potential preprocessing issues
#> [2026-03-06 06:32:13.805] [WARN ] Some adducts were unproperly detected, defaulting to (de)protonated
#> [2026-03-06 06:32:57.384] [INFO ] > Starting: decorate_masses [n_annotations=173942]
#> [2026-03-06 06:32:57.418] [INFO ] MS1 annotations: 42303 unique structures across 3992 features
#> [2026-03-06 06:32:57.420] [INFO ] [OK] Completed: decorate_masses [n_structures=42303, n_features=3992] (35ms)
#> [2026-03-06 06:32:57.472] [INFO ] Exporting parameters to: data/interim/params/260306_063257_annotate_masses.yaml
#> [2026-03-06 06:32:57.475] [INFO ] > Starting: export_output [file=data/interim/features/example_edgesMasses.tsv, n_rows=2653]
#> [2026-03-06 06:32:57.476] [INFO ] [OK] Completed: export_output [size_bytes=81706] (2ms)
#> [2026-03-06 06:32:57.478] [INFO ] > Starting: export_output [file=data/interim/annotations/example_ms1Prepared.tsv.gz, n_rows=173942]
#> [2026-03-06 06:32:58.179] [INFO ] [OK] Completed: export_output [size_bytes=10225206] (701ms)
#> [2026-03-06 06:32:58.180] [INFO ] [OK] Completed: annotate_masses [n_annotations=173942, n_edges=2653] (1m 3s)
#> ✔ ann_ms1_pre completed [1m 2.9s, 10.31 MB]
#> + ann_sir_pre_can dispatched
#> ✔ ann_sir_pre_can completed [0ms, 784 B]
#> + ann_sir_pre_for dispatched
#> ✔ ann_sir_pre_for completed [0ms, 471 B]
#> + ann_sir_pre_str dispatched
#> ✔ ann_sir_pre_str completed [0ms, 24.15 kB]
#> + ann_ms1_pre_edg dispatched
#> ✔ ann_ms1_pre_edg completed [0ms, 81.71 kB]
#> + ann_ms1_pre_ann dispatched
#> ✔ ann_ms1_pre_ann completed [0ms, 10.23 MB]
#> + fea_edg_pre dispatched
#> [2026-03-06 06:33:00.678] [INFO ] > Starting: prepare_features_edges [n_edge_types=2]
#> [2026-03-06 06:33:00.715] [INFO ] [OK] Completed: prepare_features_edges [n_edges=14089] (37ms)
#> [2026-03-06 06:33:00.735] [INFO ] Exporting parameters to: data/interim/params/260306_063300_prepare_features_edges.yaml
#> [2026-03-06 06:33:00.737] [INFO ] > Starting: export_output [file=data/interim/features/example_edges.tsv, n_rows=14089]
#> [2026-03-06 06:33:00.740] [INFO ] [OK] Completed: export_output [size_bytes=637995] (3ms)
#> ✔ fea_edg_pre completed [64ms, 638.00 kB]
#> + ann_fil dispatched
#> [2026-03-06 06:33:01.087] [INFO ] > Starting: filter_annotations [n_annotation_files=5, tolerance_rt=Inf]
#> [2026-03-06 06:33:01.089] [INFO ] Filtering annotations
#> [2026-03-06 06:33:01.129] [INFO ] Processing 5328 unique features for annotation filtering
#> [2026-03-06 06:33:05.636] [INFO ] Removing MS1 annotations superseded by spectral matches
#> [2026-03-06 06:33:08.534] [INFO ] Removed 69095 redundant MS1 annotations
#> [2026-03-06 06:33:08.535] [INFO ] Total annotations before RT filtering: 669937
#> [2026-03-06 06:33:09.784] [INFO ] Filtering annotations outside Inf min RT tolerance
#> [2026-03-06 06:33:12.271] [INFO ] Removed 0 annotations based on retention time tolerance
#> [2026-03-06 06:33:12.499] [INFO ] Exporting parameters to: data/interim/params/260306_063312_filter_annotations.yaml
#> [2026-03-06 06:33:12.501] [INFO ] > Starting: export_output [file=data/interim/annotations/example_annotationsFiltered.tsv.gz, n_rows=670590]
#> [2026-03-06 06:33:15.160] [INFO ] [OK] Completed: export_output [size_bytes=50581870] (2.7s)
#> [2026-03-06 06:33:15.161] [INFO ] [OK] Completed: filter_annotations [n_filtered=670590] (14.1s)
#> ✔ ann_fil completed [14.1s, 50.58 MB]
#> + fea_com dispatched
#> [2026-03-06 06:33:15.896] [INFO ] > Starting: create_components [n_input_files=1]
#> [2026-03-06 06:33:15.897] [INFO ] Creating components from 1 edge file(s)
#> [2026-03-06 06:33:15.910] [INFO ] Loaded 12217 edges connecting 4537 unique features
#> [2026-03-06 06:33:15.918] [INFO ] Found 1481 components
#> [2026-03-06 06:33:15.929] [INFO ] Component sizes - Min: 1, Max: 2068, Mean: 3.1
#> [2026-03-06 06:33:15.945] [INFO ] Exporting parameters to: data/interim/params/260306_063315_create_components.yaml
#> [2026-03-06 06:33:15.947] [INFO ] > Starting: export_output [file=data/interim/features/example_components.tsv, n_rows=4537]
#> [2026-03-06 06:33:15.949] [INFO ] [OK] Completed: export_output [size_bytes=38686] (2ms)
#> [2026-03-06 06:33:15.950] [INFO ] Components written to: data/interim/features/example_components.tsv
#> [2026-03-06 06:33:15.951] [INFO ] [OK] Completed: create_components [n_components=1481, n_features=4537] (55ms)
#> ✔ fea_com completed [59ms, 38.69 kB]
#> + int_com dispatched
#> ✔ int_com completed [0ms, 38.69 kB]
#> + fea_com_pre dispatched
#> [2026-03-06 06:33:16.640] [INFO ] > Starting: prepare_features_components [n_files=1]
#> [2026-03-06 06:33:16.645] [INFO ] [OK] Completed: prepare_features_components [n_assignments=4537] (5ms)
#> [2026-03-06 06:33:16.661] [INFO ] Exporting parameters to: data/interim/params/260306_063316_prepare_features_components.yaml
#> [2026-03-06 06:33:16.663] [INFO ] > Starting: export_output [file=data/interim/features/example_componentsPrepared.tsv, n_rows=4537]
#> [2026-03-06 06:33:16.665] [INFO ] [OK] Completed: export_output [size_bytes=38681] (2ms)
#> ✔ fea_com_pre completed [27ms, 38.68 kB]
#> + ann_wei dispatched
#> [2026-03-06 06:33:17.007] [INFO ] Starting annotation weighting and scoring
#> [2026-03-06 06:33:17.009] [INFO ] > Starting: weight_annotations [n_candidates_neighbors=16, n_candidates_final=1]
#> [2026-03-06 06:33:31.998] [INFO ] 
#>  candidate_library      n
#>    ISDB - Wikidata 519661
#>           TIMA MS1  79027
#>               gnps  21563
#>             merlin  20091
#>           massbank   3195
#>             SIRIUS    479
#> [2026-03-06 06:33:38.636] [INFO ] > Starting: weight_bio [n_annotations=629108, n_sop=663233]
#> [2026-03-06 06:33:38.638] [INFO ] Weighting 629108 annotations by biological source
#> [2026-03-06 06:33:43.144] [INFO ] [OK] Completed: weight_bio [n_weighted=629108] (4.5s)
#> [2026-03-06 06:33:43.146] [INFO ] > Starting: decorate_bio [n_annotations=629108]
#> [2026-03-06 06:33:43.396] [INFO ] Taxonomically informed metabolite annotation reranked:
#>     Kingdom level: 39420 structures
#>     Phylum level:  38975 structures
#>     Class level:   33424 structures
#>     Order level:   9118 structures
#>     Family level:  7333 structures
#>     Tribe level:   1193 structures
#>     Genus level:   932 structures
#>     Species level: 423 structures
#>     Variety level: 31 structures
#>     Biota level:   31 structures
#> [2026-03-06 06:33:43.397] [INFO ] [OK] Completed: decorate_bio [n_processed=629108] (251ms)
#> [2026-03-06 06:33:43.399] [INFO ] > Starting: clean_bio [n_annotations=629108, minimal_consistency=0]
#> [2026-03-06 06:34:02.307] [INFO ] [OK] Completed: clean_bio [n_cleaned=629106] (18.9s)
#> [2026-03-06 06:34:02.309] [INFO ] > Starting: weight_chemo [n_input=629106]
#> [2026-03-06 06:34:02.310] [INFO ] Weighting 629106 annotations by chemical consistency
#> [2026-03-06 06:34:03.516] [INFO ] [OK] Completed: weight_chemo [n_weighted=629106] (1.2s)
#> [2026-03-06 06:34:03.518] [INFO ] > Starting: decorate_chemo [n_annotations=629106]
#> [2026-03-06 06:34:04.467] [INFO ] Chemically informed metabolite annotation reranked:
#>   Classyfire:
#>     Kingdom level:    27942 structures
#>     Superclass level: 27840 structures
#>     Class level:      25659 structures
#>     Parent level:     19675 structures
#>   NPClassifier:
#>     Pathway level:    28144 structures
#>     Superclass level: 27331 structures
#>     Class level:      19793 structures
#> [2026-03-06 06:34:04.469] [INFO ] [OK] Completed: decorate_chemo [n_processed=629106] (951ms)
#> [2026-03-06 06:34:04.470] [INFO ] > Starting: clean_chemo [n_annotations=629106, candidates_final=1, high_confidence=FALSE]
#> [2026-03-06 06:34:13.513] [INFO ] Sampling candidates for 3071 features with more than 7 candidates per score
#> [2026-03-06 06:34:13.594] [INFO ] > Starting: filter_high_confidence [n_input=17362, context=filtered]
#> [2026-03-06 06:34:13.611] [INFO ] [filtered]  Removed 16135 low-confidence candidates (92.9% of 17362 total)
#> [2026-03-06 06:34:13.612] [INFO ] [filtered]  1227 high-confidence candidates remaining (7.1%)
#> [2026-03-06 06:34:13.613] [INFO ] [OK] Completed: filter_high_confidence [n_filtered=1227, n_removed=16135] (19ms)
#> [2026-03-06 06:34:13.616] [INFO ] Summarizing annotation results
#> [2026-03-06 06:34:19.769] [INFO ] Annotated features: 567/5328 (10.6%)
#> [2026-03-06 06:34:19.870] [INFO ] Summarizing annotation results
#> [2026-03-06 06:34:38.687] [INFO ] Annotated features: 4673/5328 (87.7%)
#> [2026-03-06 06:34:39.103] [INFO ] [OK] Completed: clean_chemo [n_final_full=437524, n_final_filtered=5847, n_final_mini=5847, n_features=5328] (34.6s)
#> [2026-03-06 06:34:39.105] [INFO ] [OK] Completed: weight_annotations [n_annotations=NULL] (1m 22s)
#> [2026-03-06 06:34:39.127] [INFO ] Exporting parameters to: data/processed/20260306_063439_example/260306_063439_prepare_params.yaml
#> [2026-03-06 06:34:39.148] [INFO ] Exporting parameters to: data/processed/20260306_063439_example/260306_063439_prepare_params_advanced.yaml
#> [2026-03-06 06:34:39.150] [INFO ] > Starting: export_output [file=data/processed/20260306_063439_example/example_results_mini.tsv, n_rows=5847]
#> [2026-03-06 06:34:39.154] [INFO ] [OK] Completed: export_output [size_bytes=817251] (3ms)
#> [2026-03-06 06:34:39.155] [INFO ] > Starting: export_output [file=data/processed/20260306_063439_example/example_results_filtered.tsv, n_rows=5847]
#> [2026-03-06 06:34:39.160] [INFO ] [OK] Completed: export_output [size_bytes=1753669] (5ms)
#> [2026-03-06 06:34:39.162] [INFO ] > Starting: export_output [file=data/processed/20260306_063439_example/example_results.tsv, n_rows=437524]
#> [2026-03-06 06:34:39.731] [INFO ] [OK] Completed: export_output [size_bytes=268946817] (569ms)
#> [2026-03-06 06:34:39.732] [INFO ] Results exported: example_results.tsv
#> ✔ ann_wei completed [1m 22.7s, 270.70 MB]
#> ✔ ended pipeline [12m 47.1s, 134 completed, 0 skipped]
#> There were 14 warnings (use warnings() to see them)
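
The spectral annotation step in the log above reports the distribution of annotation similarity scores in 0.1-wide bins. As a rough sketch of how such a table can be built with base R, the example below bins a handful of made-up scores with `cut()`. The scores and the variable names are illustrative assumptions, and this is not necessarily how TIMA computes its own summary.

```r
# Made-up similarity scores for illustration only (not TIMA output)
scores <- c(0, 0.05, 0.15, 0.15, 0.42, 0.95, 1)

# Cut into ten right-closed bins of width 0.1; include.lowest = TRUE
# closes the first bin on the left, giving [0,0.1]
bins <- cut(scores, breaks = seq(0, 1, by = 0.1), include.lowest = TRUE)

# Count the number of scores falling in each bin
score_distribution <- as.data.frame(table(bin = bins), responseName = "N")
print(score_distribution)
```

The bin labels produced this way follow the same interval notation as the log, with `[0,0.1]` as the first bin and `(0.9,1]` as the last.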

The final exported file is formatted so that it can be easily imported into Cytoscape to further explore your data!
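
Before moving to Cytoscape, it can be handy to take a quick look at the exported table from R. The sketch below assumes the results are written as tab-separated files under `data/processed/`, in a timestamped sub-directory, as the log above suggests. `read_latest_results()` is a hypothetical helper written for this example, not a function exported by TIMA.

```r
# Hypothetical helper (not part of tima): find the most recently modified
# results file below a directory and read it as a tab-separated table.
read_latest_results <- function(processed_dir = "data/processed",
                                pattern = "_results_mini\\.tsv$") {
  files <- list.files(
    processed_dir,
    pattern = pattern,
    recursive = TRUE,
    full.names = TRUE
  )
  if (length(files) == 0) {
    stop("No results file found under ", processed_dir)
  }
  # Keep the file with the latest modification time
  latest <- files[which.max(file.mtime(files))]
  utils::read.delim(
    latest,
    sep = "\t",
    quote = "",
    check.names = FALSE,
    stringsAsFactors = FALSE
  )
}

# Example usage, assuming the pipeline was run in the current directory:
# results <- read_latest_results()
# str(results)
```

Selecting the file by modification time avoids hard-coding the timestamped directory name, which changes with every run.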

We hope you enjoyed using TIMA and would be pleased to hear from you!

For any remarks or suggestions, please file an issue or feel free to contact us directly.

Reuse

CC BY 4.0

Citation

BibTeX citation:

@online{rutz2026,
  author = {Rutz, Adriano},
  title = {3 {Performing} {Taxonomically} {Informed} {Metabolite}
    {Annotation}},
  date = {2026-03-06},
  url = {https://taxonomicallyinformedannotation.github.io/tima/vignettes/articles/III-processing.html},
  langid = {en}
}

For attribution, please cite this work as:

Rutz, Adriano. 2026. “3 Performing Taxonomically Informed Metabolite Annotation.” March 6, 2026. https://taxonomicallyinformedannotation.github.io/tima/vignettes/articles/III-processing.html.