3 Performing Taxonomically Informed Metabolite Annotation

Author

Adriano Rutz

Published

May 27, 2026

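This vignette describes how to perform Taxonomically Informed Metabolite Annotation. If you have completed all the previous steps successfully, this one should be a piece of cake. You deserve it! The whole workflow runs from a single call to tima::run_tima(). On a first run, this call also downloads the required spectral and structure-organism libraries, some of which exceed 1 GB, so expect it to take a while.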
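Not every external resource is always reachable. In the log below, several downloads fail (HTTP 403 Forbidden, or an expired SSL certificate for cannabisdatabase.ca). Each one is retried up to three times, with a wait of 1 s and then 2 s between attempts. After that, a minimal placeholder file is written and the pipeline carries on. The sketch below shows this retry-with-exponential-backoff pattern in base R. The helpers retry_with_backoff() and download_or_placeholder() are hypothetical and written only for this illustration. They are not functions exported by tima, whose own implementation may differ.

```r
# Hypothetical helpers for illustration only; they are not part of tima.

# Call f() up to max_attempts times. After failed attempt k, wait
# base_delay * 2^(k - 1) seconds (1 s, then 2 s with the defaults).
retry_with_backoff <- function(f, max_attempts = 3, base_delay = 1) {
  stopifnot(max_attempts >= 1)
  result <- NULL
  for (attempt in seq_len(max_attempts)) {
    # Capture errors as condition objects instead of aborting.
    result <- tryCatch(f(), error = function(e) e)
    if (!inherits(result, "error")) {
      return(result)
    }
    if (attempt < max_attempts) {
      delay <- base_delay * 2^(attempt - 1)
      message(sprintf(
        "attempt %d/%d failed, retrying in %gs: %s",
        attempt, max_attempts, delay, conditionMessage(result)
      ))
      Sys.sleep(delay)
    }
  }
  stop("failed after ", max_attempts, " attempts: ", conditionMessage(result))
}

# Try to download url to destfile. If every attempt fails, write an empty
# placeholder at destfile so that downstream steps can still run.
# Returns TRUE on success and FALSE when a placeholder was written.
download_or_placeholder <- function(url, destfile, max_attempts = 3,
                                    base_delay = 1) {
  tryCatch(
    retry_with_backoff(
      function() {
        status <- utils::download.file(url, destfile, mode = "wb", quiet = TRUE)
        if (status != 0) stop("download.file() returned status ", status)
        TRUE
      },
      max_attempts = max_attempts,
      base_delay = base_delay
    ),
    error = function(e) {
      warning("download failed, writing placeholder: ", conditionMessage(e),
              call. = FALSE)
      file.create(destfile) # creates (or truncates to) an empty file
      FALSE
    }
  )
}
```

With the defaults, a permanently failing URL costs about 3 s of waiting (1 s + 2 s), which matches the timestamps of the failed HMDB-family downloads in the log. Doubling the delay keeps transient failures cheap without repeatedly hitting a server that is refusing requests.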

tima::run_tima()
#> + par_def_pre_lib_sop_hmd dispatched
#> ✔ par_def_pre_lib_sop_hmd completed [7ms, 492 B]
#> + par_def_pre_lib_sop_mer dispatched
#> ✔ par_def_pre_lib_sop_mer completed [1ms, 6.75 kB]
#> + par_def_pre_ann_sir dispatched
#> ✔ par_def_pre_ann_sir completed [1ms, 1.97 kB]
#> + par_def_pre_lib_sop_clo dispatched
#> ✔ par_def_pre_lib_sop_clo completed [1ms, 523 B]
#> + par_def_pre_tax dispatched
#> ✔ par_def_pre_tax completed [0ms, 1.51 kB]
#> + par_def_pre_ann_gnp dispatched
#> ✔ par_def_pre_ann_gnp completed [1ms, 1.31 kB]
#> + par_def_cre_edg_spe dispatched
#> ✔ par_def_cre_edg_spe completed [1ms, 1.52 kB]
#> + par_def_pre_ann_spe dispatched
#> ✔ par_def_pre_ann_spe completed [1ms, 1.35 kB]
#> + par_def_pre_ann_mzt dispatched
#> ✔ par_def_pre_ann_mzt completed [1ms, 1.19 kB]
#> + par_def_pre_fea_edg dispatched
#> ✔ par_def_pre_fea_edg completed [1ms, 706 B]
#> + par_def_ann_spe dispatched
#> ✔ par_def_ann_spe completed [0ms, 2.39 kB]
#> + par_def_pre_lib_sop_lot dispatched
#> ✔ par_def_pre_lib_sop_lot completed [1ms, 494 B]
#> + par_def_pre_fea_tab dispatched
#> ✔ par_def_pre_fea_tab completed [1ms, 860 B]
#> + par_def_pre_lib_spe dispatched
#> ✔ par_def_pre_lib_spe completed [1ms, 1.58 kB]
#> + par_def_pre_lib_sop_ecm dispatched
#> ✔ par_def_pre_lib_sop_ecm completed [1ms, 492 B]
#> + par_def_pre_ann_mzm dispatched
#> ✔ par_def_pre_ann_mzm completed [0ms, 1.32 kB]
#> + par_def_ann_mas dispatched
#> ✔ par_def_ann_mas completed [1ms, 6.44 kB]
#> + par_def_pre_lib_rt dispatched
#> ✔ par_def_pre_lib_rt completed [0ms, 2.20 kB]
#> + par_def_exp_mzt dispatched
#> ✔ par_def_exp_mzt completed [0ms, 1.76 kB]
#> + yaml_paths dispatched
#> ✔ yaml_paths completed [1ms, 17.23 kB]
#> + par_def_pre_lib_sop_big dispatched
#> ✔ par_def_pre_lib_sop_big completed [1ms, 314 B]
#> + par_def_fil_ann dispatched
#> ✔ par_def_fil_ann completed [0ms, 1.34 kB]
#> + par_def_cre_com dispatched
#> ✔ par_def_cre_com completed [1ms, 375 B]
#> + par_def_wei_ann dispatched
#> ✔ par_def_wei_ann completed [1ms, 5.34 kB]
#> + par_def_pre_fea_com dispatched
#> ✔ par_def_pre_fea_com completed [1ms, 358 B]
#> + paths dispatched
#> ✔ paths completed [2ms, 3.17 kB]
#> + lib_sop_ecm dispatched
#> [2026-05-27 08:54:01.259] [INFO ] > Starting: download_file [url=https://ecmdb.ca/download/ecmdb.json.zip, destination=data/source/libraries/sop/ecmdb.json.zip]
#> [2026-05-27 08:54:01.675] [INFO ] [OK] Completed: download_file [size_bytes=1334921] (386ms)
#> ✔ lib_sop_ecm completed [450ms, 1.33 MB]
#> + lib_spe_is_nor_pre_pos dispatched
#> [2026-05-27 08:54:01.779] [INFO ] > Starting: download_file [url=https://github.com/adafede/SpectRalLibRaRies/raw/main/data/interim/libraries/spectra/exp/isdbnormansusdat_14854025_pos.rds, destination=data/interim/libraries/spectra/is/isdbnormansusdat_14854025_pos.rds]
#> [2026-05-27 08:54:03.388] [INFO ] [OK] Completed: download_file [size_bytes=47223884] (1.6s)
#> ✔ lib_spe_is_nor_pre_pos completed [1.6s, 47.22 MB]
#> + par_pre_par dispatched
#> ✔ par_pre_par completed [0ms, 1.69 kB]
#> + par_pre_par2 dispatched
#> ✔ par_pre_par2 completed [0ms, 28.39 kB]
#> + lib_spe_exp_gnp_pre_pos dispatched
#> [2026-05-27 08:54:03.675] [INFO ] > Starting: download_file [url=https://github.com/adafede/SpectRalLibRaRies/raw/main/data/interim/libraries/spectra/exp/gnps_11566051_pos.rds, destination=data/interim/libraries/spectra/exp/gnps_11566051_pos.rds]
#> [2026-05-27 08:54:09.697] [INFO ] [OK] Completed: download_file [size_bytes=341237933] (6s)
#> ✔ lib_spe_exp_gnp_pre_pos completed [6s, 341.24 MB]
#> + lib_spe_exp_mb_pre_pos dispatched
#> [2026-05-27 08:54:09.919] [INFO ] > Starting: download_file [url=https://github.com/adafede/SpectRalLibRaRies/raw/main/data/interim/libraries/spectra/exp/massbank_202510_pos.rds, destination=data/interim/libraries/spectra/exp/massbank_202510_pos.rds]
#> [2026-05-27 08:54:10.646] [INFO ] [OK] Completed: download_file [size_bytes=17559329] (727ms)
#> ✔ lib_spe_exp_mb_pre_pos completed [729ms, 17.56 MB]
#> + lib_spe_exp_mer_pre_pos dispatched
#> [2026-05-27 08:54:10.757] [INFO ] > Starting: download_file [url=https://github.com/adafede/SpectRalLibRaRies/raw/main/data/interim/libraries/spectra/exp/merlin_16984129_pos.rds, destination=data/interim/libraries/spectra/exp/merlin_16984129_pos.rds]
#> [2026-05-27 08:54:16.486] [INFO ] [OK] Completed: download_file [size_bytes=158380143] (5.7s)
#> ✔ lib_spe_exp_mer_pre_pos completed [5.7s, 158.38 MB]
#> + lib_spe_is_wik_pre_pos dispatched
#> [2026-05-27 08:54:16.646] [INFO ] > Starting: download_file [url=https://github.com/taxonomicallyinformedannotation/tima-isdb-pos/raw/main/wikidata_5607185_pos.rds, destination=data/interim/libraries/spectra/is/wikidata_5607185_pos.rds]
#> [2026-05-27 08:54:42.137] [INFO ] [OK] Completed: download_file [size_bytes=1097022129] (25.5s)
#> ✔ lib_spe_is_wik_pre_pos completed [25.5s, 1.10 GB]
#> + lib_spe_exp_mb_pre_sop dispatched
#> [2026-05-27 08:54:42.622] [INFO ] > Starting: download_file [url=https://github.com/Adafede/SpectRalLibRaRies/raw/main/data/interim/libraries/sop/massbank_202510_prepared.tsv.gz, destination=data/interim/libraries/sop/massbank_202510_prepared.tsv.gz]
#> [2026-05-27 08:54:42.964] [INFO ] [OK] Completed: download_file [size_bytes=158982] (342ms)
#> ✔ lib_spe_exp_mb_pre_sop completed [344ms, 158.98 kB]
#> + lib_spe_is_wik_pre_sop dispatched
#> [2026-05-27 08:54:43.061] [INFO ] > Starting: download_file [url=https://github.com/taxonomicallyinformedannotation/tima-example-files/raw/main/wikidata_spectral_5607185_prepared.tsv.gz, destination=data/interim/libraries/sop/wikidata_5607185_prepared.tsv.gz]
#> [2026-05-27 08:54:43.347] [INFO ] [OK] Completed: download_file [size_bytes=15074639] (285ms)
#> ✔ lib_spe_is_wik_pre_sop completed [288ms, 15.07 MB]
#> + lib_spe_is_nor_pre_neg dispatched
#> [2026-05-27 08:54:43.448] [INFO ] > Starting: download_file [url=https://github.com/adafede/SpectRalLibRaRies/raw/main/data/interim/libraries/spectra/exp/isdbnormansusdat_14854025_neg.rds, destination=data/interim/libraries/spectra/is/isdbnormansusdat_14854025_neg.rds]
#> [2026-05-27 08:54:44.375] [INFO ] [OK] Completed: download_file [size_bytes=34220848] (927ms)
#> ✔ lib_spe_is_nor_pre_neg completed [929ms, 34.22 MB]
#> + lib_sop_lot dispatched
#> [2026-05-27 08:54:44.483] [INFO ] Retrieving latest version from Zenodo: 10.5281/zenodo.5794106
#> [2026-05-27 08:54:45.344] [INFO ] Downloading 260413_frozen_metadata.csv.gz from https://doi.org/10.5281/zenodo.5794106
#> [2026-05-27 08:54:45.346] [INFO ] > Starting: download_file [url=https://zenodo.org/api/records/19360665/files/260413_frozen_metadata.csv.gz/content, destination=data/source/libraries/sop/lotus.csv.gz]
#> [2026-05-27 08:55:01.966] [INFO ] [OK] Completed: download_file [size_bytes=90298678] (16.6s)
#> [2026-05-27 08:55:01.967] [INFO ] Download completed: data/source/libraries/sop/lotus.csv.gz
#> ✔ lib_sop_lot completed [17.5s, 90.30 MB]
#> + lib_spe_exp_gnp_pre_neg dispatched
#> [2026-05-27 08:55:02.099] [INFO ] > Starting: download_file [url=https://github.com/adafede/SpectRalLibRaRies/raw/main/data/interim/libraries/spectra/exp/gnps_11566051_neg.rds, destination=data/interim/libraries/spectra/exp/gnps_11566051_neg.rds]
#> [2026-05-27 08:55:03.837] [INFO ] [OK] Completed: download_file [size_bytes=91828026] (1.7s)
#> ✔ lib_spe_exp_gnp_pre_neg completed [1.7s, 91.83 MB]
#> + lib_spe_exp_mb_pre_neg dispatched
#> [2026-05-27 08:55:03.971] [INFO ] > Starting: download_file [url=https://github.com/adafede/SpectRalLibRaRies/raw/main/data/interim/libraries/spectra/exp/massbank_202510_neg.rds, destination=data/interim/libraries/spectra/exp/massbank_202510_neg.rds]
#> [2026-05-27 08:55:04.724] [INFO ] [OK] Completed: download_file [size_bytes=5972761] (753ms)
#> ✔ lib_spe_exp_mb_pre_neg completed [755ms, 5.97 MB]
#> + lib_spe_exp_mer_pre_neg dispatched
#> [2026-05-27 08:55:04.825] [INFO ] > Starting: download_file [url=https://github.com/adafede/SpectRalLibRaRies/raw/main/data/interim/libraries/spectra/exp/merlin_16984129_neg.rds, destination=data/interim/libraries/spectra/exp/merlin_16984129_neg.rds]
#> [2026-05-27 08:55:06.179] [INFO ] [OK] Completed: download_file [size_bytes=52926499] (1.4s)
#> ✔ lib_spe_exp_mer_pre_neg completed [1.4s, 52.93 MB]
#> + lib_spe_is_wik_pre_neg dispatched
#> [2026-05-27 08:55:06.293] [INFO ] > Starting: download_file [url=https://github.com/taxonomicallyinformedannotation/tima-isdb-neg/raw/main/wikidata_5607185_neg.rds, destination=data/interim/libraries/spectra/is/wikidata_5607185_neg.rds]
#> [2026-05-27 08:55:23.379] [INFO ] [OK] Completed: download_file [size_bytes=874199749] (17.1s)
#> ✔ lib_spe_is_wik_pre_neg completed [17.1s, 874.20 MB]
#> + lib_spe_exp_mer_pre_sop dispatched
#> [2026-05-27 08:55:23.790] [INFO ] > Starting: download_file [url=https://github.com/Adafede/SpectRalLibRaRies/raw/main/data/interim/libraries/sop/merlin_16984129_prepared.tsv.gz, destination=data/interim/libraries/sop/merlin_16984129_prepared.tsv.gz]
#> [2026-05-27 08:55:24.094] [INFO ] [OK] Completed: download_file [size_bytes=823107] (304ms)
#> ✔ lib_spe_exp_mer_pre_sop completed [306ms, 823.11 kB]
#> + lib_sop_hmd_fam_raw dispatched
#> [2026-05-27 08:55:24.202] [INFO ] > Starting: download_file [url=https://www.csfmetabolome.ca/system/downloads/current/csf_metabolites_structures.zip, destination=data/source/libraries/sop/csfmetabolome/structures.zip]
#> [2026-05-27 08:55:24.359] [INFO ] [OK] Completed: download_file [size_bytes=251502] (157ms)
#> [2026-05-27 08:55:24.360] [INFO ] > Starting: download_file [url=https://www.fecalmetabolome.ca/system/downloads/current/feces_metabolites_structures.zip, destination=data/source/libraries/sop/fecalmetabolome/structures.zip]
#> [2026-05-27 08:55:24.677] [INFO ] [OK] Completed: download_file [size_bytes=3201305] (317ms)
#> [2026-05-27 08:55:24.679] [INFO ] > Starting: download_file [url=https://www.salivametabolome.ca/system/downloads/current/saliva_metabolites_structures.zip, destination=data/source/libraries/sop/salivametabolome/structures.zip]
#> [2026-05-27 08:55:24.867] [INFO ] [OK] Completed: download_file [size_bytes=622845] (188ms)
#> [2026-05-27 08:55:24.869] [INFO ] > Starting: download_file [url=https://www.serummetabolome.ca/system/downloads/current/serum_metabolites_structures.zip, destination=data/source/libraries/sop/serummetabolome/structures.zip]
#> [2026-05-27 08:55:25.365] [INFO ] [OK] Completed: download_file [size_bytes=12023792] (496ms)
#> [2026-05-27 08:55:25.367] [INFO ] > Starting: download_file [url=https://www.sweatmetabolome.ca/system/downloads/current/sweat_metabolites_structures.zip, destination=data/source/libraries/sop/sweatmetabolome/structures.zip]
#> [2026-05-27 08:55:25.476] [INFO ] [OK] Completed: download_file [size_bytes=42618] (109ms)
#> [2026-05-27 08:55:25.477] [INFO ] > Starting: download_file [url=https://www.urinemetabolome.ca/system/downloads/current/urine_metabolites_structures.zip, destination=data/source/libraries/sop/urinemetabolome/structures.zip]
#> [2026-05-27 08:55:25.746] [INFO ] [OK] Completed: download_file [size_bytes=2654043] (269ms)
#> [2026-05-27 08:55:25.748] [INFO ] > Starting: download_file [url=https://mcdb.ca/system/downloads/current/milk_metabolites_structures.zip, destination=data/source/libraries/sop/mcdb/structures.zip]
#> [2026-05-27 08:55:25.854] [WARN ] file download failed (attempt 1/3), retrying in 1s: HTTP 403 Forbidden.
#> [2026-05-27 08:55:26.894] [WARN ] file download failed (attempt 2/3), retrying in 2s: HTTP 403 Forbidden.
#> [2026-05-27 08:55:28.983] [WARN ] HMDB family download failed: file download failed
#> ✖ x file download failed after retries Expected: Successful operation Received:
#>   HTTP 403 Forbidden. Reason: Tried 3 times with exponential backoff Fix:
#>   Possible solutions: 1. Check network connection 2. Verify server/service is
#>   available 3. Check authentication credentials 4. Try again later if service
#>   is down 5. Increase max_attempts if transient failures are common
#> [2026-05-27 08:55:28.984] [WARN ] HMDB download failed. Creating minimal placeholder SDF file.
#> [2026-05-27 08:55:29.004] [INFO ] > Starting: download_file [url=https://smpdb.ca/downloads/smpdb_structures.zip, destination=data/source/libraries/sop/smpdb/structures.zip]
#> [2026-05-27 08:55:30.845] [INFO ] [OK] Completed: download_file [size_bytes=23382536] (1.8s)
#> [2026-05-27 08:55:30.846] [INFO ] > Starting: download_file [url=https://mimedb.org/system/downloads/2.0/mimedb.sdf.zip, destination=data/source/libraries/sop/mimedb/structures.zip]
#> [2026-05-27 08:55:31.089] [WARN ] file download failed (attempt 1/3), retrying in 1s: HTTP 403 Forbidden.
#> [2026-05-27 08:55:32.128] [WARN ] file download failed (attempt 2/3), retrying in 2s: HTTP 403 Forbidden.
#> [2026-05-27 08:55:34.214] [WARN ] HMDB family download failed: file download failed
#> ✖ x file download failed after retries Expected: Successful operation Received:
#>   HTTP 403 Forbidden. Reason: Tried 3 times with exponential backoff Fix:
#>   Possible solutions: 1. Check network connection 2. Verify server/service is
#>   available 3. Check authentication credentials 4. Try again later if service
#>   is down 5. Increase max_attempts if transient failures are common
#> [2026-05-27 08:55:34.216] [WARN ] HMDB download failed. Creating minimal placeholder SDF file.
#> [2026-05-27 08:55:34.219] [INFO ] > Starting: download_file [url=https://t3db.ca/system/downloads/current/structures.zip, destination=data/source/libraries/sop/t3db/structures.zip]
#> [2026-05-27 08:55:34.298] [WARN ] file download failed (attempt 1/3), retrying in 1s: HTTP 403 Forbidden.
#> [2026-05-27 08:55:35.340] [WARN ] file download failed (attempt 2/3), retrying in 2s: HTTP 403 Forbidden.
#> [2026-05-27 08:55:37.431] [WARN ] HMDB family download failed: file download failed
#> ✖ x file download failed after retries Expected: Successful operation Received:
#>   HTTP 403 Forbidden. Reason: Tried 3 times with exponential backoff Fix:
#>   Possible solutions: 1. Check network connection 2. Verify server/service is
#>   available 3. Check authentication credentials 4. Try again later if service
#>   is down 5. Increase max_attempts if transient failures are common
#> [2026-05-27 08:55:37.432] [WARN ] HMDB download failed. Creating minimal placeholder SDF file.
#> [2026-05-27 08:55:37.436] [INFO ] > Starting: download_file [url=https://bovinedb.ca/system/downloads/current/structures.zip, destination=data/source/libraries/sop/bovinedb/structures.zip]
#> [2026-05-27 08:55:38.008] [INFO ] [OK] Completed: download_file [size_bytes=19260214] (572ms)
#> [2026-05-27 08:55:38.010] [INFO ] > Starting: download_file [url=https://www.ymdb.ca/system/downloads/current/ymdb.sdf.zip, destination=data/source/libraries/sop/ymdb/structures.zip]
#> [2026-05-27 08:55:38.102] [INFO ] [OK] Completed: download_file [size_bytes=1200611] (93ms)
#> [2026-05-27 08:55:38.104] [INFO ] > Starting: download_file [url=https://cannabisdatabase.ca/simple/download_compound_as_sdf, destination=data/source/libraries/sop/cannabisdatabase/compounds.sdf]
#> [2026-05-27 08:55:38.212] [WARN ] file download failed (attempt 1/3), retrying in 1s: Failed to perform HTTP request.
#> Caused by error in `curl::curl_fetch_disk()`:
#> ! SSL peer certificate or SSH remote key was not OK [cannabisdatabase.ca]:
#> SSL certificate problem: certificate has expired
#> [2026-05-27 08:55:39.283] [WARN ] file download failed (attempt 2/3), retrying in 2s: Failed to perform HTTP request.
#> Caused by error in `curl::curl_fetch_disk()`:
#> ! SSL peer certificate or SSH remote key was not OK [cannabisdatabase.ca]:
#> SSL certificate problem: certificate has expired
#> [2026-05-27 08:55:41.402] [WARN ] HMDB family download failed: file download failed
#> ✖ x file download failed after retries Expected: Successful operation Received:
#>   Failed to perform HTTP request. Caused by error in `curl::curl_fetch_disk()`:
#>   ! SSL peer certificate or SSH remote key was not OK [cannabisdatabase.ca]:
#>   SSL certificate problem: certificate has expired Reason: Tried 3 times with
#>   exponential backoff Fix: Possible solutions: 1. Check network connection 2.
#>   Verify server/service is available 3. Check authentication credentials 4. Try
#>   again later if service is down 5. Increase max_attempts if transient failures
#>   are common
#> [2026-05-27 08:55:41.403] [WARN ] HMDB download failed. Creating minimal placeholder SDF file.
#> [2026-05-27 08:55:41.406] [WARN ] Failed to create zip file, trying alternative method
#>  zip warning: missing end signature--probably not a zip file (did you
#>  zip warning: remember to use binary mode when you transferred it?)
#>  zip warning: (if you are trying to read a damaged archive try -F)
#> 
#> zip error: Zip file structure invalid (compounds.sdf)
#> ✔ lib_sop_hmd_fam_raw completed [17.2s, 62.64 MB]
#> + lib_xrefs dispatched
#> [2026-05-27 08:55:41.536] [INFO ] Fetching compound cross-references from Wikidata / QLever
#> [2026-05-27 08:55:41.537] [INFO ] > Starting: get_compounds_xrefs [(no parameters)]
#> [2026-05-27 08:55:43.714] [WARN ] QLever request failed (possibly transient upstream error). Writing empty xrefs file: compounds.tsv.gz
#> [2026-05-27 08:55:43.744] [INFO ] > Starting: export_output [file=data/interim/xrefs/compounds.tsv.gz, n_rows=0]
#> [2026-05-27 08:55:43.746] [INFO ] [OK] Completed: export_output [size_bytes=35] (2ms)
#> ✔ lib_xrefs completed [2.2s, 35 B]
#> + lib_sop_hmd dispatched
#> [2026-05-27 08:55:43.872] [INFO ] > Starting: download_file [url=https://hmdb.ca/system/downloads/current/structures.zip, destination=data/source/libraries/sop/hmdb/structures.zip]
#> [2026-05-27 08:55:43.969] [WARN ] file download failed (attempt 1/3), retrying in 1s: HTTP 403 Forbidden.
#> [2026-05-27 08:55:45.008] [WARN ] file download failed (attempt 2/3), retrying in 2s: HTTP 403 Forbidden.
#> [2026-05-27 08:55:47.101] [WARN ] HMDB download failed: file download failed
#> ✖ x file download failed after retries Expected: Successful operation Received:
#>   HTTP 403 Forbidden. Reason: Tried 3 times with exponential backoff Fix:
#>   Possible solutions: 1. Check network connection 2. Verify server/service is
#>   available 3. Check authentication credentials 4. Try again later if service
#>   is down 5. Increase max_attempts if transient failures are common
#> [2026-05-27 08:55:47.102] [WARN ] HMDB download failed. Creating minimal placeholder SDF file.
#> ✔ lib_sop_hmd completed [3.2s, 340 B]
#> + test_spectra_mini dispatched
#> ✔ test_spectra_mini completed [0ms, 7.77 MB]
#> + lib_spe_exp_gnp_pre_sop dispatched
#> [2026-05-27 08:55:47.338] [INFO ] > Starting: download_file [url=https://github.com/Adafede/SpectRalLibRaRies/raw/main/data/interim/libraries/sop/gnps_11566051_prepared.tsv.gz, destination=data/interim/libraries/sop/gnps_11566051_prepared.tsv.gz]
#> [2026-05-27 08:55:47.628] [INFO ] [OK] Completed: download_file [size_bytes=493387] (290ms)
#> ✔ lib_spe_exp_gnp_pre_sop completed [292ms, 493.39 kB]
#> + lib_spe_is_nor_pre_sop dispatched
#> [2026-05-27 08:55:47.729] [INFO ] > Starting: download_file [url=https://github.com/Adafede/SpectRalLibRaRies/raw/main/data/interim/libraries/sop/isdbnormansusdat_14854025_prepared.tsv.gz, destination=data/interim/libraries/sop/isdbnormansusdat_14854025_prepared.tsv.gz]
#> [2026-05-27 08:55:47.980] [INFO ] [OK] Completed: download_file [size_bytes=1236540] (251ms)
#> ✔ lib_spe_is_nor_pre_sop completed [253ms, 1.24 MB]
#> + par_fin_par dispatched
#> ✔ par_fin_par completed [0ms, 341 B]
#> + par_fin_par2 dispatched
#> ✔ par_fin_par2 completed [2ms, 3.44 kB]
#> + lib_sop_hmd_fam_pre dispatched
#> [2026-05-27 08:55:48.274] [INFO ] > Starting: prepare_libraries_sop_hmdb_like [source=CSFMETABOLOME, input=data/source/libraries/sop/csfmetabolome/structures.zip, tag=csf]
#> [2026-05-27 08:55:48.388] [INFO ] > Starting: export_output [file=data/interim/libraries/sop/csfmetabolome_prepared.tsv.gz, n_rows=445]
#> [2026-05-27 08:55:48.392] [INFO ] [OK] Completed: export_output [size_bytes=19485] (4ms)
#> [2026-05-27 08:55:48.393] [INFO ] [OK] Completed: prepare_libraries_sop_hmdb_like [n_pairs=445] (118ms)
#> [2026-05-27 08:55:48.394] [INFO ] > Starting: prepare_libraries_sop_hmdb_like [source=FECALMETABOLOME, input=data/source/libraries/sop/fecalmetabolome/structures.zip, tag=fecal]
#> [2026-05-27 08:55:50.185] [INFO ] > Starting: export_output [file=data/interim/libraries/sop/fecalmetabolome_prepared.tsv.gz, n_rows=6810]
#> [2026-05-27 08:55:50.214] [INFO ] [OK] Completed: export_output [size_bytes=237060] (30ms)
#> [2026-05-27 08:55:50.216] [INFO ] [OK] Completed: prepare_libraries_sop_hmdb_like [n_pairs=6810] (1.8s)
#> [2026-05-27 08:55:50.217] [INFO ] > Starting: prepare_libraries_sop_hmdb_like [source=SALIVAMETABOLOME, input=data/source/libraries/sop/salivametabolome/structures.zip, tag=saliva]
#> [2026-05-27 08:55:50.458] [INFO ] > Starting: export_output [file=data/interim/libraries/sop/salivametabolome_prepared.tsv.gz, n_rows=1245]
#> [2026-05-27 08:55:50.465] [INFO ] [OK] Completed: export_output [size_bytes=47303] (7ms)
#> [2026-05-27 08:55:50.466] [INFO ] [OK] Completed: prepare_libraries_sop_hmdb_like [n_pairs=1245] (249ms)
#> [2026-05-27 08:55:50.467] [INFO ] > Starting: prepare_libraries_sop_hmdb_like [source=SERUMMETABOLOME, input=data/source/libraries/sop/serummetabolome/structures.zip, tag=serum]
#> [2026-05-27 08:55:58.696] [INFO ] > Starting: export_output [file=data/interim/libraries/sop/serummetabolome_prepared.tsv.gz, n_rows=25411]
#> [2026-05-27 08:55:58.764] [INFO ] [OK] Completed: export_output [size_bytes=812712] (67ms)
#> [2026-05-27 08:55:58.765] [INFO ] [OK] Completed: prepare_libraries_sop_hmdb_like [n_pairs=25411] (8.3s)
#> [2026-05-27 08:55:58.766] [INFO ] > Starting: prepare_libraries_sop_hmdb_like [source=SWEATMETABOLOME, input=data/source/libraries/sop/sweatmetabolome/structures.zip, tag=sweat]
#> [2026-05-27 08:55:58.818] [INFO ] > Starting: export_output [file=data/interim/libraries/sop/sweatmetabolome_prepared.tsv.gz, n_rows=89]
#> [2026-05-27 08:55:58.821] [INFO ] [OK] Completed: export_output [size_bytes=4110] (2ms)
#> [2026-05-27 08:55:58.822] [INFO ] [OK] Completed: prepare_libraries_sop_hmdb_like [n_pairs=89] (56ms)
#> [2026-05-27 08:55:58.823] [INFO ] > Starting: prepare_libraries_sop_hmdb_like [source=URINEMETABOLOME, input=data/source/libraries/sop/urinemetabolome/structures.zip, tag=urine]
#> [2026-05-27 08:55:59.590] [INFO ] > Starting: export_output [file=data/interim/libraries/sop/urinemetabolome_prepared.tsv.gz, n_rows=4364]
#> [2026-05-27 08:55:59.606] [INFO ] [OK] Completed: export_output [size_bytes=209222] (16ms)
#> [2026-05-27 08:55:59.607] [INFO ] [OK] Completed: prepare_libraries_sop_hmdb_like [n_pairs=4364] (784ms)
#> [2026-05-27 08:55:59.608] [INFO ] > Starting: prepare_libraries_sop_hmdb_like [source=MCDB, input=data/source/libraries/sop/mcdb/structures.zip, tag=milk]
#> [2026-05-27 08:55:59.636] [WARN ] Empty dataframe in select_sop_columns
#> [2026-05-27 08:55:59.641] [INFO ] > Starting: export_output [file=data/interim/libraries/sop/mcdb_prepared.tsv.gz, n_rows=0]
#> [2026-05-27 08:55:59.642] [INFO ] [OK] Completed: export_output [size_bytes=256] (1ms)
#> [2026-05-27 08:55:59.643] [INFO ] [OK] Completed: prepare_libraries_sop_hmdb_like [n_pairs=0] (35ms)
#> [2026-05-27 08:55:59.644] [INFO ] > Starting: prepare_libraries_sop_hmdb_like [source=SMPDB, input=data/source/libraries/sop/smpdb/structures.zip, tag=pathway]
#> [2026-05-27 08:56:12.709] [INFO ] > Starting: export_output [file=data/interim/libraries/sop/smpdb_prepared.tsv.gz, n_rows=49817]
#> [2026-05-27 08:56:12.823] [INFO ] [OK] Completed: export_output [size_bytes=1443937] (114ms)
#> [2026-05-27 08:56:12.824] [INFO ] [OK] Completed: prepare_libraries_sop_hmdb_like [n_pairs=49817] (13.2s)
#> [2026-05-27 08:56:12.825] [INFO ] > Starting: prepare_libraries_sop_hmdb_like [source=MIMEDB, input=data/source/libraries/sop/mimedb/structures.zip, tag=microbiome]
#> [2026-05-27 08:56:12.854] [WARN ] Empty dataframe in select_sop_columns
#> [2026-05-27 08:56:12.859] [INFO ] > Starting: export_output [file=data/interim/libraries/sop/mimedb_prepared.tsv.gz, n_rows=0]
#> [2026-05-27 08:56:12.861] [INFO ] [OK] Completed: export_output [size_bytes=256] (1ms)
#> [2026-05-27 08:56:12.862] [INFO ] [OK] Completed: prepare_libraries_sop_hmdb_like [n_pairs=0] (36ms)
#> [2026-05-27 08:56:12.863] [INFO ] > Starting: prepare_libraries_sop_hmdb_like [source=T3DB, input=data/source/libraries/sop/t3db/structures.zip, tag=toxin]
#> [2026-05-27 08:56:12.891] [WARN ] Empty dataframe in select_sop_columns
#> [2026-05-27 08:56:12.896] [INFO ] > Starting: export_output [file=data/interim/libraries/sop/t3db_prepared.tsv.gz, n_rows=0]
#> [2026-05-27 08:56:12.897] [INFO ] [OK] Completed: export_output [size_bytes=256] (1ms)
#> [2026-05-27 08:56:12.898] [INFO ] [OK] Completed: prepare_libraries_sop_hmdb_like [n_pairs=0] (35ms)
#> [2026-05-27 08:56:12.899] [INFO ] > Starting: prepare_libraries_sop_hmdb_like [source=BOVINEDB, input=data/source/libraries/sop/bovinedb/structures.zip, tag=NA]
#> [2026-05-27 08:56:25.733] [INFO ] > Starting: export_output [file=data/interim/libraries/sop/bovinedb_prepared.tsv.gz, n_rows=51684]
#> [2026-05-27 08:56:25.863] [INFO ] [OK] Completed: export_output [size_bytes=1568975] (130ms)
#> [2026-05-27 08:56:25.864] [INFO ] [OK] Completed: prepare_libraries_sop_hmdb_like [n_pairs=51684] (13s)
#> [2026-05-27 08:56:25.865] [INFO ] > Starting: prepare_libraries_sop_hmdb_like [source=YMDB, input=data/source/libraries/sop/ymdb/structures.zip, tag=NA]
#> [2026-05-27 08:56:26.265] [INFO ] > Starting: export_output [file=data/interim/libraries/sop/ymdb_prepared.tsv.gz, n_rows=2024]
#> [2026-05-27 08:56:26.277] [INFO ] [OK] Completed: export_output [size_bytes=83615] (11ms)
#> [2026-05-27 08:56:26.278] [INFO ] [OK] Completed: prepare_libraries_sop_hmdb_like [n_pairs=2024] (413ms)
#> [2026-05-27 08:56:26.279] [INFO ] > Starting: prepare_libraries_sop_hmdb_like [source=CANNABISDATABASE, input=data/source/libraries/sop/cannabisdatabase/compounds.sdf, tag=NA]
#> [2026-05-27 08:56:26.306] [WARN ] Empty dataframe in select_sop_columns
#> [2026-05-27 08:56:26.312] [INFO ] > Starting: export_output [file=data/interim/libraries/sop/cannabisdatabase_prepared.tsv.gz, n_rows=0]
#> [2026-05-27 08:56:26.313] [INFO ] [OK] Completed: export_output [size_bytes=256] (1ms)
#> [2026-05-27 08:56:26.314] [INFO ] [OK] Completed: prepare_libraries_sop_hmdb_like [n_pairs=0] (35ms)
#> ✔ lib_sop_hmd_fam_pre completed [38s, 4.43 MB]
#> + par_usr_pre_tax dispatched
#> ✔ par_usr_pre_tax completed [1.6s, 438 B]
#> + par_usr_pre_lib_sop_mer dispatched
#> ✔ par_usr_pre_lib_sop_mer completed [1.6s, 2.87 kB]
#> + par_usr_pre_lib_sop_clo dispatched
#> ✔ par_usr_pre_lib_sop_clo completed [1.6s, 267 B]
#> + par_usr_pre_ann_gnp dispatched
#> ✔ par_usr_pre_ann_gnp completed [1.6s, 633 B]
#> + par_usr_pre_lib_sop_big dispatched
#> ✔ par_usr_pre_lib_sop_big completed [1.6s, 107 B]
#> + par_usr_cre_edg_spe dispatched
#> ✔ par_usr_cre_edg_spe completed [1.6s, 475 B]
#> + par_usr_pre_fea_tab dispatched
#> ✔ par_usr_pre_fea_tab completed [1.6s, 274 B]
#> + par_usr_wei_ann dispatched
#> ✔ par_usr_wei_ann completed [1.6s, 1.80 kB]
#> + par_usr_pre_ann_spe dispatched
#> ✔ par_usr_pre_ann_spe completed [1.6s, 656 B]
#> + par_usr_pre_ann_mzm dispatched
#> ✔ par_usr_pre_ann_mzm completed [1.6s, 635 B]
#> + par_usr_fil_ann dispatched
#> ✔ par_usr_fil_ann completed [1.6s, 808 B]
#> + par_usr_pre_ann_mzt dispatched
#> ✔ par_usr_pre_ann_mzt completed [1.6s, 546 B]
#> + par_usr_pre_lib_sop_hmd dispatched
#> ✔ par_usr_pre_lib_sop_hmd completed [1.6s, 178 B]
#> + par_usr_ann_spe dispatched
#> ✔ par_usr_ann_spe completed [1.6s, 1.20 kB]
#> + par_usr_ann_mas dispatched
#> ✔ par_usr_ann_mas completed [1.6s, 2.83 kB]
#> + par_usr_pre_lib_spe dispatched
#> ✔ par_usr_pre_lib_spe completed [1.6s, 322 B]
#> + par_usr_pre_lib_sop_ecm dispatched
#> ✔ par_usr_pre_lib_sop_ecm completed [1.6s, 176 B]
#> + par_usr_pre_lib_rt dispatched
#> ✔ par_usr_pre_lib_rt completed [1.5s, 487 B]
#> + par_usr_exp_mzt dispatched
#> ✔ par_usr_exp_mzt completed [1.6s, 425 B]
#> + par_usr_cre_com dispatched
#> ✔ par_usr_cre_com completed [1.6s, 200 B]
#> + par_usr_pre_ann_sir dispatched
#> ✔ par_usr_pre_ann_sir completed [1.6s, 859 B]
#> + par_usr_pre_fea_com dispatched
#> ✔ par_usr_pre_fea_com completed [1.7s, 200 B]
#> + par_usr_pre_fea_edg dispatched
#> ✔ par_usr_pre_fea_edg completed [1.6s, 328 B]
#> + par_usr_pre_lib_sop_lot dispatched
#> ✔ par_usr_pre_lib_sop_lot completed [1.6s, 174 B]
#> + par_pre_tax dispatched
#> ✔ par_pre_tax completed [2ms, 330 B]
#> + par_pre_lib_sop_mer dispatched
#> ✔ par_pre_lib_sop_mer completed [2ms, 797 B]
#> + par_pre_lib_sop_clo dispatched
#> ✔ par_pre_lib_sop_clo completed [2ms, 233 B]
#> + par_pre_ann_gnp dispatched
#> ✔ par_pre_ann_gnp completed [2ms, 324 B]
#> + par_pre_lib_sop_big dispatched
#> ✔ par_pre_lib_sop_big completed [1ms, 153 B]
#> + par_cre_edg_spe dispatched
#> ✔ par_cre_edg_spe completed [2ms, 404 B]
#> + par_pre_fea_tab dispatched
#> ✔ par_pre_fea_tab completed [1ms, 278 B]
#> + par_wei_ann dispatched
#> ✔ par_wei_ann completed [3ms, 961 B]
#> + par_pre_ann_spe dispatched
#> ✔ par_pre_ann_spe completed [1ms, 322 B]
#> + par_pre_ann_mzm dispatched
#> ✔ par_pre_ann_mzm completed [2ms, 329 B]
#> + par_fil_ann dispatched
#> ✔ par_fil_ann completed [1ms, 373 B]
#> + par_pre_ann_mzt dispatched
#> ✔ par_pre_ann_mzt completed [1ms, 313 B]
#> + par_pre_lib_sop_hmd dispatched
#> ✔ par_pre_lib_sop_hmd completed [1ms, 192 B]
#> + par_ann_spe dispatched
#> ✔ par_ann_spe completed [2ms, 543 B]
#> + par_ann_mas dispatched
#> ✔ par_ann_mas completed [3ms, 1.21 kB]
#> + par_pre_lib_spe dispatched
#> ✔ par_pre_lib_spe completed [1ms, 407 B]
#> + par_pre_lib_sop_ecm dispatched
#> ✔ par_pre_lib_sop_ecm completed [1ms, 190 B]
#> + par_pre_lib_rt dispatched
#> ✔ par_pre_lib_rt completed [2ms, 375 B]
#> + par_exp_mzt dispatched
#> ✔ par_exp_mzt completed [1ms, 268 B]
#> + par_cre_com dispatched
#> ✔ par_cre_com completed [2ms, 191 B]
#> + par_pre_ann_sir dispatched
#> ✔ par_pre_ann_sir completed [1ms, 435 B]
#> + par_pre_fea_com dispatched
#> ✔ par_pre_fea_com completed [1ms, 183 B]
#> + par_pre_fea_edg dispatched
#> ✔ par_pre_fea_edg completed [1ms, 243 B]
#> + par_pre_lib_sop_lot dispatched
#> ✔ par_pre_lib_sop_lot completed [2ms, 185 B]
#> + lib_sop_mer_npc_cache dispatched
#> [2026-05-27 08:57:10.589] [INFO ] > Starting: download_file [url=https://github.com/Adafede/marimo/raw/refs/heads/main/apps/public/npclassifier/npclassifier_cache.csv, destination=data/interim/libraries/sop/merged/structures/taxonomies/npc.tsv.gz]
#> Downloading 100% ■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■   0s
#> [2026-05-27 08:57:14.445] [INFO ] [OK] Completed: download_file [size_bytes=201875293] (3.9s)
#> ✔ lib_sop_mer_npc_cache completed [3.9s, 201.88 MB]
#> + lib_sop_mer_cla_cache dispatched
#> [2026-05-27 08:57:14.636] [INFO ] > Starting: download_file [url=https://github.com/Adafede/marimo/raw/refs/heads/main/apps/public/classyfire/classyfire_cache.csv, destination=data/interim/libraries/sop/merged/structures/taxonomies/classyfire_cache.csv]
#> Downloading 100% ■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■   0s
#> [2026-05-27 08:57:17.092] [INFO ] [OK] Completed: download_file [size_bytes=143087266] (2.5s)
#> ✔ lib_sop_mer_cla_cache completed [2.5s, 143.09 MB]
#> + lib_sop_mer_str_pro dispatched
#> [2026-05-27 08:57:17.269] [INFO ] > Starting: download_file [url=https://github.com/taxonomicallyinformedannotation/tima-example-files/raw/main/processed.csv.gz, destination=data/interim/libraries/sop/merged/structures/processed.csv.gz]
#> [2026-05-27 08:57:17.956] [INFO ] [OK] Completed: download_file [size_bytes=68730181] (686ms)
#> ✔ lib_sop_mer_str_pro completed [689ms, 68.73 MB]
#> + lib_sop_clo_pre dispatched
#> [2026-05-27 08:57:18.096] [INFO ] Preparing closed structure-organism pairs library
#> [2026-05-27 08:57:18.098] [WARN ] Closed resource not accessible at: ~/Git/lotus-processor/data/processed/240412_closed_metadata.csv.gz. Returning empty template instead.
#> [2026-05-27 08:57:18.115] [INFO ] Exporting parameters to: data/interim/params/260527_085718_prepare_libraries_sop_closed.yaml
#> [2026-05-27 08:57:18.117] [INFO ] > Starting: export_output [file=data/interim/libraries/sop/closed_prepared.tsv.gz, n_rows=1]
#> [2026-05-27 08:57:18.118] [INFO ] [OK] Completed: export_output [size_bytes=277] (1ms)
#> ✔ lib_sop_clo_pre completed [24ms, 277 B]
#> + lib_sop_big_pre dispatched
#> [2026-05-27 08:57:18.233] [INFO ] Preparing BiGG structure-organism pairs
#> [2026-05-27 08:57:56.260] [INFO ] > Starting: process_smiles [n_structures=1420]
#> [2026-05-27 08:57:56.261] [INFO ] Processing SMILES with RDKit
#> Downloading uv...Done!
#> Downloading cpython-3.12.13-linux-x86_64-gnu (download) (32.5MiB)
#>  Downloaded cpython-3.12.13-linux-x86_64-gnu (download)
#> Downloading pillow (6.8MiB)
#> Downloading numpy (15.9MiB)
#> Downloading rdkit (35.4MiB)
#>  Downloaded pillow
#>  Downloaded numpy
#>  Downloaded rdkit
#> Installed 5 packages in 29ms
#> [2026-05-27 08:58:01.088] [INFO ] Processing 1419 new SMILES with RDKit
#> [2026-05-27 08:58:01.090] [INFO ] Starting SMILES processing pipeline
#> [2026-05-27 08:58:01.090] [INFO ] Input: /tmp/RtmptH52Ax/file267e65a50b12.smi
#> [2026-05-27 08:58:01.090] [INFO ] Output: /tmp/RtmptH52Ax/file267e771ab6a9.csv.gz
#> [2026-05-27 08:58:01.090] [INFO ] Input file validated: /tmp/RtmptH52Ax/file267e65a50b12.smi
#> [2026-05-27 08:58:01.090] [INFO ] Output file validated: /tmp/RtmptH52Ax/file267e771ab6a9.csv.gz
#> [2026-05-27 08:58:01.090] [INFO ] Processing parameters: workers=8, batch_size=1000, progress_interval=10000
#> [2026-05-27 08:58:01.090] [INFO ] SMILES supplier initialized
#> [2026-05-27 08:58:03.099] [INFO ] Processing complete. Total molecules processed: 1419
#> [2026-05-27 08:58:03.141] [INFO ] Successfully processed 1419 SMILES
#> [2026-05-27 08:58:03.152] [INFO ] [OK] Completed: process_smiles [n_processed=1419] (6.9s)
#> [2026-05-27 08:58:31.402] [INFO ] > Starting: process_smiles [n_structures=2085]
#> [2026-05-27 08:58:31.403] [INFO ] Processing SMILES with RDKit
#> [2026-05-27 08:58:31.414] [INFO ] Processing 1242 new SMILES with RDKit
#> [2026-05-27 08:58:31.416] [INFO ] Starting SMILES processing pipeline
#> [2026-05-27 08:58:31.416] [INFO ] Input: /tmp/RtmptH52Ax/file267e1d0ed398.smi
#> [2026-05-27 08:58:31.416] [INFO ] Output: /tmp/RtmptH52Ax/file267e48e3a4cd.csv.gz
#> [2026-05-27 08:58:31.416] [INFO ] Input file validated: /tmp/RtmptH52Ax/file267e1d0ed398.smi
#> [2026-05-27 08:58:31.416] [INFO ] Output file validated: /tmp/RtmptH52Ax/file267e48e3a4cd.csv.gz
#> [2026-05-27 08:58:31.416] [INFO ] Processing parameters: workers=8, batch_size=1000, progress_interval=10000
#> [2026-05-27 08:58:31.416] [INFO ] SMILES supplier initialized
#> [2026-05-27 08:58:33.236] [INFO ] Processing complete. Total molecules processed: 1242
#> [2026-05-27 08:58:33.271] [INFO ] Successfully processed 1242 SMILES
#> [2026-05-27 08:58:33.281] [INFO ] [OK] Completed: process_smiles [n_processed=1242] (1.9s)
#> [2026-05-27 08:58:33.402] [INFO ] > Starting: export_output [file=data/interim/libraries/sop/bigg_prepared.tsv.gz, n_rows=2355]
#> [2026-05-27 08:58:33.415] [INFO ] [OK] Completed: export_output [size_bytes=81924] (13ms)
#> ✔ lib_sop_big_pre completed [1m 15.2s, 81.92 kB]
#> + input_features dispatched
#> ✔ input_features completed [0ms, 451.55 kB]
#> + lib_sop_hmd_pre dispatched
#> [2026-05-27 08:58:33.923] [INFO ] > Starting: prepare_libraries_sop_hmdb_like [source=HMDB, input=data/source/libraries/sop/hmdb/structures.zip, tag=NA]
#> [2026-05-27 08:58:33.954] [WARN ] Empty dataframe in select_sop_columns
#> [2026-05-27 08:58:33.959] [INFO ] > Starting: export_output [file=data/interim/libraries/sop/hmdb_prepared.tsv.gz, n_rows=0]
#> [2026-05-27 08:58:33.961] [INFO ] [OK] Completed: export_output [size_bytes=256] (2ms)
#> [2026-05-27 08:58:33.962] [INFO ] [OK] Completed: prepare_libraries_sop_hmdb_like [n_pairs=0] (39ms)
#> ✔ lib_sop_hmd_pre completed [41ms, 256 B]
#> + input_spectra dispatched
#> ✔ input_spectra completed [0ms, 7.77 MB]
#> + lib_spe_exp_int_pre dispatched
#> [2026-05-27 08:58:34.468] [INFO ] > Starting: prepare_libraries_spectra [library_name=internal, n_input_files=1]
#> [2026-05-27 08:58:34.474] [WARN ] Input file(s) not found; creating empty library template
#> [2026-05-27 08:58:36.282] [INFO ] > Starting: export_output [file=data/interim/libraries/sop/internal_prepared.tsv.gz, n_rows=1]
#> [2026-05-27 08:58:36.284] [INFO ] [OK] Completed: export_output [size_bytes=79] (2ms)
#> [2026-05-27 08:58:36.358] [INFO ] Exporting parameters to: data/interim/params/260527_085836_prepare_libraries_spectra.yaml
#> [2026-05-27 08:58:36.360] [INFO ] [OK] Completed: prepare_libraries_spectra [n_structures=1, n_spectra_total=2, files_exported=3] (1.9s)
#> ✔ lib_spe_exp_int_pre completed [1.9s, 1.28 kB]
#> + lib_sop_ecm_pre dispatched
#> [2026-05-27 08:58:36.762] [INFO ] Preparing ECMDB structure-organism pairs
#> [2026-05-27 08:58:37.424] [INFO ] Exporting parameters to: data/interim/params/260527_085837_prepare_libraries_sop_ecmdb.yaml
#> [2026-05-27 08:58:37.426] [INFO ] > Starting: export_output [file=data/interim/libraries/sop/ecmdb_prepared.tsv.gz, n_rows=3760]
#> [2026-05-27 08:58:37.441] [INFO ] [OK] Completed: export_output [size_bytes=165776] (15ms)
#> ✔ lib_sop_ecm_pre completed [682ms, 165.78 kB]
#> + lib_rt dispatched
#> [2026-05-27 08:58:37.829] [INFO ] Preparing retention time libraries
#> [2026-05-27 08:58:37.842] [WARN ] No retention time library found, returning empty retention time and sop tables.
#> [2026-05-27 08:58:37.888] [INFO ] Exporting parameters to: data/interim/params/260527_085837_prepare_libraries_rt.yaml
#> [2026-05-27 08:58:37.890] [INFO ] > Starting: export_output [file=data/interim/libraries/rt/prepared.tsv.gz, n_rows=1]
#> [2026-05-27 08:58:37.891] [INFO ] [OK] Completed: export_output [size_bytes=86] (1ms)
#> [2026-05-27 08:58:37.895] [INFO ] > Starting: export_output [file=data/interim/libraries/sop/rt_prepared.tsv.gz, n_rows=1]
#> [2026-05-27 08:58:37.896] [INFO ] [OK] Completed: export_output [size_bytes=105] (1ms)
#> ✔ lib_rt completed [70ms, 191 B]
#> + lib_sop_lot_pre dispatched
#> [2026-05-27 08:58:38.264] [INFO ] > Starting: prepare_libraries_sop_lotus [input=data/source/libraries/sop/lotus.csv.gz]
#> [2026-05-27 08:58:47.589] [INFO ] [OK] Completed: prepare_libraries_sop_lotus [n_pairs=677545] (9.3s)
#> [2026-05-27 08:58:47.591] [INFO ] > Starting: export_output [file=data/interim/libraries/sop/lotus_prepared.tsv.gz, n_rows=677545]
#> [2026-05-27 08:58:51.144] [INFO ] [OK] Completed: export_output [size_bytes=49541873] (3.6s)
#> ✔ lib_sop_lot_pre completed [12.9s, 49.54 MB]
#> + fea_pre dispatched
#> [2026-05-27 08:58:51.720] [INFO ] > Starting: prepare_features_tables [input=data/source/example_features.csv, candidates=1]
#> [2026-05-27 08:58:51.841] [INFO ] Prepared 5328 feature-sample pairs
#> [2026-05-27 08:58:51.843] [INFO ] [OK] Completed: prepare_features_tables [n_features=5328] (123ms)
#> [2026-05-27 08:58:51.868] [INFO ] Exporting parameters to: data/interim/params/260527_085851_prepare_features_tables.yaml
#> [2026-05-27 08:58:51.870] [INFO ] > Starting: export_output [file=data/interim/features/example_features.tsv.gz, n_rows=5328]
#> [2026-05-27 08:58:51.884] [INFO ] [OK] Completed: export_output [size_bytes=95629] (14ms)
#> ✔ fea_pre completed [167ms, 95.63 kB]
#> + fea_edg_spe dispatched
#> [2026-05-27 08:58:52.301] [INFO ] > Starting: create_edges_spectra [method=gnps, threshold=0.7, n_input_files=1]
#> [2026-05-27 08:58:52.303] [INFO ] Creating spectral similarity network edges
#> [2026-05-27 08:58:52.304] [INFO ] Importing spectra from: data/source/example_spectra.mgf
#> [2026-05-27 08:58:52.332] [INFO ] Reading MGF file (7.41 MB) with optimized parser: data/source/example_spectra.mgf
#> [2026-05-27 08:58:54.343] [INFO ] Processed 10000 spectra...
#> [2026-05-27 08:58:55.641] [INFO ] Total spectra read: 16282
#> [2026-05-27 08:59:01.775] [INFO ] Loaded 16282 spectra from file
#> [2026-05-27 08:59:01.805] [INFO ] Combining replicate spectra by FEATURE_ID
#> [2026-05-27 08:59:04.572] [INFO ] Combined replicates: 12195 -> 4087 spectra
#> [2026-05-27 08:59:04.607] [INFO ] Sanitizing 4087 spectra (cutoff: 0)
#> [2026-05-27 08:59:05.712] [INFO ] Sanitization complete: 3999/4087 spectra retained (97.8%, 88 removed)
#> [2026-05-27 08:59:05.714] [INFO ] Import complete: 3999 spectra ready for analysis
#> [2026-05-27 08:59:05.715] [INFO ] ======================================
#> [2026-05-27 08:59:05.716] [INFO ] Take yourself a break, you deserve it.
#> [2026-05-27 08:59:05.716] [INFO ] ======================================
#> [2026-05-27 08:59:05.718] [INFO ] > Starting: create_edges [n_spectra=3999, method=gnps, threshold=0.7, min_peaks=6]
#> [2026-05-27 08:59:20.477] [INFO ] Processed 500 / 3998 queries
#> [2026-05-27 08:59:33.156] [INFO ] Processed 1000 / 3998 queries
#> [2026-05-27 08:59:43.875] [INFO ] Processed 1500 / 3998 queries
#> [2026-05-27 08:59:52.563] [INFO ] Processed 2000 / 3998 queries
#> [2026-05-27 08:59:59.254] [INFO ] Processed 2500 / 3998 queries
#> [2026-05-27 09:00:03.951] [INFO ] Processed 3000 / 3998 queries
#> [2026-05-27 09:00:06.710] [INFO ] Processed 3500 / 3998 queries
#> [2026-05-27 09:00:07.645] [INFO ] Here is the distribution of edge similarity scores (0.1 bins) BEFORE filtering:
#> [2026-05-27 09:00:07.647] [INFO ] 
#>        bin       N    Pct
#>    [0,0.1] 5759390 72.05%
#>  (0.1,0.2] 1201848 15.03%
#>  (0.2,0.3]  509850  6.38%
#>  (0.3,0.4]  239674  3.00%
#>  (0.4,0.5]  126023  1.58%
#>  (0.5,0.6]   68810  0.86%
#>  (0.6,0.7]   39955  0.50%
#>  (0.7,0.8]   23824  0.30%
#>  (0.8,0.9]   10727  0.13%
#>    (0.9,1]   13900  0.17%
#> [2026-05-27 09:00:07.651] [INFO ] [OK] Completed: create_edges [n_edges=7265, n_comparisons=7994001, pass_rate=0.1%] (1m 2s)
#> [2026-05-27 09:00:07.732] [INFO ] Exporting parameters to: data/interim/params/260527_090007_create_edges_spectra.yaml
#> [2026-05-27 09:00:07.734] [INFO ] > Starting: export_output [file=data/interim/features/example_edgesSpectra.tsv, n_rows=9905]
#> [2026-05-27 09:00:07.738] [INFO ] [OK] Completed: export_output [size_bytes=454521] (3ms)
#> [2026-05-27 09:00:07.739] [INFO ] [OK] Completed: create_edges_spectra [n_edges=9905] (1m 15s)
#> ✔ fea_edg_spe completed [1m 15.4s, 454.52 kB]
#> + lib_spe_exp_int_pre_pos dispatched
#> ✔ lib_spe_exp_int_pre_pos completed [0ms, 600 B]
#> + lib_spe_exp_int_pre_neg dispatched
#> ✔ lib_spe_exp_int_pre_neg completed [0ms, 600 B]
#> + lib_spe_exp_int_pre_sop dispatched
#> ✔ lib_spe_exp_int_pre_sop completed [1ms, 79 B]
#> + lib_rt_rts dispatched
#> ✔ lib_rt_rts completed [0ms, 86 B]
#> + lib_rt_sop dispatched
#> ✔ lib_rt_sop completed [0ms, 105 B]
#> + ann_spe_pos dispatched
#> [2026-05-27 09:00:10.248] [INFO ] ============================================================
#> [2026-05-27 09:00:10.250] [INFO ] Data Sanitizing: Pre-flight Checks
#> [2026-05-27 09:00:10.250] [INFO ] ============================================================
#> [2026-05-27 09:00:10.251] [INFO ] Checking MGF file...
#> [2026-05-27 09:00:10.739] [INFO ] [OK] MGF file: 12195 MS2 spectra found
#> [2026-05-27 09:00:10.741] [INFO ] ============================================================
#> [2026-05-27 09:00:10.742] [INFO ] [OK] All pre-flight checks passed!
#> [2026-05-27 09:00:10.742] [INFO ] Data validation complete. Ready to proceed.
#> [2026-05-27 09:00:10.743] [INFO ] ============================================================
#> [2026-05-27 09:00:10.744] [INFO ] Starting spectral annotation in pos mode
#> [2026-05-27 09:00:10.745] [INFO ] Importing spectra from: data/source/example_spectra.mgf
#> [2026-05-27 09:00:10.746] [INFO ] Reading MGF file (7.41 MB) with optimized parser: data/source/example_spectra.mgf
#> [2026-05-27 09:00:12.699] [INFO ] Processed 10000 spectra...
#> [2026-05-27 09:00:14.035] [INFO ] Total spectra read: 16282
#> [2026-05-27 09:00:19.969] [INFO ] Loaded 16282 spectra from file
#> [2026-05-27 09:00:19.996] [INFO ] Combining replicate spectra by FEATURE_ID
#> [2026-05-27 09:00:21.115] [INFO ] Combined replicates: 12195 -> 4087 spectra
#> [2026-05-27 09:00:21.156] [INFO ] Sanitizing 4087 spectra (cutoff: 0)
#> [2026-05-27 09:00:22.381] [INFO ] Sanitization complete: 3999/4087 spectra retained (97.8%, 88 removed)
#> [2026-05-27 09:00:22.383] [INFO ] Import complete: 3999 spectra ready for analysis
#> [2026-05-27 09:00:22.384] [INFO ] Importing spectra from: data/interim/libraries/spectra/is/isdbnormansusdat_14854025_pos.rds
#> [2026-05-27 09:00:24.055] [INFO ] Loaded 210419 spectra from file
#> [2026-05-27 09:00:24.211] [INFO ] Import complete: 210419 spectra ready for analysis
#> [2026-05-27 09:00:24.213] [INFO ] Importing spectra from: data/interim/libraries/spectra/is/wikidata_5607185_pos.rds
#> [2026-05-27 09:00:52.358] [INFO ] Loaded 994408 spectra from file
#> [2026-05-27 09:00:53.124] [INFO ] Import complete: 994408 spectra ready for analysis
#> [2026-05-27 09:00:53.126] [INFO ] Importing spectra from: data/interim/libraries/spectra/exp/internal_pos.rds
#> [2026-05-27 09:00:53.127] [INFO ] Loaded 1 spectra from file
#> [2026-05-27 09:00:53.130] [INFO ] Import complete: 0 spectra ready for analysis
#> [2026-05-27 09:00:53.132] [INFO ] Importing spectra from: data/interim/libraries/spectra/exp/gnps_11566051_pos.rds
#> [2026-05-27 09:00:59.567] [INFO ] Loaded 272264 spectra from file
#> [2026-05-27 09:00:59.745] [INFO ] Import complete: 272263 spectra ready for analysis
#> [2026-05-27 09:00:59.747] [INFO ] Importing spectra from: data/interim/libraries/spectra/exp/massbank_202510_pos.rds
#> [2026-05-27 09:01:00.369] [INFO ] Loaded 62855 spectra from file
#> [2026-05-27 09:01:00.415] [INFO ] Import complete: 62855 spectra ready for analysis
#> [2026-05-27 09:01:00.416] [INFO ] Importing spectra from: data/interim/libraries/spectra/exp/merlin_16984129_pos.rds
#> [2026-05-27 09:01:07.217] [INFO ] Loaded 328190 spectra from file
#> [2026-05-27 09:01:07.450] [INFO ] Import complete: 328190 spectra ready for analysis
#> [2026-05-27 09:01:17.431] [INFO ] 
#>              library spectra unique_structures Pct_spectra
#>      ISDB - Wikidata  994408            994393      53.23%
#>               merlin  328190             42486      17.57%
#>                 gnps  272263             22882      14.57%
#>  ISDB - NormanSusDat  210419             87502      11.26%
#>             massbank   62855              7140       3.36%
#> [2026-05-27 09:01:19.219] [INFO ] > Starting: calculate_entropy_similarity [n_library=617491, n_query=3999, method=gnps]
#> [2026-05-27 09:01:19.220] [INFO ] Calculating entropy and similarity for 3999 spectra
#> [2026-05-27 09:01:29.710] [INFO ] Processed 500 / 3999 queries
#> [2026-05-27 09:01:36.286] [INFO ] Processed 1000 / 3999 queries
#> [2026-05-27 09:01:42.056] [INFO ] Processed 1500 / 3999 queries
#> [2026-05-27 09:01:47.302] [INFO ] Processed 2000 / 3999 queries
#> [2026-05-27 09:01:52.306] [INFO ] Processed 2500 / 3999 queries
#> [2026-05-27 09:01:56.555] [INFO ] Processed 3000 / 3999 queries
#> [2026-05-27 09:02:01.389] [INFO ] Processed 3500 / 3999 queries
#> [2026-05-27 09:02:05.032] [INFO ] Processed 3999 / 3999 queries
#> [2026-05-27 09:02:05.073] [INFO ] [OK] Completed: calculate_entropy_similarity [n_comparisons=1213608] (45.9s)
#> [2026-05-27 09:02:05.078] [INFO ] > Starting: harmonize_adducts [n_rows=617491]
#> [2026-05-27 09:02:05.231] [INFO ] [OK] Completed: harmonize_adducts [n_unique_before=70, n_unique_after=66] (153ms)
#> [2026-05-27 09:02:05.907] [INFO ] Here is the distribution of annotation similarity scores (0.1 bins):
#> [2026-05-27 09:02:05.909] [INFO ] 
#>        bin      N    Pct
#>    [0,0.1] 585700 83.68%
#>  (0.1,0.2]  64784  9.26%
#>  (0.2,0.3]  25877  3.70%
#>  (0.3,0.4]  12055  1.72%
#>  (0.4,0.5]   5991  0.86%
#>  (0.5,0.6]   2803  0.40%
#>  (0.6,0.7]   1178  0.17%
#>  (0.7,0.8]    744  0.11%
#>  (0.8,0.9]    545  0.08%
#>    (0.9,1]    274  0.04%
#> [2026-05-27 09:02:05.960] [INFO ] 327617 Candidates annotated on 3827 features (threshold >= 0).
#> [2026-05-27 09:02:05.964] [INFO ] Exporting parameters to: data/interim/params/260527_090205_annotate_spectra.yaml
#> [2026-05-27 09:02:05.966] [INFO ] > Starting: export_output [file=data/interim/annotations/example_spectralMatches_pos.tsv.gz, n_rows=699951]
#> [2026-05-27 09:02:08.483] [INFO ] [OK] Completed: export_output [size_bytes=41615122] (2.5s)
#> ✔ ann_spe_pos completed [1m 58.3s, 41.62 MB]
#> + ann_spe_neg dispatched
#> [2026-05-27 09:02:10.260] [INFO ] ============================================================
#> [2026-05-27 09:02:10.262] [INFO ] Data Sanitizing: Pre-flight Checks
#> [2026-05-27 09:02:10.263] [INFO ] ============================================================
#> [2026-05-27 09:02:10.264] [INFO ] Checking MGF file...
#> [2026-05-27 09:02:10.741] [INFO ] [OK] MGF file: 12195 MS2 spectra found
#> [2026-05-27 09:02:10.742] [INFO ] ============================================================
#> [2026-05-27 09:02:10.743] [INFO ] [OK] All pre-flight checks passed!
#> [2026-05-27 09:02:10.744] [INFO ] Data validation complete. Ready to proceed.
#> [2026-05-27 09:02:10.744] [INFO ] ============================================================
#> [2026-05-27 09:02:10.745] [INFO ] Starting spectral annotation in neg mode
#> [2026-05-27 09:02:10.746] [INFO ] Importing spectra from: data/source/example_spectra.mgf
#> [2026-05-27 09:02:10.747] [INFO ] Reading MGF file (7.41 MB) with optimized parser: data/source/example_spectra.mgf
#> [2026-05-27 09:02:12.613] [INFO ] Processed 10000 spectra...
#> [2026-05-27 09:02:14.020] [INFO ] Total spectra read: 16282
#> [2026-05-27 09:02:19.626] [INFO ] Loaded 16282 spectra from file
#> [2026-05-27 09:02:19.646] [INFO ] Combining replicate spectra by FEATURE_ID
#> [2026-05-27 09:02:19.650] [INFO ] Combined replicates: 0 -> 0 spectra
#> [2026-05-27 09:02:19.688] [WARN ] No spectra to sanitize
#> [2026-05-27 09:02:19.689] [INFO ] Import complete: 0 spectra ready for analysis
#> [2026-05-27 09:02:19.690] [WARN ] No query spectra loaded
#> [2026-05-27 09:02:19.694] [INFO ] Exporting parameters to: data/interim/params/260527_090219_annotate_spectra.yaml
#> [2026-05-27 09:02:19.695] [WARN ] Returning empty annotation template
#> [2026-05-27 09:02:19.698] [INFO ] > Starting: export_output [file=data/interim/annotations/example_spectralMatches_neg.tsv.gz, n_rows=1]
#> [2026-05-27 09:02:19.700] [INFO ] [OK] Completed: export_output [size_bytes=254] (2ms)
#> ✔ ann_spe_neg completed [9.8s, 254 B]
#> + lib_sop_mer dispatched
#> [2026-05-27 09:02:20.539] [INFO ] > Starting: prepare_libraries_sop_merged [n_libraries=25, filter_enabled=FALSE, filter_level=none]
#> [2026-05-27 09:02:25.457] [INFO ] Splitting SOP library into standardized components
#> [2026-05-27 09:02:26.357] [INFO ] > Starting: process_smiles [n_structures=1454371]
#> [2026-05-27 09:02:26.358] [INFO ] Processing SMILES with RDKit
#> [2026-05-27 09:02:35.981] [INFO ] Processing 13 new SMILES with RDKit
#> [2026-05-27 09:02:35.983] [INFO ] Starting SMILES processing pipeline
#> [2026-05-27 09:02:35.983] [INFO ] Input: /tmp/RtmptH52Ax/file267e78710947.smi
#> [2026-05-27 09:02:35.983] [INFO ] Output: /tmp/RtmptH52Ax/file267e4557af24.csv.gz
#> [2026-05-27 09:02:35.983] [INFO ] Input file validated: /tmp/RtmptH52Ax/file267e78710947.smi
#> [2026-05-27 09:02:35.984] [INFO ] Output file validated: /tmp/RtmptH52Ax/file267e4557af24.csv.gz
#> [2026-05-27 09:02:35.984] [INFO ] Processing parameters: workers=8, batch_size=1000, progress_interval=10000
#> [2026-05-27 09:02:35.984] [INFO ] SMILES supplier initialized
#> [09:02:35] Explicit valence for atom # 1 N, 3, is greater than permitted
#> [09:02:35] ERROR: Could not sanitize molecule on line 5
#> [09:02:35] ERROR: Explicit valence for atom # 1 N, 3, is greater than permitted
#> [09:02:35] Explicit valence for atom # 6 C, 5, is greater than permitted
#> [09:02:35] ERROR: Could not sanitize molecule on line 8
#> [09:02:35] ERROR: Explicit valence for atom # 6 C, 5, is greater than permitted
#> [09:02:35] Explicit valence for atom # 31 O, 3, is greater than permitted
#> [09:02:35] ERROR: Could not sanitize molecule on line 9
#> [09:02:35] ERROR: Explicit valence for atom # 31 O, 3, is greater than permitted
#> [09:02:35] Explicit valence for atom # 4 N, 4, is greater than permitted
#> [09:02:35] ERROR: Could not sanitize molecule on line 10
#> [09:02:35] ERROR: Explicit valence for atom # 4 N, 4, is greater than permitted
#> [09:02:35] Explicit valence for atom # 26 N, 4, is greater than permitted
#> [09:02:35] ERROR: Could not sanitize molecule on line 11
#> [09:02:35] ERROR: Explicit valence for atom # 26 N, 4, is greater than permitted
#> [09:02:35] Explicit valence for atom # 0 P, 11, is greater than permitted
#> [09:02:35] ERROR: Could not sanitize molecule on line 12
#> [09:02:35] ERROR: Explicit valence for atom # 0 P, 11, is greater than permitted
#> [09:02:35] Can't kekulize mol.  Unkekulized atoms: 6 7 8 9 10 11 12 13 14
#> [09:02:35] ERROR: Could not sanitize molecule on line 13
#> [09:02:35] ERROR: Can't kekulize mol.  Unkekulized atoms: 6 7 8 9 10 11 12 13 14
#> [09:02:35] Explicit valence for atom # 56 P, 7, is greater than permitted
#> [2026-05-27 09:02:35.995] [WARNING] Failed to process SMILES 'CC(C)=CCCC(C)=CCCC(C)=CCCC(C)=CCCC(C)=CCCC(C)=CCCC(C)=CCCC(C)=CCCC(C)=CCCC(C)=CCCC(C)=CCO[P-]([O])(=O)=O': Explicit valence for atom # 56 P, 7, is greater than permitted
#> [09:02:35] Explicit valence for atom # 4 P, 7, is greater than permitted
#> [2026-05-27 09:02:35.996] [WARNING] Failed to process SMILES '[H][C@](O)(CO[P-]([O])(=O)=O)C=O': Explicit valence for atom # 4 P, 7, is greater than permitted
#> [2026-05-27 09:02:35.996] [WARNING] Batch processing: 2/6 molecules failed
#> [2026-05-27 09:02:35.996] [INFO ] Processing complete. Total molecules processed: 4
#> [2026-05-27 09:02:36.026] [INFO ] Successfully processed 4 SMILES
#> [2026-05-27 09:02:43.594] [INFO ] [OK] Completed: process_smiles [n_processed=1393401] (17.2s)
#> [2026-05-27 09:03:04.870] [INFO ] Referenced structure-organism pairs (762,348)
#> [2026-05-27 09:03:10.832] [INFO ] Structures: 305,187 stereoisomers, 1,019,524 without stereochemistry, 1,099,018 constitutional isomers
#> [2026-05-27 09:03:28.918] [INFO ] Unique organisms (37,468)
#> [2026-05-27 09:03:29.040] [INFO ] Processing 813 organism name(s) for OTT taxonomy lookup
#> [2026-05-27 09:03:29.508] [INFO ] Querying OTT API in 9 batches
#> [2026-05-27 09:03:34.213] [INFO ] Retrieving detailed taxonomy for 4 unique OTT IDs
#> [2026-05-27 09:03:35.000] [INFO ] Got OTTaxonomy!
#> [2026-05-27 09:03:35.326] [INFO ] Enriching NPClassifier taxonomy from additional cache: data/interim/libraries/sop/merged/structures/taxonomies/npc.tsv.gz
#> [2026-05-27 09:03:44.833] [INFO ] Enriched NPClassifier taxonomy with 1104178 entries from additional cache (1104178 missing keys matched)
#> [2026-05-27 09:03:51.037] [INFO ] Updated additional NPClassifier cache (1783925 total entries): data/interim/libraries/sop/merged/structures/taxonomies/npc.tsv.gz
#> [2026-05-27 09:03:51.098] [INFO ] Enriching ClassyFire taxonomy from additional cache: data/interim/libraries/sop/merged/structures/taxonomies/classyfire_cache.csv
#> [2026-05-27 09:03:55.502] [INFO ] Enriched ClassyFire taxonomy with 63481 entries from additional cache (63481 missing keys matched)
#> [2026-05-27 09:03:57.278] [INFO ] Updated additional ClassyFire cache (1106056 total entries): data/interim/libraries/sop/merged/structures/taxonomies/classyfire_cache.csv
#> [2026-05-27 09:03:57.304] [INFO ] Exporting parameters to: data/interim/params/260527_090357_prepare_libraries_sop_merged.yaml
#> [2026-05-27 09:03:57.306] [INFO ] > Starting: export_output [file=data/interim/libraries/sop/merged/keys.tsv.gz, n_rows=762348]
#> [2026-05-27 09:03:58.559] [INFO ] [OK] Completed: export_output [size_bytes=19171503] (1.3s)
#> [2026-05-27 09:03:58.561] [INFO ] > Starting: export_output [file=data/interim/libraries/sop/merged/organisms/taxonomies/ott.tsv.gz, n_rows=36757]
#> [2026-05-27 09:03:58.655] [INFO ] [OK] Completed: export_output [size_bytes=1013366] (93ms)
#> [2026-05-27 09:03:58.657] [INFO ] > Starting: export_output [file=data/interim/libraries/sop/merged/structures/canonical.tsv.gz, n_rows=1392738]
#> [2026-05-27 09:04:01.853] [INFO ] [OK] Completed: export_output [size_bytes=25390801] (3.2s)
#> [2026-05-27 09:04:01.855] [INFO ] > Starting: export_output [file=data/interim/libraries/sop/merged/structures/stereo.tsv.gz, n_rows=1324711]
#> [2026-05-27 09:04:06.580] [INFO ] [OK] Completed: export_output [size_bytes=60303278] (4.7s)
#> [2026-05-27 09:04:06.582] [INFO ] > Starting: export_output [file=data/interim/libraries/sop/merged/structures/metadata.tsv.gz, n_rows=1099432]
#> [2026-05-27 09:04:07.769] [INFO ] [OK] Completed: export_output [size_bytes=21867089] (1.2s)
#> [2026-05-27 09:04:07.771] [INFO ] > Starting: export_output [file=data/interim/libraries/sop/merged/structures/taxonomies/classyfire.tsv.gz, n_rows=287664]
#> [2026-05-27 09:04:08.046] [INFO ] [OK] Completed: export_output [size_bytes=5750603] (274ms)
#> [2026-05-27 09:04:08.048] [INFO ] > Starting: export_output [file=data/interim/libraries/sop/merged/structures/taxonomies/npc.tsv.gz, n_rows=1324796]
#> [2026-05-27 09:04:10.284] [INFO ] [OK] Completed: export_output [size_bytes=19160105] (2.2s)
#> [2026-05-27 09:04:10.286] [INFO ] [OK] Completed: prepare_libraries_sop_merged [n_pairs=762348, n_structures=1324711, n_organisms=36757, files_exported=7] (1m 50s)
#> ✔ lib_sop_mer completed [1m 49.8s, 152.66 MB]
#> + lib_mer_str_met dispatched
#> ✔ lib_mer_str_met completed [0ms, 21.87 MB]
#> + lib_mer_str_stereo dispatched
#> ✔ lib_mer_str_stereo completed [0ms, 60.30 MB]
#> + lib_mer_str_tax_cla dispatched
#> ✔ lib_mer_str_tax_cla completed [0ms, 5.75 MB]
#> + lib_mer_str_tax_npc dispatched
#> ✔ lib_mer_str_tax_npc completed [0ms, 19.16 MB]
#> + lib_mer_org_tax_ott dispatched
#> ✔ lib_mer_org_tax_ott completed [1ms, 1.01 MB]
#> + lib_mer_key dispatched
#> ✔ lib_mer_key completed [0ms, 19.17 MB]
#> + ann_spe_exp_mzt_pre dispatched
#> [2026-05-27 09:04:14.521] [WARN ] No mzTab input provided for prepare_annotations_mztab, exporting empty annotations
#> [2026-05-27 09:04:14.523] [INFO ] > Starting: export_output [file=data/interim/annotations/example_mztabPrepared.tsv.gz, n_rows=1]
#> [2026-05-27 09:04:14.525] [INFO ] [OK] Completed: export_output [size_bytes=254] (2ms)
#> ✔ ann_spe_exp_mzt_pre completed [7ms, 254 B]
#> + ann_spe_pre dispatched
#> [2026-05-27 09:04:14.894] [INFO ] Preparing spectral matching annotations from 2 file(s)
#> [2026-05-27 09:04:21.980] [INFO ] > Starting: process_smiles [n_structures=327617]
#> [2026-05-27 09:04:21.981] [INFO ] Processing SMILES with RDKit
#> [2026-05-27 09:04:32.815] [INFO ] All SMILES already in cache, no processing needed
#> [2026-05-27 09:04:46.397] [INFO ] > Starting: complement_metadata [n_input=699951]
#> [2026-05-27 09:05:24.393] [INFO ] [OK] Completed: complement_metadata [n_enriched=699951] (38s)
#> [2026-05-27 09:05:24.411] [INFO ] Exporting parameters to: data/interim/params/260527_090524_prepare_annotations_spectra.yaml
#> [2026-05-27 09:05:24.413] [INFO ] > Starting: export_output [file=data/interim/annotations/example_spectralMatchesPrepared.tsv.gz, n_rows=699951]
#> [2026-05-27 09:05:28.195] [INFO ] [OK] Completed: export_output [size_bytes=76563821] (3.8s)
#> ✔ ann_spe_pre completed [1m 13.3s, 76.56 MB]
#> + ann_spe_exp_mzm_pre dispatched
#> [2026-05-27 09:05:29.172] [INFO ] > Starting: prepare_annotations_mzmine [n_files=1]
#> [2026-05-27 09:05:29.173] [WARN ] No mzmine annotations found, returning an empty file instead
#> [2026-05-27 09:05:29.175] [INFO ] [OK] Completed: prepare_annotations_mzmine [n_annotations=1] (4ms)
#> [2026-05-27 09:05:29.190] [INFO ] Exporting parameters to: data/interim/params/260527_090529_prepare_annotations_mzmine.yaml
#> [2026-05-27 09:05:29.192] [INFO ] > Starting: export_output [file=data/interim/annotations/example_mzminePrepared.tsv.gz, n_rows=1]
#> [2026-05-27 09:05:29.193] [INFO ] [OK] Completed: export_output [size_bytes=254] (1ms)
#> ✔ ann_spe_exp_mzm_pre completed [27ms, 254 B]
#> + ann_spe_exp_gnp_pre dispatched
#> [2026-05-27 09:05:29.617] [INFO ] > Starting: prepare_annotations_gnps [n_files=1]
#> [2026-05-27 09:05:29.618] [WARN ] No GNPS annotations found, returning an empty file instead
#> [2026-05-27 09:05:29.621] [INFO ] [OK] Completed: prepare_annotations_gnps [n_annotations=1] (4ms)
#> [2026-05-27 09:05:29.640] [INFO ] Exporting parameters to: data/interim/params/260527_090529_prepare_annotations_gnps.yaml
#> [2026-05-27 09:05:29.642] [INFO ] > Starting: export_output [file=data/interim/annotations/example_gnpsPrepared.tsv.gz, n_rows=1]
#> [2026-05-27 09:05:29.643] [INFO ] [OK] Completed: export_output [size_bytes=254] (1ms)
#> ✔ ann_spe_exp_gnp_pre completed [32ms, 254 B]
#> + ann_sir_pre dispatched
#> [2026-05-27 09:05:30.018] [INFO ] > Starting: prepare_annotations_sirius [version=6]
#> [2026-05-27 09:05:30.170] [INFO ] > Starting: process_smiles [n_structures=2563]
#> [2026-05-27 09:05:30.171] [INFO ] Processing SMILES with RDKit
#> [2026-05-27 09:05:36.138] [INFO ] Processing 9 new SMILES with RDKit
#> [2026-05-27 09:05:36.140] [INFO ] Starting SMILES processing pipeline
#> [2026-05-27 09:05:36.140] [INFO ] Input: /tmp/RtmptH52Ax/file267e3dbb7495.smi
#> [2026-05-27 09:05:36.140] [INFO ] Output: /tmp/RtmptH52Ax/file267e78882cd0.csv.gz
#> [2026-05-27 09:05:36.140] [INFO ] Input file validated: /tmp/RtmptH52Ax/file267e3dbb7495.smi
#> [2026-05-27 09:05:36.140] [INFO ] Output file validated: /tmp/RtmptH52Ax/file267e78882cd0.csv.gz
#> [2026-05-27 09:05:36.140] [INFO ] Processing parameters: workers=8, batch_size=1000, progress_interval=10000
#> [2026-05-27 09:05:36.140] [INFO ] SMILES supplier initialized
#> [09:05:36] Explicit valence for atom # 8 Cl, 3, is greater than permitted
#> [09:05:36] ERROR: Could not sanitize molecule on line 1
#> [09:05:36] ERROR: Explicit valence for atom # 8 Cl, 3, is greater than permitted
#> [09:05:36] Explicit valence for atom # 4 P, 7, is greater than permitted
#> [09:05:36] ERROR: Could not sanitize molecule on line 2
#> [09:05:36] ERROR: Explicit valence for atom # 4 P, 7, is greater than permitted
#> [09:05:36] Explicit valence for atom # 2 P, 7, is greater than permitted
#> [09:05:36] ERROR: Could not sanitize molecule on line 3
#> [09:05:36] ERROR: Explicit valence for atom # 2 P, 7, is greater than permitted
#> [09:05:36] Explicit valence for atom # 4 P, 7, is greater than permitted
#> [09:05:36] ERROR: Could not sanitize molecule on line 4
#> [09:05:36] ERROR: Explicit valence for atom # 4 P, 7, is greater than permitted
#> [09:05:36] Explicit valence for atom # 2 P, 7, is greater than permitted
#> [09:05:36] ERROR: Could not sanitize molecule on line 5
#> [09:05:36] ERROR: Explicit valence for atom # 2 P, 7, is greater than permitted
#> [09:05:36] Explicit valence for atom # 6 P, 7, is greater than permitted
#> [09:05:36] ERROR: Could not sanitize molecule on line 6
#> [09:05:36] ERROR: Explicit valence for atom # 6 P, 7, is greater than permitted
#> [09:05:36] Explicit valence for atom # 6 P, 7, is greater than permitted
#> [09:05:36] ERROR: Could not sanitize molecule on line 7
#> [09:05:36] ERROR: Explicit valence for atom # 6 P, 7, is greater than permitted
#> [09:05:36] Explicit valence for atom # 4 P, 7, is greater than permitted
#> [09:05:36] ERROR: Could not sanitize molecule on line 8
#> [09:05:36] ERROR: Explicit valence for atom # 4 P, 7, is greater than permitted
#> [09:05:36] Explicit valence for atom # 2 P, 7, is greater than permitted
#> [09:05:36] ERROR: Could not sanitize molecule on line 9
#> [09:05:36] ERROR: Explicit valence for atom # 2 P, 7, is greater than permitted
#> [2026-05-27 09:05:36.141] [INFO ] Processing complete. Total molecules processed: 0
#> [2026-05-27 09:05:36.171] [INFO ] Successfully processed 0 SMILES
#> [2026-05-27 09:05:39.337] [INFO ] [OK] Completed: process_smiles [n_processed=2555] (9.2s)
#> [2026-05-27 09:05:39.365] [INFO ] > Starting: complement_metadata [n_input=2571]
#> [2026-05-27 09:05:52.850] [INFO ] [OK] Completed: complement_metadata [n_enriched=2571] (13.5s)
#> [2026-05-27 09:05:52.862] [INFO ] [OK] Completed: prepare_annotations_sirius [n_canopus=15, n_formulas=18, n_structures=2571] (22.8s)
#> [2026-05-27 09:05:52.888] [INFO ] Exporting parameters to: data/interim/params/260527_090552_prepare_annotations_sirius.yaml
#> [2026-05-27 09:05:52.890] [INFO ] > Starting: export_output [file=data/interim/annotations/example_canopusPrepared.tsv.gz, n_rows=15]
#> [2026-05-27 09:05:52.892] [INFO ] [OK] Completed: export_output [size_bytes=830] (1ms)
#> [2026-05-27 09:05:52.893] [INFO ] > Starting: export_output [file=data/interim/annotations/example_formulaPrepared.tsv.gz, n_rows=18]
#> [2026-05-27 09:05:52.894] [INFO ] [OK] Completed: export_output [size_bytes=514] (1ms)
#> [2026-05-27 09:05:52.896] [INFO ] > Starting: export_output [file=data/interim/annotations/example_siriusPrepared.tsv.gz, n_rows=2571]
#> [2026-05-27 09:05:52.909] [INFO ] [OK] Completed: export_output [size_bytes=97221] (14ms)
#> ✔ ann_sir_pre completed [22.9s, 98.56 kB]
#> + tax_pre dispatched
#> [2026-05-27 09:05:55.014] [INFO ] > Starting: prepare_taxa [taxon=NULL]
#> [2026-05-27 09:05:55.178] [INFO ] Processing 2 organism name(s) for OTT taxonomy lookup
#> [2026-05-27 09:05:55.621] [INFO ] Querying OTT API in 1 batches
#> [2026-05-27 09:05:55.835] [INFO ] Retrying failed queries using genus names only
#> [2026-05-27 09:05:55.842] [INFO ] Retrying with 1 genus names: blk 
#> [2026-05-27 09:05:56.047] [INFO ] Retrieving detailed taxonomy for 1 unique OTT IDs
#> [2026-05-27 09:05:56.175] [INFO ] Got OTTaxonomy!
#> [2026-05-27 09:05:56.632] [INFO ] [OK] Completed: prepare_taxa [n_features=5328] (1.6s)
#> [2026-05-27 09:05:56.664] [INFO ] Exporting parameters to: data/interim/params/260527_090556_prepare_taxa.yaml
#> [2026-05-27 09:05:56.666] [INFO ] > Starting: export_output [file=data/interim/taxa/example_taxed.tsv.gz, n_rows=5328]
#> [2026-05-27 09:05:56.673] [INFO ] [OK] Completed: export_output [size_bytes=19697] (7ms)
#> ✔ tax_pre completed [1.7s, 19.70 kB]
#> + ann_ms1_pre dispatched
#> [2026-05-27 09:05:57.161] [INFO ] > Starting: annotate_masses [ms_mode=pos, tolerance_ppm=10, tolerance_rt=0.02]
#> [2026-05-27 09:05:57.162] [INFO ] Starting mass-based annotation
#> [2026-05-27 09:05:57.163] [INFO ] ============================================================
#> [2026-05-27 09:05:57.164] [INFO ] Data Sanitizing: Pre-flight Checks
#> [2026-05-27 09:05:57.165] [INFO ] ============================================================
#> [2026-05-27 09:05:57.166] [INFO ] Checking features file...
#> [2026-05-27 09:05:57.204] [INFO ] [OK] Features file: 5328 rows, 5 columns
#> [2026-05-27 09:05:57.205] [INFO ] ============================================================
#> [2026-05-27 09:05:57.206] [INFO ] [OK] All pre-flight checks passed!
#> [2026-05-27 09:05:57.207] [INFO ] Data validation complete. Ready to proceed.
#> [2026-05-27 09:05:57.208] [INFO ] ============================================================
#> [2026-05-27 09:05:57.246] [INFO ] Processing 5328 features for annotation
#> [2026-05-27 09:05:57.247] [INFO ] > Starting: harmonize_adducts [n_rows=5328]
#> [2026-05-27 09:05:57.268] [INFO ] [OK] Completed: harmonize_adducts [n_unique_before=13, n_unique_after=13] (20ms)
#> [2026-05-27 09:05:57.286] [INFO ] > Starting: harmonize_adducts [n_rows=2112]
#> [2026-05-27 09:05:57.307] [INFO ] [OK] Completed: harmonize_adducts [n_unique_before=13, n_unique_after=13] (20ms)
#> [2026-05-27 09:05:57.308] [INFO ] Pre-assigned adducts kept as hypotheses alongside the [M+H]+ baseline: 2112
#> [2026-05-27 09:06:19.670] [INFO ] Here are the top 10 observed m/z differences inside the RT windows:
#> [2026-05-27 09:06:19.672] [INFO ] 
#>              bin   N    Pct
#>  (4.8501,5.0366] 352 19.30%
#>  (21.822,22.009] 283 15.52%
#>   (16.973,17.16] 208 11.40%
#>  (17.906,18.092] 192 10.53%
#>  (15.854,16.041] 172  9.43%
#>    (39.914,40.1] 143  7.84%
#>  (38.981,39.168] 137  7.51%
#>  (34.878,35.065] 115  6.30%
#>  (77.962,78.148] 114  6.25%
#>  (1.8659,2.0524] 108  5.92%
#> [2026-05-27 09:06:19.960] [INFO ] Evidence engine: 5328 features x 130 adducts (prefilter=on, cap=130)
#> [2026-05-27 09:06:20.132] [INFO ] Evidence engine candidate materialization: 274963 rows
#> [2026-05-27 09:06:30.084] [INFO ] Evidence engine complete: 19708 rows, 9996 supported clusters
#> [2026-05-27 09:06:34.594] [INFO ] > Starting: harmonize_adducts [n_rows=6738]
#> [2026-05-27 09:06:34.607] [INFO ] [OK] Completed: harmonize_adducts [n_unique_before=8, n_unique_after=8] (13ms)
#> [2026-05-27 09:06:34.950] [INFO ] Pairwise-support filter removed 10730 modifier-bearing evidence hypothesis row(s) lacking direct adduct/cluster/loss support.
#> [2026-05-27 09:06:34.979] [INFO ] Evidence-based discovery added 1477 adduct edge(s)
#> [2026-05-27 09:06:35.003] [INFO ] > Starting: harmonize_adducts [n_rows=1343]
#> [2026-05-27 09:06:35.037] [INFO ] [OK] Completed: harmonize_adducts [n_unique_before=22, n_unique_after=20] (34ms)
#> [2026-05-27 09:06:35.058] [INFO ] > Starting: harmonize_adducts [n_rows=15878]
#> [2026-05-27 09:06:35.116] [INFO ] [OK] Completed: harmonize_adducts [n_unique_before=36, n_unique_after=31] (58ms)
#> [2026-05-27 09:06:35.224] [INFO ] > Starting: harmonize_adducts [n_rows=2504]
#> [2026-05-27 09:06:35.909] [INFO ] [OK] Completed: harmonize_adducts [n_unique_before=177, n_unique_after=177] (685ms)
#> [2026-05-27 09:06:36.193] [INFO ] Constrained multi-adduct expansion kept 1789 hypothesis row(s)
#> [2026-05-27 09:06:36.535] [INFO ] Network-consensus pruning dropped 1218 (feature, adduct) candidate(s) with zero adduct-graph support when a supported alternative existed.
#> [2026-05-27 09:06:38.847] [INFO ] Annotation/edge adduct agreement removed 1029 unsupported (feature, adduct) assignment(s).
#> [2026-05-27 09:07:24.654] [INFO ] Conflict-resolution filter removed 3789 annotation row(s) with states incompatible with graph-consistent evidence.
#> [2026-05-27 09:07:24.656] [INFO ] Conflict-resolution pruning touched 1950 feature(s) and removed all annotations from 0 feature(s).
#> [2026-05-27 09:07:24.700] [INFO ] Coverage audit: kept 174485/208144 annotation rows across 5328/5328 features; 33659 annotation rows were pruned from 1950 feature(s).
#> [2026-05-27 09:07:24.702] [INFO ] > Starting: decorate_masses [n_annotations=174485]
#> [2026-05-27 09:07:24.746] [INFO ] MS1 annotations: 50096 unique structures across 4502 features
#> [2026-05-27 09:07:24.748] [INFO ] [OK] Completed: decorate_masses [n_structures=50096, n_features=4502] (46ms)
#> [2026-05-27 09:07:24.829] [INFO ] Breakdown of the annotated adduct species (library-matched):
#> [2026-05-27 09:07:24.832] [INFO ] 
#>              adduct N_features N_annotations Pct_features Pct_annotations
#>              [M+H]+       3065        100418       54.42%          57.82%
#>             [M+Na]+        696         21205       12.36%          12.21%
#>            [M+H4N]+        530         14806        9.41%           8.53%
#>          [M-H2O+H]+        208          9796        3.69%           5.64%
#>              [M+K]+        152          3380        2.70%           1.95%
#>            [M+Ca]2+         72          1013        1.28%           0.58%
#>            [2M+Na]+         64          3600        1.14%           2.07%
#>           [2M+H4N]+         64          2207        1.14%           1.27%
#>                [M]+         62           506        1.10%           0.29%
#>       [M+C2H6OS+H]+         59          2312        1.05%           1.33%
#>             [2M+H]+         59          1767        1.05%           1.02%
#>          [M-H2+Fe]+         59           900        1.05%           0.52%
#>           [M-H+Fe]+         56           962        0.99%           0.55%
#>           [2M+Ca]2+         53          1009        0.94%           0.58%
#>             [2M+K]+         38           668        0.67%           0.38%
#>           [2M+Fe]2+         34           413        0.60%           0.24%
#>             [M+Cu]+         34           160        0.60%           0.09%
#>          [M-H+2Na]+         22           676        0.39%           0.39%
#>      [M-C6H10O5+H]+         22           329        0.39%           0.19%
#>           [2M+Mg]2+         22           296        0.39%           0.17%
#>            [M-O+H]+         21          1119        0.37%           0.64%
#>            [M+H2]2+         18           179        0.32%           0.10%
#>            [M+Fe]2+         18           150        0.32%           0.09%
#>        [M+C2H7N+H]+         16           784        0.28%           0.45%
#>      [M-C6H12O6+H]+         16           283        0.28%           0.16%
#>         [M-H4O2+H]+         14           372        0.25%           0.21%
#>      [M-C6H10O4+H]+         12           210        0.21%           0.12%
#>            [M+Mg]2+         11           130        0.20%           0.07%
#>  [M-C6H10O5-H2O+H]+          8           122        0.14%           0.07%
#>        [M+C2H3N+H]+          7           544        0.12%           0.31%
#>           [M-CO+H]+          7           503        0.12%           0.29%
#>         [M-C2O2+H]+          7           469        0.12%           0.27%
#>         [2M-H2O+H]+          6           631        0.11%           0.36%
#>         [M-H6O3+H]+          6           144        0.11%           0.08%
#>       [M-C5H8O4+H]+          6           104        0.11%           0.06%
#>    [M-C12H20O10+H]+          6            68        0.11%           0.04%
#>  [M-C6H12O6-H2O+H]+          6            44        0.11%           0.03%
#>          [M-CO2+H]+          5           164        0.09%           0.09%
#>            [M+2H]2+          5           118        0.09%           0.07%
#>         [M-2H2O+H]+          5           110        0.09%           0.06%
#>       [M-C6H6O3+H]+          4            50        0.07%           0.03%
#>       [M+C2H3N+Na]+          3            96        0.05%           0.06%
#>       [M-C3H6O3+H]+          3            43        0.05%           0.02%
#>         [M-CHO2+H]+          3             4        0.05%           0.00%
#>    [M-C6H10O4+H4N]+          2            66        0.04%           0.04%
#>      [M+C2H6OS+Na]+          2            50        0.04%           0.03%
#>            [M-O+K]+          2            27        0.04%           0.02%
#>  [M-C6H10O4-H2O+H]+          2            17        0.04%           0.01%
#>         [M-H2O+Na]+          2            17        0.04%           0.01%
#>       [M-C8H8O3+H]+          2            14        0.04%           0.01%
#>        [M-H2O+H4N]+          2            14        0.04%           0.01%
#>       [M+C2H6OS+K]+          2             3        0.04%           0.00%
#>     [M-C6H10O5+Cu]+          2             3        0.04%           0.00%
#>   [M-H2O+C2H6OS+H]+          1           163        0.02%           0.09%
#>       [M-CO-H2O+H]+          1           111        0.02%           0.06%
#>           [M-O+Na]+          1            62        0.02%           0.04%
#>        [M-CH6O3+H]+          1            57        0.02%           0.03%
#>    [M-H+C2H6OS+Fe]+          1            29        0.02%           0.02%
#>   [M-C6H6O3-H2O+H]+          1            19        0.02%           0.01%
#>        [M-CO2+Fe]2+          1            13        0.02%           0.01%
#>     [M-C3H6O3+H4N]+          1            12        0.02%           0.01%
#>     [2M-C3H6O3+Na]+          1            11        0.02%           0.01%
#>      [M+2C2H6OS+H]+          1            11        0.02%           0.01%
#>     [M-C6H10O5+Na]+          1            11        0.02%           0.01%
#>          [M-O+H4N]+          1            11        0.02%           0.01%
#>      [M-C3H6O3+Na]+          1            10        0.02%           0.01%
#>      [M-C6H6O3+Na]+          1             9        0.02%           0.01%
#>        [M-H2O-O+H]+          1             9        0.02%           0.01%
#>      [M-C8H8O3+Na]+          1             7        0.02%           0.00%
#>        [2M-2H2O+H]+          1             5        0.02%           0.00%
#>     [M-C6H12O6+Na]+          1             5        0.02%           0.00%
#>    [2M-C6H12O6+Na]+          1             4        0.02%           0.00%
#>          [M-CO+Na]+          1             4        0.02%           0.00%
#>     [2M-C6H10O5+K]+          1             3        0.02%           0.00%
#>         [M-H2O+Cu]+          1             3        0.02%           0.00%
#>        [M-H2O4S+H]+          1             3        0.02%           0.00%
#>     [M-C12H20O8+H]+          1             2        0.02%           0.00%
#>      [M-C5H8O4+Na]+          1             2        0.02%           0.00%
#>       [M-C6H8O4+H]+          1             2        0.02%           0.00%
#>       [M-C8H8O4+H]+          1             2        0.02%           0.00%
#>        [M-CO2+H2]2+          1             2        0.02%           0.00%
#>       [M+C2H7N+Na]+          1             1        0.02%           0.00%
#>        [M-CO2+H4N]+          1             1        0.02%           0.00%
#> [2026-05-27 09:07:24.840] [INFO ] Adduct hypotheses retained without library match (by source):
#> [2026-05-27 09:07:24.842] [INFO ] 
#>       source N_features N_adduct_types
#>     baseline        625              1
#>         pair        152             12
#>  preassigned         35             11
#>         loss          9              4
#>      cluster          4              1
#>     evidence          1              1
#> [2026-05-27 09:07:24.912] [INFO ] Exporting parameters to: data/interim/params/260527_090724_annotate_masses.yaml
#> [2026-05-27 09:07:24.914] [INFO ] > Starting: export_output [file=data/interim/features/example_edgesMasses.tsv, n_rows=5574]
#> [2026-05-27 09:07:24.916] [INFO ] [OK] Completed: export_output [size_bytes=110593] (2ms)
#> [2026-05-27 09:07:26.128] [INFO ] > Starting: process_smiles [n_structures=50103]
#> [2026-05-27 09:07:26.130] [INFO ] Processing SMILES with RDKit
#> [2026-05-27 09:07:33.432] [INFO ] Processing 184 new SMILES with RDKit
#> [2026-05-27 09:07:33.435] [INFO ] Starting SMILES processing pipeline
#> [2026-05-27 09:07:33.435] [INFO ] Input: /tmp/RtmptH52Ax/file267e2b123cfb.smi
#> [2026-05-27 09:07:33.435] [INFO ] Output: /tmp/RtmptH52Ax/file267e2fec59b2.csv.gz
#> [2026-05-27 09:07:33.435] [INFO ] Input file validated: /tmp/RtmptH52Ax/file267e2b123cfb.smi
#> [2026-05-27 09:07:33.435] [INFO ] Output file validated: /tmp/RtmptH52Ax/file267e2fec59b2.csv.gz
#> [2026-05-27 09:07:33.435] [INFO ] Processing parameters: workers=8, batch_size=1000, progress_interval=10000
#> [2026-05-27 09:07:33.436] [INFO ] SMILES supplier initialized
#> [2026-05-27 09:07:33.767] [INFO ] Processing complete. Total molecules processed: 184
#> [2026-05-27 09:07:33.815] [INFO ] Successfully processed 184 SMILES
#> [2026-05-27 09:07:37.427] [INFO ] [OK] Completed: process_smiles [n_processed=50107] (11.3s)
#> [2026-05-27 09:07:39.714] [INFO ] > Starting: complement_metadata [n_input=174485]
#> [2026-05-27 09:08:02.508] [INFO ] [OK] Completed: complement_metadata [n_enriched=174485] (22.8s)
#> [2026-05-27 09:08:02.511] [INFO ] > Starting: export_output [file=data/interim/annotations/example_ms1Prepared.tsv.gz, n_rows=174485]
#> [2026-05-27 09:08:03.407] [INFO ] [OK] Completed: export_output [size_bytes=13105577] (896ms)
#> [2026-05-27 09:08:03.409] [INFO ] > Starting: export_output [file=data/interim/annotations/example_ms1Prepared_coverage.tsv.gz, n_rows=12]
#> [2026-05-27 09:08:03.410] [INFO ] [OK] Completed: export_output [size_bytes=291] (1ms)
#> [2026-05-27 09:08:03.411] [INFO ] Coverage summary written to: data/interim/annotations/example_ms1Prepared_coverage.tsv.gz
#> [2026-05-27 09:08:03.412] [INFO ] [OK] Completed: annotate_masses [n_annotations=174485, n_edges=5574] (2m 6s)
#> ✔ ann_ms1_pre completed [2m 6.3s, 13.22 MB]
#> + ann_sir_pre_can dispatched
#> ✔ ann_sir_pre_can completed [0ms, 830 B]
#> + ann_sir_pre_str dispatched
#> ✔ ann_sir_pre_str completed [0ms, 97.22 kB]
#> + ann_sir_pre_for dispatched
#> ✔ ann_sir_pre_for completed [0ms, 514 B]
#> + ann_ms1_pre_edg dispatched
#> ✔ ann_ms1_pre_edg completed [0ms, 110.59 kB]
#> + ann_ms1_pre_ann dispatched
#> ✔ ann_ms1_pre_ann completed [0ms, 13.11 MB]
#> + fea_edg_pre dispatched
#> [2026-05-27 09:08:07.325] [INFO ] > Starting: prepare_features_edges [n_edge_types=2]
#> [2026-05-27 09:08:07.373] [INFO ] [OK] Completed: prepare_features_edges [n_edges=13676] (47ms)
#> [2026-05-27 09:08:07.394] [INFO ] Exporting parameters to: data/interim/params/260527_090807_prepare_features_edges.yaml
#> [2026-05-27 09:08:07.396] [INFO ] > Starting: export_output [file=data/interim/features/example_edges.tsv, n_rows=13676]
#> [2026-05-27 09:08:07.399] [INFO ] [OK] Completed: export_output [size_bytes=610790] (3ms)
#> ✔ fea_edg_pre completed [76ms, 610.79 kB]
#> + ann_fil dispatched
#> [2026-05-27 09:08:07.796] [INFO ] > Starting: filter_annotations [n_annotation_files=6, tolerance_rt=Inf]
#> [2026-05-27 09:08:07.797] [INFO ] Filtering annotations
#> [2026-05-27 09:08:07.838] [INFO ] Processing 5328 unique features for annotation filtering
#> [2026-05-27 09:08:15.042] [INFO ] Removing MS1 annotations superseded by quality spectral matches
#> [2026-05-27 09:08:19.533] [INFO ] Removed 10387 redundant MS1 annotations
#> [2026-05-27 09:08:19.534] [INFO ] Total annotations after MS1 deduplication: 866623
#> [2026-05-27 09:08:21.519] [INFO ] Joining RT library and computing RT deltas
#> [2026-05-27 09:08:24.554] [INFO ] Removed 2346 duplicate RT library matches (keeping best match per annotation)
#> [2026-05-27 09:08:24.557] [INFO ] RT deltas computed for 0 annotations (no hard cutoff applied; scoring handles RT penalty)
#> [2026-05-27 09:08:24.559] [INFO ] Removed 2346 duplicate RT library matches during join
#> [2026-05-27 09:08:24.878] [INFO ] Exporting parameters to: data/interim/params/260527_090824_filter_annotations.yaml
#> [2026-05-27 09:08:24.880] [INFO ] > Starting: export_output [file=data/interim/annotations/example_annotationsFiltered.tsv.gz, n_rows=864274]
#> [2026-05-27 09:08:29.190] [INFO ] [OK] Completed: export_output [size_bytes=76930734] (4.3s)
#> [2026-05-27 09:08:29.191] [INFO ] [OK] Completed: filter_annotations [n_filtered=864274] (21.4s)
#> ✔ ann_fil completed [21.4s, 76.93 MB]
#> + fea_com dispatched
#> [2026-05-27 09:08:30.273] [INFO ] > Starting: create_components [n_input_files=1]
#> [2026-05-27 09:08:30.274] [INFO ] Creating components from 1 edge file(s)
#> [2026-05-27 09:08:30.286] [INFO ] Loaded 13108 edges connecting 5328 unique features
#> [2026-05-27 09:08:30.298] [INFO ] Found 2533 components
#> [2026-05-27 09:08:30.312] [INFO ] Component sizes - Min: 1, Max: 1784, Mean: 2.1
#> [2026-05-27 09:08:30.328] [INFO ] Exporting parameters to: data/interim/params/260527_090830_create_components.yaml
#> [2026-05-27 09:08:30.329] [INFO ] > Starting: export_output [file=data/interim/features/example_components.tsv, n_rows=5328]
#> [2026-05-27 09:08:30.331] [INFO ] [OK] Completed: export_output [size_bytes=47868] (2ms)
#> [2026-05-27 09:08:30.332] [INFO ] Components written to: data/interim/features/example_components.tsv
#> [2026-05-27 09:08:30.333] [INFO ] [OK] Completed: create_components [n_components=2533, n_features=5328] (60ms)
#> ✔ fea_com completed [64ms, 47.87 kB]
#> + fea_com_pre dispatched
#> [2026-05-27 09:08:30.752] [INFO ] > Starting: prepare_features_components [n_files=1]
#> [2026-05-27 09:08:30.757] [INFO ] [OK] Completed: prepare_features_components [n_assignments=5328] (5ms)
#> [2026-05-27 09:08:30.773] [INFO ] Exporting parameters to: data/interim/params/260527_090830_prepare_features_components.yaml
#> [2026-05-27 09:08:30.775] [INFO ] > Starting: export_output [file=data/interim/features/example_componentsPrepared.tsv, n_rows=5328]
#> [2026-05-27 09:08:30.777] [INFO ] [OK] Completed: export_output [size_bytes=47863] (2ms)
#> ✔ fea_com_pre completed [30ms, 47.86 kB]
#> + ann_wei dispatched
#> [2026-05-27 09:08:31.165] [INFO ] Starting annotation weighting and scoring
#> [2026-05-27 09:08:31.166] [INFO ] > Starting: weight_annotations [n_candidates_neighbors=16, n_candidates_final=1]
#> [2026-05-27 09:08:52.055] [INFO ] 
#>    candidate_library      n    Pct
#>      ISDB - Wikidata 586738 67.95%
#>             TIMA MS1 161383 18.69%
#>               merlin  41565  4.81%
#>  ISDB - NormanSusDat  40926  4.74%
#>                 gnps  25234  2.92%
#>             massbank   5039  0.58%
#>               SIRIUS   2561  0.30%
#> [2026-05-27 09:08:56.947] [INFO ] > Starting: weight_bio [n_annotations=731519, n_sop=767415]
#> [2026-05-27 09:08:56.948] [INFO ] Weighting 731519 annotations by biological source
#> [2026-05-27 09:09:04.426] [INFO ] [OK] Completed: weight_bio [n_weighted=731519] (7.5s)
#> [2026-05-27 09:09:04.428] [INFO ] > Starting: decorate_bio [n_annotations=731519]
#> [2026-05-27 09:09:05.044] [INFO ] Taxonomically informed metabolite annotation reranked:
#>     Kingdom  level: 162391 candidates (44319 unique)
#>     Phylum   level: 161253 candidates (43846 unique)
#>     Class    level: 131034 candidates (37545 unique)
#>     Order    level: 31836 candidates (10270 unique)
#>     Family   level: 25952 candidates (8242 unique)
#>     Tribe    level: 5333 candidates (1339 unique)
#>     Genus    level: 4483 candidates (1047 unique)
#>     Species  level: 2526 candidates (524 unique)
#>     Variety  level: 383 candidates (98 unique)
#>     Biota    level: 383 candidates (98 unique)
#> [2026-05-27 09:09:05.045] [INFO ] [OK] Completed: decorate_bio [n_processed=731519] (617ms)
#> [2026-05-27 09:09:05.047] [INFO ] > Starting: clean_bio [n_annotations=731519, minimal_consistency=0]
#> [2026-05-27 09:09:30.244] [INFO ] [OK] Completed: clean_bio [n_cleaned=731519] (25.2s)
#> [2026-05-27 09:09:30.246] [INFO ] > Starting: weight_chemo [n_input=731519]
#> [2026-05-27 09:09:30.247] [INFO ] Weighting 731519 annotations by chemical consistency
#> [2026-05-27 09:09:33.108] [INFO ] [OK] Completed: weight_chemo [n_weighted=731519] (2.9s)
#> [2026-05-27 09:09:33.110] [INFO ] > Starting: decorate_chemo [n_annotations=731519]
#> [2026-05-27 09:09:36.012] [INFO ] Chemically informed metabolite annotation reranked:
#>   Classyfire:
#>     Kingdom level:    110971 candidates (73220 unique)
#>     Superclass level: 77071 candidates (47233 unique)
#>     Class level:      58803 candidates (33702 unique)
#>     Parent level:     38693 candidates (22545 unique)
#>   NPClassifier:
#>     Pathway level:    156783 candidates (98142 unique)
#>     Superclass level: 89498 candidates (54091 unique)
#>     Class level:      56276 candidates (32679 unique)
#> [2026-05-27 09:09:36.013] [INFO ] [OK] Completed: decorate_chemo [n_processed=731519] (2.9s)
#> [2026-05-27 09:09:36.045] [INFO ] > Starting: clean_chemo [n_annotations=731519, candidates_final=1, high_confidence=FALSE]
#> [2026-05-27 09:09:58.004] [INFO ] Sampling candidates for 3171 features with more than 7 candidates per score
#> [2026-05-27 09:09:58.006] [INFO ] > Starting: filter_high_confidence [n_input=515822, context=filtered]
#> [2026-05-27 09:09:58.043] [INFO ] [filtered]  Removed 513787 low-confidence candidates (99.6% of 515822 total)
#> [2026-05-27 09:09:58.044] [INFO ] [filtered]  2035 high-confidence candidates remaining (0.4%)
#> [2026-05-27 09:09:58.045] [INFO ] [OK] Completed: filter_high_confidence [n_filtered=2035, n_removed=513787] (39ms)
#> [2026-05-27 09:09:58.049] [INFO ] Summarizing annotation results
#> [2026-05-27 09:09:58.489] [INFO ] Annotated features: 812/5328 (15.2%)
#> [2026-05-27 09:10:02.352] [INFO ] Summarizing annotation results
#> [2026-05-27 09:10:10.994] [INFO ] Annotated features: 5049/5328 (94.8%)
#> [2026-05-27 09:10:12.853] [INFO ] [OK] Completed: clean_chemo [n_final_full=515822, n_final_filtered=5536, n_final_mini=5536, n_features=5328] (36.8s)
#> [2026-05-27 09:10:12.855] [INFO ] [OK] Completed: weight_annotations [n_annotations=NULL] (1m 42s)
#> [2026-05-27 09:10:12.878] [INFO ] Exporting parameters to: data/processed/20260527_091012_example/260527_091012_prepare_params.yaml
#> [2026-05-27 09:10:12.901] [INFO ] Exporting parameters to: data/processed/20260527_091012_example/260527_091012_prepare_params_advanced.yaml
#> [2026-05-27 09:10:12.903] [INFO ] > Starting: export_output [file=data/processed/20260527_091012_example/example_results_mini.tsv, n_rows=5536]
#> [2026-05-27 09:10:12.907] [INFO ] [OK] Completed: export_output [size_bytes=1077561] (4ms)
#> [2026-05-27 09:10:12.909] [INFO ] > Starting: export_output [file=data/processed/20260527_091012_example/example_results_filtered.tsv, n_rows=5536]
#> [2026-05-27 09:10:12.914] [INFO ] [OK] Completed: export_output [size_bytes=1476390] (5ms)
#> [2026-05-27 09:10:12.916] [INFO ] > Starting: export_output [file=data/processed/20260527_091012_example/example_results.tsv, n_rows=515822]
#> [2026-05-27 09:10:13.635] [INFO ] [OK] Completed: export_output [size_bytes=287168526] (718ms)
#> [2026-05-27 09:10:13.636] [INFO ] Results exported: example_results.tsv
#> ✔ ann_wei completed [1m 42.5s, 288.64 MB]
#> + exp_mzt dispatched
#> [2026-05-27 09:10:14.850] [INFO ] > Starting: write_mztab [input=example_results_filtered.tsv, output=example_results.mztab]
#> [2026-05-27 09:10:17.815] [INFO ] [OK] Completed: write_mztab [n_sml=812, n_smf=5328, n_sme=1020] (3s)
#> ✔ exp_mzt completed [3s, 2.03 MB]
#> ✔ ended pipeline [16m 18.9s, 146 completed, 0 skipped]
#> There were 13 warnings (use warnings() to see them)

The final results tables, exported to data/processed/, are formatted so that they can be easily imported into Cytoscape to further explore your data!
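
If you would like a quick look at the results in R before moving to Cytoscape, the following is a minimal sketch using only base R. It assumes, as in the log above, that each run writes its tables to a timestamped subdirectory of data/processed/ (for example 20260527_091012_example/) and that the compact table's file name ends in _results_mini.tsv. The helper name read_latest_results() is hypothetical and is not part of the tima package.

```r
# Hypothetical helper (not part of tima): load the most recent compact results table.
# Assumes each run writes to data/processed/<YYYYMMDD_HHMMSS>_<name>/, as in the log above.
read_latest_results <- function(processed_dir = "data/processed",
                                pattern = "_results_mini\\.tsv$") {
  # Search all run subdirectories for files whose name matches `pattern`
  files <- list.files(processed_dir, pattern = pattern,
                      recursive = TRUE, full.names = TRUE)
  if (length(files) == 0) {
    stop("No file matching '", pattern, "' found under ", processed_dir)
  }
  # Run directories start with a timestamp, so lexicographic order is chronological
  latest <- files[order(basename(dirname(files)), decreasing = TRUE)][1]
  message("Reading ", latest)
  # Tab-separated; keep column names exactly as written by the pipeline
  utils::read.delim(latest, check.names = FALSE, stringsAsFactors = FALSE)
}

# Usage, from the project directory the pipeline was run in:
# results <- read_latest_results()
# dim(results)
# head(names(results))
```

Sorting on the directory name rather than on file modification times keeps the choice tied to the run timestamp that the pipeline itself writes. For the example run shown above, the log reports n_rows=5536 for example_results_mini.tsv; pass pattern = "_results_filtered\\.tsv$" to load the filtered table instead.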

We hope you enjoyed using TIMA, and we would be pleased to hear from you!

For any remarks or suggestions, please file an issue or feel free to contact us directly.

Citation

BibTeX citation:
@online{rutz2026,
  author = {Rutz, Adriano},
  title = {3 {Performing} {Taxonomically} {Informed} {Metabolite}
    {Annotation}},
  date = {2026-05-27},
  url = {https://taxonomicallyinformedannotation.github.io/tima/vignettes/articles/III-processing.html},
  langid = {en}
}
For attribution, please cite this work as:
Rutz, Adriano. 2026. “3 Performing Taxonomically Informed Metabolite Annotation.” May 27. https://taxonomicallyinformedannotation.github.io/tima/vignettes/articles/III-processing.html.