Prepare merged structure-organism pairs libraries

Description

This function merges all structure-organism pair libraries (LOTUS, HMDB, ECMDB, etc.) into a single comprehensive library. It can optionally filter the merged library by taxonomic level to create biologically focused subsets. It also splits structures into separate metadata tables.

Usage

prepare_libraries_sop_merged(
  files = get_params(step = "prepare_libraries_sop_merged")$files$libraries$sop$prepared,
  filter = get_params(step = "prepare_libraries_sop_merged")$organisms$filter$mode,
  level = get_params(step = "prepare_libraries_sop_merged")$organisms$filter$level,
  value = get_params(step = "prepare_libraries_sop_merged")$organisms$filter$value,
  cache = get_params(step =
    "prepare_libraries_sop_merged")$files$libraries$sop$merged$structures$processed,
  npc_cache = get_params(step =
    "prepare_libraries_sop_merged")$files$libraries$sop$merged$structures$taxonomies$n,
  cla_cache = get_params(step =
    "prepare_libraries_sop_merged")$files$libraries$sop$merged$structures$taxonomies$c,
  output_key = get_params(step =
    "prepare_libraries_sop_merged")$files$libraries$sop$merged$keys,
  output_org_tax_ott = get_params(step =
    "prepare_libraries_sop_merged")$files$libraries$sop$merged$organisms$taxonomies$ott,
  output_str_can = get_params(step =
    "prepare_libraries_sop_merged")$files$libraries$sop$merged$structures$canonical,
  output_str_stereo = get_params(step =
    "prepare_libraries_sop_merged")$files$libraries$sop$merged$structures$stereo,
  output_str_met = get_params(step =
    "prepare_libraries_sop_merged")$files$libraries$sop$merged$structures$metadata,
  output_str_tax_cla = get_params(step =
    "prepare_libraries_sop_merged")$files$libraries$sop$merged$structures$taxonomies$cla,
  output_str_tax_npc = get_params(step =
    "prepare_libraries_sop_merged")$files$libraries$sop$merged$structures$taxonomies$npc
)

Arguments

files character Character vector or list of paths to prepared library files
filter logical Logical indicating whether to filter the merged library by taxonomy
level character Character string taxonomic rank for filtering (kingdom, phylum, family, genus, etc.)
value character Character string taxon name(s) to keep (can use | for multiple, e.g., ‘Gentianaceae|Apocynaceae’)
cache character Character string path to cache directory for processed SMILES
npc_cache character Optional path to an additional NPClassifier taxonomy cache file (TSV/TSV.gz). Structures present in the merged library but missing NPClassifier taxonomy will be looked up in this cache. Expected columns: structure_smiles, structure_tax_npc_01pat, structure_tax_npc_02sup, structure_tax_npc_03cla. Alternative column names from external tools (e.g., pathway, superclass, class) are also supported.
cla_cache character Optional path to an additional ClassyFire taxonomy cache file (TSV/TSV.gz). Structures present in the merged library but missing ClassyFire taxonomy will be looked up in this cache. Expected columns: structure_inchikey, structure_tax_cla_chemontid, structure_tax_cla_01kin, structure_tax_cla_02sup, structure_tax_cla_03cla, structure_tax_cla_04dirpar. Alternative column names (e.g., inchikey, chemontid, kingdom, superclass, class, directparent) are also supported.
output_key character Character string path for output keys file
output_org_tax_ott character Character string path for organisms taxonomy (OTT) file
output_str_can character Character string path for structures canonical SMILES file
output_str_stereo character Character string path for structures stereochemistry file
output_str_met character Character string path for structures metadata file
output_str_tax_cla character Character string path for ClassyFire taxonomy file
output_str_tax_npc character Character string path for NPClassifier taxonomy file
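
The cache lookup described for npc_cache can be pictured with a small base R sketch. This is only an illustration under stated assumptions, not the package's implementation: the toy data, the alias table, and the matching logic are hypothetical. Only the column names structure_smiles, structure_tax_npc_01pat and the alternative names pathway, superclass, class come from the argument description above.

```r
# Toy merged library: two structures lack an NPClassifier pathway (assumed data).
merged <- data.frame(
  structure_smiles = c("CCO", "c1ccccc1", "CC(=O)O"),
  structure_tax_npc_01pat = c("Pathway A", NA, NA),
  stringsAsFactors = FALSE
)

# Toy cache produced by an external tool, using an alternative column name.
npc_cache <- data.frame(
  structure_smiles = c("c1ccccc1", "CCN"),
  pathway = c("Pathway B", "Pathway C"),
  stringsAsFactors = FALSE
)

# Hypothetical alias table mapping alternative names to the expected ones.
aliases <- c(
  pathway = "structure_tax_npc_01pat",
  superclass = "structure_tax_npc_02sup",
  class = "structure_tax_npc_03cla"
)
hits <- names(npc_cache) %in% names(aliases)
names(npc_cache)[hits] <- unname(aliases[names(npc_cache)[hits]])

# Fill only the structures that are missing a pathway, matching on SMILES.
idx <- match(merged$structure_smiles, npc_cache$structure_smiles)
missing <- is.na(merged$structure_tax_npc_01pat)
merged$structure_tax_npc_01pat[missing] <-
  npc_cache$structure_tax_npc_01pat[idx[missing]]
```

In this sketch, structures that already carry a taxonomy are left untouched, and structures absent from the cache stay NA.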

Details

Creates the merged library by combining all available structure-organism pair (SOP) sources, optionally filtering by taxonomic criteria (e.g., keeping only Gentianaceae). The output is split into structure metadata, names, taxonomy, and organisms.
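
As a rough illustration of the taxonomic filter, the base R sketch below assumes that value is applied as a regular expression against the column holding the chosen rank, so that | acts as "or". The data frame, its column names, and the rank-to-column mapping are hypothetical and are not taken from the package.

```r
# Toy structure-organism pairs; all column names are illustrative assumptions.
pairs <- data.frame(
  structure_inchikey = c("AAA", "BBB", "CCC", "DDD"),
  organism_name = c(
    "Gentiana lutea", "Swertia chirayita",
    "Vinca minor", "Arabidopsis thaliana"
  ),
  organism_taxonomy_family = c(
    "Gentianaceae", "Gentianaceae", "Apocynaceae", "Brassicaceae"
  ),
  stringsAsFactors = FALSE
)

level <- "family"
value <- "Gentianaceae|Apocynaceae"

# Hypothetical mapping from the rank name to the column that stores it.
rank_column <- paste0("organism_taxonomy_", level)

# Keep only the pairs whose taxon at that rank matches one of the alternatives.
keep <- grepl(pattern = value, x = pairs[[rank_column]])
filtered <- pairs[keep, , drop = FALSE]
```

Here the three pairs from Gentianaceae and Apocynaceae are retained and the Brassicaceae pair is dropped.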

Value

Character string path to the prepared merged SOP library

See Also

Other preparation: prepare_annotations_gnps(), prepare_annotations_mzmine(), prepare_annotations_sirius(), prepare_annotations_spectra(), prepare_features_components(), prepare_features_edges(), prepare_features_tables(), prepare_libraries_rt(), prepare_libraries_sop_bigg(), prepare_libraries_sop_closed(), prepare_libraries_sop_ecmdb(), prepare_libraries_sop_hmdb(), prepare_libraries_sop_lotus(), prepare_libraries_spectra(), prepare_params(), prepare_taxa()

Examples

library("tima")

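# Set up the default project structure and move to the cache directory
copy_backbone()
go_to_cache()
# Build the base URL of the example files repository
github <- "https://raw.githubusercontent.com/"
repo <- "taxonomicallyinformedannotation/tima-example-files/main/"
dir <- paste0(github, repo)
# Path of the prepared LOTUS library, without the ".gz" extension
files <- get_params(step =
    "prepare_libraries_sop_merged")$files$libraries$sop$prepared$lotus |>
  gsub(pattern = ".gz", replacement = "", fixed = TRUE)
# Download the example file, then merge it
get_file(url = paste0(dir, files), export = files)
prepare_libraries_sop_merged(files = files)
# Clean up the downloaded data
unlink("data", recursive = TRUE)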