Prepare merged structure-organism pair libraries

Description

This function merges all structure-organism pair (SOP) libraries (LOTUS, HMDB, ECMDB, etc.) into a single comprehensive library. It can optionally filter the merged library by taxonomic level to create biologically focused subsets. It also splits structure information into separate tables (metadata, names, stereochemistry, and ClassyFire and NPClassifier taxonomies).
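Conceptually, merging amounts to stacking the per-source tables and dropping pairs that several sources report. The base R sketch below illustrates only that idea; it is not the tima implementation, and the column names `structure_inchikey` and `organism_name` as well as the helper `merge_sop()` are hypothetical.

```r
# Minimal sketch, assuming each prepared SOP library shares the same columns.
# Column names and merge_sop() are hypothetical, not the tima internals.
lotus <- data.frame(
  structure_inchikey = c("AAA", "BBB"),
  organism_name = c("Gentiana lutea", "Swertia chirayita"),
  stringsAsFactors = FALSE
)
hmdb <- data.frame(
  structure_inchikey = c("BBB", "CCC"),
  organism_name = c("Swertia chirayita", "Homo sapiens"),
  stringsAsFactors = FALSE
)

# Stack all sources, then keep each structure-organism pair only once
merge_sop <- function(libraries) {
  merged <- do.call(rbind, libraries)
  merged <- unique(merged)
  rownames(merged) <- NULL
  merged
}

merged <- merge_sop(list(lotus, hmdb))
merged
```

Here the pair reported by both sources (`BBB`, *Swertia chirayita*) appears once, leaving three unique pairs.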

Usage

prepare_libraries_sop_merged(
  files = get_params(step = "prepare_libraries_sop_merged")$files$libraries$sop$prepared,
  filter = get_params(step = "prepare_libraries_sop_merged")$organisms$filter$mode,
  level = get_params(step = "prepare_libraries_sop_merged")$organisms$filter$level,
  value = get_params(step = "prepare_libraries_sop_merged")$organisms$filter$value,
  cache = get_params(step =
    "prepare_libraries_sop_merged")$files$libraries$sop$merged$structures$processed,
  output_key = get_params(step =
    "prepare_libraries_sop_merged")$files$libraries$sop$merged$keys,
  output_org_tax_ott = get_params(step =
    "prepare_libraries_sop_merged")$files$libraries$sop$merged$organisms$taxonomies$ott,
  output_str_stereo = get_params(step =
    "prepare_libraries_sop_merged")$files$libraries$sop$merged$structures$stereo,
  output_str_met = get_params(step =
    "prepare_libraries_sop_merged")$files$libraries$sop$merged$structures$metadata,
  output_str_nam = get_params(step =
    "prepare_libraries_sop_merged")$files$libraries$sop$merged$structures$names,
  output_str_tax_cla = get_params(step =
    "prepare_libraries_sop_merged")$files$libraries$sop$merged$structures$taxonomies$cla,
  output_str_tax_npc = get_params(step =
    "prepare_libraries_sop_merged")$files$libraries$sop$merged$structures$taxonomies$npc
)

Arguments

files Character vector or list of paths to prepared library files
filter Logical, whether to filter the merged library by taxonomy
level Character string, taxonomic rank used for filtering (kingdom, phylum, family, genus, etc.)
value Character string, taxon name(s) to keep (separate multiple names with |, e.g., ‘Gentianaceae|Apocynaceae’)
cache Character string path to cache directory for processed SMILES
output_key Character string path for output keys file
output_org_tax_ott Character string path for organisms taxonomy (OTT) file
output_str_stereo Character string path for structures stereochemistry file
output_str_met Character string path for structures metadata file
output_str_nam Character string path for structures names file
output_str_tax_cla Character string path for ClassyFire taxonomy file
output_str_tax_npc Character string path for NPClassifier taxonomy file
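Because `value` accepts several names joined by `|`, it reads naturally as a regular-expression alternation matched against the column for the chosen `level`. The sketch below shows one way to apply such a filter in base R. It is an illustration under assumptions, not the tima code: the `family` column and the `filter_sop()` helper are hypothetical, and anchoring the pattern with `^(...)$` is a choice made here so that a name only matches a whole taxon, not a substring of a longer one.

```r
# Minimal sketch, assuming one column per taxonomic rank (here: "family").
# filter_sop() and the column layout are hypothetical.
sop <- data.frame(
  structure_inchikey = c("AAA", "BBB", "CCC", "DDD"),
  family = c("Gentianaceae", "Apocynaceae", "Hominidae", NA),
  stringsAsFactors = FALSE
)

# Keep rows whose taxon at `level` equals one of the |-separated names
filter_sop <- function(sop, level, value) {
  if (!level %in% names(sop)) {
    stop("Unknown taxonomic level: ", level)
  }
  # Anchor the alternation so each name must match the full taxon
  pattern <- paste0("^(", value, ")$")
  # grepl() returns FALSE for NA, so rows without a taxon are dropped
  keep <- grepl(pattern = pattern, x = sop[[level]])
  sop[keep, , drop = FALSE]
}

subset_sop <- filter_sop(sop, level = "family", value = "Gentianaceae|Apocynaceae")
subset_sop$structure_inchikey
```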

Details

Creates the merged library by combining all available SOP sources, optionally filtering it by taxonomic criteria (e.g., keeping only Gentianaceae). The output is split into separate tables for structure metadata, names, taxonomies, and organisms.
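Splitting the merged library amounts to projecting it onto smaller column sets and removing the duplicates that projection creates, so that per-structure information is stored once instead of once per organism. The sketch below illustrates that idea in base R. The column names and the `split_sop()` helper are hypothetical, and unlike the real function it keeps the tables in memory rather than writing each one to its `output_*` path.

```r
# Minimal sketch; column names and split_sop() are hypothetical.
merged <- data.frame(
  structure_inchikey = c("AAA", "AAA", "BBB"),
  structure_smiles = c("CCO", "CCO", "c1ccccc1"),
  structure_name = c("ethanol", "ethanol", "benzene"),
  organism_name = c("Gentiana lutea", "Swertia chirayita", "Gentiana lutea"),
  stringsAsFactors = FALSE
)

# Project onto each table's columns, then drop repeated rows
split_sop <- function(merged) {
  list(
    # One row per structure-organism pair
    keys = unique(merged[, c("structure_inchikey", "organism_name")]),
    # One row per structure for structure-level information
    metadata = unique(merged[, c("structure_inchikey", "structure_smiles")]),
    names = unique(merged[, c("structure_inchikey", "structure_name")])
  )
}

tables <- split_sop(merged)
vapply(tables, nrow, integer(1))
```

Here the keys table keeps all three pairs, while the structure-level tables collapse to one row per structure.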

Value

Character string path to the prepared merged SOP library

Examples

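library("tima")

# Set up the tima cache directory and move into it
copy_backbone()
go_to_cache()

# Base URL of the example files repository
github <- "https://raw.githubusercontent.com/"
repo <- "taxonomicallyinformedannotation/tima-example-files/main/"
dir <- paste0(github, repo)

# Path of the prepared LOTUS library, without the ".gz" extension
files <- get_params(step = "prepare_libraries_sop_merged")$files$libraries$sop$prepared$lotus |>
  gsub(pattern = ".gz", replacement = "", fixed = TRUE)

# Download the example library, then merge it
get_file(url = paste0(dir, files), export = files)
prepare_libraries_sop_merged(files = files)

# Remove the downloaded and generated data
unlink("data", recursive = TRUE)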