Annotate spectra

Description

Annotates MS/MS query spectra against one or more spectral libraries, computes similarity scores, and writes the best candidate annotations scoring above a similarity threshold to a file.

Usage

annotate_spectra(
  input = get_params(step = "annotate_spectra")$files$spectral$raw,
  libraries = get_params(step = "annotate_spectra")$files$libraries$spectral,
  polarity = get_params(step = "annotate_spectra")$ms$polarity,
  output = get_params(step = "annotate_spectra")$files$annotations$raw$spectral$spectral,
  method = get_params(step = "annotate_spectra")$similarities$methods$annotations,
  threshold = get_params(step = "annotate_spectra")$similarities$thresholds$annotations,
  ppm = get_params(step = "annotate_spectra")$ms$tolerances$mass$ppm$ms2,
  dalton = get_params(step = "annotate_spectra")$ms$tolerances$mass$dalton$ms2,
  qutoff = get_params(step = "annotate_spectra")$ms$thresholds$ms2$intensity,
  approx = get_params(step = "annotate_spectra")$annotations$ms2approx
)

Arguments

input Character vector or list of query spectral file paths (.mgf).
libraries Character vector or list of library spectral file paths (.mgf / Spectra-supported). Must contain at least one path.
polarity MS polarity; one of VALID_MS_MODES ("pos", "neg").
output Output file path (the function writes a tabular file here).
method Similarity method; one of VALID_SIMILARITY_METHODS.
threshold Minimum similarity score (0-1) required to retain a candidate.
ppm Relative mass tolerance (ppm) for MS/MS matching.
dalton Absolute mass tolerance (Daltons) for MS/MS matching.
qutoff Intensity cutoff below which MS2 fragments are removed. The parameter name is a legacy misspelling of "cutoff" and is kept for backwards compatibility.
approx Logical; if TRUE, match query spectra against the whole library regardless of precursor masses (broader, slower); if FALSE, first restrict the library to spectra whose precursor m/z lies within tolerance of a query precursor.
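To make the ppm, dalton, and qutoff arguments concrete, the base R sketch below shows one common way such parameters are applied to fragment peaks. It assumes, following the Spectra/MsCoreUtils convention, that the absolute (Da) and relative (ppm) tolerances are summed, and that peaks with an intensity equal to the cutoff are kept. The helpers fragment_tolerance() and apply_intensity_cutoff() are hypothetical and not part of tima.

```r
# Hypothetical helper: total m/z tolerance for a fragment at a given m/z,
# ASSUMING absolute (Da) and relative (ppm) tolerances are summed.
fragment_tolerance <- function(mz, ppm, dalton) {
  dalton + mz * ppm / 1e6
}

# Hypothetical helper: drop fragments whose intensity is below the cutoff
# (peaks exactly at the cutoff are kept, which is an assumption).
# `peaks` is a two-column matrix with columns "mz" and "intensity".
apply_intensity_cutoff <- function(peaks, cutoff) {
  peaks[peaks[, "intensity"] >= cutoff, , drop = FALSE]
}

peaks <- cbind(
  mz = c(85.03, 127.04, 145.05),
  intensity = c(0, 250, 1200)
)

fragment_tolerance(mz = 500, ppm = 10, dalton = 0.01)  # 0.015
apply_intensity_cutoff(peaks, cutoff = 100)             # keeps the last two peaks
```

Summing the two tolerances means the absolute term dominates at low m/z and the ppm term grows in importance at high m/z.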

Details

This is an orchestration wrapper that performs:

  1. Input validation & normalization (query + libraries, numeric params).

  2. Query spectra import & light preprocessing (intensity cutoff).

  3. Library spectra import, removal of spectra with empty peak lists, optional polarity filtering, and optional precursor-based library size reduction (when approx = FALSE).

  4. Similarity computation via calculate_entropy_and_similarity().

  5. Candidate metadata extraction (formula, name, etc.).

  6. Result shaping: computation of the m/z error, selection of the canonical output columns, threshold filtering, and retention of the best candidate per (feature_id, library, connectivity layer) combination.

  7. Export of parameters & results to the configured output path.
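Step 6 (result shaping) can be sketched in base R on a toy table. This is an illustration under stated assumptions, not tima's implementation: the column names (feature_id, library, connectivity_layer, precursor_mz, candidate_mz, score), the helper shape_results(), the sign convention of the m/z error, and the use of >= for the threshold are all hypothetical.

```r
# Toy similarity results; column names are illustrative only.
candidates <- data.frame(
  feature_id = c("F1", "F1", "F1", "F2", "F2"),
  library = c("libA", "libA", "libA", "libA", "libB"),
  connectivity_layer = c("AAAAAAAAAAAAAA", "AAAAAAAAAAAAAA", "BBBBBBBBBBBBBB",
                         "CCCCCCCCCCCCCC", "CCCCCCCCCCCCCC"),
  precursor_mz = c(301.07, 301.07, 301.07, 449.11, 449.11),
  candidate_mz = c(301.0712, 301.0705, 301.0690, 449.1080, 449.1101),
  score = c(0.82, 0.91, 0.40, 0.75, 0.66),
  stringsAsFactors = FALSE
)

# Hypothetical helper mirroring step 6.
shape_results <- function(df, threshold) {
  # Derive the m/z error (candidate minus query; sign convention assumed)
  df$error_mz <- df$candidate_mz - df$precursor_mz
  # Keep only candidates at or above the similarity threshold
  df <- df[df$score >= threshold, , drop = FALSE]
  # Sort by decreasing score so the first row of each group is the best one
  df <- df[order(-df$score), , drop = FALSE]
  key <- paste(df$feature_id, df$library, df$connectivity_layer, sep = "|")
  best <- df[!duplicated(key), , drop = FALSE]
  rownames(best) <- NULL
  best
}

best <- shape_results(candidates, threshold = 0.5)
best[, c("feature_id", "library", "score")]
```

Here the candidate scoring 0.40 is dropped by the threshold, and of the two F1/libA candidates sharing a connectivity layer only the one scoring 0.91 is kept.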

If no annotations are produced (for example, because the inputs are empty or no candidate passes the threshold), a standardized empty template (see fake_annotations_columns()) is exported so that downstream code still receives the expected columns.
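The empty-template fallback can be sketched as follows. The function empty_annotations_template() is a hypothetical stand-in for fake_annotations_columns(), and its column names are illustrative only; the real template's columns are not listed on this page.

```r
# Hypothetical stand-in for fake_annotations_columns(): a zero-row data frame
# carrying the expected output columns (column names are illustrative only).
empty_annotations_template <- function() {
  data.frame(
    feature_id = character(0),
    candidate_library = character(0),
    candidate_structure_name = character(0),
    candidate_score_similarity = numeric(0),
    stringsAsFactors = FALSE
  )
}

# Return the results if any survived filtering, otherwise the empty template.
results_or_template <- function(results) {
  if (is.null(results) || nrow(results) == 0L) {
    return(empty_annotations_template())
  }
  results
}

out <- results_or_template(data.frame())
names(out)
```

Exporting a zero-row table with fixed columns lets downstream steps read and join the file without special-casing an absent or column-less result.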

Value

Character scalar: the output file path, returned invisibly. Side effect: writes the annotations table to output.
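The "write, then return the path invisibly" pattern looks roughly like this in base R. The helper export_annotations() is hypothetical, and writing a tab-separated file is an assumption; this page only states that a tabular file is written.

```r
# Hypothetical export step: write a tab-separated table (format assumed) and
# return the path invisibly, so a bare call at the console prints nothing.
export_annotations <- function(annotations, output) {
  dir.create(dirname(output), recursive = TRUE, showWarnings = FALSE)
  utils::write.table(
    annotations,
    file = output,
    sep = "\t",
    quote = FALSE,
    row.names = FALSE
  )
  invisible(output)
}

tmp <- file.path(tempdir(), "annotations", "spectral.tsv")
path <- export_annotations(data.frame(feature_id = "F1", score = 0.9), tmp)
file.exists(path)  # TRUE
```

Because the value is invisible, assign it (as with path above) when the output location is needed for a later step.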

Robustness

The function performs strict validation and logs informative messages. File existence is checked early, and the similarity computation is wrapped in tryCatch() so that errors surface immediately instead of leaving partially built objects behind.
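The error-handling pattern described above can be sketched with base R's tryCatch(). The toy cosine similarity and the names toy_cosine() and safe_similarity() are hypothetical. The function's actual logging mechanism is not shown on this page, so message() stands in for it.

```r
# Toy similarity: cosine of two intensity vectors already aligned on the
# same fragment m/z values. Stops if the vectors are not aligned.
toy_cosine <- function(a, b) {
  if (length(a) != length(b)) {
    stop("query and library intensity vectors are not aligned")
  }
  sum(a * b) / sqrt(sum(a^2) * sum(b^2))
}

# Wrap the computation so failures are logged with context and re-raised,
# instead of continuing with a partially built result.
safe_similarity <- function(a, b) {
  tryCatch(
    toy_cosine(a, b),
    error = function(e) {
      message("Similarity computation failed: ", conditionMessage(e))
      stop(e)
    }
  )
}

safe_similarity(c(1, 0, 2), c(1, 0, 2))  # 1
try(safe_similarity(c(1, 2), c(1, 2, 3)))
```

Re-raising with stop(e) keeps the original condition object, so callers can still inspect or catch the underlying error.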

Performance

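Library precursor reduction (when approx = FALSE) restricts similarity computation to library spectra whose precursor m/z lies within tolerance of a query precursor, which reduces the number of spectrum comparisons for large libraries. Metadata fields (formula, name, etc.) are extracted with a single vectorized helper.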
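Precursor-based library reduction can be sketched as a filter on precursor m/z values. The helper reduce_library_by_precursor() is hypothetical. Summing dalton and ppm into one tolerance is an assumption, and so is which tolerance values apply to precursors: this page only documents the MS2 tolerances, which the example simply reuses.

```r
# Hypothetical sketch: indices of library spectra whose precursor m/z lies
# within tolerance of at least one query precursor m/z.
# ASSUMPTION: tolerance = dalton + ppm * mz / 1e6 (absolute and relative summed).
reduce_library_by_precursor <- function(query_mz, library_mz, ppm, dalton) {
  keep <- vapply(
    library_mz,
    function(lib_mz) {
      tol <- dalton + lib_mz * ppm / 1e6
      any(abs(query_mz - lib_mz) <= tol)
    },
    logical(1)
  )
  which(keep)
}

query_mz <- c(301.0707, 449.1078)
library_mz <- c(150.0550, 301.0712, 449.2000, 449.1080)
reduce_library_by_precursor(query_mz, library_mz, ppm = 10, dalton = 0.002)
# indices of the retained library spectra: 2 4
```

This pairwise check is quadratic in the number of spectra. With sorted precursor masses, a binary search such as findInterval() would scale better for very large libraries.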

Examples

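library("tima")

# Copy the package backbone (default configuration) into the cache and move there
copy_backbone()
go_to_cache()
# Download the example query spectra to the configured input path
get_file(
  url = get_default_paths()$urls$examples$spectra_mini,
  export = get_params(step = "annotate_spectra")$files$spectral$raw
)
# Download the example experimental spectral library (with_rt variant)
get_file(
  url = get_default_paths()$urls$examples$spectral_lib_mini$with_rt,
  export = get_default_paths()$data$source$libraries$spectra$exp$with_rt
)
# Annotate the query spectra against that library; all other arguments
# fall back to their defaults from get_params()
annotate_spectra(
  libraries = get_default_paths()$data$source$libraries$spectra$exp$with_rt
)
# Remove the downloaded example data
unlink("data", recursive = TRUE)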