process_smiles

Process SMILES strings

Description

Processes SMILES strings using RDKit (via Python) to standardize structures, generate InChIKeys, calculate molecular properties, and extract 2D SMILES. When a cache file is supplied, previously processed results are reused to avoid reprocessing.

Usage

process_smiles(df, smiles_colname = "structure_smiles_initial", cache = NULL)

Arguments

df Data frame containing SMILES strings

smiles_colname Name of the column containing the SMILES strings (default: "structure_smiles_initial")

cache Path to a file of previously processed SMILES, or NULL (default) to skip caching

Value

Data frame with the processed SMILES, including InChIKey, molecular formula (isotopes shown explicitly), exact mass (including isotope contributions), 2D SMILES, xLogP, and connectivity layer.

Examples

library("tima")

# Unlabeled compound (glucose, natural isotopes)
df <- data.frame(
  structure_smiles_initial = "OC[C@H]1OC(O)[C@H](O)[C@H](O)[C@H]1O"
)
result <- process_smiles(df)
# Formula: C6H12O6, Mass: 180.063 Da

# Isotope-labeled compound (4× 13C)
df_labeled <- data.frame(
  structure_smiles_initial = "OC[13C@H]1OC(O)[13C@H](O)[13C@H](O)[13C@H]1O"
)
result_labeled <- process_smiles(df_labeled)
# Formula: C2[13C]4H12O6 (isotopes shown separately)
# Mass: 184.077 Da (difference of ~4.013 Da from natural)
# SMILES preserves [13C] notation
# InChIKey differs from natural glucose
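
The masses quoted in the comments above can be checked by hand. The following is a minimal base R sketch, not the way process_smiles() computes masses (that is done by RDKit). The names iso_mass and exact_mass are hypothetical helpers introduced only for this illustration, and the isotope masses are standard reference values rounded to six decimals.

```r
# Monoisotopic masses in Da (rounded reference values; assumption for this sketch)
iso_mass <- c("12C" = 12.000000, "13C" = 13.003355,
              "1H" = 1.007825, "16O" = 15.994915)

# Hypothetical helper: sum of isotope mass times atom count
exact_mass <- function(counts) {
  sum(iso_mass[names(counts)] * counts)
}

# Atom counts for C6H12O6 and for C2[13C]4H12O6
natural <- c("12C" = 6, "1H" = 12, "16O" = 6)
labeled <- c("12C" = 2, "13C" = 4, "1H" = 12, "16O" = 6)

mass_natural <- exact_mass(natural)
mass_labeled <- exact_mass(labeled)

round(mass_natural, 3)                # 180.063
round(mass_labeled, 3)                # 184.077
round(mass_labeled - mass_natural, 3) # 4.013
```

Each 13C substitution adds about 1.003355 Da, so four labeled carbons account for the ~4.013 Da shift.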