library("tima")
# Natural compound
df <- data.frame(
  structure_smiles_initial = "OC[C@H]1OC(O)[C@H](O)[C@H](O)[C@H]1O"
)
result <- process_smiles(df)
# Formula: C6H12O6, Mass: 180.063 Da
# Isotope-labeled compound (4× 13C)
df_labeled <- data.frame(
  structure_smiles_initial = "OC[13C@H]1OC(O)[13C@H](O)[13C@H](O)[13C@H]1O"
)
result_labeled <- process_smiles(df_labeled)
# Formula: C2[13C]4H12O6 (isotopes shown separately)
# Mass: 184.077 Da (difference of ~4.013 Da from natural)
# SMILES preserves [13C] notation
# InChIKey differs from natural glucose
Process SMILES strings
Description
Processes SMILES strings using RDKit (via Python) to standardize structures, generate InChIKeys, calculate molecular properties, and extract 2D representations. When a cache path is supplied, results are cached to avoid reprocessing.
Usage
process_smiles(df, smiles_colname = "structure_smiles_initial", cache = NULL)
Arguments
df
Data frame containing SMILES strings
smiles_colname
Column name containing SMILES (default: "structure_smiles_initial")
cache
Path to cached processed SMILES file, or NULL to skip caching
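The arguments above can be combined as in the following minimal sketch, which passes a data frame whose SMILES column does not use the default name. The column name "smiles", the two example molecules, and the object names are assumptions chosen for illustration. The call is guarded so that it is skipped when tima is not installed, and it still needs a working Python/RDKit setup to run.

```r
# Hypothetical input: the SMILES column is named "smiles" instead of the default
df_custom <- data.frame(
  smiles = c("CCO", "c1ccccc1")  # ethanol and benzene, illustrative only
)

# Only call process_smiles() when the tima package is available
if (requireNamespace("tima", quietly = TRUE)) {
  result_custom <- tima::process_smiles(
    df_custom,
    smiles_colname = "smiles",  # tell the function which column holds SMILES
    cache = NULL                # NULL skips caching, per the argument description
  )
}
```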
Value
A data frame with the processed SMILES, including InChIKey, molecular formula (with isotopes shown), exact mass (with isotope contributions), 2D SMILES, xLogP, and connectivity layer.
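The isotope contribution to the exact mass can be checked by hand against the values quoted in the examples. The sketch below is not how process_smiles computes mass (the description attributes that work to RDKit). It only sums standard monoisotopic masses, rounded to six decimals, for natural glucose and for glucose carrying four 13C atoms. All variable names are illustrative.

```r
# Monoisotopic masses in Da (standard reference values, rounded to 6 decimals)
mass_12c <- 12.000000   # 12C, exact by definition
mass_13c <- 13.003355   # 13C
mass_1h  <- 1.007825    # 1H
mass_16o <- 15.994915   # 16O

# Natural glucose, C6H12O6
mass_natural <- 6 * mass_12c + 12 * mass_1h + 6 * mass_16o

# Labeled glucose with four 13C atoms, C2[13C]4H12O6
mass_labeled <- 2 * mass_12c + 4 * mass_13c + 12 * mass_1h + 6 * mass_16o

# Mass shift introduced by the four 13C labels
mass_shift <- mass_labeled - mass_natural

round(mass_natural, 3)  # 180.063
round(mass_labeled, 3)  # 184.077
round(mass_shift, 3)    # 4.013
```

Each 13C substitution adds about 1.003 Da, so four labels account for the ~4.013 Da difference between the two results.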