Process SMILES strings

Description

Processes SMILES using RDKit (via Python) to standardize structures, generate InChIKeys, calculate molecular properties, and extract 2D representations. When a cache path is supplied, results are stored there so that previously processed SMILES are not reprocessed on later calls.
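The caching behaviour described above can be sketched in base R. This is a hypothetical illustration, not tima's code: process_with_cache() and the stand-in process_fun (which would wrap the RDKit step) are invented names, and a CSV cache keyed on a smiles column is an assumption. It only shows the general idea of sending uncached SMILES to an expensive processor and persisting the union of old and new results.

```r
# Hypothetical sketch, not tima's implementation: process_with_cache() and
# process_fun are illustrative names; the CSV cache layout is assumed.
process_with_cache <- function(smiles, process_fun, cache = NULL) {
  smiles <- unique(smiles)
  # Load previously processed results if a cache file exists
  cached <- if (!is.null(cache) && file.exists(cache)) {
    utils::read.csv(cache, stringsAsFactors = FALSE)
  } else {
    NULL
  }
  # Only SMILES absent from the cache are sent to the (expensive) processor
  todo <- setdiff(smiles, cached$smiles)
  fresh <- if (length(todo) > 0) process_fun(todo) else NULL
  combined <- rbind(cached, fresh)
  if (is.null(combined)) {
    return(data.frame(smiles = character(0), stringsAsFactors = FALSE))
  }
  # Persist the union of old and new results for the next call
  if (!is.null(cache)) {
    utils::write.csv(combined, cache, row.names = FALSE)
  }
  # Return only the rows for the requested SMILES
  combined[combined$smiles %in% smiles, , drop = FALSE]
}

# Usage with a stand-in processor that counts how many SMILES it handles
calls <- 0
fake_processor <- function(s) {
  calls <<- calls + length(s)
  data.frame(smiles = s, n_char = nchar(s), stringsAsFactors = FALSE)
}
path <- tempfile(fileext = ".csv")
first <- process_with_cache(c("CCO", "c1ccccc1"), fake_processor, cache = path)
second <- process_with_cache(c("CCO", "CCN"), fake_processor, cache = path)
calls  # 3: "CCO" was served from the cache on the second call
```

Keying the cache on the input SMILES keeps the lookup trivial; a real implementation might instead key on a standardized form so that equivalent inputs share one entry.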

Usage

process_smiles(df, smiles_colname = "structure_smiles_initial", cache = NULL)

Arguments

df Data frame containing SMILES strings
smiles_colname Name of the column containing SMILES (default: "structure_smiles_initial")
cache Path to the cache file of processed SMILES, or NULL to skip caching

Value

A data frame of the processed SMILES with, for each structure, the InChIKey, molecular formula, exact mass, 2D SMILES, xLogP, and connectivity layer

Examples

library("tima")

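# A single stereo-defined SMILES string (a secoiridoid glucoside)
smiles <- "C=C[C@H]1[C@H](OC=C2C1=CCOC2=O)O[C@H]3[C@H]([C@H]([C@H]([C@H](O3)CO)O)O)O"
# Store it under the default column name expected by process_smiles()
df <- data.frame(structure_smiles_initial = smiles)
# Standardize and annotate the structure without caching
process_smiles(df)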