DeepSA: Deep Learning-Driven Predictor of Compound Synthesis Accessibility
A deep learning model that predicts the synthesis accessibility of compounds with high accuracy, helping researchers select cost-effective molecules for synthesis.
Abstract
DeepSA predicts the synthesis accessibility of compounds and achieves a much higher early enrichment rate than existing methods when discriminating molecules that are difficult to synthesize. This helps users select less expensive molecules for synthesis, reducing the time needed for drug discovery and development.
Traditional synthesis accessibility scoring methods such as SAScore rely on fragment-based rules and often fail to capture the structural features that make a molecule hard to synthesize. DeepSA uses deep learning to learn these patterns directly from molecular data, providing more reliable predictions of synthesis difficulty.
Method
DeepSA employs a graph neural network (GNN) architecture to predict compound synthesis accessibility. The model takes molecular graphs as input, where atoms are represented as nodes and bonds as edges. Through multiple message-passing layers, the model learns rich representations of molecular structures that capture the intricate relationships between structural features and synthesis difficulty.
The key components of DeepSA include:
- Molecular Graph Representation: Compounds are encoded as molecular graphs with atom and bond features
- Graph Neural Network: Multi-layer message passing to capture structural patterns
- Synthesis Accessibility Score: A continuous score indicating the ease of synthesis
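The components listed above can be illustrated with a minimal, library-free sketch. It is not DeepSA's implementation: the toy molecule (the heavy atoms of ethanol), the one-hot atom features, the sum-aggregation update, and the readout weights are all hypothetical choices made only to show how atoms-as-nodes and bonds-as-edges feed a message-passing step and a final score.

```python
import math

# Hypothetical toy molecule: ethanol heavy atoms (C-C-O), hand-encoded so the
# example needs no cheminformatics library.
atoms = ["C", "C", "O"]
bonds = [(0, 1), (1, 2)]  # undirected edges between atom indices

# Assumed atom features: one-hot over a tiny element vocabulary.
VOCAB = ["C", "N", "O"]


def one_hot(symbol):
    return [1.0 if symbol == v else 0.0 for v in VOCAB]


def build_adjacency(n_atoms, bond_list):
    """Map each atom index to the indices of its bonded neighbors."""
    neighbors = {i: [] for i in range(n_atoms)}
    for a, b in bond_list:
        neighbors[a].append(b)
        neighbors[b].append(a)
    return neighbors


def message_passing_step(features, neighbors):
    """Each node adds the sum of its neighbors' feature vectors to its own."""
    updated = []
    for i, own in enumerate(features):
        agg = list(own)
        for j in neighbors[i]:
            agg = [x + y for x, y in zip(agg, features[j])]
        updated.append(agg)
    return updated


def readout_score(features, weights, bias):
    """Mean-pool node features, then apply a linear layer and a sigmoid."""
    n = len(features)
    pooled = [sum(col) / n for col in zip(*features)]
    logit = sum(w * x for w, x in zip(weights, pooled)) + bias
    return 1.0 / (1.0 + math.exp(-logit))


features = [one_hot(a) for a in atoms]
neighbors = build_adjacency(len(atoms), bonds)
for _ in range(2):  # two rounds of message passing
    features = message_passing_step(features, neighbors)

# Untrained, made-up weights, for illustration only.
score = readout_score(features, weights=[0.1, -0.2, 0.3], bias=0.0)
print(f"toy score: {score:.3f}")
```

After two rounds, each atom's vector reflects atoms up to two bonds away, which is the sense in which stacked message-passing layers capture larger structural context. A trained model would replace the plain sums with learned transformations and would typically include bond features as well.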
Key Results
- Superior Performance: DeepSA achieves significantly higher early enrichment rates than existing methods such as SAScore and SCScore
- Broad Applicability: The model generalizes well across diverse chemical spaces
- Practical Utility: Integrated into a web server for easy access by the research community
- Drug Discovery Impact: Helps reduce costs by filtering out synthetically intractable molecules early in the drug design pipeline
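To make the early enrichment claim concrete, the sketch below computes a standard enrichment factor: the fraction of hard-to-synthesize molecules in the top-ranked subset divided by their fraction in the whole set. This page does not give the paper's exact metric definition, so the formula, the function name `enrichment_factor`, the score convention (higher means harder to synthesize), and the toy data are assumptions for illustration.

```python
def enrichment_factor(scores, labels, fraction):
    """Enrichment factor at the top `fraction` of a ranked list.

    scores: predicted difficulty, higher = harder to synthesize (assumed convention)
    labels: 1 = hard to synthesize, 0 = easy to synthesize
    """
    if not 0 < fraction <= 1:
        raise ValueError("fraction must be in (0, 1]")
    if len(scores) != len(labels) or not scores:
        raise ValueError("scores and labels must be non-empty and equal length")

    total_hits = sum(labels)
    if total_hits == 0:
        raise ValueError("no positive labels; enrichment is undefined")

    # Rank molecules from highest to lowest predicted difficulty.
    ranked = sorted(zip(scores, labels), key=lambda pair: pair[0], reverse=True)
    n_top = max(1, int(len(ranked) * fraction))
    top_hits = sum(label for _, label in ranked[:n_top])

    top_rate = top_hits / n_top
    overall_rate = total_hits / len(labels)
    return top_rate / overall_rate


# Made-up toy data: 10 molecules, 3 of them labeled hard to synthesize.
toy_scores = [0.95, 0.90, 0.80, 0.70, 0.60, 0.50, 0.40, 0.30, 0.20, 0.10]
toy_labels = [1, 1, 0, 1, 0, 0, 0, 0, 0, 0]

ef_20 = enrichment_factor(toy_scores, toy_labels, 0.2)
ef_50 = enrichment_factor(toy_scores, toy_labels, 0.5)
print(f"EF@20%: {ef_20:.2f}, EF@50%: {ef_50:.2f}")
```

A value above 1 means the top of the ranking contains more hard-to-synthesize molecules than random selection would, which is the property that matters when only a small share of candidates can be inspected or discarded early.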
BibTeX
@article{wang2023deepsa,
title={DeepSA: a deep-learning driven predictor of compound synthesis accessibility},
author={Wang, Shihang and Wang, Lin and Li, Fenglei and Bai, Fang},
journal={Journal of Cheminformatics},
volume={15},
number={1},
pages={103},
year={2023},
publisher={Springer},
doi={10.1186/s13321-023-00771-3}
}