2025

Chemically-aware Attention-based Multi-modal Fusion Framework for Molecular

Learning effective molecular representations is crucial for accurate property prediction in artificial intelligence (AI)-aided drug discovery. Graph and fingerprint representations have been widely used to encode molecular topological structures and chemical substructures. To enhance the feature embedding of each modality and leverage their complementary strengths, we propose a novel Chemically-aware Attention-based Multi-modal Fusion Framework (CAMFF) for molecular representation learning, which integrates molecular graphs and extended-connectivity fingerprints by exploiting various attention mechanisms. Specifically, the proposed CAMFF consists of three modules: 1) a graph embedding module incorporating multi-head attention to capture local heterogeneous interactions and all-pair self-attention to capture long-range atomic dependencies from molecular graph representations; 2) a fingerprint embedding module using a pre-trained Mol2Vec model to generate dense chemical substructure representations; and 3) a chemically-aware feature interaction and fusion module incorporating self-attention to enable interactions between various chemical substructures and cross-attention to ensure effective multi-modal alignment and fusion. To evaluate the effectiveness of CAMFF, we compare it with 14 state-of-the-art methods across 9 molecular property prediction benchmarks. CAMFF demonstrates competitive predictive performance and improves interpretability through attention-based visualization, showing its potential for real-world drug discovery.
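The attention-based fusion described above can be illustrated with a minimal, dependency-free sketch. It is not the CAMFF implementation. It assumes:

- identity query/key/value projections and a single attention head;
- toy two-dimensional vectors standing in for graph-module atom embeddings and Mol2Vec substructure embeddings;
- fusion by concatenating mean-pooled views.

All function names (`softmax`, `attention`, `mean_pool`, `fuse`) are hypothetical.

```python
import math
from typing import List

Vector = List[float]
Matrix = List[Vector]


def softmax(scores: Vector) -> Vector:
    """Numerically stable softmax over a list of scores."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attention(queries: Matrix, keys: Matrix, values: Matrix) -> Matrix:
    """Scaled dot-product attention: softmax(Q K^T / sqrt(d)) V.

    Learned projection matrices are omitted (identity projections),
    an assumption made to keep the sketch short.
    """
    d = len(keys[0])
    out = []
    for q in queries:
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d) for k in keys]
        weights = softmax(scores)
        # Each output row is a weighted average of the value rows.
        out.append([sum(w * v[j] for w, v in zip(weights, values))
                    for j in range(len(values[0]))])
    return out


def mean_pool(rows: Matrix) -> Vector:
    """Average the rows column-wise into a single vector."""
    return [sum(col) / len(rows) for col in zip(*rows)]


def fuse(atom_emb: Matrix, substruct_emb: Matrix) -> Vector:
    """Hypothetical fusion of graph and fingerprint views."""
    # Self-attention among substructure embeddings (stand-ins for Mol2Vec vectors).
    sub = attention(substruct_emb, substruct_emb, substruct_emb)
    # All-pair self-attention among atoms, capturing long-range dependencies.
    atoms = attention(atom_emb, atom_emb, atom_emb)
    # Cross-attention: atom queries attend to substructure keys and values.
    cross = attention(atoms, sub, sub)
    # Fuse by concatenating the pooled graph view and the pooled cross-attended view.
    return mean_pool(atoms) + mean_pool(cross)


atom_emb = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
substruct_emb = [[0.5, 0.5], [1.0, -1.0]]
fused = fuse(atom_emb, substruct_emb)
print(len(fused))  # 4: two pooled 2-d views concatenated
```

Cross-attention lets each atom representation draw on the substructure embeddings most similar to it. A real model would add learned projections, multiple heads, and a trained fusion layer in place of plain concatenation.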

