What can attribution methods show us about chemical language models?
Publication year
2024
Number of pages
11 p.
Source
Digital Discovery, 3, 9, (2024), pp. 1738-1748
Publication type
Article / Letter to editor
Organization
Physical Organic Chemistry
SW OZ DCC AI
Journal title
Digital Discovery
Volume
vol. 3
Issue
iss. 9
Languages used
English (eng)
Page start
p. 1738
Page end
p. 1748
Subject
Cognitive artificial intelligence; Physical Organic Chemistry
Abstract
Language models trained on molecular string representations have shown strong performance in predictive and generative tasks. However, practical applications require not only making accurate predictions, but also explainability - the ability to explain the reasons and rationale behind the predictions. In this work, we explore explainability for a chemical language model by adapting a transformer-specific and a model-agnostic input attribution technique. We fine-tune a pretrained model to predict aqueous solubility, compare training and architecture variants, and evaluate visualizations of attributed relevance. The model-agnostic SHAP technique provides sensible attributions, highlighting the positive influence of individual electronegative atoms, but does not explain the model in terms of functional groups or explain how the model represents molecular strings internally to make predictions. In contrast, the adapted transformer-specific explainability technique produces sparse attributions, which cannot be directly attributed to functional groups relevant to solubility. Instead, the attributions are more characteristic of how the model maps molecular strings to its latent space, which seems to represent features relevant to molecular similarity rather than functional groups. These findings provide insight into the representations underpinning chemical language models, which we propose may be leveraged for the design of informative chemical spaces for training more accurate, advanced and explainable models.
This item appears in the following Collection(s)
- Academic publications [244001]
- Electronic publications [130886]
- Faculty of Science [36982]
- Faculty of Social Sciences [30023]
- Open Access publications [105048]