TL;DR
In this blog post, we'll explore the lasso_highlight_image() function in Datamol, a Python library for scientists working with molecular data. We'll see how it can be used to quickly highlight specific parts or features of a molecule. We'll also walk through examples of how to use the function and discuss its limitations and the areas where future contributors could improve it.
Intro
If you work with chemical data and need to visualize it, you'll want to check out Datamol. This Python library has a new addition called lasso highlight, which was initially produced by Christian W. Feldmann. The lasso highlighting function allows you to quickly identify and visualize specific parts or features of a molecule. It's useful for identifying functional groups, comparing and contrasting different molecules, and analyzing molecular structure and properties.
Examples
To use the lasso_highlight_image() function, you supply a target molecule and specify the substructures you want to highlight. The function returns an image of the molecule with the specified substructures highlighted. You can provide the target and search molecules either as SMILES strings or as rdkit.Chem.Mol objects. The function also takes parameters that set the image type (PNG or SVG), the image size, and other drawing characteristics. The details of each parameter can be found here.
Here are two examples to help you get started:
1. Lasso highlight with multiple substructures and a PNG image
import datamol as dm
target_molecule = "CO[C@@H](O)C1=C(O[C@H](F)Cl)C(C#N)=C1ONNC[NH3+]"
substructures = ["CONN", "N#CC~CO"]
dm.lasso_highlight_image(target_molecule, substructures, mol_size=(400, 400), use_svg=False)
2. Lasso highlight with a single substructure and an SVG image
import datamol as dm
target_molecule = "CO[C@@H](O)C1=C(O[C@H](F)Cl)C(C#N)=C1ONNC[NH3+]"
substructure = dm.from_smarts("CONN")
dm.lasso_highlight_image(target_molecule, substructure, mol_size=(300, 300))
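Because the function returns the image instead of writing it to disk, a small helper can cover the saving step. The sketch below is not part of Datamol: save_highlight is a hypothetical name, and it assumes the returned value is one of an SVG string, raw bytes, an IPython display object exposing .data, or a PIL-style object exposing .save(). Check what your environment actually returns before relying on it.

```python
from pathlib import Path


def save_highlight(image, path):
    """Write an image returned by a drawing call to `path`.

    Hypothetical helper, not a Datamol function. It handles the return
    shapes we assume are possible: an SVG string, raw bytes, a PIL-style
    object with `.save()`, or an IPython display object with `.data`.
    """
    path = Path(path)
    if isinstance(image, str):
        # SVG markup is plain text.
        path.write_text(image, encoding="utf-8")
    elif isinstance(image, bytes):
        # Already-encoded image data, for example PNG bytes.
        path.write_bytes(image)
    elif hasattr(image, "save"):
        # PIL-style image objects know how to write themselves.
        image.save(str(path))
    elif hasattr(image, "data"):
        # Display wrappers keep the payload in `.data`; unwrap and retry.
        save_highlight(image.data, path)
    else:
        raise TypeError(f"Unsupported image type: {type(image).__name__}")
    return path


# Example usage (requires datamol to be installed):
#   import datamol as dm
#   img = dm.lasso_highlight_image(target_molecule, substructure, mol_size=(300, 300))
#   save_highlight(img, "highlight.svg")
```

Dispatching on the value's shape keeps the helper independent of whether the code runs in a notebook or in a plain Python session.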
Limitations and What's Next
Although the lasso_highlight_image() function is useful, it has a few limitations. The following additions would address them:
- Add functionality to write to a file, similar to the to_image() function in Datamol/viz/viz.py.
- Allow the analysis of multiple target molecules at once.
- Canonicalize search molecules to prevent duplicate highlighting.
- Update the documentation in Visualization.ipynb.
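Until such features exist in the library, the multi-target and duplicate-removal ideas from the list above can be prototyped in user code. The following is a minimal sketch under stated assumptions: dedupe, highlight_many, and the render and key parameters are hypothetical names rather than Datamol API, the queries are assumed to be strings, and the default comparison only strips whitespace, so it does not perform real chemical canonicalization.

```python
def dedupe(queries, key=str.strip):
    """Drop repeated string queries, keeping first-seen order.

    `key` maps each query to the value used for comparison. The default
    only strips whitespace; pass a real canonicalizer to also catch
    different spellings of the same pattern.
    """
    seen = set()
    unique = []
    for query in queries:
        k = key(query)
        if k not in seen:
            seen.add(k)
            unique.append(query)
    return unique


def highlight_many(targets, substructures, render=None, **kwargs):
    """Render one highlighted image per target molecule.

    `render` is any callable taking (target, substructures, **kwargs).
    When omitted, datamol's lasso_highlight_image is used.
    """
    if render is None:
        # Imported lazily so the helper can be loaded without datamol.
        import datamol as dm

        render = dm.lasso_highlight_image
    queries = dedupe(substructures)
    return [render(target, queries, **kwargs) for target in targets]


# Example usage (requires datamol to be installed):
#   images = highlight_many(
#       ["CCO", "CC(=O)O"],
#       ["CO", "CO"],
#       mol_size=(300, 300),
#   )
```

Passing the renderer as a parameter keeps the loop easy to test with a stand-in function and leaves room to swap in a different drawing call later.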
If you're interested in contributing to the project, check out the Datamol website and the contribution guidelines. Alternatively, feel free to contact me through the social media links on my personal website with any questions or feedback.