SAIR: Structurally Augmented IC50 Repository
SandboxAQ unveils the Structurally Augmented IC50 Repository (SAIR), an open dataset of protein-ligand structures labelled by binding affinity.
Today, SandboxAQ, a business-to-business (B2B) company at the nexus of quantum technology and artificial intelligence (AI), announced the release of SAIR (Structurally Augmented IC50 Repository). Billed as the largest and most comprehensive open dataset of protein-ligand pairs annotated with experimental potency data, the release could substantially change computational drug discovery. SAIR is intended to give researchers a resource for training AI models that predict binding affinity faster and more accurately.
SAIR addresses a crucial gap in AI-driven drug design. Deep learning models that use 3D chemical structures for drug design have historically been held back by a shortage of data: very few protein-ligand complexes have both a resolved 3D structure and a measured potency (such as an IC50 or Ki value). As a result, many AI techniques must rely on less direct inputs, such as sequences or 2D chemical structures.
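As a brief illustration of what a potency label looks like in practice: IC50 values span many orders of magnitude, so machine learning work commonly uses the log-scaled pIC50, defined as the negative base-10 logarithm of the IC50 in molar units. The sketch below assumes IC50 values given in nanomolar; the function name and the example values are hypothetical and are not taken from SAIR, whose own label format is not described here.

```python
import math


def ic50_nm_to_pic50(ic50_nm: float) -> float:
    """Convert an IC50 in nanomolar to pIC50 = -log10(IC50 in molar)."""
    if ic50_nm <= 0:
        raise ValueError("IC50 must be positive")
    # 1 nM = 1e-9 M, so -log10(ic50_nm * 1e-9) = 9 - log10(ic50_nm)
    return 9.0 - math.log10(ic50_nm)


# Hypothetical potency values for illustration only (not SAIR data).
for ic50_nm in (1.0, 100.0, 10_000.0):
    print(f"IC50 = {ic50_nm:>8.1f} nM -> pIC50 = {ic50_nm_to_pic50(ic50_nm):.2f}")
```

A higher pIC50 means a more potent compound: each unit corresponds to a tenfold drop in IC50.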
SAIR was developed expressly to overcome this limitation by offering a vast collection of computationally folded protein-ligand structures paired with their corresponding experimental affinity values. The main objective is to close the data gap and spur the creation of more robust and more accurate machine learning models for predicting binding affinity.
SAIR contains 5.2 million synthetic 3D molecular structures covering more than one million (1,048,857) unique protein-ligand pairs. The pairs were curated from existing databases such as ChEMBL and BindingDB, and the structures were cofolded with the Boltz-1x model. In total, about 2.5 terabytes of data are publicly available. The structures were generated using SandboxAQ’s AI Large Quantitative Model (LQM) capabilities together with NVIDIA DGX Cloud, a platform for AI training and fine-tuning. The partnership with NVIDIA provided the optimised computing infrastructure that SAIR’s development required, helping to double GPU utilisation and boost throughput across SandboxAQ’s scientific workloads.
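A quick back-of-the-envelope check on these figures, using only the two counts quoted above (the variable names are illustrative, and 5.2 million is a rounded figure):

```python
# Counts as quoted in the article; 5.2 million is rounded.
total_structures = 5_200_000
unique_pairs = 1_048_857

# Average number of synthetic 3D structures per protein-ligand pair.
structures_per_pair = total_structures / unique_pairs
print(f"About {structures_per_pair:.1f} structures per pair")
```

The result works out to roughly five structures for each protein-ligand pair.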
By combining core LQM capabilities with physics-based modelling, SAIR is designed to deliver better generalisation, greater reliability, and broader applicability across a wide range of drug development tasks. In making the dataset openly accessible, SandboxAQ showcases both its expertise in quantitative AI for drug discovery and the potential of its proprietary LQMs.
Nadia Harhen, General Manager of AI Simulation at SandboxAQ, said the company combined its expertise and AI LQM capabilities with NVIDIA’s accelerated computing to build SAIR and enable large-scale in silico predictions of protein-ligand binding affinity. Underscoring the importance of the accomplishment, Harhen said: “This achievement marks a pivotal moment in drug discovery, demonstrating capacity to fundamentally transform the traditional trial-and-error process into a rapid, data-driven approach.”
Noting that more than five million affinity-labeled protein-ligand structures had been made publicly available, she continued: “It is giving every scientist the raw fuel to train breakthrough models overnight, setting a new pace for drug discovery.” SAIR turns the scarcity of experimental data into an opportunity, and the release offers a glimpse of the breadth and depth of SandboxAQ’s LQM platform.
The SAIR dataset gives researchers a comprehensive, high-quality resource for training AI models that predict protein-ligand binding affinities. According to SandboxAQ, models trained on SAIR can produce predictions at least 1,000 times faster than traditional physics-based methods. That speed-up is expected to shorten the path from initial discovery to market, leading to faster therapeutic breakthroughs and better patient outcomes. More technical information about the dataset is available in the bioRxiv preprint “SAIR (Structurally Augmented IC50 Repository): Enabling Deep Learning for Protein-Ligand Interactions with a Synthetic Structural Dataset”.
SandboxAQ’s quantitative AI technology has already produced strong results through strategic alliances with leading academic institutions and pharmaceutical companies. Partners include Riboscience, the Michael J. Fox Foundation, the Institute of Neurodegenerative Diseases at UCSF, and most recently, Stand Up To Cancer (SU2C). The company says its large quantitative models consistently yield higher success rates than conventional techniques, which it presents as a significant advance in accelerating drug development.
The SAIR dataset is freely available for non-commercial use under the CC BY-NC-SA 4.0 license. Commercial users can also use the data free of charge after completing a brief form and submitting it to SandboxAQ. Researchers can currently obtain the dataset directly from SandboxAQ or through Google Cloud Platform.
SandboxAQ also invites researchers to reach out at [email protected] to collaborate on expanding SAIR or to apply these models to their most challenging targets. More details on how to access and use the data will be shared in an upcoming webinar with the SandboxAQ team and a special guest from NVIDIA. In the future, SandboxAQ aims to release new datasets, AI models, and solutions covering the entire drug development process.
About SandboxAQ
SandboxAQ is a business-to-business company that provides solutions combining quantum technology and artificial intelligence. Its Large Quantitative Models (LQMs) are intended to drive significant advances in industries such as life sciences, financial services, and navigation. SandboxAQ emerged from Alphabet Inc. as an independent, growth-backed company, supported by leading investors and strategic partners including funds and accounts advised by T. Rowe Price Associates, Inc., Alger, IQT, US Innovative Technology Fund, S32, Paladin Capital, BNP Paribas, Eric Schmidt, Breyer Capital, Ray Dalio, Marc Benioff, Thomas Tull, and Yann LeCun, among others.