Thompson Sampling via Fine-Tuning LLMs for Bayesian Optimization

Posted on October 17, 2025 by Agarapu Naveen · 5 min read

Thompson Sampling Via Fine-Tuning (ToSFiT) of LLMs Achieves Scalable Bayesian Optimization in Complex Discrete Spaces

Researchers from ETH Zürich and IBM Research Zurich have unveiled Thompson Sampling via Fine-Tuning (ToSFiT), a significant advance in optimization algorithms. By harnessing the power of large language models (LLMs), the strategy tackles the challenge of searching vast, complex discrete spaces, where conventional gradient-based approaches typically fall short. ToSFiT offers a scalable approach to Bayesian optimization (BO) that sidesteps the computationally costly step of maximizing an acquisition function.

By gradually adapting the LLM to reflect the growing understanding of the search space, the method retains strong theoretical performance guarantees while substantially improving efficiency in real-world applications. The team behind this work consists of Nicolas Menet, Aleksandar Terzić, and Andreas Krause from ETH Zürich, together with Abbas Rahimi from IBM Research Zurich.


Overcoming the Optimization Hurdle in Discrete Domains

Bayesian optimization is a central algorithmic framework for automated discovery and large-scale experimental design when reward-function evaluations are expensive or time-consuming. BO maintains a posterior distribution over the unknown reward and uses this statistical model to direct the search toward promising configurations. Traditionally, new candidates are chosen by optimizing an acquisition function that balances exploitation (refining current solutions) against exploration (trying out new options).

Thompson sampling (TS) stands out among acquisition procedures because of its robust empirical performance and state-of-the-art convergence guarantees. TS draws a realization of the reward function from the posterior, treats that realization as the acquisition function, and selects the point that maximizes it.
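As a concrete illustration (not the paper's code), classical TS over a finite candidate set with a conjugate Bayesian linear reward model can be sketched in a few lines; the candidate set, feature dimensions, and noise level below are all illustrative:

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy setup: 50 candidates with 5-dimensional features and a linear reward.
X = rng.normal(size=(50, 5))
true_w = rng.normal(size=5)

def evaluate(i):
    """Simulated expensive black-box reward for candidate i."""
    return X[i] @ true_w + 0.1 * rng.normal()

# Conjugate Bayesian linear model: posterior N(mu, Sigma) over the weights.
noise_var = 0.01
prec = np.eye(5)                  # posterior precision (starts at the prior)
b = np.zeros(5)                   # precision-weighted mean

for t in range(30):
    Sigma = np.linalg.inv(prec)
    mu = Sigma @ b
    w_sample = rng.multivariate_normal(mu, Sigma)  # one draw from the posterior
    i = int(np.argmax(X @ w_sample))               # maximize the sampled reward
    r = evaluate(i)
    prec += np.outer(X[i], X[i]) / noise_var       # closed-form posterior update
    b += X[i] * r / noise_var

mu_final = np.linalg.inv(prec) @ b
best = int(np.argmax(X @ mu_final))
```

Note that the inner `argmax` over all 50 candidates is exactly the step that stops scaling once the candidate set becomes combinatorially large.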

This maximization step, however, poses a fundamental problem in huge unstructured discrete domains such as the space of amino acid sequences or of valid quantum-circuit code: gradients are unavailable, so effective search is out of reach. Exhaustive search is also impossible; for example, a protein search space with 20 amino acids and a maximum sequence length of 100 already exceeds the number of atoms in the observable universe. In such combinatorial spaces, conventional gradient-based techniques do not apply, and iterating over every point is intractable.
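The combinatorics are easy to verify:

```python
import math

# 20 amino acids, sequences of length 100: compare with the roughly
# 10^80 atoms in the observable universe.
space = 20 ** 100
atoms = 10 ** 80

print(space > atoms)                 # True: exhaustive enumeration is hopeless
print(round(math.log10(space), 1))   # 130.1
```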


ToSFiT: LLMs as Generative Optimizers

The researchers created ToSFiT to scale BO to these complex, high-dimensional spaces. Rather than maximizing an acquisition function, ToSFiT uses a generative LLM to directly parameterize the probability of maximality (PoM), the likelihood that a candidate solution is optimal. Treating the model's proposals as Thompson samples avoids the costly acquisition-function maximization altogether.
ToSFiT builds on the Variational Bayesian Optimistic Sampling (VBOS) paradigm. Importantly, it begins the optimization process with a prompt-conditioned pre-trained language model, giving it a solid prior-knowledge basis that speeds up learning. Through online fine-tuning, it then carefully adjusts the model parameters toward the posterior PoM using the VBOS objective.
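A toy analogue of this loop, with a small categorical policy standing in for the fine-tuned LLM and a simple Gaussian reward posterior standing in for the GP, can make the mechanics concrete. Everything here is illustrative (the candidate count, learning rate, and posterior model are made up), not the authors' implementation:

```python
import numpy as np

rng = np.random.default_rng(1)

K = 8
logits = np.zeros(K)              # "pre-trained" policy: uniform over candidates
true_r = rng.normal(size=K)       # hidden rewards
obs = {i: [] for i in range(K)}
baseline = 0.0

def posterior_sample(i):
    """Gaussian posterior draw for candidate i's reward (unit-information prior)."""
    n = len(obs[i])
    mean = float(np.mean(obs[i])) if n else 0.0
    return rng.normal(mean, np.sqrt(1.0 / (1.0 + n)))

for t in range(300):
    p = np.exp(logits - logits.max()); p /= p.sum()
    i = int(rng.choice(K, p=p))                   # the policy proposes a candidate
    obs[i].append(true_r[i] + 0.1 * rng.normal()) # observe a noisy reward
    g = posterior_sample(i)                       # Thompson sample of its reward
    grad = -p.copy(); grad[i] += 1.0              # grad of log p(i), categorical
    logits += 0.3 * (g - baseline) * grad         # REINFORCE step toward the PoM
    baseline += 0.1 * (g - baseline)              # running baseline for variance

p_final = np.exp(logits - logits.max()); p_final /= p_final.sum()
```

The policy gradually concentrates mass on candidates whose posterior reward samples look maximal, mirroring how ToSFiT's fine-tuning nudges the LLM toward the posterior PoM.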

To compute the reward posterior in closed form and enable conditioning on observations, the researchers implemented scalable Gaussian process (GP) inference using linear kernels over learned features. As a result, memory and computational complexity scale as Θ(dim(H)²) rather than with the number of previous observations.
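In effect, a GP with a linear kernel reduces to Bayesian linear regression over the d-dimensional features, so each observation is a rank-1 update of a d×d precision matrix, independent of how many observations have already arrived. A minimal sketch under that assumption (dimensions and noise level are illustrative):

```python
import numpy as np

rng = np.random.default_rng(2)
d, noise_var = 16, 0.25

prec = np.eye(d)                  # d x d posterior precision (starts at prior)
b = np.zeros(d)                   # precision-weighted mean, a d-vector

w_true = rng.normal(size=d)
for _ in range(500):              # 500 observations, yet memory stays Theta(d^2)
    phi = rng.normal(size=d)                       # feature vector of a candidate
    y = phi @ w_true + rng.normal(scale=0.5)       # noisy reward observation
    prec += np.outer(phi, phi) / noise_var         # rank-1, O(d^2) update
    b += phi * y / noise_var

Sigma = np.linalg.inv(prec)
mu = Sigma @ b                    # closed-form posterior mean over weights

phi_test = rng.normal(size=d)
pred_mean = float(phi_test @ mu)
pred_var = float(phi_test @ Sigma @ phi_test + noise_var)
```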

Reinforcement learning techniques, notably the REINFORCE Leave-One-Out (RLOO) baseline, were used to stabilize the gradient estimates needed for fine-tuning the LLM. The advantage function of Group Relative Policy Optimization (GRPO) is technically identical to a standardized RLOO advantage.
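The equivalence is easy to check numerically: the leave-one-out baseline merely rescales the group-centered rewards, so after standardization the RLOO and GRPO advantages coincide (toy reward values below):

```python
import numpy as np

# Rewards of K completions sampled for the same prompt (toy values).
r = np.array([1.0, 2.0, 3.0, 6.0])
K = len(r)

# RLOO: baseline for sample i is the mean of the *other* K-1 rewards.
rloo_adv = r - (r.sum() - r) / (K - 1)

# GRPO: center by the group mean and divide by the group std.
grpo_adv = (r - r.mean()) / r.std()

# RLOO advantages equal K/(K-1) times the centered rewards, so
# standardizing them reproduces the GRPO advantages exactly.
```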


Theoretical Guarantees and Policy Initialization

The study offers substantial theoretical support for ToSFiT. The researchers derived a novel regret bound for a variational formulation of Thompson sampling, showing that the cumulative regret scales with the maximal information gain (γT) rather than with the size of the search space (|X|). This significantly improves on earlier bounds for exact VBOS, which scaled as Õ(√(T|X|)) and are therefore vacuous in combinatorially large domains. For a linear kernel in d dimensions, the information gain grows only as O(d log T).

This theoretical analysis underscores how important careful adaptation is. The approximation error between the exact VBOS maximizer (πt) and the sampling policy (π̃t) risks dominating the cumulative regret. To address this, it is crucial to initialize ToSFiT through pre-training and prompt context, which guarantees that the policy begins in the appropriate region of the probability simplex. Empirically, a robust initial policy produces significantly better performance, and careful adaptation (using low learning rates) is necessary to preserve this prior knowledge and prevent performance stagnation.

Validation Across Diverse Tasks

Empirical validation across three very different search problems confirmed ToSFiT's sample efficiency, achieved with minimal impact on computational cost.

  1. FAQ Response Refinement: Using a Qwen3-1.7B model, this natural-language task optimizes content for semantic alignment with an unknown ground-truth response.
  2. Thermally Stable Protein Search: The challenge here is to generate amino acid sequences that maximize thermal stability, a crucial property for drug development. Sequences were sampled using ProtGPT2, and the search space is exponentially large.
  3. Quantum Circuit Design: This task uses a Qwen2.5-Coder-1.5B model to navigate a vast discrete space of valid quantum programs, generating Qiskit circuits that prepare low-energy quantum states in unknown environments.

Because Unguided Generation uses no feedback, it quickly plateaus at an unsatisfactory reward level in all experimental conditions. Post-Generation TS, a traditional BO technique over a predetermined subset of candidates, finds effective solutions quickly but is limited to its starting pool and saturates too soon. ToSFiT, in contrast, performs BO over the entire solution space and continues to find candidates with higher rewards. It also demonstrated better exploration efficiency through optimism in the face of uncertainty, outperforming baselines such as Actor-Critic and Soft Actor-Critic.

Thompson sampling is also well suited to batched optimization, since it naturally produces a variety of candidates. ToSFiT demonstrates this ability: batching greatly improves iteration efficiency and reaches target performance in fewer rounds, even though it slightly reduces sample efficiency. This matters when observations are time-consuming or delayed.
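The batching mechanism is simply repeated posterior sampling: each independent draw nominates its own maximizer, yielding a diverse batch that can be evaluated in parallel. A toy sketch with made-up numbers (candidate features, posterior mean, and covariance are all illustrative):

```python
import numpy as np

rng = np.random.default_rng(3)

X = rng.normal(size=(40, 6))      # candidate features
mu = rng.normal(size=6)           # current posterior mean over weights
Sigma = 0.5 * np.eye(6)           # current posterior covariance

def propose_batch(B):
    """Each posterior draw nominates its own maximizer; duplicates collapse."""
    ws = rng.multivariate_normal(mu, Sigma, size=B)
    return sorted({int(np.argmax(X @ w)) for w in ws})

batch = propose_batch(8)
```

Because the draws are independent, no extra machinery is needed to encourage diversity: posterior uncertainty alone spreads the proposals across plausible optima.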

The results confirm that complex, discrete search problems can be solved by combining principled Bayesian optimization with strong foundation models. To further lower computational cost, future work will aim to incorporate jointly learned task-adaptive embeddings, investigate more expressive reward models such as Bayesian neural networks, and restrict updates to a subset of the generative model's parameters.



Written by

Agarapu Naveen

Naveen is a technology journalist and editorial contributor focusing on quantum computing, cloud infrastructure, AI systems, and enterprise innovation. As an editor at Govindhtech Solutions, he specializes in analyzing breakthrough research, emerging startups, and global technology trends. His writing emphasizes the practical impact of advanced technologies on industries such as healthcare, finance, cybersecurity, and manufacturing. Naveen is committed to delivering informative and future-oriented content that bridges scientific research with industry transformation.
