⟨QuantumLLMInstruct⟩ A 500k LLM Instruction-Tuning Dataset with Problem-Solution Pairs for Quantum Computing.

JHU



Dataset Creation Workflow

Comprehensive quantum computing instruction-pair domains spanning over 90 primary fields.

QuantumLLMInstruct (QLMMI) is a dataset of over 500,000 curated instruction-following problem-solution pairs for quantum computing. Spanning 90 predefined domains and hundreds of subdomains, QLMMI is intended to help Large Language Models (LLMs) handle complex quantum computing problems.

QLMMI leverages advanced reasoning techniques to address unique quantum computing tasks, including Hamiltonian simulations, circuit optimization, and algorithm design. These tasks demand high reasoning complexity, such as solving quantum models with Yang-Baxter equations or designing circuits for quantum algorithms. QLMMI integrates LLM-driven automation for scalable problem generation, solution development, and validation.

Features and Workflow

The creation of QLMMI follows a robust four-stage process:

  • Stage I: Problem Generation – Foundational problems are generated using carefully crafted templates, ensuring domain specificity and mathematical rigor. Tasks include synthetic Hamiltonians, QASM code generation, Jordan-Wigner transformations, and Trotter-Suzuki decompositions for models such as Ising and Heisenberg spin chains. Problems are enriched with metadata such as domain, difficulty level, and computational requirements.
  • Stage II: Solution Development – Structured prompts guide LLMs to generate detailed solutions in LaTeX format, ensuring clarity, correctness, and domain specificity. Models like Qwen-2-coder-instruct are optimized for producing rigorous, step-by-step derivations.
  • Stage III: Dataset Enrichment – Advanced reasoning frameworks like Chain-of-Thought (CoT) and Task-Oriented Reasoning and Action (ToRA) enhance the dataset's quality and diversity. These methods enable LLMs to solve complex tasks, including randomized circuit optimization and Gibbs state preparation.
  • Stage IV: Validation – A Judge LLM rigorously evaluates problem-solution pairs using self-critique techniques, ensuring high accuracy and reliability. This automated quality assurance minimizes human oversight.
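Stage I can be illustrated with a minimal sketch of template-based problem generation. The template text, the `Problem` record, and the `generate_problem` helper below are assumptions made for illustration; they are not the actual QLMMI templates or schema, which this page does not show.

```python
import json
import random
from dataclasses import dataclass, asdict

# Hypothetical template: the real QLMMI templates are not reproduced on this page.
ISING_TEMPLATE = (
    "Consider the {n}-qubit transverse-field Ising chain "
    "H = -J * sum_i Z_i Z_(i+1) - h * sum_i X_i with J = {J} and h = {h}. "
    "Derive its first-order Trotter-Suzuki decomposition for a time step dt = {dt}."
)


@dataclass
class Problem:
    """One generated problem plus the metadata named in Stage I (assumed field names)."""
    domain: str
    difficulty: str
    requirements: str
    text: str


def generate_problem(rng: random.Random) -> Problem:
    """Fill the template with randomly drawn parameters and attach metadata."""
    n = rng.randint(2, 6)
    params = {
        "n": n,
        "J": rng.choice([0.5, 1.0, 2.0]),
        "h": rng.choice([0.1, 0.5, 1.0]),
        "dt": rng.choice([0.01, 0.1]),
    }
    # Illustrative difficulty rule: larger chains are labeled as harder.
    difficulty = "introductory" if n <= 3 else "intermediate"
    return Problem(
        domain="Trotter-Suzuki Decompositions",
        difficulty=difficulty,
        requirements=f"{n} qubits",
        text=ISING_TEMPLATE.format(**params),
    )


rng = random.Random(0)  # fixed seed so the generated batch is reproducible
problems = [generate_problem(rng) for _ in range(3)]
for p in problems:
    print(json.dumps(asdict(p)))
```

Under these assumptions, each record could then be passed to the solution-generation and validation stages as a JSON line.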

Quantum Physics Domains

QLMMI spans a diverse range of 90 predefined quantum domains, including:

  • Synthetic Hamiltonians: Ground-state energy computations and time evolution.
  • Quantum Spin Chains: Models such as Ising, Heisenberg, and advanced integrable systems.
  • Yang-Baxter Solvability: Exploring solvability in quantum integrable models.
  • Trotter-Suzuki Decompositions: Simulating Hamiltonian dynamics efficiently.
  • Quantum Phase Estimation: A cornerstone of quantum algorithms.
  • Cluster State Verification: Critical in quantum error correction and communication.
  • Variational Quantum Eigensolvers (VQE): Solving for molecular energies and optimization tasks.
  • Randomized Circuit Optimization: Enhancing algorithm performance under noisy conditions.
  • Quantum Signal Processing: Applications in filtering and spectral analysis.
  • Quantum Thermodynamics: Gibbs state preparation and entropy calculations.

Beyond these predefined domains, LLMs demonstrate emergent capabilities by autonomously generating new quantum fields, showcasing their adaptability and scalability in problem-solving. This adaptability ensures QLMMI remains a dynamic and versatile resource.
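To make one of the listed domains concrete, the sketch below compares a first-order Trotter-Suzuki product with the exact time evolution for the single-qubit Hamiltonian \(H = X + Z\). It uses plain Python and 2x2 matrices. The function names are illustrative assumptions and the code is not taken from the QLMMI generation tools.

```python
import cmath
import math


def matmul(a, b):
    """Multiply two 2x2 matrices given as nested lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]


def exp_x(theta):
    """exp(-i*theta*X) = cos(theta) I - i sin(theta) X."""
    c, s = math.cos(theta), -1j * math.sin(theta)
    return [[c, s], [s, c]]


def exp_z(theta):
    """exp(-i*theta*Z) = diag(e^{-i theta}, e^{+i theta})."""
    return [[cmath.exp(-1j * theta), 0], [0, cmath.exp(1j * theta)]]


def exact(t):
    """exp(-iHt) for H = X + Z; since H^2 = 2I, it equals cos(wt) I - i sin(wt) H / w, w = sqrt(2)."""
    w = math.sqrt(2)
    c, s = math.cos(w * t), -1j * math.sin(w * t) / w
    return [[c + s, s], [s, c - s]]


def trotter(t, n):
    """First-order product formula: (exp(-iXt/n) exp(-iZt/n))^n."""
    step = matmul(exp_x(t / n), exp_z(t / n))
    u = [[1, 0], [0, 1]]
    for _ in range(n):
        u = matmul(u, step)
    return u


def max_error(a, b):
    """Largest entrywise deviation between two 2x2 matrices."""
    return max(abs(a[i][j] - b[i][j]) for i in range(2) for j in range(2))


errors = {n: max_error(trotter(1.0, n), exact(1.0)) for n in (1, 10, 100)}
for n, err in errors.items():
    print(f"n = {n:3d}  max entrywise error = {err:.5f}")
```

The error shrinks as the number of Trotter steps `n` grows, which is the trade-off between circuit depth and accuracy that problems in this domain typically ask about.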

Key Contributions

  • Comprehensive Dataset: Over 500,000 curated quantum problem-solution pairs enable fine-tuning LLMs for quantum-specific tasks.
  • Scalable Framework: A four-stage workflow combining templates, CoT, and ToRA ensures both quality and adaptability.
  • Innovative Infrastructure: Integration of tools like DuckDB and Gradio for efficient management and interactive exploration.
  • Open Access: Public datasets, generation tools, and training scripts foster collaboration in quantum computing research.

QLMMI bridges the gap between LLM advancements and quantum computing challenges, enabling scalable and precise solutions. With its comprehensive domain coverage, rigorous methodologies, and open-access design, QLMMI sets a new benchmark for instructional datasets in computational science.

Examples

  • QASM Generation: Generate quantum circuits and produce corresponding QASM code.
  • Quantum Hamiltonians: Analyze Hamiltonian time evolution and ground-state energy calculations.
  • Yang-Baxter Solvability: Determine solvability of quantum models using the Yang-Baxter equation.
  • Trotter-Suzuki Decomposition: Simulate Hamiltonians using Trotter-Suzuki decomposition methods.
  • Lindblad Dynamics: Model open quantum systems using the Lindblad equation.
  • Randomized Circuits Optimization: Optimize randomized quantum circuits to minimize error rates.
  • Quantum Phase Estimation: Implement quantum phase estimation for eigenvalue calculations.
  • Cluster States Verification: Prepare and verify cluster states for measurement-based quantum computation.
  • VQE Analysis: Construct and optimize Variational Quantum Eigensolvers (VQE) for molecular Hamiltonians.
  • Quantum Algorithm Development: Develop quantum algorithms for problems such as integer factorization and database search.
  • Entanglement and Quantum Information Theory: Explore properties and applications of entangled states.
  • Quantum Error Correction: Design quantum error correction codes to protect qubits from decoherence.
  • Semiclassical Quantum Simulation: Simulate quantum systems with semiclassical methods.
  • Quantum Communication Protocols: Develop protocols such as Quantum Key Distribution (QKD) and superdense coding.
  • Topological Quantum Computing: Study fault-tolerant computing using braiding operations of anyons.
  • Quantum Complexity Classes: Investigate computational problem classifications using quantum algorithms.
  • Quantum Thermodynamics: Analyze thermodynamic properties of quantum systems.
  • Interacting Quantum Systems: Study dynamics and correlations in interacting quantum systems.
  • Quantum Cryptography: Explore quantum cryptographic protocols like QKD.
  • Quantum Channels: Analyze the mathematical properties of quantum information transfer channels.
  • Quantum Fourier Transform: Explore the implementation and applications of quantum Fourier transforms.
  • Quantum Machine Learning: Apply quantum circuits to machine learning tasks such as classification.
  • Quantum State Tomography: Reconstruct quantum states using measurement data.
  • Bell Inequalities and Nonlocality: Test Bell inequalities and study quantum nonlocality.
  • Diagonalization of Two-Spin Hamiltonians: Solve eigenvalue problems for two-spin systems.
  • Energy Eigenvalues via Perturbation Theory: Compute eigenvalues using perturbation theory and diagonalization.
  • Measurement in Plus-Minus Basis: Analyze measurement probabilities in non-standard bases.
  • Pauli Spin Matrices Analysis: Explore properties and applications of Pauli matrices.
  • Born's Rule and State Measurement: Apply Born's rule to calculate quantum measurement probabilities.
  • PennyLane Quantum Circuits: Implement quantum circuits using the PennyLane framework.
  • PennyLane Circuit Analysis: Analyze PennyLane quantum circuits for functionality and structure.
  • Building Molecular Hamiltonians: Construct molecular Hamiltonians for quantum chemistry.
  • Variational Quantum Eigensolver (VQE): Optimize VQE methods for specific Hamiltonians.
  • Subspace-Search Variational Quantum Eigensolver (SSVQE): Find multiple eigenstates using SSVQE.
  • Variational Quantum State Diagonalization (VQSD): Diagonalize density matrices with VQSD techniques.
  • Gibbs State Preparation: Prepare Gibbs states for specific Hamiltonians.
  • The Classical Shadow of Unknown Quantum States: Approximate quantum state properties using classical shadows.
  • Estimation of Quantum State Properties Based on Classical Shadows: Estimate quantum state properties using shadow protocols.
  • Hamiltonian Simulation with Product Formula: Simulate Hamiltonians using Trotter product formulas.
  • Simulate the Spin Dynamics on a Heisenberg Chain: Model spin dynamics on a Heisenberg chain.
  • Distributed Variational Quantum Eigensolver Based on Schmidt Decomposition: Apply distributed VQE algorithms.
  • Quantum Signal Processing and Quantum Singular Value Transformation: Use quantum signal processing for operator transformations.
  • Hamiltonian Simulation with qDRIFT: Simulate Hamiltonians using the qDRIFT method.
  • Quantum Phase Processing: Apply quantum phase processing for signal amplification.
  • Variational Quantum Metrology: Optimize quantum sensing with variational quantum metrology.
  • Encoding Classical Data into Quantum States: Encode classical data into quantum states.
  • Quantum Classifier: Implement quantum classifiers for machine learning.
  • Variational Shadow Quantum Learning (VSQL): Approximate quantum properties with VSQL.
  • Quantum Kernel Methods: Develop kernel functions for quantum machine learning.
  • Quantum Autoencoder: Compress quantum data using quantum autoencoders.
  • Quantum GAN: Generate quantum states using quantum GANs.
  • Variational Quantum Singular Value Decomposition (VQSVD): Approximate singular value decompositions with VQSVD.
  • Data Encoding Analysis: Analyze data encoding methods for quantum computation.
  • Quantum Neural Network Approximating Functions: Approximate functions using quantum neural networks.
  • Variational Quantum Amplitude Estimation: Estimate quantum amplitudes variationally.
  • Quantum Approximate Optimization Algorithm (QAOA): Solve optimization problems with QAOA.
  • Solving Max-Cut Problem with QAOA: Apply QAOA to solve Max-Cut problems.
  • Large-Scale QAOA via Divide-and-Conquer: Scale QAOA for larger problem instances using divide-and-conquer techniques.
  • Travelling Salesman Problem: Solve the Travelling Salesman Problem using quantum algorithms.
  • Jordan-Wigner Transformations: Map spin models to fermionic systems using Jordan-Wigner transformations.
  • Bethe Ansatz Application: Solve the Heisenberg spin chain spectrum using the Bethe Ansatz.
  • Generalized Spin Chain Compression: Compress quantum circuits using Yang-Baxter equations in spin chain models.
  • Wave-Particle Duality: Explore the dual nature of particles and waves through phenomena like the photoelectric effect.
  • Uncertainty Principle: Analyze implications of Heisenberg's uncertainty principle.
  • Perturbation Theory: Examine corrections to energy levels using perturbation theory.
  • Angular Momentum: Investigate eigenstates and addition rules for angular momentum.
  • Hydrogen Atom: Study quantization of energy levels and transitions in the hydrogen atom.
  • Scattering Theory: Analyze quantum scattering and phase shifts using the Born approximation.
  • Quantum Tunneling: Investigate tunneling phenomena using the WKB approximation.
  • Entanglement: Examine quantum entanglement and its applications.
  • Time Evolution: Analyze quantum systems' time evolution using the Schrödinger equation.
  • Quantum Measurement: Explore wavefunction collapse and quantum measurement theory.
  • Quantum Harmonic Oscillator: Study energy eigenvalues and wavefunctions of the quantum harmonic oscillator.
  • Spin-Orbit Coupling: Examine spin-orbit interaction in atomic systems.
  • Quantum Zeno Effect: Explore repeated measurements' effects on quantum systems.
  • Quantum Gates: Construct and analyze quantum circuits with gates like Hadamard and CNOT.
  • Adiabatic Theorem: Examine systems' behavior under slowly varying potentials.
  • Bell Inequalities: Test Bell inequalities and analyze their implications.
  • Superposition Principle: Investigate the principle of superposition in quantum mechanics.
  • Quantum Decoherence: Analyze the loss of coherence in quantum systems.
  • Topological Quantum States: Explore the robustness of topological quantum states.
  • Quantum Cryptography: Study principles and protocols of quantum cryptography.
  • Quantum Eraser: Examine implications of the quantum eraser experiment.
  • Quantum Teleportation: Demonstrate quantum teleportation principles.
  • Path Integral Formulation: Utilize the path integral approach in quantum mechanics.
  • Quantum Annealing: Analyze quantum annealing for optimization problems.
  • Berry Phase: Study geometric phase in quantum systems.
  • Quantum Cloning: Explore the no-cloning theorem in quantum mechanics.
  • Density Matrix Formalism: Describe mixed states using density matrices.
  • Quantum Computation: Explore basic concepts of quantum computation.
  • Relativistic Quantum Mechanics: Solve problems involving the Klein-Gordon or Dirac equations.
  • Quantum Field Theory: Introduce concepts of quantum field theory.
CoT Reasoning for Hamiltonian Analytical Solution
Problem Statement
Solve the Hamiltonian \(H = X_1 + Z_1\) analytically by finding its matrix form and eigenvalues.
LLM Solution: Chain of Thought Reasoning
Step 1: Express the Hamiltonian in matrix form.
The Pauli matrix \(X_1\) corresponds to \(\begin{bmatrix} 0 & 1 \\ 1 & 0 \end{bmatrix}\), and \(Z_1\) corresponds to \(\begin{bmatrix} 1 & 0 \\ 0 & -1 \end{bmatrix}\). Adding these two matrices gives the matrix representation of \(H\): \[ H = X_1 + Z_1 = \begin{bmatrix} 1 & 1 \\ 1 & -1 \end{bmatrix} \] We now have the Hamiltonian in matrix form.
Step 2: Set up the eigenvalue equation.
To find the eigenvalues of the Hamiltonian, we solve the equation \( H \vec{v} = \lambda \vec{v} \), where \(\lambda\) is the eigenvalue and \(\vec{v}\) is the eigenvector. A nonzero solution \(\vec{v}\) exists only if \( \det(H - \lambda I) = 0 \), where \(I\) is the identity matrix. Substituting \(H\) and \(I\), we get: \[ \det\begin{bmatrix} 1 - \lambda & 1 \\ 1 & -1 - \lambda \end{bmatrix} = 0 \] Explanation: We subtract \(\lambda\) times the identity matrix from \(H\) and set the determinant to zero to find the eigenvalues.
Step 3: Solve the characteristic equation.
Expanding the determinant of the matrix gives the characteristic equation: \[ (1 - \lambda)(-1 - \lambda) - 1 = (\lambda^2 - 1) - 1 = \lambda^2 - 2 = 0 \] Explanation: By expanding the determinant, we derive a quadratic equation in \(\lambda\), which is the characteristic equation of the matrix.
Step 4: Solve for the eigenvalues.
Solving the quadratic equation \(\lambda^2 - 2 = 0\) gives the eigenvalues: \[ \lambda = \pm\sqrt{2} \] Explanation: The solutions to this quadratic equation are \(\lambda = \sqrt{2}\) and \(\lambda = -\sqrt{2}\), which are the eigenvalues of the Hamiltonian.
Step 5: Verify the eigenvalues.
To verify the solution, we substitute \(\lambda = \sqrt{2}\) and \(\lambda = -\sqrt{2}\) back into the determinant condition. In both cases \((1 - \lambda)(-1 - \lambda) = 1\), so the determinant is \(1 - 1 = 0\). As a further check, the eigenvalues sum to the trace of \(H\), which is \(0\), and multiply to its determinant, which is \(-2\). This confirms the eigenvalues.
Final result: The eigenvalues of the Hamiltonian are \([\sqrt{2}, -\sqrt{2}]\), and the matrix form of the Hamiltonian is: \[ H = \begin{bmatrix} 1 & 1 \\ 1 & -1 \end{bmatrix} \]
Example of CoT reasoning for solving a Hamiltonian analytically using an LLM. Each step follows from the previous, guiding the reasoning process toward the correct solution.
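The worked example can be cross-checked numerically. The following is a minimal sketch in plain Python; the helper name `eigenvalues_symmetric_2x2` is an assumption introduced for illustration. Expanding \(\det(H - \lambda I)\) for \(H = X_1 + Z_1\) gives \(\lambda^2 - 2 = 0\), so the check should report \(\pm\sqrt{2}\) with a vanishing determinant residual.

```python
import math


def eigenvalues_symmetric_2x2(a, b, d):
    """Eigenvalues of [[a, b], [b, d]] via the quadratic formula for det(H - lambda*I) = 0."""
    mean = (a + d) / 2
    radius = math.sqrt(((a - d) / 2) ** 2 + b ** 2)
    return mean + radius, mean - radius


# Pauli matrices and their sum H = X + Z.
X = [[0, 1], [1, 0]]
Z = [[1, 0], [0, -1]]
H = [[X[i][j] + Z[i][j] for j in range(2)] for i in range(2)]

lam_plus, lam_minus = eigenvalues_symmetric_2x2(H[0][0], H[0][1], H[1][1])

# Residual of the characteristic equation for each eigenvalue; it should be ~0.
residuals = []
for lam in (lam_plus, lam_minus):
    det = (H[0][0] - lam) * (H[1][1] - lam) - H[0][1] * H[1][0]
    residuals.append(det)
    print(f"lambda = {lam:+.6f}  det(H - lambda*I) = {det:.2e}")
```

A check of this kind is cheap enough to run on every generated solution that ends in a small numerical answer, independently of any LLM-based judging.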

Future Work

Our next steps involve fine-tuning models such as the Qwen-2.5-math family on the QLMMI dataset to evaluate their effectiveness in automating quantum circuit synthesis. We anticipate that QuantumLLMInstruct will contribute to the development of efficient and precise quantum circuits.

BibTeX

@misc{sk2025QuantumLLMInstruct,
    title={QuantumLLMInstruct:  A 500k LLM Instruction-Tuning Dataset with Problem-Solution Pairs for Quantum Computing.},
    author={Shlomo Kashani},
    year={2025},
    eprint={},
    archivePrefix={arXiv},
    primaryClass={}
}