Quantum Hybrid Modules for AI
Attention, Optimization, and Verification on Near-Term Quantum Hardware
Vikram Lex
Paper Summary
We propose Quantum Hybrid Modules (QHM), a framework for inserting quantum subroutines into classical AI pipelines with explicit assumptions about data loading, noise, and measurement overhead. The framework comprises three components: (1) a Quantum Self-Attention Neural Network (QSANN) computing sparse attention via Hadamard-test inner products in O(n√(n/k) · log d/ε) time assuming qRAM; (2) a QAOA-based module for combinatorial optimization sub-problems; and (3) a Grover-accelerated neurosymbolic verification loop for quadratically faster constraint checking. We evaluate QHM through 22 experiments spanning local statevector simulation and NVIDIA A100 GPU scaling up to 30 qubits, Azure and Braket cloud simulators/emulators, and five real-QPU experiments on Rigetti Cepheus and IonQ Forte, including 8–100-qubit superconducting QAOA and 8–30-qubit trapped-ion QAOA. Simulations confirm the intended mechanisms: QAOA exceeds the Goemans–Williamson reference value on selected 3-regular instances, ideal Hadamard-test attention reproduces classical attention, and Grover verification reaches a 20,862× query reduction at 30 qubits. Classical, dequantized, and qRAM-free baselines expose current limits: small quantum kernels and retrieval experiments lag tuned RBF/cosine baselines, hybrid VQC layers add training cost without consistent accuracy gains, and shot-based NISQ attention is limited by systematic inner-product bias. Hardware results are depth- and platform-dependent: shallow kernel and attention circuits retain measurable signal, Rigetti QAOA depolarizes to the random baseline and is not recovered by post-hoc mitigation, and IonQ Forte with default compiler-level mitigation preserves QAOA signal but remains below noiseless performance. We do not claim near-term quantum advantage; we characterize the gap between asymptotic theory and present-day hardware.
Three Quantum Modules
Each module targets a specific computational pattern where quantum subroutines offer asymptotic advantages under stated assumptions.
Quantum Self-Attention Neural Network
Computes attention via Hadamard-test inner products in O(n√(n/k) · log d/ε) time assuming qRAM. Statevector simulation achieves cosine similarity 1.000000 with classical attention at all scales (4–32 tokens).
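For intuition, the sketch below estimates a single attention score Re⟨q|k⟩ from Hadamard-test shot statistics with a plain NumPy simulation. The vectors, shot count, and normalization are illustrative assumptions rather than the paper's QSANN circuit, but the sampling follows the standard Hadamard-test identity P(ancilla = 0) = (1 + Re⟨q|k⟩)/2.

# Hadamard-test estimate of an attention score Re<q|k> -- a minimal
# NumPy sketch under illustrative assumptions (toy vectors, 1,000 shots),
# not the paper's QSANN implementation.
import numpy as np

rng = np.random.default_rng(0)

# Amplitude-encode a toy query and key (dimension is a power of two).
q = np.array([0.9, 0.1, -0.3, 0.2]); q /= np.linalg.norm(q)
k = np.array([0.7, 0.4,  0.1, 0.5]); k /= np.linalg.norm(k)

# Hadamard-test identity: P(ancilla = 0) = (1 + Re<q|k>) / 2.
p0 = 0.5 * (1.0 + float(np.real(np.vdot(q, k))))

# Shot-based estimate: sample the ancilla 1,000 times and invert.
shots = 1000
zeros = rng.binomial(shots, p0)
estimate = 2.0 * zeros / shots - 1.0

print(f"exact Re<q|k> = {np.vdot(q, k):.4f}, shot estimate = {estimate:.4f}")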
Quantum Approximate Optimization Algorithm
Solves combinatorial optimization sub-problems embedded within AI decision-making pipelines. At depth p=3 on 3-regular graphs, approximation ratios reach 0.889–0.966, exceeding the Goemans–Williamson guarantee (0.878).
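As a concrete reference for what the module computes, here is a minimal depth-1 (p=1) statevector QAOA for MaxCut on a toy 5-edge graph. The graph, grid-search optimizer, and depth are placeholder choices for illustration, not the 3-regular instances or p=3–5 settings reported above.

# Minimal statevector QAOA (p=1) for MaxCut -- an illustrative sketch,
# not the paper's module. Graph, depth, and optimizer are placeholders.
import itertools
import numpy as np

edges = [(0, 1), (1, 2), (2, 3), (3, 0), (0, 2)]   # toy 4-node graph
n = 4
dim = 2 ** n

# Cut value of every computational-basis bitstring (diagonal cost).
cuts = np.array([
    sum(((z >> i) & 1) != ((z >> j) & 1) for i, j in edges)
    for z in range(dim)
], dtype=float)
max_cut = cuts.max()

def rx_all(state, beta):
    """Apply the mixer exp(-i*beta*X), i.e. RX(2*beta), to every qubit."""
    c, s = np.cos(beta), -1j * np.sin(beta)
    psi = state.reshape([2] * n)
    for q in range(n):
        psi = np.moveaxis(psi, q, 0)
        psi = np.stack([c * psi[0] + s * psi[1], s * psi[0] + c * psi[1]])
        psi = np.moveaxis(psi, 0, q)
    return psi.reshape(dim)

def qaoa_expectation(gamma, beta):
    psi = np.full(dim, 1 / np.sqrt(dim), dtype=complex)   # |+...+>
    psi = np.exp(-1j * gamma * cuts) * psi                 # cost layer
    psi = rx_all(psi, beta)                                # mixer layer
    return float(np.sum(np.abs(psi) ** 2 * cuts))

# Coarse grid search over (gamma, beta); good enough at p=1.
grid = np.linspace(0, np.pi, 40)
best = max(qaoa_expectation(g, b) for g, b in itertools.product(grid, grid))
print(f"approximation ratio ~ {best / max_cut:.3f}")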
Grover-Accelerated Constraint Checking
Provides quadratically faster neurosymbolic verification for AI safety and alignment. At 22 qubits (4.2M states), the loop reduces violations from 55% to 0.04% in 40 iterations with 665,762× fewer queries than classical search.
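A schematic of the amplification step, under simplified assumptions: a toy predicate stands in for the neurosymbolic constraint oracle, and the iteration count follows the usual ⌊(π/4)√(N/k)⌋ rule.

# Toy Grover amplification over a constraint oracle -- a schematic
# statevector sketch; the "constraint" below is an arbitrary placeholder
# predicate, not the paper's verification oracle.
import numpy as np

n = 10                          # qubits
N = 2 ** n
satisfies = np.array([bin(z).count("1") <= 2 for z in range(N)])  # toy constraint
k = satisfies.sum()

psi = np.full(N, 1 / np.sqrt(N))
iters = int(np.floor(np.pi / 4 * np.sqrt(N / k)))   # ~O(sqrt(N/k)) oracle queries
for _ in range(iters):
    psi = np.where(satisfies, -psi, psi)            # oracle: flip marked phases
    psi = 2 * psi.mean() - psi                      # diffusion about the mean

print(f"{iters} Grover iterations, "
      f"P(satisfying) = {np.sum(psi[satisfies] ** 2):.4f} "
      f"vs. classical baseline {k / N:.4f}")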
System Architecture

QHM architecture: a classical deep learning model (left) delegates computation to a quantum module (right) via amplitude encoding; a hybrid optimizer jointly updates both parameter sets.
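A minimal sketch of the amplitude-encoding step at that classical/quantum boundary, assuming zero-padding to the next power of two and unit normalization; these conventions are chosen here for illustration, not taken from the paper.

# Amplitude encoding at the classical/quantum boundary -- a minimal
# NumPy sketch; padding and normalization conventions are assumptions.
import numpy as np

def amplitude_encode(x):
    """Map a real feature vector to a 2^n-dimensional unit statevector."""
    x = np.asarray(x, dtype=float)
    n = int(np.ceil(np.log2(len(x))))           # qubits needed
    padded = np.zeros(2 ** n)
    padded[: len(x)] = x                        # zero-pad to a power of two
    norm = np.linalg.norm(padded)
    if norm == 0:
        raise ValueError("cannot encode the zero vector")
    return padded / norm, n

state, n_qubits = amplitude_encode([0.3, -1.2, 0.8, 0.5, 2.0])
print(n_qubits, np.round(state, 3), np.linalg.norm(state))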
Experiment Highlights
22 experiments across four tiers of quantum infrastructure, from local simulation to real QPU hardware.
Statevector Simulation
Local CPU simulation, 4–20 qubits
ZZ-feature map SVM: 0.617 accuracy (8q) vs Linear SVM 0.967. Honest negative result — small quantum kernels lack expressiveness.
QAOA on 3-regular graphs: at p=3, ratios 0.889–0.966, exceeding the GW guarantee (0.878); at p=5, 0.908–0.983.
QSANN attention: cosine similarity 1.000000 with classical attention at all scales (4–32 tokens); 1,000 shots suffice for >0.9999 fidelity.
Grover search: O(√N) query scaling confirmed; 652× speedup at 20 qubits; peak success 99.99% at the optimal iteration count.
Cloud Simulators & Emulators
Azure Quantum + Quantinuum H2-1E, 2–8 qubits
IonQ Simulator error <1%; Rigetti QVM 0.4–14%. Clear noise hierarchy across backends.
8q QAOA p=3 error 0.112 (IonQ), 0.010 (Rigetti). 6q kernel error 5×10⁻⁵.
QAOA 2–3% relative error; deeper circuits 4–15%. Clear noise hierarchy validated.
QPU Hardware
Rigetti Cepheus (superconducting) + IonQ Forte (trapped-ion)
Rigetti Cepheus QAOA: 43 QPU tasks. Edge-cut 0.489±0.009, indistinguishable from random; full depolarization at all scales.
QPU Hadamard-test attention: 10,240 shots. Systematic noise bias but reproducible (±0.04); signal retained at 36 gates.
IonQ Forte QAOA: ratios 0.558–0.615, 12–23% above random. Peak 0.783 at p=3 (140 gates); collapses at p=4.
IonQ Forte kernel states: mean overlap 0.222 (3.6× random, 2.7× Cepheus). Within-class P(0)=0.89–0.93 vs cross-class 0.51.
GPU-Accelerated (NVIDIA A100)
Large-scale simulation, 10–30 qubits
Hybrid VQC encoder study: 7 encoders × 3 tasks. Dequantized baseline matches hybrid accuracy at 200× lower cost.
Large-graph QAOA: 24-node graph, ratio 0.916 at p=4, exceeding the GW guarantee by +4.3%.
Grover verification: at 22 qubits, violations drop from 55% to 0.04% in 40 iterations, a 665,762× query reduction; 20,862× at 30 qubits.
Error mitigation: ZNE + readout correction on a Cepheus noise model. No mitigation recovers thermalized circuits; a positive control confirms the method works at moderate noise.
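For readers unfamiliar with ZNE, the sketch below shows the basic recipe in isolation: measure an expectation value at amplified noise scales (e.g., via gate folding) and extrapolate the fit back to zero noise. The scale factors and measured values are made-up placeholders, not data from these experiments.

# Zero-noise extrapolation (ZNE) in its simplest form -- fit expectation
# values measured at amplified noise levels and extrapolate to zero.
# Scale factors and sample values are illustrative placeholders.
import numpy as np

scale_factors = np.array([1.0, 2.0, 3.0])    # noise amplification, e.g. gate folding
measured = np.array([0.61, 0.47, 0.36])      # noisy <C> at each scale (made up)

coeffs = np.polyfit(scale_factors, measured, deg=2)   # Richardson-style fit
zne_estimate = np.polyval(coeffs, 0.0)                # extrapolate to zero noise
print(f"ZNE estimate at scale 0: {zne_estimate:.3f}")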
Selected Figures
Key results from simulation, cloud, and hardware experiments.

Data flow: classical input → amplitude encoding → quantum circuit (kernel / QAOA / Grover) → measurement → classical post-processing.

Hadamard test circuit for QSANN attention. The ancilla qubit mediates the inner product between query and key states.

Quantum-neurosymbolic verification loop: Grover-based oracle checks constraints in O(√(N/k)) queries per iteration.

QAOA p=1 on Rigetti Cepheus (8–100 qubits). All QPU runs cluster at ~0.49 (random), while noiseless simulator achieves 0.75–0.80.

IonQ Forte vs Rigetti Cepheus across graph sizes. IonQ preserves 12–23% above random; Cepheus thermalizes at every size.

GPU-accelerated Grover search scaling (10–30 qubits). Query speedup reaches 20,862× at 30 qubits (1.07B states, 17.2 GB VRAM).

Verification loop at 22 qubits (4.2M states). Violation rate drops from 55% to 0.04% with 665,762× fewer quantum queries than classical search.
Cite This Paper
@article{lex2025qhm,
  title  = {Quantum Hybrid Modules for {AI}: Attention, Optimization, and
            Verification on Near-Term Quantum Hardware},
  author = {Lex, Vikram},
  year   = {2025},
  doi    = {10.5281/zenodo.20103457},
  note   = {Under review}
}