FIRAL is a recently proposed deterministic active learning algorithm for multiclass classification using logistic regression. It was shown to outperform the state-of-the-art in terms of accuracy and robustness and comes with theoretical performance guarantees. However, its scalability suffers when dealing with datasets featuring a large number of points $n$, dimensions $d$, and classes $c$, due to its $\mathcal{O}(c^2d^2+nc^2d)$ storage and $\mathcal{O}(c^3(nd^2 + bd^3 + bn))$ computational complexity where $b$ is the number of points to select in active learning. To address these challenges, we propose an approximate algorithm with storage requirements reduced to $\mathcal{O}(n(d+c) + cd^2)$ and a computational complexity of $\mathcal{O}(bncd^2)$. Additionally, we present a parallel implementation on GPUs. We demonstrate the accuracy and scalability of our approach using MNIST, CIFAR-10, Caltech101, and ImageNet. The accuracy tests reveal no deterioration in accuracy compared to FIRAL. We report strong and weak scaling tests on up to 12 GPUs, for three million point synthetic dataset.

该研究解决了现有主动学习算法在面对大规模数据集时的可扩展性问题。通过提出一种近似算法，显著降低了存储和计算复杂度，并实现了在GPU上的并行实现。研究结果表明，该方法在多个数据集上的准确性与现有最先进算法相当，同时具备更好的扩展性。

一种可扩展的主动学习算法