Meta Pseudo Labels

CVPR 2021

Presented by Hao-Ting Li (李皓庭)

2021-07-20

Outline

  • Introduction
  • Method: Meta Pseudo Labels
  • Experiments
  • Related Works
  • Conclusion

Introduction

  • Task: image classification
  • Pseudo Labels
    • Drawback of Pseudo Labels
  • Meta Pseudo Labels

Pseudo Labels

Pseudo Labels methods work by having a pair of networks, one as a teacher and one as a student.

The teacher generates pseudo labels on unlabeled images.

These pseudo labeled images are then combined with labeled images to train the student.

Thanks to the abundance of pseudo labeled data and the use of regularization methods such as data augmentation, the student learns to become better than the teacher [77].

Drawback of Pseudo Labels

If the pseudo labels are inaccurate, the student will learn from inaccurate data.

As a result, the student may not get significantly better than the teacher.

This drawback is also known as the problem of confirmation bias in pseudo-labeling [2].

Confirmation Bias

Network predictions are, of course, sometimes incorrect. This situation is reinforced when incorrect predictions are used as labels for unlabeled samples, as is the case in pseudo-labeling.

Overfitting to incorrect pseudo-labels predicted by the network is known as confirmation bias. It is natural to think that reducing the confidence of the network on its predictions might alleviate this problem and improve generalization.

Eric Arazo, et al. Pseudo-labeling and confirmation bias in deep semi-supervised learning. In International Joint Conference on Neural Networks (IJCNN), 2020, pp. 1-8.

Meta Pseudo Labels

In this paper, we design a systematic mechanism for the teacher to correct the bias by observing how its pseudo labels would affect the student.

We propose Meta Pseudo Labels, which utilizes the feedback from the student to inform the teacher to generate better pseudo labels.

Outline

  • Introduction
  • Method: Meta Pseudo Labels
  • Experiments
  • Related Works
  • Conclusion

Method: Meta Pseudo Labels

  • Overview
  • Notations
  • Pseudo Labels
  • Meta Pseudo Labels
    • Practical approximation
    • Teacher's auxiliary loss

Overview

Notations

  • $T, S$: the teacher network and the student network respectively
  • $\theta_T, \theta_S$: their corresponding parameters
  • $(x_l, y_l)$: a batch of images and their corresponding labels
  • $x_u$: a batch of unlabeled images
  • $T(x_u; \theta_T)$: the soft predictions of the teacher network on the batch $x_u$ of unlabeled images
    • hard: [0, 1, 0, 0]
    • soft: [0.1, 0.8, 0.05, 0.05]
  • $S(x_l; \theta_S), S(x_u; \theta_S)$: (likewise for the student)
  • $\text{CE}(q, p)$: the cross-entropy loss between two distributions $q$ and $p$
    • if $q$ is a label then it is understood as a one-hot distribution
    • if $q$ and $p$ have multiple instances in them,
      then $\text{CE}(q, p)$ is understood as the average of all instances in the batch.
      • e.g. $\text{CE}(y_l, S(x_l; \theta_S))$ is the canonical cross-entropy loss in supervised learning.

Pseudo Labels

Pseudo Labels (PL) trains the student model to minimize the cross-entropy loss on unlabeled data:

$$
\theta_S^{\mathrm{PL}} = \underset{\theta_S}{\operatorname{argmin}} \; \underbrace{\mathbb{E}_{x_u}\left[\operatorname{CE}\left(T(x_u; \theta_T), S(x_u; \theta_S)\right)\right]}_{:=\mathcal{L}_u(\theta_T, \theta_S)}
$$

  • the pseudo target $T(x_u; \theta_T)$ is produced by a well pre-trained teacher model with fixed parameters $\theta_T$ (a minimal code sketch of this loss follows)
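A minimal sketch of this student loss (PyTorch; `teacher` and `student` are any classifiers returning logits, an assumption for illustration rather than the paper's implementation):

```python
import torch
import torch.nn.functional as F

def pseudo_label_loss(teacher, student, x_u):
    # L_u(theta_T, theta_S): cross-entropy between the teacher's soft predictions
    # and the student's predictions on an unlabeled batch
    with torch.no_grad():                               # the teacher is fixed in vanilla PL
        soft_targets = F.softmax(teacher(x_u), dim=-1)  # T(x_u; theta_T)
    log_probs = F.log_softmax(student(x_u), dim=-1)     # log S(x_u; theta_S)
    return -(soft_targets * log_probs).sum(dim=-1).mean()  # averaged over the batch
```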

Meta Pseudo Labels (1)

Given a good teacher, the hope of Pseudo Labels is that the obtained $\theta_S^{\mathrm{PL}}$ would ultimately achieve a low loss on labeled data, i.e.

$$
\mathbb{E}_{x_l, y_l}\left[\operatorname{CE}\left(y_l, S(x_l; \theta_S^{\mathrm{PL}})\right)\right] := \mathcal{L}_l\left(\theta_S^{\mathrm{PL}}\right)
$$

The ultimate student loss on labeled data $\mathcal{L}_l\left(\theta_S^{\mathrm{PL}}(\theta_T)\right)$ is also a "function" of $\theta_T$.

Therefore, we could further optimize $\mathcal{L}_l$ with respect to $\theta_T$:

$$
\begin{aligned}
\min_{\theta_T} \quad & \mathcal{L}_l\left(\theta_S^{\mathrm{PL}}(\theta_T)\right) \\
\text{where} \quad & \theta_S^{\mathrm{PL}}(\theta_T) = \underset{\theta_S}{\operatorname{argmin}} \; \mathcal{L}_u(\theta_T, \theta_S).
\end{aligned}
$$

Meta Pseudo Labels (2)

Intuitively, by optimizing the teacher's parameters according to the performance of the student on labeled data, the pseudo labels can be adjusted accordingly to further improve the student's performance.

However, the dependency of $\theta_S^{\mathrm{PL}}$ on $\theta_T$ is extremely complicated, as computing the gradient $\nabla_{\theta_T} \theta_S^{\mathrm{PL}}(\theta_T)$ requires unrolling the entire student training.

Practical Approximation (1)

To make Meta Pseudo Labels feasible, we borrow ideas from previous work in meta learning [40, 15] and approximate the multi-step $\operatorname{argmin}_{\theta_S}$ with the one-step gradient update of $\theta_S$:

$$
\theta_S^{\mathrm{PL}}(\theta_T) \approx \theta_S - \eta_S \cdot \nabla_{\theta_S} \mathcal{L}_u(\theta_T, \theta_S)
$$

  • $\eta_S$: the learning rate

The practical teacher objective in Meta Pseudo Labels:

$$
\min_{\theta_T} \quad \mathcal{L}_l\left(\theta_S - \eta_S \cdot \nabla_{\theta_S} \mathcal{L}_u(\theta_T, \theta_S)\right)
$$
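The slide does not spell this out, but expanding the objective with the chain rule (a standard step, shown here only for orientation) makes clear what back-propagation has to compute: a mixed second-order term of $\mathcal{L}_u$ contracted with the student's labeled-loss gradient at the updated parameters $\theta_S'$:

$$
\nabla_{\theta_T}\, \mathcal{L}_l\big(\theta_S'\big)
= -\eta_S \left[\nabla_{\theta_T} \nabla_{\theta_S} \mathcal{L}_u(\theta_T, \theta_S)\right]^{\top} \nabla_{\theta_S'}\, \mathcal{L}_l\big(\theta_S'\big),
\qquad \theta_S' = \theta_S - \eta_S \nabla_{\theta_S} \mathcal{L}_u(\theta_T, \theta_S).
$$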

Practical Approximation (2)

Note that, if soft pseudo labels are used, i.e. $T(x_u; \theta_T)$ is the full distribution predicted by the teacher, the objective above is fully differentiable with respect to $\theta_T$ and we can perform standard back-propagation to get the gradient.

However, in this work, we sample the hard pseudo labels from the teacher distribution to train the student. We use hard pseudo labels because they result in smaller computational graphs which are necessary for our large-scale experiments in Section 4.

Practical Approximation (3)

For smaller experiments where we can use either soft pseudo labels or hard pseudo labels, we do not find significant performance difference between them.

A caveat of using hard pseudo labels is that we need to rely on a slightly modified version of REINFORCE to obtain the approximated gradient of $\mathcal{L}_l$ in Equation 3 with respect to $\theta_T$.

  • We defer the detailed derivation to Appendix A.

Practical Approximation (4)

More interestingly, the student's parameter update can be reused in the one-step approximation of the teacher's objective, which naturally gives rise to an alternating optimization procedure between the student update and the teacher update (a code sketch of one such alternating step follows the two updates below):

  • Student: draw a batch of unlabeled data $x_u$, then sample $T(x_u; \theta_T)$ from the teacher's prediction, and optimize objective 1 with SGD:

    $$\theta_S' = \theta_S - \eta_S \nabla_{\theta_S} \mathcal{L}_u(\theta_T, \theta_S)$$

  • Teacher: draw a batch of labeled data $(x_l, y_l)$, and “reuse” the student's update to optimize objective 3 with SGD:

    $$\theta_T' = \theta_T - \eta_T \nabla_{\theta_T} \mathcal{L}_l\Big(\underbrace{\theta_S - \eta_S \nabla_{\theta_S} \mathcal{L}_u\left(\theta_T, \theta_S\right)}_{=\,\theta_S' \text{ reused from the student's update}}\Big).$$
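A hedged sketch (PyTorch ≥ 2.0) of one alternating step. It uses *soft* pseudo labels so the teacher objective is differentiable end to end via `torch.func`; the paper's large-scale runs instead sample hard labels and use a REINFORCE-style gradient. The function names, learning rates, and the assumption of buffer-free models are illustrative, not the authors' code.

```python
import torch
import torch.nn.functional as F
from torch.func import functional_call, grad

def soft_ce(logits, soft_targets):
    # CE(q, p) between a soft target distribution and predicted logits, batch-averaged
    return -(soft_targets * F.log_softmax(logits, dim=-1)).sum(dim=-1).mean()

def mpl_step(teacher, student, x_u, x_l, y_l, eta_s=0.1, eta_t=0.1):
    t_params = dict(teacher.named_parameters())
    s_params = dict(student.named_parameters())

    def teacher_objective(t_params):
        # pseudo labels T(x_u; theta_T), kept differentiable w.r.t. theta_T
        pseudo = F.softmax(functional_call(teacher, t_params, (x_u,)), dim=-1)

        # student's one-step SGD update on L_u(theta_T, theta_S)   (objective 1)
        def l_u(s_params):
            return soft_ce(functional_call(student, s_params, (x_u,)), pseudo)
        g = grad(l_u)(s_params)
        s_new = {k: p - eta_s * g[k] for k, p in s_params.items()}

        # student's labeled loss after the update = teacher's objective (objective 3)
        logits_l = functional_call(student, s_new, (x_l,))
        return F.cross_entropy(logits_l, y_l), s_new

    t_grads, s_new = grad(teacher_objective, has_aux=True)(t_params)

    with torch.no_grad():
        for k, p in student.named_parameters():   # reuse theta_S' from the student update
            p.copy_(s_new[k])
        for k, p in teacher.named_parameters():   # SGD step on the teacher
            p.sub_(eta_t * t_grads[k])
```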

Teacher's Auxiliary Loss

Meta Pseudo Labels works even better if the teacher is jointly trained with other auxiliary objectives. Therefore, in our implementation, we augment the teacher’s training with a supervised learning objective and a semi-supervised learning objective.

  1. For the supervised objective, we train the teacher on labeled data.
  2. For the semi-supervised objective, we additionally train the teacher on unlabeled data using the UDA objective [76].
  3. Since the student in Meta Pseudo Labels only learns from unlabeled data with pseudo labels generated by the teacher, we can take a student model that has converged after training with Meta Pseudo Labels,
  4. and finetune it on labeled data to improve its accuracy.
76: Qizhe Xie, Zihang Dai, Eduard Hovy, Minh-Thang Luong, and Quoc V. Le. Unsupervised data augmentation for consistency training. In Advances in Neural Information Processing Systems, 2020.
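For orientation, a hedged sketch of how items 1-3 might be combined into a single teacher loss. The weak/strong augmentation pair, the confidence threshold, and the weighting coefficient are illustrative assumptions, and `mpl_feedback` stands in for the feedback term derived from the student's labeled loss; none of these are values from the paper.

```python
import torch
import torch.nn.functional as F

def teacher_total_loss(teacher, x_l, y_l, x_u_weak, x_u_strong,
                       mpl_feedback, lam_uda=1.0, thresh=0.8):
    # 1. supervised objective on labeled data
    sup = F.cross_entropy(teacher(x_l), y_l)

    # 2. UDA-style consistency objective [76]: predictions on strongly augmented
    #    unlabeled images should match confident predictions on weakly augmented ones
    with torch.no_grad():
        p_weak = F.softmax(teacher(x_u_weak), dim=-1)
        mask = (p_weak.max(dim=-1).values >= thresh).float()
    log_p_strong = F.log_softmax(teacher(x_u_strong), dim=-1)
    uda = (-(p_weak * log_p_strong).sum(dim=-1) * mask).mean()

    # 3. the Meta Pseudo Labels feedback term (see the alternating update above)
    return sup + lam_uda * uda + mpl_feedback
```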

Outline

  • Introduction
  • Method: Meta Pseudo Labels
  • Experiments
  • Related Works
  • Conclusion

Experiments

  • Small Scale Experiments
  • CIFAR-10-4K, SVHN-1K, and ImageNet-10% Experiments
  • ResNet-50 Experiment
  • Large Scale Experiment

Small Scale Experiments

  • TwoMoon Dataset [7]
  • Training details
  • Results

TwoMoon Dataset

For this experiment, we generate our own version of the TwoMoon dataset.

  • 2,000 examples forming 2 clusters of 1,000 examples each.
    • 6 examples are labeled, 3 for each cluster.
    • the remaining examples are unlabeled.

Training details

  • Model (both the teacher and the student nets)
    • a feed-forward fully-connected neural network
      • with 2 hidden layers
      • each has 8 units
      • sigmoid non-linearity at each layer
      • initialized weights with U(0.1,0.1)U(-0.1, 0.1)
  • Optimization
    • SGD
      • learning rate: 0.1
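A minimal sketch of this setup (PyTorch; the layer and optimizer names are illustrative, since the slide does not specify the original implementation):

```python
import torch
import torch.nn as nn

def make_twomoon_net(in_dim=2, hidden=8, num_classes=2):
    # feed-forward net: 2 hidden layers of 8 units with sigmoid non-linearities
    net = nn.Sequential(
        nn.Linear(in_dim, hidden), nn.Sigmoid(),
        nn.Linear(hidden, hidden), nn.Sigmoid(),
        nn.Linear(hidden, num_classes),
    )
    for p in net.parameters():                 # weights initialized from U(-0.1, 0.1)
        nn.init.uniform_(p, -0.1, 0.1)
    return net

teacher, student = make_twomoon_net(), make_twomoon_net()
opt_t = torch.optim.SGD(teacher.parameters(), lr=0.1)
opt_s = torch.optim.SGD(student.parameters(), lr=0.1)
```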

Results (1)

Results (2)

  • Supervised Learning
    • finds a bad classifier which classifies the labeled instances correctly but fails to take advantage of the clustering assumption to separate the two "moons".
  • Pseudo Labels
    • uses the bad classifier from Supervised Learning and hence receives incorrect pseudo labels on the unlabeled data. Pseudo Labels finds a classifier that misclassifies half of the data, including a few labeled instances.
  • Meta Pseudo Labels
    • uses the feedback from the student model’s loss on the labeled instances to adjust the teacher to generate better pseudo labels.
    • Meta Pseudo Labels finds a good classifier for this dataset.
    • In other words, Meta Pseudo Labels can address the problem of confirmation bias [2] of Pseudo Labels in this experiment.

Experiments

  • Small Scale Experiments
  • CIFAR-10-4K, SVHN-1K, and ImageNet-10% Experiments
  • ResNet-50 Experiment
  • Large Scale Experiment

CIFAR-10-4K, SVHN-1K, and ImageNet-10% Experiments

  • Dataset
  • Training details
    • Baselines
    • Additional baselines
  • Results

Dataset

| Dataset      | Image Resolution | #-Labeled Examples | #-Unlabeled Examples | #-Test Set |
|--------------|------------------|--------------------|----------------------|------------|
| CIFAR-10-4K  | 32x32            | 4,000              | 41,000               | 10,000     |
| SVHN-1K      | 32x32            | 1,000              | 603,000              | 26,032     |
| ImageNet-10% | 224x224          | 128,000            | 1,280,000            | 50,000     |

Training details (1)

  • Model (both TT and SS)
    • CIFAR-10-4K, SVHN-1K
      • WideResNet-28-2 [84] (1.45 M parameters)
    • ImageNet
      • ResNet-50 [24] (25.5 M parameters)
  • Hyperparameters
    • use the default hyper-parameters from previous work, except for a few modifications in RandAugment [13].

After training both the teacher and student with Meta Pseudo Labels, we finetune the student on the labeled dataset.

Training details (2)

Finetuning phase:

  • SGD
  • Learning rate: 10510^{-5}
  • Batch size: 512512
  • Steps:
    • 2,0002,000 for ImageNet-10%
    • 1,0001,000 for CIFAR-10 and SVHN

Since the amount of labeled examples is limited for all three datasets, we do not use any heldout validation set. Instead, we return the model at the final checkpoint.

Baselines

To ensure a fair comparison, we only compare Meta Pseudo Labels against methods that use the same architectures and do not compare against methods that use larger architectures.

We also do not compare Meta Pseudo Labels with training procedures that include self-distillation or distillation from a larger teacher [8, 9].

  • We enforce these restrictions on our baselines since it is known that larger architectures and distillation can improve any method, possibly including Meta Pseudo Labels.

Additional baselines

  • Label Propagation
  • Self-Supervised

Since these methods do not share the same controlled environment, the comparison to them is not direct, and should be contextualized as suggested by [48].

Avital Oliver, Augustus Odena, Colin Raffel, Ekin D. Cubuk, and Ian J. Goodfellow. Realistic evaluation of deep semi-supervised learning algorithms. In Advances in Neural Information Processing Systems, 2018.

Results

Experiments

  • Small Scale Experiments
  • CIFAR-10-4K, SVHN-1K, and ImageNet-10% Experiments
  • ResNet-50 Experiment
  • Large Scale Experiment

ResNet-50 Experiment

The purpose of this experiment is to verify if Meta Pseudo Labels works well on the widely used ResNet-50 architecture [24] before we conduct more large scale experiments on EfficientNet (Section 4).

  • Dataset
  • Implementation details
  • Baselines
  • Results

Dataset (1)

  • Training set
    • ImageNet: 25,000 labeled examples are held out for hyper-parameter tuning and model selection
    • JFT dataset: 12,800,000 unlabeled images
  • Test set
    • ILSVRC 2012 validation set

Dataset (2)

JFT dataset

  1. train a ResNet-50 on the entire ImageNet training set
  2. use the resulting ResNet-50 to assign class probabilities to images in the JFT dataset
  3. select 12,800 images of highest probability for each of the 1,000 classes of ImageNet.
  4. This selection results in 12.8 million images.

We also make sure that none of the 12.8 million images that we use overlaps with the ILSVRC 2012 validation set of ImageNet. This procedure of filtering extra unlabeled data has been used by UDA [76] and Noisy Student [77].
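A rough sketch of the per-class selection in step 3. The array `probs`, the function name, and the use of NumPy are assumptions for illustration; the actual pipeline is not described beyond the steps above.

```python
import numpy as np

def select_per_class(probs: np.ndarray, per_class: int = 12_800) -> np.ndarray:
    """probs: (num_jft_images, 1000) class probabilities from the ImageNet-trained ResNet-50."""
    selected = []
    for c in range(probs.shape[1]):
        # indices of the per_class images with the highest probability for class c
        top = np.argpartition(-probs[:, c], per_class)[:per_class]
        selected.append(top)
    return np.concatenate(selected)            # 1,000 x 12,800 = 12.8M image indices
```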

Implementation details

Training on ImageNet + JFT

  • Batch size:
    • for labeled images: 4,0964,096
    • for unlabeled images: 32,76832,768
  • #-steps: 500,000500,000 (160160 epochs) for unlabeled dataset

Finetuning the student

  • SGD
  • #-steps: 10,00010,000 steps
  • Learning rate: 10410^{-4}

Baselines

Supervised learning using ResNet-50

  • AutoAugment [12]
  • DropBlock [18]
  • CutMix [83]

Semi-supervised learning

  • Billion-scale semi-supervised learning [79] uses unlabeled data from the YFCC100M dataset [65]
  • UDA [76] and Noisy Student [77] both use JFT as unlabeled data like Meta Pseudo Labels

Results (1)

Results (2)

This is particularly impressive since Billion-scale SSL pre-trains its ResNet-50 on weakly-supervised images from Instagram.

Experiments

  • Small Scale Experiments
  • CIFAR-10-4K, SVHN-1K, and ImageNet-10% Experiments
  • ResNet-50 Experiment
  • Large Scale Experiment

Large Scale Experiment

  • Dataset
  • Model architecture
  • Model parallelism
  • Results
  • A Lite Version of Meta Pseudo Labels

Dataset

  • Labeled data: ImageNet
  • Unlabeled data: JFT dataset
    • 300M images, filtered down to 130M images using confidence thresholds and up-sampling, the same as in Noisy Student [77].

Model Architecture

  • EfficientNet-L2
    • Image resolution: 512x512 (instead of 475x475)
      • We increase the input image resolution to be compatible with our model parallelism implementation
  • EfficientNet-B6-Wide (EfficientNet-B6 with the width factor increased from 2.1 to 5.0)
    • hyper-parameters are the same as for EfficientNet-L2

Model Parallelism (1)

Due to the memory footprint of our networks, keeping two such networks in memory for the teacher and the student would vastly exceed the available memory of our accelerators.

We thus design a hybrid model-data parallelism framework to run Meta Pseudo Labels.

Model Parallelism (2)

Specifically, our training process runs on a cluster of 2,048 TPUv3 cores.

We divide these cores into 128 identical replicas to run with standard data parallelism with synchronized gradients.

Within each replica, which runs on 2,048/128=16 cores, we implement two types of model parallelism.

Model Parallelism (3)

Within each replica, which runs on 2,048/128=16 cores, we implement two types of model parallelism.

  • First, each input image of resolution 512x512 is split along the width dimension into 16 patches of equal size 512x32 and is distributed to 16 cores to process.
    • Note that we choose the input resolution of 512x512 because 512 is close to the resolution 475x475 used by Noisy Student and 512 keeps the dimensions of the network’s intermediate outputs divisible by 16.
  • Second, each weight tensor is also split equally into 16 parts that are assigned to the 16 cores.
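A toy illustration of the first split (shapes only; the real sharding is done by the XLA/TPU runtime, not by user-level tensor ops like this):

```python
import torch

images = torch.randn(8, 3, 512, 512)        # (batch, channels, height, width)
patches = torch.chunk(images, 16, dim=-1)    # 16 slices of shape (8, 3, 512, 32), one per core
assert all(p.shape == (8, 3, 512, 32) for p in patches)
```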

Model Parallelism (4)

We implement our hybrid data-model parallelism in the XLA-Sharding framework [37].

With this parallelism, we can fit a batch size of 2,048 labeled images and 16,384 unlabeled images into each training step.

We train the model for 1 million steps in total, which takes about 11 days for EfficientNet-L2 and 10 days for EfficientNet-B6-Wide. After finishing the Meta Pseudo Labels training phase, we finetune the models on our labeled dataset for 20,000 steps.

Results (1)

Results (2)

  • Both BiT-L and Vision Transformer pre-train on 300M labeled images from JFT, while our method only uses unlabeled images from this dataset.

A Lite Version of Meta Pseudo Labels

Given the expensive training cost of Meta Pseudo Labels, we design a lite version of Meta Pseudo Labels, termed Reduced Meta Pseudo Labels. (Appendix E)

To avoid using proprietary data like JFT, we use the ImageNet training set as labeled data and the YFCC100M dataset [65] as unlabeled data.

Reduced Meta Pseudo Labels allows us to implement the feedback mechanism of Meta Pseudo Labels while avoiding the need to keep two networks in memory.

We achieve 86.9% top-1 accuracy on the ImageNet ILSVRC 2012 validation set with EfficientNet-B7.

Outline

  • Introduction
  • Method: Meta Pseudo Labels
  • Experiments
  • Related Works
    • Pseudo Labels
    • Semi-supervised learning (SSL)
    • Knowledge Distillation and Label Smoothing
    • Bi-level optimization algorithms
  • Conclusion

Pseudo Labels

Pseudo Labels has been applied to improve many tasks:

  • image classification [79, 77]
  • object detection
  • semantic segmentation [89]
  • machine translation [22]
  • speech recognition [31, 49].

Vanilla Pseudo Labels methods keep a pre-trained teacher fixed during the student's learning, leading to confirmation bias [2] when the pseudo labels are inaccurate.

SSL (1)

Other typical SSL methods often train a single model by optimizing an objective function that combines a supervised loss on labeled data and an unsupervised loss on unlabeled data.

  • The supervised loss is often the cross-entropy computed on the labeled data.
  • The unsupervised loss is typically either a self-supervised loss or a label propagation loss.

SSL (2) - Self-supervised Losses

Self-supervised losses typically encourage the model to develop a common sense about images, such as

  • in-painting [50]
  • solving jigsaw puzzles [47]
  • predicting the rotation angle [19]
  • contrastive prediction [25, 10, 8, 9, 38]
  • bootstrapping the latent space [21]

SSL (3) - Label Propagation Losses

Label propagation losses typically enforce that the model is invariant against certain transformations of the data such as

  • data augmentations
  • adversarial attacks
  • proximity in the latent space [35, 64, 44, 5, 76, 30, 71, 58, 32, 51, 20].

SSL (4)

Meta Pseudo Labels is distinct from the aforementioned SSL methods in two notable ways.

  • The student in Meta Pseudo Labels never learns directly from labeled data, which helps to avoid overfitting, especially when labeled data is limited.
  • The signal that the teacher in Meta Pseudo Labels receives from the student’s performance on labeled data is a novel way of utilizing labeled data.

Knowledge Distillation and Label Smoothing (1)

The teacher in Meta Pseudo Labels uses its softmax predictions on unlabeled data to teach the student.

These softmax predictions are generally called the soft labels, which have been widely utilized in the literature on knowledge distillation [26, 17, 86].

Knowledge Distillation and Label Smoothing (2)

Outside the line of work on distillation, manually designed soft labels have also been shown to improve models' generalization:

  • label smoothing [45]
  • temperature sharpening or dampening [76, 77]

Both of these methods can be seen as adjusting the labels of the training examples to improve optimization and generalization.

However, similar to other SSL methods, these adjustments do not receive any feedback from the student's performance, which is what this paper proposes.
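As a concrete reference, hedged sketches of the two adjustments (the smoothing coefficient and the temperature are illustrative defaults, not values from the cited papers):

```python
import torch
import torch.nn.functional as F

def smooth_labels(y, num_classes, eps=0.1):
    # label smoothing: mix the one-hot target with the uniform distribution
    one_hot = F.one_hot(y, num_classes).float()
    return (1.0 - eps) * one_hot + eps / num_classes

def adjust_temperature(probs, temperature=0.5):
    # temperature < 1 sharpens the distribution, > 1 dampens it
    p = probs ** (1.0 / temperature)
    return p / p.sum(dim=-1, keepdim=True)
```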

Bi-level Optimization Algorithms (1)

We use "Meta" in our method name because our technique of deriving the teacher's update rule from the student's feedback is based on a bi-level optimization problem which appears frequently in the literature of meta-learning.

Similar bi-level optimization problems have been proposed to optimize a model’s learning process, such as:

  • learning the learning rate schedule [3]
  • designing architectures [40]
  • correcting wrong training labels [88]
  • generating training examples [59]
  • re-weighting training data [73, 74, 54, 53]

Bi-level Optimization Algorithms (2)

Meta Pseudo Labels uses the same bi-level optimization technique in this line of work to derive the teacher’s gradient from the student’s feedback.

The difference between Meta Pseudo Labels and these methods is that Meta Pseudo Labels applies the bi-level optimization technique to improve the pseudo labels generated by the teacher model.

Outline

  • Introduction
  • Method: Meta Pseudo Labels
  • Experiments
  • Related Works
  • Conclusion

Conclusion (1)

In this paper, we proposed the Meta Pseudo Labels method for semi-supervised learning.

Key to Meta Pseudo Labels is the idea that the teacher learns from the student's feedback to generate pseudo labels in a way that best helps the student's learning.

The learning process in Meta Pseudo Labels consists of two main updates:

  • updating the student based on the pseudo labeled data produced by the teacher
  • updating the teacher based on the student’s performance

Conclusion (2)

Experiments on standard low-resource benchmarks such as CIFAR-10-4K, SVHN-1K, and ImageNet-10% show that Meta Pseudo Labels is better than many existing semi-supervised learning methods.

Meta Pseudo Labels also scales well to large problems, attaining 90.2% top-1 accuracy on ImageNet, which is 1.6% better than the previous state-of-the-art [16].

The consistent gains confirm the benefit of the student’s feedback to the teacher.