Face recognition datasets contain an enormous number of identities and
instances. However, conventional methods struggle to reflect the entire
distribution of a dataset because a small mini-batch contains
only a small portion of all identities. To overcome t