Encrypted traffic classification requires discriminative and robust traffic representations captured from content-invisible and imbalanced traffic data. Accurate classification is challenging but indispensable for network security and network management. The major limitation of existing solutions is that they rely heavily on deep features, which depend too much on data size and generalize poorly to unseen data. How to leverage open-domain unlabeled traffic data to learn representations with strong generalization ability remains a key challenge. In this paper, we propose a new traffic representation model called Encrypted Traffic Bidirectional Encoder Representations from Transformer (ET-BERT), which pre-trains deep contextualized datagram-level representations from large-scale unlabeled data. The pre-trained model can be fine-tuned on a small amount of task-specific labeled data and achieves state-of-the-art performance across five encrypted traffic classification tasks. Among them, it pushes the F1 score on ISCX-Tor to 99.2% (4.4% absolute improvement), ISCX-VPN-Service to 98.9% (5.2% absolute improvement), Cross-Platform (Android) to 92.5% (5.4% absolute improvement), and CSTNET-TLS 1.3 to 97.4% (10.0% absolute improvement). Notably, we explain the empirically strong performance of the pre-trained model by analyzing the randomness of ciphers. This analysis gives insight into the boundary of classification ability over encrypted traffic. The code is available at: https://github.com/linwhitehat/ET-BERT.
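The improvements above are absolute, i.e. measured in percentage points of F1. The small sketch below uses only the numbers reported in the abstract. It recovers the previous best F1 that each gain implies and the share of the remaining F1 gap (100 minus the baseline F1) that the gain closes. The helper names `implied_baseline` and `gap_closed` are hypothetical and purely illustrative.

```python
# Reported ET-BERT F1 scores (%) and absolute improvements (percentage points),
# copied from the abstract.
REPORTED = {
    "ISCX-Tor": (99.2, 4.4),
    "ISCX-VPN-Service": (98.9, 5.2),
    "Cross-Platform (Android)": (92.5, 5.4),
    "CSTNET-TLS 1.3": (97.4, 10.0),
}


def implied_baseline(f1: float, gain: float) -> float:
    """Previous best F1 implied by an absolute gain: baseline = new F1 - gain."""
    return round(f1 - gain, 1)


def gap_closed(f1: float, gain: float) -> float:
    """Fraction of the baseline's remaining F1 gap (100 - baseline) removed by the gain."""
    baseline = f1 - gain
    return round(gain / (100.0 - baseline), 3)


if __name__ == "__main__":
    for task, (f1, gain) in REPORTED.items():
        print(f"{task}: implied baseline {implied_baseline(f1, gain)}%, "
              f"F1 gap closed {gap_closed(f1, gain):.1%}")
```

For example, on ISCX-Tor a 4.4-point gain to 99.2% implies a previous best of 94.8%. That gain closes roughly 85% of the remaining gap to a perfect score, which is why the smaller absolute numbers on already-high baselines are still substantial.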

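This paper proposes a novel traffic representation model called ET-BERT. The model pre-trains deep contextualized datagram-level representations on large-scale unlabeled data and is then fine-tuned on a small amount of task-specific labeled data, achieving state-of-the-art results on five encrypted traffic classification tasks. A highlight is the F1 score of 99.2% on ISCX-Tor. The authors also explain why the pre-trained model performs so well by analyzing the randomness of ciphers, and use this analysis to examine the boundary of classification ability over encrypted traffic, offering new directions for future research and applications.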
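The summary does not say how a datagram becomes model input or which objective is used during pre-training on unlabeled data. The sketch below illustrates the general pre-train-on-unlabeled-bytes idea under two stated assumptions, neither taken from this summary. First, it assumes a byte-bigram tokenization, where each token is two adjacent bytes written as four hex characters, produced with a stride of one byte. Second, it assumes the standard BERT masked-token recipe: 15% of positions are selected, and of those, 80% become `[MASK]`, 10% become a random token, and 10% stay unchanged. The function names are hypothetical. Check the repository linked above for what ET-BERT actually does.

```python
import random
from typing import List, Optional, Sequence, Tuple

MASK = "[MASK]"


def datagram_to_bigrams(payload: bytes) -> List[str]:
    """Encode a datagram as overlapping 2-byte hex tokens (stride of 1 byte).

    Assumed tokenization for illustration only: n bytes yield n - 1 tokens.
    """
    hex_str = payload.hex()
    # Each byte is 2 hex chars; a bigram token spans 2 bytes = 4 hex chars.
    return [hex_str[i:i + 4] for i in range(0, len(hex_str) - 2, 2)]


def mask_tokens(
    tokens: Sequence[str],
    vocab: Sequence[str],
    mask_prob: float = 0.15,
    seed: Optional[int] = None,
) -> Tuple[List[str], List[Optional[str]]]:
    """Apply the standard BERT-style masking recipe (assumed, not confirmed for ET-BERT).

    Returns (inputs, labels): labels hold the original token at selected
    positions (the prediction targets) and None elsewhere.
    """
    rng = random.Random(seed)
    inputs = list(tokens)
    labels: List[Optional[str]] = [None] * len(tokens)
    for i, tok in enumerate(tokens):
        if rng.random() >= mask_prob:
            continue  # position not selected for prediction
        labels[i] = tok  # the model must recover the original token here
        r = rng.random()
        if r < 0.8:
            inputs[i] = MASK  # 80%: replace with the mask token
        elif r < 0.9:
            inputs[i] = rng.choice(vocab)  # 10%: replace with a random token
        # remaining 10%: keep the original token unchanged
    return inputs, labels


if __name__ == "__main__":
    tokens = datagram_to_bigrams(bytes.fromhex("160303004a0200"))
    vocab = [f"{i:04x}" for i in range(1 << 16)]  # all 65,536 possible bigrams
    inputs, labels = mask_tokens(tokens, vocab, seed=0)
    print(tokens)
    print(inputs)
    print(labels)
```

Because labels come from the traffic bytes themselves, this kind of objective needs no manual annotation. That is what lets pre-training use large volumes of unlabeled traffic before fine-tuning on a small labeled set.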

ET-BERT: A Contextualized Datagram Representation with Pre-training Transformers for Encrypted Traffic Classification