Recently, researchers have achieved significant results in skeleton-based action recognition. To better model skeleton sequences, we drive the encoder to learn more discriminative representations in a self-supervised setting. We find that, instead of clustering feature vectors to assign pseudo labels to samples as in DeepCluster, ranking them is a more reasonable, reliable, and efficient way to learn effective feature representations. Based on this intuition, we propose a novel self-supervised learning framework, DeepRank. Specifically, we rank triplets of skeleton sequences using ranking labels obtained from the relative distances among the sequences in each triplet. In addition, to mine the complementary discriminative information that exists in different modalities of skeleton sequences, we further propose Multi-View DeepRank (MV-DeepRank), which enables encoders to comprehensively learn complementary features from multiple modalities. Extensive experiments on the NTU RGB+D, NTU RGB+D 120, PKU-MMD I, and PKU-MMD II datasets under various evaluation settings demonstrate the generality, transferability, and superiority of the proposed self-supervised learning frameworks. Notably, our frameworks surpass previous methods that use the same backbone networks by at least 1.8% (ST-GCN) and 2.1% (STTFormer) under the fine-tuning setting. Additionally, DeepRank has a significant advantage in computational complexity, O(1), over contrastive learning-based methods, O(batch size), and clustering-based methods, O(number of clusters).
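To make the idea of "ranking labels from relative distances" concrete, the sketch below shows one possible reading of triplet ranking. It is not the DeepRank implementation. All names (`ranking_label`, `margin_ranking_loss`, `triplet_rank_loss`) are hypothetical. We assume that labels come from a fixed set of target features (for example, a previous or momentum encoder, which is an assumption) and that the current encoder is trained with a margin ranking loss. Each triplet needs only two distance comparisons, so the per-sample cost does not grow with batch size or with the number of clusters.

```python
import math
from typing import Sequence, Tuple

Vector = Sequence[float]
Triplet = Tuple[Vector, Vector, Vector]  # (anchor, x1, x2)


def euclidean(u: Vector, v: Vector) -> float:
    """Euclidean distance between two feature vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def ranking_label(anchor: Vector, x1: Vector, x2: Vector) -> int:
    """Hypothetical pseudo label for a triplet: +1 if x1 is closer to the
    anchor than x2 is, otherwise -1 (ties are mapped to -1)."""
    return 1 if euclidean(anchor, x1) < euclidean(anchor, x2) else -1


def margin_ranking_loss(s1: float, s2: float, label: int, margin: float = 0.1) -> float:
    """Hinge loss: zero once s1 exceeds s2 by `margin` when label=+1
    (or s2 exceeds s1 by `margin` when label=-1)."""
    return max(0.0, -label * (s1 - s2) + margin)


def triplet_rank_loss(target: Triplet, online: Triplet, margin: float = 0.1) -> float:
    """Assumed setup: the label comes from target features, and the scores
    (negative distances) come from the encoder being trained."""
    label = ranking_label(*target)
    anchor, x1, x2 = online
    s1, s2 = -euclidean(anchor, x1), -euclidean(anchor, x2)
    return margin_ranking_loss(s1, s2, label, margin)


if __name__ == "__main__":
    target = ([0.0, 0.0], [1.0, 0.0], [3.0, 0.0])  # x1 is closer: label = +1
    wrong = ([0.0, 0.0], [2.0, 0.0], [1.0, 0.0])   # encoder ranks x2 closer
    print(triplet_rank_loss(target, wrong))   # positive loss
    print(triplet_rank_loss(target, target))  # ranking agrees: 0.0
```

A margin ranking loss is used here only because it is a standard way to train a pairwise ordering. The actual DeepRank objective may differ.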