redis源码阅读(三):数据结构之Dict

博主： jettcc
发布时间：2022 年 08 月 11 日
1936 次浏览
6 条评论
16969字数
分类：技术相关

前言

如果您能看到我的blog并且感到有帮助, 我倍感荣幸

我们知道, Redis是一个键值型（Key-Value Pair）的数据库, 我们可以根据键实现快速的增删改查。而Dict数据结构在其中更是起到了非常非常重要的作用, 我们甚至可以认为, 整个redis其实就是一个巨大的Dict。
而键与值的映射关系正是通过Dict来实现的。

这次文章内容比较多, 所以我很多内容都在注释里 一定要看注释呀！

dict 的定义

dict 数据结构的实现

Dict的实现在7.0进行了一次大改，首先是取消了dictht的结构，转而用数组实现dictht，因为我喜欢研究新的代码, 就懒得去分析6.0的代码了，见谅

Redis 的字典的结构定义主要分为：dict和 dictEntry, 源码:

要注意看注释

/*
7.0：
*/
typedef struct dict {
    dictType *type;     // 字典类型，会跟 hash 函数等方法的具体实现有关

    dictEntry **ht_table[2];        // 两个哈希表数组
    unsigned long ht_used[2]; // 分别表示哈希表数组中各自已经存放键值对的个数
    long rehashidx;     // 代表rehash到了什么位置, rehashidx==-1 代表未进行rehash

    /* Keep small vars at end for optimal (minimal) struct padding */
    int16_t pauserehash; /* 表示 rehash 的状态，大于0时表示 rehash 暂停了，小于0表示出错了 */
    signed char ht_size_exp[2]; /* 表示两个哈希表数组的大小，通过 1 << ht_size_exp[0/1] 来计算 */
} dict;

/* 哈希表节点，单个 Node */
typedef struct dictEntry {
    void *key;              // key, 存储哈希表的 key
    union {
        void *val;
        uint64_t u64;
        int64_t s64;
        double d;
    } v;                    // value, 存储哈希表的 value
    struct dictEntry *next; // 单链表结构，指向下一个节点，用于解决哈希冲突
    void *metadata[];  // metadata 是一块任意长度的数据，具体的长度由 dictType 中的 dictEntryMetadataBytes() 返回，作用相当于 privdata(redis 6.0 用于指向任意数据的指针)
} dictEntry;

需要留意的一些地方

dict数组里有一行注释:/* Keep small vars at end for optimal (minimal) struct padding */ , 将小变量放在结构体的后面, 为了最佳或最小的填充,即节省空间。
在dict结构中的**ht_table[2]可以看出这是一个节点数组，由此可知在redis中ht的结构是数组+链表。也就是我们常说的拉链法解决冲突。
那为什么要有两个哈希表？
1. 因为redis是单进程单线程模型，但它既要支撑一个大容量，还要保持高性能的读写性能, 所以如果dict也像Java的hashmap一样扩容, 在大数据量时就会导致长时间的阻塞, 所以redis的dict是由两个哈希表+渐进式rehash的方式来实现扩容机制的。由此实现不阻塞读写，扩容又比较平滑的操作。
2. 字典的数据通常都是在ht[0]进行的。当字典判断需要扩容的时候，就会停止对ht[0]进行写操作，然后先给ht[1]赋予一个有ht[0] 2倍大小的新哈希表, 再将所有写操作指向ht[1], 此时哈希表扩容完成, 随后进入rehash阶段，即开始渐进式数据迁移。
3. 在rehash的过程中，写操作全部在h[1]执行，但读操作会在ht[0]、h[1]上都执行，直到ht[0]的所有数据迁移到ht[1]后, 会将h[0]的指针指向h[1], 释放掉h[0], 并完成整个扩容和rehash操作。
4. 所以我们可以简单的总结出两个哈希表分别承担的角色是：
  - ht[0]是主要的数据存储表, 正常情况下对外提供读写。
  - ht[1]作为扩容时使用的临时表，保证扩容机制平滑进行。

后面在讲扩容函数会再详细聊聊这个

他们之间的关系如下:
dict之间关系

字典类型 DcitType

typedef struct dictType {
    uint64_t (*hashFunction)(const void *key); // 哈希函数（hashcode方法）
    void *(*keyDup)(dict *d, const void *key); // key的copy函数
    void *(*valDup)(dict *d, const void *obj); // val的copy函数
    int (*keyCompare)(dict *d, const void *key1, const void *key2); // key的比较函数
    void (*keyDestructor)(dict *d, void *key); // key的销毁函数
    void (*valDestructor)(dict *d, void *obj); // val的销毁函数
    int (*expandAllowed)(size_t moreMem, double usedRatio); // 是否允许扩容

    /* 允许 dictEntry 携带额外的调用者定义的元数据。
     * 这段额外信息的内存会在条目分配时被零初始化. */
    size_t (*dictEntryMetadataBytes)(dict *d);
} dictType;

这里很好理解，就是为了实现多态

dictIterator 迭代器

就像Java和C++里的Map, Redis也提供了dict的迭代器, 代码如下：

typedef struct dictIterator {
    dict *d; // 指向当前迭代的字典
    long index; // 表示指向的键值对 的索引
    int table, safe; // table = 哈希表的号码(ht[0] / ht[1])
    // safe 表示该迭代器是否安全。安全时可以掉用 dictAdd,dictFind等等其他函数，不安全时只能调用 dictNext
    dictEntry *entry, *nextEntry;
    // entry 指向迭代器所指的键值对，nextEntry 指向下一个键值对
    /* unsafe iterator fingerprint for misuse detection. */
    unsigned long long fingerprint;
    // fingerprint 指纹? 用于检查不安全迭代器的误用? 这里有点没懂，希望有大佬补充
} dictIterator;

一些常量和宏定义

初始化定义:

/* This is the initial size of every hash table */
#define DICT_HT_INITIAL_EXP      2
#define DICT_HT_INITIAL_SIZE     (1<<(DICT_HT_INITIAL_EXP))

这里...有点难绷，实际上就是哈希表的初始大小为1 << 2, 可能是为了方便修改

set

下面是设置值的宏

setkey

设置key的值

#define dictSetKey(d, entry, _key_) do { \
    if ((d)->type->keyDup) \
        (entry)->key = (d)->type->keyDup((d), _key_); \
    else \
        (entry)->key = (_key_); \
} while(0)

在dictType中如果有设置key的方法，就调用方法赋值，否则的话就直接赋值。

setval

设置value的值

#define dictSetVal(d, entry, _val_) do { \
    if ((d)->type->valDup) \
        (entry)->v.val = (d)->type->valDup((d), _val_); \
    else \
        (entry)->v.val = (_val_); \
} while(0)

几乎一样的代码，在dictType中如果有设置value的方法，就调用方法赋值，否则的话就直接赋值。

free 释放

下面是释放key、value的宏

free key

#define dictFreeKey(d, entry) \
    if ((d)->type->keyDestructor) \
        (d)->type->keyDestructor((d), (entry)->key)

free value

#define dictFreeVal(d, entry) \
    if ((d)->type->valDestructor) \
        (d)->type->valDestructor((d), (entry)->v.val)

代码原理: 如果在dictType中有提供keyDestructor / valDestructor函数, 就调用函数释放key / value，但是我这里没有看到没有提供函数时的释放方式，这里应该是默认都有设置的。

比较

#define dictCompareKeys(d, key1, key2) \
    (((d)->type->keyCompare) ? \
        (d)->type->keyCompare((d), key1, key2) : \
        (key1) == (key2))

代码原理: dictType的keyCompare函数指针为空就用==来比较，不然就调用keyCompare函数来比较。

其他的一些宏

没必要细说，但是也是比较有意义的内容，就不单独讲了

获取 key 和 val

// 获取 key 的哈希值
#define dictHashKey(d, key) (d)->type->hashFunction(key) 
// 相当于get方法
#define dictGetKey(he) ((he)->key)
#define dictGetVal(he) ((he)->v.val)
// 不同的类型
#define dictGetSignedIntegerVal(he) ((he)->v.s64)
#define dictGetUnsignedIntegerVal(he) ((he)->v.u64)
#define dictGetDoubleVal(he) ((he)->v.d)

哈希表大小

// 获取哈希表的大小
#define DICTHT_SIZE(exp) ((exp) == -1 ? 0 : (unsigned long)1<<(exp))
// 哈希表的掩码(size - 1)
#define DICTHT_SIZE_MASK(exp) ((exp) == -1 ? 0 : (DICTHT_SIZE(exp))-1)

获取字典保存键值对的总数

#define dictSize(d) ((d)->ht_used[0]+(d)->ht_used[1])

rehash 的宏

可以留意一下, rehash是个重点内容

// 判断当前是否在 rehash
#define dictIsRehashing(d) ((d)->rehashidx != -1)
// 暂停 rehash
#define dictPauseRehashing(d) (d)->pauserehash++
// 恢复 rehash
#define dictResumeRehashing(d) (d)->pauserehash--

dict的方法

创建Dict

在dict.c中创建字典的方法是dictCreate

源码如下:

/* Create a new hash table */
dict *dictCreate(dictType *type)
{
    dict *d = zmalloc(sizeof(*d));

    _dictInit(d,type);
    return d;
}

可以看到在其中首先为字典结构分配了空间，随后调用了_dictInit，源码如下：

/* Initialize the hash table */
int _dictInit(dict *d, dictType *type)
{
    _dictReset(d, 0);
    _dictReset(d, 1);
    d->type = type;
    d->rehashidx = -1;
    d->pauserehash = 0;
    return DICT_OK;
}


/* Reset hash table parameters already initialized with _dictInit()*/
static void _dictReset(dict *d, int htidx)
{
    d->ht_table[htidx] = NULL;
    d->ht_size_exp[htidx] = -1;
    d->ht_used[htidx] = 0;
}

可以看到init函数里面主要是通过_dictReset函数来设置参数

没有什么特别关键的内容，就是前面提到的，最开始的时候用的是ht[0]

注意在_dictInit函数的返回值是一个宏，用来标记操作是否完成，定义如下：

#define DICT_OK 0
#define DICT_ERR 1

修改Dict

修改dict一般是扩容或者缩容。

dictResize 缩容

在字典中，缩容函数是dictResize，源码如下：

/* Resize the table to the minimal size that contains all the elements,
 * but with the invariant of a USED/BUCKETS ratio near to <= 1 */
int dictResize(dict *d)
{
    unsigned long minimal;

    if (!dict_can_resize || dictIsRehashing(d)) return DICT_ERR;
    minimal = d->ht_used[0];
    if (minimal < DICT_HT_INITIAL_SIZE)
        minimal = DICT_HT_INITIAL_SIZE;
    return dictExpand(d, minimal);
}

可以看到缩容的条件就是 dict_can_resize!=0和dictIsRehashing(d)==false，后者函数判断的是当前是否正在进行rehash，如果正在进行肯定不能缩容，这很好理解。

而前者是dict.c中定义的一个全局变量

static int dict_can_resize = 1;

作用是指示resize能否进行

那，在什么情况下它会变成0呢？
答：在系统运行有后台进程时，不允许自动自动调整大小，这是为了为了使得类linux系统的copy-on-write有更好的性能（没有调整大小，就没有rehash，这样父进程的db没有改变，子进程就不需要真的copy）。在后台子进程退出后，又会允许resize。这里说到的后台子进程主要跟redis的持久化机制有关。在后台持久化时，为了防止自然rehash，会把它设置为0，减少程序对内存的碰撞，当持久化任务完成后，又会被设回1.
以上文字来自petermao的club（我找了好久才找到答案...

回到源码可以看到，它先通过ht_used[0]获取已有元素，然后在dictExpand中进行扩容，源码如下:

/* return DICT_ERR if expand was not performed */
int dictExpand(dict *d, unsigned long size) {
    return _dictExpand(d, size, NULL);
}

/* Expand or create the hash table,
 * when malloc_failed is non-NULL, it'll avoid panic if malloc fails (in which case it'll be set to 1).
 * Returns DICT_OK if expand was performed, and DICT_ERR if skipped. */
int _dictExpand(dict *d, unsigned long size, int* malloc_failed)
{
    if (malloc_failed) *malloc_failed = 0;

    // 如果正在rehash或者ht_used[0]长度大于传入长度，说明错误了
    if (dictIsRehashing(d) || d->ht_used[0] > size)
        return DICT_ERR;

    // 建立新的表并申请内存
    dictEntry **new_ht_table;
    unsigned long new_ht_used;
    signed char new_ht_size_exp = _dictNextExp(size);

    // 新的长度 
    size_t newsize = 1ul<<new_ht_size_exp;
    if (newsize < size || newsize * sizeof(dictEntry*) < newsize)
        return DICT_ERR;

    // 相同的长度的话就没意义了
    if (new_ht_size_exp == d->ht_size_exp[0]) return DICT_ERR;

    // 为新的哈希表申请内存，并且把所有的指针都指向null
    if (malloc_failed) {
        new_ht_table = ztrycalloc(newsize*sizeof(dictEntry*));
        *malloc_failed = new_ht_table == NULL;
        if (*malloc_failed)
            return DICT_ERR;
    } else
        new_ht_table = zcalloc(newsize*sizeof(dictEntry*));

    new_ht_used = 0;

    /* 这是第一次初始化吗？如果是，那它就不算是真正的rehash
     * 我们刚刚设置了第一个哈希表，以便它可以接受键。 */
    if (d->ht_table[0] == NULL) {
        d->ht_size_exp[0] = new_ht_size_exp;
        d->ht_used[0] = new_ht_used;
        d->ht_table[0] = new_ht_table;
        return DICT_OK;
    }

    /* 准备第二个哈希表用于增量重散列 */
    d->ht_size_exp[1] = new_ht_size_exp;
    d->ht_used[1] = new_ht_used;
    d->ht_table[1] = new_ht_table;
    d->rehashidx = 0;
    return DICT_OK;
}

解析：在_dictExpand函数中，首先会把新长度设置为刚好大于等于原有长度的2次幂。然后重新申请新内存并且初始化。然后会判断是否是第一次哈希，如果，是那其实这也只算是一次初始化而已，也就是说:

_dictExpand 函数不是只为了 rehash，还可以初始化字典。 _dictExpand函数判断是否是 rehash是通过判断 ht_table[0] 是否为空来判断的。
换句话说：如果调用_dictExpand 的是非空字典字典，则增容后的哈希表就是在 ht_table[1] 中，此时需要手动释放 ht_table[0]，并将 ht_table[1] 放到 ht_table[0] 位置上（指针指过去）。

Rehash 扩容（重点）

先看看源码：

int dictRehash(dict *d, int n) {
    int empty_visits = n*10; /* Max number of empty buckets to visit. */
    if (!dictIsRehashing(d)) return 0;

    while(n-- && d->ht_used[0] != 0) {
        dictEntry *de, *nextde;
      
          /* 这里的rehashidx不会溢出，因为ht[0].used != 0, 所以我们确定这里还有很多元素*/
        assert(DICTHT_SIZE(d->ht_size_exp[0]) > (unsigned long)d->rehashidx);
        while(d->ht_table[0][d->rehashidx] == NULL) {
            d->rehashidx++;
            if (--empty_visits == 0) return 1;
        }
        de = d->ht_table[0][d->rehashidx];
        /* 将这个桶中的所有键从旧的哈希 table 移动到新的哈希 table */
        while(de) {
            uint64_t h;

            nextde = de->next;
            /* 获取新哈希表中的索引 */
            h = dictHashKey(d, de->key) & DICTHT_SIZE_MASK(d->ht_size_exp[1]);
            de->next = d->ht_table[1][h];
            d->ht_table[1][h] = de;
            d->ht_used[0]--;
            d->ht_used[1]++;
            de = nextde;
        }
        d->ht_table[0][d->rehashidx] = NULL;
        d->rehashidx++;
    }

    /* 检查我们是否已经重新散列了整个表。 */
    if (d->ht_used[0] == 0) {
        zfree(d->ht_table[0]);
        /* 将新的 ht 复制到旧的 ht 上, 就是指针指向变动 */
        d->ht_table[0] = d->ht_table[1];
        d->ht_used[0] = d->ht_used[1];
        d->ht_size_exp[0] = d->ht_size_exp[1];
        _dictReset(d, 1);
        d->rehashidx = -1;
        return 0;
    }

    /*
    每次 rehash 都是以 n 个桶为单位的，将每个桶上的链都移到新的哈希表上。一次 rehash 完成以后，如果之后还有桶要 rehash，则返回1，如果 rehash 完成，则返回0。
    但是实际上，rehashidx 指向的桶可能是空桶，所以为了效率，一次 rehash 最多要遍历 10n 个空桶，遍历完了 10 n 个空桶就会返回。
    */
    return 1;
}

rehash扩容分为两种：

自然扩容，条件是used_size / all_size >= 1 && dict_can_resize = 1
强制扩容，条件是used_size / all_size > dict_force_resize_ratio，而

static unsigned int dict_force_resize_ratio = 5;

Rehash 不能一步完成，否则数据量特别大时会导致服务器暂停服务来迁移数据。而渐进式的好处就是数据的迁移发生在对数据的增删改查时，这样就将迁移平摊在每个增删改查的操作上。

Rehash的详细步骤如下：

为 ht[1] 分配空间，让字典同时持有 ht[0] 和 ht[1] 两个哈希表。
所有的写入、更新操作全部指向h[1]，只有读取操作继续在h[0]执行（或者说两个都会查）
rehash都是以n个桶为单位的，将每个桶上的链都移到新的哈希表上。一次 rehash 完成以后，如果之后还有桶要 rehash，则返回1，如果 rehash 完成，则返回0。但是实际上，rehashidx 指向的桶可能是空桶，所以为了效率，一次 rehash 最多要遍历 10n 个空桶，当空桶数量达到10的时候就会返回，等待下一次rehash。
随着字典操作的不断执行，最终在某个时间点上， ht[0] 的所有键值对都会被rehash 至 ht[1] ，这时程序将 rehashidx 属性的值设为 -1 ，表示 rehash 操作已完成。

释放Dict

dict中负责释放的函数是dictRelease

其中在dictRelease中调用了两次_dictClear，用于清空两个哈希表，清空完成后才释放dict，两个方法的源码如下

/* Clear & Release the hash table */
void dictRelease(dict *d)
{
    _dictClear(d,0,NULL);
    _dictClear(d,1,NULL);
    zfree(d);
}

/* Destroy an entire dictionary */
int _dictClear(dict *d, int htidx, void(callback)(dict*)) {
    unsigned long i;

    /* Free all the elements */
    for (i = 0; i < DICTHT_SIZE(d->ht_size_exp[htidx]) && d->ht_used[htidx] > 0; i++) {
        dictEntry *he, *nextHe;

        if (callback && (i & 65535) == 0) callback(d);

        if ((he = d->ht_table[htidx][i]) == NULL) continue;
        while(he) {
            nextHe = he->next;
            dictFreeKey(d, he);
            dictFreeVal(d, he);
            zfree(he);
            d->ht_used[htidx]--;
            he = nextHe;
        }
    }
    /* Free the table and the allocated cache structure */
    zfree(d->ht_table[htidx]);
    /* Re-initialize the table */
    _dictReset(d, htidx);
    return DICT_OK; /* never fails */
}

_dictClear会遍历所有的索引，当该索引上有键值对时，则释放该索引上的整条链表。释放完所有键值对之后，再释放该哈希表并重置其成员。

HashTable中的增删查改

键值对的添加

hashtable中实现键值对添加的函数是dictAdd，源码如下

/* Add an element to the target hash table */
int dictAdd(dict *d, void *key, void *val)
{
    dictEntry *entry = dictAddRaw(d,key,NULL);

    if (!entry) return DICT_ERR;
    dictSetVal(d, entry, val);
    return DICT_OK;
}

根据代码我们可以看到它调用了dictAddRaw,源码如下：

dictEntry *dictAddRaw(dict *d, void *key, dictEntry **existing)
{
    long index;
    dictEntry *entry;
    int htidx;

    if (dictIsRehashing(d)) _dictRehashStep(d);

    /* 获得新元素的索引， 如果是 -1，说明已经存在了
        dictKeyIndex这个函数作用时返回key在hash表中的索引，如果==-1就代表存在，
          要注意，如果当前正在rehash，则返回的索引就一直会是第二张哈希表的索引 。
      */
    if ((index = _dictKeyIndex(d, key, dictHashKey(d,key), existing)) == -1)
        return NULL;

    /* 
    分配内存并存储新条目. 
    在顶部插入元素，假设在数据库系统中，最近添加的条目更有可能被更频繁地访问。
    */
    htidx = dictIsRehashing(d) ? 1 : 0;
    size_t metasize = dictMetadataSize(d);
    entry = zmalloc(sizeof(*entry) + metasize);
    if (metasize > 0) {
        memset(dictMetadata(entry), 0, metasize);
    }
    entry->next = d->ht_table[htidx][index];
    d->ht_table[htidx][index] = entry;
    d->ht_used[htidx]++;

    /* Set the hash entry fields. */
    dictSetKey(d, entry, key);
    return entry;
}

从代码可以看出，首先它会先判断是否正在进行rehash，如果正在进行，则会执行一次_dictRehashStep，其实就是从ht[0]迁移数据到ht[1]的操作（迁移一个桶）。接下来就是申请内存然后赋值(插入hash表中)最后返回给dictAdd, 注意这个时候dictEntry是已经插入了的。

键值对的删除

删除键值对的函数是dictDelete源码如下：

int dictDelete(dict *ht, const void *key) {
    return dictGenericDelete(ht,key,0) ? DICT_OK : DICT_ERR;
}

可以看到实际上删除函数应该是dictGenericDelete，源码如下：

static dictEntry *dictGenericDelete(dict *d, const void *key, int nofree) {
    uint64_t h, idx;
    dictEntry *he, *prevHe;
    int table;

    /* dict is empty */
    if (dictSize(d) == 0) return NULL;
    // 如果正在rehash，执行一次迁移
    if (dictIsRehashing(d)) _dictRehashStep(d);
      // 获取索引
    h = dictHashKey(d, key);
        // 两个hash表都要找
    for (table = 0; table <= 1; table++) {
        // 先计算hash值，然后找到对应桶（链表）的头节点
        idx = h & DICTHT_SIZE_MASK(d->ht_size_exp[table]);
        he = d->ht_table[table][idx];
        prevHe = NULL;
        while(he) {
              // 找到这个key
            if (key==he->key || dictCompareKeys(d, key, he->key)) {
                // 删除元素，和adlist的删除有点像
                if (prevHe)
                    prevHe->next = he->next;
                else
                    d->ht_table[table][idx] = he->next;
                if (!nofree) {
                      // 释放
                    dictFreeUnlinkedEntry(d, he);
                }
                  // 数量--
                d->ht_used[table]--;
                return he;
            }
            prevHe = he;
            he = he->next;
        }
        if (!dictIsRehashing(d)) break;
    }
    return NULL; /* not found */
}

删除后释放

我们可以看到在删除元素之后会调用dictFreeUnlinkedEntry方法，它的实现非常简单：

void dictFreeUnlinkedEntry(dict *d, dictEntry *he) {
    if (he == NULL) return;
    dictFreeKey(d, he); // 释放key
    dictFreeVal(d, he); // 释放value
    zfree(he); // 释放dictEntry
}

删除但不释放

dict还有一个函数dictUnlink，源码如下

dictEntry *dictUnlink(dict *d, const void *key) {
    return dictGenericDelete(d,key,1);
}

它的区别仅仅在第三个函数是1, 作用就是在删除一个key的时候不会去释放它

ps：说实话我不太懂这是干嘛的，如果不释放的话这个键值对不会成为野指针吗...
网上也没有很好的解答，这个地方我还要再去了解一下

键值对的更新

dict中实现了dictReplace来对键值对进行更新, 源码如下

int dictReplace(dict *d, void *key, void *val)
{
    dictEntry *entry, *existing, auxentry;

    // 调用dictAddRaw判断key是否存在
    entry = dictAddRaw(d,key,&existing);
    // 如果存在就设置val值并返回1, 否则就更新值
    if (entry) {
        dictSetVal(d, entry, val);
        return 1;
    }
    // 设置新值然后再释放旧值*
    auxentry = *existing;
    dictSetVal(d, existing, val);
    dictFreeVal(d, &auxentry);
    return 0;
}

注意：
方法的最后几行，先设置新值再释放旧值，这个顺序很重要，因为新值和旧值可能相同，在这种情况下就要考虑引用计数，所以我们应该先增加计数，再减少计数。

更新或返回

dict中还有一个方法dictArrOrFind，字面意思就是“添加或者查找”，源码如下：

dictEntry *dictAddOrFind(dict *d, void *key) {
    dictEntry *entry, *existing;
    // 通过dictAddRaw判断key是否存在
    entry = dictAddRaw(d,key,&existing);
    // 如果存在就返回key，如果不存在就先插入 再返回
    return entry ? entry : existing;
}

键值对的查找

dict中的查找函数是dictFind，源码如下：

dictEntry *dictFind(dict *d, const void *key)
{
    dictEntry *he;
    uint64_t h, idx, table;

    if (dictSize(d) == 0) return NULL; /* dict is empty */
    // rehash
    if (dictIsRehashing(d)) _dictRehashStep(d);
    // 拿到key
    h = dictHashKey(d, key);
    for (table = 0; table <= 1; table++) {
        // 计算出hash值然后拿到hash值对应那个桶的头节点
        idx = h & DICTHT_SIZE_MASK(d->ht_size_exp[table]);
        he = d->ht_table[table][idx];
        // 遍历查找，找到直接返回
        while(he) {
            if (key==he->key || dictCompareKeys(d, key, he->key))
                return he;
            he = he->next;
        } 
        if (!dictIsRehashing(d)) return NULL;
    }
    return NULL;
}

值得注意的一点是：for循环的最后一个if，如果此时没有在进行rehash，那ht[0]没有就是没有了（返回NULL），如果正在进行rehash，那再查一次ht[1]。这里考虑的情况是在rehash时新的写操作是在ht[0]上进行的，所以要查找的key可能在h[1]上

总结

自此，dict关键部分的源码已经全部阅读完了，如果文章不够详细，可以再去翻翻源码，dict的源码在dict.h && dict.c中

总结一下重点

7.0的dict结构大改，从dict->dictht->dictEntry 改成了**dictEntry hash_table[2], 但本质并没有变化，还是数组+链表，我个人觉得这个改动主要是为了把操作元素结构简单化，而且在内存空间的占用上也是大大缩小了。
dict中有两张hash表，ht[0]主要用于日常操作，当触发rehash的时候，会使用h[1]来充当临时表
dict的rehash是渐进式的，会把插入和删除的操作都变更到临时表ht[1]上，然后在每次执行操作时都会将一部分数据从ht[0]挪到ht[1]，等到全部完成后，会把原本指向ht[0]的指针指向ht[1]，然后释放原本的ht[0]
rehash的条件是：
- 自然扩容，条件是used_size / all_size >= 1 && dict_can_resize = 1
- 强制扩容，条件是used_size / all_size > dict_force_resize_ratio
dict真的很有意思

最后: 篇幅很长，感谢您耐心读完

最后修改：2022 年 08 月 11 日

如果觉得我的文章对你有用，请随意赞赏

6 条评论

qhkdeigbsk
February 28th, 2025 at 04:17 pm

文章紧扣主题，观点鲜明，展现出深刻的思考维度。

回复
pywnmnabjv
October 19th, 2024 at 05:30 pm

文章的确不错啊https://www.cscnn.com/

回复
zwbcwqwkyy
October 6th, 2024 at 07:50 pm

不错不错，我喜欢看 www.jiwenlaw.com

回复
bndqdfnpqj
October 4th, 2024 at 09:14 pm

想想你的文章写的特别好https://www.ea55.com/

回复
puuxjyggrl
October 1st, 2024 at 08:48 pm

不错不错，我喜欢看 https://www.237fa.com/

回复
qeqvuhzajl
September 23rd, 2024 at 04:50 pm

叼茂SEO.bfbikes.com

回复

发表评论取消回复

评论 *

私密评论

名称 *

🎲

邮箱 *

地址

uhtrzahzgr
这篇文章不错！
yasbtxeqyt
这篇文章不错！
gwtcdkgmtc
若能在案例选择上更贴近现实，说服力会进一步提升。
dveukaicuh
首尾呼应，主题鲜明，收束有力。
sfkvpyrsyi
对趋势的预判具有战略参考价值。

redis源码阅读(三):数据结构之Dict

jettcc • 2022 年 08 月 11 日

<h1>前言</h1><blockquote>如果您能看到我的blog并且感到有帮助, 我倍感荣幸</blockquote><p>我们知道, <code>Redis</code>是一个键值型<code>（Key-Value Pair）</code>的数据库, 我们可以根据键实现快速的增删改查。而<code>Dict</code>数据结构在其中更是起到了非常非常重要的作用, 我们甚至可以认为, 整个redis其实就是一个巨大的<code>Dict</code>。<br>而键与值的映射关系正是通过<code>Dict</code>来实现的。</p><blockquote>这次文章内容比较多, 所以我很多内容都在注释里 <strong>一定要看注释呀！</strong></blockquote><h1>dict 的定义</h1><h2>dict 数据结构的实现</h2><p>Dict的实现在7.0进行了一次大改，首先是取消了<code>dictht</code>的结构，转而用数组实现<code>dictht</code>，因为我喜欢研究新的代码, 就懒得去分析6.0的代码了，见谅</p><p>Redis 的字典的结构定义主要分为：<code>dict</code>和 <code>dictEntry</code>, 源码:</p><blockquote><strong>要注意看注释</strong></blockquote><pre><code class="lang-C">/*
7.0：
*/
typedef struct dict {
    dictType *type;     // 字典类型，会跟 hash 函数等方法的具体实现有关

dictEntry **ht_table[2];        // 两个哈希表数组
    unsigned long ht_used[2]; // 分别表示哈希表数组中各自已经存放键值对的个数
    long rehashidx;     // 代表rehash到了什么位置, rehashidx==-1 代表未进行rehash

/* Keep small vars at end for optimal (minimal) struct padding */
    int16_t pauserehash; /* 表示 rehash 的状态，大于0时表示 rehash 暂停了，小于0表示出错了 */
    signed char ht_size_exp[2]; /* 表示两个哈希表数组的大小，通过 1 &lt;&lt; ht_size_exp[0/1] 来计算 */
} dict;

/* 哈希表节点，单个 Node */
typedef struct dictEntry {
    void *key;              // key, 存储哈希表的 key
    union {
        void *val;
        uint64_t u64;
        int64_t s64;
        double d;
    } v;                    // value, 存储哈希表的 value
    struct dictEntry *next; // 单链表结构，指向下一个节点，用于解决哈希冲突
    void *metadata[];  // metadata 是一块任意长度的数据，具体的长度由 dictType 中的 dictEntryMetadataBytes() 返回，作用相当于 privdata(redis 6.0 用于指向任意数据的指针)
} dictEntry;</code></pre><p>需要留意的一些地方</p><ul><li><code>dict</code>数组里有一行注释:<code>/* Keep small vars at end for optimal (minimal) struct padding */ </code>, 将小变量放在结构体的后面, 为了最佳或最小的填充,即节省空间。</li><li>在<code>dict</code>结构中的<code>**ht_table[2]</code>可以看出这是一个节点数组，由此可知在<code>redis</code>中<code>ht</code>的结构是数组+链表。也就是我们常说的拉链法解决冲突。</li><li><p><strong>那为什么要有两个哈希表？</strong></p><ol><li>因为<code>redis</code>是单进程单线程模型，但它既要支撑一个大容量，还要保持高性能的读写性能, 所以如果<code>dict</code>也像<code>Java</code>的<code>hashmap</code>一样扩容, 在大数据量时就会导致长时间的阻塞, 所以<code>redis</code>的<code>dict</code>是由两个哈希表+渐进式<code>rehash</code>的方式来实现扩容机制的。由此实现不阻塞读写，扩容又比较平滑的操作。</li><li>字典的数据通常都是在<code>ht[0]</code>进行的。当字典判断需要扩容的时候，就会停止对<code>ht[0]</code>进行写操作，然后先给<code>ht[1]</code>赋予一个有<code>ht[0] 2</code>倍大小的新哈希表, 再将所有写操作指向<code>ht[1]</code>, 此时哈希表扩容完成, 随后进入<code>rehash</code>阶段，即开始渐进式数据迁移。</li><li>在<code>rehash</code>的过程中，写操作全部在<code>h[1]</code>执行，但读操作会在<code>ht[0]、h[1]</code>上都执行，直到<code>ht[0]</code>的所有数据迁移到<code>ht[1]</code>后, 会将<code>h[0]的指针指向h[1]</code>, 释放掉<code>h[0]</code>, 并完成整个扩容和<code>rehash</code>操作。</li><li><p>所以我们可以简单的总结出两个哈希表分别承担的角色是：</p><ul><li><code>ht[0]</code>是主要的数据存储表, 正常情况下对外提供读写。</li><li><code>ht[1]</code>作为扩容时使用的临时表，保证扩容机制平滑进行。</li></ul></li></ol></li></ul><blockquote>后面在讲扩容函数会再详细聊聊这个</blockquote><p>他们之间的关系如下:<br><img src="https://hexo-blog-1306179590.cos.ap-guangzhou.myqcloud.com/redis/clipbord_1660126762407.png" alt="dict之间关系" title="dict之间关系"style=""></p><h2>字典类型 DcitType</h2><pre><code class="lang-C">typedef struct dictType {
    uint64_t (*hashFunction)(const void *key); // 哈希函数（hashcode方法）
    void *(*keyDup)(dict *d, const void *key); // key的copy函数
    void *(*valDup)(dict *d, const void *obj); // val的copy函数
    int (*keyCompare)(dict *d, const void *key1, const void *key2); // key的比较函数
    void (*keyDestructor)(dict *d, void *key); // key的销毁函数
    void (*valDestructor)(dict *d, void *obj); // val的销毁函数
    int (*expandAllowed)(size_t moreMem, double usedRatio); // 是否允许扩容

/* 允许 dictEntry 携带额外的调用者定义的元数据。
     * 这段额外信息的内存会在条目分配时被零初始化. */
    size_t (*dictEntryMetadataBytes)(dict *d);
} dictType;</code></pre><p>这里很好理解，就是为了实现多态</p><h2>dictIterator 迭代器</h2><p>就像<code>Java</code>和<code>C++</code>里的<code>Map</code>, <code>Redis</code>也提供了<code>dict</code>的迭代器, 代码如下：</p><pre><code class="lang-C">typedef struct dictIterator {
    dict *d; // 指向当前迭代的字典
    long index; // 表示指向的键值对 的索引
    int table, safe; // table = 哈希表的号码(ht[0] / ht[1])
    // safe 表示该迭代器是否安全。安全时可以掉用 dictAdd,dictFind等等其他函数，不安全时只能调用 dictNext
    dictEntry *entry, *nextEntry;
    // entry 指向迭代器所指的键值对，nextEntry 指向下一个键值对
    /* unsafe iterator fingerprint for misuse detection. */
    unsigned long long fingerprint;
    // fingerprint 指纹? 用于检查不安全迭代器的误用? 这里有点没懂，希望有大佬补充
} dictIterator;</code></pre><h2>一些常量和宏定义</h2><h3>初始化定义:</h3><pre><code class="lang-C">/* This is the initial size of every hash table */
#define DICT_HT_INITIAL_EXP      2
#define DICT_HT_INITIAL_SIZE     (1&lt;&lt;(DICT_HT_INITIAL_EXP))</code></pre><p>这里...有点难绷，实际上就是哈希表的初始大小为<code>1 &lt;&lt; 2</code>, 可能是为了方便修改</p><h3>set</h3><p>下面是设置值的宏</p><h4>setkey</h4><blockquote>设置key的值</blockquote><pre><code class="lang-C">#define dictSetKey(d, entry, _key_) do { \
    if ((d)-&gt;type-&gt;keyDup) \
        (entry)-&gt;key = (d)-&gt;type-&gt;keyDup((d), _key_); \
    else \
        (entry)-&gt;key = (_key_); \
} while(0)</code></pre><p>在<code>dictType</code>中如果有设置<code>key</code>的方法，就调用方法赋值，否则的话就直接赋值。</p><h4>setval</h4><blockquote>设置value的值</blockquote><pre><code class="lang-C">#define dictSetVal(d, entry, _val_) do { \
    if ((d)-&gt;type-&gt;valDup) \
        (entry)-&gt;v.val = (d)-&gt;type-&gt;valDup((d), _val_); \
    else \
        (entry)-&gt;v.val = (_val_); \
} while(0)</code></pre><p>几乎一样的代码，在<code>dictType</code>中如果有设置<code>value</code>的方法，就调用方法赋值，否则的话就直接赋值。</p><h3>free 释放</h3><p>下面是释放<code>key、value</code>的宏</p><h4>free key</h4><pre><code class="lang-C">#define dictFreeKey(d, entry) \
    if ((d)-&gt;type-&gt;keyDestructor) \
        (d)-&gt;type-&gt;keyDestructor((d), (entry)-&gt;key)</code></pre><h4>free value</h4><pre><code class="lang-C">#define dictFreeVal(d, entry) \
    if ((d)-&gt;type-&gt;valDestructor) \
        (d)-&gt;type-&gt;valDestructor((d), (entry)-&gt;v.val)</code></pre><p>代码原理: 如果在<code>dictType</code>中有提供<code>keyDestructor / valDestructor</code>函数, 就调用函数释放<code>key / value</code>，但是我这里没有看到没有提供函数时的释放方式，这里应该是默认都有设置的。</p><h3>比较</h3><pre><code class="lang-C">#define dictCompareKeys(d, key1, key2) \
    (((d)-&gt;type-&gt;keyCompare) ? \
        (d)-&gt;type-&gt;keyCompare((d), key1, key2) : \
        (key1) == (key2))</code></pre><p>代码原理: <code>dictType</code>的<code>keyCompare</code>函数指针为空就用<code>==</code>来比较，不然就调用<code>keyCompare</code>函数来比较。</p><h3>其他的一些宏</h3><p>没必要细说，但是也是比较有意义的内容，就不单独讲了</p><h4>获取 key 和 val</h4><pre><code class="lang-C">// 获取 key 的哈希值
#define dictHashKey(d, key) (d)-&gt;type-&gt;hashFunction(key) 
// 相当于get方法
#define dictGetKey(he) ((he)-&gt;key)
#define dictGetVal(he) ((he)-&gt;v.val)
// 不同的类型
#define dictGetSignedIntegerVal(he) ((he)-&gt;v.s64)
#define dictGetUnsignedIntegerVal(he) ((he)-&gt;v.u64)
#define dictGetDoubleVal(he) ((he)-&gt;v.d)</code></pre><h4>哈希表大小</h4><pre><code class="lang-C">// 获取哈希表的大小
#define DICTHT_SIZE(exp) ((exp) == -1 ? 0 : (unsigned long)1&lt;&lt;(exp))
// 哈希表的掩码(size - 1)
#define DICTHT_SIZE_MASK(exp) ((exp) == -1 ? 0 : (DICTHT_SIZE(exp))-1)</code></pre><h4>获取字典保存键值对的总数</h4><pre><code class="lang-C">#define dictSize(d) ((d)-&gt;ht_used[0]+(d)-&gt;ht_used[1])</code></pre><h3>rehash 的宏</h3><p>可以留意一下, <code>rehash</code>是个重点内容</p><pre><code class="lang-C">// 判断当前是否在 rehash
#define dictIsRehashing(d) ((d)-&gt;rehashidx != -1)
// 暂停 rehash
#define dictPauseRehashing(d) (d)-&gt;pauserehash++
// 恢复 rehash
#define dictResumeRehashing(d) (d)-&gt;pauserehash--</code></pre><h1>dict的方法</h1><h2>创建Dict</h2><p>在<code>dict.c</code>中创建字典的方法是<code>dictCreate</code></p><p>源码如下:</p><pre><code class="lang-C">/* Create a new hash table */
dict *dictCreate(dictType *type)
{
    dict *d = zmalloc(sizeof(*d));

_dictInit(d,type);
    return d;
}</code></pre><p>可以看到在其中首先为字典结构分配了空间，随后调用了<code>_dictInit</code>，源码如下：</p><pre><code class="lang-C">/* Initialize the hash table */
int _dictInit(dict *d, dictType *type)
{
    _dictReset(d, 0);
    _dictReset(d, 1);
    d-&gt;type = type;
    d-&gt;rehashidx = -1;
    d-&gt;pauserehash = 0;
    return DICT_OK;
}

/* Reset hash table parameters already initialized with _dictInit()*/
static void _dictReset(dict *d, int htidx)
{
    d-&gt;ht_table[htidx] = NULL;
    d-&gt;ht_size_exp[htidx] = -1;
    d-&gt;ht_used[htidx] = 0;
}</code></pre><p>可以看到<code>init</code>函数里面主要是通过<code>_dictReset</code>函数来设置参数</p><p>没有什么特别关键的内容，就是前面提到的，最开始的时候用的是<code>ht[0]</code></p><p>注意在<code>_dictInit</code>函数的返回值是一个宏，用来标记操作是否完成，定义如下：</p><pre><code class="lang-c">#define DICT_OK 0
#define DICT_ERR 1</code></pre><h2>修改Dict</h2><p>修改<code>dict</code>一般是扩容或者缩容。</p><h3>dictResize 缩容</h3><p>在字典中，缩容函数是<code>dictResize</code>，源码如下：</p><pre><code class="lang-C">/* Resize the table to the minimal size that contains all the elements,
 * but with the invariant of a USED/BUCKETS ratio near to &lt;= 1 */
int dictResize(dict *d)
{
    unsigned long minimal;

if (!dict_can_resize || dictIsRehashing(d)) return DICT_ERR;
    minimal = d-&gt;ht_used[0];
    if (minimal &lt; DICT_HT_INITIAL_SIZE)
        minimal = DICT_HT_INITIAL_SIZE;
    return dictExpand(d, minimal);
}</code></pre><p>可以看到缩容的条件就是 <code>dict_can_resize!=0</code>和<code>dictIsRehashing(d)==false</code>，后者函数判断的是当前是否正在进行<code>rehash</code>，如果正在进行肯定不能缩容，这很好理解。</p><p>而前者是<code>dict.c</code>中定义的一个全局变量</p><pre><code class="lang-C">static int dict_can_resize = 1;</code></pre><p>作用是指示<code>resize</code>能否进行</p><blockquote><p>那，在什么情况下它会变成<code>0</code>呢？</p><p>答：在系统运行有后台进程时，不允许自动自动调整大小，这是为了为了使得类linux系统的copy-on-write有更好的性能（没有调整大小， 就没有rehash，这样父进程的db没有改变，子进程就不需要真的copy）。在后台子进程退出后，又会允许resize。这里说到的后台子进程主要跟redis的持久化机制有关。在后台持久化时，为了防止自然<code>rehash</code>，会把它设置为<code>0</code>，减少程序对内存的碰撞，当持久化任务完成后，又会被设回<code>1</code>.</p><p>以上文字来自<span class="external-link"><a class="no-external-link" href="http://www.petermao.com/redis/75.html" target="_blank"><i data-feather="external-link"></i>petermao的club</a></span>（我找了好久才找到答案...</p></blockquote><p>回到源码可以看到，它先通过<code>ht_used[0]</code>获取已有元素，然后在<code>dictExpand</code>中进行扩容，源码如下:</p><pre><code class="lang-C">/* return DICT_ERR if expand was not performed */
int dictExpand(dict *d, unsigned long size) {
    return _dictExpand(d, size, NULL);
}

/* Expand or create the hash table,
 * when malloc_failed is non-NULL, it'll avoid panic if malloc fails (in which case it'll be set to 1).
 * Returns DICT_OK if expand was performed, and DICT_ERR if skipped. */
int _dictExpand(dict *d, unsigned long size, int* malloc_failed)
{
    if (malloc_failed) *malloc_failed = 0;

// 如果正在rehash或者ht_used[0]长度大于传入长度，说明错误了
    if (dictIsRehashing(d) || d-&gt;ht_used[0] &gt; size)
        return DICT_ERR;

// 建立新的表并申请内存
    dictEntry **new_ht_table;
    unsigned long new_ht_used;
    signed char new_ht_size_exp = _dictNextExp(size);

// 新的长度 
    size_t newsize = 1ul&lt;&lt;new_ht_size_exp;
    if (newsize &lt; size || newsize * sizeof(dictEntry*) &lt; newsize)
        return DICT_ERR;

// 相同的长度的话就没意义了
    if (new_ht_size_exp == d-&gt;ht_size_exp[0]) return DICT_ERR;

// 为新的哈希表申请内存，并且把所有的指针都指向null
    if (malloc_failed) {
        new_ht_table = ztrycalloc(newsize*sizeof(dictEntry*));
        *malloc_failed = new_ht_table == NULL;
        if (*malloc_failed)
            return DICT_ERR;
    } else
        new_ht_table = zcalloc(newsize*sizeof(dictEntry*));

new_ht_used = 0;

/* 这是第一次初始化吗？如果是，那它就不算是真正的rehash
     * 我们刚刚设置了第一个哈希表，以便它可以接受键。 */
    if (d-&gt;ht_table[0] == NULL) {
        d-&gt;ht_size_exp[0] = new_ht_size_exp;
        d-&gt;ht_used[0] = new_ht_used;
        d-&gt;ht_table[0] = new_ht_table;
        return DICT_OK;
    }

/* 准备第二个哈希表用于增量重散列 */
    d-&gt;ht_size_exp[1] = new_ht_size_exp;
    d-&gt;ht_used[1] = new_ht_used;
    d-&gt;ht_table[1] = new_ht_table;
    d-&gt;rehashidx = 0;
    return DICT_OK;
}</code></pre><p>解析：在<code>_dictExpand</code>函数中，首先会把新长度设置为刚好大于等于原有长度的2次幂。然后重新申请新内存并且初始化。然后会判断是否是第一次哈希，如果，是那其实这也只算是一次初始化而已，也就是说:</p><blockquote><p><code>_dictExpand</code> 函数不是只为了 <code>rehash</code>，还可以初始化字典。 <code>_dictExpand</code>函数判断是否是 <code>rehash</code>是通过判断 <code>ht_table[0] </code>是否为空来判断的。</p><p>换句话说：如果调用<code>_dictExpand </code>的是非空字典字典，则增容后的哈希表就是在<code> ht_table[1] </code>中，此时需要手动释放<code> ht_table[0]</code>，并将<code> ht_table[1] </code>放到<code> ht_table[0] </code>位置上（指针指过去）。</p></blockquote><h3>Rehash 扩容（重点）</h3><p>先看看源码：</p><pre><code class="lang-C">int dictRehash(dict *d, int n) {
    int empty_visits = n*10; /* Max number of empty buckets to visit. */
    if (!dictIsRehashing(d)) return 0;

while(n-- &amp;&amp; d-&gt;ht_used[0] != 0) {
        dictEntry *de, *nextde;
      
          /* 这里的rehashidx不会溢出，因为ht[0].used != 0, 所以我们确定这里还有很多元素*/
        assert(DICTHT_SIZE(d-&gt;ht_size_exp[0]) &gt; (unsigned long)d-&gt;rehashidx);
        while(d-&gt;ht_table[0][d-&gt;rehashidx] == NULL) {
            d-&gt;rehashidx++;
            if (--empty_visits == 0) return 1;
        }
        de = d-&gt;ht_table[0][d-&gt;rehashidx];
        /* 将这个桶中的所有键从旧的哈希 table 移动到新的哈希 table */
        while(de) {
            uint64_t h;

nextde = de-&gt;next;
            /* 获取新哈希表中的索引 */
            h = dictHashKey(d, de-&gt;key) &amp; DICTHT_SIZE_MASK(d-&gt;ht_size_exp[1]);
            de-&gt;next = d-&gt;ht_table[1][h];
            d-&gt;ht_table[1][h] = de;
            d-&gt;ht_used[0]--;
            d-&gt;ht_used[1]++;
            de = nextde;
        }
        d-&gt;ht_table[0][d-&gt;rehashidx] = NULL;
        d-&gt;rehashidx++;
    }

/* 检查我们是否已经重新散列了整个表。 */
    if (d-&gt;ht_used[0] == 0) {
        zfree(d-&gt;ht_table[0]);
        /* 将新的 ht 复制到旧的 ht 上, 就是指针指向变动 */
        d-&gt;ht_table[0] = d-&gt;ht_table[1];
        d-&gt;ht_used[0] = d-&gt;ht_used[1];
        d-&gt;ht_size_exp[0] = d-&gt;ht_size_exp[1];
        _dictReset(d, 1);
        d-&gt;rehashidx = -1;
        return 0;
    }

/*
    每次 rehash 都是以 n 个桶为单位的，将每个桶上的链都移到新的哈希表上。一次 rehash 完成以后，如果之后还有桶要 rehash，则返回1，如果 rehash 完成，则返回0。
    但是实际上，rehashidx 指向的桶可能是空桶，所以为了效率，一次 rehash 最多要遍历 10n 个空桶，遍历完了 10 n 个空桶就会返回。
    */
    return 1;
}</code></pre><p>rehash扩容分为两种：</p><ol><li>自然扩容，条件是<code>used_size / all_size &gt;= 1 &amp;&amp; dict_can_resize = 1</code></li><li>强制扩容，条件是<code>used_size / all_size &gt; dict_force_resize_ratio</code>，而</li></ol><pre><code class="lang-C">static unsigned int dict_force_resize_ratio = 5;</code></pre><p><code>Rehash</code> 不能一步完成，否则数据量特别大时会导致服务器暂停服务来迁移数据。而渐进式的好处就是数据的迁移发生在对数据的增删改查时，这样就将迁移平摊在每个增删改查的操作上。</p><p><code>Rehash</code>的详细步骤如下：</p><ol><li>为 <code>ht[1]</code> 分配空间， 让字典同时持有 <code>ht[0]</code> 和 <code>ht[1]</code> 两个哈希表。</li><li>所有的写入、更新操作全部指向<code>h[1]</code>，只有读取操作继续在<code>h[0]</code>执行（或者说两个都会查）</li><li><code>rehash</code>都是以<code>n</code>个桶为单位的，将每个桶上的链都移到新的哈希表上。一次 <code>rehash</code> 完成以后，如果之后还有桶要 <code>rehash</code>，则返回<code>1</code>，如果 <code>rehash</code> 完成，则返回<code>0</code>。但是实际上，<code>rehashidx</code> 指向的桶可能是空桶，所以为了效率，一次 <code>rehash </code>最多要遍历<code> 10n </code>个空桶，当空桶数量达到<code>10</code>的时候就会返回，等待下一次<code>rehash</code>。</li><li>随着字典操作的不断执行， 最终在某个时间点上， <code>ht[0]</code> 的所有键值对都会被<code>rehash </code>至 <code>ht[1]</code> ， 这时程序将 <code>rehashidx</code> 属性的值设为 <code>-1</code> ， 表示 rehash 操作已完成。</li></ol><h2>释放Dict</h2><p><code>dict</code>中负责释放的函数是<code>dictRelease</code></p><p>其中在<code>dictRelease</code>中调用了两次<code>_dictClear</code>，用于清空两个哈希表，清空完成后才释放<code>dict</code>，两个方法的源码如下</p><pre><code class="lang-C">/* Clear &amp; Release the hash table */
void dictRelease(dict *d)
{
    _dictClear(d,0,NULL);
    _dictClear(d,1,NULL);
    zfree(d);
}

/* Destroy an entire dictionary */
int _dictClear(dict *d, int htidx, void(callback)(dict*)) {
    unsigned long i;

/* Free all the elements */
    for (i = 0; i &lt; DICTHT_SIZE(d-&gt;ht_size_exp[htidx]) &amp;&amp; d-&gt;ht_used[htidx] &gt; 0; i++) {
        dictEntry *he, *nextHe;

if (callback &amp;&amp; (i &amp; 65535) == 0) callback(d);

if ((he = d-&gt;ht_table[htidx][i]) == NULL) continue;
        while(he) {
            nextHe = he-&gt;next;
            dictFreeKey(d, he);
            dictFreeVal(d, he);
            zfree(he);
            d-&gt;ht_used[htidx]--;
            he = nextHe;
        }
    }
    /* Free the table and the allocated cache structure */
    zfree(d-&gt;ht_table[htidx]);
    /* Re-initialize the table */
    _dictReset(d, htidx);
    return DICT_OK; /* never fails */
}</code></pre><p><code>_dictClear</code>会遍历所有的索引，当该索引上有键值对时，则释放该索引上的整条链表。释放完所有键值对之后，再释放该哈希表并重置其成员。</p><h1>HashTable中的增删查改</h1><h2>键值对的添加</h2><p><code>hashtable</code>中实现键值对添加的函数是<code>dictAdd</code>，源码如下</p><pre><code class="lang-C">/* Add an element to the target hash table */
int dictAdd(dict *d, void *key, void *val)
{
    dictEntry *entry = dictAddRaw(d,key,NULL);

if (!entry) return DICT_ERR;
    dictSetVal(d, entry, val);
    return DICT_OK;
}</code></pre><p>根据代码我们可以看到它调用了<code>dictAddRaw</code>,源码如下：</p><pre><code class="lang-C">dictEntry *dictAddRaw(dict *d, void *key, dictEntry **existing)
{
    long index;
    dictEntry *entry;
    int htidx;

if (dictIsRehashing(d)) _dictRehashStep(d);

/* 获得新元素的索引， 如果是 -1，说明已经存在了
        dictKeyIndex这个函数作用时返回key在hash表中的索引，如果==-1就代表存在，
          要注意，如果当前正在rehash，则返回的索引就一直会是第二张哈希表的索引 。
      */
    if ((index = _dictKeyIndex(d, key, dictHashKey(d,key), existing)) == -1)
        return NULL;

/* 
    分配内存并存储新条目. 
    在顶部插入元素，假设在数据库系统中，最近添加的条目更有可能被更频繁地访问。
    */
    htidx = dictIsRehashing(d) ? 1 : 0;
    size_t metasize = dictMetadataSize(d);
    entry = zmalloc(sizeof(*entry) + metasize);
    if (metasize &gt; 0) {
        memset(dictMetadata(entry), 0, metasize);
    }
    entry-&gt;next = d-&gt;ht_table[htidx][index];
    d-&gt;ht_table[htidx][index] = entry;
    d-&gt;ht_used[htidx]++;

/* Set the hash entry fields. */
    dictSetKey(d, entry, key);
    return entry;
}</code></pre><p>从代码可以看出，首先它会先判断是否正在进行<code>rehash</code>，如果正在进行，则会执行一次<code>_dictRehashStep</code>，其实就是从<code>ht[0]</code>迁移数据到<code>ht[1]</code>的操作（迁移一个桶）。接下来就是申请内存然后赋值(插入<code>hash</code>表中)最后返回给<code>dictAdd</code>, 注意这个时候<code>dictEntry</code>是已经插入了的。</p><h2>键值对的删除</h2><p>删除键值对的函数是<code>dictDelete</code>源码如下：</p><pre><code class="lang-C">int dictDelete(dict *ht, const void *key) {
    return dictGenericDelete(ht,key,0) ? DICT_OK : DICT_ERR;
}</code></pre><p>可以看到实际上删除函数应该是<code>dictGenericDelete</code>，源码如下：</p><pre><code class="lang-C">static dictEntry *dictGenericDelete(dict *d, const void *key, int nofree) {
    uint64_t h, idx;
    dictEntry *he, *prevHe;
    int table;

/* dict is empty */
    if (dictSize(d) == 0) return NULL;
    // 如果正在rehash，执行一次迁移
    if (dictIsRehashing(d)) _dictRehashStep(d);
      // 获取索引
    h = dictHashKey(d, key);
        // 两个hash表都要找
    for (table = 0; table &lt;= 1; table++) {
        // 先计算hash值，然后找到对应桶（链表）的头节点
        idx = h &amp; DICTHT_SIZE_MASK(d-&gt;ht_size_exp[table]);
        he = d-&gt;ht_table[table][idx];
        prevHe = NULL;
        while(he) {
              // 找到这个key
            if (key==he-&gt;key || dictCompareKeys(d, key, he-&gt;key)) {
                // 删除元素，和adlist的删除有点像
                if (prevHe)
                    prevHe-&gt;next = he-&gt;next;
                else
                    d-&gt;ht_table[table][idx] = he-&gt;next;
                if (!nofree) {
                      // 释放
                    dictFreeUnlinkedEntry(d, he);
                }
                  // 数量--
                d-&gt;ht_used[table]--;
                return he;
            }
            prevHe = he;
            he = he-&gt;next;
        }
        if (!dictIsRehashing(d)) break;
    }
    return NULL; /* not found */
}</code></pre><h3>删除后释放</h3><p>我们可以看到在删除元素之后会调用<code>dictFreeUnlinkedEntry</code>方法，它的实现非常简单：</p><pre><code class="lang-C">void dictFreeUnlinkedEntry(dict *d, dictEntry *he) {
    if (he == NULL) return;
    dictFreeKey(d, he); // 释放key
    dictFreeVal(d, he); // 释放value
    zfree(he); // 释放dictEntry
}</code></pre><h3>删除但不释放</h3><p><code>dict</code>还有一个函数<code>dictUnlink</code>，源码如下</p><pre><code class="lang-C">dictEntry *dictUnlink(dict *d, const void *key) {
    return dictGenericDelete(d,key,1);
}</code></pre><p>它的区别仅仅在第三个函数是<code>1</code>, 作用就是在删除一个<code>key</code>的时候不会去释放它</p><blockquote>ps：说实话我不太懂这是干嘛的，如果不释放的话这个键值对不会成为野指针吗...<br>网上也没有很好的解答，这个地方我还要再去了解一下</blockquote><h2>键值对的更新</h2><p><code>dict</code>中实现了<code>dictReplace</code>来对键值对进行更新, 源码如下</p><pre><code class="lang-C">int dictReplace(dict *d, void *key, void *val)
{
    dictEntry *entry, *existing, auxentry;

// 调用dictAddRaw判断key是否存在
    entry = dictAddRaw(d,key,&amp;existing);
    // 如果存在就设置val值并返回1, 否则就更新值
    if (entry) {
        dictSetVal(d, entry, val);
        return 1;
    }
    // 设置新值然后再释放旧值*
    auxentry = *existing;
    dictSetVal(d, existing, val);
    dictFreeVal(d, &amp;auxentry);
    return 0;
}</code></pre><blockquote>注意：<br>方法的最后几行，先设置新值再释放旧值，这个顺序很重要，因为新值和旧值可能相同，在这种情况下就要考虑引用计数，所以我们应该先增加计数，再减少计数。</blockquote><h3>更新或返回</h3><p><code>dict</code>中还有一个方法<code>dictArrOrFind</code>，字面意思就是“添加或者查找”，源码如下：</p><pre><code class="lang-C">dictEntry *dictAddOrFind(dict *d, void *key) {
    dictEntry *entry, *existing;
    // 通过dictAddRaw判断key是否存在
    entry = dictAddRaw(d,key,&amp;existing);
    // 如果存在就返回key，如果不存在就先插入 再返回
    return entry ? entry : existing;
}</code></pre><h2>键值对的查找</h2><p><code>dict</code>中的查找函数是<code>dictFind</code>，源码如下：</p><pre><code class="lang-C">dictEntry *dictFind(dict *d, const void *key)
{
    dictEntry *he;
    uint64_t h, idx, table;

if (dictSize(d) == 0) return NULL; /* dict is empty */
    // rehash
    if (dictIsRehashing(d)) _dictRehashStep(d);
    // 拿到key
    h = dictHashKey(d, key);
    for (table = 0; table &lt;= 1; table++) {
        // 计算出hash值然后拿到hash值对应那个桶的头节点
        idx = h &amp; DICTHT_SIZE_MASK(d-&gt;ht_size_exp[table]);
        he = d-&gt;ht_table[table][idx];
        // 遍历查找，找到直接返回
        while(he) {
            if (key==he-&gt;key || dictCompareKeys(d, key, he-&gt;key))
                return he;
            he = he-&gt;next;
        } 
        if (!dictIsRehashing(d)) return NULL;
    }
    return NULL;
}</code></pre><p>值得注意的一点是：<code>for</code>循环的最后一个<code>if</code>，如果此时没有在进行<code>rehash</code>，那<code>ht[0]</code>没有就是没有了（返回<code>NULL</code>），如果正在进行<code>rehash</code>，那再查一次<code>ht[1]</code>。这里考虑的情况是在<code>rehash</code>时新的写操作是在<code>ht[0]</code>上进行的，所以要查找的<code>key</code>可能在<code>h[1]</code>上</p><h1>总结</h1><p>自此，<code>dict</code>关键部分的源码已经全部阅读完了，如果文章不够详细，可以再去翻翻源码，<code>dict</code>的源码在<code>dict.h &amp;&amp; dict.c</code>中</p><p>总结一下重点</p><ol><li><code>7.0</code>的<code>dict</code>结构大改，从<code>dict</code>-&gt;<code>dictht</code>-&gt;<code>dictEntry</code> 改成了<code>**dictEntry hash_table[2]</code>, 但本质并没有变化，还是数组+链表，我个人觉得这个改动主要是为了把操作元素结构简单化，而且在内存空间的占用上也是大大缩小了。</li><li><code>dict</code>中有两张<code>hash</code>表，<code>ht[0]</code>主要用于日常操作，当触发<code>rehash</code>的时候，会使用<code>h[1]</code>来充当临时表</li><li><code>dict</code>的<code>rehash</code>是渐进式的，会把插入和删除的操作都变更到临时表<code>ht[1]</code>上，然后在每次执行操作时都会将一部分数据从<code>ht[0]</code>挪到<code>ht[1]</code>，等到全部完成后，会把原本指向<code>ht[0]</code>的指针指向<code>ht[1]</code>，然后释放原本的<code>ht[0]</code></li><li><p><code>rehash</code>的条件是：</p><ul><li>自然扩容，条件是<code>used_size / all_size &gt;= 1 &amp;&amp; dict_can_resize = 1</code></li><li>强制扩容，条件是<code>used_size / all_size &gt; dict_force_resize_ratio</code></li></ul></li><li><code>dict</code>真的很有意思</li></ol><blockquote>最后: 篇幅很长，感谢您耐心读完</blockquote>

前言

dict 的定义

dict 数据结构的实现

字典类型 DcitType

dictIterator 迭代器

一些常量和宏定义

初始化定义:

set

setkey

setval

free 释放

free key

free value

比较

其他的一些宏

获取 key 和 val

哈希表大小

获取字典保存键值对的总数

rehash 的宏

dict的方法

创建Dict

修改Dict

dictResize 缩容

Rehash 扩容（重点）

释放Dict

HashTable中的增删查改

键值对的添加

键值对的删除

删除后释放

删除但不释放

键值对的更新

更新或返回

键值对的查找

总结

6 条评论

发表评论 取消回复

redis源码阅读(三):数据结构之Dict

发表评论取消回复