LevelDB Secondary Index Implementation: 姚凯文 (kevinyao0901), 姜嘉祺

Lab Report: Design and Implementation of Secondary Indexes in LevelDB

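Objective

Design and implement secondary-index support on top of LevelDB so that queries on a specific field are efficient. With an index in place, a user can retrieve the records that match a field value without scanning the entire database.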

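Approach

1. What a secondary index is

A secondary index is an additional data structure that speeds up queries on a particular field. LevelDB stores data as key:value pairs. To build a secondary index, we map each value of the target field to the original key and store that mapping in a separate index database, which enables fast lookups by field value.

For example, given the following original data: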

k_1 : name:Customer#000000001|address:IVhzIApeRb|phone:25-989-741-2988
k_2 : name:Customer#000000002|address:XSTf4,NCwDVaW|phone:23-768-687-3665
k_3 : name:Customer#000000001|address:MG9kdTD2WBHm|phone:11-719-748-3364

After an index is created on the name field, the index database contains these entries:

name:Customer#000000001-k_1 : k_1
name:Customer#000000001-k_3 : k_3
name:Customer#000000002-k_2 : k_2
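
The entry layout above can be sketched in a few lines. This is only an illustration: MakeIndexKey and ExampleIndexOrder are hypothetical helpers that do not come from the repository, and a std::map stands in for the index database because it keeps keys in the same byte-wise order as LevelDB's default comparator.

```cpp
#include <cassert>
#include <map>
#include <string>
#include <vector>

// Hypothetical helper: builds "field:value-primaryKey", the layout shown above.
std::string MakeIndexKey(const std::string& field, const std::string& value,
                         const std::string& primary_key) {
  assert(!field.empty());  // an index entry always belongs to a named field
  return field + ":" + value + "-" + primary_key;
}

// Inserts the three example entries in arbitrary order and returns the index
// keys in the order an ordered key-value store would iterate over them.
std::vector<std::string> ExampleIndexOrder() {
  std::map<std::string, std::string> index;  // stand-in for the index database
  index[MakeIndexKey("name", "Customer#000000002", "k_2")] = "k_2";
  index[MakeIndexKey("name", "Customer#000000001", "k_3")] = "k_3";
  index[MakeIndexKey("name", "Customer#000000001", "k_1")] = "k_1";

  std::vector<std::string> keys;
  for (const auto& entry : index) keys.push_back(entry.first);
  return keys;
}
```

Entries with the same field value sort next to each other, so a lookup by field value becomes a short range scan. The primary key is also stored as the entry's value, so it never has to be parsed back out of the index key.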

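2. Design goals

Create index: scan every record in the database, extract the value of the specified field, and write the encoded field value and original key into the secondary index database indexDb_.
Query index: quickly locate, in the secondary index database, the original keys that correspond to a field value.
Delete index: remove every entry for the target field from the secondary index database.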

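Implementation

1. Design of the DBImpl class

Secondary-index support is added to DBImpl, LevelDB's core class. It consists of:

Indexed-field tracking: the member variable fieldWithIndex_ holds the names of all fields that already have an index.
Index database: the member variable indexDb_ manages the secondary index database.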

class DBImpl : public DB {
private:
    std::vector<std::string> fieldWithIndex_; // fields that already have an index
    leveldb::DB* indexDb_;                    // database that stores the secondary index
};

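2. Creating a secondary index

DBImpl implements a CreateIndexOnField method that builds a secondary index on a given field:

Iterate over every record in the main database.
Parse the value of the target field.
Write an entry to the index database whose key is "fieldName:field_value-key" and whose value is the key of the original record.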

Core code:

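Status DBImpl::CreateIndexOnField(const std::string& fieldName) {
    // Reject the request if this field already has an index.
    for (const auto& field : fieldWithIndex_) {
        if (field == fieldName) {
            return Status::InvalidArgument("Index already exists for this field");
        }
    }

    // Scan the main database, parse the field value, and write index entries.
    leveldb::ReadOptions read_options;
    leveldb::Iterator* it = this->NewIterator(read_options);

    for (it->SeekToFirst(); it->Valid(); it->Next()) {
        std::string key = it->key().ToString();
        std::string value = it->value().ToString();

        // Extract the field value. Note that find() also matches fieldName
        // inside a longer field name (for example "name:" inside "surname:").
        size_t field_pos = value.find(fieldName + ":");
        if (field_pos != std::string::npos) {
            size_t value_start = field_pos + fieldName.size() + 1;
            size_t value_end = value.find("|", value_start);
            if (value_end == std::string::npos) value_end = value.size();

            std::string field_value = value.substr(value_start, value_end - value_start);
            // Append the original key so that records sharing a field value
            // get distinct entries: "fieldName:field_value-key".
            std::string index_key = fieldName + ":" + field_value + "-" + key;

            // Create the entry in the index database.
            leveldb::Status s = indexDb_->Put(WriteOptions(), Slice(index_key), Slice(key));
            if (!s.ok()) {
                delete it;
                return s;
            }
        }
    }

    // Surface any error the iterator hit during the scan.
    leveldb::Status scan_status = it->status();
    delete it;
    if (!scan_status.ok()) return scan_status;

    // Register the field only after the index has been built.
    fieldWithIndex_.push_back(fieldName);
    return Status::OK();
}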
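
One fragile point in the code above is the substring search: value.find("name:") also matches inside "surname:". A minimal sketch of a stricter parser follows, assuming records always use the field:value|field:value layout from the examples. ExtractField is a hypothetical helper, not part of the repository.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical helper: looks up `field` in a record encoded as
// "f1:v1|f2:v2|...". On success stores the value in *out and returns true;
// returns false when the record has no such field.
bool ExtractField(const std::string& record, const std::string& field,
                  std::string* out) {
  assert(!field.empty() && out != nullptr);
  const std::string tag = field + ":";
  std::size_t start = 0;
  while (start <= record.size()) {
    std::size_t end = record.find('|', start);
    if (end == std::string::npos) end = record.size();
    // Compare the tag only at the start of a segment, so "name" does not
    // match inside "surname".
    if (record.compare(start, tag.size(), tag) == 0) {
      *out = record.substr(start + tag.size(), end - start - tag.size());
      return true;
    }
    start = end + 1;  // move to the segment after the next '|'
  }
  return false;
}
```

The helper walks the record one '|'-separated segment at a time, which costs the same single pass as find() but cannot be fooled by a field name that is a suffix of another.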

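3. Querying a secondary index

DBImpl implements a QueryByIndex method that looks up the original keys for a target field value:

Iterate over the index database entries that begin with fieldName:field_value.
Collect the results and return them.

Core code: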

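// fieldName carries the full lookup prefix "fieldName:field_value",
// for example "name:Customer#000000001".
std::vector<std::string> DBImpl::QueryByIndex(const std::string& fieldName) {
    std::vector<std::string> results;
    leveldb::ReadOptions read_options;
    leveldb::Iterator* it = indexDb_->NewIterator(read_options);

    // Seek to the first key >= the prefix and stop at the first key that no
    // longer starts with it; without this check the loop would run to the end
    // of the index database. This is a prefix match, so "name:A" also
    // matches entries for "name:AB".
    for (it->Seek(fieldName); it->Valid() && it->key().starts_with(fieldName); it->Next()) {
        std::string value = it->value().ToString();
        if (!value.empty()) {
            results.push_back(value);
        }
    }

    delete it;
    return results;
}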
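
The seek-then-stop pattern behind a prefix lookup can be tried without LevelDB. In this sketch, which assumes the "field:value-key" entry layout from section 1, std::map::lower_bound plays the role of Iterator::Seek. ToyIndex, ScanPrefix, and ExampleIndex are hypothetical names.

```cpp
#include <cassert>
#include <map>
#include <string>
#include <vector>

// Stand-in for the index database: an ordered map from index key to primary key.
using ToyIndex = std::map<std::string, std::string>;

// Returns the primary keys of all entries whose index key starts with `prefix`.
std::vector<std::string> ScanPrefix(const ToyIndex& index,
                                    const std::string& prefix) {
  assert(!prefix.empty());
  std::vector<std::string> primary_keys;
  // lower_bound finds the first key >= prefix, like Iterator::Seek.
  for (auto it = index.lower_bound(prefix); it != index.end(); ++it) {
    // Keys are sorted, so the first non-matching key ends the run.
    if (it->first.compare(0, prefix.size(), prefix) != 0) break;
    primary_keys.push_back(it->second);
  }
  return primary_keys;
}

// The three entries from the example in section 1.
ToyIndex ExampleIndex() {
  return {{"name:Customer#000000001-k_1", "k_1"},
          {"name:Customer#000000001-k_3", "k_3"},
          {"name:Customer#000000002-k_2", "k_2"}};
}
```

Passing "name:Customer#000000001-" (with the trailing separator) selects exactly the entries for that value, while a shorter prefix such as "name:" selects every entry of the field.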

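4. Deleting a secondary index

DBImpl implements a DeleteIndex method that removes every index entry for a target field name:

Remove the field from fieldWithIndex_.
Iterate over the index database and delete every entry that begins with fieldName:.

Core code: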

Status DBImpl::DeleteIndex(const std::string& fieldName) {
    // std::find requires <algorithm>.
    auto it = std::find(fieldWithIndex_.begin(), fieldWithIndex_.end(), fieldName);
    if (it == fieldWithIndex_.end()) {
        return Status::NotFound("Index not found for this field");
    }

    // Remove the field from the list of indexed fields.
    fieldWithIndex_.erase(it);

    // Delete the entries for this field. Keys are sorted (default bytewise
    // comparator assumed), so seek straight to the prefix and stop at the
    // first key that no longer matches instead of scanning the whole database.
    const std::string prefix = fieldName + ":";
    leveldb::ReadOptions read_options;
    leveldb::Iterator* it_index = indexDb_->NewIterator(read_options);

    for (it_index->Seek(prefix);
         it_index->Valid() && it_index->key().starts_with(prefix);
         it_index->Next()) {
        std::string index_key = it_index->key().ToString();
        Status s = indexDb_->Delete(WriteOptions(), Slice(index_key));
        if (!s.ok()) {
            delete it_index;
            return s;
        }
    }

    delete it_index;
    return Status::OK();
}
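
Since every entry for a field shares the prefix fieldName:, the entries to delete form one contiguous key range. The sketch below computes the exclusive upper bound of that range and erases it from a std::map used as a stand-in for the index database. PrefixSuccessor and ErasePrefix are hypothetical helpers. The leveldb::DB interface offers no range delete, so real code would still issue one Delete per key and could use the bound only to know where to stop.

```cpp
#include <cassert>
#include <cstddef>
#include <iterator>
#include <map>
#include <string>

// Hypothetical helper: smallest string greater than every string that starts
// with `prefix`. Returns "" when no such bound exists (all bytes are 0xFF).
std::string PrefixSuccessor(std::string prefix) {
  while (!prefix.empty()) {
    unsigned char last = static_cast<unsigned char>(prefix.back());
    if (last != 0xFF) {
      prefix.back() = static_cast<char>(last + 1);
      return prefix;
    }
    prefix.pop_back();  // 0xFF cannot be incremented; carry into the previous byte
  }
  return prefix;
}

// Erases every entry whose key starts with `prefix`; returns the number erased.
std::size_t ErasePrefix(std::map<std::string, std::string>& index,
                        const std::string& prefix) {
  assert(!prefix.empty());
  auto first = index.lower_bound(prefix);
  const std::string upper = PrefixSuccessor(prefix);
  auto last = upper.empty() ? index.end() : index.lower_bound(upper);
  std::size_t erased = static_cast<std::size_t>(std::distance(first, last));
  index.erase(first, last);
  return erased;
}
```

For the prefix "name:" the bound is "name;", so entries of a different field such as "named:" fall outside the range and are left untouched.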

Example walkthrough

Insert the original data:

k_1 : name:Customer#000000001|address:IVhzIApeRb|phone:25-989-741-2988
k_2 : name:Customer#000000002|address:XSTf4,NCwDVaW|phone:23-768-687-3665

Create the index:
- Call CreateIndexOnField("name"); the index database then contains these entries:
```
name:Customer#000000001-k_1 : k_1
name:Customer#000000002-k_2 : k_2
```
Query the index:
- Call QueryByIndex("name:Customer#000000001"), which returns ["k_1"].
Delete the index:
- Call DeleteIndex("name"), which removes every index entry that begins with name:.
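
The walkthrough can be reproduced with a toy model. This sketch is not the DBImpl code: ToyIndexedStore is a hypothetical class in which two std::map objects replace the main database and indexDb_, and bool return values replace Status.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <map>
#include <string>
#include <vector>

class ToyIndexedStore {
 public:
  void Put(const std::string& key, const std::string& record) { data_[key] = record; }

  // Scans every record and writes "field:value-key" -> key into the index.
  bool CreateIndexOnField(const std::string& field) {
    assert(!field.empty());
    if (std::find(fields_.begin(), fields_.end(), field) != fields_.end()) return false;
    const std::string tag = field + ":";
    for (const auto& kv : data_) {
      const std::string& record = kv.second;
      std::size_t pos = record.find(tag);  // same simple search as the report's code
      if (pos == std::string::npos) continue;
      std::size_t begin = pos + tag.size();
      std::size_t end = record.find('|', begin);
      if (end == std::string::npos) end = record.size();
      index_[tag + record.substr(begin, end - begin) + "-" + kv.first] = kv.first;
    }
    fields_.push_back(field);
    return true;
  }

  // Takes "field:value" and returns the primary keys of the matching records.
  std::vector<std::string> QueryByIndex(const std::string& field_and_value) const {
    const std::string prefix = field_and_value + "-";
    std::vector<std::string> keys;
    for (auto it = index_.lower_bound(prefix);
         it != index_.end() && it->first.compare(0, prefix.size(), prefix) == 0; ++it) {
      keys.push_back(it->second);
    }
    return keys;
  }

  // Unregisters the field and erases every index entry that starts with "field:".
  bool DeleteIndex(const std::string& field) {
    auto pos = std::find(fields_.begin(), fields_.end(), field);
    if (pos == fields_.end()) return false;
    fields_.erase(pos);
    const std::string prefix = field + ":";
    auto it = index_.lower_bound(prefix);
    while (it != index_.end() && it->first.compare(0, prefix.size(), prefix) == 0) {
      it = index_.erase(it);
    }
    return true;
  }

  std::size_t IndexSize() const { return index_.size(); }

 private:
  std::map<std::string, std::string> data_;   // stand-in for the main database
  std::map<std::string, std::string> index_;  // stand-in for indexDb_
  std::vector<std::string> fields_;           // stand-in for fieldWithIndex_
};
```

Loading the two records from the walkthrough with Put and then calling the three methods in order gives the same results as listed above: two index entries after creation, k_1 for the query, and an empty index after deletion.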

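Test results:

Summary

This experiment integrates index management into DBImpl and implements creation, querying, and deletion of secondary indexes. Index data is stored in a separate indexDb_, so a lookup by field value becomes a scan over sorted index keys rather than a full scan of the main database.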