LevelDB secondary index implementation. 姚凯文 (kevinyao0901), 姜嘉祺

// Copyright (c) 2011 The LevelDB Authors. All rights reserved.
// Use of this source code is governed by a BSD-style license that can be
// found in the LICENSE file. See the AUTHORS file for names of contributors.

#include "leveldb/table.h"

#include "leveldb/cache.h"
#include "leveldb/comparator.h"
#include "leveldb/env.h"
#include "leveldb/filter_policy.h"
#include "leveldb/options.h"
#include "table/block.h"
#include "table/filter_block.h"
#include "table/format.h"
#include "table/two_level_iterator.h"
#include "util/coding.h"

namespace leveldb {

struct Table::Rep {
  ~Rep() {
    delete filter;
    delete[] filter_data;
    delete index_block;
  }

  Options options;
  Status status;
  RandomAccessFile* file;
  uint64_t cache_id;
  FilterBlockReader* filter;
  const char* filter_data;

  BlockHandle metaindex_handle;  // Handle to metaindex_block: saved from footer
  Block* index_block;
};

Status Table::Open(const Options& options, RandomAccessFile* file,
                   uint64_t size, Table** table) {
  *table = nullptr;
  if (size < Footer::kEncodedLength) {
    return Status::Corruption("file is too short to be an sstable");
  }

  char footer_space[Footer::kEncodedLength];
  Slice footer_input;
  Status s = file->Read(size - Footer::kEncodedLength, Footer::kEncodedLength,
                        &footer_input, footer_space);
  if (!s.ok()) return s;

  Footer footer;
  s = footer.DecodeFrom(&footer_input);
  if (!s.ok()) return s;

  // Read the index block
  BlockContents index_block_contents;
  ReadOptions opt;
  if (options.paranoid_checks) {
    opt.verify_checksums = true;
  }
  s = ReadBlock(file, opt, footer.index_handle(), &index_block_contents);

  if (s.ok()) {
    // We've successfully read the footer and the index block: we're
    // ready to serve requests.
    Block* index_block = new Block(index_block_contents);
    Rep* rep = new Table::Rep;
    rep->options = options;
    rep->file = file;
    rep->metaindex_handle = footer.metaindex_handle();
    rep->index_block = index_block;
    rep->cache_id = (options.block_cache ? options.block_cache->NewId() : 0);
    rep->filter_data = nullptr;
    rep->filter = nullptr;
    *table = new Table(rep);
    (*table)->ReadMeta(footer);
  }

  return s;
}

void Table::ReadMeta(const Footer& footer) {
  if (rep_->options.filter_policy == nullptr) {
    return;  // Do not need any metadata
  }

  // TODO(sanjay): Skip this if footer.metaindex_handle() size indicates
  // it is an empty block.
  ReadOptions opt;
  if (rep_->options.paranoid_checks) {
    opt.verify_checksums = true;
  }
  BlockContents contents;
  if (!ReadBlock(rep_->file, opt, footer.metaindex_handle(), &contents).ok()) {
    // Do not propagate errors since meta info is not needed for operation
    return;
  }
  Block* meta = new Block(contents);

  Iterator* iter = meta->NewIterator(BytewiseComparator());
  std::string key = "filter.";
  key.append(rep_->options.filter_policy->Name());
  iter->Seek(key);
  if (iter->Valid() && iter->key() == Slice(key)) {
    ReadFilter(iter->value());
  }
  delete iter;
  delete meta;
}

void Table::ReadFilter(const Slice& filter_handle_value) {
  Slice v = filter_handle_value;
  BlockHandle filter_handle;
  if (!filter_handle.DecodeFrom(&v).ok()) {
    return;
  }

  // We might want to unify with ReadBlock() if we start
  // requiring checksum verification in Table::Open.
  ReadOptions opt;
  if (rep_->options.paranoid_checks) {
    opt.verify_checksums = true;
  }
  BlockContents block;
  if (!ReadBlock(rep_->file, opt, filter_handle, &block).ok()) {
    return;
  }
  if (block.heap_allocated) {
    rep_->filter_data = block.data.data();  // Will need to delete later
  }
  rep_->filter = new FilterBlockReader(rep_->options.filter_policy, block.data);
}

Table::~Table() { delete rep_; }

static void DeleteBlock(void* arg, void* ignored) {
  delete reinterpret_cast<Block*>(arg);
}

static void DeleteCachedBlock(const Slice& key, void* value) {
  Block* block = reinterpret_cast<Block*>(value);
  delete block;
}

static void ReleaseBlock(void* arg, void* h) {
  Cache* cache = reinterpret_cast<Cache*>(arg);
  Cache::Handle* handle = reinterpret_cast<Cache::Handle*>(h);
  cache->Release(handle);
}

// Convert an index iterator value (i.e., an encoded BlockHandle)
// into an iterator over the contents of the corresponding block.
Iterator* Table::BlockReader(void* arg, const ReadOptions& options,
                             const Slice& index_value) {
  Table* table = reinterpret_cast<Table*>(arg);
  Cache* block_cache = table->rep_->options.block_cache;
  Block* block = nullptr;
  Cache::Handle* cache_handle = nullptr;

  BlockHandle handle;
  Slice input = index_value;
  Status s = handle.DecodeFrom(&input);
  // We intentionally allow extra stuff in index_value so that we
  // can add more features in the future.

  if (s.ok()) {
    BlockContents contents;
    if (block_cache != nullptr) {
      char cache_key_buffer[16];
      EncodeFixed64(cache_key_buffer, table->rep_->cache_id);
      EncodeFixed64(cache_key_buffer + 8, handle.offset());
      Slice key(cache_key_buffer, sizeof(cache_key_buffer));
      cache_handle = block_cache->Lookup(key);
      if (cache_handle != nullptr) {
        block = reinterpret_cast<Block*>(block_cache->Value(cache_handle));
      } else {
        s = ReadBlock(table->rep_->file, options, handle, &contents);
        if (s.ok()) {
          block = new Block(contents);
          if (contents.cachable && options.fill_cache) {
            cache_handle = block_cache->Insert(key, block, block->size(),
                                               &DeleteCachedBlock);
          }
        }
      }
    } else {
      s = ReadBlock(table->rep_->file, options, handle, &contents);
      if (s.ok()) {
        block = new Block(contents);
      }
    }
  }

  Iterator* iter;
  if (block != nullptr) {
    iter = block->NewIterator(table->rep_->options.comparator);
    if (cache_handle == nullptr) {
      iter->RegisterCleanup(&DeleteBlock, block, nullptr);
    } else {
      iter->RegisterCleanup(&ReleaseBlock, block_cache, cache_handle);
    }
  } else {
    iter = NewErrorIterator(s);
  }
  return iter;
}

Iterator* Table::NewIterator(const ReadOptions& options) const {
  return NewTwoLevelIterator(
      rep_->index_block->NewIterator(rep_->options.comparator),
      &Table::BlockReader, const_cast<Table*>(this), options);
}

Status Table::InternalGet(const ReadOptions& options, const Slice& k, void* arg,
                          void (*handle_result)(void*, const Slice&,
                                                const Slice&)) {
  Status s;
  Iterator* iiter = rep_->index_block->NewIterator(rep_->options.comparator);
  iiter->Seek(k);
  if (iiter->Valid()) {
    Slice handle_value = iiter->value();
    FilterBlockReader* filter = rep_->filter;
    BlockHandle handle;
    if (filter != nullptr && handle.DecodeFrom(&handle_value).ok() &&
        !filter->KeyMayMatch(handle.offset(), k)) {
      // Not found
    } else {
      Iterator* block_iter = BlockReader(this, options, iiter->value());
      block_iter->Seek(k);
      if (block_iter->Valid()) {
        (*handle_result)(arg, block_iter->key(), block_iter->value());
      }
      s = block_iter->status();
      delete block_iter;
    }
  }
  if (s.ok()) {
    s = iiter->status();
  }
  delete iiter;
  return s;
}

uint64_t Table::ApproximateOffsetOf(const Slice& key) const {
  Iterator* index_iter =
      rep_->index_block->NewIterator(rep_->options.comparator);
  index_iter->Seek(key);
  uint64_t result;
  if (index_iter->Valid()) {
    BlockHandle handle;
    Slice input = index_iter->value();
    Status s = handle.DecodeFrom(&input);
    if (s.ok()) {
      result = handle.offset();
    } else {
      // Strange: we can't decode the block handle in the index block.
      // We'll just return the offset of the metaindex block, which is
      // close to the whole file size for this case.
      result = rep_->metaindex_handle.offset();
    }
  } else {
    // key is past the last key in the file.  Approximate the offset
    // by returning the offset of the metaindex block (which is
    // right near the end of the file).
    result = rep_->metaindex_handle.offset();
  }
  delete index_iter;
  return result;
}

}  // namespace leveldb