LevelDB Secondary Index Implementation 姚凯文(kevinyao0901) 姜嘉祺


// Copyright (c) 2011 The LevelDB Authors. All rights reserved.
// Use of this source code is governed by a BSD-style license that can be
// found in the LICENSE file. See the AUTHORS file for names of contributors.

#include "leveldb/cache.h"

#include <cassert>
#include <cstdio>
#include <cstdlib>
#include <cstring>

#include "port/port.h"
#include "port/thread_annotations.h"
#include "util/hash.h"
#include "util/mutexlock.h"

namespace leveldb {

Cache::~Cache() {}

namespace {
// LRU cache implementation
//
// Cache entries have an "in_cache" boolean indicating whether the cache has a
// reference on the entry.  The only ways that this can become false without the
// entry being passed to its "deleter" are via Erase(), via Insert() when
// an element with a duplicate key is inserted, or on destruction of the cache.
//
// The cache keeps two linked lists of items in the cache.  All items in the
// cache are in one list or the other, and never both.  Items still referenced
// by clients but erased from the cache are in neither list.  The lists are:
// - in-use:  contains the items currently referenced by clients, in no
//   particular order.  (This list is used for invariant checking.  If we
//   removed the check, elements that would otherwise be on this list could be
//   left as disconnected singleton lists.)
// - LRU:  contains the items not currently referenced by clients, in LRU order
// Elements are moved between these lists by the Ref() and Unref() methods,
// when they detect an element in the cache acquiring or losing its only
// external reference.

// An entry is a variable length heap-allocated structure.  Entries
// are kept in a circular doubly linked list ordered by access time.
struct LRUHandle {
  void* value;
  void (*deleter)(const Slice&, void* value);
  LRUHandle* next_hash;
  LRUHandle* next;
  LRUHandle* prev;
  size_t charge;  // TODO(opt): Only allow uint32_t?
  size_t key_length;
  bool in_cache;     // Whether entry is in the cache.
  uint32_t refs;     // References, including cache reference, if present.
  uint32_t hash;     // Hash of key(); used for fast sharding and comparisons
  char key_data[1];  // Beginning of key

  Slice key() const {
    // next is only equal to this if the LRU handle is the list head of an
    // empty list. List heads never have meaningful keys.
    assert(next != this);

    return Slice(key_data, key_length);
  }
};
// We provide our own simple hash table since it removes a whole bunch
// of porting hacks and is also faster than some of the built-in hash
// table implementations in some of the compiler/runtime combinations
// we have tested.  E.g., readrandom speeds up by ~5% over the g++
// 4.4.3's builtin hashtable.
class HandleTable {
 public:
  HandleTable() : length_(0), elems_(0), list_(nullptr) { Resize(); }
  ~HandleTable() { delete[] list_; }

  LRUHandle* Lookup(const Slice& key, uint32_t hash) {
    return *FindPointer(key, hash);
  }

  LRUHandle* Insert(LRUHandle* h) {
    LRUHandle** ptr = FindPointer(h->key(), h->hash);
    LRUHandle* old = *ptr;
    h->next_hash = (old == nullptr ? nullptr : old->next_hash);
    *ptr = h;
    if (old == nullptr) {
      ++elems_;
      if (elems_ > length_) {
        // Since each cache entry is fairly large, we aim for a small
        // average linked list length (<= 1).
        Resize();
      }
    }
    return old;
  }

  LRUHandle* Remove(const Slice& key, uint32_t hash) {
    LRUHandle** ptr = FindPointer(key, hash);
    LRUHandle* result = *ptr;
    if (result != nullptr) {
      *ptr = result->next_hash;
      --elems_;
    }
    return result;
  }

 private:
  // The table consists of an array of buckets where each bucket is
  // a linked list of cache entries that hash into the bucket.
  uint32_t length_;
  uint32_t elems_;
  LRUHandle** list_;

  // Return a pointer to slot that points to a cache entry that
  // matches key/hash.  If there is no such cache entry, return a
  // pointer to the trailing slot in the corresponding linked list.
  LRUHandle** FindPointer(const Slice& key, uint32_t hash) {
    LRUHandle** ptr = &list_[hash & (length_ - 1)];
    while (*ptr != nullptr && ((*ptr)->hash != hash || key != (*ptr)->key())) {
      ptr = &(*ptr)->next_hash;
    }
    return ptr;
  }

  void Resize() {
    uint32_t new_length = 4;
    while (new_length < elems_) {
      new_length *= 2;
    }
    LRUHandle** new_list = new LRUHandle*[new_length];
    memset(new_list, 0, sizeof(new_list[0]) * new_length);
    uint32_t count = 0;
    for (uint32_t i = 0; i < length_; i++) {
      LRUHandle* h = list_[i];
      while (h != nullptr) {
        LRUHandle* next = h->next_hash;
        uint32_t hash = h->hash;
        LRUHandle** ptr = &new_list[hash & (new_length - 1)];
        h->next_hash = *ptr;
        *ptr = h;
        h = next;
        count++;
      }
    }
    assert(elems_ == count);
    delete[] list_;
    list_ = new_list;
    length_ = new_length;
  }
};
// A single shard of sharded cache.
class LRUCache {
 public:
  LRUCache();
  ~LRUCache();

  // Separate from constructor so caller can easily make an array of LRUCache
  void SetCapacity(size_t capacity) { capacity_ = capacity; }

  // Like Cache methods, but with an extra "hash" parameter.
  Cache::Handle* Insert(const Slice& key, uint32_t hash, void* value,
                        size_t charge,
                        void (*deleter)(const Slice& key, void* value));
  Cache::Handle* Lookup(const Slice& key, uint32_t hash);
  void Release(Cache::Handle* handle);
  void Erase(const Slice& key, uint32_t hash);
  void Prune();
  size_t TotalCharge() const {
    MutexLock l(&mutex_);
    return usage_;
  }

 private:
  void LRU_Remove(LRUHandle* e);
  void LRU_Append(LRUHandle* list, LRUHandle* e);
  void Ref(LRUHandle* e);
  void Unref(LRUHandle* e);
  bool FinishErase(LRUHandle* e) EXCLUSIVE_LOCKS_REQUIRED(mutex_);

  // Initialized before use.
  size_t capacity_;

  // mutex_ protects the following state.
  mutable port::Mutex mutex_;
  size_t usage_ GUARDED_BY(mutex_);

  // Dummy head of LRU list.
  // lru.prev is newest entry, lru.next is oldest entry.
  // Entries have refs==1 and in_cache==true.
  LRUHandle lru_ GUARDED_BY(mutex_);

  // Dummy head of in-use list.
  // Entries are in use by clients, and have refs >= 2 and in_cache==true.
  LRUHandle in_use_ GUARDED_BY(mutex_);

  HandleTable table_ GUARDED_BY(mutex_);
};
LRUCache::LRUCache() : capacity_(0), usage_(0) {
  // Make empty circular linked lists.
  lru_.next = &lru_;
  lru_.prev = &lru_;
  in_use_.next = &in_use_;
  in_use_.prev = &in_use_;
}

LRUCache::~LRUCache() {
  assert(in_use_.next == &in_use_);  // Error if caller has an unreleased handle
  for (LRUHandle* e = lru_.next; e != &lru_;) {
    LRUHandle* next = e->next;
    assert(e->in_cache);
    e->in_cache = false;
    assert(e->refs == 1);  // Invariant of lru_ list.
    Unref(e);
    e = next;
  }
}

void LRUCache::Ref(LRUHandle* e) {
  if (e->refs == 1 && e->in_cache) {  // If on lru_ list, move to in_use_ list.
    LRU_Remove(e);
    LRU_Append(&in_use_, e);
  }
  e->refs++;
}

void LRUCache::Unref(LRUHandle* e) {
  assert(e->refs > 0);
  e->refs--;
  if (e->refs == 0) {  // Deallocate.
    assert(!e->in_cache);
    (*e->deleter)(e->key(), e->value);
    free(e);
  } else if (e->in_cache && e->refs == 1) {
    // No longer in use; move to lru_ list.
    LRU_Remove(e);
    LRU_Append(&lru_, e);
  }
}

void LRUCache::LRU_Remove(LRUHandle* e) {
  e->next->prev = e->prev;
  e->prev->next = e->next;
}

void LRUCache::LRU_Append(LRUHandle* list, LRUHandle* e) {
  // Make "e" newest entry by inserting just before *list
  e->next = list;
  e->prev = list->prev;
  e->prev->next = e;
  e->next->prev = e;
}
Cache::Handle* LRUCache::Lookup(const Slice& key, uint32_t hash) {
  MutexLock l(&mutex_);
  LRUHandle* e = table_.Lookup(key, hash);
  if (e != nullptr) {
    Ref(e);
  }
  return reinterpret_cast<Cache::Handle*>(e);
}

void LRUCache::Release(Cache::Handle* handle) {
  MutexLock l(&mutex_);
  Unref(reinterpret_cast<LRUHandle*>(handle));
}

Cache::Handle* LRUCache::Insert(const Slice& key, uint32_t hash, void* value,
                                size_t charge,
                                void (*deleter)(const Slice& key,
                                                void* value)) {
  MutexLock l(&mutex_);

  LRUHandle* e =
      reinterpret_cast<LRUHandle*>(malloc(sizeof(LRUHandle) - 1 + key.size()));
  e->value = value;
  e->deleter = deleter;
  e->charge = charge;
  e->key_length = key.size();
  e->hash = hash;
  e->in_cache = false;
  e->refs = 1;  // for the returned handle.
  std::memcpy(e->key_data, key.data(), key.size());

  if (capacity_ > 0) {
    e->refs++;  // for the cache's reference.
    e->in_cache = true;
    LRU_Append(&in_use_, e);
    usage_ += charge;
    FinishErase(table_.Insert(e));
  } else {  // don't cache. (capacity_==0 is supported and turns off caching.)
    // next is read by key() in an assert, so it must be initialized
    e->next = nullptr;
  }
  while (usage_ > capacity_ && lru_.next != &lru_) {
    LRUHandle* old = lru_.next;
    assert(old->refs == 1);
    bool erased = FinishErase(table_.Remove(old->key(), old->hash));
    if (!erased) {  // to avoid unused variable when compiled NDEBUG
      assert(erased);
    }
  }

  return reinterpret_cast<Cache::Handle*>(e);
}
// If e != nullptr, finish removing *e from the cache; it has already been
// removed from the hash table.  Return whether e != nullptr.
bool LRUCache::FinishErase(LRUHandle* e) {
  if (e != nullptr) {
    assert(e->in_cache);
    LRU_Remove(e);
    e->in_cache = false;
    usage_ -= e->charge;
    Unref(e);
  }
  return e != nullptr;
}

void LRUCache::Erase(const Slice& key, uint32_t hash) {
  MutexLock l(&mutex_);
  FinishErase(table_.Remove(key, hash));
}

void LRUCache::Prune() {
  MutexLock l(&mutex_);
  while (lru_.next != &lru_) {
    LRUHandle* e = lru_.next;
    assert(e->refs == 1);
    bool erased = FinishErase(table_.Remove(e->key(), e->hash));
    if (!erased) {  // to avoid unused variable when compiled NDEBUG
      assert(erased);
    }
  }
}
static const int kNumShardBits = 4;
static const int kNumShards = 1 << kNumShardBits;

class ShardedLRUCache : public Cache {
 private:
  LRUCache shard_[kNumShards];
  port::Mutex id_mutex_;
  uint64_t last_id_;

  static inline uint32_t HashSlice(const Slice& s) {
    return Hash(s.data(), s.size(), 0);
  }

  static uint32_t Shard(uint32_t hash) { return hash >> (32 - kNumShardBits); }

 public:
  explicit ShardedLRUCache(size_t capacity) : last_id_(0) {
    const size_t per_shard = (capacity + (kNumShards - 1)) / kNumShards;
    for (int s = 0; s < kNumShards; s++) {
      shard_[s].SetCapacity(per_shard);
    }
  }
  ~ShardedLRUCache() override {}
  Handle* Insert(const Slice& key, void* value, size_t charge,
                 void (*deleter)(const Slice& key, void* value)) override {
    const uint32_t hash = HashSlice(key);
    return shard_[Shard(hash)].Insert(key, hash, value, charge, deleter);
  }
  Handle* Lookup(const Slice& key) override {
    const uint32_t hash = HashSlice(key);
    return shard_[Shard(hash)].Lookup(key, hash);
  }
  void Release(Handle* handle) override {
    LRUHandle* h = reinterpret_cast<LRUHandle*>(handle);
    shard_[Shard(h->hash)].Release(handle);
  }
  void Erase(const Slice& key) override {
    const uint32_t hash = HashSlice(key);
    shard_[Shard(hash)].Erase(key, hash);
  }
  void* Value(Handle* handle) override {
    return reinterpret_cast<LRUHandle*>(handle)->value;
  }
  uint64_t NewId() override {
    MutexLock l(&id_mutex_);
    return ++(last_id_);
  }
  void Prune() override {
    for (int s = 0; s < kNumShards; s++) {
      shard_[s].Prune();
    }
  }
  size_t TotalCharge() const override {
    size_t total = 0;
    for (int s = 0; s < kNumShards; s++) {
      total += shard_[s].TotalCharge();
    }
    return total;
  }
};

}  // end anonymous namespace

Cache* NewLRUCache(size_t capacity) { return new ShardedLRUCache(capacity); }

}  // namespace leveldb
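The eviction and sharding ideas in the file above can be tried in isolation with a much smaller sketch. The block below is not LevelDB's implementation: it swaps the intrusive circular lists, the custom hash table, and the reference counting for `std::list` and `std::unordered_map`, has no locking, and stores plain `int` values. The names `TinyLRU`, `Entry`, `ShardOf`, and `kShardBits` are hypothetical and exist only for illustration. It keeps two behaviours from the source: entries are evicted from the oldest end while the total charge exceeds the capacity, and a shard index is taken from the top bits of a 32-bit hash.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <iterator>
#include <list>
#include <string>
#include <unordered_map>

// Hypothetical, simplified charge-based LRU cache (illustration only).
// Unlike LevelDB's LRUCache, it has no handles or reference counts, so a
// freshly inserted entry can be evicted immediately if it alone exceeds
// the capacity.
class TinyLRU {
 public:
  explicit TinyLRU(std::size_t capacity) : capacity_(capacity), usage_(0) {}

  // Inserts key, replacing any existing entry with the same key, then evicts
  // the oldest entries while the total charge exceeds the capacity.
  void Insert(const std::string& key, int value, std::size_t charge) {
    Erase(key);
    order_.push_back(Entry{key, value, charge});
    index_[key] = std::prev(order_.end());
    usage_ += charge;
    while (usage_ > capacity_ && !order_.empty()) {
      const Entry& oldest = order_.front();  // front is least recently used
      usage_ -= oldest.charge;
      index_.erase(oldest.key);
      order_.pop_front();
    }
  }

  // Returns a pointer to the value, or nullptr on a miss. A hit moves the
  // entry to the newest end of the list.
  const int* Lookup(const std::string& key) {
    auto it = index_.find(key);
    if (it == index_.end()) return nullptr;
    order_.splice(order_.end(), order_, it->second);  // iterator stays valid
    return &it->second->value;
  }

  // Removes key if present and gives back its charge.
  void Erase(const std::string& key) {
    auto it = index_.find(key);
    if (it == index_.end()) return;
    usage_ -= it->second->charge;
    order_.erase(it->second);
    index_.erase(it);
  }

  std::size_t TotalCharge() const { return usage_; }

 private:
  struct Entry {
    std::string key;
    int value;
    std::size_t charge;
  };

  std::size_t capacity_;
  std::size_t usage_;
  std::list<Entry> order_;  // oldest at front, newest at back
  std::unordered_map<std::string, std::list<Entry>::iterator> index_;
};

// Shard selection as in ShardedLRUCache::Shard(): the top kShardBits bits of
// the hash pick one of 2^kShardBits shards.
constexpr int kShardBits = 4;
inline std::uint32_t ShardOf(std::uint32_t hash) {
  return hash >> (32 - kShardBits);
}
```

With a capacity of 2 and unit charges, inserting "a" and "b", looking up "a", and then inserting "c" evicts "b", because the lookup made "a" the newest entry. In the real file the just-inserted entry sits on the in-use list while the caller holds its handle, so only unreferenced entries on the LRU list are candidates for eviction; this sketch does not model that distinction.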