Group members: 姚凯文 (kevinyao0901), 姜嘉琪

1585 lines
51 KiB

5 years ago
Fix snapshot compaction bug

Closes google/leveldb#320

During compaction it was possible that records from a block b1=(l1,u1) would be pushed down from level i to level i+1. If there is a block b2=(l2,u2) at level i with k1 = user_key(u1) = user_key(l2), then a subsequent search for k1 will yield the record l2, which has a smaller sequence number than u1, because records are sorted by increasing user key but decreasing sequence number.

This change adds a call to a new function, AddBoundaryInputs, to SetupOtherInputs. AddBoundaryInputs searches for a block b2 matching the criteria above and adds it to the set of files to be compacted. Whenever AddBoundaryInputs is called, it is important that the compaction fileset in level i+1 (known as c->inputs_[1] in the code) be recomputed. Each call to AddBoundaryInputs is followed by a call to GetOverlappingInputs.

SetupOtherInputs is called on both manual and automated compaction passes. It is called for both level zero and for levels greater than 0.

The original change posted in https://github.com/google/leveldb/pull/339 has been modified to also include changes made by Chris Mumford <cmumford@google.com> in https://github.com/cmumford/leveldb/commit/4b72cb14f8da2aab12451c24b8e205aff686e9dc

1. Releasing snapshots during test cleanup to avoid memory leak warnings.
2. Refactored test to use testutil.h to be in line with other issue tests and to create the test database in the correct temporary location.
3. Added copyright banner.

Otherwise, just minor formatting and limiting character width to 80 characters.

Additionally, the change was rebased on top of current master, and changes previously made to the Makefile were ported to the CMakeLists.txt.

Testing Done:

A test program (issue320_test) was constructed that performs mutations while snapshots are active. issue320_test fails without this bug fix after 64k writes. It passes with this bug fix. It was run with 200M writes and passed.

Unit tests were written for the new function that was added to the code. make test was run and passed.

Signed-off-by: Richard Cole <richcole@amazon.com>
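
The boundary search described in the commit message can be sketched as follows. This is a minimal, self-contained illustration and not LevelDB's actual code: `InternalKey` and `FileMeta` are simplified stand-ins assumed for this example, the helper names `FindLargestKey` and `FindSmallestBoundaryFile` are illustrative, and only `AddBoundaryInputs` is a name taken from the message itself.

```cpp
#include <cassert>
#include <cstdint>
#include <string>
#include <vector>

// Simplified stand-in for an internal key: user key plus sequence number.
struct InternalKey {
  std::string user_key;
  uint64_t sequence;
};

// Internal key order: ascending user key, then DESCENDING sequence number,
// so newer records for the same user key sort first.
bool InternalKeyLess(const InternalKey& a, const InternalKey& b) {
  if (a.user_key != b.user_key) return a.user_key < b.user_key;
  return a.sequence > b.sequence;
}

// Simplified stand-in for a table file: its number and key range.
struct FileMeta {
  int number;
  InternalKey smallest;
  InternalKey largest;
};

// Stores the largest key across `files` in *largest; false if `files` is empty.
bool FindLargestKey(const std::vector<const FileMeta*>& files,
                    InternalKey* largest) {
  if (files.empty()) return false;
  *largest = files[0]->largest;
  for (const FileMeta* f : files) {
    if (InternalKeyLess(*largest, f->largest)) *largest = f->largest;
  }
  return true;
}

// Returns the file whose smallest key has the same user key as `largest_key`
// but sorts after it (an older record of that user key), or nullptr if none.
// If several qualify, the one with the smallest such key is chosen.
const FileMeta* FindSmallestBoundaryFile(
    const std::vector<const FileMeta*>& level_files,
    const InternalKey& largest_key) {
  const FileMeta* best = nullptr;
  for (const FileMeta* f : level_files) {
    if (InternalKeyLess(largest_key, f->smallest) &&
        f->smallest.user_key == largest_key.user_key) {
      if (best == nullptr || InternalKeyLess(f->smallest, best->smallest)) {
        best = f;
      }
    }
  }
  return best;
}

// Extends `compaction_files` with every boundary file, repeating the search
// because an added file may itself end on a user key that continues elsewhere.
void AddBoundaryInputs(const std::vector<const FileMeta*>& level_files,
                       std::vector<const FileMeta*>* compaction_files) {
  InternalKey largest;
  if (!FindLargestKey(*compaction_files, &largest)) return;
  while (true) {
    const FileMeta* boundary = FindSmallestBoundaryFile(level_files, largest);
    if (boundary == nullptr) break;
    compaction_files->push_back(boundary);
    largest = boundary->largest;
  }
}
```

In this sketch, a compaction that starts with a file ending at user key k would also pull in the file that begins with an older record of k, so the newer record is never pushed below the older one. As the message notes, the level i+1 inputs would then need to be recomputed for the widened range; that step is outside the scope of this example.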
8 years ago
Fix snapshot compaction bug Closes google/leveldb#320 During compaction it was possible that records from a block b1=(l1,u1) would be pushed down from level i to level i+1. If there is a block b2=(l2,u2) at level i with k1 = user_key(u1) = user_key(l2) then a subsequent search for k1 will yield the record l2 which has a smaller sequence number than u1 because the sort order for records sorts increasing by user key but decreaing by sequence number. This change add a call to a new function AddBoundaryInputs to SetupOtherInputs. AddBoundaryInputs searches for a block b2 matching the criteria above and adds it to the set of files to be compacted. Whenever AddBoundaryInputs is called it is important that the compaction fileset in level i+1 (known as c->inputs_[1] in the code) be recomputed. Each call to AddBoundaryInputs is followed by a call to GetOverlappingInputs. SetupOtherInputs is called on both manual and automated compaction passes. It is called for both level zero and for levels greater than 0. The original change posted in https://github.com/google/leveldb/pull/339 has been modified to also include changed made by Chris Mumford<cmumford@google.com> in https://github.com/cmumford/leveldb/commit/4b72cb14f8da2aab12451c24b8e205aff686e9dc 1. Releasing snapshots during test cleanup to avoid memory leak warnings. 2. Refactored test to use testutil.h to be in line with other issue tests and to create the test database in the correct temporary location. 3. Added copyright banner. Otherwise, just minor formatting and limiting character width to 80 characters. Additionally the change was rebased on top of current master and changes previously made to the Makefile were ported to the CMakeLists.txt. Testing Done: A test program (issue320_test) was constructed that performs mutations while snapshots are active. issue320_test fails without this bug fix after 64k writes. It passes with this bug fix. It was run with 200M writes and passed. 
Unit tests were written for the new function that was added to the code. Make test was run and seen to pass. Signed-off-by: Richard Cole <richcole@amazon.com>
8 years ago
Fix snapshot compaction bug Closes google/leveldb#320 During compaction it was possible that records from a block b1=(l1,u1) would be pushed down from level i to level i+1. If there is a block b2=(l2,u2) at level i with k1 = user_key(u1) = user_key(l2) then a subsequent search for k1 will yield the record l2 which has a smaller sequence number than u1 because the sort order for records sorts increasing by user key but decreaing by sequence number. This change add a call to a new function AddBoundaryInputs to SetupOtherInputs. AddBoundaryInputs searches for a block b2 matching the criteria above and adds it to the set of files to be compacted. Whenever AddBoundaryInputs is called it is important that the compaction fileset in level i+1 (known as c->inputs_[1] in the code) be recomputed. Each call to AddBoundaryInputs is followed by a call to GetOverlappingInputs. SetupOtherInputs is called on both manual and automated compaction passes. It is called for both level zero and for levels greater than 0. The original change posted in https://github.com/google/leveldb/pull/339 has been modified to also include changed made by Chris Mumford<cmumford@google.com> in https://github.com/cmumford/leveldb/commit/4b72cb14f8da2aab12451c24b8e205aff686e9dc 1. Releasing snapshots during test cleanup to avoid memory leak warnings. 2. Refactored test to use testutil.h to be in line with other issue tests and to create the test database in the correct temporary location. 3. Added copyright banner. Otherwise, just minor formatting and limiting character width to 80 characters. Additionally the change was rebased on top of current master and changes previously made to the Makefile were ported to the CMakeLists.txt. Testing Done: A test program (issue320_test) was constructed that performs mutations while snapshots are active. issue320_test fails without this bug fix after 64k writes. It passes with this bug fix. It was run with 200M writes and passed. 
Unit tests were written for the new function that was added to the code. Make test was run and seen to pass. Signed-off-by: Richard Cole <richcole@amazon.com>
8 years ago
Fix snapshot compaction bug Closes google/leveldb#320 During compaction it was possible that records from a block b1=(l1,u1) would be pushed down from level i to level i+1. If there is a block b2=(l2,u2) at level i with k1 = user_key(u1) = user_key(l2) then a subsequent search for k1 will yield the record l2 which has a smaller sequence number than u1 because the sort order for records sorts increasing by user key but decreaing by sequence number. This change add a call to a new function AddBoundaryInputs to SetupOtherInputs. AddBoundaryInputs searches for a block b2 matching the criteria above and adds it to the set of files to be compacted. Whenever AddBoundaryInputs is called it is important that the compaction fileset in level i+1 (known as c->inputs_[1] in the code) be recomputed. Each call to AddBoundaryInputs is followed by a call to GetOverlappingInputs. SetupOtherInputs is called on both manual and automated compaction passes. It is called for both level zero and for levels greater than 0. The original change posted in https://github.com/google/leveldb/pull/339 has been modified to also include changed made by Chris Mumford<cmumford@google.com> in https://github.com/cmumford/leveldb/commit/4b72cb14f8da2aab12451c24b8e205aff686e9dc 1. Releasing snapshots during test cleanup to avoid memory leak warnings. 2. Refactored test to use testutil.h to be in line with other issue tests and to create the test database in the correct temporary location. 3. Added copyright banner. Otherwise, just minor formatting and limiting character width to 80 characters. Additionally the change was rebased on top of current master and changes previously made to the Makefile were ported to the CMakeLists.txt. Testing Done: A test program (issue320_test) was constructed that performs mutations while snapshots are active. issue320_test fails without this bug fix after 64k writes. It passes with this bug fix. It was run with 200M writes and passed. 
Unit tests were written for the new function that was added to the code. Make test was run and seen to pass. Signed-off-by: Richard Cole <richcole@amazon.com>
8 years ago
Fix snapshot compaction bug Closes google/leveldb#320 During compaction it was possible that records from a block b1=(l1,u1) would be pushed down from level i to level i+1. If there is a block b2=(l2,u2) at level i with k1 = user_key(u1) = user_key(l2) then a subsequent search for k1 will yield the record l2 which has a smaller sequence number than u1 because the sort order for records sorts increasing by user key but decreaing by sequence number. This change add a call to a new function AddBoundaryInputs to SetupOtherInputs. AddBoundaryInputs searches for a block b2 matching the criteria above and adds it to the set of files to be compacted. Whenever AddBoundaryInputs is called it is important that the compaction fileset in level i+1 (known as c->inputs_[1] in the code) be recomputed. Each call to AddBoundaryInputs is followed by a call to GetOverlappingInputs. SetupOtherInputs is called on both manual and automated compaction passes. It is called for both level zero and for levels greater than 0. The original change posted in https://github.com/google/leveldb/pull/339 has been modified to also include changed made by Chris Mumford<cmumford@google.com> in https://github.com/cmumford/leveldb/commit/4b72cb14f8da2aab12451c24b8e205aff686e9dc 1. Releasing snapshots during test cleanup to avoid memory leak warnings. 2. Refactored test to use testutil.h to be in line with other issue tests and to create the test database in the correct temporary location. 3. Added copyright banner. Otherwise, just minor formatting and limiting character width to 80 characters. Additionally the change was rebased on top of current master and changes previously made to the Makefile were ported to the CMakeLists.txt. Testing Done: A test program (issue320_test) was constructed that performs mutations while snapshots are active. issue320_test fails without this bug fix after 64k writes. It passes with this bug fix. It was run with 200M writes and passed. 
Unit tests were written for the new function that was added to the code. Make test was run and seen to pass. Signed-off-by: Richard Cole <richcole@amazon.com>
8 years ago
Fix snapshot compaction bug Closes google/leveldb#320 During compaction it was possible that records from a block b1=(l1,u1) would be pushed down from level i to level i+1. If there is a block b2=(l2,u2) at level i with k1 = user_key(u1) = user_key(l2) then a subsequent search for k1 will yield the record l2 which has a smaller sequence number than u1 because the sort order for records sorts increasing by user key but decreaing by sequence number. This change add a call to a new function AddBoundaryInputs to SetupOtherInputs. AddBoundaryInputs searches for a block b2 matching the criteria above and adds it to the set of files to be compacted. Whenever AddBoundaryInputs is called it is important that the compaction fileset in level i+1 (known as c->inputs_[1] in the code) be recomputed. Each call to AddBoundaryInputs is followed by a call to GetOverlappingInputs. SetupOtherInputs is called on both manual and automated compaction passes. It is called for both level zero and for levels greater than 0. The original change posted in https://github.com/google/leveldb/pull/339 has been modified to also include changed made by Chris Mumford<cmumford@google.com> in https://github.com/cmumford/leveldb/commit/4b72cb14f8da2aab12451c24b8e205aff686e9dc 1. Releasing snapshots during test cleanup to avoid memory leak warnings. 2. Refactored test to use testutil.h to be in line with other issue tests and to create the test database in the correct temporary location. 3. Added copyright banner. Otherwise, just minor formatting and limiting character width to 80 characters. Additionally the change was rebased on top of current master and changes previously made to the Makefile were ported to the CMakeLists.txt. Testing Done: A test program (issue320_test) was constructed that performs mutations while snapshots are active. issue320_test fails without this bug fix after 64k writes. It passes with this bug fix. It was run with 200M writes and passed. 
Unit tests were written for the new function that was added to the code. Make test was run and seen to pass. Signed-off-by: Richard Cole <richcole@amazon.com>
8 years ago
Fix snapshot compaction bug Closes google/leveldb#320 During compaction it was possible that records from a block b1=(l1,u1) would be pushed down from level i to level i+1. If there is a block b2=(l2,u2) at level i with k1 = user_key(u1) = user_key(l2) then a subsequent search for k1 will yield the record l2 which has a smaller sequence number than u1 because the sort order for records sorts increasing by user key but decreaing by sequence number. This change add a call to a new function AddBoundaryInputs to SetupOtherInputs. AddBoundaryInputs searches for a block b2 matching the criteria above and adds it to the set of files to be compacted. Whenever AddBoundaryInputs is called it is important that the compaction fileset in level i+1 (known as c->inputs_[1] in the code) be recomputed. Each call to AddBoundaryInputs is followed by a call to GetOverlappingInputs. SetupOtherInputs is called on both manual and automated compaction passes. It is called for both level zero and for levels greater than 0. The original change posted in https://github.com/google/leveldb/pull/339 has been modified to also include changed made by Chris Mumford<cmumford@google.com> in https://github.com/cmumford/leveldb/commit/4b72cb14f8da2aab12451c24b8e205aff686e9dc 1. Releasing snapshots during test cleanup to avoid memory leak warnings. 2. Refactored test to use testutil.h to be in line with other issue tests and to create the test database in the correct temporary location. 3. Added copyright banner. Otherwise, just minor formatting and limiting character width to 80 characters. Additionally the change was rebased on top of current master and changes previously made to the Makefile were ported to the CMakeLists.txt. Testing Done: A test program (issue320_test) was constructed that performs mutations while snapshots are active. issue320_test fails without this bug fix after 64k writes. It passes with this bug fix. It was run with 200M writes and passed. 
Unit tests were written for the new function that was added to the code. Make test was run and seen to pass. Signed-off-by: Richard Cole <richcole@amazon.com>
8 years ago
Fix snapshot compaction bug Closes google/leveldb#320 During compaction it was possible that records from a block b1=(l1,u1) would be pushed down from level i to level i+1. If there is a block b2=(l2,u2) at level i with k1 = user_key(u1) = user_key(l2) then a subsequent search for k1 will yield the record l2 which has a smaller sequence number than u1 because the sort order for records sorts increasing by user key but decreaing by sequence number. This change add a call to a new function AddBoundaryInputs to SetupOtherInputs. AddBoundaryInputs searches for a block b2 matching the criteria above and adds it to the set of files to be compacted. Whenever AddBoundaryInputs is called it is important that the compaction fileset in level i+1 (known as c->inputs_[1] in the code) be recomputed. Each call to AddBoundaryInputs is followed by a call to GetOverlappingInputs. SetupOtherInputs is called on both manual and automated compaction passes. It is called for both level zero and for levels greater than 0. The original change posted in https://github.com/google/leveldb/pull/339 has been modified to also include changed made by Chris Mumford<cmumford@google.com> in https://github.com/cmumford/leveldb/commit/4b72cb14f8da2aab12451c24b8e205aff686e9dc 1. Releasing snapshots during test cleanup to avoid memory leak warnings. 2. Refactored test to use testutil.h to be in line with other issue tests and to create the test database in the correct temporary location. 3. Added copyright banner. Otherwise, just minor formatting and limiting character width to 80 characters. Additionally the change was rebased on top of current master and changes previously made to the Makefile were ported to the CMakeLists.txt. Testing Done: A test program (issue320_test) was constructed that performs mutations while snapshots are active. issue320_test fails without this bug fix after 64k writes. It passes with this bug fix. It was run with 200M writes and passed. 
Unit tests were written for the new function that was added to the code. Make test was run and seen to pass. Signed-off-by: Richard Cole <richcole@amazon.com>
8 years ago
Fix snapshot compaction bug Closes google/leveldb#320 During compaction it was possible that records from a block b1=(l1,u1) would be pushed down from level i to level i+1. If there is a block b2=(l2,u2) at level i with k1 = user_key(u1) = user_key(l2) then a subsequent search for k1 will yield the record l2 which has a smaller sequence number than u1 because the sort order for records sorts increasing by user key but decreaing by sequence number. This change add a call to a new function AddBoundaryInputs to SetupOtherInputs. AddBoundaryInputs searches for a block b2 matching the criteria above and adds it to the set of files to be compacted. Whenever AddBoundaryInputs is called it is important that the compaction fileset in level i+1 (known as c->inputs_[1] in the code) be recomputed. Each call to AddBoundaryInputs is followed by a call to GetOverlappingInputs. SetupOtherInputs is called on both manual and automated compaction passes. It is called for both level zero and for levels greater than 0. The original change posted in https://github.com/google/leveldb/pull/339 has been modified to also include changed made by Chris Mumford<cmumford@google.com> in https://github.com/cmumford/leveldb/commit/4b72cb14f8da2aab12451c24b8e205aff686e9dc 1. Releasing snapshots during test cleanup to avoid memory leak warnings. 2. Refactored test to use testutil.h to be in line with other issue tests and to create the test database in the correct temporary location. 3. Added copyright banner. Otherwise, just minor formatting and limiting character width to 80 characters. Additionally the change was rebased on top of current master and changes previously made to the Makefile were ported to the CMakeLists.txt. Testing Done: A test program (issue320_test) was constructed that performs mutations while snapshots are active. issue320_test fails without this bug fix after 64k writes. It passes with this bug fix. It was run with 200M writes and passed. 
Unit tests were written for the new function that was added to the code. Make test was run and seen to pass. Signed-off-by: Richard Cole <richcole@amazon.com>
8 years ago
Fix snapshot compaction bug Closes google/leveldb#320 During compaction it was possible that records from a block b1=(l1,u1) would be pushed down from level i to level i+1. If there is a block b2=(l2,u2) at level i with k1 = user_key(u1) = user_key(l2) then a subsequent search for k1 will yield the record l2 which has a smaller sequence number than u1 because the sort order for records sorts increasing by user key but decreaing by sequence number. This change add a call to a new function AddBoundaryInputs to SetupOtherInputs. AddBoundaryInputs searches for a block b2 matching the criteria above and adds it to the set of files to be compacted. Whenever AddBoundaryInputs is called it is important that the compaction fileset in level i+1 (known as c->inputs_[1] in the code) be recomputed. Each call to AddBoundaryInputs is followed by a call to GetOverlappingInputs. SetupOtherInputs is called on both manual and automated compaction passes. It is called for both level zero and for levels greater than 0. The original change posted in https://github.com/google/leveldb/pull/339 has been modified to also include changed made by Chris Mumford<cmumford@google.com> in https://github.com/cmumford/leveldb/commit/4b72cb14f8da2aab12451c24b8e205aff686e9dc 1. Releasing snapshots during test cleanup to avoid memory leak warnings. 2. Refactored test to use testutil.h to be in line with other issue tests and to create the test database in the correct temporary location. 3. Added copyright banner. Otherwise, just minor formatting and limiting character width to 80 characters. Additionally the change was rebased on top of current master and changes previously made to the Makefile were ported to the CMakeLists.txt. Testing Done: A test program (issue320_test) was constructed that performs mutations while snapshots are active. issue320_test fails without this bug fix after 64k writes. It passes with this bug fix. It was run with 200M writes and passed. 
Unit tests were written for the new function that was added to the code. Make test was run and seen to pass. Signed-off-by: Richard Cole <richcole@amazon.com>
8 years ago
Fix snapshot compaction bug Closes google/leveldb#320 During compaction it was possible that records from a block b1=(l1,u1) would be pushed down from level i to level i+1. If there is a block b2=(l2,u2) at level i with k1 = user_key(u1) = user_key(l2) then a subsequent search for k1 will yield the record l2 which has a smaller sequence number than u1 because the sort order for records sorts increasing by user key but decreaing by sequence number. This change add a call to a new function AddBoundaryInputs to SetupOtherInputs. AddBoundaryInputs searches for a block b2 matching the criteria above and adds it to the set of files to be compacted. Whenever AddBoundaryInputs is called it is important that the compaction fileset in level i+1 (known as c->inputs_[1] in the code) be recomputed. Each call to AddBoundaryInputs is followed by a call to GetOverlappingInputs. SetupOtherInputs is called on both manual and automated compaction passes. It is called for both level zero and for levels greater than 0. The original change posted in https://github.com/google/leveldb/pull/339 has been modified to also include changed made by Chris Mumford<cmumford@google.com> in https://github.com/cmumford/leveldb/commit/4b72cb14f8da2aab12451c24b8e205aff686e9dc 1. Releasing snapshots during test cleanup to avoid memory leak warnings. 2. Refactored test to use testutil.h to be in line with other issue tests and to create the test database in the correct temporary location. 3. Added copyright banner. Otherwise, just minor formatting and limiting character width to 80 characters. Additionally the change was rebased on top of current master and changes previously made to the Makefile were ported to the CMakeLists.txt. Testing Done: A test program (issue320_test) was constructed that performs mutations while snapshots are active. issue320_test fails without this bug fix after 64k writes. It passes with this bug fix. It was run with 200M writes and passed. 
Unit tests were written for the new function that was added to the code. Make test was run and seen to pass. Signed-off-by: Richard Cole <richcole@amazon.com>
8 years ago
Fix snapshot compaction bug Closes google/leveldb#320 During compaction it was possible that records from a block b1=(l1,u1) would be pushed down from level i to level i+1. If there is a block b2=(l2,u2) at level i with k1 = user_key(u1) = user_key(l2) then a subsequent search for k1 will yield the record l2 which has a smaller sequence number than u1 because the sort order for records sorts increasing by user key but decreaing by sequence number. This change add a call to a new function AddBoundaryInputs to SetupOtherInputs. AddBoundaryInputs searches for a block b2 matching the criteria above and adds it to the set of files to be compacted. Whenever AddBoundaryInputs is called it is important that the compaction fileset in level i+1 (known as c->inputs_[1] in the code) be recomputed. Each call to AddBoundaryInputs is followed by a call to GetOverlappingInputs. SetupOtherInputs is called on both manual and automated compaction passes. It is called for both level zero and for levels greater than 0. The original change posted in https://github.com/google/leveldb/pull/339 has been modified to also include changed made by Chris Mumford<cmumford@google.com> in https://github.com/cmumford/leveldb/commit/4b72cb14f8da2aab12451c24b8e205aff686e9dc 1. Releasing snapshots during test cleanup to avoid memory leak warnings. 2. Refactored test to use testutil.h to be in line with other issue tests and to create the test database in the correct temporary location. 3. Added copyright banner. Otherwise, just minor formatting and limiting character width to 80 characters. Additionally the change was rebased on top of current master and changes previously made to the Makefile were ported to the CMakeLists.txt. Testing Done: A test program (issue320_test) was constructed that performs mutations while snapshots are active. issue320_test fails without this bug fix after 64k writes. It passes with this bug fix. It was run with 200M writes and passed. 
Unit tests were written for the new function that was added to the code. Make test was run and seen to pass. Signed-off-by: Richard Cole <richcole@amazon.com>
8 years ago
Fix snapshot compaction bug Closes google/leveldb#320 During compaction it was possible that records from a block b1=(l1,u1) would be pushed down from level i to level i+1. If there is a block b2=(l2,u2) at level i with k1 = user_key(u1) = user_key(l2) then a subsequent search for k1 will yield the record l2 which has a smaller sequence number than u1 because the sort order for records sorts increasing by user key but decreaing by sequence number. This change add a call to a new function AddBoundaryInputs to SetupOtherInputs. AddBoundaryInputs searches for a block b2 matching the criteria above and adds it to the set of files to be compacted. Whenever AddBoundaryInputs is called it is important that the compaction fileset in level i+1 (known as c->inputs_[1] in the code) be recomputed. Each call to AddBoundaryInputs is followed by a call to GetOverlappingInputs. SetupOtherInputs is called on both manual and automated compaction passes. It is called for both level zero and for levels greater than 0. The original change posted in https://github.com/google/leveldb/pull/339 has been modified to also include changed made by Chris Mumford<cmumford@google.com> in https://github.com/cmumford/leveldb/commit/4b72cb14f8da2aab12451c24b8e205aff686e9dc 1. Releasing snapshots during test cleanup to avoid memory leak warnings. 2. Refactored test to use testutil.h to be in line with other issue tests and to create the test database in the correct temporary location. 3. Added copyright banner. Otherwise, just minor formatting and limiting character width to 80 characters. Additionally the change was rebased on top of current master and changes previously made to the Makefile were ported to the CMakeLists.txt. Testing Done: A test program (issue320_test) was constructed that performs mutations while snapshots are active. issue320_test fails without this bug fix after 64k writes. It passes with this bug fix. It was run with 200M writes and passed. 
Unit tests were written for the new function that was added to the code. Make test was run and seen to pass. Signed-off-by: Richard Cole <richcole@amazon.com>
8 years ago
Fix snapshot compaction bug Closes google/leveldb#320 During compaction it was possible that records from a block b1=(l1,u1) would be pushed down from level i to level i+1. If there is a block b2=(l2,u2) at level i with k1 = user_key(u1) = user_key(l2) then a subsequent search for k1 will yield the record l2 which has a smaller sequence number than u1 because the sort order for records sorts increasing by user key but decreaing by sequence number. This change add a call to a new function AddBoundaryInputs to SetupOtherInputs. AddBoundaryInputs searches for a block b2 matching the criteria above and adds it to the set of files to be compacted. Whenever AddBoundaryInputs is called it is important that the compaction fileset in level i+1 (known as c->inputs_[1] in the code) be recomputed. Each call to AddBoundaryInputs is followed by a call to GetOverlappingInputs. SetupOtherInputs is called on both manual and automated compaction passes. It is called for both level zero and for levels greater than 0. The original change posted in https://github.com/google/leveldb/pull/339 has been modified to also include changed made by Chris Mumford<cmumford@google.com> in https://github.com/cmumford/leveldb/commit/4b72cb14f8da2aab12451c24b8e205aff686e9dc 1. Releasing snapshots during test cleanup to avoid memory leak warnings. 2. Refactored test to use testutil.h to be in line with other issue tests and to create the test database in the correct temporary location. 3. Added copyright banner. Otherwise, just minor formatting and limiting character width to 80 characters. Additionally the change was rebased on top of current master and changes previously made to the Makefile were ported to the CMakeLists.txt. Testing Done: A test program (issue320_test) was constructed that performs mutations while snapshots are active. issue320_test fails without this bug fix after 64k writes. It passes with this bug fix. It was run with 200M writes and passed. 
Unit tests were written for the new function that was added to the code. Make test was run and seen to pass. Signed-off-by: Richard Cole <richcole@amazon.com>
// Copyright (c) 2011 The LevelDB Authors. All rights reserved.
// Use of this source code is governed by a BSD-style license that can be
// found in the LICENSE file. See the AUTHORS file for names of contributors.

#include "db/version_set.h"

#include <stdio.h>

#include <algorithm>

#include "db/filename.h"
#include "db/log_reader.h"
#include "db/log_writer.h"
#include "db/memtable.h"
#include "db/table_cache.h"
#include "leveldb/env.h"
#include "leveldb/table_builder.h"
#include "table/merger.h"
#include "table/two_level_iterator.h"
#include "util/coding.h"
#include "util/logging.h"

namespace leveldb {
static size_t TargetFileSize(const Options* options) {
  return options->max_file_size;
}

// Maximum bytes of overlaps in grandparent (i.e., level+2) before we
// stop building a single file in a level->level+1 compaction.
static int64_t MaxGrandParentOverlapBytes(const Options* options) {
  return 10 * TargetFileSize(options);
}

// Maximum number of bytes in all compacted files.  We avoid expanding
// the lower level file set of a compaction if it would make the
// total compaction cover more than this many bytes.
static int64_t ExpandedCompactionByteSizeLimit(const Options* options) {
  return 25 * TargetFileSize(options);
}

static double MaxBytesForLevel(const Options* options, int level) {
  // Note: the result for level zero is not really used since we set
  // the level-0 compaction threshold based on number of files.

  // Result for both level-0 and level-1
  double result = 10. * 1048576.0;
  while (level > 1) {
    result *= 10;
    level--;
  }
  return result;
}
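The byte budget computed by MaxBytesForLevel grows by a factor of ten per level. The following is a minimal standalone sketch of that arithmetic, not part of version_set.cc; the name MaxBytesForLevelSketch is hypothetical, and the unused Options parameter is dropped for brevity.

```cpp
#include <cassert>

// Hypothetical standalone copy of the per-level byte budget:
// 10 MiB for level 0 and level 1, multiplied by 10 for each deeper level.
double MaxBytesForLevelSketch(int level) {
  double result = 10.0 * 1048576.0;  // 10 MiB
  while (level > 1) {
    result *= 10;
    level--;
  }
  return result;
}
```

Under these assumptions level 2 is budgeted 100 MiB, level 3 gets 1000 MiB, and so on. As the comment in the original function notes, the level-0 value is not really used because level-0 compactions are triggered by file count.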
static uint64_t MaxFileSizeForLevel(const Options* options, int level) {
  // We could vary per level to reduce number of files?
  return TargetFileSize(options);
}

static int64_t TotalFileSize(const std::vector<FileMetaData*>& files) {
  int64_t sum = 0;
  for (size_t i = 0; i < files.size(); i++) {
    sum += files[i]->file_size;
  }
  return sum;
}
Version::~Version() {
  assert(refs_ == 0);

  // Remove from linked list
  prev_->next_ = next_;
  next_->prev_ = prev_;

  // Drop references to files
  for (int level = 0; level < config::kNumLevels; level++) {
    for (size_t i = 0; i < files_[level].size(); i++) {
      FileMetaData* f = files_[level][i];
      assert(f->refs > 0);
      f->refs--;
      if (f->refs <= 0) {
        delete f;
      }
    }
  }
}
int FindFile(const InternalKeyComparator& icmp,
             const std::vector<FileMetaData*>& files, const Slice& key) {
  uint32_t left = 0;
  uint32_t right = files.size();
  while (left < right) {
    uint32_t mid = (left + right) / 2;
    const FileMetaData* f = files[mid];
    if (icmp.InternalKeyComparator::Compare(f->largest.Encode(), key) < 0) {
      // Key at "mid.largest" is < "target".  Therefore all
      // files at or before "mid" are uninteresting.
      left = mid + 1;
    } else {
      // Key at "mid.largest" is >= "target".  Therefore all files
      // after "mid" are uninteresting.
      right = mid;
    }
  }
  return right;
}
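FindFile is a lower-bound binary search over each file's largest key. A minimal sketch of the same search follows, assuming plain std::string keys compared bytewise instead of internal keys; FileRange and FindFileSketch are hypothetical names and not LevelDB APIs.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical stand-in for FileMetaData: only the key bounds are kept.
struct FileRange {
  std::string smallest;
  std::string largest;
};

// Returns the index of the first file whose largest key is >= key,
// or files.size() if there is no such file.
// Assumes files are sorted by key and do not overlap (as in levels > 0).
size_t FindFileSketch(const std::vector<FileRange>& files,
                      const std::string& key) {
  size_t left = 0;
  size_t right = files.size();
  while (left < right) {
    size_t mid = left + (right - left) / 2;
    if (files[mid].largest < key) {
      // Everything at or before "mid" ends before the key.
      left = mid + 1;
    } else {
      // "mid" may contain the key; files after it cannot be the first match.
      right = mid;
    }
  }
  return right;
}
```

The returned file is only a candidate: the caller still has to check that the key is not smaller than that file's smallest key, which is what Version::Get and ForEachOverlapping do after calling FindFile.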
static bool AfterFile(const Comparator* ucmp, const Slice* user_key,
                      const FileMetaData* f) {
  // null user_key occurs before all keys and is therefore never after *f
  return (user_key != nullptr &&
          ucmp->Compare(*user_key, f->largest.user_key()) > 0);
}

static bool BeforeFile(const Comparator* ucmp, const Slice* user_key,
                       const FileMetaData* f) {
  // null user_key occurs after all keys and is therefore never before *f
  return (user_key != nullptr &&
          ucmp->Compare(*user_key, f->smallest.user_key()) < 0);
}

bool SomeFileOverlapsRange(const InternalKeyComparator& icmp,
                           bool disjoint_sorted_files,
                           const std::vector<FileMetaData*>& files,
                           const Slice* smallest_user_key,
                           const Slice* largest_user_key) {
  const Comparator* ucmp = icmp.user_comparator();
  if (!disjoint_sorted_files) {
    // Need to check against all files
    for (size_t i = 0; i < files.size(); i++) {
      const FileMetaData* f = files[i];
      if (AfterFile(ucmp, smallest_user_key, f) ||
          BeforeFile(ucmp, largest_user_key, f)) {
        // No overlap
      } else {
        return true;  // Overlap
      }
    }
    return false;
  }

  // Binary search over file list
  uint32_t index = 0;
  if (smallest_user_key != nullptr) {
    // Find the earliest possible internal key for smallest_user_key
    InternalKey small_key(*smallest_user_key, kMaxSequenceNumber,
                          kValueTypeForSeek);
    index = FindFile(icmp, files, small_key.Encode());
  }

  if (index >= files.size()) {
    // beginning of range is after all files, so no overlap.
    return false;
  }

  return !BeforeFile(ucmp, largest_user_key, files[index]);
}
// An internal iterator.  For a given version/level pair, yields
// information about the files in the level.  For a given entry, key()
// is the largest key that occurs in the file, and value() is a
// 16-byte value containing the file number and file size, both
// encoded using EncodeFixed64.
class Version::LevelFileNumIterator : public Iterator {
 public:
  LevelFileNumIterator(const InternalKeyComparator& icmp,
                       const std::vector<FileMetaData*>* flist)
      : icmp_(icmp), flist_(flist), index_(flist->size()) {  // Marks as invalid
  }
  virtual bool Valid() const { return index_ < flist_->size(); }
  virtual void Seek(const Slice& target) {
    index_ = FindFile(icmp_, *flist_, target);
  }
  virtual void SeekToFirst() { index_ = 0; }
  virtual void SeekToLast() {
    index_ = flist_->empty() ? 0 : flist_->size() - 1;
  }
  virtual void Next() {
    assert(Valid());
    index_++;
  }
  virtual void Prev() {
    assert(Valid());
    if (index_ == 0) {
      index_ = flist_->size();  // Marks as invalid
    } else {
      index_--;
    }
  }
  Slice key() const {
    assert(Valid());
    return (*flist_)[index_]->largest.Encode();
  }
  Slice value() const {
    assert(Valid());
    EncodeFixed64(value_buf_, (*flist_)[index_]->number);
    EncodeFixed64(value_buf_ + 8, (*flist_)[index_]->file_size);
    return Slice(value_buf_, sizeof(value_buf_));
  }
  virtual Status status() const { return Status::OK(); }

 private:
  const InternalKeyComparator icmp_;
  const std::vector<FileMetaData*>* const flist_;
  uint32_t index_;

  // Backing store for value().  Holds the file number and size.
  mutable char value_buf_[16];
};
static Iterator* GetFileIterator(void* arg, const ReadOptions& options,
                                 const Slice& file_value) {
  TableCache* cache = reinterpret_cast<TableCache*>(arg);
  if (file_value.size() != 16) {
    return NewErrorIterator(
        Status::Corruption("FileReader invoked with unexpected value"));
  } else {
    return cache->NewIterator(options, DecodeFixed64(file_value.data()),
                              DecodeFixed64(file_value.data() + 8));
  }
}

Iterator* Version::NewConcatenatingIterator(const ReadOptions& options,
                                            int level) const {
  return NewTwoLevelIterator(
      new LevelFileNumIterator(vset_->icmp_, &files_[level]), &GetFileIterator,
      vset_->table_cache_, options);
}

void Version::AddIterators(const ReadOptions& options,
                           std::vector<Iterator*>* iters) {
  // Merge all level zero files together since they may overlap
  for (size_t i = 0; i < files_[0].size(); i++) {
    iters->push_back(vset_->table_cache_->NewIterator(
        options, files_[0][i]->number, files_[0][i]->file_size));
  }

  // For levels > 0, we can use a concatenating iterator that sequentially
  // walks through the non-overlapping files in the level, opening them
  // lazily.
  for (int level = 1; level < config::kNumLevels; level++) {
    if (!files_[level].empty()) {
      iters->push_back(NewConcatenatingIterator(options, level));
    }
  }
}
// Callback from TableCache::Get()
namespace {
enum SaverState {
  kNotFound,
  kFound,
  kDeleted,
  kCorrupt,
};
struct Saver {
  SaverState state;
  const Comparator* ucmp;
  Slice user_key;
  std::string* value;
};
}  // namespace

static void SaveValue(void* arg, const Slice& ikey, const Slice& v) {
  Saver* s = reinterpret_cast<Saver*>(arg);
  ParsedInternalKey parsed_key;
  if (!ParseInternalKey(ikey, &parsed_key)) {
    s->state = kCorrupt;
  } else {
    if (s->ucmp->Compare(parsed_key.user_key, s->user_key) == 0) {
      s->state = (parsed_key.type == kTypeValue) ? kFound : kDeleted;
      if (s->state == kFound) {
        s->value->assign(v.data(), v.size());
      }
    }
  }
}

static bool NewestFirst(FileMetaData* a, FileMetaData* b) {
  return a->number > b->number;
}
void Version::ForEachOverlapping(Slice user_key, Slice internal_key, void* arg,
                                 bool (*func)(void*, int, FileMetaData*)) {
  // TODO(sanjay): Change Version::Get() to use this function.
  const Comparator* ucmp = vset_->icmp_.user_comparator();

  // Search level-0 in order from newest to oldest.
  std::vector<FileMetaData*> tmp;
  tmp.reserve(files_[0].size());
  for (uint32_t i = 0; i < files_[0].size(); i++) {
    FileMetaData* f = files_[0][i];
    if (ucmp->Compare(user_key, f->smallest.user_key()) >= 0 &&
        ucmp->Compare(user_key, f->largest.user_key()) <= 0) {
      tmp.push_back(f);
    }
  }
  if (!tmp.empty()) {
    std::sort(tmp.begin(), tmp.end(), NewestFirst);
    for (uint32_t i = 0; i < tmp.size(); i++) {
      if (!(*func)(arg, 0, tmp[i])) {
        return;
      }
    }
  }

  // Search other levels.
  for (int level = 1; level < config::kNumLevels; level++) {
    size_t num_files = files_[level].size();
    if (num_files == 0) continue;

    // Binary search to find earliest index whose largest key >= internal_key.
    uint32_t index = FindFile(vset_->icmp_, files_[level], internal_key);
    if (index < num_files) {
      FileMetaData* f = files_[level][index];
      if (ucmp->Compare(user_key, f->smallest.user_key()) < 0) {
        // All of "f" is past any data for user_key
      } else {
        if (!(*func)(arg, level, f)) {
          return;
        }
      }
    }
  }
}
Status Version::Get(const ReadOptions& options, const LookupKey& k,
                    std::string* value, GetStats* stats) {
  Slice ikey = k.internal_key();
  Slice user_key = k.user_key();
  const Comparator* ucmp = vset_->icmp_.user_comparator();
  Status s;

  stats->seek_file = nullptr;
  stats->seek_file_level = -1;
  FileMetaData* last_file_read = nullptr;
  int last_file_read_level = -1;

  // We can search level-by-level since entries never hop across
  // levels.  Therefore we are guaranteed that if we find data
  // in a smaller level, later levels are irrelevant.
  std::vector<FileMetaData*> tmp;
  FileMetaData* tmp2;
  for (int level = 0; level < config::kNumLevels; level++) {
    size_t num_files = files_[level].size();
    if (num_files == 0) continue;

    // Get the list of files to search in this level
    FileMetaData* const* files = &files_[level][0];
    if (level == 0) {
      // Level-0 files may overlap each other.  Find all files that
      // overlap user_key and process them in order from newest to oldest.
      tmp.reserve(num_files);
      for (uint32_t i = 0; i < num_files; i++) {
        FileMetaData* f = files[i];
        if (ucmp->Compare(user_key, f->smallest.user_key()) >= 0 &&
            ucmp->Compare(user_key, f->largest.user_key()) <= 0) {
          tmp.push_back(f);
        }
      }
      if (tmp.empty()) continue;

      std::sort(tmp.begin(), tmp.end(), NewestFirst);
      files = &tmp[0];
      num_files = tmp.size();
    } else {
      // Binary search to find earliest index whose largest key >= ikey.
      uint32_t index = FindFile(vset_->icmp_, files_[level], ikey);
      if (index >= num_files) {
        files = nullptr;
        num_files = 0;
      } else {
        tmp2 = files[index];
        if (ucmp->Compare(user_key, tmp2->smallest.user_key()) < 0) {
          // All of "tmp2" is past any data for user_key
          files = nullptr;
          num_files = 0;
        } else {
          files = &tmp2;
          num_files = 1;
        }
      }
    }

    for (uint32_t i = 0; i < num_files; ++i) {
      if (last_file_read != nullptr && stats->seek_file == nullptr) {
        // We have had more than one seek for this read.  Charge the 1st file.
        stats->seek_file = last_file_read;
        stats->seek_file_level = last_file_read_level;
      }

      FileMetaData* f = files[i];
      last_file_read = f;
      last_file_read_level = level;

      Saver saver;
      saver.state = kNotFound;
      saver.ucmp = ucmp;
      saver.user_key = user_key;
      saver.value = value;
      s = vset_->table_cache_->Get(options, f->number, f->file_size, ikey,
                                   &saver, SaveValue);
      if (!s.ok()) {
        return s;
      }
      switch (saver.state) {
        case kNotFound:
          break;  // Keep searching in other files
        case kFound:
          return s;
        case kDeleted:
          s = Status::NotFound(Slice());  // Use empty error message for speed
          return s;
        case kCorrupt:
          s = Status::Corruption("corrupted key for ", user_key);
          return s;
      }
    }
  }

  return Status::NotFound(Slice());  // Use an empty error message for speed
}
bool Version::UpdateStats(const GetStats& stats) {
  FileMetaData* f = stats.seek_file;
  if (f != nullptr) {
    f->allowed_seeks--;
    if (f->allowed_seeks <= 0 && file_to_compact_ == nullptr) {
      file_to_compact_ = f;
      file_to_compact_level_ = stats.seek_file_level;
      return true;
    }
  }
  return false;
}

bool Version::RecordReadSample(Slice internal_key) {
  ParsedInternalKey ikey;
  if (!ParseInternalKey(internal_key, &ikey)) {
    return false;
  }

  struct State {
    GetStats stats;  // Holds first matching file
    int matches;

    static bool Match(void* arg, int level, FileMetaData* f) {
      State* state = reinterpret_cast<State*>(arg);
      state->matches++;
      if (state->matches == 1) {
        // Remember first match.
        state->stats.seek_file = f;
        state->stats.seek_file_level = level;
      }
      // We can stop iterating once we have a second match.
      return state->matches < 2;
    }
  };

  State state;
  state.matches = 0;
  ForEachOverlapping(ikey.user_key, internal_key, &state, &State::Match);

  // Must have at least two matches since we want to merge across
  // files. But what if we have a single file that contains many
  // overwrites and deletions?  Should we have another mechanism for
  // finding such files?
  if (state.matches >= 2) {
    // 1MB cost is about 1 seek (see comment in Builder::Apply).
    return UpdateStats(state.stats);
  }
  return false;
}
void Version::Ref() { ++refs_; }

void Version::Unref() {
  assert(this != &vset_->dummy_versions_);
  assert(refs_ >= 1);
  --refs_;
  if (refs_ == 0) {
    delete this;
  }
}

bool Version::OverlapInLevel(int level, const Slice* smallest_user_key,
                             const Slice* largest_user_key) {
  return SomeFileOverlapsRange(vset_->icmp_, (level > 0), files_[level],
                               smallest_user_key, largest_user_key);
}

int Version::PickLevelForMemTableOutput(const Slice& smallest_user_key,
                                        const Slice& largest_user_key) {
  int level = 0;
  if (!OverlapInLevel(0, &smallest_user_key, &largest_user_key)) {
    // Push to next level if there is no overlap in next level,
    // and the #bytes overlapping in the level after that are limited.
    InternalKey start(smallest_user_key, kMaxSequenceNumber, kValueTypeForSeek);
    InternalKey limit(largest_user_key, 0, static_cast<ValueType>(0));
    std::vector<FileMetaData*> overlaps;
    while (level < config::kMaxMemCompactLevel) {
      if (OverlapInLevel(level + 1, &smallest_user_key, &largest_user_key)) {
        break;
      }
      if (level + 2 < config::kNumLevels) {
        // Check that file does not overlap too many grandparent bytes.
        GetOverlappingInputs(level + 2, &start, &limit, &overlaps);
        const int64_t sum = TotalFileSize(overlaps);
        if (sum > MaxGrandParentOverlapBytes(vset_->options_)) {
          break;
        }
      }
      level++;
    }
  }
  return level;
}
// Store in "*inputs" all files in "level" that overlap [begin,end]
void Version::GetOverlappingInputs(int level, const InternalKey* begin,
                                   const InternalKey* end,
                                   std::vector<FileMetaData*>* inputs) {
  assert(level >= 0);
  assert(level < config::kNumLevels);
  inputs->clear();
  Slice user_begin, user_end;
  if (begin != nullptr) {
    user_begin = begin->user_key();
  }
  if (end != nullptr) {
    user_end = end->user_key();
  }
  const Comparator* user_cmp = vset_->icmp_.user_comparator();
  for (size_t i = 0; i < files_[level].size();) {
    FileMetaData* f = files_[level][i++];
    const Slice file_start = f->smallest.user_key();
    const Slice file_limit = f->largest.user_key();
    if (begin != nullptr && user_cmp->Compare(file_limit, user_begin) < 0) {
      // "f" is completely before specified range; skip it
    } else if (end != nullptr && user_cmp->Compare(file_start, user_end) > 0) {
      // "f" is completely after specified range; skip it
    } else {
      inputs->push_back(f);
      if (level == 0) {
        // Level-0 files may overlap each other.  So check if the newly
        // added file has expanded the range.  If so, restart search.
        if (begin != nullptr && user_cmp->Compare(file_start, user_begin) < 0) {
          user_begin = file_start;
          inputs->clear();
          i = 0;
        } else if (end != nullptr &&
                   user_cmp->Compare(file_limit, user_end) > 0) {
          user_end = file_limit;
          inputs->clear();
          i = 0;
        }
      }
    }
  }
}
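The level-0 branch of GetOverlappingInputs restarts the scan whenever a newly added file widens the range, so the result ends up closed under overlap. A minimal sketch of that restart logic follows, assuming std::string user keys and non-null bounds; KeyRange, OverlappingSketch, and the expand flag (standing in for the level == 0 check) are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical stand-in for FileMetaData: user-key bounds only.
struct KeyRange {
  std::string smallest;
  std::string largest;
};

// Returns the indices of all files that overlap [begin, end].
// With expand == true (the level-0 case), a file that sticks out of the
// current range widens it and the scan restarts from the first file.
std::vector<size_t> OverlappingSketch(const std::vector<KeyRange>& files,
                                      std::string begin, std::string end,
                                      bool expand) {
  std::vector<size_t> inputs;
  for (size_t i = 0; i < files.size();) {
    const KeyRange& f = files[i++];
    if (f.largest < begin) {
      // File is completely before the range; skip it.
    } else if (f.smallest > end) {
      // File is completely after the range; skip it.
    } else {
      inputs.push_back(i - 1);
      if (expand) {
        if (f.smallest < begin) {
          begin = f.smallest;
          inputs.clear();
          i = 0;
        } else if (f.largest > end) {
          end = f.largest;
          inputs.clear();
          i = 0;
        }
      }
    }
  }
  return inputs;
}
```

For files [a,c], [b,f], [e,h], [x,z] and the query range [d,d], the non-expanding scan selects only the second file, while the expanding scan pulls in the first three because each one overlaps a file that is already selected.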
std::string Version::DebugString() const {
  std::string r;
  for (int level = 0; level < config::kNumLevels; level++) {
    // E.g.,
    //   --- level 1 ---
    //   17:123['a' .. 'd']
    //   20:43['e' .. 'g']
    r.append("--- level ");
    AppendNumberTo(&r, level);
    r.append(" ---\n");
    const std::vector<FileMetaData*>& files = files_[level];
    for (size_t i = 0; i < files.size(); i++) {
      r.push_back(' ');
      AppendNumberTo(&r, files[i]->number);
      r.push_back(':');
      AppendNumberTo(&r, files[i]->file_size);
      r.append("[");
      r.append(files[i]->smallest.DebugString());
      r.append(" .. ");
      r.append(files[i]->largest.DebugString());
      r.append("]\n");
    }
  }
  return r;
}
// A helper class so we can efficiently apply a whole sequence
// of edits to a particular state without creating intermediate
// Versions that contain full copies of the intermediate state.
class VersionSet::Builder {
 private:
  // Helper to sort by v->files_[file_number].smallest
  struct BySmallestKey {
    const InternalKeyComparator* internal_comparator;

    bool operator()(FileMetaData* f1, FileMetaData* f2) const {
      int r = internal_comparator->Compare(f1->smallest, f2->smallest);
      if (r != 0) {
        return (r < 0);
      } else {
        // Break ties by file number
        return (f1->number < f2->number);
      }
    }
  };

  typedef std::set<FileMetaData*, BySmallestKey> FileSet;
  struct LevelState {
    std::set<uint64_t> deleted_files;
    FileSet* added_files;
  };

  VersionSet* vset_;
  Version* base_;
  LevelState levels_[config::kNumLevels];

 public:
  // Initialize a builder with the files from *base and other info from *vset
  Builder(VersionSet* vset, Version* base) : vset_(vset), base_(base) {
    base_->Ref();
    BySmallestKey cmp;
    cmp.internal_comparator = &vset_->icmp_;
    for (int level = 0; level < config::kNumLevels; level++) {
      levels_[level].added_files = new FileSet(cmp);
    }
  }

  ~Builder() {
    for (int level = 0; level < config::kNumLevels; level++) {
      const FileSet* added = levels_[level].added_files;
      std::vector<FileMetaData*> to_unref;
      to_unref.reserve(added->size());
      for (FileSet::const_iterator it = added->begin(); it != added->end();
           ++it) {
        to_unref.push_back(*it);
      }
      delete added;
      for (uint32_t i = 0; i < to_unref.size(); i++) {
        FileMetaData* f = to_unref[i];
        f->refs--;
        if (f->refs <= 0) {
          delete f;
        }
      }
    }
    base_->Unref();
  }

  // Apply all of the edits in *edit to the current state.
  void Apply(VersionEdit* edit) {
    // Update compaction pointers
    for (size_t i = 0; i < edit->compact_pointers_.size(); i++) {
      const int level = edit->compact_pointers_[i].first;
      vset_->compact_pointer_[level] =
          edit->compact_pointers_[i].second.Encode().ToString();
    }

    // Delete files
    const VersionEdit::DeletedFileSet& del = edit->deleted_files_;
    for (VersionEdit::DeletedFileSet::const_iterator iter = del.begin();
         iter != del.end(); ++iter) {
      const int level = iter->first;
      const uint64_t number = iter->second;
      levels_[level].deleted_files.insert(number);
    }

    // Add new files
    for (size_t i = 0; i < edit->new_files_.size(); i++) {
      const int level = edit->new_files_[i].first;
      FileMetaData* f = new FileMetaData(edit->new_files_[i].second);
      f->refs = 1;

      // We arrange to automatically compact this file after
      // a certain number of seeks.  Let's assume:
      //   (1) One seek costs 10ms
      //   (2) Writing or reading 1MB costs 10ms (100MB/s)
      //   (3) A compaction of 1MB does 25MB of IO:
      //         1MB read from this level
      //         10-12MB read from next level (boundaries may be misaligned)
      //         10-12MB written to next level
      // This implies that 25 seeks cost the same as the compaction
      // of 1MB of data.  I.e., one seek costs approximately the
      // same as the compaction of 40KB of data.  We are a little
      // conservative and allow approximately one seek for every 16KB
      // of data before triggering a compaction.
      f->allowed_seeks = static_cast<int>((f->file_size / 16384U));
      if (f->allowed_seeks < 100) f->allowed_seeks = 100;

      levels_[level].deleted_files.erase(f->number);
      levels_[level].added_files->insert(f);
    }
  }

  // Save the current state in *v.
  void SaveTo(Version* v) {
    BySmallestKey cmp;
    cmp.internal_comparator = &vset_->icmp_;
    for (int level = 0; level < config::kNumLevels; level++) {
      // Merge the set of added files with the set of pre-existing files.
      // Drop any deleted files.  Store the result in *v.
      const std::vector<FileMetaData*>& base_files = base_->files_[level];
      std::vector<FileMetaData*>::const_iterator base_iter = base_files.begin();
      std::vector<FileMetaData*>::const_iterator base_end = base_files.end();
      const FileSet* added = levels_[level].added_files;
      v->files_[level].reserve(base_files.size() + added->size());
      for (FileSet::const_iterator added_iter = added->begin();
           added_iter != added->end(); ++added_iter) {
        // Add all smaller files listed in base_
        for (std::vector<FileMetaData*>::const_iterator bpos =
                 std::upper_bound(base_iter, base_end, *added_iter, cmp);
             base_iter != bpos; ++base_iter) {
          MaybeAddFile(v, level, *base_iter);
        }

        MaybeAddFile(v, level, *added_iter);
      }

      // Add remaining base files
      for (; base_iter != base_end; ++base_iter) {
        MaybeAddFile(v, level, *base_iter);
      }

#ifndef NDEBUG
      // Make sure there is no overlap in levels > 0
      if (level > 0) {
        for (uint32_t i = 1; i < v->files_[level].size(); i++) {
          const InternalKey& prev_end = v->files_[level][i - 1]->largest;
          const InternalKey& this_begin = v->files_[level][i]->smallest;
          if (vset_->icmp_.Compare(prev_end, this_begin) >= 0) {
            fprintf(stderr, "overlapping ranges in same level %s vs. %s\n",
                    prev_end.DebugString().c_str(),
                    this_begin.DebugString().c_str());
            abort();
          }
        }
      }
#endif
    }
  }

  void MaybeAddFile(Version* v, int level, FileMetaData* f) {
    if (levels_[level].deleted_files.count(f->number) > 0) {
      // File is deleted: do nothing
    } else {
      std::vector<FileMetaData*>* files = &v->files_[level];
      if (level > 0 && !files->empty()) {
        // Must not overlap
        assert(vset_->icmp_.Compare((*files)[files->size() - 1]->largest,
                                    f->smallest) < 0);
      }
      f->refs++;
      files->push_back(f);
    }
  }
};
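The seek budget that Builder::Apply assigns to each new file is one seek per 16KB of file size, with a floor of 100. A minimal standalone sketch of that rule follows; AllowedSeeksSketch is a hypothetical name and not part of LevelDB.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical standalone copy of the allowed_seeks rule from Builder::Apply:
// one seek per 16 KiB of file data, but never fewer than 100 seeks.
int AllowedSeeksSketch(uint64_t file_size) {
  int allowed = static_cast<int>(file_size / 16384U);
  if (allowed < 100) allowed = 100;
  return allowed;
}
```

Under this rule a 2 MiB file gets 128 seeks, and any file smaller than about 1.6 MB gets the minimum of 100. Version::UpdateStats decrements the counter and marks the file for compaction once it reaches zero.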
VersionSet::VersionSet(const std::string& dbname, const Options* options,
                       TableCache* table_cache,
                       const InternalKeyComparator* cmp)
    : env_(options->env),
      dbname_(dbname),
      options_(options),
      table_cache_(table_cache),
      icmp_(*cmp),
      next_file_number_(2),
      manifest_file_number_(0),  // Filled by Recover()
      last_sequence_(0),
      log_number_(0),
      prev_log_number_(0),
      descriptor_file_(nullptr),
      descriptor_log_(nullptr),
      dummy_versions_(this),
      current_(nullptr) {
  AppendVersion(new Version(this));
}

VersionSet::~VersionSet() {
  current_->Unref();
  assert(dummy_versions_.next_ == &dummy_versions_);  // List must be empty
  delete descriptor_log_;
  delete descriptor_file_;
}

void VersionSet::AppendVersion(Version* v) {
  // Make "v" current
  assert(v->refs_ == 0);
  assert(v != current_);
  if (current_ != nullptr) {
    current_->Unref();
  }
  current_ = v;
  v->Ref();

  // Append to linked list
  v->prev_ = dummy_versions_.prev_;
  v->next_ = &dummy_versions_;
  v->prev_->next_ = v;
  v->next_->prev_ = v;
}
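AppendVersion and Version::~Version together maintain a circular doubly-linked list anchored at dummy_versions_. A minimal sketch of that list discipline follows, with reference counting left out; Node, AppendBefore, and Unlink are hypothetical names used only for illustration.

```cpp
#include <cassert>

// Hypothetical list node. A fresh node points at itself, which is also
// how an empty list looks when the node is used as the sentinel.
struct Node {
  Node* prev;
  Node* next;
  int id;
  explicit Node(int i) : prev(this), next(this), id(i) {}
};

// Links n in just before the sentinel, i.e. at the tail of the list
// (the same four pointer updates as in AppendVersion).
void AppendBefore(Node* sentinel, Node* n) {
  n->prev = sentinel->prev;
  n->next = sentinel;
  n->prev->next = n;
  n->next->prev = n;
}

// Removes n from whatever list it is in (as Version::~Version does)
// and leaves it self-linked.
void Unlink(Node* n) {
  n->prev->next = n->next;
  n->next->prev = n->prev;
  n->prev = n;
  n->next = n;
}
```

Because the sentinel is always present, neither operation needs a special case for an empty list, and the check in VersionSet::~VersionSet that the sentinel's next pointer refers back to the sentinel is enough to confirm that every Version has been released.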
Status VersionSet::LogAndApply(VersionEdit* edit, port::Mutex* mu) {
  if (edit->has_log_number_) {
    assert(edit->log_number_ >= log_number_);
    assert(edit->log_number_ < next_file_number_);
  } else {
    edit->SetLogNumber(log_number_);
  }

  if (!edit->has_prev_log_number_) {
    edit->SetPrevLogNumber(prev_log_number_);
  }

  edit->SetNextFile(next_file_number_);
  edit->SetLastSequence(last_sequence_);

  Version* v = new Version(this);
  {
    Builder builder(this, current_);
    builder.Apply(edit);
    builder.SaveTo(v);
  }
  Finalize(v);

  // Initialize new descriptor log file if necessary by creating
  // a temporary file that contains a snapshot of the current version.
  std::string new_manifest_file;
  Status s;
  if (descriptor_log_ == nullptr) {
    // No reason to unlock *mu here since we only hit this path in the
    // first call to LogAndApply (when opening the database).
    assert(descriptor_file_ == nullptr);
    new_manifest_file = DescriptorFileName(dbname_, manifest_file_number_);
    edit->SetNextFile(next_file_number_);
    s = env_->NewWritableFile(new_manifest_file, &descriptor_file_);
    if (s.ok()) {
      descriptor_log_ = new log::Writer(descriptor_file_);
      s = WriteSnapshot(descriptor_log_);
    }
  }

  // Unlock during expensive MANIFEST log write
  {
    mu->Unlock();

    // Write new record to MANIFEST log
    if (s.ok()) {
      std::string record;
      edit->EncodeTo(&record);
      s = descriptor_log_->AddRecord(record);
      if (s.ok()) {
        s = descriptor_file_->Sync();
      }
      if (!s.ok()) {
        Log(options_->info_log, "MANIFEST write: %s\n", s.ToString().c_str());
      }
    }

    // If we just created a new descriptor file, install it by writing a
    // new CURRENT file that points to it.
    if (s.ok() && !new_manifest_file.empty()) {
      s = SetCurrentFile(env_, dbname_, manifest_file_number_);
    }

    mu->Lock();
  }

  // Install the new version
  if (s.ok()) {
    AppendVersion(v);
    log_number_ = edit->log_number_;
    prev_log_number_ = edit->prev_log_number_;
  } else {
    delete v;
    if (!new_manifest_file.empty()) {
      delete descriptor_log_;
      delete descriptor_file_;
      descriptor_log_ = nullptr;
      descriptor_file_ = nullptr;
      env_->DeleteFile(new_manifest_file);
    }
  }

  return s;
}
Status VersionSet::Recover(bool* save_manifest) {
  struct LogReporter : public log::Reader::Reporter {
    Status* status;
    virtual void Corruption(size_t bytes, const Status& s) {
      if (this->status->ok()) *this->status = s;
    }
  };

  // Read "CURRENT" file, which contains a pointer to the current manifest file
  std::string current;
  Status s = ReadFileToString(env_, CurrentFileName(dbname_), &current);
  if (!s.ok()) {
    return s;
  }
  if (current.empty() || current[current.size() - 1] != '\n') {
    return Status::Corruption("CURRENT file does not end with newline");
  }
  current.resize(current.size() - 1);

  std::string dscname = dbname_ + "/" + current;
  SequentialFile* file;
  s = env_->NewSequentialFile(dscname, &file);
  if (!s.ok()) {
    if (s.IsNotFound()) {
      return Status::Corruption("CURRENT points to a non-existent file",
                                s.ToString());
    }
    return s;
  }

  bool have_log_number = false;
  bool have_prev_log_number = false;
  bool have_next_file = false;
  bool have_last_sequence = false;
  uint64_t next_file = 0;
  uint64_t last_sequence = 0;
  uint64_t log_number = 0;
  uint64_t prev_log_number = 0;
  Builder builder(this, current_);

  {
    LogReporter reporter;
    reporter.status = &s;
    log::Reader reader(file, &reporter, true /*checksum*/,
                       0 /*initial_offset*/);
    Slice record;
    std::string scratch;
    while (reader.ReadRecord(&record, &scratch) && s.ok()) {
      VersionEdit edit;
      s = edit.DecodeFrom(record);
      if (s.ok()) {
        if (edit.has_comparator_ &&
            edit.comparator_ != icmp_.user_comparator()->Name()) {
          s = Status::InvalidArgument(
              edit.comparator_ + " does not match existing comparator ",
              icmp_.user_comparator()->Name());
        }
      }

      if (s.ok()) {
        builder.Apply(&edit);
      }

      if (edit.has_log_number_) {
        log_number = edit.log_number_;
        have_log_number = true;
      }

      if (edit.has_prev_log_number_) {
        prev_log_number = edit.prev_log_number_;
        have_prev_log_number = true;
      }

      if (edit.has_next_file_number_) {
        next_file = edit.next_file_number_;
        have_next_file = true;
      }

      if (edit.has_last_sequence_) {
        last_sequence = edit.last_sequence_;
        have_last_sequence = true;
      }
    }
  }
  delete file;
  file = nullptr;

  if (s.ok()) {
    if (!have_next_file) {
      s = Status::Corruption("no meta-nextfile entry in descriptor");
    } else if (!have_log_number) {
      s = Status::Corruption("no meta-lognumber entry in descriptor");
    } else if (!have_last_sequence) {
      s = Status::Corruption("no last-sequence-number entry in descriptor");
    }

    if (!have_prev_log_number) {
      prev_log_number = 0;
    }

    MarkFileNumberUsed(prev_log_number);
    MarkFileNumberUsed(log_number);
  }

  if (s.ok()) {
    Version* v = new Version(this);
    builder.SaveTo(v);
    // Install recovered version
    Finalize(v);
    AppendVersion(v);
    manifest_file_number_ = next_file;
    next_file_number_ = next_file + 1;
    last_sequence_ = last_sequence;
    log_number_ = log_number;
    prev_log_number_ = prev_log_number;

    // See if we can reuse the existing MANIFEST file.
    if (ReuseManifest(dscname, current)) {
      // No need to save new manifest
    } else {
      *save_manifest = true;
    }
  }

  return s;
}

bool VersionSet::ReuseManifest(const std::string& dscname,
                               const std::string& dscbase) {
  if (!options_->reuse_logs) {
    return false;
  }
  FileType manifest_type;
  uint64_t manifest_number;
  uint64_t manifest_size;
  if (!ParseFileName(dscbase, &manifest_number, &manifest_type) ||
      manifest_type != kDescriptorFile ||
      !env_->GetFileSize(dscname, &manifest_size).ok() ||
      // Make new compacted MANIFEST if old one is too big
      manifest_size >= TargetFileSize(options_)) {
    return false;
  }

  assert(descriptor_file_ == nullptr);
  assert(descriptor_log_ == nullptr);
  Status r = env_->NewAppendableFile(dscname, &descriptor_file_);
  if (!r.ok()) {
    Log(options_->info_log, "Reuse MANIFEST: %s\n", r.ToString().c_str());
    assert(descriptor_file_ == nullptr);
    return false;
  }

  Log(options_->info_log, "Reusing MANIFEST %s\n", dscname.c_str());
  descriptor_log_ = new log::Writer(descriptor_file_, manifest_size);
  manifest_file_number_ = manifest_number;
  return true;
}

void VersionSet::MarkFileNumberUsed(uint64_t number) {
  if (next_file_number_ <= number) {
    next_file_number_ = number + 1;
  }
}

void VersionSet::Finalize(Version* v) {
  // Precomputed best level for next compaction
  int best_level = -1;
  double best_score = -1;

  for (int level = 0; level < config::kNumLevels - 1; level++) {
    double score;
    if (level == 0) {
      // We treat level-0 specially by bounding the number of files
      // instead of number of bytes for two reasons:
      //
      // (1) With larger write-buffer sizes, it is nice not to do too
      // many level-0 compactions.
      //
      // (2) The files in level-0 are merged on every read and
      // therefore we wish to avoid too many files when the individual
      // file size is small (perhaps because of a small write-buffer
      // setting, or very high compression ratios, or lots of
      // overwrites/deletions).
      score = v->files_[level].size() /
              static_cast<double>(config::kL0_CompactionTrigger);
    } else {
      // Compute the ratio of current size to size limit.
      const uint64_t level_bytes = TotalFileSize(v->files_[level]);
      score =
          static_cast<double>(level_bytes) / MaxBytesForLevel(options_, level);
    }

    if (score > best_score) {
      best_level = level;
      best_score = score;
    }
  }

  v->compaction_level_ = best_level;
  v->compaction_score_ = best_score;
}

Status VersionSet::WriteSnapshot(log::Writer* log) {
  // TODO: Break up into multiple records to reduce memory usage on recovery?

  // Save metadata
  VersionEdit edit;
  edit.SetComparatorName(icmp_.user_comparator()->Name());

  // Save compaction pointers
  for (int level = 0; level < config::kNumLevels; level++) {
    if (!compact_pointer_[level].empty()) {
      InternalKey key;
      key.DecodeFrom(compact_pointer_[level]);
      edit.SetCompactPointer(level, key);
    }
  }

  // Save files
  for (int level = 0; level < config::kNumLevels; level++) {
    const std::vector<FileMetaData*>& files = current_->files_[level];
    for (size_t i = 0; i < files.size(); i++) {
      const FileMetaData* f = files[i];
      edit.AddFile(level, f->number, f->file_size, f->smallest, f->largest);
    }
  }

  std::string record;
  edit.EncodeTo(&record);
  return log->AddRecord(record);
}

int VersionSet::NumLevelFiles(int level) const {
  assert(level >= 0);
  assert(level < config::kNumLevels);
  return current_->files_[level].size();
}

const char* VersionSet::LevelSummary(LevelSummaryStorage* scratch) const {
  // Update code if kNumLevels changes
  static_assert(config::kNumLevels == 7, "");
  snprintf(scratch->buffer, sizeof(scratch->buffer),
           "files[ %d %d %d %d %d %d %d ]", int(current_->files_[0].size()),
           int(current_->files_[1].size()), int(current_->files_[2].size()),
           int(current_->files_[3].size()), int(current_->files_[4].size()),
           int(current_->files_[5].size()), int(current_->files_[6].size()));
  return scratch->buffer;
}

uint64_t VersionSet::ApproximateOffsetOf(Version* v, const InternalKey& ikey) {
  uint64_t result = 0;
  for (int level = 0; level < config::kNumLevels; level++) {
    const std::vector<FileMetaData*>& files = v->files_[level];
    for (size_t i = 0; i < files.size(); i++) {
      if (icmp_.Compare(files[i]->largest, ikey) <= 0) {
        // Entire file is before "ikey", so just add the file size
        result += files[i]->file_size;
      } else if (icmp_.Compare(files[i]->smallest, ikey) > 0) {
        // Entire file is after "ikey", so ignore
        if (level > 0) {
          // Files other than level 0 are sorted by meta->smallest, so
          // no further files in this level will contain data for
          // "ikey".
          break;
        }
      } else {
        // "ikey" falls in the range for this table.  Add the
        // approximate offset of "ikey" within the table.
        Table* tableptr;
        Iterator* iter = table_cache_->NewIterator(
            ReadOptions(), files[i]->number, files[i]->file_size, &tableptr);
        if (tableptr != nullptr) {
          result += tableptr->ApproximateOffsetOf(ikey.Encode());
        }
        delete iter;
      }
    }
  }
  return result;
}

void VersionSet::AddLiveFiles(std::set<uint64_t>* live) {
  for (Version* v = dummy_versions_.next_; v != &dummy_versions_;
       v = v->next_) {
    for (int level = 0; level < config::kNumLevels; level++) {
      const std::vector<FileMetaData*>& files = v->files_[level];
      for (size_t i = 0; i < files.size(); i++) {
        live->insert(files[i]->number);
      }
    }
  }
}

int64_t VersionSet::NumLevelBytes(int level) const {
  assert(level >= 0);
  assert(level < config::kNumLevels);
  return TotalFileSize(current_->files_[level]);
}

int64_t VersionSet::MaxNextLevelOverlappingBytes() {
  int64_t result = 0;
  std::vector<FileMetaData*> overlaps;
  for (int level = 1; level < config::kNumLevels - 1; level++) {
    for (size_t i = 0; i < current_->files_[level].size(); i++) {
      const FileMetaData* f = current_->files_[level][i];
      current_->GetOverlappingInputs(level + 1, &f->smallest, &f->largest,
                                     &overlaps);
      const int64_t sum = TotalFileSize(overlaps);
      if (sum > result) {
        result = sum;
      }
    }
  }
  return result;
}

// Stores the minimal range that covers all entries in inputs in
// *smallest, *largest.
// REQUIRES: inputs is not empty
void VersionSet::GetRange(const std::vector<FileMetaData*>& inputs,
                          InternalKey* smallest, InternalKey* largest) {
  assert(!inputs.empty());
  smallest->Clear();
  largest->Clear();
  for (size_t i = 0; i < inputs.size(); i++) {
    FileMetaData* f = inputs[i];
    if (i == 0) {
      *smallest = f->smallest;
      *largest = f->largest;
    } else {
      if (icmp_.Compare(f->smallest, *smallest) < 0) {
        *smallest = f->smallest;
      }
      if (icmp_.Compare(f->largest, *largest) > 0) {
        *largest = f->largest;
      }
    }
  }
}

// Stores the minimal range that covers all entries in inputs1 and inputs2
// in *smallest, *largest.
// REQUIRES: inputs1 and inputs2 are not both empty
void VersionSet::GetRange2(const std::vector<FileMetaData*>& inputs1,
                           const std::vector<FileMetaData*>& inputs2,
                           InternalKey* smallest, InternalKey* largest) {
  std::vector<FileMetaData*> all = inputs1;
  all.insert(all.end(), inputs2.begin(), inputs2.end());
  GetRange(all, smallest, largest);
}

Iterator* VersionSet::MakeInputIterator(Compaction* c) {
  ReadOptions options;
  options.verify_checksums = options_->paranoid_checks;
  options.fill_cache = false;

  // Level-0 files have to be merged together.  For other levels,
  // we will make a concatenating iterator per level.
  // TODO(opt): use concatenating iterator for level-0 if there is no overlap
  const int space = (c->level() == 0 ? c->inputs_[0].size() + 1 : 2);
  Iterator** list = new Iterator*[space];
  int num = 0;
  for (int which = 0; which < 2; which++) {
    if (!c->inputs_[which].empty()) {
      if (c->level() + which == 0) {
        const std::vector<FileMetaData*>& files = c->inputs_[which];
        for (size_t i = 0; i < files.size(); i++) {
          list[num++] = table_cache_->NewIterator(options, files[i]->number,
                                                  files[i]->file_size);
        }
      } else {
        // Create concatenating iterator for the files from this level
        list[num++] = NewTwoLevelIterator(
            new Version::LevelFileNumIterator(icmp_, &c->inputs_[which]),
            &GetFileIterator, table_cache_, options);
      }
    }
  }
  assert(num <= space);
  Iterator* result = NewMergingIterator(&icmp_, list, num);
  delete[] list;
  return result;
}

Compaction* VersionSet::PickCompaction() {
  Compaction* c;
  int level;

  // We prefer compactions triggered by too much data in a level over
  // the compactions triggered by seeks.
  const bool size_compaction = (current_->compaction_score_ >= 1);
  const bool seek_compaction = (current_->file_to_compact_ != nullptr);
  if (size_compaction) {
    level = current_->compaction_level_;
    assert(level >= 0);
    assert(level + 1 < config::kNumLevels);
    c = new Compaction(options_, level);

    // Pick the first file that comes after compact_pointer_[level]
    for (size_t i = 0; i < current_->files_[level].size(); i++) {
      FileMetaData* f = current_->files_[level][i];
      if (compact_pointer_[level].empty() ||
          icmp_.Compare(f->largest.Encode(), compact_pointer_[level]) > 0) {
        c->inputs_[0].push_back(f);
        break;
      }
    }
    if (c->inputs_[0].empty()) {
      // Wrap-around to the beginning of the key space
      c->inputs_[0].push_back(current_->files_[level][0]);
    }
  } else if (seek_compaction) {
    level = current_->file_to_compact_level_;
    c = new Compaction(options_, level);
    c->inputs_[0].push_back(current_->file_to_compact_);
  } else {
    return nullptr;
  }

  c->input_version_ = current_;
  c->input_version_->Ref();

  // Files in level 0 may overlap each other, so pick up all overlapping ones
  if (level == 0) {
    InternalKey smallest, largest;
    GetRange(c->inputs_[0], &smallest, &largest);
    // Note that the next call will discard the file we placed in
    // c->inputs_[0] earlier and replace it with an overlapping set
    // which will include the picked file.
    current_->GetOverlappingInputs(0, &smallest, &largest, &c->inputs_[0]);
    assert(!c->inputs_[0].empty());
  }

  SetupOtherInputs(c);

  return c;
}

// Finds the largest key in a vector of files. Returns true if files is not
// empty.
bool FindLargestKey(const InternalKeyComparator& icmp,
                    const std::vector<FileMetaData*>& files,
                    InternalKey* largest_key) {
  if (files.empty()) {
    return false;
  }
  *largest_key = files[0]->largest;
  for (size_t i = 1; i < files.size(); ++i) {
    FileMetaData* f = files[i];
    if (icmp.Compare(f->largest, *largest_key) > 0) {
      *largest_key = f->largest;
    }
  }
  return true;
}

// Finds the minimum file b2=(l2, u2) in level_files for which l2 > u1 and
// user_key(l2) = user_key(u1), where u1 is largest_key.
FileMetaData* FindSmallestBoundaryFile(
    const InternalKeyComparator& icmp,
    const std::vector<FileMetaData*>& level_files,
    const InternalKey& largest_key) {
  const Comparator* user_cmp = icmp.user_comparator();
  FileMetaData* smallest_boundary_file = nullptr;
  for (size_t i = 0; i < level_files.size(); ++i) {
    FileMetaData* f = level_files[i];
    if (icmp.Compare(f->smallest, largest_key) > 0 &&
        user_cmp->Compare(f->smallest.user_key(), largest_key.user_key()) ==
            0) {
      if (smallest_boundary_file == nullptr ||
          icmp.Compare(f->smallest, smallest_boundary_file->smallest) < 0) {
        smallest_boundary_file = f;
      }
    }
  }
  return smallest_boundary_file;
}

// Extracts the largest file b1 from |compaction_files| and then searches for a
// b2 in |level_files| for which user_key(u1) = user_key(l2). If it finds such a
// file b2 (known as a boundary file) it adds it to |compaction_files| and then
// searches again using this new upper bound.
//
// If there are two blocks, b1=(l1, u1) and b2=(l2, u2) and
// user_key(u1) = user_key(l2), and if we compact b1 but not b2 then a
// subsequent get operation will yield an incorrect result because it will
// return the record from b2 in level i rather than from b1 because it searches
// level by level for records matching the supplied user key.
//
// parameters:
//   in     level_files:      List of files to search for boundary files.
//   in/out compaction_files: List of files to extend by adding boundary files.
void AddBoundaryInputs(const InternalKeyComparator& icmp,
                       const std::vector<FileMetaData*>& level_files,
                       std::vector<FileMetaData*>* compaction_files) {
  InternalKey largest_key;

  // Quick return if compaction_files is empty.
  if (!FindLargestKey(icmp, *compaction_files, &largest_key)) {
    return;
  }

  bool continue_searching = true;
  while (continue_searching) {
    FileMetaData* smallest_boundary_file =
        FindSmallestBoundaryFile(icmp, level_files, largest_key);

    // If a boundary file was found advance largest_key, otherwise we're done.
    if (smallest_boundary_file != nullptr) {
      compaction_files->push_back(smallest_boundary_file);
      largest_key = smallest_boundary_file->largest;
    } else {
      continue_searching = false;
    }
  }
}

void VersionSet::SetupOtherInputs(Compaction* c) {
  const int level = c->level();
  InternalKey smallest, largest;

  AddBoundaryInputs(icmp_, current_->files_[level], &c->inputs_[0]);
  GetRange(c->inputs_[0], &smallest, &largest);

  current_->GetOverlappingInputs(level + 1, &smallest, &largest,
                                 &c->inputs_[1]);

  // Get entire range covered by compaction
  InternalKey all_start, all_limit;
  GetRange2(c->inputs_[0], c->inputs_[1], &all_start, &all_limit);

  // See if we can grow the number of inputs in "level" without
  // changing the number of "level+1" files we pick up.
  if (!c->inputs_[1].empty()) {
    std::vector<FileMetaData*> expanded0;
    current_->GetOverlappingInputs(level, &all_start, &all_limit, &expanded0);
    AddBoundaryInputs(icmp_, current_->files_[level], &expanded0);
    const int64_t inputs0_size = TotalFileSize(c->inputs_[0]);
    const int64_t inputs1_size = TotalFileSize(c->inputs_[1]);
    const int64_t expanded0_size = TotalFileSize(expanded0);
    if (expanded0.size() > c->inputs_[0].size() &&
        inputs1_size + expanded0_size <
            ExpandedCompactionByteSizeLimit(options_)) {
      InternalKey new_start, new_limit;
      GetRange(expanded0, &new_start, &new_limit);
      std::vector<FileMetaData*> expanded1;
      current_->GetOverlappingInputs(level + 1, &new_start, &new_limit,
                                     &expanded1);
      if (expanded1.size() == c->inputs_[1].size()) {
        Log(options_->info_log,
            "Expanding@%d %d+%d (%ld+%ld bytes) to %d+%d (%ld+%ld bytes)\n",
            level, int(c->inputs_[0].size()), int(c->inputs_[1].size()),
            long(inputs0_size), long(inputs1_size), int(expanded0.size()),
            int(expanded1.size()), long(expanded0_size), long(inputs1_size));
        smallest = new_start;
        largest = new_limit;
        c->inputs_[0] = expanded0;
        c->inputs_[1] = expanded1;
        GetRange2(c->inputs_[0], c->inputs_[1], &all_start, &all_limit);
      }
    }
  }

  // Compute the set of grandparent files that overlap this compaction
  // (parent == level+1; grandparent == level+2)
  if (level + 2 < config::kNumLevels) {
    current_->GetOverlappingInputs(level + 2, &all_start, &all_limit,
                                   &c->grandparents_);
  }

  // Update the place where we will do the next compaction for this level.
  // We update this immediately instead of waiting for the VersionEdit
  // to be applied so that if the compaction fails, we will try a different
  // key range next time.
  compact_pointer_[level] = largest.Encode().ToString();
  c->edit_.SetCompactPointer(level, largest);
}

Compaction* VersionSet::CompactRange(int level, const InternalKey* begin,
                                     const InternalKey* end) {
  std::vector<FileMetaData*> inputs;
  current_->GetOverlappingInputs(level, begin, end, &inputs);
  if (inputs.empty()) {
    return nullptr;
  }

  // Avoid compacting too much in one shot in case the range is large.
  // But we cannot do this for level-0 since level-0 files can overlap
  // and we must not pick one file and drop another older file if the
  // two files overlap.
  if (level > 0) {
    const uint64_t limit = MaxFileSizeForLevel(options_, level);
    uint64_t total = 0;
    for (size_t i = 0; i < inputs.size(); i++) {
      uint64_t s = inputs[i]->file_size;
      total += s;
      if (total >= limit) {
        inputs.resize(i + 1);
        break;
      }
    }
  }

  Compaction* c = new Compaction(options_, level);
  c->input_version_ = current_;
  c->input_version_->Ref();
  c->inputs_[0] = inputs;
  SetupOtherInputs(c);
  return c;
}

Compaction::Compaction(const Options* options, int level)
    : level_(level),
      max_output_file_size_(MaxFileSizeForLevel(options, level)),
      input_version_(nullptr),
      grandparent_index_(0),
      seen_key_(false),
      overlapped_bytes_(0) {
  for (int i = 0; i < config::kNumLevels; i++) {
    level_ptrs_[i] = 0;
  }
}

Compaction::~Compaction() {
  if (input_version_ != nullptr) {
    input_version_->Unref();
  }
}

bool Compaction::IsTrivialMove() const {
  const VersionSet* vset = input_version_->vset_;
  // Avoid a move if there is lots of overlapping grandparent data.
  // Otherwise, the move could create a parent file that will require
  // a very expensive merge later on.
  return (num_input_files(0) == 1 && num_input_files(1) == 0 &&
          TotalFileSize(grandparents_) <=
              MaxGrandParentOverlapBytes(vset->options_));
}

void Compaction::AddInputDeletions(VersionEdit* edit) {
  for (int which = 0; which < 2; which++) {
    for (size_t i = 0; i < inputs_[which].size(); i++) {
      edit->DeleteFile(level_ + which, inputs_[which][i]->number);
    }
  }
}

bool Compaction::IsBaseLevelForKey(const Slice& user_key) {
  // Maybe use binary search to find right entry instead of linear search?
  const Comparator* user_cmp = input_version_->vset_->icmp_.user_comparator();
  for (int lvl = level_ + 2; lvl < config::kNumLevels; lvl++) {
    const std::vector<FileMetaData*>& files = input_version_->files_[lvl];
    for (; level_ptrs_[lvl] < files.size();) {
      FileMetaData* f = files[level_ptrs_[lvl]];
      if (user_cmp->Compare(user_key, f->largest.user_key()) <= 0) {
        // We've advanced far enough
        if (user_cmp->Compare(user_key, f->smallest.user_key()) >= 0) {
          // Key falls in this file's range, so definitely not base level
          return false;
        }
        break;
      }
      level_ptrs_[lvl]++;
    }
  }
  return true;
}

bool Compaction::ShouldStopBefore(const Slice& internal_key) {
  const VersionSet* vset = input_version_->vset_;
  // Scan to find earliest grandparent file that contains key.
  const InternalKeyComparator* icmp = &vset->icmp_;
  while (grandparent_index_ < grandparents_.size() &&
         icmp->Compare(internal_key,
                       grandparents_[grandparent_index_]->largest.Encode()) >
             0) {
    if (seen_key_) {
      overlapped_bytes_ += grandparents_[grandparent_index_]->file_size;
    }
    grandparent_index_++;
  }
  seen_key_ = true;

  if (overlapped_bytes_ > MaxGrandParentOverlapBytes(vset->options_)) {
    // Too much overlap for current output; start new output
    overlapped_bytes_ = 0;
    return true;
  } else {
    return false;
  }
}

void Compaction::ReleaseInputs() {
  if (input_version_ != nullptr) {
    input_version_->Unref();
    input_version_ = nullptr;
  }
}

}  // namespace leveldb