10225501448 李度 10225101546 陈胤遒 10215501422 高宇菲

// Copyright (c) 2011 The LevelDB Authors. All rights reserved.
// Use of this source code is governed by a BSD-style license that can be
// found in the LICENSE file. See the AUTHORS file for names of contributors.
//
// We recover the contents of the descriptor from the other files we find.
// (1) Any log files are first converted to tables
// (2) We scan every table to compute
//     (a) smallest/largest for the table
//     (b) largest sequence number in the table
// (3) We generate descriptor contents:
//      - log number is set to zero
//      - next-file-number is set to 1 + largest file number we found
//      - last-sequence-number is set to largest sequence# found across
//        all tables (see 2b)
//      - compaction pointers are cleared
//      - every table file is added at level 0
//
// Possible optimization 1:
//   (a) Compute total size and use to pick appropriate max-level M
//   (b) Sort tables by largest sequence# in the table
//   (c) For each table: if it overlaps earlier table, place in level-0,
//       else place in level-M.
// Possible optimization 2:
//   Store per-table metadata (smallest, largest, largest-seq#, ...)
//   in the table's meta section to speed up ScanTable.

#include <stdint.h>
#include <string.h>  // strrchr

#include <string>
#include <vector>

#include "db/builder.h"
#include "db/db_impl.h"
#include "db/dbformat.h"
#include "db/filename.h"
#include "db/log_reader.h"
#include "db/log_writer.h"
#include "db/memtable.h"
#include "db/table_cache.h"
#include "db/version_edit.h"
#include "db/write_batch_internal.h"
#include "leveldb/comparator.h"
#include "leveldb/db.h"
#include "leveldb/env.h"

namespace leveldb {

namespace {

class Repairer {
 public:
  Repairer(const std::string& dbname, const Options& options)
      : dbname_(dbname),
        env_(options.env),
        icmp_(options.comparator),
        options_(SanitizeOptions(dbname, &icmp_, options)),
        owns_info_log_(options_.info_log != options.info_log),
        next_file_number_(1) {
    // TableCache can be small since we expect each table to be opened once.
    table_cache_ = new TableCache(dbname_, &options_, 10);
  }

  ~Repairer() {
    delete table_cache_;
    if (owns_info_log_) {
      delete options_.info_log;
    }
  }

  Status Run() {
    Status status = FindFiles();
    if (status.ok()) {
      ConvertLogFilesToTables();
      ExtractMetaData();
      status = WriteDescriptor();
    }
    if (status.ok()) {
      unsigned long long bytes = 0;
      for (size_t i = 0; i < tables_.size(); i++) {
        bytes += tables_[i].meta.file_size;
      }
      Log(env_, options_.info_log,
          "**** Repaired leveldb %s; "
          "recovered %d files; %llu bytes. "
          "Some data may have been lost. "
          "****",
          dbname_.c_str(),
          static_cast<int>(tables_.size()),
          bytes);
    }
    return status;
  }

 private:
  struct TableInfo {
    FileMetaData meta;
    SequenceNumber max_sequence;
  };

  std::string const dbname_;
  Env* const env_;
  InternalKeyComparator const icmp_;
  Options const options_;
  bool owns_info_log_;
  TableCache* table_cache_;
  VersionEdit edit_;

  std::vector<std::string> manifests_;
  std::vector<uint64_t> table_numbers_;
  std::vector<uint64_t> logs_;
  std::vector<TableInfo> tables_;
  uint64_t next_file_number_;

  Status FindFiles() {
    std::vector<std::string> filenames;
    Status status = env_->GetChildren(dbname_, &filenames);
    if (!status.ok()) {
      return status;
    }
    if (filenames.empty()) {
      return Status::IOError(dbname_, "repair found no files");
    }

    uint64_t number;
    FileType type;
    for (size_t i = 0; i < filenames.size(); i++) {
      if (ParseFileName(filenames[i], &number, &type)) {
        if (type == kDescriptorFile) {
          manifests_.push_back(filenames[i]);
        } else {
          if (number + 1 > next_file_number_) {
            next_file_number_ = number + 1;
          }
          if (type == kLogFile) {
            logs_.push_back(number);
          } else if (type == kTableFile) {
            table_numbers_.push_back(number);
          } else {
            // Ignore other files
          }
        }
      }
    }
    return status;
  }

  void ConvertLogFilesToTables() {
    for (size_t i = 0; i < logs_.size(); i++) {
      std::string logname = LogFileName(dbname_, logs_[i]);
      Status status = ConvertLogToTable(logs_[i]);
      if (!status.ok()) {
        Log(env_, options_.info_log, "Log #%llu: ignoring conversion error: %s",
            (unsigned long long) logs_[i],
            status.ToString().c_str());
      }
      ArchiveFile(logname);
    }
  }

  Status ConvertLogToTable(uint64_t log) {
    struct LogReporter : public log::Reader::Reporter {
      Env* env;
      WritableFile* info_log;
      uint64_t lognum;
      virtual void Corruption(size_t bytes, const Status& s) {
        // We print error messages for corruption, but continue repairing.
        Log(env, info_log, "Log #%llu: dropping %d bytes; %s",
            (unsigned long long) lognum,
            static_cast<int>(bytes),
            s.ToString().c_str());
      }
    };

    // Open the log file
    std::string logname = LogFileName(dbname_, log);
    SequentialFile* lfile;
    Status status = env_->NewSequentialFile(logname, &lfile);
    if (!status.ok()) {
      return status;
    }

    // Create the log reader.
    LogReporter reporter;
    reporter.env = env_;
    reporter.info_log = options_.info_log;
    reporter.lognum = log;
    // We intentionally make log::Reader do checksumming so that
    // corruptions cause entire commits to be skipped instead of
    // propagating bad information (like overly large sequence
    // numbers).
    log::Reader reader(lfile, &reporter, true/*checksum*/,
                       0/*initial_offset*/);

    // Read all the records and add to a memtable
    std::string scratch;
    Slice record;
    WriteBatch batch;
    MemTable* mem = new MemTable(icmp_);
    mem->Ref();
    int counter = 0;
    while (reader.ReadRecord(&record, &scratch)) {
      if (record.size() < 12) {
        reporter.Corruption(
            record.size(), Status::Corruption("log record too small"));
        continue;
      }
      WriteBatchInternal::SetContents(&batch, record);
      status = WriteBatchInternal::InsertInto(&batch, mem);
      if (status.ok()) {
        counter += WriteBatchInternal::Count(&batch);
      } else {
        Log(env_, options_.info_log, "Log #%llu: ignoring %s",
            (unsigned long long) log,
            status.ToString().c_str());
        status = Status::OK();  // Keep going with rest of file
      }
    }
    delete lfile;

    // We ignore any version edits generated by the conversion to a Table
    // since ExtractMetaData() will also generate edits.
    VersionEdit skipped;
    FileMetaData meta;
    meta.number = next_file_number_++;
    Iterator* iter = mem->NewIterator();
    status = BuildTable(dbname_, env_, options_, table_cache_, iter,
                        &meta, &skipped);
    delete iter;
    mem->Unref();
    mem = NULL;
    if (status.ok()) {
      if (meta.file_size > 0) {
        table_numbers_.push_back(meta.number);
      }
    }
    Log(env_, options_.info_log, "Log #%llu: %d ops saved to Table #%llu %s",
        (unsigned long long) log,
        counter,
        (unsigned long long) meta.number,
        status.ToString().c_str());
    return status;
  }

  void ExtractMetaData() {
    std::vector<TableInfo> kept;
    for (size_t i = 0; i < table_numbers_.size(); i++) {
      TableInfo t;
      t.meta.number = table_numbers_[i];
      Status status = ScanTable(&t);
      if (!status.ok()) {
        std::string fname = TableFileName(dbname_, table_numbers_[i]);
        Log(env_, options_.info_log, "Table #%llu: ignoring %s",
            (unsigned long long) table_numbers_[i],
            status.ToString().c_str());
        ArchiveFile(fname);
      } else {
        tables_.push_back(t);
      }
    }
  }

  Status ScanTable(TableInfo* t) {
    std::string fname = TableFileName(dbname_, t->meta.number);
    int counter = 0;
    Status status = env_->GetFileSize(fname, &t->meta.file_size);
    if (status.ok()) {
      Iterator* iter = table_cache_->NewIterator(
          ReadOptions(), t->meta.number, t->meta.file_size);
      bool empty = true;
      ParsedInternalKey parsed;
      t->max_sequence = 0;
      for (iter->SeekToFirst(); iter->Valid(); iter->Next()) {
        Slice key = iter->key();
        if (!ParseInternalKey(key, &parsed)) {
          Log(env_, options_.info_log, "Table #%llu: unparsable key %s",
              (unsigned long long) t->meta.number,
              EscapeString(key).c_str());
          continue;
        }

        counter++;
        if (empty) {
          empty = false;
          t->meta.smallest.DecodeFrom(key);
        }
        t->meta.largest.DecodeFrom(key);
        if (parsed.sequence > t->max_sequence) {
          t->max_sequence = parsed.sequence;
        }
      }
      if (!iter->status().ok()) {
        status = iter->status();
      }
      delete iter;
    }
    Log(env_, options_.info_log, "Table #%llu: %d entries %s",
        (unsigned long long) t->meta.number,
        counter,
        status.ToString().c_str());
    return status;
  }

  Status WriteDescriptor() {
    std::string tmp = TempFileName(dbname_, 1);
    WritableFile* file;
    Status status = env_->NewWritableFile(tmp, &file);
    if (!status.ok()) {
      return status;
    }

    SequenceNumber max_sequence = 0;
    for (size_t i = 0; i < tables_.size(); i++) {
      if (max_sequence < tables_[i].max_sequence) {
        max_sequence = tables_[i].max_sequence;
      }
    }

    edit_.SetComparatorName(icmp_.user_comparator()->Name());
    edit_.SetLogNumber(0);
    edit_.SetNextFile(next_file_number_);
    edit_.SetLastSequence(max_sequence);

    for (size_t i = 0; i < tables_.size(); i++) {
      // TODO(opt): separate out into multiple levels
      const TableInfo& t = tables_[i];
      edit_.AddFile(0, t.meta.number, t.meta.file_size,
                    t.meta.smallest, t.meta.largest);
    }

    //fprintf(stderr, "NewDescriptor:\n%s\n", edit_.DebugString().c_str());
    {
      log::Writer log(file);
      std::string record;
      edit_.EncodeTo(&record);
      status = log.AddRecord(record);
    }
    if (status.ok()) {
      status = file->Close();
    }
    delete file;
    file = NULL;

    if (!status.ok()) {
      env_->DeleteFile(tmp);
    } else {
      // Discard older manifests
      for (size_t i = 0; i < manifests_.size(); i++) {
        ArchiveFile(dbname_ + "/" + manifests_[i]);
      }

      // Install new manifest
      status = env_->RenameFile(tmp, DescriptorFileName(dbname_, 1));
      if (status.ok()) {
        status = SetCurrentFile(env_, dbname_, 1);
      } else {
        env_->DeleteFile(tmp);
      }
    }
    return status;
  }

  void ArchiveFile(const std::string& fname) {
    // Move into another directory.  E.g., for
    //    dir/foo
    // rename to
    //    dir/lost/foo
    const char* slash = strrchr(fname.c_str(), '/');
    std::string new_dir;
    if (slash != NULL) {
      new_dir.assign(fname.data(), slash - fname.data());
    }
    new_dir.append("/lost");
    env_->CreateDir(new_dir);  // Ignore error
    std::string new_file = new_dir;
    new_file.append("/");
    new_file.append((slash == NULL) ? fname.c_str() : slash + 1);
    Status s = env_->RenameFile(fname, new_file);
    Log(env_, options_.info_log, "Archiving %s: %s\n",
        fname.c_str(), s.ToString().c_str());
  }
};

}  // namespace

Status RepairDB(const std::string& dbname, const Options& options) {
  Repairer repairer(dbname, options);
  return repairer.Run();
}

}  // namespace leveldb
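
The path rewriting done by `ArchiveFile` (`dir/foo` becomes `dir/lost/foo`) can be tried in isolation. The following is a minimal sketch, not part of LevelDB: `ArchivedPath` is a hypothetical helper that only computes the destination name with `std::string`, and does not create the directory or rename anything through `Env`.

```cpp
#include <string>

// Hypothetical helper (an assumption for illustration, not a LevelDB API):
// returns the path ArchiveFile would rename `fname` to, without touching
// the filesystem.
std::string ArchivedPath(const std::string& fname) {
  const std::string::size_type slash = fname.rfind('/');
  std::string new_dir;
  if (slash != std::string::npos) {
    new_dir.assign(fname, 0, slash);  // everything before the last '/'
  }
  new_dir.append("/lost");
  std::string new_file = new_dir;
  new_file.append("/");
  // Append the base name: the part after the last '/', or the whole
  // string when there is no '/'.
  new_file.append(slash == std::string::npos ? fname
                                             : fname.substr(slash + 1));
  return new_file;
}
```

As in the original logic, a name with no `/` ends up under `/lost` at the filesystem root. `Repairer` builds its paths from `dbname_` through helpers such as `LogFileName` and `TableFileName`, so that case is unlikely to arise in practice.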