LevelDB_TTL_

Author	SHA1	Message	Date
costan	7d8e41e49b	leveldb: Replace AtomicPointer with std::atomic. This CL removes AtomicPointer from leveldb's port interface. Its usage is replaced with std::atomic<> from the C++11 standard library. AtomicPointer was used to wrap flags, numbers, and pointers, so its instances are replaced with std::atomic<bool>, std::atomic<int>, std::atomic<size_t> and std::atomic<Node*>. This CL does not revise the memory ordering. AtomicPointer's methods are replaced mechanically with their std::atomic equivalents, even when the underlying usage is incorrect. (Example: DBImpl::has_imm_ is written using release stores, even though it is always read using relaxed ordering.) Revising the memory ordering is left for future CLs. ------------- Created by MOE: https://github.com/google/moe MOE_MIGRATED_REVID=237865146	6 years ago
costan	ed76289b25	Align windows_logger with posix_logger. Fixes GitHub issue #657. This CL also makes the Windows CI green. ------------- Created by MOE: https://github.com/google/moe MOE_MIGRATED_REVID=237255887	6 years ago
cmumford	c69d33b0ec	Added native support for Windows. This change adds a native Windows port (port_windows.h) and a Windows Env (WindowsEnv). Note1: "small" is defined when including <Windows.h> so some parameters were renamed to avoid conflict. Note2: leveldb::Env defines the method: "DeleteFile" which is also a constant defined when including <Windows.h>. The solution was to ensure this macro is defined in env.h which forces the function, when compiled, to be either DeleteFileA or DeleteFileW when building for MBCS or UNICODE respectively. This resolves #519 on GitHub. ------------- Created by MOE: https://github.com/google/moe MOE_MIGRATED_REVID=236364778	6 years ago
Adam Azarchs	75fceae700	Add O_CLOEXEC to open calls. This prevents file descriptors from leaking to child processes. When compiled for older (pre-2.6.23) kernels which lack support for O_CLOEXEC there is no change in behavior. With newer kernels, child processes will no longer inherit leveldb's file handles, which reduces the changes of accidentally corrupting the database. Fixes https://github.com/google/leveldb/issues/623	7 years ago
costan	296de8d5b8	leveldb: Fix PosixWritableFile::Sync() on Apple systems. Apple doesn't follow POSIX specifications for fsync(). Instead, fsync() guarantees to flush the buffer cache to the device, which means the data will survive kernel panics, but may not survive power outages. Applications that need stronger guarantees (like databases) need to use fcntl(F_FULLFSYNC). This CL switches PosixWritableFile::Sync() to get the stronger guarantees on Apple systems. The improved implementation follows the same principles as SQLite [1] and node.js [2]. Research for the fcntl() to fsync() fallback strategy: Apple's released source code at https://opensource.apple.com/ shows at least three different error codes being returned when a filesystem does not support F_FULLFSYNC. fcntl() is implemented in xnu-4903.221.2 in bsd/kern/kern_descrip.c, where it delegates to fcntl_nocancel(). The documentation for fcntl_nocancel() mentions error codes for some operations, but does not include F_FULLFSYNC. The F_FULLSYNC branch in fcntl_nocancel() calls VNOP_IOCTL(_, F_FULLSYNC, NULL, 0, _), whose return value sets the error code. VNOP_IOCTL() is implemented in bsd/vfs/kpi_vfs.c and calls the ioctl function in the vnode's operation vector. The per-filesystem function names follow the pattern _vnop_ioctl() for all the instances in opensource code: {hfs,msdosfs,nfs,ntfs,smbfs,webdav,zfs}_vnop_ioctl(). hfs-407.30.1, msdosfs-229.200.3, and nfs in xnu-4903.221.2 handle F_FULLFSYNC. ntfs-94.200.1 and smb-759.40.1 do not handle F_FULLFSYNC, and the default branch returns ENOSUP. webdav-380.200.1 also does not handle F_FULLFSYNC, but the default branch returns EINVAL. zfs-59 also does not handle F_FULLSYNC, and its default branch returns ENOTTY. From a different angle, Apple's ntfs-94.200.1 includes utility code that uses fcntl(F_FULLFSYNC) and falls back to fsync() just like we do, supporting the hypothesis that there is no good way to detect lack of F_FULLFSYNC support. Also, Apple's fcntl() man page [3] does not mention a way to detect lack of F_FULLFSYNC support. [1] https://www.sqlite.org/src/doc/trunk/src/os_unix.c [2] https://github.com/libuv/libuv/blob/master/src/unix/fs.c [3] https://developer.apple.com/library/archive/documentatiVon/System/Conceptual/ManPages_iPhoneOS/man2/fcntl.2.html Tested: https://travis-ci.org/pwnall/leveldb/builds/477318498 TAP global presubmit ------------- Created by MOE: https://github.com/google/moe MOE_MIGRATED_REVID=228593729	6 years ago
cmumford	af7abf06ea	Add back space to POSIX Logger. The space in between the header and log message was mistakenly omitted in a prior commit. Re-adding. ------------- Created by MOE: https://github.com/google/moe MOE_MIGRATED_REVID=228202737	6 years ago
costan	1cb3840881	Clean up env_posix.cc. General cleanup principles: * Use override when applicable. * Remove static when redundant (methods and globals in anonymous namespaces). * Use const on class members where possible. * Standardize on "status" for Status local variables. * Renames where clarity can be improved. * Qualify standard library names with std:: when possible, to distinguish from POSIX names. * Qualify POSIX names with the global namespace (::) when possible, to distinguish from standard library names. This also refactors the background thread synchronization logic so that it's statically analyzable. ------------- Created by MOE: https://github.com/google/moe MOE_MIGRATED_REVID=219212089	6 years ago
costan	a7dc502e9f	Rework once initialization in env_posix.cc. C++11 guarantees thread-safe initialization of static variables inside functions. This is a more restricted form of std::call_once or pthread_once_t (e.g., single call site), so the compiler might be able to generate better code [1]. Equally important, having less platform-dependent code in env_posix.cc makes it easier to port to other platforms. Due to the change above, this CL introduced a new approach for storing the singleton PosixEnv instance returned by Env::Default(). The new approach avoids a dynamic memory allocation, which eliminates the false positive from LeakSanitizer reported in https://github.com/google/leveldb/issues/539 and https://github.com/google/leveldb/issues/113 [1] https://stackoverflow.com/a/27206650/ ------------- Created by MOE: https://github.com/google/moe MOE_MIGRATED_REVID=214293129	7 years ago
costan	c43565dd39	C++11 cleanup for util/mutexlock.h. ------------- Created by MOE: https://github.com/google/moe MOE_MIGRATED_REVID=213398583	7 years ago
costan	73d5834ece	Rework threading in env_posix.cc. This commit replaces the use of pthreads in the POSIX port with std::thread and port::Mutex + port::CondVar. This is intended to simplify porting the env to a different platform. The indirect use of pthreads in PosixLogger is replaced with std:🧵:id(), based on an approach prototyped by @cmumfordx@. The pthreads dependency in CMakeFiles is not removed, because some C++ standard library implementations must be linked against pthreads for std::thread use. Figuring out this dependency is left for future work. Switching away from pthreads also fixes https://github.com/google/leveldb/issues/381 ------------- Created by MOE: https://github.com/google/moe MOE_MIGRATED_REVID=212478311	7 years ago
costan	05709fb43e	Remove InitOnce from the port API. This is not an API-breaking change, because it reduces the API that the leveldb embedder must implement. The project will build just fine against ports that still implement InitOnce. C++11 guarantees thread-safe initialization of static variables inside functions. This is a more restricted form of std::call_once or pthread_once_t (e.g., single call site), so the compiler might be able to generate better code [1]. Equally important, having less code in port_example.h makes it easier to port to other platforms. Due to the change above, this CL introduces a new approach for storing the singleton BytewiseComparatorImpl instance returned by BytewiseComparator(). The new approach avoids a dynamic memory allocation, which eliminates the false positive from LeakSanitizer reported in https://github.com/google/leveldb/issues/200 [1] https://stackoverflow.com/a/27206650/ ------------- Created by MOE: https://github.com/google/moe MOE_MIGRATED_REVID=212348004	7 years ago
costan	bb88f25115	Clean up PosixWritableFile in env_posix.cc. This is separated from the general cleanup because of the logic changes in SyncDirIfManifest(). General cleanup principles: * Use override when applicable. * Remove static when redundant (methods and globals in anonymous namespaces). * Use const on class members where possible. * Standardize on "status" for Status local variables. * Renames where clarity can be improved. * Qualify standard library names with std:: when possible, to distinguish from POSIX names. * Qualify POSIX names with the global namespace (::) when possible, to distinguish from standard library names. ------------- Created by MOE: https://github.com/google/moe MOE_MIGRATED_REVID=211709673	7 years ago
costan	7b945f2003	Clean up posix_logger.h. General cleanup principles: * Use override when applicable. * Use const on class members where possible. * Renames where clarity can be improved. * Qualify standard library names with std:: when possible, to distinguish from POSIX names. * Qualify POSIX names with the global namespace (::) when possible, to distinguish from standard library names. This also revamps the logic for putting together a message into the in-memory buffer before that is passed to fwrite(). While correct in practice, the current implementation advances a char pointer past the size of its buffer, which is technically undefined behavior. ------------- Created by MOE: https://github.com/google/moe MOE_MIGRATED_REVID=211472570	7 years ago
costan	03064cbbb2	Simplify Limiter in env_posix.cc. Now that we require C++11, we can use std::atomic<int>, which has primitives for most of the logic we need. As a bonus, the happy path for Limiter::Acquire() and Limiter::Release() only performs one atomic operation. ------------- Created by MOE: https://github.com/google/moe MOE_MIGRATED_REVID=211469518	7 years ago
costan	4de9594f6f	Add move constructor to Status. This will result in smaller code generation when Status instances are passed around. Benchmarks don't indicate a significant change either way. CPU: 48 * Intel(R) Xeon(R) CPU E5-2690 v3 @ 2.60GHz CPUCache: 30720 KB Keys: 16 bytes each Values: 100 bytes each (50 bytes after compression) Entries: 1000000 RawSize: 110.6 MB (estimated) FileSize: 62.9 MB (estimated) Baseline: fillseq : 3.589 micros/op; 30.8 MB/s fillsync : 4165.299 micros/op; 0.0 MB/s (1000 ops) fillrandom : 5.864 micros/op; 18.9 MB/s overwrite : 7.830 micros/op; 14.1 MB/s readrandom : 5.534 micros/op; (1000000 of 1000000 found) readrandom : 4.292 micros/op; (1000000 of 1000000 found) readseq : 0.312 micros/op; 354.1 MB/s readreverse : 0.501 micros/op; 220.8 MB/s compact : 886211.000 micros/op; readrandom : 3.518 micros/op; (1000000 of 1000000 found) readseq : 0.251 micros/op; 441.2 MB/s readreverse : 0.456 micros/op; 242.4 MB/s fill100K : 1329.723 micros/op; 71.7 MB/s (1000 ops) crc32c : 1.976 micros/op; 1976.7 MB/s (4K per op) snappycomp : 4.705 micros/op; 830.2 MB/s (output: 55.1%) snappyuncomp : 0.958 micros/op; 4079.1 MB/s acquireload : 0.727 micros/op; (each op is 1000 loads) New: fillseq : 3.129 micros/op; 35.4 MB/s fillsync : 2748.099 micros/op; 0.0 MB/s (1000 ops) fillrandom : 5.394 micros/op; 20.5 MB/s overwrite : 7.253 micros/op; 15.3 MB/s readrandom : 5.655 micros/op; (1000000 of 1000000 found) readrandom : 4.425 micros/op; (1000000 of 1000000 found) readseq : 0.298 micros/op; 371.3 MB/s readreverse : 0.508 micros/op; 217.9 MB/s compact : 885842.000 micros/op; readrandom : 3.545 micros/op; (1000000 of 1000000 found) readseq : 0.252 micros/op; 438.2 MB/s readreverse : 0.425 micros/op; 260.2 MB/s fill100K : 1418.347 micros/op; 67.2 MB/s (1000 ops) crc32c : 1.987 micros/op; 1966.0 MB/s (4K per op) snappycomp : 4.767 micros/op; 819.4 MB/s (output: 55.1%) snappyuncomp : 0.916 micros/op; 4264.9 MB/s acquireload : 0.665 micros/op; (each op is 1000 loads) ------------- Created by MOE: https://github.com/google/moe MOE_MIGRATED_REVID=194002392	7 years ago
costan	d177a0263c	Replace port_posix with port_stdcxx. The porting layer implements threading primitives: atomic pointers, condition variables, mutexes, thread-safe initialization. These are all specified in C++11, so the reference open source port implementation can become platform-independent. The porting layer will remain in place to allow the use of other implementations with more features, such as the built-in deadlock detection in abseil's Mutex. ------------- Created by MOE: https://github.com/google/moe MOE_MIGRATED_REVID=193245934	7 years ago
costan	8046a51b21	Add forgotten <limits> header to util/logging.cc. Commit `a0008deb67` introduced std::numeric_limits usage in logging.cc, but didn't #include <limits> ------------- Created by MOE: https://github.com/google/moe MOE_MIGRATED_REVID=192840190	7 years ago
costan	a0008deb67	Reimplement ConsumeDecimalNumber. The old implementation caused odd crashes on ARM, which were fixed by changing a local variable type. The main suspect is the use of a static local variable. This CL replaces the static local variable with constexpr, which still ensures the compiler sees the expressions as constants. The CL also replaces Slice operations in the functions' inner loop with iterator-style pointer operations, which can help the compiler generate less code. ------------- Created by MOE: https://github.com/google/moe MOE_MIGRATED_REVID=192832175	7 years ago
costan	1f7dd5d5f6	Add tests for ConsumeDecimalNumber. ConsumeDecimalNumber has fairly non-trivial logic, and a previous version has crashed inexplicably on Android. Having some test coverage will make it easier to tweak / simplify the function later on. ------------- Created by MOE: https://github.com/google/moe MOE_MIGRATED_REVID=192821751	7 years ago
costan	09217fd067	Replace NULL with nullptr in C++ files. ------------- Created by MOE: https://github.com/google/moe MOE_MIGRATED_REVID=192365747	7 years ago
costan	0db30413a4	leveldb: Add more thread safety annotations. After this CL, all classes with Mutex members should be covered by annotations. Exceptions are atomic members, which shouldn't need locking, and DBImpl members that cause errors when annotated, which will be tackled separately. ------------- Created by MOE: https://github.com/google/moe MOE_MIGRATED_REVID=190260865	7 years ago
costan	0fa5a4f7b1	Extend thread safety annotations. This CL makes it easier to reason about thread safety by: 1) Adding Clang thread safety annotations according to comments. 2) Expanding a couple of variable names, without adding extra lines of code. 3) Adding const in a couple of places. 4) Replacing an always-non-null const pointer with a reference. 5) Fixing style warnings in the modified files. This CL does not annotate the DBImpl members that claim to be protected by the instance mutex, but are accessed without the mutex being held. Those members (and their unprotected accesses) will be addressed in future CLs. ------------- Created by MOE: https://github.com/google/moe MOE_MIGRATED_REVID=189354657	7 years ago
costan	8143c12f3f	Fix includes in util/testharness.h. This CL removes unused headers included by util/testharness.h, adds precise includes where the build breaks, and fixes style errors in the edited files. ------------- Created by MOE: https://github.com/google/moe MOE_MIGRATED_REVID=189331061	7 years ago
costan	aece2068d7	Remove extern from function declarations. External linkage is the default for function declarations in C++. This also fixes ClangTidy errors generated by removing the "extern" keyword as described above. ------------- Created by MOE: https://github.com/google/moe MOE_MIGRATED_REVID=188730416	7 years ago
costan	542590d2a8	leveldb: Include <algorithm> in util/env_test.cc. CL 170738066 introduced std::min and std::max to env_test.cc. These require the <algorithm> header. ------------- Created by MOE: https://github.com/google/moe MOE_MIGRATED_REVID=171024062	8 years ago
costan	8ae7998aab	Fix FD leak in POSIX Env. Deleting a PosixWritableFile without calling Close() leaks the file descriptor. While the API description in include/leveldb/env.h does not specify whether the caller is responsible for Close()ing the file before deleting it, all other Env file implementations do release underlying resources when destroyed, even if Close() is not called. The leak shows up when running db_tests on Mac Travis, or on a vanilla MacOS install. ------------- Created by MOE: https://github.com/google/moe MOE_MIGRATED_REVID=170906843	8 years ago
costan	d9a9e02edf	leveldb: Add tests for CL 170769101. This also removes std::unique_ptr introduced in CL 170738066, because it's C++11-only, and the open source version still supports older versions at the moment. ------------- Created by MOE: https://github.com/google/moe MOE_MIGRATED_REVID=170876919	8 years ago
costan	4447f9cace	Remove handling for unused LRUHandle representation special case. ------------- Created by MOE: https://github.com/google/moe MOE_MIGRATED_REVID=170876103	8 years ago
sanjay	2372ac574f	Fix file writing bug in CL 170738066. If the file already existed, we should have truncated it. This was not detected by leveldb tests since leveldb code avoids reusing same files, but there was code elsewhere that was directly using leveldb files and relying on this behavior. ------------- Created by MOE: https://github.com/google/moe MOE_MIGRATED_REVID=170769101	8 years ago
cmumford	1c75e88055	Fix use of uninitialized value in LRUHandle. If leveldb::Options::block_cache is set to a cache of zero capacity then it is possible for LRUHandle::next to be used without having been set. Conditional jump or move depends on uninitialised value(s): leveldb::(anonymous namespace)::LRUHandle::key() const (cache.cc:58) leveldb::(anonymous namespace)::LRUCache::Unref(leveldb::(anonymous namespace)::LRUHandle) (cache.cc:234) leveldb::(anonymous namespace)::LRUCache::Release(leveldb::Cache::Handle) (cache.cc:266) leveldb::(anonymous namespace)::ShardedLRUCache::Release(leveldb::Cache::Handle*) (cache.cc:375) leveldb::CacheTest::Insert(int, int, int) (cache_test.cc:59) This bug forced a commit reversion in Chromium. For more information see https://bugs.chromium.org/p/chromium/issues/detail?id=761398#c4 ------------- Created by MOE: https://github.com/google/moe MOE_MIGRATED_REVID=170749054	8 years ago
sanjay	7e12c00ecf	Fix issue 474: a race between the f*_unlocked() STDIO calls in env_posix.cc and concurrent application calls to fflush(NULL). The fix is to avoid using stdio in env_posix.cc but add our own buffering where we need it. Added a test to reproduce the bug. Added a test for Env reads/writes. ------------- Created by MOE: https://github.com/google/moe MOE_MIGRATED_REVID=170738066	8 years ago
costan	bcd9a8ea4a	Use portable CRC32C from google/crc32c. Benchmark results below. More results at `354d61ef97`. New, MacBookPro13,3 with Core i7 6920HQ: LevelDB: version 1.20 Keys: 16 bytes each Values: 100 bytes each (50 bytes after compression) Entries: 1000000 RawSize: 110.6 MB (estimated) FileSize: 62.9 MB (estimated) WARNING: Snappy compression is not enabled ------------------------------------------------ fillseq : 2.952 micros/op; 37.5 MB/s fillsync : 43.932 micros/op; 2.5 MB/s (1000 ops) fillrandom : 3.856 micros/op; 28.7 MB/s overwrite : 4.053 micros/op; 27.3 MB/s readrandom : 4.234 micros/op; (1000000 of 1000000 found) readrandom : 3.923 micros/op; (1000000 of 1000000 found) readseq : 0.201 micros/op; 550.8 MB/s readreverse : 0.356 micros/op; 310.6 MB/s compact : 436800.000 micros/op; readrandom : 2.375 micros/op; (1000000 of 1000000 found) readseq : 0.151 micros/op; 734.3 MB/s readreverse : 0.298 micros/op; 370.7 MB/s fill100K : 554.075 micros/op; 172.1 MB/s (1000 ops) crc32c : 1.393 micros/op; 2805.0 MB/s (4K per op) snappycomp : 3902.000 micros/op; (snappy failure) snappyuncomp : 3821.000 micros/op; (snappy failure) acquireload : 13.088 micros/op; (each op is 1000 loads) Baseline, MacBookPro13,3 with Core i7 6920HQ: LevelDB: version 1.20 Keys: 16 bytes each Values: 100 bytes each (50 bytes after compression) Entries: 1000000 RawSize: 110.6 MB (estimated) FileSize: 62.9 MB (estimated) WARNING: Snappy compression is not enabled ------------------------------------------------ fillseq : 3.000 micros/op; 36.9 MB/s fillsync : 46.721 micros/op; 2.4 MB/s (1000 ops) fillrandom : 3.922 micros/op; 28.2 MB/s overwrite : 4.080 micros/op; 27.1 MB/s readrandom : 4.409 micros/op; (1000000 of 1000000 found) readrandom : 3.895 micros/op; (1000000 of 1000000 found) readseq : 0.190 micros/op; 582.4 MB/s readreverse : 0.413 micros/op; 267.6 MB/s compact : 441076.000 micros/op; readrandom : 2.308 micros/op; (1000000 of 1000000 found) readseq : 0.170 micros/op; 651.2 MB/s readreverse : 0.302 micros/op; 366.2 MB/s fill100K : 614.289 micros/op; 155.3 MB/s (1000 ops) crc32c : 3.547 micros/op; 1101.2 MB/s (4K per op) snappycomp : 3393.000 micros/op; (snappy failure) snappyuncomp : 3171.000 micros/op; (snappy failure) acquireload : 12.761 micros/op; (each op is 1000 loads) ------------- Created by MOE: https://github.com/google/moe MOE_MIGRATED_REVID=170100372	8 years ago
cmumford	09a3c8e741	Switched variable type from int to uint64_t in ConsumeDecimalNumber. An Android test was occasionally crashing with a SEGV in ConsumeDecimalNumber Switching a local variable from an int to uint64_t eliminated these crashes. Speculating this is either a compiler, runtime library, or emulator issue. Switching this type to uint64_t also eliminates a compiler warning about comparing an int with a uint64_t. ------------- Created by MOE: https://github.com/google/moe MOE_MIGRATED_REVID=166399695	8 years ago
costan	8415f00eee	leveldb: Report missing CURRENT manifest file as database corruption. BTRFS reorders rename and write operations, so it is possible that a filesystem crash and recovery results in a situation where the file pointed to by CURRENT does not exist. DB::Open currently reports an I/O error in this case. Reporting database corruption is a better hint to the caller, which can attempt to recover the database or erase it and start over. This issue is not merely theoretical. It was reported as having showed up in the wild at https://github.com/google/leveldb/issues/195 and at https://crbug.com/738961. Also, asides from the BTRFS case described above, incorrect data in CURRENT seems like a possible corruption case that should be handled gracefully. The Env API changes here can be considered backwards compatible, because an implementation that returns Status::IOError instead of Status::NotFound will still get the same functionality as before. ------------- Created by MOE: https://github.com/google/moe MOE_MIGRATED_REVID=161432630	8 years ago
果冻	5b817400a0	fix comment	8 years ago
costan	f3f139737c	Separate Env tests from PosixEnv tests. env_test.cc defines EnvPosixTest which tests the Env implementation returned by Env::Default(). The naming is a bit unfortunate, as the tests in env_test.cc are written against the Env contract, and therefore are applicable to any Env implementation. An instance of the confusion caused by the naming is [] which added a dependency from env_test.cc to EnvPosixTestHelper, which is closely coupled to EnvPosix. This change disentangles EnvPosix-specific test code into a env_posix_test.cc file. The code there uses EnvPosixTestHelper and specifically targets the EnvPosix implementation. env_test.cc now implements EnvTest, and contains tests that are also applicable to other ports, which may define their own Env implementation. ------------- Created by MOE: https://github.com/google/moe MOE_MIGRATED_REVID=148914642	8 years ago
costan	ea175e28f8	Implement support for Intel crc32 instruction (SSE 4.2) This change authored by vadimskipin and submitted via: https://github.com/google/leveldb/pull/309 Changes made to support iOS builds and other architectures without support for SSE 4.2. db_bench reports original crc32 speed at: crc32c : 3.610 micros/op; 1082.0 MB/s (4K per op) with this change performance has increased to: crc32c : 0.843 micros/op; 4633.6 MB/s (4K per op) ------------- Created by MOE: https://github.com/google/moe MOE_MIGRATED_REVID=148694935	8 years ago
cmumford	95cd743e5e	Including <limits> for std::numeric_limits. ------------- Created by MOE: https://github.com/google/moe MOE_MIGRATED_REVID=146841327	8 years ago
cmumford	646c3588de	Limit the number of read-only files the POSIX Env will have open. Background compaction can create an unbounded number of leveldb::RandomAccessFile instances. On 64-bit systems mmap is used and file descriptors are only used beyond a certain number of mmap's. 32-bit systems to not use mmap at all. leveldb::RandomAccessFile does not observe Options.max_open_files so compaction could exhaust the file descriptor limit. This change uses getrlimit to determine the maximum number of open files and limits RandomAccessFile to approximately 20% of that value. ------------- Created by MOE: https://github.com/google/moe MOE_MIGRATED_REVID=143505556	8 years ago
corrado	a2fb086d07	Add option for max file size. The currend hard-coded value of 2M is inefficient in colossus. ------------- Created by MOE: https://github.com/google/moe MOE_MIGRATED_REVID=134391640	9 years ago
m3b	06a191b8de	fix problems in LevelDB's caching code Background: LevelDB uses a cache (util/cache.h, util/cache.cc) of (key,value) pairs for two purposes: - a cache of (table, file handle) pairs - a cache of blocks The cache places the (key,value) pairs in a reference-counted wrapper. When it returns a value, it returns a reference to this wrapper. When the client has finished using the reference and its enclosed (key,value), it calls Release() to decrement the reference count. Each (key,value) pair has an associated resource usage. The cache maintains the sum of the usages of the elements it holds, and removes values as needed to keep the sum below a capacity threshold. It maintains an LRU list so that it will remove the least-recently used elements first. The max_open_files option to LevelDB sets the size of the cache of (table, file handle) pairs. The option is not used in any other way. The observed behaviour: If LevelDB at any time used more file handles concurrently than the cache size set via max_open_files, it attempted to reduce the number by evicting entries from the table cache. This could happen most easily during compaction, and if max_open_files was low. Because the handles were in use, their reference count did not drop to zero, and so the usage sum in the cache was not modified by the evictions. Subsequent Insert() calls returned valid handles, but their entries were immediately evicted from the cache, which though empty still acted as though full. As a result, there was effectively no caching, and the number of open file handles rose []ly until it hit system-imposed limits and the process died. If one set max_open_files lower, the cache was more likely to exhibit this beahviour, and cause the process to run out of file descriptors. That is, max_open_files acted in almost exactly the opposite manner from what was intended. The problems: 1. The cache kept all elements on its LRU list eligible for capacity eviction---even those with outstanding references from clients. This was ineffective in reducing resource consumption because there was an outstanding reference, guaranteeing that the items remained. A secondary issue was that there is no guarantee that these in-use items will be the last things reached in the LRU chain, which actually recorded "least-recently requested" rather than "least-recently used". 2. The sum of usages was decremented not when a (key,value) was evicted from the cache, but when its reference count went to zero. Thus, when things were removed from the cache, either by garbage collection or via Erase(), the usage sum was not necessarily decreased. This allowed the cache to act as though full when it was in fact not, reducing caching effectiveness, and leading to more resources being consumed---the opposite of what the evictions were intended to achieve. 3. (minor) The cache's clients insert items into it by first looking up the key, and inserting only if no value is found. Although the cache has an internal lock, the clients use no locking to ensure atomicity of the Lookup/Insert pair. (see table/table.cc: block_cache->Insert() and db/table_cache.cc: cache_->Insert()). Thus, if two threads Insert() at about the same time, they can both Lookup(), find nothing, and both Insert(). The second Insert() would evict the first value, leaving each thread with a handle on its own version of the data, and with the second version in the cache. It would be better if both threads ended up with a handle on the same (key,value) pair, which implies it must be the first item inserted. This suggests that Insert() should not replace an existing value. This can be made safe with current usage inside LeveDB itself, but this is not easy to change first because Cache is a public interface, so to change the semantics of an existing call might break things, second because Cache is an abstract virtual class, so adding a new abstract virtual method may break other implementations, and third, the new method "insert without replacing" cannot be implemented in terms of the existing methods, so cannot be implemented with a non-abstract default. But fortunately, the effects of this issue are minor, so this issue is not fixed by this change. The changes: The assumption in the fixes is that it is always better to cache entries unless removal from the cache would lead to deallocation. Cache entries now have an "in_cache" boolean indicating whether the cache has a reference on the entry. The only ways that this can become false without the entry being passed to its "deleter" are via Erase(), via Insert() when an element with a duplicate key is inserted, or on destruction of the cache. The cache now keeps two linked lists instead of one. All items in the cache are in one list or the other, and never both. Items still referenced by clients but erased from the cache are in neither list. The lists are: - in-use: contains the items currently referenced by clients, in no particular order. (This list is used for invariant checking. If we removed the check, elements that would otherwise be on this list could be left as disconnected singleton lists.) - LRU: contains the items not currently referenced by clients, in LRU order A new internal Ref() method increments the reference count. If incrementing from 1 to 2 for an item in the cache, it is moved from the LRU list to the in-use list. The Unref() call now moves things from the in-use list to the LRU list if the reference count falls to 1, and the item is in the cache. It no longer adjusts the usage sum. The usage sum now reflects only what is in the cache, rather than including still-referenced items that have been evicted. The LRU_Append() now takes a "list" parameter so that it can be used to append either to the LRU list or the in-use list. Lookup() is modified to use the new Ref() call, rather than adjusting the reference count and LRU chain directly. Insert() eviction code is also modified to adjust the usage sum and the in_cache boolean of the evicted elements. Some LevelDB tests assume that there will be no caching whatsoever if the cache size is set to zero, so this is handled as a special case. A new private method FinishErase() is factored out with the common code from where items are removed from the cache. Erase() is modified to adjust the usage sum and the in_cache boolean of the erased elements, and to use FinishErase(). Prune() is modified to use FinishErase() also, and to make use of the fact that the lru_ list now contains only items with reference count 1. - EvictionPolicy is modified to test that an entry with an outstanding handle is not evicted. This test fails with the old cache.cc. - A new test case UseExceedsCacheSize verifies that even when the cache is overfull of entries with outstanding handles, none are evicted. This test fails with the old cache.cc, and is the key issue that causes file descriptors to run out when the cache size is set too small. ------------- Created by MOE: https://github.com/google/moe MOE_MIGRATED_REVID=123247237	9 years ago
ssid	706b7f8d43	Resolve race when getting approximate-memory-usage property The write operations in the table happens without holding the mutex lock, but concurrent writes are avoided using "writers_" queue. The Arena::MemoryUsage could access the blocks when write happens. So, the memory usage is cached in atomic word and can be loaded from any thread safely. ------------- Created by MOE: https://github.com/google/moe MOE_MIGRATED_REVID=107573379	10 years ago
ssid	528c2bc6ad	Add "approximate-memory-usage" property to leveldb::DB::GetProperty The approximate RAM usage of the database is calculated from the memory allocated for write buffers and the block cache. This is to give an estimate of memory usage to leveldb clients. ------------- Created by MOE: https://github.com/google/moe MOE_MIGRATED_REVID=104222307	10 years ago
tzik	359b6bcec2	Add leveldb::Cache::Prune Prune() drops on-memory read cache of the database, so that the client can relief its memory shortage. ------------- Created by MOE: https://github.com/google/moe MOE_MIGRATED_REVID=101335710	10 years ago
pkasting	50e77a8263	Fix size_t/int comparison/conversion issues in leveldb. The create function took \|num_keys\| as an int, but callers and implementers wanted it to function as a size_t (e.g. passing std::vector::size() in, passing it to vector constructors as a size arg, indexing containers by it, etc.). This resulted in implicit conversions between the two types as well as warnings (found with Chromium's external copy of these sources, built with MSVC) about signed vs. unsigned comparisons. The leveldb sources were already widely using size_t elsewhere, e.g. for key and filter lengths, so using size_t here is not inconsistent with the existing code. However, it does change the public C API. ------------- Created by MOE: https://github.com/google/moe MOE_MIGRATED_REVID=101074871	10 years ago
Sanjay Ghemawat	ac1d69da31	LevelDB now attempts to reuse the preceding MANIFEST and log file when re-opened. (Based on a suggestion by cmumford.) "open" benchmark on my workstation speeds up significantly since we can now avoid three fdatasync calls and a compaction per open: Before: ~80000 microseconds After: ~130 microseconds Details: (1) Added Options::reuse_logs (currently defaults to false) to control new behavior. The intention is to change the default to true after some baking. (2) Added Env::NewAppendableFile() whose default implementation returns a not-supported error. (3) VersionSet::Recovery attempts to reuse the MANIFEST from which it is recovering. (4) DBImpl recovery attempts to reuse the last log file and memtable. (5) db_test.cc now tests a new configuration that sets reuse_logs to true. (6) fault_injection_test also tests a reuse_logs==true config. (7) Added a new recovery_test.	10 years ago
Chris Mumford	803d69203a	Release 1.18 Changes are: * Update version number to 1.18 * Replace the basic fprintf call with a call to fwrite in order to work around the apparent compiler optimization/rewrite failure that we are seeing with the new toolchain/iOS SDKs provided with Xcode6 and iOS8. * Fix ALL the header guards. * Createed a README.md with the LevelDB project description. * A new CONTRIBUTING file. * Don't implicitly convert uint64_t to size_t or int. Either preserve it as uint64_t, or explicitly cast. This fixes MSVC warnings about possible value truncation when compiling this code in Chromium. * Added a DumpFile() library function that encapsulates the guts of the "leveldbutil dump" command. This will allow clients to dump data to their log files instead of stdout. It will also allow clients to supply their own environment. * leveldb: Remove unused function 'ConsumeChar'. * leveldbutil: Remove unused member variables from WriteBatchItemPrinter. * OpenBSD, NetBSD and DragonflyBSD have _LITTLE_ENDIAN, so define PLATFORM_IS_LITTLE_ENDIAN like on FreeBSD. This fixes: * issue #143 * issue #198 * issue #249 * Switch from <cstdatomic> to <atomic>. The former never made it into the standard and doesn't exist in modern gcc versions at all. The later contains everything that leveldb was using from the former. This problem was noticed when porting to Portable Native Client where no memory barrier is defined. The fact that <cstdatomic> is missing normally goes unnoticed since memory barriers are defined for most architectures. * Make Hash() treat its input as unsigned. Before this change LevelDB files from platforms with different signedness of char were not compatible. This change fixes: issue #243 * Verify checksums of index/meta/filter blocks when paranoid_checks set. * Invoke all tools for iOS with xcrun. (This was causing problems with the new XCode 5.1.1 image on pulse.) * include <sys/stat.h> only once, and fix the following linter warning: "Found C system header after C++ system header" * When encountering a corrupted table file, return Status::Corruption instead of Status::InvalidArgument. * Support cygwin as build platform, patch is from https://code.google.com/p/leveldb/issues/detail?id=188 * Fix typo, merge patch from https://code.google.com/p/leveldb/issues/detail?id=159 * Fix typos and comments, and address the following two issues: * issue #166 * issue #241 * Add missing db synchronize after "fillseq" in the benchmark. * Removed unused variable in SeekRandom: value (issue #201)	11 years ago
David Grogan	0cfb990d58	Release LevelDB 1.15 - switched from mmap based writing to simpler stdio based writing. Has a minor impact (0.5 microseconds) on microbenchmarks for asynchronous writes. Synchronous writes speed up from 30ms to 10ms on linux/ext4. Should be much more reliable on diverse platforms. - compaction errors now immediately put the database into a read-only mode (until it is re-opened). As a downside, a disk going out of space and then space being created will require a re-open to recover from, whereas previously that would happen automatically. On the plus side, many corruption possibilities go away. - force the DB to enter an error-state so that all future writes fail when a synchronous log write succeeds but the sync fails. - repair now regenerates sstables that exhibit problems - fix issue 218 - Use native memory barriers on OSX - fix issue 212 - QNX build is broken - fix build on iOS with xcode 5 - make tests compile and pass on windows	11 years ago
David Grogan	0b9a89f40e	Release LevelDB 1.14 Fix issues 200, 201 Also, * Fix link to bigtable paper in docs. * New sstables will have the file extension .ldb. .sst files will continue to be recognized. * When building for iOS, use xcrun to execute the compiler. This may affect issue 177.	12 years ago
David Grogan	748539c183	LevelDB 1.13 Fix issues 77, 87, 182, 190. Additionally, fix the bug described in https://groups.google.com/d/msg/leveldb/yL6h1mAOc20/vLU64RylIdMJ where a large contiguous keyspace of deleted data was not getting compacted. Also fix a bug where options.max_open_files was not getting clamped properly.	12 years ago

1 2 3

117 Commits (a6b3a2012e9c598258a295aef74d88b796c47a2b)