Make sure CMake and other build tools are installed:
sudo apt-get install cmake build-essential
Create a build folder inside the hnswlab directory:
mkdir build
Change directory to the build folder:
cd build
Run CMake to generate the build files:
cmake ..
Build the project:
make
Run the test program:
./hnsw_test data_file_path data_size query_file_path query_size groundtruth_file_path
For example:
./hnsw_test ../dataset/siftsmall/siftsmall_base.fvecs 10000 ../dataset/siftsmall/siftsmall_query.fvecs 100 ../dataset/siftsmall/siftsmall_groundtruth.ivecs
The test program reports the recall and the time cost of your algorithm.
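The README does not say how recall is computed. A common definition is the fraction of the true k nearest neighbors that appear among the k returned ids, averaged over all queries. The sketch below follows that definition. The function name recall_at_k, its parameters, and the flat array layout are assumptions made for illustration and are not taken from the provided test program.

```c
#include <assert.h>
#include <stddef.h>

/* Average fraction of the true k nearest neighbors found in the results.
 * results:     n_queries rows of k returned ids
 * groundtruth: n_queries rows of gt_stride true ids, nearest first
 * Only the first k ids of each ground-truth row are compared. */
double recall_at_k(const int *results, const int *groundtruth,
                   size_t n_queries, size_t k, size_t gt_stride)
{
    assert(k > 0 && gt_stride >= k);
    size_t hits = 0;
    for (size_t q = 0; q < n_queries; q++) {
        const int *res = results + q * k;
        const int *gt = groundtruth + q * gt_stride;
        for (size_t i = 0; i < k; i++) {
            /* Count gt[i] once if it appears anywhere in the result row. */
            for (size_t j = 0; j < k; j++) {
                if (res[j] == gt[i]) {
                    hits++;
                    break;
                }
            }
        }
    }
    return n_queries ? (double)hits / (double)(n_queries * k) : 0.0;
}
```

The gt_stride parameter is there because a ground-truth file may store more neighbors per query than the k being evaluated.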
You need to implement two functions in hnsw.h and hnsw.c using the HNSW approach:
HNSWContext *hnsw_init_context(const char *filename, size_t dim, size_t len); // load data and build graph
void hnsw_approximate_knn(HNSWContext *ctx, VecData *q, int *results, int k); // search KNN results
We have implemented data loading and provided a very simple KNN algorithm. However, our implementation can only handle small datasets (the SIFTSMALL dataset). Please implement a new approximate KNN algorithm based on HNSW so that it can handle large datasets (the SIFT dataset) efficiently.
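As a starting point, the core primitive of HNSW search is a greedy walk on one graph layer: from an entry node, keep moving to whichever neighbor is closer to the query until no neighbor improves. The sketch below shows only that step, with a candidate list of size one. The GraphLayer struct, the flat neighbor layout, and the function names are hypothetical and unrelated to the HNSWContext or VecData types of this project. A full implementation would also need layer assignment, neighbor selection during insertion, and a larger candidate list on the bottom layer.

```c
#include <assert.h>
#include <stddef.h>

/* Squared Euclidean distance between two dim-dimensional vectors. */
static float l2_sq(const float *a, const float *b, size_t dim)
{
    float sum = 0.0f;
    for (size_t i = 0; i < dim; i++) {
        float d = a[i] - b[i];
        sum += d * d;
    }
    return sum;
}

/* Hypothetical layout for one layer: node i has degree[i] neighbors stored
 * at neighbors[i * max_degree], neighbors[i * max_degree + 1], ... */
typedef struct {
    const int *neighbors;
    const int *degree;
    size_t max_degree;
} GraphLayer;

/* Greedy walk: scan the neighbors of the current node, move to the closest
 * one that improves on the best distance so far, and repeat until no
 * neighbor is closer. Returns the id of the local minimum reached. */
int greedy_search_layer(const GraphLayer *layer, const float *vectors,
                        size_t dim, const float *query, int entry)
{
    assert(entry >= 0);
    int current = entry;
    float best = l2_sq(vectors + (size_t)current * dim, query, dim);
    int improved = 1;
    while (improved) {
        improved = 0;
        const int *nbrs = layer->neighbors + (size_t)current * layer->max_degree;
        int n = layer->degree[current];
        for (int i = 0; i < n; i++) {
            float d = l2_sq(vectors + (size_t)nbrs[i] * dim, query, dim);
            if (d < best) {
                best = d;
                current = nbrs[i];
                improved = 1;
            }
        }
    }
    return current;
}
```

The walk terminates because the best distance strictly decreases on every move. It can stop in a local minimum, which is why the result is approximate rather than exact.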
To download the datasets, please visit http://corpus-texmex.irisa.fr/
TODO: We should provide a script to download datasets automatically
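For reference, the .fvecs files from that site store each vector as a 4-byte integer dimension followed by that many 4-byte floats, in little-endian byte order. The .ivecs files use the same layout with integers in place of floats. The project already ships its own data loader, so the following is only an independent sketch of the format. The name read_fvecs and its signature are hypothetical, and the code assumes a little-endian host with 4-byte float.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>

/* Read up to max_vecs vectors from an .fvecs file into a newly allocated
 * row-major array. On success, stores the dimension and the number of
 * vectors read and returns the array (caller frees it). Returns NULL on
 * error or if the file holds no vectors. */
float *read_fvecs(const char *path, size_t max_vecs,
                  size_t *out_dim, size_t *out_count)
{
    assert(path && out_dim && out_count);
    FILE *fp = fopen(path, "rb");
    if (!fp)
        return NULL;

    float *data = NULL;
    size_t dim = 0, count = 0;
    int32_t d;
    while (count < max_vecs && fread(&d, sizeof d, 1, fp) == 1) {
        /* Every record must carry the same positive dimension. */
        if (d <= 0 || (dim != 0 && (size_t)d != dim))
            goto fail;
        if (dim == 0) {
            dim = (size_t)d;
            data = malloc(max_vecs * dim * sizeof *data);
            if (!data)
                goto fail;
        }
        if (fread(data + count * dim, sizeof *data, dim, fp) != dim)
            goto fail;
        count++;
    }
    fclose(fp);
    *out_dim = dim;
    *out_count = count;
    return data;

fail:
    free(data);
    fclose(fp);
    return NULL;
}
```

The max_vecs argument plays the same role as the data_size argument of the test program: it caps how many vectors are loaded from the file.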