|
|
- # AI Data Index Design
-
-
- ### 1. Testing Steps
-
- 1. Make sure `CMake` and other build tools are installed:
- ```shell
- sudo apt-get install cmake build-essential
- ```
-
- 2. Create a `build` folder inside the `hnswlab` directory.
-
- 3. Change directory to the `build` folder:
- ```shell
- cd build
- ```
-
- 4. Run `CMake` to generate the build files:
- ```shell
- cmake ..
- ```
-
- 5. Build the project:
- ```shell
- make
- ```
-
- 6. Run the test program:
- ```shell
- ./hnsw_test data_file_path data_size query_file_path query_size groundtruth_file_path
- ```
-
- For example:
- ```shell
- ./hnsw_test ../dataset/siftsmall/siftsmall_base.fvecs 10000 ../dataset/siftsmall/siftsmall_query.fvecs 100 ../dataset/siftsmall/siftsmall_groundtruth.ivecs
- ```
-
- The test program reports the recall and the time cost of your algorithm.
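-
- The exact recall definition used by `hnsw_test` is not stated here. A common reading is the fraction of ground-truth neighbors that appear among the returned results. The sketch below assumes that reading; `recall_at_k` is a hypothetical helper, not part of the provided code, and it assumes neither array contains duplicate ids.
-
- ```c
- #include <stddef.h>
-
- /* Hypothetical helper: fraction of the k returned ids that also
-  * appear among the k ground-truth ids. Order is ignored. */
- double recall_at_k(const int *results, const int *groundtruth, size_t k)
- {
-     if (k == 0)
-         return 0.0;
-     size_t hits = 0;
-     for (size_t i = 0; i < k; i++) {
-         for (size_t j = 0; j < k; j++) {
-             if (results[i] == groundtruth[j]) {
-                 hits++;
-                 break; /* count each returned id at most once */
-             }
-         }
-     }
-     return (double)hits / (double)k;
- }
- ```
-
- Averaging this value over all queries gives a single recall figure for a run.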
-
- ### 2. Mission Description
-
- You need to implement two functions in `hnsw.h` and `hnsw.c` using the HNSW (Hierarchical Navigable Small World) algorithm:
-
- ```C
- HNSWContext *hnsw_init_context(const char *filename, size_t dim, size_t len); // load data and build graph
- void hnsw_approximate_knn(HNSWContext *ctx, VecData *q, int *results, int k); // search KNN results
- ```
-
- We have implemented data loading and provided a basic KNN algorithm. However, that implementation can only handle small datasets (the SIFTSMALL dataset). Please implement a new approximate KNN algorithm based on HNSW so that it can handle large datasets (the SIFT dataset) efficiently.
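-
- Two small building blocks that most HNSW implementations need are a distance function and the random layer assignment for a newly inserted node. The sketch below is only an illustration: the function names are hypothetical, `M` (the target number of links per node, at least 2) is an assumed parameter, and the level formula `floor(-ln(u) / ln(M))` follows the usual description of HNSW rather than anything in the provided code.
-
- ```c
- #include <math.h>
- #include <stddef.h>
- #include <stdlib.h>
-
- /* Squared Euclidean distance between two dim-dimensional vectors.
-  * Skipping the square root does not change the neighbor ordering. */
- float l2_sqr(const float *a, const float *b, size_t dim)
- {
-     float sum = 0.0f;
-     for (size_t i = 0; i < dim; i++) {
-         float d = a[i] - b[i];
-         sum += d * d;
-     }
-     return sum;
- }
-
- /* Map a uniform sample u in (0, 1] to the top layer of a new node.
-  * Each layer then holds roughly 1/M of the nodes of the layer below.
-  * Requires M >= 2. */
- int hnsw_level_from_uniform(double u, int M)
- {
-     return (int)floor(-log(u) / log((double)M));
- }
-
- /* Draw a random level with rand(); the +1 keeps u strictly positive. */
- int hnsw_random_level(int M)
- {
-     double u = ((double)rand() + 1.0) / ((double)RAND_MAX + 2.0);
-     return hnsw_level_from_uniform(u, M);
- }
- ```
-
- Depending on the toolchain, the math functions may require linking with `-lm`.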
-
- ### 3. Data Download
-
- Download the SIFTSMALL and SIFT datasets from http://corpus-texmex.irisa.fr/
-
- TODO: We should provide a script to download datasets automatically
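-
- For reference, the corpus site describes `.fvecs` files as a sequence of records, each a 4-byte integer dimension followed by that many 4-byte floats. Data loading is already provided in this project, so the sketch below only illustrates the layout. `fvecs_read_next` is a hypothetical name, and the code assumes a little-endian host with 4-byte `float`.
-
- ```c
- #include <stdint.h>
- #include <stdio.h>
- #include <stdlib.h>
-
- /* Read the next vector from an open .fvecs stream.
-  * Returns a malloc'd array of *dim floats, or NULL on EOF or error.
-  * The caller frees the returned array. */
- float *fvecs_read_next(FILE *fp, int32_t *dim)
- {
-     /* Each record starts with its dimension. */
-     if (fread(dim, sizeof *dim, 1, fp) != 1 || *dim <= 0)
-         return NULL;
-     float *vec = malloc((size_t)*dim * sizeof *vec);
-     if (vec == NULL)
-         return NULL;
-     /* The dimension is followed by that many float components. */
-     if (fread(vec, sizeof *vec, (size_t)*dim, fp) != (size_t)*dim) {
-         free(vec);
-         return NULL;
-     }
-     return vec;
- }
- ```
-
- The `.ivecs` ground-truth files use the same layout with 4-byte integers in place of floats.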
|