# AI Data Index Design
### 1. Testing Steps
1. Make sure `CMake` and the other build tools are installed:
```shell
sudo apt-get install cmake build-essential
```
2. Create a `build` folder inside the `hnswlab` directory.
3. Change directory to the `build` folder:
```shell
cd build
```
4. Run `CMake` to generate the build files:
```shell
cmake ..
```
5. Build the project:
```shell
make
```
6. Run the test program:
```shell
./hnsw_test data_file_path data_size query_file_path query_size groundtruth_file_path
```
For example:
```shell
./hnsw_test ../dataset/siftsmall/siftsmall_base.fvecs 10000 ../dataset/siftsmall/siftsmall_query.fvecs 100 ../dataset/siftsmall/siftsmall_groundtruth.ivecs
```
The test program reports the recall and the time cost of your algorithm.
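
The exact recall formula used by the test program is not shown here. As a rough illustration, a common definition of recall for one query is the fraction of the `k` ground-truth neighbors that also appear in the `k` returned ids. The sketch below assumes that definition; the function name `recall_at_k` is hypothetical and is not part of the provided code.

```c
#include <assert.h>
#include <stddef.h>

/* Recall@k for a single query (hypothetical helper, not from the repository).
 * Counts how many of the k returned ids also occur in the k ground-truth ids.
 * Assumes neither array contains duplicate ids. */
double recall_at_k(const int *results, const int *groundtruth, size_t k)
{
    assert(k > 0);
    size_t hits = 0;
    for (size_t i = 0; i < k; i++) {
        for (size_t j = 0; j < k; j++) {
            if (results[i] == groundtruth[j]) {
                hits++;
                break;
            }
        }
    }
    return (double)hits / (double)k;
}
```

Averaging this value over all queries gives a single recall figure for a run.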
### 2. Mission Description
Implement the following two functions in `hnsw.h` and `hnsw.c` using the HNSW algorithm:
```C
HNSWContext *hnsw_init_context(const char *filename, size_t dim, size_t len); // load data and build graph
void hnsw_approximate_knn(HNSWContext *ctx, VecData *q, int *results, int k); // search KNN results
```
Data loading and a basic KNN algorithm are already implemented. However, the provided implementation can only handle small datasets (the SIFTSMALL dataset). Implement a new approximate KNN algorithm based on HNSW so that it can handle large datasets (the SIFT dataset) efficiently.
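
One building block of HNSW is greedy routing inside a single layer: starting from an entry point, repeatedly move to whichever neighbor is closest to the query until no neighbor is closer. The sketch below shows only that step. `HNSWContext` and `VecData` are defined in the repository and not shown here, so the sketch uses its own hypothetical `Layer` struct and function names; a full implementation also needs multi-layer insertion and a bounded candidate list during search.

```c
#include <assert.h>
#include <stddef.h>

/* Squared Euclidean distance between two dim-dimensional vectors. */
static float l2_sq(const float *a, const float *b, size_t dim)
{
    float sum = 0.0f;
    for (size_t i = 0; i < dim; i++) {
        float d = a[i] - b[i];
        sum += d * d;
    }
    return sum;
}

/* Hypothetical single-layer graph (not the repository's layout).
 * Node i has deg[i] neighbors stored at nbrs[i * max_deg + 0 .. deg[i] - 1]. */
typedef struct {
    const float *vecs;   /* n * dim floats, row-major */
    const int *nbrs;     /* n * max_deg neighbor ids */
    const int *deg;      /* number of valid neighbors per node */
    size_t dim;
    size_t max_deg;
} Layer;

/* Greedy routing: returns a node that is closer to q than all of its neighbors. */
int greedy_search_layer(const Layer *g, const float *q, int entry)
{
    assert(entry >= 0);
    int cur = entry;
    float cur_dist = l2_sq(g->vecs + (size_t)cur * g->dim, q, g->dim);
    int improved = 1;
    while (improved) {
        improved = 0;
        /* Scan every neighbor of the node we started this round from. */
        const int *nb = g->nbrs + (size_t)cur * g->max_deg;
        int n_nb = g->deg[cur];
        for (int i = 0; i < n_nb; i++) {
            float d = l2_sq(g->vecs + (size_t)nb[i] * g->dim, q, g->dim);
            if (d < cur_dist) {
                cur_dist = d;
                cur = nb[i];
                improved = 1;
            }
        }
    }
    return cur;
}
```

The loop terminates because the distance to the query strictly decreases each time the current node changes. The result is a local minimum of the graph, which is why HNSW pairs this routing with a wider candidate search on the bottom layer.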
### 3. Data Download
Download the datasets from http://corpus-texmex.irisa.fr/.

TODO: Provide a script that downloads the datasets automatically.
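
The `.fvecs` files from that site store each vector as a 4-byte integer dimension followed by that many 4-byte floats. The repository already ships its own loader, so the following is only a sketch for inspecting a downloaded file. The name `read_fvecs` is hypothetical, and the code assumes a little-endian host with 4-byte `float`.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>

/* Read up to max_vecs vectors from an .fvecs file (hypothetical helper).
 * Returns a malloc'd array of (*out_n) * (*out_dim) floats, or NULL on error.
 * The caller frees the returned array. */
float *read_fvecs(const char *path, size_t max_vecs, size_t *out_n, size_t *out_dim)
{
    assert(path && out_n && out_dim && max_vecs > 0);

    FILE *fp = fopen(path, "rb");
    if (!fp)
        return NULL;

    int32_t d = 0;
    float *data = NULL;
    size_t dim = 0, n = 0;
    int ok = 0;

    /* The first record's header fixes the dimension for the whole file. */
    if (fread(&d, sizeof d, 1, fp) == 1 && d > 0) {
        dim = (size_t)d;
        data = malloc(max_vecs * dim * sizeof *data);
    }
    while (data) {
        if (fread(data + n * dim, sizeof *data, dim, fp) != dim)
            break;                      /* truncated record: error */
        n++;
        if (n == max_vecs || fread(&d, sizeof d, 1, fp) != 1) {
            ok = 1;                     /* limit reached or end of file */
            break;
        }
        if (d <= 0 || (size_t)d != dim)
            break;                      /* inconsistent dimension: error */
    }
    fclose(fp);
    if (!ok) {
        free(data);
        return NULL;
    }
    *out_n = n;
    *out_dim = dim;
    return data;
}
```

The `.ivecs` ground-truth files use the same record layout with 4-byte integers in place of floats, so the same structure applies with the element type changed.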