DEV Community

Eugene Yan

Originally published at eugeneyan.com

How to Install Google’s Scalable Nearest Neighbors (ScaNN) on Mac

A few months back, Google released Scalable Nearest Neighbors, ScaNN (Paper, Code), for efficient vector similarity search. It appeared to beat the state-of-the-art results on the angular-distance benchmarks, with more than 2x the throughput at a given recall level.

ANN Benchmarks on the GloVe embeddings (dim=100) (source)

Recently, I found some time to try it out but was frustrated by how tricky it was to install on a Mac. Here are the steps I took to install it successfully.

Step-by-step walkthrough

First, we install Bazel (the build tool) and the necessary compilers.

brew install bazel
brew install llvm
brew install gcc

Then, we set up our Python version via pyenv.

brew update && brew upgrade pyenv
pyenv --version
> pyenv 1.2.21

pyenv install 3.8.6  # Doesn't work with 3.9 yet
pyenv local 3.8.6
python --version
> Python 3.8.6

Now, we create our virtual environment.

python -m venv .venv
source .venv/bin/activate
pip install --upgrade pip

ScaNN is part of the google-research repo, which is huge: it has more than 200 directories, and we don’t need all of them. Thus, we’ll do the following to check out only the ScaNN directory.

git clone --depth 1 --filter=blob:none --no-checkout https://github.com/google-research/google-research.git
cd google-research
git checkout master -- scann
cd scann

Next, we’ll need to install the Python dependencies.

pip install wheel
python configure.py
# There might be complaints about "tensorflow 2.3.1 requires numpy<1.19.0,>=1.16.0, but you'll have numpy 1.19.2 which is incompatible." but it's fine

Several issues prevent a direct installation and we’ll be manually fixing them here.

First, we’ll update .bazelrc and .bazel-query.sh. (It’s not strictly necessary to update .bazel-query.sh, but we’ll do it anyway for completeness.) We should replace:

TF_SHARED_LIBRARY_NAME="ensorflow_framework.2"

With:

TF_SHARED_LIBRARY_NAME="libtensorflow_framework.2.dylib"
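
To double-check the library name before editing, one option is to list the TensorFlow framework library files that actually exist in the environment. The sketch below is not part of ScaNN: `find_framework_libs` is a hypothetical helper name, and it only matches file names starting with `libtensorflow_framework`. The directory to inspect can be obtained from `tf.sysconfig.get_lib()`, assuming TensorFlow is installed in the virtual environment.

```python
from pathlib import Path


def find_framework_libs(lib_dir):
    """Return the names of TensorFlow framework libraries found in lib_dir.

    Hypothetical helper for illustration; it only matches file names.
    """
    return sorted(p.name for p in Path(lib_dir).glob("libtensorflow_framework*"))


# Usage inside the virtual environment (assumes TensorFlow is installed):
#   import tensorflow as tf
#   print(find_framework_libs(tf.sysconfig.get_lib()))
```

If the list includes libtensorflow_framework.2.dylib, that is the value to use for TF_SHARED_LIBRARY_NAME; if it shows a different file name, that name is probably the one to use instead.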

Then, we’ll need to update the C++ includes by replacing (there are four of these):

#include <hash_set>

With:

#include <ext/hash_set>
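
Rather than hunting for the occurrences by hand, the replacement could be scripted. This is a minimal sketch, not something from the ScaNN repo: `patch_hash_set_includes` is a hypothetical helper, and it assumes the affected files end in .h or .cc and are UTF-8 text. It rewrites the include in place and returns the files it changed.

```python
from pathlib import Path

OLD = "#include <hash_set>"
NEW = "#include <ext/hash_set>"


def patch_hash_set_includes(root, suffixes=(".h", ".cc")):
    """Rewrite OLD to NEW in C++ files under root; return the files changed.

    The suffix list is an assumption; widen it if your checkout differs.
    """
    changed = []
    for path in sorted(Path(root).rglob("*")):
        if not path.is_file() or path.suffix not in suffixes:
            continue
        text = path.read_text(encoding="utf-8")
        if OLD in text:
            path.write_text(text.replace(OLD, NEW), encoding="utf-8")
            changed.append(path)
    return changed
```

Calling `patch_hash_set_includes(".")` from the scann directory should report the files it touched, which should cover the four occurrences mentioned above. Running it a second time changes nothing, since the new include no longer matches the old string.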

Now, we can build it via bazel. Instead of using clang-8 as specified, I just used the latest version of clang and it worked fine.

CC=/usr/local/opt/llvm/bin/clang CXX=/usr/local/opt/gcc/bin/gcc bazel build -c opt --copt=-mavx2 --copt=-mfma --cxxopt="-D_GLIBCXX_USE_CXX11_ABI=0" --cxxopt="-std=c++17" --copt=-fsized-deallocation --copt=-w :build_pip_pkg

If it builds successfully, we should see output similar to this.

INFO: Elapsed time: 316.366s, Critical Path: 206.32s
INFO: 1066 processes: 319 internal, 747 local.
INFO: Build completed successfully, 1066 total actions

Then, we build the Python wheel:

./bazel-bin/build_pip_pkg

And now we can install it:

pip install scann-1.1.1-<replace with your package suffix>

You can test if the installation was successful in Python:

import scann
scann.scann_ops_pybind.builder()

Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: builder() missing 3 required positional arguments: 'db', 'num_neighbors', and 'distance_measure'

If the installation was successful, you should get the TypeError above: the import worked, and builder() is only complaining about its missing arguments. Here’s a sample demo on using it.
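
Beyond the TypeError check, a slightly fuller smoke test might look like the sketch below. It indexes a small random dataset and runs one query. The helper names (`scann_installed`, `brute_force_demo`) and the dataset sizes are made up for illustration, and the `score_brute_force()`, `build()`, and `search()` calls assume the builder API described in the ScaNN README; adjust them if your version differs.

```python
import importlib.util


def scann_installed():
    """Return True if the scann package can be found in this environment."""
    return importlib.util.find_spec("scann") is not None


def brute_force_demo(num_points=1000, dim=16, k=5):
    """Index random vectors and return neighbor indices for the first one.

    Assumes the builder API from the ScaNN README (score_brute_force,
    build, search); this is a sketch, not the library's reference usage.
    """
    import numpy as np
    import scann

    rng = np.random.default_rng(0)
    db = rng.standard_normal((num_points, dim)).astype(np.float32)
    searcher = (
        scann.scann_ops_pybind.builder(db, k, "dot_product")
        .score_brute_force()
        .build()
    )
    neighbors, distances = searcher.search(db[0])
    return neighbors


if scann_installed():
    print(brute_force_demo())
else:
    print("scann is not importable in this environment")
```

Brute-force scoring is used here only because it needs no tuning on a tiny dataset; for real workloads the tree and quantization options in the ScaNN documentation are the ones to look at.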

Top comments (3)

stephenlagree

Thanks, I had some issues installing on the 1.0.0 version from source, it became easier once they released the pip package.

I wrote up a post about using it on docker if you are interested.
postoak.io/scann/ai/ml/2020/12/29/...

akinolaoke13

whenever I do git checkout master -- scann I get this error after it processes some already: error: cannot create standard output pipe for fetch-pack: Too many open files. I then encounter some issues when trying to change my .bazelrc file because I dont see where to make the '#include change'. I suspect these issues are not allowing my bazel build correctly cause I then get this error when I run 'CC=/usr/local/opt/llvm/bin/clang CXX=/usr/local/opt/gcc/bin/gcc bazel build -c opt --copt=-mavx2 --copt=-mfma --cxxopt="-D_GLIBCXX_USE_CXX11_ABI=0" --cxxopt="-std=c++17" --copt=-fsized-deallocation --copt=-w :build_pip_pkg':

ERROR: /Users/akoke/google-research/scann/BUILD.bazel:6:10: no such package 'scann/scann_ops/py': BUILD file not found in any of the following directories. Add a BUILD file to a directory to mark it as a package.

  • /Users/akoke/google-research/scann/scann/scann_ops/py and referenced by '//:build_pip_pkg'
ERROR: Analysis of target '//:build_pip_pkg' failed; build aborted: Analysis failed
INFO: Elapsed time: 0.262s
INFO: 0 processes.
FAILED: Build did NOT complete successfully (4 packages loaded, 5 targets configured)

Italo

had to do cd google-research before git checkout master -- scann
