<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>DEV Community: thelogicdudes</title>
    <description>The latest articles on DEV Community by thelogicdudes (@logicdudes).</description>
    <link>https://dev.to/logicdudes</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F851366%2F5a73d237-efb0-4e12-ad36-3d98de49819d.png</url>
      <title>DEV Community: thelogicdudes</title>
      <link>https://dev.to/logicdudes</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://dev.to/feed/logicdudes"/>
    <language>en</language>
    <item>
      <title>How to Build a Machine Learning Recommendation Engine w/ TensorFlow &amp; HarperDB</title>
      <dc:creator>thelogicdudes</dc:creator>
      <pubDate>Wed, 24 Aug 2022 18:49:00 +0000</pubDate>
      <link>https://dev.to/harperfast/how-to-build-a-machine-learning-recommendation-engine-w-tensorflow-harperdb-51jo</link>
      <guid>https://dev.to/harperfast/how-to-build-a-machine-learning-recommendation-engine-w-tensorflow-harperdb-51jo</guid>
      <description>&lt;h4&gt;
  
  
  This article explains how to create a recommendation system using &lt;a href="https://harperdb.io/" rel="noopener noreferrer"&gt;HarperDB&lt;/a&gt; and &lt;a href="https://harperdb.io/docs/custom-functions/" rel="noopener noreferrer"&gt;Custom Functions&lt;/a&gt;.
&lt;/h4&gt;

&lt;p&gt;With HarperDB we are able to deploy a machine learning model to servers located on the edge of the Internet to provide users with content recommendations. We have two examples to demonstrate how this can be done: a book recommender and a song recommender.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fm2z64h9l9fpfnn0vo8z6.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fm2z64h9l9fpfnn0vo8z6.png" alt=" " width="800" height="521"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  An Intro to Recommendation Systems
&lt;/h2&gt;

&lt;p&gt;Recommendation engines are arguably the most widely used application of machine learning. We interact with them everywhere, and they have become the driving force for how we explore new content. And while some are less than perfect, even the weakest of them are miles ahead of the methods we previously had for finding new things.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fbdnu8mwamef9ss04082c.jpg" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fbdnu8mwamef9ss04082c.jpg" alt=" " width="480" height="360"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Take shows and movies, for example. Do you remember the preview channel… staring at it for hours, trying to guess which titles would be worth further exploration? The same was true for audio content, where we were limited to radio stations to find new music. And books were even more difficult. I'd walk along the bookstore aisles picking up random books to read their summaries before finally finding one to take a chance on.&lt;/p&gt;

&lt;p&gt;All of these methods did work, but they limited us. We could only truly explore the content which was made available by the system at hand - such as the preview channel, radio station, and bookstore shelves. But there is so much more content out there, especially today. So instead of having to choose from the limited options presented by the Best Sellers section at the bookstore, how does one find new content that’s based on their individual interests? &lt;/p&gt;

&lt;p&gt;Recommendation Systems!&lt;/p&gt;

&lt;h2&gt;
  
  
  What is a Recommendation System? How do they work?
&lt;/h2&gt;

&lt;p&gt;Recommendation systems are exactly what they sound like: systems that provide recommendations. You give one an idea of what you like, and it points you toward other things you might like.&lt;/p&gt;

&lt;p&gt;There are many ways to set up such a system. For example, if we're talking about products, a system could be built that finds all of the users who bought the same product as you, and then displays the most common products among that group. This could be expanded by adding more items to the input: instead of a single product, take three products that you bought, group together all of the users who also bought those three items, and then display the most common other products among that group.&lt;/p&gt;

&lt;p&gt;What if we include ratings with the items of interest? For example, if many people are watching one particular movie, does that automatically mean it’s a good movie to recommend? And if not, how would we prevent it from appearing in the recommendations? Once ratings and reviews are included, we could set up a filter that removes any poorly reviewed content, ensuring the group of items we're aggregating always contains positive content. This is a great starting point, and it works for many cases where you have a dataset of items and user interactions with those items.&lt;/p&gt;
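
&lt;p&gt;As a rough illustration of the aggregate-and-filter approach described above, here's a minimal sketch in plain Python. All of the data and names here are made up for demonstration; they aren't part of the HarperDB projects below.&lt;/p&gt;

```python
from collections import Counter

# Toy purchase data: user -> {item: rating out of 5}. Entirely hypothetical.
purchases = {
    'alice': {'book_a': 5, 'book_b': 4, 'book_c': 2},
    'bob':   {'book_a': 4, 'book_b': 5, 'book_d': 5},
    'carol': {'book_a': 5, 'book_d': 4, 'book_e': 1},
    'dave':  {'book_b': 3, 'book_e': 4},
}

def recommend(liked_items, min_rating=3, top_n=2):
    """Recommend items bought by users who share purchases with `liked_items`,
    skipping items the user already has and items that were rated poorly."""
    counts = Counter()
    for user, items in purchases.items():
        # only look at users who bought at least one of the same items
        if not liked_items.intersection(items):
            continue
        for item, rating in items.items():
            if item not in liked_items and rating >= min_rating:
                counts[item] += 1
    return [item for item, _ in counts.most_common(top_n)]

print(recommend({'book_a'}))
```

&lt;p&gt;Even this toy version shows the limitation discussed next: an item can only surface once enough overlapping users have interacted with it.&lt;/p&gt;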

&lt;p&gt;The challenging part arises from the evolving nature of content. How would a song, book, or product that's new get into the output of the recommendation system?&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F35uv2c28sv26t0rkb1j4.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F35uv2c28sv26t0rkb1j4.png" alt=" " width="800" height="450"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  Using Machine Learning
&lt;/h2&gt;

&lt;p&gt;While we could keep adding manual filters to the content by sorting the ratings and weighting interactions with new items, there would be areas where we fall short, especially when creating binary gates such as a positive/negative rating threshold to determine which items are included in the output.&lt;/p&gt;

&lt;p&gt;This is where machine learning takes over. Using libraries such as &lt;a href="https://www.tensorflow.org/recommenders" rel="noopener noreferrer"&gt;TensorFlow Recommenders&lt;/a&gt; with &lt;a href="https://keras.io/" rel="noopener noreferrer"&gt;Keras&lt;/a&gt; models, it's easy to shape the data in ways that allow the items and users to be viewed and compared from a multidimensional perspective. Qualitative features such as item categories and user profile attributes can be mapped into mathematical representations that can be quantitatively compared with one another, ultimately providing new insights and better recommendations.&lt;/p&gt;
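
&lt;p&gt;To make the embedding idea concrete, here's a small sketch using made-up vectors. In the real pipeline these vectors are learned by &lt;code&gt;tf.keras.layers.Embedding&lt;/code&gt; during training; the point here is only that once a qualitative feature becomes a vector, similarity becomes a number we can rank by.&lt;/p&gt;

```python
import numpy as np

# Hypothetical categories and a hypothetical "learned" embedding table:
# one 3-d row per category. The values are invented for illustration.
categories = ['rock', 'jazz', 'metal', 'classical']
cat_to_idx = {c: i for i, c in enumerate(categories)}
embeddings = np.array([
    [0.9, 0.1, 0.0],   # rock
    [0.1, 0.8, 0.3],   # jazz
    [0.8, 0.0, 0.2],   # metal
    [0.0, 0.7, 0.6],   # classical
])

def cosine(a, b):
    # cosine similarity: 1.0 means identical direction, 0.0 means unrelated
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

rock = embeddings[cat_to_idx['rock']]
sims = {c: cosine(rock, embeddings[cat_to_idx[c]]) for c in categories}
print(sims)
```

&lt;p&gt;With made-up vectors like these, "rock" comes out closer to "metal" than to "classical", which is exactly the kind of quantitative comparison the embedding layers make possible.&lt;/p&gt;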

&lt;p&gt;Using a machine learning pipeline like this also allows the models to be continuously trained: we take the results of the previous model, measure how successful its recommendations were at driving user interactions, and use that data to create better models. I recently wrote an article about using TensorFlowJS and HarperDB for Machine Learning if you’d like to learn more. &lt;/p&gt;

&lt;h2&gt;
  
  
  HarperDB Recommenders
&lt;/h2&gt;

&lt;p&gt;One of the awesome features of living on the edge is the ability to connect users to new insights with low latency and minimal traffic. With HarperDB Custom Functions, we can deploy a model to multiple nodes in a cluster, allowing the closest and most available node to return the result to the user.&lt;/p&gt;

&lt;p&gt;Let’s look at two example projects, a song recommender and a book recommender, to demonstrate how this looks when serving recommendations to a user based on content they provide.&lt;/p&gt;

&lt;p&gt;To keep the examples reusable and straightforward, these projects cover the serving side of the process, where an already-trained model answers requests. In production we would connect a second piece to this puzzle: tracking user interactions in a central location, training new models, and deploying them back out to the edge with HarperDB.&lt;/p&gt;

&lt;h2&gt;
  
  
  HarperDB Song Recommender
&lt;/h2&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fiy97ocu00uvvhvauaf3v.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fiy97ocu00uvvhvauaf3v.png" alt=" " width="678" height="181"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Github repo: &lt;a href="https://github.com/HarperDB/song-recommender" rel="noopener noreferrer"&gt;HarperDB/song-recommender&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;In our Song Recommender example, a user can find three of their favorite songs in the UI, and then the recommendation system returns other songs they're likely to enjoy.&lt;/p&gt;

&lt;p&gt;There's a dataset called &lt;a href="http://millionsongdataset.com/" rel="noopener noreferrer"&gt;The Million Song Dataset&lt;/a&gt; that contains very detailed information on over one million songs, ranging from audio analysis to the location of the artist.&lt;/p&gt;

&lt;p&gt;There's a nice subset of that data called &lt;a href="http://millionsongdataset.com/tasteprofile/" rel="noopener noreferrer"&gt;The Echo Nest Taste Profile Subset&lt;/a&gt; which is a list of users, songs, and play counts. Each row in the data contains a userid, songid, and playcount. This is what we used to build the model for this project.&lt;/p&gt;
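
&lt;p&gt;The Taste Profile subset is distributed as tab-separated (user, song, play count) triplets. Here's a rough sketch, using made-up sample rows, of how rows like that can be turned into per-user play lists ordered by play count; the variable name &lt;code&gt;users_songs&lt;/code&gt; matches the training snippet below, though the exact notebook code may differ.&lt;/p&gt;

```python
import io
from collections import defaultdict

# A few invented rows in the (user_id \t song_id \t play_count) triplet format.
raw = io.StringIO(
    "user_a\tSOAAA\t12\n"
    "user_a\tSOBBB\t3\n"
    "user_a\tSOCCC\t7\n"
    "user_b\tSOAAA\t1\n"
    "user_b\tSODDD\t9\n"
)

plays = defaultdict(list)
for line in raw:
    user, song, count = line.rstrip('\n').split('\t')
    plays[user].append((song, int(count)))

# users_songs: each user's songs ordered by play count, most played first
users_songs = {
    user: [song for song, _ in sorted(songs, key=lambda s: s[1], reverse=True)]
    for user, songs in plays.items()
}
print(users_songs['user_a'])
```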

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F0xf3tyavch84gc1c9qih.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F0xf3tyavch84gc1c9qih.png" alt=" " width="800" height="246"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h3&gt;
  
  
  Training the Model
&lt;/h3&gt;

&lt;p&gt;We created a Two-Tower model, which is a design where two latent vectors are created, compared during training to shape the embedding layers, and compared during inference to find the best match.&lt;/p&gt;

&lt;p&gt;Tower One is the Query Tower. This is essentially the input to the equation, which in this case is three songs that the user selects.&lt;/p&gt;

&lt;p&gt;Tower Two is the Candidate Tower. In this case it's the users in the dataset.&lt;/p&gt;

&lt;p&gt;We find the user in the dataset most similar to the person using the UI, and then provide that user's most played songs as recommendations.&lt;/p&gt;
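
&lt;p&gt;The final lookup step is simple enough to sketch with made-up data: once the model names the best-matching dataset user, we return that user's most played songs, minus whatever the UI user already selected. The names and data below are invented for illustration.&lt;/p&gt;

```python
# Invented example data: each dataset user's songs, most played first.
user_top_songs = {
    'user_a': ['song_1', 'song_2', 'song_3', 'song_4'],
    'user_b': ['song_3', 'song_5', 'song_6'],
}

def recommendations(best_match, selected_songs, top_n=3):
    """Return the matched user's most played songs, skipping the input songs."""
    picks = [s for s in user_top_songs[best_match] if s not in selected_songs]
    return picks[:top_n]

print(recommendations('user_a', {'song_2'}))
```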

&lt;p&gt;To look at the training specifics, here's the &lt;a href="https://github.com/HarperDB/song-recommender/blob/main/notebooks/song_recommender.ipynb" rel="noopener noreferrer"&gt;training notebook&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;The basic steps that are taken are as follows:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Find the most played songs for each user&lt;/li&gt;
&lt;li&gt;Create a query/candidate pair of three of those songs and the user ([songs], user)&lt;/li&gt;
&lt;li&gt;Build two models and set up TFRS metrics for training&lt;/li&gt;
&lt;li&gt;Feed the query/candidate pair into the model&lt;/li&gt;
&lt;li&gt;When training is complete, create a new model that wraps the above two&lt;/li&gt;
&lt;li&gt;Apply the BruteForce method to extract the best candidate match for a given input&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;&lt;strong&gt;Generating the Data&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;# create a dictionary of inputs and outputs
dataset = {'songs': [], 'user': []}
for user, songs in users_songs.items():
    # use 5x the number of songs gathered for the user
    for _ in range(len(songs) * 5):
        # randomly select n_songs_in songs from the user's play history
        selected_songs = np.random.choice(songs, n_songs_in)
        # add them to the inputs
        dataset['songs'].append(selected_songs)
        # add the user to the output
        dataset['user'].append(user)

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Building the Models&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;# create the query and candidate models
n_embedding_dimensions = 24

## QUERY
songs_in_in = tf.keras.Input(shape=(n_songs_in,))
songs_in_emb = tf.keras.layers.Embedding(n_songs+1, n_embedding_dimensions)(songs_in_in)
songs_in_emb_avg = tf.keras.layers.AveragePooling1D(pool_size=3)(songs_in_emb)
query = tf.keras.layers.Flatten()(songs_in_emb_avg)
query_model = tf.keras.Model(inputs=songs_in_in, outputs=query)

## CANDIDATE
user_in = tf.keras.Input(shape=(1,))
user_emb = tf.keras.layers.Embedding(n_users+1, n_embedding_dimensions)(user_in)
candidate = tf.keras.layers.Flatten()(user_emb)
candidate_model = tf.keras.Model(inputs=user_in, outputs=candidate)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Creating the TensorFlow Recommenders Task&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;# TFRS TASK SETUP
candidates = dataset.batch(128).map(lambda x: candidate_model(x['user']))
metrics = tfrs.metrics.FactorizedTopK(candidates=candidates)
task = tfrs.tasks.Retrieval(metrics=metrics)


## TFRS MODEL CLASS
class Model(tfrs.Model):
    def __init__(self, query_model, candidate_model):
        super().__init__()
        self._query_model = query_model
        self._candidate_model = candidate_model
        self._task = task

    def compute_loss(self, features, training=False):
        query_embedding = self._query_model(features['songs'])
        candidate_embedding = self._candidate_model(features['user'])
        return self._task(query_embedding, candidate_embedding)

## COMPILE AND TRAIN MODEL
model = Model(query_model, candidate_model)
# load model weights - this is to resume training
# model._query_model.load_weights(weights_dir.format('query'))
# model._candidate_model.load_weights(weights_dir.format('candidate'))

model.compile(optimizer=tf.keras.optimizers.Adagrad(learning_rate=0.1))
model.fit(dataset.repeat().shuffle(300_000).batch(4096), steps_per_epoch=50, epochs=30, verbose=1)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Deploying the Model
&lt;/h3&gt;

&lt;p&gt;To deploy this model, the final step in the above notebook is converting it to TensorFlowJS.&lt;/p&gt;

&lt;p&gt;From there, we add it to the Custom Functions directory along with the logic in recommend.js, which converts the user's three favorite songs into tensors that are fed to the model.&lt;/p&gt;

&lt;p&gt;The output of the model is a reference to the most similar user to the one using the UI.&lt;/p&gt;

&lt;p&gt;The top songs for that user are then returned and displayed in the UI as our recommendations.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Creating an input tensor and getting results&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt; if (!this.model) {
   const modelPath = path.join(__dirname, '../', 'tfjs-model', 'model.json');
   this.model = await tf.loadGraphModel(`file://${modelPath}`);
 }

 const inputTensor = tf.tensor([songIdxs], [1, 3], 'int32')
 const results = this.model.execute(inputTensor)
 const r0 = await results[0].data()
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  HarperDB Book Recommender
&lt;/h2&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Ff63bibtgygyboowbc9nl.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Ff63bibtgygyboowbc9nl.png" alt=" " width="637" height="181"&gt;&lt;/a&gt; &lt;/p&gt;

&lt;p&gt;Github repo: &lt;a href="https://github.com/HarperDB/book-recommender" rel="noopener noreferrer"&gt;HarperDB/book-recommender&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;In our Book Recommender example, a user can find three of their favorite books in the UI, and then the recommendation system returns other books they're likely to enjoy.&lt;/p&gt;

&lt;p&gt;There's a &lt;a href="https://www.kaggle.com/datasets/arashnic/book-recommendation-dataset" rel="noopener noreferrer"&gt;dataset on Kaggle&lt;/a&gt; that includes a list of users, books, and ratings for about 250,000 different titles.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fnvp4q3zmrnu99ag5pezt.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fnvp4q3zmrnu99ag5pezt.png" alt=" " width="800" height="238"&gt;&lt;/a&gt; &lt;/p&gt;

&lt;h3&gt;
  
  
  Training the Model
&lt;/h3&gt;

&lt;p&gt;We again created a Two-Tower model, which is a design where two latent vectors are created, compared during training to shape the embedding layers, and compared during inference to find the best match.&lt;/p&gt;

&lt;p&gt;Tower One is the Query Tower. This is essentially the input to the equation, which in this case is three books that the user rated highly (5 or above out of 10).&lt;/p&gt;

&lt;p&gt;Tower Two is the Candidate Tower. In this case it's the users in the dataset.&lt;/p&gt;

&lt;p&gt;We find the user in the dataset most similar to the person using the UI, and then provide that user's highest rated books as recommendations.&lt;/p&gt;

&lt;p&gt;To look at the training specifics, here's the &lt;a href="https://github.com/HarperDB/book-recommender/blob/main/notebooks/book_recommender.ipynb" rel="noopener noreferrer"&gt;training notebook&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;The basic steps that are taken are as follows:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Find the top rated books for each user&lt;/li&gt;
&lt;li&gt;Create a query/candidate pair of three of those books and the user ([books], user)&lt;/li&gt;
&lt;li&gt;Build two models and set up TFRS metrics for training&lt;/li&gt;
&lt;li&gt;Feed the query/candidate pair into the model&lt;/li&gt;
&lt;li&gt;When training is complete, create a new model that wraps the above two&lt;/li&gt;
&lt;li&gt;Apply the BruteForce method to extract the best candidate match for a given input&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;&lt;strong&gt;Generating the Data&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;# create a dictionary of inputs and outputs
dataset = {'isbns': [], 'user': []}
for user_id, isbns in user_isbns.items():
    # use 5x the number of isbns gathered for the user
    # this ensures a larger amount of training data
    for _ in range(len(isbns) * 5):
        # randomly select n_isbns_in from the user's isbns
        selected_isbns = np.random.choice(isbns, n_isbns_in)
        # add them to the inputs
        dataset['isbns'].append(selected_isbns)
        # add the user to the output
        dataset['user'].append(user_idxs[user_id])


&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Building the Models&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;# create the query and candidate models
n_embedding_dimensions = 24

## QUERY
isbns_in_in = tf.keras.Input(shape=(n_isbns_in,))
isbns_in_emb = tf.keras.layers.Embedding(n_isbns+1, n_embedding_dimensions)(isbns_in_in)
isbns_in_emb_avg = tf.keras.layers.AveragePooling1D(pool_size=3)(isbns_in_emb)
query = tf.keras.layers.Flatten()(isbns_in_emb_avg)
query_model = tf.keras.Model(inputs=isbns_in_in, outputs=query)

## CANDIDATE
user_in = tf.keras.Input(shape=(1,))
user_emb = tf.keras.layers.Embedding(n_users+1, n_embedding_dimensions)(user_in)
candidate = tf.keras.layers.Flatten()(user_emb)
candidate_model = tf.keras.Model(inputs=user_in, outputs=candidate)

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Creating the TensorFlow Recommenders Task&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;# TFRS TASK SETUP
candidates = dataset.batch(128).map(lambda x: candidate_model(x['user']))
metrics = tfrs.metrics.FactorizedTopK(candidates=candidates)
task = tfrs.tasks.Retrieval(metrics=metrics)


## TFRS MODEL CLASS
class Model(tfrs.Model):
    def __init__(self, query_model, candidate_model):
        super().__init__()
        self._query_model = query_model
        self._candidate_model = candidate_model
        self._task = task

    def compute_loss(self, features, training=False):
        query_embedding = self._query_model(features['isbns'])
        candidate_embedding = self._candidate_model(features['user'])
        return self._task(query_embedding, candidate_embedding)

## COMPILE AND TRAIN MODEL
model = Model(query_model, candidate_model)
# load model weights - this is to resume training
# model._query_model.load_weights(weights_dir.format('query'))
# model._candidate_model.load_weights(weights_dir.format('candidate'))

model.compile(optimizer=tf.keras.optimizers.Adagrad(learning_rate=0.1))
model.fit(dataset.repeat().shuffle(300_000).batch(4096), steps_per_epoch=50, epochs=30, verbose=1)

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Create the BruteForce Comparison
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;# create the index model to lookup the best candidate match for a query
index = tfrs.layers.factorized_top_k.BruteForce(model._query_model)
index.index_from_dataset(
    tf.data.Dataset.zip((
      dataset.map(lambda x: x['user']).batch(100),
      dataset.batch(100).map(lambda x: model._candidate_model(x['user']))
    ))
)
for features in dataset.shuffle(2000).batch(1).take(1):
    print('isbns', features['isbns'])
    scores, users = index(features['isbns'])
    print('recommended users', users)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Deploying the Model
&lt;/h3&gt;

&lt;p&gt;To deploy this model, the final step in the above notebook is converting it to TensorFlowJS.&lt;/p&gt;

&lt;p&gt;From there, we add it to the Custom Functions directory along with the logic in recommend.js, which converts the user's three favorite books into tensors that are fed to the model.&lt;/p&gt;

&lt;p&gt;The output of the model is a reference to the most similar user to the one using the UI.&lt;/p&gt;

&lt;p&gt;The top rated books for that user are then returned and displayed in the UI as our recommendations.&lt;/p&gt;

&lt;h2&gt;
  
  
  Recap
&lt;/h2&gt;

&lt;p&gt;And there you have it, that’s how you can create a recommendation system with HarperDB Custom Functions.&lt;/p&gt;

&lt;p&gt;These machine learning models were pre-trained for these examples, as they do take 12+ hours to reach their most accurate state. In a production environment, there would most likely be a single instance responsible for continuously training the model and distributing it out to the other nodes on the edge.&lt;/p&gt;

&lt;p&gt;Go ahead and launch a HarperDB instance with Custom Functions where you can pull in one of these repos and get recommendations for new songs and books to check out. If you get an interesting result that you enjoy, please let us know!&lt;/p&gt;

&lt;p&gt;Thanks for reading,&lt;/p&gt;

&lt;p&gt;–Kevin&lt;/p&gt;

</description>
      <category>machinelearning</category>
      <category>database</category>
      <category>programming</category>
      <category>tutorial</category>
    </item>
    <item>
      <title>Using TensorFlowJS &amp; HarperDB for Machine Learning</title>
      <dc:creator>thelogicdudes</dc:creator>
      <pubDate>Tue, 03 May 2022 15:33:00 +0000</pubDate>
      <link>https://dev.to/harperfast/using-tensorflowjs-harperdb-for-machine-learning-1me1</link>
      <guid>https://dev.to/harperfast/using-tensorflowjs-harperdb-for-machine-learning-1me1</guid>
      <description>&lt;h4&gt;
  
  
  Implementing a Dog Breed Classifier Using Stanford Dogs and MobileNet with HarperDB Custom Functions
&lt;/h4&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fov1r41aghr7ahbfwvplr.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fov1r41aghr7ahbfwvplr.png" alt="HarperDB Logo" width="563" height="161"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  Intro
&lt;/h2&gt;

&lt;p&gt;HarperDB is an easy-to-use database solution that has a simple method of creating endpoints to interact with data, called Custom Functions. These Custom Functions can even be used to implement a machine learning algorithm to classify incoming data. TensorFlowJS is a library released by Google that brings machine learning to JavaScript, so it can run in the browser or on a NodeJS server, as we'll be doing in this article.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Ffd93wubbnamxo8zr3urt.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Ffd93wubbnamxo8zr3urt.png" alt="from Stanford Dogs" width="200" height="301"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  Summary
&lt;/h2&gt;

&lt;h3&gt;
  
  
  What We're Going To Do
&lt;/h3&gt;

&lt;p&gt;This article will explain how to train and use a &lt;a href="https://www.tensorflow.org/js" rel="noopener noreferrer"&gt;TensorFlowJS&lt;/a&gt; model to classify dog breeds with &lt;a href="https://harperdb.io/docs/custom-functions/" rel="noopener noreferrer"&gt;HarperDB Custom Functions&lt;/a&gt;, using the &lt;a href="http://vision.stanford.edu/aditya86/ImageNetDogs/" rel="noopener noreferrer"&gt;Stanford Dogs dataset&lt;/a&gt; and &lt;a href="https://www.npmjs.com/package/@tensorflow-models/mobilenet" rel="noopener noreferrer"&gt;MobileNetV2&lt;/a&gt; as a base for transfer learning.&lt;/p&gt;

&lt;h3&gt;
  
  
  Stanford Dogs
&lt;/h3&gt;

&lt;p&gt;There's an awesome dataset that was released by Stanford with over 20,000 images of dogs. The images are grouped into different folders, each folder named for the breed it contains. Annotations for bounding boxes are available as well, but today we'll be focused solely on classifying the breed.&lt;/p&gt;

&lt;h3&gt;
  
  
  MobileNet
&lt;/h3&gt;

&lt;p&gt;There's a SOTA (state of the art) model published by Google called MobileNet, a relatively small model that can classify images into over 1,000 categories. It's built small so it'll run on mobile devices without taking up too many resources. We'll be using version 2 of this model, which is available in the &lt;a href="https://www.npmjs.com/package/@tensorflow-models/mobilenet" rel="noopener noreferrer"&gt;@tensorflow-models/mobilenet&lt;/a&gt; package.&lt;/p&gt;

&lt;h3&gt;
  
  
  Transfer Learning
&lt;/h3&gt;

&lt;p&gt;Transfer learning is the technique of taking a pretrained model and training it to output new data. Like teaching an old dog new tricks! For that we'll be using &lt;a href="https://www.npmjs.com/package/@tensorflow-models/knn-classifier" rel="noopener noreferrer"&gt;@tensorflow-models/knn-classifier&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;We'll be sending an image into MobileNet and getting out the logits, the activations just before the final classification step. Then we'll feed those logits into a KNN classifier, which uses the K-Nearest Neighbors algorithm to associate them with specific dog breeds.&lt;/p&gt;
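
&lt;p&gt;It may help to see what the KNN step actually does: it never retrains MobileNet; it just stores labeled embedding vectors and votes among the nearest neighbors at inference time. The real project does this in JavaScript with @tensorflow-models/knn-classifier; here is a minimal numpy sketch of the same idea, with made-up 4-dimensional vectors standing in for MobileNet's logits.&lt;/p&gt;

```python
import numpy as np
from collections import Counter

# Invented 4-d "logits" for labeled training images. In the real project
# these vectors come out of MobileNet for each Stanford Dogs image.
examples = [
    (np.array([0.9, 0.1, 0.0, 0.0]), 'beagle'),
    (np.array([0.8, 0.2, 0.1, 0.0]), 'beagle'),
    (np.array([0.0, 0.1, 0.9, 0.3]), 'husky'),
    (np.array([0.1, 0.0, 0.8, 0.4]), 'husky'),
]

def classify(embedding, k=3):
    """Vote among the k nearest stored examples (Euclidean distance)."""
    dists = sorted(((np.linalg.norm(embedding - vec), label)
                    for vec, label in examples), key=lambda d: d[0])
    votes = Counter(label for _, label in dists[:k])
    return votes.most_common(1)[0][0]

print(classify(np.array([0.85, 0.15, 0.05, 0.0])))
```

&lt;p&gt;Because the classifier only stores examples and measures distances, adding a new breed is just a matter of adding labeled vectors, with no retraining of the base model.&lt;/p&gt;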

&lt;h2&gt;
  
  
  Getting Started
&lt;/h2&gt;

&lt;p&gt;If that all sounds complicated, &lt;strong&gt;don't worry&lt;/strong&gt;. This implementation will be quick and easy thanks to HarperDB Custom Functions.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fnyg1sy07xr0e9e6zra4e.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fnyg1sy07xr0e9e6zra4e.png" alt="Screenshot of HarperDB Studio - Classification Table w/ a Stanford Dog" width="800" height="342"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  Setup
&lt;/h2&gt;

&lt;h3&gt;
  
  
  Prereqs
&lt;/h3&gt;

&lt;ol&gt;
&lt;li&gt;A HarperDB Account&lt;/li&gt;
&lt;li&gt;A HarperDB Local Database&lt;/li&gt;
&lt;/ol&gt;

&lt;h3&gt;
  
  
  Clone the Repo
&lt;/h3&gt;

&lt;p&gt;Clone &lt;a href="https://github.com/HarperDB/hdb-cf-dogml" rel="noopener noreferrer"&gt;this repo&lt;/a&gt; into your Custom Functions folder&lt;br&gt;
&lt;code&gt;git clone https://github.com/HarperDB/hdb-cf-dogml.git ~/hdb/src/custom_functions/dogml&lt;/code&gt;&lt;/p&gt;

&lt;h3&gt;
  
  
  Restart Custom Functions
&lt;/h3&gt;

&lt;p&gt;Use the link in the HarperDB Studio Functions page (bottom left of the screen) to refresh the projects.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fbh4qz2ux5bfntrtb0l9o.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fbh4qz2ux5bfntrtb0l9o.png" alt="Screenshot of Custom Functions link" width="800" height="223"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Ff64g2ry71hxvrnfsktz8.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Ff64g2ry71hxvrnfsktz8.png" alt="Screenshot of Server Restart button" width="800" height="322"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h3&gt;
  
  
  Run /setup
&lt;/h3&gt;

&lt;p&gt;The training data and TensorFlowJS modules need to be installed. This can be done via the &lt;code&gt;/setup&lt;/code&gt; endpoint.&lt;/p&gt;

&lt;p&gt;Visiting &lt;a href="http://localhost:9926/dogml/setup" rel="noopener noreferrer"&gt;http://localhost:9926/dogml/setup&lt;/a&gt; starts the setup. You can monitor its progress in the logs, either in stdout from the locally running database or in the logs section of the Studio's Status page.&lt;/p&gt;

&lt;p&gt;The expected response when setup starts is &lt;code&gt;{"success": true, "message": "ML Setup Started"}&lt;/code&gt;.&lt;/p&gt;

&lt;p&gt;Setup stores all of the training materials in the &lt;code&gt;$HOME/dogml&lt;/code&gt; directory of the user running the database.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Be sure to wait for the &lt;code&gt;ML Setup Complete&lt;/code&gt; message in the database logs.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F8b5opm185rrx4o06oglw.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F8b5opm185rrx4o06oglw.png" alt="Screenshot of HarperDB Logs" width="800" height="310"&gt;&lt;/a&gt;&lt;/p&gt;
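&lt;p&gt;The reason you watch the logs is that an endpoint like &lt;code&gt;/setup&lt;/code&gt; replies immediately and does the heavy lifting in the background. Here is a minimal sketch of that fire-and-forget pattern; the &lt;code&gt;runSetup&lt;/code&gt; task and log strings are illustrative stand-ins, not the repo's actual code.&lt;/p&gt;

```javascript
// Fire-and-forget handler pattern: reply right away, keep working in the
// background, and report completion through the logs. runSetup and the
// message strings are illustrative stand-ins for the repo's real code.
function makeSetupHandler(runSetup, log) {
  return function handleSetup() {
    runSetup()
      .then(function () { log('ML Setup Complete'); })
      .catch(function (err) { log('ML Setup Failed: ' + err.message); });
    // The caller receives this response before setup actually finishes.
    return { success: true, message: 'ML Setup Started' };
  };
}

// Example usage with a stubbed setup task and an in-memory log:
const messages = [];
const handler = makeSetupHandler(
  function () { return Promise.resolve(); },
  function (msg) { messages.push(msg); }
);
const reply = handler(); // { success: true, message: 'ML Setup Started' }
```

&lt;p&gt;This is why the success response arrives instantly while the &lt;code&gt;ML Setup Complete&lt;/code&gt; message only appears in the logs later.&lt;/p&gt;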

&lt;h2&gt;
  
  
  Activate
&lt;/h2&gt;

&lt;h3&gt;
  
  
  Run /train
&lt;/h3&gt;

&lt;p&gt;To train the model, visit the &lt;code&gt;/train&lt;/code&gt; endpoint at &lt;a href="http://localhost:9926/dogml/train" rel="noopener noreferrer"&gt;http://localhost:9926/dogml/train&lt;/a&gt;. This begins the model training. You can follow its status in the console logs (as with &lt;code&gt;/setup&lt;/code&gt;) or in the logs table of the schema.&lt;/p&gt;
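&lt;p&gt;What training produces is a network whose final layer emits one score per breed; classifying an image then amounts to a softmax followed by an argmax over those scores. A plain-JavaScript sketch of that last step (the breed names and scores below are invented for illustration):&lt;/p&gt;

```javascript
// Turn raw per-breed scores (logits) into probabilities and pick the winner.
// This mirrors the final step of any softmax classifier; the breeds and
// scores below are made up for illustration.
function softmax(logits) {
  const max = Math.max.apply(null, logits);
  const exps = logits.map(function (x) { return Math.exp(x - max); });
  const sum = exps.reduce(function (a, b) { return a + b; }, 0);
  return exps.map(function (e) { return e / sum; });
}

function classify(logits, labels) {
  const probs = softmax(logits);
  let best = 0;
  for (let i = 1; i !== probs.length; i += 1) {
    if (probs[i] > probs[best]) best = i;
  }
  return { breed: labels[best], confidence: probs[best] };
}

const labels = ['Chihuahua', 'Beagle', 'Samoyed'];
const result = classify([1.2, 3.4, 0.5], labels);
// result.breed is 'Beagle', the highest-scoring of the three
```

&lt;p&gt;In the real project this math runs inside TensorFlowJS; the sketch just shows why the classification table can report both a breed and a confidence value.&lt;/p&gt;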

&lt;h3&gt;
  
  
  Verify Model
&lt;/h3&gt;

&lt;p&gt;Once the logs indicate that the training is complete, you should be able to see the model appear in the models table in the schema.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fi14q2dcm3x7h2skaib76.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fi14q2dcm3x7h2skaib76.png" alt="Screenshot of HarperDB Studio - Models Table" width="800" height="107"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h3&gt;
  
  
  Classify a Dog Breed!
&lt;/h3&gt;

&lt;p&gt;Navigate to the UI at &lt;a href="http://localhost:9926/dogml/ui" rel="noopener noreferrer"&gt;http://localhost:9926/dogml/ui&lt;/a&gt; and try uploading an image of a dog (any of the images in the &lt;code&gt;$HOME/dogml/training_data/Images&lt;/code&gt; directory will do).&lt;br&gt;
The results should appear in the UI as well as in the classifications table.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fu97r2mnl5rnt8ggmhi32.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fu97r2mnl5rnt8ggmhi32.png" alt="Screenshot of HarperDB ML Dog Dashboard" width="800" height="464"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  Go Deeper
&lt;/h2&gt;

&lt;h3&gt;
  
  
  Add New Training Data
&lt;/h3&gt;

&lt;p&gt;You can add more training data by adding new images to the &lt;code&gt;$HOME/dogml/training_data/Images&lt;/code&gt; directory: place each image in the folder for its breed, or create a new folder if that breed doesn't have one yet. All images should be JPEGs.&lt;/p&gt;
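&lt;p&gt;The Stanford Dogs folders pair a WordNet synset id with the breed name (e.g. &lt;code&gt;n02085620-Chihuahua&lt;/code&gt;), so a small helper can derive the training label from a folder name. The helper below is illustrative, not the repo's actual code:&lt;/p&gt;

```javascript
// Derive a breed label from a Stanford Dogs style folder name, e.g.
// "n02085620-Chihuahua". Folders you create yourself with a plain breed
// name pass through unchanged. Illustrative only, not the repo's code.
function breedFromFolder(folderName) {
  const dash = folderName.indexOf('-');
  if (dash === -1) return folderName; // a plain breed-named folder you added
  return folderName.slice(dash + 1).replace(/_/g, ' ');
}

// breedFromFolder('n02085620-Chihuahua') returns 'Chihuahua'
// breedFromFolder('n02099601-golden_retriever') returns 'golden retriever'
```

&lt;p&gt;If you add a new breed, naming the folder either way works, as long as every image of that breed lands in the same folder.&lt;/p&gt;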

&lt;h3&gt;
  
  
  Removing Training Data
&lt;/h3&gt;

&lt;p&gt;You can also remove training data from the &lt;code&gt;$HOME/dogml/training_data/Images&lt;/code&gt; directory to better target specific breeds.&lt;/p&gt;

&lt;h3&gt;
  
  
  Update the Model
&lt;/h3&gt;

&lt;p&gt;If you modify the training data and use the &lt;code&gt;/train&lt;/code&gt; endpoint to create a new model, be sure to then call the &lt;code&gt;/update&lt;/code&gt; endpoint at &lt;a href="http://localhost:9926/dogml/update" rel="noopener noreferrer"&gt;http://localhost:9926/dogml/update&lt;/a&gt; to ensure the new model is loaded into the classifier.&lt;/p&gt;
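&lt;p&gt;One way an endpoint like &lt;code&gt;/update&lt;/code&gt; can decide what to load is to pick the newest row in the models table, for example by HarperDB's auto-populated &lt;code&gt;__createdtime__&lt;/code&gt; field. A hypothetical sketch (the row shape is an assumption, not the repo's actual schema):&lt;/p&gt;

```javascript
// Choose the most recently created model row so the classifier loads the
// latest training run. HarperDB auto-populates __createdtime__ on every
// record; the rest of the row shape here is a hypothetical example.
function latestModel(rows) {
  return rows.reduce(function (newest, row) {
    return row.__createdtime__ > newest.__createdtime__ ? row : newest;
  });
}

const models = [
  { id: 'model-a', __createdtime__: 1661300000000 },
  { id: 'model-b', __createdtime__: 1661400000000 },
];
// latestModel(models).id is 'model-b'
```

&lt;p&gt;Whatever selection rule the project uses, the point of &lt;code&gt;/update&lt;/code&gt; is the same: swap the in-memory model for the freshly trained one without restarting anything.&lt;/p&gt;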

&lt;h3&gt;
  
  
  Train w/ GPU
&lt;/h3&gt;

&lt;p&gt;To train the model 200% faster, use the &lt;code&gt;/train_gpu&lt;/code&gt; endpoint at &lt;a href="http://localhost:9926/dogml/train_gpu" rel="noopener noreferrer"&gt;http://localhost:9926/dogml/train_gpu&lt;/a&gt;. This takes advantage of a CUDA-enabled NVIDIA GPU to accelerate the training computations.&lt;/p&gt;

&lt;p&gt;Be sure the necessary drivers and CUDA libraries are installed.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;&lt;a href="https://www.tensorflow.org/install/gpu#linux_setup" rel="noopener noreferrer"&gt;Here's a guide to installing CUDA on Ubuntu&lt;/a&gt;&lt;/strong&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  Review
&lt;/h2&gt;

&lt;p&gt;There you have it: you've trained a machine learning model on dog breed data and can now use it to classify images of dogs and determine the breed. To do this, we used a HarperDB Custom Function and TensorFlowJS to train a MobileNet model on the Stanford Dogs dataset.&lt;/p&gt;

</description>
      <category>webdev</category>
      <category>machinelearning</category>
      <category>tutorial</category>
      <category>beginners</category>
    </item>
  </channel>
</rss>
