<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>DEV Community: Nkwam Philip </title>
    <description>The latest articles on DEV Community by Nkwam Philip  (@nkwamphilip).</description>
    <link>https://dev.to/nkwamphilip</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F867403%2F83aaea85-c678-4d34-8c17-78c5fc2bc8dd.jpeg</url>
      <title>DEV Community: Nkwam Philip </title>
      <link>https://dev.to/nkwamphilip</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://dev.to/feed/nkwamphilip"/>
    <language>en</language>
    <item>
      <title>Power BI Portfolio</title>
      <dc:creator>Nkwam Philip </dc:creator>
      <pubDate>Thu, 04 Aug 2022 11:41:34 +0000</pubDate>
      <link>https://dev.to/nkwamphilip/powerbi-portf-1eh8</link>
      <guid>https://dev.to/nkwamphilip/powerbi-portf-1eh8</guid>
      <description>&lt;p&gt;&lt;a href="https://app.powerbi.com/view?r=eyJrIjoiZTA0M2FiZGMtNDYwNC00ZmJiLWIwOTMtODgyNzc5ZDEwNmM3IiwidCI6Ijc0MzBjOGJlLWQ1ZTMtNDgxYi1hNTcwLTZjOGI0MzRkZGY4OCIsImMiOjZ9"&gt;https://app.powerbi.com/view?r=eyJrIjoiZTA0M2FiZGMtNDYwNC00ZmJiLWIwOTMtODgyNzc5ZDEwNmM3IiwidCI6Ijc0MzBjOGJlLWQ1ZTMtNDgxYi1hNTcwLTZjOGI0MzRkZGY4OCIsImMiOjZ9&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://app.powerbi.com/view?r=eyJrIjoiODVhYTBlZDktM2I4OC00OTM2LTg0NDYtMjlkNGU4ZWJmYzgxIiwidCI6Ijc0MzBjOGJlLWQ1ZTMtNDgxYi1hNTcwLTZjOGI0MzRkZGY4OCIsImMiOjZ9"&gt;https://app.powerbi.com/view?r=eyJrIjoiODVhYTBlZDktM2I4OC00OTM2LTg0NDYtMjlkNGU4ZWJmYzgxIiwidCI6Ijc0MzBjOGJlLWQ1ZTMtNDgxYi1hNTcwLTZjOGI0MzRkZGY4OCIsImMiOjZ9&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://app.powerbi.com/view?r=eyJrIjoiYjc5ZTlhZmMtYjQ5ZC00MWU4LWEyMzAtZWVjNTllODZlOTc5IiwidCI6Ijc0MzBjOGJlLWQ1ZTMtNDgxYi1hNTcwLTZjOGI0MzRkZGY4OCIsImMiOjZ9"&gt;https://app.powerbi.com/view?r=eyJrIjoiYjc5ZTlhZmMtYjQ5ZC00MWU4LWEyMzAtZWVjNTllODZlOTc5IiwidCI6Ijc0MzBjOGJlLWQ1ZTMtNDgxYi1hNTcwLTZjOGI0MzRkZGY4OCIsImMiOjZ9&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://app.powerbi.com/view?r=eyJrIjoiYWY3YmYxN2ItY2RhMi00NzQ2LWFhYTktNjlhYWY4MTcyY2E1IiwidCI6Ijc0MzBjOGJlLWQ1ZTMtNDgxYi1hNTcwLTZjOGI0MzRkZGY4OCIsImMiOjZ9"&gt;https://app.powerbi.com/view?r=eyJrIjoiYWY3YmYxN2ItY2RhMi00NzQ2LWFhYTktNjlhYWY4MTcyY2E1IiwidCI6Ijc0MzBjOGJlLWQ1ZTMtNDgxYi1hNTcwLTZjOGI0MzRkZGY4OCIsImMiOjZ9&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://app.powerbi.com/view?r=eyJrIjoiMjdjYzViNTUtYTdkNy00NzZlLWJlZWItODI5ODk1MWI0NzMyIiwidCI6Ijc0MzBjOGJlLWQ1ZTMtNDgxYi1hNTcwLTZjOGI0MzRkZGY4OCIsImMiOjZ9"&gt;https://app.powerbi.com/view?r=eyJrIjoiMjdjYzViNTUtYTdkNy00NzZlLWJlZWItODI5ODk1MWI0NzMyIiwidCI6Ijc0MzBjOGJlLWQ1ZTMtNDgxYi1hNTcwLTZjOGI0MzRkZGY4OCIsImMiOjZ9&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://app.powerbi.com/view?r=eyJrIjoiMjkyYjM1MDktNjgyMS00Mzc3LWJmZTEtOTIxM2RjMGEwMWVmIiwidCI6Ijc0MzBjOGJlLWQ1ZTMtNDgxYi1hNTcwLTZjOGI0MzRkZGY4OCIsImMiOjZ9"&gt;https://app.powerbi.com/view?r=eyJrIjoiMjkyYjM1MDktNjgyMS00Mzc3LWJmZTEtOTIxM2RjMGEwMWVmIiwidCI6Ijc0MzBjOGJlLWQ1ZTMtNDgxYi1hNTcwLTZjOGI0MzRkZGY4OCIsImMiOjZ9&lt;/a&gt;&lt;/p&gt;

</description>
    </item>
    <item>
      <title>Explaining Precision and Recall</title>
      <dc:creator>Nkwam Philip </dc:creator>
      <pubDate>Tue, 02 Aug 2022 18:40:19 +0000</pubDate>
      <link>https://dev.to/nkwamphilip/explaining-precision-and-recall-5ej9</link>
      <guid>https://dev.to/nkwamphilip/explaining-precision-and-recall-5ej9</guid>
      <description>&lt;p&gt;In pattern recognition, information retrieval, object detection and classification (machine learning), precision and recall are performance metrics that apply to data retrieved from a collection, corpus or sample space.&lt;/p&gt;

&lt;p&gt;Predicted Positive Values - Values predicted to be positive&lt;br&gt;
Real Positive Values - Values that are actually positive&lt;/p&gt;

&lt;p&gt;Consider a computer program for recognising dogs (the relevant element) in a digital photograph. Upon processing a picture which contains ten cats and twelve dogs, the program identifies eight dogs. Of the eight elements identified as dogs, only five actually are dogs (true positives), while the other three are cats (false positives). Seven dogs were missed (false negatives), and seven cats were correctly excluded (true negatives). The program's precision is then 5/8 (true positives / Predicted Positive values) while its recall is 5/12 (true positives / Real Positive values). &lt;/p&gt;

&lt;p&gt;Precision and recall are also referred to as positive predictive value (PPV) and sensitivity (true positive rate, TPR) respectively. Both terms are explained below.&lt;/p&gt;

&lt;p&gt;Precision is the number of true positive predictions divided by the total number of values predicted to be positive. It answers the question: of all the values the model predicted to be positive, how many actually are?&lt;/p&gt;

&lt;p&gt;TP/(TP + FP)&lt;/p&gt;

&lt;p&gt;Precision describes how well your model performed at identifying true positives among everything it flagged as positive; in practice a model will have predicted some values to be positive when they clearly are not.&lt;/p&gt;

&lt;p&gt;That explains why it is called POSITIVE PREDICTIVE VALUE.&lt;/p&gt;

&lt;p&gt;Recall, on the other hand, describes how many true positives were predicted out of the total real positive values: the true positive predictions divided by the total number of real positives.&lt;/p&gt;

&lt;p&gt;TP/(TP + FN)&lt;/p&gt;

&lt;p&gt;Recall also reveals how many real positive values were predicted to be negative (false negatives), and it essentially measures how well a model recovers true positives from the full range of real positive values.&lt;/p&gt;

&lt;p&gt;Recall is therefore referred to as SENSITIVITY (True Positive Rate).&lt;/p&gt;
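
&lt;p&gt;To make the arithmetic concrete, here is a small Python sketch (my own illustration, not part of the original example) that computes both metrics for the dog-recognition scenario above:&lt;/p&gt;

&lt;p&gt;# Counts from the dog-recognition example above&lt;br&gt;
true_positives = 5   # dogs correctly identified as dogs&lt;br&gt;
false_positives = 3  # cats incorrectly identified as dogs&lt;br&gt;
false_negatives = 7  # dogs that were missed&lt;br&gt;
&lt;br&gt;
precision = true_positives / (true_positives + false_positives)  # TP / (TP + FP)&lt;br&gt;
recall = true_positives / (true_positives + false_negatives)     # TP / (TP + FN)&lt;br&gt;
&lt;br&gt;
print(f"Precision: {precision:.3f}")  # 5/8 = 0.625&lt;br&gt;
print(f"Recall: {recall:.3f}")        # 5/12, about 0.417&lt;/p&gt;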

</description>
    </item>
    <item>
      <title>Building an IoT Analytics Pipeline on Google Cloud</title>
      <dc:creator>Nkwam Philip </dc:creator>
      <pubDate>Thu, 28 Jul 2022 06:14:04 +0000</pubDate>
      <link>https://dev.to/nkwamphilip/building-an-iot-analytics-pipeline-on-google-cloud-4j7c</link>
      <guid>https://dev.to/nkwamphilip/building-an-iot-analytics-pipeline-on-google-cloud-4j7c</guid>
      <description>&lt;p&gt;Internet of Things (IoT) refers to the interconnection of physical devices with the global Internet. These devices are equipped with sensors and networking hardware, and each is globally identifiable.&lt;br&gt;
Cloud IoT Core is a fully managed service that allows you to easily and securely connect, manage, and ingest data from millions of globally dispersed devices. The service connects IoT devices that use the standard Message Queue Telemetry Transport (MQTT) protocol to other Google Cloud data services.&lt;/p&gt;

&lt;p&gt;Cloud IoT Core has two main components:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;A device manager for registering devices with the service, so you can then monitor and configure them.&lt;/li&gt;
&lt;li&gt;A protocol bridge that supports MQTT, which devices can use to connect to Google Cloud.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;OBJECTIVES&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;Connect and manage MQTT-based devices using Cloud IoT Core (using simulated devices)&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Ingest a stream of information from Cloud IoT Core using Cloud Pub/Sub.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Process the IoT data using Cloud Dataflow.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Analyze the IoT data using BigQuery.&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;ol&gt;
&lt;li&gt;&lt;p&gt;After signing in to GCP, in the Cloud Console, click Navigation menu &amp;gt; APIs &amp;amp; Services.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Scroll down in the list of enabled APIs, and confirm that these APIs are enabled:&lt;/p&gt;&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;Google Cloud IoT API&lt;br&gt;
Cloud Pub/Sub API&lt;br&gt;
Dataflow API&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;If one or more API is not enabled, click the ENABLE APIS AND SERVICES button at the top. Search for the APIs by name and enable each API for your current project.&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;&lt;strong&gt;Ensure that the Dataflow API is successfully enabled&lt;br&gt;
To ensure access to the necessary API, restart the connection to the Dataflow API.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;In the Cloud Console, enter Dataflow API in the top search bar. Click on the result for Dataflow API.&lt;/p&gt;

&lt;p&gt;Click Manage.&lt;/p&gt;

&lt;p&gt;Click Disable API.&lt;/p&gt;

&lt;p&gt;If asked to confirm, click Disable.&lt;/p&gt;

&lt;p&gt;Click Enable.&lt;br&gt;
When the API has been enabled again, the page will show the option to disable.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Create a Cloud Pub/Sub topic&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Cloud Pub/Sub is an asynchronous global messaging service. By decoupling senders and receivers, it allows for secure and highly available communication between independently written applications. Cloud Pub/Sub delivers low-latency, durable messaging.&lt;/p&gt;

&lt;p&gt;In Cloud Pub/Sub, publisher applications and subscriber applications connect with one another through the use of a shared string called a topic. A publisher application creates and sends messages to a topic. Subscriber applications create a subscription to a topic to receive messages from it.&lt;/p&gt;
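
&lt;p&gt;A publisher does not have to be an IoT device: any application with the google-cloud-pubsub client library can publish to a topic. A minimal Python sketch (my own illustration; it assumes the library is installed, application-default credentials, and the iotlab topic created below):&lt;/p&gt;

&lt;p&gt;from google.cloud import pubsub_v1&lt;br&gt;
&lt;br&gt;
# Assumes application-default credentials and an existing topic named "iotlab"&lt;br&gt;
publisher = pubsub_v1.PublisherClient()&lt;br&gt;
topic_path = publisher.topic_path("YOUR_PROJECT_ID", "iotlab")&lt;br&gt;
&lt;br&gt;
# Messages are raw bytes; extra keyword arguments become string attributes&lt;br&gt;
future = publisher.publish(topic_path, b'{"device": "test", "temperature": 21.5}', origin="manual-test")&lt;br&gt;
future.result()  # blocks until the service acknowledges the message&lt;/p&gt;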

&lt;p&gt;In an IoT solution built with Cloud IoT Core, device telemetry data is forwarded to a Cloud Pub/Sub topic.&lt;/p&gt;

&lt;p&gt;To define a new Cloud Pub/Sub topic:&lt;/p&gt;

&lt;p&gt;In the Cloud Console, enter Pub/Sub in the top search bar and open the result; you should then be on the Topics page.&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Click + CREATE TOPIC. &lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;The Create a topic dialog shows you a partial URL path.&lt;br&gt;
&lt;em&gt;Note: If you see qwiklabs-resources as your project name, cancel the dialog and return to the Cloud Console. Use the menu to the right of the Google Cloud logo to select the correct project. Then return to this step.&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;Add this string as your Topic ID:&lt;/p&gt;

&lt;blockquote&gt;
&lt;blockquote&gt;
&lt;p&gt;iotlab&lt;/p&gt;
&lt;/blockquote&gt;


&lt;/blockquote&gt;

&lt;p&gt;Click CREATE TOPIC.&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;&lt;p&gt;&lt;em&gt;On the Topics page, find the new topic whose partial URL ends in iotlab. Click the three-dot icon at the right edge of its row to open the context menu.&lt;/em&gt;&lt;br&gt;
Choose View permissions.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;In the Permissions dialogue, click ADD PRINCIPAL and copy the below principal as New principals:&lt;/p&gt;&lt;/li&gt;
&lt;/ol&gt;

&lt;blockquote&gt;
&lt;blockquote&gt;
&lt;p&gt;&lt;a href="mailto:cloud-iot@system.gserviceaccount.com"&gt;cloud-iot@system.gserviceaccount.com&lt;/a&gt;&lt;/p&gt;
&lt;/blockquote&gt;


&lt;/blockquote&gt;

&lt;ol&gt;
&lt;li&gt;&lt;p&gt;From the Select a role menu, give the new member the Pub/Sub &amp;gt; Pub/Sub Publisher role.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Click Save.&lt;/p&gt;&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;&lt;strong&gt;Create a BigQuery dataset&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;BigQuery is a serverless data warehouse. Tables in BigQuery are organized into datasets. In this lab, messages published into Pub/Sub will be aggregated and stored in BigQuery.&lt;/p&gt;

&lt;p&gt;To create a new BigQuery dataset:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;&lt;p&gt;In the Cloud Console, go to Navigation menu &amp;gt; BigQuery.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Click Done.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;To create a dataset, click on the View actions icon next to your Project ID and then select Create dataset.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Name the dataset iotlabdataset, leave all the other fields the way they are, and click Create dataset.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Click on your project name to see the newly created dataset under your project&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;To create a table, click on the View actions icon next to the iotlabdataset dataset and select Create table.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Ensure that the source field is set to Empty table.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;In the Destination section's Table name field, enter sensordata.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;In the Schema section, click the + Add field button and add the following fields:&lt;/p&gt;&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;timestamp, set the field's Type to TIMESTAMP.&lt;br&gt;
device, set the field's Type to STRING.&lt;br&gt;
temperature, set the field's Type to FLOAT.&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Leave the other defaults unmodified. Click Create Table.&lt;/li&gt;
&lt;/ol&gt;
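
&lt;p&gt;If you would rather script this step than click through the console, a minimal sketch with the google-cloud-bigquery Python client (my own illustration; the lab itself uses the console, and YOUR_PROJECT_ID is a placeholder) creates the same table:&lt;/p&gt;

&lt;p&gt;from google.cloud import bigquery&lt;br&gt;
&lt;br&gt;
client = bigquery.Client(project="YOUR_PROJECT_ID")&lt;br&gt;
&lt;br&gt;
# The same three fields defined in the console&lt;br&gt;
schema = [&lt;br&gt;
    bigquery.SchemaField("timestamp", "TIMESTAMP"),&lt;br&gt;
    bigquery.SchemaField("device", "STRING"),&lt;br&gt;
    bigquery.SchemaField("temperature", "FLOAT"),&lt;br&gt;
]&lt;br&gt;
&lt;br&gt;
table = bigquery.Table("YOUR_PROJECT_ID.iotlabdataset.sensordata", schema=schema)&lt;br&gt;
client.create_table(table)  # the dataset must already exist&lt;/p&gt;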

&lt;p&gt;&lt;strong&gt;Create a cloud storage bucket&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Cloud Storage allows world-wide storage and retrieval of any amount of data at any time. You can use Cloud Storage for a range of scenarios including serving website content, storing data for archival and disaster recovery, or distributing large data objects to users via direct download.&lt;/p&gt;

&lt;p&gt;For this lab Cloud Storage will provide working space for your Cloud Dataflow pipeline.&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;&lt;p&gt;In the Cloud Console, go to Navigation menu &amp;gt; Cloud Storage.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Click +CREATE BUCKET.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;For Name, use your Project ID then add -bucket, then click Continue.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;For Location type, click Multi-region if it is not already selected.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;For Location, choose the selection closest to you.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Click CREATE.&lt;/p&gt;&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;&lt;strong&gt;Set up a Cloud Dataflow Pipeline&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Cloud Dataflow is a serverless way to carry out data analysis. In this lab, you will set up a streaming data pipeline to read sensor data from Pub/Sub, compute the maximum temperature within a time window, and write this out to BigQuery.&lt;/p&gt;
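
&lt;p&gt;The template means you write no code, but conceptually the streaming pipeline resembles the following Apache Beam (Python) sketch. This is only an illustration of the read, window, aggregate, and write pattern, not the template's actual source; the topic and table names are the ones used in this lab and YOUR_PROJECT_ID is a placeholder.&lt;/p&gt;

&lt;p&gt;import json&lt;br&gt;
import apache_beam as beam&lt;br&gt;
from apache_beam.options.pipeline_options import PipelineOptions&lt;br&gt;
from apache_beam.transforms.window import FixedWindows&lt;br&gt;
&lt;br&gt;
options = PipelineOptions(streaming=True)&lt;br&gt;
&lt;br&gt;
with beam.Pipeline(options=options) as p:&lt;br&gt;
    (p&lt;br&gt;
     | "Read" &amp;gt;&amp;gt; beam.io.ReadFromPubSub(topic="projects/YOUR_PROJECT_ID/topics/iotlab")&lt;br&gt;
     | "Parse" &amp;gt;&amp;gt; beam.Map(lambda msg: json.loads(msg.decode("utf-8")))&lt;br&gt;
     | "KeyByDevice" &amp;gt;&amp;gt; beam.Map(lambda row: (row["device"], row["temperature"]))&lt;br&gt;
     | "OneMinuteWindows" &amp;gt;&amp;gt; beam.WindowInto(FixedWindows(60))&lt;br&gt;
     | "MaxTempPerDevice" &amp;gt;&amp;gt; beam.CombinePerKey(max)&lt;br&gt;
     | "ToTableRow" &amp;gt;&amp;gt; beam.Map(lambda kv: {"device": kv[0], "temperature": kv[1]})&lt;br&gt;
     | "Write" &amp;gt;&amp;gt; beam.io.WriteToBigQuery("YOUR_PROJECT_ID:iotlabdataset.sensordata"))&lt;/p&gt;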

&lt;ol&gt;
&lt;li&gt;&lt;p&gt;In the Cloud Console, in the top search bar enter "Dataflow", then click on Dataflow.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;In the top menu bar, click + CREATE JOB FROM TEMPLATE.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;For Job name, enter iotlabflow.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;For Regional Endpoint, choose the region as us-west1.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;For Dataflow template, choose Pub/Sub Topic to BigQuery. When you choose this template, the form updates to reveal new fields below.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;For Input Pub/Sub topic, choose from the dropdown menu. The resulting string will look like this: &lt;em&gt;projects/PROJECT_ID/topics/iotlab&lt;/em&gt;&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;For BigQuery output table, use the form Project ID:&lt;em&gt;dataset.table&lt;/em&gt;, in this case the &lt;em&gt;iotlabdataset.sensordata&lt;/em&gt; table created earlier. The resulting string will look like this: &lt;em&gt;PROJECT_ID:iotlabdataset.sensordata&lt;/em&gt;&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;For Temporary location, enter your Cloud Storage bucket name then &lt;em&gt;/tmp/&lt;/em&gt;. The resulting string will look like this: &lt;em&gt;gs://PROJECT-bucket/tmp/&lt;/em&gt;&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Click SHOW OPTIONAL PARAMETERS.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;For Max workers, enter 2.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Click RUN JOB.&lt;/p&gt;&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;&lt;em&gt;A new streaming job is started. You can now see a visual representation of the data pipeline.&lt;/em&gt;&lt;br&gt;
&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--mJ7w70AF--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/brpmtofml07ordczdpf5.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--mJ7w70AF--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/brpmtofml07ordczdpf5.png" alt="Image description" width="880" height="435"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Prepare your compute engine VM&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;In your project, a pre-provisioned VM instance named iot-device-simulator will let you run instances of a Python script that emulate an MQTT-connected IoT device. Before you emulate the devices, you will also use this VM instance to populate your Cloud IoT Core device registry.&lt;/p&gt;

&lt;p&gt;To connect to the iot-device-simulator VM instance:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;&lt;p&gt;In the Cloud Console, go to Navigation menu &amp;gt; Compute Engine &amp;gt; VM Instances. You'll see your VM instance listed as &lt;em&gt;iot-device-simulator&lt;/em&gt;.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Click the SSH drop-down arrow and select Open in browser window.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;In your SSH session, enter following commands to create a virtual environment.&lt;/p&gt;&lt;/li&gt;
&lt;/ol&gt;

&lt;blockquote&gt;
&lt;blockquote&gt;
&lt;p&gt;sudo pip3 install virtualenv&lt;br&gt;
virtualenv -p python3 venv&lt;br&gt;
source venv/bin/activate&lt;/p&gt;
&lt;/blockquote&gt;


&lt;/blockquote&gt;

&lt;ol&gt;
&lt;li&gt;Initialize the gcloud SDK.&lt;/li&gt;
&lt;/ol&gt;

&lt;blockquote&gt;
&lt;blockquote&gt;
&lt;p&gt;gcloud auth login --no-launch-browser&lt;/p&gt;
&lt;/blockquote&gt;


&lt;/blockquote&gt;

&lt;p&gt;&lt;em&gt;If you get the error message "Command not found," you might have forgotten to exit your previous SSH session and start a new one.&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;&lt;em&gt;When you are asked whether to authenticate with an @developer.gserviceaccount.com account or to log in with a new account, choose log in with a new account.&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;When you are asked "Are you sure you want to authenticate with your personal account? Do you want to continue (Y/n)?" enter Y.&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;&lt;p&gt;Click on the URL shown to open a new browser window that displays a verification code.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Copy the verification code and paste it in response to the "Enter verification code:" prompt, then press Enter.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Enter this command to update the system's information about Debian Linux package repositories:&lt;/p&gt;&lt;/li&gt;
&lt;/ol&gt;

&lt;blockquote&gt;
&lt;blockquote&gt;
&lt;p&gt;sudo apt-get update&lt;/p&gt;
&lt;/blockquote&gt;


&lt;/blockquote&gt;

&lt;ol&gt;
&lt;li&gt;Enter this command to make sure that various required software packages are installed:&lt;/li&gt;
&lt;/ol&gt;

&lt;blockquote&gt;
&lt;blockquote&gt;
&lt;p&gt;sudo apt-get install python-pip openssl git -y&lt;/p&gt;
&lt;/blockquote&gt;


&lt;/blockquote&gt;

&lt;ol&gt;
&lt;li&gt;Use pip to add needed Python components:&lt;/li&gt;
&lt;/ol&gt;

&lt;blockquote&gt;
&lt;blockquote&gt;
&lt;p&gt;pip install pyjwt paho-mqtt cryptography&lt;/p&gt;
&lt;/blockquote&gt;


&lt;/blockquote&gt;

&lt;ol&gt;
&lt;li&gt;Enter this command to add data to analyze during this lab:&lt;/li&gt;
&lt;/ol&gt;

&lt;blockquote&gt;
&lt;blockquote&gt;
&lt;p&gt;git clone &lt;a href="http://github.com/GoogleCloudPlatform/training-data-analyst"&gt;http://github.com/GoogleCloudPlatform/training-data-analyst&lt;/a&gt;&lt;/p&gt;
&lt;/blockquote&gt;


&lt;/blockquote&gt;

&lt;p&gt;&lt;strong&gt;Create a registry for IoT devices&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;To register devices, you must create a registry for the devices. The registry is a point of control for devices.&lt;/p&gt;

&lt;p&gt;To create the registry:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;In your SSH session on the iot-device-simulator VM instance, run the following, adding your Project ID as the value for PROJECT_ID:&lt;/li&gt;
&lt;/ol&gt;

&lt;blockquote&gt;
&lt;blockquote&gt;
&lt;p&gt;export PROJECT_ID=PROJECT_ID&lt;/p&gt;
&lt;/blockquote&gt;


&lt;/blockquote&gt;

&lt;p&gt;Your completed command will have your actual Project ID in place of the second &lt;em&gt;PROJECT_ID&lt;/em&gt;.&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;You must choose a region for your IoT registry. At this time, these regions are supported:&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;us-west1&lt;br&gt;
europe-west1&lt;br&gt;
asia-east1&lt;/p&gt;

&lt;p&gt;Choose the region that is closest to you. To set an environment variable containing your preferred region, enter this command followed by the region name:&lt;/p&gt;

&lt;blockquote&gt;
&lt;blockquote&gt;
&lt;p&gt;export MY_REGION=&lt;/p&gt;
&lt;/blockquote&gt;


&lt;/blockquote&gt;

&lt;p&gt;Your completed command will look like this: &lt;em&gt;export MY_REGION=us-west1.&lt;/em&gt;&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Enter this command to create the device registry:&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;gcloud iot registries create iotlab-registry \&lt;br&gt;
   --project=$PROJECT_ID \&lt;br&gt;
   --region=$MY_REGION \&lt;br&gt;
   --event-notification-config=topic=projects/$PROJECT_ID/topics/iotlab&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Create a Cryptographic Keypair&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;To allow IoT devices to connect securely to Cloud IoT Core, you must create a cryptographic keypair.&lt;/p&gt;

&lt;p&gt;In your SSH session on the iot-device-simulator VM instance, enter these commands to create the keypair in the appropriate directory:&lt;/p&gt;

&lt;blockquote&gt;
&lt;blockquote&gt;
&lt;p&gt;cd $HOME/training-data-analyst/quests/iotlab/&lt;br&gt;
openssl req -x509 -newkey rsa:2048 -keyout rsa_private.pem \&lt;br&gt;
    -nodes -out rsa_cert.pem -subj "/CN=unused"&lt;/p&gt;
&lt;/blockquote&gt;


&lt;/blockquote&gt;

&lt;p&gt;This &lt;em&gt;openssl&lt;/em&gt; command creates an RSA keypair, writing the private key to a file called &lt;em&gt;rsa_private.pem&lt;/em&gt; and a self-signed certificate containing the public key to &lt;em&gt;rsa_cert.pem&lt;/em&gt;.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Add simulated devices to the registry&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;For a device to be able to connect to Cloud IoT Core, it must first be added to the registry.&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;In your SSH session on the &lt;em&gt;iot-device-simulator&lt;/em&gt; VM instance, enter this command to create a device called &lt;em&gt;temp-sensor-buenos-aires&lt;/em&gt;:&lt;/li&gt;
&lt;/ol&gt;

&lt;blockquote&gt;
&lt;blockquote&gt;
&lt;p&gt;gcloud iot devices create temp-sensor-buenos-aires \&lt;br&gt;
  --project=$PROJECT_ID \&lt;br&gt;
  --region=$MY_REGION \&lt;br&gt;
  --registry=iotlab-registry \&lt;br&gt;
  --public-key path=rsa_cert.pem,type=rs256&lt;/p&gt;
&lt;/blockquote&gt;


&lt;/blockquote&gt;

&lt;p&gt;Enter this command to create a device called &lt;em&gt;temp-sensor-istanbul&lt;/em&gt;:&lt;/p&gt;

&lt;blockquote&gt;
&lt;blockquote&gt;
&lt;p&gt;gcloud iot devices create temp-sensor-istanbul \&lt;br&gt;
  --project=$PROJECT_ID \&lt;br&gt;
  --region=$MY_REGION \&lt;br&gt;
  --registry=iotlab-registry \&lt;br&gt;
  --public-key path=rsa_cert.pem,type=rs256&lt;/p&gt;
&lt;/blockquote&gt;


&lt;/blockquote&gt;

&lt;p&gt;&lt;strong&gt;Run simulated devices&lt;/strong&gt;&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;In your SSH session on the &lt;em&gt;iot-device-simulator&lt;/em&gt; VM instance, enter these commands to download the CA root certificates from pki.google.com to the appropriate directory:&lt;/li&gt;
&lt;/ol&gt;

&lt;blockquote&gt;
&lt;blockquote&gt;
&lt;p&gt;cd $HOME/training-data-analyst/quests/iotlab/&lt;br&gt;
curl -o roots.pem -s -m 10 --retry 0 "&lt;a href="https://pki.goog/roots.pem"&gt;https://pki.goog/roots.pem&lt;/a&gt;"&lt;/p&gt;
&lt;/blockquote&gt;


&lt;/blockquote&gt;

&lt;ol&gt;
&lt;li&gt;Enter this command to run the first simulated device:&lt;/li&gt;
&lt;/ol&gt;

&lt;blockquote&gt;
&lt;blockquote&gt;
&lt;p&gt;python cloudiot_mqtt_example_json.py \&lt;br&gt;
   --project_id=$PROJECT_ID \&lt;br&gt;
   --cloud_region=$MY_REGION \&lt;br&gt;
   --registry_id=iotlab-registry \&lt;br&gt;
   --device_id=temp-sensor-buenos-aires \&lt;br&gt;
   --private_key_file=rsa_private.pem \&lt;br&gt;
   --message_type=event \&lt;br&gt;
   --algorithm=RS256 &amp;gt; buenos-aires-log.txt 2&amp;gt;&amp;amp;1 &amp;amp;&lt;/p&gt;
&lt;/blockquote&gt;


&lt;/blockquote&gt;

&lt;p&gt;It will continue to run in the background.&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Enter this command to run the second simulated device:&lt;/li&gt;
&lt;/ol&gt;

&lt;blockquote&gt;
&lt;blockquote&gt;
&lt;p&gt;python cloudiot_mqtt_example_json.py \&lt;br&gt;
   --project_id=$PROJECT_ID \&lt;br&gt;
   --cloud_region=$MY_REGION \&lt;br&gt;
   --registry_id=iotlab-registry \&lt;br&gt;
   --device_id=temp-sensor-istanbul \&lt;br&gt;
   --private_key_file=rsa_private.pem \&lt;br&gt;
   --message_type=event \&lt;br&gt;
   --algorithm=RS256&lt;/p&gt;
&lt;/blockquote&gt;


&lt;/blockquote&gt;

&lt;p&gt;Telemetry data will flow from the simulated devices through Cloud IoT Core to your Cloud Pub/Sub topic. In turn, your Dataflow job will read messages from your Pub/Sub topic and write their contents to your BigQuery table.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Analyze the Sensor Data Using BigQuery&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;To analyze the data as it is streaming:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;&lt;p&gt;In the Cloud Console, open the Navigation menu and select BigQuery.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Enter the following query in the Query editor and click RUN:&lt;/p&gt;&lt;/li&gt;
&lt;/ol&gt;

&lt;blockquote&gt;
&lt;blockquote&gt;
&lt;p&gt;SELECT timestamp, device, temperature from iotlabdataset.sensordata&lt;br&gt;
ORDER BY timestamp DESC&lt;br&gt;
LIMIT 100&lt;/p&gt;
&lt;/blockquote&gt;


&lt;/blockquote&gt;

&lt;ol&gt;
&lt;li&gt;Browse the Results. What is the temperature trend at each of the locations?&lt;/li&gt;
&lt;/ol&gt;
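
&lt;p&gt;Optionally, the same data can be queried from a script. A small sketch using the google-cloud-bigquery Python client (my own illustration; the lab only requires the console) aggregates the readings per device:&lt;/p&gt;

&lt;p&gt;from google.cloud import bigquery&lt;br&gt;
&lt;br&gt;
client = bigquery.Client()&lt;br&gt;
&lt;br&gt;
# Maximum and average temperature per simulated device&lt;br&gt;
query = """&lt;br&gt;
    SELECT device, MAX(temperature) AS max_temp, AVG(temperature) AS avg_temp&lt;br&gt;
    FROM iotlabdataset.sensordata&lt;br&gt;
    GROUP BY device&lt;br&gt;
"""&lt;br&gt;
&lt;br&gt;
for row in client.query(query):  # iterating the job waits for the results&lt;br&gt;
    print(row.device, row.max_temp, row.avg_temp)&lt;/p&gt;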

</description>
    </item>
    <item>
      <title>Normalization in RDBMS</title>
      <dc:creator>Nkwam Philip </dc:creator>
      <pubDate>Fri, 15 Jul 2022 15:52:55 +0000</pubDate>
      <link>https://dev.to/nkwamphilip/normalization-in-rdbms-12n0</link>
      <guid>https://dev.to/nkwamphilip/normalization-in-rdbms-12n0</guid>
      <description>&lt;p&gt;Normalisation in DBMS&lt;/p&gt;

&lt;p&gt;In clear terms, normalisation is simply the process of organising data in a database to enhance its integrity by preventing data inconsistency and redundancy. It involves techniques and designs for breaking your complex database tables into smaller pieces that are connected to each other, a relational model that has proven to work well with an RDBMS.&lt;br&gt;
For data to be used or stored efficiently and understandably, it has to be properly linked and distributed. Chaos sets in when data is scattered and not linked together properly. Normalisation removes these anomalies and brings the database into a consistent state.&lt;br&gt;
Normalisation in any DBMS follows a set of normal forms. The rules are grouped into six forms, with the 6th form being the most recently developed.&lt;/p&gt;

&lt;p&gt;Before we proceed, Let’s understand a few things.&lt;br&gt;
&lt;strong&gt;SQL Key&lt;/strong&gt; - Primary Keys and Composite Keys&lt;/p&gt;

&lt;p&gt;  &lt;strong&gt;Primary Key&lt;/strong&gt; - a primary key is a key designated to uniquely identify each record in a database table. It is crucial to a consistent and efficient relational database: a table needs a primary key to reliably insert, update, restore, or delete data. A primary key can be assigned manually, generated serially, or generated as random values, depending on how you define your schema.&lt;br&gt;
It has the following attributes:&lt;br&gt;
    1. A Primary Key cannot be NULL&lt;br&gt;
    2. A Primary Key must be unique&lt;br&gt;
    3. The Primary Keys should rarely be changed &lt;br&gt;
    4. The Primary key must be given a specific value once a new record has been inserted&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Composite Key&lt;/strong&gt; - A composite key is a primary key composed of multiple columns used to identify a record uniquely. Imagine a table whose rows are closely similar, with little to distinguish them in any single column; a composite key combines multiple columns to identify each row. &lt;/p&gt;

&lt;p&gt;Now let’s move into the rules:&lt;br&gt;
 &lt;strong&gt;The 1NF -&lt;/strong&gt; 1. Each table cell should contain a single value&lt;br&gt;
          2. Each record must be unique&lt;br&gt;
&lt;strong&gt;The 2NF -&lt;/strong&gt; 1. Must be in 1NF&lt;br&gt;
          2. No non-key column depends on only part (a subset) of a candidate key&lt;br&gt;
In 2NF, tables are connected to each other with Foreign Keys. A Foreign Key references the primary key of another table.&lt;br&gt;
    - A foreign key can have a different name from the Primary Key it references&lt;br&gt;
    - It ensures rows in one table have corresponding rows in another&lt;br&gt;
    - Foreign keys do not have to be unique&lt;br&gt;
    - They can be NULL; Primary Keys cannot.&lt;/p&gt;
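
&lt;p&gt;To make primary and foreign keys concrete, here is a small sketch using Python's built-in sqlite3 module (an illustrative schema of my own): the customers table stores each customer once, and the orders table references it through a foreign key instead of repeating customer details on every row.&lt;/p&gt;

&lt;p&gt;import sqlite3&lt;br&gt;
&lt;br&gt;
conn = sqlite3.connect(":memory:")&lt;br&gt;
conn.execute("PRAGMA foreign_keys = ON")  # SQLite only enforces foreign keys when enabled&lt;br&gt;
&lt;br&gt;
# Each customer is stored once, identified by a primary key&lt;br&gt;
conn.execute("CREATE TABLE customers (customer_id INTEGER PRIMARY KEY, name TEXT NOT NULL)")&lt;br&gt;
&lt;br&gt;
# Orders reference customers instead of duplicating customer details (a 2NF-style split)&lt;br&gt;
conn.execute("""&lt;br&gt;
    CREATE TABLE orders (&lt;br&gt;
        order_id INTEGER PRIMARY KEY,&lt;br&gt;
        customer_id INTEGER NOT NULL REFERENCES customers(customer_id),&lt;br&gt;
        amount REAL&lt;br&gt;
    )&lt;br&gt;
""")&lt;br&gt;
&lt;br&gt;
conn.execute("INSERT INTO customers VALUES (1, 'Ada')")&lt;br&gt;
conn.execute("INSERT INTO orders VALUES (10, 1, 25.0)")&lt;br&gt;
conn.execute("INSERT INTO orders VALUES (11, 1, 40.0)")  # same customer, no repeated name&lt;/p&gt;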

&lt;p&gt;&lt;strong&gt;Transitive Functional Dependency&lt;/strong&gt; - when changing a non-key column might cause another non-key column to change.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;The 3NF -&lt;/strong&gt; 1. Be in 2NF&lt;br&gt;
          2. Have no transitive functional dependency&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;The Boyce-Codd Normal Form&lt;/strong&gt;&lt;br&gt;
        1. Even when a database is in 3rd Normal Form, anomalies can still result if it has more than one Candidate Key.&lt;br&gt;
BCNF is sometimes also referred to as 3.5 Normal Form.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;The 4NF&lt;/strong&gt; - 1. A table is in 4th Normal Form if no table instance contains two or more independent, multivalued facts describing the relevant entity.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;The 5NF&lt;/strong&gt; - 1. A table is in 5th Normal Form only if it is in 4NF and it cannot be decomposed into any number of smaller tables without loss of data.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;The 6NF (Proposed)&lt;/strong&gt; - 1. 6th Normal Form is not standardised yet; however, it has been discussed by database experts for some time. Hopefully, we will have a clear &amp;amp; standardised definition for 6th Normal Form in the near future.&lt;/p&gt;

&lt;p&gt;Database normalisation and design are critical to the successful implementation of a good DBMS. A database can be normalised beyond 3NF, which is the standard normal form for most databases.&lt;/p&gt;

</description>
    </item>
    <item>
      <title>Loading Data Into BigQuery Using The CLI/Console</title>
      <dc:creator>Nkwam Philip </dc:creator>
      <pubDate>Fri, 15 Jul 2022 15:34:27 +0000</pubDate>
      <link>https://dev.to/nkwamphilip/loading-data-into-bigquery-using-the-cliconsole-33pb</link>
      <guid>https://dev.to/nkwamphilip/loading-data-into-bigquery-using-the-cliconsole-33pb</guid>
      <description>&lt;p&gt;BigQuery is Google's fully managed, NoOps, low cost analytics database. &lt;br&gt;
With BigQuery you can query terabytes of data in seconds and petabytes in minutes, without having any infrastructure to manage or needing a database administrator. &lt;br&gt;
BigQuery uses SQL and can take advantage of the pay-as-you-go model. BigQuery allows you to focus on analysing data to find meaningful insights, and it scales resources (compute &amp;amp; storage) up based on need.&lt;br&gt;
BigQuery maximizes flexibility by separating the compute engine that analyzes your data from your storage choices. You can store and analyze your data within BigQuery or use BigQuery to assess your data where it lives.&lt;/p&gt;

&lt;p&gt;In this session, we will be loading data into BigQuery using the CLI/Console&lt;/p&gt;

&lt;p&gt;Start by creating a dataset under your Project ID. Click the View actions icon, click "Create Dataset", and name the dataset (this post uses nyctaxi, which the later commands reference).&lt;br&gt;
Then ingest a new dataset from a CSV file. You can use this &lt;a href="https://storage.googleapis.com/cloud-training/OCBL013/nyc_tlc_yellow_trips_2018_subset_1.csv"&gt;https://storage.googleapis.com/cloud-training/OCBL013/nyc_tlc_yellow_trips_2018_subset_1.csv&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;In the BigQuery console, click on the dataset created and create a table.&lt;/p&gt;

&lt;p&gt;Specify the below table options:&lt;/p&gt;

&lt;p&gt;Source:&lt;/p&gt;

&lt;p&gt;Create table from: Upload&lt;br&gt;
Choose File: select the file you downloaded locally earlier&lt;br&gt;
File format: CSV&lt;br&gt;
&lt;strong&gt;Destination:&lt;/strong&gt; the dataset you created earlier&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Table name:&lt;/strong&gt; 2018trips. Leave all other settings at default.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Schema:&lt;/strong&gt;&lt;br&gt;
Check Auto Detect (tip: Not seeing the checkbox? Ensure the file format is CSV and not Avro)&lt;br&gt;
Advanced Options&lt;/p&gt;

&lt;p&gt;Leave at default values&lt;br&gt;
Click Create Table.&lt;/p&gt;

&lt;p&gt;Select Preview and confirm all columns have been loaded.&lt;br&gt;
You have successfully loaded a CSV file into a new BigQuery table.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Ingest a new Dataset from Google Cloud Storage&lt;/strong&gt;&lt;br&gt;
Now, let's try loading another subset of the same 2018 trip data that is available on Cloud Storage, and this time, let's use the CLI tool to do it.&lt;/p&gt;

&lt;p&gt;In your Cloud Shell, run the following command:&lt;br&gt;
bq load \&lt;br&gt;
--source_format=CSV \&lt;br&gt;
--autodetect \&lt;br&gt;
--noreplace  \&lt;br&gt;
nyctaxi.2018trips \&lt;br&gt;
gs://cloud-training/OCBL013/nyc_tlc_yellow_trips_2018_subset_2.csv&lt;/p&gt;

&lt;p&gt;Note: With the above load job, you are specifying that this subset is to be appended to the existing 2018trips table that you created above.&lt;/p&gt;
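
&lt;p&gt;As an aside, the same append can be scripted with the google-cloud-bigquery Python client; a minimal sketch (my own illustration, assuming the library is installed and the dataset is named nyctaxi as in the CLI command above):&lt;/p&gt;

&lt;p&gt;from google.cloud import bigquery&lt;br&gt;
&lt;br&gt;
client = bigquery.Client()&lt;br&gt;
&lt;br&gt;
job_config = bigquery.LoadJobConfig(&lt;br&gt;
    source_format=bigquery.SourceFormat.CSV,&lt;br&gt;
    autodetect=True,&lt;br&gt;
    write_disposition=bigquery.WriteDisposition.WRITE_APPEND,  # same effect as --noreplace&lt;br&gt;
)&lt;br&gt;
&lt;br&gt;
load_job = client.load_table_from_uri(&lt;br&gt;
    "gs://cloud-training/OCBL013/nyc_tlc_yellow_trips_2018_subset_2.csv",&lt;br&gt;
    "nyctaxi.2018trips",&lt;br&gt;
    job_config=job_config,&lt;br&gt;
)&lt;br&gt;
load_job.result()  # wait for the load to finish&lt;/p&gt;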

&lt;p&gt;When the load job is complete, you will get a confirmation on the screen.&lt;/p&gt;

&lt;p&gt;Back on your BigQuery console, select the 2018trips table and view details. Confirm that the row count has now almost doubled.&lt;/p&gt;

&lt;p&gt;You may want to run the same query as earlier to see if the top 5 most expensive trips have changed.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Create tables from other tables with DDL&lt;/strong&gt;&lt;br&gt;
The 2018trips table now has trips from throughout the year. What if you were only interested in January trips? For the purpose of this lab, we will keep it simple and focus only on pickup date and time. Let's use DDL to extract this data and store it in another table.&lt;/p&gt;

&lt;p&gt;In the Query Editor, run the following CREATE TABLE command:&lt;/p&gt;

&lt;p&gt;#standardSQL&lt;/p&gt;

&lt;p&gt;CREATE TABLE&lt;br&gt;
  nyctaxi.january_trips AS&lt;br&gt;
SELECT&lt;br&gt;
  *&lt;br&gt;
FROM&lt;br&gt;
  nyctaxi.2018trips&lt;br&gt;
WHERE&lt;br&gt;
  EXTRACT(Month&lt;br&gt;
  FROM&lt;br&gt;
    pickup_datetime)=1;&lt;/p&gt;

&lt;p&gt;Now run the below query in your Query Editor to find the longest distance traveled in the month of January:&lt;/p&gt;

&lt;p&gt;#standardSQL&lt;/p&gt;

&lt;p&gt;SELECT&lt;br&gt;
  *&lt;br&gt;
FROM&lt;br&gt;
  nyctaxi.january_trips&lt;br&gt;
ORDER BY&lt;br&gt;
  trip_distance DESC&lt;br&gt;
LIMIT&lt;br&gt;
  1&lt;/p&gt;

</description>
    </item>
    <item>
      <title>Loading Data Into Google Cloud SQL</title>
      <dc:creator>Nkwam Philip </dc:creator>
      <pubDate>Thu, 14 Jul 2022 11:32:44 +0000</pubDate>
      <link>https://dev.to/nkwamphilip/loading-data-into-google-cloud-sql-1jka</link>
      <guid>https://dev.to/nkwamphilip/loading-data-into-google-cloud-sql-1jka</guid>
      <description>&lt;p&gt;Cloud SQL is Google Cloud's fully managed database service that lets you set up, maintain, manage, and administer relational databases such as MySQL, PostgreSQL, or Microsoft SQL Server.&lt;/p&gt;

&lt;p&gt;Objectives: &lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Create Cloud SQL instance&lt;/li&gt;
&lt;li&gt;Create a Cloud SQL database&lt;/li&gt;
&lt;li&gt;Import text data into Cloud SQL&lt;/li&gt;
&lt;li&gt;Check the data for integrity&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The first thing required is to activate Cloud Shell at the top right corner of the GCP home page, then list the active account name with the command:&lt;/p&gt;

&lt;blockquote&gt;
&lt;blockquote&gt;
&lt;p&gt;gcloud auth list&lt;/p&gt;
&lt;/blockquote&gt;


&lt;/blockquote&gt;

&lt;p&gt;List the Project id with&lt;/p&gt;

&lt;blockquote&gt;
&lt;blockquote&gt;
&lt;p&gt;gcloud config list project&lt;/p&gt;
&lt;/blockquote&gt;


&lt;/blockquote&gt;

&lt;p&gt;Create environmental variables and the storage bucket that will contain the data&lt;/p&gt;

&lt;blockquote&gt;
&lt;blockquote&gt;
&lt;p&gt;export PROJECT_ID=$(gcloud info --format='value(config.project)')&lt;br&gt;
export BUCKET=${PROJECT_ID}-ml&lt;/p&gt;
&lt;/blockquote&gt;


&lt;/blockquote&gt;

&lt;p&gt;Create a Cloud SQL Instance named taxi&lt;/p&gt;

&lt;blockquote&gt;
&lt;blockquote&gt;
&lt;p&gt;gcloud sql instances create taxi \&lt;br&gt;
    --tier=db-n1-standard-1 --activation-policy=ALWAYS&lt;/p&gt;
&lt;/blockquote&gt;


&lt;/blockquote&gt;

&lt;p&gt;Set root Password for the Cloud SQL Instance&lt;/p&gt;

&lt;blockquote&gt;
&lt;blockquote&gt;
&lt;p&gt;gcloud sql users set-password root --host % --instance taxi \&lt;br&gt;
 --password Passw0rd&lt;/p&gt;
&lt;/blockquote&gt;


&lt;/blockquote&gt;

&lt;p&gt;Now create an environment variable with the IP address of the Cloud Shell&lt;/p&gt;

&lt;blockquote&gt;
&lt;blockquote&gt;
&lt;p&gt;export ADDRESS=$(wget -qO - http://ipecho.net/plain)/32&lt;/p&gt;
&lt;/blockquote&gt;


&lt;/blockquote&gt;

&lt;p&gt;Then authorize that address to access your Cloud SQL instance:&lt;/p&gt;

&lt;blockquote&gt;
&lt;blockquote&gt;
&lt;p&gt;gcloud sql instances patch taxi --authorized-networks $ADDRESS&lt;/p&gt;
&lt;/blockquote&gt;


&lt;/blockquote&gt;

&lt;p&gt;When prompted press Y to accept the change.&lt;/p&gt;

&lt;p&gt;Get the IP address of your Cloud SQL instance by running:&lt;/p&gt;

&lt;blockquote&gt;
&lt;blockquote&gt;
&lt;p&gt;MYSQLIP=$(gcloud sql instances describe \&lt;br&gt;
taxi --format="value(ipAddresses.ipAddress)")&lt;/p&gt;
&lt;/blockquote&gt;


&lt;/blockquote&gt;

&lt;p&gt;Check the variable MYSQLIP, you should get the IP Address as an output:&lt;/p&gt;

&lt;blockquote&gt;
&lt;blockquote&gt;
&lt;p&gt;echo $MYSQLIP&lt;/p&gt;
&lt;/blockquote&gt;


&lt;/blockquote&gt;

&lt;p&gt;Create the taxi trips table by logging into the mysql command-line interface; enter the password when prompted:&lt;/p&gt;

&lt;blockquote&gt;
&lt;blockquote&gt;
&lt;p&gt;mysql --host=$MYSQLIP --user=root \&lt;br&gt;
      --password --verbose&lt;/p&gt;
&lt;/blockquote&gt;


&lt;/blockquote&gt;

&lt;p&gt;Create a Schema for the trips&lt;/p&gt;

&lt;blockquote&gt;
&lt;blockquote&gt;
&lt;p&gt;create database if not exists bts;&lt;br&gt;
use bts;&lt;br&gt;
drop table if exists trips;&lt;br&gt;
create table trips (&lt;br&gt;
  vendor_id VARCHAR(16),&lt;br&gt;&lt;br&gt;
  pickup_datetime DATETIME,&lt;br&gt;
  dropoff_datetime DATETIME,&lt;br&gt;
  passenger_count INT,&lt;br&gt;
  trip_distance FLOAT,&lt;br&gt;
  rate_code VARCHAR(16),&lt;br&gt;
  store_and_fwd_flag VARCHAR(16),&lt;br&gt;
  payment_type VARCHAR(16),&lt;br&gt;
  fare_amount FLOAT,&lt;br&gt;
  extra FLOAT,&lt;br&gt;
  mta_tax FLOAT,&lt;br&gt;
  tip_amount FLOAT,&lt;br&gt;
  tolls_amount FLOAT,&lt;br&gt;
  imp_surcharge FLOAT,&lt;br&gt;
  total_amount FLOAT,&lt;br&gt;
  pickup_location_id VARCHAR(16),&lt;br&gt;
  dropoff_location_id VARCHAR(16)&lt;br&gt;
);&lt;/p&gt;
&lt;/blockquote&gt;


&lt;/blockquote&gt;

&lt;p&gt;In the mysql command-line interface, inspect the new table and query it:&lt;/p&gt;

&lt;blockquote&gt;
&lt;blockquote&gt;
&lt;p&gt;describe trips;&lt;br&gt;
select distinct(pickup_location_id) from trips;&lt;/p&gt;
&lt;/blockquote&gt;


&lt;/blockquote&gt;

&lt;p&gt;This will return an empty set, as there is no data in the database yet. Then exit:&lt;/p&gt;

&lt;blockquote&gt;
&lt;blockquote&gt;
&lt;p&gt;exit&lt;/p&gt;
&lt;/blockquote&gt;


&lt;/blockquote&gt;


&lt;h2&gt;
  
  
  Add data to Cloud SQL instance
&lt;/h2&gt;


&lt;p&gt;Run the following; the data is pulled from Google public datasets:&lt;/p&gt;

&lt;blockquote&gt;
&lt;blockquote&gt;
&lt;p&gt;gsutil cp gs://cloud-training/OCBL013/nyc_tlc_yellow_trips_2018_subset_1.csv trips.csv-1&lt;br&gt;
gsutil cp gs://cloud-training/OCBL013/nyc_tlc_yellow_trips_2018_subset_2.csv trips.csv-2&lt;/p&gt;
&lt;/blockquote&gt;


&lt;/blockquote&gt;

&lt;p&gt;Connect to the mysql interactive console to load local infile data&lt;/p&gt;

&lt;blockquote&gt;
&lt;blockquote&gt;
&lt;p&gt;mysql --host=$MYSQLIP --user=root  --password  --local-infile&lt;/p&gt;
&lt;/blockquote&gt;


&lt;/blockquote&gt;

&lt;p&gt;In the mysql interactive console select the database&lt;/p&gt;

&lt;blockquote&gt;
&lt;blockquote&gt;
&lt;p&gt;use bts;&lt;/p&gt;
&lt;/blockquote&gt;


&lt;/blockquote&gt;

&lt;p&gt;Load the local CSV file data using local-infile&lt;/p&gt;

&lt;blockquote&gt;
&lt;blockquote&gt;
&lt;p&gt;LOAD DATA LOCAL INFILE 'trips.csv-1' INTO TABLE trips&lt;br&gt;
FIELDS TERMINATED BY ','&lt;br&gt;
LINES TERMINATED BY '\n'&lt;br&gt;
IGNORE 1 LINES&lt;br&gt;
(vendor_id,pickup_datetime,dropoff_datetime,passenger_count,trip_distance,rate_code,store_and_fwd_flag,payment_type,fare_amount,extra,mta_tax,tip_amount,tolls_amount,imp_surcharge,total_amount,pickup_location_id,dropoff_location_id);&lt;/p&gt;

&lt;p&gt;LOAD DATA LOCAL INFILE 'trips.csv-2' INTO TABLE trips&lt;br&gt;
FIELDS TERMINATED BY ','&lt;br&gt;
LINES TERMINATED BY '\n'&lt;br&gt;
IGNORE 1 LINES&lt;br&gt;
(vendor_id,pickup_datetime,dropoff_datetime,passenger_count,trip_distance,rate_code,store_and_fwd_flag,payment_type,fare_amount,extra,mta_tax,tip_amount,tolls_amount,imp_surcharge,total_amount,pickup_location_id,dropoff_location_id);&lt;/p&gt;
&lt;/blockquote&gt;


&lt;/blockquote&gt;

&lt;p&gt;Checking For Data Integrity &lt;/p&gt;

&lt;blockquote&gt;
&lt;blockquote&gt;
&lt;p&gt;select distinct(pickup_location_id) from trips;&lt;br&gt;
select&lt;br&gt;
  max(trip_distance),&lt;br&gt;
  min(trip_distance)&lt;br&gt;
from&lt;br&gt;
  trips;&lt;br&gt;
select count(*) from trips where trip_distance = 0;&lt;br&gt;
select count(*) from trips where fare_amount &amp;lt; 0;&lt;br&gt;
select&lt;br&gt;
  payment_type,&lt;br&gt;
  count(*)&lt;br&gt;
from&lt;br&gt;
  trips&lt;br&gt;
group by&lt;br&gt;
  payment_type;&lt;/p&gt;
&lt;/blockquote&gt;


&lt;/blockquote&gt;
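
&lt;p&gt;If you prefer to run these checks from a script rather than the interactive console, here is a small sketch using the pymysql library (my choice of client; any MySQL client library would do, and MYSQL_IP stands for the instance address from $MYSQLIP):&lt;/p&gt;

&lt;p&gt;import pymysql&lt;br&gt;
&lt;br&gt;
# Same host, user, and password used for the mysql CLI above&lt;br&gt;
conn = pymysql.connect(host="MYSQL_IP", user="root", password="Passw0rd", database="bts")&lt;br&gt;
&lt;br&gt;
checks = [&lt;br&gt;
    "select count(distinct pickup_location_id) from trips",&lt;br&gt;
    "select max(trip_distance), min(trip_distance) from trips",&lt;br&gt;
    "select count(*) from trips where trip_distance = 0",&lt;br&gt;
    "select count(*) from trips where fare_amount &amp;lt; 0",&lt;br&gt;
]&lt;br&gt;
&lt;br&gt;
with conn.cursor() as cur:&lt;br&gt;
    for sql in checks:&lt;br&gt;
        cur.execute(sql)&lt;br&gt;
        print(sql, cur.fetchall())&lt;br&gt;
&lt;br&gt;
conn.close()&lt;/p&gt;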

&lt;p&gt;If all these checks look right, your database is loaded. Then exit:&lt;/p&gt;

&lt;blockquote&gt;
&lt;blockquote&gt;
&lt;p&gt;exit&lt;/p&gt;
&lt;/blockquote&gt;


&lt;/blockquote&gt;

</description>
    </item>
    <item>
      <title>Evaluating Customers' Habits and Predicting Their Purchases with a Classification Model in BigQuery ML</title>
      <dc:creator>Nkwam Philip </dc:creator>
      <pubDate>Mon, 20 Jun 2022 23:46:21 +0000</pubDate>
      <link>https://dev.to/nkwamphilip/evaluating-customers-habits-and-predicting-their-purchases-with-a-classification-model-with-bigquery-ml-3m8c</link>
      <guid>https://dev.to/nkwamphilip/evaluating-customers-habits-and-predicting-their-purchases-with-a-classification-model-with-bigquery-ml-3m8c</guid>
      <description>&lt;p&gt;Hi Folks,&lt;/p&gt;

&lt;p&gt;I decided to release a document on using a low code ML tool for Predictive Analytics entirely on GCP.&lt;/p&gt;

&lt;p&gt;I'll be loading data into BigQuery from the Google Analytics Sample E-commerce dataset, which has millions of Google Analytics records for the Google Merchandise Store.&lt;br&gt;
Alongside, I will:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Query and explore the ecommerce dataset&lt;/li&gt;
&lt;li&gt;Create a training and evaluation dataset to be used for batch prediction&lt;/li&gt;
&lt;li&gt;Create a classification (logistic regression) model in BigQuery ML&lt;/li&gt;
&lt;li&gt;Evaluate the performance of your machine learning model&lt;/li&gt;
&lt;li&gt;Predict and rank the probability that a visitor will make a purchase&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Right on my datasets inside BigQuery, I run a SQL command to find the total number of visitors who visited our website, and what percentage made a purchase.&lt;/p&gt;

&lt;p&gt;#standardSQL&lt;/p&gt;

&lt;p&gt;WITH visitors AS(&lt;br&gt;
SELECT&lt;br&gt;
COUNT(DISTINCT fullVisitorId) AS total_visitors&lt;br&gt;
FROM &lt;code&gt;data-to-insights.ecommerce.web_analytics&lt;/code&gt;&lt;br&gt;
),&lt;br&gt;
purchasers AS(&lt;br&gt;
SELECT&lt;br&gt;
COUNT(DISTINCT fullVisitorId) AS total_purchasers&lt;br&gt;
FROM &lt;code&gt;data-to-insights.ecommerce.web_analytics&lt;/code&gt;&lt;br&gt;
WHERE totals.transactions IS NOT NULL&lt;br&gt;
)&lt;br&gt;
SELECT&lt;br&gt;
  total_visitors,&lt;br&gt;
  total_purchasers,&lt;br&gt;
  total_purchasers / total_visitors AS conversion_rate&lt;br&gt;
FROM visitors, purchasers&lt;br&gt;
&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--eGexypYM--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/ssikgaq0agus0p1fd463.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--eGexypYM--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/ssikgaq0agus0p1fd463.png" alt="Image description" width="880" height="550"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;But what are the top 5 selling products?&lt;/p&gt;

&lt;p&gt;SELECT&lt;br&gt;
  p.v2ProductName,&lt;br&gt;
  p.v2ProductCategory,&lt;br&gt;
  SUM(p.productQuantity) AS units_sold,&lt;br&gt;
  ROUND(SUM(p.localProductRevenue/1000000),2) AS revenue&lt;br&gt;
FROM &lt;code&gt;data-to-insights.ecommerce.web_analytics&lt;/code&gt;,&lt;br&gt;
UNNEST(hits) AS h,&lt;br&gt;
UNNEST(h.product) AS p&lt;br&gt;
GROUP BY 1, 2&lt;br&gt;
ORDER BY revenue DESC&lt;br&gt;
LIMIT 5;&lt;br&gt;
&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--Xx-tapWX--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/ptw0xndh7j2t0sqyckk6.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--Xx-tapWX--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/ptw0xndh7j2t0sqyckk6.png" alt="Image description" width="880" height="550"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Select features and create your training dataset:&lt;/p&gt;

&lt;p&gt;SELECT&lt;br&gt;
  * EXCEPT(fullVisitorId)&lt;br&gt;
FROM&lt;br&gt;
  # features&lt;br&gt;
  (SELECT&lt;br&gt;
    fullVisitorId,&lt;br&gt;
    IFNULL(totals.bounces, 0) AS bounces,&lt;br&gt;
    IFNULL(totals.timeOnSite, 0) AS time_on_site&lt;br&gt;
  FROM&lt;br&gt;
    &lt;code&gt;data-to-insights.ecommerce.web_analytics&lt;/code&gt;&lt;br&gt;
  WHERE&lt;br&gt;
    totals.newVisits = 1)&lt;br&gt;
JOIN&lt;br&gt;
  (SELECT&lt;br&gt;
    fullvisitorid,&lt;br&gt;
    IF(COUNTIF(totals.transactions &amp;gt; 0 AND totals.newVisits IS NULL) &amp;gt; 0, 1, 0) AS will_buy_on_return_visit&lt;br&gt;
  FROM&lt;br&gt;
    &lt;code&gt;data-to-insights.ecommerce.web_analytics&lt;/code&gt;&lt;br&gt;
  GROUP BY fullvisitorid)&lt;br&gt;
USING (fullVisitorId)&lt;br&gt;
ORDER BY time_on_site DESC&lt;br&gt;
LIMIT 10;&lt;/p&gt;

&lt;p&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--fJ2a-Fje--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/s3qs7dc4frntcmgbavlg.png" alt="Image description" width="880" height="550"&gt;&lt;/p&gt;

&lt;p&gt;Creating a BigQuery dataset to store models:&lt;br&gt;
I'll be creating a new dataset to store my models under my project name, then select a BigQuery ML model type and specify options.&lt;br&gt;
CREATE OR REPLACE MODEL &lt;code&gt;ecommerce.classification_model&lt;/code&gt;&lt;br&gt;
OPTIONS&lt;br&gt;
(&lt;br&gt;
model_type='logistic_reg',&lt;br&gt;
labels = ['will_buy_on_return_visit']&lt;br&gt;
)&lt;br&gt;
AS&lt;/p&gt;

&lt;p&gt;#standardSQL&lt;/p&gt;

&lt;p&gt;SELECT&lt;br&gt;
  * EXCEPT(fullVisitorId)&lt;br&gt;
FROM&lt;br&gt;
  # features&lt;br&gt;
  (SELECT&lt;br&gt;
    fullVisitorId,&lt;br&gt;
    IFNULL(totals.bounces, 0) AS bounces,&lt;br&gt;
    IFNULL(totals.timeOnSite, 0) AS time_on_site&lt;br&gt;
  FROM&lt;br&gt;
    &lt;code&gt;data-to-insights.ecommerce.web_analytics&lt;/code&gt;&lt;br&gt;
  WHERE&lt;br&gt;
    totals.newVisits = 1&lt;br&gt;
    AND date BETWEEN '20160801' AND '20170430') # train on first 9 months&lt;br&gt;
JOIN&lt;br&gt;
  (SELECT&lt;br&gt;
    fullvisitorid,&lt;br&gt;
    IF(COUNTIF(totals.transactions &amp;gt; 0 AND totals.newVisits IS NULL) &amp;gt; 0, 1, 0) AS will_buy_on_return_visit&lt;br&gt;
  FROM&lt;br&gt;
    &lt;code&gt;data-to-insights.ecommerce.web_analytics&lt;/code&gt;&lt;br&gt;
  GROUP BY fullvisitorid)&lt;br&gt;
USING (fullVisitorId);&lt;/p&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--S40bJjqI--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/jvg2e2inj5mx8q3ad68t.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--S40bJjqI--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/jvg2e2inj5mx8q3ad68t.png" alt="Image description" width="880" height="550"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;The next thing is to evaluate the classification model's performance.&lt;br&gt;
For classification problems in ML, I want to minimize the False Positive Rate (predicting that the user will return and purchase when they don't) and maximize the True Positive Rate (predicting that the user will return and purchase when they do).&lt;/p&gt;
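
&lt;p&gt;BigQuery ML computes these metrics for you with ML.EVALUATE, but the same trade-off can be illustrated outside BigQuery with scikit-learn (a generic sketch on toy labels and scores, not the model trained in this article):&lt;/p&gt;

&lt;p&gt;from sklearn.metrics import precision_score, recall_score, roc_auc_score&lt;br&gt;
&lt;br&gt;
# Toy ground-truth labels (1 = returned and purchased) and predicted probabilities&lt;br&gt;
y_true = [0, 0, 1, 0, 1, 1, 0, 1]&lt;br&gt;
y_prob = [0.1, 0.4, 0.35, 0.2, 0.8, 0.65, 0.3, 0.9]&lt;br&gt;
&lt;br&gt;
print("ROC AUC:", roc_auc_score(y_true, y_prob))&lt;br&gt;
&lt;br&gt;
# Turn probabilities into decisions at a 0.5 threshold&lt;br&gt;
y_pred = [1 if p &amp;gt; 0.5 else 0 for p in y_prob]&lt;br&gt;
print("Precision:", precision_score(y_true, y_pred))  # penalised by false positives&lt;br&gt;
print("Recall:", recall_score(y_true, y_pred))  # penalised by false negatives&lt;/p&gt;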

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--2Zv9IzZz--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/1wkdvn2fd1h1pk4zd38h.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--2Zv9IzZz--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/1wkdvn2fd1h1pk4zd38h.png" alt="Image description" width="880" height="550"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Improving model performance with feature engineering, using:&lt;br&gt;
How far the visitor got in the checkout process on their first visit&lt;br&gt;
Where the visitor came from (traffic source: organic search, referring site etc.)&lt;br&gt;
Device category (mobile, tablet, desktop)&lt;br&gt;
Geographic information (country)&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Creating a second model:&lt;br&gt;
CREATE OR REPLACE MODEL &lt;code&gt;ecommerce.classification_model_2&lt;/code&gt;&lt;br&gt;
OPTIONS&lt;br&gt;
  (model_type='logistic_reg', labels = ['will_buy_on_return_visit']) AS&lt;br&gt;
WITH all_visitor_stats AS (&lt;br&gt;
SELECT&lt;br&gt;
  fullvisitorid,&lt;br&gt;
  IF(COUNTIF(totals.transactions &amp;gt; 0 AND totals.newVisits IS NULL) &amp;gt; 0, 1, 0) AS will_buy_on_return_visit&lt;br&gt;
  FROM &lt;code&gt;data-to-insights.ecommerce.web_analytics&lt;/code&gt;&lt;br&gt;
  GROUP BY fullvisitorid&lt;br&gt;
)&lt;/p&gt;

&lt;p&gt;# add in new features&lt;/p&gt;

&lt;p&gt;SELECT * EXCEPT(unique_session_id) FROM (&lt;br&gt;
  SELECT&lt;br&gt;
      CONCAT(fullvisitorid, CAST(visitId AS STRING)) AS unique_session_id,&lt;br&gt;
      # labels&lt;br&gt;
      will_buy_on_return_visit,&lt;br&gt;
      MAX(CAST(h.eCommerceAction.action_type AS INT64)) AS latest_ecommerce_progress,&lt;br&gt;
      # behavior on the site&lt;br&gt;
      IFNULL(totals.bounces, 0) AS bounces,&lt;br&gt;
      IFNULL(totals.timeOnSite, 0) AS time_on_site,&lt;br&gt;
      totals.pageviews,&lt;br&gt;
      # where the visitor came from&lt;br&gt;
      trafficSource.source,&lt;br&gt;
      trafficSource.medium,&lt;br&gt;
      channelGrouping,&lt;br&gt;
      # mobile or desktop&lt;br&gt;
      device.deviceCategory,&lt;br&gt;
      # geographic&lt;br&gt;
      IFNULL(geoNetwork.country, "") AS country&lt;br&gt;
  FROM &lt;code&gt;data-to-insights.ecommerce.web_analytics&lt;/code&gt;,&lt;br&gt;
     UNNEST(hits) AS h&lt;br&gt;
    JOIN all_visitor_stats USING(fullvisitorid)&lt;br&gt;
  WHERE 1=1&lt;br&gt;
    # only predict for new visits&lt;br&gt;
    AND totals.newVisits = 1&lt;br&gt;
    AND date BETWEEN '20160801' AND '20170430' # train 9 months&lt;br&gt;
  GROUP BY&lt;br&gt;
  unique_session_id,&lt;br&gt;
  will_buy_on_return_visit,&lt;br&gt;
  bounces,&lt;br&gt;
  time_on_site,&lt;br&gt;
  totals.pageviews,&lt;br&gt;
  trafficSource.source,&lt;br&gt;
  trafficSource.medium,&lt;br&gt;
  channelGrouping,&lt;br&gt;
  device.deviceCategory,&lt;br&gt;
  country&lt;br&gt;
);&lt;/p&gt;

&lt;p&gt;To check the ROC-AUC&lt;/p&gt;

&lt;p&gt;#standardSQL&lt;/p&gt;

&lt;p&gt;SELECT&lt;br&gt;
  roc_auc,&lt;br&gt;
  CASE&lt;br&gt;
    WHEN roc_auc &amp;gt; .9 THEN 'good'&lt;br&gt;
    WHEN roc_auc &amp;gt; .8 THEN 'fair'&lt;br&gt;
    WHEN roc_auc &amp;gt; .7 THEN 'not great'&lt;br&gt;
  ELSE 'poor' END AS model_quality&lt;br&gt;
FROM&lt;br&gt;
  ML.EVALUATE(MODEL ecommerce.classification_model_2,  (&lt;br&gt;
WITH all_visitor_stats AS (&lt;br&gt;
SELECT&lt;br&gt;
  fullvisitorid,&lt;br&gt;
  IF(COUNTIF(totals.transactions &amp;gt; 0 AND totals.newVisits IS NULL) &amp;gt; 0, 1, 0) AS will_buy_on_return_visit&lt;br&gt;
  FROM &lt;code&gt;data-to-insights.ecommerce.web_analytics&lt;/code&gt;&lt;br&gt;
  GROUP BY fullvisitorid&lt;br&gt;
)&lt;/p&gt;

&lt;p&gt;# add in new features&lt;/p&gt;

&lt;p&gt;SELECT * EXCEPT(unique_session_id) FROM (&lt;br&gt;
  SELECT&lt;br&gt;
      CONCAT(fullvisitorid, CAST(visitId AS STRING)) AS unique_session_id,&lt;br&gt;
      # labels&lt;br&gt;
      will_buy_on_return_visit,&lt;br&gt;
      MAX(CAST(h.eCommerceAction.action_type AS INT64)) AS latest_ecommerce_progress,&lt;br&gt;
      # behavior on the site&lt;br&gt;
      IFNULL(totals.bounces, 0) AS bounces,&lt;br&gt;
      IFNULL(totals.timeOnSite, 0) AS time_on_site,&lt;br&gt;
      totals.pageviews,&lt;br&gt;
      # where the visitor came from&lt;br&gt;
      trafficSource.source,&lt;br&gt;
      trafficSource.medium,&lt;br&gt;
      channelGrouping,&lt;br&gt;
      # mobile or desktop&lt;br&gt;
      device.deviceCategory,&lt;br&gt;
      # geographic&lt;br&gt;
      IFNULL(geoNetwork.country, "") AS country&lt;br&gt;
  FROM &lt;code&gt;data-to-insights.ecommerce.web_analytics&lt;/code&gt;,&lt;br&gt;
     UNNEST(hits) AS h&lt;br&gt;
    JOIN all_visitor_stats USING(fullvisitorid)&lt;br&gt;
  WHERE 1=1&lt;br&gt;
    # only predict for new visits&lt;br&gt;
    AND totals.newVisits = 1&lt;br&gt;
    AND date BETWEEN '20170501' AND '20170630' # eval 2 months&lt;br&gt;
  GROUP BY&lt;br&gt;
  unique_session_id,&lt;br&gt;
  will_buy_on_return_visit,&lt;br&gt;
  bounces,&lt;br&gt;
  time_on_site,&lt;br&gt;
  totals.pageviews,&lt;br&gt;
  trafficSource.source,&lt;br&gt;
  trafficSource.medium,&lt;br&gt;
  channelGrouping,&lt;br&gt;
  device.deviceCategory,&lt;br&gt;
  country&lt;br&gt;
)&lt;br&gt;
));&lt;br&gt;
&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--XXHzFkkG--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/5ug1vixzboooiija1fyv.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--XXHzFkkG--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/5ug1vixzboooiija1fyv.png" alt="Image description" width="880" height="550"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--ERiAolDK--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/m1k1d735knjky3mykh6w.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--ERiAolDK--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/m1k1d735knjky3mykh6w.png" alt="Image description" width="880" height="550"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Results&lt;br&gt;
Of the top 6% of first-time visitors (sorted in decreasing order of predicted probability), more than 6% make a purchase in a later visit.&lt;/p&gt;

&lt;p&gt;These users represent nearly 50% of all first-time visitors who make a purchase in a later visit.&lt;/p&gt;

&lt;p&gt;Overall, only 0.7% of first-time visitors make a purchase in a later visit.&lt;/p&gt;

&lt;p&gt;Targeting the top 6% of first-time visitors increases marketing ROI by 9x vs targeting them all!  &lt;/p&gt;

</description>
    </item>
    <item>
      <title>Data Engineering - Creating a Streaming Data Pipeline for a Real-Time Dashboard with Dataflow</title>
      <dc:creator>Nkwam Philip </dc:creator>
      <pubDate>Thu, 09 Jun 2022 12:19:04 +0000</pubDate>
      <link>https://dev.to/nkwamphilip/data-engineering-creating-a-streaming-data-pipeline-for-a-real-time-dashboard-with-dataflow-b2l</link>
      <guid>https://dev.to/nkwamphilip/data-engineering-creating-a-streaming-data-pipeline-for-a-real-time-dashboard-with-dataflow-b2l</guid>
<description>&lt;p&gt;Thinking of GCP as simplified infrastructure is a genuinely positive thing to say, and not a feigned compliment: we can all see it in how unified the platform is.&lt;br&gt;
I used GCP-provisioned resources to create a streaming data pipeline for a real-time dashboard with Dataflow.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;I sourced a public streaming dataset (a Pub/Sub topic) of NYC taxi rides, collected from Google's public Pub/Sub data.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--NSH9jdTX--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/ymv0leyklf9w8l3529tv.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--NSH9jdTX--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/ymv0leyklf9w8l3529tv.png" alt="Image description" width="880" height="550"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;After signing in to my GCP Cloud Account, I navigated to "BigQuery" on the side nav, and would thereafter use the command line to create a dataset called taxirides.&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--SfUh4Ohu--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/hydo4hn2p6vjz6nxv2gd.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--SfUh4Ohu--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/hydo4hn2p6vjz6nxv2gd.png" alt="Image description" width="880" height="550"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;On the command line, I ran 'bq mk taxirides' to make a dataset under my project.&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--5cwWTIeV--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/qh22yk90vjwfyazvf3ip.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--5cwWTIeV--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/qh22yk90vjwfyazvf3ip.png" alt="Image description" width="880" height="550"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;I would thereafter create a table in the dataset with my specified schema - the blueprint of my table's data:&lt;br&gt;
bq mk \&lt;br&gt;
--time_partitioning_field timestamp \&lt;br&gt;
--schema ride_id:string,point_idx:integer,latitude:float,longitude:float,\&lt;br&gt;
timestamp:timestamp,meter_reading:float,meter_increment:float,ride_status:string,\&lt;br&gt;
passenger_count:integer -t taxirides.realtime&lt;br&gt;
&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--g8MEHycw--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/c9ddv1epxvhbghn9ef6b.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--g8MEHycw--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/c9ddv1epxvhbghn9ef6b.png" alt="Image description" width="880" height="550"&gt;&lt;/a&gt;&lt;br&gt;
My schema has been successfully created, and you can clearly see it in the dropdown on the left of the screen; a quick command-line check is sketched below.&lt;/p&gt;
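
&lt;p&gt;To confirm the dataset and table exist before moving on, a quick check from the same command line (just a sketch, using the names above):&lt;br&gt;
# list datasets in the current project, then print the new table's schema&lt;br&gt;
bq ls&lt;br&gt;
bq show --schema --format=prettyjson taxirides.realtime&lt;/p&gt;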

&lt;p&gt;Before creating my data pipeline, I'd create a bucket on Cloud Storage to serve as my data lake (a command-line equivalent is sketched below).&lt;br&gt;
&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--dJEDqV5A--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/ws6b1ad2ivzrvw3j7tbg.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--dJEDqV5A--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/ws6b1ad2ivzrvw3j7tbg.png" alt="Image description" width="880" height="550"&gt;&lt;/a&gt;&lt;br&gt;
I named the bucket after my project ID and selected a multi-region.&lt;br&gt;
&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--8_bbXKRl--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/c2ecvk3pbyh4abpltj01.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--8_bbXKRl--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/c2ecvk3pbyh4abpltj01.png" alt="Image description" width="880" height="550"&gt;&lt;/a&gt;&lt;br&gt;
And I have my bucket!&lt;/p&gt;
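
&lt;p&gt;For anyone who prefers the command line, roughly the same bucket can be created with gsutil (a sketch; the bucket name and multi-region are stand-ins for the values I picked in the console):&lt;br&gt;
# create a multi-region bucket named after the project, to act as the data lake&lt;br&gt;
PROJECT_ID=$(gcloud config get-value project)&lt;br&gt;
gsutil mb -l US gs://$PROJECT_ID/&lt;/p&gt;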

&lt;p&gt;The next thing to do is to make sure the Dataflow API is enabled, as this will serve our Dataflow pipeline.&lt;br&gt;
&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--LFRSHu-m--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/nvfh0gnmb6qgdzvry46d.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--LFRSHu-m--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/nvfh0gnmb6qgdzvry46d.png" alt="Image description" width="880" height="550"&gt;&lt;/a&gt;&lt;br&gt;
I searched for it in the search bar and voila, it's there.&lt;br&gt;
&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--4l8Dz6xW--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/rhz8064n3pa786ehbg3m.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--4l8Dz6xW--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/rhz8064n3pa786ehbg3m.png" alt="Image description" width="880" height="550"&gt;&lt;/a&gt;&lt;br&gt;
I'd click on "Manage", then disable and re-enable the API.&lt;br&gt;
After enabling the API, I'll proceed to create my pipeline from an existing template on Dataflow.&lt;br&gt;
&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--6TF5hLNZ--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/hik7ssw0aop0olhui0gu.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--6TF5hLNZ--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/hik7ssw0aop0olhui0gu.png" alt="Image description" width="880" height="550"&gt;&lt;/a&gt;&lt;br&gt;
Right Here!&lt;br&gt;
&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--tgDtbwxn--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/8z3lwbelspa8uewldcub.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--tgDtbwxn--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/8z3lwbelspa8uewldcub.png" alt="Image description" width="880" height="550"&gt;&lt;/a&gt;&lt;br&gt;
I'll start by creating a Job from a template,&lt;br&gt;
&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--LpNsBBR0--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/oypkgdlt8uwgsdkbfnps.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--LpNsBBR0--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/oypkgdlt8uwgsdkbfnps.png" alt="Image description" width="880" height="550"&gt;&lt;/a&gt;&lt;br&gt;
I insert my job name and select the "Pub/Sub Topic to BigQuery" template, since that is where my data is coming from.&lt;br&gt;
&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--Szk4F9m1--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/t8zyuq1sjv7p536orgi7.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--Szk4F9m1--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/t8zyuq1sjv7p536orgi7.png" alt="Image description" width="880" height="550"&gt;&lt;/a&gt;&lt;br&gt;
I fill in the required parameters, comprising the Pub/Sub topic, my BigQuery table as the output, and a temporary location in my bucket.&lt;br&gt;
&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--9SQxXN3Z--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/54kul8ryig243fut3wqd.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--9SQxXN3Z--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/54kul8ryig243fut3wqd.png" alt="Image description" width="880" height="550"&gt;&lt;/a&gt;&lt;br&gt;
After filling in the values, I'd make sure to specify the number of Compute Engine instances - max workers as 2, number of workers as 2.&lt;br&gt;
&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--G8f49kb7--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/nzvahlt9ccsxme515kfp.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--G8f49kb7--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/nzvahlt9ccsxme515kfp.png" alt="Image description" width="880" height="550"&gt;&lt;/a&gt;&lt;br&gt;
Then I run the job. A data pipeline is created after this, and I can refresh my Cloud Storage bucket now before I move on to check the data in BigQuery (the equivalent gcloud commands are sketched below).&lt;br&gt;
&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--bzjexTR9--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/b1fqokztriuhtd913hvy.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--bzjexTR9--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/b1fqokztriuhtd913hvy.png" alt="Image description" width="880" height="550"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--oioW8Tl6--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/0mg89mx461p851zf3o75.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--oioW8Tl6--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/0mg89mx461p851zf3o75.png" alt="Image description" width="880" height="550"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Keep in mind that BigQuery is the output of the Dataflow pipeline.&lt;br&gt;
&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--uvBnFJ7o--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/r3h0dsag3m5lvcl97npc.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--uvBnFJ7o--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/r3h0dsag3m5lvcl97npc.png" alt="Image description" width="880" height="550"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Now I successfully have my datasets in BigQuery.&lt;/p&gt;

&lt;p&gt;We can perform aggregations on the stream for reporting. I'll navigate to the query editor and input my query, as sketched after the screenshots below.&lt;br&gt;
&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--DeqYtwNa--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/py213pc4iyl7q101l1ms.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--DeqYtwNa--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/py213pc4iyl7q101l1ms.png" alt="Image description" width="880" height="550"&gt;&lt;/a&gt;&lt;br&gt;
&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--jxCZ2bEN--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/9i4ydlx9ubh6hwkfrk8a.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--jxCZ2bEN--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/9i4ydlx9ubh6hwkfrk8a.png" alt="Image description" width="880" height="550"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;I can also choose to save this transformed result back to my data warehouse or anywhere else, or use a scheduled query that keeps returning the same transformed data from the streaming pipeline.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--6PiwvbaA--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/58ed4a3qhmggtbecfm9q.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--6PiwvbaA--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/58ed4a3qhmggtbecfm9q.png" alt="Image description" width="880" height="550"&gt;&lt;/a&gt; &lt;/p&gt;

&lt;p&gt;I chose to explore with Data Studio.&lt;br&gt;
&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--GNoVH0Bi--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/tl7j504vj44ffh2q28j5.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--GNoVH0Bi--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/tl7j504vj44ffh2q28j5.png" alt="Image description" width="880" height="550"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--wIug_2HT--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/74zkagm90w2jf7jlpl2h.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--wIug_2HT--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/74zkagm90w2jf7jlpl2h.png" alt="Image description" width="880" height="550"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;On the Reports page, in the Start with a Template section, click the [+] Blank Report template.&lt;br&gt;
If prompted with the Welcome to Google Data Studio window, click Get started. Check the checkbox to acknowledge the Google Data Studio Additional Terms, and click Continue.&lt;br&gt;
Select No to all the questions, then click Continue.  &lt;/p&gt;

&lt;p&gt;I switched back to BigQuery and explored with Data Studio again.&lt;br&gt;
&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--2Hlp82l7--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/g1yg56z7z7aygtteobn4.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--2Hlp82l7--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/g1yg56z7z7aygtteobn4.png" alt="Image description" width="880" height="550"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;I selected a Combo Chart and specified:&lt;br&gt;
Date range Dimension: dashboard_sort&lt;br&gt;
Dimension: dashboard_sort&lt;br&gt;
Drill Down: dashboard_sort (Make sure that Drill down option is turned ON)&lt;br&gt;
Metric: SUM() total_rides, SUM() total_passengers, SUM() total_revenue&lt;br&gt;
Sort: dashboard_sort, Ascending (latest rides first) &lt;/p&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--XRfHDaGr--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/e5mvs2mmt1vf1c4ji2or.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--XRfHDaGr--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/e5mvs2mmt1vf1c4ji2or.png" alt="Image description" width="880" height="550"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;NB: Visualizing data at a minute-level granularity is currently not supported in Data Studio as a timestamp. This is why we created our own dashboard_sort dimension.&lt;/p&gt;

</description>
      <category>database</category>
      <category>googlecloud</category>
      <category>dataengineering</category>
      <category>cloud</category>
    </item>
    <item>
      <title>Hassles pushing a deployed heroku app to my github repository and re-deploying to heroku.</title>
      <dc:creator>Nkwam Philip </dc:creator>
      <pubDate>Tue, 24 May 2022 23:46:11 +0000</pubDate>
      <link>https://dev.to/nkwamphilip/hassles-pushing-a-deployed-heroku-app-to-my-github-repository-and-re-deploying-to-heroku-eh6</link>
      <guid>https://dev.to/nkwamphilip/hassles-pushing-a-deployed-heroku-app-to-my-github-repository-and-re-deploying-to-heroku-eh6</guid>
<description>&lt;p&gt;Just so you don’t spend over 3 hours trying to push a web app to Heroku after pushing it to your GitHub repo, here I’ll show you practically how to push your web app to Heroku, change the branch, and push the same code to your GitHub repository. I had just deployed a Flask CRUD app to Heroku, &lt;a href="https://flaskcrudmanagerapp.herokuapp.com/"&gt;https://flaskcrudmanagerapp.herokuapp.com/&lt;/a&gt;&lt;br&gt;
&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--T1oIeriG--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/1p1by51pnh90oe2csk1l.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--T1oIeriG--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/1p1by51pnh90oe2csk1l.png" alt="Image description" width="880" height="282"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;And the master branch was the one deployed to heroku.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--wk-_VZY4--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/w0d6usn79mtnul6m9sxf.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--wk-_VZY4--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/w0d6usn79mtnul6m9sxf.png" alt="Image description" width="880" height="91"&gt;&lt;/a&gt;&lt;br&gt;
I made several commit changes afterwards, pretty much to show it’s a working app I pushed into production and have modified multiple times.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--zPl7caX_--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/4cicrkgeqmprl0pbev5e.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--zPl7caX_--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/4cicrkgeqmprl0pbev5e.png" alt="Image description" width="880" height="187"&gt;&lt;/a&gt;&lt;br&gt;
&lt;/p&gt;

&lt;p&gt;But I just thought I should at least have my web app in my GitHub repository. So I created a new GitHub repository, then added a new origin remote for the main branch, as sketched below. Meanwhile, I had checked the list of remote repo links; you can see clearly in the picture above that I have heroku and an origin alongside.&lt;/p&gt;
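
&lt;p&gt;Roughly, the remote setup looked like this (the GitHub URL is a placeholder, not my actual repo):&lt;br&gt;
# add the new GitHub repo as a second remote next to heroku&lt;br&gt;
git remote add origin https://github.com/YOUR_USERNAME/flask-crud-manager.git&lt;br&gt;
# list all remotes to confirm both heroku and origin are there&lt;br&gt;
git remote -v&lt;br&gt;
# push the code to GitHub on the main branch&lt;br&gt;
git push -u origin main&lt;/p&gt;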

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--l5KKCj00--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/55e9li0g98mca3cgpixg.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--l5KKCj00--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/55e9li0g98mca3cgpixg.png" alt="Image description" width="880" height="109"&gt;&lt;/a&gt; &lt;br&gt;
Most importantly, origin was pushed as the MAIN branch. That's literally where the problem unknowingly started.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--unfzvWCY--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/6h07bolqtkqigon5agji.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--unfzvWCY--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/6h07bolqtkqigon5agji.png" alt="Image description" width="880" height="188"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Then I tried pushing a change to my deployed Heroku app, but whoosh, it stopped working.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--ZbuZdURv--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/3rk4p1p67nmq0ll1oedx.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--ZbuZdURv--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/3rk4p1p67nmq0ll1oedx.png" alt="Image description" width="880" height="109"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;I removed the origin remote and main branch pointing to the GitHub repo I made earlier, then tried pushing my Heroku app again, but it still didn't work. It got hilarious at a point; I had just removed the very thing that got me into the problem.&lt;br&gt;
&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--wsQ9uqj7--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/9c4incju7k5ocwsv6uce.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--wsQ9uqj7--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/9c4incju7k5ocwsv6uce.png" alt="Image description" width="880" height="125"&gt;&lt;/a&gt;&lt;br&gt;
It did tell me there was no match for master.&lt;/p&gt;

&lt;p&gt;I had to remove the heroku remote; sadly, I thought I was in a messier situation, but I luckily added it back. I tried pushing again, but nope, it didn't work.&lt;br&gt;
&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--ryC6Vmx7--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/ylrq5vbeotu4qgwsjik7.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--ryC6Vmx7--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/ylrq5vbeotu4qgwsjik7.png" alt="Image description" width="880" height="215"&gt;&lt;/a&gt;&lt;br&gt;
i used "Get-url", staged with force (--force, -f) but it still didnt work. Could have actually given up buh nope, i eventually figured it out hereeee, i pulled the branch and pushed with force, lol.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--X9CpVUI---/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/zjw77ga1xvck3ul15rht.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--X9CpVUI---/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/zjw77ga1xvck3ul15rht.png" alt="Image description" width="880" height="123"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;I believe you've seen the solution already. It happened that there was no branch (since I deleted the remote main branch for the GitHub repo I created earlier), and I had to push to heroku as the main branch, the upstream.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;&lt;em&gt;One common cause of this behavior is attempting to deploy code from a different branch; in fact, the branch didn't exist at all&lt;/em&gt;&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--sTw_8XAQ--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/k7lfkyvgtnuuf21jr8n1.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--sTw_8XAQ--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/k7lfkyvgtnuuf21jr8n1.png" alt="Image description" width="880" height="285"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Finally, I set Heroku as the upstream for my main branch and voila, I can now push my commit changes to my live app, my dear Flask application. The commands are sketched below.&lt;/p&gt;
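
&lt;p&gt;For anyone hitting the same wall, this is roughly what the recovery looked like (the Heroku git URL follows the standard https://git.heroku.com/APP_NAME.git pattern, and the branch names are assumptions matching this post):&lt;br&gt;
# re-add the heroku remote that I removed earlier&lt;br&gt;
git remote add heroku https://git.heroku.com/flaskcrudmanagerapp.git&lt;br&gt;
# pull what Heroku already has, then push the local branch back up as the upstream&lt;br&gt;
git pull heroku main&lt;br&gt;
git push --force --set-upstream heroku main&lt;br&gt;
# if the app was originally deployed from master, use master instead of main&lt;/p&gt;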

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--m2o-Syv0--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/3wcgl1r7rdoqhza28t0y.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--m2o-Syv0--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/3wcgl1r7rdoqhza28t0y.png" alt="Image description" width="880" height="170"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Yeah, that reminds me, I have to remove the staging app I force-created earlier.&lt;/p&gt;

&lt;p&gt;I trust you had a great read here.&lt;/p&gt;

</description>
      <category>webdev</category>
      <category>programming</category>
      <category>github</category>
      <category>heroku</category>
    </item>
  </channel>
</rss>
