Lately I’ve been working on a side project to showcase some of the PaaS capabilities in Azure, and the service I’ve become most engaged with is Azure Search.

So let’s start with the obvious question: what is Azure Search? Azure Search is a Platform-as-a-Service offering that lets you implement search as part of your cloud solution in a scalable manner.
Below are some links on the basics of “What is Azure Search?”
- What is Azure Search?
- Crawl a SQL Database with Azure Search
- Searching Azure Blobs with Azure Search
- Indexing Multiple data sources with Azure Search
- Full Text Search with Azure Search
- Introduction to Cognitive Search
The first step is to create a search service, and the easiest way I’ve found is via the CLI:

```shell
az search service create --name {name} --resource-group {group} --location {location}
```
So after you create an Azure Search service, the next step is to create all the pieces it needs. I’ve been using the REST API via Python to manage these elements, so that is the code you will see here.
- Create the data source
- Create an index
- Create an indexer
- Run the Indexer
- Get the indexer status
- Run the Search
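Every step in the list above hits the same REST endpoint pattern with the same headers. As a small convenience (my own sketch, with an illustrative service host name), a pair of helpers can keep the URLs and headers consistent across all of the calls that follow:

```python
API_VERSION = '2017-11-11'

def search_url(service_host, resource, api_version=API_VERSION):
    """Build an Azure Search REST URL for a resource such as
    'datasources', 'indexes', or 'indexers'."""
    return 'https://{0}/{1}?api-version={2}'.format(service_host, resource, api_version)

def search_headers(api_key):
    """The two headers required on every Azure Search REST call."""
    return {'Content-Type': 'application/json', 'api-key': api_key}

print(search_url('myservice.search.windows.net', 'indexers'))
# https://myservice.search.windows.net/indexers?api-version=2017-11-11
```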
Project Description:

For this post, I’m building a search index that crawls the data compiled by the Chicago Data Portal, which makes statistics and public information available via its API. This solution pulls data from that API into Cosmos DB to make it searchable. I am using only publicly consumable information as part of this. The information on the portal can be found here.
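The ingestion side isn’t the focus of this post, but as a rough sketch: each record pulled from the portal’s API needs a string `id` before it can be written to Cosmos DB. The field names below are illustrative, not the portal’s exact schema:

```python
import uuid

def to_cosmos_document(record):
    """Shape a raw portal record (a dict from its JSON API) into a
    Cosmos DB document. Cosmos DB requires a string 'id' field."""
    doc = dict(record)  # copy so the original record is untouched
    doc['id'] = str(doc.get('case_number', uuid.uuid4()))
    return doc

sample = {'case_number': 'JC100001', 'primary_description': 'THEFT'}
print(to_cosmos_document(sample)['id'])
# JC100001
```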
Create the Data Source
So, the first requirement of any search discussion is a data source you can search; you can’t get far without one. The question then becomes: what do you want to search? Azure Search supports a wide variety of data sources, and for the purposes of this discussion, I am pointing it at Cosmos DB. The intention is to search the contents of a Cosmos DB collection and pull back relevant entries.
Below is the code that I used to create the data source for the search:
```python
import json
import requests
from pprint import pprint

# The URL of your search service
url = 'https://[Url of the search service]/datasources?api-version=2017-11-11'
print(url)

# The API key for your search service
api_key = '[api key for the search service]'
headers = {
    'Content-Type': 'application/json',
    'api-key': api_key
}
data = {
    'name': 'cosmos-crime',
    'type': 'documentdb',
    'credentials': {'connectionString': '[connection string for cosmos db]'},
    'container': {'name': '[collection name]'}
}
data = json.dumps(data)
response = requests.post(url, data=data, headers=headers)
pprint(response.status_code)
```
To get the API key, you need the admin key, which can be found with the following command:

```shell
az search admin-key show --service-name [name of the service] -g [name of the resource group]
```
After running the above, you will have created a data source to connect to for searching.
Create an Index
Once you have the above data source, the next step is to create an index. The index is what Azure Search maps your data to, and what it will actually search against; think of it as the shape your data takes once indexing completes. To create the index, use the following code:
```python
import json
import requests
from pprint import pprint

url = 'https://[Url of the search service]/indexes?api-version=2017-11-11'
print(url)

api_key = '[api key for the search service]'
headers = {
    'Content-Type': 'application/json',
    'api-key': api_key
}
data = {
    "name": "crimes",
    "fields": [
        {"name": "id", "type": "Edm.String", "key": "true", "searchable": "false"},
        {"name": "iucr", "type": "Edm.String", "searchable": "true", "filterable": "true", "facetable": "true"},
        {"name": "location_description", "type": "Edm.String", "searchable": "true", "filterable": "true"},
        {"name": "primary_description", "type": "Edm.String", "searchable": "true", "filterable": "true"},
        {"name": "secondary_description", "type": "Edm.String", "searchable": "true", "filterable": "true"},
        {"name": "arrest", "type": "Edm.String", "facetable": "true", "filterable": "true"},
        {"name": "beat", "type": "Edm.Double", "filterable": "true", "facetable": "true"},
        {"name": "block", "type": "Edm.String", "filterable": "true", "searchable": "true", "facetable": "true"},
        {"name": "case", "type": "Edm.String", "searchable": "true"},
        {"name": "date_occurrence", "type": "Edm.DateTimeOffset", "filterable": "true"},
        {"name": "domestic", "type": "Edm.String", "filterable": "true", "facetable": "true"},
        {"name": "fbi_cd", "type": "Edm.String", "filterable": "true"},
        {"name": "ward", "type": "Edm.Double", "filterable": "true", "facetable": "true"},
        {"name": "location", "type": "Edm.GeographyPoint"}
    ]
}
data = json.dumps(data)
response = requests.post(url, data=data, headers=headers)
pprint(response.status_code)
```
Using the above code, I’ve identified the data types of the final product, and these all map to the data types supported by Azure Search. The supported data types can be found here.
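As a rough illustration of how those field types were chosen (my own sketch, covering only the types used in this post), a small helper can map a sample Python value from the source data to an approximate EDM type:

```python
from datetime import datetime

def edm_type(value):
    """Map a Python value to an approximate Azure Search EDM type.
    Illustrative only; it covers just the types used in this index."""
    if isinstance(value, bool):          # check bool before int/float
        return 'Edm.Boolean'
    if isinstance(value, (int, float)):
        return 'Edm.Double'
    if isinstance(value, datetime):
        return 'Edm.DateTimeOffset'
    return 'Edm.String'

print(edm_type(1823.0))  # Edm.Double  (e.g. the 'beat' field)
print(edm_type('THEFT')) # Edm.String
```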
It’s worth mentioning that there are other key attributes above to consider:
- facetable: This denotes whether the data can be faceted. For example, on Yelp every restaurant has a “$” to “$$$$$” cost rating, and I want to be able to group results based on that facet.
- filterable: This denotes whether the dataset can be filtered based on those values.
- searchable: This denotes whether full-text search is performed on the field; only certain data types support it.
Create an Indexer

So the next step is to create the indexer, which is what does the real work. The indexer is responsible for the following operations:
- Connect to the data source
- Pull in the data and put it into the appropriate format for the index
- Perform any data transformations
- Keep pulling in new data on an ongoing basis
```python
import json
import requests
from pprint import pprint

url = 'https://[Url of the search service]/indexers?api-version=2017-11-11'
print(url)

api_key = '[api key for the search service]'
headers = {
    'Content-Type': 'application/json',
    'api-key': api_key
}
data = {
    "name": "cosmos-crime-indexer",
    "dataSourceName": "cosmos-crime",
    "targetIndexName": "crimes",
    "schedule": {"interval": "PT2H"},
    "fieldMappings": [
        {"sourceFieldName": "iucr", "targetFieldName": "iucr"},
        {"sourceFieldName": "location_description", "targetFieldName": "location_description"},
        {"sourceFieldName": "primary_description", "targetFieldName": "primary_description"},
        {"sourceFieldName": "secondary_description", "targetFieldName": "secondary_description"},
        {"sourceFieldName": "arrest", "targetFieldName": "arrest"},
        {"sourceFieldName": "beat", "targetFieldName": "beat"},
        {"sourceFieldName": "block", "targetFieldName": "block"},
        {"sourceFieldName": "casenumber", "targetFieldName": "case"},
        {"sourceFieldName": "date_of_occurrence", "targetFieldName": "date_occurrence"},
        {"sourceFieldName": "domestic", "targetFieldName": "domestic"},
        {"sourceFieldName": "fbi_cd", "targetFieldName": "fbi_cd"},
        {"sourceFieldName": "ward", "targetFieldName": "ward"},
        {"sourceFieldName": "location", "targetFieldName": "location"}
    ]
}
data = json.dumps(data)
response = requests.post(url, data=data, headers=headers)
pprint(response.status_code)
```
What you will notice is that for each field, two attributes are assigned:
- targetFieldName: This is the field in the index that you are targeting.
- sourceFieldName: This is the field name according to the data source.
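Since most mappings above are identity mappings, the whole `fieldMappings` list can be generated from (source, target) pairs; this is a small sketch of my own, not from the original post:

```python
def field_mappings(pairs):
    """Build the indexer fieldMappings list from (source, target) pairs."""
    return [{'sourceFieldName': src, 'targetFieldName': tgt} for src, tgt in pairs]

# Identity mappings plus the two renames used in this post
mappings = field_mappings([('iucr', 'iucr'),
                           ('casenumber', 'case'),
                           ('date_of_occurrence', 'date_occurrence')])
print(mappings[1])
# {'sourceFieldName': 'casenumber', 'targetFieldName': 'case'}
```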
Run the Indexer

Once you’ve created the indexer, the next step is to run it, which causes the indexer to pull data into the index:
```python
import requests
from pprint import pprint

url = 'https://[Url of the search service]/indexers/cosmos-crime-indexer/run/?api-version=2017-11-11'
print(url)

api_key = '[api key for the search service]'
headers = {
    'Content-Type': 'application/json',
    'api-key': api_key
}

# Reset the indexer first so it re-processes all documents
reseturl = 'https://[Url of the search service]/indexers/cosmos-crime-indexer/reset/?api-version=2017-11-11'
resetResponse = requests.post(reseturl, headers=headers)

response = requests.post(url, headers=headers)
pprint(response.status_code)
```
Triggering the run in this way loads the index.
Get the Indexer Status

Now, depending on the size of your data source, the indexing process can take time, so I wanted to provide a REST call that lets you get the status of the indexer:
```python
import requests
from pprint import pprint

url = 'https://[Url of the search service]/indexers/cosmos-crime-indexer/status/?api-version=2017-11-11'
print(url)

api_key = '[api key for the search service]'
headers = {
    'Content-Type': 'application/json',
    'api-key': api_key
}

response = requests.get(url, headers=headers)
status = response.json()
pprint(status)
```
This will provide you with the status of the indexer so that you can find out when it completes.
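If you want to poll that endpoint until the run finishes, a small check on the status document can help. This is a sketch under my own assumptions about the response shape (the status endpoint returns a `lastResult` object whose `status` is e.g. `inProgress` or `success`):

```python
def indexer_finished(status_json):
    """True once the indexer's last run is no longer in progress.
    Assumes the status document exposes lastResult.status."""
    last = status_json.get('lastResult') or {}
    return bool(last) and last.get('status') != 'inProgress'
```

You would call this on `response.json()` from the status request above, sleeping between polls.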
Run the Search

Finally, if you want to confirm the search is working afterward, you can do the following:
```python
import requests
from pprint import pprint

url = 'https://[Url of the search service]/indexes/crimes/docs?api-version=2017-11-11'
print(url)

api_key = '[api key for the search service]'
headers = {
    'Content-Type': 'application/json',
    'api-key': api_key
}

response = requests.get(url, headers=headers)
results = response.json()
pprint(results)
```
This will bring back the results of the search; since the query has no search text, it returns everything.
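To search for something specific, you add query parameters to that docs URL. The helper below is my own sketch (the service host is a placeholder); the `search` and `$filter` parameters are standard Azure Search query options:

```python
from urllib.parse import urlencode

def search_query_url(service_host, index, search_text='*', extra=None,
                     api_version='2017-11-11'):
    """Build a docs query URL with a search term and optional extra
    parameters such as {'$filter': ...} or {'$top': 10}."""
    params = {'api-version': api_version, 'search': search_text}
    params.update(extra or {})
    return 'https://{0}/indexes/{1}/docs?{2}'.format(
        service_host, index, urlencode(params))

print(search_query_url('myservice.search.windows.net', 'crimes', 'theft'))
# https://myservice.search.windows.net/indexes/crimes/docs?api-version=2017-11-11&search=theft
```

You would then GET that URL with the same `api-key` header used above.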
I hope this helps with configuring Azure Search. Happy searching :)!