I have kept up my journey east and this week I am focusing on scraping the North Dakota Secretary of State business search. This is the ninth post of the Secretary of State scraping series.
Investigation
I have never been to North Dakota and I do not know much about it except for what I have heard about its recently discovered oil fields. Looking into the secretary of state search revealed something that will make this investigation section very, very short.
Behold!
If you have kept up with the secretary of state scraping series, you’ll recognize this search dialog. It’s the exact same software as the one used in Idaho! There are some differences in options; for example, Idaho allows you to search by date range. Other than that, it’s identical.
The post on Idaho has all the investigation that was done to find out the best way to find the data we want. I’m not going to continue any further on the investigation section since that post has pretty much all of it there.
The code
I did do some work on abstracting the functions used by Idaho scraping so that they could easily be used for both. It was a fun exercise in refactoring.
The code really depends on several different functions. Because the North Dakota search doesn’t have a date range option, I fell back on a technique I’ve used with several other states: I loop through each letter of the alphabet and search for businesses that start with that letter.
export async function searchForBusinesses(domain: string, state: string, dateSearch = false) {
    // Get the date - 1 day
    const date = new Date(new Date().setDate(new Date().getDate() - 1)).toLocaleDateString();
    const formattedBusinesses: any[] = [];

    for (let i = 0; i < alphabet.length; i++) {
        const businesses = await searchBusinesses(alphabet[i], domain, dateSearch ? date : null);

        for (let key in businesses) {
            if (businesses.hasOwnProperty(key)) {
                const currentDate = new Date();
                const formattedBusiness = {
                    filingDate: businesses[key].FILING_DATE,
                    recordNumber: businesses[key].RECORD_NUM,
                    agent: businesses[key].AGENT,
                    status: businesses[key].STATUS,
                    standing: businesses[key].STANDING,
                    title: businesses[key].TITLE[0].split('(')[0].trim(),
                    state: state,
                    sosId: businesses[key].ID,
                    createdAt: currentDate,
                    updatedAt: currentDate
                };
                formattedBusinesses.push(formattedBusiness);
            }
        }

        // Wait five seconds like good citizens
        await timeout(5000);
    }

    return formattedBusinesses;
}
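The function above uses `alphabet` and `timeout` without showing them. A minimal sketch of what they might look like, assuming `alphabet` is simply the 26 lowercase letters and `timeout` wraps `setTimeout` in a promise (both are guesses at the shape, not necessarily the project's actual helpers):

```typescript
// Hypothetical helpers: the post uses `alphabet` and `timeout` without showing them.

// The 26 lowercase letters, one search prefix per letter.
const alphabet: string[] = 'abcdefghijklmnopqrstuvwxyz'.split('');

// Resolve after `ms` milliseconds so requests can be spaced out.
function timeout(ms: number): Promise<void> {
    return new Promise<void>(resolve => setTimeout(resolve, ms));
}
```

With a "starts with" search, names that begin with a digit would presumably not be matched by any letter prefix; adding `'0'` through `'9'` to the array is one option if those matter.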
The next function performs the actual search for each individual letter.
import axios, { AxiosResponse } from 'axios';

export async function searchBusinesses(search: string, domain: string, date: string | null) {
    const url = `https://${domain}/api/Records/businesssearch`;
    const body = {
        SEARCH_VALUE: search,
        STARTS_WITH_YN: true,
        CRA_SEARCH_YN: false,
        ACTIVE_ONLY_YN: true
    } as any;

    // Only add the date range when a date is passed in
    if (date) {
        body.FILING_DATE = {
            start: date,
            end: null
        };
    }

    let axiosResponse: AxiosResponse;
    try {
        axiosResponse = await axios.post(url, body);
    }
    catch (e: any) {
        console.log(`Error searching ${domain} business info for`, search, e.response ? e.response.data : '');
        throw new Error(`Error searching ${domain} business info for ${search}`);
    }

    // Make sure the response has rows before counting them
    if (axiosResponse.data && axiosResponse.data.rows) {
        console.log('Total business found using', search, Object.keys(axiosResponse.data.rows).length);
        return axiosResponse.data.rows;
    }

    return null;
}
The most notable change is that we pass in a domain so that we can handle both Idaho and North Dakota (and maybe more if we find them?). I also had to make changes to how FILING_DATE is handled. North Dakota threw a 500 error if I tried to submit a date range, so I only add the date range conditionally.
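The conditional FILING_DATE handling can be isolated into a small pure function, which makes the difference between the two states easy to see. This is only a sketch: `buildSearchBody` and the `BusinessSearchBody` interface are hypothetical names that are not part of the original code, and the sample date string is made up.

```typescript
// Shape of the search payload, based on the fields used in searchBusinesses.
interface BusinessSearchBody {
    SEARCH_VALUE: string;
    STARTS_WITH_YN: boolean;
    CRA_SEARCH_YN: boolean;
    ACTIVE_ONLY_YN: boolean;
    FILING_DATE?: { start: string; end: string | null };
}

// Hypothetical helper: builds the payload and only attaches FILING_DATE
// when a date is supplied.
function buildSearchBody(search: string, date: string | null): BusinessSearchBody {
    const body: BusinessSearchBody = {
        SEARCH_VALUE: search,
        STARTS_WITH_YN: true,
        CRA_SEARCH_YN: false,
        ACTIVE_ONLY_YN: true
    };
    if (date) {
        body.FILING_DATE = { start: date, end: null };
    }
    return body;
}

// With a date (the Idaho-style search) the key is present...
const withDate = buildSearchBody('a', '1/1/2020');
// ...and without one (North Dakota) it is left out entirely.
const withoutDate = buildSearchBody('a', null);
console.log('FILING_DATE' in withDate, 'FILING_DATE' in withoutDate); // prints: true false
```

Leaving the key out entirely, rather than sending it with empty values, matches what the function above does and avoids sending North Dakota a field it rejects.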
The next function is getBusinessDetails. I did a lot of refactoring on it and it really works a lot better. Here are two examples of potential business detail responses:
[Two screenshots of business detail responses]
The server returns an array of some details. As you can see, the array isn’t always the same. Previously, I was just assuming that the members were always the same.
businesses[i].filingType = businessInfo.DRAWER_DETAIL_LIST[0].VALUE;
businesses[i].status = businessInfo.DRAWER_DETAIL_LIST[1].VALUE;
businesses[i].formedIn = businessInfo.DRAWER_DETAIL_LIST[2].VALUE;
This caused trouble when certain members weren’t present in the array. I’ve since added a switch statement that matches on each label and sets the corresponding field.
for (let drawer of businessInfo.DRAWER_DETAIL_LIST) {
    switch (drawer.LABEL) {
        case 'Filing Type':
            businesses[i].filingType = drawer.VALUE;
            break;
        case 'Status':
            businesses[i].status = drawer.VALUE;
            break;
        case 'Formed In':
            businesses[i].formedIn = drawer.VALUE;
            break;
        case 'Principal Address': {
            const principalAddressSplit = drawer.VALUE.split(/\n/);
            businesses[i].principalAddressStreet = principalAddressSplit[0];
            const formattedPrincipalCityStateAndZip = formatCityStateAndZip(principalAddressSplit[1]);
            businesses[i].principalAddressCity = formattedPrincipalCityStateAndZip.city;
            businesses[i].principalAddressState = formattedPrincipalCityStateAndZip.state;
            businesses[i].principalAddressZipcode = formattedPrincipalCityStateAndZip.zipcode;
            break;
        }
        case 'Mailing Address': {
            const mailingAddressSplit = drawer.VALUE.split(/\n/);
            businesses[i].mailingAddressStreet = mailingAddressSplit[0];
            const formattedMailingCityStateAndZip = formatCityStateAndZip(mailingAddressSplit[1]);
            businesses[i].mailingAddressCity = formattedMailingCityStateAndZip.city;
            businesses[i].mailingAddressState = formattedMailingCityStateAndZip.state;
            businesses[i].mailingAddressZipcode = formattedMailingCityStateAndZip.zipcode;
            break;
        }
        case 'AR Due Date':
            businesses[i].arDueDate = drawer.VALUE;
            break;
        case 'Registered Agent': {
            const registeredAgentSplit = drawer.VALUE.split(/\n/);
            businesses[i].registeredAgentType = registeredAgentSplit[0];
            businesses[i].registeredAgentId = registeredAgentSplit[1];
            businesses[i].registeredAgentName = registeredAgentSplit[2];
            businesses[i].registeredAgentStreetAddress = registeredAgentSplit[3];
            const formattedCityStateAndZip = formatCityStateAndZip(registeredAgentSplit[4]);
            businesses[i].registeredAgentCity = formattedCityStateAndZip.city;
            businesses[i].registeredAgentState = formattedCityStateAndZip.state;
            businesses[i].registeredAgentZipcode = formattedCityStateAndZip.zipcode;
            break;
        }
        case 'Nature of Business':
            businesses[i].industry = drawer.VALUE;
            break;
        case 'Initial Filing Date':
            businesses[i].filingDate = drawer.VALUE;
            break;
        case 'Owner Name':
            businesses[i].ownerName = drawer.VALUE;
            break;
    }
}
It’s big but it’s pretty simple. Switch based on the LABEL and then set the VALUE.
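The switch calls formatCityStateAndZip, which isn’t shown in this post. Here is a rough sketch of how such a helper could work, assuming the second address line looks like "Bismarck, ND 58501". That input format, the `CityStateAndZip` interface, and the empty-string fallback are all assumptions for illustration and may not match the real helper.

```typescript
interface CityStateAndZip {
    city: string;
    state: string;
    zipcode: string;
}

// Hypothetical implementation; assumes input such as "Bismarck, ND 58501".
function formatCityStateAndZip(cityStateAndZip: string | undefined): CityStateAndZip {
    const empty: CityStateAndZip = { city: '', state: '', zipcode: '' };
    if (!cityStateAndZip) {
        return empty;
    }

    // City: everything before the last comma. State: two letters.
    // Zip: five digits with an optional +4 suffix.
    const match = cityStateAndZip.trim().match(/^(.*),\s*([A-Za-z]{2})\s+(\d{5}(?:-\d{4})?)$/);
    if (!match) {
        return empty;
    }

    return {
        city: match[1].trim(),
        state: match[2].toUpperCase(),
        zipcode: match[3]
    };
}

const parsed = formatCityStateAndZip('Bismarck, ND 58501');
console.log(parsed.city, parsed.state, parsed.zipcode); // prints: Bismarck ND 58501
```

Returning empty strings for missing or unparseable input keeps the switch from throwing when an address has fewer lines than expected. That fits the same lesson as the label-based switch: don’t assume every member is present.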
And…that’s it. It was fun to find a state that used the same software as Idaho. I am going to try and see if I can find another.
Looking for business leads?
Using the techniques talked about here at javascriptwebscrapingguy.com, we’ve been able to launch a way to access awesome business leads. Learn more at Cobalt Intelligence!
The post Jordan Scrapes Secretary of States: North Dakota appeared first on JavaScript Web Scraping Guy.