How Do Insurance & Credit Scoring Work with Alternative Data?

In both insurance underwriting and credit granting, alternative data generally refers to datasets that are not directly related to an individual's insurance claim performance or credit behavior. Conventional data is generally restricted to data originating at the credit bureaus (TransUnion, Equifax, and Experian), insurance or credit application data, or the institution's own files on its existing customers.

Alternative data has become a hot topic, and something of a buzzword, for several reasons. One is the data explosion of the last decade: IDC estimated that about 1.2 zettabytes of data were created in 2010 and 33 zettabytes in 2018, and it forecasts that a massive 175 zettabytes will be produced worldwide in 2025. Another is the enthusiasm it has attracted from hedge funds and other business organizations, which tend to be more willing to experiment with 'new' approaches. According to EY, the number of full-time employees dedicated to leveraging alternative data in this space has grown by a whopping 450% in the past five years, and 44% of funds now have dedicated teams for exploiting alternative data.

However, the flip side of the coin is that an estimated 3 billion adults across the world have no credit, and therefore no credit file at all, according to FICO. This is a huge under-served market spread across different parts of the world. While many of these adults live in developing and frontier markets with early-stage credit infrastructure, it is also true that a large number of people in mature markets have no credit file and are therefore unknown to the credit bureaus. They are known as 'credit invisibles'.

So why does being 'credit invisible' matter so much? Take the US, a large and mature insurance and credit market. According to the CFPB (Consumer Financial Protection Bureau), consumers with limited credit histories in the credit records maintained by the three NCRAs (Nationwide Credit Reporting Agencies) face major challenges in obtaining most insurance or credit. Lenders often consider NCRA records when making insurance or credit underwriting decisions. Insurers and lenders often use scores derived from NCRA records, such as VantageScore or FICO scores, when deciding whether to approve a loan application or when setting a loan's interest rate. If an applicant has no credit record with any of the NCRAs, or if the record contains too little data to assess creditworthiness, insurers and lenders are less likely to extend credit or to accept the individual into an insurance scheme. Consequently, consumers with limited credit and claim histories may face significantly reduced access to these markets.

About 53 million people in the US have either inadequate credit bureau files (around 28 million) or no records at any bureau at all (25 million). The minimum scoring criteria, as publicly explained by some of the market leaders in the space, include:

The consumer must not be deceased
The credit file must contain at least one account reported within the past six months
The credit file must contain at least one account that is at least six months old
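
The three criteria above can be expressed as a simple eligibility check. The following is a minimal sketch, not any bureau's or score provider's actual logic: the `Account` and `CreditFile` classes, their field names, and the way months are counted are all assumptions made for illustration.

```python
from dataclasses import dataclass
from datetime import date
from typing import List, Optional


@dataclass
class Account:
    opened: date          # hypothetical field: date the account was opened
    last_reported: date   # hypothetical field: date of the latest bureau update


@dataclass
class CreditFile:
    deceased: bool
    accounts: List[Account]


def months_between(earlier: date, later: date) -> int:
    """Number of complete calendar months from `earlier` to `later`."""
    months = (later.year - earlier.year) * 12 + (later.month - earlier.month)
    if later.day < earlier.day:
        months -= 1  # the final month is not yet complete
    return months


def is_scorable(file: Optional[CreditFile], as_of: date) -> bool:
    """Apply the three minimum criteria; a missing file is never scorable."""
    if file is None or file.deceased:
        return False
    # Assumption: "within the past six months" means fewer than six
    # complete months have passed since the account was last reported.
    recently_reported = any(
        months_between(a.last_reported, as_of) < 6 for a in file.accounts
    )
    # Assumption: "at least six months old" is measured from the open date.
    seasoned = any(
        months_between(a.opened, as_of) >= 6 for a in file.accounts
    )
    return recently_reported and seasoned
```

Under these assumptions, a consumer with no file at all and a consumer whose only account was last reported nine months ago both come out as unscorable, which matches the 'credit invisible' and stale-file groups described in this article. The two account criteria are checked independently, so they may be satisfied by different accounts.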

FICO recently completed research on how credit might be extended to 'unscorable' consumers, who do not meet the minimum criteria set out above. FICO analyzed about 28 million consumers who had only stale or sparse credit data. The research showed that scoring models built solely on old or sparse credit data did an inferior job of predicting future performance.

As part of this research, scores were produced for about 7 million consumers who had only minimal public-record data available. The results showed that the older the data, the less reliable the scoring model. The underlying risk level associated with a particular score, say 650, will not be the same across successive segments of a population with stale records.

So why is this a problem? Consider mortgages. Lenders that base their underwriting strategy on a given score cutoff might take on customers with quite different repayment risks, even though they all have the same score of 650. As explained above, this happens because the underlying scoring model performs less reliably when older data is used to score consumers in the first place. FICO found that a score of 640 based on files that have not been updated in 21 months exhibits repayment risk roughly in line with a score of 590 for the conventionally scorable population, an odds misalignment of around 50 points.

The result is that risk discrimination is weaker when relying on old or sparse bureau data. Lenders will likely decline applicants with weaker scores. For consumers, that may translate into credit lines that are smaller than requested or, in a worse scenario, larger than they can afford.

Moreover, for the vast majority of these 28 million consumers, conventional data-based scoring models might not make it easy to access credit in the first place. Over 50% either have a negative item reported at one or more of the three NCRAs or have no active accounts from which data can be drawn. Without information flowing into a scoring model that would allow their score to improve, they are less likely to obtain credit. They are effectively locked in a vicious cycle: to get insurance or credit, they need to have used insurance or credit; however, without a dependable way of assessing their risk, insurers and lenders are unlikely to take a chance on them as clients.

For the remaining 25 million or so prospective consumers in the US who have no files at all, bureau data won't help either. They are caught in a catch-22.

So, if insurance or credit is to be offered to this substantial 'unscorable' audience, credit bureau data has to be supplemented with non-conventional types of data.

Alternative data isn't limited to data sourced from social media. There are many self-evident reasons why data sourced from Instagram, Twitter, Facebook, etc., doesn't exactly scream quality data for the purposes of insurance or credit risk scoring.

So, if alternative data isn't only social media data, then what is it?

Different Kinds of Alternative Data

Transactional Data
This kind of data is generally defined as consumers' usage patterns for debit and credit cards. One might ask how transactional data is 'alternative'. The answer is that much of this data is not actively mined to build predictive models.

Positive: The data is normally clean and well-structured.
Negative: Working with it is time-intensive.

Clickstream Data
In layman's terms, this is how users navigate websites. With the announced 'death' of third-party cookie tracking in most, if not all, browsers by 2022, and with news in past years that Avast was winding down its subsidiary Jumpshot, this kind of data has become scarcer and less reliable.

Geo-Location Data
Geo-location data is data that can be used to identify the physical location of an electronic device. As users take their devices everywhere with them, it has become a growing data source for assessing anything from store footfall to the travel patterns of populations.

Questionnaire or Survey Data
Companies like LenddoEFL use psychometrics, the field concerned with the theory and technique of psychological measurement, to turn questionnaire and survey datasets into mathematical, analytical scoring solutions.

Text & Audio Files
This kind of unstructured data is generally sourced from insurance or credit application documents and from call recordings between the lender and the customer.

Utility or Rental Data
This is normally historical data, but it seldom appears in reports from the major NCRAs. A few scoring companies have included it in their product offerings, mostly for the US market.

Web Data
That's where X-Byte rules! This kind of data can be anything that is part of a public online source, such as forum posts, product and price data, or product and service reviews. EY, a leading global accounting firm, has called web data one of the most accurate and insightful alternative datasets.

How Valuable Is the Data?
Research shows that all of the data sources mentioned above add predictive value to insurance and credit risk models that rely primarily on conventional data. Because the value added to the base model depends on primary variables such as the strength of the customer relationship and the original predictive performance of the conventional data-based model, the overall performance gain per dataset is anywhere in the range of 5% to 20%. For instance, social network data is more valuable when applied to new-customer onboarding (~10%-20% model performance improvement) than when there is already a strong relationship between the parties (5%-15%). In this hypothetical scenario, transaction data is another very strong predictor.
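
To make the percentage ranges concrete, here is a small worked calculation. It is only a sketch: it assumes the improvement is relative to the base model's performance metric, and the baseline value of 0.50 is a hypothetical figure, not one taken from the research.

```python
from typing import Tuple


def uplift_range(baseline: float, low_pct: float, high_pct: float) -> Tuple[float, float]:
    """Return the metric range after a relative improvement of low_pct to high_pct percent."""
    return baseline * (1 + low_pct / 100), baseline * (1 + high_pct / 100)


# Hypothetical base model with a performance metric of 0.50
# (for example a Gini coefficient; the choice of metric is an assumption).
onboarding = uplift_range(0.50, 10, 20)      # new-customer onboarding case
existing = uplift_range(0.50, 5, 15)         # established-relationship case

print(f"Onboarding: {onboarding[0]:.3f} to {onboarding[1]:.3f}")
print(f"Existing:   {existing[0]:.3f} to {existing[1]:.3f}")
```

Read this way, the same dataset moves the hypothetical base model further in the onboarding case, where conventional data about the customer is thinnest.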

The chart here shows the results of a project undertaken by FICO, one of the leading scoring solution providers, on a personal loan portfolio. The traditional credit characteristics captured more value than the alternative data characteristics (with alternative data capturing around 60% of the predictive power), and there was a high degree of overlap between them. When the two datasets were combined, overall model performance improved significantly.

How to Productize Alternative Data? What’s the Role of Artificial Intelligence and Machine Learning in That?

Using alternative data effectively has its challenges. A range of analytic skills and machine learning techniques is needed to cope with large, unstructured datasets. Recognizing patterns in the data that can be used effectively in insurance or credit risk scenarios is a challenging job.

Nevertheless, the key element in artificial intelligence and machine learning is the data scientists. Models need to go through stringent quality assurance procedures to ensure that their outputs are accurate and that the patterns they recognize are relevant, robust, and understandable. The question then is how AI-driven models powered by alternative data can be explained to both regulators and consumers.

A very tangible example of where model explainability meets the regulatory and compliance domain is the GDPR (General Data Protection Regulation).

Under the GDPR, automated processing is permitted in certain circumstances, but only when suitable measures are in place to safeguard the individual's rights, giving the individual a way to request human intervention, express a point of view, and challenge the decision.

That's where it becomes difficult for AI-powered scoring. Artificial intelligence does not have its 'black-box' reputation for nothing; in risk decisions, customers need to be given clear-cut reasons why they were adversely affected by a decision. If a decision is based on a model, the model should be explainable to the point where the individual drivers of positive and negative scores are clear. For a simple, traditional scorecard, the explanation is self-evident. Trouble starts when an AI model is opaque, which raises the bar for AI explainability.
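
To illustrate why a traditional scorecard is considered self-explanatory, here is a minimal sketch of a points-based scorecard that reports which characteristics cost an applicant the most points. The characteristics, bin boundaries, and point values are all invented for illustration and do not come from any real scoring product.

```python
from typing import Dict, List, Optional, Tuple

# Each bin is (exclusive upper bound, points); None means "no upper bound".
Bin = Tuple[Optional[float], int]

# Hypothetical scorecard mixing conventional and alternative characteristics.
SCORECARD: Dict[str, List[Bin]] = {
    "months_at_current_address": [(12, 10), (48, 25), (None, 40)],
    "on_time_utility_payments_last_12m": [(6, 5), (11, 20), (None, 45)],
    "card_utilization_pct": [(30, 40), (70, 20), (None, 5)],
}
BASE_POINTS = 500  # hypothetical starting score


def points_for(characteristic: str, value: float) -> int:
    """Look up the points awarded for one characteristic value."""
    for upper, points in SCORECARD[characteristic]:
        if upper is None or value < upper:
            return points
    raise ValueError(f"no bin covers {characteristic}={value}")


def score_with_reasons(applicant: Dict[str, float], top_n: int = 2) -> Tuple[int, List[str]]:
    """Return the total score and the characteristics that lost the most points."""
    awarded = {name: points_for(name, applicant[name]) for name in SCORECARD}
    score = BASE_POINTS + sum(awarded.values())
    # Shortfall = best attainable points for the characteristic minus points awarded.
    shortfall = {
        name: max(p for _, p in SCORECARD[name]) - pts
        for name, pts in awarded.items()
    }
    ranked = sorted(shortfall.items(), key=lambda item: item[1], reverse=True)
    reasons = [name for name, lost in ranked if lost > 0][:top_n]
    return score, reasons


applicant = {
    "months_at_current_address": 60,
    "on_time_utility_payments_last_12m": 4,
    "card_utilization_pct": 50,
}
print(score_with_reasons(applicant))
```

For this hypothetical applicant the score is 565, and the reported reasons are the utility payment history followed by card utilization, because those are the two characteristics where the most points were lost. Every point in the total can be traced to a single table lookup, which is the kind of transparency a more complex AI model has to reproduce by other means.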

What’s the Status Now?

To meet market demand, both from insurers and lenders looking to acquire new customers and from the consumers' point of view, a risk-scoring approach that includes alternative data sources is vital.

While it may well be that VantageScore or FICO come to this market with a few scores enhanced by alternative data, what is very clear is that the appetite for sourcing alternative data, especially web data, is growing day by day. Insurers, lenders, and scoring companies therefore need to invest in robust alternative data collection processes and infrastructure.

Keeping an eye on the explainability of artificial intelligence is also very important, both to remain compliant with current regulatory requirements and to offer consumers a fair amount of transparency.

DEV Community
