<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>DEV Community: Abe Dearmer</title>
    <description>The latest articles on DEV Community by Abe Dearmer (@abedearmer).</description>
    <link>https://dev.to/abedearmer</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F513278%2Fb89889f8-e5c4-4e91-add2-ebec26b9d518.png</url>
      <title>DEV Community: Abe Dearmer</title>
      <link>https://dev.to/abedearmer</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://dev.to/feed/abedearmer"/>
    <language>en</language>
    <item>
      <title>Why You Need a CRM Data Cleanup</title>
      <dc:creator>Abe Dearmer</dc:creator>
      <pubDate>Tue, 09 Feb 2021 17:16:10 +0000</pubDate>
      <link>https://dev.to/xplenty/why-you-need-a-crm-data-cleanup-cfj</link>
      <guid>https://dev.to/xplenty/why-you-need-a-crm-data-cleanup-cfj</guid>
      <description>&lt;p&gt;No one likes clutter, whether it’s in your office, your email inbox, or in your customer relationship management (CRM) software. It’s all too easy for junk to pile up unnoticed over time until it’s too late, making you feel unorganized, unproductive, and off-kilter.&lt;/p&gt;

&lt;p&gt;While we can’t straighten up your house for you like Marie Kondo, we do have some advice to share regarding CRM data cleanup. A messy CRM database doesn’t just make your data feel “cluttered”—it can also have a tangible negative impact on your organization’s effectiveness, agility, and bottom line.&lt;/p&gt;

&lt;p&gt;Unorganized CRM data can create problems such as:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Failure to keep proper track of your interactions with customers and leads.&lt;/li&gt;
&lt;li&gt;Wasting valuable employee time by sifting through customer contact information that’s inaccurate or duplicated.&lt;/li&gt;
&lt;li&gt;Neglecting, passing over, or alienating an important lead who might have become a valuable customer.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The consequences of not performing CRM data cleanup can be drastic for your organization’s productivity and profitability. IBM estimates that poor data quality costs the U.S. economy &lt;a href="https://www.ibmbigdatahub.com/infographic/four-vs-big-data?utm_source=xp&amp;amp;utm_medium=blog&amp;amp;utm_campaign=content"&gt;$3.1 trillion every year&lt;/a&gt;, while IT research and advisory firm Gartner estimates that poor data quality costs the average business &lt;a href="https://www.nodegraph.se/how-much-is-poor-data-quality-costing-you-infographic/?utm_source=xp&amp;amp;utm_medium=blog&amp;amp;utm_campaign=content"&gt;$13.5 million&lt;/a&gt; per year. Several other studies have estimated that bad data may be costing companies between 10 and 30 percent of their annual revenues.&lt;/p&gt;

&lt;p&gt;Fortunately, your business doesn’t have to suffer the same fate. Below, we’ll discuss some of the most common reasons for data quality issues and offer our suggestions for performing CRM data cleanup so that you can avoid the most common mistakes and keep your database in top shape.&lt;/p&gt;

&lt;h2&gt;3 Reasons Why You Need CRM Data Cleanup&lt;/h2&gt;

&lt;p&gt;&lt;a href="https://www.linkedin.com/pulse/high-cost-bad-crm-data-adam-thorp/?utm_source=xp&amp;amp;utm_medium=blog&amp;amp;utm_campaign=content"&gt;According to Salesforce&lt;/a&gt;, as much as 70 percent of data in a CRM system “goes bad” (i.e. becomes obsolete) every year. In this section, we’ll discuss the three biggest reasons why nearly every organization needs a CRM data cleanup.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Duplicate Data&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Duplicate data is an all-too-common issue with CRM databases, and it can happen for a variety of reasons during the process of data collection and/or data entry:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;The same lead might sign up using different emails or street addresses, creating a separate record in the CRM for each one.&lt;/li&gt;
&lt;li&gt;The same data might be entered with different formatting, abbreviations, etc. For example, 555-555-5555, 5555555555, and (555) 555-5555 all represent the same phone number, but your CRM system might not recognize them as duplicates without data cleansing (see the sketch after this list).&lt;/li&gt;
&lt;/ul&gt;
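
&lt;p&gt;For illustration, here is a minimal sketch of that kind of standardization in plain Python (the exact normalization rules a CRM or cleansing tool applies will differ):&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;import re

def normalize_phone(raw):
    """Reduce '555-555-5555', '(555) 555-5555', and '5555555555' to one form."""
    return re.sub(r"\D", "", raw)  # keep digits only

# All three spellings now collapse to the same key for duplicate detection.
assert normalize_phone("555-555-5555") == normalize_phone("(555) 555-5555") == "5555555555"
&lt;/code&gt;&lt;/pre&gt;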

&lt;p&gt;According to HubSpot, data duplication rates may be as high as &lt;a href="https://blog.hubspot.com/customers/data-duplication-and-hubspot-impact-your-business?utm_source=xp&amp;amp;utm_medium=blog&amp;amp;utm_campaign=content"&gt;10 to 30 percent&lt;/a&gt; for companies who are in sore need of a CRM data cleanup.&lt;/p&gt;

&lt;p&gt;On a small scale, the problem of duplicate data is manageable and relatively easy to fix when doing CRM data cleanup—just identify it and delete it. In the long run, though, duplicate data can be a disaster: every additional duplicate record makes it more and more time-consuming to sift through your database, which cuts into the time that you have to follow up with leads. It’s no wonder that sales representatives &lt;a href="https://www.forbes.com/sites/kenkrogue/2018/01/10/why-sales-reps-spend-less-than-36-of-time-selling-and-less-than-18-in-crm/?utm_source=xp&amp;amp;utm_medium=blog&amp;amp;utm_campaign=content"&gt;only spend 36 percent of their time actually selling&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;To avoid this, you need to address the problem of duplicate data at the source by performing CRM data cleanup. This means going through your entire dataset, discovering the source of the duplicate records, and standardizing their format. Limiting the type and amount of input options (e.g. by only allowing selections from a drop-down menu, instead of text boxes that allow free entry) is another way to keep the problem from recurring.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Missing Data&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Missing and incomplete data is another painful and all-too-common issue with your CRM database. For example, a lead might fill out contact data such as their email address, phone number, and company while forgetting to put down their own name—and now that you’re going through the database, you realize you have no idea what to call them.&lt;/p&gt;

&lt;p&gt;Again, if this is a one-time incident, it’s not too difficult to do CRM data cleanup by researching the missing information, filling in the blanks, and working on converting the lead to a paying customer. But when this pattern repeats on a larger scale, it becomes a much more time-consuming and challenging problem to solve—not to mention the opportunity costs you face from not following up with leads sooner.&lt;/p&gt;

&lt;p&gt;The complications don’t end there—without CRM data cleanup, your sales team won’t have access to a complete picture of your customers and leads. As a result, any automated marketing tools you have (e.g. sending more personalized emails to your contacts, or separating contacts by their lifecycle stage) are liable to fail when trying to perform their functions. &lt;/p&gt;

&lt;p&gt;You’ll then be unable to contact or segment your leads correctly, if at all—so you’ll either have to do it manually and painfully or miss out on them entirely.&lt;/p&gt;

&lt;p&gt;Finally, incomplete information doesn’t just have consequences at the level of individual leads. It can also create gaps in your large-scale understanding—things like which markets to focus on, or which leads are most likely to convert into opportunities and customers. &lt;/p&gt;

&lt;p&gt;Without a dedicated CRM data cleanup program, you’ll risk falling behind in your ability to compete in a crowded marketplace.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Out-of-Date Information&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;The older your data is, the more likely it is to be inaccurate—which, for all practical purposes, makes it useless. This is especially true for your CRM data. Every day, companies are formed or go out of business, employees leave their jobs or get new job titles, and people change their addresses and phone numbers. &lt;/p&gt;

&lt;p&gt;According to the U.S. Bureau of Labor Statistics, for example, &lt;a href="https://www.bls.gov/news.release/tenure.nr0.htm?utm_source=xp&amp;amp;utm_medium=blog&amp;amp;utm_campaign=content"&gt;the average employee has been with their employer for 4.1 years&lt;/a&gt;, and annual turnover rates at some businesses could be as high as 15 to 25 percent. Even a single inaccurate data point could make it impossible for you to get in touch with, or to follow up with, a contact.&lt;/p&gt;

&lt;p&gt;For example, your CRM might show that you have 10,000 different contact records for leads in the Boston region—but what it doesn’t show is that 1,000 of these leads have actually moved to new cities since you collected their data. Without doing CRM data cleanup, you might incorrectly assume that your business is healthier in a particular region than it actually is. In turn, you might make bad judgment calls about your advertising and marketing campaigns to reach these (now-relocated) leads: e.g. putting up billboards in a part of town where they no longer live or buying Facebook ads in their time zone.&lt;/p&gt;

&lt;p&gt;Like the issues of duplicate data and missing information, incorrect data due to an out-of-date CRM database can cost you dearly. While correcting one record might not take much longer than a phone call or web search, correcting hundreds or thousands of records will be time-consuming to the extreme (not to mention mentally exhausting for the poor team members assigned to the task).&lt;/p&gt;

&lt;h2&gt;The Benefits of Clean Data in Your CRM Database&lt;/h2&gt;

&lt;p&gt;Thus far, we’ve talked about the potential detriments that having a messy CRM database can bring to your organization. But what about the flip side—the benefits of CRM data cleanup, more accurate data, and better data management for your sales and marketing teams?&lt;/p&gt;

&lt;p&gt;The advantages of cleaning up your CRM data include:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Better-quality leads: When you do CRM data cleanup, the customer data and leads in your database will be unique, complete, and up-to-the-minute. You don’t have to worry about wasting your salespeople’s time when their emails bounce or a contact’s phone number is no longer in service.&lt;/li&gt;
&lt;li&gt;Higher productivity: Keeping your database clean and tidy with automated CRM data cleanup means that employees don’t have to waste days or weeks of taxing manual effort cleaning it up themselves. Sales reps and marketers will be happier, more productive, and less likely to suffer from overwork and burnout.&lt;/li&gt;
&lt;li&gt;Increased profits: Instead of correcting bad information and chasing down old leads gone missing, sales reps can spend more time doing what they’re supposed to be doing—selling. And assuming your sales reps are friendly and competent, more time selling is very likely to result in a healthier bottom line.&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;How to Do CRM Data Cleansing with Xplenty&lt;/h2&gt;

&lt;p&gt;If your house is a total wreck, the solution is easy—just call up a maid service for help. But what should you do if you need to perform CRM data cleanup and don’t know where to get started?&lt;/p&gt;

&lt;p&gt;For many businesses, the answer is right in front of them. The correct and accurate information you need to perform &lt;a href="https://www.xplenty.com/blog/what-does-data-cleansing-entail-and-why-does-it-matter/"&gt;data cleansing&lt;/a&gt; and standardization for your CRM database is often already available, but scattered across a dizzyingly large array of data sources, from websites and emails to phone calls and contact forms. Yet without a way to collect, clean, and ingest this data through marketing automation, businesses are no better off with this knowledge than they were before.&lt;/p&gt;

&lt;p&gt;That’s why an effective &lt;a href="https://www.xplenty.com/blog/what-is-etl/"&gt;ETL (extract, transform, load)&lt;/a&gt; workflow is essential for modern CRM data cleanup. ETL is a data integration process that automatically migrates information from diverse data sources into a centralized repository (e.g. a CRM database). To build a rapid, efficient ETL data pipeline with automation, more and more businesses are taking advantage of dedicated ETL solutions like Xplenty.&lt;/p&gt;
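
&lt;p&gt;To make the idea concrete, here is a minimal, hand-rolled sketch of an extract-clean-load pass over contact records in Python with pandas (the file names and columns are hypothetical, and this illustrates the concept rather than Xplenty’s own tooling):&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;import pandas as pd

# Extract: pull contact records from two hypothetical source exports.
web_leads = pd.read_csv("web_leads.csv")      # columns: name, email, phone
event_leads = pd.read_csv("event_leads.csv")
contacts = pd.concat([web_leads, event_leads], ignore_index=True)

# Transform: standardize formats so duplicates actually match.
contacts["email"] = contacts["email"].str.strip().str.lower()
contacts["phone"] = contacts["phone"].str.replace(r"\D", "", regex=True)
contacts = contacts.drop_duplicates(subset=["email", "phone"])

# Load: write the cleaned, de-duplicated records to the destination.
contacts.to_csv("crm_import.csv", index=False)
&lt;/code&gt;&lt;/pre&gt;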

&lt;p&gt;At its heart, Xplenty is a data delivery company. We remove all the roadblocks and barriers to the data flows across your organization so that you can streamline your information workflows and maintain a clean and tidy CRM database:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;With more than 100 integrations with databases, SaaS applications, analytics tools, and more, Xplenty helps you quickly and efficiently connect all of your relevant data sources—there’s no need to use your company’s internal resources, or write the code yourself.&lt;/li&gt;
&lt;li&gt;Xplenty lets you choose exactly the data you want from each source, when you want it integrated, and how you want it delivered. This freedom and precision mean that your data can be properly connected, synced, and up-to-date at all times, avoiding all the data woes discussed above.&lt;/li&gt;
&lt;li&gt;The Xplenty low-code platform is user-friendly with a gentle learning curve and a simple, drag-and-drop graphical user interface. Xplenty makes it easy for even non-technical business users to build robust data pipelines from their data sources to a cloud data warehouse or CRM database.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Want to find out how ETL solutions like Xplenty can help automate your CRM data cleanup process? &lt;a href="https://www.xplenty.com/calendly/"&gt;Get in touch with Xplenty today&lt;/a&gt; for a chat about your business needs and objectives, or to start your free trial of the Xplenty platform.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;This post originally appeared on the Xplenty blog &lt;a href="https://www.xplenty.com/blog/why-clean-your-crm-data/?utm_source=content&amp;amp;utm_medium=referral&amp;amp;utm_campaign=why-clean-your-crm-data%2F"&gt;here&lt;/a&gt;. Check it out on our website and subscribe to our newsletter if you want to hear more updates.&lt;/strong&gt; &lt;/p&gt;

</description>
      <category>database</category>
      <category>datascience</category>
      <category>bigdata</category>
    </item>
    <item>
      <title>What is Chaos Engineering: Theory, Principles &amp; Benefits</title>
      <dc:creator>Abe Dearmer</dc:creator>
      <pubDate>Thu, 28 Jan 2021 21:04:25 +0000</pubDate>
      <link>https://dev.to/xplenty/what-is-chaos-engineering-theory-principles-benefits-1a7e</link>
      <guid>https://dev.to/xplenty/what-is-chaos-engineering-theory-principles-benefits-1a7e</guid>
      <description>&lt;p&gt;In the past, software systems ran in highly controlled environments on-premise and managed by an army of sysadmins. Today, migration to the cloud is relentless; the stage has completely shifted. &lt;/p&gt;

&lt;p&gt;Systems are no longer monolithic and localized; they depend on many globally distributed, loosely coupled systems working in unison, often in the form of ephemeral microservices.&lt;/p&gt;

&lt;p&gt;It is no surprise that Site Reliability Engineers have risen to prominence in the last decade. Modern IT infrastructure requires robust systems thinking and reliability engineering to keep the show on the road. Downtime is not an option. &lt;/p&gt;

&lt;p&gt;A 2020 &lt;a href="https://www.ibm.com/downloads/cas/DV0XZV6R?utm_source=xp&amp;amp;utm_medium=blog&amp;amp;utm_campaign=content"&gt;ITIC Cost of Downtime survey&lt;/a&gt; found that 98% of organizations say a single hour of downtime costs more than $150,000, 88% report that 60 minutes of downtime costs their business more than $300,000, and 40% of enterprises report that one hour of downtime costs their organizations $1 million to more than $5 million.&lt;/p&gt;

&lt;p&gt;To increase the resiliency of these systems, the discipline of chaos engineering emerged. Stress testing a system with chaotic experiments that randomly engineer failures reveals its Achilles' heel. Simulating adverse conditions allows engineers to integrate safeguards, circuit breakers, and incident response mechanisms. This post will dive into this chaotic art form.&lt;/p&gt;

&lt;h2&gt;What is Chaos Engineering?&lt;/h2&gt;

&lt;p&gt;As elucidated by the &lt;a href="https://principlesofchaos.org/?utm_source=xp&amp;amp;utm_medium=blog&amp;amp;utm_campaign=content"&gt;Chaos Community&lt;/a&gt;: &lt;/p&gt;

&lt;p&gt;"Chaos Engineering is the discipline of experimenting on a system to build confidence in the system's capability to withstand turbulent conditions in production."&lt;/p&gt;

&lt;p&gt;Most engineers' first exposure to this discipline likely came from &lt;a href="https://en.wikipedia.org/wiki/Chaos_engineering#Chaos_Monkey"&gt;Chaos Monkey&lt;/a&gt;. A tool invented by Netflix, it randomly terminates instances in production to ensure that engineers implement their services to be highly available and resilient to pseudo-random termination of instances and services within the Netflix architecture. At the time, Netflix had recently migrated to Amazon Web Services and needed a framework to prove its infrastructure could survive an outage and automatically self-heal.&lt;/p&gt;

&lt;p&gt;Netflix added more techniques to this framework, such as "Failure Injection Testing" (FIT), which causes requests between Netflix services to fail and verifies that the system degrades gracefully. &lt;a href="https://netflixtechblog.com/the-netflix-simian-army-16e57fbab116?utm_source=xp&amp;amp;utm_medium=blog&amp;amp;utm_campaign=content"&gt;The Simian Army&lt;/a&gt; also grew to include tools such as &lt;a href="https://netflixtechblog.com/chaos-engineering-upgraded-878d341f15fa?utm_source=xp&amp;amp;utm_medium=blog&amp;amp;utm_campaign=content"&gt;Chaos Kong&lt;/a&gt;, which simulates the failure of an entire Amazon EC2 (Elastic Compute Cloud) region.&lt;/p&gt;

&lt;p&gt;This amalgamation of tools evolved into the discipline that we now know as chaos engineering. &lt;/p&gt;

&lt;h2&gt;Principles of Chaos Engineering&lt;/h2&gt;

&lt;p&gt;Designing any experiment requires four things: a hypothesis, independent variables, dependent variables, and of course, context. The following principles provide a guidepost for designing chaos engineering experiments: &lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Construct a hypothesis around steady-state behavior. &lt;/li&gt;
&lt;li&gt;Trigger real-world behavior, utilizing both a control and an experimental group. &lt;/li&gt;
&lt;li&gt;Run experiments in production by injecting failures into the experimental group.&lt;/li&gt;
&lt;li&gt;Automate experiments to run continuously, attempting to disprove the hypothesis that your system is resilient.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;A viable hypothesis may be in the following form: &lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;"the tolerance range will not exceed 8% above the steady- state while injecting X, Y, Z into the system."&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;Robust experiments should trigger the loss of availability of several components within the system. Experiments need to mimic real-world events, avoiding the happy path. Tests should utilize all possible inputs while also recreating scenarios from historical system outages. &lt;/p&gt;

&lt;p&gt;Related Reading: &lt;a href="https://www.xplenty.com/blog/what-is-chaos-testing/"&gt;What is Chaos Testing?&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Types of testing include:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Hardware failure (or virtual equivalent)&lt;/li&gt;
&lt;li&gt;Changes to network latency/failure (inject latency into requests between services; see the sketch after this list)&lt;/li&gt;
&lt;li&gt;Resource starvation/overload&lt;/li&gt;
&lt;li&gt;Dependency failures (e.g., database)&lt;/li&gt;
&lt;li&gt;Retry storms (e.g., thundering herd)&lt;/li&gt;
&lt;li&gt;Functional bugs (exceptions)&lt;/li&gt;
&lt;li&gt;Race conditions (threading and concurrency)&lt;/li&gt;
&lt;li&gt;Render an entire Amazon region unavailable&lt;/li&gt;
&lt;li&gt;Fail requests between services or fail an internal service&lt;/li&gt;
&lt;/ul&gt;
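
&lt;p&gt;As a rough illustration of the latency and exception faults above, here is a minimal sketch in plain Python (the wrapped function, failure rates, and delay are hypothetical; production tools typically inject faults at the network or infrastructure layer instead):&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;import random
import time

def chaos_wrap(func, latency_rate=0.10, error_rate=0.05, delay_seconds=2.0):
    """Wrap a call so a small fraction of requests is delayed or fails outright."""
    def wrapper(*args, **kwargs):
        fate = random.choices(
            ["ok", "latency", "error"],
            weights=[1.0 - latency_rate - error_rate, latency_rate, error_rate],
        )[0]
        if fate == "latency":
            time.sleep(delay_seconds)             # simulate a slow dependency
        if fate == "error":
            raise RuntimeError("injected fault")  # simulate a failing dependency
        return func(*args, **kwargs)
    return wrapper

# Apply only to the experimental group's traffic, e.g.:
# recommendations_client.get = chaos_wrap(recommendations_client.get)
&lt;/code&gt;&lt;/pre&gt;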

&lt;p&gt;The steady state defines your environment's status before, after, and potentially during a chaos experiment. Any deviation from it is a candidate for further investigation and a potential place to apply improvements. &lt;/p&gt;

&lt;p&gt;If your system does not return to its expected steady state after running an experiment, a red-flag alert needs to be issued. A robust system will self-heal and recalibrate back to equilibrium. You can quantify deviation from equilibrium by defining a tolerance range. &lt;/p&gt;
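
&lt;p&gt;In code, that tolerance check can be as simple as the sketch below (plain Python; the 8% band mirrors the example hypothesis earlier, and the metric values are invented):&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;def deviation_from_steady_state(baseline, observed):
    """Fractional deviation of an observed metric from its steady-state baseline."""
    return abs(observed - baseline) / baseline

TOLERANCE = 0.08  # the 8% band from the example hypothesis

# Example: p99 latency was 220 ms at steady state and 250 ms during the experiment.
deviation = deviation_from_steady_state(220.0, 250.0)
breached = deviation &amp;gt; TOLERANCE  # True here: roughly a 13.6% deviation
&lt;/code&gt;&lt;/pre&gt;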

&lt;p&gt;An experiment is run manually when first conceived, but should then be added to an automation framework and run continuously. Netflix runs Chaos Monkey continuously during weekdays, but only runs Chaos Kong exercises once a month. Every organization requires its own nuanced approach.&lt;/p&gt;

&lt;h2&gt;Chaos Engineering Tools&lt;/h2&gt;

&lt;p&gt;It is essential to minimize the blast radius while designing chaos experiments, ideally one small failure at a time. Measure experiments carefully, ensuring they are low-risk: involve few users, limit user flows, limit the number of live devices, etc. As one begins, it is wise to inject failures that verify functionality for a subset or small group of clients and devices. &lt;/p&gt;

&lt;p&gt;As these low-risk experiments succeed, you can then proceed to run small-scale diffuse experiments that will impact a small percentage of traffic, which is distributed evenly throughout production servers.&lt;/p&gt;

&lt;p&gt;A small-scale diffuse experiment's main advantage is that it does not cross thresholds that could open circuits. This allows one to verify single-request fallbacks and timeouts while demonstrating the system's resilience to transient errors. It verifies the logical correctness of fallbacks, but not the characteristics of the system during large-scale fallout.&lt;/p&gt;

&lt;p&gt;The following is a list of tools to get you started:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Chaos Monkey&lt;/strong&gt;: The OG of chaos engineering. The tool is still maintained and currently integrated into Spinnaker, a continuous delivery platform initially developed by Netflix to release software changes rapidly and reliably.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Mangle&lt;/strong&gt;: Enables one to run chaos engineering experiments against applications and infrastructure components and quickly assess resiliency and fault tolerance. Designed to introduce faults with minimal pre-configuration and supports a wide range of tooling, including K8S, Docker, vCenter, or any remote machine with SSH enabled. &lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Gremlin&lt;/strong&gt;: Founded by former Netflix and Amazon engineers who productized Chaos as a Service (CaaS). Gremlin is a paid service that gives you a command-line interface, an agent, and an intuitive web interface for setting up chaos experiments in no time. Don't worry: there's a big red HALT button that makes it simple for Gremlin users to roll back experiments when an attack negatively impacts the customer experience.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Chaos Toolkit&lt;/strong&gt;: An open-source project that tries to make chaos experiments easier by creating an open API and standard JSON format to expose experiments. There are many drivers for running AWS, Azure, Kubernetes, PCF, and Google Cloud experiments. It also includes integrations for monitoring systems and chat, such as Prometheus and Slack. &lt;/p&gt;

&lt;h2&gt;Why Invest in Chaos Engineering?&lt;/h2&gt;

&lt;p&gt;There are numerous reasons to invest in chaos engineering. &lt;/p&gt;

&lt;p&gt;To start, it forces organizations to implement business continuity planning (BCP) and disaster recovery frameworks. Implementing these frameworks gives organizations a strategic advantage over their competitors because it demonstrates awareness of their operational vulnerabilities and a proactive approach to addressing them. This builds trust with stakeholders and customers.&lt;/p&gt;

&lt;p&gt;Additionally, organizations operating in critical infrastructure industries within the EU will have to abide by the EU's &lt;a href="https://eur-lex.europa.eu/TodayOJ/?utm_source=xp&amp;amp;utm_medium=blog&amp;amp;utm_campaign=content"&gt;Directive&lt;/a&gt; requirements on network and information systems' security, meaning they will be legally obliged to implement incident response capabilities.&lt;/p&gt;

&lt;p&gt;As chaos engineering is an experimentation approach, it gives us a holistic view of the system's behavior and how all the moving parts interact in a given set of circumstances, allowing us to derive insights into the system's technical and soft aspects (aka, the human factor). &lt;/p&gt;

&lt;p&gt;Chaos engineering will enable organizations to find security vulnerabilities that are otherwise challenging to detect by traditional methods due to distributed systems' complex nature. &lt;/p&gt;

&lt;p&gt;This may include losses caused by human factors, poor design, or lack of resiliency. For example, conventional approaches may consist of red and purple team exercises that focus on an adversarial process, allowing organizations to test how security systems and teams respond to active threats. &lt;/p&gt;

&lt;p&gt;&lt;strong&gt;This post originally appeared on the Xplenty blog &lt;a href="https://www.xplenty.com/blog/what-is-chaos-engineering/?utm_source=content&amp;amp;utm_medium=referral&amp;amp;utm_campaign=chaosengineering%2F"&gt;here&lt;/a&gt;. Check it out on our website and subscribe to our newsletter if you want to hear more updates.&lt;/strong&gt;&lt;/p&gt;

</description>
      <category>bigdata</category>
      <category>datasecurity</category>
      <category>datascience</category>
      <category>database</category>
    </item>
    <item>
      <title>ETLT Data Integration: The Best of Both Worlds</title>
      <dc:creator>Abe Dearmer</dc:creator>
      <pubDate>Wed, 13 Jan 2021 23:29:45 +0000</pubDate>
      <link>https://dev.to/abedearmer/etlt-data-integration-the-best-of-both-worlds-3pde</link>
      <guid>https://dev.to/abedearmer/etlt-data-integration-the-best-of-both-worlds-3pde</guid>
      <description>&lt;p&gt;In the decades since the dawn of data integration, a debate has raged: ETL or ELT? &lt;/p&gt;

&lt;p&gt;ETL is valuable when it comes to data quality, data security, and data compliance. It can also save money on data warehousing costs. However, ETL is slow when ingesting unstructured data, and it can lack flexibility. &lt;/p&gt;

&lt;p&gt;ELT tools (such as Fivetran or Stitch Data) are fast when ingesting large amounts of raw, unstructured data. They also bring flexibility to your data integration and data analytics strategies. However, ELT sacrifices data quality, security, and compliance in many cases. &lt;/p&gt;

&lt;p&gt;Now, a new solution has emerged. ETLT is a hybrid approach that offers the best of both worlds. &lt;/p&gt;

&lt;p&gt;This article documents the “why, what, and how” of ETLT so you can determine if it’s right for your use case.&lt;/p&gt;

&lt;h1&gt;What Is ETL?&lt;/h1&gt;

&lt;p&gt;Historically, data warehouses were optimized to query and read large datasets fast for accurate business intelligence. This made data warehouses good at processing read operations (SELECT, WHERE, etc.). However, the cost of building and setting up a data warehouse – in terms of buying hardware, licensing software, and developing and maintaining the system – was a multi-million-dollar undertaking. To save on costs, developers would only load cleaned, transformed, and aggregated data into their warehouses – and for greater efficiency, they would remove any data that wasn’t necessary for the analysis.   &lt;/p&gt;

&lt;p&gt;To prepare data like this, organizations needed to extract data from different databases, transform it into a unified format, and remove unnecessary information before loading it into the warehouse. This gave rise to ETL (extract, transform, load) tools, which prepare and process data in the following order:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Extract&lt;/strong&gt; raw, unprepared data from source applications and databases into a staging area.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Transform&lt;/strong&gt; and aggregate the data with SORT, JOIN, and other operations while it is in the staging area. &lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Load&lt;/strong&gt; data into the warehouse. &lt;/p&gt;

&lt;p&gt;According to this workflow, by the time the data loads into the warehouse, ETL has structured it into a relational format that the data warehouse can read efficiently – so business intelligence tools can work with the data to produce valuable reports.&lt;/p&gt;
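
&lt;p&gt;A minimal sketch of that order in Python (sqlite3 stands in for the data warehouse, and the file, table, and column names are hypothetical):&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;import sqlite3
import pandas as pd

# Extract: pull raw data from the source systems into a staging area (memory here).
orders = pd.read_csv("orders_export.csv")        # columns: order_id, customer_id, total
customers = pd.read_csv("customers_export.csv")  # columns: customer_id, region

# Transform: join and aggregate in the staging area, before anything reaches the warehouse.
staged = orders.merge(customers, on="customer_id")
staged = staged.groupby("region", as_index=False)["total"].sum()

# Load: only the prepared, aggregated rows land in the warehouse.
warehouse = sqlite3.connect("warehouse.db")
staged.to_sql("revenue_by_region", warehouse, if_exists="replace", index=False)
&lt;/code&gt;&lt;/pre&gt;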

&lt;h1&gt;What Is ELT?&lt;/h1&gt;

&lt;p&gt;Most organizations continue to rely on ETL for data integration, but the need for preload transformations has changed with the rise of data lakes and high-performance, cloud-based data warehouses (like Redshift, Azure, BigQuery, and Snowflake).&lt;/p&gt;

&lt;p&gt;Modern cloud data warehouses have the processing capability to efficiently manage write operations on large data sets. In fact, cloud data warehouses are so fast at processing data that they have rendered ETL unnecessary for many use-cases. This has ultimately given rise to a new data integration strategy, ELT, which skips the ETL staging area for speedier data ingestion and greater agility. ELT sends raw, unprepared data directly to the warehouse and relies on the data warehouse to carry out the transformations post-loading.&lt;/p&gt;

&lt;h2&gt;ELT tools prepare and process data in the following order:&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;Extract&lt;/strong&gt; raw, unprepared data from source applications and databases. &lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Load&lt;/strong&gt; the unprepared data into the warehouse. &lt;/p&gt;

&lt;p&gt;Use the data warehouse to process &lt;strong&gt;transformations&lt;/strong&gt; when required. &lt;/p&gt;

&lt;p&gt;The main point to remember with ELT is that data transformations happen within the data warehouse itself, which typically bypasses the need for a staging server. In this respect, the data warehouse contains both raw and transformed data inside it. &lt;/p&gt;
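
&lt;p&gt;A matching sketch of the ELT order, using the same toy data (sqlite3 again stands in for a cloud warehouse; in practice the post-load step would be warehouse SQL or a transformation tool):&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;import sqlite3
import pandas as pd

warehouse = sqlite3.connect("warehouse.db")

# Extract and Load: raw rows go straight into the warehouse with no staging transforms.
pd.read_csv("orders_export.csv").to_sql("raw_orders", warehouse, if_exists="replace", index=False)
pd.read_csv("customers_export.csv").to_sql("raw_customers", warehouse, if_exists="replace", index=False)

# Transform: run inside the warehouse, only when an analysis needs it.
warehouse.executescript("""
    DROP TABLE IF EXISTS revenue_by_region;
    CREATE TABLE revenue_by_region AS
    SELECT c.region, SUM(o.total) AS total
    FROM raw_orders o JOIN raw_customers c ON o.customer_id = c.customer_id
    GROUP BY c.region;
""")
&lt;/code&gt;&lt;/pre&gt;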

&lt;p&gt;ELT offers an excellent way to collect and store large amounts of raw, unstructured data. At the same time, ELT doesn’t give you the option of removing PHI, PII, and other sensitive data before loading it into the data warehouse. Therefore, it isn’t perfect from the perspectives of data security, compliance, and data quality. Simply put, sacrificing security and compliance in the name of speed and flexibility just isn’t an option for many businesses, which is why most businesses continue to rely on ETL or they select the hybrid approach of ETLT (more on this later).&lt;/p&gt;

&lt;h1&gt;ETL vs. ELT: Advantages and Use-Cases&lt;/h1&gt;

&lt;p&gt;Both ETL and ELT offer specific advantages that make them suitable for different use-cases, and sometimes – as you’ll see in the following section on ETLT – you really need a mix of both.&lt;/p&gt;

&lt;h2&gt;ETL Advantages and Use-Cases&lt;/h2&gt;

&lt;h3&gt;The advantages of ETL apply to the following scenarios:&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;Data compliance and security:&lt;/strong&gt; Whether your organization adheres to industry-specific data compliance standards – like SOC 2, GDPR, CCPA, and HIPAA – or a data compliance standard of your own, it may be necessary to remove, mask, or encrypt PHI (protected health information), PII (personally identifiable information), and other data before moving it to the data warehouse. An ETL strategy that transforms data before loading can achieve this, but an ELT strategy can’t. For instance, with ELT, SysAdmins may have access to sensitive information in logs even if the data warehouse transforms it after loading. This makes ETL more suitable when data compliance is a concern.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Managing large datasets:&lt;/strong&gt; Even though a modern cloud data warehouse can handle virtually any size dataset, you might not want to pay extra data warehousing fees for this service. Instead, you can use an ETL solution to remove unnecessary or redundant data in-pipeline to reduce your data storage expenses. In contrast, an ELT solution could load a lot of unnecessary data into the data warehouse.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Data warehouse processing fees:&lt;/strong&gt; Many ELT tools offer lower upfront prices, but this is only because ELT shifts data processing costs over to the data warehouse. In the end, ELT doesn’t bypass the cost of processing data transformations. For this reason, many organizations choose ETL as a more cost-efficient way to perform data transformations. For example, Xplenty uses an ETL pricing model that charges a flat fee per connector, and it doesn’t charge based on processing. &lt;br&gt;
Therefore, organizations can save money by allowing Xplenty to perform transformations in-pipeline before the loading phase. &lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Data quality:&lt;/strong&gt; ETL ensures data quality by managing data transformations in batches and by standardizing data formats to prevent unintended data corruption. &lt;/p&gt;

&lt;h2&gt;ELT Advantages and Use-Cases&lt;/h2&gt;

&lt;h3&gt;The advantages of ELT apply to the following use-cases:&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;Rapid data ingestion:&lt;/strong&gt; ELT allows you to quickly add new data sources and ingest any kind of raw data immediately without the data passing through a staging server. These advantages make ELT ideal for use-cases that require real-time streaming and rapid data ingestion. &lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Ingest and save data that you might need later:&lt;/strong&gt; Traditional ETL involves the aggregation of data in a certain way, which requires you to throw out data. But ELT lets you save all data in the data warehouse – even data that you don’t have a use for now, but could be useful later. &lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Transform only the data you need for a specific analysis:&lt;/strong&gt; ELT allows you to load raw data into the warehouse for storage purposes, and transform only specific data in the way that best supports a specific type of analysis. This slows down the process of reading and analyzing the information because each analysis transforms the data from scratch. However, for certain use-cases, it offers the flexibility to easily change the business logic of your data on the fly as your requirements change, or as you learn more about the data.  &lt;/p&gt;

&lt;p&gt;&lt;strong&gt;More individuals have the skills to code in-warehouse transformations:&lt;/strong&gt; In-warehouse transformations are usually coded in basic SQL. The prevalence of SQL knowledge makes the process of coding ELT transformations more accessible for a larger number of developers, i.e., it’s easier and less expensive to find developers who can manage this task. &lt;/p&gt;

&lt;h1&gt;How ETLT Merges the Best of ETL and ELT&lt;/h1&gt;

&lt;p&gt;In the age of big-data analytics, ELT offers tremendous advantages because rapid ingestion gives BI solutions access to more data faster – including raw and unstructured information. ELT also brings the flexibility to change analytics strategies on the fly. However, the limitation of an ELT-only strategy is that it cannot always meet data security and compliance requirements – especially those that require you to mask, remove, or encrypt PHI and PII data before moving it into a data warehouse. &lt;/p&gt;

&lt;p&gt;Whether you need to adhere to national or industry data compliance standards – or a security standard of your own – your organization simply might not be able to sacrifice any degree of security and compliance in the name of faster analytics. Thus, we come to the need for ETLT (extract, transform, load, transform). &lt;/p&gt;

&lt;p&gt;ETLT is a “best of both worlds” approach to data integration that (1) speeds up data ingestion while (2) ensuring data quality and securing sensitive data in accordance with industry compliance standards. ETLT uses the following data integration pattern:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Extract&lt;/strong&gt; the raw, unprepared data from source applications and databases and load it into a staging area. &lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Transform&lt;/strong&gt; data “lightly” while it’s in the staging area (usually to remove/mask/encrypt PHI, PII, or other sensitive data). The first transformation stage only applies to one data source at a time. These transformations are fast and simple because they transform each source independently of other sources. There is no attempt to integrate two data sources into one until after loading. Transformations for this first stage relate to data formats, data cleansing, and masking/removing sensitive data for compliance purposes. &lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Load&lt;/strong&gt; the prepared data into the data warehouse. &lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Transform&lt;/strong&gt; and integrate data more completely within the data warehouse using the data warehouse to process those transactions. The second transformation stage relates to integrating multiple data sources and other transformations that apply to data from multiple sources at the same time. &lt;/p&gt;

&lt;p&gt;ETLT allows you to ingest data from diverse sources faster because the pre-load transformation stage only performs light-duty transformations. These light-duty transformations are quick to set up and fast to process, and they overcome the risks of ELT by satisfying essential data compliance requirements and preserving data quality. Meanwhile, you can leave the complex, multi-source transformations for the data warehouse to process later.&lt;/p&gt;
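
&lt;p&gt;Continuing the same toy example, an ETLT pass might look like the sketch below: the light preload transformation only drops a hypothetical sensitive column, while the heavier multi-source join still runs in the warehouse (sqlite3 again stands in for the warehouse):&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;import sqlite3
import pandas as pd

warehouse = sqlite3.connect("warehouse.db")

# Extract, then Transform lightly: remove the sensitive column before anything is loaded.
orders = pd.read_csv("orders_export.csv")  # hypothetical columns: order_id, customer_id, email, total
orders = orders.drop(columns=["email"])    # compliance-driven, single-source transformation

# Load the lightly prepared rows, plus the other source, into the warehouse.
orders.to_sql("raw_orders", warehouse, if_exists="replace", index=False)
pd.read_csv("customers_export.csv").to_sql("raw_customers", warehouse, if_exists="replace", index=False)

# Transform again, in-warehouse, for the multi-source integration.
warehouse.executescript("""
    DROP TABLE IF EXISTS revenue_by_region;
    CREATE TABLE revenue_by_region AS
    SELECT c.region, SUM(o.total) AS total
    FROM raw_orders o JOIN raw_customers c ON o.customer_id = c.customer_id
    GROUP BY c.region;
""")
&lt;/code&gt;&lt;/pre&gt;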

&lt;p&gt;Related Reading: &lt;a href="https://hackernoon.com/the-impact-of-big-data-in-business-past-and-future-ix5x3wly"&gt;https://hackernoon.com/the-impact-of-big-data-in-business-past-and-future-ix5x3wly&lt;/a&gt;&lt;/p&gt;

&lt;h1&gt;ETLT: Advantages and Use-Cases&lt;/h1&gt;

&lt;p&gt;At the end of the day, the ETLT use-cases that matter most are when you need to mask, remove, or encrypt PII and PHI data for compliance purposes before loading. If your industry compliance standards (or regional data security laws) require preload transformations, ETLT allows you to meet these requirements while still reaping ELT benefits like fast data ingestion and business logic flexibility. &lt;/p&gt;

&lt;p&gt;An example ETLT use-case involves the pseudonymization of data through preload data masking. According to the European Union’s General Data Protection Regulation (GDPR), data controllers must implement “pseudonymization” to ensure that data is not tied to a specific person. One of the most important aspects of pseudonymization is the use of data masking to redact or encrypt any PII or PHI data before loading it into a destination data warehouse. With ETLT, you can limit preload transformations to simple data-masking operations that encrypt or remove PII/PHI data before it arrives at the data warehouse. Later, you can carry out more complex transformations within the data warehouse as required. &lt;/p&gt;
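
&lt;p&gt;A minimal sketch of that preload masking step, using a keyed hash from Python’s standard library (the field names and key handling are simplified for illustration and are not a complete GDPR control):&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;import hashlib
import hmac

PSEUDONYM_KEY = b"keep-this-secret-outside-the-warehouse"

def pseudonymize(value):
    """Keyed hash: the same email always maps to the same token, but the token
    cannot be reversed without the key, which never reaches the warehouse."""
    return hmac.new(PSEUDONYM_KEY, value.encode("utf-8"), hashlib.sha256).hexdigest()

record = {"email": "jane.doe@example.com", "plan": "pro", "mrr": 99}
record["email"] = pseudonymize(record["email"])  # mask PII before the load step
&lt;/code&gt;&lt;/pre&gt;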

&lt;p&gt;Using ETLT to perform simple, preload transformations like this for compliance purposes – then saving more complex transformations for later – offers the following advantages: &lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Satisfies data security and compliance requirements&lt;/li&gt;
&lt;li&gt;Promotes customer trust&lt;/li&gt;
&lt;li&gt;Limits the chance of privacy breaches&lt;/li&gt;
&lt;li&gt;Speeds up data ingestion&lt;/li&gt;
&lt;li&gt;Offers flexibility in the way you transform data because you can easily change post-load transformations to suit different types of analyses&lt;/li&gt;
&lt;/ul&gt;

&lt;h1&gt;Conclusion&lt;/h1&gt;

&lt;p&gt;It would be foolish to think the growing popularity of hybrid solutions like ETLT will end the ETL vs. ELT debate once and for all. At the same time, if your organization values compliance, PII protection, loading speed, and the flexibility of in-warehouse transformations, an ETLT solution could be the best-of-both-worlds approach you've been looking for.&lt;/p&gt;

&lt;h1&gt;About Xplenty&lt;/h1&gt;

&lt;p&gt;Xplenty is a powerful, enterprise-grade ETL, ELT, and ETLT as a service platform that makes it easy for anyone – regardless of their tech experience – to create and automate sophisticated data integration processes. With Xplenty’s powerful data engine, you can follow the ETL or ELT model as required. You can also adhere to the ETLT model by performing simple data preparations in-pipeline, and directing the data warehouse to perform more nuanced SQL-based transformations after loading. &lt;br&gt;
Whether it’s pre-load or post-load transformations – or using ETLT for a mix of both – Xplenty makes data integration a snap.&lt;/p&gt;

&lt;p&gt;Originally published at &lt;a href="https://www.xplenty.com/blog/what-is-etlt/"&gt;https://www.xplenty.com/blog/what-is-etlt/&lt;/a&gt;&lt;/p&gt;

</description>
      <category>database</category>
      <category>security</category>
    </item>
  </channel>
</rss>
