DEV Community

Priscilla Parodi for Elastic


Data preparation for Data Frame Analysis with Transforms

| Menu | Next Post: Trained Models for Supervised Learning |

When you are using Data Frames (multi-variate analysis), Transforms can be useful in the data preparation step.

A transform converts existing Elasticsearch indices into summarized indices. You define a pivot, a set of features that reshapes the index into a different, more digestible format, which opens up opportunities for new insights and analysis.

Under the hood, a transform runs search aggregations on the source index and indexes the results into the destination index. A transform therefore never takes less time or uses fewer resources than the aggregation and indexing it performs.

You can decide whether you want the transform to run once or continuously.
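As a rough sketch of what such a configuration could look like, the Python snippet below builds the request body for a pivot transform as a plain dictionary. The index and field names are taken from the reviews example that follows. The `timestamp` field and the `60s` and `1m` values are assumptions made for illustration, not something this post defines. The body follows the general shape of the Elasticsearch create transform API (`PUT _transform/<transform_id>`), so check it against the documentation for your version before relying on it.

```python
import json


def build_transform_body(continuous: bool = False) -> dict:
    """Build a pivot transform body that summarizes reviews per user."""
    body = {
        "source": {"index": "reviews"},
        "dest": {"index": "reviews-result"},
        "pivot": {
            # One destination document per distinct user-id.
            "group_by": {"user-id": {"terms": {"field": "user-id"}}},
            "aggregations": {
                "num_reviews": {"value_count": {"field": "review"}},
                "avg_review": {"avg": {"field": "review"}},
            },
        },
    }
    if continuous:
        # Assumption: every review carries a `timestamp` date field that
        # the transform can use to detect new or changed documents.
        body["sync"] = {"time": {"field": "timestamp", "delay": "60s"}}
        # Assumption: checking for changes once per minute is acceptable.
        body["frequency"] = "1m"
    return body


if __name__ == "__main__":
    print(json.dumps(build_transform_body(continuous=True), indent=2))
```

Leaving out the `sync` section gives a transform that runs once over the data present at that moment. Adding it is what makes the transform continuous.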

In this example we have 3 documents from a source index that stores reviews, with these fields: user-id, vendor and review.

Source Index (reviews)
{
...
user-id: 123,
vendor: "abc",
review: 4
},
{
...
user-id: 123,
vendor: "def",
review: 3
},
{
...
user-id: 123,
vendor: "ghi",
review: 5
}

With Transforms we can have a Destination Index grouped by user-id, for example, with the number of reviews per user (3 reviews in this case) and a simple average of the reviews: (4+3+5)/3 = 4.

Destination Index (reviews-result)
{
...
user-id: 123,
num_reviews(sum): 3,
avg_review: 4
}
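To make the pivot concrete, here is a minimal sketch in plain Python that mimics the grouping and aggregation on the three example documents. It only simulates the result locally and does not talk to Elasticsearch. The function name `pivot_by_user` is made up for this illustration.

```python
from collections import defaultdict

# The three source documents from the example above.
reviews = [
    {"user-id": 123, "vendor": "abc", "review": 4},
    {"user-id": 123, "vendor": "def", "review": 3},
    {"user-id": 123, "vendor": "ghi", "review": 5},
]


def pivot_by_user(docs):
    """Group reviews by user-id, then count and average them."""
    scores_by_user = defaultdict(list)
    for doc in docs:
        scores_by_user[doc["user-id"]].append(doc["review"])
    return [
        {
            "user-id": user_id,
            "num_reviews": len(scores),
            "avg_review": sum(scores) / len(scores),
        }
        for user_id, scores in scores_by_user.items()
    ]


result = pivot_by_user(reviews)
print(result)
```

For user 123 this yields three reviews and an average of (4 + 3 + 5) / 3 = 4, matching the destination document.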

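If the transform runs continuously, the destination index is updated as new reviews arrive. The pivot is also not limited to counts and averages: other aggregations such as sum, max, or cardinality can shape the data in the way the analysis needs it.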
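The sketch below illustrates both ideas in plain Python, again as a local simulation. It computes a sum, a max, and a cardinality (the number of distinct vendors) per user, then recomputes them after a new review arrives. The fourth review and the output field names are hypothetical. Two simplifications are worth noting: this code recomputes everything on each run, and it counts distinct vendors exactly, whereas the Elasticsearch cardinality aggregation returns an approximate count.

```python
from collections import defaultdict


def summarize(docs):
    """Per-user sum, max, and distinct-vendor count (cardinality)."""
    rows_by_user = defaultdict(list)
    for doc in docs:
        rows_by_user[doc["user-id"]].append(doc)
    return {
        user_id: {
            "sum_review": sum(row["review"] for row in rows),
            "max_review": max(row["review"] for row in rows),
            # A set keeps each vendor once, so its size is the exact cardinality.
            "vendor_cardinality": len({row["vendor"] for row in rows}),
        }
        for user_id, rows in rows_by_user.items()
    }


reviews = [
    {"user-id": 123, "vendor": "abc", "review": 4},
    {"user-id": 123, "vendor": "def", "review": 3},
    {"user-id": 123, "vendor": "ghi", "review": 5},
]
first_run = summarize(reviews)

# Hypothetical new review: the same user rates vendor "abc" a second time.
reviews.append({"user-id": 123, "vendor": "abc", "review": 2})
second_run = summarize(reviews)

print(first_run)
print(second_run)
```

After the new review, the sum grows from 12 to 14 while the max and the vendor cardinality stay the same, which is the kind of update a continuous transform would reflect in the destination index.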

This post is part of a series on Artificial Intelligence that focuses on the Machine Learning solution from Elastic (the creators of Elasticsearch). The series aims to introduce and illustrate the possibilities and options available, as well as their context and usability.
