Conference Notes: How ML Powers LINE Services

#datascience #machinelearning #ai #dataengineering

Preface

Hello everyone, I am Evan Lin from the LINE Taiwan DevRel team. I'm very happy to share this recap of our third developer meetup of the year with you all. It was the first offline gathering held in Hsinchu since the pandemic, and also the first time the LINE Taiwan engineering team has held an offline event at National Chiao Tung University. This post covers the talk given at the event by Shawn Tsai of the LINE Taiwan Data Engineering team on how machine learning makes LINE's services more user-friendly.

KKTIX event webpage: Event URL

Full text of the event URL: TBD

Slides

Video

Composition of the LINE Taiwan Data Engineering Team

First, Shawn shared the composition of the LINE Taiwan Data Engineering team, which is mainly composed of the following three roles:

Data Engineer

Data engineers need strong engineering skills across data extraction, collection, and preprocessing. Their support is indispensable at every stage, from data exploration all the way to the final deployment of machine learning models.

Data Scientist

A data scientist works with data engineers on extracting the data and deciding how to preprocess it, and then trains the machine learning models.

Data Analyst

Data analysts focus on data exploration, identifying the data that can actually solve the problem. They also test and refine the finished models.

Collaboration Method of the Data Engineering Team and Projects

The data engineering team is made up mainly of these three roles, and its members are organized into task groups according to each product's needs. Some products are still at the stage of discussing and extracting data, while others have already moved on to tuning machine learning models. Because the work spans many product lines, each member gets to take part in many interesting products and projects, and can learn new machine learning techniques to apply in their day-to-day work.

Challenges Faced by the Data Engineering Team

LINE has more than 21 million users, LINE TODAY publishes one million articles a year, and LINE Shopping receives five million product inquiries every month. Handling this volume of data is the core challenge the data engineering team faces. Machine learning itself can be simple or quite complex; the following sections explain the techniques used, product by product.

LINE Customer Service Helper

"Oops... How do I move my account when I change phones?"

"How do I buy stickers and send them to friends and family?"

Users have operational questions like these every day, but how can they get answers quickly? This is where the machine learning behind "Customer Service Helper" comes in, helping you get a fast reply. To try it, add the LINE Customer Service Helper account on your phone by searching for @linehelptw and adding it as a friend. For more on how to use it, see "LINE Customer Service Helper" smart customer service is newly upgraded ~ Solve LINE issues in conversations.

The same question can be phrased in many different ways, for example:

Why do LINE notifications sometimes not arrive?
Why don't my messages show up?
Why doesn't LINE ring or vibrate?

These three very different phrasings may all point to the same underlying problem, one caused by the iOS 11 update. Answering such questions manually would be very time-consuming, so Natural Language Understanding (NLU) is needed. In the first version of the solution, an LSTM was used to capture how words relate to their context, and a CNN was used to extract text features for comparing one text with another. The results of this first version were not satisfactory. The team later built an ensemble combining seq2seq, CBoW, DSSM, and BERT, which achieved substantially better results.
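The talk names the models in the ensemble but not how their outputs are combined. As a minimal sketch of the ensemble idea, assuming a weighted average of similarity scores, the code below matches a user's question to the closest FAQ entry. The two scorers, their weights, and the FAQ entries are hypothetical stand-ins, not the seq2seq, CBoW, DSSM, and BERT models LINE actually uses.

```python
from collections import Counter
import math

# Hypothetical FAQ: canonical question -> answer ID (not LINE's real data).
FAQ = {
    "why am i not receiving line notifications": "notif_ios11",
    "how do i transfer my account to a new phone": "account_transfer",
    "how do i buy stickers and send them as gifts": "sticker_gift",
}

def tokens(text: str) -> list[str]:
    # Lowercase and keep only alphanumeric words.
    return "".join(c if c.isalnum() else " " for c in text.lower()).split()

def bow_cosine(a: str, b: str) -> float:
    """Cosine similarity of word-count vectors (a toy stand-in for a learned model)."""
    va, vb = Counter(tokens(a)), Counter(tokens(b))
    dot = sum(va[w] * vb[w] for w in va)
    norm = math.sqrt(sum(v * v for v in va.values())) * math.sqrt(sum(v * v for v in vb.values()))
    return dot / norm if norm else 0.0

def char_bigram_jaccard(a: str, b: str) -> float:
    """Jaccard overlap of character bigrams, tolerant of small wording changes."""
    def grams(s: str) -> set[str]:
        s = " ".join(tokens(s))
        return {s[i:i + 2] for i in range(len(s) - 1)}
    ga, gb = grams(a), grams(b)
    return len(ga & gb) / len(ga | gb) if ga | gb else 0.0

# Hypothetical weights; a real ensemble would tune them on labeled question pairs.
SCORERS = [(bow_cosine, 0.5), (char_bigram_jaccard, 0.5)]

def ensemble_score(query: str, candidate: str) -> float:
    # Weighted average of every scorer's similarity.
    return sum(weight * scorer(query, candidate) for scorer, weight in SCORERS)

def best_match(query: str) -> tuple[str, float]:
    """Return the answer ID of the FAQ entry with the highest ensemble score."""
    score, answer = max((ensemble_score(query, q), ans) for q, ans in FAQ.items())
    return answer, score
```

For example, `best_match("LINE notifications sometimes don't arrive")` picks the `notif_ios11` entry even though its wording differs from the FAQ question. Swapping in stronger scorers (such as a BERT sentence-pair model) would leave this combining step unchanged.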

LINE Message Verification Helper

The "LINE Message Verification" platform officially launched in July last year. It has an official website and is also connected to a LINE official account. Users only need to forward messages they receive in a chat room to the "LINE Message Verification" official account. If the database already contains a verification report for the message, the verification helper automatically determines its authenticity and returns the result immediately. If the message has not been verified yet, it is reported to a professional fact-checking unit, and once it has been clarified, the correct information is quickly sent back to the user. The service provides near-immediate message checking, helps users judge whether suspicious messages are genuine, and reduces the chance that fake messages spread further.
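The forward-and-verify flow above can be sketched as a small routing class. This is a minimal sketch under stated assumptions: an exact lookup on whitespace- and case-normalized text, and an in-memory queue standing in for the fact-checking unit. The class and method names are hypothetical, and the real service matches messages with more robust techniques.

```python
from collections import deque
from dataclasses import dataclass, field

@dataclass
class VerificationService:
    """Toy model of the forward-and-verify flow; all names are hypothetical."""
    reports: dict[str, str] = field(default_factory=dict)  # normalized text -> verdict
    pending: deque = field(default_factory=deque)           # (user_id, text) awaiting fact-checkers

    @staticmethod
    def normalize(text: str) -> str:
        # Collapse whitespace and case so trivially re-forwarded copies match.
        return " ".join(text.lower().split())

    def handle_forwarded_message(self, user_id: str, text: str) -> str:
        key = self.normalize(text)
        if key in self.reports:
            # Known message: reply with the existing verdict immediately.
            return f"Verified: {self.reports[key]}"
        # Unknown message: queue it for the professional fact-checking unit.
        self.pending.append((user_id, text))
        return "Submitted for verification; we will reply once it is checked."

    def record_verdict(self, text: str, verdict: str) -> list[str]:
        """Store a fact-checker's verdict and return the users waiting on it."""
        key = self.normalize(text)
        self.reports[key] = verdict
        waiting = [u for u, t in self.pending if self.normalize(t) == key]
        self.pending = deque((u, t) for u, t in self.pending if self.normalize(t) != key)
        return waiting
```

After `record_verdict` stores a result, every later forward of the same message is answered straight from `reports`, which is what lets already-verified messages get an immediate reply.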

The service receives as many as 40,000 messages a day, but manual fact-checking can handle only about 300 messages a day, so machine learning is essential. Near-duplicate detection and classification are used to find and categorize messages. Message verification is now more than ten times more efficient, and 46% of the suspicious messages submitted by users have been successfully clarified. Don't want to help spread fake messages? Join the "LINE Message Verification" official account.
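The talk mentions near-duplicate detection without naming a specific algorithm. MinHash over character shingles is one common choice and is used here purely as an assumption: a minimal sketch that flags a forwarded message as a near-duplicate of a known one. The shingle size `k=3`, the 64 hash functions, and the 0.5 threshold are illustrative values, not LINE's settings.

```python
import hashlib

def shingles(text: str, k: int = 3) -> set[str]:
    """Character k-grams of the whitespace-normalized, lowercased text."""
    s = " ".join(text.lower().split())
    return {s[i:i + k] for i in range(max(len(s) - k + 1, 1))}

def jaccard(a: set[str], b: set[str]) -> float:
    """Exact Jaccard similarity, for comparison with the MinHash estimate."""
    return len(a & b) / len(a | b) if a | b else 1.0

def minhash_signature(text: str, num_hashes: int = 64) -> list[int]:
    """For each seeded hash function, keep the minimum hash over all shingles."""
    grams = shingles(text)
    signature = []
    for seed in range(num_hashes):
        signature.append(min(
            int.from_bytes(
                hashlib.blake2b(f"{seed}:{g}".encode(), digest_size=8).digest(), "big")
            for g in grams))
    return signature

def estimated_jaccard(sig_a: list[int], sig_b: list[int]) -> float:
    """The fraction of matching signature slots estimates the Jaccard similarity."""
    return sum(x == y for x, y in zip(sig_a, sig_b)) / len(sig_a)

def is_near_duplicate(a: str, b: str, threshold: float = 0.5) -> bool:
    return estimated_jaccard(minhash_signature(a), minhash_signature(b)) >= threshold
```

A rumor that has merely been re-forwarded with an extra line appended still shares most of its shingles with the original, so it is flagged, while an unrelated message is not. At tens of thousands of messages a day, a production system would typically pair MinHash signatures with locality-sensitive hashing so it does not have to compare every pair of messages.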

Learning through Machine Learning Projects

Many students are curious: what exactly does LINE's data engineering team do every day? Do they really spend most of their time training machine learning models? The speaker generously shared a chart that answers this. Most of the time goes to machine configuration and data collection, a great deal goes to data verification, and building the serving infrastructure that keeps deploying the latest models after they are built also takes considerable time. The actual model training is often only a small part of the whole project. The chart also makes clear that a data scientist's time and expertise go mainly into finding and identifying the "key information".