
Sam Ng

Bra Asks Yuhang Jia: Saying Goodbye to "Artificial Stupidity"? AI Training Data Providers Rectify the Names of AI

Guest: Yuhang Jia
Yuhang Jia, General Manager at Cloud Testing Service (CTS), has extensive experience in market research for B2B enterprise services. In 2015, he established the North American Division of Testin.net, overseeing overseas markets and cutting-edge technology research and development. In 2017, he founded the AI Data Collection and Annotation business unit, which provides high-quality, scenario-based data collection and annotation services for AI and focuses on meeting the training data needs of AI implementation.
Hosted by Sam Ng (AKA: Bra)
Founder of ServBay, Secken and DNSPod, former General Manager of the SME Product Center at Tencent Cloud, cybersecurity expert, domain and DNS technology expert, webmaster, China Europe International Business School (CEIBS) EMBA.


Bra:
When did you first start exploring AI? What made you decide to delve into the niche field of AI data later on?

Yuhang Jia:
I’ve been closely following AI developments, from early voice interactions like Siri, to AlphaGo which sparked market-wide discussions, and various speculations about the future of AI.
What really struck me was an American health AI product, ActiveProtective. It's a wearable airbag that deploys to protect the wearer's lumbar and pelvic bones when a 3D motion sensor detects a fall.
ActiveProtective, now renamed Tango, is a wearable smart airbag belt. Previously, AI was seen as helping humans reduce repetitive labor or replace humans in dangerous tasks. ActiveProtective added a layer of care for humans, enhancing happiness through technological capabilities. Since then, I've believed in the tremendous potential of AI and wanted to contribute to accelerating the implementation of AI products.
We chose the AI training data segment because it aligns with Testin's original testing services: the project management mindset and toolchain management from testing carry over well to data collection and annotation operations. In addition, after a decade in enterprise services we have a large user base across the mobile internet and traditional industries, and these users are now looking to use AI for digital transformation.

Bra:
In 2015, you were responsible for overseas markets and cutting-edge technology R&D at Testin's North American Division. Having insight into AI technology development in both China and the US, how significant do you find the gap between China and the US in AI technology? What advanced experiences from abroad can be learned?

Yuhang Jia:
From a media perspective, there are some differences in the development focus of AI between China and the US. For example, in autonomous driving, Western, especially US, companies tend to focus on edge intelligence, making independent judgments based on edge computing and perception results; China leans towards vehicular networks, emphasizing an interconnected ecosystem and collaborative perception between vehicles and roads. In research areas, China has a broader range of application scenarios, accelerated by a series of policy supports for AI application implementation.
For a B2B service company like us, it's more important to learn from the ecosystem perspective. Testin's mission is to facilitate industrial intelligence. In the global trend of industrial upgrading, we provide a comprehensive service combining core technology, product tools, and professional talents to accelerate the digital and intelligent transformation of enterprises, improving operational efficiency, reducing costs, enhancing product quality, ensuring information security, and injecting new growth momentum into various industries.

Bra:
Data, computing power, and algorithms are said to be the three driving forces behind the development of AI's underlying technology. Testin.net is a top AI training data service provider in China, with data annotation accuracy reaching up to 99.99%. How is such high accuracy achieved? What supports this achievement?

Yuhang Jia:
AI datasets are created from raw data, where valuable data is selected and annotated to become datasets for AI training, resulting in high-quality AI.
As China's AI industry enters the commercial application stage, there's a strong demand for scenario-based, refined data. High annotation accuracy is both an industry requirement and a direction we, as a leading AI data service provider, must exemplify. Our high accuracy is based on three aspects:
First, strength. Testin boasts a comprehensive, efficient annotation platform for all types of knowledge products, streamlining every data processing request; it empowers industries with a suite of integrated services from platform development, data scenario labs, data delivery centers, and professional team building to efficient organizational collaboration, ensuring high-quality AI data processing.
Second, capability, meaning multi-dimensional data processing capabilities. As a leading AI data service provider, CTS Data supports all categories of vision, speech, and text data, adapting with higher precision to changing demands across these AI algorithm dimensions. Additionally, Testin builds auxiliary quality inspection tools into its toolchain, setting safeguards that improve data accuracy according to each annotation project's requirements.
Third, solutions. Testin's deep tech development and industry understanding contribute to industrial empowerment. Testin has developed training data service solutions for smart cities, smart homes, intelligent driving, smart finance, and AIoT, accelerating high-quality AI application implementations across industries.
These achievements are the result of Testin's continuous effort and technological investment. Rooted in market practice, Testin pioneers in business strategy and cutting-edge tech exploration, forming a complete AI data service chain of "collection, annotation, management, and storage." Meanwhile, focusing on serving, understanding, and guiding customers based on our business capabilities, we help clients build their foundational and cornerstone capacities.
Thus, Testin's comprehensive strength in technology, service quality, assurance, efficiency, and customer satisfaction has earned it recognition across the industry.

Bra:
Testin developed a data processing platform named "Testin Annotation Platform 4.0," automating task flows from data collection to delivery for various roles. Why develop such a platform? What role does it play in AI development?

Yuhang Jia:
Our aim in the data business is industry leadership, necessitating excellence in technology experience, delivery capacity, customer satisfaction, accuracy, and scale. As the saying goes, "To do a good job, one must first sharpen one's tools." Without a large-scale, efficient collaborative workstation, these goals are unattainable.
Testin Annotation Platform (version 4.0) provides enterprises with the capability to handle large-scale sensory data. Through innovative structure, intelligentization, engineering, and standardization, the platform empowers the AI training data industry, stimulating data value in terms of quality and efficiency, accelerating AI technological innovation, and advancing AI industry scenario-based implementations.
Technologically, the Testin Annotation Platform features multi-end data support, AI-assisted quality inspection, comprehensive annotation tool support, efficient and standardized operations, deep integration with enterprise processes, and quality control throughout the annotation process. It supports quick data retrieval, data version management, and visualization of annotation results, meeting the diverse data needs of AI deployment scenarios and improving overall efficiency in AI data training by 200%.
Tool-wise, the platform supports one-stop processing for data types like images, texts, voices, videos, and point clouds, equipped with professional tools such as 3D boxes, point cloud semantic segmentation, feature points, lines, rectangles, curves, planar boxes, polygons, etc., flexibly meeting various annotation needs and facilitating data processing implementation in coordination with algorithm models, quickly responding to diverse AI training demands.
With the tooling provided by the Testin Annotation Platform, enterprises gain the capability to handle large-scale sensory data, shortening data collection cycles, enhancing data annotation efficiency, and significantly reducing AI model training costs. This helps enterprises achieve unprecedented accuracy in data recognition, greatly accelerating the AI deployment and iteration cycle and saving considerable R&D time and costs.

Bra:
High-quality data relies on a team of excellent data annotators, yet the turnover rate is high, and experienced talents are scarce in this industry. How do you address this challenge?

Yuhang Jia:
We address turnover from two aspects.
First, we lower the entry barrier to the industry through pre-job training and toolchains, making data annotation simpler, thus broadening our recruitment pool and enabling more people to quickly adapt to their roles.
Second, we've established a performance incentive management system, creating a gradient for our annotators so they're not stuck in repetitive labor. Instead, based on their work outcomes, they receive corresponding incentives or opportunities for advancement at different levels.

Bra:
With major companies like Tencent Cloud, Alibaba Cloud, and Baidu Intelligent Cloud entering the data collection and annotation service market, do you feel pressured?

Yuhang Jia:
We feel no pressure, but rather excitement.
The entry of these cloud providers indicates the industry's recognition of the AI data sector. The more participants, the larger the market grows, which positively impacts us.
We're actively exploring ecosystem collaborations. At last week's Tencent Digital Ecosystem Summit, Tencent launched its autonomous driving cloud platform and announced its ecosystem partners, with us being the only AI training data service provider. This collaboration between Tencent's autonomous driving cloud and Testin is based on the leading technology of the Testin Data Annotation Platform.
Just like the popular "metaverse" concept, relying on cloud computing technology, major cloud providers can offer fertile ground. AI data services are the infrastructure on this ground, and we look forward to more participants joining in building the foundation, allowing the digital ecosystem to thrive.

Bra:
It seems you've covered all services in the AI data track, including data collection, annotation, and even developing an annotation platform. What untapped opportunities remain in this track?

Yuhang Jia:
When AI data is mentioned, people first think of data collection, cleaning, and annotation, which are the production stages of AI data. However, we've noticed that many companies can produce data but don't know how to use it efficiently. This is where the value of data management becomes apparent, and extending into end-to-end data storage and management is the next big trend. Anticipating this, we've developed an AI Dataset Management System.
Testin's AI Data Management System lets teams validate an algorithm's direction iteratively as the algorithm evolves. For instance, if an autonomous driving vision perception company finds that its model recognizes objects poorly in snowy conditions, how can it train the related algorithm on targeted data? The data management system's tagging feature can quickly extract the corresponding snowy data from the existing database for training and validation.
In fact, after observing numerous cases, we've found that companies using the AI Data Management System speed up their operational rhythm and iteration cycles, moving from waterfall to agile development. This underscores the significance of the AI Dataset Management System.
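As a rough illustration of the tag-based extraction described in the snowy-conditions example, here is a minimal Python sketch. It is not Testin's actual system: the `Sample` class, the `select_by_tags` function, the frame IDs, and the tag names are all hypothetical and chosen only to show the idea of pulling a scenario-specific subset out of an existing dataset.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Sample:
    """One item in a dataset, with scenario tags (hypothetical schema)."""
    sample_id: str
    tags: frozenset


def select_by_tags(samples, required):
    """Return the samples that carry every tag in `required`."""
    required = frozenset(required)
    return [s for s in samples if required <= s.tags]


# Illustrative data only; real datasets would be stored in a database.
dataset = [
    Sample("frame_001", frozenset({"snow", "night", "highway"})),
    Sample("frame_002", frozenset({"clear", "day", "urban"})),
    Sample("frame_003", frozenset({"snow", "day", "urban"})),
]

snowy = select_by_tags(dataset, {"snow"})
print([s.sample_id for s in snowy])  # ['frame_001', 'frame_003']
```

Requiring several tags at once, for example `select_by_tags(dataset, {"snow", "urban"})`, narrows the subset further, which is the kind of targeted slice a team might retrain on.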

Bra:
You've launched an autonomous driving training data solution, offering a one-stop solution for intelligent driving training data needs from early development to implementation. Testin is a partner of Tencent Autonomous Driving Cloud and serves many leading intelligent car customers. Based on your experience, what are the requirements for training autonomous driving AI data?

Yuhang Jia:
Our intelligent driving solution is divided into three parts, corresponding to three different stages of development:
The first stage is the algorithm pre-research period, verifying the feasibility of the algorithm. We provide Testin's proprietary basic dataset to assist enterprises in preliminary research.
The second stage is data cold start, based on the algorithm's corresponding sensors and scenarios. Through data collection, cleaning, and annotation, we ensure a dataset is available for algorithm iteration and development. Testin's scenario lab and annotation base are capable of meeting the precision and scale requirements, offering customized collection and annotation services.
The third stage is product launch, where the product has already accumulated some online production data. We assist enterprises in data collection, annotation, and management processes through toolchains and onsite services, facilitating their own iterations.
Data service companies providing solutions for automakers and the automotive industry must possess at least three capabilities:
First, large data volume. Given the complexity of scenarios faced by cars in any environment, sufficient data volume is essential.
Second, coverage of diverse vertical domains to encompass as many different scenarios as possible.
Third, multi-dimensional sensor fusion data processing capability. What is multi-dimensional sensing? For example, driving on a highway with endless mountains and blue sky ahead, when a sky-blue car appears in front, visual sensors alone cannot make the judgment. Adding millimeter-wave radar and lidar is necessary to establish a perception system in the 3D coordinate system and identify obstacles ahead.
Some companies, like Tesla, primarily rely on visual sensors, while others focus on lidar. Testin can assist related enterprises in better environmental perception, improving distance measurement accuracy, planning routes more effectively, and customizing data collection, cleaning, and annotation solutions based on different sensors.
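The sky-blue car example can be sketched in a few lines of Python. This is a deliberately simplified late-fusion rule, assuming each sensor reports its own obstacle confidence between 0 and 1; the function name `fuse_obstacle_confidence`, the threshold, and the confidence values are hypothetical, and production perception stacks involve calibration, tracking, and far more elaborate fusion than this.

```python
def fuse_obstacle_confidence(readings, threshold=0.5):
    """Flag an obstacle if any sensor's confidence reaches the threshold.

    `readings` maps a sensor name to its obstacle confidence in [0, 1].
    Returns (detected, most_confident_sensor).
    """
    best_sensor = max(readings, key=readings.get)
    return readings[best_sensor] >= threshold, best_sensor


# Illustrative values: the camera struggles with a sky-blue car against
# a blue sky, while radar and lidar still register the object.
camera_only = {"camera": 0.20}
all_sensors = {"camera": 0.20, "radar": 0.85, "lidar": 0.90}

print(fuse_obstacle_confidence(camera_only))  # (False, 'camera')
print(fuse_obstacle_confidence(all_sensors))  # (True, 'lidar')
```

With the camera alone the obstacle is missed; once radar and lidar readings are included, the fused decision flags it.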

Bra:
Early this month, national regulation tightened with the official implementation of the "Personal Information Protection Law of the People's Republic of China." The AI data field involves extensive data collection, requiring constant attention to policy trends and legal regulations. Has the "Personal Information Protection Law" impacted you? How do you implement data security?

Yuhang Jia:
The "Personal Information Protection Law" mainly targets industries related to personal information security. For AI data service providers, the industry and operational processes are more clearly regulated, which positively influences us.
In terms of data security, we hold ISO 9001, ISO 27001, ISO 27701, and CMMI3 certifications and comply with relevant data privacy and security regulations. Additionally, Testin has security testing and penetration testing experts safeguarding our platform architecture. Beyond technical privacy and security measures, Testin emphasizes employee responsibility and standards in data collection and annotation, and helps enterprises understand data security and privacy requirements through training and coaching.

Bra:
What are Testin's future development plans? What technological innovations can we look forward to?

Yuhang Jia:
In response to AI data industry trends, Testin has devised a "horizontal and vertical" plan:
The "horizontal" aspect involves deepening our focus on five key areas—driving, finance, home, smart cities, and AIoT—providing professional AI data solutions to customers. We're also actively exploring sectors like construction and retail, aiming to apply our AI data service experience to more industries with growth potential.
The "vertical" aspect starts from the perspective of customer needs, enhancing efficiency across all data-related segments. As these areas continue to develop, Testin will persist in strategic deployment, strengthening our solution and service capabilities to ensure breakthroughs in these industries, allowing Testin to meet related customer needs.



Thanks for reading.
If you are interested in more information, follow Bra on X (Twitter).
