<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>DEV Community: Tencent Cloud</title>
    <description>The latest articles on DEV Community by Tencent Cloud (@tencentcloud).</description>
    <link>https://dev.to/tencentcloud</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Forganization%2Fprofile_image%2F5733%2Fca4a1854-95a6-423b-8c98-ce7c8ff941b2.jpg</url>
      <title>DEV Community: Tencent Cloud</title>
      <link>https://dev.to/tencentcloud</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://dev.to/feed/tencentcloud"/>
    <language>en</language>
    <item>
      <title>COS-Based ClickHouse Data Tiering Solution</title>
      <dc:creator>Man yin Mandy Wong</dc:creator>
      <pubDate>Tue, 08 Nov 2022 03:46:06 +0000</pubDate>
      <link>https://dev.to/tencentcloud/cos-based-clickhouse-data-tiering-solution-4gae</link>
      <guid>https://dev.to/tencentcloud/cos-based-clickhouse-data-tiering-solution-4gae</guid>
      <description>&lt;p&gt;ClickHouse is a columnar database management system (DBMS) for online analytical processing (OLAP) and supports interactive analysis of petabytes of data. As a distributed DBMS, it differs from other mainstream big data components in that it doesn't adopt the Hadoop Distributed File System (HDFS). Instead, it stores data in local disks of the server and uses data replicas to guarantee high data availability. Then, it leverages distributed tables to implement distributed data storage and query.&lt;/p&gt;

&lt;p&gt;Shard: A shard is a server that stores part of the data. To read all the data, you must access every shard. Storing a distributed table's data across multiple shards enables horizontal scaling of both computing and storage.&lt;/p&gt;

&lt;p&gt;Replica: Each shard contains multiple data replicas, so you can access any replica to read data. The replica mechanism ensures data availability in case a single storage node fails. Only MergeTree table engines support the multi-replica architecture. ClickHouse implements the data replica feature in table engines rather than database engines; therefore, replicas are table-level rather than server-level. When data is inserted into ReplicatedMergeTree engine tables, primary-secondary sync is performed to generate multiple data replicas. ZooKeeper is used to conduct distributed coordination during the sync.&lt;/p&gt;

&lt;p&gt;Distributed table: Distributed tables created with distributed engines distribute query tasks among multiple servers for processing but don't store data. When such a table is created, ClickHouse will first create a local table in each shard, which will be visible only on the corresponding node; then, it will map the local tables to the distributed table. In this way, when you access the distributed table, ClickHouse will automatically forward your request to the corresponding local table based on the cluster's architecture information.&lt;/p&gt;

&lt;p&gt;In summary, a ClickHouse cluster consists of multiple shards, each of which contains multiple data replicas. A replica corresponds to a server node in the cluster and stores data on its local disk. With distributed tables, shards, and replicas, ClickHouse achieves horizontal scalability and high data availability.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;1. Tiered data storage&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Starting from v19.15, ClickHouse supports multi-volume storage, which stores ClickHouse tables in volumes containing multiple devices. This feature makes it possible to define different types of disks in a volume for tiered storage of cold and hot data, striking a balance between performance and cost.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;2. Disk types supported by ClickHouse&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;ClickHouse mainly supports two disk types: DiskLocal, which stores data on a local disk identified by its mount path, and DiskS3, which stores data in S3-compatible object storage (such as COS) identified by its endpoint URL.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;3. Data movement policy&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;ClickHouse can store data in different storage media by configuring disks of different types and storage policies in the configuration file. It also supports movement policies to automatically move data between storage media.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;4. Current problems with data storage in ClickHouse&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Many users choose ClickHouse for its superior query performance. To get the most out of it, they generally store ClickHouse data on Tencent Cloud Enhanced SSD cloud disks, which offer high performance at a high price. After weighing performance against cost, many teams simply purge legacy data from ClickHouse. However, although most queries involve only the latest data, the business side still needs to access legacy data occasionally, and balancing cost against that occasional access is a constant headache for ClickHouse system admins.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;5. COS strengths&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Cloud Object Storage (COS) is a distributed storage service launched by Tencent Cloud. It has no directory hierarchy or data format restrictions, can accommodate an unlimited amount of data, and supports access over HTTP/HTTPS protocols.&lt;/p&gt;

&lt;p&gt;COS organizes data in pay-as-you-go buckets with an unlimited capacity, which can be used and scaled on demand.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;6. COS-based ClickHouse data tiering&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Prepare the following environments before configuring data tiering:&lt;/p&gt;

&lt;p&gt;• Local storage: Format an Enhanced SSD cloud disk and mount it to the "/data" path for storing hot data.&lt;/p&gt;

&lt;p&gt;• COS bucket: Create a COS bucket for storing cold data and get the "SecretId" and "SecretKey" of the account that can access the bucket.&lt;/p&gt;

&lt;p&gt;6.1 Configure the ClickHouse disk and policy&lt;/p&gt;

&lt;p&gt;First, configure the "/etc/clickhouse-server/config.d/storage.xml" file. In the "disks" section, define the local disk path, the COS bucket URL, and the "SecretId" and "SecretKey" of the access account. In the "policies" section, define the "ttl" storage policy, which consists of the "ttlhot" and "ttlcold" volumes containing the local disk and the COS bucket respectively.&lt;/p&gt;
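
&lt;p&gt;A minimal sketch of such a configuration is shown below. The disk names, the bucket endpoint, and the policy layout are illustrative assumptions based on the description above; substitute your own paths and credentials:&lt;/p&gt;

```xml
&lt;clickhouse&gt;
  &lt;storage_configuration&gt;
    &lt;disks&gt;
      &lt;!-- Hot data: the Enhanced SSD mounted at /data --&gt;
      &lt;local_disk&gt;
        &lt;path&gt;/data/&lt;/path&gt;
      &lt;/local_disk&gt;
      &lt;!-- Cold data: the COS bucket, accessed via the S3-compatible disk type --&gt;
      &lt;cos_disk&gt;
        &lt;type&gt;s3&lt;/type&gt;
        &lt;endpoint&gt;https://examplebucket-1250000000.cos.ap-guangzhou.myqcloud.com/data/&lt;/endpoint&gt;
        &lt;access_key_id&gt;SecretId&lt;/access_key_id&gt;
        &lt;secret_access_key&gt;SecretKey&lt;/secret_access_key&gt;
      &lt;/cos_disk&gt;
    &lt;/disks&gt;
    &lt;policies&gt;
      &lt;ttl&gt;
        &lt;volumes&gt;
          &lt;ttlhot&gt;
            &lt;disk&gt;local_disk&lt;/disk&gt;
          &lt;/ttlhot&gt;
          &lt;ttlcold&gt;
            &lt;disk&gt;cos_disk&lt;/disk&gt;
          &lt;/ttlcold&gt;
        &lt;/volumes&gt;
      &lt;/ttl&gt;
    &lt;/policies&gt;
  &lt;/storage_configuration&gt;
&lt;/clickhouse&gt;
```

&lt;p&gt;Note that older ClickHouse versions use "yandex" rather than "clickhouse" as the configuration root element.&lt;/p&gt;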

&lt;p&gt;6.2 Import data to ClickHouse&lt;/p&gt;

&lt;p&gt;After completing the storage configuration, set up a table with the TTL policy configured and import data to it to verify the tiering policy.&lt;/p&gt;

&lt;p&gt;Here, a COS bucket inventory is selected as the data source for import. First, create a table named "cos_inventory_ttl" in ClickHouse based on the content of each column in the inventory. Then, configure the TTL policy. According to the "LastModifiedDate" value, store hot data in the "ttlhot" volume and cold data at least three months old in "ttlcold".&lt;/p&gt;
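
&lt;p&gt;A sketch of such a table definition is shown below. The column list is abridged and illustrative; only the table name, the "LastModifiedDate" column, and the volume names come from the article:&lt;/p&gt;

```sql
CREATE TABLE cos_inventory_ttl (
  appid UInt64,
  bucket String,
  key String,
  size UInt64,
  LastModifiedDate DateTime
) ENGINE = MergeTree()
ORDER BY LastModifiedDate
-- Keep fresh data on the local volume; move parts at least
-- three months old to the COS-backed volume.
TTL LastModifiedDate TO VOLUME 'ttlhot',
    LastModifiedDate + toIntervalMonth(3) TO VOLUME 'ttlcold'
SETTINGS storage_policy = 'ttl';
```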

&lt;p&gt;6.3 Verify data&lt;/p&gt;

&lt;p&gt;After import, view the total number of data rows. Then, you can query the volumes storing different data. You can further conduct a query test to count the total size of files generated in the past three months in the "cos-user/" directory.&lt;/p&gt;
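
&lt;p&gt;Sketched as queries, assuming a table named "cos_inventory_ttl" with a "LastModifiedDate" column as described above ("system.parts" is a system table built into ClickHouse):&lt;/p&gt;

```sql
-- Total number of rows imported
SELECT count() FROM cos_inventory_ttl;

-- Which disk each active data part landed on (local disk vs. COS)
SELECT name, disk_name
FROM system.parts
WHERE table = 'cos_inventory_ttl' AND active;

-- Total size of files under "cos-user/" modified in the past three months
SELECT formatReadableSize(sum(size))
FROM cos_inventory_ttl
WHERE key LIKE 'cos-user/%'
  AND LastModifiedDate >= now() - toIntervalMonth(3);
```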

&lt;p&gt;&lt;strong&gt;7. Conclusion&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;In ClickHouse, configuring different storage media and storage policies enables automatic tiered storage of data. Thanks to the unlimited capacity and cost-effectiveness of COS, ClickHouse clusters can store data long-term at low cost while retaining superior query performance.&lt;/p&gt;

&lt;p&gt;Read more at: &lt;a href="https://www.tencentcloud.com/dynamic/blogs/sample-article/100384"&gt;https://www.tencentcloud.com/dynamic/blogs/sample-article/100384&lt;/a&gt;&lt;/p&gt;

</description>
      <category>cos</category>
      <category>clickhouse</category>
      <category>database</category>
      <category>olap</category>
    </item>
    <item>
      <title>Tencent Cloud COS, Key to Data Disaster Recovery</title>
      <dc:creator>Man yin Mandy Wong</dc:creator>
      <pubDate>Tue, 01 Nov 2022 03:13:22 +0000</pubDate>
      <link>https://dev.to/tencentcloud/tencent-cloud-cos-key-to-data-disaster-recover-2pch</link>
      <guid>https://dev.to/tencentcloud/tencent-cloud-cos-key-to-data-disaster-recover-2pch</guid>
      <description>&lt;p&gt;This article describes how Tencent Cloud Object Storage (COS) addresses data layer disaster recovery.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;1. Cross-AZ Disaster Recovery&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;If your application is already deployed on Tencent Cloud, you can use COS's multi-AZ storage architecture to improve data-layer availability. Multi-AZ storage distributes your data across availability zones and provides IDC-level disaster recovery capabilities.&lt;/p&gt;

&lt;p&gt;In this architecture, data will be split into multiple chunks, and corresponding coding chunks will be calculated based on the erasure code algorithm. The original data chunks and coding chunks will be mixed up and evenly distributed to IDCs in different AZs in a region for storage and intra-region disaster recovery.&lt;/p&gt;

&lt;p&gt;The multi-AZ feature provides 99.9999999999% (12 nines) designed data reliability and 99.995% designed service availability. When you upload data objects to COS, you can store them in a multi-AZ region simply by specifying the storage class.&lt;/p&gt;

&lt;p&gt;After the multi-AZ feature is enabled, your data will be distributed among IDCs in multiple AZs in a region. When an IDC fails due to extreme situations such as natural disasters or power outages, other IDCs can still guarantee normal data reads and writes, thereby ensuring persistent storage, business continuity, and high availability.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;2. Cross-Region Disaster Recovery&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;In addition to the multi-AZ feature of COS, you can also save data copies in different regions to further improve the data layer availability.&lt;/p&gt;

&lt;p&gt;COS's cross-region bucket replication feature asynchronously replicates data across regions. It is a bucket-level configuration item, where rules can be configured to replicate incremental objects from one bucket to another bucket automatically and asynchronously.&lt;/p&gt;
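
&lt;p&gt;For illustration, the replication rule that such a configuration submits through COS's XML API looks roughly like this (the UIN and bucket identifiers are placeholders):&lt;/p&gt;

```xml
&lt;ReplicationConfiguration&gt;
  &lt;Role&gt;qcs::cam::uin/100000000001:uin/100000000001&lt;/Role&gt;
  &lt;Rule&gt;
    &lt;Status&gt;Enabled&lt;/Status&gt;
    &lt;ID&gt;replicate-incremental-objects&lt;/ID&gt;
    &lt;!-- An empty prefix matches every incremental object in the bucket --&gt;
    &lt;Prefix&gt;&lt;/Prefix&gt;
    &lt;Destination&gt;
      &lt;Bucket&gt;qcs::cos:ap-beijing::examplebucket-1250000000&lt;/Bucket&gt;
    &lt;/Destination&gt;
  &lt;/Rule&gt;
&lt;/ReplicationConfiguration&gt;
```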

&lt;p&gt;With cross-bucket replication, COS replicates object content exactly, along with object metadata and version IDs, from the source bucket to the destination bucket. Object operations such as adding or deleting objects are also synced to the destination bucket.&lt;/p&gt;

&lt;p&gt;With cross-region bucket replication, when the IDC in one region is damaged due to force majeure, the IDC in another region can still provide data copies for your use, implementing cross-region disaster recovery.&lt;/p&gt;

&lt;p&gt;In addition to high availability, cross-region bucket replication can also meet industry-specific requirements for data compliance. If you have end users accessing objects from different regions, you can maintain object copies in buckets closest to them geographically, so as to minimize the access latency and deliver a better user experience.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;3. Versioning&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;If data is deleted accidentally, it will be lost permanently even if cross-AZ or cross-region disaster recovery is implemented.&lt;/p&gt;

&lt;p&gt;To avoid data loss due to accidental deletion or application failure, COS has launched the versioning feature. It allows you to store multiple versions of an object in the same bucket. For example, you can store multiple objects with the same object key "picture.jpg" but different version IDs like "100000", "100101", and "120002" in a bucket. Then, you can query, delete, or restore objects in the bucket by version ID. This enables you to recover from data loss caused by accidental deletion or application failure. For example, when you delete an object with versioning enabled:&lt;/p&gt;

&lt;p&gt;• If you delete the object (without deleting it permanently), COS will insert a delete marker for it. The marker serves as the current object version, and earlier versions can still be restored by version ID.&lt;/p&gt;

&lt;p&gt;• If you need to replace the object, COS will insert a new version ID for the newly uploaded object. You can still restore the replaced object with the version ID.&lt;/p&gt;

&lt;p&gt;There are three versioning states for a bucket:&lt;/p&gt;

&lt;p&gt;• Versioning not enabled: Bucket versioning is not enabled by default.&lt;/p&gt;

&lt;p&gt;• Versioning enabled: When bucket versioning is enabled, it will be applied to all the objects in the bucket. After versioning is enabled for the first time, new objects uploaded to the bucket will be assigned a unique version ID.&lt;/p&gt;

&lt;p&gt;• Versioning suspended: After versioning is suspended (it cannot be disabled once enabled), new objects uploaded to the bucket will no longer be subject to versioning.&lt;/p&gt;

&lt;p&gt;You can upload, query, and delete objects no matter which versioning state the bucket is in.&lt;/p&gt;
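
&lt;p&gt;The versioning state itself is a one-element bucket configuration. The request body of the PUT Bucket versioning call in COS's XML API is roughly:&lt;/p&gt;

```xml
&lt;VersioningConfiguration&gt;
  &lt;!-- "Enabled" turns versioning on; "Suspended" suspends it --&gt;
  &lt;Status&gt;Enabled&lt;/Status&gt;
&lt;/VersioningConfiguration&gt;
```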

&lt;p&gt;&lt;strong&gt;4. Anti-Overwrite for Upload&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Besides force majeure, data exceptions may also occur due to operations that don't seem risky. By default, COS maintains eventual consistency by overwriting an existing file when another file with the same name is uploaded. To avoid unexpected overwrites, you would have to maintain a complete name-checking system in your business logic, or enable versioning, which adds complexity to object management and consumes extra storage. More often than not, you only need to forbid overwrites of certain files, which makes versioning unnecessary in terms of functionality.&lt;/p&gt;

&lt;p&gt;To this end, COS provides an anti-overwrite mechanism at both the bucket and object levels. With bucket-level anti-overwrite enabled, the bucket forbids uploads of any files with the same name: when such a file is uploaded, COS denies the upload request so that the existing file is not overwritten. If you only want to protect certain files in the bucket, you can instead add a special header to the upload request; COS then checks whether a file with the same name already exists in the bucket, and if so, the upload fails. With anti-overwrite enabled, you can still rename or delete files.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;5. Object Lock&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;In some compliance scenarios, however, anti-overwrite is far from enough. In the finance field, for example, regulations require files to be retained for a certain period of time and prohibit their overwrite, deletion, and modification. In this case, you can use object lock to meet the requirements. After it is enabled, within the retention period:&lt;/p&gt;

&lt;p&gt;1. Objects cannot be deleted or modified;&lt;/p&gt;

&lt;p&gt;2. The storage class of objects cannot be modified;&lt;/p&gt;

&lt;p&gt;3. The HTTP headers and user metadata of objects cannot be modified, including "Content-Type", "Content-Encoding", "Content-Language", "Content-Disposition", "Cache-Control", "Expires", and headers prefixed with "x-cos-meta-".&lt;/p&gt;

&lt;p&gt;Object lock is designed precisely for such compliance requirements.&lt;/p&gt;

&lt;p&gt;Compared with a local secondary IDC, cloud-based disaster recovery offers higher reliability, availability, and security, and eliminates the duplicate investment in hardware, computing, networking, and software. It greatly reduces the TCO while guaranteeing the RPO and RTO.&lt;/p&gt;

&lt;p&gt;Read more at: &lt;a href="https://www.tencentcloud.com/dynamic/blogs/sample-article/100379"&gt;https://www.tencentcloud.com/dynamic/blogs/sample-article/100379&lt;/a&gt;&lt;/p&gt;

</description>
      <category>cos</category>
      <category>objectstorage</category>
      <category>datarecovery</category>
      <category>databackup</category>
    </item>
    <item>
      <title>ARM-Based Server Review</title>
      <dc:creator>Man yin Mandy Wong</dc:creator>
      <pubDate>Thu, 27 Oct 2022 04:06:25 +0000</pubDate>
      <link>https://dev.to/tencentcloud/arm-based-server-review-188k</link>
      <guid>https://dev.to/tencentcloud/arm-based-server-review-188k</guid>
      <description>&lt;p&gt;&lt;strong&gt;1. Background&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Take a look at the ARM-based server SR1 recently launched by Tencent Cloud. Is it worth it? How does it stack up against other models? Let's check it out.&lt;/p&gt;

&lt;p&gt;We benchmarked two typical instance types, the ARM-based SR1 and the x86-based S5, to show how to measure CPU performance, mainly computing power, so that you can quickly see what to look for.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;2. ARM-based server environment and evaluation preparations&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Tencent Cloud SR1 is the first ARM-based server powered by the latest Ampere Altra, an ARM Neoverse N1-based CPU with a clock rate of up to 2.8 GHz and 64 KiB of L1 cache. The Neoverse N1 has the following architecture:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--D_-0HxCM--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/p63el6hf8dlblu0ypthq.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--D_-0HxCM--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/p63el6hf8dlblu0ypthq.png" alt="Image description" width="865" height="943"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;The other contender is the mainstream x86-based Standard S5, which adopts the latest Cooper Lake microarchitecture of Intel Xeon Platinum and runs at 2.5 GHz. It's quite popular in general-purpose use cases. Both test instances have 4 cores and 8 GiB of memory.&lt;/p&gt;

&lt;p&gt;From the cost perspective, &lt;strong&gt;SR1 is approximately 20% cheaper than S5&lt;/strong&gt; according to the official website. Although its price is not as competitive as Lighthouse's, it is still well worth it.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--FJiN5sCu--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/7qpp26ssrdo6nr9f71oc.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--FJiN5sCu--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/7qpp26ssrdo6nr9f71oc.png" alt="Image description" width="683" height="317"&gt;&lt;/a&gt;&lt;/p&gt;


&lt;p&gt;S5 and SR1 price comparison&lt;/p&gt;

&lt;p&gt;SR1 is comparable to S5 in overall performance and more economical than the latter, promising significant cost savings for both individuals and enterprises.&lt;/p&gt;

&lt;p&gt;Tips: Screen splitting&lt;/p&gt;

&lt;p&gt;Use the Tmux tool to split the screen (press Ctrl+b, then "), log in to the two servers at the same time, and run the &lt;code&gt;setw synchronize-panes&lt;/code&gt; command at the Tmux command prompt (Ctrl+b, then :) to allow entering commands on both terminals at the same time, as shown below:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--ih5QOZDN--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/3exvdy8un1qmntynh12b.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--ih5QOZDN--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/3exvdy8un1qmntynh12b.png" alt="Image description" width="828" height="441"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;em&gt;2.1 System preparations and CPU viewing&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;Enter commands in different windows of Tmux.&lt;/p&gt;

&lt;p&gt;With the preparations done, let's start the evaluation.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;3. 7-Zip compression evaluation&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;7-Zip ships with a built-in LZMA benchmark that can quickly evaluate the CPU computing performance of a server.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--JvCwCmsV--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/ybto8rsotreca5n27sqw.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--JvCwCmsV--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/ybto8rsotreca5n27sqw.png" alt="Image description" width="809" height="73"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Run the following command to evaluate the performance:&lt;/p&gt;
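
&lt;p&gt;A typical invocation is shown below (the exact flags in the screenshot may differ; -mmt4 pins the run to 4 threads to match the 4-core instances):&lt;/p&gt;

```shell
# Run 7-Zip's built-in LZMA benchmark; it compresses and decompresses
# in memory and reports speed plus a MIPS rating per pass, ending with
# Avr/Tot summary lines. Install the p7zip package if 7z is missing.
7z b -mmt4
```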

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--gCLNiMxQ--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/1vbx8cs2ciucvsaf186q.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--gCLNiMxQ--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/1vbx8cs2ciucvsaf186q.png" alt="Image description" width="833" height="403"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;em&gt;2.6 LZMA compression evaluation (ARM-based SR1/x86-based S5)&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;7-Zip evaluation&lt;/p&gt;

&lt;p&gt;The 7-Zip benchmark command displays the compression and decompression performance of a server, measured in millions of instructions per second (MIPS). The higher the value, the stronger the performance. You can also use metrics such as compression rate and execution time for cross-verification. The 7-Zip benchmark rarely uses 64-bit instructions, let alone advanced instruction set extensions; it's more about the performance of CPU "fundamentals". LZMA compression performance relies on a CPU's memory access latency, data cache (D-Cache) capacity, TLB performance, and out-of-order execution efficiency, while decompression performance reveals more about the branch prediction and instruction latency of the multi-stage pipeline design.&lt;/p&gt;

&lt;p&gt;Evaluation results:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--6g7hIx5K--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/p53aqtg0xl1b1418sfu8.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--6g7hIx5K--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/p53aqtg0xl1b1418sfu8.png" alt="Image description" width="647" height="283"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;em&gt;2.2 LZMA compression evaluation&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;7-Zip evaluation of S5 and SR1&lt;/p&gt;

&lt;p&gt;As you can see, &lt;strong&gt;ARM-based SR1 delivers 60% higher performance than x86-based S5 in LZMA compression and decompression scenarios.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;4. LUKS block device encryption and decryption evaluation&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;LUKS is a specification for block device encryption supported by the Linux kernel. Simply put, it encrypts disks.&lt;/p&gt;

&lt;p&gt;Similar to file compression and decompression, block device encryption and decryption are typical applications that consume a lot of computing resources. Unlike generic computing scenarios, encryption and decryption computing instructions are usually implemented with special hardware to serve as CPU extension sets. The x86 system adopts the AES-NI extension, and ARM differentiates extensions for varied encryption and decryption scenarios.&lt;/p&gt;

&lt;p&gt;There is no need to install any other software. Just use the cryptsetup tool that ships with most Linux distributions to evaluate CPU performance through its built-in encryption and decryption benchmarks:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--2StH2Zz5--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/u47ow4do23377q6qojdg.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--2StH2Zz5--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/u47ow4do23377q6qojdg.png" alt="Image description" width="865" height="41"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;By default, the command evaluates tasks of ciphers and key derivation functions (KDFs).&lt;/p&gt;

&lt;p&gt;Run the following command to evaluate the performance:&lt;/p&gt;
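
&lt;p&gt;A typical invocation is shown below (the flags are illustrative; the screenshot may use different ones):&lt;/p&gt;

```shell
# Benchmark the default set of KDFs (PBKDF2 variants) and ciphers
# in memory; no disk is touched.
cryptsetup benchmark

# Or single out one cipher and key size, e.g. AES-XTS with a 512-bit key:
cryptsetup benchmark --cipher aes-xts-plain64 --key-size 512
```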

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s---9hHP7YN--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/8q3qvea6qbavx961y2k2.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s---9hHP7YN--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/8q3qvea6qbavx961y2k2.png" alt="Image description" width="833" height="427"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;em&gt;2.3 LUKS encryption evaluation (ARM-based SR1/x86-based S5)&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;LUKS evaluation process&lt;/p&gt;

&lt;p&gt;Evaluation results (KDFs):&lt;/p&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--ZDmOgY85--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/38so7xyw39paz2l4wiif.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--ZDmOgY85--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/38so7xyw39paz2l4wiif.png" alt="Image description" width="639" height="281"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;em&gt;2.3 LUKS encryption evaluation&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;LUKS evaluation of S5 and SR1 in terms of KDFs&lt;/p&gt;

&lt;p&gt;Evaluation results (ciphers):&lt;/p&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--4l3ET2PT--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/rs8q4hhdeqvyxrppurp2.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--4l3ET2PT--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/rs8q4hhdeqvyxrppurp2.png" alt="Image description" width="644" height="284"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;em&gt;2.3 LUKS encryption evaluation (ARM-based SR1/x86-based S5)&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;LUKS evaluation of S5 and SR1 in terms of encryption algorithms&lt;/p&gt;

&lt;p&gt;As you can see, &lt;strong&gt;the ARM-based server outperforms its x86-based counterpart in terms of the optimization of common SHA instructions (SHA-256 and SHA-512) and AES-CBC encryption; while in terms of decryption and XTS encryption with the highest security, the x86-based server (AES-NI extension instruction) does a better job.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;5. OpenSSL network encryption and decryption evaluation&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Block device encryption protects data at rest, while network encryption protects data in transit. As OpenSSL is one of the most popular network encryption libraries, an OpenSSL performance evaluation is a must.&lt;/p&gt;

&lt;p&gt;OpenSSL's speed sub-command can evaluate all of its encryption algorithms, which takes a long time, so you generally pass parameters to specify algorithms. Commonly used ones are Hash-based Message Authentication Code (HMAC) for encrypted message integrity and identity verification, the SHA-256 secure hash for message digests and digital signatures, and AES-256, the standard encryption algorithm widely adopted by cloud service providers.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--gwldp-yM--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/rsocg2mfaasp5s8nvgvb.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--gwldp-yM--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/rsocg2mfaasp5s8nvgvb.png" alt="Image description" width="865" height="51"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Run the following command to evaluate the performance:&lt;/p&gt;
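
&lt;p&gt;A typical invocation covers the three algorithms discussed above (the exact flags in the screenshot may differ). The -evp option uses OpenSSL's optimized code path, and -seconds 1 shortens each trial from the 3-second default:&lt;/p&gt;

```shell
openssl speed -seconds 1 hmac             # HMAC(MD5)
openssl speed -seconds 1 -evp sha256      # SHA-256
openssl speed -seconds 1 -evp aes-256-cbc # AES-256
```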

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--UczlFnqK--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/h4fpqv9aswyu8ozpqldm.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--UczlFnqK--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/h4fpqv9aswyu8ozpqldm.png" alt="Image description" width="830" height="389"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;em&gt;2.4 OpenSSL encryption evaluation (ARM-based SR1/x86-based S5)&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;OpenSSL encryption process through speed&lt;/p&gt;

&lt;p&gt;Evaluation results:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--hBtxNyk0--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/ofqqwozzvnemnf6f9wch.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--hBtxNyk0--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/ofqqwozzvnemnf6f9wch.png" alt="Image description" width="622" height="269"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;em&gt;2.4 OpenSSL encryption evaluation&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;OpenSSL encryption results of S5 and SR1&lt;/p&gt;

&lt;p&gt;As you can see, &lt;strong&gt;the ARM-based server slightly lags behind the x86-based server in MD5 HMAC, but it outperforms the latter in SHA-256 and AES-256, especially SHA-256.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;6. Redis database throughput rate evaluation&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Now let's move to Redis performance evaluation. As one of the most popular memory databases, Redis is often used for key-value storage, data cache, and message queue scenarios with a high throughput rate. Redis also has a built-in evaluation utility called redis-benchmark to measure the number of requests per second.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--Zet-2FiR--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/xvyz9amu3sqcv749m64s.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--Zet-2FiR--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/xvyz9amu3sqcv749m64s.png" alt="Image description" width="865" height="77"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;The redis-benchmark program evaluates the throughput rate of a single server during the tests of GET, SET, LPUSH, and other common Redis commands, looking into the CPU and its memory access capabilities (such as memory access bandwidth and performance).&lt;/p&gt;

&lt;p&gt;Run the following command to evaluate the performance:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--MFhoueW6--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/xx3shiaytrbxr9w17npo.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--MFhoueW6--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/xx3shiaytrbxr9w17npo.png" alt="Image description" width="830" height="394"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;em&gt;2.6 Throughput evaluation (ARM-based SR1/x86-based S5)&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;Redis evaluation command execution&lt;/p&gt;

&lt;p&gt;Evaluation results:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--rl4LZfiM--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/q9r5qw9e2su9mchqdukx.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--rl4LZfiM--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/q9r5qw9e2su9mchqdukx.png" alt="Image description" width="659" height="295"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;em&gt;2.6 Throughput evaluation&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;Redis throughput rate evaluation of S5 and SR1&lt;/p&gt;

&lt;p&gt;According to the Redis evaluation results, &lt;strong&gt;ARM-based SR1 has 30% to 40% higher performance on average than x86-based S5.&lt;/strong&gt;&lt;/p&gt;
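&lt;p&gt;For readers who want to reproduce such a comparison, redis-benchmark can emit machine-readable results via its --csv flag. The sketch below parses that output for two servers and computes the per-command gain; the sample numbers are made up for illustration and are not actual S5/SR1 results.&lt;/p&gt;

```python
# Illustrative only: parse redis-benchmark --csv output (one quoted
# "COMMAND","requests_per_sec" pair per line in classic versions) and
# compare two servers. Sample strings below are invented numbers.
import csv
import io

def parse_benchmark_csv(text):
    """Return {command: requests_per_sec} from two-column CSV rows."""
    rates = {}
    for row in csv.reader(io.StringIO(text)):
        if len(row) == 2:
            rates[row[0]] = float(row[1])
    return rates

s5_sample  = '"SET","95238.10"\n"GET","98039.22"\n"LPUSH","93457.94"\n'
sr1_sample = '"SET","128205.13"\n"GET","131578.95"\n"LPUSH","125000.00"\n'

s5 = parse_benchmark_csv(s5_sample)
sr1 = parse_benchmark_csv(sr1_sample)
for cmd in s5:
    gain = (sr1[cmd] / s5[cmd] - 1) * 100
    print(f"{cmd}: {gain:.1f}% higher on SR1")
```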

&lt;p&gt;&lt;strong&gt;7. Conclusion&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Now it's time to get some hands-on experience and see what your own cloud server performance tests reveal.&lt;/p&gt;

&lt;p&gt;ARM-based servers offer more than cost-effectiveness. As ARM-based virtualization technologies become widespread in the cloud, ARM-based servers are bound to gain more momentum in IoT, cloud phones/gaming, the Android ecosystem, and many other use cases.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Let's look forward to more diversified experiences available at our fingertips.&lt;/strong&gt;&lt;/p&gt;

</description>
      <category>arm</category>
      <category>virtualtech</category>
      <category>tutorial</category>
      <category>cloud</category>
    </item>
    <item>
      <title>GME Immersive Voice Solution Empowers Games with Boundless Imagination of Metaverse</title>
      <dc:creator>Man yin Mandy Wong</dc:creator>
      <pubDate>Tue, 25 Oct 2022 03:13:37 +0000</pubDate>
      <link>https://dev.to/tencentcloud/gme-immersive-voice-solution-empowers-games-with-boundless-imagination-of-metaverse-1fjl</link>
      <guid>https://dev.to/tencentcloud/gme-immersive-voice-solution-empowers-games-with-boundless-imagination-of-metaverse-1fjl</guid>
<description>&lt;p&gt;&lt;strong&gt;1.What possibilities can the metaverse bring to games?&lt;/strong&gt;&lt;br&gt;
The trending "metaverse" concept was first coined in an American science fiction novel to refer to a cyberspace parallel to reality. Games are the closest existing form of the metaverse. From the mainstream perspective, metaverse games deliver a real, immersive interactive and social networking experience by allowing players to interact, create, and exchange value freely, supported by diverse and inclusive cultures and content.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;2.What challenges in voice technologies need to be tackled to implement metaverse features in games?&lt;/strong&gt;&lt;br&gt;
Metaverse games have high requirements for the interactive experience, and their voice technologies must tackle four core challenges: sense of direction, immersive experience, cross-platform compatibility, and barrier-free multilingual communication.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;•Sense of Voice Direction&lt;/strong&gt;&lt;br&gt;
In interaction-intensive social gaming, the most important interaction method is game voice. When people are talking in the real world, the voice direction and distance also convey a large amount of information in addition to the volume level and tone. How to enable players to communicate like in the real world and how to convey the directional information in the game voice are top priorities for developers.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;•Immersive Voice Experience&lt;/strong&gt;&lt;br&gt;
In addition to the voice direction and distance, the voice of people in the real world also integrates with the environment. When people are talking, they can perceive effects such as reverb and diffraction of their voice generated in the environment. How to integrate the voice with the environment to maximize the real immersive experience for players is also a major challenge.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;•Cross-Platform Compatibility&lt;/strong&gt;&lt;br&gt;
Players log in to a game from different terminals and devices. How to implement smooth game voice, make the game compatible with tens of thousands of device models available on the market, and enable players on game consoles, mobile devices, and PCs to talk with one another are major challenges for developers.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;•Barrier-Free Communication&lt;/strong&gt;&lt;br&gt;
Metaverse games allow players from different cultures and languages to have fun in an open metaverse and even switch their accents like Millie in Free Guy. Helping players who speak different languages communicate without barriers places higher requirements on games.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;3.GME Empowers Games with Boundless Imagination of Metaverse&lt;/strong&gt;&lt;br&gt;
&lt;strong&gt;GME 3D Position-Based Voice&lt;/strong&gt;&lt;br&gt;
3D voice conveys direction and position information to make the voice more stereophonic. In battle royale and FPS games with ever-changing battle situations, voice-based position identification greatly improves players' communication efficiency in multi-player team battles. In social games such as Werewolf, the sense of voice direction gives players a truly interactive experience and makes even roundtable discussions with strangers more memorable.&lt;/p&gt;

&lt;p&gt;By adopting HRTF and distance-based equalization technologies, GME's unique realistic 3D sound effect faithfully reproduces the positional details of voice and simulates the auditory perception of a sound source at any position in space. This enables players to identify teammates' positions in game battles by their voices and enjoy an immersive gaming experience.&lt;/p&gt;
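&lt;p&gt;To give a feel for distance- and direction-based rendering, here is a minimal toy model. It applies only inverse-distance rolloff and equal-power left/right panning; real HRTF rendering, as used by GME, instead convolves the signal with measured head-related impulse responses. All names and formulas below are illustrative assumptions, not GME's actual algorithm.&lt;/p&gt;

```python
# Toy positional audio: volume falls off with distance, and a simple
# equal-power pan encodes the left/right direction cue.
import math

def positional_gains(listener, source, ref_dist=1.0):
    """Return (left_gain, right_gain) for a source at 2D position (x, y)."""
    dx, dy = source[0] - listener[0], source[1] - listener[1]
    dist = max(math.hypot(dx, dy), ref_dist)
    rolloff = ref_dist / dist                 # inverse-distance volume rolloff
    azimuth = math.atan2(dx, dy)              # 0 = straight ahead, pi/2 = right
    pan = math.sin(azimuth)                   # -1 full left .. +1 full right
    left = rolloff * math.sqrt((1.0 - pan) / 2.0)
    right = rolloff * math.sqrt((1.0 + pan) / 2.0)
    return left, right
```

A source one unit to the listener's right yields a right-only gain, while a source straight ahead yields equal gains in both ears.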

&lt;p&gt;The 3D position-based sound effect is also available for different types of games, including MOBA, FPS, ARPG, Werewolf, space Werewolf, and board games.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Immersive Experience: Real-Time Sound Effect Processing by Wwise + GME&lt;/strong&gt;&lt;br&gt;
How to integrate voice with the game environment has always been a challenge for game audio engineers. In traditional mobile game voice solutions, audio engineers often have to give up carefully crafted background sound effects due to the poor audio quality of players' mics.&lt;/p&gt;

&lt;p&gt;GME has developed a proprietary solution jointly with the industry-leading sound effect engine Wwise, which integrates player voice into the pipeline design of game sound effects, fundamentally solving problems that occur during volume type switching in traditional voice solutions, such as volume level jumps and audio quality degradation.&lt;/p&gt;

&lt;p&gt;Moreover, based on powerful audio processing capabilities and rich sound effect plugins of Wwise, GME can implement sound effects such as reverb, diffraction, and insulation that are perfectly integrated with the game scenes in captured voice chat streams, which not only makes voice gameplay features more diverse, but also makes player communication more immersive.&lt;/p&gt;

&lt;p&gt;In addition to perfect integration with environmental sound effects, the Wwise + GME solution also allows you to customize the processing of each voice stream, leaving you more room for designing diverse voice gameplay features. For example, you can design special sound effects based on players' characters and their status changes in game scenes, such as using a quaver to express pain after being hit by an enemy.&lt;/p&gt;

&lt;p&gt;As the unique global official voice partner of Wwise, GME is fully compatible with it and easy to integrate.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Fun: GME Voice Changing Effects&lt;/strong&gt;&lt;br&gt;
GME also provides a voice-changing feature for voice chat. In game voice interactions, players can freely switch among dozens of sound effects, from a middle-aged man to a little girl or from a cute girl to a nerd, adding personality to their characters and making chat more amusing. In the metaverse, players are no longer constrained by their real-world identity and can switch their tone and personality at any time.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Powerful Cross-Terminal Compatibilities of GME&lt;/strong&gt;&lt;br&gt;
As the only Chinese voice development tool that makes the list of third-party development tools and middleware for Nintendo Switch™, PlayStation®️4, and PlayStation®️5, GME provides SDKs for consoles and is compatible with the latest versions of all console platforms. It features deep optimizations for UE, Unity, Cocos, and other major game engines, supports macOS, Windows, iOS, and Android systems, and is adapted to 20,000+ device models.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Barrier-Free Multilingual Communication&lt;/strong&gt;&lt;br&gt;
GME helps you easily implement multilingual communication scenarios. It can convert voice messages and voice chat streams to text in up to 125 languages, eliminating the language barriers in communication. It returns high-accuracy recognition results at a low latency to help implement barrier-free communication across regions and cultures in games.&lt;/p&gt;

&lt;p&gt;Metaverse is not only a popular concept in the investment and technology fields, but also a long-term vision in the game industry. GME brings a brand new interactive voice experience to game developers and aims to continuously explore more possibilities of metaverse jointly with all industries.&lt;/p&gt;

&lt;p&gt;Read more at: &lt;a href="https://www.tencentcloud.com/dynamic/blogs/sample-article/100373"&gt;https://www.tencentcloud.com/dynamic/blogs/sample-article/100373&lt;/a&gt;&lt;/p&gt;

</description>
      <category>metaverse</category>
      <category>gme</category>
      <category>voicesolution</category>
      <category>3dvoice</category>
    </item>
    <item>
      <title>Application of Media Processing Technology to 4K/8K FHD Video Processing</title>
      <dc:creator>Man yin Mandy Wong</dc:creator>
      <pubDate>Thu, 20 Oct 2022 02:39:46 +0000</pubDate>
      <link>https://dev.to/tencentcloud/application-of-media-processing-technology-to-4k8k-fhd-video-processing-4blh</link>
      <guid>https://dev.to/tencentcloud/application-of-media-processing-technology-to-4k8k-fhd-video-processing-4blh</guid>
<description>&lt;p&gt;Device support for higher video resolutions has driven demand for high-definition content and brought many challenges for 4K/8K videos with their super-high resolutions and bitrates. Today, we'll share some ideas about accelerating media digitalization through media processing capabilities.&lt;/p&gt;

&lt;p&gt;In part 1, we will talk about the features of 4K/8K FHD videos and the problems holding back their wide application. Part 2 details the optimizations we've performed on encoders to make them more adapted to videos with a super high bitrate and resolution. Part 3 focuses on the architecture of the real-time 8K transcoding system for live streaming scenarios. And in the last part, we cover how to leverage media processing capabilities and image quality remastering technology to increase definition so that more FHD videos are available.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--MemSwBm5--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/lfzt66xh605n5l92ulem.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--MemSwBm5--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/lfzt66xh605n5l92ulem.png" alt="Image description" width="880" height="494"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s---svHsp8b--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/6gbwlm1uc7w4qkw2jmog.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s---svHsp8b--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/6gbwlm1uc7w4qkw2jmog.png" alt="Image description" width="880" height="493"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;4K/8K FHD videos feature super-high definition, resolution, and bitrate. The latter two pose new challenges to downstream systems. In a live streaming system, video resolution and bitrate are closely tied to the processing speed and performance consumption during transcoding. To support real-time 8K transcoding, both the encoding kernel and the system architecture need to be redesigned. There are currently many hardware solutions dedicated to real-time 4K/8K encoding, but they suffer from a poor compression rate compared with software encoding: to deliver 4K/8K definition, they require bitrates of dozens or even hundreds of Mbps, posing a huge challenge to the entire transfer link and to playback devices. In addition, AR and VR, which rely heavily on video encoding and transfer, are gaining momentum. As technologies advance, FHD video is an inevitable trend.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--Q5WmoKxv--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/guxkt07a1s5e5esvb3zd.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--Q5WmoKxv--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/guxkt07a1s5e5esvb3zd.png" alt="Image description" width="880" height="445"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;The second part shares some encoding optimizations and the performance delivered by our proprietary encoders. &lt;/p&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--GedUmrkg--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/iphq2e30dbduchyajfs4.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--GedUmrkg--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/iphq2e30dbduchyajfs4.png" alt="Image description" width="880" height="494"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Our team has independently developed encoding kernels for H.264, H.265, AV1, and the latest H.266. Proprietary encoders make it possible to design encoding features for real-world business scenarios and perform targeted optimizations. For example, during the Beijing Winter Olympics, the Tencent Cloud live streaming system sustained real-time 4K/8K encoding and compression and supported up to 120 fps for real-time encoding. To ensure real-time performance, many custom optimizations were made inside the encoder. V265, Tencent's proprietary H.265 encoder, outperforms the open-source X265 in terms of speed and compression rate. At the highest speed level, V265 is significantly faster than X265, delivering quick encoding at a high resolution. V265 also supports 8K/10-bit/HDR encoding. AV1 encoding is much more complicated than H.265 encoding, and for FHD implementations we've made many engineering performance optimizations. Compared with the open-source SVT-AV1, our Top Speed Codec (TSC) delivers 55% performance acceleration and 16.8% compression gain.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--PyQtTLHF--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/1m3s9jw7v9bhjs8a90eh.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--PyQtTLHF--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/1m3s9jw7v9bhjs8a90eh.png" alt="Image description" width="880" height="496"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;To implement quick encoding of FHD videos, we have made a few optimizations. The first is to increase parallelism. The encoding process involves parallelism at the frame and macroblock levels. In real-time encoding at a high resolution, the frame architecture of the video sequence is tuned to increase inter-frame encoding parallelism, and at the macroblock level, tile encoding is supported for better row-level encoding parallelism. The second optimization relates to pre-analysis and post-processing. Encoders always run a lookahead pre-analysis process before subsequent encoding operations, and this lookahead tends to limit the parallelism of the entire pipeline. Therefore, the pre-analysis and post-processing algorithms are simplified to accelerate the process. After these optimizations, the encoder delivers a faster processing speed and a higher level of parallelism.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--tcxcAMx8--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/oxk0nx0st9ocyfo7q97h.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--tcxcAMx8--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/oxk0nx0st9ocyfo7q97h.png" alt="Image description" width="880" height="496"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Part 3 describes system architecture optimization. For live streaming scenarios, encoding kernel optimization alone is not enough to achieve real-time 8K encoding at a good compression rate, which means the architecture of the entire system needs to be adjusted.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--t9EeRk-9--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/8zu21m9c81omzg1ksk0h.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--t9EeRk-9--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/8zu21m9c81omzg1ksk0h.png" alt="Image description" width="880" height="495"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;It is common practice to input the 8K AVS3 video source to a hardware encoder and output multiple bitrate streams for delivery, such as 8K H.265, 4K H.265, 1080p H.264, and 720p H.264. This can achieve the goal, but it has many problems. First, 8K hardware encoders are generally expensive, especially 8K/AV1 ones, which have few options. Second, hardware encoders have a poor compression rate compared with optimized software encoders, as many compression algorithms that are hard to parallelize cannot be implemented as hardware encoding features. Third, hardware encoders often have custom architectures and chips, making them unable to respond quickly to different business scenarios; it's hard for them to meet constantly evolving business requirements. If the same encoding effect can be achieved by software encoding, both the transcoding compression rate and business flexibility can be guaranteed.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--BWuxs51S--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/rhvidqb6k3r0gzdg52tx.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--BWuxs51S--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/rhvidqb6k3r0gzdg52tx.png" alt="Image description" width="880" height="497"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;To solve these problems, many adjustments are made to the architecture of the entire live streaming system. In a general live streaming system, streams are pushed to the upload access gateway, processed, transcoded, and then pushed to CDN for delivery and watching. For 8K video encoding, it's difficult for the current live stream processing linkage with only one server and one transcoding node to implement real-time software encoding. Against this backdrop, we've designed the FHD live stream processing platform.&lt;/p&gt;

&lt;p&gt;In FHD live streaming, a transcoding node performs remuxing instead of transcoding; that is, it splits a pulled source stream into TS segments and sends them as files to the video transcoding cluster. The cluster processes TS segments in parallel, implementing parallel encoding across multiple servers. Compared with the original single-link encoding on one server, this distributed method features pure software control and high flexibility, making both capacity expansion and business upgrades convenient. In addition, costs are reduced: the hybrid deployment of the offline transcoding and live streaming clusters allows resources to be reused across a larger scope of business, increasing resource utilization. There are shortcomings, of course. The latency is higher than in a standard transcoding process: to enable parallel transcoding, remuxing is performed before stream processing, and independent TS segments are generated only after a period of wait time, leading to a higher but acceptable latency. When the downstream services use HLS for live streaming, there is no obvious change in latency.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--SbJ-VXU9--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/en2azl6knuags9u89mr1.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--SbJ-VXU9--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/en2azl6knuags9u89mr1.png" alt="Image description" width="880" height="496"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Live 4K/8K FHD streams are converted by the offline processing cluster into parallel, independent offline transcoding tasks. Top Speed Codec (TSC) capabilities can be used within each offline transcoding node, so that transcoding saves more than 50% of bandwidth at the same subjective quality.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--41F7gIzm--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/2c8ssquvfi1xvus4758h.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--41F7gIzm--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/2c8ssquvfi1xvus4758h.png" alt="Image description" width="880" height="497"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Compared with hardware encoders, the compression rate is improved by more than 70%. That is, through the aforementioned system solution, streaming live 4K/8K FHD videos requires only 30% of the hardware encoding bitrate at the same image quality level; TSC can improve the subjective quality by more than 20% at the same bitrate.&lt;/p&gt;

&lt;p&gt;Inside each independent offline transcoding node, video sources are decoded upon receipt and categorized by scene so that different encoding policies can be applied. Scene detection is then performed, including noise detection and glitch detection, to analyze the noise and glitches in the video sources for subsequent encoding optimization. Before encoding, the detected noise and glitches are removed; after the image quality remastering of the video sources, perceptual encoding analysis is performed, identifying ROI areas in the image such as faces and regions with complicated or simple textures. In areas with complicated textures, artifacts tend to be masked, so the bitrate can be reduced appropriately. In areas with simple textures, which are more sensitive to the human eye, blocking artifacts have a significant impact; here the control analysis of perceptual encoding, or JND capabilities, can be used. Based on the ROI and JND results, the encoder kernel can better assign bitrates to macroblocks during encoding.&lt;/p&gt;
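&lt;p&gt;The ROI/JND-driven bit assignment can be illustrated with a toy QP-offset rule: face areas and smooth, JND-sensitive areas get a lower QP (more bits), while complex textures that mask artifacts get a higher QP. The labels and offset values below are invented for illustration; a production encoder uses far richer models.&lt;/p&gt;

```python
# Toy ROI/JND bit allocation: a negative QP offset means a lower QP,
# i.e. more bits spent on that macroblock.
def qp_offset(block):
    if block.get("face"):
        return -4          # faces are perceptually critical: spend more bits
    if block.get("texture") == "simple":
        return -2          # smooth areas show blocking easily: protect them
    if block.get("texture") == "complex":
        return 3           # busy textures mask artifacts: save bits
    return 0

base_qp = 30
blocks = [{"face": True}, {"texture": "simple"}, {"texture": "complex"}, {}]
qps = [base_qp + qp_offset(b) for b in blocks]   # [26, 28, 33, 30]
```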

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--BKgVXiyQ--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/rbp6cue9vvzy9rcl0rpz.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--BKgVXiyQ--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/rbp6cue9vvzy9rcl0rpz.png" alt="Image description" width="880" height="495"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Currently, many playback devices support 4K, but not all video sources are 4K. With Tencent Cloud’s media processing capabilities, video sources can be upgraded to 4K to deliver a true 4K viewing experience.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--c0Zd8s5A--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/jjcmfuotfb4inmix3v0g.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--c0Zd8s5A--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/jjcmfuotfb4inmix3v0g.png" alt="Image description" width="880" height="268"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;A 4K FHD video is usually generated in the following steps. First, the video source is analyzed for noise, compression, and other distortion. Then, comprehensive repair of the degraded data is performed based on the analysis results, including noise removal, texture enhancement, and noise suppression. It is important to note that if the parts of the image that are more sensitive to the human eye, such as areas containing faces or text, are well processed, the overall viewing experience improves greatly.&lt;/p&gt;

&lt;p&gt;After detail enhancement, color correction is performed. HDR capabilities are widely used in 4K/8K videos, and SDR-to-HDR conversion can be applied to the many video sources without HDR playback effects to deliver a high-resolution, truly vivid 4K effect.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--SRK5q0uE--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/qs8vxaqstg7le0hr7i88.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--SRK5q0uE--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/qs8vxaqstg7le0hr7i88.png" alt="Image description" width="880" height="496"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;During video super-resolution, the ideal effect cannot be achieved with only one model. Specifically, a general model can be used for the background or the entire image, while another model is needed for areas containing faces and text; the two outputs are combined to deliver the final enhancement. As facial features are relatively fixed and provide strong prior information for super-resolution, dedicated enhancement of face areas significantly improves the viewing experience.&lt;/p&gt;
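&lt;p&gt;The two-model combination described above amounts to a per-region blend: inside a detected face/text mask, use the dedicated model's output; elsewhere, use the general model's. The sketch below uses trivial placeholder "models" on nested lists to show only the blending step; a real pipeline would run neural networks, and the mask would come from face/text detection.&lt;/p&gt;

```python
# Placeholder per-pixel "models" standing in for the general SR network
# and the dedicated face/text SR network.
def general_px(p):
    return p * 1.1   # stand-in for the background/global model

def face_px(p):
    return p * 1.5   # stand-in for the dedicated face/text model

def fuse(img, face_mask):
    """Blend the two model outputs: face-model pixels inside the mask."""
    return [[face_px(p) if m else general_px(p)
             for p, m in zip(row, mask_row)]
            for row, mask_row in zip(img, face_mask)]

img = [[1.0, 2.0], [3.0, 4.0]]
mask = [[True, False], [False, True]]
out = fuse(img, mask)
```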

</description>
      <category>fhd</category>
      <category>encoding</category>
      <category>livestreaming</category>
      <category>superresolutio</category>
    </item>
    <item>
      <title>Next-Gen Media SDK Solution Design (TRTC)</title>
      <dc:creator>Man yin Mandy Wong</dc:creator>
      <pubDate>Tue, 18 Oct 2022 06:09:16 +0000</pubDate>
      <link>https://dev.to/tencentcloud/next-gen-media-sdk-solution-design-trtc-i4e</link>
      <guid>https://dev.to/tencentcloud/next-gen-media-sdk-solution-design-trtc-i4e</guid>
      <description>&lt;p&gt;&lt;strong&gt;1. Immersive Convergence&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;1.1 Higher definition&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s---nJqKQG0--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/hc3f0b2c1nam9llj6er4.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s---nJqKQG0--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/hc3f0b2c1nam9llj6er4.png" alt="Image description" width="880" height="495"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;According to statistics from Tencent Cloud, the average bitrate of internet streaming media played on PCs, tablets, mobile phones, and other terminals has been increasing since H1 2018. As demand for higher definition grows and bitrates rise, compression rates have also improved, driven by the evolution from H.264 and H.265 to the recent H.266, which incorporates over 100 technical proposals and delivers an approximately 50% higher compression rate than H.265.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;1.2 Stronger immersiveness&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--rtftHwdG--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/kuv3gnwfqqmofjhjazin.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--rtftHwdG--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/kuv3gnwfqqmofjhjazin.png" alt="Image description" width="880" height="495"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Advances have been made in the immersive experience of many applications, such as 3D guides, 3D modeling, AR/VR games, and multi-angle sports viewing.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;1.3 Enhanced interaction&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--lkmsqfPS--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/9fm3xky5pidav7pr70fk.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--lkmsqfPS--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/9fm3xky5pidav7pr70fk.png" alt="Image description" width="880" height="495"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Real-time interaction has grown stronger. For example, face point cloud data collected on a mobile phone can be processed in the cloud and sent back to audience members' devices.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;1.4 Lower latency&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--xlz2PArq--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/pnmq94ra9wo7i04n857d.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--xlz2PArq--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/pnmq94ra9wo7i04n857d.png" alt="Image description" width="880" height="496"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Latency has seen the greatest improvement. A few years ago, latency on webpages was measured in seconds; now it is measured in milliseconds, low enough for users to sing duets together in live rooms.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;1.5 Four elements of the all-true internet&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--QyAKZFOh--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/ja2s05z88bf1221bgl11.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--QyAKZFOh--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/ja2s05z88bf1221bgl11.png" alt="Image description" width="880" height="496"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;The all-true internet features higher definition, stronger immersiveness, enhanced interaction, and lower latency. But achieving this entails challenges and unavoidable difficulties both in the cloud and on the terminal.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;2. Technical Challenges&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Let's take a look at the challenges and how to overcome them.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;2.1 Challenge 1: RT-Cube™ architecture design&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--8twtaRdL--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/2l53as4w6ivtk7g13k97.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--8twtaRdL--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/2l53as4w6ivtk7g13k97.png" alt="Image description" width="880" height="496"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;It's hard to coordinate internal modules no matter what you are working on, from an operating system to something smaller like an SDK. An SDK has many modules. The image shows a simplified version of the SDK module architecture, but you can still imagine the large number of modules that are actually involved. The bottom-left corner shows audio/video engine modules, the bottom-right corner TIM modules, and the top TUI components. When multiple modules are working together, they tend to scramble for CPU resources and encounter other conflicts.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--tgeQiN4G--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/67bjhy5ib3t8c8jeo53x.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--tgeQiN4G--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/67bjhy5ib3t8c8jeo53x.png" alt="Image description" width="880" height="495"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;The image above depicts the architecture design of the audio/video engine in RT-Cube™, which consists of many core modules with their respective submodules. Between those modules there is a great deal of data communication and control logic. When the system runs stably, everything works in unison. However, if the CPU frequency drops or memory runs short, competition between modules can quickly crash the entire system. Therefore, a central control module is adopted to monitor and coordinate the modules in real time, taking intervention measures when necessary to keep them in step and prevent an avalanche.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;2.2 Challenge 2: RT-Cube™ version management&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;The second challenge relates to versioning. Although we offer many features, not all of them are needed by each customer. When they are packaged into different combinations, we need to manage a larger number of versions.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--frtNy2bu--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/fqnlb5g97ger2hr4dpi1.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--frtNy2bu--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/fqnlb5g97ger2hr4dpi1.png" alt="Image description" width="880" height="495"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;If an SDK offers nine features, there are 510 possible combinations, which translates into 510 * 4 = 2,040 versions in total on four platforms.&lt;/p&gt;

&lt;p&gt;Traditional build tools such as Xcode and Android Studio are no longer sufficient on their own. A new compilation platform is needed to output SDKs for the different target platforms and allow features to be freely combined across versions.&lt;/p&gt;
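
&lt;p&gt;The combinatorics behind these figures can be checked in a few lines. Reading 510 as 2^9 minus two trivial cases is our interpretation of the article's count, not something it states explicitly:&lt;/p&gt;

```python
# Combinatorics of SDK feature packaging: with F optional features,
# every subset of features is a potential build, so the count grows as 2**F.
FEATURES = 9
PLATFORMS = 4

# 2**9 = 512 subsets; the article's figure of 510 excludes two trivial
# cases (our reading: the empty set plus one other baseline combination).
combinations = 2 ** FEATURES - 2
versions = combinations * PLATFORMS

print(combinations)  # 510
print(versions)      # 2040
```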

&lt;p&gt;&lt;strong&gt;2.3 Challenge 3: RT-Cube™ quality monitoring&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--CsVYAjhc--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/k2ddvbd7ga8k4ozxcy0c.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--CsVYAjhc--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/k2ddvbd7ga8k4ozxcy0c.png" alt="Image description" width="880" height="495"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;The third challenge is quality monitoring. Imagine six users watching a live stream or attending a video conference. Over a period of 20 minutes, one of them experiences 10 seconds of lag while the others experience none. Measured as lagged time over total watch time, the lag rate is 0.13%, a figure that completely hides that user's poor experience. Measured instead as the percentage of users who experienced any lag, the value is 16.7%. Poor-experience data should therefore be the focus of monitoring and product performance work. With the reporting infrastructure kept unchanged, data packets covering lag, size, blur, acoustic echo, and other quality signals are reported every day, and the metrics are refined around individual users so that poor experiences are surfaced rather than averaged away. The results are then used to determine how many users are affected, whether the figure is rising or falling, and why. That's how we find the way to improve.&lt;/p&gt;
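
&lt;p&gt;The contrast between the two lag metrics comes down to the choice of denominator, as a quick sketch shows:&lt;/p&gt;

```python
# Two lag metrics for the same incident: 6 viewers, 20 minutes,
# one viewer hits 10 seconds of lag.
viewers = 6
session_seconds = 20 * 60
lag_seconds = 10

# Time-based lag rate: lagged time over total watch time across all viewers.
time_based = lag_seconds / (viewers * session_seconds)

# User-based lag rate: share of viewers who experienced any lag at all.
user_based = 1 / viewers

print(f"{time_based:.2%}")  # roughly 0.14%, reported as 0.13% in the text
print(f"{user_based:.1%}")  # 16.7%
```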

&lt;p&gt;&lt;strong&gt;2.4 Challenge 4: Module communication efficiency&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--A4nVAXF3--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/wlrakp7ihb48dnrur0e6.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--A4nVAXF3--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/wlrakp7ihb48dnrur0e6.png" alt="Image description" width="880" height="496"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;The fourth challenge is the efficiency of communication between modules.&lt;/p&gt;

&lt;p&gt;This problem is common in games. Many enterprises unify their backend systems using SDP standards and microservice frameworks, but the terminal side cannot be normalized simply by writing C++: iOS texture formats, Android texture formats, and Windows D3D are all processed differently. Where C++ is used, all of this data travels through binary buffers, and a great deal of unification work has been done to ensure consistent data performance across platforms.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;3. Optimization and Improvement&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Having discussed challenges and solutions, we move on to the optimizations and improvements that have been made in half a year to one year after the completion of the infrastructure upgrade.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;3.1 Improvement 1: Audio module optimization&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;3.1.1 Feature&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--6FXvB0FF--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/qoegvwflwz84bvkoyucp.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--6FXvB0FF--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/qoegvwflwz84bvkoyucp.png" alt="Image description" width="880" height="496"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;With the upgraded architecture, audio/video modules on the new version support many new capabilities, such as full-band audio, 3D audio effect, noise reduction based on deep learning and AI, and source and channel resistance. These capabilities enable many more challenging real-time interaction scenarios, for example, live duets which are highly sensitive to audio/video communication latency. In live music scenarios, music modes are optimized to restore signals as much as possible and achieve the highest possible resolution. In addition, a number of big data analysis means are leveraged to perform targeted monitoring and real-time analysis of sound problems, constantly reducing the failure rate and complaint rate by improving the audio quality.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;3.1.2 Use&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--wglaILZC--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/lcmpnxrjwp2eojd0wgn4.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--wglaILZC--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/lcmpnxrjwp2eojd0wgn4.png" alt="Image description" width="880" height="494"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Audio modes are more diversified to make the product user-friendly. The speech mode is for conference communication, the default mode applies to most scenarios and can be enabled if you are not sure which mode is better, and the music mode is available for music listening. All the parameters can be customized.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;3.2 Improvement 2: Video module optimization - effect&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--STA8lUi5--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/pgxwxdjz54rrfhjhd49t.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--STA8lUi5--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/pgxwxdjz54rrfhjhd49t.png" alt="Image description" width="880" height="497"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;The video module is improved on the whole. Specifically, algorithms are improved for BT.601 and BT.709 color spaces, and BT.2020 and other HDR color spaces are supported. This makes images brighter. Targeted optimizations are also made to enhance the SDK definition without compromising the bitrate.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;3.3 Improvement 3: Network module optimization&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;3.3.1 Architecture&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--dABaJjch--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/1xhg0mx227ar1l5iwxvs.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--dABaJjch--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/1xhg0mx227ar1l5iwxvs.png" alt="Image description" width="880" height="496"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Last but not least is the network module with our core technology used to implement stream control and overall reconstruction. As shown above, the cloud and terminal are integrated into a system with coordinated modules. Several data-driven optimizations are performed on the central control module.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;3.3.2 Stream push&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--I6nZn7gQ--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/amkr8affu88ztzgowbp0.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--I6nZn7gQ--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/amkr8affu88ztzgowbp0.png" alt="Image description" width="880" height="497"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;This is a more detailed part of the network module, covering two scenarios: live streaming and communication. For live streaming, the upstream algorithm mainly ensures definition and smoothness. For RTC communication, such as Tencent Meeting or VooV Meeting, the focus is on low latency and smoothness, eliminating delay and lag.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;3.3.3 Playback&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--InBVL8Ow--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/4c4dg24ndxebaj47xdg5.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--InBVL8Ow--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/4c4dg24ndxebaj47xdg5.png" alt="Image description" width="880" height="494"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Tencent Cloud delivers industry-leading playback performance in live streaming scenarios. It operates a competitive CDN and has been constantly expanding into new scenarios, such as LEB. Beyond standard browsers, LEB can use the SDK to deliver more formats and better effects at a latency of about one second, far outperforming browsers in demanding scenarios. In chat scenarios that require even lower latency and stronger interaction, mic on/off transitions can be made smoother.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;3.4 Improvement 4: TUI component library&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--vH34CMRB--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/s020alrprdqexl1ib7u1.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--vH34CMRB--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/s020alrprdqexl1ib7u1.png" alt="Image description" width="880" height="495"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;The TUI component library has also been upgraded and expanded. Instead of wrestling with the hundreds of APIs of professional PaaS components and settling for an unsatisfactory final product, you can import the TUI library for each platform in a few minutes and with a few lines of code. You can then build a proper UI similar to those shown above within hours, even if you have never tried before.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;4. Summary&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--P-J8jS8T--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/afs7xc7hcvs7en71gxtz.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--P-J8jS8T--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/afs7xc7hcvs7en71gxtz.png" alt="Image description" width="880" height="496"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;We've talked about the systematic design of component integration, where one plus one equals more than two.&lt;/p&gt;

&lt;p&gt;In the cloud, we've successfully integrated three networks, that is, TRTC network, IM network, and CDN network.&lt;/p&gt;

&lt;p&gt;On the terminal, existing features are continuously optimized in terms of stability and performance. For example, the squeeze theorem is applied in more scenarios and big data analysis cases to make the RTC SDK a leader in the industry in every respect. In addition, the LEB SDK and IM SDK with a new kernel will be integrated into the system to contribute to a powerful RT-Cube™ Media SDK architecture.&lt;/p&gt;

&lt;p&gt;Thanks to the TUI component library with ready-to-use UI output, a strong and easy-to-use PaaS system is in place to offer more basic capability components for the all-true internet.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--y8BAfvLO--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/lzm72gn13yi74odm2qpp.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--y8BAfvLO--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/lzm72gn13yi74odm2qpp.png" alt="Image description" width="880" height="496"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;The RT-Cube Media SDK can be downloaded from the website as shown above. Currently, common versions are available, and custom capabilities will be online as the compilation system becomes more robust. You can freely combine different features to get the desired version.&lt;/p&gt;

</description>
      <category>sdk</category>
      <category>trtc</category>
      <category>rtcube</category>
      <category>cloud</category>
    </item>
    <item>
      <title>GME 3D Voice Technology: High-Precision HRTF + Distance Attenuation Model</title>
      <dc:creator>Man yin Mandy Wong</dc:creator>
      <pubDate>Thu, 13 Oct 2022 02:45:33 +0000</pubDate>
      <link>https://dev.to/tencentcloud/gme-3d-voice-technology-high-precision-hrtf-distance-attenuation-model-oa4</link>
      <guid>https://dev.to/tencentcloud/gme-3d-voice-technology-high-precision-hrtf-distance-attenuation-model-oa4</guid>
      <description>&lt;p&gt;3D voice provides more auditory information for players to help them identify the positions of their teammates/enemies through voice and feel their presence much like in the physical world. This makes the gaming experience more convenient and fun.&lt;/p&gt;

&lt;p&gt;Many game developers may ask: How does 3D voice work? How do I add it to my games? Below is a quick guide to 3D voice technology.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;1. How do we determine sound source positions?&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;We can determine the position of a sound source mainly because the sound reaches the left and right ears at different times, and the strengths and other metrics are different, too. Specifically, we identify the horizontal position based on the differences in time, sound level, and timbre between binaural signals. The auricle acts as a comb filter to help identify the vertical position of a compound sound source. Sound localization also depends on such factors as sound level, spectrum, and personal experience.&lt;/p&gt;
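
&lt;p&gt;The horizontal cue based on arrival-time differences can be illustrated with the classic Woodworth spherical-head approximation. This is a textbook formula for the interaural time difference, not the model GME actually uses:&lt;/p&gt;

```python
import math

# Interaural time difference (ITD) for a distant source, using the classic
# Woodworth spherical-head approximation: itd = (a / c) * (theta + sin(theta)).
HEAD_RADIUS = 0.0875    # meters, an average human head
SPEED_OF_SOUND = 343.0  # m/s in air

def itd_seconds(azimuth_deg):
    """Time by which sound reaches the far ear later, for a source at the
    given azimuth (0 = straight ahead, 90 = directly to one side)."""
    theta = math.radians(azimuth_deg)
    return (HEAD_RADIUS / SPEED_OF_SOUND) * (theta + math.sin(theta))

# The difference grows from zero straight ahead to several hundred
# microseconds at the side, which is the cue the brain exploits.
for az in (0, 30, 60, 90):
    print(az, round(itd_seconds(az) * 1e6), "microseconds")
```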

&lt;p&gt;&lt;strong&gt;2. How are the voice positions of players simulated? How does Tencent Cloud GME work?&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;A head-related transfer function (HRTF) is needed to do so. It can be regarded as a comprehensive filtering process where sound signals travel from the sound source to both ears. The process includes air filtering, reverb in the ambient environment, scattering and reflection on the human body (such as torso, head, and auricle), etc.&lt;/p&gt;

&lt;p&gt;The implementation of the real-time 3D virtualization feature for voice is not merely about calling the HRTF. It also entails mapping the virtual space in the game to the real-life environment and performing high-frequency operations. The implementation process is summarized as follows. Assume there are N players connecting to the mic in a game. Given the high requirements for real-timeness in gaming, each player's terminal should receive at least (N-1) packets containing voice information and relative position information within a unit time of 20 ms in order to ensure a smooth gaming experience. Based on the relative position information, the high-precision HRTF model in the 3D audio algorithm is used to process the voice information, coupled with the information about the presence of obstacles in the way, ambient sounds in the game (such as the sound of running water and echo in a room), etc. In this way, realistic real-time 3D sound is rendered on the players' devices.&lt;/p&gt;
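
&lt;p&gt;The packet-rate implication of the 20 ms budget above is easy to estimate. The numbers below are back-of-envelope illustrations, not official GME capacity figures:&lt;/p&gt;

```python
# With N players on mic, each terminal must receive at least (N - 1)
# voice-plus-position packets every 20 ms to keep 3D rendering smooth.
def packets_per_second(n_players, interval_ms=20):
    packets_per_interval = n_players - 1
    intervals_per_second = 1000 // interval_ms
    return packets_per_interval * intervals_per_second

# A 4-player squad vs. a 10-player voice room:
print(packets_per_second(4))   # 150 packets/s per terminal
print(packets_per_second(10))  # 450 packets/s per terminal
```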

&lt;p&gt;The entire process is compute-intensive, and some low/mid-end devices may be unable to handle it. How to minimize resource usage on the players' devices while ensuring a smooth gaming experience remains an industry challenge. In addition, some HRTF libraries can result in serious attenuation for some frequencies in audio signals, most notably the musical instrument sounds with diverse frequency components. This not only affects the accuracy of sound localization but also dulls the instrument sounds in the output ambient sounds.&lt;/p&gt;

&lt;p&gt;Tencent Cloud Game Multimedia Engine (GME) launched the 3D voice feature in partnership with Tencent Ethereal Audio Lab, a top-notch audio technology team. Through the high-precision HRTF model and the distance attenuation model, the feature gives players a highly immersive gaming experience in the virtual world. Thanks to optimized terminal rendering algorithms, the computing efficiency increases by nearly 50%, and the real-time spatial rendering time of a single sound source is around 0.5 ms, so that most low/mid-end devices can sustain real-time 3D sound rendering. To address the problem of signal attenuation in the rendering process, GME improves the 3D rendering effect through its proprietary audio signal equalization techniques, making ambient sounds crystal clear.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;3. How do we integrate 3D voice?&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;There are two 3D voice integration methods available. You can choose a suitable method based on the characteristics of your game.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Method 1:&lt;/strong&gt; For non-VR games&lt;/p&gt;

&lt;p&gt;How it works:&lt;/p&gt;

&lt;p&gt;As the implementation of 3D voice requires calculations based on the positions and distances of sound sources, position coordinates are needed as key data in order to achieve 3D sound effects. Based on the coordinates, we can identify the position in the virtual space, calculate the distance from the sound source, and get the position information.&lt;/p&gt;

&lt;p&gt;GME has streamlined the overall integration process. You only need to transfer the local coordinate information and position information to GME through the API. Then, GME will aggregate the data and calculate the coordinate information and position information of everyone in the room to get the 3D voice information.&lt;/p&gt;

&lt;p&gt;Now we already have the position information of each speaker in the room in the virtual world. In order to achieve a 3D sound effect, 3D sound needs to be created. The position information, together with the audio streams, reaches the voice-receiving client. Without position information, the sound would be played back without any sound effect, just like in a common phone call or conference call. By contrast, with position information and GME's local 3D voice model engine, a 3D sound effect can be achieved.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Integration steps:&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Prerequisites:&lt;/p&gt;

&lt;p&gt;The "EnterRoom" API has been called, and the result in the room entry callback is successful room entry.&lt;/p&gt;

&lt;p&gt;On the premise of successful connection to the voice chat service, you can integrate 3D voice as instructed below:&lt;/p&gt;

&lt;p&gt;Call "InitSpatializer" to initialize the 3D sound effect engine.&lt;/p&gt;

&lt;p&gt;Call "EnableSpatializer" to enable 3D voice.&lt;/p&gt;

&lt;p&gt;Call "UpdateAudioRecvRange" to set the attenuation range.&lt;/p&gt;

&lt;p&gt;Call "UpdateSelfPosition" to update the position information in real time.&lt;/p&gt;

&lt;p&gt;Integration Guide: &lt;a href="https://cloud.tencent.com/document/product/607/18218"&gt;https://cloud.tencent.com/document/product/607/18218&lt;/a&gt;&lt;/p&gt;
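
&lt;p&gt;Put together, the call sequence for non-VR games looks roughly like the sketch below. The method names follow the steps above, but the "GmeClient" wrapper and its argument shapes are hypothetical Python stand-ins, not real SDK bindings; consult the integration guide for the actual per-platform signatures.&lt;/p&gt;

```python
# Illustrative sketch of the 3D-voice call sequence for non-VR games.
# GmeClient is a hypothetical stand-in for the real GME SDK binding;
# only the call order and method names follow the documented steps.
class GmeClient:
    def EnterRoom(self, room_id):
        print(f"entered room {room_id}")
    def InitSpatializer(self):
        print("3D sound effect engine initialized")
    def EnableSpatializer(self, enable):
        print(f"3D voice enabled: {enable}")
    def UpdateAudioRecvRange(self, range_units):
        print(f"attenuation range set to {range_units}")
    def UpdateSelfPosition(self, position, forward, up):
        print(f"self position updated to {position}")

gme = GmeClient()
gme.EnterRoom("room_42")       # prerequisite: wait for the success callback
gme.InitSpatializer()          # step 1: initialize the 3D sound effect engine
gme.EnableSpatializer(True)    # step 2: enable 3D voice
gme.UpdateAudioRecvRange(300)  # step 3: set the attenuation range
# step 4: call periodically (e.g. each frame) with the player's coordinates
gme.UpdateSelfPosition((10, 0, 5), (1, 0, 0), (0, 1, 0))
```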

&lt;p&gt;&lt;strong&gt;Method 2:&lt;/strong&gt; For VR games&lt;/p&gt;

&lt;p&gt;There is a dedicated integration method for VR games. As we have noticed, VR device users have high requirements for the refresh rate, sound responsiveness, and spatial perception of sound. In VR gaming scenarios that emphasize real-time interactions and deep immersion, a premium low-latency 3D voice experience is of paramount importance. However, the traditional RTC voice call and 3D voice solutions in the market fall short of players' expectations of accuracy, real-timeness, etc.&lt;/p&gt;

&lt;p&gt;How it works:&lt;/p&gt;

&lt;p&gt;We have further optimized the 3D voice feature for the GME SDK 2.9.2. You can directly call the 3D audio model to pass in the 3D position information in real time and therefore achieve a real-time 3D sound effect.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Integration steps:&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Prerequisites:&lt;/p&gt;

&lt;p&gt;The "EnterRoom" API has been called, and the result in the room entry callback is successful room entry.&lt;/p&gt;

&lt;p&gt;On the premise of successful connection to the voice chat service, you can integrate 3D voice as instructed below:&lt;/p&gt;

&lt;p&gt;Call "InitSpatializer" to initialize the 3D sound effect engine.&lt;/p&gt;

&lt;p&gt;Call "EnableSpatializer" to enable 3D voice.&lt;/p&gt;

&lt;p&gt;Call "UpdateAudioRecvRange" to set the attenuation range.&lt;/p&gt;

&lt;p&gt;Call "UpdateSelfPosition" to update the position information in real time.&lt;/p&gt;

&lt;p&gt;Call "UpdateOtherPosition" to update in real time the position information of others in the room (which can be obtained at the business layer).&lt;/p&gt;

&lt;p&gt;Read more at: &lt;a href="https://www.tencentcloud.com/dynamic/blogs/sample-article/100365"&gt;https://www.tencentcloud.com/dynamic/blogs/sample-article/100365&lt;/a&gt;&lt;/p&gt;

</description>
      <category>3dvoice</category>
      <category>tutorial</category>
      <category>hrtf</category>
      <category>gme</category>
    </item>
    <item>
      <title>A Brief History of Game Voice</title>
      <dc:creator>Man yin Mandy Wong</dc:creator>
      <pubDate>Tue, 11 Oct 2022 03:42:32 +0000</pubDate>
      <link>https://dev.to/tencentcloud/a-brief-history-of-game-voice-2kh8</link>
      <guid>https://dev.to/tencentcloud/a-brief-history-of-game-voice-2kh8</guid>
      <description>&lt;p&gt;&lt;strong&gt;1.Background&lt;/strong&gt;&lt;br&gt;
Game voice tools have evolved with the development of the internet. The last 20+ years have witnessed huge leaps in game voice technology, from support for a single platform to cross-platform interoperability, from one-to-one chat to interactive voice chat in a room with tens of thousands of online users, from third-party voice communication SaaS tools to PaaS SDKs, and from monotonous voice chat to immersive voice experiences.&lt;br&gt;
Game voice technology has gone through several stages, starting from the most basic voice chat to immersive voice experiences and beyond. As breakthroughs in sensors, computing power, audio algorithms, IoT, and other technologies are on the horizon, all-real voice will eventually become a reality, delivering the ultimate voice experience the metaverse demands.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;2.Game voice v1.0: Third-party voice chat tools&lt;/strong&gt;&lt;br&gt;
At this stage, players use third-party voice chat tools to communicate with each other in the process of gaming. Whether the game itself offers a voice communication feature or not, using third-party tools allows players to quickly create chat channels and communicate with each other through voice chat.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;3.Game voice v2.0: In-game voice&lt;/strong&gt;&lt;br&gt;
In-game voice solutions mainly take the form of game developers connecting SDKs developed by voice communication PaaS providers. The basic APIs that come with the SDKs are used to implement various in-game voice scenarios, such as channel voice between teammates (teammates can have a voice chat at any position coordinates in the game), range voice between different teams (players of different teams can hear each other only when their position coordinates in the game are within a specified range), as well as blocklist/allowlist.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;4.Game voice v2.5: Upgraded version of in-game voice&lt;/strong&gt;&lt;br&gt;
To further improve players' game voice experiences, voice SDKs like GME offer voice processing capabilities such as voice changing and virtual 3D sound field. With these features, players can change their voice in real time based on their selected voice type, which adds fun to gaming and allows a vast design space for game voice features.&lt;br&gt;
Through 3D virtualization technology, voice processing and gaming scenarios are combined, but the combination is limited to position and distance information in gaming scenarios. For a truly immersive experience, voice processing should cover all aspects of gaming scenarios. A voice SDK is unlikely to provide a dedicated API for every potential factor; otherwise, the SDK would become extremely complicated and bulky, and that's not really necessary. To take game voice experiences up a notch, we need a new solution, namely the immersive game voice solution.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;5.Game voice v3.0: Immersive voice&lt;/strong&gt;&lt;br&gt;
An immersive voice solution means that players' voice effects are rendered in real time based entirely on the game process. All players' voices are processed through digital signal processing (DSP) algorithms, and then played back in the headphones to simulate voice communication in real-world settings. Voice chat processed in this way can deliver a more immersive game voice experience, allowing players to communicate in a natural way.&lt;br&gt;
Then, how is an immersive voice solution implemented? As mentioned above, it is not advisable to have a single voice SDK packed with all sorts of APIs. Moreover, voice service providers are generally not experts in audio processing algorithms compared with specialist audio technology companies. Therefore, to develop an all-encompassing voice SDK is virtually unviable.&lt;br&gt;
In view of this, a combination approach will work best, just as with the Wwise + GME solution. Tencent Cloud Game Multimedia Engine (GME) is dedicated to end-to-end real-time voice communication, and the Wwise interactive audio engine is adopted by many game developers as a tool for game sound design. The Wwise plugin acts as a bridge for data interactions between GME and the Wwise engine, and GME voice streams are seamlessly connected to the Wwise audio pipeline, so Wwise's rich sound effects processing and control features can be used in voice chat. Such a design makes it possible to deliver an immersive game voice experience.&lt;br&gt;
As an interactive audio authoring tool, Wwise is generally used to create high-quality audio content for games, and GME complements Wwise in the field of game voice. Now sound engineers can also use Wwise to create immersive and interesting voice features, opening up new gameplay possibilities.&lt;br&gt;
Immersive voice, however, is not the acme of game voice experiences: all-real voice takes it further.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;6.Game voice v4.0: All-real voice&lt;/strong&gt;&lt;br&gt;
With the advances in AR, VR, and MR technologies, the metaverse has become a hot topic. Many technology giants are expanding into the metaverse, widely considered the internet's biggest opportunity of the coming decade. The metaverse refers to a parallel virtual world that is both independent of and interconnected with the real world, where people can realistically interact, work, and do much more.&lt;br&gt;
To make virtual worlds more lifelike, software and hardware technologies need to be integrated to simulate human senses. As voice communication is an important form of human interaction, metaverse scenarios have higher requirements for voice, that is, all-real voice. Currently, the metaverse is still more of a concept than reality, and we'll see what the future holds.&lt;br&gt;
Gaming is inherently a social activity in the internet age. Although voice chat is not a core feature for most game genres, it makes gaming more enjoyable and thus increases player retention. Therefore, it has become a common feature of online games.&lt;br&gt;
Game voice technology has evolved in response to players' growing demand for better experiences and gameplay. The development of game voice technology can be divided into four stages based on the improvements in game voice experiences. As players have higher expectations of gaming experiences, voice is bound to hold greater weight in gaming.&lt;/p&gt;

&lt;p&gt;Read more at: &lt;a href="https://www.tencentcloud.com/dynamic/blogs/sample-article/100361"&gt;https://www.tencentcloud.com/dynamic/blogs/sample-article/100361&lt;/a&gt;&lt;/p&gt;

</description>
      <category>gme</category>
      <category>gamevoice</category>
      <category>3dvoice</category>
      <category>voicechanging</category>
    </item>
    <item>
      <title>Low-Latency Live Streaming Upgraded Based on WebRTC (CSS)</title>
      <dc:creator>Man yin Mandy Wong</dc:creator>
      <pubDate>Fri, 07 Oct 2022 03:22:14 +0000</pubDate>
      <link>https://dev.to/tencentcloud/low-latency-live-streaming-upgraded-based-on-webrtc-css-53bl</link>
      <guid>https://dev.to/tencentcloud/low-latency-live-streaming-upgraded-based-on-webrtc-css-53bl</guid>
      <description>&lt;p&gt;&lt;strong&gt;1.Live Event Broadcasting (LEB) Overview&lt;/strong&gt;&lt;br&gt;
The fast advancement of the live streaming industry has given rise to a wealth of low-latency live streaming scenarios, such as live shopping and online education. For these use cases, the key requirement is real-time audio/video interaction, which cannot be well supported by traditional HLS- and FLV/RTMP-based live streaming technologies with their latency of several seconds. Therefore, Live Event Broadcasting (LEB) adopts WebRTC to implement a live streaming product solution with a latency of milliseconds. Besides live shopping and online education, the product also meets the need for low-latency real-time interaction in scenarios such as sports and game live streaming.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;2.WebRTC-Based LEB Scheme&lt;/strong&gt;&lt;br&gt;
After years of development, live streaming has settled into a standardized pipeline: streamers use PCs or mobile phones to capture and encode audio/video on the client and push streams over RTMP to the cloud platform. The audio/video data is then transcoded and delivered to users' devices via the FLV and HLS protocols over the CDN network. Across this pipeline, the latency comes mainly from RTMP stream push, CDN transfer, caching on the device, and playback after decoding. The traditional RTMP/FLV/HLS methods are based on TCP, which means that data tends to build up under poor network connections. What's more, to defend against TCP network fluctuations, device players usually need to cache one to two GOPs to ensure smooth playback.&lt;br&gt;
WebRTC is based on the RTP/RTCP protocols and leverages an excellent congestion control algorithm to ensure low latency and high performance under poor network connections in the real-time audio/video field. Building on WebRTC, LEB reconstructs stream pulling in LVB to implement highly compatible, cost-effective, and large-capacity low-latency live streaming. The system reuses the cloud data processing capabilities of the original live streaming architecture and makes both the live streaming access side and the CDN edge WebRTC-based, so that the former can receive WebRTC streams and the latter can support WebRTC negotiation and remuxing distribution in addition to the original FLV/HLS distribution capabilities. In this way, LEB is not only compatible with LVB's cloud media processing features such as stream push, transcoding, recording, screen capturing, and porn detection, but also retains the strong edge distribution capabilities of the traditional CDN network, which can support millions of concurrent online users. You can smoothly migrate your business from the existing LVB platform to LEB to implement low-latency live streaming applications.&lt;br&gt;
LEB also uses WebRTC to achieve low latency across platforms. Most mainstream browsers, including Chrome and Safari, now support WebRTC, so we can offer standard WebRTC capabilities via browsers. In addition, the mature, open-source WebRTC codebase is easy to optimize and customize, allowing us to build an SDK with enhanced low-latency streaming features.&lt;/p&gt;
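&lt;p&gt;The latency contributions of the traditional TCP-based pipeline can be put into a back-of-envelope formula. The stage values below are illustrative assumptions, not measured figures:&lt;/p&gt;

```python
def estimate_playback_latency_s(gop_s, cached_gops,
                                push_s=0.5, cdn_s=0.5, decode_s=0.1):
    """Rough end-to-end latency for TCP-based FLV/RTMP delivery.

    The dominant term is the player-side cache of one to two GOPs used to
    ride out TCP jitter; the other stage durations are assumed placeholders.
    """
    return push_s + cdn_s + gop_s * cached_gops + decode_s

# With a typical 2 s GOP cached twice, buffering alone dominates the total.
```

This is why shaving the player-side GOP cache (as WebRTC's jitter buffer does) matters far more than micro-optimizing any single transfer hop.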

&lt;p&gt;&lt;strong&gt;3.WebRTC Upgrade and Extension&lt;/strong&gt;&lt;br&gt;
The audio/video encoding formats supported by the standard WebRTC no longer meet the requirements of the live streaming industry. Specifically, the standard WebRTC supports the VP8/VP9 and H.264 video encoding formats and the Opus audio encoding format, whereas H.264/H.265+AAC is used for live stream push. In addition, to minimize communication latency, the standard WebRTC doesn't support B-frame encoding, even though B-frames are widely used in the live streaming industry because they improve the compression ratio and save bandwidth costs. Therefore, transcoding is required to connect the standard WebRTC to existing live streaming systems, introducing extra latency and costs. It's necessary to upgrade the standard WebRTC to make it compatible with AAC (audio), H.265 (video), and B-frame encoding. The following details the WebRTC upgrade and extension in LEB.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;4.LEB SDK and Demo&lt;/strong&gt;&lt;br&gt;
The LEB SDK uses the native WebRTC for customization and extension. In addition to the standard WebRTC, it also supports: a) decoding and playback in AAC, including AAC-LC, AAC-HE, and AAC-HEv2; b) decoding and playback in H.265, including software and hardware; c) B-frame decoding in H.264 and H.265; d) SEI callback; e) encryption disablement; f) image screencapturingscreen-capturing, rotation, and zooming. The LEB SDK optimizes the performance of the native WebRTC, such as first image frame latency, frame sync, sync, jitter buffer, and NACK policies. It removes modules irrelevant to stream pull and playback and is about 5 MB in size after packaging. It includes ARM64 and ARM32 architectures. To facilitate connection, it provides a complete SDK and demo. The demo for web shows how to pull streams via the standard WebRTC on the web, and the demos for Android and iOS come with the stream pull and playback SDK, demo, and connection documentation.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;4.1 Demo for web&lt;/strong&gt;&lt;br&gt;
&lt;a href="http://webrtc-demo.tcdnlive.com/httpDemo.html"&gt;http://webrtc-demo.tcdnlive.com/httpDemo.html&lt;/a&gt;   &lt;/p&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--TRjmWXnN--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/6yuew4mtoej4g85iso96.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--TRjmWXnN--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/6yuew4mtoej4g85iso96.png" alt="Image description" width="406" height="406"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Scan the QR code to open the demo for web.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;4.2 SDK and demo for Android&lt;/strong&gt;&lt;br&gt;
&lt;a href="https://github.com/tencentyun/leb-android-sdk"&gt;https://github.com/tencentyun/leb-android-sdk&lt;/a&gt;  &lt;/p&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--rbY0ZBk2--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/29rx6loqzewf5cnlxjxv.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--rbY0ZBk2--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/29rx6loqzewf5cnlxjxv.png" alt="Image description" width="406" height="406"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Scan the QR code to open the SDK and demo for Android.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;4.3 SDK and demo for iOS&lt;/strong&gt;&lt;br&gt;
&lt;a href="https://github.com/tencentyun/leb-ios-sdk/"&gt;https://github.com/tencentyun/leb-ios-sdk/&lt;/a&gt;  &lt;/p&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--fCKTAnsy--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/jjmod1m7pzsdxtbjjtm0.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--fCKTAnsy--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/jjmod1m7pzsdxtbjjtm0.png" alt="Image description" width="406" height="406"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Scan the QR code to open the SDK and demo for iOS.&lt;/p&gt;

&lt;p&gt;Read more at: &lt;a href="https://www.tencentcloud.com/dynamic/blogs/sample-article/100358"&gt;https://www.tencentcloud.com/dynamic/blogs/sample-article/100358&lt;/a&gt;&lt;/p&gt;

</description>
      <category>cloud</category>
      <category>cloudstreaming</category>
      <category>livebroadcasting</category>
      <category>webrtc</category>
    </item>
    <item>
      <title>GME Integration for Wwise: Unlock More Voice Features to Deliver an Immersive Game Experience</title>
      <dc:creator>Man yin Mandy Wong</dc:creator>
      <pubDate>Wed, 05 Oct 2022 07:22:30 +0000</pubDate>
      <link>https://dev.to/tencentcloud/gme-integration-for-wwise-unlock-more-voice-features-to-deliver-an-immersive-game-experience-3ebp</link>
      <guid>https://dev.to/tencentcloud/gme-integration-for-wwise-unlock-more-voice-features-to-deliver-an-immersive-game-experience-3ebp</guid>
      <description>&lt;p&gt;The State of Mobile 2021 report issued by App Annie identifies PUBG-like, shooter, and MOBA games highlighting social interactions as the most popular game categories, which are the main drive in the increase in gameplay time. Voice interaction in blockbuster games such as PUBG, Call of Duty, and Free Fire has already become a player habit. Innovative social games such as Roblox and Among Us are also widely popular among Gen Z.&lt;/p&gt;

&lt;p&gt;Although multiplayer gaming and social interaction have become mainstream in the game world, deeply integrating game voice into gameplay and restoring the real-world experience for players remains a challenge.&lt;/p&gt;

&lt;p&gt;The Wwise + GME solution not only helps games easily integrate the voice chat feature, but also maximizes the immersive gaming experience. This article will introduce the unique benefits of this solution from three aspects: solution strengths, technical implementation, and voice features.&lt;/p&gt;

&lt;p&gt;Learn more about GME at: &lt;a href="https://www.tencentcloud.com/products/gme"&gt;https://www.tencentcloud.com/products/gme&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;What is the Wwise + GME solution?&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Game Multimedia Engine (GME) is a one-stop voice solution tailored for gaming scenarios and provides abundant features, including multiplayer voice chat, voice messaging, speech-to-text conversion, and speech analysis. You can connect to the GME SDK by calling APIs to implement voice features in your game.&lt;/p&gt;

&lt;p&gt;The connection process of traditional standalone voice SDK solutions is designed independently of the game sound effects. In contrast, for games developed based on the Wwise sound engine, the Wwise + GME solution can include voice features in the game sound effect design process. Wwise's powerful audio processing and control capabilities can be applied to the voice features, which provides a larger space for designing voice features for game sound effects while improving the sound quality. Below is the basic flowchart:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--870BNFiT--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/avkg1t37fkrz6uiam54w.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--870BNFiT--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/avkg1t37fkrz6uiam54w.png" alt="Image description" width="880" height="319"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;As shown above, GME's plugins send both the local voice stream (the player's voice recorded by the mic) and the voice streams received over the network (teammates' voice streams to be played back locally) to the Wwise audio pipeline, where the GME voice streams are abstracted into Wwise's basic audio sources for processing. It is this novel design that gives the Wwise + GME solution unique strengths over traditional standalone voice SDK solutions.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Unique strengths of the Wwise + GME solution&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;1. Unified design for voice and game sound effects:&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;In a Wwise project, GME voice streams are seamlessly connected to the Wwise audio pipeline, and the voice connection process is deeply integrated into the Wwise sound effect design process, avoiding audio conflicts that may occur during connection to a separate voice SDK. On the game client, the operations of sending and receiving GME voice streams are abstracted into triggers of Wwise events. This makes such operations consistent with the standard Wwise development process experience, which is more straightforward than previous API call-based connection.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;2. Effective solution to the problems of declined sound effect quality and sudden change in the volume level after mic-on:&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;In traditional standalone voice SDK solutions, degraded game sound effect quality after mic-on, sudden volume-level changes, and dry voice are all industry pain points. In particular, immediately after mic-on, the sound quality of the entire game degrades to phone-call level (a mono signal at a low sample rate), which severely deteriorates the game experience. In contrast, the Wwise + GME solution effectively solves the sound quality decline caused by the volume type switch. This greatly improves the sound quality and enables players to identify other players' positions during smooth voice chat, with the original sound effects retained.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;3. Powerful design capabilities for unlimited gameplay and creativity&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;The Wwise + GME solution allows a vast design space for game voice features. As all voice streams flow to the Wwise audio bus, the rich sound processing and control capabilities of Wwise can be applied to the voice, and each voice stream can be customized, making gameplay more immersive and fun and allowing players to communicate in a natural way.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Technical implementation&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;For each player, voice chat mainly involves two audio stream linkages: the upstream linkage where the local mic captures the player's own voice and distributes it to remote teammates through the server, and the downstream linkage where the voices of all teammates are received from the server, mixed, and played back on the local device.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Upstream linkage:&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;The player's local chat voice stream will be sent to the Wwise engine through the GME capture plugin. Based on the rich sound effect processing capabilities of Wwise, the game can process the voice stream based on the actual environment and needs, with operations such as texture processing, reverb, and voice changing. Imagine that the player's character is in a church. The processed voice stream with church reverb will be sent to the server through the GME sending plugin and then to remote players. Similarly, if the game is configured with the voice changing feature, the voice stream processed by the real-time voice changing algorithm will be sent to remote players.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--BhD5iuVv--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/aspqddbdgyqp4ykfdopu.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--BhD5iuVv--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/aspqddbdgyqp4ykfdopu.png" alt="Image description" width="880" height="236"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Upstream linkage processing flowchart&lt;/p&gt;
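&lt;p&gt;To make the reverb step concrete, below is a minimal sketch of a feedback comb filter, one classic building block of Schroeder-style reverbs. It is a toy model for illustration only, not the Wwise or GME implementation:&lt;/p&gt;

```python
def comb_reverb(samples, delay, feedback=0.5):
    """Feedback comb filter: each output sample feeds a circular delay line,
    so a single impulse produces echoes every `delay` samples, each echo
    scaled down by `feedback` (a decaying reverb tail)."""
    out = []
    buf = [0.0] * delay  # circular delay line
    for i, x in enumerate(samples):
        y = x + feedback * buf[i % delay]
        buf[i % delay] = y
        out.append(y)
    return out
```

Feeding an impulse through the filter shows the echo pattern: the output repeats the impulse every `delay` samples at half the previous amplitude. A real reverb (such as a church preset) chains several comb and all-pass filters with different delays.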

&lt;p&gt;&lt;strong&gt;Downstream linkage:&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Unlike the upstream linkage, where only one local voice stream is involved, the downstream linkage generally receives multiple voice streams from all teammates, which are passed to the Wwise engine through the GME receiving plugin. The game can apply sound effect processing to each received voice stream based on the corresponding player's in-game conditions, including position relative to the local player, distance, and presence of obstacles in the way. The processed data is mixed by Wwise and then played back on the local device. In a specific game scenario, for example, if teammate A is standing to the front left of the local player, the local player will hear teammate A's voice from the front left; and if teammate B jumps behind a rock, the local player will hear teammate B's voice obstructed and reflected by the rock. In addition, the voices of approaching and departing players will be amplified or attenuated accordingly.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--5iYH2Iy6--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/h46jnal7omxui217eu9w.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--5iYH2Iy6--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/h46jnal7omxui217eu9w.png" alt="Image description" width="880" height="246"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Downstream linkage processing flowchart&lt;/p&gt;
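&lt;p&gt;The final mixing step can be sketched in a few lines. This toy model (an assumption about one reasonable approach, not the Wwise mixer) applies a per-stream gain, such as one derived from distance attenuation, sums the frames, and clamps the result to the valid sample range:&lt;/p&gt;

```python
def mix_streams(streams, gains):
    """Mix several teammates' voice frames: apply a per-stream gain,
    sum sample-by-sample (padding short streams with silence), and
    clamp the mix to [-1.0, 1.0] to avoid clipping overflow."""
    length = max(len(s) for s in streams)
    mixed = []
    for i in range(length):
        acc = sum(g * (s[i] if i < len(s) else 0.0)
                  for s, g in zip(streams, gains))
        mixed.append(max(-1.0, min(1.0, acc)))
    return mixed
```

Two in-phase streams sum constructively and are clamped at full scale, while a distant teammate's stream is simply scaled down by its gain before entering the mix.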

&lt;p&gt;Compared with traditional standalone voice SDKs that only provide an audio conference-like game voice experience, the Wwise + GME solution processes the voice based on game scenarios and takes the voice experience to a whole other level (i.e., a game scenario-specific immersive voice experience). The demo video below shows some basic usage of the Wwise + GME solution. If you watch it on your phone, please put on headphones, because it uses the binaural virtual sound field technology.&lt;/p&gt;

&lt;p&gt;In the following demo video, the gray robot opposite you is your teammate talking to you through GME. 3D audio, voice changing, and reverb are applied in voice chat processing. All voices in the video are from real-time recording of the voice stream sent by the remote player instead of post-production synthesis. &lt;/p&gt;

&lt;p&gt;Please watch the demo at: &lt;a href="https://www.youtube.com/watch?v=3MPhscvG2dg"&gt;https://www.youtube.com/watch?v=3MPhscvG2dg&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;(Click to play. As the demo uses binaural virtual sound field technology, please put on headphones for optimal effect.)&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;More voice features&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;The unique design of the Wwise + GME solution makes it possible to implement voice features as a part of game sound effect design. Below are some proposed voice processing features, and there are more to be created by audio designers.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Sending ambient sound or accompaniment:&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;The Wwise + GME solution provides the capability to send not only player voice but also other audio streams to the voice server. The most obvious application of this capability is karaoke. For example, in a game scenario where a player's character is in the rain or wind, when the player talks with a teammate, the immersive experience requires that sound of rain or wind be properly mixed into the voice. There are also some other use cases, such as sending sound emojis based on the player's progress in the game to make the voice more fun.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Simulating voice reflection and diffraction:&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;To create an immersive voice experience, voice rendering and actual game scenarios must be taken into account together. The aforementioned texture processing, attenuation, voice changing, reverb, and 3D positioning are only basic processing features. To better simulate the voice transfer path between the speaker and the listener in game scenarios, you can leverage the reflection, diffraction, occlusion, and obstruction models provided by Wwise to process voice chat, and such processing effects are exactly the ultimate voice experience that the metaverse seeks.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Processing character personality and status:&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;In order to make game voice more fun, specially designed DSP processing can be performed on the voice when character personality and status change in games. For example, if a character is attacked by an enemy and loses HP in a battle, distortions, lags, or trills can be added to the voice to indicate that the character is in pain; when the character defeats the enemy or picks up an item, high-pass filtering or acceleration can be applied to the voice to reflect the excitement.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Side-chaining:&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Side-chaining is an essential processing method in audio mixing: it uses one signal to control another. The purpose of adding voice features to a game is to enhance in-game social networking, so the voice must be clearly delivered to listeners. When a player speaks, the focus of the game mix should shift from the sound effects to the voice, just as a radio DJ lowers the music volume when speaking and restores it afterward. In the Wwise + GME solution, all voice streams are sent to the Wwise audio bus, which makes side-chaining possible in games; for example, you can place a Wwise Meter where the voice is received and then dynamically control the volume levels of other sound effects based on the Meter's value.&lt;/p&gt;
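&lt;p&gt;The Meter-driven ducking described above can be sketched as a toy model (an illustration of the technique, not Wwise's or GME's actual code): a per-frame gain for the effects bus is derived from a voice-level meter, with one-pole smoothing so the gain ramps instead of jumping:&lt;/p&gt;

```python
def duck_effects_bus(voice_levels, threshold=0.05, ducked=0.3, smooth=0.5):
    """Compute a per-frame gain for the sound-effects bus from a voice-level
    meter. When the meter exceeds the threshold, the gain ramps toward the
    ducked level; otherwise it ramps back to full volume. The threshold,
    ducked level, and smoothing factor are assumed example values."""
    gains, gain = [], 1.0
    for level in voice_levels:
        target = ducked if level > threshold else 1.0
        gain += smooth * (target - gain)  # one-pole smoothing: no hard jumps
        gains.append(gain)
    return gains
```

While the voice meter is silent the effects bus stays at full volume; when speech appears the gain glides down toward the ducked level, and it recovers once the speech stops, mirroring the DJ behavior described above.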

&lt;p&gt;Below is a demo video of the Wwise + GME solution's multiple capabilities, such as how sound reflection, obstruction, and side-chaining are processed by GME. The video shows the first-person view, third-person view, and top view, and the green robot is your teammate talking to you through GME. As the robot's position and environment change, the corresponding processing features will be applied to the voice (as described in the video subtitles). Voice chat processed in this way can deliver an immersive gaming experience. All voices in the video are from real-time recording of the voice stream sent by the remote player instead of post-production synthesis.&lt;/p&gt;

&lt;p&gt;Please watch the demo at: &lt;a href="https://youtu.be/1gTCa_hiAIE"&gt;https://youtu.be/1gTCa_hiAIE&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;(Click to play. As the demo uses binaural virtual sound field technology, please put on headphones for optimal effect.)&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Summary&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;The Wwise audio engine middleware and the GME game voice solution improve game quality from different perspectives. Wwise greatly increases the efficiency of developing interactive sound effects for a better game audio experience, while GME enhances social networking in games for higher player retention. Combined, the two create a synergy with multiplying effects. The Wwise + GME solution will become a powerful tool for game sound designers to create the most realistic, vivid, and creative sound and voice effects in games.&lt;/p&gt;

</description>
      <category>gme</category>
      <category>voicefeature</category>
      <category>wwise</category>
      <category>voicesolution</category>
    </item>
    <item>
      <title>Further Upgrade for Tencent Cloud Streaming Services (CSS) Stream Push</title>
      <dc:creator>Man yin Mandy Wong</dc:creator>
      <pubDate>Thu, 29 Sep 2022 03:25:07 +0000</pubDate>
      <link>https://dev.to/tencentcloud/further-upgrade-for-tencent-cloud-streaming-services-css-stream-push-47g8</link>
      <guid>https://dev.to/tencentcloud/further-upgrade-for-tencent-cloud-streaming-services-css-stream-push-47g8</guid>
      <description>&lt;h2&gt;
  
  
  Support for Multi-Path Transfer (CSS) Lite Edition
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;1. Current network transfer problems&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;As internet video applications gain momentum, more and more platforms and industries are diving into live streaming, but streamers are faced with some quality problems.&lt;/p&gt;

&lt;p&gt;● Transfer lag - The network in outdoor or public areas is unstable, causing packet loss, high latency, or jitter and subsequent stream push and playback lags.&lt;/p&gt;

&lt;p&gt;● Packet loss in mobile environments - In 3G, 4G, 5G, and Wi-Fi environments, packet loss occurs at the transport layer due to bit errors at the physical and link layers, which doesn't indicate congestion.&lt;/p&gt;

&lt;p&gt;● Insufficient bandwidth of a single network - The link bandwidth of a single 3G, 4G, 5G, or Wi-Fi network is insufficient, or the network jitters.&lt;/p&gt;

&lt;p&gt;● Network switch problems in mobile environments - Mobile network/Wi-Fi switches often occur when streamers are moving around.&lt;/p&gt;

&lt;p&gt;In these scenarios, upstream push is prone to lags due to unstable transfer or insufficient bandwidth of a single network, adversely affecting the playback experience.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;2. Multi-linkage transfer scheme&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Tencent Cloud CSS's multi-linkage transfer scheme allows for transfer over multiple linkages at the same time to improve the reliability and quality of end-to-end transfer, further enhancing the upstream push and playback experience.&lt;/p&gt;

&lt;p&gt;The conventional IP-layer scheme relies on routers and gateway servers that support multi-network aggregation. Specifically, data is split by the sender, substreams are transferred over multiple linkages, and the data is then reassembled by the receiver.&lt;/p&gt;

&lt;p&gt;This scheme is independent of transport layer protocols. It is compatible with all the existing stream push protocols but requires support from hardware such as routers.&lt;/p&gt;

&lt;p&gt;The Tencent Cloud CSS software scheme at the application layer leverages the reliability, anti-jitter, and low-latency capabilities of Tencent Cloud SRT, implements the SRT bonding-based algorithm for multi-path transfer at the transport layer, and is optimized for live streaming media scenarios. In addition, it no longer relies on hardware as long as the sender has multiple ENIs.&lt;/p&gt;
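&lt;p&gt;The bonding idea can be illustrated with a toy sender/receiver pair. This is a drastic simplification of SRT bonding, which also handles retransmission, link scheduling, and loss: here the sender merely tags packets with sequence numbers and spreads them round-robin across links, and the receiver merges the per-link substreams back into order:&lt;/p&gt;

```python
import heapq

def send_over_links(packets, n_links):
    """Sender: tag each packet with a sequence number and round-robin it
    across the available links (each link carries a sorted substream)."""
    links = [[] for _ in range(n_links)]
    for seq, payload in enumerate(packets):
        links[seq % n_links].append((seq, payload))
    return links

def receive_and_reorder(links):
    """Receiver: merge the per-link substreams back into sequence order.

    heapq.merge works because each link's substream is already sorted
    by sequence number."""
    return [payload for _seq, payload in heapq.merge(*links)]
```

With two links, even-numbered packets travel on one link and odd-numbered packets on the other, yet the receiver reconstructs the original order; a real bonding scheme adds retransmission and per-link congestion awareness on top of this skeleton.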

&lt;p&gt;&lt;strong&gt;3. SDK architecture and usage&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Tencent Cloud CSS provides the TMIO SDK for terminals to implement multi-network transfer capabilities.&lt;/p&gt;

&lt;p&gt;Products generally integrate RTMP, the most widely used stream push protocol. You can establish a connection through RTMP over SRT and leverage SRT's features to improve performance under poor network conditions.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;TMIO SDK instructions&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Directly integrate the TMIO SDK to replace the transfer module. For detailed directions, see the integration documentation and demo source code.&lt;/p&gt;

&lt;p&gt;Proxy mode: The proxy mode can be used as a separate process or integrated into the application. You only need to change the stream push address in the original application code to the local listening address in proxy mode. For example, if the original RTMP stream push address is:&lt;/p&gt;

&lt;p&gt;rtmp://{$push_domain}:3570/live/sdk_test?txSecret=38fdd5b9ee9958c3f6e6e6a6dd39ba2b&amp;amp;txTime=6161BC80&lt;/p&gt;

&lt;p&gt;Then the stream push address in proxy mode is:&lt;/p&gt;

&lt;p&gt;rtmp://0.0.0.0:1935/live/sdk_test?txSecret=38fdd5b9ee9958c3f6e6e6a6dd39ba2b&amp;amp;txTime=6161BC80&amp;amp;txHost={$push_domain}&lt;/p&gt;
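&lt;p&gt;The rewrite shown above is mechanical, so it can be automated. The helper below is a hypothetical convenience function (not part of the TMIO SDK, and the hostname in the usage example is a placeholder): it points the URL at the local proxy listener and carries the real push domain in the txHost query parameter:&lt;/p&gt;

```python
from urllib.parse import urlsplit, urlunsplit

def to_proxy_address(push_url, proxy_host="0.0.0.0", proxy_port=1935):
    """Rewrite an RTMP push URL for proxy mode: replace the host/port with
    the local proxy's listening address and append txHost=<real domain>,
    mirroring the address transformation shown in the article."""
    parts = urlsplit(push_url)
    # Keep the original query string and append the real push domain.
    query = parts.query + ("&" if parts.query else "") + "txHost=" + parts.hostname
    netloc = "%s:%d" % (proxy_host, proxy_port)
    return urlunsplit((parts.scheme, netloc, parts.path, query, parts.fragment))
```

For example, a push URL on a hypothetical domain `push.example.com` becomes `rtmp://0.0.0.0:1935/...&txHost=push.example.com`, with the txSecret and txTime parameters passed through untouched.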

&lt;p&gt;Read more at: &lt;a href="https://www.tencentcloud.com/dynamic/insights/sample-article/100351"&gt;https://www.tencentcloud.com/dynamic/insights/sample-article/100351&lt;/a&gt;&lt;/p&gt;

</description>
    </item>
    <item>
      <title>Quickly Isolating Resources in Tencent Cloud’s VOD</title>
      <dc:creator>Man yin Mandy Wong</dc:creator>
      <pubDate>Mon, 26 Sep 2022 03:35:12 +0000</pubDate>
      <link>https://dev.to/tencentcloud/quickly-isolating-resources-in-tencent-clouds-vod-1j0d</link>
      <guid>https://dev.to/tencentcloud/quickly-isolating-resources-in-tencent-clouds-vod-1j0d</guid>
      <description>&lt;p&gt;We have launched the sub-application feature for Tencent Cloud’s Video-on-Demand (VOD) to help you easily isolate resources.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;1. What can sub-applications do?&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Help you isolate resources&lt;/strong&gt; - VOD sub-applications implement efficient and secure resource isolation with zero O&amp;amp;M costs, and each sub-application has the same features and usage as the main application. In addition, data statistics collection and usage analysis can be performed at the sub-application level, so that you can break down your data system for analysis.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Help you control permissions&lt;/strong&gt; - VOD sub-applications are connected to the Cloud Access Management (CAM) service of Tencent Cloud, and access to sub-application resources is controlled through permission policies. You can quickly implement permission control through simple operations that are easier to learn than those in sub-applications of other cloud vendors, as multi-level authorization is not required.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Help you manage the resource lifecycle&lt;/strong&gt; - VOD sub-applications have a complete lifecycle to allow for flexible sub-application management. Sub-applications can be disabled, terminated, or enabled as needed in different scenarios.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;2. What are the typical use cases of sub-applications?&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Sample use case 1&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;A company intends to develop its own products based on Tencent Cloud. Department A plans to use VOD to develop a short video application, and department B plans to develop a movie and television website. These two VOD businesses need to be isolated from each other. However, for cost reasons, the company cannot create an independent Tencent Cloud account for each department.&lt;/p&gt;

&lt;p&gt;In this case, the sub-application feature of VOD can be used to assign a sub-application to each department, so that the two departments manage their business resources in separate sub-applications. Within a sub-application, VOD's features and usage are identical to those before the feature was enabled. VOD also generates separate data statistics for each sub-application to facilitate reasonable resource allocation.&lt;/p&gt;
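&lt;p&gt;The isolation described above works by scoping each request to one sub-application. As a hedged sketch (the request helper is a hypothetical stand-in for a signed Tencent Cloud API call, and the sub-application ID is illustrative), each department's calls carry its own sub-application ID and therefore only touch that sub-application's resources:&lt;/p&gt;

```python
def build_vod_request(action: str, sub_app_id: int, **params: str) -> dict:
    """Assemble the body of a VOD API request scoped to one sub-application.
    Every request carries the sub-application ID, so department A's calls
    never touch department B's resources."""
    body = {"Action": action, "SubAppId": sub_app_id}
    body.update(params)
    return body

# Department A searches media only within its own sub-application (ID illustrative).
request_a = build_vod_request("SearchMedia", sub_app_id=1)
```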

&lt;p&gt;&lt;strong&gt;Sample use case 2&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;After each department is configured with its own sub-application, departments A and B may need finer-grained permission control. For example, suppose sub-application 1 is assigned to department A and sub-application 2 to department B. Department A needs all operation permissions on sub-application 1 and access to sub-application 2, but should not be able to perform video processing operations in sub-application 2.&lt;/p&gt;

&lt;p&gt;This use case requires controlled access across otherwise isolated resources. To achieve this, a custom policy can be created under the root account to refine access permissions at the API level, and the policy can then be associated with department A's sub-account.&lt;/p&gt;
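&lt;p&gt;A minimal sketch of such a custom policy is below, written as a Python dict for readability. The action names, the deny/allow structure, and the resource paths (including the account UIN and sub-application IDs) are illustrative; consult the CAM policy syntax and the VOD API list for the exact values:&lt;/p&gt;

```python
# Hedged sketch of a CAM custom policy for department A's sub-account.
# Field names follow CAM's general policy shape; all IDs are placeholders.
policy = {
    "version": "2.0",
    "statement": [
        {   # Full operation permissions on sub-application 1
            "effect": "allow",
            "action": ["vod:*"],
            "resource": ["qcs::vod::uin/100000000001:subapp/1"],
        },
        {   # Read-only access to sub-application 2
            "effect": "allow",
            "action": ["vod:Describe*"],
            "resource": ["qcs::vod::uin/100000000001:subapp/2"],
        },
        {   # Explicitly deny video processing in sub-application 2
            "effect": "deny",
            "action": ["vod:ProcessMedia"],
            "resource": ["qcs::vod::uin/100000000001:subapp/2"],
        },
    ],
}
```

&lt;p&gt;An explicit deny statement is the safer design here, since a deny overrides any overlapping allow.&lt;/p&gt;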

&lt;p&gt;&lt;strong&gt;3. Summary&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;The VOD sub-application feature helps you implement resource isolation and permission assignment in VOD, lowering your operating costs and simplifying resource management. Sub-applications are a good choice for complex production environments with multiple business scenarios.&lt;/p&gt;

&lt;p&gt;Read more at: &lt;a href="https://www.tencentcloud.com/dynamic/blogs/sample-article/100345"&gt;https://www.tencentcloud.com/dynamic/blogs/sample-article/100345&lt;/a&gt;&lt;/p&gt;

</description>
    </item>
  </channel>
</rss>
