<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>DEV Community: 毅硕科技</title>
    <description>The latest articles on DEV Community by 毅硕科技 (@insvast).</description>
    <link>https://dev.to/insvast</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F2292064%2F7050a395-fe15-48c8-8e01-890d9f0ea208.jpg</url>
      <title>DEV Community: 毅硕科技</title>
      <link>https://dev.to/insvast</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://dev.to/feed/insvast"/>
    <language>en</language>
    <item>
      <title>Sentieon | Application Tutorial: Accelerating Custom Algorithms Using the Sentieon Python API Engine</title>
      <dc:creator>毅硕科技</dc:creator>
      <pubDate>Tue, 17 Dec 2024 02:55:08 +0000</pubDate>
      <link>https://dev.to/insvast/sentieon-application-tutorial-accelerating-custom-algorithms-using-the-sentieon-python-api-engine-kmf</link>
      <guid>https://dev.to/insvast/sentieon-application-tutorial-accelerating-custom-algorithms-using-the-sentieon-python-api-engine-kmf</guid>
      <description>&lt;h1&gt;
  
  
  Background
&lt;/h1&gt;

&lt;p&gt;All modules in the Sentieon suite are several times to dozens of times faster than their corresponding open-source software. While using these modules, users sometimes hope that the Sentieon team can help accelerate their custom-developed software. To help these users enjoy the speed of Sentieon modules in their own software, we have developed a Python API system to meet the needs of secondary development and self-acceleration.&lt;/p&gt;

&lt;h1&gt;
  
  
  API Introduction
&lt;/h1&gt;

&lt;p&gt;&lt;strong&gt;The Sentieon Python API is essentially a communication layer that connects users' data analysis scripts to Sentieon's high-speed engine, accelerating the analysis while also improving the readability and maintainability of the scripts.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Sentieon's data processing engine is the core of multiple Sentieon modules and can analyze BAM/CRAM and FASTA files at high speed. The engine supports two data flow modes: single-pass and multithreaded. The multithreaded data flow is faster but more complex. It divides the genome into fragments with a default length of 1 Gb, and the engine processes each fragment independently, in parallel, across multiple threads. Each fragment is further divided into smaller segments (Steps) with a default length of 1 Kb, which the engine processes sequentially. The user's data processing logic is executed as the engine works through these steps.&lt;/p&gt;
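
&lt;p&gt;The fragment and step scheme described above can be sketched in plain Python. This is only an illustration of the data flow, not the Sentieon API: the names &lt;code&gt;split&lt;/code&gt;, &lt;code&gt;process_fragment&lt;/code&gt;, &lt;code&gt;run&lt;/code&gt;, and the &lt;code&gt;on_step&lt;/code&gt; callback are hypothetical, and a standard-library thread pool stands in for the engine's own threading.&lt;/p&gt;

```python
from concurrent.futures import ThreadPoolExecutor

FRAGMENT_LEN = 1_000_000_000  # 1 Gb default fragment length, as stated in the text
STEP_LEN = 1_000              # 1 Kb default step length, as stated in the text


def split(start, end, size):
    """Yield half-open (start, end) windows of at most `size` bases."""
    for pos in range(start, end, size):
        yield pos, min(pos + size, end)


def process_fragment(contig, frag_start, frag_end, step_len, on_step):
    """Walk one fragment step by step, in order, and collect the results."""
    return [on_step(contig, s, e) for s, e in split(frag_start, frag_end, step_len)]


def run(contigs, on_step, fragment_len=FRAGMENT_LEN, step_len=STEP_LEN, threads=4):
    """Process fragments in parallel threads; steps inside a fragment stay sequential.

    `contigs` is a list of (name, length) pairs and `on_step` is the user's
    per-step logic (a hypothetical callback, not a Sentieon interface).
    """
    fragments = [
        (name, s, e)
        for name, length in contigs
        for s, e in split(0, length, fragment_len)
    ]
    with ThreadPoolExecutor(max_workers=threads) as pool:
        futures = [
            pool.submit(process_fragment, name, s, e, step_len, on_step)
            for name, s, e in fragments
        ]
        # Collect in submission order so results follow genome order.
        return [result for f in futures for result in f.result()]
```

&lt;p&gt;For example, calling &lt;code&gt;run([("chrA", 2500)], lambda contig, start, end: end - start, fragment_len=1000, step_len=400)&lt;/code&gt; on a toy contig returns one window length per step, in genome order.&lt;/p&gt;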

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fxw9wcukvoj3uxzevkh4v.jpg" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fxw9wcukvoj3uxzevkh4v.jpg" alt="Image description" width="800" height="514"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h1&gt;
  
  
  Implementation Case Study
&lt;/h1&gt;

&lt;p&gt;We will demonstrate the acceleration achieved with Sentieon through a collaboration with the CREST software team at St. Jude Children's Research Hospital in the United States. CREST (Clipping REveals STructure) is a well-known tool for detecting structural variations in cancer genomes, primarily using breakpoints as clues to identify them. Specifically, the CREST workflow includes soft-clip detection, assembly, post-assembly alignment, breakpoint confirmation, and structural variation confirmation. The assembly and alignment steps mainly rely on third-party tools. While CREST's advantage lies in its high accuracy, its speed limitations are equally apparent. For a standard 30x tumor whole-genome paired sample, processing on a 20-thread workstation can take up to 24 hours, which is often too slow to meet user demands.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fqfe8quusispe4ftx2evc.jpg" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fqfe8quusispe4ftx2evc.jpg" alt="Image description" width="711" height="315"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;After learning about the capabilities of the Sentieon Python API, the CREST team reimplemented CREST's functionality using this system. &lt;strong&gt;In test data, the Sentieon-accelerated version of CREST achieved a 10-fold speed increase, with results identical to the original CREST&lt;/strong&gt;. On a 20-thread workstation, it reduced the processing time for the vast majority of samples to under 1 hour.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F4jz1rarlil5otduckk7r.jpg" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F4jz1rarlil5otduckk7r.jpg" alt="Image description" width="572" height="427"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Next, we'll introduce two more acceleration case studies. Quality control is a crucial step in NGS data processing workflows. Although the logic is relatively simple, it involves extensive reading of BAM/CRAM files. Quality control tools often struggle to balance speed, multi-threaded parallelism, and code maintainability.&lt;/p&gt;

&lt;p&gt;The Sentieon Python API can separate the algorithmic logic of quality control tools from data reading, simultaneously improving speed and code readability. As implementation examples, we used the Python API to accelerate Picard's CollectInsertSizeMetrics tool for rapid insert size statistics, and GATK's CalculateTargetCoverage tool for quick depth statistics in target regions. Users can also refer to these cases to accelerate their own custom quality control tools.&lt;/p&gt;
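
&lt;p&gt;As a rough illustration of that separation, the sketch below computes insert size summary statistics from any iterable of template lengths, so the reading side can be swapped without touching the statistics. It does not use the Sentieon API or reproduce Picard's implementation; the function name &lt;code&gt;insert_size_metrics&lt;/code&gt; and the choice of reported fields are assumptions made for this example.&lt;/p&gt;

```python
from statistics import median


def insert_size_metrics(template_lengths):
    """Summarise insert sizes from an iterable of signed template lengths.

    Each value is treated as one read pair; zero means "no insert size"
    and is skipped. Returns None when nothing usable was seen.
    """
    sizes = sorted(abs(t) for t in template_lengths if t != 0)
    if not sizes:
        return None
    med = median(sizes)
    # Median absolute deviation: the median distance from the median.
    mad = median(abs(s - med) for s in sizes)
    return {
        "pairs": len(sizes),
        "median": med,
        "mad": mad,
        "min": sizes[0],
        "max": sizes[-1],
    }
```

&lt;p&gt;Because the function only needs an iterable of integers, the same logic can be fed from a test list, a file reader, or an accelerated engine.&lt;/p&gt;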

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F4biuvlo4baopdue7h1c3.jpg" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F4biuvlo4baopdue7h1c3.jpg" alt="Image description" width="800" height="544"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h1&gt;
  
  
  Technical Support
&lt;/h1&gt;

&lt;p&gt;The Sentieon Python API allows users' scripts to communicate with the Sentieon engine, enabling high-speed parallel reading of BAM/CRAM/FASTA files and resulting in over 10-fold speed improvements. &lt;strong&gt;Users can utilize this platform for secondary development to accelerate their custom software&lt;/strong&gt;. We are more than willing to provide comprehensive technical support.&lt;/p&gt;

&lt;h1&gt;
  
  
  Introduction to Sentieon Software
&lt;/h1&gt;

&lt;p&gt;Sentieon offers a comprehensive, purely software-based solution for secondary analysis in genetic variant detection. Its analysis pipeline remains entirely faithful to the mathematical models of gold standards such as BWA, GATK, MuTect2, STAR, Minimap2, Fgbio, Picard, and others. While matching the analysis results of open-source workflows, Sentieon significantly improves the analysis efficiency and detection accuracy for sequencing data from WGS, WES, Panel, UMI, ctDNA, RNA, and other sources. It is compatible with all current second and third-generation sequencing platforms.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F2gl82mfjbqze569zlxqe.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F2gl82mfjbqze569zlxqe.png" alt="Image description" width="800" height="1149"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;The Sentieon software team possesses rich experience in software development and algorithm optimization engineering. They are dedicated to solving speed and accuracy bottlenecks in biological data analysis, &lt;strong&gt;providing efficient and precise software solutions for partners from various fields such as molecular diagnostics, drug development, clinical medicine, population cohorts, and animal and plant research&lt;/strong&gt;, jointly promoting the development of genetic technology.&lt;/p&gt;

&lt;p&gt;As of the end of 2023, Sentieon has provided services to over 1300 users worldwide and has been cited nearly a thousand times, including in high-impact journals such as NEJM, Cell, and Nature. Furthermore, Sentieon has won accolades in authoritative evaluations such as PrecisionFDA and the DREAM Challenges for several consecutive years, gaining widespread recognition in the industry.&lt;/p&gt;

</description>
      <category>bioinformaticsanalysis</category>
      <category>secondarydevelopment</category>
      <category>software</category>
    </item>
    <item>
      <title>Introduction to Sentieon</title>
      <dc:creator>毅硕科技</dc:creator>
      <pubDate>Thu, 07 Nov 2024 09:41:04 +0000</pubDate>
      <link>https://dev.to/insvast/introduction-to-sentieon-11e</link>
      <guid>https://dev.to/insvast/introduction-to-sentieon-11e</guid>
      <description>&lt;h2&gt;
  
  
  Sentieon Vision
&lt;/h2&gt;

&lt;h3&gt;
  
  
  01. Sentieon Vision
&lt;/h3&gt;

&lt;p&gt;Sentieon is dedicated to solving the bottleneck of speed and accuracy in bioinformatics data analysis, through the deep optimization of algorithms and enterprise-level software engineering, to significantly improve the efficiency, accuracy and reliability of NGS data processing.&lt;/p&gt;

&lt;p&gt;Sentieon provides software solutions to partners and research institutions in the fields of molecular diagnostics, drug discovery and development, clinical medicine, animal and plant genomics, etc. Sentieon is committed to advancing the development of genetic technology and realizing the vision of “Achieving Precision Data, Serving Precision Medicine”.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F3b500dtbh1snm7yld9tz.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F3b500dtbh1snm7yld9tz.png" alt="Image description" width="637" height="642"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h3&gt;
  
  
  02. Bioinformatics Industry Pain Points
&lt;/h3&gt;

&lt;p&gt;For secondary analysis of second-generation sequencing data (SNP/Indel variant detection), the BWA + GATK best-practice pipeline has been validated repeatedly on large numbers of samples for more than a decade and is widely acknowledged by the industry as the gold standard.&lt;/p&gt;

&lt;p&gt;However, the BWA + GATK process has several significant problems:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Long runtime, low resource utilization, and high computational cost.&lt;/li&gt;
&lt;li&gt;GATK's random downsampling in regions of high sequencing depth significantly affects the sensitivity and accuracy of variant detection, especially for low-abundance (low-VAF) variants in tumors, and GATK can produce inconsistent results across repeated runs.&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  03. Sentieon Solutions
&lt;/h3&gt;

&lt;p&gt;Sentieon provides a complete software-only secondary analysis solution that covers the entire process from read alignment to variant calling, for both germline and somatic variant detection.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F1r34u1rq7osqnaursjl2.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F1r34u1rq7osqnaursjl2.png" alt="Image description" width="800" height="272"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Sentieon software is faithful to the mathematical models of gold standards such as BWA, GATK, MuTect, MuTect2, and Minimap2, and improves analysis efficiency by more than 10 times while ensuring that the results exactly match those of the gold standards.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F1rdk253yjtzt5tshapjn.jpg" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F1rdk253yjtzt5tshapjn.jpg" alt="Image description" width="800" height="401"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h3&gt;
  
  
  04. Awards
&lt;/h3&gt;

&lt;p&gt;Since Sentieon officially launched its software product in 2015, it has won or tied for first place in several international bioinformatics competitions:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Bio-IT Innovative Practices Award&lt;/li&gt;
&lt;li&gt;PrecisionFDA Truth Challenge&lt;/li&gt;
&lt;li&gt;PrecisionFDA Consistency Challenge&lt;/li&gt;
&lt;li&gt;ICGC-TCGA DREAM Mutation Calling Challenge&lt;/li&gt;
&lt;li&gt;PrecisionFDA Hidden Treasures Warm-up&lt;/li&gt;
&lt;li&gt;PrecisionFDA NCI-CPTAC Crowdsourced Multi-omics Sample Mislabeling Big Data Challenge&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Far9u4pfy78a8jvqpk3ls.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Far9u4pfy78a8jvqpk3ls.png" alt="Image description" width="364" height="108"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  Software Functionality
&lt;/h2&gt;

&lt;h3&gt;
  
  
  01. Highlights
&lt;/h3&gt;

&lt;p&gt;Software-only solution: Sentieon can be deployed efficiently and flexibly across different computing environments. No special hardware is required, which reduces operation and maintenance costs. Both x86 and ARM processors are supported.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Highlights&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;1. Accelerated analysis: 10x faster than the original BWA/GATK pipeline.&lt;/p&gt;

&lt;p&gt;2. No downsampling: no differences between batches, and better detection at high depth and low abundance.&lt;/p&gt;

&lt;p&gt;3. Accurate results: 99.7% consistency with the BWA/GATK/Mutect2/Minimap2 pipelines.&lt;/p&gt;

&lt;p&gt;4. Cost savings: cost savings of up to 90%, while dramatically improving the accuracy of shallow sequencing results.&lt;/p&gt;

&lt;p&gt;5. Joint calling: joint calling of 100,000 WGS samples in a single run, with no need for large memory and no intermediate steps.&lt;/p&gt;

&lt;p&gt;6. Enterprise-level service: comprehensive customer support from trial through production use, and long-term partnerships with customers.&lt;/p&gt;

&lt;h3&gt;
  
  
  02. Functional Modules
&lt;/h3&gt;

&lt;p&gt;Sentieon software is flexible: users can run the complete pipeline directly, or swap individual Sentieon modules into an existing pipeline as needed. Input and output files strictly follow industry-standard formats, so users can get started quickly with little learning time. The modules currently available and their functions are listed in the following table:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F0sggn85z278y2hs273mb.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F0sggn85z278y2hs273mb.png" alt="Image description" width="715" height="842"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h3&gt;
  
  
  03. Application Scenarios
&lt;/h3&gt;

&lt;p&gt;As an alternative acceleration solution to the industry gold-standard BWA/GATK/Mutect2/STAR processes, Sentieon software is also suitable for basic research, translational medicine, clinical and other fields. Some of the applicable scenarios are listed below:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Ff7exedflztww1spmpc3m.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Ff7exedflztww1spmpc3m.png" alt="Image description" width="716" height="526"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  Industry Recognition
&lt;/h2&gt;

&lt;p&gt;Sentieon software is widely certified and used by nearly 1,000 commercial companies and research organizations, including:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Several large multinational pharmaceutical companies, sequencing service providers, and molecular diagnostic companies worldwide.&lt;/li&gt;
&lt;li&gt;Hospitals and medical research institutions: including the O'Brien Project led by 15 large hospitals and pharmaceutical companies in the U.S., Fudan University Pediatrics Hospital, Shanghai Jiaotong University Children's Hospital, Xiangya Hospital, and CUHK Cancer Medical Center.&lt;/li&gt;
&lt;li&gt;Governmental organizations: including the U.S. Food and Drug Administration (FDA), the China Food and Drug Administration (CFDA), the U.S. National Institutes of Health (NIH), and the U.S. National Cancer Institute (NCI).&lt;/li&gt;
&lt;li&gt;Universities and research institutes: including Harvard, Yale, Stanford, Cornell, Chinese Academy of Sciences, Union Medical College, Fudan University, Shanghai Jiaotong University, Sun Yat-sen University, Yunnan University, Chinese Academy of Agricultural Sciences, and the State Key Laboratory of Crop Genetic Improvement.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;So far, Sentieon software has processed more than 1430 PB of data across millions of samples, using more than 360 million core hours, and has been cited many times in high-impact journals, including NEJM, Cell, and Nature.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fgpjjtrekcw90sekcev9h.jpg" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fgpjjtrekcw90sekcev9h.jpg" alt="Image description" width="800" height="468"&gt;&lt;/a&gt;&lt;/p&gt;

</description>
    </item>
  </channel>
</rss>
