<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>DEV Community: intalink</title>
    <description>The latest articles on DEV Community by intalink (@intalink).</description>
    <link>https://dev.to/intalink</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F2031397%2F1dc1f790-cfa3-42cb-a448-970e2c84c592.jpg</url>
      <title>DEV Community: intalink</title>
      <link>https://dev.to/intalink</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://dev.to/feed/intalink"/>
    <language>en</language>
    <item>
      <title>IntaLink Community: Exploring the Open Source New Power of Data Table Relationship Automatic Analysis Platform</title>
      <dc:creator>intalink</dc:creator>
      <pubDate>Wed, 04 Dec 2024 02:36:36 +0000</pubDate>
      <link>https://dev.to/intalink/play-with-intalink-community-exploring-the-open-source-new-power-of-data-table-relationship-d3e</link>
      <guid>https://dev.to/intalink/play-with-intalink-community-exploring-the-open-source-new-power-of-data-table-relationship-d3e</guid>
      <description>&lt;p&gt;&lt;strong&gt;01-What is the IntaLink platform?&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;The IntaLink platform is built for users' data integration needs across many scenarios and requires no business background knowledge. It automatically analyzes the relationships between tables and generates data association paths. Based on the chosen application strategy, it then provides the best association path, so that data can be found and used on demand. This eliminates much of the repetitive manual analysis required in traditional data integration.&lt;br&gt;
IntaLink has been officially open-sourced on GitHub. We welcome technology enthusiasts and anyone who needs such a platform to join the project, contribute their skills and knowledge, and help build a more complete ecosystem and a more capable IntaLink platform!&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;02-How do I find the IntaLink open-source project?&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;1. GitHub repository:&lt;br&gt;
On the IntaLink project homepage you can view project details, download the code, and take part in development. (&lt;a href="https://github.com/YT-DATA/INTALINK" rel="noopener noreferrer"&gt;https://github.com/YT-DATA/INTALINK&lt;/a&gt;)&lt;br&gt;
2. Community guide:&lt;br&gt;
The IntaLink community guide covers community tasks, contribution incentives, and more, to help new users get started quickly. (&lt;a href="https://github.com/YT-DATA/community" rel="noopener noreferrer"&gt;https://github.com/YT-DATA/community&lt;/a&gt;)&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;03-How do I download the IntaLink source code?&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;There are three ways to download the code; choose whichever suits your workflow:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Use HTTPS:&lt;br&gt;
&lt;code&gt;git clone https://github.com/YT-DATA/community.git&lt;/code&gt;
&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fwdmhqmap48vquoil5hcg.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fwdmhqmap48vquoil5hcg.png" alt="Image description" width="800" height="649"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;2. Use SSH:&lt;br&gt;
&lt;code&gt;git clone git@github.com:YT-DATA/community.git&lt;/code&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F6ybhfzx72m5avk4dou65.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F6ybhfzx72m5avk4dou65.png" alt="Image description" width="800" height="766"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;3. Use GitHub CLI:&lt;br&gt;
&lt;code&gt;gh repo clone YT-DATA/community&lt;/code&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fsau12fzp8bvr2a3evpcz.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fsau12fzp8bvr2a3evpcz.png" alt="Image description" width="800" height="640"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;04-Get to know the IntaLink open-source community&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;The IntaLink community welcomes all developers. Whether you are a newcomer or an experienced expert, you will find a way to contribute that suits you.&lt;br&gt;
The community offers tasks for both beginners and advanced developers, helping you start from scratch and steadily build your development skills. Task types include:&lt;br&gt;
·Basic tasks: fixing minor bugs, updating documentation, or improving code comments.&lt;br&gt;
·Feature improvements: enhancing existing functions or modules, improving performance, or refining the user experience.&lt;br&gt;
·New feature development: designing and implementing new functional modules that extend IntaLink's application scenarios and scalability.&lt;br&gt;
For the full contribution process, see the contribution guide (&lt;a href="https://github.com/YT-DATA/community/blob/main/README.ch.md" rel="noopener noreferrer"&gt;https://github.com/YT-DATA/community/blob/main/README.ch.md&lt;/a&gt;).&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;05-Need help? We are always ready to support you&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Whether you run into a technical issue or have a feature suggestion while using IntaLink, you can get help or give feedback through the following channels:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;GitHub Issues:
Submit issues or suggestions in the Issues section and discuss and resolve them with community members.
(&lt;a href="https://github.com/YT-DATA/INTALINK/issues" rel="noopener noreferrer"&gt;https://github.com/YT-DATA/INTALINK/issues&lt;/a&gt;) &lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fgwthxa38ozxmf928qmc7.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fgwthxa38ozxmf928qmc7.png" alt="Image description" width="800" height="438"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;ol start="2"&gt;
&lt;li&gt;Instant messaging platform:
Join our Discord community to chat in real time with developers worldwide and share your insights.
(&lt;a href="https://discord.com/invite/FvhqEZ6z95" rel="noopener noreferrer"&gt;https://discord.com/invite/FvhqEZ6z95&lt;/a&gt;)
&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fftkkw34svuybubii8y3x.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fftkkw34svuybubii8y3x.png" alt="Image description" width="800" height="396"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;06-Act now and join the open-source family&lt;/strong&gt;&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Joining our open source community means stepping into a vibrant and innovative technology ecosystem. Here, every line of code and every discussion pushes the boundaries of technology, providing new ideas for solving practical problems.
We will periodically launch a series of award-winning solicitation activities in the community, covering multiple fields such as solutions and technical directions, with the aim of promoting deep communication and collision among technology enthusiasts. This is not only a great opportunity for learning and improvement, but also a stage to showcase talent and gain recognition.
Join us now and contribute your strength to the technology community through practical actions. Let's work together to promote technological progress and create a better future!
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

</description>
      <category>data</category>
      <category>datascience</category>
    </item>
    <item>
      <title>IntaLink: A New NL2SQL Technology Distinct from Large Models</title>
      <dc:creator>intalink</dc:creator>
      <pubDate>Tue, 29 Oct 2024 06:05:42 +0000</pubDate>
      <link>https://dev.to/intalink/intalink-a-new-nl2sql-technology-distinct-from-large-models-9jk</link>
      <guid>https://dev.to/intalink/intalink-a-new-nl2sql-technology-distinct-from-large-models-9jk</guid>
      <description>&lt;h1&gt;
  
  
  IntaLink: A New NL2SQL Technology Distinct from Large Models
&lt;/h1&gt;

&lt;p&gt;&lt;strong&gt;Hidden Gem&lt;/strong&gt;  &lt;/p&gt;




&lt;h3&gt;
  
  
  Wide Application Scenarios of IntaLink
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;Background&lt;/strong&gt;: A previous article stated that "the goal of IntaLink is to achieve automated data linking in the field of data integration." As that discussion showed, IntaLink addresses the automatic linking of relational data across multiple tables.&lt;/p&gt;

&lt;p&gt;Now let's consider whether this problem has broad application scenarios or is merely a pseudo-problem with no practical demand.&lt;/p&gt;




&lt;h4&gt;
  
  
  01 Relational Data Remains One of the Most Important Data Assets
&lt;/h4&gt;

&lt;p&gt;Large models, big data platforms, and related technologies can work with many kinds of information, including documents, images, audio, and video. Multimodal generative AI, for example, can produce videos and hold voice conversations. However, their results are often open-ended and subjective, and they occasionally "hallucinate." Using them for reference or assistance is therefore acceptable, but in rigorous working environments we cannot rely on them to complete tasks. In sectors such as banking, finance, transportation, trading, accounting, production, and energy, core business data must be managed as structured relational data.&lt;/p&gt;

&lt;h4&gt;
  
  
  02 Data Construction Is Inevitably Distributed
&lt;/h4&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;(1) The normal forms of relational database design&lt;/strong&gt; require data to be divided sensibly to avoid significant redundancy. If the data created while a system is built contains a lot of redundancy, collection work is duplicated and data consistency is hard to guarantee. Conversely, suppose all related data were stored in a single table even though its items came from different business sources, different collectors, and different times. Maintaining such records would be impossible. Data construction therefore naturally organizes data around objects and business activities, which distributes it across many tables.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;(2) Data Inevitably Comes from Multiple Systems&lt;/strong&gt;. IT systems are not built all at once, so they inevitably appear in sequence, and even parts of the same system may be implemented at different times. Moreover, different application scenarios call for different technologies. Business data, real-time data, and log information, for example, may each be handled by a different technology, so data is inherently multi-sourced.&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;
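&lt;p&gt;Point (1) can be made concrete with a minimal Python sketch on hypothetical data (the student, class, and department values are invented for illustration). Suppose a class's department is copied into every student row. Updating only one copy then leaves the data inconsistent. Storing the department once per class and referencing it by key avoids the problem.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;# Hypothetical data for illustration only.

# Denormalized: the department of class 2021_01 is copied into every student row.
flat_rows = [
    {"student": "Zhang San", "class_id": "2021_01", "department": "GEO"},
    {"student": "Li Si", "class_id": "2021_01", "department": "GEO"},
]
# The class moves to another department, but only one copy gets updated.
flat_rows[0]["department"] = "IT"
flat_departments = {r["department"] for r in flat_rows if r["class_id"] == "2021_01"}
print(sorted(flat_departments))  # ['GEO', 'IT']: one class, two departments

# Normalized: students reference the class by key; the department is stored once.
students = [
    {"student": "Zhang San", "class_id": "2021_01"},
    {"student": "Li Si", "class_id": "2021_01"},
]
classes = {"2021_01": {"department": "GEO"}}
classes["2021_01"]["department"] = "IT"  # a single update
normalized_departments = {classes[s["class_id"]]["department"] for s in students}
print(sorted(normalized_departments))  # ['IT']
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;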

&lt;h4&gt;
  
  
  03 Integration is the Most Effective Means of Unlocking Data Value
&lt;/h4&gt;

&lt;p&gt;Data must be integrated before it can be applied, and integration needs take many forms. Integrating production data with planning data shows how far a plan has been completed. Integrating production data with sales data reveals product backlogs or the status of order deliveries. Integrating production data with financial data allows production costs and profitability to be evaluated. Data integration is therefore the most effective way to maximize data value and empower business processes.&lt;/p&gt;
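&lt;p&gt;Here is a minimal sketch of the first example. It assumes two hypothetical tables, &lt;code&gt;plan&lt;/code&gt; and &lt;code&gt;production&lt;/code&gt;, keyed by product and month; the names and figures are invented for illustration. Joining the tables on those keys is what makes plan completion measurable. Python's built-in &lt;code&gt;sqlite3&lt;/code&gt; module keeps the example self-contained.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;import sqlite3

# Hypothetical tables and figures, for illustration only.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE plan (product TEXT, month TEXT, planned_qty INTEGER);
CREATE TABLE production (product TEXT, month TEXT, produced_qty INTEGER);
INSERT INTO plan VALUES ('A', '2024-01', 100), ('B', '2024-01', 200);
INSERT INTO production VALUES ('A', '2024-01', 80), ('B', '2024-01', 200);
""")

# The join condition (same product, same month) links the two data sets.
rows = conn.execute("""
SELECT p.product, p.month, p.planned_qty, d.produced_qty,
       ROUND(100.0 * d.produced_qty / p.planned_qty, 1) AS completion_pct
FROM plan AS p
JOIN production AS d ON d.product = p.product AND d.month = p.month
ORDER BY p.product
""").fetchall()

for row in rows:
    print(row)
# ('A', '2024-01', 100, 80, 80.0)
# ('B', '2024-01', 200, 200, 100.0)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;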

&lt;p&gt;In summary, integrated applications of relational data will remain one of the most important data application scenarios for a long time. As long as this scenario exists, IntaLink will be broadly applicable.&lt;/p&gt;




&lt;h3&gt;
  
  
  Comparison of IntaLink and Large Model Data Integration Methods
&lt;/h3&gt;

&lt;p&gt;&lt;em&gt;T2SQL (Text to SQL)&lt;/em&gt; and &lt;em&gt;NL2SQL (Natural Language to SQL)&lt;/em&gt; automatically generate the required data queries from text or natural-language input. The two terms describe the same idea under different names: using AI to turn semantic understanding into data operations. This is a research direction in data applications, and it has advanced significantly in recent years with the emergence of large models. I have studied technical reports from Alibaba and Tencent and tried open-source projects such as DB-GPT. These technologies are largely similar, at least in their underlying technical logic, whereas IntaLink’s approach is entirely different.&lt;/p&gt;

&lt;p&gt;Let’s set aside the underlying technical logic for now and compare the two approaches by how they are implemented:&lt;/p&gt;

&lt;h4&gt;
  
  
  1. Automatic Data Queries with Large Model Technology Require Training Data
&lt;/h4&gt;

&lt;p&gt;Suppose we have tables T1, T2, ..., Tn, each containing several data items labeled C1, C2, ..., Cm, where the number of items varies from table to table. Consider the following simulated row from table T1:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;C1&lt;/th&gt;
&lt;th&gt;C2&lt;/th&gt;
&lt;th&gt;C3&lt;/th&gt;
&lt;th&gt;C4&lt;/th&gt;
&lt;th&gt;C5&lt;/th&gt;
&lt;th&gt;C6&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Orange&lt;/td&gt;
&lt;td&gt;5&lt;/td&gt;
&lt;td&gt;3&lt;/td&gt;
&lt;td&gt;3&lt;/td&gt;
&lt;td&gt;2&lt;/td&gt;
&lt;td&gt;1&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;From this content alone, we cannot derive any useful information, because we do not know what the data means. Let’s assign two hypothetical meanings to the same row:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Fruit Type&lt;/th&gt;
&lt;th&gt;Warehouse No.&lt;/th&gt;
&lt;th&gt;Shelf No.&lt;/th&gt;
&lt;th&gt;Stock&lt;/th&gt;
&lt;th&gt;Shelf Life&lt;/th&gt;
&lt;th&gt;Warehouse Manager ID&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Orange&lt;/td&gt;
&lt;td&gt;5&lt;/td&gt;
&lt;td&gt;3&lt;/td&gt;
&lt;td&gt;3&lt;/td&gt;
&lt;td&gt;2&lt;/td&gt;
&lt;td&gt;1&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Hotel Name&lt;/th&gt;
&lt;th&gt;Popularity Ranking&lt;/th&gt;
&lt;th&gt;Star Rating&lt;/th&gt;
&lt;th&gt;Years in Business&lt;/th&gt;
&lt;th&gt;Remaining Rooms&lt;/th&gt;
&lt;th&gt;Discount Available&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Orange&lt;/td&gt;
&lt;td&gt;5&lt;/td&gt;
&lt;td&gt;3&lt;/td&gt;
&lt;td&gt;3&lt;/td&gt;
&lt;td&gt;2&lt;/td&gt;
&lt;td&gt;1&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;We won't dwell on whether these datasets are realistic or whether such tables exist. The point is that without knowing what the tables and data items mean, the data cannot be used. We cannot map application needs onto the data, let alone perform more complex data operations.&lt;/p&gt;




&lt;p&gt;Let’s use a well-known NL2SQL benchmark dataset to illustrate how large model technology is applied in this field.&lt;/p&gt;

&lt;p&gt;The Spider dataset is a Text-to-SQL dataset for multi-database, multi-table, single-turn queries, and it is widely regarded as one of the most challenging large-scale cross-domain evaluation benchmarks. It was released by Yale University in 2018 and annotated by 11 Yale students. It contains 10,181 natural language questions and 5,693 unique SQL queries over 200 databases covering 138 different domains. Of these questions, 7,000 are used for training, 1,034 for development, and 2,147 for testing. In other words, the large model is given questions together with their answers (SQL) and learns from them how to use the data. For simplicity, the logic can be condensed as follows:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Question 1&lt;/strong&gt;: How many red lipsticks are in stock?&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Answer 1&lt;/strong&gt;: &lt;code&gt;select amount from warehouse where good_name='lipstick' and color='red'&lt;/code&gt;
&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;After training the model with such a dataset, we can pose the following test question:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Test Question&lt;/strong&gt;: How many blue lipsticks are in stock?&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Output Answer&lt;/strong&gt;: &lt;code&gt;select amount from warehouse where good_name='lipstick' and color='blue'&lt;/code&gt;
&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;This shows that large-model NL2SQL derives likely SQL queries from semantic and contextual understanding, and that it depends on a training dataset.&lt;/p&gt;
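&lt;p&gt;The sketch below shows what such a training pair looks like and how a generated query can be checked by running it. It assumes a hypothetical &lt;code&gt;warehouse&lt;/code&gt; table with invented stock figures. The field names &lt;code&gt;db_id&lt;/code&gt;, &lt;code&gt;question&lt;/code&gt;, and &lt;code&gt;query&lt;/code&gt; follow the layout of Spider's JSON files, while the database name is made up. The model itself is not shown; its output is simply written down as a string.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;import sqlite3

# One question/SQL training pair, laid out like a Spider example.
training_pair = {
    "db_id": "shop",  # hypothetical database name
    "question": "How many red lipsticks are in stock?",
    "query": "select amount from warehouse where good_name='lipstick' and color='red'",
}

# What a trained model might output for the unseen test question (hand-written here).
test_question = "How many blue lipsticks are in stock?"
predicted_sql = "select amount from warehouse where good_name='lipstick' and color='blue'"

# A toy database with invented stock figures, used to execute both queries.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE warehouse (good_name TEXT, color TEXT, amount INTEGER)")
conn.executemany(
    "INSERT INTO warehouse VALUES (?, ?, ?)",
    [("lipstick", "red", 12), ("lipstick", "blue", 7), ("mascara", "black", 5)],
)

print(conn.execute(training_pair["query"]).fetchall())  # [(12,)]
print(conn.execute(predicted_sql).fetchall())           # [(7,)]
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;The predicted query differs from the training answer only in the literal value. That is the kind of analogy-based generalization the training pairs are meant to teach.&lt;/p&gt;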




&lt;h4&gt;
  
  
  2. IntaLink’s Data Integration Method
&lt;/h4&gt;

&lt;p&gt;IntaLink's data integration does not require users to provide any training data. Relationships between data are generated by an inter-table relationship analysis model. This does not require understanding what the tables and data items actually mean. Instead, a set of methods analyzes the data's characteristic values to deduce the associations between tables. The two sample tables below illustrate how an inter-table relationship is established.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Tab_1&lt;/strong&gt;&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Name&lt;/th&gt;
&lt;th&gt;Student_ID&lt;/th&gt;
&lt;th&gt;CLASS&lt;/th&gt;
&lt;th&gt;Age&lt;/th&gt;
&lt;th&gt;Sex&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Zhang San&lt;/td&gt;
&lt;td&gt;2021_0001&lt;/td&gt;
&lt;td&gt;2021_01&lt;/td&gt;
&lt;td&gt;19&lt;/td&gt;
&lt;td&gt;Male&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Li Si&lt;/td&gt;
&lt;td&gt;2021_0002&lt;/td&gt;
&lt;td&gt;2021_01&lt;/td&gt;
&lt;td&gt;18&lt;/td&gt;
&lt;td&gt;Female&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Wang Wu&lt;/td&gt;
&lt;td&gt;2021_0003&lt;/td&gt;
&lt;td&gt;2021_01&lt;/td&gt;
&lt;td&gt;19&lt;/td&gt;
&lt;td&gt;Male&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;&lt;strong&gt;Tab_2&lt;/strong&gt;&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Student_ID&lt;/th&gt;
&lt;th&gt;Course&lt;/th&gt;
&lt;th&gt;Grade&lt;/th&gt;
&lt;th&gt;Rank&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;2021_0001&lt;/td&gt;
&lt;td&gt;Math&lt;/td&gt;
&lt;td&gt;135&lt;/td&gt;
&lt;td&gt;18&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;2021_0001&lt;/td&gt;
&lt;td&gt;Chinese&lt;/td&gt;
&lt;td&gt;110&lt;/td&gt;
&lt;td&gt;23&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;2021_0002&lt;/td&gt;
&lt;td&gt;Math&lt;/td&gt;
&lt;td&gt;120&lt;/td&gt;
&lt;td&gt;25&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;2021_0002&lt;/td&gt;
&lt;td&gt;Chinese&lt;/td&gt;
&lt;td&gt;125&lt;/td&gt;
&lt;td&gt;10&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;The Student_ID values in Tab_1 match those in Tab_2, so the two columns share the same characteristic values. The condition Tab_1.Student_ID = Tab_2.Student_ID therefore holds, and it can be used to link the two tables. In practice, inter-table linkage analysis must take many factors into account. In IntaLink, the data's characteristic values are replicated into an in-memory database that serves as the analysis tool, and a set of optimized analytical methods produces the inter-table relationship results. Because the details are complex, we will not elaborate here; a separate article will discuss the implementation logic.&lt;/p&gt;
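&lt;p&gt;The idea of matching characteristic values can be illustrated with a deliberately simplified heuristic. This is not IntaLink's algorithm, which the article leaves to a later post. It is a minimal Python sketch that treats a column pair as a join candidate when a large enough share of the distinct values in one column (by default, all of them) also appears in the other. The function name &lt;code&gt;candidate_join_keys&lt;/code&gt; and the &lt;code&gt;min_containment&lt;/code&gt; threshold are hypothetical.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;# Simplified value-overlap heuristic for illustration; NOT IntaLink's actual method.
tab_1 = {
    "Name": ["Zhang San", "Li Si", "Wang Wu"],
    "Student_ID": ["2021_0001", "2021_0002", "2021_0003"],
    "CLASS": ["2021_01", "2021_01", "2021_01"],
    "Age": [19, 18, 19],
    "Sex": ["Male", "Female", "Male"],
}
tab_2 = {
    "Student_ID": ["2021_0001", "2021_0001", "2021_0002", "2021_0002"],
    "Course": ["Math", "Chinese", "Math", "Chinese"],
    "Grade": [135, 110, 120, 125],
    "Rank": [18, 23, 25, 10],
}


def candidate_join_keys(left, right, min_containment=1.0):
    """Return (left_column, right_column, score) for column pairs whose values overlap.

    score is the share of the right column's distinct values that also
    occur in the left column (1.0 means every value is found).
    """
    candidates = []
    for left_name, left_values in left.items():
        left_set = set(left_values)
        for right_name, right_values in right.items():
            right_set = set(right_values)
            shared = left_set.intersection(right_set)
            if not shared:
                continue
            score = len(shared) / len(right_set)
            if score >= min_containment:
                candidates.append((left_name, right_name, score))
    return candidates


print(candidate_join_keys(tab_1, tab_2))  # [('Student_ID', 'Student_ID', 1.0)]
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;Lowering &lt;code&gt;min_containment&lt;/code&gt; to 0.25 would also pair Age with Rank, simply because both contain the value 18. Coincidental overlaps like this are one reason a real inter-table analysis has to weigh many more factors.&lt;/p&gt;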




&lt;h4&gt;
  
  
  Differences Between IntaLink and Large Model Technologies in Implementing NL2SQL
&lt;/h4&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;1)&lt;/strong&gt; No training question set has to be prepared for a large model; relationships are derived through data analysis instead. IntaLink can therefore be applied to a wide range of data, and the more data there is to integrate, the greater its advantage.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;2)&lt;/strong&gt; It focuses on data integration, specifically on generating the relationship conditions used during integration, rather than on how the data is used. Note: data integration concerns establishing relationships among multiple tables, while data usage can take many forms, such as summation, counting, averaging, or finding minimum and maximum values. NL2SQL selects the appropriate operation from the semantics of the request, such as SUM, COUNT, AVG, MIN, or MAX.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;3)&lt;/strong&gt; High accuracy: setting aside data quality issues, the relationship conditions generated by IntaLink can in theory be 100% accurate.&lt;/li&gt;
&lt;/ul&gt;
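&lt;p&gt;Point 2) can be illustrated by keeping the two parts of a query separate. This minimal Python sketch uses the built-in &lt;code&gt;sqlite3&lt;/code&gt; module and the sample Tab_1/Tab_2 rows above. The join condition stays fixed, as IntaLink would supply it. The aggregate function varies, as an NL2SQL layer would choose it from the question. The helper &lt;code&gt;aggregate&lt;/code&gt; is hypothetical.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Tab_1 (Name TEXT, Student_ID TEXT, CLASS TEXT, Age INTEGER, Sex TEXT);
CREATE TABLE Tab_2 (Student_ID TEXT, Course TEXT, Grade INTEGER, Rank INTEGER);
INSERT INTO Tab_1 VALUES
    ('Zhang San', '2021_0001', '2021_01', 19, 'Male'),
    ('Li Si', '2021_0002', '2021_01', 18, 'Female'),
    ('Wang Wu', '2021_0003', '2021_01', 19, 'Male');
INSERT INTO Tab_2 VALUES
    ('2021_0001', 'Math', 135, 18),
    ('2021_0001', 'Chinese', 110, 23),
    ('2021_0002', 'Math', 120, 25),
    ('2021_0002', 'Chinese', 125, 10);
""")

# Relationship condition: the integration part (what IntaLink generates).
JOIN_CLAUSE = "FROM Tab_1 JOIN Tab_2 ON Tab_1.Student_ID = Tab_2.Student_ID"

# Usage method: chosen from the question's semantics; whitelisted because it is
# interpolated into the SQL text.
ALLOWED = {"SUM", "COUNT", "AVG", "MIN", "MAX"}


def aggregate(func):
    if func not in ALLOWED:
        raise ValueError(f"unsupported aggregate: {func}")
    sql = (
        f"SELECT Tab_1.Name, {func}(Tab_2.Grade) {JOIN_CLAUSE} "
        "GROUP BY Tab_1.Name ORDER BY Tab_1.Name"
    )
    return conn.execute(sql).fetchall()


# Wang Wu has no grades, so the inner join leaves him out.
print(aggregate("MAX"))  # [('Li Si', 125), ('Zhang San', 135)]
print(aggregate("AVG"))  # [('Li Si', 122.5), ('Zhang San', 122.5)]
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;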

&lt;h4&gt;
  
  
  Potential Combination of IntaLink and Large Model Technologies
&lt;/h4&gt;

&lt;p&gt;Large model technologies excel at semantic understanding and content generation. IntaLink's strengths lie in data association analysis, with less upfront work and higher accuracy. Ideally, the two would be combined in three steps. First, a large model interprets the user's request and converts it into the required data tables and items. Next, IntaLink generates the linked data set. Finally, the large model produces the desired output (e.g., reports or charts) for the user.&lt;/p&gt;
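&lt;p&gt;The proposed division of labor can be sketched as a three-stage pipeline. Everything in this Python sketch is hypothetical. &lt;code&gt;understand_request&lt;/code&gt; is a hard-coded stand-in for a large model. &lt;code&gt;link_tables&lt;/code&gt; looks up relationships in a hand-written dictionary that stands in for IntaLink's analysis results. The final presentation stage is left out. The sketch only handles tables that are directly related; multi-hop routes would need a path search.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;# Hypothetical pipeline: every component is a stand-in, not a real API.

# Relationships assumed to have been derived by inter-table analysis.
RELATIONSHIPS = {
    frozenset({"Tab_1", "Tab_2"}): "Tab_1.Student_ID = Tab_2.Student_ID",
}


def understand_request(question):
    """Stage 1 stand-in for a large model: map a question to data items."""
    # A real system would call a language model; this stub ignores its input.
    return ["Tab_1.Name", "Tab_2.Course", "Tab_2.Grade"]


def link_tables(items):
    """Stage 2 stand-in for IntaLink: find conditions linking the tables used."""
    tables = sorted({item.split(".")[0] for item in items})
    conditions = []
    for i, left in enumerate(tables):
        for right in tables[i + 1:]:
            condition = RELATIONSHIPS.get(frozenset({left, right}))
            if condition is not None:
                conditions.append(condition)
    return tables, conditions


def build_sql(items, tables, conditions):
    """Assemble the query whose result a later stage would turn into reports or charts."""
    sql = f"SELECT {', '.join(items)} FROM {', '.join(tables)}"
    if conditions:
        sql += " WHERE " + " AND ".join(conditions)
    return sql


items = understand_request("Show each student's grade in every course")
tables, conditions = link_tables(items)
print(build_sql(items, tables, conditions))
# SELECT Tab_1.Name, Tab_2.Course, Tab_2.Grade FROM Tab_1, Tab_2 WHERE Tab_1.Student_ID = Tab_2.Student_ID
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;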




&lt;h2&gt;
  
  
  Join the IntaLink Community!
&lt;/h2&gt;

&lt;p&gt;We would love for you to be a part of the IntaLink journey! Connect with us and contribute to our project:&lt;/p&gt;

&lt;p&gt;🔗 &lt;a href="https://github.com/YT-DATA/INTALINK" rel="noopener noreferrer"&gt;GitHub Repository: IntaLink&lt;/a&gt;&lt;br&gt;&lt;br&gt;
💬 &lt;a href="https://discord.gg/FvhqEZ6z95" rel="noopener noreferrer"&gt;Join our Discord Community&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Be a part of the open-source revolution and help us shape the future of intelligent data integration!&lt;/p&gt;

</description>
      <category>github</category>
      <category>sql</category>
      <category>java</category>
      <category>datascience</category>
    </item>
    <item>
      <title>Transforming Data Linkage: An In-Depth Look at IntaLink</title>
      <dc:creator>intalink</dc:creator>
      <pubDate>Tue, 08 Oct 2024 02:20:32 +0000</pubDate>
      <link>https://dev.to/intalink/transforming-data-linkage-an-in-depth-look-at-intalink-27a7</link>
      <guid>https://dev.to/intalink/transforming-data-linkage-an-in-depth-look-at-intalink-27a7</guid>
      <description>&lt;h1&gt;
  
  
  In-depth Analysis of IntaLink Data Auto-Linking Platform's Product Strength!
&lt;/h1&gt;

&lt;p&gt;&lt;strong&gt;Hidden Gem, Yuantuo Data Intelligence&lt;/strong&gt;  &lt;/p&gt;




&lt;h2&gt;
  
  
  1. The Goal of IntaLink
&lt;/h2&gt;

&lt;p&gt;In one sentence: &lt;strong&gt;IntaLink's goal is to achieve automatic data linkage in the field of data integration.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Let's break down this definition:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;IntaLink's application scenario is data integration. The simplest case is linking multiple data tables within the same system; a more complex case is linking data across heterogeneous sources.&lt;/li&gt;
&lt;li&gt;For data integration applications, relationships between tables need to be established.&lt;/li&gt;
&lt;li&gt;The data to be integrated must be able to form linkable relationships.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;With the above conditions met, IntaLink’s goal is: Given the data tables and data items specified by the user, IntaLink will provide the available data linkage routes.&lt;/p&gt;




&lt;h2&gt;
  
  
  2. The Role of IntaLink
&lt;/h2&gt;

&lt;p&gt;The following scenario illustrates the problem IntaLink solves. Working out its data relationships takes careful thought, and that difficulty is exactly what highlights IntaLink's value.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Scenario&lt;/strong&gt;:&lt;br&gt;&lt;br&gt;
A university has different departments. Each department is identified by an abbreviation, and the table is defined as &lt;code&gt;T_A&lt;/code&gt;. Sample data:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;DEPARTMENT_ID&lt;/th&gt;
&lt;th&gt;DEPART_NAME&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;GEO&lt;/td&gt;
&lt;td&gt;School of Earth Sciences&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;IT&lt;/td&gt;
&lt;td&gt;School of Information Engineering&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;Each department has several classes, and each class has a unique ID based on the enrollment year and a class number. This table is &lt;code&gt;T_B&lt;/code&gt;. Sample data:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;CLASSES_ID&lt;/th&gt;
&lt;th&gt;CLASSES_NAME&lt;/th&gt;
&lt;th&gt;DEPARTMENT&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;2020_01&lt;/td&gt;
&lt;td&gt;Earth Sciences Class 1 (2020)&lt;/td&gt;
&lt;td&gt;GEO&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;2020_02&lt;/td&gt;
&lt;td&gt;Earth Sciences Class 2 (2020)&lt;/td&gt;
&lt;td&gt;GEO&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;Each class has students, and each student has a unique ID. This table is &lt;code&gt;T_C&lt;/code&gt;. Sample data:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;STUDENT_ID&lt;/th&gt;
&lt;th&gt;STUDENT_NAME&lt;/th&gt;
&lt;th&gt;CLASSES&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;202000001&lt;/td&gt;
&lt;td&gt;Zhang San&lt;/td&gt;
&lt;td&gt;2020_01&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;202000002&lt;/td&gt;
&lt;td&gt;Li Si&lt;/td&gt;
&lt;td&gt;2020_02&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;The university offers various courses. Each course has a course code, maximum score, and credits. This table is &lt;code&gt;T_D&lt;/code&gt;. Sample data:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;CLASS_CODE&lt;/th&gt;
&lt;th&gt;CLASS_TITLE&lt;/th&gt;
&lt;th&gt;FULL_SCORE&lt;/th&gt;
&lt;th&gt;CREDIT&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;MATH_01&lt;/td&gt;
&lt;td&gt;Advanced Math I&lt;/td&gt;
&lt;td&gt;100&lt;/td&gt;
&lt;td&gt;4&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;Different departments have different pass scores for the same course. This table is &lt;code&gt;T_E&lt;/code&gt;. Sample data:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;DEPARTMENT&lt;/th&gt;
&lt;th&gt;CLASS&lt;/th&gt;
&lt;th&gt;PASS_SCORE&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;GEO&lt;/td&gt;
&lt;td&gt;MATH_02&lt;/td&gt;
&lt;td&gt;60&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;IT&lt;/td&gt;
&lt;td&gt;MATH_02&lt;/td&gt;
&lt;td&gt;75&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;Different semesters offer different courses, and students have scores for each course. This table is &lt;code&gt;T_F&lt;/code&gt;. Sample data:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;STUDENT_ID&lt;/th&gt;
&lt;th&gt;TERM&lt;/th&gt;
&lt;th&gt;CLASS&lt;/th&gt;
&lt;th&gt;SCORE&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;202000001&lt;/td&gt;
&lt;td&gt;2023_1&lt;/td&gt;
&lt;td&gt;MATH_02&lt;/td&gt;
&lt;td&gt;85&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;Based on this scenario, the requirement is to list each student’s courses for the 2023_1 semester, showing their score and the passing score. The result might look like this:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Class&lt;/th&gt;
&lt;th&gt;Name&lt;/th&gt;
&lt;th&gt;Term&lt;/th&gt;
&lt;th&gt;Course&lt;/th&gt;
&lt;th&gt;Pass Score&lt;/th&gt;
&lt;th&gt;Score&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Earth Sciences Class 1 (2020)&lt;/td&gt;
&lt;td&gt;Zhang San&lt;/td&gt;
&lt;td&gt;2023_1&lt;/td&gt;
&lt;td&gt;Advanced Math II&lt;/td&gt;
&lt;td&gt;60&lt;/td&gt;
&lt;td&gt;85&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;The critical challenge lies in determining which tables to link and ensuring the relationships between tables are correctly interpreted. For example, a student is not directly linked to a department but to a class, and the class belongs to a department.&lt;/p&gt;
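&lt;p&gt;For reference, here is one way to write the required query by hand once the relationships are known. This minimal Python sketch loads the sample rows above into an in-memory SQLite database using the built-in &lt;code&gt;sqlite3&lt;/code&gt; module. It makes two assumptions. First, all IDs are stored as text. Second, the sample &lt;code&gt;T_D&lt;/code&gt; has no MATH_02 row, so one is added with the title "Advanced Math II" taken from the expected result, and its unknown score and credit are left as NULL. Note that &lt;code&gt;T_E&lt;/code&gt; must be matched on two columns: the department (reached through the student's class) and the course code. &lt;code&gt;T_A&lt;/code&gt; is not needed because no department name is requested.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE T_A (DEPARTMENT_ID TEXT, DEPART_NAME TEXT);
CREATE TABLE T_B (CLASSES_ID TEXT, CLASSES_NAME TEXT, DEPARTMENT TEXT);
CREATE TABLE T_C (STUDENT_ID TEXT, STUDENT_NAME TEXT, CLASSES TEXT);
CREATE TABLE T_D (CLASS_CODE TEXT, CLASS_TITLE TEXT, FULL_SCORE INTEGER, CREDIT INTEGER);
CREATE TABLE T_E (DEPARTMENT TEXT, CLASS TEXT, PASS_SCORE INTEGER);
CREATE TABLE T_F (STUDENT_ID TEXT, TERM TEXT, CLASS TEXT, SCORE INTEGER);
INSERT INTO T_A VALUES ('GEO', 'School of Earth Sciences'), ('IT', 'School of Information Engineering');
INSERT INTO T_B VALUES ('2020_01', 'Earth Sciences Class 1 (2020)', 'GEO'),
                       ('2020_02', 'Earth Sciences Class 2 (2020)', 'GEO');
INSERT INTO T_C VALUES ('202000001', 'Zhang San', '2020_01'), ('202000002', 'Li Si', '2020_02');
-- MATH_02 row assumed from the expected result, score and credit unknown
INSERT INTO T_D VALUES ('MATH_01', 'Advanced Math I', 100, 4), ('MATH_02', 'Advanced Math II', NULL, NULL);
INSERT INTO T_E VALUES ('GEO', 'MATH_02', 60), ('IT', 'MATH_02', 75);
INSERT INTO T_F VALUES ('202000001', '2023_1', 'MATH_02', 85);
""")

QUERY = """
SELECT b.CLASSES_NAME, c.STUDENT_NAME, f.TERM, d.CLASS_TITLE, e.PASS_SCORE, f.SCORE
FROM T_F AS f
JOIN T_C AS c ON c.STUDENT_ID = f.STUDENT_ID      -- score -> student
JOIN T_B AS b ON b.CLASSES_ID = c.CLASSES         -- student -> class
JOIN T_D AS d ON d.CLASS_CODE = f.CLASS           -- score -> course title
JOIN T_E AS e ON e.DEPARTMENT = b.DEPARTMENT      -- class -> pass score of its department
             AND e.CLASS = f.CLASS                --   ... for this course
WHERE f.TERM = '2023_1'
"""

for row in conn.execute(QUERY):
    print(row)
# ('Earth Sciences Class 1 (2020)', 'Zhang San', '2023_1', 'Advanced Math II', 60, 85)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;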




&lt;h2&gt;
  
  
  3. Problems Solved by IntaLink
&lt;/h2&gt;

&lt;p&gt;You might think this is just a standard multi-table linkage that can easily be written as an SQL query. The real challenge, however, is identifying which tables to use, especially when there are many tables and fields spread across different applications.&lt;/p&gt;

&lt;p&gt;For instance, a university may run dozens of application systems, each containing many tables. Staff outside IT who request data may not know which table holds what they need. &lt;strong&gt;IntaLink automatically generates the necessary links between the data tables, reducing the complexity of data analysis and saving significant development time.&lt;/strong&gt;&lt;/p&gt;
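&lt;p&gt;Finding a route between tables can be viewed as a graph search. The Python sketch below assumes the relationships among T_A to T_F have already been identified; the edge list is written by hand here, not produced by IntaLink. It uses breadth-first search to return the join conditions along a shortest route, for example from a student in &lt;code&gt;T_C&lt;/code&gt; to a department in &lt;code&gt;T_A&lt;/code&gt;. The function names are hypothetical, and the sketch follows single-column links only. A lookup such as the pass score in &lt;code&gt;T_E&lt;/code&gt;, which needs both department and course, requires more than a shortest path.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;from collections import deque

# Hypothetical, hand-written relationships: (table, column, table, column).
EDGES = [
    ("T_A", "DEPARTMENT_ID", "T_B", "DEPARTMENT"),
    ("T_A", "DEPARTMENT_ID", "T_E", "DEPARTMENT"),
    ("T_B", "CLASSES_ID", "T_C", "CLASSES"),
    ("T_C", "STUDENT_ID", "T_F", "STUDENT_ID"),
    ("T_D", "CLASS_CODE", "T_E", "CLASS"),
    ("T_D", "CLASS_CODE", "T_F", "CLASS"),
]


def build_graph(edges):
    """Undirected adjacency list: table -> [(neighbor table, join condition)]."""
    graph = {}
    for t1, c1, t2, c2 in edges:
        condition = f"{t1}.{c1} = {t2}.{c2}"
        graph.setdefault(t1, []).append((t2, condition))
        graph.setdefault(t2, []).append((t1, condition))
    return graph


def find_link_route(graph, start, goal):
    """Breadth-first search: join conditions along a shortest route, or None."""
    queue = deque([start])
    came_from = {start: None}  # table -> (previous table, condition used)
    while queue:
        table = queue.popleft()
        if table == goal:
            break
        for neighbor, condition in graph.get(table, []):
            if neighbor not in came_from:
                came_from[neighbor] = (table, condition)
                queue.append(neighbor)
    if goal not in came_from:
        return None
    # Walk back from the goal to the start, then reverse into travel order.
    conditions = []
    node = goal
    while came_from[node] is not None:
        previous, condition = came_from[node]
        conditions.append(condition)
        node = previous
    return list(reversed(conditions))


graph = build_graph(EDGES)
print(find_link_route(graph, "T_C", "T_A"))
# ['T_B.CLASSES_ID = T_C.CLASSES', 'T_A.DEPARTMENT_ID = T_B.DEPARTMENT']
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;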




&lt;h2&gt;
  
  
  Conclusion
&lt;/h2&gt;

&lt;p&gt;IntaLink solves the following key challenges:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;No need to understand underlying business logic—just focus on the data integration goal.&lt;/li&gt;
&lt;li&gt;No need to manually identify which tables to link—IntaLink determines the relationships.&lt;/li&gt;
&lt;li&gt;Significantly reduces the time spent on data analysis and development, enhancing efficiency by over 10 times.&lt;/li&gt;
&lt;/ul&gt;




&lt;h2&gt;
  
  
  Join the IntaLink Community!
&lt;/h2&gt;

&lt;p&gt;We would love for you to be a part of the IntaLink journey! Connect with us and contribute to our project:&lt;/p&gt;

&lt;p&gt;🔗 &lt;a href="https://github.com/YT-DATA/INTALINK" rel="noopener noreferrer"&gt;GitHub Repository: IntaLink&lt;/a&gt;&lt;br&gt;&lt;br&gt;
💬 &lt;a href="https://discord.gg/FvhqEZ6z95" rel="noopener noreferrer"&gt;Join our Discord Community&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Be a part of the open-source revolution and help us shape the future of intelligent data integration!&lt;/p&gt;

</description>
      <category>java</category>
      <category>github</category>
      <category>cloud</category>
      <category>data</category>
    </item>
  </channel>
</rss>
