<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>DEV Community: Dorcas Bwire</title>
    <description>The latest articles on DEV Community by Dorcas Bwire (@dorcas_bwire).</description>
    <link>https://dev.to/dorcas_bwire</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F2871269%2F6858d96d-cac8-47e1-ae8e-a2bd75770b10.jpeg</url>
      <title>DEV Community: Dorcas Bwire</title>
      <link>https://dev.to/dorcas_bwire</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://dev.to/feed/dorcas_bwire"/>
    <language>en</language>
    <item>
      <title>[Boost]</title>
      <dc:creator>Dorcas Bwire</dc:creator>
      <pubDate>Tue, 15 Apr 2025 09:47:56 +0000</pubDate>
      <link>https://dev.to/dorcas_bwire/-3b1m</link>
      <guid>https://dev.to/dorcas_bwire/-3b1m</guid>
      <description>&lt;div class="ltag__link--embedded"&gt;
  &lt;div class="crayons-story "&gt;
  &lt;a href="https://dev.to/eteimz/understanding-langchains-recursivecharactertextsplitter-2846" class="crayons-story__hidden-navigation-link"&gt;Understanding LangChain's RecursiveCharacterTextSplitter&lt;/a&gt;


  &lt;div class="crayons-story__body crayons-story__body-full_post"&gt;
    &lt;div class="crayons-story__top"&gt;
      &lt;div class="crayons-story__meta"&gt;
        &lt;div class="crayons-story__author-pic"&gt;

          &lt;a href="/eteimz" class="crayons-avatar  crayons-avatar--l  "&gt;
            &lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F673619%2Fec8d5488-46cf-4f85-a1b2-4ffc4cfb3c7a.png" alt="eteimz profile" class="crayons-avatar__image"&gt;
          &lt;/a&gt;
        &lt;/div&gt;
        &lt;div&gt;
          &lt;div&gt;
            &lt;a href="/eteimz" class="crayons-story__secondary fw-medium m:hidden"&gt;
              Youdiowei Eteimorde
            &lt;/a&gt;

          &lt;/div&gt;
          &lt;a href="https://dev.to/eteimz/understanding-langchains-recursivecharactertextsplitter-2846" class="crayons-story__tertiary fs-xs"&gt;&lt;time&gt;Aug 12 '23&lt;/time&gt;&lt;span class="time-ago-indicator-initial-placeholder"&gt;&lt;/span&gt;&lt;/a&gt;
        &lt;/div&gt;
      &lt;/div&gt;

    &lt;/div&gt;

    &lt;div class="crayons-story__indention"&gt;
      &lt;h2 class="crayons-story__title crayons-story__title-full_post"&gt;
        &lt;a href="https://dev.to/eteimz/understanding-langchains-recursivecharactertextsplitter-2846" id="article-link-1565237"&gt;
          Understanding LangChain's RecursiveCharacterTextSplitter
        &lt;/a&gt;
      &lt;/h2&gt;
        &lt;div class="crayons-story__tags"&gt;
            &lt;a class="crayons-tag  crayons-tag--monochrome " href="/t/python"&gt;&lt;span class="crayons-tag__prefix"&gt;#&lt;/span&gt;python&lt;/a&gt;
            &lt;a class="crayons-tag  crayons-tag--monochrome " href="/t/ai"&gt;&lt;span class="crayons-tag__prefix"&gt;#&lt;/span&gt;ai&lt;/a&gt;
            &lt;a class="crayons-tag  crayons-tag--monochrome " href="/t/chatgpt"&gt;&lt;span class="crayons-tag__prefix"&gt;#&lt;/span&gt;chatgpt&lt;/a&gt;
            &lt;a class="crayons-tag  crayons-tag--monochrome " href="/t/langchain"&gt;&lt;span class="crayons-tag__prefix"&gt;#&lt;/span&gt;langchain&lt;/a&gt;
        &lt;/div&gt;
      &lt;div class="crayons-story__bottom"&gt;
        &lt;div class="crayons-story__details"&gt;
          &lt;a href="https://dev.to/eteimz/understanding-langchains-recursivecharactertextsplitter-2846" class="crayons-btn crayons-btn--s crayons-btn--ghost crayons-btn--icon-left"&gt;
            &lt;div class="multiple_reactions_aggregate"&gt;
              &lt;span class="multiple_reactions_icons_container"&gt;
                  &lt;span class="crayons_icon_container"&gt;
                    &lt;img src="https://assets.dev.to/assets/multi-unicorn-b44d6f8c23cdd00964192bedc38af3e82463978aa611b4365bd33a0f1f4f3e97.svg" width="18" height="18"&gt;
                  &lt;/span&gt;
                  &lt;span class="crayons_icon_container"&gt;
                    &lt;img src="https://assets.dev.to/assets/fire-f60e7a582391810302117f987b22a8ef04a2fe0df7e3258a5f49332df1cec71e.svg" width="18" height="18"&gt;
                  &lt;/span&gt;
                  &lt;span class="crayons_icon_container"&gt;
                    &lt;img src="https://assets.dev.to/assets/sparkle-heart-5f9bee3767e18deb1bb725290cb151c25234768a0e9a2bd39370c382d02920cf.svg" width="18" height="18"&gt;
                  &lt;/span&gt;
              &lt;/span&gt;
              &lt;span class="aggregate_reactions_counter"&gt;270&lt;span class="hidden s:inline"&gt; reactions&lt;/span&gt;&lt;/span&gt;
            &lt;/div&gt;
          &lt;/a&gt;
            &lt;a href="https://dev.to/eteimz/understanding-langchains-recursivecharactertextsplitter-2846#comments" class="crayons-btn crayons-btn--s crayons-btn--ghost crayons-btn--icon-left flex items-center"&gt;
              Comments


              29&lt;span class="hidden s:inline"&gt; comments&lt;/span&gt;
            &lt;/a&gt;
        &lt;/div&gt;
        &lt;div class="crayons-story__save"&gt;
          &lt;small class="crayons-story__tertiary fs-xs mr-2"&gt;
            7 min read
          &lt;/small&gt;
            
            
        &lt;/div&gt;
      &lt;/div&gt;
    &lt;/div&gt;
  &lt;/div&gt;
&lt;/div&gt;

&lt;/div&gt;


</description>
      <category>python</category>
      <category>ai</category>
      <category>chatgpt</category>
      <category>langchain</category>
    </item>
    <item>
      <title>Regression with CART Trees</title>
      <dc:creator>Dorcas Bwire</dc:creator>
      <pubDate>Mon, 07 Apr 2025 13:48:53 +0000</pubDate>
      <link>https://dev.to/dorcas_bwire/regression-with-cart-trees-50ng</link>
      <guid>https://dev.to/dorcas_bwire/regression-with-cart-trees-50ng</guid>
      <description>&lt;p&gt;&lt;strong&gt;Introduction&lt;/strong&gt;&lt;br&gt;
Classification and Regression Trees (CART) are a non-parametric decision tree learning technique that produces either classification or regression trees, depending on whether the target variable is categorical or continuous. Here the focus is on regression, where the goal is to predict a continuous output variable. &lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Mode of Operation&lt;/strong&gt;: &lt;/p&gt;

&lt;p&gt;The CART algorithm builds a binary tree in which each non-leaf node splits its portion of the dataset into exactly two subsets. Each non-leaf node represents a single input variable (x) and a split point on that variable. The dataset is thus partitioned into progressively smaller subsets according to a splitting criterion: Gini impurity or entropy for classification, and variance (equivalently, mean squared error) for regression. Splitting continues until the terminal (leaf) nodes of the tree are reached. &lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Process of Building the Trees&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Ffif6h6z9o8neeeuknjxi.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Ffif6h6z9o8neeeuknjxi.png" alt="Image description" width="800" height="450"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt; &lt;strong&gt;Feature Selection&lt;/strong&gt;
The features of the data are evaluated to identify the one that best splits the data. The input variable and its split point are chosen with a greedy algorithm that minimizes a cost function such as the mean squared error. &lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Binary Splitting&lt;/strong&gt;&lt;br&gt;
Once the best feature and split point are selected, the data is split into two child nodes.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Recursive Tree Building&lt;/strong&gt;&lt;br&gt;
The splitting is repeated on each child node until a stopping criterion is met, such as a minimum number of samples in a node or a maximum tree depth. &lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Tree Pruning&lt;/strong&gt;&lt;br&gt;
Once the full tree is built, pruning begins. Sections of the tree are examined to identify branches that can be removed without a significant loss in prediction accuracy. The simplest approach works through each leaf node in the tree and evaluates the effect of removing it using a hold-out test set. A leaf node is removed only if doing so lowers the overall cost function on the entire test set. &lt;/p&gt;&lt;/li&gt;
&lt;/ol&gt;
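
&lt;p&gt;The feature selection and binary splitting steps above can be sketched for a single numeric feature as follows. This is a minimal illustration, not a full CART implementation: the function names and the toy data are assumptions made for the example, and the cost is the total squared error of the two child nodes, each predicting its own mean.&lt;/p&gt;

```python
def squared_error(values):
    """Sum of squared deviations from the mean (0.0 for an empty list)."""
    if not values:
        return 0.0
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values)


def best_split(xs, ys):
    """Greedy search for the threshold on one feature that minimizes the
    total squared error of the two child nodes. Returns (threshold, cost)."""
    pairs = sorted(zip(xs, ys))
    best_threshold, best_cost = None, float("inf")
    for i in range(1, len(pairs)):
        left_x, right_x = pairs[i - 1][0], pairs[i][0]
        if left_x == right_x:
            continue  # equal feature values cannot be separated
        threshold = (left_x + right_x) / 2  # midpoint between neighbours
        left = [y for _, y in pairs[:i]]
        right = [y for _, y in pairs[i:]]
        cost = squared_error(left) + squared_error(right)
        if best_cost > cost:
            best_threshold, best_cost = threshold, cost
    return best_threshold, best_cost


# Hypothetical toy data: the target jumps between x = 3 and x = 10.
xs = [1, 2, 3, 10, 11, 12]
ys = [5.0, 6.0, 5.5, 20.0, 21.0, 19.5]
threshold, cost = best_split(xs, ys)
print(threshold, round(cost, 3))  # 6.5 1.667
```

&lt;p&gt;A full tree would apply the same search to every feature, keep the best (feature, threshold) pair, and then recurse on each child node until a stopping criterion is met.&lt;/p&gt;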

&lt;p&gt;&lt;strong&gt;Application of the CART Algorithm&lt;/strong&gt;&lt;br&gt;
The CART algorithm has diverse applications, owing to its ability to handle both classification and regression problems, coupled with the transparent nature of decision trees. This lets it provide valuable insights and predictions across different domains. &lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F39y7lz3mjlc3hm5bg4gq.PNG" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F39y7lz3mjlc3hm5bg4gq.PNG" alt="Image description" width="800" height="447"&gt;&lt;/a&gt; &lt;/p&gt;

&lt;p&gt;In the healthcare sector, the importance of timely and accurate diagnosis cannot be overstated. CART can help predict the likelihood of a patient having a particular disease based on symptoms and test results. It can also help determine the risk of patients developing post-operative complications, based on factors like age, surgery type and pre-existing conditions. In finance, the CART algorithm can help predict the creditworthiness of customers based on variables like debt ratios, employment status and income. &lt;/p&gt;

</description>
      <category>machinelearning</category>
      <category>decisiontrees</category>
    </item>
    <item>
      <title>Chi-Square Tests and Degrees of Freedom</title>
      <dc:creator>Dorcas Bwire</dc:creator>
      <pubDate>Fri, 07 Mar 2025 19:47:32 +0000</pubDate>
      <link>https://dev.to/dorcas_bwire/chi-square-tests-and-degrees-of-freedom-4op3</link>
      <guid>https://dev.to/dorcas_bwire/chi-square-tests-and-degrees-of-freedom-4op3</guid>
      <description>&lt;p&gt;The Chi-Square test, or χ² test, assesses whether a relationship exists between two categorical variables. To illustrate, this article follows concert organizers who use chi-square tests to determine whether the genre of music (Afro or Jazz) affects audience attendance. Essentially, the test checks whether the observed data fit the frequencies that would be expected if there were no association between music genre and attendance.&lt;/p&gt;

&lt;p&gt;To compute the chi-square statistic, the following formula is used:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F4k3o1y0i09fh00cr5n46.PNG" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F4k3o1y0i09fh00cr5n46.PNG" alt="Image description" width="689" height="135"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Where O is the observed value and&lt;br&gt;
    E is the expected value, with the sum taken over all categories (or cells of the table).&lt;br&gt;
At a significance level of 0.05, if the p-value &amp;lt;= 0.05 we reject the null hypothesis, and if the p-value &amp;gt; 0.05 we fail to reject the null hypothesis. &lt;br&gt;
The steps for conducting the chi-square test are:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt; Define the null and alternative hypotheses&lt;/li&gt;
&lt;li&gt; Gather and organize the data&lt;/li&gt;
&lt;li&gt; Calculate the expected frequencies &lt;/li&gt;
&lt;li&gt; Compute the chi-square statistic&lt;/li&gt;
&lt;li&gt; Draw the conclusion&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;Degrees of freedom (df) indicate the number of independent values that are free to vary in an analysis without breaking any constraints. In chi-square tests, the degrees of freedom are calculated differently for each of the three types of test:&lt;/p&gt;

&lt;p&gt;a). &lt;strong&gt;Goodness of Fit Test&lt;/strong&gt;&lt;br&gt;
This test checks whether the observed distribution of a single categorical variable matches the expected distribution. In this context, we analyze the frequency distribution of how often audiences choose Afro versus Jazz concerts.&lt;br&gt;
df = k-1 where:&lt;br&gt;
k = number of categories&lt;/p&gt;
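
&lt;p&gt;As a rough sketch of the goodness-of-fit calculation, the snippet below sums (O - E)² / E over the categories and derives df = k - 1. The function name and the attendance counts are hypothetical, chosen only to illustrate the arithmetic.&lt;/p&gt;

```python
def chi_square_goodness_of_fit(observed, expected):
    """Return (statistic, df) for a chi-square goodness-of-fit test."""
    if len(observed) != len(expected):
        raise ValueError("observed and expected must have the same length")
    # Sum of (O - E)^2 / E over all categories.
    statistic = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    df = len(observed) - 1  # df = k - 1
    return statistic, df


# Hypothetical counts: 200 concert-goers choosing Afro or Jazz,
# compared with an expected 50/50 split.
observed = [120, 80]
expected = [100, 100]
statistic, df = chi_square_goodness_of_fit(observed, expected)
print(statistic, df)  # 8.0 1
```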

&lt;p&gt;b). &lt;strong&gt;Test of Independence&lt;/strong&gt;&lt;br&gt;
The test assesses the relationship between two categorical variables, such as music genre (Afro/Jazz) and attendance level (high/low).&lt;br&gt;
df = (r-1) x (c-1) where:&lt;br&gt;
r is the number of rows,&lt;br&gt;
c is the number of columns in the contingency table&lt;/p&gt;
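
&lt;p&gt;The test of independence can be sketched in the same way. Here the expected frequency of each cell is assumed to be (row total x column total) / grand total, which is the standard choice under the null hypothesis of no association. The function name and the 2 x 2 table are made up for illustration.&lt;/p&gt;

```python
def chi_square_independence(table):
    """Return (statistic, df) for a contingency table given as a list of rows."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand_total = sum(row_totals)
    statistic = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Expected count if genre and attendance were unrelated.
            expected = row_totals[i] * col_totals[j] / grand_total
            statistic += (observed - expected) ** 2 / expected
    df = (len(table) - 1) * (len(table[0]) - 1)  # df = (r-1) x (c-1)
    return statistic, df


# Hypothetical table. Rows: Afro, Jazz. Columns: high, low attendance.
table = [[30, 20],
         [20, 30]]
statistic, df = chi_square_independence(table)
print(statistic, df)  # 4.0 1
```

&lt;p&gt;With df = 1, the critical value at the 0.05 significance level is about 3.841, so a statistic of 4.0 would lead to rejecting the null hypothesis for this made-up table.&lt;/p&gt;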

&lt;p&gt;c). &lt;strong&gt;Test for Homogeneity&lt;/strong&gt;&lt;br&gt;
This test compares the distribution of a categorical variable across different populations. In this context, we would compare how the preference for the two music genres (Afro and Jazz) varies between the different cities where the concerts are held. The degrees of freedom are computed as in the test of independence: df = (r-1) x (c-1). &lt;/p&gt;

&lt;p&gt;Let's assume there are three cities and two music genres; then df = (3-1) x (2-1) = 2.&lt;/p&gt;

&lt;p&gt;It is worth noting that the shape of the chi-square distribution changes as the df increases, because the sum of squared differences between the observed and expected frequencies depends on the number of independent comparisons made. &lt;br&gt;
Notably, the density curve is not always monotonically decreasing: it decreases steadily only when the df is 1 or 2, while for larger df it rises to a peak before tailing off, and the curve shifts to the right and spreads out as the df grows. In the concert planning analogy, the more elements being juggled, such as venues, audience preferences and genres, the more complex the planning becomes and the wider the range of potential outcomes.&lt;/p&gt;

</description>
      <category>statistics</category>
      <category>hypothesistesting</category>
    </item>
    <item>
      <title>Hypothesis Testing</title>
      <dc:creator>Dorcas Bwire</dc:creator>
      <pubDate>Mon, 24 Feb 2025 07:24:51 +0000</pubDate>
      <link>https://dev.to/dorcas_bwire/hypothesis-testing-3o7m</link>
      <guid>https://dev.to/dorcas_bwire/hypothesis-testing-3o7m</guid>
      <description>&lt;p&gt;Simply explained, a hypothesis is an educated guess. Hypotheses play a pivotal role in decision-making in a data-driven age. Hypothesis testing is a structured approach for determining whether the findings of a study provide sufficient evidence to support a specific theory about a larger population. A hypothesis test assesses how unusual a result is: whether it can reasonably be attributed to chance variation, or whether it is too extreme to be regarded as chance variation. &lt;/p&gt;

&lt;p&gt;Primarily, hypothesis testing seeks to determine whether the null hypothesis can be rejected. If it is rejected, the result supports the alternative hypothesis. If it is not rejected, the data do not provide sufficient evidence for the alternative hypothesis, which is not the same as proving the null hypothesis true. A threshold, the significance level, is therefore set in advance to decide whether the null hypothesis is rejected and whether the result is statistically significant. &lt;/p&gt;

&lt;h2&gt;
  
  
  Process of Hypothesis Testing.
&lt;/h2&gt;

&lt;p&gt;The hypothesis testing process is classified into different phases:&lt;/p&gt;

&lt;p&gt;1. Restate the research question as a null hypothesis and an alternative hypothesis about the population.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;The null hypothesis states that there is no effect or difference. It is the hypothesis one attempts to reject with the test. &lt;/li&gt;
&lt;li&gt;The alternative hypothesis is the claim being investigated, typically expressed as an effect, difference or statistical relationship between variables. &lt;/li&gt;
&lt;li&gt;Determine the significance level, often denoted by alpha (α). It is the probability of rejecting the null hypothesis when it is actually true. The p-value is the probability that, assuming the null hypothesis is correct, you would observe results at least as extreme as the results of your hypothesis test. 
A smaller p-value indicates stronger evidence against the null hypothesis, and a p-value at or below α means the result is considered statistically significant. &lt;/li&gt;
&lt;/ul&gt;
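
&lt;p&gt;To make the relationship between the significance level and the p-value concrete, here is a minimal sketch for a two-sided z-test. It assumes the test statistic follows a standard normal distribution under the null hypothesis. The value z = 2.5 and the function name are hypothetical, and &lt;code&gt;NormalDist&lt;/code&gt; comes from Python's standard &lt;code&gt;statistics&lt;/code&gt; module.&lt;/p&gt;

```python
from statistics import NormalDist


def two_sided_p_value(z):
    """Probability, under the null hypothesis, of a standard normal test
    statistic at least as extreme as z in either direction."""
    return 2 * (1 - NormalDist().cdf(abs(z)))


alpha = 0.05  # significance level, chosen before looking at the data
z = 2.5       # hypothetical test statistic
p_value = two_sided_p_value(z)

# Reject the null hypothesis when the p-value is at or below alpha.
reject_null = alpha >= p_value
print(round(p_value, 4), reject_null)
```

&lt;p&gt;For this made-up statistic the p-value is roughly 0.012, which is below α = 0.05, so the null hypothesis would be rejected.&lt;/p&gt;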

&lt;ol&gt;
&lt;li&gt;&lt;p&gt;One-sided vs. Two-sided Testing&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Sampling&lt;/p&gt;&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fb11iboory8rqzparzv4a.PNG" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fb11iboory8rqzparzv4a.PNG" alt="Image description" width="672" height="590"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  Why Hypothesis Testing?
&lt;/h2&gt;

&lt;p&gt;Hypothesis testing helps estimate the sampling error and factor it into the test results, which supports effective decision-making. &lt;/p&gt;

</description>
      <category>datascience</category>
      <category>machinelearning</category>
    </item>
  </channel>
</rss>
