<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>DEV Community: Rishabh Jain</title>
    <description>The latest articles on DEV Community by Rishabh Jain (@rishabhjaincodes).</description>
    <link>https://dev.to/rishabhjaincodes</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F864621%2F84837fa1-6024-4f96-85cb-36e986cf80e1.png</url>
      <title>DEV Community: Rishabh Jain</title>
      <link>https://dev.to/rishabhjaincodes</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://dev.to/feed/rishabhjaincodes"/>
    <language>en</language>
    <item>
      <title>Mastering Dataset Acquisition: A Comprehensive Guide</title>
      <dc:creator>Rishabh Jain</dc:creator>
      <pubDate>Fri, 03 May 2024 10:37:35 +0000</pubDate>
      <link>https://dev.to/rishabhjaincodes/mastering-dataset-acquisition-a-comprehensive-guide-e54</link>
      <guid>https://dev.to/rishabhjaincodes/mastering-dataset-acquisition-a-comprehensive-guide-e54</guid>
      <description>&lt;p&gt;Whether you are learning, practicing, or building a Machine Learning project, the first thing you need is a suitable dataset.&lt;/p&gt;

&lt;p&gt;Getting one, however, is a process in its own right: the data has to be collected, cleaned, and verified, among other tasks, before it is ready to use.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;&lt;em&gt;Chapter 1: Understanding Your Project&lt;/em&gt;&lt;/strong&gt;&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;&lt;p&gt;A thorough understanding of your project comes first, because it determines what your dataset needs to contain.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;For instance, suppose you want a dataset about Taxi Customers. Its features can vary significantly with the time period covered, the intended purpose, and the method of data collection. Some datasets record customers' arrival and departure times, while others include the tips they offered. This diversity of features shows why dataset creation requires careful planning and a clear understanding of the project.&lt;/p&gt;&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;&lt;strong&gt;&lt;em&gt;Chapter 2: Knowing the right sources&lt;/em&gt;&lt;/strong&gt;&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Kaggle&lt;/strong&gt;: A platform for data science and machine learning competitions, Kaggle also hosts datasets for practice and exploration. &lt;a href="https://www.kaggle.com/datasets"&gt;Kaggle Datasets&lt;/a&gt;&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;UCI Machine Learning Repository&lt;/strong&gt;: A collection of databases, domain theories, and data generators widely used by the machine learning community. &lt;a href="https://archive.ics.uci.edu/datasets"&gt;UCI Machine Learning Repository&lt;/a&gt;&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Google Dataset Search&lt;/strong&gt;: Google's tool to help users find datasets stored across the web. &lt;a href="https://datasetsearch.research.google.com/"&gt;Google Dataset Search&lt;/a&gt;&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;GitHub&lt;/strong&gt;: Many researchers and organizations share datasets on GitHub repositories. You can search for repositories with datasets using specific keywords. &lt;a href="https://github.com/explore"&gt;GitHub&lt;/a&gt;&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;AWS Public Datasets&lt;/strong&gt;: Amazon Web Services hosts a variety of public datasets that can be accessed for free. &lt;a href="https://registry.opendata.aws/"&gt;AWS Public Datasets&lt;/a&gt;&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;UCR Time Series Classification/Clustering Databases&lt;/strong&gt;: A collection of time series datasets for classification and clustering tasks. &lt;a href="https://www.cs.ucr.edu/~eamonn/time_series_data/"&gt;UCR Time Series Classification/Clustering Databases&lt;/a&gt;&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Reddit Datasets&lt;/strong&gt;: A subreddit where users share interesting datasets they've found or collected. &lt;a href="https://www.reddit.com/r/datasets/"&gt;Reddit Datasets&lt;/a&gt;&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Data.gov&lt;/strong&gt;: The home of the U.S. Government's open data. It provides access to thousands of datasets on various topics. &lt;a href="https://data.gov/"&gt;Data.gov&lt;/a&gt;&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;FiveThirtyEight Datasets&lt;/strong&gt;: Datasets related to articles and investigations published by FiveThirtyEight. &lt;a href="https://data.fivethirtyeight.com/"&gt;FiveThirtyEight Datasets&lt;/a&gt;&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;OpenML&lt;/strong&gt;: An online platform for sharing and organizing machine learning datasets. &lt;a href="https://www.openml.org/search?type=data&amp;amp;sort=runs&amp;amp;status=active"&gt;OpenML&lt;/a&gt;&lt;/p&gt;&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;&lt;strong&gt;&lt;em&gt;Chapter 3: Convert the dataset to suit your needs and the format you want to work in (cough...csv...cough)&lt;/em&gt;&lt;/strong&gt;&lt;/p&gt;
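
&lt;p&gt;As a rough illustration of this conversion step, here is a minimal sketch that turns JSON records into CSV using only the Python standard library. The sample records, the field names, and the helper name json_records_to_csv are hypothetical and chosen only for this example; a real dataset will likely need its own handling of nested or inconsistent fields.&lt;/p&gt;

```python
import csv
import io
import json

# Hypothetical sample: taxi-trip records as they might arrive from a JSON source.
RAW_JSON = """
[
  {"trip_id": 1, "pickup": "08:15", "dropoff": "08:42", "tip": 2.5},
  {"trip_id": 2, "pickup": "09:03", "dropoff": "09:20"}
]
"""


def json_records_to_csv(json_text):
    """Convert a JSON array of flat objects into CSV text."""
    records = json.loads(json_text)
    # Collect every key that appears, in first-seen order, so that
    # records with missing fields still line up under one header.
    fieldnames = []
    for record in records:
        for key in record:
            if key not in fieldnames:
                fieldnames.append(key)
    buffer = io.StringIO()
    writer = csv.DictWriter(
        buffer, fieldnames=fieldnames, restval="", lineterminator="\n"
    )
    writer.writeheader()
    writer.writerows(records)
    return buffer.getvalue()


csv_text = json_records_to_csv(RAW_JSON)
print(csv_text)
```

&lt;p&gt;Missing fields are written as empty cells (the restval argument), so the gaps stay visible for the cleaning step instead of being silently filled.&lt;/p&gt;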

&lt;p&gt;&lt;strong&gt;&lt;em&gt;Chapter 4: Clean the data and apply Analytics to it.&lt;/em&gt;&lt;/strong&gt; 😎&lt;/p&gt;
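
&lt;p&gt;What cleaning and a first bit of analytics can look like is sketched below with the Python standard library. The CSV sample, the column names, and the rule that a missing tip counts as 0.0 are assumptions made for this example; the right cleaning rules depend on your own project.&lt;/p&gt;

```python
import csv
import io
from statistics import mean

# Hypothetical raw export with a duplicate row, stray whitespace, and a missing tip.
RAW_CSV = """trip_id,pickup,dropoff,tip
1,08:15,08:42,2.5
1,08:15,08:42,2.5
2, 09:03 ,09:20,
3,10:10,10:31,4.0
"""


def clean_rows(csv_text):
    """Trim whitespace, drop exact duplicates, and fill missing tips with 0.0."""
    seen = set()
    cleaned = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        row = {key: value.strip() for key, value in row.items()}
        fingerprint = tuple(row.items())
        if fingerprint in seen:
            continue  # exact duplicate of an earlier row
        seen.add(fingerprint)
        # Assumption for this sketch: a missing tip means no tip was given.
        row["tip"] = float(row["tip"]) if row["tip"] else 0.0
        cleaned.append(row)
    return cleaned


rows = clean_rows(RAW_CSV)
average_tip = mean(row["tip"] for row in rows)
print(len(rows), "rows, average tip:", round(average_tip, 2))
```

&lt;p&gt;Filling missing tips with 0.0 pulls the average down, so in a real analysis it may be better to exclude those rows or report them separately.&lt;/p&gt;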

</description>
      <category>beginners</category>
      <category>tutorial</category>
      <category>ai</category>
      <category>productivity</category>
    </item>
    <item>
      <title>Machine Learning : Things you don't know yet !!!</title>
      <dc:creator>Rishabh Jain</dc:creator>
      <pubDate>Thu, 02 May 2024 10:30:50 +0000</pubDate>
      <link>https://dev.to/rishabhjaincodes/machine-learning-things-you-dont-know-yet--8b8</link>
      <guid>https://dev.to/rishabhjaincodes/machine-learning-things-you-dont-know-yet--8b8</guid>
      <description>&lt;ol&gt;
&lt;li&gt;&lt;p&gt;Machine Learning is not a recent innovation: it has been around since the 1950s. Its prominence surged notably in the 1990s with the introduction of programming languages like Python and R. Machine Learning is a vital subdomain of Artificial Intelligence, and Deep Learning is in turn a subset of Machine Learning.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;While Data Analytics and Data Cleaning are perennial components of machine learning endeavours, choosing an appropriate model is equally important for getting the expected outcomes without unwelcome surprises.&lt;br&gt;
&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F9aj8uwe4xhyl5o9vxvyv.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F9aj8uwe4xhyl5o9vxvyv.png" alt="Different Machine Learning models and their uses. Image Credit: DataScience"&gt;&lt;/a&gt;&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;The paramount importance of Data Quality cannot be overstated. Regardless of the sophistication of the employed model, erroneous outcomes and pronounced biases ensue if the data quality is compromised. The adage "garbage in, garbage out" remains pertinent.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;At times, a simple model predicts more accurately than a complex one, depending on the nature of the data and the predictive task at hand.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Human biases can insidiously infiltrate models, as datasets are crafted by individuals and may harbour inherent biases, as discernible in less-refined Image Generation Systems.&lt;br&gt;
&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fa50litwiq8wric19sflj.jpeg" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fa50litwiq8wric19sflj.jpeg" alt="Human Biases in various sectors used by Machine Learning, Image Credit: Lightly.ai"&gt;&lt;/a&gt;&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Model performance is subject to degradation over time. Hence, regular retraining and diligent data updates are imperative to sustain optimal performance.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Ensemble Models are a potent yet underutilized tool among novices. Mastery of diverse models empowers practitioners to leverage techniques like bagging, boosting, and stacking, substantially augmenting performance.&lt;br&gt;
&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F9wxv4lp5e0t7i6xt5xco.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F9wxv4lp5e0t7i6xt5xco.png" alt="Ensemble Methods"&gt;&lt;/a&gt;&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Even meticulously trained and rigorously tested ML models can err when they are given too little data. ML models generally need large amounts of data to become accurate.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Ethical considerations demand meticulous attention. Given the meteoric advancement of AI tools and the proliferation of AI-powered Software as a Service (SaaS) applications, ethical guidelines must be rigorously upheld. Every technological innovation harbours both benefits and drawbacks, and adherence to regulatory frameworks is pivotal in mitigating adverse consequences.&lt;/p&gt;&lt;/li&gt;
&lt;/ol&gt;
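
&lt;p&gt;To make the ensemble idea from point 7 a little more concrete, here is a minimal sketch of bagging in plain Python: each base model is fitted on a bootstrap resample of the training data, and the ensemble predicts by majority vote. The tiny dataset, the one-threshold base model, and all function names are hypothetical and exist only for this illustration; in practice you would most likely reach for a library implementation.&lt;/p&gt;

```python
import random
from collections import Counter
from statistics import mean

# Hypothetical 1-D training set of (feature, label) pairs.
TRAIN = [
    (1.0, "low"), (1.5, "low"), (2.0, "low"), (2.2, "low"),
    (6.0, "high"), (6.5, "high"), (7.0, "high"), (7.4, "high"),
]


def fit_stump(sample):
    """Fit a one-threshold classifier: the midpoint between the two class means."""
    lows = [x for x, label in sample if label == "low"]
    highs = [x for x, label in sample if label == "high"]
    if not lows or not highs:
        # Degenerate resample containing one class: always predict that class.
        only = "low" if lows else "high"
        return lambda x: only
    threshold = (mean(lows) + mean(highs)) / 2
    return lambda x: "high" if x >= threshold else "low"


def fit_bagged(train, n_models=25, seed=0):
    """Bagging: fit each base model on a bootstrap resample of the training set."""
    rng = random.Random(seed)
    return [fit_stump(rng.choices(train, k=len(train))) for _ in range(n_models)]


def predict(models, x):
    """Aggregate the base models by majority vote."""
    votes = Counter(model(x) for model in models)
    return votes.most_common(1)[0][0]


models = fit_bagged(TRAIN)
print(predict(models, 1.8), predict(models, 6.8))
```

&lt;p&gt;Boosting and stacking combine models differently (sequentially reweighting errors, or training a second model on the base models' outputs), so this sketch covers only the bagging case.&lt;/p&gt;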

</description>
      <category>machinelearning</category>
      <category>python</category>
      <category>development</category>
      <category>developers</category>
    </item>
  </channel>
</rss>
