<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>DEV Community: MRUGANK MANOJ RAUT</title>
    <description>The latest articles on DEV Community by MRUGANK MANOJ RAUT (@mrugank).</description>
    <link>https://dev.to/mrugank</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F1522666%2Fd3181855-8046-47a7-87ea-745c4e8829d2.jpg</url>
      <title>DEV Community: MRUGANK MANOJ RAUT</title>
      <link>https://dev.to/mrugank</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://dev.to/feed/mrugank"/>
    <language>en</language>
    <item>
<title>LLM Performance Optimization Solutions</title>
      <dc:creator>MRUGANK MANOJ RAUT</dc:creator>
      <pubDate>Tue, 28 May 2024 09:29:44 +0000</pubDate>
      <link>https://dev.to/mrugank/llm-performance-optimization-solutions-5c0d</link>
      <guid>https://dev.to/mrugank/llm-performance-optimization-solutions-5c0d</guid>
      <description>&lt;h2&gt;
  
  
  Performance optimization techniques
&lt;/h2&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/cdn-cgi/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Frz5cthuqkuwc7e48d7q6.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/cdn-cgi/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Frz5cthuqkuwc7e48d7q6.png" alt="." width="800" height="450"&gt;&lt;/a&gt;&lt;br&gt;
After setting up distributed training, LLM practitioners apply performance &amp;amp; memory optimization techniques. There are three common techniques for this.&lt;/p&gt;

&lt;h3&gt;
  
  
  1. Mixed-Precision Training
&lt;/h3&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/cdn-cgi/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fb1f7cjec6letajp9n5tm.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/cdn-cgi/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fb1f7cjec6letajp9n5tm.png" alt="." width="800" height="263"&gt;&lt;/a&gt;&lt;br&gt;
This method uses lower-precision arithmetic (for example, FP16 or BF16 alongside FP32) to reduce resource utilization: it lightens the compute workload and lowers memory use. Because of this, we can train larger networks with the same amount of memory.&lt;/p&gt;
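
&lt;p&gt;Here is a minimal sketch of a mixed-precision training step with PyTorch's autocast and gradient scaler (the model, data, and hyperparameters are placeholders, and a CUDA device is assumed):&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;import torch

# Placeholder model, optimizer, and synthetic data; assumes a CUDA device.
model = torch.nn.Linear(512, 10).cuda()
optimizer = torch.optim.SGD(model.parameters(), lr=1e-3)
scaler = torch.cuda.amp.GradScaler()  # rescales FP16 gradients to avoid underflow
loader = [(torch.randn(8, 512).cuda(), torch.randint(0, 10, (8,)).cuda())
          for _ in range(4)]

for inputs, targets in loader:
    optimizer.zero_grad()
    with torch.cuda.amp.autocast():   # forward pass runs in lower precision
        loss = torch.nn.functional.cross_entropy(model(inputs), targets)
    scaler.scale(loss).backward()     # backward pass on the scaled loss
    scaler.step(optimizer)            # unscales gradients, then steps
    scaler.update()
&lt;/code&gt;&lt;/pre&gt;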

&lt;h3&gt;
  
  
  2. Gradient Checkpointing
&lt;/h3&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/cdn-cgi/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F5kdc11ypjehxe2p53so9.jpg" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/cdn-cgi/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F5kdc11ypjehxe2p53so9.jpg" alt="." width="800" height="515"&gt;&lt;/a&gt;&lt;br&gt;
This technique stores only a subset of intermediate activations and recomputes the rest during the backward pass, trading extra computation for lower memory usage.&lt;/p&gt;
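
&lt;p&gt;A minimal sketch of gradient checkpointing in PyTorch (the toy model below is a placeholder): activations inside each checkpointed segment are dropped after the forward pass and recomputed during backward.&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;import torch
from torch.utils.checkpoint import checkpoint_sequential

# Placeholder stack of layers; only segment-boundary activations are kept.
model = torch.nn.Sequential(*[torch.nn.Linear(256, 256) for _ in range(8)])
x = torch.randn(4, 256, requires_grad=True)

out = checkpoint_sequential(model, 2, x)  # 8 layers split into 2 segments
out.sum().backward()                      # inner activations recomputed here
&lt;/code&gt;&lt;/pre&gt;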

&lt;h3&gt;
  
  
  3. Operator Fusion
&lt;/h3&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/cdn-cgi/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fyv3cu2xsn975ziua9ffe.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/cdn-cgi/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fyv3cu2xsn975ziua9ffe.png" alt="a" width="735" height="355"&gt;&lt;/a&gt;&lt;br&gt;
Using this technique, we combine multiple operations into a single kernel, which reduces intermediate memory allocations and memory traffic.&lt;/p&gt;
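
&lt;p&gt;As an illustration, torch.compile can fuse a chain of element-wise operations into a single kernel (the tiny function below is just an example):&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;import torch

# The multiply, add, and ReLU below can be fused into one kernel,
# avoiding two intermediate tensors.
def scale_shift_relu(x, w, b):
    return torch.relu(x * w + b)

fused = torch.compile(scale_shift_relu)   # compiles and fuses on first call

x, w, b = (torch.randn(1024, 1024) for _ in range(3))
out = fused(x, w, b)
&lt;/code&gt;&lt;/pre&gt;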




&lt;h2&gt;
  
  
  Using Purpose-Built Infrastructure
&lt;/h2&gt;

&lt;h3&gt;
  
  
  1. AWS Trainium
&lt;/h3&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/cdn-cgi/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fmujw8seu85q01ohygkjd.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/cdn-cgi/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fmujw8seu85q01ohygkjd.png" alt="a" width="768" height="386"&gt;&lt;/a&gt;&lt;br&gt;
It is AWS's second-generation machine-learning accelerator, purpose-built for deep-learning training. It powers Amazon EC2 Trn1 instances.&lt;/p&gt;

&lt;h3&gt;
  
  
  2. AWS Inferentia
&lt;/h3&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/cdn-cgi/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fcaph4d3pk58ndgcey9c6.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/cdn-cgi/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fcaph4d3pk58ndgcey9c6.png" alt="a" width="374" height="186"&gt;&lt;/a&gt;&lt;br&gt;
It delivers high performance at the lowest cost for deep-learning inference. Amazon EC2 Inf2 instances are built for large-scale generative-AI applications that run models containing billions of parameters.&lt;/p&gt;

&lt;p&gt;LLM practitioners can use the AWS Neuron SDK to run these high-performance workloads on Trainium and Inferentia.&lt;/p&gt;
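
&lt;p&gt;As a rough sketch (assuming a Trn1 or Inf2 instance with the Neuron SDK installed), a PyTorch model can be compiled for these accelerators with torch_neuronx; the toy model here is a placeholder:&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;import torch
import torch_neuronx  # part of the AWS Neuron SDK

# Placeholder model and example input for tracing.
model = torch.nn.Sequential(torch.nn.Linear(128, 64), torch.nn.ReLU()).eval()
example = torch.randn(1, 128)

neuron_model = torch_neuronx.trace(model, example)  # compile for the Neuron runtime
torch.jit.save(neuron_model, "model_neuron.pt")     # reload later with torch.jit.load
&lt;/code&gt;&lt;/pre&gt;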

&lt;p&gt;&lt;a href="https://media.dev.to/cdn-cgi/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fhxd3ju5yodiou9qrfcbn.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/cdn-cgi/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fhxd3ju5yodiou9qrfcbn.png" alt="a" width="280" height="280"&gt;&lt;/a&gt;&lt;/p&gt;




&lt;p&gt;&lt;strong&gt;Thank You&lt;/strong&gt;&lt;/p&gt;

</description>
      <category>llm</category>
      <category>largelanguagemodel</category>
      <category>aws</category>
    </item>
    <item>
      <title>LLM Multi-Machine Training Solutions</title>
      <dc:creator>MRUGANK MANOJ RAUT</dc:creator>
      <pubDate>Tue, 28 May 2024 08:21:00 +0000</pubDate>
      <link>https://dev.to/mrugank/multi-machine-training-solutions-38pp</link>
      <guid>https://dev.to/mrugank/multi-machine-training-solutions-38pp</guid>
      <description>&lt;h2&gt;
  
  
  Scaling LLMs with Distributed Training
&lt;/h2&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fqx8wjajucgbr2fb4f1ik.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fqx8wjajucgbr2fb4f1ik.png" alt="."&gt;&lt;/a&gt;&lt;br&gt;
To maximize resource utilization and reduce training cost, practitioners use distributed computing techniques for multi-GPU or multi-machine training. These techniques are known as &lt;strong&gt;distributed data parallelism&lt;/strong&gt; and &lt;strong&gt;distributed model parallelism&lt;/strong&gt;. Both make efficient use of resources and support horizontal scaling, fault tolerance, and parallel processing.&lt;/p&gt;

&lt;h2&gt;
  
  
  Applying Data Parallelism Techniques
&lt;/h2&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fsivvi87nt5mkl9ri9dgf.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fsivvi87nt5mkl9ri9dgf.png" alt="."&gt;&lt;/a&gt;&lt;br&gt;
Data parallelism is used when the data does not fit on a single device, say a GPU. With data parallelism, the dataset is sharded across multiple devices, each of which holds a copy of the model. At the start of each step, a mini-batch is split into mutually exclusive slices across all model copies. The copies are then trained in parallel, and model parameters are synchronized across all devices. Collective-communication algorithms and high-performance networking frameworks perform this parameter synchronization.&lt;/p&gt;
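
&lt;p&gt;A minimal data-parallel training step with PyTorch's DistributedDataParallel might look like this (the model, data, and hyperparameters are placeholders):&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;import torch
import torch.distributed as dist
from torch.nn.parallel import DistributedDataParallel as DDP

# Launch with: torchrun --nproc_per_node=N this_script.py
dist.init_process_group("nccl")
rank = dist.get_rank()

model = torch.nn.Linear(512, 10).to(rank)   # every rank holds a full model copy
ddp_model = DDP(model, device_ids=[rank])
optimizer = torch.optim.SGD(ddp_model.parameters(), lr=1e-3)

inputs = torch.randn(8, 512).to(rank)       # stand-in for this rank's data shard
targets = torch.randint(0, 10, (8,)).to(rank)

optimizer.zero_grad()
loss = torch.nn.functional.cross_entropy(ddp_model(inputs), targets)
loss.backward()          # gradients are all-reduced across ranks here
optimizer.step()         # every rank applies the same averaged update
&lt;/code&gt;&lt;/pre&gt;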

&lt;p&gt;Common approaches to data parallelism are as follows:&lt;/p&gt;

&lt;h3&gt;
  
  
  1. AllReduce
&lt;/h3&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fnj0zbpddsnqmz3c3frxb.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fnj0zbpddsnqmz3c3frxb.png" alt="."&gt;&lt;/a&gt;&lt;br&gt;
The AllReduce approach relies on direct communication between devices to iteratively exchange model gradients and parameters. It aggregates the data from all devices and redistributes the aggregated result back to each of them.&lt;/p&gt;
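
&lt;p&gt;Seen in isolation, an AllReduce call looks like this: every rank contributes a tensor, and every rank receives the same aggregated result (a small sketch):&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;import torch
import torch.distributed as dist

# Launch with: torchrun --nproc_per_node=N this_script.py
dist.init_process_group("gloo")
t = torch.ones(4) * (dist.get_rank() + 1)  # rank 0 holds [1,1,1,1], rank 1 holds [2,2,2,2], ...
dist.all_reduce(t, op=dist.ReduceOp.SUM)   # afterwards t is identical on every rank
&lt;/code&gt;&lt;/pre&gt;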

&lt;h3&gt;
  
  
  2. Parameter Server
&lt;/h3&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Ft8wjyu7ftgq4ktb77kul.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Ft8wjyu7ftgq4ktb77kul.png" alt="."&gt;&lt;/a&gt;&lt;br&gt;
Local model copies are synchronized through a set of parameter servers, which hold the most up-to-date copy of the model and perform a weight-averaging step. Synchronization can happen at the end of each training step (synchronous) or asynchronously, where model copies pull parameters and push gradients independently. To improve the performance of the parameter-server approach, HPC infrastructure components are used.&lt;/p&gt;
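
&lt;p&gt;The data flow can be illustrated with a toy synchronous parameter server (pure Python, no networking; a real system would use RPC across machines):&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;import numpy as np

class ParameterServer:
    def __init__(self, dim):
        self.params = np.zeros(dim)   # the most up-to-date copy of the model

    def pull(self):
        return self.params.copy()     # workers fetch current parameters

    def push(self, grads, lr=0.1):
        # Weight-averaging step: average the worker gradients, then update.
        self.params -= lr * np.mean(grads, axis=0)

server = ParameterServer(dim=4)
for step in range(3):
    local = [server.pull() for _ in range(2)]      # two workers pull params
    grads = [np.random.randn(4) for _ in local]    # stand-in local gradients
    server.push(np.stack(grads))                   # synchronous update
&lt;/code&gt;&lt;/pre&gt;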




&lt;h2&gt;
  
  
  Applying Model Parallelism Techniques
&lt;/h2&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Ft4eu9htmevpzi1nhra1d.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Ft4eu9htmevpzi1nhra1d.png" alt="."&gt;&lt;/a&gt;&lt;br&gt;
When the neural network is too big to fit on a single device, say a GPU, model parallelism is the natural solution. It also makes the training process less memory intensive: the model is partitioned across multiple devices so that the combined memory of the training cluster holds the entire model in a memory-efficient fashion.&lt;/p&gt;

&lt;p&gt;Common approaches to model parallelism are as follows:&lt;/p&gt;

&lt;h3&gt;
  
  
  1. Pipeline Parallelism
&lt;/h3&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fssxoa1w0r72tly80mwew.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fssxoa1w0r72tly80mwew.png" alt="."&gt;&lt;/a&gt;&lt;br&gt;
It partitions the set of model layers across several devices and divides each mini-batch into micro-batches. These micro-batches are scheduled through a pipeline so that forward and backward computations on different devices overlap, reducing device idle time.&lt;/p&gt;
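
&lt;p&gt;A naive single-machine sketch of the partitioning and micro-batching (a real pipeline would place the stages on separate devices so their work overlaps):&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;import torch

# Toy two-stage split of a model; the stages stand in for separate devices.
stage1 = torch.nn.Sequential(torch.nn.Linear(64, 64), torch.nn.ReLU())
stage2 = torch.nn.Linear(64, 10)

batch = torch.randn(32, 64)
micro_batches = batch.chunk(4)      # split the mini-batch into 4 micro-batches

outputs = []
for mb in micro_batches:
    hidden = stage1(mb)             # would run on device 0
    outputs.append(stage2(hidden))  # would run on device 1, overlapping device 0
result = torch.cat(outputs)
&lt;/code&gt;&lt;/pre&gt;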

&lt;h3&gt;
  
  
  2. Tensor Parallelism
&lt;/h3&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Few7nlb9rfp5nz2d60hah.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Few7nlb9rfp5nz2d60hah.png" alt="."&gt;&lt;/a&gt;&lt;br&gt;
Where pipeline parallelism partitions sets of layers, tensor parallelism splits individual weight tensors across multiple devices. It is required when a single parameter tensor consumes most of the GPU memory. Big models like GPT must be divided and run on many devices at the same time to handle all the calculations.&lt;/p&gt;
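
&lt;p&gt;The core idea can be shown on one machine by sharding a single linear layer's weight matrix in two and combining the partial outputs (a sketch; real implementations place each shard on its own GPU):&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;import torch

torch.manual_seed(0)
weight = torch.randn(64, 128)       # full weight of one layer, shape (out, in)
x = torch.randn(4, 128)

w0, w1 = weight.chunk(2, dim=0)     # shard the output dimension in two
y0 = x @ w0.t()                     # would be computed on device 0
y1 = x @ w1.t()                     # would be computed on device 1
y = torch.cat([y0, y1], dim=1)      # gather the partial outputs

assert torch.allclose(y, x @ weight.t(), atol=1e-5)  # matches the unsharded layer
&lt;/code&gt;&lt;/pre&gt;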




&lt;p&gt;On AWS, Amazon SageMaker offers data- and model-parallelism libraries. Other options include DeepSpeed by Microsoft and Megatron-LM by NVIDIA.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fkmm2klsvb4bydbscgtqt.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fkmm2klsvb4bydbscgtqt.png" alt="."&gt;&lt;/a&gt;&lt;br&gt;
&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Frnzf8svvsm6uf1gn8vgt.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Frnzf8svvsm6uf1gn8vgt.png" alt="."&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Thank You&lt;/strong&gt;&lt;/p&gt;

</description>
      <category>llm</category>
      <category>largelanguagemodel</category>
    </item>
    <item>
      <title>Common LLM Practitioner Challenges</title>
      <dc:creator>MRUGANK MANOJ RAUT</dc:creator>
      <pubDate>Tue, 28 May 2024 05:30:06 +0000</pubDate>
      <link>https://dev.to/mrugank/common-llm-practitioner-challenges-18nj</link>
      <guid>https://dev.to/mrugank/common-llm-practitioner-challenges-18nj</guid>
      <description>&lt;p&gt;Model quality depends on the size of the LLM and the data used to train it, but training an LLM is quite challenging. Let's look at some common challenges faced while building such LLMs.&lt;/p&gt;




&lt;h2&gt;
  
  
  1. Training Data Curation
&lt;/h2&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/cdn-cgi/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F7ibkq2v7vdj242ui4vle.jpg" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/cdn-cgi/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F7ibkq2v7vdj242ui4vle.jpg" alt="data curation" width="800" height="370"&gt;&lt;/a&gt;&lt;br&gt;
Transformer-based models are trained on large text datasets drawn from multiple sources. An LLM's quality depends heavily on the selection and curation of its training data. Preparing LLM training data is an active area of research in the industry. Collecting, processing, and cleaning the data requires a lot of resources, but these steps are necessary to ensure the quality of model outputs.&lt;/p&gt;
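
&lt;p&gt;As a deliberately simplistic illustration of the kind of cleaning involved, the sketch below normalizes whitespace, drops very short documents, and removes exact duplicates (real curation pipelines are far more elaborate):&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;import re

def curate(documents, min_words=20):
    seen, kept = set(), []
    for doc in documents:
        text = re.sub(r"\s+", " ", doc).strip()  # normalize whitespace
        if len(text.split()) &lt; min_words:        # drop low-content documents
            continue
        if text in seen:                         # exact-duplicate filter
            continue
        seen.add(text)
        kept.append(text)
    return kept
&lt;/code&gt;&lt;/pre&gt;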




&lt;h2&gt;
  
  
  2. Need for Large-Scale, High-End Infrastructure
&lt;/h2&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/cdn-cgi/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fz2t43f7yspzp06114jk9.jpg" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/cdn-cgi/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fz2t43f7yspzp06114jk9.jpg" alt="infra" width="800" height="450"&gt;&lt;/a&gt;&lt;br&gt;
While training LLMs, we must balance factors such as model size, model performance, and computational complexity. Training requires large-scale accelerated computing resources, high-speed networking, and high-end compute instances, and it can take several days to weeks to complete.&lt;br&gt;
The high-end compute instances sit physically close to each other and are sometimes grouped on a single network spine.&lt;br&gt;
To detect and handle failures, GPU health-management software is essential; it also configures distributed storage and multi-node data I/O for the datasets.&lt;/p&gt;




&lt;h2&gt;
  
  
  3. High Training Costs
&lt;/h2&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/cdn-cgi/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Faqjptwi1ziwsvl4dugi3.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/cdn-cgi/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Faqjptwi1ziwsvl4dugi3.png" alt="cost" width="800" height="340"&gt;&lt;/a&gt;&lt;br&gt;
To train LLMs, organizations need to invest millions to billions of dollars. Only a few organizations are in a position to spend this much on training their own LLMs. Because of this, other teams and organizations look for cost-effective training options or fine-tune pre-trained models.&lt;/p&gt;




&lt;h2&gt;
  
  
  4. Machine Learning Expertise
&lt;/h2&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/cdn-cgi/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fizjrf3kg6sgrtoszs3sk.jpeg" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/cdn-cgi/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fizjrf3kg6sgrtoszs3sk.jpeg" alt="ML" width="800" height="488"&gt;&lt;/a&gt;&lt;br&gt;
To optimize the performance of LLMs, practitioners use advanced techniques for distributed training and parallel data processing, and they must also manage the training framework itself. All of this requires deep machine-learning expertise.&lt;/p&gt;




&lt;h2&gt;
  
  
  5. Responsible AI
&lt;/h2&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/cdn-cgi/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F9ot9hgfitg6mzp81c4nh.jpg" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/cdn-cgi/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F9ot9hgfitg6mzp81c4nh.jpg" alt="AI" width="480" height="480"&gt;&lt;/a&gt;&lt;br&gt;
LLMs are complex, and understanding their reasoning is a challenging task. Exploratory research is required to make certain that language models are fair, transparent, and unbiased. Another area of research is creating benchmarks to evaluate and compare model performance across various tasks.&lt;/p&gt;




&lt;p&gt;Interested in how LLMs are trained? Then read the following post!&lt;br&gt;
&lt;/p&gt;
&lt;div class="ltag__link"&gt;
  &lt;a href="/mrugank" class="ltag__link__link"&gt;
    &lt;div class="ltag__link__pic"&gt;
      &lt;img src="https://media.dev.to/cdn-cgi/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F1522666%2Fd3181855-8046-47a7-87ea-745c4e8829d2.jpg" alt="mrugank"&gt;
    &lt;/div&gt;
  &lt;/a&gt;
  &lt;a href="/mrugank/multi-machine-training-solutions-38pp" class="ltag__link__link"&gt;
    &lt;div class="ltag__link__content"&gt;
      &lt;h2&gt;Multi-Machine Training Solutions&lt;/h2&gt;
      &lt;h3&gt;MRUGANK MANOJ RAUT ・ May 28&lt;/h3&gt;
      &lt;div class="ltag__link__taglist"&gt;
        &lt;span class="ltag__link__tag"&gt;#llm&lt;/span&gt;
        &lt;span class="ltag__link__tag"&gt;#largelanguagemodel&lt;/span&gt;
      &lt;/div&gt;
    &lt;/div&gt;
  &lt;/a&gt;
&lt;/div&gt;
&lt;br&gt;
&lt;em&gt;&lt;strong&gt;Thank You.&lt;/strong&gt;&lt;/em&gt;

</description>
      <category>largelanguagemodel</category>
      <category>llm</category>
    </item>
    <item>
      <title>AWS Core Services - Networking</title>
      <dc:creator>MRUGANK MANOJ RAUT</dc:creator>
      <pubDate>Fri, 24 May 2024 06:17:34 +0000</pubDate>
      <link>https://dev.to/mrugank/aws-core-services-networking-5fn7</link>
      <guid>https://dev.to/mrugank/aws-core-services-networking-5fn7</guid>
      <description>&lt;p&gt;When you run an application in the cloud, you first have to connect your resources to the cloud, and then end users connect to your application. All of this falls under the concept of networking. To understand how networking works on AWS, we have to understand how Amazon VPC works.&lt;/p&gt;

&lt;h2&gt;
  
  
  Amazon VPC
&lt;/h2&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/cdn-cgi/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fhhx4j4uhq6ybtet0hcwz.jpg" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/cdn-cgi/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fhhx4j4uhq6ybtet0hcwz.jpg" alt="Amazon VPC" width="320" height="320"&gt;&lt;/a&gt;&lt;br&gt;
Amazon VPC is a private network space in which you launch your cloud resources to run your application. It provides logical isolation for your application, and you can control both the inbound and outbound traffic of your VPC and the way it connects to other networks.&lt;br&gt;
You can launch more than one VPC from your AWS account and use them for different workloads. You can also configure the way packets travel through the layers of your network.&lt;/p&gt;
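
&lt;p&gt;For example, a VPC with two subnets can be created with boto3 (the CIDR ranges and region below are arbitrary placeholders):&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;import boto3

ec2 = boto3.client("ec2", region_name="us-east-1")

vpc = ec2.create_vpc(CidrBlock="10.0.0.0/16")   # the isolated network space
vpc_id = vpc["Vpc"]["VpcId"]

# Carve the VPC's address range into subnets, e.g. public and private tiers.
ec2.create_subnet(VpcId=vpc_id, CidrBlock="10.0.1.0/24")
ec2.create_subnet(VpcId=vpc_id, CidrBlock="10.0.2.0/24")
&lt;/code&gt;&lt;/pre&gt;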

&lt;h2&gt;
  
  
  Amazon Route 53
&lt;/h2&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/cdn-cgi/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F6hnmqywotn2ssw6b7wpp.jpg" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/cdn-cgi/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F6hnmqywotn2ssw6b7wpp.jpg" alt="Route 53" width="320" height="320"&gt;&lt;/a&gt;&lt;br&gt;
Route 53 is a scalable Domain Name System (DNS) service. It has three functions: domain registration, DNS routing, and health checking. A DNS service translates domain names into IP addresses. With Route 53, you can purchase and manage domain names and configure their DNS settings, and it offers multiple routing options.&lt;/p&gt;
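
&lt;p&gt;For instance, pointing a domain name at an IP address is a single record change via boto3 (the hosted-zone ID, domain, and IP below are placeholders):&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;import boto3

route53 = boto3.client("route53")

route53.change_resource_record_sets(
    HostedZoneId="Z0000000EXAMPLE",
    ChangeBatch={
        "Changes": [{
            "Action": "UPSERT",            # create or update the record
            "ResourceRecordSet": {
                "Name": "app.example.com",
                "Type": "A",               # A record maps a name to an IPv4 address
                "TTL": 300,
                "ResourceRecords": [{"Value": "203.0.113.10"}],
            },
        }]
    },
)
&lt;/code&gt;&lt;/pre&gt;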

&lt;h2&gt;
  
  
  Amazon ELB
&lt;/h2&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/cdn-cgi/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Frd56l9ii2747eb97ar5g.jpeg" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/cdn-cgi/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Frd56l9ii2747eb97ar5g.jpeg" alt="ELB" width="225" height="225"&gt;&lt;/a&gt;&lt;br&gt;
Elastic Load Balancing (ELB) automatically distributes incoming network traffic across multiple EC2 instances. It acts as a single point of contact for your application, so users do not need to be aware of how many machines your application is running on.&lt;/p&gt;
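
&lt;p&gt;As a sketch, an Application Load Balancer can be created with boto3 (the subnet IDs below are placeholders; EC2 instances are then registered in a target group behind it):&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;import boto3

elbv2 = boto3.client("elbv2")

lb = elbv2.create_load_balancer(
    Name="my-app-lb",
    Subnets=["subnet-0aaa1111", "subnet-0bbb2222"],  # one per Availability Zone
    Type="application",
)
print(lb["LoadBalancers"][0]["DNSName"])  # the single DNS name clients connect to
&lt;/code&gt;&lt;/pre&gt;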

</description>
      <category>aws</category>
      <category>cloud</category>
      <category>networking</category>
    </item>
  </channel>
</rss>
