sofaki000

Posted on Nov 20, 2021

Data analysis with Matlab: distributions

#matlab #computerscience #datascience #tutorial

Introduction:

Explore different distributions (normal, Poisson, Student's t, exponential) using MATLAB.

Normal distribution

normrnd = Normal random numbers

r = normrnd(mu,sigma) generates a random number from the normal distribution with mean parameter mu and standard deviation parameter sigma.

r = normrnd(mu,sigma,sz1,...,szN) generates an array of normal random numbers, where sz1,...,szN indicates the size of each dimension.

Arguments:
mu — Mean
sigma — Standard deviation
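As a quick illustration of the two call forms above, here is a minimal sketch. The values of mu and sigma, the variable names, and the fixed seed are my own choices for illustration, and it assumes the Statistics and Machine Learning Toolbox is installed.

```matlab
rng(0);                                   % fix the seed so the run is repeatable (illustrative choice)
mu = 10;                                  % mean (illustrative value)
sigma = 2;                                % standard deviation (illustrative value)

r_scalar = normrnd(mu, sigma);            % a single random number
r_array  = normrnd(mu, sigma, 3, 4);      % a 3-by-4 array of random numbers
r_big    = normrnd(mu, sigma, 10000, 1);  % a large sample for a sanity check

% With many draws, the sample statistics should land close to mu and sigma.
fprintf('sample mean = %.3f, sample std = %.3f\n', mean(r_big), std(r_big));
```

The exact numbers depend on the seed and MATLAB version, so treat the printed values as approximate.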

More on the normal distribution in a separate article.

Poisson distribution

poissrnd = Poisson random numbers

r = poissrnd(lambda) generates random numbers from the Poisson distribution specified by the rate parameter lambda.

lambda can be a scalar, vector, matrix, or multidimensional array.

r = poissrnd(lambda,sz1,...,szN) generates an array of random numbers from the Poisson distribution with the scalar rate parameter lambda, where sz1,...,szN indicates the size of each dimension.

Examples:

1)
Generate a single random number from the Poisson distribution with rate parameter 20.

r_scalar = poissrnd(20)
r_scalar = 9

2)
Generate a 2-by-3 array of random numbers from the same distribution by specifying the required array dimensions.

lambda = 20;
r_array = poissrnd(lambda,2,3)
r_array = 2×3

    13    14    18
    26    16    21

3)
Create a vector of rate parameters.

lambda = 10:2:20
lambda = 1×6

    10    12    14    16    18    20

Generate random numbers from the Poisson distributions.

r = poissrnd(lambda)
r = 1×6

    14    13    14     9    14    31
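A Poisson distribution has mean and variance both equal to lambda, which gives a simple way to sanity-check the generated numbers. The sketch below assumes a fixed seed and a sample size of 10000, both chosen only for illustration.

```matlab
rng(1);                           % fixed seed for repeatability (illustrative choice)
lambda = 20;                      % rate parameter, as in the examples above
r = poissrnd(lambda, 10000, 1);   % 10000-by-1 column of Poisson draws

% For a Poisson distribution, both of these should be close to lambda.
fprintf('sample mean = %.2f, sample variance = %.2f\n', mean(r), var(r));
```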

poisspdf = Poisson probability density function

Parameters:

x — Values at which to evaluate Poisson pdf
lambda — Rate parameters

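Examples:

1)
Compute the Poisson probability density function values at each value from 0 to 10, using a rate parameter of 2. If a disk has 2 flaws on average, these values are the probabilities that a disk has 0, 1, 2, ..., 10 flaws.

flaws = 0:10;
y = poisspdf(flaws,2);

2)
Evaluate the Poisson pdf with rate parameter 10 at the integers 0 through 100, stored as a column vector.

tV = (0:100)';
lambda = 10;
pois = poisspdf(tV,lambda);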
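Because the Poisson distribution is discrete, the values returned by poisspdf are probabilities, given by exp(-lambda) * lambda^x / x!. A minimal sketch that checks poisspdf against this formula (the names y_manual, max_diff and p_at_most_10 are mine):

```matlab
flaws = 0:10;
lambda = 2;                       % rate parameter used in the disk example

y = poisspdf(flaws, lambda);      % toolbox result

% Same probabilities computed directly from the Poisson formula.
y_manual = exp(-lambda) .* lambda.^flaws ./ factorial(flaws);

max_diff = max(abs(y - y_manual));    % should be at floating-point noise level
p_at_most_10 = sum(y);                % P(X <= 10), very close to 1

fprintf('max difference = %.2e, P(X <= 10) = %.6f\n', max_diff, p_at_most_10);
```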

Student's t distribution

tcdf = Student's t cumulative distribution function

p = tcdf(x,nu) returns the cumulative distribution function (cdf) of the Student's t distribution with nu degrees of freedom, evaluated at the values in x.

Arguments:
x — Values at which to evaluate cdf
nu — Degrees of freedom

tinv = Student's t inverse cumulative distribution function

Examples:

1)
Find the 95th percentile of the Student's t distribution with 50 degrees of freedom.

p = .95;   
nu = 50;   
x = tinv(p,nu)
x = 1.6759
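tinv and tcdf are inverses of each other, so feeding the percentile back into tcdf should return the original probability. The sketch below shows this round trip, plus a two-sided critical value of the kind used in a t-test; alpha = 0.05 is an illustrative choice.

```matlab
nu = 50;                          % degrees of freedom
p = 0.95;

x = tinv(p, nu);                  % 95th percentile, about 1.6759
p_back = tcdf(x, nu);             % round trip: should return 0.95

% Two-sided critical value: leaves alpha/2 probability in each tail.
alpha = 0.05;                     % illustrative significance level
t_crit = tinv(1 - alpha/2, nu);

fprintf('x = %.4f, tcdf(x) = %.4f, two-sided critical value = %.4f\n', x, p_back, t_crit);
```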

Exponential distribution

exprnd = Exponential random numbers

r = exprnd(mu) generates a random number from the exponential distribution with mean mu.

r = exprnd(mu,sz1,...,szN) generates an array of random numbers from the exponential distribution, where sz1,...,szN indicates the size of each dimension.
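Note that exprnd is parameterized by the mean mu, not by the rate (which would be 1/mu). A minimal sketch, with mu = 3, the sample size, and the seed chosen only for illustration:

```matlab
rng(2);                           % fixed seed for repeatability (illustrative choice)
mu = 3;                           % mean of the distribution (illustrative value)

r_scalar = exprnd(mu);            % a single random number
r = exprnd(mu, 10000, 1);         % 10000-by-1 column of draws

% For an exponential distribution, the mean and the standard deviation both equal mu.
fprintf('sample mean = %.3f, sample std = %.3f\n', mean(r), std(r));
```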

exppdf = Exponential probability density function

1) y = exppdf(x) returns the probability density function (pdf) of the standard exponential distribution, evaluated at the values in x.

Compute the density of the value 5 in the standard exponential distribution.

y1 = exppdf(5) 
y1 = 0.0067

2) y = exppdf(x,mu) returns the pdf of the exponential distribution with mean mu, evaluated at the values in x.

Compute the density of the value 5 in the exponential distributions specified by means 1 through 5.

y2 = exppdf(5,1:5)
y2 = 1×5

    0.0067    0.0410    0.0630    0.0716    0.0736

Arguments:

x — Values at which to evaluate pdf
mu — Mean
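The exponential pdf with mean mu is (1/mu) * exp(-x/mu) for x >= 0, so the values above can be reproduced by hand. A short sketch comparing the two (y_manual is my own name):

```matlab
x = 5;
mu = 1:5;                                 % means 1 through 5, as in the example above

y = exppdf(x, mu);                        % toolbox result
y_manual = (1 ./ mu) .* exp(-x ./ mu);    % same densities from the formula

disp([y; y_manual]);                      % the two rows should match
```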

Histograms

1) histogram = Histogram plot

2) histfit = Histogram with a distribution fit

histfit(data) plots a histogram of the values in data, using a number of bins equal to the square root of the number of elements in data, and fits a normal density function.

Construct a histogram with a normal distribution fit.
(The number of bins is the number of bars in the histogram.)

r = normrnd(10,1,100,1); 
histfit(r)

histfit(data,nbins) plots a histogram using nbins bins and fits a normal density function.

Construct a histogram using six bins with a normal distribution fit.

r = normrnd(10,1,100,1);
histfit(r,6)
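For comparison, something similar to histfit can be built by hand with histogram and normpdf. This is only a sketch of the idea, not how histfit is implemented; the seed, the line width and the legend labels are my own choices.

```matlab
rng(3);                                   % fixed seed for repeatability (illustrative choice)
r = normrnd(10, 1, 100, 1);

% Square root of the number of elements, rounded up in case it is not an integer.
nbins = ceil(sqrt(numel(r)));

% Scale the bars so that their total area is 1, which makes them comparable to a pdf.
histogram(r, nbins, 'Normalization', 'pdf');
hold on

% Overlay a normal density using the sample mean and standard deviation.
xs = linspace(min(r), max(r), 200);
plot(xs, normpdf(xs, mean(r), std(r)), 'LineWidth', 1.5);
hold off
legend('data', 'normal pdf from sample mean and std');
```

Unlike histfit, which keeps the bars as counts, this version normalizes the bars to density so the curve needs no rescaling.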

Clearing environment

clc = clears command window
clear = Remove items from workspace, freeing up system memory

General Useful Commands

mean = calculates average or mean value of array
nargin = returns the number of function input arguments
ttest = One-sample and paired-sample t-test

More on ttest

The one-sample t-test is a parametric test of the location parameter when the population standard deviation is unknown.

The test statistic is:

$$t = \frac{\bar{x} - \mu}{s / \sqrt{n}}$$

where $\bar{x}$ is the sample mean, $\mu$ is the hypothesized population mean, $s$ is the sample standard deviation, and $n$ is the sample size. Under the null hypothesis, the test statistic has Student's t distribution with $n - 1$ degrees of freedom.
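The test statistic can be computed by hand and compared with what ttest reports. In the sketch below, the simulated data, the hypothesized mean m = 10, and the seed are illustrative assumptions.

```matlab
rng(4);                                   % fixed seed for repeatability (illustrative choice)
x = normrnd(10.5, 2, 25, 1);              % simulated sample (illustrative data)
m = 10;                                   % hypothesized population mean
n = numel(x);

% Test statistic from the formula: (sample mean - m) / (s / sqrt(n)).
t_manual = (mean(x) - m) / (std(x) / sqrt(n));

% Two-sided p-value from Student's t distribution with n - 1 degrees of freedom.
p_manual = 2 * tcdf(-abs(t_manual), n - 1);

% The same quantities as reported by ttest.
[h, p, ci, stats] = ttest(x, m);

fprintf('t: manual = %.4f, ttest = %.4f\n', t_manual, stats.tstat);
fprintf('p: manual = %.4f, ttest = %.4f\n', p_manual, p);
```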

Arguments of ttest:

Alpha — Significance level

0.05 (default) | scalar value in the range (0,1)

Dim — Dimension

first nonsingleton dimension (default) | positive integer value
Dimension of the input matrix along which to test the means, specified as the comma-separated pair consisting of 'Dim' and a positive integer value. For example, specifying 'Dim',1 tests the column means, while 'Dim',2 tests the row means.

Tail — Type of alternative hypothesis

'both' (default) | 'right' | 'left'
Type of alternative hypothesis to evaluate, specified as the comma-separated pair consisting of 'Tail' and one of the following, where m is the hypothesized population mean (0 if not specified):

'both' — Test against the alternative hypothesis that the population mean is not m.

'right' — Test against the alternative hypothesis that the population mean is greater than m.

'left' — Test against the alternative hypothesis that the population mean is less than m.
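The name-value arguments above can be passed after the data and the hypothesized mean. A minimal sketch on simulated data (the data, m = 10, the seed and the variable names are illustrative assumptions):

```matlab
rng(5);                                   % fixed seed for repeatability (illustrative choice)
x = normrnd(10.5, 2, 30, 1);              % simulated sample (illustrative data)
m = 10;                                   % hypothesized population mean

% Default: two-sided test at the 5% significance level.
[h_both, p_both] = ttest(x, m);

% Right-tailed test: the alternative is that the population mean is greater than m.
[h_right, p_right] = ttest(x, m, 'Tail', 'right');

% Stricter significance level: changes the decision h, not the p-value.
[h_strict, p_strict] = ttest(x, m, 'Alpha', 0.01);

% Test each column of a matrix against mean 0: one result per column.
X = normrnd(0, 1, 30, 3);
h_cols = ttest(X, 0, 'Dim', 1);

fprintf('p (both) = %.4f, p (right) = %.4f\n', p_both, p_right);
```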

Output Arguments of ttest:

h — Hypothesis test result

Hypothesis test result, returned as 1 or 0.

If h = 1, this indicates the rejection of the null hypothesis at the Alpha significance level.

If h = 0, this indicates a failure to reject the null hypothesis at the Alpha significance level.

p — p-value

scalar value in the range [0,1]

p-value of the test, returned as a scalar value in the range [0,1]. p is the probability of observing a test statistic as extreme as, or more extreme than, the observed value under the null hypothesis. Small values of p cast doubt on the validity of the null hypothesis.

ci — Confidence interval

vector
Confidence interval for the true population mean, returned as a two-element vector containing the lower and upper boundaries of the 100 × (1 – Alpha)% confidence interval.

stats — Test statistics

structure
Test statistics, returned as a structure containing the following:

tstat — Value of the test statistic.

df — Degrees of freedom of the test.

sd — Estimated population standard deviation. For a paired t-test, sd is the standard deviation of x – y.
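To see how the outputs fit together, here is a sketch of a paired test, where ttest(x,y) tests whether the mean of x - y is zero. The before and after data are simulated, and all names and values are illustrative assumptions.

```matlab
rng(6);                                   % fixed seed for repeatability (illustrative choice)
before = normrnd(100, 10, 20, 1);         % simulated first measurement (illustrative data)
after  = before + normrnd(2, 5, 20, 1);   % simulated second measurement on the same subjects

[h, p, ci, stats] = ttest(after, before); % paired-sample t-test
d = after - before;                       % paired differences

fprintf('h = %d, p = %.4f\n', h, p);
fprintf('ci = [%.3f, %.3f]\n', ci(1), ci(2));
fprintf('tstat = %.4f, df = %d\n', stats.tstat, stats.df);

% For a paired test, stats.sd is the standard deviation of the differences.
fprintf('stats.sd = %.4f, std(d) = %.4f\n', stats.sd, std(d));

% For the default two-sided test, h = 0 exactly when the interval contains 0.
ci_contains_zero = ci(1) <= 0 && ci(2) >= 0;
```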

DEV Community
