DEV Community

Durga Pokharel

Posted on • Originally published at iamdurga.github.io on

Dataframe in R.

Getting Started With Dataframes

Introduction

The dataframe is the most commonly used data structure in R. A dataframe is a list in which all components are named and have the same length. The easiest way to understand a dataframe is to picture a spreadsheet. The first row is the header, which is given by the names of the list components. Each column is a variable and can store a different data type, and each row is an observation across those variables. Since dataframes are like spreadsheets, we can arrange the data however we like, and there are many possible layouts. Consider the following one.

Product          apple   banana
price store A    23      56
price store B    67      80

This layout does not work well as a dataframe, because the price is split across two rows, one per store. If we rearrange the data so that product is one variable, price is the next variable, and store is another variable, it becomes a proper dataframe.

Product Price Store
apple 23 A
apple 67 B
banana 56 A
banana 80 B
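
As a rough sketch, the rearranged table above could be built in R as follows. The name `prices` is only an illustration and is not used elsewhere in this post.

```r
# Long layout: one row per product/store combination.
prices <- data.frame(
  product = c("apple", "apple", "banana", "banana"),
  price   = c(23, 67, 56, 80),
  store   = c("A", "B", "A", "B")
)

# Inspect the structure: 4 observations of 3 variables.
str(prices)

# Because store is its own column, picking one store is a simple row selection.
store_a <- prices[prices$store == "A", ]
store_a
```

With this layout each variable lives in exactly one column, which is what most R functions expect.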

Attributes of dataframe

  • Length
  • Dimension
  • Name
  • Class

How to Create DataFrame

product <- c('apple','banana','orange','papaya','rice','wheat','pee','noodle')
catagory <- c( 'groceries','groceries','electronic','electronic','groceries','electronic','electronic','groceries')
price <- c(24,45,67,88,56,78,89,90)
quality <- c('high','low','high','low','high','low','high','low') 

To create a dataframe from the vectors above, we can pass them to data.frame():

 shopping_data <- data.frame(product,catagory,price,quality,
                           budget = c(120,3000,600,500,45,67,89,90))
shopping_data

The output of the above code is a dataframe.

To check whether it is a dataframe or not, we can use the following code.

str(shopping_data)

Output of the above code is,

'data.frame':   8 obs. of 5 variables:
 $ product : chr "apple" "banana" "orange" "papaya" ...
 $ catagory: chr "groceries" "groceries" "electronic" "electronic" ...
 $ price : num 24 45 67 88 56 78 89 90
 $ quality : chr "high" "low" "high" "low" ...
 $ budget : num 120 3000 600 500 45 67 89 90 

Check the column names of the dataframe.

 names(shopping_data)

Check dimension of dataframe.

 dim(shopping_data)
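
Besides names() and dim(), the other attributes listed earlier (length and class) can be checked in the same way. The following is a minimal sketch on a small stand-in dataframe; `df` and the variable names are assumptions made for illustration.

```r
# A small stand-in dataframe (hypothetical, not the shopping_data above).
df <- data.frame(
  product = c("apple", "banana", "orange"),
  price   = c(24, 45, 67)
)

n_cols    <- length(df)  # a dataframe is a list of columns, so length() counts columns
n_rows    <- nrow(df)    # number of observations
dims      <- dim(df)     # rows first, then columns
col_names <- names(df)   # the header
cls       <- class(df)   # "data.frame"

print(n_cols)
print(dims)
print(cls)
```

Note that length() returns the number of columns, not the number of rows, because each column is one list component.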

Check first six rows of dataframe

 head(shopping_data)

Check last six rows of dataframe.

 tail(shopping_data)

Take only two rows of dataframe.

 head(shopping_data, n = 2)

Access a specific column of the dataframe.

 shopping_data$product

Output of the above code is,

 'apple' 'banana' 'orange' 'papaya' 'rice' 'wheat' 'pee' 'noodle'


 shopping_data[['product']]

Output of the above code is,

 'apple' 'banana' 'orange' 'papaya' 'rice' 'wheat' 'pee' 'noodle'

Manipulating dataframe

Manipulating a dataframe means knowing how to select elements, how to add new rows or columns, and how to sort and rank the data. A dataframe is a list in which each element is a named vector of the same length, so we can select elements the same way as in a list, using [[ ]] or $column. A dataframe is also two-dimensional like a matrix, which means we can index it with square brackets as [row, column]. When we index with only one dimension, the dataframe behaves like a list. Therefore a dataframe can be indexed either like a list or like a matrix, by position, logical rule, or name.

List subsetting

#list subsetting
shopping_data[[2]]
shopping_data[['budget']]
shopping_data$price
shopping_data$price[1:3]
shopping_data[[3]][3]
shopping_data$price[3]

Output of the above code is,

'groceries' 'groceries' 'electronic' 'electronic' 'groceries' 'electronic' 'electronic' 'groceries'
120 3000 600 500 45 67 89 90
24 45 67 88 56 78 89 90
24 45 67
67
67

Matrix subsetting

#Matrix subsetting
shopping_data[,1]
shopping_data[,"product"]
shopping_data[1,]
shopping_data[1,"price"]

Output will be

'apple' 'banana' 'orange' 'papaya' 'rice' 'wheat' 'pee' 'noodle'
'apple' 'banana' 'orange' 'papaya' 'rice' 'wheat' 'pee' 'noodle'
A data.frame: 1 × 5
1   apple   groceries   24  high    120
24
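
One detail of matrix subsetting is worth a small sketch: selecting a single column with [ , ] returns a plain vector by default, and the drop = FALSE argument keeps the result as a dataframe. The dataframe `df` below is a made-up example for illustration.

```r
# Hypothetical example data.
df <- data.frame(
  product = c("apple", "banana", "orange"),
  price   = c(24, 45, 67)
)

as_vector <- df[, "price"]                # numeric vector
as_frame  <- df[, "price", drop = FALSE]  # one-column dataframe
cheap     <- df[df$price < 50, ]          # rows chosen by a logical rule

print(as_vector)
print(as_frame)
print(cheap)
```

Selecting rows by a logical rule, as in the last line, is the matrix-style counterpart of filtering.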

Add a new column to the dataframe.

feedback<- c('good','outstanding','ordinary','nice','excilent','brillent','extra-ordinary','satisfactory')
shopping_data <- cbind(shopping_data,feedback)
shopping_data

Output will be

A data.frame: 8 × 6
apple   groceries   24  high    120 good
banana  groceries   45  low 3000    outstanding
orange  electronic  67  high    600 ordinary
papaya  electronic  88  low 500 nice
rice    groceries   56  high    45  excilent
wheat   electronic  78  low 67  brillent
pee electronic  89  high    89  extra-ordinary
noodle  groceries   90  low 90  satisfactory
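
Adding rows, sorting, and ranking were mentioned earlier but not demonstrated, so here is a minimal base R sketch of each. The dataframe `df` and the column `in_stock` are assumptions made up for this example.

```r
# Hypothetical example data.
df <- data.frame(
  product = c("apple", "banana", "orange"),
  price   = c(24, 67, 45)
)

# Add a column by direct assignment (an alternative to cbind).
df$in_stock <- c(TRUE, FALSE, TRUE)

# Add a row: the new row must have the same column names.
new_row <- data.frame(product = "papaya", price = 88, in_stock = TRUE)
df <- rbind(df, new_row)

# Sort by price using order(), ascending and then descending.
by_price      <- df[order(df$price), ]
by_price_desc <- df[order(df$price, decreasing = TRUE), ]

# Rank the prices (1 = cheapest).
price_rank <- rank(df$price)

print(by_price)
print(price_rank)
```

order() returns the row positions in sorted order, which is why it is used inside the row index.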

We can do the following operations to access the data from the dataframe. Single brackets return a one-column dataframe, while double brackets return the column itself as a vector.

shopping_data[c(1:3),1]
shopping_data[1]
shopping_data[[1]]
is.vector(shopping_data[1])
is.vector(shopping_data[[1]])
is.list(shopping_data[1])
is.list(shopping_data[[1]])

Output is,

'apple' 'banana' 'orange'
A data.frame: 8 × 1
apple
banana
orange
papaya
rice
wheat
pee
noodle
'apple' 'banana' 'orange' 'papaya' 'rice' 'wheat' 'pee' 'noodle'
FALSE
TRUE
TRUE
FALSE

Working with tidyverse

During data analysis we spend most of our time cleaning and transforming raw data. The tidyverse is a collection of add-on packages that lets us perform operations such as cleaning data and creating powerful graphs.

product <- c('apple','banana','orange','papaya','Rice','wheat','pee','noodle')
catagory <- c( 'groceries','groceries','electronic','electronic','groceries','electronic','electronic','groceries')
price <- c(24,45,67,88,56,78,89,90)
quality <- c('high','low','high','low','high','low','high','low')
shopping_data <- data.frame(product,catagory,price,quality,
                           budget = c(120,3000,600,500,45,67,89,90))
#arrange(desc(price))
shopping_data

Output is,

A data.frame: 8 × 5
apple   groceries   24  high    120
banana  groceries   45  low 3000
orange  electronic  67  high    600
papaya  electronic  88  low 500
Rice    groceries   56  high    45
wheat   electronic  78  low 67
pee electronic  89  high    89
noodle  groceries   90  low 90

Select Function

The select function allows us to pick specified columns from a dataframe.

# dplyr functions return a new dataframe and never change the original data
#install.packages("tidyverse")
#library(tidyverse)
library(dplyr) 
product <- select(shopping_data,price,budget)
product

Output is,

A data.frame: 8 × 2
24  120
45  3000
67  600
88  500
56  45
78  67
89  89
90  90

Filter

The filter function works on rows the way select works on columns: it keeps only the rows that satisfy a condition. Using the pipe operator %>% we can chain multiple operations without naming the intermediate results.

filter(product,budget > 100)

Output is,

A data.frame: 4 × 2
24  120
45  3000
67  600
88  500


dataset2 <- shopping_data %>%
select(product,price)%>%
filter(price>45)%>%
group_by( product)%>%
summarize(avg = mean(price))

dataset2 

Output is,

A tibble: 6 × 2
noodle  90
orange  67
papaya  88
pee 89
Rice    56
wheat   78
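
In the pipeline above every product appears only once, so each group average is just that product's own price. Grouping by a column with repeated values shows the effect more clearly. The sketch below assumes dplyr is installed and uses a small made-up dataframe named `shopping` with a `category` column.

```r
library(dplyr)

# Hypothetical example data with repeated categories.
shopping <- data.frame(
  product  = c("apple", "banana", "orange", "papaya"),
  category = c("groceries", "groceries", "electronic", "electronic"),
  price    = c(24, 45, 67, 88)
)

# One row per category, holding the mean price of its products.
avg_by_category <- shopping %>%
  group_by(category) %>%
  summarize(avg_price = mean(price))

avg_by_category
```

Here summarize() collapses the two rows of each category into a single average.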

Arrange function

The arrange function sorts the dataframe in ascending order, for example arrange(price). To sort the dataframe in descending order, we use arrange(desc(price)).

arrange(product,price)


Output is,

A data.frame: 8 × 2
24  120
45  3000
56  45
67  600
78  67
88  500
89  89
90  90
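
The descending form is described above but not run, so here is a small sketch of both directions. It assumes dplyr is installed, and `shopping` is a made-up dataframe for illustration.

```r
library(dplyr)

# Hypothetical example data, deliberately unsorted.
shopping <- data.frame(
  product = c("apple", "banana", "orange"),
  price   = c(45, 24, 67)
)

ascending  <- arrange(shopping, price)        # lowest price first
descending <- arrange(shopping, desc(price))  # highest price first

ascending
descending
```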

Managing control statements:

  • If statement:

The if statement is the most common control statement. It executes a block of code only when the condition placed between the parentheses is true; otherwise that block is ignored. Its syntax is if(condition){ code to be executed }. To run alternative code when the condition is false, we add an extra element, else.

Paste function

Paste converts its arguments (via as.character) to character strings and concatenates them (separating them by the string given by sep). If the arguments are vectors, they are concatenated term-by-term to give a character vector result.

product <- "tshirt"
price<- 110
if(price < 100){
    print(paste('adding',product,'to cart'))
}else
{
    print(paste('adding',product,'to wishlist'))
}

Output is,

[1] "adding tshirt to wishlist"
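
To make the description of paste more concrete, the sketch below shows the sep argument, term-by-term concatenation of vectors, and the collapse argument. The variable names are chosen only for this example.

```r
product <- "tshirt"

# Default separator is a single space.
msg <- paste("adding", product, "to cart")

# Vectors are concatenated term by term; sep sets the separator.
ids <- paste("item", 1:3, sep = "_")

# collapse joins the elements of one vector into a single string.
joined <- paste(c("apple", "banana"), collapse = ", ")

print(msg)
print(ids)
print(joined)
```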

Control Statement in vectors

quantity <- c(1,1,2,3,4)
ifelse(quantity == 1,'Yes','No')

Output is,

'Yes' 'Yes' 'No' 'No' 'No'


price <- 100
if(price < 100){
    print("The price is less than the budget")
}else if(price == 100){
    print("the price is equal to budget")

}else{
    print("The budget is less than the price")
}

Output is,

[1] "the price is equal to budget"


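price <- c(58,100,110)
# price has three elements, so the condition below is not a single TRUE or FALSE
if(price < 100){
    print("The price is less than the budget")
}else if(price == 100){
    print("the price is equal to budget")

}else{
    print("The budget is less than the price")
}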

If the condition has length greater than one, older versions of R (before 4.2.0) test only the first element and give a warning, while R 4.2.0 and later stop with an error. Either way, if() cannot check a whole vector on its own. This problem is resolved by using the any function.
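
When the goal is a message for every price rather than one decision for the whole vector, a vectorised alternative is nested ifelse. This is only a sketch; the `budget` variable and the labels are assumptions made for illustration.

```r
price  <- c(58, 100, 110)
budget <- 100  # hypothetical budget

# ifelse() checks the condition for every element and returns one label per price.
status <- ifelse(price < budget, "under budget",
          ifelse(price == budget, "equal to budget", "over budget"))

print(status)
```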

Any Function

if(any(price < 100)){

    print('At least one price is under budget')
}

Output is,

[1] "At least one price is under budget"

All Function

if(all(price<100)){
    print('all the price are under budget')
}else{
    print('Not all prices satisfies the condition.')
}

Output is,

[1] "Not all prices satisfies the condition."

To combine conditions we can use the & and | operators or the && and || operators. The single forms (& and |) work element-wise on vectors, while the double forms (&& and ||) compare single values (the non-vectorised form), which makes them the right choice inside if().

price <- 58
if(price> 50 && price < 100){
    print('The price is between 50 and 100')
}else {
    print("the price is not in between 50 and 100")
}

Output is,

[1] "The price is between 50 and 100"
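
The difference between the single and double operators is easiest to see side by side. The following sketch uses a made-up price vector.

```r
price <- c(58, 120, 75)

# & compares element by element and returns one result per price.
in_range <- price > 50 & price < 100

# && expects single values, so we test one element at a time.
first_in_range <- price[1] > 50 && price[1] < 100

# any() reduces the element-wise result to a single value usable in if().
some_in_range <- any(in_range)

print(in_range)
print(first_in_range)
print(some_in_range)
```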

Switch Statement

We can chain as many if else statements as we like, but with more than about four it becomes difficult to keep track of what happens when a given condition is true. The switch command works with cases: it takes the value to be tested, followed by the possible cases.

quantity <- c(1,3,4,5)

average_quantity <- function(quantity,type) {
    switch(type,
          arithmetic = mean(quantity),
          geometric = prod(quantity)^(1/length(quantity)))
}
average_quantity(quantity,"arithmetic")

Output is,

3.25


x <- c(1,2,3,4,5)
sumfunction <- function(x,i){
    switch(i, 
          s = sum(x)
        )
}
sumfunction(x,"s")

Output is,

15
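
When no case matches, switch quietly returns NULL, which can hide typos in the case name. A possible variant of the averaging function adds an unnamed last argument as the default case. This is a sketch, and raising an error with stop() is one design choice among several.

```r
average_quantity <- function(quantity, type) {
  switch(type,
    arithmetic = mean(quantity),
    geometric  = prod(quantity)^(1 / length(quantity)),
    # An unnamed final argument is used when no case name matches.
    stop("unknown type: ", type)
  )
}

quantity <- c(1, 3, 4, 5)
arith <- average_quantity(quantity, "arithmetic")
geom  <- average_quantity(quantity, "geometric")

print(arith)
print(geom)
```

Calling average_quantity(quantity, "median") with this version stops with an error instead of returning nothing.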

Loop

A loop is a sequence of instructions that is repeated until a certain condition is reached.

  • For loop: It performs the same operations on every element of the input. Its syntax is for(variable in sequence){ expression }. Between the parentheses there are three parts: first the variable, which can take any name, then the keyword in, and last a sequence or vector of any kind.

A for loop does not display a value unless we print it explicitly.

cart <- c('apple','cookie','lemoan')
for(product in cart){
    print(product)
}

Output is,

[1] "apple"
[1] "cookie"
[1] "lemoan"
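
Printing shows the values but does not keep them. To keep the results of a for loop, a common pattern is to create the result vector first and fill it by position. The sketch below uses a made-up cart and counts the characters in each name.

```r
cart <- c("apple", "cookie", "lemon")

# Create the result vector up front, one slot per element.
name_lengths <- integer(length(cart))

# seq_along(cart) gives the positions 1, 2, 3.
for (i in seq_along(cart)) {
  name_lengths[i] <- nchar(cart[i])
}

print(name_lengths)
```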

While loop

A while loop performs the operation as long as the given condition is true. Its syntax is similar to the for loop: while(condition){ expression }. For the loop to stop, the expression must eventually change something the condition depends on; otherwise the loop never ends.

index <- 1
while(index <3 ) {
    print(paste("The index value is",index))
    index <- index + 1
}

Output is,

[1] "The index value is 1"
[1] "The index value is 2"
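
A while loop is handy when the number of repetitions is not known in advance. As an illustrative sketch, the loop below keeps adding items to a cart until the next item would exceed the budget; the prices and the budget are made-up values.

```r
prices <- c(30, 40, 50)  # hypothetical item prices, in buying order
budget <- 100            # hypothetical budget

total <- 0
count <- 0

# Stop when all items are bought or the next one would exceed the budget.
# && short-circuits, so prices[count + 1] is only read while items remain.
while (count < length(prices) && total + prices[count + 1] <= budget) {
  count <- count + 1
  total <- total + prices[count]
}

print(paste("Bought", count, "items for", total))
```

Both count and total change inside the loop, which is what links the expression to the condition and lets the loop end.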

Repeat Loop

A repeat loop repeats the same operation until we interrupt it with the stop key or until a special statement such as break ends it. Repeat loops are important in optimization and maximization algorithms. The syntax is repeat{ expression }.

The next statement is used to discontinue one particular cycle and skip to the next.

x <- 1
repeat {
    print(x)
    x <- x + 1
    if( x==3){
        break
    }
}

Output is,

[1] 1
[1] 2


price <- c(123,456,78,900,987)
for(value in price){
    if( value < 100){
        next
    }
    discount <- value - value * 0.1
    print(discount)
}

Output is,

[1] 110.7
[1] 410.4
[1] 810
[1] 888.3
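
For a simple calculation like this one, the same result can also be reached without a loop by subsetting the vector first. This is an alternative sketch, not a replacement for the loop above.

```r
price <- c(123, 456, 78, 900, 987)

# Keep only the prices of at least 100, then apply the 10% discount to all at once.
eligible   <- price[price >= 100]
discounted <- eligible - eligible * 0.1

print(discounted)
```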
