DEV Community

hfrnssc
How can I tokenize text with R?

I want to process the raw tweet text shown in the first screenshot (image not available) into the cleaned form shown in the second screenshot (image not available), using regular expressions to remove emoticons, numbers, punctuation, and similar characters.

I have tried this code:

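library(dplyr)
library(tidytext)

# read.csv() is part of base R, so no separate CSV package is needed
sentanalysis <- read.csv("crawling_commuterline.csv", header = TRUE, sep = ";",
                         encoding = "UTF-8", stringsAsFactors = FALSE)
head(sentanalysis)

# Keep only the tweet text column
tweetdt <- sentanalysis %>% select(tw)
head(tweetdt)

# Tokenize the data frame that was already read with sep = ";"
# (re-reading the file without sep = ";" would not split the columns correctly)
tokens <- tweetdt %>% unnest_tokens(word, tw)
head(tokens)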
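
One possible way to get the cleaned output is to strip the unwanted characters with regular expressions before tokenizing. The sketch below is only an illustration under a few assumptions: the sample tweets are made up to stand in for the `tw` column, `clean_tweet()` and `tw_clean` are hypothetical names, and the cleaning rules (drop URLs and @mentions, keep only ASCII letters) are a guess at what the target output looks like. Note that the last rule would also remove accented or non-Latin letters.

```r
library(dplyr)
library(tidytext)

# Made-up sample rows standing in for the `tw` column of the CSV
tweets <- data.frame(
  tw = c("Kereta telat 15 menit lagi!!! \U0001F621 @CommuterLine https://t.co/abc123",
         "Naik KRL jam 7 pagi, penuh banget :( #commuterline"),
  stringsAsFactors = FALSE
)

# Hypothetical helper: the rules below are assumptions, adjust as needed
clean_tweet <- function(x) {
  x <- gsub("https?://\\S+", " ", x)   # drop URLs
  x <- gsub("@\\w+", " ", x)           # drop @mentions
  x <- gsub("[^A-Za-z ]", " ", x)      # drop digits, punctuation, emoji, other symbols
  x <- gsub(" +", " ", x)              # collapse repeated spaces
  trimws(x)
}

tokens <- tweets %>%
  mutate(tw_clean = clean_tweet(tw)) %>%   # cleaned copy of the tweet text
  unnest_tokens(word, tw_clean)            # one lowercase word per row

print(tokens$word)
```

With the real data, the same `mutate()` and `unnest_tokens()` steps could be applied to the data frame read from `crawling_commuterline.csv` instead of the sample rows.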
