How can I solve the tokenization process with R?

hfrnssc — Sun, 08 Nov 2020 04:20:23 +0000

Nov 7 '20 Comments: 1 Answers: 0

I want to process the text from this

to this, using regular expressions to remove unwanted content such as emoticon symbols, numbers, and punctuation.

I have tried this code:

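library(dplyr)
library(tidyr)
library(tidytext)
library(textdata)
library(purrr)
# read.csv() comes with base R, so no extra CSV package is loaded here

# Read the semicolon-separated file once, keeping text as character (not factor)
sentanalysis <- read.csv("crawling_commuterline.csv", header = TRUE, sep = ";",
                         encoding = "UTF-8", stringsAsFactors = FALSE)
sentanalysis

# Keep only the tweet text column
tweetdt <- sentanalysis %>% select(tw)
head(tweetdt)

# One lowercase word per row; reuses tweetdt so the ";" separator is not lost
# by reading the file a second time with the default "," separator
tweetdt %>% unnest_tokens(word, tw)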

…
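
One possible way to do the cleanup described above is sketched below, using only base R regular expressions. This is a sketch under assumptions, not a tested solution for the actual file: `clean_tweet` is a hypothetical helper name, the sample tweets are made up rather than taken from `crawling_commuterline.csv`, and treating every non-ASCII character as an emoticon to drop is an assumption that would also remove accented letters.

```r
# Hypothetical helper: strips URLs, @mentions, non-ASCII symbols (which
# covers most emoji), digits, and punctuation from a character vector.
clean_tweet <- function(x) {
  x <- gsub("https?://\\S+", " ", x, perl = TRUE)   # URLs
  x <- gsub("@\\w+", " ", x, perl = TRUE)           # @mentions
  x <- gsub("[^\\x01-\\x7F]", " ", x, perl = TRUE)  # non-ASCII, e.g. emoji
  x <- gsub("[[:digit:]]+", " ", x)                 # numbers
  x <- gsub("[[:punct:]]+", " ", x)                 # punctuation, including '#'
  x <- gsub("\\s+", " ", x, perl = TRUE)            # collapse repeated whitespace
  tolower(trimws(x))
}

# Made-up sample tweets, NOT taken from the original CSV
sample_tweets <- c(
  "KRL delay 30 menit!!! \U0001F62D @CommuterLine https://t.co/abc123",
  "Kereta penuh, 2 jam nunggu... #krl"
)

cleaned <- clean_tweet(sample_tweets)
print(cleaned)

# Simple whitespace tokenization of the cleaned text
tokens <- strsplit(cleaned, " ", fixed = TRUE)
print(tokens)
```

With the data frame from the question, the same helper could presumably be applied to the `tw` column with `mutate(tw = clean_tweet(tw))` before calling `unnest_tokens(word, tw)`, so that numbers and symbols are gone before the text is split into words.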

DEV Community: hfrnssc
