While reading the section on data frames in The Book of R, I got a result that was different from the one in the exercise. It was wrong because I had left out a comma in the code. Before, I would just have corrected it and moved on, but this time around I wanted to know why I got that wrong result.
The data frame I was working with is
The code I typed was mydata[mydata$sex=="F"]
This resulted in the following output:
Not quite what I expected. I then typed out just the test itself, mydata$sex=="F", and got the following vector of logicals.
Looking at it, and comparing it to the output I got, things started making sense. The two columns I got back from my original code matched the positions of the TRUE values in the vector from the second piece of code. By leaving out the comma, I had selected columns instead of selecting the rows that met the test criteria.
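To make the difference concrete, here is a minimal sketch using a small made-up data frame. The values and the helper names is_female, cols_selected and rows_selected are my own assumptions for illustration; this is not the data frame from the book. It has three rows and three columns, so the logical vector from the test lines up with either dimension.

```r
# Hypothetical data frame for illustration only (not the book's data).
mydata <- data.frame(
  person = c("Peter", "Lois", "Meg"),
  age    = c(42, 40, 17),
  sex    = c("M", "F", "F")
)

# The test on its own returns one logical value per row.
is_female <- mydata$sex == "F"
print(is_female)              # FALSE TRUE TRUE

# No comma: the logical vector is matched against the columns,
# so the 2nd and 3rd columns come back.
cols_selected <- mydata[is_female]
print(names(cols_selected))   # "age" "sex"

# With the comma: the logical vector is matched against the rows,
# so the 2nd and 3rd rows come back with all columns.
rows_selected <- mydata[is_female, ]
print(as.character(rows_selected$person))   # "Lois" "Meg"
```

The same logical vector is used in both calls; only the position relative to the comma decides whether it filters columns or rows.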
With this knowledge, I typed out the proper code mydata[mydata$sex=="F",]
to get
The result I expected. Forcing myself to slow down and actually understand why code gives the output it does is doing wonders for my learning progress. Had I not understood from earlier reading and exercises how row and column selection coupled with logical values actually works, this would have been a much more difficult issue to figure out. Onwards with learning R!
Resources:
The Book of R: https://nostarch.com/bookofr
Top comments (2)
Hi Eric,
While it's important to learn base R so that we understand its various twists and turns, please also check out the tidyverse, which has a standard way of doing operations in R and is more readable.
Hi,
Thanks for your comment. Yup, I am aware of tidyverse and am actually working on learning it. I do feel, for my own understanding, that I need to learn the base first so that I can, as you put it, understand its twists and turns. Then, once I'm comfortable with that, I'll pick up the various packages.