
Filtering Numeric Variables

We will start with the prostate dataset, seen here:

## Rows: 316
## Columns: 20
## $ rbc_age_group      <dbl> 3, 3, 3, 2, 2, 3, 3, 1, 1, 2, 2, 1, 1, 1, 3, 1, 1, …
## $ median_rbc_age     <dbl> 25, 25, 25, 15, 15, 25, 25, 10, 10, 15, 15, 10, 10,…
## $ age                <dbl> 72.1, 73.6, 67.5, 65.8, 63.2, 65.4, 65.5, 67.1, 63.…
## $ aa                 <dbl> 0, 0, 0, 0, 0, 0, 1, 0, 0, 1, 0, 0, 0, 0, 1, 0, 0, …
## $ fam_hx             <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, …
## $ p_vol              <dbl> 54.0, 43.2, 102.7, 46.0, 60.0, 45.9, 42.6, 40.7, 45…
## $ t_vol              <dbl> 3, 3, 1, 1, 2, 2, 2, 3, 2, 2, 1, 3, 2, 2, 2, 2, 1, …
## $ t_stage            <dbl> 1, 2, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 2, 1, 1, NA,…
## $ b_gs               <dbl> 3, 2, 3, 1, 2, 1, 1, 1, 1, 2, 2, 1, 1, 2, 2, 2, 1, …
## $ bn                 <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, …
## $ organ_confined     <dbl> 0, 1, 1, 1, 1, 0, 1, 0, 1, 1, 1, 1, 1, 0, 1, 1, 1, …
## $ preop_psa          <dbl> 14.08, 10.50, 6.98, 4.40, 21.40, 5.10, 6.03, 8.70, …
## $ preop_therapy      <dbl> 1, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 1, 0, …
## $ units              <dbl> 6, 2, 1, 2, 3, 1, 2, 4, 1, 2, 2, 2, 2, 4, 2, 4, 5, …
## $ s_gs               <dbl> 1, 3, 1, 3, 3, 3, 3, 3, 3, 3, 3, 3, 2, 1, 3, 1, 2, …
## $ any_adj_therapy    <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, …
## $ adj_rad_therapy    <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, …
## $ recurrence         <dbl> 1, 1, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, …
## $ censor             <dbl> 0, 0, 1, 1, 1, 1, 1, 0, 1, 1, 1, 1, 1, 1, 1, 1, 1, …
## $ time_to_recurrence <dbl> 2.67, 47.63, 14.10, 59.47, 1.23, 74.70, 13.87, 8.37…

Exercise 1

Write the R code required to filter the prostate dataset to rows with a prostate volume (p_vol) greater than or equal to 90:

Starter code:

prostate %>%
  filter(-- >= --)

Solution:

prostate %>%
  filter(p_vol >= 90)
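
To see how the comparison behaves at the boundary, here is a minimal sketch on a small made-up data frame. The `toy` tibble and its values are assumptions for illustration, not rows from the prostate data.

```r
library(dplyr)

# Hypothetical toy data; the values are invented for illustration
toy <- tibble(id = 1:5, p_vol = c(45.9, 90, 102.7, 89.9, 120))

# >= keeps the boundary value 90; > drops it
at_least_90 <- toy %>% filter(p_vol >= 90)
above_90    <- toy %>% filter(p_vol > 90)

at_least_90$id
above_90$id
```

The only difference between the two results is the row where p_vol is exactly 90.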

Exercise 2

Write the R code required to filter the prostate dataset to rows with a family history (fam_hx) of prostate cancer.

Hint: watch the number of equals signs. Testing for equality takes two (==), not one.

Starter code:

prostate %>%
  select(age, t_vol, fam_hx) %>%
  filter(fam_hx --)

Solution:

prostate %>%
  select(age, t_vol, fam_hx) %>%
  filter(fam_hx == 1)
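
The sketch below shows the equality test on a made-up data frame that also contains a missing value. The `toy` tibble is an assumption for illustration; as in the solution above, 1 is taken to mean a positive family history.

```r
library(dplyr)

# Hypothetical toy data; fam_hx is coded 0/1 with one missing value
toy <- tibble(id = 1:4, fam_hx = c(0, 1, NA, 1))

# == tests equality. A single = inside filter() is read as a named
# argument, and dplyr reports an error instead of filtering.
with_hx <- toy %>% filter(fam_hx == 1)

# Rows where the comparison evaluates to NA are dropped, so id 3 is absent
with_hx$id
```

Because filter() keeps only rows where the condition is TRUE, rows with a missing fam_hx are dropped without any warning.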

Exercise 3

Write the R code required to filter the prostate dataset to rows with a preoperative PSA (preop_psa) near 12 (within 1).

Starter code:

prostate %>%
  select(age, aa, preop_psa) %>%
  filter(preop_psa)

Solution:

prostate %>%
  select(age, aa, preop_psa) %>%
  filter(near(preop_psa, 12, tol = 1))
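
To make the meaning of `tol` concrete, the sketch below compares near() with the equivalent absolute-difference test. The `toy` tibble is made up for illustration and is not part of the prostate data.

```r
library(dplyr)

# Hypothetical toy data; the values are invented for illustration
toy <- tibble(id = 1:5, preop_psa = c(10.5, 11.2, 12, 12.9, 13))

# near(x, y, tol) is TRUE where abs(x - y) < tol
close_to_12 <- toy %>% filter(near(preop_psa, 12, tol = 1))
by_hand     <- toy %>% filter(abs(preop_psa - 12) < 1)

# The comparison is strict, so a value exactly 1 away (13) is excluded
close_to_12$id
```

Both filters return the same rows. If values exactly 1 away from 12 should count as "within 1", a non-strict comparison such as abs(preop_psa - 12) <= 1 would be one way to include them.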

Exercise 4

Write the R code required to filter the prostate dataset to rows where age is 60, 63, or 69.

Starter code:

prostate %>%
  select(age, preop_psa, fam_hx) %>%
  filter()

Solution:

prostate %>%
  select(age, preop_psa, fam_hx) %>%
  filter(age %in% c(60, 63, 69))
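
The sketch below shows that %in% is shorthand for a chain of == tests joined with |. The `toy` tibble is made up for illustration and is not part of the prostate data.

```r
library(dplyr)

# Hypothetical toy data; the values are invented for illustration
toy <- tibble(id = 1:5, age = c(60, 61.4, 63, 69, 72.1))

# %in% checks each age against the set of allowed values
via_in <- toy %>% filter(age %in% c(60, 63, 69))

# The same filter written out with == and |
via_or <- toy %>% filter(age == 60 | age == 63 | age == 69)

via_in$id
```

The match is exact, so a decimal age such as 63.2 does not match 63. Since age in the prostate data is recorded with decimals, this filter may return few rows there.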

Exercise 5

Write the R code required to filter the prostate dataset to rows with preop_psa between 9 and 11 (inclusive).

Starter code:

prostate %>%
  select(age, preop_psa, fam_hx) %>%
  filter()

Solution:

prostate %>%
  select(age, preop_psa, fam_hx) %>%
  filter(between(preop_psa, 9, 11))
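
The sketch below checks that between() includes both endpoints by comparing it with the two explicit comparisons. The `toy` tibble is made up for illustration and is not part of the prostate data.

```r
library(dplyr)

# Hypothetical toy data; the values are invented for illustration
toy <- tibble(id = 1:5, preop_psa = c(8.7, 9, 10.5, 11, 14.08))

# between(x, left, right) is shorthand for x >= left & x <= right
via_between <- toy %>% filter(between(preop_psa, 9, 11))
via_compare <- toy %>% filter(preop_psa >= 9 & preop_psa <= 11)

# The endpoints 9 and 11 are both kept
via_between$id
```

To exclude the endpoints, the explicit form with > and < would be needed instead.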
