The authors of the data.table package are Matt Dowle and Arun Srinivasan. There are at least five good reasons to learn the package:
Regarding efficiency, it has been shown that data.table can be 75x faster than dplyr. Here is a comparison with other systems. We will talk about some of the reasons later in the class.
It is not the purpose of this class to convince you that data.table is superior to dplyr or vice versa. It is to show you another powerful tool that you can use to tackle big or small data problems efficiently in R. The two packages can also complement each other. If you end up liking the dplyr syntax more but want the efficiency advantage, consider the dtplyr package, which uses dplyr syntax while working with data.table objects.
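If you want to try that route, here is a minimal sketch (assuming the dplyr and dtplyr packages are installed; the toy data is made up for illustration):

library(data.table)
library(dplyr)
library(dtplyr)

toy <- data.table(x = 1:10, y = rep(c("a", "b"), 5))
# lazy_dt() wraps a data.table so that dplyr verbs are translated into data.table code
res <- lazy_dt(toy) %>%
  group_by(y) %>%
  summarise(total = sum(x))
as.data.table(res)   # or as_tibble(res) to materialize the result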
Let’s start our session by loading the library. Uncomment the second line and set the working directory where your data for this session is located (don’t forget to use forward slashes, e.g. “C:/Users/hana/Documents”):
library(data.table)
#setwd("/Users/hana/psrc/rworkshops2020/data.table")
To use data.table features, we need to create a data.table object, which is an enhanced version of data.frame. There are several ways to do it:
Create it from scratch:
d <- data.table(x = 1:10, y = letters[1:10])
d
## x y
## 1: 1 a
## 2: 2 b
## 3: 3 c
## 4: 4 d
## 5: 5 e
## 6: 6 f
## 7: 7 g
## 8: 8 h
## 9: 9 i
## 10: 10 j
Convert an existing data.frame:
df <- data.frame(x = 1:10, y = letters[1:10])
d <- as.data.table(df)
d
## x y
## 1: 1 a
## 2: 2 b
## 3: 3 c
## 4: 4 d
## 5: 5 e
## 6: 6 f
## 7: 7 g
## 8: 8 h
## 9: 9 i
## 10: 10 j
Such a conversion can also be done on a tibble object, which makes it easy to switch between dplyr and data.table. To switch back from data.table to dplyr, use the as_tibble() function.
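For example, a quick round trip (a sketch, assuming the tibble package, which provides as_tibble(), is installed):

library(tibble)
tb <- as_tibble(d)        # data.table -> tibble, ready for dplyr-style work
d2 <- as.data.table(tb)   # tibble -> data.table
class(tb)
class(d2)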
Coercion by reference, i.e. no need to reassign an object:
dft <- data.frame(x = 1:10, y = letters[1:10])
class(dft)
## [1] "data.frame"
setDT(dft)
class(dft)
## [1] "data.table" "data.frame"
The class of dft changed without a re-assignment, thus saving memory. To reverse the operation and make dft a data.frame again, use the setDF() function.
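For example:

setDF(dft)   # converts dft back to a plain data.frame, again without copying
class(dft)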
Reading from file (extremely fast!):
dt <- fread("ofm_april1_population_final.csv")
class(dt)
## [1] "data.table" "data.frame"
For reading from Excel spreadsheets, one can use the read.xlsx() function from the openxlsx library (as shown in the Rmarkdown class) and then convert to data.table via as.data.table() as shown above. However, for big datasets the fread() function is superior to anything else in R. Reading in the PSRC’s parcels dataset (1.3 million rows, 285 MB) is 13x faster than with read.csv().
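If you want to see the difference on your own machine, a rough comparison could look like this (the file name is just a placeholder for any large csv file you have):

system.time(df_slow <- read.csv("some_big_file.csv"))
system.time(dt_fast <- fread("some_big_file.csv"))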
The equivalent of fread() for writing data into a file is fwrite(). It is also amazingly fast.
Now let’s explore the dataset structure.
str(dt)
View(dt)
The picture at the top of the page shows the basic syntax of all data.tables. While tidyverse tends to break up operations step by step, using a verb for each step, data.table aims to do everything in one concise expression.
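That single expression always follows the same pattern; here is a schematic reminder plus a tiny example using the toy table d created above:

# dt[i, j, by]  --  take dt, subset rows using i, then compute j, grouped by by
d[x > 5, mean(x)]   # rows where x > 5, then the mean of x over those rows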
When subsetting by rows, things work similarly to data.frame, except that the syntax is simpler. As an example, select rows from dt that are in King county and are not cities:
dt[County == "King" & Filter < 4]
Or select all cities that have the word “Wood” in their name:
dt[grepl("Wood", Jurisdiction)]
Indices work too. Also notice that the comma is optional if neither j nor by is used.
dt[1:3] # is equivalent to
dt[1:3,]
For further processing, we select rows that correspond to cities in the PSRC’s four counties:
dt4c <- dt[Filter == 4 & County %chin% c("King", "Kitsap", "Pierce", "Snohomish")]
dim(dt4c)
## [1] 87 14
The %chin% operator is like %in% but optimized for speed, and it works for character vectors only.
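A quick illustration:

c("Seattle", "Portland") %chin% c("Seattle", "Tacoma")   # TRUE FALSE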
The i dimension is also used for putting rows into a specific order. Here, ordering by Jurisdiction:
dt4c[order(Jurisdiction)]
## Line Filter County Jurisdiction 2010.Population.Census
## 1: 143 4 King Algona 3014
## 2: 325 4 Snohomish Arlington 17926
## 3: 144 4 King Auburn (part) 62761
## 4: 274 4 Pierce Auburn (part) 7419
## 5: 186 4 Kitsap Bainbridge Island 23025
## ---
## 83: 296 4 Pierce University Place 31144
## 84: 297 4 Pierce Wilkeson 477
## 85: 180 4 King Woodinville 10938
## 86: 344 4 Snohomish Woodway 1307
## 87: 181 4 King Yarrow Point 1001
## 9 variables not shown: [2011.Population.Estimate, 2012.Population.Estimate, 2013.Population.Estimate, 2014.Population.Estimate, 2015.Population.Estimate, 2016.Population.Estimate, 2017.Population.Estimate, 2018.Population.Estimate, 2019.Population.Estimate]
Ordering by reference (in place) is done with setorder(), which uses data.table’s optimized ordering:
setorder(dt4c, County, Jurisdiction)
Base R assigns objects by value, i.e. when an object is assigned, a copy of the original object is created and the value occupies a different address in memory. Python, C, Java and other programming languages work this way.
Assignment by reference (or “in place”) differs in that it does not create a separate copy; it references the original object, i.e. it points to the same address in memory. In addition to data.table, C++ and Python’s pandas support this as well. It is well suited for working with big data as it is more memory efficient, but one has to be careful not to overwrite the original object.
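A small sketch of the difference, using data.table's address() helper, which prints where an object lives in memory:

d1 <- data.table(a = 1:3)
d2 <- d1         # plain assignment: both names point to the same object
d3 <- copy(d1)   # explicit copy: a new object at a new address
address(d1)
address(d2)      # same address as d1
address(d3)      # a different address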
The column names of our dataset are not very pretty, so let’s change them.
colnames(dt4c)
## [1] "Line" "Filter"
## [3] "County" "Jurisdiction"
## [5] "2010.Population.Census" "2011.Population.Estimate"
## [7] "2012.Population.Estimate" "2013.Population.Estimate"
## [9] "2014.Population.Estimate" "2015.Population.Estimate"
## [11] "2016.Population.Estimate" "2017.Population.Estimate"
## [13] "2018.Population.Estimate" "2019.Population.Estimate"
There are at least two ways to rename columns:
By reference: (we replace Jurisdiction by City)
setnames(dt4c, "Jurisdiction", "City")
Note that both the second and third arguments can include more than one name if you want to rename multiple columns at once.

Using colnames() as with data.frame: (we extract the first four characters from the time columns)
colnames(dt4c)[5:ncol(dt4c)] <- substr(colnames(dt4c)[5:ncol(dt4c)], 1, 4)
colnames(dt4c)
## [1] "Line" "Filter" "County" "City" "2010" "2011" "2012" "2013"
## [9] "2014" "2015" "2016" "2017" "2018" "2019"
Columns can be selected in various ways. The most common case is probably a selection by column names:
dt4c[, .(County, City)]
## County City
## 1: King Algona
## 2: King Auburn (part)
## 3: King Beaux Arts Village
## 4: King Bellevue
## 5: King Black Diamond
## ---
## 83: Snohomish Mukilteo
## 84: Snohomish Snohomish
## 85: Snohomish Stanwood
## 86: Snohomish Sultan
## 87: Snohomish Woodway
Selection by an index still works:
dt4c[1:10, 1:5]
## Line Filter County City 2010
## 1: 143 4 King Algona 3014
## 2: 144 4 King Auburn (part) 62761
## 3: 145 4 King Beaux Arts Village 299
## 4: 146 4 King Bellevue 122363
## 5: 147 4 King Black Diamond 4153
## 6: 148 4 King Bothell (part) 17090
## 7: 149 4 King Burien 33313
## 8: 150 4 King Carnation 1786
## 9: 151 4 King Clyde Hill 2984
## 10: 152 4 King Covington 17575
Names can also be used in the same way as indices. Use backquotes if the column name starts with a number:
dt4c[1:5, Filter:City]
## Filter County City
## 1: 4 King Algona
## 2: 4 King Auburn (part)
## 3: 4 King Beaux Arts Village
## 4: 4 King Bellevue
## 5: 4 King Black Diamond
dt4c[1:5, Filter:`2014`]
## Filter County City 2010 2011 2012 2013 2014
## 1: 4 King Algona 3014 3055 3070 3075 3090
## 2: 4 King Auburn (part) 62761 63050 63390 64320 65350
## 3: 4 King Beaux Arts Village 299 300 300 290 295
## 4: 4 King Bellevue 122363 123400 124600 132100 134400
## 5: 4 King Black Diamond 4153 4160 4170 4170 4180
The .() is a synonym for list():
dt4c[1:5, list(County, City)]
## County City
## 1: King Algona
## 2: King Auburn (part)
## 3: King Beaux Arts Village
## 4: King Bellevue
## 5: King Black Diamond
Or select columns the same way as in data.frame, using the c() function:
dt4c[1:5, c("County", "City")]
## County City
## 1: King Algona
## 2: King Auburn (part)
## 3: King Beaux Arts Village
## 4: King Bellevue
## 5: King Black Diamond
However, if you assign column names to an object, say cols, then dt4c[, cols] will not work. Instead use
cols <- c("County", "City")
dt4c[, ..cols] # or
dt4c[, cols, with = FALSE]
Negation can be used to exclude columns. Here, select all columns except the ones given:
dt4c <- dt4c[, !c("Line", "Filter")]
dim(dt4c)
## [1] 87 12
The := operator

Assignment to columns is done via :=, which assigns by reference:
dt4c[, id := 1:nrow(dt4c)]
dt4c[, id := id + 5000]
Be aware of the implications of assigning by reference. Here is an example:
d # d is our toy data.table
## x y
## 1: 1 a
## 2: 2 b
## 3: 3 c
## 4: 4 d
## 5: 5 e
## 6: 6 f
## 7: 7 g
## 8: 8 h
## 9: 9 i
## 10: 10 j
a <- d # assign d to a
a[ , x := NULL][] # delete column in a
## y
## 1: a
## 2: b
## 3: c
## 4: d
## 5: e
## 6: f
## 7: g
## 8: h
## 9: i
## 10: j
d # d changed as well!
## y
## 1: a
## 2: b
## 3: c
## 4: d
## 5: e
## 6: f
## 7: g
## 8: h
## 9: i
## 10: j
Here is the right way to do it using the copy()
function:
d <- data.table(x = 1:10, y = letters[1:10]) # re-create d
a <- copy(d) # use the copy() function to create a separate object
a[ , x := NULL][] # delete column in a
## y
## 1: a
## 2: b
## 3: c
## 4: d
## 5: e
## 6: f
## 7: g
## 8: h
## 9: i
## 10: j
d # d is not impacted
## x y
## 1: 1 a
## 2: 2 b
## 3: 3 c
## 4: 4 d
## 5: 5 e
## 6: 6 f
## 7: 7 g
## 8: 8 h
## 9: 9 i
## 10: 10 j
Note that in the above example I used [] at the end of lines to print out the values of the object. They are not needed for the actual assignment.
You do not have to assign values to all rows - a partial assignment can be done with the help of the i dimension. Say we want to increase the id column by 1000, but only for King county:
dt4c[County == "King", id := id + 1000]
dt4c[1:4, .(County, City, id)]
## County City id
## 1: King Algona 6001
## 2: King Auburn (part) 6002
## 3: King Beaux Arts Village 6003
## 4: King Bellevue 6004
dt4c[50:53, .(County, City, id)]
## County City id
## 1: Pierce Edgewood 5050
## 2: Pierce Enumclaw (part) 5051
## 3: Pierce Fife 5052
## 4: Pierce Fircrest 5053
If the column does not exist, a partial assignment fills in NAs:
d[x < 5, z := TRUE][]
## x y z
## 1: 1 a TRUE
## 2: 2 b TRUE
## 3: 3 c TRUE
## 4: 4 d TRUE
## 5: 5 e NA
## 6: 6 f NA
## 7: 7 g NA
## 8: 8 h NA
## 9: 9 i NA
## 10: 10 j NA
Let’s now delete the id column:
dt4c[, id := NULL]
One can assign multiple columns in one command. Here we add two columns (u and v) and delete one column (z) in one go:
d[ , ':='(u = x^2, v = x^3, z = NULL)]
We can also chain multiple operations into one line. Here we first add a column w, then we add a column dummy via a partial assignment, then we order the rows:
d[, w := -4:5][w > 0, dummy := TRUE][order(x, decreasing = TRUE)]
## x y u v w dummy
## 1: 10 j 100 1000 5 TRUE
## 2: 9 i 81 729 4 TRUE
## 3: 8 h 64 512 3 TRUE
## 4: 7 g 49 343 2 TRUE
## 5: 6 f 36 216 1 TRUE
## 6: 5 e 25 125 0 NA
## 7: 4 d 16 64 -1 NA
## 8: 3 c 9 27 -2 NA
## 9: 2 b 4 8 -3 NA
## 10: 1 a 1 1 -4 NA
Note that the first two operations are by reference, i.e. the d object is changed. The last operation does not change the object unless we re-assign the result to d.
To assign multiple categories to one variable, there is a new function called fcase(), which is inspired by CASE WHEN in SQL and is optimized for speed. For example:
d[, category := fcase(w > 0, "positive",
w %between% c(-2, 0), "slightly negative",
w < -2, "negative")]
Similarly to %chin%, the %between% operator is data.table-specific.
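For example:

c(-3, -1, 2) %between% c(-2, 0)   # FALSE TRUE FALSE (the bounds are inclusive)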
Finally, an assignment can also be used to rename columns:
d[, dummy_new := dummy][, dummy := NULL][]
## x y u v w category dummy_new
## 1: 1 a 1 1 -4 negative NA
## 2: 2 b 4 8 -3 negative NA
## 3: 3 c 9 27 -2 slightly negative NA
## 4: 4 d 16 64 -1 slightly negative NA
## 5: 5 e 25 125 0 slightly negative NA
## 6: 6 f 36 216 1 positive TRUE
## 7: 7 g 49 343 2 positive TRUE
## 8: 8 h 64 512 3 positive TRUE
## 9: 9 i 81 729 4 positive TRUE
## 10: 10 j 100 1000 5 positive TRUE
This way of renaming columns changes the order of the columns. You can use the function setcolorder() to reorder columns in place.
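For example, to move dummy_new back next to w (in place, without copying):

setcolorder(d, c("x", "y", "u", "v", "w", "dummy_new", "category"))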
.SD

.SD (Subset of Data) is a very powerful operator in data.table. It allows you to apply a given function to any subset of columns.
To demonstrate it, as part of our cleaning of dt4c we will convert all time-related columns into the numeric type. We could do it column by column, as in
dt4c[, `2010` := as.numeric(`2010`)]
That would be tedious for many columns. Thus, we use the .SD symbol to apply the as.numeric function to all of the 2010-2019 columns at once:
cols <- as.character(2010:2019) # column names to apply the conversion to
dt4c[, (cols) := lapply(.SD, as.numeric), .SDcols = cols]
str(dt4c)
## Classes 'data.table' and 'data.frame': 87 obs. of 12 variables:
## $ County: chr "King" "King" "King" "King" ...
## $ City : chr "Algona" "Auburn (part)" "Beaux Arts Village" "Bellevue" ...
## $ 2010 : num 3014 62761 299 122363 4153 ...
## $ 2011 : num 3055 63050 300 123400 4160 ...
## $ 2012 : num 3070 63390 300 124600 4170 ...
## $ 2013 : num 3075 64320 290 132100 4170 ...
## $ 2014 : num 3090 65350 295 134400 4180 ...
## $ 2015 : num 3105 65950 300 135000 4200 ...
## $ 2016 : num 3175 67340 300 139400 4305 ...
## $ 2017 : num 3180 69060 300 140700 4335 ...
## $ 2018 : num 3180 70650 300 142400 4360 ...
## $ 2019 : num 3190 71740 300 145300 4525 ...
## - attr(*, ".internal.selfref")=<externalptr>
## - attr(*, "index")= int(0)
## ..- attr(*, "__County")= int(0)
The lapply() function is a standard base R function that applies a given function (here as.numeric) to a list or a dataset (given by the first argument), element by element or column by column. In the data.table context, the .SD symbol is replaced by a dataset composed of the columns given by .SDcols. Thus, the function as.numeric is applied to all columns defined in the cols object, and the result is then assigned to the columns given by the left side of :=, which here have the same names.
The .SD symbol is often used in aggregations, and we will see an example of it in the next section.
The simplest aggregations are done by applying an aggregating function to columns. For example,
dt4c[, sum(`2019`)]
## [1] 2991715
dt4c[, mean(`2019`)]
## [1] 34387.53
For counts, use the special symbol .N:
dt4c[, .N]
## [1] 87
Things get more interesting when we aggregate by groups. Here we sum by a column, by a conditional expression and by multiple expressions. If you want to name the aggregated column, use .() on the j dimension:
dt4c[, sum(`2019`), by = County]
## County V1
## 1: King 1978025
## 2: Kitsap 92170
## 3: Pierce 468300
## 4: Snohomish 453220
dt4c[, .(Population = sum(`2019`)), by = County]
## County Population
## 1: King 1978025
## 2: Kitsap 92170
## 3: Pierce 468300
## 4: Snohomish 453220
dt4c[, .(Population = sum(`2019`)), by = `2019` > 100000]
## `2019` > 1e+05 Population
## 1: FALSE 1541415
## 2: TRUE 1450300
dt4c[, .(Population = sum(`2019`), Count = .N), by = `2019` > 100000]
## `2019` > 1e+05 Population Count
## 1: FALSE 1541415 81
## 2: TRUE 1450300 6
dt4c[, .(Population = sum(`2019`), Count = .N), by = .(County, `2019` > 100000)]
## County `2019` > 1e+05 Population Count
## 1: King FALSE 850925 35
## 2: King TRUE 1127100 4
## 3: Kitsap FALSE 92170 4
## 4: Pierce FALSE 256900 23
## 5: Pierce TRUE 211400 1
## 6: Snohomish FALSE 341420 19
## 7: Snohomish TRUE 111800 1
The above commands do not change the dt4c dataset. To add an aggregated column to the dataset, use the := symbol:
dt4c[, county_sum := sum(`2019`), by = County]
Now one can use the aggregated column to compute county shares. If it was done correctly, the shares within each county should sum to 100 percent (up to rounding error).
dt4c[, county_share := round(`2019`/county_sum*100, 2)]
dt4c[, sum(county_share), by = County]
## County V1
## 1: King 100.02
## 2: Kitsap 99.99
## 3: Pierce 100.00
## 4: Snohomish 99.99
.SD
The power of the .SD feature shows especially in aggregations. It makes it easy to aggregate multiple columns at once. Here we sum all columns from 2010 to 2019 (alternatively, we could use our object cols created earlier; see the sketch after the output below):
dt4c[, lapply(.SD, sum), by = County, .SDcols = `2010`:`2019`]
## County 2010 2011 2012 2013 2014 2015 2016 2017
## 1: King 1606249 1657335 1701280 1728800 1765200 1799520 1859180 1906640
## 2: Kitsap 81111 82505 83880 83495 84465 86260 88280 89080
## 3: Pierce 428487 430040 432245 436005 439330 444070 452230 458920
## 4: Snohomish 411043 412565 414455 418000 420665 427340 433865 439600
## 2 variables not shown: [2018, 2019]
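The alternative mentioned above, re-using the cols vector of column names created earlier, gives the same result:

dt4c[, lapply(.SD, sum), by = County, .SDcols = cols]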
To demonstrate how to join datasets, let’s read in the dataset of counties provided with the class:
counties <- fread("counties.csv")
We now merge this dataset with our dt4c. The joining columns are County for dt4c and county_name for the counties dataset. There are two ways of doing the join.
The first one is using the merge() function:
dt4cm <- merge(dt4c, counties, by.x = "County", by.y = "county_name")
Note that if the joining columns had the same name, you would just use the argument by.
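For example, if counties also stored its county names in a column called County (hypothetical here), the call would simply be:

dt4cm <- merge(dt4c, counties, by = "County")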
The i dimension and setting keys

The second way of doing the join is using the first (i) dimension. For this example we first create another county dataset by aggregating dt4c and renaming the County column to county_name, chaining the operations one after another:
popcty <- dt4c[ , .(Pop = sum(`2019`)), by = County][, county_name := County][, County := NULL]
The left and right joins can be done via
popcty[counties, , on = .(county_name)] # left join
## Pop county_name county_id acres
## 1: 1978025 King 33 1403205
## 2: 92170 Kitsap 35 246076
## 3: 468300 Pierce 53 675448
## 4: 453220 Snohomish 61 1240445
counties[popcty, , on = .(county_name)] # right join
## county_id acres county_name Pop
## 1: 33 1403205 King 1978025
## 2: 35 246076 Kitsap 92170
## 3: 53 675448 Pierce 468300
## 4: 61 1240445 Snohomish 453220
Setting a key can significantly decrease the runtime. For example, merging the PSRC’s parcels with the buildings dataset this way was 16x faster than merging with base R.
setkey(popcty, "county_name")
setkey(counties, "county_name")
When a key is set, the on argument can be omitted:
counties[popcty]
## county_id acres county_name Pop
## 1: 33 1403205 King 1978025
## 2: 35 246076 Kitsap 92170
## 3: 53 675448 Pierce 468300
## 4: 61 1240445 Snohomish 453220
To explicitly use columns from the dataset passed into the i dimension, use the i. prefix. Below, the Pop column is from the dataset popcty, therefore it is used as i.Pop, while the column acres is from counties and therefore does not need any prefix.
counties[popcty, .(county_name, density = i.Pop/acres)]
## county_name density
## 1: King 1.4096479
## 2: Kitsap 0.3745591
## 3: Pierce 0.6933176
## 4: Snohomish 0.3653689
We now add the density column to the counties dataset:
counties[popcty, density := i.Pop/acres][]
## county_id acres county_name density
## 1: 33 1403205 King 1.4096479
## 2: 35 246076 Kitsap 0.3745591
## 3: 53 675448 Pierce 0.6933176
## 4: 61 1240445 Snohomish 0.3653689
Notice that by default we are doing an outer join:
popcty[county_name != "King"][counties]
## Pop county_name county_id acres density
## 1: NA King 33 1403205 1.4096479
## 2: 92170 Kitsap 35 246076 0.3745591
## 3: 468300 Pierce 53 675448 0.6933176
## 4: 453220 Snohomish 61 1240445 0.3653689
To do an inner join, use
popcty[county_name != "King"][counties, nomatch = NULL]
## Pop county_name county_id acres density
## 1: 92170 Kitsap 35 246076 0.3745591
## 2: 468300 Pierce 53 675448 0.6933176
## 3: 453220 Snohomish 61 1240445 0.3653689
One can also do a join that gives the records not in common (an anti-join):
counties[!popcty[county_name != "King"]]
## county_id acres county_name density
## 1: 33 1403205 King 1.409648
The format of the data we’ve been working with so far is often called wide. It is a typical format for demographic datasets.
head(dt4c)
## County City 2010 2011 2012 2013 2014 2015 2016
## 1: King Algona 3014 3055 3070 3075 3090 3105 3175
## 2: King Auburn (part) 62761 63050 63390 64320 65350 65950 67340
## 3: King Beaux Arts Village 299 300 300 290 295 300 300
## 4: King Bellevue 122363 123400 124600 132100 134400 135000 139400
## 5: King Black Diamond 4153 4160 4170 4170 4180 4200 4305
## 6: King Bothell (part) 17090 17150 17280 17440 24610 25410 26590
## 5 variables not shown: [2017, 2018, 2019, county_sum, county_share]
Let’s remove columns that we will not need anymore.
dt4c[, ':='(county_sum = NULL, county_share = NULL)]
Various packages, including ggplot2, require the data to be in the so-called long (or tidy) format, where each value (here population) is its own record, i.e. each value occupies one row. To convert from wide to long, use the melt() function. In our example, we need to specify that County and City are the id (non-population) columns:
dtl <- melt(dt4c, id.vars = c("County", "City"))
head(dtl)
## County City variable value
## 1: King Algona 2010 3014
## 2: King Auburn (part) 2010 62761
## 3: King Beaux Arts Village 2010 299
## 4: King Bellevue 2010 122363
## 5: King Black Diamond 2010 4153
## 6: King Bothell (part) 2010 17090
To name the variable and value columns explicitly, use the following arguments:
dtl <- melt(dt4c, id.vars = c("County", "City"), variable.name = "Year", value.name = "Population")
head(dtl)
## County City Year Population
## 1: King Algona 2010 3014
## 2: King Auburn (part) 2010 62761
## 3: King Beaux Arts Village 2010 299
## 4: King Bellevue 2010 122363
## 5: King Black Diamond 2010 4153
## 6: King Bothell (part) 2010 17090
Now this dataset can be plugged directly into ggplot:
library(ggplot2)
g <- ggplot(dtl[County == "Pierce"]) +
geom_col(aes(x = City, y = Population, fill = as.factor(Year)),
position = "dodge") + coord_flip() + labs(fill = "Year")
print(g)
If you have a dataset in the long format and need to convert it to the wide format, for example for reporting purposes, use the dcast() function. Its second argument is a formula (LHS ~ RHS) that determines which columns stay as identifiers (LHS) and which column gets cast into the wide format (RHS). ... represents all remaining variables:
dtw <- dcast(dtl, County + City ~ Year, value.var = "Population")
dtw <- dcast(dtl, ... ~ Year, value.var = "Population")
head(dtw)
## County City 2010 2011 2012 2013 2014 2015 2016
## 1: King Algona 3014 3055 3070 3075 3090 3105 3175
## 2: King Auburn (part) 62761 63050 63390 64320 65350 65950 67340
## 3: King Beaux Arts Village 299 300 300 290 295 300 300
## 4: King Bellevue 122363 123400 124600 132100 134400 135000 139400
## 5: King Black Diamond 4153 4160 4170 4170 4180 4200 4305
## 6: King Bothell (part) 17090 17150 17280 17440 24610 25410 26590
## 3 variables not shown: [2017, 2018, 2019]
As mentioned previously, the counterpart of the reading function fread() is the function fwrite(). It is the most efficient way to export data. Let’s write out our dtw dataset:
fwrite(dtw, "my_ofm.csv")
See ?fwrite for more options.
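For example, a sketch of writing a semicolon-separated, gzip-compressed file (the compress argument requires a reasonably recent data.table version):

fwrite(dtw, "my_ofm.csv.gz", sep = ";", compress = "gzip")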