Pull most recent files in R
Pulling most recent files in R
###### by Joel Christian Andrade
Many coding pipelines involve taking raw data files, formulating code and eventually working with a few, if not many, output files.
Take these files in the directory /raw_data
below, downloaded from publicly available Apple mobility database tracking movement.
applemobilitytrends_2020-08-29.csv
applemobilitytrends_2020-09-12.csv
applemobilitytrends_2020-09-18.csv
applemobilitytrends_2020-09-25.csv
applemobilitytrends_2020-09-30.csv
applemobilitytrends_2020-10-02.csv
applemobilitytrends_2020-11-08.csv
R functions involved
While retroactive data analysis is useful, the most recent file will include all data from previous files. Thus, to retrieve the most up to date information while using R, we must use some functions which are pre-loaded into R.
First, we must elucidate these functions
file.info
= provides some basic information about a file(s)- size
- if the file is directory
- if the file is an executable
- editing times
mtime
= modification timeatime
= access timectime
= creation time
-
list.files
= list of all files within a directory which.min
orwhich.max
= determine minimum or maximum with respect to a given set of variables
Let’s take the directory raw_data
and view various outputs using these functions
file.info()
> file.info("/raw_data")
size isdir mode mtime ctime atime exe
data/raw_data 0 TRUE 777 2020-10-4 13:03:40 2020-09-19 17:14:40 2020-10-04 13:03:40 no
list.files()
> list.files("/raw_data")
[1] "applemobilitytrends_2020-09-12.csv" "applemobilitytrends_2020-09-18.csv" "applemobilitytrends_2020-09-25.csv"
[4] "applemobilitytrends_2020-09-30.csv" "applemobilitytrends_2020-10-02.csv" "applemobilitytrends_2020-08-29.csv"
[7] "applemobilitytrends_2020-11-08.csv"
Putting it all together
Now, we want to combine those functions to display a specific editing time (atime, atime, ctime)
Let’s combine all the elements together using vectors and rownames()
to pull the most recently created file via ctime
.
> raw_data_file_info <- file.info(
list.files("/raw_data", full.names =T)
)
> most_recent_file <- rownames(raw_data_file_info)[which.max(raw_data_file_info$ctime)]
> most_recent_file
> "raw_data/applemobilitytrends_2020-10-02.csv"
Notice that we used the full.names = T
option because we want the entire path of the file to be placed in the vector. Otherwise, you need to be in a specific project to utilize this code.
You can alter many things to cater to your own project.
Hope this tutorial helps!