Loading packages

library(tidyverse)
## ── Attaching packages ──────────────────────────────────────────────────────────────────────────── tidyverse 1.2.1 ──
## ✔ ggplot2 3.1.0     ✔ purrr   0.2.5
## ✔ tibble  1.4.2     ✔ dplyr   0.7.8
## ✔ tidyr   0.8.2     ✔ stringr 1.3.1
## ✔ readr   1.1.1     ✔ forcats 0.3.0
## ── Conflicts ─────────────────────────────────────────────────────────────────────────────── tidyverse_conflicts() ──
## ✖ dplyr::filter() masks stats::filter()
## ✖ dplyr::lag()    masks stats::lag()
library(httr)
library(jsonlite)
## 
## Attaching package: 'jsonlite'
## The following object is masked from 'package:purrr':
## 
##     flatten
library(lubridate)
## 
## Attaching package: 'lubridate'
## The following object is masked from 'package:base':
## 
##     date
options(stringsAsFactors = FALSE)

Scraping YouBike data

url <- "https://tcgbusfs.blob.core.windows.net/blobyoubike/YouBikeTP.json"
GET(url) %>% content("text") %>% cat(file = "test.json")
## No encoding supplied: defaulting to UTF-8.
read_json("test.json") %>% class
## [1] "list"
for(i in 1:5){
    message(i, "\t", now())
    Sys.sleep(3)
}
## 1    2019-10-13 01:01:26
## 2    2019-10-13 01:01:29
## 3    2019-10-13 01:01:32
## 4    2019-10-13 01:01:35
## 5    2019-10-13 01:01:38
# Converting datetime to character
now() %>% format("%Y%m%d%H%M%S")
## [1] "20191013010141"
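The pieces above (a single GET, a timed loop, and a timestamp string) can be combined into a minimal periodic scraper. This is a sketch, not part of the original code: the `ubike_` file prefix, the five iterations, and the 3-second interval are arbitrary choices.

```r
library(tidyverse)
library(httr)
library(lubridate)

url <- "https://tcgbusfs.blob.core.windows.net/blobyoubike/YouBikeTP.json"

# Fetch the same URL repeatedly, saving each snapshot to a
# timestamped file such as ubike_20191013010141.json
for (i in 1:5) {
    fname <- paste0("ubike_", format(now(), "%Y%m%d%H%M%S"), ".json")
    GET(url) %>% content("text", encoding = "UTF-8") %>% cat(file = fname)
    message(i, "\t", fname)
    Sys.sleep(3)
}
```

Supplying `encoding = "UTF-8"` to `content()` also silences the "No encoding supplied" message seen above.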
# Listing all files in a sub-folder
list.files("data/", ".*\\.rds") 
## [1] "allpost_HatePolitics_lin_201910100052.rds"
## [2] "allpost_HatePolitics_lin_201910100106.rds"
## [3] "post_HatePolitics_lin.rds"                
## [4] "rent5911018.rds"                          
## [5] "stopWords.rds"                            
## [6] "typhoon.rds"
# If you put your data in a sub-folder, you need full paths (full.names = TRUE) to access the files.
list.files("data/", ".*\\.rds", full.names = T) 
## [1] "data//allpost_HatePolitics_lin_201910100052.rds"
## [2] "data//allpost_HatePolitics_lin_201910100106.rds"
## [3] "data//post_HatePolitics_lin.rds"                
## [4] "data//rent5911018.rds"                          
## [5] "data//stopWords.rds"                            
## [6] "data//typhoon.rds"

Q1.1 ANS list files

  • Using list.files() to list the JSON files you scraped
  • Using length() on the list.files() result to count how many JSON files you have
# your code
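A minimal sketch for Q1.1, assuming the snapshots were saved under `data/` with an `ubike_` prefix (both the folder and the prefix are assumptions):

```r
# List the scraped JSON files, then count them
json_files <- list.files("data/", "ubike.*\\.json", full.names = TRUE)
length(json_files)
```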

Q1.2

  • Reading the JSON files one by one
  • Converting each one to a data frame
  • Defining one indicator: fullness = sbi/tot
  • Calculating each site’s fullness by time
# your code 
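One possible sketch, assuming each snapshot follows the YouBikeTP.json layout in which `retVal` is a named list of stations; the field names `sno` and `mday` are assumptions about the live API, while `sbi` and `tot` come from the fullness definition above.

```r
library(tidyverse)
library(jsonlite)
library(lubridate)

# Read every snapshot, flatten the stations into one data frame,
# then compute fullness = sbi / tot per site and time
ubike_df <- list.files("data/", "ubike.*\\.json", full.names = TRUE) %>%
    map_dfr(function(f) {
        read_json(f)$retVal %>%
            map_dfr(~ tibble(sno  = .x$sno,
                             tot  = as.numeric(.x$tot),
                             sbi  = as.numeric(.x$sbi),
                             mday = .x$mday))
    }) %>%
    mutate(time     = ymd_hms(mday),
           fullness = sbi / tot)
```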

Q1.2 ANS

  • Using geom_line() to display all sites’ fullness by time
# your code
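A minimal plotting sketch; the `ubike_df` columns (`time`, `sno`, `fullness`) are placeholder names, and dummy data stands in for the real scraped result:

```r
library(tidyverse)

# Dummy data standing in for the scraped result (assumption)
ubike_df <- tibble(
    time     = rep(as.POSIXct("2019-10-13 01:00:00") + 0:4 * 180, times = 2),
    sno      = rep(c("0001", "0002"), each = 5),
    fullness = runif(10)
)

# One line per site; map the station id to color (or group)
ubike_df %>%
    ggplot(aes(x = time, y = fullness, color = sno)) +
    geom_line()
```

With hundreds of real stations, a low `alpha` on `geom_line()` keeps the overlapping lines readable.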

Q2. Scraping Dcard forum (No extra point if you have solved Q1 successfully)

Q2.1.ANS Print out class and dimension of your data

# your code here
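A hedged sketch: the `_api` endpoint below is Dcard's unofficial JSON interface, and both the forum name `funny` and the `limit` parameter are placeholders, not confirmed by the original.

```r
library(jsonlite)

# Dcard's unofficial JSON endpoint (an assumption); fromJSON()
# simplifies the array of posts into a data frame
url <- "https://www.dcard.tw/_api/forums/funny/posts?limit=100"
posts <- fromJSON(url)

class(posts)
dim(posts)
```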

Q2.2.ANS Using a bar chart to show the trend in the number of posts by week.

# your code here
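A sketch of the weekly bar chart, assuming each post carries a `createdAt` ISO-8601 timestamp (the column name is an assumption about Dcard's JSON); dummy timestamps stand in for real data:

```r
library(tidyverse)
library(lubridate)

# Dummy timestamps standing in for posts$createdAt (assumption)
posts <- tibble(
    createdAt = format(as.POSIXct("2019-09-01", tz = "UTC") +
                           runif(200, 0, 40) * 86400,
                       "%Y-%m-%dT%H:%M:%SZ")
)

# Count posts per week, then draw the weekly trend as bars
posts %>%
    mutate(week = floor_date(ymd_hms(createdAt), "week")) %>%
    count(week) %>%
    ggplot(aes(x = week, y = n)) +
    geom_col()
```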

(No extra point) Discovering one more website whose content is generated from JSON

Q3.ANS Print out glimpse(), class, and dimension of your data

# YOUR CODE SHOULD BE HERE