Loading packages

library(tidyverse)
## ── Attaching packages ──────────────────────────────────────────────────────────────────────────── tidyverse 1.2.1 ──
## ✔ ggplot2 3.1.0     ✔ purrr   0.2.5
## ✔ tibble  1.4.2     ✔ dplyr   0.7.8
## ✔ tidyr   0.8.2     ✔ stringr 1.3.1
## ✔ readr   1.1.1     ✔ forcats 0.3.0
## ── Conflicts ─────────────────────────────────────────────────────────────────────────────── tidyverse_conflicts() ──
## ✖ dplyr::filter() masks stats::filter()
## ✖ dplyr::lag()    masks stats::lag()
library(httr)
library(jsonlite)
## 
## Attaching package: 'jsonlite'
## The following object is masked from 'package:purrr':
## 
##     flatten
library(lubridate)
## 
## Attaching package: 'lubridate'
## The following object is masked from 'package:base':
## 
##     date
options(stringsAsFactors = FALSE)

Scraping YouBike data

url <- "https://tcgbusfs.blob.core.windows.net/blobyoubike/YouBikeTP.json"
GET(url) %>% content("text") %>% cat(file = "test.json")
## No encoding supplied: defaulting to UTF-8.
read_json("test.json") %>% class
## [1] "list"
for(i in 1:5){
    message(i, "\t", now())
    Sys.sleep(3)
}
## 1    2019-10-13 01:01:26
## 2    2019-10-13 01:01:29
## 3    2019-10-13 01:01:32
## 4    2019-10-13 01:01:35
## 5    2019-10-13 01:01:38
# Converting datetime to character
now() %>% format("%Y%m%d%H%M%S")
## [1] "20191013010141"
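The pieces above (a single GET, a timed loop, and a timestamp string) can be combined into a minimal periodic scraper. This is a sketch, not part of the original code: the `ubike_` file prefix, the five iterations, and the 3-second interval are arbitrary choices.

```r
library(tidyverse)
library(httr)
library(lubridate)

url <- "https://tcgbusfs.blob.core.windows.net/blobyoubike/YouBikeTP.json"

# Fetch the same URL repeatedly, saving each snapshot to a
# timestamped file such as ubike_20191013010141.json
for (i in 1:5) {
    fname <- paste0("ubike_", format(now(), "%Y%m%d%H%M%S"), ".json")
    GET(url) %>% content("text", encoding = "UTF-8") %>% cat(file = fname)
    message(i, "\t", fname)
    Sys.sleep(3)
}
```

Supplying `encoding = "UTF-8"` to `content()` also silences the "No encoding supplied" message seen above.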
# Listing all files in a sub-folder
list.files("data/", ".*\\.rds") 
## [1] "allpost_HatePolitics_lin_201910100052.rds"
## [2] "allpost_HatePolitics_lin_201910100106.rds"
## [3] "post_HatePolitics_lin.rds"                
## [4] "rent5911018.rds"                          
## [5] "stopWords.rds"                            
## [6] "typhoon.rds"
# If you put your data in a sub-folder, you need full paths (full.names = TRUE) to access the files.
list.files("data/", ".*\\.rds", full.names = T) 
## [1] "data//allpost_HatePolitics_lin_201910100052.rds"
## [2] "data//allpost_HatePolitics_lin_201910100106.rds"
## [3] "data//post_HatePolitics_lin.rds"                
## [4] "data//rent5911018.rds"                          
## [5] "data//stopWords.rds"                            
## [6] "data//typhoon.rds"

Q1.1 ANS list files

  • Using list.files() to list the JSON files you scraped
  • Using length() on the list.files() result to count how many JSON files you have
# your code
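A minimal sketch for Q1.1, assuming the snapshots were saved under `data/` with an `ubike_` prefix (both the folder and the prefix are assumptions):

```r
# List the scraped JSON files, then count them
json_files <- list.files("data/", "ubike.*\\.json", full.names = TRUE)
length(json_files)
```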

Q1.2

  • Reading the JSON files one by one
  • Converting each one to a data frame
  • Defining one indicator: fullness = sbi/tot
  • Calculating each site’s fullness by time
# your code 
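One possible sketch, assuming each snapshot follows the YouBikeTP.json layout in which `retVal` is a named list of stations; the field names `sno` and `mday` are assumptions about the live API, while `sbi` and `tot` come from the fullness definition above.

```r
library(tidyverse)
library(jsonlite)
library(lubridate)

# Read every snapshot, flatten the stations into one data frame,
# then compute fullness = sbi / tot per site and time
ubike_df <- list.files("data/", "ubike.*\\.json", full.names = TRUE) %>%
    map_dfr(function(f) {
        read_json(f)$retVal %>%
            map_dfr(~ tibble(sno  = .x$sno,
                             tot  = as.numeric(.x$tot),
                             sbi  = as.numeric(.x$sbi),
                             mday = .x$mday))
    }) %>%
    mutate(time     = ymd_hms(mday),
           fullness = sbi / tot)
```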

Q1.2 ANS

  • Using geom_line() to display all sites’ fullness by time
# your code
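A minimal plotting sketch; the `ubike_df` columns (`time`, `sno`, `fullness`) are placeholder names, and dummy data stands in for the real scraped result:

```r
library(tidyverse)

# Dummy data standing in for the scraped result (assumption)
ubike_df <- tibble(
    time     = rep(as.POSIXct("2019-10-13 01:00:00") + 0:4 * 180, times = 2),
    sno      = rep(c("0001", "0002"), each = 5),
    fullness = runif(10)
)

# One line per site; map the station id to color (or group)
ubike_df %>%
    ggplot(aes(x = time, y = fullness, color = sno)) +
    geom_line()
```

With hundreds of real stations, a low `alpha` on `geom_line()` keeps the overlapping lines readable.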

Q2. Scraping Dcard forum (No extra point if you have solved Q1 successfully)

Q2.1.ANS Print out class and dimension of your data

# your code here
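A hedged sketch: the `_api` endpoint below is Dcard's unofficial JSON interface, and both the forum name `funny` and the `limit` parameter are placeholders, not confirmed by the original.

```r
library(jsonlite)

# Dcard's unofficial JSON endpoint (an assumption); fromJSON()
# simplifies the array of posts into a data frame
url <- "https://www.dcard.tw/_api/forums/funny/posts?limit=100"
posts <- fromJSON(url)

class(posts)
dim(posts)
```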

Q2.2.ANS Using a bar chart to show the trend in the number of posts by week.

# your code here
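A sketch of the weekly bar chart, assuming each post carries a `createdAt` ISO-8601 timestamp (the column name is an assumption about Dcard's JSON); dummy timestamps stand in for real data:

```r
library(tidyverse)
library(lubridate)

# Dummy timestamps standing in for posts$createdAt (assumption)
posts <- tibble(
    createdAt = format(as.POSIXct("2019-09-01", tz = "UTC") +
                           runif(200, 0, 40) * 86400,
                       "%Y-%m-%dT%H:%M:%SZ")
)

# Count posts per week, then draw the weekly trend as bars
posts %>%
    mutate(week = floor_date(ymd_hms(createdAt), "week")) %>%
    count(week) %>%
    ggplot(aes(x = week, y = n)) +
    geom_col()
```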

(No extra point) Discovering one more website whose content is generated from JSON

Q3.ANS Print out glimpse(), class, and dimension of your data

# YOUR CODE SHOULD BE HERE