AS03_Know-Your-Data

作業目的: Know Your Data

這份作業希望能夠讓你熟悉讀取不同檔案型態的資料，並利用 dplyr 與管線運算子操作資料表，讓你可以認識自己要分析的資料！共有兩個題組，每個題組有七題，前五題計分、後兩題不計分。

題目大部分都要求印出 tibble/dataframe，若你的資料是 tibble，直接印出即可，若是 dataframe，請印出前三列即可。

作業: Know Your Data

滿分 100 分

data importing & data manipulation (i) (50 分)

The world cannot be understood without numbers. But the world cannot be understood with numbers alone.

― Hans Rosling, Factfulness (漢斯・羅斯林，《真確》)

GapMinder Foundation 是一個致力於「利用統計數字理解世界上各個國家社會、經濟、環境發展」的基金會，除了開發軟體視覺化呈現前述的統計數字以外，他們也將蒐集到的資料公開上網。

若你對他們做的事情有興趣，可以先看這一篇介紹文如何用 30 秒了解台灣發展與全球趨勢：用 GapMinder 培養正確世界觀。但跟文中所說不同的是，現在 GapMinder Foundation 已經有提供台灣的資料了！因此，底下我們將利用 R 語言，來探索這份資料。

data importing and renaming columns: 請讀取請右方路徑的 csv 檔，路徑為 data 資料夾 -> AS03 資料夾 -> gapminder_raw.tsv，並取名為 df_gapminder_raw。這個檔案是 tsv 而非 csv，而且它的欄位名稱 (column names) 遺失了，請小心謹慎！將檔案讀進來之後，請依序更改 df_gapminder_raw 的欄位名稱成 country, continent, year, lifeExp(預期壽命), pop(人口數), gdpPercap(人均GDP)，括弧內是我的說明，請用英文命名欄位即可，最後印出改名後的df_gapminder_raw

### your code
library(tidyverse)
library(lubridate)
df_gapminder_raw <- read_tsv("data/AS03/gapminder_raw.tsv", col_names = F)
# using rename()
df_gapminder_raw <- df_gapminder_raw %>% rename(country = 1, continent = 2, year = 3, lifeExp = 4, pop = 5, gdpPercap = 6)
# using `colnames<-`
df_gapminder_raw <- df_gapminder_raw %>% `colnames<-`(c("country", "continent", "year", "lifeExp", "pop", "gdpPercap"))
# using colnames()
colnames(df_gapminder_raw) <- c("country", "continent", "year", "lifeExp", "pop", "gdpPercap")

df_gapminder_raw

#> # A tibble: 1,704 x 6
#>    country     continent  year lifeExp      pop gdpPercap
#>    <chr>       <chr>     <dbl>   <dbl>    <dbl>     <dbl>
#>  1 Afghanistan Asia       1952    28.8  8425333      779.
#>  2 Afghanistan Asia       1957    30.3  9240934      821.
#>  3 Afghanistan Asia       1962    32.0 10267083      853.
#>  4 Afghanistan Asia       1967    34.0 11537966      836.
#>  5 Afghanistan Asia       1972    36.1 13079460      740.
#>  6 Afghanistan Asia       1977    38.4 14880372      786.
#>  7 Afghanistan Asia       1982    39.9 12881816      978.
#>  8 Afghanistan Asia       1987    40.8 13867957      852.
#>  9 Afghanistan Asia       1992    41.7 16317921      649.
#> 10 Afghanistan Asia       1997    41.8 22227415      635.
#> # … with 1,694 more rows

detecting NAs: df_gapminder_raw 當中有 missing value 嗎？請利用程式碼研究每個欄位的 missing value 情形，回報哪些國家的哪些欄位有 missing value，並請闡述你認為該怎麼處理這些 missing value ，要手動補上嗎？要直接把有 missing value 的列都踢掉嗎？還是計算相關欄位再踢掉？最後將你處理過 missing value 的結果(如果你認為都不用動，就不用動沒關係)儲存在 df_gapminder_clean 中並將它印出

### your code
df_gapminder_raw %>% filter(is.na(country))
df_gapminder_raw %>% filter(is.na(continent))
df_gapminder_raw %>% filter(is.na(year))
df_gapminder_raw %>% filter(is.na(lifeExp))
df_gapminder_raw %>% filter(is.na(pop))
df_gapminder_raw %>% filter(is.na(gdpPercap))

df_gapminder_clean <- df_gapminder_raw %>%
  mutate(continent = if_else(country == "Nepal", "Asia", continent))

### your text (把文字描述寫在 code chunk 中)
# country 沒有
# continent 有, Nepal, 所以應該是 Asia
# year 沒有
# lifeExp 有, Portugal, 這樣這幾筆資料算 lifeExp 就無法使用，但不用踢掉
# pop 有, Puerto Rico, 這樣這幾筆資料算 pop 就無法使用，但不用踢掉
# gdpPercap 有, Reunion, 這樣這幾筆資料算 gdpPercap 就無法使用，但不用踢掉

#> # A tibble: 0 x 6
#> # … with 6 variables: country <chr>, continent <chr>, year <dbl>,
#> #   lifeExp <dbl>, pop <dbl>, gdpPercap <dbl>
#> # A tibble: 12 x 6
#>    country continent  year lifeExp      pop gdpPercap
#>    <chr>   <chr>     <dbl>   <dbl>    <dbl>     <dbl>
#>  1 Nepal   <NA>       1952    36.2  9182536      546.
#>  2 Nepal   <NA>       1957    37.7  9682338      598.
#>  3 Nepal   <NA>       1962    39.4 10332057      652.
#>  4 Nepal   <NA>       1967    41.5 11261690      676.
#>  5 Nepal   <NA>       1972    44.0 12412593      675.
#>  6 Nepal   <NA>       1977    46.7 13933198      694.
#>  7 Nepal   <NA>       1982    49.6 15796314      718.
#>  8 Nepal   <NA>       1987    52.5 17917180      776.
#>  9 Nepal   <NA>       1992    55.7 20326209      898.
#> 10 Nepal   <NA>       1997    59.4 23001113     1011.
#> 11 Nepal   <NA>       2002    61.3 25873917     1057.
#> 12 Nepal   <NA>       2007    63.8 28901790     1091.
#> # A tibble: 0 x 6
#> # … with 6 variables: country <chr>, continent <chr>, year <dbl>,
#> #   lifeExp <dbl>, pop <dbl>, gdpPercap <dbl>
#> # A tibble: 12 x 6
#>    country  continent  year lifeExp      pop gdpPercap
#>    <chr>    <chr>     <dbl>   <dbl>    <dbl>     <dbl>
#>  1 Portugal Europe     1952      NA  8526050     3068.
#>  2 Portugal Europe     1957      NA  8817650     3775.
#>  3 Portugal Europe     1962      NA  9019800     4728.
#>  4 Portugal Europe     1967      NA  9103000     6362.
#>  5 Portugal Europe     1972      NA  8970450     9022.
#>  6 Portugal Europe     1977      NA  9662600    10172.
#>  7 Portugal Europe     1982      NA  9859650    11754.
#>  8 Portugal Europe     1987      NA  9915289    13039.
#>  9 Portugal Europe     1992      NA  9927680    16207.
#> 10 Portugal Europe     1997      NA 10156415    17641.
#> 11 Portugal Europe     2002      NA 10433867    19971.
#> 12 Portugal Europe     2007      NA 10642836    20510.
#> # A tibble: 12 x 6
#>    country     continent  year lifeExp   pop gdpPercap
#>    <chr>       <chr>     <dbl>   <dbl> <dbl>     <dbl>
#>  1 Puerto Rico Americas   1952    64.3    NA     3082.
#>  2 Puerto Rico Americas   1957    68.5    NA     3907.
#>  3 Puerto Rico Americas   1962    69.6    NA     5108.
#>  4 Puerto Rico Americas   1967    71.1    NA     6929.
#>  5 Puerto Rico Americas   1972    72.2    NA     9123.
#>  6 Puerto Rico Americas   1977    73.4    NA     9771.
#>  7 Puerto Rico Americas   1982    73.8    NA    10331.
#>  8 Puerto Rico Americas   1987    74.6    NA    12281.
#>  9 Puerto Rico Americas   1992    73.9    NA    14642.
#> 10 Puerto Rico Americas   1997    74.9    NA    16999.
#> 11 Puerto Rico Americas   2002    77.8    NA    18856.
#> 12 Puerto Rico Americas   2007    78.7    NA    19329.
#> # A tibble: 12 x 6
#>    country continent  year lifeExp    pop gdpPercap
#>    <chr>   <chr>     <dbl>   <dbl>  <dbl>     <dbl>
#>  1 Reunion Africa     1952    52.7 257700        NA
#>  2 Reunion Africa     1957    55.1 308700        NA
#>  3 Reunion Africa     1962    57.7 358900        NA
#>  4 Reunion Africa     1967    60.5 414024        NA
#>  5 Reunion Africa     1972    64.3 461633        NA
#>  6 Reunion Africa     1977    67.1 492095        NA
#>  7 Reunion Africa     1982    69.9 517810        NA
#>  8 Reunion Africa     1987    71.9 562035        NA
#>  9 Reunion Africa     1992    73.6 622191        NA
#> 10 Reunion Africa     1997    74.8 684810        NA
#> 11 Reunion Africa     2002    75.7 743981        NA
#> 12 Reunion Africa     2007    76.4 798094        NA

checking on Taiwan: 利用 df_gapminder_clean 篩選出台灣的資料，請用文字描述你的觀察，你看到什麼趨勢？

### your code
df_gapminder_clean %>% filter(country == "Taiwan")

### your text
# 台灣的lifeExp, pop, gdpPercap 逐年成長沒有停過

#> # A tibble: 12 x 6
#>    country continent  year lifeExp      pop gdpPercap
#>    <chr>   <chr>     <dbl>   <dbl>    <dbl>     <dbl>
#>  1 Taiwan  Asia       1952    58.5  8550362     1207.
#>  2 Taiwan  Asia       1957    62.4 10164215     1508.
#>  3 Taiwan  Asia       1962    65.2 11918938     1823.
#>  4 Taiwan  Asia       1967    67.5 13648692     2644.
#>  5 Taiwan  Asia       1972    69.4 15226039     4063.
#>  6 Taiwan  Asia       1977    70.6 16785196     5597.
#>  7 Taiwan  Asia       1982    72.2 18501390     7426.
#>  8 Taiwan  Asia       1987    73.4 19757799    11055.
#>  9 Taiwan  Asia       1992    74.3 20686918    15216.
#> 10 Taiwan  Asia       1997    75.2 21628605    20207.
#> 11 Taiwan  Asia       2002    77.0 22454239    23235.
#> 12 Taiwan  Asia       2007    78.4 23174294    28718.

mutating, filtering, and counting: 請在 df_gapminder_clean 中新增欄位，判斷每筆資料的預期壽命介於以下哪個區間（小於 60、大於等於 60 但小於 70、大於等於 70），接著計算歐洲的國家中，各年裡面預期壽命各區間的資料各有多少筆。務必在計算時考慮該如何處理 missing value，如果你前面選擇留著，這邊要留著嗎，如果你前面刪掉，對你現在的運算會有影響嗎？請印出結果，並用文字說明理由

### your code
df_gapminder_clean %>%
  filter(!is.na(lifeExp)) %>%
  mutate(life_interval = case_when(
    lifeExp < 60 ~ "<60",
    lifeExp >= 60 & lifeExp < 70 ~ "60-70",
    lifeExp >= 70 ~ ">=70",
    TRUE ~ "others"
  )) %>%
  filter(continent == "Europe") %>%
  count(year, life_interval)

### your text
# 我踢掉 missing value，因為本來就是缺值，所以計算時不該納入

#> # A tibble: 28 x 3
#>     year life_interval     n
#>    <dbl> <chr>         <int>
#>  1  1952 <60               6
#>  2  1952 >=70              5
#>  3  1952 60-70            18
#>  4  1957 <60               3
#>  5  1957 >=70              7
#>  6  1957 60-70            19
#>  7  1962 <60               1
#>  8  1962 >=70             12
#>  9  1962 60-70            16
#> 10  1967 <60               1
#> # … with 18 more rows

creating variables and ordering: 利用 df_gapminder_clean，先計算亞洲各國 1972 年的總額 GDP (人口數 x 人均GDP)，再依照預期壽命由大到小排列，最後印出國家、總額 GDP、預期壽命三個欄位的 tibble/dataframe

### your code
df_gapminder_clean %>% filter(continent == "Asia") %>%
  mutate(GDP = gdpPercap*pop) %>%
  arrange(desc(lifeExp)) %>%
  select(country, GDP, lifeExp)

#> # A tibble: 396 x 3
#>    country              GDP lifeExp
#>    <chr>              <dbl>   <dbl>
#>  1 Japan            4.04e12    82.6
#>  2 Hong Kong, China 2.77e11    82.2
#>  3 Japan            3.63e12    82  
#>  4 Hong Kong, China 2.04e11    81.5
#>  5 Israel           1.64e11    80.7
#>  6 Japan            3.63e12    80.7
#>  7 Hong Kong, China 1.84e11    80  
#>  8 Singapore        2.15e11    80.0
#>  9 Israel           1.32e11    79.7
#> 10 Japan            3.34e12    79.4
#> # … with 386 more rows

advanced - subsetting: 利用 df_gapminder_clean，先篩選出歐洲各國各自人口最多的年份(e.g. A國 1992 年的人口多於 2007, 2002, etc.，因此留下 A國 1992 年的資料)，接著再依照預期壽命篩選出最高的十個國家(e.g. A國 1992年人口最多, B國 2002年的人口最多，比較兩國這兩年各自的預期壽命)，然後依照國家名稱由 A 到 Z 排列，最後印出國家、年份、預期壽命、人口數的 tibble/dataframe

### your code
df_gapminder_clean %>% filter(continent == "Europe") %>%
  group_by(country) %>%
  mutate(pop_max = max(pop)) %>%
  filter(pop == pop_max) %>%
  ungroup() %>%
  arrange(desc(lifeExp)) %>%
  slice(1:10) %>%
  arrange(country) %>%
  select(country, year, lifeExp, pop)

#> # A tibble: 10 x 4
#>    country      year lifeExp      pop
#>    <chr>       <dbl>   <dbl>    <dbl>
#>  1 Austria      2007    79.8  8199783
#>  2 France       2007    80.7 61083916
#>  3 Greece       2007    79.5 10706290
#>  4 Iceland      2007    81.8   301931
#>  5 Italy        2007    80.5 58147733
#>  6 Netherlands  2007    79.8 16570613
#>  7 Norway       2007    80.2  4627926
#>  8 Spain        2007    80.9 40448191
#>  9 Sweden       2007    80.9  9031088
#> 10 Switzerland  2007    81.7  7554661

advanced - in-group comparing: 利用 df_gapminder_clean，比較各國 1992 年和 2007 年的預期壽命差距，接著留下各洲中預期壽命差距最大與最小的國家，最後印出洲、國家、預期壽命差距的 tibble/dataframe

### your code
df_gapminder_clean %>%
  filter(!is.na(lifeExp)) %>%
  filter(year %in% c(1992, 2007)) %>%
  group_by(continent, country) %>%
  summarise(lifeExp_diff = max(lifeExp) - min(lifeExp)) %>%
  ungroup() %>%
  group_by(continent) %>%
  arrange(continent, desc(lifeExp_diff)) %>%
  mutate(lifeExp_diff_rank = row_number()) %>%
  filter(lifeExp_diff_rank %in% c(min(lifeExp_diff_rank), max(lifeExp_diff_rank))) %>%
  select(continent, country, lifeExp_diff)

#> # A tibble: 10 x 3
#> # Groups:   continent [5]
#>    continent country             lifeExp_diff
#>    <chr>     <chr>                      <dbl>
#>  1 Africa    Rwanda                   22.6   
#>  2 Africa    Togo                      0.359 
#>  3 Americas  Nicaragua                 7.06  
#>  4 Americas  Trinidad and Tobago       0.0430
#>  5 Asia      Nepal                     8.06  
#>  6 Asia      Iraq                      0.084 
#>  7 Europe    Turkey                    5.63  
#>  8 Europe    Montenegro                0.892 
#>  9 Oceania   New Zealand               3.87  
#> 10 Oceania   Australia                 3.67

data importing & data manipulation (ii) (50 分)

PTT 上面有好幾個和影集(drama)有關的子板(board)，下方提供 03/02 當天所抓取的相關資料，請載入這份資料後回答問題。

data importing: 請讀取請右方路徑的 csv 檔，路徑為 data 資料夾 -> AS03 資料夾 -> df_main_clean.csv，取名為 df_main_clean後印出

### your code
df_main_clean <- read_csv("data/AS03/df_main_clean.csv")
df_main_clean

#> # A tibble: 1,457 x 10
#>    board  type   title   date       comments author  text      IP       link    
#>    <chr>  <chr>  <chr>   <date>        <dbl> <chr>   <chr>     <chr>    <chr>   
#>  1 Korea… [公告] Fw: [公… 2021-01-18       20 XDDDD5… "┌──┬│┌─… "※ 發信站:… https:/…
#>  2 Korea… [新聞] [新聞] 柳… 2021-01-18        5 jinyi … "演員柳惠英被提… "※ 發信站:… https:/…
#>  3 Korea… [閒聊] [閒聊]  … 2021-01-18      101 jay947… "花了幾個週末把… "※ 發信站:… https:/…
#>  4 Korea… [新聞] [新聞] 孫… 2021-01-18        9 jinyi … "據多名電視臺相… "※ 發信站:… https:/…
#>  5 Korea… [LIVE] [LIVE]… 2021-01-18      249 kawasa… "前輩，那支口紅… "※ 發信站:… https:/…
#>  6 Korea… [LIVE] [LIVE]… 2021-01-18      406 tcchen… "劇名:▁▁▁▁… "※ 發信站:… https:/…
#>  7 Korea… [LIVE] [LIVE]… 2021-01-18       80 diana8… "───────… "※ 發信站:… https:/…
#>  8 Korea… [心得] [心得] 惡… 2021-01-18       28 analyt… "最後一集到底來… "※ 發信站:… https:/…
#>  9 Korea… [求薦] [求薦] 想… 2021-01-18      214 greenb… "1.重刷doc… "※ 發信站:… https:/…
#> 10 Korea… <NA>   看完13集驅… 2021-01-19       35 patty6… "不知道各位大大… "※ 發信站:… https:/…
#> # … with 1,447 more rows, and 1 more variable: time <dttm>

counting: 這份資料來自好幾個不同的子板 e.g. 韓劇版，請問各子板各有多少篇文章？請印出 tibble/dataframe

### your code
df_main_clean %>% count(board)

#> # A tibble: 4 x 2
#>   board           n
#> * <chr>       <int>
#> 1 China-Drama   361
#> 2 EAseries      383
#> 3 KoreaDrama    369
#> 4 TaiwanDrama   344

filtering: 請找出韓劇版(KoreaDrama)中文章留言數介於 50 到 100 之間（包含）的文章，並印出 tibble/dataframe

### your code
df_main_clean %>% filter(board == "KoreaDrama") %>%
  filter(comments >= 50 & comments <= 100)

#> # A tibble: 62 x 10
#>    board  type   title   date       comments author  text      IP       link    
#>    <chr>  <chr>  <chr>   <date>        <dbl> <chr>   <chr>     <chr>    <chr>   
#>  1 Korea… [LIVE] [LIVE]… 2021-01-18       80 diana8… "───────… "※ 發信站:… https:/…
#>  2 Korea… [心得] [心得] 看… 2021-01-20       51 Eva041… "天啊~這次真的… "※ 發信站:… https:/…
#>  3 Korea… [閒聊] [閒聊] R… 2021-01-20       51 xji6tp… "繼續用愛發電防… "※ 發信站:… https:/…
#>  4 Korea… [LIVE] [LIVE]… 2021-01-20       75 kawasa… "EP13預告\… "※ 發信站:… https:/…
#>  5 Korea… [心得] [心得] 晝… 2021-01-21       80 lyw318… "一般追劇，要嘛… "※ 發信站:… https:/…
#>  6 Korea… [心得] [心得] 推… 2021-01-21       81 ilovef… "晝與夜是一部怪… "※ 發信站:… https:/…
#>  7 Korea… [閒聊] [閒聊] J… 2021-01-22       62 rainin… "[這是碎碎念]… "※ 發信站:… https:/…
#>  8 Korea… [心得] [心得] 女… 2021-01-23       61 Kduran… "因為目前很喜歡… "※ 發信站:… https:/…
#>  9 Korea… [心得] [心得] 《… 2021-01-26       63 analyt… "看到新聞說有人… "※ 發信站:… https:/…
#> 10 Korea… [心得] [心得] 看… 2021-01-26       56 n25 (n… "細部情節帶給人… "※ 發信站:… https:/…
#> # … with 52 more rows, and 1 more variable: time <dttm>

mutating and counting: 請先增加一個每則文章月份的欄位，接著計算各版中各個月份各自有多少篇文章，最後印出板名、月份、文章數的 tibble/dataframe

### your code
df_main_clean %>% mutate(month = month(date)) %>% count(board, month)

#> # A tibble: 21 x 3
#>    board       month     n
#>    <chr>       <dbl> <int>
#>  1 China-Drama     1   135
#>  2 China-Drama     2   137
#>  3 China-Drama     3    24
#>  4 China-Drama     4     1
#>  5 China-Drama     8     1
#>  6 China-Drama    12    63
#>  7 EAseries        1   111
#>  8 EAseries        2   240
#>  9 EAseries        3    27
#> 10 EAseries        4     3
#> # … with 11 more rows

counting and comparing: “2021-02-23”到“2021-03-01”之間(包含這兩天)，哪個板的文章數量最多？請留下文章數最多的板，接著依照文章數、板名的順序印出 tibble/dataframe

### your code
library(lubridate)
df_main_clean %>%
  filter(date >= as_date("2021-02-23") & date <= as_date("2021-03-01")) %>%
  count(board) %>%
  arrange(desc(n)) %>%
  slice(1) %>%
  select(n, board)

#> # A tibble: 1 x 2
#>       n board   
#>   <int> <chr>   
#> 1    56 EAseries

advanced - in-group subsetting: 請找出各板中“發文數量最多的一天”當天“留言數前兩多”的文章，請印出板名、標題、日期、留言數的 tibble/dataframe

### your code
df_main_clean %>%
  group_by(board, date) %>%
  mutate(n = n()) %>%
  ungroup() %>%
  group_by(board) %>%
  filter(n == max(n)) %>%
  arrange(board, desc(comments)) %>%
  filter(row_number() %in% c(1:2)) %>%
  select(board, title, date, comments)

#> # A tibble: 8 x 4
#> # Groups:   board [4]
#>   board       title                                        date       comments
#>   <chr>       <chr>                                        <date>        <dbl>
#> 1 China-Drama [心得] 太過期待以致失望的有翡（含原著雷）    2021-01-24      415
#> 2 China-Drama [Live] 上陽賦 EP25 蕭綦高光即將開始          2021-01-24      337
#> 3 EAseries    [心得] Wandavision S01E05 大暴雷啦!!         2021-02-05      182
#> 4 EAseries    [請益] 是否可推薦帶入美式生活的影集          2021-01-22       94
#> 5 KoreaDrama  [情報] 哲仁王后花絮 未公開畫面 金正賢專訪    2021-02-17     1352
#> 6 KoreaDrama  [心得] 哲仁王后結局                          2021-02-17      836
#> 7 TaiwanDrama [LIVE] 台視《生生世世》第162集 台視週間20:00 2020-12-30      266
#> 8 TaiwanDrama [LIVE] 愛不愛栗絲 第三集                     2020-12-30       64

advanced - transforming variables and counting: 文章類型(type)非常多樣，譬如“[心得]”, “[閒聊]” 等，請保留“[心得]”、“[閒聊]”、“[新聞]”、“[情報]”，並將“[LIVE]”與“[Live]”合併成“[LIVE]”，剩下其他種類或是 missing value 都改成“[其他]”，修改後再計算各板的文章類型比例(百分比)，最後按照每個版的名稱由 A 到 Z、比例由大到小排列，並印出板名、類型、類型文章數、類型文章數占比的 tibble/dataframe

提示：你可以查一下 case_when() 的用法，或是你想要用 ifelse() 或者 if_else() 都行

### your code
df_main_clean %>% mutate(type = case_when(type == "[LIVE]" ~ "[LIVE]",
                                          type == "[Live]" ~ "[LIVE]",
                                          type %in% c("[心得]","[閒聊]","[新聞]","[情報]") ~ type,
                                          is.na(type) ~ "[其他]",
                                          TRUE ~ "[其他]")) %>%
  group_by(board, type) %>% summarise(n = n()) %>%
  ungroup() %>%
  group_by(board) %>%
  mutate(per = n/sum(n)) %>%
  ungroup() %>%
  arrange(board, desc(per))

#> # A tibble: 23 x 4
#>    board       type       n    per
#>    <chr>       <chr>  <int>  <dbl>
#>  1 China-Drama [心得]   128 0.355 
#>  2 China-Drama [閒聊]   110 0.305 
#>  3 China-Drama [其他]    57 0.158 
#>  4 China-Drama [LIVE]    45 0.125 
#>  5 China-Drama [情報]    12 0.0332
#>  6 China-Drama [新聞]     9 0.0249
#>  7 EAseries    [新聞]   123 0.321 
#>  8 EAseries    [心得]   109 0.285 
#>  9 EAseries    [其他]    91 0.238 
#> 10 EAseries    [閒聊]    59 0.154 
#> # … with 13 more rows

AS03_Know-Your-Data

你是誰 R09342000 新聞所碩五

2021/03/16

作業目的: Know Your Data

作業: Know Your Data

data importing & data manipulation (i) (50 分)

data importing & data manipulation (ii) (50 分)