Data frame – Shi no Shigoto

Data frame เป็นหนึ่งใน data structure ที่พบบ่อยที่สุดในการทำงานกับข้อมูล

Data frame เก็บข้อมูลในรูปแบบตาราง โดย:

1 row = 1 รายการ (เช่น ข้อมูลของ John)
1 column = 1 ประเภทข้อมูล (เช่น อายุ)

ตัวอย่าง data frame:

ในบทความนี้ เราจะมาสรุป 10 วิธีในการทำงานกับ data frame กัน:

Creating: การสร้าง data frame
Previewing: การดูข้อมูล data frame
Indexing: การเลือก columns ที่ต้องการ
Subsetting: การเลือก rows และ columns ที่ต้องการ
Filtering: การกรองข้อมูล
Sorting: การจัดลำดับข้อมูล
Aggregating: การสรุปข้อมูล
Adding columns: การเพิ่ม columns ใหม่
Removing columns: การลบ columns
Binding: การเชื่อมข้อมูลใหม่เข้ากับ data frame

ถ้าพร้อมแล้ว ไปเริ่มกันเลย

1️⃣ Creating

เราสามารถสร้าง data frame ด้วย data.frame() ซึ่งต้องการ ชื่อ column และ vector ที่เก็บข้อมูลของ column นั้น ๆ:

# Create a data frame
jjk_df <- data.frame(
  ID = 1:10,
  Name = c("Yuji Itadori", "Megumi Fushiguro", "Nobara Kugisaki", "Satoru Gojo",
           "Maki Zenin", "Toge Inumaki", "Panda", "Kento Nanami", "Yuta Okkotsu", "Suguru Geto"),
  Age = c(15, 16, 16, 28, 17, 17, 18, 27, 17, 27),
  Grade = c("1st Year", "1st Year", "1st Year", "Special", "2nd Year",
            "2nd Year", "2nd Year", "Special", "Special", "Special"),
  CursedEnergy = c(80, 95, 70, 999, 60, 85, 75, 200, 300, 400),
  Technique = c("Divergent Fist", "Ten Shadows", "Straw Doll", "Limitless",
                "Heavenly Restriction", "Cursed Speech", "Gorilla Mode",
                "Ratio Technique", "Rika", "Cursed Spirit Manipulation"),
  Missions = c(25, 30, 20, 120, 35, 28, 40, 90, 55, 80)
)

# View the result
jjk_df

ผลลัพธ์:

   ID             Name Age    Grade CursedEnergy                  Technique Missions
1   1     Yuji Itadori  15 1st Year           80             Divergent Fist       25
2   2 Megumi Fushiguro  16 1st Year           95                Ten Shadows       30
3   3  Nobara Kugisaki  16 1st Year           70                 Straw Doll       20
4   4      Satoru Gojo  28  Special          999                  Limitless      120
5   5       Maki Zenin  17 2nd Year           60       Heavenly Restriction       35
6   6     Toge Inumaki  17 2nd Year           85              Cursed Speech       28
7   7            Panda  18 2nd Year           75               Gorilla Mode       40
8   8     Kento Nanami  27  Special          200            Ratio Technique       90
9   9     Yuta Okkotsu  17  Special          300                       Rika       55
10 10      Suguru Geto  27  Special          400 Cursed Spirit Manipulation       80

2️⃣ Previewing

เรามี 8 functions สำหรับดูข้อมูล data frame:

No.	Function	For
1	`View()`	ดูข้อมูลทั้งหมด
2	`head()`	ดูข้อมูล 6 rows แรก
3	`tail()`	ดูข้อมูล 6 rows สุดท้าย
4	`str()`	ดูโครงสร้างข้อมูล
5	`summary()`	ดูสถิติข้อมูล
6	`dim()`	ดูจำนวน rows และ columns
7	`nrow()`	ดูจำนวน rows
8	`ncol()`	ดูจำนวน columns

เราไปดูตัวอย่างทั้ง 8 functions กัน

👀 View()

View() ใช้ดูข้อมูลทั้งหมดใน data frame:

# View the whole data frame
View(jjk_df)

เราจะเห็นผลลัพธ์ในหน้าต่างใหม่:

Note: เนื่องจาก View() แสดงข้อมูลทั้งหมด จึงเหมาะกับการใช้งานกับ data frame ขนาดเล็ก

🙊 head()

head() ใช้ดูข้อมูล 6 rows แรกใน data frame:

# View the first 6 rows
head(jjk_df)

ผลลัพธ์:

  ID             Name Age    Grade CursedEnergy            Technique Missions
1  1     Yuji Itadori  15 1st Year           80       Divergent Fist       25
2  2 Megumi Fushiguro  16 1st Year           95          Ten Shadows       30
3  3  Nobara Kugisaki  16 1st Year           70           Straw Doll       20
4  4      Satoru Gojo  28  Special          999            Limitless      120
5  5       Maki Zenin  17 2nd Year           60 Heavenly Restriction       35
6  6     Toge Inumaki  17 2nd Year           85        Cursed Speech       28

🐒 tail()

tail() ใช้ดูข้อมูล 6 rows สุดท้ายใน data frame:

# View the last 6 rows
tail(jjk_df)

ผลลัพธ์:

   ID         Name Age    Grade CursedEnergy                  Technique Missions
5   5   Maki Zenin  17 2nd Year           60       Heavenly Restriction       35
6   6 Toge Inumaki  17 2nd Year           85              Cursed Speech       28
7   7        Panda  18 2nd Year           75               Gorilla Mode       40
8   8 Kento Nanami  27  Special          200            Ratio Technique       90
9   9 Yuta Okkotsu  17  Special          300                       Rika       55
10 10  Suguru Geto  27  Special          400 Cursed Spirit Manipulation       80

🏗️ str()

str() ใช้ดูโครงสร้างข้อมูลของ data frame:

# View the data frame structure
str(jjk_df)

ผลลัพธ์:

'data.frame':	10 obs. of  7 variables:
 $ ID          : int  1 2 3 4 5 6 7 8 9 10
 $ Name        : chr  "Yuji Itadori" "Megumi Fushiguro" "Nobara Kugisaki" "Satoru Gojo" ...
 $ Age         : num  15 16 16 28 17 17 18 27 17 27
 $ Grade       : chr  "1st Year" "1st Year" "1st Year" "Special" ...
 $ CursedEnergy: num  80 95 70 999 60 85 75 200 300 400
 $ Technique   : chr  "Divergent Fist" "Ten Shadows" "Straw Doll" "Limitless" ...
 $ Missions    : num  25 30 20 120 35 28 40 90 55 80

จากผลลัพธ์ เราจะเห็นข้อมูล 5 อย่าง ได้แก่:

จำนวน rows (obs.)
จำนวน columns (variables)
ชื่อ columns (เช่น ID)
ประเภทข้อมูลของแต่ละ column (เช่น int)
ตัวอย่างข้อมูลของแต่ละ column (เช่น 1 2 3 4 5 6 7 8 9 10)

🧮 summary()

summary() ใช้สรุปข้อมูลใน data frame เช่น:

ค่าเฉลี่ย (Mean)
จำนวนข้อมูล (Length)

# View the summary
summary(jjk_df)

ผลลัพธ์:

       ID            Name                Age           Grade            CursedEnergy     Technique            Missions     
 Min.   : 1.00   Length:10          Min.   :15.00   Length:10          Min.   : 60.00   Length:10          Min.   : 20.00  
 1st Qu.: 3.25   Class :character   1st Qu.:16.25   Class :character   1st Qu.: 76.25   Class :character   1st Qu.: 28.50  
 Median : 5.50   Mode  :character   Median :17.00   Mode  :character   Median : 90.00   Mode  :character   Median : 37.50  
 Mean   : 5.50                      Mean   :19.80                      Mean   :236.40                      Mean   : 52.30  
 3rd Qu.: 7.75                      3rd Qu.:24.75                      3rd Qu.:275.00                      3rd Qu.: 73.75  
 Max.   :10.00                      Max.   :28.00                      Max.   :999.00                      Max.   :120.00

💠 dim()

dim() ใช้แสดงจำนวน rows และ columns ใน data frame:

# View the dimensions
dim(jjk_df)

ผลลัพธ์:

[1] 10  7

🚣 nrow()

nrow() ใช้แสดงจำนวน rows ใน data frame:

# Get the number of rows
nrow(jjk_df)

ผลลัพธ์:

[1] 10

🏦 ncol()

ncol() ใช้แสดงจำนวน columns ใน data frame:

# Get the number of columns
ncol(jjk_df)

ผลลัพธ์:

[1] 7

3️⃣ Indexing

Indexing หมายถึง การเลือก columns ที่ต้องการ ซึ่งเราทำได้ 2 วิธี:

ใช้ $ (นิยมใช้)
ใช้ [[]]

💰 Using $

เราสามารถใช้ $ ได้แบบนี้:

df$col

ยกตัวอย่างเช่น เลือก column Name:

# Index with $
jjk_df$Name

ผลลัพธ์:

 [1] "Yuji Itadori"     "Megumi Fushiguro" "Nobara Kugisaki"  "Satoru Gojo"      "Maki Zenin"      
 [6] "Toge Inumaki"     "Panda"            "Kento Nanami"     "Yuta Okkotsu"     "Suguru Geto"

🔳 Using [[]]

เราสามารถใช้ [[]] ได้แบบนี้:

df[["col"]]

ยกตัวอย่างเช่น เลือก column Name:

# Index with [[]]
jjk_df[["Name"]]

ผลลัพธ์:

 [1] "Yuji Itadori"     "Megumi Fushiguro" "Nobara Kugisaki"  "Satoru Gojo"      "Maki Zenin"      
 [6] "Toge Inumaki"     "Panda"            "Kento Nanami"     "Yuta Okkotsu"     "Suguru Geto"

4️⃣ Subsetting

Subsetting คือ การเลือก rows และ columns จาก data frame ซึ่งเราทำได้ 2 วิธี:

ใช้ df[rows, cols] syntax
ใช้ subset()

🍽️ df[rows, cols]

เราสามารถใช้ df[rows, cols] ได้ 3 แบบ:

เลือก rows
เลือก columns
เลือก rows และ columns

แบบที่ 1. เลือก rows อย่างเดียว:

# Subset rows only
jjk_df[1:5, ]

ผลลัพธ์:

  ID             Name Age    Grade CursedEnergy            Technique Missions
1  1     Yuji Itadori  15 1st Year           80       Divergent Fist       25
2  2 Megumi Fushiguro  16 1st Year           95          Ten Shadows       30
3  3  Nobara Kugisaki  16 1st Year           70           Straw Doll       20
4  4      Satoru Gojo  28  Special          999            Limitless      120
5  5       Maki Zenin  17 2nd Year           60 Heavenly Restriction       35

แบบที่ 2. เลือก columns อย่างเดียว:

# Subset columns only
jjk_df[, "Name"]

ผลลัพธ์:

 [1] "Yuji Itadori"     "Megumi Fushiguro" "Nobara Kugisaki"  "Satoru Gojo"      "Maki Zenin"      
 [6] "Toge Inumaki"     "Panda"            "Kento Nanami"     "Yuta Okkotsu"     "Suguru Geto"

แบบที่ 3. เลือก rows และ columns:

# Subset rows and columns
jjk_df[1:5, c("Name", "Technique")]

ผลลัพธ์:

              Name            Technique
1     Yuji Itadori       Divergent Fist
2 Megumi Fushiguro          Ten Shadows
3  Nobara Kugisaki           Straw Doll
4      Satoru Gojo            Limitless
5       Maki Zenin Heavenly Restriction

🔪 subset()

เราสามารถ subset ข้อมูลได้ด้วย subset() ซึ่งต้องการ 2 arguments:

subset(x, select)

x = data frame
select = columns ที่ต้องการเลือก

# Subset using susbet() - select conlumns only
subset(jjk_df, select = c("Name", "Technique"))

ผลลัพธ์:

               Name                  Technique
1      Yuji Itadori             Divergent Fist
2  Megumi Fushiguro                Ten Shadows
3   Nobara Kugisaki                 Straw Doll
4       Satoru Gojo                  Limitless
5        Maki Zenin       Heavenly Restriction
6      Toge Inumaki              Cursed Speech
7             Panda               Gorilla Mode
8      Kento Nanami            Ratio Technique
9      Yuta Okkotsu                       Rika
10      Suguru Geto Cursed Spirit Manipulation

ในกรณีที่เราต้องการเลือก rows ด้วย เราจะต้องกำหนด rows ใน x:

# Subset using susbet() - select both rows and columns
subset(jjk_df[1:5, ], select = c("Name", "Technique"))

ผลลัพธ์:

              Name            Technique
1     Yuji Itadori       Divergent Fist
2 Megumi Fushiguro          Ten Shadows
3  Nobara Kugisaki           Straw Doll
4      Satoru Gojo            Limitless
5       Maki Zenin Heavenly Restriction

5️⃣ Filtering

เราสามารถกรองข้อมูลใน data frame ได้ 2 วิธี:

ใช้ df[rows, cols] syntax
ใช้ subset()

🍽️ df[rows, cols]

เราสามารถกรองข้อมูลด้วย df[rows, cols] โดยกำหนดเงื่อนไขการกรองใน rows

เช่น กรองข้อมูลตัวละครที่อยู่ปี 1:

# Filter using df[rows, cols] - 1 condition
jjk_df[jjk_df$Grade == "1st Year", ]

ผลลัพธ์:

  ID             Name Age    Grade CursedEnergy      Technique Missions
1  1     Yuji Itadori  15 1st Year           80 Divergent Fist       25
2  2 Megumi Fushiguro  16 1st Year           95    Ten Shadows       30
3  3  Nobara Kugisaki  16 1st Year           70     Straw Doll       20

ในกรณีที่เรามีมากกว่า 1 เงื่อนไข เราสามารถใช้ logical operators ช่วยได้:

Operator	Meaning
`&`	AND
`\|`	OR
`!`	NOT

ยกตัวอย่างเช่น กรองข้อมูลตัวละครที่อยู่ปี 1 และมีอายุ 15 ปี:

# Filter using df[rows, cols] - multiple conditions
jjk_df[jjk_df$Grade == "1st Year" & jjk_df$Age == 15, ]

ผลลัพธ์:

  ID         Name Age    Grade CursedEnergy      Technique Missions
1  1 Yuji Itadori  15 1st Year           80 Divergent Fist       25

🔪 subset()

เราสามารถใช้ subset() เพื่อกรองข้อมูลได้แบบนี้:

# Filter using subset() - 1 condition
subset(jjk_df, Grade == "1st Year")

ผลลัพธ์:

  ID             Name Age    Grade CursedEnergy      Technique Missions
1  1     Yuji Itadori  15 1st Year           80 Divergent Fist       25
2  2 Megumi Fushiguro  16 1st Year           95    Ten Shadows       30
3  3  Nobara Kugisaki  16 1st Year           70     Straw Doll       20

เราสามารถเพิ่มเงื่อนไขการกรองได้ด้วย logical operator เช่น:

# Filter using subset() - multiple conditions
subset(jjk_df, Grade == "1st Year" & Age == 15)

ผลลัพธ์:

  ID         Name Age    Grade CursedEnergy      Technique Missions
1  1 Yuji Itadori  15 1st Year           80 Divergent Fist       25

6️⃣ Sorting

สำหรับการเรียงข้อมูล เราจะใช้ order() ซึ่งเพื่อเรียงข้อมูลได้ 3 แบบ:

Ascending (A–Z)
Descending (Z–A)
Sort by multiple columns: จัดเรียงด้วยหลาย columns

⬇️ Ascending

ยกตัวอย่างเช่น เรียงลำดับตามจำนวนภารกิจ (Missions):

# Sort ascending (default)
jjk_df[order(jjk_df$Missions), ]

ผลลัพธ์:

   ID             Name Age    Grade CursedEnergy                  Technique Missions
3   3  Nobara Kugisaki  16 1st Year           70                 Straw Doll       20
1   1     Yuji Itadori  15 1st Year           80             Divergent Fist       25
6   6     Toge Inumaki  17 2nd Year           85              Cursed Speech       28
2   2 Megumi Fushiguro  16 1st Year           95                Ten Shadows       30
5   5       Maki Zenin  17 2nd Year           60       Heavenly Restriction       35
7   7            Panda  18 2nd Year           75               Gorilla Mode       40
9   9     Yuta Okkotsu  17  Special          300                       Rika       55
10 10      Suguru Geto  27  Special          400 Cursed Spirit Manipulation       80
8   8     Kento Nanami  27  Special          200            Ratio Technique       90
4   4      Satoru Gojo  28  Special          999                  Limitless      120

⬆️ Descending

เราสามารถเรียงข้อมูลแบบ descending ได้ 2 วิธี:

ใช้ decreasing
ใช้ -

วิธีที่ 1. ใช้ decreasing:

# Sort descending with decreasing
jjk_df[order(jjk_df$Missions, decreasing = TRUE), ]

ผลลัพธ์:

   ID             Name Age    Grade CursedEnergy                  Technique Missions
4   4      Satoru Gojo  28  Special          999                  Limitless      120
8   8     Kento Nanami  27  Special          200            Ratio Technique       90
10 10      Suguru Geto  27  Special          400 Cursed Spirit Manipulation       80
9   9     Yuta Okkotsu  17  Special          300                       Rika       55
7   7            Panda  18 2nd Year           75               Gorilla Mode       40
5   5       Maki Zenin  17 2nd Year           60       Heavenly Restriction       35
2   2 Megumi Fushiguro  16 1st Year           95                Ten Shadows       30
6   6     Toge Inumaki  17 2nd Year           85              Cursed Speech       28
1   1     Yuji Itadori  15 1st Year           80             Divergent Fist       25
3   3  Nobara Kugisaki  16 1st Year           70                 Straw Doll       20

วิธีที่ 2. ใช้ -:

# Sort descending with -
jjk_df[order(-jjk_df$Missions), ]

ผลลัพธ์:

   ID             Name Age    Grade CursedEnergy                  Technique Missions
4   4      Satoru Gojo  28  Special          999                  Limitless      120
8   8     Kento Nanami  27  Special          200            Ratio Technique       90
10 10      Suguru Geto  27  Special          400 Cursed Spirit Manipulation       80
9   9     Yuta Okkotsu  17  Special          300                       Rika       55
7   7            Panda  18 2nd Year           75               Gorilla Mode       40
5   5       Maki Zenin  17 2nd Year           60       Heavenly Restriction       35
2   2 Megumi Fushiguro  16 1st Year           95                Ten Shadows       30
6   6     Toge Inumaki  17 2nd Year           85              Cursed Speech       28
1   1     Yuji Itadori  15 1st Year           80             Divergent Fist       25
3   3  Nobara Kugisaki  16 1st Year           70                 Straw Doll       20

↔️ Sort by Multiple Columns

เราสามารถจัดเรียงข้อมูลได้มากกว่า 1 column ด้วยการเลือก columns ที่ต้องการจัดเรียงเพิ่ม

เช่น จัดเรียงด้วย:

Grade
จำนวนภารกิจ (Missions)

# Sort by multiple columns
jjk_df[order(jjk_df$Grade, jjk_df$Missions), ]

ผลลัพธ์:

   ID             Name Age    Grade CursedEnergy                  Technique Missions
3   3  Nobara Kugisaki  16 1st Year           70                 Straw Doll       20
1   1     Yuji Itadori  15 1st Year           80             Divergent Fist       25
2   2 Megumi Fushiguro  16 1st Year           95                Ten Shadows       30
6   6     Toge Inumaki  17 2nd Year           85              Cursed Speech       28
5   5       Maki Zenin  17 2nd Year           60       Heavenly Restriction       35
7   7            Panda  18 2nd Year           75               Gorilla Mode       40
9   9     Yuta Okkotsu  17  Special          300                       Rika       55
10 10      Suguru Geto  27  Special          400 Cursed Spirit Manipulation       80
8   8     Kento Nanami  27  Special          200            Ratio Technique       90
4   4      Satoru Gojo  28  Special          999                  Limitless      120

7️⃣ Aggregating

เราสามารถสรุปข้อมูลโดยใช้ statistics functions เช่น:

Function	For
`mean()`	หาค่าเฉลี่ย
`median()`	หาค่ามัธยฐาน
`min()`	หาค่าต่ำสุด
`max()`	หาค่าสูงสุด
`sd()`	หาค่า standard deviation

ยกตัวอย่างเช่น หาค่าเฉลี่ย Cursed Energy (CursedEnergy):

# Find average Cursed Energy
mean(jjk_df$CursedEnergy)

ผลลัพธ์:

[1] 236.4

8️⃣ Adding Columns

เราสามารถเพิ่ม columns ใหม่ได้ด้วยแบบนี้:

df$new_col <- value

ยกตัวอย่างเช่น เพิ่ม column Ranking:

# Add a column
jjk_df$Ranking <- ifelse(jjk_df$CursedEnergy > 100, "High", "Low")

# View the result
jjk_df

ผลลัพธ์:

   ID             Name Age    Grade CursedEnergy                  Technique Missions Ranking
1   1     Yuji Itadori  15 1st Year           80             Divergent Fist       25     Low
2   2 Megumi Fushiguro  16 1st Year           95                Ten Shadows       30     Low
3   3  Nobara Kugisaki  16 1st Year           70                 Straw Doll       20     Low
4   4      Satoru Gojo  28  Special          999                  Limitless      120    High
5   5       Maki Zenin  17 2nd Year           60       Heavenly Restriction       35     Low
6   6     Toge Inumaki  17 2nd Year           85              Cursed Speech       28     Low
7   7            Panda  18 2nd Year           75               Gorilla Mode       40     Low
8   8     Kento Nanami  27  Special          200            Ratio Technique       90    High
9   9     Yuta Okkotsu  17  Special          300                       Rika       55    High
10 10      Suguru Geto  27  Special          400 Cursed Spirit Manipulation       80    High

9️⃣ Removing Columns

เราสามารถลบ columns ได้ด้วยวิธีเดียวกันกับการเพิ่ม columns:

df$col <- NULL

ยกตัวอย่างเช่น ลบ column Ranking:

# Remove a column
jjk_df$Ranking <- NULL

# View the result
jjk_df

ผลลัพธ์:

   ID             Name Age    Grade CursedEnergy                  Technique Missions
1   1     Yuji Itadori  15 1st Year           80             Divergent Fist       25
2   2 Megumi Fushiguro  16 1st Year           95                Ten Shadows       30
3   3  Nobara Kugisaki  16 1st Year           70                 Straw Doll       20
4   4      Satoru Gojo  28  Special          999                  Limitless      120
5   5       Maki Zenin  17 2nd Year           60       Heavenly Restriction       35
6   6     Toge Inumaki  17 2nd Year           85              Cursed Speech       28
7   7            Panda  18 2nd Year           75               Gorilla Mode       40
8   8     Kento Nanami  27  Special          200            Ratio Technique       90
9   9     Yuta Okkotsu  17  Special          300                       Rika       55
10 10      Suguru Geto  27  Special          400 Cursed Spirit Manipulation       80

🔟 Binding

เราสามารถเชื่อม data frame ได้ 2 แบบ:

rbind(): เชื่อม row
cbind(): เชื่อม column

🤝 rbind()

rbind() ใช้เชื่อม data frame กับ row ใหม่ และต้องการ 2 arguments:

rbind(df1, df2)

df1 = data frame ที่ 1
df2 = data frame ที่ 2

ยกตัวอย่างเช่น เพิ่มชื่อตัวละครใหม่ (Hajime Kashimo):

# Create a new data frame
new_sorcerer <- data.frame(
  ID = 11,
  Name = "Hajime Kashimo",
  Age = 25,
  Grade = "Special",
  CursedEnergy = 500,
  Technique = "Lightning",
  Missions = 60
)

# Bind the data frames by rows
jjk_df <- rbind(jjk_df, new_sorcerer)

# View the result
jjk_df

ผลลัพธ์:

   ID             Name Age    Grade CursedEnergy                  Technique Missions
1   1     Yuji Itadori  15 1st Year           80             Divergent Fist       25
2   2 Megumi Fushiguro  16 1st Year           95                Ten Shadows       30
3   3  Nobara Kugisaki  16 1st Year           70                 Straw Doll       20
4   4      Satoru Gojo  28  Special          999                  Limitless      120
5   5       Maki Zenin  17 2nd Year           60       Heavenly Restriction       35
6   6     Toge Inumaki  17 2nd Year           85              Cursed Speech       28
7   7            Panda  18 2nd Year           75               Gorilla Mode       40
8   8     Kento Nanami  27  Special          200            Ratio Technique       90
9   9     Yuta Okkotsu  17  Special          300                       Rika       55
10 10      Suguru Geto  27  Special          400 Cursed Spirit Manipulation       80
11 11   Hajime Kashimo  25  Special          500                  Lightning       60

🤲 cbind()

cbind() ใช้เชื่อม data frame กับ column ใหม่ และต้องการ 2 arguments ได้แก่:

cbind(df, vector)

df = data frame
vector = vector ที่เก็บข้อมูลของ column ใหม่

ยกตัวอย่างเช่น เพิ่ม column ที่บอกว่าตัวละครเป็นครูหรือไม่ (IsTeacher):

# Bind a column
jjk_df <- cbind(
  jjk_df,
  IsTeacher = c(FALSE, FALSE, FALSE, TRUE, FALSE,
                FALSE, FALSE, TRUE, FALSE, TRUE, FALSE)
)

# View the result
jjk_df

ผลลัพธ์:

   ID             Name Age    Grade CursedEnergy                  Technique Missions IsTeacher
1   1     Yuji Itadori  15 1st Year           80             Divergent Fist       25     FALSE
2   2 Megumi Fushiguro  16 1st Year           95                Ten Shadows       30     FALSE
3   3  Nobara Kugisaki  16 1st Year           70                 Straw Doll       20     FALSE
4   4      Satoru Gojo  28  Special          999                  Limitless      120      TRUE
5   5       Maki Zenin  17 2nd Year           60       Heavenly Restriction       35     FALSE
6   6     Toge Inumaki  17 2nd Year           85              Cursed Speech       28     FALSE
7   7            Panda  18 2nd Year           75               Gorilla Mode       40     FALSE
8   8     Kento Nanami  27  Special          200            Ratio Technique       90      TRUE
9   9     Yuta Okkotsu  17  Special          300                       Rika       55     FALSE
10 10      Suguru Geto  27  Special          400 Cursed Spirit Manipulation       80      TRUE
11 11   Hajime Kashimo  25  Special          500                  Lightning       60     FALSE

😺 GitHub

ดูตัวอย่าง code ในบทความนี้ได้ที่ GitHub

📃 References

รูปประกอบจาก Jujutsu Kaisen Character Dataset
R Data Frames
DataFrame Operations in R

✅ R Book for Psychologists: หนังสือภาษา R สำหรับนักจิตวิทยา

📕 ขอฝากหนังสือเล่มแรกในชีวิตด้วยนะครับ 😆

🙋 ใครที่กำลังเรียนจิตวิทยาหรือทำงานสายจิตวิทยา และเบื่อที่ต้องใช้ software ราคาแพงอย่าง SPSS และ Excel เพื่อทำข้อมูล

💪 ผมขอแนะนำ R Book for Psychologists หนังสือสอนใช้ภาษา R เพื่อการวิเคราะห์ข้อมูลทางจิตวิทยา ที่เขียนมาเพื่อนักจิตวิทยาที่ไม่เคยมีประสบการณ์เขียน code มาก่อน

ในหนังสือ เราจะปูพื้นฐานภาษา R และพาไปดูวิธีวิเคราะห์สถิติที่ใช้บ่อยกัน เช่น:

Correlation
t-tests
ANOVA
Reliability
Factor analysis

🚀 เมื่ออ่านและทำตามตัวอย่างใน R Book for Psychologists ทุกคนจะไม่ต้องพึง SPSS และ Excel ในการทำงานอีกต่อไป และสามารถวิเคราะห์ข้อมูลด้วยตัวเองได้ด้วยความมั่นใจ

แล้วทุกคนจะแปลกใจว่า ทำไมภาษา R ง่ายขนาดนี้ 🙂‍↕️

👉 สนใจดูรายละเอียดหนังสือได้ที่ meb:

ดูรายละเอียดหนังสือ R Book for Psychologists

ในบทความนี้ เราจะมาดูวิธีใช้ sqldf() ซึ่งเป็น function ที่ช่วยให้เราทำงานกับ data frame ในภาษา R ได้ด้วย SQL syntax และเหมาะกับคนที่คุ้นเคยกับการใช้ SQL ในการทำงานกัน

ถ้าพร้อมแล้ว ไปเริ่มกันเลย

🏁 Getting Started: Install & Load sqldf

ก่อนเริ่มใช้ sqldf() เราจะต้องติดตั้งและโหลด sqldf ซึ่งเป็น package ต้นทางของ sdldf() ด้วย install.packages() และ library() ตามลำดับ:

# Install sqldf
install.packages("sqldf")

# Load sqldf
library(sqldf)

Note:

เราใช้ install.packages() เพื่อติดตั้งแค่ครั้งเดียว
เราใช้ library() เพื่อโหลด sqldf ทุกครั้งที่เริ่ม session ใหม่

🏃‍♂️‍➡️ Using sqldf

sqldf() ต้องการ 1 argument คือ character ที่มี SQL query เช่น:

sqldf("SELECT * FROM df")

เราลองมาดูตัวอย่างการใช้ sqldf() กับ Cars93 dataset (จาก MASS package) ซึ่งมีข้อมูลรถ 93 คันที่ขายในปี ค.ศ. 1993:

# Install MASS
install.packages("MASS")

# Load MASS
library(MASS)

# Load the dataset
data(Cars93)

# View the dataset structure
str(Cars93)

ผลลัพธ์:

'data.frame':	93 obs. of  27 variables:
 $ Manufacturer      : Factor w/ 32 levels "Acura","Audi",..: 1 1 2 2 3 4 4 4 4 5 ...
 $ Model             : Factor w/ 93 levels "100","190E","240",..: 49 56 9 1 6 24 54 74 73 35 ...
 $ Type              : Factor w/ 6 levels "Compact","Large",..: 4 3 1 3 3 3 2 2 3 2 ...
 $ Min.Price         : num  12.9 29.2 25.9 30.8 23.7 14.2 19.9 22.6 26.3 33 ...
 $ Price             : num  15.9 33.9 29.1 37.7 30 15.7 20.8 23.7 26.3 34.7 ...
 $ Max.Price         : num  18.8 38.7 32.3 44.6 36.2 17.3 21.7 24.9 26.3 36.3 ...
 $ MPG.city          : int  25 18 20 19 22 22 19 16 19 16 ...
 $ MPG.highway       : int  31 25 26 26 30 31 28 25 27 25 ...
 $ AirBags           : Factor w/ 3 levels "Driver & Passenger",..: 3 1 2 1 2 2 2 2 2 2 ...
 $ DriveTrain        : Factor w/ 3 levels "4WD","Front",..: 2 2 2 2 3 2 2 3 2 2 ...
 $ Cylinders         : Factor w/ 6 levels "3","4","5","6",..: 2 4 4 4 2 2 4 4 4 5 ...
 $ EngineSize        : num  1.8 3.2 2.8 2.8 3.5 2.2 3.8 5.7 3.8 4.9 ...
 $ Horsepower        : int  140 200 172 172 208 110 170 180 170 200 ...
 $ RPM               : int  6300 5500 5500 5500 5700 5200 4800 4000 4800 4100 ...
 $ Rev.per.mile      : int  2890 2335 2280 2535 2545 2565 1570 1320 1690 1510 ...
 $ Man.trans.avail   : Factor w/ 2 levels "No","Yes": 2 2 2 2 2 1 1 1 1 1 ...
 $ Fuel.tank.capacity: num  13.2 18 16.9 21.1 21.1 16.4 18 23 18.8 18 ...
 $ Passengers        : int  5 5 5 6 4 6 6 6 5 6 ...
 $ Length            : int  177 195 180 193 186 189 200 216 198 206 ...
 $ Wheelbase         : int  102 115 102 106 109 105 111 116 108 114 ...
 $ Width             : int  68 71 67 70 69 69 74 78 73 73 ...
 $ Turn.circle       : int  37 38 37 37 39 41 42 45 41 43 ...
 $ Rear.seat.room    : num  26.5 30 28 31 27 28 30.5 30.5 26.5 35 ...
 $ Luggage.room      : int  11 15 14 17 13 16 17 21 14 18 ...
 $ Weight            : int  2705 3560 3375 3405 3640 2880 3470 4105 3495 3620 ...
 $ Origin            : Factor w/ 2 levels "USA","non-USA": 2 2 2 2 2 1 1 1 1 1 ...
 $ Make              : Factor w/ 93 levels "Acura Integra",..: 1 2 4 3 5 6 7 9 8 10 ...

1️⃣ Example 1. SELECT

เลือกข้อมูลจาก columns Manufacturer, Model, Min.Price, และ Max.Price:

# Set query
select_query <- "
SELECT
  Manufacturer,
  Model,
  `Min.Price`,
  `Max.Price`
FROM
  Cars93
"

# Select from df
select_result <- sqldf(select_query)

# View the result
select_result

Note: เราใช้ backticks (`) สำหรับชื่อ columns ที่ไม่ valid ใน SQL (เช่น column ที่มี . อย่าง Min.Price และ Max.Price)

ผลลัพธ์:

    Manufacturer          Model Min.Price Max.Price
1          Acura        Integra      12.9      18.8
2          Acura         Legend      29.2      38.7
3           Audi             90      25.9      32.3
4           Audi            100      30.8      44.6
5            BMW           535i      23.7      36.2
6          Buick        Century      14.2      17.3
7          Buick        LeSabre      19.9      21.7
8          Buick     Roadmaster      22.6      24.9
9          Buick        Riviera      26.3      26.3
10      Cadillac        DeVille      33.0      36.3
11      Cadillac        Seville      37.5      42.7
12     Chevrolet       Cavalier       8.5      18.3
13     Chevrolet        Corsica      11.4      11.4
14     Chevrolet         Camaro      13.4      16.8
15     Chevrolet         Lumina      13.4      18.4
16     Chevrolet     Lumina_APV      14.7      18.0
17     Chevrolet          Astro      14.7      18.6
18     Chevrolet        Caprice      18.0      19.6
19     Chevrolet       Corvette      34.6      41.5
20      Chrylser       Concorde      18.4      18.4
21      Chrysler        LeBaron      14.5      17.1
22      Chrysler       Imperial      29.5      29.5
23         Dodge           Colt       7.9      10.6
24         Dodge         Shadow       8.4      14.2
25         Dodge         Spirit      11.9      14.7
26         Dodge        Caravan      13.6      24.4
27         Dodge        Dynasty      14.8      16.4
28         Dodge        Stealth      18.5      33.1
29         Eagle         Summit       7.9      16.5
30         Eagle         Vision      17.5      21.2
31          Ford        Festiva       6.9       7.9
32          Ford         Escort       8.4      11.9
33          Ford          Tempo      10.4      12.2
34          Ford        Mustang      10.8      21.0
35          Ford          Probe      12.8      15.2
36          Ford       Aerostar      14.5      25.3
37          Ford         Taurus      15.6      24.8
38          Ford Crown_Victoria      20.1      21.7
39           Geo          Metro       6.7      10.0
40           Geo          Storm      11.5      13.5
41         Honda        Prelude      17.0      22.7
42         Honda          Civic       8.4      15.8
43         Honda         Accord      13.8      21.2
44       Hyundai          Excel       6.8       9.2
45       Hyundai        Elantra       9.0      11.0
46       Hyundai         Scoupe       9.1      11.0
47       Hyundai         Sonata      12.4      15.3
48      Infiniti            Q45      45.4      50.4
49         Lexus          ES300      27.5      28.4
50         Lexus          SC300      34.7      35.6
51       Lincoln    Continental      33.3      35.3
52       Lincoln       Town_Car      34.4      37.8
53         Mazda            323       7.4       9.1
54         Mazda        Protege      10.9      12.3
55         Mazda            626      14.3      18.7
56         Mazda            MPV      16.6      21.7
57         Mazda           RX-7      32.5      32.5
58 Mercedes-Benz           190E      29.0      34.9
59 Mercedes-Benz           300E      43.8      80.0
60       Mercury          Capri      13.3      15.0
61       Mercury         Cougar      14.9      14.9
62    Mitsubishi         Mirage       7.7      12.9
63    Mitsubishi       Diamante      22.4      29.9
64        Nissan         Sentra       8.7      14.9
65        Nissan         Altima      13.0      18.3
66        Nissan          Quest      16.7      21.5
67        Nissan         Maxima      21.0      22.0
68    Oldsmobile        Achieva      13.0      14.0
69    Oldsmobile  Cutlass_Ciera      14.2      18.4
70    Oldsmobile     Silhouette      19.5      19.5
71    Oldsmobile   Eighty-Eight      19.5      21.9
72      Plymouth          Laser      11.4      17.4
73       Pontiac         LeMans       8.2       9.9
74       Pontiac        Sunbird       9.4      12.8
75       Pontiac       Firebird      14.0      21.4
76       Pontiac     Grand_Prix      15.4      21.6
77       Pontiac     Bonneville      19.4      29.4
78          Saab            900      20.3      37.1
79        Saturn             SL       9.2      12.9
80        Subaru          Justy       7.3       9.5
81        Subaru         Loyale      10.5      11.3
82        Subaru         Legacy      16.3      22.7
83        Suzuki          Swift       7.3      10.0
84        Toyota         Tercel       7.8      11.8
85        Toyota         Celica      14.2      22.6
86        Toyota          Camry      15.2      21.2
87        Toyota         Previa      18.9      26.6
88    Volkswagen            Fox       8.7       9.5
89    Volkswagen        Eurovan      16.6      22.7
90    Volkswagen         Passat      17.6      22.4
91    Volkswagen        Corrado      22.9      23.7
92         Volvo            240      21.8      23.5
93         Volvo            850      24.8      28.5

2️⃣ Example 2. WHERE

คัดกรองข้อมูลรถที่ชื่อผู้ผลิตขึ้นต้นด้วย “M”:

# Set query
where_result <- "
SELECT
  Manufacturer,
  Model,
  `Min.Price`,
  `Max.Price`
FROM
  Cars93
WHERE
  Manufacturer LIKE 'M%'
"

# Filter df
where_result <- sqldf(where_result)

# View the result
where_result

ผลลัพธ์:

    Manufacturer    Model Min.Price Max.Price
1          Mazda      323       7.4       9.1
2          Mazda  Protege      10.9      12.3
3          Mazda      626      14.3      18.7
4          Mazda      MPV      16.6      21.7
5          Mazda     RX-7      32.5      32.5
6  Mercedes-Benz     190E      29.0      34.9
7  Mercedes-Benz     300E      43.8      80.0
8        Mercury    Capri      13.3      15.0
9        Mercury   Cougar      14.9      14.9
10    Mitsubishi   Mirage       7.7      12.9
11    Mitsubishi Diamante      22.4      29.9

3️⃣ Example 3. Aggregate

หารถ 10 อันดับแรกที่มีราคาสูงสุด:

# Set query
aggregate_query <- "
SELECT
  Manufacturer,
  AVG(Price) AS Avg_Price
FROM
  Cars93
GROUP BY
  Manufacturer
ORDER BY
  Avg_Price DESC
LIMIT
  10;
"

# Aggregate df
aggregate_result <- sqldf(aggregate_query)

# View the result
aggregate_result

ผลลัพธ์:

    Manufacturer Avg_Price
1       Infiniti      47.9
2  Mercedes-Benz      46.9
3       Cadillac      37.4
4        Lincoln      35.2
5           Audi      33.4
6          Lexus      31.6
7            BMW      30.0
8           Saab      28.7
9          Acura      24.9
10         Volvo      24.7

😺 GitHub

ดูตัวอย่าง code ทั้งหมดได้ที่ GitHub

📃 References

✅ R Book for Psychologists: หนังสือภาษา R สำหรับนักจิตวิทยา

📕 ขอฝากหนังสือเล่มแรกในชีวิตด้วยนะครับ 😆

Correlation
t-tests
ANOVA
Reliability
Factor analysis

แล้วทุกคนจะแปลกใจว่า ทำไมภาษา R ง่ายขนาดนี้ 🙂‍↕️

👉 สนใจดูรายละเอียดหนังสือได้ที่ meb:

ดูรายละเอียดหนังสือ R Book for Psychologists

Tag: Data frame

สรุป 10 วิธีในการทำงานกับ data frame ในภาษา R: creating, indexing, subsetting, filtering, sorting, และอื่น ๆ — ตัวอย่างการทำงานกับ Jujutsu Kaisen data frame

1️⃣ Creating

2️⃣ Previewing

👀 View()

🙊 head()

🐒 tail()

🏗️ str()

🧮 summary()

💠 dim()

🚣 nrow()

🏦 ncol()

3️⃣ Indexing

💰 Using $

🔳 Using [[]]

4️⃣ Subsetting

🍽️ df[rows, cols]

🔪 subset()

5️⃣ Filtering

🍽️ df[rows, cols]

🔪 subset()

6️⃣ Sorting

⬇️ Ascending

⬆️ Descending

↔️ Sort by Multiple Columns

7️⃣ Aggregating

8️⃣ Adding Columns

9️⃣ Removing Columns

🔟 Binding

🤝 rbind()

🤲 cbind()

😺 GitHub

📃 References

✅ R Book for Psychologists: หนังสือภาษา R สำหรับนักจิตวิทยา

Share this:

sqldf(): แนะนำ function ในการทำงานกับ data frame ด้วย SQL ในภาษา R — ตัวอย่างจาก Cars93

🏁 Getting Started: Install & Load sqldf

🏃‍♂️‍➡️ Using sqldf

1️⃣ Example 1. SELECT

2️⃣ Example 2. WHERE

3️⃣ Example 3. Aggregate

😺 GitHub

📃 References

✅ R Book for Psychologists: หนังสือภาษา R สำหรับนักจิตวิทยา

Share this: