Tag: CSV

3 Ways to Load Flat Files in R: Using the utils, readr, and data.table Packages, with the Fast Food Joint Nutrition Dataset as an Example

In this article, we look at three packages for loading flat files in R.

If you're ready, let's get started.

📁 What Is a Flat File?

A flat file is a plain text file that stores tabular data.

Common flat file formats include:

CSV (comma-separated values)
TSV (tab-separated values)

An example flat file (CSV):

The Fast Food Joint Nutrition Values Dataset from Kaggle, which contains nutrition information for menu items from restaurant chains such as Pizza Hut, McDonald’s, and Starbucks:

			
Company,Category,Product,Per Serve Size,Energy (kCal),Carbohydrates (g),Protein (g),Fiber (g),Sugar (g),Total Fat (g),Saturated Fat (g),Trans Fat (g),Cholesterol (mg),Sodium (mg)
Pizza Hut,All Meals,Corn n Cheese (Personal),143.5 g,432.6,65.64,17.91,3.85,0.0,10.93,5.14,0.16,16.19,499.72
Pizza Hut,All Meals,Country Feast (Personal),178 g,407.6,67.11,16.73,7.19,0.0,8.03,3.24,0.11,66.8,818.0
Pizza Hut,All Meals,Double Cheese (Personal),143 g,423.33,59.97,18.26,3.49,0.0,12.27,5.23,0.18,19.75,638.22
Pizza Hut,All Meals,Double Paneer Supreme (Personal),174.5 g,474.03,52.86,20.07,3.79,0.0,20.26,9.25,0.33,71.72,1128.11
Pizza Hut,All Meals,Farmer`s Pick (Personal),177 g,408.16,53.93,19.91,2.46,0.0,12.53,4.9,0.14,48.09,942.67

		

📦 3 Packages

R offers three commonly used packages for loading flat files:

utils
readr
data.table

Let's look at how to use each one.

1️⃣ Package #1: utils

utils ships with base R and is attached by default, so we can use it without installing or loading anything extra.

utils has three main functions for loading flat files:

Function	For
`read.csv()`	Loads CSV
`read.delim()`	Loads TSV
`read.table()`	Loads flat files in general, such as CSV, TSV, and others

Example usage:

Load the Fast Food Joint Nutrition Values Dataset as CSV:

# Import with read.csv()
nutrition.csv <- read.csv("Nutrition_Value_Dataset.csv")

Load the Fast Food Joint Nutrition Values Dataset as TSV:

# Import with read.delim()
nutrition.delim <- read.delim("Nutrition_Value_Dataset.tsv")

Load other flat files, such as one that uses "/" as the delimiter:

# Import with read.table()
nutrition.table <- read.table("Nutrition_Value_Dataset.txt",
                              header = TRUE,
                              sep = "/")

Example output:

			
Company,Category,Product,Per Serve Size,Energy (kCal),Carbohydrates (g),Protein (g),Fiber (g),Sugar (g),Total Fat (g),Saturated Fat (g),Trans Fat (g),Cholesterol (mg),Sodium (mg)
Pizza Hut,All Meals,Corn n Cheese (Personal),143.5 g,432.6,65.64,17.91,3.85,0.0,10.93,5.14,0.16,16.19,499.72
Pizza Hut,All Meals,Country Feast (Personal),178 g,407.6,67.11,16.73,7.19,0.0,8.03,3.24,0.11,66.8,818.0
Pizza Hut,All Meals,Double Cheese (Personal),143 g,423.33,59.97,18.26,3.49,0.0,12.27,5.23,0.18,19.75,638.22
Pizza Hut,All Meals,Double Paneer Supreme (Personal),174.5 g,474.03,52.86,20.07,3.79,0.0,20.26,9.25,0.33,71.72,1128.11
Pizza Hut,All Meals,Farmer`s Pick (Personal),177 g,408.16,53.93,19.91,2.46,0.0,12.53,4.9,0.14,48.09,942.67

		

Note:

read.csv() and read.delim() are wrapper functions around read.table(), that is, functions that make read.table() easier to use.

This means that every time we call read.csv() or read.delim(), we are in effect calling read.table() with preset arguments, roughly like this:

Wrapper Function	read.table()
`read.csv()`	`read.table(file, header = TRUE, sep = ",")`
`read.delim()`	`read.table(file, header = TRUE, sep = "\t")`

2️⃣ Package #2: readr

readr works much like utils, but instead of loading a flat file as a plain data frame, it loads the data as a tibble, a modern variant of the data frame with stricter behavior and more compact printing.

readr has three main functions for loading flat files, which map to their utils counterparts as follows:

utils	readr	For
`read.csv()`	`read_csv()`	Loads CSV
`read.delim()`	`read_tsv()`	Loads TSV
`read.table()`	`read_delim()`	Loads flat files in general, such as CSV, TSV, and others

Example usage:

Before we start, we need to install and load readr:

# Install
install.packages("readr")

# Load
library(readr)

After that, we can call readr's functions right away:

Load the Fast Food Joint Nutrition Values Dataset as CSV:

# Import with read_csv()
nutrition_csv <- read_csv("Nutrition_Value_Dataset.csv")

Load the Fast Food Joint Nutrition Values Dataset as TSV:

# Import with read_tsv()
nutrition_tsv <- read_tsv("Nutrition_Value_Dataset.tsv")

Load other flat files, such as one that uses "/" as the delimiter:

# Import with read_delim()
nutrition_delim <- read_delim("Nutrition_Value_Dataset.txt",
                              delim = "/")

Example output:

			
# A tibble: 6 × 14
  Company   Category  Product      `Per Serve Size` `Energy (kCal)` `Carbohydrates (g)` `Protein (g)` `Fiber (g)` `Sugar (g)` `Total Fat (g)` `Saturated Fat (g)`
  <chr>     <chr>     <chr>        <chr>                      <dbl>               <dbl>         <dbl>       <dbl>       <dbl>           <dbl>               <dbl>
1 Pizza Hut All Meals Corn n Chee… 143.5 g                     433.                65.6          17.9        3.85           0           10.9                 5.14
2 Pizza Hut All Meals Country Fea… 178 g                       408.                67.1          16.7        7.19           0            8.03                3.24
3 Pizza Hut All Meals Double Chee… 143 g                       423.                60.0          18.3        3.49           0           12.3                 5.23
4 Pizza Hut All Meals Double Pane… 174.5 g                     474.                52.9          20.1        3.79           0           20.3                 9.25
5 Pizza Hut All Meals Farmer`s Pi… 177 g                       408.                53.9          19.9        2.46           0           12.5                 4.9 
6 Pizza Hut All Meals Margherita … 130.5 g                     362.                50.7          15.9        4.08           0           10.6                 3.86
# ℹ 3 more variables: `Trans Fat (g)` <dbl>, `Cholesterol (mg)` <dbl>, `Sodium (mg)` <dbl>

		

Note:

As in utils, read_csv() and read_tsv() are wrapper functions around read_delim().

3️⃣ Package #3: data.table

Finally, data.table is a data manipulation package that focuses on processing speed.

data.table provides fread() (short for "fast read"), a function for loading flat files that is generally easier to use and faster than read.table() and read_delim(), in part because it detects the separator automatically.

Data loaded with fread() comes back as a data.table, an enhanced kind of data frame.

Example usage:

As with readr, we need to install and load data.table before use:

# Install
install.packages("data.table")

# Load
library(data.table)

After that, we can call fread() to load the files:

Load the Fast Food Joint Nutrition Values Dataset as CSV:

# Import CSV with fread()
nutrition_fread_csv <- fread("Nutrition_Value_Dataset.csv")

Load the Fast Food Joint Nutrition Values Dataset as TSV:

# Import TSV with fread()
nutrition_fread_tsv <- fread("Nutrition_Value_Dataset.tsv")

Load other flat files, such as one that uses "/" as the delimiter:

# Import TXT with "/" separator with fread()
nutrition_fread_txt <- fread("Nutrition_Value_Dataset.txt",
                             sep = "/")

Example output:

     Company  Category                          Product Per Serve Size Energy (kCal) Carbohydrates (g) Protein (g) Fiber (g) Sugar (g) Total Fat (g) Saturated Fat (g) Trans Fat (g) Cholesterol (mg) Sodium (mg)
      <char>    <char>                           <char>         <char>         <num>             <num>       <num>     <num>     <num>         <num>             <num>         <num>            <num>       <num>
1: Pizza Hut All Meals         Corn n Cheese (Personal)        143.5 g        432.60             65.64       17.91      3.85         0         10.93              5.14          0.16            16.19      499.72
2: Pizza Hut All Meals         Country Feast (Personal)          178 g        407.60             67.11       16.73      7.19         0          8.03              3.24          0.11            66.80      818.00
3: Pizza Hut All Meals         Double Cheese (Personal)          143 g        423.33             59.97       18.26      3.49         0         12.27              5.23          0.18            19.75      638.22
4: Pizza Hut All Meals Double Paneer Supreme (Personal)        174.5 g        474.03             52.86       20.07      3.79         0         20.26              9.25          0.33            71.72     1128.11
5: Pizza Hut All Meals         Farmer`s Pick (Personal)          177 g        408.16             53.93       19.91      2.46         0         12.53              4.90          0.14            48.09      942.67
6: Pizza Hut All Meals            Margherita (Personal)        130.5 g        361.73             50.69       15.93      4.08         0         10.58              3.86          0.12            26.47      713.82

💪 Summary

In this article, we covered three packages for loading flat files in R:

Package #1. utils:

read.csv()
read.delim()
read.table()

Package #2. readr:

read_csv()
read_tsv()
read_delim()

Package #3. data.table:

fread()

📚 Further Reading

For those who want to learn more, see the reference manuals for the functions covered in this article.

😺 GitHub

All the code in this article is available on GitHub.


2026-05-21

How to Use 9 Arguments of read_csv() from the pandas Library to Load Data in Python, with Football Match Data as an Example

pandas is a Python library for working with tabular data, and it offers a range of functions for loading data into Python.

One of the most widely used is read_csv(), which loads CSV (comma-separated values) data. Its nine main arguments are:

filepath_or_buffer: the file path, file name, or URL of the file to load (an open file-like object also works)
sep: sets the delimiter
header: sets the row to use as the header
skiprows: sets the rows to skip
nrows: sets the number of rows to load
usecols: sets the columns to load
index_col: sets the column to use as the index
names: sets the column names
dtype: sets the data types of the columns

In this article, we walk through all nine arguments of read_csv() by loading sample data on football matches in England.

If you're ready, let's get started.

🏁 Getting Started

Before using read_csv(), we need to install and import pandas:

# Install pandas (the leading "!" runs a shell command from a notebook cell)
!pip install pandas

# Import pandas
import pandas as pd

Note: If pandas is already installed, only the import statement is needed.

🗃️ Argument #1. filepath_or_buffer

filepath_or_buffer is the one argument we must supply every time we call read_csv().

For example, suppose we have football match data (matches_clean.csv):

MatchID,HomeTeam,AwayTeam,HomeGoals,AwayGoals,MatchDate
M001,Manchester United,Chelsea,2,1,2024-08-14
M002,Liverpool,Arsenal,1,1,2024-08-20
M003,Tottenham,Everton,3,0,2024-09-02
M004,Man City,Aston Villa,4,2,2024-09-15
M005,Newcastle,West Ham,0,0,2024-09-22
M006,Brighton,Leeds,2,3,2024-09-29

We can call read_csv() like this:

# Load the dataset
df1 = pd.read_csv("matches_clean.csv")

# View the result
print(df1)

Result:

  MatchID           HomeTeam     AwayTeam  HomeGoals  AwayGoals   MatchDate
0    M001  Manchester United      Chelsea          2          1  2024-08-14
1    M002          Liverpool      Arsenal          1          1  2024-08-20
2    M003          Tottenham      Everton          3          0  2024-09-02
3    M004           Man City  Aston Villa          4          2  2024-09-15
4    M005          Newcastle     West Ham          0          0  2024-09-22
5    M006           Brighton        Leeds          2          3  2024-09-29
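The "buffer" half of filepath_or_buffer means that read_csv() also accepts an open file-like object, not only a path or URL. The sketch below illustrates this with io.StringIO; the inline text is a hypothetical two-row stand-in for matches_clean.csv, and the variable names are assumptions made for the example.

```python
import io

import pandas as pd

# Hypothetical in-memory copy of the first rows of matches_clean.csv
csv_text = """MatchID,HomeTeam,AwayTeam,HomeGoals,AwayGoals,MatchDate
M001,Manchester United,Chelsea,2,1,2024-08-14
M002,Liverpool,Arsenal,1,1,2024-08-20
"""

# StringIO behaves like an open text file, so read_csv() can read from it
buffer = io.StringIO(csv_text)
df_buffer = pd.read_csv(buffer)

# Two data rows and six columns
print(df_buffer.shape)
```

This pattern is handy for trying out read_csv() arguments without creating a file on disk.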

🤺 Argument #2. sep

sep sets the delimiter, the character that separates columns. Its default is ",", so we normally don't need to set sep when the file is a CSV.

We use sep when the data uses a different delimiter, such as ";" (matches_semicolon.txt):

MatchID;HomeTeam;AwayTeam;HomeGoals;AwayGoals;MatchDate
M001;Manchester United;Chelsea;2;1;2024-08-14
M002;Liverpool;Arsenal;1;1;2024-08-20
M003;Tottenham;Everton;3;0;2024-09-02
M004;Man City;Aston Villa;4;2;2024-09-15
M005;Newcastle;West Ham;0;0;2024-09-22
M006;Brighton;Leeds;2;3;2024-09-29

We can use sep like this:

# Load the dataset with ";" as the delimiter
df2 = pd.read_csv("matches_semicolon.txt", sep=";")

# View the result
print(df2)

Result:

  MatchID           HomeTeam     AwayTeam  HomeGoals  AwayGoals   MatchDate
0    M001  Manchester United      Chelsea          2          1  2024-08-14
1    M002          Liverpool      Arsenal          1          1  2024-08-20
2    M003          Tottenham      Everton          3          0  2024-09-02
3    M004           Man City  Aston Villa          4          2  2024-09-15
4    M005          Newcastle     West Ham          0          0  2024-09-22
5    M006           Brighton        Leeds          2          3  2024-09-29
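To see why sep matters, it may help to compare what happens with and without it. This is a minimal sketch using a hypothetical two-row inline copy of the semicolon-delimited data; the variable names are assumptions for illustration.

```python
import io

import pandas as pd

# Hypothetical two-row sample of semicolon-delimited match data
semicolon_text = """MatchID;HomeTeam;AwayTeam;HomeGoals;AwayGoals;MatchDate
M001;Manchester United;Chelsea;2;1;2024-08-14
M002;Liverpool;Arsenal;1;1;2024-08-20
"""

# With the default sep=",", each line contains no delimiter,
# so the whole line lands in a single column
df_wrong = pd.read_csv(io.StringIO(semicolon_text))

# With sep=";", the six columns are split as intended
df_right = pd.read_csv(io.StringIO(semicolon_text), sep=";")

print(df_wrong.shape)
print(df_right.shape)
```

A data frame that unexpectedly has one wide column is often a sign that sep does not match the file.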

😶‍🌫️ Argument #3. header

header sets the row to use as the header, counted from 0.

We use header when the first rows of the file hold something else, such as metadata (matches_with_metadata.txt):

# UK Football Matches Data
# Created for practice with pd.read_csv()
MatchID,HomeTeam,AwayTeam,HomeGoals,AwayGoals,MatchDate
M001,Manchester United,Chelsea,2,1,2024-08-14
M002,Liverpool,Arsenal,1,1,2024-08-20
M003,Tottenham,Everton,3,0,2024-09-02
M004,Man City,Aston Villa,4,2,2024-09-15
M005,Newcastle,West Ham,0,0,2024-09-22
M006,Brighton,Leeds,2,3,2024-09-29

We can use header like this:

# Load the dataset where the header is the 3rd row
df3 = pd.read_csv("matches_with_metadata.txt", header=2)

# View the result
print(df3)

Result:

  MatchID           HomeTeam     AwayTeam  HomeGoals  AwayGoals   MatchDate
0    M001  Manchester United      Chelsea          2          1  2024-08-14
1    M002          Liverpool      Arsenal          1          1  2024-08-20
2    M003          Tottenham      Everton          3          0  2024-09-02
3    M004           Man City  Aston Villa          4          2  2024-09-15
4    M005          Newcastle     West Ham          0          0  2024-09-22
5    M006           Brighton        Leeds          2          3  2024-09-29

Notice that the metadata lines are not loaded, because read_csv() discards everything above the header row.

Note: We can set header=None when the data has no header row, as in matches_no_header.csv:

M001,Manchester United,Chelsea,2,1,2024-08-14
M002,Liverpool,Arsenal,1,1,2024-08-20
M003,Tottenham,Everton,3,0,2024-09-02
M004,Man City,Aston Villa,4,2,2024-09-15
M005,Newcastle,West Ham,0,0,2024-09-22
M006,Brighton,Leeds,2,3,2024-09-29
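As a rough illustration of the note above, the sketch below loads headerless data with header=None. The inline text is a hypothetical two-row stand-in for matches_no_header.csv, and the variable names are assumptions.

```python
import io

import pandas as pd

# Hypothetical two-row sample of data without a header row
no_header_text = """M001,Manchester United,Chelsea,2,1,2024-08-14
M002,Liverpool,Arsenal,1,1,2024-08-20
"""

# header=None tells pandas that the first line is data, not column names
df_no_header = pd.read_csv(io.StringIO(no_header_text), header=None)

# pandas labels the columns with integers starting from 0
print(list(df_no_header.columns))
print(len(df_no_header))
```

Without header=None, the first match (M001) would be consumed as column names and only one data row would remain.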

🛑 Argument #4. skiprows

skiprows sets the rows we don't want to load into Python. We can set it in two main ways:

As an int (e.g. 2) to skip that many rows from the top of the file
As a list (e.g. [0, 1, 2]) to skip the rows at those specific positions, counted from 0

For example, suppose we want to skip the first two lines, which are metadata:

# UK Football Matches Data
# Created for practice with pd.read_csv()
MatchID,HomeTeam,AwayTeam,HomeGoals,AwayGoals,MatchDate
M001,Manchester United,Chelsea,2,1,2024-08-14
M002,Liverpool,Arsenal,1,1,2024-08-20
M003,Tottenham,Everton,3,0,2024-09-02
M004,Man City,Aston Villa,4,2,2024-09-15
M005,Newcastle,West Ham,0,0,2024-09-22
M006,Brighton,Leeds,2,3,2024-09-29

We can use skiprows like this:

# Load the dataset, skipping the metadata
df4 = pd.read_csv("matches_with_metadata.txt", skiprows=[0, 1])

# View the result
print(df4)

Result:

  MatchID           HomeTeam     AwayTeam  HomeGoals  AwayGoals   MatchDate
0    M001  Manchester United      Chelsea          2          1  2024-08-14
1    M002          Liverpool      Arsenal          1          1  2024-08-20
2    M003          Tottenham      Everton          3          0  2024-09-02
3    M004           Man City  Aston Villa          4          2  2024-09-15
4    M005          Newcastle     West Ham          0          0  2024-09-22
5    M006           Brighton        Leeds          2          3  2024-09-29
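The difference between the int form and the list form of skiprows can be sketched as follows. The inline text is a hypothetical shortened copy of matches_with_metadata.txt, and the variable names are assumptions for illustration.

```python
import io

import pandas as pd

# Hypothetical shortened copy of the file with two metadata lines
metadata_text = """# UK Football Matches Data
# Created for practice with pd.read_csv()
MatchID,HomeTeam,AwayTeam,HomeGoals,AwayGoals,MatchDate
M001,Manchester United,Chelsea,2,1,2024-08-14
M002,Liverpool,Arsenal,1,1,2024-08-20
M003,Tottenham,Everton,3,0,2024-09-02
"""

# An int skips that many lines from the top of the file
df_int = pd.read_csv(io.StringIO(metadata_text), skiprows=2)

# A list skips the lines at exactly those 0-based positions
df_list = pd.read_csv(io.StringIO(metadata_text), skiprows=[0, 1])

# A list can also skip lines that are not at the top:
# line 4 is the M002 row, so it is dropped along with the metadata
df_gap = pd.read_csv(io.StringIO(metadata_text), skiprows=[0, 1, 4])

print(df_int.equals(df_list))
print(list(df_gap["MatchID"]))
```

Here skiprows=2 and skiprows=[0, 1] give the same result, while the list form is the one to reach for when the unwanted lines are scattered through the file.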

📋 Argument #5. nrows

nrows sets how many data rows to load into Python, counted from the top of the file.

For example, instead of loading all the data:

MatchID,HomeTeam,AwayTeam,HomeGoals,AwayGoals,MatchDate
M001,Manchester United,Chelsea,2,1,2024-08-14
M002,Liverpool,Arsenal,1,1,2024-08-20
M003,Tottenham,Everton,3,0,2024-09-02
M004,Man City,Aston Villa,4,2,2024-09-15
M005,Newcastle,West Ham,0,0,2024-09-22
M006,Brighton,Leeds,2,3,2024-09-29

We load only the first 3 rows with nrows like this:

# Load the first 3 rows
df5 = pd.read_csv("matches_clean.csv", nrows=3)

# View the result
print(df5)

Result:

  MatchID           HomeTeam AwayTeam  HomeGoals  AwayGoals   MatchDate
0    M001  Manchester United  Chelsea          2          1  2024-08-14
1    M002          Liverpool  Arsenal          1          1  2024-08-20
2    M003          Tottenham  Everton          3          0  2024-09-02
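nrows always counts from the first row that is actually read, so combining it with skiprows gives a simple way to load a window from the middle of a file. The following is a sketch under that assumption; the inline text is a hypothetical copy of the first five matches, and the variable names are made up for the example.

```python
import io

import pandas as pd

# Hypothetical copy of the first five matches
matches_text = """MatchID,HomeTeam,AwayTeam,HomeGoals,AwayGoals,MatchDate
M001,Manchester United,Chelsea,2,1,2024-08-14
M002,Liverpool,Arsenal,1,1,2024-08-20
M003,Tottenham,Everton,3,0,2024-09-02
M004,Man City,Aston Villa,4,2,2024-09-15
M005,Newcastle,West Ham,0,0,2024-09-22
"""

# Skip file lines 1 and 2 (M001 and M002) but keep line 0, the header,
# then read only the next two data rows
df_window = pd.read_csv(
    io.StringIO(matches_text),
    skiprows=range(1, 3),
    nrows=2,
)

print(list(df_window["MatchID"]))
```

The header row is not counted by nrows, so nrows=2 yields two matches, not one.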

☑️ Argument #6. usecols

usecols sets the columns we want to load into Python.

For example, to select only HomeTeam and HomeGoals from:

MatchID,HomeTeam,AwayTeam,HomeGoals,AwayGoals,MatchDate
M001,Manchester United,Chelsea,2,1,2024-08-14
M002,Liverpool,Arsenal,1,1,2024-08-20
M003,Tottenham,Everton,3,0,2024-09-02
M004,Man City,Aston Villa,4,2,2024-09-15
M005,Newcastle,West Ham,0,0,2024-09-22
M006,Brighton,Leeds,2,3,2024-09-29

We can use usecols like this:

# Load only HomeTeam and HomeGoals
df6 = pd.read_csv("matches_clean.csv", usecols=["HomeTeam", "HomeGoals"])

# View the result
print(df6)

Result:

            HomeTeam  HomeGoals
0  Manchester United          2
1          Liverpool          1
2          Tottenham          3
3           Man City          4
4          Newcastle          0
5           Brighton          2
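Besides a list of column names, usecols also accepts 0-based column positions or a function that is called with each column name. A minimal sketch, using a hypothetical two-row inline copy of the data and assumed variable names:

```python
import io

import pandas as pd

# Hypothetical two-row sample of matches_clean.csv
matches_text = """MatchID,HomeTeam,AwayTeam,HomeGoals,AwayGoals,MatchDate
M001,Manchester United,Chelsea,2,1,2024-08-14
M002,Liverpool,Arsenal,1,1,2024-08-20
"""

# Select columns by 0-based position: 1 is HomeTeam, 3 is HomeGoals
df_by_position = pd.read_csv(io.StringIO(matches_text), usecols=[1, 3])

# Select columns with a function: keep every name ending in "Goals"
df_by_rule = pd.read_csv(
    io.StringIO(matches_text),
    usecols=lambda name: name.endswith("Goals"),
)

print(list(df_by_position.columns))
print(list(df_by_rule.columns))
```

The function form can be convenient when a file has many columns that share a naming pattern.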

🔢 Argument #7. index_col

index_col sets the column to use as the index of the data, such as MatchID:

MatchID,HomeTeam,AwayTeam,HomeGoals,AwayGoals,MatchDate
M001,Manchester United,Chelsea,2,1,2024-08-14
M002,Liverpool,Arsenal,1,1,2024-08-20
M003,Tottenham,Everton,3,0,2024-09-02
M004,Man City,Aston Villa,4,2,2024-09-15
M005,Newcastle,West Ham,0,0,2024-09-22
M006,Brighton,Leeds,2,3,2024-09-29

We use index_col like this:

# Load the dataset with MatchID as the index column
df7 = pd.read_csv("matches_clean.csv", index_col="MatchID")

# View the result
print(df7)

Result:

                  HomeTeam     AwayTeam  HomeGoals  AwayGoals   MatchDate
MatchID
M001     Manchester United      Chelsea          2          1  2024-08-14
M002             Liverpool      Arsenal          1          1  2024-08-20
M003             Tottenham      Everton          3          0  2024-09-02
M004              Man City  Aston Villa          4          2  2024-09-15
M005             Newcastle     West Ham          0          0  2024-09-22
M006              Brighton        Leeds          2          3  2024-09-29
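One practical payoff of setting an index is label-based lookup with .loc. The sketch below assumes a hypothetical two-row inline copy of the data; the variable names are illustrative only.

```python
import io

import pandas as pd

# Hypothetical two-row sample of matches_clean.csv
matches_text = """MatchID,HomeTeam,AwayTeam,HomeGoals,AwayGoals,MatchDate
M001,Manchester United,Chelsea,2,1,2024-08-14
M002,Liverpool,Arsenal,1,1,2024-08-20
"""

# index_col accepts a column name or a 0-based position (0 would work here too)
df_indexed = pd.read_csv(io.StringIO(matches_text), index_col="MatchID")

# With MatchID as the index, a match can be looked up by its ID
home_team = df_indexed.loc["M002", "HomeTeam"]

print(df_indexed.index.name)
print(home_team)
```

Note that MatchID is no longer a regular column after this, so the frame has five columns instead of six.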

🔠 Argument #8. names

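names sets the column names. We use it when:

The data has no header row
We want to rename the columns (in which case header=0 is also needed, so that the original header row is replaced rather than read as data)

For example, to give column names to matches_no_header.csv: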

M001,Manchester United,Chelsea,2,1,2024-08-14
M002,Liverpool,Arsenal,1,1,2024-08-20
M003,Tottenham,Everton,3,0,2024-09-02
M004,Man City,Aston Villa,4,2,2024-09-15
M005,Newcastle,West Ham,0,0,2024-09-22
M006,Brighton,Leeds,2,3,2024-09-29

We can use names like this:

# Set col names
col_names = [
    "id",
    "home",
    "away",
    "home_goals",
    "away_goals",
    "date"
]

# Load the dataset with custom col names
df8 = pd.read_csv("matches_no_header.csv", names=col_names)

# View the result
print(df8)

Result:

     id               home         away  home_goals  away_goals        date
0  M001  Manchester United      Chelsea           2           1  2024-08-14
1  M002          Liverpool      Arsenal           1           1  2024-08-20
2  M003          Tottenham      Everton           3           0  2024-09-02
3  M004           Man City  Aston Villa           4           2  2024-09-15
4  M005          Newcastle     West Ham           0           0  2024-09-22
5  M006           Brighton        Leeds           2           3  2024-09-29
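When names is used to rename columns in a file that already has a header row, the header row needs to be replaced explicitly. The sketch below shows the difference; the inline text is a hypothetical two-row sample with a header, and the variable names are assumptions.

```python
import io

import pandas as pd

# Hypothetical two-row sample that already has a header row
matches_text = """MatchID,HomeTeam,AwayTeam,HomeGoals,AwayGoals,MatchDate
M001,Manchester United,Chelsea,2,1,2024-08-14
M002,Liverpool,Arsenal,1,1,2024-08-20
"""

col_names = ["id", "home", "away", "home_goals", "away_goals", "date"]

# names alone: pandas assumes the file has no header,
# so the original header line is read as the first data row
df_as_data = pd.read_csv(io.StringIO(matches_text), names=col_names)

# names with header=0: the original header line is replaced by col_names
df_renamed = pd.read_csv(io.StringIO(matches_text), names=col_names, header=0)

print(len(df_as_data))
print(len(df_renamed))
```

In the first frame, the text "MatchID" shows up as a value in the id column, which is usually a sign that header=0 was forgotten.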

⏹️ Argument #9. dtype

dtype sets the data types of columns.

For example, to set the data types of MatchID, HomeGoals, and AwayGoals in matches_clean.csv:

MatchID,HomeTeam,AwayTeam,HomeGoals,AwayGoals,MatchDate
M001,Manchester United,Chelsea,2,1,2024-08-14
M002,Liverpool,Arsenal,1,1,2024-08-20
M003,Tottenham,Everton,3,0,2024-09-02
M004,Man City,Aston Villa,4,2,2024-09-15
M005,Newcastle,West Ham,0,0,2024-09-22
M006,Brighton,Leeds,2,3,2024-09-29

We can use dtype like this:

# Set col data types
col_dtypes = {
    "MatchID": str,
    "HomeGoals": "int32",
    "AwayGoals": "int32"
}

# Load the dataset, specifying data types for MatchID, HomeGoals, and AwayGoals
df9 = pd.read_csv("matches_clean.csv", dtype=col_dtypes)

# View the result
df9.info()

Result:

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 6 entries, 0 to 5
Data columns (total 6 columns):
 #   Column     Non-Null Count  Dtype
---  ------     --------------  -----
 0   MatchID    6 non-null      object
 1   HomeTeam   6 non-null      object
 2   AwayTeam   6 non-null      object
 3   HomeGoals  6 non-null      int32
 4   AwayGoals  6 non-null      int32
 5   MatchDate  6 non-null      object
dtypes: int32(2), object(4)
memory usage: 372.0+ bytes
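To make the effect of dtype easier to see, the sketch below compares the inferred type of a column with an explicitly requested one, and also passes a single type for every column. The inline text is a hypothetical two-row sample, and the variable names are assumptions for illustration.

```python
import io

import pandas as pd

# Hypothetical two-row sample of matches_clean.csv
matches_text = """MatchID,HomeTeam,AwayTeam,HomeGoals,AwayGoals,MatchDate
M001,Manchester United,Chelsea,2,1,2024-08-14
M002,Liverpool,Arsenal,1,1,2024-08-20
"""

# Without dtype, pandas infers a 64-bit integer for whole-number columns
df_default = pd.read_csv(io.StringIO(matches_text))

# A dict sets the type per column; columns not listed are still inferred
df_typed = pd.read_csv(io.StringIO(matches_text), dtype={"HomeGoals": "int32"})

# A single type applies to every column, so the goals are read as text
df_all_text = pd.read_csv(io.StringIO(matches_text), dtype=str)

print(df_default["HomeGoals"].dtype)
print(df_typed["HomeGoals"].dtype)
print(repr(df_all_text.loc[0, "HomeGoals"]))
```

Reading everything as text can be a safe first step for messy files, with numeric conversion done afterwards.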

⚡ Summary

In this article, we looked at how to use nine arguments of read_csv() from pandas to load data in Python:

filepath_or_buffer: the file to load
sep: the delimiter in the file
header: the row to use as the header
skiprows: the rows to skip
nrows: the number of rows to load
usecols: the columns to load
index_col: the column to use as the index
names: the column names
dtype: the data types of the columns
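These arguments can be combined in a single call. The following is a minimal sketch, assuming a hypothetical semicolon-delimited file with two metadata lines at the top; the inline text and variable names are made up for the example.

```python
import io

import pandas as pd

# Hypothetical file content: two metadata lines, then ";"-delimited data
raw_text = """# UK Football Matches Data
# Created for practice with pd.read_csv()
MatchID;HomeTeam;AwayTeam;HomeGoals;AwayGoals;MatchDate
M001;Manchester United;Chelsea;2;1;2024-08-14
M002;Liverpool;Arsenal;1;1;2024-08-20
M003;Tottenham;Everton;3;0;2024-09-02
"""

df_combined = pd.read_csv(
    io.StringIO(raw_text),                         # filepath_or_buffer
    sep=";",                                       # columns are split on ";"
    skiprows=2,                                    # drop the two metadata lines
    usecols=["MatchID", "HomeTeam", "HomeGoals"],  # load three columns only
    index_col="MatchID",                           # use MatchID as the index
    dtype={"HomeGoals": "int32"},                  # store goals as 32-bit ints
    nrows=2,                                       # load the first two matches
)

print(df_combined)
```

The column named in index_col has to be among the columns kept by usecols, which is why MatchID appears in both.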

😺 GitHub

The example code and datasets for this article are available on GitHub.


2025-10-30
