
Just Do IT

05. Pandas Basics - Inspecting Data


풀용 2022. 1. 27. 23:59

This post summarizes the pandas lecture from the YouTube channel 나도코딩.
https://www.youtube.com/watch?v=PjhlUzp_cU0

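1. Preparing the Data

The examples use score.xlsx, with 지원번호 (applicant number) as the index and the columns 이름 (name), 학교 (school), 키 (height), 국어 (Korean), 영어 (English), 수학 (math), 과학 (science), 사회 (social studies), and SW특기 (software skill).

import pandas as pd
# Load the Excel file, using the '지원번호' column as the row index
df = pd.read_excel('score.xlsx', index_col='지원번호')
df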

Output: (table image)
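read_excel needs an actual score.xlsx file (and, for .xlsx files, usually the openpyxl package). As a minimal sketch of what index_col does without that file, the example below reads a tiny hypothetical CSV string with read_csv, which accepts the same index_col parameter. The values are made up for illustration and are not the contents of score.xlsx.

```python
import io

import pandas as pd

# Hypothetical CSV text standing in for score.xlsx (made-up values)
csv_text = "지원번호,이름,키\n1번,A,180\n2번,B,190\n"

# index_col turns the '지원번호' column into the row index
df = pd.read_csv(io.StringIO(csv_text), index_col="지원번호")
print(df)
```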

2. Inspecting the Data

describe()
For numeric columns, shows per-column statistics: count, mean, standard deviation, minimum, the 25%/50%/75% percentiles, and maximum.

df.describe()

Output: (table image)
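A small sketch, using hypothetical toy data, of the "numeric columns only" behavior: describe() skips text columns such as 이름 and 학교 by default, and the include="all" argument adds count, unique, top, and freq rows for them.

```python
import pandas as pd

# Hypothetical toy data (not the contents of score.xlsx)
df = pd.DataFrame({
    "이름": ["A", "B", "C", "D"],
    "학교": ["북산고", "능남고", "북산고", "능남고"],
    "키": [180, 190, 170, 200],
})

# By default describe() summarizes only the numeric columns
numeric_summary = df.describe()
print(numeric_summary)

# include="all" also summarizes object columns (count, unique, top, freq)
full_summary = df.describe(include="all")
print(full_summary)
```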
info()
Shows the index range, each column's name, non-null count, and data type, along with the dtype totals and memory usage.

df.info()

Output

<class 'pandas.core.frame.DataFrame'>
Index: 8 entries, 1번 to 8번
Data columns (total 9 columns):
 #   Column   Non-Null Count   Dtype
 0   이름     8 non-null       object
 1   학교     8 non-null       object
 2   키       8 non-null       int64
 3   국어     8 non-null       int64
 4   영어     8 non-null       int64
 5   수학     8 non-null       int64
 6   과학     8 non-null       int64
 7   사회     8 non-null       int64
 8   SW특기   6 non-null       object
dtypes: int64(6), object(3)
memory usage: 640.0+ bytes
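info() prints its report rather than returning it. A minimal sketch with hypothetical data: the buf argument redirects the text into a StringIO, and count() gives the same per-column non-null numbers as a Series.

```python
import io

import numpy as np
import pandas as pd

# Hypothetical toy data with one missing value in "SW특기"
df = pd.DataFrame({
    "이름": ["A", "B", "C"],
    "키": [180, 190, 170],
    "SW특기": ["Python", np.nan, "Java"],
})

# info() prints its report and returns None; buf= captures the text instead
buffer = io.StringIO()
result = df.info(buf=buffer)
report = buffer.getvalue()
print(report)

# count() returns the non-null count of each column as a Series
non_null = df.count()
print(non_null)
```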

head()
Shows the first 5 rows of the DataFrame. Pass an integer n, e.g. df.head(3), to show the first n rows instead.

df.head()

Output: (table image)
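A quick sketch on a hypothetical 10-row frame showing the default, an explicit n, and a negative n, which returns every row except the last |n|.

```python
import pandas as pd

# Hypothetical toy data: 10 rows holding the values 0..9
df = pd.DataFrame({"value": range(10)})

first_five = df.head()           # default n=5
first_three = df.head(3)         # first 3 rows
all_but_last_two = df.head(-2)   # negative n: every row except the last 2

print(first_three)
```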
tail()
Shows the last 5 rows of the DataFrame. Pass an integer n, e.g. df.tail(3), to show the last n rows instead.

df.tail()

Output: (table image)
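The same idea for tail(), again on hypothetical data. The returned rows keep their original index labels, and a negative n drops the first |n| rows.

```python
import pandas as pd

# Hypothetical toy data: 10 rows with default index labels 0..9
df = pd.DataFrame({"value": range(10)})

last_five = df.tail()               # default n=5
last_two = df.tail(2)               # last 2 rows
all_but_first_three = df.tail(-3)   # negative n: drop the first 3 rows

# The returned rows keep their original index labels (5..9 here)
print(last_five.index.tolist())
```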
df.columns
Shows the column labels of the DataFrame as an Index object.

df.columns

Output

Index(['이름', '학교', '키', '국어', '영어', '수학', '과학', '사회', 'SW특기'], dtype='object')
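df.columns is an Index object rather than a list. A short sketch with a hypothetical one-row frame: tolist() converts it to a plain list, and the in operator checks whether a column label exists.

```python
import pandas as pd

# Hypothetical one-row frame reusing some of the post's column names
df = pd.DataFrame({"이름": ["A"], "학교": ["북산고"], "키": [180]})

cols = df.columns                 # an Index object
col_list = df.columns.tolist()    # plain Python list of labels
has_height = "키" in df.columns   # membership test on the column labels

print(cols)
print(col_list)
```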

df.shape
Shows the number of rows and columns of the DataFrame as a (rows, columns) tuple.

df.shape

Output

(8, 9)
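Because shape is a tuple, it can be unpacked directly. A minimal sketch with hypothetical data; len(df) is a common shortcut for the row count.

```python
import pandas as pd

# Hypothetical toy data: 3 rows, 2 columns
df = pd.DataFrame({"a": [1, 2, 3], "b": [4, 5, 6]})

n_rows, n_cols = df.shape   # shape is a (rows, columns) tuple
print(n_rows, n_cols)       # 3 2
print(len(df))              # 3: len() gives the row count
print(df["a"].shape)        # (3,): a Series has a 1-element shape tuple
```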

Inspecting a Series

df['x'].describe()
Shows summary statistics for a single column, which is a Series.

df['키'].describe()

Output

count      8.000000
mean     188.000000
std        9.985704
min      168.000000
25%      186.250000
50%      188.000000
75%      191.750000
max      202.000000
Name: 키, dtype: float64
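The output above is for a numeric column. On a text (object) column, describe() reports count, unique, top, and freq instead; a sketch with a hypothetical Series of school names:

```python
import pandas as pd

# Hypothetical text column (made-up values)
schools = pd.Series(["북산고", "능남고", "북산고"], name="학교")

# For non-numeric data, describe() returns count / unique / top / freq
summary = schools.describe()
print(summary)
```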

Individual statistics such as min(), max(), mean(), and count() can also be called directly.

print(df['키'].min())
print(df['키'].max())
print(df['키'].mean())
print(df['키'].count())

Output

168
202
188.0
8
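These methods skip missing values by default, which is why count() can be smaller than the number of rows. A sketch with a hypothetical Series containing one NaN:

```python
import numpy as np
import pandas as pd

# Hypothetical heights with one missing value
heights = pd.Series([180, 190, np.nan, 170], name="키")

print(heights.count())               # 3: NaN is not counted
print(len(heights))                  # 4: len() counts every row
print(heights.mean())                # 180.0: NaN is skipped (skipna=True by default)
print(heights.min(), heights.max())  # 170.0 190.0
```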

df['x'].nlargest()
Returns the n largest values (default 5) in descending order, keeping their index labels.

df['키'].nlargest(3)

Output

지원번호
6번    202
1번    197
8번    190
Name: 키, dtype: int64
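A sketch with hypothetical heights comparing nlargest() with its counterpart nsmallest() and with sorting in descending order and then taking head(). When values tie, nlargest()'s keep parameter decides which rows are kept; the example avoids ties.

```python
import pandas as pd

# Hypothetical heights, indexed like the post's applicant numbers
heights = pd.Series(
    [180, 195, 170, 201, 188],
    index=["1번", "2번", "3번", "4번", "5번"],
    name="키",
)

top3 = heights.nlargest(3)      # 3 largest values, in descending order
bottom2 = heights.nsmallest(2)  # 2 smallest values, in ascending order

# Same rows as sorting in descending order and taking the first 3
top3_sorted = heights.sort_values(ascending=False).head(3)

print(top3)
print(bottom2)
```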

df['x'].unique(), df['x'].nunique()

df['학교'].unique() # unique values, duplicates removed
df['학교'].nunique() # number of unique values

Output

array(['북산고', '능남고'], dtype=object)
2
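unique() and nunique() treat missing values differently: unique() includes NaN in its result, while nunique() ignores it unless dropna=False. value_counts() adds how often each value occurs. A sketch with a hypothetical Series:

```python
import numpy as np
import pandas as pd

# Hypothetical school column with one missing value
schools = pd.Series(["북산고", "능남고", "북산고", np.nan], name="학교")

print(schools.unique())               # distinct values in order of appearance, NaN included
print(schools.nunique())              # 2: NaN is not counted by default
print(schools.nunique(dropna=False))  # 3: NaN counted as its own value
print(schools.value_counts())         # frequency of each non-missing value
```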
