[Data Science] UnicodeDecodeError: 'utf-8' codec can't decode byte 0xc0 in position 740: invalid start byte


Post difficulty: HOO_Middle

# UnicodeDecodeError: 'utf-8' codec can't decode byte 0xc0 in position 740: invalid start byte

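This error occurs when data containing Korean text is decoded with the wrong encoding. pandas reads CSV files as UTF-8 by default, so a file saved in a different Korean encoding such as CP949 contains bytes that are not valid UTF-8.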
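
The mismatch can be reproduced without any file. The snippet below is a minimal sketch, assuming the data was saved as CP949; the sample string is taken from the file name used later in this post, and the variable names are only for illustration.

```python
# Korean text encoded as CP949, which is assumed here to be the file's encoding.
text = "국가어항"
raw = text.encode("cp949")

error = None
try:
    # Decoding CP949 bytes as UTF-8 fails, just as pd.read_csv does by default.
    raw.decode("utf-8")
except UnicodeDecodeError as err:
    error = err
    print(err)

# Decoding with the encoding that was used to write the bytes restores the text.
decoded = raw.decode("cp949")
print(decoded)
```

The same bytes are valid or invalid depending only on the codec used to read them, which is why changing the encoding argument is enough to fix the error.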

For more detail on Korean encoding errors, see the post below.

https://whoishoo.tistory.com/409

[Data Science] How to fix the Pandas CSV Unicode decode error


# How to fix it

If you see an error such as "UnicodeDecodeError: 'utf-8' codec can't decode byte 0xc0 in position 740: invalid start byte", the fix is simple: specify the encoding when reading the CSV. Add the encoding argument to pd.read_csv. With encoding='cp949', pandas decodes the file as CP949 and the Korean data is read correctly. The example code below shows how cp949 is set.

import pandas as pd

df = pd.read_csv('국가어항+일반현황.csv', encoding='cp949')  # decode as CP949 instead of the default UTF-8
print(df.to_string())  # prints every row of the DataFrame

With cp949 set, the result is produced without problems. / Data source: 해양수산빅데이터 거래소, 국가어항+일반현황.csv
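
When it is not known in advance whether a file is UTF-8 or CP949, one option is to test the candidate encodings before calling pd.read_csv. The following is a minimal sketch, not part of pandas: `detect_encoding` and its `candidates` parameter are hypothetical names, and the list assumes the file is in one of these two encodings. UTF-8 is tried first because it is strict, whereas CP949 can silently accept some UTF-8 bytes and produce garbled text.

```python
from pathlib import Path


def detect_encoding(path, candidates=("utf-8-sig", "cp949")):
    """Return the first candidate encoding that decodes the whole file.

    Hypothetical helper for illustration. "utf-8-sig" also handles
    UTF-8 files that start with a byte order mark.
    """
    raw = Path(path).read_bytes()
    for encoding in candidates:
        try:
            raw.decode(encoding)
        except UnicodeDecodeError:
            # This codec cannot read the file, so try the next one.
            continue
        return encoding
    raise ValueError(f"none of {candidates} can decode {path}")
```

The result can then be passed on, for example `pd.read_csv(path, encoding=detect_encoding(path))`. The helper reads the whole file into memory, so it suits small and medium CSV files only.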

# If you get the IOPub data rate exceeded error

IOPub data rate exceeded.
The notebook server will temporarily stop sending output
to the client in order to avoid crashing it.
To change this limit, set the config variable
`--NotebookApp.iopub_data_rate_limit`.

Current values:
NotebookApp.iopub_data_rate_limit=1000000.0 (bytes/sec)
NotebookApp.rate_limit_window=3.0 (secs)

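If you get the IOPub data rate exceeded error, don't panic: remove the print call and run the code again. The error occurs when the amount of data sent to the output is too large, so the notebook server stops sending it and nothing is displayed. It may also appear when the same large output is printed twice, which further increases the volume.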

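Looking at the code, the print call is not actually needed: a notebook displays the value of the last expression in a cell on its own, and pandas shortens that display for large DataFrames, whereas print(df.to_string()) renders every row. If the result prints without an error, that is fine, but if IOPub data rate exceeded appears, leaving out the print solves the problem.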
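
The difference in output size is easy to see on a small test case. The snippet below is a minimal sketch using a synthetic DataFrame as a stand-in for a large CSV; the column names and row count are made up for illustration and are not from the file used in this post.

```python
import pandas as pd

# Synthetic stand-in for a large CSV (hypothetical data).
df = pd.DataFrame({"id": range(10_000), "value": range(10_000)})

# to_string() on the whole frame renders all 10,000 rows as one string.
full_text = df.to_string()

# head(10) limits the output to the first 10 rows.
preview_text = df.head(10).to_string()

# Compare how many characters each version would send to the notebook.
print(len(full_text), len(preview_text))
print(preview_text)
```

In a notebook cell, ending with `df` or `df.head(10)` instead of `print(df.to_string())` keeps the output small enough to stay under the rate limit shown in the error message.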

Below is the link to the 국가어항+일반현황.csv file from 해양수산빅데이터 거래소 that was used in this post.

# Example data source

https://www.bigdata-sea.kr/datasearch/issue/view.do?prodId=PROD_000045

해양수산빅데이터 거래소



License: Attribution, NonCommercial, NoDerivatives


HOOAI
