[23 Python Special Lecture] Lecture 1. Development Environment and Basic Concepts (Code Search)

2024-01-01, 11 min read

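Lecture 1. Data Analysis Environment and Basic Concepts

Part 1 (Chapters 01-03)
pp. 014-073 (60 pages)

03. Gathering the Tools You Need for Data Analysis

03-1. Understanding 'Variables', Numbers That Vary (pp. 53-58)

Variable: a 'number that varies'. A single attribute that takes on different values -> data is a collection of variables.
Variables are the 'objects' of data analysis.
Data analysis: the work of identifying relationships between variables.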

[Do it! Practice] Creating variables (p. 54)

# Assign the number 1 to variable a
a = 1
a

# Assign the number 2 to variable b
b = 2
b

# Assign the number 3 to variable c
c = 3
c

# Assign the number 3.5 to variable d
d = 3.5
d

3.5

# Add variables a and b
a + b

# Add variables a, b, and c
a + b + c

# Divide 4 by the value of variable b
4 / b

2.0

# Multiply 5 by the value of variable b
5 * b

# Variable naming rules (p. 55): start with a letter, prefer English letters, use lowercase
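
The naming rules can be checked with Python itself. Below is a minimal sketch using only the standard library. The helper `is_valid_name` and the candidate names are made up for this note. Lowercase is a convention, so Python does not enforce it.

```python
import keyword

def is_valid_name(name):
    """Return True if `name` can be used as a Python variable name."""
    # isidentifier() enforces the syntax rules (no leading digit, no spaces);
    # iskeyword() rejects reserved words such as "class" or "for".
    return name.isidentifier() and not keyword.iskeyword(name)

# Candidate names are invented for illustration
for name in ["score", "score_1", "1score", "my score", "class"]:
    print(name, "->", is_valid_name(name))
```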

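[Do it! Practice] Creating variables that hold multiple values (p. 56)

# Assign [1, 2, 3] to variable var1
var1 = [1, 2, 3]
var1

[1, 2, 3]

# Assign [4, 5, 6] to variable var2
var2 = [4, 5, 6]
var2

[4, 5, 6]

# Add the two variables var1 and var2 (for lists, + joins them end to end)
var1 + var2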

[1, 2, 3, 4, 5, 6]
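
Note that `+` on two lists joins them rather than adding the numbers pairwise. If element-wise sums are what you want, one option (an addition to these notes, not from the book) is `zip()` with a list comprehension:

```python
var1 = [1, 2, 3]
var2 = [4, 5, 6]

# + concatenates the two lists
joined = var1 + var2

# zip() pairs up elements by position; the comprehension adds each pair
pairwise = [a + b for a, b in zip(var1, var2)]

print(joined)    # [1, 2, 3, 4, 5, 6]
print(pairwise)  # [5, 7, 9]
```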

[Do it! Practice] Creating variables that hold text (p. 57)

# Assign the character x to variable str1
str1 = 'x'
str1

'x'

# Put the word text into variable str2
str2 = 'text'
str2

'text'

# Put the phrase Hello World! into variable str3
str3 = 'Hello World!'
str3

'Hello World!'

# Assign ['a', 'b', 'c'] to variable str4
str4 = ['a', 'b', 'c']
str4

['a', 'b', 'c']

# Assign ['Hello', 'World', 'is', 'good'] to variable str5
str5 = ['Hello', 'World', 'is', 'good']
str5

['Hello', 'World', 'is', 'good']

# Add variables str2 and str3
str2 + str3

'textHello World!'

# Add variables str2 and str3 with a single space between them
str2 + " " + str3

'text Hello World!'

# A string cannot be combined with a number in arithmetic: this line raises a TypeError
str1 + 2
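
The cell above fails because Python will not add a string and a number. Here is a small sketch of the error and two ways around it, depending on whether text or a number is wanted. The variables `label`, `num_text`, and `total` are introduced only for illustration.

```python
str1 = 'x'

try:
    str1 + 2
except TypeError as err:
    # str + int is not defined, so Python raises TypeError
    print("TypeError:", err)

# Convert the number to text to build a longer string
label = str1 + str(2)

# Convert numeric text to a number to do arithmetic
num_text = '3'
total = int(num_text) + 2

print(label)  # x2
print(total)  # 5
```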

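03-2. Understanding 'Functions', the Magic Boxes (pp. 59-61)

Data analysis is 'the work of manipulating variables with functions'.
Function: performs a specific operation on an input value and produces a value different from the input.

[Do it! Practice] Using functions (p. 60)

Function: function name + parentheses
Function ≈ a box that performs a specific task

# Create a variable: assign [1, 2, 3] to variable x
x = [1, 2, 3]
x

[1, 2, 3]

# Apply a function: add up all the values in x
sum(x)

# Maximum: find the largest value in x
max(x)

# Minimum: find the smallest value in x
min(x)

# Create a new variable from a function's result: add up the values in x and put the total in x_sum
x_sum = sum(x)
x_sum

# Create a new variable from a function's result: put the largest value in x into x_max
x_max = max(x)
x_max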
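
Function results are ordinary values, so they can be combined in further calculations. A short sketch, reusing `x` from above together with the built-in `len()` (not used in this section of the book, added here for illustration):

```python
x = [1, 2, 3]

# Range: largest value minus smallest value
x_range = max(x) - min(x)

# Count: how many values the list holds
x_count = len(x)

print(x_range)  # 2
print(x_count)  # 3
```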

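03-3. Understanding 'Packages', Bundles of Functions (pp. 62-73)

Package ≈ a bundle containing multiple functions

[Do it! Practice] Using packages (p. 63)

To use a package, install it first and then load it.
A package only needs to be installed once, but it must be loaded again every time JupyterLab is restarted.
Anaconda already includes most of the major packages.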
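
Whether a package is already installed can be checked before trying to load it. A minimal sketch using `importlib.util.find_spec` from the standard library. The helper name `is_installed` is made up for this note.

```python
import importlib.util

def is_installed(package_name):
    """Return True if a top-level package can be found in this environment."""
    # find_spec() returns None when no module with that name is found
    return importlib.util.find_spec(package_name) is not None

# Import names of the packages used in this lecture
for pkg in ["seaborn", "sklearn", "pydataset"]:
    status = "installed" if is_installed(pkg) else "not installed"
    print(pkg, "->", status)
```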

Using a package function

The countplot() function of the seaborn package -> draws a frequency bar chart

import seaborn  # load the package

# Put ['a', 'a', 'b', 'c'] into variable var
var = ['a', 'a', 'b', 'c']
var

['a', 'a', 'b', 'c']

# Draw a frequency bar chart with the values of var on the x-axis, using seaborn's countplot() function
seaborn.countplot(x = var)

<Axes: ylabel='count'>
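
The bar heights that `countplot()` draws are the number of times each value occurs. The same counts can be computed without plotting, for example with `collections.Counter` from the standard library (an addition here, not part of the book's code):

```python
from collections import Counter

var = ['a', 'a', 'b', 'c']

# Count how many times each value appears in the list
counts = Counter(var)

for value, n in counts.items():
    print(value, n)
# a 2
# b 1
# c 1
```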

Built-in functions, usable without installing or loading anything

sum(), max(), min(), etc.
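
Built-in functions live in the standard `builtins` module, which Python makes available automatically. A quick sketch that checks which names are built-ins (the dictionary name `is_builtin` is made up for this note):

```python
import builtins

names = ["sum", "max", "min", "countplot"]

# True if the name is a built-in, i.e. usable without any import
is_builtin = {name: hasattr(builtins, name) for name in names}

print(is_builtin)
```

`countplot` comes out as `False` because it exists only inside the seaborn package.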

Using a package alias

# Load the seaborn package and give it the alias sns
import seaborn as sns

# Draw a frequency bar chart with the values of var on the x-axis
sns.countplot(x = var)

<Axes: ylabel='count'>

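[Do it! Practice] Drawing graphs with seaborn's titanic data (p. 66)

The seaborn package's load_dataset() function loads seaborn's example datasets (they are fetched from seaborn's online data repository, so an internet connection is needed the first time).
Here we load the titanic data.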

df = sns.load_dataset('titanic')
df

	survived	pclass	sex	age	sibsp	parch	fare	embarked	class	who	adult_male	deck	embark_town	alive	alone
0	0	3	male	22.0	1	0	7.2500	S	Third	man	True	NaN	Southampton	no	False
1	1	1	female	38.0	1	0	71.2833	C	First	woman	False	C	Cherbourg	yes	False
2	1	3	female	26.0	0	0	7.9250	S	Third	woman	False	NaN	Southampton	yes	True
3	1	1	female	35.0	1	0	53.1000	S	First	woman	False	C	Southampton	yes	False
4	0	3	male	35.0	0	0	8.0500	S	Third	man	True	NaN	Southampton	no	True
...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...
886	0	2	male	27.0	0	0	13.0000	S	Second	man	True	NaN	Southampton	no	True
887	1	1	female	19.0	0	0	30.0000	S	First	woman	False	B	Southampton	yes	True
888	0	3	female	NaN	1	2	23.4500	S	Third	woman	False	NaN	Southampton	no	False
889	1	1	male	26.0	0	0	30.0000	C	First	man	True	C	Cherbourg	yes	True
890	0	3	male	32.0	0	0	7.7500	Q	Third	man	True	NaN	Queenstown	no	True

891 rows × 15 columns

Using a function's various options (p. 66)

Parameter: a setting that configures a function's options.

e.g. the 'x' passed to countplot()

# Set countplot()'s data parameter to df
# and its x parameter to sex
# to draw a bar chart of counts by sex

sns.countplot(data = df, x = 'sex')

<Axes: xlabel='sex', ylabel='count'>

# Change the x parameter to class
# to draw a bar chart of counts by cabin class

sns.countplot(data = df, x = 'class')

<Axes: xlabel='class', ylabel='count'>

# Set the x parameter to class and the hue parameter to the alive variable
# to draw a bar chart of survival by cabin class
# cf) hue: a parameter that colors the bars differently for each category of a variable

sns.countplot(data = df, x = 'class', hue = 'alive')

<Axes: xlabel='class', ylabel='count'>

# Move class to the y parameter
# to draw the bar chart of survival by cabin class with horizontal bars

sns.countplot(data = df, y = 'class', hue = 'alive')

<Axes: xlabel='count', ylabel='class'>
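
With `hue`, each bar shows the count for one combination of `class` and `alive`. As a rough sketch of that grouping, the five rows visible at the top of `df` above can be counted by pair. Only those five rows are used, typed in by hand, so the numbers are not the full-data result.

```python
from collections import Counter

# (class, alive) for rows 0-4 of the titanic table shown above
rows = [
    ("Third", "no"),
    ("First", "yes"),
    ("Third", "yes"),
    ("First", "yes"),
    ("Third", "no"),
]

# One count per (class, alive) combination, which is what each bar represents
pair_counts = Counter(rows)

for (pclass, alive), n in pair_counts.items():
    print(pclass, alive, n)
```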

When you are unsure how to use a function: using Help (p. 68)

# Print the manual for sns.countplot()

sns.countplot?

Signature:
sns.countplot(
    data=None,
    *,
    x=None,
    y=None,
    hue=None,
    order=None,
    hue_order=None,
    orient=None,
    color=None,
    palette=None,
    saturation=0.75,
    width=0.8,
    dodge=True,
    ax=None,
    **kwargs,
)

Show the counts of observations in each categorical bin using bars.

A count plot can be thought of as a histogram across a categorical, instead
of quantitative, variable. The basic API and options are identical to those
for :func:`barplot`, so you can compare counts across nested variables.

Note that the newer :func:`histplot` function offers more functionality, although
its default behavior is somewhat different.

.. note::
    This function always treats one of the variables as categorical and
    draws data at ordinal positions (0, 1, ... n) on the relevant axis,
    even when the data has a numeric or date type.

See the :ref:`tutorial <categorical_tutorial>` for more information.    

Parameters
----------
data : DataFrame, array, or list of arrays, optional
    Dataset for plotting. If ``x`` and ``y`` are absent, this is
    interpreted as wide-form. Otherwise it is expected to be long-form.    
x, y, hue : names of variables in ``data`` or vector data, optional
    Inputs for plotting long-form data. See examples for interpretation.    
order, hue_order : lists of strings, optional
    Order to plot the categorical levels in; otherwise the levels are
    inferred from the data objects.    
orient : "v" | "h", optional
    Orientation of the plot (vertical or horizontal). This is usually
    inferred based on the type of the input variables, but it can be used
    to resolve ambiguity when both `x` and `y` are numeric or when
    plotting wide-form data.    
color : matplotlib color, optional
    Single color for the elements in the plot.    
palette : palette name, list, or dict
    Colors to use for the different levels of the ``hue`` variable. Should
    be something that can be interpreted by :func:`color_palette`, or a
    dictionary mapping hue levels to matplotlib colors.    
saturation : float, optional
    Proportion of the original saturation to draw colors at. Large patches
    often look better with slightly desaturated colors, but set this to
    `1` if you want the plot colors to perfectly match the input color.    
dodge : bool, optional
    When hue nesting is used, whether elements should be shifted along the
    categorical axis.    
ax : matplotlib Axes, optional
    Axes object to draw the plot onto, otherwise uses the current Axes.    
kwargs : key, value mappings
    Other keyword arguments are passed through to
    :meth:`matplotlib.axes.Axes.bar`.

Returns
-------
ax : matplotlib Axes
    Returns the Axes object with the plot drawn onto it.    

See Also
--------
barplot : Show point estimates and confidence intervals using bars.    
catplot : Combine a categorical plot with a :class:`FacetGrid`.    

Examples
--------

.. include:: ../docstrings/countplot.rst
File:      c:\users\creta\anaconda3\lib\site-packages\seaborn\categorical.py
Type:      function
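
The `?` suffix works only in IPython and Jupyter. In a plain Python session, similar information is available through the built-in `help()` and the `inspect` module. A small sketch using `statistics.mean` as a stand-in, so that it runs without seaborn installed:

```python
import inspect
import statistics

# The parameter list, similar to the "Signature:" part of the ? output
signature = inspect.signature(statistics.mean)
print(signature)

# The first line of the description text (the docstring)
doc = inspect.getdoc(statistics.mean)
print(doc.splitlines()[0])

# help(statistics.mean) would print the full manual page
```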

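Getting to know modules (p. 69)

# Load the metrics module of the sklearn package

import sklearn.metrics

# Use the accuracy_score() function from the sklearn package's metrics module
# No values are passed to the function here, so this call prints an error message
sklearn.metrics.accuracy_score()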
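
The call fails because `accuracy_score()` needs at least two inputs: the true labels and the predicted labels. By default it returns the share of predictions that match. A pure-Python sketch of that idea follows. The function name `accuracy` and the label lists are made up for illustration, and this is not scikit-learn's implementation.

```python
def accuracy(y_true, y_pred):
    """Fraction of positions where the predicted label equals the true label."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    matches = sum(1 for t, p in zip(y_true, y_pred) if t == p)
    return matches / len(y_true)

y_true = [1, 0, 1, 1]  # hypothetical true labels
y_pred = [1, 0, 0, 1]  # hypothetical predictions

print(accuracy(y_true, y_pred))  # 0.75
```

With scikit-learn loaded, `sklearn.metrics.accuracy_score(y_true, y_pred)` should give the same value for these two lists.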

Using a function as module.function() (p. 70)

# Load the metrics module of the sklearn package
from sklearn import metrics

# Call accuracy_score() from the metrics module (no arguments again, so this also raises an error)
metrics.accuracy_score()
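
The two import styles load the same module and differ only in the name typed afterwards. A sketch using the standard library's `os.path` in place of `sklearn.metrics`, so that it runs anywhere (the alias `osp` and the file name are made up for illustration):

```python
import os.path               # style 1: use as os.path.join(...)
from os import path          # style 2: use as path.join(...)
from os import path as osp   # style 3: same module under an alias

# All three names point to the same module object
same_module = (os.path is path) and (path is osp)
print(same_module)  # True

# The same function is reachable through each name
joined = path.join("data", "example.csv")
print(joined == os.path.join("data", "example.csv"))  # True
```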

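[Do it! Practice] Installing packages (p. 71)

Packages that are not included in Anaconda must be installed manually before use.
PyDataset: makes it easy to load a variety of datasets.

[Term/ChatGPT] Dataset

A collection of related data.
Usually organized as a table or database, made up of rows and columns.
Each row represents an individual record or observation, and each column represents an attribute or variable of the data.
e.g. a dataset of student information at a school
- contains information about each student (row)
- records each student's name, age, gender, grades, etc. (columns)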
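
The student example can be written down as a tiny dataset in plain Python: one dictionary per row, one key per column. All names and values below are invented for illustration.

```python
# Each dict is one row (a student); each key is one column (a variable)
students = [
    {"name": "A", "age": 15, "gender": "F", "score": 80},
    {"name": "B", "age": 16, "gender": "M", "score": 60},
    {"name": "C", "age": 15, "gender": "F", "score": 70},
]

n_rows = len(students)       # number of observations
columns = list(students[0])  # variable names

# Pull out one column: the score of every student
scores = [row["score"] for row in students]

print(n_rows, columns)
print(scores)
```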

pip install pydataset

Requirement already satisfied: pydataset in c:\users\creta\anaconda3\lib\site-packages (0.2.0)
Requirement already satisfied: pandas in c:\users\creta\anaconda3\lib\site-packages (from pydataset) (2.0.3)
Requirement already satisfied: python-dateutil>=2.8.2 in c:\users\creta\anaconda3\lib\site-packages (from pandas->pydataset) (2.8.2)
Requirement already satisfied: pytz>=2020.1 in c:\users\creta\anaconda3\lib\site-packages (from pandas->pydataset) (2023.3.post1)
Requirement already satisfied: tzdata>=2022.1 in c:\users\creta\anaconda3\lib\site-packages (from pandas->pydataset) (2023.3)
Requirement already satisfied: numpy>=1.21.0 in c:\users\creta\anaconda3\lib\site-packages (from pandas->pydataset) (1.24.3)
Requirement already satisfied: six>=1.5 in c:\users\creta\anaconda3\lib\site-packages (from python-dateutil>=2.8.2->pandas->pydataset) (1.16.0)
Note: you may need to restart the kernel to use updated packages.

Using package functions (p. 72)

import pydataset

# Print the list of datasets included in the pydataset package
pydataset.data()

	dataset_id	title
0	AirPassengers	Monthly Airline Passenger Numbers 1949-1960
1	BJsales	Sales Data with Leading Indicator
2	BOD	Biochemical Oxygen Demand
3	Formaldehyde	Determination of Formaldehyde
4	HairEyeColor	Hair and Eye Color of Statistics Students
...	...	...
752	VerbAgg	Verbal Aggression item responses
753	cake	Breakage Angle of Chocolate Cakes
754	cbpp	Contagious bovine pleuropneumonia
755	grouseticks	Data on red grouse ticks from Elston et al. 2001
756	sleepstudy	Reaction times in a sleep deprivation study

757 rows × 2 columns

# Pass 'mtcars' to data() to load the mtcars dataset
## mtcars dataset: contains information on 32 car models

pydataset.data('mtcars')

	mpg	cyl	disp	hp	drat	wt	qsec	vs	am	gear	carb
Mazda RX4	21.0	6	160.0	110	3.90	2.620	16.46	0	1	4	4
Mazda RX4 Wag	21.0	6	160.0	110	3.90	2.875	17.02	0	1	4	4
Datsun 710	22.8	4	108.0	93	3.85	2.320	18.61	1	1	4	1
Hornet 4 Drive	21.4	6	258.0	110	3.08	3.215	19.44	1	0	3	1
Hornet Sportabout	18.7	8	360.0	175	3.15	3.440	17.02	0	0	3	2
Valiant	18.1	6	225.0	105	2.76	3.460	20.22	1	0	3	1
Duster 360	14.3	8	360.0	245	3.21	3.570	15.84	0	0	3	4
Merc 240D	24.4	4	146.7	62	3.69	3.190	20.00	1	0	4	2
Merc 230	22.8	4	140.8	95	3.92	3.150	22.90	1	0	4	2
Merc 280	19.2	6	167.6	123	3.92	3.440	18.30	1	0	4	4
Merc 280C	17.8	6	167.6	123	3.92	3.440	18.90	1	0	4	4
Merc 450SE	16.4	8	275.8	180	3.07	4.070	17.40	0	0	3	3
Merc 450SL	17.3	8	275.8	180	3.07	3.730	17.60	0	0	3	3
Merc 450SLC	15.2	8	275.8	180	3.07	3.780	18.00	0	0	3	3
Cadillac Fleetwood	10.4	8	472.0	205	2.93	5.250	17.98	0	0	3	4
Lincoln Continental	10.4	8	460.0	215	3.00	5.424	17.82	0	0	3	4
Chrysler Imperial	14.7	8	440.0	230	3.23	5.345	17.42	0	0	3	4
Fiat 128	32.4	4	78.7	66	4.08	2.200	19.47	1	1	4	1
Honda Civic	30.4	4	75.7	52	4.93	1.615	18.52	1	1	4	2
Toyota Corolla	33.9	4	71.1	65	4.22	1.835	19.90	1	1	4	1
Toyota Corona	21.5	4	120.1	97	3.70	2.465	20.01	1	0	3	1
Dodge Challenger	15.5	8	318.0	150	2.76	3.520	16.87	0	0	3	2
AMC Javelin	15.2	8	304.0	150	3.15	3.435	17.30	0	0	3	2
Camaro Z28	13.3	8	350.0	245	3.73	3.840	15.41	0	0	3	4
Pontiac Firebird	19.2	8	400.0	175	3.08	3.845	17.05	0	0	3	2
Fiat X1-9	27.3	4	79.0	66	4.08	1.935	18.90	1	1	4	1
Porsche 914-2	26.0	4	120.3	91	4.43	2.140	16.70	0	1	5	2
Lotus Europa	30.4	4	95.1	113	3.77	1.513	16.90	1	1	5	2
Ford Pantera L	15.8	8	351.0	264	4.22	3.170	14.50	0	1	5	4
Ferrari Dino	19.7	6	145.0	175	3.62	2.770	15.50	0	1	5	6
Maserati Bora	15.0	8	301.0	335	3.54	3.570	14.60	0	1	5	8
Volvo 142E	21.4	4	121.0	109	4.11	2.780	18.60	1	1	4	2

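[Individual Practice] Try it yourself (p. 73)

Q1: Create and print a variable of exam scores

Create a variable named score that holds the exam scores of five students, and print it. The scores are as follows.
80, 60, 70, 50, 90

# A1
score = [80, 60, 70, 50, 90]
score

[80, 60, 70, 50, 90]

Q2: Compute the total score

Use the variable created in the previous question to compute the total score.

# A2
sum(score)

Q3: Store the total score in a variable and print it

Create a new variable named sum_score that holds the total score, and print it.
You can adapt the code from the previous question.

# A3
sum_score = sum(score)
sum_score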
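
As an optional follow-up (not one of the book's questions), the same built-in functions also give the average, highest, and lowest score. The variable name `mean_score` is made up for this sketch.

```python
score = [80, 60, 70, 50, 90]

sum_score = sum(score)

# Average: total divided by the number of students
mean_score = sum_score / len(score)

print(sum_score)               # 350
print(mean_score)              # 70.0
print(max(score), min(score))  # 90 50
```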

The End of Note
