Data Analysis: pandas Data Structures (Part 1)

andrew_a

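Earlier we covered some basic NumPy usage; this post gives a brief introduction to the data structures in pandas.

I. pandas Data Structures

pandas works with three kinds of data structures: Series, DataFrame, and Index. Series and DataFrame are the two you will use most often.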

1. Series

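A Series is much like a one-dimensional array, except that every value has an index label. When printed, the index appears on the left and the values on the right.

pandas.Series(data, index, dtype, copy)

data: can be of many types, such as a list, a dict, or a scalar
index: the labels must be hashable (they do not have to be unique) and there must be as many labels as values; if no index is passed, it defaults to 0 through n-1, equivalent to **np.arange(n)**
dtype: the data type of the values
copy: whether to copy the input data; defaults to False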
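
As a small sketch of how these parameters combine (the variable names `values`, `s`, and `dup` are illustrative and not from the original post), the following passes an explicit index and dtype, and also shows that repeated labels are accepted:

```python
import numpy as np
import pandas as pd

values = np.array([1, 2, 3])

# index supplies the labels; dtype converts the integers to floats
s = pd.Series(values, index=['x', 'y', 'z'], dtype='float64')

# Labels must be hashable, but the same label may appear more than once
dup = pd.Series([10, 20], index=['k', 'k'])

print(s.dtype)
print(s.index.tolist())
print(dup['k'].tolist())  # both values stored under the label 'k'
```

Looking up a repeated label returns every matching value, which is worth keeping in mind before relying on labels as unique keys.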

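1) Create an empty Series

import numpy as np
import pandas as pd

# Use the Series constructor to create an empty Series.
# Passing dtype explicitly keeps the result the same across pandas versions
# (without it, recent releases default to object and older ones warn).
s = pd.Series(dtype='float64')
print(s)
"""
Output: Series([], dtype: float64)
"""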

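2) Create a Series from an ndarray:

If the data is an ndarray, any index you pass must have the same length. If no index is passed, the default index is range(n), where n is the array length, i.e. [0, 1, 2, ..., len(array) - 1].

# Create a Series from an ndarray
data = np.array(['a','b','c','d'])
s = pd.Series(data)
print(s)
"""
Output: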
0    a
1    b
2    c
3    d
dtype: object
"""

When no index is given, the default index runs from 0 to len(data) - 1.
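
To check the default index directly, a minimal sketch like the one below inspects `s.index`. In current pandas releases the default is a `RangeIndex` that starts at 0:

```python
import numpy as np
import pandas as pd

data = np.array(['a', 'b', 'c', 'd'])
s = pd.Series(data)

# The default index is a RangeIndex covering 0 .. len(data) - 1
print(type(s.index).__name__)
print(list(s.index))
```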

3) Pass in index values:

# Pass in index values
data = np.array(['a','b','c','d'])
s = pd.Series(data,index=[12, 13, 14,15])
print(s)
"""
Output:
12    a
13    b
14    c
15    d
dtype: object
"""

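4) Create a Series from a dict:

When the index contains a label that has no matching key in the dict, the missing element is filled with NaN.

data = {'a':0,'b':1,'c':2.}
s= pd.Series(data, index=['b','a','c','d'])
print(s)
"""
Output: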
b    1.0
a    0.0
c    2.0
d    NaN
dtype: float64
"""
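
For comparison, here is a minimal sketch of the same dict without the extra label (the names `s_all` and `s_some` are illustrative). With no index the keys become the labels, and with an index that lists only some keys the others are left out:

```python
import pandas as pd

data = {'a': 0, 'b': 1, 'c': 2.}

# No index: the dict keys become the labels, in insertion order
s_all = pd.Series(data)

# Explicit index: only the listed keys are kept, so 'c' is left out
s_some = pd.Series(data, index=['b', 'a'])

print(s_all)
print(s_some)
```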

5) Create a Series from a scalar:

# Create a Series from a scalar; the value is repeated for every label
s= pd.Series(5, index=['a','b','c','d'])
print(s)
"""
Output:
a    5
b    5
c    5
d    5
dtype: int64
"""

6) Retrieve data:

data = [1, 2, 3, 4]
s= pd.Series(data, index=['a','b','c','d'])
print(s["d"]) # retrieve the value labeled 'd'
print(s[-3:]) # retrieve the last 3 values
"""
Output:
4
b    2
c    3
d    4
dtype: int64
"""
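
The bracket syntax above mixes label lookup and positional slicing. As a hedged alternative, the sketch below uses the explicit accessors `.loc` (labels) and `.iloc` (integer positions), which makes the intent unambiguous; the variable names are illustrative:

```python
import pandas as pd

s = pd.Series([1, 2, 3, 4], index=['a', 'b', 'c', 'd'])

by_label = s.loc['d']           # lookup by label
last_three = s.iloc[-3:]        # slice by integer position
label_slice = s.loc['b':'c']    # label slices include both endpoints
several = s.loc[['a', 'd']]     # a list of labels returns a Series

print(by_label)
print(last_three)
print(label_slice)
print(several)
```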

2. DataFrame

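A DataFrame is a two-dimensional labeled data structure whose columns can hold different types. You can think of it as an Excel spreadsheet, a SQL table, or a dict of Series objects. It is the most commonly used pandas object. Like Series, DataFrame accepts many different kinds of input.

pandas.DataFrame(data, index, columns, dtype)

data: a dict of one-dimensional arrays, lists, or Series objects
index: row labels; if no index is passed, defaults to np.arange(n)
columns: column labels; if no column labels are passed, defaults to np.arange(n)
dtype: the data type to force on the columns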

1) Create an empty DataFrame

# Create an empty DataFrame
import pandas as pd
df = pd.DataFrame()
print(df)
"""
Output:
Empty DataFrame
Columns: []
Index: []
"""

2) Create a DataFrame from a list

A DataFrame can be created from a single list or from a list of lists.

data = [1, 2, 3,4, 5]
df =  pd.DataFrame(data)
print(df)
"""
Output:
   0
0  1
1  2
2  3
3  4
4  5
"""

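data = [['Al', 9],['Bl', 8],['Cl', 10]]
# Convert only the Age column to float. Passing dtype=float to the
# constructor would also try to cast the Name strings, which fails
# on recent pandas releases.
df = pd.DataFrame(data, columns=['Name', 'Age'])
df['Age'] = df['Age'].astype(float)
print(df)
"""
Output: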
  Name   Age
0   Al   9.0
1   Bl   8.0
2   Cl  10.0
"""

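3) Create a DataFrame from a dict of ndarrays/lists

All the ndarrays must have the same length. If an index is passed, its length must equal the length of the arrays.

If no index is passed, the default index is range(n), where n is the array length.

# Create a DataFrame from a dict of ndarrays/lists
data1 = {'Name':['Al','Bl','Cl'], 'Age':[9, 8,10]}
df1 = pd.DataFrame(data1)
print(df1)
"""
Output (on current pandas the columns follow the dict's key order):
  Name  Age
0   Al    9
1   Bl    8
2   Cl   10
"""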

4) Add row labels

data1 = {'Name':['Al','Bl','Cl'], 'Age':[9, 8,10]}
# Add row labels
df1 = pd.DataFrame(data1, index=['rank1','rank2','rank3'])
print(df1)
"""
Output:
      Name  Age
rank1   Al    9
rank2   Bl    8
rank3   Cl   10
"""

5) Create a DataFrame from a list of dicts

import pandas as pd
data = [{'a': 1, 'b': 2},{'a': 5, 'b': 10, 'c': 20}]
df = pd.DataFrame(data, index=['rank1','rank2'])
print(df)
"""
Output:
       a   b     c
rank1  1   2   NaN
rank2  5  10  20.0
"""

# Create DataFrames from a list of dicts plus row-index and column-index lists
data = [{'a':1,'b':2},{'a':2,'b':10,'c':9}]
df1 = pd.DataFrame(data, index = ['rank1','rank2'],columns=['a','b'])
print('df1:\n',df1)

# 'b1' is not a key in any of the dicts, so that column is filled with NaN
df2 = pd.DataFrame(data, index = ['rank1','rank2'],columns=['a','b1'])
print('df2:\n',df2)
"""
Output:
df1:
        a   b
rank1  1   2
rank2  2  10
df2:
        a  b1
rank1  1 NaN
rank2  2 NaN
"""

6) Create a DataFrame from a dict of Series, then add and delete columns

# Create a DataFrame from a dict of Series
d = {'one':pd.Series([1,2,3], index=['a','b','c']),
    'two':pd.Series([1,2,3,4], index=['a','b','c','d'])}
df = pd.DataFrame(d)
print(df)
"""
Output:
   one  two
a  1.0    1
b  2.0    2
c  3.0    3
d  NaN    4
"""

Adding columns:

# Add columns
d = {'one':pd.Series([1,2,3], index=['a','b','c']),
    'two':pd.Series([1,2,3,4], index=['a','b','c','d'])}
df = pd.DataFrame(d)

# A new column is aligned on the row labels; missing labels become NaN
df['three'] = pd.Series([20,3,21], index=['a','b','d'])
df['four'] = df['one']+df['three']
print(df)
"""
Output:
   one  two  three  four
a  1.0    1   20.0  21.0
b  2.0    2    3.0   5.0
c  3.0    3    NaN   NaN
d  NaN    4   21.0   NaN
"""
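
The NaN entries above come from label alignment: an assigned Series is matched to the existing row labels rather than by position. The sketch below isolates that behavior and also shows `DataFrame.assign`, which returns a new DataFrame instead of modifying the original. The names `frame`, `extra`, and `total` are illustrative:

```python
import pandas as pd

frame = pd.DataFrame({'one': [1, 2, 3]}, index=['a', 'b', 'c'])

# Alignment by label: 'z' has no matching row and is discarded,
# while row 'b' has no value in the Series and becomes NaN
frame['extra'] = pd.Series([10, 30, 99], index=['a', 'c', 'z'])

# assign builds a new DataFrame and leaves frame untouched
with_total = frame.assign(total=frame['one'] + frame['extra'])

print(frame)
print(with_total)
```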

Deleting columns:

# Build a DataFrame with three columns
d = {'one':pd.Series([1,2,3], index=['a','b','c']),
    'two':pd.Series([1,2,3,4], index=['a','b','c','d']),
    'three':pd.Series([20,3,21], index=['a','b','d'])}
df = pd.DataFrame(d)
print(df)
"""
Output (on current pandas the columns follow the dict's key order):
   one  two  three
a  1.0    1   20.0
b  2.0    2    3.0
c  3.0    3    NaN
d  NaN    4   21.0
"""
# Delete a column
del df['one']
print(df)
"""
Output:
   two  three
a    1   20.0
b    2    3.0
c    3    NaN
d    4   21.0
"""
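
Besides `del`, pandas offers `DataFrame.drop` and `DataFrame.pop` for removing columns. The sketch below (with an illustrative `frame`) contrasts the two: `drop` returns a new DataFrame, while `pop` removes the column in place and hands it back:

```python
import pandas as pd

frame = pd.DataFrame({'one': [1, 2], 'two': [3, 4], 'three': [5, 6]})

# drop returns a new DataFrame; frame itself still has all three columns
trimmed = frame.drop(columns=['one'])

# pop removes the column from frame and returns it as a Series
popped = frame.pop('two')

print(list(trimmed.columns))
print(list(frame.columns))
print(popped.tolist())
```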

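7) Create a DataFrame from a dict, then select, add, and delete rows

# Row selection, addition, and deletion
d = {'one':pd.Series([1,2,3], index=['a','b','c']),
    'two':pd.Series([1,2,3,4], index=['a','b','c','d'])}
df = pd.DataFrame(d)
print(df,'\n')
print(df.loc['b'],'\n')   # select a row by label
print(df.iloc[2],'\n')    # select a row by integer position
print(df[2:4])            # slice rows by position
"""
Output: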
   one  two
a  1.0    1
b  2.0    2
c  3.0    3
d  NaN    4 

one    2.0
two    2.0
Name: b, dtype: float64 

one    3.0
two    3.0
Name: c, dtype: float64 

   one  two
c  3.0    3
d  NaN    4
"""

Adding rows:

# Add rows. DataFrame.append was removed in pandas 2.0, so use pd.concat
df = pd.DataFrame([[1,2],[3,4]], columns=['a','b'])
df2 = pd.DataFrame([[5,6],[7,8]], columns=['a','b'])
df = pd.concat([df, df2])
print(df)
"""
Output:
   a  b
0  1  2
1  3  4
0  5  6
1  7  8
"""
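
Note that the result keeps the original row labels, so 0 and 1 each appear twice. If unique labels are wanted, one option is `pd.concat` with `ignore_index=True`, sketched here with illustrative names `left`, `right`, and `combined`:

```python
import pandas as pd

left = pd.DataFrame([[1, 2], [3, 4]], columns=['a', 'b'])
right = pd.DataFrame([[5, 6], [7, 8]], columns=['a', 'b'])

# ignore_index=True discards the old labels and renumbers the rows from 0
combined = pd.concat([left, right], ignore_index=True)

print(combined)
```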

Deleting rows:

# Delete rows. drop(0) removes every row labeled 0,
# and two rows carry that label here
df = df.drop(0)
print(df)
"""
Output:
   a  b
1  3  4
1  7  8
"""
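
Because `drop` works on labels, repeated labels are removed together. To delete a single row from such a frame, one approach (a sketch, with illustrative names `stacked`, `renumbered`, and `result`) is to renumber the rows first with `reset_index(drop=True)` and then drop the one unique label:

```python
import pandas as pd

stacked = pd.concat([
    pd.DataFrame([[1, 2], [3, 4]], columns=['a', 'b']),
    pd.DataFrame([[5, 6], [7, 8]], columns=['a', 'b']),
])  # row labels are 0, 1, 0, 1

# Replace the repeated labels with 0..3, discarding the old ones
renumbered = stacked.reset_index(drop=True)

# Now label 2 refers to exactly one row, [5, 6]
result = renumbered.drop(2)

print(result)
```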
