1. Handling missing values in a DataFrame: pandas.DataFrame.dropna
df2.dropna(axis=0, how='any', subset=[u'ToC'], inplace=True)
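This drops every row that has a missing value in the ToC column. Missing values in other columns are ignored because subset restricts the check to ToC.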
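A minimal sketch of the call above on a small made-up frame. The sample data and the `name` and `other` columns are hypothetical; only the `ToC` column name comes from the note. It uses the non-inplace form so the original frame stays available for comparison.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data; only the 'ToC' column name comes from the note above.
df2 = pd.DataFrame({
    "name": ["a", "b", "c", "d"],
    "ToC": [1.0, np.nan, 3.0, np.nan],
    "other": [np.nan, 2.0, 3.0, 4.0],
})

# Drop rows where 'ToC' is missing. The NaN in 'other' on the first row
# does not cause that row to be dropped, because subset limits the check.
cleaned = df2.dropna(axis=0, how="any", subset=["ToC"])
print(cleaned)
```

Without inplace=True, dropna returns a new DataFrame and leaves df2 unchanged, which is usually easier to reason about when chaining steps.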
2. Counting duplicate rows by a given column: pandas.DataFrame.duplicated
print(df.duplicated(['name']).value_counts())  # if no columns are given, all columns are compared
"""
Output:
False    11118
True       664
meaning that 664 rows are duplicates
"""
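The duplicated method of a DataFrame returns a boolean Series that indicates whether each row is a duplicate. With the default keep='first', the first occurrence of a value is marked False and every later repeat is marked True, so the True count is the number of extra copies.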
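As an illustration of how the mask is built, here is a small sketch on made-up data (the frame and the `score` column are assumptions, not from the note). It also shows keep=False, which flags every member of a duplicated group rather than only the later copies.

```python
import pandas as pd

# Hypothetical sample data for illustration only.
df = pd.DataFrame({
    "name": ["ann", "bob", "ann", "cat", "bob", "ann"],
    "score": [1, 2, 3, 4, 5, 6],
})

# Default keep='first': the first 'ann' and first 'bob' are False,
# later repeats are True.
dup_mask = df.duplicated(["name"])
print(dup_mask.value_counts())

# Number of rows that would be removed by drop_duplicates(['name']).
n_duplicates = int(dup_mask.sum())

# keep=False marks every row whose name appears more than once.
all_copies = df.duplicated(["name"], keep=False)
print(df[all_copies])
```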
3. Removing duplicates: pandas.DataFrame.drop_duplicates
df.drop_duplicates(['name'], keep='last', inplace=True)
"""
keep : {'first', 'last', False}, default 'first'
    first : Drop duplicates except for the first occurrence.
    last : Drop duplicates except for the last occurrence.
    False : Drop all duplicates.
"""
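To make the three keep options concrete, the sketch below applies each one to the same made-up frame (the data and the `score` column are hypothetical). It uses the non-inplace form so the results can be compared side by side.

```python
import pandas as pd

# Hypothetical sample data: 'ann' appears twice, at index 0 and index 2.
df = pd.DataFrame({
    "name": ["ann", "bob", "ann", "cat"],
    "score": [1, 2, 3, 4],
})

# Keep the first 'ann' (index 0) and drop the later one.
kept_first = df.drop_duplicates(["name"], keep="first")

# Keep the last 'ann' (index 2) and drop the earlier one.
kept_last = df.drop_duplicates(["name"], keep="last")

# Drop every row whose name is duplicated, so no 'ann' remains.
kept_none = df.drop_duplicates(["name"], keep=False)

print(kept_first, kept_last, kept_none, sep="\n\n")
```

Note that the surviving rows keep their original index labels. If a fresh 0..n-1 index is wanted, chaining .reset_index(drop=True) is one common choice.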