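I recently needed to extract specific columns from CSV files. Excel kept mangling values in hh.mm.ss.SSS format, so I decided to extract the columns directly with Python. The approach is recorded below.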

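Environment setup

Introduction to Pandas

Pandas is a Python library for data manipulation and analysis. It is built on top of NumPy and provides data structures and operations for working with numerical tables and time series.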
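As a minimal sketch of the kind of column selection used later in this post, the snippet below reads an in-memory CSV with pandas and keeps only two columns. The sample data, the column names, and the helper name select_columns are made up for illustration. Passing dtype=str is an assumption of this sketch: it keeps every cell as text so that a value such as 12.34.56.789 is written back unchanged.

```python
import io

import pandas as pd

# Hypothetical sample data; the time column uses the hh.mm.ss.SSS layout
SAMPLE_CSV = (
    "time,sensor,value\n"
    "12.34.56.789,A,1.5\n"
    "12.34.57.001,B,2.5\n"
)


def select_columns(csv_text, usecols):
    """Read csv_text and keep only the columns at the given positions."""
    # dtype=str keeps every cell as text, so nothing is reformatted
    return pd.read_csv(io.StringIO(csv_text), usecols=usecols, dtype=str)


df = select_columns(SAMPLE_CSV, [0, 2])
print(df.to_csv(index=False))
```

Here usecols=[0, 2] selects columns by position, the same style of selection the script later in this post uses.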

Dependencies

Python 3.8.5, which runs on Windows 7 (Python 3.9 and later do not support Windows 7)
numpy==1.24.4
pandas==2.0.3
python-dateutil==2.9.0.post0
pytz==2024.1
six==1.16.0
tzdata==2024.1
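To confirm that an environment actually matches these pins, something like the following could be run inside it. This is only a sketch: the helper names parse_pins and find_mismatches are hypothetical, and it assumes the simple name==version form shown above. It relies on importlib.metadata, which is part of the standard library from Python 3.8 onward.

```python
from importlib import metadata  # standard library since Python 3.8

# The pins listed above, copied into a string for illustration
PINNED = """\
numpy==1.24.4
pandas==2.0.3
python-dateutil==2.9.0.post0
pytz==2024.1
six==1.16.0
tzdata==2024.1
"""


def parse_pins(text):
    """Turn 'name==version' lines into a {name: version} dict."""
    pins = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        name, _, version = line.partition("==")
        pins[name] = version
    return pins


def find_mismatches(pins, installed_version=metadata.version):
    """Return {name: (wanted, found)} for pins that differ or are missing."""
    mismatches = {}
    for name, wanted in pins.items():
        try:
            found = installed_version(name)
        except metadata.PackageNotFoundError:
            found = None  # package is not installed at all
        if found != wanted:
            mismatches[name] = (wanted, found)
    return mismatches


if __name__ == "__main__":
    for name, (wanted, found) in find_mismatches(parse_pins(PINNED)).items():
        print(f"{name}: expected {wanted}, found {found}")
```

The installed_version parameter exists only so the comparison can be exercised without touching the real environment.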

Virtual environment

mkdir csv_extract
cd csv_extract
python -m venv env
env\Scripts\activate.bat
pip install pandas
pip freeze > requirements.txt
pip download -d packages -r requirements.txt

Offline migration

Create a batch file named envConfig.bat so that the offline environment can be set up in one step.

python -m venv env
call env\Scripts\activate.bat
pip install --no-index --find-links=packages -r requirements.txt
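Before copying the folder to the offline machine, it may be worth checking that the packages directory holds a download for every pinned requirement, since pip install --no-index cannot fetch anything that is missing. The sketch below is one rough way to do that. The helper names normalize and missing_pins are hypothetical, and the check is a filename-prefix heuristic (it assumes files are named name-version-... as wheels and source archives usually are), not a full resolver.

```python
import re
from pathlib import Path


def normalize(text):
    """Lower-case and collapse runs of '-', '_' and '.' into a single '-'."""
    return re.sub(r"[-_.]+", "-", text).lower()


def missing_pins(requirement_lines, filenames):
    """Return the pins for which no downloaded file name appears to match."""
    normalized_files = [normalize(name) for name in filenames]
    missing = []
    for line in requirement_lines:
        pin = line.strip()
        if not pin or pin.startswith("#") or "==" not in pin:
            continue
        name, _, version = pin.partition("==")
        # Heuristic: a matching file starts with "<name>-<version>-"
        prefix = normalize(f"{name}-{version}") + "-"
        if not any(f.startswith(prefix) for f in normalized_files):
            missing.append(pin)
    return missing


if __name__ == "__main__":
    req = Path("requirements.txt")
    pkg_dir = Path("packages")
    if req.is_file() and pkg_dir.is_dir():
        lines = req.read_text().splitlines()
        files = [p.name for p in pkg_dir.iterdir()]
        for pin in missing_pins(lines, files):
            print("no download found for", pin)
```

Because names are normalized before comparison, python-dateutil in requirements.txt still matches a wheel file whose name starts with python_dateutil.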

Implementation

import os

import pandas as pd


def walk_files(src_filepath="."):
    """Recursively collect the paths of all files under src_filepath."""
    filepath_list = []

    for root, dirs, files in os.walk(src_filepath):
        for file in files:
            # os.path.join inserts the separator only when needed, so this
            # works whether or not src_filepath ends with a slash
            filepath_list.append(os.path.join(root, file))

    return filepath_list


def extract_csv(filepath, usecols=(0, 3, 11, 42, 43), encoding='gbk'):
    """Read only the selected columns and save them next to the source file."""
    df = pd.read_csv(filepath, usecols=list(usecols), encoding=encoding)
    # The output is named <source name>.csv; index=False omits the row index
    df.to_csv(filepath + '.csv', index=False)


if __name__ == '__main__':
    print(os.getcwd())

    search_dir = './testData/'
    file_info_list = walk_files(search_dir)

    for file in file_info_list:
        print(file)
        extract_csv(file)
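
The script above writes each result next to its source file, so a second run over the same directory would also pick up the files produced by the first run. One possible variation, sketched below, only reads files ending in .csv and writes the results into a separate output tree. The function name extract_tree and its parameters are hypothetical. Using dtype=str and writing the output with the same encoding as the input are assumptions of this sketch, not part of the original script.

```python
from pathlib import Path

import pandas as pd


def extract_tree(src_dir, out_dir, usecols, encoding="gbk"):
    """Copy selected columns of every .csv under src_dir into out_dir."""
    src_dir, out_dir = Path(src_dir), Path(out_dir)
    written = []
    # sorted() collects the full file list before anything is written
    for src in sorted(src_dir.rglob("*.csv")):
        # Mirror the source directory layout below out_dir
        dst = out_dir / src.relative_to(src_dir)
        dst.parent.mkdir(parents=True, exist_ok=True)
        # dtype=str keeps cells as text so time values are not reformatted
        df = pd.read_csv(src, usecols=usecols, encoding=encoding, dtype=str)
        df.to_csv(dst, index=False, encoding=encoding)
        written.append(dst)
    return written
```

A call such as extract_tree('./testData', './extracted', usecols=[0, 3, 11, 42, 43]) would then leave the source files untouched. The output directory should sit outside the source directory, otherwise later runs would read the earlier results again.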
