In scientific computing, converting input data files from one format to another is a common task. Because data is often entered inconsistently, files that mix tabs (\t) with varying numbers of spaces are a frequent problem. Here is a small example. It reads a text file (txt) containing data in a format similar to the following:
date d1 d2 error ave name
2017/1/1 nan nan nan nan nan
2017/1/2 nan nan nan nan nan
2017/1/3 nan nan nan nan nan
The fields are separated by whitespace, but the number of spaces between fields is not consistent and some separators are tabs, which complicates further data analysis. Using Python's string functions and regular expressions, the data can be converted to standard CSV format. The code is as follows:
import re

# Open the input and output files together so both are closed automatically
with open('data1.txt', 'r') as f1, open('Rn.csv', 'w') as f2:
    for i in f1:
        # Strip the line first so no empty strings appear at either end,
        # then split it on runs of whitespace (spaces and tabs)
        line = re.split(r'\s+', i.strip())
        # Join the list of fields with ',' into a new string
        new_line = ','.join(line)
        print(new_line)
        # Add a newline so that each record stays on its own row
        f2.write(new_line + '\n')
The data format in the converted Rn.csv file is as follows:
date,d1,d2,error,ave,name
2017/1/1,nan,nan,nan,nan,nan
2017/1/2,nan,nan,nan,nan,nan
2017/1/3,nan,nan,nan,nan,nan
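To see why the end of each line needs care, it can help to look at what re.split returns for one raw line: the trailing newline counts as whitespace, so the split leaves an empty string at the end, which would turn into a trailing comma after joining. The following is a minimal sketch only; the helper name to_csv_line and the sample line are made up for illustration.

```python
import re

def to_csv_line(raw):
    """Hypothetical helper: turn one whitespace-separated line into a CSV line."""
    # strip() removes the trailing newline and any surrounding blanks,
    # so re.split does not produce empty strings at either end
    fields = re.split(r'\s+', raw.strip())
    return ','.join(fields)

# An illustrative line mixing a tab, several spaces, and a trailing newline
raw = "2017/1/1\tnan   nan nan \t nan nan\n"

# Without strip(), the trailing newline yields an empty last element
print(re.split(r'\s+', raw))
# With strip(), the fields join cleanly
print(to_csv_line(raw))
```

Stripping before splitting and removing the trailing comma after joining give the same result for lines like these; stripping first simply avoids creating the empty field.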
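The same conversion can also be written without regular expressions, because str.split() with no arguments splits on any run of whitespace and ignores leading and trailing whitespace:
# ConvertCSV.py
# Read every line, split it on whitespace, and rejoin the fields with commas
with open("../material/procrank.txt") as fi:
    newTxt = "".join(",".join(line.split()) + "\n" for line in fi)
print(newTxt)
# Mode "x" creates the file and raises FileExistsError if it already exists
with open("../material/procrank.csv", "x") as fo:
    fo.write(newTxt)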
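Joining fields with ',' works as long as no field contains a comma itself. If that cannot be guaranteed, one option is to let Python's standard csv module do the writing, since it quotes such fields. The following is a rough sketch under that assumption; the function name write_csv and the sample lines are hypothetical and only serve as illustration.

```python
import csv
import io

def write_csv(lines, out):
    """Hypothetical helper: write whitespace-separated lines to out as CSV."""
    # lineterminator='\n' keeps the output line endings simple for this demo
    writer = csv.writer(out, lineterminator='\n')
    for line in lines:
        fields = line.split()   # no argument: split on any run of whitespace
        if fields:              # skip blank lines
            writer.writerow(fields)

# Illustrative input: a stray comma in the header, a tab, and a blank line
sample = ["date d1 d2, error\n", "2017/1/1\tnan  nan nan\n", "\n"]
buf = io.StringIO()
write_csv(sample, buf)
print(buf.getvalue())
```

When writing to a real file instead of a StringIO buffer, the csv documentation recommends opening it with newline='' so that the module controls the line endings.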