Issue
I want to convert the lines of LogCat text files into a structured pandas DataFrame, but I cannot work out how to approach it. Here's my basic pseudo-code:
dateTime = []
processID = []
threadID = []
priority = []
application = []
tag = []
text = []
logFile = "xxxxxx.log"
for line in logFile:
    # split the string according to the basic structure
    dateTime = [0]
    processID = [1]
    threadID = [2]
    priority = [3]
    application = [4]
    tag = [5]
    text = [6]
    # append each to the empty list above
# write the lists to pandas dataframe & add column names
The problem is that I do not know how to define the delimiter for lines with this structure:
08-01 14:28:35.947 1320 1320 D wpa_xxxx: wlan1: skip--ssid
Solution
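import re
import pandas as pd

# One capture group per column; fields are separated by single spaces.
ROW_PATTERN = re.compile(r"""(\d{2}\-\d{2} \d{2}\:\d{2}\:\d{2}\.\d+) (\d+) (\d+) ([A-Z]) (\S+) (\S+) (\S+)""")

logFile = "xxxxxx.log"  # path from the question
with open(logFile) as f:
    s = pd.Series(f.readlines())

# The .str accessor is required: extract is a string method, not a Series method.
df = s.str.extract(ROW_PATTERN)
df.columns = ['dateTime', 'processID', 'threadID', 'priority', 'application', 'tag', 'text']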
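This reads each line of logFile into a row of a Series, which is then expanded into a DataFrame with one column per capture group in the regular expression. It assumes that a timestamp such as 08-01 14:28:35.947 is the first value in each row and that subsequent values are separated by single spaces. Lines that do not match the pattern produce rows of NaN, and because the last group is (\S+), only the first whitespace-free token of the message ends up in the text column.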
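If the log pads its columns with extra spaces or carries multi-word messages, a looser pattern may be easier to work with. The following is only a sketch under those assumptions: it uses \s+ between fields and (.*) for the final group so the whole message is kept. The names ROW_PATTERN_LOOSE and parse_logcat_lines are hypothetical, the first sample line is the one from the question, and the other two are made up for illustration.

```python
import pandas as pd

COLUMNS = ["dateTime", "processID", "threadID", "priority",
           "application", "tag", "text"]

# Hypothetical looser pattern: \s+ tolerates padded columns, and the
# final (.*) keeps the rest of the line, spaces included.
ROW_PATTERN_LOOSE = (
    r"(\d{2}-\d{2} \d{2}:\d{2}:\d{2}\.\d+)\s+"  # dateTime
    r"(\d+)\s+(\d+)\s+([A-Z])\s+"               # processID, threadID, priority
    r"(\S+)\s+(\S+)\s+(.*)"                     # application, tag, text
)


def parse_logcat_lines(lines):
    """Turn an iterable of raw log lines into a DataFrame (illustrative helper)."""
    # Strip line endings so a trailing "\r" does not leak into the text column.
    s = pd.Series(list(lines), dtype="object").str.rstrip("\r\n")
    df = s.str.extract(ROW_PATTERN_LOOSE)
    df.columns = COLUMNS
    # Rows that did not match the pattern are all-NaN; drop them.
    return df.dropna(how="all").reset_index(drop=True)


sample = [
    "08-01 14:28:35.947 1320 1320 D wpa_xxxx: wlan1: skip--ssid\n",
    # Made-up line with padded columns and a multi-word message.
    "08-01 14:28:36.101  1320  1320 I wpa_xxxx: wlan1: made-up message with spaces\n",
    # Made-up separator line that should not match.
    "--------- beginning of main\n",
]
df = parse_logcat_lines(sample)
print(df)
```

To parse a real file, the same helper could be called as parse_logcat_lines(open(logFile)). All columns come back as strings, so numeric or datetime conversion would be a separate step.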
Answered By - Steele Farnsworth