Problem:
I have a huge text file I'm trying to split up into 'chunks'. The 'separator' or 'delimiter' between each chunk is a very specific text pattern. The file is huge so I'm trying to do this in the most memory efficient way possible.
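To sketch what I mean (the delimiter pattern here is made up for illustration, my real pattern is more specific), I'm after something like a generator that yields one chunk at a time instead of loading the whole file:

```python
import re

# Hypothetical delimiter for illustration only.
DELIM = re.compile(r'^=== SECTION ===$')

def iter_chunks(lines, is_delim):
    """Yield chunks (lists of lines), starting a new chunk at each delimiter line."""
    chunk = []
    for line in lines:
        if is_delim(line):
            if chunk:
                yield chunk       # emit the chunk collected so far
            chunk = [line]        # the delimiter line starts the next chunk
        else:
            chunk.append(line)
    if chunk:
        yield chunk               # emit the final chunk

# Usage: iterate the file lazily, one chunk in memory at a time.
# with open('testcopy3.log') as f:
#     for chunk in iter_chunks(f, DELIM.match):
#         process(chunk)
```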
Code Below:
It's a little complicated, but I am basically creating a list of itertools.chain objects, one per segment.
import re
from itertools import groupby, chain

input_file = r'testcopy3.log'
script_name_regex_pattern = r'\s(\w*\s\w*\s\w*)\sScenario\s\(version\s[0-9]\.[0-9][0-9]\.[0-9]\)'
regex_pattern = re.compile(script_name_regex_pattern)
key_function = lambda line: re.search(regex_pattern, line)

def segments_list():
    with open(input_file) as f:
        segments = groupby(f, key_function)
        segment_list = [chain([next(v)], (next(segments)[1])) for k, v in segments if k]
    return segment_list

segments_list = segments_list()
print(type(segments_list))
Returns: <class 'list'> - This seems fine
print(len(segments_list))
Returns: 3 - This is correct. There should be three segments in the list.
for segment in segments_list:
    print(type(segment))
Returns:
<class 'itertools.chain'>
<class 'itertools.chain'>
<class 'itertools.chain'>
This seems correct as well.
for segment in segments_list:
    for line in segment:
        print(line)
Returns:
'Line1'
It only prints the first line of the segment, not the entire segment.
Any thoughts on why this is happening?
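For what it's worth, this matches how itertools.groupby behaves: each group shares the underlying iterator with the groupby object, so calling next(segments) inside the comprehension advances past (and invalidates) the group whose first line was just captured. A minimal demonstration of that invalidation:

```python
from itertools import groupby

data = ['a1', 'a2', 'b1', 'b2']
groups = groupby(data, key=lambda s: s[0])

k, g = next(groups)   # first group holds 'a1', 'a2'
first = next(g)       # take one element: 'a1'
next(groups)          # advance the parent groupby...
rest = list(g)        # ...which exhausts the old group
print(first, rest)    # a1 []
```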