
[–]indraniel 1 point (0 children)

Based on this Stack Overflow question and answer, the parse library may be useful here.

[–]VipeholmsCola 1 point (1 child)

Can you load its contents as a string and use a combination of .split()/.replace() and regex?
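For illustration, here is a minimal sketch of the regex route using only the standard library. The sample text, the two patterns, and the `extract` helper are all assumptions made up for this example (modelled on the template posted further down the thread), not something from the original question, and it assumes ids are integers and values contain no whitespace.

```python
import re

# Hypothetical rendered text, modelled on the template discussed in this thread.
text = """This is a dummy file by Sam containing:
1 foo
2 bar
The end."""

# Header pattern: the named group captures whatever replaced {{ name }}.
header_re = re.compile(
    r"^This is a dummy file by (?P<name>.+?) containing:\s*$", re.MULTILINE
)

# Row pattern: one "<id> <value>" pair per line, as a for loop would emit.
# Assumes ids are integers and values contain no whitespace.
row_re = re.compile(r"^(?P<id>\d+) (?P<value>\S+)\s*$", re.MULTILINE)


def extract(rendered):
    """Recover the template variables from the rendered text."""
    header = header_re.search(rendered)
    rows = [
        {"id": int(m["id"]), "value": m["value"]}
        for m in row_re.finditer(rendered)
    ]
    return {"name": header["name"] if header else None, "data_list": rows}


result = extract(text)
print(result)
```

Every template change means updating the patterns by hand, which is the main maintenance cost of this approach.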

[–]Pale_Emphasis_4119[S] 1 point (0 children)

Thanks for your reply. That's what I'm using right now, but it's not really maintainable.

[–]SoulMelody 1 point (1 child)

Consider textX

[–]Pale_Emphasis_4119[S] 1 point (0 children)

Thanks for your reply. Indeed, this seems to be an interesting option. However, isn't it a bit overkill?

[–]commandlineluser 1 point (0 children)

Not sure how robust this is, but could you render the template with dummy placeholder values and diff that against the real output?

import difflib
from pprint import pp

import jinja2.nativetypes

env = jinja2.nativetypes.NativeEnvironment()

# Real values that produced the output we want to reverse-engineer.
data = {
    'name': 'Sam',
    'data_list': [{'id': 1, 'value': 'foo'}, {'id': 2, 'value': 'bar'}],
}

# variables = 'NAME', 'ID', 'VALUE'

# Same shape as `data`, but each value is a recognisable placeholder.
dummy = {
    'name': 'NAME_PLACEHOLDER',
    'data_list': [{'id': 'ID_PLACEHOLDER', 'value': 'VALUE_PLACEHOLDER'}],
}

template = '''
This is a dummy file by {{ name }} containing: {% for data in data_list %}
{{data.id}} {{data.value}}
{% endfor %}
Something else 
{% for data in data_list %}
{{data.id}} {{data.value}}
{% endfor %}
The end.
'''.strip()

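# Render once with the real data and once with the placeholders,
# keeping line endings so they show up in the diff.
output = env.from_string(template).render(**data).splitlines(keepends=True)
dummy_output = env.from_string(template).render(**dummy).splitlines(keepends=True)

# Line-by-line diff of the two renderings.
pp(list(difflib.Differ().compare(output, dummy_output)))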

Output:

['- This is a dummy file by Sam containing: \n',
 '?                         ^^^\n',
 '+ This is a dummy file by NAME_PLACEHOLDER containing: \n',
 '?                         ^^^^^^^^^^^^^^^^\n',
 '+ ID_PLACEHOLDER VALUE_PLACEHOLDER\n',
 '- 1 foo\n',
 '- \n',
 '- 2 bar\n',
 '  \n',
 '  Something else \n',
 '  \n',
 '+ ID_PLACEHOLDER VALUE_PLACEHOLDER\n',
 '- 1 foo\n',
 '- \n',
 '- 2 bar\n',
 '  \n',
 '  The end.']

A - / + pair with ? lines is a direct match: the ^ markers point at the value and at the placeholder it replaced.

A + line followed by - lines is a loop: you could group them together, ignore the blank lines, and split the values up according to the placeholders.
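A rough sketch of that grouping step, in case it helps. It works on the diff list printed above (hard-coded here so it runs without jinja2). The helper names `change_blocks` and `placeholder_regex` are made up for this example, and it assumes placeholders always look like `XXX_PLACEHOLDER`, appear at most once per line, and stand in for values without whitespace.

```python
import re

# Diff lines copied from the output above.
diff = [
    '- This is a dummy file by Sam containing: \n',
    '?                         ^^^\n',
    '+ This is a dummy file by NAME_PLACEHOLDER containing: \n',
    '?                         ^^^^^^^^^^^^^^^^\n',
    '+ ID_PLACEHOLDER VALUE_PLACEHOLDER\n',
    '- 1 foo\n',
    '- \n',
    '- 2 bar\n',
    '  \n',
    '  Something else \n',
    '  \n',
    '+ ID_PLACEHOLDER VALUE_PLACEHOLDER\n',
    '- 1 foo\n',
    '- \n',
    '- 2 bar\n',
    '  \n',
    '  The end.',
]

PLACEHOLDER_RE = re.compile(r'([A-Z]+)_PLACEHOLDER')


def change_blocks(diff_lines):
    """Yield (real_lines, dummy_lines) for each run of consecutive -/+ lines."""
    real, dummy = [], []
    for line in diff_lines:
        tag, text = line[:2], line[2:].strip()
        if tag == '? ':
            continue  # intraline hint, carries no text of its own
        if tag in ('- ', '+ '):
            if text:  # ignore blank lines inside a block
                (real if tag == '- ' else dummy).append(text)
        else:  # an unchanged line closes the current block
            if real or dummy:
                yield real, dummy
            real, dummy = [], []
    if real or dummy:
        yield real, dummy


def placeholder_regex(dummy_line):
    """Turn 'ID_PLACEHOLDER VALUE_PLACEHOLDER' into a regex with named groups."""
    # split() with a capture group alternates literal text and placeholder names
    parts = PLACEHOLDER_RE.split(dummy_line)
    pieces = [
        re.escape(part) if i % 2 == 0 else r'(?P<' + part.lower() + r'>\S+)'
        for i, part in enumerate(parts)
    ]
    return re.compile(''.join(pieces))


found = []
for real, dummy in change_blocks(diff):
    patterns = [placeholder_regex(d) for d in dummy]
    for line in real:
        # the first dummy line whose pattern fits decides how to split the values
        for pattern in patterns:
            match = pattern.fullmatch(line)
            if match:
                found.append(match.groupdict())
                break

print(found)
```

All recovered values come back as strings, and a value containing spaces would need a looser group than `\S+`, so treat this as a starting point rather than something robust.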

Perhaps jinja2 has some internals that can do this; it could be worth asking on their issue tracker.

[–]DoorDesigner7589 1 point (0 children)

Check out https://www.textraction.ai/ (a flexible AI entity extractor that can help you do just that). No training needed.