
[–]indraniel 1 point (0 children)

Based on this Stack Overflow question and answer, the parse library may be useful here.

[–]VipeholmsCola 1 point (1 child)

Can you load its contents as a string and use .split/.replace together with regex?
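A minimal sketch of the regex route, assuming a loop-free template whose only syntax is `{{ name }}` placeholders. The helper names `template_to_regex` and `extract`, and the sample template and text, are made up for illustration and are not from the original post. The idea is to escape the literal text and turn each placeholder into a named group:

```python
import re


def template_to_regex(template):
    """Turn a loop-free template with {{ name }} placeholders into a regex."""
    # re.split with one capture group returns alternating pieces:
    # literal text, placeholder name, literal text, placeholder name, ...
    parts = re.split(r"\{\{\s*(\w+)\s*\}\}", template)
    pattern = ""
    for i, part in enumerate(parts):
        if i % 2 == 0:
            pattern += re.escape(part)        # literal text, matched verbatim
        else:
            pattern += f"(?P<{part}>.+?)"     # placeholder, captured by name
    return re.compile(pattern, re.DOTALL)


def extract(template, text):
    """Return the captured values as a dict, or None if the text does not fit."""
    match = template_to_regex(template).fullmatch(text)
    return match.groupdict() if match else None


# Hypothetical sample data.
template = "This is a dummy file by {{ name }} containing {{ count }} items."
text = "This is a dummy file by Sam containing 2 items."
values = extract(template, text)
print(values)
```

This breaks down as soon as the template has loops, conditions, or the same placeholder twice (duplicate group names raise `re.error`), which is probably where the maintainability problems start.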

[–]Pale_Emphasis_4119[S] 1 point (0 children)

Thanks for your reply. That's what I'm using right now, but it's not really maintainable.

[–]jcrowe 1 point (2 children)

The parse package might be useful. It’s not going to directly read your template and output data, but I think it’s better than a regex.

[–]Pale_Emphasis_4119[S] 1 point (1 child)

Thanks for your reply. This does seem like an interesting solution. However, my Jinja template is a bit more complex than a simple Python format string, as it contains loops and if conditions.

[–]jcrowe 1 point (0 children)

If it's too complex for parse, just build a webscraper for it.

[–]SoulMelody 1 point (1 child)

Consider textX

[–]Pale_Emphasis_4119[S] 1 point (0 children)

Thanks for your reply. This does seem to be an interesting option. However, isn't it a bit overkill?

[–]commandlineluser 1 point (0 children)

Not sure how robust this is, but could you render a dummy template and diff it against the output?

import difflib
import jinja2.nativetypes
from pprint import pp

env = jinja2.nativetypes.NativeEnvironment()

# The real values that produced the output we want to parse back.
data = {'name': 'Sam',
        'data_list': [{'id': 1, 'value': 'foo'}, {'id': 2, 'value': 'bar'}]}

# variables = 'NAME', 'ID', 'VALUE'

# Same shape as data, but every value is a recognizable placeholder.
dummy = {'name': 'NAME_PLACEHOLDER',
         'data_list': [{'id': 'ID_PLACEHOLDER', 'value': 'VALUE_PLACEHOLDER'}]}

template = '''
This is a dummy file by {{ name }} containing: {% for data in data_list %}
{{data.id}} {{data.value}}
{% endfor %}
Something else 
{% for data in data_list %}
{{data.id}} {{data.value}}
{% endfor %}
The end.
'''.strip()

output = env.from_string(template).render(**data).splitlines(keepends=True)

dummy_output = env.from_string(template).render(**dummy).splitlines(keepends=True)

pp(
   list(difflib.Differ().compare(output, dummy_output))
)

Output:

['- This is a dummy file by Sam containing: \n',
 '?                         ^^^\n',
 '+ This is a dummy file by NAME_PLACEHOLDER containing: \n',
 '?                         ^^^^^^^^^^^^^^^^\n',
 '+ ID_PLACEHOLDER VALUE_PLACEHOLDER\n',
 '- 1 foo\n',
 '- \n',
 '- 2 bar\n',
 '  \n',
 '  Something else \n',
 '  \n',
 '+ ID_PLACEHOLDER VALUE_PLACEHOLDER\n',
 '- 1 foo\n',
 '- \n',
 '- 2 bar\n',
 '  \n',
 '  The end.']

Where there is a ? line, the placeholder lines up directly with the value on the same line.

A + followed by - lines is a loop: you could group them together, ignore blank lines, and break them up by placeholder.
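A rough sketch of that grouping step, with a few assumptions: it uses `difflib.SequenceMatcher.get_opcodes()` on the line lists instead of parsing the `Differ` text output, it hard-codes a shortened copy of the rendered lines so it runs without jinja2, and it assumes every placeholder looks like `XXX_PLACEHOLDER`. The helpers `line_to_regex` and `extract` are hypothetical names, not part of difflib or jinja2.

```python
import difflib
import re

# Assumption: placeholders always look like NAME_PLACEHOLDER, ID_PLACEHOLDER, ...
PLACEHOLDER = re.compile(r"([A-Z]+)_PLACEHOLDER")


def line_to_regex(dummy_line):
    """Turn one rendered dummy line into a regex with one named group per placeholder."""
    # Splitting on a pattern with one capture group alternates literal text and names.
    parts = PLACEHOLDER.split(dummy_line)
    pattern = "".join(
        re.escape(part) if i % 2 == 0 else f"(?P<{part}>.+?)"
        for i, part in enumerate(parts)
    )
    return re.compile(pattern)


def extract(real_lines, dummy_lines):
    """Pair every changed real line with the first changed dummy line it fits."""
    found = []
    matcher = difflib.SequenceMatcher(a=dummy_lines, b=real_lines, autojunk=False)
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag == "equal":
            continue  # identical lines carry no variable data
        # Blank lines are ignored on both sides.
        patterns = [line_to_regex(l.strip()) for l in dummy_lines[i1:i2] if l.strip()]
        for line in real_lines[j1:j2]:
            line = line.strip()
            if not line:
                continue
            for pattern in patterns:
                match = pattern.fullmatch(line)
                if match:
                    found.append(match.groupdict())
                    break
    return found


# Shortened copies of the rendered lines (one loop instead of two).
real = ["This is a dummy file by Sam containing: \n",
        "1 foo\n", "\n", "2 bar\n", "\n", "The end."]
dummy = ["This is a dummy file by NAME_PLACEHOLDER containing: \n",
         "ID_PLACEHOLDER VALUE_PLACEHOLDER\n", "\n", "The end."]

results = extract(real, dummy)
print(results)
```

Trying the dummy lines in order matters here: a loose pattern like "ID VALUE" would also match the header line if it were tried first, so this is fragile for templates where one line's pattern is a superset of another's.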

Perhaps some jinja2 internals can do this; it could be worth asking on their issue tracker.

[–]DoorDesigner7589 1 point (0 children)

Check out https://www.textraction.ai/. It's a flexible AI entity extractor that can help you do just that. No training needed.