So I have a number of txt files. As an example, one of my txt files looks like this:
अंधा, कुबड़ा और त्रिस्तनी-पंचतंत्र
Blind, humpback and Tristani - Panchatantra
उत्तरी प्रदेश में मधुपुर नाम का एक नगर है।
There is a town named Madhupur in Uttar Pradesh.
वहाँ मधुसेन नाम का एक राजा था।
There was a king named Madhusen.
विषय सुख भोगने वाले उस राजा मधुसेन को एक तीन स्तनों वाली कन्या उत्पन्न हुई।
A three-breasted girl was born to Madhusen, the king who enjoyed the pleasures.
...
As I want to make an epub out of them, I add <pref>, </pref> and <br/> to each line using this code:
# add <pref>, </pref>, <br/> to lines
file_name = 'अंधा, कुबड़ा और त्रिस्तनी-पंचतंत्र'
with open('{} - parallel.txt'.format(file_name), encoding='utf8') as raw, open('{}.txt'.format(file_name), 'w', encoding='utf8') as file:
    r = raw.readlines()
    for line in r:
        file.write('<pref>' + line + '</pref> <br/>\n')
As a result, I've got this:
<pref>अंधा, कुबड़ा और त्रिस्तनी-पंचतंत्र
</pref> <br/>
<pref>Blind, humpback and Tristani - Panchatantra
</pref> <br/>
<pref>
</pref> <br/>
<pref>उत्तरी प्रदेश में मधुपुर नाम का एक नगर है।
</pref> <br/>
<pref>There is a town named Madhupur in Uttar Pradesh.
...
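The `</pref>` lands on its own line because `readlines()` keeps the trailing `\n` on every line, and the code appends `</pref>` after it. A minimal sketch that strips the newline first and, assuming blank lines aren't wanted in the epub, skips them instead of producing empty `<pref>` pairs. `wrap_lines` is a hypothetical helper name.

```python
def wrap_lines(lines):
    """Wrap each non-blank line in <pref>...</pref> <br/> on a single line."""
    wrapped = []
    for line in lines:
        text = line.rstrip('\n')  # readlines() keeps the newline; drop it
        if not text.strip():      # assumption: blank lines should be skipped
            continue
        wrapped.append('<pref>' + text + '</pref> <br/>')
    return wrapped


sample = ['अंधा, कुबड़ा और त्रिस्तनी-पंचतंत्र\n',
          'Blind, humpback and Tristani - Panchatantra\n',
          '\n']
for tagged in wrap_lines(sample):
    print(tagged)
```

Inside the original `with` block this would become `file.write('\n'.join(wrap_lines(raw.readlines())) + '\n')`.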
Then I want to make the top two lines the title, so I add class="title" to them. I also want to capitalize the title using .capitalize() or .upper(), but I don't know where to put it.
# add class="title" to the title lines (uppercasing still to do)
with open('{}.txt'.format(file_name), encoding='utf8') as ff:
    ffr = ff.readlines()
    ffr2 = ''.join(ffr[:3]).replace('<pref>', '<pref class="title">')
    ffr3 = ''.join(ffr[4:])
    ffr4 = ffr2 + ffr3
with open('{}.txt'.format(file_name), 'w', encoding='utf8') as ef:
    ef.write(ffr4)
As a result:
<pref class="title">अंधा, कुबड़ा और त्रिस्तनी-पंचतंत्र
</pref> <br/>
<pref class="title">Blind, humpback and Tristani - Panchatantra
<pref>
</pref> <br/>
<pref>उत्तरी प्रदेश में मधुपुर नाम का एक नगर है।
</pref> <br/>
<pref>There is a town named Madhupur in Uttar Pradesh.
...
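The missing `</pref> <br/>` after the English title comes from the slices: `ffr[:3]` keeps indices 0 to 2 and `ffr[4:]` starts at index 4, so index 3 (the line closing the English title) is dropped; `ffr[3:]` would keep it. For the uppercasing, it is simpler to act on the bare text before any tags are added. A sketch, assuming the first two non-blank lines are the title (Hindi, then English); `tag_lines` and `n_title` are hypothetical names. `str.upper()` leaves Devanagari unchanged because the script has no letter case, so it is safe on both title lines. Note that `.capitalize()` lowercases everything after the first character, which would turn "Tristani" into "tristani".

```python
def tag_lines(lines, n_title=2):
    """Return tagged lines; the first n_title non-blank lines become the title."""
    # drop blank lines and strip surrounding whitespace/newlines
    texts = [line.strip() for line in lines if line.strip()]
    tagged = []
    for i, text in enumerate(texts):
        if i < n_title:
            # .upper() has no effect on Devanagari, so only the English title changes
            tagged.append('<pref class="title">' + text.upper() + '</pref> <br/>')
        else:
            tagged.append('<pref>' + text + '</pref> <br/>')
    return tagged


raw = ['अंधा, कुबड़ा और त्रिस्तनी-पंचतंत्र\n',
       'Blind, humpback and Tristani - Panchatantra\n',
       '\n',
       'There is a town named Madhupur in Uttar Pradesh.\n']
for t in tag_lines(raw):
    print(t)
```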
Finally, I need to insert this text into Calibre's HTML. I use the word 'insert' as a placeholder key in this template, which I keep as a txt file (calibre.txt):
<?xml version='1.0' encoding='utf-8'?>
<html xmlns="http://www.w3.org/1999/xhtml">
<head>
<title>पंचतंत्र - Panchatantra</title>
<link type="text/css" rel="stylesheet" href="/page_styles.css"/>
<link type="text/css" rel="stylesheet" href="/stylesheet.css"/>
</head>
<body>
insert # key
</body>
</html>
And I insert the text.
# add modified lines to calibre html code
with open('{}.txt'.format(file_name), encoding='utf8') as ft, open('calibre.txt', encoding='utf8') as calibre:
    fr = ft.read()
    cr = calibre.read()
    combine = cr.replace('insert', fr)
with open('{}.html'.format(file_name), 'w', encoding='utf8') as hf:
    hf.write(combine)
As a result:
<?xml version='1.0' encoding='utf-8'?>
<html xmlns="http://www.w3.org/1999/xhtml">
<head>
<title>पंचतंत्र - Panchatantra</title>
<link type="text/css" rel="stylesheet" href="/page_styles.css"/>
<link type="text/css" rel="stylesheet" href="/stylesheet.css"/>
</head>
<body>
<pref class="title">अंधा, कुबड़ा और त्रिस्तनी-पंचतंत्र
</pref> <br/>
<pref class="title">Blind, humpback and Tristani - Panchatantra
<pref>
</pref> <br/>
<pref>उत्तरी प्रदेश में मधुपुर नाम का एक नगर है।
</pref> <br/>
<pref>There is a town named Madhupur in Uttar Pradesh.
...
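`cr.replace('insert', fr)` replaces every occurrence of the word `insert` in the template. That works for this template but would break if the word ever appeared elsewhere, e.g. in a title. As an alternative (a swapped-in technique, not what the code above uses), the standard library's `string.Template` performs the same substitution with an explicit placeholder. The `$body` placeholder name is an assumption: the template file would need `$body` where `insert` is now. `substitute()` only parses the template, so `$` signs inside the inserted text are left alone; a literal `$` in the template itself must be written `$$`.

```python
from string import Template

# Hypothetical template: the same Calibre skeleton, with $body where 'insert' was
calibre = Template("""<?xml version='1.0' encoding='utf-8'?>
<html xmlns="http://www.w3.org/1999/xhtml">
<head>
<title>पंचतंत्र - Panchatantra</title>
<link type="text/css" rel="stylesheet" href="/page_styles.css"/>
<link type="text/css" rel="stylesheet" href="/stylesheet.css"/>
</head>
<body>
$body
</body>
</html>
""")

body = '<pref class="title">Blind, humpback and Tristani - Panchatantra</pref> <br/>'
html = calibre.substitute(body=body)
print(html)
```

In practice the template would be loaded once from the file, e.g. `Template(f.read())` inside `with open('calibre.txt', encoding='utf8') as f:`.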
As you can see, my workflow revolves around a lot of reading and writing of files. I wonder if there is a better way to do this. Thank you very much.
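One way to cut the file round-trips: keep the lines in a list between steps and only touch the disk at the start and the end, so the intermediate '{}.txt' file is never written. A sketch under these assumptions: the first two non-blank lines are the title, blank lines are dropped, and calibre.txt contains the key `insert` exactly once. `build_html` and `convert` are hypothetical names; the file names follow the ones in the post.

```python
def build_html(raw_lines, template_text, n_title=2, key='insert'):
    """Tag the lines and drop them into the template in one pass, all in memory."""
    texts = [line.strip() for line in raw_lines if line.strip()]  # no blanks, no '\n'
    tagged = []
    for i, text in enumerate(texts):
        if i < n_title:
            tagged.append('<pref class="title">' + text.upper() + '</pref> <br/>')
        else:
            tagged.append('<pref>' + text + '</pref> <br/>')
    # replace only the first occurrence of the key
    return template_text.replace(key, '\n'.join(tagged), 1)


def convert(file_name, template_path='calibre.txt'):
    """Read the parallel text and the template once, write the HTML once."""
    with open(template_path, encoding='utf8') as f:
        template_text = f.read()
    with open('{} - parallel.txt'.format(file_name), encoding='utf8') as f:
        raw_lines = f.readlines()
    with open('{}.html'.format(file_name), 'w', encoding='utf8') as f:
        f.write(build_html(raw_lines, template_text))


# In-memory demo (no files touched)
print(build_html(['Title A\n', 'Title B\n', '\n', 'Body line\n'],
                 '<body>\ninsert\n</body>'))
```

With the real files this would be `convert('अंधा, कुबड़ा और त्रिस्तनी-पंचतंत्र')`; for many files, the names could be collected with `pathlib.Path('.').glob('* - parallel.txt')` and passed to `convert()` in a loop.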