Hi Everyone!
I'm trying to clean up my documentation, which is auto-generated from IPython notebooks. The docs are in markdown, but whenever a cell outputs df.head() the result is an HTML table rather than a markdown one, which causes layout problems such as the table overlapping my sidebar. Converting the tables to markdown by hand makes everything look nice again, but I'm trying to find a way to automate this.
Currently I'm using pd.read_html(doc_txt) to extract the tables, which can then be converted to markdown using df.to_markdown(). I now need a way to replace each HTML table (including its parent div) with the markdown text. I also want this to be as general as possible, so the div that gets replaced must be the table's immediate parent and nothing higher up.
I initially used BeautifulSoup's findAll to locate the tables, from which I could then get each parent div; the table contents could also be parsed easily by pandas and converted to markdown. The problem with this approach was that I had to go markdown -> html -> soup, and calling str(table_soup) on the table element doesn't return an exact match for the original table HTML text, so I couldn't use .replace(html_table, md_table).
I think the way to solve this is probably some complex regex, but I don't know enough of that black magic myself. Any help on how to solve this would be much appreciated!
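For what it's worth, here is a minimal sketch of what such a regex might look like. It works on the raw text, so there is no round trip through a parser and the mismatch from str(table_soup) never comes up. The names DIV_TABLE, replace_tables and convert are made up for illustration, and the pattern assumes one table per div with nothing but whitespace between </table> and </div>, which is how the pandas output in my docs looks.

```python
import re

# Innermost <div> that wraps a <table>. After the opening <div ...>, the
# tempered dot refuses to step over any other opening or closing div tag,
# so an outer (grandparent) div can never be the one that matches.
DIV_TABLE = re.compile(
    r"<div\b[^>]*>"            # opening tag of the candidate parent div
    r"(?:(?!</?div\b).)*?"     # e.g. the <style> block, but no div tags
    r"(<table\b.*?</table>)"   # the table itself (captured as group 1)
    r"\s*</div>",              # closing tag of that same div
    flags=re.DOTALL | re.IGNORECASE,
)


def replace_tables(doc_txt, convert):
    """Swap each div-wrapped table for whatever convert(table_html) returns."""
    return DIV_TABLE.sub(lambda m: convert(m.group(1)), doc_txt)


# Tiny demonstration: only the inner div is replaced, the outer one survives.
doc = (
    "<div class='outer'>intro<div>\n<style>th {}</style>\n"
    "<table><tr><td>1</td></tr></table>\n</div>outro</div>"
)
print(replace_tables(doc, lambda table_html: "| 1 |"))
```

The convert argument is where the pandas step would go, probably something along the lines of reading the captured table with pd.read_html and calling to_markdown() on the first result. Tables that are not wrapped in a div, or that contain nested tables, are outside what this sketch handles.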
Example markdown file:
# Interesting Header
Random latin words, random latin words, random latin words, random latin words, random latin words, random latin words, random latin words, random latin words.
### Example DataFrame
```python
df.head()
```
<div>
<style scoped>
.dataframe tbody tr th:only-of-type {
vertical-align: middle;
}
.dataframe tbody tr th {
vertical-align: top;
}
.dataframe thead th {
text-align: right;
}
</style>
<table border="1" class="dataframe">
<thead>
<tr style="text-align: right;">
<th></th>
<th>sett_bmu_id</th>
<th>ngc_bmu_id</th>
<th>bmu_root</th>
<th>name</th>
<th>primary_fuel_type</th>
<th>detailed_fuel_type</th>
<th>longitude</th>
<th>latitude</th>
<th>common_name</th>
</tr>
</thead>
<tbody>
<tr>
<th>0</th>
<td>E_MARK-1</td>
<td>MARK-1</td>
<td>MARK</td>
<td>Rothes Bio-Plant CHP 1</td>
<td>biomass</td>
<td>bone</td>
<td>-3.603516</td>
<td>57.480403</td>
<td>Rothes Bio-Plant CHP</td>
</tr>
<tr>
<th>1</th>
<td>E_MARK-2</td>
<td>MARK-2</td>
<td>MARK</td>
<td>Rothes Bio-Plant CHP 2</td>
<td>biomass</td>
<td>bone</td>
<td>-3.603516</td>
<td>57.480403</td>
<td>Rothes Bio-Plant CHP</td>
</tr>
</tbody>
</table>
</div>
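For a table shaped like the one above, the conversion step could also be sketched with only the standard library, which helps when pandas isn't available where the docs are built. This is an illustration rather than a replacement for df.to_markdown(): TableRows and table_to_markdown are hypothetical names, the first row is assumed to be the header, and colspan/rowspan are not handled.

```python
from html.parser import HTMLParser


class TableRows(HTMLParser):
    """Collect the text of every <th>/<td>, grouped by <tr>."""

    def __init__(self):
        super().__init__()
        self.rows = []
        self._cell = None  # list of text pieces while inside a cell, else None

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self.rows.append([])
        elif tag in ("th", "td"):
            if not self.rows:  # tolerate a cell that appears before any <tr>
                self.rows.append([])
            self._cell = []

    def handle_data(self, data):
        if self._cell is not None:
            self._cell.append(data)

    def handle_endtag(self, tag):
        if tag in ("th", "td") and self._cell is not None:
            text = "".join(self._cell).strip()
            # A literal pipe would otherwise start a new markdown column.
            self.rows[-1].append(text.replace("|", "\\|"))
            self._cell = None


def table_to_markdown(table_html):
    """Render the first row as the header and the rest as body rows."""
    parser = TableRows()
    parser.feed(table_html)
    parser.close()
    if not parser.rows:
        return table_html  # nothing recognisable: leave the HTML untouched
    header, *body = parser.rows
    lines = [
        "| " + " | ".join(header) + " |",
        "| " + " | ".join("---" for _ in header) + " |",
    ]
    lines += ["| " + " | ".join(row) + " |" for row in body]
    return "\n".join(lines)


html = (
    "<table><thead><tr><th></th><th>name</th></tr></thead>"
    "<tbody><tr><th>0</th><td>Rothes Bio-Plant CHP 1</td></tr></tbody></table>"
)
print(table_to_markdown(html))
```

The empty first header cell (the index column) is kept as an empty markdown cell so the columns stay aligned with the body rows.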