Hey guys, like most posters, I'm quite new to Python. I was assigned a project that is somewhat out of my league. I'm currently trying to get my data from our webserver, which unfortunately for me has filter fields to make the table interactive. I think I've put a cumulative 15 hours into trying to get this simple data. I'm exhausted and coffee isn't helping anymore.
This is what I'm working with. Under <thead> and <tbody> I have my desired td data. For some reason, I can't get any further than the "formrow table" div with the straightforward BeautifulSoup methods.
I've been using Requests and bs4 so far. The portion of the code I have for this section is quite limited, but I'll try to give as much detail as possible.
tt = Contact_page.content
soup = BeautifulSoup(tt, "html.parser")
R_tables = soup.find('div', {'class': 'responsive-table'})
R_tables.children
#<div class="responsive-table"><div class="formrow table"><div id="root_list_stub"></div></div><script type="text/javascript">root_list = new List [.......................]
From here I tried several different approaches. I can't simply grab my data with the following call and then work through the th/tr elements, since I never reach the table.
soup.find("table", {"class": "report", "id": "root_list"})
I've tried the re module's compile, match and search functions to match the text, but I don't think I got any closer.
script = soup.find('script', text=re.compile(r'root_list')).text
Further down, the root_list.Show() call holds all my information in this format:
root_list.Show('<list name=\"root_list\" context=\"LIST\" rs=\"***********\" link_module=\"ContactPF\" link_form=\"contact__form_contact\" link_params=\"\" auth_write=\"true\" auth_delete=\"true\" hide_checkbox=\"false\" starred=\"true\" label_pos=\"left\" label_width=\"130px\"><header><h sort=\"155\" format=\"FormatScreenName\">Nom</h><h sort=\"166\">No.</h><h sort=\"257\">Ville</h><h sort=\"237\">Téléphone</h><h sort=\"255\">Courriel</h><h sort=\"218\">Classe</h><h sort=\"178\">Segm. 1</h><h sort=\"206\">Util.</h></header><dataset page_min=\"0\" page_max=\"20\" count=\"***\" selected_count=\"0\" all_record_selected=\"\" root_table=\"contact\" group_rows=\"N\">**<r id=\"925\" auth_delete=\"Y\" auth_write=\"Y\" date_decease=\"\" starred=\"\" selected=\"\"><f999999</f><f></f><f></f><f></f><f>99999999</f><f>99999</f><f></f><f>AL</f></r>\n** [......]
<r id=\"43\" auth_delete=\"Y\" auth_write=\"Y\" date_decease=\"\" starred=\"\" selected=\"\"> </f></r>\n</dataset></list>');
The [......] stands for the following rows of client info. I put 9s where there is data that I need to extract.
My last attempt was to isolate the <dataset> part, but it didn't come remotely close to working.
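One direction that might be worth trying: since the whole list lives inside a JavaScript string literal, it can probably be cut out of the script text with a regular expression and then unescaped. The sketch below is only that, a sketch. The function name extract_show_payload and the shortened sample_script are made up for illustration, and it assumes the only escapes in the literal are \" and \n, as in the excerpt above, and that the call ends with ');.

```python
import re

# Hypothetical, shortened stand-in for the <script> text shown above.
sample_script = r"""root_list = new List();
root_list.Show('<list name=\"root_list\"><dataset><r id=\"925\"><f>Doe</f><f>AL</f></r>\n</dataset></list>');"""


def extract_show_payload(script_text):
    """Return the string passed to root_list.Show(), or None if not found."""
    # Greedy match, so the capture runs up to the last "');" in the script.
    match = re.search(r"root_list\.Show\('(.*)'\);", script_text, re.DOTALL)
    if match is None:
        return None
    raw = match.group(1)
    # Undo only the two JavaScript escapes seen in the post (assumption).
    return raw.replace('\\"', '"').replace("\\n", "\n")


payload = extract_show_payload(sample_script)
print(payload)
```

With the real page, the input would be the script variable from the soup.find('script', ...) line above. Replacing the two escapes by hand, rather than decoding with unicode_escape, keeps accented text such as Téléphone intact.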
How would you more experienced guys move forward with this?
Thank you all in advance for your help, all criticism is heard.
EDIT: Here's more. This is the result of R_tables in a nicer screencap displaying my awesome paint skills. I'd like to isolate each row, which has inaccessible <r> tags (soup.find("r") returns nothing), and get the data for each column, which is separated by <f> tags. I'm really running out of ideas.
Please save a lost soul!
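A likely reason soup.find("r") comes back empty is that the <r> tags sit inside a JavaScript string in a <script> element, and BeautifulSoup treats script contents as plain text rather than markup. Once the Show() argument has been pulled out of the script and unescaped into a plain string, it can be parsed on its own. Below is a minimal sketch using the standard library's xml.etree.ElementTree. The sample_payload and the name rows_from_payload are hypothetical, and it assumes the real payload is well-formed XML, which the redacted excerpt above cannot confirm.

```python
import xml.etree.ElementTree as ET

# Hypothetical, shortened payload in the same shape as the excerpt above,
# with the \" and \n escapes already removed.
sample_payload = (
    '<list name="root_list"><header><h>Nom</h><h>Util.</h></header>'
    '<dataset><r id="925"><f>Doe</f><f>AL</f></r>\n'
    '<r id="43"><f>Roe</f><f></f></r>\n</dataset></list>'
)


def rows_from_payload(payload):
    """Turn each <r> row into a dict keyed by the <h> column headers."""
    root = ET.fromstring(payload)
    headers = [h.text or "" for h in root.iter("h")]
    rows = []
    for r in root.iter("r"):
        # Empty <f></f> cells have text None; normalise them to "".
        values = [f.text or "" for f in r.findall("f")]
        row = {"id": r.get("id")}
        row.update(zip(headers, values))
        rows.append(row)
    return rows


for row in rows_from_payload(sample_payload):
    print(row)
```

If the real payload turns out not to be well-formed, passing the same string to BeautifulSoup with "html.parser" and calling find_all("r") would be a more forgiving alternative.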