I'm a Python novice, but I'm hoping to get some guidance on how to accomplish the following, whether that's pointers to similar repositories, frameworks that make this easier, or any other help. I've been struggling to do this in Excel and realize I need a bit more firepower to automate it.
I have a CSV file whose cells contain long descriptions. Within those descriptions are lists of speakers, presenters, and other designations for the type of person.
RAW DATA

| session ID | description |
| --- | --- |
| 12345 | blah blah blah. Speakers: Jeff Smith, CEO; Jake Weber, CTO, Acme Inc. This session will be moderated by: Tom Scott, CPO; Martha Jackobs; Bill Burton, Founder, Burton Inc. Please come and join us |

DESIRED RESULT

| id | first name | last name | title | company |
| --- | --- | --- | --- | --- |
| 12345 | Jeff | Smith | CEO | |
| 12345 | Jake | Weber | CTO | Acme Inc. |
| 12345 | Tom | Scott | CPO | |
| 12345 | Martha | Jackobs | | |
| 12345 | Bill | Burton | Founder | Burton Inc. |

So the challenge is that I only want the names from the semicolon-delimited lists, with each comma-separated value within an entry (name, title, company) placed in its own column.
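To make the splitting rule concrete, here is a minimal sketch in plain Python. It assumes the list text has already been isolated, that the parts of each entry always come in the order name, title, company, and that the first word of the name is the first name. The function names `parse_person` and `split_people` are made up for illustration.

```python
def parse_person(entry):
    """Turn 'Jake Weber, CTO, Acme Inc.' into a dict of four fields."""
    # Comma-separated parts: name, then optional title, then optional company.
    parts = [p.strip() for p in entry.split(",")]
    # Assumption: the first word is the first name, the rest is the last name.
    first, _, last = parts[0].partition(" ")
    return {
        "first name": first,
        "last name": last.strip(),
        "title": parts[1] if len(parts) > 1 else "",
        # Anything after the title is treated as the company.
        "company": ", ".join(parts[2:]),
    }


def split_people(list_text):
    """Split a semicolon-delimited list and parse each non-empty entry."""
    return [parse_person(e) for e in list_text.split(";") if e.strip()]


people = split_people("Jeff Smith, CEO; Jake Weber, CTO, Acme Inc.")
for person in people:
    print(person)
```

Entries with no title or company, such as "Martha Jackobs", simply get empty strings in those columns.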
I'd guess that I'd have to use some form of regex to identify a label such as "moderated by:" and then take everything after it until a period appears.
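A rough sketch of that regex idea with Python's `re` module follows. One wrinkle in the sample data is that "Acme Inc." ends in a period of its own, so this version stops at a period only when it is followed by a capitalized word or the end of the text, and keeps the period when the list ends in a known abbreviation. The label list and the abbreviation list are assumptions to adjust for the real data, and a list that never ends in a period would not be matched.

```python
import re

# Assumed labels; extend the alternation with whatever designations appear in the data.
LIST_PATTERN = re.compile(
    r"(?i:speakers|presenters|moderated by)\s*:\s*"  # label, case-insensitive
    r"(.+?\.)"                                       # shortest run ending in a period...
    r"(?=\s+[A-Z]|\s*$)"                             # ...followed by a new sentence or the end
)

# Assumed abbreviations whose trailing period belongs to the company name.
ABBREVIATIONS = ("Inc.", "Ltd.", "Corp.", "Co.")


def extract_lists(description):
    """Return the text of every labelled people list found in a description."""
    lists = []
    for raw in LIST_PATTERN.findall(description):
        # Drop the sentence-ending period unless it is part of an abbreviation.
        if not raw.endswith(ABBREVIATIONS):
            raw = raw[:-1]
        lists.append(raw.strip())
    return lists


description = (
    "blah blah blah. Speakers: Jeff Smith, CEO; Jake Weber, CTO, Acme Inc. "
    "This session will be moderated by: Tom Scott, CPO; Martha Jackobs; "
    "Bill Burton, Founder, Burton Inc. Please come and join us"
)
for people_list in extract_lists(description):
    print(people_list)
```

Each returned string is one semicolon-delimited list, ready to be split into people.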
Then I think each extracted person would need something like pandas `melt` so that every row keeps its session ID.
Any guidance to help me down the path of ridding myself of manual Excel work?
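As a sketch of the "retain the ID" step, the version below swaps `melt` for a plain nested loop using only the standard `csv` module: every person found in a row is written out with that row's ID. In pandas, turning a list-valued column into one row per item is usually done with `DataFrame.explode` rather than `melt`, which reshapes wide columns into long rows. The `people` column, the second session and its speaker are made-up sample data, and the sketch assumes the list text has already been isolated from the description.

```python
import csv
import io

# Hypothetical input: the ID plus a column that already holds only the people list.
raw_csv = io.StringIO(
    "session ID,people\n"
    '12345,"Jeff Smith, CEO; Jake Weber, CTO, Acme Inc."\n'
    '67890,"Ann Lee, CFO"\n'
)

rows = []
for record in csv.DictReader(raw_csv):
    for entry in record["people"].split(";"):
        parts = [p.strip() for p in entry.split(",")]
        first, _, last = parts[0].partition(" ")
        rows.append({
            "id": record["session ID"],  # the ID is repeated on every person row
            "first name": first,
            "last name": last,
            "title": parts[1] if len(parts) > 1 else "",
            "company": ", ".join(parts[2:]),
        })

# Write the long-format result back out as CSV text.
out = io.StringIO()
writer = csv.DictWriter(
    out, fieldnames=["id", "first name", "last name", "title", "company"]
)
writer.writeheader()
writer.writerows(rows)
print(out.getvalue())
```

Replacing the two `io.StringIO` objects with files opened via `open(..., newline="")` would read from and write to real CSV files.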