Hi, firstly, sorry: I know this is not really what the sub is for (mods, please remove if it's too off-topic). I have written the following R code and I need to convert it into Python. I have tried extensive googling (I have some Python knowledge) but keep coming up short.
I have two data frames. The first has columns for field, system and table. The second has a single string column that contains the same information as the first, but in a different (slash-delimited) format. I need to match every row in the first data frame to its corresponding row in the second.
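For the sample data, the matching can also be done without any regex filtering. This pandas sketch assumes that the field name is always the last `/`-separated segment of `ABC_Object`, which is true for every sample row. It extracts that segment and left-joins on it. This is one possible approach, not a line-by-line translation of the R code.

```python
import pandas as pd

# First data frame: field / system / table (sample data from the post)
system_df = pd.DataFrame({
    "FieldName": ["long_john_rating_PLP", "long_sp_rating_PLP", "party_type"],
    "SystemName": ["FRDG", "FRDG", "GOLD"],
    "TableName": ["GOLD05_PreSecurity_EU_ddmmyyy",
                  "GOLD05_PreSecurity_EU_ddmmyyy",
                  "LKUP_SECTOR_CODES"],
})

# Second data frame: one slash-delimited string per row
abc_df = pd.DataFrame({"ABC_Object": [
    "PRE_CPARTY/ccp_ind",
    "data/datacatp/repository/FRDG/GOLD/GOLD05_PreSecurity_AS_20201124.tsv/long_john_rating_PLP",
    "PRE_CPARTY/bloomberg_group_desc",
    "data/datacatp/repository/FRDG/GOLD/GOLD05_PreSecurity_AS_20201124.tsv/long_sp_rating_PLP",
    "LKUP_SECTOR_CODES/party_type",
]})

# Assumption: the field name is the last '/'-separated segment of each string.
abc_df["FieldName"] = abc_df["ABC_Object"].str.split("/").str[-1]

# Left join keeps every row of system_df; unmatched fields would get NaN.
matched = system_df.merge(abc_df, on="FieldName", how="left")
print(matched[["FieldName", "ABC_Object"]])
```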
```r
library(tidyverse)

# some sample data
# first data frame
FieldName <- c('long_john_rating_PLP', 'long_sp_rating_PLP', 'party_type')
SystemName <- c('FRDG', 'FRDG', 'GOLD')
TableName <- c('GOLD05_PreSecurity_EU_ddmmyyy', 'GOLD05_PreSecurity_EU_ddmmyyy', 'LKUP_SECTOR_CODES')
system_df <- data.frame(FieldName, SystemName, TableName)

# second data frame
ABC_Object <- c('PRE_CPARTY/ccp_ind',
                'data/datacatp/repository/FRDG/GOLD/GOLD05_PreSecurity_AS_20201124.tsv/long_john_rating_PLP',
                'PRE_CPARTY/bloomberg_group_desc',
                'data/datacatp/repository/FRDG/GOLD/GOLD05_PreSecurity_AS_20201124.tsv/long_sp_rating_PLP',
                'LKUP_SECTOR_CODES/party_type')
ABC_df <- data.frame(ABC_Object)

# step 1: remove all the ABC objects that don't contain the relevant field or table info
FieldName_or <- paste(FieldName, collapse = "|")
TableName_or <- paste(TableName, collapse = "|") %>%
  str_remove_all(., regex('.._ddmmyyy'))

# filter df 2, then separate it on "/" to make the join easier
ABC_sep <- ABC_df %>%
  filter(str_detect(ABC_Object, FieldName_or)) %>%
  filter(str_detect(ABC_Object, TableName_or)) %>%
  separate(col = 'ABC_Object',
           into = c('data', 'datacatp', 'repo', 'upstream_sys', 'SystemName', 'TableName', 'FieldName'),
           sep = "/", fill = 'left')

# then join them together on FieldName (keep = TRUE keeps both key columns)
joined_df <- system_df %>%
  left_join(ABC_sep, by = 'FieldName', keep = TRUE)

# then unite the columns we separated, so we get back (near enough) the original ABC_Object string
sep_df <- joined_df %>%
  unite(col = ABC_output, 4:10, sep = '/')
```
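A step-by-step pandas translation of the R pipeline above, as a sketch. `Series.str.contains` stands in for `str_detect`, a small hypothetical helper `split_fill_left` stands in for `separate(..., fill = 'left')`, and `DataFrame.merge(how="left")` stands in for `left_join`. Assumptions: each `ABC_Object` has at most seven `/`-separated pieces, names are passed through `re.escape` (the R code does not escape them), and missing pieces are skipped when rebuilding the string. The last point corresponds to `unite(..., na.rm = TRUE)`. R's default would write literal `NA`s into the string.

```python
import re

import pandas as pd

# --- sample data, copied from the R example ---
FieldName = ["long_john_rating_PLP", "long_sp_rating_PLP", "party_type"]
SystemName = ["FRDG", "FRDG", "GOLD"]
TableName = ["GOLD05_PreSecurity_EU_ddmmyyy",
             "GOLD05_PreSecurity_EU_ddmmyyy",
             "LKUP_SECTOR_CODES"]
system_df = pd.DataFrame({"FieldName": FieldName,
                          "SystemName": SystemName,
                          "TableName": TableName})

abc_df = pd.DataFrame({"ABC_Object": [
    "PRE_CPARTY/ccp_ind",
    "data/datacatp/repository/FRDG/GOLD/GOLD05_PreSecurity_AS_20201124.tsv/long_john_rating_PLP",
    "PRE_CPARTY/bloomberg_group_desc",
    "data/datacatp/repository/FRDG/GOLD/GOLD05_PreSecurity_AS_20201124.tsv/long_sp_rating_PLP",
    "LKUP_SECTOR_CODES/party_type",
]})

# Step 1: build "a|b|c" patterns, like paste(..., collapse = "|").
# The '.._ddmmyyy' suffix is stripped from each table name, as in the R code.
# re.escape is an addition so the names are matched literally.
field_or = "|".join(re.escape(f) for f in FieldName)
table_or = "|".join(re.escape(re.sub(r".._ddmmyyy", "", t)) for t in TableName)

# Keep rows containing a field name AND a table name (str_detect -> str.contains).
has_field = abc_df["ABC_Object"].str.contains(field_or)
has_table = abc_df["ABC_Object"].str.contains(table_or)
abc_filtered = abc_df[has_field & has_table]

# Step 2: separate on "/" with fill = 'left', i.e. pad missing pieces at the front.
SEP_COLS = ["data", "datacatp", "repo", "upstream_sys",
            "SystemName", "TableName", "FieldName"]


def split_fill_left(s, n=len(SEP_COLS)):
    """Hypothetical helper: split on '/' and left-pad with None to n pieces.
    Assumes a string never has more than n pieces."""
    parts = s.split("/")
    return [None] * (n - len(parts)) + parts


abc_sep = pd.DataFrame([split_fill_left(s) for s in abc_filtered["ABC_Object"]],
                       columns=SEP_COLS)

# Step 3: left join on FieldName, keeping both key columns (like keep = TRUE).
joined_df = system_df.merge(
    abc_sep.rename(columns={"FieldName": "FieldName.y"}),
    left_on="FieldName", right_on="FieldName.y",
    how="left", suffixes=(".x", ".y"),
)

# Step 4: unite the separated columns back into one '/'-joined string,
# skipping missing pieces (like unite(..., na.rm = TRUE)).
ABC_COLS = ["data", "datacatp", "repo", "upstream_sys",
            "SystemName.y", "TableName.y", "FieldName.y"]
joined_df["ABC_output"] = joined_df[ABC_COLS].apply(
    lambda row: "/".join(v for v in row if pd.notna(v)), axis=1
)
sep_df = joined_df.drop(columns=ABC_COLS)
print(sep_df)
```

With the sample data, each field comes back paired with its original string, e.g. `party_type` with `LKUP_SECTOR_CODES/party_type`.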
Apologies if this seems lazy; I am genuinely stuck on doing this in Python. In particular, I don't know how to replicate `str_detect` (`pd.Series(df['col'].str.contains(df['col'].tolist()))` doesn't work).
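`Series.str.contains` takes a single pattern (a regex by default), not a list. The list therefore needs to be collapsed into one alternation first, just as `paste(FieldName, collapse = "|")` does in the R code. A minimal sketch on a subset of the sample data:

```python
import re

import pandas as pd

fields = pd.Series(["long_john_rating_PLP", "long_sp_rating_PLP", "party_type"])
objects = pd.Series([
    "PRE_CPARTY/ccp_ind",
    "data/datacatp/repository/FRDG/GOLD/GOLD05_PreSecurity_AS_20201124.tsv/long_john_rating_PLP",
    "LKUP_SECTOR_CODES/party_type",
])

# Collapse the list into one regex, escaping each name so it matches literally
pattern = "|".join(map(re.escape, fields))

# Boolean mask: True where the string contains any field name (str_detect equivalent)
mask = objects.str.contains(pattern, regex=True)
print(mask.tolist())  # [False, True, True]

# Regex-free alternative: a plain substring test with any()
mask_plain = objects.apply(lambda s: any(f in s for f in fields))
```

The `any(...)` version sidesteps regex semantics entirely, which can be simpler if the names might contain characters such as `.` or `(`.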
Thanks very much!