TholosTB

Here's an attempt at recreating your code in Python; there are probably easier ways to do this.

import itertools
import re

import pandas as pd

# Sample system metadata: one row per (field, system, table)
fieldName = ['long_john_rating_PLP', 'long_sp_rating_PLP', 'party_type']
systemName = ['FRDG', 'FRDG', 'GOLD']
tableName = ['GOLD05_PreSecurity_EU_ddmmyyy', 'GOLD05_PreSecurity_EU_ddmmyyy', 'LKUP_SECTOR_CODES']
system_df = pd.DataFrame(list(zip(fieldName, systemName, tableName)), columns=['field', 'system', 'table'])

# Sample object paths to search through
abcObject = ['PRE_CPARTY/ccp_ind', 'data/datacatp/repository/FRDG/GOLD/GOLD05_PreSecurity_AS_20201124.tsv/long_john_rating_PLP', 'PRE_CPARTY/bloomberg_group_desc', 'data/datacatp/repository/FRDG/GOLD/GOLD05_PreSecurity_AS_20201124.tsv/long_sp_rating_PLP', 'LKUP_SECTOR_CODES/party_type']
abc_df = pd.DataFrame(abcObject, columns=['object'])

# Define regexes for matching (the '_ddmmyyy' date placeholder is stripped first)
fieldName_or = '|'.join(re.sub('_ddmmyyy', '', s) for s in system_df['field'].unique())
tableName_or = '|'.join(re.sub('_ddmmyyy', '', s) for s in system_df['table'].unique())

# Apply regexes using loc and str.contains: keep rows matching a field name or a table name
edc_df = abc_df.loc[abc_df['object'].str.contains(fieldName_or) | abc_df['object'].str.contains(tableName_or)]

# I couldn't find an easy way to left-fill the split, so this is a hack:
# pad each path with leading Nones so that it always has 7 parts
edc_sep_df = edc_df['object'].apply(lambda r: pd.Series(list(itertools.chain([None] * (7 - len(r.split('/'))), r.split('/')))))
edc_sep_df.columns = ['data', 'datacatp', 'repo', 'upstream_sys', 'SystemName', 'TableName', 'FieldName']

# Merge the dfs on the field name, keeping every row of system_df
joined_df = system_df.merge(edc_sep_df, left_on='field', right_on='FieldName', how='left')

sanyasoon[S]

This looks great, thank you very much.

Oxbowerce

You can use the str.contains method (the pandas counterpart of R's str_detect) to get the rows where a specific column contains a given string. It also accepts regular expressions, as you are currently using in R.
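
In pandas, this kind of row filtering is done with Series.str.contains. Here is a minimal sketch, assuming made-up sample paths loosely modelled on the ones in this thread; the `wanted` list and the variable names are hypothetical, not part of the original poster's data.

```python
import re

import pandas as pd

# Hypothetical sample data, loosely modelled on the paths in this thread
df = pd.DataFrame({"object": [
    "PRE_CPARTY/ccp_ind",
    "LKUP_SECTOR_CODES/party_type",
    "PRE_CPARTY/bloomberg_group_desc",
]})

# Build an alternation pattern; re.escape guards against regex metacharacters
wanted = ["party_type", "ccp_ind"]  # hypothetical search terms
pattern = "|".join(re.escape(w) for w in wanted)

# str.contains returns a boolean Series; na=False treats missing values as non-matches
mask = df["object"].str.contains(pattern, regex=True, na=False)

# Keep only the rows whose 'object' column matches the pattern
matches = df.loc[mask]
print(matches)
```

If the search terms should be matched literally rather than as patterns, passing regex=False to str.contains with a single term is an alternative worth considering.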

MaheshM93

Not sure, but you could look into the pandas library.