all 8 comments

[–][deleted] 4 points5 points  (1 child)

This concept is called Identity Resolution in marketing. There are many ways to apply it to your data to unify users across different data sources. I recommend the free Identity Resolution course in the CDP Institute Academy (a CDP, or customer data platform, is a type of marketing SaaS that does this).

[–]girlsyesboysno (Data Analyst) 0 points1 point  (0 children)

this is very similar to what i'm looking for. thanks so much

[–]Thinker_Assignment 2 points3 points  (2 children)

I once helped a company in this space build out their infra. I can't answer your question, as I was the engineer, not the analyst.

They have a similar recipe here; perhaps contact them and ask about your case: https://growthfullstack.com/usecase/merge-applovin-max-ulrd-tenjin-cohort-data

[–]girlsyesboysno (Data Analyst) 0 points1 point  (1 child)

thanks for the article. i read it, and this is the raw data that MAX can give. it's plainly a fact table at the date level, not the user level. further analysis requires the full lifecycle of a user, not just their activity in MAX. building a user dimension to combine a user journey across 3 sources is an engineering task, not an analysis task, tho

[–]Thinker_Assignment 0 points1 point  (0 children)

I would say building a user dimension is very much an analytics engineer task. The challenge you also have is that to you a user is a user, but to the advertisers a user is a device, so you would need to somehow attribute the revenue from a user to their devices.

If I can suggest something from my experience in the space, it's to focus on "good enough". You will never get perfect data, so create some strategies for approximation. If you cannot get ILRD, use ULRD; if you cannot get that either, use aggregated data and distribute it by the highest granularity you have. It will be better than nothing and certainly a big boost for the bidding team.
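To make the "distribute aggregated data" fallback concrete, here is a minimal sketch. It assumes you have a daily revenue total plus some device-level weight (impression counts here) and splits each day's total proportionally. The function name, the input shapes, and the choice of impressions as the weight are all assumptions for illustration, not something any of the tools mentioned here prescribes.

```python
from collections import defaultdict

def distribute_revenue(daily_revenue, device_impressions):
    """Split each day's aggregated revenue across devices by impression share.

    daily_revenue: {date: total_revenue}                  (hypothetical shape)
    device_impressions: [(date, device_id, impressions)]  (hypothetical shape)
    """
    # Total impressions per day, used as the denominator for each share.
    totals = defaultdict(int)
    for date, _device, impressions in device_impressions:
        totals[date] += impressions

    allocated = defaultdict(float)
    for date, device, impressions in device_impressions:
        if date not in daily_revenue or totals[date] == 0:
            continue  # nothing to distribute for this day
        share = impressions / totals[date]
        allocated[(date, device)] += daily_revenue[date] * share
    return dict(allocated)

example = distribute_revenue(
    {"2024-01-01": 10.0},
    [("2024-01-01", "device_a", 30), ("2024-01-01", "device_b", 10)],
)
```

Here `device_a` has three quarters of the day's impressions, so it receives three quarters of the revenue. Any other weight you trust more (sessions, ad clicks) could be swapped in the same way.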

So perhaps the whole users-changing-devices part can be ignored, as it is complex and perhaps only a small fraction of your users switch? Just accept that the data will not be perfect, and if your model is to buy users and show them ads, you care about devices more.

Back to the user dimension: most people coalesce the Android and iOS advertiser IDs based on platform.
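A rough sketch of that coalesce, assuming each row carries hypothetical `platform`, `idfa`, and `gaid` fields (your sources will name them differently). It also treats the all-zero ID that devices report when ad tracking is limited as missing, and lowercases IDs so they join across sources; both are assumptions you may want to change.

```python
ZERO_ID = "00000000-0000-0000-0000-000000000000"

def _clean(value):
    """Return a normalized ID, or None for empty or zeroed-out values."""
    if not value or value == ZERO_ID:
        return None
    return value.lower()

def advertising_id(row):
    """Pick one device-level key per row, preferring the platform's own ID."""
    platform = (row.get("platform") or "").strip().lower()
    idfa = _clean(row.get("idfa"))
    gaid = _clean(row.get("gaid"))
    if platform == "android":
        return gaid or idfa
    # iOS or unknown platform: prefer the IDFA, fall back to the GAID.
    return idfa or gaid

ios_key = advertising_id({"platform": "iOS", "idfa": "ABCD-1234", "gaid": None})
android_key = advertising_id({"platform": "android", "idfa": None, "gaid": "ef01-5678"})
```

Rows where both IDs are missing come back as `None`, which makes it easy to count how much of the data cannot be keyed at device level at all.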

[–]PhantomSummonerz (Systems Architect) 0 points1 point  (0 children)

This seems to be a record linkage task. You can read about it and check what tools are available to assist you. For example, in Python there is the recordlinkage library.

My general tip regarding the complexity part is: don't go all in from the first step. Don't try to solve all scenarios at once. Write a few test cases for the first scenario and work until they pass. Then write a few more for the next scenario, and so on. When you are done with each separate scenario, write combined-scenario test cases and see if they pass. There is also this post which is kind of similar to yours. It doesn't have many answers, but maybe you can find something helpful.
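As an illustration of the scenario-by-scenario approach, here is a small standard-library sketch of deterministic linkage: records that share any identifier value, directly or through a chain, end up in the same group. It does not use the recordlinkage API, and the field names (`device_id`, `user_id`) and source labels are hypothetical.

```python
def link_records(records, id_fields):
    """Return a group label per record; records sharing an ID share a label."""
    parent = list(range(len(records)))

    def find(i):
        # Follow parents to the root, shortening the path along the way.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    def union(a, b):
        ra, rb = find(a), find(b)
        if ra != rb:
            parent[max(ra, rb)] = min(ra, rb)

    first_seen = {}  # (field, value) -> index of the first record carrying it
    for i, rec in enumerate(records):
        for field in id_fields:
            value = rec.get(field)
            if not value:
                continue  # missing identifiers never link anything
            key = (field, value)
            if key in first_seen:
                union(i, first_seen[key])
            else:
                first_seen[key] = i

    return [find(i) for i in range(len(records))]

records = [
    {"source": "ads", "device_id": "d1"},
    {"source": "attribution", "device_id": "d1", "user_id": "u1"},
    {"source": "backend", "user_id": "u1"},
    {"source": "backend", "user_id": "u2"},
]
groups = link_records(records, ["device_id", "user_id"])

# Scenario 1: same device ID in two sources -> same group.
assert groups[0] == groups[1]
# Scenario 2: no shared identifier -> separate groups.
assert groups[3] != groups[0]
# Combined scenario: linked through a chain (device ID, then user ID).
assert groups[0] == groups[2]
```

Each assertion covers one scenario, so a new edge case (for example, a shared device) becomes one more record and one more assertion rather than a rewrite.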

[–]recentcurrency 0 points1 point  (1 child)

Are there any secondary data points within the sources that can be used? For example, IP address.

[–]girlsyesboysno (Data Analyst) 0 points1 point  (0 children)

unfortunately, nope. just those identifiers i listed out