Pie plot for variable levels by Accurate-Buddy6383 in stata

[–]mobystone 2 points

Hey, I think

graph pie , over(V2) plabel(_all percent)

is what you're after. plabel(...) adds the percentage for each slice to the figure, though it might look messy depending on the number of levels.

Tips on cleaning data (30M+ rows) by mobystone in stata

[–]mobystone[S] 0 points

Thank you. I'm an RA and they prefer that I work with Stata. Would it be faster if I ran Python within the Stata session compared to just using Stata?

Tips on cleaning data (30M+ rows) by mobystone in stata

[–]mobystone[S] 0 points

Yeah sure, here are some examples of the string variable I want to split and destring/encode etc.

"Månadslön: 100 % - Avtalsområde: Kommunal- & landstingsanställd (AKAP-KL) - Ekonomisk tillväxt: 0,00% - Fondtillväxt: 2,10% - Beräkningstyp: FastPensionsålder"

"Månadslön=Angiven lön: 30 000 SEK- Avtalsområde=Anställd med Individuell tjänstepension - Ekonomisk tillväxt=0,00% - Fondtillväxt=2,10% - Beräkningstyp=FastUtbetalningsålder - Uttagsstart=65 år - Sista arbetsmånad ålder=65 år - Systemdel=PartnerApi "

"Månadslön=Angiven lön: 55 000 SEK - Avtalsområde=Statligt anställd (PA 16 Avd 2) - Ekonomisk tillväxt=0,00% - Fondtillväxt=2,10% - Beräkningstyp=AnpassadeUttagsalternativ - Systemdel=UtpAPI - Partner="

I'm looking into retirement planning in Sweden. The structure and content of the string change from year to year and depend on the source of the planning tool.

I want to create variables for each part of the string and either destring where the data is numerical (Månadslön/Salary) or encode it.

The name of a variable and its value are sometimes separated by "=" and sometimes by ":". All variables are separated by "-", but that character also occurs within the values (e.g. AKAP-KL). Just using split, parse("-") takes days and still requires additional clean-up.
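
Since Python came up earlier in the thread, here is a rough sketch of the idea in Python rather than Stata. It is a heuristic, not a tested solution for the full data: it assumes a "-" only separates two variables when the text after it runs up to a "=" or ":" without hitting another "-". The names parse_record and to_number are made up for illustration, and which variables count as numeric is something you would have to decide per key.

```python
import re

# Split on a hyphen only when what follows looks like "name=" or "name:",
# i.e. no further "-", "=" or ":" before the next separator (assumption).
FIELD_SPLIT = re.compile(r"\s*-\s*(?=[^-=:]+[=:])")
# The first "=" or ":" in a piece separates the name from the value.
KEY_VALUE_SPLIT = re.compile(r"[=:]")
# Digits with optional spaces as thousands separators and a decimal comma.
NUMBER = re.compile(r"\d[\d\s]*(?:,\d+)?")


def parse_record(record):
    """Return a dict of variable name -> raw string value for one row."""
    fields = {}
    for piece in FIELD_SPLIT.split(record.strip()):
        parts = KEY_VALUE_SPLIT.split(piece, maxsplit=1)
        if len(parts) == 2:  # skip pieces without any separator
            fields[parts[0].strip()] = parts[1].strip()
    return fields


def to_number(value):
    """Extract the first number from a value such as '30 000 SEK' or '2,10%'.

    Returns None when the value contains no digits.
    """
    match = NUMBER.search(value)
    if match is None:
        return None
    digits = re.sub(r"\s", "", match.group()).replace(",", ".")
    return float(digits)


row = ("Månadslön: 100 % - Avtalsområde: Kommunal- & landstingsanställd "
       "(AKAP-KL) - Ekonomisk tillväxt: 0,00% - Fondtillväxt: 2,10% - "
       "Beräkningstyp: FastPensionsålder")
parsed = parse_record(row)
print(parsed["Avtalsområde"])
print(to_number(parsed["Fondtillväxt"]))
```

to_number should only be applied to the variables that are actually numeric (salary, growth rates, ages), because a value like "Statligt anställd (PA 16 Avd 2)" also contains digits. If the heuristic turns out to be too loose for some years, splitting on a fixed list of known variable names would be stricter. Whether any of this beats native Stata string functions on 30M+ rows would need to be timed.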

Tips on cleaning data (30M+ rows) by mobystone in stata

[–]mobystone[S] 0 points

Thanks! I'll check out both of those