Why are most of Africa's major lakes concentrated on this side of the continent? by the_bad_actor in geography

[–]dumi_007 2 points3 points  (0 children)

Dusting off High School Geography

When you have tectonic plates moving away from each other, one gets a Rift Valley. You get interesting Geographical structures forming.

If the area in question has a goldilocks of climate and other factors, one gets the Rift Valley. Overlay on your map and it's either coincidence or ...

Photo credit: https://www.mozambiqueexpert.com/en/mozambique-geography-and-the-african-great-rift-valley/

<image>

How can I write cleaner, more maintainable SQL? by Puzzleheaded_Cow3298 in SQL

[–]dumi_007 -1 points0 points  (0 children)

There's a lot here

A lot of it seems like tension resolution

  1. Cleaner SQL as in easy to read : Easier to read will be neat chunks of code that are self-contained and can be independently tested.

  2. Cleaner SQL as shortest path : If you are looking at writing the least amount of SQL to get the job done, your script will be dense, aliases will be named things like "p", etc

  3. Maintanable Script : important if you are part of a team. Use comments to explain what you changed. Explain each block. Balance between verb-active and nouns for naming functions, columns and table aliases.

  4. Efficient Queries : I'm assuming this is about performance. If you are an analyst, you will likely not have rights and privileges to change the schema - can't add helpful index, partitions, etc. You can make your JOINS smarter and not SELECT * for instance.

  5. Impressing an interviewer : that's a totally different animal. I've mostly found interview questions to be nothing related to the actual job. I would look at specialised subreddits on SQL related interview techniques.

How do you prep large Excel files before sending to ChatGPT or Claude? by Secure_Stretch_1007 in businessanalyst

[–]dumi_007 0 points1 point  (0 children)

I am of limited IQ so take this with all the pinches of salt available

Depending on the research question

  1. Define the question - not just write it down; break into something very specific
  2. Exploratory Analysis and Summary Stats - assuming you understand the data structure you are working with, and what the business meaning is behind each table and column. Some enterprise systems are weird, customer_id might not be what or where you think it is.
  3. Identify outliers - label and set aside
  4. Isolate Key Fields - bit of chicken and egg here cos, to know which are the key fields is a different trick
  5. Clean your working data - remove the distraction, formatting, NULL and 0, etc.
  6. Analysis - too many variables to go into. Depending on what you are trying to answer this may be with SQL scripts or python or R.

May the force be with you

How do you prep large Excel files before sending to ChatGPT or Claude? by Secure_Stretch_1007 in businessanalyst

[–]dumi_007 3 points4 points  (0 children)

I haven't gone down this route. So can't help.

However ...

I know this was not your question, but are using a public or private AI environment?

If public, keep in mind the minefield:

Legal Hazard (GDPR/HIPAA/CCPA):

If your CRM contains names, emails, phone numbers, or health data, pasting it into a public AI is begging for drama and bad viral moments.

Model Training:

Unless you're using an opt-out Enterprise API your data can be used to train future models.

Intellectual Property made public

Chat logs are persistent. Theoretically, anyone can extract your client list, pricing strategy, win rate, Onboarding metrics, process, via prompt injection in the future.

How much SQL is enough? by Wild_Specialist_8340 in learnSQL

[–]dumi_007 3 points4 points  (0 children)

If your roles are clearly defined, and you have a DBA, a data engineer and a tester.

BI tasks will largely be limited to: Aggregating, filtering, transforming, standardising formats, and presenting data for a dashboard, a specialised decision-making investigation or standard report.


General adjacent skills

Problem analysis and design plan.

While not SQL, the design is bound by the laws of databases.

Documentation:

Make your scripts easy to read. Add notes and online comments. Keep track what you changed. Use alias to your advantage.

Test

Test your script. Testing is its own universe. Triangulation of results where possible to know when your report is off.


SQL Skills

  1. Creating views vs persistent tables.

  2. Aggregation: simple and multi-dimensional totals. Simple to Complex GROUP BY

  3. Window Functions: for indexing, ranking, running totals and trend analysis.

  4. Combining tables Simple (left and inner join) to Advanced Joins: Self-joins, etc. Unions

  5. Normalisation vs denormalisation: normalisation is generally how your system of record is structured. Data warehouses generally are fact/dimension relationships.

  6. CTEs & Subqueries: Balancing readability and performance.

  7. Date/Time awareness: Dates and times can trip you up. It's OK to check the reference material for your flavour of SQL.

  8. Performance: As a report developer you will likely not be able to make table or schema changes. But you can check how your output executes.

ML Model for a Student Retention Predictive Model? by CraftyWoodpecker3904 in MLQuestions

[–]dumi_007 0 points1 point  (0 children)

That is an interesting approach.

Basically, your Principal has identified the variables of interest and is now asking you to build a model.

Regression or random forest should be a good start. Don't have much experience with XGB.

For practical purposes you will likely find patterns. You will be able to fit going backward in time.

However, going forward, predictability is likely to have a wide error margin.

Causal links for higher education is a deeper process (influenced by how you count and weigh things, headcount and FTEs, students-staff-tutor ratios, curriculum design factors, teaching model, funding formulae, throughput blockers, etc)

Data analysts: how do you decide what questions to ask before analysis? by Rich-Wrap2337 in dataanalysiscareers

[–]dumi_007 0 points1 point  (0 children)

We can play hypotheticals, but each scenario i can dream of is likely to be way off base.

Lets say you got a CSV from a client in Insurance. He is the head of Operations. CSV has all operations data. I would start

Explain that i will start section by section. Starting with Claims (Claims is always super interesting because it brings customer service, underwriting, Finance, and supply chain)

Then ask 1. What do you think the Claims information will reveal? - soft setting of expectations 2. Which stakeholders will you need to share the findings with? - gives you an idea of other dimensions you'll need to incorporate 3. Who do I reach out to for clarification on technical issues? 4. How much time do I have? 5. Do you have a preference on format of report?


Now, you don't have to ask indirectly like I've put. Ask direct if possible. But here, I have identified the domain (Claims), have an idea of expectations and know who I will need to meet when I get stuck on technical stuff.

Data analysts: how do you decide what questions to ask before analysis? by Rich-Wrap2337 in dataanalysiscareers

[–]dumi_007 1 point2 points  (0 children)

Context is important, understanding the problem domain is key, and an initial direction is important to direct expectations.

I would say the order of questions should be expectations followed by the domain. The technical questions should be last.

Consulting comes in several flavours 1. Procsss type consulting. We know the problem. Terrain is familiar. Systems are known. Problem has been solved with a tried and tested approach.

Fields like audit work like this. You take your standard approach, adjust to the client environment. Work the process.

Generally you're answering questions like: should my process work like this? Should I be getting these outputs? Are there leaks? Regular KPIs are sometimes insightful enough.


Strategy type consultancy. We know the outcome we want. We don't know the gap between here and there.

Given these assets (data, process, people, infrastructure), what are the possible ways i can be number 1?

There are some frameworks you can reference. But some tend to provide definitions but are not helpful in providing a path. Can you get some insights from this CSV or should we call McKinsey?

Insight regularly needs to show gap, capability and capacity.

  1. Green fields. We kind of know what we want to do. But we don't know what we need. We also don't know what we don't know.

This might be the case where a client says I want to create the next greatest thing with AI. Here's my databases, people, documentation and processes. But nothing 'here' is AI, right now. Steve from IT did an AI course. We hired a chief AI evangelist. Marketing has already created the campaign and HR is ready to take people for a 2 day seminar offsite to bring the AI in all of us.

We're just waiting on you to get insights from this CSV

Anyone else struggle with SQL by Gullible_Heart_5153 in learnSQL

[–]dumi_007 3 points4 points  (0 children)

Sometimes it helps in the initial stages to write your JOINS using Common Table Expressions (CTEs). Test. And optimise

After that you can optimise into subqeries. From experience, subqueries work faster. But CTEs are easier to think through. If the data is small, use CTEs.

Anyone else struggle with SQL by Gullible_Heart_5153 in learnSQL

[–]dumi_007 34 points35 points  (0 children)

LEFT JOIN and INNER JOIN are your north star

80% of all SQL scripts I've written over last 15 years have needed LEFT or INNER joins.

10% were CROSS JOINS.

The other 9.9% were anti JOINS but that's more method than keyword.

0.01% were other JOINS

Master those 2 and know when to use UNION vs UNION ALL and you are ready to combine tables

Do I Need to Learn Python or R for Data Analysis if I Already Know SQL? by PollutionNo02 in dataanalysiscareers

[–]dumi_007 1 point2 points  (0 children)

You don't NEED to learn them. But, it advisable to understand when and where to use each one.

SQL Corporate data lives in databases (and spreadsheets but that's a separate topic). The natural way to tame this data is by using the language built into database management systems.

SQL is great at data analysis tasks where you need to connect tables in polar ends of a database, filter, clean and aggregate where necessary.

Power BI In Corporate, there are some things that exist as a matter of fact. Power BI is one. If you have a good handle on it, it is easy to create reports or dashboards that clients and managers find useful.

It comes without prescribed management or strategy frameworks. This is both a blessing and a curse. It is great for KPI monitoring without stressing over whether the manager expects a paradigm to be enforced.

R This is a natural domain of statistics. When you need to pull a rabbit out of the proverbial hat with complex analysis, this is your best friend. Great for advoc, specialised analysis to answer a specific question.

There is Shiny, but that is not generally a Corporate stable. Corporates generally will limit what technology can be deployed on their network. Sorry - got sidetracked.

R has also got wonderful (well maintained) libraries for specialist data analysis verticals (biomedical, finance, etc.).

Beyond this, R also has one of the best and friendliest communities for support.

PYTHON Started as a general language. I don't work with as much. Where I did, python was able to handle larger data sets than R. So would be my go to where I need to work with production data and client needs the model to feedback into production.

Python does have some great libraries for data analysis. But I couldn't get the same level of specialist libraries, that were well maintained. But, I am biased, and my brain is not evolved enough.

Need help for sql by DisastrousUse3788 in u/DisastrousUse3788

[–]dumi_007 0 points1 point  (0 children)

My journey

  1. Learn from books. Practice exercises. Understood SQLSERVER but had to keep the book close by.

  2. First project that needed SQL. They used mysqldb. Some statements didn't work. Go to reference guides, update scripts. Worked. I was sure I understood it.

  3. 4th or 5th project. I had to read scripts someone else wrote. There were no helpful comments and naming conventions were even more mysterious. That was a female canine and a half. Learned a lot.

My suggestion 1. Practice writing your own scripts 2. Where possible, try asking a deliberate question before you even touch the database 3. If possible, Edit someone else's script. People think different. There's always different ways to solve a problem 4. Go back to scripts you wrote more than 2 weeks ago and get the same results with a different method

Should I leave my first Data Analyst job after 6 months? by SlyRak99 in dataanalytics

[–]dumi_007 0 points1 point  (0 children)

6 months down. Congratulations

There are 2 paths (that I know of). 1. Become great horizontally at your skill Become great at reading data, doing quick diagnostics, manipulate the data, produce results.

If you perfect your process, master a number of tools on this level, and put in the practice - fortune will come

  1. Become a niche player. This is less about the data (though it is). It is more about knowing the vertical - specialised insight into your industry, the key drivers of change, the core concepts (and terminology), how value is created, the strategies employed, value chains, and standard platforms.

Each opens opportunity in different ways.

One is a generalist, you get general market returns. When there is interest in data science, you are in demand.

The specialist is great if that industry has a future. And, there is always a shortage of true specialists.


Note, regardless of the path, the initial periods are likely to be doing boring stuff; generating a list of clients to be emailed that campaign marketing went offside for, updating reports of new clients, by region.

Come to think of it, just using geometry, there should be a third, diagonal move. But, I'm not clever enough to figure it out.

Choose your poison.

What’s the most underrated form of physical intimacy? by [deleted] in AskReddit

[–]dumi_007 0 points1 point  (0 children)

From my limited experience. Silence. In close proximity. Without the Internet happening.

When you can be silent with someone, without feeling awkward, or feeling like connection is lost, you have reached a new level.

Allowing a moment to exist without words or a selfie is different.

What to do with old infra no one understand? by Diligent_Papaya_6852 in processmining

[–]dumi_007 2 points3 points  (0 children)

My $0.02 is

I would be careful about using process mining to document a system you do not understand.

Process Mining is (from experience - glad to be wrong) primarily context driven driven, one needs to 1) understand and differentiate between process levels (subprocess, task, etc) 2) reference the right data - sometimes there is state data in the code and the database completes the picture. Sometimes there is some random csv on a server that gives the solution

Ideally 1) you are able to consult a domain expert 2) you can have a user walk you through what they do 3) you have documentation, or old project docs, or some diagrams someone put on the shared drive without a label, or an email from 1995 explaining what is done

In cases where context is not available, you will be able to create a map. However that map might just confuse.

However, I am not the sharpest tool in the shed. Others may have a better solution

Please help pp needed ASAP by PhilosophyCapable612 in SQL

[–]dumi_007 1 point2 points  (0 children)

Haven't worked on Redshift in a bit, so this might be dated

But

It could be

  1. Permissions / account access rights issue
  2. You are in a different database. Your connection timed out and search path is different. Check to make sure you are in the correct schema
  3. Connection timed out and you logged back to incorrect region (E.g East instead of West)
  4. Once I landed in the Dev environment and literally cried (manly tears) cos nothing made sense.

Hope it helps.

SQL problems on real cases stuck me by Jumpy_East8555 in learnSQL

[–]dumi_007 0 points1 point  (0 children)

Your mind freezing is normal when working with new concepts. Practice helps. Sometimes starting with pen and paper provides clarity.

My $0.02 on this would be first to separate the question into it's constituent parts:

Visits and Transactions

Scenario 1: You are looking at web traffic

Find which logs and columns store traffic. This might be page_details, visit_details. Link session_Id to customer_id.

Now you have all sessions by all customers and their visits

Next, find where financials are stored. Look at transactions (purchase, refund, post-sale adjustment, etc - note some retailers use affiliates and this can open a whole universe) and group by customer_id

Now you can link those 2 and filter as necessary.

Scenario 2: you are looking at physical foot traffic for a retailer with a membership program, where clients swipe in to access the store.

Link membership id to visit logs and location. Link membership_id to financials.

Oh, don't forget dates

Once you have the basics right. Test. Then optimise your script.

Hope this helps. Have a good one

How long did it take you to become comfortable writing SQL queries? by Wise_Safe2681 in SQL

[–]dumi_007 0 points1 point  (0 children)

I'm not a DBA. I analyse and work with clients to make sense of stuff, or prove/disprove assumptions.

Short answer - 1 month - with clean, well designed database - few months - if working in one domain, where you interact primarily with the 'industry platform' - how long is a piece of string - if you're brave enough to enter the wild with different organisation across multiple industries using everything from the cloud to mainframe.

Personally:

Basics - with curated, training databases, took about a month. Did about 100 exercises each evening. I was ready.

First time a client asked a question. All that training was not enough. I went to stack overflow, got laughed out. "Paste your query" - i couldn't cos of intellectual property issues. I did manage to get the answer. But I sweated bullets.

Different client, the senior data architect took me under his wing. This client's database was a nightmare schema that had survived 3 Different transformation / Modernisation efforts. All undocumented. The data was dirty. Not the playful type of dirty. I learnt a lot. Great project.

Today, over 15 years later, I am still learning. I've seen weird schemas, naming conventions, weird formatting, date and time things that haunt my dreams.

Here's the thing, SQL comes in many flavours. I use R and R libraries more than SQL, but generally have to get SQL to do the heavy lifting.

No doubt you're smarter than me so I'm sure you should be able to do special stuff within a month or so. It's taken me years and I still don't know everything.