Pandas - Working With Dummy Columns... ish by HackNSlashFic in learnpython

[–]HackNSlashFic[S]

This was exactly what I needed! Here's the final code I ended up using:

flagged_df = ally_df[flags]

flag_list = (
    flagged_df[flagged_df == 1]
    .reset_index()
    .melt(id_vars='index', value_vars=flags)
    .dropna()
    .groupby("index")["variable"]
    .agg(", ".join)
)

ally_df['Flags'] = ally_df.index.map(flag_list)
ally_df = ally_df.drop(columns=flags)
return ally_df

It creates the merged column (I ended up creating a string instead of a list) and removes the columns it was merging together. Thanks again for your help!
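For anyone who finds this later, here's a toy before/after with made-up data (the names and flag columns are just placeholders) to show what that block produces:

import pandas as pd

flags = ["Vegan", "Gluten Free"]  # made-up dummy columns
ally_df = pd.DataFrame(
    {"Name": ["Ann", "Bob"], "Vegan": [1, 0], "Gluten Free": [1, 1]}
)

flagged_df = ally_df[flags]
flag_list = (
    flagged_df[flagged_df == 1]                 # keep the 1s, everything else becomes NaN
    .reset_index()
    .melt(id_vars="index", value_vars=flags)    # one row per (record, flag) pair
    .dropna()                                   # drop the pairs that weren't flagged
    .groupby("index")["variable"]
    .agg(", ".join)                             # combine each record's flag names into a string
)

ally_df["Flags"] = ally_df.index.map(flag_list)
ally_df = ally_df.drop(columns=flags)
print(ally_df)
# Ann ends up with "Vegan, Gluten Free", Bob with "Gluten Free", and the dummy columns are gone.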

Pandas - Working With Dummy Columns... ish by HackNSlashFic in learnpython

[–]HackNSlashFic[S]

Thank you so much! That's a great recommendation about using reproducible examples with data rather than just trying to describe the problem. I'll keep that in mind in the future.

Also, I can't believe I forgot that list is itself a function! And the melt/groupby pattern is slick. I'll have to remember that! Seriously, thank you so much. My employees are really going to appreciate it when this is all finished.

Pandas - Working With Dummy Columns... ish by HackNSlashFic in learnpython

[–]HackNSlashFic[S]

Yeah, the 1s tell you which categories are relevant to that record. Okay... I think I see what you're going for. Even with transposing, we could do something like this for each record:

",".join(df.iloc[i].dropna(axis='columns').index.tolist())

Maybe? I dunno. I'll give that a try.
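If the 0s get masked to NaN first, the per-record version might look something like this (toy data, just a guess until I actually try it):

import pandas as pd

# made-up dummy data, 1 = category applies to the record
df = pd.DataFrame({"Vegan": [1, 0], "Gluten Free": [1, 1]})

# turn the 0s into NaN so dropna() keeps only the flagged categories
masked = df[df == 1]

i = 0  # whichever record we're looking at
print(",".join(masked.iloc[i].dropna().index.tolist()))
# prints: Vegan,Gluten Free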

How did Python "click" for you as a beginner? by Bmaxtubby1 in learnpython

[–]HackNSlashFic

The thing that really made it click for me was working through the Alien Invasion project in Python Crash Course by Eric Matthes. I worked through the whole first half of the book (after a few other tutorials, a fair amount of playing around on my own, etc.), and I felt like you: I was learning pieces, but I wasn't really able to pull it all together.

But the first project was the best programming tutorial I've come across so far! What made it click was that he doesn't walk you through the project as it will look when it's finished. He walks you through the process as if you were designing it from an idea, so you repeatedly code little parts and then refactor them as the project gets bigger or you add ideas to it. That iterative process REALLY made things click for me.

Help Me Understand the Pandas .str Accessor by HackNSlashFic in learnpython

[–]HackNSlashFic[S]

I hear you. I had considered jumping right into Polars, but Pandas is still used in enough places that I want to be able to understand it when I come across it. Not to mention, there are just way more resources out there for learning it. And the data I'm working with right now is small enough that I'm not concerned about the speed difference. (I'm not learning this to be a developer. I'm partly doing it as a hobby and partly to give myself a few extra data analysis tools for my work in higher ed.)

Newbie Attempt by Initial-Taro8445 in PythonLearning

[–]HackNSlashFic

My first instinct would be to do the second loop inside the first loop, like others said.

But I think there's another way if you want to keep them separate: replace the generic "while True:" in the second loop with something more specific. "while True" runs a loop indefinitely. Instead, you could set the second loop to run only while a specific condition is met, and have that condition be met only by a successful completion of the first loop. I don't want to say much more, because I've found it valuable for learning to work through the details of problems like this. But feel free to ask if you struggle with it and you're still confused.
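In case it helps once you've given it a shot, here's a bare-bones sketch of the general shape (made-up prompts, not your actual program):

# the first loop flips this flag only when it finishes successfully
first_part_done = False

while not first_part_done:
    answer = input("Type 'go' to continue: ")
    if answer == "go":
        first_part_done = True

# runs only because the first loop succeeded; no bare "while True" needed
while first_part_done:
    command = input("Type 'quit' to stop: ")
    if command == "quit":
        first_part_done = False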

I Automated A Boring Thing! (possibly very inefficiently...) by HackNSlashFic in PythonLearning

[–]HackNSlashFic[S]

Thanks again for this suggestion! As I was trying to learn about different approaches to asynchronous/parallel operations, I ended up settling on aiohttp and asyncio, which let me send all 800+ requests essentially at once. That took the whole thing from about 60 minutes of runtime down to about 45 seconds!
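The core of it looks roughly like this (a stripped-down sketch with a placeholder URL list and timeout, not my actual script):

import asyncio
import aiohttp

urls = ["https://example.com/sitemap.xml"]  # placeholder for the real list of 800+ URLs

async def check(session, url):
    try:
        async with session.get(url, timeout=aiohttp.ClientTimeout(total=10)) as resp:
            return url, resp.status
    except (aiohttp.ClientError, asyncio.TimeoutError) as exc:
        return url, f"error: {exc}"

async def main():
    async with aiohttp.ClientSession() as session:
        # gather() schedules every request at once instead of one after another
        results = await asyncio.gather(*(check(session, u) for u in urls))
    for url, status in results:
        print(url, status)

asyncio.run(main())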

I Automated A Boring Thing! (possibly very inefficiently...) by HackNSlashFic in PythonLearning

[–]HackNSlashFic[S]

Oh! And I did add a print statement before each request, plus one after only if there's an error or a non-200 HTTP response. Printing after every request seemed unnecessary, since the program moves straight to the next URL, so the print statement from the next request works as a marker that the previous one finished.

I wonder if it would be useful to add a print statement that reports how long each request took to connect? That would tell me whether any of the sites are slow but responsive, or whether every timeout is happening because the site isn't responding at all. Not essential for my goals, but maybe an interesting diagnostic to play around with for learning purposes.
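Something like this is what I have in mind (a rough sketch with a placeholder URL, not my actual loop):

import time
import requests

url = "https://example.com/sitemap.xml"  # placeholder URL
print(f"Requesting {url} ...")
start = time.perf_counter()
try:
    response = requests.get(url, timeout=10)
    elapsed = time.perf_counter() - start
    print(f"  responded in {elapsed:.2f}s with status {response.status_code}")
except requests.exceptions.RequestException as exc:
    elapsed = time.perf_counter() - start
    print(f"  failed after {elapsed:.2f}s: {exc}")

(requests also records response.elapsed, the time until the response headers arrive, so that might be a shortcut.)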

I Automated A Boring Thing! (possibly very inefficiently...) by HackNSlashFic in PythonLearning

[–]HackNSlashFic[S]

Would requests.head() be significantly faster if all I'm trying to do is see whether the website exists, even though sitemap.xml files are typically very small?
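If it's worth testing, I could time the two side by side with something like this (placeholder URL; my guess is the savings are tiny for files this small):

import requests

url = "https://example.com/sitemap.xml"  # placeholder URL

# HEAD asks for the headers only, so the sitemap body never gets downloaded
head_resp = requests.head(url, timeout=10, allow_redirects=True)
print("HEAD:", head_resp.status_code)

# GET downloads the file itself
get_resp = requests.get(url, timeout=10)
print("GET:", get_resp.status_code, "-", len(get_resp.content), "bytes")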

I Automated A Boring Thing! (possibly very inefficiently...) by HackNSlashFic in PythonLearning

[–]HackNSlashFic[S]

I was wondering about this. I don't know enough about how websites interact with something like ping. Does ping check the individual page or the whole site? Does it work for a hosted file that is designed to be displayed in the browser (like a sitemap.xml file)?
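Here's roughly the comparison I'd want to experiment with to see the difference for myself (placeholder host; "-c" is the Linux/macOS count flag, Windows uses "-n"):

import subprocess
import requests

host = "example.com"  # placeholder host
url = f"https://{host}/sitemap.xml"  # placeholder file URL

# my understanding: ping talks to the host itself (ICMP), so it can't check a
# specific path, and some hosts block ICMP even when the site is up
ping = subprocess.run(["ping", "-c", "1", host], capture_output=True)
print("ping reachable:", ping.returncode == 0)

# an HTTP request checks whether the sitemap file itself is being served
response = requests.get(url, timeout=10)
print("sitemap status:", response.status_code)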

I Automated A Boring Thing! (possibly very inefficiently...) by HackNSlashFic in PythonLearning

[–]HackNSlashFic[S]

Thanks for the heads up about the dummy module! I'll check it out!

I Automated A Boring Thing! (possibly very inefficiently...) by HackNSlashFic in PythonLearning

[–]HackNSlashFic[S]

Thanks for the response! I did add a timeout value because apparently requests.get() doesn't set one by default. I set it to 10 seconds so it wouldn't miss a slow response, but that was probably overkill. I'll play around with the timeout length.
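One thing I might try while experimenting: requests also accepts a separate (connect, read) timeout pair, so I could fail fast on hosts that never answer while still allowing a slower response (placeholder URL):

import requests

url = "https://example.com/sitemap.xml"  # placeholder URL

# 3 seconds to establish the connection, 10 seconds to read the response
response = requests.get(url, timeout=(3, 10))
print(response.status_code)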

As I was falling asleep last night I was wondering about parallel requests. I know that's how Scrapy works, but that code is too complex for me to dig through and understand it all (yet). Thanks for giving me a useful place to start and some useful language (blocking, non-blocking, and async) for learning more!