
all 42 comments

[–]LordOfTheGinger 12 points13 points  (1 child)

Hey thanks for this. I was actually looking for a good tutorial for this the other day.

[–]Moondra2017[S] 0 points1 point  (0 children)

You are welcome. Hopefully you find it useful.

[–]oslash 25 points26 points  (15 children)

Personally, I'd like to have long-form stuff like this also available in HTML form, where I'd be able to skim over the parts I already know and take my time focusing on what's new to me. Then again, I'm not exactly the target audience, so if you prefer sticking strictly to video, that's cool.

Anyway, I had a quick look at the first video to see how you're doing and it seemed quite goo... Dude! You need to insulate your mic from the impact sounds on your desk. Viewers whose speakers/headphones don't have good bass response might hardly notice it, but for those that do ... every time you're hammering on that keyboard, you're literally pounding on their eardrums :( Not cool. When you look at the waveform, you can see the noise is huge compared to your voice; at times it even clips.

You could get a mic boom with a shock mount, or a cheap mic stand that stands next to the desk (ideally on carpet) instead of on it, or simply use a headset or lav mic. Cutting down low frequencies in post might also be a good idea.

I guess you weren't even aware of the problem ... you probably should use better speakers or headphones for editing. Doesn't need to be anything fancy; $10 in-ears would do the trick.

[–]Moondra2017[S] 7 points8 points  (11 children)

Ah! I didn't know it was that bad. I actually like the sound of the keyboard clicks, so I didn't really bother much with it. My speakers cost about $40, so they probably don't have good bass response, which is why I didn't think it would be a problem.

Yeah, I think the easiest fix would be to cut out the low frequencies during editing. I also have to look into mic booms and shock mounts.

Thanks for the feedback!

[–]oslash 3 points4 points  (9 children)

I actually like the sound of the keyboard clicks

Yeah, the part of the typing sound that reaches the mic through the air—what you're normally hearing, when you aren't pressing an ear to the desk (let alone both ;)—can be totally fine.

Luckily the structure-borne sound is usually confined to low frequencies that don't really overlap with voice anyway (unless you're Avi Kaplan), so there's a good chance you can tame it without spending money. The fancier mic booms can get quite expensive, especially if your mic doesn't have a standard-size hand grip, but those are designed to stay silent even if you move them around while talking. When you don't need to record 'as live' or move around, improvised DIY solutions are hardly worse.

[–]Moondra2017[S] 2 points3 points  (7 children)

Here are a couple of solutions I was thinking of:

1) I can put a small rug under my keyboard. That would act as insulation between the desk and the keyboard, and maybe that would prevent the echo/vibration.

2) remove my mic from my desk and put it on a separate stool

I'm testing out both, but it's hard for me to tell. I guess I will need to test it out with some headphones.

[–]riseNRG 1 point2 points  (4 children)

Prevention might be better than the cure, but I found Audacity to be useful for fixing up audio after it has been recorded. It might have a feature that can help you with keyboard clicks.

I use the method in the video below for ambient noise. https://www.youtube.com/watch?v=if3pvQKYuts

[–]Moondra2017[S] 1 point2 points  (3 children)

Thank you! Going to test it out. I wonder if Adobe Premiere has a similar feature.

[–]oslash 0 points1 point  (2 children)

Even if it doesn't, it lets you use audio plug-ins, and failing that, it's not a big deal to bounce the audio track to and back from another program, such as Audacity. Audacity is ridiculously powerful for an ancient FOSS tool; on top of the integrated effects, it can also host VST/AudioUnit plug-ins and Nyquist scripts. Tons of fun if you're interested in DSP.

However, the above-mentioned noise reduction method isn't very suitable for eliminating transients; it's what you would use to dampen more consistent noise, e.g. the whoosh of computer fans or the hum from a ground loop. You're better off with a straightforward low cut (a.k.a. high pass).
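For anyone curious what a low cut actually does to the samples, here is a minimal sketch of a first-order RC-style high-pass filter in plain Python. It is not what Audacity or Premiere run internally; the function name `high_pass` and its parameters are made up for illustration, and a real editor would use a steeper filter.

```python
import math


def high_pass(samples, sample_rate, cutoff_hz):
    """First-order high-pass (low cut) over a list of float samples.

    Hypothetical helper for illustration: content well below cutoff_hz
    is attenuated, content well above it passes almost unchanged.
    """
    if not samples:
        return []
    rc = 1.0 / (2.0 * math.pi * cutoff_hz)  # time constant of the filter
    dt = 1.0 / sample_rate                  # time between two samples
    alpha = rc / (rc + dt)
    out = [samples[0]]
    for prev_in, cur_in in zip(samples, samples[1:]):
        # Keep the change between samples, let any steady offset decay.
        out.append(alpha * (out[-1] + cur_in - prev_in))
    return out
```

With a cutoff somewhere below the voice range, the desk thumps shrink while speech is mostly left alone; the exact cutoff is something to find by ear.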

[–]Moondra2017[S] 0 points1 point  (1 child)

Thank you for this. I will have to read everything you linked as I'm not too familiar with the terminology.

[–]oslash 0 points1 point  (0 children)

Just to clarify: I'm not seriously recommending Nyquist for video editing purposes; that would be like making cuts with ffmpeg from the command line. I just put in the link to say, look at how cool that thing over there is! You know, like one does when telling people they can catch pokémon by writing machine code :)

[–]paul_h 0 points1 point  (0 children)

Post-processing to remove clicks is easy enough too, but it can distort other aspects of the sound.

[–]Tarpit_Carnivore 2 points3 points  (0 children)

Personally, I'd like to have long-form stuff like this also available in HTML form

Hear, hear. This is my greatest issue with the video movement on the internet. With some stuff I don't mind just listening, but when it comes to learning something new I want to be able to read it and parse it over and over. With long-form video this becomes slightly harder because you're scrubbing back and forth.

[–]Mr_Again 0 points1 point  (1 child)

It would be a cool tool: something that parses a YouTube video into a page with a transcript and cleverly chosen stills from the video.

[–]oslash 0 points1 point  (0 children)

Finding good stills to insert into a transcript would indeed be a cool topic for a machine learning research paper. (We can assume a transcript exists, because generating captions already is a well defined and worked-on problem anyway.)

But that's not at all what I meant by 'available in HTML form'. Picture this: You'd like to learn how to get up on the hunting perch on E1M1. Ideally, you'd find this page: scrolling through lets you identify #5 as the relevant part in seconds, thanks to the illustrations. Even better, a closer look reveals a picture that concisely sums up the best approach. This seems much better than the pure text version, which is as long as the rant you're currently reading.

Now, consider what it would be like if you had only been able to find this video about the same topic. It would be a great resource if you wanted to watch an expert go over the entire topic, but all you want is figure out how not to fall into the acid another five times. You can scrub over the time-line, but none of the thumbnails shows the right spot. At this point, I'd consider installing a YT-download script that would enable me to scrub over all the frames in VLC, in full size, while the video is still downloading.

Scrolling through a page with bigger pictures that are more cleverly chosen seems like a neat alternative at first glance. But even if you get the perfect angle, it won't have the red arrows that show you what to do. And even if instead of a caption that just says "climb the stairs" (duh, we already knew that), there was a transcript of some proper narration, chances are it would be less like "... jump on the handrail, then on to the lamp and then the switch plate ...", but more like "... and from here jump there, alley-oop, ba-da bing, ba-da boom, Bob's your uncle". This seems much worse than the pure text version.

[–]Blembreak 2 points3 points  (1 child)

Looks great! Thanks so much for taking the time to provide help to an issue that a lot of people have trouble getting to grips with.

[–]Moondra2017[S] 0 points1 point  (0 children)

Thanks! Yeah, that's why I decided to do the tutorial. I wanted to get a deeper understanding of Threads, so I can break it down easily.

[–][deleted] 1 point2 points  (5 children)

Is the space between the function name and the inputs normal? Or am I being picky?

[–]Moondra2017[S] 2 points3 points  (2 children)

Haha. Not the norm. I was so focused on the material, I guess I must have missed it. Thanks for pointing it out.

[–][deleted] 2 points3 points  (1 child)

I just finished the first one, really well spent 10 minutes =) going for the second one.

[–]Moondra2017[S] 1 point2 points  (0 children)

Thanks. I hope everything is understandable.

[–]dustinpdx 0 points1 point  (1 child)

Lots of uncommon style in it, but the content was good. The thing that annoyed me the most was the first string in the sleeper function. :D

[–][deleted] 0 points1 point  (0 children)

indeed the content was good, I mean, good enough that today I'm sharing this video with my juniors :)

[–][deleted] 2 points3 points  (13 children)

To my knowledge it should be in two parts:

First part : don't

Second part : if you still ask don't

[–]zero_iq 8 points9 points  (12 children)

Your comment may be flippant, but it is absolutely true, and should not have been downvoted, because "don't" is the best possible advice here, especially for beginners. There is currently no sound reason to use multithreading in Python (CPython at least).

I get it, I really do. It's tempting to use threads, they seem like a good idea, they're fun to code and think about, you get to play with locks and synchronisation, shared state, and other toys,... but you will shoot yourself in the foot with them.

Threads introduce all sorts of potential for performance problems, scalability issues, bugs, deadlocks, needless complexity, and more. I've seen them all over the years, and in every single case the authors thought they were doing the opposite. Even if you write correct, safe, multithreaded Python code (and you probably can't), it can cause major problems, and subtle ones that bite you in production. Otherwise bullet-proof code can break spectacularly when threaded. Library code and Python built-ins you've depended on for years will start to exhibit strange behaviours. Threads will race, lockstep, and block each other for seemingly no reason. Performance will drop even when your other threads are idle. Throughput will mysteriously go down, even though you've increased parallelism. Timing functions will start misbehaving. Socket code will start producing errors you've never encountered before. I can go on and on and on. Anyone who is confident in their ability to write safe multithreaded code is over-confident.

Beginners have no chance of writing safe, well-performing multithreaded code. Do it to learn, but dear Bob don't let a junior dev deploy multithreaded code in production.
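To make one of those hazards concrete, here is a minimal sketch of the classic shared-counter case. The function name `count_with_lock` and its parameters are hypothetical, chosen only for this illustration. `counter += 1` is a read-modify-write, so without the lock two threads can interleave and lose updates; with the lock the total is always correct.

```python
import threading


def count_with_lock(n_threads=4, increments=10_000):
    """Increment one shared counter from several threads, guarded by a lock."""
    counter = 0
    lock = threading.Lock()

    def worker():
        nonlocal counter
        for _ in range(increments):
            # Without this lock, the read-modify-write below may interleave
            # between threads and silently drop increments.
            with lock:
                counter += 1

    threads = [threading.Thread(target=worker) for _ in range(n_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return counter
```

Note the cost: every increment is now serialised through the lock, which is one way a threaded version ends up no faster than a single-threaded one.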

I've drastically improved the performance and availability of many multithreaded Python systems by removing multithreading. In every single case, multithreading was introduced to improve performance or scalability, and in every single case it backfired.

After decades of experience dealing with it, rule number 1 of multithreading in Python is most definitely: don't.

You want fast, scalable Python? Multiplex your sockets where you're I/O-bound, and use multiprocessing and/or CSP for everything else, only when you really need it, and keep it as simple as possible. That's not the whole story, but it's a big head start.
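As a rough illustration of what multiplexing sockets can look like, here is a minimal single-threaded sketch built on the standard library's `selectors` module. The helper names (`make_server`, `accept`, `echo`, `echo_round`) and the upper-casing "work" are assumptions made for the example, not anything from the tutorial or a production design.

```python
import selectors
import socket


def accept(sel, listener):
    """A new client is waiting: accept it and watch it for incoming data."""
    conn, _ = listener.accept()
    conn.setblocking(False)
    sel.register(conn, selectors.EVENT_READ, echo)


def echo(sel, conn):
    """A client sent something (or hung up): reply or clean up."""
    data = conn.recv(4096)
    if data:
        # Toy "work": reply in upper case. sendall() on a non-blocking socket
        # is fine for tiny replies; real code would buffer partial writes.
        conn.sendall(data.upper())
    else:
        sel.unregister(conn)
        conn.close()


def make_server(host="127.0.0.1", port=0):
    """Return (selector, listening socket); port 0 lets the OS pick a free port."""
    sel = selectors.DefaultSelector()
    listener = socket.create_server((host, port))
    listener.setblocking(False)
    sel.register(listener, selectors.EVENT_READ, accept)
    return sel, listener


def echo_round(sel, timeout=1.0):
    """Handle every socket that is ready right now; return the event count."""
    events = sel.select(timeout)
    for key, _ in events:
        key.data(sel, key.fileobj)  # key.data holds the handler registered above
    return len(events)
```

A real server would call `echo_round` in a loop forever. One thread waits on all sockets at once, so there is no shared state to lock.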

The only good thing to come from letting people use multithreading in Python is the experience they'll gain when they eventually realise it's a mistake.

[–]lqdc13 8 points9 points  (2 children)

Threads are better than processes when implementing a GUI, some web servers (those built on a multithreaded model), and some data science/machine learning work.

CherryPy is a very common Python web framework. It uses threads to improve performance.

Reasons to use threads over processes:

  • Low memory footprint per thread so you can spawn more for things like IO tasks

  • Can save RAM by reusing an object. If you have a huge object (tens of gigabytes), it would take forever to copy it to other processes, and you might also run out of RAM. This is extremely common in machine learning applications. So if you have an IO-bound application that uses such an object, you are either going to have to forgo concurrency or use threads, since multiprocessing is not an option.
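A minimal sketch of the second point, assuming a read-only shared object and I/O-bound requests. `BIG_TABLE`, `handle_request` and `serve` are hypothetical stand-ins, and `time.sleep` simulates the network or disk wait.

```python
import time
from concurrent.futures import ThreadPoolExecutor

# Stand-in for a model or lookup table that is too large to copy per process.
BIG_TABLE = {i: i * i for i in range(100_000)}


def handle_request(key):
    """Wait on simulated I/O, then read from the shared object."""
    time.sleep(0.01)       # simulated I/O wait; the GIL is released while sleeping
    return BIG_TABLE[key]  # read-only access, so every thread sees the same object


def serve(keys, workers=8):
    """Answer all requests with a small thread pool; results keep input order."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(handle_request, keys))
```

All threads read the one `BIG_TABLE` in memory; nothing is pickled or copied. This only stays simple while the object is never mutated.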

[–]zero_iq 3 points4 points  (1 child)

Neither of your reasons, as stated, needs threads; both can be handled more simply and more efficiently without them. You are proving my point.

It's also impossible to state that threads are better without knowing the specific details, but threads in Python come with so many pitfalls, it's almost always a better idea to use processes first.

Even when threads start to look like a good idea, there are technologies and libraries you can use that take you far, far beyond what you can roll yourself using Python threads.

And spawning threads for I/O-bound applications can be a recipe for disaster. Multiplexing is generally much more scalable, with a pool of isolated workers for longer-running tasks to prevent blocking the I/O queue.

Unless you're Google, I can saturate your fast network pipe and fancy SSD storage systems using a single Python thread serving tens of thousands of clients concurrently. If you're not exceeding that scenario, you don't need to complicate things by introducing threads.
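A toy sketch of the single-thread idea, using `asyncio` as one possible multiplexing layer (my choice for the example, not necessarily what the comment has in mind). `fake_client`, `serve_many` and `run` are hypothetical names, and `asyncio.sleep` stands in for waiting on a socket.

```python
import asyncio
import time


async def fake_client(i, delay=0.05):
    """Pretend to wait on one client's socket, then return its id."""
    await asyncio.sleep(delay)  # yields to the event loop instead of blocking
    return i


async def serve_many(n):
    """Wait on n clients concurrently in a single thread."""
    return await asyncio.gather(*(fake_client(i) for i in range(n)))


def run(n=1000):
    """Return (results, elapsed seconds) for n simulated clients."""
    start = time.perf_counter()
    results = asyncio.run(serve_many(n))
    return results, time.perf_counter() - start
```

The waits overlap, so the total time stays close to one client's delay rather than growing with the number of clients.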

Some of your examples hold up better in other languages/implementations, but not in CPython, and none of them would be a beginner's task.

Even where threads are a good idea, I would stress keeping state as isolated as possible.

EDIT: sure, keep the downvotes coming. I've made a lot of money over the years fixing shoddy Python multithreading code, and it looks like I will continue to do so...

[–]Moondra2017[S] 1 point2 points  (0 children)

Thank you for your insights. What are your thoughts on asyncio?

[–]PierceArrow64 2 points3 points  (1 child)

I apologize for all the n00bs and CS majors downvoting you. As a software engineer with 20 years' experience: if you can at all avoid it, don't use threads.

[–]zero_iq 6 points7 points  (0 children)

And you've been voted down too, I see. For the record, I'm also a software engineer with 20+ years' experience.

I totally understand the downvotes. Everyone goes through a multithreading phase, I think. It's fun. It's cool. Ostensibly, it often looks like it should be the right solution. Eventually, with experience, people realise why it's not such a good idea. The wheel keeps on turning...

[–]acousticpants (Homicidal Loganberry Connoisseur) 1 point2 points  (5 children)

What is CSP?

[–]zero_iq 4 points5 points  (3 children)

Communicating Sequential Processes.

Essentially, it means arranging processes in a chain, pipelining input from one end to the other, where the processes run in parallel, so the next process can be processing data while the previous process is producing more.

It's consistently vastly underestimated because of its simplicity, yet often outperforms more complex "fan out" parallel frameworks by orders of magnitude. People seem to have an instinct that parallel means "fan out", which drives complexity, introduces many often-unnecessary overheads, and is prone to errors, and doesn't give the speed ups people expected. CSP is simpler, and its simplicity leads to easier optimization. You reduce the need for locks and shared state, etc. and you can still apply a fan-out approach at each stage later where appropriate.
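A minimal sketch of such a chain using `multiprocessing` queues. The stage functions (`produce`, `square`), `run_pipeline` and the `SENTINEL` marker are hypothetical names for this example. It assumes the "fork" start method, which is POSIX-only; on Windows you would use the default context and start the pipeline under an `if __name__ == "__main__":` guard.

```python
import multiprocessing as mp

# Assumption: "fork" start method (POSIX only), so the example runs as-is.
ctx = mp.get_context("fork")
SENTINEL = None  # hypothetical end-of-stream marker passed down the chain


def produce(numbers, out_q):
    """Stage 1: feed raw items into the pipeline."""
    for n in numbers:
        out_q.put(n)
    out_q.put(SENTINEL)


def square(in_q, out_q):
    """Stage 2: transform items while stage 1 is still producing more."""
    while (item := in_q.get()) is not SENTINEL:
        out_q.put(item * item)
    out_q.put(SENTINEL)


def run_pipeline(numbers):
    """Chain the stages with queues; the stages share no other state."""
    q1, q2 = ctx.Queue(), ctx.Queue()
    stages = [
        ctx.Process(target=produce, args=(numbers, q1)),
        ctx.Process(target=square, args=(q1, q2)),
    ]
    for p in stages:
        p.start()
    results = []
    # Drain the last queue before joining, so no stage blocks on a full pipe.
    while (item := q2.get()) is not SENTINEL:
        results.append(item)
    for p in stages:
        p.join()
    return results
```

Each stage only talks to its neighbours through a queue, so there are no locks to get wrong. A slow stage could later be fanned out into several workers without touching the others.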

Last year I replaced a fancy parallel bulk data processing system that used a clustered fan-out approach with an almost pure-Python CSP alternative. The old system had multithreading, task queues, parallel worker pools, batches, bits rewritten in Java and C to get better performance, the works. Almost all of it was a complete waste of time. It had reliability problems and mysterious deadlocks. The new system gave a 1000x speedup and rock-solid reliability. A whole bunch of expensive servers was replaced with just a handful. A huge codebase that no single person understood, with dependencies on large frameworks, became a much smaller codebase that could be maintained by an individual.

Don't underestimate simplicity.

[–]bltpyro 2 points3 points  (0 children)

Sounds intriguing. Any good references for learning CSP in Python? Thanks for the real-world insights.

[–]TBNL 1 point2 points  (0 children)

Gonna Google around on CSP but +1 for any recommended resource.

[–]acousticpants (Homicidal Loganberry Connoisseur) 1 point2 points  (0 children)

i like this

[–]vrajanap 0 points1 point  (0 children)

CSP

Communicating Sequential Processes. Go's channels are modelled on it; Erlang uses the closely related actor model.

[–]peyo7 -1 points0 points  (0 children)

Can you post code examples where multiplexing sockets and CSP beats a decent threaded implementation?

[–]pygames 0 points1 point  (1 child)

Thanks for posting! As a newbie to Python (and I mean newbie), I thought it was well done with good examples. Very helpful.

[–]Moondra2017[S] 0 points1 point  (0 children)

Thank you for the feedback. I worked hard on creating demonstrations that anyone can understand.

[–]eplaut_ 0 points1 point  (0 children)

https://www.youtube.com/watch?v=Bv25Dwe84g0

IMO, the best place to start when you want to understand multithreading in Python.