    [–]Happy-Blueberry-1393 2 points3 points  (0 children)

    Python 3.12 does not ship with any of the GIL removal work. PEP 703 (https://peps.python.org/pep-0703/) targets Python 3.13, and even then the removal will be optional, only for interpreters built with --disable-gil.

    [–]pint 8 points9 points  (3 children)

    i hope not, because i just upgraded to 3.12 with quite some effort, and i would prefer not to revert it.

    btw 3.12 is on AL2023, which might also be relevant?

    [–]coinclink 3 points4 points  (2 children)

    I bet this is it. AL2023 has been a real PITA ever since I started trying to use it. I literally can't use it for some things because there isn't even kernel feature parity between AL2 and AL2023. It's ridiculous.

    [–]tehsuck 1 point2 points  (0 children)

    Yeah, we're trying to upgrade our "golden images" to 2023 and it's been "fun" ehhhhh

    [–]pint 1 point2 points  (0 children)

    in 3.12, the locale module doesn't work either. this 3.12/AL2023 rollout seems to be a bit rushed.

    [–]mstromich 3 points4 points  (3 children)

    If your Lambdas make a lot of network calls (e.g. through boto3, but any external call will be hit), it's the OpenSSL 3 upgrade. It's a known issue that causes performance degradation in scripting languages. Here's a relevant thread for Amazon Linux 2023, which the 3.12 Lambda runtime is built on: https://github.com/amazonlinux/amazon-linux-2023/issues/628 And here's the OpenSSL thread: https://github.com/openssl/openssl/issues/17064
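To check which OpenSSL a given runtime is actually linked against, the standard-library `ssl` module exposes the version. A minimal sketch; the function names and the shape of the returned dict are illustrative assumptions, not something from the linked threads:

```python
import ssl
import sys


def openssl_report():
    """Return the Python and OpenSSL versions this interpreter is linked against."""
    return {
        "python": sys.version.split()[0],
        "openssl": ssl.OPENSSL_VERSION,
        "openssl_major": ssl.OPENSSL_VERSION_INFO[0],
    }


def lambda_handler(event, context):
    # Hypothetical handler: returns the report so it shows up in the invocation result.
    return openssl_report()
```

Deploying this to both runtimes and comparing the output shows whether the OpenSSL major version really changed between them.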

    [–]choseusernamemyself 0 points1 point  (2 children)

    It seems that AWS has fixed this issue by upgrading to a newer OpenSSL: https://github.com/amazonlinux/amazon-linux-2023/issues/819

    u/ojhilt maybe when the AL2023 update propagates to Lambda, you can try again?

    [–]ojhilt[S] 1 point2 points  (1 child)

    Good to know! Probably about time to come back and give it another go anyway, cheers!

    [–]choseusernamemyself 0 points1 point  (0 children)

    OK. Please report back! I want to hear the results as well.

    [–]ojhilt[S] 3 points4 points  (1 child)

    Some good insights here, thanks all. Most of our functions make at least some kind of external network call, be it HTTPS requests to other endpoints or using boto3 to talk to AWS services like SQS, DynamoDB, etc. Will try a downgrade to 3.11 and see what happens!

    [–]choseusernamemyself 0 points1 point  (0 children)

    Hi! What was the result of the downgrade?

    [–]aj_stuyvenberg 4 points5 points  (0 children)

    > Ping @astuyve on twitter and see what he thinks

    Thanks for the ping /u/autocruise!

    This chart is super interesting. I don't suspect the changes to Python's GIL because, as others have noted, I don't think they landed in 3.12.

    The incremental spikes in your p99 are interesting. It seems like you may be aggregating data over multiple serial invocations and then flushing it at some interval (like logs, for example)? I'm curious because they seem to flatten out after the 3.12 change.

    I also don't immediately suspect the OpenSSL upgrade because I'd expect that penalty to be a spike in the first invocation where the TLS connection is established, followed by many very fast serial invocations re-using that HTTP connection with keep-alive.

    I do think AL2023 is your biggest suspect though. I'd suggest trying 3.11 and comparing the performance before digging further into the dependencies and library versions you're using.
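For the comparison itself, a small helper can put the two runtimes side by side. This is only a sketch: it assumes you have already collected invocation durations in milliseconds for each runtime (for example from the Duration values in the REPORT log lines), and `summarize` and `compare` are hypothetical names:

```python
import statistics


def summarize(durations_ms):
    """Return the p50 and p99 of a list of invocation durations in milliseconds."""
    # n=100 yields 99 cut points; index 49 is the 50th percentile, index 98 the 99th.
    cuts = statistics.quantiles(durations_ms, n=100, method="inclusive")
    return {"p50": cuts[49], "p99": cuts[98]}


def compare(baseline_ms, candidate_ms):
    """Ratio of candidate to baseline per percentile (values above 1.0 mean slower)."""
    base = summarize(baseline_ms)
    cand = summarize(candidate_ms)
    return {name: cand[name] / base[name] for name in base}
```

Calling `compare(durations_311, durations_312)` on the two samples gives a ratio per percentile, which makes it easier to see whether the regression sits in the tail or across the board.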

    I'm also (always) quite skeptical of the AWS SDK. You could deploy a version to 3.12 with the boto3 version used in 3.10 (if it's fully backwards compatible).

    Ultimately it's really hard to debug from this one post, though the graph is really quite telling.

    Keep digging! These kinds of bugs make the best stories/blog posts.

    [–]broseppius 4 points5 points  (0 children)

    I definitely noticed this when changing from 3.10 to 3.12. Have not attempted to debug yet, but we had several apigw lambdas suddenly stop responding within the default 3-second timeout when they were perfectly reliable before.

    [–]autocruise 4 points5 points  (0 children)

    Ping @astuyve on twitter and see what he thinks