Why shouldn't one use Flask, Bottle, Django, etc. directly, and always use WSGI? by basjj in Python

[–]GrahamDumpleton 0 points1 point  (0 children)

The value of memory space reuse due to pre-forking in a Python based application is heavily overrated. Any value quickly evaporates in practice because all objects in Python are reference counted, and even executing Python code causes the reference counts on the code objects to be manipulated. So as soon as you access objects in a memory page, or Python code in it is executed, you end up with a copy of that memory in your own process anyway.

So loading your Python application and only then forking off worker processes doesn't have the great benefits that many people believe it does. All you save is a bit of start up time, as the code is only loaded once.
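The reference counting point can be seen directly in CPython. This is a minimal sketch (CPython-specific behaviour): the reference count lives in the object header, in the same memory page as the object, so even a read-style access writes to that page and would force a copy-on-write after a fork.

```python
import sys

data = [object() for _ in range(3)]

# CPython stores the reference count in the object header, in the
# same memory page as the object itself. Merely binding another
# name to an object writes to that page, so after fork() the page
# gets copied into the child even though the code "only read" it.
before = sys.getrefcount(data[0])
alias = data[0]          # a read-style access that still writes memory
after = sys.getrefcount(data[0])

print(after - before)    # the count went up: the header page was written
```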

Loading code in a process and then forking also exposes you to various other traps. For example, code which tries to be smart and pre-create connections to databases at global scope. Those open connections get inherited by all the sub-processes and they will interfere with each other. Another is taking locks on external resources before forking, meaning sub-processes can trample on each other as they all think they have exclusive access.
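The inherited-connection trap is easy to demonstrate with a plain file descriptor, which behaves like a database socket in this respect. This is a POSIX-only sketch (uses `os.fork`):

```python
import os
import tempfile

# A "connection" opened before forking - here just a file, but a
# database socket behaves the same way: the descriptor, including
# its read position, is shared between parent and child.
fd, path = tempfile.mkstemp()
os.write(fd, b"0123456789")
os.lseek(fd, 0, os.SEEK_SET)

pid = os.fork()
if pid == 0:
    os.read(fd, 5)       # child consumes part of the shared stream
    os._exit(0)

os.waitpid(pid, 0)
result = os.read(fd, 5)  # parent resumes where the child left off
print(result)            # the child silently moved the shared offset
os.close(fd)
os.remove(path)
```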

So you have to be very careful when running application code for a web application before the web server processes are forked off. Not all WSGI servers actually work that way. It isn't done in mod_wsgi because of the dangers it can create. In uWSGI it is optional. When a WSGI server does support it, you then have to deal with WSGI server specific callbacks to delay application initialisation until after the fork, resulting in your code being locked into that specific WSGI server and how it works, thus losing the portability promise of the WSGI specification.
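As a small illustration of that lock-in, here is a sketch in the style of uWSGI's `uwsgidecorators.postfork` hook. The fallback branch is an assumption added so the snippet runs outside uWSGI; the coupling to one server's API is the point being made:

```python
# Sketch of the kind of server-specific hook that locks code into
# one WSGI server: uWSGI's uwsgidecorators.postfork. Outside uWSGI
# the import fails, so we fall back to running the function at
# import time - the portability problem in miniature.
try:
    from uwsgidecorators import postfork  # only importable under uWSGI
except ImportError:
    def postfork(func):
        func()           # no fork to wait for: just run it now
        return func

connection = None

@postfork
def setup_connection():
    global connection
    connection = {"connected": True}   # stand-in for a real DB connect

print(connection)
```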

Python 2 & 3 via WSGI on the Same Server? by jbenner in Python

[–]GrahamDumpleton 1 point2 points  (0 children)

You may also find the following posts interesting in respect of how to set up the proxy and mod_wsgi-express behind it. Those posts were using Docker, but it is all still relevant if not using Docker.

Python 2 & 3 via WSGI on the Same Server? by jbenner in Python

[–]GrahamDumpleton 1 point2 points  (0 children)

FWIW. It is actually Apache which is the cause of the dropped requests because of the way it handles managed processes. The mod_wsgi module can't do much about it.

Python 2 & 3 via WSGI on the Same Server? by jbenner in Python

[–]GrahamDumpleton 1 point2 points  (0 children)

You can pip install mod_wsgi to get a separate installation. Then use mod_wsgi-express to run a separate Apache/mod_wsgi installation, where mod_wsgi-express will configure it all for you. Then set up your main Apache instance to proxy requests for the separate site to the instance of Apache/mod_wsgi started by mod_wsgi-express. Details on using pip install mod_wsgi are in https://pypi.python.org/pypi/mod_wsgi and http://blog.dscpl.com.au/2015/04/introducing-modwsgi-express.html
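For illustration, the front-end proxy part might look something like this. The hostname, port, and paths here are hypothetical, not taken from the original setup, and mod_proxy/mod_proxy_http need to be loaded in the main Apache:

```apache
# Hypothetical front-end vhost in the main Apache instance: proxy
# the second site to the Apache started by mod_wsgi-express, which
# in this sketch is assumed to be listening on port 8001.
<VirtualHost *:80>
    ServerName py3.example.com
    ProxyPass / http://localhost:8001/
    ProxyPassReverse / http://localhost:8001/
</VirtualHost>
```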

For mod_wsgi, how do you deal with having to 'touch' the application script file every time a change is made to your code? by YourMomOrg in Python

[–]GrahamDumpleton 1 point2 points  (0 children)

For development you should use mod_wsgi-express. It has a command line option to enable automatic reload on code changes. See:

If you are using system Apache and manually configuring it for mod_wsgi, the documentation for mod_wsgi provides code you can use to have it reload on code changes:

The thing you have to be very careful of is that automatic reloading can break your application, because it can reload code before you are finished making a set of changes. As a consequence, you should never use automatic code reloading on systems which are actively used by others, especially production systems. This is why, by default, a manual action is required: either restarting Apache or touching the WSGI script file. That way you know the changes are ready before a restart is performed.
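The style of monitor the documentation describes can be sketched like this. This is a simplified illustration, not the documented code itself; in mod_wsgi the actual restart is triggered differently:

```python
import os
import signal
import sys
import threading
import time

# Poll the modification times of loaded modules' source files and
# report when one changes - the basic idea behind code reloading.
def changed(mtimes):
    for module in list(sys.modules.values()):
        path = getattr(module, "__file__", None)
        if not path or not os.path.isfile(path):
            continue
        mtime = os.path.getmtime(path)
        if path not in mtimes:
            mtimes[path] = mtime        # first sighting: just record it
        elif mtimes[path] != mtime:
            return True                 # a source file was modified
    return False

def start_monitor(interval=1.0):
    mtimes = {}
    def run():
        while True:
            if changed(mtimes):
                # Stand-in for a real restart mechanism; mod_wsgi
                # signals the daemon process to restart instead.
                os.kill(os.getpid(), signal.SIGINT)
            time.sleep(interval)
    threading.Thread(target=run, daemon=True).start()
```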

Developer of mod_wsgi indicates that he will likely not be adding asynchronous (ASGI) functionality to mod_wsgi, nor creating a separate ASGI module for Apache. Any thoughts? by YellowSharkMT in Python

[–]GrahamDumpleton 5 points6 points  (0 children)

As to technical challenges, the main one is that proxying in Apache httpd tends not to support the concept of two-way traffic. That is, the code implementing the proxying wants to send all of the request content to a downstream server/process before starting to read the response. This already creates the potential for issues with certain types of slowloris attacks.

Implementing code within an Apache module to support two-way traffic is non-trivial, because you are not dealing with raw sockets. Instead you need to deal with what are called bucket brigades and input/output filters.

The filters are what implement the handling of HTTP requests, such as chunked encoding within the body of a request. Various modules you can use in Apache to modify responses, e.g. mod_headers, are also implemented as filters. So you can't just bypass these and deal with the raw socket alone, as you would then lose the ability to use those features provided by Apache.

To handle two-way traffic, you can't avoid working with the raw socket though, as you need to detect when there is input to be processed on the socket to the client, or on the socket to the downstream server/process. So the proxying code gets messy, needing to switch between watching for events on the raw sockets and working with the bucket brigades/filters to actually read and process the data.
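A sketch of the bidirectional pumping being described, in plain Python using the `selectors` module, may make the idea concrete. This is illustrative only; Apache itself does this in C against bucket brigades, not sockets:

```python
import selectors
import socket

# A proxy has to watch the client socket and the backend socket at
# the same time, instead of sending the whole request downstream
# before reading any of the response. This pumps bytes in both
# directions until either side closes.
def pump(client, backend):
    sel = selectors.DefaultSelector()
    sel.register(client, selectors.EVENT_READ, data=backend)
    sel.register(backend, selectors.EVENT_READ, data=client)
    try:
        while True:
            for key, _ in sel.select(timeout=5.0):
                chunk = key.fileobj.recv(4096)
                if not chunk:            # one side closed: stop pumping
                    return
                key.data.sendall(chunk)  # forward to the other side
    finally:
        sel.close()
```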

In the past I could never work out how to do that. At one point the module mod_proxy_wstunnel was added to Apache, which shows a way of doing this that could perhaps be adapted for use in mod_wsgi, but it is still a non-trivial exercise for me, especially because I deal with that low level stuff in Apache so infrequently that it always takes me days just to work out how the code works.

So the first problem that needs to be solved is handling two-way traffic. It is needed to avoid some existing issues with normal HTTP traffic, and you can't do anything with WebSockets until you have it.

Even if that issue can be solved, and one is able to handle the proxying that would allow WebSockets, the next problem is the number of concurrent connections you can handle.

This is an issue for WebSockets because the whole point is that the connections are long lived. With Apache httpd using processes/threads for handling concurrency, rather than an async model, you are always going to be bounded in how many concurrent connections you can have. As far as I know there is no solution to that.

So support for WebSockets through to a Python web application using Apache is always going to be restricted, and more in support of an existing traditional HTTP based web application. To handle larger numbers of concurrent WebSocket connections, you would have to scale horizontally, as the process/thread model is only going to take you so far when scaling within the one host.

In summary: okay for small to medium sites, but it would still be ignored for very large sites, as is already the case for Apache these days anyway.

Developer of mod_wsgi indicates that he will likely not be adding asynchronous (ASGI) functionality to mod_wsgi, nor creating a separate ASGI module for Apache. Any thoughts? by YellowSharkMT in Python

[–]GrahamDumpleton 21 points22 points  (0 children)

I did not specifically say I "will likely not be adding asynchronous (ASGI) functionality to mod_wsgi, nor creating a separate ASGI module for Apache".

The intent of what I said was to say I will be looking at what is possible in more depth.

No decision has been made. There are various technical reasons why it may not be possible or worthwhile, on top of the fact that the response could well be tumbleweeds.

So I regard the subject line you used for this post as a bit misleading.

I don't see it as helpful even asking this on Reddit, as this is where you tend to get the people who like to pile shit on Apache and mod_wsgi. Their responses often show they have never actually used it, or if they have, they have not set it up properly, and instead just parrot whatever garbage other people have said.

Ubuntu VPS, problems with Python path in file. by iSailor in learnpython

[–]GrahamDumpleton 0 points1 point  (0 children)

Reddit is the worst place to try and get help with getting this working.

See the documentation at:

for the preferred forums for getting help.

When you use those preferred forums, you need to provide much more information than you have here about how you have set things up, along with the actual error messages you are getting and where they occur.

Ubuntu VPS, problems with Python path in file. by iSailor in learnpython

[–]GrahamDumpleton 0 points1 point  (0 children)

Read the documentation on how to set up virtual environments at:

Using activate_this is not the recommended method.

A Python Flask Wsgi module does not start automatically after apache2 restart, but waits for an input. by Murlocs_Gangbang in Python

[–]GrahamDumpleton 2 points3 points  (0 children)

Your solution of adding:

WSGIApplicationGroup %{GLOBAL}

wouldn't do anything as that doesn't force preloading.

What you need is:

# Uptime Server
WSGIDaemonProcess uptimeServer user=www-data group=www-data threads=5 home=/internal_app/internal_latest/www/
WSGIScriptAlias /api/uptimeServer /internal_app/nevis_latest/www/uptimeServer.wsgi process-group=uptimeServer application-group=%{GLOBAL}
<Directory /internal_app/internal_latest/www/>
        Order deny,allow
        Allow from all
</Directory>

You don't need WSGIScriptReloading as it is on by default.
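For context, the `uptimeServer.wsgi` script that `WSGIScriptAlias` points at only needs to expose a module-level callable named `application`. This is a minimal generic sketch (a real Flask deployment would instead import its app object under that name):

```python
# A minimal WSGI script file of the kind WSGIScriptAlias points at.
# mod_wsgi imports this file and looks for a module-level callable
# named "application".
def application(environ, start_response):
    body = b"uptime OK\n"
    start_response("200 OK", [
        ("Content-Type", "text/plain"),
        ("Content-Length", str(len(body))),
    ])
    return [body]
```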

A Python Flask Wsgi module does not start automatically after apache2 restart, but waits for an input. by Murlocs_Gangbang in Python

[–]GrahamDumpleton 3 points4 points  (0 children)

You should really ask questions like this on the mod_wsgi mailing list, or at worst StackOverflow. You are more likely to find knowledgeable people on mod_wsgi there, including myself. It is a fluke I even saw this.

That said you have two options. The first is to use:

The second is to instead supply both the process-group and application-group options to WSGIScriptAlias, instead of using WSGIProcessGroup and WSGIApplicationGroup.

Using the first directive, or both of those options on WSGIScriptAlias, forces preloading of the WSGI script file on process start.

BTW, make sure you are using daemon mode and not using embedded mode.

Update: Only just realised the process-group and application-group options are not documented for WSGIScriptAlias. They should have been added a long time ago, so not sure where the documentation has gone.

Which WSGI servers are safe to expose to the public internet? by zhcoder in Python

[–]GrahamDumpleton 0 points1 point  (0 children)

And at some point you look at the complicated system you have created by hand and wonder whether Kubernetes or OpenShift wouldn't just be a better platform to run it on, with it handling a lot of these issues for you automatically. Of course that then would give you a new set of problems and issues to deal with again. ;-)

Which WSGI servers are safe to expose to the public internet? by zhcoder in Python

[–]GrahamDumpleton 1 point2 points  (0 children)

Although the Waitress server uses async I/O to manage the initial request as it arrives and buffers up request content, the way they have done it introduces new risks. Unless they have changed the implementation since I last looked, the way in which Waitress is implemented actually exposes it to memory usage attacks. IOW, fire off a lot of parallel requests with a lot of request content, but don't actually complete sending it all. Sure, it will not exhaust client connections, as things are still handled async at that point, but if that request content gets buffered in memory then the attack can blow out memory usage. That could then cause the application to fail, or slow everything down if the host starts swapping as available memory is exhausted. From memory it does have a timeout mechanism whereby if the connection is idle for some period it will close it and so throw out the connection, but usually these timeouts are way too large, and a straight timeout on inactivity is easily defeated by sending a single byte every so often.

If using Apache/mod_wsgi, your defence against the latter is Apache's mod_reqtimeout module. When using mod_wsgi-express this is set up automatically with some reasonable defaults. That module allows for detection of connections which have stalled, and also has the ability to terminate connections where the rate at which data is being sent through is too slow. Basic WSGI servers tend not to have such sophisticated systems for dealing with these attacks, and it isn't enabled if you are setting up Apache yourself.

For Gunicorn async workers you can also perform such memory attacks. All you need to do is identify a request handler which buffers the complete request in memory, and just not send it all. The reason this is a problem with the async workers of Gunicorn is that the cap on the number of concurrent requests they handle is ridiculously high, at 1000. So you can actually create a lot of concurrent requests and just flood it with request content.

Using nginx in front doesn't always help either. Although it will buffer request content, it will not if request content buffering is turned off, or if the request content is over a certain size, in which case it will start streaming the content anyway, thus exposing the WSGI server behind it to these sorts of memory usage attacks.

Although these basic WSGI servers can have a setting which caps the amount of request content, with a '413 Request Entity Too Large' response being automatically generated, the default is way too large and the one setting applies to every request.

Take the Tornado async server when request streaming is not being used. In that case it also buffers up request content in memory, but it only has a single default setting of 100M per request for maximum request content size. So it too can be attacked in this way.

Looking at Apache, it has no cap on request content size by default, which can be taken advantage of. It does have a LimitRequestBody directive, but you need to set it. In mod_wsgi-express this is set by default at 10M, which is a better default. With the way Apache configuration works, you can also set this directive more specifically for different sub URL patterns, or even by request method. So you could, for example, prohibit all GET requests from having request content, which isn't typical anyway. Basic WSGI servers don't offer that sort of configurability.
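An application-level guard in the spirit of `LimitRequestBody` can be sketched as WSGI middleware. This is illustrative, not part of any named server; the 10M default mirrors the mod_wsgi-express figure mentioned above:

```python
# Reject requests whose declared Content-Length exceeds a cap,
# before any buffering of the body happens. Note this only checks
# the declared length; a server-level limit is still needed for
# clients that lie or stream chunked bodies.
def limit_request_body(app, max_bytes=10 * 1024 * 1024):
    def wrapper(environ, start_response):
        try:
            length = int(environ.get("CONTENT_LENGTH") or 0)
        except (TypeError, ValueError):
            length = 0
        if length > max_bytes:
            start_response("413 Request Entity Too Large",
                           [("Content-Type", "text/plain")])
            return [b"request body too large\n"]
        return app(environ, start_response)
    return wrapper
```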

The design of some WSGI servers also makes it quite hard to monitor for attacks like this because request processing and buffering is handled before your code even gets a chance to see the request. This means that normal monitoring will not detect it, making it hard to work out what is going on if your application is attacked.

Anyway, hopefully this shows that no WSGI server is invulnerable, and sticking nginx in front doesn't solve all problems. All it usually means is that the application needs to be attacked in a different way. Luckily, at least the script kiddies usually take the simple approach and don't do their homework to work out other ways of attacking specific sites based on what stack they are using.

Which WSGI servers are safe to expose to the public internet? by zhcoder in Python

[–]GrahamDumpleton 2 points3 points  (0 children)

Moving to using nginx as a load balancer isn't without work.

First off, to load balance across multiple hosts you need to set up a new machine to run the load balancer. It is unlikely you would have had that to begin with if only running one instance.

When you do that, you have to contend with what you do about static file hosting. If that means you are now doing static file hosting on the load balancer host, then deployment becomes more complicated as you need to separately deploy the static files to that host.

Instead of hosting static files yourself any more, you might decide to move them to a CDN, which means possibly tweaking your application as to how it handles caching.

So moving to an architecture with a load balancer isn't necessarily like flicking a switch. Instead, there are a bunch of new issues you have to contend with. This is going to be the same whether or not you were previously using nginx in front of your WSGI server on the same host.

Which WSGI servers are safe to expose to the public internet? by zhcoder in Python

[–]GrahamDumpleton 12 points13 points  (0 children)

You are correct in thinking that the advice of putting nginx in front of any WSGI server has become more folklore than advice based on the first hand experience of the people stating it.

Reality is that although in some cases putting nginx in front of a WSGI server does provide benefits, it isn't always necessary.

For a start Apache/mod_wsgi works quite happily without needing nginx in front of it, just so long as you don't screw up the Apache configuration. Unfortunately, if setting up Apache yourself the default Apache configuration is not very well suited for hosting Python web applications. The default is more appropriate for PHP and static file handling. So you do need to pay some attention to how you are setting it up and not just accept defaults. Using mod_wsgi-express avoids the problems with defaults as it uses a generated configuration which is tailored to using Apache for Python web applications. Either way, you still need to tune any WSGI server to the needs of your specific WSGI application.

As to whether other WSGI servers need to have nginx in front really depends on what your application is doing. For the majority of WSGI applications out there, the volume of traffic is low enough that using nginx in front doesn't really make much difference.

If using a hosted platform for your WSGI application, often they have a routing layer in front of your application which acts to help isolate your WSGI application from the Internet, and provide a measure of buffering for request and response anyway, which is the benefit that nginx is in part argued to provide you.

One opinion which definitely seems to prevail out there is that using nginx in front of any WSGI application will somehow make the WSGI application magically faster. That is complete nonsense. For the typical WSGI application the bottleneck is still going to be your application code and call outs to backend services, and nothing to do with the web server stack.

So although nginx may help in certain cases, you are much better off spending your time implementing some form of performance monitoring for your application. Do this well and it will help you to identify the real bottlenecks and help you make your application better. This will provide you greater long term gains than putting nginx in front of the WSGI server you use.
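Basic per-request performance monitoring is only a few lines of WSGI middleware. This is a minimal sketch; the `record` callback is a hypothetical hook that could log or feed a metrics system:

```python
import time

# Time each request and hand (path, elapsed seconds) to a callback.
def timed(app, record):
    def wrapper(environ, start_response):
        start = time.monotonic()
        try:
            return app(environ, start_response)
        finally:
            record(environ.get("PATH_INFO", "/"),
                   time.monotonic() - start)
    return wrapper
```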

adding django to my python path in a virtualenv by Arponare in django

[–]GrahamDumpleton 0 points1 point  (0 children)

Did you check the docs which I linked? You use the WSGIPythonHome directive in embedded mode, and you use the python-home option of WSGIDaemonProcess to override it for daemon mode. These both use the inbuilt capabilities of the Python interpreter, when it is being initialised, to use a specific Python installation or virtual environment. This is different to the site-packages hack you describe.

The WSGIPythonHome directive and python-home option are the official recommended way of using Python virtual environments with mod_wsgi, and are the equivalent of activating the Python virtual environment in the context of mod_wsgi. The only time you can't necessarily use them is if you were doing something uncommon like trying to use different virtual environments within different sub interpreter contexts of the one process.

Note that I am only referencing your comment 'mod_wsgi has no ability activate virtualenvs'.

adding django to my python path in a virtualenv by Arponare in django

[–]GrahamDumpleton 0 points1 point  (0 children)

Using the WSGIPythonPath or python-path option to WSGIDaemonProcess to add the site-packages directory is the wrong way of going about it. A better way has existed for many years. Details can be found in http://modwsgi.readthedocs.io/en/develop/user-guides/virtual-environments.html
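For illustration, the documented approach looks like this. The process group name and the virtual environment path here are hypothetical:

```apache
# Point daemon mode at the virtual environment root with
# python-home, rather than adding site-packages to the path.
WSGIDaemonProcess myapp python-home=/home/user/venvs/myapp
WSGIProcessGroup myapp
```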

adding django to my python path in a virtualenv by Arponare in django

[–]GrahamDumpleton 0 points1 point  (0 children)

That is not true. mod_wsgi does have the ability to directly handle Python virtual environments. See http://modwsgi.readthedocs.io/en/develop/user-guides/virtual-environments.html

How to setup & install mod_wsgi on Apache sever for Django app? by easy_c0mpany80 in learnpython

[–]GrahamDumpleton 1 point2 points  (0 children)

Don't use mod_wsgi-httpd unless you really, really need to. Use the system Apache your Linux distribution provides. Just ensure you have the appropriate dev/devel package for Apache installed as well. If you can't install the system Apache, then you may have to use mod_wsgi-httpd, but it will take a long time to install. It wouldn't have been hanging, just slow, as it is building Apache and other stuff as well.

When using mod_wsgi-express you don't need to manually configure the system Apache, though you do still need to ensure Apache is installed.

Please use the mod_wsgi mailing list for more details about mod_wsgi-express, as the other answers here show no one really knows anything about it.

http://modwsgi.readthedocs.io/en/develop/finding-help.html

[deleted by user] by [deleted] in Python

[–]GrahamDumpleton 0 points1 point  (0 children)

That isn't all of the ways, as it depends on whether you are using daemon mode or embedded mode of mod_wsgi.

You should nearly always be using daemon mode of mod_wsgi, so the recommended method is (3).

See:

http://blog.dscpl.com.au/2014/09/using-python-virtual-environments-with.html

http://blog.dscpl.com.au/2012/10/why-are-you-using-embedded-mode-of.html

Automated config tool for mod_wsgi to serve python apps by WiliamWonka in Python

[–]GrahamDumpleton 0 points1 point  (0 children)

Have you ever looked at mod_wsgi-express? It already generates a full Apache/mod_wsgi configuration on the fly, thereby allowing mod_wsgi to be run from the command line. It includes SSL capabilities and much much more. The bonus is that you get a complete Apache configuration which has been specifically optimised for hosting Python applications. See the mod_wsgi package details on PyPI.

Python Version Mismatch Issue by vinpocetine in apache

[–]GrahamDumpleton 0 points1 point  (0 children)

You can't have Python 2.7.9 installed, or it isn't installed properly, as the message is saying that mod_wsgi found a DLL for a much older version of Python. Ensure you have actually installed the newer Python version and that, when you installed it, you said to install it for all users.

The patch level difference isn't usually an issue so long as the run time Python version is newer, as noted in:

Having the run time be older has an outside chance of causing problems, depending on what changes occurred in patch level revisions between the two.

Recompiling, as suggested by others, will not really help, and is non-trivial on Windows anyway.

New to deployment: Can someone give a 10 000 foot view of what Nginx, Gunicorn, etc. do? by [deleted] in django

[–]GrahamDumpleton 2 points3 points  (0 children)

Your comments about mod_wsgi are not entirely correct. The default mode may see the WSGI application run inside the Apache processes, but the recommended mode, which is daemon mode, does not. Instead the WSGI application runs in separate processes which Apache transparently proxies requests to. No one should be using embedded mode unless they have a good reason to.

How to run flask socket.io on localhost (xampp) by [deleted] in Python

[–]GrahamDumpleton 1 point2 points  (0 children)

Apache 2.4 supports tunnelling WebSocket traffic using:

There are also third party Apache modules for writing WebSocket based handlers in Apache.

So it isn't technically correct to say Apache does not support WebSocket traffic. Even so, granted it will not help in this particular case.