Let's Solve the Gunicorn Problem
Preface
This blog post is targeted at backend developers who are already doing backend work in Python. If you're new to backend development and are looking at Python, first of all, DON'T. Please walk away to literally anything else.
Python is an amazing language for dynamically linking C libraries to make it easy for science folks to work with data, ML and all that. But its concurrency story is so thin it barely qualifies as an afterthought. If you don't want to do static languages, just go with JavaScript.
I've personally done Python backend work for half a decade, so this is coming from personal experience.
But it's not all bad. As I said, Python is great at abstracting over native libraries, and that's only gotten better recently thanks to amazing tooling for building Python abstractions with Rust or Zig.
Intro
Cool. So what's gunicorn? If your background is anything non-Python, be it Node.js or Go, the idea of having to run a separate server just to run your code, apart from the actual reverse proxy web server, may be alien to you, and that makes sense.
Servers like gunicorn or uvicorn solve the following problem.
Web traffic usually involves calling handler functions, or API business logic, for each user request. These are independent bits of processing or queries, so we need to be able to handle the requests concurrently.
Before containers and Kubernetes, the way things usually worked is you would have a server, and you would run monolithic applications as Linux services, i.e. processes running on the server. Since Python doesn't have a good threading model, the only way to actually achieve concurrency was to have multiple worker processes. Think of it as having multiple pods, in today's terms. Gunicorn does that.
It supports various worker models, spins up processes and threads as per configuration, and routes requests to these workers to achieve concurrency. In a way, gunicorn/uvicorn was way ahead of its time. It's like having a pod autoscaler, as a vague Kubernetes analogy.
What happens today (in python)
What we talked about so far was a totally valid approach, but thanks to microservices, most folks have now scaled to hundreds of microservice containers, which can be great. But if you then use the same process managers that were designed for bare metal servers, things go crazy.
If you have a uvicorn pod with 4 workers, that's just not right. Imagine your autoscaler in Kubernetes detects that there aren't enough resources and scales by adding a replica. Now you just blocked a gigabyte of memory for no reason.
Here's a blog post going into more detail on this one.
What should happen
If you look at most modern languages that come with green threads, either in the standard library (Go) or through de facto default libraries (Rust), they provide robust concurrency models.
Each of these containers is a single process that takes advantage of the available compute through a combination of OS threads and green threads.
So what are we building here?
I recently posted on my blog about digging through HTTP and building a basic web framework on top of it. Here's the link. At the end of that, we had a small web framework that used tokio for concurrency, spinning up green threads for each request. There's pyo3, so why not put two and two together and bring that to Python?
Previous attempts
One way would be to start a Rust server that loads a shared, mutex-guarded map of routes to Python functions, which can be invoked when the route is hit.
Here's a simple codebase I wrote some time back, called axumapi. Yes, the end result is a fast, lightweight HTTP framework, but it would take a lot of effort and time to make something like this feature complete.
There are projects like robyn, which have gotten pretty far.
But the problem with Python developers in general is that they are shielded from what's actually happening to such an extent that they end up very dependent on their framework of choice and can't even imagine using something else. So we need something that can be a drop-in replacement for the process manager/server instead.
So what’s wrong with gunicorn?
Gunicorn is great, until it isn't. Why? Because you don't know how it works.
I can't emphasize enough how many times I've hit unpredictable bugs caused by weird edge cases in gunicorn or uvicorn. It's open source and we can always go through the source to figure out what's happening, but that only works if there's a repeatable traceback to follow.
A process manager is a pretty tricky problem, and subtle failures can surface from a misbehaving dependency or something odd happening in your API handlers.
Let’s take a look at what’s available
Here are one-liner descriptions of the various worker models gunicorn provides.
Fair warning, these are very opinionated; if you don't agree with them, that's okay.
- gunicorn sync: runs prefork processes. Not useful in containers.
- gunicorn gthread: uses a Python thread pool. Good for synchronous WSGI frameworks like Django ✅ If you're in this camp, you're fine as long as your RPS requirements are low enough, or you can scale compute vertically.
- gunicorn gevent: uses gevent, the library that brings green threads to Python. The right approach for containers, but needs code changes and monkey patching (see the snippet after this list).
- gunicorn uvicorn: uses uvicorn workers, which use uvloop as the asyncio event loop. Great for async frameworks like FastAPI ✅ If you're in this camp, you're good. Async Starlette plus uvicorn is pretty good for most cloud-native use cases.
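For the gevent case, the code change is typically a monkey-patching call at the very top of your app, before anything else imports sockets or threading. Here's a minimal sketch, with made-up module and handler names:

# wsgi_app.py -- hypothetical module; gevent patching must run before other imports
from gevent import monkey
monkey.patch_all()

import time

def application(environ, start_response):
    # with patch_all(), this blocking-looking sleep yields to other green threads
    # instead of stalling the whole worker
    time.sleep(0.1)
    start_response("200 OK", [("Content-Type", "text/plain")])
    return [b"hello from a gevent worker"]

You'd then run it with something like gunicorn -k gevent wsgi_app:application, which is exactly the kind of per-app ceremony the other worker classes don't need.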
What we're gonna build is a simplified form of something called granian, which provides a WSGI/ASGI interface built in Rust.
What makes this possible is something called the “web server gateway interface”
So what’s WSGI?
There was a need for a standard interface so that different implementations of process managers and servers could work properly with all Python web frameworks, say Django, Flask, FastAPI, etc.
It took quite a few prompts to get GPT to simplify it, that's how convoluted the various implementations and documentation are, but at its core it's a very nice interface.
A WSGI application is basically a Callable with two arguments:
def application(environ, start_response):
start_response('200 OK', [('Content-Type', 'text/plain')])
return [b"Hello, WSGI!"]
- The environ dictionary contains request metadata like the path, headers, etc.
- start_response is a function that receives the status and headers from the framework and writes them into shared state that the server reads back when building the response.
It’s a little tricky, but you’ll get the hang of it once you see the code
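To make the contract concrete, here's a rough pure-Python sketch of what a server does with that callable. This is a toy illustration of the interface, not how gunicorn (or our Rust server) is actually structured:

# toy WSGI driver: build environ, capture what start_response receives, collect the body
def run_wsgi_app(application, environ):
    captured = {}

    def start_response(status, response_headers, exc_info=None):
        # the framework hands us the status line and header list; we just capture them
        captured["status"] = status
        captured["headers"] = response_headers

    # the app returns an iterable of byte chunks making up the body
    body = b"".join(application(environ, start_response))
    return captured["status"], captured["headers"], body

# e.g. run_wsgi_app(application, {"REQUEST_METHOD": "GET", "PATH_INFO": "/"})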
Let’s start coding then
We need a Python library with a serve command. Let's start by creating a new pyo3 project with maturin.
Refer to my pyo3 blog for more details
Dependencies
I'll start by adding our dependencies: hyper, tokio and pyo3, along with a few tracing crates.
hyper = { version = "0.14.28", features = ["full", "server"] }
pyo3 = { version = "0.23.4", features = ["extension-module"] }
tokio = { version = "1.43.0", features = ["full"] }
tracing = "0.1.40"
tracing-opentelemetry = "0.27.0"
tracing-subscriber = { version = "0.3", features = ["env-filter"] }
opentelemetry = "0.26.0"
opentelemetry-otlp = { version = "0.26.0", features = ["default", "tracing"] }
opentelemetry_sdk = { version = "0.26.0", features = ["rt-tokio"] }
tracing-test = "0.2.5"
CLI command
Let's start by defining a start function in lib.rs. The actual serving function will be async, so we need to start a tokio runtime to be able to call it.
use pyo3::prelude::*;
use tokio::runtime::Runtime;

#[pyfunction]
fn start(py: Python, path: &str, port: u16) -> PyResult<()> {
    // release the GIL while the server runs; workers re-acquire it per request
    py.allow_threads(|| {
        tokio::task::block_in_place(move || {
            // build a tokio runtime and block on the async serve() future
            let rt = Runtime::new().expect("failed to build tokio runtime");
            rt.block_on(async {
                serve(&path, port).await.unwrap();
            });
        });
        Ok(())
    })
}
#[pymodule]
fn servers(m: &Bound<'_, PyModule>) -> PyResult<()> {
tracing_subscriber::fmt::init();
m.add_function(wrap_pyfunction!(start, m)?)?;
Ok(())
}
The way pyo3 works is that it includes the compiled .so shared library in the wheel file, along with a Python package of the same name.
We also need a command that users can run once they install the library. We can do that by overriding the package's __init__.py to include a serve function, and registering it as an entrypoint in pyproject.toml.
from .servers import start
from pathlib import Path
import argparse
import sys
def serve():
sys.path.insert(0, str(Path.cwd()))
parser = argparse.ArgumentParser(description="Start the server.")
parser.add_argument("path", type=str, help="Path to the server directory")
parser.add_argument("port", type=int, help="Path to the server directory")
args = parser.parse_args()
sys.exit(start(args.path, args.port))
We need to import the WSGI application, but the thing is, our script is going to be placed in the bin/ directory of the Python virtual environment; that's what the sys.path insert is for.
You could of course add more arguments. Keep it simple though, or we'd end up with the same thing we're trying to replace xD
Then we add it to pyproject.toml so it gets included in our wheel
[project.scripts]
serve-rs = "servers:serve"
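Once installed, usage is just the entrypoint plus a module:app path and a port. Assuming a hypothetical app.py sitting in your working directory:

# app.py -- a made-up minimal WSGI app to point the server at
def application(environ, start_response):
    start_response("200 OK", [("Content-Type", "text/plain")])
    return [b"served from rust\n"]

you'd run it with something like serve-rs app:application 8000, matching the two positional arguments we just defined.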
Server
We'll soon define a WSGIApp wrapper for handling the requests. Before that, let's define our hyper server.
use std::convert::Infallible;
use std::net::SocketAddr;
use std::sync::Arc;

use hyper::service::{make_service_fn, service_fn};
use hyper::Server;
use pyo3::exceptions::PyValueError;
use pyo3::prelude::*;
use tokio::signal;

use crate::pkg::wsgi::WSGIApp;

pub async fn serve(path: &str, port: u16) -> PyResult<()> {
    // the path comes in as "module:app", e.g. "myproject.wsgi:application"
    let (wsgi_module, wsgi_app) = if let Some((module, app)) = path.split_once(':') {
        (module, app)
    } else {
        return Err(PyValueError::new_err("Invalid path format"));
    };
    // import the WSGI callable once, wrapped in an Arc so every request task can share it
    let app = Arc::new(Python::with_gil(|py| {
        WSGIApp::new(py, wsgi_module, wsgi_app, port)
    })?);
    let addr = SocketAddr::from(([127, 0, 0, 1], port));
    let make_svc = make_service_fn(move |_| {
        let app = app.clone();
        async {
            Ok::<_, Infallible>(service_fn(move |req| {
                let app = app.clone();
                async move { app.handle_request(req).await }
            }))
        }
    });
    println!("WSGI Server running at http://{}", addr);
    let server = Server::bind(&addr).serve(make_svc);
    // run until either the server errors out or we get a ctrl+c
    tokio::select! {
        _ = server => {},
        _ = signal::ctrl_c() => {}
    }
    Ok(())
}
Here's the TLDR:
- take our wsgi path as input
- load the wsgi app in an Arc, cuz it has to be shared across threads
- create a hyper server listening on the configured port, using make_service_fn
- await the server on the tokio runtime, while also handling the ctrl+c signal
Let's now define the wsgi module
pub struct WSGIApp {
app: Arc<Py<PyAny>>,
port: u16
}
The WSGIApp struct stores the WSGI application callable, i.e. a Py<PyAny>
impl WSGIApp {
pub fn new(py: Python, module: &str, app: &str, port: u16) -> PyResult<Self> {
let module = py.import(module)?;
let app = Arc::new(module.getattr(app)?.into_pyobject(py)?.into());
Ok(Self { app, port })
}
}
The new function imports the module, grabs the app attribute, and stores it as an Arc.
Request handling
We need to build the environ dictionary from the incoming request.
headers
let headers: HashMap<String, String> = req.headers()
.iter()
.map(|(k, v)| {
let key = format!("HTTP_{}", k.as_str().replace("-", "_").to_uppercase());
let value = v.to_str().unwrap_or("").to_string();
(key, value)
})
.collect();
This reads the hyper request headers and builds a hashmap with upper-cased, underscore-separated keys prefixed with HTTP_.
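So a request header like X-Request-Id: abc123 becomes environ["HTTP_X_REQUEST_ID"] by the time a framework sees it. In Python terms, the mapping is roughly:

# rough Python equivalent of the key transformation above (illustration only)
key = "HTTP_" + "X-Request-Id".replace("-", "_").upper()
assert key == "HTTP_X_REQUEST_ID"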
body
let body_bytes = hyper::body::to_bytes(req.into_body())
.await
.map_err(|e| PyErr::new::<pyo3::exceptions::PyRuntimeError, _>(format!("{}", e)))?
.to_vec();
We use the to_bytes utility from hyper to collect the body into bytes, then convert it to a Vec<u8>.
Building the environ
We now need to acquire the Python GIL to build the environ PyDict, and call the app callable with the environ and the start_response function, more on that later.
Also, we hand each request off to a blocking thread (spawn_blocking) so the GIL-holding work doesn't stall the async runtime, fearless concurrency as they say.
let (status, response_headers, body) = tokio::task::spawn_blocking(move || {
Python::with_gil(|py| -> PyResult<(u16, Vec<(String, String)>, Vec<u8>)> {
let environ = PyDict::new(py);
for (k, v) in headers.into_iter(){
environ.set_item(k.as_str().replace("-", "_").to_uppercase(), v.to_string())?;
}
environ.set_item("SERVER_NAME", "")?;
environ.set_item("SERVER_PORT", port)?;
environ.set_item("HTTP_HOST", "localhost")?;
environ.set_item("PATH_INFO", path)?;
environ.set_item("REQUEST_METHOD", method)?;
let py_body = PyBytes::new(py, &body_bytes);
let io = py.import("io")?;
let wsgi_input = io.getattr("BytesIO")?.call1((py_body,))?;
environ.set_item("wsgi.input", wsgi_input)?;
environ.set_item("wsgi.version", (1, 0))?;
environ.set_item("wsgi.errors", py.None())?;
tracing::debug!("prepared environ: {:?}", environ);
let wsgi_response = Py::new(py, WsgiResponse::new())?;
let start_response = wsgi_response.getattr(py, "start_response")?;
let res = app.call1(py, (environ, start_response, ))?;
tracing::info!("called Python WSGI function");
let status_code = wsgi_response
.getattr(py, "get_status")?
.call0(py)?
.extract::<String>(py)?
.split_whitespace()
.next()
.and_then(|s| s.parse::<u16>().ok())
.unwrap_or_default();
tracing::info!("status code: {}", &status_code);
tracing::info!("res: {:?}", &res);
let response_bytes: Vec<u8> = res
.getattr(py, "content")?
.extract::<Vec<u8>>(py)?;
Ok((status_code, vec![], response_bytes))
})
}).await.map_err(|e| PyErr::new::<pyo3::exceptions::PyRuntimeError, _>(e.to_string()))??;
Here's a quick description of each key in our environ dictionary:
- SERVER_NAME → server hostname
- SERVER_PORT → server port
- HTTP_HOST → request host
- PATH_INFO → requested URL path
- REQUEST_METHOD → HTTP method
- wsgi.input → request body as a BytesIO stream
- wsgi.version → WSGI version, (1, 0)
- wsgi.errors → error stream (set to None here)
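From the framework's side, these keys are what request dispatch is built on. A hand-rolled WSGI app that touches a few of them might look like this (illustrative only, not taken from any framework):

def application(environ, start_response):
    method = environ["REQUEST_METHOD"]
    path = environ["PATH_INFO"]
    # the body we wrapped in a BytesIO on the rust side reads like a file object
    body = environ["wsgi.input"].read()
    start_response("200 OK", [("Content-Type", "text/plain")])
    return [f"{method} {path} ({len(body)} bytes)".encode()]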
We then extract the response body and status code from the result of the Python invocation.
Finally, we build the hyper response and send it back to the client
tracing::info!("{:?}| {:?} | {:?}", status, response_headers, body);
let mut builder = Response::builder().status(status);
Ok(builder.body(Body::from(body)).unwrap())
}
Response
We create a response object with mutex-guarded status and headers
use std::sync::Mutex;

use pyo3::prelude::*;

#[pyclass]
pub struct WsgiResponse {
    status: Mutex<Option<String>>,
    headers: Mutex<Vec<(String, String)>>,
}

#[pymethods]
impl WsgiResponse {
    #[new]
    pub fn new() -> Self {
        WsgiResponse {
            status: Mutex::new(None),
            headers: Mutex::new(Vec::new()),
        }
    }

    /// Called by the framework as start_response(status, headers)
    fn start_response(&self, status: String, headers: Vec<(String, String)>) {
        let mut status_lock = self.status.lock().unwrap();
        let mut headers_lock = self.headers.lock().unwrap();
        *status_lock = Some(status);
        *headers_lock = headers;
    }

    /// Read back the status line (e.g. "200 OK") once the app has run;
    /// handle_request calls this to build the hyper response
    fn get_status(&self) -> String {
        self.status.lock().unwrap().clone().unwrap_or_default()
    }
}
Thanks to the pyclass/pymethods macros, start_response behaves like just another Python callable when invoked by the WSGI application, i.e. by the framework (Django, say), once we pass it in from handle_request:
let wsgi_response = Py::new(py, WsgiResponse::new())?;
let start_response = wsgi_response.getattr(py, "start_response")?;
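If it's easier to picture without the Rust, the pyclass behaves roughly like this pure-Python object would (a conceptual sketch, not code from the library; the mutex detail is dropped):

class WsgiResponseSketch:
    def __init__(self):
        self.status = None
        self.headers = []

    def start_response(self, status, headers):
        # the framework calls this with the status line and header list
        self.status = status
        self.headers = headers

    def get_status(self):
        return self.status or ""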
Conclusion
That's it, now we compile and run it. Or you can just download the library from pip. Here are the docs.
Gunicorn does its job well, but like any abstraction, it's easy to use without truly understanding what's happening under the hood. Digging into its internals made me realize that a lot of its complexity comes from solving problems I don't always need.

So I built serve-rs: a simple, transparent alternative that does just enough without the extra baggage. The real takeaway? Whether you use Gunicorn, serve-rs, or something else entirely, understanding how things work makes you a better engineer, and sometimes building your own tool is the best way to do that. 🚀