July 29, 2025 in Systems · 11 minutes
In my previous post on Linux sockets, there were several examples that referenced “syscalls” when working with sockets. I’d like to spend a little time exploring these more exhaustively - what they are, the syscalls you are most likely to run into when working with sockets, and the details of their usage given a few practical examples.
System calls (syscalls) are a common design pattern that operating systems use to let apps (running in userspace) request things of and from the kernel (the operating system itself). Linux has a very well documented set of syscalls (though more are regularly added as new features are introduced). It is through a subset of this interface that an application is able to create and work with network sockets in Linux.
Fortunately, while Python is generally thought of as a fairly high-level language, its socket module makes this pretty easy, and generally closely mirrors the underlying syscall interface. While some real-world use cases might prefer a more abstract framework built on top of these low-level primitives, it’s great for learning.
All this said, the Python API is still an abstraction layer, even if a relatively thin one; much of the behavior described in this post is technically an implementation detail of this library. Other languages/frameworks may do things differently. This is not meant to be a description of how these syscalls MUST be used - the documentation to which I’ll link multiple times is the best source for that. It’s just an instructive, educational example. There are definitely things that the Python socket module does that are not strictly required by Linux.
A great tool for understanding the syscalls being made from a running process in Linux is strace. You can use this to attach to a running process, but you can also just tell strace to run a program and it will dump all of the syscalls that program makes. The output is pretty configurable, and for our use case here, there’s a super handy filter using the -e flag that lets us dump only the network-relevant syscalls.
Let’s borrow the SOCK_STREAM
client script from the previous post:
Here’s a complete example, including output, of using strace
to dump the network-relevant syscalls being made by this script.
mierdin@t-bug:~/socket-examples $ strace -e 'trace=%network,close' python3 sock_stream_client.py
socket(AF_INET, SOCK_STREAM|SOCK_CLOEXEC, IPPROTO_IP) = 3
connect(3, {sa_family=AF_INET, sin_port=htons(8123), sin_addr=inet_addr("127.0.0.1")}, 16) = 0
sendto(3, "Hello!", 6, 0, NULL, 0) = 6
close(3) = 0
As mentioned earlier, each syscall shown above generally matches up with a line from our Python source code, making the trace pretty easy to follow. However, there are a lot of details in the output that aren’t obvious from our code, so let’s dig into a few things. We’ll go line-by-line through the full example above.
Unsurprisingly, the socket()
syscall is first:
socket(AF_INET, SOCK_STREAM|SOCK_CLOEXEC, IPPROTO_IP) = 3
You might be surprised that the second parameter isn’t simply SOCK_STREAM
as indicated in the script but rather SOCK_STREAM|SOCK_CLOEXEC
. This is actually a bitwise OR of the values represented by SOCK_STREAM
and SOCK_CLOEXEC
, the latter of which enables close-on-exec functionality for the new file descriptor (a sensible security-related default).
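To see this flag from the Python side, here is a minimal sketch. It assumes Linux and a Python version where new sockets are non-inheritable by default (3.4 or later); the helper name cloexec_is_set is my own and not part of the socket module.

```python
import fcntl
import socket

# SOCK_CLOEXEC is a flag bit that gets OR'ed into the type argument of socket(2).
combined_type = socket.SOCK_STREAM | socket.SOCK_CLOEXEC


def cloexec_is_set(sock):
    """Return True if FD_CLOEXEC is set on the socket's file descriptor."""
    fd_flags = fcntl.fcntl(sock.fileno(), fcntl.F_GETFD)
    return bool(fd_flags & fcntl.FD_CLOEXEC)


with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
    # We only asked for SOCK_STREAM, yet the descriptor is close-on-exec.
    close_on_exec = cloexec_is_set(sock)
    inheritable = sock.get_inheritable()

print(hex(combined_type), close_on_exec, inheritable)
```

If a child process really does need the descriptor, sock.set_inheritable(True) clears the flag again.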
IPPROTO_IP actually represents a value of 0, so this means the default protocol for this domain + type will be used (TCP).
Next, connect():
connect(3, {sa_family=AF_INET, sin_port=htons(8123), sin_addr=inet_addr("127.0.0.1")}, 16) = 0
The second parameter to connect() is actually just the representation strace creates for the sockaddr struct. In it you can see the address family, as well as the IP and port we passed. The third parameter, with a value of 16, tells the kernel the length of the passed sockaddr struct, as this varies by address family.
Thankfully we don’t have to deal with these details when using Python’s socket module; even though it’s still a relatively low-level / thin abstraction, there are a few details like this that are done for us. In this case we only had to pass the much simpler tuple containing a string and an int.
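For the curious, the 16 bytes can be reproduced by hand. The sketch below packs the same tuple into the layout of a Linux sockaddr_in (a 2-byte family in host byte order, a 2-byte port in network byte order, a 4-byte address, and 8 bytes of padding). This only illustrates the struct layout; it is not how CPython builds it internally, and pack_sockaddr_in is a hypothetical helper name.

```python
import socket
import struct


def pack_sockaddr_in(host, port):
    """Build the raw bytes of a Linux sockaddr_in for an IPv4 host and port."""
    family = struct.pack("=H", socket.AF_INET)  # sa_family, host byte order
    port_be = struct.pack("!H", port)           # sin_port, same job as htons()
    addr = socket.inet_aton(host)               # sin_addr, 4 bytes
    padding = b"\x00" * 8                       # sin_zero, pads to 16 bytes
    return family + port_be + addr + padding


raw = pack_sockaddr_in("127.0.0.1", 8123)
print(len(raw), raw.hex())
```

The length of the result is the value strace shows as the third argument to connect().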
Now that we’ve established the connection, we can send our payload with sendto():
sendto(3, "Hello!", 6, 0, NULL, 0) = 6
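You’ll notice a deviation here between the Python code, which used send() as the method name, and the actual syscall here: sendto(). It turns out that send() as implemented in libc (which Python’s socket module uses under the hood) is really just a thin wrapper on top of sendto(), which is why we see it in the strace output. As for why: on most modern architectures, x86-64 included, Linux has no separate send syscall at all, and the send(2) man page documents send(sockfd, buf, len, flags) as equivalent to sendto(sockfd, buf, len, flags, NULL, 0).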
There are a number of parameters passed here, some of which we saw in previous syscalls, but one that’s important is that NULL
is provided for the dest_addr
parameter. This parameter is normally used for unconnected sockets, but because we’ve called connect()
, we now have a connected socket to a particular destination, making this parameter unnecessary for this call.
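The difference between the two styles is easiest to see with a datagram socket, where both are legal. This is a minimal sketch that uses UDP over loopback rather than the TCP client from this post, purely so that the unconnected case can be shown; the function name is my own.

```python
import socket


def send_both_ways():
    """Send once with an explicit destination, once on a connected socket."""
    receiver = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    receiver.bind(("127.0.0.1", 0))  # port 0: let the kernel pick a free port
    receiver.settimeout(2)
    dest = receiver.getsockname()

    sender = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    try:
        # Unconnected: every call has to name its destination.
        sender.sendto(b"unconnected", dest)
        first, _ = receiver.recvfrom(64)

        # Connected: the kernel remembers the peer, so no address is passed.
        sender.connect(dest)
        sender.send(b"connected")
        second, _ = receiver.recvfrom(64)
    finally:
        sender.close()
        receiver.close()
    return first, second


first, second = send_both_ways()
print(first, second)
```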
Finally, close()
is called to tell the kernel we’re done with this file descriptor, and therefore, the socket it represents.
close(3) = 0
Our system will recognize the socket is now closed, and will send a TCP FIN to the server to cleanly close the connection.
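Putting the four calls together, a client that produces a trace shaped like the one above could look something like the sketch below. This is my own reconstruction from the strace output, not necessarily the exact script from the previous post; the function name and its return value are assumptions made for illustration.

```python
import socket


def send_hello(host, port, payload=b"Hello!"):
    """Connect over TCP, send one payload, and close the socket."""
    # socket(AF_INET, SOCK_STREAM|SOCK_CLOEXEC, IPPROTO_IP)
    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    try:
        # connect(fd, {sa_family=AF_INET, sin_port=..., sin_addr=...}, 16)
        sock.connect((host, port))
        # sendto(fd, payload, len(payload), 0, NULL, 0)
        return sock.send(payload)
    finally:
        # close(fd)
        sock.close()


# Example usage, assuming something is listening on 127.0.0.1:8123:
# send_hello("127.0.0.1", 8123)
```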
Now it’s time to look at what the server side of this connection is doing behind the scenes. You may remember this code from the previous post, but it’s been shortened/simplified:
We can run the server through strace as well, to see what syscalls are invoked both when the listening socket is created and when a new connection from a client is accepted. Remember that the server has some initial startup work to do, but will eventually block until our client connects. The example below shows the syscall activity up to that point:
Some notes here:
- socket() is the same as it was for our client.
- Next is bind(), which tells the kernel which IP and port we want this socket to use. Because we want to listen on a known address and port, we do this before calling listen().
- Although our client never calls listen(), it could also have called bind(). It’s less common, but there are some use cases which benefit from doing this.
- Even though we call bind() before listen(), we can pass wildcard-like values here. 0.0.0.0 for v4, or :: for v6, means the kernel will bind to all interfaces, and a port of 0 means the kernel will pick from the ephemeral port range. Note that these can be set independently of each other.
- listen() tells the kernel we want to use this socket to accept incoming connection requests. Only SOCK_SEQPACKET and SOCK_STREAM are connection-oriented, and they are therefore the only types which will allow us to call this. Trying to use another type like SOCK_DGRAM will return EOPNOTSUPP here.
Finally, accept4() is called. There is a lot to dig into here, so I’m pausing to cover a few aspects of this call. First, we note that this is another deviation from our Python code - we called accept() but the executed syscall is actually accept4(). This is because while accept() is aligned with the POSIX standard, Linux has an enhanced version of this called accept4() which adds a flags parameter. CPython prefers this if available so it can also pass SOCK_CLOEXEC.
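Before moving on, the restriction on listen() mentioned in the notes above is easy to confirm from Python. The sketch below assumes Linux; try_listen is a hypothetical helper that returns None on success or the errno that listen() failed with.

```python
import errno
import socket


def try_listen(sock_type):
    """Bind to a kernel-chosen loopback port, then attempt listen()."""
    with socket.socket(socket.AF_INET, sock_type) as sock:
        sock.bind(("127.0.0.1", 0))
        try:
            sock.listen()
        except OSError as exc:
            return exc.errno
        return None


stream_errno = try_listen(socket.SOCK_STREAM)
dgram_errno = try_listen(socket.SOCK_DGRAM)
print(stream_errno, dgram_errno, dgram_errno == errno.EOPNOTSUPP)
```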
You also might notice that the output seems to stop halfway. This is because the call to accept4()
is actually blocking execution of the subsequent instructions (namely our while True
loop which calls recv()
to read data on an incoming connection) as we haven’t actually received a connection request from a client yet.
When we do connect via the client, the line completes:
accept4(3, {sa_family=AF_INET, sin_port=htons(56494), sin_addr=inet_addr("127.0.0.1")}, [16], SOCK_CLOEXEC) = 4
This is some clever behavior from strace, and it highlighted for me some interesting details about how this syscall actually works. You will note that, prior to the incoming connection, the output paused just before the second parameter. Looking at the docs, this parameter is a pointer to a sockaddr structure. So in order to call this function, you must first allocate a chunk of memory for this structure and pass a pointer to it here - the kernel will take care of writing the peer’s address into that memory when it accepts a new connection (this is actually a pretty common pattern, particularly for socket-related syscalls). Fortunately, CPython handles those details for us in this case.
Another important detail is that the return value for accept4() is another file descriptor: the FD passed to accept4() was 3, but we get a new one back - 4. This is because we’re actually talking about two distinct sockets:
Our server creates a single passive socket, and then any time a connection is made, a new active socket is created. The strace output above shows accept4() being called with our passive socket’s file descriptor of 3 and returning file descriptor 4, which points to the new active socket.
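The same split is visible from Python. In this minimal sketch (loopback only, with a kernel-chosen port rather than the 8123 used in this post), accept() hands back a second socket object with its own file descriptor, along with the peer address that the kernel wrote into the sockaddr buffer.

```python
import socket

# Passive socket: only ever used to accept connections.
listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
listener.bind(("127.0.0.1", 0))
listener.listen()

client = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
client.connect(listener.getsockname())

# Active socket: a brand new descriptor dedicated to this one connection.
conn, peer_addr = listener.accept()
listener_fd, conn_fd = listener.fileno(), conn.fileno()
client_addr = client.getsockname()

print(listener_fd, conn_fd, peer_addr)

for sock in (conn, client, listener):
    sock.close()
```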
While the code in this example is simplified, it is usually a very good idea to hand the remaining interactions with this second socket off to their own thread, so that the main thread can quickly call accept4() once again and process new incoming requests in parallel to this one. The server used in this post does not do this, and therefore won’t be able to accept a new connection until the prior one is closed.
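As a rough illustration of the thread-per-connection idea, here is a minimal sketch. It is not the server from this post: the handler (which echoes data back upper-cased), the function names, and the fixed connection count are all assumptions made to keep the example small and bounded.

```python
import socket
import threading


def handle(conn):
    """Read until the peer closes, echoing each chunk back upper-cased."""
    with conn:
        while True:
            chunk = conn.recv(4)
            if not chunk:  # 0 bytes: the peer closed the connection
                break
            conn.sendall(chunk.upper())


def serve(listener, connections):
    """Accept a fixed number of clients, giving each one its own thread."""
    workers = []
    for _ in range(connections):
        conn, _addr = listener.accept()
        worker = threading.Thread(target=handle, args=(conn,))
        worker.start()  # go straight back to accept() instead of blocking here
        workers.append(worker)
    return workers
```

A real server would loop indefinitely and would probably cap the number of threads (or use a pool), but the shape is the same: the accepting thread does as little as possible per connection.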
Now that we have our active socket, we can see subsequent syscalls using it (fd=4
), instead of the passive socket (fd=3
). The first we’ll see is getsockname()
:
getsockname(4, {sa_family=AF_INET, sin_port=htons(8123), sin_addr=inet_addr("127.0.0.1")}, [128 => 16]) = 0
This first call to getsockname()
is actually a CPython implementation detail. In short, when the new file descriptor is returned from the call to accept()
, the internal socket
module code does this within the constructor for the new socket object, in order to confirm that the file descriptor is actually a valid socket (as FDs can point to just about anything in Linux) - if it isn’t, this would return ENOTSOCK
. Not really anything we need to care about as this is definitely internal to this particular implementation, but still kind of neat to learn about “accidentally”.
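You can trigger that validation yourself by handing the socket constructor a descriptor that is not a socket. The sketch below assumes Linux and a reasonably recent CPython (3.7 or later, as far as I know) and uses one end of a pipe as the non-socket descriptor.

```python
import errno
import os
import socket

read_fd, write_fd = os.pipe()  # valid file descriptors, but not sockets
try:
    socket.socket(fileno=read_fd)
except OSError as exc:
    error_number = exc.errno
else:
    error_number = None
finally:
    os.close(read_fd)
    os.close(write_fd)

print(error_number == errno.ENOTSOCK)
```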
Another practical use case for this: if we chose to bind() to some ephemeral port on either the client or server socket, we can use getsockname() afterwards to find out what the kernel chose.
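A minimal sketch of that second use case, binding to port 0 on loopback and then asking the kernel what it picked:

```python
import socket

with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
    # Port 0 asks the kernel to choose a free port from the ephemeral range.
    sock.bind(("127.0.0.1", 0))
    host, port = sock.getsockname()

print(host, port)
```

This is handy in tests, where hard-coding a port risks colliding with something already running on the machine.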
Now, the meat of the interaction with our client can take place, by repeatedly calling recvfrom()
.
You’ll recall from our source code that we’re receiving in 4-byte chunks. Each invocation of recvfrom() returns the number of bytes received - first 4, then 2, and then 0. As I mentioned in my last post, a return value of 0 here is how Linux signifies that the peer of a connection-oriented socket (which SOCK_STREAM certainly is) has closed the connection. This allows the break statement in our own code to execute, causing the whole loop to repeat so that we can accept the next connection.
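The 4, 2, 0 sequence can be reproduced without a network at all. This sketch uses socket.socketpair(), which gives two already-connected stream sockets (AF_UNIX rather than TCP, so this is an approximation of the server in this post); read_in_chunks is a hypothetical helper.

```python
import socket


def read_in_chunks(conn, chunk_size=4):
    """Call recv() until it returns 0 bytes, recording each read's length."""
    lengths = []
    while True:
        chunk = conn.recv(chunk_size)
        lengths.append(len(chunk))
        if not chunk:  # empty result: the peer has closed its end
            break
    return lengths


sender, receiver = socket.socketpair()
sender.sendall(b"Hello!")  # 6 bytes, all buffered before we start reading
sender.close()

with receiver:
    lengths = read_in_chunks(receiver)

print(lengths)  # [4, 2, 0]
```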
A more production-quality server must be more intelligent about this accept() loop logic. Many applications actually disconnect server-side (HTTP being a primary example), or at the very least place timeouts on waiting for the client to provide more data. Otherwise, we’re depending on the client to close the connection gracefully, and if it doesn’t, we’ll lock up our server or cause a resource leak.
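One simple way to place such a timeout in Python is settimeout() on the accepted socket. The sketch below again uses socket.socketpair() to stand in for a real client and server; recv_with_deadline is a hypothetical helper, and the 0.2 second value is arbitrary.

```python
import socket


def recv_with_deadline(conn, size, seconds):
    """Return received bytes, or None if nothing arrives within the deadline."""
    conn.settimeout(seconds)
    try:
        return conn.recv(size)
    except socket.timeout:
        return None


server_side, client_side = socket.socketpair()

silent = recv_with_deadline(server_side, 4, 0.2)    # client has sent nothing
client_side.sendall(b"ping")
answered = recv_with_deadline(server_side, 4, 0.2)  # data is already waiting

server_side.close()
client_side.close()
print(silent, answered)
```

On a timeout, a server would typically close the connection and go back to accepting, rather than waiting on a silent client forever.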
This isn’t an exhaustive list of socket-related syscalls, and of course there are plenty of variations possible when you consider differences in userspace implementation, other socket types, etc. This was meant as an introduction to the most common ones you’ll see, and to give you some tools for going further - try running strace on your own clients/servers!
We’re sure to run into other socket-related syscalls in future blog posts on more specific topics.