Linux Sockets: Relevant Syscalls

July 29, 2025 in Systems · 11 minutes

In my previous post on Linux sockets, there were several examples that referenced “syscalls” when working with sockets. I’d like to spend a little time exploring these more exhaustively - what they are, the syscalls you are most likely to run into when working with sockets, and the details of their usage given a few practical examples.

What are Syscalls?

System calls (syscalls) are a common design pattern for operating systems to allow apps (running in userspace) to request things of and from the kernel (the operating system itself). Linux has a very well documented set of syscalls (though more are regularly added as new features are introduced). It is through a sub-set of this interface that an application is able to create and work with network sockets in Linux.

Fortunately, while Python is generally thought of as a fairly high-level language, its socket package makes this pretty easy, and generally closely mirrors the underlying syscall interface. While some real-world use cases might prefer to use a more abstract framework that is built on top of these low-level primitives, it’s great for learning.

All this said, the Python API is still an abstraction layer, even if a relatively thin one; all of the behavior described in this post is technically implementation details of this library. Other languages/frameworks may do things differently. This is not meant to be a description of how these syscalls MUST be used - the documentation to which I’ll link multiple times is the best source for that. It’s just an instructive, educational example. There are definitely things that the Python sockets module does that are not strictly required by Linux.

A great tool for understanding the syscalls being made from a running process in Linux is strace. You can use this to attach to a running process, but you can also just tell strace to run a program and it will dump all of the syscalls it makes. The output is pretty configurable, and for our use case here, there’s a super handy filter using the -e flag that lets us dump only the network-relevant syscalls.

Syscalls from SOCK_STREAM client

Let’s borrow the SOCK_STREAM client script from the previous post:

import socket
s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
s.connect(("127.0.0.1", 8123))
s.send(b"Hello!")
s.close()

Here’s a complete example, including output, of using strace to dump the network-relevant syscalls being made by this script.

mierdin@t-bug:~/socket-examples $ strace -e 'trace=%network,close' python3 sock_stream_client.py
socket(AF_INET, SOCK_STREAM|SOCK_CLOEXEC, IPPROTO_IP) = 3
connect(3, {sa_family=AF_INET, sin_port=htons(8123), sin_addr=inet_addr("127.0.0.1")}, 16) = 0
sendto(3, "Hello!", 6, 0, NULL, 0)      = 6
close(3)                                = 0

As mentioned earlier, each syscall shown here generally matches up with a line from our Python source code, making the output pretty easy to follow. However, there are a lot of details in the output that aren’t obvious from our code, so let’s dig into a few things. We’ll go line-by-line through the full example above.

Unsurprisingly, the socket() syscall is first:

socket(AF_INET, SOCK_STREAM|SOCK_CLOEXEC, IPPROTO_IP) = 3

File Descriptors!

Before we get into the parameters, the return value (in this case, 3) warrants a little explanation. The return value from syscalls can vary wildly in meaning, but in this particular case, the value represents the file descriptor (FD) which was created for this socket.

In Linux, everything acts like a file, and sockets are no exception. As a result, socket() allocates a file descriptor, in the same way you’d get one if you’d opened a file like /tmp/file.txt with the open() syscall.
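To make the "sockets are files too" point concrete, here is a minimal sketch that asks the kernel what kind of object sits behind two descriptors. The variable names are my own, and the exact descriptor numbers you get will depend on what else the process has open.

```python
import os
import socket
import stat
import tempfile

# Open one socket and one ordinary (temporary) file side by side.
with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as s, \
        tempfile.TemporaryFile() as f:
    sock_fd = s.fileno()   # the integer the socket() syscall returned
    file_fd = f.fileno()   # the integer the open() syscall returned

    # fstat() works on any descriptor; the mode bits say what it refers to.
    is_socket = stat.S_ISSOCK(os.fstat(sock_fd).st_mode)
    is_regular = stat.S_ISREG(os.fstat(file_fd).st_mode)

print("socket fd:", sock_fd, "is a socket:", is_socket)
print("file fd:  ", file_fd, "is a regular file:", is_regular)
```

Both values are plain integers drawn from the same per-process descriptor table, which is why close() works on either of them.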

You might be surprised that the second parameter isn’t simply SOCK_STREAM as indicated in the script but rather SOCK_STREAM|SOCK_CLOEXEC. This is actually a bitwise OR of the values represented by SOCK_STREAM and SOCK_CLOEXEC, the latter of which enables close-on-exec functionality for the new file descriptor (a sensible security-related default).

A little confusing, especially if you just read my last post, but IPPROTO_IP actually represents a value of 0, so this means the default protocol for this domain + type will be used (TCP).
The return value, 3, is the file descriptor that can be used for further interaction with this socket. You see this in all subsequent syscalls as the first parameter.
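Both of these details can be checked from Python itself. The sketch below assumes Linux: SO_PROTOCOL is a Linux socket option, so the code falls back to None where the constant is missing. The variable names are illustrative only.

```python
import os
import socket

with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as s:
    # SOCK_CLOEXEC means the descriptor is not passed on to exec'd children;
    # Python reports that as "not inheritable".
    inheritable = os.get_inheritable(s.fileno())

    # IPPROTO_IP is just the constant 0, meaning "use the default protocol".
    default_proto = socket.IPPROTO_IP

    # Linux-specific: ask the kernel which protocol it actually selected.
    so_protocol = getattr(socket, "SO_PROTOCOL", None)
    chosen = (
        s.getsockopt(socket.SOL_SOCKET, so_protocol)
        if so_protocol is not None
        else None
    )

print("inheritable:", inheritable)
print("IPPROTO_IP value:", default_proto)
print("kernel chose TCP:", chosen == socket.IPPROTO_TCP)
```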

Next, connect():

connect(3, {sa_family=AF_INET, sin_port=htons(8123), sin_addr=inet_addr("127.0.0.1")}, 16) = 0

The second parameter to connect() is actually just the representation strace creates for the sockaddr struct. In it you can see the address family, as well as the IP and port we passed. The third parameter, with a value of 16, tells the kernel the length of the passed sockaddr struct, as this can vary between address families.

Thankfully we don’t have to deal with these details when using Python’s socket; even though it’s still a relatively low-level / thin abstraction, there are a few details like this that are done for us. In this case we only had to pass the much simpler tuple containing a string and int.
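If you're curious where the 16 comes from, the following sketch hand-builds the bytes of an IPv4 sockaddr (sockaddr_in) the way it is laid out on Linux. This is purely illustrative; it is not how CPython builds the struct internally, and other platforms lay out the first two bytes differently.

```python
import socket
import struct

host, port = "127.0.0.1", 8123

sockaddr_in = (
    struct.pack("=H", socket.AF_INET)  # sa_family: 2 bytes, host byte order
    + struct.pack("!H", port)          # sin_port: 2 bytes, network byte order (what htons() produces)
    + socket.inet_aton(host)           # sin_addr: 4 bytes
    + bytes(8)                         # sin_zero: 8 bytes of padding
)

print(len(sockaddr_in))
```

The family, port, and address fields are exactly the three things strace prints inside the braces; the rest is padding.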

Now that we’ve established the connection, we can send our payload with sendto():

sendto(3, "Hello!", 6, 0, NULL, 0)      = 6

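You’ll notice a deviation here between the Python code, which used send() as the method name, and the actual syscall: sendto(). It turns out that send() as implemented in libc (which Python’s socket uses under the hood) is really just a simpler wrapper on top of sendto(), which is why we see it in the strace output. The send(2) man page spells out the equivalence: send(sockfd, buf, len, flags) behaves the same as sendto(sockfd, buf, len, flags, NULL, 0). On x86-64 Linux there is no separate send syscall at all, so sendto() is the only thing strace could show here.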

There are a number of parameters passed here, some of which we saw in previous syscalls, but one that’s important is that NULL is provided for the dest_addr parameter. This parameter is normally used for unconnected sockets, but because we’ve called connect(), we now have a connected socket to a particular destination, making this parameter unnecessary for this call.
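To see when the destination parameter does matter, here is a small sketch using SOCK_DGRAM sockets on loopback (my choice for the demo, since datagram sockets can be used both unconnected and connected). The port is picked by the kernel and the names are illustrative.

```python
import socket

# A receiver bound to a kernel-chosen port on loopback.
receiver = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
receiver.bind(("127.0.0.1", 0))
receiver.settimeout(2)  # don't hang forever if a datagram goes missing
dest = receiver.getsockname()

sender = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)

# Unconnected socket: every call has to name its destination.
sender.sendto(b"one", dest)
first = receiver.recv(16)

# After connect(), the kernel remembers the peer, so no address is needed.
sender.connect(dest)
sender.send(b"two")
second = receiver.recv(16)

sender.close()
receiver.close()
print(first, second)
```

Running this under strace should show the first sendto() carrying a populated sockaddr and the second carrying NULL, just like our SOCK_STREAM client.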

Finally, close() is called to tell the kernel we’re done with this file descriptor, and therefore, the socket it represents.

close(3)                                = 0

Our system will recognize the socket is now closed, and will send a TCP FIN to the server to cleanly close the connection.

Syscalls from SOCK_STREAM server

Now it’s time to look at what the server side of this connection is doing behind the scenes. You may remember this code from the previous post, but it’s been shortened/simplified:

import socket
s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
s.bind(("127.0.0.1", 8123))
s.listen()
while True:
    (conn, address) = s.accept()
    while True:
        data = conn.recv(4)
        if not data:
            break

We can run the server through strace as well, to see what syscalls are invoked both when the listening socket is created and when a new connection from a client is accepted. Remember that the server has some initial startup work to do, but will eventually block until our client connects. The example below shows the syscall activity up to that point:

socket(AF_INET, SOCK_STREAM|SOCK_CLOEXEC, IPPROTO_IP) = 3
bind(3, {sa_family=AF_INET, sin_port=htons(8123), sin_addr=inet_addr("127.0.0.1")}, 16) = 0
listen(3, 128)                          = 0
accept4(3,

Some notes here:

- The call to socket() is the same as it was for our client.
- We then call bind(), which tells the kernel which IP and port we want this socket to use. Because we aim to make this a listening socket, we must do this first.
  - Though optional for our client socket (as we didn’t call listen()), we could have also called bind() in that program. It’s less common, but there are some use cases which benefit from doing this¹ ².
  - Though calling bind() first is required if we aim to call listen(), we can pass wildcard-like values here. 0.0.0.0 for v4, or :: for v6, means the kernel will bind to all interfaces, and a port of 0 means the kernel will pick from the ephemeral port range. Note that these can be set independently of each other.
- listen() tells the kernel we want to use this socket to accept incoming connection requests. Only SOCK_SEQPACKET and SOCK_STREAM are connection-oriented and therefore are the only types which will allow us to call this. Trying to use another type like SOCK_DGRAM will return EOPNOTSUPP here.
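The EOPNOTSUPP behavior is easy to reproduce. This is a quick sketch, assuming Linux; Python surfaces the failed syscall as an OSError whose errno attribute carries the kernel's error code.

```python
import errno
import socket

with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as s:
    s.bind(("127.0.0.1", 0))  # binding a datagram socket is fine
    try:
        s.listen()            # ...but asking it to listen is not
        listen_errno = None
    except OSError as exc:
        listen_errno = exc.errno

# Translate the number back into its symbolic name.
print(errno.errorcode.get(listen_errno))
```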

Finally, accept4() is called. There is a lot to dig in here so I’m pausing to cover a few aspects of this call. First, we note that this is another deviation from our Python code - we called accept() but the executed syscall is actually accept4(). This is because while accept() is aligned with the POSIX standard, Linux has an enhanced version of this called accept4() which adds a flags parameter. CPython prefers this if available so it can also pass SOCK_CLOEXEC.

You also might notice that the output seems to stop halfway. This is because the call to accept4() is actually blocking execution of the subsequent instructions (namely our while True loop which calls recv() to read data on an incoming connection) as we haven’t actually received a connection request from a client yet.

When we do connect via the client, the line completes:

accept4(3, {sa_family=AF_INET, sin_port=htons(56494), sin_addr=inet_addr("127.0.0.1")}, [16], SOCK_CLOEXEC) = 4

This is some clever behavior from strace, and it highlighted for me some interesting details about how this syscall actually works. You will note that prior to the incoming connection, the output paused just before the second parameter. Looking at the docs, this parameter is a pointer to a sockaddr structure. So in order to call this function, you must first allocate a chunk of memory for this structure and pass a pointer to it here. The kernel then takes care of writing the peer's sockaddr into that memory when it receives a new connection (this is actually a pretty common pattern, particularly for socket-related syscalls). strace can't print the struct until the kernel has filled it in, which is why the line stops where it does. Fortunately, CPython handles those details for us in this case.

Another important detail is that the return value for accept4() is another file descriptor, but while the FD passed to accept4() was 3, we get a new one back - 4. This is because we’re actually talking about two distinct sockets:

Passive (listening) socket - this is the socket which is sitting and waiting for new connections to come in. In our example, this socket should be open for the lifetime of our program.
Active (connected) socket - this is a socket which is actively connected to a peer. Its lifetime is tied to that one connection: once either side closes the connection, this socket is finished and its file descriptor should be closed, even while our server (and its passive socket) keeps running.

Our server spawns a single passive socket, and then any time a connection is made, a new active socket is created. The strace output above shows accept4() using our passive socket’s file descriptor of 3, but returns the file descriptor 4 which points to the new active socket.
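Here is a small single-process sketch of the same thing. It uses a kernel-chosen port instead of 8123 so it won't collide with anything, and the connect() succeeds before accept() is called because the kernel completes the handshake and parks the connection in the listen backlog.

```python
import socket

# Passive socket: bound, listening, never carries payload itself.
listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
listener.bind(("127.0.0.1", 0))
listener.listen()

# A client in the same process; the kernel queues the finished handshake.
client = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
client.connect(listener.getsockname())

# accept() hands back a brand-new active socket plus the peer's address.
conn, peer = listener.accept()

distinct_fds = listener.fileno() != conn.fileno()
peer_matches = peer == client.getsockname()
print("listener fd:", listener.fileno(), "connection fd:", conn.fileno())
print("peer address matches the client's local address:", peer_matches)

conn.close()
client.close()
listener.close()
```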

Don’t block your sockets!

For production use cases, it is usually a very good idea to spawn the remaining interactions with this second socket into its own thread, so that we can quickly call accept4() once again in our main thread to be able to process new incoming requests in parallel to this one.
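As a rough illustration of that idea, here is a thread-per-connection sketch. The handle() and serve() helpers and the received list are hypothetical names of my own, and a real server would more likely use a thread pool or an event loop, but the shape is the same: the accepting thread hands each active socket off and goes straight back to accept().

```python
import socket
import threading

received = []  # one entry per finished connection (list.append is thread-safe)

def handle(conn):
    """Read from one active socket until the peer closes it."""
    chunks = []
    with conn:
        while True:
            data = conn.recv(4)
            if not data:
                break
            chunks.append(data)
    received.append(b"".join(chunks))

def serve(listener, max_connections):
    """Accept a fixed number of connections, one worker thread each."""
    workers = []
    for _ in range(max_connections):
        conn, _addr = listener.accept()
        worker = threading.Thread(target=handle, args=(conn,))
        worker.start()
        workers.append(worker)  # main thread is free to accept again
    return workers

listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
listener.bind(("127.0.0.1", 0))
listener.listen()

# Two clients connect before the server accepts either of them.
clients = [socket.create_connection(listener.getsockname()) for _ in range(2)]
workers = serve(listener, max_connections=2)
for c in clients:
    c.sendall(b"Hello!")
    c.close()
for w in workers:
    w.join(timeout=5)
listener.close()
print(received)
```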

Now that we have our active socket, we can see subsequent syscalls using it (fd=4), instead of the passive socket (fd=3). The first we’ll see is getsockname():

getsockname(4, {sa_family=AF_INET, sin_port=htons(8123), sin_addr=inet_addr("127.0.0.1")}, [128 => 16]) = 0

This first call to getsockname() is actually a CPython implementation detail. In short, when the new file descriptor is returned from the call to accept(), the internal socket module code does this within the constructor for the new socket object, in order to confirm that the file descriptor is actually a valid socket (as FDs can point to just about anything in Linux) - if it isn’t, this would return ENOTSOCK. Not really anything we need to care about as this is definitely internal to this particular implementation, but still kind of neat to learn about “accidentally”.
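You can trigger that validation yourself by handing the socket constructor a descriptor that isn't a socket. This sketch assumes a reasonably recent CPython (3.7 or later) on Linux; a temporary file stands in for "just about anything".

```python
import errno
import socket
import tempfile

with tempfile.TemporaryFile() as f:
    try:
        # Ask Python to wrap a regular file's descriptor as if it were a socket.
        socket.socket(socket.AF_INET, socket.SOCK_STREAM, fileno=f.fileno())
        wrap_errno = None
    except OSError as exc:
        wrap_errno = exc.errno

print(errno.errorcode.get(wrap_errno))
```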

Another practical use case for this would be if we chose to bind() to some ephemeral port on either the client or server socket, we can use getsockname() afterwards to find out what the kernel chose.
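A minimal sketch of that ephemeral-port use case: bind to port 0 and then ask the kernel what it picked. The actual port number will differ on every run.

```python
import socket

with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as s:
    s.bind(("127.0.0.1", 0))      # port 0: let the kernel choose
    host, port = s.getsockname()  # getsockname() reveals the choice

print(f"kernel assigned {host}:{port}")
```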

Now, the meat of the interaction with our client can take place, by repeatedly calling recvfrom().

recvfrom(4, "Hell", 4, 0, NULL, NULL)   = 4
recvfrom(4, "o!", 4, 0, NULL, NULL)     = 2
recvfrom(4, "", 4, 0, NULL, NULL)       = 0

You’ll recall from our source code that we’re receiving in 4-byte chunks at a time. Each invocation of recvfrom() returns the length of the data returned - first 4, then 2, and then 0. As I mentioned in my last post, a return value of 0 here is how Linux signifies that a connection-oriented socket (which SOCK_STREAM certainly is) has been closed. This allows the break statement in our own code to execute, causing the whole loop to repeat so that we can accept the next connection.
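The same read-until-empty loop can be reproduced in a single process. This sketch is an assumption-laden stand-in for the two scripts: it uses a kernel-chosen port, and the client sends and closes before the server accepts, which works because the kernel queues both the data and the close.

```python
import socket

listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
listener.bind(("127.0.0.1", 0))
listener.listen()

# Client side: connect, send the payload, then close (which sends the FIN).
client = socket.create_connection(listener.getsockname())
client.sendall(b"Hello!")
client.close()

# Server side: read in 4-byte chunks until recv() returns an empty result.
conn, _addr = listener.accept()
conn.settimeout(5)  # safety net so the sketch can never hang
chunks = []
while True:
    data = conn.recv(4)
    chunks.append(data)
    if not data:    # b"" means the peer closed the connection
        break

conn.close()
listener.close()
print(chunks)
```

The final empty bytes object in the list corresponds to the recvfrom() call that returned 0 in the strace output.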

Use Idle Timeouts!

Another note in the “production quality” subject - recv() or recvfrom() will block here until our connection closes, or the client performs another send(). This is usually a bad idea - a better option is to build some idle timeout logic in so that if the server is expecting the client to keep sending data but it stops for some reason in a way that doesn’t fully close the connection, the server can close it on its own.
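One simple way to get an idle timeout in Python is settimeout() on the active socket. The sketch below is one possible approach, not the only one; the IDLE_TIMEOUT value is an arbitrary number I picked for the demo, and the client deliberately connects and then stays silent.

```python
import socket

IDLE_TIMEOUT = 0.2  # seconds; illustrative value, tune for your workload

listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
listener.bind(("127.0.0.1", 0))
listener.listen()

# A client that connects but never sends anything.
client = socket.create_connection(listener.getsockname())

conn, _addr = listener.accept()
conn.settimeout(IDLE_TIMEOUT)  # recv() now gives up after this long
try:
    conn.recv(4)
    timed_out = False
except socket.timeout:         # same class as TimeoutError on Python 3.10+
    timed_out = True
finally:
    conn.close()               # the server closes the idle connection itself

client.close()
listener.close()
print("idle connection dropped:", timed_out)
```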

Conclusion

This isn’t an exhaustive list of socket-related syscalls, and of course there are plenty of variations possible when you consider differences in userspace implementation, other socket types, etc. This was meant to be pretty introductory to the most common ones you’ll see, and give you some tools for going further - try running strace on your own clients/servers!

We’re sure to run into other socket-related syscalls in future blog posts on more specific topics.

Matt Oswalt
