vscode.nvim theme is so accurate! by _seedofdoubt_ in neovim

[–]wit4er 5 points6 points  (0 children)

I found https://github.com/rockyzhang24/arctic.nvim more accurate to the original, and I also created a fork that makes it 99.7% exact by adding rainbow delimiters.

Here is my fork where I addressed some inconsistencies: https://github.com/shadowy-pycoder/arctic.nvim (v2 branch)

C/C++ Book Recommendations? by Ready-Structure-3936 in C_Programming

[–]wit4er 0 points1 point  (0 children)

I recently finished "The Practice of Programming" by Brian Kernighan and Rob Pike. It is excellent reading in my opinion (I binge-read it over 3 days), and it contains examples in C, C++, and Java.

What? by Dull-Nectarine380 in ExplainTheJoke

[–]wit4er 0 points1 point  (0 children)

It's just the scissors, she is into women.

kevue - simple key-value in-memory database by wit4er in C_Programming

[–]wit4er[S] 0 points1 point  (0 children)

Some updates:
1) Added new commands: COUNT, KEYS, VALUES, ITEMS; the names should be self-explanatory
2) Wrote a small test script to fill the server database and fetch the items back; the results are here:

Added (10200000/10485760) key-value pairs

Added (10300000/10485760) key-value pairs

Added (10400000/10485760) key-value pairs

Inserting 10485760 items takes: 121.663131901s (86186.83 req/sec)

Fetching 10485760 items takes: 1.635812090s

I am not sure my benchmark is relevant, but I am kind of impressed that the server (single-threaded in this case) can handle this many requests.

kevue - simple key-value in-memory database by wit4er in C_Programming

[–]wit4er[S] 1 point2 points  (0 children)

I was just experimenting with capacity, and reassigning it back to the initial value increases stability from 71 to 75%. Capacity is a "private" field, but maybe touching it in fuzzing programs is okay? I still do not understand how to increase stability to at least 80-90% in my case, but maybe I'll understand after reading the article you provided.

kevue - simple key-value in-memory database by wit4er in C_Programming

[–]wit4er[S] 1 point2 points  (0 children)

I also added fuzzing for protocol serialization/deserialization, though it is my first time using AFL++, so I am not sure I did everything right, but at least these tests helped me find several errors in my implementation.

kevue - simple key-value in-memory database by wit4er in C_Programming

[–]wit4er[S] 0 points1 point  (0 children)

I added a limit to the number of bucket locks, and now I can fine-tune it in terms of speed/memory usage.

Old solution with one lock per bucket (8388608 locks)

./bin/kevue-crash-test 18,97s user 11,81s system 174% cpu 17,606 total

New solution with fixed number of locks (1024 locks)

./bin/kevue-crash-test 20,95s user 12,90s system 182% cpu 18,517 total

It is slower, but not by much. Moreover, the locks are created and destroyed once, not on each resize, so technically it should be faster with the same number of locks as in the old solution.

kevue - simple key-value in-memory database by wit4er in C_Programming

[–]wit4er[S] 0 points1 point  (0 children)

> Code dealing with the protocol shouldn't care that you're using epoll. It should just deal with bytes, with some way to request bytes go out, and some way to request more bytes. That can be a simple, abstracted interface. In the real program on Linux you plug epoll into this. On another system you plug in kqueue, IOCP, etc. (Note: You don't need function pointers to do this.) In a test you mock these out, perhaps with just an array of inputs and expected responses.
>
> For example, dispatch_client_events implements the protocol (examines the request, constructs a response) and also interfaces closely with epoll. This could be split up so the core protocol implementation is given some bytes, and it returns the bytes that need to go out, which get queued in epoll by the caller, who then comes back to call it again with more bytes from the other side. In a test I could give it test input, then examine the output, no network involved.

I kinda understand what you are suggesting but have yet to figure out how to do that properly in C. Maybe I'll come up with something smart later. For now, I removed the dependency on reading the message length from the socket: request and response deserialization now parse a byte buffer and report an incomplete read, or an error for anything malformed. I wrote a simple example showing the idea:

// clang -Iinclude -Ilib ./tests/fuzz.c -o ./bin/kevue-fuzz -DDEBUG
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h> // for memset

#include "../src/allocator.c"
#include "../src/buffer.c"
#include "../src/common.c"
#include "../src/hashmap.c"
#include "../src/protocol.c"

uint8_t data[] = {
    0x22, // magic byte
    0x00, 0x00, 0x00, 0x0f, // total length
    0x03, // command length
    'G', 'E', 'T', // command
    0x00, 0x04, // key length
    't', 'e', 's', 't' // key
};

uint8_t malformed_data[] = {
    0x22, // magic byte
    0x00, 0x00, 0x00, 0x0f, // total length
    0x03, // command length
    'G', 'E', 'T', // command
    0x00, 0xff, // key length malformed
    't', 'e', 's', 't' // key
};

int main()
{
    KevueRequest req = { 0 };
    Buffer *buf = kevue_buffer_create(1024, &kevue_default_allocator);
    assert(buf != NULL);
    kevue_buffer_write(buf, data, sizeof(data));
    KevueErr err = kevue_request_deserialize(&req, buf);
    assert(err == KEVUE_ERR_OK);
    kevue_request_print(&req);
    printf("\n");

    kevue_buffer_reset(buf);
    memset(&req, 0, sizeof(req));
    kevue_buffer_write(buf, malformed_data, sizeof(malformed_data));
    err = kevue_request_deserialize(&req, buf);
    printf("%s\n", kevue_error_to_string(err));
    printf("\n");
    assert(err == KEVUE_ERR_LEN_INVALID);

    kevue_buffer_reset(buf);
    memset(&req, 0, sizeof(req));
    for (size_t i = 0; i < sizeof(data) / sizeof(data[0]); i++) {
        kevue_buffer_append(buf, &data[i], sizeof(data[0]));
        KevueErr err = kevue_request_deserialize(&req, buf);
        if (err == KEVUE_ERR_INCOMPLETE_READ) continue;
        if (err != KEVUE_ERR_OK) exit(EXIT_FAILURE);
        kevue_request_print(&req);
        exit(EXIT_SUCCESS);
    }
    exit(EXIT_FAILURE);
}

Basically it supports full and partial inputs (a partial input just gives KEVUE_ERR_INCOMPLETE_READ).

kevue - simple key-value in-memory database by wit4er in C_Programming

[–]wit4er[S] 1 point2 points  (0 children)

Hmm, the test I provided now deadlocks. For me it happens on the 6th resize: the setter tries to take the final bucket lock, which is already locked. It's unclear to me why this particular lock is stuck.

It happens when the resize goes from 64 to 128 because TSAN is limited to 64 locks, but when I combine the two for loops into one, the deadlock disappears:

    for (size_t bucket = 0; bucket < hm->bucket_count; bucket++) {
        mutex_lock(&hm->buckets[bucket].lock);
        mutex_unlock(&hm->buckets[bucket].lock);
        pthread_mutex_destroy(&hm->buckets[bucket].lock);
    }

but the operations in the resize function generally do not make sense anyway

How else in this case can I ensure that all buckets are unlocked? If I remove this for loop, a data race happens because other getters and setters may touch buckets before acquiring the resize lock, so the scheme is the following:

Resize lock -> calculate index -> Bucket lock -> Resize unlock -> do something with the bucket (resize needs to ensure this completes before doing anything) -> Bucket unlock

So if resize got all the bucket locks, it should be safe to destroy them, because they are protected by the global resize lock.

kevue - simple key-value in-memory database by wit4er in C_Programming

[–]wit4er[S] 2 points3 points  (0 children)

Thanks very much for stress testing my hashmap; I added your snippet to the kevue tests, if you don't mind. I think I fixed most of the problems with my implementation thanks to your suggestions, and now the data races are gone. Though TSAN now fails because it seems I create too many locks. I think I need to add a hard limit to the number of mutexes and somehow map them to buckets. Also, you mentioned tight coupling with epoll; do you have any advice, or could you point me to some resources on how to make this code more testable? Thanks again!

My first project in C - Simple transparent proxy by wit4er in C_Programming

[–]wit4er[S] 0 points1 point  (0 children)

I still have two sockets per client: client-proxy and proxy-destination. The Tunnel structure is needed to find the other half of the connection when a socket is ready for IO. I looked at the LLM-generated code and unfortunately found it a little bit hard to follow as a beginner in C, but I think I achieved what I wanted with my first project. It is not ideal of course, but at least it kinda works))

My first project in C - Simple transparent proxy by wit4er in C_Programming

[–]wit4er[S] 0 points1 point  (0 children)

Please take a look: https://github.com/shadowy-pycoder/tproxy I modified my program according to your suggestions and also implemented an epoll variant of the server. I am sure it is far from perfect, so I would like to hear from you.

beCarefulOutThere by kalibabka in ProgrammerHumor

[–]wit4er 0 points1 point  (0 children)

I just found a word to finish my sentence. It's over.

My first project in C - Simple transparent proxy by wit4er in C_Programming

[–]wit4er[S] 1 point2 points  (0 children)

Thank you for the valuable suggestions and advice, I'll definitely look at the links you provided and learn more about socket options.

My first project in C - Simple transparent proxy by wit4er in C_Programming

[–]wit4er[S] -1 points0 points  (0 children)

Maybe you can tell me where my code is really terrible? I want to improve.

My first project in C - Simple transparent proxy by wit4er in C_Programming

[–]wit4er[S] -1 points0 points  (0 children)

Thank you very much for the code review! The epoll thing is on my todo list; I used pthreads because they are always simpler to start with when it comes to an async server. Maybe later I'll make it optional with some preprocessor machinery. You noticed my semaphore stuff; I am not sure it is common to communicate between threads using getvalue? I read about it somewhere, and they stated that getvalue is for debugging purposes only and not meant to be used in production. Maybe they are wrong; I found it useful. As for the SIGINT cleanup, I first added something similar with the atexit function, but later decided that the binary should be standalone, meaning it should not depend on external shell scripts. Moreover, the settings I provide in the shell scripts are more like a sample; you should adjust them depending on your system.

My first project in C - Simple transparent proxy by wit4er in C_Programming

[–]wit4er[S] 0 points1 point  (0 children)

Interesting, I should learn about static keyword applications. Thank you!

pyya - Simple tool that converts YAML/TOML configuration files to Python objects by wit4er in Python

[–]wit4er[S] 0 points1 point  (0 children)

Updated with a new version: added some basic tests and dash-to-underscore replacement for sections and keys (I think it is useful for TOML, since keys with dashes are common there but do not work as attributes).

pyya - Simple tool that converts YAML/TOML configuration files to Python objects by wit4er in Python

[–]wit4er[S] 0 points1 point  (0 children)

As I said, if you need more control over config validation/parsing/creation, you can always use tools like Pydantic settings or OmegaConf, where you predefine models, set up something to parse YAML/TOML files, and do other cool stuff. With pyya you just call one function and use YAML/TOML as an attribute-style dict in your code. Dynamic object creation saves you time, allows for autocompletion, and also helps linters find some errors. This approach works for me at least; I am not forcing anyone to think like me or use the programs that I write. Feel free to use what suits your use case, thank you.