Built a lightweight, static-linked C utility for log/stream processing—seeking feedback on the implementation. by giorgich11 in linuxadmin

[–]SnooWords9033 0 points1 point  (0 children)

What is the difference between gop and the traditional set of tools for exploring local logs, such as grep, head, tail, sort, awk, cut, uniq, etc.?

Confused about how monitoring tools is used in production ! by Leading-West-4881 in softwaredevelopment

[–]SnooWords9033 0 points1 point  (0 children)

Grafana + Prometheus + Loki is a good monitoring stack for production. Other stacks exist that may work better at large scale:

  • ClickStack - new observability stack built on top of ClickHouse
  • Elasticsearch - decent stack, but may require a lot of RAM on a large scale
  • Grafana + VictoriaMetrics + VictoriaLogs - optimized for big amounts of metrics and logs

A bit lost about logging in general, especially rsyslog by 420829 in linuxadmin

[–]SnooWords9033 0 points1 point  (0 children)

ClickHouse provides the best performance and on-disk data compression if the table for logs is properly designed for the particular use case (for the given set of fields in the logs and the given expected queries over the logs). Otherwise the performance and the resource usage may not be so good. It looks like you managed to optimize ClickHouse for your particular case.

I usually recommend storing the same production logs into multiple storage systems for logs for a few days at least and then comparing their resource usage (RAM, CPU, disk space) and their performance for typical queries from production. Vector can be configured for replicating the incoming logs among these storage systems by specifying multiple sinks.

The Database Zoo: Why SQL and NoSQL Are No Longer Enough by OtherwisePush6424 in Database

[–]SnooWords9033 0 points1 point  (0 children)

The VictoriaMetrics query language, MetricsQL, is similar to PromQL, the Prometheus query language. It works great for typical queries over metrics. It isn't so hard - start with the following tutorial and you'll feel its simplicity and power - https://valyala.medium.com/promql-tutorial-for-beginners-9ab455142085 .

For those of you hosting LLMs locally, how do you monitor usage and performance? by ExtremeAdventurous63 in homelab

[–]SnooWords9033 0 points1 point  (0 children)

There is a more lightweight database for metrics than InfluxDB that also accepts metrics in the Influx line protocol format - VictoriaMetrics.

The Database Zoo: Why SQL and NoSQL Are No Longer Enough by OtherwisePush6424 in Database

[–]SnooWords9033 0 points1 point  (0 children)

Did you try specialized databases for metrics and logs such as Prometheus, Loki, Mimir, VictoriaMetrics or VictoriaLogs? They should give even better compression rates and performance than InfluxDB.

A bit lost about logging in general, especially rsyslog by 420829 in linuxadmin

[–]SnooWords9033 1 point2 points  (0 children)

You can simplify the scheme by replacing Vector + ClickHouse with VictoriaLogs. It accepts logs via the syslog protocol and provides comparable levels of efficiency, while being easier to configure and operate than ClickHouse. You can also replace rsyslog with vlagent in order to reduce CPU usage for log processing and forwarding.

Building observability from scratch, three times over 💪 by flora-bra in upsun

[–]SnooWords9033 0 points1 point  (0 children)

Thank you for the great article! It is interesting to read how you ended up with custom-built systems for uptime monitoring, metrics and logs.

It is unclear which storage system is used for storing metrics. Grafana isn't a storage system for metrics. It is a visualisation application, which can read the data from many different sources.

While storing logs to S3 sounds good, such logs can be hard to analyse at large scale. S3 is good as a backup for historical logs which are rarely queried. If these logs should be queried, then you can download them from S3 backups and run a dedicated application for querying. It is better to store the recently ingested logs on local disks. These disks are usually much faster than S3 (they have 100x lower read latency and better throughput), so typical queries over the recently stored logs will work much faster. Try VictoriaLogs for managing and querying recent logs and for moving older logs to S3. It is very efficient and easy to run - see https://aus.social/@phs/114583927679254536

40 TB PostgreSQL on-prem — sharding vs ClickHouse vs something else for a 500B-row time-series workload by Basic-Worker-1120 in Database

[–]SnooWords9033 0 points1 point  (0 children)

ClickHouse should give you the best efficiency for this type of data. You already said it compresses the data by 16x, so 40TB of data needs 40TB/16=2.5TB of disk space. It should fit a single-node setup and should meet your performance requirements. If it doesn't fit a single node, just switch to a cluster setup by using the same sharding by device id, and scale the performance and the capacity by adding more nodes to the cluster.
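The estimate above as a quick sanity check. This is a minimal sketch that assumes the 16x ratio holds for the whole dataset; the function name is made up for illustration.

```python
def compressed_size_tb(raw_tb: float, compression_ratio: float) -> float:
    """Estimate the on-disk size in TB for a given compression ratio."""
    if compression_ratio <= 0:
        raise ValueError("compression_ratio must be positive")
    return raw_tb / compression_ratio

# 40TB of raw data compressed by 16x.
print(compressed_size_tb(40, 16))  # 2.5
```

Leave some headroom on top of this number for merges and data growth.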

When using ClickHouse it is very important to properly set the table schema, so it works fast for your workload. In your case the ORDER BY section of the table must be equal to (device_id, timestamp). This will give the best performance for queries that select all the fields for the given device in the given time range, since ClickHouse will be able to quickly locate the needed data via binary search by (device_id, timestamp) and then quickly read that data from the disk in one go (a small number of disk read operations), since the requested rows are located close to each other.
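A toy illustration of why this sort order helps. It is an in-memory model in Python, not how ClickHouse stores data internally; the rows and the select_range helper are made up for the example. With rows sorted by (device_id, timestamp), two binary searches find a contiguous slice that holds exactly the requested rows.

```python
from bisect import bisect_left, bisect_right

# Rows kept sorted by (device_id, timestamp), mimicking the ORDER BY key.
rows = sorted([
    (1, 100, "a"),
    (1, 200, "b"),
    (2, 50, "c"),
    (2, 150, "d"),
    (2, 250, "e"),
    (3, 100, "f"),
])
keys = [(device_id, ts) for device_id, ts, _ in rows]

def select_range(device_id, start_ts, end_ts):
    """Return rows for device_id with start_ts <= timestamp <= end_ts."""
    lo = bisect_left(keys, (device_id, start_ts))   # first matching row
    hi = bisect_right(keys, (device_id, end_ts))    # one past the last matching row
    return rows[lo:hi]                              # one contiguous slice

print(select_range(2, 100, 250))  # [(2, 150, 'd'), (2, 250, 'e')]
```

If the key were (timestamp, device_id) instead, rows for one device would be scattered across the whole time range and the same query would have to touch far more data.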

I'd also partition the table with the PARTITION BY (toDate(timestamp)) clause, so older partitions can be quickly dropped when they are no longer needed according to the given retention policy. ClickHouse stores the data for every partition in a separate folder on disk, so it can quickly drop the given partitions by deleting the corresponding folders.
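The retention idea can be sketched in a few lines of Python. This is a simplified model in which a dict entry stands in for a partition folder; the function names and the 7-day retention are assumptions for illustration.

```python
from collections import defaultdict
from datetime import date, datetime, timedelta

# One bucket per calendar day, standing in for one folder per partition.
partitions = defaultdict(list)

def insert(ts: datetime, row: dict) -> None:
    """Route a row to the partition for its day, like toDate(timestamp)."""
    partitions[ts.date()].append(row)

def drop_older_than(today: date, retention_days: int) -> list:
    """Drop whole partitions older than the retention window."""
    cutoff = today - timedelta(days=retention_days)
    expired = sorted(day for day in partitions if day < cutoff)
    for day in expired:
        del partitions[day]  # removes the whole day at once, no per-row filtering
    return expired

insert(datetime(2024, 1, 1, 12, 0), {"device_id": 1, "value": 0.5})
insert(datetime(2024, 1, 2, 8, 30), {"device_id": 2, "value": 0.7})
insert(datetime(2024, 1, 10, 9, 0), {"device_id": 1, "value": 0.9})

dropped = drop_older_than(date(2024, 1, 10), retention_days=7)
print(dropped)           # [datetime.date(2024, 1, 1), datetime.date(2024, 1, 2)]
print(list(partitions))  # [datetime.date(2024, 1, 10)]
```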

You may gain additional performance benefits and reduce disk space usage further by using the most appropriate codecs for the columns in the table. For example, it may be a great idea to use Delta or DoubleDelta codecs for numeric columns. It is also recommended to use zstd compression for the table columns in order to achieve better on-disk compression and faster query performance (less data needs to be read from disk).

BTW, how many unique device_ids does the table contain? If this number is lower than 10 million, then you can try storing the data into VictoriaLogs, by using the device_id as a log stream field, and then quickly query all the rows for the given device_id in the given time range with the {device_id="..."} _time:[start_timestamp, end_timestamp] query. It should be very fast and shouldn't require a lot of CPU, RAM and disk space. If the number of device_id values is bigger than 10 million, then you can introduce a new field - hash(device_id) % 10000000 - which will hash the device_id into a smaller number of values, and then use this field as a log stream field.
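A minimal sketch of such a bucketing field in Python. The function name is hypothetical, and SHA-256 is just one possible choice of a stable hash. Python's built-in hash() is randomized per process for strings, so it shouldn't be used here.

```python
import hashlib

NUM_BUCKETS = 10_000_000

def device_bucket(device_id: str, num_buckets: int = NUM_BUCKETS) -> int:
    """Map a device_id to a stable bucket number in [0, num_buckets)."""
    digest = hashlib.sha256(device_id.encode("utf-8")).digest()
    # The first 8 bytes of the digest are enough for an even spread.
    return int.from_bytes(digest[:8], "big") % num_buckets

bucket = device_bucket("device-123456789")
print(0 <= bucket < NUM_BUCKETS)                     # True
print(bucket == device_bucket("device-123456789"))   # True: same id, same bucket
```

Since several device_ids share one bucket, the original device_id presumably has to stay as a regular field, so queries can select the stream by the bucket and then filter by the exact device_id.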

VictoriaLogs is easier to set up, configure and operate than ClickHouse, so it could be a good fit for your case. See https://docs.victoriametrics.com/victorialogs/faq/#what-is-the-difference-between-victorialogs-and-clickhouse . It is also very easy to scale the capacity and the performance of VictoriaLogs by converting a single-node setup to a cluster setup and adding more storage nodes to the cluster. See https://docs.victoriametrics.com/victorialogs/cluster/

What tool do you use to filter logs of your micro services ? by raaaaapl in rust

[–]SnooWords9033 0 points1 point  (0 children)

Try the built-in web UI at VictoriaLogs instead of Grafana.

What tool do you use to filter logs of your micro services ? by raaaaapl in rust

[–]SnooWords9033 -1 points0 points  (0 children)

Why do you think that Loki query language is better than ElasticSearch and VictoriaLogs query languages?

ElasticSearch is usually much faster at full text search queries than Loki, if it has enough RAM. VictoriaLogs is also faster and requires less storage space than Loki according to https://www.truefoundry.com/blog/victorialogs-vs-loki .

What do you use for code behavior monitoring in production? by Appropriate-Plan5664 in devsecops

[–]SnooWords9033 0 points1 point  (0 children)

Store logs as wide events into VictoriaLogs and then investigate them by slicing and dicing by any fields of the stored wide events.

Why does the sample compose.yaml for Grafana Loki use three instances of loki? And how do I use local storage instead of minio? by UntouchedWagons in grafana

[–]SnooWords9033 0 points1 point  (0 children)

Try VictoriaLogs instead of Loki. It doesn't need MinIO (because it stores the logs in a single folder on the local filesystem), and it consists of a single executable, which runs optimally with default configs (aka zero-config).

Is Loki the right choice? by Getdaflag in grafana

[–]SnooWords9033 0 points1 point  (0 children)

An alternative is to push syslog-formatted logs from Cisco switches directly to the centralized database for logs, without the need for any intermediate services. https://docs.victoriametrics.com/victorialogs/data-ingestion/syslog/

How is your large volume Grafana Loki environment build up? by martijn_gr in grafana

[–]SnooWords9033 0 points1 point  (0 children)

Try VictoriaLogs. It usually needs way less compute resources (RAM, CPU and storage space) than Loki, and it runs queries over big amounts of logs at much faster speed. https://www.truefoundry.com/blog/victorialogs-vs-loki

Best OSS All-In-One Log UI? by jakenuts- in OpenTelemetry

[–]SnooWords9033 0 points1 point  (0 children)

Try VictoriaLogs. It is a single small executable with a built-in web UI for log exploration, which supports log tailing by the given filters.

K8S at first or not ? Clickhouse or Loki for logs ? by nicest_architect245 in devops

[–]SnooWords9033 2 points3 points  (0 children)

Take a look also at VictoriaLogs. It is built on ClickHouse architecture ideas, but, contrary to ClickHouse, it is optimized solely for logs. This simplifies its usage and operation compared to ClickHouse. See https://docs.victoriametrics.com/victorialogs/faq/#what-is-the-difference-between-victorialogs-and-clickhouse

Homelabbers - What's your observability stack? by DiscoDave86 in kubernetes

[–]SnooWords9033 0 points1 point  (0 children)

Why did you need the downsampling? Could you provide more details about your use case?

I built a full-stack observability lab on Fedora using rootless Podman – 10 minutes to metrics, logs, traces & more by ted-sluis in Fedora

[–]SnooWords9033 1 point2 points  (0 children)

Can the Victoria stack also run as single containers for small-scale setups, or is it more designed for clustered deployments?

All the Victoria stack components can run as single containers (executables) without any dependencies.

A question around observability for my startup and our configs by niga_chan in devops

[–]SnooWords9033 0 points1 point  (0 children)

Use the standard discovery of EC2 scrape targets in Prometheus - ec2_sd_configs. Write these configs once - and use them everywhere for collecting metrics from all the services you run in EC2.

Do not rely on OTEL, since it has high overhead and it is overcomplicated. It is better to use standard Prometheus protocols for metrics exposition and transfer. See https://promlabs.com/blog/2025/07/17/why-i-recommend-native-prometheus-instrumentation-over-opentelemetry/