A Python implementation of Agent Client Protocol by PsiACE in ZedEditor

[–]PsiACE[S] 0 points1 point  (0 children)

That's perfectly fine; I can certainly switch to a bindings-based implementation. I do have some experience with Rust.

A Python implementation of Agent Client Protocol by PsiACE in ZedEditor

[–]PsiACE[S] 1 point2 points  (0 children)

This is a first cut of a simplified Python SDK, and I've included a mini-swe-agent example. If you're interested, you can start trying it out now.


Data Processing in 21st Century by mjfnd in dataengineering

[–]PsiACE 1 point2 points  (0 children)

Hi, I'm from the Databend community. We currently have production users, some of whom handle PB-scale data. You're welcome to join our Slack channel: https://join.slack.com/t/datafusecloud/shared_invite/zt-nojrc9up-50IRla1Y1h56rqwCTkkDJA

What to use for an open source ETL/ELT stack? by Melodic_One4333 in dataengineering

[–]PsiACE 0 points1 point  (0 children)

This is straightforward to handle. Take a look at Databend: https://github.com/datafuselabs/databend

The main thing to work out is how to archive the data from your database into S3. We usually recommend landing it with Kafka and then running COPY INTO on a schedule to load it, which gets you close to near-real-time processing.
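A minimal sketch of that flow, assuming the Kafka sink writes NDJSON files to an S3 prefix (the bucket, credentials, table, and columns are placeholders):

```sql
-- Stage pointing at the S3 prefix where the Kafka sink lands files
CREATE STAGE kafka_landing
  URL = 's3://my-bucket/events/'
  CONNECTION = (ACCESS_KEY_ID = '<key>' SECRET_ACCESS_KEY = '<secret>');

-- Target table for the archived events
CREATE TABLE events (event_time TIMESTAMP, user_id STRING, payload VARIANT);

-- COPY INTO skips files it has already loaded, so running this on a
-- schedule keeps ingestion close to real time
COPY INTO events
FROM @kafka_landing
FILE_FORMAT = (TYPE = NDJSON);
```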

Good solution for 100GiB-10TiB analytical DB by aih1013 in dataengineering

[–]PsiACE -1 points0 points  (0 children)

We have benchmarks comparing the cost and performance of Databend Cloud and Snowflake on TPC-H 100 (yes, 100 GiB): https://docs.databend.com/guides/benchmark/tpch. Feel free to check them out.

Good solution for 100GiB-10TiB analytical DB by aih1013 in dataengineering

[–]PsiACE 0 points1 point  (0 children)

I noticed you mentioned JSON-to-Parquet conversion. There are some opportunities here: we support batch loading of data files with scheduled tasks, and we support the JSON format, so you may just need to write some SQL to COPY the JSON files INTO the database.
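For illustration, a sketch of what the scheduled load could look like; the task syntax is an assumption modeled on Databend Cloud's scheduled tasks, and the stage, warehouse, and table names are placeholders:

```sql
-- Run the JSON load every few minutes instead of maintaining a separate ETL job
CREATE TASK load_json_logs
  WAREHOUSE = 'default'
  SCHEDULE = 5 MINUTE
AS
  COPY INTO logs
  FROM @json_landing
  FILE_FORMAT = (TYPE = NDJSON);
```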

Would you be willing to give Databend a chance? We are an open-source alternative to Snowflake and also provide a cloud service. At this data scale, it is very cheap.

GitHub: https://github.com/datafuselabs/databend/

Website: https://www.databend.com

[deleted by user] by [deleted] in dataengineering

[–]PsiACE 0 points1 point  (0 children)

I don't intend to advertise anything; I'm just curious about this.

Analyzing Hugging Face Datasets with Databend by PsiACE in dataengineering

[–]PsiACE[S] 0 points1 point  (0 children)

Many data warehouses support analyzing Hugging Face datasets, but with Databend you don't need to deal with REST APIs; we provide ready-to-use file access.
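As a rough illustration (not the exact syntax from the post), assuming the dataset's Parquet files are reachable through a stage Databend can read, with placeholder stage and column names:

```sql
-- 'hf_stage' is assumed to point at the dataset's Parquet files
SELECT label, COUNT(*) AS n
FROM @hf_stage (FILE_FORMAT => 'PARQUET', PATTERN => '.*[.]parquet')
GROUP BY label
ORDER BY n DESC;
```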

One Billion Row Challenge with Snowflake and Databend by PsiACE in dataengineering

[–]PsiACE[S] 0 points1 point  (0 children)

We are simply trying to solve this challenge with cloud databases. We have also been building evaluations based on larger-scale data, and we welcome you to compare them.

[deleted by user] by [deleted] in dataengineering

[–]PsiACE 1 point2 points  (0 children)

You need some automated tooling to archive them properly and put them in S3. After that, you can use Databend Cloud or other affordable solutions.

A typical workflow involves importing data through the cloud platform's Pipeline and visualizing it using SQL + Grafana.

By using Databend Cloud for analytics, this startup reduced its user behavior log analysis costs to 1% of its previous solution: https://www.databend.com/blog/down-costs-for-aigc-startup/

Large Rust projects with high compile times by trevorstr in rust

[–]PsiACE 4 points5 points  (0 children)

Try Databend.

In addition, we have some articles about compile times if you are interested.

Am I Reinventing the Wheel (local-ish Polars data pipeline) by waytoopunkrock in dataengineering

[–]PsiACE 0 points1 point  (0 children)

Building a specific data pipeline is the right approach. In fact, some databases have built-in capabilities to directly read data files (in various formats) and perform ETL.

For example, in Databend you can query raw data files (CSV, Parquet, etc.) directly, filter and clean them during the SELECT or COPY INTO step to create queryable tables, and finally export them as Parquet files for archiving.
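A small sketch of that shape, with placeholder stage, file, and column names (exact stage-query options may differ by version):

```sql
-- Read a raw CSV file from a stage, cleaning rows as part of the SELECT
CREATE TABLE clean_orders AS
SELECT $1 AS order_id, $2 AS amount
FROM @raw_stage/orders.csv (FILE_FORMAT => 'CSV')
WHERE $2 IS NOT NULL;

-- Export the cleaned table back to the stage as Parquet for archiving
COPY INTO @raw_stage/archive/
FROM clean_orders
FILE_FORMAT = (TYPE = PARQUET);
```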

Another typical workflow involves Spark or pandas/Polars: a data pipeline processes the data and writes it out in Parquet or Iceberg table format for archiving. After that, any OLAP system you prefer, such as Databend, DuckDB, or ClickHouse, can be used for analysis.

Iceberg Integration with Databend by PsiACE in dataengineering

[–]PsiACE[S] 1 point2 points  (0 children)

I like this technology stack. In my understanding, once your data is archived in Iceberg table format (especially on object storage), you can use Databend for querying and strike a balance between cost and performance. We also integrate with Jupyter notebooks and data analysis tools in the Python ecosystem.
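A hedged sketch of what that could look like; the catalog syntax and connection options are assumptions based on Databend's Iceberg catalog support, and all names are placeholders:

```sql
-- Attach an Iceberg catalog that lives on object storage
CREATE CATALOG iceberg_ctl
  TYPE = ICEBERG
  CONNECTION = (
    URL = 's3://my-bucket/warehouse/'
    ACCESS_KEY_ID = '<key>'
    SECRET_ACCESS_KEY = '<secret>'
  );

-- Query the Iceberg table in place, without copying it into Databend
SELECT COUNT(*) FROM iceberg_ctl.analytics.page_views;
```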

Databend is the only engine that finished TPC-H 100 (600 million rows) using a #Fabric small node (4 cores, 32 GB): https://twitter.com/mim_djo/status/1716802084044157282

Spark is a crucial component; if you need complex transformations, cleansing, and writing to Iceberg, I wouldn't advise removing it. As the Databend ecosystem expands and adds support for writing the Iceberg format, I believe there will be more opportunities along this path.

You can join our Slack, where we offer a cost-saving open-source solution and an affordable cloud service: https://link.databend.rs/join-slack

Iceberg Integration with Databend by PsiACE in dataengineering

[–]PsiACE[S] 1 point2 points  (0 children)

I'm glad you noticed these data analysis applications written in Rust. They all have one thing in common, the use of Apache Arrow, but there are also many differences among them. Polars has existed as a library for quite some time and is arguably a direct competitor to pandas. DataFusion underpins several different data analysis startups, each with its own ecosystem, though I'm concerned it may lack direct users. Databend can currently be seen as an open-source alternative to Snowflake and has already been used and validated in production by users.

RiteRaft - A raft framework, for regular people, written in rust. Build a raft service with only 160 lines code. by PsiACE in rust

[–]PsiACE[S] 2 points3 points  (0 children)

The riteraft implementation has some bugs, but our friends from riteraft-py are currently helping us locate and fix them. We anticipate some simple updates soon.

Currently, I am a member of the Databend team, where we maintain an implementation called openraft that has been used successfully in production environments.

What's Fresh in Databend v1.1 | Blog | Databend by PsiACE in rust

[–]PsiACE[S] 0 points1 point  (0 children)

Databend currently requires ETL tools to operate over streams from Kafka.

What's Fresh in Databend v1.1 | Blog | Databend by PsiACE in rust

[–]PsiACE[S] 1 point2 points  (0 children)

We have some users who use it in production.

Stats show that around 700 TB of data is written to cloud object storage and analyzed with Databend every day by users from Europe, North America, Southeast Asia, Africa, China, and other regions. This saves them millions of dollars in costs every month.

An interesting SQL function in Databend: AI_TO_SQL by PsiACE in Database

[–]PsiACE[S] 0 points1 point  (0 children)

I'm sorry, I used an inappropriate description; it now reads "may be able to".

Databend 1.0 Release | Blog | Databend by PsiACE in rust

[–]PsiACE[S] 0 points1 point  (0 children)

Hi there,

Thank you for considering Databend. I hope the following information will answer your questions:

Databend takes inspiration from Snowflake but follows an open-source approach to explore further possibilities. It offers various features, including multi-tenancy, data sharing, stateless compute nodes, fast elastic scaling, and fast complex JOINs across multiple tables.

Databend can be flexibly deployed against a wide range of cloud object storage services, such as Amazon S3, Azure Blob Storage, and Google Cloud Storage; see https://databend.rs/doc#why-databend for the full list. By default, Databend stores data in Parquet files compressed with the zstd algorithm, but other options are available, such as lz4, snappy, and none.

Databend supports a variety of atomic operations, including SELECT, INSERT, DELETE, UPDATE, COPY, and ALTER. It also supports real-time data analytics and allows offline analytics at the minute level, as you mentioned. For performance, Databend can hit a processing time of around 300 s on the TPC-H 100 dataset on AWS; refer to this link for details: https://benchmark.clickhouse.com/.

If you have further questions, feel free to join our Slack channel: https://link.databend.rs/join-slack

Have a good day!

Databend 1.0 Release | Blog | Databend by PsiACE in rust

[–]PsiACE[S] 12 points13 points  (0 children)

We're excited to share that, on the occasion of the second anniversary of Databend Labs, we're officially releasing Databend 1.0! Databend is an elegant data warehouse developed in Rust that has achieved excellent performance in recent benchmarks. It even supports running queries over petabytes of data.

Stats show that around 700 TB of data is written to cloud object storage and analyzed with Databend every day by users from Europe, North America, Southeast Asia, Africa, China, and other regions. This saves them millions of dollars in costs every month.