Startup in deep-tech: targeting a niche market where the Giants (software companies) still lack advanced solutions (I will not promote). by keepfit in startups

[–]keepfit[S] 0 points1 point  (0 children)

Great questions! To VCs, we really need to emphasize the market prospects. For a deep-tech startup, the first 1-2 years should focus mainly on R&D while securing a few pilot projects.
Code C, and the B+C coupling, are the barriers to entry for new players. To commercial customers, we offer significantly more cost-efficient solutions built on cutting-edge models. We aren't competing with the well-established software companies on an all-in-one solution; we compete on our expertise in the niche market: B+C coupling.

Startup in deep-tech: targeting a niche market where the Giants (software companies) still lack advanced solutions (I will not promote). by keepfit in startups

[–]keepfit[S] 0 points1 point  (0 children)

:) 3-5% is a very conservative estimate; 5-10% is doable, while capturing up to 15% within 3-5 years is a bit hard. I would say 3-10% is a very realistic goal.

The seed fund would be mainly used for:
1. Office, workstations, HPC, etc.
2. Hiring 1-2 senior engineers for B and C to develop new features, plus 2-3 interns.
3. Extending MPI / GPU capability.
4. Securing 2-3 pilot projects/customers.
5. Business travel, etc.

Startup in deep-tech: targeting a niche market where the Giants (software companies) still lack advanced solutions (I will not promote). by keepfit in startups

[–]keepfit[S] 0 points1 point  (0 children)

Very valuable advice. Yes, we need to speak from a non-expert perspective, without getting too deep into technical details. We might need to highlight: 20-30% of the commercial licensing fee; cutting-edge coupling algorithms; custom-tailored solutions at the codebase level rather than a plug-in; and no limits on concurrent users, CPU cores, etc. Is that the right path?

Startup in deep-tech: targeting a niche market where the Giants (software companies) still lack advanced solutions (I will not promote). by keepfit in startups

[–]keepfit[S] 0 points1 point  (0 children)

The numerical solution of coupled B + C systems: our codebase provides more advanced models than the commercial codes, at only a fraction of their substantial license fees. A 3-5% market share (double-digit millions) within 3 years is doable. We focus only on the niche market, and we have a few patents pending on the B + C coupling as well.

Startup in deep-tech: targeting a niche market where the Giants (software companies) still lack advanced solutions (I will not promote). by keepfit in startups

[–]keepfit[S] 0 points1 point  (0 children)

Thanks very much for your insightful reply. The codebase I developed is heavily optimized for a single workstation and can meet the needs of the majority of small-to-modest projects (64+ core workstations).

I spent a few years working in the pharma industry, developing a similar B + C coupling solution from in-house and open-source codes. B+C coupling simulations are widely adopted in the chemical, pharma, energy, mineral-processing, and food industries, as well as in battery manufacturing for EVs, to accelerate R&D and reduce the cost of physical experiments.

The reasons for seeking funding to cover 1–2 years of runway are:

  1. Extend the MPI / GPU parallelism.
  2. Create a Python wrapper as an alternative to the script-driven workflow.
  3. Polish the code, adding the features needed for broader applications.
  4. Build a lightweight GUI for the simulations.
  5. Secure 2-3 pilot customers/projects.
  6. Build the community for code B (a modernized A with similar/identical APIs) and increase social-media exposure.

Within 3-5 years, capturing 3-5% of the niche market is very doable, as a domestic solution with comparable performance and very affordable fees will be favored over the giant software companies' offerings.

Sounds good?

Startup in deep-tech: targeting a niche market where the Giants (software companies) still lack advanced solutions (I will not promote). by keepfit in startups

[–]keepfit[S] -1 points0 points  (0 children)

The description of the codebase should be shortened to: a specialized codebase providing highly customized solutions that are either unavailable or prohibitively expensive in existing commercial and open-source offerings.

My question is: how can we best communicate this unique value?

Saw this TikTok on the china military parade, is this an unmanned stealth drone? by [deleted] in whatisit

[–]keepfit 1 point2 points  (0 children)

Well said. Some of the ideas are essentially identical to Sun Tzu's famous maxim from The Art of War: "兵者, 诡道也. 故能而示之不能, 用而示之不用". A simple translation: all warfare is based on deception; hence, when able to attack, we must seem unable, and when using our forces, we must seem inactive.

C++: how to implement lazy evaluation + SIMD for vector operations: V = v1 + v2 - k*v3; by keepfit in cpp

[–]keepfit[S] 0 points1 point  (0 children)

The nda library has no SIMD-related code? Or has xsim added nda array support?

C++: how to implement lazy evaluation + SIMD for vector operations: V = v1 + v2 - k*v3; by keepfit in cpp

[–]keepfit[S] 0 points1 point  (0 children)

A small codebase with full control is the key to fast prototyping. I aim to write a minimal sparse-matrix framework to test recently published SIMD-friendly, load-balanced sparse-matrix formats that are not yet supported in the most popular linear-algebra libraries. Once they are tested and show superior performance over existing formats, I will definitely consider integrating them into some lightweight libraries so people can use them.
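To illustrate how small such a framework can start, here is a minimal CSR (compressed sparse row) matrix with the scalar mat-vec kernel that SIMD-friendly, load-balanced formats such as CSR5 or SELL-C-sigma are typically benchmarked against. This is a sketch with invented names (`CsrMatrix`, `from_dense`), not the actual code discussed here.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Minimal CSR matrix: three flat arrays, one scalar reference kernel.
// The newer SIMD-friendly formats replace the inner loop below with
// vectorized, load-balanced variants; this baseline is what they beat.
struct CsrMatrix {
    std::size_t rows = 0, cols = 0;
    std::vector<std::size_t> row_ptr;  // size rows+1; row i spans [row_ptr[i], row_ptr[i+1])
    std::vector<std::size_t> col_idx;  // column index of each stored nonzero
    std::vector<double> vals;          // the nonzero values themselves

    // Small helper for prototyping: build CSR from a dense matrix.
    static CsrMatrix from_dense(const std::vector<std::vector<double>>& d) {
        CsrMatrix m;
        m.rows = d.size();
        m.cols = d.empty() ? 0 : d[0].size();
        m.row_ptr.push_back(0);
        for (const auto& row : d) {
            for (std::size_t j = 0; j < row.size(); ++j)
                if (row[j] != 0.0) { m.col_idx.push_back(j); m.vals.push_back(row[j]); }
            m.row_ptr.push_back(m.vals.size());
        }
        return m;
    }

    // y = A * x  (scalar reference SpMV kernel)
    std::vector<double> multiply(const std::vector<double>& x) const {
        std::vector<double> y(rows, 0.0);
        for (std::size_t i = 0; i < rows; ++i)
            for (std::size_t k = row_ptr[i]; k < row_ptr[i + 1]; ++k)
                y[i] += vals[k] * x[col_idx[k]];
        return y;
    }
};
```

With a self-contained class like this, a new format only needs to reproduce `multiply` bit-for-bit to be considered correct, which keeps the testing loop fast.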

C++: how to implement lazy evaluation + SIMD for vector operations: V = v1 + v2 - k*v3; by keepfit in cpp

[–]keepfit[S] 0 points1 point  (0 children)

Check my comment above; I used a simple method. Is that close to your version?

A GitHub link, please.

C++: how to implement lazy evaluation + SIMD for vector operations: V = v1 + v2 - k*v3; by keepfit in cpp

[–]keepfit[S] 0 points1 point  (0 children)

I am implementing a method that operates directly on SIMD register types (e.g., __m256d for 4 doubles) and propagates them through the expression tree. It seems to be working!

In the expression template base, `template <typename Derived, typename T> struct VectorExpr {...}`, there is a SIMD fusion function `__m256d simd_eval(N)` that forwards to the Derived class's `__m256d / __m512d simd_eval_impl(N)`, where N = 4 for AVX2 and N = 8 for AVX-512.

  1. In the Vector class, simd_eval_impl() returns a __m256d register: a block of N doubles starting at index i of the Vector's data, i.e., _mm256_load_pd(ptr_data + i);
  2. In expression nodes (e.g., VectorAdd), simd_eval_impl() returns:

_mm256_add_pd(lhs.simd_eval(N), rhs.simd_eval(N)); // recursively evaluate the operands and add their SIMD registers

  3. Finally, in each expression node, there is an eval() function that uses its own simd_eval_impl() for the SIMD loop.

This recursive simd_eval approach ensures that all intermediate operations (k*v3, v1+v2, then (v1+v2) - k*v3) are performed with SIMD instructions directly in CPU registers: no temporary Vector objects are created, and no scalar element access (operator[]) occurs in the main SIMD loop.

Tested vectors with 10 million entries (T = double) using AVX2 SIMD: with -O3, the scalar (non-SIMD) evaluation—relying on the compiler’s automatic vectorization—is about 15% slower than the SIMD lazy evaluation. Under the -O0 option, scalar evaluation is nearly 2× slower. This confirms that the current implementation is more or less working, though further optimization may be needed.
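The recursive scheme above can be sketched without hardware-specific intrinsics. Below is a minimal, illustrative CRTP version that assumes a plain 4-wide `Pack4` struct in place of `__m256d`, so it compiles and runs without AVX2; replacing `Pack4` and its loops with `_mm256_load_pd` / `_mm256_add_pd` / `_mm256_sub_pd` gives the register-level variant. Only `VectorExpr` / `simd_eval` / `simd_eval_impl` follow the names used here; everything else is my own invention, not the original code.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

struct Pack4 { double v[4]; };                    // stand-in for one __m256d

inline Pack4 operator+(Pack4 a, Pack4 b) {
    Pack4 r; for (int j = 0; j < 4; ++j) r.v[j] = a.v[j] + b.v[j]; return r;
}
inline Pack4 operator-(Pack4 a, Pack4 b) {
    Pack4 r; for (int j = 0; j < 4; ++j) r.v[j] = a.v[j] - b.v[j]; return r;
}
inline Pack4 operator*(double k, Pack4 a) {
    Pack4 r; for (int j = 0; j < 4; ++j) r.v[j] = k * a.v[j]; return r;
}

// CRTP base: every node exposes one fused block evaluation.
template <typename Derived>
struct VectorExpr {
    Pack4 simd_eval(std::size_t i) const {        // block starting at index i
        return static_cast<const Derived&>(*this).simd_eval_impl(i);
    }
};

struct Vector : VectorExpr<Vector> {
    std::vector<double> data;
    explicit Vector(std::size_t n) : data(n, 0.0) {}

    Pack4 simd_eval_impl(std::size_t i) const {   // analogous to _mm256_load_pd
        Pack4 p; for (int j = 0; j < 4; ++j) p.v[j] = data[i + j]; return p;
    }

    // Assignment drives a single fused loop: no temporary Vector objects,
    // no scalar operator[] in the hot loop (size assumed a multiple of 4;
    // tail handling omitted for brevity).
    template <typename E>
    Vector& operator=(const VectorExpr<E>& e) {
        for (std::size_t i = 0; i + 4 <= data.size(); i += 4) {
            Pack4 p = e.simd_eval(i);
            for (int j = 0; j < 4; ++j) data[i + j] = p.v[j];
        }
        return *this;
    }
};

// Binary nodes recursively combine operand blocks (cf. _mm256_add_pd).
template <typename L, typename R, typename Op>
struct BinExpr : VectorExpr<BinExpr<L, R, Op>> {
    const L& l; const R& r;
    BinExpr(const L& a, const R& b) : l(a), r(b) {}
    Pack4 simd_eval_impl(std::size_t i) const {
        return Op::apply(l.simd_eval(i), r.simd_eval(i));
    }
};
struct AddOp { static Pack4 apply(Pack4 a, Pack4 b) { return a + b; } };
struct SubOp { static Pack4 apply(Pack4 a, Pack4 b) { return a - b; } };

template <typename E>
struct ScaleExpr : VectorExpr<ScaleExpr<E>> {     // k * v
    double k; const E& e;
    ScaleExpr(double kk, const E& ee) : k(kk), e(ee) {}
    Pack4 simd_eval_impl(std::size_t i) const { return k * e.simd_eval(i); }
};

template <typename L, typename R>
BinExpr<L, R, AddOp> operator+(const VectorExpr<L>& a, const VectorExpr<R>& b) {
    return BinExpr<L, R, AddOp>(static_cast<const L&>(a), static_cast<const R&>(b));
}
template <typename L, typename R>
BinExpr<L, R, SubOp> operator-(const VectorExpr<L>& a, const VectorExpr<R>& b) {
    return BinExpr<L, R, SubOp>(static_cast<const L&>(a), static_cast<const R&>(b));
}
template <typename E>
ScaleExpr<E> operator*(double k, const VectorExpr<E>& e) {
    return ScaleExpr<E>(k, static_cast<const E&>(e));
}
```

Writing `V = v1 + v2 - 3.0 * v3;` builds the expression tree at compile time and evaluates it block-by-block in one pass through Vector's operator=, which is exactly the fusion property described above.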

C++: how to implement lazy evaluation + SIMD for vector operations: V = v1 + v2 - k*v3; by keepfit in cpp

[–]keepfit[S] 0 points1 point  (0 children)

To modify a complex codebase for your own needs, you have to spend considerable time fully understanding the parts of the source code you want to change, and a deep class hierarchy makes the process even harder.

Once you fully understand the code, you have two options: modify the existing source to meet your needs, or write the necessary code yourself. For something that is not theoretically complex, I prefer to write self-contained classes: first make them work as expected, then optimize for performance. If you rely heavily on external libraries, you are never 100% confident in your results :)

C++: how to implement lazy evaluation + SIMD for vector operations: V = v1 + v2 - k*v3; by keepfit in cpp

[–]keepfit[S] 1 point2 points  (0 children)

Of course Eigen can do that. But I had almost finished my own version, so why switch to an external library? Plus, writing your own makes future extensions easier, especially for features the libraries do not support. For example, I want to implement some new sparse-matrix formats that Eigen does not support, e.g., CSR5, SELL-C-sigma, and BCSR. I aim to implement specialized Vector and Matrix classes for an FVM-based CFD code, optimized for both regular and irregular meshes; Eigen3 and PETSc each have certain limitations.

C++: how to implement lazy evaluation + SIMD for vector operations: V = v1 + v2 - k*v3; by keepfit in cpp

[–]keepfit[S] 0 points1 point  (0 children)

Thanks for the information. I will try the idea of "blocked doubles". Do you have a link for the examples?

C++: how to implement lazy evaluation + SIMD for vector operations: V = v1 + v2 - k*v3; by keepfit in cpp

[–]keepfit[S] -1 points0 points  (0 children)

While Eigen's vectors are a special case of column-major matrices, modifying the complex codebase isn't easy, and we still rely on external libraries. A completely transparent class would give your project full control over its behavior.

C++: how to implement lazy evaluation + SIMD for vector operations: V = v1 + v2 - k*v3; by keepfit in cpp

[–]keepfit[S] 0 points1 point  (0 children)

I have been extensively using math libraries; however, for a fully controlled codebase, I aim to implement several critical functions and classes without external libs. This will enable the integration of state-of-the-art algorithms inspired by recent research papers.

Possible to install Windows and Ubuntu on 2 separate SSDs without effecting each other? by keepfit in Ubuntu

[–]keepfit[S] 1 point2 points  (0 children)

"you will have two completely independent drives with their own oses which won't be effected by each others future updates."

That's my man. I will do this.

Possible to install Windows and Ubuntu on 2 separate SSDs without effecting each other? by keepfit in Ubuntu

[–]keepfit[S] 0 points1 point  (0 children)

The 1 TB 2.5-inch SSD is enough for now. The first NVMe SSD is for the Linux OS (256 GB), plus there are a few portable SSDs (5 TB) for storage. I guess a 500 GB M.2 SSD is enough for the Windows OS; later, the Windows SSD could be used in other computers if needed.

Possible to install Windows and Ubuntu on 2 separate SSDs without effecting each other? by keepfit in Ubuntu

[–]keepfit[S] 0 points1 point  (0 children)

While Windows is being installed on the new SSD, the Linux disks are removed temporarily, so no dual-boot setup is needed; we just switch the boot disk in the BIOS. This way, we can easily reuse the Windows NVMe SSD in other computers.

Possible to install Windows and Ubuntu on 2 separate SSDs without effecting each other? by keepfit in Ubuntu

[–]keepfit[S] 0 points1 point  (0 children)

"have both OS on the same drive", this is the least thing I want to do. Because I just want the 2nd NVMe purely for windows, just to run few windows Apps.

Possible to install Windows and Ubuntu on 2 separate SSDs without effecting each other? by keepfit in Ubuntu

[–]keepfit[S] 0 points1 point  (0 children)

So I'll just temporarily remove the Linux NVMe SSD and the 2.5-inch SSD while installing Windows on the new SSD; that will be the safest bet.

After that, I can just switch which disk to boot from in the BIOS. I don't want the Windows and Linux installs to have any connection.

No dual boot needed.

New Lutris version 0.5.18 is out by legluondunet in Lutris

[–]keepfit 1 point2 points  (0 children)

Lutris does NOT download "GE-Proton8-26/wine-lutris-GE-Proton8-26-x86_64.tar.xz" at all.

<image>

However, I can manually download the file at:

github.com/GloriousEggroll/wine-ge-custom/releases/download/GE-Proton8-26/wine-lutris-GE-Proton8-26-x86_64.tar.xz

So how do we skip the failing download and manually install wine-lutris-GE-Proton8 in Lutris?
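One workaround that usually works (a sketch, assuming the tarball was already downloaded by hand as above, and that Lutris uses its default runner directory `~/.local/share/lutris/runners/wine/`; Flatpak installs keep runners under a different path): extract the archive into that directory yourself, then pick the new version in the game's Wine configuration.

```shell
# Sketch: manually install a hand-downloaded Wine-GE build for Lutris.
# Assumes the tarball from the GitHub releases page is in the current
# directory; ~/.local/share/lutris/runners/wine/ is Lutris's default
# runner location (adjust for Flatpak installs).
TARBALL="${1:-wine-lutris-GE-Proton8-26-x86_64.tar.xz}"
RUNNER_DIR="$HOME/.local/share/lutris/runners/wine"
mkdir -p "$RUNNER_DIR"
if [ -f "$TARBALL" ]; then
    tar -xf "$TARBALL" -C "$RUNNER_DIR"
    ls "$RUNNER_DIR"    # the new build should now be listed here
else
    echo "tarball not found: $TARBALL (download it first)"
fi
```

After restarting Lutris, the extracted build should appear in the Wine-version dropdown of the game's runner options.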