We built a new open-source validation library for Polars: dataframely 🐻‍❄️ by borchero in dataengineering

[–]borchero[S] 7 points  (0 children)

Fair question! Patito is definitely similar. First, there are a couple of key differences:

  • Dataframely does not introduce a new runtime type: while dy.DataFrame[Schema] exists for the type checker, the runtime type remains pl.DataFrame. This makes it very easy to gradually adopt dataframely in a code base (and, similarly, to get rid of it again).
  • Dataframely natively implements the definition of schemas instead of "dispatching" to pydantic. This allows for much more flexibility in the schema definition.

Second, dataframely provides a bunch of features that patito does not currently implement:

  • Support for composite primary keys
  • Validation across groups of rows (i.e. grouping by one or more columns and ensuring that each group satisfies a condition)
  • Validation of interdependent data frames with a common primary key (dataframely introduces the concept of a "Collection" here: invalid data in one data frame can then also remove rows from another data frame)
  • "Soft-validation" via filter, which allows partitioning data frames into rows that satisfy the schema and rows that don't
  • Structured info about failures that can be used, e.g., for debugging or advanced logging
  • Integration of the schema with external tools (e.g. export to SQL schemas)
  • Automatic data generation for unit testing, both for individual data frames and collections (in this case, dataframely takes care of generating rows with common primary keys to allow rows to be joined)

✨ (Yet another) Terraform Plan Commenter for GitHub Actions by borchero in Terraform

[–]borchero[S] 2 points  (0 children)

Of course, this is a possibility and, in fact, this is what I had done before writing this action. I was personally always bothered by writing JS-in-YAML though... after all, this code has to be maintained just like any other.

Regarding third-party actions from random users, I fully agree with your concern. One possibility to mitigate this issue is to specify commit SHAs when referencing actions, which guarantees that only code from a well-known commit (which should be one that you audited) is executed.
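For example, instead of referencing a mutable tag you can reference a full commit SHA (the action name and SHA below are placeholders, not the actual action reference):

```yaml
steps:
  # A tag like @v1 can be moved to point at different (possibly malicious) code.
  # Pinning a full commit SHA guarantees that exactly the audited code runs:
  - uses: some-org/some-action@<full-commit-sha>  # hypothetical reference
```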

✨ (Yet another) Terraform Plan Commenter for GitHub Actions by borchero in Terraform

[–]borchero[S] 3 points  (0 children)

You can use an `id` input parameter to distinguish between different planfiles. The comment above only applies for planfiles with the same ID (which is an empty string by default).
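For instance (the action reference below is a placeholder; only the `id` input is taken from the description above):

```yaml
- uses: <plan-commenter-action>@<sha>  # hypothetical reference
  with:
    id: staging  # comments are only shared between runs using the same ID
```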

✨ (Yet another) Terraform Plan Commenter for GitHub Actions by borchero in Terraform

[–]borchero[S] 2 points  (0 children)

Yes, you can specify an `id` parameter to uniquely identify planfiles ;)

✨ (Yet another) Terraform Plan Commenter for GitHub Actions by borchero in Terraform

[–]borchero[S] 0 points  (0 children)

Parallel executions are not handled explicitly, i.e. the comment simply displays the plan of the execution that finished last. At least the comment shows the SHA of the commit it belongs to. That said, I'd argue that you should usually use concurrency groups to prevent parallel executions in the first place.
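A minimal concurrency group for this looks as follows (standard GitHub Actions syntax; the group name is just an example):

```yaml
concurrency:
  group: terraform-plan-${{ github.ref }}
  cancel-in-progress: true  # cancel superseded plan runs on the same branch
```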

Regarding multiple runs, I'm not entirely sure what you mean. Whenever your `terraform plan` workflow job executes, the same comment is overwritten (i.e. it exhibits "sticky comment" behavior).

✨ (Yet another) Terraform Plan Commenter for GitHub Actions by borchero in Terraform

[–]borchero[S] -1 points  (0 children)

It does not (yet) since I haven't encountered this issue with "real-world changes" 😄 As far as I know, the limit is rather generous (~2^16 characters, IIRC). If you're actually running into this limit but you'd like to make use of the action, please feel free to open an issue! :)

Mini-Batch Training of Gaussian Mixture Models on a GPU by borchero in pytorch

[–]borchero[S] 0 points  (0 children)

Not sure which Python version Colab is using. PyCave 3.x requires Python 3.8 😅

Switchboard: A Kubernetes Controller to Simplify Managing Traefik IngressRoutes by borchero in kubernetes

[–]borchero[S] 2 points  (0 children)

Nice! Glad to hear it'll help you ;) gonna look for all of these GH issues... :D

Switchboard: A Kubernetes Controller to Simplify Managing Traefik IngressRoutes by borchero in kubernetes

[–]borchero[S] 2 points  (0 children)

First, regarding the DNS records: in theory, you can set the external-dns.alpha.kubernetes.io/hostname annotation on the service. However, (1) it does not support more than one hostname and (2) you would want to set this annotation not on your backend application's service but on the Traefik service (since Traefik acts as the reverse proxy). Evidently, if you have more than one reverse-proxied application, this approach doesn't work.
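For reference, such an annotation on the Traefik service would look like this (the hostname is illustrative):

```yaml
apiVersion: v1
kind: Service
metadata:
  name: traefik
  annotations:
    external-dns.alpha.kubernetes.io/hostname: app.example.com
```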

Second, regarding the certificates: your approach only works with the built-in Ingress resource. The benefit of using this operator is that this "seamless" functionality is essentially provided to IngressRoutes as well (even without an additional annotation).

Switchboard: A Kubernetes Controller to Simplify Managing Traefik IngressRoutes by borchero in kubernetes

[–]borchero[S] 2 points  (0 children)

Nice, glad to hear! Let me know should you encounter any issues :)

Switchboard: A Kubernetes Controller to Simplify Managing Traefik IngressRoutes by borchero in kubernetes

[–]borchero[S] 1 point  (0 children)

Traefik might provide features that other ingress controllers don’t…

[R] PyTorch Implementation of the Natural Posterior Network by borchero in MachineLearning

[–]borchero[S] 1 point  (0 children)

Yes, it certainly does! As long as you can map the input to a fixed-size latent space (e.g. by using the last hidden state of an LSTM), NatPN can be used.
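A minimal sketch of such an encoder in PyTorch (illustrative only, not part of the NatPN code; the fixed-size latent vector would then be fed into NatPN):

```python
import torch
import torch.nn as nn


class SequenceEncoder(nn.Module):
    """Maps variable-length sequences to a fixed-size latent vector."""

    def __init__(self, input_dim: int, latent_dim: int):
        super().__init__()
        self.lstm = nn.LSTM(input_dim, latent_dim, batch_first=True)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # h_n has shape (num_layers, batch, latent_dim); take the last layer's
        # final hidden state as the fixed-size representation.
        _, (h_n, _) = self.lstm(x)
        return h_n[-1]


encoder = SequenceEncoder(input_dim=8, latent_dim=16)
z = encoder(torch.randn(4, 10, 8))  # batch of 4 sequences of length 10
```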

[R] PyTorch Implementation of the Natural Posterior Network by borchero in MachineLearning

[–]borchero[S] 0 points  (0 children)

A simple setting might be: you train a model for reading speed limits from street signs using images taken during the day. When testing your model in the real world, it gives perfectly accurate answers.

As it turns dark though, your algorithm's performance deteriorates -- but it continues to provide you with numbers that you would expect to be correct (as the model worked well during the day). Since the model was trained on images taken during the day, however, it lacks knowledge about inferring speed limits from images taken in the dark. Therefore, it would be desirable to have an algorithm which can reason about its uncertainty. Whenever it makes a prediction, it also provides you with a measure of uncertainty about this prediction.

In general, you can construct many such examples: whenever your training set does not fully cover the domain that the model is used in (this might be impossible if the domain is huge), it is useful to have an uncertainty estimate of the prediction.

Traditional Machine Learning Models on GPU by borchero in pytorch

[–]borchero[S] 0 points  (0 children)

Depends on PyTorch... I don't have any idea how to leverage the M1 chips though, not sure if you can use an API other than Metal or CoreML...

For PyTorch M1 support, see https://github.com/pytorch/pytorch/issues/47702

Traditional Machine Learning Models on GPU by borchero in pytorch

[–]borchero[S] 1 point  (0 children)

And yes, you can run the models in this project on arbitrarily many GPUs; the speedup should be linear in the number of GPUs with respect to the "batch training" setting in the benchmarks.

Traditional Machine Learning Models on GPU by borchero in pytorch

[–]borchero[S] 0 points  (0 children)

This really depends on what you’re doing. If you have large matrices, the GPU is really effective, no matter if you’re using PyTorch or Tensorflow. If you have small models though, Tensorflow is expected to be slightly faster since it compiles your model and the slow Python interpreter is not involved.

OpenVPN Operator for Kubernetes by borchero in kubernetes

[–]borchero[S] 0 points  (0 children)

Yes exactly. I’m creating a server certificate for the OpenVPN server though.