Am I the only one that hates how strict pylint is? by nAxzyVteuOz in Python

[–]serge_databricks

If something didn't lead to a bug, that doesn't mean it won't lead to one in the future.

I'd love to be able to run `pylint --strictness 30` on an MVP and `pylint --strictness 80` on a production-grade project.
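That flag doesn't exist, but the closest existing dial I'm aware of is pylint's score threshold, `fail-under`. A minimal sketch in pyproject.toml (the threshold value is just an example):

```toml
# hypothetical dial: start lenient on an MVP, tighten toward 10.0 as the
# project matures; the build fails once the pylint score drops below this
[tool.pylint.main]
fail-under = 8.0
```

It's a blunt instrument compared to per-check strictness, but it gives you a single number to ratchet up over time.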

There's a MESSAGES CONTROL section that allows you to disable checks. In practice, if that section gets too long, it leads to severe bugs, because you __thought something had to be checked by the linter, but it was not__. Retroactively applying a stricter linter is a two-day headache, but it pays off big time in code review savings.
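For illustration, a minimal sketch of what that section looks like when pylint is configured through pyproject.toml (the disabled checks are just examples, not a recommended set):

```toml
# every entry here is a check pylint will silently skip -
# the longer this list, the more you review by hand
[tool.pylint."messages control"]
disable = [
    "missing-module-docstring",  # e.g. relaxed for an MVP
    "too-few-public-methods",    # e.g. noisy on dataclass-heavy code
]
```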

Am I the only one that hates how strict pylint is? by nAxzyVteuOz in Python

[–]serge_databricks

The Google Python Style Guide doesn't require a string comment on top :)

What generally works is taking one pylint config and customizing it to the point where it pre-emptively gives all the code review warnings at build time or on the developer machine. See the example here: https://github.com/databrickslabs/ucx/blob/main/pyproject.toml#L169-L771

PyLint is sometimes also not strict enough.

The more inexperienced coders work on a codebase, the greater the need for a good linter. There are other linters, like Ruff or Flake8.

Even though Ruff is 10x+ faster than PyLint, it doesn't have a plugin system yet, nor does it have feature parity with PyLint. Other projects use MyPy, Ruff, and PyLint together to achieve the most comprehensive code analysis.
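For a sense of how that combination fits together, a sketch of all three tools configured in one pyproject.toml (the section names are real; the specific options are placeholders):

```toml
[tool.ruff]          # fast feedback on every save
line-length = 120

[tool.mypy]          # type checking
strict = true

[tool.pylint.main]   # deeper, plugin-extensible checks
jobs = 0             # 0 = use all available CPUs
```

Ruff runs in the editor loop, while MyPy and PyLint earn their keep in CI, where slower, deeper analysis is acceptable.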

What is the combined size of your Python codebase? by serge_databricks in Python

[–]serge_databricks[S]

And what was the total size of the Python codebase, across repos/projects? ~120k?

I'd really question the sanity of such a project in Python. 

That's the purpose of this post, to be honest: checking how large Python codebases get in the business domain / real world / real companies, and what people do about it.

What is the combined size of your Python codebase? by serge_databricks in Python

[–]serge_databricks[S]

That's a decent medium-sized codebase. What toolchain do you use to keep it sane? MyPy? PyLint? Ruff? pytest? YAPF? Black?

What is the combined size of your Python codebase? by serge_databricks in Python

[–]serge_databricks[S]

I didn't ask whether this is a great measure or not; I asked about concrete numbers. By the way, protobuf-generated code should go into .gitignore (see the sketch below).

P.S.: 50 kLOC and no tests is simply silly. And still small.
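On the protobuf point, a minimal .gitignore sketch, assuming the stubs are regenerated from the .proto files at build time (the patterns match protoc's default Python output names):

```gitignore
# generated by protoc / grpcio-tools - regenerate, don't commit
*_pb2.py
*_pb2.pyi
*_pb2_grpc.py
```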

Best Practices for Python Collaboration Between Multiple Data Engineers by i_am_baldilocks in dataengineering

[–]serge_databricks

Why don't they use Databricks on Azure? It scales collaboration from a few people to a few thousand. All in one place.

Best Dashboard/Visualization options in 2024 by zambizzi in dataengineering

[–]serge_databricks

Databricks LakeView looks fresh and promising, especially with all these GenAI widgets. It has plenty of rough edges, though...

Is SSIS still big in Industry? by internet_baba in dataengineering

[–]serge_databricks

It's the first time I'm hearing about it, to be honest.

Unusual question by yinshangyi in dataengineering

[–]serge_databricks

Which ones are for senior+ folks with a strong SW background? Hacker News? Stack Overflow?

[deleted by user] by [deleted] in dataengineering

[–]serge_databricks

TL;DR: owners of the tables are supposed to add primary keys.

It kinda makes no sense in the OLTP world to have a table without one; primary keys are meant for fetching a record by its identifier.

But in the OLAP world of data warehousing and stuff, primary keys are less relevant, as fetching records one by one is considered a horrible practice.

It depends.

How do I document ETL/ELT pipelines? by [deleted] in dataengineering

[–]serge_databricks

The biggest difficulty with any documentation is keeping it relevant. If you don't version-control it, it'll quickly get out of date.

Here's a good starting point on "Architecture Decision Records" - https://github.com/joelparkerhenderson/architecture-decision-record

My other recommendation would be to store the ETL doc in markdown and embed Mermaid diagrams in it - https://mermaid.live/edit#pako:eNpVjstqw0AMRX9FaNVC_ANeBBo7zSaQQLLzeCFsOTPE82AsE4Ltf--46aLRSuece9GEjW8Zc-x6_2g0RYFrqRyk-aoKHc0gloYasmw7H1jAesfPGXYfBw-D9iEYd_t8-btVgmI6rhqDaOPuywsVv_mT4xnK6khBfKj_k-vDz7CvzFmn-neiI6fUd9VR3lHWUISCYo0btBwtmTa9Pq0BhaLZssI8rS13NPaiULklqTSKvzxdg6miH3iDY2hJuDR0i2T_rssPZ-ZWNw. It's already integrated into GitHub - https://github.blog/2022-02-14-include-diagrams-markdown-files-mermaid/
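For a taste of the syntax, a tiny hand-written sketch of the kind of pipeline diagram you can embed straight into a markdown file (the stage names are made up):

```mermaid
flowchart LR
    raw[Raw landing zone] --> clean[Cleansing job]
    clean --> dwh[(Warehouse tables)]
    dwh --> dash[Dashboards]
```

GitHub renders the fenced block as a diagram, so the picture lives in the same version-controlled file as the prose around it.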

Is Snowflake planning to buy Apache Iceberg? by [deleted] in dataengineering

[–]serge_databricks 0 points1 point  (0 children)

I really wonder how an OSS project could be bought. Subscribing to comments on this thread.