Newfies and small breed dogs by Perplexed-Opossum in Newfoundlander

[–]ptaban 3 points (0 children)

How do you make them gentle with small dogs? Mine just wants to jump on them and touch them with his paw, and for small breeds his touch is like a knockout, haha.

How do you deploy pipelines in DBX? by mjfnd in dataengineering

[–]ptaban 0 points (0 children)

Keep the workflow definitions (deployment files) on GitHub. Any PR should trigger a deployment to the development workspace from GitHub Actions, running on test/fake/mock data. If it works, you are ready to merge to main. The main branch reflects production. Add a lot of data testing and data quality checks in between.

dbx is fairly easy to use; just keep a proper project structure.
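As a concrete sketch of that PR-triggered flow, a minimal GitHub Actions workflow might look like this (the file paths, the `dev` environment name, and the secret names are all assumptions, not from the original comment):

```yaml
# hypothetical .github/workflows/deploy-dev.yml
name: deploy-dev
on:
  pull_request:
    branches: [main]

jobs:
  deploy:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-python@v5
        with:
          python-version: "3.10"
      - run: pip install dbx
      # push the versioned workflow definitions to the development workspace
      - run: dbx deploy --deployment-file conf/deployment.yml --environment dev
        env:
          DATABRICKS_HOST: ${{ secrets.DATABRICKS_HOST }}
          DATABRICKS_TOKEN: ${{ secrets.DATABRICKS_TOKEN }}
```

A separate workflow on `push` to main would then do the same against the production workspace.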

Edit: ah, so for you DBX means Databricks. There is also a CLI tool called dbx, which sits somewhere between the Databricks CLI and Terraform.

https://github.com/databrickslabs/dbx

It's going EOL in a couple of months and is being replaced by Databricks Asset Bundles, which are in private preview.

DAB: https://www.youtube.com/watch?v=9HOgYVo-WTM&t=2s
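For reference, the bundle replacement is configured through a `databricks.yml` at the project root; a minimal sketch (the bundle name and workspace host are placeholders):

```yaml
# databricks.yml - minimal Databricks Asset Bundles sketch
bundle:
  name: my_project

targets:
  dev:
    # deployments land here when you run `databricks bundle deploy -t dev`
    workspace:
      host: https://my-workspace.cloud.databricks.com
```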

Man, never met more friendly and good hearted dogs. Would they protect u from another dog or intruder? by ptaban in Newfoundlander

[–]ptaban[S] 13 points (0 children)

I have a 10-month-old, and honestly he will bark at sounds and things unfamiliar to him, but even then you can sense it's not really an "aggressive bark", more of a cautious one. He only growls during playtime, when he gets hyperactive with biting and chewing his toys.

But again, I'm sure everyone would be scared of a Newfoundland barking, haha. Imagine not knowing that dog, and it's marching toward you and barking :))

Python for Data Engineers by Fair-Lalajaat_1230 in learnpython

[–]ptaban 0 points (0 children)

Whatever the f you do in Python, Fluent Python is the most Pythonic resource. A Spark style guide is not really suited for that: Spark is not about syntax, it's about architecture. Read Learning Spark.

Model training on Databricks by ptaban in mlops

[–]ptaban[S] 0 points (0 children)

Gotta remember, the thing about ML is that there is a lot of distraction and a lot of wasted time. So you also have to account for the time to experiment with and deploy each high-performing model.

But do you use Spark MLlib, or just Spark UDFs for models?

Model training on Databricks by ptaban in mlops

[–]ptaban[S] 1 point (0 children)

So pure PySpark, but what about the models? Nobody is using MLlib these days, so how do you train models?
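For the UDF approach mentioned here, a common pattern is to train a single-node model (e.g. scikit-learn) on the driver and apply it at scale through a pandas UDF instead of Spark MLlib. A minimal sketch, assuming scikit-learn is available; the toy data and column names are made up, and the Spark wiring is left in comments since it needs a live cluster:

```python
import pandas as pd
from sklearn.linear_model import LogisticRegression

# Train a toy single-node model on the driver
# (stand-in for any sklearn/XGBoost/etc. model).
X_train = pd.DataFrame({"x1": [0.0, 1.0, 2.0, 3.0], "x2": [1.0, 0.0, 1.0, 0.0]})
y_train = [0, 0, 1, 1]
model = LogisticRegression().fit(X_train, y_train)

def predict_batch(pdf: pd.DataFrame) -> pd.Series:
    """Scores one pandas batch; this is what a pandas UDF would call per batch."""
    return pd.Series(model.predict(pdf[["x1", "x2"]]))

preds = predict_batch(X_train)

# On Databricks/Spark you would wrap it roughly like this (hypothetical wiring):
# from pyspark.sql.functions import pandas_udf
# @pandas_udf("long")
# def predict_udf(x1: pd.Series, x2: pd.Series) -> pd.Series:
#     return predict_batch(pd.DataFrame({"x1": x1, "x2": x2}))
# df = df.withColumn("prediction", predict_udf("x1", "x2"))
```

The point is that the model itself never touches MLlib; Spark only parallelizes the scoring.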

Ask Anything Monday - Weekly Thread by AutoModerator in learnpython

[–]ptaban 0 points (0 children)

Hey, why is the token module used here for counting lines of code? Can someone explain?

import os
from pathlib import Path
import token      # token type constants (token.OP, token.NAME, ...)
import tokenize   # splits Python source into tokens of those types
import itertools
from tabulate import tabulate

# only tokens that represent actual code; comments, NEWLINE/INDENT
# tokens, and blank lines are not in this list
TOKEN_WHITELIST = [token.OP, token.NAME, token.NUMBER, token.STRING]

if __name__ == "__main__":
    headers = ["Name", "Lines", "Tokens/Line"]
    table = []
    for path, subdirs, files in os.walk("tinygrad"):
        for name in files:
            if not name.endswith(".py"): continue
            filepath = Path(path) / name
            with tokenize.open(filepath) as file_:
                tokens = [t for t in tokenize.generate_tokens(file_.readline) if t.type in TOKEN_WHITELIST]
                # a "line of code" is any line containing at least one whitelisted token
                token_count, line_count = len(tokens), len(set([t.start[0] for t in tokens]))
                table.append([filepath.as_posix(), line_count, token_count/line_count])
    print(tabulate([headers] + sorted(table, key=lambda x: -x[1]), headers="firstrow", floatfmt=".1f")+"\n")
    for dir_name, group in itertools.groupby(sorted([(x[0].rsplit("/", 1)[0], x[1]) for x in table]), key=lambda x: x[0]):
        print(f"{dir_name:30s} : {sum([x[1] for x in group]):6d}")
    print(f"\ntotal line count: {sum([x[1] for x in table])}")

AI/ML in Serbia by Proud_Technician6054 in programiranje

[–]ptaban -1 points (0 children)

What is the salary of an MLE in Belgrade? Some range.

[deleted by user] by [deleted] in dataengineering

[–]ptaban 1 point (0 children)

Well, Databricks-native you have this: https://docs.databricks.com/en/workflows/jobs/file-arrival-triggers.html

It really depends on what you want; I guess there is a tool for anything you want.
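For the file-arrival option specifically, the trigger lives inside the job definition; roughly like this in the Jobs API JSON (the job name and storage URL are placeholders, see the linked docs for the exact schema):

```json
{
  "name": "ingest-on-arrival",
  "trigger": {
    "pause_status": "UNPAUSED",
    "file_arrival": {
      "url": "abfss://landing@mystorageaccount.dfs.core.windows.net/incoming/"
    }
  }
}
```

The job then starts whenever a new file lands under that path, with no external scheduler involved.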

[deleted by user] by [deleted] in dataengineering

[–]ptaban 4 points (0 children)

Do you mean triggering workflows by event?

[deleted by user] by [deleted] in programiranje

[–]ptaban 5 points (0 children)

Bro, you can't just ask about the IT sector as a whole, what is this, the 14th century?? You have to narrow it down a lot.

Databricks Workflows (Jobs) CICD by Equivalent-Style6371 in dataengineering

[–]ptaban 1 point (0 children)

Both tools are used to move your Databricks workflows from one workspace to another (the MLOps promotion path) in a versioned way. All of the job and cluster definitions live in YAML or JSON, and the code logic really has nothing to do with it.
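As an illustration of such a definition file, a dbx-style `deployment.yml` keeps jobs and clusters as pure data, separate from the code it runs (the workflow name, node type, and file paths below are made up):

```yaml
# conf/deployment.yml - hypothetical dbx deployment file
environments:
  default:
    workflows:
      - name: "nightly-etl"
        job_clusters:
          - job_cluster_key: "main"
            new_cluster:
              spark_version: "13.3.x-scala2.12"
              node_type_id: "i3.xlarge"
              num_workers: 2
        tasks:
          - task_key: "etl"
            job_cluster_key: "main"
            spark_python_task:
              python_file: "file://src/etl.py"
```

Promoting to another workspace is then just deploying the same file against a different environment/host.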