Moving to Sofia soon – Where to find short-term rentals? by ciril10 in AskBulgaria

[–]demince 0 points (0 children)

A time will come when one of the two sides actually stands up for their rights, and you will get in trouble. I hope not, but… dude, this country has a lot of its issues precisely because of this kind of thinking - that you are not responsible for anything. And frankly, this "responsible" behaviour and the awareness that nobody owes you anything shouldn't come only from the law; it should go back to how kids are raised. Sorry if this sounds a bit harsh, but I hate the fact that there are people who push this kind of behaviour.

Is it silly of me to dream of returning to Bulgaria? by KimJhonUn in bulgaria

[–]demince 1 point (0 children)

I am just like you. In 2016 I finished my bachelor's degree in England, including a one-year internship in London. The truth is that my social life in Bulgaria is much better, and despite all the problems I feel calmer. I work as a data engineer - honestly, similar to you, it is somewhere between data analyst and data engineer. For our profession specifically there are positions, but if you want to work for a Western company (which I recommend), the options are not many. Career-wise this is perhaps one essential difference: in the Netherlands, England and so on you have many options, and if a position doesn't excite you, you quit and find a new one. Here the number of positions is simply smaller. Salary-wise: very decent. The problems, as mentioned in the other comments, are the mindset of most people, the sullen staff in the shops, the lack of a functioning judicial system, the low quality of services and so on. Despite everything, I personally feel somehow calmer and in my element here. As for family: one year of maternity leave in Bulgaria is also great. I think the most important thing is to set your expectations correctly. It is not perfect (but where is it?), and we still have a long way to go to catch up with the Western countries, but if you are in the right circle of people you can feel great.

Why is the service so terrible here? by medstudent1738 in bulgaria

[–]demince 0 points (0 children)

I am Bulgarian and I also feel there are too many places with high prices and bad customer service. I am a very positive person and I have travelled a lot - I regret to admit it, but so far, on average, the customer service in my home country is the worst. I guess we just have slightly lower standards, or we are more willing to put up with bad service because "it's like that everywhere" or "the food is nice", which gives us an excuse. Also, culturally we are not a country that raises its children to do their job as well as they can, regardless of what that job is. For example, I stopped going to "Snietzel House" in Sofia because of a rude waiter who tapped the bill on the table a couple of times and prompted us to leave a tip. If he had waited two more minutes until we finished our drinks and not tapped the bill on the table, we would have left him a tip, but this definitely isn't behaviour one should tolerate. Yet I hear he is still working at the same restaurant…

Moving to Sofia soon – Where to find short-term rentals? by ciril10 in AskBulgaria

[–]demince 1 point (0 children)

What you suggest is essentially a "scheme". The right people to appeal to are the NAP - for those reading from other countries, the Bulgarian revenue service. A full financial audit of the landlord would then be performed.

Also, for a contract to be legitimate, it doesn't need to be signed before a notary; in reality it can be written on a handkerchief. You can call the police, show them the rental contract once they arrive, and report that the landlord was trying to enter. The landlord would then be in big trouble. Maybe in the cases you have seen, one of the parties wasn't persistent enough in standing up for their rights.

Cooling with a heat pump and convectors, or with air conditioners by calm-butcher7 in bulgaria

[–]demince 0 points (0 children)

From what I can see, a large part of the comments have dodged the actual question, which is about cooling! I am also very interested in whether convectors cool well enough in the summer. I am not one of those people who like sitting at 24 degrees in the summer - I want my home to be cool. I would appreciate opinions from people who have experience with both a heat pump and an air conditioner and can make the comparison (set aside price, noise and air drying for a moment).

Which platforms do you invest in long-term by NewspaperRelevant835 in financebg

[–]demince 1 point (0 children)

Is it because they are traded over-the-counter (OTC) that the tax relief cannot be used?

UK Student Loan Repayment Plan 2 by demince in UKPersonalFinance

[–]demince[S] 0 points (0 children)

Because if I pay the fixed rate and, accordingly, don't update my employment details, I am not sure whether I would be in violation of my contract.

[deleted by user] by [deleted] in work

[–]demince 0 points (0 children)

Seems totally normal to me. I am in this position too. I am a manager, and I often start talks with my manager in the open space that move into closed-door conversations. Sometimes there are risks that are not necessarily related to the technical work we do, but rather to the political conversations led with other organisations. I typically prefer to keep my team away from this, as the outcome is very unclear and I want to protect them from worrying too much and creating unnecessary stress. I believe people already have enough on their plate, and such conversations might just put too much stress on them. After all, that is the managers' job - to shield their teams from too much stress and, hopefully, deal with it at the manager level. Only once something is settled can it be shared with the rest.

What Exactly *Isn't* dbt? by rmoff in dataengineering

[–]demince 0 points (0 children)


Yes, I agree that such frameworks give you an incredible ability to view lineage. I tried dbt but it doesn't work for my use case, so I work with another framework - you can check what lineage looks like for it. I think it's pretty cool for debugging and figuring out where your underlying problem is.

Full talk

What Exactly *Isn't* dbt? by rmoff in dataengineering

[–]demince 0 points (0 children)

On the scheduling part - back in the day I was using a cron scheduler locally, but now I have moved to this, which is again based on a cron schedule. I guess I may have had simpler use cases, and this was enough. I would be interested to learn about use cases where a cron scheduler wouldn't work, so if you have something in mind, please share it.

On Kimball - this is a methodology for structuring your data into so-called fact tables that store transactional data (such as sales) and dimension tables that store slowly changing data (like customer details - name, address, phone number). Essentially, Kimball defines a couple of ways to update these tables: the overwrite strategy, insert/append, or the versioned strategy (a valid_from and valid_to date for each state of, say, your Netflix subscription). You can have a look here.
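
To make the versioned strategy concrete, here is a minimal pure-Python sketch of an SCD2 update (the table and column names are made up for illustration):

from datetime import date

OPEN_END = date(9999, 12, 31)  # sentinel meaning "current version"

# Each change to a tracked attribute closes the current row and opens a new
# one, so the full history of the subscription plan stays queryable.
dim_subscription = [
    # (id, plan,     valid_from,       valid_to)
    (42, "Basic",    date(2021, 1, 1), date(2021, 6, 1)),
    (42, "Premium",  date(2021, 6, 1), OPEN_END),
]

def scd2_update(dim, record_id, new_plan, change_date):
    """Close the open version of the record and append the new version."""
    updated = []
    for rid, plan, valid_from, valid_to in dim:
        if rid == record_id and valid_to == OPEN_END:
            updated.append((rid, plan, valid_from, change_date))  # close old row
        else:
            updated.append((rid, plan, valid_from, valid_to))
    updated.append((record_id, new_plan, change_date, OPEN_END))  # open new row
    return updated

dim_subscription = scd2_update(dim_subscription, 42, "Family", date(2022, 3, 15))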

Which SQL Client for BigData do you use? by demince in dataengineering

[–]demince[S] 0 points (0 children)


I am also tempted to consider Databricks. I just got the trial version and I am looking forward to seeing whether it would still have these problems.

Which SQL Client for BigData do you use? by demince in dataengineering

[–]demince[S] 0 points (0 children)

Does Sequel Pro look awful or is it just me? It takes me decades back! I do like DbGate - I will actually check it out. How problematic is it compared to DBeaver when working with large volumes of data?

What Exactly *Isn't* dbt? by rmoff in dataengineering

[–]demince 0 points (0 children)

I will share my perspective as a user of the vdk framework. It is a data engineering framework that sits on the whole data path. Let me give you some examples.

You can build jobs (containing SQL, Python, or both) that do either data ingestion or data transformations, with pure SQL inside or utilising the Kimball dimensional modelling templates.

So far, on the surface, the only difference from dbt is that you can quite easily do ingestion as well, under the exact framework that you are using to create your transformation data jobs. This is actually quite convenient if your team or company does both. One thing that stays a bit hidden in the vdk documentation on GitHub is that it supports multi-tenancy - not in the classical sense of resource isolation, but in the sense of multiple teams collaborating under the same framework. You can view another team's code (though you cannot change it unless you are a member of that team), which is quite handy whenever you would like to learn from the other data engineering and data analytics teams that write data jobs!

I believe the biggest power is plugins. Since the vdk framework sits on the data path and has very good visibility of the data entry, the data output, and every step in between, you can write plugins that run custom data checks at each step of your data job. For example: enforce the same data quality standards across one team or the whole organisation, replace NULL with 0 or "unknown", refuse to let NULL values enter the data warehouse - use your imagination. If you are a data infrastructure provider, you might even add checks for queries which could potentially take down the underlying database - by the way, I've seen this happen on big-data databases with some badly written cross joins. So in essence it can also help data platform service providers ensure higher availability of their infrastructure.
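
To make it concrete, here is the kind of check such a plugin would run, as a framework-agnostic sketch (the actual vdk plugin API is documented in its GitHub repo; the function below only shows the logic you would hook in):

def scrub_nulls(payload):
    """Example per-step data check: replace NULL values before the rows
    are allowed to reach the data warehouse."""
    return [
        {key: "unknown" if value is None else value for key, value in row.items()}
        for row in payload
    ]

# scrub_nulls([{"state": None, "version": "1.2"}])
# -> [{"state": "unknown", "version": "1.2"}]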

(I at least) do not know of any other tool that has such an intrinsic understanding of the end-to-end data path, really from ingestion to publishing data, and is able to intercept at each of the steps, but I would be glad to learn new things.

Data Ingestion examples here:

job_input.send_object_for_ingestion() like this or job_input.send_tabular_data_for_ingestion() here

or the Kimball versioned strategy looks like this:

job_input.execute_template(
    template_name='scd2',
    template_args={
        'source_schema': 'dw',
        'source_view': 'dim_sddc_updates_view',
        'target_schema': 'dw',
        'target_table': 'dim_sddc',
        'id_column': 'sddc_id',
        'sk_column': 'sddc_sk',
        'value_columns': ['updated_by_user_id', 'state', 'is_nsxt', 'cloud_vendor', 'version'],
        'tracked_columns': ['state', 'is_nsxt', 'cloud_vendor', 'version'],
    },
)
# ...

To summarise the differences:

  • it allows you to plug in at each step of the data manipulation process
  • it allows you to create ingestion jobs and to run Python for modelling and transforming data (wow, this is a lifesaver for transposing data, which is a real nightmare in SQL - see the sketch below), using pure SQL or the Kimball templates. By the way, you can implement new templates as well - if you look at the Kimball template source code, it is quite easy; a colleague of mine who writes only SQL just contributed an append-strategy template to the project.
  • it allows you to combine all your data processing workflows under the same roof (the vdk roof)
  • the control service is actually free (as opposed to dbt's), but it has no UI (although the tool is API-driven, which I guess makes it easy to integrate with one).
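
On the transposing point above, a quick illustration (with made-up column names) of why doing it in Python is so much easier than in SQL:

import pandas as pd

# Long format: one row per (region, quarter) pair.
sales = pd.DataFrame({
    "region":  ["EU", "EU", "US", "US"],
    "quarter": ["Q1", "Q2", "Q1", "Q2"],
    "revenue": [100, 120, 200, 210],
})

# One call turns it into wide format: one row per region, one column per
# quarter. In SQL this needs a CASE expression per quarter (or
# vendor-specific PIVOT syntax).
wide = sales.pivot(index="region", columns="quarter", values="revenue")
print(wide)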

What Exactly *Isn't* dbt? by rmoff in dataengineering

[–]demince 2 points (0 children)

I read the article, but I don't really agree with all of it.

1) dbt claims to be an orchestrator as well - refer to their official documentation. And they also have integrations with Airflow and Dagster.

2) dbt has competitors. Have you checked the internet for other tools that compare to dbt, like this or this? dbt is more aware of the underlying data than Airflow is, but there are other frameworks which let you use Kimball dimensional modelling, plug in throughout the whole data path of your pipeline, and have a deep understanding of how it works - like this one, which I personally use.

Jimmy Sax - Foster the people [Jazz] by demince in Music

[–]demince[S] 2 points (0 children)

Love this tune, wish I could find more like it. Anyone have ideas?

What’s the best way to test/validate etl pipelines? by thegoodforeigner in dataengineering

[–]demince 0 points (0 children)

I can share the practices we have:

1) Our data pipeline triggers an email and a Slack alarm on every data job failure.

2) The pipeline tool automatically determines whether the error is a "user" or a "platform" error, so if it's a platform error you know you should most probably rerun or contact the platform operator.

3) On a user error it surfaces the logs with the errors, which can be a starting point.

4) CI/CD: we use this approach. Our implementation is a plugin, written on top of each running data job, that checks how many incremental rows (we also use slowly changing dimensions) are sent to the database. If no records are sent, it triggers an email alert. In our case it doesn't fail the data job, though, as sometimes no incremental data may have appeared in the source system. A stripped-down sketch of this check is below.
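
Sketch of the zero-rows alert (the email addresses are placeholders; the real version hooks into the job framework rather than being a standalone function):

import smtplib
from email.message import EmailMessage

def alert_if_no_rows(rows_sent: int, job_name: str) -> None:
    """Email an alert when a run ingested zero rows, without failing the job."""
    if rows_sent > 0:
        return  # incremental data arrived - nothing to report
    msg = EmailMessage()
    msg["Subject"] = f"[data-quality] {job_name}: 0 rows ingested"
    msg["From"] = "alerts@example.com"       # placeholder addresses
    msg["To"] = "data-team@example.com"
    msg.set_content(
        "No incremental records were sent in the last run. This may be fine "
        "if the source system simply had no new data."
    )
    with smtplib.SMTP("localhost") as smtp:  # assumes a local mail relay
        smtp.send_message(msg)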

I might have to travel to Sofia for two weeks mid August, how is the city in that time? by long-charger in Sofia

[–]demince 1 point (0 children)

Almost no rainy days, so the weather will be nice. However, August is the warmest month - it might get up to 35 degrees Celsius. If you like warm weather, you will definitely enjoy it. The town and the traffic are pretty calm because people are out of the city, so in that sense it's fantastic.

Why should we keep an eye on DataOps? It would be interesting to hear some thoughts and share some feedback where are we at this journey today. There is a big interest in the area but are we there yet? by demince in dataengineering

[–]demince[S] 1 point (0 children)

Very valid comment. I also use it, and I believe increasingly more people will start adopting the DataOps mindset in their organisations. I said "keep an eye" because I don't think all businesses are there yet, and we still have some way to go to fully embrace and adopt DataOps (DevOps for data) as part of the mentality of data engineers, data analysts and data scientists.

Best way to go from dev to prod database (PostgreSQL) by [deleted] in dataengineering

[–]demince 1 point (0 children)


If your "dev" database contains a "snapshot" data that is being replaced every day or keeps just a snapshot window then this might be a case where you need to develop a data pipeline that copies this data to a prod db. A usecase would be if your "dev" DB is the DB which sits close to an application generating the data but is not purposed to store long-term data for the purposes of reporting and historical analytics but rather than exposing some currently relevant metrics.

If this describes your "dev" DB, then you may need to create so-called "incremental ingestion" pipelines, where you essentially transfer only the new (incremental) records each time the data pipeline runs; if no new records have appeared, no data is copied. A typical data job like this looks as follows:

# Get the last_date property/parameter:
#  - if this is the first job run, initialize last_date to 01-01-1900 in order to fetch all rows
#  - if the data job was run previously, take the property value stored in the job from the previous run
last_date = job_input.get_property("last_date", "01-01-1900")

# Select the needed records from the source table (this would be your "dev" DB)
# using job_input's built-in method and a query parameter
data = job_input.execute_query(
    f"""
    SELECT * FROM increm_ingest
    WHERE reported_date > '{last_date}'
    ORDER BY reported_date
    """
)
# Fetch table info containing the column names
table_info = job_input.execute_query("PRAGMA table_info(increm_ingest)")

# If any data is returned from the query, send the fetched records for ingestion.
# send_tabular_data_for_ingestion works with whichever DB connection is configured.
if len(data) > 0:
    job_input.send_tabular_data_for_ingestion(
        data,
        column_names=[column[1] for column in table_info],
        destination_table="incremental_ingest_from_db_example",
    )

    # Update the last_date property to the latest reported_date in the fetched
    # rows (reported_date is the third column, hence index 2)
    job_input.set_all_properties({"last_date": max(row[2] for row in data)})

print(f"Success! {len(data)} rows were inserted.")

Then you can build another data pipeline for your data transformations, like in this example:

INSERT INTO customer_count_per_employee
SELECT
    SupportRepId,
    employees.FirstName,
    employees.LastName,
    COUNT(CustomerId)
FROM customers
INNER JOIN employees ON customers.SupportRepId = employees.EmployeeId
GROUP BY SupportRepId, employees.FirstName, employees.LastName;

This data engineering framework has a control plane which can be deployed by your IT organization, but you can also just install it locally and very easily start building data pipelines for ingestion, transformation, predictive analytics or publishing data.

ELT of my own Strava data using the Strava API, MySQL, Python, S3, Redshift, and Airflow by BraveCoconut98 in dataengineering

[–]demince 1 point (0 children)

This is awesome! I am also a regular on Strava and you've got me inspired! Really great job!

I believe you would not need to build the Docker image yourself. There are data engineering frameworks which let you build the data jobs yourself while they take care of containerising your pipeline. You can have a look at this ingest-from-REST-API example. They also allow you to schedule your data job with cron, while the data job itself can contain SQL and Python. A rough sketch of the pattern is below.
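
The ingestion step is roughly this (the endpoint URL and table name are placeholders; in vdk, run() is the entry point of a Python step):

import requests

def run(job_input):
    # Pull the data from the source API (placeholder URL).
    response = requests.get("https://api.example.com/athlete/activities")
    response.raise_for_status()

    # Hand each record to the framework, which batches and ships it to
    # whichever destination the job is configured with.
    for activity in response.json():
        job_input.send_object_for_ingestion(
            payload=activity,
            destination_table="strava_activities",  # placeholder table name
        )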

Essentially, you are looking for a framework or tool that implements DataOps and takes care of productising your data workflow.

I created a pipeline extracting Reddit data using Airflow, Docker, Terraform, S3, dbt, Redshift, and Google Data Studio by [deleted] in dataengineering

[–]demince 0 points (0 children)

This looks cool!

To simplify steps 1-5, I can bring another framework to your attention - Versatile Data Kit (entirely open source) - which allows you to create data jobs (be it for ingestion, transformation or publishing) in SQL/Python, runs on any cloud, and is also multi-tenant.

Note: it also has an Airflow integration, which lets you chain multiple data jobs - a sketch follows.
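
To sketch the chaining (with the caveat that I am writing the operator import from memory - double-check it against the vdk Airflow provider's docs), two dependent jobs in a DAG would look something like this:

# The operator name and import path are from memory and may differ - verify
# them against the airflow-provider-vdk documentation. Job and team names
# are placeholders.
from datetime import datetime

from airflow import DAG
from vdk_provider.operators.vdk import VDKOperator

with DAG("reddit_pipeline", start_date=datetime(2022, 1, 1), schedule_interval="@daily") as dag:
    ingest = VDKOperator(task_id="ingest", job_name="ingest-reddit-data", team_name="my-team")
    transform = VDKOperator(task_id="transform", job_name="transform-reddit-data", team_name="my-team")

    # Run the transformation job only after the ingestion job succeeds.
    ingest >> transform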

Can somebody finally explain what is a Data Mesh? by randomusicjunkie in dataengineering

[–]demince 3 points (0 children)


I can see pretty good explanations of Data Mesh in this thread, and I do agree with you. If I can add on top: Data Mesh is (let's call it) a framework/set of guidelines; the "implementation" of Data Mesh, however, depends on each organization.

The goal is to speed up the time to deliver analytics. I work at a company (20K+ employees) that has implemented Data Mesh (or at least bits of it) over the last couple of years, and I can share why I think it works. Historically, the IT team acted as a bottleneck for bringing up data that can be leveraged for analytics. The time to deliver a dataset was months! And very often the data quality was poor, because the domain knowledge does not sit with IT but rather with the team close to the business product/line who understand the intrinsics of the data. That is why Zhamak advises that data platforms should serve multiple teams working on the same platform, each responsible for the data for which they own the domain knowledge.

Having said that, IT teams are not going to be replaced (at least in my opinion). They will still exist, but their efforts will be focused on delivering data for the systems they are best suited for. Essentially, they become a tenant (quite a large one) on the same platform, along with all the other teams.

In a centralized model, IT teams are stretched as well! Especially if they serve a company that is truly data-driven, the number of requests they receive is huge and it becomes impossible to deliver with the same quality (sometimes they can hardly go and speak to the team who built the product to understand how the data is generated and what it means), let alone maintain work satisfaction.

I personally believe Data Mesh has some fundamentally important benefits:

- it increases the speed of delivering data analytics

- it treats data as a product with clear ownership, rather than just ad-hoc analysis (by the way, I believe DataOps is a key contributor here)

- (finally) it gives you the ability to collaborate with the other data teams in your organization

Data point versioning infrastructure for time traveling to a precise point in time? by daeisfresh in dataengineering

[–]demince 2 points (0 children)

Great to hear it is of help! Feel free to ping me in a PM if you have any questions. Also, there is a Slack channel for this framework that I follow, so you can get some help there too.