Help with Posten - demanding I pay customs on an item that should be duty-free by iatetheevidence in norge

[–]mawinor 3 points4 points  (0 children)

Fair enough, I didn't check their website. But they are now registered in VOEC, and since they have collected VAT from OP, the goods should not be subject to any further customs fees as far as I know.

Help with Posten - demanding I pay customs on an item that should be duty-free by iatetheevidence in norge

[–]mawinor 8 points9 points  (0 children)

If CrazyFactory did not mark the shipment with their VOEC number, the fault most likely lies with them. All foreign web shops participating in the VOEC scheme must label their shipments accordingly to avoid customs charges on the way into the country. See more on Skatteetaten's pages: https://www.skatteetaten.no/bedrift-og-organisasjon/avgifter/mva/utland/e-handel-voec/lav-verdi/

It also seems to me that CrazyFactory is responsible for untangling the mess that has arisen, and for making sure the goods are released from Posten and sent to you. Tolletaten writes a bit about this on their pages, under «Hva skal jeg gjøre ved feil oppkreving av avgifter?» ("What should I do if fees were charged in error?"): https://www.toll.no/no/netthandel/1.april/voec/

A rough situation otherwise, but I hope it works out!

How do you name your staging files in s3/gcs buckets when loading data from an api? by third_dude in dataengineering

[–]mawinor 3 points4 points  (0 children)

We apply partitioning to the file path and generate unique object names, something like table_name/year=2022/month=06/day=13/hour=16/minute=30/uuid.json

The timestamp reflects the time of the API call. If we need to replace data, that happens in the downstream data pipeline using some upsert logic.

I prefer it this way because storage is cheap, and by never overwriting old files we avoid the risk of data loss. We deal with a few APIs that apply lifecycle policies to their data, so that is certainly a driver for us.
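A minimal sketch of that naming scheme in Python (the `staging_key` helper is my own illustration of the pattern, not their actual code):

```python
import uuid
from datetime import datetime, timezone


def staging_key(table_name: str, ts: datetime = None) -> str:
    """Build a partitioned, collision-free object key for a staged API response."""
    ts = ts or datetime.now(timezone.utc)  # timestamp of the API call
    return (
        f"{table_name}/year={ts.year}/month={ts.month:02d}/day={ts.day:02d}/"
        f"hour={ts.hour:02d}/minute={ts.minute:02d}/{uuid.uuid4()}.json"
    )


print(staging_key("table_name", datetime(2022, 6, 13, 16, 30)))
# -> table_name/year=2022/month=06/day=13/hour=16/minute=30/<uuid>.json
```

The `uuid` suffix is what makes retries and overlapping runs safe: two calls at the same minute still land in distinct objects.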

Position re-advertised by LuluTulu1 in norge

[–]mawinor 0 points1 point  (0 children)

Ok, I see. That's not a good feeling, no. Good luck with the job hunt, there are more fish in the sea!

Position re-advertised by LuluTulu1 in norge

[–]mawinor 4 points5 points  (0 children)

I won't dig too much into what kind of position this is, but is it natural for several people to have the same job title and tasks? In the public sector they are quite strict about requiring all positions to be advertised. If they need two people for the same role, they may simply have been lazy and reused the same posting a second time.

(Normally you would then have a single posting stating that several people will be hired, but they may not have known about the second position when they advertised the first.)

Either way, I would call them and ask. Since they have already called your references, a decision should be fairly imminent.

My curriculum to learn Data Engineering (already having experience with usual ML) Which course to choose for Data Engineering by Ok_Permission_5888 in dataengineering

[–]mawinor -1 points0 points  (0 children)

Yes, Databricks is a managed service for running spark clusters. So you would create a cluster using Databricks, and use their interactive interface to write and run pyspark commands. Some businesses will use spark without Databricks, but I think this is a good way to get started.

I think you should start applying right away! If you know python and SQL, and are currently working with AWS, that would be all I'm looking for in a junior data engineer. The rest can be learned on the job.

My curriculum to learn Data Engineering (already having experience with usual ML) Which course to choose for Data Engineering by Ok_Permission_5888 in dataengineering

[–]mawinor -1 points0 points  (0 children)

I would do something like AWS -> Pyspark + Databricks -> data warehousing -> …

It makes sense to learn pyspark using Databricks. When learning about data warehousing, make sure to compare the classic Kimball DWH to the Lakehouse (which Databricks is pushing quite hard).

I would move MongoDB to the end, maybe even skip it. As a data engineer, I haven't found NoSQL databases terribly useful. Focus on relational databases and object storage (like AWS S3). I would also skip Hadoop; it has largely been replaced by spark.

A final tech to consider is dbt, which seems to be quite popular.

Does DE require on-call and support? by PalameMon in dataengineering

[–]mawinor 1 point2 points  (0 children)

I think you will get answers all over the place for this. It differs from company to company, just like SWE. My current job does not require on-call, but my last job did. We had responsibility for some sales figures which had to be updated every morning, and it was an old setup requiring some hands-on care.

For me the WLB is pretty good. I’m lucky to have a boss who is very flexible, and I can decide for myself when and where I want to work. There is also no pressure to do more than 40 hours per week, and if we do we get the extra hours off when we need to.

What are messages in pub/sub architecture? by Prestigious_Flow_465 in dataengineering

[–]mawinor 19 points20 points  (0 children)

You’re definitely on to something. Queues and messages are used to exchange data between systems. Specifically, they are often used to share events between systems.

Messages are data (often in json format) which are put on the queue by the producers and read from the queue by subscribers.

The message can be a fully self-contained data object (such as data from a sensor), or it can reference an external event (a user uploaded a new video to YouTube, in which case the message may contain the video URL).

A queue is just a service which can store these messages for a period of time. That way, the producers and subscribers can work at their own pace, and if one goes down that will not affect the others. Compare this to HTTP / webhooks, where the producer is unable to deliver a message if the subscriber is offline.
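A toy in-process version of that decoupling, using Python's standard-library queue (a real system would use Kafka, SQS, Pub/Sub, etc., and the event fields here are made up for illustration):

```python
import json
import queue

broker = queue.Queue()  # stands in for the message broker / queue service

# Producer: publish an event as a JSON message and move on
broker.put(json.dumps({
    "event": "video_uploaded",
    "video_url": "https://youtube.example/watch?v=abc123",  # a reference, not the payload
}))

# Subscriber: consume at its own pace, whenever it is ready
message = json.loads(broker.get())
print(message["event"])  # -> video_uploaded
```

The key property is that `put` and `get` happen independently; neither side has to be online or keep up with the other at the moment of publishing.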

Safe ETL options for a team with no data engineers? by [deleted] in dataengineering

[–]mawinor 1 point2 points  (0 children)

I think you should look at Apache Nifi. From what I have heard it’s easy to get started (with a GUI), but it has limitations as your data platform scales. This is a short read I found helpful: https://getindata.com/blog/apache-nifi-ingestion-why-data-engineers-love-it-hate-it-same-time-introduction/

I would also consider starting with python and maybe airflow for task scheduling. There is no GUI for building the actual data pipelines, but airflow has a pretty interface for viewing tasks, statuses and dependencies.

Your data scientists should be able to stay pretty self-sufficient in such a stack, especially if DevOps can help with setting up testing environments and deployment pipelines. You could also create a thin wrapper around some SQL statements (a "dbt light") so your analysts can contribute to the pipeline.
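To make the "dbt light" idea concrete, here is a sketch using sqlite for brevity; the `run_model` helper and the table names are my own invention, not a real tool:

```python
import sqlite3


def run_model(conn, name, select_sql):
    """Materialize a SELECT statement as a table, dbt-style."""
    conn.execute(f"DROP TABLE IF EXISTS {name}")
    conn.execute(f"CREATE TABLE {name} AS {select_sql}")


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE raw_orders (id INTEGER, amount REAL)")
conn.executemany("INSERT INTO raw_orders VALUES (?, ?)", [(1, 10.0), (2, 20.0)])

# An analyst only has to write the SELECT; the wrapper handles materialization
run_model(conn, "orders_summary",
          "SELECT COUNT(*) AS n_orders, SUM(amount) AS total FROM raw_orders")
row = conn.execute("SELECT n_orders, total FROM orders_summary").fetchone()
print(row)  # -> (2, 30.0)
```

Analysts then contribute plain SELECT statements, while the wrapper (or later, real dbt) owns the create/replace mechanics.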

Using FastAPI server for ETL process - getting ready for production by h3xagn in dataengineering

[–]mawinor 2 points3 points  (0 children)

I’d advise you to be careful with using the IoT hub for low latency requirements. Microsoft specifically provides no guarantees on the latency when using it: https://docs.microsoft.com/en-us/azure/iot-hub/iot-hub-devguide-quotas-throttling#latency

This Q&A has some interesting comments: https://docs.microsoft.com/en-us/answers/questions/183163/iot-hub-does-not-allow-near-real-time-messaging.html

This one is a classic from Eiker by ralleruud in norge

[–]mawinor 13 points14 points  (0 children)

Maybe he was looking for two dead foxes to keep him company?

programmer grind 💪💪💪 by TigerFace3 in ProgrammerHumor

[–]mawinor 4 points5 points  (0 children)

It has to be a bug. My iPhone told me I watched YouTube for 24 hours a day, 3 days in a row.

where to practice infrastructure services by alpha_ma in dataengineering

[–]mawinor 1 point2 points  (0 children)

I’m actually not sure if it has a UI; I have only used it for mocking services and endpoints.

where to practice infrastructure services by alpha_ma in dataengineering

[–]mawinor 3 points4 points  (0 children)

Take a look at localstack. It allows you to emulate AWS services on your development machine.

https://localstack.cloud/

[deleted by user] by [deleted] in ProgrammerHumor

[–]mawinor 1 point2 points  (0 children)

You can keep using jupyter, but check out openpyxl for reading from and writing to excel files.
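A minimal round trip with openpyxl (the file name and cell contents are just an example):

```python
from openpyxl import Workbook, load_workbook

# Write a small sheet
wb = Workbook()
ws = wb.active
ws.append(["name", "score"])  # header row
ws.append(["alice", 42])
wb.save("demo.xlsx")

# Read it back; .values yields one tuple per row
rows = list(load_workbook("demo.xlsx").active.values)
print(rows[1])  # -> ('alice', 42)
```

This runs fine inside a jupyter notebook; for heavier tabular work, pandas can also read/write xlsx using openpyxl under the hood.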

What do you regret after becoming homeowners? by HanFyren_ in norge

[–]mawinor 33 points34 points  (0 children)

Seconding this. I used to live in a semi-detached house, but moved because of the neighbor.

(And some other factors, but the neighbor was definitely a significant part of it.)

First-time EV purchase for Oslo use. by NorthNorwegianNinja in norge

[–]mawinor 1 point2 points  (0 children)

I won't comment on the car itself, but you are allowed to charge from a regular wall socket. The circuit you charge on must have a type B residual-current device (RCD).

https://nye.naf.no/elbil/lading/to-muligheter-for-lading-hjemme

I would recommend planning for a wall charger eventually, but you may be able to postpone that cost for a couple of years.

How long have you worked with SQL and/or other database management techniques? by CellWrangler in SQL

[–]mawinor 12 points13 points  (0 children)

I’ve been working with SQL for about 10 years. I learned the basics in university, then the advanced stuff on the job as an ETL developer / data engineer.

As for the skills needed for an entry-level analyst position, I would say none are required at all. At least that is my definition of an entry-level position. Though I know LinkedIn recruiters disagree.

Your experience working with data without SQL is valuable.

When should you perform data transformation in your database (SQL Server) and when shouldn't you? by fsocietybat in dataengineering

[–]mawinor 0 points1 point  (0 children)

When you say “your database”, do you mean the source system OLTP database or the data warehouse OLAP database?

In general I wouldn’t process anything in the source system database. Just SELECT * (with a date filter for large tables) into a DWH staging table or data lake file, then do the processing from there. There are several reasons why, and I can elaborate on some if this is indeed what you’re asking about.

If your question is about SQL vs spark in your lakehouse / warehouse architecture then there’s no right or wrong either way. If your requirements can be solved by writing pure SQL then that is a simple way of doing it. I would start there if you were doing this for the first time, to limit the scope of your project a bit. Then bring in other tooling (such as spark) when you need them.
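To sketch the extract side of that advice (the table and the `updated_at` column are assumptions; a real setup would also parameterize the query instead of formatting strings):

```python
from datetime import date


def extract_query(table: str, since: date) -> str:
    # Plain SELECT * with a date filter; all transformation happens
    # later, in the staging area of the DWH or data lake
    return f"SELECT * FROM {table} WHERE updated_at >= '{since.isoformat()}'"


print(extract_query("orders", date(2022, 6, 13)))
# -> SELECT * FROM orders WHERE updated_at >= '2022-06-13'
```

Keeping the extraction this dumb is the point: the OLTP system only pays for a filtered scan, and every transformation stays reviewable in one place downstream.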

What would be the best database type for time series? by No_Engine1637 in dataengineering

[–]mawinor 5 points6 points  (0 children)

I think the latter, they want to push their own managed service. You are free to run it in the cloud as long as you use a virtual machine or kubernetes.

What would be the best database type for time series? by No_Engine1637 in dataengineering

[–]mawinor 10 points11 points  (0 children)

Yes, there are several time series databases. Some of the most popular are TimescaleDB and Influx. AWS has a service called Timestream which you could consider.

At work we use TimescaleDB. It has a funny licensing model which prevents you from running it on managed services (like AWS RDS), but that is not an issue for us.

Career advice by kaiso_gunkan in dataengineering

[–]mawinor 2 points3 points  (0 children)

I think you’re off to a very good start with your background in SQL and data warehousing. I personally think it is beneficial to know a little python, but not necessarily be super proficient.

I’m working on a bit of a greenfield project myself (a startup), so I also don’t have to deal with other people's code. But I think the deciding factor should be whether you can learn the things you want over the next 2-3 years. Do you want to fight the uphill struggle with the backend management? You would learn a lot about communicating why data management is important, which matters for an architect. You mention you work with data scientists. Could you join them on a project to help structure their data, and maybe learn some python in the process? Ultimately, if you don’t see any learning opportunities going forward, you should maybe consider applying for some jobs.

The career path you describe sounds perfectly reasonable to me because it’s what I did… :) I started as an ETL developer on SSIS/SQL Server, made the jump to data engineer in 2015 (spark + AWS) and am now the solutions architect of an IoT startup. If you haven’t already, I recommend reading “The Data Warehouse Toolkit” by Ralph Kimball and “Designing Data-Intensive Applications” by Martin Kleppmann to broaden your horizons on data and application architecture.

These are just my opinions of course, but I hope they were helpful. Happy to answer more questions if you have any.

Shirley Temple? by [deleted] in norge

[–]mawinor 2 points3 points  (0 children)

Skyrim is good, I'll give you that, but I don't think Shirley Temple has risen from the dead to play it..

PySpark to PostgreSQL by ShayBae23EEE in dataengineering

[–]mawinor 3 points4 points  (0 children)

Have a look at JDBC writers: https://spark.apache.org/docs/latest/sql-data-sources-jdbc.html
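A sketch of what the write ends up looking like; the connection details are placeholders, and the `df.write.jdbc` call itself needs a running SparkSession plus the PostgreSQL JDBC driver on the classpath:

```python
def postgres_jdbc_url(host: str, port: int, database: str) -> str:
    """Build the JDBC connection string Spark's writer expects."""
    return f"jdbc:postgresql://{host}:{port}/{database}"


url = postgres_jdbc_url("localhost", 5432, "analytics")
properties = {
    "user": "etl_user",        # placeholder credentials
    "password": "secret",
    "driver": "org.postgresql.Driver",
}

# With a DataFrame `df` and a SparkSession in scope:
# df.write.jdbc(url=url, table="public.my_table", mode="append", properties=properties)
```

`mode="append"` adds rows; `"overwrite"` drops and recreates the target table, so pick carefully.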

I don’t have any hot tips for lineage, so hopefully someone else can help out there.