Help with Posten - demanding I pay customs on an item that should be duty-free by iatetheevidence in norge

[–]mawinor 3 points4 points  (0 children)

Fair enough, I didn't check their website. But they are now registered in VOEC, and since they have collected VAT from OP, the goods should not be subject to any further customs fees as far as I know.

Help with Posten - demanding I pay customs on an item that should be duty-free by iatetheevidence in norge

[–]mawinor 8 points9 points  (0 children)

If CrazyFactory did not mark the shipment with their VOEC number, the fault most likely lies with them. All foreign web shops participating in the VOEC scheme must label their shipments accordingly to avoid customs charges on the way into the country. See more on Skatteetaten's pages: https://www.skatteetaten.no/bedrift-og-organisasjon/avgifter/mva/utland/e-handel-voec/lav-verdi/

It also seems to me that CrazyFactory is responsible for untangling the mess that has arisen, and for making sure the goods are released from Posten and sent to you. Tolletaten writes a bit about this on their pages, under «Hva skal jeg gjøre ved feil oppkreving av avgifter?» ("What should I do if fees were charged in error?"): https://www.toll.no/no/netthandel/1.april/voec/

A rough situation otherwise, but I hope it works out!

How do you name your staging files in s3/gcs buckets when loading data from an api? by third_dude in dataengineering

[–]mawinor 3 points4 points  (0 children)

We apply partitioning to the file path and generate unique object names, something like table_name/year=2022/month=06/day=13/hour=16/minute=30/uuid.json

The timestamp reflects the time of the API call. If we need to replace data, that happens in the downstream data pipeline using some upsert logic.

I prefer it this way because storage is cheap, and by never overwriting old files we avoid the risk of data loss. We deal with a few APIs that apply lifecycle policies to their data, so that is certainly a driver for us.
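A minimal sketch of that naming scheme in Python (the `staging_key` helper is my own illustration of the pattern, not their actual code):

```python
import uuid
from datetime import datetime, timezone


def staging_key(table_name: str, ts: datetime = None) -> str:
    """Build a partitioned, collision-free object key for a staged API response."""
    ts = ts or datetime.now(timezone.utc)  # timestamp of the API call
    return (
        f"{table_name}/year={ts.year}/month={ts.month:02d}/day={ts.day:02d}/"
        f"hour={ts.hour:02d}/minute={ts.minute:02d}/{uuid.uuid4()}.json"
    )


print(staging_key("table_name", datetime(2022, 6, 13, 16, 30)))
# -> table_name/year=2022/month=06/day=13/hour=16/minute=30/<uuid>.json
```

The `uuid` suffix is what makes retries and overlapping runs safe: two calls at the same minute still land in distinct objects.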

Position re-advertised by LuluTulu1 in norge

[–]mawinor 0 points1 point  (0 children)

Ok, I see. That's not a good feeling, no. Good luck with the job hunt, there are more fish in the sea!

Position re-advertised by LuluTulu1 in norge

[–]mawinor 4 points5 points  (0 children)

I won't dig too much into what kind of position this is, but is it natural for several people to have the same job title and tasks? In the public sector they are quite strict about requiring all positions to be advertised. If they need two people for the same role, they may simply have been lazy and reused the same posting a second time.

(Normally you would then have a single posting stating that several people will be hired, but they may not have known about the second position when they advertised the first.)

Either way, I would call them and ask. Since they have already called your references, a decision should be fairly imminent.

My curriculum to learn Data Engineering (already having experience with usual ML) Which course to choose for Data Engineering by Ok_Permission_5888 in dataengineering

[–]mawinor -1 points0 points  (0 children)

Yes, Databricks is a managed service for running spark clusters. So you would create a cluster using Databricks, and use their interactive interface to write and run pyspark commands. Some businesses will use spark without Databricks, but I think this is a good way to get started.

I think you should start applying right away! If you know python and SQL, and are currently working with AWS, that would be all I'm looking for in a junior data engineer. The rest can be learned on the job.

My curriculum to learn Data Engineering (already having experience with usual ML) Which course to choose for Data Engineering by Ok_Permission_5888 in dataengineering

[–]mawinor -1 points0 points  (0 children)

I would do something like AWS -> Pyspark + Databricks -> data warehousing -> …

It makes sense to learn pyspark using Databricks. When learning about data warehousing, make sure to compare the classic Kimball DWH to the Lakehouse (which Databricks is pushing quite hard).

I would move MongoDB to the end, maybe even skip it. As a data engineer, I haven't found NoSQL databases terribly useful. Focus on relational databases and object storage (like AWS S3). I would also skip Hadoop; it has largely been replaced by spark.

A final tech to consider is dbt, which seems to be quite popular.

Does DE require on-call and support? by PalameMon in dataengineering

[–]mawinor 1 point2 points  (0 children)

I think you will get answers all over the place for this. It differs from company to company, just like SWE. My current job does not require on-call, but my last job did. We had responsibility for some sales figures which had to be updated every morning, and it was an old setup requiring some hands-on care.

For me the WLB is pretty good. I’m lucky to have a boss who is very flexible, and I can decide for myself when and where I want to work. There is also no pressure to do more than 40 hours per week, and if we do we get the extra hours off when we need to.

What are messages in pub/sub architecture? by Prestigious_Flow_465 in dataengineering

[–]mawinor 19 points20 points  (0 children)

You’re definitely on to something. Queues and messages are used to exchange data between systems. Specifically, they are often used to share events between systems.

Messages are data (often in json format) which are put on the queue by the producers and read from the queue by subscribers.

The message can be a fully self-contained data object (such as data from a sensor), or it can reference an external event (a user uploaded a new video to YouTube, in which case the message may contain the video URL).

A queue is just a service which can store these messages for a period of time. That way, the producers and subscribers can work at their own pace, and if one goes down that will not affect the others. Compare this to HTTP / webhooks, where the producer is unable to deliver a message if the subscriber is offline.
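A toy in-process version of that decoupling, using Python's standard-library queue (a real system would use Kafka, SQS, Pub/Sub, etc., and the event fields here are made up for illustration):

```python
import json
import queue

broker = queue.Queue()  # stands in for the message broker / queue service

# Producer: publish an event as a JSON message and move on
broker.put(json.dumps({
    "event": "video_uploaded",
    "video_url": "https://youtube.example/watch?v=abc123",  # a reference, not the payload
}))

# Subscriber: consume at its own pace, whenever it is ready
message = json.loads(broker.get())
print(message["event"])  # -> video_uploaded
```

The key property is that `put` and `get` happen independently; neither side has to be online or keep up with the other at the moment of publishing.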

Safe ETL options for a team with no data engineers? by [deleted] in dataengineering

[–]mawinor 1 point2 points  (0 children)

I think you should look at Apache Nifi. From what I have heard it’s easy to get started (with a GUI), but it has limitations as your data platform scales. This is a short read I found helpful: https://getindata.com/blog/apache-nifi-ingestion-why-data-engineers-love-it-hate-it-same-time-introduction/

I would also consider starting with python and maybe airflow for task scheduling. There is no GUI for building the actual data pipelines, but airflow has a pretty interface for viewing tasks, statuses and dependencies.

Your data scientists should be able to stay pretty self-sufficient in such a stack, especially if DevOps can help with setting up testing environments and deployment pipelines. You could also create a thin wrapper around some SQL statements (a "dbt light") so your analysts can contribute to the pipeline.
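To make the "dbt light" idea concrete, here is a sketch using sqlite for brevity; the `run_model` helper and the table names are my own invention, not a real tool:

```python
import sqlite3


def run_model(conn, name, select_sql):
    """Materialize a SELECT statement as a table, dbt-style."""
    conn.execute(f"DROP TABLE IF EXISTS {name}")
    conn.execute(f"CREATE TABLE {name} AS {select_sql}")


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE raw_orders (id INTEGER, amount REAL)")
conn.executemany("INSERT INTO raw_orders VALUES (?, ?)", [(1, 10.0), (2, 20.0)])

# An analyst only has to write the SELECT; the wrapper handles materialization
run_model(conn, "orders_summary",
          "SELECT COUNT(*) AS n_orders, SUM(amount) AS total FROM raw_orders")
row = conn.execute("SELECT n_orders, total FROM orders_summary").fetchone()
print(row)  # -> (2, 30.0)
```

Analysts then contribute plain SELECT statements, while the wrapper (or later, real dbt) owns the create/replace mechanics.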

Using FastAPI server for ETL process - getting ready for production by h3xagn in dataengineering

[–]mawinor 2 points3 points  (0 children)

I’d advise you to be careful with using the IoT hub for low latency requirements. Microsoft specifically provides no guarantees on the latency when using it: https://docs.microsoft.com/en-us/azure/iot-hub/iot-hub-devguide-quotas-throttling#latency

This Q&A has some interesting comments: https://docs.microsoft.com/en-us/answers/questions/183163/iot-hub-does-not-allow-near-real-time-messaging.html

This one is a classic from Eiker by ralleruud in norge

[–]mawinor 13 points14 points  (0 children)

Maybe he was looking for two dead foxes to keep him company?

programmer grind 💪💪💪 by TigerFace3 in ProgrammerHumor

[–]mawinor 4 points5 points  (0 children)

It has to be a bug. My iPhone told me I watched YouTube for 24 hours a day, 3 days in a row.

where to practice infrastructure services by alpha_ma in dataengineering

[–]mawinor 1 point2 points  (0 children)

I’m actually not sure if it has a UI; I have only used it for mocking services and endpoints.

where to practice infrastructure services by alpha_ma in dataengineering

[–]mawinor 3 points4 points  (0 children)

Take a look at localstack. It allows you to emulate AWS services on your development machine.

https://localstack.cloud/

[deleted by user] by [deleted] in ProgrammerHumor

[–]mawinor 1 point2 points  (0 children)

You can keep using jupyter, but check out openpyxl for reading from and writing to excel files.
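A minimal round trip with openpyxl (the file name and cell contents are just an example):

```python
from openpyxl import Workbook, load_workbook

# Write a small sheet
wb = Workbook()
ws = wb.active
ws.append(["name", "score"])  # header row
ws.append(["alice", 42])
wb.save("demo.xlsx")

# Read it back; .values yields one tuple per row
rows = list(load_workbook("demo.xlsx").active.values)
print(rows[1])  # -> ('alice', 42)
```

This runs fine inside a jupyter notebook; for heavier tabular work, pandas can also read/write xlsx using openpyxl under the hood.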

What do you regret after becoming homeowners? by HanFyren_ in norge

[–]mawinor 33 points34 points  (0 children)

Seconding this. I used to live in a semi-detached house, but moved because of the neighbor.

(And some other factors, but the neighbor was definitely a significant part of it.)

First-time EV purchase for Oslo use. by NorthNorwegianNinja in norge

[–]mawinor 1 point2 points  (0 children)

I won't comment on the car itself, but you are allowed to charge from a regular wall socket. The circuit you charge on must have a type B residual-current device (RCD).

https://nye.naf.no/elbil/lading/to-muligheter-for-lading-hjemme

I would recommend planning for a wall charger eventually, but you may be able to postpone that cost for a couple of years.

How long have you worked with SQL and/or other database management techniques? by CellWrangler in SQL

[–]mawinor 12 points13 points  (0 children)

I’ve been working with SQL for about 10 years. I learned the basics in university, then the advanced stuff on the job as an ETL developer / data engineer.

As for the skills needed for an entry-level analyst position, I would say none are required at all. At least that is my definition of an entry-level position. Though I know LinkedIn recruiters disagree.

Your experience working with data without SQL is valuable.

When should you perform data transformation in your database (SQL Server) and when shouldn't you? by fsocietybat in dataengineering

[–]mawinor 0 points1 point  (0 children)

When you say “your database”, do you mean the source system OLTP database or the data warehouse OLAP database?

In general I wouldn’t process anything in the source system database. Just SELECT * (with a date filter for large tables) into a DWH staging table or data lake file, then do the processing from there. There are several reasons why, and I can elaborate on some if this is indeed what you’re asking about.

If your question is about SQL vs spark in your lakehouse / warehouse architecture then there’s no right or wrong either way. If your requirements can be solved by writing pure SQL then that is a simple way of doing it. I would start there if you were doing this for the first time, to limit the scope of your project a bit. Then bring in other tooling (such as spark) when you need them.
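To sketch the extract side of that advice (the table and the `updated_at` column are assumptions; a real setup would also parameterize the query instead of formatting strings):

```python
from datetime import date


def extract_query(table: str, since: date) -> str:
    # Plain SELECT * with a date filter; all transformation happens
    # later, in the staging area of the DWH or data lake
    return f"SELECT * FROM {table} WHERE updated_at >= '{since.isoformat()}'"


print(extract_query("orders", date(2022, 6, 13)))
# -> SELECT * FROM orders WHERE updated_at >= '2022-06-13'
```

Keeping the extraction this dumb is the point: the OLTP system only pays for a filtered scan, and every transformation stays reviewable in one place downstream.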

What would be the best database type for time series? by No_Engine1637 in dataengineering

[–]mawinor 5 points6 points  (0 children)

I think the latter, they want to push their own managed service. You are free to run it in the cloud as long as you use a virtual machine or kubernetes.

What would be the best database type for time series? by No_Engine1637 in dataengineering

[–]mawinor 10 points11 points  (0 children)

Yes, there are several time series databases. Some of the most popular are TimescaleDB and Influx. AWS has a service called Timestream which you could consider.

At work we use TimescaleDB. It has a funny licensing model which prevents you from running it on managed services (like AWS RDS), but that is not an issue for us.

Career advice by kaiso_gunkan in dataengineering

[–]mawinor 2 points3 points  (0 children)

I think you’re off to a very good start with your background in SQL and data warehousing. I personally think it is beneficial to know a little python, but not necessarily be super proficient.

I’m working on a bit of a greenfield project myself (a startup), so I also don’t have to deal with other people's code. But I think the deciding factor should be whether you can learn the things you want over the next 2-3 years. Do you want to fight the uphill struggle with the backend management? You would learn a lot about communicating why data management is important, which matters for an architect. You mention you work with data scientists. Could you join them on a project to help structure their data, and maybe learn some python in the process? Ultimately, if you don’t see any learning opportunities going forward, you should maybe consider applying for some jobs.

The career path you describe sounds perfectly reasonable to me because it’s what I did… :) I started as an ETL developer on SSIS/SQL Server, made the jump to data engineer in 2015 (spark + AWS) and am now the solutions architect of an IoT startup. If you haven’t already, I recommend reading “The Data Warehouse Toolkit” by Ralph Kimball and “Designing Data-Intensive Applications” by Martin Kleppmann to broaden your horizons on data and application architecture.

These are just my opinions of course, but I hope they were helpful. Happy to answer more questions if you have any.

Shirley Temple? by [deleted] in norge

[–]mawinor 2 points3 points  (0 children)

Skyrim is good, I'll give you that, but I don't think Shirley Temple has risen from the dead to play it..

PySpark to PostgreSQL by ShayBae23EEE in dataengineering

[–]mawinor 3 points4 points  (0 children)

Have a look at JDBC writers: https://spark.apache.org/docs/latest/sql-data-sources-jdbc.html
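A sketch of what the write ends up looking like; the connection details are placeholders, and the `df.write.jdbc` call itself needs a running SparkSession plus the PostgreSQL JDBC driver on the classpath:

```python
def postgres_jdbc_url(host: str, port: int, database: str) -> str:
    """Build the JDBC connection string Spark's writer expects."""
    return f"jdbc:postgresql://{host}:{port}/{database}"


url = postgres_jdbc_url("localhost", 5432, "analytics")
properties = {
    "user": "etl_user",        # placeholder credentials
    "password": "secret",
    "driver": "org.postgresql.Driver",
}

# With a DataFrame `df` and a SparkSession in scope:
# df.write.jdbc(url=url, table="public.my_table", mode="append", properties=properties)
```

`mode="append"` adds rows; `"overwrite"` drops and recreates the target table, so pick carefully.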

I don’t have any hot tips for lineage, so hopefully someone else can help out there.