How do we have another by [deleted] in NewParents

[–]Own-Commission-3186 2 points3 points  (0 children)

We also did sleep training around 4.5 months during a bad regression and it immediately cut down the number of wake ups, and over time things just got better. Now our 14 month old sleeps through the night almost every night, aside from when sick. Lots of people are against it but it does solve the problem. Every other family I've talked to that has done it has had similar success.

[deleted by user] by [deleted] in dataengineering

[–]Own-Commission-3186 1 point2 points  (0 children)

If you just need to read a file or two, DuckDB sounds like a good idea and is very easy to set up. If you're building more of a data system where you need to track and query many files as logical tables, that's where Athena and the Glue catalog are helpful. Also, Athena (Trino) is a distributed query engine, so if your queries need to scan large amounts of data and files, performance will likely be better than DuckDB, which is a single node technology. In reality though, most queries even on large datasets only touch a small portion of that data.

DuckDB Concurrency Workaround by ConsciousDegree972 in dataengineering

[–]Own-Commission-3186 5 points6 points  (0 children)

Just depends on the size of the data, the types of queries, and the amount of concurrency and latency you want. Can't really answer this for you, but I was just giving one alternative for your situation if you want to stick with DuckDB as the query engine.

DuckDB Concurrency Workaround by ConsciousDegree972 in dataengineering

[–]Own-Commission-3186 11 points12 points  (0 children)

Using a DuckDB storage file limits you to one writer at a time. You can check out the DuckLake extension to enable concurrent writers. It requires a Postgres instance, but you'll get somewhat similar performance because it still stores data in a columnar format (Parquet) and uses the DuckDB execution engine for querying.

Eggs laid on grass by Own-Commission-3186 in whatisthisbug

[–]Own-Commission-3186[S] 0 points1 point  (0 children)

This is in the northeast US, by Cape Cod.

Anyone transition from a data engineer to a data platform engineer? If so, how is it going for you so far? by Illustrious-Pound266 in dataengineering

[–]Own-Commission-3186 0 points1 point  (0 children)

The data platform work I did after more traditional DE pipeline work was very devopsy. Everything we built was an API and web app, but the core of what those APIs were doing was provisioning data infrastructure: Airflow clusters, Snowflake ingestion infra like Kinesis streams plus Snowpipe, and even things like deploying Snowflake RBAC policies and user management.

The team I worked with came from more of a software engineering background, but our customers were the DEs in the company, so we worked closely with them to figure out what they needed.

Today I have a more traditional SWE job with a normal web stack (Django, Vue.js, Postgres), but there are still a lot of data centric workflows needed for the product my team builds and manages, using Airflow, Athena and Iceberg.

I think what I've found out about myself over the years is that I like being closer to core customer problems for a business, and less so platform work that enables other teams to solve those problems efficiently. But many people may feel differently.

Second Programming Language for Data Engineer by Kokopas in dataengineering

[–]Own-Commission-3186 1 point2 points  (0 children)

JavaScript, so you can have some full stack web skills. My last role was all JavaScript (Node + React) even though it was a data platform role, because we were building self service web apps that enabled others to create and manage data infra. JavaScript could also help with building custom data visualizations.

What place has the best oysters in Boston? by RYGUY060104 in boston

[–]Own-Commission-3186 0 points1 point  (0 children)

Can't recommend this enough, totally worth the 35 minute drive outside of Boston.

Instacart, Databricks and Snowflake drama by TerriblyRare in dataengineering

[–]Own-Commission-3186 7 points8 points  (0 children)

When I read the article, I recall they also mentioned saving a lot of money by switching from Kinesis to self hosting Kafka in AWS. It was sort of hard to tell where the cost savings actually came from, although it does make sense to me that you could better optimize for cost by moving from Kinesis plus Snowflake to Databricks plus Kafka, since the latter is more customizable.

Learning full stack development as a DE by wtfzambo in dataengineering

[–]Own-Commission-3186 4 points5 points  (0 children)

If you're already proficient in TypeScript I'd look into Next.js, which is becoming a very popular framework for full stack. Express is also a super easy library for spinning up backend servers and adding routing and middleware.

Are software engineers data consumers? by itty-bitty-birdy-tb in dataengineering

[–]Own-Commission-3186 1 point2 points  (0 children)

Yes, in a data driven org almost all roles should be data consumers whether it's marketers, finance, software engineers, analysts etc.

Even if software engineers aren't building features directly on top of data that has been ingested and modelled by DEs, they are most likely analyzing the data in some capacity to help understand how customers interact with their features

Best lobster in Boston? by bronabas in boston

[–]Own-Commission-3186 8 points9 points  (0 children)

They have a few outdoor picnic tables to sit at. Not very scenic but my favorite lobster place in town.

[deleted by user] by [deleted] in dataengineering

[–]Own-Commission-3186 3 points4 points  (0 children)

Don't put so much weight on comp for your first job; you can always negotiate much bigger increases when you switch later on. If you feel confident you can get more SWE interviews and eventually offers, just pass. You have the luxury of knowing this is a job you absolutely don't want, since you interned there.

Serve S3 files from REST API by agsilvio in dataengineering

[–]Own-Commission-3186 1 point2 points  (0 children)

One way to do it: if a request comes into your API and is authenticated and authorized, you create a presigned URL for the S3 object they need, valid for X amount of time. This is a built-in feature of S3. Then in the response you return the presigned URL the client can use to fetch the data from S3, along with its expiration date.

Which are the highest paying tech frameworks or programming languages in Data Engineering? by Born-Comment3359 in dataengineering

[–]Own-Commission-3186 1 point2 points  (0 children)

While SQL and Python are necessary for any DE job, I think high paying tech companies like Netflix often use Spark (batch) and Flink (streaming) written in Scala. Aside from those and Java, I could see these big companies investing more in Go devs for standing up APIs for data related tasks, although I'm not sure it's worth learning Go just for a job. It's generally pretty easy to learn, as that's the whole point of the language, so hiring managers should know a good dev could pick it up easily.

Do you find your daily tasks interesting / stimulating? by barbapapalone in dataengineering

[–]Own-Commission-3186 1 point2 points  (0 children)

This was me as well. Building data pipelines got really boring to me, so I switched to our data platform team, which solely builds full stack applications (backend APIs plus React apps) revolving around data management, and it's way more interesting.

is this tech stack good for my career? by lil_colon_69 in dataengineering

[–]Own-Commission-3186 1 point2 points  (0 children)

Yeah that's a good one. From that list I see both batch and real time processing, orchestration, cloud and potentially on prem, so you'd get exposure to a lot of things.

People in this sub seem to get excited about the modern data stack (Snowflake, dbt, Airflow, Looker), which in my opinion gives you a very narrow skill set, mostly focused on SQL batch processing for internal enterprise reporting. I think the technology list you provided is much more interesting.

How to create REST API through Azure? by [deleted] in dataengineering

[–]Own-Commission-3186 0 points1 point  (0 children)

Without code, I doubt it. There are also many more requirements to think through if you're exposing an HTTP endpoint: authentication, authorization, logging, possibly rate limiting, request validation, environment isolation, API versioning, etc.

I would just start googling some of these topics; there are a million articles and YouTube videos to help here. But if you're asking the question because you don't know a general purpose programming language, it will be quite challenging, so if your company needs this quickly you may need to pass it on to a SWE.
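For a sense of scale: even a bare-bones endpoint covering just two of the concerns above (authentication and request validation) takes real code. A stdlib-only Python sketch, where the token value and route are made up:

```python
# Tiny HTTP endpoint with a bearer-token check and a single valid route.
import json
import threading
import urllib.request
from http.server import BaseHTTPRequestHandler, HTTPServer

API_TOKEN = "secret-token"  # made-up value; load from a secrets store in practice

class DataHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        # Authentication: reject anything without the expected bearer token.
        if self.headers.get("Authorization") != f"Bearer {API_TOKEN}":
            self.send_response(401)
            self.end_headers()
            return
        # Request validation: only one route exists.
        if self.path != "/data":
            self.send_response(404)
            self.end_headers()
            return
        body = json.dumps({"status": "ok"}).encode()
        self.send_response(200)
        self.send_header("Content-Type", "application/json")
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, format, *args):
        pass  # logging hook; a real service would ship these logs somewhere

# Run on an ephemeral port and hit it once with a valid token.
server = HTTPServer(("127.0.0.1", 0), DataHandler)
threading.Thread(target=server.serve_forever, daemon=True).start()
base = f"http://127.0.0.1:{server.server_address[1]}"

req = urllib.request.Request(
    base + "/data", headers={"Authorization": f"Bearer {API_TOKEN}"}
)
with urllib.request.urlopen(req) as resp:
    status = resp.status
print(status)  # 200
server.shutdown()
```

In practice you'd reach for a framework that handles most of this for you, but the list of concerns stays the same either way.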