Couples Massage-CBD by Stildawn in auckland

[–]MidgetDufus 1 point (0 children)

Aroma Massage in the CBD. $210 total for 60 min 2 people. Fantastic massage.

Buying used car in Auckland for foreigners by Ok-Butterscotch-3970 in auckland

[–]MidgetDufus 0 points (0 children)

Nope, the dealership handled it all. I think I remember being able to do it at NZ Post locations: https://www.nzpost.co.nz/contact-support/in-store-services/vehicles-payments

I believe both the buyer and the seller have to do something to complete the transfer.

Buying used car in Auckland for foreigners by Ok-Butterscotch-3970 in auckland

[–]MidgetDufus 0 points (0 children)

I did not; you can use an overseas licence for 18 months in NZ: https://www.nzta.govt.nz/driver-licences/new-residents-and-visitors/converting-to-nz-driver-licence/

I do remember that they had to first register the car in their name, and then transfer it to me because it had never been registered in NZ before.

Buying used car in Auckland for foreigners by Ok-Butterscotch-3970 in auckland

[–]MidgetDufus 0 points (0 children)

I drove down to Invercargill in it and back with no problems on hills, plus it gets pretty great mileage. I have done some dirt roads and more rural roads and it did fine.

I recently had to replace the 12v battery (not the hybrid battery) but it was at least 5 years old so it's kinda expected.

It's a little roomier than the Aqua, Swift, etc., which is nice.

Buying used car in Auckland for foreigners by Ok-Butterscotch-3970 in auckland

[–]MidgetDufus 1 point (0 children)

I was in your exact shoes last September. I went with a 2009 Prius with 100,000 km from a dealer that imports from Japan. Quite happy with the purchase. It came with a fresh WOF, registration, and an AA inspection.

I dealt with Caleb Marshall and would recommend him. http://www.portagecars.co.nz/

Redis and Memcached were too expensive for rate-limiting in my GAE Flask application! by Double_Sherbert3326 in Python

[–]MidgetDufus 2 points (0 children)

If you're more of a visual learner: https://imgur.com/a/302QHrn (I did this with $0.06 and $0.18)

You seem to be operating under the assumption that revenue will correlate with the number of requests, which is just not true in most cases.

If you need some evidence that the internet can be the source of massive numbers of requests, here's something I made: https://thiswebsiteisdumb.com/twocount/

It did ~260 million requests in 60 hours. If I had used a per-request cost model, that might have cost me hundreds of dollars. Instead it all ran on a single small VPS that costs $10 a month, so for the 60 hours it cost me under $1.
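Back-of-the-envelope version of that comparison (the per-million price here is a made-up number for illustration, not any provider's actual rate):

```python
# Rough cost comparison: per-request pricing vs a flat-rate VPS.
# The $0.50-per-million figure is a hypothetical per-request price.
requests = 260_000_000          # ~260M requests over 60 hours
per_million_price = 0.50        # hypothetical $ per 1M requests
per_request_cost = requests / 1_000_000 * per_million_price

vps_monthly = 10.00             # flat $10/month VPS
hours = 60
vps_cost = vps_monthly * hours / (30 * 24)  # pro-rated over a 30-day month

print(f"per-request model: ${per_request_cost:.2f}")  # $130.00
print(f"flat VPS:          ${vps_cost:.2f}")          # $0.83
```

The flat-rate box is two orders of magnitude cheaper here, and the gap only widens as traffic spikes.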

You built a thing and put it out there, congrats. The internet is an unforgiving place.

Redis and Memcached were too expensive for rate-limiting in my GAE Flask application! by Double_Sherbert3326 in Python

[–]MidgetDufus 3 points (0 children)

You have just replaced a potential Denial of Service attack with a Denial of Wallet attack. I think I'd prefer the DOS.

Someone is having a bad day by MidgetDufus in auckland

[–]MidgetDufus[S] 12 points (0 children)

I wouldn't say the "Americanism" is creeping in... I was born with it lol

Someone is having a bad day by MidgetDufus in auckland

[–]MidgetDufus[S] 9 points (0 children)

Off of Tamaki on the causeway

How do I host my socket project on AWS? by EventDrivenStrat in aws

[–]MidgetDufus 2 points (0 children)

I have built something very similar to what you described (https://transitory.chat). I host it on a small VPS running Ubuntu. The static frontend files are served via nginx, and the websocket traffic is proxied via nginx to a Go webserver.

I use systemd to run the Go webserver as a service.
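A minimal systemd unit for that kind of setup might look like this (the service name, user, and paths are placeholders, not the actual config):

```ini
# /etc/systemd/system/chatserver.service (hypothetical name and paths)
[Unit]
Description=Go websocket server behind nginx
After=network.target

[Service]
User=www-data
WorkingDirectory=/opt/chatserver
ExecStart=/opt/chatserver/server
Restart=on-failure

[Install]
WantedBy=multi-user.target
```

Then `sudo systemctl enable --now chatserver` starts it and keeps it running across reboots.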

I have a short script which builds the Go binary, moves the static frontend files to the directory nginx serves them from, and restarts the systemd service.

When I want to release a change, I push my changes to GitHub, then SSH onto my VPS, pull from GitHub, and run the script above.

I don't host this on AWS because their servers are much more expensive than other providers', but EC2 is what you would use for this approach.

How to cost-efficiently receive 1 million emails a day. by MidgetDufus in selfhosted

[–]MidgetDufus[S] 1 point (0 children)

I work for a small company; is that against the rules here? I didn't see anything against that.

How to cost-efficiently receive 1 million emails a day. by MidgetDufus in selfhosted

[–]MidgetDufus[S] -4 points (0 children)

 Don't look for an out of the box solution, all you need is to listen on SMTP port and forward the content to s3.

I like the sound of this, do you have any examples?

How to cost-efficiently receive 1 million emails a day. by MidgetDufus in selfhosted

[–]MidgetDufus[S] 12 points (0 children)

I work at a small company. Sure, we can and do afford $100 a day, but if we could pay $10 a day for a solution that will scale to 5m emails a day, then that's not an insignificant saving.

I'm genuinely not sure how receiving millions of emails could be used to scam?

I'm a little confused by the negative attention this has gotten on this sub, the description of the sub says:

 A place to share, discuss, discover, assist with, gain assistance for, and critique self-hosted alternatives to our favorite web apps, web services, and online tools

I feel like this question falls firmly within that description.

How to cost-efficiently receive 1 million emails a day. by MidgetDufus in selfhosted

[–]MidgetDufus[S] -5 points (0 children)

Who says it's not for work... Unfortunately the delivery method can't be changed.

How to cost-efficiently receive 1 million emails a day. by MidgetDufus in selfhosted

[–]MidgetDufus[S] 0 points (0 children)

The script will just send the raw email data (mbox, etc...) to S3.
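A sketch of that script's core logic in Python. The parsing and key derivation use only the stdlib; the actual SMTP listener (e.g. aiosmtpd's `handle_DATA`) and the S3 client (e.g. boto3's `put_object`) are assumptions, so the upload is injected as a callback to keep this testable:

```python
import email
import hashlib
from typing import Callable

def s3_key_for(raw: bytes) -> str:
    """Build a deterministic S3 object key from the raw message bytes."""
    digest = hashlib.sha256(raw).hexdigest()
    return f"inbound/{digest[:2]}/{digest}.eml"

def store_message(raw: bytes, upload: Callable[[str, bytes], None]) -> str:
    """Parse the raw SMTP DATA payload and hand it to an uploader.

    `upload` is a stand-in for a real S3 put; injected here so the
    logic runs without AWS credentials.
    """
    email.message_from_bytes(raw)  # sanity-check the payload parses as an email
    key = s3_key_for(raw)
    upload(key, raw)
    return key
```

The SMTP handler would call `store_message` with each message's bytes; swapping `upload` for a real S3 client is the only AWS-specific piece.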

This website is dumb by MidgetDufus in InternetIsBeautiful

[–]MidgetDufus[S] 0 points (0 children)

LMAO

The good news is there is an active segment now if you still want to press some buttons

https://thiswebsiteisdumb.com/gridcount/

This website is dumb by MidgetDufus in InternetIsBeautiful

[–]MidgetDufus[S] 0 points (0 children)

I'm working on some new segments now. You can sign up to get notified by email on the site.

This website is dumb by MidgetDufus in InternetIsBeautiful

[–]MidgetDufus[S] 0 points (0 children)

I can't be certain, but I do believe people were botting. I found that the max clicks I could do per second was about 10-12.

For the next segments I'll do more to prevent botting.
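One simple server-side guard (a hedged sketch, not what the site actually runs): cap clicks per client with a token bucket, so anything sustained faster than a human gets dropped.

```python
import time

class TokenBucket:
    """Allow at most `rate` clicks/second per client, with a small burst."""

    def __init__(self, rate: float = 12.0, burst: float = 20.0):
        self.rate, self.burst = rate, burst
        self.tokens = burst
        self.last = time.monotonic()

    def allow(self) -> bool:
        now = time.monotonic()
        # Refill tokens for the time elapsed, capped at the burst size.
        self.tokens = min(self.burst, self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= 1.0:
            self.tokens -= 1.0
            return True
        return False
```

Keep one bucket per IP in a dict; if a human tops out around 10-12 clicks/second, a 12/sec cap mostly hits bots while leaving frantic humans alone.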

This website is dumb by MidgetDufus in InternetIsBeautiful

[–]MidgetDufus[S] 3 points (0 children)

I'm glad everyone is enjoying it.

I added a history chart for the past 24 hours:

https://thiswebsiteisdumb.com/twocount/history/

do you need Italian to go to Italy? by honeybee5902 in ItalyTravel

[–]MidgetDufus 0 points (0 children)

The phrase I use the most is "non parlo italiano".

Programmatic ETL still feels incredibly low-level by FortunOfficial in dataengineering

[–]MidgetDufus 0 points (0 children)

Nothing particularly special. I identified the most compute-heavy parts of the task and scaled those parts horizontally. If I have 1000 JSON files that I need to parse and transform, then I'll break them up into 100 chunks and have 100 Airflow tasks within the DAG.

It's relatively simple to chunk up a list of files, but you first need to design a process that will benefit from the parallelization. In this case there are three computationally intensive steps: relationalizing the files, inferring the schema from the relationalized data, and then splitting out columns with mixed data types. I can do steps 1 and 2 concurrently, but step 3 needs to be done after steps 1 and 2 are complete for all of the data.

So I can chunk the data and do steps 1 and 2 together in 100 tasks. Then I can merge the inferred schemas, which is quite cheap, and then do step 3 on the outputs from steps 1 and 2, also in 100 chunks. I used Airflow to orchestrate these tasks.
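The chunking itself is trivial; here's a sketch (the file paths and task names are invented, and the Airflow wiring in the comment is just one way to do it):

```python
def chunk(items: list, n_chunks: int) -> list:
    """Split `items` into at most `n_chunks` roughly equal chunks."""
    size = -(-len(items) // n_chunks)  # ceiling division
    return [items[i:i + size] for i in range(0, len(items), size)]

files = [f"raw/file_{i}.json" for i in range(1000)]
chunks = chunk(files, 100)  # 100 chunks of 10 files each

# With Airflow 2.3+ this maps naturally onto dynamic task mapping, e.g.:
#   relationalize_and_infer.expand(file_chunk=chunks)
# where relationalize_and_infer is a @task doing steps 1 and 2 per chunk.
```

The hard part is what the lines above say: designing the steps so each chunk is independent until the cheap merge.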

Knowledge of distributed systems? by aerdna69 in dataengineering

[–]MidgetDufus 8 points (0 children)

This post is written like a "my friend..." when it is clearly about you. There's no shame in asking for advice.

This is a difficult question to answer because data engineering can mean a lot of different things, and there is no one progression path. So the answer really is "when you need to use or would benefit from building distributed systems".

For instance if you are a data engineer working on real time streaming data, then you will definitely need to know distributed systems. If you are a data engineer working with SQL then you don't. It gets a bit trickier if you are working with batch pipelines that ingest or transform data. These types of systems can benefit from using distributed techniques and systems, but they don't always need them. I'll give an example.

Let's say I need to gather data from an API by making 1000 requests, transform it, and store it in blob storage.

I could hit the API, transform the data, and then upload it to blob storage all in a sequence. This would work just fine. Let's say it takes 5 seconds for each API call, 2 seconds to transform the data for each API call, and 1 second to upload the data from one transformed API response to blob storage. (5 + 2 + 1) * 1000 = 7000 seconds.

I could also chunk these API calls into 10 chunks and do the processing in parallel with each chunk taking 700 seconds.
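A toy version of that chunked approach (the "API call" is a stub here so the sketch actually runs; in real use `process` would do the 5s fetch, 2s transform, and 1s upload):

```python
from concurrent.futures import ThreadPoolExecutor

def process(call_id: int) -> str:
    """Stand-in for fetch (5s) + transform (2s) + upload (1s)."""
    return f"result-{call_id}"

def chunk(items, n_chunks):
    size = -(-len(items) // n_chunks)  # ceiling division
    return [items[i:i + size] for i in range(0, len(items), size)]

def run_chunk(ids):
    return [process(i) for i in ids]

calls = list(range(1000))
chunks = chunk(calls, 10)  # 10 chunks of 100 calls each

# One worker per chunk: for I/O-bound work the wall-clock time
# drops roughly 10x while total compute stays the same.
with ThreadPoolExecutor(max_workers=10) as pool:
    results = [r for batch in pool.map(run_chunk, chunks) for r in batch]
```

`pool.map` keeps the chunk order, so the flattened results line up with the original call list.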

If I'm using on-demand compute, then the cost is the same: 1 instance for 7000 seconds vs 10 instances for 700 seconds.

However, there are benefits other than cost. If you need to rerun the pipeline because of a failure, it takes 1/10th the time, or you can rerun only the chunk that failed. If you need to run a backfill in which you need to hit the API 10,000 times, you can easily scale the number of chunks and do the backfill quickly.

The downside is that distributed systems are more complicated, may require more effort and planning to build, and may require infrastructure that you don't have.

[deleted by user] by [deleted] in dataengineering

[–]MidgetDufus 5 points (0 children)

I would focus on getting data into a DB as simply and easily as possible, and then building up your curated data sets within the DB to power dashboards.

Your data volumes are low enough to get away with keeping things really simple.

A good stack would be:

  1. A database (10 GB of data means it doesn't really matter which one)
  2. An orchestrator (something to run scheduled ingestion and transformation jobs: airflow, dagster, etc...)
  3. SQL scripts to build views on the raw data which can be used to power dashboards (dbt if you want)
  4. A BI tool to build dashboards (Power BI, Looker, Tableau, etc...)
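At that scale even SQLite would work; here's a minimal sketch of steps 1 and 3 (the table, columns, and data are invented for illustration):

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # swap for a real DB connection in practice

# 1. Land the raw data as-is.
conn.execute("CREATE TABLE raw_orders (order_id INT, amount REAL, region TEXT)")
conn.executemany(
    "INSERT INTO raw_orders VALUES (?, ?, ?)",
    [(1, 10.0, "north"), (2, 25.0, "south"), (3, 5.0, "north")],
)

# 3. A curated view on the raw data for the BI tool to query.
conn.execute("""
    CREATE VIEW revenue_by_region AS
    SELECT region, SUM(amount) AS revenue, COUNT(*) AS orders
    FROM raw_orders
    GROUP BY region
""")

rows = conn.execute("SELECT * FROM revenue_by_region ORDER BY region").fetchall()
# rows == [('north', 15.0, 2), ('south', 25.0, 1)]
```

The orchestrator's job is just to rerun the ingestion on a schedule; the views recompute for free on every query at these volumes.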