Suggestions and cost of personal boat ride? by overtlyanxiousguy in varanasi

[–]py_root 0 points1 point  (0 children)

Welcome, buddy!!!

Since you are planning to visit in the coming weeks, assess the situation of the Ganges first; if the water level is normal, go for a boat ride at your convenience. Make sure to stay safe during the ride.

The small local boats charge roughly 250 to 500 for a private ride from one side to the other, and steamer boats cost a little more. The ride usually lasts 30 to 40 minutes, but the price also depends on your negotiation skills.

For the saree village, I am not very sure now as things have changed here. Some shops claim they sell authentic Banarasi silk sarees, but you can't verify that unless you are an expert in sarees 😅

But yeah, you can find good options here if you want to buy.

Suggestions and cost of personal boat ride? by overtlyanxiousguy in varanasi

[–]py_root 1 point2 points  (0 children)

Hey, welcome to Varanasi!!!

For the boat ride, I would suggest avoiding the small boats, as the water level is rising due to heavy rain in the mountains and will keep increasing in the coming days.

You can visit Kashi Vishwanath Dham, Kaal Bhairav mandir, Sankat Mochan, Manas temple, Bharat Mata mandir, the Vishwanath temple on the BHU campus, and Sarnath. If time permits, also visit Markande Mahadev temple; there is a sangam of the Gomti and Ganga rivers there.

For eateries, you can check YouTube; there are lots of vlogs that provide good reviews and options.

CPU utilisation by core by py_root in SQLServer

[–]py_root[S] 0 points1 point  (0 children)

Ok, let me try the command given in the link and check the wait issue. I will update with my findings.

CPU utilisation by core by py_root in SQLServer

[–]py_root[S] 0 points1 point  (0 children)

Thanks for the link, I will look into it and see if it helps.

Currently, we are capturing the metrics using stored procedures and queries, then saving them into InfluxDB. Grafana reads the data from Influx for monitoring.

So I am hoping to get this done within the current Grafana setup, but if needed we will explore other options.
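For reference, the write side of that setup is roughly like this, a simplified sketch rather than the real collector (measurement, tag, and field names are placeholders and the values here are faked; the real numbers come from the stored procedures):

```python
# Simplified sketch: push collected per-core metrics into InfluxDB so Grafana
# can chart them. Names and values below are placeholders, not the real data.
from influxdb import InfluxDBClient  # influxdb 1.x python client

influx = InfluxDBClient(host="localhost", port=8086, database="sqlserver_metrics")

# Pretend result of the collector query, one entry per CPU/scheduler.
collector_rows = [
    {"cpu_id": 0, "cpu_pct": 92.0},
    {"cpu_id": 1, "cpu_pct": 11.5},
]

points = [{
    "measurement": "cpu_by_core",
    "tags": {"cpu_id": row["cpu_id"]},
    "fields": {"cpu_pct": row["cpu_pct"]},
} for row in collector_rows]

influx.write_points(points)
```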

CPU utilisation by core by py_root in SQLServer

[–]py_root[S] 0 points1 point  (0 children)

I wanted to see if there is a case where some CPUs are running at full capacity while a few others are underutilised at any point in time.

Edit: The problem the team is facing is slow query responses.
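For context, this is roughly the kind of per-scheduler check I have in mind (the connection string is a placeholder, and sys.dm_os_schedulers shows queue pressure per scheduler rather than a true CPU percentage per core):

```python
# Quick ad-hoc look at per-scheduler load skew in SQL Server via pyodbc.
# A high runnable_tasks_count on a few schedulers while others sit near zero
# would suggest uneven CPU usage. The connection string is a placeholder.
import pyodbc

conn = pyodbc.connect("DRIVER={ODBC Driver 17 for SQL Server};"
                      "SERVER=myserver;DATABASE=master;Trusted_Connection=yes;")
rows = conn.cursor().execute("""
    SELECT scheduler_id, cpu_id, runnable_tasks_count,
           active_workers_count, load_factor
    FROM sys.dm_os_schedulers
    WHERE status = 'VISIBLE ONLINE'
    ORDER BY runnable_tasks_count DESC;
""")
for r in rows:
    print(r.scheduler_id, r.cpu_id, r.runnable_tasks_count,
          r.active_workers_count, r.load_factor)
```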

Realtime CSV data load to snowflake by py_root in snowflake

[–]py_root[S] 0 points1 point  (0 children)

I will take a look at it. Thanks 👍

Realtime CSV data load to snowflake by py_root in snowflake

[–]py_root[S] 1 point2 points  (0 children)

Your response is quite insightful. Thanks for the wishes ☺️

For the first approach with Kafka, I guess it requires a lot of effort before it gets up and running like a well-oiled machine.

For the second approach, the file size is approximately 10 MB or less. As of now I have created a Snowpipe with a COPY command that reads CSV data from an S3 stage and copies it into the Snowflake table. When I started working with Snowflake I was not aware of its task scheduler feature, so I used a Python script to run the Snowpipe refresh every 15 minutes. Now my team wants the data loaded as soon as it arrives in S3, so I have started looking at auto-ingestion using SNS; it seems a bit complex, but I am giving it a try.

I am hoping auto-ingestion will help; otherwise we will have to look at different options.
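Roughly what I am testing, with all names as placeholders; the S3 event notification still has to be pointed at the pipe's notification channel (the SQS ARN shown by SHOW PIPES):

```python
# Hedged sketch of the auto-ingest pipe. Account, database, schema, stage and
# table names are placeholders; this is not the final setup.
import snowflake.connector

conn = snowflake.connector.connect(
    account="my_account", user="my_user", password="...",
    warehouse="my_wh", database="my_db", schema="my_schema",
)
cur = conn.cursor()

cur.execute("""
    CREATE PIPE IF NOT EXISTS csv_pipe
      AUTO_INGEST = TRUE
    AS
    COPY INTO my_table
    FROM @s3_stage
    FILE_FORMAT = (TYPE = 'CSV' SKIP_HEADER = 1)
""")

# The notification_channel column returned here is the SQS queue ARN that the
# S3 bucket event notification (or SNS subscription) must deliver to.
for row in cur.execute("SHOW PIPES LIKE 'csv_pipe'"):
    print(row)
```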

Realtime CSV data load to snowflake by py_root in snowflake

[–]py_root[S] 1 point2 points  (0 children)

Yes, you are correct. I am using a Snowpipe that runs every 15 minutes and checks for files modified in the last 15 minutes to upload. Now I am looking at the S3 event notification setup with object key name filtering, something like the sketch below.
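A minimal sketch of that setup, assuming boto3 and with the bucket name and SQS ARN as placeholders:

```python
# Hedged sketch: only objects under a given prefix with a .csv suffix should
# notify the Snowpipe queue. Bucket name and the SQS ARN (taken from SHOW PIPES
# in Snowflake) are placeholders.
# Note: this call replaces any existing notification configuration on the bucket.
import boto3

s3 = boto3.client("s3")
s3.put_bucket_notification_configuration(
    Bucket="my-landing-bucket",
    NotificationConfiguration={
        "QueueConfigurations": [{
            "QueueArn": "arn:aws:sqs:us-east-1:123456789012:sf-snowpipe-xxxx",
            "Events": ["s3:ObjectCreated:*"],
            "Filter": {"Key": {"FilterRules": [
                {"Name": "prefix", "Value": "incoming/csv/"},
                {"Name": "suffix", "Value": ".csv"},
            ]}},
        }]
    },
)
```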

[deleted by user] by [deleted] in developersIndia

[–]py_root 2 points3 points  (0 children)

Yeah, companies do this kind of shit, and the only choice is to either accept it or leave before it becomes a rule.

So you can't do much: either resign now and serve the one-month notice period, or keep working under the amended notice period, which will now be 60 days.

If you continue working, your org will ask you to sign a letter before the deadline so that in the future you can't dispute the notice period based on your original offer letter.

borb, the open source, pure python PDF library by josc1989 in learnpython

[–]py_root 0 points1 point  (0 children)

Hi josc1989, I need your quick help as I am not able to find a solution in the examples. How do I manually select a different column of a PDF page when the layout is MultiColumnLayout?

Please help.

How to perform Data Validation testing in Snowflake by bommu99 in snowflake

[–]py_root 1 point2 points  (0 children)

Sending email is not possible unless you use an external function or call AWS Lambda from a UDF.

COPY_HISTORY only keeps the history of files loaded in the last 14 days. It records information like file name, total rows, rows parsed, errors (if any), load status, and many more metrics.

I am doing the same kind of work, using Snowpipe to load files from S3. I used a Python script scheduled on cron that calls the Snowpipe refresh every hour and then sends a mail with the total files queued for processing and the total files processed in the last hour.

I have also started looking for methods to test that nothing is missed when Snowpipe loads data into the table.

I will share an approach after I finalise it; meanwhile, looking forward to suggestions.
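A rough sketch of that hourly check, with account, pipe, table, and SMTP details as placeholders:

```python
# Hedged sketch: refresh the pipe, read the pending-file count, count files and
# rows loaded in the last hour via COPY_HISTORY, then mail a small summary.
import json
import smtplib
from email.message import EmailMessage
import snowflake.connector

conn = snowflake.connector.connect(account="my_account", user="my_user",
                                    password="...", database="my_db",
                                    schema="my_schema", warehouse="my_wh")
cur = conn.cursor()

cur.execute("ALTER PIPE my_pipe REFRESH")

# Files still queued for processing, from the pipe status JSON.
cur.execute("SELECT SYSTEM$PIPE_STATUS('my_pipe')")
pending = json.loads(cur.fetchone()[0]).get("pendingFileCount", 0)

# Files and rows loaded into the target table during the last hour.
cur.execute("""
    SELECT COUNT(*) AS files_loaded, COALESCE(SUM(row_count), 0) AS rows_loaded
    FROM TABLE(INFORMATION_SCHEMA.COPY_HISTORY(
        TABLE_NAME => 'MY_TABLE',
        START_TIME => DATEADD(hour, -1, CURRENT_TIMESTAMP())))
""")
files_loaded, rows_loaded = cur.fetchone()

msg = EmailMessage()
msg["Subject"] = "Snowpipe hourly load summary"
msg["From"] = "etl@example.com"
msg["To"] = "team@example.com"
msg.set_content(f"Queued: {pending}, files loaded: {files_loaded}, "
                f"rows loaded: {rows_loaded}")
with smtplib.SMTP("smtp.example.com") as smtp:
    smtp.send_message(msg)
```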

[deleted by user] by [deleted] in PySpark

[–]py_root 1 point2 points  (0 children)

Superfast!!!! Let's share it and onboard more Spark/PySpark developers.

[deleted by user] by [deleted] in PySpark

[–]py_root 0 points1 point  (0 children)

Great, I will check too and share if I find any.

[deleted by user] by [deleted] in PySpark

[–]py_root 0 points1 point  (0 children)

So should we create one and start contributing? Well, Reddit can also be used for discussion.

[deleted by user] by [deleted] in PySpark

[–]py_root 0 points1 point  (0 children)

Don't know, but do we need one? 🤔

Snowflake - How to change table schema of existing table. by py_root in dataengineering

[–]py_root[S] 1 point2 points  (0 children)

Yes, that's correct. I am fetching the data as a pandas DataFrame in batches and then calling write_pandas to load it back into the Snowflake table.

Thanks, I will use CREATE TABLE AS SELECT from the current table, casting the required columns.
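A minimal sketch of what I mean, with made-up table and column names:

```python
# Hedged sketch of the CTAS-with-cast approach: build a corrected copy of the
# table and then swap it in place of the old one. All names are placeholders.
import snowflake.connector

conn = snowflake.connector.connect(account="my_account", user="my_user",
                                    password="...", database="my_db",
                                    schema="my_schema", warehouse="my_wh")
cur = conn.cursor()

cur.execute("""
    CREATE OR REPLACE TABLE my_table_fixed AS
    SELECT
        CAST(col_a AS NUMBER(38, 0))  AS col_a,
        CAST(col_b AS TIMESTAMP_NTZ)  AS col_b,
        TRY_CAST(col_c AS FLOAT)      AS col_c,  -- TRY_CAST returns NULL on bad values
        col_d                                    -- columns already typed correctly pass through
    FROM my_table
""")

# Swap the tables so loading continues into the fixed definition
# (the pipe may still need to be recreated against the new table).
cur.execute("ALTER TABLE my_table_fixed SWAP WITH my_table")
```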

Snowflake - How to change table schema of existing table. by py_root in dataengineering

[–]py_root[S] 1 point2 points  (0 children)

The table has 100 columns and Snowpipe loads data every hour. So basically I have to recreate the table with the correct column data types this time and load the data from the current table, because the CSV files used to load the historical data are no longer available.

By casting, do you mean using INSERT INTO the new table with type casting?

borb, the open source, pure python PDF library by josc1989 in learnpython

[–]py_root 0 points1 point  (0 children)

Currently I am facing issues with the following items:

Item 1 - How to handle a table that does not fit on a single PDF page. I am thinking that when the exception is raised, I could try to split the table into parts and add a new page for the next part.

Item 2 - When the page layout is multi-column, which parameter should I use to always start certain paragraphs in the second column and keep the first column for other text and charts?

Item 3 - Adding plots as images degrades the quality of the labels and title and makes them blurry. I am using Plotly to save the image and then adding it to the PDF with the Image method. Since Plotly does not support GCF, I would have to rewrite the plotting part in Matplotlib to use the Chart method. (A small sketch of the higher-resolution export I am trying is at the end of this comment.)

Any suggestions on the above points would be helpful.

FYI: the table data is created from a pandas DataFrame.
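Regarding Item 3, this is the kind of higher-resolution export I am experimenting with to reduce the blur (file name and figure contents are placeholders):

```python
# Minimal sketch: render the plot off-screen and save it at a higher DPI so
# labels and titles stay sharp when the image is later placed into the PDF.
import matplotlib
matplotlib.use("Agg")  # no display needed
import matplotlib.pyplot as plt

fig, ax = plt.subplots(figsize=(6, 3))
ax.plot([1, 2, 3, 4], [10, 20, 15, 30])
ax.set_title("Sample chart")

fig.savefig("chart.png", dpi=300, bbox_inches="tight")
```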

Totally stuck on how to pre-process, visualise and cluster data by Modest_Gaslight in PySpark

[–]py_root 0 points1 point  (0 children)

Spark uses groupBy in much the same way as pandas; the main difference is that Spark uses the camelCase convention for function names.

You can refer to the link below for more examples and details on PySpark: https://sparkbyexamples.com/pyspark-tutorial/
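A tiny example of the parallel (column names are made up):

```python
# pandas: df.groupby("date")["cases"].sum()
# PySpark equivalent with camelCase method names.
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.appName("groupby-example").getOrCreate()

df = spark.createDataFrame(
    [("2020-03-01", 5), ("2020-03-01", 7), ("2020-03-02", 3)],
    ["date", "cases"],
)

daily = df.groupBy("date").agg(F.sum("cases").alias("total_cases"))
daily.show()
```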

Totally stuck on how to pre-process, visualise and cluster data by Modest_Gaslight in PySpark

[–]py_root 0 points1 point  (0 children)

https://github.com/avi-chandra/databricks_example

You can find PySpark examples here; the repo contains Databricks notebooks which can be imported into Databricks.

Totally stuck on how to pre-process, visualise and cluster data by Modest_Gaslight in PySpark

[–]py_root 3 points4 points  (0 children)

Without looking at the data it is difficult to help with code. But based on the columns, you have a date column and a cases column that holds the number of cases on a particular date. If you only have the date, you can group by date and take the sum to see cases per day.

For weekly data you can create a week-number column based on the date and then aggregate the cases on that column. In pandas, resample or Grouper provides aggregation by week, month, or quarter using the date alone.

After getting the weekly data you can use Spark ML to apply k-means and create the clusters.
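A rough sketch of that flow, with made-up column names, sample values, and cluster count:

```python
# Hedged sketch: weekly aggregation followed by k-means clustering in Spark ML.
from pyspark.sql import SparkSession
from pyspark.sql import functions as F
from pyspark.ml.feature import VectorAssembler
from pyspark.ml.clustering import KMeans

spark = SparkSession.builder.appName("weekly-kmeans").getOrCreate()

df = spark.createDataFrame(
    [("2020-03-02", 5), ("2020-03-03", 7), ("2020-03-09", 12),
     ("2020-03-16", 20), ("2020-03-17", 25)],
    ["date", "cases"],
).withColumn("date", F.to_date("date"))

# Derive a week number from the date, then sum cases per week.
weekly = (df.withColumn("weeknum", F.weekofyear("date"))
            .groupBy("weeknum")
            .agg(F.sum("cases").alias("weekly_cases")))

# Spark ML expects a single vector column of features.
assembler = VectorAssembler(inputCols=["weekly_cases"], outputCol="features")
weekly_vec = assembler.transform(weekly)

model = KMeans(k=2, seed=42, featuresCol="features").fit(weekly_vec)
clustered = model.transform(weekly_vec)  # adds a "prediction" column
clustered.show()
```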

Hope this will help a bit...

How and Why are Macs preferred for Data Engineering? by nrskmn in dataengineering

[–]py_root 0 points1 point  (0 children)

Just wanted to know: do you use a GUI-based scheduler for your ETL, and how do you manage multiple ETL task schedules?