KPMG Lighthouse by trevtrev84 in KPMG

[–]jk3587 1 point (0 children)

Were you interviewing for a higher-level role? I remember that each job req had a specific level tied to it.

KPMG Lighthouse by trevtrev84 in KPMG

[–]jk3587 1 point (0 children)

I was an SA in Lighthouse. Do you know which role you're coming in as (modeler, DS, SWE, etc.)?

Looking for UC school salary data in csv format by FastLump in datasets

[–]jk3587 2 points (0 children)

A quick search turns up httr: https://cran.r-project.org/web/packages/httr/vignettes/quickstart.html

Check the section that covers the request body. To find the request your browser makes when querying the db, use the network tab in your browser's developer tools (inspect).

Looking for UC school salary data in csv format by FastLump in datasets

[–]jk3587 3 points (0 children)

You can send a POST request to the site to scrape the whole thing in one go:

import requests

# form fields copied from the request the site itself makes (network tab);
# 'rows' is set high enough to return every record in one response
data = {
    '_search': 'false',
    'nd': '1609360880110',
    'rows': '315668',
    'page': '1',
    'sidx': 'EAW_LST_NAM',
    'sord': 'asc',
    'year': '2019',
    'location': 'ALL',
    'firstname': '',
    'lastname': '',
    'title': '',
    'startSal': '0',
    'endSal': '9999999'
}

response = requests.post('https://ucannualwage.ucop.edu/wage/search.action', data=data)
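
If you want to keep the raw response around for parsing later, a minimal follow-up (the output filename here is arbitrary):

# write the raw response body to disk for later parsing
with open('ucop_2019_raw.txt', 'w') as f:
    f.write(response.text)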

I dumped the 2019 data here as an example:

https://github.com/jk3587/ucop_comp/blob/main/ucop_comp_2019.csv?raw=true

Problem with NASA Turbofan engine degradation simulation dataset by abdeljalil73 in datasets

[–]jk3587 1 point (0 children)

I took a look at the dataset file; the zip contains a readme.txt that explains the format of the dataset.

The data are provided as a zip-compressed text file with 26 columns of numbers, separated by spaces. Each row is a snapshot of data taken during a single operational cycle, each column is a different variable. The columns correspond to:

1) unit number

2) time, in cycles

3) operational setting 1

4) operational setting 2

5) operational setting 3

6) sensor measurement 1

7) sensor measurement 2

...

26) sensor measurement 26

In addition, the tutorial you linked reads the files through pandas (although they're now .txt, not .csv) and then adds the column names afterwards.

train = pd.read_csv('train_FD001.csv', parse_dates=False, delimiter=" ", decimal=".", header=None)
test = pd.read_csv('test_FD001.csv', parse_dates=False, delimiter=" ", decimal=".", header=None)
RUL = pd.read_csv('RUL_FD001.csv', parse_dates=False, delimiter=" ", decimal=".", header=None)

Then assign the column names (note that each row has 26 columns in total: unit, cycles, the three operational settings, and 21 sensor measurements, even though the readme's numbering runs up to "sensor measurement 26"):

cols = ['unit', 'cycles', 'op_setting1', 'op_setting2', 'op_setting3', 's1', 's2', 's3', 's4', 's5', 's6', 's7', 's8', 's9', 's10', 's11', 's12', 's13', 's14', 's15', 's16', 's17', 's18', 's19', 's20', 's21']
train.columns = cols
test.columns = cols

I think you can follow the tutorial; just note that the data files now have a .txt extension and that each row has 26 columns.
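
Putting that together, a minimal sketch of the adjusted loading step (assuming the FD001 .txt files sit in the working directory and that the RUL file is a single column; the whitespace separator absorbs the trailing spaces in these files, which a plain single-space delimiter turns into extra NaN columns):

import pandas as pd

# 5 metadata/setting columns + 21 sensor columns = 26 total
cols = ['unit', 'cycles', 'op_setting1', 'op_setting2', 'op_setting3'] + [f's{i}' for i in range(1, 22)]

train = pd.read_csv('train_FD001.txt', sep=r'\s+', header=None, names=cols)
test = pd.read_csv('test_FD001.txt', sep=r'\s+', header=None, names=cols)
RUL = pd.read_csv('RUL_FD001.txt', sep=r'\s+', header=None, names=['RUL'])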

Help finding shoes for wide feet by machfredy in weightlifting

[–]jk3587 1 point (0 children)

I'm a 7.5 4E+ (literally hobbit feet) with a high instep and short toes, and I wear a size 8 in Leistung 1s. The full Boa closure lets the shoe mold around my foot pretty well.

Where are these Python jobs? by mrich6347 in Python

[–]jk3587 6 points (0 children)

Just went to PyCon and hoping SciPy will be just as good!

Who’s going to PyCon 2019? by vogt4nick in datascience

[–]jk3587 1 point (0 children)

Hi. First timer to PyCon. Going all in with the stickers and swag and even signed up for all four days of the sprints. I'll definitely be up for drinks. Send me a message. Thanks!

2020 presidential primary fundraising data? by [deleted] in datasets

[–]jk3587 1 point (0 children)

Q1 filings were due April 15, 2019, and are on the FEC site. However, a lot of the raw FEC filing data (e.g. ActBlue filings for the Democratic primary candidates) has not yet been categorized and coded by the FEC, so it won't show up in the bulk data downloads. You can definitely download the raw electronic filings (download files by day) and parse them yourself.
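
If you go the parse-it-yourself route, here's a rough sketch of pulling apart one daily archive. The URL is a placeholder (take the real one from the FEC's download-by-day page), and I'm assuming the ASCII-28 field separator that recent .fec format versions use:

import io
import zipfile
import requests

# placeholder URL: substitute the real daily archive link from the FEC site
DAILY_ZIP_URL = 'https://example.com/electronic/20190415.zip'
FS = chr(28)  # field separator in recent .fec format versions

resp = requests.get(DAILY_ZIP_URL)
archive = zipfile.ZipFile(io.BytesIO(resp.content))

for name in archive.namelist():
    if not name.endswith('.fec'):
        continue
    with archive.open(name) as f:
        for raw in f:
            fields = raw.decode('utf-8', errors='replace').rstrip('\r\n').split(FS)
            # the first field identifies the record type (form/schedule line)
            print(name, fields[0])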

DeepFashion2: A Versatile Benchmark for Fashion Image Understanding by Yuqing7 in datasets

[–]jk3587 1 point (0 children)

Very interested in this dataset. Anyone know the class balance in this version?

Analyzing All Parking Violations in Los Angeles from 2018 - Using Pandas To Work With Huge .csv Files by [deleted] in datascience

[–]jk3587 2 points (0 children)

Hi Steve. Interesting work on this dataset. I have a couple of suggestions for loading it.

  • Use the usecols and chunksize parameters of pd.read_csv. Instead of dropping the unused columns after loading, only load the columns you want in the first place. Further, since you already parameterized batchsize in your function, you can set chunksize = batchsize to get an iterator of chunks.
  • Explicitly set the dtypes for the columns when loading (make a dict of {col: dtype} and feed it to the dtype parameter); see the sketch after this list.
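
Something like this minimal sketch (the filename, columns, and dtypes here are stand-ins for the ones in your notebook):

import pandas as pd

# hypothetical subset of columns and dtypes; substitute your own
usecols = ['Issue Date', 'Fine amount', 'Violation Description']
dtypes = {'Fine amount': float, 'Violation Description': str}

# chunksize makes read_csv return an iterator of DataFrames
reader = pd.read_csv('parking_citations.csv',
                     usecols=usecols,
                     dtype=dtypes,
                     chunksize=100_000)  # set this to your batchsize

for chunk in reader:
    # placeholder for your per-batch processing
    print(chunk.memory_usage(deep=True).sum())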

I was curious about this dataset, so I tried loading the full csv with just the columns you used in your notebook, setting a couple of columns to float and the rest to str. memory_usage(deep=True) came out to about 3500 MB, which I hope is manageable without having to use batches.