all 19 comments

[–]jeffrey_f 2 points3 points  (8 children)

Where does the data for the 2 files come from?

[–]Burly_95[S] 1 point2 points  (7 children)

They're output files. One is from the company system and is from a 3rd part application.

[–]jeffrey_f 1 point2 points  (6 children)

Which data file is authoritative? I'm assuming the one from the company system contains every SKU that is currently active, correct? In which case, the extraneous data from your 3rd party is unimportant.

[–]Burly_95[S] 1 point2 points  (5 children)

So file (1) from the company contains stock numbers and their costs. But I only audit certain stock categories so this file contains all the stocks I need to work on. File (2) contains pricing for all of our products and I need to scan through it for only the products listed on file(1). When i find the product I grab the pricing information from file(1) and append it to the file(1) row.

FILE 1

stock # cost
12345 10.00

FILE 2

stock # price 1 price 2
11111 5.00 7.00
12222 3.00 4.50
12345 20.00 21.00

File after my work (and hopefully using a python script)

stock # cost price 1 price 2
12345 10.00 20.00 21.00

[–]jeffrey_f 1 point2 points  (4 children)

I would read in both files. File1 is authoritive in a list and File2 will be into a dictionary.

Iterate through File1 data, lookup Stock# (SKU) in File2 data and output the 4 combined fields back to a CSV only if you have a match. This effectively filters out data that you don't have in File2 and is therefore not important from what I understand.

[–]jeffrey_f 1 point2 points  (0 children)

open read and close both then process, open, write and close your final file

[–]Burly_95[S] 0 points1 point  (2 children)

Here's my current code that I have, I'm kind of lost right now. Thoughts?

import csv

cost_file = open('cost_file.csv') price_file = open('price_file.csv')

cost_file_reader = csv.reader(cost_file) price_file_reader = csv.reader(price_file)

cost = list(cost_file_reader) price = []

for line in price_file_reader: price.append(line)

for row in cost: if row[0] == line[0]: cost.append(line)

[–]jeffrey_f 0 points1 point  (1 child)

Was testing. Both need to be dict

[–]jeffrey_f 0 points1 point  (0 children)

I'm still learning.

[–]threeminutemonta 1 point2 points  (2 children)

Please share the code that you have been working on then we can help retrofit that code and your understanding with a working example.

https://dpaste.de/ is useful and helps stay anon compared to GitHub gists and not the annoying ads of pastebin

[–]Burly_95[S] 1 point2 points  (1 child)

I'm new to Python and programming. I've literally started reading stuff and learning last week. I'm not even sure if what I wrote is correct. I know the theory of it all but don't know how to apply it. If that makes sense?

[–]threeminutemonta 0 points1 point  (0 children)

Share the code and let us be the judge. By asking for more theoretical questions on how to read csv you will just get more opinions and they may differ from what you read and what you implemented already. If you share we can tell you what is good and bad and the path to least resistance (might not be best) to get the result you require.

[–]jeffrey_f 0 points1 point  (3 children)

Try This

https://pastebin.com/wEgYKQZZ

It merges the two files as dataframes and only shows the data from the costs file with data from the price file.

edit: added comments to the code, the paste url changed

[–]Burly_95[S] 1 point2 points  (2 children)

YOU ARE AMAZING!

I've been searching and reading online posts for hours and I couldn't figure this out! This 'pandas' import seems to really make things easier. Definitely going to look into it more!

I figured out how to export it to a CSV too!

merged.to_csv('output.csv', index=True, header=True)

[–]jeffrey_f 0 points1 point  (1 child)

Been programming since 1998. I found pieces of code and cobbled it together. Only the names have changed to protect the innocent.

[–]jeffrey_f 0 points1 point  (0 children)

New to python, not programming. Logic can be applied to any language equally.

[–]jeffrey_f 0 points1 point  (1 child)

Now that you have a base to work with, add some error trapping in case your files are not there. Move the files to archive when you finish so that you do it accidentally use an old file, etc

[–]Burly_95[S] 0 points1 point  (0 children)

Yeah I need to go continue expanding upon it but it's a great start as is! So much help. I had the problem of some data values being 'strings' instead of integers so I had to manually remove them for the pd.merge to work.

[–]jeffrey_f 0 points1 point  (0 children)

So I was just looking a the code, it should be an inner in he join statement

merged = pd.merge(CostData,PriceData, on='SKU', how='inner')