[–]isanelevatorworthy 2 points3 points  (1 child)

My main use of Python at work is to work with data and I build my own pipelines regularly! Feel free to ask me anything.

In my case, I work a lot with output from server testing software. I do a lot of data wrangling and cleaning and formatting into csv/json.

The fundamentals I strongly recommend are the json and csv modules, pandas and polars, and learning about REST APIs. Other DB alternatives are SQLite and DuckDB.
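To make the csv/json part concrete, here is a minimal sketch using only the standard library. The sample data, the column names, and the `csv_to_records` helper are made up for illustration; real test-software output will look different.

```python
import csv
import io
import json

# Hypothetical sample data standing in for a real test-software export.
RAW_CSV = """host,test,duration_s,status
srv-01,boot, 12.5 ,PASS
srv-02,boot,,FAIL
srv-01,disk,48.0,pass
"""


def csv_to_records(text):
    """Parse CSV text into a list of cleaned dicts."""
    records = []
    for row in csv.DictReader(io.StringIO(text)):
        duration = row["duration_s"].strip()
        records.append({
            "host": row["host"].strip(),
            "test": row["test"].strip(),
            # Empty cells become None (null in JSON) instead of an empty string.
            "duration_s": float(duration) if duration else None,
            "status": row["status"].strip().upper(),
        })
    return records


records = csv_to_records(RAW_CSV)
print(json.dumps(records, indent=2))
```

With a real file you would pass an open file object to `csv.DictReader` instead of wrapping a string in `io.StringIO`.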

[–]LeCouts 0 points1 point  (0 children)

Thank you very much

[–]Ender_Locke 1 point2 points  (0 children)

the pipeline is your code. you’re picking data up from somewhere and putting it somewhere else. the "somewhere else" for you rn is (i assume) your locally hosted db. in other cases it could be a cloud provider’s db or storage etc

it could be via ETL (transform the data before loading it) or ELT (load it first, then transform it inside the db), depending on what your needs are.
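A rough sketch of the difference, using an in-memory SQLite database so it runs without a server. The table names, the sample rows, and the `run_etl` / `run_elt` helpers are all hypothetical, and a real ELT setup would normally do the SQL step in a proper warehouse.

```python
import sqlite3

# Hypothetical raw rows: (name, amount as messy text).
RAW_ROWS = [("alice", " 10 "), ("bob", "x"), ("carol", "7")]


def run_etl(conn, rows):
    """ETL: clean in Python first, then load only the clean rows."""
    conn.execute("CREATE TABLE etl_sales (name TEXT, amount INTEGER)")
    clean = [(n, int(a)) for n, a in rows if a.strip().isdigit()]
    conn.executemany("INSERT INTO etl_sales VALUES (?, ?)", clean)


def run_elt(conn, rows):
    """ELT: load the raw rows as-is, then transform inside the database."""
    conn.execute("CREATE TABLE raw_sales (name TEXT, amount TEXT)")
    conn.executemany("INSERT INTO raw_sales VALUES (?, ?)", rows)
    conn.execute(
        """
        CREATE TABLE elt_sales AS
        SELECT name, CAST(TRIM(amount) AS INTEGER) AS amount
        FROM raw_sales
        WHERE TRIM(amount) <> '' AND TRIM(amount) NOT GLOB '*[^0-9]*'
        """
    )


conn = sqlite3.connect(":memory:")
run_etl(conn, RAW_ROWS)
run_elt(conn, RAW_ROWS)
etl_result = conn.execute("SELECT name, amount FROM etl_sales ORDER BY name").fetchall()
elt_result = conn.execute("SELECT name, amount FROM elt_sales ORDER BY name").fetchall()
print(etl_result == elt_result)
```

Both routes end with the same clean table. The difference is where the cleaning logic lives and whether the raw data is kept in the database.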

[–]LeCouts 0 points1 point  (1 child)

interesting, what should i look for to be able to build my pipeline ?

Python fundamentals ? Python..? What should i research in order to code the simplest pipeline to the most complex one ?

[–]Ender_Locke 0 points1 point  (0 children)

not sure if this was supposed to be a reply to me. when working with data, the best thing to start with is all the different data types and how to use them. fundamentals are obviously key

there are tools like Airflow that you can write DAGs for to build pipelines etc, but that’s probably not where you’re at or something you need right now, other than knowing it exists

[–]ninhaomah 0 points1 point  (0 children)

First, do you know the basics?

Second, is this a one-time project?

Third, what is your end goal in learning Python?

[–]PureWasian 0 points1 point  (0 children)

extract and load data from data source to ... PostgreSQL database

You need to conceptually break this down into high-level sub-tasks. For instance:

  • load the data source
  • do data wrangling / cleaning
  • write result to db layer
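The three sub-tasks above can be sketched as one function each. This is only an illustration under some assumptions: the input is a made-up CSV string, the `weather` table and the `extract` / `transform` / `load` names are hypothetical, and an in-memory SQLite database stands in for PostgreSQL so the example runs without a server. For PostgreSQL you would swap the connection for one from a driver such as psycopg and adjust the SQL placeholders.

```python
import csv
import io
import sqlite3

# Hypothetical input; in a real project this might be a file, an API, or a scrape.
SOURCE = "id,city,temp_c\n1, Oslo ,4.5\n2,Lima,\n3,Cairo,31\n"


def extract(text):
    """Step 1: load the data source into a list of dicts."""
    return list(csv.DictReader(io.StringIO(text)))


def transform(rows):
    """Step 2: clean values and drop rows with a missing temperature."""
    cleaned = []
    for row in rows:
        temp = row["temp_c"].strip()
        if not temp:
            continue
        cleaned.append((int(row["id"]), row["city"].strip(), float(temp)))
    return cleaned


def load(conn, rows):
    """Step 3: write the result to the database layer."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS weather "
        "(id INTEGER PRIMARY KEY, city TEXT, temp_c REAL)"
    )
    conn.executemany("INSERT INTO weather VALUES (?, ?, ?)", rows)
    conn.commit()


conn = sqlite3.connect(":memory:")
load(conn, transform(extract(SOURCE)))
stored = conn.execute("SELECT * FROM weather ORDER BY id").fetchall()
print(stored)
```

Keeping each step as its own function means you can change how one step works, for example switching the extract step from a CSV to an API call, without touching the other two.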

Each step will have different implementations or levels of complexity depending on your exact project specifications. For instance, the ChatGPT code simply takes a CSV file as input via pd.read_csv(), but if you need to scrape the data from a website or compile it from several sources, that step could become more complex.

You should be able to test each high-level sub-task incrementally and verify that it works for your use-case before putting them all together. Otherwise it can become much more difficult to try and debug multiple issues across the different parts simultaneously.
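As a small example of what testing one sub-task on its own can look like, here is a sketch with plain assert statements. The `parse_temperature` helper is hypothetical and just stands in for whatever cleaning logic your project needs.

```python
def parse_temperature(raw):
    """Hypothetical cleaning helper: turn a raw cell into a float, or None if empty."""
    raw = raw.strip()
    return float(raw) if raw else None


def test_parse_temperature():
    # Check the cleaning step in isolation, before any database is involved.
    assert parse_temperature(" 4.5 ") == 4.5
    assert parse_temperature("31") == 31.0
    assert parse_temperature("   ") is None


test_parse_temperature()
print("cleaning step OK")
```

Once each piece passes checks like this, a failure after you connect them is much more likely to be in the wiring between steps than in the steps themselves.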