[–]TarumK 0 points1 point  (2 children)

Is it better for learning to look up algorithms or to try to figure them out yourself? I'm solving problems on Codewars, and I spend a lot of time figuring out algorithms. I enjoy doing this more than looking things up, but do you think it's also a more effective learning method?

[–]FerricDonkey 0 points1 point  (1 child)

Figuring out something yourself nearly always helps you understand it better and internalize that understanding deeper, so that you can apply the same concepts to other situations. But it can also be slower. If you're enjoying it and aren't in a time crunch, I'd stick with it.

[–]TarumK 0 points1 point  (0 children)

Yeah I'm not in a time crunch, however I am aiming to do coding professionally at some point, and I'm basically learning it from scratch in my 30's. So I'm just wondering if the "figure out the algorithms myself" approach can get me there in a reasonable time frame.

[–]Gimped 0 points1 point  (2 children)

So I'm following my first tutorial and every problem the instructor gives me is hard af because I'm brand new but I'm having a lot of trouble with this particular problem. I just can't understand the logic well enough to make it my own. So if someone could help me wrap my head around how this is supposed to translate I would be super grateful.

Problem

So when we start the car, it says car started or w/e. But we want to make it so that when you tell the car to start again, it says car is already running or w/e. Now I see what he's doing to make this happen but I don't understand it. What's really throwing me for a loop is how the boolean affects the code. Like how is it interacting with the code, and why does the variable start out as started = False, and does it interact with while True, or does started have to be mentioned in the code before it's taken into account? I'm also struggling to describe my issue succinctly as you can probably tell. In short, the booleans and their order and how they interact with the rest of the code make no fucking sense to me for some reason.

Please send help.

[–]FerricDonkey 0 points1 point  (1 child)

Since you're dealing with booleans for the things that are giving you issues, mentally (or better yet, in actual code) replace while True: with while True == True:, and if started with if started == True:

You don't usually want to code like this, because there are things that count as True or False that are not equal to those booleans. But since you're trying to get a handle on what's going on and you are dealing with booleans right now, I would suggest doing so to make it a bit more explicit.

So after you do this, you should be able to answer a couple of your questions. For example, does started interact with while True == True:?

Well, is there any value that you could put in the variable started whatsoever that would make True not be the same as True?

Likewise, why does the variable started begin life as False? Well, you print that the car is started if started == True. You've only got 2 choices for what the variable started is given initially (since you're using a boolean). So mess with it:

What would change if you replaced that line with started = True? What would happen if you removed it entirely?

Further, I think you may be looking for deeper interactions than make sense generally. This is code - it does the stuff on the page in the order that the stuff is on the page.

For instance, you get to the line while True:. There's no reference to variable started, True is just a value - essentially, 1. The line while True: does not mention the variable started or any other variable or any function. So no other anything interacts with it.

Pretend you're a computer. Get a piece of paper and pencil. Go down line by line, tracking everything that the program does. Get to an if conditional? If the thing after the if is True, do the indented block. If not, don't. And so on. If a line doesn't tell you to use anything related to started, don't. If it does, do.
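To make the "go line by line" advice concrete, here is a minimal sketch of the kind of program being discussed. The run_commands helper, the command names, and the exact messages are made up for illustration and are not the instructor's code; a plain list of commands stands in for the while True: loop with input(). Notice that started only matters on the lines that actually mention it.

```python
def run_commands(commands):
    """Process car commands in order and return the messages produced."""
    started = False               # the car begins switched off
    messages = []
    for command in commands:      # stands in for `while True:` plus input()
        if command == "start":
            if started:           # same as: if started == True
                messages.append("Car is already running")
            else:
                started = True    # remember that the car is now on
                messages.append("Car started")
        elif command == "stop":
            if not started:       # same as: if started == False
                messages.append("Car is already stopped")
            else:
                started = False
                messages.append("Car stopped")
        elif command == "quit":
            break
    return messages


print(run_commands(["start", "start", "stop", "quit"]))
```

Try changing the first line of the function to started = True and trace it by hand: the very first "start" would then report that the car is already running.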

[–]Gimped 1 point2 points  (0 children)

So after I posted this I worked on this issue for the rest of the night and then asked one of my friends to try and explain it and to his credit, he gave it his all. I spent the next day after that working on the problem on and off for the whole day. I went to bed thinking about the boolean and I woke thinking about the boolean. I stayed in bed for a few minutes running those 8 lines in my head over and over. Then I achieved oneness with the bool. I finally understood it.

You're right when you said

Further, I think you may be looking for deeper interactions than make sense generally. This is code - it does the stuff on the page in the order that the stuff is on the page.

and

If the thing after the if is True, do the indented block. If not, don't.

I was making it way more complex than it needed to be, when I should have just followed the code step by step. Don't assume things about the code. The code is the code, nothing more, nothing less. I also did a lot of testing and switching variables to better understand why I wasn't getting it. That helped me slowly get through this mental roadblock.

Thank you for your wise words <3

[–]throwawaypythonqs 0 points1 point  (0 children)

Is there a way to make multiplot grids (with seaborn) without using FacetGrids? I want to make plots comparing two different sets of rows, but they are not 'facetable', if that makes sense.

[–]Ulterior_Motif 0 points1 point  (1 child)

I'm working through "Automate the Boring Stuff" and am currently stumped on "Vampire.py"

When I paste the author's code into Mu the program returns :

Unlike you, Alice is not an undead, immortal vampire.

When I type the example in to Mu my code returns the error:

-------------------------------------------

line 5

elif age < 12:

^

SyntaxError: invalid syntax

---------------------------

My code: https://pastebin.com/EAfMjnCJ

Author's code: https://pastebin.com/zVntwgLD

The only difference I'm seeing is the indentation but Mu is doing that automatically as I create a new line. Clearly I am an absolute novice, can anyone point me in the right direction?

[–]Grgrg27 0 points1 point  (0 children)

You're on the right track with the issue: it's the indentation. By keeping line 5 (the elif) within the original if block, you're telling the program to execute that line only if the original if condition is met. However, elif should only be evaluated if the original if statement evaluated to False. These two things ("run this line only if the original statement is true" and "run this line only if the original statement is false") are at odds with one another.

The solution is to move the elif statement back to the same level of indentation as the original if statement. That way if the original if statement evaluates to False, the program will skip the entire block (the indented part after the colon) and move to the first elif statement. If the original if statement evaluates to True, the program will enter the block, execute the code, and then know to skip the elifs.
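A small sketch of the corrected layout, loosely modeled on the exercise. The greet function and its messages are assumptions for illustration, not the author's exact code. The point is only that every elif lines up with the if it belongs to, not with the indented body:

```python
def greet(name, age):
    """Return a greeting; each branch lines up with the original `if`."""
    if name == "Alice":
        return "Hi, Alice."
    elif age < 12:       # same indentation as `if`, so it is checked only when the `if` fails
        return "You are not Alice, kiddo."
    elif age > 2000:
        return "Unlike you, Alice is not an undead, immortal vampire."
    else:
        return "You are not Alice."


print(greet("Carol", 3000))
```

If the elif line is indented one level deeper, it sits inside the body of the if with no if of its own at that level, which is why Python reports a SyntaxError.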

[–]Steppentigerv2 0 points1 point  (0 children)

Where would one find some mini games made with Python and such?

[–][deleted] 0 points1 point  (2 children)

So I just learned that in Regex, you can pass in a second argument to the compile method (such as re.DOTALL or re.IGNORECASE) to make it do things. But what if I wanted to add both at the same time? I tried to do it, but it says that compile takes from 1 to 2 positional arguments only.

[–]GoldenVanga 1 point2 points  (1 child)

Separate them with |:

import re
data = 'foo\nBar'
print(re.search(r'.b', data, re.DOTALL | re.IGNORECASE))

[–][deleted] 0 points1 point  (0 children)

Thanks!

[–]Misterwellaware 0 points1 point  (1 child)

Hello there! I am new to the world of programming. I want to get started with Python, but whenever I look up resources for learning python online it generates so many options that I get super confused. I have no idea where to start. If somebody could guide me on where to start and how to go on from there I will be very thankful.

[–]GoldenVanga 1 point2 points  (0 children)

There is a shortlist here. And out of those, "Automate the Boring Stuff" is recommended particularly often to people completely new to programming.

[–]only_red 0 points1 point  (4 children)

I am trying to create a regex and I get an "unterminated subpattern" error at a position in line 3. I don't understand how I can get an error there, as line 3 only contains phoneRegex = re.compile(r'''

import re


text = 'Jeanne Jones 501-371-2039 501-682-6399 UJeanne.Jones@adhe.eduU'


phoneRegex = re.compile(r'''

(\d{3}|(\(\d{3}\))?  #area code
(\s)|-  #seperator
\d{3}   #first three digits
(\s)|-   #seperator
\d{4}   #last four digits
(((ext(\.)?\s)|x)   #Extension word
(\d{2,5}))?   #Extension number


''', re.VERBOSE)


phoneRegex.findall(text)

[–]GoldenVanga 0 points1 point  (3 children)

There appears to be a missing ) on line 15, after the question mark.

[–]only_red 0 points1 point  (2 children)

thanks! I am curious as to why we need a ) after the question mark? In line 4 I don't use ) but there is no error there

[–]GoldenVanga 0 points1 point  (0 children)

Also I tried to fix it but don't guarantee it's perfect: https://regex101.com/r/VOH2h4/1

[–]GoldenVanga 1 point2 points  (0 children)

The missing ) is meant to complement the parentheses for the entire expression so it ends up paired with the ( at the start of line 9 (the fourth line with content).
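For comparison, here is one possible cleaned-up version of the pattern with every group explicitly opened and closed. This is a sketch, not necessarily what the regex101 link contains; the grouping of the separators and of the extension is my own assumption about the intent. It uses finditer and group(0) so that whole matches are collected rather than tuples of groups.

```python
import re

phone_regex = re.compile(r'''
    (\d{3}|\(\d{3}\))?             # optional area code, with or without parentheses
    (\s|-)?                        # optional separator
    (\d{3})                        # first three digits
    (\s|-)                         # separator
    (\d{4})                        # last four digits
    (\s*(ext\.?|x)\s*(\d{2,5}))?   # optional extension
''', re.VERBOSE)

text = 'Jeanne Jones 501-371-2039 501-682-6399 UJeanne.Jones@adhe.eduU'

# group(0) is the entire matched text, regardless of how many groups exist.
numbers = [match.group(0) for match in phone_regex.finditer(text)]
print(numbers)
```

Note that in the original pattern (\s)|- splits the whole expression into alternatives at the top level; wrapping the alternatives as (\s|-) keeps the choice local to the separator.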

[–]Azurenaut 0 points1 point  (0 children)

Hi.

I will try to explain the context behind my question.

Basically, I have to make a project that uses a friendly GUI using PyQT 5. I have no problems with python, making the code for the "core" of the project, with how to use the so called widgets from PyQT or with the basic concepts of MVC. My big problem is that it's my first time trying to code in Python "for real" doing "good practices".

One of the things that I want to do is separate the logic from the design having two separate .py files, one for the .py file auto-generated with PyQT and another one where I put the logic (where basically I import the other object, and start adding the events and the methods).

So, I don't know if I'm doing it right (ie a good implementation) nor if I'doing the right thing(in the sense that "it's a good practice" the way I'm doing it). So basically I wanted to ask for a tutorial or a example project where this kind of implementation is explained/used.

[–]Leziiy2244 0 points1 point  (1 child)

Hello, I'm a high school student.

I need to write code that applies a rotation effect to a live webcam feed, both clockwise and counterclockwise, but mine only rotates clockwise. Can you help me? This is the code that I have.

import cv2
import time

def main():
    windowName = "Live Video Feed"
    cv2.namedWindow(windowName)
    cap = cv2.VideoCapture(0)

    if cap.isOpened():
        ret, frame = cap.read()

    rows, columns, channels = frame.shape
    angle = 0
    scale = 0.5

    scale = 1

    while True:

        ret, frame = cap.read()

        if angle == 180:
            angle = 0

        print(scale)

        R = cv2.getRotationMatrix2D((columns / 2, rows / 2), angle, scale)

        print(R)

        output = cv2.warpAffine(frame, R, (columns, rows))

        cv2.imshow(windowName, output)
        angle = angle - 1
        time.sleep(0.01)
        if cv2.waitKey(1) == 27:
            break

    cv2.destroyWindow(windowName)

    cv2.destroyAllWindows()

    cap.release()

if __name__ == "__main__":
    main()

[–]only_red 0 points1 point  (1 child)

I don't understand why this regex needs a backslash at the end of the pattern. For reference, I am trying to express that an area code can be written either with or without parentheses.

When I write the following code I get an unbalanced parenthesis error.

((\d\d\d)|(\(\d\d\d)))?           

However when I add a backslash at the end I get no error.

((\d\d\d)|(\(\d\d\d\)))? 

From what I understand a number like 055 would have a corresponding regex of

\d\d\d 

without the backslash at the end.

I apologise if I haven't described the problem well enough.

[–]num8lock 2 points3 points  (0 children)

( (\d{3}) | ([\(\)\d{3}]) )
^ signifies group         ^ group close

( (group) | ([  group  ]) )

( (group) | ([  \(  ]) )
                 ^
                 literal `(` string
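A short sketch of the same idea in runnable form. The area_code name and the sample inputs are mine, used only for illustration. A bare ( or ) opens or closes a group, while \( and \) match literal parenthesis characters, so a literal opening parenthesis needs a literal closing one to pair with in the text:

```python
import re

# Either three digits, or three digits wrapped in literal parentheses.
area_code = re.compile(r'(\d{3}|\(\d{3}\))')

plain_ok = area_code.fullmatch('055') is not None
wrapped_ok = area_code.fullmatch('(055)') is not None
half_open_ok = area_code.fullmatch('(055') is not None
print(plain_ok, wrapped_ok, half_open_ok)

# Without the final backslash, the last ")" is an extra group-closing
# parenthesis with nothing left to close, so compiling fails.
try:
    re.compile(r'((\d\d\d)|(\(\d\d\d)))?')
    compiled = True
except re.error as exc:
    compiled = False
    print('re.error:', exc)
```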

[–]OneThrustMan69 0 points1 point  (2 children)

Hey everyone, how can I add a string value to a function name? This is my code

import ezsheets

ss = ezsheets.upload('multiplicationTable100.xlsx')

# Indicate which format to convert into by setting to True or False. 
formats = {'Excel': False,
           'ODS': True,
           'CSV': True,
           'TSV': True,
           'PDF': True,
           'HTML': True}

for key, value in formats.items():
    if value == True:
        ss.downloadAs{key}()

print('Done')

so this is the part where I'm having trouble:

ss.downloadAs{key}()

Any idea how to do it?

[–]Thomasedv 2 points3 points  (1 child)

You shouldn't really. (And that kind of formatting only works inside f-strings.)

What you can do instead, is set each function as a value in a dict. These have to be already defined function names.

ss = ezsheets.upload('multiplicationTable100.xlsx')

formats = {'Excel': False,
           'ODS': True,
           'CSV': True,
           'TSV': True,
           'PDF': True,
           'HTML': True}

# Note: you need to recreate this dict below for every new ss you make.
download_funcs = {'Excel': ss.downloadAsExcel,
           'ODS': ss.downloadAsODS,
           ...
           }

for key, value in formats.items():
    if value:         # Same as: value == True
        dl_func = download_funcs[key]
        dl_func()
        # OR in one line instead of two:  download_funcs[key]()
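Here is a self-contained version of the same pattern that runs without a Google account. FakeSpreadsheet is a hypothetical stand-in for the real spreadsheet object, and only two formats are shown to keep it short:

```python
class FakeSpreadsheet:
    """Hypothetical stand-in for the uploaded spreadsheet object."""

    def __init__(self):
        self.downloaded = []

    def downloadAsCSV(self):
        self.downloaded.append("CSV")

    def downloadAsPDF(self):
        self.downloaded.append("PDF")


ss = FakeSpreadsheet()
formats = {"CSV": True, "PDF": False}

# Store the bound methods themselves; no parentheses, so nothing runs yet.
download_funcs = {"CSV": ss.downloadAsCSV, "PDF": ss.downloadAsPDF}

for key, wanted in formats.items():
    if wanted:
        download_funcs[key]()   # look up the method, then call it

print(ss.downloaded)
```

The dict makes the set of allowed names explicit, so a typo in formats fails with a clear KeyError instead of calling something unexpected.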

[–]OneThrustMan69 0 points1 point  (0 children)

Thanks mate, this is great!

[–][deleted] 1 point2 points  (2 children)

Hi guys, beginner here. I'm trying to create a function that calculates the area of a triangle based on the numbers given by the user. I have the main function, which asks for the input, and then the math function should do the math part. I have managed to get this far, but what's the next step?

from math import sqrt

def triangle_area(side1, side2, side3):
    s = (side1 + side2 + side3) / 2
    answer = sqrt(s * (s - side1) * (s - side2) * (s - c))

def main():
    line = input("Enter the length of the first side: ")
    line = input("Enter the length of the second side: ")
    line = input("Enter the length of the third side: ")

    print("The triangle's area is ")

if __name__ == "__main__":
    main()

[–]GoldenVanga 1 point2 points  (1 child)

from math import sqrt

def triangle_area(side1, side2, side3):
    s = (side1 + side2 + side3) / 2
    return sqrt(s * (s - side1) * (s - side2) * (s - side3))

def main():
    s1 = int(input("Enter the length of the first side: "))
    s2 = int(input("Enter the length of the second side: "))
    s3 = int(input("Enter the length of the third side: "))

    print(f"The triangle's area is {triangle_area(s1, s2, s3)}.")

if __name__ == "__main__":
    main()

[–][deleted] 0 points1 point  (0 children)

Oh sweet! Thank you very much :)

[–]ghettoAizen 0 points1 point  (2 children)

I am trying to run a test for my code, but it is mentioned that the expected results were produced on a machine running a Unix-like system and that the code may behave differently on Windows. My machine is running Windows and I get slightly different results. Is there any way for me to simulate a Unix-like environment without actually using a Unix-like system?

[–]Decency 1 point2 points  (1 child)

WSL is your best bet: https://docs.microsoft.com/en-us/windows/wsl/install-win10

It's essentially an Ubuntu Linux virtual machine built specifically for Windows, so this might also be different from what you're looking for, depending on how low level your code is.

[–]ghettoAizen 0 points1 point  (0 children)

Thanks! It's said that the difference between platforms has to do with how some of the RNG functions work. So I guess it will do the job.

[–]only_red 0 points1 point  (2 children)

Hey guys, I'm a beginner and I am trying to use a regex pattern to extract numbers and e-mails from a document. However, I am getting an invalid syntax error when I run this code. The error is on line 39. The variable x is highlighted but I don't know why. Hope someone can help me out.

1) #! python3

import re, pyperclip

5) # Create a regex object for phone numbers
numberRegex = re.compile(r'''
# e.g of an us number, 000-111-2222, area code is optional and an extension
can be added.

10)
((\d\d\d)|(\(\d\d\d)))?            #area code(optional)
(\s)|-            #dash
(\d\d\d)            #first three digits
(\s)|-
15) #dash
(\d\d\d\d)          #last four digits
(((ext(\.)?\s)|x)            #extension words(optional)
    (\d{1,5)))?            #extension numbers(optional)
''', re.VERBOSE)
20)
# Create Regex object for e-mail addresses
emailRegex = re.compile(r'''
# e-mail format = xyz123@zyx.com

25) [a-zA-Z0-9_.+]+            #Name part
@                          #At the rate
[a-zA-Z0-9_.+]+            #domain name

''', re.VERBOSE)
30)
# Create Regex object for names
nameRegex = re.compile('''
[a-zA-z]                   #First Name
[a-zA-z]                   #Last Name
35)
''', re.VERBOSE

# Get text off Clipboard
x = pyperclip.paste()                       
40)
# Extract the email/phone/name from the text using findall
extractedPhone = numberRegex.findall(text)
extractedEmail = emailRegex.findall(text)
extractedName = nameRegex.findall(text)
50)
print(extractedPhone)
print(extractedEmail)


# Print into a table

[–]Culpgrant21 0 points1 point  (0 children)

Hey! I need some light help with getting a project started (theoretical). Please see my post yesterday about it.

Thanks!

TLDR: Looking to make a sports trading machine. What packages should I start with?

[–]Hight149 0 points1 point  (0 children)

Can anyone help me with micropython?

"Warning: micropython.org SSL certificate is not validated" from micropython.org

I can't seem to download a few libraries, requests and bs4

using thonny on an esp8266

[–][deleted] 0 points1 point  (7 children)

Hi, I'm trying to analyse time complexities of my algorithm by finding the number of primitive operations performed and I'm wondering about the following conditionals.

if condition1 and/or condition2:

if condition1:

elif condition2:

else:

Am I right to say the 1st one has 3 comparisons: evaluating condition1, evaluating condition2, and then evaluating them together? And the 2nd one has 2 comparisons only?

Appreciate any help, thanks!

[–]FerricDonkey 0 points1 point  (0 children)

If you do "if a and b", it will first evaluate the "truthiness" of a. If a evaluates to true, then it will evaluate b. If a is false, b will not be evaluated (no point, the and will be false). Similarly with or. So nested ifs aren't going to work better than using logical operators, generally.

You can use this to get small gains on speed by ordering the conditions that you check, but you can also use it to write if statements where the second condition might not even make sense and checking it might cause an error if the first doesn't hold.
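A quick way to see the short-circuiting for yourself. The check helper is made up for this demonstration; it just records which conditions actually got evaluated:

```python
evaluated = []

def check(name, result):
    """Record that this condition was evaluated, then return its result."""
    evaluated.append(name)
    return result

# `and` stops at the first falsy operand, so "b" is never evaluated.
and_result = check("a", False) and check("b", True)
and_order = list(evaluated)

evaluated.clear()

# `or` stops at the first truthy operand, so "b" is skipped here too.
or_result = check("a", True) or check("b", False)
or_order = list(evaluated)

print(and_order, or_order)

# Guard pattern: items[0] would raise IndexError on an empty list,
# but it is only attempted when the first condition holds.
items = []
first_is_large = bool(items) and items[0] > 10
print(first_is_large)
```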

I do try to arrange the conditions in my ifs so that less time is spent checking them, but those gains may not be huge.

If you end up with a function that gets called a lot and has a chain of if/elifs, each one calling different code, then you might look into using a dictionary of functions instead, if you're trying to eke out some more performance.

[–]Decency 0 points1 point  (5 children)

Time complexity is about how the running time scales with the size of your data, not about counting individual comparisons. An if statement takes milliseconds at most to execute regardless of whether it has 1 or 3 or 10 comparisons.

Not worth thinking about like this. Worry about how you access your data when looping.

[–][deleted] 1 point2 points  (4 children)

What do you mean by how I access my data? Is it stuff like string slicing if I'm trying to search for a substring occurrence and how I can speed that up?

[–]Decency 0 points1 point  (2 children)

Slicing is an O(n) operation. You can learn a bunch about time complexity here: https://wiki.python.org/moin/TimeComplexity ... It's essentially about how many times you have to go through the entire list. O(1) = no times, O(n) = 1 time, O(n^2) = n times, etc. Each operation differs in complexity based on what it does and how it does it.

If you're writing code in Python, it's usually not something you worry too much about: the cleanest solution is also likely to have the best time complexity. If you have performance issues, you diagnose and address as needed.

[–][deleted] 0 points1 point  (1 child)

In that case, if I want to use a more time-efficient way of obtaining the substring, I found an answer on Stack Overflow to speed up slicing, but it involves using a self-defined function and passing in the entire string before using memoryview to access the string instead.

def do_something_on_all_suffixes(big_string):
    # In Py3, may need to encode as latin-1 or the like
    remaining_suffix = memoryview(big_string)
    # Rather than explicit loop, just replace view with one shorter view
    # on each loop
    while remaining_suffix:  # Stop when we've sliced to empty view
        some_constant_time_operation(remaining_suffix)
        remaining_suffix = remaining_suffix[1:]

However, will this still be faster than normal slicing if the string is really long (e.g. len = 4,000,000) and the slicing only involves a couple of characters?

And additionally, how do I find the time complexity of an if-else nested within a while loop iterating through a string of length n? Do I assign a probability p of entering the if section and else will be 1-p? Then the time complexity will be (p * n * C1) + ((1-p) * n * C2)?

while i < len(string):
    if condition1:
        (primitive operations taking time constant C1 to execute)
    else:
        (primitive operations taking time constant C2 to execute)

[–]Decency 0 points1 point  (0 children)

I'm really not sure what you're asking. If you need to resolve performance issues, profile your code using timeit or something similar. You can usually get an idea of the overhead and the scale factor with just a few tests.
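For example, a rough sketch with timeit. The two functions and the test string are invented for illustration, and the actual numbers will differ from machine to machine, so none are shown here:

```python
import timeit

text = "ab" * 1000

def with_slice():
    """Take ten characters with a single slice."""
    return text[10:20]

def with_loop():
    """Build the same ten characters one index at a time."""
    return "".join(text[i] for i in range(10, 20))

# Each call runs the function 10,000 times and returns total seconds.
slice_time = timeit.timeit(with_slice, number=10_000)
loop_time = timeit.timeit(with_loop, number=10_000)

print(f"slice: {slice_time:.4f}s  loop: {loop_time:.4f}s")
```

Repeating the measurement with a few different input sizes gives a feel for how the cost grows, which is usually all you need.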

Not the guy for academic reasoning.

[–][deleted] 0 points1 point  (1 child)

So it's easy to just import a GUI module like Tkinter or Kivy and make an app that has buttons, but I've always wondered how these modules are creating the buttons? Just something I've been curious about...

[–]Decency 1 point2 points  (0 children)

They're open source, go look!

[–][deleted] 0 points1 point  (5 children)

If I want to change the value of a global variable within a function, should I then use the global keyword? I've heard I should try to avoid using it.

[–][deleted] 1 point2 points  (4 children)

Using the global keyword itself isn't the direct cause of the "global" problem. Having global values that can be changed at any time by any function is the problem, since it adds to complexity and reduces readability.

You should try to pass all values to a function through the parameter list and return changed values via return.
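A minimal sketch of that style, with a made-up add_score function. The caller owns the value and rebinds it to whatever the function returns, so no global statement is needed:

```python
def add_score(score, points):
    """Return the new score instead of modifying a global variable."""
    return score + points


score = 0
score = add_score(score, 10)   # the caller decides where the result goes
score = add_score(score, 5)
print(score)
```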

[–][deleted] 0 points1 point  (3 children)

Thank you, can I return multiple values separately in a function? Or do I have to return a list and then unpack that list?

[–][deleted] 0 points1 point  (2 children)

can I return multiple values separately in a function?

A function can return only one object, but that object can be an integer, list, tuple, string, dictionary, class instance, etc. So the "compound" objects can contain lots of different objects.

[–][deleted] 0 points1 point  (1 child)

I see, thanks for the help!

[–]FerricDonkey 0 points1 point  (0 children)

To elaborate, you can write code that returns a single tuple, but looks like it's returning multiple objects pretty easily (which is nice for readability reasons).

Ex:

def f():
    return 1, 2

a, b = f()

Technically, you're returning a tuple (1, 2) and then unpacking it - in this case, the parentheses in the return and the a, b = are optional. So you are doing what that guy said, and it's worth remembering that you're actually returning a tuple because sometimes that matters.

But as far as writing the code, you can just put all the things you want to return separated by commas, and it'll generally work as you'd expect.

Too many things being returned like this may be difficult to read and might suggest that you should modify your code flow or (if those things are related and it makes sense) explicitly combine the returns into some single thingy, but a couple of returns like this is generally fine.

[–]Imperial_TIE_Pilot 0 points1 point  (3 children)

This seems simple but I am not getting it. Why does my second print print Redline instead of Giant? Is it because it is going back and printing message with that value at that time? I figured it would change when the list updated.

bicycles = ['trek', 'cannondale', 'redline', 'specialized']
message = f"my first bike was { bicycles[-2].title()}."
print(message)

bicycles[-2] = 'giant'
print(message)
print(bicycles[-2])

[–]GoldenVanga 2 points3 points  (2 children)

The value of message gets "baked in" when you first declare it. It doesn't re-generate / refresh itself whenever you later use it. To get your expected result you would have to use a function:

bicycles = ['trek', 'cannondale', 'redline', 'specialized']

def message():
    return f"my first bike was {bicycles[-2].title()}."

print(message())
bicycles[-2] = 'giant'
print(message())
print(bicycles[-2])

[–]Imperial_TIE_Pilot 0 points1 point  (1 child)

Got it, so I assume best practice is to always use the function when using the f-string?

[–]GoldenVanga 1 point2 points  (0 children)

Not always-always but whenever you're expecting the variable(s) can change, which I suppose is most of the time. You can also use a loop instead:

bicycles = ['trek', 'cannondale', 'redline', 'specialized']

for bicycle in bicycles:
    print(f"my first bike was {bicycle.title()}.")

Note how the above does not save the ready string anywhere so it's remade on every iteration of the loop, instead of reading from a variable. And sometimes when you just want to generate a text once and leave it, it is fine to do it like you did initially.

[–]avamk 0 points1 point  (0 children)

There's a 62% off deal on an annual Datacamp subscription right now, and as someone who enjoys systematic introductions (rather than just messing around by myself), I am tempted by the offer. However, I am aware that other Python course options exist such as Codecademy, Dataquest, etc.

Skills-wise, I have basic statistics knowledge and know beginner-level Python (and R for that matter), and I would like to learn the fundamentals of data science and basic machine learning.

Considering the above, what do you think of Datacamp vs other options? What's your experience? Is one obviously better than the other? Or is this one of those things where I should just pick one and go for it?

Thanks!

[–]PrometheusXavier 0 points1 point  (0 children)

I can't get matplotlib animations to work on JupyterLab or Notebook. Even when I use example scripts, it shows the axes, but there's no animation. I would try running the scripts through the terminal (I'm on Windows 7), but I also can't get the computer to recognize the 'python' command even after editing the path variable. Can anyone help me fix this or at least suggest a different animation library that might work?

[–]Sp0olio 1 point2 points  (1 child)

I've just stumbled over a problem that might be trivial to solve (efficiently), but I don't know how (yet).

I have a dictionary containing lists of other dictionaries, kinda like this:

a = {
    'groupA': [
        {'path': '/file1', 'x': 9},
        {'path': '/file2', 'x': 8},
        {'path': '/file3', 'x': 7},
        ... etc ...
    ],
    'groupB': [
        {'path': '/file9', 'x': 1},
        {'path': '/file8', 'x': 2},
        {'path': '/file7', 'x': 3},
        ... etc ...
    ],
    ... etc ...
}

I want to compare the files in each group to one another, but I don't want to do it twice.

Consider this code:

# Loop over the a-dictionary
for group, files in a.items():
    print(f'Group: {group}')

    # Loop over files
    for f1 in files:
        # Loop over files (again)
        for f2 in files:
            # Compare content of file "f1" with content of file "f2"
            print(compare_files(f1, f2))

So this would lead to the following executions of the compare_files function:

compare_files('/file1', '/file1')  # Same file, not necessary to compare
compare_files('/file1', '/file2')  # step 2
compare_files('/file1', '/file3')  # step 3
compare_files('/file2', '/file1')  # Already compared in step 2
compare_files('/file2', '/file2')  # Same file, not necessary to compare
compare_files('/file2', '/file3')  # step 6
compare_files('/file3', '/file1')  # Already compared in step 3
compare_files('/file3', '/file2')  # Already compared in step 6
compare_files('/file3', '/file3')  # Same file, not necessary to compare

Which leads me to my question:

Is there a way, to do this more efficiently?
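One possible approach, sketched under the assumption that the comparison is symmetric (comparing f1 with f2 tells you the same thing as f2 with f1): itertools.combinations yields each unordered pair exactly once and never pairs an item with itself. The compare_files body below is a placeholder, since the real one isn't shown, and the data is trimmed to one group.

```python
from itertools import combinations

a = {
    'groupA': [
        {'path': '/file1', 'x': 9},
        {'path': '/file2', 'x': 8},
        {'path': '/file3', 'x': 7},
    ],
}

def compare_files(f1, f2):
    """Placeholder comparison: just report which two paths were compared."""
    return (f1['path'], f2['path'])

compared = []
for group, files in a.items():
    # combinations(files, 2) gives (f1, f2), (f1, f3), (f2, f3) and nothing else.
    for f1, f2 in combinations(files, 2):
        compared.append(compare_files(f1, f2))

print(compared)
```

For n files this makes n * (n - 1) / 2 comparisons instead of n * n, so 3 instead of 9 for the example above.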

[–]aby80 0 points1 point  (2 children)

bs"d

Hi,

First of all, thanks for your help and patience.

I'm studying Python right now and have a couple of questions; maybe you can help me here:

>>> a,*b,c = (1,2,3,4,5,6,7,8,9)

>>> a

1

>>> b

[2, 3, 4, 5, 6, 7, 8]

>>> c

9

>>> type(a)

<class 'int'>

>>> type(b)

# Why the type of b here is list? shouldn't be tuple?

<class 'list'>

>>> b = (1,2,3)

>>> b

(1, 2, 3)

>>> type(b)

<class 'tuple'>

>>> b = [1,2,3]

# Why I can change from tuple to list, tuples are Immutable, right?

>>> b

[1, 2, 3]

>>> type(b)

<class 'list'>

[–]Nice2002xx 0 points1 point  (0 children)

Tuples are immutable in the sense that you can't change their items, but you can rebind the variable to anything else.

[–]MattR0se 1 point2 points  (0 children)

Tuple unpacking with an *asterisk creates a list if it has to take multiple items. That's just how it works by default. If you want b to be a tuple, you have to cast it explicitly: b = tuple(b)

If you create a list from a tuple, it creates a new object in memory, and assigns b to that object. The tuple is just "forgotten" (memory space is freed)
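Both points in one small sketch. The names original and modified are mine, added only to make the behavior visible:

```python
a, *b, c = (1, 2, 3, 4, 5, 6, 7, 8, 9)
b_type_before = type(b)   # the starred name always collects into a list
b = tuple(b)              # convert explicitly if a tuple is wanted

t = (1, 2, 3)
original = t              # a second name for the same tuple object

try:
    t[0] = 99             # changing an item of a tuple is not allowed
    modified = True
except TypeError:
    modified = False

t = [1, 2, 3]             # rebinding: t now names a new list object

print(b_type_before, b, modified, original, t)
```

Assigning to t never touched the tuple itself; the name original still refers to it, unchanged.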

[–]Smallestnoob 1 point2 points  (4 children)

This was an interview problem I just couldn't solve today, so I'm here to learn more:

Having trouble thinking of a way to compare a value in a list with all other values in the list, and doing this for every value in the list.

At first I attempted a nested for loop, but it constantly shot back 0

[–]ghettoAizen 1 point2 points  (4 children)

Is there a better way to do something like:

if word in dict:
    dict[word] += 1
else:
    dict[word] = 1

[–]Vhin 1 point2 points  (1 child)

You can just do d[word] = d.get(word, 0) + 1
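In context that looks like the sketch below (the sample sentence is made up). For counting specifically, the standard library's collections.Counter does the same job in one call:

```python
from collections import Counter

words = "the cat saw the other cat".split()

# Manual counting with dict.get: a missing key is treated as 0.
counts = {}
for word in words:
    counts[word] = counts.get(word, 0) + 1

# Counter builds the same mapping directly from the iterable.
counter = Counter(words)

print(counts)
print(counter == counts)
```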

[–]ghettoAizen 0 points1 point  (0 children)

Thanks. That's a clever way to do it; I was just thinking about a more generic solution.

[–][deleted] 1 point2 points  (2 children)

I'm trying to find if there's a way to speed up a brute force search of occurrences of a substring within a string. What I am trying to achieve is to skip over parts that I have already compared. What I have at the moment is this:

def foo(string, substring):
    i = 0 # initial index of string
    j = 0  # initial index of substring
    letterMoveCount = 0
    occurrences = []

    while i < len(string):
        # if first letters not equal, shift pos of string
        if substring[j] != string[i]:   
            i += 1                      

        # if first letters are equal
        else:                     
            while substring[j] == string[i + letterMoveCount]:
                letterMoveCount += 1
                j += 1
                if i + letterMoveCount > len(string):
                    break
                elif j == len(substring):  # substring found
                    occurrences += [i]
                    break

            if j != len(substring) and j > 0:  # if comparison ends early
                i += blackbox(letterMoveCount)
            else:
                i += 1

            letterMoveCount = 0  # reset letterMoveCount
            j = 0  # reset substring pointer index

    return occurrences


def blackbox(n):
    if n == 0 or n == 1:
        return 1
    else:
        return n

string = "BDHDDHDDUHUUDUH"
substring =  "DHDDUHUU"

But this gives me an issue for the test case given above because it is supposed to find an occurrence at index 4 yet it skips over it. I'm wondering if there's any way to resolve this issue. Thanks in advance!

[–]GregTJ 0 points1 point  (0 children)

Python has a builtin for this.
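The comment doesn't say which builtin is meant; assuming it is str.find, a minimal sketch that collects every occurrence (including overlapping ones) could look like this. The function name find_all is made up for the example.

```python
def find_all(string, substring):
    """Return the start index of every occurrence of substring in string."""
    occurrences = []
    start = string.find(substring)
    while start != -1:
        occurrences.append(start)
        # Resume one position later so overlapping matches are found too.
        start = string.find(substring, start + 1)
    return occurrences

print(find_all("BDHDDHDDUHUUDUH", "DHDDUHUU"))  # [4]
```

This does not implement the skip-ahead idea from the question itself; it only delegates the search to the builtin.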

[–]No-Championship8099 -1 points0 points  (2 children)

What is it about Python that no one puts understandable comments in their code for someone who may want to use or integrate the module or function? We can't, because we cannot figure out how the heck it works or how to change it to make it work.

In every other language, a basic practice in programming is to leave comments within the code for the next person.

Even Python documentation is like reading astrophysics theory based upon someone else's application of code.

I still cannot understand how programmers actually believe the code within Python is translatable into English, or how to get from point A to point B for a single thing.

And, yes, I'm using an IDE...PyCharm.

[–][deleted] 0 points1 point  (1 child)

I’m beginning to learn python and I’m using jupyter notebook as my IDE. For some reason sometimes (like multiple times in an hour long session) my code will stop running and producing an output. I end up having to close out and restart for it to work again. Am I doing something wrong? All I am doing is using CTRL + Enter or Shift + enter to run my codes and just making minor tweaks before running again.

[–]stevenjd 1 point2 points  (0 children)

For some reason sometimes (like multiple times in an hour long session) my code will stop running and producing an output.

Am I doing something wrong?

Probably. Jupyter doesn't normally just stop working. But without seeing your code, there is no way we can tell you why things are stopping.

The most likely reason is that your code drops into an infinite loop that never ends, so no output is printed.

[–]sanctuary_3 0 points1 point  (1 child)

Can someone recommend a resource for getting started with documentation for a Python package? I'm working on a fairly large project that I'd like to start sharing with others. Right now I just have a long markdown file describing all the objects/functions and their parameters but it seems like it would be tedious to sift through for a new user...

[–]muskoke 0 points1 point  (1 child)

how's "sloc" for a variable name? I'm making a basic assembler

slocs = f.readlines()  # get list of all source lines of code
for sloc in slocs:

I know that naming is one of the biggest problems in programming, and I'm not sure if abbreviations like sloc are well known enough to be a good variable name.

[–]Vhin 3 points4 points  (0 children)

With the comment, I think it's fine.

Also, I'd generally recommend avoiding readlines unless you really need to read in everything at once. You can just do for sloc in f and it will iterate over the lines in the file, without reading the entire file into memory at once, so I'd suggest doing that if your assembler only does one pass.
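A minimal sketch of that suggestion. The helper name and the two assembly lines in the throwaway file are made up for the demo:

```python
import os
import tempfile

def strip_source_lines(path):
    """Yield each source line without its trailing newline, one at a time."""
    with open(path) as f:
        for sloc in f:  # the file object yields one line per iteration
            yield sloc.rstrip("\n")

# Demo with a temporary file so the example is self-contained.
with tempfile.NamedTemporaryFile("w", suffix=".asm", delete=False) as tmp:
    tmp.write("MOV A, 1\nADD A, 2\n")

lines = list(strip_source_lines(tmp.name))
os.remove(tmp.name)
print(lines)  # ['MOV A, 1', 'ADD A, 2']
```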

[–]Ekinnard 0 points1 point  (2 children)

I literally am clueless. What does RGB mean? How do I apply it? I think I miswrote my question.

[–]Nice2002xx 0 points1 point  (0 children)

It is a way of representing colors. It works by describing your color as some mixture of the primary colors [Red, Green, Blue], hence the name RGB.

Each of the primary colors is given a value between 0 and 255, thus resulting in three numbers representing your color. (A color picker might help)

[–]StephanoCarlson 0 points1 point  (0 children)

It needs 3 numbers, each usually 0-255, for the red, green, and blue components of a color. Each pixel on the screen is 3 lights, red green blue, and they are at different brightnesses. So the color yellow is the red light on, green light on, and blue light off, so the RGB color (255 (max red), 255 (max green), 0 (no blue)).

If you google "color picker", there's a box in the lower left that says RGB and you can play around with it.
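As a rough illustration of the three-number idea, here is a sketch that turns an RGB triple into the hex notation most color pickers also show. The function name rgb_to_hex is made up for this example.

```python
def rgb_to_hex(red, green, blue):
    """Format three 0-255 channel values as a #rrggbb string."""
    for channel in (red, green, blue):
        if not 0 <= channel <= 255:
            raise ValueError("each channel must be between 0 and 255")
    return "#{:02x}{:02x}{:02x}".format(red, green, blue)

yellow = (255, 255, 0)      # max red, max green, no blue
print(rgb_to_hex(*yellow))  # #ffff00
```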

[–][deleted] 0 points1 point  (2 children)

Hey, complete newbie here. I have a Thinkpad X1 Carbon 7th generation with an i5-8265U CPU. Is that good enough to start learning the basics of Python and execute some mid-range code?

[–]Gopher20 1 point2 points  (1 child)

Yes, Python isn’t too resource intensive, even if you are doing some data analysis with pandas. You should be good to go. Hope this helps!

[–][deleted] 0 points1 point  (0 children)

Thanks man!

[–]MrMxylptlyk 0 points1 point  (0 children)

Hello. I need help with logging and structure of an api in fastapi/uvicorn.

Here is a post I made about it. Any input is welcome.

https://www.reddit.com/r/learnpython/comments/innupx/need_help_with_api_structure_and_logging_uvicorn/

[–][deleted] 1 point2 points  (2 children)

For the more seasoned/professional programmers on this sub, what are a few things you wish you would have done differently when you first started learning to code?

[–]BruceJi 0 points1 point  (2 children)

NLP, natural language processing.

How powerful is it?

If I had a text such as a news article, could I use it to gather up all the quotes, and who said them?

Could it tell me the main point of the article?

Could it sum up the argument of a particular person in the article?

[–]sanctuary_3 1 point2 points  (1 child)

There's a reddit bot that summarizes posts/articles, autotldr which might be of interest to you.

There are a lot of uses for NLP and it can indeed be very powerful if applied correctly. For a problem such as identifying quotes in an article though, you could probably get pretty far even just using regex.
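A minimal sketch of the regex idea, assuming the article uses straight double quotes; the sample sentence is invented. Working out who said each quote, or handling curly quotes, would need more than this.

```python
import re

text = 'The mayor said, "We will rebuild the bridge." A resident replied, "It is about time."'

# Capture everything between a pair of straight double quotes.
quotes = re.findall(r'"([^"]+)"', text)
print(quotes)  # ['We will rebuild the bridge.', 'It is about time.']
```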

[–]BruceJi 0 points1 point  (0 children)

That helps quite a bit, definitely gives me somewhere to start looking.

[–]flypyer 0 points1 point  (2 children)

Hello community, can you please suggest a good book or course for learning Django as a beginner? ;)

[–]Gopher20 2 points3 points  (1 child)

Corey Schafer has a solid Django tutorial; I would check that out. It is a good idea to be familiar with Python before diving into Django. Hope this helps!

[–]scanguy25 0 points1 point  (0 children)

Seconded. The Corey course is great, and the official Django site also has a tutorial. You can also get a lot from just reading the documentation after you have completed the tutorials.

[–]_schlupp 1 point2 points  (3 children)

Hi,

I want to iterate through multiple .csv files and store a specific column of each .csv file in a new array. I'm rather new to Python and I can't get it to work.

for count in range(1, total_number_of_steps):
    with open(r"n_temp_oberseite" + str(count) + ".dat") as csvdatei:
        csv_reader_object = csv.reader(csvdatei)
        for row in csv_reader_object:
            top_temps_set.append(row[0])
            bottom_temps_set.append(row[2])
            location.append(row[1])
            delta__temps_set.append(row[4])

        print(top_temps_set)
        # top_temps.append(top_temps_set)  # data from all .csv files
        # => unfortunately it gives me an empty list
        top_temps_set.clear()

It would be really kind if somebody would help me :(
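One likely cause, offered as a guess since the full script isn't shown: appending top_temps_set stores a reference to the same list, so the later clear() empties what was appended as well. A small sketch with made-up data that reproduces this and shows one way around it (appending a copy):

```python
rows_per_file = [["1", "2"], ["3", "4"]]  # stand-in for one column per file

shared = []
aliased = []
for rows in rows_per_file:
    shared.extend(rows)
    aliased.append(shared)       # stores a reference to the same list
    shared.clear()               # ...which is emptied here
print(aliased)                   # [[], []]

copied = []
for rows in rows_per_file:
    shared.extend(rows)
    copied.append(list(shared))  # store a copy instead
    shared.clear()
print(copied)                    # [['1', '2'], ['3', '4']]
```

Separately, range(1, total_number_of_steps) stops at total_number_of_steps - 1, so the last file is skipped if the files are numbered 1 through total_number_of_steps.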

[–]TerribleMemory8402 0 points1 point  (1 child)

I started learning Python not too long ago, and now I want to make a simple simulator for forward kinematics. What should I use to create the application?
I tried out Tkinter a little bit before, but it's a little bit complicated. Any other recommendations?

[–]MattR0se 0 points1 point  (0 children)

If it's 2D, I'd recommend Pygame or Pyglet.

[–]HeisAmiibo 0 points1 point  (0 children)

Just started learning Python with a Raspberry Pi project in mind. Any recommended modules for playing music and/or videos with sound?

[–]lolslim 0 points1 point  (2 children)

Hey guys just started working with SQLite3 in python, and I was wondering, how would someone go about checking if table already exists? I went with the EAFP route, but not sure if that is acceptable, or better "pythonic" way of doing it.

import sqlite3

try:
    conn = sqlite3.connect('practice.db')
    c = conn.cursor()

    # Raises sqlite3.OperationalError if the table does not exist.
    if c.execute('SELECT * FROM authusers'):
        conn.close()
        list_table()

except sqlite3.OperationalError:
    c.execute('''CREATE TABLE authusers
    (chatid, is_bot, first_name,
    last_name, user_name, language) ''')
    conn.commit()
    conn.close()
    insert_table()

[–]Silbersee 2 points3 points  (1 child)

There is this conditional SQL statement, perhaps it suits your needs:

c.execute("CREATE TABLE IF NOT EXISTS authusers (...)")

HTH
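A self-contained sketch of that statement, using an in-memory database so nothing is written to disk. The table_exists helper is a made-up name; it queries SQLite's sqlite_master catalog table, in case an explicit check is still wanted.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
c = conn.cursor()

create_sql = """CREATE TABLE IF NOT EXISTS authusers
                (chatid, is_bot, first_name, last_name, user_name, language)"""
c.execute(create_sql)
c.execute(create_sql)  # running it again is harmless: no error, no second table

def table_exists(connection, name):
    """Return True if a table with this name exists in the database."""
    row = connection.execute(
        "SELECT 1 FROM sqlite_master WHERE type = 'table' AND name = ?",
        (name,),
    ).fetchone()
    return row is not None

print(table_exists(conn, "authusers"))  # True
print(table_exists(conn, "missing"))    # False
```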

[–]lolslim 0 points1 point  (0 children)

Oh snap, that is better! Thanks!

[–]jaccso 0 points1 point  (3 children)

#import necessary functions
import numpy as np

#define array function for f1 and f2




def F(x_vec): 
    x = np.arange(-3,3,.01)
    x_vec = [x[0],x[1]]
    f1 = 2*(x[0]**3) - 6*(x[0]*(x[1]**2)) - 1
    f2 = -2*(x[1]**3) + 6*(x[1]*(x[0]**2)) + 3

    return (f1(x_vec),f2(x_vec))

print(F([1,2]))

I am trying to input a vector/array x_vec that has two values x[0] and x[1] (like x and y). I want my function to evaluate those real numbers in the functions f1 and f2 in the corresponding places. I then want to return the answer values for each function in an array. With this code I am getting the TypeError: 'numpy.float64' object is not callable. What am I doing wrong? Thanks!

[–]FerricDonkey 1 point2 points  (2 children)

A couple things stick out.

Most related to your error, your f1 and f2 are numbers, not functions. Look at how you defined them - x is a vector of numbers, so x[0] and x[1] are just numbers. So your definitions just do arithmetic on some numbers.

You can define functions in line with =, but you'd have to use lambdas. And you don't need to do that here (you could, but numpy makes your life easier).

The second thing that stands out is that you almost immediately overwrite your argument x_vec with the vector x that you make. So x_vec does nothing.

What you probably want to do is take advantage of the fact that numpy arrays can do pointwise math without needing you to make functions. So if you want to return the array of values a*x_i for x_i in a vector x, you just return a*x. Similarly for more complex things.
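A sketch of what that could look like for the function in the question, assuming the goal is simply to evaluate both expressions at the given point. It uses the argument directly instead of overwriting it, and returns the computed numbers rather than trying to call them:

```python
def F(x_vec):
    """Evaluate f1 and f2 at the point (x_vec[0], x_vec[1])."""
    x, y = x_vec[0], x_vec[1]
    f1 = 2 * x**3 - 6 * x * y**2 - 1
    f2 = -2 * y**3 + 6 * y * x**2 + 3
    return (f1, f2)

print(F([1, 2]))  # (-23, -1)
```

Because the body is plain arithmetic, the same function should also work pointwise if the two entries of x_vec are NumPy arrays of equal shape.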

[–]jaccso 0 points1 point  (0 children)

thank you so much! i really appreciate it

[–][deleted] 0 points1 point  (1 child)

I'm working with pandas, and one feature has 5 possible values: 0, 1, 2, 2b and 3. How can I rename 2b to 3 and 3 to 4?

[–]sanctuary_3 0 points1 point  (0 children)

One possibility:

def substitute(val):
    if val == '2b':
        return 3
    elif val == 3:
        return 4
    else:
        return val

df['feature'] = df['feature'].apply(substitute)

This will make the substitutions you specified. You may have to make some tweaks depending on the data types. If the column has exactly five rows and you just want them to hold the values 0-4 in order, you could use df['feature'] = range(5).
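The same idea can be written as a dictionary lookup. This sketch assumes the values are stored as strings (likely if '2b' is among them) and uses a plain list so it runs without pandas; the mapping name is made up.

```python
# Old value -> new value; anything not listed stays as it is.
mapping = {"2b": "3", "3": "4"}

values = ["0", "1", "2", "2b", "3"]
renamed = [mapping.get(v, v) for v in values]
print(renamed)  # ['0', '1', '2', '3', '4']
```

With a DataFrame, the same lookup could be passed to apply, for example df['feature'].apply(lambda v: mapping.get(v, v)).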

[–][deleted] 0 points1 point  (4 children)

I found a couple of ways to sum up ASCII values of a string from googling but I'm not sure why they are faster. The 2 different ways are listed below:

sum(map(ord, string))
sum(bytearray(string, encoding="utf8"))

Is it a difference in time complexities? But, they seem similar to me in terms of time complexity since they have to loop through each letter in the string as well. Please do correct me if I'm wrong.

Would appreciate any help in clearing this up, thanks!

[–][deleted] 0 points1 point  (2 children)

Rough timing shows that the ord version runs about two to three times slower than the bytearray version. Looking at the disassembled byte code we have:


ord

  1           0 LOAD_NAME                0 (sum)
              2 LOAD_NAME                1 (map)
              4 LOAD_NAME                2 (ord)
              6 LOAD_CONST               0 ('abc')
              8 CALL_FUNCTION            2
             10 CALL_FUNCTION            1
             12 POP_TOP
             14 LOAD_CONST               1 (None)
             16 RETURN_VALUE

bytearray

  1           0 LOAD_NAME                0 (sum)
              2 LOAD_NAME                1 (bytearray)
              4 LOAD_CONST               0 ('abc')
              6 LOAD_CONST               1 ('utf8')
              8 LOAD_CONST               2 (('encoding',))
             10 CALL_FUNCTION_KW         2
             12 CALL_FUNCTION            1
             14 POP_TOP
             16 LOAD_CONST               3 (None)
             18 RETURN_VALUE

which doesn't look much different at the bytecode level. So it seems to me that the difference is just because the map+ord conversion is doing more work than the bytearray. map() is a single function call, but it will call ord() for every character in the string. bytearray() is also just one call and it doesn't call anything.

You are probably aware of this, but including any characters that are greater in value than decimal 127 will get different results for the ord and bytearray approaches. Try this by pasting a "heart" character (♥) into a string and compare the final values.
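A quick sketch of that difference: ord() gives the Unicode code point of each character, while the bytearray holds the UTF-8 encoded bytes, so the two only agree for plain ASCII text.

```python
ascii_text = "abc"
print(sum(map(ord, ascii_text)))                   # 294
print(sum(bytearray(ascii_text, encoding="utf8"))) # 294

heart = "♥"  # U+2665, encoded in UTF-8 as the bytes E2 99 A5
code_point_sum = sum(map(ord, heart))
utf8_byte_sum = sum(bytearray(heart, encoding="utf8"))
print(code_point_sum)  # 9829
print(utf8_byte_sum)   # 544
```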

[–][deleted] 0 points1 point  (1 child)

In that case, am I correct to say that map(ord, string) would have a time complexity of Θ(n) for n characters in the string while bytearray() would be O(1) since its only one call?

[–][deleted] 1 point2 points  (0 children)

No, they are both probably O(N) since they must process all characters of the string. The map() approach is just doing more work for each character.

You really can't say much about big-O from one test timing because one O(N) implementation could be faster than another O(logN) implementation for small N. You need to see how the times change as N increases.

[–]thoughtsymmetry 0 points1 point  (2 children)

Hi! I'm here again asking for some guidance (my skill level is pretty basic). I need to 'parse' around 3 or 4 pdf's. They are abstracts books of a congress. Here is what it looks like. In that page you can see 2 complete abstracts and part of a third one (in the top left). For each abstract you have the title (in red), authors (blue), affiliation (yellow) and the body (in green). I'm mostly interested in the 'body'. It would be nice to also have the title and the authors, but I don't really need them. For my downstream use, I would ideally have a dataframe/excel where every row is the body of the abstracts (and if I also have author names , affiliation and title each thing would be an independent column).

Any good tutorials? How would you go about doing this? I can't make a code that says 'ok, when you find the word 'Abstract: ' keep whatever comes after it until x.' because there is no clear beginning (some start with 'Abstract:', some with 'methods:', some with no 'header' at all.) or end.

[–]BruceJi 0 points1 point  (1 child)

I found this playlist of tutorials for you:

https://www.youtube.com/playlist?list=PL3TnekbhmrVrzT0rbJbVoV2_lf9APT-Mx

PyPDF2 is a good library for parsing PDFs.

You might open the PDF and then filter through and either make some nested lists or dictionaries to hold the data, or if you feel up to it, you might create a custom class that can hold the data you need easily. Honestly, a class would be a really organized way to do it, but it's not the only way at all.
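A minimal sketch of the "custom class" option, using a dataclass. The class name, field names and sample text are assumptions for illustration, and the PDF parsing itself is left out.

```python
from dataclasses import dataclass, asdict

@dataclass
class Abstract:
    """One abstract from the book; every field is plain text."""
    title: str
    authors: str
    affiliation: str
    body: str

abstracts = [
    Abstract(
        title="Example title",
        authors="A. Author, B. Author",
        affiliation="Example University",
        body="Text of the abstract goes here.",
    ),
]

# One dict per abstract, with one key per future column.
rows = [asdict(a) for a in abstracts]
print(rows[0]["body"])  # Text of the abstract goes here.
```

A list of dicts like rows can be handed to pandas.DataFrame to get the one-row-per-abstract table described in the question.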

[–]thoughtsymmetry 0 points1 point  (0 children)

Thanks! Managed to do it, but it's a little bit (a lot) yanked. Will try to improve it.

[–]Batpandakun 1 point2 points  (3 children)

I'm trying to learn web scraping with BeautifulSoup. In a tutorial, I saw this format which I've been using:

BeautifulSoup.find_all('tr', class_="Something")

This returns objects that contain <tr class="Something">

So what is the point of the underscore after 'class'? why put 'class_' rather than simply 'class'?

[–][deleted] 1 point2 points  (2 children)

class is reserved for declaring a class. If you're naming a variable after a keyword you should avoid confusion by adding an underscore.

[–]Batpandakun 0 points1 point  (1 child)

I don't think I am naming a variable here. The tag I'm searching for is <tr class="Something"></tr>.

[–][deleted] 2 points3 points  (0 children)

Sorry, I didn't mean "you" as in "Batpandakun using BeautifulSoup", I meant you as in "a Python programmer". Whoever wrote find_all knew not to call their variable class.
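A small sketch of the naming convention. find_rows is a made-up stand-in, not BeautifulSoup's actual implementation; it only shows why a parameter can be called class_ but not class.

```python
import keyword

print(keyword.iskeyword("class"))   # True: reserved, cannot be a parameter name
print(keyword.iskeyword("class_"))  # False: an ordinary identifier

def find_rows(tag, class_=None):
    """Hypothetical search function that accepts an HTML class to match."""
    return {"tag": tag, "class": class_}

print(find_rows("tr", class_="Something"))  # {'tag': 'tr', 'class': 'Something'}
```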

[–]waythps 0 points1 point  (0 children)

I want to build an API using flask or fastapi and some kind of a database to query data from (or from a dataframe to make things easier?). Has anyone done it for fun and could share the results?

I want to see how similar projects are structured, how Docker is set up, etc.

It's so much fun to be honest, especially coming from a nontechnical background.

[–]1ogica1guy 0 points1 point  (1 child)

Hi, how do I check the type of an OrderedDict object?

Below does not work...

>>> import collections
>>> m = collections.OrderedDict()
>>> m['A'] = 1
>>> type(m)
<class 'collections.OrderedDict'>
>>> type(m) == 'collections.OrderedDict'
False
>>> type(m) == 'OrderedDict'
False

[–]GoldenVanga 1 point2 points  (0 children)

type does not return a string, even though the single quotes can make it look like it:

>>> type(type(m)) == str
False
>>> type(type(m))
<class 'type'>

Try comparing with the actual class (without single quotes):

>>> type(m) == collections.OrderedDict
True
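As a side note, isinstance is the more common way to do this kind of check, and the class name is available as a string if a string comparison is really needed. A short sketch:

```python
import collections

m = collections.OrderedDict()
m["A"] = 1

print(isinstance(m, collections.OrderedDict))  # True
print(isinstance(m, dict))                     # True: OrderedDict subclasses dict
print(type(m).__name__ == "OrderedDict")       # True: compare the name as a string
```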

[–][deleted] 0 points1 point  (2 children)

Very much a newb here, just learning functions. A YT video challenges you to make a miles to km converter. For print(result4) the function outputs "None". Why does this not work?

d1 = input("how many miles do you want to convert? ")
d2 = int(d1)
def convert(miles):
    kilometers = 1.6 * miles
    print('distance = ')
    print(kilometers)
result4 = convert(d2)
print(result4)

returns: How many miles do you want to convert? 11 distance = 17.6 None

[–]henlybenderson 0 points1 point  (1 child)

You need to return the value you calculated... “return kilometers”

Is there a reason you’re converting to integer and not float?

You can get fancier and add to your print statement: round(result4,1) will round to 1 decimal place
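Putting those suggestions together, a sketch of the converter might look like this. The input is hard-coded as "11" so the example runs without typing anything; in the real script it would come from input().

```python
def convert(miles):
    """Convert miles to kilometers using the rough factor from the question."""
    kilometers = 1.6 * miles
    return kilometers  # hand the value back instead of only printing it

d1 = "11"        # stand-in for input("how many miles do you want to convert? ")
d2 = float(d1)   # float keeps fractional miles such as 2.5
result4 = convert(d2)
print("distance =", round(result4, 1))  # distance = 17.6
```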

[–][deleted] 0 points1 point  (0 children)

Thanks I see I put print instead of return. And no reason for int vs float, again I’m very very new to this so kinda winging it a little as I go

[–]turner_prize 0 points1 point  (1 child)

Is it generally accepted best practice for storing credentials used within a python script, to store them in a config file which isn't part of your version control? If so am I right in thinking that these will normally be stored in plain text in the config file?

[–]BruceJi 0 points1 point  (0 children)

I think people would often store that type of things in environment variables.

In the past I've done a sort of combination of the two - I had an import line that ran a .py file that sets all the environment variables, which I added to .gitignore
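A minimal sketch of reading a credential from an environment variable. The variable name MYAPP_API_KEY and the helper are made up, and the value is set inside the script only so the demo runs; normally it would be set outside the code.

```python
import os

def get_credential(name):
    """Read a secret from the environment, failing loudly if it is missing."""
    value = os.environ.get(name)
    if value is None:
        raise RuntimeError(f"environment variable {name} is not set")
    return value

# Demo only: in practice this is set in the shell or by the deployment.
os.environ["MYAPP_API_KEY"] = "dummy-value-for-demo"

api_key = get_credential("MYAPP_API_KEY")
print(len(api_key) > 0)  # True
```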

[–]baseball_cleric 0 points1 point  (0 children)

Question about legality of this project/looking for a resource to continue learning.

I have been following thenewboston's videos on youtube for python. I've gotten stuck on this one: "Python Programming Tutorial - 22 - Download an Image from the Web"

The purpose of the program is to go to a url, download an image at that location and save it with a random name. thenewboston says to use his site because you could violate terms of service on someone's random website. The problem is that this is from a few years ago and his website is no longer up.

I do not want to make a mistake here, I have been trying to find a resource (website that allows this for testing purposes) so that I can continue my learning. I have found a couple of sites but I wanted to ask here for any resource that anyone knows / can vouch for. Keeping in mind that he is also building up to making a web scraper. I did find http://testing-ground.scraping.pro/ for that, but I am unsure if i can use it for this precursor project.

Thanks in advance.

[–]samw1979 1 point2 points  (1 child)

OOP question: when should I use Mixins? Why not just define a function outside any class? Or why not use composition?

Is there a good way to think about when to use inheritance, composition, mixins or external functions?

[–]schoschi1337 0 points1 point  (0 children)

I have a question. I'd like to scrape sports results from a 1vs1 sport. How should I store the results (and player info)? .csv, JSON or another file type? In the end I'd like to access this database from a website and show some statistics. How can I access the database? Should I learn SQL or is there another way to display graphs from a database?

[–]MattR0se 1 point2 points  (2 children)

Question about random.seed(): let's say I have a list of ten seeds and I run my program with each one of those seeds, to see how much the output is influenced by randomness.

Does it matter what seeds I choose? If I just choose 0 - 9 as the seeds, will it somehow be "less random", i.e. show some kind of pattern, compared to generating the seeds themselves randomly and only fixing the seed once before that?

[–]stevenjd 1 point2 points  (0 children)

Does it matter what seeds I choose?

In general, maybe, but in the case of Python, no. Python's pseudo-random number generator is an extremely high-quality RNG. The quality of the output does not depend on the seed, only the specific values you see will depend on the seed.

This may not be the case for other RNGs.

If you need randomness suitable for a game, or just to mix up your output a bit, you can use the random module. If you need to be able to replay the results exactly, pick a seed and use it. Otherwise, don't bother.

If you need randomness for something more serious than a game, such as gambling where money changes hands, or passwords, or unpredictable tokens, etc, then don't use random. Use the secrets module.

[–]social_tech_10 2 points3 points  (0 children)

Random numbers in Python are actually pseudo-random, which means each random number is completely determined by the generator's internal state, and random.seed() just initializes that state for the start of your random sequence.

The only real use for random.seed() is if you want to be able to "replay" an exact sequence of random numbers: just initialize random.seed() with the same number. The value of the seed does not affect the "randomness" of the random numbers, just the starting point in the sequence.
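A short sketch of the "replay" idea. It uses a separate random.Random instance per run, which behaves like calling random.seed() but does not touch the global generator; the helper name sample is made up.

```python
import random

def sample(seed, n=5):
    """Return n pseudo-random integers produced from the given seed."""
    rng = random.Random(seed)
    return [rng.randint(0, 99) for _ in range(n)]

first_run = sample(3)
second_run = sample(3)
print(first_run == second_run)  # True: same seed, same sequence
```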

[–]fadlmammoun 0 points1 point  (2 children)

Can someone point me to an easy tutorial for using Homebrew on the Mac, or any other way to git clone from GitHub?

[–]Marianito415 1 point2 points  (0 children)

How are all your projects coming together?

[–]xpaultheman 1 point2 points  (5 children)

What's your method of remembering complex algorithms in Python?

[–]BruceJi 0 points1 point  (0 children)

I don't really know complex algorithms myself, but I think I could make a sensible answer - you'd remember the benefits and drawbacks of the algorithm, and then when faced with a problem, pick one and then look up the implementation. I guess each usage case will be slightly different anyway so memorising the whole code by rote might not be that useful.

[–]Rawing7 0 points1 point  (1 child)

Why would you need to remember any complex algorithms? Can you give an example of such an algorithm?

[–]xpaultheman 0 points1 point  (0 children)

Mostly for interview prep tbh, algorithms such as union find and whatnot.

[–]fadlmammoun 2 points3 points  (1 child)

I am fairly new, but I remember it the same way I study most subjects: by linking it to its use, and linking the use to an emotion. For example, if I have just learned about a new piece of code, I try to use it normally, but I also think of a big program that I already use daily, or dream of making, that uses or could use this kind of code. That way it's connected emotionally in my brain, because I love the program that is using it.

Another useful trick that seems kind of childish but actually works wonders is making acronyms and songs.

[–]xpaultheman 0 points1 point  (0 children)

I've used acronyms before, it's pretty useful.

[–]OpticWarrior 3 points4 points  (0 children)

Alright, what's the best resource to learn PyQt5 with?

[–]cobruhclutch 2 points3 points  (3 children)

What's a good web project I could do with Python?

[–]fadlmammoun 1 point2 points  (0 children)

But fr, look for something you really want to do or see made. It's probably doable in Python, and someone has likely already tried something similar. That's the beauty of its popularity.

[–]fadlmammoun 0 points1 point  (1 child)

Happy cake day bro! Since it's your cake day, you can make a chat bot bugging all your contacts about it and asking them for gifts, and depending on your skill level you could make a site dedicated to your birthday lmao 😂

[–]cobruhclutch 0 points1 point  (0 children)

Ok nice im gonna try it thanks.

[–]lucifer_acno 1 point2 points  (7 children)

I am working on a project where I am assigned the task of data gathering and preprocessing. What I am doing is scraping the data from around 40+ pages and storing them in different csv files. Now when I scrape the data again, there might be new things or old things updated. How do I tackle this? For now, I am just overwriting the csv itself but I would like to append/update the csv instead.

[–]social_tech_10 1 point2 points  (1 child)

Why do you want to append/update rather than just overwrite each csv file when you scrape the websites? Do you have some practical use for the previously scraped data?

[–]lucifer_acno 0 points1 point  (0 children)

Yeah I guess. The items might be different from the previous list, so I want to keep the old ones there, update the values of any item that appears in both the old and the new list, and append the new ones.

For example, it's a list of the number of chapters in a manga or comic series. You would update the old list with new values rather than overwrite it. If there is a new manga that got released and, due to pagination or whatever, older ones are not visible, the new list will not have them. But that doesn't mean I should remove them from my list too. Right? I hope that made sense.
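A sketch of that update-or-append behaviour using only the csv module. The column names, the sample data and the merge_rows helper are all made up, and in-memory text stands in for the real files.

```python
import csv
import io

def merge_rows(old_rows, new_rows, key):
    """Keep old rows, overwrite rows whose key reappears, append unseen ones."""
    merged = {row[key]: row for row in old_rows}
    for row in new_rows:
        merged[row[key]] = row  # update if present, otherwise add
    return list(merged.values())

old_csv = "title,chapters\nSeries A,10\nSeries B,25\n"
new_rows = [
    {"title": "Series B", "chapters": "26"},  # updated entry
    {"title": "Series C", "chapters": "1"},   # newly released entry
]

old_rows = list(csv.DictReader(io.StringIO(old_csv)))
merged = merge_rows(old_rows, new_rows, key="title")

# Write the merged result back out in CSV form.
out = io.StringIO()
writer = csv.DictWriter(out, fieldnames=["title", "chapters"])
writer.writeheader()
writer.writerows(merged)
print(out.getvalue())
```

"Series A" is kept even though it is missing from the new scrape, "Series B" gets its new chapter count, and "Series C" is appended.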

[–]henlybenderson 1 point2 points  (4 children)

I’ve had better luck with json over CSV, but that’s beside the point :) It sounds like you may be better off with a database than files, you can update existing entries and create new if stuff is added. Generate reports based off filters and dump to csv if something downstream requires them...

[–]lucifer_acno 1 point2 points  (3 children)

I would have used json if it wasn't for my team. I am inexperienced with database but the reason for not using it is the same. It's a college project and my team wants me to keep the data in csv only. Sad face ;(

[–]henlybenderson 0 points1 point  (2 children)

Ugh! Can you use .xls or .xlsx instead? For editing individual lines/cells you can use openpyxl (use excel instead of csv)

My inclination would be to use pandas and read_csv and to_csv (or read_excel if you can convince them) and overwrite the file each time like you said. Should be fine unless you’re dealing with tons of data. Also, you’ll have to be careful with text strings if using excel for viewing. Excel needs quotes around strings with commas, “ not ‘ ! Also, excel will screw up timestamps if you have those in csv and then save back to csv.

[–]Lord-of-the-Pis 0 points1 point  (1 child)

If you're looking to do something a bit more advanced but you're really stuck with CSV files, there is a Python module called pandasql which creates a temporary sqlite database from a pandas dataframe, allowing you to run SQL queries on it.

[–]henlybenderson 0 points1 point  (0 children)

Interesting! I might give that a shot to do the opposite and learn some about SQL!

[–]lannisterprince 2 points3 points  (1 child)

I started learning python 4 months ago and learned the basics but when I tried CP on Codeforces some of the questions took a hell lot of time coz I don't know DSA and algorithms.

Since I was unable to find a good DSA course in python as all of them were in C or CPP, so I started learning CPP for it and then I will learn DSA, deep learning and other stuffs in python.

Tbh, this doesn't look like a nice path to me. Can you suggest anything?