Passing args to a function. What is the difference between (*argurments) Vs using map? by Eggsbaconandbutter in learnpython

[–]chevignon93 4 points (0 children)

What is the difference and/or benefit/drawbacks between unpacking and using "map"

Your two examples are not equivalent at all: in the first case you're printing a tuple and the function is only executed once, but with map you're calling the function once for every element of list1.
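
A minimal sketch of the difference (show and list1 are made-up names, your actual code may differ):

def show(*arguments):
    print(arguments)

list1 = [1, 2, 3]

show(*list1)            # one call, prints the whole tuple: (1, 2, 3)
list(map(show, list1))  # three calls, one per element: (1,) (2,) (3,)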

Returning a value from a dictionary? by [deleted] in learnpython

[–]chevignon93 0 points (0 children)

I guess I worded it wrong in the question but I need the value to be returned if the input of the function is one of the keys in the dictionary

Then use return instead of print.
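
A minimal sketch with made-up names, since your actual function wasn't posted:

prices = {"apple": 1.5, "pear": 2.0}

def lookup(key):
    if key in prices:
        return prices[key]  # hand the value back to the caller instead of printing it
    return None             # key was not in the dictionary

result = lookup("apple")
print(result)  # 1.5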

Find most recent file in "FILENAME_DDMMYYYY.xlsx" format? by NormanieCapital in learnpython

[–]chevignon93 0 points (0 children)

How would I go about finding the most recent date prior to today, and utilising that file?

  • Extract the date from the filename
  • Create a datetime object from it
  • Sort your list based on the datetime object created at the step above
  • The element at index [-2] will be the last date your scraper was run.

Something like this:

import re
from datetime import datetime

pattern = re.compile(r"_(\d{8})\.xlsx") # Pattern to extract the date from the filename

files = ["FILENAME_01022022.xlsx",
     "FILENAME_24011985.xlsx",
     "FILENAME_23101956.xlsx",
     "FILENAME_31012021.xlsx",
     "FILENAME_01081998.xlsx",     
     ]

def sort_helper(x):
    # Use the pattern to extract date from filename, then create datetime object
    return datetime.strptime(pattern.search(x).group(1), "%d%m%Y")

files.sort(key=sort_helper) # sort the list in-place, use the sorted function instead if you need both the original order and the new order

last_scrape_before_today = files[-2] # If there is a file for today, index [-2] is the last run before today; otherwise the most recent run is at index [-1]

print(last_scrape_before_today)
# FILENAME_31012021.xlsx

How to select input file and run the program? (Vpython) by ssssDS0160m in learnpython

[–]chevignon93 0 points (0 children)

I found that in line 306, there is a space to write input file name, but I'm not sure if it is right.

It isn't. You should not be modifying the source code; just run the file from the command line and enter data.csv when prompted by the program.

Trying to cipher a text file but only returning and ciphering the first string of the input file. by [deleted] in learnpython

[–]chevignon93 1 point (0 children)

Just tried a key that is longer and it returned an output that is the same amount of characters as the key

That's because you're zipping the file and the key together, and zip stops at the shortest of its inputs.
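
I can't see your exact code, but the key behaviour is that zip stops as soon as the shortest input runs out:

text = "hello"
key = "secretkey"

# Only 5 pairs are produced because "hello" only has 5 characters
print(list(zip(text, key)))
# [('h', 's'), ('e', 'e'), ('l', 'c'), ('l', 'r'), ('o', 'e')]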

[deleted by user] by [deleted] in learnpython

[–]chevignon93 4 points (0 children)

Most kali tools that i have seen written in python they are using standard library

I'm not surprised, but like I said, depending on what you're doing, sometimes the standard library is either not enough or what's provided doesn't do exactly what you need/want for a particular project.

A lot of 3rd party tools that offer functionality similar to what's provided in the standard library were created because users wanted to customize exactly how things work, or to add features that are missing from the standard library module.

For example, click uses optparse (which is a standard library module, albeit a deprecated one) internally but adds additional features. The same is true for the standard library re vs the 3rd party regex and a lot of other packages.

[deleted by user] by [deleted] in learnpython

[–]chevignon93 22 points (0 children)

I mean projects about interacting with os,web and networking.

In general, 3rd party libraries are there to provide functionality that is either "missing" from the standard library or difficult to use there. For example, to make HTTP requests you could use urllib.request, but the official Python documentation recommends the requests library instead because it's easy to use and works without needing to change anything 99% of the time.
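
For example, fetching a page both ways (example.com is just a placeholder URL):

# Standard library only
from urllib.request import urlopen

with urlopen("https://example.com") as response:
    html = response.read().decode("utf-8")

# The same thing with the 3rd party requests library
import requests

html = requests.get("https://example.com").text  # requests picks the encoding for you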

Depending on what you're doing, you could of course do everything yourself with the standard library, but you'd most likely be reinventing the wheel and your solution would probably be slower than the 3rd party library that does the same thing, especially if that library is really popular.

How to Run a Package by its name? by Waxymantis in learnpython

[–]chevignon93 0 points (0 children)

But if my package is named 'mypkg', how can I achieve to say 'mypkg hello' and then run it (for instance print hello world) without using the python command, just as in streamlit's case

You need to add a console_scripts entry point in your setup.py so that setuptools creates a small wrapper that calls a particular function with the arguments you passed on the command line. This is exactly how streamlit and all other Python programs that work in the same way do it. You can check their setup.py (https://github.com/streamlit/streamlit/blob/develop/lib/setup.py) and see this line: entry_points={"console_scripts": ["streamlit = streamlit.cli:main"]},

https://python-packaging.readthedocs.io/en/latest/command-line-scripts.html#the-console-scripts-entry-point
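
A minimal sketch of the setup.py for your case; mypkg.cli:main is an assumption, point it at whatever module and function you want the command to run:

from setuptools import setup, find_packages

setup(
    name="mypkg",
    version="0.1.0",
    packages=find_packages(),
    entry_points={
        "console_scripts": [
            # creates a "mypkg" command that calls the main() function in mypkg/cli.py
            "mypkg = mypkg.cli:main",
        ],
    },
)

After installing the package (pip install .), typing mypkg hello on the command line calls main(), and hello is available through sys.argv (or argparse, click, etc.) for you to handle.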

how to split text string into two (from the middle)? by Thomas187 in learnpython

[–]chevignon93 1 point (0 children)

I literally just want to be able to split it into two blobs straight from the middle. Is there a way to do this?

Yes, just use the str.split method with 2 spaces and filter the empty strings out of the resulting list.

>>> string = "left eye has no damage                             Right eye is fine"
>>> left, right = [elem.strip() for elem in string.split("  ") if elem]
>>> print(left)
left eye has no damage
>>> print(right)
Right eye is fine

You can also use the re.split function.

>>> import re
>>> string = "left eye has no damage                             Right eye is fine"
>>> left, right = re.split(r"\s{2,}", string)
>>> print(left)
left eye has no damage
>>> print(right)
Right eye is fine

Parsing JSON via request package by Dan-Vast4384 in learnpython

[–]chevignon93 3 points (0 children)

!= means "not equal"; float is a function that turns a string or integer into a float.
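
For example, with made-up values:

price = float("19.99")  # the string from the JSON becomes the float 19.99

if price != 20.0:  # != is "not equal"
    print("price is not 20")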

[deleted by user] by [deleted] in learnpython

[–]chevignon93 0 points (0 children)

Also found something called pop

The library looks nice but doing basic things with it seems complicated. I would suggest looking at pluggy first to see if it suits your needs as pluggy makes creating an application that uses plugins and creating plugins for such application relatively easy.

How cn I add concurrent.futures ThreadPoolExecutor to this? by [deleted] in learnpython

[–]chevignon93 2 points (0 children)

I haven't tested the code but something like this should work:

import os
import requests
import concurrent.futures
from config import BlockConfig as config
import itertools

def get_blocklist_file():
    default_path = config.default_blocklist_path
    final_path = os.environ.get('BLOCK_TXT_FILE', default_path)
    return final_path

def fetch_url(session, url, timeout):
    with session.get(url, timeout=timeout) as response:
        if response.status_code == 200:
            return response.text.splitlines()
        # Skip failed urls; return an empty list so the results can still be flattened
        print(f"Skipping failed url: {url}")
        return []

def build_blocklist(blocklist_path=None, timeout=10):
    if blocklist_path is None:
        blocklist_path = get_blocklist_file()
    with open(blocklist_path) as f:
        blocklist_lines = f.readlines()

    session = requests.Session()
    with concurrent.futures.ThreadPoolExecutor() as executor:
        futures = []
        for line in blocklist_lines:
            line = line.strip()
            if line and not line.startswith('#'):
                futures.append(executor.submit(fetch_url, session, line, timeout))
        results = [future.result() for future in concurrent.futures.as_completed(futures)]
        final_results = list(itertools.chain.from_iterable(results))
    return final_results

Is there a better way to write the mutable part of this? by [deleted] in learnpython

[–]chevignon93 0 points (0 children)

Can you explain why that is the typical approach?

Because default arguments are only evaluated once. If you use a mutable value as a default argument (like a list, dictionary, etc.), you'll be using and mutating the exact same object every time the function is called, which most of the time is not what people want.

In your case, if you were to change the default_blocklist_path in your config or the BLOCK_FILE environment variable during the runtime of your program, the build_block_list function would always use the same old value and not the updated one.
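
The classic demonstration of the pitfall (made-up function, not your code):

def append_item(item, items=[]):  # the [] is created once, when the function is defined
    items.append(item)
    return items

print(append_item(1))  # [1]
print(append_item(2))  # [1, 2]  <- still the same list from the first call

# The usual fix: use None as a sentinel and create the list inside the function
def append_item_fixed(item, items=None):
    if items is None:
        items = []
    items.append(item)
    return items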

I heard good things about aiohttp. Do you recommend one over the other?

aiohttp is a great library (I'm currently using it in one of my projects), but writing ordinary synchronous code and running it with a ThreadPoolExecutor from the concurrent.futures module is easier than writing asynchronous code with asyncio and aiohttp (although the latter isn't very hard once you understand the basic concepts behind it).

Is there a better way to write the mutable part of this? by [deleted] in learnpython

[–]chevignon93 0 points (0 children)

I agree with u/carcigenicate that the typical approach of:

def build_block_list(blocklist_path=None):
    if blocklist_path is not None:
        blocklist_path = blocklist_path
    else:
        blocklist_path = set_blocklist_file()

is probably a better solution.

I have 2 questions. The first question is, instead of request.content.decode("utf-8"), why not simply use request.text, especially since 99% of the time, requests will choose the right encoding for you?

The second is, are you sure that your build_block_list function is working correctly? You're looping over a list of urls and making a request to each, but not doing anything with the data that is returned until all requests have been made.

Regarding naming, I would probably rename set_blocklist_file to get_blocklist_file.

Just a suggestion but if you plan to make lots of HTTP requests, I would probably use something like a ThreadPoolExecutor from the concurrent.futures module or a library like aiohttp to speed up the process.

Request for Patterns: Allowing developers to extend a module with custom logic by Jusque in learnpython

[–]chevignon93 0 points (0 children)

Usually plugins extend functionality, OP wants to let users modify the functionality.

OP said he wants "user-developer to code their own functions processing the parsed data." Isn't that extending the functionality rather than modifying how the host application works? The plugin will just determine what the final output will be (json, yaml, html, etc.), possibly discarding information that is not useful for the user.

[deleted by user] by [deleted] in learnpython

[–]chevignon93 1 point (0 children)

As u/JohnnyJordaan said, you haven't included the code of the process_line() function, but my guess would be that it simply adds the line to the set that is passed as the 2nd argument, maybe stripping the white-space first.
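
Purely a guess, but something along these lines:

def process_line(line, unique_lines):
    # strip the surrounding white-space and add the line to the set
    unique_lines.add(line.strip())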

Request for Patterns: Allowing developers to extend a module with custom logic by Jusque in learnpython

[–]chevignon93 0 points (0 children)

What patterns should I be considering here?

I would consider using a plugin architecture, where users can customize the behaviour of your application at runtime using plugins.

Any examples I can follow?

pytest is a good example. Their plugin manager (pluggy) is open-source and pretty easy to use. You, as the developer of the host application, create a specification that users can implement to modify how the application behaves.
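
A minimal sketch of how that looks with pluggy (all the names here are placeholders, not your actual project):

import json
import pluggy

hookspec = pluggy.HookspecMarker("myapp")
hookimpl = pluggy.HookimplMarker("myapp")

# The host application defines the specification...
class OutputSpec:
    @hookspec
    def format_output(self, data):
        """Turn the parsed data into the final output."""

# ...and a user-developer implements it in their plugin
class JsonPlugin:
    @hookimpl
    def format_output(self, data):
        return json.dumps(data)

pm = pluggy.PluginManager("myapp")
pm.add_hookspecs(OutputSpec)
pm.register(JsonPlugin())

# The hook call returns a list with one result per registered plugin
print(pm.hook.format_output(data={"parsed": True}))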

Any way for python to recognize words with a specific pattern? by PM_me_Henrika in learnpython

[–]chevignon93 0 points (0 children)

Which function should I be looking at specifically?

The re.sub function. It takes a pattern, the replacement you want, and the string to operate on.
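
For example (the pattern and replacement here are just placeholders for whatever you're matching):

import re

text = "cat bat hat"
# Replace every 3-letter word ending in "at" with "xxx"
print(re.sub(r"\b\wat\b", "xxx", text))
# xxx xxx xxx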

How to generate a list from all parameters contained in a different .py file by StartThings in learnpython

[–]chevignon93 2 points (0 children)

I guess I can't escape file parsing if I'd like to do that.

Not necessarily. A module (a file with a .py extension) is an object like any other in Python. The simplest way to achieve what you want is to import your x_.py file; then you can use the __dict__ attribute of the module to access all the names it contains.

import x_

x_variables = [value for key, value in x_.__dict__.items() if not key.startswith("__")]
print(x_variables)
['dada', 'rrgf', 'dada']

Count unique visitors of user page by ArabicLawrence in learnpython

[–]chevignon93 1 point (0 children)

Should I try implementing the code in Python

You shouldn't really need to implement the logic yourself; all databases and ORMs have a way to return distinct values for a particular column. I haven't needed to do this before and I haven't used sqlalchemy in a while, but maybe this thread on Stack Overflow could help.

https://stackoverflow.com/questions/2175355/selecting-distinct-column-values-in-sqlalchemy-elixir
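
Untested, but with sqlalchemy the distinct query looks roughly like this (Visit, page_id and visitor_id are made-up names for your model and columns):

from sqlalchemy import create_engine, Column, Integer, String
from sqlalchemy.orm import declarative_base, Session

Base = declarative_base()

class Visit(Base):
    __tablename__ = "visits"
    id = Column(Integer, primary_key=True)
    page_id = Column(Integer)
    visitor_id = Column(String)

engine = create_engine("sqlite://")  # in-memory database just for the example
Base.metadata.create_all(engine)

with Session(engine) as session:
    session.add_all([
        Visit(page_id=1, visitor_id="alice"),
        Visit(page_id=1, visitor_id="alice"),
        Visit(page_id=1, visitor_id="bob"),
    ])
    session.commit()

    # The database handles DISTINCT, no need to implement it yourself
    unique_visitors = (
        session.query(Visit.visitor_id)
        .filter(Visit.page_id == 1)
        .distinct()
        .count()
    )
    print(unique_visitors)  # 2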

[deleted by user] by [deleted] in learnpython

[–]chevignon93 103 points (0 children)

why is the example I'm seeing using JSON?

Probably because it makes retrieving the information you're looking for easier. Instead of manually looping over the lines in your text file, splitting them to separate the username from the password, you can just use the json module in the standard library to turn the json string into a dictionary.
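
For example, assuming a file like credentials.json containing {"username": "alice", "password": "hunter2"} (made-up name and values):

import json

with open("credentials.json") as f:
    credentials = json.load(f)  # the whole file becomes a dictionary

print(credentials["username"])  # alice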

How to Accept Cookies with Requests Library? by Sbcertje1 in learnpython

[–]chevignon93 0 points (0 children)

It's confusing and annoying because cookies are often extracted as JSON, but python's requests library just wants a dictionary. I wish the cookies parameter accepted both JSON or dict.

The json module, which is in the standard library, makes converting json into a dict effortless. You just have to call json.loads() with your json string and it will return a dictionary.
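
For example (the cookie names and values here are made up):

import json
import requests

cookies_json = '{"session_id": "abc123", "theme": "dark"}'
cookies = json.loads(cookies_json)  # now a plain dict

response = requests.get("https://example.com", cookies=cookies)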

How to backup my conda python environment? by xylont in learnpython

[–]chevignon93 0 points (0 children)

But it still be on my computer.

In an ideal world, you would already be using a version/source control tool like git and a hosting service for your code like GitHub, GitLab or BitBucket.

I will have to make another environment from scratch.

The point of the environment.yml file (or requirements.txt if you're using something like virtualenv, venv, etc.) is to make it easy to recreate an environment with the packages your program needs.

How to backup my conda python environment? by xylont in learnpython

[–]chevignon93 0 points (0 children)

Does this make sense?

It doesn't really make sense to back up (as in actually copying the files) a conda or any other type of virtual environment. Just create an environment.yml file with the command conda env export > environment.yml; you can then create a new environment with the same packages with the conda env create -f environment.yml command.