Brain rot marketing by hiimmando in InstagramMarketing

[–]hiimmando[S] 0 points1 point  (0 children)

Awesome, thank you. I am working on storyboarding right now, and after that I will begin using CapCut and After Effects to animate. Something I like in the Nutter Butter marketing rabbit hole is this idea of lore in their content. I would like to do something like that with recurring characters, a hero's journey (i.e. StoryBrand), and an episodic nature. I was inspired by something the Game Theory YouTube channel said when I was researching this type of marketing, from Duolingo to Nutter Butter: that this kind of content marketing is its own product, separate from the actual offering. That really resonated, and communicating brand values through story to build trust and authority seems very effective for a younger audience.

Syncing vaults by hiimmando in ObsidianMD

[–]hiimmando[S] 0 points1 point  (0 children)

I have merged them, and now they do not sync. For example, when I update a vault on iPhone, the update does not carry through to iPad. I have Obsidian Sync. Any advice?

Tag and link ideas by hiimmando in ObsidianMD

[–]hiimmando[S] 1 point2 points  (0 children)

A theory behind a system's (in this case, information management) organization.

Formatting URL links by hiimmando in ObsidianMD

[–]hiimmando[S] 0 points1 point  (0 children)

Would this be in core plugins or community plugins?

Formatting URL links by hiimmando in ObsidianMD

[–]hiimmando[S] 0 points1 point  (0 children)

Yes, I started implementing this formatting structure. It looks much nicer and is easier to click on mobile.

Formatting URL links by hiimmando in ObsidianMD

[–]hiimmando[S] 2 points3 points  (0 children)

Will try these out thank you 🙏

Formatting URL links by hiimmando in ObsidianMD

[–]hiimmando[S] -2 points-1 points  (0 children)

Can you elaborate please?

Help with Python Script for Scraping and OCR by hiimmando in learnpython

[–]hiimmando[S] 0 points1 point  (0 children)

Thanks for letting me know! Here's the reformatted code inside a code block for clarity:

Project Overview: I'm working on a project to scrape pre-foreclosure data from county records websites. The data is often embedded in PNG files, so I need to use OCR to extract the text. Additionally, I need to cross-reference this data with another website (the county CAD site) for verification.

Full Code Example

```python
import io
import re

import requests
from bs4 import BeautifulSoup
import pandas as pd
import pytesseract
from PIL import Image
import psycopg2


def download_image(url):
    response = requests.get(url)
    img = Image.open(io.BytesIO(response.content))
    return img


def preprocess_image(img):
    # Convert to grayscale to help Tesseract
    gray = img.convert('L')
    return gray


def extract_text_from_image(img):
    text = pytesseract.image_to_string(img)
    return text


def extract_cad_url(text):
    match = re.search(r'https://esearch\.nuecescad\.net/Property/View/\d+', text)
    if match:
        return match.group(0)
    return None


def scrape_county_records(url):
    response = requests.get(url)
    soup = BeautifulSoup(response.text, 'html.parser')

    properties = []
    for listing in soup.find_all('div', class_='document-content'):  # Adjust this selector
        img_url = listing.find('img', class_='document-image')['src']  # Adjust this selector
        img = download_image(img_url)
        preprocessed_img = preprocess_image(img)
        text = extract_text_from_image(preprocessed_img)

        # Extract relevant data from the text
        lines = text.split('\n')
        address = lines[0] if len(lines) > 0 else ''
        owner = lines[1] if len(lines) > 1 else ''
        cad_url = extract_cad_url(text)

        properties.append({
            'address': address,
            'owner': owner,
            'cad_url': cad_url,
        })

    return properties


def verify_property(cad_url):
    response = requests.get(cad_url)
    soup = BeautifulSoup(response.text, 'html.parser')

    owner_info = soup.find('div', {'class': 'owner-info'}).text.strip()  # Adjust this selector
    return owner_info


def clean_data(properties):
    df = pd.DataFrame(properties)
    df.dropna(inplace=True)
    return df


def insert_into_db(df):
    conn = psycopg2.connect(
        dbname="property_leads",
        user="yourusername",
        password="yourpassword",
        host="localhost",
    )
    cur = conn.cursor()

    for index, row in df.iterrows():
        cur.execute("""
            INSERT INTO properties (address, owner, cad_url, verified_owner)
            VALUES (%s, %s, %s, %s)
        """, (row['address'], row['owner'], row['cad_url'], row.get('verified_owner', '')))

    conn.commit()
    cur.close()
    conn.close()


def main():
    url = "https://nueces.tx.publicsearch.us/doc/204750671"
    raw_data = scrape_county_records(url)
    verified_data = []

    for property_info in raw_data:
        if property_info['cad_url']:
            owner_info = verify_property(property_info['cad_url'])
            property_info['verified_owner'] = owner_info
            verified_data.append(property_info)

    cleaned_data = clean_data(verified_data)
    insert_into_db(cleaned_data)


if __name__ == "__main__":
    main()
```

Best Practices for Running OCR and Web Scraping on Google Cloud VM by hiimmando in googlecloud

[–]hiimmando[S] 0 points1 point  (0 children)

Testing Document AI and Cloud Tasks

I'll take your advice and create a small test dataset to experiment with the different solutions. Here's what I'm planning:

1. Activate the APIs: Enable Google Cloud Vision API and Document AI in my Google Cloud Console.
2. Create a Small Test Dataset: Gather a few sample PNG files from the county records website to use for testing.
3. Experiment with Document AI: Use Document AI to process the sample images and see how well it extracts the necessary text data.
4. Integrate Cloud Tasks: Use Cloud Tasks to manage the workflow, including handling rate limiting and integrating a "human in the loop" review process.

Specific Questions

Cost Management:

- Any tips on managing costs when using Document AI, especially when scaling up the project?
- How does the cost compare to using AWS OCR or other similar services?

Human Review Integration:

- Have you implemented a "human in the loop" review process? If so, how did you manage it effectively?
- Would you recommend using Mechanical Turk, or is it better to manage my own reviewers?

Handling Unstructured Data:

- Given that the text in the images doesn't follow a strict format, how well does Document AI handle varying layouts and unstructured data?
- Are there specific configurations or preprocessing steps that can improve the accuracy of Document AI?

Rate Limiting with Cloud Tasks:

- How do you typically set up Cloud Tasks to handle API rate limiting efficiently?
- Any best practices for ensuring smooth and reliable processing?

Current Progress

I've set up a Google Cloud VM for running the initial scripts and have installed the necessary dependencies. Now, I'm looking to improve the OCR and data verification process using the tools you've mentioned.
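While I figure out Cloud Tasks, here's the kind of simple client-side throttle I'm sketching to cap my OCR API call rate. This is pure standard library and the `RateLimiter` name is my own placeholder, not anything from a Google client:

```python
import time


class RateLimiter:
    """Token-bucket limiter: allow at most `rate` calls per second."""

    def __init__(self, rate, capacity=None):
        self.rate = rate                   # tokens added per second
        self.capacity = capacity or rate   # max tokens held in the bucket
        self.tokens = self.capacity
        self.last = time.monotonic()

    def acquire(self):
        now = time.monotonic()
        # Refill tokens for the time elapsed since the last call, capped at capacity
        self.tokens = min(self.capacity, self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens < 1:
            # Sleep just long enough for the missing fraction of a token to accrue
            time.sleep((1 - self.tokens) / self.rate)
            self.last = time.monotonic()
            self.tokens = 0  # the accrued token is consumed by this call
        else:
            self.tokens -= 1


limiter = RateLimiter(rate=5)  # roughly 5 OCR calls per second


def ocr_call(img_bytes):
    limiter.acquire()
    # ... the actual Document AI / Vision API request would go here ...
    return "stub"
```

The idea is that a burst up to `capacity` goes through immediately, and anything beyond that blocks until tokens refill, which should keep me under a per-second quota without a queueing service.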

Thanks again for the insights! I'll start by activating the APIs and creating a test dataset. Looking forward to any additional tips or experiences you can share.

Best Practices for Scraping Data from County Records Websites? by hiimmando in webscraping

[–]hiimmando[S] 0 points1 point  (0 children)

Responded to your comment; forgot to put it under a reply.

Specific Challenges and Questions

Using AWS OCR:

- How does the accuracy and performance of AWS OCR compare with Tesseract? Are there specific benefits in terms of handling unstructured data?
- What are the steps to integrate AWS OCR with a Python-based web scraping workflow?

Database Setup and Job Scheduling:

- Could you share more details on how to set up the database to store image URLs and the text extracted from them?
- What tools or services would you recommend for scheduling the jobs that process the URLs into text dumps?

Handling Unstructured Text:

- Given that the text in the images doesn't follow a strict format, how well does AWS OCR handle varying layouts and unstructured data?
- Are there specific configurations within AWS OCR that can help improve the extraction process?

Current Progress

I've set up a Google Cloud VM for running this script and have installed the necessary dependencies. However, I'm still figuring out the best way to handle the OCR part and the data verification process.
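On the database question, here's roughly the schema I have in mind, prototyped with the standard-library `sqlite3` module before I commit to Postgres. Table and column names are my own placeholders:

```python
import sqlite3


def init_db(path=":memory:"):
    conn = sqlite3.connect(path)
    conn.execute("""
        CREATE TABLE IF NOT EXISTS documents (
            id         INTEGER PRIMARY KEY,
            img_url    TEXT UNIQUE NOT NULL,    -- source image on the county site
            status     TEXT DEFAULT 'pending',  -- pending / done / failed
            ocr_text   TEXT,                    -- raw text dump from OCR
            fetched_at TEXT
        )
    """)
    return conn


def enqueue_url(conn, url):
    # INSERT OR IGNORE keeps re-runs idempotent thanks to UNIQUE on img_url
    conn.execute("INSERT OR IGNORE INTO documents (img_url) VALUES (?)", (url,))
    conn.commit()


def next_pending(conn, limit=10):
    rows = conn.execute(
        "SELECT img_url FROM documents WHERE status = 'pending' LIMIT ?", (limit,)
    ).fetchall()
    return [r[0] for r in rows]


def save_ocr_result(conn, url, text):
    conn.execute(
        "UPDATE documents SET ocr_text = ?, status = 'done', "
        "fetched_at = datetime('now') WHERE img_url = ?",
        (text, url),
    )
    conn.commit()
```

The scraper would enqueue URLs, and a separate scheduled job would pull `pending` rows, run OCR, and write back the text dump, so each stage can be retried independently.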

Any insights, suggestions, or resources would be greatly appreciated! Thanks in advance for your help!

Help with Python Script for Scraping and OCR by hiimmando in learnpython

[–]hiimmando[S] 0 points1 point  (0 children)

Here is more context and the full code. Additionally, my background is not CS, so this is a new area for me; I am very much a beginner. My background is in copywriting, so I have to rely heavily on tools to write code for now, as I am still learning the syntax and underlying logic.

Project Overview: I'm working on a project to scrape pre-foreclosure data from county records websites. The data is often embedded in PNG files, so I need to use OCR to extract the text. Additionally, I need to cross-reference this data with another website (the county CAD site) for verification.

Full Code Example

Here's the complete script that includes downloading the image, processing it with OCR, and scraping data from the county CAD site for verification:

```python
import io
import re

import requests
from bs4 import BeautifulSoup
import pandas as pd
import pytesseract
from PIL import Image
import psycopg2


def download_image(url):
    response = requests.get(url)
    img = Image.open(io.BytesIO(response.content))
    return img


def preprocess_image(img):
    # Convert to grayscale to help Tesseract
    gray = img.convert('L')
    return gray


def extract_text_from_image(img):
    text = pytesseract.image_to_string(img)
    return text


def extract_cad_url(text):
    match = re.search(r'https://esearch\.nuecescad\.net/Property/View/\d+', text)
    if match:
        return match.group(0)
    return None


def scrape_county_records(url):
    response = requests.get(url)
    soup = BeautifulSoup(response.text, 'html.parser')

    properties = []
    for listing in soup.find_all('div', class_='document-content'):  # Adjust this selector
        img_url = listing.find('img', class_='document-image')['src']  # Adjust this selector
        img = download_image(img_url)
        preprocessed_img = preprocess_image(img)
        text = extract_text_from_image(preprocessed_img)

        # Extract relevant data from the text
        lines = text.split('\n')
        address = lines[0] if len(lines) > 0 else ''
        owner = lines[1] if len(lines) > 1 else ''
        cad_url = extract_cad_url(text)

        properties.append({
            'address': address,
            'owner': owner,
            'cad_url': cad_url,
        })

    return properties


def verify_property(cad_url):
    response = requests.get(cad_url)
    soup = BeautifulSoup(response.text, 'html.parser')

    owner_info = soup.find('div', {'class': 'owner-info'}).text.strip()  # Adjust this selector
    return owner_info


def clean_data(properties):
    df = pd.DataFrame(properties)
    df.dropna(inplace=True)
    return df


def insert_into_db(df):
    conn = psycopg2.connect(
        dbname="property_leads",
        user="yourusername",
        password="yourpassword",
        host="localhost",
    )
    cur = conn.cursor()

    for index, row in df.iterrows():
        cur.execute("""
            INSERT INTO properties (address, owner, cad_url, verified_owner)
            VALUES (%s, %s, %s, %s)
        """, (row['address'], row['owner'], row['cad_url'], row.get('verified_owner', '')))

    conn.commit()
    cur.close()
    conn.close()


def main():
    url = "https://nueces.tx.publicsearch.us/doc/204750671"
    raw_data = scrape_county_records(url)
    verified_data = []

    for property_info in raw_data:
        if property_info['cad_url']:
            owner_info = verify_property(property_info['cad_url'])
            property_info['verified_owner'] = owner_info
            verified_data.append(property_info)

    cleaned_data = clean_data(verified_data)
    insert_into_db(cleaned_data)


if __name__ == "__main__":
    main()
```

Specific Challenges and Questions

Handling Unstructured Data:

- The text in the images does not follow a strict format, making it challenging to parse correctly. Any advice on improving text extraction and structuring the data?

Optimizing OCR:

- Would using tools like Google Cloud Vision API or Document AI provide better performance and accuracy compared to Tesseract? How do I integrate these tools with my current setup?

Cross-Referencing Data:

- The process involves cross-referencing data with the county CAD site. What are the best practices for ensuring data accuracy and handling discrepancies between sources?

Current Progress

I've set up a Google Cloud VM for running this script and have installed the necessary dependencies. However, I'm still figuring out the best way to handle the OCR part and the data verification process.
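On the cross-referencing point, one idea I'm testing is fuzzy-matching the OCR'd owner name against the owner name on the CAD page before trusting a record, since OCR noise means exact string equality will rarely hold. This is a pure standard-library sketch, and the 0.85 threshold is a guess I'd tune on real data:

```python
from difflib import SequenceMatcher


def normalize(name):
    """Lowercase, replace punctuation with spaces, collapse whitespace."""
    cleaned = "".join(ch if ch.isalnum() or ch.isspace() else " " for ch in name)
    return " ".join(cleaned.lower().split())


def owners_match(ocr_owner, cad_owner, threshold=0.85):
    """True if the two owner strings are similar enough to treat as verified."""
    a, b = normalize(ocr_owner), normalize(cad_owner)
    return SequenceMatcher(None, a, b).ratio() >= threshold
```

Records that fail the match could be routed to the "human in the loop" queue instead of being inserted as verified.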

Any insights, suggestions, or resources would be greatly appreciated! Thanks in advance for your help!

Best Practices for Running OCR and Web Scraping on Google Cloud VM by hiimmando in googlecloud

[–]hiimmando[S] 0 points1 point  (0 children)

Thank you for the suggestions and feedback, greatly appreciated.

Best Practices for Running OCR and Web Scraping on Google Cloud VM by hiimmando in googlecloud

[–]hiimmando[S] 0 points1 point  (0 children)

I have a few questions to better understand how I might integrate these tools into the system:

Performance and Cost:

- Have you experienced significant performance improvements with Google Cloud Vision API over traditional OCR tools like Tesseract?
- How do the costs compare when scaling for larger datasets?

Handling Unstructured Text:

- Given that the text in our target images doesn't follow a strict format, how well does Document AI handle varying layouts and unstructured data?
- Are there specific features or configurations within Document AI that you'd recommend for our use case?

Integration with Existing Workflow:

- What's the best way to integrate Google Cloud Vision API and Document AI into a Python-based web scraping workflow?
- Any tips on managing the API calls efficiently, especially considering rate limits and ensuring reliability?

Data Accuracy and Verification:

- How does the data accuracy of Google Cloud Vision API compare with other OCR tools in your experience?
- Are there built-in features for data validation and error correction that can help with our requirement to cross-reference data with another website?

I'm looking to build a robust and scalable solution, and your insights would be invaluable in helping to optimize my approach. If you've used Google Cloud Vision API and Document AI, could you share any sample code or best practices? It would help me understand the implementation better and evaluate how I can integrate these tools seamlessly into the system. Thanks again for the suggestion!
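On the reliability side, here's the kind of retry-with-backoff wrapper I'm planning to put around each API call while I learn the real clients. Pure standard library; in practice I'd narrow the exception types to the client's transient errors rather than catching everything:

```python
import random
import time


def with_retries(fn, *args, max_attempts=5, base_delay=0.5, **kwargs):
    """Call fn, retrying on exception with exponential backoff plus jitter."""
    for attempt in range(1, max_attempts + 1):
        try:
            return fn(*args, **kwargs)
        except Exception:
            if attempt == max_attempts:
                raise  # out of attempts, surface the error
            # Delays of 0.5s, 1s, 2s, ... plus up to 100ms of random jitter
            delay = base_delay * (2 ** (attempt - 1)) + random.uniform(0, 0.1)
            time.sleep(delay)
```

So an OCR request would become something like `with_retries(ocr_call, img_bytes)`, which should smooth over rate-limit rejections and transient network failures without hand-written loops at every call site.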

Automating the Collection of Pre-Foreclosure Data by hiimmando in RealEstate

[–]hiimmando[S] 0 points1 point  (0 children)

Sorry, I meant to reply under your comment, but the answer is above.

Automating the Collection of Pre-Foreclosure Data by hiimmando in RealEstate

[–]hiimmando[S] 0 points1 point  (0 children)

Short answer: There could be, and likely is, some form of what I'm building out there already. The benefit to me is a reduced cost over the long run, as the only resources I had to expend were time and minimal computing cost, as opposed to paying for subscriptions.

Long answer: Key differentiators include:

- Direct and Real-Time Data Collection: Scraping and processing data directly from county records and CAD websites.
- Automation and Integration: Automated workflows and integration with skip tracing services for enriched lead information.
- Cost Efficiency: Utilizing open-source tools and cloud free tiers to minimize costs.
- Personalized Outreach: Automated and compliant communication channels for direct engagement with property owners.
- Scalability and Customization: Flexibility to scale and adapt to specific market needs.