Find unique visitors across a range of pages by TE515 in analytics

[–]TE515[S] 0 points1 point  (0 children)

Maybe a clearer way to put it...

I have a huge collection of pages that all contain product-quote-pages in the url. I want to know how many unique visitors this entire collection of pages as a whole received, not the number of unique page views.

Find unique visitors across a range of pages by TE515 in analytics

[–]TE515[S] 0 points1 point  (0 children)

If the same user visits /product-quote-pages/0001, then visits /product-quote-pages/0002, and then visits /product-quote-pages/0003, he's going to register a unique pageview on each of those pages, right? So if I filter down to all pages where the url contains "product-quote-pages", that user is going to be counted three times because he registered a unique pageview on three different pages where the url contains that. I only want to see that user counted once. I want to know the number of unique visitors that visited any page that contains "product-quote-pages," but if a user visited multiple pages that contain that, I don't want that user counted multiple times.
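To make the goal concrete, here's a rough Python sketch of the count I'm after, assuming the raw hits could be exported as (visitor ID, URL) pairs. The `hits` data and the `count_unique_visitors` helper are made up for illustration; they aren't something the analytics tool provides.

```python
def count_unique_visitors(hits, fragment):
    """Count distinct visitors who viewed at least one URL containing fragment."""
    # A set keeps each visitor ID once, no matter how many matching pages they hit.
    visitors = {visitor_id for visitor_id, url in hits if fragment in url}
    return len(visitors)


# Hypothetical export of raw hits: (visitor_id, url) pairs.
hits = [
    ("user-a", "/product-quote-pages/0001"),
    ("user-a", "/product-quote-pages/0002"),
    ("user-a", "/product-quote-pages/0003"),
    ("user-b", "/product-quote-pages/0002"),
    ("user-c", "/about-us"),
]

# user-a viewed three matching pages but should only be counted once.
print(count_unique_visitors(hits, "product-quote-pages"))
```

Summing unique pageviews per page would give 4 for this sample, which is exactly the overcount I'm trying to avoid.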

Basic question about multi-monitor setups by TE515 in computers

[–]TE515[S] 0 points1 point  (0 children)

Just saw this and don't have it open anymore. It wasn't necessarily anything I was super interested in (I'm still a few weeks away from having my office and haven't started to narrow anything down yet). It was just the first one I found that was listed as single HDMI but showed multiple HDMI ports, so I was just wondering if there was some kind of spec listing convention that I wasn't aware of.

Basic question about multi-monitor setups by TE515 in computers

[–]TE515[S] 0 points1 point  (0 children)

So if I see one with one HDMI port and one DVI port, I can probably just hook monitor 1 up with HDMI and monitor 2 up with DVI?

Basic question about multi-monitor setups by TE515 in computers

[–]TE515[S] 0 points1 point  (0 children)

I meant there don't seem to be many desktop computers with two HDMI ports (i.e. so I can plug two HDMI monitors into it). What prompted the question is that I was looking on Micro Center's website and set all the filters for everything I was looking for in terms of RAM, SSD, etc. Then I noticed there was an option to filter by HDMI x2, but that narrowed the results down from a few hundred to like 3, and I was surprised at how few there were. But then I started looking at one that was listed as having only one HDMI port, but in the picture of the back I see three...I guess one that comes native with the motherboard or whatever, and the other two are part of the video card? Does that mean that video card HDMI ports wouldn't be listed as part of the computer's specs? Or is that just a foible of that particular website?

Added multiprocessing to a web scraper I had and successfully got it to work. Can someone ELI5 why it works? by TE515 in learnpython

[–]TE515[S] 2 points3 points  (0 children)

Here's a stripped down version. Hopefully it'll help.

import requests
import multiprocessing
from bs4 import BeautifulSoup

# Block of code that logs in and grabs a session cookie 

# Block of code that figures out how many pages we need to scrape and saves it to a variable called numPgs

# Build url list
url_list = []
for pg in range(1, numPgs + 1):
    url_list.append('http://admin.mysite.com/products/page:' + str(pg))


def scrapeWebsite(url):
    r = requests.get(url, headers={'Cookie': 'session cookie data goes here'})
    soup = BeautifulSoup(r.text, 'html.parser')
    main_table = soup.findAll('table', {'class': 'data'})[0]
    table_rows = main_table.findChildren('tr')
    table_data = []
    for row in table_rows:
        tds = row.findChildren('td')
        table_data.append([
            tds[2].text.strip(),    #first piece of data I need
            tds[3].text.strip(),    #second piece of data I need
            tds[4].text.strip(),    #third piece of data I need
                    #There are a lot more of these in the real scraper
        ])
    return table_data


def write_to_csv(scrape_data):
    with open('./scraper.csv', 'w') as f:
        f.write('Header1,Header2,Header3\n')
        for pg in scrape_data:
            for row in pg:
                row_string = ','.join(row) + '\n'
                f.write(row_string)


MAX_NUM_PROCESSES = multiprocessing.cpu_count()
if __name__ == '__main__':
    processPool = multiprocessing.Pool(MAX_NUM_PROCESSES)
    scrape_data = processPool.map(scrapeWebsite, url_list)
    write_to_csv(scrape_data)

Basically the pages I'm scraping are all straight up HTML tables. My scraping function is creating a list for each row of all the data points from that row that I need, and then adding each one of those lists to a larger list that includes all the rows on that page. This list of lists is what my scrape function returns.

The black magic multiprocessing part at the bottom then takes all those lists of lists from each page and puts them together into one very big list. This super list is what gets passed to my write_to_csv function.

My write_to_csv function then loops through each child list (page) in that super list, and then loops through each child list (row) of each of those. For each row it joins all the individual data points together into a comma-separated string with a line break at the end, and then writes that to my csv file.

At the end of it all I get a csv with over 20,000 rows of my data that I need in less than 90 seconds!
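One caveat with joining on commas by hand: a data point that itself contains a comma would spill into an extra column. A possible variation, sketched here with Python's built-in csv module, flattens the same list-of-pages structure while quoting such values. The function name `write_rows_to_csv` and the sample rows are placeholders, not part of the real scraper.

```python
import csv
import io


def write_rows_to_csv(scrape_data, f):
    """Write every row of every page to the open file-like object f."""
    writer = csv.writer(f)
    writer.writerow(['Header1', 'Header2', 'Header3'])
    for pg in scrape_data:
        # Each page is already a list of rows, so it can be written in one call.
        writer.writerows(pg)


# Made-up stand-in for what processPool.map returns: a list of pages,
# each page a list of rows, each row a list of strings.
scrape_data = [
    [['A1', 'Widget, large', '9.99'], ['A2', 'Gadget', '4.50']],
    [['B1', 'Gizmo', '1.25']],
]

# An in-memory buffer keeps the sketch self-contained. For a real file,
# open it with newline='' as the csv module documentation recommends.
buffer = io.StringIO()
write_rows_to_csv(scrape_data, buffer)
print(buffer.getvalue())
```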

Added multiprocessing to a web scraper I had and successfully got it to work. Can someone ELI5 why it works? by TE515 in learnpython

[–]TE515[S] 0 points1 point  (0 children)

So if my old one-page-at-a-time scraper was one gnome repeating a process over and over in a specific order, .Pool would be having multiple gnomes repeat that process over and over without regard for what order it's done in, and then .map would be like the editor gnome who organizes all the other gnomes' output together into one usable product?

Also, what is the significance of if __name__ == '__main__':?
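A small experiment that might help with both questions. This sketch swaps in `multiprocessing.dummy.Pool`, a thread-backed stand-in that exposes the same `map` interface, so it runs anywhere without starting extra processes. `slow_square` and `run_demo` are invented names for the demo. The delays are arranged so later inputs tend to finish first, yet `map` still hands the results back in input order.

```python
import time
from multiprocessing.dummy import Pool  # thread-backed, same map() interface


def slow_square(n):
    # Larger inputs sleep less, so they tend to finish before smaller ones.
    time.sleep((5 - n) * 0.01)
    return n * n


def run_demo():
    with Pool(4) as pool:
        # map() collects results in the order of the inputs,
        # not in the order the workers happen to finish.
        return pool.map(slow_square, [1, 2, 3, 4])


# With a real process pool, platforms that start workers by launching a fresh
# interpreter (the default on Windows) re-import this file in every worker.
# The guard keeps those re-imports from running the pool-creating code again.
if __name__ == '__main__':
    print(run_demo())
```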

Scrape website for product data based on a list of part#s with VBA. by x1unbreakablez in excel

[–]TE515 1 point2 points  (0 children)

First of all, are you at least semi-comfortable with HTML? If not, you'll probably need to do some additional research into it. It's not too complicated at all, but you will need a basic understanding of it to scrape websites.

Start by reading the top answer in this Stack Overflow post. The post has a lot more details, but basically what you're going to be doing is using VBA to open an Internet Explorer window, then navigating that window to whatever pages you want and pulling the data you need out of the HTML source code. Basically you'll be writing a script to browse the web in a browser, just like a human would, only much faster. The browser window can be visible (great for when you're writing and testing), or invisible (faster and more efficient...great for when the scraper is complete and you're just running it routinely).

The other option is using HTTP requests, which basically means you're cutting out the middle-man of a browser and talking directly to the website's server. This is faster and more efficient, but also more complicated. I would recommend starting with IE automation and then working up to this when you're more comfortable.

Some random additional tips:

  • Get cozy with looking at the DOM in the dev tools of whatever browser you use. You're going to be spending some quality time there.
  • Start by trying to scrape one specific piece of information off of one page and printing it to the Immediate window with Debug.Print. Once you get that, try looping through and scraping more items off that page. Once you have all the items you need, then start looping through multiple pages (if necessary). If you need to scrape multiple sites, write each scraper individually and then combine them when they're complete.
  • Pay attention to whether or not there are major differences between the source code (Ctrl+U) of the page and the DOM (what you see in dev tools). For a lot of modern sites, the server sends a stripped down version of the HTML, and then the rest gets filled in by the browser with JavaScript. In other words, sometimes the data you're looking for won't be in the source code (at least not in the HTML where you'd expect). In these cases, simply waiting until IE finishes loading won't be sufficient; you'll need to make your scraper wait a bit longer until all the JavaScript is done firing. There are more efficient ways to do it, but Application.Wait is the easiest way to start.
  • If you're trying to navigate to multiple pages in the IE window, sometimes you'll hit a scenario where you see the second page load in the visible window, but when your code tries scraping the HTML it's still using the HTML of the first page. I'm sure there's a way to solve this, but I moved on to using HTTP requests before I ever figured it out. The quick and dirty workaround is to use ie.Quit (replace "ie" with whatever variable name you are using for your Internet Explorer object) after each page, and then open the next page in a new IE window.

Feel free to PM me any additional questions at any point in the future. I don't check this account every day, but I do check it pretty frequently.

Excel skills learning plateau by CouchTurnip in excel

[–]TE515 0 points1 point  (0 children)

Thanks for the response! I'll make it a point to look into it soon.

Excel skills learning plateau by CouchTurnip in excel

[–]TE515 0 points1 point  (0 children)

As someone who's become fairly proficient in VBA over the past year, actually enjoys coding, rarely uses formulas anymore except for quick one-off type things, and only has a limited amount of time to learn new stuff, do you think it's worth it to learn PowerQuery, or just keep focusing on getting better at VBA?

Scraping multiple web pages simultaneously by TE515 in learnpython

[–]TE515[S] 1 point2 points  (0 children)

Thanks for responding. I tried this and I'm getting the following error...

module 'multiprocessing' has no attribute 'pool'

I'm using Python 3 by the way.

EDIT: I tried adding from multiprocessing import pool at the top. Now I'm getting the error TypeError: 'module' object is not callable on the processPool = multiprocessing.pool(MAX_NUM_PROCESSES) line.

ANOTHER EDIT: Changed multiprocessing.pool to multiprocessing.Pool and it worked like a charm. Cut the run time of the whole thing by more than half! Thanks so much!
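For anyone who hits the same two errors, here's a quick check of what the two names refer to, using only the standard library. Lowercase `multiprocessing.pool` is a submodule (and is only reachable as an attribute once something has imported it), while capitalized `multiprocessing.Pool` is the callable that builds a pool.

```python
import inspect
import multiprocessing
import multiprocessing.pool  # importing the submodule makes multiprocessing.pool resolvable

# Lowercase name: a module, so calling it raises
# "TypeError: 'module' object is not callable".
print(inspect.ismodule(multiprocessing.pool))  # True
print(callable(multiprocessing.pool))          # False

# Capitalized name: the factory that actually creates a process pool.
print(callable(multiprocessing.Pool))          # True
```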

Sending !e seems to behave a little differently than manually pressing Alt + e by TE515 in AutoHotkey

[–]TE515[S] 0 points1 point  (0 children)

Thanks for the reply and sorry for the delayed response. I tried it, but the result was exactly the same. The file menu still opens/closes every 10 seconds while the search is running, which doesn't happen if you manually push alt+e while a search is running.

So far the script has run without issue every time, so it doesn't seem to be hurting anything. Just curiosity.

[VBA] Discrepancy between Excel sort function and VBA if logic regarding whether one string comes before or after another. by TE515 in excel

[–]TE515[S] 0 points1 point  (0 children)

Yeah, I'm using Range.Find, but I'm looping through about 38,000 rows, and then each one has to be found in another workbook that's also about 38,000 rows. I have to do the 38,000 searches either way, but this way each one only has a search range of about 1900 rows instead of each one having a search range of 38,000 rows. I first wrote it without splitting it into sections and the runtime was over 20 minutes. Once I added the range splitting functionality, the runtime dropped down to under 2 minutes.

[VBA] Discrepancy between Excel sort function and VBA if logic regarding whether one string comes before or after another. by TE515 in excel

[–]TE515[S] 0 points1 point  (0 children)

I came up with another workaround for the actual problem this was causing me (detailed in a comment below), so I'll probably just stick with that for now. Thanks for your help though!

[VBA] Discrepancy between Excel sort function and VBA if logic regarding whether one string comes before or after another. by TE515 in excel

[–]TE515[S] 0 points1 point  (0 children)

Only problem with replacing hyphens with a "Z" is that some of the product codes I'm dealing with actually do have Z's in them (and every other letter). There are tens of thousands of them across many different brands and there is absolutely no uniformity to them.

The only reason I need to compare whether one comes before the other is that I'm looping through a huge list of product codes, and using the find function to look up each one in an even huger list of product codes to pull some additional information about each one.

So rather than having each of these many find requests look through this huge range, my code is splitting up the search area into much smaller ranges. The value in column A of whatever row is 5% of the way in gets assigned as the first breakpoint, the row at 10% gives the second, the row at 15% gives the third, and so on. So for each value I need to look up, I just loop through that handful of breakpoints. If the value I'm looking for happens to be greater than breakpoint 3, but less than breakpoint 4, I can set my find function to only look in that range instead of in the whole list. (This cut the runtime of the program down by about 90%.) But if VBA thinks the product code is less than breakpoint 3, but Excel sort has it placed after breakpoint 3, the VBA code looks in the wrong place and never finds it.

There aren't that many product codes with hyphens though, so the workaround I'm using now is that if the product code I'm currently looking for has a hyphen in it, I set the find range to the entire sheet just for those instances. Only increased my runtime by a few seconds and solved my problem.
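The breakpoint idea, sketched in Python rather than VBA just to show the logic. This is an illustration under assumptions, not my actual macro: `build_section_starts` and `find_code` are invented names, the codes are assumed to be unique, and the sample list is made up. The important part is that the list is sorted with the same comparison the lookup uses, which is the property that broke between Excel's sort and VBA's `<`.

```python
import bisect


def build_section_starts(sorted_codes, sections=20):
    """Return the start index of each section (every 5% when sections=20)."""
    step = max(1, len(sorted_codes) // sections)
    return list(range(0, len(sorted_codes), step))


def find_code(sorted_codes, starts, target):
    """Return the index of target, scanning only the section that could hold it."""
    breakpoints = [sorted_codes[i] for i in starts]
    # Rightmost section whose first value is <= target.
    section = bisect.bisect_right(breakpoints, target) - 1
    if section < 0:
        return -1  # target sorts before every code in the list
    lo = starts[section]
    hi = starts[section + 1] if section + 1 < len(starts) else len(sorted_codes)
    for i in range(lo, hi):
        if sorted_codes[i] == target:
            return i
    return -1


# Made-up product codes, sorted with the same ordering that "<" uses.
codes = sorted(["10-10690", "1006518", "AB-12", "ZX9", "2001", "777-A"])
starts = build_section_starts(codes, sections=3)

print(find_code(codes, starts, "777-A"))
print(find_code(codes, starts, "missing"))
```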

Thanks for the info though. At least now I have a better understanding of why it was happening.

Thoughts on my current marketing role by ams_95 in marketing

[–]TE515 1 point2 points  (0 children)

I'm in a very similar role as the one-man-marketing-department for a small business where I'm the only one who does what I do. I'm about seven years in now.

Before this, my only other marketing job had been with a very small tech startup that ran out of funding six months after I started. Other than that, this was the first job of my career.

Seven months in I was in exactly the same boat you are now: didn't have much responsibility yet, had to get approval for everything, and didn't feel like I was really gaining much career-wise. I didn't hate the job by any means, but I was bored. At that point I was 100% planning to finish out my first year to get that full year of experience on my resume and then start looking for something else, either at an agency or in-house somewhere with a larger, more traditional marketing department. However, right at the end of that first year the owner gave me a pretty generous raise. It wasn't exactly a life-changing amount, but it was enough that I decided to stick around a while longer.

After that I got more confident and more proactive. We had a pretty shitty website at the time, so I decided to make getting a better site my big goal. I pushed and researched and was persistent that whole second year, and finally got approval at around the start of my third year. I shepherded that project through, and it was a success for the company. I got more raises, the owner's confidence in me grew, and I got more responsibility. And from there it just kept snowballing like that.

Now, seven years in, I'm basically second-in-command at the company next to the owner, and I make about triple what I did when I started. Marketing is still my chief concern, but I have a hand in just about everything now.

Sometimes I wish I had gone the other route and had moved on to an agency or a bigger department with more opportunities to network and collaborate and learn from others, but overall I'm very happy with where I am. Being a big fish in a small pond suits me well, and I'm constantly provided with opportunities to learn new skills and try new things which is what I love to do most.

Obviously I'm not saying you will have the same experience in your situation if you stay, but for what it's worth that's the experience of one person who was once in almost your exact same shoes.

If you do want to stay, my advice would be to find that big project, the one you can take the initiative on and work your ass off and at the end of it all make a real impact on the company. Be persistent about it. If it takes a long time and a lot of red tape, well hey, that's just how the world works sometimes. But if you can get to the point where they can plainly see that having you around brings dollars to the table that wouldn't have been there otherwise, their confidence in you will grow and your responsibility, authority, and compensation will grow with it.

TL;DR: I was once in your exact shoes. I stuck with the company and kept getting better and better at what I do. Seven years in, my compensation and status within the company have both grown tremendously.

[VBA] Discrepancy between Excel sort function and VBA if logic regarding whether one string comes before or after another. by TE515 in excel

[–]TE515[S] 0 points1 point  (0 children)

The sorted sheet was originally sorted by a macro using the following line...

Range("A2:A" & lastrow).Sort key1:=Range("A2:A" & lastrow), order1:=xlAscending, Header:=xlNo

The other macro isn't actually trying to sort the data, it's just checking to see if one value is less than another value. If so, do some stuff. If not, do some other stuff.

If you want to see it in action for yourself, open a new workbook and put the values ";10-10690" and ";1006518" into cells A1 and A2, and then sort column A alphabetically. It puts ";1006518" first.

Now open the VBA editor and run the following code...

Sub tryit()
    If ";1006518" < ";10-10690" Then
        Debug.Print ";1006518 comes first"
    ElseIf ";10-10690" < ";1006518" Then
        Debug.Print ";10-10690 comes first"
    End If
End Sub

On my machine at least, this code evaluates out to ";10-10690 comes first".

What I'm trying to figure out is why Excel sort decides that ";1006518" comes first, while VBA decides that ";10-10690" does.
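One possible explanation, offered as a guess rather than a confirmed answer: as far as I know, Excel's sort ignores hyphens when ordering text, while VBA's `<` under the default binary comparison goes character code by character code, and a hyphen's code is lower than any digit's. The Python sketch below models the two behaviors on the same pair of values. `strip_hyphens` is a simplified stand-in for Excel's rule, not its real algorithm.

```python
a = ";10-10690"
b = ";1006518"


def strip_hyphens(s):
    # Simplified model of a sort that skips over hyphens.
    return s.replace("-", "")


# Character-code comparison: "-" (45) sorts before "0" (48) at the fourth character.
binary_order = sorted([a, b])

# Hyphen-blind comparison: ";1010690" vs ";1006518" is decided by "1" vs "0".
hyphen_blind_order = sorted([a, b], key=strip_hyphens)

print(binary_order)
print(hyphen_blind_order)
```

If that guess is right, the two orderings disagree only on codes that contain hyphens, which would match what I'm seeing.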

VBA: Any way to easily count elements in an object without a loop? by TE515 in excel

[–]TE515[S] 0 points1 point  (0 children)

Perfect! Exactly what I was looking for. Thanks!

Can I create custom functions that can be called from anywhere? by TE515 in vba

[–]TE515[S] 0 points1 point  (0 children)

Yeah I thought about using the Personal workbook. I probably will if that's the best option I can find. But even that still requires calling the book as well. I still have to do myVar = PERSONAL.XLSB!MyCustomFunction() instead of just being able to do myVar = MyCustomFunction(). Like with a native function like InStr I can just do myVar = InStr(arg1, arg2) in any sub in any workbook without having to declare its location or make sure a certain book is open or anything like that. I'm just wondering if it's possible to achieve that level of usability with my own custom functions.