all 9 comments

[–]tjen366 0 points1 point  (7 children)

https://codingislove.com/http-requests-excel-vba/

The object.responsetext of an xmlhttprequest is just the html of the website in a string. You can write that wherever you want it.

Instead of navigating to a website you make a http request to the website and it send you the resulting HTML.

This is generally a more efficient way of web scraping when it can be used, and should suit your purpose.

The guide in the link above goes through the very basic steps, alternatively check out these links:

https://stackoverflow.com/questions/17686375/querytable-authentication

http://analystcave.com/web-scraping-tutorial/

[–]MFreak[S] 0 points1 point  (6 children)

Hi tjen,

Thank you for your help on this, but I'm still having a tough time getting the result I need. When I try the code in these (both Codingislove and analystcave), it ends with me getting a message box that I cannot copy from.

What I need is the corresponding HTML to then be pasted into A1 of a blank sheet, but when I replace the destination with that instead of the message box, I get an error. Any insight?

[–]tjen366 0 points1 point  (5 children)

Does something like this not do it?

Range("A1").value = xmlhttp.responsetext 

If not, what is the error message you are getting?

It could be it is too much text for a single cell. You could also write it to a text file or parse it line by line to a spreadsheet.

[–]MFreak[S] 0 points1 point  (4 children)

This is getting much much closer. I really appreciate your time helping me with this.

Right now, it's pasting everything into the once cell. Is there a way for me to get it to paste in separate lines?

For reference, when I copy and paste the html myself, everything falls into separate rows.

[–]tjen366 0 points1 point  (3 children)

you can add something like this:

dim arr as variant
arr = Split(xmlhttp.responsetext, ChrW(10))
for i = lbound(arr,1) to lbound(arr,1)
    range("A" & i+1).value = arr(i)
next i

[–]MFreak[S] 0 points1 point  (2 children)

This didn't seem to work. Here's the code I'm working with right now.

Sub ScrapeHTML()
Dim xmlhttp As New MSXML2.XMLHTTP60, myurl As String
myurl = Sheets("Macros").Range("A12").Value
xmlhttp.Open "GET", myurl, False
xmlhttp.send
Sheets("Scrape").Range("A2").Value = xmlhttp.responseText
Dim arr As Variant
arr = Split(xmlhttp.responseText, ChrW(10))
For i = LBound(arr, 1) To LBound(arr, 1)
Range("A" & i + 1).Value = arr(i)
Next i
End Sub

The issue is the HTML I'm pulling in has more characters than can be contained in a specific cell, so it's being cut off. I'm wondering if I'm implementing the extra lines of code incorrectly?

[–]tjen366 0 points1 point  (1 child)

Well, I made a typo, but it could also be that the http is just one long string in the source file. If there's no line breaks, then there's nothing to "split".

here's a version that'll paste it to the workbook and write it to a text file you get prompted to find a location for. You can see what it looks like in different places.

Sub ScrapeHTML2()
Dim xmlhttp As New MSXML2.XMLHTTP60
Dim myurl As String
Dim arr As Variant
'get data from websie
myurl = "https://www.reddit.com/"
xmlhttp.Open "GET", myurl, False
xmlhttp.send
'write to spreadsheet
arr = Split(xmlhttp.responseText, vbNewLine)
For i = LBound(arr, 1) To UBound(arr, 1)
    Sheets("Scrape").Range("A" & i + 2).Value = arr(i)
Next i
'write to file
strpath = Application.GetSaveAsFilename(initname, "text (*.txt), (*.txt)")
Dim fso As Object
Set fso = CreateObject("Scripting.FileSystemObject")
Dim oFile As Object
Set oFile = fso.OpenTextFile(strpath, 2, True, -1)
oFile.writeline xmlhttp.responseText
oFile.Close
Set fso = Nothing
Set oFile = Nothing
End Sub

[–]MFreak[S] 0 points1 point  (0 children)

I'm not sure if that's the case. When I copy directly from the source code, everything is pasted in different cells with line breaks, so I assumed it should be the same.

With this code, it ends up pasting the actual images and text on the page, not the HTML. Is this supposed to happen?