I am going to show you 2 ways of scraping data from a website. One with the standard approach and 1 with JSON.
For this example, I want the stock numbers from yahoo in my worksheet.
This data is rendered in a table format so I can use the following code to bring it back to my worksheet:
Sub ScrapeSite()
'3/8/18 - this is really laggy and slow!
Dim appIE As Object
Set appIE = CreateObject("InternetExplorer.Application")
'navigate to the site
With appIE
.navigate "https://finance.yahoo.com/quote/%5EGSPC/history?p=%5EGSPC"
'.Visible = True
End With
' Wait while the page is loading
Do While appIE.Busy
DoEvents
Loop
Dim Table As Object
Dim tRows As Object
Dim tCols As Object
Dim tRow As Object
Dim tCol As Object
Dim intExcelCol As Integer
Dim intExcelRow As Integer
Set Table = appIE.document.getElementsByTagName("table")
'these debug statements are to tell us where we are in the process.
Debug.Print "Set table..."
'Get the rows
'set tRows = Table(0).getElementsByTagName("tr")
'sometimes the code stalls and receives an error, but just check your internet connection and rerun the procedure.
'Debug.Print "Got Rows..."
'Get the column headings which use the "td" tag
Set tCols = Table(0).getElementsByTagName("td")
Debug.Print "Got Columns..."
intExcelRow = 2
intExcelCol = 1
For Each tCol In tCols
'after filling in 8 columns move to the next row
If intExcelCol <= 7 Then
' Output the contents of the cell to the spreadsheet
Debug.Print tCol.innerText
Sheet1.Cells(intExcelRow, intExcelCol) = tCol.innerText
'Increase the intExcelCol value so the next time around the data will output to the next column
intExcelCol = intExcelCol + 1
Else
intExcelRow = intExcelRow + 1
intExcelCol = 0
Debug.Print "Row: " & intExcelRow
End If
Next
appIE.Quit
Set tRow = Nothing
Set tRows = Nothing
Set tCol = Nothing
Set tCols = Nothing
Set appIE = Nothing
MsgBox "Done"
End Sub
This method worked buy was very clunky and slow.
One of the big things with web scraping involves getting the tables and rows in the table, and then returning the items back to your sheet, and that’s what the example above does.
This slow and sketchy method wouldn’t work for me. So I needed another way.
The data I want is really just a JSON string.
JSON supposedly replaced XML as a standard way of transferring data.
Anyway that’s fine, but Yahoo is speedily rendering stock quotes via a JSON string.
So what we are looking for is the data after the “HistoricalPriceStore” text. If you want to know if you are dealing with a proper JSON string, paste your string into http://json.parser.online.fr/
So, yes, that assessment is correct. So you can use the following faster code to scrape the information:
Sub ParseMultipleRowsJSON()
Dim http As New XMLHTTP 'ms xml object library 3.0
Dim html As New HTMLDocument 'ms html object library
Dim varResponse As Variant
Dim dict As Dictionary 'reference ms scripting runtime for this
Dim intJSONRows As Integer
Dim intJSONCols As Integer
Dim strDate As String
strurl = "https://finance.yahoo.com/quote/%5EGSPC/history?p=%5EGSPC"
'json online parser at: http://json.parser.online.fr/ (to see if your text is a valid json string)
With http
.Open "GET", strurl, False
.send
'varResponse(1) has the json string
varResponse = Split(.responseText, "HistoricalPriceStore"":")
End With
'this is using the JsonConverter.bas found at https://github.com/VBA-tools/VBA-JSON (just copy and paste into a module)
Set dict = ParseJson(varResponse(1))
'initialize the excel destination vars
intExcelRow = 2
intExcelCol = 1
Set JSONRowsCollection = dict("prices")
For intJSONRows = 1 To JSONRowsCollection.Count
Set JSONColsCollection = JSONRowsCollection(intJSONRows)
'For intJSONCols = 1 To JSONColsCollection.Count
'If intExcelCol > 7 Then
' intExcelCol = 1
'Else
'(((A1/60)/60)/24)+DATE(1970,1,1) - convert the unix date to regular number 43532.60417
strdate1 = JSONColsCollection("date") / 60 / 60 / 24
strDate = strdate1 + DateValue("1/1/1970")
'now format the date as date
Sheet1.Cells(intExcelRow, 1) = strDate 'Unix Timestamp date - need to convert
Sheet1.Cells(intExcelRow, 2) = JSONColsCollection("open")
Sheet1.Cells(intExcelRow, 3) = JSONColsCollection("high")
Sheet1.Cells(intExcelRow, 4) = JSONColsCollection("low")
Sheet1.Cells(intExcelRow, 5) = JSONColsCollection("close")
Sheet1.Cells(intExcelRow, 6) = JSONColsCollection("volume")
Sheet1.Cells(intExcelRow, 7) = JSONColsCollection("adjclose")
'Debug.Print JSONColsCollection("date")
'End If
'intExcelCol = intExcelCol + 1
'Next intJSONCols
intExcelRow = intExcelRow + 1
Next intJSONRows
'clean up
Set http = Nothing
Set html = Nothing
End Sub
Your end product should look like this:
(no, I didn’t author all of this code, I got the ideas from various places on the internet, and put all the “pieces” together 🙂 . So kudos to all those wonderful people out there, which share their knowledge with us. Thanks!)
Let me know if you have any questions.
****************************************************
|
|



