I am going to show you 2 ways of scraping data from a website. One with the standard approach and 1 with JSON.
For this example, I want the stock numbers from yahoo in my worksheet.
This data is rendered in a table format so I can use the following code to bring it back to my worksheet:
Sub ScrapeSite() '3/8/18 - this is really laggy and slow! Dim appIE As Object Set appIE = CreateObject("InternetExplorer.Application") 'navigate to the site With appIE .navigate "https://finance.yahoo.com/quote/%5EGSPC/history?p=%5EGSPC" '.Visible = True End With ' Wait while the page is loading Do While appIE.Busy DoEvents Loop Dim Table As Object Dim tRows As Object Dim tCols As Object Dim tRow As Object Dim tCol As Object Dim intExcelCol As Integer Dim intExcelRow As Integer Set Table = appIE.document.getElementsByTagName("table") 'these debug statements are to tell us where we are in the process. Debug.Print "Set table..." 'Get the rows 'set tRows = Table(0).getElementsByTagName("tr") 'sometimes the code stalls and receives an error, but just check your internet connection and rerun the procedure. 'Debug.Print "Got Rows..." 'Get the column headings which use the "td" tag Set tCols = Table(0).getElementsByTagName("td") Debug.Print "Got Columns..." intExcelRow = 2 intExcelCol = 1 For Each tCol In tCols 'after filling in 8 columns move to the next row If intExcelCol <= 7 Then ' Output the contents of the cell to the spreadsheet Debug.Print tCol.innerText Sheet1.Cells(intExcelRow, intExcelCol) = tCol.innerText 'Increase the intExcelCol value so the next time around the data will output to the next column intExcelCol = intExcelCol + 1 Else intExcelRow = intExcelRow + 1 intExcelCol = 0 Debug.Print "Row: " & intExcelRow End If Next appIE.Quit Set tRow = Nothing Set tRows = Nothing Set tCol = Nothing Set tCols = Nothing Set appIE = Nothing MsgBox "Done" End Sub
This method worked buy was very clunky and slow.
One of the big things with web scraping involves getting the tables and rows in the table, and then returning the items back to your sheet, and that’s what the example above does.
This slow and sketchy method wouldn’t work for me. So I needed another way.
The data I want is really just a JSON string.
JSON supposedly replaced XML as a standard way of transferring data.
Anyway that’s fine, but Yahoo is speedily rendering stock quotes via a JSON string.
So what we are looking for is the data after the “HistoricalPriceStore” text. If you want to know if you are dealing with a proper JSON string, paste your string into http://json.parser.online.fr/
So, yes, that assessment is correct. So you can use the following faster code to scrape the information:
Sub ParseMultipleRowsJSON() Dim http As New XMLHTTP 'ms xml object library 3.0 Dim html As New HTMLDocument 'ms html object library Dim varResponse As Variant Dim dict As Dictionary 'reference ms scripting runtime for this Dim intJSONRows As Integer Dim intJSONCols As Integer Dim strDate As String strurl = "https://finance.yahoo.com/quote/%5EGSPC/history?p=%5EGSPC" 'json online parser at: http://json.parser.online.fr/ (to see if your text is a valid json string) With http .Open "GET", strurl, False .send 'varResponse(1) has the json string varResponse = Split(.responseText, "HistoricalPriceStore"":") End With 'this is using the JsonConverter.bas found at https://github.com/VBA-tools/VBA-JSON (just copy and paste into a module) Set dict = ParseJson(varResponse(1)) 'initialize the excel destination vars intExcelRow = 2 intExcelCol = 1 Set JSONRowsCollection = dict("prices") For intJSONRows = 1 To JSONRowsCollection.Count Set JSONColsCollection = JSONRowsCollection(intJSONRows) 'For intJSONCols = 1 To JSONColsCollection.Count 'If intExcelCol > 7 Then ' intExcelCol = 1 'Else '(((A1/60)/60)/24)+DATE(1970,1,1) - convert the unix date to regular number 43532.60417 strdate1 = JSONColsCollection("date") / 60 / 60 / 24 strDate = strdate1 + DateValue("1/1/1970") 'now format the date as date Sheet1.Cells(intExcelRow, 1) = strDate 'Unix Timestamp date - need to convert Sheet1.Cells(intExcelRow, 2) = JSONColsCollection("open") Sheet1.Cells(intExcelRow, 3) = JSONColsCollection("high") Sheet1.Cells(intExcelRow, 4) = JSONColsCollection("low") Sheet1.Cells(intExcelRow, 5) = JSONColsCollection("close") Sheet1.Cells(intExcelRow, 6) = JSONColsCollection("volume") Sheet1.Cells(intExcelRow, 7) = JSONColsCollection("adjclose") 'Debug.Print JSONColsCollection("date") 'End If 'intExcelCol = intExcelCol + 1 'Next intJSONCols intExcelRow = intExcelRow + 1 Next intJSONRows 'clean up Set http = Nothing Set html = Nothing End Sub
Your end product should look like this:
(no, I didn’t author all of this code, I got the ideas from various places on the internet, and put all the “pieces” together 🙂 . So kudos to all those wonderful people out there, which share their knowledge with us. Thanks!)
Let me know if you have any questions.
****************************************************
|