Scraping Web Page Data Automatically with Excel VBA

Home More Excel (VBA) Training Videos








Scraping data from web pages can be done in a variety of different ways including the built in feature 'Get  Data from Web' in MS Excel. But this method of extracting data from web pages may not work in many situations. Also we learnt that we can scrape data from web pages like the jobs.com website quickly and easily because the web design of the website allows us to access the web pages and data easily using VBA. But how do you extract data from a website which has a difficult form - the name or ID of the button are not given. The data is hidden in tables upon tables and sometimes within nested table. Here adopt a different strategy although the study of the source code of the web page remains the most important element. A button can be hidden inside a form or connect to the data source via a javascript print command. So study the source code carefully and then device your plan of action. Amazing articles heve been written by Joćo Neto.
In our case we tried to extract data from another very important jobs website monster.com. The button was difficult to locate and could be clicked only indirectly via the form object. The Excel VBA code below describes the complete process:
Sub clickFormButton()
'we define the essential variables
Dim ie As Object
Dim form As Variant, button As Variant

'add the "Microsoft Internet Controls" reference in your VBA Project indirectly
Set ie = CreateObject("InternetExplorer.Application")

'more variables for the inputboxes - makes our automation program user friendly
myjobtype = InputBox("Enter type of job, eg. sales, administration")
myexperience = InputBox("enter your no of years experience, for example, 3")
mycity = InputBox("Enter the city where you wish to work")

With ie

.Visible = True
.navigate ("http://www.monsterindia.com")

' we ensure that the web page downloads completely before we fill the form automatically
While ie.ReadyState <> 4
DoEvents
Wend

'assigning the vinput variables to the html elements of the form
ie.document.getelementsbyname("fts").Item.innertext = myjobtype
ie.document.getelementsbyname("exp").Item(0).Value = myexperience
ie.document.getelementsbyname("lmy").Item.innertext = mycity
' accessing the button via the form
Set form = ie.document.getElementsbytagname("form")

Set button = form(0).onsubmit
form(0).submit

' again ensuring that the web page loads completely before we start scraping data
Do While ie.busy: DoEvents: Loop

Set TDelements = .document.getElementsbytagname("td")
r = 0
c = 0

For Each TDelement In TDelements
Sheet1.Range("A1").Offset(r, c).Value = TDelement.innertext
r = r + 1
Next

End With

' cleaning up memory
Set ie = Nothing

End Sub

Watch the Excel training video below to see how we can scrape data from difficult web pages using Excel vba quickly and easily:


Once you have the data you need to check whether it has blank rows. You can use the following VBA code to remove the blank rows quickly and easily:

Sub deletblankrows()
lastrow = ActiveSheet.Range("A" & Rows.Count).End(xlUp).Row
For i = 1 To LastRow
If Cells(i, 1) = "" Then
Cells(i, 1).EntireRow.Delete
End If


Next

End Sub

Note: you may have to run the macro several times to delete all the blank rows!

We can also use Autofilter to remove the blank rows:

sub myautofilter()
Range(Selection, Selection.End(xlDown)).Select
   
    Selection.AutoFilter
    Range("A1").Select
    Range(Selection, Selection.End(xlDown)).Select
    Selection.AutoFilter
    ActiveSheet.Range("A:A").AutoFilter Field:=1, Criteria1:="<>"
    Selection.Copy
    Range("B1").Select
    ActiveSheet.Paste
    Columns("A:A").Select
    Application.CutCopyMode = False
    Selection.ClearContents
    Range("B1").Select
    ActiveWorkbook.Save
End Sub

To extract the required based on a keyword like  "Administration"  you can use the following VBA code:

Sub extractDatatoNeighboringColumn()
lastrow = ActiveSheet.Range("A" & Rows.Count).End(xlUp).Row
For i = 1 To LastRow
If InStr(Cells(i, 1).Value, "Administration") Then
Cells(i, 2).Value = Cells(i, 1).Value
End If
Next
End Sub
Watch this video on YouTube
Reference
Home More Excel (VBA) Training Videos