The Invoke-WebRequest cmdlet is a command in PowerShell that allows you to send HTTP and HTTPS requests to web servers and retrieve the response. It is primarily used for web scraping, automating web tasks, and interacting with web-based APIs.
With Invoke-WebRequest, you can perform various actions such as downloading web content, submitting forms, sending headers, handling cookies, and more. It provides a way to interact with web pages and retrieve HTML or other data from them.
Invoke-WebRequest: The Basics
This cmdlet was introduced in Windows PowerShell 3.0. It has different aliases depending on the PowerShell edition you’re using.
In Windows PowerShell, the Invoke-WebRequest cmdlet aliases are curl, iwr, and wget.
In PowerShell Core, the alias is limited to iwr.
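If you are unsure which aliases your session defines, a quick check like the one below lists them. This is only a minimal sketch; the $aliases variable name is just for illustration, and the output depends on the edition you run it in.

```powershell
# List every alias that points to Invoke-WebRequest in the current session.
$aliases = (Get-Alias -Definition Invoke-WebRequest).Name
$aliases
```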
The most basic use of Invoke-WebRequest is retrieving a webpage. The only required information is the URL.
Invoke-WebRequest -Uri <website URL>
For example, I’ll download the https://theitbros.com website and save it to the $web variable.
Note. If the command hangs in Windows PowerShell, append the -UseBasicParsing switch:
$web = Invoke-WebRequest -Uri https://theitbros.com -UseBasicParsing
Next, let’s find out the resulting data type:
$web.GetType()
The resulting object is a BasicHtmlWebResponseObject data type.
What properties does this object have?
$web | Get-Member -MemberType Properties
You can already guess from the names of the listed properties what type of data each one holds.
For example, the Content property contains the HTML code of the webpage, while the RawContent property contains the same HTML code preceded by the response headers.
The Images and Links properties contain information about the images and links on the website.
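To get a quick feel for these properties, you could collect a few of them into one object. The sketch below is only an illustration: Get-ResponseSummary is a hypothetical helper name (not part of PowerShell), and it assumes the object passed in exposes the StatusCode, Content, Links, and Images properties.

```powershell
# Hypothetical helper: summarize a response object returned by Invoke-WebRequest.
function Get-ResponseSummary {
    param(
        [Parameter(Mandatory)]
        $Response
    )

    [pscustomobject]@{
        StatusCode    = $Response.StatusCode          # HTTP status code, e.g. 200
        ContentLength = $Response.Content.Length      # number of characters in the body
        LinkCount     = @($Response.Links).Count      # @() guards against a single item
        ImageCount    = @($Response.Images).Count
    }
}

# Usage (requires network access):
# $web = Invoke-WebRequest -Uri https://theitbros.com -UseBasicParsing
# Get-ResponseSummary -Response $web
```

Wrapping Links and Images in @() keeps the counts consistent whether a page has none, one, or many of them.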
Web Scraping
Web scraping refers to the automated process of extracting information or data from websites. It involves writing code to retrieve specific data from web pages and saving it in a structured format, such as a spreadsheet or a database.
Web scraping enables you to gather data from multiple sources quickly and efficiently without the need for manual copying and pasting.
Web scraping with Invoke-WebRequest, or any other tool, is not a one-size-fits-all process. Besides code-analysis skills, you need patience.
Note. The following web-scraping examples using Invoke-WebRequest work as of this writing. However, they may eventually stop working when the target websites change.
Example: Extract the Article Title and Links
Suppose I want to get a list of all articles shown on the https://theitbros.com homepage.
The first thing I’ll do is run the following command to store the webpage in a variable:
$s = Invoke-WebRequest -Uri https://theitbros.com
Next, I’ll inspect which properties I can use:
$s.Links | Get-Member -MemberType Properties
According to the screenshot below, the properties are:
- href: contains the bare URL address.
- outerHTML: contains the HTML hyperlink code.
- tagName: contains the name of the HTML tag.
Now that I know the properties, I can simply run the below command to get all URLs.
$s.Links | Select-Object href
As you can see above, the href property lists all URLs found on the website, including links to non-articles. So I need to filter out some URLs that are not relevant.
When I analyzed the URLs, I realized that I needed to exclude the following:
- The website’s root URL — https://theitbros.com/
- URLs with these parts:
  - /about-the-authors/
  - /author/
  - /category/
  - /contact-us/
  - /privacy-policy/
- HTML A tags that don’t start with <a href="https://theitbros.com
The below code translates that logic into a PowerShell filter using Where-Object.
$s.Links | Where-Object {
    $_.href -ne "https://theitbros.com/" -and
    $_.href -notlike "*/privacy-policy/*" -and
    $_.href -notlike "*/category/*" -and
    $_.href -notlike "*/author/*" -and
    $_.href -notlike "*/about-the-authors/*" -and
    $_.href -notlike "*/contact-us/*" -and
    $_.outerHTML -like "<a href=`"https://theitbros.com*"
}
When I ran the above command, I got the below result.
At this point, the result is clearer. But I only want to scrape the article title and URL. In this case, I need to extract the title between the > and < characters. With that in mind, I’ll use a RegEx match with this pattern: '>(.*?)<'.
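To see what that pattern captures, here is a small check against a made-up outerHTML string. The $sampleOuterHtml value and its article path are invented for illustration and are not taken from the live site.

```powershell
# Made-up anchor tag in the shape of the outerHTML property.
$sampleOuterHtml = '<a href="https://theitbros.com/sample-article/">Sample Article Title</a>'

# The lazy group (.*?) captures the text between the first '>' and the next '<'.
$title = [regex]::Match($sampleOuterHtml, '>(.*?)<').Groups[1].Value
$title

# If the anchor wraps another tag, the first '>' is followed directly by '<',
# so the capture is an empty string.
$nestedOuterHtml = '<a href="https://theitbros.com/sample-article/"><span>Sample</span></a>'
$nestedTitle = [regex]::Match($nestedOuterHtml, '>(.*?)<').Groups[1].Value
```

The second case shows why this pattern only suits links whose text sits directly inside the A tag.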
Below is the updated code.
# Get TheITBros.com article links
## Store the page in a variable
$s = Invoke-WebRequest -Uri https://theitbros.com -Verbose
## Filter links
$s.Links | Where-Object {
    $_.href -ne "https://theitbros.com/" -and
    $_.href -notlike "*/privacy-policy/*" -and
    $_.href -notlike "*/category/*" -and
    $_.href -notlike "*/author/*" -and
    $_.href -notlike "*/about-the-authors/*" -and
    $_.href -notlike "*/contact-us/*" -and
    $_.outerHTML -like "<a href=`"https://theitbros.com*"
} | Select-Object `
    @{ n = 'Title'; e = { $(([regex]::Match($_.outerHTML, '>(.*?)<')).Groups[1].Value) } },
    @{ n = 'URL'; e = { $_.href } }
The result is shown below.
That’s how you can use the Invoke-WebRequest cmdlet to perform web scraping.
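As the list of exclusions grows, the long chain of -and conditions can get hard to maintain. One possible alternative, sketched below, keeps the patterns in an array and tests each link against them. Test-ArticleLink, $excludedPatterns, and the -Site parameter are hypothetical names introduced for this sketch; the patterns themselves are copied from the filter used in this example.

```powershell
# Exclusion patterns copied from the Where-Object filter in this example.
$excludedPatterns = @(
    '*/privacy-policy/*'
    '*/category/*'
    '*/author/*'
    '*/about-the-authors/*'
    '*/contact-us/*'
)

# Hypothetical helper: returns $true when a link object looks like an article link.
function Test-ArticleLink {
    param(
        [Parameter(Mandatory)]
        $Link,
        [string[]] $Exclude = $excludedPatterns,
        [string] $Site = 'https://theitbros.com'
    )

    # Skip the site root.
    if ($Link.href -eq "$Site/") { return $false }

    # Skip anchors whose HTML does not start with a link to the site.
    if ($Link.outerHTML -notlike "<a href=`"$Site*") { return $false }

    # Skip any URL that matches one of the excluded patterns.
    foreach ($pattern in $Exclude) {
        if ($Link.href -like $pattern) { return $false }
    }

    return $true
}

# Usage (requires network access):
# $s = Invoke-WebRequest -Uri https://theitbros.com
# $s.Links | Where-Object { Test-ArticleLink -Link $_ }
```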
Example: Run a Google Search and Extract the Title and URL
Using the same approach as the previous example, here’s a working Invoke-WebRequest example on how to run a Google search and retrieve the title and URL from the results.
Note. This example was tested only in Windows PowerShell.
# Run a Google search
$r = Invoke-WebRequest -Uri https://google.com/search `
    -Body @{q = 'Recover Domain Controller' }
$r.Links | Where-Object { $_.InnerHtml -match 'class=DnJfK' } | Select-Object `
    @{n = 'Title'; e = { ($_.innerText).Split("`n")[0] } },
    @{n = 'URL'; e = { ($_.href).Replace('/url?q=', '').Split('&')[0] } }
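The URL clean-up in that last expression is easier to follow on a single value. The sketch below applies the same Replace and Split steps to a made-up href; the $sampleHref value, including its sa and ved parameters, is invented for illustration. The final decoding step is an optional addition that is not part of the example above.

```powershell
# Made-up href in the shape the example above expects from a search result link.
$sampleHref = '/url?q=https://theitbros.com/how-to-restore-domain-controller-from-backup/&sa=U&ved=abc123'

# Strip the redirect prefix, then drop everything from the first '&' onward.
$url = $sampleHref.Replace('/url?q=', '').Split('&')[0]

# Optional: decode percent-encoded characters, if the URL contains any.
$url = [uri]::UnescapeDataString($url)
$url
```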
Resolving Shortened URLs
Are you familiar with URL shorteners like TinyURL, Bitly, and ShortURL? Their purpose is simple — to shorten URLs.
For example, these short URLs resolve to — https://theitbros.com/how-to-restore-domain-controller-from-backup/.
- https://tinyurl.com/yckmmarn
- https://bit.ly/3OpOZrB
- https://shorturl.at/JRU27
Invoke-WebRequest can help extract the original URL to which these short URLs redirect.
First, let’s run the following Invoke-WebRequest command using one of the shortened URLs:
$shortUrl = 'https://tinyurl.com/yckmmarn'
$result = Invoke-WebRequest -Uri $shortUrl -UseBasicParsing
On Windows PowerShell, you can extract the original URL from the BaseResponse.ResponseUri.AbsoluteUri property.
On PowerShell Core, the original URL is in the BaseResponse.RequestMessage.RequestUri.AbsoluteUri property.
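If a script has to run on both editions, one way to handle the difference is to check which of the two properties the response actually exposes. The sketch below assumes the property paths named above; Get-ResolvedUrl is a hypothetical helper name, not a built-in cmdlet.

```powershell
# Hypothetical helper: return the final URL from an Invoke-WebRequest response,
# using whichever property path the current edition provides.
function Get-ResolvedUrl {
    param(
        [Parameter(Mandatory)]
        $Response
    )

    $base = $Response.BaseResponse

    # PowerShell Core: BaseResponse carries a RequestMessage property.
    if ($base.PSObject.Properties['RequestMessage']) {
        return $base.RequestMessage.RequestUri.AbsoluteUri
    }

    # Windows PowerShell: BaseResponse carries a ResponseUri property.
    return $base.ResponseUri.AbsoluteUri
}

# Usage (requires network access):
# $result = Invoke-WebRequest -Uri 'https://tinyurl.com/yckmmarn' -UseBasicParsing
# Get-ResolvedUrl -Response $result
```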
Downloading Files
The Invoke-WebRequest cmdlet also lets you download files and save them to a path. To do so, you must append the -OutFile <FilePath> parameter.
For example, the command below downloads the file from ‘http://speedtest.ftp.otenet.gr/files/test100k.db’ to the current directory.
$fileURL = 'http://speedtest.ftp.otenet.gr/files/test100k.db'
Invoke-WebRequest -Uri $fileURL -OutFile $($fileURL).Split('/')[-1]
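Splitting on '/' works for plain URLs like this one, but it keeps any query string in the file name. A slightly more defensive variant, sketched below, parses the URL first. Get-FileNameFromUrl is a hypothetical helper name, and the ?token=abc query string in the comment is invented for illustration.

```powershell
# Hypothetical helper: derive a file name from a URL, ignoring any query string.
function Get-FileNameFromUrl {
    param(
        [Parameter(Mandatory)]
        [string] $Url
    )

    # LocalPath holds only the path part of the URL, e.g. /files/test100k.db
    [System.IO.Path]::GetFileName(([uri]$Url).LocalPath)
}

# A URL such as .../test100k.db?token=abc would still yield test100k.db here.
$fileURL = 'http://speedtest.ftp.otenet.gr/files/test100k.db'
$fileName = Get-FileNameFromUrl -Url $fileURL

# Usage (requires network access):
# Invoke-WebRequest -Uri $fileURL -OutFile $fileName
```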
If the source requires authentication, you can do so using the following examples.
Basic Authentication
To perform basic authentication, use the -Credential parameter and provide the PSCredential object:
$credential = Get-Credential
Invoke-WebRequest `
    -Uri $fileURL `
    -OutFile $($fileURL).Split('/')[-1] `
    -Credential $credential
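With -Credential, the cmdlet may wait for the server to challenge the request before sending the credentials. If a server expects the Authorization header on the very first request, one common workaround is to build the header yourself and pass it with the -Headers parameter. The sketch below is an assumption-laden illustration: New-BasicAuthHeader is a hypothetical helper name, and the user and pass values are placeholders. It also handles the password as plain text, so treat it as a demonstration only.

```powershell
# Hypothetical helper: build a Basic Authorization header from a user name and password.
function New-BasicAuthHeader {
    param(
        [Parameter(Mandatory)] [string] $UserName,
        [Parameter(Mandatory)] [string] $Password
    )

    # Basic authentication is the Base64 encoding of "username:password".
    $pair  = '{0}:{1}' -f $UserName, $Password
    $token = [Convert]::ToBase64String([Text.Encoding]::UTF8.GetBytes($pair))

    @{ Authorization = "Basic $token" }
}

# Placeholder credentials for illustration only.
$headers = New-BasicAuthHeader -UserName 'user' -Password 'pass'

# Usage (requires network access and a server that accepts basic authentication):
# Invoke-WebRequest -Uri $fileURL -OutFile $($fileURL).Split('/')[-1] -Headers $headers
```

Because Base64 is an encoding and not encryption, a header like this should only travel over HTTPS.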
Certificate Authentication
To perform certificate-based authentication, you can use the -CertificateThumbprint parameter and specify the thumbprint of the certificate you want to use. The thumbprint must be valid, and the certificate must exist in the personal certificate store.
$CertificateThumbprint = '0C251EA5C919AE134AA8494AA7FE15DE3A5E3635'
Invoke-WebRequest `
    -Uri $fileURL `
    -OutFile $($fileURL).Split('/')[-1] `
    -CertificateThumbprint $CertificateThumbprint
You can also use the -Certificate parameter, followed by the X509Certificate2 object:
$Certificate = Get-Item Cert:\CurrentUser\My\0C251EA5C919AE134AA8494AA7FE15DE3A5E3635
Invoke-WebRequest `
    -Uri $fileURL `
    -OutFile $($fileURL).Split('/')[-1] `
    -Certificate $Certificate
NTLM/Kerberos Authentication
If the server uses NTLM or Kerberos authentication, you can append the -UseDefaultCredentials switch instead. This switch uses the currently logged-on user’s credentials.
$fileURL = 'http://speedtest.ftp.otenet.gr/files/test100k.db'
Invoke-WebRequest `
    -Uri $fileURL `
    -OutFile $($fileURL).Split('/')[-1] `
    -UseDefaultCredentials
Conclusion
In conclusion, the Invoke-WebRequest PowerShell cmdlet is a powerful tool that allows developers and system administrators to send HTTP and HTTPS requests programmatically. With its rich set of features and user-friendly syntax, it simplifies the process of interacting with web resources, retrieving data, and automating web-based tasks.
We explored the fundamental concepts of making HTTP and HTTPS requests using Invoke-WebRequest, starting with the basic syntax and parameters. By examining the response object, we gained insights into the server’s response status, headers, and content.
By harnessing the power of Invoke-WebRequest, developers and system administrators can streamline their workflows, enhance productivity, and unlock a world of possibilities in their PowerShell scripting endeavors.