How to Crawl Websites to Download Software Updates with PowerShell

I wrote this script to check for updates to some common software used in my organization. The script uses PowerShell to crawl each publisher's website for the current version number and download link, then compares the version it found online to the version it finds in the local folder inside the script directory. If the online version is newer, it downloads the file and renames it to a consistent format that includes the version number. The software crawled for download (64-bit where available) is:

  • 7-Zip
  • Google Chrome
  • Mozilla Firefox
  • Mozilla Firefox ESR
  • Adobe Flash Player ActiveX
  • Adobe Flash Player NPAPI Plugin
  • Java
  • VLC Media Player
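
The version check described above can be sketched roughly as follows. This is a minimal illustration, not the code from my script: the helper names `Get-VersionFromName` and `Test-UpdateAvailable` and the file names are made up for the example, and it assumes the version appears in the file name with at least two dot-separated parts.

```powershell
# Hypothetical helper: pull a version such as 19.00 out of a file name
# like "7-Zip_19.00_x64.msi" (the naming format is an assumption).
function Get-VersionFromName {
    param([string]$Name)
    if ($Name -match '(\d+(\.\d+){1,3})') {
        return [version]($Matches[1])
    }
    return $null
}

# Hypothetical helper: returns $true when the online version is newer
# than the newest version found among the local file names.
function Test-UpdateAvailable {
    param(
        [string[]]$LocalFileNames,
        [string]$OnlineVersion
    )
    $newestLocal = $LocalFileNames |
        ForEach-Object { Get-VersionFromName -Name $_ } |
        Where-Object { $_ } |
        Sort-Object -Descending |
        Select-Object -First 1

    # Nothing usable on disk yet, so treat it as an available update.
    if (-not $newestLocal) { return $true }

    return ([version]$OnlineVersion) -gt $newestLocal
}

# Example call with invented file names:
Test-UpdateAvailable -LocalFileNames @('7-Zip_9.20_x64.msi', '7-Zip_19.00_x64.msi') -OnlineVersion '21.07'
```

Casting to `[version]` compares the parts numerically. A plain string sort would rank "9.20" above "19.00", which is worth keeping in mind if you sort file names directly.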

I figured this script would be a good example to share because it uses a few different methods to find the information it needs on each publisher’s website. You may find these examples helpful if you want to modify the code to crawl for something else. In the case of Adobe Flash Player for enterprise distribution, the website requires a login, so the script only checks the version and then opens a browser window to the login screen so that the user can manually download the MSI files. Since the Adobe enterprise distribution license says the download link cannot be shared, I have edited it out of the script, so you will need to put it back yourself if you want to use this function. Just search for all instances of “https://INSERT-ADOBE-DISTRIBUTION-LINK-HERE” and replace them with the URL you received from Adobe.
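
As one illustration of the kind of method involved, here is a rough sketch of pulling a download link out of a page's HTML with a regular expression. The function name `Get-LinkFromHtml` and the sample markup are invented for this example, and it assumes the links use double-quoted `href` attributes. On a real page you would first fetch the HTML, for instance with `(Invoke-WebRequest -Uri $url -UseBasicParsing).Content`.

```powershell
# Hypothetical helper: return every href value in $Html that matches $Pattern.
function Get-LinkFromHtml {
    param(
        [string]$Html,
        [string]$Pattern
    )
    [regex]::Matches($Html, 'href\s*=\s*"([^"]+)"') |
        ForEach-Object { $_.Groups[1].Value } |
        Where-Object { $_ -match $Pattern }
}

# Invented sample markup standing in for a publisher's download page.
$sampleHtml = @'
<a href="a/7z1900.exe">32-bit</a>
<a href="a/7z1900-x64.msi">64-bit MSI</a>
'@

# Pick the first link that ends in x64.msi.
$link = Get-LinkFromHtml -Html $sampleHtml -Pattern 'x64\.msi$' | Select-Object -First 1
$link
```

A regular expression like this is fragile by nature, so expect to adjust the pattern for each site and whenever a page layout changes.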

I split this script into functions, with one function to download the software and a separate function to crawl the website for each piece of software. Inside each crawl function, I separated out the variables and crawl methods that need modification. This also means that you can remove the last part of the script after line 315 if you would rather call each function manually from the PowerShell console to check for just one piece of software.
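
To give a feel for that split, here is a simplified sketch of a shared download function plus a small naming helper. The names `Get-TargetFileName` and `Save-Installer`, the parameters, and the file name format are all assumptions made for this example rather than the exact ones in my script.

```powershell
# Hypothetical helper: build a consistent file name that includes the version.
function Get-TargetFileName {
    param(
        [string]$Product,
        [string]$Version,
        [string]$Extension
    )
    return ('{0}_{1}_x64.{2}' -f $Product, $Version, $Extension)
}

# Hypothetical shared download function that each crawl function could call
# once it has worked out the download URL and version.
function Save-Installer {
    param(
        [string]$Url,
        [string]$Folder,
        [string]$FileName
    )
    $target = Join-Path -Path $Folder -ChildPath $FileName
    Invoke-WebRequest -Uri $Url -OutFile $target -UseBasicParsing
    return $target
}

# Example: the name a 7-Zip crawl function might pass to Save-Installer.
Get-TargetFileName -Product '7-Zip' -Version '19.00' -Extension 'msi'
```

Keeping the download logic in one place means each crawl function only has to answer two questions: what is the current version, and where is the file.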

Again, this script is very application-specific, so unless you need to check for updates and download this specific set of software, you will need to modify the script for whatever software you need. Use it as a very broad example of crawling the web with PowerShell. There are many different ways to do this, and I’m sure some might be better, but this way works exactly as I need it to for my specific use case. And of course if a website for a particular piece of software changes, the crawl function for that software will break, but there’s no way around that with web crawling.

Here is the required directory structure for this specific set of software:

And here is the full PowerShell script:

6 thoughts on “How to Crawl Websites to Download Software Updates with PowerShell”

  1. Hello Boris

    Subject: How to Crawl Websites to Download Software Updates with PowerShell.

    How can I exclude the beta versions from the downloads in $programREAD?
    $programREAD = Get-ChildItem ".\$folderName\" -Name | Sort-Object -Descending | Where-Object {$_.Name -NotMatch "nls"} | Select-Object -First 1

    Thanks

    1. Hi Akram,

      It depends on what website you’re trying to crawl and how their page is laid out in HTML. If the link contains the word “beta”, then you can exclude links that contain that string inside the <a> tags. That is just an example; it will really depend on the web page.

  2. Hello Boris

    Page: How to Crawl Websites to Download Software Updates with PowerShell

    How do I change the script to exclude the beta version from the download?

    $programREAD = Get-ChildItem ".\$folderName\" -Name | Sort-Object -Descending | Select-Object -First 1
    Thanx

    1. With the way the 7-Zip download page is built, there is no good, easy way to do this. However, if we assume the beta will always be the first link displayed, then we can simply change $programURL = $program[0] to $programURL = $program[1] to select the second 64-bit MSI download displayed on the page instead of the first, which in this case will be the non-beta version.
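
The two replies above can be sketched side by side. The link list here is invented for illustration and is not taken from any real download page, and the fallback to the first link when only one is found is my own assumption about what you would want.

```powershell
# Invented example links standing in for what a crawl function collects from <a> tags.
$program = @(
    'downloads/app-2.0-beta-x64.msi',
    'downloads/app-1.9-x64.msi',
    'downloads/app-1.9-x86.msi'
)

# Approach 1: the link itself contains "beta", so filter it out.
# -match and -notmatch are case-insensitive by default.
$stable = @($program | Where-Object { $_ -match 'x64\.msi$' -and $_ -notmatch 'beta' })
$programURL = $stable[0]

# Approach 2: nothing in the link marks the beta, so assume it is listed first
# and take the second 64-bit MSI link, falling back to the first if there is only one.
$msi64 = @($program | Where-Object { $_ -match 'x64\.msi$' })
if ($msi64.Count -gt 1) {
    $programURLByIndex = $msi64[1]
} else {
    $programURLByIndex = $msi64[0]
}
```

If you apply the same kind of filter to local files listed with `Get-ChildItem -Name`, the pipeline items are plain strings, so the test would go against `$_` rather than `$_.Name`.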
