List Links On A Web Page (Using PowerShell)

Background

At work, we have set up a collaboration site that allows users to post documents and images.

It happens to be backed by Microsoft SharePoint.

For all the things I love about Microsoft, one of the areas that continues to trip me up is the structural layout of SharePoint Lists and Document Stores (when one is trying to glean that knowledge from the DB).

Yes, I know I should not sneak in and try to access the back-end directly.

But the exposed Web Services run best when run on the SharePoint Server.

Code

Here is the working code that I have been slaving over.

Configuration File

Synopsis

To allow quick uptake we have a very shallow configuration file.

Here are the elements that we cover:

…  A sample is here.

Configuration Item | Usage | Sample
configuration/appSettings/outputFileName | Output file name | c:\tmp\pInterestImages0614PM.txt
configuration/appSettings/searchTag | Search tag | pin (as we are targeting http://www.pinterest.com). Will usually be jpeg, gif, docs, pdf, etc.
configuration/appSettings/WebSiteList | List of web pages | https://www.pinterest.com/daniel_adeniji/autumn-october/

 

Sample XML File



<?xml version="1.0"?>

<configuration>

	<startup>
		<supportedRuntime version="v2.0.50727" safemode="true"/> 
	</startup>


	<appSettings>

		<add
			key="outputFileName"
			value= "c:\tmp\pInterestImages0614PM.txt" 
		/>


		<add
			key="searchTag"
			value="/pin/" 
		/>
		
		<WebSiteList>
				
			<add 
			   name="AutumnOctober"
               URL="https://www.pinterest.com/daniel_adeniji/autumn-october/"
				/>

				
		</WebSiteList>


	</appSettings>

	
</configuration>
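
As a rough sketch of how a script might read these settings, the snippet below parses a trimmed-down copy of the sample and pulls out the output file name, the search tag, and the list of web pages. The copy is held in a here-string rather than a file on disk, purely so the snippet runs on its own; the names $sampleConfig, $settings, and $webSites are illustrative.

```powershell
# Trimmed-down copy of the sample configuration, kept in a here-string for illustration
$sampleConfig = [xml]@'
<configuration>
  <appSettings>
    <add key="outputFileName" value="c:\tmp\pInterestImages0614PM.txt" />
    <add key="searchTag" value="/pin/" />
    <WebSiteList>
      <add name="AutumnOctober" URL="https://www.pinterest.com/daniel_adeniji/autumn-october/" />
    </WebSiteList>
  </appSettings>
</configuration>
'@

# Collect the key/value pairs that sit directly under appSettings into a hashtable
$settings = @{}
foreach ($node in $sampleConfig.SelectNodes('/configuration/appSettings/add'))
{
    $settings[$node.GetAttribute('key')] = $node.GetAttribute('value')
}

# Collect the web pages listed under WebSiteList
$webSites = @($sampleConfig.SelectNodes('/configuration/appSettings/WebSiteList/add'))

# Show what was read
'Output file : {0}' -f $settings['outputFileName']
'Search tag  : {0}' -f $settings['searchTag']
foreach ($site in $webSites)
{
    '{0} -> {1}' -f $site.GetAttribute('name'), $site.GetAttribute('URL')
}
```

Note that XPath is case-sensitive, so the element has to be spelled appSettings exactly as it appears in the file.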



PowerShell – Code



#############################################################################
#.DESCRIPTION
# Lists the names of Links \ Documents on the submitted Web Sites
#
#.PARAMETER configFile
# Configuration File
#
#.EXAMPLE
# listWebPageLinks configurationFile
##############################################################################
param
(
		$configFile = $(throw "You must specify a configuration file")
)

# Indicates that we need to be running PowerShell 3.0 or later
#requires -version 3.0



#Declare Module Variables
$HTTP_SUCCESS = 200


$ERROR_MESSAGE_HTTP_ERROR = "Invoke-WebRequest failed while connecting to {0}.  Error Code is {1} and Error Description {2}"

$ERROR_MESSAGE_CONFIGURATION_FILE_FOUND  = "File {0} found"

$ERROR_MESSAGE_CONFIGURATION_FILE_NOTFOUND  = "File {0} not found!"

$ERROR_MESSAGE_CONFIGURATION_FILENAME_NOT_INDICATED = "Filename not indicated in XML File"

$ERROR_MESSAGE_CONFIGURATION_FILENAME_NOT_STRING = "Filename not string in XML File - Filename is {0}"

$ERROR_MESSAGE_CONFIGURATION_SEARCHTAG_NOT_INDICATED = "Search Tag not indicated in XML File"





# Initialize Arrays

$objLinks = @()

$objLinksAll = @()



#Construct an out-array to use for data export

$OutArray = @()


######################################################
#Declare variables
######################################################
#set iRC to 0
[long]$iRC = 0

[int]$iWebSiteID = 0;
[string]$outputFileName = $null;
[string]$searchTag = $null;
[int] $iNumberofMatches = 0;
[int] $iNumberofMatchesGlobal = 0;

# Declare XML Object
$xmlConfig = New-Object -TypeName XML

$strLog = "XML Configuration File is {0}" -f $configFile
Write-Host $strLog


# Validate that configuration file exists
If ( (Test-Path $configFile) -eq $False)
{

  $errLine = $ERROR_MESSAGE_CONFIGURATION_FILE_NOTFOUND -f $configFile
  Write-Host $errLine -ForegroundColor red            
  
  #http://stackoverflow.com/questions/2022326/terminating-a-script-in-powershell
  throw $errLine

}


# Load Configuration File
# Resolve to a full path first; the .NET Load method resolves relative paths
# against the process working directory, which can differ from PowerShell's current location
$xmlConfig.Load((Resolve-Path -Path $configFile).ProviderPath)

# read app settings
$objAppSettings = $xmlConfig.configuration.appSettings


# get item list
$addList = $xmlConfig.SelectNodes("/configuration/appSettings/add")


# Iterate over the add elements
foreach($add in $addList)
{

	if ($add.key -eq "outputFileName")
	{

		[string]$outputFileName = $add.value;

	}
	elseif ($add.key -eq "searchTag")
	{

		[string]$searchTag = $add.value;

	}



}


# Ensure that output file name is specified
# http://techibee.com/powershell/check-if-a-string-is-null-or-empty-using-powershell/1889
if ([string]::IsNullOrEmpty($outputFileName)) 
{

	$errLine = $ERROR_MESSAGE_CONFIGURATION_FILENAME_NOT_INDICATED
	Write-Host $errLine -ForegroundColor red            

	#http://stackoverflow.com/questions/2022326/terminating-a-script-in-powershell
	throw $errLine

}


# Ensure that Search Tag is specified
if ([string]::IsNullOrEmpty($searchTag)) 
{

	$errLine = $ERROR_MESSAGE_CONFIGURATION_SEARCHTAG_NOT_INDICATED
	Write-Host $errLine -ForegroundColor red            

	#http://stackoverflow.com/questions/2022326/terminating-a-script-in-powershell
	throw $errLine

}


#Surround Search Tag in wildcard
$searchTagWildcard = '*' + $searchTag + '*'


# read list of Web Sites
$objWebSiteList = $objAppSettings.WebSiteList

#$objWebSiteList | Get-Member


#############################################################
# Display Output Variables
#############################################################
$strLog = "Output Filename :- {0}" -f $outputFileName
Write-Host $strLog -ForegroundColor black

$strLog = "Search Tag :- {0}" -f $searchTag
Write-Host $strLog -ForegroundColor black


$strLog = "Search Tag (Wildcard) :- {0}" -f $searchTagWildcard
Write-Host $strLog -ForegroundColor black



# for each Web Site in our config file		
foreach ($configNode in $objWebSiteList.add)
{

	###################################################
	# Set Loop Variables
	###################################################
	$iWebSiteID = $iWebSiteID +1
	$webSiteName = $configNode.name
	$webSiteURL = $configNode.URL


	try
	{



		$strLog = "Processing WebSite {0}) {1} {2}"  `
					-f $iWebSiteID, $webSiteName, $webSiteURL

		Write-Host $strLog -ForegroundColor black


		# Added -UseDefaultCredentials, as without it got the error indicated below
		# The remote server returned an error: (401) Unauthorized.
		$htmlWebResponseObject = Invoke-WebRequest -URI $webSiteURL `
									-UseDefaultCredentials `
									-DisableKeepAlive



		# Get Web Request Status Code
		$WROStatusCode = $htmlWebResponseObject.StatusCode 


		# Get Web Request Status Description
		$WROStatusDescription = $htmlWebResponseObject.StatusDescription


		# Reset collection (calling Clear() on an array blanks its elements but keeps its length)
		$objLinks = @()


		# If we successfully connected to Web Site
		if ($WROStatusCode -eq $HTTP_SUCCESS)
		{

			# Get Web Links
			$objLinksAll = $htmlWebResponseObject.Links

			# commented out that lists out type name			
			#$linkType = $objLinksAll.GetType().FullName
			

			# Get # of Links
			$iNumberofLinks = $objLinksAll.Count

			# initialize loop variables
			$iLinkID = 0;
			$iNumberofMatches = 0;

			# Iterate through links
			while ($iLinkID -lt $iNumberofLinks)
			{

				# Get Link Node
				$objLinkNodeX = $objLinksAll[$iLinkID]

				# Get Link HREF
				$linkHREF = $objLinkNodeX.href

				# Get Link Text
				$linkText = $objLinkNodeX.innerText

				# If matches what we are looking for
				if (
				          ($linkHREF -like $searchTagWildcard) `
					  -or ($linkText -like $searchTagWildcard) `
				   )

				{

					#Increment Number of matches
					$iNumberofMatches = $iNumberofMatches + 1;

					# create match object
					$objPSLinkNode = New-Object PSObject

					# set match object properties
					Add-Member -InputObject $objPSLinkNode -MemberType NoteProperty -Name innerText -Value ""
					Add-Member -InputObject $objPSLinkNode -MemberType NoteProperty -Name href -Value ""

				
					# set match object property data
					$objPSLinkNode.href = $objLinkNodeX.href
					$objPSLinkNode.innerText = $objLinkNodeX.innerText

					# Add matching node to list of Nodes
					$objLinks += $objPSLinkNode
					

				}



				#move to next link				
				$iLinkID = $iLinkID +1

				

			} #while ($iLinkID -lt $iNumberofLinks)


			#If matches identified, add them to Global List
			if ($objLinks)
			{

			
				$strLog = "`t Number of matches {0}" -f $iNumberofMatches
				Write-Host $strLog -ForegroundColor black


				# If Number of matches greater than 0, append to mainline list			
				if ($objLinks.Count -gt 0)
				{
				
					#Add the object to the out-array
					$outarray += $objLinks
					
					# Increment Global Number of Matches
					$iNumberofMatchesGlobal = $iNumberofMatchesGlobal `
						+ $iNumberofMatches
					
				} #if ($objLinks.Count -gt 0)
				

			} #if ($objLinks)
			

		} # if ($WROStatusCode -eq $HTTP_SUCCESS)
		else
		{
			# unable to connect to web site

			$strLog = $ERROR_MESSAGE_HTTP_ERROR -f `
						  $webSiteURL, $WROStatusCode `
						, $WROStatusDescription

			Write-Warning $strLog


		}

		

	}

	catch

	# Powershell try/catch/finally

	# http://stackoverflow.com/questions/6779186/powershell-try-catch-finally

	{

		Write-Host "_____________________________________________________________________" -ForegroundColor red

		$strLog = "Failed to connect to URL :- {0}" -f $webSiteURL
		Write-Host $strLog -ForegroundColor red

		$errLine = "Exception Type :- {0}" -f $_.Exception.GetType().FullName
		Write-Host $errLine -ForegroundColor red

		$errLine = "Exception Message :- {0}" -f $_.Exception.Message
		Write-Host $errLine -ForegroundColor red

		$errLine = "Exception Source :- {0}" -f $_.Exception.Source
		Write-Host $errLine -ForegroundColor red

		# http://stackoverflow.com/questions/17226718/how-to-get-the-line-number-of-error-in-powershell
		$lineNumber = $_.InvocationInfo.ScriptLineNumber
		$errLine = "Script Line Number :- {0}" -f $lineNumber
		Write-Host $errLine -ForegroundColor red
		

		$errLine = "Exception HResult :- {0}" -f $_.Exception.HResult
		Write-Host $errLine -ForegroundColor red


		$errLine = "Exception Inner :- {0}" -f $_.Exception.InnerException 
		Write-Host $errLine -ForegroundColor red
		
		# set module return code	
		$iRC =	$_.Exception.HResult
		

		$errLine = "Exception StackTrace :- {0}" -f $_.Exception.StackTrace
		Write-Host $errLine -ForegroundColor red            

		Write-Host "_______________________________________________" -ForegroundColor red


		#Emit line breaks	
		Write-Host ""			
		Write-Host ""			

	}		

	

	# Reset collection for the next web site
	$objLinks = @()

	

} #foreach ($configNode in $objWebSiteList.add)





# Create Folder, if it does not exist
if ( $outputFileName )
{

	#get base folder name
	$outputFolderName = split-path $outputFileName
	

	#Howto create a folder if it doesn't exists using PowerShell
	# http://blog.uwe.elflein.eu/?p=42
	$folderExistence = Test-Path -PathType Container $outputFolderName

	if($folderExistence -eq $false)
	{

		# Create Folder
	    New-Item $outputFolderName -type Directory

	}


}


$strLog = "Number of matches {0}" -f $iNumberofMatchesGlobal
Write-Host $strLog


$strLog = "Number of entries in array {0}" -f $outarray.Count
Write-Host $strLog


$strLog = "Output Filename is {0}" -f $outputFileName
Write-Host $strLog



if ( ($outputFileName) -and ($outarray) )
{

	try
	{

		
		#http://en.community.dell.com/techcenter/powergui/f/4834/t/19571829
		#Cannot bind argument to parameter 'InputObject' because it is null.
		#export array to CSV
		# -ErrorAction Stop so that a failed export reaches the catch block below
		$outarray | 
			Where-Object {$_} |
			Export-Csv $outputFileName `
				-NoTypeInformation -Force -ErrorAction Stop

		$iRC = 0

	}

	catch

	# Powershell try/catch/finally

	# http://stackoverflow.com/questions/6779186/powershell-try-catch-finally

	{

		$iRC = $_.Exception.HResult
		
		# http://stackoverflow.com/questions/17226718/how-to-get-the-line-number-of-error-in-powershell
		$lineNumber = $_.InvocationInfo.ScriptLineNumber

		$strLog = "Failed to export file :- {0}.  Exception Type {1} Message {2} line# {3}" `
					-f $outputFileName,  $_.Exception.GetType().FullName `
					, $_.Exception.Message, $lineNumber

		Write-Host $strLog -ForegroundColor red

		$strLog = "Exception Source {0} Hresult {1} Inner Exception {2}" -f $_.Exception.Source,  $_.Exception.HResult, $_.Exception.InnerException 
		Write-Host $strLog -ForegroundColor red

		$errLine = "Exception StackTrace :- {0}" -f $_.Exception.StackTrace
		Write-Host $errLine -ForegroundColor red            

		#Emit line breaks	
		Write-Host ""			
	

	} # catch		

	

}
else
{

	$strLog = "Will not prepare Output File"
	Write-Warning -Message $strLog

	if (!$outputFileName)
	{

		$strLog = "Output File name is not indicated"
		Write-Error -Message $strLog

		#set return code
		$iRC = -100

	}

	if (!$outarray)
	{

		$strLog = "Empty result set"
		Write-Error -Message $strLog

		# set return code
		$iRC = -101

	}


}


#Exit with Return Code
Exit($iRC)
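

The heart of the script (match each link's href or text against a wildcard, collect the hits, and export them) can also be sketched with the pipeline. This is only a minimal sketch: it uses a hand-made list of link objects in place of the Links collection returned by Invoke-WebRequest, so it runs without network access. The sample links, the variable names $sampleLinks and $matchedLinks, and the temporary output path are all made up for illustration.

```powershell
# Hypothetical stand-in for $htmlWebResponseObject.Links (no network access needed)
$sampleLinks = @(
    [PSCustomObject]@{ href = '/pin/111/';        innerText = 'Autumn leaves' }
    [PSCustomObject]@{ href = '/daniel_adeniji/'; innerText = 'Profile' }
    [PSCustomObject]@{ href = '/pin/222/';        innerText = 'Pumpkins' }
)

# Same idea as the script: surround the search tag with wildcards
$searchTag = '/pin/'
$searchTagWildcard = '*' + $searchTag + '*'

# Keep links whose href or text matches; project only the two columns we export
$matchedLinks = @(
    $sampleLinks |
        Where-Object { ($_.href -like $searchTagWildcard) -or ($_.innerText -like $searchTagWildcard) } |
        Select-Object -Property innerText, href
)

# Export to a temporary file (assumed path); -ErrorAction Stop makes failures catchable
$outputFile = Join-Path ([System.IO.Path]::GetTempPath()) 'matchedLinks-sample.csv'
$matchedLinks | Export-Csv -Path $outputFile -NoTypeInformation -Force -ErrorAction Stop

'Number of matches {0}' -f $matchedLinks.Count
```

Wrapping the pipeline in @( ) keeps $matchedLinks an array even when only one link matches, so its Count can be trusted.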


Source Control

GitHub

https://github.com/DanielAdeniji/ListWebPageLinksUsingPowerShell

Utility

Utility – XML

There are a couple of utilities that are useful when dealing with XML contents.

 

For an XML editor, I lazily chose a very capable free one.

http://freeformatter.com is also free, and it has a nice and uncluttered interface for encoding and ‘escaping’ XML contents; the escaping is needed to embed HTTP URLs and other such data in XML files.
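
The escaping can also be done from PowerShell itself. Here is a minimal sketch, assuming the .NET method System.Security.SecurityElement.Escape is acceptable for the job; the script above does not use it, and the URL and the element name "Sample" are made-up examples.

```powershell
# Made-up URL containing an ampersand, which cannot appear raw in an XML attribute value
$rawUrl = 'https://www.example.com/search?q=autumn&sort=new'

# Escape replaces &, <, >, " and ' with their XML entities
$escapedUrl = [System.Security.SecurityElement]::Escape($rawUrl)

# Build an <add> element that can be pasted into the WebSiteList section
$addElement = '<add name="Sample" URL="{0}" />' -f $escapedUrl
$addElement

# Round-trip check: parsing the element gives back the original URL
$parsedUrl = ([xml]$addElement).DocumentElement.GetAttribute('URL')
```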

 

Dedicated

I have many special ladies in my life.  I must dedicate this post to those ones who do not have the “Spirit of Singleness”, that Paul encouraged, yet out of obedience they are trying to live it out; even if it is in short lurches.

Maurette Brown Clark – The One He Kept For Me
https://www.youtube.com/watch?v=3sE40VAIpfI
