Get XML Sitemap URLs Using XPath and C#

In this example we will get all the URLs from an XML sitemap using XPath. First, load the sitemap XML file with an XmlTextReader by passing in the URL of the sitemap. Then use that reader to create an XPathDocument, and call its CreateNavigator method to get an XPathNavigator.

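Next, create an XmlNamespaceManager and add the sitemap namespace to it. This step is needed because the elements in a sitemap belong to the sitemap namespace, and an XPath expression can only match them through a prefix bound to that namespace. Finally, pass the XmlNamespaceManager to the navigator's Select method along with your XPath expression.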

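using System.Collections.Generic;
using System.Xml;
using System.Xml.XPath;

// Returns the <loc> value of every <url> entry in {url}sitemap.xml.
// The url argument is expected to end with a slash, e.g. "https://example.com/".
private IEnumerable<string> GetUrls(string url)
{
    List<string> urls = new List<string>();

    // Load the sitemap and build a read-only XPath document from it.
    using (XmlReader xmlReader = new XmlTextReader(string.Format("{0}sitemap.xml", url)))
    {
        XPathDocument document = new XPathDocument(xmlReader);
        XPathNavigator navigator = document.CreateNavigator();

        // Bind the "sitemap" prefix to the standard sitemaps.org namespace.
        // If a sitemap declares a different xmlns URI, use that URI here instead.
        XmlNamespaceManager resolver = new XmlNamespaceManager(navigator.NameTable);
        resolver.AddNamespace("sitemap", "http://www.sitemaps.org/schemas/sitemap/0.9");

        XPathNodeIterator iterator = navigator.Select("/sitemap:urlset/sitemap:url/sitemap:loc", resolver);

        while (iterator.MoveNext())
        {
            if (iterator.Current == null)
                continue;

            urls.Add(iterator.Current.Value);
        }
    }

    return urls;
}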
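To try the same technique without fetching a live sitemap, here is a minimal sketch that parses the XML from a string instead of a URL. The class SitemapXPathDemo, its ExtractUrls method, the usePrefix switch and the example.com sample sitemap are all hypothetical, made up for illustration, and the sketch assumes the sitemap uses the sitemaps.org namespace. It also shows why the namespace manager matters: the same path without the prefix matches nothing.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Xml;
using System.Xml.XPath;

// Hypothetical helper: same XPath approach as GetUrls, but the sitemap is
// read from a string so it can be tried without a network request.
public static class SitemapXPathDemo
{
    // Namespace declared by sitemaps that follow the sitemaps.org protocol.
    public const string SitemapNamespace = "http://www.sitemaps.org/schemas/sitemap/0.9";

    // Hypothetical sample sitemap with two <url> entries.
    public const string SampleSitemap =
        "<urlset xmlns=\"http://www.sitemaps.org/schemas/sitemap/0.9\">" +
        "<url><loc>https://example.com/</loc></url>" +
        "<url><loc>https://example.com/about</loc></url>" +
        "</urlset>";

    public static List<string> ExtractUrls(string sitemapXml, bool usePrefix = true)
    {
        var urls = new List<string>();
        using (var reader = new StringReader(sitemapXml))
        {
            var document = new XPathDocument(reader);
            XPathNavigator navigator = document.CreateNavigator();

            // Bind the prefix "sitemap" to the namespace URI used in the file.
            var resolver = new XmlNamespaceManager(navigator.NameTable);
            resolver.AddNamespace("sitemap", SitemapNamespace);

            // In XPath 1.0 an unprefixed name only matches elements in no
            // namespace, so the unprefixed path finds nothing here.
            string xpath = usePrefix
                ? "/sitemap:urlset/sitemap:url/sitemap:loc"
                : "/urlset/url/loc";

            XPathNodeIterator iterator = navigator.Select(xpath, resolver);
            while (iterator.MoveNext())
            {
                urls.Add(iterator.Current.Value);
            }
        }
        return urls;
    }
}
```

Calling SitemapXPathDemo.ExtractUrls(SitemapXPathDemo.SampleSitemap) should return the two sample loc values, while passing usePrefix: false should return an empty list, because the unprefixed element names do not match elements in the sitemap namespace.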



About Author

Robert

Programming today is a race between software engineers striving to build bigger and better idiot-proof programs, and the Universe trying to produce bigger and better idiots. So far, the Universe is winning hands down.