Clean and normalize URL lists before sitemap submission.
The Sitemap URL List Cleaner processes a raw list of URLs: it removes duplicates, strips UTM tracking parameters (utm_source, utm_medium, etc.), normalizes trailing slashes for consistency, and filters the list to a single domain. This is essential before generating or submitting a sitemap. Tracking parameters can turn a single page into thousands of 'unique' URLs, wasting crawl budget and potentially causing duplicate content issues. The sitemap protocol allows a maximum of 50,000 URLs (and 50 MB uncompressed) per file, so cleaning ensures you don't waste slots on junk URLs.
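As a rough illustration of the deduplication and single-domain filtering described above, here is a minimal Python sketch. It is not the tool's actual implementation; the function names filter_and_dedupe and split_into_sitemaps are hypothetical, and the sketch assumes an exact host match (so www.example.com and example.com count as different domains).

```python
from urllib.parse import urlsplit

MAX_URLS_PER_SITEMAP = 50_000  # per-file limit from the sitemap protocol


def filter_and_dedupe(urls, domain):
    """Keep URLs whose host equals `domain`; drop blanks and exact duplicates.

    Hypothetical helper: order of first appearance is preserved.
    """
    wanted = domain.lower()
    seen = set()
    kept = []
    for raw in urls:
        url = raw.strip()
        if not url:
            continue
        # urlsplit().hostname is already lowercased; it is None when no host is present
        if urlsplit(url).hostname != wanted:
            continue
        if url not in seen:
            seen.add(url)
            kept.append(url)
    return kept


def split_into_sitemaps(urls, limit=MAX_URLS_PER_SITEMAP):
    """Split a cleaned list into chunks that each fit in one sitemap file."""
    return [urls[i:i + limit] for i in range(0, len(urls), limit)]
```

Exact-string deduplication only catches literal repeats, so it works best when run after tracking parameters have been stripped and the URLs normalized.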
UTM parameters (utm_source, utm_medium, utm_campaign) and click identifiers such as fbclid and gclid don't change page content. A URL with and without them serves the same page. In a sitemap, however, each unique URL string is treated as a separate page, which can lead Google to crawl and index tracked versions and split ranking signals.
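Stripping the tracking parameters named above might look like the following sketch, built on Python's standard urllib.parse module. The function name strip_tracking and the two constants are assumptions for illustration; a real list of tracking parameters would likely be longer.

```python
from urllib.parse import urlsplit, urlunsplit, parse_qsl, urlencode

# Assumed lists for illustration; extend them to match your own analytics setup.
TRACKING_PREFIXES = ("utm_",)
TRACKING_NAMES = {"fbclid", "gclid"}


def strip_tracking(url):
    """Return `url` with tracking query parameters removed (hypothetical helper)."""
    parts = urlsplit(url)
    kept = [
        (key, value)
        for key, value in parse_qsl(parts.query, keep_blank_values=True)
        if not key.lower().startswith(TRACKING_PREFIXES)
        and key.lower() not in TRACKING_NAMES
    ]
    # urlencode re-encodes the remaining pairs; an empty list yields no "?" at all
    return urlunsplit(parts._replace(query=urlencode(kept)))
```

For example, strip_tracking("https://example.com/page?utm_source=news&id=7&gclid=abc") returns "https://example.com/page?id=7". Parameters that do change page content, such as id here, are left in place and in their original order.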
Normalization ensures equivalent URLs are recognized as the same. Steps: lowercase the scheme and domain (paths remain case-sensitive), remove default ports (:80 for http, :443 for https), sort query parameters consistently, apply a uniform trailing-slash rule, and decode unnecessarily percent-encoded characters.
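The normalization steps above can be sketched in Python as follows. This is one possible interpretation, not a definitive implementation: the name normalize and its trailing_slash flag are hypothetical, the sketch drops fragments and any user:password part, and it treats "no trailing slash" as the default rule. Check that your server really serves the same page for both slash variants before applying such a rule.

```python
import re
import string
from urllib.parse import urlsplit, urlunsplit, parse_qsl, urlencode

DEFAULT_PORTS = {"http": 80, "https": 443}
UNRESERVED = set(string.ascii_letters + string.digits + "-._~")
PERCENT_ESCAPE = re.compile(r"%([0-9A-Fa-f]{2})")


def _decode_unreserved(text):
    """Decode %XX escapes only when they stand for an unreserved character."""
    def repl(match):
        char = chr(int(match.group(1), 16))
        # Keep other escapes, but uppercase their hex digits for consistency
        return char if char in UNRESERVED else match.group(0).upper()
    return PERCENT_ESCAPE.sub(repl, text)


def normalize(url, trailing_slash=False):
    """Return a normalized form of `url` (hypothetical helper)."""
    parts = urlsplit(url)
    scheme = parts.scheme.lower()
    host = parts.hostname or ""          # hostname is already lowercased
    if ":" in host:                      # IPv6 literal: restore the brackets
        host = f"[{host}]"
    port = parts.port
    if port is None or port == DEFAULT_PORTS.get(scheme):
        netloc = host                    # drop the default port
    else:
        netloc = f"{host}:{port}"
    path = _decode_unreserved(parts.path) or "/"
    if path != "/":
        path = path.rstrip("/") + ("/" if trailing_slash else "")
    # Sort parameters by name, then value, so ordering differences disappear
    query = urlencode(sorted(parse_qsl(parts.query, keep_blank_values=True)))
    return urlunsplit((scheme, netloc, path, query, ""))
```

With these assumptions, normalize("HTTPS://Example.COM:443/Blog/?b=2&a=1") returns "https://example.com/Blog?a=1&b=2". Note that the path keeps its original case, since paths can be case-sensitive.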
No, noindex pages should not be in your sitemap. Including them sends contradictory signals: you're telling Google to index (sitemap) and not index (noindex) the same page simultaneously. Remove all noindex pages, redirecting URLs, non-canonical URLs, and login/admin pages from sitemaps.