Managing an Extractor's URL List
From the Settings tab of an Extractor, you can manage the list of URLs that are extracted when the Extractor starts a crawl run. You can manually add URLs, import them from a file, extract them from other pages with Chained Extractors, or add similar URLs using URL Discovery.
Elements of the Inputs View
- Extract from: Dropdown to set whether the Extractor uses URLs from an explicit list of URLs provided or URLs extracted by another Extractor.
- Show invalid URLs: Shows only the invalid URLs present in the current list of URLs. This is disabled if there are no invalid URLs present.
- Remove all URLs: Removes all the URLs from the list to start over.
- Cleanup URLs: Removes any duplicate or invalid URLs from the list.
- Download URLs: Download a list of the URLs in CSV, Excel, JSON, or NDJSON format.
- Import URLs: Import a list of URLs from a CSV or Excel (XLSX) file.
- Generate URLs: Create URLs using the URL Generator.
- List view: Shows all of the URLs currently added.
- URLs Input: You can add, remove, or update URLs here.
- Save: This saves any changes made to the URL list. When you add/remove/update URLs using the URLs Input, the changes will not be saved until you click Save.
- Run URLs: Starts a new crawl run. If you have unsaved changes, this button will be disabled until you save your changes.
- Total URLs: Displays a count of the URLs in the list. This is also the number of queries a crawl run will use with that list of URLs. If screen capture is enabled, the total number of queries is doubled.
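If you prepare a URL list outside the product, the effect of Cleanup URLs and the Total URLs query count can be approximated locally before you import. The following is only a minimal Python sketch, not the product's implementation: the function names are hypothetical, and the validity rule (an http or https scheme plus a host) is an assumption, since the exact rules the Extractor applies are not described here.

```python
from urllib.parse import urlparse


def is_valid_url(url: str) -> bool:
    """Assumed rule: an http(s) scheme plus a non-empty host."""
    try:
        parts = urlparse(url.strip())
    except ValueError:
        # urlparse rejects some malformed inputs, e.g. a broken IPv6 host
        return False
    return parts.scheme in ("http", "https") and bool(parts.netloc)


def cleanup_urls(urls: list[str]) -> list[str]:
    """Drop invalid URLs and exact duplicates, keeping first-seen order."""
    seen = set()
    cleaned = []
    for url in urls:
        url = url.strip()
        if is_valid_url(url) and url not in seen:
            seen.add(url)
            cleaned.append(url)
    return cleaned


def estimate_queries(total_urls: int, screen_capture: bool = False) -> int:
    """One query per URL, doubled when screen capture is enabled."""
    return total_urls * 2 if screen_capture else total_urls


urls = cleanup_urls([
    "https://example.com/a",
    "https://example.com/a",   # duplicate
    "not a url",               # invalid under the assumed rule
    "https://example.com/b",
])
print(len(urls), estimate_queries(len(urls), screen_capture=True))
```

Checking the count this way before a run can help you anticipate how many queries a list will consume, particularly when screen capture is turned on.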
Importing URLs from a File
Clicking Import URLs opens the Import URLs view, which lets you add URLs from a CSV or Excel file. The imported URLs can either replace your current list of URLs or be added to it.
Elements of the Import URLs View
- Browse: Opens a file browser to select the file to import URLs from.
- Include column headers: Set whether the file includes column headers. If selected, then the first row will not be imported.
- Page: Select which sheet the URLs are saved on (for Excel files only).
- Column: Select which column the URLs are saved in.
- Append/Replace: Choose whether the list of URLs from the file is added to the current list of URLs or replaces (overwrites) the current list.
- Preview list: Shows a preview of the URLs from the column selected.
- Cancel: Closes the Import URLs view and returns to the Extractor's settings.
- Upload URL list: Adds the selected URLs for import to the Extractor's URL list.
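The import options above (column headers, column selection, and Append/Replace) can be illustrated with a short Python sketch that mimics them on CSV text. This is an illustration under assumptions, not how the product processes files: `read_url_column` and `merge_urls` are hypothetical helpers, and the sample data is made up.

```python
import csv
import io


def read_url_column(csv_text: str, column: int = 0, has_headers: bool = True) -> list[str]:
    """Return the non-empty values of one column, skipping the header row if present."""
    rows = list(csv.reader(io.StringIO(csv_text)))
    if has_headers:
        rows = rows[1:]  # the first row holds column names, not URLs
    return [
        row[column].strip()
        for row in rows
        if len(row) > column and row[column].strip()
    ]


def merge_urls(current: list[str], imported: list[str], mode: str = "append") -> list[str]:
    """'append' adds the imported URLs after the current ones; 'replace' discards the current list."""
    if mode == "append":
        return current + imported
    if mode == "replace":
        return list(imported)
    raise ValueError(f"unknown mode: {mode!r}")


# Made-up sample file: URLs are in the second column (index 1), below a header row.
sample = "name,url\nHome,https://example.com/\nAbout,https://example.com/about\n"
imported = read_url_column(sample, column=1, has_headers=True)
print(merge_urls(["https://example.com/old"], imported, mode="append"))
```

If the header option is left off for a file that does have headers, the header text itself (here, `url`) would be treated as an entry, which is the kind of row Show invalid URLs helps you spot afterwards.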