Datasets

Once the Grepsr team has configured the report, you can start extracting data. You can start a one-time extraction from the Datasets page. If you have a recurring requirement, schedules can be set up to automate data extraction. The Datasets page lists all available datasets in a report in chronological order. Click a dataset to view its underlying data.

Starting a new run

To initiate an on-demand run, go to the Datasets page and click “Start a new run”. If necessary, you can add custom parameters when starting the run.

Download datasets

To download the extracted data, click the three-dot icon and then click download. The exported files are displayed. Click download to download a file in the supported format.

Re-Export datasets

To export a dataset in a format other than the configured one, click export data, select one of the formats supported by Grepsr, and click re-export. The data is exported in the chosen file format, and you can download it once the export is complete.

File feeds

A file feed is a dataset feed in XML format. Each feed can be accessed via a unique URL. Types of file feed:

  - Latest Feed: Links to the most recent dataset that was fully processed
  - Current Feed: Links to the specific dataset that is currently being viewed
  - All Feed: Contains links to all the datasets in the report since the beginning

Delivery trails

You can check the delivery details in the delivery trails. When you click any of the delivery statuses, it provides an extended view of the delivery. This section records whether the required file formats were delivered to the correct destinations and confirms whether the delivery was successful.

The platform supports the following delivery statuses:

  - Delivered: All files have been successfully delivered to the correct destinations.
  - Failed: The delivery was not successful; at least one file failed to deliver.
  - Uploading: The exported file is currently being uploaded to S3 for future access.
  - Skipped: Delivery was intentionally skipped during a manual run or manual export.
  - Exporting: The dataset is currently being written to a file.
  - Zipping: The file or folder is being compressed before uploading to S3.
  - Not Notified: The delivery destination has not been set.
  - No Records: There is no available data or files to deliver.

Merge and Dedupe Datasets

If you want to combine the datasets generated from multiple individual runs, you can do so using the Merge Datasets option available on the Datasets page.

To get started, navigate to the Datasets page and click on the Merge Datasets button. From there, you can select up to 50 individual runs whose datasets you wish to merge into a single, consolidated dataset.

Additionally, if there is a possibility of overlapping or redundant data across the selected runs, you can enable the Dedupe merged dataset checkbox. This option ensures that duplicate records are automatically identified and removed during the merge process, resulting in a cleaner and more streamlined dataset.
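
To illustrate what merging with deduplication does conceptually, here is a minimal Python sketch. It is not Grepsr's implementation: the function name `merge_runs`, the representation of a run as a list of dictionaries, and the rule that two records are duplicates only when every field matches are all assumptions made for illustration. The limit of 50 runs is the only detail taken from the description above.

```python
from typing import Any

Record = dict[str, Any]

MAX_RUNS = 50  # merge limit stated in this section


def merge_runs(runs: list[list[Record]], dedupe: bool = False) -> list[Record]:
    """Combine the records of several runs into one list.

    Hypothetical helper: a duplicate is assumed to be a record whose
    fields and values all match an earlier record exactly.
    """
    if len(runs) > MAX_RUNS:
        raise ValueError(f"Cannot merge more than {MAX_RUNS} runs")

    merged: list[Record] = []
    seen: set[tuple] = set()
    for run in runs:
        for record in run:
            if dedupe:
                # Build a hashable, order-independent key from the record.
                key = tuple(sorted((k, repr(v)) for k, v in record.items()))
                if key in seen:
                    continue  # drop the repeated record
                seen.add(key)
            merged.append(record)
    return merged


run_a = [{"id": 1, "name": "a"}]
run_b = [{"id": 1, "name": "a"}, {"id": 2, "name": "b"}]

print(len(merge_runs([run_a, run_b])))               # 3
print(len(merge_runs([run_a, run_b], dedupe=True)))  # 2
```

Without deduplication, the record that appears in both runs is kept twice. With it, only the first occurrence survives and the original order is preserved.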

What do the different run statuses signify?
  • Grey: Setup is in progress before the actual extraction begins.
  • Green: The extraction completed properly.
  • Orange: The extraction is in progress.
  • Red: The extraction completed with one or more errors.
What do the different data columns signify?
  • DNU: Runs marked DNU (Do Not Use) are not counted in billing. These could be test runs rather than actual runs.
  • Total Requests: The total number of URLs requested during the data generation process.
  • Fill rate: The share of cells in the dataset that contain data. The fill rate is 100% if data is present in every cell (every row and column combination) of the dataset.
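
As a worked example of the fill rate described above, the following Python sketch computes the percentage of populated cells in a small dataset. The function name `fill_rate` is hypothetical, and treating `None` and the empty string as "no data" is an assumption; the platform may define an empty cell differently.

```python
from typing import Any


def fill_rate(rows: list[dict[str, Any]], columns: list[str]) -> float:
    """Return the percentage of row/column cells that contain data.

    Assumption: a cell counts as empty when its value is missing,
    None, or an empty string.
    """
    total_cells = len(rows) * len(columns)
    if total_cells == 0:
        return 0.0
    filled = sum(
        1
        for row in rows
        for col in columns
        if row.get(col) not in (None, "")
    )
    return 100.0 * filled / total_cells


rows = [
    {"name": "A", "price": 10},
    {"name": "B", "price": None},  # one empty cell
]
print(fill_rate(rows, ["name", "price"]))  # 75.0
```

Here 3 of the 4 cells contain data, so the fill rate is 75%. If the missing price were present, every cell would be filled and the result would be 100%.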