GSQ Report OCR Index File

URL: https://geoscience.data.qld.gov.au/dataset/ds000079

The Geological Survey of Queensland (GSQ) is the custodian of over 100,000 reports and submissions from the Queensland resources industry, dating back more than 100 years. These legacy reports have been digitised using Optical Character Recognition (OCR) software to make them machine-readable.
The GSQ Report Index lists single words, and phrases of up to three words that would be expected to occur together in a sentence, alongside the PIDs of the reports in which each word or phrase occurs. Its purpose is to help you find reports that contain terms of interest based on their text content, across commodities and report types.
Please note that the GSQ Report Index contains only words and letters, not numbers. If you are looking for reports on a particular permit or borehole number, the broader GSQ Open Data Portal is a more suitable place to search.
The GSQ Report Index is in JSON format and can be loaded into a Python script as a dictionary, which can then be filtered.
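As a rough illustration of loading and filtering the index, the sketch below parses a small in-memory sample rather than the real file. It assumes the JSON maps each term (a word or phrase) to a list of report PIDs; that layout, the sample terms, the placeholder PIDs, and the helper names `load_index`, `reports_for_term` and `reports_for_all_terms` are all hypothetical and are not taken from the dataset itself.

```python
import json

# Hypothetical sample data. ASSUMPTION: the index maps each term to a list
# of report PIDs. The real file's layout may differ, and the PIDs below are
# placeholders, not real report identifiers.
SAMPLE_INDEX_JSON = """
{
  "copper": ["PID-0001", "PID-0002"],
  "copper sulphide": ["PID-0002"],
  "coal seam gas": ["PID-0003"]
}
"""


def load_index(text):
    """Parse the index JSON text into a Python dictionary."""
    return json.loads(text)


def reports_for_term(index, term):
    """Return the set of PIDs listed for an exact term (empty if absent)."""
    return set(index.get(term, []))


def reports_for_all_terms(index, terms):
    """Return the PIDs of reports listed under every one of the given terms."""
    pid_sets = [reports_for_term(index, term) for term in terms]
    return set.intersection(*pid_sets) if pid_sets else set()


index = load_index(SAMPLE_INDEX_JSON)
print(sorted(reports_for_term(index, "copper")))
print(sorted(reports_for_all_terms(index, ["copper", "copper sulphide"])))
```

To work with the downloaded file instead, the sample string could be replaced with `json.load` on an open file handle. Because the index holds no numbers, queries should be limited to alphabetic words and phrases of up to three words.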

This GSQ Report Index was created by running OCR over more than 80,000 open-file reports. It will be updated as more reports become open-file.

Additional Info

Source: https://geoscience.data.qld.gov.au/dataset/ds000079
Author: GSQOpenData@resources.qld.gov.au
Maintainer: GSQOpenData@resources.qld.gov.au
Version: 1.0
Last Updated: June 8, 2024, 18:42 (AEST)
Created: May 8, 2023, 15:04 (AEST)
harvest_object_id: f6467bcb-556e-4cc9-b987-b08f05c5836e
harvest_source_id: ab92c79e-7b6d-4ae9-b70b-a0e84e78006b
harvest_source_title: Geoscience Harvester