Scraper for Graye's project
Graye's Project
This tool is designed to build records by scraping http://www.nvsexoffenders.gov
Setup
- Copy config.toml.example to config.toml
- Get a cookie ID from the website and complete the captcha with it
- Add the cookie ID to the CookieContent setting
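The cookie from the setup steps has to travel with every request the scraper makes. The sketch below is a minimal illustration of that under two assumptions: that CookieContent holds only the session ID value, and that the site uses ASP.NET's default session cookie name, ASP.NET_SessionId. The helper newDetailsRequest is hypothetical and is not taken from this repo.

```go
package main

import "net/http"

// sessionCookieName is an ASSUMPTION: ASP.NET's default session cookie name.
// The real site or scraper may use a different name.
const sessionCookieName = "ASP.NET_SessionId"

// newDetailsRequest (hypothetical helper) builds a GET request for targetURL
// and attaches the cookie ID taken from the CookieContent setting.
func newDetailsRequest(targetURL, cookieContent string) (*http.Request, error) {
	req, err := http.NewRequest(http.MethodGet, targetURL, nil)
	if err != nil {
		return nil, err
	}
	// AddCookie appends "ASP.NET_SessionId=<value>" to the Cookie header.
	req.AddCookie(&http.Cookie{Name: sessionCookieName, Value: cookieContent})
	return req, nil
}
```

The request can then be sent with http.DefaultClient.Do(req). Because the session is tied to the IP that solved the captcha, the requests presumably need to come from that same machine.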
Commands
nvsxoffndr has two subcommands, download and parse, which must be run in that order for normal parsing.
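A two-subcommand CLI like this can be dispatched with Go's standard flag package, one FlagSet per subcommand. This is only a sketch: runCommand and the flag names -limit, -cookie and -out are hypothetical and not the tool's real interface (run the --help commands for that), and the real tool may be built on a different CLI library.

```go
package main

import (
	"errors"
	"flag"
	"fmt"
)

// runCommand (hypothetical) picks a subcommand from args[0] and parses the
// remaining arguments with that subcommand's own FlagSet.
func runCommand(args []string) (string, error) {
	if len(args) == 0 {
		return "", errors.New("expected a subcommand: download or parse")
	}
	switch args[0] {
	case "download":
		fs := flag.NewFlagSet("download", flag.ContinueOnError)
		limit := fs.Int("limit", 0, "maximum records to download (0 = no limit)")
		cookie := fs.String("cookie", "", "session cookie ID")
		if err := fs.Parse(args[1:]); err != nil {
			return "", err
		}
		return fmt.Sprintf("download limit=%d cookie=%q", *limit, *cookie), nil
	case "parse":
		fs := flag.NewFlagSet("parse", flag.ContinueOnError)
		out := fs.String("out", "out.csv", "CSV output path")
		if err := fs.Parse(args[1:]); err != nil {
			return "", err
		}
		return "parse out=" + *out, nil
	default:
		return "", fmt.Errorf("unknown subcommand %q", args[0])
	}
}
```

Giving each subcommand its own FlagSet keeps download-only flags from being accepted by parse, which matches the README's note that parse normally needs no flags.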
download
This command has many flags for controlling how many records to download and for setting the cookie.
It caches downloaded files locally, and subsequent runs skip records that have already been scraped.
$ nvsxoffndr download --help
parse
Once the files are cached locally, the parse command normally requires no additional flags.
By default, it produces a CSV file called 'out.csv'.
$ nvsxoffndr parse --help
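Writing 'out.csv' maps naturally onto Go's encoding/csv package. In this sketch the offenderRecord type and its id, name and city columns are hypothetical, since the README does not document the real output columns. writeRecords and renderCSV are illustrative helpers.

```go
package main

import (
	"encoding/csv"
	"io"
	"strings"
)

// offenderRecord is a HYPOTHETICAL parsed record; the real columns may differ.
type offenderRecord struct {
	ID   string
	Name string
	City string
}

// writeRecords writes a header row and then one CSV row per record.
// csv.Writer quotes any field that contains a comma, quote or newline.
func writeRecords(w io.Writer, records []offenderRecord) error {
	cw := csv.NewWriter(w)
	if err := cw.Write([]string{"id", "name", "city"}); err != nil {
		return err
	}
	for _, r := range records {
		if err := cw.Write([]string{r.ID, r.Name, r.City}); err != nil {
			return err
		}
	}
	cw.Flush() // csv.Writer buffers, so flush before checking for errors
	return cw.Error()
}

// renderCSV returns the CSV as a string instead of writing it to a file.
func renderCSV(records []offenderRecord) (string, error) {
	var sb strings.Builder
	err := writeRecords(&sb, records)
	return sb.String(), err
}
```

To produce the file itself, pass the *os.File returned by os.Create("out.csv") to writeRecords.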
URLs
Old
- http://www.nvsexoffenders.gov/: home page
- http://www.nvsexoffenders.gov/SearchOffender.aspx: search by name
- http://www.nvsexoffenders.gov/OffenderSearchResults.aspx: results of a search by name
- http://www.nvsexoffenders.gov/GeographicalSearch.aspx: geographical search
- http://www.nvsexoffenders.gov/OffenderDetails.aspx?Display=Main&Id=30221: offender details (numeric Id)
2020-03-23 Changes to site
- SessionID is tied to the client IP
- SessionID is invalidated sooner
- Offender detail Id values are no longer plain integers, e.g. record 51: http://www.nvsexoffenders.gov/OffenderDetails.aspx?Display=Main&Id=dyMIeTAHB4miRoDtAiGZbQ==