Scraper for Graye's project
This repository has been archived on 2025-08-25. You can view files and clone it, but you cannot make any changes to its state, such as pushing and creating new issues, pull requests or comments.
cmd Dead Code Clean 2020-05-25 07:54:47 -07:00
lib Dead Code Clean 2020-05-25 07:54:47 -07:00
vendor Colly 2 didn't work for me 2020-05-25 07:56:20 -07:00
.gitignore Set terms file 2020-05-24 11:00:31 -07:00
.tool-versions Add tool version 2020-05-28 15:43:19 -07:00
config.toml.example Clean up config 2020-05-25 07:47:00 -07:00
go.mod Colly 2 didn't work for me 2020-05-25 07:56:20 -07:00
go.sum Colly 2 didn't work for me 2020-05-25 07:56:20 -07:00
LICENSE Clean up and use viper to config 2020-02-13 15:05:43 -08:00
main.go Refactor into Cobra & Lib, Save Files to Disk 2020-02-13 22:27:50 -08:00
Makefile Add an update to make 2020-05-16 21:22:51 -07:00
nevada_zips.txt Update zips 2020-05-16 21:26:41 -07:00
README.md Refactor Parsing for testability 2020-03-24 13:34:13 -07:00
tests.http Selenium one page 2020-05-24 00:06:50 -07:00

Graye's Project

This is designed to build records from http://www.nvsexoffenders.gov

Setup

  • Copy the config.toml.example to config.toml
  • Get a cookie ID and complete the captcha on the website with it
  • Add the cookie ID to the CookieContent setting in config.toml
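
As a rough illustration of why the cookie matters, the sketch below attaches a cookie string to an outgoing request so the site treats it as the session that already passed the captcha. It is not the project's actual code: newScrapeRequest is a hypothetical helper, "SESSIONID=example" is a placeholder, and it assumes CookieContent holds the full value of the Cookie header.

```go
package main

import (
	"fmt"
	"net/http"
)

// newScrapeRequest is a hypothetical helper: it builds a GET request and
// attaches the cookie string taken from the CookieContent setting.
// Assumption: cookieContent is the complete Cookie header value.
func newScrapeRequest(url, cookieContent string) (*http.Request, error) {
	req, err := http.NewRequest(http.MethodGet, url, nil)
	if err != nil {
		return nil, err
	}
	if cookieContent != "" {
		req.Header.Set("Cookie", cookieContent)
	}
	return req, nil
}

func main() {
	// Placeholder cookie; a real one comes from a browser session that
	// has completed the captcha.
	req, err := newScrapeRequest("http://www.nvsexoffenders.gov", "SESSIONID=example")
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println(req.Header.Get("Cookie"))
}
```

If the cookie expires, the captcha has to be completed again and the new value copied into config.toml.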

Commands

nvsxoffndr has two subcommands that must be run in order for normal parsing: download first, then parse.

download

This command has a lot of flags for controlling how many records you download and setting the cookie.

It will cache your files locally, and subsequent runs will skip over scraping existing records.

$ nvsxoffndr download --help
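
The caching behaviour described above can be sketched roughly as follows. This is only an illustration of the skip-if-cached idea, not the project's implementation: downloadIfMissing, the ".html" file naming, and the record ID "12345" are all assumptions made for the example.

```go
package main

import (
	"errors"
	"fmt"
	"io/fs"
	"os"
	"path/filepath"
)

// downloadIfMissing is a hypothetical helper. It calls fetch and stores the
// result under cacheDir only when no cached file exists for recordID.
// It reports whether fetch was actually called.
func downloadIfMissing(cacheDir, recordID string, fetch func(id string) ([]byte, error)) (bool, error) {
	path := filepath.Join(cacheDir, recordID+".html") // assumed naming scheme
	if _, err := os.Stat(path); err == nil {
		return false, nil // already cached, skip scraping
	} else if !errors.Is(err, fs.ErrNotExist) {
		return false, err
	}
	body, err := fetch(recordID)
	if err != nil {
		return false, err
	}
	if err := os.MkdirAll(cacheDir, 0o755); err != nil {
		return false, err
	}
	return true, os.WriteFile(path, body, 0o644)
}

// demo runs the helper twice against a temporary directory with a fake fetch.
func demo() (bool, bool, error) {
	dir, err := os.MkdirTemp("", "cache-sketch")
	if err != nil {
		return false, false, err
	}
	defer os.RemoveAll(dir)
	fetch := func(id string) ([]byte, error) {
		return []byte("<html>" + id + "</html>"), nil
	}
	first, err := downloadIfMissing(dir, "12345", fetch)
	if err != nil {
		return false, false, err
	}
	second, err := downloadIfMissing(dir, "12345", fetch)
	return first, second, err
}

func main() {
	first, second, err := demo()
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println("first run fetched:", first)   // first run fetched: true
	fmt.Println("second run fetched:", second) // second run fetched: false
}
```

Checking the disk before fetching keeps repeated runs from requesting records that are already stored locally.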

parse

Once the files are cached locally, the parse command normally requires no additional flags.

This produces a CSV file called 'out.csv' by default.

$ nvsxoffndr parse --help
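
For readers unfamiliar with the output step, here is a minimal sketch of writing parsed records as CSV with Go's encoding/csv package. The record type and the "id" and "zip" columns are hypothetical; the real columns in out.csv are not documented here.

```go
package main

import (
	"bytes"
	"encoding/csv"
	"fmt"
	"io"
)

// record is a hypothetical parsed row; the real fields may differ.
type record struct {
	ID  string
	Zip string
}

// writeCSV writes a header row followed by one row per record.
func writeCSV(w io.Writer, records []record) error {
	cw := csv.NewWriter(w)
	if err := cw.Write([]string{"id", "zip"}); err != nil {
		return err
	}
	for _, r := range records {
		if err := cw.Write([]string{r.ID, r.Zip}); err != nil {
			return err
		}
	}
	cw.Flush()
	return cw.Error() // reports any error hit while writing or flushing
}

// renderCSV returns the CSV text in memory, which is handy for testing.
func renderCSV(records []record) (string, error) {
	var buf bytes.Buffer
	err := writeCSV(&buf, records)
	return buf.String(), err
}

func main() {
	out, err := renderCSV([]record{{ID: "12345", Zip: "89501"}})
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Print(out)
}
```

To write to disk instead, pass the file returned by os.Create("out.csv") to writeCSV.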

URLS

Old

2020-03-23 Changes to site