hasdata-cli

The official command-line interface for HasData — web scraping, SERP, and real-estate/e-commerce data APIs, wired for shell scripts, LLM agents, and RAG pipelines.


One static binary. Every API at hasdata.com exposed as a subcommand. No SDK install, no dependencies, no glue code — curl | sh, export a key, pipe JSON to jq or straight into your LLM prompt.

curl -sSL https://raw.githubusercontent.com/HasData/hasdata-cli/main/install.sh | sh
export HASDATA_API_KEY=hd_xxx
hasdata google-serp --q "best espresso machine 2026"

Why a CLI?

  • Agents & tool use — drop hasdata <api> into LangChain, LlamaIndex, CrewAI, or your own agent loop as a shell tool. Stable JSON in, stable JSON out.
  • RAG ingestion — stream fresh Google, Amazon, Zillow, and arbitrary web data into your vector store from a cron job or a Makefile, no backend required.
  • Prompt-time grounding — pipe hasdata google-serp ... | jq .organic_results into a system prompt to cut hallucinations on current events, product pricing, real-estate comps, and reviews.
  • Dataset building — parallel GNU xargs invocations produce JSONL for LLM fine-tuning or evals.
  • Humans too — one-off lookups from your terminal, full --help for every flag, tab-completion for every enum.

Install

  • macOS / Linux — curl -sSL https://raw.githubusercontent.com/HasData/hasdata-cli/main/install.sh | sh
  • Windows — manual: download the .zip from Releases, extract, and put hasdata.exe on %PATH%
  • From source — go install github.com/HasData/hasdata-cli@latest

The install.sh script detects your OS/arch, downloads the matching asset, and verifies its SHA-256 against the published checksums.txt before installing.

Configure

export HASDATA_API_KEY=your_key        # preferred for CI / containers / agents
# or
hasdata configure                      # writes ~/.hasdata/config.yaml (0600)

Precedence: --api-key flag > HASDATA_API_KEY env > ~/.hasdata/config.yaml. Get a key from the HasData dashboard.
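The precedence chain is easy to mirror in your own tooling; a sketch (the function name and signature are illustrative, not part of the CLI):

```python
import os
from typing import Optional

def resolve_api_key(flag_value: Optional[str] = None,
                    config_key: Optional[str] = None) -> Optional[str]:
    """Mirror the CLI's lookup order: the --api-key flag wins, then the
    HASDATA_API_KEY environment variable, then the key stored in
    ~/.hasdata/config.yaml (passed here as config_key)."""
    return flag_value or os.environ.get("HASDATA_API_KEY") or config_key
```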

First calls

# Google SERP — structured organic / ads / knowledge graph / PAA
hasdata google-serp --q "langchain vs llamaindex" --gl us --pretty

# Render + scrape any URL (JS, proxies, markdown output, AI extraction)
hasdata web-scraping \
  --url "https://news.ycombinator.com" \
  --output-format markdown \
  --ai-extract-rules-json '{"top_story":{"type":"string","description":"headline of the top story"}}' \
  --pretty

# Amazon product lookup for price monitoring / comparison
hasdata amazon-product --asin B08N5WRWNW --pretty

# Zillow listings with complex filters
hasdata zillow-listing \
  --keyword "Austin, TX" --type forSale \
  --price-min 400000 --price-max 900000 \
  --beds-min 3 --home-types house --home-types townhome \
  --sort priceLowToHigh --pretty

Every command supports --help, --pretty, --raw, --output file, --verbose, --timeout, --retries, and shell completion.

Using it with LLMs

Agent tool-call (Python + OpenAI-style tools)

import subprocess, json

def hasdata(cmd: list[str]) -> dict:
    """Shell-tool wrapper around the hasdata CLI. Usable as an LLM tool."""
    out = subprocess.check_output(["hasdata", *cmd, "--raw"], text=True)
    return json.loads(out)

tool_spec = {
    "name": "web_search",
    "description": "Run a Google SERP query and return structured results.",
    "parameters": {
        "type": "object",
        "properties": {
            "query":   {"type": "string"},
            "country": {"type": "string", "default": "us"},
            "n":       {"type": "integer", "default": 10},
        },
        "required": ["query"],
    },
}

def web_search(query: str, country: str = "us", n: int = 10) -> dict:
    return hasdata(["google-serp", "--q", query, "--gl", country, "--num", str(n)])

Feed tool_spec to Claude / GPT / Gemini tool calling — zero Python dependencies on the HasData side.

RAG ingestion (bash loop)

for q in "$@"; do
  hasdata google-serp --q "$q" --num 50 --raw \
    | jq -c '.organic_results[] | {url:.link, title, snippet}' \
    >> serp-corpus.jsonl
done

Point your embedder at serp-corpus.jsonl.
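Reading the corpus back for embedding is one loop; a sketch (the file layout matches the jq -c output above, one JSON object per line):

```python
import json

def load_corpus(path: str) -> list[dict]:
    """Parse a JSONL file into a list of records, skipping blank lines."""
    docs = []
    with open(path) as f:
        for line in f:
            line = line.strip()
            if line:
                docs.append(json.loads(line))
    return docs
```

Each record then carries the url, title, and snippet fields selected by the jq filter, ready to chunk and embed.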

Prompt-time grounding (no vector store)

CONTEXT=$(hasdata google-serp --q "latest gpu benchmarks" --num 5 --raw \
  | jq -r '.organic_results[] | "- \(.title): \(.snippet)"')
llm "Answer using this context only:
$CONTEXT

Question: what's the fastest consumer GPU right now?"

Available APIs

hasdata --help lists all of them with per-call pricing. Grouped overview:

  • Google SERP — google-serp · google-serp-light · google-ai-mode · google-news · google-shopping · google-immersive-product · google-events · google-short-videos
  • Google Maps — google-maps · google-maps-place · google-maps-reviews · google-maps-contributor-reviews · google-maps-photos
  • Google Other — google-images · google-trends · google-flights
  • Search Engines — bing-serp
  • Web — web-scraping (headless, AI extraction, markdown output, screenshots)
  • E-commerce — amazon-product · amazon-search · amazon-seller · amazon-seller-products · shopify-products · shopify-collections
  • Real Estate — zillow-listing · zillow-property · redfin-listing · redfin-property · airbnb-listing · airbnb-property
  • Business / Local — yelp-search · yelp-place · yellowpages-search · yellowpages-place
  • Jobs — indeed-listing · indeed-job · glassdoor-listing · glassdoor-job
  • Social — instagram-profile

Flag patterns

  • Scalars / enums — --q text, --num 50, --block-ads=false. Enum flags validate client-side and offer tab-completion.
  • Booleans defaulting to true — paired negated form: --no-block-ads, --no-screenshot.
  • Lists — repeat (--lr lang_en --lr lang_fr) or comma-join (--lr lang_en,lang_fr). Serialized as key[]=value for GET endpoints.
  • Anything ending in -json — accepts raw JSON, @path/to/file.json, or - for stdin. Works for --ai-extract-rules-json, --js-scenario-json, --extract-rules-json, --headers-json, etc.
  • Key-value objects — e.g. --headers User-Agent=foo (repeatable, splits on first =, values with = preserved). Combine with --headers-json for a JSON base; kv items override per key.
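The list serialization and key-value merge rules above can be sketched in a few lines (function names are illustrative, not part of the CLI):

```python
from urllib.parse import urlencode

def serialize_list(key: str, values: list[str]) -> list[tuple[str, str]]:
    """Repeated and comma-joined list flags flatten to key[]=value pairs
    for GET endpoints, as described above."""
    flat = [v for item in values for v in item.split(",")]
    return [(f"{key}[]", v) for v in flat]

def merge_headers(json_base: dict, kv_items: list[str]) -> dict:
    """--headers-json supplies a base object; repeatable --headers K=V
    items override it per key, splitting on the first '=' only so
    values containing '=' survive intact."""
    merged = dict(json_base)
    for item in kv_items:
        k, _, v = item.partition("=")
        merged[k] = v
    return merged
```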

Output & scripting

  • JSON responses pretty-print when stdout is a TTY; raw when piped (great for jq). Force with --pretty / --raw.
  • --output file writes raw response bytes (works for screenshot / image endpoints too).
  • --verbose prints the outgoing URL and X-RateLimit-* headers on stderr.
  • Exit codes: 0 success · 1 user error · 2 network · 3 API 4xx · 4 API 5xx. Script-safe.

Shell completion

# zsh
hasdata completion zsh > "${fpath[1]}/_hasdata"
# bash
hasdata completion bash > /usr/local/etc/bash_completion.d/hasdata
# fish
hasdata completion fish > ~/.config/fish/completions/hasdata.fish

Enum values auto-complete (hasdata google-serp --gl <TAB> suggests us, gb, ca, …).

Update

hasdata update           # upgrade to latest release
hasdata update --check   # report available version without installing

A once-per-24h check prints a one-line notice to stderr when a newer version is out. Disable with check_updates: false in ~/.hasdata/config.yaml.
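A once-per-day throttle like this is typically a persisted timestamp; an illustrative sketch, not the CLI's actual state-file layout:

```python
import json
from pathlib import Path

def should_check(state_file: Path, now: float,
                 interval: float = 86400.0) -> bool:
    """Return True at most once per `interval` seconds, persisting the
    last-check time as JSON so the throttle survives across runs."""
    try:
        last = json.loads(state_file.read_text())["last_check"]
    except (FileNotFoundError, KeyError, ValueError):
        last = 0.0  # no valid state yet: allow the first check
    if now - last < interval:
        return False
    state_file.write_text(json.dumps({"last_check": now}))
    return True
```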

How the CLI stays current

Every command here is generated from the live schema at https://api.hasdata.com/apis. A scheduled GitHub Action re-runs the generator, and a hash of the normalized spec short-circuits diffs when nothing changed. When HasData ships a new API, a PR lands here within 24 hours, then a release goes out — and hasdata update brings it to your machine.
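The "hash of the normalized spec" trick is worth spelling out: canonicalize before hashing so cosmetic changes (key order, whitespace) don't trigger a regeneration. A sketch of the idea, not the generator's exact scheme:

```python
import hashlib
import json

def spec_fingerprint(spec: dict) -> str:
    """SHA-256 of a canonical JSON rendering: keys sorted, separators
    fixed, so only semantic changes to the spec alter the digest."""
    canonical = json.dumps(spec, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode()).hexdigest()
```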

Contributing locally:

go generate ./...     # regenerate cmd/gen_*.go from api.hasdata.com
go build ./...
go test ./...

Resources

License

MIT — use it commercially, embed it in your agent, ship it inside a container. Just don't hold us liable.
