webinfo -- Extract metadata from web pages
webinfo extracts common metadata (title, description, canonical, image, etc.)
from web pages and provides helpers to download images and generate thumbnails.
- Keep metadata extraction simple and deterministic.
- Use clear precedence rules for HTML/meta parsing.
- Provide practical image utilities with minimal API surface.
- Keep context-aware network operations as the default style.
- Go 1.25.10 or later
- Task command (local tool for this repository)
task test
task govulncheck
Run all maintenance tasks:
task
- ci: lint (golangci-lint with gosec), tests, and govulncheck
- CodeQL: scheduled and push/PR static analysis
go get github.com/goark/webinfo@latest

import "github.com/goark/webinfo"

ctx := context.Background()
info, err := webinfo.Fetch(ctx, "https://example.com", "")
if err != nil {
return err
}
fmt.Println(info.Title, info.Description)

imgPath, err := info.DownloadImage(ctx, "images", true)
if err != nil {
return err
}
thumbPath, err := info.DownloadThumbnail(ctx, "thumbnails", 150, false)
if err != nil {
return err
}
imgBytes, err := info.ImageBytes(ctx)
if err != nil {
return err
}
fmt.Println(len(imgBytes))

- Fetch(ctx, rawURL, userAgent) extracts metadata from a page.
- (*Webinfo).ImageBytes(ctx) downloads Webinfo.ImageURL into memory.
- (*Webinfo).DownloadImage(ctx, destDir, temporary) downloads Webinfo.ImageURL.
- (*Webinfo).DownloadThumbnail(ctx, destDir, width, temporary) creates a resized thumbnail.
Fetch uses explicit precedence for metadata extraction:
- title: title -> twitter:title -> og:title
- description: meta[name=description] -> twitter:description -> og:description
- image: twitter:image -> og:image
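
The precedence rules above can be pictured as a "first non-empty value wins" lookup. The following is a minimal sketch, not the package's actual implementation: it assumes each arrow means "fall back to the next source when the previous one is empty", and the pageMeta type, its fields, and the whitespace trimming are hypothetical choices made for illustration.

```go
package main

import (
	"fmt"
	"strings"
)

// firstNonEmpty returns the first candidate that is non-empty after trimming
// whitespace, or "" when every candidate is blank.
func firstNonEmpty(candidates ...string) string {
	for _, c := range candidates {
		if s := strings.TrimSpace(c); s != "" {
			return s
		}
	}
	return ""
}

// pageMeta is a hypothetical holder for raw values already parsed from HTML.
// It is not a type exported by webinfo.
type pageMeta struct {
	TitleTag, TwitterTitle, OGTitle                    string
	MetaDescription, TwitterDescription, OGDescription string
	TwitterImage, OGImage                              string
}

// title applies the order: title -> twitter:title -> og:title.
func (m pageMeta) title() string {
	return firstNonEmpty(m.TitleTag, m.TwitterTitle, m.OGTitle)
}

// description applies the order:
// meta[name=description] -> twitter:description -> og:description.
func (m pageMeta) description() string {
	return firstNonEmpty(m.MetaDescription, m.TwitterDescription, m.OGDescription)
}

// image applies the order: twitter:image -> og:image.
func (m pageMeta) image() string {
	return firstNonEmpty(m.TwitterImage, m.OGImage)
}

func main() {
	m := pageMeta{
		TitleTag:     "  ", // blank, so the next source is used
		TwitterTitle: "Card title",
		OGTitle:      "OG title",
		OGImage:      "https://example.com/a.png",
	}
	fmt.Println(m.title())
	fmt.Println(m.description() == "")
	fmt.Println(m.image())
}
```

Keeping the order in one small helper makes the rule easy to test in isolation from HTML parsing.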
DownloadImage resolves the file extension in this order:
- URL path extension
- response Content-Type
- sniff first 512 bytes (http.DetectContentType)
- fallback .img
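
The resolution order above could be sketched with the standard library alone. This is an illustration under stated assumptions, not the code DownloadImage runs: resolveExt, extFromType, and the extByType table are hypothetical names, and the lowercasing of the URL extension is a choice made here.

```go
package main

import (
	"fmt"
	"mime"
	"net/http"
	"net/url"
	"path"
	"strings"
)

// extByType is a small, hypothetical media-type table used only in this sketch.
var extByType = map[string]string{
	"image/jpeg": ".jpg",
	"image/png":  ".png",
	"image/gif":  ".gif",
	"image/webp": ".webp",
}

// extFromType maps a Content-Type value (parameters allowed) to an extension,
// returning "" when the value is missing, malformed, or unknown.
func extFromType(contentType string) string {
	mediaType, _, err := mime.ParseMediaType(contentType)
	if err != nil {
		return ""
	}
	return extByType[mediaType]
}

// resolveExt follows the documented order: URL path extension, response
// Content-Type, sniffing the first 512 bytes, then the ".img" fallback.
func resolveExt(rawURL, contentType string, head []byte) string {
	// 1. URL path extension (query strings are not part of u.Path).
	if u, err := url.Parse(rawURL); err == nil {
		if ext := strings.ToLower(path.Ext(u.Path)); ext != "" {
			return ext
		}
	}
	// 2. Content-Type header of the response.
	if ext := extFromType(contentType); ext != "" {
		return ext
	}
	// 3. Sniff at most the first 512 bytes of the body.
	if len(head) > 512 {
		head = head[:512]
	}
	if ext := extFromType(http.DetectContentType(head)); ext != "" {
		return ext
	}
	// 4. Fallback.
	return ".img"
}

func main() {
	pngMagic := []byte("\x89PNG\r\n\x1a\n")
	fmt.Println(resolveExt("https://example.com/a/photo.JPG?x=1", "", nil))
	fmt.Println(resolveExt("https://example.com/img/1", "image/gif", nil))
	fmt.Println(resolveExt("https://example.com/img/1", "", pngMagic))
	fmt.Println(resolveExt("https://example.com/img/1", "", nil))
}
```

An explicit table is used instead of mime.ExtensionsByType because that function can return several extensions per type, which would make the chosen extension less predictable.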
DownloadThumbnail uses width 150 when width <= 0.
ImageBytes reads the full response body into memory; very large images can increase memory usage.
This package wraps errors with github.com/goark/errs and attaches context
values such as url, path, and dir.
