
🚀 ScraperPro Setup Guide

Complete step-by-step guide to get your scraping SaaS up and running in under 30 minutes.

📋 Prerequisites Checklist

- Python 3.10+ installed
- pip package manager
- Git (for GitHub upload)
- Text editor (VS Code, Sublime, etc.)
- Basic terminal/command line knowledge

🎯 Setup Steps

Step 1: Project Setup (5 minutes)

Create Project Directory

    # Create and enter project directory
    mkdir scraper-pro
    cd scraper-pro

Create File Structure

    # Create all necessary directories
    mkdir -p logs output data/clients data/configs

    # Create Python files
    touch scraper.py app.py api_server.py
    touch requirements.txt README.md .gitignore

Create Virtual Environment (Recommended)

    # Create virtual environment
    python -m venv venv

    # Activate it
    # On Windows:
    venv\Scripts\activate

    # On Mac/Linux:
    source venv/bin/activate

Step 2: Copy the Code (5 minutes)

- scraper.py: Copy the main scraper code from the first artifact
- app.py: Copy the Streamlit interface code
- api_server.py: Copy the API server code
- requirements.txt: Copy the dependencies list
- README.md: Copy the documentation

Step 3: Install Dependencies (3 minutes)

    pip install -r requirements.txt

Note: If you get errors:

- Windows users might need Visual C++ build tools
- Mac users might need: xcode-select --install
- Linux users might need: sudo apt-get install python3-dev
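If the install finishes but you are unsure whether everything landed in the active virtual environment, a small check like the one below can help. It is only a sketch: the module names in `REQUIRED_MODULES` are assumptions based on the tools this guide mentions, so replace them with whatever your requirements.txt actually lists.

```python
import importlib.util
import sys

# Assumed module names; replace with the packages listed in your requirements.txt.
REQUIRED_MODULES = ["streamlit", "flask", "requests", "bs4"]


def missing_modules(names):
    """Return the top-level module names that cannot be found in the active environment."""
    return [name for name in names if importlib.util.find_spec(name) is None]


def python_is_supported(minimum=(3, 10)):
    """Check the interpreter against the guide's Python 3.10+ prerequisite."""
    return sys.version_info >= minimum


if __name__ == "__main__":
    if not python_is_supported():
        print(f"Python 3.10+ required, found {sys.version.split()[0]}")
    missing = missing_modules(REQUIRED_MODULES)
    if missing:
        print("Missing modules:", ", ".join(missing))
    else:
        print("All listed modules are importable.")
```

Run it with the virtual environment activated; any name it reports as missing was not installed into that environment.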

Step 4: Test Basic Functionality (5 minutes)

    # Test the scraper
    python scraper.py

You should see output like:

    Created client with API key: abc123...
    Extracted X items from page 1
    Scraping completed. Total items: X

Step 5: Launch Web Interface (2 minutes)

    streamlit run app.py

Your browser should automatically open to http://localhost:8501

First Time Setup:

1. Click "Register" tab
2. Fill in your name and email
3. Select "FREE" tier for testing
4. Click "Create Account"
5. IMPORTANT: Copy and save your API key!

Step 6: Create Your First Scraper (5 minutes)

1. Login with your API key
2. Go to "⚙️ Configurations" tab
3. Fill in the form:
   - Name: "Test Quotes Scraper"
   - URL: https://quotes.toscrape.com/
   - Container: .quote
   - Field 1: text → .text
   - Field 2: author → .author
   - Check "Enable Pagination"
   - Next Page Selector: .next > a
4. Click "💾 Save Configuration"
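To make the form fields in Step 6 more concrete, here is a minimal sketch of how such a configuration could be represented and how pagination might follow the next-page link. The `ScraperConfig` class, its field names, the `max_pages` cap, and the `fetch_page` callback are all hypothetical and are not taken from scraper.py; the real download-and-parse step is deliberately left abstract.

```python
import json
from dataclasses import asdict, dataclass
from typing import Callable, Optional


@dataclass
class ScraperConfig:
    """Hypothetical shape of a saved configuration (field names are assumptions)."""
    name: str
    url: str
    container: str
    fields: dict[str, str]
    next_page_selector: Optional[str] = None  # None means pagination is disabled
    max_pages: int = 10  # safety cap so a bad selector cannot loop forever


# Values taken from the form in Step 6.
config = ScraperConfig(
    name="Test Quotes Scraper",
    url="https://quotes.toscrape.com/",
    container=".quote",
    fields={"text": ".text", "author": ".author"},
    next_page_selector=".next > a",
)

PageResult = tuple[list[dict], Optional[str]]


def scrape_all(config: ScraperConfig, fetch_page: Callable[[str], PageResult]) -> list[dict]:
    """Follow next-page links until there are none or max_pages is reached.

    fetch_page stands in for the real download-and-parse step: it takes a URL
    and returns (items found on that page, URL of the next page or None).
    """
    items: list[dict] = []
    url: Optional[str] = config.url
    seen: set[str] = set()
    pages = 0
    while url and url not in seen and pages < config.max_pages:
        seen.add(url)
        page_items, next_url = fetch_page(url)
        items.extend(page_items)
        pages += 1
        # Only continue when pagination is enabled in the configuration.
        url = next_url if config.next_page_selector else None
    return items


if __name__ == "__main__":
    # Show the configuration as JSON, similar to what could be stored on disk.
    print(json.dumps(asdict(config), indent=2))
```

The `seen` set and the `max_pages` cap are there so that a next-page selector that keeps matching the same link cannot trap the scraper in a loop.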

Step 7: Run Your First Scrape (2 minutes)

1. Go to "🎯 Run Scraper" tab
2. Select "Test Quotes Scraper"
3. Click "🚀 Run Scraper"
4. Wait 10-20 seconds
5. View results in "📊 Results" tab

You're now scraping! 🎉

🌐 Deploy to Production

Option A: Run Locally for Testing

    # Web interface
    streamlit run app.py

    # API server (in another terminal)
    python api_server.py

Option B: Docker Deployment (Recommended)

This option assumes a Dockerfile and a docker-compose.yml already exist in the project root; Step 1 does not create them.

    # Build and run with Docker Compose
    docker-compose up -d

    # Check status
    docker-compose ps

    # View logs
    docker-compose logs -f

Access:

- Web Interface: http://localhost:8501
- API Server: http://localhost:5000
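After starting the services, a quick way to confirm that both ports from the Access list are accepting connections is a TCP check like the sketch below. It only tests that something is listening on each port, not that the app behind it is healthy; the `SERVICES` mapping simply restates the two ports above.

```python
import socket


def is_port_open(host: str, port: int, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


# Ports taken from the Access list above.
SERVICES = {"Web Interface": 8501, "API Server": 5000}

if __name__ == "__main__":
    for name, port in SERVICES.items():
        state = "up" if is_port_open("localhost", port) else "not reachable"
        print(f"{name} (port {port}): {state}")
```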

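Option C: Deploy to Heroku

Before deploying, commit your code to a Git repository and add a Procfile to the project root that tells Heroku how to start the app on the port it provides in the $PORT environment variable.

    # Install Heroku CLI first: https://devcenter.heroku.com/articles/heroku-cli

    # Login
    heroku login

    # Create app
    heroku create scraper-pro-yourname

    # Add buildpack
    heroku buildpacks:set heroku/python

    # Deploy
    git push heroku main

    # Open app
    heroku open

Option D: Deploy to DigitalOcean/AWS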

1. Create a Droplet/EC2 instance (Ubuntu 22.04)
2. SSH into server
3. Clone your repo
4. Install dependencies
5. Run with PM2 or systemd

Example using PM2 (requires Node.js and npm on the server):

    # Install PM2
    npm install -g pm2

    # Start Streamlit
    pm2 start "streamlit run app.py --server.port=8501" --name scraper-web

    # Start API
    pm2 start api_server.py --name scraper-api

    # Save PM2 config
    pm2 save
    pm2 startup

🔐 Production Checklist

Before going live with paying customers:

- Change default passwords/keys
- Enable HTTPS (Let's Encrypt)
- Set up backups for data directory
- Configure firewall (only allow 80, 443, 22)
- Set up monitoring (UptimeRobot, etc.)
- Create Terms of Service
- Create Privacy Policy
- Set up payment processing (Stripe)
- Test all tier limits
- Create support email/system
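For the backup item in the checklist, one simple approach is a timestamped zip of the data directory. The sketch below assumes the `data` directory from Step 1 and a hypothetical `backups` output directory kept outside of it; scheduling (cron, systemd timer) and off-server storage are left to you.

```python
import shutil
from datetime import datetime
from pathlib import Path


def backup_data_dir(data_dir="data", backup_dir="backups"):
    """Zip data_dir into backup_dir with a timestamped name; return the archive path."""
    source = Path(data_dir)
    if not source.is_dir():
        raise FileNotFoundError(f"Data directory not found: {source}")
    target_dir = Path(backup_dir)
    target_dir.mkdir(parents=True, exist_ok=True)
    stamp = datetime.now().strftime("%Y%m%d-%H%M%S")
    # make_archive appends the ".zip" extension to the base name itself.
    archive = shutil.make_archive(str(target_dir / f"data-{stamp}"), "zip", root_dir=str(source))
    return Path(archive)


if __name__ == "__main__":
    print("Backup written to", backup_data_dir())
```

Keep `backup_dir` outside `data_dir`, otherwise each archive would also include the earlier backups.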

💳 Add Payment Processing

Stripe Integration (Recommended)

    pip install stripe

Add to scraper.py:

    import os

    import stripe

    # Read the secret key from the environment instead of hardcoding it
    # (STRIPE_SECRET_KEY is an example variable name)
    stripe.api_key = os.environ["STRIPE_SECRET_KEY"]

    def create_subscription(client, tier):
        """Create Stripe subscription"""
        # Create the Stripe customer and link it to the internal client ID
        customer = stripe.Customer.create(
            email=client.email,
            metadata={'client_id': client.client_id}
        )

        # Price IDs are placeholders; create the prices in the Stripe Dashboard
        prices = {
            'PRO': 'price_pro_monthly',
            'ENTERPRISE': 'price_ent_monthly'
        }

        subscription = stripe.Subscription.create(
            customer=customer.id,
            items=[{'price': prices[tier]}]
        )

        return subscription
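Note that `prices[tier]` in the function above raises a bare `KeyError` for the FREE tier or a mistyped tier name. A small lookup helper with an explicit error can make that failure easier to diagnose. This is a sketch only: `price_id_for_tier` is a hypothetical helper, and the price IDs are the same placeholders used above, not real Stripe IDs.

```python
# Placeholder price IDs copied from the function above; real IDs come from the Stripe Dashboard.
PRICE_IDS = {
    "PRO": "price_pro_monthly",
    "ENTERPRISE": "price_ent_monthly",
}


def price_id_for_tier(tier: str) -> str:
    """Return the Stripe price ID for a paid tier, or raise ValueError."""
    key = tier.strip().upper()
    if key == "FREE":
        raise ValueError("The FREE tier has no Stripe price; skip subscription creation.")
    try:
        return PRICE_IDS[key]
    except KeyError:
        raise ValueError(f"Unknown tier: {tier!r}") from None
```

Calling it before any Stripe request means an invalid tier is rejected before a customer record is created.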

📊 Analytics Setup

Google Analytics

Add to app.py, replacing G-XXXXXXXXXX with your measurement ID:

    # In app.py header
    st.markdown("""
    <script async src="https://www.googletagmanager.com/gtag/js?id=G-XXXXXXXXXX"></script>
    <script>
      window.dataLayer = window.dataLayer || [];
      function gtag(){dataLayer.push(arguments);}
      gtag('js', new Date());
      gtag('config', 'G-XXXXXXXXXX');
    </script>
    """, unsafe_allow_html=True)

Browsers may not execute script tags injected through st.markdown, so confirm in the Google Analytics real-time view that page views actually arrive.

🐛 Troubleshooting

Issue: "Module not found"

    pip install -r requirements.txt --upgrade

Issue: Streamlit won't start

    # Check if port is in use
    lsof -i :8501                   # Mac/Linux
    netstat -ano | findstr :8501    # Windows

    # Kill process or use different port
    streamlit run app.py --server.port=8502

Issue: Scraping returns no data

- Check CSS selectors with browser DevTools
- Verify website structure hasn't changed
- Try different user agent
- Check if site requires JavaScript (use Selenium)
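Two of the checks above can be scripted with the standard library alone: sending a different User-Agent, and testing whether the container class appears in the HTML the server sends (if it does not, the content is probably rendered by JavaScript). This is a rough sketch; the `USER_AGENT` string is just an example value, and the helper names are hypothetical rather than part of scraper.py.

```python
from urllib.request import Request, urlopen

# Example browser-like User-Agent string (an assumption; any current browser string works).
USER_AGENT = (
    "Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 "
    "(KHTML, like Gecko) Chrome/120.0 Safari/537.36"
)


def build_request(url: str, user_agent: str = USER_AGENT) -> Request:
    """Create a request that sends a custom User-Agent header."""
    return Request(url, headers={"User-Agent": user_agent})


def fetch_html(url: str, timeout: float = 10.0) -> str:
    """Download a page as text using the custom User-Agent."""
    with urlopen(build_request(url), timeout=timeout) as response:
        charset = response.headers.get_content_charset() or "utf-8"
        return response.read().decode(charset, errors="replace")


def class_in_raw_html(html: str, class_name: str) -> bool:
    """Rough check: does the class name appear anywhere in the server-sent HTML?"""
    return class_name.lstrip(".") in html


if __name__ == "__main__":
    html = fetch_html("https://quotes.toscrape.com/")
    print("Container found in raw HTML:", class_in_raw_html(html, ".quote"))
```

The substring test is intentionally crude: a `False` result is a hint to try a browser-based tool, not proof that the page needs one.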

Issue: Rate limits not working

    # Check if dates are being saved correctly
    ls -la data/clients/
    cat data/clients/[client_id].json

📧 Getting Your First Customers

Marketing Checklist

1. Create Landing Page
   - Use your Streamlit app as demo
   - Add signup form
   - Show pricing clearly
2. Content Marketing
   - Blog: "How to scrape [industry] data"
   - YouTube: Scraper tutorials
   - Reddit: Help in r/webscraping
3. Direct Outreach
   - LinkedIn: Message potential customers
   - Cold email: Local businesses
   - Upwork/Fiverr: Offer services
4. SEO
   - "web scraping service"
   - "[industry] data scraping"
   - "automated data collection"

Pricing Strategy

Start Low, Prove Value:

- Week 1-2: Free tier only (build reputation)
- Week 3-4: Add Pro at $29/mo (test market)
- Month 2+: Increase to $49/mo
- Month 3+: Add Enterprise tier

First Customer Tactics:

- Offer 50% off for first 3 customers
- Money-back guarantee
- Free setup assistance
- Lifetime discount for feedback

🎓 Next Steps

Once you have 5-10 paying customers:

1. Add More Features
   - Email notifications
   - Data visualization
   - Scheduled reports
   - Webhook integrations
2. Improve Infrastructure
   - Load balancing
   - Redis caching
   - PostgreSQL database
   - CDN for outputs
3. Scale Marketing
   - Paid ads
   - Affiliate program
   - Partnerships
   - API marketplace listing

📚 Learning Resources

Web Scraping

- Web Scraping with Python
- BeautifulSoup Documentation
- Scrapy Tutorial

Business

- Indie Hackers - Learn from other founders
- r/SaaS - SaaS community
- MicroConf - Bootstrap SaaS conference

Technical

- Streamlit Docs
- Flask API Tutorial
- Docker Basics

💪 Success Metrics

Track these KPIs:

- Week 1: 10 signups (any tier)
- Week 2: 1 paying customer
- Month 1: $100 MRR
- Month 2: $500 MRR
- Month 3: $1,000 MRR
- Month 6: $5,000 MRR

🎉 You're Ready!

You now have:

- ✅ Professional scraping SaaS
- ✅ Multi-tier pricing
- ✅ Web interface
- ✅ API access
- ✅ Client management
- ✅ Rate limiting
- ✅ Export capabilities

Go get your first customer! 🚀

Need help? Create an issue on GitHub or email support@scraperpro.com