OptiBlogAi Usage Guide

Overview

The main.py script is the primary entry point for OptiBlogAi's content research and analysis pipeline. It performs automated keyword research, web scraping, content extraction, and SEO analysis.

Prerequisites

1. Environment Setup

Before running main.py, ensure you have:

Python Environment: Python 3.10+ with all dependencies installed
Google API Credentials: Required for search functionality
Environment Variables: Properly configured .env file

2. Required Environment Variables

Create a .env file in the project root with:

GOOGLE_API_KEY=your_google_api_key_here
GOOGLE_CSE_ID=your_custom_search_engine_id_here

How to get these credentials:

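Google API Key: In the Google Cloud Console, enable the Custom Search JSON API for your project, then create an API key under Credentials
Google CSE ID: Create a search engine in the Programmable Search Engine control panel (formerly Custom Search Engine) and copy its search engine ID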
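
Before a first run it can help to confirm that both variables are actually visible to Python. The following is a minimal sketch, not part of OptiBlogAi: the helper name missing_env_vars is hypothetical, and it only checks that the two names from the .env example are set to non-blank values.

```python
import os

# The two variable names come from the .env example above.
REQUIRED_VARS = ("GOOGLE_API_KEY", "GOOGLE_CSE_ID")


def missing_env_vars(env=None):
    """Return the required variable names that are unset or blank.

    Hypothetical helper for illustration; `env` defaults to os.environ.
    """
    env = os.environ if env is None else env
    return [name for name in REQUIRED_VARS if not env.get(name, "").strip()]
```

If python-dotenv is installed, calling load_dotenv() from the dotenv package first copies the .env values into os.environ, after which missing_env_vars() should return an empty list.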

3. Installation

# Clone the repository
git clone <repository-url>
cd OptiBlogAi

# Create virtual environment
python -m venv myenv
source myenv/bin/activate  # On Windows: myenv\Scripts\activate

# Install dependencies
pip install -r requirements.txt

# Install additional dependencies (if needed)
pip install python-dotenv

Running main.py

Basic Usage

# From the project root directory (replace the path with your own clone location)
cd /path/to/OptiBlogAi
python src/main.py

Interactive Prompts

The script will prompt you for:

Search Keyword: The topic you want to research
Number of Results: How many top-ranking pages to analyze (recommended: 5-10)

Sample Execution

$ python src/main.py
Enter a keyword to search: artificial intelligence in healthcare
Enter the number of results to fetch: 5
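
The two answers typed at the prompts above are plain strings, so a wrapper or a modified main.py may want to validate them before any API call is made. This is a sketch under assumptions: whether main.py validates its input is not stated here, both function names are hypothetical, and the default upper bound of 10 simply mirrors the recommended range.

```python
def clean_keyword(raw):
    """Collapse repeated whitespace and reject an empty keyword."""
    keyword = " ".join(raw.split())
    if not keyword:
        raise ValueError("Keyword must not be empty")
    return keyword


def parse_result_count(raw, minimum=1, maximum=10):
    """Convert the typed answer to an int and reject out-of-range values."""
    try:
        count = int(raw.strip())
    except ValueError:
        raise ValueError(f"Expected a whole number, got {raw!r}") from None
    if not minimum <= count <= maximum:
        raise ValueError(
            f"Number of results must be between {minimum} and {maximum}"
        )
    return count
```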

Sample Use Cases

Case 1: Blog Content Research

Scenario: You want to write a blog post about "sustainable fashion trends"

Input:

Keyword: sustainable fashion trends
Number of results: 8

Expected Output:

Title: 10 Sustainable Fashion Trends That Are Changing the Industry
Readability: 65.4
Top keywords: ['sustainable', 'fashion', 'trends', 'eco-friendly', 'organic']
---
Title: The Future of Sustainable Fashion: 2024 Trends
Readability: 72.1
Top keywords: ['sustainable', 'fashion', 'eco', 'brands', 'materials']
---
...

Generated Files:

sustainable fashion trends_2025-07-15/ (directory with crawled content)
extracted_content/sustainable fashion trends_2025-07-15/ (processed content)

Case 2: Competitive Analysis

Scenario: Analyzing competitors for "AI marketing tools"

Input:

Keyword: AI marketing tools
Number of results: 10

Expected Output:

Title: Best AI Marketing Tools for 2024: Complete Guide
Readability: 58.7
Top keywords: ['AI', 'marketing', 'tools', 'automation', 'software']
---
Title: 15 AI-Powered Marketing Tools Every Marketer Needs
Readability: 61.3
Top keywords: ['AI', 'marketing', 'tools', 'analytics', 'personalization']
---
...

Case 3: Niche Research

Scenario: Exploring a specific niche like "indoor plant care for beginners"

Input:

Keyword: indoor plant care for beginners
Number of results: 6

Expected Output:

Title: Indoor Plant Care Guide: Complete Beginner's Handbook
Readability: 78.2
Top keywords: ['indoor', 'plants', 'care', 'beginners', 'watering']
---
Title: How to Care for Indoor Plants: A Step-by-Step Guide
Readability: 75.9
Top keywords: ['indoor', 'plants', 'care', 'light', 'watering']
---
...

Output Structure

Directory Structure Created

project_root/
├── {keyword}_{date}/               # Raw crawled content
│   ├── page_1.html
│   ├── page_2.html
│   └── ...
└── extracted_content/
    └── {keyword}_{date}/           # Processed content
        ├── analysis_results.json
        ├── extracted_text_1.txt
        └── ...
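
The naming pattern in the tree above can be reproduced when you need to locate a run's output from another script. The sketch below assumes the date part is an ISO date (as in sustainable fashion trends_2025-07-15) and that the keyword is used verbatim; output_dirs is a hypothetical helper, not a function from the project.

```python
from datetime import date
from pathlib import Path


def output_dirs(keyword, day=None, root="."):
    """Return (raw_dir, extracted_dir) following the {keyword}_{date} pattern."""
    day = day or date.today()
    name = f"{keyword}_{day.isoformat()}"  # e.g. "AI marketing tools_2025-07-15"
    root = Path(root)
    return root / name, root / "extracted_content" / name
```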

Data Analysis Output

For each analyzed page, you'll see:

Title: The main title of the webpage
Readability Score: Flesch Reading Ease score (0-100, higher = easier to read)
Top Keywords: The 5 most frequent keywords found in the content
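
To make the "most frequent keywords" idea concrete, here is a small frequency count using only the standard library. It is an illustration of the general technique, not OptiBlogAi's extraction code: the tokenizing regex and the tiny stop-word list are assumptions, and top_keywords is a hypothetical name.

```python
import re
from collections import Counter

# A tiny illustrative stop-word list; the list the project uses is not documented here.
STOP_WORDS = {
    "the", "a", "an", "and", "or", "of", "to", "in",
    "for", "is", "are", "on", "with", "that",
}


def top_keywords(text, n=5):
    """Return the n most frequent non-stop-words in text, lowercased."""
    # Keep hyphens and apostrophes inside words so "eco-friendly" stays whole.
    words = re.findall(r"[a-z][a-z'-]*", text.lower())
    counts = Counter(word for word in words if word not in STOP_WORDS)
    return [word for word, _ in counts.most_common(n)]
```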

Readability Score Interpretation

90-100: Very Easy (5th grade level)
80-90: Easy (6th grade level)
70-80: Fairly Easy (7th grade level)
60-70: Standard (8th-9th grade level)
50-60: Fairly Difficult (10th-12th grade level)
30-50: Difficult (College level)
0-30: Very Difficult (Graduate level)
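
For reference, the standard Flesch Reading Ease formula is 206.835 - 1.015 × (words / sentences) - 84.6 × (syllables / words). The sketch below applies it to raw counts and maps a score to the labels in the table above. Two assumptions: each band includes its lower bound (so 90 counts as Very Easy), and both function names are hypothetical rather than taken from the project.

```python
def flesch_reading_ease(words, sentences, syllables):
    """Flesch Reading Ease from raw counts (words and sentences must be positive)."""
    return 206.835 - 1.015 * (words / sentences) - 84.6 * (syllables / words)


# (lower bound, label) pairs taken from the table above, highest band first.
BANDS = [
    (90, "Very Easy"),
    (80, "Easy"),
    (70, "Fairly Easy"),
    (60, "Standard"),
    (50, "Fairly Difficult"),
    (30, "Difficult"),
]


def interpret_readability(score):
    """Map a score to its label; anything below 30 is Very Difficult."""
    for lower, label in BANDS:
        if score >= lower:
            return label
    return "Very Difficult"
```

For example, a text with 100 words, 5 sentences, and 150 syllables scores about 59.6, which falls in the Fairly Difficult band.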

Advanced Usage

Customizing Parameters

You can modify the script to customize:

# In main.py, modify these values:
crawler = WebpageCrawler(respect_robots=False)  # Change to True to respect robots.txt
crawl_results = crawler.batch_crawl(urls, delay=2.0)  # Adjust delay between requests

Batch Processing

For processing multiple keywords, you could create a wrapper script:

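# batch_process.py
import subprocess
import sys

keywords = [
    "artificial intelligence",
    "machine learning",
    "deep learning",
]

for keyword in keywords:
    # Answer both interactive prompts (keyword, then result count) via stdin.
    # Assumes main.py reads them with input(); alternatively, modify main.py
    # to accept the keyword as a parameter and call it programmatically.
    subprocess.run(
        [sys.executable, "src/main.py"],
        input=f"{keyword}\n5\n", text=True, check=True,
    )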

Troubleshooting

Common Issues

API Key Errors

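Error: Invalid API key
Solution: Confirm that GOOGLE_API_KEY in your .env file matches the key in the Google Cloud Console and that the Custom Search JSON API is enabled for that project

No Results Found

Error: No search results
Solution: Try a different or broader keyword, and confirm that GOOGLE_CSE_ID matches your search engine and that the engine's site restrictions do not exclude the topic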

Permission Errors

Error: Permission denied creating directory
Solution: Ensure write permissions in the project directory

Import Errors

Error: Module not found
Solution: Run from project root and check sys.path configuration
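
Several of the issues above can be caught before the pipeline starts. The following preflight check is a sketch, not a script shipped with OptiBlogAi: the function name preflight is hypothetical, and it assumes the src/main.py layout and the two environment variable names used earlier in this guide.

```python
import os
from pathlib import Path


def preflight(project_root="."):
    """Return a list of problems likely to cause the errors listed above."""
    root = Path(project_root)
    problems = []
    # Import errors: main.py is expected at src/main.py under the project root.
    if not (root / "src" / "main.py").is_file():
        problems.append("src/main.py not found: run from the project root")
    # Permission errors: output directories are created in the project directory.
    if not os.access(root, os.W_OK):
        problems.append("project directory is not writable")
    # API key errors: both credentials must be present in the environment.
    for name in ("GOOGLE_API_KEY", "GOOGLE_CSE_ID"):
        if not os.environ.get(name):
            problems.append(f"{name} is not set")
    return problems
```

An empty list means none of these checks failed; it does not guarantee that the key is valid or that quota remains.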

Best Practices

Keyword Selection: Use specific, long-tail keywords for better results
Result Limits: Start with 5-10 results to avoid rate limiting
Delay Settings: Keep crawler delay at 2+ seconds to be respectful
Storage Management: Regularly clean up generated directories

Integration with Other Components

This script is designed to work with:

LLM Generator: Pass analyzed content to content generation pipeline
SEO Tools: Use extracted keywords for SEO optimization
Content Processor: Further refine and structure the content

Next Steps

After running main.py, you can:

Analyze Results: Review the readability scores and keywords
Generate Content: Use the research as input for AI content generation
SEO Optimization: Apply the discovered keywords to your content strategy
Competitive Analysis: Compare your content against the analyzed competitors
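
For the comparison step, the console output shown in the sample use cases can be turned into structured records. This sketch assumes the printed format is exactly the Title / Readability / Top keywords layout with --- separators shown earlier; parse_report and average_readability are hypothetical helpers.

```python
import ast


def parse_report(output):
    """Parse console output into a list of dicts, one per analyzed page."""
    pages, page = [], {}
    for line in output.splitlines():
        line = line.strip()
        if line == "---":  # separator closes the current page
            if page:
                pages.append(page)
            page = {}
            continue
        # Split on the first colon only, so titles may contain colons.
        key, _, value = line.partition(":")
        value = value.strip()
        if key == "Title":
            page["title"] = value
        elif key == "Readability":
            page["readability"] = float(value)
        elif key == "Top keywords":
            page["keywords"] = ast.literal_eval(value)
    if page:
        pages.append(page)
    return pages


def average_readability(pages):
    """Mean readability across pages, or None if there are no scores."""
    scores = [p["readability"] for p in pages if "readability" in p]
    return sum(scores) / len(scores) if scores else None
```

Comparing your own draft's readability score against average_readability(pages) gives a quick sense of where it sits relative to the analyzed competitors.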

Support

For issues or questions:

Check the logs in the generated directories
Review the extracted content for data quality
Ensure all dependencies are properly installed
Verify API credentials and quotas
