This repository contains the tasks completed as part of the CodeAlpha Data Analytics Internship. The project demonstrates practical skills in data collection, data analysis, visualization, and sentiment analysis using Python.
Web Scraping:
- Extracted data from a public website using Python and BeautifulSoup.
- Collected book information such as title, price, rating, and availability.
- Created a custom dataset and stored it in CSV format.
Files:
- web_scraping_books.py
- books_data.csv
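The scraping steps above can be sketched as follows. This is a minimal illustration, not the contents of `web_scraping_books.py`: the HTML sample, its CSS classes (`product_pod`, `star-rating`, `price_color`, `availability`) and the helper names `parse_books` and `to_csv` are hypothetical, and a real page's markup may differ. The sketch parses an in-memory HTML string so it runs without network access.

```python
import csv
import io

from bs4 import BeautifulSoup

# Hypothetical book-listing markup; the selectors below assume this structure.
SAMPLE_HTML = """
<article class="product_pod">
  <h3><a title="Sample Book One">Sample Book One</a></h3>
  <p class="star-rating Three"></p>
  <p class="price_color">£10.50</p>
  <p class="instock availability">In stock</p>
</article>
<article class="product_pod">
  <h3><a title="Sample Book Two">Sample Book Two</a></h3>
  <p class="star-rating One"></p>
  <p class="price_color">£23.00</p>
  <p class="instock availability">In stock</p>
</article>
"""

# Assumed encoding: the star rating is stored as a word in a CSS class.
RATING_WORDS = {"One": 1, "Two": 2, "Three": 3, "Four": 4, "Five": 5}
FIELDS = ["title", "price", "rating", "availability"]


def parse_books(html):
    """Extract title, price, rating and availability from each book card."""
    soup = BeautifulSoup(html, "html.parser")
    books = []
    for card in soup.select("article.product_pod"):
        # BeautifulSoup returns the multi-valued "class" attribute as a list.
        classes = card.select_one("p.star-rating").get("class", [])
        rating = next((RATING_WORDS[c] for c in classes if c in RATING_WORDS), None)
        price_text = card.select_one("p.price_color").get_text(strip=True)
        books.append({
            "title": card.h3.a["title"],
            "price": float(price_text.lstrip("£")),
            "rating": rating,
            "availability": card.select_one("p.availability").get_text(strip=True),
        })
    return books


def to_csv(books):
    """Serialise the scraped records to CSV text with a header row."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=FIELDS)
    writer.writeheader()
    writer.writerows(books)
    return buffer.getvalue()
```

To write a file such as `books_data.csv` instead, the same `csv.DictWriter` can target `open("books_data.csv", "w", newline="", encoding="utf-8")`.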
Exploratory Data Analysis:
- Explored the dataset to understand its structure and data types.
- Cleaned and prepared data for analysis.
- Identified patterns, trends, and anomalies using summary statistics and visualizations.
- Asked meaningful questions and validated assumptions.
File:
- eda_books.ipynb
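A cleaning-and-anomaly pass of the kind described above might look like the sketch below. It is not taken from `eda_books.ipynb`: the column names match the scraped fields, but the cleaning rules and the 1.5 × IQR outlier rule are assumptions chosen for illustration, and `clean_books` and `iqr_outliers` are hypothetical helpers.

```python
import pandas as pd


def clean_books(df):
    """Drop exact duplicate rows and normalise column types (assumed rules)."""
    out = df.drop_duplicates().copy()
    # Coerce prices to numbers; anything unparseable becomes NaN and is dropped.
    out["price"] = pd.to_numeric(out["price"], errors="coerce")
    # Nullable integer type so missing ratings stay representable.
    out["rating"] = pd.to_numeric(out["rating"], errors="coerce").astype("Int64")
    # Boolean flag derived from the free-text availability column.
    out["in_stock"] = out["availability"].str.strip().str.lower().eq("in stock")
    return out.dropna(subset=["price"])


def iqr_outliers(series, k=1.5):
    """Return values outside [Q1 - k*IQR, Q3 + k*IQR]."""
    q1, q3 = series.quantile([0.25, 0.75])
    iqr = q3 - q1
    return series[(series < q1 - k * iqr) | (series > q3 + k * iqr)]
```

After cleaning, a question such as "do higher-rated books cost more?" can be checked with `cleaned.groupby("rating")["price"].mean()`.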
Data Visualization:
- Created multiple visualizations using Matplotlib and Seaborn.
- Designed charts to clearly communicate insights.
- Explained each visualization and crafted a data story to support decision-making.
File:
- eda_books.ipynb
Sentiment Analysis:
- Performed sentiment analysis on the IMDB Movie Reviews dataset.
- Classified each review's sentiment as positive, negative, or neutral.
- Visualized the sentiment distribution and compared predicted labels with the actual labels.
- Extracted insights to understand audience perception.
Dataset Source: IMDB Movie Reviews Dataset (Kaggle) https://www.kaggle.com/datasets/lakshmi25npathi/imdb-dataset-of-50k-movie-reviews
Note: The dataset file is not included in this repository due to GitHub file size limitations.
Files:
- sentiment_analysis_imdb.ipynb
- IMDB Dataset.csv (not included; download it from the Kaggle link above)
Tools and Technologies:
- Python
- Pandas
- NumPy
- BeautifulSoup
- Matplotlib
- Seaborn
- TextBlob
- Jupyter Notebook
- VS Code
This project provided hands-on experience with real-world data analytics tasks, including data collection, analysis, visualization, and sentiment analysis. The completed tasks demonstrate the ability to extract insights from data and present them clearly.