Generate text embeddings using OpenAI models through LangChain, reduce them to 3D with PCA, and plot them with Matplotlib so you can eyeball the semantic relationships between words.
- Generate embeddings for arbitrary text using OpenAI's embedding models.
- Choose between:
  - text-embedding-3-large
  - text-embedding-3-small
  - text-embedding-ada-002 (default)
- Dimensionality reduction with PCA.
- 3D scatter plot saved as a high-resolution PNG.
Install dependencies with:

    pip install -r requirements.txt

You will also need an OpenAI API key.
- Clone this repository:

      git clone https://github.com/MandalAutomations/Vector-Visualizer-OpenAI.git
      cd Vector-Visualizer-OpenAI

- Create a .env file in the project root with your OpenAI API key:

      echo "OPENAI_API_KEY=your_api_key_here" > .env

- (Optional) Choose a different embedding model by editing EMBEDDING_MODEL near the top of main.py:

      EMBEDDING_MODEL = "text-embedding-ada-002"

- Add the words/phrases you want to embed to words.txt, one per line:

      nfl
      football
      soccer
      basketball
      baseball
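The word-loading and embedding steps can be sketched roughly as follows. This is an illustration, not the code in main.py: the helper names load_words and embed_words are hypothetical, and it assumes the langchain-openai package and an OPENAI_API_KEY already present in the environment (for example, loaded from .env with python-dotenv).

```python
from pathlib import Path

# Same default as the EMBEDDING_MODEL setting described above
EMBEDDING_MODEL = "text-embedding-ada-002"


def load_words(path="words.txt"):
    """Return the non-empty lines of the file, stripped of surrounding whitespace."""
    lines = Path(path).read_text(encoding="utf-8").splitlines()
    return [line.strip() for line in lines if line.strip()]


def embed_words(words, model=EMBEDDING_MODEL):
    """Request one embedding per word (hypothetical helper, needs an API key)."""
    # Imported lazily so load_words can be used without LangChain installed.
    # Assumption: the project uses the langchain-openai package.
    from langchain_openai import OpenAIEmbeddings

    client = OpenAIEmbeddings(model=model)
    # embed_documents takes a list of strings and returns a list of float lists
    return client.embed_documents(words)
```

Skipping blank lines avoids sending empty strings to the API when words.txt ends with a trailing newline or contains spacing between groups.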
Run the script to generate embeddings and plot them:

    python main.py

This will:
- Read each line from words.txt and request an embedding for it.
- Reduce the embeddings to 3D with PCA.
- Save the visualization to 3d_plot_small.png in the project root.
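To make the PCA step concrete, here is a minimal sketch using scikit-learn. It assumes scikit-learn and NumPy are installed, and it uses random vectors as stand-ins for real embeddings so it runs without an API key; the variable names are illustrative and may not match main.py.

```python
import numpy as np
from sklearn.decomposition import PCA

# Stand-in data (assumption): 5 random vectors of length 1536, the output
# size of text-embedding-ada-002, in place of real embeddings.
rng = np.random.default_rng(0)
embeddings = rng.normal(size=(5, 1536))

# Project onto the 3 directions of greatest variance.
# n_components cannot exceed the number of samples, so at least 3 words are needed.
pca = PCA(n_components=3)
points_3d = pca.fit_transform(embeddings)

print(points_3d.shape)  # (5, 3)
# Fraction of the total variance kept by each of the 3 axes
print(pca.explained_variance_ratio_)
```

The explained variance ratios are worth a glance: if the three components together retain only a small share of the variance, distances in the plot are a rough guide to semantic similarity at best.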
.
├── main.py # Main script — embeds words.txt and plots them
├── words.txt # Words/phrases to embed (one per line)
├── requirements.txt # Dependencies
├── 3d_plot_small.png # Example output plot
└── .env # API key (not committed)
- Edit words.txt to change which words are compared.
- Change EMBEDDING_MODEL in main.py to swap models.
- Adjust the dpi argument inside plot_embeddings_3d to control output resolution.
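For orientation, a plotting function with a dpi setting might look like the sketch below. The function name comes from this README, but its signature and body here are assumptions and the real plot_embeddings_3d in main.py may differ. It requires Matplotlib.

```python
import matplotlib

matplotlib.use("Agg")  # non-interactive backend, so no display is needed
import matplotlib.pyplot as plt


def plot_embeddings_3d(points, labels, out_path="3d_plot_small.png", dpi=300):
    """Scatter 3D points, label each one, and save the figure as a PNG.

    Hypothetical signature: points is a sequence of (x, y, z) triples and
    labels is a sequence of strings of the same length.
    """
    xs, ys, zs = zip(*points)

    fig = plt.figure(figsize=(8, 8))
    ax = fig.add_subplot(projection="3d")
    ax.scatter(xs, ys, zs)

    # Write each word next to its point
    for x, y, z, label in zip(xs, ys, zs, labels):
        ax.text(x, y, z, label)

    ax.set_xlabel("PC 1")
    ax.set_ylabel("PC 2")
    ax.set_zlabel("PC 3")

    # dpi sets pixels per inch: an 8x8 inch figure at dpi=300 is about 2400x2400 px
    fig.savefig(out_path, dpi=dpi)
    plt.close(fig)
```

Raising dpi increases the pixel dimensions and file size of the PNG without changing the layout of the plot.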
