Useful references in this area
- Perplexity - https://www.perplexity.ai/ - Given google's descent into mediocrity
- Duck AI - https://duckduckgo.com/?q=DuckDuckGo+AI+Chat&ia=chat&duckai=1
- Scira - https://scira.ai/
-
ChatBot Arena - https://lmarena.ai/ - ManyBots?
-
ChatGPT - https://openai.com/index/chatgpt/
-
Claude - https://claude.ai/login?returnTo=%2F%3F - You have to try Mr Shannon at least once, right?
-
Deepseek - https://chat.deepseek.com/ - Anyone not like 10x cheaper?
-
Gemini - https://gemini.google.com/app/bf9ff53a9cd3ff1e?hl=en-AU - Just like the old ad
-
Genspark - https://www.genspark.ai/
-
Huggingchat - point of interesting being Cohere Command R - https://huggingface.co/chat/
-
Kimi - https://www.kimi.com
-
Le Chat - https://chat.mistral.ai/chat
-
Mercury Playground - https://chat.inceptionlabs.ai/ -> Diffusion LLM test
-
z.ai - https://chat.z.ai/
-
Qwen - https://chat.qwen.ai/
-
DeepWiki - https://deepwiki.org/ - automagic github wiki overviews
-
https://chatjimmy.ai/ - Llama 3 8B on a chip - blazing fast
- OpenRouter - https://openrouter.ai/ - Real ManyBots
- Geology Oracle - https://geologyoracle.com/
- https://notebooklm.google/
- note says 300 sources with the paid version but seems to stop working at some stage with total content - e.g. if you put books in
- https://www.alphaxiv.org/ - Chat with arxiv
- https://sciarena.allen.ai/ - compare models for a research question
- if git bash installed for a user then setx CLAUDE_CODE_GIT_BASH_PATH "C:\Users\rscott\AppData\Local\Programs\Git\bin\bash.exe"
so claude works
FROM glm-4.7-flash
PARAMETER num_ctx 65536
ollama create glm-4.7-flash-64k -f Modelfile
ollama launch claude --config
- Llama.cpp https://github.com/ggml-org/llama.cpp
pip install llama-cpp-python --extra-index-url https://abetlen.github.io/llama-cpp-python/whl/cu128
-
Go to the llama.cpp releases page
-
Download the package matching your GPU:
- NVIDIA: llama--bin-win-cuda-cu12.2.0-x64.zip (or whichever CUDA version matches your driver)
- AMD: llama--bin-win-vulkan-x64.zip
-
Extract it — llama-server.exe is right there, no install needed
-
Run it:
llama-server.exe -m your-model.gguf -ngl 99 --host 0.0.0.0 --port 8080
-ngl 99 offloads all layers to GPU.
llama-server.exe -m path\to\zai-org_GLM-4.6V-Flash-Q6_K_L.gguf --mmproj path\to\mmproj-zai-org_GLM-4.6V-Flash-f.gguf --port 8080 -ngl 99
llama-server -m "D:\llama\Qwen3.6-35B-A3B-UD-Q4_K_M.gguf" --alias qwen36-35b-a3b --host 127.0.0.1 --port 8080 -c 131072 -ngl 999 -fa on --jinja --no-mmap --cache-type-k q8_0 --cache-type-v q8_0 --n-cpu-moe 30
Prerequisites:
- NVIDIA: Make sure you have the CUDA toolkit (or at least the CUDA runtime DLLs) matching the release you downloaded. Usually having up-to-date NVIDIA drivers is enough since they bundle the runtime.
- AMD: Vulkan drivers (typically included with AMD Adrenalin drivers).
That's it — no compilation, no WSL needed. Just extract and run. ▸ Time: 10s
- LLM https://github.com/simonw/llm [from Datasette]
- Opencode https://github.com/sst/opencode
- anomalyco/opencode#1669 - using opencode and ollama
- opencode go https://opencode.ai/go
- Claude Code router - https://github.com/musistudio/claude-code-router
- Gemini cli [current decent free use level - but slow as a consequence]
- no longer useful and also deprecated
- antigravity migration, antigravity interface apparently being a cluster https://www.antigravity.google/docs/gcli-migration
- Cursor - see Composer 2.5 variant
- Copilot
- Copilot cli
- Amazon Q Developer
- now has native windows version - which is of course buggy as would appear to be the usual js wrapper around other things
- https://docs.aws.amazon.com/amazonq/latest/qdeveloper-ug/what-is.html
- https://kiro.dev/docs/cli/
- curl -fsSL https://cli.kiro.dev/install | bash
- Important! Before you can continue, you must update your PATH to include: /home/rscott/.local/bin
Add it to your PATH by adding this line to your shell configuration file: export PATH="$HOME/.local/bin:$PATH"
- Use the command "kiro-cli" to get started!
rscott@bananasplits:/mnt/c/Users/rscott$ nano ~/.bashrc rscott@bananasplits:/mnt/c/Users/rscott$ source ~/.bashrc
- now native to Ollama as well
- https://github.com/tmustier/pi-extensions/tree/main/pi-ralph-wiggum
- seems to accumulate context in chat session not clear per iteration
- https://github.com/rahulmutt/pi-ralph - is a simple looper apparently
- the 4B KM_M quants seem useable locally
- the 22.1GB 35B_3AB MOE with some expert offloading to the cpu works on 16GB - e.g. an ancient Tesla can run it - test settings getting closer to 20 T/S
- https://insiderllm.com/guides/best-way-run-qwen-3-6-35b-moe-locally/
- https://medium.com/@tolgaeren/running-pi-with-local-llms-c596aa14b062
- C:\Users\rscott>C:\Users\rscott\llama\llama-b9673-bin-win-cuda-12.4-x64\llama-server -m "C:\Users\rscott.cache\huggingface\hub\models--unsloth--Qwen3.6-35B-A3B-GGUF\snapshots\a483e9e6cbd595906af30beda3187c2663a1118c\Qwen3.6-35B-A3B-UD-Q4_K_M.gguf" --host 127.0.0.1 --port 8080 -c 131072 -ngl 999 -fa on --jinja --no-mmap --n-cpu-moe 30
- Ralph Wiggum - https://github.com/ghuntley/how-to-ralph-wiggum
- Gas Town - https://github.com/steveyegge/gastown
- Loom - https://github.com/jordanhubbard/loom
- OpenClaw - https://github.com/openclaw/openclaw - https://openclaw.ai/
- openclaw gateway restart
- openclaw tui
- https://www.reddit.com/r/LocalLLaMA/comments/1jauy8d/giving_native_tool_calling_to_gemma_3_or_really/
- Redundant likely with Gemma 4
- Nondeterminism in inference - https://thinkingmachines.ai/blog/defeating-nondeterminism-in-llm-inference/
- https://huggingface.co/Nanbeige/Nanbeige4.1-3B - find out why this one is interesting
- https://latentpatterns.com/
- from the OG Ralph
- https://www.dell.com/en-au/lp/dt/nvidia-ai AI-Factory
- Rust Token Killer https://github.com/rtk-ai/rtk