A web application that scrapes web pages, extracts main content, and uses OpenLLaMA to convert the content into specified formats.
- Web interface for URL input and format selection
- Playwright-based web scraping
- Content extraction and HTML cleanup
- OpenLLaMA integration for content transformation
- Flask-based web server
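
The content extraction and HTML cleanup step listed above could look roughly like the following. This is a minimal standard-library sketch, not the project's actual implementation: the `SKIPPED_TAGS` set, the `MainTextExtractor` class, and the `extract_main_text` function are hypothetical names chosen for illustration, and the real code may use a different library or heuristic.

```python
from html.parser import HTMLParser

# Tags whose contents are treated as non-content (an assumption for this sketch).
SKIPPED_TAGS = {"script", "style", "nav", "header", "footer", "aside", "noscript"}


class MainTextExtractor(HTMLParser):
    """Collect visible text while ignoring everything inside SKIPPED_TAGS."""

    def __init__(self) -> None:
        super().__init__()
        self._skip_depth = 0  # > 0 while inside one or more skipped tags
        self._chunks: list[str] = []

    def handle_starttag(self, tag, attrs):
        if tag in SKIPPED_TAGS:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in SKIPPED_TAGS and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        text = data.strip()
        # Keep only non-empty text that sits outside skipped tags.
        if text and self._skip_depth == 0:
            self._chunks.append(text)

    def text(self) -> str:
        return "\n".join(self._chunks)


def extract_main_text(html: str) -> str:
    """Return the visible text of an HTML document, one text chunk per line."""
    parser = MainTextExtractor()
    parser.feed(html)
    parser.close()
    return parser.text()
```

In a pipeline like the one described here, the HTML string would come from the Playwright scraping step, and the returned text would be passed on to the OpenLLaMA transformation step.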
- Install uv (if not already installed):
    pip install uv
- Clone the repository:
    git clone https://github.com/arman-bd/www2any.git
    cd www2any
- Create a virtual environment and install dependencies using uv:
    uv venv
    source .venv/bin/activate
    # Windows: .venv\Scripts\activate
    uv pip sync pyproject.toml
- Install Playwright browsers:
    playwright install
- Install OpenLLaMA:
  Follow the instructions on the OpenLLaMA website to install the OpenLLaMA API server.

For development, install additional development dependencies:
    uv pip sync --editable ".[dev]"

- Start the Flask server:
    uv run www2any
- Open your browser and navigate to http://localhost:5000
- Enter a URL and select your desired output format
- Click "Process" to get the transformed content

Create a .env file in the project root with the following settings:
OPENLLAMA_API_URL=http://localhost:8080
FLASK_ENV=development
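
As a rough illustration of how these two settings might be read, the sketch below parses `.env`-style text with the standard library and falls back to the values shown above. The `parse_env_text` and `load_settings` helpers are hypothetical and not part of www2any; the project may well rely on a dedicated library for this instead. The precedence used here (process environment, then `.env` contents, then defaults) is also an assumption.

```python
import os

# Defaults mirror the example .env values above.
DEFAULTS = {
    "OPENLLAMA_API_URL": "http://localhost:8080",
    "FLASK_ENV": "development",
}


def parse_env_text(text: str) -> dict[str, str]:
    """Parse KEY=VALUE lines, skipping blank lines and # comments."""
    values: dict[str, str] = {}
    for raw_line in text.splitlines():
        line = raw_line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        # Split on the first "=" only, so values may contain "=".
        key, _, value = line.partition("=")
        values[key.strip()] = value.strip().strip('"').strip("'")
    return values


def load_settings(env_text: str = "", environ=None) -> dict[str, str]:
    """Resolve each setting from the environment, then .env text, then defaults."""
    environ = os.environ if environ is None else environ
    file_values = parse_env_text(env_text)
    return {
        key: environ.get(key) or file_values.get(key) or default
        for key, default in DEFAULTS.items()
    }
```

For example, `load_settings(open(".env").read())` would return a dictionary containing both keys, with any missing value filled in from `DEFAULTS`.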
Run tests using pytest:
    uv run pytest

Ruff is used for code formatting, linting, and import sorting. Here are the common commands:
- Format code:
    ruff format src
- Lint and fix code:
    ruff check --fix src
- Run tests:
    pytest

Add these settings to your .vscode/settings.json:
{
  "editor.formatOnSave": true,
  "editor.codeActionsOnSave": {
    "source.fixAll.ruff": "explicit",
    "source.organizeImports.ruff": "explicit"
  },
  "[python]": {
    "editor.defaultFormatter": "charliermarsh.ruff"
  },
  "python.analysis.typeCheckingMode": "basic",
  "python.testing.pytestEnabled": true,
  "python.testing.unittestEnabled": false,
  "python.testing.nosetestsEnabled": false,
  "python.testing.pytestArgs": [
    "tests"
  ]
}

MIT License

