# kerberos auth for proxy #660
jadewang-db wants to merge 1 commit into databricks:main from jadewang-db:kerberos-auth-proxy
**CLAUDE.md** (136 additions, 0 deletions)
# CLAUDE.md

This file provides guidance to Claude Code (claude.ai/code) when working with code in this repository.

## Repository Overview

This is the official Python client for Databricks SQL. It implements PEP 249 (DB API 2.0) and uses Apache Thrift to communicate with Databricks clusters and SQL warehouses.

## Essential Development Commands

```bash
# Install dependencies
poetry install

# Install with PyArrow support (recommended)
poetry install --all-extras

# Run unit tests
poetry run python -m pytest tests/unit

# Run a specific test
poetry run python -m pytest tests/unit/test_client.py::ClientTestSuite::test_method_name

# Code formatting (required before commits)
poetry run black src

# Type checking
poetry run mypy --install-types --non-interactive src

# Check formatting without changing files
poetry run black src --check
```

## High-Level Architecture

### Core Components

1. **Client Layer** (`src/databricks/sql/client.py`)
   - Main entry point implementing DB API 2.0
   - Handles connections, cursors, and query execution
   - Key classes: `Connection`, `Cursor`
2. **Backend Layer** (`src/databricks/sql/backend/`)
   - Thrift-based communication with Databricks
   - Handles protocol-level operations
   - Key files: `thrift_backend.py`, `databricks_client.py`
   - SEA (Statement Execution API) support in `experimental/backend/sea_backend.py`
3. **Authentication** (`src/databricks/sql/auth/`)
   - Multiple auth methods: OAuth U2M/M2M, PAT, custom providers
   - Authentication flow abstraction
   - OAuth persistence support for token caching
4. **Data Transfer** (`src/databricks/sql/cloudfetch/`)
   - Cloud fetch for large results
   - Arrow format support for efficiency
   - Handles data pagination and streaming
   - Result set management in `result_set.py`
5. **Parameters** (`src/databricks/sql/parameters/`)
   - Native parameter support (v3.0.0+): server-side parameterization
   - Inline parameters (legacy): client-side interpolation
   - SQL injection prevention
   - Type mapping and conversion
6. **Telemetry** (`src/databricks/sql/telemetry/`)
   - Usage metrics and performance monitoring
   - Configurable batch processing and time-based flushing
   - Server-side flag integration
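Tying the client and parameter layers together, a minimal DB API 2.0 session might look like the sketch below. The hostname, HTTP path, token, and table name are placeholders, and running it requires a live SQL warehouse; the dict form of `cursor.execute` uses the native (server-side) parameters described above.

```python
from databricks import sql

# Placeholder credentials -- substitute your own workspace values
with sql.connect(
    server_hostname="your-workspace.databricks.com",
    http_path="/sql/1.0/warehouses/your-warehouse",
    access_token="your-token",
) as connection:
    with connection.cursor() as cursor:
        # Native parameters (":name" markers) avoid string interpolation
        cursor.execute(
            "SELECT * FROM samples.nyctaxi.trips WHERE trip_distance > :min_dist LIMIT 10",
            {"min_dist": 2.5},
        )
        for row in cursor.fetchall():
            print(row)
```

Both `Connection` and `Cursor` are context managers, so resources are released even if the query raises.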
### Key Design Patterns

- **Result Sets**: Uses Arrow format by default for efficient data transfer
- **Error Handling**: Comprehensive retry logic with exponential backoff
- **Resource Management**: Context managers for proper cleanup
- **Type System**: Strong typing with MyPy throughout
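The retry-with-exponential-backoff pattern listed above can be sketched generically as follows. This is an illustrative sketch only, not the connector's actual retry implementation, which also honors server hints and retryable status codes.

```python
import time


def call_with_backoff(fn, max_attempts=5, base_delay=0.1):
    """Retry fn on exception, doubling the delay each attempt (sketch)."""
    for attempt in range(max_attempts):
        try:
            return fn()
        except Exception:
            if attempt == max_attempts - 1:
                # Out of attempts: surface the last error
                raise
            # Delays grow as base_delay * 2**attempt: 0.1s, 0.2s, 0.4s, ...
            time.sleep(base_delay * (2 ** attempt))
```

A production version would typically also cap the total delay and retry only on transient error classes.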
## Testing Strategy

### Unit Tests (No Databricks account needed)

```bash
poetry run python -m pytest tests/unit
```

### E2E Tests (Requires Databricks account)

1. Set environment variables or create a `test.env` file:

   ```bash
   export DATABRICKS_SERVER_HOSTNAME="****"
   export DATABRICKS_HTTP_PATH="/sql/1.0/endpoints/****"
   export DATABRICKS_TOKEN="dapi****"
   ```

2. Run: `poetry run python -m pytest tests/e2e`
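If you keep credentials in a `test.env` file, a small stdlib-only loader like the one below can export them before running pytest. This is a convenience sketch (it ignores quoting edge cases); the project's test harness may load the file differently.

```python
import os


def load_env_file(path):
    """Parse simple KEY=VALUE lines into os.environ (sketch)."""
    with open(path) as f:
        for line in f:
            line = line.strip()
            # Skip blanks, comments, and lines without an assignment
            if not line or line.startswith("#") or "=" not in line:
                continue
            key, _, value = line.partition("=")
            # setdefault: real environment variables win over the file
            os.environ.setdefault(key.strip(), value.strip().strip('"'))
```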
Test organization:

- `tests/unit/` - Fast, isolated unit tests
- `tests/e2e/` - Integration tests against real Databricks
- Test files follow the `test_*.py` naming convention
- Test suites: core, large queries, staging ingestion, retry logic

## Important Development Notes

1. **Dependency Management**: Always use Poetry, never pip directly
2. **Code Style**: Black formatter with 100-char line limit (PEP 8 with this exception)
3. **Type Annotations**: Required for all new code
4. **Thrift Files**: Generated code in `thrift_api/` - do not edit manually
5. **Parameter Security**: Always use native parameters, never string interpolation
6. **Arrow Support**: Optional but highly recommended for performance
7. **Python Support**: 3.8+ (up to 3.13)
8. **DCO**: Sign commits with Developer Certificate of Origin
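For the DCO point above, git's built-in `--signoff` flag appends the required `Signed-off-by:` trailer using your configured name and email:

```shell
# -s / --signoff adds "Signed-off-by: Your Name <you@example.com>"
git commit -s -m "Fix retry handling"
```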
## Common Development Tasks

### Adding a New Feature

1. Implement in the appropriate module under `src/databricks/sql/`
2. Add unit tests in `tests/unit/`
3. Add integration tests in `tests/e2e/` if needed
4. Update type hints and ensure MyPy passes
5. Run the Black formatter before committing

### Debugging Connection Issues

- Check the auth configuration in the `auth/` modules
- Review the retry logic in `src/databricks/sql/utils.py`
- Enable debug logging for a detailed trace

### Working with Thrift

- Protocol definitions are in `src/databricks/sql/thrift_api/`
- Backend implementation is in `backend/thrift_backend.py`
- Don't modify generated Thrift files directly

### Running Examples

Example scripts are in the `examples/` directory:

- Basic query execution examples
- OAuth authentication patterns
- Parameter usage (native vs. inline)
- Staging ingestion operations
- Custom credential providers
**docs/proxy_configuration.md** (175 additions, 0 deletions)
# Proxy Configuration Guide

This guide explains how to configure the Databricks SQL Connector for Python to work with HTTP/HTTPS proxies, including support for Kerberos authentication.

## Table of Contents

- [Basic Proxy Configuration](#basic-proxy-configuration)
- [Proxy with Basic Authentication](#proxy-with-basic-authentication)
- [Proxy with Kerberos Authentication](#proxy-with-kerberos-authentication)
- [Troubleshooting](#troubleshooting)

## Basic Proxy Configuration

The connector automatically detects proxy settings from environment variables:

```bash
# For HTTPS connections (most common)
export HTTPS_PROXY=http://proxy.example.com:8080

# For HTTP connections
export HTTP_PROXY=http://proxy.example.com:8080

# Hosts that bypass the proxy
export NO_PROXY=localhost,127.0.0.1,.internal.company.com
```

Then connect normally:

```python
from databricks import sql

connection = sql.connect(
    server_hostname="your-workspace.databricks.com",
    http_path="/sql/1.0/warehouses/your-warehouse",
    access_token="your-token"
)
```

## Proxy with Basic Authentication

For proxies requiring username/password authentication, include the credentials in the proxy URL:

```bash
export HTTPS_PROXY=http://username:password@proxy.example.com:8080
```
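If the username or password contains reserved characters such as `@` or `:`, they must be percent-encoded before being embedded in the URL. Python's standard library handles this; the helper below is an illustrative convenience, not part of the connector's API.

```python
from urllib.parse import quote


def proxy_url_with_auth(user, password, host, port):
    """Build a proxy URL with percent-encoded credentials (sketch)."""
    # safe='' also encodes '/' and ':' inside the credential parts
    return f"http://{quote(user, safe='')}:{quote(password, safe='')}@{host}:{port}"


# 'p@ss:word' becomes 'p%40ss%3Aword' inside the URL
url = proxy_url_with_auth("alice", "p@ss:word", "proxy.example.com", 8080)
```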
## Proxy with Kerberos Authentication

For enterprise environments using Kerberos authentication on proxies:

### Prerequisites

1. Install the Kerberos dependencies:

   ```bash
   pip install databricks-sql-connector[kerberos]
   ```

2. Obtain a valid Kerberos ticket:

   ```bash
   kinit user@EXAMPLE.COM
   ```

3. Set the proxy environment variables (without credentials):

   ```bash
   export HTTPS_PROXY=http://proxy.example.com:8080
   ```

### Connection with Kerberos Proxy

```python
from databricks import sql

connection = sql.connect(
    server_hostname="your-workspace.databricks.com",
    http_path="/sql/1.0/warehouses/your-warehouse",
    access_token="your-databricks-token",
    # Enable Kerberos proxy authentication
    _proxy_auth_type="kerberos",
    # Optional Kerberos settings
    _proxy_kerberos_service_name="HTTP",           # Default: "HTTP"
    _proxy_kerberos_principal="user@EXAMPLE.COM",  # Optional: uses the default if not set
    _proxy_kerberos_delegate=False,                # Enable credential delegation
    _proxy_kerberos_mutual_auth="REQUIRED"         # Options: REQUIRED, OPTIONAL, DISABLED
)
```

### Kerberos Configuration Options

| Parameter | Default | Description |
|-----------|---------|-------------|
| `_proxy_auth_type` | None | Set to `"kerberos"` to enable Kerberos proxy auth |
| `_proxy_kerberos_service_name` | `"HTTP"` | Kerberos service name for the proxy |
| `_proxy_kerberos_principal` | None | Specific principal to use (uses the default if not set) |
| `_proxy_kerberos_delegate` | `False` | Whether to delegate credentials to the proxy |
| `_proxy_kerberos_mutual_auth` | `"REQUIRED"` | Mutual authentication requirement level |
### Example: Custom Kerberos Settings

```python
# Using a specific service principal with delegation
connection = sql.connect(
    server_hostname="your-workspace.databricks.com",
    http_path="/sql/1.0/warehouses/your-warehouse",
    access_token="your-token",
    _proxy_auth_type="kerberos",
    _proxy_kerberos_service_name="HTTP",
    _proxy_kerberos_principal="dbuser@CORP.EXAMPLE.COM",
    _proxy_kerberos_delegate=True,          # Allow credential delegation
    _proxy_kerberos_mutual_auth="OPTIONAL"  # Less strict verification
)
```
## Troubleshooting

### Kerberos Authentication Issues

1. **No Kerberos ticket**:

   ```bash
   # Check if you have a valid ticket
   klist

   # If not, obtain one
   kinit user@EXAMPLE.COM
   ```

2. **Wrong service principal**:
   - Check with your IT team for the correct proxy service principal name
   - It's typically `HTTP@proxy.example.com` but may vary

3. **Import errors**:

   ```
   ImportError: Kerberos proxy authentication requires 'pykerberos'
   ```

   Solution: install with `pip install databricks-sql-connector[kerberos]`

### Proxy Connection Issues

1. **Enable debug logging**:

   ```python
   import logging
   logging.basicConfig(level=logging.DEBUG)
   ```

2. **Test proxy connectivity**:

   ```bash
   # Test if the proxy is reachable
   curl -x http://proxy.example.com:8080 https://www.databricks.com
   ```

3. **Verify environment variables**:

   ```python
   import os
   print(f"HTTPS_PROXY: {os.environ.get('HTTPS_PROXY')}")
   print(f"NO_PROXY: {os.environ.get('NO_PROXY')}")
   ```

### Platform-Specific Notes

- **Linux/Mac**: Uses the `pykerberos` library
- **Windows**: Uses the `winkerberos` library (automatically selected)
- **Docker/Containers**: Ensure Kerberos configuration files are mounted
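The platform split above amounts to a simple selection rule, sketched below. This illustrates the idea only; the connector's internal selection logic may differ.

```python
import sys


def kerberos_backend(platform=None):
    """Return the Kerberos library name for a platform string (sketch)."""
    platform = platform or sys.platform
    # Windows reports 'win32'; everything else uses pykerberos
    return "winkerberos" if platform.startswith("win") else "pykerberos"
```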
## Security Considerations

1. **Avoid hardcoding credentials**: use environment variables or secure credential stores
2. **Use HTTPS connections**: even through proxies, maintain encrypted connections to Databricks
3. **Credential delegation**: only enable `_proxy_kerberos_delegate=True` if required by your proxy
4. **Mutual authentication**: keep `_proxy_kerberos_mutual_auth="REQUIRED"` for maximum security
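Point 1 in practice: read the token from the environment and fail fast if it is missing, rather than embedding a fallback in source. The helper name and variable name below are illustrative conventions, not connector requirements.

```python
import os


def get_required_env(name):
    """Return a required environment variable or fail fast (sketch)."""
    value = os.environ.get(name)
    if value is None:
        raise RuntimeError(f"Set {name} in your environment")
    return value
```

For example, `access_token=get_required_env("DATABRICKS_TOKEN")` keeps the secret out of version control.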
## See Also

- [Kerberos Proxy Example](../examples/kerberos_proxy_auth.py)
- [Databricks SQL Connector Documentation](https://docs.databricks.com/dev-tools/python-sql-connector.html)