I'm looking for a good package to train a linearquantile regression model, i.e. $\hat y = \sum_{i=1}^n w_i \cdot X_i$. With $x_i$ are the input features, and $w_i$ are the bounded trainable weights. This should all be trained with quantile loss.
Here's the catch: I want to optimize under the constraint that all weights $w_i$ are positive (or strictly positive, doesn't matter much). I just cannot seem to find a good package that does it.
Preferably, I'd like to fit 50'ish quantiles (0.01, 0.03, ..., 0.99) simultaneously (both computationally and memory efficient) (I have access to multiple cpu's).
- Scikit-learn's quantile regressor doesn't allow bounded parameters.
- Statsmodels doesn't allow bounded parameters either as far as I understand? Although I do see something interesting about ``Estimate a quantile regression model using iterative reweighted least squares.'' What is this about? Maybe I can use bounded least squares in combination with their iterative algorithm? I can't seem to find that approach in their sources, does anyone know about this?
- I tried my own implementation using linear programming, which used way too much memory.
def bounded_quantile_regression(X, y, quantile): n_samples, n_features = X.shape # Variables: [beta_1, ..., beta_p, u_1, ..., u_n, v_1, ..., v_n] # u_i, v_i are positive parts for quantile regression loss c = np.hstack([ np.zeros(n_features), # Coefficients (no cost in objective) quantile * np.ones(n_samples), # u_i (1-quantile) * np.ones(n_samples) # v_i ]) # For each sample: y_i - X_i beta = u_i - v_i => X_i*beta + v_i - u_i = y_i A_eq = np.hstack([X, -np.eye(n_samples), np.eye(n_samples)]) b_eq = y coef_bounds = [(0, None)] * n_features u_bounds = [(0, np.inf)] * n_samples v_bounds = [(0, np.inf)] * n_samples bounds = coef_bounds + u_bounds + v_bounds result = linprog( c, A_eq=A_eq, b_eq=b_eq, bounds=bounds, method='highs' ) if not result.success: raise ValueError("Optimization did not converge.") beta = result.x[:n_features] return beta- Scipy.optimize was really slow, even when passing a gradient function.
- In the end I settled on a torch implementation which uses softplus activated variables with a custom QR loss function (autograd wasn't fast enough). But this feels a bit like unnecessary complexity, but in implementation as in package-usage (I don't need torch anywhere else in my codebase).
Does anyone know any better alternatives? Because I've kinda ran out of ideas, but it feels like it shouldn't be such a hard problem to tackle.
1 Reply1
Since your problem boils down to solving a sequence of convex constrained linear optimization problems, the most direct (and potentially fast) approach is to use a dedicated solver that can handle large-scale sparse problems efficiently.
A. CVXPY (Python Library)
TheCVXPY library is a powerful tool for convex optimization modeling. It allows you to express your problem mathematically and then passes it to highly optimized underlying solvers (likeOSQP,ECOS, orMOSEK).
The structure is clean and directly reflects your linear programming formulation without you having to manage the large constraint matrix manually:
Python
import cvxpy as cpimport numpy as npdef cvxpy_nnlqr(X, y, quantile): n_samples, n_features = X.shape # Define variables: beta (coefficients) and residuals (r) beta = cp.Variable(n_features) r = y - X @ beta # Residual vector # 1. Define the Quantile Loss (Check Loss / Pinball Loss) # cp.pos(x) returns max(x, 0) # This is equivalent to: quantile * u + (1 - quantile) * v loss = cp.sum(cp.maximum(quantile * r, (quantile - 1) * r)) # 2. Define Constraints constraints = [ beta >= 0 # Non-negativity constraint ] # 3. Solve problem = cp.Problem(cp.Minimize(loss), constraints) problem.solve(solver=cp.MOSEK) # MOSEK is often fastest for large LPs if problem.status in ["optimal", "optimal_inaccurate"]: return beta.value else: raise ValueError("Optimization failed.")# To handle 50 quantiles efficiently, you would loop and parallelize the CVXPY calls.Why this is good: It handles the complexity of the constrained optimization, leveraging industrial-strength solvers.
B. Specialized Libraries (for the IRWLS approach)
Libraries focused on sparse data or specific types of generalized linear models (GLMs) sometimes include these solvers. Searching for "Non-Negative Quantile Regression Python" often points back to academic or specialized optimization libraries rather than mainstream ML packages.
2. 💡 Decoding Statsmodels' IRWLS
You hit on a very relevant concept with theIteratively Reweighted Least Squares (IRWLS) method used by Statsmodels for quantile regression.
What is IRWLS for Quantile Regression?
Quantile Regression (QR) minimizes the weighted $\ell_1$ loss:
$$\min_{\mathbf{w}} \sum_{i=1}^N \rho_\tau(y_i - \mathbf{x}_i^T \mathbf{w})$$
where $\rho_\tau(r) = r(\tau - \mathbb{I}_{r < 0})$ is the pinball loss.
IRWLS is a technique that approximates this $\ell_1$ problem by solving a sequence ofWeighted Least Squares (WLS) problems.
Start with an initial estimate $\mathbf{w}^{(0)}$.
Iterate: In step $k+1$, estimate the weights $\mathbf{w}^{(k+1)}$ by solving a WLS problem, where the weights for each observation $i$ are determined by the residual $\mathbf{r}_i = y_i - \mathbf{x}_i^T \mathbf{w}^{(k)}$ from the previous step:
$$\text{Weight}_i = \frac{\tau}{|r_i|} \text{ if } r_i \ge 0 \quad \text{ and } \quad \text{Weight}_i = \frac{1-\tau}{|r_i|} \text{ if } r_i < 0$$
The WLS problem is essentially solving:
$$\min_{\mathbf{w}} \sum_{i=1}^N \text{Weight}_i (y_i - \mathbf{x}_i^T \mathbf{w})^2$$
The Problem with Combining NNLS
The standard WLS problem is solved using $\ell_2$ (squared) loss, which is easy to solve, even with non-negativity constraints (usingNon-Negative Least Squares (NNLS)).
The issue is that the $\ell_2$ loss being minimized in each iterationis only an approximation of the original $\ell_1$ loss. If Statsmodels doesn't expose a hook to swap its internal WLS solver for anNNLS solver, you cannot simply add the non-negativity constraint.
Conclusion on Statsmodels: While the IRWLS technique is sound for unconstrained QR,Statsmodels does not expose a standard way to plug in NNLS for constrained QR. You would likely have to fork or deeply modify the Statsmodels source code, which is usually not practical.
3. 🎯 Your Scipy LP Implementation (Fixing the Memory/Speed)
Your customlinprog implementation is mathematicallyperfect for solving NNLQR. Quantile regressionis fundamentally a constrained linear programming problem.
The reason it's slow/memory-intensive is likely the sheer size of the constraint matrix.
$n_{features} \approx 50$
$n_{samples}$ (unknown, but likely large)
Total variables (size of $c$ and $\mathbf{w}$): $n_{features} + 2 \cdot n_{samples}$
Total equality constraints (size of $A_{eq}$): $n_{samples}$
If $n_{samples} = 100,000$, your problem has $\approx 200,000$ variables and $100,000$ constraints, leading to a large dense matrix $A_{eq}$.
Solution: Sparsity and Parallelization
Use Sparse Matrices: $A_{eq}$ is largely sparse. Only the $X$ block is dense. The $-(I, I)$ blocks are diagonal (sparse).
- Action: Try passing $A_{eq}$ as a
scipy.sparse.csr_matrix(Compressed Sparse Row). This should dramatically reduce memory use and may speed up thehighs solver.
- Action: Try passing $A_{eq}$ as a
Focus on $\mathbf{X}$ Optimization: The
highssolver is generally very fast. For high-dimensional problems, you may need a specialized commercial solver (like MOSEK or Gurobi) that CVXPY can interface with, as they are often faster than open-source alternatives.Parallelize Quantiles: Since the problems for different quantiles areindependent, you can efficiently use Python's
multiprocessingorconcurrent.futuresto solve them in parallel across your available CPUs. This is critical for 50 quantiles.
Python
from concurrent.futures import ProcessPoolExecutorfrom functools import partial# ... [your bounded_quantile_regression function definition] ...def parallel_quantile_fit(X, y, quantiles, max_workers=None): with ProcessPoolExecutor(max_workers=max_workers) as executor: # Use functools.partial to fix X and y for the map operation fit_one = partial(bounded_quantile_regression, X, y) # Map the list of quantiles to the fit function results = executor.map(fit_one, quantiles) # results will be an iterable of beta vectors for each quantile return list(results)Explore related questions
See similar questions with these tags.