YfinanceValidation
Validates ticker symbols and OHLCV data quality using yfinance test downloads and Great Expectations.
Component Separation
This class handles validation only. For ticker lists see YfinanceTickers, for data scraping see YfinancePipeline.
Validation Checks
| Category | Checks | Purpose |
|---|---|---|
| Schema | Required columns (Open, High, Low, Close, Volume) | Ensure complete data structure |
| Nulls | Zero NaN/null tolerance | Catch incomplete data |
| Price Logic | High ≥ Low, Open/Close within range, all prices > $0.01 | Detect bad data or API errors |
| Data Quality | Std dev > 0.01, min 10 rows, unique dates | Catch constant values and duplicates |
| Volume | Non-negative (0 allowed) | Validate trading activity |
Why 18 Checks?
Schema (1) + Nulls (5) + Price bounds (4) + Price logic (4) + Data quality (3) + Unique dates (1) = 18 total checks
How It Works
Runs 18 Great Expectations checks to catch API glitches, incomplete data, and bad values before they hit the database. Checks cover schema completeness, null values, price logic (High >= Low, Open/Close within bounds), data quality (stddev > 0.01, min 10 rows, unique dates), and reasonable values (prices > $0.01, volume >= 0).
Returns a dict with validation results including which checks passed/failed. If the Great Expectations framework itself fails, raises an exception for Airflow to retry.
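The check categories above can be approximated in plain pandas. This is only a sketch, not the Great Expectations suite the class actually runs: the function name `sketch_ohlcv_checks` is hypothetical, it assumes the standard-deviation check applies to the Close column, and it assumes dates live in the DataFrame index.

```python
import pandas as pd

REQUIRED_COLUMNS = ["Open", "High", "Low", "Close", "Volume"]
PRICE_COLUMNS = ["Open", "High", "Low", "Close"]
MIN_PRICE_VALUE = 0.01
MIN_STDDEV_VALUE = 0.01
MIN_OHLCV_ROWS_FOR_VALIDATION = 10


def sketch_ohlcv_checks(df: pd.DataFrame) -> dict:
    """Plain-pandas approximation of the check categories (hypothetical helper)."""
    missing = [c for c in REQUIRED_COLUMNS if c not in df.columns]
    if missing:
        # Without the full schema the remaining checks cannot run.
        return {"schema": False}

    prices = df[PRICE_COLUMNS]
    return {
        "schema": True,
        "no_nulls": bool(df[REQUIRED_COLUMNS].notna().all().all()),
        "prices_above_min": bool((prices > MIN_PRICE_VALUE).all().all()),
        "high_ge_low": bool((df["High"] >= df["Low"]).all()),
        "open_in_range": bool(df["Open"].between(df["Low"], df["High"]).all()),
        "close_in_range": bool(df["Close"].between(df["Low"], df["High"]).all()),
        "volume_non_negative": bool((df["Volume"] >= 0).all()),
        # Assumption: the std dev threshold is applied to Close.
        "close_not_constant": bool(df["Close"].std() > MIN_STDDEV_VALUE),
        "enough_rows": len(df) >= MIN_OHLCV_ROWS_FOR_VALIDATION,
        # Assumption: dates are stored in the index.
        "unique_dates": bool(df.index.is_unique),
    }
```

Each key maps to one category from the table, so a failing key points directly at the kind of problem in the data.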
Validation Failures
All failures raise exceptions, so no error passes silently. Airflow retries automatically with exponential backoff.
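As an illustration of the retry behaviour, the following is a minimal sketch of task-level retry settings. The keys are standard Airflow `BaseOperator` arguments, but the values are hypothetical; the pipeline's actual settings are not shown on this page.

```python
from datetime import timedelta

# Hypothetical values: the real DAG configuration is not documented here.
default_args = {
    "retries": 3,                            # attempts after the first failure
    "retry_delay": timedelta(minutes=5),     # base delay before the first retry
    "retry_exponential_backoff": True,       # grow the delay on each retry
    "max_retry_delay": timedelta(hours=1),   # cap on the grown delay
}
```

A dict like this would typically be passed as `default_args` when the DAG is defined, so every task that raises a validation exception inherits the same retry policy.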
Constants
| Constant | Value | Purpose |
|---|---|---|
| TICKER_VALIDATION_TEST_DAYS | 21 | Calendar days for ticker test |
| MIN_TRADING_DAYS_FOR_VALIDATION | 10 | Minimum trading days required |
| MIN_OHLCV_ROWS_FOR_VALIDATION | 10 | Minimum rows for validation |
| MIN_PRICE_VALUE | 0.01 | Minimum valid price |
| MIN_STDDEV_VALUE | 0.01 | Minimum standard deviation |
Ticker Validation
Each ticker is validated by downloading 21 calendar days of data (about 15 trading days). A valid ticker must return at least 10 trading days. This catches delisted stocks, bad symbols, and API issues before bulk downloading.
Why 21 Days?
21 calendar days is about 15 trading days. The 10-day threshold accounts for weekends, holidays, and newly listed stocks.
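The arithmetic can be checked with a short sketch. The helper `count_weekdays` is hypothetical and counts Monday to Friday only, so it ignores market holidays; those are part of what the 10-day threshold leaves room for.

```python
from datetime import date, timedelta

TICKER_VALIDATION_TEST_DAYS = 21
MIN_TRADING_DAYS_FOR_VALIDATION = 10


def count_weekdays(end: date, calendar_days: int) -> int:
    """Count Monday-Friday dates in the `calendar_days` days ending at `end`."""
    days = (end - timedelta(days=offset) for offset in range(calendar_days))
    return sum(1 for d in days if d.weekday() < 5)


weekdays = count_weekdays(date(2024, 3, 15), TICKER_VALIDATION_TEST_DAYS)
print(weekdays)  # 15: any 21-day window spans exactly three full weeks
print(weekdays - MIN_TRADING_DAYS_FOR_VALIDATION)  # 5 days of slack
```

Because 21 days always contain three of each weekday, the weekday count is 15 regardless of the end date; only holidays and recent listings reduce what yfinance returns.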
API Reference
Validation for ticker symbols and OHLCV data quality:
- Ticker validation with test downloads
- OHLCV validation with Great Expectations
Source code in data_pipeline/sec_data_pipeline/yfinance/yfinance_validation.py
validate_ohlcv(df, ticker)
Runs 18 Great Expectations checks on OHLCV data to catch API glitches, incomplete data, and bad values before they hit the database.
Checks cover schema completeness, null values, price logic (High >= Low, Open/Close within bounds), data quality (stddev > 0.01, min 10 rows, unique dates), and reasonable values (prices > $0.01, volume >= 0).
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| df | DataFrame | DataFrame with Open, High, Low, Close, Volume columns | required |
| ticker | str | Symbol for error messages | required |
Returns:
| Type | Description |
|---|---|
| Dict[str, Any] | Dict with validation results: valid (bool): True if all checks passed; total_checks (int): Should be 18; passed (int): Checks that passed; failed (int): Checks that failed; failed_checks (list): Names of failed expectations |
Raises:
| Type | Description |
|---|---|
| Exception | If Great Expectations framework fails |
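A caller might turn the returned dict into an exception so that a failed validation stops the task. The sketch below assumes only the documented keys; `OHLCVValidationError` and `raise_if_invalid` are hypothetical names, not part of the class.

```python
from typing import Any, Dict


class OHLCVValidationError(Exception):
    """Hypothetical error type for data that fails validation."""


def raise_if_invalid(result: Dict[str, Any], ticker: str) -> None:
    """Raise when the validation result reports any failed check."""
    if not result["valid"]:
        failed = ", ".join(result["failed_checks"])
        raise OHLCVValidationError(
            f"{ticker}: {result['failed']}/{result['total_checks']} "
            f"checks failed: {failed}"
        )
```

In use, this would wrap the call, for example `raise_if_invalid(validator.validate_ohlcv(df, "AAPL"), "AAPL")`, where `validator` stands for an instance of the validation class.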
validate_ticker(ticker, test_days=21)
Validates a ticker by test-downloading recent data and checking if yfinance returns enough trading days. Catches delisted stocks, bad symbols, and API issues before bulk downloading.
Downloads 21 calendar days (about 15 trading days) and checks that at least 10 trading days came back. The threshold accounts for weekends, holidays, and newly listed stocks.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| ticker | str | Symbol to validate | required |
| test_days | int | Calendar days to test (default 21) | 21 |
Returns:
| Type | Description |
|---|---|
| bool | True if yfinance returned >=10 trading days, False otherwise |
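The row-count test described above can be sketched as follows. This is not the actual implementation: `sketch_validate_ticker` is a hypothetical name, and the `download` callable is an assumption made so the logic can run without network access. In practice it could wrap a call such as `yfinance.download(ticker, start=start, end=end, progress=False)`.

```python
from datetime import date, timedelta
from typing import Callable, Optional

import pandas as pd

MIN_TRADING_DAYS_FOR_VALIDATION = 10

# Hypothetical signature: (ticker, start, end) -> DataFrame of daily rows.
Downloader = Callable[[str, date, date], pd.DataFrame]


def sketch_validate_ticker(
    ticker: str,
    download: Downloader,
    test_days: int = 21,
    today: Optional[date] = None,
) -> bool:
    """Return True if the test download yields enough trading days."""
    end = today or date.today()
    start = end - timedelta(days=test_days)
    df = download(ticker, start, end)
    # One row per trading day, so the row count is the trading-day count.
    return df is not None and len(df) >= MIN_TRADING_DAYS_FOR_VALIDATION
```

Injecting the downloader keeps the threshold logic testable with canned DataFrames, which is useful because delisted or mistyped symbols typically come back as empty or very short frames.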