Autoza UK Used Car Index — Research Methodology
The Autoza UK Used Car Index is a quarterly market report derived from verified-dealer listings on autoza.co.uk. This page documents the full data-collection process, outlier-filtering rules, primary source cross-references, and the open-publication pipeline that deposits each dataset simultaneously on autoza.co.uk, Hugging Face, and Zenodo under CC BY 4.0.
Why we publish this
Autoza publishes methodology documentation for three reasons. First, it allows journalists and researchers to assess whether our data is fit for the specific purpose they have in mind — we want citations that are accurate, not just convenient. Second, it enables independent reproduction: anyone with access to the same primary sources should be able to cross-check our headline figures. Third, it is a condition of our CC BY 4.0 licence: re-users who build on our data must be able to document their sources, which requires us to document ours.
A figure published without a documented methodology is an opinion. A figure with a documented methodology and traceable sources is a fact. We intend our data to be used as facts.
Data pipeline — step by step
Listings snapshot
On the first working day after each quarter ends, we extract a point-in-time snapshot of all active verified-dealer listings on autoza.co.uk. Only listings with a complete record — make, model, year, mileage, city/region, fuel type, body type, asking price, and dealer Trust Score ≥ 50 — are included. Private seller listings are excluded from index figures but published separately in the supplementary CSV.
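As an illustration of the inclusion rule above, the filter can be sketched in Python. The field names (`seller_type`, `trust_score`, and so on) and the dictionary record layout are assumptions made for this example, not the schema Autoza actually uses.

```python
REQUIRED_FIELDS = (
    "make", "model", "year", "mileage", "region",
    "fuel_type", "body_type", "asking_price",
)
MIN_TRUST_SCORE = 50  # threshold stated in the methodology


def qualifies_for_index(listing: dict) -> bool:
    """Return True if a listing meets the stated inclusion rule."""
    # Private sellers are published separately, never in index figures.
    if listing.get("seller_type") != "dealer":
        return False
    # Every required field must be present and non-empty (0 is a valid value).
    if any(listing.get(field) in (None, "") for field in REQUIRED_FIELDS):
        return False
    score = listing.get("trust_score")
    return score is not None and score >= MIN_TRUST_SCORE
```

A snapshot would then be `[l for l in listings if qualifies_for_index(l)]`; listings that fail only the `seller_type` check would be routed to the supplementary file instead.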
Outlier removal
We apply a 3-sigma rule to price-versus-mileage clusters for each make/model/year combination. Listings more than 3 standard deviations from the cluster centroid are flagged and excluded from median and mean calculations. For low-volume models (fewer than 10 listings in a cluster), we fall back to a national UK used-car price guide (such as the Auto Trader Retail Price Index) as an external reference to decide whether extreme prices are genuine or data errors.
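The rule above can be sketched as follows. This is a deliberately simplified version: it measures distance on asking price alone within each make/model/year group, whereas the methodology describes price-versus-mileage clusters. The field names and the three-way return value are assumptions made for illustration.

```python
from collections import defaultdict
from statistics import mean, stdev

MIN_CLUSTER_SIZE = 10  # smaller clusters go to an external price-guide check
SIGMA_LIMIT = 3.0


def flag_price_outliers(listings):
    """Split listings into (kept, flagged, needs_external_check)."""
    clusters = defaultdict(list)
    for item in listings:
        clusters[(item["make"], item["model"], item["year"])].append(item)

    kept, flagged, needs_check = [], [], []
    for members in clusters.values():
        if len(members) < MIN_CLUSTER_SIZE:
            needs_check.extend(members)
            continue
        prices = [m["asking_price"] for m in members]
        centre, spread = mean(prices), stdev(prices)
        for m in members:
            # spread == 0 means all prices are identical, so nothing is extreme.
            z = abs(m["asking_price"] - centre) / spread if spread else 0.0
            (flagged if z > SIGMA_LIMIT else kept).append(m)
    return kept, flagged, needs_check
```

A side note on the fewer-than-10 threshold: with the sample standard deviation, no value in a group of n can sit more than (n − 1)/√n standard deviations from the mean, which is below 3 for n ≤ 10, so a 3-sigma test cannot flag anything in clusters that small.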
Segment classification
Each make/model combination is assigned to one of eight segments: Compact Hatch, Family Saloon/Estate, Compact SUV, Large SUV, Premium Executive, City Car, MPV/People Carrier, and Light Commercial. Segment definitions follow the SMMT/JATO taxonomy used in the SMMT's monthly registration reports, so Autoza segment data can be cross-referenced directly with SMMT figures without re-mapping.
Primary source cross-reference
Segment-share and registration figures are cross-checked against: (1) SMMT monthly new-car registration statistics for the same period — to validate make-level representation; (2) DVLA "Vehicle Licensing Statistics" tables — to anchor the used/new ratio by age cohort; (3) the RAC Fuel Watch monthly fuel price index — for running-cost calculations; (4) DVLA EV registration data — for EV listing validation.
Days-to-sell calculation
Days-to-sell (DTS) is the median number of calendar days between a listing's first-published date and the date it is marked Sold or Archived on autoza.co.uk. Only listings that were active for at least 48 hours are counted (to exclude duplicate or accidental posts). Listings that have not yet sold are excluded from DTS calculations — we do not impute sell time for active stock.
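A minimal sketch of the DTS rule described above. The field names `published_at` and `closed_at` are hypothetical, and counting "calendar days" as the difference between the two dates (ignoring time of day) is an assumption of this example.

```python
from datetime import datetime, timedelta
from statistics import median

MIN_ACTIVE = timedelta(hours=48)


def days_to_sell(listings):
    """Median calendar days from first publication to Sold/Archived."""
    durations = []
    for item in listings:
        closed_at = item.get("closed_at")  # None while the listing is still active
        if closed_at is None:
            continue  # active stock is excluded, not imputed
        if closed_at - item["published_at"] < MIN_ACTIVE:
            continue  # likely a duplicate or accidental post
        durations.append((closed_at.date() - item["published_at"].date()).days)
    return median(durations) if durations else None
```

Returning `None` for an empty sample, rather than zero, keeps "no qualifying sales" distinguishable from "sold the same day".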
City and regional pricing
City-level price figures are median asking prices per city/region for the snapshot period, restricted to areas with ≥ 50 qualifying listings. Cities with fewer than 50 listings are grouped into a "Remaining areas" bucket to avoid false precision on small samples. Regions follow the standard UK city/region groupings (London, Manchester, Birmingham, Leeds, Glasgow, etc.).
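The small-sample grouping above might look like the sketch below. The `city` and `asking_price` field names are assumptions, as is the choice to report a single pooled median for the "Remaining areas" bucket; the text does not say which statistic is published for that bucket.

```python
from collections import defaultdict
from statistics import median

REMAINING = "Remaining areas"


def city_medians(listings, min_listings=50):
    """Median asking price per city, pooling small cities into one bucket."""
    by_city = defaultdict(list)
    for item in listings:
        by_city[item["city"]].append(item["asking_price"])

    result, pooled = {}, []
    for city, prices in by_city.items():
        if len(prices) >= min_listings:
            result[city] = median(prices)
        else:
            pooled.extend(prices)  # too few listings for a standalone figure
    if pooled:
        result[REMAINING] = median(pooled)
    return result
```

The threshold is a parameter only so the function is easy to try on a handful of rows; the methodology fixes it at 50.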
Headline findings review
All headline findings (the statistics featured prominently in the report) are reviewed by the Autoza Research desk before publication. Findings must be reproducible from the published CSV — every headline number must trace to a specific column and filter in the raw data. If a finding cannot be independently reproduced from the dataset, it is flagged and either corrected or removed.
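To make the "column and filter" requirement concrete, here is one way a reader could recompute a headline median from a published CSV. The column names, the filter format, and the tolerance are hypothetical; the sketch shows the shape of the check, not the Research desk's own tooling.

```python
import csv
import io
from statistics import median


def reproduce_median(csv_text, column, filters):
    """Recompute a headline median from CSV rows matching `filters`."""
    rows = csv.DictReader(io.StringIO(csv_text))
    values = [
        float(row[column])
        for row in rows
        if all(row[key] == wanted for key, wanted in filters.items())
    ]
    return median(values) if values else None


def is_reproducible(published, recomputed, tolerance=0.5):
    """A headline passes only if the recomputed figure matches within tolerance."""
    return recomputed is not None and abs(published - recomputed) <= tolerance
```

For example, a claim such as "median diesel asking price" would be checked with `reproduce_median(text, "asking_price", {"fuel_type": "diesel"})` and then compared against the published figure.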
Open publication
The finalised dataset is published in three places simultaneously: (1) autoza.co.uk/research/{slug} — the canonical HTML report; (2) Hugging Face Hub — datasets/Autoza/irish-used-car-index-q{N}-{YYYY} — six CSV files with a full Croissant/dataset card; (3) Zenodo (CERN) — permanent DOI assigned, BibTeX citation generated, CC BY 4.0 licence attached. Autoza, Hugging Face, and Zenodo records are cross-linked via sameAs in each dataset's structured data.
Primary data sources
The following sources are used in computing the quarterly index. All are publicly accessible. Links are provided to the specific data series used for cross-referencing.
| Source | Role |
|---|---|
| Autoza listings database | Primary source for all price, mileage, fuel type, and days-to-sell figures |
| SMMT monthly statistics | Cross-reference for make/model market share and new-car registration context |
| DVLA — Vehicle Licensing Statistics | New-vehicle registration totals; used as denominator for used/new ratio |
| RAC Fuel Watch fuel price index | Unleaded petrol and diesel pump prices used in running-cost estimates |
| DVLA EV registration data | Cross-reference for EV listing validation and ultra-low-emission model list |
| DVSA Recall Portal | Vehicle safety recall data cited in model-fault guides — not used for index pricing |
Segment definitions
Our eight segments mirror the SMMT/JATO taxonomy, allowing direct comparison with official UK registration data. A model is assigned to exactly one segment based on its manufacturer's primary body/class designation, not trim level.
Open publication — data lineage
Each quarterly dataset is published on three platforms simultaneously. The three records are cross-linked via sameAs in our structured data so that AI search engines, academic indexers, and data catalogues can resolve them as the same canonical dataset.
Hugging Face Hub
- data/by_segment_year/{YYYY-MM,latest}.parquet: granular config (make, model, 3-year band, fuel)
- data/by_segment/{YYYY-MM,latest}.parquet: broad config (make, model, fuel) with year_band="ALL"
- README.md, METHODOLOGY.md, SCHEMA.md, LICENSE_AND_ATTRIBUTION.md: primary docs
- FAQ.md, GLOSSARY.md, FOR-RESEARCHERS.md: secondary citable artefacts
- TOP-INSIGHTS-YYYY-MM.md, MAKE-COVERAGE-YYYY-MM.md: per-release headline figures
- accuracy_report_YYYY-MM.json: per-release accuracy artefact (snapshot timestamp, suppression counts, OMSP test results, top makes)
- croissant.json: mlcommons/croissant 1.0 metadata for AI-engine ingestion
- CITATION.cff: Citation File Format 1.2.0 for academic-citation-extractor tools
- errata.md, CHANGELOG.md: transparency artefacts
- LATEST-ANALYSIS.md: pointer file refreshed weekly (Mondays 09:00 UTC) with the latest autoza.co.uk blog analysis
Limitations and scope
- Autoza figures are asking prices, not transaction prices. Actual sales typically settle 3–7% below asking on dealer sales and 2–4% below on private sales.
- Private seller listings are excluded from headline index figures (included only in supplementary CSV). Private prices are structurally different from dealer prices and mixing them distorts the market signal.
- Autoza data covers Great Britain and Northern Ireland (the UK) only; no Republic of Ireland data is included.
- Very low-volume makes (fewer than 10 listings in a quarter) may not be representative. Per-make figures for low-volume makes should be treated with caution.
- Days-to-sell only reflects listings that sold through autoza.co.uk. Cars sold via phone call before the listing was formally updated as Sold are not captured in this metric.
- This is not a Government or Office for National Statistics dataset. It is proprietary commercial data from a private marketplace. We have no visibility of listings on other platforms.
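As a worked example of the first limitation above, the stated discount ranges can be turned into an indicative sale-price band for a given asking price. The function name and the `seller_type` labels are hypothetical, and because the percentages are described as typical, the result is a rough band rather than a bound.

```python
def estimated_sale_range(asking_price, seller_type):
    """Return (low, high) implied sale price in pounds for an asking price."""
    # Typical discounts below asking, in percent, as stated in the limitations.
    discounts = {"dealer": (3, 7), "private": (2, 4)}
    smallest, largest = discounts[seller_type]
    low = asking_price * (100 - largest) / 100
    high = asking_price * (100 - smallest) / 100
    return low, high


print(estimated_sale_range(10_000, "dealer"))   # (9300.0, 9700.0)
print(estimated_sale_range(10_000, "private"))  # (9600.0, 9800.0)
```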
Corrections policy
If a material error is identified in a published index — a figure that is wrong, an outlier that should have been filtered but wasn't, or a primary source that has revised its figures — we will issue a correction notice on this page and on the affected report page. Correction notices state what was wrong, what the correct figure is, and when the correction was made. The Zenodo and Hugging Face records are versioned — corrected datasets are published as new versions with a clear changelog, and the original erroneous version is retained for transparency.
To report a potential error: info@autoza.co.uk. We aim to investigate and respond within 5 working days.
Access the data
The Q2 2026 UK Used Car Index dataset is free to download and cite. Published under CC BY 4.0 — just attribute "Autoza UK" with a link.
Press queries and data licensing enquiries: info@autoza.co.uk. Our founder is available for interview on UK motor-industry data topics.