If you identify any API bugs or errors in the data please record them here.
I can find all the announced drivers for the upcoming season except for the recently announced Merhi. Also, are you planning to treat Marussia the same as last season or create a new Manor Marussia team?
Thanks for the good work – using it to run a Fantasy F1 league for some friends.
I’ve added Merhi (driverId: merhi) and Manor Marussia (constructorId: manor)
Two minor bugs in the results table:
– Manny Ayulo’s shared drive to 3rd place in the 1951 Indy 500 (resultId 20185) has NULL as position, instead of 3
– Ralf Schumacher finished 8th in the 2005 San Marino GP (resultId 1207), but has been demoted to 9th post-race – yet position value for that entry is NULL and positionText is “8”
Also, the Heidfeld grid position from Australia 2000, mentioned here: http://ergast.com/mrd/bugs/comment-page-2#comment-12125 is still incorrect in the database.
Thanks Emkael – All now corrected.
There is a mistake in this qualifying: http://ergast.com/api/f1/2015/4/qualifying
This is not Jos Verstappen in P15 but his son, Max.
Thanks Brieuc – now corrected.
In the 2013 season results table, Mark Webber does not have the “permanentNumber” field.
For example, here: http://ergast.com/api/f1/2013/1/results.json
Permanent numbers were only introduced this year, and don’t apply to past drivers – only current ones.
For the 2015 Bahrain GP, Romain Grosjean has no “Time” field after the “Status:Finished” field in the race results table.
Hi Fabrizio – Thanks for the heads-up. Time now added.
I’ve taken a shot at cleaning up the positionText column of the results table.
Albers’ position in China 2006 (resultId = 1087) has some garbage after “15” (probably leftovers from a footnote).
Apart from that, I’ve checked the non-numeric values for that field and assumed that their strict meaning is as follows (correct me if I’m wrong, /methods/results page does not elaborate on these symbols):
– ‘E’ denotes entries excluded before the grid is formed (in qualifying or in practice)
– ‘D’ denotes entries disqualified after the race has started
– ‘F’ denotes entries which failed to (pre)qualify
– ‘W’ denotes non-starters (entries which qualified but did not take the start – or successful restart in case of a lap 1 red flag)
– ‘R’ denotes retirements (entries which failed to run to the finish and were not classified)
– ‘N’ denotes non-classified entries (which finished the race but failed to complete required number of laps)
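The reading of the codes proposed above can be written down as a small lookup. This is only a sketch of the meanings assumed in this comment, not an official definition from the API documentation; the names `POSITION_TEXT_CODES` and `describe_position_text` are made up for illustration.

```python
# Assumed meanings of the non-numeric positionText codes, as proposed in the
# comment above (not an official definition).
POSITION_TEXT_CODES = {
    "E": "excluded before the grid was formed",
    "D": "disqualified after the race started",
    "F": "failed to (pre)qualify",
    "W": "non-starter (qualified but did not take the start)",
    "R": "retired (not classified)",
    "N": "not classified (finished, but too few laps)",
}


def describe_position_text(position_text):
    """Return a human-readable reading of a positionText value."""
    value = position_text.strip()
    if value.isdecimal():
        return f"classified in position {int(value)}"
    if value in POSITION_TEXT_CODES:
        return POSITION_TEXT_CODES[value]
    # Anything else (e.g. a number followed by footnote debris) needs a manual look.
    return "unrecognised value"
```

A value such as "15" followed by stray characters falls through to "unrecognised value", which is one way to surface entries like the Albers one mentioned earlier.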
By these conditions, there are several adjustments to the positionText values, mostly detected when comparing positionText with status.status:
Larini was excluded from qualifying in San Marino 1988 (resultId 8551), so should be ‘E’ (instead of ‘D’).
Numerous entries are marked as retirements (‘R’) but were in fact DSQ (and have a correct ‘Disqualified’ status). These are: Bonetto in Germany 1952 (resultId 19792), Magill in 1958 Indy 500 (18559), Winkelhock in Netherlands 1983 (10982), De Cesaris in Spain 1993 (5704), Bellof in Dallas and South Africa 1984 (10463, 10267) and Brundle in South Africa 1984 (10265). The last three were retirements, later revised to disqualifications after Tyrrell were stripped of all their 1984 results.
Four retirements are marked as non-classifiers: Tuero in Luxembourg 1998 (15822), Beuttler in Italy 1971 (3977), both with engine failures, and Ickx in USA 1971 (15878), Pescarolo in France 1971 (15732), alternator and gearbox failures, respectively.
Two qualifying accidents for qualified drivers (so, according to the code, should be ‘W’ – non-starters) are marked as non-classified: Donohue and Henton in Austria 1975 (14469, 14470).
Two qualifying accidents for non-qualified drivers (either due to 107% rule or due to grid too small, so these should be ‘F’ – non-qualifiers) are marked as non-starters/withdrawals: Fisichella in France 2002 (2435) and Montermini in Spain 1994 (4512).
Several non-classified finishes are marked as retirements: Kelly in GB 1951 (19955), Hawthorn in Italy 1952 (19834), Van der Lof in Netherlands 1952 (19809), Beltoise in Belgium 1973 (15059), Jarier in Spain 1974 (14616), Brundle in Australia 1985 (10213), Gugelmin in France 1989 (8119), Alliot in Mexico 1989 (8008) and Dalmas in Italy 1990 (7408).
Also, the two BAR entries in the 2005 Australian GP (1152, 1149) are marked as retirements, while in fact both cars were pulled into the pits on the last lap, so both Button and Sato were classified in the race (11th for entry 1149 and 14th for 1152).
I know that’s a lot of changes, so here’s the SQL dump for these entries, so you can verify and apply them more easily: https://gist.github.com/emkael/0c56b135aeb1a86086f0
There’s probably more work to do on the status.status side of similar issues, but I don’t have any idea how to tackle these at the moment (and status.status values are less likely to be aggregated to produce some meaningful stats than positionText values).
First of all great work and thanks for making this data available to the public!
I’ve found some inconsistencies in the 1950 – 2015 Formula One Database Image. Table `results` has some typos and invalid time and milliseconds values:
wrong time data for resultId 15520 time “29:17.3” => “1:29:16.660” and milliseconds = 5356660
wrong milliseconds for resultId 20291 milliseconds 11197008 => 11197800
typo for resultId 4721 time “+1:38:34.154” => “1:38:34.154”
typo for resultId 5387 time “1.48:00.185” => “1:48:00.185” and milliseconds = 6480185
typo for resultId 13339 time “1:42:.52.22” => “1:42:52.220” and milliseconds = 6172220
typo for resultId 20539 time “1:24.38.200” => “1:24:38.200” and milliseconds = 5078200
typo for resultId 20563 time “1:27.38.684” => “1:27:38.864” and milliseconds = 5258864
typo for resultId 20611 time “1:29.04.268” => “1:29:04.268” and milliseconds = 5344268
typo for resultId 21888 time “1:41.14.711” => “1:41:14.711” and milliseconds = 6074711
Queries to fix:
UPDATE results SET time = '1:29:16.660', milliseconds = 5356660 WHERE resultId = 15520;
UPDATE results SET milliseconds = 11197800 WHERE resultId = 20291;
UPDATE results SET time = '1:38:34.154' WHERE resultId = 4721;
UPDATE results SET time = '1:48:00.185', milliseconds = 6480185 WHERE resultId = 5387;
UPDATE results SET time = '1:42:52.220', milliseconds = 6172220 WHERE resultId = 13339;
UPDATE results SET time = '1:24:38.200', milliseconds = 5078200 WHERE resultId = 20539;
UPDATE results SET time = '1:27:38.864', milliseconds = 5258864 WHERE resultId = 20563;
UPDATE results SET time = '1:29:04.268', milliseconds = 5344268 WHERE resultId = 20611;
UPDATE results SET time = '1:41:14.711', milliseconds = 6074711 WHERE resultId = 21888;
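As a cross-check, the time and milliseconds values proposed in the queries above can be compared with each other. The sketch below assumes the overall time is always written as H:MM:SS.mmm; the helper name `race_time_to_ms` is hypothetical.

```python
import re

# Matches an overall race time such as "1:29:16.660" (H:MM:SS.mmm).
RACE_TIME = re.compile(r"(\d+):([0-5]\d):([0-5]\d)\.(\d{3})")


def race_time_to_ms(time_text):
    """Convert 'H:MM:SS.mmm' to milliseconds, or return None if malformed."""
    match = RACE_TIME.fullmatch(time_text)
    if match is None:
        return None
    hours, minutes, seconds, millis = (int(part) for part in match.groups())
    return ((hours * 60 + minutes) * 60 + seconds) * 1000 + millis


# (time, milliseconds) pairs copied from the UPDATE statements above,
# keyed by resultId.
PROPOSED = {
    15520: ("1:29:16.660", 5356660),
    5387: ("1:48:00.185", 6480185),
    13339: ("1:42:52.220", 6172220),
    20539: ("1:24:38.200", 5078200),
    20563: ("1:27:38.864", 5258864),
    20611: ("1:29:04.268", 5344268),
    21888: ("1:41:14.711", 6074711),
}

inconsistent = [
    result_id
    for result_id, (time_text, millis) in PROPOSED.items()
    if race_time_to_ms(time_text) != millis
]
print(inconsistent)  # an empty list means every pair agrees
```

The same function returns None for the mistyped originals (for example a "." where a ":" belongs), so it can also be used to list malformed values.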
Query to show all invalid time and/or milliseconds values (95 in dump 27/09/2015):
SELECT results.resultId, results.raceId, results.time, results.milliseconds, results.laps,
(SELECT SEC_TO_TIME(SUM(milliseconds)/1000) FROM lapTimes WHERE raceId=results.raceId AND driverId=results.driverId AND lap<=results.laps) AS lapTimes_time,
(SELECT SUM(milliseconds) FROM lapTimes WHERE raceId=results.raceId AND driverId=results.driverId AND lap<=results.laps) AS lapTimes_milliseconds,
(SELECT COUNT(*) FROM lapTimes WHERE raceId=results.raceId AND driverId=results.driverId) AS lapTimes_laps
FROM results
WHERE milliseconds IS NOT NULL AND results.milliseconds != (SELECT SUM(milliseconds) FROM lapTimes WHERE raceId=results.raceId AND driverId=results.driverId);
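For anyone working from an export rather than a live MySQL instance, the same comparison can be sketched in plain Python. The tuple layouts below are assumptions chosen for the example, and `find_time_mismatches` is a hypothetical helper name.

```python
from collections import defaultdict


def find_time_mismatches(results, lap_times):
    """Return resultIds whose milliseconds differ from the sum of their laps.

    results:   iterable of (resultId, raceId, driverId, milliseconds)
    lap_times: iterable of (raceId, driverId, lap, milliseconds)
    """
    lap_totals = defaultdict(int)
    for race_id, driver_id, _lap, millis in lap_times:
        lap_totals[(race_id, driver_id)] += millis

    mismatches = []
    for result_id, race_id, driver_id, millis in results:
        key = (race_id, driver_id)
        # Skip results with no overall time or no lap data, mirroring the SQL:
        # a NULL on either side never satisfies "!=".
        if millis is None or key not in lap_totals:
            continue
        if millis != lap_totals[key]:
            mismatches.append(result_id)
    return mismatches
```

Like the query, this only reports rows where both an overall time and lap data exist, so races without lap times are left out rather than flagged.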
Query to show all finished drivers with a time in milliseconds smaller than the race winner:
-- This is a slow query; add an index to speed things up: ALTER TABLE results ADD KEY raceId(raceId);
SELECT resultId, time, milliseconds
FROM results r
WHERE statusId = 1 AND milliseconds IS NOT NULL AND milliseconds < […]
[…]
wrong q1 data for qualifyId 409 q1 = […] => “1:36.827”
wrong q1 data for qualifyId 500 q1 = “1:17.806*” => “1:17.806”
wrong q1 data for qualifyId 1633 q1 = “Â” => NULL
Queries to fix:
UPDATE qualifying SET q1 = NULL WHERE q1 = '';
UPDATE qualifying SET q2 = NULL WHERE q2 = '';
UPDATE qualifying SET q3 = NULL WHERE q3 = '';
UPDATE qualifying SET q1 = '1:36.827' WHERE qualifyId = 409;
UPDATE qualifying SET q1 = '1:17.806' WHERE qualifyId = 500;
UPDATE qualifying SET q1 = NULL WHERE qualifyId = 1633;
Query to show all typos in q1, q2 and q3:
SELECT * FROM qualifying WHERE
(q1 is not null AND (ROUND(LENGTH(q1)-LENGTH(REPLACE(q1,":",""))/1)!=1 OR ROUND(LENGTH(q1)-LENGTH(REPLACE(q1,".",""))/1)!=1)) OR
(q2 is not null AND (ROUND(LENGTH(q2)-LENGTH(REPLACE(q2,":",""))/1)!=1 OR ROUND(LENGTH(q2)-LENGTH(REPLACE(q2,".",""))/1)!=1)) OR
(q3 is not null AND (ROUND(LENGTH(q3)-LENGTH(REPLACE(q3,":",""))/1)!=1 OR ROUND(LENGTH(q3)-LENGTH(REPLACE(q3,".",""))/1)!=1));
Queries to fix:
UPDATE qualifying SET q1 = CONCAT(SUBSTRING(q1,1,4),'.',SUBSTRING(q1,6)) WHERE ROUND(LENGTH(q1) - LENGTH(REPLACE(q1,":",""))/1) = 2;
UPDATE qualifying SET q1 = CONCAT(SUBSTRING(q1,1,1),':',SUBSTRING(q1,3)) WHERE ROUND(LENGTH(q1) - LENGTH(REPLACE(q1,".",""))/1) = 2;
UPDATE qualifying SET q2 = CONCAT(SUBSTRING(q2,1,4),'.',SUBSTRING(q2,6)) WHERE ROUND(LENGTH(q2) - LENGTH(REPLACE(q2,":",""))/1) = 2;
UPDATE qualifying SET q2 = CONCAT(SUBSTRING(q2,1,1),':',SUBSTRING(q2,3)) WHERE ROUND(LENGTH(q2) - LENGTH(REPLACE(q2,".",""))/1) = 2;
UPDATE qualifying SET q3 = CONCAT(SUBSTRING(q3,1,4),'.',SUBSTRING(q3,6)) WHERE ROUND(LENGTH(q3) - LENGTH(REPLACE(q3,":",""))/1) = 2;
UPDATE qualifying SET q3 = CONCAT(SUBSTRING(q3,1,1),':',SUBSTRING(q3,3)) WHERE ROUND(LENGTH(q3) - LENGTH(REPLACE(q3,".",""))/1) = 2;
UPDATE qualifying SET q2 = CONCAT(SUBSTRING(q2,1,1),':',SUBSTRING(q2,3,2),'.',SUBSTRING(q2,6)) WHERE SUBSTRING(q2,2,1) = '.';
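The separator fixes above can also be expressed as one normalising function, which may be easier to review than the position-based SUBSTRING calls. This is a sketch under the assumption that every valid qualifying time has the shape M:SS.mmm; `normalise_quali_time` is a hypothetical name.

```python
import re

# One minute digit, two second digits, three millisecond digits. Either
# separator may have been mistyped as ":" or ".", and a trailing "*" is
# treated as footnote debris.
QUALI_TIME = re.compile(r"(\d)[:.](\d{2})[:.](\d{3})\*?")


def normalise_quali_time(raw):
    """Return the time as 'M:SS.mmm', or None when nothing usable is present."""
    if raw is None:
        return None
    match = QUALI_TIME.fullmatch(raw.strip())
    if match is None:
        return None
    minutes, seconds, millis = match.groups()
    return f"{minutes}:{seconds}.{millis}"
```

Values that come back as None are either empty or do not fit the assumed shape; the latter are probably better reviewed by hand than set to NULL automatically.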
These 95 values from your check query are caused by three different reasons:
1. Most of them are caused by “incorrect” race winner overall time.
Data in results table comes from the same source as Wikipedia (or Wikipedia itself). Meanwhile, lap time data comes from the same source as FORIX (or FORIX itself).
Overall times for race winners differ between the sources – if you set the times in results table to FORIX times (and recalculate milliseconds for entries that finished on the lead lap as these values are most likely derived from offsets and milliseconds value for the winner), aggregated lap times check out.
2. There are some typos in results table for offsets (values in Ergast differ from both Wikipedia values and FORIX values). These are trivial to fix.
3. Post-race time penalties, which contribute to results time, but obviously don’t appear in lap times. These are fine.
But, most important of all, the way you propose to fix some of the times in results table (the times that were not properly formatted) does not maintain milliseconds values for races in question – as these values are derived from race winner milliseconds values (which you correct) and from time offsets for each entry.
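To make the dependency concrete: if a lead-lap finisher's milliseconds value is the winner's milliseconds plus that finisher's time offset, then correcting the winner's time means recomputing every such row. The sketch below illustrates that arithmetic; the offset format (+SS.mmm or +M:SS.mmm), the helper names and the sample numbers are assumptions for illustration, not values from the database.

```python
import re

# Offsets are assumed to look like "+5.478" or "+1:05.123" (optional minutes).
OFFSET = re.compile(r"\+(?:(\d+):)?(\d{1,2})\.(\d{1,3})")


def offset_to_ms(offset_text):
    """Convert a '+[M:]SS.mmm' gap to milliseconds, or None if not a time gap."""
    match = OFFSET.fullmatch(offset_text.strip())
    if match is None:
        return None  # e.g. "+1 Lap" carries no time information
    minutes, seconds, fraction = match.groups()
    millis = int(fraction.ljust(3, "0"))  # ".5" means 500 ms, not 5 ms
    return (int(minutes or 0) * 60 + int(seconds)) * 1000 + millis


def finisher_ms(winner_ms, offset_text):
    """Derive a finisher's milliseconds from the winner's value and the gap."""
    gap = offset_to_ms(offset_text)
    return None if gap is None else winner_ms + gap


# Hypothetical winner time of 1:30:00.000 and a hypothetical gap of 5.478 s.
print(finisher_ms(5400000, "+5.478"))
```

Changing the winner's value without rerunning this step for the rest of the lead lap leaves those rows out of sync, which is the problem described above.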
I’m attaching queries which fix these issues in the races you’ve spotted and fix overall times for entries mentioned above in points 1-2: https://gist.github.com/emkael/72ef27cd5729494ab3bf
These must be applied on the original data, due to me being lazy and using relative corrections for milliseconds values.
PS I wonder if we’re giving the maintainer a headache now.
Many thanks for the feedback. Struggling with work-life balance at the moment but I’ll try to make these updates when I get some peace and quiet.
First of all, thank you very much for maintaining this data. What a great resource! I have found my first discrepancy and want to check to see if it’s incorrect data or something I’m overlooking.
Jackie Stewart shows “wins:5” on the “driverStandings” endpoint, but 4 total 1st place finishes when hitting the “results” endpoint.
http://ergast.com/api/f1/1973/drivers/stewart/driverStandings //shows 5 wins
http://ergast.com/api/f1/1973/drivers/stewart/results // records a total of 4 wins
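A comparison like this can be scripted once both responses have been fetched as JSON. The sketch below works on already-parsed payloads and assumes the JSON layout is MRData → RaceTable → Races → Results for results and MRData → StandingsTable → StandingsLists → DriverStandings for standings; treat those paths and the helper names as assumptions to check against a real response.

```python
def wins_from_results(payload):
    """Count results in a results payload where the position is "1"."""
    races = payload["MRData"]["RaceTable"]["Races"]
    return sum(
        1
        for race in races
        for result in race["Results"]
        if result["position"] == "1"
    )


def wins_from_standings(payload):
    """Read the wins value from the last standings list in the payload."""
    lists = payload["MRData"]["StandingsTable"]["StandingsLists"]
    return int(lists[-1]["DriverStandings"][0]["wins"])


def wins_disagree(results_payload, standings_payload):
    """True when the two endpoints report a different number of wins."""
    return wins_from_results(results_payload) != wins_from_standings(standings_payload)
```

Running `wins_disagree` over every driver and season would list all cases like the one reported here, rather than relying on someone spotting them one at a time.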
I’ve found an error in the data. This table (http://ergast.com/api/f1/2015/10/qualifying) contains an incorrect Q2 time for Lewis Hamilton. His time was 1:22.285, not 1:27.94.
Hey, thanks for your job mate, it’s awesome!
I was wondering if you could publish an updated database dump. I saw that you have updated the races for the 2016 season, but the 2016 season is missing from the dump.
I’ve updated the database.
Rio Haryanto and Pascal Wehrlein s are the wrong way around.
Haryanto has 94 and Wehrlein has 88, they should be the other way round. Note that the number attributes in the are however correct.
I used XML tags in my last post which deleted some wording. To clarify, Rio Haryanto and Pascal Wehrlein PermanentNumbers are the wrong way around.
Erk! Thanks for the heads-up Mike. Now fixed.
I think there is a bug for the pitstop data for the Australian Grand Prix (2016). Almost all the drivers are showing up as having had an 18 minute (!) pitstop on lap 18:
That must be the red flag, isn’t it?
From what I can tell, red flag periods are inconsistently counted as pit stops – the first red flag of the 2014 Japanese GP is present in the pit stop data, but none of the previous red flag periods is there.
Apologies, being a bit slow, that is presumably tracking the red flag period? Is that officially counted as a pitstop? Similarly, I noticed that drive through penalties are also included, is that also officially a pitstop as well? e.g.
It looks like the official records include all visits to the pits. However, I’m not keen to do any manual editing because of the potential for introducing errors.
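Since the pit stop data is left as recorded, consumers who want to exclude red-flag periods can filter on their side. The sketch below assumes each pit stop is a dict with a `duration` string in either SS.mmm or M:SS.mmm form, and the 120-second threshold is an arbitrary illustrative cut-off, not an official rule.

```python
def duration_to_seconds(duration):
    """Convert 'SS.mmm' or 'M:SS.mmm' to seconds."""
    minutes, _, seconds = duration.rpartition(":")
    return int(minutes or 0) * 60 + float(seconds)


def long_stops(stops, threshold_seconds=120.0):
    """Return the stops whose duration exceeds the threshold."""
    return [
        stop
        for stop in stops
        if duration_to_seconds(stop["duration"]) > threshold_seconds
    ]
```

Drive-through penalties would not be caught this way, because they are no longer than an ordinary stop; separating those needs information that is not in the duration alone.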
The database includes José María López in the drivers table (driverId=809).
Isn’t that just a leftover from the failed USF1 entry?
True. He doesn’t appear in any results so I guess it makes sense to remove him (and any other phantom entries)
I didn’t run into any other “phantoms”.
However, I did a double-check on the url column for the drivers table and got some corrections for pages which are ambiguous (or redirected) on Wikipedia.
I’ve dumped them to: https://gist.github.com/emkael/c03a3a5306aa1d6e8581595a3a9fb963
2003 Hungarian GP, Qualifying (qualifyId = 4019)
2003 Italian GP, Qualifying (qualifyId = 4038)
I think driverId must be 47 (Zsolt Baumgartner) instead of 42 (Antonio Pizzonia)
Thanks for the great API!
Thanks RLaci – I have a backlog of updates which I’ll run through sometime soon.
Hi, I get a “file corrupt” message when extracting the latest (today’s) ANSI DB image.
Try the latest version – I’ve rebuilt it.
Hi Chris, this one is ok now . thanks !
Hi, it seems that for most of the 2010 races the name is inconsistent with the rest of the database, e.g. “Japanese GP” (in 2010) instead of “Japanese Grand Prix” (in other years). Would it be possible to update that in the source?
Thanks for the heads-up! Now fixed.
thx chris !
There’s also a “Belgium Grand Prix” instead of a “Belgian Grand Prix” in 2005 (raceId 86).
Thanks emkael – now corrected.
when trying to unzip
f1db.sql.gz MySQL 5.1 database dump (4.5 MB)
f1db_ansi.sql.gz MySQL ANSI database dump (5.0 MB)
They worked for me using gunzip but I’ve rebuilt them just in case.
Thanks still no good for me: I get a CRC error…. downloading with chrome and using 7z… I will use wget and try with gunzip thanks
hi, a question related to the processing of “after race” standing changes such as happened in mexico yesterday.. will you process such changes later on so that they are part of the next database export ?
Hi, the ranking for the Mexican Grand Prix has changed. The new ranking is:
1 / Hamilton | 2/ Rosberg | 3 / Ricciardo | 4 / Verstappen | 5 / Vettel | 6 / Raikkonen
Can you update it ? =) thank you for your work !
Thx – all done. Glad someone is paying attention
hi chris, great, thanks
Hi there –
The fastest lap for Mexico is currently incorrect. It’s showing as Max Verstappen but should be Daniel Ricciardo
hi, i also noticed like mike that most or all fastestlaptime data in driverresults does not match the laptime data for race 966
Thanks for the warning – now fixed.
First, congratulations for the site, the databases and the API. And thank you, thank you, thank you for the amazing job gathering all the data and making it available for us to consume.
I’ve found what I think is a duplicate entry in the constructor standings table. There are two rows for constructor Id 7, race Id 75. I believe there should be only one entry per constructor per race in that table. The incorrect entry is constructorstandingsid = 24518.
I found the issue while working on some indices to speed up some of the data crunching queries I am running. I’ve put these indices in a GitHub Gist, in case you find they would help increasing the API performance.
Again, thanks for the great work!
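A duplicate of the kind described above can be found by counting (raceId, constructorId) pairs. This is a minimal sketch over rows exported from the constructor standings table; the tuple layout and the name `duplicate_standings` are assumptions for the example.

```python
from collections import Counter


def duplicate_standings(rows):
    """Return (raceId, constructorId) pairs that occur more than once.

    rows: iterable of (constructorStandingsId, raceId, constructorId)
    """
    counts = Counter(
        (race_id, constructor_id) for _standings_id, race_id, constructor_id in rows
    )
    return sorted(pair for pair, count in counts.items() if count > 1)
```

An empty result is also the precondition for adding a unique index on that pair of columns, which would stop such duplicates from reappearing.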