If you identify any API bugs or errors in the data please record them here.
I can find all the announced drivers for the upcoming season except for the recently announced Merhi? Also, are you planning to treat Marussia the same as last season or create a new Manor Marussia team?
Thanks for the good work – using it to run a Fantasy F1 league for some friends.
I’ve added Merhi (driverId: merhi) and Manor Marussia (constructorId: manor)
Two minor bugs in the results table:
– Manny Ayulo’s shared drive to 3rd place in the 1951 Indy 500 (resultId 20185) has NULL as position, instead of 3
– Ralf Schumacher finished 8th in the 2005 San Marino GP (resultId 1207), but was demoted to 9th post-race – yet the position value for that entry is NULL and positionText is “8”
Also, the Heidfeld grid position from Australia 2000, mentioned here: http://ergast.com/mrd/bugs/comment-page-2#comment-12125 is still incorrect in the database.
Thanks Emkael – All now corrected.
There is a mistake in this qualifying: http://ergast.com/api/f1/2015/4/qualifying
This is not Jos Verstappen in P15 but his son, Max.
Thanks Brieuc – now corrected.
In the 2013 season results table, Mark Webber has no “permanentNumber” field.
For example, here: http://ergast.com/api/f1/2013/1/results.json
Permanent numbers were only introduced this year, and don’t apply to past drivers – only current ones.
For the 2015 Bahrain GP, Romain Grosjean has no “Time” field after the “Status:Finished” field in the race results table.
Hi Fabrizio – Thanks for the heads-up. Time now added.
I’ve taken a shot at cleaning up the positionText column of the results table.
Albers’ position in China 2006 (resultId = 1087) has some garbage after “15” (probably leftovers from a footnote).
Apart from that, I’ve checked the non-numeric values for that field and assumed that their strict meaning is as follows (correct me if I’m wrong, /methods/results page does not elaborate on these symbols):
- ‘E’ denotes entries excluded before the grid is formed (in qualifying or in practice)
- ‘D’ denotes entries disqualified after the race has started
- ‘F’ denotes entries which failed to (pre)qualify
- ‘W’ denotes non-starters (entries which qualified but did not take the start – or successful restart in case of a lap 1 red flag)
- ‘R’ denotes retirements (entries which failed to run to the finish and were not classified)
- ‘N’ denotes non-classified entries (which finished the race but failed to complete required number of laps)
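For anyone consuming positionText in their own code, here is a minimal Python sketch of the mapping above. The meanings are the ones assumed in this comment, not an official Ergast definition, and the names `POSITION_TEXT_CODES` and `describe_position_text` are made up for illustration.

```python
# Meanings as assumed in the comment above; not an official Ergast definition.
POSITION_TEXT_CODES = {
    "E": "excluded before the grid was formed",
    "D": "disqualified after the race started",
    "F": "failed to (pre)qualify",
    "W": "qualified but did not start",
    "R": "retired, not classified",
    "N": "finished but not classified",
}


def describe_position_text(position_text: str) -> str:
    """Return a readable description of a positionText value."""
    if position_text.isdigit():
        return f"classified, position {int(position_text)}"
    # Anything that is neither a number nor a known code is reported as unknown,
    # which also catches values with trailing garbage.
    return POSITION_TEXT_CODES.get(position_text, f"unknown code {position_text!r}")
```

A value with leftover characters after the number would come back as an unknown code, so it is easy to spot.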
By these conditions, there are several adjustments to the positionText values, mostly detected when comparing positionText with status.status:
Larini was excluded from qualifying in San Marino 1988 (resultId 8551), so should be ‘E’ (instead of ‘D’).
Numerous entries are marked as retirements (‘R’) but were in fact DSQ (and have a correct ‘Disqualified’ status). These are: Bonetto in Germany 1952 (resultId 19792), Magill in 1958 Indy 500 (18559), Winkelhock in Netherlands 1983 (10982), De Cesaris in Spain 1993 (5704), Bellof in Dallas and South Africa 1984 (10463, 10267) and Brundle in South Africa 1984 (10265). The last three were retirements, later revised to disqualifications after Tyrrell were stripped of all their 1984 results.
Four retirements are marked as non-classifiers: Tuero in Luxembourg 1998 (15822), Buettler in Italy 1971 (3977), both with engine failures, and Ickx in USA 1971 (15878), Pescarolo in France 1971 (15732), alternator and gearbox failures, respectively.
Two qualifying accidents for qualified drivers (so, according to the code, should be ‘W’ – non-starters) are marked as non-classified: Donohue and Henton in Austria 1975 (14469, 14470).
Two qualifying accidents for non-qualified drivers (either due to 107% rule or due to grid too small, so these should be ‘F’ – non-qualifiers) are marked as non-starters/withdrawals: Fisichella in France 2002 (2435) and Montermini in Spain 1994 (4512).
Several non-classified finishes are marked as retirements: Kelly in GB 1951 (19955), Hawthorn in Italy 1952 (19834), Van der Lof in Netherlands 1952 (19809), Beltoise in Belgium 1973 (15059), Jarier in Spain 1974 (14616), Brundle in Australia 1985 (10213), Gugelmin in France 1989 (8119), Alliot in Mexico 1989 (8008) and Dalmas in Italy 1990 (7408).
Also, the two BAR entries in the 2005 Australian GP (1152, 1149) are marked as retirements, while in fact both cars were pulled into the pits on the last lap, so both Button and Sato were classified in the race (11th for entry 1149 and 14th for 1152).
I know that’s a lot of changes, so here’s the SQL dump for these entries, so you can verify and apply them more easily: https://gist.github.com/emkael/0c56b135aeb1a86086f0
There’s probably more work to do on the status.status side of similar issues, but I don’t have any idea how to tackle these at the moment (and status.status values are less likely to be aggregated to produce some meaningful stats than positionText values).
First of all great work and thanks for making this data available to the public!
I’ve found some inconsistencies in the 1950 – 2015 Formula One Database Image. Table `results` has some typos and invalid time and milliseconds values:
wrong time data for resultId 15520 time “29:17.3” => “1:29:16.660” and milliseconds = 5356660
wrong milliseconds for resultId 20291 milliseconds 11197008 => 11197800
typo for resultId 4721 time “+1:38:34.154” => “1:38:34.154”
typo for resultId 5387 time “1.48:00.185” => “1:48:00.185” and milliseconds = 6480185
typo for resultId 13339 time “1:42:.52.22” => “1:42:52.220” and milliseconds = 6172220
typo for resultId 20539 time “1:24.38.200” => “1:24:38.200” and milliseconds = 5078200
typo for resultId 20563 time “1:27.38.684” => “1:27:38.864” and milliseconds = 5258864
typo for resultId 20611 time “1:29.04.268” => “1:29:04.268” and milliseconds = 5344268
typo for resultId 21888 time “1:41.14.711” => “1:41:14.711” and milliseconds = 6074711
Queries to fix:
UPDATE results SET time = '1:29:16.660', milliseconds = 5356660 WHERE resultId = 15520;
UPDATE results SET milliseconds = 11197800 WHERE resultId = 20291;
UPDATE results SET time = '1:38:34.154' WHERE resultId = 4721;
UPDATE results SET time = '1:48:00.185', milliseconds = 6480185 WHERE resultId = 5387;
UPDATE results SET time = '1:42:52.220', milliseconds = 6172220 WHERE resultId = 13339;
UPDATE results SET time = '1:24:38.200', milliseconds = 5078200 WHERE resultId = 20539;
UPDATE results SET time = '1:27:38.864', milliseconds = 5258864 WHERE resultId = 20563;
UPDATE results SET time = '1:29:04.268', milliseconds = 5344268 WHERE resultId = 20611;
UPDATE results SET time = '1:41:14.711', milliseconds = 6074711 WHERE resultId = 21888;
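To double-check that each corrected time string and its milliseconds value agree, something like the following Python sketch could be used. It assumes winner times are always written as H:MM:SS.mmm with exactly three decimals; `race_time_to_ms` is a made-up helper name, not part of the database or the API.

```python
import re

# Assumed canonical format for a winner's race time: H:MM:SS.mmm
RACE_TIME = re.compile(r"(\d+):([0-5]\d):([0-5]\d)\.(\d{3})")


def race_time_to_ms(time_text: str) -> int:
    """Convert 'H:MM:SS.mmm' to milliseconds; raise ValueError if malformed."""
    match = RACE_TIME.fullmatch(time_text)
    if match is None:
        raise ValueError(f"malformed race time: {time_text!r}")
    hours, minutes, seconds, millis = (int(group) for group in match.groups())
    return ((hours * 60 + minutes) * 60 + seconds) * 1000 + millis
```

For example, `race_time_to_ms("1:29:16.660")` gives 5356660, matching the value proposed for resultId 15520, while the mistyped originals such as "1.48:00.185" raise ValueError.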
Query to show all invalid time and/or milliseconds values (95 in dump 27/09/2015):
SELECT results.resultId, results.raceid, results.time, results.milliseconds, results.laps,
(SELECT SEC_TO_TIME(SUM(milliseconds)/1000) FROM lapTimes WHERE raceId=results.raceId AND driverId=results.driverid AND lap<=results.laps) AS lapTimes_time,
(SELECT SUM(milliseconds) FROM lapTimes WHERE raceId=results.raceId AND driverId=results.driverid AND lap<=results.laps) AS lapTimes_milliseconds,
(SELECT COUNT(*) FROM lapTimes WHERE raceId=results.raceId AND driverId=results.driverid) AS lapTimes_laps
FROM results
WHERE milliseconds IS NOT NULL AND results.milliseconds != (SELECT SUM(milliseconds) FROM lapTimes WHERE raceId=results.raceId AND driverId=results.driverid);
Query to show all finished drivers with a time in milliseconds smaller than the race winner:
-- This is a slow query; add an index to speed things up: ALTER TABLE results ADD KEY raceId(raceId);
SELECT resultId, time, milliseconds
FROM results r
WHERE statusId = 1 AND milliseconds IS NOT NULL AND milliseconds < […]
wrong q1 data for qualifyId 409 q1 = […] => “1:36.827”
wrong q1 data for qualifyId 500 q1 = “1:17.806*” => “1:17.806”
wrong q1 data for qualifyId 1633 q1 = “Â” => NULL
Queries to fix:
UPDATE qualifying SET q1 = NULL WHERE q1 = '';
UPDATE qualifying SET q2 = NULL WHERE q2 = '';
UPDATE qualifying SET q3 = NULL WHERE q3 = '';
UPDATE qualifying SET q1 = '1:36.827' WHERE qualifyId = 409;
UPDATE qualifying SET q1 = '1:17.806' WHERE qualifyId = 500;
UPDATE qualifying SET q1 = NULL WHERE qualifyId = 1633;
Query to show all typos in q1, q2 and q3:
SELECT * FROM qualifying WHERE
(q1 IS NOT NULL AND (ROUND(LENGTH(q1)-LENGTH(REPLACE(q1,':',''))/1)!=1 OR ROUND(LENGTH(q1)-LENGTH(REPLACE(q1,'.',''))/1)!=1)) OR
(q2 IS NOT NULL AND (ROUND(LENGTH(q2)-LENGTH(REPLACE(q2,':',''))/1)!=1 OR ROUND(LENGTH(q2)-LENGTH(REPLACE(q2,'.',''))/1)!=1)) OR
(q3 IS NOT NULL AND (ROUND(LENGTH(q3)-LENGTH(REPLACE(q3,':',''))/1)!=1 OR ROUND(LENGTH(q3)-LENGTH(REPLACE(q3,'.',''))/1)!=1));
Queries to fix:
UPDATE qualifying SET q1 = CONCAT(SUBSTRING(q1,1,4),'.',SUBSTRING(q1,6)) WHERE ROUND(LENGTH(q1) - LENGTH(REPLACE(q1,':',''))/1) = 2;
UPDATE qualifying SET q1 = CONCAT(SUBSTRING(q1,1,1),':',SUBSTRING(q1,3)) WHERE ROUND(LENGTH(q1) - LENGTH(REPLACE(q1,'.',''))/1) = 2;
UPDATE qualifying SET q2 = CONCAT(SUBSTRING(q2,1,4),'.',SUBSTRING(q2,6)) WHERE ROUND(LENGTH(q2) - LENGTH(REPLACE(q2,':',''))/1) = 2;
UPDATE qualifying SET q2 = CONCAT(SUBSTRING(q2,1,1),':',SUBSTRING(q2,3)) WHERE ROUND(LENGTH(q2) - LENGTH(REPLACE(q2,'.',''))/1) = 2;
UPDATE qualifying SET q3 = CONCAT(SUBSTRING(q3,1,4),'.',SUBSTRING(q3,6)) WHERE ROUND(LENGTH(q3) - LENGTH(REPLACE(q3,':',''))/1) = 2;
UPDATE qualifying SET q3 = CONCAT(SUBSTRING(q3,1,1),':',SUBSTRING(q3,3)) WHERE ROUND(LENGTH(q3) - LENGTH(REPLACE(q3,'.',''))/1) = 2;
UPDATE qualifying SET q2 = CONCAT(SUBSTRING(q2,1,1),':',SUBSTRING(q2,3,2),'.',SUBSTRING(q2,6)) WHERE SUBSTRING(q2,2,1) = '.';
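The same separator repair can be sketched outside the database, for anyone cleaning a downloaded copy. This Python version is only an illustration: it assumes every valid qualifying time looks like M:SS.mmm with three decimals, and `normalize_lap_time` is a hypothetical helper, not anything Ergast provides.

```python
import re
from typing import Optional

# Assumed canonical format for q1/q2/q3: one minute digit, two second digits, three decimals.
LAP_TIME = re.compile(r"\d:[0-5]\d\.\d{3}")


def normalize_lap_time(raw: Optional[str]) -> Optional[str]:
    """Return the time as 'M:SS.mmm', None for empty input, or raise ValueError."""
    if raw is None or raw.strip() == "":
        return None  # empty strings are treated like NULL
    # Treat ':' and '.' as interchangeable, then rebuild with the right separators.
    parts = re.split(r"[:.]", raw.strip())
    if len(parts) == 3 and all(part.isdigit() for part in parts):
        candidate = f"{parts[0]}:{parts[1]}.{parts[2]}"
        if LAP_TIME.fullmatch(candidate):
            return candidate
    raise ValueError(f"cannot repair lap time: {raw!r}")
```

Values with stray characters, such as "1:17.806*", are rejected rather than silently repaired, so they can be fixed by hand.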
These 95 values from your check query are caused by three different reasons:
1. Most of them are caused by “incorrect” race winner overall time.
Data in results table comes from the same source as Wikipedia (or Wikipedia itself). Meanwhile, lap time data comes from the same source as FORIX (or FORIX itself).
Overall times for race winners differ between the sources – if you set the times in results table to FORIX times (and recalculate milliseconds for entries that finished on the lead lap as these values are most likely derived from offsets and milliseconds value for the winner), aggregated lap times check out.
2. There are some typos in results table for offsets (values in Ergast differ from both Wikipedia values and FORIX values). These are trivial to fix.
3. Post-race time penalties, which contribute to results time, but obviously don’t appear in lap times. These are fine.
But, most important of all, the way you propose to fix some of the times in results table (the times that were not properly formatted) does not maintain milliseconds values for races in question – as these values are derived from race winner milliseconds values (which you correct) and from time offsets for each entry.
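To illustrate the dependency described here, this is a rough Python sketch of deriving a lead-lap finisher's milliseconds from the winner's milliseconds and the finisher's time offset. It assumes offsets are written as +S.f or +M:SS.f with one to three decimals; the function names are invented for the example.

```python
import re

# Assumed offset format: '+5.478' or '+1:02.5' (optional minutes, 1-3 decimals).
OFFSET = re.compile(r"\+(?:(\d+):)?(\d{1,2})\.(\d{1,3})")


def offset_to_ms(offset_text: str) -> int:
    """Convert a gap to the winner into milliseconds."""
    match = OFFSET.fullmatch(offset_text)
    if match is None:
        raise ValueError(f"malformed offset: {offset_text!r}")
    minutes = int(match.group(1) or 0)
    seconds = int(match.group(2))
    millis = int(match.group(3).ljust(3, "0"))  # '5' means 500 ms
    return (minutes * 60 + seconds) * 1000 + millis


def finisher_ms(winner_ms: int, offset_text: str) -> int:
    """Milliseconds for a lead-lap finisher: winner's time plus the offset."""
    return winner_ms + offset_to_ms(offset_text)
```

So whenever the winner's milliseconds value changes, every lead-lap entry in that race has to be recomputed the same way.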
I’m attaching queries which fix these issues in the races you’ve spotted and fix overall times for entries mentioned above in points 1-2: https://gist.github.com/emkael/72ef27cd5729494ab3bf
These must be applied on the original data, due to me being lazy and using relative corrections for milliseconds values.
PS I wonder if we’re giving the maintainer a headache now.
Many thanks for the feedback. Struggling with work-life balance at the moment but I’ll try to make these updates when I get some peace and quiet.