This page mirrors the BNC most-frequent-words information, to save bandwidth at its original location.
It is difficult to produce a useful report on the commonness of English words, because two different words are often spelled identically (e.g. 'lead' the verb and 'lead' the noun; sometimes 'to' is a preposition and sometimes it's an infinitive verb marker). One of the more useful surveys of a large body of English material is a survey of the British National Corpus, prepared and made available by the Information Technology Research Institute at the University of Brighton. The material that was surveyed includes millions of words of transcribed conversation, printed text, and lectures and oratory.
If we look at the 1996 version of this survey and add together items that are closely related -- for example, if we consider 'this' and 'these' as a single item -- we find that the following items are the most frequent, starting with 'the' which makes up 6.18 percent of the corpus:
6.18%  the
4.23%  is, was, be, are, 's (= is), were, been, being, 're, 'm, am
2.94%  of
2.68%  and
2.46%  a, an
1.80%  in, inside (preposition)
1.62%  to (infinitive verb marker)
1.37%  have, has, have, 've, 's (= has), had, having, 'd (= had)
1.27%  he, him, his
1.25%  it, its
1.17%  I, me, my
0.91%  to (preposition)
0.86%  they, them, their
0.86%  not, n't, no (interjection)
0.83%  for
0.83%  you, your
0.70%  she, her
0.65%  with
0.64%  on
0.62%  that (conjunction)
0.58%  this, these
0.57%  that (demonstrative), those
0.55%  do, did, does, done, doing
0.51%  we, us, our
0.50%  by
0.47%  at
0.45%  but (conjunction)
0.44%  's (possessive)
0.41%  from
0.40%  as (many parts of speech)
0.37%  which
0.37%  or
0.31%  will, 'll
0.28%  said, say, says, saying
0.25%  would
0.25%  what
0.23%  there (existential, in "there is ..." phrases)
0.23%  if
0.23%  can
0.22%  all
0.22%  who, whose
0.21%  so (adverb / conjunction)
0.20%  go, went, gone, goes
0.20%  more
0.19%  other, another
0.19%  one (numeral)
0.18%  see, saw, seen, seeing
0.18%  know, knew, known, knows, knowing
The items listed above make up about 43% of the corpus. That's right: more than four in ten of the words in this English corpus are pronouns, conjunctions, other function words, and a few common verbs.
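The 43% figure can be checked by adding up the percentages in the list. The following is only a quick arithmetic sketch: the numbers are copied by hand from the list on this page, in the same order, and the variable names are made up for illustration.

```python
# Percentages copied from the list on this page, in order,
# starting with 'the' (6.18) and ending with 'know' (0.18).
PERCENTAGES = [
    6.18, 4.23, 2.94, 2.68, 2.46, 1.80, 1.62, 1.37,
    1.27, 1.25, 1.17, 0.91, 0.86, 0.86, 0.83, 0.83,
    0.70, 0.65, 0.64, 0.62, 0.58, 0.57, 0.55, 0.51,
    0.50, 0.47, 0.45, 0.44, 0.41, 0.40, 0.37, 0.37,
    0.31, 0.28, 0.25, 0.25, 0.23, 0.23, 0.23, 0.22,
    0.22, 0.21, 0.20, 0.20, 0.19, 0.19, 0.18, 0.18,
]

# Round to two places to hide floating-point noise in the sum.
total = round(sum(PERCENTAGES), 2)

print(f"{len(PERCENTAGES)} items cover {total}% of the corpus")
# prints: 48 items cover 43.06% of the corpus
```

So the 48 grouped items come to just over 43%, which agrees with the figure quoted above.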
Here is a portion of the survey giving the top 3000 words. Each line consists of three items: the word, its part of speech, and the number of times the word occurred.
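A file in that three-item layout could be read along the following lines. This is a minimal sketch, not a description of the actual file: it assumes the three items are separated by whitespace and appear in the order word, part of speech, count. The `Entry` type, the function names, the part-of-speech labels and the counts in `SAMPLE` are all invented placeholders, not real BNC data.

```python
from typing import NamedTuple


class Entry(NamedTuple):
    word: str
    pos: str    # part-of-speech label, exactly as it appears in the file
    count: int  # number of times the word occurred


def parse_line(line: str) -> Entry:
    """Split one line into word, part of speech and count.

    Assumes exactly three whitespace-separated items per line.
    """
    word, pos, count = line.split()
    return Entry(word, pos, int(count))


def parse_list(text: str) -> list[Entry]:
    """Parse every non-blank line of the text."""
    return [parse_line(line) for line in text.splitlines() if line.strip()]


# Placeholder lines: the tags and counts are made up for illustration.
SAMPLE = """\
the det 100
lead v 7
lead n 5
"""

entries = parse_list(SAMPLE)

# Add together the counts of entries that share a spelling, ignoring
# part of speech ('lead' the verb plus 'lead' the noun).
by_spelling: dict[str, int] = {}
for entry in entries:
    by_spelling[entry.word] = by_spelling.get(entry.word, 0) + entry.count

print(by_spelling)
```

Keeping the part of speech on each entry means the verb and the noun 'lead' stay separate until you choose to merge them, which is the distinction the survey is careful to preserve.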
The original files were created from the BNC by Adam Kilgarriff; you might want to look at their README.