Hi,

I have thought a lot about this question in the past, and my answer is: yes, you can apply statistical formulas. But you should know each field of your records well: what kind of information it may contain, and whether you can set rules that can be applied to individual records.

Some important factors:

- the "completeness" of the records: the ratio of filled to unfilled fields
- whether the value of an individual field matches the rules (say you expect a number in the range 1 to 5, but you get 6)
- the probability that a given field value is unique
- the probability that a record is not a duplicate of another record

Some concrete examples from my Europeana past:

- there are mandatory fields, and if they are empty, the quality goes down
- there are fields that should match a known standard, for example ISO language codes; you can apply rules to decide whether a value fits or not
- the "data provider" field is free text with no formal rule, but no individual record should contain a unique value, and when you import several thousand new records, they should not contain more than a couple of new values
- there are fields that should contain URLs, emails, or dates; we can check whether they fit formal rules and whether their content is in a reasonable range (we should not have records created in the future, for example)
- you can measure whether the optional fields are filled, and in what ratio

In the end you will have a number of measurements, and you can apply weighting to calculate a final classification score. You can do a lot to set up rules with faceted search, and of course you can use statistical tools such as R or Julia, which help you get a picture of the distribution of the values.

Hope it helps.

Regards,
Péter

--
Péter Király
software developer
Göttingen Society for Scientific Data Processing - http://gwdg.de
eXtensible Catalog - http://eXtensibleCatalog.org
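P.S. A minimal sketch of the idea in Python, in case it is useful. The field names, rules, and weights here are invented for illustration (not Europeana's actual schema): one completeness measurement plus a few per-field rule checks, combined into a weighted classification score.

```python
import re
from datetime import date

# Hypothetical per-field rules; each check returns True if the value passes.
ISO_639_1 = {"en", "de", "fr", "hu", "nl"}  # tiny illustrative subset

CHECKS = {
    # field should be a known ISO 639-1 language code
    "language": lambda v: v in ISO_639_1,
    # field should be a number in the range 1 to 5
    "rating": lambda v: isinstance(v, int) and 1 <= v <= 5,
    # field should look like an http(s) URL
    "url": lambda v: re.match(r"https?://\S+$", v) is not None,
    # a record should not have been created in the future
    "created": lambda v: isinstance(v, date) and v <= date.today(),
}

# Illustrative weights for the final classification score (sum to 1.0).
WEIGHTS = {"completeness": 0.4, "language": 0.2, "rating": 0.1,
           "url": 0.1, "created": 0.2}

def score(record):
    # completeness: ratio of filled fields among all expected fields
    filled = sum(1 for f in CHECKS if record.get(f) not in (None, ""))
    measurements = {"completeness": filled / len(CHECKS)}
    # per-field rule checks: 1.0 if present and valid, else 0.0
    for field, check in CHECKS.items():
        value = record.get(field)
        measurements[field] = (
            1.0 if value not in (None, "") and check(value) else 0.0
        )
    # weighted combination into a single classification number
    return sum(WEIGHTS[m] * v for m, v in measurements.items())
```

A fully valid record scores 1.0; a record with a bad language code and an out-of-range rating keeps only part of its completeness weight. In practice you would derive the weights from what matters for your collection, and add batch-level checks (such as counting new "data provider" values per import) on top of these record-level ones.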