Reliability and Validity

If operational definitions are not always good, how does one distinguish a good one from a bad one? This brings up two basic scientific concepts: reliability and validity. A good operational definition should be reliable and valid. Here are capsule definitions:

What does it mean to say a definition is reliable? Valid?

A test is reliable if it produces the same results, again and again, when measuring the same thing.

A test is valid if it measures what you think it measures, as determined by some independent way of measuring the same thing.

One way to measure reliability is to take measurements on two different occasions, making sure you are measuring exactly the same thing both times. If you get different results when measuring the same thing on two different occasions, the instrument is unreliable.
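
As a rough illustration, agreement between two occasions is often summarized with a correlation coefficient: a value near 1 means people kept nearly the same relative standing both times. The sketch below assumes five people were measured twice. The scores are made up for illustration, and `pearson_r` is a helper written here, not something named in the text.

```python
from math import sqrt

def pearson_r(x, y):
    """Pearson correlation between two equal-length lists of scores."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)

# Hypothetical scores: the same five people, measured on two occasions.
first_occasion = [12, 15, 9, 20, 17]
second_occasion = [13, 14, 10, 19, 18]

# A high correlation suggests the instrument gives consistent results.
test_retest_r = pearson_r(first_occasion, second_occasion)
print(round(test_retest_r, 2))
```

A correlation this simple only shows whether people are ranked consistently across occasions. It would not reveal a uniform shift in scores, such as everyone improving by the same amount.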

How does the "split-halves" technique work?

Another way to measure reliability, if you have a paper-and-pencil test with many items, is to use the split-halves method. You treat the odd-numbered items as one test and the even-numbered items as a different test. If the results from the two test halves agree, then the test is probably reliable.
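
The odd/even split described above can be sketched in a few lines. This is a minimal example under stated assumptions: the answer data are invented (six people, eight items scored 1 for correct and 0 for incorrect), and the function names are hypothetical. The final step, the Spearman-Brown correction, is not mentioned in the text. It is a standard adjustment that estimates the reliability of the full-length test from the correlation between its two shorter halves.

```python
from math import sqrt

def pearson_r(x, y):
    """Pearson correlation between two equal-length lists of scores."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)

def split_half_reliability(item_scores):
    """item_scores: one list per test-taker, one 0/1 entry per item."""
    # Items 1, 3, 5, ... (list positions 0, 2, 4, ...) form one half.
    odd_totals = [sum(person[0::2]) for person in item_scores]
    # Items 2, 4, 6, ... (list positions 1, 3, 5, ...) form the other half.
    even_totals = [sum(person[1::2]) for person in item_scores]
    r_half = pearson_r(odd_totals, even_totals)
    # Spearman-Brown correction (an addition, not from the text):
    # estimates reliability of the whole test from the half-test correlation.
    r_full = 2 * r_half / (1 + r_half)
    return r_half, r_full

# Hypothetical answers: six people, eight items each.
answers = [
    [1, 1, 1, 1, 1, 1, 1, 0],
    [1, 1, 1, 0, 1, 1, 0, 0],
    [1, 0, 1, 1, 0, 0, 0, 0],
    [0, 1, 0, 0, 1, 0, 0, 0],
    [1, 1, 1, 1, 1, 1, 1, 1],
    [0, 0, 1, 0, 0, 0, 0, 0],
]

r_half, r_full = split_half_reliability(answers)
print(round(r_half, 2), round(r_full, 2))
```

If the two halves rank people in nearly the same order, `r_half` is close to 1, which is what "the two halves agree" means in practice.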

What makes the "repeated measures" technique tricky in psychology?

Not every measuring instrument can be divided up this way. Reliability is usually tested using repeated measures, which means measuring the same thing repeatedly. This allows a researcher to determine whether a test is reliable, as long as one is truly measuring the "same thing" each time. With human psychological abilities, that can be tricky. Researchers cannot test the same individuals repeatedly if there are practice effects (changes in test scores due to taking the test more than once). If practice effects are likely, then the test must be used on different subjects each time, which makes verifying its reliability more difficult. Therefore, when practice effects are likely, a variation of the split-halves technique is often more convenient than a repeated measures technique. Better yet, the two can be combined by doing a split-halves test several times with different groups of subjects.

