Basic principles of classical test theory

A measurement or test performed to determine the condition or ability of an athlete is called a test. Not every measurement can be used as a test, but only those that meet special requirements: standardization, the presence of a rating system, reliability, informativeness and objectivity. Tests that meet the requirements of reliability, informativeness and objectivity are called sound (authentic).

The testing process is called testing, and the numerical values obtained as a result are the test results.

Tests based on motor tasks are called motor (movement) tests. Depending on the task facing the subject, three groups of motor tests are distinguished.

Types of motor tests

Test name: Control exercise
Task for the athlete: show the maximum result
Test result: motor achievement
Example: 1500 m run time

Test name: Standard functional tests
Task for the athlete: the same for everyone, dosed either (1) by the amount of work performed or (2) by the magnitude of the physiological changes
Test result: physiological or biochemical indicators during standard work; motor indicators at a standard magnitude of physiological changes
Example: heart rate registration during standard work of 1000 kgm/min; running speed at a heart rate of 160 beats/min

Test name: Maximum functional tests
Task for the athlete: show the maximum result
Test result: physiological or biochemical indicators
Example: determination of maximum oxygen debt or maximum oxygen consumption

Sometimes not one but several tests sharing a common final goal are used. Such a group of tests is called a test battery.

It is known that even with the most stringent standardization and precise equipment, test results always vary somewhat. Therefore, one of the important conditions for selecting good tests is their reliability.

The reliability of a test is the degree to which results agree when the same people are repeatedly tested under the same conditions. There are four main reasons for intra-individual (or intra-group) variation in test results:

    change in the condition of the subjects (fatigue, change in motivation, etc.);

    uncontrolled changes in external conditions and equipment;

    change in the state of the person conducting or evaluating the test (well-being, change of experimenter, etc.);

    imperfection of the test (for example, obviously imperfect and unreliable tests such as free throws into a basketball basket until the first miss).

The criterion of test reliability can be the reliability coefficient, calculated as the ratio of the true variance to the variance recorded in the experiment: r = σ²_true / σ²_recorded, where the true variance is understood as the variance that would be obtained from an infinitely large number of observations under the same conditions, and the recorded variance is the one obtained in the experimental study. In other words, the reliability coefficient is simply the proportion of true variation in the variation recorded in the experiment.

In addition to this coefficient, the reliability index is also used; it is regarded as the theoretical coefficient of correlation between the recorded and true values of the same test. This is the most common criterion for assessing the quality (reliability) of a test.
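In practice the true values are unknown, so reliability is usually estimated from repeated testing of the same subjects. Below is a minimal sketch (with made-up numbers) that estimates test-retest reliability as the correlation between two testing sessions:

```python
import numpy as np

# Hypothetical results of the same 8 athletes tested twice under the same conditions
first_trial  = np.array([12.9, 13.4, 12.7, 13.1, 13.8, 12.5, 13.0, 13.6])  # e.g. 100 m times, s
second_trial = np.array([13.0, 13.3, 12.8, 13.2, 13.7, 12.6, 13.1, 13.5])

# Test-retest reliability estimated as the correlation between the two sessions
reliability = np.corrcoef(first_trial, second_trial)[0, 1]
print(f"estimated reliability coefficient: {reliability:.2f}")
```

A value close to 1 means that the ranking of the subjects hardly changes between the two sessions.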

One of the characteristics of test reliability is its equivalence, which reflects the degree of agreement between the results of testing the same quality (for example, physical) by different tests. The attitude towards test equivalence depends on the specific task. On the one hand, if two or more tests are equivalent, their combined use increases the reliability of the estimates; on the other hand, it seems possible to use only one equivalent test, which will simplify testing.

If all tests included in a test battery are highly equivalent, the battery is called homogeneous (for example, it can be assumed that the long jump, the high jump and the triple jump form a homogeneous battery for assessing jumping ability). On the contrary, if there are no equivalent tests in the complex (as when assessing general physical fitness), then all the tests included in it measure different properties, i.e. the complex is essentially heterogeneous.

The reliability of tests can be increased to a certain extent by:

    more stringent standardization of testing;

    increasing the number of attempts;

    increasing the number of evaluators and increasing the consistency of their opinions;

    increasing the number of equivalent tests;

    better motivation of subjects.

Test objectivity is a special case of reliability, namely the independence of the test results from the person conducting the test.

The informativeness (information content) of a test is the degree of accuracy with which it measures the property (quality of the athlete) that it is used to evaluate. In different situations the same test may have different informativeness. The question of the informativeness of a test breaks down into two more specific questions:

What does this test measure? How accurately does it measure it?

For example, is it possible to use an indicator such as MPC (maximal oxygen consumption) to judge the preparedness of long-distance runners, and if so, with what degree of accuracy? Can this test be used in the control process?

If the test is used to determine the condition of the athlete at the time of examination, then they speak of diagnostic information content of the test. If, based on the test results, they want to draw a conclusion about the athlete’s possible future performance, they talk about prognostic information content. A test can be diagnostically informative, but not prognostically, and vice versa.

The degree of informativeness can be characterized quantitatively, on the basis of experimental data (so-called empirical informativeness), and qualitatively, on the basis of a meaningful analysis of the situation (logical informativeness). In practical work, logical (content) analysis should always precede mathematical analysis. The indicator of a test's informativeness is the correlation coefficient calculated between the criterion and the result in the test (the criterion is an indicator that obviously reflects the property that is to be measured by the test).
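As a minimal sketch of empirical informativeness (the data are made up for illustration), the correlation between the test result and the criterion can be computed directly:

```python
import numpy as np

# Hypothetical data: MPC (ml/min/kg) as the test, 5000 m race time (s) as the criterion
mpc_values  = np.array([78, 74, 70, 67, 64, 61, 59])
race_time_s = np.array([790, 805, 830, 845, 860, 880, 905])

# Empirical informativeness: correlation between the test result and the criterion
informativeness = np.corrcoef(mpc_values, race_time_s)[0, 1]
print(f"test-criterion correlation: {informativeness:.2f}")  # negative: higher MPC, faster time
```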

In cases where the informativeness of a single test is insufficient, a battery of tests is used. However, even when the individual informativeness criteria (judging by the correlation coefficients) are high, a battery does not by itself yield a single number. Here a more complex method of mathematical statistics, factor analysis, can come to the rescue. It allows one to determine how many and which tests act jointly on a particular factor and what the degree of their contribution to each factor is. It is then easy to select the tests (or combinations of them) that most accurately assess individual factors.
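A possible sketch of such an analysis, using the FactorAnalysis class from scikit-learn on a hypothetical matrix of test results; the data, the standardization step and the choice of two factors are illustrative assumptions:

```python
import numpy as np
from sklearn.decomposition import FactorAnalysis

# Hypothetical matrix: rows = athletes, columns = results of 4 different tests
X = np.array([
    [5.2, 230, 12.9, 610],
    [5.5, 215, 13.3, 580],
    [5.0, 242, 12.7, 640],
    [5.8, 205, 13.6, 560],
    [5.4, 224, 13.1, 600],
    [5.1, 236, 12.8, 625],
])

fa = FactorAnalysis(n_components=2, random_state=0)
fa.fit((X - X.mean(axis=0)) / X.std(axis=0))  # standardize so the tests are comparable
print(fa.components_)  # loadings: how strongly each test "works" on each factor
```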

1 What is a test called?

2 What is testing?

    Quantifying a quality or condition of an athlete
    A measurement or test conducted to determine the condition or ability of an athlete
    A testing process that quantitatively evaluates a quality or condition of an athlete
    No definition needed

3 What is the test result called?

    Quantifying a quality or condition of an athlete
    A measurement or test conducted to determine the condition or ability of an athlete
    A testing process that quantitatively evaluates a quality or condition of an athlete
    No definition needed

4 What type of test is the 100 m run?

5 What type of test is hand dynamometry?

    Control exercise
    Functional test
    Maximum functional test

6 What type of test is the MPC (maximal oxygen consumption) trial?

    Control exercise
    Functional test
    Maximum functional test

7 What type of test is a three-minute run with a metronome?

    Control exercise
    Functional test
    Maximum functional test

8 What type of test is the maximum number of pull-ups on the bar?

    Control exercise
    Functional test
    Maximum functional test

9 In what cases is a test considered informative?

10 When is a test considered reliable?

    The ability of the test to reproduce its results when testing is repeated
    The ability of the test to measure the athlete quality of interest
    The independence of the test results from the person administering the test

11 In what case is the test considered objective?

    The ability of the test to reproduce its results when testing is repeated
    The ability of the test to measure the athlete quality of interest
    The independence of the test results from the person administering the test

12 What criterion is necessary when evaluating the informativeness of a test?

13 What criterion is needed when evaluating the reliability of a test?

    Student's t-test
    Fisher's F-test
    Correlation coefficient
    Coefficient of determination
    Variance

14 What criterion is needed when evaluating the objectivity of a test?

    Student's t-test
    Fisher's F-test
    Correlation coefficient
    Coefficient of determination
    Variance

15 What is the informativeness of a test called if it is used to assess the degree of fitness of an athlete?

16 What kind of informativeness of control exercises does a coach rely on when selecting children for his sports section?

    Logical
    Prognostic
    Empirical
    Diagnostic

17 Is correlation analysis necessary to assess the informativeness of tests?

18 Is factor analysis necessary to assess the informativeness of tests?

19 Is it possible to assess the reliability of a test using correlation analysis?

20 Is it possible to assess the objectivity of a test using correlation analysis?

21 Will tests designed to assess general physical fitness be equivalent?

22 When the same quality is measured with different tests, the tests used are ...

    Designed to measure the same quality
    Highly correlated with each other
    Weakly correlated with each other

FUNDAMENTALS OF EVALUATION THEORY

To evaluate sports results, special point tables are often used. The purpose of such tables is to convert a sports result (expressed in objective measures) into conditional points. The rule by which sports results are converted into points is called a rating scale. A scale can be specified as a mathematical expression, a table or a graph. There are four main types of scales used in sports and physical education.

Proportional scales

Regressive scales

Progressive scales

Sigmoid scales

Proportional scales award the same number of points for an equal increase in results (for example, 20 points for every 0.1 s of improvement in the 100 m run). Such scales are used in modern pentathlon, speed skating, ski racing, Nordic combined, biathlon and other sports.

Regressive scales award progressively fewer points for the same increase in results as sporting achievements rise (for example, 20 points are added for an improvement in the 100 m result from 15.0 to 14.9 s, but only 15 points for 0.1 s in the range 10.0-9.9 s).

Progressive scales. Here, the higher the athletic result, the greater the increase in points for its improvement (for example, 10 points are added for an improvement in running time from 15.0 to 14.9 s, and 100 points from 10.0 to 9.9 s). Progressive scales are used in swimming, certain athletics events and weightlifting.

Sigmoid scales are rarely used in sports but are widely used in assessing physical fitness (for example, this is how the scale of physical fitness standards for the US population looks). In these scales, improvements in the zones of very low and very high achievements are rewarded sparingly, while improvements in the middle achievement zone bring the most points.
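A minimal sketch of a proportional scale built from the illustrative figures in the example above (the base result of 15.0 s and 20 points per 0.1 s are assumptions, not an official table); a regressive, progressive or sigmoid scale would replace the linear conversion with a concave, convex or S-shaped function of the improvement:

```python
def proportional_points(result_s: float, base_result_s: float = 15.0,
                        points_per_tenth: float = 20.0) -> float:
    """Proportional scale: the same number of points for every 0.1 s of improvement.

    base_result_s and points_per_tenth are illustrative values, not an official table.
    """
    improvement = base_result_s - result_s          # seconds better than the base result
    return max(0.0, improvement / 0.1 * points_per_tenth)

print(proportional_points(14.5))  # 0.5 s better than the 15.0 s base -> 100.0 points
```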

The main objectives of assessment are:

    compare different achievements in the same task;

    compare achievements in different tasks;

    define standards.

In sports metrology, a norm is the boundary value of a result that serves as the basis for assigning an athlete to one of the classification groups. There are three types of norms: comparative, individual and due.

Comparative standards are based on a comparison of people belonging to the same population. For example, dividing people into subgroups according to the degree of resistance (high, medium, low) or reactivity (hyperreactive, normoreactive, hyporeactive) to hypoxia.

Different gradations of assessments and norms (verbal grades and the corresponding ranges of results, expressed relative to the mean M in units of the standard deviation σ; the original table also relates each grade to the percentage of subjects and to point and percentile norms):

    Very low: below M − 2σ
    Low: from M − 2σ to M − 1σ
    Below average: from M − 1σ to M − 0.5σ
    Average: from M − 0.5σ to M + 0.5σ
    Above average: from M + 0.5σ to M + 1σ
    High: from M + 1σ to M + 2σ
    Very high: above M + 2σ

These norms characterize only the comparative success of subjects in a given population, but they say nothing about the population as a whole (or on average). Therefore, comparative norms should be compared with data obtained from other populations and used in combination with individual and due norms.
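A minimal sketch (with made-up results) of assigning the verbal grades from the sigma ranges above:

```python
import numpy as np

def comparative_grade(result: float, mean: float, sd: float) -> str:
    """Assign a verbal grade from the sigma ranges given in the table above."""
    z = (result - mean) / sd
    if z < -2:    return "very low"
    if z < -1:    return "low"
    if z < -0.5:  return "below average"
    if z <= 0.5:  return "average"
    if z <= 1:    return "above average"
    if z <= 2:    return "high"
    return "very high"

results = np.array([168, 181, 195, 203, 214, 229, 240])  # hypothetical standing long jump, cm
print([comparative_grade(r, results.mean(), results.std()) for r in results])
```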

Individual norms are based on comparing the performance of the same athlete in different conditions. For example, in many sports there is no relationship between one’s own body weight and athletic performance. Each athlete has an individually optimal weight corresponding to their state of athletic fitness. This norm can be controlled at different stages of sports training.

Due norms are based on an analysis of what a person must be able to do in order to successfully cope with the tasks that life sets before him. Examples include the standards of physical training complexes and the due values of vital capacity, basal metabolic rate, body weight and height, etc.

1 Is it possible to directly measure the quality of endurance?

2 Is it possible to directly measure the quality of speed?

3 Is it possible to directly measure the quality of dexterity?

4 Is it possible to directly measure the quality of flexibility?

5 Is it possible to directly measure the strength of individual muscles?

6 Can the assessment be expressed in a qualitative characteristic (good, satisfactory, bad, pass, etc.)?

7 Is there a difference between a measurement scale and a rating scale?

8 What is a rating scale?

    A system for measuring sports results
    The rule by which sports results are converted into points
    A system for evaluating norms

9 The scale awards the same number of points for an equal increase in results. This is a ...

10 For the same increase in results, fewer and fewer points are awarded as sporting achievements increase. This is a ...

    Progressive scale
    Regressive scale
    Proportional scale
    Sigmoid scale

11 The higher the sports result, the more points are awarded for its improvement. This is a ...

    Progressive scale
    Regressive scale
    Proportional scale
    Sigmoid scale

12 Improvement in the zones of very low and very high achievements is rewarded sparingly, while improvement in the middle achievement zone brings the most points. This is a ...

    Progressive scale
    Regressive scale
    Proportional scale
    Sigmoid scale

13 Norms based on the comparison of people belonging to the same population are called ...

14 Norms based on comparing the performance of the same athlete in different conditions are called ...

    Individual norms
    Due norms
    Comparative norms

15 Norms based on an analysis of what a person must be able to do in order to cope with the tasks assigned to him are called ...

    Individual norms
    Due norms
    Comparative norms

BASIC CONCEPTS OF QUALIMETRY

Qualimetry (from the Latin qualitas, quality, and the Greek metron, measure) studies and develops quantitative methods for assessing qualitative characteristics.

Qualimetry is based on several starting points:

Any quality can be measured;

Quality depends on a number of properties that form the “quality tree” (for example, the quality tree of exercise performance in figure skating consists of three levels - highest, middle, lowest);

Each property is determined by two numbers: relative indicator and weight; the sum of the property weights at each level is equal to one (or 100%).

Methodological techniques of qualimetry are divided into two groups:

Heuristic (intuitive), based on expert assessments and questionnaires;

Instrumental.

Expert assessment is an assessment obtained by seeking the opinions of experts. Typical examples of expertise are judging in gymnastics and figure skating, competitions for the best scientific work, etc.

Carrying out an expert assessment includes the following main stages: formulating its purpose, selecting experts, choosing a methodology, conducting the survey and processing the information received, including assessing the consistency of the individual expert assessments. During the assessment, the degree of agreement of the expert opinions is of great importance; it is evaluated by the value of the rank correlation coefficient (when there are several experts). It should be noted that rank correlation underlies the solution of many qualimetry problems, since it allows mathematical operations with qualitative characteristics.
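A minimal sketch of assessing the agreement between two experts with Spearman's rank correlation, computed by the classical formula for rankings without ties (the ranks are made up):

```python
import numpy as np

def spearman_rho(ranks_a, ranks_b):
    """Spearman rank correlation of two rankings (assumes no tied ranks)."""
    a, b = np.asarray(ranks_a, float), np.asarray(ranks_b, float)
    d = a - b                         # rank differences
    n = len(a)
    return 1 - 6 * np.sum(d ** 2) / (n * (n ** 2 - 1))

# Hypothetical ranks given by two judges to the same six figure skaters
judge_a = [1, 2, 3, 4, 5, 6]
judge_b = [2, 1, 3, 5, 4, 6]
print(f"rank correlation between the two experts: {spearman_rho(judge_a, judge_b):.2f}")
```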

In practice, an indicator of an expert's qualifications is often the deviation of his ratings from the average ratings of a group of experts.

Questionnaire is a method of collecting opinions by filling out questionnaires. Questionnaires, along with interviews and conversations, are survey methods. Unlike interviews and conversations, questioning involves written responses from the person filling out the questionnaire—the respondent—to a system of standardized questions. It allows you to study motives of behavior, intentions, opinions, etc.

Using questionnaires, you can solve many practical problems in sports: assessing the psychological status of an athlete; his attitude to the nature and direction of training sessions; interpersonal relationships in the team; own assessment of technical and tactical readiness; dietary assessment and many others.

1 What does qualimetry study?

    The quality of tests
    The qualitative properties of a trait
    The study and development of quantitative methods for assessing quality

2 What mathematical methods are used in qualimetry?

    Pair correlation
    Rank correlation
    Analysis of variance

3 What methods are used to assess the level of performance?

4 What methods are used to evaluate the diversity of technical elements?

    Questionnaire method
    Expert assessment method
    The method is not specified

5 What methods are used to assess the complexity of technical elements?

    Questionnaire method
    Expert assessment method
    The method is not specified

6 What methods are used to assess the psychological state of an athlete?

    Questionnaire method
    Expert assessment method
    The method is not specified

What is testing

In accordance with IEEE Std 829-1983, testing is a process of analyzing software aimed at identifying differences between its actual and required properties (i.e., defects) and at assessing the properties of the software.

According to GOST R ISO/IEC 12207-99, the software life cycle includes, among others, the supporting processes of verification, validation (certification), joint review and audit. Verification is the process of determining that software products function in full accordance with the requirements or conditions implemented in previous work; it may include analysis, review and testing. Validation (certification) is the process of determining how completely the created system or software product complies with the established requirements and with its functional purpose. Joint review is the process of assessing the state and, if necessary, the results of the work (products) of a project. Audit is the process of determining compliance with requirements, plans and the terms of the contract. Together, these processes make up what is usually called testing.

Testing is based on test procedures with specific inputs, initial conditions, and expected results, designed for a specific purpose, such as verifying a particular program or verifying conformance to a specific requirement. Test procedures can test various aspects of a program's functioning, from the correct operation of a particular function to the adequate fulfillment of business requirements.

When carrying out a project, it is necessary to decide in accordance with what standards and requirements the product will be tested, and what tools (if any) will be used to find and document the defects that are found. If testing is kept in mind from the very beginning of the project, testing the product under development will not bring unpleasant surprises, which means that the quality of the product will most likely be quite high.

Product life cycle and testing

Iterative software development processes are used increasingly often nowadays, in particular the RUP (Rational Unified Process) technology (Fig. 1). With this approach, testing ceases to be a process carried out only after the programmers have written all the necessary code. Work on tests begins at the very first stage of identifying requirements for the future product and is closely integrated with the current tasks. This places new demands on testers. Their role is not limited to identifying errors as fully and as early as possible; they must participate in the overall process of identifying and eliminating the most significant project risks. To do this, a testing goal and methods for achieving it are defined for each iteration, and at the end of each iteration it is determined to what extent the goal has been achieved, whether additional tests are needed, and whether the principles and tools for conducting the tests need to be changed. In turn, each detected defect must go through its own life cycle.

Fig. 1. Product life cycle according to RUP

Testing is usually carried out in cycles, each of which has a specific list of tasks and goals. The testing cycle may coincide with an iteration or correspond to a specific part of it. Typically, a testing cycle is carried out for a specific system build.

The life cycle of a software product consists of a series of relatively short iterations (Figure 2). An iteration is a complete development cycle leading to the release of a final product or some shortened version of it, which expands from iteration to iteration to eventually become a complete system.

Each iteration usually includes the tasks of work planning, analysis, design, implementation, testing and evaluation of the results achieved. However, the relative weight of these tasks changes significantly from iteration to iteration. According to the relationship between the various tasks, iterations are grouped into phases. The first phase, Inception, focuses on analysis tasks. The iterations of the second phase, Elaboration, focus on the design and testing of the key design solutions. The third phase, Construction, contains the largest share of development and testing tasks. And in the last phase, Transition, the tasks of testing and handing the system over to the customer are solved to the greatest extent.

Fig. 2. Iterations of the software product life cycle

Each phase has its own specific goals in the product life cycle and is considered complete when those goals are achieved. All iterations, except perhaps those of the Inception phase, end with the creation of a functioning version of the system being developed.

Test categories

Tests vary significantly in the problems they solve and the technology they use.

Current testing: a set of tests performed to determine the functionality of newly added system features. Types of testing: stress testing; business cycle testing; stress testing.

Regression testing: its purpose is to verify that additions to the system do not reduce its existing capabilities, i.e. testing is carried out against requirements that were already met before the new features were added. Types of testing: stress testing; business cycle testing; stress testing.

Testing subcategories

Stress Testing: used to test all application functions without exception; the sequence in which the functions are tested does not matter. Subtypes: functional testing; interface testing; database testing.

Business cycle testing: used to test application functions in the sequence in which the user calls them, for example simulating all the actions of an accountant for the 1st quarter. Subtypes: unit (module) testing; functional testing; interface testing; database testing.

Stress testing: used to test application performance. The purpose of this testing is to determine the range of stable operation of the application; during this testing all available functions are called. Subtypes: unit (module) testing; functional testing; interface testing; database testing.

Types of testing

Unit (module) testing - this type involves testing individual application modules. To obtain the best results, testing is carried out simultaneously with the development of the modules.

Functional testing - The purpose of this testing is to ensure that the test item is functioning properly. The correctness of navigation through the object is tested, as well as the input, processing and output of data.

Database testing - checking the functionality of the database during normal application operation, during overloads and in multi-user mode.

Unit testing

For OOP, the usual way to organize unit testing is to test the methods of each class, then each class of each package, and so on, gradually moving on to testing the entire project; the earlier tests are then rerun as regression tests.

The output documentation of these tests includes the test procedures, the input data, the code that executes the test, and the output data. An example of what such a test might look like is shown below.
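A minimal sketch of such a unit test using Python's built-in unittest framework; the module under test (average_speed) is a hypothetical example, not part of any project discussed above:

```python
import unittest

def average_speed(distance_m: float, time_s: float) -> float:
    """Module under test: average speed in m/s (hypothetical example)."""
    if time_s <= 0:
        raise ValueError("time must be positive")
    return distance_m / time_s

class AverageSpeedTest(unittest.TestCase):
    def test_normal_input(self):
        # correct input data -> expected output
        self.assertAlmostEqual(average_speed(100, 12.5), 8.0)

    def test_invalid_time(self):
        # obviously erroneous input data -> the module must report an error
        with self.assertRaises(ValueError):
            average_speed(100, 0)

if __name__ == "__main__":
    unittest.main()
```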

Functional testing

Functional testing of the test item is planned and carried out based on the testing requirements specified during the requirements definition stage. The requirements include business rules, use-case diagrams, business functions, and, if available, activity diagrams. The purpose of functional tests is to verify that the developed graphical components meet the specified requirements.

This type of testing cannot be fully automated. Therefore, it is divided into:

  • Automated testing (will be used in the case where it is possible to check the output information).

Purpose: to test data input, processing and output;

  • Manual testing (in other cases).

Purpose: Tests whether user requirements are met correctly.

It is necessary to execute (play through) each of the use cases, using both correct values and obviously erroneous ones, to confirm correct functioning according to the following criteria:

  • the product responds adequately to all input data (expected results are output in response to correctly entered data);
  • the product responds adequately to incorrectly entered data (corresponding error messages appear).

Database testing

The purpose of this testing is to ensure the reliability of database access methods, their correct execution, without violating data integrity.

It is necessary to exercise as many database calls as possible, one after another. An approach is used in which the test is designed to "load" the database with a sequence of both correct values and obviously erroneous ones. The reaction of the database to the data input is determined, and the time intervals for processing the data are estimated.
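A minimal sketch of this idea using an in-memory SQLite database: the test feeds the database both correct and deliberately erroneous values, checks its reaction and estimates the processing time (the table and data are made up):

```python
import sqlite3
import time

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE results (athlete TEXT NOT NULL, time_s REAL NOT NULL)")

rows = [("Ivanov", 12.9), ("Petrov", 13.4), ("Sidorov", None)]  # the last row violates NOT NULL
start = time.perf_counter()
for athlete, time_s in rows:
    try:
        conn.execute("INSERT INTO results VALUES (?, ?)", (athlete, time_s))
    except sqlite3.IntegrityError as err:
        print("rejected:", err)          # the database must react to erroneous data
elapsed = time.perf_counter() - start
print(f"processed {len(rows)} inserts in {elapsed:.4f} s")
```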

Mathematical foundations of the theory of test construction

Types of test items

There are two significantly different forms of tasks: closed (when the test taker is offered answer options to choose from) and open (the test taker must get the answer on his own). Open tasks, in turn, can be divided into two groups:

    tasks with a short, regulated answer, the formulation of which should generate only one answer, planned by the developer;

    tasks with a freely constructed answer, without any restrictions on the content and form of presentation of answers.

There are five main types of tasks. All other types are variations or combinations of these five types.

    Choice task. The text of the assignment consists of a question. There are several answer options to choose from, one or more of which are correct.

    Addition task. In the wording of the task, a certain fragment of text is missing, which is indicated by an underscore (or several underscores of the same length, if there are several missing words). The gap can be in any part of the text, but it is recommended to do it at the end. In the answer, the test taker must write the missing words.

    Sequencing task: the test taker is required to establish the correct order of the listed elements.

    Matching task. The wording of the task contains two lists. On the left, as a rule, are the elements of the set containing the statement of the problem; on the right are the elements to be selected. The elements of the left set are numbered, the elements of the right set are designated by letters. It is desirable for the second set to contain more elements than the first. In this case, each element of the first set corresponds to one or more elements of the second set.

    A task with a detailed answer.

Stages of test development

    Formulation of the purpose and object of research.

Who, what and why should be tested

    Development of testing content.

Studying the requirements of the educational standard, the content of textbooks.

Writing a test specification:

    Selecting sections (topics) and their percentage content in the test

    Selecting Job Types

    Determination of levels of mastery of knowledge and skills:

    Level 1

    Knowledge of definitions of basic concepts of the discipline, as well as basic statements about the methods of the discipline

    Level 2

    Knowledge of basic formulas and algorithms; ability to apply them when solving standard problems

    Level 3

    Application of acquired knowledge to solve atypical problems

    Determining the approximate number of tasks in the test and distributing this number among the types of tasks.

    Development of tasks.

Since the first version of the test should reveal the shortcomings of the tasks (including the proposed distractors), as many distractors as possible are offered in each task, so that after the weak ones are discarded a sufficient number remain.

    Expert review of the draft test.

The purpose of the review is to identify and correct incorrect and unclear formulations. As a result, some tasks may be removed from the test (hence it is recommended to prepare tasks with a reserve).

    Approbation.

    Calculation of characteristics of tasks and tests.

Based on the testing results, the following statistical characteristics of the tasks and the test are calculated.

The range of individual scores measures the interval within which all values in the distribution of individual scores lie (the difference between the largest and smallest scores).

The sample mean (average) for the set of individual scores X_1, X_2, ..., X_K of a group of K subjects is calculated by the formula

\bar{X} = \frac{1}{K} \sum_{i=1}^{K} X_i .

The calculation of the variance is based on the deviations of each value from the arithmetic mean of the distribution:

s^2 = \frac{1}{K - 1} \sum_{i=1}^{K} (X_i - \bar{X})^2 .

Low variance indicates low quality of the test, since weak variation in results means weak differentiation of the test takers by level of preparation. Excessively high variance, typical of the case when practically every student completes a different number of tasks, also requires reworking the test.

The calculation of test characteristics is completed by assessing the reliability of the test. To calculate the reliability coefficient, the Kuder-Richardson formula can be used (only when all task weights are equal to one):

r = \frac{n}{n-1} \left( 1 - \frac{\sum_{j=1}^{n} p_j q_j}{s_x^2} \right),

where n is the number of tasks, p_j is the proportion of correct answers to the j-th task, q_j = 1 - p_j, and s_x^2 is the variance of the individual test scores.
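A minimal sketch of this calculation for a 0/1 answer matrix (made-up data; the sample-variance convention ddof=1 is an assumption, some textbooks use the population variance):

```python
import numpy as np

def kuder_richardson_20(answers: np.ndarray) -> float:
    """KR-20 reliability for a 0/1 answer matrix (rows = subjects, columns = tasks)."""
    n_tasks = answers.shape[1]
    p = answers.mean(axis=0)            # proportion of correct answers per task
    q = 1.0 - p
    total_scores = answers.sum(axis=1)  # individual scores
    score_var = total_scores.var(ddof=1)
    return n_tasks / (n_tasks - 1) * (1.0 - (p * q).sum() / score_var)

# Hypothetical results of 6 subjects on 5 dichotomous tasks
data = np.array([
    [1, 1, 1, 1, 0],
    [1, 1, 1, 0, 0],
    [1, 1, 0, 0, 0],
    [1, 0, 1, 0, 0],
    [1, 1, 1, 1, 1],
    [0, 0, 0, 0, 0],
])
print(f"KR-20 = {kuder_richardson_20(data):.2f}")
```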

To give a qualitative assessment of the reliability of the test, the value of the coefficient is compared with a grading table: as the reliability coefficient increases, the reliability of the test is rated as unsatisfactory, then satisfactory, and finally excellent.

The difficulty of the j-th task is estimated by the formula

p_j = R_j / K,

where R_j is the number of subjects who completed the j-th task correctly and K is the total number of subjects.

Note that the easier the task, the greater the proportion of correct answers to it (p_j), so it would be more natural to interpret this proportion as the ease of the task. A test well balanced in difficulty should contain several difficult tasks and several easy ones, but the bulk of the tasks should have a difficulty between 0.3 and 0.7; it is also desirable that the tasks be arranged in order of increasing difficulty.

The validity of a test item is determined by the degree to which the item serves the goal of differentiating the subjects. To check this, the correlation between the score on the item and the score on the whole test is calculated using the correlation coefficient

r_j = \frac{K \sum_i X_i Y_i - \sum_i X_i \sum_i Y_i}{\sqrt{\left[ K \sum_i X_i^2 - \left( \sum_i X_i \right)^2 \right] \left[ K \sum_i Y_i^2 - \left( \sum_i Y_i \right)^2 \right]}},

where X_i is the total test score of the i-th subject and Y_i is the score of the i-th subject on the item. In the case of dichotomous scoring of the item, the calculation simplifies somewhat. If r < 0, the item should be removed from the test, since weak students succeed on it while strong students choose a wrong answer or skip the item. Positive values close to zero (non-significant) indicate a low predictive ability of the item; such items require reworking of their content.

The ability of a task to differentiate the best and the worst subjects is shown by its coefficient of differentiation (discriminativity index). The simplest way to calculate this index is the contrast group method, which works as follows. From the entire group of subjects, a certain number of the best subjects by overall test result are selected (the strong subgroup) together with the same number of the worst (the weak subgroup). The proportion of correct answers is then calculated in each of these subgroups. Let p_1j denote the proportion of correct answers to the j-th task in the strong subgroup and p_0j the proportion of correct answers in the weak subgroup. Then the discriminativity index of the j-th task is determined by the formula

(r_dis)_j = p_1j − p_0j .

For a task that all strong subjects completed and none of the weak subjects did, the discriminativity index r_dis equals 1; such a task has the maximum differentiating effect. For a task that all weak subjects completed and none of the strong subjects did, the index equals −1. In other cases the index takes values between −1 and 1. Tasks with zero or negative discriminativity values do not differentiate students well and should be removed from the test. If the index is positive but less than 0.2, the task requires careful analysis of its content.
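A minimal sketch, on a made-up 0/1 answer matrix, of the three item characteristics described above: difficulty (ease), the item-total correlation, and the contrast-group discrimination index (splitting off the top and bottom thirds is an illustrative choice):

```python
import numpy as np

answers = np.array([          # rows = subjects, columns = tasks (1 = correct)
    [1, 1, 1, 0],
    [1, 1, 0, 0],
    [1, 0, 1, 0],
    [1, 1, 1, 1],
    [0, 0, 0, 0],
    [1, 0, 0, 1],
])
totals = answers.sum(axis=1)

# Difficulty (ease): proportion of correct answers per task
difficulty = answers.mean(axis=0)

# Validity: correlation of each task score with the total test score
validity = np.array([np.corrcoef(answers[:, j], totals)[0, 1]
                     for j in range(answers.shape[1])])

# Discrimination index by the contrast-group method (top and bottom thirds)
order = np.argsort(totals)
k = len(totals) // 3
weak, strong = order[:k], order[-k:]
discrimination = answers[strong].mean(axis=0) - answers[weak].mean(axis=0)

print(difficulty, validity.round(2), discrimination)
```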

According to these characteristics, some tasks can be removed from the test, while others must be corrected. After this, steps 5 and 6 must be repeated.

Formulas for calculating the probability of guessing

When designing a test, you need to determine how many answer options should be offered for each question so that the probability of passing the test simply by guessing the correct answers is less than 0.05 (i.e., less than 5%). Suppose the test is considered successfully completed if the test taker answers correctly at least a share Q of the questions. If the test includes N questions, the probability of "successful guessing" is calculated by the formula

P = \sum_{j=\lceil NQ \rceil}^{N} C_N^{j} \left( \frac{1}{m} \right)^{j} \left( 1 - \frac{1}{m} \right)^{N-j},

where m is the number of answer options offered for each question.

In the case when the number of answer options differs between questions, the formula takes a more complex form:

P = \sum_{j=\lceil NQ \rceil}^{N} P_j,

where P_j is the probability of guessing exactly j answers, calculated as follows. Divide all the questions of the test into r groups so that questions with the same probability of being guessed fall into one group. Denote by p_i (0 < p_i < 1) the guessing probability and by k_i the number of questions in the i-th group (k_1 + k_2 + ... + k_r = N). Then, for j from \lceil NQ \rceil to N,

P_j = \sum_{t_1 + t_2 + \dots + t_{r-1}} \; \prod_{i=1}^{r} C_{k_i}^{t_i} \, p_i^{t_i} (1 - p_i)^{k_i - t_i}, \qquad t_r = j - (t_1 + t_2 + \dots + t_{r-1}),

where the summation runs over all combinations with 0 ≤ t_i ≤ k_i, and any term with t_r > k_r (or t_r < 0) is taken to be zero.

Examples.

For N = 10 and Q = 2/3: with m = 2, P < 0.2; with m = 3, P < 0.02; with m = 4, P < 0.004.
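A minimal sketch reproducing this example for the case of an equal number of answer options per question (Python 3.8+ for math.comb):

```python
from math import ceil, comb

def pass_by_guessing(n_questions: int, required_share: float, n_options: int) -> float:
    """Probability of passing purely by guessing, with n_options answers per question."""
    p = 1.0 / n_options
    j_min = ceil(n_questions * required_share)
    return sum(comb(n_questions, j) * p ** j * (1 - p) ** (n_questions - j)
               for j in range(j_min, n_questions + 1))

for m in (2, 3, 4):
    print(m, round(pass_by_guessing(10, 2 / 3, m), 4))  # ~0.17, ~0.02, ~0.0035
```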


Description of the presentation by individual slides:

Slide 2

Physical qualities are usually called congenital (genetically inherited) morphofunctional qualities, thanks to which physical (materially expressed) human activity is possible, which receives its full manifestation in purposeful motor activity. The main physical qualities include strength, speed, endurance, flexibility, and agility.

Slide 3

Motor abilities are individual characteristics that determine the level of a person’s motor capabilities (V.I. Lyakh, 1996). The basis of a person’s motor abilities is physical qualities, and the form of manifestation is motor abilities and skills. Motor abilities include strength, speed, speed-strength, motor-coordination abilities, general and specific endurance

Slide 4

Scheme of systematization of physical (motor) abilities: physical (motor) abilities are divided into conditioning (energy-based) abilities (strength, endurance, speed, flexibility and their combinations) and coordination (information-based) abilities (CS related to individual groups of motor actions, special CS, specific CS and their combinations), as well as combinations of conditioning and coordination abilities.

Slide 5

Accurate information about the level of development of motor abilities (high, medium, low) can be obtained using tests (control exercises).

Slide 6

With the help of control tests (tests), it is possible to identify absolute (explicit) and relative (hidden, latent) indicators of these abilities. Absolute indicators characterize the level of development of certain motor abilities without taking into account their influence on each other. Relative indicators make it possible to judge the manifestation of motor abilities taking into account this influence.

Slide 7

The above-mentioned physical abilities can be thought of as existing potentially, that is, before the start of any motor activity (they can be called potential abilities), and as actually manifesting themselves at the start of and in the course of performing that activity, including when performing motor tests (actual, or current, physical abilities).

Slide 8

With a certain degree of convention, we can speak of elementary and complex physical abilities.

Slide 9

Research results make it possible to distinguish the following physical abilities: special, specific and general (including coordination abilities, CS).

Slide 10

Special physical abilities refer to homogeneous groups of integral motor actions or activities: running, acrobatic and gymnastic exercises on apparatus, throwing motor actions, sports games (basketball, volleyball).

Slide 11

We can talk about specific manifestations of physical abilities as components that make up their internal structure.

Slide 12

Thus, the main components of a person’s coordination abilities are: the ability to navigate, balance, respond, differentiate movement parameters; ability to rhythm, rearrangement of motor actions, vestibular stability, voluntary muscle relaxation. These abilities are specific.

Slide 13

The main components of the structure of speed abilities are considered to be the speed of response, the speed of a single movement, the frequency of movements and the speed manifested in integral motor actions.

Slide 14

Manifestations of strength abilities include: static (isometric) strength, dynamic (isotonic) strength - explosive, shock-absorbing force.

Slide 15

The structure of endurance is very complex: aerobic endurance, which requires oxygen-based sources of energy for its manifestation; anaerobic endurance (glycolytic and creatine phosphate energy sources, without the participation of oxygen); endurance of various muscle groups in static poses (static endurance); and endurance in dynamic exercises performed at 20-90% of maximum speed.

Slide 16

Less complex are the manifestations (forms) of flexibility, where active and passive flexibility are distinguished.

Slide 17

General physical abilities should be understood as the potential and realized capabilities of a person, which determine his readiness to successfully carry out motor actions of various origins and meanings. Special physical abilities are a person’s capabilities that determine his readiness to successfully carry out motor actions of similar origin and meaning. Therefore, tests provide information primarily about the degree of formation of special and specific physical (speed, coordination, strength, endurance, flexibility) abilities.


Slide 19

The objectives of testing are to identify the levels of development of conditioning and coordination abilities, to assess the quality of technical and tactical readiness. Based on the test results, you can: compare the preparedness of both individual students and entire groups living in different regions and countries; conduct sports selection for practicing one or another sport, for participation in competitions; exercise largely objective control over the education (training) of schoolchildren and young athletes; identify the advantages and disadvantages of the means used, teaching methods and forms of organizing classes; finally, to substantiate the norms (age-specific, individual) for the physical fitness of children and adolescents.

Slide 20

Along with the above-mentioned tasks, in the practice of different countries, the testing tasks boil down to the following: to teach schoolchildren themselves to determine the level of their physical fitness and plan the necessary sets of physical exercises for themselves; encourage students to further improve their physical condition (shape); to know not so much the initial level of development of motor ability, but its change over a certain time; encourage students who have achieved high results, but not so much for a high level, but for a planned increase in personal results.

Slide 21

A test is a measurement or test taken to determine a person's ability or condition.

Slide 22

Only those tests (samples) that meet special requirements can be used as tests: the purpose of using any test (or tests) must be determined; A standardized test measurement methodology and testing procedure should be developed; it is necessary to determine the reliability and information content of the tests; test results can be presented in the appropriate evaluation system

Slide 23

Test. Testing. Test result. The system of using tests in accordance with the task, organizing the conditions, having the subjects perform the tests, and evaluating and analyzing the results is called testing. The numerical value obtained in the course of measurement is the result of testing (the test result).

Slide 24

The tests used in physical education are based on motor actions (physical exercises, motor tasks). Such tests are called movement or motor tests.

Slide 25

Tests can be classified according to their structure and according to their primary indications; a distinction is made between single and complex tests. A single test is used to measure and evaluate one trait (a coordination or conditioning ability).

Slide 27

A complex test is used to assess several signs or components of different abilities or of the same ability, for example a vertical jump from a standing position (with an arm swing, without an arm swing, to a given height).

Slide 29

Tests may be: conditioning tests (for assessing strength abilities, endurance, speed abilities and flexibility); coordination tests (for assessing coordination abilities related to individual independent groups of motor actions, which measure special coordination abilities, and for assessing specific coordination abilities: the ability to balance, spatial orientation, reaction, differentiation of movement parameters, rhythm, rearrangement of motor actions, coordination (coupling), vestibular stability and voluntary muscle relaxation).

Slide 30

Each classification is a kind of guideline for choosing (or creating) the type of tests that best corresponds to the testing tasks.

Slide 31

QUALITY CRITERIA FOR MOTOR TESTS The concept of “motor test” meets its purpose when the test satisfies the relevant basic criteria: reliability, stability, equivalence, objectivity, informativeness (validity), as well as additional criteria: standardization, comparability and economy. Tests that meet the requirements of reliability and information content are called good or authentic (reliable).

Slide 32

The reliability of a test refers to the degree of accuracy with which it assesses a particular motor ability, regardless of the requirements of the person assessing it. Reliability is the extent to which results agree when the same people are tested repeatedly under the same conditions; it is the stability (consistency) of an individual's test result when the control exercise is repeated. In other words, a child in a group of subjects consistently retains his or her ranking place on repeated testing (for example, of jumping performance, running time or throwing distance). The reliability of a test is determined using correlation-statistical analysis by calculating a reliability coefficient; various methods are used to judge the reliability of a test.

Slide 33

The stability of the test is based on the relationship between the first and second attempts, repeated after a certain time under the same conditions by the same experimenter. The method of repeated testing to determine reliability is called retest. The stability of the test depends on the type of test, the age and gender of the subjects, and the time interval between test and retest. For example, performance on conditioning tests or morphological traits over short time intervals is more stable than performance on coordination tests; in older children the results are more stable than in younger ones. A retest is usually carried out no later than a week later. At longer intervals (for example, after a month), the stability of even such tests as the 1000 m run or standing long jump becomes noticeably lower.

Slide 34

Test equivalence Test equivalence is the correlation of the test result with the results of other tests of the same type. For example, when you need to choose which test more adequately reflects speed abilities: running 30, 50, 60 or 100 m. The attitude towards equivalent (homogeneous) tests depends on many reasons. If it is necessary to increase the reliability of assessments or study conclusions, then it is advisable to use two or more equivalent tests. And if the task is to create a battery containing a minimum of tests, then only one of the equivalent tests should be used. Such a battery, as noted, is heterogeneous, since the tests included in it measure different motor abilities. An example of a heterogeneous battery of tests is the 30 m run, pull-up, forward bend, and 1000 m run.

Slide 35

The reliability of tests is also determined by comparing the average scores of even and odd attempts included in the test. For example, the average accuracy of shots on target from 1, 3, 5, 7 and 9 attempts is compared with the average accuracy of shots from 2, 4, 6, 8 and 10 attempts. This method of assessing reliability is called the doubling method, or splitting. It is used primarily when assessing coordination abilities and in the event that the number of attempts that form the test result is at least six.

Slide 36

The objectivity (consistency) of a test is understood as the degree to which the results obtained on the same subjects by different experimenters (teachers, judges, experts) agree. To increase the objectivity of testing, it is necessary to comply with standard test conditions: testing time, location and weather conditions; unified material and hardware support; psychophysiological factors (volume and intensity of load, motivation); presentation of information (a precise verbal statement of the test task, explanation and demonstration). This is the so-called objectivity of conducting the test. One also speaks of interpretive objectivity, which concerns the degree of independence of the interpretation of test results from the particular experimenter.

Slide 37

In general, as experts note, the reliability of tests can be increased in various ways: more stringent standardization of testing, an increase in the number of attempts, better motivation of the subjects, an increase in the number of evaluators (judges, experts) and in the consistency of their opinions, and an increase in the number of equivalent tests. There are no fixed values for test reliability indicators; in most cases the following recommendations are used: 0.95-0.99, excellent reliability; 0.90-0.94, good; 0.80-0.89, acceptable; 0.70-0.79, poor; 0.60-0.69, doubtful for individual assessments (the test is suitable only for characterizing a group of subjects).

Slide 38

The validity of a test is the degree of accuracy with which it measures the motor ability or skill being assessed. In foreign (and domestic) literature, instead of the word “informativeness”, the term “validity” is used (from the English validity - validity, reality, legality). In fact, when talking about information content, the researcher answers two questions: what does this particular test (battery of tests) measure and what is the degree of measurement accuracy. There are several types of validity: logical (substantive), empirical (based on experimental data) and predictive.

Slide 39

Important additional test criteria, as noted, are standardization, comparability and economy. The essence of standardization is that, based on test results, it is possible to create standards that are of particular importance for practice. Test comparability is the ability to compare results obtained with one or several forms of parallel (homogeneous) tests. In practical terms, the use of comparable motor tests reduces the likelihood that, as a result of regular use of the same test, it is the degree of practice in that test rather than the level of the ability that ends up being assessed. At the same time, comparable test results increase the reliability of the conclusions. The essence of economy as a criterion of test quality is that conducting the test does not require a long time, large material costs or the participation of many assistants.

Slide 40

ORGANIZATION OF TESTING THE READINESS OF SCHOOL-AGE CHILDREN. The second important problem of testing motor abilities (recall that the first is the selection of informative tests) is the organization of their use. The physical education teacher must determine in what timeframe it is better to organize testing, how to carry it out within the lesson, and how often testing should be carried out. The timing of testing is coordinated with the school program, which provides for mandatory testing of students' physical fitness twice a year.

Slide 41

Knowledge of annual changes in the development of children’s motor abilities allows the teacher to make appropriate adjustments to the process of physical education for the next school year. However, the teacher must and can conduct more frequent testing and conduct so-called operational control. It is advisable to do this in order to determine, for example, changes in the level of speed, strength abilities and endurance under the influence of athletics lessons during the first quarter. For this purpose, the teacher can use tests to assess children’s coordination abilities at the beginning and at the end of mastering the program material, for example, in sports games, to identify changes in the development indicators of these abilities.

Slide 42

It should be taken into account that the variety of pedagogical problems being solved does not allow the teacher to be provided with a unified testing methodology, the same rules for conducting tests and evaluating test results. This requires experimenters (teachers) to demonstrate independence in solving theoretical, methodological and organizational testing issues. Testing in a lesson must be linked to its content. In other words, the test or tests used, subject to the appropriate requirements (as a research method), should be organically included in the planned physical exercises. If, for example, children need to determine the level of development of speed abilities or endurance, then the necessary tests should be planned in that part of the lesson in which the tasks of developing the corresponding physical abilities will be solved.

Slide 43

The frequency of testing is largely determined by the rate of development of specific physical abilities, age, gender and individual characteristics of their development. For example, to achieve a significant increase in speed, endurance or strength, several months of regular exercise (training) are required. At the same time, in order to obtain a significant increase in flexibility or individual coordination abilities, only 4-12 workouts are required. If you start from scratch, you can achieve improvement in physical quality in a shorter period of time. And in order to improve the same quality when a child has a high level, it takes more time. In this regard, the teacher must study more deeply the features of the development and improvement of various motor abilities in children at different age and gender periods.

Slide 44

When assessing the general physical fitness of children, you can use a wide variety of test batteries, the choice of which depends on the specific testing objectives and the availability of necessary conditions. However, due to the fact that the test results obtained can only be assessed by comparison, it is advisable to choose tests that are widely represented in the theory and practice of physical education of children. For example, rely on those recommended in the FC program. To compare the general level of physical fitness of a student or group of students using a set of tests, they resort to converting test results into points or points. The change in the amount of points during repeated testing makes it possible to judge the progress of both an individual child and a group of children.

Slide 49

An important aspect of testing is the problem of choosing a test to assess a specific physical ability and general physical fitness.

Slide 50

Practical recommendations and advice. IMPORTANT: Determine (select) the battery (or set) of necessary tests with a detailed description of all the details of their implementation; Set testing dates (better - 2-3 weeks of September - 1st testing, 2-3 weeks of May - 2nd testing); In accordance with the recommendation, accurately determine the age of children on the day of testing and their gender; Develop unified data recording protocols (possibly based on the use of ICT); Determine the circle of assistants and carry out the testing procedure itself; Immediately carry out mathematical processing of testing data - calculation of basic statistical parameters (arithmetic mean, error of arithmetic mean, standard deviation, coefficient of variation and assessment of the reliability of differences between arithmetic means, for example, parallel classes of the same and different schools of children of a certain age and gender ); One of the significant stages of the work may be the translation of test results into points or points. With regular testing (2 times a year, for several years), this will allow the teacher to have an idea of ​​the progress of the results.

Slide 51

Slide description:

Moscow, “Enlightenment” Publishers, 2007. The book contains the most common motor tests for assessing students' conditioning and coordination abilities. The manual provides for an individual approach by the physical education teacher to each specific student, taking into account the student's age and physique.

Basic concepts of test theory.

A measurement or trial carried out to determine an athlete's condition or ability is called a test. Every test involves measurement, but not every measurement serves as a test. The procedure of measurement or trial is called testing.

A test based on motor tasks is called motor. There are three groups of motor tests:

  • 1. Control exercises, in which the athlete is tasked to show maximum results.
  • 2. Standard functional tests, during which the task, the same for everyone, is dosed either according to the amount of work performed, or according to the magnitude of physiological changes.
  • 3. Maximum functional tests, during which the athlete must show maximum results.

High quality testing requires knowledge of measurement theory.

Basic concepts of measurement theory.

Measurement is the identification of correspondence between the phenomenon being studied, on the one hand, and numbers, on the other.

Measurement theory rests on three concepts: measurement scales, units of measurement, and measurement accuracy.

Measurement scales.

A measurement scale is a law by which a numerical value is assigned to a measured result as it increases or decreases. Let's look at some of the scales used in sports.

Name scale (nominal scale).

This is the simplest of all scales. In it, numbers act as labels and serve to detect and distinguish the objects under study (for example, the numbering of players on a football team). The numbers that make up a nominal scale may be interchanged. There are no "more-less" relationships in this scale, so some believe that using a nominal scale should not be considered measurement at all. When a nominal scale is used, only a limited set of mathematical operations is possible: its numbers cannot be added or subtracted, but one can count how many times (how often) a particular number occurs.
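Purely as an illustration (not from the source), the only kind of arithmetic that makes sense on a nominal scale is counting how often each label occurs; the shirt numbers below are invented.

```python
from collections import Counter

# Nominal data: shirt numbers of players who scored during a season (illustrative)
scorers = [10, 7, 10, 9, 7, 10, 11, 9, 10]

# Valid operation: frequency counts; adding or averaging these labels would be meaningless
print(Counter(scorers))                 # Counter({10: 4, 7: 2, 9: 2, 11: 1})
print(Counter(scorers).most_common(1))  # [(10, 4)]
```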

Order scale.

There are sports in which an athlete's result is determined only by the place taken in the competition (for example, martial arts). After such competitions it is clear which athlete is stronger and which is weaker, but it is impossible to say by how much. If three athletes took first, second and third places respectively, the difference in their sportsmanship remains unclear: the second athlete may be almost equal to the first, or may be much weaker and almost level with the third. The places occupied in an order scale are called ranks, and the scale itself is called a rank, or non-metric, scale. In such a scale the numbers are ordered by rank (i.e., by the places occupied), but the intervals between them cannot be accurately measured. Unlike the nominal scale, the order scale makes it possible not only to establish the fact of equality or inequality of the measured objects, but also to determine the nature of the inequality in the form of judgments such as "more - less" or "better - worse".

Using order scales, you can measure qualitative indicators that do not have a strict quantitative measure. These scales are used especially widely in the humanities: pedagogy, psychology, sociology.

A greater number of mathematical operations can be applied to the ranks of the order scale than to the numbers of the name scale.
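As a hedged illustration of what these extra operations can look like (none of this is from the source), the sketch below takes two judges' rankings of the same athletes, reports the median rank, and computes Spearman's rank correlation rs = 1 − 6·Σd²/(n(n²−1)), which is valid for ordinal data without ties; all numbers are invented.

```python
from statistics import median

def spearman(ranks_a, ranks_b):
    """Spearman rank correlation for two rankings without ties."""
    n = len(ranks_a)
    d2 = sum((a - b) ** 2 for a, b in zip(ranks_a, ranks_b))
    return 1 - 6 * d2 / (n * (n ** 2 - 1))

# Illustrative rankings of five athletes by two judges (1 = best)
judge_1 = [1, 2, 3, 4, 5]
judge_2 = [2, 1, 3, 5, 4]

print(median(judge_1))                        # 3 - the median rank is meaningful on an order scale
print(round(spearman(judge_1, judge_2), 2))   # 0.8 - agreement between the two orderings
```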

Interval scale.

This is a scale in which the numbers are not only ordered by rank but are also separated by definite intervals. The feature that distinguishes it from the ratio scale described below is that the zero point is chosen arbitrarily. Examples include calendar time (the starting point of chronology in different calendars was chosen for arbitrary reasons), joint angle (the angle at the elbow joint with the forearm fully extended can be taken as either zero or 180°), temperature, the potential energy of a lifted load, electric field potential, and so on.

The results of measurements on an interval scale can be processed by all mathematical methods except the calculation of ratios. Interval scales answer the question "how much more?" but do not allow us to say that one value of a measured quantity is so many times greater or smaller than another. For example, if the temperature rose from 10 °C to 20 °C, it cannot be said that it became twice as warm.
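A small illustrative check (not from the source) of why ratios are meaningless on an interval scale: shifting the zero point, here from the Celsius zero to absolute zero, leaves differences unchanged but alters ratios.

```python
def to_kelvin(celsius):
    """Shift the zero point from the Celsius zero to absolute zero."""
    return celsius + 273.15

t1_c, t2_c = 10.0, 20.0

# Differences survive a change of zero point...
print(t2_c - t1_c)                                   # 10.0
print(round(to_kelvin(t2_c) - to_kelvin(t1_c), 2))   # 10.0

# ...but ratios do not, so "twice as warm" has no meaning on an interval scale
print(t2_c / t1_c)                                   # 2.0
print(round(to_kelvin(t2_c) / to_kelvin(t1_c), 3))   # 1.035
```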

Ratio scale.

This scale differs from the interval scale only in that it strictly defines the position of the zero point. Thanks to this, the ratio scale does not impose any restrictions on the mathematical apparatus used to process observational results.

In sports, ratio scales are used to measure distance, force, speed and dozens of other variables. The ratio scale also measures quantities formed as differences between numbers measured on an interval scale. Thus calendar time is counted on an interval scale, while time intervals are measured on a ratio scale. When a ratio scale is used (and only in this case!), measuring a quantity amounts to experimentally determining the ratio of that quantity to another of the same kind taken as the unit. By measuring the length of a jump, we find out how many times that length exceeds the length of another body taken as the unit of length (a meter ruler in this particular case); by weighing a barbell, we determine the ratio of its mass to the mass of another body, the standard one-kilogram weight; and so on. If we confine ourselves to ratio scales alone, we can give another (narrower, more specific) definition of measurement: to measure a quantity means to find experimentally its ratio to the corresponding unit of measurement.

Units of measurement.

In order for the results of different measurements to be comparable with one another, they must be expressed in the same units. In 1960 the General Conference on Weights and Measures adopted the International System of Units, abbreviated SI (from the French Système International). This system is now established as the preferred one in all areas of science and technology, in the national economy, and in teaching.

The SI currently includes seven base units that are independent of one another (see Table 1.1).

Table 1.1. SI base units

Quantity - Unit - Symbol
Length - meter - m
Mass - kilogram - kg
Time - second - s
Electric current - ampere - A
Thermodynamic temperature - kelvin - K
Amount of substance - mole - mol
Luminous intensity - candela - cd

The units of other physical quantities are formed from these base units as derived units. Derived units are defined by the formulas that relate physical quantities to one another. For example, the meter (unit of length) and the second (unit of time) are base units, while the meter per second (unit of speed) is a derived unit.

In addition to the basic ones, the SI distinguishes two additional units: the radian, a unit of plane angle, and the steradian, a unit of solid angle (angle in space).

Accuracy of measurements.

No measurement can be made absolutely accurately. The measurement result inevitably contains an error, and the more accurate the measurement method and the measuring instrument, the smaller that error. For example, with an ordinary ruler with millimeter divisions it is impossible to measure length to an accuracy of 0.01 mm.

Basic and additional error.

Basic error is the error of a measurement method or measuring instrument that occurs under normal conditions of use.

Additional error is the error of a measuring device caused by deviation of its operating conditions from normal ones. It is clear that instruments designed to operate at room temperature will not give accurate readings if used in the summer at a stadium under the scorching sun or in the winter in the cold. Measurement errors can occur when the voltage of the electrical network or battery power supply is lower than normal or is not constant in value.

Absolute and relative errors.

The value E = A − A0, equal to the difference between the reading of the measuring instrument (A) and the true value of the measured quantity (A0), is called the absolute measurement error. It is expressed in the same units as the measured quantity itself.

In practice it is often more convenient to use not the absolute but the relative error. The relative measurement error is of two kinds: actual and reduced. The actual relative error is the ratio of the absolute error to the true value of the measured quantity:

AD = (E / A0) * 100%

The reduced relative error is the ratio of the absolute error to the maximum possible value of the measured quantity:

AP = (E / Amax) * 100%
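A short worked example (the numbers are invented) applying the definitions above: a stopwatch reads 10.2 s when the true time is 10.0 s, and the stopwatch's full scale is 60 s.

```python
A = 10.2      # instrument reading, s
A0 = 10.0     # true value of the measured quantity, s
Amax = 60.0   # maximum possible value (full scale of the stopwatch), s

E = A - A0                          # absolute error, s
actual_relative = E / A0 * 100      # actual relative error, %
reduced_relative = E / Amax * 100   # reduced relative error, %

print(round(E, 2))                  # 0.2
print(round(actual_relative, 2))    # 2.0
print(round(reduced_relative, 2))   # 0.33
```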

Systematic and random errors.

A systematic error is one whose value does not change from measurement to measurement. Because of this property, a systematic error can often be predicted in advance or, failing that, detected and eliminated at the end of the measurement process.

The method for eliminating a systematic error depends primarily on its nature. Systematic measurement errors can be divided into three groups:

  • errors of known origin and known magnitude;
  • errors of known origin but unknown magnitude;
  • errors of unknown origin and unknown magnitude.

The most harmless are the errors of the first group: they are easily removed by introducing appropriate corrections into the measurement result.

The second group includes, first of all, errors associated with the imperfection of the measurement method and measuring equipment. For example, the error in measuring physical performance using a mask for collecting exhaled air: the mask makes breathing difficult, and the athlete naturally demonstrates physical performance that is underestimated compared to the true one measured without a mask. The magnitude of this error cannot be predicted in advance: it depends on the individual abilities of the athlete and his state of health at the time of the study.

Another example of a systematic error in this group is the error associated with imperfect equipment, when a measuring device consistently overestimates or underestimates the true value of the measured quantity, but the magnitude of the error is unknown.

Errors of the third group are the most dangerous; their occurrence is associated both with the imperfection of the measurement method and with the characteristics of the object of measurement - the athlete.

Random errors arise under the influence of various factors that cannot be predicted in advance or accurately taken into account. Random errors cannot be eliminated in principle. However, using the methods of mathematical statistics, it is possible to estimate the magnitude of the random error and take it into account when interpreting the measurement results. Without statistical processing, measurement results cannot be considered reliable.
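As a closing illustration (not from the source), repeated measurements of the same quantity let the random error be estimated statistically: the scatter of the readings gives the standard deviation, and the standard error of the mean shows how precisely the mean is known. The hand-timed sprint readings below are invented.

```python
from statistics import mean, stdev
from math import sqrt

# Repeated hand-timed measurements of the same 30 m sprint (s), illustrative
readings = [4.52, 4.48, 4.55, 4.50, 4.47, 4.53, 4.49, 4.51]

m = mean(readings)
sd = stdev(readings)            # spread caused by random error
se = sd / sqrt(len(readings))   # standard error of the mean

# Rough 95% interval for the true time, assuming roughly normal random errors
low, high = m - 2 * se, m + 2 * se
print(round(m, 3), round(sd, 3), round(se, 3))
print(f"true value is likely between {low:.3f} and {high:.3f} s")
```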


