Psy: Cognition & Instruction Dissertation Development

The dissertation problem statement emerges from the topic to address what needs the research will fulfill. The problem statement must be supported by literature that upholds the need to explore the problem. In this assignment, you will construct an initial supported problem statement for your intended research topic.

My program is Doctor of Philosophy in General Psychology with an emphasis in Cognition & Instruction.

Dissertation Topic Chosen: Individual Focused Learning for Better Memory Retention Through Experience

General Requirements: 
Use the following information to ensure the successful completion of the assignment:

· Locate the “Dissertation Development Template” in the Topic Resources. 

· This assignment uses a rubric. Please review the rubric prior to beginning the assignment to become familiar with the expectations for successful completion. 

· Doctoral learners are required to use APA 7th style for their writing assignments. The APA Style Guide is located in the Student Success Center. 

· Refer to the Publication Manual of the American Psychological Association for specific guidelines related to doctoral level writing. The Manual contains essential information on manuscript structure and content, clear and concise writing, and academic grammar and usage. 

· This assignment requires at least two additional scholarly research sources related to this topic, with at least one in-text citation from each source.

Directions:

Use the Dissertation Development Template to provide an annotated bibliography of ten empirical research articles on your proposed dissertation topic. For each article include a reference and an annotation (150-250 words each) of the key points of the article.

In the second portion of the assignment, you will answer questions regarding the alignment of your topic to your program and defend your selection of articles. Use the template to write a supported problem statement for your proposed dissertation topic. The statement must include a defense of the need to explore the problem and a discussion of the feasibility of the topic (250-500 words).

Running head: DISSERTATION DEVELOPMENT 1

Individual Focused Learning for Better Memory Retention Through Experience

Grand Canyon University

Date

Individual Focused Learning for Better Memory Retention Through Experience

Baker, V. L., & Pifer, M. J. (2011). The role of relationships in the transition from doctoral student to independent scholar. Studies in Continuing Education, 33(1), 5-17. http://doi.org/10.1080/0158037X.2010.515569

For this assignment select ten empirical, peer-reviewed research articles that are relevant to your dissertation topic and program of study. Provide a reference and an annotation (150-250 words) that includes important details about the article for each of the sources.

Annotations are descriptive and critical assessments of literature that help researchers evaluate texts and determine relevancy in relation to a research project. Ultimately, it is a note-taking tool that fosters critical thinking and helps you evaluate the source material for possible later use. Instead of reading articles and forgetting what you have read, you have a convenient document full of helpful information. An annotated bibliography can help you see the bigger picture of the literature you are reading. It can help you visualize the overall status of the topic, as well as where your unique question might fit into the field of literature. 

Your sources should not be class resources, dissertations, books, reports, conference papers, legal documents, book reviews, editorials, newspaper articles, magazine articles, and other sources that are not peer-reviewed. 


Degree

Doctor of Philosophy in General Psychology with an emphasis in Cognition & Instruction

Research Focus

How does your topic area align with your degree and emphasis area?

Feasibility of Research Problem

It is important for you to select a topic that is viable and a problem that is researchable. Justify the feasibility of your proposed research project. For instance, is this a researchable problem? Is there a need for it? What is your proximity to a data source?

Problem Statement

A problem statement emerges from reading and studying the literature around a topic. Consider these dimensions of the problem you are investigating: what is known, what needs to be known, and what is the significance. To provide context, begin with how the problem has already been investigated. Next, address what has not been examined about the specific issue you want to address. Finally, discuss why this is an important problem that needs to be investigated; that is, what is the benefit of studying this specific problem? (100-150 words)

Defense of Article Selection

The selection of the articles you choose to include is essential in developing an understanding of the literature on your topic. For this assignment you should select five empirical articles that are important to your understanding of the problem. Your searches may turn up many articles on the topic and it is imperative for you to discern which articles to include and which to exclude. In this section, defend your selection of your five articles for the annotated bibliography. Why were these five articles important enough to annotate? What do these add to your understanding of the problem? How do these align with your research focus? Are they current and relevant? (250-500 words)

Paper

  

Research Paper: Securing Data

Overview

IT infrastructure of an enterprise has: 

a) Database servers, storage systems (SAN & NAS) for data storage

b) Computers, applications to process the data

c) Network components to enable connectivity among computers and servers

In this research, your goal is to secure the data processed by these IT components. 

Prepare

· Review the APA Citation Style guide if necessary. You can also consult the Learning Commons for writing support and library resources.

· Carefully review Chapter 8, Chapter 11, and Chapter 12 of the textbook. The textbook is your primary source of information.

Research

· Research and analyze potential vulnerabilities and security challenges that are associated with these IT components and that eventually threaten the confidentiality, integrity, and availability of data. Consider both technical and non-technical vulnerabilities.

· Research and analyze various security countermeasures and solutions to secure data. Think about countermeasures in the policy, human, database, application, computer, storage, server, network, data, and object dimensions.

Write

Prepare a two- to three-page APA-formatted research paper. Your paper should be well organized, matching each stated vulnerability with at least one countermeasure. Real-world examples are a plus.

Cite

APA citation and referencing are required for this assignment. You must use in-text citations and a reference list to accurately cite the sources of information that you use to write your assignment.

Failure to use proper citations and references will be considered plagiarism. If you need further information about APA style and format, see the APA Citation Style guide. The Purdue OWL is another useful source of APA help.

Submit

1. Turnitin will automatically and transparently check your submission after you submit it. Turnitin is integrated into Canvas; therefore, you don’t need to sign up for Turnitin. After Turnitin has finished processing your submission, you will be able to see your similarity result on Canvas.

2. Review your Submission Details and access your Turnitin report. Revise your work as needed based on the feedback. 

3. By the due date indicated, re-submit the final version of your work.

BEFORE submitting your assignment for final grading, ensure that you have completed ALL of the steps above.

Week 10 SOAP Note

This week, you will be learning about different gynecological cancers. We will begin by reviewing the pathophysiology and progression of the disease. As a clinician, you will be responsible for screening patients and helping co-manage their diseases. You will identify risk factors and prevention measures to help improve health. To complete this SOAP note, follow the template attached to this post. The SOAP note must be written as if a patient came in with symptoms of possible endometrial cancer, leading up to a diagnosis of endometrial cancer. Please use references in APA style published within the last five years.

SOAP NOTE

Name: Date: Time:
Age: Sex:
SUBJECTIVE
CC:

Reason given by the patient for seeking medical care “in quotes”

HPI:

Describe the course of the patient’s illness, including when it began, character of symptoms, location where the symptoms began, aggravating or alleviating factors; pertinent positives and negatives, other related diseases, past illnesses, surgeries or past diagnostic testing related to present illness.

Medications: (list with reason for med )

PMH

Allergies:

Medication Intolerances:

Chronic Illnesses/Major traumas

Hospitalizations/Surgeries

“Have you ever been told that you have: diabetes, HTN, peptic ulcer disease, asthma, lung disease, heart disease, cancer, TB, thyroid problems, kidney disease, or a psychiatric diagnosis?”

Family History

Does your mother, father or siblings have any medical or psychiatric illnesses? Anyone diagnosed with: lung disease, heart disease, HTN, cancer, TB, DM, or kidney disease.

Social History

Education level, occupational history, current living situation/partner/marital status, substance use/abuse, ETOH, tobacco, marijuana. Safety status

ROS
General

Weight change, fatigue, fever, chills, night sweats, energy level

Cardiovascular

Chest pain, palpitations, PND, orthopnea, edema

Skin

Delayed healing, rashes, bruising, bleeding or skin discolorations, any changes in lesions or moles

Respiratory

Cough, wheezing, hemoptysis, dyspnea, pneumonia hx, TB

Eyes

Corrective lenses, blurring, visual changes of any kind

Gastrointestinal

Abdominal pain, N/V/D, constipation, hepatitis, hemorrhoids, eating disorders, ulcers, black tarry stools

Ears

Ear pain, hearing loss, ringing in ears, discharge

Genitourinary/Gynecological

Urgency, frequency, burning, change in color of urine.

Contraception, sexual activity, STDs

Female: last pap, breast, mammo, menstrual complaints, vaginal discharge, pregnancy hx

Male: prostate, PSA, urinary complaints

Nose/Mouth/Throat

Sinus problems, dysphagia, nose bleeds or discharge, dental disease, hoarseness, throat pain

Musculoskeletal

Back pain, joint swelling, stiffness or pain, fracture hx, osteoporosis

Breast

SBE, lumps, bumps or changes

Neurological

Syncope, seizures, transient paralysis, weakness, paresthesias, blackout spells

Heme/Lymph/Endo

HIV status, bruising, blood transfusion hx, night sweats, swollen glands, increased thirst, increased hunger, cold or heat intolerance

Psychiatric

Depression, anxiety, sleeping difficulties, suicidal ideation/attempts, previous dx

OBJECTIVE

Weight BMI Temp BP
Height Pulse Resp
General Appearance

Healthy appearing adult female in no acute distress. Alert and oriented; answers questions appropriately. Slightly somber affect at first, then brighter later.
Skin

Skin is brown, warm, dry, clean and intact. No rashes or lesions noted.
HEENT

Head is normocephalic, atraumatic and without lesions; hair evenly distributed. Eyes: PERRLA. EOMs intact. No conjunctival or scleral injection. Ears: Canals patent. Bilateral TMs pearly grey with positive light reflex; landmarks easily visualized. Nose: Nasal mucosa pink; normal turbinates. No septal deviation. Neck: Supple. Full ROM; no cervical lymphadenopathy; no occipital nodes. No thyromegaly or nodules. Oral mucosa pink and moist. Pharynx is nonerythematous and without exudate. Teeth are in good repair.
Cardiovascular

S1, S2 with regular rate and rhythm. No extra sounds, clicks, rubs or murmurs. Capillary refill 2 seconds. Pulses 3+ throughout. No edema.
Respiratory

Symmetric chest wall. Respirations regular and easy; lungs clear to auscultation bilaterally.
Gastrointestinal

Abdomen obese; BS active in all 4 quadrants. Abdomen soft, non-tender. No hepatosplenomegaly.
Breast

Breast is free from masses or tenderness, no discharge, no dimpling, wrinkling or discoloration of the skin.
Genitourinary

Bladder is non-distended; no CVA tenderness. External genitalia reveals coarse pubic hair in normal distribution; skin color is consistent with general pigmentation. No vulvar lesions noted. Well estrogenized. A small speculum was inserted; vaginal walls are pink and well rugated; no lesions noted. Cervix is pink and nulliparous. Scant clear to cloudy drainage present. On bimanual exam, cervix is firm. No CMT. Uterus is anteverted and positioned behind a slightly distended bladder; no fullness, masses, or tenderness. No adnexal masses or tenderness. Ovaries are non-palpable.

(Male: both testes palpable, no masses or lesions, no hernia, no urethral discharge.)

(Rectal as appropriate: no evidence of hemorrhoids, fissures, bleeding or masses. Males: prostate is smooth, non-tender and free from nodules, is of normal size, sphincter tone is firm.)
Musculoskeletal

Full ROM seen in all 4 extremities as patient moved about the exam room.
Neurological

Speech clear. Good tone. Posture erect. Balance stable; gait normal.
Psychiatric

Alert and oriented. Dressed in clean slacks, shirt and coat. Maintains eye contact. Speech is soft, though clear and of normal rate and cadence; answers questions appropriately.

Lab Tests

Urinalysis – pending

Urine culture – pending

Wet prep – pending

Special Tests

Diagnosis

Differential Diagnoses

o 1-
o 2-
o 3-

Diagnosis

o

Plan/Therapeutics

o Plan:
▪ Further testing
▪ Medication
▪ Education
▪ Non-medication treatments

Evaluation of patient encounter

Humanities

I need help with the attachment

Due June 8

What I want you to know: I want you to understand the basic political system of Russia.

I. Basic System: Semi-Presidential Federation

1. President

The Head of State represents Russia abroad and at home.

Elected every 6 years

Can serve 2 consecutive terms. The June 2020 constitutional referendum erased Putin’s previous terms and reset his term limits, potentially allowing him to stay in office until 2036.

Appoints and nominates the Prime Minister (Head of Government).

Can issue decrees that have the power of law unless they contradict federal laws.

Can disband the Duma

Commander-in-Chief of the armed forces

2. Executive Branch

Chief of State: Vladimir Putin

Head of Government: Mikhail Mishustin

The Cabinet: The “Government” is the premier, his deputies, and ministers, all appointed by the president. The Duma also confirms the nomination of the premier.

3. The Federal Assembly-Parliament or Legislative Branch

Bicameral=Federation Council & State Duma

The Federal Assembly forms committees and commissions to resolve issues and is empowered to pass laws.

4. The Judicial Branch

Three types of courts:

The courts of general jurisdiction (including military courts) are subordinated to the Supreme Court. The municipal court is the lowest adjudicating body in the general court system. It serves each city or rural district and hears more than 90 percent of all civil and criminal cases. The next level of courts of general jurisdiction is the regional courts. At the highest level is the Supreme Court. Decisions of the lower trial courts can generally be appealed only to the immediately superior court.

The arbitration courts, under the High Court of Arbitration, are in practice specialized courts that resolve property and commercial disputes between economic agents. The highest court for resolving economic disputes is the High Court of Arbitration.

The Constitutional Court (as well as constitutional courts in several federal entities). The Constitutional Court is empowered to rule on whether or not laws or presidential decrees are constitutional. If it finds that a law is unconstitutional, the law becomes unenforceable and governmental agencies are barred from implementing it. The judges of the Constitutional Court, the Supreme Court, and the High Court of Arbitration are appointed by the parliament’s upper house, the Federation Council.

5. Federal Administration

89 administrative units that are divided into republics, territories, regions, the cities of Moscow and St. Petersburg, the Jewish Autonomous Oblast, and other autonomous regions. 46 provinces (oblasti, singular – oblast), 21 republics (respubliki, singular – respublika), 4 autonomous okrugs (avtonomnyye okrugi, singular – avtonomnyy okrug), 9 krays (kraya, singular – kray), 2 federal cities (goroda, singular – gorod), and 1 autonomous oblast (avtonomnaya oblast’).

Russian republics were created on broadly ethnic lines

Republics have constitutions, parliaments, and governments that can pass their own laws so far as they don’t contradict Russian law.

Territories may pass their own charters and other legislative acts. They set local taxes and maintain public order and legal affairs.

The trend recently is to grant greater powers of government to these administrative units and to reduce central government interference in local government.

Putin wanted a mechanism of central control over the provinces: the Institution of Presidential Representatives. There are 7 federal administrative provinces. The President appoints each representative personally. Each has a staff of 100. Each unit assumes and coordinates authority over officials from many other federal agencies that set up branches in the federal capitals. Examples: Ministry of Justice, Tax Police, FSB.

* Purpose: To have agents of the central state who would remain unswervingly loyal to directives from Moscow. They could be removed if they failed to bring provincial legislation in line with federal law.

* In the past, Governors ran their regions like personal kingdoms and flouted federal law. Now there is more control over them.

II. The Party System

Has its beginnings in the Perestroika era. Many political organizations and movements began during this era and created the foundation for a multi-party system.

After the fall of communism, political parties were allowed again.

1994: 60 political parties

1999: over 100

2003: 23

2018: 64 registered parties; only 4 hold representation in Russia’s national legislature.

The main parties are: A Just Russia [Sergey MIRONOV], Civic Platform or CP [Rifat SHAYKHUTDINOV], Communist Party of the Russian Federation or CPRF [Gennadiy ZYUGANOV], Liberal Democratic Party of Russia or LDPR [Vladimir ZHIRINOVSKIY], Rodina [Aleksei ZHURAVLYOV], United Russia [Dmitriy MEDVEDEV]

3. Four types of Political Parties

a. Communists: Want to see Russia become a Superpower with a state-run planned economy, following Marxist principles. Some desire restoration of the Soviet Union. Some invoke “socialism with a human face”.

b. Nationalists: Russia as a Superpower. Russians are the dominant ethnic group. Various views on the market economy and political pluralism. Often backed by paramilitary structures led by former “black berets” and Afghan war vets.

*In 1993, Vladimir Zhirinovsky’s party received 23% of the votes for parliament. He openly proclaims that Russia should become a colonial power and regain the former territories of the tsarist regime, including Poland, Finland, and Alaska. In the 2000 elections, he received 2.7% of the vote.

c. Democrats: They are pro-Western and in favor of a free-market economy. These groups represent young, progressively minded professionals, small business owners, and entrepreneurs.

d. Pro-Government Party (President’s Party) United Russia: Putin began a tradition of gathering his power base and giving it party status. This party is not formed for ideological reasons, but to create a pro-government, pro-presidential bloc in the Duma. Putin is not a member of this party officially.

III. Who is in power?

President-Vladimir Putin

Prime Minister-Mikhail Mishustin

Study Guide Questions

1. How many years does the President serve? How many terms?

2. What is the Prime Minister’s position?

3. Name the two parts of the Russian Legislature.

4. What is the Institution of Presidential Representatives? Why was it created?

5. Who hand-picked Dmitri Medvedev as the new Russian President?

6. What are two political parties that have influence in Russian politics besides United Russia?

7. What is the party “United Russia”?

8. When are the next parliamentary elections? (go on the Internet to find this)

9. When are the next presidential elections? (go on the Internet to find this)

10. Who are the President and Prime Minister of Russia?

Due June 8

Fixing the Economy

What I want you to know: The Russian economy has been through some tough times, but it is on the upswing and it will not be long before we see Russian products competing in our own marketplace.

 

I. Background

Definitions:

Command Economy = Managed economy (Planners).  Prices are set artificially.  Often, they are not close to the price for that product in a market economy.

 

Market Economy = Prices are determined by the natural interaction of supply and demand in the market.  Prices are equal to real value.

The Soviet economy was a command economy.  Labor and resources were concentrated in heavy industry (especially in the military-industrial sector). Sacrifices were made in consumer goods.  Soviet consumers had to deal with shoddy products and shortages.  The agriculture, transport, and service sectors were ignored.

 

II.  Gorbachev’s Perestroika.

There was a gradual but controlled move towards a market economy.

A. June 1987: Law on State Enterprises

    -Was designed to move enterprises to full-cost accounting over a two-year period.  Began the process of rolling back the frontiers of the State by starting to dismantle the huge planning Bureaucracy. 

B. May 1988:  Law on Cooperatives

    -Removed restrictions on cooperative economic activity (stopped in 1929) and opened the door to privatization.

*Gorbachev’s reforms were efficient at demolishing the command economy but failed to put a viable alternative in its place.

III. Yeltsin Era: “Let the good times roll!”

At the end of the Gorbachev era, many elements of the command economy were still evident, including state subsidy of many enterprises, controlled prices, a centralized supply system, etc. With the break-up of the Soviet Union, Yeltsin felt free to pursue a policy of radical economic reform.

A. “Shock Therapy” 

    This was a sudden and rapid move towards a market economy.  It was supposed that although it would be painful, it was better to get it over with quickly than do it slowly.

January 1992:

1. Prices freed.

2. Government subsidies to industry were cut back.

3. Full autonomy and financial accountability were given to state-owned enterprises.

4. To soften the blow to the people, the Russian government maintained a commitment to a program of social welfare to help the needy.

5. To force monetary stability, Yeltsin’s government pursued a policy of trying to balance the budget.

6. Opening the economy to the world market would promote competition, inward investment, and efficiency.

B. Privatization continued during this period.  There was a move to privatize services and the retail sector and to establish joint ventures with foreign ownership.  State-owned housing, farming, and eventually the industrial sector were privatized.

    1. In October of 1992, the government approved two plans for privatizing state-owned industry:

        a. Managers, directors, and employees got shares of their firms.

        b. Vouchers were distributed to the population which represented their nominal share of the state-owned industry.  Vouchers could be used to buy shares or simply sold for cash.

        c. Results

        1.  By the end of 1993, 2/3 of the 14,500 firms picked for privatization had been transformed into joint-stock companies owned mainly by their workers.

        2. It is estimated that by the end of 1995, 73% of the industrial assets had been transferred to private hands.

        3. The Government continued to control key sectors.

*Purchasing and accumulating vouchers is how many people got fantastically rich in Russia.  Most average people had no idea what their vouchers were really worth if accumulated and cashed in or used to purchase ownership of industry.

 

**Former officials, directors of factories, and former communist officials used their positions to acquire shares in their factories dishonestly.

   C. Costs

       1. From 1989 to 1996, GDP fell by 60%, with a greater decline in industrial output.

       2. There was widespread tax evasion, corruption, and tax fraud, which severely worsened the government’s indebtedness (recall IMF loans in the early ’90s).

       3. Because of the failing economy, voters chose Communists and Nationalists in the Duma elections of ’93 and ’95 over democratic, reform-minded candidates.

       4. By September 1997, unemployment had risen to 9% of the workforce (65 million Russians). This resulted in delayed payment, payments-in-kind, multiple jobs, unofficial or illegal commercial deals, and growing one’s own food at the dacha. 

 

*It is estimated that only about 10% of the population experienced a real increase in income between 1991 and 1996.  Recall Oligarchs and New Russians.  The rest lost out.

    D. Financial Crisis of August 1998

        On August 17th, 1998, the Russian Government announced the devaluation of the Ruble and a 90-day moratorium on the payment of external debts by commercial banks.  The Government was bankrupt.  

        Results:

         1. Ruble fell to half its value.

         2. Prices of food changed hourly.

         3. Inflation soared.

         4. People and Enterprises scrambled to get their money out of the banks.  Most failed and their savings and accounts were wiped out in the suspension of the banks.

         5. Belief in the future of the Government and economic reforms disappeared.  The Government was dissolved within the week.

         6. The emerging middle-class was destroyed.

         7. Many said that the crisis was worse than our Great Depression.

 

*Silver Lining:  Many say that this crisis was the death of the speculative economy based on quick mega-profits and financial manipulations.  It was replaced by a real producing economy, with professional managers quickly replacing the poorly educated, newly rich Russians with bad manners.  It was the beginning of a new era in the Russian economy.

THE RUSSIAN ECONOMY REVISITED IN 2003

What I want you to know:  The Russian economy is in a growth mode.  The Government has set high goals and is determined to reach them.  Essential laws protecting property rights and worries about the aftermath of the wealth redistribution of the early ’90s have overshadowed the efforts of Putin’s Government.

I. After the August Crisis

Professional political managers came to power.  The Left and radicals became less popular.  

Results:

=This means that society no longer entertains the illusion that somebody else, and not the people themselves, can quickly change their lives for the better. 

=People learned to count on themselves to improve their lives. 

=Irreversible market reforms are the result

1. Statistics:

A. The middle-class has grown to up to 25% in cities.

B. People are trusting the banks again.  More than twice the pre-crisis amount has been deposited in banks.

    1998=$2.9 Billion

    2003=$7 Billion

C. Consumer spending is up in travel, auto purchases, home improvement, electronics, and furniture.

D. By June 2002, real cash incomes had exceeded August 1998 pre-crisis levels by 5.4%.  Wages in June 2002 were 18.7% higher than wages in 1998.
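The bank deposit figures in item B can be checked with a quick calculation. The short Python sketch below is only an illustration: the variable names are made up for this example, and the two dollar amounts are taken directly from the notes above.

```python
# Bank deposits quoted in the notes, in billions of US dollars.
deposits_1998 = 2.9
deposits_2003 = 7.0

# How many times larger the 2003 figure is than the 1998 figure.
ratio = deposits_2003 / deposits_1998

# The same change expressed as a percentage increase.
pct_increase = (ratio - 1) * 100

print(f"2003 deposits are {ratio:.2f} times the 1998 level")
print(f"That is an increase of about {pct_increase:.0f}%")
```

A ratio above 2 is what the notes describe as deposits more than doubling after the crisis.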

Managing Russia After the Crisis

09 October 2009

By Odd Per Brekk

 

The international crisis dealt a severe blow to the Russian economy. The lower oil prices and reversal of international capital flow to emerging markets hit the country hard because the shocks struck just as the economy was on a steep upturn and Russia’s dependence on oil made it particularly vulnerable.

As a result, economic activity fell precipitously. Faced with this challenging turn of events, the government mounted an economic policy response that was swift and unprecedented in its scale and contents. The banking pressures were addressed through large-scale liquidity injections and rescue of problem banks, while fiscal policy became expansionary. At first, the Central Bank allowed gradual exchange-rate depreciation into early 2009, drawing on its foreign reserves to moderate the pace. This allowed banks and corporations to bolster their foreign exchange positions and brought the ruble in line with the new fundamentals implied by lower oil prices.

Looking forward, the global economy suggests a slow recovery as it will be facing deleveraging, corporate restructuring, and slow job growth. Similarly, Russia cannot expect a rapid return of high oil prices or large capital inflows. We should therefore foresee a fairly modest recovery in Russia combined with a weaker balance of payments than in recent years.

 

This sobering outlook has important implications for the country’s economic strategy. Clearly, the government’s response over the last year has helped preserve stability, which is a prerequisite for the resumption of growth. In fact, since mid-2009, there have been signs of economic stabilization. But large challenges remain. The central goal will be to turn the tentative signs of a rebound into lasting economic growth while preserving the stabilization gains. In this regard, Russia faces delicate trade-offs, as well as room for improving the boost to economic growth, both in the short and longer-term.

Consider first the short-term policy priorities. Ensuring a healthy banking system will be critical for the resumption of credit supply. This underscores the need for a proactive and comprehensive strategy so that banks have the capacity to lend once the economy recovers. Key elements of this strategy should include mandatory stress tests of major banks to obtain better assessments of their viability. These tests should, above all, reveal whether banks have adequate capital or have the ability to raise more capital if needed, either from private sources or from the envisaged bank recapitalization by public funds.

Turning to budget policy, the cautious fiscal policy of the past has left Russia with a low public debt level and sizable buffers, creating “fiscal space” for relaxation. But the size of the relaxation should not be so large as to undermine the quality of public spending. Moreover, the use of the Reserve Fund for budgetary financing is effectively the same as printing money for this purpose, and this could easily threaten the stability of the ruble. The good news is that with a better composition of the fiscal stimulus, Russia could achieve the same boost to domestic demand with lower fiscal deficits. To this end, the fiscal stimulus should enhance social safety nets and infrastructure projects. Also, the government should keep in mind longer-term fiscal policy objectives. Emphasizing self-reversing spending categories now would allow more flexibility in budget policies later. The more convincing the medium-term fiscal plans are, the stronger the fiscal boost will be today.

On the monetary policy side, the Central Bank is facing a balancing act. Inflation is coming down and may undershoot the target this year. But at the same time, the ruble remains vulnerable to swings in oil prices, banks are still liquid, and the fiscal expansion may renew pressures. On balance, however, the gradual relaxation of monetary policy envisaged by the Central Bank would seem appropriate. But there is clearly a need for careful implementation to avoid instability while keeping an eye on capital flows, the exchange rate, and depositor confidence.

Looking beyond the crisis, there is broad consensus on the need for Russia to achieve economic diversification. This would help Russia realize its economic potential and also make the country less vulnerable to the vagaries of financial and commodity markets. Diversifying would not necessarily mean an increase in hi-tech industries but could equally well involve such sectors as light industry and tourism. To achieve real diversification, however, Russia will need significant investment.

The reform agenda is well-known. The most important priorities are a rollback of state control, easing of entry for new firms, reforms of the public sector, strengthening anti-corruption efforts, and gaining accession to the World Trade Organization. While the commentary on Russia’s medium-term policies tends to focus on these structural reforms, we should not lose sight of the macroeconomic foundations for balanced economic growth.

Both medium-term government budget policy and monetary policy will play critical roles in how Russia recovers. As for medium-term budget policies, the central issue is how the country over time would best benefit from its natural resource wealth. One option would be to conservatively aim for a public spending level consistent with the income that the government will derive from petroleum over the long haul. Taking the 2009 budget as the starting point, this would require considerable restraint in government spending once the economy recovers, while at the same time underlining the need to move forward with deep and comprehensive public sector reforms. Other options toward fiscal viability entail large fiscal adjustments. Whichever option is pursued, conservative fiscal policies will preserve Russia’s competitiveness and limit “Dutch disease” by avoiding excessive reliance on natural resources.

The second important condition for achieving sustained growth is to anchor inflation at a low and stable level. This can be achieved through higher domestic savings and investment. To this end, formal inflation targeting must become a goal of the government. The Central Bank has been making progress on the technical preparations for formal inflation targeting. Encouraging recent examples include increased exchange-rate flexibility and more public statements explaining interest rate decisions.

Russia must now concentrate its efforts on how to foster sustained growth. For the near term, the government’s strategy on the banking and budget sides should aim to facilitate early recovery and protect stability. Russia has vast economic potential, and unleashing it will require a deliberate and broad economic strategy that encompasses sound macroeconomic policies and structural reforms.

Odd Per Brekk is senior resident representative at the International Monetary Fund in Moscow.

STUDY GUIDE QUESTIONS

Fixing the Russian Economy

1. What is the Command Economy?

2. What is the Market Economy?

3. Describe the Soviet Economy (3).

4. Describe two major reforms from the Perestroika era.

5. What did Perestroika do to the economy?

6. What was Shock Therapy? (3)

7. What is a voucher?

8. Name the two ways that State industry was privatized in October 1992.

9. What were the results of this privatization?

10. What did most people do with their vouchers?

11. How did the former bosses get a big piece of Russia?

12. Name two costs of the efforts at fixing the Russian economy.

13. How did the Russians survive this period?

14. What happened on August 17th, 1998?

15. What were the results of this crisis?

16. Why do the Russians say that the August Crisis has a silver lining?

Economy Revisited

1. Why is it important that after the crisis the government began to be run by professional political managers?

2. Name 3 positive things that happened as a result of the August 1998 Crisis.

From the CIA World Factbook: Russia:

1. What was the estimated GDP per capita (per person) last year?

2. What is the Russian currency?

3. What are Russia’s primary export commodities?

4. What is the number one import commodity?

From “Managing Russia After the Crisis”

1. Why does Odd Per Brekk suggest that the Russian economy will recover slowly after the 2009 world economic crisis?

From the 2021 OECD Economic Forecast for Russia

1. According to the OECD, is Russia’s GDP expected to grow (+) in 2021?

The Prompt

1. The Women: Read the excerpt below from an economic case study of three women and their lives at the time of the economic transition from a planned to a free-market economy.

State your thoughts on the following:

· What would you have done to survive during this time?

· How do you think this could have been prevented? Do you think it could have been prevented?

· How can Russia become a “Rule of Law” nation?

In the discussion, please post your thoughts on how these women lived and the economic choices that they had to make. One student discovered this economic tool to help her understand the value of the ruble and its purchasing power from 1997 to the present. If you like numbers, please visit: The Inflation Tool

WOMEN & SURVIVAL

Excerpts from a study done by Michael Burawoy, Pavel Krotov, and Tatyana Lytkina; “Domestic Involution: How Women Organize Survival in a North Russian City”. Found in: Victoria E. Bonnell and George W. Breslauer, eds., Russia in the New Century: Stability or Disorder? (Boulder: Westview Press, 2001).

Background Notes: This study was done in a small arctic town called Syktyvkar in the Komi Republic. It focuses on two factories: the city’s garment factory (Red October) and its furniture factory (Polar). The interviews were conducted in 1994-95, and the subjects were re-interviewed in 1998-1999. The portions reproduced here are verbatim from the text.

Marina: For a Roof over One’s Head

The stereotypic Soviet citizen often has been described as a dependent, bereft of initiative, passive in the face of adversity, helpless without state handouts, and jealous of those who enrich themselves. At first sight, Marina looked as though she fit the stereotype. When we interviewed her in 1995, she was still working at Polar, hanging on in the hope of early retirement (at the age of 45), to which she would be entitled based on her hazardous work. But she was denied this because her registered job classification was not designated as hazardous. Still, she didn’t leave even though by 1995, wages had been irregular and falling for two years and most workers had already left. She complained a lot about all the stealing that was taking place at the enterprise, both by workers and by managers. She recently had turned down a job in retail since such work, so she said, was immoral.

At the age of 47, in February 1998, Marina was laid off. She received 1,500 rubles in kind (a divan), half of the six months’ liquidation wages owed her by the law. At the time of our second interview (April 1999), she was still waiting for the remaining 1,500 rubles. When the six months were up, she registered at the Employment Agency in search of work but so far had found none. “Who wants to employ a pensioner,” she says, “when there are so many young people looking for work?” So she depends on monthly unemployment compensation of 375 rubles (75 percent of her regular wages-the amount provided for by law, for the first three months of unemployment) in food, and another 310 rubles in medical assistance for her son, who has chronic asthma and gastritis.

Marina lives with her second husband, who also worked at Polar until wages became irregular. He quit in 1993 for a construction company job, which also failed to meet his expectations, after which he took a job caring for the Municipal Parks. Again, he didn’t last six months before turning to unemployment. That was in 1994. Now he is working for the Ministry of Internal Affairs as a joiner. He receives 300 rubles a month, more or less regularly, but again only in kind–a bus pass, food. The latest insult was 100 rubles worth of so-called Humanitarian Aid, which was, as Marina described it, only fit for their dogs. He used to do odd jobs on dachas, building stairs or bathhouses, but as Marina asked rhetorically, “Who has the money to pay for such work nowadays?”

Marina and her husband have two children, a daughter of 16 and a son of 15. Marina frequently mentions her son’s disability, which often keeps him out of school. She is proud of her daughter’s outstanding academic accomplishments and is hoping that through connections she may somehow go on to the university. These accomplishments are even more amazing, given their deplorable housing conditions. The four of them live in one room of a ramshackle, frame cottage; Marina’s sister, who receives the minimum unemployment benefit of 130 rubles, lives with her young daughter in a smaller, adjoining room. It is difficult to comprehend how the six of them can exist together in this tiny, dark, dank space. They heat their room with a small stove, carry water from an outside well, and use an outhouse.

Marina and her first husband inherited this cottage—originally, a duplex—from its owner. When they divorced, they split it equally. Her ex-husband sold his half, which lies abandoned and demolished; but Marina and her family refuse to evacuate the other half. The land has been granted to a developer who is eager to erect a new apartment building on this prime real estate near the center of town. But Marina won’t budge. By law, her cottage cannot be demolished until all registered there have been re-housed. At the time of the first interview, she had already turned down a modern two-room apartment, holding out for the three rooms to which she was entitled. Since then, she has been offered a two-room apartment in a frame building, and most recently, space in a hostel. As the offerings of the city council have become less attractive, she has become even more determined to hold out for her three-room fantasy, knowing that until she gets her way, she is denying some private developer sumptuous profits.

Their only other source of sustenance is their dacha, bought some 15 years ago, soon after they married. Until 1992, they used to raise chickens and pigs there but stopped because they didn’t have the money for feed. At the first interview, they were still growing vegetables at the dacha; but by the second interview, Marina was complaining that almost everything they grew was stolen. In the realm of dacha production, as in their income and their housing conditions, their life has progressively deteriorated.

Marina considers herself a troublemaker. At Polar, she protested the ubiquitous stealing as well as her job classification. She has waged a protracted war against the municipality for many years, in the vain hope of improving her deplorable housing circumstance. Bereft of material and skill assets inherited from the past, cut off from redistributive networks, she is reliant on the state for the little income she receives. But she is hardly passive.

Tanya: Working the Kinship Network

While Marina plays her citizenship assets—unemployment benefits, sick benefits, and housing rights—for all they are worth, Tanya works on her social assets, and her diverse kin networks, to keep herself afloat.

Tanya is effectively a single mother. At the age of 44, she shares a one-and-a-half-room apartment in a frame building with her daughter (20) and son (23). At the time of the first interview (1995), she still worked at Red October, but only intermittently because of her asthma and weak heart. Her pay, 200 rubles a month, was about half that of her coworkers; and during the previous year, she had seen only 70 rubles a month in cash, having received the rest in kind, at the factory shop. She finally left her job in 1997 because of poor health. She now lives on her disability pension of 400 rubles. She used to do some sewing on the side, but stopped, fearing that the tax inspectors would discover this activity and take away her pension.

Tanya’s first husband died by drowning. She shed no tears over it since he was an inveterate drinker and used to beat her. Her second husband was Bulgarian, a member of Komi’s Bulgarian colony. When communism ended, he returned home to Bulgaria and soon began to send Tanya money. She had even spent six months with him there. At the time of the first interview, she wanted to join him permanently with her daughter; later, she wanted to divorce him. Her life was in Komi, with her two children.

Tanya’s son was wounded while serving in Chechnya. At the time of the first interview, he had recently returned, a changed person from the gentle boy she knew. When his drinking sprees made him abusive and violent, his sister and mother had to leave the apartment. He had been irregularly employed as an electrician, but he rarely saw any wages. Three years later, with tears welling in her eyes, she told us that a year earlier he’d been imprisoned for petty crimes. Tanya’s daughter, in contrast, brought smiles to her face, even though she too had found no permanent work. The daughter was about to deliver a baby. Its father was a policeman with no desire to marry her. They hoped he would at least pay child support.

So how does Tanya get by? Her parents in the village nearby help with food (vegetables and sometimes meat). Her mother can sometimes offer her money since she runs a successful practice in homeopathic medicine. Tanya’s eldest sister also helps her with clothes, and in an emergency, with money. As a social worker she doesn’t earn much, but her husband had a lucrative job as a plumber in a meat processing plant until he had a heart attack and died the previous year, at 48. Tanya’s other sister, also older than her, used to work at Red October but is now employed at a kindergarten. She can’t help Tanya materially, but they have always shared their sorrows and delights.

Since the first interview, Tanya’s relationship with her mother-in-law during her first marriage had taken a new turn. As a grandmother to Tanya’s children, she had always helped in small ways. She was of German descent, and like so many of Komi’s ethnic Germans, she had reconnected with her kin. She was now living with her brother in Berlin but continued to visit Komi, as she was employed in German automobile export. She proposed that Tanya marry her other son, the younger brother of Tanya’s first husband and that together with her daughter they move to Berlin. Tanya smiles whimsically at the thought, concluding once more that her future is here in Syktyvkar, close to her own family.

Tanya is not working. Having inherited little from the past other than her sickness, she gets by on minimal support from the state and assistance from her close-knit family (parents, sisters, and mother-in-law). She is the center and beneficiary of a redistributive kinship network. Resignation mixes with the fantasy of escape, as she contemplates her future; but the security of family ties wins the day.

Natasha: The Two-Earner Household

When we first interviewed Natasha in 1995, both she and her husband were receiving unemployment compensation, at 75 percent of their wages. Today, unemployment compensation is set at the so-called minimum wage of 87 rubles a month, except for those who lose their jobs through liquidation or restructuring. Any job would pay better than that, so we were not surprised to learn at our second interview that Natasha had found herself new employment.

Natasha began her work career in 1970, at the age of 16, in what was then a small furniture shop and later became a modern factory of Polar. She worked there for 24 years. When wages became irregular and work stoppages more frequent in 1994, she quit her job. As a worker in the hazardous lacquer shop, she might have retired if she had stayed another four years; but instead, she opted for unemployment compensation for two months, and then found a temporary job as a painter, through her husband’s sister. When this job ended, she again was left unemployed. Her husband, 43 years old, had worked as a carpenter in a local construction company until pay became irregular, whereupon he too left his job for one in the municipality, thanks again to his sister. Like his wife, he only lasted a few months before returning to the construction industry. Again, pay became so irregular that he left for unemployment, which together with his disability pension came to 500 rubles. At the time of the first interview, they were both on unemployment, bringing in less than 1,000 rubles for a family of four: themselves and their 11-year-old twin daughters. Their income, therefore, was on a par with the poorest of our respondents; but their living conditions, as we shall see, were much better.

Their elder son, age 23, was living in a room in a hostel with his wife and child. He worked as a chauffeur for an enterprise director, which meant that he could use the car for private purposes. Natasha’s daughter, age 21, used to work at Red October and was living with her family in a two-room apartment (inherited from her husband’s parents). Natasha would like to help her daughter, but she can’t even afford to feed, clothe, and buy school supplies for her two younger girls. The only plus in her circumstances is the modern, three-room apartment she received through the municipal queue for large families. They have a plot of land where they grow potatoes, but they have no dacha. They sometimes take the children to Natasha’s parents’ village, where Natasha grows some food, and where her 74-year-old mother helps by knitting clothes for them.

When we returned in July 1997, both husband and wife were employed: she, as a cook in a canteen, and he, with the Municipal Parks. She received a low wage of 350 rubles, with an occasional bonus of 100 or 150 rubles. His wage was much higher, at 800 rubles, but he rarely saw more than 200 rubles, with some of the difference being made up in food. Natasha said they were much better off on unemployment, but when that ran out, they had to find jobs. They were desperately short of cash to pay for their children’s needs.

We interviewed Natasha again in May 1999 and discovered that they were still in the same jobs. She was earning wages and bonuses of between 600 and 800 rubles a month as well as subsidized meals. He was still receiving between 800 and 1,000 rubles, on paper. Wages were usually paid in kind (food and housing maintenance). But in summer there was work on the side, which could bring in 50 rubles a day, plus a meal. On top of this, her husband was receiving a disability pension of 300 rubles a month. They were still having difficulty making ends meet, and Natasha was making plans for her teenage daughters to go to a technical college, where they would learn catering.

In comparison with the three interviewees, Natasha had inherited more from the old regime. She had an extensive network of kin in town and country as well as a modern, three-room apartment. At the time of the second interview, Natasha’s son was trying to exchange the three-room apartment for a two-room apartment for his parents and a separate, single-room apartment for his family. He hoped to then combine this with his hostel room in order to obtain a two-room apartment. But the plan came to naught. Even a seemingly nonfungible asset such as an apartment can be traded in, and the proceeds distributed among family members. Although she appears to be better off than Marina, Tanya, and Sveta, Natasha and her husband struggle daily to meet their family’s basic needs.


Task: Writing a research-based argument paper

Topic: Social media addiction requires/does not require medical treatment.
Write an argumentative essay in which you agree or disagree with this claim.

Instructions:
1. OVERVIEW

This assignment asks you to deliver the outcome of your research in the form of a research-based argument paper. Having engaged in the various steps of the research process, you will now present the final product. The goal of this research-based argument paper is to forward the thesis and defend it by developing an articulate and persuasive argument. You will need to apply research and critical thinking skills in order to support your claims, and persuade your reader that you know your subject and can explain and defend your position. The paper should be 750-1000 words long and backed by evidence collected from 2 external sources.

2. REQUIREMENTS

Word count: 750-1000 words (abstract, cover page and references are not included in the word count)

– Sources: use 2 reliable sources (you can use the sources you used earlier)

– Sources should be relevant and used appropriately (i.e. they should be used for specific ideas that need support and not for general knowledge/facts that can be found in many sources).
– Use different methods for integrating sources (quotation, paraphrase, summary).

– Use APA Style for formatting, citations and references.

3. COMPONENTS OF YOUR RESEARCH PAPER

Introduction (1 paragraph)

Your introduction should grab the reader’s attention and explain what the rest of the paper will be about. To get your audience’s attention, use a method of introduction. You should also provide background information on the topic. Background information serves to establish the context of your discussion, explain the nature of the problem at hand, and possibly briefly summarize how the problem has been addressed in previous studies.

To explain what your paper will be about, you must include a thesis statement. The thesis is an essential part of your paper as it will guide you during the write-up – all the information you provide in the following paragraphs will support this thesis statement. A thesis should be a clear and strong formulation of the point you are making. In other words, it should be clear from the start what position you will be arguing for. To be effective, a thesis should not be too vague or too specific, and should also provide an idea of the scope of your paper.

Supporting Paragraphs (4 paragraphs)

Each paragraph in the body of your essay should be devoted to one idea. Some paragraphs will support the point made in your thesis; one will present an opposing viewpoint and its refutation.

Throughout the development of your argument, you need to offer evidence to support your claims. This evidence should be incorporated using quotations, paraphrases and summaries. Quotations should be used sparingly. In any case, the evidence provided should be properly integrated into your writing. In other words, the significance of the evidence to your writing should be made clear; your audience should understand how it supports your argument.

Counter-argument and refutation paragraph

To make a strong and credible argument, it is essential to consider differing viewpoints. Being able to identify weaknesses in these viewpoints shows why your side of the argument is the strongest one. Therefore, when developing your argument, you should include a counterargument (i.e. a claim that disagrees with the claim presented in your thesis) and a refutation (i.e. evidence that disagrees with the counterargument).

Make sure you use transitional words/phrases and sentences, and other devices to add coherence to your writing.

Conclusion (1 paragraph)

The goal of your conclusion is to wrap up your paper and provide a sense of closure to the reader. It should include two elements. On the one hand, conclusions are linked to introductions. You should restate your thesis and synthesize your argument. On the other hand, you should conclude with another method or combination of methods (e.g. final thought, prediction, recommendation, etc.). Conclusions do not include new ideas, but you can discuss the broader implications of your argument.

NOTES:

·  Write your essay on a NEW WORD DOCUMENT

·  Write in full APA style (Cover page, Abstract, Main Body and References)

Follow APA 7 Guidelines.

·  This is an individual assignment. Each student is going to submit the assignment himself/herself.

·  If the essay uses another pattern of development (cause/effect, compare/contrast, process, or some other pattern) instead of argumentation, 20% will be deducted from the total grade.

·  If the essay is off topic, 20% will be deducted from the total grade. After receiving the wrong-submission e-mail, the student has to write a new version on the prompt given by the instructor.

Introduction to SPSS for Quantitative Analysis

Please see attachment(s) 

Understanding the data sampling procedure and the description of the data is critical to accurately interpreting the results of a study. In this assignment, you will practice describing data from an SPSS data set.

General Requirements:

Use the following information to ensure successful completion of the assignment:

· Refer to the Topic 1 assignment, “SPSS: Download and Install.” As a result of completing that assignment, you should have downloaded and installed SPSS on your computer. You will use the SPSS software in this assignment.

· Refer to the SPSS introductory video found at https://youtu.be/_zFBUfZEBWQ.

· Access the document, “Introduction to Statistical Analysis Using IBM SPSS Statistics, Student Guide” to complete the assignment. It is attached to this assignment.

· Download the file “Census.sav” and open it with SPSS. Use the data to complete the assignment.

· Refer to the document, “Example: Introduction to SPSS for Quantitative Analysis” attached to this assignment.

· Doctoral learners are required to use APA style for their writing assignments. The APA Style Guide is located in the Student Success Center.

· You are not required to submit this assignment to LopesWrite.

Directions:

Open the attached document, “Introduction to Statistical Analysis Using IBM SPSS Statistics, Student Guide.”

Carefully read the following lessons in the Guide:

· Lesson 0: Course Introduction

· Lesson 1: Introduction to Statistical Analysis

· Lesson 2: Understanding Data Distributions

· Lesson 3: Data Distributions for Categorical Variables

As you read, work through all the tasks in the sections labeled, “Procedure: …,” and complete the “Apply Your Knowledge” activities within each lesson. These activities are designed to help you learn to navigate the SPSS program and build your understanding of statistical analysis. You will not submit your work on these activities; they are for your personal practice in learning SPSS. Note that not every lesson has these components.

Complete the tasks in section 3.9 Learning Activity in Lesson 3: Data Distributions for Categorical Variables by doing the following:

· Locate the data set “Census.sav” and open it with SPSS.

· Run the Frequencies procedure as directed in question 1.

· Answer questions 1-3 in the activity based on your observations of the SPSS output.

· Type your answers into a Word document.

· Copy and paste the full SPSS output including any supporting graphs and tables directly from SPSS into the Word document with your answers to the questions for submission to the instructor. The appropriate tables and charts must be copied from the SPSS output into the Word document under each related question to support the written answer. The SPSS output must be included in the submission with the problem set answers in order to receive full credit for the assignment. Note: If your output is large, you may need to reduce the size by compressing it. For instructions on how to compress a file, please click here.

Following the pasted image of the SPSS output in the same Word document, write a reflection (250-500 words) describing the following:

1. The challenges you faced in downloading and using SPSS to complete the 3.9 Learning Activity.

2. How you overcame those challenges to complete the assignment.

3. Your overall perceptions of using SPSS for doing quantitative analysis.

4. How you will build on your introductory knowledge of SPSS as you begin to define your dissertation topic and methodological approach.

Submit to the instructor the single Word document with the following items:

1. Your answers for 3.9 Learning Activity, questions 1-3.

2. The SPSS Output for 3.9 Learning Activity that corresponds to the questions.

3. Your reflection on the use of SPSS.

Download link:

https://halo.gcu.edu/resource/6f25e4a0-7e18-4ed6-bc52-367bd8f5e9bb

College of Doctoral Studies

RES-831 Example: Introduction to SPSS for Quantitative Analysis

This is for example purposes only. The questions below have been modified so they do not exactly match what is in the actual assignment. The number of variables and the data are not identical to your data set for Census.sav; this example is for illustrative purposes only. Note that appropriate tables and charts are copied from the SPSS output into the Word document under each question to support the written answer. This is how statistical results are reported and then supported with the data in the results sections of dissertations and journal articles.

Learning Activity 3.9 – Example Solution
1. Run the Frequencies procedure on the following variable: race. What is the scale of measurement for this variable? Request appropriate summary statistics and charts.

Answer: The variable “race” is categorical, and the summary table below provides the frequency for each race, together with the associated pie chart for the categories.

RACE OF RESPONDENT

                 Frequency   Percent   Valid Percent   Cumulative Percent
Valid   WHITE         1559      77.1            77.1                 77.1
        BLACK          281      13.9            13.9                 91.0
        OTHER          183       9.0             9.0                100.0
        Total         2023     100.0           100.0

2. For the variable “race” is it appropriate to use the median? What conclusions can you draw about the distributions of this variable?

Answer: This variable is categorical, so the median is not a good summary statistic to use. This variable is not normally distributed.
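
As a rough illustration of the answer above, the percent and cumulative percent columns of the race table can be recomputed from the raw counts, and the mode (the most frequent category) can be read off as a summary that does suit a categorical variable. This is only a sketch in Python, not SPSS output; the function names `frequency_table` and `modal_category` are made up for this example, and the counts are the ones shown in the example table.

```python
# Counts copied from the example "race" table above.
counts = {"WHITE": 1559, "BLACK": 281, "OTHER": 183}


def frequency_table(counts):
    """Return rows of (category, frequency, percent, cumulative percent)."""
    total = sum(counts.values())
    rows = []
    running = 0
    for category, freq in counts.items():
        running += freq
        percent = round(100 * freq / total, 1)
        cumulative = round(100 * running / total, 1)
        rows.append((category, freq, percent, cumulative))
    return rows


def modal_category(counts):
    """Return the category with the highest frequency (the mode)."""
    return max(counts, key=counts.get)


table = frequency_table(counts)
for category, freq, percent, cumulative in table:
    print(f"{category:<6} {freq:>5} {percent:>6} {cumulative:>6}")
print("Mode:", modal_category(counts))
```

The cumulative column depends on the order in which the categories are listed, which is arbitrary for a nominal variable; that is one way to see why a median, which relies on ordering, is not meaningful here.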

3. What percent of respondents were living with both their mother and father when they were 16 years old?

Answer: 68.0% of respondents were living with both their mother and father when they were 16 years old, as highlighted in the table and bar chart below.

LIVING WITH PARENTS WHEN 16 YRS OLD

                              Frequency   Percent   Valid Percent   Cumulative Percent
Valid   OTHER                        67       3.3             3.3                  3.3
        MOTHER & FATHER            1375      68.0            68.0                 71.3
        FATHER & STPMOTHER           25       1.2             1.2                 72.5
        MOTHER & STPFATHER          119       5.9             5.9                 78.4
        FATHER                       50       2.5             2.5                 80.9
        MOTHER                      300      14.8            14.8                 95.7
        MALE RELATIVE                12        .6              .6                 96.3
        FEMALE RELATIVE              36       1.8             1.8                 98.1
        M AND F RELATIVES            39       1.9             1.9                100.0
        Total                      2023     100.0           100.0

Copy/Paste and Insert the Full SPSS Output for this assignment here. See Example Below:

This is what the output in SPSS looks like:

Click on the Edit tab, and then “Select All”. After you select all, click on the “Copy” button. This selects your entire output and copies it to your computer clipboard. Then open a new Word document and “Paste” the output into it. You may have to use “Paste Special” depending on your version of Word. Scroll down to see what the output will look like in Word.

Output in Word using the following sequence

1. In SPSS: After you run your analysis and the output comes up, click Edit>Select All>Copy.

2. Then in Word, in a new document, click Paste and your output will show up below.

GET
  FILE='C:\Users\Cynthia\Downloads\Drinks (2).sav'.
DATASET NAME DataSet1 WINDOW=FRONT.
FREQUENCIES VARIABLES=price cost calories sodium alcohol
  /STATISTICS=STDDEV VARIANCE MINIMUM MAXIMUM MEAN MEDIAN MODE
  /HISTOGRAM NORMAL
  /ORDER=ANALYSIS.

Frequencies

Notes

Output Created                                             19-JAN-2015 19:07:34
Comments
Input                    Data                              C:\Users\Cynthia\Downloads\Drinks (2).sav
                         Active Dataset                    DataSet1
                         File Label                        SPSS/PC+
                         Filter                            <none>
                         Weight                            <none>
                         Split File                        <none>
                         N of Rows in Working Data File    35
Missing Value Handling   Definition of Missing             User-defined missing values are treated as missing.
                         Cases Used                        Statistics are based on all cases with valid data.
Syntax                                                     FREQUENCIES VARIABLES=price cost calories sodium alcohol
                                                             /STATISTICS=STDDEV VARIANCE MINIMUM MAXIMUM MEAN MEDIAN MODE
                                                             /HISTOGRAM NORMAL
                                                             /ORDER=ANALYSIS.
Resources                Processor Time                    00:00:02.12
                         Elapsed Time                      00:00:01.92

[DataSet1] C:\Users\Cynthia\Downloads\Drinks (2).sav

Statistics

Columns, in order: Price per 6-pack (price), Cost per 12 Fluid Ounces (cost), Calories per 12 Fluid Ounces (calories), Sodium per 12 Fluid Ounces in mg (sodium), Alcohol by Volume (in %) (alcohol).

                     price      cost  calories    sodium   alcohol
N Valid                 35        35        35        35        35
N Missing                0         0         0         0         0
Mean                3.0274     .5057    139.77     14.66    4.5771
Median              2.6500     .4400    147.00     15.00    4.7000
Mode                  2.59       .43       144        7a      4.70
Std. Deviation     1.12343    .18732    24.447     6.145    .60298
Variance             1.262      .035   597.652    37.761      .364
Minimum               1.59       .27        68         6      2.30
Maximum               7.19      1.20       175        27      5.50

a. Multiple modes exist. The smallest value is shown


Frequency Table

Price per 6-pack

Valid   Frequency  Percent  Valid Percent  Cumulative Percent
1.59            1      2.9            2.9                 2.9
1.69            1      2.9            2.9                 5.7
1.79            2      5.7            5.7                11.4
2.15            1      2.9            2.9                14.3
2.29            2      5.7            5.7                20.0
2.39            1      2.9            2.9                22.9
2.49            2      5.7            5.7                28.6
2.55            1      2.9            2.9                31.4
2.59            5     14.3           14.3                45.7
2.63            1      2.9            2.9                48.6
2.65            2      5.7            5.7                54.3
2.73            1      2.9            2.9                57.1
2.75            1      2.9            2.9                60.0
2.79            1      2.9            2.9                62.9
2.89            1      2.9            2.9                65.7
2.99            2      5.7            5.7                71.4
3.15            1      2.9            2.9                74.3
3.35            1      2.9            2.9                77.1
3.65            1      2.9            2.9                80.0
4.22            1      2.9            2.9                82.9
4.39            1      2.9            2.9                85.7
4.55            1      2.9            2.9                88.6
4.59            2      5.7            5.7                94.3
4.75            1      2.9            2.9                97.1
7.19            1      2.9            2.9               100.0
Total          35    100.0          100.0

Cost per 12 Fluid Ounces

Valid   Frequency  Percent  Valid Percent  Cumulative Percent
.27             1      2.9            2.9                 2.9
.28             1      2.9            2.9                 5.7
.30             2      5.7            5.7                11.4
.36             1      2.9            2.9                14.3
.38             2      5.7            5.7                20.0
.40             1      2.9            2.9                22.9
.42             2      5.7            5.7                28.6
.43             6     17.1           17.1                45.7
.44             3      8.6            8.6                54.3
.46             2      5.7            5.7                60.0
.47             1      2.9            2.9                62.9
.48             1      2.9            2.9                65.7
.50             2      5.7            5.7                71.4
.53             1      2.9            2.9                74.3
.56             1      2.9            2.9                77.1
.61             1      2.9            2.9                80.0
.70             1      2.9            2.9                82.9
.73             1      2.9            2.9                85.7
.76             1      2.9            2.9                88.6
.77             2      5.7            5.7                94.3
.79             1      2.9            2.9                97.1
1.20            1      2.9            2.9               100.0
Total          35    100.0          100.0

Calories per 12 Fluid Ounces

Valid   Frequency  Percent  Valid Percent  Cumulative Percent
68              1      2.9            2.9                 2.9
72              1      2.9            2.9                 5.7
97              1      2.9            2.9                 8.6
99              1      2.9            2.9                11.4
102             1      2.9            2.9                14.3
113             1      2.9            2.9                17.1
135             1      2.9            2.9                20.0
136             1      2.9            2.9                22.9
140             1      2.9            2.9                25.7
144             5     14.3           14.3                40.0
145             3      8.6            8.6                48.6
147             2      5.7            5.7                54.3
149             4     11.4           11.4                65.7
150             1      2.9            2.9                68.6
151             1      2.9            2.9                71.4
152             2      5.7            5.7                77.1
153             1      2.9            2.9                80.0
154             2      5.7            5.7                85.7
155             1      2.9            2.9                88.6
157             1      2.9            2.9                91.4
162             1      2.9            2.9                94.3
170             1      2.9            2.9                97.1
175             1      2.9            2.9               100.0
Total          35    100.0          100.0

Sodium per 12 Fluid Ounces in mg

Valid   Frequency  Percent  Valid Percent  Cumulative Percent
6               2      5.7            5.7                 5.7
7               4     11.4           11.4                17.1
8               3      8.6            8.6                25.7
10              2      5.7            5.7                31.4
11              2      5.7            5.7                37.1
13              2      5.7            5.7                42.9
14              1      2.9            2.9                45.7
15              4     11.4           11.4                57.1
17              4     11.4           11.4                68.6
18              2      5.7            5.7                74.3
19              3      8.6            8.6                82.9
21              1      2.9            2.9                85.7
23              1      2.9            2.9                88.6
24              2      5.7            5.7                94.3
27              2      5.7            5.7               100.0
Total          35    100.0          100.0
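
The summary statistics that SPSS reports for sodium can be reproduced from the sodium frequency table, which also shows why the mode carries a footnote. This is a minimal Python sketch using the standard `statistics` module rather than SPSS; the variable names are illustrative, and the value/frequency pairs are copied from the table.

```python
import statistics

# (value, frequency) pairs copied from the sodium frequency table.
sodium_counts = [
    (6, 2), (7, 4), (8, 3), (10, 2), (11, 2), (13, 2), (14, 1), (15, 4),
    (17, 4), (18, 2), (19, 3), (21, 1), (23, 1), (24, 2), (27, 2),
]

# Expand the table back into the 35 individual observations.
sodium = [value for value, freq in sodium_counts for _ in range(freq)]

mean = statistics.mean(sodium)
median = statistics.median(sodium)
modes = statistics.multimode(sodium)    # every value tied for most frequent
variance = statistics.variance(sodium)  # sample variance (n - 1 denominator)
std_dev = statistics.stdev(sodium)      # square root of the sample variance

print(f"N = {len(sodium)}")
print(f"Mean = {mean:.2f}, Median = {median}")
print(f"Modes = {modes}, smallest = {min(modes)}")
print(f"Std. Deviation = {std_dev:.3f}, Variance = {variance:.3f}")
```

Three values (7, 15, and 17) each occur four times, so there is no single mode; SPSS reports the smallest of them, which is the 7 marked with footnote a in the Statistics table.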


Alcohol by Volume (in %)

Valid   Frequency  Percent  Valid Percent  Cumulative Percent
2.30            1      2.9            2.9                 2.9
2.90            1      2.9            2.9                 5.7
3.70            1      2.9            2.9                 8.6
4.10            1      2.9            2.9                11.4
4.20            2      5.7            5.7                17.1
4.30            1      2.9            2.9                20.0
4.40            1      2.9            2.9                22.9
4.50            2      5.7            5.7                28.6
4.60            4     11.4           11.4                40.0
4.70            9     25.7           25.7                65.7
4.90            4     11.4           11.4                77.1
5.00            5     14.3           14.3                91.4
5.10            1      2.9            2.9                94.3
5.20            1      2.9            2.9                97.1
5.50            1      2.9            2.9               100.0
Total          35    100.0          100.0


Histogram

© 2020. Grand Canyon University. All Rights Reserved.

Introduction to Statistical Analysis Using IBM SPSS Statistics

Student Guide

Course Code: 0G517

ERC 1.0

Introduction to Statistical Analysis Using IBM
SPSS Statistics

Licensed Materials – Property of IBM

© Copyright IBM Corp. 2010

0G517

Published October 2010

US Government Users Restricted Rights – Use,
duplication or disclosure restricted by GSA ADP
Schedule Contract with IBM Corp.

IBM, the IBM logo and ibm.com are trademarks
of International Business Machines Corp.,
registered in many jurisdictions worldwide.

SPSS, SamplePower, and PASW are trademarks
of SPSS Inc., an IBM Company, registered in
many jurisdictions worldwide.

Other product and service names might be
trademarks of IBM or other companies.

This guide contains proprietary information which
is protected by copyright. No part of this
document may be photocopied, reproduced, or
translated into another language without a legal
license agreement from IBM Corporation.

Any references in this information to non-IBM
Web sites are provided for convenience only and
do not in any manner serve as an endorsement
of those Web sites. The materials at those Web
sites are not part of the materials for this IBM
product and use of those Web sites is at your
own risk.

Table of Contents

LESSON 0: COURSE INTRODUCTION

0.1 INTRODUCTION
0.2 COURSE OBJECTIVES
0.3 ABOUT SPSS
0.4 SUPPORTING MATERIALS
0.5 COURSE ASSUMPTIONS

LESSON 1: INTRODUCTION TO STATISTICAL ANALYSIS

1.1 OBJECTIVES
1.2 INTRODUCTION
1.3 BASIC STEPS OF THE RESEARCH PROCESS
1.4 POPULATIONS AND SAMPLES
1.5 RESEARCH DESIGN
1.6 INDEPENDENT AND DEPENDENT VARIABLES
1.7 NOTE ABOUT DEFAULT STARTUP FOLDER AND VARIABLE DISPLAY IN DIALOG BOXES
1.8 LESSON SUMMARY
1.9 LEARNING ACTIVITY

LESSON 2: UNDERSTANDING DATA DISTRIBUTIONS – THEORY

2.1 OBJECTIVES
INTRODUCTION
2.2 LEVELS OF MEASUREMENT AND STATISTICAL METHODS
2.3 MEASURES OF CENTRAL TENDENCY AND DISPERSION
2.4 NORMAL DISTRIBUTIONS
2.5 STANDARDIZED (Z-) SCORES
2.6 REQUESTING STANDARDIZED (Z-) SCORES
2.7 STANDARDIZED (Z-) SCORES OUTPUT
2.8 PROCEDURE: DESCRIPTIVES FOR STANDARDIZED (Z-) SCORES
2.9 DEMONSTRATION: DESCRIPTIVES FOR Z-SCORES
2.10 LESSON SUMMARY
2.11 LEARNING ACTIVITY

LESSON 3: DATA DISTRIBUTIONS FOR CATEGORICAL VARIABLES

3.1 OBJECTIVES
3.2 INTRODUCTION
3.3 USING FREQUENCIES TO SUMMARIZE NOMINAL AND ORDINAL VARIABLES
3.4 REQUESTING FREQUENCIES
3.5 FREQUENCIES OUTPUT
3.6 PROCEDURE: FREQUENCIES
3.7 DEMONSTRATION: FREQUENCIES
3.8 LESSON SUMMARY
3.9 LEARNING ACTIVITY

LESSON 4: DATA DISTRIBUTIONS FOR SCALE VARIABLES

4.1 OBJECTIVES
4.2 INTRODUCTION
4.3 SUMMARIZING SCALE VARIABLES USING FREQUENCIES
4.4 REQUESTING FREQUENCIES
4.5 FREQUENCIES OUTPUT
4.6 PROCEDURE: FREQUENCIES
4.7 DEMONSTRATION: FREQUENCIES
4.8 SUMMARIZING SCALE VARIABLES USING DESCRIPTIVES
4.9 REQUESTING DESCRIPTIVES
4.10 DESCRIPTIVES OUTPUT
4.11 PROCEDURE: DESCRIPTIVES
4.12 DEMONSTRATION: DESCRIPTIVES
4.13 SUMMARIZING SCALE VARIABLES USING THE EXPLORE PROCEDURE
4.14 REQUESTING EXPLORE
4.15 PROCEDURE: EXPLORE
4.16 DEMONSTRATION: EXPLORE
4.17 LESSON SUMMARY
4.18 LEARNING ACTIVITY

LESSON 5: MAKING INFERENCES ABOUT POPULATIONS FROM SAMPLES

5.1 OBJECTIVES
5.2 INTRODUCTION
5.3 BASICS OF MAKING INFERENCES ABOUT POPULATIONS FROM SAMPLES
5.4 INFLUENCE OF SAMPLE SIZE
5.5 HYPOTHESIS TESTING
5.6 THE NATURE OF PROBABILITY
5.7 TYPES OF STATISTICAL ERRORS
5.8 STATISTICAL SIGNIFICANCE AND PRACTICAL IMPORTANCE
5.9 LESSON SUMMARY
5.10 LEARNING ACTIVITY

LESSON 6: RELATIONSHIPS BETWEEN CATEGORICAL VARIABLES

6.1 OBJECTIVES
6.2 INTRODUCTION
6.3 CROSSTABS
6.4 CROSSTABS ASSUMPTIONS
6.5 REQUESTING CROSSTABS
6.6 CROSSTABS OUTPUT
6.7 PROCEDURE: CROSSTABS
6.8 EXAMPLE: CROSSTABS
6.9 CHI-SQUARE TEST
6.10 REQUESTING THE CHI-SQUARE TEST
6.11 CHI-SQUARE OUTPUT
6.12 PROCEDURE: CHI-SQUARE TEST
6.13 EXAMPLE: CHI-SQUARE TEST
6.14 CLUSTERED BAR CHART
6.15 REQUESTING A CLUSTERED BAR CHART WITH CHART BUILDER
6.16 CLUSTERED BAR CHART FROM CHART BUILDER OUTPUT
6.17 PROCEDURE: CLUSTERED BAR CHART WITH CHART BUILDER
6.18 EXAMPLE: CLUSTERED BAR CHART WITH CHART BUILDER
6.19 ADDING A CONTROL VARIABLE
6.20 REQUESTING A CONTROL VARIABLE
6.21 CONTROL VARIABLE OUTPUT
6.22 PROCEDURE: ADDING A CONTROL VARIABLE
6.23 EXAMPLE: ADDING A CONTROL VARIABLE
6.24 EXTENSIONS: BEYOND CROSSTABS
6.25 ASSOCIATION MEASURES
6.26 LESSON SUMMARY
6.27 LEARNING ACTIVITY

LESSON 7: THE INDEPENDENT-SAMPLES T TEST

7.1 OBJECTIVES
7.2 INTRODUCTION
7.3 THE INDEPENDENT-SAMPLES T TEST
7.4 INDEPENDENT-SAMPLES T TEST ASSUMPTIONS
7.5 REQUESTING THE INDEPENDENT-SAMPLES T TEST
7.6 INDEPENDENT-SAMPLES T TEST OUTPUT
7.7 PROCEDURE: INDEPENDENT-SAMPLES T TEST
7.8 DEMONSTRATION: INDEPENDENT-SAMPLES T TEST
7.9 ERROR BAR CHART
7.10 REQUESTING AN ERROR BAR CHART WITH CHART BUILDER
7.11 ERROR BAR CHART OUTPUT
7.12 DEMONSTRATION: ERROR BAR CHART WITH CHART BUILDER
7.13 LESSON SUMMARY
7.14 LEARNING ACTIVITY

LESSON 8: THE PAIRED-SAMPLES T TEST

8.1 OBJECTIVES
8.2 INTRODUCTION
8.3 THE PAIRED-SAMPLES T TEST
8.4 ASSUMPTIONS FOR THE PAIRED-SAMPLES T TEST
8.5 REQUESTING A PAIRED-SAMPLES T TEST
8.6 PAIRED-SAMPLES T TEST OUTPUT
8.7 PROCEDURE: PAIRED-SAMPLES T TEST
8.8 DEMONSTRATION: PAIRED-SAMPLES T TEST
8.9 LESSON SUMMARY
8.10 LEARNING ACTIVITY

LESSON 9: ONE-WAY ANOVA

9.1 OBJECTIVES
9.2 INTRODUCTION
9.3 ONE-WAY ANOVA
9.4 ASSUMPTIONS OF ONE-WAY ANOVA
9.5 REQUESTING ONE-WAY ANOVA
9.6 ONE-WAY ANOVA OUTPUT
9.7 PROCEDURE: ONE-WAY ANOVA
9.8 DEMONSTRATION: ONE-WAY ANOVA
9.9 POST HOC TESTS WITH A ONE-WAY ANOVA
9.10 REQUESTING POST HOC TESTS WITH A ONE-WAY ANOVA
9.11 POST HOC TESTS OUTPUT
9.12 PROCEDURE: POST HOC TESTS WITH A ONE-WAY ANOVA
9.13 DEMONSTRATION: POST HOC TESTS WITH A ONE-WAY ANOVA
9.14 ERROR BAR CHART WITH CHART BUILDER
9.15 REQUESTING AN ERROR BAR CHART WITH CHART BUILDER
9.16 ERROR BAR CHART OUTPUT
9.17 PROCEDURE: ERROR BAR CHART WITH CHART BUILDER
9.18 DEMONSTRATION: ERROR BAR CHART WITH CHART BUILDER
9.19 LESSON SUMMARY
9.20 LEARNING ACTIVITY

LESSON 10: BIVARIATE PLOTS AND CORRELATIONS FOR SCALE VARIABLES

10.1 OBJECTIVES
10.2 INTRODUCTION
10.3 SCATTERPLOTS
10.4 REQUESTING A SCATTERPLOT
10.5 SCATTERPLOT OUTPUT
10.6 PROCEDURE: SCATTERPLOT
10.7 DEMONSTRATION: SCATTERPLOT
10.8 ADDING A BEST FIT STRAIGHT LINE TO THE SCATTERPLOT
10.9 PEARSON CORRELATION COEFFICIENT
10.10 REQUESTING A PEARSON CORRELATION COEFFICIENT
10.11 BIVARIATE CORRELATION OUTPUT
10.12 PROCEDURE: PEARSON CORRELATION WITH BIVARIATE CORRELATIONS
10.13 DEMONSTRATION: PEARSON CORRELATION WITH BIVARIATE CORRELATIONS
10.14 LESSON SUMMARY
10.15 LEARNING ACTIVITY

LESSON 11: REGRESSION ANALYSIS

11.1 OBJECTIVES
11.2 INTRODUCTION
11.3 SIMPLE LINEAR REGRESSION
11.4 SIMPLE LINEAR REGRESSION ASSUMPTIONS
11.5 REQUESTING SIMPLE LINEAR REGRESSION
11.6 SIMPLE LINEAR REGRESSION OUTPUT
11.7 PROCEDURE: SIMPLE LINEAR REGRESSION
11.8 DEMONSTRATION: SIMPLE LINEAR REGRESSION
11.9 MULTIPLE REGRESSION
11.10 MULTIPLE LINEAR REGRESSION ASSUMPTIONS
11.11 REQUESTING MULTIPLE LINEAR REGRESSION
11.12 MULTIPLE LINEAR REGRESSION OUTPUT
11.13 PROCEDURE: MULTIPLE LINEAR REGRESSION
11.14 DEMONSTRATION: MULTIPLE LINEAR REGRESSION
11.15 LESSON SUMMARY
11.16 LEARNING ACTIVITY

LESSON 12: NONPARAMETRIC TESTS

12.1 OBJECTIVES
12.2 INTRODUCTION
12.3 NONPARAMETRIC ANALYSES
12.4 THE INDEPENDENT SAMPLES NONPARAMETRIC ANALYSIS
12.5 REQUESTING AN INDEPENDENT SAMPLES NONPARAMETRIC ANALYSIS
12.6 INDEPENDENT SAMPLES NONPARAMETRIC TESTS OUTPUT
12.7 PROCEDURE: INDEPENDENT SAMPLES NONPARAMETRIC TESTS
12.8 DEMONSTRATION: INDEPENDENT SAMPLES NONPARAMETRIC TESTS
12.9 THE RELATED SAMPLES NONPARAMETRIC ANALYSIS
12.10 REQUESTING A RELATED SAMPLES NONPARAMETRIC ANALYSIS
12.11 RELATED SAMPLES NONPARAMETRIC TESTS OUTPUT
12.12 PROCEDURE: RELATED SAMPLES NONPARAMETRIC TESTS
12.13 DEMONSTRATION: RELATED SAMPLES NONPARAMETRIC TESTS
12.14 LESSON SUMMARY
12.15 LEARNING ACTIVITY

LESSON 13: COURSE SUMMARY

13.1 COURSE OBJECTIVES REVIEW
13.2 COURSE REVIEW: DISCUSSION QUESTIONS
13.3 NEXT STEPS

APPENDIX A: INTRODUCTION TO STATISTICAL ANALYSIS REFERENCES

1.1 INTRODUCTION
1.2 REFERENCES

Lesson 0: Course Introduction
0.1 Introduction
The focus of this two-day course is an introduction to the statistical component of IBM®
SPSS® Statistics. This is an application-oriented course and the approach is practical. You’ll take a
look at several statistical techniques and discuss situations in which you would use each technique,
the assumptions made by each method, how to set up the analysis using PASW® Statistics, as well
as how to interpret the results. This includes a broad range of techniques for exploring and
summarizing data, as well as investigating and testing underlying relationships. You will gain an
understanding of when and why to use these various techniques as well as how to apply them with
confidence, interpret their output, and graphically display the results.

0.2 Course Objectives
After completing this course students will be able to:

• Perform basic statistical analysis using selected statistical techniques with PASW Statistics

To support the achievement of this primary objective, students will also be able to:

• Explain the basic elements of quantitative research and issues that should be considered in
data analysis

• Determine the level of measurement of variables and obtain appropriate summary statistics
based on the level of measurement

• Run the Frequencies procedure to obtain appropriate summary statistics for categorical
variables

• Request and interpret appropriate summary statistics for scale variables
• Explain how to make inferences about populations from samples
• Perform crosstab analysis on categorical variables
• Perform a statistical test to determine whether there is a statistically significant relationship
between categorical variables
• Perform a statistical test to determine whether there is a statistically significant difference
between two groups on a scale variable
• Perform a statistical test to determine whether there is a statistically significant difference
between the means of two scale variables
• Perform a statistical test to determine whether there is a statistically significant difference
among three or more groups on a scale dependent variable
• Perform a statistical test to determine whether two scale variables are correlated (related)
• Perform linear regression to determine whether one or more variables can significantly
predict or explain a dependent variable
• Perform non-parametric tests on data that don’t meet the assumptions for standard statistical
tests

0.3 About SPSS
SPSS® Inc., an IBM® Company is a leading global provider of predictive analytics software and
solutions. The Company’s complete portfolio of products – data collection, statistics, modeling and
deployment – captures people’s attitudes and opinions, predicts outcomes of future customer
interactions, and then acts on these insights by embedding analytics into business processes. SPSS
solutions address interconnected business objectives across an entire organization by focusing on
the convergence of analytics, IT architecture and business process. Commercial, government and
academic customers worldwide rely on SPSS technology as a competitive advantage in attracting,
retaining and growing customers, while reducing fraud and mitigating risk. SPSS was acquired by
IBM® in October 2009. For more information, visit http://www.spss.com.

0.4 Supporting Materials
We use several datasets in the course because no one data file contains all the types of variables
and relationships between them that are ideal for every technique we discuss. As much as possible,
we try to minimize the need within one lesson to switch between datasets, but the first priority is to
use appropriate data for each method.

The following data files are used in this course:

• Bank.sav
• Drinks.sav
• Census.sav
• Employee data.sav
• SPSS_CUST.sav

0.5 Course Assumptions
General computer literacy. Completion of the “Introduction to PASW Statistics” and/or “Data
Management and Manipulation with PASW Statistics” courses, or experience with PASW Statistics,
including familiarity with opening, defining, and saving data files and manipulating and saving output.
Basic statistical knowledge or at least one introductory-level course in statistics is recommended.

Note about Default Startup Folder and Variable Display in Dialog Boxes
In this course, all of the files used for the demonstrations and exercises are located in the folder
c:\Train\Statistics_IntroAnalysis.

Note: If the course files are stored in a different location, your instructor will give you instructions
specific to that location.

Either variable names or longer variable labels will appear in list boxes in dialog boxes. Additionally,
variables in list boxes can be ordered alphabetically or by their position in the file. In this course, we
will display variable names in alphabetical order within list boxes.

1) Select Edit…Options
2) Select the General tab (if necessary)
3) Select Display names in the Variable Lists group on the General tab
4) Select Alphabetical
5) Select OK and OK in the information box to confirm the change

Lesson 1: Introduction to Statistical Analysis
1.1 Objectives
After completing this lesson students will be able to:

• Explain the basic elements of quantitative research and issues that should be considered in
data analysis

To support the achievement of the primary objective, students will also be able to:

• Explain the basic steps of the research process
• Explain differences between populations and samples
• Explain differences between experimental and non-experimental research designs
• Explain differences between independent and dependent variables

1.2 Introduction
The goal of this course is to enable you to perform useful analyses on your data using PASW
Statistics. Keeping this in mind, these lessons demonstrate how to perform descriptive and inferential
statistical analyses and create charts to support these analyses. This course guide will focus on the
elements necessary for you to answer questions from your data.

In this chapter, we begin by briefly reviewing the basic elements of quantitative research and issues
that should be considered in data analysis. We will then discuss a number of statistical procedures
that PASW Statistics performs. This is an application-oriented course and the approach will be
practical. We will discuss:

1) The situations in which you would use each technique.
2) The assumptions made by the method.
3) How to set up the analysis using PASW Statistics.
4) Interpretation of the results.

We will not derive proofs, but rather focus on the practical matters of data analysis in support of
answering research questions. For example, we will discuss what correlation coefficients are, when to
use them, and how to produce and interpret them, but will not formally derive their properties. This
course is not a substitute for a course in statistics. You will benefit if you have had such a course in
the past, but even if not, you will understand the basics of each technique after completion of this
course.

We will cover descriptive statistics and exploratory data analysis, and then examine relationships
between categorical variables using crosstabulation tables and chi-square tests. Testing for mean
differences between groups using T Tests and analysis of variance (ANOVA) will be considered.
Correlation and regression will be used to investigate the relationships between interval/scale
variables and we will also discuss some non-parametric techniques. Graphs comprise an integral part
of the analyses and we will demonstrate how to create and interpret these as well.

1.3 Basic Steps of the Research Process
All research projects, whether analyzing a survey, doing program evaluations, assessing marketing
campaigns, doing pharmaceutical research, etc., can be broken down into a number of discrete
components. These components can be categorized in a variety of ways. We might summarize the
main steps as:

1) Specify exactly the aims and objectives of the research along with the main hypotheses.

2) Define the population and sample design.

3) Choose a method of data collection, design the research and decide upon an appropriate
sampling strategy.

4) Collect the data.

5) Prepare the data for analysis.

6) Analyze the data.

7) Report the findings.

Some of these points may seem obvious, but it is surprising how often some of the most basic
principles are overlooked, potentially resulting in data that is impossible to analyze with any
confidence. Each step is crucial for a successful research project and it is never too early in the
process to consider the methods that you intend to use for your data analysis.

In order to place the statistical techniques that we will discuss in this course in the broader framework
of research design, we will briefly review some of the considerations of the first steps. Statistics and
research design are highly interconnected disciplines and you should have a thorough grasp of both
before embarking on a research project. This introductory chapter merely skims the surface of the
issues involved in research design. If you are unfamiliar with these principles, we recommend that
you refer to the research methodology literature for more thorough coverage of the issues.

Research Objectives
It is important that a research project begin with a set of well-defined objectives. Yet, this step is often
overlooked or not well defined. The specific aims and objectives may not be addressed because
those commissioning the research do not know exactly which questions they would like answered.
This rather vague approach can be a recipe for disaster and may result in a completely wasted
opportunity as the most interesting aspects of the subject matter under investigation could well be
missed. If you do not identify the specific objectives, you will fail to collect the necessary information
or ask the necessary question in the correct form. You can end up with a data file that does not
contain the information that you need for your data analysis step.

For example, you may be asked to conduct a survey “to find out about alcohol consumption and
driving”. This general objective could lead to a number of possible survey questions. Rather than
proceeding with this general objective, you need to uncover more specific hypotheses that are of
interest to your organization. This example could lead to a number of very specific research
questions, such as:

“What proportion of people admits to driving while above the legal alcohol limit?”

“What demographic factors (e.g., age/sex/social class) are linked with a propensity to drunk-driving?”

“Does having a conviction for drunk-driving affect attitudes towards driving while over the legal limit?”

These specific research questions would then define the questionnaire items. Additionally, the
research questions will affect the definition of the population and the sampling strategy. For example,
the third question above requires that the responder have a drunk-driving conviction. Given that a
relatively small proportion of the general population has such a conviction, you would need to take
that into consideration when defining the population and sampling design.
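
The arithmetic behind this point can be sketched in a few lines of Python. The figures are hypothetical and chosen only for illustration: if 20 in every 1,000 people in the population had a drunk-driving conviction, a simple random sample would need about 10,000 people to be expected to include 200 with a conviction. The function name and its parameters are assumptions made for this sketch.

```python
def required_sample_size(target_count, cases_per_thousand):
    """Smallest simple random sample expected to contain `target_count`
    members of a subgroup that occurs `cases_per_thousand` times per
    1,000 people in the population (hypothetical rate)."""
    if cases_per_thousand <= 0:
        raise ValueError("cases_per_thousand must be positive")
    # Ceiling division in integer arithmetic: round up to a whole person.
    return -(-target_count * 1000 // cases_per_thousand)


# Hypothetical rate: 20 convictions per 1,000 people, 200 respondents wanted.
print(required_sample_size(200, 20))
```

A figure this large is one reason to consider sampling directly from a list of people with convictions, where such a list is available, rather than from the general population.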

Therefore, it is essential to state formally the main aims and objectives at the outset of the research
so the subsequent stages can be done with these specific questions in mind.

1.4 Populations and Samples
In studies involving statistical analysis it is important to be able to characterize accurately the
population under investigation. The population is the group to which you wish to generalize your
conclusions, while the sample is the group you directly study. In some instances the sample and
population are identical or nearly identical; consider the Census of any country. In the majority of
studies, the sample represents a small proportion of the population.

In the example above, the population might be defined as those people with registered drivers’
licenses. We could select a sample from the drivers’ license registration list for our survey. Other
common examples are: membership surveys in which a small percentage of members are sent
questionnaires, medical experiments in which samples of patients with a disease are given different
treatments, marketing studies in which users and non-users of a product are compared, and political
polling.

The problem is to draw valid inferences from data summaries in the sample so that they apply to the
larger population. In some sense you have complete information about the sample, but you want
conclusions that are valid for the population. An important component of statistics and a large part of
what we cover in the course involves statistical tests used in making such inferences. Because the
findings can only be generalized to the population under investigation, you should give careful
thought to defining the population of interest to you and making certain that the sample reflects this
population. To state it in a simple way, statistical inference provides a method of drawing conclusions
about a population of interest based on sample results.
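
To make the population/sample distinction concrete, here is a minimal Python sketch (Python is used purely for illustration and is not part of the course software). It assumes a synthetic population of driver ages; the values, the seed, and the sample size of 200 are all hypothetical.

```python
import random
import statistics

# Hypothetical population: ages of 10,000 licensed drivers (synthetic data).
rng = random.Random(42)  # fixed seed so the sketch is repeatable
population = [rng.randint(18, 90) for _ in range(10_000)]

# Simple random sample of 200 drivers, drawn without replacement.
sample = rng.sample(population, k=200)

population_mean = statistics.mean(population)  # usually unknown in practice
sample_mean = statistics.mean(sample)          # what we actually observe

print(f"population mean = {population_mean:.2f}")
print(f"sample mean     = {sample_mean:.2f}")
```

Rerunning with a different seed gives a slightly different sample mean. That sampling variability is what the statistical tests covered later in the course are designed to account for.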

1.5 Research Design
With specific research goals and a target population in mind, it is then possible to begin the design
stage of the research. There are many things to consider at the design stage. We will consider a few
issues that relate specifically to data analysis and statistical techniques. This is not meant as a
complete list of issues to consider. For example, for survey projects, the mode of data collection,
question selection and wording, and questionnaire design are all important considerations. Refer to
the survey research literature as well as general research methodology literature for discussion of
these and other research design issues.

First, you must consider the type of research that will be most appropriate to the research aims and
objectives. Two main alternatives are experimental and non-experimental research. The data may be
recorded using either objective or subjective techniques. The former includes items measured by an
instrument and by computer such as physiological measures (e.g. heart-rate) while the latter includes
observational techniques such as recordings of a specific behavior and responses to questionnaire
surveys.

Most research goals lend themselves to one particular form of research, although there are cases
where more than one technique may be used. For example, a questionnaire survey would be
inappropriate if the aim of the research was to test the effectiveness of different levels of a new drug
to relieve high blood pressure. This type of work would be more suited to a tightly controlled
experimental study in which the levels of the drug administered could be carefully controlled and
objective measures of blood pressure could be accurately recorded. On the other hand, this type of
laboratory-based work would not be a suitable means of uncovering people’s voting intentions.

The classic experimental design consists of two groups: the experimental group and the control
group. They should be equivalent in all respects other than that those in the former group are
subjected to an effect or treatment and the latter is not. Therefore, any differences between the two
groups can be directly attributed to the effect of this treatment. The treatment variables are usually
referred to as independent variables, and the quantity being measured as the effect is the
dependent variable. There are many other research designs, but most are more elaborate variations
on this basic theme.
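
Equivalence between the two groups is usually pursued through random assignment. The sketch below shows one way this could be done; the participant identifiers, the function name `assign_groups`, and the seed are hypothetical.

```python
import random

def assign_groups(participants, seed=0):
    """Randomly split participants into experimental and control groups."""
    rng = random.Random(seed)
    shuffled = list(participants)  # copy so the caller's list is left untouched
    rng.shuffle(shuffled)
    half = len(shuffled) // 2
    return {"experimental": shuffled[:half], "control": shuffled[half:]}

# Hypothetical participant identifiers P01 ... P20.
participants = [f"P{i:02d}" for i in range(1, 21)]
groups = assign_groups(participants, seed=1)

print("experimental:", groups["experimental"])
print("control:     ", groups["control"])
```

Because chance alone decides group membership, pre-existing differences between participants should be balanced across the groups on average.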

In non-experimental research, you rarely have the opportunity to implement such a rigorously
controlled design. For example, we cannot randomly assign students to schools. However, the same
general principles apply to many of the analyses you perform.

1.6 Independent and Dependent Variables
In general, the dependent (sometimes referred to as the outcome) variable is the one we wish to
study as a function of other variables. Within an experiment, the dependent variable is the measure
expected to change as a result of the experimental manipulation. For example, a drug experiment
designed to test the effectiveness of different sleeping pills might employ the number of hours of
sleep as the dependent variable. In surveys and other non-experimental studies, the dependent
variable is also studied as a function of other variables. However, no direct experimental manipulation
is performed; rather the dependent variable is hypothesized to vary as a result of changes in the other
(independent) variables.

Correspondingly, independent (sometimes referred to as predictor) variables are those used to
measure features manipulated by the experimenter in an experiment. In a non-experimental study,
they represent variables believed to influence or predict a dependent measure.

Thus, terms (dependent, independent) reasonably applied to experiments have taken on more general
meanings within statistics. Whether such relations are viewed causally, or as merely predictive, is a
matter of belief and reasoning. As such, it is not something that statistical analysis alone can resolve.
To illustrate, we might investigate the relationship between starting salary (dependent) and years of
education, based on survey data, and then develop an equation predicting starting salary from years
of education. Here starting salary would be considered the dependent variable although no
experimental manipulation of education has been performed. One way to think of the distinction is to
ask yourself which variable is likely to influence the other. In summary, the dependent variable is
believed to be influenced by, or be predicted by, the independent variable(s).
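
The starting-salary illustration can be sketched as a one-predictor least-squares fit. The data values below are hypothetical (they are not taken from the course files), and `fit_line` is an illustrative helper, not a PASW Statistics procedure.

```python
import statistics

def fit_line(x, y):
    """Ordinary least squares with one predictor: y = intercept + slope * x."""
    mean_x, mean_y = statistics.mean(x), statistics.mean(y)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope

# Hypothetical survey data.
years_education = [10, 12, 12, 14, 16, 16, 18, 20]  # independent (predictor)
starting_salary = [24000, 27000, 29000, 33000,
                   38000, 36000, 44000, 49000]      # dependent (outcome)

intercept, slope = fit_line(years_education, starting_salary)
predicted = intercept + slope * 15  # predicted salary for 15 years of education

print(f"salary = {intercept:.0f} + {slope:.0f} * years")
print(f"predicted salary at 15 years: {predicted:.0f}")
```

The equation predicts salary from education, but, as noted above, the fit alone cannot establish that education causes the salary difference.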

Finally, in some studies, or parts of studies, the emphasis is on exploring and characterizing
relationships among variables with no causal view or focus on prediction. In such situations there is
no designation of dependent and independent variables. For example, in crosstabulation tables and
correlation matrices the distinction between dependent and independent variables is not necessary. It
rather resides in the eye of the beholder (researcher).

1.7 Note about Default Startup Folder and Variable Display in Dialog Boxes

In this course, all of the files used for the demonstrations and exercises are located in the folder
c:\Train\Statistics_IntroAnalysis. You can set the startup folder that will appear in all Open and Save
dialog boxes. We will use this option to set the startup folder.

Select Edit…Options, and then select the File Locations tab
Select the Browse button to the right of the Data Files text box
Select Train from the Look In: drop down list, then select Statistics_IntroAnalysis from the list of folders and select the Set button
Click the Browse button to the right of the Other Files text box and repeat the process to set this folder to Train\Statistics_IntroAnalysis

Figure 1.1 Set Default File Location in the Edit Options Dialog Box

Note: If the course files are stored in a different location, your instructor will give you instructions
specific to that location.

Either variable names or longer variable labels will appear in list boxes in dialog boxes. Additionally,
variables in list boxes can be ordered alphabetically or by their position in the file. In this course, we
will display variable names in alphabetical order within list boxes.

Select the General tab
Select Display names in the Variable Lists group on the General tab
Select Alphabetical (Not shown)
Select OK and then OK in the information box to confirm the change

1.8 Lesson Summary
In this lesson, we reviewed the basic elements of quantitative research and issues that should be
considered in data analysis.

Lesson Objectives Review
Students who have completed this lesson should now be able to:

• Explain the basic elements of quantitative research and issues that should be considered in
data analysis

To support the achievement of the primary objective, students should now also be able to:

• Explain the basic steps of the research process
• Explain differences between populations and samples
• Explain differences between experimental and non-experimental research designs
• Explain differences between independent and dependent variables

1.9 Learning Activity
In this set of learning activities you won’t need any supporting material.

1. In each of the following scenarios, state the possible goals of the research, the type of design
you can use, and the independent and dependent variables:

a. The relationship between gender and whether a product was purchased.
b. The difference between income categories (e.g., low, medium, and high) and number of years of education.
c. The effect of two different marketing campaigns on number of items purchased.

2. In your own organization/field, are experimental studies ever done? If not, can you imagine
how an experiment might be done to study a topic of interest to you or your organization?
Describe that and the challenges such an experimental design would encounter.

Lesson 2: Understanding Data Distributions – Theory
2.1 Objectives
After completing this lesson students will be able to:

• Determine the level of measurement of variables and obtain appropriate summary statistics
based on the level of measurement

To support the achievement of this primary objective, students will also be able to:

• Describe the levels of measurement used in PASW Statistics
• Use measures of central tendency and dispersion
• Use normal distributions and z-scores

Introduction
Ideally, we would like to obtain as much information as possible from our data. In practice, however,
given the measurement level of our variables, only some information is meaningful. In this lesson we
will discuss level of measurement and see how this determines the summary statistics we can
request.

Business Context
Understanding how level of measurement impacts the kind of information we can obtain is an
important step before we collect our data. In addition, level of measurement also determines the kind
of research questions we can answer, and so this is a critical step in the research process.

2.2 Levels of Measurement and Statistical Methods
The term levels of measurement refers to the properties and meaning of numbers assigned to
observations for each item. Many statistical techniques are only appropriate for data measured at
particular levels or combinations of levels. Therefore, when possible, you should determine the
analyses you will be using before deciding upon the level of measurement to use for each of your
variables. For example, if you want to report and test the mean age of your customers, you will need
to ask their age in years (or year of birth) rather than asking them to choose an age group into which
their age falls.

Because measurement type is important when choosing test statistics, we briefly review the common
taxonomy of level of measurement.

Supporting Materials: The file Census.sav, a PASW Statistics data file from a survey done on the
general adult population. Questions were included about various attitudes and demographic
characteristics.

The four major classifications that follow are found in many introductory statistics texts. They are
presented beginning with the weakest and ending with those having the strongest measurement
properties. Each successive level can be said to contain the properties of the preceding types and to
record information at a higher level.

• Nominal — In nominal measurement each numeric value represents a category or group
identifier, only. The categories cannot be ranked and have no underlying numeric value. An
example would be marital status, coded 1 (Married), 2 (Widowed), 3 (Divorced), 4
(Separated) and 5 (Never Married); each number represents a category and the matching of
specific numbers to categories is arbitrary. Counts and percentages of observations falling
into each category are appropriate summary statistics. Such statistics as the mean (the
average marital status?) would not be appropriate, but the mode would be appropriate (the
most frequent category).

• Ordinal — For ordinal measures the data values represent ranking or ordering information.
However, the differences between the data values along the scale are not necessarily equal. An
example would be specifying how happy you are with your life, coded 1 (Very Happy), 2 (Happy), and
3 (Not Happy). There are specific statistics associated with ranks; PASW Statistics provides a
number of them, mostly within the Crosstabs, Nonparametric and Ordinal Regression
procedures. The mode and median can be used as summary statistics.

• Interval — In interval measurement, a unit increase in numeric value represents the same
change in quantity regardless of where it occurs on the scale. For interval scale variables
such summaries as means and standard deviations are appropriate. Statistical techniques
such as regression and analysis of variance assume that the dependent (or outcome)
variable is measured on an interval scale. Examples might be temperature in degrees
Fahrenheit or SAT score.

• Ratio — Ratio measures have interval scale properties with the addition of a meaningful zero
point; that is, zero indicates complete absence of the characteristic measured. For statistics
such as ANOVA and regression only interval scale properties are assumed, so ratio scales
have stronger properties than necessary for most statistical analyses. Health care
researchers often use ratio scale variables (number of deaths, admissions, discharges) to
calculate rates. The ratio of two variables with ratio scale properties can thus be directly
interpreted. Money is an example of a ratio scale, so someone with $10,000 has ten times
the amount as someone with $1,000.
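
As a small illustration of the nominal case, the sketch below tabulates counts and percentages for the marital status coding described above and reports the mode. The list of responses is hypothetical.

```python
from collections import Counter

# Category labels as coded in the text; the assignment of numbers is arbitrary.
labels = {1: "Married", 2: "Widowed", 3: "Divorced",
          4: "Separated", 5: "Never Married"}

# Hypothetical responses from ten people.
responses = [1, 5, 1, 3, 5, 1, 2, 4, 1, 5]

counts = Counter(responses)
n = len(responses)

# Counts and percentages are the appropriate summaries for nominal data.
for code in sorted(labels):
    percent = 100 * counts[code] / n
    print(f"{labels[code]:<14}{counts[code]:>3}{percent:>7.1f}%")

# The mode is the most frequent category; a mean of the codes would be meaningless.
mode_code, mode_count = counts.most_common(1)[0]
print("mode:", labels[mode_code])
```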

The distinction between the four types is summarized below.

Table 2.1 Level of Measurement Properties

                       Property
Level of Measurement   Categories   Ranks   Equal Intervals   True Zero Point
Nominal                Yes          No      No                No
Ordinal                Yes          Yes     No                No
Interval               Yes          Yes     Yes               No
Ratio                  Yes          Yes     Yes               Yes

These four levels of measurement are often combined into two main types, categorical consisting of
nominal and ordinal measurement levels and scale (or continuous) consisting of interval and ratio
measurement levels.

The measurement level variable attribute in PASW Statistics recognizes three measurement levels:
Nominal, Ordinal and Scale. The icon indicating the measurement level is displayed preceding the
variable name or label in the variable lists of all dialog boxes. The following table shows the most
common icons used for the measurement levels. Special data types, such as Date and Time
variables have distinct icons not shown in this table.

Table 2.2 Variable List Icons

[Icon images not reproduced. Rows: Nominal, Ordinal, Scale. Columns: Numeric and String data types. Two cells are marked Not Applicable.]

Rating Scales and Dichotomous Variables
A common scale used in surveys and market research is an ordered rating scale usually consisting of
five- or seven-point scales. Such ordered scales are also called Likert scales and might be coded 1
(Strongly Agree, or Very Satisfied), 2 (Agree, or Satisfied), 3 (Neither agree nor disagree, or Neutral),
4 (Disagree, or Dissatisfied), and 5 (Strongly Disagree, or Very Dissatisfied). There is an ongoing
debate among researchers as to whether such scales should be considered ordinal or interval. PASW
Statistics contains procedures capable of handling such variables under either assumption. When in
doubt about the measurement scale, some researchers run their analyses using two separate
methods, since each makes different assumptions about the nature of the measurement. If the results
agree, the researcher has greater confidence in the conclusion.
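
One way to run such a double check is to compute both a Pearson correlation (which treats the ratings as interval) and a Spearman rank correlation (which uses only their order). The sketch below is an illustration under stated assumptions: the two rating variables are hypothetical, and the helper functions are written out by hand rather than taken from PASW Statistics.

```python
import math

def pearson(x, y):
    """Pearson correlation: treats the values as interval measurements."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def average_ranks(values):
    """Rank values 1..n, giving tied values the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # positions i..j are 0-based, ranks are 1-based
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks

def spearman(x, y):
    """Spearman correlation: Pearson correlation of the ranks (ordinal view)."""
    return pearson(average_ranks(x), average_ranks(y))

# Hypothetical five-point ratings from ten respondents.
satisfaction = [1, 2, 2, 3, 3, 3, 4, 4, 5, 5]
recommend = [1, 1, 2, 2, 3, 4, 4, 5, 4, 5]

r_interval = pearson(satisfaction, recommend)
r_ordinal = spearman(satisfaction, recommend)
print(f"Pearson  (interval assumption): {r_interval:.3f}")
print(f"Spearman (ordinal assumption):  {r_ordinal:.3f}")
```

If the two coefficients lead to the same conclusion, the choice of measurement assumption matters little for that analysis.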

Dichotomous (binary) variables containing two possible responses (often coded 0 and 1) are often
considered to fall into all of the measurement levels except ratio (at least as independent variables).
As we will see, this flexibility allows them to be used in a wide range of statistical procedures.

Implications of Measurement Level
As we have discussed, the level of measurement of a variable is important because it determines the
appropriate summary statistics, tables, and graphs to describe the data. The following table
summarizes the most common summary measures and graphs for each of the measurement levels
and PASW Statistics procedures that can produce them.

Table 2.3 Summary of Descriptive Statistics and Graphs

                              NOMINAL                  ORDINAL                     SCALE
Definition                    Unordered Categories     Ordered Categories          Metric/Numeric Values
Examples                      Labor force status,      Satisfaction ratings,       Income, height, weight
                              gender, marital status   degree of education
Measures of Central Tendency  Mode                     Mode, Median                Mode, Median, Mean
Measures of Dispersion        N/A                      Min/Max/Range,              Min/Max/Range, IQR,
                                                       InterQuartile Range (IQR)   Standard Deviation/Variance
Graph                         Pie or Bar               Pie or Bar                  Histogram, Box & Whisker,
                                                                                   Stem & Leaf
Procedures                    Frequencies              Frequencies                 Frequencies, Descriptives,
                                                                                   Explore
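
The central tendency row of Table 2.3 can be expressed as a small lookup that returns only the measures appropriate to a variable's level. This is a sketch for illustration: the function name `summarize` and the example data are hypothetical.

```python
import statistics

# Measures of central tendency appropriate at each level (from Table 2.3).
CENTRAL_TENDENCY = {
    "nominal": ["mode"],
    "ordinal": ["mode", "median"],
    "scale": ["mode", "median", "mean"],
}

MEASURES = {
    "mode": statistics.mode,
    "median": statistics.median,
    "mean": statistics.mean,
}

def summarize(values, level):
    """Return only the summary measures that are meaningful for this level."""
    return {name: MEASURES[name](values) for name in CENTRAL_TENDENCY[level]}

# Hypothetical data: happiness coded 1-3 (ordinal) and age in years (scale).
happiness = [1, 2, 2, 3, 1, 2, 2]
ages = [23, 35, 35, 41, 58]

print(summarize(happiness, "ordinal"))
print(summarize(ages, "scale"))
```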

Measurement Level and Statistical Methods
Statistics are available for variables at all levels of measurement for more advanced analysis. In
practice, your choice of method depends on the questions you are interested in asking of the data
and the nature of the measurements you make. The table below suggests which statistical techniques
are most appropriate, based on the measurement level of the dependent and independent variable.
Much more extensive diagrams and discussion are found in Andrews et al. (1981), or other standard
statistical texts.

Table 2.4 Level of Measurement and Appropriate Statistical Methods

Dependent        Independent Variables
Variable         Nominal                 Ordinal                      Interval/Ratio
Nominal          Crosstabs               Crosstabs                    Discriminant,
                                                                      Logistic Regression
Ordinal          Nonparametric tests,    Nonparametric correlation,   Ordinal Regression
                 Ordinal Regression      Optimal Scaling Regression
Interval/Ratio   T Test, ANOVA           Nonparametric Correlation    Correlation,
                                                                      Linear Regression

Apply Your Knowledge
1. PASW Statistics distinguishes three levels of measurement. Which of these is not one of
those levels?
a. Categorical
b. Scale
c. Nominal
d. Ordinal

2. True or false? An ordinal variable has all properties of a nominal variable?

3. Consider the dataset depicted below. Which statements are correct?

a. The variable region is an ordinal variable
b. The variable age is a scale variable
c. The variable agecategory is an ordinal variable
d. The variable salarycategory is a scale variable

2.3 Measures of Central Tendency and Dispersion
Measures of central tendency and dispersion are the most common measures used to summarize the
distribution of variables. We give a brief description of each of these measures below.

Measures of Central Tendency
Statistical measures of central tendency give that one number that is often used to summarize the
distribution of a variable. They may be referred to generically as the “average.” There are three main
central tendency measures: mode, median, and mean. In addition, Tukey devised the 5% trimmed
mean.

Best Practice: If in doubt about the measurement properties of your variables, you can apply a
statistical technique that assumes weaker measurement properties and compare the results to
methods making stronger assumptions. A consistent answer provides greater confidence in the
conclusions.

• Mode: The mode for any variable is merely the group or class that contains the most cases.
If two or more groups contain the same highest number of cases, the distribution is said to be
“multimodal.” This measure is more typically used on nominal or ordinal data and can easily
be determined by examining a frequency table.

• Median: If all the cases for a variable are arranged in order according to their value, the
median is that value that splits the cases into two equally sized groups. The median is the
same as the 50th percentile. Medians are resistant to extreme scores, and so are considered
robust measures of central tendency.

• Mean: The mean is the simple arithmetic average of all the values in the distribution (i.e.,
the sum of the values of all cases divided by the total number of cases). It is the most
commonly reported measure of central tendency. The mean along with the associated
measures of dispersion are the basis for many statistical techniques.

• 5% trimmed mean: The 5% trimmed mean is the mean calculated after the extreme upper
5% and the extreme lower 5% of the data values are dropped. Such a measure is resistant to
extreme values.

The specific measure that you choose will depend on a number of factors, most importantly the level
of measurement of the variable. The mean is considered the most “powerful” measure of the three
classic measures of central tendency. However, it is good practice to compare the median, mean,
and 5% trimmed mean to get a more complete understanding of a distribution.
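
The comparison suggested above can be sketched as follows. The income values are hypothetical and include one deliberate outlier. The `trimmed_mean` helper simply drops whole cases from each end, which is a simplification; statistical packages may handle fractional case counts differently.

```python
import statistics

def trimmed_mean(values, percent=5):
    """Mean after dropping the lowest and highest `percent` of cases."""
    ordered = sorted(values)
    k = len(ordered) * percent // 100  # whole cases dropped at each end
    return statistics.mean(ordered[k:len(ordered) - k])

# Hypothetical incomes in thousands of dollars, with one extreme value.
incomes = [21, 24, 25, 25, 27, 28, 30, 31, 32, 33,
           34, 35, 36, 38, 40, 41, 43, 45, 48, 400]

mode = statistics.mode(incomes)
median = statistics.median(incomes)
mean = statistics.mean(incomes)
trimmed = trimmed_mean(incomes)

print(f"mode = {mode}, median = {median}")
print(f"mean = {mean:.2f}, 5% trimmed mean = {trimmed:.2f}")
```

Here the single outlier pulls the mean well above the median, while the trimmed mean stays close to the median. A gap of this kind between the measures is a signal to look more closely at the distribution.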

Measures of Dispersion
Measures of dispersion or variability describe the degree of spread, dispersion, or variability around
the central tendency measure. You might think of this as a measure of the extent to which
observations cluster within the distribution. There are a number of measures of dispersion, including
simple measures such as maximum, minimum, and range, common statistical measures, such as
standard deviation and variance, as well as the interquartile range (IQR).

• Maximum: Simply the highest value observed for a particular variable. By itself, it can tell us
nothing about the shape of the distribution, merely how high the top value is.

• Minimum: The lowest value in the distribution and, like the maximum, is only useful when
reported in conjunction with other statistics.

• Range: The difference between the maximum and minimum values gives a general
impression of how broad the distribution is. It says nothing about the shape of a distribution
and can give a distorted impression of the data if just one case has an extreme value.

• Variance: Both the variance and standard deviation provide information about the amount of
spread around the mean value. They are overall measures of how clustered around the mean
the data values are. The variance is calculated by summing the square of the difference
between the value and the mean for each case and dividing this quantity by the number of
cases minus 1. If all cases had the same value, the variance (and standard deviation) would
be zero. The variance measure is expressed in the units of the variable squared. This can
cause difficulty in interpretation, so more often the standard deviation is used. In general
terms, the larger the variance, the more spread there is in the data; the smaller the variance,
the more the data values are clustered around the mean.

• Standard Deviation: The standard deviation is the square root of the variance which
restores the value of variability to the units of measurement of the original variable. It is
therefore easier to interpret. Either the variance or standard deviation is often used in
conjunction with the mean as a basis for a wide variety of statistical techniques.

• Interquartile Range (IQR): This measure of variation is the range of values between the
25th and 75th percentile values. Thus, the IQR represents the range of the middle 50 percent
of the sample and is more resistant to extreme values than the standard deviation.

Like the measures of central tendency, these measures differ in their usefulness with variables of
different measurement levels. The variability measures, variance and standard deviation, are used in
conjunction with the mean for statistical evaluation of the distribution of a scale variable. The other
measures of dispersion, although less useful statistically, can provide useful descriptive information
about a variable.
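
The dispersion measures described in this section can be computed directly, as in the sketch below. The data values are hypothetical. Note that quartile conventions differ between software packages, so the IQR shown here (from the default method of Python's `statistics.quantiles`) may not match another package exactly.

```python
import math
import statistics

def sample_variance(values):
    """Sum of squared deviations from the mean, divided by (n - 1)."""
    mean = statistics.mean(values)
    return sum((v - mean) ** 2 for v in values) / (len(values) - 1)

values = [2, 4, 4, 4, 5, 5, 7, 9]  # hypothetical scale variable

value_range = max(values) - min(values)
variance = sample_variance(values)   # expressed in squared units
std_dev = math.sqrt(variance)        # back in the original units

# Quartiles using the default ("exclusive") method of statistics.quantiles.
q1, _, q3 = statistics.quantiles(values, n=4)
iqr = q3 - q1

print(f"min = {min(values)}, max = {max(values)}, range = {value_range}")
print(f"variance = {variance:.3f}, standard deviation = {std_dev:.3f}")
print(f"IQR = {iqr}")
```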

Apply Your Knowledge
1. True or false? The mode is that value that splits the cases into two equally sized groups.

2. True or false? Consider the table depicted below. The salaries of men are clustered more tightly
around their mean than the salaries of women around their mean?

2.4 Normal Distributions
An important statistical concept is that of the normal distribution. This is a frequency (or probability)
distribution which is symmetrical and is often referred to as the normal bell-shaped curve. The
histogram below illustrates a normal distribution. The mean, median and mode exactly coincide in a
perfectly normal distribution. And the proportion of cases contained within any portion of the normal
curve can be exactly calculated mathematically.

Its symmetry means that 50% of cases lie to either side of the central point as defined by the mean.
Two of the other most frequently-used representations are the portions lying between plus and minus
one standard deviation of the mean (containing approximately 68% of cases) and that between plus
and minus 1.96 standard deviations (containing approximately 95% of cases, sometimes rounded up
to 2.00 for convenience). Thus, if a variable is normally distributed, we expect 95% of the cases to be
within roughly 2 standard deviations from the mean.
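
These proportions can be checked numerically. The sketch below uses the error function from Python's standard library to evaluate the standard normal cumulative distribution; the helper names are our own.

```python
import math

def normal_cdf(z):
    """Cumulative probability of the standard normal distribution at z."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def proportion_within(k):
    """Proportion of cases within plus and minus k standard deviations."""
    return normal_cdf(k) - normal_cdf(-k)

for k in (1.0, 1.96, 2.0):
    print(f"within +/- {k} SD: {proportion_within(k):.4f}")
```

The results agree with the text: about 68% within one standard deviation, 95% within 1.96, and slightly more than 95% within 2.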

Figure 2.1 The Normal Distribution

Many naturally occurring phenomena, such as height, weight and blood pressure, are approximately
normally distributed. Random errors also tend to conform to this type of distribution. It is important to understand
the properties of normal distributions and how to assess the normality of particular distributions
because of their theoretical importance in many inferential statistical procedures.

2.5 Standardized (Z-) Scores
The properties of the normal distribution allow us to calculate a standardized score, often referred to
as a z-score, which indicates the number of standard deviations above or below the sample mean for
each value. Standardized scores can be used to calculate the relative position of each value in the
distribution. Z-scores are most often used in statistics to standardize variables of unequal scale units
for statistical comparisons or for use in multivariate procedures.

For example, if you obtain a score of 68 out of 100 on a test of verbal ability, this information alone is
not enough to tell how well you did in relation to others taking the test. However, if you know the
mean score is 52.32, the standard deviation 8.00 and the scores are normally distributed, you can
calculate the proportion of people who achieved a score at least as high as your own.

The standardized score is calculated by subtracting the mean from the value of the observation in
question (68-52.32 = 15.68) and dividing by the standard deviation for the sample (15.68/8 = 1.96).

\[
\text{Standardized Score} = \frac{\text{Case Score} - \text{Sample Mean}}{\text{Standard Deviation}}
\]

Therefore, the mean of a standardized distribution is 0 and the standard deviation is 1.

In this case, your score of 68 is 1.96 standard deviations above the mean.
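
The verbal ability calculation can be reproduced in a few lines. The first part uses the figures from the text (score 68, mean 52.32, standard deviation 8.00). The second part uses a hypothetical list of scores to show that a standardized variable has mean 0 and standard deviation 1; it assumes the sample standard deviation (n - 1 denominator).

```python
import statistics

def z_score(value, mean, sd):
    """Number of standard deviations a value lies above or below the mean."""
    return (value - mean) / sd

# Figures from the verbal ability example in the text.
z = z_score(68, 52.32, 8.00)
print(f"z = {z:.2f}")

# Hypothetical scores, standardized against their own mean and SD.
scores = [41, 47, 50, 52, 55, 58, 63]
mean = statistics.mean(scores)
sd = statistics.stdev(scores)  # sample standard deviation (n - 1)
z_scores = [z_score(s, mean, sd) for s in scores]

print(f"mean of z-scores = {statistics.mean(z_scores):.3f}")
print(f"SD of z-scores   = {statistics.stdev(z_scores):.3f}")
```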

The histogram of the normal distribution above displays the distribution as a Z-score so the values on
the x-axis are standard deviation units. From this figure, we can see only 2.5% of the cases are likely
to have a score above 68 on the verbal ability test (1.96 standardized score). The normal distribution
table (see below), found in an appendix of most statistics books, shows proportions for z-score
values.

Table 2.5 Normal Distribution Table

A score of 1.96, for example, corresponds to a value of .025 in the “one-tailed” column and .050 in the
“two-tailed” column. The former means that the probability of obtaining a z-score at least as large as
+1.96 is .025 (or 2.5%), the latter that the probability of obtaining a z-score of more than +1.96 or less
than -1.96 is .05 (or 5%) or 2.5% at each end of the distribution. You can see these cutoffs in the
histogram above.
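
When a printed table is not at hand, the one-tailed and two-tailed values can be computed with the complementary error function. This is a sketch using Python's standard library; the function name `upper_tail` is our own.

```python
import math

def upper_tail(z):
    """Probability of a standard normal value at least as large as z."""
    return 0.5 * math.erfc(z / math.sqrt(2.0))

z = 1.96
one_tailed = upper_tail(z)           # area above +1.96
two_tailed = 2 * upper_tail(abs(z))  # area above +1.96 plus area below -1.96

print(f"one-tailed: {one_tailed:.3f}")
print(f"two-tailed: {two_tailed:.3f}")
```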

As we mentioned, another advantage of standardized scores is that they allow for comparisons on
variables measured in different units. For example, in addition to the verbal test score, you might
have a mathematics test score of 150 out of 200 (or 75%). Although it appears that you did better on
the mathematics test from the percentages alone, you would need to calculate the z-score for the
mathematics test and compare the z-scores in order to answer the question.
You might want to compute z-scores for a series of variables and determine whether certain
subgroups of your sample are, on average, above or below the mean on these variables by
requesting descriptive statistics or using the Case Summaries procedure. For example, you might
want to compare a customer’s yearly revenue using z-scores.
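
The comparison of the two test scores can be sketched as below. The text does not give the mean or standard deviation of the mathematics test, so the values used here (mean 140, standard deviation 20) are purely hypothetical and serve only to show the mechanics.

```python
def z_score(value, mean, sd):
    """Number of standard deviations a value lies above or below the mean."""
    return (value - mean) / sd

# Verbal test: figures from the text.
verbal_z = z_score(68, 52.32, 8.00)

# Mathematics test: score from the text; mean and SD are HYPOTHETICAL.
math_z = z_score(150, 140.0, 20.0)

print(f"verbal z = {verbal_z:.2f}, mathematics z = {math_z:.2f}")
if verbal_z > math_z:
    print("Relative to other test takers, the verbal result is stronger.")
else:
    print("Relative to other test takers, the mathematics result is stronger.")
```

Under these assumed values the verbal score is the stronger result, even though 75% looks better than 68% on the raw percentages.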

2.6 Requesting Standardized (Z-) Scores
The Descriptives procedure has an option to calculate standardized score variables. A new variable
containing the standardized values is calculated for the specified variables. Creating standardized
scores is accomplished by following these steps:

1) Choose variables to transform into standardized scores.
2) Review the new variables that were created.

2.7 Standardized (Z-) Scores Output
The Descriptives procedure provides descriptive statistics of the original variables. The
standardized variables will appear in the Data Editor.

Figure 2.2 Example of Descriptives Output

2.8 Procedure: Descriptives for Standardized (Z-) Scores
The Descriptives procedure is accessed from the Analyze…Descriptive Statistics…Descriptives
menu choice. With the Descriptives dialog box open:

1) Place one or more scale variables in the Variable(s) box.
2) Select the Save standardized values as variables box.

Figure 2.3 Descriptives Dialog Box to Create Z-Scores

2.9 Demonstration: Descriptives for Z-Scores
We will work with the Census.sav data file in this example. We create standardized scores for number
of years of education (educ) and age of respondent (age). We would like to determine where
respondents fall on the distribution of these variables.

Detailed Steps for Z-Scores
1) Place the variables educ and age in the Variable(s) box.
2) Select the Save standardized values as variables box.

Results from Z-Scores
By default, the new variable name is the old variable name prefixed with the letter “Z”. Two new
variables, zeduc and zage, containing the z-scores of the two variables, are created at the end of the
data file. These variables can be saved in your file and used in any statistical procedure.
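
For readers who want to see what the saved variables contain, the sketch below mimics the result outside PASW Statistics. It is an illustration only: the four rows are hypothetical (they are not the Census.sav data), the helper `add_z_scores` is our own, and it assumes the sample standard deviation (n - 1 denominator) described earlier in this lesson.

```python
import statistics

def add_z_scores(rows, variables):
    """Add a Z-prefixed standardized copy of each named variable to every row."""
    for var in variables:
        values = [row[var] for row in rows]
        mean = statistics.mean(values)
        sd = statistics.stdev(values)  # sample standard deviation (n - 1)
        for row in rows:
            row["Z" + var] = (row[var] - mean) / sd
    return rows

# Hypothetical respondents; NOT taken from Census.sav.
rows = [
    {"educ": 12, "age": 60},
    {"educ": 16, "age": 27},
    {"educ": 20, "age": 45},
    {"educ": 14, "age": 33},
]

add_z_scores(rows, ["educ", "age"])
for row in rows:
    print(f"educ={row['educ']:>2} Zeduc={row['Zeduc']:+.2f}  "
          f"age={row['age']:>2} Zage={row['Zage']:+.2f}")
```

A negative value means the case is below the mean on that variable and a positive value means it is above, which is how the Data Editor columns are read.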

We observe that:

• The first person (row) in the data file is below the average on education but above the
average on age.

Figure 2.4 Two Z-score Variables in the Data Editor

Apply Your Knowledge
1. True or false? Only for variables of measurement level scale in PASW Statistics is it
meaningful to calculate standardized scores?

2. Consider the data below, where we computed standardized values for the variables educ
(highest year of education) and salary (salary in dollars). Which of the following statements
are correct?

a. The observation with employee_id=49 has a salary very close to the mean salary.
b. The observation with employee_id=50 has a salary that is more than one standard deviation above the mean.
c. The observation with employee_id=46 is more extreme in her education than in salary.

Additional Resources

Further Info: For additional information on Level of Measurement and Statistical Tests, see:

Andrews, Frank M., Klem, L., Davidson, T.N., O’Malley, P.M. and Rodgers, W.L. 1981. A Guide for
Selecting Statistical Techniques for Analyzing Social Science Data. Ann Arbor, MI: Institute for
Social Research, University of Michigan.

Velleman, Paul F. and Wilkinson, L. 1993. “Nominal, Ordinal and Ratio Typologies are Misleading for
Classifying Statistical Methodology,” The American Statistician, vol. 47, pp. 65-72.

2.10 Lesson Summary
We explored the concept of the level of measurement and the appropriate summary statistics given
level of measurement. We also discussed the normal distribution and z-scores.

Lesson Objectives Review
Students who have completed this lesson should now be able to:

• Determine the level of measurement of variables and obtain appropriate summary statistics
based on the level of measurement

To support the achievement of the primary objective, students should now also be able to:

• Describe the levels of measurement used in PASW Statistics
• Use measures of central tendency and dispersion
• Use normal distributions and z-scores


2.11 Learning Activity
The overall goal of this learning activity is to create standardized (Z-) scores for several variables. In
this set of learning activities you will use the Drinks.sav data file.

1. Create standardized scores for all scale variables (price through alcohol). Which beverages
have positive standardized scores on every variable? What does this mean?

2. What is the most extreme z-score on each variable? What is the most extreme z-score
across all variables?

3. What beverage is most typical of all beverages, that is, has z-score values closest to 0 for
these variables?

4. If a variable is normally distributed, what percentage of cases should lie more than 1 standard
deviation above or below the mean? Calculate this percentage for a couple of the variables. Is the
percentage of beverages with an absolute z-score above 1 close to the theoretical value?
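
For question 4, the theoretical benchmark and the observed share can be sketched in Python as follows. The helper name `share_beyond_1sd` is hypothetical, and the z-scores passed to it would be the standardized values you saved in the earlier steps.

```python
from statistics import NormalDist

# Theoretical share of a normal distribution lying more than 1 SD from the mean:
# both tails beyond z = 1, i.e. 2 * (1 - Phi(1)).
std_normal = NormalDist(mu=0.0, sigma=1.0)
p_beyond_1sd = 2 * (1 - std_normal.cdf(1.0))
print(f"{p_beyond_1sd:.1%}")  # roughly 31.7%


def share_beyond_1sd(z_values):
    """Observed proportion of cases whose absolute z-score exceeds 1."""
    return sum(1 for z in z_values if abs(z) > 1) / len(z_values)
```

With only 35 beverages, the observed share can differ noticeably from the theoretical value even when a variable is close to normal.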

Supporting Materials: The file Drinks.sav is a PASW Statistics data file that contains hypothetical
data on 35 beverages. It includes information on their characteristics (e.g., % alcohol), price,
origin, and a rating of quality.


Lesson 3: Data Distributions for Categorical Variables
3.1 Objectives
After completing this lesson students will be able to:

• Run the Frequencies procedure to obtain appropriate summary statistics for categorical
variables

To support the achievement of this primary objective, students will also be able to:

• Use the options in the Frequencies procedure
• Interpret the results of the Frequencies procedure

3.2 Introduction
As a first step in analyzing data, one must gain knowledge of the overall distribution of the individual
variables and check for any unusual or unexpected values. You often want to examine the values that
occur in a variable and the number of cases in each. For some variables, you want to summarize the
distribution of the variable by examining simple summary measures including the mode, median, and
minimum and maximum values. In this lesson, we will review tables and graphs appropriate for
describing categorical (nominal and ordinal) variables.

Business Context
Summaries of individual variables provide the basis for more complex analyses. There are a number
of reasons for performing single variable analyses. One would be to establish base rates for the
population sampled. These rates may be of immediate interest: What percentage of our customers is
satisfied with services this year? In addition, studying a frequency table containing many categories
might suggest ways of collapsing groups for a more succinct and statistically appropriate table. When
studying relationships between variables, the base rates of the separate variables indicate whether
there is a sufficient sample size in each group to proceed with the analysis. A second use of such
summaries would be as a data-checking device—unusual values would be apparent in a frequency
table.

Supporting Materials: The file Census.sav is a PASW Statistics data file from a survey done on the
general adult population. Questions were included about various attitudes and demographic
characteristics.


3.3 Using Frequencies to Summarize Nominal and Ordinal Variables

The most common technique for describing categorical data is a frequency analysis, which provides a
summary table indicating the number and percentage of cases falling into each category of a variable,
as well as the number of valid and missing cases. We can also use the mode, which indicates the
category with the highest frequency, and, if there is a large number of categories, the median (for
ordinal variables), which is the value above and below which half the cases fall.

Figure 3.1 Typical Frequencies Table

To represent the frequencies graphically we use bar or pie charts.

• A pie chart displays the contribution of parts to a whole. Each slice of a pie chart corresponds
to a group that is defined by a single grouping variable.

• A bar chart displays the count for each distinct value or category as a separate bar, allowing
you to compare categories vertically.

Figure 3.2 Pie Chart Illustrated


Figure 3.3 Bar Chart Illustrated

3.4 Requesting Frequencies
Requesting Frequencies is accomplished by following these steps:

1) Choose variables for the Frequencies procedure.
2) Request additional summary statistics and graphs.
3) Review the procedure output to investigate the distribution of the variables including:

a. Frequency Tables
b. Graphs

3.5 Frequencies Output
The information in the frequency table consists of counts and percentages:

• The Frequency column contains counts, i.e., the number of occurrences of each data value.
• The Percent column shows the percentage of cases in each category relative to the number of
cases in the entire data set, including those with missing values.
• The Valid Percent column contains the percentage of cases in each category relative to the
number of valid (non-missing) cases.
• The Cumulative Percent column contains the percentage of cases whose values are less than or
equal to the indicated value. Cumulative percent is useful only for variables that are at least
ordinal.
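
The four columns described above can be reproduced with a short Python sketch. It is an illustration under stated assumptions rather than the procedure's actual implementation: the function name `frequency_table` is hypothetical, missing values are represented by `None`, the data are made up, and the cumulative percent is accumulated from the valid percent.

```python
from collections import Counter


def frequency_table(values):
    """Build frequency rows for coded values; None marks a missing value."""
    total = len(values)                            # all cases, including missing
    valid = [v for v in values if v is not None]   # valid (non-missing) cases
    counts = Counter(valid)
    rows, cumulative = [], 0.0
    for value in sorted(counts):                   # keep the natural category order
        valid_pct = 100 * counts[value] / len(valid)
        cumulative += valid_pct
        rows.append({
            "value": value,
            "frequency": counts[value],
            "percent": 100 * counts[value] / total,
            "valid_percent": valid_pct,
            "cumulative_percent": cumulative,
        })
    return rows


# Hypothetical codes for a three-category ordinal variable, with one missing case.
happy = [1, 2, 2, 3, 2, None, 1, 2, 3, 2]
for row in frequency_table(happy):
    print(row)
```

Because one of the ten cases is missing, Percent and Valid Percent differ for every category; they coincide only when there are no missing values.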


Figure 3.4 Example of Frequency Output

3.6 Procedure: Frequencies
The Frequencies procedure is accessed from the Analyze…Descriptive Statistics…Frequencies
menu choice. With the Frequencies dialog box open:

1) Place one or more variables in the Variable(s) box.
2) Open the Statistics dialog to request summary statistics.
3) Open the Charts dialog to request graphs.

Figure 3.5 Frequencies Dialog Box

In the Statistics dialog box:

1) Ask for the appropriate measures of central tendency and dispersion.


Figure 3.6 Frequencies: Statistics Dialog Box

In the Charts dialog:

1) Ask for the appropriate chart based on the scale of measurement of the variable.

Figure 3.7 Frequencies: Charts Dialog Box


3.7 Demonstration: Frequencies
We will work with the Census.sav data file in this lesson. In this example we examine the distribution
of the variables marital and happy. These variables are either nominal or ordinal in scale of
measurement.

Detailed Steps for Frequencies
1) Place the variables marital and happy in the Variable(s) box.
2) In the Statistics dialog, select Mode and Median.
3) In the Charts dialog, select Bar Chart in the Chart Types area and Percentages in the Chart
Value area.

Results from Frequencies
The first table produced is the table labeled Statistics.

Figure 3.8 Statistics for Marital Status and General Happiness

This table shows the number of cases having a valid value on Marital Status (2,018 cases) and General
Happiness (2,015 cases), the number of cases having a (user- or system-) missing value (5 and 8,
respectively), and the Mode and Median. The mode, the category that has the highest frequency, is a
value of 1 for marital and 2 for happy, representing the “Married” category and the “Pretty
Happy” group, respectively. The median, the middle point of the distribution (50th percentile), is a
value of 2 for both variables.

The second table shows the frequencies and percentages for each variable. This table confirms that
almost half of the respondents are married. Since there is almost no missing data for marital status,
the percentages in the Percent column and in the Valid Percent column are almost identical.

Figure 3.9 Frequency Table of Marital Status

Examine the table. Note the disparate category sizes. About half of the sample is married, and there
is one category that has less than 5% of the cases. Before using this variable in a crosstabulation
analysis, should you consider combining some of the categories with fewer cases? Decisions about
collapsing categories usually depend on which groups need to be kept distinct in order to answer
the research question asked, and on the sample sizes for the groups. For example, could we create a
“was previously married” group?
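
As a rough illustration of collapsing categories, the Python sketch below merges several labels into one group and recounts. The category labels other than “Married” are assumptions (the text does not list them), the data are made up, and the helper name `collapse` is hypothetical.

```python
from collections import Counter

# Assumed labels: only "Married" is named in the text; the others are illustrative.
COLLAPSE = {
    "Widowed": "Previously married",
    "Divorced": "Previously married",
    "Separated": "Previously married",
}


def collapse(values, mapping):
    """Replace each value by its collapsed label; unmapped values stay unchanged."""
    return [mapping.get(v, v) for v in values]


marital = ["Married", "Divorced", "Never married", "Widowed",
           "Married", "Separated", "Married", "Never married"]

collapsed = Counter(collapse(marital, COLLAPSE))
print(collapsed)
```

Whether such a merge is sensible depends on the research question: if widowed and divorced respondents need to be compared, they must stay separate.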

The bar chart summarizes the distribution that we observed in the frequency table and allows us to
“see” the distribution.

Figure 3.10 Bar Chart of Marital Status

For the variable happy, over half of the people fall into one category, pretty happy. Might it be
interesting to look at the relationship between this variable and marital status: to what extent is
general happiness related to marital status?

Tip: For a nominal variable (where the order of the categories is arbitrary), sorting the table and
graph descending on counts gives better insight into what the main categories are (use the Format
subdialog box to sort descending on counts).
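
The idea of sorting a nominal variable by descending count can be sketched in Python with `Counter.most_common`. The variable and its values are hypothetical.

```python
from collections import Counter

# Hypothetical nominal variable; category order carries no meaning.
region = ["North", "South", "East", "North", "West", "North", "South"]

# most_common() returns (category, count) pairs from the highest count downward.
sorted_counts = Counter(region).most_common()
for category, count in sorted_counts:
    print(f"{category:<6} {count}")
```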


Figure 3.11 Frequency Table of General Happiness

Next we view a bar chart based on the general happiness variable. Does the picture make it easier to
understand the distribution?

Figure 3.12 Bar Chart of General Happiness

Note: For an ordinal variable, sorting the categories on descending/ascending counts (which was
useful for nominal variables) will disturb the natural order of categories and so is not as useful
for an ordinal variable.


Apply Your Knowledge
1. See the output below. Which statements are correct?

a. The median is an appropriate statistic to report for the variable region.
b. The region that has the highest frequency is the North.
c. The cumulative percent is meaningful for region.
d. The columns Percent and Valid Percent are identical because there are no missing
values on region.

2. See the output below. Which bar chart is best to present the distribution of the ordinal
variable age of employees, in categories? Bars sorted descending on count (A) or sorted on
ascending value (B)?

3. See the table below (a frequency table of HAPPINESS OF MARRIAGE, with those not married
defined as missing). Which statements are correct?

a. 29.5% of those married are very happy in their marriage.
b. The mode is the category pretty happy.
c. 96.9% of those married are pretty happy or very happy.


Additional Resources

3.8 Lesson Summary
In this lesson we used the Frequencies procedure to explore the distribution of categorical variables,
via both tables and graphs.

Lesson Objectives Review
Students who have completed this lesson should now be able to:

• Run the Frequencies procedure to obtain appropriate summary statistics for categorical
variables

To support the achievement of the primary objective, students should now also be able to:

• Use the options in the Frequencies procedure
• Interpret the results of the Frequencies procedure

3.9 Learning Activity
The overall goal of this learning activity is to run Frequencies to explore the distributions of several
variables. In the exercises you will use the data file Census.sav.


Further Info: For additional information on how to present data in tables and graphs, see:

Few, Stephen. 2004. Show Me the Numbers: Designing Tables and Graphs to Enlighten. Analytics
Press.


1. Run the Frequencies procedure on the following variables: sex, wrkstat (Labor Force
Status), paeduc (Father’s highest degree), and satjob (Job or Housework). What is the scale
of measurement for each? Request appropriate summary statistics and charts.

2. For which of these variables is it appropriate to use the median? What conclusions can you
draw about the distributions of these variables?

3. What percent of respondents have a bachelor’s degree, or higher? What percent of
respondents are working?

4. How might you combine some of the categories of wrkstat to ensure that there is a sufficient
number of respondents in each category?


Lesson 4: Data Distributions for Scale Variables
4.1 Objectives
After completing this lesson students will be able to:

• Request and interpret appropriate summary statistics for scale variables

To support the achievement of this primary objective, students will also be able to:

• Use the options in the Frequencies, Descriptives, and Explore procedures
• Interpret the results of the Frequencies, Descriptives, and Explore procedures

4.2 Introduction
As a first step in analyzing your data, you must gain knowledge of the overall distribution of the
individual variables and check for any unusual or unexpected values. You often want to examine the
values that occur in a variable and the number of cases in each. For some variables, you want to
summarize the distribution of the variable by examining simple summary measures including
minimum and maximum values for the range. Frequently used summary measures describe the
central tendency of the distribution, such as the arithmetic mean, and dispersion, the spread around
the central point. In this lesson, we will review tables and graphs appropriate for describing scale
(interval and ratio) variables.

Business Context
Summaries of individual variables provide the basis for more complex analyses. There are a number
of reasons for performing single variable analyses. One would be to establish base rates for the
population sampled. These rates may be of immediate interest: What is the average customer
satisfaction? In addition, studying distributions might suggest ways of collapsing information for a
more succinct and statistically appropriate table. When studying relationships between variables, the
base rates of the separate variables indicate whether there is a sufficient sample size in each group
to proceed with the analysis. A second use of such summaries would be as a data-checking device,
as unusual values would be apparent in tables.

4.3 Summarizing Scale Variables Using Frequencies
When working with categorical variables, frequency tables containing counts and percentages are
appropriate summaries. For a scale variable, counts and percentages may still be of interest,
especially when the variables can take only a limited number of distinct values. For example, when
working with a one to five point rating scale we might be very interested in knowing the percentage of
respondents who reply “Strongly Agree.” However, as the number of possible response values
increases, frequency tables based on interval scale variables become less useful. Suppose we asked
respondents for their family income to the nearest dollar. It is likely that each response would have a
different value and so a frequency table would be quite lengthy and not particularly helpful as a
summary of the variable. In data cleaning, you might find a frequency table useful for examining
possible clustering of cases on specific values or looking at cumulative percentages. But beware of
using frequency tables for scale variables with many values, as they can be very long.

Supporting Materials: The file Census.sav is a PASW Statistics data file from a survey done on the
general adult population. Questions were included about various attitudes and demographic
characteristics.

If the variables of interest are scale we can expand the summaries to include means, standard
deviations and other statistical measures. You will want to spend some time looking over the
summary statistics you requested. Do they make sense, or is something unusual?

For a categorical variable, we request a pie chart or a bar chart to graphically display the distribution
of the variable. For a scale variable, a histogram is used to display the distribution.

4.4 Requesting Frequencies
Requesting statistics and a graphical display is accomplished by following these steps:

1) Select variables in the Frequencies procedure.
2) Request additional summary statistics and graphs.
3) Review the procedure output to investigate the distribution of the variables including:

a. Frequency tables (if requested)
b. Statistics tables
c. Graphs

4.5 Frequencies Output
Statistics for the variable are presented in a separate table.

Tip: A normal curve can be superimposed on the histogram and helps you to judge whether the
variable is normally distributed.


Figure 4.1 Example of Summary Statistics for Frequencies Output

A histogram shows the distribution graphically. A histogram has bars, but, unlike the bar chart, they
are plotted along an equal interval scale. The height of each bar is the count of values of a
quantitative variable falling within the interval. A histogram shows the shape, center, and spread of
the distribution.
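
To make the idea of equal-interval bars concrete, here is a minimal Python sketch that counts values per interval and prints a text histogram. The ages, the starting point, and the interval width are all hypothetical choices, and `histogram_counts` is an illustrative helper, not part of any SPSS interface.

```python
from collections import Counter


def histogram_counts(values, start, width):
    """Count values per equal-width interval [start + k*width, start + (k+1)*width)."""
    bins = Counter((v - start) // width for v in values)
    # Map each occupied bin index back to the lower bound of its interval.
    return {start + k * width: bins[k] for k in sorted(bins)}


# Hypothetical ages; intervals are 10 years wide starting at 10.
ages = [18, 22, 25, 31, 34, 38, 41, 45, 47, 52, 58, 63, 71, 89]
counts = histogram_counts(ages, start=10, width=10)

for lower, count in counts.items():
    print(f"{lower:>3}-{lower + 9:<3} {'#' * count}")
```

Intervals with no cases are simply omitted in this sketch; a real histogram would show them as gaps along the axis.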

Figure 4.2 Example of Histogram for Frequencies Output


4.6 Procedure: Frequencies
The Frequencies procedure is accessed from the Analyze…Descriptive Statistics…Frequencies
menu choice. With the Frequencies dialog box open:

1) Place one or more variables in the Variable(s) box.
2) Deselect the Display frequency tables check box for variables with many values.
3) Open the Statistics dialog to request summary statistics.
4) Open the Charts dialog to request graphs.

Figure 4.3 Frequencies Dialog Box

In the Statistics dialog:

1) Select appropriate measures of central tendency
2) Select appropriate measures of dispersion


Figure 4.4 Frequencies: Statistics Dialog Box

In the Charts dialog:

1) Ask for a histogram for scale variables.
2) Optionally, superimpose a normal curve on the histogram.


Figure 4.5 Frequencies: Charts Dialog Box

4.7 Demonstration: Frequencies
We will work with the Census.sav data file in this lesson.

In this demonstration we examine the distribution of number of brothers and sisters (sibs) and
respondent’s age. We would like to see the distribution of these variables.

Detailed Steps for Frequencies
1) Place the variables sibs and age in the Variable(s) box.
2) Deselect the Display frequency tables check box for variables with many values.
3) Select Mode, Median, Mean, Minimum, Maximum and Standard Deviation in the Statistics
dialog.
4) Select Histograms and Show normal curve on histogram in the Charts dialog.

Note: If you request histograms and summary statistics for scale variables with many distinct
values, you might want to uncheck (turn off) Display frequency tables in the Frequencies dialog
box, as there may be almost as many distinct values as there are cases in the data file.


Results from Frequencies
The table labeled Statistics shows the requested statistics.

Figure 4.6 Summary Statistics for Number of Brothers and Sisters and Age of Respondent

This table shows the number of cases having a valid value on sibs (2,021 cases) and age (2,013
cases), the number of cases having a (user- or system-) missing value (2 and 10, respectively), and
measures of central tendency and dispersion. For number of siblings, the minimum value is 0 and the
maximum value is 55, which seems unusual. For age, the minimum value is 18 and the maximum value
is 89. Note that the mean and median of each variable are similar, which is consistent with a roughly
symmetric distribution but does not by itself establish normality. We can visually check the
distribution of these variables with a histogram.


Figure 4.7 Histogram of Number of Brothers and Sisters

We can see that the lower range of values is truncated at 0 and that the number of people is greatest
between 0 and 6 siblings, although we do have some extreme values. The distribution is not normal.


Figure 4.8 Histogram of Age of Respondent

We can see that the lower range of values is truncated at 18 and the number of people is highest in
the middle age values (the “baby boomers”) with the number of cases tapering off at the higher ages
as we would expect. Thus, the age variable for respondents of this sample of adults is roughly
normally distributed.

Apply Your Knowledge
1. Suppose we have a variable region (with the categories north/east/south/west). Which of
these statements is true?
a. The mean is a meaningful statistic for region
b. The standard deviation is a meaningful statistic for region
c. The median is a meaningful statistic for region


2. See output below, with statistics for two variables: Current Salary and Beginning Salary (data
collected on employees). Which statements are correct?

a. There are 474 cases in the dataset
b. Both variables are skewed to the right (meaning that some employees have large salaries
compared to the average).
c. Half of the employees have a current salary below 30,750.
d. The highest current salary is 135,000.

3. See the histogram below for Current Salary (data collected on employees). Which of these
statements is correct?

a. The variable seems normally distributed
b. The variable is skewed to the right
c. The standard deviation would be smaller if the case with a salary of 135,000 were
removed from the histogram.


4.8 Summarizing Scale Variables using Descriptives
The Descriptives procedure is a good alternative to Frequencies when the objective is to summarize
scale variables. Descriptives is usually used to provide a table of statistical summaries (means,
standard deviations, variance, minimum, maximum, etc.) for several scale variables. The
Descriptives procedure also provides a succinct summary of the number of cases with valid values
for each variable included in the table as well as the number of cases with valid values for all
variables included in the table. These summaries are quite useful in evaluating the extent of missing
values in your data and in identifying variables with missing values for a large proportion of the data.

4.9 Requesting Descriptives
Running Descriptives is accomplished by following these steps:

1) Select variables for the Descriptives procedure.
2) Review the procedure output to investigate the distribution of the variables.

4.10 Descriptives Output
The figure below shows the Descriptives output table for a few variables.

Figure 4.9 Example Descriptives Output

The minimum and maximum provide an efficient way to check for values outside the expected range.
In general, this is a useful check for categorical variables as well. Thus, although mean and standard
deviation are not relevant for respondent’s sex, minimum and maximum for this variable show that
there are no values outside the expected range.

The last row in the table labeled Valid N (listwise) gives the number of cases that have a valid value
on all of the variables appearing in the table. In this example, 1333 cases have valid values for all three
variables listed. Although this number is not particularly useful for this set of variables, it would be
useful for a set of variables that you intended to use for a specific multivariate analysis. As you
proceed with your analysis plans, it is helpful to know how many cases have complete information
and which variables are likely to be the main sources of potential problems.
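
The difference between the per-variable N and Valid N (listwise) can be sketched in Python as follows. The four cases are hypothetical and `None` stands in for a missing value.

```python
# Hypothetical cases; None represents a missing value.
cases = [
    {"sibs": 2, "educ": 12, "age": 34},
    {"sibs": None, "educ": 16, "age": 51},
    {"sibs": 1, "educ": None, "age": None},
    {"sibs": 0, "educ": 14, "age": 27},
]
variables = ["sibs", "educ", "age"]

# N per variable: cases with a valid value on that variable alone.
valid_n = {v: sum(c[v] is not None for c in cases) for v in variables}

# Valid N (listwise): cases with a valid value on every variable in the list.
listwise_n = sum(all(c[v] is not None for v in variables) for c in cases)

print(valid_n, listwise_n)
```

Each variable here has three valid cases, yet only two cases are complete on all three, which is why the listwise count can be much smaller than any single N.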

4.11 Procedure: Descriptives
The Descriptives procedure is accessed from the Analyze…Descriptive Statistics…Descriptives
menu choice. With the Descriptives dialog box open:

1) Place one or more variables in the Variable(s) box.


Figure 4.10 Descriptives Dialog Box

Only numeric variables appear in the Descriptives dialog box. The Save standardized values as
variables feature creates new variables that are standardized forms of the original variables. These
new variables, referred to as z-scores, have values standardized to a mean of 0 and standard
deviation of 1.

The Options button allows you to select additional summary statistics to display. You can also select
the display order of the variables in the table (for example, by ascending or descending mean value).

4.12 Demonstration: Descriptives
We will work with the Census.sav data file in this lesson.

In this example we examine the summary statistics of number of siblings, respondent’s age,
education and respondent’s gender. We would like to see the summary statistics for these variables,
as well as how much missing data there is and whether there are unusual cases.

Detailed Steps for Descriptives
1) Place the variables sex, sibs, educ, and age in the Variable(s) box.

Results from Descriptives
The table labeled Descriptive Statistics contains the statistics.

Figure 4.11 Descriptives Output


The column labeled N shows the number of valid observations for each variable in the table. We see
there is little variation in the number of valid observations.

The number of valid cases can be a useful check on the data and help us determine which variables
might be appropriate for specific analyses. Here 2006 cases have valid values for the entire set of
questions.

The minimum and maximum provide an efficient way to check for values outside the expected range.
Here the maximum for the variable sibs seems high and deserves further investigation.

4.13 Summarizing Scale Variables using the Explore Procedure

Exploratory data analysis (EDA) was primarily developed by John Tukey. He devised several
statistical measures and plots designed to reveal data features that might not be readily apparent
from standard statistical summaries. Exploratory data analysis can be viewed either as an analysis in
its own right, or as a set of data checks that investigators perform before applying inferential testing
procedures.

These methods are best applied to variables with at least ordinal (more commonly interval) or scale
properties and which can take many different values. The plots and summaries would be less helpful
for a variable that takes on only a few values (for example, a five-point scale).

4.14 Requesting Explore
Running Explore is accomplished with these steps:

1) Select variables on which to report statistics in the Dependent List box
2) Select grouping variables in the Factor box
3) Request additional summary statistics and graphs.
4) Review the procedure output to investigate the summary statistics and distribution of the
variables including tables and graphs

Explore Output
The Descriptives table displays a series of descriptive statistics for age. From the previous table (not
shown), we know that these statistics are based on 1763 respondents.


Figure 4.12 Summaries for Age of Respondent

First, several measures of central tendency appear: the Mean, 5% Trimmed Mean, and Median.
These statistics attempt to describe with a single number where data values are typically found, or the
center of the distribution. Useful information about the distribution can be gained by comparing these
values to each other. If the mean were considerably above or below the median and trimmed mean, it
would suggest a skewed or asymmetric distribution.

The measures of central tendency are followed in the table by several measures of dispersion or
variability. These indicate to what degree observations tend to cluster or be widely separated. Both
the standard deviation (Std. Deviation) and Variance (standard deviation squared) appear. The
standard error (Std. Error) is an estimate of the standard deviation of the sample mean if repeated
samples of the same size (here 1763) were taken. It is used in calculating the 95% confidence interval
for the mean. Technically speaking, if we were to draw 100 samples of this size (1763) and construct
a 95% confidence interval for the mean from each of the 100 samples, then we would expect about 95
of these 100 intervals to contain the (unknown) population mean. Also appearing is the Interquartile
Range (often abbreviated to IQR), which is the range between the 25th and the 75th
percentile values. It is a variability measure more resistant to extreme scores than the standard
deviation. We also see the Minimum, Maximum and Range.
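
The dispersion measures above can be sketched in Python as follows. This is an approximation under stated assumptions: the confidence interval uses the normal critical value (about 1.96) rather than a t value, which is reasonable only for large samples; the quartile method is Python's default and may differ slightly from the percentile definition used in the output; the function name `describe` and the data are hypothetical.

```python
from math import sqrt
from statistics import NormalDist, mean, quantiles, stdev


def describe(values, confidence=0.95):
    """Return mean, standard error, approximate CI, IQR and range for a sample."""
    n = len(values)
    m = mean(values)
    sd = stdev(values)                 # sample standard deviation
    se = sd / sqrt(n)                  # standard error of the mean
    z = NormalDist().inv_cdf(0.5 + confidence / 2)  # about 1.96 for 95%
    q1, _, q3 = quantiles(values, n=4)              # 25th, 50th, 75th percentiles
    return {
        "mean": m,
        "std_error": se,
        "ci": (m - z * se, m + z * se),
        "iqr": q3 - q1,
        "range": max(values) - min(values),
    }


# Hypothetical ages for nine respondents.
summary = describe([20, 25, 30, 35, 40, 45, 50, 55, 60])
print(summary)
```

Note how the standard error shrinks with the square root of the sample size, so a sample of 1763 cases gives a much narrower interval than the nine cases used here.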

The final two statistical measures, Skewness and Kurtosis, provide numeric summaries about the
shape of the distribution of the data.

Skewness is a measure of the symmetry of a distribution. It measures the degree to which cases are
clustered towards one end of the distribution. It is normed so that a symmetric distribution has zero
skewness. A positive skewness value indicates bunching on the left and a longer tail on the right (for
example, income distribution); negative skewness follows the reverse pattern. The standard error of
skewness also appears in the Descriptives table and we can use it to determine if the data are
significantly skewed. One method is to use the standard error to calculate the 95% confidence
interval around the skewness. If zero is not in this interval, we could conclude that the distribution
is skewed. A second, equivalent method is to check whether the skewness value lies more than
1.96*(Standard error of skewness) from zero.
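
The second check can be sketched in Python. The formulas below are the commonly used bias-adjusted sample skewness and its standard error; that they match the values shown in the Descriptives table is an assumption, and the function names and data are hypothetical.

```python
from math import sqrt
from statistics import mean, stdev


def skewness(values):
    """Bias-adjusted sample skewness: n / ((n-1)(n-2)) * sum of cubed z-scores."""
    n = len(values)
    m, s = mean(values), stdev(values)
    return n / ((n - 1) * (n - 2)) * sum(((x - m) / s) ** 3 for x in values)


def skewness_std_error(n):
    """Standard error of skewness; depends only on the sample size."""
    return sqrt(6 * n * (n - 1) / ((n - 2) * (n + 1) * (n + 3)))


def is_significantly_skewed(values, z_crit=1.96):
    """True if skewness lies more than z_crit standard errors from zero."""
    return abs(skewness(values)) > z_crit * skewness_std_error(len(values))


# Hypothetical incomes (in thousands) with one very large value on the right.
income = [20, 22, 25, 27, 30, 31, 35, 40, 60, 150]
print(round(skewness(income), 2), is_significantly_skewed(income))
```

Because the standard error depends only on n, very large samples will flag even slight asymmetry as significant, which is one reason to look at the histogram as well.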

Kurtosis also has to do with the shape of a distribution and is a measure of how much of the data is
concentrated near the center, as opposed to the tails, of the distribution. It is normed to the normal
curve (for which kurtosis is zero). As an example, a distribution that has longer tails and is more
peaked in the middle than a normal distribution is referred to as a leptokurtic distribution and would
have a positive kurtosis measure. On the other hand, a platykurtic distribution is a flattened
distribution and has negative kurtosis values. A standard error for kurtosis also appears. The same
methods used for evaluating skewness can be used to evaluate the kurtosis values.
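
A comparable Python sketch for kurtosis is shown below. It uses the common bias-adjusted excess kurtosis (zero for a normal curve) and its standard error; that these match the table values is an assumption, and the function names and data are hypothetical.

```python
from math import sqrt
from statistics import mean, stdev


def kurtosis(values):
    """Bias-adjusted excess kurtosis; approximately 0 for normally distributed data."""
    n = len(values)
    m, s = mean(values), stdev(values)
    z4 = sum(((x - m) / s) ** 4 for x in values)
    return (n * (n + 1) / ((n - 1) * (n - 2) * (n - 3)) * z4
            - 3 * (n - 1) ** 2 / ((n - 2) * (n - 3)))


def kurtosis_std_error(n):
    """Standard error of kurtosis; depends only on the sample size."""
    return sqrt(24 * n * (n - 1) ** 2 / ((n - 3) * (n - 2) * (n + 3) * (n + 5)))


# Evenly spread (flat) hypothetical data: expect a negative, platykurtic value.
flat = list(range(1, 10))
print(round(kurtosis(flat), 2), round(kurtosis_std_error(len(flat)), 2))
```

The flat data give a negative value, matching the description of a platykurtic distribution; as with skewness, the value can be compared with 1.96 standard errors from zero.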

Since most analysts are content to view histograms in order to make judgments regarding the
distribution of a variable, skewness and kurtosis are infrequently used.

Figure 4.13 Histogram of Age of Respondent

The histogram shows the shape of the distribution. For this sample of adults, age is roughly normally
distributed, except that the distribution is truncated below 18 years.

Boxplots, also referred to as box & whisker plots, are a more easily interpreted plot to convey the
same information about the distribution of a variable. In addition, the boxplot graphically identifies
outliers. Below we see the boxplot for hours worked.


Figure 4.14 Boxplot of Hours Worked

The vertical axis represents the scale for the number of hours worked. The solid line inside the box
represents the median or 50th percentile. The top and bottom borders (referred to as “hinges”) of the
box correspond to the 75th and 25th percentile values of hours worked and thus define the
interquartile range (IQR). In other words, the middle 50% of data values fall within the box. The
“whiskers” (vertical lines extending from the top and bottom of the box) are the last data values that
lie within 1.5 box lengths (or IQRs) of the respective hinges (borders of box). Tukey considers data
points more than 1.5 box lengths from a hinge to be “outliers.” These points are marked with a circle.
Points more than 3 box lengths (IQR) from a hinge are considered by Tukey to be “far out” points and
are marked with an asterisk symbol (there are none here). This plot has many outliers. If a single
outlier exists at a data value, the case sequence number appears beside it (an ID variable can be
substituted), which aids data checking.

If the distribution were symmetric, the median would be centered within the box. In the plot above, the
median is toward the bottom of the box, indicating a positively skewed distribution.
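
The boxplot rules described above can be sketched in a few lines. This is a minimal illustration, assuming one particular quartile convention (Python's "inclusive" method); the hinges computed by a statistics package may differ slightly. The hours values are hypothetical.

```python
import statistics

def boxplot_summary(values):
    """Apply the 1.5 and 3 box-length rules to a list of numbers."""
    data = sorted(values)
    # Quartile convention is an assumption; packages may compute hinges differently
    q1, median, q3 = statistics.quantiles(data, n=4, method="inclusive")
    iqr = q3 - q1
    inner_low, inner_high = q1 - 1.5 * iqr, q3 + 1.5 * iqr
    outer_low, outer_high = q1 - 3 * iqr, q3 + 3 * iqr
    inside = [x for x in data if inner_low <= x <= inner_high]
    return {
        "median": median,
        "hinges": (q1, q3),
        "iqr": iqr,
        # Whiskers end at the last data values within 1.5 box lengths of the hinges
        "whiskers": (inside[0], inside[-1]),
        # Outliers: more than 1.5 but no more than 3 box lengths from a hinge
        "outliers": [x for x in data
                     if outer_low <= x < inner_low or inner_high < x <= outer_high],
        # "Far out" points: more than 3 box lengths from a hinge
        "far_out": [x for x in data if x < outer_low or x > outer_high],
    }

# Hypothetical hours worked, for illustration only
hours = [20, 30, 35, 38, 40, 40, 40, 40, 42, 45, 50, 60, 89]
summary = boxplot_summary(hours)
print(summary)
```

In this made-up sample the median (40) sits nearer the lower hinge (38) than the upper hinge (45), which is the pattern the text associates with positive skew.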

4.15 Procedure: Explore
The Explore procedure is accessed from the Analyze…Descriptive Statistics…Explore menu
choice. With the Explore dialog box open:

1) Place one or more scale variables to be summarized in the Dependent list box.
2) The Factor list box can contain one or more categorical variables, and if used would cause
the procedure to present summaries for each category of the factor variable(s).
3) We can request specific statistical summaries and plots using the Statistics and Plots
buttons.
4) The Options button specifies how missing data will be handled.

Figure 4.15 Explore Dialog Box

In the Plots dialog:

1) Request a histogram rather than a stem and leaf plot.

The stem & leaf plot (devised by Tukey) is modeled after the histogram, but contains more
information. For most purposes, the histogram is easier to interpret and more useful. By default, a
boxplot will be displayed for each scale variable.

Figure 4.16 Explore: Plots Dialog Box

In the Options dialog the user specifies how to deal with missing values:

1) Request pairwise rather than listwise deletion.

Figure 4.17 Explore: Options Dialog Box

When several variables are used you have a choice as to whether the analysis should be based on
only those observations with valid values for all variables in the analysis (called listwise deletion), or
whether missing values should be excluded separately for each variable (called pairwise deletion).
When only a single variable is considered both methods yield the same result, but they will not give
identical answers when multiple variables are analyzed in the presence of missing values.
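
The difference between the two deletion methods can be seen on a tiny made-up data set. The sketch below is only an illustration of the idea, with hypothetical records and a hypothetical helper name; it does not reproduce the Explore procedure itself.

```python
def values_used(records, variables, method):
    """Return, per variable, the values that enter the analysis."""
    if method == "listwise":
        # Keep only cases with valid values on ALL variables in the analysis
        complete = [r for r in records if all(r[v] is not None for v in variables)]
        return {v: [r[v] for r in complete] for v in variables}
    if method == "pairwise":
        # Exclude missing values separately for each variable
        return {v: [r[v] for r in records if r[v] is not None] for v in variables}
    raise ValueError("method must be 'listwise' or 'pairwise'")

# Hypothetical cases; None marks a missing value
records = [
    {"age": 34, "educ": 12},
    {"age": None, "educ": 16},
    {"age": 51, "educ": None},
    {"age": 27, "educ": 14},
]

listwise = values_used(records, ["age", "educ"], "listwise")
pairwise = values_used(records, ["age", "educ"], "pairwise")
print(len(listwise["age"]), len(pairwise["age"]))
```

With a single variable in the list the two methods keep the same cases, which matches the remark above.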

Rarely used, the Report values option includes cases with user-defined missing values in frequency
analyses, but excludes them from summary statistics and charts.

4.16 Demonstration: Explore
We will work with the Census.sav data file in this lesson.

In this example we examine the summary statistics of age and educ. We would like to see summary
statistics for these variables, as well as how much missing data we have and whether there are
unusual cases.

Detailed Steps for Explore
1) Place the variables age and educ in the Dependent list box
2) Deselect Stem and leaf and select Histogram in the Plots dialog
3) Request the Exclude cases pairwise method in the Options dialog

Results from Explore
The Explore procedure produces two tables followed by the requested charts for each variable. The
first table, Case Processing Summary, displays the number of valid and missing cases for each
variable. Each variable has little missing data. For example, 2013 cases (respondents) had valid
values for age, while 0.5% were missing.

Figure 4.18 Explore Case Processing Table

The Descriptives table displays a series of descriptive statistics for age and educ. From the previous
table, we know that these statistics are based on 2013 and 2018 respondents, respectively.

Comparing measures of central tendency can provide useful information about the distribution. Here
the mean, median and 5% trimmed mean are very close within each variable and this suggests either
that there are not many extreme scores, or that the number of high and low scores is balanced. If the
mean were considerably above or below the median and trimmed mean, it would suggest a skewed
or asymmetric distribution. A perfectly symmetric distribution, such as the normal distribution, would
produce identical means, medians and trimmed means.
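
To make the comparison concrete, the sketch below computes the mean, median and a 5% trimmed mean for two made-up samples. The trimming rule used here (drop a whole number of cases from each end) is a simplifying assumption; statistical packages may weight the boundary cases fractionally, so their trimmed means can differ slightly.

```python
import statistics

def trimmed_mean(values, proportion=0.05):
    """Mean after dropping `proportion` of the cases from each end (simplified rule)."""
    data = sorted(values)
    k = int(len(data) * proportion)          # whole cases trimmed per end
    kept = data[k:len(data) - k] if k else data
    return statistics.fmean(kept)

def central_tendency(values):
    return {
        "mean": statistics.fmean(values),
        "median": statistics.median(values),
        "trimmed_mean": trimmed_mean(values),
    }

# Hypothetical data, for illustration only
symmetric_ages = list(range(20, 81))
skewed_incomes = [20, 22, 25, 27, 30, 31, 33, 35, 38, 40,
                  42, 45, 48, 50, 55, 60, 70, 90, 150, 400]

print(central_tendency(symmetric_ages))
print(central_tendency(skewed_incomes))
```

For the symmetric sample the three measures coincide; for the skewed sample the mean is pulled well above the median and the trimmed mean, the pattern the text describes for a positively skewed distribution.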

Figure 4.19 Summaries for Age of Respondent and Highest Year of School Completed

The standard deviation of age, 17.351, indicates that ages typically vary by about 17 years above or
below the mean. The standard error is used in calculating the 95% confidence interval for the sample mean.
The interquartile range of 26 indicates that the middle 50% of the sample lies within a range of 26
years.
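
As a small worked example of how the standard error feeds the confidence interval, the sketch below uses the standard deviation (17.351) and the number of valid cases (2013) reported for age. The multiplier 1.96 is the large-sample normal value and is an assumption here; the sample mean of 47.0 is a hypothetical placeholder, since the actual mean appears only in the output figure.

```python
import math

def standard_error(sd, n):
    """Standard error of the mean: SD divided by the square root of the sample size."""
    return sd / math.sqrt(n)

def confidence_interval(mean, sd, n, z=1.96):
    """Approximate 95% confidence interval for the mean (normal approximation)."""
    half_width = z * standard_error(sd, n)
    return mean - half_width, mean + half_width

se_age = standard_error(17.351, 2013)
# 47.0 is a hypothetical mean used only to show the calculation
low, high = confidence_interval(47.0, 17.351, 2013)
print(round(se_age, 3), round(low, 2), round(high, 2))
```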

The shape of the distribution can be of interest in its own right. Also, assumptions are made about the
shape of the data distribution within each group when performing significance tests on mean
differences between groups. The shape of the distributions of both these variables does not appear
problematic.

Figure 4.20 Histogram of Age of Respondent

Figure 4.21 Histogram of Highest Year of School Completed

Below we see the boxplots for age and educ. Boxplots are particularly useful for obtaining an overall
“feel” for a distribution. The median tells us the location or central tendency of the data. The length of
the box indicates the amount of spread within the data, and the position of the median in relation to
the box tells us something of the nature of the distribution. Boxplots are also useful when comparing
several groups.

Figure 4.22 Boxplot of Age of Respondent

The plot for age has no outliers while the plot for educ has a few outliers on the lower end of the
distribution.

Figure 4.23 Boxplot of Highest Year of School Completed

If suspicious outliers appear in your data you should check whether they are data errors. If not, you
need to consider whether you wish them included in your analysis. This is especially problematic
when dealing with a small sample (not the case here), since an outlier can substantially influence the
analysis.

We would not argue that something of interest always appears through use of the methods of
exploratory data analysis. However, you can quickly glance over these results, and if anything strikes
your attention, pursue it in more detail. The possibility of detecting something unusual encourages the
use of these techniques.

Apply Your Knowledge
1. See the output below, with statistics for Current Salary (data collected on employees). Which
statements are correct?

a. If repeated samples of the same sample size (here 467) were taken, and we record
the sample mean for each sample, then we expect a standard deviation of 795.399
for these sample means.
b. The median is lower than the mean, indicating that the distribution is skewed to the
left.
c. 50% of the salaries are in a range of salary of 13,500.

2. True or false? See the boxplot for variable X. The boxplot of this variable indicates a normal
distribution?

Additional Resources

Further Info: For additional information on exploratory data analysis, see:

Tukey, John W. 1977. Exploratory Data Analysis. Reading, MA: Addison-Wesley.

4.17 Lesson Summary
In this lesson we explored how to obtain summary statistics for scale variables. We reviewed three
procedures (Frequencies, Descriptives and Explore) that can help us obtain that information.

Lesson Objectives Review
Students who have completed this lesson should now be able to:

• Request and interpret appropriate summary statistics for scale variables

To support the achievement of the primary objective, students should now also be able to:

• Use the options in the Frequencies, Descriptives, and Explore procedures
• Interpret the results of the Frequencies, Descriptives, and Explore procedures

4.18 Learning Activity
In this set of learning activities you will use the Drinks.sav data file. The overall goal of this learning
activity is to obtain summary statistics for scale variables.

Supporting Materials: The file Drinks.sav is a PASW Statistics data file that contains data on 35
beverages. Included is information on their characteristics (e.g., % alcohol), price, origin, and a
rating of quality.

1. Run Frequencies on the variable alcohol, requesting the summary statistics median and
mean, plus a histogram with a superimposed normal curve. Suppress the display of the
frequency table.

2. What is the value of alcohol that splits the distribution in half? Is the median the
same as the mean? Which value is lower? What does that tell you about the shape of the
distribution of alcohol?

3. Does the histogram verify your description of the distribution of alcohol? How does it differ
from a normal distribution?

4. Run Descriptives to obtain default statistics for price and calories. On which variable is there
more dispersion? Is it even realistic to compare these two variables since they are on
different scales?

5. Continuing your analysis of price and calories, run the Explore procedure for these two
variables. Request a histogram in addition to the defaults.

6. Does the standard error of each variable help you better determine which variable has more
dispersion?

7. Review the boxplots and histogram for each variable. Which one has more outliers? What are
the outliers on each? Which variable now appears to have more dispersion, based on these
graphs? Does that match what you expected based on the statistics?

MAKING INFERENCES ABOUT POPULATIONS FROM SAMPLES

Lesson 5: Making Inferences about Populations from Samples

5.1 Objectives

After completing this lesson students will be able to:

• Explain how to make inferences about populations from samples

To support the achievement of the primary objective, students will also be able to:

• Explain the influence of sample size
• Explain the nature of probability
• Explain hypothesis testing
• Explain different types of statistical errors and power
• Explain differences between statistical and practical importance

5.2 Introduction
Ideally, for any analysis we would have data about everyone we wished to study (i.e., the whole
population). In practice, we rarely have information about all members of our population and instead
collect information from a representative sample of the population. However, our goal is to make
generalizations about various characteristics of the population based on the known facts about the
sample. In this lesson we discuss the requirements of making inferences to populations from
samples.

Business Context
Understanding how to make inferences from a sample to a population is the basis of inferential
statistics. This allows us to reach conclusions about the population without the need to study every
single individual. Hypothesis testing allows researchers to develop hypotheses which are then
assessed to determine the probability or likelihood of the findings.

Supporting Materials
None

5.3 Basics of Making Inferences about Populations from Samples

We choose a sample with the intention of using the data from that sample to make inferences about
the “true” values in the population. These population measures are referred to as parameters while
the equivalent measures from samples are known as statistics. It is unlikely that we will know the
population parameters; therefore we use the sample statistics to infer what these population values
will be.

An important distinction between parameters and statistics is that parameters are fixed (although
often not known) while statistics vary from one sample to another. Due to the effects of random
variability, it is unlikely that any two samples drawn from the same population will produce the same
statistics. By plotting the values of a particular statistic (e.g., the mean) from a large number of
samples, it is possible to obtain a sampling distribution of the statistic. For small numbers of
samples, the mean of the sampling distribution may not closely resemble that of the population.
However, as the number of samples taken increases, the mean of the sampling distribution
(the mean of all the means, if you like) gets closer to the population mean. For an infinitely large
number of samples, the mean will be exactly the same as the population mean. Additionally, as sample size
increases, the amount of variability in the distribution of sample means decreases. If you think of
variability in terms of the error made in estimating the mean, it should be clear that the more evidence
you have (i.e., the more cases in your sample), the smaller will be the error in estimating the mean.

If repeated random samples of size N are drawn from any population, then as N becomes large, the
sampling distribution of sample means approaches normality—a phenomenon known as the Central
Limit Theorem. This is an extremely useful statistical concept as it does not require that the original
population distribution is normal. In the next section, we’ll take a closer look at the influence of sample
size on the precision of the statistics.
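
A small simulation can make the Central Limit Theorem tangible. The sketch below draws repeated samples from a strongly skewed population (an exponential distribution with mean 1, chosen here purely as an assumption for illustration) and tracks the mean and the skewness of the resulting sample means as the sample size N grows. The seed, the number of samples and the sample sizes are arbitrary choices.

```python
import random
import statistics

def skewness(values):
    """Simple moment-based skewness (0 for a symmetric distribution)."""
    m = statistics.fmean(values)
    m2 = statistics.fmean([(x - m) ** 2 for x in values])
    m3 = statistics.fmean([(x - m) ** 3 for x in values])
    return m3 / m2 ** 1.5

def sample_means(sample_size, n_samples, rng):
    """Means of repeated random samples from a skewed (exponential) population."""
    return [
        statistics.fmean([rng.expovariate(1.0) for _ in range(sample_size)])
        for _ in range(n_samples)
    ]

rng = random.Random(42)          # fixed seed so the run is repeatable
results = {}
for size in (4, 25, 100):
    means = sample_means(size, 2000, rng)
    results[size] = (statistics.fmean(means), skewness(means))
    print(size, results[size])
```

The mean of the sample means stays near the population mean of 1, while their skewness shrinks toward zero as N increases, even though the population itself is far from normal.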

5.4 Influence of Sample Size
In statistical analysis sample size plays an important role, but one that can easily be overlooked since
a minimum sample size is not required for the most commonly used statistical tests. Here we will
demonstrate the effect of sample size in two common data analysis situations: crosstabulation tables
and mean summaries.

Precision of Percentages
Precision is strongly influenced by the sample size. In the figures below we present a series of
crosstabulation tables containing identical percentages, but with varying sample sizes. We will
observe how the test statistics change with sample size and relate this result to the precision of the
measurement.

The Chi-square test of independence is presented for each table to show the effect of changing
sample size. (See the lesson “Combining Categorical Variables, Crosstabulations and the Chi-Square
Statistic” for a detailed discussion of the chi-square test.)

Sample Size of 100
The table below displays responses of men and women to a question asking for which candidate they
would vote.

Figure 5.1 Crosstab Table with Sample of 100

• 46% of the men and 54% of the women choose candidate A, a difference of 8 percentage points
• The Chi-square test assesses whether men differ from women in the population
• Significance value of .424 is the probability of observing a difference at least this large in the
sample if men and women share the same view in the population; at the conventional .05 level
the difference is not statistically significant
• Cannot conclude there is a gender difference with a sample of 100 people

Sample Size of 400
Now we view a table with percentages identical to the previous one, but based on a sample of 400
people, four times as large as before.

Figure 5.2 Crosstabulation Table with Sample of 400

• 46% of the men and 54% of the women choose candidate A, a difference of 8 percentage points
• Significance value of .110 for the Chi-Square test is the probability of observing a difference at
least this large in the sample if men and women share the same view in the population; at the
conventional .05 level the difference is not statistically significant
• Cannot conclude there is a gender difference with a sample of 400 people

Sample Size of 1,600
Finally we present the same table of percentages, but increase the sample size to 1,600; the increase
is once again by a factor of four.

Figure 5.3 Crosstabulation Table with Sample of 1,600

• 46% of the men and 54% of the women choose candidate A, a difference of 8 percentage points
• Significance value of .001 for the Chi-Square test is the probability of observing a difference at
least this large in the sample if men and women share the same view in the population; at the
conventional .05 level men and women differ significantly
• Conclude there is a gender difference with a sample of 1,600 people
• We have more precise estimates as our sample size increases, so the 8-point sample difference
can be estimated more accurately
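
The three tables can be re-created in a short sketch to see how the same percentages lead to different significance values. The cell counts below assume equal numbers of men and women in each sample, which is an assumption not stated in the text, although it does reproduce the reported values of .424, .110 and .001. The function names are hypothetical.

```python
import math

def chi_square_2x2(table):
    """Pearson chi-square test of independence for a 2x2 table (df = 1)."""
    (a, b), (c, d) = table
    n = a + b + c + d
    row_totals = (a + b, c + d)
    col_totals = (a + c, b + d)
    chi2 = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n
            chi2 += (observed - expected) ** 2 / expected
    # Upper-tail probability of a chi-square with 1 degree of freedom
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p_value

def vote_table(n_total):
    """Rows: men, women. Columns: candidate A, other. Assumes equal group sizes."""
    per_group = n_total // 2
    men_a = round(0.46 * per_group)
    women_a = round(0.54 * per_group)
    return [[men_a, per_group - men_a], [women_a, per_group - women_a]]

for n_total in (100, 400, 1600):
    chi2, p = chi_square_2x2(vote_table(n_total))
    print(n_total, round(chi2, 2), round(p, 3))
```

Each fourfold increase in sample size multiplies the chi-square statistic by four while the percentages stay fixed, which is why the significance value falls.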

Sample Size and Precision
In the series of crosstabulation tables above we saw that as the sample size increased we were more
likely to conclude there was a statistically significant difference between two groups when the
magnitude of the sample difference was constant (8%). This is because the precision with which we
estimate the population percentage increases with increasing sample size. This relation can be
approximated (see note for the exact relationship) by a simple equation: the precision of a sample
proportion is approximately equal to one divided by the square root of the sample size. The table
below displays the precision for the sample sizes used in our examples.

Table 5.1 Sample Size and Precision for Different Sample Sizes

Sample Size    Calculation               Precision
100            1/sqrt(100) = 1/10        .10 or 10%
400            1/sqrt(400) = 1/20        .05 or 5%
1600           1/sqrt(1600) = 1/40       .025 or 2.5%

And to obtain a precision of 1%, we would need a sample of 10,000 (1/sqrt(10,000) = 1/100).

Since precision increases as the square root of the sample size, in order to double the precision we
must increase the sample size by a factor of four. This is an unfortunate and expensive fact of
research. In practice, samples between 500 and 1,500 are often selected for national studies.
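
The rule of thumb and the exact relationship from the note (precision equal to two standard errors of a proportion) can be compared directly. This is a minimal sketch; the function names are hypothetical.

```python
import math

def approx_precision(n):
    """Rule of thumb: precision is about 1 / sqrt(sample size)."""
    return 1 / math.sqrt(n)

def exact_precision(p, n):
    """Two standard errors of a sample proportion p with sample size n."""
    return 2 * math.sqrt(p * (1 - p) / n)

for n in (100, 400, 1600, 10000):
    print(n, approx_precision(n), exact_precision(0.5, n), exact_precision(0.1, n))
```

Quadrupling the sample size halves the precision value in every column, and a proportion away from .5 gives a smaller (better) value than the rule of thumb suggests.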

Precision of Means
The same basic relation—that precision increases with the square root of the sample size—applies to
sample means as well. To illustrate this we display histograms based on different samples from a
normally distributed population with mean 70 and standard deviation 10.

Further Info

Formally, for a binomial or multinomial distribution (a variable measured on a nominal or ordinal
scale), the standard error of the sample proportion (P) is equal to

\[ \mathrm{StdErr}(P) = \sqrt{\frac{P(1-P)}{N}} \]

Thus the standard error is a maximum when P = .5 and reaches a minimum of 0 when P = 0 or 1. A 95%
confidence band is usually determined by taking the sample estimate plus or minus twice the standard
error. Precision (pre) here is simply two times the standard error. Thus precision (pre) is

\[ \mathrm{pre}(P) = 2\sqrt{\frac{P(1-P)}{N}} \]

If we substitute for P the value .5 which maximizes the expression (and is therefore conservative) we
have

\[
\mathrm{pre}(0.5) = 2\sqrt{\frac{0.5\,(1-0.5)}{N}}
                  = 2\sqrt{\frac{0.5 \cdot 0.5}{N}}
                  = \frac{2 \cdot 0.5}{\sqrt{N}}
                  = \frac{1}{\sqrt{N}}
\]

This validates the rule of thumb used in the chapter. Since the rule of thumb employs the value of
P=.5, which maximizes the standard deviation and thus the standard error, in practice, greater
precision would be obtained when P departs from .5.

A Large Sample of Individuals
Below is a histogram of 10,000 observations drawn from a normal distribution of mean 70 and
standard deviation 10. This is for one sample.

Figure 5.4 Histogram of 10,000 Observations

• Sample of this size closely matches its population
• Sample mean is very close to 70
• Sample standard deviation is near 10
• Shape of the distribution is normal

Means Based on Samples of 10
The second histogram displays 1,000 sample means drawn from the same population (mean 70,
standard deviation 10). Here each observation is a mean based on only 10 data points. In other
words we select 1,000 samples of ten observations each and plot their means in the histogram.

Figure 5.5 Histogram of Means Based on Samples of 10

• Mean of the sample means is very close to 70
• Standard deviation of the sample means is reduced to 3.11
• Less variation among means based on groups of observations than among the observations
themselves
• Shape of the distribution is normal

Means Based on Samples of 100
The next histogram is based on a sample of 100 means where each mean represents 100
observations. Here each observation is a mean based on 100 data points. In other words we select
100 samples of 100 observations each and plot their means in the histogram.

Figure 5.6 Histogram of Means Based on Samples of 100

• Mean of the sample means is very close to 70
• Standard deviation of the sample means is reduced to 1.00
• Less variation among means based on groups of observations than among the observations
themselves
• Shape of the distribution is normal

Thus with means as well as percents, precision increases with the square root of the sample size.
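
The two histograms of means can be approximated with a short simulation from the same population (mean 70, standard deviation 10). This is a sketch only: the seed is arbitrary, so the simulated standard deviations will be close to, but not identical with, the 3.11 and 1.00 reported above. The theoretical values are 10/sqrt(10), about 3.16, and 10/sqrt(100), exactly 1.

```python
import random
import statistics

def simulate_means(n_means, sample_size, rng, mu=70, sigma=10):
    """Draw n_means samples of sample_size observations and return their means."""
    return [
        statistics.fmean([rng.gauss(mu, sigma) for _ in range(sample_size)])
        for _ in range(n_means)
    ]

rng = random.Random(0)                       # arbitrary seed for repeatability
means_10 = simulate_means(1000, 10, rng)     # 1,000 means of 10 observations each
means_100 = simulate_means(100, 100, rng)    # 100 means of 100 observations each

print(statistics.fmean(means_10), statistics.stdev(means_10))
print(statistics.fmean(means_100), statistics.stdev(means_100))
```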

Apply Your Knowledge
1. Consider the population distribution of the variable income, depicted below. Note, this
distribution is highly skewed. If we are sampling repeatedly from this distribution, each time
with sample size = 1000, then what is the distribution of the sample means?

a. The sample means show the same distribution as income
b. The sample means show a normal distribution
c. The distribution of the sample means is unknown

2. Which of these statements is correct?

a. The bigger the sample, the higher the precision with which we estimate a population
percentage
b. The bigger the sample, the lower the precision with which we estimate a population
mean
c. A sample size of 2000 will give a precision for a population percentage that is twice
as precise as the precision for this percentage with sample size 1000
d. If a population percentage is 0 or if it is 100, the standard error for the sample
percentage will be 0

3. True or false? A population parameter varies from sample to sample?

5.5 Hypothesis Testing
Whenever we wish to make an inference about a population from our sample, we must specify a
hypothesis to test. It is common practice to state two hypotheses: the null hypothesis (also known
as H0) and the alternative hypothesis (H1). The null hypothesis being tested is conventionally the
one in which no effect is present. For example, we might be looking for differences in mean income
between males and females, but the (null) hypothesis we are testing is that there is no difference
between the groups. If the evidence is such that this null hypothesis is unlikely to be true, the
alternative hypothesis should be accepted. Another way of thinking about the problem is to make a
comparison with the criminal justice system. Here, a defendant is treated as innocent (i.e., the null
hypothesis is accepted) until there is enough evidence to suggest that they perpetrated the crime
beyond any reasonable doubt (i.e., the null hypothesis is rejected).

The alternative hypothesis is generally (although not exclusively) the one we are really interested in
and can take any form. In the above example, we might hypothesize that males will have a higher
mean income than females. When the alternative hypothesis has a “direction” (we expect a specific
result), the test is referred to as one-tailed. Often, you do not know in which direction to expect a
difference and may simply wish to leave the alternative hypothesis open-ended. This is a two-tailed
test and the alternative hypothesis would simply be that the mean incomes of males and females are
different.

Whichever option you choose will have implications when interpreting the probability levels. In
general, the probability of the occurrence of a particular statistic for a one-tailed test will be half that of
a two-tailed test as only one extreme of the distribution is being considered in the former type of test.
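
The halving of the probability can be checked numerically. The sketch below assumes a test statistic that follows the standard normal distribution and uses a hypothetical value of z = 1.8; the one-tailed figure applies only when the observed difference lies in the hypothesized direction.

```python
from statistics import NormalDist

def tail_probabilities(z):
    """One- and two-tailed probabilities for a standard normal test statistic."""
    upper = 1 - NormalDist().cdf(abs(z))     # area beyond |z| in one tail
    return {"one_tailed": upper, "two_tailed": 2 * upper}

probs = tail_probabilities(1.8)              # z = 1.8 is a made-up example value
print(round(probs["one_tailed"], 4), round(probs["two_tailed"], 4))
```

With this value the one-tailed probability falls below .05 while the two-tailed probability does not, which shows why the choice of test must be made before looking at the data.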

5.6 The Nature of Probability
Descriptive statistics describe the data in our sample through the use of a number of summary
procedures and statistics. Inferential statistics allow us to infer the results from the sample on
which we have data to the population which the sample represents. To do this, we use
procedures that involve the calculation of probabilities. The fundamental issue with inferential
statistical tests concerns whether any “effects” (relationships or differences between groups) we have
found are genuine or are a result of sampling variability (in other words, mere chance). So we have
two hypotheses and we want to know which hypothesis is true. The way hypotheses are assessed is
by calculating the probability or the likelihood of finding our result. A probability value, which can
range from 0 to 1 (corresponding to 0% to 100% in terms of percentages), can be defined as “the
mathematical likelihood of a given event occurring,” and as such we can use such values to
assess the likelihood that any differences we have found are the result of random chance.

Now in statistics, we want to be sure of our conclusions, so having formally stated your hypotheses,
you must then select a criterion for acceptance or rejection of the null hypothesis. With probability
tests such as the chi-square test or the t-test, you are testing the likelihood that a statistic of the
magnitude obtained (or greater) would have occurred by chance assuming that the null hypothesis is
true. You always assess the null hypothesis, which is the hypothesis that states there is no difference
or relationship. In other words, we only wish to reject the null hypothesis when we can say that the
result would have been extremely unlikely under the conditions set by the null hypothesis. In this
case, the alternative hypothesis should be accepted. It is worth noting that this does not “prove” the
alternative hypothesis beyond doubt; it merely tells us that the null hypothesis is unlikely to be true.

But what criterion (or alpha level, as it is often known) should we use? Unfortunately, there is no
easy answer! Traditionally, a 5% level is chosen, indicating that a statistic of the size obtained would
only be likely to occur on 5% of occasions (or once-in-twenty) should the null hypothesis be true. This
also means that, by choosing a 5% criterion, you are accepting that you will make a mistake in
rejecting the null hypothesis 5% of the time.

5.7 Types of Statistical Errors
Recall that when performing statistical tests we are generally attempting to draw conclusions about
the larger population based on information collected in the sample. There are two major types of
errors in this process. False positives, or Type I errors, occur when no difference (or relation) exists in
the population, but the sample tests indicate there are significant differences (or relations). Thus the
researcher falsely concludes a positive result. This type of error is explicitly taken into account when
performing statistical tests. When testing for statistical significance using a .05 criterion (alpha level),
we acknowledge that if there is no effect in the population then the sample statistic will exceed the
criterion on average 5 times in 100 (.05).
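
The "5 times in 100" statement can be illustrated by simulation. In the sketch below both groups are drawn from the same population, so every significant result is a Type I error. The test used is a two-sample z test with a known standard deviation, chosen as a simplifying assumption; the population values, group size, number of trials and seed are arbitrary.

```python
import math
import random
from statistics import NormalDist, fmean

def two_sample_z_p(a, b, sigma):
    """Two-tailed p value for a difference in means when sigma is known."""
    se = sigma * math.sqrt(1 / len(a) + 1 / len(b))
    z = (fmean(a) - fmean(b)) / se
    return 2 * (1 - NormalDist().cdf(abs(z)))

def false_positive_rate(trials, n, alpha, rng, mu=70, sigma=10):
    """Share of trials that reject the null hypothesis although it is true."""
    rejections = 0
    for _ in range(trials):
        group_a = [rng.gauss(mu, sigma) for _ in range(n)]
        group_b = [rng.gauss(mu, sigma) for _ in range(n)]   # same population
        if two_sample_z_p(group_a, group_b, sigma) < alpha:
            rejections += 1
    return rejections / trials

rate = false_positive_rate(trials=2000, n=30, alpha=0.05, rng=random.Random(1))
print(rate)
```

The observed rate should land close to the chosen alpha level, apart from simulation noise.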

Type II errors, or false negatives, are mistakes in which there is a true effect in the population
(difference or relation) but the sample test statistic is not significant, leading to a false conclusion of
no effect. To put it briefly, a true effect remains undiscovered. The probability of making this type of
error is often referred to as the beta level. Whereas you can select your own alpha levels, beta levels
are dependent upon things such as the alpha level and the size of the sample. It is helpful to note that
statistical power, the probability of detecting a true effect, equals 1 minus the probability of a Type II
error; the higher the power, the better.
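
How the beta level depends on the alpha level and the sample size can be sketched with a normal-approximation power formula for comparing two group means. The formula is a standard large-sample approximation and is an assumption here, as is the effect size of 0.8 standard deviation units; dedicated power software uses more exact methods and may give slightly different numbers.

```python
import math
from statistics import NormalDist

def power_two_sample(effect_size, n_per_group, alpha=0.05):
    """Approximate power of a two-tailed, two-group comparison of means.

    effect_size is the true mean difference in standard deviation units.
    """
    norm = NormalDist()
    z_crit = norm.inv_cdf(1 - alpha / 2)
    shift = effect_size * math.sqrt(n_per_group / 2)
    # Probability that the test statistic falls beyond either critical value
    return (1 - norm.cdf(z_crit - shift)) + norm.cdf(-z_crit - shift)

for n in (10, 20, 33):
    for alpha in (0.05, 0.01):
        power = power_two_sample(0.8, n, alpha)
        print(n, alpha, round(power, 3), round(1 - power, 3))   # power and beta
```

Larger samples raise power (lower beta), while a stricter alpha level lowers power for the same sample size.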

Table 5.2 Types of Statistical Errors in Hypothesis Testing
                                  Statistical Test Outcome
Population                        Not Significant                      Significant
No Difference (H0 is True)        Correct                              Type I error (α), False positive
True Difference (H1 is True)      Type II error (β), False negative    Correct (Power)

Statistical Power Analysis
With increasing precision we are better able to detect small differences that exist between groups and
small relationships between variables. Power analysis was developed to aid researchers in
determining the minimum sample size required in order to have a specified chance of detecting a true
difference or relationship of a given size. To put it more simply, power is used to quantify your ability
to reject the null hypothesis when it is false. For example, suppose a researcher hopes to find a mean
difference of .8 standard deviation units between two populations. A power calculation can determine
the sample size necessary to have a 90% chance that a significant difference will be found between
the sample means when performing a statistical test at a specified significance level. Thus a
researcher can evaluate whether the sample is large enough for the purpose of the study. Books by
Cohen (1988) and Kraemer & Thiemann (1987) discuss power analysis and present tables used to
perform the calculation for common statistical tests. In addition specialty software is available for such
analyses, such as SamplePower®. Power analysis can be very useful when planning a study, but
does require such information as the magnitude of the hypothesized effect and an estimate of the
variance.

5.8 Statistical Significance and Practical Importance
A related issue involves drawing a distinction between statistical significance and practical
importance. When an effect is found to be statistically significant we conclude that the population
effect (difference or relation) is not zero. However, this allows for a statistically significant effect that is
not quite zero, yet so small as to be insignificant from a practical or policy perspective. This notion of
practical or real world importance is also called ecological significance. Recalling our discussion of
precision and sample size, very large samples yield increased precision, and in such samples very
small effects may be found to be statistically significant. In such situations, the question arises as to
whether the effects make any practical difference. For example, suppose a company is interested in
customer ratings of one of its products and obtains rating scores from several thousand customers.
Furthermore, suppose mean ratings on a 1 to 5 satisfaction scale are 3.25 for male and 3.15 for
female customers, and this difference is found to be significant. Would such a small difference be of
any practical interest or use?

When sample sizes are small (say under 30), precision tends to be poor and so only relatively large
(and ecologically significant) effects are found to be statistically significant. With moderate samples
(say 50 to one or two hundred), small effects tend to show modest significance while large effects are
highly significant. For very large samples, several hundreds or thousands, small effects can be highly
significant; thus an important aspect of the analysis is to examine the effect size and determine if it is
important from a practical, policy or ecological perspective.
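
The customer rating example can be turned into a small calculation that separates the significance value from the effect size. The means of 3.25 and 3.15 come from the text; the standard deviation of 1.0 and the group sizes are hypothetical, and the z test with a known standard deviation is a simplifying assumption.

```python
import math
from statistics import NormalDist

def compare_means(mean_a, mean_b, sd, n_per_group):
    """Return (z, two-tailed p, effect size in SD units) for two equal-sized groups."""
    se = sd * math.sqrt(2 / n_per_group)
    z = (mean_a - mean_b) / se
    p = 2 * (1 - NormalDist().cdf(abs(z)))
    effect_size = (mean_a - mean_b) / sd     # does not depend on sample size
    return z, p, effect_size

# Same 0.10 difference in mean rating at three hypothetical group sizes
for n in (30, 300, 3000):
    z, p, d = compare_means(3.25, 3.15, sd=1.0, n_per_group=n)
    print(n, round(z, 2), round(p, 4), round(d, 2))
```

The effect size stays at one tenth of a standard deviation throughout; only the significance value changes with the sample size, which is why the effect size should be examined alongside it.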

Apply Your Knowledge
1. Which of these statements is correct?

a. The nature of the null hypothesis is: there is no effect/difference
b. The nature of the null hypothesis is: there is an effect/difference
c. The nature of the alternative hypothesis is: there is no effect/difference

2. Which of these statements is correct?

a. If we reject the null hypothesis while the null hypothesis is true in reality, then we make a
Type II error.

b. If we reject the null hypothesis while the null hypothesis is true in reality, then we make a
Type I error.

c. If we reject the null hypothesis, while the alternative hypothesis is true in reality, then we
make a Type I error.

d. If we reject the null hypothesis, while the alternative hypothesis is true in reality, then we
make a Type II error.

3. Which of these statements is false?

a. The probability of detecting a true effect is known as “power.”
b. Probability of making a Type II error + the probability of detecting a true effect=1
c. With an alpha level of .01, the probability of a Type II error will always be lower than that.
d. With Alpha=.000001 the probability to commit a Type I error is lower than with Alpha=0.1.

5.9 Lesson Summary
In this lesson, we explored how to make inferences from a sample to a population. This allows us to
reach conclusions about the population without the need to study every single individual. Hypothesis
testing allows researchers to develop hypotheses which are then assessed to determine the
probability or likelihood of the findings.

Lesson Objectives Review
Students who have completed this lesson should now be able to:

• Explain how to make inferences about populations from samples

To support the achievement of the primary objective, students should now also be able to:

• Explain the influence of sample size
• Explain the nature of probability
• Explain hypothesis testing
• Explain different types of statistical errors and power
• Explain differences between statistical and practical importance

5.10 Learning Activity
In this set of learning activities you won’t need any supporting material.

State the null and alternative hypotheses when assessing each of the following scenarios. Would you
do a one- or two-tailed test for each?

1. The relationship between gender and belief in the afterlife.
2. The relationship between number of years of education and income.
3. The difference in mean income between men and women.
4. The difference in happiness in marriage among married persons in America, Europe,
Asia, Australia and Africa.

Lesson 6: Relationships Between Categorical Variables
6.1 Objectives
After completing this lesson students will be able to:

• Perform crosstab analysis on categorical variables
• Perform a statistical test to determine whether there is a statistically significant relationship
between categorical variables

To support the achievement of this primary objective, students will also be able to:

• Use the options in the Crosstabs procedure
• Request appropriate statistics for a crosstabulation
• Interpret cell counts and percents in a crosstabulation
• Use the Chi-Square test, interpret its results, and check its assumptions
• Use the Chart Builder to visualize a crosstabulation

6.2 Introduction
Many data analysts consider crosstabulations the core of data analysis. Crosstabulations display the
joint distribution of two or more categorical variables. In this lesson we will provide examples and
advice on how best to construct and interpret crosstabulations. With PASW Statistics, statistical tests
are used to determine whether a relationship between two, or more, variables is statistically
significant in a crosstabulation. Another way to state this is that a statistical test is done to determine
whether a relationship observed in the sample is caused by chance sampling variation or instead is
likely to exist in the population of interest. To support the analysis, we also show examples of
graphical displays of crosstabulations.

Business Context
When analyzing data, the focus is on both descriptive and causal relationships. When we examine a
table with categorical variables, we would like to know whether a relationship we observe is likely to
exist in our target population or instead is caused by random sampling variation. We might want to
know whether:

• Satisfaction with the instructor in a training workshop was related to satisfaction with the
course material.

• Eating more often at fast-food restaurants was related to more frequent shopping at
convenience stores.

• Certain types of people are more likely to buy laptop versus desktop computers.

Statistical testing tells us whether two categorical variables are related. Without that, we might make
decisions based on observed category percentage differences that are not likely to exist in a
population of customers.

6.3 Crosstabs
Crosstabulations are commonly used to explore how demographic characteristics are related to
attitudes and behaviors, but they are also used to see how one attitude is related to another. The key
point is that crosstabulations are used to study the relationships between two, or more, categorical
variables.

Crosstabs Illustrated
To provide context, consider the table depicted below. This table shows counts and percentages in
the cells. For example, there are 206 female clerical employees. The percentages are calculated
within Gender (the column variable). Thus, 60.9% (157/258) of the men are clerical.

To assess whether the two variables are related, there is a standard procedure to follow, depending
on which percentages we are using.

1) If using percentages based on the row variable, compare percentages within each column.
Look to see if the percentages are the same or different across rows within each column. If
percentages are the same, there is no relationship between the variables.

2) If using percentages based on the column variable, compare percentages within each row.
Look to see if the percentages are the same or different across columns within each row.
Again, identical percentages indicate no relationship between the variables.

Here, percentages are based on Gender, so to assess whether Employment Category and Gender
are related, we should compare the percentages 95.4% with 60.9% (95.4% of women are clerical
versus 60.9% of men), 0% with 10.5% and 4.6% with 28.7%. Here, percentages differ substantially.
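
The column percentages quoted above can be reproduced from cell counts with a short sketch like the one below. Only the counts 206 and 157 and the male total of 258 appear in the text; the remaining counts are assumed values back-calculated from the quoted percentages, and the function name is ours.

```python
# Cell counts by employment category (rows) and gender (columns).
# 206, 157 and the male total of 258 come from the text; the remaining counts are
# ASSUMED values back-calculated from the quoted percentages.
counts = {
    "Clerical":  {"Female": 206, "Male": 157},
    "Custodial": {"Female": 0,   "Male": 27},
    "Manager":   {"Female": 10,  "Male": 74},
}

def column_percentages(table):
    """Return percentages within each column (each column sums to 100)."""
    columns = list(next(iter(table.values())).keys())
    totals = {col: sum(row[col] for row in table.values()) for col in columns}
    return {
        row_label: {col: 100.0 * row[col] / totals[col] for col in columns}
        for row_label, row in table.items()
    }

percentages = column_percentages(counts)
for row_label, row in percentages.items():
    print(f"{row_label:<10}" + "".join(f"{col}: {value:5.1f}%   " for col, value in row.items()))
```

Because the percentages are based on the column variable, the comparison is made across columns within each row, as rule 2 describes.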

Figure 6.1 Crosstabs Illustrated

Supporting Materials: The file Census.sav, a PASW Statistics data file from a survey done on the
general adult population. Questions were included about various attitudes and demographic
characteristics.

6.4 Crosstabs Assumptions
To use Crosstabs, one condition has to be met:

1) Variables used in Crosstabs must be categorical (nominal or ordinal).

6.5 Requesting Crosstabs
Requesting Crosstabs is accomplished by following these steps:

1) Select variables for the Crosstabs procedure, at least one for the row and one for the column
dimension; more than one variable can be used in a dimension of the table

2) Select percentage options.
3) Review the procedure output to investigate the relationship between the variables including:

a. Cell counts
b. Cell percentages.

6.6 Crosstabs Output
The Crosstabs table has at least two rows and two columns. The information contained in the cells of
this table will depend on what you requested:

• By default, the (cell) Count is the only statistic displayed
• Normally, you will ask for either a row or column percent, or both
• These percentages are based on the variable categories in the row or column, respectively.
• The last row and column of the table display marginal totals for each row and column
• The lower right hand cell contains table total statistics, including the number of valid cases for
the table.
• The percentages are used to determine whether one variable is statistically associated with
another

Figure 6.2 Example of Crosstabs Output

6.7 Procedure: Crosstabs
The Crosstabs procedure is accessed from the Analyze…Descriptive Statistics…Crosstabs menu choice.
With the Crosstabs dialog box open:

1) Place one or more variables in the Row(s) box.
2) Place one or more variables in the Column(s) box.
3) Open the Cells dialog to specify percents and other cell statistics.
4) The order of categories can be changed in the Format dialog.

A separate table will be created for all combinations of variables in the Rows and Columns boxes.

Figure 6.3 Crosstabs Dialog

In the Cells Display dialog:

1) By default the cell count is displayed (the Observed check box in the Counts area).
2) Typically one or both of row and column percents are selected with the appropriate check
boxes in the Percentages area.
3) Other, less standard statistics are also available, including residuals, which help to show
where the counts deviate from those expected if there is no relationship.

Figure 6.4 Crosstabs Cell Display Dialog

6.8 Example: Crosstabs
We will work with the Census.sav data file in this lesson.

In this example we want to study how overall happiness (happy) is related to marital status (marital).
We want to know the percentage at each level of overall happiness within each marital status group, so
we will percentage based on marital. The variable marital is nominal while happy is ordinal, so both
variables are categorical and appropriate for Crosstabs.

Detailed Steps for Crosstabs
1) Place happy in the Row(s) box.
2) Place the variable marital in the Column(s) box.
3) Select Column in the Cells Display dialog.

The percentages are based on the categories of marital. This indicates that we expect marital status
to affect general happiness.

Results from Crosstabs
As with all analyses, you should first look to see how many cases are missing from an analysis. The
first table in the Viewer window provides this information (not shown). There is actually very little
missing data here.

To examine a crosstabulation table:

• First focus on the marginal totals, which are equivalent to frequency distributions for each
variable. We see that most people are either very happy (597) or pretty happy (1099) with
their life overall, and most respondents are either married (969) or never married (528).

• Next turn to the counts in the cells of the table. Focus on the upper left-hand cell where 398
respondents indicated they were very happy with their life and married. Conversely, from the
bottom right-hand cell, we see that 123 people are not too happy with their life and never married.

• In the upper left-hand cell the first percentage is 41.1% (calculated within marital status). This
tells us that, of the 969 people who are married, 41.1% (398/969) are very happy in their
life. The other percentages are calculated in a similar manner.

We observe that:

• 41.1% of those who are married say they are very happy. This is much larger than any other
category. The next largest is divorced, at 20.0%.

• Married people also have the lowest percentage of people who say they are not too happy at
8.4%. Those who are separated have the highest percentage (25.7%).

These differences would certainly lead us to the conclusion that marital status is related to general
happiness, and that married people are happier than others.

Figure 6.5 Crosstabulation of marital and happy

Apply Your Knowledge
1. What level of measurement should variables have to be used in Crosstabs?

a. Any level of measurement
b. Ordinal and Scale
c. Nominal and Ordinal

2. Consider a Crosstabs table with the variable sex on the rows and variable happiness on the
columns. If we wanted to see how sex might affect happiness, how should the table be
percentaged?

a. By happiness
b. By sex
c. By both

3. Consider the output depicted below. Which statements are correct?

a. 11.9% of all women are managers
b. 4.6% of all women are managers
c. 60.9% of all men are clericals
d. 2.1% of all cases are female managers

6.9 Chi-Square Test
Comparing percentages is only part one of the story. What we don’t know is whether differences in
percentages are due to sampling variation, or instead are likely to be real and exist in the population.
For this we turn to the Chi-Square test.

Every statistical test has a null hypothesis. In most cases, the null hypothesis is that there is no
relationship between two variables. This is also true for the null hypothesis for a crosstabulation, so
we use the Chi-Square test to determine whether there is a relationship between marital status and
general happiness. If the significance is small enough, the null hypothesis is rejected. In the context
of a crosstabulation, the null hypothesis would be that percentages across categories of an
independent variable are statistically equivalent.

As with other statistical tests, sample size has an effect on the calculated significance. Larger sample
sizes, everything else being equal, will reduce the significance value of a test statistic and thus make
it easier to reject the null hypothesis. Chi-Square is particularly sensitive to the effect of increased
sample size. In large samples, say 1000 to 1500 cases and above, it is fairly easy to show that two
variables are related using the .05 significance level. So in large samples it is probably best to require
a lower significance level before rejecting the null hypothesis (this is appropriate for the data file
Census.sav, which contains over 2000 cases).
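
The sensitivity of Chi-Square to sample size can be seen in a small sketch. The two tables below are hypothetical: they have identical percentages, but the second has ten times as many cases. The function name is ours and the calculation is the ordinary Pearson formula, not output from Crosstabs.

```python
def pearson_chi_square(table):
    """Pearson Chi-Square statistic for a table given as a list of rows of counts."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    statistic = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n
            statistic += (observed - expected) ** 2 / expected
    return statistic

# HYPOTHETICAL 2x2 table: rows are "yes"/"no", columns are two groups of 100 cases
# (55% "yes" in the first group versus 45% in the second).
small_sample = [[55, 45],
                [45, 55]]
# Same percentages, ten times as many cases.
large_sample = [[10 * count for count in row] for row in small_sample]

chi2_small = pearson_chi_square(small_sample)
chi2_large = pearson_chi_square(large_sample)
print(f"n = 200:  Chi-Square = {chi2_small:.2f}")
print(f"n = 2000: Chi-Square = {chi2_large:.2f}")
```

With one degree of freedom the critical value at the .05 level is about 3.84, so the same ten-point difference falls short of significance in the smaller table and clears it easily in the larger one.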

Note: Substantive Versus Statistical Significance
A critical distinction to make is whether a relationship is statistically significant versus being
substantively, or practically, significant. In very large samples, small differences will be
statistically significant. You should not let statistical significance be the overriding determinant
in deciding whether a relationship or pattern you have discovered is interesting or important.

Chi-Square Test Assumptions
To correctly use a Chi-Square statistical test in Crosstabs, one condition has to be met:

1) At most, 20% of the expected values in the cells of the table should be below 5.

Chi-Square is calculated based on the expected values in each cell. If too many expected values are
below 5, the reported significance can be incorrect. Because of this, PASW Statistics adds a footnote
to the Chi-Square table noting the number of cells with expected counts less than 5. If more than 20
percent of the cells are in this condition, you should consider grouping categories of one or both
variables.
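
As a rough illustration of the check described above, the following sketch computes expected counts for a hypothetical table and reports the share of cells that fall below 5. The table and the function names are invented for the example; the footnote in the Chi-Square table reports the same information.

```python
def expected_counts(table):
    """Expected cell counts under independence: row total * column total / n."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    return [[r * c / n for c in col_totals] for r in row_totals]

def share_of_small_expected(table, threshold=5.0):
    """Fraction of cells whose expected count falls below the threshold."""
    cells = [value for row in expected_counts(table) for value in row]
    return sum(value < threshold for value in cells) / len(cells)

# HYPOTHETICAL 3x2 table with one sparse row.
observed = [[40, 60],
            [30, 50],
            [2,  4]]

share = share_of_small_expected(observed)
print(f"{share:.0%} of cells have an expected count below 5")
if share > 0.20:
    print("Consider grouping categories before trusting the Chi-Square significance.")
```

Note that the rule refers to expected counts, not observed counts; a cell with a small observed count can still have an adequate expected count.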

6.10 Requesting the Chi-Square Test
The procedure to request a Chi-Square test is to:

1) Select one or more row variables.
2) Select one or more column variables.
3) Request appropriate percentages.
4) Request the Chi-Square statistic.
5) Inspect the Chi-Square test output. If the test is significant, reject the null hypothesis of
independence and report the differences in percentages. If the Chi-Square is not significant,
conclude that there are sample differences, but that equality of percentages in the population
cannot be rejected.

6.11 Chi-Square Output
The table titled Chi-Square Tests shows the Chi-Square test.

There are three Chi-Square values listed, the first two of which are used to test for a relationship. We
concentrate on the Pearson Chi-Square statistic, which is adequate for almost all purposes. The
actual value (here 153.16) is not important, nor is the number of degrees of freedom (df), which is
related to the number of cells in the table. These values are used to calculate the significance for the
Chi-Square statistic, labeled “Asymp. Sig. (2-sided).” We interpret the significance value as the
chance, if the null hypothesis is true, of finding a Chi-Square value at least this large. It is this value
that we use to test the null hypothesis. If the significance value is smaller than the preset alpha, then
we reject the null hypothesis of independence.
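
For readers who want to see how the three quantities fit together, here is a minimal sketch that derives the Pearson Chi-Square, its degrees of freedom, and the significance value from a table of counts. The table is hypothetical, the function names are ours, and the upper-tail probability uses a closed-form series that holds for whole-number degrees of freedom; it is not the routine the software itself uses.

```python
import math

def chi_square_upper_tail(x, df):
    """P(Chi-Square with integer df >= x), using the closed-form series."""
    half = x / 2.0
    if df % 2 == 0:
        term, total = 1.0, 1.0
        for k in range(1, df // 2):
            term *= half / k
            total += term
        return math.exp(-half) * total
    total = math.erfc(math.sqrt(half))
    for k in range((df - 1) // 2):
        total += math.exp(-half) * half ** (k + 0.5) / math.gamma(k + 1.5)
    return total

def chi_square_test(table):
    """Return (statistic, df, p_value) for a Pearson Chi-Square test of independence."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    statistic = sum(
        (table[i][j] - row_totals[i] * col_totals[j] / n) ** 2
        / (row_totals[i] * col_totals[j] / n)
        for i in range(len(table))
        for j in range(len(table[0]))
    )
    df = (len(table) - 1) * (len(table[0]) - 1)
    return statistic, df, chi_square_upper_tail(statistic, df)

# HYPOTHETICAL counts: rows are two answer categories, columns are three groups.
observed = [[30, 45, 25],
            [20, 55, 75]]
statistic, df, p_value = chi_square_test(observed)
alpha = 0.05
print(f"Chi-Square = {statistic:.2f}, df = {df}, significance = {p_value:.4f}")
print("Reject independence" if p_value < alpha else "Cannot reject independence")
```

The decision depends only on the comparison of the significance value with the preset alpha, which is why the statistic and df are treated as intermediate values.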

Note: Another option to deal with sparse cells is to use exact statistical tests (available in the
PASW Statistics Exact Tests option).

Figure 6.6. Chi-Square Tests Output

6.12 Procedure: Chi-Square Test
To include the Chi-Square test in a Crosstabs, with the Crosstabs dialog open:

1) Place one or more variables in the Row(s) box.
2) Place one or more variables in the Column(s) box.
3) Optionally, include appropriate percentages in the Cells dialog.
4) Select Chi-square in the Statistics dialog.

Figure 6.7 Statistics Dialog

Best Practice: In terms of formal testing, we often use significance values of .05 or .01 to
determine whether we should reject the null hypothesis.

6.13 Example: Chi-Square Test
We will work with the Census.sav data file in this lesson.

In this example we continue to examine the relationship between marital status (marital) and
happiness with one’s life overall (happy). We would like to determine whether the percentage
differences we observed above are likely to be observed in the total population.

Detailed Steps for Crosstabs with Chi-Square Test
1) Place the variable happy in the Row(s) box.
2) Place the variable marital in the Column(s) box.
3) Select Column in the Cells Display dialog.
4) Select Chi-square in the Statistics dialog.

Results from Crosstabs with Chi-Square Test
The Crosstabs table percentages are based on the columns, or on marital. We observe that the
marital groups differ substantially in their happiness (e.g. the percentage not too happy is 8.4% for
those married versus 25.7% for those separated).

Figure 6.8 Crosstabulation of Marital Status and General Happiness

The table titled Chi-Square Tests shows that the significance value is quite small and is rounded off to
.000, which means that the actual value is less than .0005. With the significance being so low, a
result like this would be very unlikely if the null hypothesis were true. Therefore, it is reasonable
to conclude that there is a relationship between marital status and general happiness. The exact
form of the relationship is as described above, i.e., as depicted in the table.

Figure 6.9 Chi-Square Tests

Apply Your Knowledge
1. A chi-square value from a test of two variables is 53.4. What should we conclude about the
relationship of these two variables?
a. There is no relationship
b. There is a relationship
c. We need to know the significance of this value to reach a conclusion
d. Both A and C

2. True or false? A Chi-Square test is done differently for nominal compared to ordinal
variables.

3. See the output below. True or false? There is a statistically significant relationship between
Minority Classification and Employment Category (alpha=0.05).

4. See the table below. True or false? It is incorrect to do the Chi-Square test, because more
than 3 out of the 6 cells have an observed count less than 5.

6.14 Clustered Bar Chart
For presentations it is often useful to show a graph of the relationship between two categorical
variables. Clustered bar charts are the most effective method of doing this for Crosstabs.

Clustered Bar Chart Illustrated
As an illustration of a clustered bar chart, see the graph below. The bars show the percentage in each
of the job categories by gender. For example, 95% of the women are clerical versus 61% of the men.

Figure 6.10 Clustered Bar Chart Illustrated

6.15 Requesting a Clustered Bar Chart with Chart Builder
The Chart Builder procedure allows for the creation of a variety of charts, including clustered bar
charts. It provides for great flexibility in creating charts, including formatting options.

To create a clustered bar chart:

1) In the Chart Builder, select the clustered bar chart type of graph.
2) Specify the appropriate variables.
3) Specify the appropriate percentage statistic.
4) Inspect the graph in the output and compare the percentages.

6.16 Clustered Bar Chart from Chart Builder Output
The clustered bar chart from Chart Builder gives the user control of:

• Which variable is used for clustering, and which variable defines the X-axis
• What variable is used for percentages
• The chart also correctly lists that the percentage is displayed, and it has labels and a legend
that is placed outside the chart area.

In the graph depicted below, RS HIGHEST DEGREE [degree] is used for clustering (in the legend),
GENERAL HAPPINESS [happy] defines the X-axis. Percentages are based on degree, so that
percentages for each degree sum to 100%.
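
The idea of basing percentages on the clustering variable can be sketched outside the Chart Builder. The example below uses hypothetical counts and a plain-text rendering of the bars; the category labels, counts, and function name are all assumptions for illustration.

```python
# HYPOTHETICAL counts: outer keys are the clustering (legend) variable,
# inner keys are the X-axis categories.
counts = {
    "High school": {"Very happy": 30, "Pretty happy": 50, "Not too happy": 20},
    "Bachelor":    {"Very happy": 40, "Pretty happy": 50, "Not too happy": 10},
}

def percent_within_cluster(table):
    """Percentages based on the clustering variable: each cluster sums to 100."""
    result = {}
    for cluster, row in table.items():
        total = sum(row.values())
        result[cluster] = {category: 100.0 * n / total for category, n in row.items()}
    return result

percentages = percent_within_cluster(counts)
x_categories = list(next(iter(counts.values())))

# One group of bars per X-axis category, one bar per cluster (2 percentage points per '#').
for category in x_categories:
    print(category)
    for cluster, row in percentages.items():
        bar = "#" * round(row[category] / 2)
        print(f"  {cluster:<12}{bar} {row[category]:.0f}%")
```

Because each cluster sums to 100%, bars within the same X-axis category can be compared directly even when the clusters differ in size.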

Figure 6.11 Example of a Clustered Bar Chart from Chart Builder

6.17 Procedure: Clustered Bar Chart with Chart Builder
The Chart Builder procedure is accessed from the Graphs…Chart Builder menu. With the Chart
Builder dialog open:

1) Select a clustered bar chart in the Chart Builder and drag it into the Chart preview area
2) Select a variable for the X-axis.
3) Select a variable for the Cluster on X: set color box.

Figure 6.12 Chart Builder Dialog to Create Clustered Bar Chart

To specify the statistic:

1) In the Element Properties dialog, specify the Percentage() statistic.
2) Select the appropriate base for percentages in the Set Parameters dialog.

Figure 6.13 Setting Percentage in Element Properties Dialog

6.18 Example: Clustered Bar Chart with Chart Builder
We will create a clustered bar chart of the crosstab table of marital status and general happiness. We
want to see how general happiness varies across categories of marital status, so we use marital
status as the clustering variable. This is equivalent to how we percentage a crosstab by marital status
to study this relationship.

Detailed Steps for Clustered Bar Chart
1) Select the clustered bar chart icon and put it in the Chart Preview pane.
2) Place happy in the X-axis box.
3) Place marital in the Set Color box.
4) Select the Percentage() statistic in the Element Properties Statistics drop down
5) Select Total for Each Legend Variable Category in the Set Parameters dialog.

Results from the Clustered Bar Chart Created with Chart Builder
A crosstab table with percentages based on marital status lets us compare percentages of that
variable within categories of general happiness, and this is mirrored in the clustered bar chart.

• There are three positions on the X-axis, one for each value of general happiness
• At each position, the five categories of marital status are displayed, with the bars representing
percentages
• We can readily compare the categories of marital status in this arrangement, equivalent to
comparing across rows in the crosstabulation

Married respondents are by far more likely to say they are very happy, and they are less likely to say
they are not too happy. There isn’t much difference between the other categories.

Figure 6.14 Clustered Bar Chart of Marital Status and General Happiness

6.19 Adding a Control Variable
Tables can be made more complex by adding variables to the layer dimension.

• Adding one or more layer variables to a Crosstabs table is known as adding control
variables

• Using a control variable means that a subtable will be formed for every value of the control
variable; thus, the relationship between the original variables will be controlled by examining
it within values of the third variable

• You can add more than one control variable to a table by nesting variables in the layer (use
the Next button), but unless you have a large sample size, or the variables have only a few
categories, you may quickly create tables with only a few cases, or even no cases, in several
cells
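
What a layer variable does to a table can be sketched with a few lines of code: the cases are split by the control variable and a separate set of cell counts is kept for each value. The records and names below are hypothetical and serve only to show the mechanics.

```python
from collections import Counter, defaultdict

# HYPOTHETICAL case-level records: (control, row, column) values per respondent.
records = [
    ("Male", "Yes", "A"), ("Male", "No", "A"), ("Male", "No", "B"),
    ("Male", "Yes", "A"), ("Female", "No", "A"), ("Female", "No", "B"),
    ("Female", "Yes", "B"), ("Female", "No", "B"),
]

def layered_crosstab(rows):
    """Build one Counter of (row, column) cell counts per control-variable value."""
    subtables = defaultdict(Counter)
    for control, row_value, column_value in rows:
        subtables[control][(row_value, column_value)] += 1
    return subtables

subtables = layered_crosstab(records)
for control, cells in subtables.items():
    print(f"Layer: {control}  (n = {sum(cells.values())})")
    for (row_value, column_value), count in sorted(cells.items()):
        print(f"  {row_value:<4}{column_value:<3}{count}")
```

Even in this tiny example some cells in a subtable are empty, which is the sparseness problem the last bullet warns about.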

Control Variable Crosstabs Illustrated
The table depicted below includes sex as the control variable. Percentages are calculated per
subtable.

Figure 6.15 A Control Variable Table

6.20 Requesting a Control Variable
Including a control variable in Crosstabs is accomplished with these steps:

1) Select row and column variables.
2) Add a control variable.
3) Request appropriate percentages.
4) Optionally, request the Chi-Square statistic.
5) In the output, compare the percentages within each category of the control variable to see if
there is a relationship between the row variable and column variable, for that particular
category.

6) In the output, inspect the Chi-Square test output for each of the subtables and compare these
results.

6.21 Control Variable Output
The layer variable will be included as the outermost left variable in the crosstabulation. Here the control
variable is gender.

Best Practice: Choosing the proper control variables is somewhat of an art, since there is no
reason (nor is there time) to try every possible other variable as a control. Select control
variables based on the goals of the study, any available theory, and also based on which
variables are possibly related to one or both variables in the table. Demographic variables are
often used as controls.

Figure 6.16 Crosstabs with a Control variable

If a Chi-Square test is requested, a Chi-Square test will be performed for each of the values of the
control variable. The interpretation of the test is the same as in any table, but it applies only to each
subtable (here, for the male and female subtables separately).
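
A per-subtable test can be sketched as follows. The two 2x2 subtables are hypothetical, the function name is ours, and the significance is computed from the Pearson statistic with one degree of freedom and no continuity correction, which is an assumption of this sketch.

```python
import math

def chi_square_2x2(table):
    """Pearson Chi-Square and its significance for a 2x2 table (df = 1)."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    statistic = sum(
        (table[i][j] - row_totals[i] * col_totals[j] / n) ** 2
        / (row_totals[i] * col_totals[j] / n)
        for i in range(2) for j in range(2)
    )
    # For 1 degree of freedom the upper-tail probability is erfc(sqrt(x / 2)).
    return statistic, math.erfc(math.sqrt(statistic / 2.0))

# HYPOTHETICAL subtables, one per value of the control variable.
subtables = {
    "Male":   [[30, 10],
               [70, 90]],
    "Female": [[18, 15],
               [82, 85]],
}

results = {layer: chi_square_2x2(table) for layer, table in subtables.items()}
for layer, (statistic, p_value) in results.items():
    verdict = "significant" if p_value < 0.05 else "not significant"
    print(f"{layer:<7} Chi-Square = {statistic:.2f}  significance = {p_value:.3f}  ({verdict})")
```

Each result applies only to its own subtable, so a significant test in one layer says nothing about the other.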

Figure 6.17 Chi-Square Test with a Control Variable

6.22 Procedure: Adding a Control Variable
With the Crosstabs dialog open, the procedure to include a control variable is:

1) Place one or more variables in the Rows box.
2) Place one or more variables in the Columns box.
3) Place the control variable in the Layer(s) box.
4) In the Cells dialog, select appropriate percentages.
5) Optionally, select Chi-Square in the Statistics dialog.

Figure 6.18 Crosstabs Dialog with Control Variable

6.23 Example: Adding a Control Variable
We’ll create an entirely new table with three new variables to investigate the relationship between
sex, race, and whether the respondent owns a business (ownbiz). We’ll use sex as the control variable to further
illuminate the relationship between race and ownbiz.

To illustrate the benefit of adding a control variable, first we create the table of race and ownbiz.

Detailed Steps for the Two-Way Crosstabs
1) Select the Reset button
2) Place race in the Column(s) box
3) Place ownbiz in the Rows box
4) In the Cells Display dialog, select Column percents
5) Select Chi-Square in the Statistics dialog

Results for the Two-Way Crosstabs
We see that the percentages in the table suggest that whites (13.6%) are most likely to own a
business, followed by those of other races (10.9%) and then blacks (6.8%).

Figure 6.19 Crosstab of Race and Owning a Business

Figure 6.20 Chi-Square Tests

Detailed Steps for Control Variable Crosstabs
Now we’ll see what happens when we add the variable sex to the layer.

1) Place sex in the Layer(s) box.

Results from Adding a Control Variable
The resulting table is much larger. Actually, there are two subtables, one for males and one for
females.

Does the relationship vary by gender? To a certain extent it does.

• White males are much more likely than black males to own a business (almost a 10
percentage point difference)

• Black females are closer to white females in the percentage owning a business (a 4
percentage point difference)

Figure 6.21 Crosstab of Race, Sex, and Owning a Business

Chi-Square Tests for Control Variable
Are these differences we observe statistically significant? We can answer this, as before, with the chi-
square test.

Males. The chi-square test is significant (.018). This means that, for males, race is related to owning
a business.

Females. The chi-square test is not significant (.263). This means that, for females, we cannot
reject the null hypothesis of no relationship between race and owning a business.

Figure 6.22 Chi-Square Tests with a Control Variable

We added a control variable and discovered that the initial relationship depends somewhat on the
respondent’s gender. What does this mean about our analysis of the bivariate table? Should the
results in this table be entirely discarded in favor of the three-way table?

The first point to make in answering these questions is that this example confirms the importance of
multivariate analysis. That said, we can add these comments:

1) The bivariate table is still “correct,” although that word is better replaced by “valuable.” The
bivariate table is our estimate of the relationship between owning a business and race.
2) However, if we generalize this relationship and believe it is the same for males and females, we
would be incorrect.
3) Adding the control variable has given us additional insight into this relationship. It may help us
understand more about the two-way relationship.

Apply Your Knowledge
1. See the output below. Which statements are correct?

a. The table shows that there is a statistically significant relationship between gender
and minority classification (alpha=0.05)

b. The relationship between current salary (binned) and minority classification is not
significant for women (alpha=0.05)

c. The relationship between current salary (binned) and minority classification is not
significant for men (alpha=0.05)

d. The relationship between current salary (binned) and minority classification is not the
same for men and women

Brainstorming Exercise
In regard to your own data, what are some potentially important control variables for your analyses?

6.24 Extensions: Beyond Crosstabs
Decision Tree analysis is often used by data analysts who need to predict to which group an
individual can be classified, based on potentially many nominal or ordinal background variables. For
example, an insurance company is interested in the combination of demographics that best predict
whether a client is likely to make a claim. Or a direct mail analyst is interested in the combinations of
background characteristics that yield the highest return rates. Here the emphasis is less on testing a
hypothesis and more on a heuristic method of finding the optimal set of characteristics for prediction
purposes. CHAID (Chi-Square automatic interaction detection), a commonly used type of decision-
tree technique, along with other decision-tree methods are available in the PASW Decision Trees
add-on module.

A technique called loglinear modeling can also be used to analyze multi-way tables. This method
requires statistical sophistication and is well beyond the domain of this course. PASW Statistics has
several procedures (Genlog, Loglinear and Hiloglinear) to perform such analyses. They provide a way
of determining which variables relate to which others in the context of a multi-way crosstab (also
called contingency) table. These procedures could be used to explicitly test for the three-way
interaction suggested above. For an introduction to this methodology see Fienberg (1977). Academic
researchers often use such models to test hypotheses based on many types of data.

6.25 Association Measures
Measures of association for categorical variables have been developed to summarize the strength of
a relationship in a single statistic. This allows you to compare different tables (groups) concisely.

• They are typically normed to range between 0 and 1 for variables on a nominal scale, or –1
and 1 for variables on an ordinal scale.

• The specific measure used thus depends on the level of measurement of the variables,
among other things

• In general, measures of association are bivariate rather than designed for multi-way tables

The Crosstabs procedure provides many measures of association. Several are available because
different aspects of the association are emphasized by particular measures.

There are four tests each in the Nominal and Ordinal areas. The variable at the lowest level of
measurement determines which test can be used, e.g., in a table with a nominal and ordinal variable,
you must use one of the nominal tests.
For a discussion of these tests, see Gibbons (2005).
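
To make the idea of a normed measure concrete, the sketch below computes Cramér's V, one of the nominal measures offered in the Crosstabs Statistics dialog, for a two-way table of counts. It is a minimal Python illustration rather than PASW Statistics output; the function name and the example counts are made up for this sketch.

```python
import math

def cramers_v(table):
    """Cramér's V for a two-way table of counts (a list of rows)."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    # Pearson chi-square: sum of (observed - expected)^2 / expected
    chi_square = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n
            chi_square += (observed - expected) ** 2 / expected
    # Norm by sample size and the smaller table dimension minus one
    smaller_dim = min(len(row_totals), len(col_totals)) - 1
    return math.sqrt(chi_square / (n * smaller_dim))

# Hypothetical tables: identical row percentages vs. a perfect relationship
no_association = [[20, 30], [40, 60]]
perfect_association = [[50, 0], [0, 50]]
print(cramers_v(no_association), cramers_v(perfect_association))
```

The first table gives a value of 0 and the second a value of 1, which are the two ends of the 0 to 1 range described above for nominal measures.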

Figure 6.23 Crosstabs Statistics Dialog with Measures of Association

6.26 Lesson Summary
We explored the use of the Crosstabs procedure to analyze relationships between categorical
variables.

INTRODUCTION TO STATISTICAL ANALYSIS USING IBM SPSS STATISTICS

Lesson Objectives Review
Students who have completed this lesson should now be able to:

• Perform crosstab analysis on categorical variables

To support the achievement of the primary objective, students should now also be able to:

• Use the options in the Crosstabs procedure
• Request appropriate statistics for a crosstabulation
• Interpret cell counts and percents in a crosstabulation
• Use the Chi-Square test, interpret its results, and check its assumptions
• Use the Chart Builder to visualize a crosstabulation

6.27 Learning Activity
The overall goal of this learning activity is to create two and three-way crosstabulations to explore the
relationship between several variables and to use the Chart Builder to visualize the relationship.
You’ll use the file Census.sav.

1. Investigate the relationship between race and each of the following variables: self-rated
health (health), whether people can be trusted (cantrust), and support for spending on
scientific research (natsci). Request appropriate percentages and a Chi-Square test.

2. What is the significance of the Chi-Square test? Is there a statistically significant relationship
between race and these variables? Describe the relationships that you observe in the tables.

3. Create a clustered bar chart to display the relationship of one of the tables with a significant
relationship.

4. Now add sex as a control variable. Are there differences in the relationship by gender? Are
the relationships in the subtables significant or not? Are they different or not?

5. Now remove the control variable sex and substitute the variable born (was the respondent
born in the U.S. or not). Does the relationship between race and self-rated health vary
depending on place of birth?

6. For those with extra time: Select some variables that you find interesting and explore their
relationship with Crosstabs. Begin with two-way tables before investigating more complex
relationships.

Supporting Materials: The file Census.sav, a PASW Statistics data file from a survey done on the
general adult population. Questions were included about various attitudes and demographic
characteristics.

THE INDEPENDENT- SAMPLES T TEST

Lesson 7: The Independent-Samples T Test
7.1 Objectives
After completing this lesson students will be able to:

• Perform a statistical test to determine whether there is a statistically significant difference
between two groups on a scale variable

To support the achievement of this primary objective, students will also be able to:

• Check the assumptions of the Independent-Samples T Test
• Use the Independent-Samples T Test to test the difference in means
• Know how to interpret the results of an Independent-Samples T Test
• Use the Chart Builder to create an error bar graph to display mean differences

7.2 Introduction
When our purpose is to examine group differences on scale variables, we turn to the mean as the
summary statistic since it provides a single measure of central tendency. In this lesson we outline the
logic involved when testing for mean differences between groups, state the assumptions, and then
perform an analysis comparing two groups.

Business Context
When analyzing data, we are often concerned with whether groups differ from each other. Without
statistical testing, we might make decisions based on perceived differences that are unlikely to
exist in the population of customers. An Independent-Samples T Test allows us to determine if two
groups differ significantly on a scale variable. For example, we might want to know whether:

• One customer group purchases more items, on average, than a second
• Drug A reduces depression levels better than drug B
• Student test scores in one class are higher than in a second class

7.3 The Independent-Samples T Test
For historical reasons related to the development of statistics and lack of computing power, separate
methods were developed to test for mean differences when the predictor or grouping variable had two
categories versus three or more. For dichotomous predictors we use the Independent-Samples T
Test; for predictors with more categories we use analysis of variance (ANOVA). So, the
Independent-Samples T Test applies when there are two separate populations to compare on a scale
dependent variable (for example, males and females on salary).

As with any statistical test, the goal is to draw conclusions about the population, based on sample
data. For the Independent-Samples T Test the null hypothesis states that population means are the
same, and we use sample data to evaluate this hypothesis.

So, H0 (the null hypothesis) assumes that the population means are identical. We then determine if
the differences in sample means are consistent with this assumption. If the probability of obtaining
sample means as far (or further) apart as we find in our sample is very small (less than 5 chances in
100 or .05), assuming no population differences, we reject our null hypothesis and conclude the
populations are different.
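
As a rough illustration of this logic, the sketch below computes the standard (equal variances assumed) t statistic and its degrees of freedom for two small groups. It is a minimal Python sketch, not the procedure's own code; the function name and the data are hypothetical. Converting t and df into a significance value requires the t distribution, which the sketch leaves to the software.

```python
import math
import statistics

def pooled_t(group_a, group_b):
    """Return (t, df) for the equal-variances independent-samples t test."""
    n_a, n_b = len(group_a), len(group_b)
    mean_diff = statistics.mean(group_a) - statistics.mean(group_b)
    df = n_a + n_b - 2
    # Pooled variance: both sample variances weighted by their degrees of freedom
    pooled_var = ((n_a - 1) * statistics.variance(group_a)
                  + (n_b - 1) * statistics.variance(group_b)) / df
    std_error = math.sqrt(pooled_var * (1 / n_a + 1 / n_b))
    return mean_diff / std_error, df

# Hypothetical scores for two independent groups
group_a = [1, 2, 3, 4, 5]
group_b = [3, 4, 5, 6, 7]
t_value, df = pooled_t(group_a, group_b)
print(t_value, df)
```

The further the sample means are apart relative to the standard error, the larger the absolute value of t and the smaller the resulting significance value.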

7.4 Independent-Samples T Test Assumptions
Correct use of the Independent-Samples T Test requires that several assumptions be met.

1) Only two population subgroups are compared.
2) The dependent variable has a scale measurement level.
3) The distribution of the dependent variable within each population subgroup follows the normal
distribution (normality).

4) The variation is the same within each population subgroup. This is the so-called
“homogeneity of variance” assumption. Violation of this assumption is more critical than
violation of the normality assumption. When this assumption is violated, the significance or
probability value reported by PASW Statistics is incorrect and the test statistics must be
adjusted.
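
A note in this lesson gives a rule of thumb for this assumption: if the ratio of the greatest to the smallest variance is less than 2, with similar group sizes, the violation is not serious. The sketch below computes that ratio in Python. The function name and the data are hypothetical, and judging whether the group sizes are "similar" is left to the analyst, since the text gives no numeric cutoff.

```python
import statistics

def variance_ratio(group_a, group_b):
    """Ratio of the greatest sample variance to the smallest."""
    var_a = statistics.variance(group_a)
    var_b = statistics.variance(group_b)
    return max(var_a, var_b) / min(var_a, var_b)

# Hypothetical groups of equal size
similar_spread = variance_ratio([1, 2, 3, 4, 5], [3, 4, 5, 6, 7])
different_spread = variance_ratio([1, 2, 3, 4, 5], [2, 4, 6, 8, 10])

for label, ratio in [("similar", similar_spread), ("different", different_spread)]:
    verdict = "not serious" if ratio < 2 else "check Levene's test"
    print(label, ratio, verdict)
```

The ratio is only a quick screen; the Levene's test reported by the procedure remains the formal check.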

7.5 Requesting the Independent-Samples T Test
Requesting an Independent-Samples T Test is accomplished with these steps:

1) Select one or more variables to be tested that are scale in measurement level.
2) Select a grouping variable and define the two groups.
3) Optionally, set the confidence interval if you prefer something other than 95%.
4) Review the procedure output to investigate the relationship between the variables including:

a. Group Statistics Table
b. Check the assumptions of the Independent-Samples T Test.

5) Examine the t test statistics to determine whether there is a significant difference in the
means.

Note: The Independent-Samples T Test is robust to moderate violations of the homogeneity of
variance assumption. If the ratio (greatest variance / smallest variance) is less than 2, with
similar sample sizes of the groups, then the violation is not serious. The Independent-Samples
T Test provides a statistical test for the assumption of homogeneity of variance and an
alternative test for testing equality of means, taking into account unequal variances.

Note: The Independent-Samples T Test is robust to moderate violations of the normality
assumption when sample sizes are moderate to large (over 50 cases per group) and the
dependent measure has the same distribution (for example, skewed to the right) within each
group.

7.6 Independent-Samples T Test Output
The Group Statistics table provides sample sizes, means, standard deviations, and standard errors
for the two groups. In the table below, there is a 1.4 year difference in group means on highest year
of school completed. The sample standard deviations (and so the variances) are quite different,
indicating potentially unequal population variances.

Figure 7.1 Example of Group Statistics

To understand how to work with the second table, the Independent Samples Test, we need to review
the assumptions for conducting an Independent-Samples T Test. The most serious violation is that
of the assumption of equal variances, so this assumption must be tested. Levene’s Test for Equality
of Variances does exactly this. Levene’s homogeneity of variances test evaluates the null hypothesis
that the dependent variable’s variance is the same in the two populations. Since homogeneity of
variance is assumed when performing the Independent-Samples T Test, the analyst hopes to find this
test to be nonsignificant.

In the first (left) section of the Independent Samples Test table, Levene’s test for equality of
variances is displayed.

• The null hypothesis of Levene’s test is that the variances are equal
• The F statistic is a technical detail used to calculate the significance (Sig.)
• The significance is the probability of observing a difference in sample variances at least as
large as the one found, if the variances are equal in the target population
• We use a standard criterion value of .05 or .01 to reject the null hypothesis

Figure 7.2 Example of Levene’s Test for Equality of Variances Output

Best Practice: As sample size increases, it is better to use the .01 level for Levene’s test.

Here, in the example output, we observe that the hypothesis of equal variances must be rejected
because the significance value is low.
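
For readers curious about what sits behind the F column, the sketch below computes the mean-centered form of Levene's statistic in Python. It assumes this is the variant reported in the output, which the text does not state; the function name and the data are hypothetical. Turning F into a significance value requires the F distribution, which the sketch leaves to the software.

```python
import statistics

def levene_f(*groups):
    """Levene's F: a one-way ANOVA F computed on absolute deviations from each group mean."""
    abs_devs = []
    for group in groups:
        center = statistics.mean(group)
        abs_devs.append([abs(x - center) for x in group])
    n_total = sum(len(z) for z in abs_devs)
    k = len(abs_devs)
    grand_mean = sum(sum(z) for z in abs_devs) / n_total
    group_means = [statistics.mean(z) for z in abs_devs]
    # Between-group and within-group sums of squares of the absolute deviations
    between = sum(len(z) * (m - grand_mean) ** 2
                  for z, m in zip(abs_devs, group_means))
    within = sum((x - m) ** 2
                 for z, m in zip(abs_devs, group_means) for x in z)
    return ((n_total - k) / (k - 1)) * between / within

# Hypothetical data: the second group is twice as spread out as the first
print(levene_f([1, 2, 3, 4, 5], [2, 4, 6, 8, 10]))
```

A larger F means the typical distance from the group mean differs more between the groups, which is evidence against equal variances.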

The second, right half, section of the Independent Samples Test table provides the test statistics for
the null hypothesis of equal group means. The row labeled “Equal variances assumed” contains
results of the standard t test. The second row labeled “Equal variances not assumed” contains an
adjusted t test that corrects for lack of homogeneity of variances in the data. You would choose one
or the other based on your evaluation of the homogeneity of variance question. Summarizing: the
result of the Levene test tells us which of the two rows of t test statistics to use:

• If the null hypothesis of equal variances is not rejected, then use the row labeled “Equal
variances assumed”

• If the null hypothesis of equal variances is rejected, then use the row labeled “Equal variances not
assumed”
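
The adjusted test in the second row can be sketched as follows. This Python example uses Welch's t statistic with Satterthwaite's approximate degrees of freedom, which is the usual unequal-variances adjustment; treating it as the computation behind this row is an assumption, and the function name and data are hypothetical.

```python
import math
import statistics

def welch_t(group_a, group_b):
    """Return (t, df) for the unequal-variances (Welch) t test."""
    n_a, n_b = len(group_a), len(group_b)
    # Squared standard error of each group mean, kept separate rather than pooled
    se2_a = statistics.variance(group_a) / n_a
    se2_b = statistics.variance(group_b) / n_b
    mean_diff = statistics.mean(group_a) - statistics.mean(group_b)
    t = mean_diff / math.sqrt(se2_a + se2_b)
    # Satterthwaite approximation: df is usually fractional
    df = (se2_a + se2_b) ** 2 / (se2_a ** 2 / (n_a - 1) + se2_b ** 2 / (n_b - 1))
    return t, df

# Hypothetical groups with clearly different variances
t_value, df = welch_t([1, 2, 3, 4, 5], [2, 4, 6, 8, 10])
print(t_value, df)
```

When the two groups have equal sizes and equal variances, this df equals the usual n1 + n2 - 2, which is why the two rows of the table can give very similar results.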

To test equality of means:

• The null hypothesis of the test is that the means are equal
• The t and df statistics are technical details used to calculate the significance (Sig. 2-tailed)
• The significance is the probability of observing a difference in sample means at least as
large as the one found, if the means are equal in the target population
• We use a standard criterion value of .05 or .01 to reject the null hypothesis

Figure 7.3 Example of T Test for Equality of Means Output

Here, we observe that the hypothesis of equal population means must be rejected (reading the row
Equal variances not assumed, since the null hypothesis of equal variances was rejected).

The Independent Samples Test table provides an additional bit of useful information: the 95%
confidence band for the population mean difference. The 95% confidence band for the difference
provides a measure of the precision with which we have estimated the true population difference. In
the output shown below, the 95% confidence band for the mean difference between groups is .816
years to 1.914 years (again using the Equal variances not assumed row). Note that the interval
does not include zero, which is consistent with rejecting the null hypothesis of equal means. So,
the 95% confidence band indicates the likely range within which we expect the population mean
difference to fall. Speaking in a technically correct fashion, if we were to continually repeat
this study, we would expect the confidence bands computed from each sample to contain the true
population difference 95% of the time. While the technical
definition is not illuminating, the 95% confidence band provides a useful precision indicator of our
estimate of the group difference.
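
A confidence band of this kind is the mean difference plus or minus a multiple of its standard error. The sketch below is a minimal Python illustration that uses the normal distribution for the multiplier, which is an assumption that is reasonable only for large samples (the procedure itself works with the t distribution). The function name and the input values are hypothetical and are not taken from the output above.

```python
import statistics

def mean_difference_ci(mean_diff, std_error, confidence=0.95):
    """Large-sample confidence interval for a mean difference."""
    # Two-sided multiplier, about 1.96 for 95% confidence
    z = statistics.NormalDist().inv_cdf(0.5 + confidence / 2)
    margin = z * std_error
    return mean_diff - margin, mean_diff + margin

# Hypothetical mean difference and standard error
lower, upper = mean_difference_ci(mean_diff=1.5, std_error=0.25)
excludes_zero = lower > 0 or upper < 0
print(round(lower, 3), round(upper, 3), excludes_zero)
```

A smaller standard error (for example, from larger samples) narrows the band, which is what makes it a useful indicator of precision.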

7.7 Procedure: Independent-Samples T Test
The Independent Samples T Test procedure is accessed from the Analyze…Compare
Means…Independent-Samples T Test menu choice. With the Independent-Samples T Test dialog
box open:

1) Place one or more scale dependent variables in the Test Variable box.
2) Place one categorical independent variable in the Grouping Variable box.
3) Click the Define Groups button to specify which two groups are being compared.

Figure 7.4 Independent-Samples T Test Dialog

In the Define Groups dialog:

4) Specify the two group values

Figure 7.5 Independent-Samples T-Test — Define Groups Dialog

The Options dialog can be used to change the confidence interval percentage and change the
handling of missing values to listwise.

Figure 7.6 T Test Options Dialog

7.8 Demonstration: Independent-Samples T Test
We will work with the Census.sav data file in this lesson.

In this example we examine the relationship between the respondent’s gender (sex) and number of
children (childs). You would expect that the means would be equal—it normally takes two to
procreate—but let’s investigate the question.

Before doing the actual test, we should explore the data to compare the distributions of number of
children by gender. We’ll use the Explore procedure to do so.

Detailed Steps for Explore Procedure for Number of Children by Gender
The Explore dialog box is accessed from the Analyze…Descriptive Statistics…Explore menu.
With the dialog box open:

1) Place childs in the Dependent List: box
2) Place sex in the Factor List: box
3) In the Plots dialog, select the Histogram check box

Note: We will not repeat details here on the Explore dialog box or the options available with the
procedure. If you need a review, see the Lesson on Understanding Data Distributions for Scale
Variables.

Tip: When defining groups for the Independent-Samples T Test, if the independent variable is a
numeric variable, specify a single cut point value to define the two groups. Those cases less than
or equal to the cut point go into the first group, and those greater than the cut point fall into the
second group. If the independent variable is categorical but has more than two categories, it can
still be used by using only two categories in an analysis.

We’ll concentrate first on the boxplot. The highest value of childs is 8 (values any higher are coded to
8). Still, there is enough variation in the data to see that the male and female distributions are not
identical. The inter-quartile range for males is wider than for females, and it extends all the way to 0.
The median for both genders is the same (2), and there are fewer outliers for males as a
consequence of the wider IQR.
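
The link between a wider IQR and fewer outliers follows from how boxplots flag outliers: points more than 1.5 box lengths beyond the edges of the box. The sketch below illustrates this in Python with made-up data. Note the assumptions: it uses the quartiles from Python's `statistics.quantiles`, which may differ slightly from the hinges used in the boxplot, and the function names are hypothetical.

```python
import statistics

def boxplot_fences(values, whisker=1.5):
    """Return (lower_fence, upper_fence) using the 1.5 x IQR convention."""
    q1, _median, q3 = statistics.quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    return q1 - whisker * iqr, q3 + whisker * iqr

def outliers(values, whisker=1.5):
    """Values falling outside the fences."""
    low, high = boxplot_fences(values, whisker)
    return [v for v in values if v < low or v > high]

# Hypothetical counts: both groups contain the values 5 and 8
narrow = [1, 2, 2, 2, 2, 2, 2, 3, 5, 8]   # tightly clustered, small IQR
wide = [0, 0, 1, 1, 2, 2, 3, 4, 5, 8]     # more spread out, large IQR
print(outliers(narrow), outliers(wide))
```

In the tightly clustered group both 5 and 8 are flagged, while in the more spread-out group only 8 is, because the wider box pushes the fences outward.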

Figure 7.7 Boxplot of Number of Children by Gender

Looking next at the histograms, we observe that the distributions are not normal. However, given the
sample size, this shouldn’t be a deterrent to doing a t test. We might be more concerned as to
whether the mean is a suitable measure of central tendency of these distributions, especially for
males. But given the narrow range of childs, from 0 to 8, it seems reasonable to proceed.

Figure 7.8 Histogram of Number of Children for Males

Figure 7.9 Histogram of Number of Children for Females

Detailed Steps for Independent Samples T-Test
1) Place the variable childs in the Test Variable(s) box.
2) Place the variable sex in the Grouping Variable box.
3) Select the Define Groups button.

Notice the question marks following sex. The Independent-Samples T Test dialog requires that you
indicate which groups are to be compared, which is usually done by providing the data values for the
two groups.

4) Specify that groups 1 and 2 are being compared

Results from Independent Samples T-Test
As with all analyses, you should first look to see how many cases are in each group, along with the
means and standard deviations. The Group Statistics table provides this information. We have fairly
large samples for each group. Intriguingly, the means are not very close for males and females. The
mean number of children for females is about .30 higher than for males (The actual sample mean
difference is displayed in the Independent Samples Test table). The standard deviations are similar
for each group, which is a bit unexpected since the boxplots looked somewhat different.
Figure 7.10 Group Statistics Table

Reading an Independent Samples Test Table
Looking at Levene’s test, the null hypothesis assuming homogeneity is not rejected at the .01 level.
Given the large sample size, it makes more sense to use a more stringent alpha level. So, we may
assume homogeneity of variances, and we take the result of the t test from the Equal variances
assumed row (actually, the two rows give very similar results in this example).

Figure 7.11 Independent Samples Levene’s Test

To test equality of means, move to the column labeled “Sig. (2-tailed).” This is the probability of our
obtaining sample means as far or further apart, by chance alone, if the two populations (males and
females) actually have the same mean number of children. The probability of obtaining such a large
difference by chance alone is quite small (displayed as .000, meaning less than .0005), so we
conclude there is a significant difference in mean number of children between men and women.
Can you suggest how that could be true?

The 95% confidence band for the mean difference between groups is from –.438 to –.142 children.

Figure 7.12 Independent Samples T Test Results

Apply Your Knowledge
1. In which of the following situations can an Independent-Samples T Test be applied?

a. Difference between men and women with respect to political preferences (liberal/
conservative/independent)
b. Difference in mean age between liberals and conservatives
c. Difference in mean income between three groups of political affiliation (liberals,
conservatives and independents)
d. Difference in mean expenditure between those shopping at Harrods and those not
shopping at Harrods

2. True or False? Suppose we want to test whether boys and girls differ in mean hours (a week)
doing sports. Below we have the distribution of this variable, for both genders. Now we draw
a random sample of 50 boys and 50 girls. The Independent-Samples T Test cannot be done
because the distribution of hours of sport is not normal within each of the groups.

3. See the output below. Which statements are correct?

a. To test equality of means, we use the “Equal variances assumed” row and disregard
the row “Equal variances not assumed.”
b. The null hypothesis of equal group means is rejected (alpha=0.05)
c. The 95% confidence interval for the difference contains the value 0, which indicates
that the null hypothesis of equal group means cannot be rejected (alpha=0.05).

7.9 Error Bar Chart
Although the Independent-Samples T Test procedure displays the appropriate statistical test
information, a summary chart is often preferred as a way to present significant results. Bar charts
displaying the group sample means can be produced using the Chart Builder procedure. However,
many people prefer an error bar chart instead. It is a chart that focuses more on the precision of the
estimated mean for each group than the mean itself.

Error Bar Chart Illustrated
The figure below provides an example of an error bar chart. The graph shows mean salary and its
associated 95% confidence interval, for each of two groups. Mean salary for men is higher than for
women and the intervals do not overlap, indicating differences between men and women in the
population in mean salary.
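
The quantities drawn in such a chart can be sketched in a few lines. The Python example below computes each group's mean with an approximate 95% confidence interval and checks whether two intervals overlap. It assumes a normal-distribution multiplier (about 1.96), which is a large-sample simplification; the function names and the salary figures are hypothetical.

```python
import math
import statistics

Z_95 = statistics.NormalDist().inv_cdf(0.975)  # about 1.96

def mean_ci(values, z=Z_95):
    """Return (lower, mean, upper) for an approximate 95% interval around the mean."""
    mean = statistics.mean(values)
    std_error = statistics.stdev(values) / math.sqrt(len(values))
    return mean - z * std_error, mean, mean + z * std_error

def intervals_overlap(ci_a, ci_b):
    """True if the two (lower, mean, upper) intervals share any values."""
    return ci_a[0] <= ci_b[2] and ci_b[0] <= ci_a[2]

# Hypothetical salaries in thousands
group_1 = [30, 32, 34, 36, 38]
group_2 = [20, 22, 24, 26, 28]
group_3 = [33, 35, 37, 39, 41]

print(intervals_overlap(mean_ci(group_1), mean_ci(group_2)))
print(intervals_overlap(mean_ci(group_1), mean_ci(group_3)))
```

Intervals that do not overlap point to a population difference, but overlap checks are less precise than the t test itself, so the chart is best used to support the test rather than replace it.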

Figure 7.13 Error Bar Chart Illustrated

7.10 Requesting an Error Bar Chart with Chart Builder
Requesting an Error bar chart is accomplished with these steps, using the Chart Builder procedure.

1) Place an Error Bar chart icon in the Chart Preview area
2) Select the scale variable for which you want a mean and confidence intervals.
3) Select the categorical variable defining the groups.
4) In the resulting graph, check if confidence intervals overlap.

7.11 Error Bar Chart Output
The error bar chart will generate a graph depicting the relationship between a scale and categorical
variable. It provides a visual sense of how far the groups are separated.

• Note the means of each group
• Note if the 95% confidence intervals of the groups overlap
• If the error bar and statistical test lead to the same conclusion, support the statistical test with
the error bar chart

Note: This method of comparing confidence intervals is not as precise as statistical testing. For
example, an error bar chart does not take into account whether the homogeneity of variance
assumption is met. Still, used carefully, they can be very useful.

Procedure: Error Bar Chart with Chart Builder
The Chart Builder procedure is accessed from the Graphs…Chart Builder menu. With the Chart
Builder dialog open:

1) Select Bar in the Choose from: list, then select the simple error bar chart icon and place it in
the Chart preview area

2) Specify a variable for the X-axis.
3) Specify a variable for the Y-axis. This variable should be scale in measurement.

Figure 7.14 Chart Builder Dialog to Create Error Bar Chart

7.12 Demonstration: Error Bar Chart with Chart Builder
We will create an error bar chart corresponding to the Independent-Samples T-Test of gender and
number of children. We want to see how number of children varies across categories of gender, so
we use childs as the Y-axis variable. This is equivalent to how we used the Independent-Samples T
Test procedure to study this relationship.

Detailed Steps for Error Bar Chart
Before beginning this example, to ensure that childs is displayed properly in the error bar chart, do the
following:

1) In the Data Editor, change the Measure level for childs to Scale
2) Change the value of Width to 3; change the number of Decimals to 1

Then, with the Chart Builder dialog open:

1) Select a simple error bar chart icon and put it in the Chart Preview pane.
2) Place childs in the Y-axis box.
3) Place the variable sex in the X-axis box.

Results from the Error Bar Chart
We can observe the following details from the graph.

• The mean number of children for each gender along with 95% confidence intervals is
represented in this chart

• The confidence intervals for the two genders don’t quite overlap, which is consistent with the
result from the T Test

• The error bars have a small range compared to the range of childs, which indicates we are
estimating the mean number of children fairly precisely (because of large sample sizes)

Figure 7.15 Error Bar Chart of Number of Children and Gender

7.13 Lesson Summary
We explored the use of the Independent-Samples T-Test procedure and error bar charts to test
whether there are mean differences in a scale variable between two groups.

Lesson Objectives Review
Students who have completed this lesson should now be able to:

• Perform a statistical test to determine whether there is a statistically significant difference
between two groups on a scale dependent variable

To support the achievement of the primary objective, students should now also be able to:
• Check the assumptions of the Independent-Samples T Test
• Use the Independent-Samples T Test to test the difference in means
• Know how to interpret the results of an Independent-Samples T Test
• Use the Chart Builder to create an error bar graph to display mean differences

7.14 Learning Activity
In these activities you will use the file Census.sav. The overall goal is to run the Independent-
Samples T Test, to interpret the output and visualize the results with an error bar chart.

1. We want to see whether men and women differ in their mean socioeconomic index (sei) and
their age when their first child was born (agekdbrn). First, use the Explore procedure to view
the distributions of these two variables by gender. Are they similar or different? Do you see
any problems with doing a t test?

2. Now do a t test for each variable, by gender. Is the homogeneity of variance assumption met,
or not? What do you conclude about mean differences by gender?

3. Create an error bar chart for each variable by gender. Is the graph consistent with the result
from the t test?

4. Now do the same analysis with the variable race, testing whether there are differences in sei
and agekdbrn comparing whites to blacks. Although race has three categories, you can use
only two categories in the t test. As before, first use the Explore procedure to view the
distributions of these two variables by race. Are they similar or different? Do you see any
problems with doing a t test?

5. Now do a t test for each variable, by white versus black. Is the homogeneity of variance
assumption met, or not? What do you conclude about mean differences between whites and
blacks?

6. Create an error bar chart for each variable by race. Is the graph consistent with the result
from the t test?

7. For those with more time: How could you display an error bar chart with only the categories
of white and black, not other? There are at least two methods.

THE PAIRED- SAMPLES T TEST

Lesson 8: The Paired-Samples T Test
8.1 Objectives
After completing this lesson students will be able to:

• Perform a statistical test to determine whether there is a statistically significant difference
between the means of two scale variables

To support the achievement of this primary objective, students will also be able to:

• Use the Paired-Samples T Test procedure
• Interpret the results of a Paired-Samples T Test

8.2 Introduction
In this lesson we outline the logic involved when testing for a difference between two scale variables,
and then perform an analysis comparing two variables. As with the Independent Samples T Test, we
use the mean to compare the two variables, as the mean is an excellent measure of central tendency.
As an example, we might want to compare a student´s test score before and after participating in a
particular program and see if the program had an effect by calculating the mean difference between
the two test scores.

Business Context
When analyzing data, we are often concerned with whether there is a difference between two scale
variables. Without statistical testing, we might make decisions based on perceived differences that
are unlikely to exist in the population of customers. The Paired-Samples T Test allows us to
determine if the difference between two variables in the sample reflects a difference in the
population. For example:

• In medical research a Paired-Samples T Test would be used to compare means on a
measure administered both before and after some type of treatment.

• In market research, if a subject were to rate the product they usually purchase and a
competing product on some attribute, a Paired-Samples T Test would be needed to
compare the mean ratings.

• In customer satisfaction studies, if a special customer care program is implemented, we can
test whether satisfaction beforehand is lower, or higher, than satisfaction after the program is
in place.

8.3 The Paired-Samples T Test
The Paired-Samples T Test is used to test for statistical significance between two population means
when each observation (respondent) contributes to both means. With a Paired-Samples T Test each
person serves as his own control. To the extent that an individual’s outcomes across the two
conditions are related, the Paired-Samples T Test provides a more powerful statistical analysis
(greater probability of finding true effects) than the Independent-Samples T Test. Moreover, due to
the assumption of independence for the independent-samples test, a paired-samples test must be used
when the same subject/respondent provides both scores.
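
The computation behind a paired test can be sketched as a one-sample t test on the within-person differences. The Python example below is a minimal illustration; the function name and the before/after scores are hypothetical.

```python
import math
import statistics

def paired_t(first, second):
    """Return (t, df) for a paired-samples t test of second minus first."""
    if len(first) != len(second):
        raise ValueError("Each case must contribute a value to both variables")
    # One difference per respondent; the test is on the mean of these differences
    diffs = [b - a for a, b in zip(first, second)]
    n = len(diffs)
    std_error = statistics.stdev(diffs) / math.sqrt(n)
    return statistics.mean(diffs) / std_error, n - 1

# Hypothetical test scores for five students before and after a program
before = [10, 12, 14, 16, 18]
after = [11, 14, 15, 18, 20]
t_value, df = paired_t(before, after)
print(t_value, df)
```

Because each person's own baseline is subtracted out, the large differences between people do not inflate the standard error, which is the source of the extra power.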

To clearly state the difference between the Independent-Samples T Test and the Paired-Samples
T Test, it is instructive to compare the data structure needed for each of the two tests.

Figure 8.1 Data structure for Independent-Samples T Test (left) and Paired-Samples T Test
(right)

8.4 Assumptions for the Paired-Samples T Test
There are three assumptions to apply the Paired-Samples T Test:

1) Variables have a scale measurement level.
2) Variables should be in the same unit of measurement.
3) The difference between the two variables is normally distributed (because the mean
difference is what is being tested)

Note: The Paired-Samples T Test is robust to violations of the normality assumption when sample
sizes are moderate to large (over 50 cases).

Note: The homogeneity of variance assumption that holds for the Independent-Samples T Test does
not apply to the Paired-Samples T Test since we are dealing with only one group.

8.5 Requesting a Paired-Samples T Test
Requesting a Paired-Samples T Test is accomplished by following these steps:

1) Choose pairs of variables for the Paired-Samples T Test.
2) Review the procedure output to investigate the relationship between the variables including:

a. Paired Samples Statistics Table
b. Paired Samples Test Table.

8.6 Paired-Samples T Test Output
The Paired Samples Statistics table provides summary information including sample sizes, means,
standard deviations, and standard errors for the pair of variables. The Paired Samples Correlations
table displays the correlation between the pair of variables. The higher the correlation, the more
statistical power there is to detect a mean difference.
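
The reason correlation matters can be seen from the variance of a difference between two paired variables, which is s1² + s2² − 2·r·s1·s2. The Python sketch below uses this identity to show how the standard error of the mean difference shrinks as the correlation r rises. The function name and the input values are hypothetical.

```python
import math

def se_of_mean_difference(sd_1, sd_2, r, n):
    """Standard error of the mean paired difference, given the correlation r."""
    var_diff = sd_1 ** 2 + sd_2 ** 2 - 2 * r * sd_1 * sd_2
    return math.sqrt(var_diff / n)

# Hypothetical: both variables have a standard deviation of 3, with 100 pairs
for r in (0.0, 0.5, 0.9):
    print(r, round(se_of_mean_difference(3, 3, r, 100), 4))
```

With the same mean difference, a smaller standard error gives a larger t statistic, so a given difference is easier to detect when the two variables are highly correlated.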

Figure 8.2 Example of Paired Samples Statistics Output

Figure 8.3 Example of Paired Samples Correlations Output

The null hypothesis is that the two means are equal. The mean difference in socioeconomic index is
2.42, reported along with the sample standard deviation and standard error in the Paired Samples
Test table. Although this is not a large difference, the significance value (.000) indicates that the two
means are significantly different (at the .01 level). A 95% confidence interval for the mean difference
is also reported, which can be quite useful.

Figure 8.4 Paired Samples Test Table

8.7 Procedure: Paired-Samples T Test
The Paired-Samples T Test procedure is accessed from the Analyze…Compare Means…Paired-
Samples T Test menu choice. With the Paired-Samples T Test dialog box open:

1) Select the pair of variables to compare and place them in the Paired Variables box (more
than one pair of variables can be in the Paired Variable(s) box). Hint: Use Ctrl+Click to select
the pair.

Figure 8.5 Paired-Samples T Test Dialog

8.8 Demonstration: Paired-Samples T Test
We will work with the Census.sav data file in this lesson.

To demonstrate a Paired-Samples T Test, we will compare mean education levels of the respondent
(educ) and his or her spouse (speduc). The Paired-Samples T Test is appropriate because we
obtain data from a single respondent regarding both his or her own education and that of the spouse.
We are interested in testing whether there is a significant difference in education between spouses in
the population. Although two people are involved—husband and wife—information is being obtained
from only one person, so the paired-sample test is appropriate.

Detailed Steps for Paired-Samples T Test
1) Select the variables educ and speduc and place them in the Paired Variables box

Results from Paired-Samples T Test
The first table displays the mean, standard deviation and standard error for each of the variables. We
see that the means for respondent and spouse’s education are close, but the education for the
spouse is a bit lower. This might indicate very close educational matching of people who marry.

THE PAIRED- SAMPLES T TEST


Figure 8.6 Paired Samples Statistics Table

In the next table, the sample size (number of pairs) appears along with the correlation between the
two variables. The correlation (.594) is positive, high, and statistically significant (differs from zero in
the population). This suggests that the power to detect a difference between the two means is
substantial.

Figure 8.7 Paired Samples Correlations Table
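The link between the correlation and power can be illustrated numerically. The sketch below assumes the standard identity Var(X - Y) = Var(X) + Var(Y) - 2r·SD(X)·SD(Y); the standard deviations (3.0) and the number of pairs (1000) are hypothetical values chosen only for illustration, and the function name is our own.

```python
import math

def se_of_mean_difference(sd1, sd2, r, n):
    """Standard error of the mean paired difference for n pairs.

    Uses Var(X - Y) = sd1^2 + sd2^2 - 2 * r * sd1 * sd2.
    """
    var_diff = sd1 ** 2 + sd2 ** 2 - 2 * r * sd1 * sd2
    return math.sqrt(var_diff / n)

# Hypothetical standard deviations and sample size; only r varies.
for r in (0.0, 0.3, 0.594, 0.9):
    se = se_of_mean_difference(3.0, 3.0, r, 1000)
    print(f"r={r:.3f}  SE of mean difference={se:.4f}")
```

As the correlation rises, the standard error of the mean difference shrinks, so the same mean difference yields a larger t statistic.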

The mean education difference, about .20 years, is reported along with the sample standard deviation
of the difference. The significance of the difference is .024, so if we are using a criterion of .05, we
would reject the null hypothesis and conclude the means are different (but we would reach a different
conclusion if using an alpha value of .01).

Whether or not the means are substantively different is a separate question.

Figure 8.8 Paired Samples Test Table

Apply Your Knowledge
1. True or false? Before doing the Paired-Samples T Test, a test of equality of variances should
be done?

2. See the output below. Which statements are correct?

a. The sample mean difference is .6.
b. The null hypothesis that the mean difference is 0 must be rejected (alpha=0.05)
c. Normality of the distribution of mean differences is not a concern here.


8.9 Lesson Summary
We explored the use of the Paired-Samples T Test to test the mean difference between two
variables.

Lesson Objectives Review
After completing this lesson students will be able to:

• Perform a statistical test to determine whether there is a statistically significant difference
between the means of two scale variables

To support the achievement of this primary objective, students will also be able to:

• Use the Paired-Samples T Test procedure
• Interpret the results of a Paired-Samples T Test

8.10 Learning Activity
The overall goal of this learning activity is to use the Paired-Samples T Test.

1. One variable in the customer survey asked about agreement that SPSS products are a good
value (gdvalue). A second question asked about agreement that SPSS offers high quality
products (hiqualty). Use a paired-samples t test to see whether the means of these two
questions differ (they are measured on a five-point scale). What do you conclude?

2. Then test whether there is a mean difference between agreement that SPSS products are
easy to learn (easylrn) and SPSS products are easy to use (easyuse). What do you
conclude?

3. Could we use a paired-sample t test to compare how long a customer has used SPSS
products (usespss) and how frequently they use SPSS (freqspss)? Why or why not?

Supporting Materials: The SPSS customer satisfaction data file, SPSS_CUST.SAV. The data were
collected from a random sample of SPSS customers, who were asked about their satisfaction with the
software, service, and other features, along with some background information on the customer and
their company.

ONE-WAY ANOVA


Lesson 9: One-Way ANOVA
9.1 Objectives
After completing this lesson students will be able to:

• Perform a statistical test to determine whether there is a statistically significant difference
among three or more groups on a scale dependent variable

To support the achievement of this primary objective, students will also be able to:

• Use the options in the One-Way ANOVA procedure
• Check the assumptions for One-Way ANOVA
• Interpret the results of a One-Way ANOVA analysis
• Use the Chart Builder to create an error bar to graph mean differences

9.2 Introduction
Analysis of variance (ANOVA) is a general method of drawing conclusions regarding differences in
population means when three or more comparison groups are involved. The Independent-Samples
T Test applies only to the simplest instance (two groups), while the One-Way ANOVA procedure can
accommodate more complex situations (three or more groups). In this lesson we will provide
information on the assumptions of using the One-Way ANOVA procedure and then provide examples
of its use.

Business Context
When analyzing data, we are often concerned with whether groups differ from each other. Without
statistical testing, we might base decisions on perceived differences that are unlikely to exist in the
population of customers. One-Way ANOVA allows us to determine whether three or more groups
differ significantly on a scale variable; with follow-up tests we can then determine which groups score
higher or lower than the others. For example, we might want to know whether:

• Customer groups differ in attitude toward a product or service.
• Different drugs differ in how well they reduce depression levels.

9.3 One-Way ANOVA
The basic logic of significance testing for comparing group means on more than two groups is the
same as that for comparing two group means (i.e., the Independent-Samples T Test). To
summarize:

• The null hypothesis is that the population groups have the same means.

• We determine the probability of obtaining a sample with group mean differences as large as (or
larger than) those we find in our data. To make this assessment, the amount of variation among
group means (between-group variation) is compared to the amount of variation among
observations within each group (within-group variation). Assuming that the group means are
identical in the population (null hypothesis), the only source of variation among sample
means would be the fact that the groups are composed of different individual observations.

Thus a ratio of the two sources of variation (between group/within group) should be about 1 when
there are no population differences. When the distribution of individual observations within each
group follows the normal curve, the statistical distribution of this ratio is known (F distribution) and we
can make a probability statement about the consistency of our data with the null hypothesis. The final
result is the probability of obtaining sample differences as large as (or larger than) those we found if
there were no population differences. If this probability is sufficiently small (usually less than 5
chances in 100, or .05) we conclude the population groups differ.

Supporting Materials: The file Census.sav, a PASW Statistics data file from a survey done on the
general adult population. Questions were included about various attitudes and demographic
characteristics.

Once we find a difference, we have to determine which groups differ from each other. (When the null
hypothesis is rejected, it does not follow that all group means differ significantly; the only thing that
can be said is that not all group means are the same.)
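The claim that the between/within ratio should be about 1 when there are no population differences can be checked by simulation. The sketch below is an illustration only: the population mean (50), standard deviation (10), group sizes, number of simulations, and the function names are all assumptions made for the example.

```python
import random

def f_ratio(groups):
    """Between-group mean square divided by within-group mean square."""
    k = len(groups)
    n_total = sum(len(g) for g in groups)
    grand_mean = sum(sum(g) for g in groups) / n_total
    means = [sum(g) / len(g) for g in groups]
    ss_between = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, means))
    ss_within = sum((x - m) ** 2 for g, m in zip(groups, means) for x in g)
    return (ss_between / (k - 1)) / (ss_within / (n_total - k))

def simulate_null_f(n_sims=2000, k=3, n_per_group=20, seed=1):
    """F ratios when every group is drawn from the same normal population."""
    rng = random.Random(seed)
    return [
        f_ratio([[rng.gauss(50, 10) for _ in range(n_per_group)] for _ in range(k)])
        for _ in range(n_sims)
    ]

ratios = simulate_null_f()
print(f"average F ratio with no population differences: {sum(ratios) / len(ratios):.2f}")
```

Individual samples still produce ratios well above or below 1; the F distribution describes how often that happens by chance alone.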

9.4 Assumptions of One-Way ANOVA
Using the One-Way ANOVA procedure correctly requires an understanding of its assumptions.

1) The dependent variable must have a scale measurement level.
2) The independent variable (named a “factor” in ANOVA analyses) must have a categorical
measurement level.
3) The distribution of the dependent variable within each population subgroup follows the normal
distribution (normality). One-Way ANOVA is robust to moderate violations when sample
sizes are moderate to large (over 25 cases) and the dependent measure has the same
distribution (for example, skewed to the right) within each comparison group.
4) The variation is the same within each population subgroup (homogeneity of variance).
One-Way ANOVA is robust to moderate violations when sample sizes of the groups are similar.

Similar to the Independent-Samples T Test, violation of the assumption of homogeneity of variances
is more serious than violation of the assumption of normality. And like the Independent-Samples T
Test, One-Way ANOVA applies a two-step strategy for testing:

1) Test the homogeneity of variance assumption.
2) If the assumption holds, proceed with the standard test (the ANOVA F Test) to test equality of
means; if the null hypothesis of equal variances is rejected, use an adjusted F test to test
equality of means.

9.5 Requesting One-Way ANOVA
Running a One-Way ANOVA is accomplished by following these steps:

1) Select the scale variable on which to test equality of group means.
2) Select a factor variable.
3) Request the Levene homogeneity of variance test.
4) Review the procedure output to:

a. Check on the test for homogeneity of variances
b. Review the test on equality of means, either using the standard ANOVA F test or an
adjusted F test taking into account unequal variances

5) If the null hypothesis of equal population means is rejected, extend the analysis by adding a
post hoc test within the One-Way ANOVA procedure.


9.6 One-Way ANOVA Output
The first table of output is the test for homogeneity of variance. The null hypothesis is that the
variances are equal, so if the significance level is low enough (as it is in the table below), we reject
the null hypothesis and conclude the variances are not equal.

As with the independent-samples t test, this isn’t a problem, but it does mean that we should use the
tests that adjust for unequal variance.

Figure 9.1 Levene Test of Homogeneity of Variances
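To see what the homogeneity test measures, the statistic can be sketched as a one-way ANOVA F ratio computed on each observation's absolute deviation from its own group mean. This is the mean-centered form of Levene's statistic; whether a particular software version centers on the mean or another value is an assumption to verify. The function name and the data are hypothetical, and the significance value is omitted because it requires the F distribution.

```python
def levene_statistic(groups):
    """Levene's W (mean-centered): an F ratio on absolute deviations from group means.

    Returns (W, df1, df2).
    """
    abs_dev = []
    for g in groups:
        m = sum(g) / len(g)
        abs_dev.append([abs(x - m) for x in g])
    k = len(abs_dev)
    n_total = sum(len(z) for z in abs_dev)
    z_means = [sum(z) / len(z) for z in abs_dev]
    z_grand = sum(sum(z) for z in abs_dev) / n_total
    between = sum(len(z) * (zm - z_grand) ** 2 for z, zm in zip(abs_dev, z_means))
    within = sum((v - zm) ** 2 for z, zm in zip(abs_dev, z_means) for v in z)
    return (between / (k - 1)) / (within / (n_total - k)), k - 1, n_total - k

# Hypothetical groups: the third is far more spread out than the first two.
similar_spread = [[1, 2, 3, 5], [11, 12, 13, 15], [21, 22, 23, 25]]
unequal_spread = [[1, 2, 3, 5], [11, 12, 13, 15], [5, 20, 35, 60]]
print("W, similar spread:", levene_statistic(similar_spread)[0])
print("W, unequal spread:", levene_statistic(unequal_spread)[0])
```

A large W, and therefore a small significance value, signals that the spread differs across groups.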

Most of the information in the ANOVA table is technical in nature and is not directly interpreted.
Rather the summaries are used to obtain the F statistic and, more importantly, the probability value
we use in evaluating the population differences. The standard ANOVA table will provide the following
information:

• The first column has a row for the between-groups and a row for within-groups variation.
• Sums of squares are intermediate summary numbers used in calculating the between-
(deviations of individual group means around the total sample mean) and within- (deviations
of individual observations around their respective sample group mean) group variances.

• The “df” column contains information about degrees of freedom, related to the number of
groups and the number of individual observations within each group.

• Mean Squares are measures of the between-group and within-group variation (Sum of
Squares divided by their respective degrees of freedom).

• The F statistic is the ratio of between to within group variation and will be about 1 if the null
hypothesis is true.

• The column labeled “Sig.” provides the probability of obtaining an F ratio at least as large as the
sample F ratio (taking into account the number of groups and sample size), if the null hypothesis
is true.
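The columns described above can be sketched as a short calculation. This is an illustration, not the procedure's own code: the function name `anova_table` and the three small groups are hypothetical, and the “Sig.” column is left out because it requires the F distribution.

```python
def anova_table(groups):
    """Return rows of (source, sum of squares, df, mean square) and the F ratio."""
    k = len(groups)
    n_total = sum(len(g) for g in groups)
    grand_mean = sum(sum(g) for g in groups) / n_total
    means = [sum(g) / len(g) for g in groups]
    # Between: deviations of group means around the total sample mean.
    ss_between = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, means))
    # Within: deviations of observations around their own group mean.
    ss_within = sum((x - m) ** 2 for g, m in zip(groups, means) for x in g)
    df_between, df_within = k - 1, n_total - k
    ms_between = ss_between / df_between
    ms_within = ss_within / df_within
    rows = [
        ("Between Groups", ss_between, df_between, ms_between),
        ("Within Groups", ss_within, df_within, ms_within),
        ("Total", ss_between + ss_within, n_total - 1, None),
    ]
    return rows, ms_between / ms_within

# Hypothetical data for three groups of three observations each.
rows, f_value = anova_table([[12, 14, 16], [10, 11, 12], [8, 9, 13]])
for source, ss, df, ms in rows:
    ms_text = "" if ms is None else f"{ms:.3f}"
    print(f"{source:<15}{ss:>10.3f}{df:>5}{ms_text:>10}")
print(f"F = {f_value:.3f}")
```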

In practice, most researchers move directly to the significance value since the columns containing the
sums of squares, degrees of freedom, mean squares and F statistic are all necessary for the
probability calculation but are rarely interpreted in their own right. In the table below, the low
significance value leads us to reject the null hypothesis of equal means.

Tip: You should also examine the actual standard deviations or variances for each group. In large
samples, it is relatively easy to reject the null hypothesis of equal variances even when the variances
are within a factor of 2 of each other.


Figure 9.2 ANOVA Table Output

When the condition of equal variances is not met, an adjusted F test has to be used. PASW Statistics
provides two such tests, Welch and Brown-Forsythe. The table Robust Tests of Equality of Means
provides the details. Again, the columns containing the test statistic and degrees of freedom are
technical details used to compute the significance.

Figure 9.3 Robust Tests of Equality of Means Output

9.7 Procedure: One-Way ANOVA
The One-Way ANOVA procedure is accessed from the Analyze…Compare Means…One-Way
ANOVA menu choice. With the One-Way ANOVA dialog box open:

1) Place one or more scale variables in the Dependent List box.
2) Place one categorical variable in the Factor box.
3) Open the Options dialog to request descriptive information and the test for homogeneity of
variance.


Figure 9.4 One-Way ANOVA Dialog

In the Options dialog:

1) Ask for Descriptive statistics so that group means and standard deviations are displayed.
2) The homogeneity of variance test allows one to assess the assumption of homogeneity of
variance.
3) Brown-Forsythe and Welch are robust tests that do not assume homogeneity of variance and
thus can be used when this assumption is not met.

Figure 9.5 One-Way ANOVA Options Dialog


9.8 Demonstration: One-Way ANOVA
We will work with the Census.sav data file in this lesson.

In this example we investigate the relationship between marital status (marital) and education in years
(educ). We would like to determine whether there are educational differences among marital status
groups.

Detailed Steps for One-Way ANOVA
1) Place the variable educ in the Dependent List box.
2) Place the variable marital in the Factor box.
3) In the Options dialog, select Descriptive, Homogeneity of variance test, Brown-Forsythe,
and Welch check boxes.

Results from One-Way ANOVA
As with all analyses, you should first look to see how many cases are in each group, along with the
means and standard deviations. The first table in the Viewer window provides this information. The
size of the groups ranges from 70 to 971 people. The means vary from 11.76 to 13.73 years (the
One-Way ANOVA procedure will assess if these means differ), while the standard deviations vary
from 2.89 to 3.45 (the test of homogeneity of variance will assess if these standard deviations differ).
We observe that:

• The married group has the most education (13.73) while those separated have the least
education (11.76).

Figure 9.6 Table of Descriptive Statistics

What we don’t know is whether these differences are due to sampling variation, or instead are likely
to be real and exist in the population. For this we turn to the ANOVA.

Levene Test of Homogeneity of Variance
First, we must review Levene’s test of homogeneity of variance. The null hypothesis of homogeneity
of within-group variance is not rejected (significance .694). This means we can use the standard
ANOVA table.

Figure 9.7 Test of Homogeneity of Variances Table


ANOVA Table
Every statistical test has a null hypothesis. In most cases, the null hypothesis is that there is no
difference between groups. This is also true for the One-Way ANOVA procedure, so here the null
hypothesis is that there is no difference in mean education among marital status groups. If the
significance is small enough, we reject the null hypothesis and conclude that there are differences.

Figure 9.8 ANOVA Table for Education by Marital Status

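We see that the significance value is extremely small, less than .05. This is the probability of
obtaining group differences at least as large as those observed if the null hypothesis were true (it is
not the probability that the null hypothesis is correct). We therefore reject the null hypothesis and
conclude that there are differences in education among these groups.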

If we had not met the homogeneity of variances assumption, and given our disparate sample sizes,
we would have to turn to the Brown-Forsythe and Welch tests, which test for equality of group means
without assuming homogeneity of variance. These tests are shown below, although we would not
report them in this situation (where equal variances may be assumed).

Figure 9.9 Robust Tests of Mean Differences

Both of these measures mathematically attempt to adjust for the lack of homogeneity of variance.

• When calculating the between-group to within-group variance ratio, the Brown-Forsythe test
explicitly adjusts for heterogeneity of variance by adjusting each group’s contribution to the
between-group variation by a weight related to its within-group variation.

• The Welch test adjusts the denominator of the F ratio so it has the same expectation as the
numerator, when the null hypothesis is true, despite the heterogeneity of within-group
variance.
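The two adjustments can be sketched from their textbook formulas. The code below is an illustration under that assumption, not the software's implementation: the function names and data are hypothetical, and only the test statistics (plus the Welch degrees of freedom) are computed, not the significance values.

```python
def _summaries(groups):
    """Per-group (n, mean, sample variance)."""
    out = []
    for g in groups:
        n = len(g)
        m = sum(g) / n
        var = sum((x - m) ** 2 for x in g) / (n - 1)
        out.append((n, m, var))
    return out

def brown_forsythe_f(groups):
    """Between-group sum of squares over variances weighted by (1 - n_i / N)."""
    stats = _summaries(groups)
    n_total = sum(n for n, _, _ in stats)
    grand_mean = sum(n * m for n, m, _ in stats) / n_total
    numerator = sum(n * (m - grand_mean) ** 2 for n, m, _ in stats)
    denominator = sum((1 - n / n_total) * var for n, _, var in stats)
    return numerator / denominator

def welch_f(groups):
    """Welch's adjusted F. Returns (F, df1, df2)."""
    stats = _summaries(groups)
    k = len(stats)
    weights = [n / var for n, _, var in stats]       # precision of each group mean
    w_total = sum(weights)
    weighted_mean = sum(w * m for w, (_, m, _) in zip(weights, stats)) / w_total
    numerator = sum(
        w * (m - weighted_mean) ** 2 for w, (_, m, _) in zip(weights, stats)
    ) / (k - 1)
    lam = sum((1 - w / w_total) ** 2 / (n - 1) for w, (n, _, _) in zip(weights, stats))
    denominator = 1 + 2 * (k - 2) / (k ** 2 - 1) * lam
    return numerator / denominator, k - 1, (k ** 2 - 1) / (3 * lam)

# Hypothetical data with unequal spread and unequal group sizes.
data = [[12, 14, 16, 15], [10, 11, 12], [2, 9, 13, 20, 6]]
print("Brown-Forsythe F:", round(brown_forsythe_f(data), 3))
print("Welch F, df1, df2:", tuple(round(v, 3) for v in welch_f(data)))
```

With equal group sizes the Brown-Forsythe statistic equals the standard F ratio, and with two groups the Welch statistic equals the square of the unequal-variance t statistic, which offers a quick check of the formulas.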

Both tests indicate there are highly significant differences in average highest year of school
completed between the marital status groups, which is consistent with the conclusions we drew
from the standard ANOVA.

Having concluded that there are differences in amount of education among different marital status
groups, we probe to find specifically which groups differ from which others.


Apply Your Knowledge
1. True or false? Suppose we have collected data for four groups of respondents on region
(north/east/south/west) and their income categories (very low/low/moderate/high/very high).
Is One-Way ANOVA the correct procedure to test whether there are differences between the
regions with respect to income categories?

2. True or false? When the F-test is not significant, then we will not follow up this analysis with
an analysis on which group means differ?

3. In a dataset about employees and their salaries we tested whether there are differences in
mean salary according to the job category of the employee. The output is depicted below.
Which statements are correct?

a. Although the sample standard deviations are different, the null hypothesis of equal
population variances cannot be rejected (alpha=0.05)

b. The table titled ANOVA must be discarded, as the null hypothesis of equal population
variances is rejected (alpha=0.05)

c. The null hypothesis of equal population group means is rejected by both Welch and
Brown-Forsythe tests (alpha=0.05)

9.9 Post Hoc Tests with a One-Way ANOVA
Post hoc tests are typically performed only after the overall F test indicates that population differences
exist, although for a broader view see Milliken and Johnson (2004). At this point there is usually
interest in discovering just which group means differ from which others. In one aspect, the procedure
is quite straightforward: every possible pair of group means is tested for population differences and a
summary table produced. However, a problem exists in that as more tests are performed, the
probability of obtaining at least one false-positive result increases. As an extreme example, if there
are ten groups, then 45 pairwise group comparisons (n*(n-1)/2, where n is the number of groups) can
be made. If we are testing at the .05 level and no population differences exist, we would expect to
obtain on average about 2 (.05 * 45 = 2.25) false-positive tests. In an attempt
to reduce the false-positive rate when multiple tests of this type are done, statisticians have
developed a number of methods.
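The arithmetic behind this example can be sketched as follows. The function names are our own, and the probability of at least one false positive assumes the tests are independent, which pairwise comparisons sharing the same groups are not exactly; treat that figure as an approximation.

```python
def pairwise_comparisons(k):
    """Number of distinct pairs among k groups: k * (k - 1) / 2."""
    return k * (k - 1) // 2

def expected_false_positives(k, alpha=0.05):
    """Average number of false positives when no population differences exist."""
    return alpha * pairwise_comparisons(k)

def prob_at_least_one_false_positive(k, alpha=0.05):
    """Approximation that assumes the pairwise tests are independent."""
    return 1 - (1 - alpha) ** pairwise_comparisons(k)

for k in (3, 5, 10):
    print(
        f"{k} groups: {pairwise_comparisons(k)} comparisons, "
        f"expected false positives {expected_false_positives(k):.2f}, "
        f"P(at least one) about {prob_at_least_one_false_positive(k):.2f}"
    )
```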


Often, more than one post hoc test is used and the results are compared to provide more
evidence about potential mean differences.

9.10 Requesting Post Hoc Tests with a One-Way ANOVA
If the null hypothesis of equal population group means is rejected, a post hoc analysis is required,
following these steps:

1) Ask for one or more appropriate post hoc tests in the Post Hoc dialog.
2) Inspect the output and report which groups differ significantly in their population mean.

9.11 Post Hoc Tests Output
The table labeled Multiple Comparisons provides all pairwise comparisons.

The rows are formed by every possible combination of groups. The column labeled “Mean Difference
(I-J)” contains the sample mean difference between each pairing of groups. If this difference is
statistically significant at the specified level after applying the post hoc adjustments, then an asterisk
(*) appears beside the mean difference. Notice the actual significance value for the test appears in
the column labeled “Sig.”. In addition, the standard errors and 95% confidence intervals for each
mean difference appear. These provide information on the precision with which we have estimated
the mean differences. Note that, as you would expect, if a mean difference is not significant, the
confidence interval includes 0.

Also notice that each pairwise comparison appears twice. For each such duplicate pair the
significance value is the same, but the signs are reversed for the mean difference and confidence
interval values.

Figure 9.10 Multiple Comparisons Output


9.12 Procedure: Post Hoc Tests with a One-Way ANOVA
Post hoc analyses are accessed from the One-Way ANOVA dialog box. With the One-Way ANOVA
dialog box open:

1) Open the Post Hoc Multiple Comparisons dialog.
2) Select the appropriate method of multiple comparisons, which will depend on whether the
assumption of homogeneity of variance has been met.

Figure 9.11 Post Hoc Testing Dialog

Why So Many Tests?
The Post Hoc dialog lists over a dozen tests just in the Equal Variances Assumed area. We need to
review why so many tests are available.

The ideal post hoc test would demonstrate tight control of Type I (false-positive) error, have good
statistical power (probability of detecting true population differences), and be robust over assumption
violations (failure of homogeneity of variance, non-normal error distributions). Unfortunately, there are
implicit tradeoffs involving some of these desired features (Type I error and power) and no current
post hoc procedure is best in all these areas. Add to this the fact that pairwise tests can be based on
different statistical distributions (t, F, studentized range, and others) and that Type I error can be
controlled at different levels (per individual test, per family of tests, variations in between), and you
have a large collection of post hoc tests.

We will briefly compare post hoc tests from the perspective of being liberal or conservative regarding
control of the false-positive rate (Type I error). The existence of numerous post hoc tests suggests
that there is no single approach that statisticians agree will be optimal in all situations.

LSD
The LSD or least significant difference method simply applies standard t tests to all possible pairs of
group means. No adjustment is made based on the number of tests performed. The argument is that
since an overall difference in group means has already been established at the selected criterion
level (say .05), no additional control is necessary. This is the most liberal of the post hoc tests.


SNK, REGWF, REGWQ & Duncan
The SNK (Student-Newman-Keuls), REGWF (Ryan-Einot-Gabriel-Welsch F), REGWQ
(Ryan-Einot-Gabriel-Welsch Q, based on the studentized range statistic) and Duncan methods involve sequential
testing. After ordering the group means from lowest to highest, the two most extreme means are
tested for a significant difference using a critical value adjusted for the fact that these are the
extremes from a larger set of means. If these means are found not to be significantly different, the
testing stops; if they are different then the testing continues with the next most extreme set, and so
on. All are more conservative than the LSD. REGWF and REGWQ improve on the traditionally used
SNK in that they adjust for the slightly elevated false-positive rate (Type I error) that SNK has when
the set of means tested is much smaller than the full set.

Bonferroni & Sidak
The Bonferroni (also called the Dunn procedure) and Sidak (also called Dunn-Sidak) methods perform
each test at a stringent significance level to ensure that the family-wise (applying to the set of tests)
false-positive rate does not exceed the specified value. They are based on inequalities relating the
probability of a false-positive result on each individual test to the probability of one or more false
positives for a set of independent tests. For example, the Bonferroni is based on an additive
inequality, so the criterion level for each pairwise test is obtained by dividing the original criterion level
(say .05) by the number of pairwise comparisons made. Thus with five means, and therefore ten
pairwise comparisons, each Bonferroni test will be performed at the .05/10 or .005 level.
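The per-test criterion levels can be sketched as below. The Bonferroni value follows the division described above; the Sidak formula, 1 - (1 - alpha)^(1/m), is the standard form for m independent tests and is stated here as an assumption, since the text does not give it. The function names are our own.

```python
def bonferroni_alpha(alpha, m):
    """Per-test level from the additive inequality: alpha / m."""
    return alpha / m

def sidak_alpha(alpha, m):
    """Per-test level from the multiplicative form for m independent tests."""
    return 1 - (1 - alpha) ** (1 / m)

m = 5 * (5 - 1) // 2   # five means give ten pairwise comparisons
print(f"Bonferroni per-test level: {bonferroni_alpha(0.05, m):.6f}")
print(f"Sidak per-test level:      {sidak_alpha(0.05, m):.6f}")
```

The Sidak level is slightly larger than the Bonferroni level, so the Sidak test is marginally less stringent for the same family-wise rate.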

Tukey (b)
The Tukey (b) test is a compromise test, combining the Tukey (see next test) and the SNK criterion
producing a test result that falls between the two.

Tukey
Tukey’s HSD (Honestly Significant Difference; also called Tukey HSD, WSD, or Tukey(a) test)
controls the false-positive rate family-wise. This means if you are testing at the .05 level, that when
performing all pairwise comparisons, the probability of obtaining one or more false positives is .05. It
is more conservative than the Duncan and SNK. If all pairwise comparisons are of interest, which is
usually the case, Tukey’s test is more powerful than the Bonferroni and Sidak.

Scheffe
Scheffe’s method also controls the family-wise error rate. It adjusts not only for the pairwise
comparisons, but also for any possible comparison the researcher might ask. As such it is the most
conservative of the available methods (false-positive rate is least), but has less statistical power.

Specialized Post Hoc Tests

Hochberg’s GT2 & Gabriel: Unequal Ns
Most post hoc procedures mentioned above (excepting LSD, Bonferroni & Sidak) were derived
assuming equal group sample sizes in addition to homogeneity of variance and normality of error.
When the subgroup sizes are unequal, PASW Statistics substitutes a single value (the harmonic
mean) for the sample size. Hochberg’s GT2 and Gabriel’s post hoc test explicitly allow for unequal
sample sizes.
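The harmonic mean mentioned above can be sketched in a few lines. The group sizes are hypothetical, the function name is our own, and exactly how the software applies the resulting value inside each test is not shown here.

```python
def harmonic_mean_size(sizes):
    """Harmonic mean of group sizes: k divided by the sum of reciprocals."""
    return len(sizes) / sum(1 / n for n in sizes)

# Hypothetical group sizes; small groups pull the harmonic mean down.
sizes = [70, 150, 300, 400, 971]
print(f"arithmetic mean: {sum(sizes) / len(sizes):.1f}")
print(f"harmonic mean:   {harmonic_mean_size(sizes):.1f}")
```

Because the harmonic mean is dominated by the smaller groups, a single substituted value can misrepresent the precision of comparisons involving the largest groups, which is one reason tests that allow for unequal sizes exist.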

Waller-Duncan
The Waller-Duncan takes an approach (Bayesian) that adjusts the criterion value based on the size of
the overall F statistic in order to be sensitive to the types of group differences associated with the F
(for example, large or small). Also, you can specify the ratio of Type I (false positive) to Type II (false
negative) error in the test. This feature allows for adjustments if there are differential costs to the two
types of errors.


Unequal Variances and Unequal Ns

Tamhane T2, Dunnett’s T3, Games-Howell, Dunnett’s C
Each of these post hoc tests adjusts for unequal variances and sample sizes in the groups.
Simulation studies (summarized in Toothaker, 1991) suggest that although Games-Howell can be too
liberal when the group variances are equal and sample sizes are unequal, it is more powerful than the
others.

The bottom line is that your choice in post hoc tests should reflect your preference for the
power/false-positive tradeoff and your evaluation of how well the data meet the assumptions of the
analysis, and you live with the results of that choice.

9.13 Demonstration: Post Hoc Tests with a One-Way ANOVA

We will work with the Census.sav data file in this example. Previous analysis showed that the null
hypothesis of equal mean education for the marital status groups was rejected. Post hoc analysis will
reveal which groups differ.

Detailed Steps for a Post Hoc Test
1) Place the variable educ in the Dependent List box.
2) Place the variable marital in the Factor box.
3) In the Post hoc dialog, select Bonferroni.

Results for the Post Hoc Tests
We will move directly to the post hoc test results.


Figure 9.12 Bonferroni Post Hoc Results

We see the Married and Widowed groups have a mean difference of 1.26 years of education. This
difference is statistically significant at the specified level after applying the post hoc adjustments. The
interval [.54, 1.99] contains the difference between these two population means with 95% confidence.
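Two properties of the Multiple Comparisons table can be checked with the values quoted above: the duplicate row for a pair has reversed signs, and a significant difference has a confidence interval that excludes 0. The sketch below uses the Married versus Widowed figures from the text; the function names are our own.

```python
def mirrored_row(mean_diff, ci_low, ci_high):
    """Row for (J, I) given the row for (I, J): signs flip and the bounds swap."""
    return -mean_diff, -ci_high, -ci_low

def interval_excludes_zero(ci_low, ci_high):
    """True when 0 lies outside the confidence interval."""
    return ci_low > 0 or ci_high < 0

married_vs_widowed = (1.26, 0.54, 1.99)          # values quoted in the text
widowed_vs_married = mirrored_row(*married_vs_widowed)

print("Married - Widowed:", married_vs_widowed)
print("Widowed - Married:", widowed_vs_married)
print("Interval excludes 0:", interval_excludes_zero(0.54, 1.99))
```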

Summarizing the entire table, we would say that the Married, Divorced, and Never Married groups
differ in amount of education from the Separated and Widowed groups. We could not be sure of this
just from examining the means.

Apply Your Knowledge
1. In a dataset about employees and their salaries we tested whether there are differences in
mean salary according to the political preference of the employee. The output is depicted
below. Which statements are correct?

1) The null hypothesis of equal variances is not rejected (alpha=0.05)
2) The null hypothesis of equal group means is rejected (alpha=0.05)
3) Post-hoc tests show significant differences between all three groups A, B, and C
(alpha=.05).


9.14 Error Bar Chart with Chart Builder
For presentations it is often useful to show a graph of the relationship between a scale and a
categorical variable. Error bar charts are an effective way of doing this for ANOVAs.

The Chart Builder allows for the creation of a variety of charts, including error bar charts. It provides
for great flexibility in creating charts, including formatting options.

9.15 Requesting an Error Bar Chart with Chart Builder
Follow these steps to produce an error bar chart using the Chart Builder:

1) Select an error bar chart type of graph.
2) Specify the scale variable for which means and confidence intervals are requested.
3) Specify the categorical variable that defines the groups.
4) Inspect the error bar chart in the output to see which groups do or do not overlap in their
confidence intervals.

9.16 Error Bar Chart Output
The standard error bar chart will generate a graph depicting the relationship between a scale and
categorical variable. It provides a visual sense of how far the groups are separated.

• Note the means of each group (the small circle)
• Note if the 95% confidence intervals of the groups overlap
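What an error bar represents can be sketched as a group mean plus or minus a multiple of its standard error. The example below uses the normal critical value (about 1.96 for 95%) as a large-sample approximation; this is an assumption, since intervals for small groups would normally use the t distribution. The data and function names are hypothetical.

```python
import math
from statistics import NormalDist, mean, stdev

def mean_with_ci(values, confidence=0.95):
    """Return (mean, lower, upper) using a normal approximation."""
    n = len(values)
    m = mean(values)
    se = stdev(values) / math.sqrt(n)                 # standard error of the mean
    z = NormalDist().inv_cdf(0.5 + confidence / 2)    # about 1.96 for 95%
    return m, m - z * se, m + z * se

def intervals_overlap(first, second):
    """True when two (low, high) intervals share at least one point."""
    return first[1] >= second[0] and second[1] >= first[0]

# Hypothetical values for two groups.
group_a = [10, 12, 14, 16, 18]
group_b = [20, 21, 23, 24, 27]
m_a, low_a, high_a = mean_with_ci(group_a)
m_b, low_b, high_b = mean_with_ci(group_b)
print(f"A: mean {m_a:.2f}, 95% CI [{low_a:.2f}, {high_a:.2f}]")
print(f"B: mean {m_b:.2f}, 95% CI [{low_b:.2f}, {high_b:.2f}]")
print("Intervals overlap:", intervals_overlap((low_a, high_a), (low_b, high_b)))
```

Each interval is computed from its own group alone, so comparing bars by eye is informal and does not replace the statistical tests.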


Figure 9.13 Error Bar Chart of TV Hours by Highest Degree

9.17 Procedure: Error Bar Chart with Chart Builder
The Chart Builder procedure is accessed from the Graphs…Chart Builder menu. With the Chart
Builder dialog open:

1) Select an error bar chart icon and place it in the Chart preview area.
2) Specify a categorical variable for the X-axis.
3) Specify a scale variable for the Y-axis. The mean is calculated by default.


Figure 9.14 Chart Builder Dialog to Create Error Bar Chart

9.18 Demonstration: Error Bar Chart with Chart Builder
We will create an error bar chart for the ANOVA of marital status and education. We want to see
visually how education varies across categories of marital status, so we use education as the Y-axis
variable. This is equivalent to how we used ANOVA to study this relationship.

Detailed Steps for Error Bar Chart
1) Select an error bar chart icon and put it in the Chart Preview pane.
2) Place the variable educ in the Y-axis box.
3) Place the variable marital in the X-axis box.

Results from the Error Bar Chart Created with Chart Builder
A table of mean education within categories of marital status lets us compare groups, and this is
mirrored in the error bar chart.

• The mean education of each marital status group along with 95% confidence intervals is
represented in this chart

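• Confidence intervals that do not overlap suggest that those groups differ from each other
• Confidence intervals that do overlap do not by themselves show that the groups are equal, because
overlapping intervals can still accompany a statistically significant difference; confirm with the
post hoc tests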
• We can readily compare the categories of marital status in this arrangement


Married, divorced and never married respondents have more education than the separated and
widowed categories.

Figure 9.15 Error Bar Chart of Education and Marital Status

Additional Resources

Further Info: For additional information on multiple comparison tests with ANOVA, see
Klockars, Alan J. and Sax, G. 1986. Multiple Comparisons. Newbury Park, CA: Sage.

The confidence intervals on the error bar chart are
determined for each group separately and no adjustment
is made based on the number of groups that are being
compared, or for unequal variance. So an error bar chart
should never be used without doing statistical tests.

9.19 Lesson Summary
We explored the use of the One-Way ANOVA procedure to analyze relationships between a
categorical and a scale variable.

Lesson Objectives Review
Students who have completed this lesson should now be able to:

• Perform a statistical test to determine whether there is a statistically significant difference
among three or more groups on a scale dependent variable

To support the achievement of the primary objective, students should now also be able to:

• Use the options in the One-Way ANOVA procedure
• Check the assumptions for One-Way ANOVA
• Interpret the results of a One-Way ANOVA analysis
• Use the Chart Builder to create an error bar chart to graph mean differences

9.20 Learning Activity
The overall goal of this learning activity is to use One-Way ANOVA with post hoc tests to explore the
relationship between several variables. You will use the PASW Statistics data file Census.sav.

1. Investigate how the number of siblings (sibs) varies by highest degree (degree). Ask for
appropriate statistics.

2. Is the assumption of homogeneity of variance met? Is the ANOVA test significant at the .01
level?

3. Do a post hoc analysis, if justified. Ask for both the Bonferroni and Scheffe tests. What do
you conclude from these tests? Which education groups have different mean numbers of
siblings? Are the Bonferroni and Scheffe tests consistent?

4. Create an error bar chart to display the mean differences for sibs by degree. Is the error bar
chart a correct representation of which means are different?

5. Now do another analysis of political position (polviews) by degree. Repeat the same steps
from the analysis above. Which education groups differ in their political position?

The file Census.sav is a PASW Statistics data file from a
survey done on the general adult population. Questions
were included about various attitudes and demographic
characteristics.

Supporting
Materials

BIVARIATE PLOTS AND CORRELATIONS FOR SCALE VARIABLES

Lesson 10: Bivariate Plots and
Correlations for Scale Variables
10.1 Objectives
After completing this lesson students will be able to:

• Perform a statistical test to determine whether two scale variables are correlated (related)

To support the achievement of the primary objective, students will also be able to:

• Visually assess the relationship between two scale variables with scatterplots, using the
Chart Builder procedure

• Explain the options of the Bivariate Correlations procedure
• Explain the Pearson correlation coefficient and its assumptions
• Interpret a Pearson correlation coefficient

10.2 Introduction
In this lesson we examine and quantify the relationship between two scale variables. A scatterplot
visually presents the relationship between two scale variables, while the Pearson correlation
coefficient is used to quantify the strength and direction of the relationship between scale variables.
The Pearson correlation coefficient (formally named the Pearson product-moment correlation
coefficient) is a measure of the extent to which there is a linear (or straight line) relationship between
two variables.

Business Context
When we examine the distributions of two scale variables, we would like to know whether a
relationship we observe is likely to exist in our target population or instead is caused by random
sampling variation. Statistical testing tells us whether two scale variables are related. Assessing
correlation coefficients allows us to determine the direction and strength of this relationship. For
example, we might want to know whether:

• Higher SAT scores are associated with higher first year college GPAs
• Eating more often at fast-food restaurants is associated with more frequent shopping at
convenience stores
• Lower levels of depression are associated with higher self-esteem scores

10.3 Scatterplots
The scatterplot visually presents the relationship between two scale variables. A scatterplot displays
individual observations in an area determined by a vertical and a horizontal axis, each of which
represents one of the variables of interest (note that the variables must be scale). In a scatterplot, look for a
relationship between the two variables and note any patterns or extreme points.

The file Census.sav is a PASW Statistics data file from a
survey done on the general adult population. Questions
were included about various attitudes and demographic
characteristics.

Supporting
Materials

Scatterplot illustrated
For an illustration, see the scatterplot below, showing the relationship between variables age and
salary. The relationship is linear (a straight line describes the relationship between age and salary),
and positive (if age increases, then so does salary); there is one person having a salary which is “out
of line.”

Figure 10.1 Scatterplot of Age and Salary

10.4 Requesting a Scatterplot
The scatterplot is available in the Chart Builder procedure. The steps to create a scatterplot are:

1) Select a chart (simple scatter) from the Gallery.
2) Place the dependent variable on the vertical (y) axis and the independent variable on the
horizontal (x) axis.
3) Review the procedure output to investigate the relationship between the variables, including:

a. Linearity
b. Directionality
c. Outliers

Place the independent variable on the x-axis and the
dependent variable on the y-axis. Here, salary depends
on age and not vice versa, so salary is the dependent
variable on the y-axis, age the independent variable on
the x-axis.
If the situation is such that no dependent or
independent variable can be identified (say number of
hours watching TV and number of hours of internet use),
then the choice of which variable goes where is
arbitrary.

Best
Practice

10.5 Scatterplot Output
The standard scatterplot will generate a graph depicting the relationship between the two variables.

• Note if there is a linear relationship
• Note the direction of the relationship
• Note any outliers

Figure 10.2 Scatterplot Showing the Relationship between Beginning Salary and Education
Level

10.6 Procedure: Scatterplot
The scatterplot is created from the Graphs…Chart Builder menu choice. With the Chart Builder
dialog box open:

1) Select a Simple Scatter graph from the Gallery and drag and drop it in the Chart Preview area.
2) Place the dependent variable on the vertical axis.
3) Place the independent variable on the horizontal axis.

Figure 10.3 Chart Builder Dialog Box to Create a Scatterplot

10.7 Demonstration: Scatterplot
We will work with the Census.sav data file in this example.
In this demonstration we examine the relationship between mother’s education (maeduc) and father’s
education (paeduc). This will allow us to investigate whether people marry those with a similar
amount of education (we could also investigate this question by examining the scatterplot of the
respondent’s education and that of his/her spouse).

Detailed Steps for a Scatterplot
1) Place the Simple Scatter icon in the Chart Preview area
2) Place the variable maeduc in the Y-axis box.
3) Place the variable paeduc in the X-axis box.

Results from the Scatterplot
The scatterplot visually presents the relationship between two variables by displaying individual
observations. Observe that, for the most part, people with low education tend to marry each other. Also,
people with high education tend to marry each other; thus there is a positive, linear relationship.

Figure 10.4 Scatterplot of Mother’s and Father’s Highest Year of School Completed

In preparation for our discussion of the correlation coefficient, we will edit the scatterplot to
superimpose a best fit straight line to the data.

10.8 Adding a Best Fit Straight Line to the Scatterplot
All charts in PASW Statistics can be edited. The type of editing that can be done depends on the type
of chart. For a scatterplot, one option is to add a best fit line, which can be done in one step.

Detailed Steps to Edit Scatterplot
We need to open the scatterplot in the Chart Editor.

1) Double-click on the chart to open it in the Chart Editor
2) Select Elements…Fit Line at Total
3) Close the Chart Editor

The straight line tracks the positive relationship between father’s and mother’s educational attainment.
How well do you think it describes or models the relationship? Does it match what you would have
drawn by hand?

We use scatterplots to get a sense of whether or not it is appropriate to use a correlation coefficient to
describe this relationship with one number. As we will learn, the correlation coefficient assumes a
linear relationship.

Figure 10.5 Scatterplot with Fit Line Added

Apply Your Knowledge
1. Consider the scatterplot between X and Y depicted below. Which statements are correct?

a. There is a linear relationship between X and Y
b. There is a positive relationship between X and Y
c. The point labeled A is far from the straight line describing the relationship between X
and Y
d. The point labeled B is far from the straight line describing the relationship between X
and Y

10.9 Pearson Correlation Coefficient
The Pearson Correlation Coefficient is a measure of the extent to which there is a linear (or straight
line) relationship between two scale variables. It is normed so that a correlation of +1 indicates that
the data fall on a perfect straight line sloping upwards (positive relationship), while a correlation of –1
would represent data forming a straight line sloping downwards (negative relationship). A correlation
of 0 indicates there is no straight-line relationship at all.
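As a minimal sketch of how the coefficient is computed, the Python function below divides the sum of cross-products of deviations by the square root of the product of the two sums of squared deviations. The function name pearson_r and the small data sets are our own illustrations, not part of PASW Statistics.

```python
from math import sqrt

def pearson_r(x, y):
    """Pearson correlation of two equal-length lists of numbers."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))  # cross-products
    sxx = sum((a - mx) ** 2 for a in x)                   # squared deviations of x
    syy = sum((b - my) ** 2 for b in y)                   # squared deviations of y
    return sxy / sqrt(sxx * syy)

x = [1, 2, 3, 4, 5]
r_up = pearson_r(x, [2 * v + 1 for v in x])    # points on an upward-sloping line
r_down = pearson_r(x, [10 - 3 * v for v in x])  # points on a downward-sloping line
r_none = pearson_r(x, [2, 1, 0, 1, 2])          # symmetric V shape, no linear trend

print(round(r_up, 3), round(r_down, 3), round(r_none, 3))
```

The third data set shows why a correlation of 0 means only that there is no straight-line relationship: the points follow a clear V-shaped pattern, yet the linear association is zero.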

Below are four scatterplots with their accompanying correlations, all based on simulated data
following normal distributions. Four different correlations appear (here, and in general, the letter “r” is
used to denote the Pearson Correlation Coefficient). None is negative, and together they represent the full range in
strength of positive linear association (from 0 to 1). As an aid in interpretation, a best-fitting straight line is
superimposed on each chart.

Figure 10.6 Scatterplots Based on Various Correlations

For the perfect correlation of 1.0, all points fall on the straight line trending upwards. In the scatterplot
with a correlation of .8 in the upper right, the strong positive relation is apparent, but there is some
variation around the line. Looking at the plot of data with correlation of .4 in the lower left, the
association is clearly less pronounced than with the data correlating .8 (note greater scatter of points
around the line). The final chart displays a correlation of 0: there is no linear association present and
the best-fitting straight line is a horizontal line.

We use statistical tests to determine whether a relationship between two or more variables is
statistically significant. That is, we want to test whether the correlation differs from zero (zero
indicates no linear association) in the population, based on the sample results. In other words, the
null hypothesis is the correlation coefficient is 0 in the population, and we use a statistical test to
assess this hypothesis.
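One common form of this test converts the sample correlation into a t statistic with n - 2 degrees of freedom. The sketch below assumes a hypothetical sample (r = .50, n = 27) and takes the two-tailed .05 critical value for 25 degrees of freedom from a t table; the function name is our own, and the procedure reports an exact significance level rather than a comparison with a critical value.

```python
from math import sqrt

def correlation_t(r, n):
    """Return (t, df) for testing the null hypothesis that the population correlation is 0."""
    df = n - 2
    t = r * sqrt(df) / sqrt(1 - r * r)
    return t, df

# Hypothetical sample: r = .50 based on 27 cases
t_value, df = correlation_t(r=0.5, n=27)

T_CRITICAL = 2.060  # two-tailed .05 critical value for df = 25, from a t table
reject_null = abs(t_value) > T_CRITICAL

print(f"t = {t_value:.3f}, df = {df}, reject null hypothesis: {reject_null}")
```

The same correlation based on fewer cases gives a smaller t value, which is why a moderate correlation can be significant in a large sample and not significant in a small one.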

Pearson Correlation Coefficient Assumptions
To correctly use the Pearson correlation coefficient and apply statistical tests, three conditions must
be met:

1) Variables must have a scale measurement level.
2) Variables must be linearly related.
3) Variables must be normally distributed.

10.10 Requesting a Pearson Correlation Coefficient
The Pearson Correlation Coefficient is available in the Bivariate Correlations procedure. Requesting
a Pearson Correlation Coefficient is accomplished with these steps:

1) Choose variables for the correlation (We simply list the variables to be analyzed; there is no
designation of dependent and independent variables. Correlations will be calculated on all
pairs of variables listed.).

2) Optionally select the correlation statistic to calculate. Pearson is used for scale variables,
while Spearman and Kendall’s tau-b (less common) are used for non-normal data or ordinal
data, as relationships are evaluated after the original data have been transformed into ranks.

3) Optionally, select a one- or two-tailed significance test to perform.
4) The Flag significant correlations check box is checked by default. When checked, asterisks
appearing beside the correlations will identify significant correlations.
5) Examine the output to see which correlations are significant.
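The remark in step 2 about ranks can be illustrated with a short sketch: the Spearman coefficient is the Pearson coefficient applied to the ranks of the data, with tied values given the average of their ranks. The function names and the data are hypothetical illustrations.

```python
from math import sqrt

def pearson_r(x, y):
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)

def average_ranks(values):
    """Rank from 1 (smallest); tied values share the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # extend j over the run of values tied with position i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks

def spearman_rho(x, y):
    """Spearman correlation: Pearson correlation of the ranks."""
    return pearson_r(average_ranks(x), average_ranks(y))

# Hypothetical data: y always increases with x, but not along a straight line
x = [1, 2, 3, 4, 5]
y = [1, 4, 9, 16, 100]
print(round(pearson_r(x, y), 3), round(spearman_rho(x, y), 3))
```

Because the ranks of x and y agree perfectly, the Spearman coefficient is 1 even though the extreme value pulls the Pearson coefficient well below 1.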

10.11 Bivariate Correlation Output
The standard correlation table will provide the following information:

• The Pearson Correlation, which will range from -1 to +1; the further away from 0, the stronger
the relationship

• The 2-tailed significance level, the test of the null hypothesis that the correlation is 0 in the
population; all correlations with a significance level less than .05 will be considered
statistically significant and will have an asterisk next to the coefficient

• N, which is the sample size
• The correlations in the major diagonal will always be 1, because these are the correlations of
each variable with itself
• The correlation matrix is symmetric, so that the same information is represented above and
below the major diagonal
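The structure of the correlation table can be reproduced with a small sketch that correlates every pair of variables. The variable names follow Census.sav, but the values are hypothetical and the helper functions are our own; significance levels and N are omitted to keep the example short.

```python
from math import sqrt

def pearson_r(x, y):
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)

def correlation_matrix(data):
    """Correlate every pair of variables, including each variable with itself."""
    names = list(data)
    return {a: {b: pearson_r(data[a], data[b]) for b in names} for a in names}

# Hypothetical years of education (illustrative values only)
data = {
    "maeduc": [8, 12, 12, 16, 10, 14],
    "paeduc": [8, 12, 14, 16, 9, 12],
    "educ":   [12, 12, 16, 18, 12, 14],
}
matrix = correlation_matrix(data)

for a in data:
    print(f"{a:8s}", " ".join(f"{matrix[a][b]:6.3f}" for b in data))
```

The printed table has 1s on the major diagonal and identical values above and below it, which is why only one triangle of the matrix needs to be read.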

Figure 10.7 Example of Bivariate Correlations Output

10.12 Procedure: Pearson Correlation with Bivariate Correlations

The Bivariate Correlations procedure is accessed from the Analyze…Correlate…Bivariate menu
choice. With the Bivariate Correlations dialog box open:

1) Place two or more variables in the Variables box.
2) Optionally, select the correlation coefficient to calculate.
3) Optionally, select the type of significance test to perform.
4) Open the Options dialog box to display descriptive information and determine how to handle
missing values.

Figure 10.8 Bivariate Correlations Dialog Box

Dichotomous variables with only two categories can be
used to calculate correlation coefficients. A scatterplot
won’t be useful to study the relationship between a
dichotomous variable and a scale variable, but the
correlation coefficient still provides information on the
strength and direction of the relationship.

Further Info

10.13 Demonstration: Pearson Correlation with Bivariate Correlations

We will work with the Census.sav data file in this lesson.

In this demonstration we examine the relationship between mother’s education (maeduc), father’s
education (paeduc), and the education of the respondent (educ). We would like to determine whether,
for example, people marry people with a similar amount of education and if the children also have a
similar amount of education as their parents.

Detailed Steps for Bivariate Correlations
1) Place maeduc, paeduc and educ in the Variables box.

Results from Bivariate Correlations
In the Correlations table we see that the three correlations are moderate to high, and positive.
People with high education tended to marry each other, thus we have a positive, linear relationship
(r = .68). The associations between the respondent’s education and that of the father (r = .48)
and that of the mother (r = .44) are similar to each other and somewhat weaker.

Figure 10.9 Correlations Table of Education Variables

What we don’t know is whether these relationships are due to sampling variation, or instead are likely
to be real and exist in the population of adults. For this we turn to the significance level of the Pearson
Correlation Coefficient.

We see that the significance level for all of these relationships is extremely small, less than .01.
That is, correlations this large would be very unlikely to occur in a sample if the null hypothesis
were true; therefore we reject the null hypothesis and conclude that there is a positive,
linear relationship between these variables.

Apply Your Knowledge
1. True or false? If we remove point A from the data, the correlation between X and Y will be
lower.

2. True or false? In the two scatterplots below, the correlation between age and income is
higher in A than in B?

3. What is the range of a Pearson correlation coefficient?
a. From 0 to 1
b. Can take on any positive or negative value
c. From –1 to 1
d. Depends on the standard deviation of the variables

10.14 Lesson Summary
We explored the use of scatterplots and correlations to examine relationships between scale
variables.

Lesson Objectives Review
Students who have completed this lesson should now be able to:

• Perform a statistical test to determine whether two scale variables are correlated (related)

To support the achievement of the primary objective, students should now also be able to:

• Visually assess the relationship between two scale variables with scatterplots, using the
Chart Builder procedure

• Explain the options of the Bivariate Correlations procedure
• Explain the Pearson correlation coefficient and its assumptions
• Interpret a Pearson correlation coefficient

10.15 Learning Activity
The overall goal of this learning activity is to visualize the relationship between two scale variables
by creating scatterplots and to quantify this relationship with the correlation coefficient. In this set of
learning activities you will use the data file Bank.sav.

1. Suppose you are interested in understanding how an employee’s demographic
characteristics, beginning salary, and time at the bank and in the work force are related to
current salary. Start by producing scatterplots of salbeg, sex, time, age, edlevel, and work
with salnow. Add a fit line to each plot. Check on the variable labels for time and work so you
understand what these variables are measuring.

2. Describe the relationships based on the scatterplots. Do they all appear to be linear? Are any
relationships negative? What is the strongest relationship?

3. Now produce correlations with all these variables. Which correlations with salnow are
significant? What is the largest correlation in absolute value with salnow? Did this match what
you thought based on the scatterplots?

4. Examine the correlations between the other variables. Which variables are most strongly
related? Create scatterplots for these as well to check for linearity.

5. For those with more time: Go back and review the scatterplots with salnow. Are there any
employees who are outliers—far from the fit line—in any of the scatterplots? How might they
be affecting the relationship?

The file Bank.sav is a PASW Statistics data file that
contains information on employees of a major bank.
Included is data on beginning and current salary position,
time working, and demographic information.

Supporting
Materials

REGRESSION ANALYSIS

Lesson 11: Regression Analysis
11.1 Objectives
After completing this lesson students will be able to:

• Perform linear regression to determine whether one or more variables can significantly
predict or explain a dependent variable

To support the achievement of the primary objective, students will also be able to:

• Explain linear regression and its assumptions
• Explain the options of the Linear Regression procedure
• Interpret the results of the Linear Regression procedure

11.2 Introduction
Correlations allow one to determine if two scale variables are linearly related to each other. So for
example, beginning salary and education might be positively related for employees. Regression
allows one to further quantify this relation by developing an equation predicting starting salary based
on education. Linear regression is a statistical method used to predict a variable (a scale dependent
measure) from one or more independent (scale) variables. Commonly, straight lines are used,
although other forms of regression allow nonlinear functions. In this lesson we will focus on linear
regression.

Business Context
When we examine the relationships between scale variables, we would like to know whether a
relationship we observe is likely to exist in our target population or instead is caused by random
sampling variation. Statistical testing tells us whether scale variables are related. The results of linear
regression allow us to determine if one or more scale variables predict an outcome variable and the
impact each independent variable has on this variable. For example, we might want to know whether:

• Higher SAT scores explain higher first year college GPAs
• Eating more often at fast-food restaurants predicts more frequent shopping at convenience
stores
• Increasing income leads to more customer purchases

11.3 Simple Linear Regression
Linear regression involving a single independent (scale) variable is the simplest case and is called
simple linear regression. In other words, one scale variable, say X, is used to predict another scale
variable, say Y.

The file Bank.sav is a PASW Statistics data file that
contains information on employees of a major bank.
Included is data on beginning and current salary position,
time working, and demographic information.

Supporting
Materials

Simple Regression Illustrated
When there is a single independent variable, the relationship between the independent variable and
dependent variable can be visualized in a scatterplot, and the concept of linear regression can be
explained using the scatterplot.

Figure 11.1 Scatterplot of Height and Weight

The line superimposed on the scatterplot is the best straight line that describes the relationship. The
line is represented in general form by the equation,

Y = a + b*X

where b is the slope (the change in Y per unit change in X) and a is the intercept (the value of Y
when X is zero). Here, Y is weight in pounds and X is height in inches.
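The slope and intercept of the best-fitting line can be computed directly from the data by least squares: b is the sum of cross-products of deviations divided by the sum of squared deviations of X, and a follows from the two means. The sketch below uses hypothetical heights and weights (not the data behind Figure 11.1), and the function name fit_line is our own.

```python
def fit_line(x, y):
    """Least-squares intercept a and slope b for Y = a + b*X."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    sxx = sum((xi - mx) ** 2 for xi in x)
    b = sxy / sxx        # change in Y per unit change in X
    a = my - b * mx      # value of Y when X is zero
    return a, b

# Hypothetical heights (inches) and weights (pounds)
height = [60, 62, 64, 66, 68]
weight = [110, 120, 135, 140, 155]

a, b = fit_line(height, weight)
predicted_65 = a + b * 65   # predicted weight for a height of 65 inches

print(f"Y = {a:.1f} + {b:.1f}*X; predicted weight at 65 inches: {predicted_65:.1f}")
```

Note that the intercept here is a large negative number. It is the predicted weight at a height of zero inches, far outside the range of the data, so it has no practical interpretation on its own.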

The value of the equation is linked to how well it actually describes or fits the data, and so part of the
regression output includes fit measures. To quantify the extent to which the straight line fits the data,
the fit measure, R Square, was developed. R² has the dual advantages of falling on a standardized
scale and having a practical interpretation. The R Square measure (which is simply the correlation
squared, or r², when there is a single predictor variable, and thus its name) is on a scale from 0 (no
linear association) to 1 (perfect linear prediction). Also, the R Square value can be interpreted as the
proportion of variation in one variable that can be predicted from the other. Thus an R Square of .50
indicates that we can account for 50% of the variance in one variable if we know values of the other.
You can think of this value as a measure of the improvement in your ability to predict one variable
from the other (or others if there are multiple independent variables).
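The two readings of R Square, as the squared correlation and as the proportion of variation accounted for, can be checked against each other in a short sketch. The data are hypothetical heights and weights, and the variable names are our own.

```python
from math import sqrt

# Hypothetical heights (inches) and weights (pounds)
x = [60, 62, 64, 66, 68]
y = [110, 120, 135, 140, 155]

n = len(x)
mx, my = sum(x) / n, sum(y) / n
sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
sxx = sum((xi - mx) ** 2 for xi in x)
syy = sum((yi - my) ** 2 for yi in y)   # total variation in Y

b = sxy / sxx
a = my - b * mx

# Variation left over after the line is fit
ss_residual = sum((yi - (a + b * xi)) ** 2 for xi, yi in zip(x, y))

r_square = 1 - ss_residual / syy   # proportion of variation accounted for
r = sxy / sqrt(sxx * syy)          # Pearson correlation

print(f"R Square = {r_square:.4f}, r squared = {r * r:.4f}")
```

With a single predictor the two values agree, which is the relationship the name R Square refers to.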

We use statistical tests to determine whether a relationship between the independent variable and
dependent variable is statistically significant. That is, we want to test whether the predictor can
significantly explain the dependent variable.

• Linear Regression is used to determine whether the independent variable can significantly
explain the dependent variable; in other words, it tests the null hypothesis that R Square is zero

• Once we find a relationship, we have to assess the effect of the independent variable on the
dependent variable

Finally, referring to the scatterplot, we see that many points fall near the line, but some are quite a
distance from it. For each point, the difference between the value of the dependent variable and the
value predicted by the equation (value on the line) is called the residual (also known as the error).
Points above the line have positive residuals (they were under-predicted), those below the line have
negative residuals (they were over-predicted), and a point falling on the line has a residual of zero
(perfect prediction). Points having relatively large residuals are of interest because they represent
instances where the prediction line did poorly. Outliers, or points far from the mass of the others, are
of interest in regression because they can exert considerable influence on the equation (especially if
the sample size is small).
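The following sketch computes residuals for a small hypothetical data set, labels each case as under- or over-predicted, and picks out the case with the largest residual in absolute value. The data and names are illustrative only.

```python
# Hypothetical heights (inches) and weights (pounds)
x = [60, 62, 64, 66, 68]
y = [110, 120, 138, 140, 152]

n = len(x)
mx, my = sum(x) / n, sum(y) / n
b = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y)) / sum((xi - mx) ** 2 for xi in x)
a = my - b * mx

# Residual = observed value minus the value predicted by the line
residuals = [yi - (a + b * xi) for xi, yi in zip(x, y)]

# Positive residual: point above the line (under-predicted);
# negative residual: point below the line (over-predicted)
labels = ["under-predicted" if e > 0 else "over-predicted" if e < 0 else "on the line"
          for e in residuals]

# Index of the case the line predicts worst
worst = max(range(n), key=lambda i: abs(residuals[i]))

for xi, yi, e, lab in zip(x, y, residuals, labels):
    print(f"X={xi}, Y={yi}, residual={e:+.1f} ({lab})")
print("largest residual at case", worst)
```

For a least-squares line with an intercept, the residuals sum to zero, so positive and negative residuals always balance.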

11.4 Simple Linear Regression Assumptions
To correctly use simple linear regression and apply statistical tests, four conditions must be met:

1) Variables must have a scale measurement level.
2) Variables must be linearly related.
3) Residuals must be normally distributed.
4) Residuals are assumed to be independent of the predicted values, implying that the variation
of the residuals around the line is homogeneous.

A variable coded as a dichotomy (say 0 and 1) can
technically be considered as a scale variable. A scale
variable assumes that a one-unit change has the same
meaning throughout the range of the scale. If a variable’s
only possible codes are 0 and 1 (or 1 and 2, etc.), then a
one-unit change does mean the same change throughout
the scale. Thus dichotomous variables, for example sex,
can be used as predictor variables in regression. It also
permits the use of nominal predictor variables if they are
converted into a series of dichotomous variables; this
technique is called dummy coding and is considered in
most regression texts (Draper and Smith, 1998; Cohen
and Cohen, 2002).

Note
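Dummy coding can be sketched as follows. A nominal variable with k categories is converted into k - 1 dichotomous (0/1) variables, and the omitted category serves as the reference group. The function name, the is_ prefix for the new variable names, and the marital status values are hypothetical.

```python
def dummy_code(values, reference):
    """Return one 0/1 variable per category, omitting the reference category."""
    levels = sorted(set(values) - {reference})
    return {f"is_{level}": [1 if v == level else 0 for v in values]
            for level in levels}

# Hypothetical marital status values for four respondents
marital = ["married", "widowed", "divorced", "married"]
dummies = dummy_code(marital, reference="married")

for name, column in dummies.items():
    print(name, column)
```

A respondent in the reference category has 0 on every dummy variable, so each regression coefficient compares one category with the reference group.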

While not covered in this course, PASW Statistics can
provide influence statistics to aid in judging whether the
equation was strongly affected by a particular observation.

Note

11.5 Requesting Simple Linear Regression
Requesting a linear regression involves these steps:

1) Select a dependent variable and an independent variable.
2) Review the procedure output to investigate the relationship between the variables, including:

a. R²
b. Adjusted R²

3) Examine the regression test statistics to determine whether the observed relationship is
statistically significant.

4) Determine the impact of the independent variable on the dependent variable.

11.6 Simple Linear Regression Output
The standard linear regression will generate three tables depicting the relationship between the two
variables.

• The Model Summary table provides several measures of how well the model fits the data
• R, the multiple correlation coefficient, which can range from 0 to 1, is a generalization of the
correlation coefficient. It is the correlation between the dependent measure and the
combination of the independent variable(s), thus the closer the multiple R is to 1, the better
the fit. If there is only one predictor, the multiple correlation coefficient is equivalent to the
Pearson correlation coefficient.

• R Square, which can range from 0 to 1, is the correlation coefficient squared. It can be
interpreted as the proportion of variance of the dependent measure that can be predicted
from the independent variable(s).

• Adjusted R Square represents a technical improvement over R Square in that it explicitly
adjusts for the number of predictor variables relative to the sample size. If Adjusted R Square
and R Square differ dramatically, it is a sign that you have used too many predictor variables
for your sample size.

• Standard Error of the Estimate is a standard deviation type summary of the dependent
variable that measures the deviation of observations around the best fitting straight line. As
such it provides, in the scale of the dependent variable, an estimate of how much variation
remains to be accounted for after the line is fit.
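The last two fit measures can be written as short functions using their standard textbook formulas, where n is the number of cases and k the number of predictor variables. The function names and the input values (a residual sum of squares of 20 and a total sum of squares of 1230 from five hypothetical cases) are our own.

```python
from math import sqrt

def adjusted_r_square(r_square, n, k):
    """Adjust R Square for k predictors and n cases."""
    return 1 - (1 - r_square) * (n - 1) / (n - k - 1)

def standard_error_of_estimate(ss_residual, n, k):
    """Standard deviation of the observations around the fitted line."""
    return sqrt(ss_residual / (n - k - 1))

# Hypothetical results from a simple regression (one predictor, five cases)
ss_residual, ss_total, n, k = 20.0, 1230.0, 5, 1
r_square = 1 - ss_residual / ss_total

adj = adjusted_r_square(r_square, n, k)
see = standard_error_of_estimate(ss_residual, n, k)

print(f"R Square = {r_square:.4f}, Adjusted R Square = {adj:.4f}, Std. Error = {see:.3f}")
```

Adding predictors without adding cases shrinks the denominator n - k - 1, which lowers Adjusted R Square relative to R Square. This is the sense in which a large gap between the two signals too many predictors for the sample size.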

Figure 11.2 Model Summary and ANOVA Tables

While the fit measures indicate how well we can expect to predict the dependent variable, they do not
tell whether there is a statistically significant relationship between the dependent and independent
variable(s). The analysis of variance table (ANOVA in the Output Viewer) presents technical
summaries (sums of squares and mean square statistics) of the variation accounted for by the
prediction equation. Our main interest is in determining whether there is a statistically significant
(non-zero) linear relation between the dependent variable and the independent variable(s) in the
population.

• The Sig. column provides the probability of obtaining a result at least this extreme if the null
hypothesis is true (i.e., no relationship between the independent and dependent variable).

We use the significance, as in any other hypothesis test, to determine whether or not to reject the null
hypothesis. If significant results are found, we turn to the next table, Coefficients, to view the
regression coefficients.

Figure 11.3 Regression Coefficients

• The first column contains a list of the independent variables plus the intercept (Constant).
The intercept is the value of the dependent variable when the independent variable is 0.

• The column labeled B contains the estimated regression coefficients we would use in a
prediction equation. The coefficient for height indicates that on average, each additional inch
in height was associated with an increase in weight of 5.33 pounds.

• The Standard Error (of B) column contains standard errors of the regression coefficients. The
standard errors can be used to create a 95% confidence interval around the B coefficients.

• Betas are standardized regression coefficients and are used to judge the relative importance
of each of several independent variables.

• The t statistics provide a significance test for each B coefficient, testing whether it differs from
zero in the population.
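For simple regression, the quantities in the Coefficients table can be sketched with standard formulas: the standard error of B is the standard error of the estimate divided by the square root of the sum of squared deviations of X, t is B divided by its standard error, and Beta is B rescaled by the ratio of the standard deviations of X and Y. The data are hypothetical, the function name is our own, and the critical t value (3.182 for 3 degrees of freedom, two-tailed .05) is taken from a t table.

```python
from math import sqrt

def slope_statistics(x, y, t_critical):
    """B, its standard error, t, Beta and a confidence interval (simple regression)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    syy = sum((yi - my) ** 2 for yi in y)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    b = sxy / sxx
    a = my - b * mx
    ss_residual = sum((yi - (a + b * xi)) ** 2 for xi, yi in zip(x, y))
    se_b = sqrt(ss_residual / (n - 2)) / sqrt(sxx)   # standard error of B
    return {
        "B": b,
        "SE": se_b,
        "t": b / se_b,                     # tests whether B differs from zero
        "Beta": b * sqrt(sxx / syy),       # standardized coefficient
        "CI": (b - t_critical * se_b, b + t_critical * se_b),
    }

# Hypothetical heights (inches) and weights (pounds)
height = [60, 62, 64, 66, 68]
weight = [110, 120, 135, 140, 155]

stats = slope_statistics(height, weight, t_critical=3.182)  # df = n - 2 = 3
print({key: stats[key] for key in ("B", "SE", "t", "Beta")})
print("95% CI for B:", stats["CI"])
```

With one predictor, Beta equals the Pearson correlation between the two variables; with several predictors the Betas are used to compare their relative importance.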

11.7 Procedure: Simple Linear Regression
Linear Regression is available in the Analyze…Regression…Linear menu choice. With the Linear
Regression dialog box open:

1) Place the dependent variable in the Dependent: box.
2) Place the independent variable in the Independent(s): box.

Figure 11.4 Linear Regression Dialog Box

In the Linear Regression dialog:

1) The Independent(s) list box will permit more than one independent variable, and so this
dialog box can be used for both simple and multiple regression.

2) The block controls permit an analyst to build a series of regression models with the variables
entered at each stage (block), as specified by the user.

3) By default, the regression Method is Enter, which means that all independent variables in the
block will be entered into the regression equation simultaneously. This method is selected to
run one regression based on all variables you specify. If you wish the program to select, from
a larger set of independent variables, those that in some statistical sense are the best
predictors, you can request the Stepwise method.

4) The Selection Variable option permits cross-validation of regression results. Only cases
whose values meet the rule specified for a selection variable will be used in the regression
analysis; then the resulting prediction equation will be applied to the other cases. Thus you
can evaluate the regression on cases not used in the analysis, or apply the equation derived
from one subgroup of your data to other groups.

5) The Statistics dialog box presents many additional (and some of them quite technical)
statistics.

6) The Plots dialog box is used to generate various diagnostic plots used in regression,
including residual plots.

7) The Save dialog box permits you to add new variables to the data file. These variables
contain such statistics as the predicted values from the regression equation, various residuals
and influence measures.

8) The Options dialog box controls the criteria when running stepwise regression and choices in
handling missing data. By default, PASW Statistics excludes a case from regression if it has
one or more values missing for the variables used in the analysis.

REGRESSION ANALYSIS

11.8 Demonstration: Simple Linear Regression
We will work with the Bank.sav data file in this example.

In this example we examine the relationship between beginning salary at the bank (salbeg) and
highest year of school completed (edlevel). We would like to determine whether, for example, people
with more education receive a higher initial salary, and then determine the impact of each additional
year of education on salary.

Before we begin, we should view a scatterplot of these two variables.

Detailed Steps for Scatterplot
In the Chart Builder dialog, accessed from Graphs…Chart Builder:

1) Place the Simple Scatter icon in the Chart Preview area
2) Place the variable salbeg in the Y-axis box.
3) Place the variable edlevel in the X-axis box.

We can observe that there is a positive relationship between education and beginning salary. This
relationship, though, is not strong, as there is a fair amount of scatter in the data. The relationship
does seem reasonably linear.

Note: The PASW Statistics Missing Values add-on module provides more sophisticated methods for
handling missing values. This module includes procedures for displaying patterns of missing data
and imputing (estimating) missing values using multiple variable imputation methods.

Figure 11.5 Scatterplot of Beginning Salary and Education Level

Once we have viewed the scatterplot, we are ready for the simple linear regression.

Detailed Steps for Simple Linear Regression
1) Place salbeg in the Dependent box.
2) Place edlevel in the Independent(s) box.

Results from Simple Linear Regression
Here we can observe that the multiple R or correlation coefficient between highest year of school
completed and beginning salary is .633—this is the correlation between these two variables. If we
square the multiple R, we get R Square, which tells us that about 40% of the variance (.401) in the
dependent variable can be predicted from the independent variable. Adjusted R Square is basically
the same, as there is only one independent variable in the model.

Note: If you wish, you can edit the chart and request a regression fit line to determine whether
linear regression seems appropriate for these data.

Figure 11.6 Model Summary Table for Regression

So now we know the correlation between the dependent and independent variables and the percent
of variance accounted for by the model. However, we don’t know whether this relationship is due to
sampling variation, or instead is likely to be real and exist in the population. For this we turn to the
ANOVA table.

Every statistical test has a null hypothesis. In most cases, the null hypothesis is that there is no
relationship between two variables. This is also true for the null hypothesis for Linear Regression (the
null hypothesis is that we have no relationship between the dependent and the combination of
independent variable(s)). If the significance is small enough, the null hypothesis has to be rejected.

For this relationship, the significance value (the probability of observing a relationship at least
this strong if the null hypothesis were true) is extremely small, less than .01. Therefore the null
hypothesis is rejected and the conclusion is that there is a linear relationship between these
variables.

Figure 11.7 ANOVA Table

Because a significant relationship has been found between our dependent and independent variable,
the next step is to determine what the impact on the dependent variable is.

Figure 11.8 Coefficients Table

Looking at the B column, we see that for each additional year of education completed, the expected
increase in beginning salary is $691.01. In fact, if we wanted to predict beginning salary based on
education, we would use the following equation: salbeg = –2,516.387 + 691.011*(edlevel).
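
As a quick illustration of how the prediction equation is used, the following Python sketch plugs
education values into the B coefficients reported in the Coefficients table. The function name
predict_salbeg is hypothetical; it is not part of the course materials or of the statistics
software.

```python
# Coefficients taken from the B column of the Coefficients table.
INTERCEPT = -2516.387   # constant term
B_EDLEVEL = 691.011     # dollars per additional year of education


def predict_salbeg(edlevel):
    """Return the predicted beginning salary for a given number of years of education."""
    return INTERCEPT + B_EDLEVEL * edlevel


# Compare predictions for 12 and 16 years of education.
for years in (12, 16):
    print(years, round(predict_salbeg(years), 2))
```

Each extra year of education moves the prediction up by exactly the B coefficient, which is what
the interpretation of B in the text describes.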

The t column is a technical detail: it tells us that the observed sample coefficient (691.011) is
17.773 standard errors (38.879 each) away from the value that we expect if the null hypothesis is
true, i.e., zero. Such a t value leads to a significance value of .000 (column Sig.) for the B
coefficient, which means the significance is less than .0005 when rounded to three decimals.
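
The arithmetic behind the t column can be checked by hand. This short Python sketch divides the B
coefficient by its standard error, using the two values quoted above; the variable names are
illustrative only.

```python
B = 691.011          # unstandardized coefficient for edlevel
STD_ERROR = 38.879   # standard error of that coefficient

# The t statistic is the coefficient expressed in standard-error units.
t_value = B / STD_ERROR
print(round(t_value, 3))  # about 17.773, matching the t column
```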

Apply Your Knowledge
1. True or false? In the figure depicted below, the point labeled A has a positive residual.

2. True or false? Suppose we predict Salary with Age and find an R Square of .25. Then 75% of
the variation in Salary cannot be accounted for by Age.

3. Which coefficient is used in the regression equation to make predictions?
a. Beta
b. B

Note: The column labeled Standardized Coefficients will be discussed in the Multiple Regression
section.

Further Info: In simple regression, the test of the null hypothesis R Square = 0 is equivalent to
the test of B = 0. In both cases the test is whether X and Y are linearly related. This equivalence
can be seen in the test statistics, as the F value in the ANOVA table is the square of the t value
in the Coefficients table.

11.9 Multiple Regression
A regression involving more than one independent variable is called multiple regression and is a
direct extension of simple linear regression. When we run multiple regression we will again be
concerned with how well the equation fits the data, whether a linear model is the best fit to the data,
whether any of the variables are significant predictors, and estimating the coefficients for the best-
fitting prediction equation. In addition, we are interested in the relative importance of the independent
variables in predicting the dependent measure.

11.10 Multiple Linear Regression Assumptions
To correctly use multiple regression and apply statistical tests, one extra condition has to be met, in
addition to the assumptions stated for simple linear regression. We restate the four assumptions for
simple linear regression and add the extra assumption:

1) Variables must have a scale measurement level.
2) Variables must be linearly related.
3) Residuals must be normally distributed.
4) Residuals are assumed to be independent of the predicted values, implying that the variation
of the residuals around the line is homogeneous.

5) Absence of multi-collinearity—no exact or nearly exact linear relation between the
independent variables.
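
One simple way to look for a nearly exact linear relation between two independent variables is to
compute their correlation. The Python sketch below does this with made-up values; the data, the
function name pearson_r, and the 0.9 cutoff are all assumptions for illustration, not rules from
this lesson. A pairwise correlation only detects collinearity between two predictors at a time.

```python
from math import sqrt


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length lists."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


# Hypothetical values for two predictors that tend to move together.
age = [25, 30, 35, 40, 45, 50]
work = [2, 6, 9, 16, 19, 27]

r = pearson_r(age, work)
# 0.9 is an illustrative cutoff, not a threshold given in the course text.
if abs(r) > 0.9:
    print("Predictors are nearly collinear: r =", round(r, 3))
```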

11.11 Requesting Multiple Linear Regression
Requesting multiple linear regression is accomplished with these steps:

1) Select a dependent variable and two or more independent variables
2) Request a histogram of the residuals; this allows for a check of the normality of errors
3) Request a scatterplot of the standardized residuals and standardized predicted value; this
allows for a check of the homogeneity of errors
4) Review the procedure output to investigate the relationship between the variables including:

a. R Square
b. Adjusted R Square

5) Examine the regression test statistics to determine whether the observed relationship is
statistically significant.

6) Determine which independent variables are significantly related to the dependent variable.
7) Determine the impact of each independent variable on the dependent variable.

11.12 Multiple Linear Regression Output
The standard linear regression will generate the same tables depicting the relationship between the
variables that were discussed earlier.

The R square and Adjusted R square are interpreted the same, except now the amount of explained
variance is from a group of predictors, not just one.

Figure 11.9 Variables in the Model and Model Summary

The ANOVA table has a similar interpretation, except that now it tests whether any variable has a
significant effect on the dependent variable.

Figure 11.10 ANOVA Table for Multiple Regression

In the Coefficients table, there is an additional twist to the interpretation of the B coefficient. For
example, the effect here of height on weight can be stated as every additional inch of height predicts
an additional 4.261 pounds of weight. However, that estimate controls for the other variables in the
model.

Here, we must also examine the significance values for each coefficient, because a regression that is
overall significant does not imply that each coefficient is statistically significant.

Figure 11.11 Multiple Regression Coefficients Table

To test the assumptions of regression, we turn to the histogram of residuals. The residuals should be
approximately normally distributed, which is basically true for the histogram below.

Figure 11.12 Histogram of Residuals

Additionally, the scatterplot of the standardized error (residual) and standardized predicted value
(here of weight) should show no pattern if homogeneity of variance holds.

Figure 11.13 Scatterplot of Residuals and Predicted Value

11.13 Procedure: Multiple Linear Regression
Multiple linear regression is available in the Regression…Linear menu choice. With the Linear
Regression dialog box open:

1) Place the dependent variable in the Dependent: box.
2) Place the independent variables in the Independent(s): box.
3) Select the Plots button to open that dialog

Figure 11.14 Linear Regression Dialog for Multiple Regression

In the Plots dialog:

4) Select Histogram
5) Move *ZRESID into the Y: box in the Plots dialog and *ZPRED into the X: box in the Plots
dialog

Figure 11.15 Linear Regression Plots Dialog

11.14 Demonstration: Multiple Linear Regression
We will work with the Bank.sav data file in this lesson.

In this example we examine the relationship between beginning salary at the bank and several
predictors, including education, years of previous work experience (work), age, and gender (sex).
Note that gender is a dichotomous variable coded 0 for males and 1 for females, but it can be
included as an independent variable (see Message Box above).

Detailed Steps for Multiple Linear Regression
1) Place salbeg in the Dependent: box.
2) Place edlevel, sex, age, and work in the Independent(s): box.

While we can run multiple regression at this point, we will request some diagnostic plots involving
residuals and information about outliers. By default no residual plots will appear.

3) Select the Plots button
4) Select Histogram.
5) Move *ZRESID into the Y: box in the Plots dialog.
6) Move *ZPRED into the X: box in the Plots dialog.

We requested a histogram of the standardized residuals because regression assumes that the
residuals follow a normal distribution.

Regression can produce summaries concerning various types of residuals. We request a scatterplot
of the standardized residuals (*ZRESID) versus the standardized predicted values (*ZPRED)
because regression assumes that residuals are independent of predicted values, thus if we see any
patterns (as opposed to a random blob) in this plot, it might suggest a way of adjusting and improving
the analysis.

Next we request casewise diagnostics in the Statistics dialog. The Casewise Diagnostics check box
requests information about all cases whose standardized residuals are more than 3 standard
deviations from the fit line.

7) Select the Statistics button
8) Select Casewise Diagnostics

Results from Multiple Linear Regression
Recall that Linear Regression uses listwise deletion of missing data so that if a case is missing data
on any of the five variables used in the regression it will be dropped from the analysis. If this results in
heavy data loss, other choices for handling missing values are available in the Regression Options
dialog box.
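
To make listwise deletion concrete, here is a minimal Python sketch. The three cases are invented
for illustration, None stands in for a missing value, and the function name listwise_delete is
hypothetical; the software's own handling of missing values is not shown here.

```python
def listwise_delete(cases, variables):
    """Keep only the cases that have a non-missing value on every listed variable."""
    return [case for case in cases
            if all(case.get(var) is not None for var in variables)]


# Hypothetical cases; None marks a missing value.
cases = [
    {"salbeg": 8400, "edlevel": 16, "sex": 0, "age": 28.5, "work": 0.25},
    {"salbeg": 6300, "edlevel": 12, "sex": 1, "age": None, "work": 4.0},
    {"salbeg": None, "edlevel": 15, "sex": 0, "age": 31.0, "work": 2.0},
]
variables = ["salbeg", "edlevel", "sex", "age", "work"]

complete = listwise_delete(cases, variables)
print(len(complete), "of", len(cases), "cases kept")
```

A case is dropped as soon as any one of the five variables is missing, which is why heavy data loss
is possible when missing values are spread across many cases.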

Here we can observe that the multiple R or correlation coefficient between our combination of
predictors and the dependent variable is .699. If we square the multiple R, we get R square, which is
.489. The Adjusted R square is .485, which is about the same.

Further Info: When reporting on explained variance (R square), always report the adjusted R square.

Therefore, about 48.5% of the variance in beginning salary can be predicted from the four
independent variables.
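
The small gap between R square and Adjusted R square can be reproduced with the usual adjustment
formula, 1 - (1 - R square)(n - 1)/(n - p - 1). The Python sketch below assumes n = 474 cases (the
file size mentioned in this lesson, assuming no cases were dropped for missing data) and p = 4
predictors; the function name is illustrative.

```python
def adjusted_r_square(r_square, n_cases, n_predictors):
    """Shrink R square to account for the number of predictors in the model."""
    return 1.0 - (1.0 - r_square) * (n_cases - 1) / (n_cases - n_predictors - 1)


# R square of .489 with four predictors; n = 474 is an assumption (no cases dropped).
print(round(adjusted_r_square(0.489, 474, 4), 3))  # 0.485
```

With a large sample and few predictors the adjustment is small, which is why the two values are
about the same here.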

Figure 11.16 Model Summary Table for Multiple Regression

We see that the significance value (the probability of observing a relationship at least this strong
if the null hypothesis were true) is extremely small, less than .01. Therefore we reject the null
hypothesis and conclude that there is a linear relationship between these variables and beginning
salary.

Figure 11.17 ANOVA Table for Regression of Beginning Salary

Figure 11.18 Regression Coefficients Table to Predict Beginning Salary

In the Coefficients table, the independent variables appear in the order they were listed in the Linear
Regression dialog box, not in order of importance. Although the B coefficients are important for
prediction and interpretive purposes, analysts usually look first to the t test at the end of each line to
determine which independent variables are significantly related to the outcome measure. Since four
variables are in the equation, we are testing if there is a linear relationship between each independent
variable and the dependent variable after adjusting for the effects of the three other independent
variables. Looking at the significance values we see that edlevel, sex, and age are significant at the
.05 significance level, while work experience is not (sig = .352 after controlling for the other
predictors).

The estimated regression (B) coefficient for edlevel is about $651, similar but not identical to the
coefficient (691) found in the simple regression using edlevel alone. In the simple regression we
estimated the B coefficient for edlevel ignoring any other effects, since none were included in the
model. Here we evaluate the effect of edlevel after controlling (statistically adjusting) for age, sex, and
work. If the independent variables are correlated, the change in B coefficient from simple to multiple
regression can be substantial. So, after controlling (holding constant) age, sex, and work, one year of
formal education, on average, was worth another $651 in beginning salary.

Continuing on:

• The variable sex has a B coefficient of about –$1,526. This means that a one-unit change in
gender (which means moving from male status to female, or comparing females to males),
controlling for the other variables, is associated with a drop in beginning salary of $1,526.

• The variable age has a B coefficient of $33, so each additional year of age is associated with
an increase in beginning salary of $33, controlling for the other variables.

• Since we found work experience’s coefficient to be not significantly different from zero, we
treat it as 0.

If we simply look at the estimated B coefficients we might think that sex is the most important
variable. However, the magnitude of the B coefficient is influenced by the unit of measurement (or
standard deviation if you like) of the independent variable. The Beta coefficients explicitly adjust for
such standard deviation differences in the independent variables.

• They indicate what the regression coefficients would be if all variables were standardized to
have means of 0 and standard deviations of 1.

• A Beta coefficient thus indicates the expected change (in standard deviation units) of the
dependent variable per one standard deviation unit increase in the independent variable
(after adjusting for other predictors). This provides a means of assessing relative importance
of the different predictor variables in multiple regression.

• The Betas are normed so that the maximum should be less than or equal to one in absolute
value (if any Betas are above 1 in absolute value, it suggests a problem with the data:
multi-collinearity).
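
The link between a B coefficient and its Beta can be sketched in a few lines of Python: a Beta is
the B coefficient rescaled by the ratio of the predictor's standard deviation to the dependent
variable's standard deviation. The standard deviations below are round, hypothetical numbers chosen
only to show the calculation; they are not taken from the Bank.sav output.

```python
def beta_from_b(b, sd_x, sd_y):
    """Convert an unstandardized B coefficient into a standardized Beta."""
    return b * sd_x / sd_y


# B for edlevel comes from the text; both standard deviations are hypothetical.
B_EDLEVEL = 651.0
SD_EDLEVEL = 3.0      # assumed standard deviation of edlevel (years)
SD_SALBEG = 3000.0    # assumed standard deviation of salbeg (dollars)

print(round(beta_from_b(B_EDLEVEL, SD_EDLEVEL, SD_SALBEG), 3))
```

Because the rescaling removes the units of measurement, Betas for different predictors can be
compared with each other, while B coefficients cannot.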

Examining the Betas, we see that edlevel is the most important predictor, followed by sex, and then
age. The Beta for work is near zero, as we would expect.

If we needed to predict salbeg from these background variables (dropping work) we would use the B
coefficients. Rounding to whole numbers, we would say:

salbeg = –2,667 + 651*edlevel – 1526*sex + 33*age.
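
The rounded equation can be applied directly. In this Python sketch the function name
predict_salbeg_multiple and the two example employees are hypothetical; the coefficients are the
rounded B values from the equation above, and sex is coded 0 for males and 1 for females as in the
data file.

```python
def predict_salbeg_multiple(edlevel, sex, age):
    """Predicted beginning salary from the rounded multiple regression equation."""
    return -2667 + 651 * edlevel - 1526 * sex + 33 * age


# Two hypothetical employees who differ only in sex (0 = male, 1 = female).
male = predict_salbeg_multiple(edlevel=16, sex=0, age=30)
female = predict_salbeg_multiple(edlevel=16, sex=1, age=30)
print(male, female, male - female)  # 8739 7213 1526
```

Holding education and age constant, the two predictions differ by exactly the B coefficient for
sex, which is what "controlling for the other variables" means in practice.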

Diagnostic Statistics
The request for casewise diagnostics produces two tables, the most important of which is shown
below. The Casewise Diagnostics table lists those observations more than three standard deviations
(in error) from the regression fit line. Assuming a normal distribution, this would happen less than 1%
of the time by chance alone. In this data file that would be about 5 outliers (.01*474), so the six cases
do not seem excessive. Residuals should normally be balanced between positive and negative
values; when they are not, you should investigate the data further. In these data, all six residuals are
positive, so this does indicate that some additional investigation is required. We could, for example,
see if these observations have anything in common (very high education which may be out of line
with others). Since we know their case numbers (an ID variable can be substituted), we could
easily find them and look at them more closely.
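
The 1% figure used above is a round upper bound. As a rough check, the Python sketch below computes
the exact two-sided tail probability beyond three standard deviations for a standard normal
distribution, assuming the standardized residuals really are normal, and the number of such cases
that would be expected among 474. The function name is illustrative.

```python
from math import erfc, sqrt


def two_sided_tail_probability(z):
    """P(|Z| > z) for a standard normal variable Z."""
    return erfc(z / sqrt(2.0))


N_CASES = 474  # number of cases in the data file, as stated in the text

p_beyond_3 = two_sided_tail_probability(3.0)
print(round(p_beyond_3, 4))            # 0.0027
print(round(N_CASES * p_beyond_3, 2))  # 1.28
```

Under exact normality only one or two such cases would be expected, so the count of six, together
with the fact that all six residuals are positive, is a further reason to look at these cases more
closely.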

We also don’t want to discover very large prediction errors, but here the residuals are very large,
over 6 standard deviations above the fit line.

Figure 11.19 Casewise Listing of Outliers

In the diagnostic plots involving residuals we see the distribution of the residuals with a normal bell-
shaped curve superimposed, depicted in the figure below. The residuals are fairly normal, although
they are a bit too concentrated in the center. They are also somewhat positively skewed. Given this
pattern, a data analyst might try a data transformation on the dependent measure, which might
improve the properties of the residual distribution, e.g., the log. However, just as with ANOVA, larger
sample sizes protect against moderate departures from normality.

Figure 11.20 Histogram of the Residuals

In the scatterplot of residuals, we hope to see a horizontally oriented blob of points with the residuals
showing the same spread across different predicted values. Unfortunately, we see a hint of a curving
pattern: the residuals seem to slowly decrease, then swing up at higher salaries. This type of pattern
can mean the relationship is curvilinear.

Also, the spread of the residuals is much more pronounced at higher predicted salaries. This
suggests lack of homogeneity of variance.

Figure 11.21 Scatterplot of Residuals and Predicted Value for Beginning Salary

Apply Your Knowledge
1. Consider the output below, for the regression where Miles per gallon for a vehicle is predicted
from Engine Size, Horsepower, Weight and American car or not (coded as 1 for American
cars and 0 for cars from other countries). Which statements are correct?

a. We predict lower mpg for an American car than for a non-American car
b. All predictors have an effect significantly different from 0
c. The most important predictor for Miles per Gallon is Vehicle Weight
d. The fact that there is an Unstandardized coefficient greater than 1 (in absolute value)
indicates a problem due to multi-collinearity.

Additional Resources

11.15 Lesson Summary
We explored the use of Linear Regression to test relationships between scale variables, to develop
prediction equations for the dependent variable, and to determine the impact of each
independent variable on the dependent variable.

Lesson Objectives Review
Students who have completed this lesson should be able to:

• Perform linear regression to determine whether one or more variables can significantly
predict or explain a dependent variable

To support the achievement of the primary objective above, students should also be able to:

• Explain linear regression and its assumptions
• Explain the options of the Linear Regression procedure
• Interpret the results of the Linear Regression procedure

11.16 Learning Activity
The overall goal of this learning activity is to run linear regressions and to interpret the output. You will
use the PASW Statistics data file Census.sav.

1. Run a linear regression to predict total family income (income06) with highest year of
education (educ). First, do a scatterplot of these two variables and superimpose a fit line.
Does the relationship seem linear? How would you characterize the relationship?

2. Now run the linear regression. What is the Adjusted R square value? Is the regression
significant? What is the B coefficient for educ? Interpret it.

Supporting Materials: The file Census.sav, a PASW Statistics data file from a survey of the
general adult population, which included questions about various attitudes and demographic
characteristics.

Further Info: For additional information on linear regression analysis, see:

Allison, Paul D. 1998. Multiple Regression: A Primer. Thousand Oaks, CA: Pine Forge.

Draper, Norman and Smith, Harry. 1998. Applied Regression Analysis. 3rd ed. New York: Wiley.

3. Next add the variables born (born in the U.S. or overseas), age, sex, and number of brothers
and sisters (sibs). Check the coding on born so you can interpret its coefficient. First, do a
scatterplot of age and sibs with income06. Superimpose a fit line. Does the relationship seem
linear? How would you characterize the relationship? Why not do scatterplots of income06
with sex and born?

4. Use all these variables to predict income06. Request residual statistics including the
histogram of errors and the scatterplot of standardized values. Also request casewise
diagnostics. What is the Adjusted R square? How much has it increased from above?

5. Which variables are significant predictors? What is the effect of each on income06? Which
variable is the strongest predictor? The weakest?

6. Examine the casewise diagnostics. Do you see any pattern? Are there more cases with large
errors than we would expect?

7. Examine the histogram and scatterplot. Are the errors normally distributed? Do you see any
pattern in the scatterplot? What might that mean?

8. What is the prediction equation for income06?

9. For those with more time: Add additional variables to the regression equation for income06.
Examples are father and mother’s education, or number of children. Be careful to add
variables that are at least on an interval scale of measurement. Repeat the exercise above.
Are the new variables significant predictors? Does adding variables change the effects of the
variables already in the model from above?

NONPARAMETRIC TESTS

Lesson 12: Nonparametric Tests
12.1 Objectives
After completing this lesson students will be able to:

• Perform nonparametric tests on data that don’t meet the assumptions for standard statistical
tests

To support the achievement of this primary objective, students will also be able to:

• Describe when nonparametric tests should and can be used
• Describe the options in the Nonparametric procedure dialog box and tabs
• Interpret the results of several types of nonparametric tests

12.2 Introduction
Parametric tests, such as the t test, ANOVA, or the Pearson correlation coefficient, make several
assumptions about the data. They typically assume normality of the variable(s) distribution for scale
variables, and they often assume that the variance is equal within categories of a grouping or factor
variable. If these assumptions are clearly violated, the results of these tests are in doubt. Fortunately
there are alternatives.

There is a whole family of tests and methods that make fewer assumptions about the data.
These tests fall under the class of nonparametric statistics.

1) These methods generally don’t assume normality or variance equality.
2) They are generally less powerful than parametric tests, which means they have a lower
chance of finding true differences.
3) These tests are most useful with questions using a short response scale with 3, 4, or 5
points. These scales are not truly interval in measurement.
4) They are also useful when variables have very skewed distributions and so the normality
assumption is violated.
5) These tests are also commonly used when sample size is small.

Business Context
Nonparametric tests allow us to determine if we have relationships in our data when we do not meet
important distributional assumptions. They permit us to do standard analysis—looking at group
differences, or assessing associations between variables—with all types of data, thus extending data
analysis capabilities.

Supporting Materials: The file Census.sav, a PASW Statistics data file from a survey of the
general adult population, which included questions about various attitudes and demographic
characteristics.

12.3 Nonparametric Analyses
PASW Statistics includes a wizard to guide you through selecting the appropriate nonparametric test
for a particular set of variables. You need to know whether the general situation is:

• One Sample: A dependent variable with no grouping variable
• Independent Samples: A dependent variable with a grouping (factor) variable
• Related Samples: Two (or more) dependent variables whose association you wish to test
(such as experiments with pre- and post-test measurements)

Figure 12.1 Nonparametric Menu Choices

12.4 The Independent Samples Nonparametric Analysis
When the dependent variable is scale and we want to test equality of population means for two, or
three or more groups, the procedures that we have used were the Independent-Samples T Test and
One-Way ANOVA, respectively. Certain conditions (normality, homogeneity of variances) had to be
satisfied to use these procedures.

Important: By default, the Nonparametric Tests procedures will use the declared scale of
measurement of variables to determine how they are used. In particular, recall that nonparametric
tests are an alternative to parametric methods such as One-Way ANOVA, and these parametric
procedures assume that a dependent variable is scale. Therefore, to conduct the equivalent
nonparametric test, the dependent variable must have scale level of measurement in PASW
Statistics. If you use the default settings in the Nonparametric Tests Wizard, it is critical that the
level of measurement be set correctly for each variable in the analysis.

If the assumptions are violated, or if the variable is ordinal in nature, these tests cannot be used and
an alternative is needed. Nonparametric Independent Samples tests provide this alternative. The
wizard will select the appropriate test, depending on whether there are two groups or three or more
groups. If there are more than two groups, a post hoc analysis can be run to determine which groups
differ significantly, analogous to the post hoc pairwise comparisons in the One-Way ANOVA
procedure.

Independent Samples Nonparametric Assumptions
The nonparametric tests for two or more independent samples only assume:

1) There is a categorical independent variable defining two or more groups
2) There is an ordinal or scale dependent variable on which group differences are tested.

12.5 Requesting an Independent Samples Nonparametric
Analysis

Requesting a nonparametric test for two or more independent samples is accomplished with these
steps:

1) Select Independent Samples from the Nonparametric Tests menu entry.
2) Select whether you want to compare the shape of the distributions, the medians, or
customize the analysis.
3) Specify the Test field and Group variables.
4) To see the equivalent of post hoc pairwise comparisons, you can request a specific test and
the appropriate comparison.
5) Review the significance test and other output in the Model Viewer.

12.6 Independent Samples Nonparametric Tests Output
The output from the Independent Samples tests, and for all procedures in the Nonparametric Tests
menu entry, is produced in the Model Viewer. Initially, all that is displayed is the actual test result.

• The null hypothesis being tested is described in plain language
• The specific test used is noted, here the Kruskal-Wallis test
• The significance level is listed
• The decision about the null hypothesis is listed, using the .05 level of significance

Figure 12.2 Nonparametric Independent Samples Hypothesis Test Summary

By double-clicking on the Model output, the Model Viewer window is opened, with additional output.
The Model Viewer contains two panes:

1) The Main View appears on the left side. It displays general information about the tests, and
there is a dropdown list at the bottom of the pane that allows you to switch between views
about the test/model.

2) The Auxiliary view appears on the right side. This view displays a more detailed visualization
(including tables and graphs) of the model compared to the general visualization in the main
view. Like the main view, the auxiliary view may have more than one element that can be
displayed. Initially, in the main view, the overall test is listed, identical to what was displayed
in the Output Viewer. In the Auxiliary view, the distribution of the dependent variable is
displayed via a boxplot, and details about the test are listed.

Figure 12.3 Model Viewer Panes for Nonparametric Independent Samples Test

In the Auxiliary view, we can open the Pairwise Comparisons view, as shown in the figure below,
where all the possible pairwise comparisons are listed (with no redundancy). The tests are adjusted
for the fact that multiple comparisons are being done (6 tests in this example). Here only one
comparison is significant at the .05 level, that between the middle and working class groups. They
differ in their attitude toward spending on supporting scientific research.
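
The count of six tests follows from comparing every pair among four groups. The Python sketch
below lists the pairs and shows one common way of adjusting a significance value for multiple
comparisons, the Bonferroni adjustment. The group labels other than middle and working are
hypothetical, and whether the procedure uses exactly this adjustment is an assumption here.

```python
from itertools import combinations

# Hypothetical category labels; the text names only the middle and working classes.
groups = ["lower", "working", "middle", "upper"]

# Every unordered pair of groups is one comparison (no redundancy).
pairs = list(combinations(groups, 2))
print(len(pairs))  # 6 comparisons for 4 groups


def bonferroni_adjust(p_value, n_tests):
    """Multiply a significance value by the number of tests, capped at 1."""
    return min(1.0, p_value * n_tests)


# An unadjusted significance of .01 is no longer below .05 after adjusting for 6 tests.
print(bonferroni_adjust(0.01, len(pairs)))
```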

The distance network chart—the graph above the table—lists the average rank for each category of
the grouping variable. The rank is used because this is a nonparametric test, and data can be ranked
without making many assumptions about the dependent variable. It is a graphical representation of
the comparisons table in which the distances between nodes in the network correspond to differences
between samples. Yellow lines correspond to statistically significant differences; black lines
correspond to non-significant differences. Hovering over a line in the network displays a tooltip with
the adjusted significance of the difference between the nodes connected by the line.

Figure 12.4 Pairwise Comparison Tests

12.7 Procedure: Independent Samples Nonparametric
Tests

The nonparametric Independent Samples tests are accessed from the Analyze…Nonparametric
Tests…Independent Samples menu choice. With the dialog box open:

In the Objectives tab:

1) Select Customize analysis

Figure 12.5 Nonparametric Tests Two or More Independent Samples Dialog

In the Fields tab:

1) Specify the test variable(s)
2) Specify the Groups variable

Figure 12.6 Two or More Independent Samples Fields Tab Dialog

In the Settings tab:

1) Select Customize tests and specify the desired test and, if applicable, any multiple
comparisons option.

Figure 12.7 Two or More Independent Samples Settings Tab Dialog

12.8 Demonstration: Independent Samples Nonparametric Tests

In this example we will use the file Census.sav. Our objective is to see how one’s political position
(from liberal to conservative; the variable polviews) is related to marital status. For example, are
married people more conservative than others?

The variable polviews is measured on a seven-point scale, and it is truly ordinal, not interval, in
measurement scale. Therefore, testing for differences by marital status is best done with a
nonparametric method.

Detailed Steps for Independent Samples Nonparametric Test
1) In the Objectives tab, select Customize analysis.
2) In the Fields tab, specify polviews as the Test Fields variable.
3) Specify marital as the Groups variable.
4) In the Settings tab, select Customize tests.
5) Select the Kruskal-Wallis 1-way ANOVA test, and also select the All pairwise option in the
Multiple comparisons: dropdown.

Note: The variable polviews may not be Scale in measurement level in the data. If not, change
its measurement level before beginning this example.

Results from Independent Samples Nonparametric Test
The model view output, initially condensed in the Viewer window, shows that we reject the null
hypothesis, as the significance of the Kruskal-Wallis test is displayed as .000 (that is, p < .001).
This test uses the ranks of cases on the dependent variable to determine whether there are
differences between categories, and we conclude that there are.
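
To make the rank-based logic concrete, the following Python sketch computes the Kruskal-Wallis H statistic by hand. It is an illustration under stated assumptions, not the procedure's own implementation: the ratings are invented (they are not the Census.sav data), the helper names are hypothetical, and the p-value shortcut applies only when exactly three groups are compared (2 degrees of freedom), using the chi-square approximation.

```python
import math

def average_ranks(values):
    """Rank values 1..N; tied values share the mean of their positions."""
    positions = {}
    for pos, v in enumerate(sorted(values), start=1):
        positions.setdefault(v, []).append(pos)
    mean_rank = {v: sum(p) / len(p) for v, p in positions.items()}
    return [mean_rank[v] for v in values]

def kruskal_wallis_h(groups):
    """Tie-corrected Kruskal-Wallis H for a list of samples."""
    pooled = [x for g in groups for x in g]
    n = len(pooled)
    ranks = average_ranks(pooled)
    total, start = 0.0, 0
    for g in groups:
        rank_sum = sum(ranks[start:start + len(g)])
        total += rank_sum ** 2 / len(g)
        start += len(g)
    h = 12.0 / (n * (n + 1)) * total - 3 * (n + 1)
    # Tie correction: divide by 1 - sum(t^3 - t) / (N^3 - N).
    # (Undefined if every pooled value is identical.)
    tie_counts = {}
    for x in pooled:
        tie_counts[x] = tie_counts.get(x, 0) + 1
    tie_term = sum(t ** 3 - t for t in tie_counts.values())
    return h / (1 - tie_term / (n ** 3 - n))

def p_value_three_groups(h):
    """Chi-square upper tail with 2 df, which has the closed form exp(-h/2)."""
    return math.exp(-h / 2)

# Hypothetical seven-point ratings for three groups (invented data).
married = [4, 5, 5, 6, 4, 7, 5, 3]
divorced = [4, 3, 5, 4, 6, 4]
never_married = [2, 3, 1, 4, 3, 2, 3]

h = kruskal_wallis_h([married, divorced, never_married])
print(f"H = {h:.3f}, approximate p = {p_value_three_groups(h):.4f}")
```

Because only the ranks enter the calculation, any order-preserving recoding of the seven-point scale would give the same H, which is why the test suits ordinal data.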

Figure 12.8 Nonparametric Test of Political Position by Marital Status

To decide which categories differ from others, we need to open the Model viewer and look in the
Auxiliary pane at the Pairwise Comparisons view. There are many possible comparisons to make,
and those significant at the .05 level are highlighted. We find that:

• Those who have never married are significantly different from those who are divorced,
widowed, or married.

• No other pairs are significantly different (although the separated-married pair
has an adjusted significance of .063).

Figure 12.9 Pairwise Comparison Tests for Marital Status

To see the average rank on polviews for the pairs that differ, we can look at the distance network
chart. The average rank for the never married category is 800.51, lower than that of any other
category. Lower values on polviews indicate more liberal attitudes, so this implies that those who
have never been married are more liberal than married, divorced, and widowed respondents.
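
The average ranks shown in the distance network chart can be sketched in a few lines of Python: pool all cases, rank them (ties share the mean rank), then average the ranks within each category. The scores and labels below are invented for illustration, and the function names are hypothetical.

```python
def average_ranks(values):
    """Rank values 1..N; tied values share the mean of their positions."""
    positions = {}
    for pos, v in enumerate(sorted(values), start=1):
        positions.setdefault(v, []).append(pos)
    mean_rank = {v: sum(p) / len(p) for v, p in positions.items()}
    return [mean_rank[v] for v in values]

def mean_rank_by_group(scores, labels):
    """Average the pooled ranks within each category of the grouping variable."""
    ranks = average_ranks(scores)
    totals, counts = {}, {}
    for r, lab in zip(ranks, labels):
        totals[lab] = totals.get(lab, 0.0) + r
        counts[lab] = counts.get(lab, 0) + 1
    return {lab: totals[lab] / counts[lab] for lab in totals}

# Hypothetical seven-point scores (1 = most liberal) with group labels.
scores = [2, 3, 1, 4, 5, 6, 4, 5, 3, 4]
labels = ["never", "never", "never", "married", "married",
          "married", "divorced", "divorced", "divorced", "never"]

for group, rank in sorted(mean_rank_by_group(scores, labels).items()):
    print(f"{group:<9} mean rank = {rank:.2f}")
```

A category with a lower mean rank tends to hold lower values on the test variable, which is how the chart supports the "more liberal" reading above.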

Figure 12.10 Distance Network Chart for Marital Status

Apply Your Knowledge
1. True or false? In order to assess if there is a relationship between region and gender (two
nominal variables), a nonparametric Independent Samples test can be run.

2. See the dataset shown below, with data collected on students, with grade for mathematics at
two different points in time. What is the appropriate test to see whether grade differs at time 2
from time 1?

a. Nonparametric tests: One Sample
b. Nonparametric tests: Independent Samples
c. Nonparametric tests: Related Samples
d. Parametric test: Independent Samples T-Test

Further Info: A One-Way ANOVA would find results similar to those of the Kruskal-Wallis test,
but the nonparametric test is more appropriate for the data. However, sometimes an analyst will
perform both tests and, if the nonparametric test is consistent with the parametric test results,
report only on the latter.

12.9 The Related Samples Nonparametric Analysis
Just as the parametric Independent-Samples T Test and One-Way ANOVA have analogs in the
nonparametric Independent Samples tests, the parametric Paired-Samples T Test has an
equivalent in the nonparametric Related Samples tests. (Strictly speaking, the parametric
Paired-Samples T Test compares two paired variables, while the nonparametric Related Samples
procedure allows for two or more paired variables.)

The nonparametric tests for two or more paired samples only assume that:

1) The measurement level of the variables is ordinal or scale (depending on the specific test
chosen).

12.10 Requesting a Related Samples Nonparametric Analysis

Requesting a nonparametric analysis is accomplished with these steps:
1) Select the Related Samples menu selection.
2) Select whether you want to compare observed data to hypothesized or customize the analysis.
3) Specify the Test fields.
4) Review the significance test and other output in the Model Viewer.

12.11 Related Samples Nonparametric Tests Output
The output from the nonparametric Related Samples procedure is produced in the Model Viewer.
Initially, all that is displayed is the actual test result.

• The null hypothesis being tested is described in plain language
• The specific test used is noted, here the Wilcoxon Signed Ranks test
• The significance level is listed
• The decision about the null hypothesis is listed, using the .05 level of significance

Figure 12.11 Nonparametric Related Samples Test Summary

By double-clicking on the Model output, the Model Viewer window is opened, with additional output.
The Model Viewer contains two panes:

1) The Main View appears on the left side. It displays general information about the tests, and
there is a dropdown list at the bottom of the pane that allows you to switch between views
about the test/model.

2) The Auxiliary view appears on the right side. This view displays a more detailed visualization
(including tables and graphs) of the model compared to the general visualization in the main
view. Like the main view, the auxiliary view may have more than one element that can be
displayed. Initially, in the main view, the overall test is listed, identical to what was displayed
in the Output Viewer. In the auxiliary view, the distribution of the variables is displayed via a
histogram, and details about the Wilcoxon Signed Ranks test are listed.

Figure 12.12 Model Viewer Panes Related Samples Test

12.12 Procedure: Related Samples Nonparametric Tests
The nonparametric Related Samples procedure is accessed from the Analyze…Nonparametric
Tests…Related Samples menu choice.

In the Objectives tab:

1) Select Customize analysis.

Figure 12.13 Nonparametric Tests Two or More Related Samples Dialog

In the Fields tab:

1) Specify the Test Fields variables

Figure 12.14 Field Tab Dialog Related Samples

In the Settings tab:

1) Select Customize tests and specify the desired test

Figure 12.15 Settings Tab Dialog Related Samples

12.13 Demonstration: Related Samples Nonparametric Tests

In this example we will continue to use the data file Census.sav. Several questions asked about the
respondent’s interest in various subjects, on a three-point scale from Very Interested to Not at all
interested. We would like to see whether the respondents have more interest in medical discoveries
(intmed) or in scientific discoveries (intsci).

To understand what we are testing, below are the two frequency tables for these two variables. These
variables are measured on an ordinal scale, so a paired-samples t test is not justified. We can see that
the percentage of people saying they are Very Interested in medical discoveries is higher than for
scientific discoveries, but we want to see if this difference is statistically significant. We will use a test
of whether the median of the differences between the two variables is zero.

Figure 12.16 Frequencies for Interest in Medicine and Interest in Science

Detailed Steps for Related Samples Nonparametric Test
1) Change the level of measurement to Scale for intmed and intsci
2) In the Objectives tab of the Nonparametric tests: Two or More Related Samples dialog,
select Customize analysis.
3) In the Fields tab, specify intmed and intsci as the Test Field variables
4) In the Settings tab, select the Wilcoxon matched-pair signed-rank test.

Results from Related Samples Nonparametric Test
The model view output, initially condensed in the Viewer window, shows that we reject the null
hypothesis, as the significance of the Wilcoxon Signed Ranks test is .000. This test uses the ranks of
cases on the variables to determine whether there are differences between them.
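
As a rough illustration of how a signed-rank test works, the Python sketch below drops zero differences, ranks the absolute differences (ties share the mean rank), sums the ranks separately for positive and negative differences, and converts the result to a z score with a normal approximation. The paired responses are invented, the function names are hypothetical, and the procedure's own output may use a different sign convention or exact computation.

```python
import math

def average_ranks(values):
    """Rank values 1..N; tied values share the mean of their positions."""
    positions = {}
    for pos, v in enumerate(sorted(values), start=1):
        positions.setdefault(v, []).append(pos)
    mean_rank = {v: sum(p) / len(p) for v, p in positions.items()}
    return [mean_rank[v] for v in values]

def wilcoxon_signed_rank(x, y):
    """Return (W_plus, W_minus, z, two_sided_p) via a normal approximation.

    Assumes at least one pair has a nonzero difference.
    """
    diffs = [a - b for a, b in zip(x, y) if a != b]  # zero differences dropped
    n = len(diffs)
    abs_ranks = average_ranks([abs(d) for d in diffs])
    w_plus = sum(r for r, d in zip(abs_ranks, diffs) if d > 0)
    w_minus = sum(r for r, d in zip(abs_ranks, diffs) if d < 0)
    mean = n * (n + 1) / 4
    # Variance with tie correction: subtract sum(t^3 - t) / 48 over tied |d|.
    tie_counts = {}
    for d in diffs:
        tie_counts[abs(d)] = tie_counts.get(abs(d), 0) + 1
    tie_term = sum(t ** 3 - t for t in tie_counts.values()) / 48
    variance = n * (n + 1) * (2 * n + 1) / 24 - tie_term
    z = (w_plus - mean) / math.sqrt(variance)
    p = math.erfc(abs(z) / math.sqrt(2))  # two-sided normal tail probability
    return w_plus, w_minus, z, p

# Hypothetical paired responses coded 1 (Very Interested) to 3 (Not at all).
first = [1, 1, 2, 1, 3, 2, 1, 2, 1, 1]
second = [2, 1, 3, 3, 3, 1, 2, 2, 2, 3]

w_plus, w_minus, z, p = wilcoxon_signed_rank(first, second)
print(f"W+ = {w_plus}, W- = {w_minus}, z = {z:.3f}, p = {p:.4f}")
```

With a three-point scale most differences are tied, so the tie correction in the variance matters more here than it would for continuous measurements.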

Figure 12.17 Nonparametric Test of Interest in Medicine and Science

To see the distributions of the differences between these two variables, we need to open the Model
viewer and look in the Auxiliary pane.

The bar chart of differences ranges from –2 to 2 because the variables are coded from 1 to 3, so the
difference can vary from (1-3) to (3-1). It is clear that there are more differences in one direction than
the other (the actual direction depends on how the variables are coded and the order of variables in
the dialog box). The Wilcoxon test statistic is calculated from the ranks of the positive and negative
differences.
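
The range of the differences can be checked with a small Python sketch that tabulates first-minus-second differences for paired responses coded 1 to 3. The variable names intmed and intsci follow the text, but the values are invented for illustration.

```python
from collections import Counter

def difference_counts(first, second):
    """Tabulate first - second for each paired case."""
    return Counter(a - b for a, b in zip(first, second))

# Hypothetical paired responses coded 1 to 3 (not the Census.sav data).
intmed = [1, 1, 2, 1, 3, 2, 1, 2]
intsci = [2, 1, 3, 3, 3, 1, 2, 2]

counts = difference_counts(intmed, intsci)

# With codes 1..3 the smallest possible difference is 1 - 3 and the largest 3 - 1.
for d in range(1 - 3, (3 - 1) + 1):
    print(f"{d:+d}: {'#' * counts.get(d, 0)}")
```

Swapping the order of the two variables flips the sign of every difference, which is why the direction of the chart depends on the order chosen in the dialog box.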

Figure 12.18 Distribution of Differences between Interest in Medicine and Science

The actual output doesn’t tell us which variable has a higher level of interest, which is why we viewed
the frequency tables first. We now know that there is more interest in medical discoveries than
scientific discoveries.

The distribution of each variable can be viewed within the Model Viewer, though.

1) Select Continuous Field Information from the View dropdown
2) Select INTERESTED IN MEDICAL DISCOVERIES from the Field(s): dropdown

Figure 12.19 Distribution of Interest in Medical Discoveries

Apply Your Knowledge
1. Would you use a related-samples nonparametric test in the following situations? Select all
that apply.
a. When related variables are not truly interval/scale in measurement
b. When we want to compare two groups of respondents
c. When we want to compare three ordinal variables measured on the same response scale

Additional Resources

Further Info: For additional information on nonparametric tests, see:

Daniel, Wayne W. 2000. Applied Nonparametric Statistics. 2nd ed. Boston: Duxbury Press.

Gibbons, Jean D. 2005. Nonparametric Measures of Association. Newbury Park, CA: Sage.

12.14 Lesson Summary
We demonstrated the use of the Nonparametric Tests procedure in this lesson for data that don’t
meet the assumptions for parametric tests.

Lesson Objectives Review
Students who have completed this lesson should now be able to:

• Perform non-parametric tests on data that don’t meet the assumptions for standard statistical
tests

To support the achievement of this primary objective, students should now also be able to:

• Describe when non-parametric tests should and can be used
• Describe the options in the Nonparametric procedure dialog box and tabs
• Interpret the results of several types of nonparametric tests

12.15 Learning Activity
The overall goal of this learning activity is to use nonparametric tests to explore the relationship
between several variables, using the data file SPSS_CUST.SAV.

1. Most of the questions asking for customer evaluation of the software and service are
measured on a five-point scale from Strongly Agree to Strongly Disagree (lower values indicate
more agreement, equivalent to higher satisfaction). Review the data file.

2. Test whether overall customer satisfaction (satcust) differs by highest degree earned
(degree). Be sure to set the measurement level of satcust to Scale beforehand. Try both the
Kruskal-Wallis test for independent samples and the Median test. What do you conclude? Are the
test results consistent?

3. Look at the pairwise results for each test. Which degree groups are more satisfied overall?
Are the pairwise results consistent? If not, how do they differ? Is there something odd about
the pairwise results for the Kruskal-Wallis test?

4. To help think further about these results, use Crosstabs to request a table of satcust by
degree. Request appropriate percentages and a chi-square test. Is this analysis consistent
with the nonparametric analysis?

5. For those with more time: Temporarily remove the five respondents with only some high
school education and rerun the nonparametric tests. Does this change any of the results?

6. Two questions on the survey ask whether SPSS products are easy to learn (easylrn) and
easy to use (easyuse). Using a related samples test, test whether customers think that products
are easier to learn than to use, or vice versa. Use both the related-samples Wilcoxon signed
rank test and the related-samples Friedman test. Are the results consistent for both tests?
What do you conclude?

Supporting Materials: The SPSS customer satisfaction data file, SPSS_CUST.SAV. The data were
collected from a random sample of SPSS customers, who were asked about their satisfaction with
the software, service, and other features, and for some background information on themselves
and their company.

Lesson 13: Course Summary
13.1 Course Objectives Review

Now that you have completed the course, you should be able to:

• Perform basic statistical analysis using selected statistical techniques with PASW Statistics

And you should be able to:

• Explain the basic elements of quantitative research and issues that should be considered in
data analysis
• Determine the level of measurement of variables and obtain appropriate summary statistics
based on the level of measurement
• Run the Frequencies procedure to obtain appropriate summary statistics for categorical
variables
• Request and interpret appropriate summary statistics for scale variables
• Explain how to make inferences about populations from samples
• Perform crosstab analysis on categorical variables
• Perform a statistical test to determine whether there is a statistically significant
relationship between categorical variables
• Perform a statistical test to determine whether there is a statistically significant
difference between two groups on a scale variable
• Perform a statistical test to determine whether there is a statistically significant
difference between the means of two scale variables
• Perform a statistical test to determine whether there is a statistically significant
difference among three or more groups on a scale dependent variable
• Perform a statistical test to determine whether two scale variables are correlated (related)
• Perform linear regression to determine whether one or more variables can significantly
predict or explain a dependent variable
• Perform non-parametric tests on data that don’t meet the assumptions for standard
statistical tests

13.2 Course Review: Discussion Questions

1. Is there a “correct” order to the steps for a statistical analysis?
2. What factors would help you to decide whether to use a table or a chart to report on an
analysis? Or to use both?
3. Which procedure do you prefer to analyze scale variables, Frequencies or Descriptives?
4. When should you use nonparametric methods of analysis?
5. What should you do when a relationship that is substantively significant is not statistically
significant?

13.3 Next Steps
Thought Starters
How might you use Paired T Tests in analyzing data for your organization?
How might you use ANOVA in data analysis?

How might you use Regression in data analysis?

Next Courses
This course discussed many statistical techniques. In this section we provide direction for what
courses you can attend to broaden your knowledge in specific areas.

If you want to learn more about this:        Take this course:

Regression and related methods               Advanced Techniques: Regression
ANOVA and related methods                    Advanced Techniques: ANOVA
A variety of advanced statistical methods    Advanced Statistical Analysis Using PASW Statistics

Appendix A: Introduction to Statistical Analysis References
1.1 Introduction
This appendix lists only references that cover several of the techniques and statistical methods
discussed in this course. Some of these references, such as the book by Field, also cover advanced
statistics. More detailed references for specific techniques are included in some of the lessons.

1.2 References
INTRODUCTORY AND INTERMEDIATE STATISTICS TEXTS
Berenson, Mark L., Timothy C. Krehbiel, and David M. Levine. 2005. Basic Business Statistics:
Concepts and Applications. 10th ed. New York: Prentice Hall.

Burns, Robert P., and Richard Burns. 2008. Business Research Methods and Statistics Using SPSS.
London: Sage Publications Ltd.

Field, Andy. 2009. Discovering Statistics Using SPSS. 3rd ed. London: Sage Publications Ltd.

Knoke, David, George W. Bohrnstedt, and Alisa Potter Mee. 2002. Statistics for Social Data
Analysis. 4th ed. Wadsworth Publishing.

Norusis, Marija J. 2009. SPSS 17.0 Guide to Data Analysis. New York: Prentice-Hall.

Norusis, Marija J. 2010. PASW Statistics 18.0 Statistical Procedures Companion. (forthcoming) New
York: Prentice-Hall.

  • IntroStatAnalIBMSPSS18_L00 Cover
    • Introduction to Statistical Analysis Using IBM SPSS Statistics
    • Student Guide
    • Course Code: 0G517
    • ERC 1.0
  • IntroStatAnalIBMSPSS18_L00TOC
  • IntroStatAnalIBMSPSS18_L00
    • Lesson 0: Course Introduction
      • 0.1 Introduction
      • 0.2 Course Objectives
      • 0.3 About SPSS
      • 0.4 Supporting Materials
      • 0.5 Course Assumptions
        • Note about Default Startup Folder and Variable Display in Dialog Boxes
  • IntroStatAnalIBMSPSS18_L01
    • Lesson 1: Introduction to Statistical Analysis
      • 1.1 Objectives
      • 1.2 Introduction
      • 1.3 Basic Steps of the Research Process
        • Research Objectives
      • 1.4 Populations and Samples
      • 1.5 Research Design
      • 1.6 Independent and Dependent Variables
      • 1.7 Note about Default Startup Folder and Variable Display in Dialog Boxes
      • 1.8 Lesson Summary
        • Lesson Objectives Review
      • 1.9 Learning Activity
  • IntroStatAnalIBMSPSS18_L02
    • Lesson 2: Understanding Data Distributions – Theory
      • 2.1 Objectives
      • Introduction
        • Business Context
      • 2.2 Levels of Measurement and Statistical Methods
        • Rating Scales and Dichotomous Variables
        • Implications of Measurement Level
        • Measurement Level and Statistical Methods
        • Apply Your Knowledge
      • 2.3 Measures of Central Tendency and Dispersion
        • Measures of Central Tendency
        • Measures of Dispersion
        • Apply Your Knowledge
      • 2.4 Normal Distributions
      • 2.5 Standardized (Z-) Scores
      • 2.6 Requesting Standardized (Z-) Scores
      • 2.7 Standardized (Z-) Scores Output
      • 2.8 Procedure: Descriptives for Standardized (Z-) Scores
      • 2.9 Demonstration: Descriptives for Z-Scores
        • Detailed Steps for Z-Scores
        • Results from Z-Scores
        • Apply Your Knowledge
      • 2.10 Lesson Summary
        • Lesson Objectives Review
      • 2.11 Learning Activity
  • IntroStatAnalIBMSPSS18_L03
    • Lesson 3: Data Distributions for Categorical Variables
      • 3.1 Objectives
      • 3.2 Introduction
        • Business Context
      • 3.3 Using Frequencies to Summarize Nominal and Ordinal Variables
      • 3.4 Requesting Frequencies
      • 3.5 Frequencies Output
      • 3.6 Procedure: Frequencies
      • 3.7 Demonstration: Frequencies
        • Detailed Steps for Frequencies
        • Results from Frequencies
        • Apply Your Knowledge
      • 3.8 Lesson Summary
        • Lesson Objectives Review
      • 3.9 Learning Activity
  • IntroStatAnalIBMSPSS18_L04
    • Lesson 4: Data Distributions for Scale Variables
      • 4.1 Objectives
      • 4.2 Introduction
        • Business Context
      • 4.3 Summarizing Scale Variables Using Frequencies
      • 4.4 Requesting Frequencies
      • 4.5 Frequencies Output
      • 4.6 Procedure: Frequencies
      • 4.7 Demonstration: Frequencies
        • Detailed Steps for Frequencies
        • Results from Frequencies
        • Apply Your Knowledge
      • 4.8 Summarizing Scale Variables using Descriptives
      • 4.9 Requesting Descriptives
      • 4.10 Descriptives Output
      • 4.11 Procedure: Descriptives
      • 4.12 Demonstration: Descriptives
        • Detailed Steps for Descriptives
        • Results from Descriptives
      • 4.13 Summarizing Scale Variables using the Explore Procedure
      • 4.14 Requesting Explore
        • Explore Output
      • 4.15 Procedure: Explore
      • 4.16 Demonstration: Explore
        • Detailed Steps for Explore
        • Results from Explore
        • Apply Your Knowledge
      • 4.17 Lesson Summary
        • Lesson Objectives Review
      • 4.18 Learning Activity
  • IntroStatAnalIBMSPSS18_L05
    • Lesson 5: Making Inferences about Populations from Samples
      • 5.1 Objectives
      • 5.2 Introduction
        • Business Context
      • 5.3 Basics of Making Inferences about Populations from Samples
      • 5.4 Influence of Sample Size
        • Precision of Percentages
          • Sample Size of 100
          • Sample Size of 400
          • Sample Size of 1,600
        • Sample Size and Precision
        • Precision of Means
          • A Large Sample of Individuals
          • Means Based on Samples of 10
          • Means Based on Samples of 100
        • Apply Your Knowledge
      • 5.5 Hypothesis Testing
      • 5.6 The Nature of Probability
      • 5.7 Types of Statistical Errors
        • Statistical Power Analysis
      • 5.8 Statistical Significance and Practical Importance
        • Apply Your Knowledge
      • 5.9 Lesson Summary
        • Lesson Objectives Review
      • 5.10 Learning Activity
  • IntroStatAnalIBMSPSS18_L06
    • Lesson 6: Relationships Between Categorical Variables
      • 6.1 Objectives
      • 6.2 Introduction
        • Business Context
      • 6.3 Crosstabs
        • Crosstabs Illustrated
      • 6.4 Crosstabs Assumptions
      • 6.5 Requesting Crosstabs
      • 6.6 Crosstabs Output
      • 6.7 Procedure: Crosstabs
      • 6.8 Example: Crosstabs
        • Detailed Steps for Crosstabs
        • Results from Crosstabs
        • Apply Your Knowledge
      • 6.9 Chi-Square Test
        • Chi-Square Test Assumptions
      • 6.10 Requesting the Chi-Square Test
      • 6.11 Chi-Square Output
      • 6.12 Procedure: Chi-Square Test
      • 6.13 Example: Chi-Square Test
        • Detailed Steps for Crosstabs with Chi-Square Test
        • Results from Crosstabs with Chi-Square Test
        • Apply Your Knowledge
      • 6.14 Clustered Bar Chart
        • Clustered Bar Chart Illustrated
      • 6.15 Requesting a Clustered Bar Chart with Chart Builder
      • 6.16 Clustered Bar Chart from Chart Builder Output
      • 6.17 Procedure: Clustered Bar Chart with Chart Builder
      • 6.18 Example: Clustered Bar Chart with Chart Builder
        • Detailed Steps for Clustered Bar Chart
        • Results from the Clustered Bar Chart Created with Chart Builder
      • 6.19 Adding a Control Variable
        • Control Variable Crosstabs Illustrated
      • 6.20 Requesting a Control Variable
      • 6.21 Control Variable Output
      • 6.22 Procedure: Adding a Control Variable
      • 6.23 Example: Adding a Control Variable
        • Detailed Steps for the Two-Way Crosstabs
        • Results for the Two-Way Crosstabs
        • Detailed Steps for Control Variable Crosstabs
        • Results from Adding a Control Variable
        • Chi-Square Tests for Control Variable
        • Apply Your Knowledge
        • Brainstorming Exercise
      • 6.24 Extensions: Beyond Crosstabs
      • 6.25 Association Measures
      • 6.26 Lesson Summary
        • Lesson Objectives Review
      • 6.27 Learning Activity
        • Supporting Materials
  • IntroStatAnalIBMSPSS18_L07
    • Lesson 7: The Independent- Samples T Test
      • 7.1 Objectives
      • 7.2 Introduction
        • Business Context
      • 7.3 The Independent-Samples T Test
      • 7.4 Independent-Samples T Test Assumptions
      • 7.5 Requesting the Independent-Samples T Test
      • 7.6 Independent-Samples T Test Output
      • 7.7 Procedure: Independent-Samples T Test
      • 7.8 Demonstration: Independent-Samples T Test
        • Detailed Steps for Explore Procedure for Number of Children by Gender
        • Detailed Steps for Independent Samples T-Test
        • Results from Independent Samples T-Test
        • Reading an Independent Samples Test Table
        • Apply Your Knowledge
      • 7.9 Error Bar Chart
        • Error Bar Chart Illustrated
      • 7.10 Requesting an Error Bar Chart with Chart Builder
      • 7.11 Error Bar Chart Output
        • Procedure: Error Bar Chart with Chart Builder
      • 7.12 Demonstration: Error Bar Chart with Chart Builder
        • Detailed Steps for Error Bar Chart
        • Results from the Error Bar Chart
      • 7.13 Lesson Summary
        • Lesson Objectives Review
      • 7.14 Learning Activity
  • IntroStatAnalIBMSPSS18_L08
    • Lesson 8: The Paired-Samples T Test
      • 8.1 Objectives
      • 8.2 Introduction
        • Business Context
      • 8.3 The Paired-Samples T Test
      • 8.4 Assumptions for the Paired-Samples T Test
      • 8.5 Requesting a Paired-Samples T Test
      • 8.6 Paired-Samples T Test Output
      • 8.7 Procedure: Paired-Samples T Test
      • 8.8 Demonstration: Paired-Samples T Test
        • Detailed Steps for Paired-Samples T Test
        • Results from Paired-Samples T Test
        • Apply Your Knowledge
      • 8.9 Lesson Summary
        • Lesson Objectives Review
      • 8.10 Learning Activity
  • IntroStatAnalIBMSPSS18_L09
    • Lesson 9: One-Way ANOVA
      • 9.1 Objectives
      • 9.2 Introduction
        • Business Context
      • 9.3 One-Way Anova
      • 9.4 Assumptions of One-Way ANOVA
      • 9.5 Requesting One-Way ANOVA
      • 9.6 One-Way ANOVA Output
      • 9.7 Procedure: One-Way ANOVA
      • 9.8 Demonstration: One-Way ANOVA
        • Detailed Steps for One-Way ANOVA
        • Results from One-Way ANOVA
        • Levene Test of Homogeneity of Variance
        • ANOVA Table
        • Apply Your Knowledge
      • 9.9 Post Hoc Tests with a One-Way ANOVA
      • 9.10 Requesting Post Hoc Tests with a One-Way ANOVA
      • 9.11 Post Hoc Tests Output
      • 9.12 Procedure: Post Hoc Tests with a One-Way ANOVA
        • Why So Many Tests?
          • LSD
          • SNK, REGWF, REGWQ & Duncan
          • Bonferroni & Sidak
          • Tukey (b)
          • Tukey
          • Scheffe
        • Specialized Post Hoc Tests
          • Hochberg’s GT2 & Gabriel: Unequal Ns
          • Waller-Duncan
        • Unequal Variances and Unequal Ns
          • Tamhane T2, Dunnett’s T3, Games-Howell, Dunnett’s C
      • 9.13 Demonstration: Post Hoc Tests with a One-Way ANOVA
        • Detailed Steps for a Post Hoc Test
        • Results for the Post Hoc Tests
        • Apply Your Knowledge
      • 9.14 Error Bar Chart with Chart Builder
      • 9.15 Requesting an Error Bar Chart with Chart Builder
      • 9.16 Error Bar Chart Output
      • 9.17 Procedure: Error Bar Chart with Chart Builder
      • 9.18 Demonstration: Error Bar Chart with Chart Builder
        • Detailed Steps for Error Bar Chart
        • Results from the Error Bar Chart Created with Chart Builder
      • 9.19 Lesson Summary
        • Lesson Objectives Review
      • 9.20 Learning Activity
  • IntroStatAnalIBMSPSS18_L10
    • Lesson 10: Bivariate Plots and Correlations for Scale Variables
      • 10.1 Objectives
      • 10.2 Introduction
        • Business Context
      • 10.3 Scatterplots
        • Scatterplot illustrated
      • 10.4 Requesting a Scatterplot
      • 10.5 Scatterplot Output
      • 10.6 Procedure: Scatterplot
      • 10.7 Demonstration: Scatterplot
        • Detailed Steps for a Scatterplot
        • Results from the Scatterplot
      • 10.8 Adding a Best Fit Straight Line to the Scatterplot
        • Detailed Steps to Edit Scatterplot
        • Apply Your Knowledge
      • 10.9 Pearson Correlation Coefficient
        • Pearson Correlation Coefficient Assumptions
      • 10.10 Requesting a Pearson Correlation Coefficient
      • 10.11 Bivariate Correlation Output
      • 10.12 Procedure: Pearson Correlation with Bivariate Correlations
      • 10.13 Demonstration: Pearson Correlation with Bivariate Correlations
        • Detailed Steps for Bivariate Correlations
        • Results from Bivariate Correlations
        • Apply Your Knowledge
      • 10.14 Lesson Summary
        • Lesson Objectives Review
      • 10.15 Learning Activity
  • IntroStatAnalIBMSPSS18_L11
    • Lesson 11: Regression Analysis
      • 11.1 Objectives
      • 11.2 Introduction
        • Business Context
      • 11.3 Simple Linear Regression
        • Simple Regression Illustrated
      • 11.4 Simple Linear Regression Assumptions
      • 11.5 Requesting Simple Linear Regression
      • 11.6 Simple Linear Regression Output
      • 11.7 Procedure: Simple Linear Regression
      • 11.8 Demonstration: Simple Linear Regression
        • Detailed Steps for Scatterplot
        • Detailed Steps for Simple Linear Regression
        • Results from Simple Linear Regression
        • Apply Your Knowledge
      • 11.9 Multiple Regression
      • 11.10 Multiple Linear Regression Assumptions
      • 11.11 Requesting Multiple Linear Regression
      • 11.12 Multiple Linear Regression Output
      • 11.13 Procedure: Multiple Linear Regression
      • 11.14 Demonstration: Multiple Linear Regression
        • Detailed Steps for Multiple Linear Regression
        • Results from Multiple Linear Regression
          • Diagnostic Statistics
        • Apply Your Knowledge
      • 11.15 Lesson Summary
        • Lesson Objectives Review
      • 11.16 Learning Activity
  • IntroStatAnalIBMSPSS18_L12
    • Lesson 12: Nonparametric Tests
      • 12.1 Objectives
      • 12.2 Introduction
        • Business Context
      • 12.3 Nonparametric Analyses
      • 12.4 The Independent Samples Nonparametric Analysis
        • Independent Samples Nonparametric Assumptions
      • 12.5 Requesting an Independent Samples Nonparametric Analysis
      • 12.6 Independent Samples Nonparametric Tests Output
      • 12.7 Procedure: Independent Samples Nonparametric Tests
      • 12.8 Demonstration: Independent Samples Nonparametric Tests
        • Detailed Steps for Independent Samples Nonparametric Test
        • Results from Independent Samples Nonparametric Test
        • Apply Your Knowledge
      • 12.9 The Related Samples Nonparametric Analysis
      • 12.10 Requesting a Related Samples Nonparametric Analysis
      • 12.11 Related Samples Nonparametric Tests Output
      • 12.12 Procedure: Related Samples Nonparametric Tests
      • 12.13 Demonstration: Related Samples Nonparametric Tests
        • Detailed Steps for Related Samples Nonparametric Test
        • Results from Related Samples Nonparametric Test
        • Apply Your Knowledge
      • 12.14 Lesson Summary
        • Lesson Objectives Review
      • 12.15 Learning Activity
  • IntroStatAnalIBMSPSS18_L13
    • Lesson 13: Course Summary
      • 13.1 Course Objectives Review
      • 13.2 Course Review: Discussion Questions
      • 13.3 Next Steps
        • Thought Starters
        • Next Courses
  • IntroStatAnalIBMSPSS18_LAA
    • Appendix A: Introduction to Statistical Analysis References
      • 1.1 Introduction
      • 1.2 References
        • INTRODUCTORY AND INTERMEDIATE STATISTICS TEXTS
  • IntroStatAnalIBMSPSS18_LX_LastPages

Evolution Exploration Activity

https://ed.ted.com/on/BwHC69Lx

https://www.khanacademy.org/science/ap-biology/natural-selection/natural-selection-ap/v/biodiversity-and-natural-selection-two

https://www.khanacademy.org/science/ap-biology/natural-selection/natural-selection-ap/a/darwin-evolution-natural-selection

Introduction to Environmental Science Evolution and Biodiversity Activity

Directions

Answer the questions below, save this file as a .docx, and upload this file to the correct Exploration Activity submission location.

Saving this file: When you are done answering these questions, save this template as a WORD DOCUMENT (.docx) named EA_Evolution_LastName.docx, where “LastName” is your last name. Upload it to Canvas to submit your assignment. ONLY .docx files will be accepted.

Part 1: TED-Ed activity

1. Complete the multiple choice and fill-in-the-blank questions on the TED-Ed website. Note that you must click “save answer” after each question so I can view them. COPY AND PASTE YOUR SHORT ESSAY ANSWERS IN TED-ED below by “Clicking” in the space below.

a) COPY/PASTE your answer to the “Mutation Fill in the Blank question”: In your own words finish this sentence: a mutation is a ________. (add answer in space provided below)

Click or tap here to enter text.

b) Describe the key components of evolution as presented in the Rock Pocket Mouse natural selection video (HHMI). A good answer would discuss all of the following: mutation, mouse fur color, habitat, predators, fitness, and selection.

Click or tap here to enter text.

Part 2: Khan Academy

2. Using the EVR text, the HHMI video in the TED-ED exercise, and the info in Khan Academy on Biodiversity and Natural Selection, develop answers for the following questions:

a) Using the Redwood Trees example (Biodiversity and Natural Selection Khan Academy video), DESCRIBE in your own words how the biodiversity of an ecosystem is a response to environmental factors.

Click or tap here to enter text.

b) SUMMARIZE the three main factors that contribute to biodiversity presented at the end of the video

Click or tap here to enter text.

c) Explain the process of natural selection illustrated by one of the Khan Academy multiple-choice questions, using evidence from the narrative and your selected answer.

Click or tap here to enter text.

Remember: When you are done answering these questions – save this template as a WORD DOCUMENT (.docx) named: EA_Evolution_LastName.docx where “LastName” is your last name. Upload to Canvas to submit your assignment

Week 4 DQ Responses AIT

Subject: Applied Information Technology

Q1. Please read the below paragraph and write your opinion.

Note: 150 words, with in-text citations and references, please.

1. In planning a research project, we need to recognize that deploying a smart restaurant service requires a different type of infrastructure, one in which a conveyor belt can be installed for serving food. Many other restaurants have tried to introduce robotics into their restaurant systems, but most of the time they fail to satisfy customers. Sometimes the cause is a delay in serving orders, and sometimes it is a glitch in the software or application. After some research, I made my decisions by observing others’ mistakes and drawbacks. Their mistakes can be my key to success. I will keep in mind the need to reduce service delays.
2. The only risk that can occur in my project is hardware failure. Recovering from a hardware failure can be time-consuming, so we need a backup alternative to overcome this problem. A higher-quality conveyor should be used to keep the risk low. We also need to hire technicians to maintain the conveyor so that work proceeds smoothly.

Q2. Please read the below paragraph and write your opinion.

Note: 150 words, with in-text citations and references, please.

As for the e-commerce project I’m working on, its purpose is mostly for me to summarize the key technology points and to provide solutions to managers. The project focuses mainly on technology insights into NoSQL, message queue systems, decoupling, the Spring framework, and RESTful API design. These technology stacks also serve as the decision criteria, since the technology should keep up with current trends. In addition to meeting the business requirements, the solutions should be competitive; at the very least, the cost must be lower.

For any project, there are bound to be some potential risks. One big issue is deployment: solutions such as Redis, HBase, Kafka and Zookeeper are very complex to deploy. It could take a week to complete deployment for the development, QA, and production environments. Another potential issue concerns future operation: because of the decoupling, some web services may not reflect real values, such as a product’s inventory, in a timely manner. I think this can eventually be resolved through the production design. The last concern is ongoing development. I have designed the current project structure, but if the business requirements grow substantially in the future, the architecture will become more complex and will require more senior staff, managers, and architects to do the work, which increases the cost for the company.

Anyway, everything is in progress. Let’s see what happens in the future.


MGT 3310 Homework 1 Summer I 2022

Chapter 1-7 Part A

1. According to an annual consumer spending survey, the average monthly Bank of America Visa credit card charge was $1838 (U.S. Airways Attaché Magazine, December 2003). A sample of monthly credit card charges provides the following data.

236 1710 1351 825 7450 316 4135 1333

1584 387 991 3396 170 1428 1688

a. Compute the sample mean and the median.

b. Compute the first and third quartiles.

c. Compute the range and interquartile range.
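
As a way to check hand calculations for Problem 1, the following is a minimal Python sketch using only the standard library. It assumes the quartile convention built into `statistics.quantiles` (the default "exclusive" method); textbooks and software packages use several different quartile conventions, so the quartiles from your course's method may differ. The variable names are illustrative and are not part of the assignment.

```python
import statistics

# Monthly credit card charges from Problem 1 (15 observations).
charges = [236, 1710, 1351, 825, 7450, 316, 4135, 1333,
           1584, 387, 991, 3396, 170, 1428, 1688]

# Part a: sample mean and median.
sample_mean = statistics.mean(charges)
sample_median = statistics.median(charges)

# Part b: first and third quartiles.
# Assumption: the default "exclusive" method, which locates the p-th
# quantile at position (n + 1) * p in the sorted data. Other conventions
# can give different values.
q1, _q2, q3 = statistics.quantiles(charges, n=4)

# Part c: range and interquartile range.
data_range = max(charges) - min(charges)
iqr = q3 - q1

print("mean:", sample_mean, "median:", sample_median)
print("Q1:", q1, "Q3:", q3)
print("range:", data_range, "IQR:", iqr)
```

If you submit only final answers, it is worth confirming which quartile rule your textbook uses before relying on these values.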

2. Consider a sample with data values of 57, 45, 50, 39, 50, 59, 60, and 48.

Compute the following values.

a. Mode.

b. 40th percentile.

c. Variance.

d. Standard deviation.

e. Coefficient of variation.
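
The quantities in Problem 2 can be cross-checked with a short Python sketch like the one below. It assumes sample (n - 1) formulas for the variance and standard deviation, and it assumes the default "exclusive" percentile convention of `statistics.quantiles`; a textbook that rounds the percentile position up to the next whole observation would report a different 40th percentile. The variable names are illustrative.

```python
import statistics

# Data values from Problem 2 (8 observations).
values = [57, 45, 50, 39, 50, 59, 60, 48]

# Part a: mode (most frequent value).
mode = statistics.mode(values)

# Part b: 40th percentile.
# quantiles(n=10) returns the 9 cut points at 10%, 20%, ..., 90%,
# so index 3 is the 40% cut point (default "exclusive" convention).
p40 = statistics.quantiles(values, n=10)[3]

# Parts c and d: sample variance and sample standard deviation
# (both divide by n - 1).
variance = statistics.variance(values)
std_dev = statistics.stdev(values)

# Part e: coefficient of variation, as a percentage of the mean.
cv = std_dev / statistics.mean(values) * 100

print("mode:", mode)
print("40th percentile:", p40)
print("variance:", round(variance, 2), "std dev:", round(std_dev, 2))
print("coefficient of variation (%):", round(cv, 2))
```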

3. Five observations taken for two variables follow.

x     y
4     50
6     50
11    40
3     60
16    30

a. Compute and interpret the sample covariance.

b. Compute and interpret the sample correlation coefficient.
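
For Problem 3, the sample covariance and correlation coefficient can be verified with a small Python sketch. It writes out the standard sample formulas directly (dividing by n - 1) rather than relying on a library routine; the function names `sample_covariance` and `sample_correlation` are hypothetical helpers introduced here for illustration.

```python
import math

# The five paired observations from Problem 3.
x = [4, 6, 11, 3, 16]
y = [50, 50, 40, 60, 30]


def sample_covariance(xs, ys):
    """Sample covariance: sum of products of deviations, divided by n - 1."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    return sum((a - x_bar) * (b - y_bar) for a, b in zip(xs, ys)) / (n - 1)


def sample_correlation(xs, ys):
    """Pearson correlation: covariance divided by the product of the
    two sample standard deviations."""
    s_xy = sample_covariance(xs, ys)
    s_x = math.sqrt(sample_covariance(xs, xs))  # covariance of x with itself is its variance
    s_y = math.sqrt(sample_covariance(ys, ys))
    return s_xy / (s_x * s_y)


cov_xy = sample_covariance(x, y)
r_xy = sample_correlation(x, y)

print("sample covariance:", cov_xy)
print("sample correlation:", round(r_xy, 4))
```

When interpreting the output, the sign of the covariance gives the direction of the linear relationship, and a correlation coefficient close to -1 or +1 indicates that the relationship is strong.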


Due Date: Jun 2, 11:59pm

Note: Homework solutions must be submitted electronically to the Homework 1 folder in “Assignments” on D2L.

You may choose to submit all of your work or only the final answers. If you submit only the final answer and it is wrong, you will lose all the points assigned to that question. If you also show the steps you took to reach the answer, you can still earn partial credit as long as the steps are not completely wrong.

Publicly traded companies

Research publicly traded companies and select two companies in different sectors. Compare the capital structure of each and explain your conclusions on the similarities and differences. What support can you provide for why each company adheres to its chosen structuring mechanisms?