Category Archives: Politics

Concepts from Max Weber

Max Weber (1864 – 1920), who died during the influenza pandemic that followed the First World War, is widely regarded as one of the fathers of modern sociology. His approaches to research and methodology were groundbreaking within academia. His definitions have been exceptionally influential: for example, the state as holding a monopoly on the legitimate use of physical force, along with his accounts of charismatic leadership, bureaucracy, methodological individualism and, controversially, the Protestant work ethic. Max Weber also held some anti-Polish views, which is bizarre and potentially evil. Below are notes on his theories in relation to nationalism, war and strategic ends.

Weber’s Theory of Nationalism: power & prestige

Facts & Figures

List of previous final exam questions:

How did Weber define and explain nationalism? What role did prestige and power play in his understanding of nationalism?

Why is there no sociological definition of nationalism according to Weber?

Discuss the constructed ethnicity Weber argued.

  1. To what extent, if at all, did Weber develop clear concepts and theories of ethnicity, nationality, nation-state and nationalism?
  • Guenther Roth & Claus Wittich (eds), Economy and Society (2 vols., Berkeley & Los Angeles, 1978), vol. 1, ‘Ethnic Groups’, pp. 385-398.
  • H.H. Gerth & C. Wright Mills (eds), From Max Weber: Essays in Sociology (New York, 1946), ‘Structures of Power’, pp. 159-179 (most of which is also to be found in Economy and Society, vol. 2, pp. 910-926).
  • Beetham, D. (1974). Max Weber and the Theory of Modern Politics. London. Chapter 5, ‘Nationalism and the nation-state’.
  • M. Guibernau, Nationalisms: The Nation-State and Nationalism in the Twentieth Century (Cambridge, 1996), chapter 1, ‘Nationalism in classical sociological theory’.
  • Defining, Background, Foundations
  • Of all the writing undertaken in Max Weber’s 56 years, only two significant passages use social science to address the question of nationalism. In order to ascertain why Weber never explicitly formulated a theory of nationalism, this paper will do the following.

Posthumous Works Should Be Questioned: MAJOR point about his papers on Nations and Ethnicity: we would never have known about Weber’s thoughts on ethnic groups and nations had his wife, Marianne Weber, not published them posthumously in Economy & Society. It wasn’t his finest material. He says at the end of Ethnic Groups that there is no ideal type for ethnicity. The text is fragmentary, like Economy & Society in general.

Nation & Ethnic Group: Weber would never have had it published because there is no ideal type here. Weber’s ethnicity text is associated with a Gemeinschaft concept: it is a belief, not a fact; it is a belief in relationships that are rationally calculated; it is pre-modern; it might not work in large-scale societies. NOTE that he did study subjective texts.

  • Outline the Nation and Ethnic Groups papers and argue Weber forwards an instrumentalist view of these phenomena.
  • Argue that Weber’s ultimate value is informed by the same value-laden pursuit: political power and prestige of the German nation-state.
  • Conclude that Weber never formulated a sociological explanation of nationalism for two reasons:
  • a) the concept had not fully developed as central in the modernization process during his lifetime AND
  • b) he recognized the subjectivity and amorphous tendency of this field of study.

Factfulness by Hans Rosling (A Synopsis)

The following is a synopsis of Factfulness by Hans Rosling. It’s a great read on the ten reasons we are wrong about everything and why things are not as bad as we think.

Introduction: Why I Love the Circus

Hans Rosling was a physician, academic and public speaker. Together with his son, Ola Rosling, and his daughter-in-law, Anna Rosling Rönnlund, he founded the Gapminder Foundation in 2005 to fight ignorance and encourage what he calls a more factful approach to life. Although this book, like his TED Talks, was written in his voice, it is a collaboration between the three of them.

Although he pursued a career in medicine and became a leading academic, Rosling’s true passion as a child was the circus. He loved everything about it and was convinced he would one day live his dream and run off to become a performer. His parents had other ideas; they wanted him to enjoy the first-rate education they didn’t have and so he studied medicine instead.

Key takeaways from Steve Jobs’ life based on Walter Isaacson’s biography

This is an analysis based on Steve Jobs by Walter Isaacson and other sources of research. Enjoy.

Location Really Does Matter For Entrepreneurs:

You need to be in the right place at the right time. Being exposed to many ideas, variables, and potential inputs for accidental discoveries is better than living in a risk-averse environment. In high school, Jobs took an electronics class, which would have been less likely in most other cities in the US or Canada. Steve Jobs was fortunate to be raised in Silicon Valley, and because of that location it is less of a mystery as to why Jobs became who he was. Defense contracts in Silicon Valley during the 1950s shaped the history of the valley; military investment was used to build cameras to fly over the USSR, for example. Military companies were on the cutting edge, and made living in Silicon Valley interesting. In the 1930s, Dave Packard moved into Silicon Valley, and his garage was the core of the creation of Hewlett Packard. In the 1960s, HP had 9,000 employees, and it was where all engineers wanted to work. Jobs was ambitious enough at a young age to phone Bill Hewlett and ask for some parts. That’s how he got a summer job there. Moore’s Law emerged in Silicon Valley, and Intel was able to develop the first microprocessor. Financial backing was easier to acquire in a place where rich New Yorkers retired to… With chip technology whose cost could be measured and projected, Jobs and Gates would use this metric to revolutionize the technological world.


Running a Company from the Financial Perspective | Accounting Analysis

Accrual Accounting versus Cash Accounting

Accrual basis = immediate recognition.

Cash basis = when the cash is received.

Before we dive into earnings management as a subtopic within business analysis and valuation, it is helpful to understand the difference between Accrual and Cash Accounting. The cash basis is only available to companies with no more than $5 million in sales per year.

The accrual basis is used by larger companies because it matches revenue and expenses in the same reporting period, so that the true profitability of an organization can be discerned.

Cash flows are harder to manipulate. The big difference between the two is when the transactions are recorded.
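To make the timing difference concrete, here is a minimal sketch in Python. The `Sale` record, the month numbers and the amounts are all hypothetical, chosen only to show that the same two sales land in different periods depending on the basis; it is not a full accounting model.

```python
from dataclasses import dataclass


@dataclass
class Sale:
    amount: float
    earned_month: int  # month the goods or services were delivered (hypothetical)
    cash_month: int    # month the customer actually paid (hypothetical)


def revenue_by_month(sales, basis):
    """Total revenue per month under the 'accrual' or 'cash' basis."""
    if basis not in ("accrual", "cash"):
        raise ValueError("basis must be 'accrual' or 'cash'")
    totals = {}
    for sale in sales:
        # Accrual records the sale when it is earned; cash records it when paid.
        month = sale.earned_month if basis == "accrual" else sale.cash_month
        totals[month] = totals.get(month, 0.0) + sale.amount
    return totals


# Two made-up sales, both delivered in month 1; one is paid a month later.
sales = [Sale(1000.0, earned_month=1, cash_month=2),
         Sale(500.0, earned_month=1, cash_month=1)]

accrual = revenue_by_month(sales, "accrual")
cash = revenue_by_month(sales, "cash")
print(accrual)
print(cash)
```

Under the accrual basis all of the revenue shows up in month 1, while the cash basis splits it across months 1 and 2, even though the underlying transactions are identical.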


Soviet Union to Russia: Understanding what Russia wants through an Academic Lens

Communism, Post-communism & Nationalism

The following are in depth research notes on Communism, Nationalism and Russia from the perspective of both Eastern and Western academic thinkers.

Politics, history and psychology are complicated. When the Soviet Union collapsed, the territorial maps were redrawn. Many Russian nationals became minority citizens of the new countries that were formed. The following is an analysis of that story, its implications for nationalism studies today and in the future and, in some ways, an answer to what Putin wants.


American Express Case: The Story of AmEx Canada

American Express Case: the story of American Express Canada

Key Takeaways: Ivey MBA, Howard Grosfield CEO of Amex Canada Article

  • Total Service Experience: replace cards easily overnight in the event of a lost card.

Recognize me: be valuable; engaged employees = engaged customers. Empower me: to pay the balance in full! Enable me: leverage technology to integrate service provisions.

  • Luxury AMEX Card: differentiated from the Diner’s Club Card (Visa). AMEX has high fees; carrying a rolling debt balance is very bad.
  • Amex is more expensive for merchants; however, Amex has better customers: wealthier customers. The merchant network is weaker (merchants are charged a higher fee) but the customers are better.
  • Centurion Services: Centurion members have access to professional assistance every minute of the day. It’s the Concierge: a dedicated team of highly skilled professionals. Centurion webs: privileges platform.
  • Product Expansion
  • Branding / Positioning
  • Strategic Diversification
  • Distribution / Co-Branding
  • Product Innovation

1850 – Founded as express courier service

1891 – Launched travellers cheques

Don’t leave home without them

Early 1900’s – opened offices in Europe

1957 to 1978 – green, gold, platinum cards

1987 – launched Optima Card

1991 – Boston Fee Party

1999 – Exclusive arrangement with Costco

1999 – Launch of Centurion “Black Card”

Travellers Cheques Advertisements

1981 – acquired Shearson Loeb Rhoades

1984 – acquired Lehman Brothers Kuhn Loeb

1984 – acquired IDS

1988 – acquired EF Hutton

1991 – wrote off $300 million on Optima credit card launch

1992 – spun off First Data

1993 – spun off retail brokerage arm

1994 – spun off Lehman Brothers

2005 – spun off Ameriprise

Push to capitalize on brand / expand co branded cards beyond Costco – Starwood, Jet Blue, Delta

2008: GFC affected all credit card issuers, forced to tighten up credit – less impact on AMEX

2010: Paid $300 M for internet payments processor for consumers without bank accounts (Revolution Money became Serve)

2012: Launched BlueBird with Walmart – prepaid card as a lower-cost alternative to chequing accounts and debit cards

Cost pressures: Airline mergers forced Amex to open airport Amex lounges versus giving cardholder access to airline lounges

Attack on high end customers from Barclays and JP Morgan Chase – Chase now leads card penetration among $125K plus households

2015: Costco switched credit cards to VISA, 10% of Amex’s 112 million cards were Costco

Question about value of Amex brand – 23% of $1 trillion in spending from co branded cards

– “Partner” vs “Vendor”

  • 19 card options
  • Card Type: Personal vs Small Business
  • Card Benefits: Rewards, Concierge, Cash Back
  • Loyalty Programs: AeroplanPlus, Air Miles, SPG, Membership Rewards
  • Card Attractions: No Fee, First Year Waived, Welcome Bonus
  • Blue Sky Credit Cards
  • Response Time When Apply?

Card options

Drop in new customers in 2014 from 150K to 80K – half via referrals, others via traditional methods

Tested pop-up in a shipping container in shopping mall parking lots / in malls (take up eight parking spots) – local area marketing to drive traffic

Signed up more customers in two months than best Scotia branch in a year

Key is credibility of Scotia and single minded focus on customer acquisition – salespeople only in the pop up branch, videoconference customer to an advisor if needed

Now have nine pop up branches that shift location every 60 to 90 days

Now back to 150K; if 8K per popup = 160 weekly / 25 daily

Building A Stronger Exercise Culture

Drivers Versus Conductors

About 33% of preventative healthcare is concerned with physical activity, with the other 66% being food consumption mixed with other choices and genetic predispositions. Workers in most jobs appear to spend most of their time sitting down. Back in the mid-20th century, doctors would say to patients “whatever you do, do not exercise!” So there was a lot of confusion about the health benefits. Then studies conducted in the 1950s, led by Jerry Morris, showed that conductors on double-decker buses (who had to walk up and down stairs) had a better quality of life than the drivers of those buses. The conductors had to climb the stairs thousands of times to check tickets on the upper deck. Cardiovascular activity is critical. It turns out exercise is helpful for dealing with heart disease.

Olympics Versus Citizen Wide Exercise

Building a national exercise program is a wiser allocation of funding than building an Olympic stadium, according to Simon Kuper. I agree. I love the Olympics, but I love average life expectancy past 90 years old much more (for fellow citizens of my own country and beyond). We need local facilities, even if they are not necessarily commercially viable, OVER the Olympic facilities gained by winning a host city bid, which will rarely get used post-games (e.g. take a look at London’s 2012 Olympic stadiums). Spending $9 billion on the Olympics is country brand signalling, which is cool, but those benefits are notoriously difficult to quantify in financial terms or otherwise. Expanding the local facilities infrastructure to be all-weather in northern countries like Canada, Sweden and the UK is a worthy endeavor. Exercise facilities at work should also be subsidized, potentially by government. Expanding exercise opportunities comes with risk, of course; first, what if people don’t show up to use these facilities? It’s kind of crazy that no one has successfully proposed a tax deduction for gym memberships. Fear of tax scams is hardly the major concern. There are steps to drive traffic for sure, but the culture of sedentary life is ingrained, a slow-moving epidemic we will never “see”. I’m not saying do something foolish like the Tennis Canada board member who advocated that tennis domes be built in every town under 5000 people. Leave the details to others at this point. But if the federal government were to intervene in any healthcare area (thinking in chunky terms and being blindly cavalier about revenue spending right now), why not look at preventative healthcare via an exercise mandate with teeth?

Civil society in Canada is very weak. On average, people don’t even leave their house if they don’t have to. The health benefits of exercise are massive, and of course it will improve people’s happiness; you will also see improvements in actual performance in global competition because you have a healthier population. The subset of people who actually participate in the Olympics is very small, and it is usually upper-middle-class to wealthy people. Making it more accessible to train and compete in exercise and sport will boost the quality of life across the income spectrum. Exercise has to be in a physical space: investments are underway, but the next generation needs to be obsessive about social exercise.

Thoughts on Elizabeth Warren’s This Fight is Our Fight

Great Book, Inspiring Author

Elizabeth Warren highlights some incredibly important points relating to economic inequality, poverty and beyond. I really love Elizabeth Warren’s passion against poverty, and this book centres around that topic extensively. Better than “Nickel and Dimed”, an early-2000s classic, This Fight is Our Fight is kind of an invitation to swim against mighty economic currents. Or at least, to think about the downsides of capitalism. And perhaps the immutable reality that success begets success and failure (without learning) = more failure. The core problem that I see is that poverty, in its most extreme form, is life limiting. While this topic is wildly more complicated than language can convey, it is near impossible to disagree with the idea that when equality of opportunity is reduced, the GDP of the entire planet is reduced. Everyone should get a fighting chance at success, however they personally define it. Certainly, the story of Elizabeth Warren’s upbringing should have readers draw the conclusion that poverty restricts opportunity. At the same time, she is herself a direct example of succeeding despite, and perhaps motivated by, that poverty. Anger, for lack of a better term, is good. And her voice is a powerful and credible one in a network of ideas that she hopes to coordinate for her run in 2020. Poverty, after all, sucks.

A Background in Financial Poverty, Fighting Spirit Is Inspiring For Everyone

Elizabeth grew up wearing plastic bags for shoes, living in a house where the carpeting was broken apart and worn out. Her experience of poverty is shocking and tough to read, and she discusses it in depth. Some readers will be able to recognize that level of poverty; however, most people probably did not wear plastic bags for shoes growing up. No matter what your own background, it’s a pretty remarkable level of poverty that Elizabeth endured. Warren was born Elizabeth Herring. Her mother worked at Walmart, which, Warren is quick to point out, is now a huge conglomerate worth billions of dollars. Her mother was paid a really low amount there. Elizabeth Warren made the mistake, in her own summation, of marrying early instead of going to school. However, she was able to get the education (through night school) to validate natural or environmentally induced intelligence, and by not turning to drugs, being curious and moving to where the action is, she was able to kick ass. So in effect, Elizabeth was actually very wealthy in spirit and, through economic justice, that error of birth was corrected (maybe an interesting way to think about it?). Poverty made Elizabeth a fighter. Who knew? The government needs to be there for the right person, at the right time, at the right place, and yet it’s still the individual who has to get up in the morning; no one will do that for you. It would be cool if she addressed how important she herself was to her own success…in coordination with the support of others.

Economic Injustice Or The Way Things Are Right Now, This View Needs Clarifying

Warren’s mother working at Wal-Mart is an interesting story; everyone knows someone who has been impacted by Wal-Mart. Wal-Mart is an excellent scapegoat for folks who are negatively affected by its success. It is indeed a pretty fascinating story in American business, and Sam Walton’s Made in America literally inspired Trump’s Make America Great Again hats. Success for Wal-Mart has meant driving lower prices, greater economies of scale and consolidating mom & pop operations throughout the US. That’s a mixed outcome for sure.

On the one hand, you have to admit that $4.97 kitchen utensils are kind of amazing for customers, who get the most value out of Wal-Mart; not even the CEO gets as much value as customers (in aggregate). CEOs typically only get about 1% of the total revenue that they orchestrated. Also, consider how executive compensation works: you’d be insane to work at Wal-Mart if a similar role elsewhere paid more for the same amount of work. On the other hand, the actual low-level roles of stocking shelves, directing customers to the checkout, etc., are not paid so well…it’s a frustrating reality! Salary capping would likely lead to creative ways of rewarding C-suite executives, so the reality is that success rewards the successful over time. Would customers like to pay $6.97 for utensils or $4.97? Salaries are expensive and the value that Wal-Mart brings is mostly in low prices, democratizing the utensils! Everyone can afford them.

While reality is more complex, the fact that Warren’s mother was poorly paid is a kind of injustice; being paid little to work while others sit planning operations, increasing shareholder value over employee value, does (on the surface) make for a very compelling story. Why should someone whose parents no doubt paid for school get paid more than someone who didn’t get the education needed to progress? I guess being intellectually free means releasing your mind from ideology, which means you can consider the reality of scarcity in economic terms. Thinking freely, you then have to ask what the available and future solutions to low-wage employment are. Did Elizabeth Warren’s mom consider moving to another town to get a better job? Yes, but she made other choices. Schooling? Not something she pursued. So, if the solution Warren proposes is to hinder innovation and economic development, then those solutions have hidden costs. Why should bureaucrats decide which businesses thrive and which die? Does it have to be that businesses are bad and workers are good? Sometimes, this book reads that way. A thriving middle class creates the customers that Wal-Mart needs; the job creators aren’t just the entrepreneurs but the middle-class people themselves, so it’s important that Sam Walton and others not forget who the real job creator is (alongside the entrepreneur). Honestly, I was a bit surprised that Elizabeth’s solutions are lacking in depth, perhaps because lawyers aren’t in the creative problem-solving business OR, more probably, because she needs to stay strategically vague on policy so that she can campaign in 2020 without giving away her negotiating positions too early.

Side Note: I can still enjoy her message without agreeing with everything right? And I should be allowed to point out that it’s lacking in certain areas? Well, political parties do not allow you to criticize the boss. I however am intellectually free.

Emotional Power, Maybe a Bit Much, Though?

If you don’t at least tear up reading this book, you have no soul. But if you don’t start getting concerned about the repetitiveness of the stories of poverty, no matter how gripping, you have no clue. I mean, if you want to run for president, where is your foreign policy, your policy on NAFTA, etc.? And so we have to ask a critical question, which is whether Elizabeth Warren is being political in her storytelling. Of course she is; she’s running in 2020. Does this book detail policy objectives? Heck no! Watching Warren grill a Wells Fargo CEO or a captain of industry is cathartic and entertaining, but perhaps a little bit overdramatic. Just a bit. This book is an extension of those Senate hearings that showcase Warren’s demonization of the big bad corporate bureaucracy, and the complacency of the upper middle class when it comes to how (some) companies* create value.

*Credit card companies, banks, etc. provide a service, and some executives do not handle that relationship well, according to Warren, but it definitely takes two to tango. Complicated.

Bill Gates Joke and How Averages Are Deceiving

Warren has a hilarious quote: What happens when Bill Gates walks into Moe’s tavern? Congrats, on average, the patrons of that bar just got 51 billion dollars richer! The truth about averages is that they can mislead in negative ways that aren’t anticipated. You can say that on average the American standard of living is getting better, but real wages have been static. The truth is that many people aren’t getting the benefits, according to the data Warren is looking at from 2015 backwards. Warren’s point is that people are hurting a lot more than is measurable in anecdotal stories. It does sometimes sound like envy, actually, but it’s cool. What Warren is missing is the perspective of business and value creation. Take the guy who invented the latest product: she would likely ask why he can’t share most of his wealth with his employees…the truth is that most of the wealth of a new widget goes to the customer through its usage. Founders usually only get 1 or 2% of the wealth created from the idea they create. You could say, yes, well, that was not created in a bubble; yes, but that entrepreneur did create and then capture that value….without him or her, there is no value…See…it’s complicated.
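The Bill Gates joke is really a point about the mean versus the median. As a rough sketch in Python, with entirely made-up net worths for the patrons and a round, hypothetical figure for the billionaire (neither comes from Warren’s book), the mean explodes while the median barely moves:

```python
from statistics import mean, median

# Hypothetical net worths of five bar patrons, in dollars (illustrative only).
patrons = [40_000, 55_000, 60_000, 75_000, 90_000]

# A round, made-up figure for the billionaire who walks in.
billionaire = 100_000_000_000
with_billionaire = patrons + [billionaire]

mean_before = mean(patrons)
median_before = median(patrons)
mean_after = mean(with_billionaire)
median_after = median(with_billionaire)

# The mean is dragged up by the single outlier; the median stays close
# to what a typical patron actually has.
print(f"mean:   {mean_before:,.0f} -> {mean_after:,.0f}")
print(f"median: {median_before:,.0f} -> {median_after:,.0f}")
```

This is why a statement about the “average” patron, or the average American, can be true on paper and still say very little about what most people are experiencing.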

Warren on (FDR) i.e. Roosevelt, She Should Spend More Energy talking about the Benefits of Business (Small and Corporate)

Elizabeth Warren looks at Franklin Delano Roosevelt and sees an amazing president who was elected four times. He was a great guy who thought about things in terms of benefiting both finance and the broader society, and he is the model for Elizabeth Warren. I think that makes a lot of sense. However, it was an exceptional time in American history. In talking about FDR, she strongly implies that the economic prosperity that followed the New Deal can be attributed largely to government Keynesian economics, which is hard to know for sure. Why? Because there were many variables at play in the 1940s and 50s that led to American economic leadership on the global stage. In reality, it is more complicated than words can describe. There were entrepreneurs and American industry involved during the FDR era, which Warren appears to massively downplay. The Ford Motor Company was a huge, literal engine of growth, for example. Also, think about war and industrial build-up.

Entrepreneurship Happens When Motivated (both through Poverty or/and Opportunity), Warren Needs to Fix That Claim

Warren is partisan in the sense that she assumes, for example, that entrepreneurship happens when the economy is good. I think that is partly true, but entrepreneurship also increases when someone is unemployed, and more people are unemployed when the economy is in a downward portion of the business cycle. New businesses are started more frequently when there aren’t easy jobs with great pay to be had. When the going gets tough, the tough create businesses because they can’t find an employer. The opposite is also possible: when there is money to be made, people switch to their own business ideas. But on balance, it is more likely that entrepreneurship happens when a person can’t find a job; an immigrant who can’t get a job, for example, is a budding entrepreneur because desperation (within reason) is a great motivator. Warren is an academic, and the challenge with academics is that they aren’t directly in touch with the world around them; they are more susceptible to confirmation bias, even though they are the most analytical people. And believe me, there’s a big difference between intelligence and analysis. Complicating this further is the fact that Warren is building her 2020 campaign with this book. So she can’t honestly be more balanced, because she is trying to build a campaign around scapegoating the economic winners in America.

Deprioritizing Economics in Warren’s Political Preferences, It Should Be Addressed More Seriously

Warren seems to channel these compelling emotional stories of poverty in order to support a political argument. She doesn’t necessarily have solid solutions other than to increase regulation. It’s a tightrope because she wants to be president, so what she spends more of the book on is how abusive some businesses have been. Focusing on the abuses is cathartic and convenient, as she doesn’t want to say what she would do as president just yet. Politics is about pulling people behind your bandwagon; it’s persuasion and finding enemies that we can all smack down together. If you de-prioritize how economics works (or doesn’t work), then you will find Elizabeth Warren’s arguments a breath of fresh air, unmitigated by economic reality. And of course, reality can change over time. However, finance and accounting are honest reflections of reality for the most part. Artificially manipulating industry usually makes the economy less efficient, unfortunately, increasing the cost of goods and services. At least that’s what’s happened in the past. The data can be manipulated to show the opposite, but generally we know that people are motivated by incentives that benefit them as individuals; sad but true. The problem is Warren is not an economist, she’s a commercial law professor, and that’s reflected here a little bit when she extensively highlights the story of poverty and makes almost no mention of the fact that most people have to make a living without government support in the business world (private sector). There are excesses and transgressions, but most businesses are at war with their competition; it’s about the bottom line. While I have witnessed financial poverty as well as poverty of mindset, poverty has to be fought with precision, not inexact redistribution of wealth.
When and where the government should show up to help people in need is probably the biggest challenge for people who will live through the 21st century. Warren is pretty simplistic or unsophisticated about the solutions needed to get the right services to the right people at the right time, or at least intentionally vague because she’s running in 2020; she’s negotiating with voters.

Warren Strongly Suggests Economic Inequality is a Zero-Sum Game, She Needs to Revise That View

In order for business to succeed, the poor have to fail, according to Warren’s more aggressive passages. In reality, the disagreement is over the degree to which banks should be restricted in their practices of predatory lending. They still have a critical duty of resource allocation in the economy. It is a very nuanced and complex issue, which really requires policy tests, A/B testing regulation for banks; think like a scientist. You’re kind of disagreeing with what I think is an empirical reality: that the “proper allocation of banking resources and accountability towards those who are mathematically inclined to advance their own interests but then also advance the community’s interests is not a zero-sum game.” Unfortunately, Elizabeth Warren seems to think it is a zero-sum game, or has indications that that plays well with her support base. If Apple creates another iPhone, does that wealth get distributed to customers who use that product as well as to employees? I think so. Even if production is overseas? Yes.

Credit Card Companies, An Easy Scapegoat for Elizabeth Warren, This View Needs More Nuance

Credit card companies are providing a service but are misleading customers, according to Warren. Warren points this out because there was so much profit being generated from credit card policies, policies that were notably not focused on keeping customers informed. She points out that it was almost impossible not to join that chorus of business people making so much money off of customers. You could not feasibly be an executive in a credit card company and argue against misleading customers about interest rates (the annual percentage rate), because you would be undermining your own ability to accrue revenue from customers. I would say it’s hard to claim that customers are completely oblivious to debt.

Warren Misses A Better Solution for Credit Card Problems, FacePalm!

But an obvious solution that Warren completely misses is that students in high schools throughout the US should have to learn about the time value of money. Individuals should be better informed as part of the solution, rather than simply increasing restrictions on what credit card companies say and do. The fact that students aren’t learning about how finance and accounting work and then are allowed to hold credit cards, start businesses or work in government is a bit baffling. I mean, it’s obvious that teachers themselves might have difficulty teaching these concepts since they are focused on calculus for the 35% of students who pursue science, technology, engineering and math. However, how much more important are finance and accounting than calculus? Way more. Warren knows this, but this book is not about solutions, I suspect, as I’ve mentioned above…

British Petroleum, Another Easy Scapegoat for Elizabeth Warren, This View Is A Bit Biased

Elizabeth Warren makes an excellent point when it comes to the British Petroleum catastrophe in 2010 in the Gulf of Mexico. In this case, BP received a bunch of fines from the federal government. What’s interesting about those fines ($7 billion in total) is that BP was able to expense them. In other words, those fines were tax-deductible from its total profit for the year. Remember that a tax deduction reduces the pool of money that the government can then tax. So if I have $1000 of profit and then buy a $200 car for my business, I can expense that, leaving only $800 of profit on which the government can tax me at the 25% rate. That means I pay $800 x 25% = $200 rather than $1000 x 25% = $250. So in essence, BP had a huge tax-deductible amount in 2010. And BP is very powerful; it has connections in Washington and London.
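The deduction arithmetic above can be sketched in a few lines of Python. The function name and the flat 25% rate are illustrative assumptions taken from the toy example, not a description of how BP’s actual tax bill was calculated.

```python
def tax_owed(profit, deductible_expense, rate):
    """Tax on profit after subtracting a deductible expense (flat rate, simplified)."""
    taxable_profit = max(profit - deductible_expense, 0)
    return taxable_profit * rate


# Figures from the toy example: $1000 profit, $200 expense, 25% rate.
tax_without_deduction = tax_owed(1000, 0, 0.25)
tax_with_deduction = tax_owed(1000, 200, 0.25)
saving = tax_without_deduction - tax_with_deduction

print(tax_without_deduction, tax_with_deduction, saving)
```

The saving equals the deducted amount multiplied by the tax rate ($200 x 25% = $50), which is why the value of being able to expense a multi-billion dollar fine scales directly with the size of the fine.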

Money in Politics, Always A Bad Thing for Elizabeth Warren, That View Needs More Nuance

The solution should be for the best ideas to win regardless of where they come from. However, Warren is saying that lobbyists in the House of Representatives and the Senate are gaining undue traction and affecting public policy, with their commercial interests at the centre of decision making. That seems likely, but she basically thinks that all lobbyists are a bad thing. Or at least her persuasion tactic is to convince her voting base that all lobbyists are evil. In doing so, she downplays the benefits of having lobbyists explain the details and nuances of technical policy to decision-makers so that they can reach the best outcome for the economy. Of course, Warren might say the economy is rigged, so that’s a complication. Lobbyists restrict the number of doctors in the market, thus increasing doctors’ salaries, for example. We have to ask whether the actual structure of lobbying is the problem and she is incorrectly painting all lobbying as bad, or whether she believes that the self-interest of a single organization is a problem even though the self-interest of an organization will obviously benefit the broader economy as well as the organization itself. In her view the lobbyists have run amok, and that might very well be the case; however, we can’t generalize that all lobbyists are bad and all consumers are good. Also, what’s the solution?

This book is a fundraising solution for Elizabeth Warren for sure!

Unions At the Negotiation Table, Sounds Good, Might Have Complications

Warren argues that union leadership should be at the table (Board of Directors level) alongside corporate executives: say, 25% corporate representation versus some 75% union and community leadership representation. Her reasoning is that this protects the interests of American workers, so that even as wages push up the cost of production, corporations can’t do anything about it. For example, the corporation would not be able to move manufacturing to places like Mexico at $0.75 USD per hour. Trump’s position on manufacturing certainly overlaps with Warren’s here, so that might be a nullified issue if she is the Democratic nominee.

Other Interesting Ideas from This Fight Is Our Fight

  • Walmart is being subsidized by taxpayers because employees collect food stamps.
  • Warren advocated boycotting companies like Nabisco that move their production to Mexico. (My thoughts: consumer coordination is pretty difficult in practice, i.e. the prisoner’s dilemma)
  • Astroturf campaigns versus grassroots campaigns….an Astroturf campaign is when a politician is backed by big donors to basically do whatever they want, whereas grassroots campaigns raise funds from many small donors. (My thoughts: clearly financing campaigns is pretty daunting in the US…there should be a way to fund the best ideas, not the best politicians).
  • Brookings Institution is a bad actor / think tank.
  • “Corporate, corporations”; these are almost dirty words for Warren. Lady Justice can be bought by big business. (My thoughts: She’s too perfect an academic to make the mistakes that sometimes happen in business, sometimes irresponsibly, but more often because mistakes are part of innovation. It is indeed heart wrenching, and easy to point fingers from the side lines, when things go terribly wrong for example: GlaxoSmithKline heart attack deaths due to Avandia)

Warren’s Priority List (Probably):

  • Poverty should be avoidable through government support (my thoughts: hard to disagree, but how, to what degree, how precise? Do businesses help reduce poverty at all?);
  • Prices are something that should be artificially adjusted by governments to help the poorest people (my thoughts: there are really bad hidden costs to restricting businesses like fewer jobs, less dynamic economy, less creativity / innovation, human nature is not as malleable as Warren wants it to be, how do you curb the excesses of capitalism without punishing good businesses as well?);
  • Education is good but there also have to be stable jobs for people who are risk averse (my thoughts: yes, some people cannot survive in a competitive business world, so giving them easy jobs and good pay is a kind of social service, but who pays for that? Through tax revenue? There must be a better way, test out solutions!);
  • Businesses are self-interested and do not care much about their customers (my thoughts: I don’t think that’s really true, it only looks like that when a customer gets a raw deal, it’s easy to point out horror stories because they are memorable and heart wrenching; it doesn’t mean they are the reality for most people);
  • Hidden costs of higher taxes aren’t as important as helping the poor directly (my thoughts: it’s the job of the children of baby boomers to solve this problem, it’s complicated and involves a more scientific way to deliver public services);
  • Great economic growth should be sacrificed because the benefits to the poorest are more important than those who struggled and then successfully created new business and new economic activity (my thoughts: hard to agree, if we focused all resources on the poorest people, then we would be under-serving the people who can create more tax revenue who then contribute to the tax revenue needed. It’s complicated!).


Final Grade


Universal Basic Income and the Policy Experiments of the Future

What if you were given a stipend from the government in order to live comfortably and chase that dream of becoming an ice sculptor or writing the next best seller? Would you sit toiling away at your desk? Or why not watch Jeopardy? Is it possible that different people react differently to the same opportunity? Introducing the universal basic income experiment.

Thomas Paine was the first dude to propose the concept in the modernish era. Ever since Thomas Paine argued that free citizens should have the “power to say no” to bad job opportunities, other academics and policy makers have floated a basic income. Typically, the trigger for advocating a universal basic income (UBI) is an economic downturn or a perceived adverse pattern relating to human productivity. (Perceived, because it rests on predictions about Artificial Intelligence, which in reality are hard to map against the economic benefits of increased productivity that AI is likely to create; predicting the future is kind of difficult.) Curiously, there have been advocates for a UBI on the left as well as the right politically. The latest threat to human labour has been Artificial Intelligence and/or automation. Meanwhile, Thomas Friedman suggests that “[AI]’s going to be okay” in his latest, “Thank You for Being Late”.

Background from the 20th Century

In the early 1970s, Nixon looked into UBI; the idea was that $1,000 would give citizens the means to help themselves. 8,500 Americans were tested under the Nixon administration, and people started analyzing the results. The results were mixed: it appeared that many were just enjoying this income. Increased separation and divorce rates were a by-product, so the program was shut down. The initial plan had been to run it in two US states: start small, expand slowly, let the experiment play out.

The US also has a corporate Benefits Package idea. Happiness and well-being = increased productivity. How to make that work in the corporate world has been an ongoing discussion.

Design Policies in the Way that You Design Services & Objects

Basic Income is something that has been tested in Nordic welfare states, too.

DemosHelsinki is an organization that asks the critical question: do we employ design thinking for the government? First you have a challenge that needs to be developed. Then you test: try those ideas, get feedback, and then cycle and make revisions on the design in real time. Legislation by Design means designing policies through the design thinking process. Finland increasingly wants to prototype laws that are dynamically derived. Laws have to treat people equally, yet the people in the experiments are variables, so according to the DemosHelsinki team a special law is needed to accommodate experimentation in Finnish law. The welfare office runs the Basic Income experiment. Google “KELA social insurance” to learn more.

How To Figure Out if UBI is Workable In Any Case: Test Run this Policy!

Select 2,000 people (not students) and set the payment at 560 euros. Participants did not volunteer. Give half these folks a Basic Income. A/B test like an advertiser would. The experiment is complete within 2 years. In this model, participants can take a part-time job without losing the basic income. $175K in the profile to compare these two groups of people. How are these people behaving? There are also a few UBI tests in Ontario, which is an ambitious plan to see if this policy could have legs generally…. Any partisan that has a problem with the scientific method is probably not qualified to serve; let’s see the results, people!
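The A/B split described above (2,000 selected people, half of whom receive the basic income) can be sketched as a random assignment. This is a minimal sketch, not the procedure the Finnish experiment actually used; the function name `assign_groups`, the integer participant IDs and the fixed seed are all assumptions made for illustration.

```python
import random

def assign_groups(participant_ids, seed=0):
    """Randomly split participants into equal-sized treatment and control groups."""
    rng = random.Random(seed)        # fixed seed so the assignment can be repeated
    shuffled = list(participant_ids)
    rng.shuffle(shuffled)
    half = len(shuffled) // 2
    return shuffled[:half], shuffled[half:]

participants = range(2000)           # hypothetical IDs for the 2,000 selected people
treatment, control = assign_groups(participants)

print(len(treatment), len(control))  # 1000 1000
```

Random assignment is what lets you attribute later differences between the two groups to the basic income itself rather than to who chose to sign up.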

Counter-Arguments Against and For Universal Basic Income

There are a lot of people in Finland who think it is not a good idea. The problem is the social security system. Not everyone wants a flat income. What if you have children with special needs? The basic income isn’t enough. And of course, the larger challenge is: how much would it cost in terms of redistributing government revenue? It’s a systemic shift; it would change the economy. It might save money, however….meanwhile, the upfront cost is expensive to fund and politicians are replaced regularly, so there are those factors….further research required. (Further Research Required = Fr-squared!)

  • Would Basic Income improve general productivity?
  • You might dis-incentivize people from working hard. Is pain a motivator for innovation?
  • It might lead to people taking on low-value jobs to cover the remaining..?
  • You might have the next global best seller come out of the participating group…?
  • How does Ubi affect the relationship between T and G? Where T = tax revenue and G = government spending.
  • Who would pay for Ubi realistically? Corporations? Governments?
  • What are the least obvious consequences of implementing a $20K Ubi in Canada and the US, Europe, UK? i.e. Would there be more X and less Y?
  • Why might Ubi be appealing to right and left-wing advocates?
  • Is Ubi more or less feasible in Kenya or other developing markets?
  • What are three possible contingencies relating to unemployment rate and productivity if Ubi were to be implemented in Western countries?

The Scientific Method in Political Science

These notes are a combination of notes from Matt A and Estelle H. Enjoy.

Topic One: What is the scientific method?

  • Overview
  • Science as a body of knowledge versus science as a method of obtaining knowledge
  • The defining characteristics of the scientific method
  • The scientific method and common sense

The nature of scientific knowledge claims

Four Characteristics of the Scientific Method:

What are the hallmarks of the scientific method?

  • Empiricism: requires systematic observation in order to verify conclusions; knowledge claims are tested against our experience.
  • Intersubjectivity: requires that knowledge claims be transmissible and replicable, so that others can check whether bias has affected our conclusions.
  • Explanation: the goal of the scientific method. Generalized understanding by discovering patterns of internal relationships among phenomena. How variations are related.
  • Determinism: a working assumption of scientific method. Assumption that behaviour has causes, recurring regularities & patterns. Causal influence. Must recognize that this assumption is not always warranted.

Empiricism requires that every knowledge claim be based upon systematic observation.


Our senses (what we can actually see, touch, hear…) can give us the most accurate and reliable information about what is happening around us. Info gained through senses is the best way to guard against subjective bias, distortion.

Obtaining information systematically through our senses helps to guard against bias.

What is ‘Intersubjectivity’ and why is it so important?

Empiricism is no guarantee of objectivity.

It is safer to work on the assumption that complete objectivity is impossible. Because we are humans studying human behaviour, therefore values may influence research.

Intersubjectivity provides the essential safeguard against bias by requiring that our knowledge claims be:

  • Transmissible
  • The steps followed to arrive at our conclusions must be spelled out in sufficient detail that another researcher could repeat our research. Public, detailed
  • Replicable
  • If that researcher does repeat our research, she will come up with similar results.

In practice, research is rarely duplicated: funding, professional incentives (tenure, difficult to publish)

Transmissibility and replicability enable others to evaluate our research and to determine whether our value commitments and preconceptions have affected our conclusions.


The goal of the scientific method is explanation. A political phenomenon is explained by showing how it is related to something else.

If we wanted to explain why some regimes are less stable than others, we might relate variation in political instability to variation in economic circumstances: 

  • The higher the rate of inflation, the greater the political instability.

If we wanted to explain why some citizens are more involved in politics than others, we might relate variation in political involvement to variation in citizens’ material circumstances:

  • The more affluent citizens are, the more politically involved they will be.

Empirical research involves a search for recurring patterns in the way that phenomena are related to one another.

The aim is to generalize beyond a particular act or time or place—to see the particular as an example of some more general tendency.


The search for these recurring regularities necessarily entails the assumption of determinism i.e. the assumption that there are recurring regularities in political behaviour.

Determinism is only an assumption. It cannot be ‘proved’.

The assumption of determinism is valid to the extent that research proceeding from this assumption produces knowledge claims that withstand rigorous empirical testing.

The scientific method versus common sense

In a sense, the scientific method is simply a more sophisticated version of the way we go about making sense of the world around us (systematic, conscious, planned, deliberate).


  • In everyday life, we often observe accurately—BUT users of the scientific method make systematic observations and establish criteria of relevance in advance.
  • We sometimes jump to conclusions on the basis of a handful of observations—BUT users of the scientific method avoid over-generalizing (premature generalization) by committing themselves in advance to a certain number of observations.
  • Once we’ve reached a conclusion, we tend to overlook contradictory evidence—BUT users of the scientific method avoid such selective observation by testing for plausible alternative interpretations. Commit themselves in advance to do so.
  • When confronted with contradictory evidence, we tend to explain it away by making some additional assumptions—so do users of the scientific method BUT they make further observations in order to test the revised explanation. Can modify theory, provided new observations are gathered for the modified hypothesis.


The nature of scientific knowledge claims

Knowledge claims based on the scientific method are never regarded as ‘true’ or ‘proven’, no matter how many times they have been tested.

To be considered ‘scientific’, a knowledge claim must be testable—and if it is testable, it must always be considered potentially falsifiable.

We can never test all the possible empirical implications of our knowledge claims. It is always possible that one day another researcher will turn up disconfirming evidence.

Topic 2: Concept Formation


  • Role of Concepts in the Scientific Method
  • What are Concepts?
  • Nominal vs. Operational Definitions
  • Four Requirements of a Nominal Definition
  • Classification, Comparison and Quantification

Criteria for Evaluating Concepts


Role of concepts in the scientific method

Concept formation is the first step toward treating phenomena, not as unique and specific, but as instances of a more general class of phenomena. Starting point of scientific study. To describe it, create a concept.

-w/out concepts, no amount of description will lead to explanation

-seeing specific as an instance of something more general


Concepts serve two key functions:

  • tools for data-gathering (‘data containers’): concept is basically a descriptive word. Refers to something that is observable (directly or indirectly). Can specify attributes that indicate the presence of a concept like power.
  • essential building-blocks of theories: a set of interrelated propositions. Propositions tie concepts together by showing how they’re related.


What are Concepts? (Part 1)

  • A concept is a universal descriptive word that refers directly or indirectly to something that is observable. (Descriptive words can be universal or particular: we’re interested in universal words that refer to classes of phenomena.) Empirical research is concerned with the particular and specific, but only as they are seen as examples of something else.


  • Universal versus particular descriptive words:
  • Universal descriptive words refer to a class of phenomena.
  • Particular descriptive words refer to a particular instance of that class. Collection of particulars (data) tells us nothing unless we have a way of sorting it.
  • Conceptualization enables us to see the particular as an example of something more general.
  • Conceptualization involves a process of generalization and abstraction. It is a creative act. Often begins with perception that seemingly disparate phenomena have something in common.

-involves replacing proper names (people, places) with concepts. Can then draw on a broader array of existing theory, research that would be more interesting.

  • Generalization—in classifying phenomena according to the properties that they have in common, we are necessarily ignoring those properties that are not shared. Too many exceptions, look for similarities in exceptions that might show problem with theory.

-form concept -> generalize. But generalizing means losing detail. Tradeoff btwn generality & how many exceptions can be tolerated before theory is invalidated.


  • Abstraction—a concept is an abstraction that represents a class of phenomena by labeling them. Concepts do not actually exist—they are simply labels.

-abstract concepts grasp a generic similarity (like trees)

-a concept allows us to delineate aspects that are relevant to our research. A concept is an abstraction that represents a certain phenomenon: implies that concepts do not exist, and are only labels that we attach to the phenomenon. Are defined, given meaning.

-definition starts with a word (democracy, political culture)


Real definitions: don’t enter directly into empirical research


Nominal vs. Operational Definitions

  • Every concept must be given both a nominal definition and an operational definition.


  • A nominal definition describes the properties of the phenomenon that the concept is supposed to represent. Literally “names,” attributes


  • An operational definition identifies the specific indicators that will be used to represent the concept empirically. Indicate the extent of the presence of the concept. Literally spells out procedures/operations you have to perform to represent the concept empirically.

*When reading research, look to see how concepts are represented, look for flaws.

  • The nominal definition provides a basic standard against which to judge the operational definition—do the chosen indicators really correspond to the target concept?


  • A nominal definition is neither true nor false (though it may be more or less useful).

-very little agreement in poli sci on meaning & measurement. No need to define concept like age, but necessary for racism.


Four requirements of a nominal definition:


  1. Clarity—concepts must be clearly defined, otherwise intersubjectivity will be compromised. Explicit definition.
  2. Precision—concepts must be defined precisely—if concepts are to serve as ‘data containers’, it must be clear what is to be included (and what can be excluded). Nothing vague should denote distinctive characteristics/policies of what is being defined. Provides criteria of relevance when it comes to setting up operational definition.
  3. Non-circular—a definition should not be circular or tautologous e.g. defining ‘dependency’ as ‘a lack of autonomy’.


  4. Positive—the definition should state what properties the concept represents, not what properties it lacks (because it will lack many properties, besides the ones mentioned as lacking in the definition).


Classification, Comparison and Quantification

Concepts are used to describe political phenomena.

Concepts can provide a basis for:


  • Classification—sorting political phenomena into classes or categories. Taking concepts and sorting into different categories. e.g. types of regimes. At the heart of all science.

-1. Exhaustive: every member of the population must fit into a category.

-2. Mutually exclusive: any case should fit into one category and one only.

  • Comparison—ordering phenomena according to whether they represent more—or less—of the property e.g. political stability. How much.
  • Quantification—measuring how much of the property is present e.g. turnout to vote. Allows us to compare and to say how much more or less. Anything that can be counted allows for a quantitative concept. (few interesting quantitative concepts in empirical research)
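The two requirements for a classification scheme noted above (exhaustive and mutually exclusive) can be checked mechanically. The sketch below is only an illustration: the function `check_classification` and the turnout bands are hypothetical, invented here to show the idea.

```python
def check_classification(cases, categories):
    """Return the cases that break the two rules for a classification scheme.

    `categories` maps a category name to a rule: a function that returns
    True when a case belongs in that category.
    """
    unclassified = []   # breaks exhaustiveness: the case fits no category
    overlapping = []    # breaks mutual exclusivity: the case fits several
    for case in cases:
        matches = [name for name, rule in categories.items() if rule(case)]
        if len(matches) == 0:
            unclassified.append(case)
        elif len(matches) > 1:
            overlapping.append(case)
    return unclassified, overlapping

# Hypothetical scheme: sort turnout percentages into three bands.
turnout_bands = {
    "low": lambda pct: pct < 40,
    "medium": lambda pct: 40 <= pct < 70,
    "high": lambda pct: pct >= 70,
}

unclassified, overlapping = check_classification([12, 40, 69.9, 70, 95], turnout_bands)
print(unclassified, overlapping)  # [] []
```

If either list comes back non-empty, the category boundaries need to be redrawn before the concept can serve as a reliable data container.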


Criteria for evaluating concepts:

How? Criteria correspond to functions (data containers and building blocks)

1 Empirical Import—it must be possible to link concepts to observable properties (otherwise concepts cannot serve as ‘data containers’). However, concepts do not all need a directly observable counterpart.


Concepts can be linked to observables in 3 ways:


  • directly—if the concept has a directly observable counterpart e.g. the Australian ballot. Directly observable concepts are rare in political science.
  • indirectly via an operational definition—we cannot observe ‘power’ directly, but we can observe behaviours that indicate the exercise of power. Infer presence from things that are observable (power, ideology)
  • Via their relationship within a theory to concepts that are directly or indirectly observable e.g. marginal utility. Such ‘theoretical concepts’ are rare in political science.

Gain empirical import b/c of relation to other part of theory.

2 Systematic (or theoretical) Import

—it must be possible to relate concepts to other concepts (otherwise concepts cannot serve as the ‘building blocks’ of theories).

Goal is explanation. Want to construct concepts while thinking of how they might be related to other concepts.

Topic Three—Theories


  • Overview
  • What is a theory?
  • Inductive versus deductive model of theory-building
  • Five criteria for evaluating competing theories
  • Three functions of theories


What is a theory?

Goal = explanation. Generalize beyond the particular, see it as a part of a pattern. Treating particular as example of something more general

-explanation: step 1 form concepts: identify a property that is shared in common. Step 2 form theories: tie concepts together by stating relationships btwn them

  • Normative theory versus empirical theory
  • Theories tie concepts together by stating relationships between them. These statements are called ‘propositions’ if they have been derived deductively and ‘empirical generalizations’ if they have been arrived at inductively.
  • A theory consists of a set of propositions (or empirical generalizations) that are all logically related to one another. Explain something by showing how it is related to something else.
  • A theory explains political phenomena by showing that they are logically implied by the propositions (or empirical generalizations) that constitute the theory. Theory takes a common set of occurrences & try to define pattern. Once pattern is identified, different occurrences can be treated as though just repeated occurrences of the same pattern. Simplify.

-tradeoff btwn how far we simplify and having a useful theory.

-skeptical mindset, try to falsify theories.


Inductive versus deductive model of theory-building

Inductive model—starts with a set of observations and searches for recurring regularities in the way that phenomena are related to one another.

Deductive model—starts with a set of axioms and uses logic to derive propositions about how and why phenomena are related to one another.


Deductive theory-building

Deductive theory-building is a process of moving from abstract statements about general relationships to concrete statements about specific behaviours.

-theory. Data enters into the process at the end. Develop theory first, then collect data.

-begins with a set of axioms, want them to be defensible.

-from axioms, reason through a set of propositions all logically implied by the same set of assumptions

-proposition asserts relationship btwn 2 concepts

-theory helps us to understand phenomena by showing that it is logically implied. Tells us how phenomena are related and that they are actually related.

-problem: logic is not enough -> need empirical verification.

-theories provide a logical base for expectations, predictions

-design research, choose tools, collect data. See if predictions hold. If so, theory somewhat validated.

-expectations stated in the form of hypotheses (as many as possible)

-a hypothesis states a relationship btwn variables

-variable is an empirical counterpart of a concept, closer to the world of observation, specific.

-any one test is likely to be flawed.

-deductive theory-building is more efficient, asking less of the data.


Inductive Theory-Building


-statistical analysis, try to discover patterns. Data first then use it to develop theory.

-begin with a set of observations, discern pattern, and assume that this pattern will hold more generally

-relying implicitly on assumption of determinism

-end up with empirical generalization, which is a statement of relationship that has been established by repeated systematic observation

-ex) regime destabilized when inflation increased. Collect data on other countries. If it holds, then have empirical generalization

-inductive theory ties several empirical generalizations together

-no logical basis, therefore more vulnerable to few disconfirming instances

-less efficient, more complicated questions


-what is proper interplay btwn theory and research? In practice, it is a blend of induction and deduction.

Generalization: always have to test theory using observations other than those used in creating it. If data does not support theory, can go back & modify it, provided you then go out & collect new data about modified theory.


Five criteria for evaluating competing theories


 –Simplicity (or parsimony) — a simple theory has a higher degree of falsifiability because there are fewer restrictions on the conditions under which it is expected to hold. Use as few explanatory factors as possible. Why? A more complex theory is less generalizable and harder to falsify.

 –Internal consistency (logical soundness) — it should not be possible to derive contradictory implications from the same theory.

 –Testability — we should be able to derive expectations about reality that are concrete and specific enough for us to be able to make observations and determine whether the expectations are supported. Allows us to derive expectations about which we can make observations and see if theory holds. Concrete and specific enough.

 –Predictive accuracy — the expectations derived from the theory should be confirmed. Never consider a theory to be true. Instead, is it useful? Does it have predictive accuracy?

 –Generality — the theory should allow us to explain a variety of political phenomena across time and space. Explains a wide variety of events/behaviours in a variety of different places. Holds as widely as possible.


Why is there inevitably tension among these five criteria?

-different criteria can come into conflict (more generality means less predictive accuracy, more predictive accuracy is less parsimonious)

-always going to be a tradeoff: ability to explain specific cases will tradeoff with ability to explain generally. (forests vs individual trees)

-in practice, you are pragmatic. Do what makes theory more useful.

-very rare to meet all criteria in poli sci


Three functions of theories (2nd way to evaluate)

-how well they perform functions they are meant to perform

Explanation — our theory should be able to explain political phenomena by showing how and why they are related to other phenomena. Part of some larger pattern, explain why phenomena that interest us vary.


Organization of knowledge — our theory should be able to explain phenomena that cannot be explained by existing generalizations and show that those generalizations are all logically implied by our theory. Explain things that other theories cannot. Should be possible to show that existing generalizations are related to theory/one another.


Derivation of new hypotheses (the ‘heuristic function’) — our theory should enable us to predict phenomena beyond those that motivated the creation of the theory.

Suggest new knowledge/generate new hypotheses. Abstract propositions should enable us to generate lots of interesting hypotheses (beyond those that motivate the study)

Topic 4: Hypotheses and Variables


  • What is a variable?
  • Variables versus concepts
  • What is a hypothesis?
  • Independent vs. dependent variables
  • Formulating hypotheses
  • Common errors in formulating hypotheses
  • Why are hypotheses so important?



What is a Variable?


  • Concepts are abstractions that represent empirical phenomena. In order to move from the conceptual-theoretical level to the empirical-observational level, we have to find variables that correspond to our abstract concepts. Highly abstract. Need empirical counterpart -> variables

-empirical research always functions at 2 lvls: conceptual/theoretical and empirical/observation. Hardest part is moving from 1 to 2. Must minimize loss of meaning.

  • A variable is a concept’s empirical counterpart.


  • Any property that varies (i.e. takes on different values) can potentially be a variable.


  • Variables are empirically observable properties that take on different values. Some variables have many possible values (e.g. income). Other variables have only two ‘values’ (e.g. sex).

-require more specificity than concepts. Enable us to take statement w/abstract concepts & translate into corresponding statement w/precise empirical reference.

-one concept may be represented by several different variables. This is desirable.


Variables vs. Concepts


Variables require more specificity than concepts.

One concept may be represented by several different variables.


What is a Hypothesis?

In order to test our theories, we have to convert our propositions into hypotheses.

A hypothesis is a conjectural statement of the relationship between two variables.

A hypothesis is logically implied by a proposition. It is more specific than a proposition and has clearer implications for testing. What we expect to observe when we make properly organized observations. Always in the form of a declarative statement. Always states relationships btwn variables.


Independent vs. Dependent Variables


Variables are classified according to the role that they play in our hypotheses


The dependent variable is the phenomenon that we want to explain.


The independent variable is the factor that is presumed to explain the dependent variable. Explanatory factor that we believe will explain variation in DV.


The dependent variable is ‘dependent’ because its values depend on the values taken by the independent variable


The independent variable is ‘independent’ because its values are independent of any other variable included in our hypothesis


Another way to think of the distinction is in terms of the antecedent (i.e. the independent variable) and the consequent (i.e. the dependent variable).


We predict from the independent variable to the dependent variable.

-the same variable can be dependent in one theory and independent in another.


Formulating Hypotheses I


Hypotheses can be arrived at either inductively (by examining a set of data for patterns) or deductively (by reasoning logically from a proposition). Which method we use depends on whether we are conducting exploratory research or explanatory research.


Hypotheses arrived at inductively are less powerful because they do not provide a logical basis for the hypothesized relationship (post hoc rationalization is no substitute for a priori theorizing).


Hypotheses can be stated in a variety of ways provided that (1) they state a relationship between two variables (2) they specify how the variables are related and (3) they carry clear implications for testing.


Like the concepts they represent, variables can classify, compare or quantify. This affects the way the hypothesis will be stated.


Formulating Hypotheses II


-When both variables are comparative or quantitative, state how the values of the DV (dependent variable) change when the IV (independent variable) changes:

-When the IV is comparative or quantitative and the DV is categorical, state which category of the DV is most likely to occur when the IV changes:

-When the IV is categorical and the DV is comparative or quantitative, state which category of the IV will result in more of the DV:

-When both the IV and the DV are categorical, state which category of the DV is most likely to occur with which category of the IV:

Common Errors in Formulating Hypotheses

Canadians tend not to trust their government.

Error #1–The statement contains only one variable. To be a hypothesis, it must be related to another variable. Not general.

To make this into a hypothesis, ask yourself whether you want to explain why some people are less trusting than others (DV) or whether you want to predict the consequences of lower trust (IV):

The younger voters are, the less likely they are to trust the government. (DV)

The less people trust the government (IV), the less likely they are to participate in politics.



Turnout to vote is related to age

Error #2 The statement fails to specify how the two variables are related—are younger people more likely to vote or less likely to vote?

The older people are, the more likely they are to vote.


Public sector workers are more likely to vote for social democratic parties.

Error #3 The hypothesis is incompletely specified (we don’t know with whom public sector workers are being compared). When the IV is categorical, the reference categories must always be made explicit.


Public sector workers are more likely to vote for social democratic parties than for neo-conservative parties.

Error #4 The hypothesis is improperly specified. This is the most common error in stating hypotheses. The comparison must always be made in terms of categories of the IV, not the DV. This is very important for hypothesis testing.

The hypothesis should state:

Public sector workers are more likely to vote for social democratic parties than private sector workers or the self-employed.
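To see why the comparison has to run across categories of the IV, here is a minimal Python sketch that percentages a small set of respondents within each employment sector. The data, the category labels and the function name `share_by_iv` are invented for illustration; they are not from the course material.

```python
from collections import defaultdict

# Invented (sector, party) pairs for illustration only; not real survey data.
respondents = [
    ("public", "social democratic"), ("public", "social democratic"),
    ("public", "social democratic"), ("public", "other"),
    ("private", "social democratic"), ("private", "other"),
    ("private", "other"), ("private", "other"),
    ("self-employed", "social democratic"), ("self-employed", "other"),
    ("self-employed", "other"), ("self-employed", "other"),
]

def share_by_iv(rows, dv_value):
    """Return, for each IV category, the share of cases with the given DV value."""
    totals = defaultdict(int)
    hits = defaultdict(int)
    for iv, dv in rows:
        totals[iv] += 1
        if dv == dv_value:
            hits[iv] += 1
    return {iv: hits[iv] / totals[iv] for iv in totals}

shares = share_by_iv(respondents, "social democratic")
for sector, share in shares.items():
    print(f"{sector}: {share:.0%} vote social democratic")
```

The hypothesis is supported only if the share is higher among public sector workers than in the other IV categories. Comparing parties within the public sector alone would say nothing about the effect of sector.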



The turnout to vote should be higher among young Canadians

Error #5 This is simply a normative statement. Hypotheses must never contain words like ‘should’, ‘ought’ or ‘better than’ because value statements cannot be tested empirically.

This does not mean that empirical research is not concerned with value questions.


To turn a value question into a testable hypothesis, you could focus on factors that encourage a higher turnout or you could focus on the possible consequences of low turnout:

The higher the turnout to vote, the more responsive the government will be.



Mexico has a more stable government than Nicaragua.

Error #6 The hypothesis contains proper names. A statement that contains proper names (i.e. names of countries, names of political actors, names of political parties, etc.) cannot be a hypothesis because its scope is limited to the named entities.

To make this into a hypothesis, you must replace the proper names with a variable. Ask yourself: why does Mexico have a more stable government?

The higher the level of economic development, the more stable a government will be.



The more politically involved people are, the more likely they are to participate in politics.

Error #7 The hypothesis is true by definition because the two variables are simply different names for the same property (i.e. it is a tautology)

Decide whether you want to explain variations in political participation (DV) or to predict the consequences of variations in political participation (IV).

The more involved people are in voluntary organizations, the more likely they are to participate in politics.


*Importance of nominal definition: could be non-circular if meant emotional involvement & behavioural expectations.


Why are Hypotheses so Important?


  • Hypotheses provide the indispensable bridge between theory and observation by incorporating the theory in near-testable form.


  • Hypotheses are essentially predictions of the form, if A, then B, that we set up to test the relationship between A and B.


  • Hypotheses enable us to derive specific empirical expectations (‘working hypotheses’) that can be tested against reality. Because they are logically implied by a proposition, they enable us to assess whether the proposition holds.


  • Hypotheses direct investigation. Without hypotheses, we would not know what to observe. To be useful, observations must be for or against some point of view.
  • Hypotheses provide an a priori rationale for relationships. If we have hypothesized that A and B are related, we can have much more confidence in the observed relationship than if we had just happened upon it.
  • Hypotheses may be affected by the researcher’s own values and predispositions, but they can be tested, and confirmed or disconfirmed, independently of any normative concerns that may have motivated them.
  • Even when hypotheses are disconfirmed, they are useful since they may suggest more fruitful lines for future inquiry—and without hypotheses, we cannot tell positive from negative evidence.


-successful hypothesis: do variables covary?

Test for other variables that might eliminate relationship. Control variables. Think about control on data collection stage.

Topic 5: Control Variables



  • What are control variables?


  • Sources of spuriousness


  • Intervening variables
  • Conditional variables


What are control variables?


Testing a hypothesis involves showing that the IV and the DV vary together (‘covary’) in a consistent, patterned way e.g. showing that people who have higher levels of education do tend to have higher levels of political interest.


It is never enough to demonstrate an empirical association between the IV and the DV. Must always go on to look at other variables that might plausibly alter or even eliminate the observed relationship.


Control variables are variables whose effects are held constant (literally, ‘controlled for’) while we examine the relationship between the IV and the DV.


Sources of Spuriousness 

The mere fact that two variables are empirically associated does not mean that there is necessarily any causal connection between them

Think: pollution and literacy rates, number of firefighters and amount of fire damage, migration of storks and the birth rate in Sweden…

These are all (silly!) examples of spurious relationships. In each case, the observed relationship can be explained by the fact that the variables share a common cause


A source of spuriousness variable is a variable that causes both the IV and the DV. Remove the common cause and the observed relationship between the IV and the DV will weaken or disappear. If you overlook SS, you risk research being completely wrong.


To identify a potential (SS) source of spuriousness, ask yourself (1) whether there is any variable that might be a cause of both the IV and the DV and (2) whether that variable acts directly on the DV as well as on the IV.


If the variable only acts directly on the IV, it is not a potential source of spuriousness. It is simply an antecedent. An antecedent is not a control variable.


Sources of Spuriousness II

  • To identify a potential (SS) source of spuriousness, ask yourself (1) whether there is any variable that might be a cause of both the IV and the DV and (2) whether that variable acts directly on the DV as well as on the IV.
  • If the variable only acts directly on the IV, it is not a potential source of spuriousness. It is simply an antecedent. An antecedent is not a control variable.

SS -> IV -> DV

  • Example: The higher people’s income, the greater their interest in politics.
  • BUT it could be spurious: education could be a source of spuriousness:

Income -> Interest in Politics

Education (source of spuriousness) -> Income and Interest in Politics

Education -> Support for Feminism

Generation (source of spuriousness) -> Education and Support for Feminism

  • Some independent variables are unlikely to have a source of spuriousness: e.g. ethnicity, religion.
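The income and interest in politics example can be sketched in Python. The counts below are invented so that education drives both income and interest; the names `counts`, `pct_high_interest` and `income_gap` are hypothetical. Controlling for education here simply means looking at the income gap within each education level.

```python
# Invented counts: (education, income, interest) -> number of cases.
counts = {
    ("low", "low", "low"): 9, ("low", "low", "high"): 3,
    ("low", "high", "low"): 3, ("low", "high", "high"): 1,
    ("high", "low", "low"): 1, ("high", "low", "high"): 3,
    ("high", "high", "low"): 3, ("high", "high", "high"): 9,
}

def pct_high_interest(income, education=None):
    """Share with high interest among cases at a given income level,
    optionally restricted to one education level (the control)."""
    selected = {k: n for k, n in counts.items()
                if k[1] == income and (education is None or k[0] == education)}
    total = sum(selected.values())
    high = sum(n for k, n in selected.items() if k[2] == "high")
    return high / total

def income_gap(education=None):
    """Difference in high-interest share between high and low income."""
    return pct_high_interest("high", education) - pct_high_interest("low", education)

print("Overall gap:", income_gap())
for level in ("low", "high"):
    print(f"Gap with education = {level}:", income_gap(level))
```

With these made-up counts the overall gap is 25 percentage points, but it drops to zero within each education level. That is the pattern a source of spuriousness produces. Real data would rarely be this clean.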



Intervening Variables I


Once we have eliminated potential sources of spuriousness, we must test for plausible intervening variables


Intervening variables are variables that mediate the relationship between the IV and the DV. An intervening variable provides an explanation of why the IV affects the DV


The intervening variable corresponds to the assumed causal mechanism. The DV is related to the IV because the IV affects the intervening variable and the intervening variable, in turn, affects the DV.

IV-> Intervening->DV

To identify plausible intervening variables, ask yourself why you think the IV would have a causal impact on the DV.

-can be more than one potential rationale. Intervening variable validates causal thinking.


Intervening Variables II:

  • To identify plausible intervening variables, ask yourself why you think the IV would have a causal impact on the DV.
  • Examples:
  • Women are more likely than men to favour an increase in social spending.
  • The lower people’s income the more politically alienated they will be.


Conditional variables I.

-trickiest and most common. What will happen to relation btwn IV and DV?

Once we have eliminated plausible sources of spuriousness and verified the assumed causal mechanism, we need to specify the conditions under which the hypothesized relationship holds.


Ideally, we want there to be as few conditions as possible because the aim is to come up with a generalization.


Conditional variables are variables that literally condition the relationship between the IV and the DV by affecting:

(1) the strength of the relationship between the IV and the DV (i.e. how well do values of the IV predict values of the DV?) and

(2) the form of the relationship between the IV and the DV (i.e. which values of the DV tend to be associated with which values of the IV?)

-focus is always on its effect on the hypothesized relation btwn IV and DV in every category of the conditional variable. Ex) category = religion: Christian, Muslim, Atheist. Or: important, somewhat important, not important.


To identify plausible (CV) conditional variables, ask yourself whether there are some sorts of people who are likely to take a particular value on the DV regardless of their value on the IV.

Note: the focus is always on how the hypothesized relationship is affected by different values of the conditional variable.


There are basically three types of variables that typically condition relationships:

(1) variables that specify the relationship in terms of interest, knowledge or concern. Example (interest, knowledge or concern):

Catholics are more likely to oppose abortion than Protestants.

If CV = attends church, then religious affiliation -> views on abortion

If CV = does not attend church, then religious affiliation does not -> views on abortion

(2) variables that specify the relationship in terms of place or time. (where are they from?) Example (place or time):

The higher people’s incomes, the more likely they are to participate in politics

If CV = non-rural resident, then income -> political participation

If CV = rural resident then income does not -> political participation

(3) variables that specify the relationship in terms of social background characteristics.

Examples (Social Background Characteristics):

The more religious people are, the more likely they are to oppose abortion.

If CV = male then religiosity -> views on abortion

If CV = female then religiosity does not -> views on abortion
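The place-or-time example (income and participation, conditioned by rural residence) can be sketched in Python as follows. The counts are invented and the function names are hypothetical. The idea is to re-examine the IV-DV relationship separately within each category of the conditional variable.

```python
# Invented counts: (residence, income, participates) -> number of cases.
counts = {
    ("non-rural", "low", False): 6, ("non-rural", "low", True): 2,
    ("non-rural", "high", False): 2, ("non-rural", "high", True): 6,
    ("rural", "low", False): 4, ("rural", "low", True): 4,
    ("rural", "high", False): 4, ("rural", "high", True): 4,
}

def participation_rate(residence, income):
    """Share of cases that participate, for one residence and income level."""
    yes = counts[(residence, income, True)]
    no = counts[(residence, income, False)]
    return yes / (yes + no)

def income_effect(residence):
    """Gap in participation between high and low income within one CV category."""
    return participation_rate(residence, "high") - participation_rate(residence, "low")

effects = {r: income_effect(r) for r in ("non-rural", "rural")}
print(effects)
```

In this invented data the income gap is 0.5 among non-rural residents and 0.0 among rural residents, so residence conditions the relationship. Unlike a source of spuriousness, the relationship does not vanish everywhere; it holds in one category and not in the other.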


Stages in Data Analysis:

Test hypothesis –> Test for Spuriousness –> If non-spurious, test for intervening variables –> test for conditional variables.


Topic 6: Research Problems and the Research Process



  • What is a research problem?
  • Maximizing generality
  • Why is generality important?
  • Overview of the research process
  • Stages in data analysis



What is a research problem?


A properly formulated research problem should take the form of a question: how is concept A related to concept B?



How is income inequality related to regime type?


How is moral traditionalism related to gender?


How is civic engagement related to social networks?

Maximizing Generality


Aim for an abstract and comprehensive formulation rather than a narrow and specific one.


Example: you want to explain support for the Parti-Québécois.

A possible formulation of the research problem:


How is concern for the future of the French language related to support for the PQ?

A better formulation of the research problem:


How is cultural insecurity related to support for nationalist movements?

Why is Generality Important?


  • Goal of the empirical method is to come up with a generalization.


  • Greater contribution because findings will have implications beyond the particular puzzle that motivated the research.


  • Access to a more diverse theoretical and empirical literature in developing a tentative answer to the research question.


The Research Process

Find a puzzle or anomaly –> Formulate the research problem. How is A related to B? –> Develop hypothesis explaining how and why A and B are related –> Identify plausible sources of spuriousness, intervening variables and conditional variables. –> Choose indicators to represent the IV, DV and control variables (‘operationalization’) –> Collect and analyze the data.


Stages in Data Analysis:

Test hypothesis –> Test for Spuriousness –> If non-spurious, test for intervening variables –> test for conditional variables.

Topic 7: From concepts to indicators



  • What is ‘operationalization’?


  • What are indicators?


  • Converting a proposition into a testable form


  • Key properties of an operational definition


  • An example: operationalizing ‘socio-economic status’

What is Operationalization?


Operationalization is the process of selecting observable phenomena to represent abstract concepts.

When we operationalize a concept we literally specify the operations that have to be performed in order to establish which category of the concept is present (classificatory concepts) or the extent to which the concept is present (comparative or quantitative concepts).


The end product of this process is the specification of a set of indicators.

What are indicators?

Indicators are observable properties that indicate which category of the concept is present or the extent to which the concept is present.


In order to test our theory, we examine whether our indicators are related in the way that our theory would predict.


The predicted relationship is stated in the form of a working hypothesis.


The working hypothesis is logically implied by one of the propositions that make up our theory. Because it is logically implied by the proposition, evidence about the validity of the working hypothesis can be taken as evidence about the validity of the proposition.


Converting a Proposition into a Testable Form I

Concept -> proposition -> concept

Variable -> hypothesis -> variable

Indicator -> working hypothesis -> indicator




Converting a Proposition into a Testable Form II


Just as it is possible to represent one concept by several different variables, so it is possible—and desirable—to represent one variable by several different indicators.

Concept -> variables (2 or more) -> indicators (2 or more for each variable).


Key Properties of an Operational Definition


The operational definition specifies the indicators by setting out the procedures that have to be followed in order to represent the concept empirically.


A properly framed operational definition:

-adds precision to concepts

-makes propositions publicly testable


This ensures that our knowledge claims are transmissible and makes replication possible.

An Example: Operationalizing ‘Socio-Economic Status’


The first step in representing a concept empirically is to provide a nominal definition that sets out clearly and precisely what you mean by your concept:


Socio-Economic Status: ‘a person’s relative location in a hierarchy of material advantage’.

Socio-economic status:

  1. Income: earnings from employment, annual household income

  2. Wealth: value of assets, home ownership
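As a rough sketch, the concept -> variable -> indicator hierarchy can be written down as a nested Python structure. The additive index at the end is only one possible way to combine indicators and is an assumption, not something the notes prescribe. The respondent figures are hypothetical.

```python
# Concept -> variables -> indicators, following the notes' example.
operationalization = {
    "socio-economic status": {
        "income": ["earnings from employment", "annual household income"],
        "wealth": ["value of assets", "home ownership"],
    }
}

def indicators_for(concept):
    """List every indicator that represents the concept, across its variables."""
    variables = operationalization[concept]
    return [ind for inds in variables.values() for ind in inds]

def rescale(values):
    """Min-max rescale numbers to the 0-1 range (assumes they are not all equal)."""
    lo, hi = min(values), max(values)
    return [(v - lo) / (hi - lo) for v in values]

# Hypothetical respondents: household income and asset values in dollars.
household_income = [20000, 60000, 100000]
asset_value = [0, 150000, 300000]

# One possible (assumed) index: the mean of the rescaled indicators.
ses_index = [
    (i + a) / 2 for i, a in zip(rescale(household_income), rescale(asset_value))
]
print(indicators_for("socio-economic status"))
print(ses_index)
```

Writing the procedure out this explicitly is what makes an operational definition transmissible: another researcher can repeat exactly the same steps.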

Topic 8: Questionnaire Design and Interviewing



-The function of a questionnaire

-The importance of pilot work and pre-testing

-Open-ended versus close-ended questions

-Advantages and disadvantages of close-ended questions

-Advantages and disadvantages of open-ended questions

-Ordering the questions

-Common errors in question wording

-A checklist for identifying problems in the pre-test


important to know what makes good survey research

-simply a formal way of asking people questions: attitude, beliefs, background, opinions

-follows a highly standardized, structured, thought-out sequence


The Function of a Questionnaire

-The function of a questionnaire is to enable us to represent our variables empirically.

-Respondents’ coded responses to our questions serve as our indicators.

-The first step in designing a questionnaire is to identify all of the variables that we want to represent (i.e. independent variables, dependent variables, control variables).

Do not pose hypothesis directly. One question cannot operationalize two variables.

-We must always keep in mind why we are asking a given question and what we propose to do with the answers.

-A question should never pose a hypothesis directly. We test our hypotheses by examining whether people’s answers to different questions go together in the way that our hypotheses predicted.


The Importance of Pilot Work

Second step: pilot work

Careful pilot work is essential in designing a good questionnaire. Background work to prepare surveys.


Pilot work can involve:

-lengthy unstructured interviews with people typical of those we want to study

-talks with key informants

-reading widely about the topic in newspapers, magazines and on-line in order to get a sense of the range of opinion.


The Importance of Pre-testing

Third step: draft a questionnaire

Fourth step: pretest questionnaire

Once a questionnaire has been drafted, it should be pre-tested using respondents who are as similar as possible to those we plan to survey

-ideally, people you test are typical of group you want to represent.

-purposive/judgmental sampling: use knowledge of population to choose subjects

-pretest very important & often humbling

Pre-testing can help with:

  • identifying flawed questions
  • improving question wording
  • ordering questions
  • determining the length of time it takes to answer the questionnaire or interview the respondents
  • assessing whether responses are affected by characteristics of the interviewer
  • improving the wording of the survey introduction (who am I, what I’m doing, why I’m doing it. Doesn’t say what hypotheses are.)





Open-Ended versus Close-Ended Questions


Surveys typically include a small number of open-ended questions and a larger number of close-ended questions.

In open-ended questions, only the wording of the question is fixed. The respondent is free to answer in his or her own words. The interviewer must record the answer word-for-word, w/out abbreviations.


In close-ended questions, the wording of both the question and the possible response categories is fixed. The respondent selects one answer from a list of pre-specified alternatives. (don’t read out “other”, but should be present in case they say something else)


Advantages of Close-Ended Questions

  • help to ensure comparability among respondents
  • ensure that responses are relevant. Allows comparison
  • leave little to the discretion of the interviewer. Respondent has control over classification of their answer.
  • take relatively little interviewing time: quick to ask & answer
  • easy to code, process, and analyze the responses
  • give respondents a useful checklist of possibilities
  • help people who are not very articulate to express an opinion


Disadvantages of Close-Ended Questions

  • may prompt people to answer even though they do not have an opinion (preferable not to offer “no opinion” but have it on questionnaire. Difference btwn don’t know and no answer.)
  • may channel people’s thinking, producing responses that do not really reflect their opinion. Bias results.
  • may overlook some important possible responses
  • may result in a loss of rapport with respondents: throw in open-ended to engage people
  • misunderstanding (if using terms that could be difficult, provide definition for interviewers. Don’t adlib.)

The responses to close-ended questions must always be interpreted in light of the pre-set alternatives that were offered to respondents.


Advantages and Disadvantages of Open-Ended Questions



Open-ended questions avoid the disadvantages of close-ended questions. They can also provide rich contextual material, often of an unexpected nature. (quotes can make report more interesting).

-avoid putting ideas in people’s heads

-can engage people



Open-ended questions are easy to ask—but they are difficult to answer and still more difficult to analyze. Open-ended questions:

  • take up more interviewing time and impose a heavier burden on the interviewer
  • increase the possibility of interviewer bias if the interviewer ends up paraphrasing the responses
  • require more processing
  • increase the possibility of researcher bias since the responses have to be coded into categories for the purpose of analysis (must reduce to a set of numbers. Introduce risk of bias. Getting others to code for intersubjectivity is time consuming and expensive.)
  • the classification of responses may misrepresent the respondent’s opinion. Respondents have no control over how their response is used.
  • transmissibility and hence replicability may be compromised by the coding operation
  • respondents may give answers that are irrelevant. Solution: use open-ended in pilot study, then create close ended with answers. Some amount of info lost, less likely to overlook important alternative.


-close-ended response categories must be mutually exclusive and cover every category.

-avoid multiple answers (which is closest, comes closest to point of view)

-can have open & close-ended versions of same question, spread out in survey. Always open first.
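The requirement that close-ended categories be mutually exclusive and cover every case can be checked mechanically for numeric brackets. The sketch below is one way to do it in Python; the age brackets and the function name `check_categories` are hypothetical.

```python
def check_categories(brackets, lowest, highest):
    """Check integer response brackets for overlaps and gaps.

    brackets: list of (low, high) pairs, both ends inclusive.
    Returns (mutually_exclusive, exhaustive) for the range lowest..highest.
    """
    ordered = sorted(brackets)
    # After sorting, any overlap shows up between two neighbouring brackets.
    mutually_exclusive = all(
        prev_high < next_low
        for (_, prev_high), (next_low, _) in zip(ordered, ordered[1:])
    )
    covered = set()
    for low, high in ordered:
        covered.update(range(low, high + 1))
    exhaustive = covered >= set(range(lowest, highest + 1))
    return mutually_exclusive, exhaustive

# Hypothetical age categories for adult respondents aged 18 to 99.
flawed = [(18, 30), (30, 50), (50, 65)]   # 30 and 50 fall in two brackets; 66+ missing
fixed = [(18, 29), (30, 49), (50, 64), (65, 99)]

print(check_categories(flawed, 18, 99))
print(check_categories(fixed, 18, 99))
```

A respondent aged 30 could tick two boxes in the flawed version, and a respondent aged 70 could tick none. The fixed version avoids both problems.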


Ordering the Questions


Question sequence is just as important as question wording. The order in which questions are asked can affect the responses that are given:

  • make sure that open-ended and close-ended versions of the same question are widely separated and that the open-ended version is asked first. (sufficiently separated)
  • if two questions are asked about the same topic, make sure that the first question asked will not colour responses to the subsequent question. Change order or separate questions.
  • avoid posing sensitive questions too early in the questionnaire.
  • begin with non-threatening questions that engage the respondent’s interest and seem related to the stated purpose of the survey. Help create rapport.
  • ensure some variety in the format of the questions in order to hold the respondent’s attention.

-when reading over questionnaire, try to think how you would react.  Not intimidating. Shouldn’t seem like a test

-have you unwittingly made your own views obvious and favoured a particular position?

-worded in a friendly, conversational way. Should seem natural.

-writing questions is likened to catching a particularly elusive fish.

-making assumptions that everyone understands the question the same way. The way you intended, assuming people have necessary information. Make questions unambiguous. Problem: people will express non-attitudes.

-if problems writing questions, often b/c not completely clear on topic concept. Importance of nominal definition.


Common Errors in Question Wording

‘Do you agree or disagree with the supposition that continued constitutional uncertainty will be detrimental to the Quebec economy?’

Error #1: the question uses language that may be unfamiliar to many respondents. The wording should be geared to the expected level of sophistication of the respondents.

‘Please tell me whether you strongly agree, somewhat agree, somewhat disagree or strongly disagree with the following statements:

People like me have no say in what the government does


The government doesn’t care what people like me think’

Error #2: the wording of the statements is vague (the federal government? the provincial government? the municipal government?) Questions must always be worded as clearly as possible. (time, place, lvl of govt)


‘It doesn’t matter which party is in power, there isn’t much governments can do these days about basic problems’

Error #3: this is a double-barreled question. A respondent could agree with one part of the question and disagree with the other.


‘In federal politics, do you usually think of yourself as being on the left, on the right, or in the center?’

Error #4: this question assumes that the respondent understands the terminology of left and right.


‘Would you favor or oppose extending the North American Free Trade Agreement to include other countries?’

Error #5: this question assumes that respondents are competent to answer. Also doesn’t say to what other countries. Solution: filter question: Do you happen to know what NAFTA is? People will want to answer even if they don’t know what it is (ex, fictitious topics). Lack of information.


‘Should welfare benefits be based on any relationship of economic dependency where people are living together, such as elderly siblings living together or a parent and adult child living together or should welfare benefits only be available to those who are single or married and/or have children under the age of 18 years?’

Error #6: this question is too wordy. In a self-administered survey, a question should contain no more than 20 words. In a face-to-face or telephone survey, it must be possible to ask the question comfortably in a single breath.
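The 20-word guideline is easy to check while drafting. A minimal Python sketch, assuming words are simply whitespace-separated tokens (the helper names are hypothetical):

```python
MAX_WORDS = 20  # the notes' limit for a self-administered survey

def word_count(question):
    """Count whitespace-separated words in a question."""
    return len(question.split())

def too_wordy(question, limit=MAX_WORDS):
    """Return True when the question exceeds the word limit."""
    return word_count(question) > limit

welfare = ("Should welfare benefits be based on any relationship of economic "
           "dependency where people are living together, such as elderly "
           "siblings living together or a parent and adult child living "
           "together or should welfare benefits only be available to those "
           "who are single or married and/or have children under the age of "
           "18 years?")
nafta = ("Would you favor or oppose extending the North American Free Trade "
         "Agreement to include other countries?")

for q in (welfare, nafta):
    print(word_count(q), "words; too wordy:", too_wordy(q))
```

A word count only catches length. The NAFTA question passes this check but still has the problem described under Error #5.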


‘Do you agree that gay marriages should be legally recognized in Canada?’

Error #7: this is a leading question that encourages respondents to agree. The problem could be avoided by adding ‘or disagree’. Especially important to avoid in regard to sensitive topics.


‘Canada has an obligation to see that its less fortunate citizens are given a decent standard of living’.

Error #8: this question is leading because it uses emotionally-laden language e.g. ‘less fortunate’, ‘decent’. Can also be leading by identifying with prestigious person or institution like Supreme Court, or w/someone who is disliked.


How often have you read about politics in the newspaper during the last week?

Error #9: this question is susceptible to social desirability bias because it seems to assume that the respondent has read the newspaper at least once during the previous week. People answer through filter of what makes them look good. “Have you had time to read the newspaper in the last week?”


-don’t abbreviate

-no more than 1 question per line

-open-ended must have space to write

-clear instructions

-informed consent


A Checklist for Identifying Problems in the Pre-Test

  • Did close-ended questions elicit a range of opinion or did most respondents choose the same response category?
  • Do the responses tell you what you need to know?
  • Did most respondents choose ‘agree’ (the question was too bland -> should protect nature) or did most respondents choose ‘disagree’ (the question was too strongly worded -> abortion is murder)?
  • Did respondents have problems understanding a question? Were there a lot of don’t knows? (if they don’t get it, ask it again and move on)
  • Did several respondents refuse to answer the same question?
  • Did open-ended questions elicit too many irrelevant answers? (can you code responses)
  • Did open-ended questions produce yes/no or very brief responses? Add a probe. (best probe is silence, pen poised to record)
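The first and third checklist items (did most respondents choose the same category?) can be screened with a simple tally. The sketch below is in Python. The 80% cut-off is an assumption, since the notes give no figure, and the responses are invented.

```python
from collections import Counter

def flag_question(responses, threshold=0.8):
    """Flag a close-ended pre-test question whose answers pile up in one category.

    responses: list of coded answers, e.g. "agree" / "disagree" / "don't know".
    threshold: assumed cut-off share; the notes give no specific figure.
    Returns (most_common_answer, share, flagged).
    """
    tally = Counter(responses)
    answer, count = tally.most_common(1)[0]
    share = count / len(responses)
    return answer, share, share >= threshold

# Invented pre-test responses to two hypothetical questions.
bland = ["agree"] * 9 + ["disagree"]
balanced = ["agree"] * 5 + ["disagree"] * 4 + ["don't know"]

print(flag_question(bland))
print(flag_question(balanced))
```

A flagged question is only a prompt to look again at the wording. Whether it is too bland or too strongly worded remains a judgment call.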


Topic 9: Content Analysis



What is content analysis?

What can we analyze?

What questions can we answer?

Selecting the communications

Substantive content analysis

Substantive content analysis: coding manifest content

Substantive content analysis: coding latent content

Structural content analysis

Strengths of content analysis

Weaknesses of content analysis


What is content analysis?

-involves the analysis of any form of communication

-communications form the basis for drawing inferences about causal relations

-Content analysis is ‘any technique for making inferences by systematically and objectively identifying specified characteristics of communications’. (Holsti)

-Systematically means that content is included or excluded according to consistently applied criteria.

-Objectively requires that the identification be based on explicit rules. The categories used for coding content must be defined clearly enough and precisely enough that another researcher could apply them to the same content and obtain the same results
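One common way to check whether another researcher would obtain the same results is to have two coders code the same communications and then compare their codes. The Python sketch below computes simple percent agreement and Cohen's kappa. Kappa is not mentioned in the notes and is added here as an assumption about how agreement might be measured. The codes are invented.

```python
from collections import Counter

def percent_agreement(coder_a, coder_b):
    """Share of items that both coders placed in the same category."""
    matches = sum(a == b for a, b in zip(coder_a, coder_b))
    return matches / len(coder_a)

def cohens_kappa(coder_a, coder_b):
    """Agreement corrected for chance (undefined if chance agreement is 1)."""
    n = len(coder_a)
    observed = percent_agreement(coder_a, coder_b)
    freq_a, freq_b = Counter(coder_a), Counter(coder_b)
    expected = sum(freq_a[c] * freq_b[c] for c in freq_a) / (n * n)
    return (observed - expected) / (1 - expected)

# Hypothetical codes assigned by two coders to the same eight editorials.
coder_a = ["left", "left", "right", "right", "left", "right", "left", "right"]
coder_b = ["left", "left", "right", "left", "left", "right", "right", "right"]

print(percent_agreement(coder_a, coder_b))
print(cohens_kappa(coder_a, coder_b))
```

Here the coders agree on 6 of 8 items (0.75), but half of that agreement would be expected by chance, so kappa is 0.5. Low agreement suggests the category definitions are not yet clear or precise enough.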



What can we analyze?


Content analysis can be performed on virtually any form of communication (books, magazines, poems, songs, speeches, diplomatic exchanges, videos, paintings…) provided:

  • there is a physical record of the communication.
  • the researcher can obtain access to that record

A content analysis can focus on one or more of the following questions: ‘who says what, to whom, why, how, and with what effect?’ (Lasswell)

-who/why: inferences about sender of the communication, causes or antecedents. Why does it take the form that it does?

-with what effect: inferences about effects on person(s) who receives it

What questions can we answer?

Content analysis can be used to:

  • test hypotheses about the characteristics or attributes of the communications themselves (what? how?)
  • make inferences about the communicator and/or the causes or antecedents of the communication (who? why?)
  • make inferences about the effect of the communication on the recipient(s) (with what effect?)


Rules of Content analysis

i. specify rules for selecting communications that will be analyzed

ii. specify characteristics you will analyze (what aspects of content)

iii. formulate rules for identifying characteristics when they appear

iv. apply the coding scheme to the selected communications


Selecting the communications


The first step is to define the universe of communications to be analyzed by defining criteria for inclusion.


Typical criteria include:

  • the type of communication
  • the location, frequency, minimum size or length of the communication
  • the distribution of the communication
  • the time period
  • the parties to the communication (if communication is two-way or multi-way)


If too many communications meet the specified criteria, a sampling plan must be specified in order to make a representative selection.

-if study is comparative, must choose comparable communications. Control in content analysis is the way communications are chosen (as similar as possible except one thing).


Type of Analysis (substantive vs structural)

Substantive content analysis

-In a substantive content analysis, the focus is on the substantive content of the communication—what has been said or written.

-A substantive content analysis is essentially a coding operation.

-The researcher codes—or classifies—the content of the selected communications according to a pre-defined conceptual framework


  • coding newspaper editorials according to their ideological leaning
  • coding campaign coverage according to whether it deals with matters of style or substance


Substantive Content Analysis: Coding Manifest Content

-A substantive content analysis can involve coding manifest content and/or latent content

-Coding manifest content means coding the visible surface content i.e. the objectively identifiable characteristics of the communication

-list of words/phrases that are empirical counterparts to your concept (the hard part!)

-important to relate it to some sort of base -> longer means more likely to use particular words

-Example: choosing certain words or phrases as indicators of the values of key concepts and then simply counting how often those words or phrases occur within each communication.
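The word-counting approach can be sketched in Python as follows. This is only an illustration: the indicator words, the sample sentence and the function name `keyword_rate` are hypothetical. The point is that the raw count is divided by the length of the communication, so that longer texts do not score higher simply because they are longer.

```python
import re
from collections import Counter

def keyword_rate(text, indicators, per=1000):
    """Count indicator words and express the total per `per` words of text."""
    words = re.findall(r"[a-z']+", text.lower())
    counts = Counter(w for w in words if w in indicators)
    total = sum(counts.values())
    # Relate the count to a base (text length) so texts are comparable
    rate = per * total / len(words) if words else 0.0
    return counts, rate

# Hypothetical indicator list for a 'warfare' framing of election coverage
indicators = {"battle", "attack", "war", "fight"}
text = "The leaders fight a war of words as the battle for votes begins"

counts, rate = keyword_rate(text, indicators)
print(dict(counts), round(rate, 1))
```

A count like this says nothing about context (irony, sarcasm), which is the validity problem noted for manifest coding.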


Advantages:

  1. Ease
  2. Replicability
  3. Reliability (consistency)

Disadvantages:

  1. meaning depends on context
  2. loss of nuance and subtlety of meaning

-possible that word is being used in an unexpected way (irony, sarcasm)

-validity: are we really measuring what we think we’re measuring?


Substantive Content Analysis: Coding Latent Content

Coding latent content involves coding the underlying meaning. (tone of media, etc)



  • reading an entire newspaper editorial and making a judgment as to its overall ideological leaning.
  • reading an entire newspaper story and making a judgment as to whether the person covered is reflected in a positive, negative, or neutral light.

Advantages:

(1) less loss of meaning and thus higher validity.

Disadvantages:

(1) requires the researcher to make judgments and infer meaning, thus increasing risk of bias.

(2) lower reliability.-> differences in judgment

(3) lower transmissibility and hence replicability. -> cannot communicate to a reader exactly how judgement was made

-researcher is making judgments about meaning, which may be influenced by own values

Solution: take 1 hypothesis & test it different ways. More compelling, more experience w/ pros and cons of content analysis. Test hypothesis as many ways as possible.

-strive for high intercoder reliability (2 people recode independently, 90% similarity)
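A minimal sketch of checking intercoder reliability as simple percent agreement, assuming two coders have independently assigned one code to each item. The codes and the function name `percent_agreement` are hypothetical. Simple percent agreement does not correct for agreement that would occur by chance, so it is only a first check.

```python
def percent_agreement(coder_a, coder_b):
    """Percentage of items to which both coders assigned the same code."""
    if len(coder_a) != len(coder_b) or not coder_a:
        raise ValueError("both coders must code the same non-empty set of items")
    matches = sum(a == b for a, b in zip(coder_a, coder_b))
    return 100 * matches / len(coder_a)

# Hypothetical tone codes for ten newspaper stories
coder_a = ["pos", "neg", "neu", "pos", "neg", "pos", "neu", "neg", "pos", "neu"]
coder_b = ["pos", "neg", "neu", "pos", "neg", "pos", "neu", "neg", "neu", "neu"]

agreement = percent_agreement(coder_a, coder_b)
print(agreement)  # the coders differ on one story out of ten
```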

-use all 3 methods


Structural Content Analysis


A structural content analysis focuses on physical measurement of content.(time, space)



  • how much space does a newspaper accord a given issue (number of columns, number of paragraphs, etc.)?
  • how much prominence does a newspaper accord a given issue (size of headline, placement in the newspaper, presence of a photograph, etc.)?
  • how many minutes does a news broadcast give to stories about each political party?
  • Column inches, seconds of airtime, order of stories, pages, paragraphs, size of headline, photograph= measures of prominence


Measurements of space and time must always be related to the total size/length of the communication

-standardize: relative to size w/same paper, not compare headline size in 2 papers


Advantages:

  1. reliability
  2. replicability -easy to explain methods

Disadvantages:

  1. loss of nuance & subtlety of meaning

-less valid: can you really represent subtle nuanced ideas by counting/measuring?


Strengths of Content Analysis


-generalizability (external validity). Representative, more confidence.

-safety: risk of missing something, time, etc not existent here. You can recode.

-ability to study historical events or political actors: asking people means you get answers they think now, not what they thought then

-ability to study inaccessible political actors (supreme court justices)

-unobtrusive (non-reactive)

-reliability: highly reliable way of doing research, consistent results (structural, manifest)

-few ethical dilemmas. Communications already been produced, won’t harm or embarrass people.


Weaknesses of content analysis

-requires a physical record of communication

-need access to communications

-loss of meaning (low validity): are we measuring what we think we’re measuring?

-risky to infer motivations—political actors do not necessarily mean what they write or say. (Take into account purpose of communication if asking why)

-laborious and tedious

-subjective bias -> important elements of subjectivity (latent analysis: making judgements, inferences about meaning)

-> no one best way of doing content analysis. Do all 3.



Major Coding Categories

-warfare: a battle royal, political equivalent of heat seeking missiles, fighting a war on several fronts, a night of political skirmishes, took a torpedo in the boilers, master of the blindside attack

-general violence: a good old-fashioned free-for-all, one hell of a fight, assailants in the alley

-sports and games: contestants squared off, left on the mat, knockout blow

-theatre and showbiz: a dress rehearsal, got equal billing, put their figures in the spotlight

-natural phenomena: nothing earth-shattering, an avalanche of opinion



Coding Statements

-descriptive: present the who, what, where, when, without any meaningful qualification or elaboration

-analytical: draw inferences or reach conclusions (typically about the causes of the behaviour or event) based on fact not observed

-evaluative: make judgments about how well the person being reported on performed


Topic 10: Measurement



What is measurement?


Rules and levels of measurement


Nominal-level measurement


Ordinal-level measurement


Interval-level measurement


Ratio-level measurement


What is Measurement?

-foundation of statistics

Measurement is the process of assigning numerals to observations according to rules.


These numerals are referred to as the values of the variable we are measuring (not numbers but numerals: numerals are simply symbols or labels, whereas numbers have quantitative meaning).


Measurement can be qualitative or quantitative.


If we want to measure something, we have to make up a set of rules that specify how the numerals are to be assigned to our observations.





Rules and Levels of Measurement


-The rules determine the level, or quality, of measurement achieved. <- most important part of definition.

-The level of measurement determines what kinds of statistical tests can be performed on the resulting data.

-The level of measurement that can be achieved depends on:

  • the nature of the property being measured
  • the choice of data collection procedures

-The general rule is to aim for the highest possible level of measurement because higher levels of measurement enable us to perform more powerful and more varied tests.

-The rules can provide a basis for classifying, ordering or quantifying our observations.

-when the rules only classify, there is no hierarchical order: any numeral can be substituted for any other numeral. All the numerals indicate is that the categories are different.


4 Levels: NOIR

Nominal-level measurement

Ordinal-level measurement

Interval-level measurement

Ratio-level measurement


Nominal-level measurement

-Nominal-level measurement represents the lowest level of measurement, most primitive, least information

-Nominal measurement involves classifying a variable into two or more (predefined) categories and then sorting our observations into the appropriate category.

-The numerals simply serve to label the categories. They have no quantitative meaning. Words or symbols could perform the same function. There is no hierarchy among the categories and the categories cannot be related to one another numerically. The categories are interchangeable.


-Rule: do not assign the same numeral to different categories or different numerals to the same category. The categories must be exhaustive and mutually exclusive.

Ex) sex, religion, ethnic origin, language


Ordinal-Level Measurement

-Ordinal-level measurement involves classifying a variable into a set of ordered categories and then sorting our observations into the appropriate category according to whether they have more or less of the property being measured. Allows ordering and classifying. Notion of hierarchy.

-The categories stand in a hierarchical relationship to one another and the numerals serve to indicate the order of the categories. Numerals stand for relative amount of the property.

-classify, order

-more useful, direction of relation btwn variables

-With ordinal-level measurement, we can say only that one observation has more of the property than another. We can not say how much more.

Ex) social class, strength of party loyalty, interest in politics


Interval-Level Measurement

-Interval-level measurement involves classifying a variable into a set of ordered categories that have an equal interval (fixed and known interval) between them and then sorting our observations into the appropriate category according to how much of the property they possess.

-There is a fixed and known interval (or distance) between each category and the numerals have quantitative meaning. They indicate how much of the property each observation has (actual amount).

-Classify, order, meaningful distances.

-With interval-level measurement, we can say not only that one observation has more of the property than another, we can also say how much more.

-BUT we cannot say that one observation has twice as much of the property as another observation. Zero is arbitrary.

Ex) Celsius and Fahrenheit scales of temperature
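A short sketch of why "twice as much" is not meaningful at the interval level, using the temperature example. The specific temperatures are illustrative. Equal intervals stay equal when the scale is changed, but ratios do not, because each scale puts its zero in an arbitrary place.

```python
def c_to_f(celsius):
    """Convert degrees Celsius to degrees Fahrenheit."""
    return celsius * 9 / 5 + 32

a_c, b_c, c_c = 10.0, 20.0, 30.0
a_f, b_f, c_f = c_to_f(a_c), c_to_f(b_c), c_to_f(c_c)

# Intervals: equal steps in Celsius are still equal steps in Fahrenheit
print(b_c - a_c, c_c - b_c)   # 10.0 10.0
print(b_f - a_f, c_f - b_f)   # 18.0 18.0

# Ratios: 20 C looks like "twice" 10 C, but the same two temperatures
# in Fahrenheit give a different ratio, so the ratio has no real meaning
print(b_c / a_c)              # 2.0
print(b_f / a_f)              # 1.36
```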


Ratio-Level Measurement (highest)

-The only difference between ratio-level measurement and interval-level measurement is the presence of a non-arbitrary zero point.

-A non-arbitrary zero point means that zero indicates the absence of the property being measured.

-Now we can say that one observation has twice as much of the property as another observation.

-Any property that can be represented by counting can be measured at the ratio-level.

-classify, order, meaningful distance, non-arbitrary zero

Ex) income, years of schooling, gross national product, number of alliances, turnout to vote


-in poli sci, few things are above the ordinal level. Stretches credulity to believe that we could come up with equal units of collectivism or alienation.

-anything that can be measured at a higher lvl can be measured at a lower lvl

-always try to achieve highest lvl of measurement. Constrained by technique used to collect data.

Topic 11: Statistics: Describing Variables



Descriptive versus inferential statistics

Univariate, bivariate and multivariate statistics

Univariate descriptive statistics

Describing a distribution

Measuring central tendency

Measuring dispersion


Descriptive versus Inferential Statistics


Descriptive statistics are used to describe characteristics of a population or a sample.


Inferential statistics are used to generalize from a sample to the population from which the sample was drawn. They are called ‘inferential’ because they involve using a sample to make inferences about the population.


Univariate, Bivariate and Multivariate Statistics


Univariate statistics are used when we want to describe (descriptive) or make inferences about (inferential) the values of a single variable.


Bivariate statistics are used when we want to describe (descriptive) or make inferences about (inferential) the relationship between the values of two variables.


Multivariate statistics are used when we want to describe (descriptive) or make inferences about (inferential) the relationship among the values of three or more variables.

-can all be descriptive or inferential


Univariate Descriptive Statistics


Data analysis begins by describing three characteristics of each variable under study:

  • the distribution : how many cases take each value?
  • the central tendency: which is the most typical value? best represents a typical case
  • the dispersion: how much do values vary? how spread out are cases across the possible categories? If there is much dispersion, measure of central tendency may be misleading.


-frequency value tells us how many cases take each of the possible values. Records the frequency with which each possible value occurs.


Describing a Distribution I


Knowing how the observations are distributed across the various possible values of the variable is important because many statistical procedures make assumptions about the distribution. If those assumptions are not met, the procedure is not appropriate.


A frequency distribution is simply a list of the number of observations in each category of the variable. It is called a frequency distribution because it displays the frequency with which each possible value occurs.



Describing a distribution:

Raw frequencies (how many cases took each of the different possible values)

-title informative, tell us variable for which data is being presented. Not interpret table

-source: name source


-totals are difficult to compare, translate into %

-gives a relative idea of what to expect in the rest of the population

-gives a consistent base to make comparisons

-never report % w/out also reporting total # of cases in survey. Makes data meaningful.

– no % w/fewer than 20 cases: present raw frequency

-if data come from a sample, round off percentages to the nearest whole number, should assume that there is error.

-round up when the decimal is .6-.9, round down when it is .1-.4. With exactly .5, round to the nearest even number.

-99, 100, and 101% are acceptable totals. Can add note saying that numbers may not add up to 100.

-present in form of graph or chart. Contains exact same info, but easier to visualize. More interpretable, more appealing. Pie-chart, line graph.

-tricks: truncated scale to make things look better/worse. Always check the scaling.

-need to check distribution to make sure that its appropriate to use a particular statistic
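The reporting rules above can be sketched as a small frequency table builder. The survey responses and the function name `frequency_table` are hypothetical. Python's built-in `round` already rounds exact halves to the nearest even number, which matches the rule given here, and the sketch leaves the percentage empty when there are fewer than 20 cases.

```python
from collections import Counter

def frequency_table(values):
    """Return (value, raw frequency, whole-number percentage) rows and N."""
    counts = Counter(values)
    n = len(values)
    rows = []
    for value, freq in sorted(counts.items()):
        # No percentages with fewer than 20 cases: report raw frequencies only
        pct = round(100 * freq / n) if n >= 20 else None
        rows.append((value, freq, pct))
    return rows, n

# Hypothetical survey answers (N = 40)
answers = ["agree"] * 15 + ["disagree"] * 20 + ["unsure"] * 5
rows, n = frequency_table(answers)

for value, freq, pct in rows:
    print(f"{value:10s} {freq:3d} {pct}%")
print(f"N = {n}")  # always report the total number of cases
```

Here 37.5% becomes 38 and 12.5% becomes 12, so the column still totals 100.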


Interval/ratio: not simply numerals, but numbers w/quantitative meanings. Can’t use bar or pie chart. To present distribution, must collapse lvls of variables into small groups.

-guidelines:

  1. at least 6, but no more than 20 intervals. Lose too much info about the distribution if there are too few, but more than 20 defeats the purpose of creating class intervals & the data is not readily accessible.
  2. intervals must all have same width, encompass same # of values to be comparable (can have larger open-ended category at the end)
  3. don’t want them to be too wide. Want to be able to consider every case within a given interval to be similar, makes sense to treat cases within the interval as the same.
  4. must be exhaustive and mutually exclusive.


Describing a distribution: interval lvl data

-create a line graph.

-the only pts w. any info are the dots. Connect to remind reader that original distribution was continuous.





Central Tendency versus Dispersion


A measure of central tendency indicates the most typical value, the one value that best represents the entire distribution


A measure of dispersion tells us just how typical that value really is by indicating the extent to which observations are concentrated in a few categories of the variable or spread out among all of the categories.

-evaluating central tendency. Important for evaluating sample size. Don’t want to only describe variables (see if covary in predicted ways)

-2 distributions could have similar central tendency, but be very different. Use more than one measure.

A measure of dispersion tells us how much the values of the variable vary. Knowing the amount of dispersion is important because:

  • the appropriate sample size is highly dependent on the amount of variation in the population. The greater the variation, the larger the sample will need to be.
  • we cannot measure covariation unless both variables do vary.



Measuring Central Tendency and Dispersion (Nominal-Level)

The mode is the most frequently occurring value—the category of the variable that contains the greatest number of cases. The only operation required is counting.

The proportion of cases that do not fall in the modal category tells us just how typical the modal value is. This is what Manheim and Rich call the variation ratio.

-bimodal distribution: 2 are tied for most cases

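V = (f nonmodal)/N, where f nonmodal is the number of cases that do not fall in the modal category and N is the total number of cases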


-dispersion: what % of people were not in the modal category. The proportion who do not fall in the modal category tells us how typical the modal value is. Manheim and Rich call: variation ratio -> the lower the variation ratio, the more typical and meaningful the mode.

– in the case of bimodal or multimodal cases, select one mode arbitrarily.
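A minimal sketch of the mode and the variation ratio for nominal data, assuming the variation ratio is the proportion of cases outside the modal category. The data and the function name are hypothetical. When two categories are tied, `Counter.most_common` simply returns the one it encountered first, which amounts to the arbitrary choice described above.

```python
from collections import Counter

def mode_and_variation_ratio(values):
    """Return the modal category and the share of cases outside it."""
    counts = Counter(values)
    mode, modal_freq = counts.most_common(1)[0]
    variation_ratio = (len(values) - modal_freq) / len(values)
    return mode, variation_ratio

# Hypothetical religion variable for ten respondents
religion = ["Protestant"] * 5 + ["Catholic"] * 3 + ["Other"] * 2
mode, v = mode_and_variation_ratio(religion)
print(mode, v)  # half of the cases are not in the modal category
```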


Measuring Central Tendency and Dispersion (Ordinal-Level) I

Central Tendency:

-always present categories in order, natural order, should retain it

-central tendency based on order or relative position


The median is the value taken by the middle case in a distribution. It has the same number of cases above and below it. If even # of cases, take average of the two middle cases.

-cumulative frequency: eliminating raw frequency, tells # of cases that took that value or lower.



The range simply indicates the highest and lowest values taken by the cases. Problem: could overstate variability. Range doesn’t tell us anything about how things are distributed btwn points.

The inter-quartile range is the range of values taken by the middle 50 percent of cases. It is called inter-quartile because the endpoints are a quartile above and below the median value.
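The ordinal-level measures can be sketched with Python's `statistics` module. The coded responses are hypothetical (1 = lowest, 5 = highest interest in politics). Conventions for locating quartiles differ between textbooks and software; this sketch assumes the `inclusive` method of `statistics.quantiles`, which may not match the procedure in the course text exactly.

```python
from statistics import median, quantiles

# Hypothetical ordinal codes for nine respondents, already in order
interest = [1, 1, 2, 2, 3, 3, 4, 4, 5]

mid = median(interest)                       # value of the middle case
value_range = (min(interest), max(interest)) # lowest and highest values
q1, q2, q3 = quantiles(interest, n=4, method="inclusive")

print(mid)          # 3
print(value_range)  # (1, 5)
print(q1, q3)       # the middle 50 percent of cases lie between these values
```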

Measuring Central Tendency (Interval and Ratio-Level) I


The measure of central tendency for interval- and ratio-level data is the mean (or average value). Simply sum the values and divide by the number of cases:


Fall term grades: 70 75 78 82 85

GPA (or mean grade) = 78


-The mean is the preferred measure of central tendency because it takes into account the distance (or intervals) between cases. The fact that there are fixed and known intervals between values enables us to add and divide the values.

-The mean is sensitive to the presence of a small number of cases with extreme values:

  • Group #1: 26,000; 28,000; 29,000; 32,000; 32,000; 34,000; 36,000: mean = 31,000, median = 32,000
  • Group #2: 15,000; 18,000; 19,000; 22,000; 23,000; 25,000; 95,000: mean = 31,000, median = 22,000

When an interval-level distribution has a few cases with extreme values, the median should be used instead.

-Because the mean is subject to distortion, the mean value should always be presented along with the appropriate measure of dispersion.

-problematic when a few values are extreme cases. The mean takes account of how far each case is from the others.
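A quick check of these points in Python, using the grades and the second income group from the notes. Dropping the single extreme case (95,000) is done here only to show how strongly it pulls the mean compared with the median.

```python
from statistics import mean, median

grades = [70, 75, 78, 82, 85]
print(mean(grades))  # 78

incomes = [15000, 18000, 19000, 22000, 23000, 25000, 95000]
print(mean(incomes), median(incomes))      # mean 31000, median 22000

# Remove the one extreme case: the mean drops sharply, the median barely moves
trimmed = incomes[:-1]
print(round(mean(trimmed)), median(trimmed))
```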


Measuring Dispersion (Interval- and Ratio-level) II


The standard deviation is the appropriate measure of dispersion at the interval-level because it takes account of every value and the distance between values in determining the amount of variability.


The standard deviation will be zero if—and only if—each and every case has the same value as the mean. The more cases deviate from the mean, the larger the standard deviation will be.


We cannot use the standard deviation to compare the amount of dispersion in two distributions that use different units of measurement (e.g. dollars and years) because the standard deviation will reflect both the dispersion and the units of measurement.


S = square root of [ Σ (Xi – X)² / N ], where N = the number of cases, Xi = the value of each individual case and X = the mean (see page 264).
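A minimal sketch of the standard deviation calculation, applied to the fall term grades used earlier. It assumes the form that divides the sum of squared deviations by N; some texts divide by N - 1 when working with sample data, so check this against page 264 before relying on it.

```python
from math import sqrt

def std_dev(values):
    """Square root of the average squared deviation from the mean (divides by N)."""
    n = len(values)
    x_bar = sum(values) / n
    return sqrt(sum((x - x_bar) ** 2 for x in values) / n)

grades = [70, 75, 78, 82, 85]   # mean = 78
s = std_dev(grades)
print(round(s, 2))              # squared deviations 64, 9, 0, 16, 49 sum to 138

# Zero only when every case has the same value as the mean
print(std_dev([78, 78, 78]))
```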


Calculating Standardized Scores or Z-Values


-If we want to compare the relative position of two cases on the same variable or the relative values of the same case on two different variables like annual income and years of schooling, we can standardize the values by converting them into Z-scores.


The Z score allows us to compare scores that are based on very different units of measurement (for example, age measured in number of years and height measured in inches).

-Z-scores tell us the exact number of standard deviation units any particular case lies above or below the mean:


Zi =  (Xi  – X)/S


where Xi is the value for each case, X is the mean value and S is the standard deviation.


Example: person1 has an annual income of $80,000 and person2 has an annual income of $30,000. The mean annual income in their community is $50,000 and the standard deviation is $20,000


Z1 = ($80,000 – $50,000)/$20,000 =  1.5

Z2 = ($30,000 – $50,000)/$20,000 =  – 1
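The same calculation as a small Python function, using the income figures from the example above. The function name `z_score` is just a label chosen for this sketch.

```python
def z_score(x, mean, sd):
    """Number of standard deviation units that x lies above (+) or below (-) the mean."""
    return (x - mean) / sd

z1 = z_score(80_000, 50_000, 20_000)
z2 = z_score(30_000, 50_000, 20_000)
print(z1, z2)  # 1.5 -1.0
```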

Topic Twelve: Statistics — Estimating Sampling Error and Sample Size


What is sampling error?

What are probability distributions?

Interpreting normal distributions

What is a sampling distribution?

The sampling distribution of the sample means

The central limit theorem

Estimating confidence intervals around a sample mean

Estimating sample size—means

Estimating confidence intervals around a sample proportion

Estimating sample size–proportions


What is sampling error?


No matter how carefully a sample is selected, there is always the possibility of sampling error (i.e. some discrepancy between our sample value and the true population value).


We cannot determine the amount of sampling error directly because we typically don’t know the true population value. But we can use inferential statistics to estimate the probable sampling error associated with any sample value. Use of probability distributions.


What are probability distributions?


Estimating sampling error involves using probability distributions.


Probability distributions are theoretical distributions that indicate the likelihood, or the probability, of certain values occurring, given certain assumptions about the nature of the distribution.


By far the most important class of probability distributions takes the form of a normal distribution.


The normal distribution takes the form of a symmetrical bell-shaped curve. The mean, median and mode of normally distributed data coincide with the highest point of the curve (have the same value). Can use standard deviation to interpret distribution.


Interpreting normal distributions I


The standard deviation is used to interpret data that are normally distributed.


IF data are normally distributed, 68.3% of the cases will fall within one standard deviation of the mean of the distribution, 95.5% of the cases will fall within 2 standard deviations of the mean, and 99.7% of the cases will fall within 3 standard deviations of the mean.

These proportions are equal to the proportion of the area under the curve between these values.

Interpreting normal distributions II


We can determine the proportion of cases falling within any number of standard deviations, integer or non-integer, from the mean e.g. 83.8% of cases will fall within 1.4 standard deviations of the mean


Since we use standard deviation units and not simply the original values to interpret the normal distribution, we transform the original values into standard deviation units or Z-scores.


Z= (xi – X)/s


Z-scores tell us the exact number of standard deviation units any particular case lies above or below the mean.


If our data are normally distributed, all we have to do to estimate the probability of any range of values occurring around the mean is to convert the data into Z-scores and consult the appropriate table.
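Instead of a printed table, the area under the normal curve can be looked up with `statistics.NormalDist` from the Python standard library. This sketch reproduces the proportions quoted above; the function name `share_within` is only a label for the example. The exact figure for 2 standard deviations is about 95.45%, which the notes round to 95.5%.

```python
from statistics import NormalDist

def share_within(z):
    """Proportion of a normal distribution within z standard deviations of the mean."""
    std_normal = NormalDist()  # mean 0, standard deviation 1
    return std_normal.cdf(z) - std_normal.cdf(-z)

for z in (1, 1.4, 2, 3):
    print(z, f"{100 * share_within(z):.2f}%")
```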


What is a sampling distribution?


The sampling distribution is a theoretical probability distribution that in actual practice would never be calculated.


The sampling distribution of the sample means is the distribution that we would obtain if:

  • every conceivable sample of a certain size were drawn from the same population
  • the sample means were calculated for each sample and
  • the sample means were arranged in a frequency distribution.


Different cases would be included in different samples so the sample means would not all be identical (e.g. some samples would contain only the very rich and some samples would contain only the desperately poor). But:

  • most sample means would tend to cluster around the true population mean value and
  • this clustering around the true mean value would increase if the sample size were increased

The sampling distribution of the sample means


IF the sample size is sufficiently large (at least 30 cases), the sampling distribution of the sample means will be approximately normally distributed and the mean of the sampling distribution of the sample means will coincide with the true population mean.

-can make use of the fact that it is normally distributed, and we can use that to estimate placement of the mean.

-the standard error of the mean is equal to the standard deviation of the population, divided by the square root of the sample size


The Central Limit Theorem


The sampling distribution is a theoretical distribution–in real life, we select only one sample. But the fact that sample means will be normally distributed enables us to evaluate the probable accuracy of our particular sample mean.


Provided that our sample (1) is randomly selected (every case has a known probability of inclusion and a non-zero probability of inclusion) and (2) has at least 30 cases, the central limit theorem tells us that we can use our knowledge of the area under the curve to estimate how probable it is that the true population mean will fall within any given range of values of our sample mean.


e.g. since we know that 95.5% of sample means will lie within 2 standard deviation units of the true population mean, we can be 95.5% confident that our sample mean will also lie within 2 standard deviations of the true population mean.
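Although the sampling distribution is never calculated in practice, it can be imitated by simulation. The sketch below is an illustration under assumptions chosen for the example: a deliberately skewed (exponential) population with mean 1 and standard deviation 1, 5,000 samples of 30 cases each, and a fixed random seed. The sample means should cluster around the population mean with a spread close to the population standard deviation divided by the square root of the sample size.

```python
import random
from math import sqrt
from statistics import pstdev

random.seed(1)          # fixed seed so the run is repeatable
SAMPLE_SIZE = 30        # the "at least 30 cases" rule of thumb
N_SAMPLES = 5000        # number of simulated samples

def sample_mean():
    """Draw one sample from a skewed population (mean 1, sd 1) and return its mean."""
    draws = [random.expovariate(1.0) for _ in range(SAMPLE_SIZE)]
    return sum(draws) / SAMPLE_SIZE

sample_means = [sample_mean() for _ in range(N_SAMPLES)]

mean_of_means = sum(sample_means) / N_SAMPLES
sd_of_means = pstdev(sample_means)
expected_se = 1.0 / sqrt(SAMPLE_SIZE)

print(round(mean_of_means, 3))                       # close to the population mean of 1
print(round(sd_of_means, 3), round(expected_se, 3))  # close to each other
```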


Estimating confidence Intervals around a Sample Mean I


Conventionally, we want to be 90% confident, 95% confident or 99% confident. The corresponding Z-values are 1.64, 1.96 and 2.57


i.e. we can be 90% confident that our sample mean will lie within 1.64 standard deviations of the population mean, 95% confident that it will lie within 1.96 standard deviations, and 99% confident that it will lie within 2.57 standard deviations. These ranges of values are called confidence intervals.


A confidence interval is a range of values, estimated on the basis of sample data, within which we can say, with a pre-specified degree of confidence that the true population value will lie.

-the higher the confidence lvl, the wider the confidence interval must become.


Confidence level: the likelihood that our sample is in fact representative of the larger population within the degree of accuracy we have specified.


The lower the percentage of sampling error and the greater the level of confidence, the better a piece of research will be.


The size of the confidence interval will depend on how confident we want to be that the interval does contain the true unknown population mean. The more confident we want to be, the wider the confidence interval will have to be.


Estimating confidence intervals around a sample mean II


In order to determine what 1.96 standard deviations actually means in terms of our original measurement scale (e.g. dollars, years), we need to estimate the value of the standard deviation of the sampling distribution of the sample means.


The standard error of the mean is equal to the standard deviation of the population, divided by the square root of the sample size.

This makes sense intuitively:

  • the more variability there is in the population, the more variability there will be in the sample estimates.
  • as the sample size increases, the variability in the sample estimates should decrease because extreme values will have less of a distorting effect on the calculation of the sample mean.

Since we typically do not know the true population standard deviation, we use our best estimate i.e. the standard deviation from our particular sample.


Estimating confidence intervals around a sample mean III


We then simply multiply our estimate of the standard error of the mean by the Z-value associated with our chosen confidence level (1.64, 1.96 or 2.57) and we have the familiar plus or minus term:

Confidence interval: X ± Zc.l. × SX, where SX is the estimated standard error of the mean

-can be confident that something lies btwn 2 levels.
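The steps for a confidence interval around a sample mean can be sketched as follows. The Z-values are the ones given in these notes; the sample figures (mean income of 50,000, standard deviation of 20,000, 400 cases) are hypothetical and chosen only to keep the arithmetic easy to follow.

```python
from math import sqrt

# Z-values for the conventional confidence levels, as given in the notes
Z_VALUES = {90: 1.64, 95: 1.96, 99: 2.57}

def confidence_interval(sample_mean, sample_sd, n, level=95):
    """Return (lower, upper) bounds: sample mean plus or minus Z times the standard error."""
    standard_error = sample_sd / sqrt(n)   # sample sd stands in for the population sd
    margin = Z_VALUES[level] * standard_error
    return sample_mean - margin, sample_mean + margin

low, high = confidence_interval(50_000, 20_000, 400, level=95)
print(round(low), round(high))

# A higher confidence level gives a wider interval
low99, high99 = confidence_interval(50_000, 20_000, 400, level=99)
print(round(low99), round(high99))
```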


Estimating sample size I


Exactly the same concepts are used to help determine sample size. The formula for calculating the sample size simply involves rearranging the terms:


E = (Zc.l. × S)/square root N, which rearranges to N = [(Zc.l. × S)/E] squared

  • Zc.l. is the Z-value associated with the desired confidence level
  • S is the estimate of the population standard deviation
  • E is the amount of error we are willing to tolerate (i.e. the plus or minus term)

-variability, how accurate you want to be, how confident you want to be that you are that accurate.

-what is not a factor in this calculation? Population size. What matters is how much variation there is.

-calculation of sample size: constrained by resources, by variability w/in population
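A sketch of the rearranged formula, solving for N. The inputs are illustrative (95% confidence, an estimated standard deviation of 12, and two different tolerances). The result is rounded up because a sample cannot include a fraction of a case. Population size does not appear anywhere in the calculation.

```python
from math import ceil

def sample_size_for_mean(z, s, e):
    """Cases needed to estimate a mean within plus or minus e, given Z-value z and sd s."""
    return ceil((z * s / e) ** 2)

# Illustrative inputs: 95% confidence (Z = 1.96), estimated sd of 12
n_loose = sample_size_for_mean(1.96, 12, 3)     # tolerate an error of 3 points
n_tight = sample_size_for_mean(1.96, 12, 1.5)   # tolerate only 1.5 points
print(n_loose, n_tight)
```

Halving the tolerated error roughly quadruples the required sample, because E is squared in the formula.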


Estimating sample size II


In other words, we need 3 pieces of information in order to calculate sample size:


  • the amount of variability or heterogeneity in the population on the characteristic that we want to estimate. We typically do not know this, so we have to use our best estimate based on e.g. prior studies or a pilot study
  • the amount of error we are willing to tolerate i.e. how wide do we want our confidence interval to be?
  • the confidence level–how confident do we want to be that our sample estimate is that accurate?

The population size does not affect the sample size unless the sample is going to constitute 5 percent or more of the population


Example: to estimate mean GPA within ± 2 points with a 95% level of confidence and an estimated population standard deviation of 12 points: N = [(1.96 × 12)/2] squared = 11.76 squared ≈ 138.3, so a sample of at least 139 cases is needed.


Estimating confidence intervals around a sample proportion


The logic is exactly the same when we want to estimate a population proportion on the basis of a sample proportion.


This time we draw on our knowledge of the fact that the sampling distribution of the sample proportions will be normally distributed and we have to calculate the standard error of the proportion.

(see 12.15)


If we have no basis for estimating the sample proportion, we should use the value that assumes the maximum amount of variability.

The maximum possible value for the standard error of the sample proportion occurs when we assume a population proportion of .5
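The formula referred to above is not reproduced in these notes, so the sketch below assumes the standard textbook form for the standard error of a proportion, the square root of p(1 - p)/N. It shows that the standard error is largest at p = .5 and, using illustrative inputs (95% confidence, a tolerance of 3 percentage points), how the worst-case sample size follows from it.

```python
from math import ceil, sqrt

def se_proportion(p, n):
    """Standard error of a sample proportion (assumed standard formula)."""
    return sqrt(p * (1 - p) / n)

def sample_size_for_proportion(z, e, p=0.5):
    """Cases needed to estimate a proportion within plus or minus e; p = .5 is the worst case."""
    return ceil(z ** 2 * p * (1 - p) / e ** 2)

# The standard error peaks when the population proportion is .5
for p in (0.1, 0.3, 0.5, 0.7, 0.9):
    print(p, round(se_proportion(p, 100), 4))

# Illustrative: 95% confidence, plus or minus 3 percentage points
n_needed = sample_size_for_proportion(1.96, 0.03)
print(n_needed)
```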

TOPIC 13: Causal Thinking and Research Design



Why is research design so important?

The nature of causal inferences

The classic experimental design

Internal validity

Extrinsic threats to internal validity

Intrinsic threat to internal validity

Threats to external validity

Variations on the classic experimental design

Quasi-experimental designs


-generalize causal inferences

-determine causal connection

-> our ability to do this hinges on how we design our research. Need to be able to rule out plausible alternative causal interpretations.

Why is research design so important?


Purpose: to impose controlled restrictions on our observations of the empirical world.


A good research design:

  • allows the researcher to draw causal inferences with confidence
  • defines the domain of generalizability of those inferences

The way we structure our data-gathering strongly affects the nature of the causal interpretations we can place on the results.


The research must be designed so that we can rule out plausible alternative interpretations of the observed relationships.


The nature of causal inferences


We can never be certain that one variable ‘causes’ another–but we can increase confidence in our causal inferences if we are able to:

-demonstrate co-variation

-eliminate sources of spuriousness

-establish time order

Covariation: show that the IV and DV vary together in a patterned, consistent way (if A, then B)


Non-spuriousness: rule out the possibility that the IV and DV only co-vary because they share a common cause


Time order: show that a change in the IV preceded a change in the DV


How can we get more confident?

-don’t say what causes what. Assume there is some sort of causal influence involved

-change in value of 1 variable enhanced another’s change in value

-fundamental problem of causal inference: always has one causal influence. Demonstrate covariation, demonstrate non-spuriousness, time order.

-demonstrating covariation is at the heart of hypothesis testing. Time order: demonstrate that the change in the IV occurred before the change in the DV. Cause before effect.

-causal interpretation cannot come from the data itself. However, we can design the research so that some outcomes are impossible and/or use statistical methods to analyze the data & rule out possibilities ex post facto. We can only do this if we thought of them at the research design stage.


The classic experimental design I


The classic experimental design consists of two groups: an experimental group and a control group.


These two groups are equivalent in every respect, except that the experimental group is exposed to the IV and the control group is not.


To assess the effect of differential exposure to the IV, the researcher measures the values of the DV in both groups, before and after the experimental group is exposed to the IV.


The first set of measurements is called the pre-test and the second set of measurements is called the post-test.


If the difference between the pre-test and post-test is larger in the experimental group, this is inferred to be the result of exposure to the IV.
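
The comparison described above can be sketched in a few lines. The scores are hypothetical and the function names are illustrative; the point is only that the inferred effect is the change in the experimental group minus the change in the control group.

```python
from statistics import mean

def mean_change(pre_scores, post_scores):
    """Difference between the post-test mean and the pre-test mean."""
    return mean(post_scores) - mean(pre_scores)

def estimated_effect(exp_pre, exp_post, ctl_pre, ctl_post):
    """Change in the experimental group over and above the change
    observed in the control group."""
    return mean_change(exp_pre, exp_post) - mean_change(ctl_pre, ctl_post)

# Hypothetical DV scores (not taken from the notes)
exp_pre, exp_post = [50, 52, 48, 50], [58, 60, 56, 58]   # exposed to the IV
ctl_pre, ctl_post = [49, 51, 50, 50], [51, 53, 52, 52]   # not exposed

print(estimated_effect(exp_pre, exp_post, ctl_pre, ctl_post))  # 6
```

The control group also changed by 2 points here, so attributing the full 8-point change in the experimental group to the IV would overstate its effect.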





[Diagram: timeline of the classic experimental design. Time 1, Time 2 (exposure to IV), Time 3]




Why is the classic experimental design so powerful?


The classic experimental design has 3 essential components that enable us to meet the 3 requirements for demonstrating causality:

Comparison -> covariation

Manipulation -> time order

Control -> non-spuriousness


-able to study the impact of the IV free of all other confounding influences

-unfortunately, much of what we study is not amenable to this design

-even in non-experimental research, we try to mimic this design.



Internal Validity

-absolute basic requirement of a research design

A research design has internal validity when it enables us to infer with reasonable confidence that the IV does indeed have a causal influence on the DV. It must enable us to rule out plausible alternative causal relations.


To demonstrate internal validity, our research design must enable us to rule out other plausible causal interpretations of the observed co-variation between the IV and DV.


The factors that threaten internal validity can be classified into those that are extrinsic to the actual research and those that are intrinsic.


Extrinsic threats to internal validity


Extrinsic threats to internal validity typically arise from the way we select our cases.


They refer to selection biases that cause the experimental group and the control group to differ even before the experimental group is exposed to the IV.


If the two groups are not equivalent, then a possible explanation for any difference in the post-test results is that the two groups differed to begin with.


Intrinsic threats to internal validity I


Intrinsic threats to internal validity arise once the study is under way from:

-changes in the cases being studied during the study period (history)

-flaws in the measurement procedure

-the reactive effects of being observed


There are six major intrinsic threats:

History: events may occur while the study is under way which affect values on the DV quite independently of exposure to the IV. The longer the study, the greater this threat.

Maturation: physiological and/or psychological processes may affect values on the DV quite independently of exposure to the IV.

Mortality: selective dropping out from the study may cause the experimental group and the control group to differ on the post-test, quite independently of exposure to the IV.

Instrumentation: if our measuring instruments do not perform consistently, this unreliability may explain why cases differ before and after exposure to the IV.

The regression effect: if cases score atypically high or atypically low when they are pre-tested, it is likely that their scores will appear more typical when they are post-tested, quite apart from exposure to the IV.

Reactivity (‘test effect’): the very fact of being pre-tested may cause people’s values to change, quite apart from exposure to the IV.
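
The regression effect is the least intuitive of these threats, so a small simulation may help. This is a sketch on simulated data, assuming each observed score is a stable "true" score plus random measurement noise; all of the numbers and the function name are hypothetical.

```python
import random
from statistics import mean

def simulate_regression_effect(n_cases=10000, seed=0):
    """Pre-test and post-test means for the cases that scored in the
    top 10% on the pre-test. Nobody is exposed to any IV."""
    rng = random.Random(seed)
    true_scores = [rng.gauss(50, 10) for _ in range(n_cases)]
    # Observed score = true score + independent measurement noise
    pre = [t + rng.gauss(0, 10) for t in true_scores]
    post = [t + rng.gauss(0, 10) for t in true_scores]
    cutoff = sorted(pre)[int(0.9 * n_cases)]
    high = [i for i in range(n_cases) if pre[i] >= cutoff]
    return mean(pre[i] for i in high), mean(post[i] for i in high)

pre_mean, post_mean = simulate_regression_effect()
print(round(pre_mean, 1), round(post_mean, 1))
```

The cases selected for their atypically high pre-test scores score closer to the overall average on the post-test, even though nothing happened to them in between.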


Countering extrinsic threats to internal validity I


Extrinsic threats (selection biases that might cause the groups to differ before exposure) are countered by ensuring that the experimental group and the control group are equivalent. There are 3 ways of ensuring equivalence:

Precision matching (also known as ‘pairwise matching’)—each case in the experimental group is literally matched with another case in the control group which has an identical combination of characteristics.


This method can be impractical because of the difficulty of finding matched pairs of cases.

Countering extrinsic threats to internal validity II


Frequency distribution matching—instead of matching cases on combinations of characteristics, the distribution of characteristics within each group is matched (i.e. the two groups should have the same proportion of men and women, the same average income level, the same ethno-linguistic composition, etc.)

This method is easier to achieve because we do not have to reject a lot of potential cases, but:

  • the effects of any one characteristic may be conditioned by the presence of other characteristics e.g. the effects of age may differ for men and women.
  • we can only match social background characteristics, but people who share the same social characteristic may differ in other ways.
  • we can never be confident that we have matched on all relevant characteristics.


Countering extrinsic threats to internal validity III

Randomization: cases are assigned to the experimental group and the control group in such a way that each case has an equal probability of being assigned to either group i.e. selection is left entirely to chance (e.g. using a table of random numbers or flipping a coin).


If the randomization is done properly, the two groups should be equivalent.


This method controls for numerous factors simultaneously without the researcher having to make decisions about which factors might have a confounding effect.


BUT randomization requires a large number of cases in order to work effectively.
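
A minimal sketch of random assignment, using a shuffled list in place of a table of random numbers. The function name and the case identifiers are hypothetical.

```python
import random

def randomly_assign(case_ids, seed=None):
    """Split cases into an experimental group and a control group,
    leaving the assignment entirely to chance."""
    rng = random.Random(seed)
    shuffled = list(case_ids)
    rng.shuffle(shuffled)
    half = len(shuffled) // 2
    return shuffled[:half], shuffled[half:]

experimental, control = randomly_assign(range(1, 101), seed=42)
print(len(experimental), len(control))  # 50 50
```

Passing a seed only makes the assignment reproducible for checking; it does not change the fact that the researcher makes no decision about who goes where.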


Countering intrinsic threats to internal validity


The presence of a control group that is equivalent in every respect to the experimental group except that it is not exposed to the IV counters the intrinsic threats to internal validity:

History: both groups are exposed to the same events, so any difference in their post-test values must reflect differential exposure to the IV.

Maturation: both groups undergo the same maturational processes.

Mortality: selective dropping out will affect both groups equally.

Instrumentation: both groups will be equally affected by random errors in measurement.

Regression effect: both groups will be equally susceptible.

Reactivity: if the pre-test does affect values on the post-test, this will be true of both groups.


-any difference must be because of the IV because everything else has been controlled for

-unambiguous basis for knowing that change in the IV occurred before change in the DV in time

-very strong internal validity, strong basis for inferring causal relations

-problem: the causal relations may only apply to the cases that were studied -> weak external validity = weak basis for generalizing


Threats to external validity


External validity concerns the extent to which the research findings can be generalized beyond the particular cases that were studied.


There are 3 threats to external validity:

-unrepresentative cases (people who volunteer are not representative)

-the artificiality of the research setting (people do not react the same way in the real world)

-reactivity—the pre-test may sensitize participants to respond atypically to the IV


The classic design is strong on internal validity and weak on external validity.


The Solomon 3-control group design

(also known as the Solomon 4-group design)


-This design has stronger external validity because it enables the researcher to assess the reactive effects of the pre-test experience.

-This design is similar to the classic experimental design but it adds two more control groups. One group is exposed to the IV, but the other group is not. Neither group is pre-tested, but both groups are post-tested.
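
One way to see what the two extra groups buy us is to compare post-test results across all four groups. The sketch below uses hypothetical post-test scores and illustrative group labels; it simply contrasts the apparent effect of the IV among pre-tested cases with the effect among cases that were never pre-tested.

```python
from statistics import mean

def solomon_summary(post_scores):
    """post_scores maps each of the four groups to its post-test scores.

    Returns the IV effect among pre-tested groups, the IV effect among
    groups that were not pre-tested, and the gap between the two.
    """
    m = {group: mean(scores) for group, scores in post_scores.items()}
    effect_pretested = m["pretest_iv"] - m["pretest_no_iv"]
    effect_unpretested = m["no_pretest_iv"] - m["no_pretest_no_iv"]
    return effect_pretested, effect_unpretested, effect_pretested - effect_unpretested

# Hypothetical post-test scores for the four groups
scores = {
    "pretest_iv": [60, 62, 58],        # classic experimental group
    "pretest_no_iv": [52, 50, 54],     # classic control group
    "no_pretest_iv": [56, 57, 55],     # extra group exposed to the IV
    "no_pretest_no_iv": [51, 50, 52],  # extra group not exposed
}
print(solomon_summary(scores))  # (8, 5, 3)
```

In this made-up example the IV appears to have a larger effect on people who were pre-tested (8 points) than on those who were not (5 points), which would suggest that the pre-test sensitized participants.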


The post-test only control group design

The Solomon 3-control group design is stronger on external validity but:

  • often impractical
  • too costly


Another solution is to omit the pre-test altogether. This is only possible if we are very confident that the experimental group and the control group are really equivalent.

-this avoids the problem of testing (reactivity), but the problems of unrepresentativeness & artificiality remain.

-in practice we cannot maximize both internal & external validity. The more generalizability, the less internal validity.

-which matters most? Internal validity: it gives an unequivocal basis for making causal inferences. However, we typically study things as they already are; we can’t manipulate countries, education, etc. in experiments. We are studying people already exposed to the IV, so we must use designs that are weaker in internal validity.


Quasi-experimental designs I


Experimental designs provide the most unequivocal basis for inferring causal relationships—but political phenomena are typically not amenable to experimental manipulation.


Quasi-experimental designs attempt to use the logic of the experimental design in situations where the researcher cannot randomly assign observations to experimental and control groups or control exposure to the IV.


In this design, comparison and control are achieved statistically. Multivariate statistical analysis is the most common alternative to experimental methods of control.


Quasi-experimental designs II


-The ex post facto experiment is the most common type of quasi-experimental design. It attempts to approximate the post-test only control group design by using multivariate statistical methods, i.e. it tries to apply the logic of experimental design after the data have been collected. Cross-tabulations are used to compare groups in order to demonstrate covariation.


-The researcher collects data on the IV, the DV and any other variables that might plausibly alter or even eliminate any observed covariation between the IV and the DV.


-At the analysis stage, cases are assigned to groups depending on their values on the IV. Then the researcher compares each group’s values on the DV. Any difference is inferred to be the result of the fact that the groups differ on the IV.

-To demonstrate non-spuriousness, the cases are divided into groups based on their values on the variable that is a plausible source of spuriousness, and the researcher compares values on the IV and the DV (as above) within each group. If the IV and DV continue to covary within each group, the relationship is not spurious.

-when we examine categories, we are matching, so the same drawback applies: researchers must decide what the relevant variables and possible sources of spuriousness (SS) are.

-taking liberties w/notion of control, try to mimic logic of the control group

-demonstrate non-spuriousness, correlation, but can’t demonstrate time-order.
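
The test for non-spuriousness described above can be sketched as a comparison of percentages within each category of the control variable. The counts below are hypothetical and deliberately constructed so that the relationship is entirely spurious; the names `CASES` and `pct_dv` are illustrative.

```python
# Hypothetical counts: (control_value, iv_value, dv_value) -> number of cases
CASES = {
    ("low", 0, 1): 16, ("low", 0, 0): 64,
    ("low", 1, 1): 4,  ("low", 1, 0): 16,
    ("high", 0, 1): 16, ("high", 0, 0): 4,
    ("high", 1, 1): 64, ("high", 1, 0): 16,
}

def pct_dv(cases, iv_value, control_value=None):
    """Percent of cases with DV = 1 among those with the given IV value,
    optionally restricted to one category of the control variable."""
    selected = {key: n for key, n in cases.items()
                if key[1] == iv_value
                and (control_value is None or key[0] == control_value)}
    total = sum(selected.values())
    with_dv = sum(n for key, n in selected.items() if key[2] == 1)
    return 100 * with_dv / total

# Without the control: the IV and DV appear to covary (32% vs 68%)
print(pct_dv(CASES, 0), pct_dv(CASES, 1))

# Within each category of the control variable the difference vanishes
for level in ("low", "high"):
    print(level, pct_dv(CASES, 0, level), pct_dv(CASES, 1, level))
```

Here the IV and DV covary only because both depend on the control variable. Had the percentage difference persisted within each category, the relationship would have passed this particular test, though only for the sources of spuriousness the researcher thought to measure.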

Topic Fourteen: Statistics — Cross-Tabulations and Statistical Significance



Demonstrating covariation

Creating a cross-tabulation (nominal-level relationship)

Interpreting a cross-tabulation

Statistical significance

Type I versus Type II error

Estimating the probability of Type I error

The Logic of the Chi Square Test

Calculating Chi Square

Using and Abusing the Chi Square Test


Demonstrating Covariation

Demonstrating covariation involves answering 3 questions:


  • Degree: how strong is the relationship between the IV and the DV? Strength of association. Descriptive statistics.


  • Form: which values of the DV are associated with which values of the IV? Positive or negative relationship. Descriptive statistics.


  • Statistical significance: if the data are taken from a sample, can the relationship be generalized to the population from which the sample was drawn? Could we have obtained this relationship if there wasn’t one in the population? Inferential statistics.


The tests that are used to answer these questions will depend on the level of measurement of the IV and the DV. The higher the level of measurement, the more varied and the more powerful the tests that can be used.

-cases can be affected by frequency distribution.


Creating a Cross-Tabulation (nominal-level relationship) I


The first step in describing the relationship between two variables is to arrange the data so that we can get an initial visual impression of the relationship.


If both variables are measured at the nominal level, this involves arranging the data in the form of a contingency table or cross-tabulation.


-A cross-tabulation involves classifying cases according to their values on the IV and then cross-classifying them according to their values on the DV. The cells of the table display the number of cases having each possible combination of values on the IV and the DV.

-eliminate irrelevant categories (missing data). Eliminate categories that are useless for meaningful analysis (numbers too small). Can only do this with nominal-level variables.

-The single most common error in constructing a cross-tabulation is to percentage the wrong way.

-The cell percentages must be calculated in terms of the total number of cases in each category of the independent variable. If we are testing the hypothesis that women are less likely to vote for new right parties than men, we have to compare the % of women who voted Alliance with the % of men who voted Alliance.

-A cross-tabulation is interpreted by comparing categories of the independent variable in terms of the percentage distribution of the dependent variable i.e. we compare the % of women who voted Alliance with the % of men who voted Alliance.

-If the independent variable forms the columns of the table, the percentages are calculated by column and then the columns are compared i.e. percentage down and compare across columns.

-the totals in each column/row are the marginal frequencies: literally, on the margin of the table.

-reasons to percentage the table: 1. If we don’t have equal numbers of cases in the different IV categories, it is difficult to compare cell frequencies. 2. Even if they are equal, figures out of 100 are easier to read.

-don’t report decimals in the %s, to avoid a false sense of precision
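
The rule "percentage down, compare across" can be sketched as follows, assuming the IV forms the columns. The counts are hypothetical and the function name is illustrative.

```python
def column_percentages(table):
    """Convert cell frequencies to whole-number column percentages.

    Each row of `table` is a category of the DV; each column is a
    category of the IV, so every column sums to roughly 100.
    """
    col_totals = [sum(col) for col in zip(*table)]
    return [[round(100 * cell / col_totals[j]) for j, cell in enumerate(row)]
            for row in table]

# Hypothetical counts. Columns: women, men. Rows: voted Alliance, voted other.
counts = [[20, 30],
          [60, 30]]
print(column_percentages(counts))  # [[25, 50], [75, 50]]
```

Reading across the top row, 25% of women against 50% of men voted Alliance in this made-up table. Comparing the raw cell frequencies (20 against 30) would understate the gap because the two columns contain different numbers of cases.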


Interpreting a Cross-Tabulation (nominal-level relationship) II

  • check whether there are differences in the distribution of the DV for the different categories of the IV.
  • if there are differences, check whether they are consistent with the hypothesis.
  • if the percentage differences are consistent with the hypothesis, see how big they are. The larger the differences, the stronger the relationship.
  • if the data come from a sample, check how likely it is that differences this large could have occurred by chance (as a result of sampling error) i.e. how confident can we be that the relationship observed in the sample exists in the population at large?



  • inter-ocular strike test: there is no substitute for eyeballing the table. If there is no difference, then there is no relationship.
  • Are the differences consistent w/ the hypothesis? Is the gap the one predicted? (form)
  • If the %s are in the hypothesized direction, how big are the differences? The bigger the difference, the more impact the IV is having. But the % differences don’t have to be drastic to be meaningful.
  • Statistical significance


Statistical Significance


Statistical significance indicates how likely (or probable) it is that the relationship between two variables observed in a sample might have occurred by chance and might not exist in the population from which the sample was drawn.


This probability is termed the level of statistical significance. The lower the probability, the higher the level of statistical significance. We want a low probability (.05 or less is conventional, i.e. a 5% chance that the results don’t generalize).


A test of statistical significance is an inferential statistic. Its purpose is to estimate how likely it is that the results occurred by chance and are not representative of the population.


Type I versus Type II Error


In making inference about the whole population based on the results of a sample, we risk making one of two types of error:


  • Type I error: inferring that there is a relationship when none actually exists.


  • Type II error: inferring that there is no relationship when there really is a relationship.

The risk of Type I error is always viewed as much more serious than Type II error:

  • the analogy of a court of law—just as we’d rather risk letting a guilty person go free than convicting an innocent one, so we’d rather risk missing a relationship than inferring one where none exists.
  • If our sample indicates that there is no relationship, we are usually ready to accept this verdict without worrying how confident we should be.

-much harder to calculate type II error


How to calculate type I error

-rely on a theoretical frequency distribution, which provides us with criteria for assessing the risk of error

-the theoretical distribution gives the likelihood of each possible degree of association in a sample if there were no relation w/in the population.

-chisq distribution. Chisq = appropriate w/ nominal-level relations, also ordinal-level

-cross tab is interpreted by comparing categories of the IV in terms of the % distribution of the DV. (% women alliance voters with % men alliance voters)

-if the IV forms the columns, the % are calculated by column and then the columns are compared (percentage down, compare across)

-use knowledge of theoretical distribution to judge how confident we can be that results will hold in population


Estimating the Probability of Making a Type I Error


Estimating the probability of making a Type I error (i.e. determining the level of statistical significance) involves the use of a theoretical sampling distribution.


For nominal-level relationships, the appropriate sampling distribution is the Chi-square distribution. This distribution gives the likelihood of each possible degree of relationship occurring in a sample if there were no relationship in the population from which the sample was drawn.


We use this theoretical distribution to determine how likely it is that we would have found a relationship as strong as the one observed in our sample if there were really no relationship in the population.


The Logic of the Chi Square Test

  • set up a null hypothesis i.e. assume that there is no relationship in the population.
  • calculate the cell frequencies you would expect to observe if the null hypothesis were true.
  • compare the expected cell frequencies with the observed cell frequencies: the greater the differences, the bigger Chi Square and the lower the risk of Type I error, so the more confident we can be that there is a relationship in the population.
  • make a partial adjustment for sample size since the absolute amount of difference between the expected and observed cell frequencies is also a function of sample size.
  • calculate the degrees of freedom: the more cells there are in a table, the greater the opportunity for the observed distribution to depart from the expected distribution.
  • consult the theoretical Chi Square distribution to determine the significance level (SPSS automatically does this for you).


Calculating Expected Frequencies

-To obtain the expected cell frequency for a given cell, multiply the column marginal by the row marginal and divide by the total number of cases.

-The expected cell frequency tells us how many women we would expect to vote Alliance if the vote distribution for women matched the vote distribution for the sample as a whole. (461/1357) x 100 = 34% of the sample voted Alliance—so we would expect 34% of women to vote Alliance.
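
A sketch of the expected-frequency rule. The figures 461 and 1357 come from the notes; the number of women in the sample is not given there, so the value of 700 used below is purely hypothetical, as is the function name.

```python
def expected_frequency(row_total, column_total, n_cases):
    """Cell frequency we would expect if there were no relationship:
    (row marginal x column marginal) / total number of cases."""
    return row_total * column_total / n_cases

alliance_voters = 461   # row marginal, from the notes
n_cases = 1357          # total number of cases, from the notes
women = 700             # column marginal: HYPOTHETICAL, not in the notes

# Share of the whole sample that voted Alliance
print(round(100 * alliance_voters / n_cases))  # 34

# Number of women we would expect to vote Alliance under the null hypothesis
print(round(expected_frequency(alliance_voters, women, n_cases), 1))
```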


Calculating Chi Square

$\chi^2 = \sum \frac{(f_o - f_e)^2}{f_e}$

Where the sum runs over all cells and: $f_o$ (fo) = the frequency observed in each cell.

$f_e$ (fe) = the frequency expected in each cell

Degrees of freedom = (# of columns minus one) x (# of rows minus one) = 1 x 3 = 3


Chi Square = 29.2       significance level = .001


i.e. there is less than one chance in 1,000 that we would have obtained a relationship like the one observed in our sample if there were really no relationship in the population.

-square the differences to get rid of negative signs so that the #’s don’t cancel out

-(fo − fe)²/fe: other things being equal, the larger the sample, the larger the discrepancy between observed and expected frequencies. We compensate for this by making a partial adjustment: divide by the expected frequency for each cell.

-only a partial adjustment b/c larger samples are more reliable.


-distributional freedom: adjust for differences in the size of the table (differences btwn tables in the # of cells that they have). The more cells in a table, the more chances there are to deviate from the random model, & we want to adjust for this.
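
The whole calculation can be sketched in plain Python. The table below is hypothetical (the cross-tabulation behind the 29.2 result is not reproduced in these notes), and the function names are illustrative. The critical values are the standard .05 cut-offs of the Chi Square distribution for 1 to 3 degrees of freedom.

```python
# Standard critical values of Chi Square at the .05 level, by degrees of freedom
CRITICAL_05 = {1: 3.841, 2: 5.991, 3: 7.815}

def chi_square(observed):
    """Return (chi square, degrees of freedom) for a table of observed counts."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    n = sum(row_totals)
    chi2 = 0.0
    for i, row in enumerate(observed):
        for j, fo in enumerate(row):
            fe = row_totals[i] * col_totals[j] / n   # expected frequency
            chi2 += (fo - fe) ** 2 / fe              # squared gap, scaled by fe
    df = (len(observed) - 1) * (len(observed[0]) - 1)
    return chi2, df

def is_significant_05(chi2, df):
    """True if chi2 exceeds the .05 critical value for df (1 to 3 only)."""
    return chi2 > CRITICAL_05[df]

# Hypothetical 2 x 2 table of observed counts
chi2, df = chi_square([[10, 20], [20, 10]])
print(round(chi2, 2), df, is_significant_05(chi2, df))  # 6.67 1 True
```

By the same yardstick, the Chi Square of 29.2 with 3 degrees of freedom reported above is far beyond the .05 cut-off of 7.815.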


Chi square:

-significant at the .001 lvl (1/1000 chance)

-if the significance level is .06, can talk about it being borderline, approaching statistical significance

FOR EXAM: define statistical significance, name a nominal level test, describe the logic


Using and Abusing the Chi Square Test


  • Chi Square assumes that the researcher has hypothesized a relationship in advance.


  • Chi Square assumes that the sample was selected randomly. (non-zero chance of inclusion)


  • Chi Square assumes that no more than 25 percent of the cells have an expected frequency of less than five. This is more of an issue if the relationship appears to be significant; the reader must be alerted to the problem.


  • the larger the number of cases, the larger Chi Square will be since the adjustment for sample size is only partial. This is as it should be since a larger sample reduces the risk of Type I error. BUT this means that Chi Square should NEVER be used to draw conclusions about the strength of the relationship between IV and DV (since trivial relationships will attain statistical significance if the sample is large enough). Nor can the size of Chi Square be compared from one table to another.


  • a non-significant Chi Square does NOT mean that our sample is unrepresentative. What it usually means is that the relationship we have observed is so weak that it could easily have occurred by chance.
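
The warning about sample size is easy to demonstrate. The sketch below uses hypothetical tables in which the percentages never change (52% against 48%) but the number of cases is multiplied by 10 and then by 100; the function name is illustrative.

```python
def chi_square_value(observed):
    """Chi square for a table of observed counts (no significance test)."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    n = sum(row_totals)
    return sum((fo - row_totals[i] * col_totals[j] / n) ** 2
               / (row_totals[i] * col_totals[j] / n)
               for i, row in enumerate(observed)
               for j, fo in enumerate(row))

base = [[52, 48], [48, 52]]   # hypothetical counts: a 4-point difference
for factor in (1, 10, 100):
    table = [[cell * factor for cell in row] for row in base]
    print(factor * 200, "cases:", round(chi_square_value(table), 2))
```

With 1 degree of freedom the .05 critical value is 3.841, so the same weak 4-point difference is non-significant with 200 or 2,000 cases and highly significant with 20,000. Chi Square grew a hundredfold while the strength of the relationship stayed exactly the same.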

Topic Fifteen: Statistics — Nominal-Level Measures of Association


What is a Measure of Association?

What are PRE-Based Measures of Association?

Calculating Lambda_a

Interpreting Lambda_a

Why Lambda_a can be misleading


What is a Measure of Association?


A measure of association (or correlation coefficient) is a single number that summarizes the degree of association between two variables.


There is a wide range of measures available for describing how strongly two variables are related. Some differ in their basic approach, but even when the basic approach is similar, measures may differ with respect to:


  • the type of data for which they are appropriate
  • their computational details


This means that different measures of association are not directly comparable. Never compare how strong different relationships are unless the same measure of association has been used.


What are PRE-Based Measures of Association?


The logic of proportional reduction in error (PRE) provides an intuitive approach to measuring association. It involves asking: how much does knowing the values of cases on the independent variable help us improve our ability to predict their values on the dependent variable?


If two variables are perfectly related, knowing a case’s value on the IV will enable us to predict its value on the DV with complete accuracy. Conversely, if two variables are completely unrelated, knowing the value of a case on the IV will be no help at all in predicting its value on the DV.


If two variables are partially related, knowing the value of a case on the IV will be some help in predicting its value on the DV. PRE-based measures enable us to summarize that improvement in predictive ability.


Calculating Lambda_a I


Lambda_a is a PRE-based measure of association that is appropriate when one or both variables are measured at the nominal level.


Lambda_a measures how much our predictive ability is improved by knowing the values of cases on the IV. It ranges in value from .00 (no improvement) to 1.00 (perfect predictability).

If you had to guess how any one person voted, your best guess would be the modal category (Liberal).

And if you had to make the same guess for every person, you would make the fewest errors if you always guessed the modal category.

-a single # that summarizes the degree of association btwn 2 variables

-many different measures of association: each conceptualizes association in a different way. Cannot compare different measures of association.

-widely used. Employs a logic that has a very direct, literal interpretation

-lambda_a = asymmetric lambda

-lambda is an attractive measure of association b/c it is easily readable. Don’t take it too literally, though: not a strong relationship until .50


$\lambda_a = \frac{\sum f_i - f_d}{N - f_d}$


Where $f_i$ = maximum frequency w/in each subclass or category of the IV (summed over all categories of the IV)

$f_d$ = maximum frequency in the totals of the DV

$N$ = number of cases
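
The formula can be sketched as follows, assuming the IV forms the columns of the table and the DV forms the rows. The counts are hypothetical and the function name is illustrative.

```python
def lambda_asymmetric(table):
    """Asymmetric lambda for predicting the row variable (DV)
    from the column variable (IV)."""
    n = sum(sum(row) for row in table)
    f_d = max(sum(row) for row in table)             # modal frequency of the DV
    sum_f_i = sum(max(col) for col in zip(*table))   # modal frequency within each IV category
    return (sum_f_i - f_d) / (n - f_d)

# Hypothetical counts: rows are DV categories, columns are IV categories
print(lambda_asymmetric([[30, 10],
                         [10, 30]]))  # 0.5
```

Guessing the modal DV category for everyone gives 40 errors out of 80. Guessing the modal category within each IV category gives 20 errors, so knowing the IV cuts the errors in half.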


Interpreting Lambda_a


The value of Lambda_a depends on which variable is used as the predictor variable: the column variable or the row variable.


Lambda_a is asymmetric Lambda (hence the subscript), meaning that it is used when we want to predict the values of one variable based on the values of a second variable. There is also symmetric Lambda, which is used when we want to summarize the degree of mutual predictability between two variables (how much does our predictive ability improve if we use each variable to predict the other?).


SPSS provides all three Lambdas—so be sure to choose the asymmetric Lambda that corresponds to your DV.


Why Lambda_a can be misleading

Lambda_a will always be zero if the modal value is the same for all categories of the IV.

-be skeptical if you get .00. If the modal category of the DV is the same for all categories of the IV, lambda will be .00 even when the variables are related. The statistic is then no longer giving an appropriate picture of the relationship.


If the modal value is the same for all categories of the IV, then Cramer’s V will be an appropriate measure to use for nominal-level relationships. Cramer’s V is based on the logic of Chi Square (i.e. it is not a PRE-based measure). It adjusts Chi Square to minimize the effects of sample size and distributional freedom (the more cells in a table, the more opportunities there are for the observed distribution to depart from the expected one) and to constrain the coefficient to range between .00 and 1.00.

-cannot give a literal interpretation of Cramer’s V. It only gains meaning when compared across different tables as an indication of strength of association

-cannot compare Cramer’s V and lambda

-not a PRE based measure.
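
The notes do not give the formula for Cramer's V, so the sketch below assumes the standard definition: the square root of Chi Square divided by N times the smaller of (rows minus one) and (columns minus one). The table is hypothetical and the function name is illustrative.

```python
import math

def cramers_v(observed):
    """Cramer's V, assuming the standard definition
    V = sqrt(chi2 / (N * min(rows - 1, columns - 1)))."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    n = sum(row_totals)
    chi2 = 0.0
    for i, row in enumerate(observed):
        for j, fo in enumerate(row):
            fe = row_totals[i] * col_totals[j] / n
            chi2 += (fo - fe) ** 2 / fe
    k = min(len(observed), len(observed[0])) - 1
    return math.sqrt(chi2 / (n * k))

table = [[10, 20], [20, 10]]                        # hypothetical counts
bigger = [[cell * 10 for cell in row] for row in table]
print(round(cramers_v(table), 2), round(cramers_v(bigger), 2))  # 0.33 0.33
```

Multiplying every cell by 10 multiplies Chi Square by 10 but leaves V unchanged, which is the sense in which V removes the effect of sample size.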


-arrange data to get an initial visual impression. Can create a rank-ordering (used w/ an ordinal variable w/ a large # of possible values and very few cases w/ the same values) or a cross-tabulation/contingency table (used when the variables have btwn 3 & 7 values).

Topic Sixteen: Statistics — Ordinal-Level Measures of Association


Creating a cross-tabulation

Measuring association at the ordinal level

The logic of PRE at the ordinal level

Calculating Gamma

Why Gamma can be misleading

Ordinal measures of association: Tau

Choosing a measure of association


Creating a Cross-Tabulation I


The first step in describing a relationship between two variables is to arrange the data so that you can get an initial visual impression of whether there is a relationship or not. With ordinal-level data, there are two methods for doing this:

  • rank orders are used when there are few cases having the same value (i.e. when there are few “ties”).
  • cross-tabulations are used when there are many ties and/or when both variables have only a small number of possible values.

When cross-tabulating ordinal variables, it is important that the values of both variables be listed in the same order (e.g. from low to high, from weak to strong, etc.).


The best general indication of a relationship in a cross-tabulation between two ordinal variables is a consistent increase in the %s in one direction across the top row and in the opposite direction across the bottom row. Look at the pattern of the %s in the top & bottom rows: do they increase in opposite directions? If so, there is a relationship.

-always compare across rows

-focus on the gap, but not to the exclusion of what happens btwn the endpoints. Have to see a steady pattern of increase.



Measuring Association at the Ordinal Level


Having checked that Chi Square is statistically significant (i.e. the significance level is .05 or less), the next step is to calculate a measure of association.


Measures of association at the ordinal level differ from measures of association at the nominal level in ranging from –1.00 to +1.00 (instead of .00 to 1.00).


A negative coefficient indicates that cases with high values on the IV tend to have low values on the DV (and vice versa). This indicates that there is a negative relationship between the IV and the DV.


A positive coefficient indicates that cases with high values on the IV also tend to have high values on the DV (and vice versa). This indicates that there is a positive relationship between the IV and the DV.


The Logic of PRE at the Ordinal Level I


-Gamma is an ordinal measure of association that uses the logic of proportional reduction in error.

-Association is still treated as a matter of predictability, but the nature of the predictions changes because we have ordered categories.

-With ordinal data, we are interested in measuring how much knowing the relative position (or ranking) of a pair of cases on the IV will help us to improve our ability to predict their relative position (or ranking) on the DV.


The Logic of PRE at the Ordinal Level II


There are 2 conditions under which the ranking of a pair of cases will be perfectly predictable:

  • if all the cases are ranked in exactly the same order on both variables (perfect agreement) i.e. cases that have low values on the IV all have low values on the DV, etc.
  • if all the cases are ranked in exactly the opposite order on both variables (perfect inversion) i.e. cases that have low values on the IV all have high values on the DV, etc.


In either case, we can predict the relative position of a pair of cases on the DV from their relative position on the IV with perfect accuracy.


The degree of predictability (or association) is a function of how close the rankings on the two variables are to either perfect agreement or perfect inversion. Both situations represent perfect association—the only difference lies in the direction of the association.


Calculating Gamma I


We use probabilistic logic to calculate and interpret Gamma.

-If two variables are in perfect agreement, the probability of drawing a positive pair (a pair of cases ranked in the same order on both variables) will be 100%.

-If two variables are in perfect inversion, the probability of drawing a negative pair (a pair of cases ranked in the opposite order on both variables) will be 100%.

-If two variables are totally unrelated, the probability of drawing a positive pair will equal the probability of drawing a negative pair.


In order to calculate the chance of drawing positive and negative pairs, we have to count the total number of positive and negative pairs.


To compute the number of positive pairs, begin with the cell in the upper leftmost corner and multiply it by the sum of the frequencies in all the cells below and to the right. Cells below will have higher values on the DV and cells to the right will have higher values on the IV. Repeat for every cell that has cells below and to the right:


To compute the number of negative pairs, begin with the cell in the upper rightmost corner and multiply it by the sum of the frequencies in all the cells below and to the left. Cells below will have higher values on the DV and cells to the left will have lower values on the IV. Repeat for every cell that has cells below and to the left.
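
The counting rule above can be sketched in a few lines of Python. This is only an illustration: the function names and the 3 x 3 table are made up, rows are assumed to be DV categories and columns IV categories (both ordered low to high), and Gamma is taken as (positive pairs - negative pairs) / (positive pairs + negative pairs), the usual definition.

```python
def count_pairs(table):
    """Count positive (same-order) and negative (opposite-order) pairs.

    Assumption: rows are DV categories (low to high, top to bottom) and
    columns are IV categories (low to high, left to right).
    """
    n_rows, n_cols = len(table), len(table[0])
    positive = negative = 0
    for r in range(n_rows):
        for c in range(n_cols):
            # Sum of all cells below and to the right of the current cell.
            below_right = sum(table[i][j]
                              for i in range(r + 1, n_rows)
                              for j in range(c + 1, n_cols))
            # Sum of all cells below and to the left of the current cell.
            below_left = sum(table[i][j]
                             for i in range(r + 1, n_rows)
                             for j in range(c))
            positive += table[r][c] * below_right
            negative += table[r][c] * below_left
    return positive, negative


def gamma(table):
    """Gamma = (positive - negative) / (positive + negative); ties are ignored."""
    positive, negative = count_pairs(table)
    return (positive - negative) / (positive + negative)


# Hypothetical cross-tabulation (frequencies), not data from the notes.
example = [[20, 10, 5],
           [10, 20, 10],
           [5, 10, 20]]

positive, negative = count_pairs(example)
print(positive, negative, round(gamma(example), 3))
```

For this made-up table there are 2200 positive pairs and 625 negative pairs, so positive pairs predominate and Gamma is positive (about .56).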


Interpreting Gamma


If positive pairs predominate, Gamma will be positive. If negative pairs predominate, Gamma will be negative.


Gamma is interpreted as the proportional reduction in error when predicting the order of a pair of cases on the DV once we know their order on the IV, ignoring ties. It still uses the logic of guessing.


The size of the coefficient indicates the strength of the relationship, while the sign (positive or negative) indicates the direction of the relationship. The closer the absolute value is to 1, the stronger the association.


Why Gamma can be misleading

In calculating Gamma, we ignore cases that have the same value on one or both variables (‘ties’). Cases that have the same value on one variable, but a different value on the other variable violate the notion of association. Ignoring these cases causes Gamma to overstate the degree of association.


Ordinal Measures of Association: Tau


Because Gamma can be inflated, it is preferable to use Tau. Tau does take into account cases that are tied on one variable, but not on the other (cases that are tied on both variables are consistent with the notion of association).


Like Gamma, Tau ranges in value from –1.00 to +1.00


Tau-b is used when both variables have the same number of values (i.e. the table is symmetrical, with an equal number of columns and rows).


Tau-c is used when one variable has more values than the other variable (i.e. the table is asymmetrical, with an unequal number of columns and rows).


[There is also a Tau-a but this is not used with cross-tabulations since it assumes that there are no ties.]

-only use Tau if both variables are ordinal (exception: dichotomous variables can be treated as ordinal, or even interval)


Left-right self-placement x support for free enterprise:

Gamma = .37  Tau-b = .23
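
The gap between Gamma and Tau-b comes from tied pairs. The sketch below is an illustration only: the table is made up (it is not the left-right data), the function names are hypothetical, and Tau-b is computed with the standard formula (P - Q) / sqrt((P + Q + pairs tied on the IV only) x (P + Q + pairs tied on the DV only)).

```python
from math import sqrt


def pair_counts(table):
    """Return (positive, negative, tied_iv_only, tied_dv_only) pair counts.

    Assumption: rows are DV categories and columns are IV categories,
    both ordered from low to high.
    """
    n_rows, n_cols = len(table), len(table[0])
    positive = negative = tied_iv = tied_dv = 0
    for r in range(n_rows):
        for c in range(n_cols):
            cell = table[r][c]
            for i in range(r, n_rows):
                for j in range(n_cols):
                    if i == r and j <= c:
                        continue  # skip the cell itself and pairs already counted
                    other = table[i][j]
                    if i == r:
                        tied_dv += cell * other   # same row: tied on the DV only
                    elif j == c:
                        tied_iv += cell * other   # same column: tied on the IV only
                    elif j > c:
                        positive += cell * other  # below and to the right
                    else:
                        negative += cell * other  # below and to the left
    return positive, negative, tied_iv, tied_dv


def gamma_and_tau_b(table):
    p, q, tied_iv, tied_dv = pair_counts(table)
    gamma = (p - q) / (p + q)
    tau_b = (p - q) / sqrt((p + q + tied_iv) * (p + q + tied_dv))
    return gamma, tau_b


# Hypothetical 3 x 3 table of frequencies.
example = [[20, 10, 5],
           [10, 20, 10],
           [5, 10, 20]]

g, t = gamma_and_tau_b(example)
print(round(g, 2), round(t, 2))
```

With this table Gamma is about .56 and Tau-b about .39: the same positive and negative pairs, but Tau-b's denominator also counts the pairs tied on one variable only.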


Choosing a Measure of Association I


Gamma and Tau should only be used when both variables are measured at the ordinal level unless one or both variables is a dichotomy.


A dichotomous variable has only 2 categories (e.g. sex). As such, it satisfies the requirements for both interval-level measurement (there is only one interval which, by definition, is equal to itself) and ordinal-level measurement (the ordering is arbitrary but neither ordering violates the mathematical requirements).


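IV              DV              Measure of Association

nominal         nominal         Lambda or Cramer’s V
nominal         dichotomy       Lambda or Cramer’s V
nominal         ordinal         Lambda or Cramer’s V
dichotomy       nominal         Lambda or Cramer’s V
ordinal         nominal         Lambda or Cramer’s V

ordinal         ordinal         Gamma or Tau
dichotomy       ordinal         Gamma or Tau
ordinal         dichotomy       Gamma or Tau
dichotomy       dichotomy       Gamma or Tau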


Topic Seventeen: Statistics — Examining the Effects of Control Variables



How are controls introduced?

Interpreting control variables

Sources of Spuriousness

Intervening variables

Conditional variables

Replicated relationships


How are controls introduced?


It is never enough to demonstrate covariation. We must always go on to examine the effect of other variables (‘control variables’) that might plausibly alter or even eliminate the observed covariation.


In order to determine whether some third variable affects the observed relationship between the IV and DV, we must be able to hold the effects of that variable constant and then re-examine the relationship between the IV and the DV. Note: the focus is always on what happens to the IV – DV relationship.


With nominal variables or with ordinal variables that have only a small number of possible values, we use a physical control i.e. we divide our cases into groups based on their values on the control variable and then re-examine the original relationship separately for each of these groups, using a series of cross-tabulations.
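
A physical control can be sketched as follows. Everything here is hypothetical: the data are constructed, the IV, DV and control variable are all dichotomies coded 0/1, and the "gap" is simply the difference in the percentage of cases scoring 1 on the DV between the two IV categories.

```python
from collections import defaultdict


def percent_high_by_iv(cases):
    """Return {iv_value: % of cases with dv == 1} for a list of (iv, dv) pairs."""
    totals, highs = defaultdict(int), defaultdict(int)
    for iv, dv in cases:
        totals[iv] += 1
        highs[iv] += dv
    return {iv: 100 * highs[iv] / totals[iv] for iv in totals}


def gap(cases):
    """Percentage-point difference on the DV between IV = 1 and IV = 0."""
    pct = percent_high_by_iv(cases)
    return pct[1] - pct[0]


def gaps_by_control(cases):
    """Divide (iv, dv, control) cases by control value and re-examine the gap."""
    groups = defaultdict(list)
    for iv, dv, control in cases:
        groups[control].append((iv, dv))
    return {control: gap(group) for control, group in groups.items()}


def make_cases(iv, control, n, n_high):
    """Build n cases, n_high of which score 1 on the DV."""
    return [(iv, 1, control)] * n_high + [(iv, 0, control)] * (n - n_high)


# Constructed data: the control variable drives both the IV and the DV.
cases = (make_cases(0, 0, 80, 16) + make_cases(1, 0, 20, 4)
         + make_cases(0, 1, 20, 16) + make_cases(1, 1, 80, 64))

overall_gap = gap([(iv, dv) for iv, dv, _ in cases])
controlled_gaps = gaps_by_control(cases)
print(overall_gap, controlled_gaps)
```

In this constructed example the uncontrolled gap is 36 percentage points, but it is 0 within each category of the control variable: the original relationship disappears once the control is held constant.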


Interpreting control variables


When you do this, one of three things can happen to the original relationship:

  • it can stay more or less the same in every category of the control variable (replicated relationship).
  • it can weaken or disappear in every category of the control variable (spuriousness OR intervening variable). The signs are a smaller gap across the columns, a smaller measure of association and a chi-square that is no longer significant.
  • it can weaken in some categories and strengthen in others or even assume different forms in different categories of the control variable (conditional variable).

Note: there is no statistical technique for distinguishing between an intervening variable and a source of spuriousness. You have to decide on substantive grounds which interpretation makes the most sense. Usually, this is decided on the basis of time order. It helps to draw a diagram of the causal sequence.

-data analysis is not just a mechanical process: it is a process of imparting meaning to data by interpreting them.


Source of Spuriousness I


-The first priority must be to test for spuriousness i.e. we must ask whether there is some common factor that could cause both the IV and the DV.


-If the relationship between the IV and the DV is spurious, the relationship will weaken or disappear when we control for the source of spuriousness variable (remove the common cause and the observed covariation will weaken or disappear).

-If your relationship turns out to be spurious, you should make the source of spuriousness variable your new independent variable and then test the relationship between this variable and your dependent variable. You will then test for the effects of two plausible control variables.


If the original relationship weakens in every category of the control variable, but there is still some relationship in every category (i.e. the significance level is .05 or less and the measure of association is close to .20), you have a partial source of spuriousness. In this case, you do not need to change your hypothesis because there is still some covariation even controlling for the common cause.


If there is more than one plausible source of spuriousness, you must test for these additional possibilities.


Intervening Variable I


If your relationship is not spurious (or is only partially spurious), the next priority is to test for a plausible intervening variable.


Intervening variables are variables that mediate the relationship between the IV and the DV. An intervening variable provides an explanation of why the IV affects the DV.


The intervening variable corresponds to the assumed causal mechanism. The DV is related to the IV because the IV affects the intervening variable and the intervening variable, in turn, affects the DV.

To identify plausible intervening variables, ask yourself why you think the IV would have a causal impact on the DV.


Ex) The relationship has weakened in both categories of the control variable, but it has not disappeared. This indicates that ideology is a partial intervening variable (it only explains some of the observed relationship between religious affiliation and vote choice).


Conditional variables I


Once we have eliminated plausible sources of spuriousness and verified the assumed causal mechanism, we need to specify the conditions under which the hypothesized relationship holds.


Ideally, we want there to be as few conditions as possible because the aim is to come up with a generalization.


Conditional variables are variables that literally condition the relationship between the IV and the DV by affecting:

(1) the strength of the relationship between the IV and the DV (i.e. how well do values of the IV predict values of the DV?) and

(2) the form of the relationship between the IV and the DV (i.e. which values of the DV tend to be associated with which values of the IV?)


Conditional variables II

To identify plausible conditional variables, ask yourself whether there are some sorts of people who are likely to take a particular value on the DV regardless of their value on the IV.

Note: the focus is always on how the hypothesized relationship is affected by different values of the conditional variable


There are basically three types of variables that typically condition relationships:

(1) variables that specify the relationship in terms of interest, knowledge or concern.

(2) variables that specify the relationship in terms of place or time.

(3) variables that specify the relationship in terms of social background characteristics.


Replicated relationship II


What matters is what happens to the differences across the columns. Even though the cell percentages may change, the impact of the IV on the DV will be similar to the uncontrolled relationship if the gap across the columns in each control table remains more or less the same (and the measure of association indicates that the strength of the relationship is more or less similar)


Topic Eighteen: Validity and Reliability


Validity versus reliability

Systematic versus random errors

Face validity

Criterion-related validity

Construct validity

Test-retest reliability

Parallel-forms reliability

Internal consistency

Sub-sample reliability


-central issue: how well do empirical indicators correspond to abstract concepts?

-can we build into our data collection a provision for gathering the information we need to persuade others that our measures work?

Validity versus reliability


Validity—are we measuring what we think we are measuring? i.e. does our indicator really represent our target concept?


Reliability—does our measurement process assign values consistently? i.e. if we repeated our research, would we assign the same values to the same observations?


Validity and reliability are jeopardized by measurement errors.


Measurement errors are differences in the values assigned to observations that are attributable to flaws in the measurement process i.e. they do not reflect authentic differences between observations in the property we want to measure


Measurement errors can be either systematic or random.

Systematic versus Random Errors I


Systematic errors occur when our indicator is picking up some other property, in addition to the property it is supposed to measure. This type of error systematically affects our results: it is constant, and its biasing effect is predictable once identified.


Random errors are chance fluctuations in the measurement results that do not reflect true differences in the property being measured. These errors occur as a matter of chance and affect each observation differently.

-can be due to a transient aspect of the case being measured

-can be due to the measurement situation (e.g. the interviewer has an off day)

-can be due to a measurement procedure that varies from case to case

-can be due to vague or ambiguous instructions

-the errors are random because the amount of error varies from one case to another in unpredictable ways

Systematic versus random errors II


Random errors make our measures unreliable. If a measure is unreliable, it cannot be valid because at least some of the differences in the values assigned to observations will result from random measurement errors.


BUT a reliable measure is not necessarily valid. This is because reliability is only threatened by random error—whereas validity is threatened by both random error and systematic error.


Systematic errors are no threat to reliability precisely because they are systematic i.e. they consistently affect our measurement results. Could, after the fact, introduce a control variable to deal with the bias from systematic error.


Content Validity I—Face Validity

Content validity is concerned with the substance, or content, of what is being measured. It addresses directly the question: are we measuring what we think we are measuring?


Validity is the basic problem of social science.


To have content validity, a measure must be both appropriate and complete.


For example, if we wanted to measure public education in cities, we might count the number of teachers in city schools, but this would be an inappropriate measure.


Face validity involves the criterion of appropriateness—can knowledgeable people be persuaded that the measure is an appropriate indicator of the target concept? Ask experts. Some measures are based on such direct observation of the behaviour in question that there seems to be no reason to question their validity. Ex: state law to present license of compliance visibly. We shouldn’t trust the face value alone.


Potential problems:

-the method relies on subjective judgment

-there are no replicable rules for evaluating the measure (can’t say how expert reached their decision)

Intersubjectivity enhances confidence in the face validation approach.


Content validity II—sampling validity

Sampling validity involves the criterion of completeness: does our measure represent the full range of meaning of the target concept?


This approach assumes that every concept has a theoretical universe of content consisting of all the things that could possibly be observed about the property it represents. A valid measure is one that constitutes a representative sample of this universe of content.


Potential problems:

  • the method relies on subjective judgment
  • there are no replicable rules for evaluating the measure (nominal definitions are crucial for this reason)
  • it is difficult to specify the universe of content of abstract concepts
  • it is even harder to represent that content completely/adequately


Criterion-related validity I (pragmatic, empirical, predictive, concurrent)


Criterion-related validity assumes that an indicator is valid if there is an empirical correspondence between the results obtained using the indicator and the results obtained using another indicator of the same concept that is already known (or assumed) to be valid.


Ex: street light test: multiple indicators improve the chance of validity.


There are two types of criterion-related validity:


  • concurrent criterion-related validity simply involves comparing the results with those obtained using another indicator.
  • predictive criterion-related validity involves asking how well the indicator predicts a behavior that is known to reflect the concept being measured e.g. how well do LSAT scores predict performance in law school?


The emphasis in both cases is on the correlation between our indicator and the criterion (hence the alternative names: pragmatic validity and empirical validity).

Criterion-related validity II


This form of validation raises three questions:

  • why not use the criterion instead? In some cases, the criterion may be impractical or expensive to use. In other cases, we need to measure the property before we make use of the criterion (i.e. we want to measure aptitude for law school before we admit students).
  • how do we know the criterion is valid?
  • what if we lack a valid criterion? This is typically the case unless we are engaged in applied policy research.


Construct Validity I


Construct validity involves relating an indicator to an overall theoretical framework.


Based on our theoretical understanding of the concept we want to measure and on previous research, we postulate various relationships between that concept and other specified concepts. The indicator is valid to the extent that we observe the predicted relationships.


These relationships are in addition to the ones that are the focus of our research.


e.g. we want to test a theory about the relationship between political efficacy and political engagement. We might try to validate our indicators of efficacy by seeing whether they produce the relationship we would expect with indicators of education (i.e. the more education people have, the more efficacious they will feel).


This is known as external validation.

-this differs from criterion-related validity: there we compare our measure with another measure of the same concept; here we compare our measure with measures of different concepts.


Construct validity II


The process of external validation is very much like testing a hypothesis. The problem is that, like any hypothesis, the predicted relationships may not hold. This could mean any one of three things:

  • Our indicator is not valid
  • the theoretical framework that generated the predicted relationships is flawed.
  • the indicators of the other concepts were not valid.


The solution is to conduct multiple tests. If most of the predicted relationships hold, we can be confident that our indicator is valid. If most of the predicted relationships fail to hold, we would have to conclude that our indicator is the problem.


Construct validity III

Convergent-discriminant validity (also known as the multi-trait multi-method matrix method) is a more sophisticated form of construct validity.


Convergent validity (also known as internal validity) means that different methods of measuring the same concept should produce similar results.


Discriminant validity means that two indicators should not correlate highly if they measure different properties, even if they involve similar methods of measurement.


Construct Validity IV


The convergent-discriminant approach requires indicators of at least two different concepts, each measured using at least two different methods. When these indicators are correlated, we should observe the following pattern:

                         Concept A/Method 1       Concept B/Method 2

Concept A/Method 2       high correlation         low correlation
Concept B/Method 1       low correlation          high correlation


This approach is difficult to implement because we typically cannot use more than one method for measuring our concepts. However, this approach can be approximated by comparing alternative indicators of different concepts. (Concept A/Indicator 1, etc.)


NOTE: we cannot always be certain that our measures of the key concept are valid, and we should therefore always be careful about concluding that a measure is valid or invalid from any one test of validity.


Assessing Reliability (don’t need to know)


Assessing reliability is basically an empirical matter.


The best way to achieve high reliability is to be aware of the sources of unreliability and to guard against them.


There are four major ways of assessing reliability.

The test-retest method


The test-retest method corresponds most closely to the conceptual definition of reliability i.e. if we repeat the measurement process on the same cases, will we get the same results?


This method is intuitively appealing, but it has important drawbacks:

-it may not be feasible

-there is the risk of reactivity e.g. in a survey, respondents may consciously strive to appear consistent in their responses (over-estimate reliability); respondents may pay less attention the second time around (under-estimate reliability); the fact of being interviewed the first time may change responses the second time around (under-estimate reliability).

-real change may occur in the cases being measured between the first and the second measurement period (under-estimate reliability).


This approach is most appropriate with non-reactive methods of data collection, like content analysis.

The alternative forms (or parallel forms) method


The alternative forms (or parallel forms) method involves using two parallel forms of the measuring instrument on the same cases.


The advantages of this method are:


  • there is no reactivity problem because no case is measured twice using the same measuring instrument.
  • no time elapses between the measurements, so there is no confounding effect from possible changes in the cases themselves.
  • feasibility


The disadvantages of this method are:

-difficulty of ensuring that the two forms are parallel.

-difficulty of coming up with two measuring instruments.


The alternative forms (or parallel forms) method

A variant of this method is the split-half method. It avoids the problem of having to come up with two parallel forms. The researcher comes up with a single measuring instrument with twice as many items as needed. Reliability is assessed by randomly dividing the items in half and comparing the results. If the randomization works properly, the two halves should be equivalent.


The disadvantages of this method are:

  • the difficulty of coming up with sufficient items.
  • making sure that the two halves really are equivalent (randomization will not ensure equivalence if the number of items involved is small).
  • different splits may lead to different assessments of reliability.
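
A rough sketch of the split-half idea, under stated assumptions: the responses are invented, each half is scored by summing its items, and "comparing the results" is taken to mean correlating the two half-scores with Pearson's r (the function names are hypothetical).

```python
import random
from math import sqrt


def pearson(x, y):
    """Pearson correlation between two equal-length lists of numbers."""
    mean_x, mean_y = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)


def split_half_correlation(responses, seed=0):
    """Randomly divide the items in half and correlate the two half-scores."""
    n_items = len(responses[0])
    items = list(range(n_items))
    random.Random(seed).shuffle(items)  # random assignment of items to halves
    half = n_items // 2
    first, second = items[:half], items[half:]
    first_scores = [sum(row[i] for i in first) for row in responses]
    second_scores = [sum(row[i] for i in second) for row in responses]
    return pearson(first_scores, second_scores)


# Invented data: one row per respondent, one column per item (scored 1 to 5).
responses = [[5, 4, 5, 4],
             [4, 4, 3, 4],
             [3, 3, 3, 2],
             [2, 3, 2, 2],
             [1, 2, 1, 1],
             [4, 5, 4, 5]]

for seed in range(3):
    print(seed, round(split_half_correlation(responses, seed), 3))
```

Running the function with different seeds produces different splits, which is one way to see how much the assessment depends on the particular split.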

The internal consistency method


The most common approach to assessing internal consistency is the calculation of coefficient Alpha. This coefficient is based on the average correlation for every possible combination of items into two half-tests. Items that produce low correlations are deleted.


Possible values of coefficient Alpha range from 0 to 1. An Alpha of 0.8 is conventionally taken as denoting an acceptable level of reliability
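
As a minimal sketch, coefficient Alpha can be computed with the standard variance-based formula, Alpha = k / (k - 1) x (1 - sum of item variances / variance of total scores), where k is the number of items. The responses below are invented and the function name is an assumption.

```python
from statistics import variance


def cronbach_alpha(responses):
    """Coefficient Alpha for a list of respondents' item scores.

    `responses` has one row per respondent and one column per item.
    """
    n_items = len(responses[0])
    item_variances = [variance([row[i] for row in responses])
                      for i in range(n_items)]
    total_variance = variance([sum(row) for row in responses])
    return (n_items / (n_items - 1)) * (1 - sum(item_variances) / total_variance)


# Invented data: five respondents answering three items scored 1 to 5.
responses = [[4, 5, 4],
             [2, 3, 2],
             [5, 4, 5],
             [1, 2, 2],
             [3, 3, 4]]

print(round(cronbach_alpha(responses), 3))
```

For these invented responses Alpha is about .93, which would count as acceptable by the 0.8 convention.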


This method shares the advantages of the alternative forms method while avoiding the problem of having to determine equivalence.

The Subsample method


The subsample method is used in survey research. It involves dividing the sample randomly into several subsamples. If the subsamples are large enough, randomization should ensure that the subsamples are similar in composition. The same items are administered to each subsample and reliability is assessed by the similarity of responses across the subsamples.


The advantages of this method are:

  • there is no reactivity problem because no case is measured twice using the same measuring instrument.
  • no time elapses between the measurements, so there is no confounding effect from possible changes in the cases themselves.
  • no need to come up with twice as many items as needed.

The disadvantages are:

  • a large sample size is required in order for randomization to produce equivalent subsamples.

Topic Nineteen: Scaling


What is scaling?

Five criteria for assessing scales

Likert scaling

Guttman Scaling


What is scaling?


Scaling involves rank-ordering individuals in terms of whether they possess more (or less) of the target property e.g. alienation, political interest, authoritarianism

We’re trying to assign a single representative value or score to a complex attitude or behaviour.


Ex: College student might be judged on a myriad of possible levels.


The individual’s score on the scale is determined by his or her responses to a series of questions, each of which provides some indication of the individual’s relative alienation, political interest, etc.


Combining items to form a scale serves two important functions:

  • reduces measurement error and thus enhances reliability and validity. A single item may produce idiosyncratic results and/or capture only a limited aspect of the target property
  • simplifies data analysis

-a scale is a measuring instrument, so the properties of good measures still apply


Ex: “The Cubans are evil and cannot be trusted”: statements need to be more specific.


Five criteria for assessing scales

  • unidimensionality—the scale should measure one property and one property only
  • linearity and equal intervals—increasing scores should correspond to increasing amounts of the target property and the scores should be based on interchangeable units
  • reliability—the scale should assign values consistently
  • validity—the scale should measure the target property
  • reproducibility—knowing an individual’s total score should enable us to predict correctly which items s/he agreed with and which items s/he disagreed with


Likert scaling I


Likert’s primary concern was unidimensionality.


He eliminated the need for judges (as required by Thurstone’s method) by getting respondents in a pilot sample to place themselves on an attitude continuum running from “strongly agree” to “strongly disagree” on a series of statements relating to the attitude to be measured.


Likert scaling requires a pool of attitude statements, some indicating a favourable attitude and some indicating an unfavourable attitude—but none worded so blandly that almost everyone would agree or so extremely that almost everyone would disagree.


These statements are administered to a pilot sample of 100 or more respondents who are similar to those who will be participating in the survey proper. Each respondent is asked to indicate how strongly s/he agrees or disagrees with each statement.


Each respondent’s responses are scored. Scores typically range from 1 to 5 (more complex scoring schemes have been shown to possess no advantages). The researcher has to decide whether ‘1’ indicates a very favourable attitude or a very unfavourable attitude. It does not matter as long as the scoring is consistent.


If ‘5’ indicates a very favourable attitude, strongly agreeing with a favourable statement is scored ‘5’ and so is disagreeing with an unfavourable statement.


Once the individual responses have been scored, a total score is computed for each respondent by simply adding up the scores for each statement (hence the alternative name of summated rating scale). If there are 20 statements, possible scores will range from 20 to 100.
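
The scoring rule can be sketched as below. The coding is an assumption made for the example: raw responses run from 1 (strongly disagree) to 5 (strongly agree), ‘5’ is chosen to indicate a very favourable attitude, and unfavourable statements are therefore reverse-scored. The function names are hypothetical.

```python
def score_response(response, favourable, max_score=5):
    """Score one 1-to-5 response so that a high score always means a
    favourable attitude: unfavourable statements are reversed."""
    return response if favourable else (max_score + 1) - response


def total_score(responses, favourable_flags):
    """Summated rating: add up the scored responses across all statements."""
    return sum(score_response(r, f) for r, f in zip(responses, favourable_flags))


# 20 hypothetical statements, alternating favourable and unfavourable wording.
flags = [i % 2 == 0 for i in range(20)]

# A respondent who strongly agrees with every favourable statement and
# strongly disagrees with every unfavourable one.
most_favourable = [5 if f else 1 for f in flags]
# The mirror image: the least favourable possible respondent.
least_favourable = [1 if f else 5 for f in flags]

print(total_score(most_favourable, flags), total_score(least_favourable, flags))
```

With 20 statements the two extreme respondents score 100 and 20, matching the range given above.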

Likert scaling III


The next step is to perform an item analysis to determine which are the best items to retain in the final scale. The purpose of this analysis is to ensure unidimensionality. There are three different ways to do this:

  • correlate each statement with a reliable criterion that is known or assumed to reflect the target attitude and retain those statements that produce the highest correlations. Such external criteria are typically not available.
  • internal consistency method: for each statement, correlate the score with the respondent’s total score minus the score for that statement. Retain those statements that produce the highest correlations. Factor analysis (correlate every item with every other item and search for measures that intercorrelate highly) offers a more sophisticated way of ensuring internal consistency.


-Both ways of ensuring internal consistency have been criticized for violating the assumptions underlying the statistical methods employed (i.e. using ratio-level methods with ordinal-level data)

  • index of item discrimination—retain those statements that best distinguish between respondents scoring in the top 25% and respondents scoring in the bottom 25%. If respondents with high scores and respondents with low scores respond similarly to a given statement, it cannot be measuring the same attitude as the statements as a whole.
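
The internal consistency check described above (correlating each statement with the total score minus that statement) might look like this in outline. The data are invented so that the fourth statement behaves differently from the others, Pearson's r is used as the correlation, and the function names are assumptions.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation between two equal-length lists of numbers."""
    mean_x, mean_y = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)


def item_rest_correlations(responses):
    """For each statement, correlate its scores with the respondents'
    total scores minus the score for that statement."""
    n_items = len(responses[0])
    correlations = []
    for i in range(n_items):
        item_scores = [row[i] for row in responses]
        rest_scores = [sum(row) - row[i] for row in responses]
        correlations.append(pearson(item_scores, rest_scores))
    return correlations


# Invented pilot data: six respondents, four statements scored 1 to 5.
responses = [[5, 5, 4, 1],
             [4, 4, 5, 5],
             [3, 3, 3, 2],
             [2, 2, 2, 4],
             [1, 1, 2, 3],
             [4, 5, 4, 3]]

correlations = item_rest_correlations(responses)
weakest = correlations.index(min(correlations))
print([round(r, 2) for r in correlations], weakest)
```

In this invented pilot sample the fourth statement correlates negatively with the rest of the scale, so it would be the first candidate to drop.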


-Once the statements have been selected, the scale is administered to respondents in the survey proper and their total scores are calculated. Scores are typically averaged in order to yield a scale that runs from 1 to 5 (purists use the median score since the level of measurement is only ordinal).


Advantages of Likert scales:

  • reliability—respondents like the format and find it easier to answer when they can qualify their agreement or disagreement. Perform consistently
  • ease of construction
  • unidimensionality—if the statements are internally consistent and/or discriminate among respondents, it is likely that they are all measuring the same attitude.


Disadvantages of Likert scales:



-lack of reproducibility—the same total score (or average score) can be obtained in many different ways. Two respondents may have the same total score and yet have answered quite differently

-unidimensionality is no guarantee of validity.

-lack of equal intervals—this criticism is questionable since it is unrealistic to think that we could come up with equal ‘units’ of alienation, interest, authoritarianism, etc.

-measuring the same thing, but not necessarily the target property


Guttman scaling I


In Guttman scaling, the twin concerns are achieving unidimensionality and reproducibility. Reproducibility means that we can predict a respondent’s responses to individual scale items knowing only his or her total score

Specifically, Guttman scaling enables us to predict each respondent’s responses to individual items with no more than 10% error for the sample as a whole.


The items that comprise a Guttman scale have the properties of being ordinal and cumulative. (can rank order in terms of having more or less of the property)


The scale is like a ladder—if someone has reached a higher rung, we can be fairly sure that they have climbed the lower rungs as well. Similarly, if the respondent says ‘yes’ to an item that indicates more of the property being measured, we can be reasonably confident that s/he will also have said ‘yes’ to all of the items that indicate less of the property.

-aim for somewhat equal intervals, avoid a big leap.


Guttman scaling II


Creating a Guttman scale involves using scalogram analysis to test a set of items for scalability. Scalogram analysis enables us to see how far our items and people’s responses to them deviate from perfect reproducibility. Scalability is indicated by a coefficient of reproducibility of .90 or higher.


It involves arranging and re-arranging both the items and the respondents in a table. The items are ordered across the top of the table from most to least according to the number of ‘yes’ responses they received. Respondents are ordered down the side of the table from most to least according to how many ‘yes’ answers they gave. Software is available for this purpose.
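
A simplified sketch of the reproducibility check follows. It rests on assumptions that should be read as such: responses are coded 1 for ‘yes’ and 0 for ‘no’, the ideal pattern for a respondent with total score s is ‘yes’ to the s most popular items, every mismatch with that pattern counts as one error, and the coefficient of reproducibility is 1 - errors / (respondents x items). Other error-counting conventions exist and can give different values.

```python
def coefficient_of_reproducibility(responses):
    """Share of responses correctly reproduced from total scores alone.

    `responses` has one row per respondent and one 0/1 column per item.
    """
    n_items = len(responses[0])
    # Order the items from most to least 'yes' responses.
    yes_counts = [sum(row[i] for row in responses) for i in range(n_items)]
    order = sorted(range(n_items), key=lambda i: -yes_counts[i])

    errors = 0
    for row in responses:
        score = sum(row)
        for rank, item in enumerate(order):
            # Ideal cumulative pattern: 'yes' to the `score` most popular items.
            expected = 1 if rank < score else 0
            if row[item] != expected:
                errors += 1
    return 1 - errors / (len(responses) * n_items)


# Invented data: the last respondent deviates from the cumulative pattern.
responses = [[1, 1, 1, 1],
             [1, 1, 1, 0],
             [1, 1, 0, 0],
             [1, 0, 0, 0],
             [0, 1, 0, 0]]

print(coefficient_of_reproducibility(responses))
```

Here 2 of the 20 responses deviate from the ideal pattern, giving a coefficient of .90, which just meets the threshold for scalability.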

Guttman scaling II


The aim is to achieve a triangular pattern:

Items that produce too many deviations from these patterns are dropped and so are redundant items (i.e. items that do not lead to greater differentiation among respondents). Also dropped are items to which almost everyone said ‘yes’ (or almost everyone said ‘no’) to guard against inflated estimates of reproducibility.


If we have a large sample of respondents, we should randomly divide the sample into subsamples and repeat the scalogram analysis for each subsample to check for consistency.


Advantages of Guttman scaling:

  • while there is no guarantee of unidimensionality, it is likely that items that meet the test of scalability are measuring the same property.
  • reproducibility is high by definition.
  • produces short but highly effective scales.
  • can be used to scale behaviours and events (e.g. political participation, acts of international aggression) as well as attitudes.


Disadvantages of Guttman scaling:

-may be impossible to achieve an acceptable level of reproducibility.

-items may scale in a pilot study but not in the survey proper. Not all areas of study will yield an acceptable Guttman’s scale.

Topic Twenty: Designing a sample


Probability versus non-probability sampling

Simple random samples

Systematic random samples

Proportionate stratified random samples

Disproportionate stratified random samples

Multi-stage random cluster samples

Convenience samples

Purposive samples

Quota samples

Probability versus non-probability sampling


In probability (or random) sampling, every member of the population has a known and non-zero probability of being included in the sample.


In non-probability (or non-random) sampling, there is no way of specifying the probability of inclusion and there is no assurance that every member of the population has at least some probability of inclusion.


Probability sampling has two crucial advantages:

-it avoids conscious or unconscious bias on the researcher’s part because the researcher has no say in deciding which cases get included.

-it allows us to use inferential statistics to estimate the likelihood that our sample results differ from those we would have observed if we had studied the entire population.

Despite these advantages, non-probability sampling is used when:

  • the advantages of convenience and economy outweigh the risk of having an unrepresentative sample (e.g. when a study has to be carried out at short notice).
  • no population list or surrogate population list is available. Probability sampling is only possible with access to a full population list (or a surrogate for one).


Simple random samples


Simple random sampling is the most basic probability sampling design and forms the basis for more complex designs.


Simple random sampling gives every member of the population an equal probability of inclusion and gives every possible combination (of the desired sample size) of members of the population an equal probability of inclusion.


For a small population, a simple random sample can be drawn using the lottery method. For larger populations, a random number generator is used.
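
A minimal sketch of the random number generator approach in Python, assuming the population list is simply the ID numbers 1 to 10,000 and that a sample of 500 is wanted (both figures are hypothetical):

```python
import random

# Hypothetical population list: ID numbers 1..10,000.
population = list(range(1, 10_001))

rng = random.Random()

# Draw 500 distinct people. Sampling is without replacement, and every
# combination of 500 people is equally likely to be drawn.
sample = rng.sample(population, k=500)

print(len(sample), sorted(sample)[:5])
```

Passing a seed, e.g. `random.Random(42)`, makes the draw repeatable, which is useful when the selection has to be documented.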



Disadvantages of simple random sampling:

  • can produce extreme samples (e.g. only the rich, only the poor) because every possible combination of people has an equal probability of inclusion. This is improbable, but it is not impossible.
  • tedious and time-consuming unless a population list is available in an electronic format.

Systematic random samples I


Systematic random sampling involves dividing the total population size by the desired sample size to yield the sampling interval (which is conventionally denoted ‘k’). Then, beginning with a randomly selected person from among the first k people, the researcher selects every kth person. Example:

Population size = 10,000
Desired sample size = 500
k = 10,000/500 = 20

The researcher would randomly select one person from among the first 20 (say, the 14th person) and then select every 20th person after that (14, 34, 54, 74, etc.).


Provided the first person is selected randomly, there is a priori no restriction on the probability of inclusion.
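
The procedure can be sketched in Python as follows. This is a minimal illustration, assuming the population list is the ID numbers 1 to 10,000 in list order and that the population size divides evenly by the sample size; the function name is hypothetical.

```python
import random


def systematic_sample(population, sample_size, rng):
    """Pick a random start among the first k people, then take every kth person."""
    k = len(population) // sample_size   # sampling interval
    start = rng.randrange(k)             # 0-based position among the first k people
    return population[start::k][:sample_size]


population = list(range(1, 10_001))      # hypothetical IDs 1..10,000
sample = systematic_sample(population, 500, random.Random())

# The worked example: starting with the 14th person gives 14, 34, 54, 74, ...
example = population[13::20]
print(example[:4])
```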

Systematic random samples II



Advantages of systematic random sampling:

  • less cumbersome than simple random sampling: only one random number is required and thereafter it is simply a matter of counting off every kth person.
  • reduces the risk of extreme samples, since only combinations of people spaced k apart on the list can be selected, each with an equal probability.


Disadvantages of systematic random sampling:

-can produce extreme samples if there is a cyclical order in the population list and this order coincides with the sampling interval.

-only feasible with small populations.


Proportionate stratified random samples I

Proportionate stratified random sampling is used to ensure that key groups within the population are represented in the correct proportion. It provides a better solution to the problem of extreme samples.


Instead of sampling the entire population, the population is divided into homogeneous groups, or ‘strata’, and a series of samples is selected, one from each stratum. These samples are then combined to produce a representative sample of the population as a whole.


The number of people selected from each stratum is proportional to that stratum’s share of the population. Simple random sampling or systematic random sampling is used to select the samples from the strata and so there is no departure from the principle of randomness.
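
A minimal sketch of proportionate allocation in Python. The three strata, their sizes and the function name are hypothetical; simple random sampling is used within each stratum.

```python
import random


def proportionate_stratified_sample(strata, total_n, rng):
    """Draw a simple random sample from each stratum, sized by its population share."""
    pop_size = sum(len(members) for members in strata.values())
    sample = {}
    for name, members in strata.items():
        n_stratum = round(total_n * len(members) / pop_size)
        sample[name] = rng.sample(members, n_stratum)
    return sample


# Hypothetical population of 10,000 split into three strata (60%, 30%, 10%).
strata = {
    "urban": [f"urban_{i}" for i in range(6000)],
    "suburban": [f"suburban_{i}" for i in range(3000)],
    "rural": [f"rural_{i}" for i in range(1000)],
}
sample = proportionate_stratified_sample(strata, 500, random.Random())
sizes = {name: len(members) for name, members in sample.items()}
print(sizes)
```

With less convenient proportions the rounded stratum sizes may not add up exactly to the desired total, so a real design would need a rule for allocating the remainder.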

The stratification variables must be:

  • relevant to the phenomenon to be explained, i.e. people within a stratum should be similar with respect to the dependent variable (DV), and people in different strata should differ with respect to the DV.
  • operationalizable—this means that we require information about the value of each person in the population on the stratification variable(s) before conducting our study.


Advantages of proportionate stratified random sampling:

  • avoids extreme samples for the characteristics that are used to stratify the population.
  • increases the level of accuracy for a given total sample size OR achieves the same accuracy at a lower cost. This follows from the formula that is used to calculate the confidence interval.

Stratification reduces variability (S), and the less variability there is in the population being sampled, the smaller the error term (E) will be. Or, conversely, the less variability there is, the smaller the sample size (N) can be to achieve the same level of accuracy (E).
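
The relationship between S, N and E can be illustrated numerically. The sketch below assumes the formula referred to is the usual margin of error for a sample mean, E = z × S / √N, with z = 1.96 for 95% confidence; that form, the function names and the figures used are assumptions made for illustration only.

```python
import math


def margin_of_error(s, n, z=1.96):
    """Assumed form of the error term: E = z * S / sqrt(N)."""
    return z * s / math.sqrt(n)


def required_n(s, e, z=1.96):
    """Smallest N that achieves error term E for variability S (same assumed formula)."""
    return math.ceil((z * s / e) ** 2)


# Same sample size, less variability: the error term shrinks.
e_high_variability = margin_of_error(s=10, n=400)
e_low_variability = margin_of_error(s=5, n=400)

# Same target accuracy, less variability: a smaller sample is enough.
n_high_variability = required_n(s=10, e=1.0)
n_low_variability = required_n(s=5, e=1.0)

print(e_high_variability, e_low_variability)
print(n_high_variability, n_low_variability)
```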

Disproportionate stratified random samples


Disproportionate stratified random sampling is the same as proportionate stratified random sampling except that the researcher deliberately over-samples some strata and/or under-samples others.


This is done for analytical reasons:

  • to facilitate statistical analysis by having an equal number of cases in the different categories of the IV.
  • to ensure sufficient cases for meaningful analysis where a stratum is small but substantively or theoretically important.

By definition, people belonging to some strata have a higher probability of inclusion. This is no problem when the sub-samples are being analysed separately or comparatively. However, if the sub-samples are combined into a single sample, corrective weights must be used to ensure proportionality.
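
A minimal sketch of one common way to compute such corrective weights in Python: each stratum's weight is its share of the population divided by its share of the sample. The strata, the counts and the function name are hypothetical.

```python
def corrective_weights(pop_counts, sample_counts):
    """Weight per stratum = population share / sample share."""
    pop_total = sum(pop_counts.values())
    sample_total = sum(sample_counts.values())
    return {
        stratum: (pop_counts[stratum] / pop_total)
        / (sample_counts[stratum] / sample_total)
        for stratum in pop_counts
    }


# Hypothetical design: a small stratum is over-sampled to get equal group sizes.
pop_counts = {"majority": 9000, "minority": 1000}
sample_counts = {"majority": 250, "minority": 250}

weights = corrective_weights(pop_counts, sample_counts)

# After weighting, each stratum counts in proportion to its population share.
weighted_counts = {s: sample_counts[s] * weights[s] for s in sample_counts}
print(weights)
print(weighted_counts)
```

Each under-sampled case counts for more than one case and each over-sampled case for less than one, so the weighted total still equals the actual sample size.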


Multi-stage random cluster samples I


All the methods described so far require a complete list of the population. Multi-stage cluster sampling is used when no population list is available (e.g. all university students in Canada, all eligible voters, all Catholics). Sampling proceeds in stages.


At the first stage, the researcher randomly selects groupings, or ‘clusters’, of population members (e.g. a university is a ‘cluster’ of university students). At the second stage, the researcher randomly selects people from within the selected ‘clusters’. So lists only have to be obtained and/or compiled for the selected clusters.


Depending on the population being sampled, several stages may be involved.


e.g. randomly selecting electoral districts, then randomly selecting polling divisions within the selected districts, and finally selecting eligible voters from the selected polling divisions.


Or randomly selecting school boards, then randomly selecting schools from within the selected school boards, then randomly selecting students from within the selected schools.
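
A minimal two-stage sketch in Python, using schools as clusters. The 50 schools, the 200 students per school and the function name are all hypothetical; further stages (e.g. school boards first) would repeat the same pattern.

```python
import random


def two_stage_cluster_sample(clusters, n_clusters, n_per_cluster, rng):
    """Stage 1: randomly select clusters. Stage 2: randomly select people within them."""
    chosen = rng.sample(sorted(clusters), n_clusters)
    sample = []
    for name in chosen:
        # A membership list is only needed for the clusters that were selected.
        members = clusters[name]
        sample.extend(rng.sample(members, min(n_per_cluster, len(members))))
    return chosen, sample


# Hypothetical population: 50 schools with 200 students each.
clusters = {
    f"school_{i}": [f"school_{i}_student_{j}" for j in range(200)]
    for i in range(50)
}

chosen, sample = two_stage_cluster_sample(
    clusters, n_clusters=5, n_per_cluster=20, rng=random.Random()
)
print(chosen)
print(len(sample))
```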


Advantages of multi-stage random cluster sampling:

  • obviates the need for a complete population list.
  • reduces costs in sampling a geographically scattered population by concentrating interviews within selected localities.

Disadvantages of multi-stage random cluster sampling:

  • increases the risk of sampling error because each stage has its associated risk of sampling error.

Accuracy can be increased by:

  • increasing the sample size—but there is a trade-off between increasing the number of clusters to be selected and increasing the number of cases to be selected from those clusters.
  • reducing variability, i.e. combining stratification with multi-stage random cluster sampling.

So far: all of the probability sampling designs avoid bias in the selection of cases and enable us to use inferential statistics. They can be simple or complex.


Convenience samples


There are three different basic non-probability sampling designs. In increasing order of desirability, they are: convenience sampling, purposive sampling and quota sampling.


Convenience sampling is just what its name implies—the researcher selects whatever people happen to be conveniently available e.g. the first 100 people who agree to be interviewed, students in an introductory psychology class.


This method is easy and inexpensive—but it is likely to yield unrepresentative samples. It should only be used (if at all) for pilot studies or for pre-testing questions.

Purposive samples


Purposive (or judgmental) sampling offers a better approach. The researcher uses his or her judgement and knowledge of the target population to select the sample, purposively trying to obtain a sample that appears to be representative.


With this method, the probability of being included depends entirely upon the judgement of the researcher.


In the hands of a skilled researcher, this method has been known to yield surprisingly accurate sample estimates.

Quota samples I


Quota sampling is the most sophisticated method of drawing a non-probability sample. The goal is to select a sample that represents a microcosm of the target population.


Interviewers are given a quota of individuals to select, specified by attributes such as age, sex, ethnicity, education, and income. They are required to select individuals displaying various combinations of these characteristics in proportion to their share of the population.

Quota samples II


This method is generally superior to convenience or purposive sampling, but it has several limitations:

  • it requires up-to-date and accurate information about the target population.
  • there is ample opportunity for bias—the only constraint is that interviewers fill their quotas. The selected individuals may display the requisite combination of characteristics, but that does not guarantee their representativeness.
  • the number of characteristics that can be taken into account in determining quotas is limited. Say there are four characteristics: sex (2 categories), religion (4 categories), ethnicity (3 categories), and education (4 categories). That means 2 x 4 x 3 x 4 = 96 different types of people, so it quickly becomes prohibitively expensive to track down people who meet the quota requirements.
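
The arithmetic in the last point can be checked by enumerating the quota cells. A small Python sketch follows; the category labels are placeholders, and only the number of categories per characteristic comes from the example.

```python
from itertools import product

# Placeholder category labels; only the number of categories matters here.
sex = ["female", "male"]
religion = ["religion_1", "religion_2", "religion_3", "religion_4"]
ethnicity = ["ethnicity_1", "ethnicity_2", "ethnicity_3"]
education = ["education_1", "education_2", "education_3", "education_4"]

# Every combination of characteristics is a separate quota cell to be filled.
quota_cells = list(product(sex, religion, ethnicity, education))
print(len(quota_cells))  # 2 x 4 x 3 x 4 = 96

# Adding one more characteristic multiplies the number of cells again,
# e.g. a hypothetical age variable with 5 categories.
age = ["age_1", "age_2", "age_3", "age_4", "age_5"]
cells_with_age = list(product(sex, religion, ethnicity, education, age))
print(len(cells_with_age))  # 96 x 5 = 480
```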


TOPIC 21 Data Gathering Techniques


Basic Ethical Principles

The meaning of informed consent

Why can the principle of informed consent be problematic?

The cost-benefit approach


Basic Ethical Principles:

  • There should be no deception involved in the research
  • There should be no harm (physical, psychological or emotional) done to participants.
  • Participation should be voluntary
  • Participation should be based on informed consent.


The Meaning of Informed Consent:

Informed consent can be defined as ‘the procedure in which individuals choose whether to participate in an investigation after being informed of facts that would be likely to influence their decision’.


This definition raises 4 issues:

– Competence – do participants have the mental or emotional capacity to provide consent?

– Voluntarism – are participants in a situation where they can exercise self-determination?

– Full information – do participants have the information they need to give informed consent?

– Comprehension – do participants understand the potential risk involved?


Why can the principle of informed consent be problematic?

How much information is needed for consent to be ‘informed’?

  • What if it is extremely important that participants not know the true purpose of the study?
  • The trade-off between ethical considerations and methodological considerations is often cast in terms of a conflict of rights.

Considerations to be balanced: respect for human dignity; free and informed consent; vulnerable people; privacy and confidentiality; justice and inclusiveness; harms and benefits; minimizing harm. Research may cause embarrassment, loss of trust in social relations, or lower self-esteem. There can also be cases of risk of physical harm: Rex Brynen interviews people diplomatic in bag for information.


The Cost-benefit Approach:

  • The cost-benefit approach involves weighing the potential contribution to knowledge and human welfare against the potential negative effects on the dignity and welfare of the participants.

This approach can be problematic:

  • The ethical issues involved can be subtle, ambiguous and debatable.
  • We are not necessarily weighing predictable costs and benefits but possible costs and benefits.
  • The process of balancing cost and benefits is necessarily subjective and value-laden.


Milgram’s obedience-to-authority study is the standard example in research ethics. It was the research that triggered the ethical questions.

  • Emotional and psychological stress shapes people’s actions.
  • Milgram test: each time the ‘learner’ gave a wrong answer, the participant was instructed to administer what he or she believed was an electric shock, with the voltage increasing from shock to shock.


TOPIC 22 Observational Methods

What is observational research?

Some advantages of observational research

The trade-offs involved in observational research

Types of observational research

Other drawbacks of observational research


What is Observational Research?

  • It is the direct observation of political behaviour as it occurs in the natural setting. The researcher can study the behaviour as it occurs.
  • Observational research differs from other methods in that it melds data collection and theory generation. The researcher doesn’t come in with carefully formulated hypotheses.
  • Data collection and data analysis are not discrete stages. Instead, the researcher attempts to develop a generalized understanding of an unfolding process over an extended time period, through a blend of induction and deduction.


Some advantages of observational research

  • Flexibility – the researcher can modify the research design in the light of emerging theoretical understandings and/or changes in the situation being studied.
  • Feasibility – no elaborate preparations necessary.
  • Low cost – observational research does not require expensive equipment or staff.
  • Depth of understanding – observational research enables the researcher to develop a comprehensive and nuanced understanding.
  • External validity – behaviour is studied in its natural setting (minimizes or eliminates artificiality).
  • Contextual understanding – the researcher is able to analyse the context in which behaviour occurs.
  • Immediacy – the researcher does not have to rely on participants’ recall.

The Trade-Offs involved in observational research

Ethical considerations, reactivity and access.

  • If people know they are being observed, their behaviour may be affected. They may even refuse permission. BUT if they are observed without their permission or under false pretences in order to avoid the reactivity problem and/or solve the access problem, the research becomes ethically problematic.


Types of observational research I

  • Covert participant observation is intended to solve both the reactivity problem and the access problem. The researcher is either a genuine participant in what is being observed or pretends to be a genuine participant.
  • The researcher’s true identity is unknown to the other participants. They perceive the researcher to be just another participant BUT:
  • This type of observational study raises significant ethical issues (lack of informed consent, deception, violation of privacy).
  • It does not necessarily solve the reactivity problem (the researcher’s own behaviour may affect the behaviour under study).
  • There is a risk of getting caught up in the assumed role.


Types of observational research II

  • Assuming the role of participant-as-observer is intended to resolve the ethical issue, but poses the problem of reactivity.
  • The researcher participates fully in the behaviour under study, but makes it clear that he or she is also undertaking research.
  • The difficulty with this type of research is being accepted in this role. Access may be denied.


Types of observational research III

  • In the role of observer-as-participant, the researcher identifies him or herself as a researcher and makes no pretence of being a participant.
  • There are still the problems of access and reactivity, but there is less risk of getting caught up in the behaviour that is being observed.
  • Finally, there is the role of complete observer. The researcher observes the behaviour without becoming part of it in any way. Typically, the behaviour is being observed in a setting that is regularly open to the public.
  • This role avoids ethical dilemmas and the problems of access and reactivity. The researcher is less likely to lose his or her scholarly perspective, but is also less likely to develop a full appreciation of the behaviour under study.


Other drawbacks of observational research:

  • Unreliability – there are ample opportunities for random error and we cannot be sure that another researcher observing the same behaviour would draw the same conclusion.
  • Lack of generalizability – because of the personal nature of the observations and the potential for biased ‘samples’.
  • Low transmissibility and replicability.


Difference between inferential and descriptive statistics, with an example of each