- Bayesians: Belief question;
- Classical/Error statistics: decision question;
- Likelihood: evidence question;
- AIC: prediction question
e.g. Is a client’s balance sheet acceptable? Check a sample.
e.g. Investment recommendations.
Dividend yield above average -> maybe underpriced.
e.g. Manufacturers purchase sales data to plan promotions.
e.g. x-bar charts to monitor and control a production process.
e.g. Make forecasts.
- Inflation rates
- Producer Price Index (PPI), the unemployment rates, manufacturing capacity utilization.
- Q: What is data? What is not?
Facts and figures. //so being true and complete matters…
(Below is not information, but the features of data itself.)
For any data set, it is very important to understand what it is about: the variables.
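- Nominal (labels only; numeric codes are still nominal data)
- Ordinal (categorical data whose categories can be ranked)
- Interval scale (ordinal properties, plus differences between values are meaningful)
- Ratio scale (interval properties, plus ratios are meaningful because zero means none)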
- Categorical data: can be grouped as categories (See Chapter 2)
- Arithmetic not meaningful.
- Quantitative data (See Chapter 3)
- Arithmetic meaningful.
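
A minimal Python sketch of why the scale of measurement decides which summaries make sense. The records below (zip codes, satisfaction labels, salaries) are made up for illustration and are not from the notes.

```python
from statistics import mean, mode

# Hypothetical data: zip codes are numeric labels (nominal),
# satisfaction labels can be ranked (ordinal), salaries are ratio-scale.
zip_codes = [10001, 94105, 10001, 60601]
satisfaction = ["poor", "good", "excellent", "good"]
salaries = [52000.0, 61000.0, 58000.0, 75000.0]

# Nominal: counting is meaningful (the mode), averaging zip codes is not.
most_common_zip = mode(zip_codes)

# Ordinal: ranking is meaningful, so sort by rank and pick a middle label.
rank = {"poor": 1, "good": 2, "excellent": 3}
ranked = sorted(satisfaction, key=rank.get)
median_label = ranked[len(ranked) // 2]  # upper median for an even count

# Ratio: arithmetic is meaningful, including averages and ratios.
average_salary = mean(salaries)
salary_ratio = max(salaries) / min(salaries)

print(most_common_zip, median_label, average_salary, round(salary_ratio, 2))
```

Nothing stops Python from computing `mean(zip_codes)`; the result simply has no interpretation, which is the point of "even numeric, still nominal data".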
Cross-sectional: data collected at the same (or approximately the same) point in time.
Data companies: Dun & Bradstreet, Bloomberg, and Dow Jones & Company are three firms that provide extensive business database services to clients.
Government agencies: e.g. the U.S. Department of Labor maintains considerable data on employment rates, wage rates, size of the labor force, and union membership.
- identify variables (interests)
- effects between variables
- no attempt to control the variables of interest
Data acquisition: both the requirements and the time cost matter.
Wrong data worse than no data!
Experienced data analysts take great care in collecting and recording data to ensure that errors are not made.
- Error during collection
- Interviewee misinterprets the question;
- Recording error;
- Meaningless value (e.g. a 20-something-year-old recorded with 20 years of work experience);
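
The "meaningless error" case can be caught with a simple consistency check before any analysis. The sketch below is only an illustration: the field names (`age`, `years_experience`), the sample records, and the minimum working age of 16 are all assumptions, not something stated in the notes.

```python
def find_implausible_records(records, min_working_age=16):
    """Return records whose values cannot both be true.

    min_working_age is an assumed threshold for illustration only.
    """
    flagged = []
    for record in records:
        age = record["age"]
        experience = record["years_experience"]
        # Negative values are recording errors; experience beyond
        # (age - min_working_age) is logically impossible.
        if age < 0 or experience < 0 or experience > age - min_working_age:
            flagged.append(record)
    return flagged

# Hypothetical survey records.
records = [
    {"id": 1, "age": 45, "years_experience": 20},
    {"id": 2, "age": 22, "years_experience": 20},  # implausible combination
    {"id": 3, "age": 30, "years_experience": -1},  # recording error
]

flagged_ids = [r["id"] for r in find_implausible_records(records)]
print(flagged_ids)
```

A check like this only flags suspicious records; whether to correct or drop them still requires going back to the source.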
bar chart, histogram, etc.
- Population and sample
- Sample survey
- Statistical inference
- The population mean is unknown; the sample mean is known and is used to estimate it (Sample mean => Population mean)
- Margin of error? Confidence level?
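
A rough sketch of how a sample mean, a margin of error, and a confidence level fit together. It assumes a simple random sample and uses the normal multiplier z = 1.96 for roughly 95% confidence; for a sample this small a t multiplier would be more appropriate. The sample values are made up.

```python
from math import sqrt
from statistics import mean, stdev

def mean_with_margin(sample, z=1.96):
    """Return (sample mean, margin of error) using a normal approximation.

    z = 1.96 corresponds to about 95% confidence (an assumption here).
    """
    n = len(sample)
    x_bar = mean(sample)
    standard_error = stdev(sample) / sqrt(n)  # sample std dev / sqrt(n)
    return x_bar, z * standard_error

# Hypothetical sample drawn from a much larger population.
sample = [76, 81, 79, 84, 80, 78, 82, 80]
x_bar, margin = mean_with_margin(sample)

# Interval estimate of the unknown population mean.
print(f"{x_bar} +/- {margin:.2f}")
```

The interval `x_bar - margin` to `x_bar + margin` is the estimate of the population mean; a higher confidence level means a larger multiplier and therefore a wider margin.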
One definition of data mining:
the automated extraction of predictive information from (large) databases.
Relationships in the data and predicting future outcomes.
Be careful of overfitting.
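
One way to see overfitting is to compare errors on the data a model was built from with errors on held-out data. The sketch below is a toy illustration with made-up numbers: a "memorizer" that returns the value of the nearest training point versus a straight line fitted by least squares. Neither model is taken from the notes.

```python
def fit_line(xs, ys):
    """Ordinary least squares fit of y = intercept + slope * x."""
    n = len(xs)
    x_mean = sum(xs) / n
    y_mean = sum(ys) / n
    sxy = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys))
    sxx = sum((x - x_mean) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = y_mean - slope * x_mean
    return lambda x: intercept + slope * x

def fit_memorizer(xs, ys):
    """Predict the y of the nearest training x (memorizes noise too)."""
    def predict(x):
        nearest = min(range(len(xs)), key=lambda i: abs(xs[i] - x))
        return ys[nearest]
    return predict

def mse(model, xs, ys):
    """Mean squared error of a model on the given points."""
    return sum((model(x) - y) ** 2 for x, y in zip(xs, ys)) / len(xs)

# Made-up training data: trend y = 2x with noise of +1 or -1.
train_x = [1, 2, 3, 4, 5, 6]
train_y = [3, 3, 7, 7, 11, 11]
# Made-up holdout data following the underlying trend y = 2x.
holdout_x = [1.4, 2.6, 3.4, 4.6, 5.4]
holdout_y = [2.8, 5.2, 6.8, 9.2, 10.8]

line = fit_line(train_x, train_y)
memorizer = fit_memorizer(train_x, train_y)

memorizer_train_mse = mse(memorizer, train_x, train_y)
memorizer_holdout_mse = mse(memorizer, holdout_x, holdout_y)
line_train_mse = mse(line, train_x, train_y)
line_holdout_mse = mse(line, holdout_x, holdout_y)

print(memorizer_train_mse, memorizer_holdout_mse)
print(line_train_mse, line_holdout_mse)
```

The memorizer is perfect on the training data and worse than the line on the holdout data, which is the pattern to watch for: a model that looks best on the data it has already seen may predict new outcomes poorly.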
Be fair, thorough, objective, and neutral as you collect data, conduct analyses, make oral presentations, and present written reports containing information developed.
When you see statistics in newspapers, on television, on the Internet, and so on, it is a good idea to view the information with some scepticism, always being aware of the source as well as the purpose and objectivity of the statistics provided.
e.g. Misleading analysis. Bulbs example. Should be stated.
Statistical practitioners should avoid any tendency to slant statistical work toward predetermined outcomes.
e.g. Unrepresentative samples are used to make claims.