Welcome to Hexo! This is your very first post. Check the documentation for more info. If you run into any problems when using Hexo, you can find the answer in troubleshooting or you can ask me on GitHub.

Quick Start

Create a new post

$ hexo new "My New Post"

More info: Writing

Run server

$ hexo server

More info: Server

Generate static files

$ hexo generate

More info: Generating

Deploy to remote sites

$ hexo deploy

More info: Deployment

STATS: Descriptive Statistics

2.1 Summarizing Categorical Data

nonoverlapping classes”.

Frequency Distribution

Relative Frequency and Percent Frequency Distributions

Bar Charts & Pie Charts

2.2 Summarizing Quantitative Data

Frequency Distribution

The three steps necessary to define the classes for a frequency distribution with quantitative data are:

  1. Determine the number of nonoverlapping classes.
    • 5~20 classes. (depends on the size of the data)
  2. Determine the width of each class.
    • We recommend that the width be the same for each class.
    • Approximate class width = (Largest data value - Smallest data value) / Number of classes (can be rounded for convenience)
  3. Determine the class limits.
    (optional: class midpoint).
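
The three steps can be sketched in Python. This is a minimal sketch, not a definitive implementation: the function name `frequency_distribution` and the sample values are made up for illustration, the width is rounded up to a whole number, the first class is assumed to start at the smallest data value, and each class is treated as the half-open interval [lower, upper).

```python
import math

def frequency_distribution(data, num_classes):
    """Group quantitative data into equal-width, nonoverlapping classes."""
    smallest, largest = min(data), max(data)
    # Step 2: approximate width = (largest - smallest) / number of classes,
    # rounded up to the next whole number so the largest value fits.
    width = math.floor((largest - smallest) / num_classes) + 1
    counts = [0] * num_classes
    for x in data:
        counts[int((x - smallest) // width)] += 1
    n = len(data)
    table = []
    for i, count in enumerate(counts):
        lower = smallest + i * width  # Step 3: class limits
        table.append({
            "lower": lower,
            "upper": lower + width,          # exclusive upper limit
            "midpoint": lower + width / 2,   # optional class midpoint
            "frequency": count,
            "relative": count / n,           # relative frequency
            "percent": 100 * count / n,      # percent frequency
        })
    return table

# Illustrative values only (assumption, not data from these notes).
sample = [12, 14, 19, 18, 15, 15, 18, 17, 20, 27,
          22, 23, 22, 21, 33, 28, 14, 18, 16, 13]
for row in frequency_distribution(sample, 5):
    print(row["lower"], row["upper"], row["frequency"], row["percent"])
```

With 5 classes the range 33 - 12 = 21 gives an approximate width of 4.2, which the sketch rounds up to 5.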

Relative Frequency and Percent Frequency Distributions

Dot Plot


  • Skewed left / skewed right / symmetric

Cumulative Distributions

(<= upper limit of each class)
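
A cumulative distribution is a running total of the class frequencies: each entry counts the observations less than or equal to that class's upper limit. A small sketch, assuming the class upper limits and frequencies are already known (the function name and the numbers in the usage line are hypothetical):

```python
from itertools import accumulate

def cumulative_distribution(upper_limits, frequencies):
    """Return (upper limit, cumulative frequency, cumulative relative frequency)."""
    total = sum(frequencies)
    running = accumulate(frequencies)  # running total of class counts
    return [(u, c, c / total) for u, c in zip(upper_limits, running)]

# Hypothetical classes with upper limits 14, 19, 24, 29, 34.
for row in cumulative_distribution([14, 19, 24, 29, 34], [4, 8, 5, 2, 1]):
    print(row)
```

The last cumulative frequency always equals the number of observations, and the last cumulative relative frequency is 1.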


2.3 Exploratory Data Analysis: The Stem-and-Leaf Display

2.4 Crosstabulations and Scatter Diagrams


Statistics for Business

Chapter 1 Data and Statistics

Philosophy of Statistics

  • Bayesians: Belief question;
  • Classical/Error statistics: decision question;
  • Likelihood: evidence question;
  • AIC: prediction question

Applications in Business and Economics


e.g. Client’s balance sheet acceptable? Sample.


e.g. Investment recommendations.
Dividend yield above average -> maybe underpriced.


e.g. Manufacturers purchase sales data -> promotions.


e.g. An x-bar chart to control a production process.


e.g. Make forecasts.

  • Inflation rates
  • Producer Price Index (PPI), the unemployment rates, manufacturing capacity utilization.


  • Q: What is data? What is not?

Facts and figures. // so being true and complete matters…

(Below is not information, but the features of data itself.)

Elements, Variables, and Observations

For any data set, it is very important to understand what it is about: the variables.

Scales of Measurement

  1. Nominal (even numeric, still nominal data)
  2. Ordinal (properties of nominal data + can be ranked)
  3. Interval scale (interval meaningful)
  4. Ratio scale

Categorical and Quantitative Data

  • Categorical data: can be grouped as categories (See Chapter 2)
    • Arithmetic not meaningful.
  • Quantitative data (See Chapter 3)
    • Arithmetic meaningful.

Cross-Sectional and Time Series Data

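Cross-sectional: collected at the same (or approximately the same) point in time.
Time series: collected over several time periods.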

Data Sources

Existing Sources

Data companies: Dun & Bradstreet, Bloomberg, and Dow Jones & Company are three firms that provide extensive business database services to clients.


Government agencies: e.g. the U.S. Department of Labor maintains considerable data on employment rates, wage rates, size of the labor force, and union membership.

Statistical Studies

  • Experimental
    • identify variables (interests)
    • effects between variables
    • control
  • Observational
    • no attempt to control the variables of interest

Data acquisition: requirements + time cost matters.

Data Acquisition Errors

Wrong data can be worse than no data!
Experienced data analysts take great care in collecting and recording data to ensure that errors are not made.

  • Error during collection
    • Interviewee misinterprets the questions;
    • Recording error;
    • Outliers;
    • Implausible values (e.g. a person in their early 20s reporting 20 years of work experience).
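
A check for the last kind of error can be automated. The sketch below is an illustration only: the field names `age` and `experience_years` and the minimum working age of 16 are assumptions, not part of the notes.

```python
def implausible_experience(records, min_working_age=16):
    """Return records whose work experience exceeds what their age allows.

    min_working_age is an assumed threshold; adjust it for the data at hand.
    """
    return [
        r for r in records
        if r["experience_years"] > r["age"] - min_working_age
    ]

# Hypothetical survey records.
survey = [
    {"age": 22, "experience_years": 20},  # cannot be right
    {"age": 45, "experience_years": 20},  # plausible
]
print(implausible_experience(survey))
```

Flagged records still need a human decision: correct the value, confirm it with the source, or drop it.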

Descriptive Statistics

bar chart, histogram, etc.

Statistical Inferences

  • Population and sample
  • Census
  • Sample survey
  • Statistical inference
    • The population mean is unknown; the sample mean is known and is used to estimate it (sample mean => population mean)
    • Margin of error? Confidence level?
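
To make the two questions concrete: a sample mean is usually reported with a margin of error at a chosen confidence level. The sketch below assumes a normal approximation (z * s / sqrt(n)), which is reasonable only for fairly large samples; for small samples a t distribution is the usual choice. The function name and sample values are hypothetical.

```python
import math
from statistics import NormalDist, mean, stdev

def mean_with_margin(sample, confidence=0.95):
    """Return (sample mean, margin of error) under a normal approximation."""
    n = len(sample)
    x_bar = mean(sample)
    s = stdev(sample)  # sample standard deviation (n - 1 in the denominator)
    # Two-sided critical value, about 1.96 for 95% confidence.
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    return x_bar, z * s / math.sqrt(n)

# Hypothetical sample; the interval is x_bar - margin to x_bar + margin.
x_bar, margin = mean_with_margin([10, 12, 14, 16, 18])
print(x_bar, margin)
```

A higher confidence level gives a larger margin of error; a larger sample gives a smaller one.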

Computers and Statistical Analysis

Data Mining

data warehousing
One definition of data mining:

the automated extraction of predictive information from (large) databases.

Relationships in the data and predicting future outcomes.
Be careful about overfitting.


Be fair, thorough, objective, and neutral as you collect data, conduct analyses, make oral presentations, and present written reports containing information developed.

When you see statistics in newspapers, on television, on the Internet, and so on, it is a good idea to view the information with some scepticism, always being aware of the source as well as the purpose and objectivity of the statistics provided.

e.g. Misleading analysis. Bulbs example. Should stated.

Statistical practitioners should avoid any tendency to slant statistical work toward predetermined outcomes.
e.g. Unrepresentative samples are used to make claims.

Most commonly used Linux commands


Local files -> Local Git Repo

cd project_folder
git init
git add .    # or: git add aFile
git commit -m "Initial commit"