Introduction to Data Visualization
Bar Chart: A bar chart or bar graph is a chart or graph that presents categorical data with rectangular bars with heights or lengths proportional to the values that they represent. The bars can be plotted vertically or horizontally. A vertical bar chart is sometimes called a line graph.
A bar graph is a nice way to display categorical data.
Below is the pictorial representation for the bar chart
[codesyntax lang=”python”]
import numpy as np import pandas as pd import matplotlib.pyplot as plt import seaborn as sns
[/codesyntax]
[codesyntax lang=”python”]
prob={"X_axis finite discrete random variables":['x0=1','x1=5','x2=10','x3=10','x4=5','x5=1'],'col2':[1/32,5/32,10/32,10/32,5/32,1/32]} prob=pd.DataFrame(prob) a=sns.barplot(x='X_axis finite discrete random variables',y='col2',data=prob).set_ylabel('1/32 5/32 10/32')
Histogram: A histogram is an accurate representation of the distribution of numerical data. It differs from a bar graph, in the sense that a bar graph relates two variables, but a histogram relates only one. To construct a histogram, the first step is to “bin” (or “bucket”) the range of values—that is, divide the entire range of values into a series of intervals—and then count how many values fall into each interval. The bins are usually specified as consecutive, non-overlapping intervals of a variable. The bins (intervals) must be adjacent, and are often (but are not required to be) of equal size.
A histogram is a best way to display continuous data.
Below is the pictorial representation for the histogram:
[codesyntax lang=”python”]
generated_ages_of_people=(np.linspace(20,30,21).tolist()+np.linspace(30,40,31).tolist() +np.linspace(40,50,40).tolist() +np.linspace(50,60,10).tolist() +np.linspace(60,70,6).tolist() +np.linspace(70,80,4).tolist() +np.linspace(80,90,1).tolist()) data_frame=pd.DataFrame(generated_ages_of_people,columns=['Age']) ax=data_frame.Age.hist() ax.set_xlabel("AGE") ax.set_ylabel("Number of people")
[/codesyntax]
Box whisker plot: A box and whisker plot—also called a box plot—displays the five-number summary of a set of data. The five-number summary is the minimum, first quartile, median, third quartile, and maximum.
In a box plot, we draw a box from the first quartile to the third quartile. A vertical line goes through the box at the median. The whiskers go from each quartile to the minimum or maximum.
Box plots are non-parametric: they display variation in samples of a statistical population without making any assumptions of the underlying statistical distribution.
Here is the example of the box whisker plot:
[codesyntax lang=”python”]
sns.boxplot(x="Age",data=data_frame)
[/codesyntax]
Line plot: This shows the trend of the data. The scale is very important when comparing two or more line plots.
A line chart allows us to track the development of several variables at the same time. It is best to use a line plot when comparing fewer than 25 numbers.
[codesyntax lang=”python”]
monthly_sales={'month':['july','Aug','Sept','Oct','Nov','Dec','Jan','Feb','March','Apr','May','June'],\ 'price':[10,11,10,12,13,12,12.5,17,15,16,17,15], \ 'sales':[12,10,11,13,12,11,10.5,13,14,16,18,19]} sales_df=pd.DataFrame(monthly_sales) plt=sns.lineplot(x='month',y='price',data=sales_df,sort=False,sizes=[1,20]).set(ylim=(0, 20)) plt=sns.lineplot(x='month',y='sales',data=sales_df,sort=False,sizes=[1,20]).set(ylim=(0, 20))
[/codesyntax]
Scatter plot: A scatter plot is a two-dimensional data visualization that uses dots to represent the values obtained for two different variables – one plotted along the x-axis and the other plotted along the y-axis.
Below is the example for the scatter plot:
[codesyntax lang=”python”]
rng = np.random.RandomState(0) x = rng.randn(100) y = rng.randn(100) colors = rng.rand(100) plt.scatter(x, y, c=colors, alpha=0.9, cmap='viridis')
[/codesyntax]
–By
Vamsi Krishna Yadav Chukka
The first 4 options you have mentioned are all BI tools and not necessarily for visualization. You need to create the cubes and the infrastructure to get the reports out of them.