Part 7: Managing Data

Topic 1: Looking at Data Types

When working with data in Python, it’s good practice to convert date columns to datetime and, where needed, to convert columns of numeric values that are stored as strings. To see the data types:

print(<df>.dtypes)

To change data types:

<df>[<column>] = <df>[<column>].astype('<datatype>')
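As a minimal sketch of the pattern above, the example below uses a small made-up table (the column names and values are assumptions for illustration only). The numeric column is converted with astype, and the date column with pd.to_datetime, which is a common way to get a datetime column from text.

```python
import pandas as pd

# Hypothetical data: every column is read in as text.
df = pd.DataFrame({
    "student": ["Ann", "Ben", "Cara"],
    "score": ["90", "85", "77"],
    "enrolled": ["2021-09-01", "2021-09-02", "2022-01-15"],
})

print(df.dtypes)  # inspect the types before converting

# Convert the text scores to integers.
df["score"] = df["score"].astype("int64")

# Convert the text dates to datetime values.
df["enrolled"] = pd.to_datetime(df["enrolled"])

print(df.dtypes)  # inspect the types after converting
```

Once the columns have the right types, numeric and date operations such as df["score"].mean() or df["enrolled"].dt.year behave as expected.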

 

Dtype             Usage
object            Text or mixed numeric and non-numeric values
int64             Integers
float64           Floats
bool              Boolean True and False values
datetime64        Date and time values
timedelta64[ns]   Time between two datetimes
category          A fixed set of text values (example: states, 50 values)
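The two less familiar entries in the table, category and timedelta, can be sketched as follows. The state names and dates are made up for illustration.

```python
import pandas as pd

# Hypothetical column with a small, repeating set of text values.
states = pd.Series(["Ohio", "Texas", "Ohio", "Utah", "Texas"])
as_category = states.astype("category")
print(as_category.dtype)
print(list(as_category.cat.categories))  # the distinct values that are stored

# Subtracting two datetime columns gives a timedelta column.
start = pd.to_datetime(pd.Series(["2021-01-01", "2021-03-01"]))
end = pd.to_datetime(pd.Series(["2021-01-10", "2021-03-05"]))
elapsed = end - start
print(elapsed.dtype)
print(list(elapsed.dt.days))  # whole days between each pair of dates
```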

Topic 2: Concat

Concatenating two DataFrames in Python essentially forces the two sets to combine either vertically or horizontally. If the datasets have the same column headers, the data from each set is placed into its respective column. Below we combine the records for four students.

If you do not specify how you would like to concatenate the data, pandas will stack the datasets vertically by default (axis=0), placing the rows of the second dataset below the rows of the first.

However, if you would like the data to be grouped side-by-side, then specifying “axis = 1” in the concatenation function will simply put the second dataset to the right of the first.
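The two orientations can be sketched with pd.concat as shown here. The student names and grades are invented for illustration; with no axis argument pd.concat uses axis=0 and adds rows, while axis=1 adds columns.

```python
import pandas as pd

# Hypothetical records for four students, split across two DataFrames.
first = pd.DataFrame({"name": ["Ann", "Ben"], "grade": [90, 85]})
second = pd.DataFrame({"name": ["Cara", "Dev"], "grade": [77, 92]})

# No axis given: rows of `second` are placed below rows of `first`.
stacked = pd.concat([first, second])
print(stacked.shape)  # (4, 2)

# axis=1: columns of `second` are placed to the right of `first`.
side_by_side = pd.concat([first, second], axis=1)
print(side_by_side.shape)  # (2, 4)
```

Note that the stacked result keeps each DataFrame's original row labels, so the index here reads 0, 1, 0, 1.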

The pandas documentation for concat lists the other options available when concatenating, such as sorting the columns and choosing the axis to concatenate along.
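As a rough sketch of a few of those options, the example below uses the ignore_index, sort, and join parameters of pd.concat. The data is made up, and the second DataFrame deliberately has an extra column and a different column order.

```python
import pandas as pd

# Hypothetical data whose columns do not line up exactly.
first = pd.DataFrame({"name": ["Ann", "Ben"], "grade": [90, 85]})
second = pd.DataFrame({"grade": [77], "name": ["Cara"], "age": [15]})

# ignore_index=True renumbers the rows; sort=True sorts the column labels.
combined = pd.concat([first, second], ignore_index=True, sort=True)
print(combined)

# join="inner" keeps only the columns present in both DataFrames.
shared_only = pd.concat([first, second], ignore_index=True, join="inner")
print(shared_only)
```

In the first result, rows that came from a DataFrame without an "age" column are filled with missing values (NaN).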

Topic 3: Join

Suppose you’re given datasets such as the ones below, where each series is a full record, unlike the previous datasets, which had a separate series for each variable that we then combined.

The join method combines datasets that share the same index labels (row headers). In the example below, both datasets have the same four index labels, so they can be combined into one set, resulting in vertically oriented student records.
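A minimal sketch of join on a shared index might look like this. The student names, column names, and values are assumptions made up for illustration.

```python
import pandas as pd

# Hypothetical data: both DataFrames use the same four student names as index.
students = ["Ann", "Ben", "Cara", "Dev"]
grades = pd.DataFrame({"grade": [90, 85, 77, 92]}, index=students)
ages = pd.DataFrame({"age": [15, 16, 15, 17]}, index=students)

# join matches rows by index label and places the columns side by side.
records = grades.join(ages)
print(records)

# Each row is now one student's full record.
print(records.loc["Cara"])
```

By default join keeps every row of the calling DataFrame (a left join), so index labels missing from the other DataFrame would show up with missing values.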

 
