Part 7: Managing Data
Topic 1: Looking at Data Types
When working with data in Python, it’s good practice to convert date columns to the datetime type and, where needed, to convert columns of numbers that were read in as strings to a numeric type. To see the data type of each column:
print(<df>.dtypes)
To change data types:
<df>[<column>] = <df>[<column>].astype('<datatype>')
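A minimal sketch of both steps, assuming pandas is installed. The DataFrame and its columns (name, enrolled, score) are hypothetical. For date strings it uses pd.to_datetime, which is generally more forgiving than astype when parsing dates.

```python
import pandas as pd

# Hypothetical example data: the dates and scores arrive as strings
df = pd.DataFrame({
    "name": ["Ana", "Ben", "Cara"],
    "enrolled": ["2023-09-01", "2023-09-05", "2024-01-15"],
    "score": ["88", "92", "79"],
})

print(df.dtypes)  # the string columns are not yet numeric or datetime

# Convert the string scores to integers with astype
df["score"] = df["score"].astype("int64")

# Parse the date strings into a datetime64 column
df["enrolled"] = pd.to_datetime(df["enrolled"])

print(df.dtypes)
```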
Dtype | Usage
object | Text, or a mix of numeric and non-numeric values
int64 | Integers
float64 | Floating-point numbers
bool | Boolean True/False values
datetime64[ns] | Date and time values
timedelta64[ns] | Time elapsed between two datetimes
category | A limited, fixed set of text values (e.g., US states, 50 values)
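The sketch below produces several of the dtypes listed in the table from one small, hypothetical DataFrame (the state, gpa, honors, start, and end columns are made up for illustration). It shows two conversions that are easy to miss: astype('category') for a column drawn from a fixed set of labels, and subtracting two datetime columns to get a timedelta column.

```python
import pandas as pd

# Hypothetical data used only to illustrate the dtypes in the table
df = pd.DataFrame({
    "state": ["Ohio", "Texas", "Ohio", "Iowa"],
    "gpa": [3.2, 3.8, 2.9, 3.5],
    "honors": [False, True, False, True],
    "start": pd.to_datetime(["2024-01-08", "2024-01-08", "2024-01-15", "2024-01-22"]),
    "end": pd.to_datetime(["2024-05-03", "2024-05-10", "2024-05-03", "2024-05-17"]),
})

# A column drawn from a small, fixed set of labels can be stored as category
df["state"] = df["state"].astype("category")

# Subtracting two datetime columns produces a timedelta column
df["length"] = df["end"] - df["start"]

print(df.dtypes)
print(df["state"].cat.categories.tolist())  # the distinct state labels
print(df["length"].dt.days.tolist())        # term length in whole days
```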
Topic 2: Concat
Concatenating two DataFrames in Python (with pandas’ pd.concat function) joins the two sets together either vertically or horizontally. When the datasets share the same column headers and are stacked vertically, the values from each set are placed under the matching column. Below we combine the records for four students.
If you do not specify how you would like to concatenate the data, pandas stacks the datasets vertically by default (axis=0), placing the rows of the second dataset below those of the first.
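A minimal sketch of the default behavior, assuming two hypothetical DataFrames named first and second that each hold two students’ records under the same column headers:

```python
import pandas as pd

# Hypothetical records for four students, split across two DataFrames
# that share the same column headers
first = pd.DataFrame({"name": ["Ana", "Ben"], "grade": [88, 92]})
second = pd.DataFrame({"name": ["Cara", "Dev"], "grade": [79, 95]})

# With no axis argument, pd.concat uses axis=0 and stacks the rows vertically
stacked = pd.concat([first, second])
print(stacked)
```

Note that the original row labels are kept, so the resulting index reads 0, 1, 0, 1.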
However, if you would like the data to be placed side by side, specifying axis=1 in the concatenation function puts the second dataset to the right of the first, lining up the rows by their index labels.
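A short sketch of axis=1, assuming two hypothetical DataFrames (names and grades) that both use the default index 0 to 3, so their rows line up:

```python
import pandas as pd

# Hypothetical data: names in one DataFrame, grades in another,
# both indexed 0..3 so the rows line up
names = pd.DataFrame({"name": ["Ana", "Ben", "Cara", "Dev"]})
grades = pd.DataFrame({"grade": [88, 92, 79, 95]})

# axis=1 places the second DataFrame's columns to the right of the first,
# matching rows by index label
side_by_side = pd.concat([names, grades], axis=1)
print(side_by_side)
```

If the two indexes did not match, pd.concat would by default keep every label from both (an outer join) and fill the gaps with NaN.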
The pandas documentation for pd.concat lists the other available options, such as sort (sort the non-concatenation axis), join (keep the union or the intersection of labels on the other axis), ignore_index, and keys.
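A sketch of two of those options, ignore_index and keys, applied to the same kind of hypothetical two-student DataFrames. The labels section_a and section_b are made up for illustration.

```python
import pandas as pd

first = pd.DataFrame({"name": ["Ana", "Ben"], "grade": [88, 92]})
second = pd.DataFrame({"name": ["Cara", "Dev"], "grade": [79, 95]})

# ignore_index=True discards the original row labels and renumbers 0..n-1
renumbered = pd.concat([first, second], ignore_index=True)

# keys labels each input, producing a two-level (MultiIndex) row index
labeled = pd.concat([first, second], keys=["section_a", "section_b"])

print(renumbered)
print(labeled.loc["section_b"])  # only the rows that came from second
```

ignore_index suits data whose row labels carry no meaning, while keys keeps track of which input each row came from.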
Topic 3: Join
Suppose you’re given datasets such as the ones below, where each series is a full record, unlike the previous datasets, which had a separate series for each variable that we then combined.
The join method combines datasets that share the same indexes (row labels), placing their columns side by side and matching rows by index. In the example below, both datasets have the same four indexes, so they can be combined into one set in which each row holds one student’s complete record.
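A minimal sketch using DataFrame.join, assuming two hypothetical DataFrames indexed by made-up student IDs s1 to s4. The grades DataFrame lists its rows in reverse order on purpose, to show that join matches on index labels rather than row position.

```python
import pandas as pd

# Hypothetical student IDs used as the shared index
ids = ["s1", "s2", "s3", "s4"]
info = pd.DataFrame({"name": ["Ana", "Ben", "Cara", "Dev"],
                     "year": [1, 2, 2, 3]}, index=ids)

# Same four IDs, deliberately listed in reverse order
grades = pd.DataFrame({"math": [95, 79, 92, 88],
                       "english": [81, 94, 85, 90]},
                      index=["s4", "s3", "s2", "s1"])

# join aligns rows on the index; the default how="left" keeps info's rows and order
records = info.join(grades)
print(records)
```

If both DataFrames had a column with the same name, join would raise a ValueError unless lsuffix or rsuffix is supplied.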