Part 5: Importing Data and Writing Files

Packages

Reading in files has become much easier as Python and its packages have matured. To work with csv and xlsx files, the easiest package to use is pandas, because it can edit and subset the data and then write the edited data back to a csv or Excel file. Install and import the pandas package. Here’s the documentation where you can see everything pandas can currently do.

>>> import pandas as pd

This is the standard statement you will see whenever you work with pandas. It imports the pandas package and gives it the short alias pd. Instead of typing pandas.read_csv("somefile.csv"), you can just write pd.read_csv("somefile.csv").

Topic 1: Reading .csv, .txt, .xlsx Files

Reading in .csv Using read_csv() Function

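To read in a csv file, use the command pd.read_csv(PATH). The path is the location of the file on your computer; on Windows it tends to look like "C:\Users\yourname\etc…". Inside an ordinary Python string you have to write each backslash twice ("C:\\Users\\yourname\\etc…"), because a single backslash starts an escape sequence. A raw string (the same path with an r in front of the opening quote) or forward slashes also work. This applies any time you use a Windows path. The only time you can give just the file name instead of the full path is when the file is in the directory Python is running from, which is usually the folder that holds your Python file.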
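As a minimal sketch of the call described above, the example below writes a small sample file to a temporary folder and reads it back with pd.read_csv. The file name, column names, and the Windows path at the end are hypothetical and only there for illustration.

```python
import tempfile
from pathlib import Path

import pandas as pd

# Hypothetical sample data, written to a temporary folder so the
# example does not depend on any file already on your computer.
folder = Path(tempfile.mkdtemp())
csv_path = folder / "somefile.csv"
csv_path.write_text("name,score\nAda,90\nGrace,85\n")

# read_csv accepts a path string or a pathlib.Path object.
df = pd.read_csv(csv_path)
print(df)

# Two equivalent ways to spell the same (hypothetical) Windows path:
escaped = "C:\\Users\\yourname\\data\\somefile.csv"  # doubled backslashes
raw = r"C:\Users\yourname\data\somefile.csv"         # raw string
print(escaped == raw)
```

Building paths with pathlib, as done for csv_path here, avoids the backslash question altogether because the separators are handled for you.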

Reading in .txt Using open() Function

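To read in a txt file you can use the built-in open function. The three modes you will use most are "r" (read), "w" (write), and "a" (append). Read mode limits you to opening the file and reading what’s inside without being able to change or add to it. Write mode creates the file, or erases its contents if it already exists. Append mode adds new text to the end of the file.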

Once the file is open, you read it with the read() and readlines() methods of the file object.

The read method returns everything in the file as one string, but you can pass it a number to limit how many characters it returns. The readlines method returns a list with one string per line. In the output you will see “\n”, which marks a new line. There are other escape characters like it, but they’re not important to know at this moment.

Whenever you open a file, make sure you close it when you are done, for example with data.close() if your file object is named data. An open file ties up memory and system resources, and if you were writing to it, terminating the program before closing it can leave the file incomplete or corrupted.
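The open, read, and close steps can be sketched as follows. The file name notes.txt and its contents are hypothetical; the variable name data matches the one used in the text. The last part uses a with block, which is a common alternative that closes the file for you.

```python
import tempfile
from pathlib import Path

# Hypothetical sample file in a temporary folder so the example runs anywhere.
txt_path = Path(tempfile.mkdtemp()) / "notes.txt"

# "w" creates the file (or overwrites an existing one).
data = open(txt_path, "w")
data.write("first line\nsecond line\n")
data.close()

# "a" appends to the end of the file instead of overwriting it.
data = open(txt_path, "a")
data.write("third line\n")
data.close()

# "r" is read-only. read(5) returns the next 5 characters,
# and read() with no argument returns whatever is left.
data = open(txt_path, "r")
first_five = data.read(5)
rest = data.read()
data.close()

# readlines() returns a list with one string per line, each ending in "\n".
# The with block closes the file automatically, even if an error occurs.
with open(txt_path, "r") as data:
    lines = data.readlines()

print(repr(first_five))
print(lines)
```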

Reading in .xlsx Using read_excel() Function

Reading in an Excel sheet works much like reading in a csv, except that you use pd.read_excel(PATH). pandas reads the first sheet by default; to import a different one, pass its name or zero-based index through the sheet_name argument.
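Here is a rough sketch of choosing a sheet with pd.read_excel. It first builds a small two-sheet workbook so there is something to read; the file name, sheet names, and data are all hypothetical. It also assumes an Excel engine such as openpyxl is installed alongside pandas, since pandas relies on one for xlsx files.

```python
import tempfile
from pathlib import Path

import pandas as pd

# Build a small two-sheet workbook first (hypothetical names and data).
xlsx_path = Path(tempfile.mkdtemp()) / "somefile.xlsx"
with pd.ExcelWriter(xlsx_path) as writer:
    pd.DataFrame({"name": ["Ada", "Grace"]}).to_excel(
        writer, sheet_name="people", index=False
    )
    pd.DataFrame({"city": ["London"]}).to_excel(
        writer, sheet_name="places", index=False
    )

# With no sheet_name, pandas reads the first sheet.
by_default = pd.read_excel(xlsx_path)

# A sheet can be picked by name or by zero-based position.
by_name = pd.read_excel(xlsx_path, sheet_name="places")
by_index = pd.read_excel(xlsx_path, sheet_name=1)

print(by_default)
print(by_name)
```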

Topic 2: Reading in Data From URLs

To read in data from a URL you can use the urllib package, which is part of Python’s standard library. It lets you fetch files from the internet without leaving your programming environment.

Using the urlretrieve function from urllib.request, you can save the csv at a URL to a local file and then read that file into your environment.
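The retrieve-then-read pattern might look like the sketch below. To keep it runnable offline, a local csv exposed as a file:// URL stands in for the remote file; in practice the url variable would hold an http(s) address. The file names and data are hypothetical.

```python
import tempfile
import urllib.request
from pathlib import Path

import pandas as pd

folder = Path(tempfile.mkdtemp())

# Stand-in for a remote file: a local csv turned into a file:// URL.
# With a real source, url would be an http(s) address instead.
source = folder / "remote.csv"
source.write_text("x,y\n1,2\n3,4\n")
url = source.as_uri()

# urlretrieve copies the contents at the URL into a local file and
# returns that file's path together with the response headers.
local_path, headers = urllib.request.urlretrieve(
    url, str(folder / "downloaded.csv")
)

# The saved copy can now be read like any other csv file.
df = pd.read_csv(local_path)
print(df)
```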

Topic 3: Writing Data to .csv, .xlsx

Writing out data is pretty simple when it comes to csv files. DataFrame objects have a to_csv method. With it you can choose the path and name of the output file, along with a whole lot of other options you can look at in the documentation here.
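A short sketch of to_csv, assuming a small made-up DataFrame and a hypothetical output name edited.csv. It passes index=False, one of the optional arguments, so that the row labels are not written as an extra column.

```python
import tempfile
from pathlib import Path

import pandas as pd

# Hypothetical data standing in for your edited DataFrame.
df = pd.DataFrame({"name": ["Ada", "Grace"], "score": [90, 85]})

# The path sets both the folder and the file name of the output.
out_path = Path(tempfile.mkdtemp()) / "edited.csv"

# index=False leaves out the row labels (0, 1, ...),
# which to_csv writes by default.
df.to_csv(out_path, index=False)

# Read the file back to check what was written.
round_trip = pd.read_csv(out_path)
print(round_trip)
```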

Just like with csv files, DataFrames have a to_excel method. You can also give it a sheet name; if you don’t, pandas names the sheet Sheet1.
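Writing to Excel can be sketched the same way. The data, the file name edited.xlsx, and the sheet name results are hypothetical, and the example assumes an Excel engine such as openpyxl is installed, since pandas needs one to write xlsx files.

```python
import tempfile
from pathlib import Path

import pandas as pd

# Hypothetical data standing in for your edited DataFrame.
df = pd.DataFrame({"name": ["Ada", "Grace"], "score": [90, 85]})

out_path = Path(tempfile.mkdtemp()) / "edited.xlsx"

# sheet_name sets the name of the sheet that holds the data.
df.to_excel(out_path, sheet_name="results", index=False)

# Reopen the workbook to confirm which sheets it contains.
with pd.ExcelFile(out_path) as workbook:
    sheet_names = workbook.sheet_names

print(sheet_names)
```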

Got it Down? Click here for Part 6!