
# How To Load Machine Learning Data From Files In Python

28 Apr 2020

CSV (comma-separated values) is the most common file format for Machine Learning data. In this tutorial I show four different ways to load data from such files and prepare it for further use. I also show some best practices for setting the correct data type, handling missing values, and dealing with an optional header row. The four approaches are:

• with the `csv` module
• with `numpy`: `np.loadtxt()`
• with `numpy`: `np.genfromtxt()`
• with `pandas`: `pd.read_csv()`
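As a rough sketch of how these approaches can deal with a header row and a missing value, the example below runs each of them on a small in-memory CSV. The sample data, the variable names, and the choice to fill the missing value with `0.0` are assumptions made for illustration only and are not taken from the tutorial's dataset.

```python
import csv
import io

import numpy as np
import pandas as pd

# Hypothetical sample: a header row plus one missing value (second data row, column f2).
CSV_TEXT = "f1,f2,label\n1.0,2.0,0\n3.0,,1\n5.0,6.0,0\n"
# Same data without the gap, used for np.loadtxt(), which is given complete rows here.
CLEAN_TEXT = CSV_TEXT.replace("3.0,,1", "3.0,4.0,1")

# 1) csv module: skip the header manually and map empty fields to NaN.
reader = csv.reader(io.StringIO(CSV_TEXT), delimiter=",")
header = next(reader)
rows = [[float(v) if v else np.nan for v in row] for row in reader]
data_csv = np.array(rows, dtype=np.float32)

# 2) np.loadtxt(): skiprows=1 skips the header row.
data_loadtxt = np.loadtxt(
    io.StringIO(CLEAN_TEXT), delimiter=",", skiprows=1, dtype=np.float32
)

# 3) np.genfromtxt(): skip_header=1 skips the header row,
#    filling_values replaces empty fields.
data_genfromtxt = np.genfromtxt(
    io.StringIO(CSV_TEXT),
    delimiter=",",
    skip_header=1,
    filling_values=0.0,
    dtype=np.float32,
)

# 4) pd.read_csv(): header=0 uses the first row as column names;
#    missing fields become NaN and are filled afterwards.
df = pd.read_csv(io.StringIO(CSV_TEXT), header=0, dtype=np.float32)
df = df.fillna(0.0)
data_pandas = df.to_numpy()

print(header)
print(data_csv.shape, data_loadtxt.shape, data_genfromtxt.shape, data_pandas.shape)
```

With a real file, the `io.StringIO(...)` argument would simply be replaced by the file name or an open file object.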


The code and all Machine Learning tutorials can be found on GitHub.

``````
import csv
import numpy as np
import pandas as pd

FILE_NAME = "spambase.data"

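# 1) load with the csv module
with open(FILE_NAME, 'r') as f:
    # read all rows as lists of strings
    data = list(csv.reader(f, delimiter=","))

# convert the strings to a float array
data = np.array(data, dtype=np.float32)
print(data.shape)

# 2) load with np.loadtxt()
# pass skiprows=1 if the file has a header row
data = np.loadtxt(FILE_NAME, delimiter=",", dtype=np.float32)
print(data.shape, data.dtype)

# 3) load with np.genfromtxt()
# can handle missing values; pass skip_header=1 if the file has a header row
data = np.genfromtxt(FILE_NAME, delimiter=",", dtype=np.float32)
print(data.shape)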

# split into X and y
n_samples, n_features = data.shape
n_features -= 1
X = data[:, 0:n_features]
y = data[:, n_features]
print(X.shape, y.shape)
print(X[0, 0:5])
# or if y is the first column
# X = data[:, 1:n_features+1]
# y = data[:, 0]