Importing a Data Set for Machine Learning in Python

Day 19 in #100DaysOfCode

After successfully installing Jupyter Notebook, we can get into more detail and make a proper technical start on Machine Learning with Python. Importing a data set is the first step. First, go to Kaggle.com and register. Once you sign in, go to the search bar and type "video game sales", a popular data set that we will use as an example. Then click on the first result that appears.


The data set is distributed as a file with the .csv extension. Click download and unpack the zip file on your computer. There should be a file named "vgsales.csv". Place this file in the same folder as your Jupyter notebook file. Now go back to Jupyter, which runs on a local server at "localhost:8888/notebooks". If you do not have the notebook open already, you can open it from the desktop; see the previous blog post for details. Then enter this code:

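import pandas as pd
# Load the CSV file from the notebook's folder into a DataFrame
df = pd.read_csv('vgsales.csv')
# The last expression in a cell is displayed automatically
df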

Running the cell should display the contents of vgsales.csv as a table.
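To experiment with read_csv before the download is in place, here is a minimal sketch that reads a tiny CSV from memory instead of from disk. The three rows and the names `csv_text` and `sample_df` are made up for illustration and are not taken from vgsales.csv.

```python
import io

import pandas as pd

# Made-up sample rows standing in for a real CSV file (not real sales figures).
csv_text = """Name,Platform,Global_Sales
Game A,Wii,1.50
Game B,NES,0.75
Game C,DS,0.20
"""

# read_csv accepts a file path or any file-like object such as StringIO.
sample_df = pd.read_csv(io.StringIO(csv_text))

# head(n) returns the first n rows, which is handy for a quick look.
first_rows = sample_df.head(2)
print(first_rows)
```

With the real file, the only change would be passing 'vgsales.csv' to read_csv instead of the in-memory text.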

Shape Attribute

Let us look at some useful attributes and methods now, starting with "shape". The shape of a data frame is a tuple of integers giving its size along each dimension. A data frame has two dimensions, so the tuple holds the number of rows followed by the number of columns. Note that shape is an attribute, not a method, so it is written without parentheses.

import pandas as pd
df = pd.read_csv('vgsales.csv')
# shape is an attribute, so there are no parentheses
df.shape

Running the cell prints the shape of the data set, which has 16,598 rows and 11 columns.

(16598, 11)
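Since the shape is an ordinary tuple, it can be unpacked into separate row and column counts. The sketch below uses a small made-up data frame (`toy_df` is a hypothetical name, and its values are placeholders) so that it runs without the CSV file.

```python
import pandas as pd

# A made-up frame with 3 rows and 2 columns, used only for illustration.
toy_df = pd.DataFrame(
    {
        "Name": ["Game A", "Game B", "Game C"],
        "Sales": [1.0, 2.0, 3.0],
    }
)

# Unpack the (rows, columns) tuple into two variables.
n_rows, n_cols = toy_df.shape
print(f"{n_rows} rows, {n_cols} columns")

# len() on a data frame gives the number of rows only.
print(len(toy_df))
```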

Describe Method

The describe() method returns a summary of descriptive statistics for a data frame. By default it covers only the numeric columns and reports, for each one, the count, mean, standard deviation, minimum, maximum, and the 25th, 50th, and 75th percentiles.

Calling df.describe() on our data set gives a table with one column per numeric field and one row per statistic.
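As a small sketch of what describe() produces, the example below runs it on a made-up data frame (the name `toy_df` and its values are placeholders, not vgsales data). It also shows that a text column is left out of the default summary.

```python
import pandas as pd

# Made-up data: one text column and one numeric column.
toy_df = pd.DataFrame(
    {
        "Name": ["Game A", "Game B", "Game C"],
        "Sales": [1.0, 2.0, 3.0],
    }
)

# By default describe() summarizes only the numeric columns.
summary = toy_df.describe()
print(summary)

# Individual statistics can be looked up by row label and column name.
mean_sales = summary.loc["mean", "Sales"]
print(mean_sales)
```

Passing include="all" to describe() would add the text column to the summary as well, with statistics such as the number of unique values.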

Values Attribute

The values attribute returns the contents of the data frame as a two-dimensional NumPy array, without the row index or the column labels. Like shape, it is an attribute, so it is written without parentheses. Because our data set mixes numbers and text, the resulting array has the dtype object, as seen in the example below.

```python
df.values
```

Result:

array([[1, 'Wii Sports', 'Wii', ..., 3.77, 8.46, 82.74],
       [2, 'Super Mario Bros.', 'NES', ..., 6.81, 0.77, 40.24],
       [3, 'Mario Kart Wii', 'Wii', ..., 3.79, 3.31, 35.82],
       ...,
       [16598, 'SCORE International Baja 1000: The Official Game', 'PS2',
        ..., 0.0, 0.0, 0.01],
       [16599, 'Know How 2', 'DS', ..., 0.0, 0.0, 0.01],
       [16600, 'Spirits & Spells', 'GBA', ..., 0.0, 0.0, 0.01]],
      dtype=object)
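The dtype of the array depends on which columns are included. Here is a minimal sketch on a made-up data frame (`toy_df` and its values are placeholders) that compares the array for mixed columns with the array for a numeric column only. It also uses to_numpy(), the method that the pandas documentation recommends over the values attribute.

```python
import pandas as pd

# Made-up data: one text column and one numeric column.
toy_df = pd.DataFrame(
    {
        "Name": ["Game A", "Game B", "Game C"],
        "Sales": [1.0, 2.0, 3.0],
    }
)

# Mixed text and numbers fall back to the generic object dtype.
mixed_array = toy_df.values
print(mixed_array.dtype)

# Selecting only the numeric column keeps a numeric dtype.
numeric_array = toy_df[["Sales"]].to_numpy()
print(numeric_array.dtype)
```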