Project 1


    • Sushant Ovhal

      updated on 17 Oct 2022

    import numpy as np
    import pandas as pd
    import matplotlib.pyplot as plt
    import seaborn as sns
     
    auto= pd.read_csv("auto_clean.csv")
    print(auto)
         symboling  normalized-losses         make aspiration num-of-doors  \
    0            3                122  alfa-romero        std          two   
    1            3                122  alfa-romero        std          two   
    2            1                122  alfa-romero        std          two   
    3            2                164         audi        std         four   
    4            2                164         audi        std         four   
    ..         ...                ...          ...        ...          ...   
    196         -1                 95        volvo        std         four   
    197         -1                 95        volvo      turbo         four   
    198         -1                 95        volvo        std         four   
    199         -1                 95        volvo      turbo         four   
    200         -1                 95        volvo      turbo         four   
    
          body-style drive-wheels engine-location  wheel-base    length  ...  \
    0    convertible          rwd           front        88.6  0.811148  ...   
    1    convertible          rwd           front        88.6  0.811148  ...   
    2      hatchback          rwd           front        94.5  0.822681  ...   
    3          sedan          fwd           front        99.8  0.848630  ...   
    4          sedan          4wd           front        99.4  0.848630  ...   
    ..           ...          ...             ...         ...       ...  ...   
    196        sedan          rwd           front       109.1  0.907256  ...   
    197        sedan          rwd           front       109.1  0.907256  ...   
    198        sedan          rwd           front       109.1  0.907256  ...   
    199        sedan          rwd           front       109.1  0.907256  ...   
    200        sedan          rwd           front       109.1  0.907256  ...   
    
         compression-ratio  horsepower  peak-rpm city-mpg highway-mpg    price  \
    0                  9.0       111.0    5000.0       21          27  13495.0   
    1                  9.0       111.0    5000.0       21          27  16500.0   
    2                  9.0       154.0    5000.0       19          26  16500.0   
    3                 10.0       102.0    5500.0       24          30  13950.0   
    4                  8.0       115.0    5500.0       18          22  17450.0   
    ..                 ...         ...       ...      ...         ...      ...   
    196                9.5       114.0    5400.0       23          28  16845.0   
    197                8.7       160.0    5300.0       19          25  19045.0   
    198                8.8       134.0    5500.0       18          23  21485.0   
    199               23.0       106.0    4800.0       26          27  22470.0   
    200                9.5       114.0    5400.0       19          25  22625.0   
    
        city-L/100km  horsepower-binned  diesel  gas  
    0      11.190476             Medium       0    1  
    1      11.190476             Medium       0    1  
    2      12.368421             Medium       0    1  
    3       9.791667             Medium       0    1  
    4      13.055556             Medium       0    1  
    ..           ...                ...     ...  ...  
    196    10.217391             Medium       0    1  
    197    12.368421               High       0    1  
    198    13.055556             Medium       0    1  
    199     9.038462             Medium       1    0  
    200    12.368421             Medium       0    1  
    
    [201 rows x 29 columns]
    auto.info()
    <class 'pandas.core.frame.DataFrame'>
    RangeIndex: 201 entries, 0 to 200
    Data columns (total 29 columns):
     #   Column             Non-Null Count  Dtype  
    ---  ------             --------------  -----  
     0   symboling          201 non-null    int64  
     1   normalized-losses  201 non-null    int64  
     2   make               201 non-null    object 
     3   aspiration         201 non-null    object 
     4   num-of-doors       201 non-null    object 
     5   body-style         201 non-null    object 
     6   drive-wheels       201 non-null    object 
     7   engine-location    201 non-null    object 
     8   wheel-base         201 non-null    float64
     9   length             201 non-null    float64
     10  width              201 non-null    float64
     11  height             201 non-null    float64
     12  curb-weight        201 non-null    int64  
     13  engine-type        201 non-null    object 
     14  num-of-cylinders   201 non-null    object 
     15  engine-size        201 non-null    int64  
     16  fuel-system        201 non-null    object 
     17  bore               201 non-null    float64
     18  stroke             197 non-null    float64
     19  compression-ratio  201 non-null    float64
     20  horsepower         201 non-null    float64
     21  peak-rpm           201 non-null    float64
     22  city-mpg           201 non-null    int64  
     23  highway-mpg        201 non-null    int64  
     24  price              201 non-null    float64
     25  city-L/100km       201 non-null    float64
     26  horsepower-binned  200 non-null    object 
     27  diesel             201 non-null    int64  
     28  gas                201 non-null    int64  
    dtypes: float64(11), int64(8), object(10)
    memory usage: 45.7+ KB
    auto[auto == '?']
      symboling normalized-losses make aspiration num-of-doors body-style drive-wheels engine-location wheel-base length ... compression-ratio horsepower peak-rpm city-mpg highway-mpg price city-L/100km horsepower-binned diesel gas
    0 NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN ... NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN
    1 NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN ... NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN
    2 NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN ... NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN
    3 NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN ... NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN
    4 NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN ... NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN
    ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ...
    196 NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN ... NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN
    197 NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN ... NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN
    198 NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN ... NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN
    199 NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN ... NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN
    200 NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN ... NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN

    201 rows × 29 columns
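The all-NaN frame above suggests that no cell holds a literal '?'. A more compact check is to count the placeholders per column. The following is a minimal sketch on a hypothetical toy frame (`toy` and its values are assumptions for illustration, not rows from `auto_clean.csv`):

```python
import pandas as pd

# Hypothetical toy frame; the real data would come from auto_clean.csv.
toy = pd.DataFrame({
    "make": ["audi", "bmw", "?"],
    "price": ["13950", "?", "16500"],
    "doors": ["four", "two", "four"],
})

# Boolean mask of '?' cells, summed column-wise to give a count per column.
placeholder_counts = (toy == "?").sum()
print(placeholder_counts)

# Total number of placeholder cells in the whole frame.
total_placeholders = int(placeholder_counts.sum())
print(total_placeholders)
```

A total of zero on the real frame would confirm that the `replace('?', ...)` step has nothing left to convert.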

    # Treat any '?' placeholder as a missing value
    autoclean = auto.replace('?', np.nan)
    autoclean
      symboling normalized-losses make aspiration num-of-doors body-style drive-wheels engine-location wheel-base length ... compression-ratio horsepower peak-rpm city-mpg highway-mpg price city-L/100km horsepower-binned diesel gas
    0 3 122 alfa-romero std two convertible rwd front 88.6 0.811148 ... 9.0 111.0 5000.0 21 27 13495.0 11.190476 Medium 0 1
    1 3 122 alfa-romero std two convertible rwd front 88.6 0.811148 ... 9.0 111.0 5000.0 21 27 16500.0 11.190476 Medium 0 1
    2 1 122 alfa-romero std two hatchback rwd front 94.5 0.822681 ... 9.0 154.0 5000.0 19 26 16500.0 12.368421 Medium 0 1
    3 2 164 audi std four sedan fwd front 99.8 0.848630 ... 10.0 102.0 5500.0 24 30 13950.0 9.791667 Medium 0 1
    4 2 164 audi std four sedan 4wd front 99.4 0.848630 ... 8.0 115.0 5500.0 18 22 17450.0 13.055556 Medium 0 1
    ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ...
    196 -1 95 volvo std four sedan rwd front 109.1 0.907256 ... 9.5 114.0 5400.0 23 28 16845.0 10.217391 Medium 0 1
    197 -1 95 volvo turbo four sedan rwd front 109.1 0.907256 ... 8.7 160.0 5300.0 19 25 19045.0 12.368421 High 0 1
    198 -1 95 volvo std four sedan rwd front 109.1 0.907256 ... 8.8 134.0 5500.0 18 23 21485.0 13.055556 Medium 0 1
    199 -1 95 volvo turbo four sedan rwd front 109.1 0.907256 ... 23.0 106.0 4800.0 26 27 22470.0 9.038462 Medium 1 0
    200 -1 95 volvo turbo four sedan rwd front 109.1 0.907256 ... 9.5 114.0 5400.0 19 25 22625.0 12.368421 Medium 0 1

    201 rows × 29 columns

    auto.isna().sum()
    null=auto.isnull().any(axis = 1)
    nullvalue=null.index[null.values]
    nullvalue
    Int64Index([46, 52, 53, 54, 55], dtype='int64')
    # nullvalue holds index labels, so select by label rather than by position
    missingrows = auto.loc[nullvalue, :]
    missingrows
     
        symboling  normalized-losses    make  aspiration  num-of-doors  body-style  \
    46          0                122  jaguar         std           two       sedan
    52          3                150   mazda         std           two   hatchback
    53          3                150   mazda         std           two   hatchback
    54          3                150   mazda         std           two   hatchback
    55          3                150   mazda         std           two   hatchback

        drive-wheels  engine-location  wheel-base    length  ...  compression-ratio  \
    46           rwd            front       102.0  0.921192  ...               11.5
    52           rwd            front        95.3  0.812110  ...                9.4
    53           rwd            front        95.3  0.812110  ...                9.4
    54           rwd            front        95.3  0.812110  ...                9.4
    55           rwd            front        95.3  0.812110  ...                9.4

        horsepower  peak-rpm  city-mpg  highway-mpg    price  city-L/100km  \
    46       262.0    5000.0        13           17  36000.0     18.076923
    52       101.0    6000.0        17           23  10945.0     13.823529
    53       101.0    6000.0        17           23  11845.0     13.823529
    54       101.0    6000.0        17           23  13645.0     13.823529
    55       135.0    6000.0        16           23  15645.0     14.687500

        horsepower-binned  diesel  gas
    46                NaN       0    1
    52                Low       0    1
    53                Low       0    1
    54                Low       0    1
    55             Medium       0    1

    5 rows × 29 columns
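The rows with at least one missing value can also be selected directly with a boolean mask, without building an index of labels first. This is a minimal sketch on a hypothetical toy frame (`toy`, `has_missing` and `rows_with_missing` are illustrative names, not part of the project code):

```python
import numpy as np
import pandas as pd

# Hypothetical toy frame standing in for the real data.
toy = pd.DataFrame({
    "stroke": [3.4, np.nan, 3.1],
    "horsepower-binned": ["Low", "Medium", None],
    "price": [10945.0, 15645.0, 36000.0],
})

# Boolean row mask: True where at least one column is missing.
has_missing = toy.isna().any(axis=1)

# Boolean indexing keeps the original index labels of the selected rows.
rows_with_missing = toy[has_missing]
print(rows_with_missing)
```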

    auto.isnull().all(axis=0).sum()
    0
    auto.isnull().all(axis=1).sum()
    0 
    round(auto.isnull().sum().sort_values(ascending =False)/len(auto) *100,2)
    stroke               1.99
    horsepower-binned    0.50
    symboling            0.00
    engine-size          0.00
    diesel               0.00
    city-L/100km         0.00
    price                0.00
    highway-mpg          0.00
    city-mpg             0.00
    peak-rpm             0.00
    horsepower           0.00
    compression-ratio    0.00
    bore                 0.00
    fuel-system          0.00
    num-of-cylinders     0.00
    normalized-losses    0.00
    engine-type          0.00
    curb-weight          0.00
    height               0.00
    width                0.00
    length               0.00
    wheel-base           0.00
    engine-location      0.00
    drive-wheels         0.00
    body-style           0.00
    num-of-doors         0.00
    aspiration           0.00
    make                 0.00
    gas                  0.00
    dtype: float64
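The percentage of missing values per column can be written a little more directly, because the mean of a boolean mask is already the fraction of `True` cells. A minimal sketch, assuming a hypothetical toy frame rather than the project data:

```python
import numpy as np
import pandas as pd

# Hypothetical toy frame; column names mirror the dataset, values are made up.
toy = pd.DataFrame({
    "stroke": [3.4, np.nan, 3.1, 2.9],
    "bore": [3.2, 3.5, 3.0, 3.6],
    "horsepower-binned": ["Low", None, None, "High"],
})

# isna() gives a boolean frame; its column mean is the missing fraction.
missing_pct = (
    toy.isna().mean().mul(100).round(2).sort_values(ascending=False)
)
print(missing_pct)
```

This should give the same numbers as dividing `isnull().sum()` by `len(...)`; it only avoids the explicit division.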
    auto[['normalized-losses','price','peak-rpm','horsepower']] = auto[['normalized-losses','price','peak-rpm','horsepower']].apply(pd.to_numeric)
    auto[['normalized-losses','price','peak-rpm','horsepower']].describe()
     
           normalized-losses         price     peak-rpm  horsepower
    count          201.00000    201.000000   201.000000  201.000000
    mean           122.00000  13207.129353  5117.665368  103.405534
    std             31.99625   7947.066342   478.113805   37.365700
    min             65.00000   5118.000000  4150.000000   48.000000
    25%            101.00000   7775.000000  4800.000000   70.000000
    50%            122.00000  10295.000000  5125.369458   95.000000
    75%            137.00000  16500.000000  5500.000000  116.000000
    max            256.00000  45400.000000  6600.000000  262.000000
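    # Assign the result back; an inplace fillna on a .loc selection may only change a temporary copy
    auto['normalized-losses'] = auto['normalized-losses'].fillna(auto['normalized-losses'].mean())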
    round(auto.isnull().sum().sort_values(ascending = False)/len(auto) * 100,2)
    stroke               1.99
    horsepower-binned    0.50
    symboling            0.00
    engine-size          0.00
    diesel               0.00
    city-L/100km         0.00
    price                0.00
    highway-mpg          0.00
    city-mpg             0.00
    peak-rpm             0.00
    horsepower           0.00
    compression-ratio    0.00
    bore                 0.00
    fuel-system          0.00
    num-of-cylinders     0.00
    normalized-losses    0.00
    engine-type          0.00
    curb-weight          0.00
    height               0.00
    width                0.00
    length               0.00
    wheel-base           0.00
    engine-location      0.00
    drive-wheels         0.00
    body-style           0.00
    num-of-doors         0.00
    aspiration           0.00
    make                 0.00
    gas                  0.00
    dtype: float64
    # Fill the remaining numeric gaps with each column's mean
    for col in ['price', 'stroke', 'bore', 'peak-rpm', 'horsepower']:
        auto[col] = auto[col].fillna(auto[col].mean())
    round(auto.isnull().sum().sort_values(ascending = False)/len(auto) * 100,2)
     
    horsepower-binned    0.5
    symboling            0.0
    engine-size          0.0
    diesel               0.0
    city-L/100km         0.0
    price                0.0
    highway-mpg          0.0
    city-mpg             0.0
    peak-rpm             0.0
    horsepower           0.0
    compression-ratio    0.0
    stroke               0.0
    bore                 0.0
    fuel-system          0.0
    num-of-cylinders     0.0
    normalized-losses    0.0
    engine-type          0.0
    curb-weight          0.0
    height               0.0
    width                0.0
    length               0.0
    wheel-base           0.0
    engine-location      0.0
    drive-wheels         0.0
    body-style           0.0
    num-of-doors         0.0
    aspiration           0.0
    make                 0.0
    gas                  0.0
    dtype: float64
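Mean imputation of several numeric columns can also be done in one step by passing the column means to `fillna`. The sketch below uses a hypothetical toy frame; `numeric_cols` is an illustrative name, and whether the mean is the right fill value for a given column is a modelling choice, not something the data decides.

```python
import numpy as np
import pandas as pd

# Hypothetical toy frame with gaps in two numeric columns.
toy = pd.DataFrame({
    "stroke": [3.0, np.nan, 3.4],
    "bore": [3.2, 3.6, np.nan],
    "make": ["audi", "bmw", "volvo"],
})

numeric_cols = ["stroke", "bore"]

# mean() returns one value per column; fillna matches them up by column name.
toy[numeric_cols] = toy[numeric_cols].fillna(toy[numeric_cols].mean())
print(toy)
```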
    auto[['horsepower-binned']]
    auto['horsepower-binned'].unique()
    array(['Medium', 'Low', 'High', nan], dtype=object)
    auto['horsepower'].astype('category').value_counts()
    68.0                19
    69.0                10
    116.0                9
    70.0                 9
    110.0                8
    95.0                 7
    114.0                6
    62.0                 6
    101.0                6
    88.0                 6
    76.0                 5
    82.0                 5
    145.0                5
    84.0                 5
    97.0                 5
    160.0                5
    102.0                5
    92.0                 4
    111.0                4
    86.0                 4
    123.0                4
    90.0                 3
    85.0                 3
    121.0                3
    73.0                 3
    182.0                3
    207.0                3
    152.0                3
    161.0                2
    156.0                2
    155.0                2
    162.0                2
    94.0                 2
    112.0                2
    52.0                 2
    104.256157635468     2
    100.0                2
    176.0                2
    184.0                2
    56.0                 2
    175.0                1
    200.0                1
    154.0                1
    48.0                 1
    106.0                1
    143.0                1
    142.0                1
    140.0                1
    135.0                1
    134.0                1
    120.0                1
    115.0                1
    78.0                 1
    72.0                 1
    64.0                 1
    60.0                 1
    58.0                 1
    55.0                 1
    262.0                1
    Name: horsepower, dtype: int64
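One `horsepower-binned` label is still missing. In the rows shown earlier it belongs to the car with 262 horsepower, while a car with 160 horsepower is already labelled 'High'. A minimal sketch of filling such a gap follows, assuming that any unlabelled car at or above the smallest horsepower already labelled 'High' should also be 'High'. That rule is an assumption for illustration; the bin edges originally used to build the column are not shown in this project.

```python
import numpy as np
import pandas as pd

# Toy frame using four horsepower/label pairs taken from the rows shown earlier.
toy = pd.DataFrame({
    "horsepower": [101.0, 111.0, 160.0, 262.0],
    "horsepower-binned": ["Low", "Medium", "High", np.nan],
})

# Smallest horsepower that already carries the 'High' label.
high_floor = toy.loc[toy["horsepower-binned"] == "High", "horsepower"].min()

# Assumed rule: unlabelled rows at or above that value are also 'High'.
mask = toy["horsepower-binned"].isna() & (toy["horsepower"] >= high_floor)
toy.loc[mask, "horsepower-binned"] = "High"
print(toy)
```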
    auto['num-of-doors'].astype('category').value_counts()
    # Fill any missing door counts with 'four'
    auto['num-of-doors'] = auto['num-of-doors'].fillna('four')
    auto['num-of-doors'].astype('category').value_counts()
    auto.to_csv('Cleandata_auto.csv')
     
    auto.hist(figsize=(30,30))
    plt.show()
      
    plt.figure(figsize=(15,10))
    sns.heatmap(auto.select_dtypes(include='number').corr(),annot =True,cmap='coolwarm')
    plt.title("Numerical features")
    plt.show()
     
     
     
    plt.figure(figsize=(15,10))
    sns.countplot(x=auto['normalized-losses'])
    plt.title("Counts of normalized-losses values")
    plt.show()
     
     
     
    auto['normalized-losses'].describe()
     
    count    201.00000
    mean     122.00000
    std       31.99625
    min       65.00000
    25%      101.00000
    50%      122.00000
    75%      137.00000
    max      256.00000
    Name: normalized-losses, dtype: float64
     
    sns.displot(auto['normalized-losses'],kde=True)
    plt.title("Distribution of losses")
    plt.show()
     
     
    plt.figure(figsize=(10,7))
    sns.heatmap(auto.select_dtypes(include='number').corr(),annot = True,cmap='coolwarm')
    plt.title("Correlation of all numeric features")
    plt.show()
     
     
     
     
     
     
    auto.drop(['symboling','normalized-losses','compression-ratio','peak-rpm'],axis=1,inplace=True)
      
    plt.figure(figsize=(10,7))
    sns.heatmap(auto.select_dtypes(include='number').corr(),annot = True,cmap='coolwarm')
    plt.title("Correlation of all numeric features")
    plt.show()
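The columns dropped above were chosen by reading the heatmap. The same kind of decision can be sketched in code by ranking features on their absolute correlation with `price` and dropping those below a cutoff. This is only an illustration on a hypothetical toy frame; the `threshold` of 0.3 and the choice of `price` as the target are assumptions, not criteria stated in the project.

```python
import pandas as pd

# Hypothetical toy frame: one feature rises with price, one falls, one is unrelated.
toy = pd.DataFrame({
    "price": [10.0, 20.0, 30.0, 40.0, 50.0],
    "engine-size": [1.0, 2.0, 3.0, 4.0, 5.0],
    "highway-mpg": [50.0, 40.0, 30.0, 20.0, 10.0],
    "peak-rpm": [1.0, -1.0, 0.0, -1.0, 1.0],
})

# Correlation of every numeric feature with the target column.
corr_with_price = toy.select_dtypes(include="number").corr()["price"].drop("price")

threshold = 0.3  # assumed cutoff, for illustration only
weak = corr_with_price[corr_with_price.abs() < threshold].index.tolist()

# Drop the weakly correlated features and keep the rest.
reduced = toy.drop(columns=weak)
print(weak)
print(reduced.columns.tolist())
```

Pearson correlation only captures linear relationships, so a low value here does not by itself prove a feature is uninformative.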
     
     
     
     
     
    auto.select_dtypes(exclude='number').head()
              make  aspiration  num-of-doors   body-style  drive-wheels  engine-location  \
    0  alfa-romero         std           two  convertible           rwd            front
    1  alfa-romero         std           two  convertible           rwd            front
    2  alfa-romero         std           two    hatchback           rwd            front
    3         audi         std          four        sedan           fwd            front
    4         audi         std          four        sedan           4wd            front

       engine-type  num-of-cylinders  fuel-system  horsepower-binned
    0         dohc              four         mpfi             Medium
    1         dohc              four         mpfi             Medium
    2         ohcv               six         mpfi             Medium
    3          ohc              four         mpfi             Medium
    4          ohc              five         mpfi             Medium
     
