Because if you don't specify dtypes, pandas falls back to wide types such as int64, which eats memory. Here is what I measured on the Titanic dataset.
Without dtypes specified
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 891 entries, 0 to 890
Data columns (total 12 columns):
PassengerId    891 non-null int64
Survived       891 non-null int64
Pclass         891 non-null int64
Name           891 non-null object
Sex            891 non-null object
Age            714 non-null float64
SibSp          891 non-null int64
Parch          891 non-null int64
Ticket         891 non-null object
Fare           891 non-null float64
Cabin          204 non-null object
Embarked       889 non-null object
dtypes: float64(2), int64(5), object(5)
memory usage: 83.6+ KB
With dtypes specified
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 891 entries, 0 to 890
Data columns (total 12 columns):
PassengerId    891 non-null int16
Survived       891 non-null int8
Pclass         891 non-null int8
Name           891 non-null object
Sex            891 non-null object
Age            714 non-null float16
SibSp          891 non-null int8
Parch          891 non-null int8
Ticket         891 non-null object
Fare           891 non-null float16
Cabin          204 non-null object
Embarked       889 non-null object
dtypes: float16(2), int16(1), int8(4), object(5)
memory usage: 43.6+ KB
Reported memory usage drops to roughly half (83.6 KB to 43.6 KB). Looks like I have to specify dtypes every time..
Reference: