[ changing dtypes of structured/record arrays ]
Q1. When recasting a column to a different data type, is np.array or the astype method preferred? I've seen examples using astype, but both seem to return the desired result (both return copies of the original array).
import copy
import numpy as np
## recasting string to integer
x = np.rec.array([('a','1'),('b','2')],names='col1,col2')
##
In []: x
Out[]:
rec.array([('a', '1'), ('b', '2')],
dtype=[('col1', '|S1'), ('col2', '|S1')])
##
dt = x.dtype.descr
dt[1] = (dt[1][0],'int')
## which is more appropriate:
y = np.array(x,dtype=dt)
## or
y = x.astype(dt)
## ?
In []: y
Out[]:
rec.array([('a', 1), ('b', 2)],
dtype=[('col1', '|S1'), ('col2', '<i4')])
Q2. Renaming columns: integer columns become zero when calling np.array, but retain their values with np.rec.array. Why? My understanding is that the former gives a structured array and the latter a record array, and that for most purposes they are the same. In any case, this behavior is surprising.
## rename 2nd column from col2 to v2
dt = copy.deepcopy(y.dtype)
names = list(dt.names)
names[1] = 'v2'
dt.names = names
## this is not right
newy = np.array(y,dtype=dt)
In []: newy
Out[]:
array([('a', 0), ('b', 0)],
dtype=[('col1', '|S1'), ('v2', '<i4')])
## this is correct
newy = np.rec.array(y,dtype=dt)
In []: newy
Out[]:
rec.array([('a', 1), ('b', 2)],
dtype=[('col1', '|S1'), ('v2', '<i4')])
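A likely explanation, hedged because it depends on the NumPy version: older releases converted one structured dtype to another by matching field names, so a field that exists only in the target dtype ('v2') was filled with zeros. Given an existing ndarray and a different dtype, np.rec.array instead reinterprets the same bytes as a view, so the values survive. Current NumPy documentation describes assignment between structured arrays as matching fields by position rather than by name. When only the names change, a view sidesteps the conversion entirely. The sketch below assumes Python 3 (hence the b'...' byte strings) and shows two such renames; rename_fields comes from numpy.lib.recfunctions, and the field names are taken from the question.

```python
import numpy as np
from numpy.lib import recfunctions as rfn

# The question's y, with an explicit 4-byte integer field.
y = np.array([(b'a', 1), (b'b', 2)], dtype=[('col1', 'S1'), ('col2', 'i4')])

# Option 1: same field formats (so the same packed byte layout) but a new name
# for the second field; .view() reinterprets the buffer without converting it.
new_dt = np.dtype([('col1', y.dtype['col1']), ('v2', y.dtype['col2'])])
renamed_view = y.view(new_dt)

# Option 2: let recfunctions build the renamed dtype from an {old: new} mapping.
renamed = rfn.rename_fields(y, {'col2': 'v2'})

print(renamed_view.dtype.names, renamed_view['v2'].tolist())  # ('col1', 'v2') [1, 2]
print(renamed.dtype.names, renamed['v2'].tolist())            # ('col1', 'v2') [1, 2]
print(np.shares_memory(y, renamed_view))                      # True
```

Because both results are views, writing to them also changes y. Call .copy() on the result if an independent array is needed.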
Answer 1
Q1: The np.array and astype approaches do the same work in the same way under the hood. Using astype involves a little less typing, and it makes clearer to the reader that the intention is to change the data type.
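A minimal sketch comparing the two calls, assuming a recent NumPy under Python 3. Byte-string fields are used so the dtypes match the question's '|S1' output, and 'i8' is chosen explicitly instead of the platform-dependent 'int'. Both calls yield the same values in a new buffer. One small difference shows up in the returned type: np.array defaults to subok=False and returns a plain ndarray, whereas astype defaults to subok=True and keeps the recarray subclass.

```python
import numpy as np

# Byte-string fields, matching the '|S1' dtypes shown in the question.
x = np.rec.array([(b'a', b'1'), (b'b', b'2')],
                 dtype=[('col1', 'S1'), ('col2', 'S1')])

# Target dtype: same field names, second field parsed into a 64-bit integer.
dt = [('col1', 'S1'), ('col2', 'i8')]

via_array = np.array(x, dtype=dt)   # subok=False by default -> plain ndarray
via_astype = x.astype(dt)           # subok=True by default -> stays a recarray

print(via_array.tolist())    # [(b'a', 1), (b'b', 2)]
print(via_astype.tolist())   # [(b'a', 1), (b'b', 2)]
print(type(via_array).__name__, type(via_astype).__name__)               # ndarray recarray
print(np.shares_memory(x, via_array), np.shares_memory(x, via_astype))  # False False
```

If the record-array interface (attribute access such as y.col2) matters downstream, astype is the call that preserves it without an extra view.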