python - Writing and reading .mat using scipy.io changes dictionary contents
This question already has an answer here:
I am trying to write a dictionary to a .mat file using scipy.io.savemat(), but when I do, the contents change!
Here is the array I wish to assign to the dictionary key "genes":
vectorizeddf.index.values.astype(np.str_)
which prints
array(['44m2.3', 'a0a087wsv2', 'a0a087wt57', ..., 'tert-rmrp_human', 'tert-terc_human', 'wisp3 varinat'], dtype='<U44')
then
genedict = {"genes": vectorizeddf.index.values.astype(np.str_),
            "x": vectorizeddf.values,
            "id": vectorizeddf.columns.values.astype(np.str_)}
import scipy.io as sio
sio.savemat("goa_human.mat", genedict)
but when I load the dictionary using
goadict = sio.loadmat("goa_human.mat")
my strings are padded with spaces!
>>> goadict['genes']
array(['44m2.3 ', 'a0a087wsv2 ', 'a0a087wt57 ', ..., 'tert-rmrp_human ', 'tert-terc_human ', 'wisp3 varinat '], dtype='<U44')
which is far from ideal. On the other hand, when I access
genedict['id']
I get
array(['go:0000002', 'go:0000003', 'go:0000009', ..., 'go:2001303', 'go:2001306', 'go:2001311'], dtype='<U10')
which is the original format of the array before saving. It seems to me the issue is in the dtype, but I did my best to cast both of them to strings. I'm not sure why one is <U44 and the other is <U10. How might I resolve this?
Thank you!
Let's try to save a variety of objects:
In [597]: d={'alist':['one','two','three','four'],
   .....:    'adict':{'one':np.arange(5)},
   .....:    'strs': np.array(['one','two','three','four']),
   .....:    'objs': np.array(['one','two','three','four'],dtype=object)}

In [598]: d
Out[598]:
{'alist': ['one', 'two', 'three', 'four'],
 'adict': {'one': array([0, 1, 2, 3, 4])},
 'objs': array(['one', 'two', 'three', 'four'], dtype=object),
 'strs': array(['one', 'two', 'three', 'four'], dtype='<U5')}

In [599]: io.savemat('test.mat',d)

In [600]: dd=io.loadmat('test.mat')

In [601]: dd
Out[601]:
{'adict': array([[([[0, 1, 2, 3, 4]],)]], dtype=[('one', 'O')]),
 'strs': array(['one  ', 'two  ', 'three', 'four '], dtype='<U5'),
 'alist': array(['one  ', 'two  ', 'three', 'four '], dtype='<U5'),
 '__header__': b'MATLAB 5.0....',
 '__version__': '1.0',
 'objs': array([[array(['one'], dtype='<U3'), array(['two'], dtype='<U3'),
         array(['three'], dtype='<U5'), array(['four'], dtype='<U4')]], dtype=object),
 '__globals__': []}
This is scipy version '0.14.1'; not a particularly new one, but I haven't read of any recent changes in the io code.
And in Octave I get:
octave:14> data = load('test.mat')
data =
  scalar structure containing fields:
    alist =
one
two
three
four
    adict =
      scalar structure containing fields:
        one =
          0  1  2  3  4
    objs =
    {
      [1,1] = one
      [1,2] = two
      [1,3] = three
      [1,4] = four
    }
    strs =
one
two
three
four
The list and the str array both produce (4,5) character arrays in Octave, while the dtype=object array produces a cell array of strings.
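The cell-array behaviour above suggests one way to keep the strings unpadded: save them as a dtype=object array instead of a fixed-width string array. The following is only a sketch, assuming a scipy version in which savemat and loadmat accept file-like objects; the in-memory buffer and the variable names (`genes`, `restored`) are illustrative and not from the original post.

```python
from io import BytesIO

import numpy as np
from scipy.io import loadmat, savemat

# Fixed-width unicode array, like the one in the question.
genes = np.array(['one', 'two', 'three', 'four'])

# Casting to object makes savemat write a cell array of strings
# rather than a padded character matrix.
buf = BytesIO()
savemat(buf, {'genes': genes.astype(object)})
buf.seek(0)

cells = loadmat(buf)['genes']   # object array; each cell holds a small string array

# Unwrap each cell back into a plain Python string.
restored = [str(cell[0]) for cell in cells.ravel()]
print(restored)
```

The cost is that the loaded value is a nested object array, so it needs the unwrapping step before it can be used like the original.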
In both d and dd, the strs array is U5 and takes 80 bytes (4 words * 5 chars/word * 4 bytes/char), but in dd the strings have been padded with blanks.
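The size arithmetic can be checked with NumPy alone. The sketch below also imitates the padding with np.char.ljust; that call is a stand-in chosen for illustration, not what savemat runs internally. It hints at why the "id" array in the question looked unchanged: if every string already fills the dtype width, as the 10-character ids shown there do, padding to that width adds nothing.

```python
import numpy as np

strs = np.array(['one', 'two', 'three', 'four'])

# The dtype width is set by the longest string; each character takes 4 bytes.
width = strs.dtype.itemsize // 4
print(strs.dtype, width, strs.nbytes)

# Imitate the padding to full dtype width (illustrative stand-in for savemat).
padded = np.char.ljust(strs, width)
print(padded.tolist())

# Strings that already fill the width are unaffected by the same padding.
ids = np.array(['go:0000002', 'go:0000003'])
ids_padded = np.char.ljust(ids, ids.dtype.itemsize // 4)
print(ids_padded.tolist() == ids.tolist())
```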
In [617]: d['strs'][0]
Out[617]: 'one'

In [618]: dd['strs'][0]
Out[618]: 'one  '

In [619]: d['strs'][0].tostring()
Out[619]: b'o\x00\x00\x00n\x00\x00\x00e\x00\x00\x00'

In [620]: dd['strs'][0].tostring()
Out[620]: b'o\x00\x00\x00n\x00\x00\x00e\x00\x00\x00 \x00\x00\x00 \x00\x00\x00'
I haven't paid attention to why the strings in d['strs'] don't display with padding, or how numpy distinguishes between blanks and 'empty' bytes. Note that this is Py3, where the default string is unicode. I don't know if Py2 byte strings are different (except that they take 1 byte/char).
So yes, io.savemat does change the string array (and lists), adding blanks to fill the full dtype width. The purpose is to create a MATLAB-style character matrix.
@zeemonkeez's link covers this, including a way of converting the character matrix to a cell:
octave:25> cellstr(data.strs)
ans =
{
  [1,1] = one
  [2,1] = two
  [3,1] = three
  [4,1] = four
}
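On the Python side, a rough counterpart to cellstr is to strip the trailing blanks after loading. This is a sketch rather than a definitive fix: it assumes savemat and loadmat accept a file-like object, the buffer and the names `loaded` and `cleaned` are illustrative, and np.char.rstrip will also remove any trailing whitespace that was genuinely part of the original strings.

```python
from io import BytesIO

import numpy as np
from scipy.io import loadmat, savemat

# Round-trip a fixed-width string array through an in-memory .mat file.
buf = BytesIO()
savemat(buf, {'strs': np.array(['one', 'two', 'three', 'four'])})
buf.seek(0)
loaded = loadmat(buf)['strs']

# Remove the blanks that pad each string to the dtype width.
cleaned = np.char.rstrip(loaded)
print(cleaned.ravel().tolist())
```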