Welcome to WuJiGu Developer Q&A Community for programmer and developer-Open, Learning and Share
Welcome To Ask or Share your Answers For Others

0 votes
282 views
in Technique[技术] by (71.8m points)

python - calculate means in a pandas dataframe over certain discrete dimensional ranges

I am really not sure about my terminology here so please feel free to correct my title.

Suppose I have a pandas DataFrame D with columns X and Y, both taking the discrete values [1, 2, 3, 4, 5, 6, 7, 8, 9, 10], a column R with only two possible values, 0.1 and 0.2, and a column V with various (in this example random) values.

X and Y act like coordinates, so every value of X occurs exactly once for every value of Y and vice versa (once for each value of R, as the example output further down shows).
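To make the layout described above concrete, here is a minimal sketch of how such a DataFrame could be built. The seed, the value range of V and the use of `pd.MultiIndex.from_product` are my own assumptions for illustration; only the column names and their allowed values come from the description.

```python
import numpy as np
import pandas as pd

# Full grid: every combination of X, Y and R appears exactly once.
index = pd.MultiIndex.from_product(
    [range(1, 11), range(1, 11), [0.1, 0.2]], names=["X", "Y", "R"]
)
D = index.to_frame(index=False)

# V is random here; the seed and the 0-99 range are arbitrary assumptions.
rng = np.random.default_rng(0)
D["V"] = rng.integers(0, 100, size=len(D))

# Sanity checks: 10 * 10 * 2 rows and no repeated (X, Y, R) combination.
assert len(D) == 200
assert not D.duplicated(["X", "Y", "R"]).any()
```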

Now I want to "halve" the resolution of X and Y by averaging the values in V and reducing the steps in X and Y to [1, 3, 5, 7, 9]. In effect, I want the following slice of the new DataFrame Dnew (example output below):

Dnew.loc[(Dnew.X == 1) & (Dnew.Y == 1)]
>> X  Y   R   V
0  1  1  0.1  35
1  1  1  0.2  31

to return a DataFrame containing only one value for V, which is the mean of all four values in V you would get from the following slice of the original DataFrame D (example output below):

D.loc[(D.X >= 1) & (D.X <= 2) & (D.Y >= 1) & (D.Y <= 2)]
>> X  Y   R   V
0  1  1  0.1  10
1  1  2  0.1  50
2  2  1  0.1  35
3  2  2  0.1  45
4  1  1  0.2  33
5  1  2  0.2  19
6  2  1  0.2  60
7  2  2  0.2  12 
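One possible way to get from the second slice to the first is sketched below. It is a sketch under assumptions, not a definitive answer: it assumes the coarse coordinate is the lower edge of each 2-wide bin (1, 2 -> 1; 3, 4 -> 3; and so on) and uses integer arithmetic plus `groupby` to average V. The eight rows are copied from the example slice of D.

```python
import pandas as pd

# The eight rows shown in the example slice of D.
D = pd.DataFrame({
    "X": [1, 1, 2, 2, 1, 1, 2, 2],
    "Y": [1, 2, 1, 2, 1, 2, 1, 2],
    "R": [0.1] * 4 + [0.2] * 4,
    "V": [10, 50, 35, 45, 33, 19, 60, 12],
})

# Map 1,2 -> 1; 3,4 -> 3; ...; 9,10 -> 9 (lower edge of each bin).
coarse = D.assign(
    X=(D["X"] - 1) // 2 * 2 + 1,
    Y=(D["Y"] - 1) // 2 * 2 + 1,
)

# One mean of V per coarse (X, Y) cell and per value of R.
Dnew = coarse.groupby(["X", "Y", "R"], as_index=False)["V"].mean()
print(Dnew)
```

For these rows the two means are 35.0 (R == 0.1) and 31.0 (R == 0.2), matching the example output for Dnew. Leaving "R" out of the groupby keys would instead pool both R values into a single mean per cell.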

What would be a Pythonic way, making use of the special characteristics of pandas DataFrames, to calculate what I am looking for?

Also, if this works, I would like to refine the calculation by distinguishing between rows with D.R == 0.1 and rows with D.R == 0.2, so that the "mean groups" I just described exist twice, with different values of V for the two possible values of R.

I really hope I was able to get my point across. The topic is fairly abstract and my knowledge of pandas DataFrames still has to grow. I am also not a native English speaker, so please point out any mistakes I made or things I could explain better.

In my example output, row 0 of the first example holds the mean of the V values from rows 0-3 of the second example, and row 1 of the first example holds the mean of the V values from rows 4-7 of the second example.

[edit: I added example output to the mentioned slices as suggested]



1 Answer

0 votes
by (71.8m points)
Waiting for an expert to answer.
