6 ways to sort your Pandas Dataframe: Pandas Tutorial

How do you sort a pandas data frame by a column, by multiple columns, and by row index?

It is often necessary to sort a pandas data frame in a particular way. Typically, you sort a data frame by the values of one or more columns, or by its row index (row names). pandas data frames provide two useful functions for this:

  1. sort_values(): sorts a pandas data frame by one or more columns
  2. sort_index(): sorts a pandas data frame by row index

Each of these functions has many options, such as sorting in a specific order (ascending or descending), sorting in place, controlling where missing values are placed, choosing a specific sorting algorithm, and so on.
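As one illustration of the sorting-algorithm option, here is a minimal sketch using the kind parameter of sort_values(). The toy frame and its column names are made up for illustration and are not part of the gapminder data. With kind='mergesort' the sort is stable, so rows with equal keys keep their original relative order.

```python
import pandas as pd

# Hypothetical toy data: two pairs of rows share the same score.
df = pd.DataFrame({"name": ["a", "b", "c", "d"], "score": [2, 1, 2, 1]})

# mergesort is a stable algorithm: ties stay in their original order.
stable = df.sort_values("score", kind="mergesort")

print(stable)
```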

Here is a short pandas tutorial on using sort_values() and sort_index() to sort a pandas data frame, using a real data set (gapminder).

Let's start by loading the gapminder data from the Carpentries URL.
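The download URL is not reproduced here, so the sketch below builds a small stand-in instead of fetching the file. It assumes the column names country, year, pop, continent, lifeExp and gdpPercap, and copies six rows (with their row labels) from the outputs shown in this tutorial. With the real file, the frame would typically come from pd.read_csv().

```python
import pandas as pd

# Stand-in for the gapminder data (an assumption, not the full data set):
# six rows copied from the outputs in this tutorial, keeping their row labels.
gapminder = pd.DataFrame(
    {
        "country": ["Afghanistan", "Gambia", "Hong Kong China",
                    "Japan", "Japan", "Rwanda"],
        "year": [1952, 1952, 2007, 2002, 2007, 1992],
        "pop": [8425333.0, 284320.0, 6980412.0,
                127065841.0, 127467972.0, 7290203.0],
        "continent": ["Asia", "Africa", "Asia", "Asia", "Asia", "Africa"],
        "lifeExp": [28.801, 30.000, 82.208, 82.000, 82.603, 23.599],
        "gdpPercap": [779.445314, 485.230659, 39724.97867,
                      28604.59190, 31656.06806, 737.068595],
    },
    index=[0, 552, 671, 802, 803, 1292],
)

print(gapminder.head(n=3))
```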

1. How do I sort a pandas data frame by column value?

We can sort a pandas data frame by the values of a single column by passing the name of that column to sort_values(). For example, we can sort the gapminder data by the values of the lifeExp column:

sort_by_life = gapminder.sort_values('lifeExp')

print(sort_by_life.head(n=3))

          country  year        pop  continent  lifeExp   gdpPercap
1292       Rwanda  1992  7290203.0     Africa   23.599  737.068595
0     Afghanistan  1952  8425333.0       Asia   28.801  779.445314
552        Gambia  1952   284320.0     Africa   30.000  485.230659

Note that by default sort_values() sorts and returns a new data frame. The new data frame is sorted in ascending order (small values first, large values last). The head() call shows that the first rows have the lowest life expectancy. Using tail() on the sorted data frame, we can see that the last rows have the highest life expectancy.

print(sort_by_life.tail(n=3))

              country  year          pop  continent  lifeExp    gdpPercap
802             Japan  2002  127065841.0       Asia   82.000  28604.59190
671   Hong Kong China  2007    6980412.0       Asia   82.208  39724.97867
803             Japan  2007  127467972.0       Asia   82.603  31656.06806
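The ascending sort can be tried without the full data set. The sketch below uses a four-row toy frame (an assumption; the values and row labels are copied from the outputs in this tutorial) and checks that sort_values() leaves the original frame untouched.

```python
import pandas as pd

# Toy frame, not the full gapminder data.
df = pd.DataFrame(
    {
        "country": ["Afghanistan", "Gambia", "Japan", "Rwanda"],
        "year": [1952, 1952, 2007, 1992],
        "lifeExp": [28.801, 30.000, 82.603, 23.599],
    },
    index=[0, 552, 803, 1292],
)

sort_by_life = df.sort_values("lifeExp")  # returns a new, sorted frame

print(sort_by_life.head(n=2))  # lowest life expectancy first
print(sort_by_life.tail(n=1))  # highest life expectancy last
print(df.index.tolist())       # the original frame keeps its row order
```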

2. How do I sort a pandas data frame by column value (in descending order)?

To sort the data frame by column values in descending order, so that the largest values are at the top, use the argument ascending=False.

sort_by_life = gapminder.sort_values('lifeExp', ascending=False)

In this example, after sorting the data frame by lifeExp with ascending=False, the countries with the highest life expectancy come first.

print(sort_by_life.head(n=3))

              country  year          pop  continent  lifeExp    gdpPercap
803             Japan  2007  127467972.0       Asia   82.603  31656.06806
671   Hong Kong China  2007    6980412.0       Asia   82.208  39724.97867
802             Japan  2002  127065841.0       Asia   82.000  28604.59190
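A minimal sketch of the descending sort on a toy frame (an assumption; the four rows are copied from the outputs in this tutorial):

```python
import pandas as pd

# Toy frame, not the full gapminder data.
df = pd.DataFrame(
    {
        "country": ["Afghanistan", "Gambia", "Japan", "Rwanda"],
        "lifeExp": [28.801, 30.000, 82.603, 23.599],
    },
    index=[0, 552, 803, 1292],
)

# ascending=False puts the largest values at the top.
sort_desc = df.sort_values("lifeExp", ascending=False)

print(sort_desc)
```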

3. How do I sort a pandas data frame by a column and put missing values first?

A data frame often contains missing values, and when we sort on a column that has missing values, we may want the rows with missing values to come either first or last.

We can specify the position of missing values with the argument na_position. With na_position='first', rows with missing values are placed first.

sort_na_first = gapminder.sort_values('lifeExp', na_position='first')

In this example there are no missing values, so sorting with na_position='first' puts no NA values at the top.

print(sort_na_first.head(n=3))

          country  year        pop  continent  lifeExp   gdpPercap
1292       Rwanda  1992  7290203.0     Africa   23.599  737.068595
0     Afghanistan  1952  8425333.0       Asia   28.801  779.445314
552        Gambia  1952   284320.0     Africa   30.000  485.230659
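Because the gapminder example has no missing values, the effect of na_position is easier to see on data that does. The sketch below uses a made-up four-row frame (the country labels and values are hypothetical) with one NaN in the sort column.

```python
import numpy as np
import pandas as pd

# Hypothetical data with one missing lifeExp value (row 1).
df = pd.DataFrame(
    {"country": ["A", "B", "C", "D"], "lifeExp": [50.0, np.nan, 30.0, 70.0]}
)

na_first = df.sort_values("lifeExp", na_position="first")
na_last = df.sort_values("lifeExp")  # the default is na_position="last"

print(na_first)
print(na_last)
```

Note that na_position is independent of the sort direction: with ascending=False and the default na_position, the missing row still comes last.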

4. How do I sort a pandas data frame by a column in place?

By default, sort_values() and sort_index() create a new data frame when sorting. If you don't want a new data frame and only want to sort in place, use the argument inplace=True. Here is an example of sorting an existing pandas data frame without creating a new one.

gapminder.sort_values('lifeExp', inplace=True)

We can see that the data frame is now sorted by lifeExp with the smallest values at the top, and that the row indices are no longer in order.

print(gapminder.head(n=3))

          country  year        pop  continent  lifeExp   gdpPercap
1292       Rwanda  1992  7290203.0     Africa   23.599  737.068595
0     Afghanistan  1952  8425333.0       Asia   28.801  779.445314
552        Gambia  1952   284320.0     Africa   30.000  485.230659

Note that the row index of the sorted data frame is in a different order from the row index of the data frame before sorting.
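One detail worth seeing in code: with inplace=True, sort_values() modifies the frame and returns None, so the result should not be assigned to a variable. A minimal sketch on a toy frame (an assumption; the rows are copied from the outputs in this tutorial):

```python
import pandas as pd

# Toy frame, not the full gapminder data.
df = pd.DataFrame(
    {
        "country": ["Afghanistan", "Gambia", "Japan", "Rwanda"],
        "lifeExp": [28.801, 30.000, 82.603, 23.599],
    },
    index=[0, 552, 803, 1292],
)

result = df.sort_values("lifeExp", inplace=True)

print(result)             # None: nothing is returned
print(df.index.tolist())  # df itself is now sorted by lifeExp
```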

5. How do I sort a pandas data frame by row index (in place)?

We can use sort_index() to sort a pandas data frame by row index or row names. In this example the row index consists of numbers, and in the previous example we sorted the data frame by lifeExp, which left the row index out of order. We can sort by row index (with the option inplace=True) to get the original data frame back.

gapminder.sort_index(inplace=True)

We now see that the row indices start at 0 and are sorted in ascending order. Compare this with the previous example, where the first row index is 1292 and the row indices are not sorted.

print(gapminder.head(n=3))

       country  year         pop  continent  lifeExp   gdpPercap
0  Afghanistan  1952   8425333.0       Asia   28.801  779.445314
1  Afghanistan  1957   9240934.0       Asia   30.332  820.853030
2  Afghanistan  1962  10267083.0       Asia   31.997  853.100710
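The round trip (sort by a column, then restore the original order by index) can be sketched on a toy frame. The frame is an assumption; its rows and row labels are copied from the outputs in this tutorial.

```python
import pandas as pd

# Toy frame, not the full gapminder data.
df = pd.DataFrame(
    {
        "country": ["Afghanistan", "Gambia", "Japan", "Rwanda"],
        "lifeExp": [28.801, 30.000, 82.603, 23.599],
    },
    index=[0, 552, 803, 1292],
)
original = df.copy()

df.sort_values("lifeExp", inplace=True)  # row labels are now out of order
shuffled_labels = df.index.tolist()

df.sort_index(inplace=True)              # back to ascending row labels

print(shuffled_labels)
print(df.index.tolist())
```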

6. How do I sort a pandas data frame by multiple columns?

It is often necessary to sort a data frame by the values of more than one column. We can pass the columns we want to sort by as a list to sort_values(). For example, to sort by the values of two columns:

sort_by_life_gdp = gapminder.sort_values(['lifeExp', 'gdpPercap'])

We see that the lifeExp column is sorted in ascending order, and that rows with the same lifeExp value are sorted by gdpPercap.

print(sort_by_life_gdp.head(n=3))

          country  year        pop  continent  lifeExp   gdpPercap
1292       Rwanda  1992  7290203.0     Africa   23.599  737.068595
0     Afghanistan  1952  8425333.0       Asia   28.801  779.445314
552        Gambia  1952   284320.0     Africa   30.000  485.230659

Note that when sorting by multiple columns, sort_values() sorts by the first column first and then by the second. Changing the order of the column names in the list changes the result.

sort_by_life_gdp = gapminder.sort_values(['gdpPercap', 'lifeExp'])

print(sort_by_life_gdp.head(n=3))

             country  year         pop  continent  lifeExp   gdpPercap
334  Congo Dem. Rep.  2002  55379852.0     Africa   44.966  241.165877
335  Congo Dem. Rep.  2007  64606759.0     Africa   46.462  277.551859
876          Lesotho  1952    748747.0     Africa   42.138  298.846212
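The tie-breaking role of the second column is easier to see when the first column has repeated values. The sketch below sorts a toy frame (an assumption, with a default 0 to 3 index) by continent and then lifeExp. It also passes a list to ascending to set a direction per column, an option of sort_values() that this tutorial does not otherwise cover.

```python
import pandas as pd

# Toy frame: continent repeats, so lifeExp decides the order within each group.
df = pd.DataFrame(
    {
        "continent": ["Asia", "Africa", "Asia", "Africa"],
        "lifeExp": [28.801, 30.000, 82.603, 23.599],
    }
)

# First by continent, then by lifeExp within each continent.
by_cont_life = df.sort_values(["continent", "lifeExp"])

# One direction per column: continent ascending, lifeExp descending.
mixed = df.sort_values(["continent", "lifeExp"], ascending=[True, False])

# Swapping the column order changes which column dominates.
by_life_cont = df.sort_values(["lifeExp", "continent"])

print(by_cont_life)
print(mixed)
print(by_life_cont)
```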

 

 
