Pandas iloc and loc – quickly select rows and columns in DataFrames

Table Of Contents

Pandas Data Selection
Selection and Indexing Methods for Pandas DataFrames
1. Pandas iloc data selection
2. Pandas loc data selection
3. Selecting pandas data using ix
Setting values in DataFrames using .loc

Pandas Data Selection

There are multiple ways to select and index rows and columns from Pandas DataFrames. I find tutorials online focusing on advanced selections of row and column choices a little complex for my requirements, but mastering the Pandas iloc, loc, and ix selectors can actually be made quite simple.

Selection Options

There’s three main options to achieve the selection and indexing activities in Pandas, which can be confusing. The three selection cases and methods covered in this post are:

Selecting data by row numbers (.iloc)
Selecting data by label or by a conditional statement (.loc)
Selecting in a hybrid approach (.ix) (now Deprecated in Pandas 0.20.1)

Data Setup

This blog post, inspired by other tutorials, describes selection activities with these operations. The tutorial is suited for the general data science situation where, typically I find myself:

Each row in your data frame represents a data sample.
Each column is a variable, and is usually named. I rarely select columns without their names.
I need to quickly and often select relevant rows from the data frame for modelling and visualisation activities.

For the uninitiated, the Pandas library for Python provides high-performance, easy-to-use data structures and data analysis tools for handling tabular data in “series” and in “data frames”. It’s brilliant at making your data processing easier and I’ve written before about grouping and summarising data with Pandas.

iloc and loc indexing is achieved with pandas using two main arguments for rows and columns — Summary of iloc and loc methods discussed in this blog post. iloc and loc are operations for retrieving data from Pandas dataframes.

Selection and Indexing Methods for Pandas DataFrames

For these explorations we’ll need some sample data – I downloaded the uk-500 sample data set from www.briandunning.com. This data contains artificial names, addresses, companies and phone numbers for fictitious UK characters. To follow along, you can download the .csv file here. Load the data as follows (the diagrams here come from a Jupyter notebook in the Anaconda Python install):

example data for pandas iloc loc and ix indexing examples. — Example data loaded from CSV file.

1. Pandas iloc data selection

The iloc indexer for Pandas Dataframe is used for integer-location based indexing / selection by position.

The iloc indexer syntax is data.iloc[<row selection>, <column selection>], which is sure to be a source of confusion for R users. “iloc” in pandas is used to select rows and columns by number, in the order that they appear in the data frame. You can imagine that each row has a row number from 0 to the total rows (data.shape[0]) and iloc[] allows selections based on these numbers. The same applies for columns (ranging from 0 to data.shape[1] )

There are two “arguments” to iloc – a row selector, and a column selector. For example:

Multiple columns and rows can be selected together using the .iloc indexer.

There’s two gotchas to remember when using iloc in this manner:

Note that .iloc returns a Pandas Series when one row is selected, and a Pandas DataFrame when multiple rows are selected, or if any column in full is selected. To counter this, pass a single-valued list if you require DataFrame output.

When using .loc, or .iloc, you can control the output format by passing lists or single values to the selectors.

When selecting multiple columns or multiple rows in this manner, remember that in your selection e.g.[1:5], the rows/columns selected will run from the first number to one minus the second number. e.g. [1:5] will go 1,2,3,4., [x,y] goes from x to y-1.

In practice, I rarely use the iloc indexer, unless I want the first ( .iloc[0] ) or the last ( .iloc[-1] ) row of the data frame.

2. Pandas loc data selection

The Pandas loc indexer can be used with DataFrames for two different use cases:

a.) Selecting rows by label/index
b.) Selecting rows with a boolean / conditional lookup

The loc indexer is used with the same syntax as iloc: data.loc[<row selection>, <column selection>] .

2a. Label-based / Index-based indexing using .loc

Selections using the loc method are based on the index of the data frame (if any). Where the index is set on a DataFrame, using <code>df.set_index()</code>, the .loc method directly selects based on index values of any rows. For example, setting the index of our test data frame to the persons “last_name”:

Pandas Dataframe with index set using .set_index() for .loc[] explanation. — Last Name set as Index set on sample data frame

Now with the index set, we can directly select rows for different “last_name” values using .loc[<label>] – either singly, or in multiples. For example:

.loc is used by pandas for label based lookups in dataframes — Selecting single or multiple rows using .loc index selections with pandas. Note that the first example returns a series, and the second returns a DataFrame. You can achieve a single-column DataFrame by passing a single-element list to the .loc operation.

Select columns with .loc using the names of the columns. In most of my data work, typically I have named columns, and use these named selections.

selecting columns by name in pandas .loc — When using the .loc indexer, columns are referred to by names using lists of strings, or “:” slices.

You can select ranges of index labels – the selection </code>data.loc[‘Bruch’:’Julio’]</code> will return all rows in the data frame between the index entries for “Bruch” and “Julio”. The following examples should now make sense:

Note that in the last example, data.loc[487] (the row with index value 487) is not equal to data.iloc[487] (the 487th row in the data). The index of the DataFrame can be out of numeric order, and/or a string or multi-value.

2b. Pandas Loc Boolean / Logical indexing

Conditional selections with boolean arrays using data.loc[<selection>] is the most common method that I use with Pandas DataFrames. With boolean indexing or logical selection, you pass an array or Series of True/False values to the .loc indexer to select the rows where your Series has True values.

In most use cases, you will make selections based on the values of different columns in your data set.

For example, the statement data[‘first_name’] == ‘Antonio’] produces a Pandas Series with a True/False value for every row in the ‘data’ DataFrame, where there are “True” values for the rows where the first_name is “Antonio”. These type of boolean arrays can be passed directly to the .loc indexer as so:

The .loc indexer can accept boolean arrays to select rows — Using a boolean True/False series to select rows in a pandas data frame – all rows with first name of “Antonio” are selected.

As before, a second argument can be passed to .loc to select particular columns out of the data frame. Again, columns are referred to by name for the loc indexer and can be a single string, a list of columns, or a slice “:” operation.

Multiple column selection example using .loc — Selecting multiple columns with loc can be achieved by passing column names to the second argument of .loc[]

Note that when selecting columns, if one column only is selected, the .loc operator returns a Series. For a single column DataFrame, use a one-element list to keep the DataFrame format, for example:

.loc returning Series or DataFrames depending on selection — If selections of a single column are made as a string, a series is returned from .loc. Pass a list to get a DataFrame back.

Make sure you understand the following additional examples of .loc selections for clarity:

Logical selections and boolean Series can also be passed to the generic [] indexer of a pandas DataFrame and will give the same results: data.loc[data[‘id’] == 9] == data[data[‘id’] == 9] .

3. Selecting pandas data using ix

Note: The ix indexer has been deprecated in recent versions of Pandas, starting with version 0.20.1.

The ix[] indexer is a hybrid of .loc and .iloc. Generally, ix is label based and acts just as the .loc indexer. However, .ix also supports integer type selections (as in .iloc) where passed an integer. This only works where the index of the DataFrame is not integer based. ix will accept any of the inputs of .loc and .iloc.

Slightly more complex, I prefer to explicitly use .iloc and .loc to avoid unexpected results.

As an example:

Setting values in DataFrames using .loc

With a slight change of syntax, you can actually update your DataFrame in the same statement as you select and filter using .loc indexer. This particular pattern allows you to update values in columns depending on different conditions. The setting operation does not make a copy of the data frame, but edits the original data.

As an example:

That’s the basics of indexing and selecting with Pandas. If you’re looking for more, take a look at the .iat, and .at operations for some more performance-enhanced value accessors in the Pandas Documentation and take a look at selecting by callable functions for more iloc and loc fun.

107 Comments

Inline Feedbacks

View all comments

Jay Jay (@TheValleyRock)

8 years ago

Thanks for going straight to the point.

S C

8 years ago

Part 1:
data.iloc[:,1] # second row of data frame (last_name)
data.iloc[:,-1] # last row of data frame (id)

row should be column?

Shane

Author

Reply to S C

8 years ago

You’re absolutely correct Shucai. I’ll fix that ASAP and thanks for the spot!

Pandas Notes | wallstbullean

8 years ago

[…] For more details: http://www.shanelynn.ie/select-pandas-dataframe-rows-and-columns-using-iloc-loc-and-ix/ […]

sujithpadaru (@sujithpadar)

8 years ago

It was really helpful!

Shane

Author

Reply to sujithpadaru (@sujithpadar)

8 years ago

I’m glad you found it so Sujith! Thanks for the comment.

hedgehoginventions

8 years ago

Awesome post! I was struggling with pandas for a few days already and then I found your post, it really is the missing manual for pandas selectors that helped me understand it all. Much appreciated!

Liza K

8 years ago

Hi there! Much easier to understand than the pandas documentation.
I’ve got a question though, the answear to which eludes me despite trying to get through the documentation and hours of searching.

Is it possible, using .loc to exclude the line numbers and column headings from the output?

Cheers!

Shane

Author

Reply to Liza K

8 years ago

Hi Lisa. Glad you found it useful. Pandas data frames usually have column headers. You can reset indices using reset_index and maybe get rid of columns by changing to a numpy matrix using as_matrix I think.

arkojyoti2016

8 years ago

Hi Shane,

Thanks for the post. However, I have a small doubt. In the following line of code:

data.iloc[[0,4,7,25], [0,5,6]] # 1st, 4th, 7th, 25th row + 1st 6th 7th columns

Shouldn’t it be:
data.iloc[[0,3,6,24], [0,5,6]] # 1st, 4th, 7th, 25th row + 1st 6th 7th columns

Considering that the indexing in Python starts from 0? Please correct my understanding on the same. Also, if you could also provide some more examples on .ix it would be great 🙂

Shane

Author

Reply to arkojyoti2016

8 years ago

Hi! You’re spot on there – that’s a massive misprint by me. I will update pronto! Thanks for spotting it.

As for ix – its mainly in the Pandas API for legacy reasons as I understand things – you are almost always better off being explicit and using iloc or loc for selection.

Jagroop Singh

Reply to Shane

7 years ago

Hi Shane,

I have one problem. I want to subset my dataframe on the basis of indexes. I have dataframe of 52 observations.
I have two tuples :
inc
Out[248]: [1, 24, 39]

tot
Out[249]: (14, 37, 52)

Now i want to subset my dataframe by picking indexes from these two tuples. See following

first subset should be : inc[0] , tot[0] which would be (1,14)
second subset should be : inc[1] , tot[1] which would be (24,37)
third subset should be : inc[2] , tot[2] which would be (39,52)

Can you please help ?

Thanks in advance!!

BR,
Jagroop

ToddpG

8 years ago

Great post, thanks for the examples.

Dave

7 years ago

Thanks for a clear presentation, Shane. Is there a way to index on two columns? E.g. in your example some entries might have the same last name, and therefore not unique. Could the index be both last and first names?

Shane

Author

Reply to Dave

7 years ago

Hi Dave. Good question, and I probably should have mentioned more about this. You can definitely select based on two or more different columns using logical operators. For e.g.:

data.loc[(data[‘first_name’] == ‘Shane’) & (data[‘last_name’] == ‘Lynn’)]

will work well. Just make sure you have the parentheses ()!

Vicente Centelles

7 years ago

Thanks for your post.
I think the panda systax is very confusing, and this structured presentation is very helpful.

When I use multiindex in columns o rows, it is impossible for me to adivine the correct syntax.

Do you plan ton write a new post about selecting columns and rows working with multiindex dataframas?

Thanks again for sharing your work

Shane

Author

Reply to Vicente Centelles

7 years ago

Hi Vicente, I think you’re not alone with struggling to get to grips with the syntax at the start – there’s definitely some practice needed. I actually am planning a multi-index post, because I think this is an issue for people.

Review: Pandas .loc vs. iloc – Learning the Machine

7 years ago

[…] the end of this section, I would like to give a little tip of the hat to a blog post on Shane Lynn’s blog that summed up the above in a nice little […]

agbe

7 years ago

Just found your blog and can’t thank you enough for it!

Ayush Garg

7 years ago

Very Nice tutorial for the first time in my life I come to know the difference between .iloc, .loc and .ix

Franco Arda

7 years ago

Awesome post about iloc vs. loc Shane!

Interesting: for ML/DL, we use iloc all the time while you rarely use it.

Eric

7 years ago

hi Shanelynn, Thanks for this article. after reading the possible ways to use iloc, i was able to index the entire column from csv to make new with exactly the same size. thanks and keep up

Python Pandas DataFrame: load, edit, view data | Shane Lynn

7 years ago

[…] Use iloc, loc, & ix for DataFrame selections […]

TBone

7 years ago

Refreshingly clear. Nice job. Don’t stop. 🙂

Shane

Author

Reply to TBone

7 years ago

Thanks!

Mahesh

7 years ago

This one helped me a lot, its comprehensive, Thanks a lot

HGeo

7 years ago

I use df.at[str1=row label,str2=column name]

Jeswin Augustine

7 years ago

The best post i have read by far on the loc and iloc, Crisp and clear, straight to the point.

James

7 years ago

This is a great guide many thanks for sharing!

Regards,

James

Art Seredyuk

7 years ago

Thanks, great post! Just wondering is there any difference between . loc and this approach?
df[df[‘column_name’] == ‘value’]?

Shane

Author

Reply to Art Seredyuk

7 years ago

From my understanding, I don’t think so – this is effectively a .loc selection with a binary vector to choose the columns, which should have the same effect.

kishoremarodia

7 years ago

Just great stuff

LPC123

7 years ago

Can you explain this?

df6 = pd.read_csv(‘py_all1a.csv’) # file with multiple columns
df7 = pd.read_csv(‘artexclude1.csv’) # file with multiple columns
mask = df6.iloc[:,1].isin(df7.iloc[:,8]) #csv df6 col1is the same as col 8 in df7; I want to mask rows in df6 that have any
# matching col value in df7
df6[~mask].to_csv(‘py_all1b.csv’, index=False) # I tried this with (empty result) and without the tilde (no change to df6)

What am I missing?

Shane

Author

Reply to LPC123

7 years ago

Can you confirm the contents of the ‘mask’ variable – use mask.value_counts() – i.e. is the boolean index being calculated correctly?

ndnd

7 years ago

Hi – is there a way to select non-continuous and continuous rows/columns by index? For instance, say I want to select columns 1, 2, and 4 through 19. Could I do that in one line?

Shane

Author

Reply to ndnd

7 years ago

Hi there. So yeah, that’s possible, you need to hand a list with the index values into the iloc selector. The key is in concatenating the lists to form the selections you want.

For example, try:
data.iloc[, [1,2] + list(range(4, 20))]