Problems with memset() implementation on GCC 10.2.0

kzinti
Member
Posts: 898
Joined: Mon Feb 02, 2015 7:11 pm

Problems with memset() implementation on GCC 10.2.0

Post by kzinti »

I just upgraded my cross compiler to GCC 10.2.0, and now my OS crashes early, inside memset().

I am sure I am doing something wrong, but GCC 10.2.0 compiles my implementation into something unexpected:

Code:

void* memset(void* ptr, int value, size_t num)
{
    for (unsigned char* p = ptr; num; --num)
    {
        *p++ = (unsigned char)value;
    }

    return ptr;
}

Code:

ffffffff80006360 <memset>:
ffffffff80006360:	48 85 d2             	test   %rdx,%rdx
ffffffff80006363:	74 13                	je     ffffffff80006378 <memset+0x18>
ffffffff80006365:	55                   	push   %rbp
ffffffff80006366:	40 0f b6 f6          	movzbl %sil,%esi
ffffffff8000636a:	48 89 e5             	mov    %rsp,%rbp
ffffffff8000636d:	e8 ee ff ff ff       	callq  ffffffff80006360 <memset>
ffffffff80006372:	5d                   	pop    %rbp
ffffffff80006373:	c3                   	retq   
ffffffff80006374:	0f 1f 40 00          	nopl   0x0(%rax)
ffffffff80006378:	48 89 f8             	mov    %rdi,%rax
ffffffff8000637b:	c3                   	retq   
ffffffff8000637c:	0f 1f 40 00          	nopl   0x0(%rax)

When I call memset() with a non-zero length (in %rdx), the code above calls memset() again from the callq at ffffffff8000636d, with the arguments essentially unchanged, and keeps recursing until I run out of stack space.

Please help if you can. I refuse to believe the problem is with GCC; I must be missing something.
nexos
Member
Posts: 1081
Joined: Tue Feb 18, 2020 3:29 pm
Libera.chat IRC: nexos

Re: Problems with memset() implementation on GCC 10.2.0

Post by nexos »

It might be better just to use __builtin_memset IMO.
kzinti
Member
Posts: 898
Joined: Mon Feb 02, 2015 7:11 pm

Re: Problems with memset() implementation on GCC 10.2.0

Post by kzinti »

Agreed. I would still like to understand why it is broken though.
kzinti
Member
Posts: 898
Joined: Mon Feb 02, 2015 7:11 pm

Re: Problems with memset() implementation on GCC 10.2.0

Post by kzinti »

Well what do you know, I am not the first to run into this:

https://github.com/micropython/micropython/issues/6053

It looks like GCC detects that the loop is memset and optimizes the loop by calling... memset. Good times.
kzinti
Member
Posts: 898
Joined: Mon Feb 02, 2015 7:11 pm

Re: Problems with memset() implementation on GCC 10.2.0

Post by kzinti »

Adding "-fno-builtin" when compiling the kernel fixes the issue, but that is clearly not what I want.
Octocontrabass
Member
Posts: 5588
Joined: Mon Mar 25, 2013 7:01 pm

Re: Problems with memset() implementation on GCC 10.2.0

Post by Octocontrabass »

GCC assumes it can emit calls to memcpy(), memmove(), memset(), and memcmp() at any point - including inside your attempt at implementing one of those four functions. As the optimizer gets smarter, it will get better at creating endless recursion loops.

Various GCC bug reports suggest the following function attribute:

Code:

__attribute__((optimize("no-tree-loop-distribute-patterns")))

You can also disable this optimization globally with -fno-tree-loop-distribute-patterns, although that seems like a poor choice.
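
Applied to the function from the first post, the attribute might look like the sketch below. It is named `kmemset` here (a hypothetical name) only so that it can be built and tested alongside a hosted C library; in a kernel it would be called `memset`. The attribute is GCC-specific, and other compilers may ignore it with a warning.

```c
#include <stddef.h>

/* Hypothetical name; a freestanding kernel would export this as memset. */
__attribute__((optimize("no-tree-loop-distribute-patterns")))
void *kmemset(void *ptr, int value, size_t num)
{
    unsigned char *p = ptr;

    /* Plain byte loop. The attribute asks GCC not to replace it
       with a call to memset(), which here would be a call to itself. */
    while (num--)
        *p++ = (unsigned char)value;

    return ptr;
}
```

Note that the GCC manual describes the `optimize` attribute as intended for debugging rather than production code, so it is worth checking the generated assembly after every compiler upgrade.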

You can also implement those four functions in assembly, to be sure GCC can never create an endless recursion loop.
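
If a separate assembly file is inconvenient, the same idea can be sketched with GCC-style inline assembly. The name `kmemset_asm` is hypothetical. The fast path assumes x86-64 with the direction flag clear on entry (as the System V ABI requires); other targets fall back to a volatile byte loop, which the optimizer is not allowed to collapse into a library call.

```c
#include <stddef.h>

/* Hypothetical name; a kernel would export this as memset. */
void *kmemset_asm(void *dst, int value, size_t n)
{
    void *ret = dst;

#if defined(__x86_64__) && defined(__GNUC__)
    /* rep stosb stores AL to [RDI] RCX times, advancing RDI each time.
       Assumes the direction flag is clear. */
    __asm__ volatile ("rep stosb"
                      : "+D"(dst), "+c"(n)
                      : "a"(value)
                      : "memory");
#else
    /* Portable fallback: volatile stores must each be emitted as written. */
    volatile unsigned char *p = dst;
    while (n--)
        *p++ = (unsigned char)value;
#endif

    return ret;
}
```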

You can also use Clang, which seems to automatically avoid infinite recursion and/or emitting C library calls in freestanding mode.

nexos wrote: It might be better just to use __builtin_memset IMO.
No, __builtin_memset() is only an optimization hint. The optimizer may still translate __builtin_memset() into a memset() call, and then you'll have a link error due to the undefined function.
kzinti
Member
Posts: 898
Joined: Mon Feb 02, 2015 7:11 pm

Re: Problems with memset() implementation on GCC 10.2.0

Post by kzinti »

Thanks, I went with the following at the top of my file:

Code:

#pragma GCC optimize "no-tree-loop-distribute-patterns"
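
If the file also contains code that should keep the default optimizations, the pragma can be scoped with GCC's `push_options` / `pop_options` pragmas. A minimal sketch, using the hypothetical name `kmemcpy` so it can be tested next to a hosted C library:

```c
#include <stddef.h>

#pragma GCC push_options
#pragma GCC optimize ("no-tree-loop-distribute-patterns")

/* Hypothetical name; a kernel would export this as memcpy. */
void *kmemcpy(void *dst, const void *src, size_t n)
{
    unsigned char *d = dst;
    const unsigned char *s = src;

    /* Byte loop that GCC is asked not to turn into a memcpy() call. */
    while (n--)
        *d++ = *s++;

    return dst;
}

#pragma GCC pop_options
/* Functions defined from here on use the command-line optimization flags again. */
```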
moonchild
Member
Posts: 73
Joined: Wed Apr 01, 2020 4:59 pm
Libera.chat IRC: moon-child

Re: Problems with memset() implementation on GCC 10.2.0

Post by moonchild »

You can also implement the string functions in assembly; on x86 this also gives you a fairly easy performance boost. Here are a few:

Code:

memcpy:
mov rcx, rdx
mov rax, rdi
rep movs byte ptr [rdi], byte ptr [rsi]
ret

memmove:
cmp rdi, rsi
jbe memcpy              /* dst <= src: a forward copy is safe */
mov rax, rdi
mov rcx, rdx
lea rdi, [rdi + rdx - 1]
lea rsi, [rsi + rdx - 1]
std                     /* dst > src: copy backward from the last byte */
rep movs byte ptr [rdi], byte ptr [rsi]
cld
ret

memset:
mov rcx, rdx
mov rdx, rdi
mov al, sil
rep stos byte ptr [rdi]
mov rax, rdx
ret
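
The direction choice that memmove has to make is easy to get backwards, so here is a C sketch of it. The name `kmemmove` is hypothetical, and the GCC-specific attribute is there so the loops are not turned back into library calls. When the destination sits above the source, a forward copy would overwrite source bytes before reading them, so the copy has to run from the end.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical name; a kernel would export this as memmove. */
__attribute__((optimize("no-tree-loop-distribute-patterns")))
void *kmemmove(void *dst, const void *src, size_t n)
{
    unsigned char *d = dst;
    const unsigned char *s = src;

    if ((uintptr_t)d <= (uintptr_t)s) {
        /* dst at or below src: an ascending copy never clobbers unread bytes. */
        while (n--)
            *d++ = *s++;
    } else {
        /* dst above src: copy from the last byte down. */
        d += n;
        s += n;
        while (n--)
            *--d = *--s;
    }

    return dst;
}
```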