3,569 questions
Advice
1 vote, 2 replies, 70 views
Can assembled .asm code run on other CPUs?
I am making a game that will include assembly for the best graphics performance. I have an AMD CPU and use NASM; if I assemble my NASM code into an .exe, will it still work on other CPUs?
2 votes, 2 answers, 137 views
Disassembly of VCVTSD2USI in 16-bit mode - can it write a 16-bit register?
The Intel instruction VCVTSD2USI (and some similar AVX-512 instructions with GPR operands) is encoded as
EVEX.LLIG.F2.0F.W0 79 /r VCVTSD2USI r32, xmm1/m64{er},
for instance VCVTSD2USI EAX,XMM0 is assembled to 62 F1 ...
0 votes, 1 answer, 64 views
How to use NeuralForecast and PyTorch Lightning on Intel GPU (XPU / torch.xpu)?
PyTorch supports Intel GPU through torch.xpu, but PyTorch Lightning does not currently have built-in XPU accelerator support.
Because NeuralForecast uses Lightning under the hood, that also blocks ...
Advice
0 votes, 5 replies, 111 views
How are cache line and next-page prefetchers made aware of page sizes?
I ran some tests and verified that on more recent Intel and AMD processors, the cache line prefetcher behaves differently when a line belongs to a base page versus a huge page. How is ...
Advice
1 vote, 3 replies, 144 views
C Compiler Optimization MUL 0x80078071?
I've been looking at the zlib1.dll that comes with Win 11 Pro and I was hoping for some assistance with the following passage:
56b: b8 71 80 07 80 mov eax,0x80078071
[570: 41 0f 42 ...
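A hedged way to sanity-check a constant like the one in this excerpt: 0x80078071 equals ceil(2^47 / 65521), the reciprocal compilers commonly emit for unsigned 32-bit division by 65521 (the Adler-32 modulus used in zlib). The sketch below assumes that reading, plus a total right shift of 47 bits; the truncated listing confirms neither, and the names `div_by_magic` and `check` are hypothetical.

```python
MAGIC = 0x80078071   # constant loaded by the mov in the excerpt
DIVISOR = 65521      # assumption: the Adler-32 modulus
SHIFT = 47           # assumption: 32 (high half of the product) + 15


def div_by_magic(n: int) -> int:
    """Emulate a 32x32 -> 64-bit multiply followed by a right shift."""
    assert 0 <= n < 2**32
    return (n * MAGIC) >> SHIFT


def check(samples) -> bool:
    """Return True if the multiply-and-shift matches n // DIVISOR on all samples."""
    return all(div_by_magic(n) == n // DIVISOR for n in samples)


if __name__ == "__main__":
    # Boundary values around the divisor plus a sparse sweep of the 32-bit range.
    samples = [0, 1, 65520, 65521, 65522, 2**31, 2**32 - 1]
    samples += range(0, 2**32, 1_000_003)
    print(check(samples))
```

If the check holds for the values of interest, the `mul` is most likely a strength-reduced division (or the first step of a modulo) rather than a meaningful multiplication.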
2 votes, 0 answers, 69 views
Difference between db string and other data sizes in assembly for strings [duplicate]
Assume this code in x86_64 assembly:
section .data
msg db "Hello, world!"
section .text
global _start
_start:
;; system call 1 is sys_write
mov rax, 1
...
0 votes, 0 answers, 78 views
Bootloader stopped working after I changed the syntax from GAS to NASM
I have this bootloader I made a while ago and I would like to port it to NASM:
.intel_syntax noprefix
.code16
.equ STACK_TOP, 0x7C00
.equ SELF_LOAD, 0x7C00
.equ ELF_HDR_LOAD, 0x7E00
.equ SECT_SIZE, ...
6 votes, 1 answer, 214 views
What is the performance effect (on x64) of __atomic_fetch_add that ignores its result?
My code is
...
fragment1 // compares several regions in D1$ to D1$/D3$
__atomic_fetch_add(&lock,-1,__ATOMIC_ACQ_REL); // stmt A
fragment2 // moves several regions from D1$/D3$ to D1$
...
1 vote, 1 answer, 135 views
Do Intel CPUs have an instruction that returns the paging translation result?
I wonder if Intel (and Intel compatible) CPUs have an instruction (for diagnostic/debugging purposes) which, for a given linear address, returns the result of paging translation (i.e. the ...
1 vote, 0 answers, 103 views
Intuition for TBB parallel scan/parallel prefix requirements
I am reading a paragraph about the tbb::parallel_scan algorithm from the book Intel Threading Building Blocks. I understand what the operation does serially, but I do not understand what are ...
Best practices
1 vote, 2 replies, 139 views
Loading a byte: Partial register stall for Intel CPUs (r8 vs r64)
My assembly program reads characters from a text file by loading them one by one into register al. However, I sometimes need to use the full rax, and I think this causes a partial register stall. Now I think ...
0 votes, 1 answer, 125 views
Cache Allocation Technology on a 13th Generation Intel Core i9-13900E CPU [closed]
I am trying to evaluate the impact of Cache Allocation Technology on my CPU. However, when I check whether my CPU supports it, using either lscpu or cpuid -l 0x10, the output is false.
How is this possible?
How ...
7 votes, 1 answer, 248 views
Why are all IMUL µOPs dispatched to Port 1 only (on Haswell), even when multiple IMULs are executed in parallel?
I'm experimenting with the IMUL r64, r64 instruction on an Intel Xeon E5-1620 v3 (Haswell architecture, base clock 3.5 GHz, turbo boost up to 3.6 GHz, Hyper Threading is enabled).
My test loop is ...
3 votes, 1 answer, 161 views
JavaFX app freezes or flickers after Intel Iris Xe driver update [closed]
I have a JavaFX desktop application that started having rendering issues after updating the Intel Iris Xe graphics driver.
On Java 11 + JavaFX (Zulu distribution):
openjdk version "11.0.25" ...
2 votes, 0 answers, 102 views
What is the relationship between Intel Extension for PyTorch and PyTorch XPU versions?
A while ago, I was training a deep learning model on a computer without an NVIDIA GPU but with an Intel GPU. I only used the CPU for training, which was painfully slow. It suddenly occurred to me: can ...
